I'm going to show you how to take a handful of pictures of yourself and train an AI model that can generate unlimited AI images of you, and the entire training process will cost less than $5. I'll show you step by step. This video has three parts: first, dataset creation; second, the fine-tuning; and third, inference once the model is ready. We're going to use the Ostris AI-toolkit Flux LoRA trainer on Replicate, and for that we need a zip file of all our training images.

First, let's talk about the training images, because this is one of the most underrated topics; it comes up in every LoRA tutorial. You want a wide range of images, but from my experiments and from various online communities, a Flux LoRA seems to work completely fine with as few as 10 images. I'd say go up to 12 or 15 at most; don't go beyond that, because it can mess up your final result. Also check that no element recurs across the images that might spoil the fine-tuning. One example I recently came across online: a collar mic, or a particular pair of glasses, or anything your subject is always wearing. So aim for variety: different shots, different angles, different sizes. Once you have these 10 or so images, put them together in a zip file and have it ready.
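Building the zip is trivial from code as well; here's a small Python sketch (the folder and file names are placeholders, and the flat layout is what the Replicate trainer's upload form expects in my experience):

```python
import zipfile
from pathlib import Path

def zip_training_images(image_dir: str, out_path: str) -> int:
    """Bundle every image in image_dir into a flat zip; returns how many were added."""
    count = 0
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in sorted(Path(image_dir).iterdir()):
            if p.suffix.lower() in {".jpg", ".jpeg", ".png", ".webp"}:
                zf.write(p, arcname=p.name)  # flat layout, no subfolders
                count += 1
    return count
```

For example, `zip_training_images("dataset", "dataset/data.zip")` bundles all images in `dataset/` and returns the count, which you can sanity-check against your 10-15 target.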
Additionally, in my case I wasn't focusing on image dimensions. You might notice I even have one low-res image, which initially worried me; I thought it might not produce good results, but ultimately it did. I also didn't stick to only one type of image. One thing you can do if you want much higher quality is create a caption dataset for all the images. For example, for this image you might write "a man standing in front of a tree". For every image you have, write a caption like that and store it in a matching .txt file. If you don't, Replicate's interface will auto-caption for you, but writing your own captions is one of the easiest ways to improve the quality of the generated images.
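The caption files are just plain text, conventionally one .txt per image sharing the image's base name; a small sketch (the filenames and captions below are only examples):

```python
from pathlib import Path

def write_captions(image_dir: str, captions: dict) -> list:
    """For each image, write a same-named .txt file containing its caption."""
    written = []
    for image_name, caption in captions.items():
        txt = Path(image_dir) / (Path(image_name).stem + ".txt")
        txt.write_text(caption, encoding="utf-8")
        written.append(txt.name)
    return written
```

So `write_captions("dataset", {"img01.jpg": "a man standing in front of a tree"})` produces `dataset/img01.txt`; zip the .txt files together with the images.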
Once the dataset is ready, go to the trainer link and create your model. The model gets created as a placeholder inside Replicate, so just give it a name you'll be able to recognize later. Then click "input images" and upload your file; what I'm uploading here is without the caption files, but you can include them too. Next, set a trigger word: the token you'll use to call this model later. Whenever you use this trigger word, it invokes this particular character, profile, or style inside the broader Flux model; Flux plus LoRA, activated by the trigger word. Then there's auto-captioning, which is enabled by default and will caption the images automatically, plus a prefix describing what you're uploading. In this case I've set "photo of danush", because I'm creating a LoRA of a person; if you're doing it for a style, phrase it slightly differently, something like "in the style of". Once that's done, you can experiment with the rest of the settings.
The most important setting in this entire process is the number of training steps. I wanted to play it safe, because I didn't want to end up with a terrible model after, say, 1,000 steps, so I went with 2,000. But I personally know a lot of people who've gotten good results even with 1,000, 1,200, or 1,500 steps, so you can either take that risk and go with 1,200-1,500, or play it safe like me. Just remember: the more steps, the more compute, and the more time and money you'll spend training. With 2,000 steps it took about 45 minutes on an H100. Depending on what you want, I'd easily say you can go with 1,500 instead of 2,000 and trust that you'll get a good model.

One way to store this model in the cloud is Hugging Face's Model Hub. This is completely optional but highly recommended. Go to your Hugging Face profile, create a new model repo, and give it a preferred name.
In this case I'm training Dhanush, the actor, so I used "danush-flux". I created it as a repo and made it public; somehow I had an issue with a private repo, so I just went with public. Even if you're training a personal model, you can make it private later. After you've created the repo, go to your settings, then Access Tokens, and create a token with write permission. In my case one already existed, so I could just invalidate, refresh, and copy it; if you don't have one, create a new token with write permission so Replicate can write the model into your Hugging Face repository. Back on Replicate, add the token and mention the repo: just your profile name and the model repo, like username/model-name, not the full https:// URL.
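If you prefer doing the Hugging Face side from code, the `huggingface_hub` client can create the repo for you; a minimal sketch, where the username and repo name are placeholders and the actual weight upload is still done by Replicate:

```python
def repo_id(username: str, model_name: str) -> str:
    """Replicate expects just 'username/model-name', not a full URL."""
    return f"{username}/{model_name}"

def create_hf_repo(username: str, model_name: str) -> str:
    """Create a public model repo on the Hub.

    Assumes huggingface_hub is installed and you're logged in with a
    write-permission token (e.g. via `huggingface-cli login`).
    """
    from huggingface_hub import create_repo
    create_repo(repo_id(username, model_name), private=False, exist_ok=True)
    return repo_id(username, model_name)
```

For example, `create_hf_repo("your-username", "danush-flux")` returns `"your-username/danush-flux"`, the exact string to paste into Replicate's repo field.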
At this point you're pretty much set: you've got the number of steps, the name, the input file, the trigger word, everything you need to go ahead. Any changes you want to make, make them now; you can't make them later. One more important thing: the trigger word should be something unique, not a very common word. Now click "train", and as you can see, the process of training, or more precisely fine-tuning, this model begins; it takes about 45 minutes to complete. The good thing here, unlike Google Colab, is that you don't have to panic about your browser closing, about your work not being saved, or about running out of space: one, you can download the model within this interface; two, the model also gets written to Hugging Face's Model Hub; and three, it all runs in the cloud.
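The same training run can also be kicked off from code with Replicate's Python client. This is a hedged sketch: `ostris/flux-dev-lora-trainer` is the public trainer on Replicate, but the zip URL, the destination model, and the exact input keys are assumptions you should verify against the trainer's current schema:

```python
def training_input(images_url: str, trigger_word: str, steps: int = 1500) -> dict:
    """Input payload for the LoRA trainer; key names follow the web form and may differ."""
    return {
        "input_images": images_url,    # public URL of the zip of training images
        "trigger_word": trigger_word,  # unique token that will invoke the LoRA
        "steps": steps,                # roughly 1,000-2,000; more steps = more time and cost
        "autocaption": True,           # let Replicate caption the images automatically
    }

def start_training(images_url: str, destination: str) -> None:
    """Kick off the fine-tune. Assumes REPLICATE_API_TOKEN is set in the environment."""
    import replicate
    trainer = replicate.models.get("ostris/flux-dev-lora-trainer")
    training = replicate.trainings.create(
        version=f"ostris/flux-dev-lora-trainer:{trainer.latest_version.id}",
        input=training_input(images_url, "danush", steps=2000),
        destination=destination,  # a placeholder Replicate model, e.g. "your-username/danush-flux"
    )
    print(training.status)
```

`training_input` is just the payload the web form builds for you; `start_training` is the part that actually costs money.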
So, about 45 minutes later, we'll have the model. One important thing I want to highlight: you might think this particular person is already part of Flux's training data. One reason I picked Dhanush, a South Indian actor, is that I didn't want somebody the base model already knows; I wanted to show you how this would look if you did it for yourself or for me. I could have done it with my own pictures, but I honestly don't have 10 good-quality ones, so I went with Dhanush. I also wanted to pick somebody who isn't Caucasian, so people can see the differences and nuances with a subject from a different background. As you can see here, the sample picture looks nothing like Dhanush; what we'll see after fine-tuning is how closely the AI-generated images match the original pictures.
Forty-five minutes later, the model has trained successfully; about 40 minutes, actually, and with fewer steps it takes less time. You can download the weights and keep them for posterity: it downloads two files, the LoRA .safetensors file and a configuration file, and once you've downloaded them you can use them anytime. There are a couple of ways to use this model. You can run it within Replicate, and you can also verify the upload: as you can see, the config file and the LoRA .safetensors file are already stored in our Hugging Face repo. What we're going to do now is use that Hugging Face repo as input to a different interface for LoRA inference. Once you have the model, you don't have to run it only on Replicate; you can run it on Google Colab or on a local machine (on a Mac it takes about 40 seconds, I believe).
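As a sketch of the local option, the `diffusers` library can load a Flux checkpoint and attach the LoRA straight from the Hub. The base model and repo names below are assumptions, and this genuinely needs a big GPU or an Apple-silicon Mac with plenty of memory:

```python
def generate_local(prompt: str, lora_repo: str = "your-username/danush-flux"):
    """Load Flux plus our LoRA locally and generate one image.

    Assumes `diffusers`, `transformers`, `accelerate`, and `torch` are installed
    and you have access to the FLUX.1-dev weights on the Hub.
    """
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    )
    pipe.load_lora_weights(lora_repo)   # pulls the .safetensors from the HF repo
    pipe.enable_model_cpu_offload()     # trade speed for memory on smaller machines
    return pipe(prompt, num_inference_steps=28).images[0]
```

Treat this as a starting point; quantized or distilled Flux variants are the usual workaround when memory is tight.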
In the inference interface, set the HF LoRA field to the same LoRA repo we created. Once you've added it, you can play with the LoRA strength, which controls how much importance the LoRA gets: imagine the generation happening with the base model on one side and the LoRA on the other, and this setting deciding how much the LoRA matters. Then change the prompt. In this case I'm just using the trigger word: "danush as a Superman flying in the sky". This is by no means a great prompt; it's very naive. But the good thing about Flux, unlike Stable Diffusion, is that it's really good at adhering to the prompt, so even a bad prompt gets you a tremendous rendition of what you asked for. It takes a couple of seconds, and there's the image: Dhanush, very close to how he looks in real life, flying in the sky as Superman.
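The same hosted inference can be scripted with Replicate's client. `lucataco/flux-dev-lora` is one public Flux model that accepts a Hugging Face LoRA as input, but treat the model name and the parameter names (`hf_lora`, `lora_scale`) as assumptions to check against that model's schema:

```python
def lora_prompt(trigger_word: str, description: str) -> str:
    """Every prompt must contain the trigger word, or the LoRA never activates."""
    return f"photo of {trigger_word}, {description}"

def generate(description: str):
    """Run hosted inference on Replicate. Assumes REPLICATE_API_TOKEN is set."""
    import replicate
    return replicate.run(
        "lucataco/flux-dev-lora",  # a community Flux inference model with LoRA support
        input={
            "prompt": lora_prompt("danush", description),
            "hf_lora": "your-username/danush-flux",  # placeholder HF repo from earlier
            "lora_scale": 0.9,  # how much weight the LoRA gets vs. the base model
        },
    )
```

For example, `generate("as Superman flying in the sky")` reproduces the run above; lowering `lora_scale` pulls the output back toward the base model.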
With this image, my only issue is the eyes; but viewed from a distance it's a pretty good image. You can try different prompts; the main thing is that the prompt needs to contain the token, the trigger word. Let's create a professional LinkedIn headshot, which is one of the biggest use cases for this: for about $5 you can generate unlimited headshots and own the model yourself, rather than paying somebody else. A very simple prompt: "photo of danush for LinkedIn headshot, professional photo, DSLR quality". I'd encourage you to improve the prompt further, but let's see how this one does. Our image is ready, and this is the professional headshot. We didn't even have to ask for a suit or for a particular background, say a shallow depth of field or a bokeh effect, yet it has already managed to generate the kind of image that a lot of services on the internet would charge $50 or $100 for,
and this is a model you ultimately own. Next, let's try text; the LoRA, sorry, Flux itself, is really good with text. We'll combine the character we created, Dhanush, with text: a banner saying "I'm not danush". Once again, not a great prompt, but let's take a look. One way to improve the image, by the way, is to increase the number of inference steps. We've got the image, and the banner says "I'm not danush".
The final image I generated in this experiment: "photo of danush in a Star Wars scene with a green lightsaber". I don't think it really gave me a Star Wars scene, but the green lightsaber was good, even if it isn't exactly straight. Overall I was quite impressed with the quality of the LoRA we trained, and I spent less than $5 in total, including inference: my Replicate balance started at around $95-96 and it's still at about $92. This is quite insane. Training your own model is now practically a commodity: you don't have to pay for any service beyond the compute itself, and you can train your own model, store your own model, and use it whenever you want. I hope this tutorial was helpful; it's quite elaborate, so please share it with your friends, and subscribe to the channel if you haven't. See you in another video. Happy prompting!