Flux review, installation, tutorial. Flux is the new king of AI image generation
#flux #ainews #ai #...
Video Transcript:
Finally, we have an image generator that can actually generate accurate hands and fingers. It can also generate text accurately, and it can follow tricky prompts really well. Plus, it can even generate these mediocre, low-quality selfie images that normal people take. So with this new image generator, it would be almost impossible to tell AI photos apart from real photos.

Let's do a quick test. I'm using the same prompt with this new image generator, called Flux, and also with two other state-of-the-art image generators, Stable Diffusion 3 and SDXL. So these three images are based on the same prompt. Between left, right, and center, I'm not going to tell you which model is which. You let me know which one looks the best to you.

Here's the first prompt: this image shows three young African children, two boys and a girl, standing on a dirt ground. They are all smiling and making a peace sign with their fingers. The girl is wearing a green and yellow patterned dress and the boys are wearing casual clothes. The boy on the left is holding the girl on top of her while the boy in the middle is holding her up in the air with both hands. All right, so let me know which image you think is best: left, right, or center. By the way, for each model I'm keeping the positions left, right, and center the same. I'm not shuffling any of their positions.

Here's the next one: the image shows three children sitting in the trunk of a red car. The car is parked on a dirt road with trees and bushes. They are sitting on a blanket and are eating slices of watermelon in their hands. The boy on the left is wearing a blue striped shirt and shorts, the girl in the middle is wearing blue shorts and a white T-shirt, and the girl on the right is wearing an orange shirt. So it's a really tricky prompt, and there are a lot of details here. Let me know which one looks best to you.

All right, next one. The prompt is a photo of a blond-haired woman with a pistol in each hand and an open mouth, facing the camera. So let me
know which one looks best to you. Next we have the famous prompt of a woman lying on grass. It's pretty obvious here which one looks the best.

All right, here's another comparison. The prompt here is a young woman playing a bass guitar on a stage. She is wearing a black dress and black boots, and she has long blonde hair styled in loose waves. Pay attention to her fingers and the bass. A bass should have four strings only. So which image do you think looks best in this case?

Here's another comparison. The prompt is a young woman standing on a street with her back to the camera. She's wearing a white T-shirt, black skirt, knee-high socks, and brown shoes. She has a large blue backpack on her back with a brown strap and a small teddy bear on it. Let me know which one looks best to you.

Next we have a young woman standing on a sidewalk with her arms raised in the air. She's wearing a gray T-shirt with a graphic of a dog on it, a white skirt, and white sneakers. She has a yellow backpack on her back and is smiling at the camera. Next to her there is a small white dog, possibly a Pomeranian, sitting on the sidewalk. Let me know which one looks best to you.

All right, next the prompt is a photograph of a black-haired woman in a white dress with blood stains, sitting on a red couch to the right of a chiming clock, holding a rose in her left hand, with three skulls at her feet. A very tricky prompt. Let me know which one looks best to you.

Next we have anime. Here is an anime girl with massive fluffy fennec ears and a fluffy tail, blonde messy long hair, blue eyes, wearing a maid outfit, long black dress, et cetera, eating a slice of apple pie in the kitchen of an old, dark Victorian mansion. Let me know which one looks the best, and not only in quality, but which one actually follows the prompt the most.

All right, next one. This is a young woman with long blonde hair and bunny ears on her head, kneeling
in front of an open refrigerator. She is wearing a white tank top and black shorts and is holding a black phone in her hand. The fridge is filled with various fruits and vegetables. There's a potted plant on the right side of the image. The woman appears to be taking a selfie.

All right, so let's reveal the answer. The left side is the new image generation model called Flux. The middle one is Stable Diffusion 3. This is the latest version of Stable Diffusion, and in theory it should be the best one. And on the right we have Stable Diffusion XL. Now, I think we can objectively say that the left one, Flux, looks the best in most cases. So let's go over the prompts really quickly.

For the first one, the prompt is three African children, and you can see Flux is the only one with three kids. Then, for making a peace sign, Flux is the only one with decent hands and fingers actually making a peace sign. In the middle one, SD3, they're kind of making a peace sign, but the fingers are still messed up. So in terms of hands and fingers, Flux is the clear winner here. And the girl is wearing a green and yellow patterned dress, which is what we're seeing.

Anyways, next one. Both Flux and SD3 are pretty good. Both of them show three children sitting in the trunk of a red car. However, in terms of quality, you can see that again Flux just crushed it. This photo looks really good. The faces are very detailed, they're each holding a slice of watermelon, and the toes actually look real. Whereas for Stable Diffusion 3, the faces are kind of blurry, the toes are not really accurate, and the girl in the middle is missing a watermelon. So again, for this one I would give the point to Flux.

All right, and then here, both Flux and SD3 got the prompt correct, but again it seems like Flux is just better quality. Especially if you're going for this cinematic, realistic style, Flux is really good.

And then this one, the woman lying on the grass. SD3 is infamous for not getting this correct and generating quite grotesque
images. And even with SDXL, sometimes it works and sometimes it doesn't. You can see here this woman has an extra hand, so it's just not great. But as you can see, Flux nailed this, plus her hands and fingers are actually accurate.

All right, so here I would say again Flux is the clear winner in terms of hands and fingers and the bass guitar. Flux is the only one that was able to generate a bass guitar with four strings. The strings are actually straight, and the frets look very realistic as well. Even the drums in the background look more realistic than in the other two examples.

And then here I would say Stable Diffusion 3 followed the prompt a bit better. We did mention that she's wearing brown shoes, and we're not seeing that for Flux. But again, for Flux the image quality is just way better than the other two. It's not even close.

Here again I would say Flux is the clear winner. It followed the prompt really well. It was even able to generate a graphic of a dog on her shirt, plus it's the only one that could generate a white Pomeranian. So you can see that Flux not only followed the prompt really well, but in terms of image quality it's also a lot better than Stable Diffusion.

And then here again, if you just compare the quality, the Flux image looks the best. None of them got three skulls at her feet. The Flux one had four skulls and SD3 had two skulls, but they both generated the woman correctly.

All right, now for anime. In terms of image quality, I would give it to SDXL. There are just so many good anime models built for SDXL. It's been around for a while, so you can expect that some SDXL models do a lot better than the newest Flux or Stable Diffusion 3. However, in terms of following the prompt, I would have to give it to Flux again. It's the only one that was able to actually generate the girl eating a slice of apple pie. The other two generators did not generate a slice of apple pie.

All right, next one. Again, I think the clear winner here in terms of
quality and following the prompt is Flux. This woman is kneeling down, she does have bunny ears, she is holding a black phone, and she does have a white tank top and black shorts. SD3 looks pretty good, but you can see her left hand is missing and her right foot looks really weird. So again, I think the point goes to Flux.

It can also generate low-quality cell phone photos like this, or this, or this. You can see it generates the iPhone pretty accurately, plus it nails the hands and the fingers, and this looks like a legit low-quality selfie shot from an actual human. You might be used to AI photos being really polished, where everything just looks so perfect, so they were quite easy to tell apart. But right now Flux can generate such average-looking photos that it's insanely hard to tell this apart from real photos.

It's also great at generating text. Here's one example, here's another example of text, and here's another example. And note that she has five fingers. Her hands are perfect.

Speaking of fingers, finally we have an AI image generator that actually gets hands and fingers correct. Here are some examples. It doesn't matter what position the hand is in. It could be pouring tea, it could be playing a guitar, but the fingers do look correct to some extent. Plus, note that the strings on this guitar and the frets on this guitar are actually straight. None of the other image generators could get this consistently. Here's another example of hands tying a shoe, and you can see it nailed this one as well. Here's another example, and another example. So this is the only AI image generator that can actually get hands and fingers correct, at least most of the time. Even Midjourney, which was the leading closed-source image generator, still could not generate hands and fingers really well. So now with Flux you can forget about Midjourney, forget about Stable Diffusion. Flux is by far the best image generator out there right now.

So in this video we're
going to go over how you can use it, including online methods as well as how to install and run it locally if you have a good enough GPU. Plus, I'll go over the architecture and details about the model.

So really quickly, Flux is a new image generator that was announced recently, and it was developed by Black Forest Labs. This is quite a new startup, and it seems like a lot of the team members were actually from Stability AI. In fact, they claim that they were the original creators of Stable Diffusion XL and Stable Video Diffusion, plus many other well-known tools.

Now, for Flux they've actually released three models, so I'll go over each of them. The first one is called Schnell, and this is the fastest model. Think of it as a turbo version of a Stable Diffusion model. This is completely free and open source. However, it's the smallest and fastest of the three models, and it's also the worst quality, so it's kind of a watered-down version. I'll show you a comparison in a second.

Next we have the Dev model. This is slower, but it's much better quality. This is also free and open source, so you can download it and run it locally. However, this is only for non-commercial use. If you do want to use it commercially, you need to get in touch with them to discuss further.

And then finally they have their Pro version. This is slightly better than the Dev version, and it's the best quality out of the three models. However, it's paid and closed source. You cannot download the weights and run this locally, but this is the best quality out there.

Now, I'll go over the architecture, benchmarks, and more technical details about Flux, but that's kind of boring, so I'll leave that till the end of the video. For now, let's dive in and actually try this out.

Thanks to Tube Magic for sponsoring this video. Are you looking to kickstart your YouTube channel or take your existing channel to the next level? Look no further than Tube Magic, an AI
powered tool that helps you succeed on YouTube. Their built-in keyword research function lets you sort keywords by search volume, competition, and an exclusive Magic Score, which automatically finds the best keywords for your videos. Imagine never running out of content ideas again. With the video ideas tool, you can simply drop in any channel link and Tube Magic will generate video ideas related to that channel. Plus, it even writes scripts and outlines for you, so you're already 80% of the way there instead of starting from scratch. You can also upload an unlisted video link and it will generate the best title, description, and tags for you. By the way, this is created by YouTuber Matt Par, who runs the popular Make Money Matt channel and over 12 other channels, so Tube Magic is backed by years of experience and success. Many big YouTubers are already using it to streamline their content. So if you want to grow fast, try out Tube Magic for free via the link in the description below.

So there are a few places online where you can use it for free. One option is this space on Replicate, which I'll link to in the description below. You can see it's pretty simple to use. You just enter a prompt. Flux doesn't even have negative prompts, so it's just one positive prompt. Then you can change the aspect ratio to whatever you want.

And then guidance: this is how closely it follows your prompt. If you drag this all the way to zero, it doesn't follow your prompt as well and it would spit out some random image. If you drag this all the way to the right, it might follow your prompt too literally and give you some unwanted results. So it's best to keep this at 3.5, which seems to be their default.

Seed is just the starting point of your image. Usually you would leave this blank, which sets it to a random value. But if you set the seed to the same number and keep all the settings the same, then in theory it should generate the same image that you got before. Output format is just what file type you want: WebP, JPEG, or PNG. And output quality is just how much compression you want on your image. So let's just leave everything at the default.

All right, let's test this out. For the prompt, I'm going to put a man wearing glasses writing in his diary, and then click run. Note that Flux is quite bulky and it takes a much longer time to run compared to Stable Diffusion. However, in most cases you're going to get a higher quality result compared to Stable Diffusion. So here, indeed, we have a man. He's wearing glasses, he's writing in his diary, the pen actually looks straight, and his fingers and his hands actually look real. So this is really impressive.

Let me test out an even trickier prompt. Here we have a zebra with rainbow stripes playing a grand piano on a mountaintop. The piano is made of ice, and in the background the northern lights illuminate the sky. So let's click run and see if it can generate that. Oh my God, this is just so incredible. There's just one minor flaw, which is that it has three arms here. But this is the only generator so far that can actually get a zebra with rainbow stripes to play a piano made of ice with northern lights in the background. Previously I did a review of another new image generator called AuraFlow and compared that with Stable Diffusion, and it's not as good as this. This looks amazing.

Let's try another example. Here the prompt is a girl in a Victorian-era dress holding a sign that says "Flux is King". She stands in a garden filled with flowers and butterflies. Let's see what that gives us. All right, well, it kind of gave us this image in a Disney Pixar
style, but I did not specify for this to be a realistic photo, so whatever. You can see she does have accurate fingers and hands, and she is in a Victorian-era dress. She is holding a sign that says "Flux is King", and the letters are very accurate. Plus, there are flowers and butterflies. Again, it just nailed the prompt. Everything is correct, plus the image quality is actually really good. Super impressive.

Now, if you run out of daily credits on this Replicate space, you can also use the Hugging Face space by Black Forest Labs themselves. There are actually two spaces. One is for Schnell, the faster but lower quality model. Or you could use the other space, which uses Flux Dev, the slower but better quality model. I'll link to both of these in the description as well.

Now, since we have both of them open, this is actually a good time to show you the difference between Schnell and Dev. As I mentioned, Schnell is kind of a watered-down version, so the only reason to use it is if your GPU isn't good enough and you can only run this faster but smaller model. But in most cases Dev is the way to go. The quality is a lot better than Schnell.

Anyways, let's paste the same prompt into both. We have a beautiful woman with long brown hair doing yoga at home. I'm going to paste this in the Dev space, plus I'm going to paste the same thing in the Schnell space, and then let's click run. I'm going to set these side by side, and you can see Schnell is already done. So here we have a woman doing yoga at home. Her hands and fingers are perfect, and she is indeed doing yoga. Previously I did a comparison with another new tool called AuraFlow, plus Stable Diffusion 3 and Stable Diffusion XL, and here are the generations. You can see they are awful compared to what Flux is able to generate.

All right, so now our Flux Dev image is also done, and you can see that the colors and the details of the image are a lot better than Schnell. I find that Schnell is kind of oversaturated and the contrast is too much at
times, whereas this gives you a more cinematic feel.

All right, let's try a different prompt. Here the prompt is a ballerina girl with butterfly wings dancing on a lily pad in a serene pond, anime style. I'm going to click run for both, and note how quickly Schnell runs compared to Dev. It runs way faster. Wow, and we are already done. So here we have a ballerina girl with butterfly wings dancing on a lily pad. Again, it just follows the prompt really well. There are some flaws. Her face isn't really detailed, though I'm sure if you bump up the resolution it would be a lot better. Plus, we have some flowers floating in midair, so that's not really accurate. And then here's the same prompt but with Dev. You can see that again, in terms of image quality, Dev is just better looking than Schnell. Schnell seems to be oversaturated, plus the contrast is too high.

All right, so those are some ways you can use it for free online. Now, if you do want to install this and run it locally, I'm going to show you that right now. Note that there are a lot of steps and a lot of things to download. Plus, you do need at least 12 GB of VRAM on your GPU to run this well, as well as 32 GB of RAM on your computer. Also, this is using ComfyUI, so you do need to have that installed. If you're new to ComfyUI, I'm going to post a beginner-friendly tutorial on how to install and use it, probably by next week, so stay tuned for that.

So let's go over how you can install this locally. I'm going to link to this guide by comfyanonymous in the description below. The first thing to do is to download these safetensors files. So let's click on this link, and first we need to download this clip_l.safetensors. Let me download that. In our ComfyUI folder, this needs to go in models/clip, so let's save it in here.

Now, depending on how much VRAM you have on your graphics card, you can download this larger version or the smaller version. I'm going to go with this one so it doesn't kill my GPU. By the way, I have a Dell Precision 5690 laptop, which has an Nvidia RTX 5000 Ada with 16 GB of VRAM. Shout out to Dell and Nvidia for sponsoring this. But even with 16 GB, I would still prefer to use this smaller version so it doesn't kill my GPU. If you have something like a 4090 with 24 GB of VRAM, then this fp16 would be a better option. Anyways, I'm going to download this and also put it in my models/clip folder.

All right, the next step is to download this VAE, and it should go in your models/vae folder. If you click on this link, it's actually broken, so instead you should go to the Black Forest Labs Hugging Face page, which I'll link to in the description below. When you scroll down, you should see these two models: Schnell, which is the faster but lower quality model, or the Dev model. Now again, just so I don't kill my GPU, I'm going to use the Schnell version, but if you have a good enough GPU you can go with the Dev version. So I'm going to click on this, and then in this Files and versions tab you should see this ae.safetensors. This is your VAE file. So let's go ahead and download this. It goes in models/vae, so let's save it in here.

And then finally we need to download the Schnell safetensors file, or the Dev safetensors file if you're using that model, and we would download that into our models/unet folder. So let me click save.

All right, so I'm going to open up my ComfyUI folder and make sure all the files are there. In this models folder, in this clip folder, you should have this clip_l.safetensors file as well as this t5xxl fp8 or fp16 safetensors file. And then let me go back to the models folder, and in unet you should have flux schnell or dev.
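
As a rough sketch of the "make sure all the files are there" step, the check could also be done with a few lines of Python. This is not part of the video or the ComfyUI guide. The folder names (models/clip, models/vae, models/unet) come from the walkthrough, but the exact file names for the T5 text encoder and the Flux weights below are assumptions, so swap in whatever names your downloads actually have. The function name missing_files is made up for this example.

```python
from pathlib import Path

# Folders come from the walkthrough. File names are ASSUMPTIONS for
# illustration; edit them to match the files you actually downloaded.
# Each entry is (subfolder, acceptable alternatives): one of the
# alternatives must be present for that entry to count as found.
REQUIRED = [
    ("models/clip", ["clip_l.safetensors"]),
    ("models/clip", ["t5xxl_fp8_e4m3fn.safetensors", "t5xxl_fp16.safetensors"]),
    ("models/vae", ["ae.safetensors"]),
    ("models/unet", ["flux1-schnell.safetensors", "flux1-dev.safetensors"]),
]


def missing_files(comfy_root):
    """Return a description of each required entry not found under comfy_root."""
    root = Path(comfy_root)
    missing = []
    for folder, alternatives in REQUIRED:
        found = any((root / folder / name).is_file() for name in alternatives)
        if not found:
            missing.append(f"{folder}: " + " or ".join(alternatives))
    return missing
```

Calling missing_files("ComfyUI") with the path to your own ComfyUI folder returns an empty list when every entry is in place, and otherwise lists the folder and the file names it was looking for.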