There have been a lot of exciting (and weird) advancements in AI image generation. Let's break it down...
Video Transcript:
AI image generation has seen a huge surge in innovation lately. By now we've probably all seen the crazy, uncensored, unhinged images coming out of Grok 2. They're realistic, they'll generate pretty much anything but, like, nudie content, and they're probably going to end up getting sued from all sorts of different directions. However, under the hood, Grok 2 is just using the Flux.1 model from Black Forest Labs, and if you've been watching this channel at all lately, you'll have noticed that I've been pretty much going ham on using Flux. Some of my more popular recent videos around the topic were this one, where I showed you how to make super realistic images, and this one, where I showed you how to train your own face into the model.

Well, this week we've gotten even more AI image generation rollouts, and quite honestly it's starting to feel like Midjourney is taking notice and making some changes to make their product a little bit more appealing to you. But let's just start with what was released most recently. This week we got a brand new model from Ideogram called Ideogram 2.0. It's their most advanced text-to-image model, and it's now available to all users for free. Now, if you're not familiar with Ideogram, I made some videos on it several months ago, but it was pretty much the first text-to-image generator that was really good at actually adding text into your images. I actually believe it's using its own foundation models, so it's not built on top of something like Stable Diffusion or DALL-E 3 or Flux or something like that. It's its own unique model, and with this latest rollout of Ideogram 2.0 it's pretty much on par with most of the other AI image generators out there. The problem is, since it's not an open model, we don't have things like ControlNets and LoRAs and IP-Adapters and inpainting and outpainting and all of the stuff that you can do with all these other models yet. However, one thing it does have going for it is that anyone can use it for free right now.

So in this video, I want to put Ideogram to the test, I want to play with some other new AI models that are coming out, and I want to sort of pit them against each other and see what's good at what. In order to do this, I actually used Claude to generate four different prompts that I'm going to use in a bunch of different image generators, and we'll see how they all stack up against each other. I'm going to be testing for four things in this video. There are a lot more things I can test for, and I'll probably do more tests like this in future videos, but for this video I'm going to test human realism, landscape and scenery, incorporating text into images, and just weird and absurd images. I had Claude generate prompts for each one of those.

So, for example, for human realism I used the prompt "a close-up portrait of a weathered elderly fisherman with deep wrinkles, wearing a yellow raincoat and knit cap against a stormy sea background." When I plugged that into Ideogram, I got this image, this image, this image, and this image. They all look pretty decent. I mean, this one looks like he might be blind or something. This one, to me, is looking a little bit noisy. Same with this one, I'm seeing a little bit of noise. But this one, to me, is really, really impressive.

Next up, I wanted to test landscape and scenery, so I did "a serene Japanese Zen garden at twilight with carefully raked sand patterns, moss-covered rocks, and cherry blossom trees in full bloom." Here's the first image that Ideogram gave me, here was the second, here was the third, and here's the fourth, and they all look pretty good. No complaints with
any of them.

So, incorporating text: this is where Ideogram has and still does really stand out. I did "a mystical forest clearing where wisps of fog spell out the words 'Magic Awaits' in flowing ethereal lettering between ancient trees." Here's generation number one; you can see it's looking pretty good there. Generation number two: I really love the colors of this one, however the background looks a little wonky. Here's the third generation, looks pretty good. All of the text is perfect in all four of these images. And here is the fourth one, "Magic Awaits," and it looks excellent.

And finally, for weird and absurd, I did "a steampunk-inspired octopus riding a unicycle made of clockwork gears, juggling neon cubes while floating in a bubble tea sea." So, a lot of elements. This is kind of testing weirdness and absurdity, but also testing how many of the things it actually gets into the image. How adherent is it to my prompt? And, well, here was the first one that it came up with. We've got our octopus, we've got our unicycle made of clockwork gears, juggling neon cubes while floating in a boba tea sea. Now, it kind of put the boba tea on the side, so it's not really in the sea, but it did a pretty damn good job of getting all of those elements in there. Here's the second generation. We've got our cubes; these kind of look more like marbles to me. We've got our clockwork gear unicycle. Once again, it pretty much got all the elements. Here's the third one. Not seeing the boba tea element, but everything else it did a pretty good job on. And here's number four. Once again, got everything. We see the steampunk style, the cubes. Hard to say this is boba tea, but pretty dang close with all the elements. Ideogram 2 is pretty impressive.

Now, on their tweet here they say it is free for all users. There is a caveat. If we click on their manage subscription page, we can see that you can get 10 credits per day. Since it generates four images per credit, that'll generate about 40 images per day. Beyond that you do have to pay, but you can do 10 generations of four
images each every single day with it.

Now, with Flux and Ideogram getting all of this attention lately, I think Midjourney is starting to take notice. They're probably starting to feel a little bit of heat because all of these just-as-good alternatives are popping up, and many of them are either free or much less expensive to use. Now, I don't know if this is a coincidence or not, but the same day that Ideogram announces 2.0 and that you can actually use it for free, Midjourney made this announcement: "The Midjourney web experience is now open to everyone. We've also temporarily turned on free trials to let you check it out." So Ideogram drops the day that I'm recording this, and Midjourney announces they're opening up free trials the day that I'm recording this. So now anyone can generate a handful of images inside of Midjourney.

Now, the free trial with Midjourney also comes with a caveat. If we check out their announcement in the Discord here, they say, "Hey everyone, today we're opening up the web image creation to everyone. To let people play with the new site, we've temporarily turned on free trials (roughly 25 images)." So you get a total of 25 images to test it, and I think that's total. I think once you've gone through your 25 images, you're done; you've got to pay for Midjourney to use it anymore. That's not per month, per day, anything like that. I think it's 25 images, and if you want to keep using Midjourney, you've got to pay. Unfortunately, Midjourney hasn't rolled out a ton of new amazing features lately. They did sort of update the way they do inpainting, and they've added some cool features over the last few months, but nothing super recently.

But of course I wanted to test the same prompts inside of Midjourney to see how they compare to what we got out of Ideogram. And don't worry, you don't need to rewind the video to see what I did on Ideogram; I'll show you the side-by-side comparison a little bit later in the video. But when I did "a close-up portrait of a weathered elderly fisherman with deep wrinkles, wearing a yellow raincoat and knit cap against a stormy sea background," it actually generated four pretty dang decent images. They're all pretty realistic. I almost feel like the wrinkles are almost too HD to feel, like, perfect, but they're really, really good images. Here's our "serene Japanese Zen garden at twilight with carefully raked sand patterns, moss-covered rocks, and cherry blossom trees in full bloom." It looks like
it got all of those elements. We've got the moss-covered rocks, the sand patterns, the cherry blossoms, twilight. It hit all the marks, so Midjourney is pretty much doing it just as well as Ideogram.

Text is where Midjourney kind of falls apart a little bit right now. So, "a mystical forest clearing where wisps of fog spell out the words 'Magic Awaits' in flowing ethereal lettering between ancient trees." Well, the trees and the fog look amazing, but none of this looks like "Magic Awaits" to me. This looks like "migy aists," and that's "Maggie and imp," and then "Maggie gigus," something like that.

And then finally, "a steampunk-inspired octopus riding a unicycle made of clockwork gears, juggling neon cubes while floating in a bubble tea sea." Yeah, Midjourney just kind of falls apart when you want it to adhere to a whole bunch of things in a single prompt. I don't see anything that says steampunk about these. None of these look like unicycles. We've got the octopus, we've got the cubes, but the rest kind of didn't make the cut for this generation.

But like I said, there's been a lot happening in the world of AI image generation, and, well, just talking about Ideogram and Midjourney is not really a lot. We got another new AI image generation model from the company Freepik. Now, this company Freepik, it's kind of similar to Canva, but they recently acquired the AI upscale platform Magnific, and this new Mystic model seems to be kind of the first real collaborative thing they've done together since the companies have combined forces. Now, personally, I'm unclear if this is a brand new foundation model that they created or if they, you know, fine-tuned or added their own sort of pipeline on top of a Stable Diffusion or a Flux or something like that. I'm not totally sure. I'm sort of operating under the impression that it's its own foundation model, but I can't conclusively say that right now. However, just like the other alternatives, the outputs are pretty dang good. Now, this one only generates one image at a time when
I give it a prompt, but here's "a close-up portrait of a weathered elderly fisherman with deep wrinkles, wearing a yellow raincoat and knit cap against a stormy sea background." It got it all, except the sea is not really super clear behind it. "A serene Japanese Zen garden at twilight with carefully raked sand patterns, moss-covered rocks, and cherry blossom trees in full bloom": it pretty much got all of that, except I'm not really seeing the sand rake pattern on this one. It also kind of struggled with the text. It's pretty close, but you can see that this doesn't really look like a word. It sort of blended the W and the A together on "Awaits." Not a horrible image by any means; it just sort of fell apart a little bit on the text. And then finally, here's our octopus. I'm not seeing anything about it that tells me this is a steampunk octopus. It did seem to get the clockwork gears a little bit. I mean, it's got a clock. I don't know if I'd really consider that gears, but it's got a clock, and it's got the neon cubes, and the bubble tea sea, I can buy it. These look like little bubble tea bubbles. I mean, they're pink, but I'll buy it.

Now, similar to Ideogram, Freepik has daily limits on AI image generation. I'm not actually sure what the limit is at this exact moment, because if we take a look at this Mystic model, you can see that it's still in alpha and it actually says "coming soon." They actually sent me early access so I can play around with this a little bit ahead of time and show off what it's capable of, but it's going to be rolling out really, really soon, and we'll know more about what sort of limits they're going to put on you once it's actually publicly rolled out.

And of course we still have the amazing model from Leonardo called Phoenix, which in my opinion generates some pretty dang amazing images as well. Here's what I got with my old-man-in-the-sea prompt down here. It seemed to have gotten everything I asked for in these prompts, and quite honestly I'm
pretty impressed with the images that come out of it. I love the contrast of what Leonardo Phoenix gives you. Here's our Zen garden with the moss-covered rocks and all that stuff from the prompt. It did it perfectly. It also did the text pretty well on three out of four, so we got "Magic Awaits" on this one, this one, and this one. They all look great. This one in the middle is a little funky with the text, but it actually does a pretty good job on the text, and once again I'm loving the colors. I just think Phoenix does really, really great colors. And then finally, our octopus here. I'm going to go ahead and pick this one and say it's probably looking the best. We've got our steampunk elements, we've got our clockwork gear unicycle, we've got it juggling cubes, and we've got our bubble tea. I mean, they're like bubbles. Close enough. Looks pretty good.

Now, I am an adviser to Leonardo, mostly because I think it's a fantastic product, but I'm always a little worried that I'm going to come off too overly biased towards them because I am an adviser. But I do genuinely think it is as good as any of the other models that are out there right now.

And of course you still have all the originals that are available out there. You've got Stable Diffusion, you've got DALL-E 3, you've got Adobe Firefly, we've now got Imagen 3 from Google. There are so many other image generators out there, and to be quite honest, they've all kind of caught up to each other. In fact, I made this board here inside of Figma that compares all of the models that I'm aware of and have access to at the moment to play with. You can see I put the four prompts across the left side here and then all of the various models across the top. So I've got Ideogram 2.0, Midjourney 6.1, Freepik Mystic, Leonardo's Phoenix, Flux.1, which I generated directly inside of Grok, and we've got DALL-E 3, Stable Diffusion 3, Firefly 3 from Adobe, Meta Emu, Imagen 3 from Google, and then we've got Playground v3 over here. I'll actually link up to this Figma board in the description below if you want to take a peek, because one thing that's really cool about Figma, and I've only recently started playing with Figma, is if I hold down Control and then zoom in, all of these images are actually full quality when we zoom in on them. So you can actually look at some of the more fine details of any one of these images that I've shown off by zooming in.

So you can see here, when we look at Ideogram, here's our "close-up portrait of a weathered elderly fisherman with deep wrinkles," etc., etc. Here's Ideogram; I'll zoom in on this one. Move over here, you can see Midjourney 6.1 compared to it. Slide over a little bit, here is the Freepik Mystic version of it. Here's the Leonardo Phoenix version of it, Flux.1 that I generated in Grok, here's the DALL-E 3 version that I generated, Stable Diffusion 3, Firefly 3, which kind of looks like Santa wearing a yellow jacket, honestly. Here's the Meta Emu version of it, the Imagen 3 version of it, and the Playground v3 version of it. All of them did a pretty dang good job, and I think what we're seeing here is that pretty much all of the models have kind of converged and are pretty much equally good at realism. If there's one that I'd say is a low point for realism, it's probably this one here with Firefly 3; it just looks a little off to me. And then this one here from Playground 3, it would be decent, but it made the eyes just a little too blue.

With our Zen garden, here's how Ideogram compares to Midjourney right here. Here's how it compares to Mystic. Again, you can see that it's sort of missing the patterns in the sand, but other than that it got pretty much everything. Here's the Leonardo Phoenix version, the Flux version, the DALL-E 3 version, our Stable Diffusion 3 version, the one that came out of
Firefly, and then we have Emu, Imagen, and Playground 3 for that prompt. So you can kind of see the comparison, and again, I will link this so you can zoom in and scroll around and look at it all.

Now, taking a peek at incorporating text once again, here's our Ideogram 2 model versus Midjourney, which doesn't quite do the text very well. Here's the Freepik Mystic version, Leonardo Phoenix, Flux.1, and DALL-E 3, which I'm really impressed with how well it's doing text these days. It wasn't doing text very well a couple of months ago. Here's Stable Diffusion 3. It kind of didn't get the text amazing, but for the most part it's pretty good. It just messed up the G; it kind of looks more like a backwards R. Here's Firefly. It just absolutely refuses to do text in the image. Didn't even try. Here's Meta's Emu, actually pretty good at text. Imagen 3, pretty good at text again. And then here's Playground v3. It actually did a decent job of the text. It actually kind of looks more like the text is made out of smoke, so pretty impressive on that one.

And then finally, when we're looking at our prompt adherence here on our weird, abstract steampunk octopus, Ideogram kind of crushed it. Here's Midjourney. We already talked about that one; it's kind of missing a few of the elements. Here's the version out of Freepik Mystic, and our version out of Leonardo Phoenix. Here's what Flux.1 generated. It kind of put human legs on it. It didn't do a unicycle; it kind of made it more like a bicycle. They're definitely not in a sea of boba tea, but it's got an octopus and some cubes, but no clockwork, no real steampunk. Yeah, Flux did not do great with the prompt adherence. Here we have DALL-E 3. DALL-E 3 also kind of crushed it. We've got the steampunk, we've got the clockwork unicycle, the cubes, the boba. It got it all in there. DALL-E 3, to this day, is still hands down the best at being prompt adherent. Here's Stable Diffusion 3. It got some of the elements. There's no steampunk here, the unicycle looks all jacked up, it doesn't look like a clockwork
unicycle. There's nothing to do with bubble tea or boba tea here, but it did get cubes and an octopus in there, and I actually do kind of like the colors of it. I think it did a good job with the color palette. Here's what Adobe Firefly generated. Quite honestly, better than I would have expected out of Firefly. We've got the steampunk, we've got a clockwork-looking unicycle, we've got some bubbles, we've got the octopus. There are no cubes, nothing that says this is, like, bubble tea, but it got a lot of the elements there. Pretty surprising, honestly. Here's Meta's Emu. It also put legs on it, on more of a bicycle- or motorcycle-looking thing, but it does have the steampunk aesthetic, the octopus, and the cubes. Here's Imagen 3: octopus, cubes, unicycle, steampunk, legs, boba. Most of the elements are there. And then finally, here's Playground v3. We've got a big old jug of boba here, we've got some cubes, we've got our octopus, we've got our unicycle, there are some gears here. I'm not seeing the steampunk aesthetic, but it got quite a few of the elements. Not my favorite sort of color palette and aesthetics, but it did a pretty decent job.

So, all this to say, pretty much all these image models are almost as good as each other in most of these areas. I do want to do some deeper-dive testing. You can see here I put "more prompts to come," because I want to think of more prompts to test different areas of these models, and then I'll keep on building this out and adding new images with new prompts to test different types of outputs that I want to try to get from these models and see how they do.

But if you're sitting there going, "I want to get into AI image generation, I don't know what model to use," hopefully this helps you sort of gauge what's better at what. I'd say DALL-E 3 is still the king at prompt adherence, so if you have, like, nine things in your prompt and you want to make sure they're all in the image, DALL-E 3 is going to crush that for you. If you want realism, they're all pretty much catching up to each other. I think Flux.1 is probably the king of realism at the moment. Midjourney is pretty good. I actually don't think this Midjourney image does Midjourney justice on realism. I've seen a lot better, more realistic images than what I got out of this specific prompt, but Midjourney is pretty dang good at realism. I mean, Ideogram's looking pretty dang good at realism. You really can't go wrong. Firefly 3 and Playground v3 are the two that I'm like, "Nah, I don't know about that."

If you want to get text in your image, Ideogram is definitely going to be the king of that one right now, but Phoenix does an amazing job, Flux seems to do a pretty good job, DALL-E 3 does a really good job, Imagen 3 is doing a pretty good job, and Playground v3 is doing a pretty good job. Most of the models are doing a pretty decent job at getting the text into the image. So maybe just avoid Firefly if you need to get text in the image, and maybe avoid Midjourney; Midjourney is still not doing amazing at that.

And if you're just looking at overall aesthetics, that's just a little too subjective for me to tell you what's the winner. We're all going to have different tastes on what we think the best aesthetics are in an image. Personally, I know I'm biased, but I love the aesthetics of Leonardo Phoenix, and I also think Midjourney typically does a really amazing job with the aesthetics and the color contrast and things like that. But again, I'm going to do more tests. I'll probably make a part two, add four more prompts to this, and test different areas that I didn't test in this video. I will link up this Figma board down below so you can get in, zoom in, look at the fine details, and see if there's anything that you don't like or do like about them, and maybe it'll help you make a decision on what the best image generator is for your needs.

Again, let's go over pricing real quick. Ideogram 2.0: free for 10 generations a day. Midjourney 6.1: 25 images total on the free trial. Mystic: not quite available yet. Leonardo Phoenix has a limited amount of free credits. Flux.1: if you are a Grok user, you can use it directly inside of Grok, but I also made a video about how to use it with Glif, where you can actually generate images with Glif, so check that out. DALL-E 3: you can generate 100 images a day for free using Bing's Image Creator. Stable Diffusion 3: I'm sure there are some free models out there. I think you can find it on Hugging Face and generate on Hugging Face. Firefly 3: I think you need to be an Adobe Creative Cloud member. I'm not 100% sure. I'm an Adobe Creative Cloud member, so I was able to do it within my Creative Cloud. I don't actually know the pricing to just use Firefly; it may just be free. Meta's Emu: you can use it for free directly inside of Instagram and WhatsApp and Facebook Messenger. Imagen 3: you can use it inside of Google's AI Test Kitchen for free right now. Playground v3: they give you a certain amount of credits per day to use it for free. Right now, the most expensive model you can possibly use is Midjourney, and Midjourney knows that, which is why they just opened up their free trial again.

And that's my long, probably overcomplicated breakdown of the current landscape of AI image generation. But the beautiful part about all of this is that as this competition heats up, us as the consumers, we're the ones who win. We have free options, we have super realistic options, we have options that'll do text, we have uncensored options, we have ultra prompt-adherent options, and pretty much everything in between. If you have an idea in your brain for an image, there is a tool out there right now that will be able to generate that image, and to me that's pretty cool.

Hopefully you enjoyed this video. If you did, give it a thumbs up, and maybe consider subscribing to this channel if you want to hear more about AI tools and news and the latest announcements and cool stuff in the AI world. I'd really, really appreciate it. Check out Future Tools.
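
The free-tier numbers quoted in the video reduce to simple arithmetic. Below is a minimal Python sketch, assuming the quotas still hold as stated (10 Ideogram credits per day at four images per credit, roughly 25 Midjourney trial images in total, and 100 images per day through Bing's Image Creator). The names `FREE_TIERS` and `free_images` are illustrative only and do not come from any of these services.

```python
# Free-tier quotas as quoted in the video; they are assumptions here
# and may have changed since the recording.
IDEOGRAM_CREDITS_PER_DAY = 10
IDEOGRAM_IMAGES_PER_CREDIT = 4

FREE_TIERS = {
    # "day" quotas renew daily; "total" quotas are a one-time allowance.
    "Ideogram 2.0": {
        "images": IDEOGRAM_CREDITS_PER_DAY * IDEOGRAM_IMAGES_PER_CREDIT,
        "period": "day",
    },
    "Midjourney 6.1": {"images": 25, "period": "total"},
    "DALL-E 3 (Bing Image Creator)": {"images": 100, "period": "day"},
}


def free_images(model: str, days: int) -> int:
    """Return how many free images a model allows over `days` days."""
    tier = FREE_TIERS[model]
    if tier["period"] == "day":
        return tier["images"] * days
    # A one-time trial does not grow with the number of days.
    return tier["images"]


if __name__ == "__main__":
    for name in FREE_TIERS:
        print(f"{name}: {free_images(name, 7)} free images over one week")
```

Over a week, a daily quota adds up quickly, while a one-time trial stays fixed, which is the difference the pricing recap is pointing at.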