You are hearing it here first. We now have the power of iterative chat-based image editing, like we have with Gemini and GPT, but now with the power and quality of the Flux Pro model.
Welcome back to the channel where we discuss the creative uses of AI. I am super excited to be among the first to show you this new Flux model, which lets you create images from scratch or iterate changes on existing images with chat-based commands, while leaving anything you don't want changed completely intact. And the secret is in good prompting.
Starting today, this new Flux Kontext Pro model is available over at OpenArt, and you can use it in a couple of ways. You can certainly use it just like you would any other model: click on Create Image, choose the Flux Kontext model from your list of models, and just give it a prompt.
Close-up of an old man playing the ukulele on a park bench in New York City. For now, I'm going to skip the Omni Reference; I'll come back to that. Then we'll choose an output size.
I'll go with cinema and click on Create, and we get a couple of outputs. So now we've got our Flux-generated images of this old man playing the ukulele, and it is great. Everything's coherent: right number of fingers, nice quality.
It's good. So, if we just want to make one change to this, and not do a whole series of changes, we can do that right here in the normal Create interface by using the Omni Reference feature. This allows you to drag in any image and start doing iterative chat-based editing to it using this prompt area.
In this case, instead of bringing in an image from the outside, we're going to refer to our history by clicking on History here, choosing this guy, and clicking on Confirm. Once this image is in the Omni Reference area, anything I type here will act like a prompt to change it. Now, the key to getting the results you want, which is basically leaving the original image, or anything you don't want changed, completely alone, maintaining facial integrity and all of that, is a very specific prompt.
You don't want to use generalized prompts, because that gives the model way too much flexibility to make changes. That doesn't mean simple prompts never work, so if you really want to try one, we can do it.
And we'll see how much it affects the overall image. So, let's just change this guy's cap to, let's say, a Chicago Cubs cap. And we'll use a very simple prompt.
We'll just say: change the cap to a Chicago Cubs cap. I won't mess with auto-enhance or anything like that, and we'll click on Create.
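The prompting pattern demonstrated throughout this video, state one change, put any literal text in quotes, and explicitly say what must stay the same, can be captured in a tiny helper. This is just an illustrative sketch of the pattern; the `scoped_edit_prompt` function is hypothetical, not part of OpenArt or Flux:

```python
def scoped_edit_prompt(change, literal_text=None, keep="everything else"):
    """Build a tightly scoped instruction for chat-based image editing.

    Hypothetical helper illustrating the prompting pattern from the demo:
    one specific change, quoted literal text, and an explicit statement
    of what must stay untouched.
    """
    prompt = change.rstrip(".")
    if literal_text:
        # Put any literal text (initials, labels) in quotes so the model
        # renders the exact characters instead of interpreting them.
        prompt += f' with the text "{literal_text}"'
    # Explicitly state what must stay the same to limit the edit's scope.
    prompt += f", leaving {keep} in the image exactly the same."
    return prompt

print(scoped_edit_prompt("Change the cap to a Chicago Cubs cap"))
```

A later edit in this video ("leaving everything else except the hand position the same") is the same pattern with a narrower `keep` argument.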
If you've used the Flux models on OpenArt before, you know that they're extremely fast, and this is no different. We get these in just a few seconds.
And now you can see that we have the man in the same position, and he is wearing the cap. It actually did a pretty good job. If we blow these up and go from image to image, it absolutely didn't change anything else.
And that was with a simple prompt. If we ever get into a situation during this demonstration where things are changing a little, we can be more specific with the prompt. But if you've used any of these other chat-based image solutions before, you know that you generally get some kind of change.
Every time you iterate, the face changes a little more, or you lose some details. But here, this acts like inpainting: nothing changes.
Not one pixel other than the cap. And that is pretty amazing. Let's change the ukulele into a mandolin.
But we want to use one of these images with the Chicago Cubs cap. At this point, if I try to change this and go to the history, it doesn't show what we just did. To be able to access that, I just need to refresh the page.
So then when I click on Omni Reference and click here, I can click on history. And now my changes with the Chicago Cubs cap are there. So I can click that and click on confirm.
And now I can say: change the ukulele to a cherry red starburst mandolin, and click on Create. We got a couple of things wrong here. Number one, that's a guitar, not a mandolin.
But you do see now that the image has changed. We zoomed out a little bit. We got more information, but you'll see that the general position of the man is the same.
The clothing is the same; there's all kinds of consistency here. It just adjusted the aspect ratio a little without my permission,
and it didn't actually do a mandolin; it did a guitar with four strings. Now, if we wanted to keep iterating on this, it would be very cumbersome to keep reloading the page, or going back to history, and updating this Omni Reference every time. Luckily, we don't have to, because, as I showed you the other week, they've added a chat interface that lets you interact through text prompts with these various models.
And they have added the Flux Kontext model. If you'll remember, they had the Gemini model and three different versions of the GPT model. As good as they were, the more you iterated, the more the faces changed.
And sometimes the faces were completely different, no matter what model you used. Here, we don't have that problem, and with this interface it's much, much easier to do iterative changes.
Let's start with a new image, and we'll go to History. We'll choose our guy with the Chicago Cubs cap again. And now, with the Flux Kontext model chosen, anything we type in here will be our prompt, if you will, the instructions for change.
So I'm going to say: it is heavily snowing in the park, and everything in the image is slowly being covered with snow. There you go. Man, that is good.
That is so impressive. We can compare just by clicking back and forth here, and you will see it left everything exactly how it was.
That's got to be blowing your mind. Let's try to add something to the scene with a prompt: put a garden gnome next to the man, leaving everything else the same.
Okay, it zoomed out a little bit to give us room for the garden gnome, but the garden gnome has a light dusting of snow. This man is still in the same position, and he still looks exactly the same. So, let's start with our new image.
I'll use an old picture of me, because it'll be easier for me to track the changes and notice if anything weird happens to the face. I've got this picture, not originally AI-generated, if you can believe it. We're out in the beautiful mountains of Nevada.
So, let's say: change the shirt into a blue and purple tank top with the initials "BDM" on it. It's definitely best practice to put any literal text in quotes. We'll see if it zooms out or whatever, but I'll say: leaving all other details the same.
Although I'm editing out the generation time, just know it's less than 10 seconds every time. So again, we've got no change except for that one edit. It's like we selected it with an inpainting tool.
It is absolutely stunning. How about we go to this one: change the glasses to sunglasses with gradient blue and purple lenses.
Really pushing it here. Oh my god. Oh, I just realized it says PDM instead of BDM, but we can fix that.
But oh man, oh man, that's good. You can see pixels did change here, but the mountain is the same. The clouds are still the same shape.
Everything's the same. It's just shifted a little bit to make room for the glasses, but none of the identifying characteristics of my face have changed. Unbelievable.
Let's see if I can change that P into a B: change the "P" on his shirt into a "B". I should probably put this in quotes, too.
And I'll say: leaving everything else the same. That's for Bob Doyle Media, in case that wasn't clicking. Look at that.
Oh my god, it's so good. So good. This tool is also great for restyling images.
Now, of course, we have the Character Lab here, which lets you easily change your picture into any of these styles, but we can also be very, very specific with a prompt. So let's just say: change to a watercolor style. We'll start with something simple like that.
I might have wanted to use the word convert. Nope, that's all right; that worked really well.
So, we got a great watercolor style there. Now, rather than iterating on this one and trying a different style, I'm going to go back to the original and we'll try another style. Let's see.
Convert to Salvador Dalí style. No idea. Now, I'm going to try something, because I talk all the time about how the power of this iterative chat is that you can make specific changes between two images, which makes it perfect for orchestrating very precise start and end frame animations.
So, let's change this one so that he's making the okay sign with his fingers: the man in the tank top is making the okay sign with the fingers of his right hand, leaving everything else in the image the same. Perfect.
Okay. So now I'm going to save these two images out, just by clicking Download here; JPEG is fine. And clicking Download here for the second; JPEG is fine.
Then I'm going to jump over here real quick to Videos, and we'll go with the Kling 1.6 model. It has the start and stop frames. I'm just going to drag this one in as the start frame.
And then click over here to End Frame (optional) and upload the second one. Pretty simple prompt here.
I probably don't even need one, but I will say: man makes okay sign with his hands. And maybe we can put some activity in the background, because with a start and end frame, everything in the background is going to be exactly the same. So maybe I can say something like: as Bigfoot runs by in the background.
We'll see if we can add some motion that way. I'm going to go with 10 seconds to give Bigfoot plenty of time to do that thing, although this is going to be pretty slow. Okay, I don't know what's going to happen.
Let's just click Create. We do get a very Salvador Dalí type of Bigfoot there, and he runs not in the background, but all over him.
It's an amusing take on this, for sure. He kind of waits till the very last second to put his hands up there, but it does follow the prompt. I'm sure with a few more generations, or playing with the seed, I could get something a little different.
Now, if you've created any characters in the OpenArt platform, you can access them directly from here and then chat your way into whatever reality you want to create for them. Let's choose this little one here, Fatudi.
And I'll say: "Put this character on a bar stool in a steampunk bar." Okay, awesome. So it's actually basically the same position as our original image, because we didn't tell it to do anything other than that.
We didn't have to remove a background. We didn't have to do any inpainting. We didn't have to define anything.
It just works. So, let's just play with some changes here. Let's say add holiday lighting in the background and have the character holding a glass of wine.
So that dramatically changed it, right? This is a good example of why we need to be specific if we don't want the background to change. I just wanted to add some Christmas lights hanging on the existing bar in the back, but what we got was a completely different room.
The first part came out well; the up-front stuff is great. But let's be more specific with this prompt and say: add some holiday lighting to the existing bar in the background, not changing any of the other background details, and have the character holding a glass of wine.
This is a little weird. We grew an extra arm, and that's not really the kind of holiday lighting I wanted. Let's be a little more specific.
I'm going to just copy this, because I already typed all that, and say: add red and green Christmas lighting to the existing background, not changing any other details, and have the character holding a glass of wine in his left hand.
Normally when I do left and right, it translates to my left and right. So, we'll see what it does here. Okay.
Again, we drew an extra arm. And I would say it did a better job of leaving the background alone. You see the arch back there.
The lighting is the same. We've got shelves. It's just a little bit changed.
It's nothing drastic; I think it's acceptable. This is the part that's not so acceptable.
We did get this one generation with him holding it, but I think saying "leaving everything else the same" is why the hands are being so persistent here. So I might even say: leaving everything else except the hand position the same. But let's do another example with the characters, maybe something a little more photorealistic.
We'll use the Tracy character here, and we'll just say: woman in a white flowy dress is sitting on the edge of a cliff, legs dangling over the edge. Let's see what we get with that, and we'll go from there. All right, so it definitely maintained the face and all the other details.
It expanded out the body. Pretty much exactly what I was looking for, without changing anything else. Add a blue and purple belt to her outfit.
Okay, that changed a little more than I wanted. So let's go back, and I'm going to say: leaving the original composition details the same, just add a blue and purple belt to the outfit. The face details should not change.
I'm probably overstating that, but I just didn't like how much it changed just then. So, let's see what we get. Okay, that's a little better.
I feel like the face is changing a good bit right there. But overall the consistency, especially with the shape of the mountains and the cliff and everything, does stay the same. Change her outfit to a red tank top.
It is pouring down rain. I didn't say to leave everything the same; I kind of want to see what it's going to do if you just make it rain on her.
Will she change her position at all? What will she do? Nope, she didn't really change her position.
But look, everything's all nice and wet. It's raining. Even her face is wet.
So, that worked out really well. How about convert to modeling clay? Not very many details there.
We'll see what this does. Oh, wow. That's great.
Let's create an image that we bring to life with image-to-video using the Kling 2.0 model. For this one, we won't do start and stop frames; let's create an image from scratch, because I'd like it to be cinematic.
So, I'm just going to create an image using a model that I have made, so I have total flexibility here. I'm going to choose the model of Harper the dog,
and I'll say: dog is wearing a clown suit, sitting in a subway. I'm going to scroll on down, and we're going to get the cinematic aspect ratio.
I'll bump up the weight a little on this and click on Create. I'm just doing this to underline the probably obvious fact that you don't have to use the Flux Kontext model to create your original image. It'll work with any image you have.
But if you've got a lot of models set up and you create a lot of images here in the platform, it's great, because you're really ready to go. Let's go back over here to Chat. Make sure Flux Kontext is selected, and we'll add a new image; we'll go to History, and our image of Harper in a subway will be there.
So again, I'll say: leaving all position details the same, add Bigfoot sitting on a seat in the background reading a newspaper. Again, we got just the slightest bit of zoom, but that's the only pixel change there is, because all the details of the original image are preserved.
And that's exactly what I was looking for. So, let's just use this to generate an animation. We'll go over here to Image to Video; we can just send it right there.
We'll click the Kling 2.0 model. Oh, wait a minute. Stop the presses, ladies and gentlemen: I just realized they've added the Kling 2.1 models.
I didn't even know they'd done this yet. Let's just go with the Master and try it. This is exciting.
So, let's say: as the dog is looking around casually during a subway ride, Bigfoot is flipping through the pages of a book. I guess I'll just leave it at 5 seconds here and click on Create. Ooh, I am really curious.
Just for fun, let's also do the Kling 2.1 that isn't the Master. So now we have two different levels here.
We've got Standard and Pro. Let's just see what Standard does at 5 seconds so we can compare. And if it's been a while since you've been to OpenArt and you didn't know they had all these models, let's take a look at what they've got while these are rendering.
The PixVerse models are fantastic. You can look back on previous videos where I have used them here on the platform, and the results are amazing. We're all familiar with Veo 2.
Of course, Veo 3 is sort of eclipsing that these days. So, we have Kling 2.0, but now we also have 2.1 Master, which I hadn't even heard of; Kling 2.1 just released yesterday or the day before. And then there are two different Hailuo models, Wan 2.1, an earlier version of PixVerse, and Vidu. These are all high-quality models, and they all have their strengths.
And now, with the flexibility to really create custom images through iterative chat prompting, you have more control than ever if you're an AI filmmaker wanting to truly direct your production. Here's the 2.1, not the Master version.
It's 5 seconds, but the dog is looking around, people are looking around, and he's reading a book.
This is exactly what I was hoping for and expecting. So, I certainly don't have any complaints. It'll be interesting to see what the master version brings to the table because that was already very good.
While that's working, let's play with these styles a little bit. We'll take this image and say: convert to anime. I just want to play with a few style conversions.
Oh, that's fun. Uh, that's good. And it's so fast.
Let me do another one: convert to a thickly layered oil painting, obvious brush strokes, rich colors, wet paint. Oh, wait. That's not the one with Bigfoot.
Copy. Oh, that's nice. I think I want a thicker brush stroke.
So, let's go back to the one with Bigfoot here. I'll paste this in. I'll just reiterate thick brush strokes.
Shiny. I don't know. Even though the brush strokes aren't as thick as I had in my mind, you can see that they are there.
And I suppose if you made them much thicker, you wouldn't be able to get the detail out of the image. I am curious what this would look like animated. And I'm curious enough that we're going to find out.
So, I'm going to click on Image to Video. We'll just use the regular Kling 2.1 for this one.
This time, we'll go Pro, and we'll go 10 seconds. And should I add some other activity while the overhead lights flicker, all in a painting style?
Here's the Master version of our original prompt. It's hard to tell if it's much different; it looks like Bigfoot actually comes into more focus.
Like there's a rack focus there. That's exactly what happens: we shift attention from the dog to Bigfoot.
And it just did that on its own. Perhaps it's because that's the order things are prompted: first we see the dog, and then Bigfoot reading in the background.
But that's the main difference between the two models, because if we go back, this is the non-Master version. Still basically the same idea. We've even got an extra person walking in the background, but we don't have that rack focus.
It stays focused here. So I guess it's just a preference, but both of them are excellent results. Let's go back and just for fun add a unicorn to the back.
Leaving the current position exactly how it is, add a multicolored unicorn in the back of the subway car. There's our unicorn.
Okay, here's the animated version of the painting. I like it. I like it.
It's weird and awesome, and the lights are flashing just like I said. Now, I have yet to do a deep dive on the Kling 2.1 model, so I can't tell you exactly all that it's supposed to be, except that I do know that prompt adherence is especially good on it. And I have seen examples of it, as have you just now. So, can we just try to convert this to the Simpsons?
I mean, I could use the Character Lab to do it, but I want to see what Flux Kontext is going to do. So, I'm going to say: convert to Simpsons style, and see if it knows what that means. It definitely knows what it means.
I just… This is so good. And again, just for fun, let's compare it with the Character Lab version of the Simpsons filter, which is right here. We'll apply it.
And remember, it's using the GPT media model, which takes a lot longer. I'm going to go ahead and send this over to animate.
While we're waiting, I'll just leave the same prompt there. I'll go down to Standard at 5 seconds, but still use the Kling 2.1 Master for this.
And click create. And it even applied a more animated, lower frame rate animation style to it. I'm just all over the place.
It's just too fun. Oh my gosh. Okay, so that's also a good one.
So, let's compare. We've got this one that GPT just did, and that one that Kontext did.
What do you think? In this one, Bigfoot looks more like Homer; in this one, Bigfoot looks more like Drederick Tatum.
The drawing here is a little more crude, but probably truer to the style, while this one has finer details in the drawing. This one actually seems more like a tribute artist,
and this one actually seems more like the real thing. What do you think? I'd like to hear in the comments.
So, speaking of style transfer, we don't just have to apply a new style onto an existing image; we can actually use the style of an existing image to inform the style of a brand-new image. What are you talking about, Bob?
Let me show you. Right here we have an image that is clearly in a sort of claymation style. Let's say: using this style, a pig and a dog are playing chess.
So, although it's a little furrier than clay would normally be, you can see what it's doing there. Let's use a different, even stronger style here. We'll say: using this style, a bear is sitting in the woods.
Using this style, a man is riding a subway. Okay, it used too much of that style. Let's say: using this style, a man with a mustache and a large hat is riding in a subway.
A large hat. I don't know. There we go.
That's what I'm talking about. In all of my experimenting with chat-based editing, nothing has come anywhere near as close to the precise, inpainting-like results that you get with the Flux Kontext model. And I've just started playing with it this morning; I cannot wait to see what comes out of my brain with this as time goes by.
As of this recording, I honestly don't know where else you can find the Flux Kontext Pro model, but it is available over at OpenArt as we speak. If these are the types of cutting-edge image and video generation technologies you like to stay in the loop on, why not go ahead and click that subscribe button, because this is the type of stuff we get into all the time. If you subscribe now, I will not look for you.
I will not pursue you. But if you do not, I will look for you. I will find you.