Runway's GAME CHANGER for AI Video Is Here!

Theoretically Media
Video Transcript:
Well, it is here. The solution to one of AI video's biggest problems, namely character and object consistency within a scene, has arrived on Runway. So today we're going to take a look at References to see where it shines, what we'll be waiting on an update for, and some tips and tricks to get you started.
Get ready for takeoff. We are taxiing down the runway. Runway has been on a bit of a tear recently.
In the last few months, they have dropped Frames, which is, of course, their own AI image generator. Plus, we got the brand new Gen 4 video model and the Turbo version of it as well. Teased fairly early into Gen 4's release was the new References feature, which allows you to upload an image of a character or an object and have it placed within your video output.
Interestingly, References was pre-released to those who participated in Gen 48, Runway's 48-hour AI film festival. And as a quick shout-out to all of you who did participate in Gen 48, I have caught a number of the short films. Stellar work, everyone.
I very much admire all of your creativity, and especially the hustle. But now that Gen 48 has concluded, we're moving into a more typical Runway release pattern. So if you don't have access to References just quite yet, don't stress.
It'll probably appear within the next few days. So let's take a look at References, starting fairly simply and then working our way up the complexity ladder. At its core, what References is doing is using Runway's Frames, plus your reference images, to generate a first-frame image.
To be honest, it's actually kind of what Midjourney has been promising with Omni Reference, although obviously Runway beat them to the punch. I did do a full breakdown on Frames when it first launched.
That'll be linked down below. I do find it to be a pretty good image generator, although it does have some limits, and we'll talk about those in a little bit. But for now, let's go check in with our man in the blue business suit, who we haven't seen in a while.
Last time, he had just entered an old abandoned mining town. Because we are utilizing References, I of course generated an image of our man in the blue business suit, who actually came out looking a little bit like, I don't know, the movie-star version of Caleb Ward from Curious Refuge. Love you, Caleb.
Now, to use our man in the blue business suit in References, you might be tempted to just drag him directly in. The problem is that if you do so, you are just working with image-to-video. So you do not want to do this.
Instead, what you'll want to do is hit the Gen 4 References button here, which brings us over to where we formerly controlled styles. We are under the image tab here; click over to References, then just hit the plus button, or optionally, drag your image in.
Once our character is loaded up, we can then simply prompt: man in a blue business suit from, and then you actually have to call the reference out, so I'm going to put @image1 (Roger and Blaine we'll take a look at in a little bit), stands in an abandoned mineshaft holding a flashlight.
We'll hit generate and see what we get. And Frames ends up giving us these four images. Not too shabby.
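Quick aside for the scripters out there: Runway also has a developer API, and this step would look roughly like the sketch below. To be clear, the endpoint, field names, and model ID here are my assumptions modeled on the UI flow rather than confirmed API, so check Runway's current docs before leaning on this; the @image1 call-out syntax is the one piece taken straight from the walkthrough.

import requests

API_KEY = "YOUR_RUNWAY_API_KEY"  # assumed bearer-token auth

payload = {
    "model": "gen4_image",  # assumed model ID for the Frames/References image step
    "promptText": "man in a blue business suit from @image1 "
                  "stands in an abandoned mineshaft holding a flashlight",
    "referenceImages": [
        # each reference image gets a tag you can call out in the prompt
        {"uri": "https://example.com/blue-suit-man.png", "tag": "image1"},
    ],
}

resp = requests.post(
    "https://api.dev.runwayml.com/v1/text_to_image",  # assumed endpoint
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # a task ID you would poll until the generated frames are ready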
The character stays very consistent. I have found that, for the most part, this guy probably looks the most like our initial image input, but I will say that you do generally end up with a few generations that look very similar, but not quite the character. So, taking the image that I liked and issuing the prompt, man walks forward exploring the cave.
We end up with this as an output, which looks pretty solid. Yes, there is clearly some kind of water main leak behind him, but overall the character remains consistent as he walks along.
I do have to point out that the flashlight kind of turns into a DSLR at one point. Yeah, kind of. It's like he's holding a camera here.
But overall, pretty decent. Now, for channel lore historians, you may remember that our man in the blue business suit was traveling with a wolf when we last saw him. And I have not forgotten about our good boy.
So I did generate an image of a wolf. Bringing in Balto, we can now issue the prompt: a man in a blue business suit from image one and a wolf from image two stand in an abandoned mineshaft.
Man is holding a flashlight. We end up with these as our set of images, which look pretty good. From here, we can simply hit the use button down here.
That immediately repopulates it into the video window. Then we can issue the prompt, the man and the wolf walk forward exploring the cave, and see what we get.
And we end up with this as a video output, which looks pretty good. I will note something that I see with Gen 4 a lot: a kind of inertness in the first two or three seconds of the generation.
But I've got to say, overall it is what we asked for. There's no stutter step with our man in the blue business suit. A little bit with the wolf, but again, he might have something stuck in his paw.
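Scripted, the second half of that pipeline, animating the reference-consistent frame you picked, would look something like this. Same caveat as above: the endpoint and field names are assumptions patterned on the image call, so treat this as illustrative.

import requests

API_KEY = "YOUR_RUNWAY_API_KEY"

payload = {
    "model": "gen4",  # the walkthrough uses plain Gen 4 here, not Turbo
    "promptImage": "https://example.com/chosen-frame.png",  # the frame you hit 'use' on
    "promptText": "the man and the wolf walk forward exploring the cave",
}

resp = requests.post(
    "https://api.dev.runwayml.com/v1/image_to_video",  # assumed endpoint
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # task ID; poll until the clip renders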
The other interesting thing you can do with References is to utilize styles alongside them. Just hit this tab next to References and you'll have access to all of the Runway presets, and of course, you can utilize your own as well.
I did go over that in my Frames video. Using my Peak Lynch preset and the same prompt, we ended up with these as an output. This one I ended up really liking a lot, and it resulted in this as a video output, which to me is a little more on the compelling side.
I will grant you that it comes off a little bit more like CSI K9, but still, it is a cooler video. That said, you might end up breaking things if you go too far off the beaten path, like when I used the dark anime style preset here and ended up with these images, which, I mean, they're not great. They're interesting.
They're not great, but I do think you can get some pretty cool results if you experiment a little bit, as I did here; this was the vivid preset. Moving on, because, well, you probably have aspirations to do something that isn't a Jack London novel adaptation.
Has anybody been watching MobLand? That's the UK crime drama with Tom Hardy and Pierce Brosnan. It's pretty good.
Anyhow, that got me into a real Guy Ritchie kind of mood. So I generated these two units, named them Blaine and Roger, and got to work on a standoff in a warehouse.
Now, here is where we start running into some problems, because our outputs were not very good, nor were they very consistent. In fact, here is a good example where Roger's head is switched with Blaine's head. Here's another one with a beard on the wrong character, and Roger doesn't even look like Roger.
Given that we were getting such wonky outputs, including this one where Blaine looks downright possessed, holding a demon claw of some kind, it got me thinking that I was overloading the References feature. My overall theory is that because these two characters are so similar in archetype, the model was essentially getting confused and blending them together. To test that theory, I generated this character and began placing her in the exact same scene with our character Blaine.
And immediately, things started to get better. Now, I do feel that this will be patched in an update. But for now, I definitely notice that there is a good amount of attribute bleed between characters if they are not different enough.
But I think the important part here is that you can generate a master shot; from there, it's very easy to generate an over-the-shoulder shot, and then continue on with a reverse-angle over-the-shoulder shot.
You can put in a little camera movement just to make it a little more cinematic. And of course, we've got to end it with someone getting shot. Now, I will say that up until this point, I have been allowing Gen 4 to generate locations for us, and I think it did a pretty good job in the warehouse scene. But if you have a very specific location in mind, like this one I just generated really quickly, we can put our two characters here as well, simply by utilizing a prompt like: cinematic crime film, two shot of Blaine and image 2 (I never did name her) on the sidewalk from image three, gritty crime drama aesthetic.
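If you were scripting this shot, the only real change from the earlier sketch is stacking three entries in the reference list, again under the same assumed field names:

payload = {
    "model": "gen4_image",  # assumed, as before
    "promptText": (
        "cinematic crime film, two shot of @image1 and @image2 "
        "on the sidewalk from @image3, gritty crime drama aesthetic"
    ),
    "referenceImages": [
        {"uri": "https://example.com/blaine.png",   "tag": "image1"},
        {"uri": "https://example.com/woman.png",    "tag": "image2"},
        {"uri": "https://example.com/sidewalk.png", "tag": "image3"},  # the location plate
    ],
}
# POST this to the same assumed text_to_image endpoint as in the earlier sketch.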
Ultimately, I landed on this image of our two characters, clearly somewhere out in East LA, probably walking towards Dodger Stadium for a playoff game, and they are Red Sox fans. The video output, just one shot, ended up looking pretty decent in my opinion.
It definitely has that "it's going down for real" kind of vibe. Everything about this I think is pretty solid. Again, we do run into that thing where, in the first few seconds of the Gen 4 output, our characters are just kind of standing there for a split second, almost like they're waiting for action to be called.
The other thing I wanted to point out is that all of the generations we've seen up until this point are from the straight Gen 4 model. I don't really use the Turbo model very often because, well, you end up with results like these, where things just get super weird, like old-school AI video. Turbo is faster, but it's so much weirder.
The other thing I wanted to test was whether there was any kind of interaction between Frames and the References feature. Would it treat characters generated in Frames any differently than characters generated externally? So, in Frames, I generated a quick test of two characters in kind of a cyberpunky samurai style, as well as a background.
Then, utilizing them, I put together a quick master shot as well as two over-the-shoulder shots, which ultimately got us this quick little sequence. And while I'm not over the moon about how it looks stylistically, I will say that, at least from a cinematic vocabulary standpoint, everything holds together essentially as a traditional scene. By the way, one quick trick that I pulled here: in our initial shot, we have her raise her hand.
Then, in the third follow-up shot, all I did was reverse that so that she's lowering her hand. That's a way of getting two shots for the price of one; a quick sketch of that below.
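Incidentally, that reverse trick doesn't even need an AI tool. If you have ffmpeg installed, a few lines of Python will flip any rendered clip; the filenames here are placeholders, and note that ffmpeg's reverse filter buffers the whole clip in memory, so keep it to short shots like these.

import subprocess

# Reverse a generated shot so a raised hand becomes a lowered one:
# two shots for the price of one.
subprocess.run(
    [
        "ffmpeg",
        "-i", "shot_raise_hand.mp4",  # the original generation
        "-vf", "reverse",             # reverse the video stream
        "-af", "areverse",            # reverse the audio to match; drop this if the clip is silent
        "shot_lower_hand.mp4",        # the "free" second shot
    ],
    check=True,
)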
Rounding out my tests with another experiment, and a sneaky little trick as well: I was curious to see how References would handle character sheets, meaning multiple images of the same character. In the past, we have seen these types of features get confused by them and give us multiple versions of that same character. But no, actually, References seems to handle it just fine.
So, here's our bootleg Lara Croft character. The prompt: character in image one, confident adventurer, jungle and some ruins behind her.
And yeah, we get everything that we asked for here, and the video output stays stylistically consistent. This is just the first roll, so I wouldn't necessarily have taken this one, especially with that plant coming into frame as it does.
But yeah, overall it looks really good. Now, the sneaky trick you can pull is to take one of your images, bring it in as an image reference, and then prompt something like: wide aerial drone shot high above a dense tropical jungle in image one, where image one is our character shot here. You end up with a result that is stylistically consistent with that initial shot yet contains none of its elements. So yeah, this is a great way of getting establishing shots.
Ultimately, you can bring the whole thing together. This is an alternate version of that initial first shot. From there, I added in a new character.
Reverse angle here, and there you go. We've got the beginning of something.
Moving on to some community outputs. World-renowned stuntman Dave Clark gives us a slice-of-life vignette of a character he's calling Arty, clearly on vacation from all of those pesky humans.
Thankfully, we have all gone extinct, so Arty can pop on his aloha shirt, and basically every day is Aloha Friday for Arty. Now, Tom Likes Robots gives us an expansion of this Van Gogh sort of tilt-shift thing that I featured a few videos ago.
I really love this. I think this is such a great use case for AI video. And now, by using References, Tom is able to take this idea and expand upon it to a full 35 seconds here.
Yeah, this stuff looks so good, man. And rounding out, Andy McNamara gives us this kind of jaw-dropping shot featuring four characters. This was a shot that Andy utilized for Gen 48.
What's interesting is that since References only allows for three reference images, I'm not sure if he actually managed to prompt in that fourth character, but either way, that's pretty cool. So overall, did Runway cook with References? Yeah, Runway definitely cooked with References.
Is it perfect? As always, no, it is not perfect. But as the old saying goes, this is the worst it will ever be.
And while, yes, you will still have to make concessions, and you'll still have to troubleshoot and figure out ways around problems, that's also true of traditional filmmaking. The fact is, we now have the thing we have been asking for. So go out and make something. In the meantime, I thank you for watching.
My name is Tim.