Making an AI Film in 10 Seconds vs 10 Hours


CyberJungle

🎬 Making an AI Film in 10 Seconds vs 10 Hours – Can Speed Beat Quality? 🤖🎥 Links: Kling 2.0: kl...

Video Transcript:

In this video, I will create two AI films for the same story: one produced in just 10 seconds and another crafted in 10 hours. I will show you my step-by-step workflow within these time limits to reach optimal results. And you will see how the same story gets richer, with consistent characters, and progresses visually from the shorter time period to the longer one.

In the process of taking on this challenge, I tested many different things and accumulated a lot of practical ideas that I will share with you now, so you can save a lot of time and credits. You will see how I build consistent characters and a coherent cinematic world at top speed, combining Midjourney version 7, Runway References, Kling version 2, and Higgsfield AI.

We'll dive into camera movement, shot types, and making the most out of Kling's powerful image-to-video engine. Along the way, we will also compare the consistency results of Midjourney Omni Reference and Runway References. Let's get started.

First, the AI film produced in 10 seconds, starting with my game plan for this challenge. 10 seconds is such a short time that it's not possible to use an image-to-video workflow.

Instead, I will rely on text-to-video. When it comes to text-to-video performance, I think Google's Veo 2 is still the best option on the market. I will use Veo 2 on the Freepik platform. Freepik is a fantastic platform where you can find all the state-of-the-art AI video models together.

I will prepare my text-to-video prompts in advance for the critical sequences of my storyline, and in 10 seconds I will run them all together. For text-to-speech, I will use ElevenLabs with a pre-selected voice and pre-written text to create the voiceover that tells the story. Note that this first version prepared in 10 seconds won't be super amazing, but it's still a fun reference to compare against the version prepared in 10 hours.
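The idea of preparing every prompt in advance and then firing them all at once can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `submit_generation` is a hypothetical stub that does not call any real platform, and the prompts are placeholders rather than the ones used in the video.

```python
from concurrent.futures import ThreadPoolExecutor


def submit_generation(prompt: str) -> dict:
    """Hypothetical stand-in for one text-to-video request.

    A real version would call whichever platform you use; this stub only
    echoes the prompt back so the sketch runs offline.
    """
    return {"prompt": prompt, "status": "queued"}


def submit_all(prompts: list[str], max_workers: int = 8) -> list[dict]:
    """Submit every prepared prompt at once and keep the original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit_generation, prompts))


# Placeholder prompts; the real ones are written before the clock starts.
prepared_prompts = [
    "Photorealistic. A king's death is announced at dawn.",
    "Photorealistic. Two couriers gallop in opposite directions.",
    "Photorealistic. A princess rides toward the capital.",
]

jobs = submit_all(prepared_prompts)
for job in jobs:
    print(job["status"], "-", job["prompt"])
```

Because `pool.map` returns results in input order, the generated clips can be matched back to their scenes without extra bookkeeping.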

I started with a simple synopsis for my story, plus some physical descriptions of my characters that I decided would fit best. From this starting point I jumped into ChatGPT and wrote: "You will generate text-to-video prompts for six or seven major events happening in the storyline below." I specifically asked the model to mention "photorealistic" in every prompt. And to create some character consistency, I wanted the model to mention the physical properties of every character, depending on who is in the scene.

This is important because achieving character consistency via text-to-video is pretty difficult, and you need to mention the same character properties in every prompt to ensure some consistency throughout your scenes. After this point, I wrote some scenes that I thought would fit best, but you can actually automate this part and ask ChatGPT to write these scenes for you. I attached the physical properties of my character Asima, and then ChatGPT prepared some text-to-video prompts for me to use directly on Veo 2.
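The rule described above (repeat the same physical description in every prompt, and always say "photorealistic") is easy to enforce with a small template. The sketch below is a minimal illustration: the character descriptions and scenes are invented placeholders, not the ones from the video, and `build_prompt` is a hypothetical helper.

```python
# Placeholder descriptions; swap in your own character sheet.
CHARACTERS = {
    "Asima": "young woman, long dark braided hair, green cloak, leather armor",
    "Prince": "tall man, short blond hair, black fur-trimmed coat",
}


def build_prompt(scene: str, cast: list[str]) -> str:
    """Prefix the style keyword and append a fixed description per character."""
    descriptions = "; ".join(f"{name}: {CHARACTERS[name]}" for name in cast)
    return f"Photorealistic. {scene} Characters: {descriptions}."


# Placeholder scenes, each listing who appears in it.
scenes = [
    ("A messenger hands a sealed letter to Asima in a courtyard.", ["Asima"]),
    ("The Prince orders his knights to saddle the war horses.", ["Prince"]),
    ("Asima gallops through a forest at dawn.", ["Asima"]),
]

prompts = [build_prompt(scene, cast) for scene, cast in scenes]
for prompt in prompts:
    print(prompt)
```

Keeping the descriptions in one dictionary means a change to a character's look propagates to every prompt at once.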

Then I jumped into Freepik and, using Google's Veo 2, I ran all of the prompts one by one and generated around 10 to 12 scenes. One important point to mention here is the importance of using a negative prompt with Veo 2. It's very common with Veo 2 that the aesthetic switches easily to game renders or, let's say, CGI animation.

To prevent that and stick to photorealism, I highly recommend adding negative prompts like "video game graphics", "computer-generated animation", or "3D game rendering", so it stays in photorealism most of the time. For the voiceover, I decided to use ElevenLabs. Using the scenes I wrote before, I generated this text and brought the scenes into a storyteller format that I can use for the voiceover in the story.
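Coming back to the negative-prompt tip above, one way to make sure it is never forgotten is to attach the same negative terms to every request. This is a sketch only: the field names `prompt` and `negative_prompt` are assumptions for illustration and do not describe any particular platform's API.

```python
# The three negative terms recommended in the transcript.
NEGATIVE_TERMS = [
    "video game graphics",
    "computer-generated animation",
    "3D game rendering",
]


def build_request(prompt: str, negative_terms: list[str] = NEGATIVE_TERMS) -> dict:
    """Pair a prompt with a comma-separated negative prompt (hypothetical fields)."""
    return {
        "prompt": prompt,
        "negative_prompt": ", ".join(negative_terms),
    }


request = build_request("Photorealistic. A princess gallops through a forest.")
print(request["negative_prompt"])
```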

I copy-pasted this, picked a nice voice for my story, and hit generate speech. Now let's see the end result. At the fourth bell of a rain-washed dawn, His Majesty King Trevor III drew his final breath.

The crown shall pass to whichever rightful heir first sets foot within the great hall of the capital and kneels before the throne. Swift couriers galloped in opposite directions, each bearing a sealed parchment. One reached Prince Ulves, the other found Princess Asima.

Ulves received the news with flint-cold purpose. He ordered war horses saddled, banners unfurled, and every knight sworn to his sigil roused from sleep. In his heart, he resolved that Asima would never glimpse the capital's walls before him.

Far to the south, Asima mounted her horse and galloped at breakneck speed toward the kingdom's capital, determined to arrive first with a sense of duty for her people. Even as she prepared, Ulves's captains spurred ahead, laying a barricade along the forest roads. Thus, upon a single mournful morning, brother and sister set their fates and the fate of the kingdom into ruthless motion, racing the wind and one another towards the empty throne that waited at the heart of the realm.

Now, let's continue with the 10-hour version. 10 hours gave me the possibility to create detailed characters and a universe for my story. For character design, I used Midjourney V7.

I still like Midjourney's unique aesthetics as a starting point. I created character photos for Princess Asima and Prince Ulves using these prompts. And I used Magnific upscaling to bring those details and textures to life.

Now I needed to generate my ultra realistic cinematic shots with consistent characters. For this I had two options. One using Midjourney Omni reference.

I used this original image of Princess Asima as an omni reference, and you will notice that with the default omni reference weight of 100 there was quite a lot of variation in the face, especially in distant shots where she was posing on top of a horse. After that I increased the omni reference weight to around 400, and face consistency improved dramatically.

I tried the same thing with Runway References. I added Asima's original photo as a character reference and wrote "cinematic side profile photo of Asima galloping on the horse", and I found the results to be quite good as well.

At the end of the day, it all came down to the workflow and how fast I could build the universe, because I had limited time. Even though the Midjourney results looked quite good and the resemblance is fantastic, I decided to proceed with Runway References because it is much easier to build a universe by changing the camera and shot type in 3D space, and there are additional benefits like multiple-character consistency on the Runway side.
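For reference, the omni reference weight change described above maps onto Midjourney's prompt parameters. The sketch below assumes the V7 parameters `--oref` (reference image URL) and `--ow` (omni reference weight), and assumes a valid weight range of 1 to 1000; the helper name, the prompt text, and the image URL are placeholders for illustration.

```python
def with_omni_reference(prompt: str, image_url: str, weight: int = 100) -> str:
    """Append omni reference parameters to a Midjourney prompt string."""
    # Clamp to the assumed valid range of 1 to 1000.
    weight = max(1, min(1000, weight))
    return f"{prompt} --oref {image_url} --ow {weight} --v 7"


base = "cinematic wide shot of a princess posing on top of a horse"
reference = "https://example.com/asima.png"  # placeholder URL

default_prompt = with_omni_reference(base, reference)             # weight 100
stronger_prompt = with_omni_reference(base, reference, weight=400)

print(default_prompt)
print(stronger_prompt)
```

A higher weight pushes the result closer to the reference image, which matches the improvement in distant faces reported in the transcript.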

In this example, I used one of the images of Asima that I generated using Runway and added it as the first character reference, and I used an image of this assassin as the second character reference. Just by mentioning the image names in my prompt, I was able to create this scene of a sword fight. One tip I can give you: when you mention a character reference, sometimes characters tend to look at the camera like this, and it doesn't really look realistic. Because of this, I always use "candid photo" in my prompts so that the characters are not always looking into the camera, and it looks more like a photo taken right in the middle of the action without them being aware of it.

In the next example, I was able to generate a complex motion of Asima kicking the other soldier, using multiple character references again. Just by mentioning the image name and the action, I was able to generate this complex scene. Note that I used "candid photo" again so they don't stare at the camera, and this looks much more organic.

This one is a pretty good example of using three references and actually designing a scene exactly the way you want. I used my drawing as a reference to place my characters exactly where I wanted them in the scene. I added Asima and represented her with the triangle, I represented the assassins with the circles, and I even set their looking direction so that they are looking at each other. I also added references of my main character and the assassins, and I wrote the prompt "cinematic behind shoulder photo of Asima and six assassins looking at each other face to face, positioned according to the drawing on image three", and Runway was able to provide the exact design that I asked for.
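The reference-based prompts described above follow a repeatable pattern: shot type, the "candid photo" keyword, an action that names every reference image, and optionally a pointer to the layout drawing. The helper below is a hypothetical sketch of that pattern; it only assembles a string and does not reflect any official Runway syntax.

```python
from typing import Optional


def reference_prompt(
    shot: str,
    action: str,
    references: list[str],
    layout_reference: Optional[str] = None,
    candid: bool = True,
) -> str:
    """Assemble a prompt that names every reference image in the action."""
    unmentioned = [name for name in references if name not in action]
    if unmentioned:
        raise ValueError(f"Action never mentions: {unmentioned}")
    style = "candid photo" if candid else "photo"
    prompt = f"Cinematic {shot} {style} of {action}"
    if layout_reference:
        prompt += f", positioned according to the drawing in {layout_reference}"
    return prompt


standoff = reference_prompt(
    shot="behind-the-shoulder",
    action="Asima and six assassins looking at each other face to face",
    references=["Asima", "assassins"],
    layout_reference="image three",
)
print(standoff)
```

The check for unmentioned references guards against the easy mistake of uploading a reference image and then forgetting to name it in the prompt.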

Similar example here. One of the challenges here is being able to create a fight scene using AI. And even with the highly developed tools today, it's still a big challenge.

That's why sometimes you really need to specify where each character will be placed and how the camera will be positioned. I placed Asima here and the assassins in front of her. It didn't give me exactly what I asked for, but we came really close here.

We had Asima running between the assassins while sword fighting, and I wanted the characters to be placed according to my drawing here. So with this method of using your own drawings, you can really create complex and interesting scenes that weren't possible before.

I'm a big fan of being able to change the shot type in 3D space using Runway References. Here I used the exact same shot of Asima holding the sword and wrote "cinematic over-the-shoulder photo of Asima, the sword close-up, and six enemy knights in front of her". The AI model was able to understand it and model the whole scene, and it gave me a close-up of the sword with the knights in front of her. In a similar example, I used this shot where she's on top of the horse, and this time I asked for a wide-angle shot in the forest with her standing next to her horse, and I was not disappointed with the result.

Actually, it did a pretty good job. By the way, if you are enjoying the content, please like the video and subscribe to the channel. One challenge with Runway References is that it sometimes changes the outfit, and that's why you need to be very specific with your reference photo, ideally a medium shot with a clear face.

Then you will have fewer changes on the outfit side, or you can always write the exact outfit into your prompts, which can also be an easy solution. And no matter how developed the tools are, hands, and particularly hands holding swords, are still a big challenge.

I had many of these scenes with a sword fight or a duel, and as you may easily guess, I had many anatomy problems and problems with the accuracy of details. To solve that, I'm using Freepik. For me, Freepik's inpainting, or Retouch as they call it, is really intuitive and easy to use.

And that's how I was able to fix many of my scenes and make them basically usable. If you feel like you are drifting away from photorealism, or if you have problems with distant faces or excessive blur, you can always use Magnific. I'm using Magnific mainly for correcting distant faces or bringing some additional details to close-ups.

Most of the time I'm using the films and photography mode while upscaling my images, with extremely high resemblance and very minimal creativity. But if you want to keep the consistency as high as possible and you don't want faces to change, you can always use soft portraits or hard portraits. For producing cinematic videos, I'm using Kling version 2.

It is rare to find another AI video model that matches the performance of Kling at the moment. For example, here I was able to render a complex motion of my subject where she gets on the horse. And you will notice that I added "calmly" to my prompt.

This is one of the tricks I use on Kling. Sometimes when I simply write "she gets on the horse", the whole motion gets a little bit scrambled. It gets a little bit rushed.

But when you add "calmly", it gets a little bit easier to render complex motion. In the next example, I had another complex motion. Here the messenger needs to give the letter to my character Asima.

And here he kneels and delivers the letter to the knight. You will notice that the model was able to understand who needs to move towards whom, which is very good. Delivering the letter and understanding my prompt perfectly is really fantastic. Here is another complex motion, a sword fight where she needs to defend herself with her sword from the knight's attack while the camera is stationary.

So in addition to the subject motion, I don't want the camera to move here. You can see that in some frames we may have some coherence issues, but mostly it looks natural. If you know how difficult it is to render complex sword fights, you will realize that this is actually a pretty good result for Kling. In this example I wrote "she's horse riding while camera follows her".

So we have a camera motion here, and we want to ensure that we are following her closely. At the same time, you can see how natural the horse riding looks. The next tool in my workflow is Higgsfield AI. It entered my list of favorite tools pretty quickly since it brings something else to the table.

Higgsfield brings fantastic complex camera motion, which is hit or miss most of the time with Kling or Runway. Here, for example, you have a textbook crane up. You can basically pick basic camera control or epic camera control.

You have plenty of options. They also keep adding new stuff here. It's updated very often.

You just pick one of these camera motion types, and most of the time you don't even need to write a prompt because it fills the prompt section automatically. For example, here we have a crane over the head shot, which is super difficult to generate with other AI video models. And with Higgsfield, this was a single trial.

This camera motion is called whip pan. It basically shows the audience what the other part of the scene looks like, and it does it quite smoothly as it whips the camera.

It's very cool. Wes Anderson uses this technique quite often, and it's very, very difficult to do with other AI video models, whereas Higgsfield does it easily. To achieve this effect, you need to add a first frame and a last frame, and you need to pick whip pan from the list of camera controls.
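The whip pan requirement above (a first frame, a last frame, and the right camera control) is the kind of thing worth checking before spending credits. The sketch below is purely illustrative: `CameraShot` is a hypothetical structure for planning shots on your side, not part of any tool's API, and the file names are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CameraShot:
    """Planning record for one camera-motion shot (hypothetical structure)."""

    preset: str
    first_frame: str
    last_frame: Optional[str] = None

    def validate(self) -> None:
        """Raise if the shot is missing a frame its preset depends on."""
        if not self.first_frame:
            raise ValueError("every shot needs a first frame")
        if self.preset == "whip pan" and not self.last_frame:
            raise ValueError("whip pan needs both a first frame and a last frame")


whip = CameraShot("whip pan", "asima_left.png", "knights_right.png")
whip.validate()  # passes: both frames are present

crane = CameraShot("crane up", "castle_gate.png")
crane.validate()  # passes: only a first frame is required here
```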

This camera motion is definitely part of popular culture now, and you can easily do it using Higgsfield. All you need to do is choose bullet time and definitely add a nice contextual first frame, because this is really important. I believe this is one of the creative uses of Higgsfield that not many people know.

You can use something called baseball kick. This is mainly for sports, to be used with baseball shots, but you can actually use it with sword fights and it gives a pretty cool effect. This one is similar again. I used baseball kick, and at the end you will notice there is a bit of a problem with the hand, but the beginning of this shot is actually usable. One of the things I appreciate about Higgsfield is that it gives a pretty cool handheld shot. And of course, as the cherry on top, you have this nice face punch effect that I'm sure you saw on social media.

I think this can be pretty handy for fight scenes and things like that. For lip syncing, I'm using Dina's lip sync. I think it does a pretty good job.

I can show you a small piece here. So, the old lion is dead at last. I shall be first to the capital.

The fire here, for example, was rendered nicely, and the lip sync also looks pretty natural and pretty good. Similarly, you can see another example here.

By the king's new decree, in the name of His Majesty. It's also possible to upscale it to HD. For voiceovers, I use a combination of ElevenLabs and Minimax audio. Minimax audio is also very, very good because it has emotion control, which comes in very handy if you have a specific emotion that you want to transmit to your audience.

And speed control actually works very well. And now let's watch the Princess Asima AI film produced in 10 hours. To Her Highness Princess Asima and His Highness Prince Ulves: at the fourth bell of dawn this very day,

Your noble father, his majesty, King Trevor III drew his final breath. In keeping with the ancient covenant of our realm, the crown shall pass to whichever rightful heir first sets foot within the great hall of the capital and kneels before the throne. The council therefore bids each of you make haste to the royal capital that you may pay proper homage at his vigil and answer the summons of succession.

Delay not, the realm awaits your presence at court without fail. Lord Chancellor Kursome. So the old lion is dead at last.

I shall be first to the capital. And I'll see to it that Asima never crosses its gates. Saddle my warhorse and rouse every knight loyal to me.

Now I have to claim the throne for my people before it's too late. Princess Asima, by the king's new decree, in the name of His Majesty, I place you under arrest. Yield your sword and come with us.

Don't worry about me. I will be fine. [Music] [Music] Wow.

Wow. [Music] Please no. No.

[Music] Hopefully this video was truly helpful for you. Don't forget to give a thumbs up and subscribe for more in-depth tutorials.