(curious electronic music) - All right, let's play a game real quick. Is this next video real or AI-generated? (timer ticking) Go ahead, take your time.
(timer dings) So, that was a real video. That was a video from a YouTube channel. Now, what about this video?
Real or AI-generated? (soft electronic music) Okay, that one's AI-generated. Not a lot of giveaways, unless you happen to have memorized the geographical landscape around Mount Fuji and hunted for inconsistencies.
But, okay, what about this one? And, yeah, you guessed it already. This one's AI-generated too.
Okay, what about this one? Pay really close attention. Did you catch it?
See, no, okay, that one was real, but the fact that you thought about it...
So I, for the last week, have had full access to the newest and most advanced version of Sora, which is the AI video generation tool from OpenAI. And the results that I've gotten from it are both horrifying and inspiring at the same time. The last time I looked at this, nine months ago, this was a very private tool that was just unveiled.
They were just starting to show its capabilities, and so we were looking at other people's prompts. Actually, basically, it was OpenAI's carefully hand-selected prompts for what they wanted to show us, and we could learn some stuff from that. But this time I just get the controls myself.
It's just an open text box. My wish is Sora's command, and that's actually kind of a lot of pressure. You can type whatever you want.
But I did not take this responsibility lightly. I basically, over the past couple days, have been taking every possible angle of asking Sora to make things for me. I've asked for photorealistic things.
I've asked for cartoon things. I've asked for objects, people, signs, actions, text, still life, everything you can imagine. And I feel like I've come away with a pretty good sense of what it's good at, what it's bad at, and what it could actually be used for today.
But this is a powerful tool that's about to be in the hands of millions of people. So consider this the first ever Sora review: the good, the bad, and the ugly. Let's jump in.
So, first of all, this is the UI. On the left side, you can explore other prompts that people have recently made, and then some featured ones where I suppose OpenAI is still gonna curate some of the best results to showcase what it can do. And if you hit the bookmark button on any of these, they'll show up in your saved tab.
And what I like about these, again, is anything you click on, you can see obviously who made it, but you can also read exactly what they typed into the prompt box to get this result. And I'll get back to that in a second. So, then, underneath that you have your library of all the videos you've generated from Sora prompts, anything you've tagged as a favorite from your own creations, and then uploads, which is any of the files that you've uploaded to Sora for it to make videos out of, another crazy feature that I'll get to in a second.
But then, lastly, you have your folders, so you can organize things that you've created into different folders, maybe for different projects or different themes or whatever, just to keep it somewhat organized. So, okay, right off the bat, Sora is a tool after all, and so this feature of being able to see what other people are typing and then the results that they create from those prompts is both for inspiration but also for education, because it is really interesting reading how simple or how detailed some of these prompts are and then looking at what Sora generated and what it took creative liberties with to add on its own. And then if you really like someone else's results but wanna adjust it in just a slight way for your own use, there's the remix feature.
So, you hit that button, and then you can describe specific changes you want it to make to that video for it to then generate a new one. So for this one, for example, I like this shot, and the house is pretty cool, but I kind of want there to be a golf course on those cliffs in the background, right? So, I type that, and then I can also change the resolution, which, as you can see, it will take longer for higher resolutions.
And then you also can specify exactly how much of a remix it is, so if it's a subtle, mild, or strong remix, which dictates basically how much it's willing to change the results from the original. And then you can even dial that in on a slider from one to eight if you want to. I think adding a golf course would be nice, but I wanna keep those cliffs and the house, so it's a mild change, right?
Click "remix", wait a bit, and then, boom, you have the new creation, a fresh, new, artificially-generated video with that same house and those same cliffs, but now 1080p and with a golf course in the background. Wild. I've found that 360p videos take, actually, very little time to generate.
I do not want to use the word "create" here. These are artificially-generated from source material, but basically, a five-second, 360p video will generally take less than 20 seconds to generate. And they're being generated on OpenAI servers, so it has nothing to do with the speed of my computer or even my internet connection.
Once I send off the prompt, it just takes that time, and then when it's done, it shows up, and I can download it if I want to. And then, something like a 1080p, 10-second video, it looks significantly better, but it also takes much longer. It can take a couple of minutes for a really detailed prompt and a 1080p video.
And that's also right now, when almost no one else is using it. I kind of wonder how much longer it'll take when this is just open for anyone to use. But that's roughly how long they've taken.
And there's also the storyboard feature, which kind of looks and acts like an online video editor where you can string together several prompts. And it works best for stringing together several different actions in a row. Basically, I've noticed it's hard to get a video to do several different things in succession with a single prompt, but you can use their built-in editor to string several prompts together in a row and then storyboard can help you blend them all together into one longer video, and that actually can work better instead.
So I've been playing with this all week, and I've been throwing dozens and dozens and dozens of prompts at it with all sorts of characters, different styles, different unique things in the video. And so here are some observations that may surprise you, or maybe not. So, one, it seems like there is no object permanence.
So, we had AI-generated images, right? And they've gotten... You've seen the arc of them getting better and better and higher resolution and more realistic over time. Now that we're doing videos, basically they're stringing together a bunch of different AI-generated photos in a row with some sort of continuity.
But for a video to make sense to a human, the objects kind of have to move in certain ways and behave in certain ways in the real world. And so one of the most common dead giveaways from a flaw of an AI-generated video, especially with this tool, is just errors in object permanence. So you'll see things like objects passing in front of each other or behind each other in ways that don't make sense.
You'll see stuff disappear or reappear out of thin air, especially if objects pass in front of them, but even sometimes without any real reason at all. They can also just materialize and then vanish, like the smartphone in this imaginary tech reviewer's hands. Boop, just gone.
And there's also another one that's really common. Anytime something with legs has to walk, if you just watch the legs for long enough, it's almost guaranteed to mess up which leg is the front leg and which is the back. Just look at this one.
Just try to watch one of the legs, and you'll see it happens. It switches back and forth from the front leg to the back leg. This is super common.
It happens all the time in these. And that brings me to another one of my biggest observations, which is it just struggles with physics in general, the way things move, which actually makes perfect sense, because physics would seem to require some understanding of what you're making a video about. So, in the same way that large language models can struggle with hallucinations or putting together sentences that are technically incorrect or don't quite make sense, these video models will also struggle with putting together videos where the movement of the objects, since it doesn't know that they're objects, sometimes also doesn't really make sense.
Pretty much anytime I try anything remotely photorealistic, basically, you instantly notice that the movements don't look right. It tends to look kind of slow motiony, actually, but then other parts won't be in slow motion, which our eye picks up on right away. I was playing with making like CCTV-style footage.
'cause in my head, that's gonna be the easiest thing to fake. People are gonna get their hands on this, they're gonna fake security camera footage all the time. But even in this, you can see the movement of the humans is just slightly off.
First it's a little slow, then it's fast for no reason. It's just weird. Now, it seems to be decent sometimes at fluid dynamics for whatever reason.
I've been very impressed with some examples of water rippling or crashing in waves, or moving in realistic waves. It does happen sometimes. It can be actually pretty good-looking.
And same thing also with fire. Sometimes it can look pretty realistic, even if the smoke isn't. But, yeah, physics in general is definitely a weakness.
So then, if you know this, you can mess more with claymation or cartoon styles, and then those irregularities in movement or physics become a lot easier to stomach, because it feels more like a stylistic choice or an artistic choice. But I'll get to the stuff that it actually does really well in a second. But there is another feature that you might have seen around the internet a little bit with some other tools like this that takes a source image and can turn it into a video with a prompt.
So, people have been bringing memes to life and imagining what would happen right after some iconic image, just by typing into the AI video generator and asking what it would look like, and Sora can do that too. The difference with Sora, I've found, is it's way less likely to try anything that it recognizes as containing any copyrighted material or intellectual property whatsoever. It's pretty picky.
It actually does reject a lot of stuff. Anything with any public figures or recognizable characters in general, or logos, it actually refuses to do. It will also refuse if it thinks any subject you upload in the image is under 18 years old.
Makes sense. But I did upload this AI-generated image, which DALL·E made, and asked it to make them all singing and dancing, and hit "generate", and it totally did it. Again, it helps a lot that they are cartoon styled and not some super realistic scene, but this is a totally AI-generated video.
I also tried a few other images from my camera roll, and these just... it just gets weird. It just doesn't know the context of what direction any object was moving in the photo. Again, it doesn't know physics, and so things just get really wonky.
It's impressive that it's AI-generated video, but you can tell pretty quickly that it's AI-generated video. So, all that being said, we've seen what it's good at, we've seen what it's not so good at, and you might be wondering, what is this tool actually for? And I actually found a few things that are not so scary, that it's actually pretty good at.
So, these are the things that I found, as of right now, that Sora can be used for. Abstracts. So, you can be as descriptive as you want and create all sorts of textures and colors and gradients to make an abstract shape move around in a way that essentially looks like it could be a screensaver or just some background piece or whatever you want.
I'm sure there's a world where someone's gonna make an NFT out of this stuff. Also totally weird. But you can do abstract stuff for sure.
I noticed it's getting better at reproducing text, especially when you ask it for specific text. So, there is sometimes garbled background text in these videos, and that's gonna continue to happen, but I've found that, if I asked it for an individual title slide in a style, it actually could give me the correct text on that slide. This animation it gave me of sketching the Empire State Building, this would make a killer title slide or even an intro to a documentary about the building, or Manhattan, or something like that.
I was very impressed. And then stop motion or cartoon-style characters. Again, like I said before, cartoons don't necessarily have to have realistic movements or physics, so the errors in those things don't hit the eye as much, and so the characters still look like, dare I say, art.
They're characters. Especially in scenes like this one. The graffiti doesn't have to actually say real words for this to look normal.
The bear doesn't have to have the correct number of fingers or toes. It just looks like some album art come to life, basically, which I'm sure someone would find useful on social media somewhere. And while we're at it, here's a couple other random prompts that impressed me.
I asked for Santa fighting Frosty the Snowman in the style of "Mortal Kombat", and it built a readable scoreboard. I didn't ask for the scoreboard, but it built all that, with the names and everything, which is pretty crazy. Immediately, again, the details are weird when you start looking at it, but this could be a good inspirational starting point.
And then I also asked for a video of a tech reviewer sitting at a desk, and Sora took the liberty of adding this exact fake plant here, which I said nothing about in my prompt. And that is a convenient reminder of the other side of tools like this, the unknown behind a lot of them. Clearly we already know that these AI-generated videos have to come from some source material.
Are my videos in that source material? Is this exact plant part of the source material? Is it just a coincidence?
I don't know. OpenAI has talked about using publicly available media and data, and it's always, to me, been kind of a sketchy definition. We don't know if it's everything.
We don't know if it's too late to opt out, if we wanted to opt out. And also, I have no idea how much energy this uses or if it is significantly more than the LLMs or DALL·E we're using. But still, the craziest part of all of this is the fact that this tool, Sora, is going to be available to the public, I guess around the time this video publishes, to everyone, to millions of people all at once.
And I mean, yes, it does a pretty good job with guardrails of refusing to do anything photorealistic, refusing to use actual people's likenesses or any dangerous or harmful acts. And, yes, it does watermark every single generated video with this little animation here in the corner. But you can still crop watermarks.
And it's still an extremely powerful tool that directly moves us further into the era of not being able to believe anything you see online. I mean, read through the comments on my last video about AI-generated video. This is a lot for humanity to digest right now.
All of that, and these videos don't even have audio. They're just a few seconds long. They're just 1080p.
This is the new baseline. This is, once again, the worst that they will be as they continue to get better in the future. It's a lot to think about.
Thanks for watching. Catch you in the next one. Peace.
(light percussion music) Real quick, thanks to Ridge for sponsoring this portion of the video and for supporting the channel. Just as a PSA, we're coming up on that time of year where we're right in the middle of the rush to get good holiday gifts. We're kind of running out of time.
Lucky for you, Ridge is running a holiday sale right now where they're doing up to 40% off. And you guys know one of the reasons I partner with them is because they have actually high-quality gear. So that's one of the things I'm gonna be doing for this year, is getting gifts from Ridge for people.
And I think I actually haven't gotten Andrew his gift yet, so hopefully he doesn't watch this part of the video, but I think I'm gonna get him, since he's a hiker, one of the Ridge knives. But don't tell him. But whatever you end up getting, if you slide over to ridge.com/mkbhd, that's where you'll be able to find everything that's on sale and everything Ridge does. So, shout out to them again and catch you guys in the next video. Peace.