Did you notice that there hadn't been much interesting AI news in the last few weeks? I know that's news to all the accounts that proclaim huge AI news every 3 days, but I mean actually interesting. I even had a whole video ready on that hibernation, going over a bunch of new papers on some of the hurdles ahead as we inch closer to AGI. But then a couple of announcements came tonight, and that video will have to wait. The mini AI winter, which is more of a cold snap really, might be drawing to an end.
But after covering those announcements, we're going to see evidence that, no matter what AI tech titans tell you, you should never leave your hype-o-meter at home. First, what just got announced by Sam Altman and OpenAI? Well, it's 12 days of releases, and that's all they tell us, but we can join the dots and piece together some reporting to find out a bit more about what's coming. What's almost certainly coming in these next 12 days is Sora, at long last. In case you've long since forgotten what Sora is, it's a text-to-video generator from OpenAI.
It was first showcased back in February of this year, and even though it's been almost a full year, I would argue that some of those demo videos are still the best I've seen from a text-to-video model. A version of Sora was leaked by disgruntled artists around a week ago, and that's how some people were able to generate these clips. Some of these are obviously state-of-the-art, but then some of them are less impressive, and what we learn, unofficially, is that it seems like there might be a Sora turbo mode: in short, a model that
generates outputs more quickly but with less quality. I'll have more on hallucinations in a minute, but what else might be coming in these 12 days? Well, very likely their smartest model, which is called o1. One of their employees, writing under an alias, first said "OpenAI is unbelievably back" (that was yesterday); then, when someone asked for full o1, he said "okay". Indeed, OpenAI's newly promoted Senior Vice President of Research, Mark Chen, wrote "if you know you know". The full version of o1, simply called o1, as compared to o1-preview, looks set to be the smartest
model, at least in terms of mathematics and coding. That doesn't automatically mean it will become your chosen model; in some areas it actually slightly underperforms the currently available o1-preview. That does, though, leave one question for me, which is: what are they going to do to fill the other 10 days? According to one of their key researchers, they're going to have to ship faster than the goalposts move. And now that ChatGPT has surpassed 300 million weekly active users, just 3 months after it surpassed 200 million, no one can deny that plenty
of people might use anything they do ship. Of course, none of us have to wait for things like Sora; there are some epic free tools that you can use today, which I will show you at the end of this video. But the biggest news of the day didn't actually come from OpenAI; it came from Google DeepMind, with their presentation of Genie 2. In short, it's a model that you can't yet use, but which can turn any image into a playable world. And there's something just a little bit ironic about the announcement of Genie
2 tonight. The background is that I covered Genie 1 on this channel and talked about its potential moving forward. The playable worlds that Genie 1 could conjure up were decent but pretty low-definition and limited, but I noted from the paper that the architecture could scale gracefully with additional computational resources. That's not the irony. The irony is that just a couple of days ago I interviewed the person who managed and coordinated this Genie 2 project, Tim Rocktäschel. This was for a podcast that's been released on my AI Insiders platform on Patreon. I will, of course, go
through the Genie 2 announcement, the demo videos, and what they say is coming next, but I can't help but point out that I directly asked Tim Rocktäschel about Genie 2: "The paper capped out at, I think, 2.7 billion parameters. I just wonder, you might not be able to say, but surely Genie 2, you know, 270 billion parameters or something, is that something that you're working on or excited about? As with more data, because the paper even talked about how one day we might train on a greater scale of internet data, all of YouTube potentially. Is
that something you're working on or could speak about at all?" "I'm excited about this. You can just look, basically, at what happened over the last few months since Genie was published. There's, for example, the Oasis work that came out a few weeks ago, where people basically trained a neural network to simulate Minecraft. Before that, there was a paper on learning to simulate Doom, right, using a neural network. That space is definitely heating up, and I think it's exciting. I think maybe at some point these simulators, these learned simulators, are getting fast and rich enough so that
you then can also use them to adversarially probe embodied AGI and, like, teach it new capabilities." The reason I was interviewing him, by the way, is because he is the author of the brand-new book AI: 10 Things You Should Know. But what really is Genie 2, and what does it say about what's coming? DeepMind calls it a foundation world model, and essentially you give it a single image and Genie 2 will turn it into an interactive world. The world might not quite be as high-resolution as the original image, but you can use
keyboard actions to control that world: jump, fly, skip, swim, all that kind of thing. I can imagine this being used for dream sequences within games, where a character might have a dream of an alternate reality and you can interact with that world. Or maybe in the future, websites, instead of having static images in the background or even looping videos, will have interactive environments that you can play like games.
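To make that "single image in, playable world out" pattern concrete, here is a rough sketch of the interaction loop it implies. To be clear, there is no public Genie 2 API; every name here (WorldModel, init_from_image, step) is invented, and the model is stubbed out so the control loop itself actually runs.

```python
# Hypothetical sketch of an action-conditioned world model's interaction loop.
# Nothing here is Genie 2's real interface; the stub just shows the shape:
# one prompt image in, then a loop of keypress-conditioned frames out.
from dataclasses import dataclass


@dataclass
class Frame:
    pixels: str  # stand-in for an image tensor


class WorldModel:
    """Action-conditioned autoregressive video model (invented interface)."""

    def init_from_image(self, image: Frame) -> list[Frame]:
        # the "state" is the frame history the model conditions on; keeping
        # it around is what would let off-screen areas stay consistent
        return [image]

    def step(self, history: list[Frame], action: str) -> Frame:
        # a real model would sample the next frame given history + keypress;
        # this stub just records the action so the loop runs end to end
        frame = Frame(pixels=f"frame after '{action}'")
        history.append(frame)
        return frame


model, start = WorldModel(), Frame(pixels="your single prompt image")
history = model.init_from_image(start)
for action in ["jump", "left", "swim"]:  # keyboard inputs
    print(model.step(history, action).pixels)
```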
But just a few quick caveats. These worlds, these generations, on average last 10 to 20 seconds, or up to a minute. Next: even though they seem that way, these example videos aren't quite real-time. As it stands, if you want real-time interaction, you'd have to suffer a reduction in quality, and let's be honest, these outputs weren't exactly high-resolution to begin with, so we're not talking about replacing AAA games anytime soon. Next: the outputs can go quite wrong quite quickly, with no real explanation. Like in this one, a ghost appears for no reason. In this one, the guy started with a snowboard but then immediately decides just to run the course; as Google wrote, the character "prefers parkour" over
snowboarding. Yes, by the way, the initial prompt could be a real-world image, and that, of course, is super cool. It can kind of model lighting, although we're not talking ray tracing here, and it can, it says, model gravity; but look at this horse jump on the left. I wouldn't say that's terribly high-accuracy physics. This bit, though, I did find more impressive, which is that Genie 2 is capable of remembering parts of the world that are no longer in view and then rendering them accurately when they become observable again. So if you look at
the characters, they'll look away from something, look back to it, and it's mostly the same as when they first looked away. Interestingly, on the announcement page, which didn't yet come with a paper, they actually pushed a different angle for why this was important. They said that if we're going to train general embodied agents, in other words AI controlling a robot, that's bottlenecked by the availability of sufficiently rich and diverse training environments. And they gave an example of how they used Genie 2 to create this interactive world and then told an AI agent to, for example,
open the red door. The SIMA agent, which I've covered before on the channel, was indeed able to open the red door. But I personally would put an asterisk here, because it's an AI agent trained on this AI-generated, not particularly realistic world. Because of the stark gap that exists between these kinds of simulations and our rich, complex reality, I'm not entirely convinced that this approach will lead to reliable agents. Of course, Google DeepMind could well prove me wrong; they say that they believe Genie 2 is the path to solving a structural problem of training embodied
agents safely while achieving the breadth and generality required to progress towards AGI. Now, of course, my objection would fall away if we had a path to removing these creative but not particularly reliable AI hallucinations. But as even the CEO of Nvidia recently admitted, we are, quote, "several years away" from that happening. His solution, as you might expect, is to just buy more GPUs. Some of you may say that reliability issues and hallucinations are just a minor bug; they're going to go soon. What's the problem with Jensen Huang saying that the solution is still just a few
years away? Well, I think many people, including AI lab leaders, massively underestimated the hallucination issue. As I covered on this channel, in June of 2023 Sam Altman said, and I quote, we won't be talking about hallucinations in one and a half to two years. One and a half years from then is, like, today, and we are talking about hallucinations. And even at the upper end, we're talking mid-2025, and I don't think anyone would now vouch for the claim, echoed by Mustafa Suleyman, that LLM hallucinations will be largely eliminated by 2025. In short, the very thing that
makes these models great at generating creative interpolations of their data, creative worlds, is the very thing that makes them unreliable when it comes to things like physics. Remember that even frontier generative models like Sora, when given 10 minutes to produce an output, still produce things where the physics don't make sense. And this links to a recent paper that I was going to analyze for this video: in mathematics, and plausibly physics, large language models based on the Transformer architecture don't learn robust algorithms; they rely on a "bag of heuristics", or rules of thumb. In other words,
they don't learn a single cohesive world model; they deploy a collection of simpler rules and patterns. That's why, for example, with Genie 2 and Sora you get plausible continuations that, if you look closely, don't make too much sense. Imagine Sora or Genie 2 generating a car going off a cliff and all the resulting physics. You might have hoped that their training data had inculcated Isaac Newton's laws of physics and you'd get a very exact result, but they don't actually have the computational bandwidth to perform those kinds of calculations. Instead, it's a bit more like this:
when models are tasked with 226 take away 68, they get this vibe that it kind of feels like an answer between 150 and 180. That's one of the heuristics, or rules of thumb, that these authors studied. Patch together enough of these vibes, or heuristics, and you start getting pretty accurate answers most of the time. Each heuristic they learn only slightly boosts the correct answer's logit, but combined they cause the model to produce the correct answer with high probability, not reliability. Indeed, their results suggest that improving LLMs' mathematical abilities may require fundamental changes to training and architectures.
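Here's a toy illustration of that stacking-of-weak-rules idea; it's my own sketch, not the paper's actual probing method. Each rule below is deliberately too weak to compute 226 − 68 on its own, but intersecting them pins down the answer, which is the flavor of the "bag of heuristics" claim.

```python
# Toy "bag of heuristics" for 226 - 68: no rule computes the answer,
# but each one narrows the candidate set, and stacked together they
# leave exactly one number standing.
heuristics = {
    "magnitude vibe (between 150 and 180)": lambda x: 150 <= x <= 180,
    "units digit ((6 - 8) mod 10 = 8)": lambda x: x % 10 == 8,
    "casting out nines (digit sums: (1 - 5) mod 9 = 5)": lambda x: x % 9 == 5,
}

candidates = list(range(1000))
for name, rule in heuristics.items():
    candidates = [x for x in candidates if rule(x)]
    print(f"after '{name}': {len(candidates)} candidates left")

print(candidates)  # [158] -- the correct answer, with no subtraction performed
```

No single filter "knows" the answer, but their intersection is a single number; that's roughly the paper's picture of how stacked heuristics yield high accuracy without a robust underlying algorithm.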
And I totally get it; this is the same video where I showed you the o1 model getting 83% in an exceptionally hard math competition. But you may have noticed that none of these models tends to get 100% in anything. For example, surely if they can solve 93% of PhD-level physics problems, why do they only get 81% in AP Physics? That's the same o1 model that we're set to get in the next 12 days. I almost can't help myself; I'm starting to cover the two papers that I said I would cover another day, but still,
I just want to touch on this other paper. The ridiculously compressed TL;DR is that they show that models do learn procedures rather than memorizing individual answers. The way they show this is really complex and does rely on some approximation: estimating, for example, if you removed these 500 tokens, how would that affect the model's parameters and therefore the likelihood of getting an answer right? You can then, in short, judge which kinds of sources the model is relying on for particular types of question.
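As a toy stand-in for that estimate (the paper works at LLM scale with an EK-FAC approximation; this is the classic influence-function recipe on a tiny logistic regression, with all data and names invented), here's the core idea: score each training point by how much up- or down-weighting it would change the loss on a query.

```python
# Minimal influence-function sketch: which training points does this
# prediction "rely on"? Score = grad(query loss) . H^-1 . grad(train loss).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # toy training inputs
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# fit logistic regression by plain gradient descent
w = np.zeros(3)
for _ in range(2000):
    w -= 0.5 * X.T @ (sigmoid(X @ w) - y) / len(y)

def grad_loss(x, t, w):
    # gradient of the log-loss at a single example
    return (sigmoid(x @ w) - t) * x

# Hessian of the mean log-loss (damped so it's safely invertible)
p = sigmoid(X @ w)
H = (X.T * (p * (1 - p))) @ X / len(y) + 1e-3 * np.eye(3)

# the "query" whose answer we care about, then one solve against the Hessian
x_q, t_q = X[0], y[0]
v = np.linalg.solve(H, grad_loss(x_q, t_q, w))

# influence of each training point on the query loss: the biggest scores
# mark the "sources" this prediction leans on most
scores = np.array([grad_loss(X[i], y[i], w) @ v for i in range(len(y))])
print("top supporting training points:", np.argsort(-scores)[:5])
```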
Again, what the authors are showing is that the models aren't memorizing particular answers to reasoning questions. When asked what is (7 − 4) × 7, they're not looking up a source that says the answer is 21; they're relying on multiple sources that give the kinds of procedures you'd need to answer that question. But while that seems really promising for these models developing world models and true reasoning, they add this crucial caveat: they don't find evidence for models generalizing from pre-training data about one type of reasoning to another similar type of reasoning. You could kind of think of that like a model getting fairly good at
simulating the physics of the Moon but not then applying that when asked to simulate the physics of Mars: in-distribution generalization versus out-of-distribution generalization. Anyway, I've definitely gone on too long; time to bring us back to the real world. And before I end with that cute turtle and how you can move it around, here is another real-world tool that you can use today. This is AssemblyAI's Universal-2 speech-to-text model, and you can see its performance here. As many of you know, I reached out to AssemblyAI, and they are kindly sponsoring this video. I use their models to transcribe my projects, and you can see the comparison not just with Universal-1 but with other competitive models. One thing I've learned in doing so: don't always focus on word error rate; think about how models perform with proper nouns and alphanumerics. That is, at least for me, what sets the Universal family apart.
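Quick aside on that metric: word error rate is just word-level edit distance (substitutions, deletions, insertions) divided by the reference length. Below is a from-scratch sketch, not AssemblyAI's code, and the example sentences are mine; it shows how a transcript can post a solid-looking WER while missing exactly the alphanumeric that matters.

```python
# Standard word error rate via Levenshtein distance over words:
# WER = (substitutions + deletions + insertions) / reference length.
def wer(reference: str, hypothesis: str) -> float:
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into
    # the first j hypothesis words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / len(r)

ref = "in these twelve days openai looks set to release sora turbo and the full o1 model to chatgpt users"
hyp = "in these twelve days openai looks set to release sora turbo and the full oh one model to chatgpt users"
print(round(wer(ref, hyp), 3))  # ~0.105: a "good" WER, yet the one token
                                # that matters (o1) is exactly what's wrong
```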
Now, as we wrap up this video, just for anyone wondering about an update to SimpleBench: first, what about the new Gemini experimental models? Well, they are rate-limited. I might soon be
getting early access to something else, but for now we can't run it in full on SimpleBench. What about DeepSeek R1? Well, again, as of today it's not available through the API. What, though, about Alibaba's QwQ model, which has been getting a lot of hype lately? But honestly, what's new; almost everything gets hyped in AI. Of course I am following all of the models coming out of China and testing them as much as I possibly can, and I did read that interview with the founder of DeepSeek. I may cover that in a different video, but
for now, I could actually run QwQ on SimpleBench, and unfortunately it got a score below Claude 3.5 Haiku, so it doesn't appear on the list. I'm sorry that I can't do a shocked-face thumbnail and say that AGI has arrived, but that's just the result we got; it was around 11%. I'm going to show you one more tool, just very quickly, and you can use it for free today. It's Kling 1.5, actually another model coming out of China, and you could argue, in a way, it's a foretaste of the kind of interactivity that something
like Genie 2 will bring. Again, it's free to sign up for at least five professional generations. Click on the left to upload an image (I generated this one with Ideogram), then go down to Motion Brush. Then I selected auto-segmentation so I could pick out the turtle, and then, for tracking, I drew this arrow to the right. Then confirm, of course; go to professional mode (I only have two trial uses left) and then generate. You control the movement, and you can end up with super cute generations like this. So, whatever you found the most interesting,
thank you for watching and have a wonderful day.