I have to talk about the new o3 model from OpenAI. This is a major new development in AI: on the evaluations that measure how good an AI is at reasoning, it didn't just break the record, it blew right through it. It has moved way beyond what was thought to be coming next. We were expecting a 10% increase, a 20% increase, but no, the jump was far beyond that.
In fact, it's so good at reasoning now that they're changing the actual tests used to measure how capable an AI is at understanding and reasoning through novel questions. That's a major thing. The organization behind ARC-AGI created this benchmark, this test, and now they're even changing what they do because of the capabilities of OpenAI's new model.

And the reason why we do that is because we want to test the model's ability to learn new skills on the fly. We don't just
want it to repeat what it's already memorized. That's the whole point here. Now, ARC-AGI version 1 took five years to go from 0% to 5% with leading frontier models. However, today I'm very excited to say that o3 has scored a new state-of-the-art score that we have verified. On low compute, o3 has scored 75.7% on ARC-AGI's semi-private holdout set. This is extremely impressive because this is within the compute requirements that we have for our public leaderboard, and this is the new number one entry on ARC-AGI-Pub. So
congratulations on that. Thank you so much. Now, as a capabilities demonstration, when we ask o3 to think longer and we actually ramp up to high compute, o3 was able to score 87.5% on the same hidden holdout set. This is especially important because human performance is comparable at the 85% threshold, so being above this is a major milestone. We have never tested a system that has done this, or any model that has done this, beforehand, so this is new territory in the ARC-AGI world. Congratulations on
that. Congratulations for making such a great benchmark. Yeah, the work is not over yet, and these are still the early days of AI, so we need more enduring benchmarks like ARC-AGI to help measure and guide progress. I am excited to accelerate that progress, and I'm excited to partner with OpenAI next year to develop our next frontier benchmark. Amazing.

So this is o3, and it's funny that it's named o3, because the previous one was o1. They've jumped to o3. They said it's because they didn't want to
step on the name of another group, a telecommunications company in the UK that's called O2. They didn't want to name it that, so they jumped to o3. But it kind of makes sense, because this is such a huge leap: going from 1 to 3 is more than double, it's triple. So it's a big deal. I want us to think about that, because now we have to ask: what does this mean for academia? Does it mean anything? And the answer is yes,
it does mean something. This isn't AGI. This isn't artificial general intelligence; it can't just solve everything through reasoning. Going through the test, it wasn't able to solve a number of questions that would be easy for you and me as humans to answer. But it was able to answer a lot of novel questions. This is important because it means it's answering questions that it didn't have in its pre-training data set. It was simply looking at the question and then
it figured it out on its own through different mechanisms. So it's using real reasoning to figure something out. Now, of course, there are all sorts of debates about what real reasoning is, and that's great. It's great that we're having those debates, and it's great that we have this type of competition, this type of benchmarking, because what's happening is really interesting, and I want to express this because it's the same thing that's happening in academia. Initially we had the Turing test, and
this was the idea that if an AI can come across as real, through its emotional responses or through understanding what you're asking, then it has real intelligence. But then, of course, we were able to achieve that with ChatGPT. We almost achieved it before then, with ELIZA and so on, which convinced a lot of people. But with ChatGPT we had something really powerful, and so the Turing test kind of faded away; it didn't make sense anymore.
It's like, what were we thinking before? That's not a true definition of intelligence. In the same way, with the previous ARC test for trying to understand what intelligence is and whether we can achieve it, now that this o3 model has done it and surpassed it, we're going to say, okay, we're changing it; that's not real intelligence, it's going to be more like this. So we're going to create something new, which is actually not bad at all. I think we should
be doing that, because we're learning more about ourselves and what intelligence is by developing the AI. As we develop this AI and its capabilities get more and more powerful, we start to see that intelligence is more than just that; it requires this and this and this. I think that's great.

Now, going back to how AI is also holding up a mirror to us in education: I think it's great, because it's forcing us to re-evaluate and rethink what is a
good education. Take writing essays. It used to be very common for you as a student to show up to class, get a lecture, and then get an assignment: write this type of essay and turn it in within two weeks. And that's what we did. Of course, the problem was that that wasn't good education. It never was good education. Maybe when we didn't have books it made some sense, but we have lots of information out there, we have YouTube, we have the internet, and we have this understanding
now that that isn't good. One, students don't learn very well through a lecture; they need more engagement, more hands-on learning, more interaction. And two, with going away for two weeks to write an essay, now we could easily turn to the AI and it could write the essay for us. But in the same way, before AI we could easily turn to a friend, pay for a service, or copy from the internet. So it never was a good assessment technique to
simply give it to the student, wait two weeks, and then evaluate them. We need more formative assignments, more formative assessments, more things going on in class, so that we can evaluate with confidence and hold that student accountable. And when the student writes an essay, we should look for additional things: not just the product itself but the process. Can the student understand what they're doing? Can they talk about what they've done? If they turn in an essay, I read it, and then I ask them questions about it, do
they know what they did? So the AI is holding up a mirror to us, in that we're starting to understand much better that the student needs to go through an educational experience, not just be given data and then have to recall it; they have to fully understand. Holding students more accountable for their learning is excellent, and I think it really helps us understand education that much more, thanks to AI holding up that mirror to us.

But let's go back
and think a little more about what's going on here, because the reasoning capability has increased so much, and that's important for us to understand in academia. Yes, this new AI, o3, is very expensive to run. Breaking this record in this competition cost a lot of money. Some calculations put it at $300,000 in compute to answer these questions, and that's because it took a long time, like 16
hours or so, to go through and answer them. That's what the o1 and o3 models do: they take longer to understand the question and then reason it through. In fact, what it's doing is something that I thought would be important for getting us to AGI. I've mentioned in the past the idea of using multiple systems, and it looks like that's exactly what's happening: it uses one system to understand the question, and then it goes through
and creates a bunch of different answers, and then it uses another system to look at those answers and check the possibilities: is this correct? Are those reasoning steps logical for arriving at this solution? It's even creating some synthetic data to go through possibilities. So it's putting all that together, multiple systems, and then coming up with the best solution, and that takes time. It takes much longer than simply using ChatGPT, where it's just using one system, one large language model, to look at its previously trained information and then come up
with a solution.

Now, this is important for us to understand for a couple of reasons. What the research shows is that a large language model like GPT-4 actually works better for things like writing essays, where there isn't one definitive answer; we need more creativity for different types of essays, putting things together in various ways. But when we're talking about things like science, the STEM fields, an o1 model goes through many different
variations of possibilities and then looks for the best reasoning to get to the best answer, and we can hold that up as the best answer. In the sciences that makes sense, because there is one correct answer, or at least one answer that's more correct. So that's the difference here: the o1 models are awesome for science, and the GPT models are going to be awesome for things like writing essays. We have both of these things.

Now, this comes back to academia, in that if these
o1 and o3 models are going to be that much more powerful for science, then this has lots of implications for research in these science fields. And then we run into issues of equity: it's going to cost money to access these things. Of course, the prices are going to go down as they come up with ways to improve the efficiency of the model, and that will improve the overall process, so we have that to look forward to. But it is going to
cost more, so access is going to be a question. Am I going to require students to use this? If I do, I need to make sure that they can access it and use it as a tool. That's going to be a big thing. Now, a lot of what is learned from the o3 model, from these revolutionary new capabilities, is going to be extracted and applied to the GPT models as well, so we can definitely look forward to improvements with
GPT-4 and with GPT-5 in the future, which use a different type of system. For now it's just important to see that there are definite improvements happening with the AI models. Remember the video I made just a couple of weeks ago about how AI is not slowing down? It's not slowing down by any stretch of the imagination. No, we're not running out of data, and no, we're not slowing down, because there are new techniques, new possibilities, new ways of improving, and the o3 model is a prime example of
that. So it's definitely moving forward and enhancing overall capabilities.

Now, there's a big call here on the safety side: what's going to happen with this new model that can reason so much more, that has these enhanced capabilities? They are calling for people to contribute, to be part of a safety group that's doing all sorts of testing to help ensure that they can release this model and it can be safe for everyone to use. I went through and tried to see if I could
sign up, but you do have to be a Plus member, so you have to have a paid subscription in order to be part of this safety group. That's something to know. If you are a Plus member, definitely go out there and be part of the solution; help ensure that we have a safe product that can be used for academia and for the world as well.

The final thing I want you to understand deals with AGI. This is not AGI; it's not generalized intelligence,
but it is definitely one step closer. We are one step closer to understanding what AGI even is, and one step closer to getting there. And the way they've structured things, it seems like they'll be able to continually improve this for the foreseeable future by working on inference: on the way the model goes through the information and solves the problem based on how it's working through it. So it's moving beyond simply the large language models and now
using additional techniques to come up with novel solutions to novel problems. That's very important for research, but also very important for moving towards AGI. So that's a big thing for us to contemplate: achieving that is looking more and more realistic, and sooner rather than later. So we in academia really need to make sure that we're staying on top of this, and really focus on this idea that it's not just about knowledge, it's not just about the facts and figures that
we spend so much of our time on in the classroom. No, it's about experience, it's about critical thinking, it's about being able to use and implement those ideas, to manipulate that content and put it together in different ways. Creativity, critical thinking, application: those are all big things that we need to be pushing, and in order to really do that we need to create experiences within the classroom. I can't stress that enough. That is what we as humans will be able to maximize within the classroom:
this real experience: social learning with other students, being able to display our capabilities, discussions, real discussions with emotion, role playing. All these things we should be maximizing in education to ensure that we remain relevant. The AI can simply present information; we're doing much more, because we are that human element. So keep that in mind and continue to develop it, because that's what we really need to do within education.

All right, I hope you got a lot out of this. This is a huge
deal for the improvement of AI and for continuing to move toward AGI. There are lots of new possibilities going into the new year. 2025 will be unbelievable. There's going to be lots going on with AI in robots, and this is another prime example: when a robot runs into novel situations, which it will many times, it'll be able to use these more advanced reasoning capabilities to do that much more. So again, next year, 2025, is the year of the robot. You have that to look
forward to as well. Lots going on. If you got a lot out of this, please like and share so that we can continue to develop our channel, and please comment. I really want to know what you think about all this. Is this a big deal? Is this going to change the way you do something? What are your thoughts on AI? I really want to know, so please share so that we can continue to develop our community inquiry. Thank you, and remember: learning is for life.