The new o1 model is a total game changer for all AI agent developers. To really understand why it's such a game changer, we first need to talk about System 1 and System 2 thinking, as described in one of my favorite books by Daniel Kahneman, Thinking, Fast and Slow.

So you know that feeling when you're talking to someone and it feels almost effortless, like you've been in this situation before? That's System 1 thinking at work. It's a fast, intuitive type of thinking that takes little to no effort and gives almost no sense of control. It uses our subconscious mind to operate based on patterns we already recognize; for example, we use it when we're speaking on the spot, like in a conversation with a friend. System 2 thinking, on the other hand, is more deliberate: it requires a lot more mental effort, consideration, and reasoning. This is what we use when we're preparing for a speech or solving a problem.

But how does this relate to AI? Well, as Andrej Karpathy, one of the engineers behind ChatGPT, said a few months ago, LLMs currently support only System 1 thinking. They simply spit out text based on familiar patterns they've learned before, without any consideration. This is why they can't come up with anything original or create anything new. Or they couldn't before, because what OpenAI has just done is introduce some aspects of System 2 thinking into large language models.

Previously, if an AI agent couldn't find a solution, it would just keep looping on the task forever, or it would have to ask a human for help. For example, sometimes a browsing agent gets stuck on a research task because it encounters a paywall, and then it just keeps looping forever trying to get around that paywall, when a much simpler approach is to go back a page and do another search. This is why, for all complex, nonlinear, multi-step workflows in our AI agency, we always add a way for users to supervise these agents. There are also other ways to mitigate these hallucinations, like adding hardcoded validation logic. However, the real game changer is that with these new models, agents can finally stop and think for themselves, so we don't have to worry about this nearly as much.
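To make that concrete, here's a minimal sketch of the kind of hardcoded guard we mean: a hard iteration cap plus a human-escalation fallback, so the agent can't loop on a paywall forever. The step helper here is a stub I'm making up for illustration, not part of any specific framework.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    done: bool
    output: str

MAX_STEPS = 10  # hard cap so the agent can't loop on a paywall forever

def run_step(task: str, history: list) -> StepResult:
    """Placeholder for one agent step (a tool call, a browse action, etc.)."""
    return StepResult(done=False, output="hit a paywall, retrying...")

def run_agent(task: str) -> str:
    history: list[StepResult] = []
    for _ in range(MAX_STEPS):
        result = run_step(task, history)
        history.append(result)
        if result.done:
            return result.output
    # Fallback: escalate to a human supervisor instead of looping forever
    return f"Agent stuck after {MAX_STEPS} steps on task {task!r}: needs human review"

print(run_agent("research competitor pricing"))
```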
This has huge implications for all AI agent developers. First, as highlighted in the McKinsey study "The State of AI in 2024", the primary reason enterprises are still barely using AI in their operations is inaccuracy. And obviously, the fear of inaccuracy comes from incorrect reasoning, not from AI just outputting undesirable text. You can always add a disclaimer that the AI might be wrong, and with Grok 2, I think people have finally started to distrust things on the internet in general, not just AI models. The biggest consequences of hallucinations occur when an AI model misuses an action, not when it just outputs text. For example, if it refunds a customer when the refund shouldn't have been made, that will cause a lot more trouble.
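As a sketch of what guarding such an action might look like, here's a hypothetical validation wrapper around a refund tool. The helper names, limits, and order structure are all stand-ins; the point is that risky actions get deterministic checks, and anything unusual gets escalated to a human before it executes.

```python
# Hypothetical hardcoded validation before an agent-triggered refund.
REFUND_LIMIT = 100.00  # anything above this gets escalated to a human

def issue_refund(order_id: str, amount: float) -> None:
    """Stand-in for the actual side effect (e.g. a payments API call)."""
    print(f"refunding ${amount:.2f} on order {order_id}")

def validated_refund(order: dict, amount: float) -> str:
    if amount > order["amount_paid"]:
        return "rejected: refund exceeds amount paid"
    if amount > REFUND_LIMIT:
        return "escalated: needs human approval"
    issue_refund(order["id"], amount)
    return "refunded"

order = {"id": "A-1042", "amount_paid": 59.99}
print(validated_refund(order, 25.00))  # refunded
print(validated_refund(order, 80.00))  # rejected: exceeds amount paid
```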
This is why this new model is not just smarter but, as OpenAI says, also safer.

Secondly, we can now cover significantly more complex, open-ended tasks. From our experience at our AI agency, the o1 model essentially does what we do with my agent framework: it adds planning, breaks down tasks, explores various options, and so on. But now, instead of you having to create and prompt multiple agents yourself, add chain-of-thought parameters, and so on, this new model does it for you out of the box, and it does it much better. So the agent creation process is not only more reliable but also more streamlined: you need fewer agents, less prompting, and less fine-tuning to accomplish a task.

Now, how did they do this? Well, in theory it's quite simple: they just trained an LLM on thoughts. Literally, they just showed these models how to think before speaking. That's it. In practice, however, it's definitely a lot more advanced than that. From what we know so far, OpenAI combined reinforcement learning with chain-of-thought synthetic data, which allows the model to evaluate its own thinking as it reasons. So it doesn't just output thoughts; it actually learns to recognize and correct its own mistakes, break down tricky steps into simpler ones, and try different approaches. What we know for a fact is that the chain-of-thought training data is the secret sauce here; this is why OpenAI is currently keeping it private. So exactly how they produce this data is still unclear, but I believe it's not all synthetic, because the few examples they shared actually include words like "Hmm" and "Interesting".
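OpenAI hasn't published that pipeline, so take this as a toy illustration only: one well-known way to build verified chain-of-thought data is rejection sampling, where you sample several reasoning traces, score them with a verifier, and keep only the ones that check out. Everything in this sketch is a made-up stand-in, not OpenAI's actual method.

```python
# Toy rejection-sampling sketch, NOT OpenAI's private pipeline.
import random

QUESTION = "What is 2 + 2?"

def sample_cot(question: str) -> tuple[str, str]:
    """Stand-in for an LLM sampling a reasoning trace plus a final answer."""
    answer = random.choice(["4", "5"])  # sometimes wrong on purpose
    trace = f"Hmm, let's see... 2 + 2 = {answer}"
    return trace, answer

def verifier(question: str, answer: str) -> bool:
    """Stand-in for a reward model / checker that scores the answer."""
    return answer == "4"

training_data = []
for _ in range(20):
    trace, answer = sample_cot(QUESTION)
    if verifier(QUESTION, answer):  # keep only traces with verified answers
        training_data.append({"question": QUESTION, "cot": trace})

print(f"kept {len(training_data)} of 20 sampled traces")
```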
So this is not just another technique like the structured outputs I talked about earlier; this is a whole new paradigm in AI. You see, before, 99% of compute in all AI models was spent on pre-training and post-training, with less than 1% spent on inference. Now OpenAI has significantly shifted this paradigm by dramatically increasing the inference-time compute, which is spent on reasoning tokens. This has been shown to significantly improve results, and not just by 2 to 5% like we've seen with all the other models this year: on some benchmarks that involve reasoning, it's been shown to improve performance eight times over compared to previous models. And it's not even GPT-5 or some new breed of model; as far as we know, it's the same foundational model as GPT-4o, only with this new additional layer on top. But best of all, as we increase the inference-time compute, the performance seems to just keep on increasing as well. So this definitely tells us that OpenAI is heading in the right direction and that reasoning models are the next big thing.

By the way, if you're using this model in ChatGPT right now, keep in mind that it's only a preview version with the temperature set to 1. That's why some people on Twitter are posting that it doesn't get the strawberry question right. In our agency, we always set the temperature to at least 0.3 when working on reasoning tasks.
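For reference, here's a minimal sketch of setting a lower temperature with the OpenAI Python SDK. One caveat: at launch, the o1-preview API only accepted the default temperature, so this applies to the standard GPT models.

```python
# Minimal sketch: lowering temperature for a reasoning-style task.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.3,  # lower temperature = more deterministic reasoning
    messages=[
        {"role": "user",
         "content": "How many times does the letter 'r' appear in 'strawberry'?"},
    ],
)
print(response.choices[0].message.content)
```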
OpenAI purposely released an imperfect preview model first so it can make mistakes, which they can then use to improve its reasoning even further.

Now, of course, there are certain limitations. The first major limitation is that this model is a lot more expensive. It's significantly more expensive because it generates a ton of reasoning tokens before it outputs a response, and you are charged for these tokens as completion tokens, not as input tokens. Not only that, but the output tokens themselves are also four times more expensive than in the previous version. So if you combine it all together, the total inference costs for this model are around 10 times higher than for GPT-4o. Many use cases simply don't make sense at this pricing.
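To make that arithmetic concrete, here's a back-of-the-envelope comparison. The per-million-token prices match the rates published around o1-preview's release, but the hidden reasoning-token count is just an assumption for illustration; the real number is reported per request in the API response under usage.completion_tokens_details.reasoning_tokens.

```python
# Back-of-the-envelope cost comparison (prices per 1M tokens, USD).
PRICES = {
    "gpt-4o":     {"input": 5.00,  "output": 15.00},
    "o1-preview": {"input": 15.00, "output": 60.00},
}

def cost(model, input_tokens, output_tokens, reasoning_tokens=0):
    p = PRICES[model]
    # Hidden reasoning tokens are billed at the *output* (completion) rate
    billable_output = output_tokens + reasoning_tokens
    return (input_tokens * p["input"] + billable_output * p["output"]) / 1_000_000

# Same request: 1k tokens in, 500 visible tokens out,
# plus an assumed ~1.5k hidden reasoning tokens for o1-preview
print(f"gpt-4o:     ${cost('gpt-4o', 1_000, 500):.4f}")
print(f"o1-preview: ${cost('o1-preview', 1_000, 500, reasoning_tokens=1_500):.4f}")
# -> roughly a 10x difference once reasoning tokens are counted
```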
The second limitation is that it takes much longer to respond: since it takes around 10 seconds to think, real-time applications like phone calls might not be feasible. And lastly, evaluators still prefer the standard GPT models for certain tasks like personal writing and editing, so this model is primarily designed for complex problem solving and reasoning, not for creative tasks. Overall, it's really interesting to see how these models are evolving over time to handle different types of tasks, much like how some people are better at math while others excel at creative tasks. So when deploying agents, we will soon also have to consider the right type of model for the right role.

Now, the question is: when should you use it, and when should you not? My answer is to use it the same way you use System 2 thinking. Whenever your agent needs to perform any kind of design work that does not have significant time constraints, use this model. Think about roles like managers or CEOs, who are typically more expensive because they need to guide the rest of the team. They kind of disappear once in a while to really think about the problem and scope the project, and only then do they come back with the tasks that they assign to other team members. They don't execute the tasks themselves, because that would be inefficient; they only provide high-level supervision, so they jump in and provide more guidance only when a team member is stuck or needs help.

This is essentially how Devin did it. If you watch their demo video, Devin breaks down complex problems into manageable steps and then executes them one at a time. The reasoning model determines the steps and the approach, while other, smaller models implement the code.
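Here's a minimal sketch of that planner/executor split: a reasoning model decides the steps, and a cheaper model implements each one. The model names and prompts are just assumptions to illustrate the pattern, not Devin's actual setup.

```python
# Planner/executor split: reasoning model plans, smaller model executes.
from openai import OpenAI

client = OpenAI()

def plan(task: str) -> list[str]:
    resp = client.chat.completions.create(
        model="o1-preview",  # reasoning model: determines steps and approach
        messages=[{"role": "user",
                   "content": f"Break this task into short numbered steps:\n{task}"}],
    )
    text = resp.choices[0].message.content
    return [line for line in text.splitlines() if line.strip()]

def execute(step: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # smaller, cheaper model: implements each step
        messages=[{"role": "user", "content": f"Implement this step:\n{step}"}],
    )
    return resp.choices[0].message.content

for step in plan("Build a CLI tool that renames files by date"):
    print(execute(step))
```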
So hopefully soon, with the release of this model in the Assistants API, everyone will be able to create their own Devin for any kind of role in a matter of days, or even faster. But that's not all: you can also use these models to assist you in creating agents and processing their knowledge, which I will show you in the next video.

So what does this all mean for the future? Well, if you thought the o1 model was groundbreaking, just wait until Orion arrives. I believe OpenAI is moving towards AI models that can take 10 to 20 minutes, or even more, maybe even days, to deeply consider a problem, conduct research with web browsing, other tools, and files as needed, and then return to you with a truly novel solution. For example, they could tackle a problem that humanity has struggled with for decades. This will ultimately lead to an intelligence explosion, as scientists will use these models to significantly accelerate their own research.

Now here's where it gets really interesting for AI agents. With this new reasoning capability, AI agents won't just follow the same instructions: they will reflect on their own performance, identify areas for improvement, and then adjust themselves accordingly.
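As a toy sketch of that reflect-and-adjust loop, here's one way it could look: run the agent, have a reasoning model critique the run, and fold the critique back into the instructions. The prompts and model choices are illustrative assumptions, not a production recipe.

```python
# Toy self-improvement loop: run, critique, revise instructions, repeat.
from openai import OpenAI

client = OpenAI()
QUERY = "Why was I charged twice this month?"
instructions = "You are a support agent. Answer billing questions concisely."

def run_agent(instructions: str, query: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": instructions},
                  {"role": "user", "content": query}],
    )
    return resp.choices[0].message.content

def critique_and_revise(instructions: str, query: str, answer: str) -> str:
    resp = client.chat.completions.create(
        model="o1-preview",  # reasoning model reflects on the performance
        messages=[{"role": "user", "content":
                   f"Instructions:\n{instructions}\n\nQuery: {query}\n"
                   f"Answer: {answer}\n\nRewrite the instructions so the agent "
                   f"handles this kind of query better. Reply with only the "
                   f"new instructions."}],
    )
    return resp.choices[0].message.content

for _ in range(3):  # a few self-improvement rounds
    answer = run_agent(instructions, QUERY)
    instructions = critique_and_revise(instructions, QUERY, answer)
print(instructions)
```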
This kind of self-awareness opens the door to exponentially self-improving AI agents. In the next few years, I'm sure we will see businesses that were created from just an idea and an initial set of instructions, built solely on AI agents that constantly self-improve and evolve over time. But that's not all: in the meantime, AI agents will stop being just simple assistants and will become strategic partners for businesses in all important decisions. They will analyze all of your company's data at once and come up with strategic insights that you could not have possibly foreseen on your own. This will, of course, also lead to a significant increase in enterprise adoption, because companies that were previously cautious due to inaccuracy might finally find these new models reliable enough.

More videos on the o1 model are definitely coming soon, so if you don't want to miss out, don't forget to subscribe.