Building OpenAI o1

OpenAI
Top row (left to right): Mark Chen, Giambattista Parascandolo, Trapit Bansal, Łukasz Kaiser, Hunter ...
Video Transcript:
We're starting a series of new models with the new name o1, and this is to highlight the fact that you might feel different when you use o1 compared to previous models such as GPT-4o. As others will explain later, o1 is a reasoning model, so it will think more before answering your question. We are releasing two models: o1-preview, which is to preview what's coming for o1, and o1-mini, which is a smaller, faster model trained with a similar framework as o1. So we hope you like our new naming scheme, o1.

So what is reasoning, anyway? One way of thinking about reasoning is that there are times when we ask questions and need answers immediately, because they're simple questions. For example, if you ask what's the capital of Italy, you know the answer is Rome, and you don't really have to think about it much. But if you wonder about a complex puzzle, or you want to write a really good business plan, or you want to write a novel, you probably want to think about it for a while, and the more you think about it, the better the outcome. So reasoning is the ability to turn thinking time into better outcomes, whatever the task you're doing.

This work has been going on for a long time, but I think what's really cool about research is that there's that aha moment, that particular point in time where something surprising happens and things really click together. Were there any times for you all when you had that aha moment?

For me, the first moment was when the model was hot off the press. We started talking to it, and people were like, wow, this model is really great. I think there was a certain moment in our training process where we put more compute into RL than before and trained a model that generated coherent chains of thought, and we saw, wow, this looks like something meaningfully different than before. I think, for me, that was the moment.

Related to that: when we think about training a model for reasoning, one thing that immediately jumps to mind is that you could have humans write out their thought process and train on that. An aha moment for me was when we saw that if you train the model using RL to generate and hone its own chains of thought, it can do even better than having humans write chains of thought for it. That was an aha moment, realizing that you could really scale this and expand models' reasoning that way.

For a lot of the time that I've been here, we've been trying to make the models better at solving math problems, as an example. We've put a lot of work into this and come up with a lot of different methods, but every time I would read the outputs from the models, I'd always be so frustrated that the model never seemed to question what was wrong or notice when it was making mistakes. But with one of these early o1 models, when we trained it and actually started talking to it, asking it these questions, it was scoring higher on the math tests we were giving it. We could look at how it was reasoning, and you could just see that it started to question itself and have really interesting reflection. That was a moment for me where I thought, wow, we've uncovered something different; this is going to be something new. It was just one of these coming-together moments that was really powerful.

Thank you, and congrats on releasing this.
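The idea described above — that reinforcement learning can reward a model for producing its own useful chains of thought, rather than imitating human-written ones — can be illustrated with a deliberately tiny sketch. Everything here is hypothetical and not OpenAI's actual method: a two-action toy policy ("answer directly" vs. "produce a chain of thought"), simulated success probabilities, and plain REINFORCE stand in for the real, unpublished training setup. The point is only the shape of the loop: sample a behavior, reward the outcome, and reinforce whatever behavior earned reward.

```python
import math
import random

random.seed(0)

# Toy setup: the "model" either answers directly or thinks step by step
# first. Rewards are simulated: chains of thought succeed more often.
# These probabilities are invented for illustration.
ACTIONS = ["direct_answer", "chain_of_thought"]
SUCCESS_PROB = {"direct_answer": 0.4, "chain_of_thought": 0.9}

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def train(steps=2000, lr=0.1):
    logits = [0.0, 0.0]  # one policy parameter per action
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a behavior from the current policy.
        a = random.choices(range(len(ACTIONS)), weights=probs)[0]
        # Reward only the final outcome, not the thought process itself.
        reward = 1.0 if random.random() < SUCCESS_PROB[ACTIONS[a]] else 0.0
        # REINFORCE: raise the log-probability of rewarded behavior.
        for i in range(len(logits)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return logits

probs = softmax(train())
print(dict(zip(ACTIONS, [round(p, 3) for p in probs])))
```

After training, the policy strongly prefers thinking before answering, because that behavior earns reward more often. The design choice worth noting is that the reward touches only the outcome; the "chain of thought" itself is never directly supervised, which is what lets this kind of loop discover thought processes no human wrote down.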