OpenAI o1 and o1 pro mode in ChatGPT — 12 Days of OpenAI: Day 1


OpenAI

Sam Altman and some members of the OpenAI team introduce & demo o1 and o1 pro mode in ChatGPT and di...

Video Transcript:

Hello, welcome to the 12 Days of OpenAI. We're going to try something that, as far as we know, no tech company has done before: every weekday for the next 12, we are going to launch or demo some new thing that we built. We think we've got some great stuff for you, starting today. We hope you'll really love it. We'll try to make this fun and fast and not take too long, but it'll be a way to show you what we've been working on, and a little

holiday present from us. So we'll jump right into this first day. Today we actually have two things to launch. The first one is the full version of o1. We have been very hard at work and we've listened to your feedback: you like o1-preview, but you want it to be smarter and faster, to be multimodal, and to be better at instruction following, among a bunch of other things. So we've put a lot of work into this, and we think scientists, engineers, and coders will really love this new model. I'd like to

show you quickly how it performs. You can see the jump from GPT-4o to o1-preview across math, competition coding, and GPQA Diamond, and you can see that o1 is a pretty big step forward. It's also much better in a lot of other ways, but raw intelligence is something that we care about, and coding performance in particular is an area where people are using the model a lot. In just a minute these guys will demo some things about o1. They'll show you how it does at speed, how

it does at really hard problems, and how it does with multimodality. But first I want to talk for just a minute about the second thing we're launching today. A lot of people, power users of ChatGPT at this point, really use it a lot, and they want more compute than $20 a month can buy. So we're launching a new tier, ChatGPT Pro. Pro has unlimited access to our models and also things like Advanced Voice Mode. It also has a new thing called o1 pro mode. So o1 is the smartest

model in the world now, except for o1 being used in pro mode, and for the hardest problems that people have, o1 pro mode lets you do even a little bit better. You can see competition math, you can see GPQA Diamond, and these boosts may look small, but in complex workflows where you're really pushing the limits of these models, it's pretty significant. I'll show you one more thing about the pro mode. One thing that people really have said they want is reliability, and here you can

see how the reliability of an answer from pro mode compares to o1, and this is an even stronger delta. Again, for our Pro users, we've heard a lot about how much people want this. ChatGPT Pro is $200 a month and launches today. Over the course of these 12 days we have some other things to add to it that we think you'll also really love, but unlimited model use and this new o1 pro mode. So I want to jump right in and we'll show some of those demos that we talked about, and

these are some of the guys that helped build o1, with many other people behind them on the team. Thanks, Sam. Hi, I'm Hyung Won. I'm Jason. And I'm Max. We're all research scientists who worked on building o1. o1 is really distinctive because it's the first model we've trained that thinks before it responds, meaning it gives much better, and often more detailed and more correct, responses than other models you might have tried. o1 is being rolled out today to all Plus and soon-to-be Pro subscribers on ChatGPT, replacing o1-preview. The o1

model is faster and smarter than the o1-preview model, which we launched in September. After the launch many people asked about multimodal input, so we added that. Now the o1 model, live today, is able to reason through both images and text jointly. As Sam mentioned, today we're also going to launch a new tier of ChatGPT called ChatGPT Pro. ChatGPT Pro offers unlimited access to our best models like o1, 4o, and Advanced Voice. ChatGPT Pro also has a special way of using o1 called o1 pro mode. With o1 pro mode you

can ask the model to use even more compute to think even harder on some of the most difficult problems. We think the audience for ChatGPT Pro will be the power users of ChatGPT, those who are already pushing the models to the limits of their capabilities on tasks like math, programming, and writing. It's been amazing to see how much people are pushing o1-preview, and how much people who do technical work all day get out of this, and we're really excited to let them push it further. Yeah, sure. We also really think that

o1 will be much better for everyday use cases, not necessarily just really hard math and programming problems. In particular, one piece of feedback we received about o1-preview constantly was that it was way too slow: it would think for 10 seconds if you said hi to it, and we fixed that. It was really annoying. It was kind of funny, honestly; it really cared, it really thought hard about saying hi back. Yeah. And so we fixed that. o1 will now think much more intelligently: if you ask it a simple question, it'll respond really quickly,

and if you ask it a really hard question, it'll think for a really long time. We ran a pretty detailed suite of human evaluations for this model, and what we found was that it made major mistakes about 34% less often than o1-preview while thinking fully about 50% faster. We think this will be a really, really noticeable difference for all of you. I really enjoy just talking to these models. I'm a big history buff, and I'll show you a really quick demo of, for example, the sort of question that I might

ask one of these models. Right here on the left I have o1, on the right I have o1-preview, and I'm just asking it a really simple history question: list the Roman emperors of the second century, and tell me about their dates and what they did. Not hard, but GPT-4o actually gets this wrong a reasonable fraction of the time. So I've asked o1 this, I've asked o1-preview this. I tested this offline a few times and I found that o1 on average responded about 60% faster than o1-preview.

This could be a little bit variable, because right now we're in the process of swapping all our GPUs from o1-preview to o1. So actually o1 thought for about 14 seconds; o1-preview is still going. There's a lot of Roman emperors. There's a lot of Roman emperors, yeah. 4o actually gets this wrong a lot of the time. There are a lot of folks who ruled for like 6 days, 12 days, a month, and it sometimes forgets those. Can you do them all from memory, including the six-day people? No. Yep, so here we go.

o1 thought for about 14 seconds, preview thought for about 33 seconds. These should both be faster once we finish deploying, but we wanted this to go live right now. Exactly. So yeah, we think you'll really enjoy talking to this model. We found that it gave great responses, it thought much faster, and it should just be a much better user experience for everyone. One other feature we know that people really wanted for everyday use cases, and that we've had requested a lot, is multimodal inputs and image understanding, and Hyung Won is going to talk about

that now. Yep. To illustrate the multimodal input and reasoning, I created this toy problem with some hand-drawn diagrams and so on. So here it is. It's hard to see, so I already took a photo of this, so let's look at this photo on a laptop. Once you upload the image into ChatGPT, you can click on it to see the zoomed-in version. This is a system of a data center in space. Maybe in the future we might want to train AI models in space.

I think we should do that, but the power number looks a little low. One gigawatt? Okay, but the general idea... Rookie numbers in this. Rookie numbers. Rookie, okay. Yeah, so we have a sun right here, taking in power on this solar panel, and then there's a small data center here. It's exactly what they look like. Yeah, GPU racks, and then a pump, nice pump here. One interesting thing about operation in space is that on Earth we can do air cooling or water cooling to cool the GPUs, but in space there's nothing

there, so we have to radiate this heat into deep space, and that's why we need this giant radiator cooling panel. This problem is about finding the lower-bound estimate of the cooling panel area required to operate this 1 gigawatt data center. Probably going to be very big. Yeah, let's see how big it is. So that's the problem, and going to this prompt, yeah, this is essentially asking for that. So let me hit go, and the model will think for seconds. By the way, most people

don't know, I've been working with Hyung Won for a long time. Hyung Won actually has a PhD in thermodynamics, which is totally unrelated to AI, and you always joke that you haven't been able to use your PhD work in your job until today. So you can trust Hyung Won on this analysis. Finally, finally. Thanks for hyping it up; now I really have to get this right. Okay, so the model finished thinking, only 10 seconds. It's a simple problem, so let's see how the model did. Power input: first of all, this

1 gigawatt, that was only drawn on the paper, so the model was able to pick that up nicely. And then radiative heat transfer only, that's the thing I mentioned, so in space nothing else. And then some simplifying choices. One critical thing is that I intentionally made this problem underspecified, meaning that the critical parameter, the temperature of the cooling panel, I left out so that we can test the model's ability to handle ambiguity and so on. The model was able to recognize that this

is actually an unspecified but important parameter, and it actually picked the right range of temperature, which is about room temperature. With that it continues to the analysis, does a whole bunch of things, and then found the area, which is 2.42 million square meters. Just to get a sense of how big this is, this is about 2% of the land area of San Francisco. This is huge. Not that bad, not that bad. Yeah. Oh, okay. Yeah, so I guess this is reasonable. I'll skip through the rest

of the details, but I think the model did a great job making nice, consistent assumptions that make the required area as little as possible. So this is the demonstration of multimodal reasoning. This is a simple problem, but o1 is actually very strong, and on standard benchmarks like MMMU and MathVista, o1 actually has state-of-the-art performance. Now Jason will showcase the pro mode. Great. So I want to give a short demo of ChatGPT o1 pro mode. People will find o1 pro

mode the most useful for, say, hard math, science, or programming problems. Here I have a pretty challenging chemistry problem that o1-preview usually gets incorrect, and so I will let the model start thinking. One thing we've learned with these models is that for these very challenging problems, the model can think for up to a few minutes. I think for this problem the model usually thinks anywhere from 1 minute up to 3 minutes, and so we have to provide some entertainment for people while the model is thinking. So I'll describe

the problem a little bit, and then if the model's still thinking when I'm done, I've prepared a dad joke for us to fill the rest of the time. So I hope it thinks for a long time. You can see the problem asks for a protein that fits a very specific set of criteria. There are six criteria, and the challenge is that each of them asks for pretty chemistry-domain-specific knowledge that the model would have to recall. The other thing to know about this problem is that none

of these criteria actually gives away what the correct answer is. For any given criterion there could be dozens of proteins that might fit that criterion, and so the model has to think through all the candidates and then check if they fit all the criteria. Okay, so you can see the model actually was faster this time: it finished in 53 seconds. You can click and see some of the thought process that the model went through to get the answer. You can see it's thinking about different candidates, like neuroligin initially,

and then it arrives at the correct answer, which is retinoschisin, which is great. Okay, so to summarize: we saw from Max that o1 is smarter and faster than o1-preview, we saw from Hyung Won that o1 can now reason over both text and images, and then finally we saw that with ChatGPT pro mode you can use o1 to reason about the hardest science and math problems. Yep, there's more to come for the ChatGPT Pro tier. We're working on

even more compute-intensive tasks to power longer and bigger tasks for those who want to push the model even further, and we're still working on adding tools to the o1 model, such as web browsing, file uploads, and things like that. We're also hard at work to bring o1 to the API. We're going to be adding some new features for developers: structured outputs, function calling, developer messages, and API image understanding, which we think you'll really enjoy. We expect this to be a great model for developers and to really unlock a whole new frontier

of agentic things you guys can build. We hope you love it as much as we do. That was great, thank you guys so much. Congratulations to you and the team on getting this done. We really hope that you'll enjoy o1 and pro mode, or the Pro tier. We have a lot more stuff to come. Tomorrow we'll be back with something great for developers, and we'll keep going from there. Before we wrap up, can we hear your joke? Yes. So I made this joke this morning. The

joke is this: Santa was trying to get his large language model to do a math problem, and he was prompting it really hard, but it wasn't working. How did he eventually fix it? No idea. He used reindeer-forcement learning. Thank you very much. Thank you.
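
Note on the cooling-panel estimate from the multimodal demo: the transcript gives only the inputs (a 1 gigawatt data center, radiative heat transfer only, a panel at about room temperature) and the result (2.42 million square meters, about 2% of San Francisco's land area). The sketch below shows one way such a lower bound can be computed with the Stefan-Boltzmann law, A = P / (εσT⁴). The emissivity of 0.9, the panel temperature of 300 K, the single radiating side, the neglect of absorbed sunlight, and the San Francisco land area of roughly 121 km² are all assumptions chosen for illustration; the video does not state which values the model actually used.

```python
# Lower-bound radiator area for rejecting heat purely by radiation.
# All default parameter values are illustrative assumptions, not values
# confirmed by the transcript.

STEFAN_BOLTZMANN = 5.670374419e-8  # W / (m^2 K^4)


def radiator_area_m2(power_w, panel_temp_k, emissivity=0.9, sink_temp_k=0.0):
    """Area needed to radiate `power_w` from one side of a panel.

    Uses A = P / (eps * sigma * (T_panel^4 - T_sink^4)). Treating deep
    space as a 0 K sink and ignoring absorbed sunlight makes this a
    lower bound on the real area.
    """
    net_flux = emissivity * STEFAN_BOLTZMANN * (panel_temp_k**4 - sink_temp_k**4)
    return power_w / net_flux


POWER_W = 1e9              # 1 gigawatt, as drawn in the demo diagram
PANEL_TEMP_K = 300.0       # "about room temperature" (assumed exact value)
SF_LAND_AREA_M2 = 121e6    # approximate land area of San Francisco (assumption)

area = radiator_area_m2(POWER_W, PANEL_TEMP_K)
percent_of_sf = 100 * area / SF_LAND_AREA_M2

print(f"Radiator area: {area / 1e6:.2f} million m^2")
print(f"Share of San Francisco land area: {percent_of_sf:.1f}%")
```

Under these assumptions the result lands close to the figures quoted in the demo. Because the area scales as 1/T⁴, the panel temperature dominates the answer, which is why leaving it unspecified was a meaningful test of how the model handles ambiguity.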