Introduction to Deep Research

OpenAI
Mark Chen, Josh Tobin, Neel Ajjarapu, and Isa Fulford introduce and demo Deep Research from Tokyo.
Video Transcript:
hi everyone my name is Mark and I lead research at OpenAI today I'm joined by Isa and Josh from our research team and also Neel from our product team do you guys notice anything strange hm yeah it looks a little different well it's because we're here in Tokyo so hello from Tokyo everyone the reason we're here is later on we're going to do a special event with one of our close partners but this stream is about our next agentic offering I want to first talk about agents as they relate to OpenAI so OpenAI cares about
agents because we believe that they're going to transform knowledge work we think that they're going to help enterprises streamline their processes make workers more productive but it'll also be really really important for consumers so last year we launched o1 which was the first model in our o-series of reasoning models these are models that differ from traditional models in that they think for a long time before coming up with an answer and usually the longer they think the better the answer that they come up with one of the limitations however of these models is that
they don't have access to tools and one of the really core missing tools is the ability to browse the internet what this means is that a lot of the things that we use in everyday life are now not accessible to the model so we'd like to announce a next big step we are introducing a capability called deep research what is deep research deep research is a model that does multi-step research on the internet and what it does is it discovers content it synthesizes content and it reasons about this content adapting its plan as it
uncovers more and more information so one important feature of deep research why we call it deep research instead of just research is that we've removed latency constraints from the model typically models return fairly quickly but deep research models they can take five even 30 minutes before they come back with an answer and we think this is a good thing not a bad thing we think it's important for our models to start doing autonomous tasks for much longer in an unsupervised way and this is core to our AGI road map as well I think our ultimate
aspiration is a model that can uncover and discover new knowledge for itself and the first step here is a model that can go and synthesize and understand the models uh sorry the information on the web what you get from deep research is a comprehensive fully cited research paper essentially something that an analyst or an expert in a field might produce for you now we've talked about usages for knowledge work but there are also many usages for other things um that require extensive web browsing for instance often times you may be searching for something very very
specific right um this also takes a lot of manual labor on the internet um you know you might want a specific item that you're shopping for with all these constraints that are tailored for your personal use so it's also very good for that I've also personally used deep research for putting together content for slides that I've used in a presentation so very very good across the board across a variety of different use cases finally I'm happy to announce that deep research is launching in Pro later today we're going to soon roll it out to Plus
and Team and then after that to Education and Enterprise to show you how deep research works we have Neel sure thanks Mark so deep research is in ChatGPT later today very excited to show you all how to use it deep research is accessible from a button right here in the beginning of ChatGPT and from here you can immediately put in any query and it's going to send it off to deep research uh I'm a PM at OpenAI and one of the things that we like to think about is what new features and
products should we build uh one of the things we've been tossing around is should we build a new language translation app and so this is something that I can ask deep research to go and research for me so I'm actually going to type in this query um I want to learn a little bit more about all the different markets that I could go off and target so I'm asking deep research help me find iOS and Android adoption rates the percent of folks who want to learn another language and the change in mobile penetration over the past
couple years and give me that difference between the top developed countries and developing countries and I also really want this information in a formatted uh report with some tables and a clear recommendation on what the best emerging opportunities are for ChatGPT so this is a query that would have taken me hours to put together but with deep research I can just immediately kick it off is this your actual uh side project at OpenAI this is my side hustle when I'm not working on deep research uh so what you'll first see is that deep research comes
back with a set of clarifying questions this is just like a PM just like uh this is super important because if deep research is going on for 5 to 30 minutes you really want to get those requirements right and so there's a couple questions that it's giving to us right now you know how do you want mobile penetration uh set up do you want overall adoption rates or specific categories should the percentage of folks uh be on general interest or really engaged interest these are really good questions that you'd expect an analyst to want to ask you
when you're giving them a really tough prompt and so it's really important that you can capture these up front so I might answer something along the lines of you know I want to look at this as a you know give me penetration as a percentage of users and look at overall usage and then make your best assumptions on the rest you know the model is really good at taking information that's sometimes specified and a little bit more open-ended and using that to go off on a mission and get all the information that you need so
you can see right now deep research has taken all of that and synthesized it and started kicking off its own research process deep research is really good across a number of different knowledge work domains and so we've seen folks being able to use it for market research for different academic uh you know areas across physics you know Computer Science Biology um I've been using it myself to try to PM a little bit on the side um and we're really hopeful that it'll be useful for you too at work and so what you'll see over here
is deep research pops open a little sidebar and it shows you all the reasoning that it's doing so you can see right now it's identifying you know the top countries it's gathering information and it's starting its process of searching for different information so zooming in over here you've seen that deep research is searching for information opening Pages reasoning about what it's seeing under the hood what's actually happening is that the model is conducting searches quite literally opening and browsing the pages and looking through all the different components including images tables PDFs and pulling out all
of that information and using that to determine what it does next and it's really cool here you can see it's uh using the information from one search to inform what it searches for in the next step yeah super cool it's fun to just watch and follow it along sometimes all right cool well while we wait for this one I'll hand over to Josh to show us a different way of how deep research can work thanks yeah so we've talked a lot about uh deep research for knowledge work and that's one of the use cases that
we're really excited about for this but it's not just for uh doing your job better it's also useful for things that you might want to do for fun or at home um um so one thing I really like to use deep research for is to do research on products that I might want to buy especially for like larger purchases where like for me if I'm buying something expensive I often will like read every page about it on the internet and um you know I want if there's like some review somewhere that's on the internet I
want to make sure that I've taken into consideration before I actually make the purchase so uh we're here in Japan and um I've heard the skiing is pretty good this time of year um but we planned this trip a little bit last minute um so I didn't actually bring my skis and I'm wondering if I can actually maybe just uh buy some skis and you know take a little bit of a ski vacation at the end of this so I want to buy some skis um for uh for skiing in Japan and um I one
thing I like to do also is um specify how deep research can format the output so format this as a report with a nice table at the end um and just like in Neel's example um this is going to come back with some questions that I can choose to answer or not answer so I'll say advanced skier um all-mountain but powder sometimes um I've heard that powder is pretty good here hopefully we'll get lucky later this week um I'm tall uh so need long skis um long skis and um let's do something more fun
like maybe um I guess it'd be really cool to have like a nice color palette so how about something with uh with a nice color palette and so I'll kick this off and um just like in Neel's example deep research will go off and do a bunch of research on uh different websites on the internet and hopefully come back with some good recommendations um so I'll hand it off to Isa to explain how this all works under the hood sounds good so deep research is powered by a fine-tuned version of our soon-to-be-released o3
reasoning model and we trained it using end-to-end reinforcement learning on hard browsing and other reasoning tasks through that training the model learned to plan and execute a multi-step trajectory um reacting to real-time information and backtracking when necessary the final model is able to browse over user uploaded files it's also able to use a Python tool for calculations and for creating um images and um plots and then it can also actually embed those plots in its final response it's also able to embed images from websites in its final response and when it
cites its sources it actually cites specific sentences and passages the resulting model is able to complete tasks that would take humans many hours and that are quite complex and it also reaches new highs on a number of both public and private evaluations on Humanity's Last Exam a recently released benchmark from the Center for AI Safety and Scale AI which tests the model's capabilities across a range of expert subjects the deep research model reaches a new high of 26.6% accuracy that's super impressive last final exam too um this task consists of around 3,000
short answer and multiple choice questions across a range of around 100 different subjects and um it's actually really cool if you see the trajectories and thinking process of the model because it's actually very similar to how a human would solve a problem so if I was given a really hard problem I'd probably do some online research to try and help me figure out the answer and we've seen examples for example in physics where the model has to answer some hard calculation it will look up an equation in an existing scientific paper
and use that for um answering the question or in a poetry example the model had to identify a very niche um poetic meter for a new poem and so we saw it looking up examples of other existing poems and trying to use that to help it reason through um how to get to the answer on another benchmark um GAIA that measures models' agentic capability and requires web browsing multimodal capability code execution um reasoning over files um the model also reaches a new high on all three levels of difficulty we've also put together
some internal benchmarks that are pretty broad-based can you talk about those yeah for sure so we also put together some um expert level internal evals and we have a range of tasks that experts would do in their in their jobs and we had the deep research model answer them and then had the experts rate the responses and the model was able to um complete tasks that the experts said would have taken them hours and very you know a lot of manual investigation so we have two graphs to illustrate this so we have on the
left pass rates um for different estimated economic value ranges and then on the right we have pass rate for different ranges of number of hours to complete a task and pass rate is the rate at which the model provides a satisfactory answer to an expert level task as rated by that expert so what's interesting from these graphs is that um pass rate is more correlated with estimated economic value than it is with estimated number of hours to complete the task which shows us that um the things that the model finds difficult aren't
necessarily the same things that humans find time consuming so this graph is pass rate on these expert level tasks against the maximum number of tool calls and what this shows us is that as the model is able to spend more time thinking and browsing the performance increases and this is really important because as you know Mark described we're moving towards a world where agents are going to be able to um take longer and longer and complete harder and harder tasks and so if we give them more time to think and more time to use these
tools they should be able to solve harder tasks and then one final internal evaluation is um a hallucination evaluation and this model actually performs the best on that eval um of any model we've released however um it's still possible that it will hallucinate so when you're making reports make sure to check the sources yourself yes and so as we mentioned the deep research model can take a really long time to respond so we generated some examples this morning um to show you the range of different things it can do and so we can look through
some of them now nice super long very very long we solved the scrolling problem okay so this is a finance example so I'm an investment analyst in a Silicon Valley VC firm I want to analyze the market for civilian supersonic air travel and prepare a thorough investment memo and then many other specifications and so the model clarifies and we provided some um additional requirements for the memo and then the model kicked off the task and as you can see it went and researched for S minutes used 12 different sources and then came back to
us with quite a comprehensive report of the field and you can imagine if you were doing this for your job um this would be quite helpful to bootstrap your research as you're doing your initial investigation yeah hopefully uh hopefully this works and uh next time we have to come to Japan we'll be a little bit less jet-lagged with supersonic so here's another example um it's a biology example so we uploaded a paper and we want to find other papers on the same topic um this was actually um a task from one of
our friends at OpenAI who's very um advanced in biology so not going to pretend to understand exactly what this says but we wanted to show the range weren't paying attention in biology we knew SM we want to show the range of things it can do so it asked some clarifications we followed up and then um this task the model took quite a long time and it was able to um find a bunch of different papers that are on the same topic and when we showed this to our friend he said that it was a pretty
good response so um was a good vote of confidence for the model and then one final example here okay so I'm sure everyone's had this moment where you can't remember the name of the restaurant that you went to in Tokyo 10 years ago or the name of the TV show that you're looking for and so this example might seem a bit contrived but we wanted to show how good the model is at finding those needle in a haystack pieces of information so the prompt is there's a TV show that I watched a while
ago I forgot the name but I do remember what happened in one of the episodes can you help me find the name here is what I remember in one of the episodes two men play poker one Folds after another tells him to bet and then a little bit more detail about the story and the only additional information we were able to provide was I think it was 5 to 10 years ago but I'm not really sure and the model is able to do um online research and figure out like through reading a bunch of different
sites and reasoning about the contents of those sites the actual TV show episode that we're thinking of which is pretty cool was that the right answer was that the one that is a TV show um so now hand back to Neel and Josh to check in on the task that you guys kicked off at the beginning yeah absolutely thanks Isa so we'll take a look at the original task over here it actually looks like the task is still going on right now but in the meantime while we've kicked it off it's already looked at
29 different sources and gone through a lot of different information oh wow okay perfect great timing incredible timing great so deep research just put together its full analysis it took us 11 minutes and in that process it looked at 29 different sites really in depth and as you can see live on this live stream it gave us a perfectly formatted report here you can see the mobile market analysis for mobile adoption and language learning we got a nice introduction our different adoption Trends everything put together in a really great uh report style where you can
see mobile penetration over time and a ton of different data and as you go down you can see it not only has information over here but also different uh um table formats and ways that it's presented the data in a way that's super digestible so one of the other things that's really cool about this model is that you're able to click in and see all the different sources that it's able to cite uh over here you can see every uh citation that the model's encountered and also different sites that it might have encountered that
it you know didn't necessarily put into the final output but it wants to let you know that it uh found along the way yeah um awesome well great check in on the skis let's check in on the skis all right um so um scrolling up here what I like about this is wow this did a lot of research um this is the kind of thing that I would probably like have to spend all afternoon just you know you know for for my own sanity to make a good purchase to read every single thing that's written
about it um but this does a pretty good job of actually just doing like hitting all the sites that I would hit and consolidating this all in a format that's a lot easier to digest than you know doing my own searches and um it also provides a table at the bottom here that just gives kind of like the high level comparison across uh the specific things that I I mentioned that I wanted uh for for this purchase um we I find that deep research works really well if you're like very very specific about um the
the type of answer that you're looking for both in terms of what information that you want um what comparisons you want to see and uh and anything about the format that you want the final output in because the model is able to take all of those things into consideration and um and uh and think about them all as it does its searches and puts together its final report so uh this sort of passes the sniff test for me because the top recommendation here is actually the skis that I own at home which is kind of cool
that I'll uh I'll take a closer look at this and um and uh maybe maybe plan a little bit of a ski trip after this all right um let's go this weekend yeah so yeah as you can imagine um there's a lot more that we can do with this technology um so I'll hand it back over to Mark to talk through where we're going with this awesome yeah just to recap deep research is available later today on Pro and we're soon going to bring it to desktop and mobile but again what we're launching today is
just scratching the surface of what you can imagine us doing with deep research today we have a deep research agent that browses the web but you can imagine that same deep research agent connecting to custom context right or really just kind of Enterprise storages of data again deep research is important to our AGI road map we believe in agents that think longer and longer more autonomously to solve very difficult tasks and we believe that this you know the ability to work on a task for 30 minutes really does motivate a lot more compute investment so
we're excited to see what you guys do and um please share with us thank you so much