AI Won’t Plateau — if We Give It Time To Think | Noam Brown | TED

TED
To get smarter, traditional AI models rely on exponential increases in the scale of data and computi...
Video Transcript:
The incredible progress in AI over the past five years can be summarized in one word: scale. Yes, there have been algorithmic advances, but the frontier models of today are still based on the same Transformer architecture that was introduced in 2017, and they are trained in a very similar way to the models that were trained in 2019. The main difference is the scale of the data and compute that goes into these models. In 2019, GPT-2 cost about $5,000 to train. Every year since then, for the past five years, the models have gotten bigger and been trained for longer on more data, and every year they've gotten better.

But today's frontier models can cost hundreds of millions of dollars to train, and there is a reasonable concern among some that AI will soon plateau, or hit a wall. After all, are we really going to train models that cost hundreds of billions of dollars? What about trillions of dollars? At some point, the scaling paradigm breaks down. This is, in my opinion, a reasonable concern, and in fact it's one that I used to share. But today I am more confident than ever that AI will not plateau.
In fact, I believe we will see AI progress accelerate in the coming months. To explain why, I want to tell a story from my time as a PhD student. I started my PhD in 2012, and I was lucky to be able to work on the most exciting project I could imagine: developing AIs that could learn on their own how to play poker. I had played a lot of poker in high school and college, so for me this was basically my childhood dream job.

Now, contrary to its reputation, poker is not just a game of luck; it's also a game of deep strategy. You can think of it like chess with a deck of cards. When I started my PhD, there had already been several years of research on how to make AIs that play poker, and the general feeling among the research community was that we had figured out the paradigm, and now all we needed to do was scale it. So every year we would train larger poker AIs for longer on more data, and every year they would get better, just like today's frontier language models.
By 2015, they had gotten so good that we thought they might be able to rival the top human experts. So we challenged four of the world's top poker players to an 80,000-hand poker competition, with $120,000 in prize money to incentivize them to play their best. Unfortunately, our bot lost by a wide margin; in fact, it was clear even on day one that our bot was outmatched.

But during this competition, I noticed something interesting. Leading up to the competition, our bot had played almost a trillion hands of poker, over thousands of CPUs, for about three months. But when it came time to actually play against these human experts, the bot acted instantly: it took about 10 milliseconds to make a decision, no matter how difficult it was. Meanwhile, the human experts had only played maybe 10 million hands of poker in their lifetimes, but when they were faced with a difficult decision, they would take the time to think. If it was an easy decision, they might think for only a couple of seconds; if it was a difficult decision, they might think for a few minutes. They would take advantage of the time that they had to think through their decisions.

In Daniel Kahneman's book Thinking, Fast and Slow, he describes this as the difference between System 1 thinking and System 2 thinking. System 1 thinking is the faster, more intuitive kind of thinking that you might use, for example, to recognize a friendly face or laugh at a funny joke. System 2 thinking is the slower, more methodical thinking that you might use for things like planning a vacation, writing an essay, or solving a hard math problem.
After this competition, I wondered whether this System 2 thinking might be what was missing from our bot, and might explain the difference in performance between our bot and the human experts. So I ran some experiments to see just how much of a difference this System 2 thinking makes in poker, and the results that I got blew me away. It turned out that having the bot think for just 20 seconds in a hand of poker got the same boost in performance as scaling up the model by 100,000x and training it for 100,000 times longer. Let me say that again: spending 20 seconds thinking in a hand of poker got the same boost in performance as scaling up the size of the model and the training by 100,000x.

When I got this result, I literally thought it was a bug. For the first three years of my PhD, I had managed to scale up these models by 100x. I was proud of that work; I had written multiple papers on how to do that scaling. But I knew pretty quickly that all of that would be a footnote compared to just scaling up System 2 thinking. So, based on these results, we redesigned the poker AI from the ground up.
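To make the System 1 versus System 2 distinction concrete, here is a minimal sketch in Python. It is not the actual poker AI from this research (that system relied on poker-specific search algorithms); it only illustrates the general shape of the idea: an instant, one-shot guess versus an anytime procedure whose answer improves with its thinking budget. All names in it are hypothetical:

```python
import random
import time

# Toy illustration of "thinking longer" as anytime computation.
# NOT the poker system from the talk; just the general pattern of a
# fixed "System 1" snap judgment versus a "System 2" estimate that
# keeps improving as long as it is allowed to run.

def true_value(action: int) -> float:
    """Hidden expected payoff of each action (unknown to the agent)."""
    return [0.10, 0.25, 0.15][action]

def noisy_rollout(action: int) -> float:
    """One simulated playout: the true payoff plus heavy noise."""
    return true_value(action) + random.gauss(0.0, 1.0)

def system1(actions) -> int:
    """Instant decision: a single noisy sample per action."""
    return max(actions, key=noisy_rollout)

def system2(actions, think_seconds: float) -> int:
    """Anytime decision: keep averaging rollouts until time runs out."""
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    deadline = time.monotonic() + think_seconds
    while time.monotonic() < deadline:
        for a in actions:
            totals[a] += noisy_rollout(a)
            counts[a] += 1
    return max(actions, key=lambda a: totals[a] / counts[a])

if __name__ == "__main__":
    actions = [0, 1, 2]  # action 1 is best, but the rollouts are noisy
    print("System 1 pick:", system1(actions))
    print("System 2 pick (0.5 s of thinking):", system2(actions, 0.5))
```

Run repeatedly, the snap judgment picks the best action barely better than chance, while the half-second thinker is almost always right; the only thing that changed is the compute spent at decision time.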
Now we were focused on scaling up System 2 thinking in addition to System 1. And in 2017, we again challenged four of the world's top poker pros, this time to a 120,000-hand poker competition with $200,000 in prize money. And this time, we beat all of them by a huge margin.

This was a huge surprise to everybody involved. It was a huge surprise to the poker community, it was a huge surprise to the AI community, and honestly, it was a huge surprise even to us. I literally did not think it was possible to win by the kind of margin that we won by. In fact, I think what really highlights just how surprising this result was is that when we announced the competition, the poker community decided to do what they do best and gamble on who would win. When we announced the competition, the betting odds were about 4 to 1 against us. After the first three days of the competition, when we had won for the first three days, the betting odds were still about 50/50. But by the eighth day of the competition, you could no longer gamble on which side would win; you could only gamble on which human would lose the least by the end.

This pattern of AI benefiting from thinking for longer is not unique to poker; in fact, we've seen it in multiple other games as well. For example, in 1997, IBM created Deep Blue, an AI that plays chess, and they challenged the world champion, Garry Kasparov, to a match and beat him, in a landmark achievement for AI. But Deep Blue didn't act instantly; Deep Blue thought for a couple of minutes before making each move.
Similarly, in 2016, DeepMind created AlphaGo, an AI that plays the game of Go, which is even more complicated than chess. They too challenged a world champion, Lee Sedol, and beat him, in a landmark achievement for AI. But AlphaGo also didn't act instantly; AlphaGo took the time to think for a couple of minutes before making each move. In fact, the authors of AlphaGo later published a paper in which they measured just how much of a difference this thinking time makes for the strongest version of AlphaGo. What they found is that when AlphaGo had the time to think for a couple of minutes, it would beat any human alive by a huge margin, but when it had to act instantly, it would do much worse than top humans.

In 2021, a paper was published that tried to measure just how much of a difference this thinking time makes, a bit more scientifically. In it, the authors found that in these games, scaling up thinking time by 10x was roughly equivalent to scaling up the model size and training by 10x. So you have this very clear, clean relationship between scaling up System 2 thinking time and scaling up System 1 training.
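To make the arithmetic of that exchange rate explicit, here is a minimal sketch, assuming the rough 1:1 relationship reported for games (the constants surely vary by domain; the poker numbers below are the ones quoted earlier in the talk):

```python
# Back-of-the-envelope sketch of the exchange rate described above,
# assuming the games result: a 10x increase in thinking time buys
# about the same strength as a 10x increase in model size and
# training compute. This just makes the arithmetic of the claim explicit.

def equivalent_train_scaleup(think_scaleup: float) -> float:
    """Training-compute multiplier matched by a thinking-time multiplier,
    under the assumed 1:1 exchange rate from the games paper."""
    return think_scaleup

# The poker anecdote: going from ~10 ms (acting instantly) to 20 s of
# thinking is a 2,000x increase in test-time compute, yet the talk
# reports it matched ~100,000x of training scale-up, an exchange rate
# about 50x better than 1:1 in that domain.
think_scaleup = 20.0 / 0.010
print(f"Thinking-time scale-up: {think_scaleup:,.0f}x")
print(f"Matched training scale-up at 1:1: {equivalent_train_scaleup(think_scaleup):,.0f}x")
print(f"Reported poker result: 100,000x ({100_000 / think_scaleup:.0f}x better than 1:1)")
```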
Now, why does this matter? Well, remember, I mentioned at the start of this talk that today's frontier models cost hundreds of millions of dollars to train, but the cost of querying them, the cost of asking a question and getting an answer, is fractions of a penny. So this result says that if you want an even better model, there are two ways you can do it. One is to keep doing what we've been doing for the past five years and scale up System 1 training: go from spending hundreds of millions of dollars on a model to billions of dollars on a model. The other is to scale up System 2 thinking: go from spending a penny per query to 10 cents per query. At a certain point, that trade-off becomes well worth it.
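To see when that trade-off starts to pay off, here is a hedged cost sketch; every number in it (the training cost, the per-query cost, and the query volume) is an illustrative assumption for the example, not a figure from the talk:

```python
# Hedged cost sketch of the two scaling options described above.
# All constants are made-up illustrative assumptions, not real figures.

TRAIN_COST = 500e6   # assumed baseline training cost, USD
QUERY_COST = 0.01    # assumed baseline cost per query, USD (~a penny)
QUERIES = 1e9        # assumed queries served over the model's lifetime

def total_cost(train_multiplier: float, query_multiplier: float) -> float:
    """Lifetime cost if we scale up training and/or per-query thinking."""
    return TRAIN_COST * train_multiplier + QUERY_COST * query_multiplier * QUERIES

# Option 1: a 10x bigger training run, queries stay cheap.
opt1 = total_cost(10, 1)
# Option 2: training stays as-is, the model thinks 10x longer per query.
opt2 = total_cost(1, 10)

print(f"10x training: ${opt1 / 1e9:.2f}B total")
print(f"10x thinking: ${opt2 / 1e9:.2f}B total")
```

Under these made-up numbers, letting the model think ten times longer per query costs far less in total than a ten-times-larger training run; which option actually wins depends on how many queries the model serves and how much each answer is worth.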
Now, of course, all of these results were in the domain of games, and there was a reasonable question about whether they could be extended to a more complicated setting, like language. But recently, my colleagues and I at OpenAI released o1, a new series of language models that think before responding. If it's an easy question, o1 might think for only a few seconds; if it's a difficult one, it might think for a few minutes. And just like the AIs for chess, Go, and poker, o1 benefits from being able to think for longer.

This opens up a completely new dimension for scaling. We're no longer constrained to just scaling up System 1 training; now we can scale up System 2 thinking as well. And the beautiful thing about scaling up in this direction is that it's largely untapped. Remember, I mentioned that the frontier models of today cost less than a penny to query.

Now, when I mention this to people, a frequent response I get is that people might not be willing to wait around for a few minutes to get a response from a model, or to pay a few dollars to get an answer to their question. And it's true that o1 takes longer and costs more than other models out there. But I would argue that for some of the most important problems we care about, that cost is well worth it. So let's do an experiment and see. Raise your hand if you would be willing to pay more than a dollar for a new cancer treatment. All right, basically everybody in the audience. Keep your hand up. How about $1,000? How about a million dollars? What about for more efficient solar panels, or for a proof of the Riemann hypothesis?

The common conception of AI today is chatbots, but it doesn't have to be that way. This isn't a revolution that's 10 years away, or even two years away. It's a revolution that's happening now.
My colleagues and I have already released o1-preview, and I have had people come to me and say that it has saved them days' worth of work, researchers at top universities included. And that's just the preview.

I mentioned at the start of this talk that the history of AI progress over the past five years can be summarized in one word: scale. So far, that has meant scaling up the System 1 training of these models. Now we have a new paradigm, one where we can scale up System 2 thinking as well, and we are just at the very beginning of scaling up in this direction.

Now, I know that there are some people who will still say that AI is going to plateau or hit a wall. And to them I say: want to bet?

Thank you.

[Applause]