Are we ready for human-level AI by 2030? Anthropic's co-founder answers

Azeem Azhar
Anthropic's co-founder and chief scientist Jared Kaplan discusses AI's rapid evolution, the shorter-...
Video Transcript:
You last year put forward the prospect of human-level artificial intelligence by 2030.

If anything, I expect it probably sooner than 2030, probably more like in the next two to three years.

What would need to be true, in terms of DeepSeek, for it to propel itself beyond the capabilities of US frontier models?

There's so much low-hanging fruit to collect that it's unpredictable who's going to find which advances first. There's no reason why they can't be very competitive algorithmically.

What does it mean to be interpretable if the machines are operating in spaces that make us look not so much like silverback gorillas but like hamsters?

One of the directions we're moving in is where you could have AI systems that think about what another version of Claude is doing in order to monitor it and steer it in a good direction, so that by the time you're at a point where AI is as smart as people or beyond, you're able to leverage those smarter-than-human AIs.

The way in which these models impact economic productivity and the labor market could be much faster than the canal, electricity or the iPhone. What is the debate that ought to be happening that would most help us prepare for that kind of fast deployment scenario?

Is it really safe to have AI that is smarter than you? I think that is a real question: should we be having these superintelligent AI aliens kind of invading the Earth, or should we decide not to?

I'm delighted that I've got Jared Kaplan, who's the co-founder and chief scientist of Anthropic. Jared, it's great to have you here.

Thanks so much for having me, it's great to be here.
You last year put forward the prospect of human-level artificial intelligence by 2030. Given everything that's happened since then, what's your current assessment?

Maybe I'm drinking too much of my own Kool-Aid, but if anything I expect it probably sooner than 2030, probably more like in the next two to three years.

But what is human-level AI, exactly?

It's not an objective measure where you either cross the line or you don't. I think AI is just going to keep getting better in a lot of different ways.

So you've raised a really important question there, which is: what is it? It's not like landing two astronauts on the moon and bringing them back safely; that was very clear and well understood. What is the purpose of having a test that says "human-level AI"?

Well, we don't really have a test. At Anthropic we have probably tens or hundreds of different tests and evaluations that we're running on Claude, and as time goes on, honestly, the experience of working with Claude, collaborating with Claude and getting productivity benefits from that is in some ways a better measure of how useful Claude is than any given test. The way I think about how capable AI is is along two axes. One is: what environments can an AI actually go out and act in? I go back to AlphaGo, which was a superhuman Go-playing program, better than any human, smarter than any person, but it was restricted to a Go board, a very restricted little grid, a very specific game. As we developed large language models and large multimodal models, the different environments that AI can interact in have grown. It grew a lot when you could just talk to chatbots like Claude; it's grown further as AI can understand images; it goes further when AI can use computers; and eventually, obviously, the thing we all imagine, the sci-fi thing, is AI being embodied in a robot that can go out into the world. So that's one of the directions I think of. The other is just how complex the things an AI can do are: can it do something that would take me a minute, or ten minutes, or an hour, or a day? I think we're just going to keep moving in that direction, and that's where AI has to actually take action in the world and learn things the way that we do, to be useful in that way.
Yeah, I think that's a really helpful way of thinking about it. If I play that back: one is the range and breadth of domains in which it can operate, and of course we've gone from text to multimodal images, with the next boundary being the physical world, although I always think we do a lot of our most useful work in our heads anyway. And the second one is so interesting: this idea of what unit of human time the machine is able to operate on. The very early large language models, if you go back to, I guess, BERT, did tasks that took a second: look at a sentence and find a noun. Then with the first versions of GPT-3 you had tasks that could last maybe ten seconds: look at a paragraph and pull out a sentence. And of course with Sonnet 3.7 I can give it tasks that might take me hours: here is 20,000 words, distill out eight or nine of the key arguments, identify where they are coherent with each other and where they don't agree with each other. That's a job that would take a graduate student half a day, and it's quite interesting to see the rapid progression of that duration of task for these models. So one question I would have for you: is that something that you track and can forecast in your heads? So for 3.7, which is the latest Claude, how long can it operate for?
It's a great question. It's definitely something that I track and think about very actively, and a lot of our research is oriented around it. We talk about it as the horizon that Claude can operate on. If you're a developer, and obviously not everyone is, this is maybe most visible with something like Claude Code, where you can ask Claude to search through a repository of code, make changes across all sorts of different features, and maybe iterate and test the code itself. Those kinds of capabilities feel like the most complex. As you said, a lot of what we do happens in our heads, and that's true for me too, but I think the way that we really get purchase on the world is by trying things and seeing what works and what doesn't, and that's what really allows you to extend this horizon. It's definitely something I track. I remember people years ago, AI enthusiasts, saying that maybe AI won't be able to do things that take longer and longer, and I think we are seeing this horizon expand, and so the utility of AI goes up.

Why does the horizon expand? Is it more memory in the GPUs? Is it some bit of magic that you're tweaking?

It's a great question, and I think it's a few things. One aspect is just that the model's intelligence, in a general sense, is going up, so the model is able to attend to more issues, to track more things. Another is the context length: the context length of our models keeps going up, and we find that we could extrapolate it much further than anything we've shipped yet, so it should be possible for AI to understand more and more, to go from understanding a paragraph to a chapter to a book to something much longer. That's helping. And finally, we're training AI using reinforcement learning to do more complex tasks in a useful way: training it to do more complex coding tasks, to study longer documents, exactly like the example you gave, and distill out more information. We're always trying to find the tasks that push the envelope on what Claude can do and train it to get better there, just as in our own education we're always trying to solve harder and harder problems as we get older and progress from elementary school to high school to university. So I think it's all of those things together that are pushing this envelope.
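One illustrative, entirely hypothetical way to quantify the "horizon" Kaplan describes is to bucket evaluation tasks by how long they would take a person, then find the longest bucket where the model still succeeds at least half the time. The sketch below uses made-up eval data and is not a description of Anthropic's actual methodology.

```python
from collections import defaultdict

# Hypothetical eval results: (estimated human-minutes for the task, model succeeded?)
results = [
    (1, True), (1, True), (10, True), (10, True), (10, False),
    (60, True), (60, False), (60, False), (480, False), (480, False),
]

def horizon_at_threshold(results, threshold=0.5):
    """Longest human-task duration at which success rate is still >= threshold."""
    by_duration = defaultdict(list)
    for minutes, ok in results:
        by_duration[minutes].append(ok)
    horizon = 0
    for minutes in sorted(by_duration):
        outcomes = by_duration[minutes]
        if sum(outcomes) / len(outcomes) >= threshold:
            horizon = minutes
    return horizon

print(f"Estimated horizon: ~{horizon_at_threshold(results)} human-minutes")
```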
Right, but when we think about large language models, a lot of the emphasis is on that first L, the "large", and we lived in this regime of scaling laws where, to some degree, there was a predictability: if you 10x'd the size of a model, which would mean ten times as much data and ten times as much compute, in the end you got a predictable, roughly linear improvement in how well the model worked. And the argument has been that that kind of scaling is getting harder and harder: either we're running out of data, or it's really expensive, or it's actually just really complex. You've been at the front line of that. When you look at pre-training scaling, what has been the bit that has started to put the brakes on the rate at which we're seeing results from it?

Maybe zooming out for a second: the scaling laws are a very precise empirical finding, and that was what was so surprising about them, just from studying AI and AI training. You put it beautifully: if you increase the size of neural networks, the number of parameters they have, if you increase the amount of data, and if you increase the amount of compute you use to train, then you get these stunningly predictive curves for how well the AI can model its data, how well it can make its quote-unquote loss go down. What this really means is: how well can large language models predict the next word in a sentence, paragraph, document, et cetera? And that just very precisely improves as you scale up, and we haven't seen any limits to it. We are seeing that as you make models bigger, as long as you have all of the ingredients you mentioned, model size, compute and data, you still get improvements. Probably the limiting factor that people talk about the most, for good reason, is data: eventually one is going to run out of data. I actually don't know that we have reached that point yet; I guess we will see, but I do think eventually, in the next couple of years, we'll reach that point. Certainly cost also matters, but there are all these other ingredients driving cost down: we're finding algorithmic improvements that make model training much more efficient, and we're also seeing that hardware is improving very quickly, so costs are going down for those reasons. So I think the scaling is going to continue. Now, there's a separate question. There's this very nice empirical statement that the AI can model its data better, but that doesn't necessarily mean it's more useful for you, that doesn't necessarily mean it's more useful as Claude. Generally it has that implication, but it's much less precise, and so it's possible that the gains we get in what AI can do for you will come more from training it to do useful tasks after pre-training rather than from pure scale.
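The "stunningly predictive curves" Kaplan refers to are power laws: loss falls as a fixed power of parameters, data or compute, which shows up as a straight line on a log-log plot. As a rough, hedged illustration with invented numbers (not Anthropic's data), a simple log-log fit recovers such a curve:

```python
import numpy as np

# Hypothetical (parameter count, pre-training loss) pairs; the numbers are made up.
params = np.array([1e7, 1e8, 1e9, 1e10, 1e11])
loss   = np.array([4.2, 3.5, 2.9, 2.4, 2.0])

# A power law L(N) = (Nc / N) ** alpha is a straight line in log-log space,
# so a linear fit of log(loss) against log(params) recovers its exponent.
slope, intercept = np.polyfit(np.log(params), np.log(loss), 1)
alpha = -slope
Nc = np.exp(intercept / alpha)

print(f"fitted exponent alpha ~ {alpha:.3f}")
# Extrapolate the fitted curve to a 10x larger model.
print(f"predicted loss at 1e12 params ~ {(Nc / 1e12) ** alpha:.2f}")
```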
Right, I want to talk about what comes after pre-training in a second, but it's been, I think, one year and ten days since Claude 3 was released, so belated happy birthday to Claude 3, which I think became everyone's most personable LLM a year ago. One of the things we've seen happen across the full AI stack is that generation times have got shorter and shorter. Semiconductors used to be on a three-year cycle; Jensen Huang has put them on a one-year cycle, and AMD is responding in a similar way. There has been an acceleration, but we're over a year since Claude 3 came out. So what is the right time between generations for these large language models, and what should we expect as consumers on the other end?

I think the generation time for models has been really fast, at least to me it feels fast, and that's basically going to continue. We should expect a new generation of Claude models before too long, certainly in the next six months or so, and I think that will continue, both because we're improving post-training, the reinforcement learning training of Claude on more tasks, and because we're able to improve the efficiency and intelligence from pre-training. So I think that's not slowing down anytime soon. In some ways the model cycle is even faster than the hardware cycle; we'll see if the hardware cycle is really one year, but it's definitely moving quickly, and we're getting new chips as we speak.
So there was that very fascinating and challenging paper written by a young man called Leopold Aschenbrenner last year, which I'm sure you will have read, and in it he had a two-year generation time between the order-of-magnitude improvements in models. I remember reading that and thinking: I don't think it can be two years, because honestly it takes time to build a data center, to get the chips from Nvidia, to find the power, and to generate the data, assuming you needed synthetic data in some cases. When you look back at that now with a bit of distance, how would you communicate that sense of practical generations, in terms of how we should expect, as consumers and members of society, these models to improve? Is it a three-year clock cycle, or is it going to be a two-year one?

I think of it as being smaller than two years, but that's maybe because we're iterating very quickly. I would say there's some pre-training life cycle, but that's usually measured in months rather than years. Then there's a question of research: how quickly can researchers come up with new innovations that are worth shipping? I think of that as being quite a bit shorter than a year. And with reinforcement learning, historically that's been much less compute-intensive, much less resource-intensive training, although that may be changing now, and for that we can iterate much more quickly. There's a desire, at least as we develop Claude, that every time we think there's a significant improvement we can deliver, and that might be because of pre-training improvements or because we just realized we can train Claude on some new task, like what goes into Claude Code, and that will be useful to people, we're expecting to ship it. So it's really more of a continuum. It's like asking how quickly Moore's law develops: it's a continuum where maybe it doubles every 18 months or something like that. With AI, I think it's faster than that. I don't know how viscerally that feels when you're playing with each generation of models, but you can tell me: you've been playing with Claude 3.7 Sonnet, you've played with Claude 3 Opus, so what do you feel is the biggest change?

No, I mean, it's completely wild.
The rate at which we have to update our behaviors as somebody who uses these tools is really astonishing. You introduced something called Claude Projects a while ago, and I built lots of projects to help in my research work; then the models got better. And of course I use every family of models: I use the OpenAI ones, and Perplexity and You.com and lots of other ones, all at the same time, and I often use them adversarially because they all have slightly different flavors. I used to think of Claude as that really super smart history grad student you meet when you're an undergrad: kind of charming, knows a lot of stuff, and never cleverer-than-thou. And I would think of another model as being a bit like that nerdy mathematician who always had to get it right; you know which one I'm referring to. So there was this difference in personalities, but I have found that the rate at which they feel like they're getting better means I almost don't document my changes in behavior; I'm just literally living it through practice.
And yes, you're right, I think it feels very fast. So I want to come to this other question, which is this phrase that Satya Nadella, and of course Jensen, has started to use quite a lot since the middle of last year: test-time scaling, or inference-time scaling. What is it, and how big a deal is it?

I think it's a big deal. The claim here is that as you let an AI model think for longer, you can get predictable improvements in accuracy when it's doing a hard task where, let's say, pure thought improves performance. The classic example is solving a really hard math contest problem or a competition coding problem. What we see is that as you let, say, Claude 3.7 Sonnet think for a thousand words, or 2,000 words, or 4,000, 8,000, 16,000 words, you get predictable improvements, where with each doubling of the amount of time Claude can think, you get roughly a constant increase in performance. You can also extend that in other ways: you can train a separate AI model, or even just ask Claude itself, to analyze the solutions and decide which possible solution to a problem is best. Then in parallel you could generate one solution, two solutions, four, eight, et cetera, and ask it to choose the best of the ones generated in parallel, after the fact. Again, you tend to see pretty clean scaling, where you can get better and better performance that way. So for very difficult tasks, this means you can either choose to have a smarter model solve it in a shorter amount of time, or you can ask a smaller model to work longer and maybe get the same performance. I think this is exciting because, for the very hard tasks that you might want AI to solve, maybe if you just throw enough test-time compute at it, you can solve them: the things that we imagine in the future, things like helping to cure diseases or making new breakthroughs in theoretical physics.
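The parallel version of this, generating one, two, four, eight candidate solutions and letting a model pick the winner, is often called best-of-N sampling. The sketch below is a minimal, hypothetical illustration of the pattern: `generate_solution` and `score_solution` stand in for model calls and are not any real API or Anthropic's implementation.

```python
import random

def generate_solution(problem: str, seed: int) -> str:
    # Placeholder: in practice this would be one sampled model completion.
    random.seed(seed)
    return f"candidate answer {random.randint(0, 100)} to: {problem}"

def score_solution(problem: str, solution: str) -> float:
    # Placeholder: in practice this would be a verifier model (or Claude itself)
    # rating how likely the candidate solution is to be correct.
    return random.random()

def best_of_n(problem: str, n: int) -> str:
    candidates = [generate_solution(problem, seed=i) for i in range(n)]
    return max(candidates, key=lambda c: score_solution(problem, c))

# Doubling n (1, 2, 4, 8, ...) tends to buy a roughly constant bump in accuracy
# on hard tasks, which is the "clean scaling" referred to above.
print(best_of_n("hard competition math problem", n=8))
```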
Right, so this additional thinking, this doubling of time spent thinking, gives you a predictable improvement. I noticed in the new version of Claude 3.7 Sonnet you've got this thinking time, just a little feature that I tick. Is there some way of looking at my query and making a judgment about how much thinking should go into that type of query, to give me a sufficiently improved response rather than have me sit around waiting forever?

Yeah, indeed. Claude 3.7 Sonnet is the first hybrid reasoning model, where it can act just like Claude 3.6 Sonnet, or 3.5 Sonnet new.

I mean, you're great at naming. Have you thought of using AI to help with your naming?

Something very primitive; we're using, like, GPT-2 to name our models. But yeah.
Claude 3.7 Sonnet can behave very similarly to prior generations, where it doesn't think at all, or you can ask it to think, and it tries to decide from its training how much thinking to do. Sometimes you wish it would think a little bit more, sometimes a little bit less, but basically, based on the difficulty of the task you assign, it will think the amount that it expects is best. That is something we're working on and think will get better over time: it should become possible to let it think as much as it wants but still get a response in whatever the most reasonable time frame is. The question is just, if you start a new job and your boss gives you something hard to do, you might really want to spend a lot of time thinking, because you really want to get the right answer, you don't want to get fired; but on the other hand, in some situations, maybe once you're comfortable at your new job, you might feel like, oh, I'm just going to give a quick answer, we have a good relationship. Claude is in the same space, where it doesn't know: am I expected to try really hard to get the best answer, or am I wasting someone's time? That's something we want Claude to learn from context over time. We've tried to bake a little bit of that in, but hopefully that will improve with future generations.

But architecturally, is it a pre-processing step, where the query comes in, you do some kind of process, make some kind of judgment, and send a parameter to the underlying model to say "think for X thousand tokens" or "think for 2X thousand tokens"? Or is it all in the single model?

It's all in the single model. If you're a developer, you can specify precisely what budget Claude gets, and 99-plus percent of the time it will stay within that budget; a lot of the time it will actually undershoot the budget quite a bit, so you'll say it can think for 16,000 words but it'll only think for 4,000 or something, and it's using its own judgment in deciding whether to use all the space it has or not. As a user on Claude, there are fewer parameters you can set, just to keep things simple, but it's all one model that's deciding, based on what you've asked, how much to think.
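For developers, the budget Kaplan describes is set directly on the request. Below is a minimal sketch using the Anthropic Python SDK's extended-thinking option as it was publicly documented around the Claude 3.7 Sonnet release; the model ID and parameter names here are assumptions to verify against the current docs, not guarantees.

```python
import anthropic  # assumes the Anthropic Python SDK is installed and ANTHROPIC_API_KEY is set

client = anthropic.Anthropic()

# Sketch of specifying a thinking budget; Claude may use far less than the budget it is given.
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",   # assumed model ID at time of writing
    max_tokens=20000,                      # must leave room for the final answer
    thinking={"type": "enabled", "budget_tokens": 16000},
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
)

# The reply interleaves "thinking" blocks with the final "text" blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)
```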
For a lot of people, this thinking-time experience exploded into their understanding on Inauguration Day this year, January the 20th, because of DeepSeek R1. How big a deal was that inside Anthropic?

I had been following DeepSeek's progress for at least a year, a year and a half, because they'd been writing papers and improving their models, so it wasn't actually very surprising to me or to Anthropic. It was interesting to see the reaction globally, of "wow, China has this great model". There are people I've talked to in the US who had thought, historically, that maybe China was many years behind; seeing DeepSeek's progress in the papers they were writing, I kind of thought, well, maybe they're six months behind, but they're not very far behind.

Yeah, it was apparent with what was called V3, which was, I think, late 2024, where they were achieving GPT-4-level capability at, you know, 30 to 50 times cheaper; and I think an earlier paper was maybe November 2023, with an earlier version, where you were starting to see some pretty good results from the outside. I suppose partly we're often looking at that chart that shows the gap between the top Elo-rated models; an Elo rating is a chess-like rating for how good models are in chat, I know you know this, Jared, just for people listening. You see the frontier is a bunch of American firms, and the best Chinese model is quite far behind it, and over the course of the months that delta has got shorter and shorter. So you're looking at that and asking: is it just going to be a delta where the Chinese are slightly behind the best US models, or is there real momentum within those Chinese firms that will take them past the frontier US models? What would need to be true, in terms of the kind of research breakthroughs that DeepSeek might need, for it to propel itself beyond the capabilities of US frontier models?
Research breakthroughs are happening very quickly, and maybe in a certain sense they're not even breakthroughs. One thing I always say is that when you see really rapid progress in science, it's not because the scientists suddenly got much smarter; they're not superhuman. It's because people have found an area where there's just a lot of very low-hanging fruit, a lot of iteration that can be used to improve. I think that's what's been happening in AI, maybe for ten or fifteen years, certainly the last five years, and there's so much low-hanging fruit to collect that it's unpredictable who's going to find which advances first. My expectation, though I don't know for sure, is that going forward there are these export controls in place, which I think mean that Western firms will probably have an advantage in terms of the amount of compute available, and that will probably make it more difficult for DeepSeek and others to be competitive. But in terms of the basic algorithms, all of the leading AI companies are finding ways to do very simple things that work well and scale well, and DeepSeek, based on their papers, has also found a lot of these ideas and techniques; there's no reason why they can't be very competitive algorithmically.

But it does seem that there's been a tone shift in the discussion of AI and AI development, and I know that your co-founder Dario Amodei has been speaking quite a lot about the increasing importance of some sort of export control or licensing regime around chips, as regards getting them out to China. And I even detected, and please correct me if I've got this wrong, a shift in Anthropic's approach to how quickly development needs to happen. I got a subtle sense that maybe Anthropic, which has often argued very strongly in favor of a safety-first approach, had slightly changed its gait and said: we just need to go faster than we have. Have I misread that?
The way that Anthropic thinks about the speed of development and its interplay with safety is primarily through our responsible scaling policy. It's true that in those very early days when we founded Anthropic, there was a general sense among us, although AI was not a big deal in the wider world at the time, that AI was going to make very rapid progress, and that that was primarily going to be very beneficial to the world, but that there were also a lot of risks associated with it. We didn't have some sense that this powerful technology being developed slightly more slowly could actually be better in terms of getting things right. What we figured out, and I think this was kind of a breakthrough of its own, was to create this responsible scaling policy as a way to help coordinate with other labs, to make sure that AI development is beneficial and isn't creating harms. The idea was that we would think carefully about what kinds of real risks from AI exist that we want to take seriously, and we would measure the capability of our systems to actually do harm, to be risky; and then, if we crossed certain thresholds, we would commit to having mitigations in place to avoid those problems. In a certain sense, this meant that when we were thinking in these terms, we were more free to move quickly, because the idea is that we have this framework in place where we can move as fast as we can, both in AI capabilities and in safety research and risk mitigations, and we are bound not to cross certain lines until we are ready. That's something we've alluded to even with the release of Claude 3.7 Sonnet and the way we've discussed it in its system card: we think we are approaching some of these thresholds, and therefore future models may need more protections. But at the same time, in terms of our research, we are getting those protections in place. So we had this research release that we call constitutional classifiers, and an associated jailbreaking demo where we asked anyone on the internet to try to jailbreak this new system, in order to test out the method. Basically, we want to move as quickly as we can, for a variety of reasons, but we want to make sure we have these systems in place, and that's how we're coordinating to, we hope, scale responsibly.

Yeah, this gets to the heart, I think, of some of the paradoxical ideas that you and your colleagues have to hold in your heads. On the one hand, the way in which we're building AI today has many critics, and the critics say this system is uncontrollable, you don't have any verifiable safety, and we don't have the science for how we build safety around this method of making AI.
And you have a great comment, I'm just going to read it here: "Maybe supervising a thing that's smarter than us is hard, maybe not, but once you make a thing that's broadly much smarter than you, and given that it would be easy to run millions of copies of that thing once you have one, you're going to lose and be disempowered if there's a conflict. Given the stakes, being 90% sure it'll work out is very far from okay." I think that was you. Was that you?

That sounds like me, yeah.

It sounds like you, right, okay, good. So one thing I'm curious about is how you personally manage that paradoxical piece: the risks that you've yourself articulated, and the work that you are doing, and the speed with which it's moving, generation times that are, you know, less than 18 months. Is there some internal cognitive mechanism that you've built for yourself?
Yeah, so I guess I can talk about all of the different kinds of research we're doing to try to meet the moment. Very broadly, you mentioned DeepSeek; there are all these AI labs in China that are near the frontier or almost at the frontier; there are a lot of different AI researchers and labs in the US and scattered across the world, everyone pushing this technology forward. You've mentioned the potential for risk and the critics that say this technology is dangerous; on the other side, there are a lot of people saying that's really silly, that's totally unnecessary, what's most important is delivering the benefits of this technology, because, as I've said and Dario has said, maybe this technology can help us cure cancer ten times faster than we would otherwise. And so people say things like: how could we possibly slow down when we're going to be able to deliver those kinds of broad improvements for human welfare? So there are a lot of competitors, there's a broad capitalist ecosystem moving very quickly, there are a lot of benefits to this technology, but also, if it's really as powerful as millions of geniuses in a data center, et cetera, then there are also risks. So it's a difficult thing to juggle. I think that as AI becomes more capable, the stakes go up, and the confidence level you would like to have before you develop or deploy such systems also goes up. The way that we're thinking about this is through a lot of different research directions.
Interpretability is one we've talked about a lot, and it's something that even I was unsure about, I would say, three or four years ago; I wasn't really sure whether we'd be able to get any benefits, but I do think we're starting to see them. To say what the benefit of interpretability is: basically, if you have a really advanced AI and you're not sure whether you trust it, it sure would be useful if you could read its mind, and not even just read its mind but understand how it puts its thoughts together to take actions in the world. So interpretability, I think, first and foremost, could be a very powerful way of checking whether AI really is doing what you want it to.

Right, but as AI gets better on this continuum, the way it makes its decisions is going to become less and less interpretable to us. You're a theoretical physicist; you're somebody who understands tensors, multi-hundred-dimensional spaces, and you have formalisms that allow you to navigate and make sense of those. I can say the words in English, but all of the work you do as a theoretical physicist is not interpretable to me. So at some point, maybe it's 18 months, maybe it's 36, maybe it's 54, on your trajectory we will have systems that will be smarter than everyone. We have the Astronomer Royal in the UK, Lord Martin Rees, perhaps watching this right now; they'll be smarter even than him. And the number of people who could interpret it then drops; it gets to a world where only Terence Tao the mathematician can interpret it, and then he can't. So what does it mean to be interpretable, if the machines are operating in spaces that make us look not so much like silverback gorillas but like hamsters?

Yeah, it's a good question. There are a couple of things I would say. One is that if you want to understand everything that the AI is doing, I agree that's going to be very difficult.
But you may be able to study a bunch of specific examples where you can understand, for example, what goal the AI is seeking. Maybe you don't fully understand all of the steps in the process, but maybe you can break it down and study it intensively, maybe you can use AI to help you analyze what's going on inside, maybe you can use a dumber AI to help you understand a smarter AI. Generally, there may be components of what the AI is doing, specifically its goals, or the way it might respond in particularly dicey situations that you can set up and analyze, that give you insight into whether the AI is ultimately aligned or not. The other thing I would say is that interpretability is really just one tool. It hopefully will give us insight and some way of auditing AI and checking what's going on, but there are a few other things we're doing simultaneously. One is trying to figure out better and better ways to use AI to help supervise and monitor AI. Constitutional AI, which Anthropic developed a few years ago, was, I think, the very first example of this, where we used AI systems to check whether other AI systems were obeying a constitution, and to guide them towards behaviors in accord with a list of principles that we call the constitution. What we're trying to develop now, one of the directions we're moving in, is kind of like constitutional AI souped up, where you could have AI systems that think, using the reasoning we were talking about earlier, about what another version of Claude is doing, in order to monitor it and steer it in a good direction. The goal there is that you want your monitoring and supervision of AI to improve with the intelligence of AI, so that by the time you're at a point where AI is as smart as people or beyond, you're able to leverage those smarter-than-human AIs for alignment.
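As a toy illustration of the AI-supervising-AI pattern Kaplan describes (not Anthropic's actual constitutional AI or monitoring pipeline), one pass drafts a reply, a second pass judges the draft against a short written constitution, and the draft is revised if a principle is violated. `call_model` below is a placeholder stub so the sketch runs end to end; in practice it would be a real chat-model call.

```python
# Toy "AI supervising AI" sketch: draft, critique against a constitution, revise on violation.

CONSTITUTION = [
    "Do not help with anything dangerous or illegal.",
    "Be honest; do not invent facts.",
    "Be respectful and avoid demeaning language.",
]

def call_model(prompt: str) -> str:
    # Placeholder standing in for a chat-model API call, returning canned text so the
    # sketch runs; swap in a real client to make it useful.
    if "Does the draft violate" in prompt:
        return "OK"
    return "draft reply to: " + prompt.splitlines()[-1]

def supervised_answer(user_request: str) -> str:
    principles = "\n".join(CONSTITUTION)
    draft = call_model("Answer the user's request:\n" + user_request)
    critique = call_model(
        "Draft reply:\n" + draft + "\nPrinciples:\n" + principles +
        "\nDoes the draft violate any principle? Reply 'VIOLATION: <reason>' or 'OK'."
    )
    if critique.startswith("VIOLATION"):
        draft = call_model(
            "Rewrite the draft so it follows every principle.\nPrinciples:\n" + principles +
            "\nDraft:\n" + draft + "\nCritique:\n" + critique
        )
    return draft

print(supervised_answer("Summarise the main argument of this interview."))
```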
Right, so can I jump in with a historical parallel? That's a little bit like an Enlightenment idea, in the sense that you layer on top of what's gone before. And I have a sense, when I use these large language models, that they do embed that idea already. It's really difficult to imagine a large language model that will believe the Earth is flat, because in order for it to believe the Earth is flat, it can't then be helpful in other ways; so many of the internal relationships have to be broken that it's not going to help. So in some sense it almost feels like a teleological argument: if each subsequent generation is roughly aligned, there is an arrow of progress that suggests you build one on top of another. Is it that there's a kind of selection pressure, where models that end up diverging from that path become less useful and therefore don't benefit from economic incentives? Or am I just drawing too strong a parallel with history?
Well, I think it's a good parallel. I would say there are two things. One is that we got extremely lucky. I remember talking to folks concerned about safety who were excited about this back in, I don't know, 2017 or 2018, and there was this vision that AI was going to look like AlphaGo, where it starts off as a completely blank slate and trains just to optimize a particular task. We got lucky, or maybe we made good choices, in developing large language models, because fundamentally the very first thing large language models learn is to understand human writing, human ideas, the way humans use words to conceptualize the world. Obviously that has encoded a lot of our biases, but it also carries a lot of the intuitions we have about ethics and morality, common sense, human history, the things that have gone well and the things that have gone wrong. And furthermore, the very first thing we got these models to do was to chat with us; I think it's not a coincidence that dialogue, chat, was one of the very first breakout applications, because these models are trained on language. So I think that is exactly in sync with what you were saying: there's a very strong bias that these models will at least understand, and be able to communicate, and their basis for thinking will be very much a mirror to our own. Now, it might be a strange mirror in a lot of ways, it might be a funhouse mirror, but nevertheless they really need to understand all of those ideas, and so I think that does give us a pretty strong foundation. It also means that at least some informal kinds of interpretability, like being able to look at the chain-of-thought thinking these models are using and understand it, are sort of baked into the model. So I think that is a major advantage of the way that AI has developed. It didn't necessarily have to be that way; no one really predicted it five or six years ago, but I do think it makes our task with alignment a little bit easier.
Right, but within this idea you have constitutional AI, you have character training; these are all decisions that you make about how the model is going to behave and what its sense of goodness is. Goodness meaning giving us things in ways that are honest and helpful and harmless. But those are choices of technologies that could end up being quite infrastructural. I think many people now believe that if the 19th-century infrastructure was canals and railways and then sewage systems, 21st-century infrastructure will be a layer of AI, and the nature of infrastructure is that it is a social product in and of itself. But the way that we currently guide the success of these models is actually through interactions in the market, and it's interactions in the market between players with very different scale advantages. If you're building a base model like Grok, you're not competing on a fair basis with Claude, because you have 200 million people a month on X that you can pour at that model; or if you're at OpenAI, you've developed some forward momentum. So these models are not going to compete in a way that's necessarily aligned with the public interest. Is that a problem?

It's definitely, for any advanced technology, a problem. It's very much a social problem that's fundamental to capitalism, in the sense that with any new technology there are certain capitalist incentives. There's a sense in which those are often not completely anti-aligned: people are buying products, they put their dollars where their opinions are. But there are externalities, there can be problems introduced with new technologies, and capitalism is very far from perfect in terms of making sure human welfare is improved and broadly available. I think a lot of the problems are similar to problems we already have in the world; it's just that if the technology moves very quickly, then maybe we'll have to face them more quickly than we do with other technologies, but they're broadly similar. Also, we do try to bake in some flexibility. You can ask Claude to roleplay in a lot of different ways, and instead of sounding just like Claude, it's happy to sound different. We try to set some guardrails of basic harmlessness on some of this, but you can change Claude, and part of that is that we do think it shouldn't be up to us to completely dictate the values. AI is fundamentally very empowering and should be broadly accessible, and we want people to be able to benefit in whatever way they see fit; freedom to use the technology in the way that you want is really important.

I think one discontinuity is between what's happening in the Bay Area, within the frontier labs, and Main Street, for the sake of argument, and a sense of how quickly or not things are moving.
I get a sense from you that you really feel things are moving very quickly, that the time between generations of advances is quicker than perhaps we feel on the outside. There's a contrary perspective, which is that short-term equilibria are often constrained by social and organizational inertia, by the fact that you have to build power stations or just get people to change their behaviors. I've been using large language models regularly since November the 30th, 2022, and I'm still doing searches on Google; not so many, but I'm still doing a few. So there is this contrary view that says paradigmatic change still takes quite a bit of time. But that still leaves the possibility that the way in which these models impact economic productivity and the labor market could be much faster than the canal or electricity or the iPhone. What is the debate that ought to be happening that would most help us prepare for that kind of fast deployment scenario?

That's a great question. I was a theoretical physicist until pretty recently, and often when I talk to physicists I say: if you believe me, and obviously people should be skeptical about AI and think about it carefully. If you'd asked me six or seven years ago, I didn't in any way expect this; I slowly became more and more convinced that AI was going to make very rapid progress. Now I'm pretty convinced, but I might still be wrong, so people should have appropriate levels of skepticism and certainly question whether AI is really going to be on this trajectory. But if you accept that, the thing I say to my fellow physicists is: look, we really need people with all different kinds of intellectual training and experience to work on this, because if this is really happening, it's pretty revolutionary, and we want our best and brightest to be paying attention; we want everyone to be paying attention and thinking about what it means and what we should be doing about it. I think there are a lot of interesting debates about what the economic impacts of AI really will be. I think it's different from a lot of other technologies.
Something that's very interesting, and I don't know what's going to happen with it, is that in terms of where AI is most useful, most productivity-enhancing, it's really the more educated, white-collar jobs that might be impacted. That's a difference, I think, from the way we often think about automation, so parsing the consequences of that is pretty interesting. We ourselves are trying to study this empirically; we're very focused on empirical approaches, for AI and otherwise, and on how AI is getting used. We have this tool, Clio, that allows us to aggregate, in a privacy-preserving way, how Claude is being used, and we're studying things like: is it complementary, is it productivity-enhancing, to what extent is it potentially replacing tasks that people would otherwise do? And we're opening that data set up to economists to study. I think that's an example of a place where there's a lot of interesting work to do to understand what the progression is going to be, because right now we see a huge uptick in usage of AI for software engineering, and I think that's the perfect area, because software engineers love to adopt new technology, it's very exciting, and also software is verifiable.

Right, so if Claude produces something that doesn't work, it won't execute, it won't pass its unit tests, and you can go back and say: listen, this didn't do the thing I needed it to do, do it again.

Exactly, exactly. So I think there are a lot of reasons why software engineering is a natural place for AI to be adopted.
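The loop described here, generate a change, run the tests, feed failures back, is easy to sketch. The helpers `ask_model_for_patch` and `apply_patch` below are hypothetical placeholders, not any real SDK, and the test runner assumes a pytest-based project; this is an illustration of the verify-and-retry pattern, not how Claude Code works internally.

```python
import subprocess

def run_tests() -> tuple[bool, str]:
    # Run the project's test suite; the tests are the ground truth that a change works.
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def ask_model_for_patch(task: str, feedback: str) -> str:
    raise NotImplementedError("hypothetical: call your preferred code model here")

def apply_patch(patch: str) -> None:
    raise NotImplementedError("hypothetical: write the model's changes into the working tree")

def solve_with_retries(task: str, max_attempts: int = 3) -> bool:
    feedback = ""
    for _ in range(max_attempts):
        apply_patch(ask_model_for_patch(task, feedback))
        passed, feedback = run_tests()
        if passed:
            return True
    return False  # give up and surface the last test output to a human
```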
But I think it's a great question: is something like what we see in software engineering, with so many folks using AI, going to happen all across knowledge work, or is it going to be much slower? And then how is it going to permeate our day-to-day lives? I think it's interesting to think about that, and interesting to think about whether we really want AI that's at or beyond human level. You were asking me all these questions about whether it's really safe to have AI that is smarter than you, and I think that is a real question, a society-level question: should we be having these superintelligent AI aliens kind of invading the Earth, or should we decide not to? And, from an international perspective, how do we decide how we roll out this technology?

You know, I frame it slightly differently, because I see you and your colleagues as evolving it. Each subsequent AI is really impressive, and if you had shown me Claude 3.7 Sonnet with thinking five years ago, I would have 100% said we have AGI, we have it. What you're doing, in a sense, is progressively disappointing us, because we're getting so used to it: oh, wait a second, 3.7 Sonnet is two weeks old, I'm not interested in that anymore. But one of the reasons I slightly disagree with the framing of a superintelligence that appears out of nowhere is because it's not appearing out of nowhere.
It's appearing in an evolving way, in our hands. We're also starting to recognize that, as with any system, there's an efficient frontier: no model is better than every other model in every way; there are trade-offs. And the way it will get embedded across our economy will be, to be honest, that the model that's going to run the OCR on a scanner in a warehouse is just not going to be as smart as Claude 5 Sonnet. Why would you do that? Why would you run up the cost, why would you have all the latency? So for me, part of the challenge is really about the evolving governance of these machine intelligence systems, where there will be thousands or millions of these models. I think some of the AI risk debate emerged from Nick Bostrom's excellent book a decade ago, where the view was that you'd have a singleton: a solitary, all-powerful AI, rather than many thousands or millions or tens of millions of them, all interacting in slightly different ways.
Which I think creates a different control problem, to be honest.

I completely agree. I think being able to continue to iteratively deploy improvements to Claude is valuable, because we can see where it's headed, we can identify problems; if the next model has an issue, people can complain to us, we can discuss it, and to the extent that it's that important we can discuss it as a society and then remedy it. So I agree: superintelligence is not going to be one specific moment where we hit superintelligence; it is very much a continuum, and AI is evolving in that way. And yeah, I agree, we're going to be using AI systems that are not very sophisticated, basically tools, forever; there's no reason to use an expensive model for OCR. So I agree we're going to have this ecosystem, and it's a good point that this ecosystem itself, especially if AI is interacting with AI to do a lot of the work we do now, like maybe your AI doctor interacts with your AI pharmacist and negotiates with the AI health insurance...

...to deny you your cover, no doubt. Some things don't change, right?

Yeah. That ecosystem could have a lot of problems; maybe Claude is aligned, but the ecosystem has problems, and that's something that is so new that no one's really studied it, but it's something to worry about. I think there's this general worry that the way AI development will eventually cause harm is that things kind of go off the rails. In the modern world, I don't understand how most things work: do I understand how my car works, do I understand how my iPad works? So when you get more and more systems that no one really understands, things can go off the rails in ways that are really hard to predict, due to the interactions in the ecosystem. So that's definitely a risk.

That is another conversation, though; you've tickled my brain now and there are so many places to explore. But you've been super generous with your time.
I wanted to ask just one last question, really, which is about the things that we should be excited about. If you look out over the next 12 months, at Claude and at Anthropic, what are you most excited about?

I'll say something that has nothing to do with Claude the product. I think I'm most excited about getting what I described, in terms of scalable supervision of AI, helping us make sure that AI is useful and is doing good things for us. I'm very excited about getting that really working, going beyond constitutional AI, because I think that's really the lever for becoming more confident that we can continue to improve the capabilities of AI and make it more useful and beneficial for all of us.

Well, I look forward to that as well. I will be stopping this call and going straight into Claude to ask about all the questions I should have asked you. Jared Kaplan, thanks so much this morning for chatting to us.

Thanks so much for having me, it was a lot of fun.