- Why are we working on AI in the first place? I'm just gonna arbitrarily pick Jared. Why are you doing AI at all? - I mean, I was working on physics for a long time and I got bored, and I wanted to hang out with more of my friends, so yeah. - Yeah, I thought Dario pitched you on it. - I don't think I explicitly pitched you at any point. I just kind of, like, showed you results of, like, AI models and then was trying to make the point that, like, they're very general and they
don't only apply to one thing and then just at some point, after I showed you enough of them, you were like, "Oh yeah, that seems like it's right." - How long had you been a professor before? Like, when you started? - I think like six years or so. I think I helped recruit Sam. - I talked to you and you were like, "I think I've created a good bubble here-" - Yeah. - "And like my goal is to get Tom to come back." That worked. - And did you meet everyone through Google when you were
doing the interpretability stuff, Chris? - No. So I guess I actually met a bunch of you when I was 19 and I was visiting- - Oh yeah? - In the Bay Area for the first time. So I guess I met Dario and Jared then, and I guess they were postdocs, which I thought was very cool at the time. And then I was working at Google Brain and Dario joined and we sat side by side, actually, for a while. We had desks beside each other, and I worked with Tom there as well. And then, of course,
I got to work with all of you at OpenAI when I went there. - Yeah. - So I guess I've known a lot of you for, like, more than a decade, which is kind of wild. - If I remember correctly, I met Dario in 2015 when I went to a conference you were at and I tried to interview you, and Google PR said I would've had to have read all of your research papers, that you needed to- - Uh, yeah, I think I was writing "Concrete Problems in AI Safety" when I was at Google. - I think you
wrote a story about that paper. - I did. - I remember right before I started working with you, I think you invited me to the office and I got to, like, come chat, and you just, like, told me everything about AI. And you, like, explained- I remember afterwards being like, "Oh, I guess this stuff is much more serious than I realized." And you were, like, probably explaining the big blob of compute and like parameter counting and how many neurons are in the brain and everything. - And I feel like Dario often has that effect on people;
"This is much more serious than I realized." - Yeah, I'm the bringer of happy tidings. - But I remember when we were at OpenAI when there was the scaling law stuff and just making things bigger and it started to feel like it was working, and then it kind of kept on eerily working on a bunch of different projects, which I think is how we all ended up working closely together, 'cause it was first GPT-2 and then scaling laws and GPT-3 and we ended up- - Yeah, we were the blob of people that were making things
work. - Yeah. - That's right. - I think we were also excited about safety, 'cause in that era there was sort of this idea that AI would become very powerful but, like, potentially not understand human values or not even be able to communicate with us. And so I think we were all pretty excited about language models as a way to kind of guarantee that AI systems would have to understand kind of implicit knowledge that- - And RL from human feedback on top of language models- the whole reason for scaling these models up was that,
you know, the models weren't smart enough to do RLHF on top of, so that's the kind of intertwinement of safety and scaling of the models that we, you know, still believe in today. - Yeah, I think there was also an element of, like, the scaling work being done as part of the safety team that, you know- - Mm-hmm. - Dario started at OpenAI, because we thought that forecasting AI trends was important to be able to have us taken seriously and take safety seriously as a problem. - Correct. - Yeah, and I remember being
in some airport in England sampling from GPT-2 and using it to write fake news articles and Slacking Dario and being like, "Oh, this stuff actually works. It might have, like, huge policy implications." I think Dario said something like, "Yes." His typical way. But then we worked on that a bunch as well as the release stuff, which was kind of wild. - Yeah, I remember the release stuff. I think that was when we first started working together. - Yeah. - That was a fun time. - Yes. - The GPT-2 launch. - Yeah. But I think it
was good for us 'cause we did a kind of slightly strange, safety-oriented thing all together, and then we ended up doing Anthropic, which is a much larger, slightly strange, safety-oriented thing- - That's right. - Together. - So I guess just like going back to the concrete problems, 'cause I remember, so I joined OpenAI in like 2016, like one of the first 20 employees or whatever with you, Dario, and I remember at that time "Concrete Problems in AI Safety" seemed like it was like the first mainstream AI safety- - Yes. - Paper, I guess? - I don't
really know if I ever asked you what the story was for how that came about. - Chris knows the story, 'cause he was involved in it. I think, you know, we were both at Google. I forget what other project I was working on, but, like with many things, it was my attempt to kind of procrastinate from whatever that project was- I've now completely forgotten it. But I think it was like Chris and I decided to write down what are some open problems in terms of AI safety, and also AI
safety was usually talked about in this very kind of abstruse, abstract way. Can we kind of ground it in the ML that was going on at the time? I mean, now there's been, like, you know, six, seven years of work in that vein, but it was almost a strange idea at the time. - Yeah, I think there's a way in which it was almost a kind of political project, where at the time, a lot of people didn't take safety seriously. So I think that there was sort of this goal to collate a list of problems
that sort of people agreed were reasonable, that often already existed in the literature, and then get a bunch of credible people across different institutions to be authors. And like, I remember I had this whole long period where I just talked to 20 different researchers at Brain to build support for publishing the paper. Like, in some ways, if you look at it in terms of the problems and a lot of the things it emphasized, I think it, you know, hasn't held up that well, in that, you know, I think it's not really the right problems, but
I think if you sort of see it instead as a consensus-building exercise, that there's something here that is real and that is worth taking seriously, then it was a pretty important moment. - I mean, you end up in this really weird sci-fi world where I remember at the start of Anthropic, we were talking about constitutional AI and I think Jared said, "Oh, we're just gonna write a constitution for a language model and that'll change all of its behavior." And I remember that sounded incredibly crazy at the time, but why did you guys think that was
gonna work? Because I remember that was one of the first early, like, big research ideas we had at the company. - Yeah, I mean, I think Dario and I had talked about it for a while. I guess I think simple things just work really, really well in AI. And so, like, I think the first versions of that were quite complicated, but then we kind of, like, whittled it down to: just use the fact that AI systems are good at solving multiple-choice exams and give them a prompt that tells them what they're looking for, and
that was kind of a lot of what we needed. - And then we were able to just write down these principles. - I mean, it goes back to, like, the big blob of compute or the bitter lesson or the scaling hypothesis. If you can identify, you know, something that you can give the AI data for and that's kind of a clear target, you'll get it to do it. So like here's this set of instructions, here's this set of principles. AI language models can like read that set of principles and they can compare it to the
behavior that they themselves are engaging in. And so like you've got your training target there, so once you know that, I think my view and Jared's view is there's a way to get it to work; you just have to fiddle with enough of the details. - Yeah. I think it was always weird for me, especially in these early eras, 'cause, like, I was coming from physics, and I think now we forget about this 'cause everyone's excited about AI, but, like, I remember talking to Dario about concrete problems and other things, and
I just got the sense that AI researchers were very, very kind of psychologically damaged by the AI winter where they just kind of felt like having really ambitious ideas or ambitious visions was, like, very disallowed. And that's kind of how I imagine it was in terms of talking about safety. In order to care about safety, you have to believe that AI systems could actually be really powerful and really useful, and I think that there was kind of a prohibition against being ambitious. - And I think one of the benefits is that physicists are very arrogant
and so they're constantly doing really ambitious things and talking about things in terms of grand schemes, and so- - Yeah. - I mean, I think that's definitely true. Like, I remember in 2014, there were just, like, I don't know, some things you couldn't say, right? But I actually think it was kind of an extension of problems that exist across academia, other than maybe theoretical physics, where they've kind of evolved into very risk-averse institutions for a number of reasons. And even the industrial parts of AI had kind of transplanted or
forklifted that mentality. And it took a long time. I think it took until like 2022 to get out of that mentality. - There's a weird thing about like, what does it mean to be conservative and respectful, where you might think like, one version you could have is that what it means to be conservative is to take the risks or the potential harms of what you're doing really seriously and worry about that. But another kind of conservatism is to be like, "Ah, you know, taking an idea too seriously and believing that it might succeed
is sort of like scientific arrogance." And so I think there's, like, kind of two different kinds of conservatism or caution, and I think we were sort of in a regime that was very controlled by that one. I mean, you see it historically, right? Like if you look at the early discussions in 1939 between, you know, people involved in nuclear physics about whether nuclear bombs were sort of a serious concern. You see exactly the same thing with Fermi resisting these ideas, because it just seemed kind of like a crazy thing. And other people, like Szilard
or Teller, taking the ideas seriously because they were worried about the risks. - Yeah. Perhaps the deepest lesson that I've learned in the last 10 years, and probably, you know, all of you have learned some form of it as well, is there can be this kind of seeming consensus, these things that kind of everyone knows, that, I don't know, seem sort of wise, seem like they're common sense, but really, they're just kind of herding behavior masquerading as maturity and sophistication. - Mm-hmm. - And then the consensus can change overnight. And when you've seen it happen a number of times- you suspected it, but you didn't really bet on it, and you're like, "Oh man, I kind of thought this, but what do I know? How can I be right and all these people wrong?" You see that a few times, then you just start saying, "Nope, this is the bet we're gonna make. I don't know for sure if we're right, but, like, just ignore all this other stuff. See it happen, and, I don't know, even if you're right 50% of the time, being right 50% of the time contributes so much," right? You're adding so much that is not being added by anyone else. - Yeah. - And it feels like that's where we are today with some safety stuff, where there's like a consensus view that a lot of this safety stuff is unusual or doesn't naturally fall out of the technology. And then at Anthropic, we do all of this research where weird safety and misalignment problems fall out as a natural dividend of the tech we're building, so it feels like we're in that counter-consensus view right now. - But I feel like that has been
shifting over the past, even just like 18- - We've been helping to shift- - We've definitely been helping. - No, I mean- - Yeah. - By publishing and doing research. - Constant publishing. - This constant force. Yeah. - But I also think, just, world sentiment around AI has shifted really dramatically, and, you know, it's more common in the user research that we do to hear just customers, regular people say, "I'm really worried about what the impact of AI on the world more broadly is going to be." And sometimes that means, you know, jobs or bias
or toxicity, but it also sometimes means like, is this just gonna mess up the world, right? How is this gonna contribute to fundamentally changing how humans work together, operate? Which is, I wouldn't have predicted that actually, you know? - Oh yeah. But yeah, for whatever reason, it seems like people in the ML research sphere have always been more pessimistic about AI becoming very powerful- - Yeah. - Than the general public. - Maybe it's a weird- - General public's just like- - Humility or something, yeah. - When Dario and I went to the White House in
2023, in that meeting, Harris and Raimondo and stuff- I'll paraphrase- basically said, like, "We've got our eye on you guys. Like, AI's gonna be a really big deal and we're now actually paying attention," which is- - And they're right. - They're right. - They're absolutely right. Absolutely right. - But I think in 2018 you wouldn't have been like, "The President will call you to the White House to tell you they're paying close attention to the development of language models." - Yeah. - Which is like a crazy place- - That was not
on the bingo card. - That was like 2018. - One thing that I think is interesting, too, just is, I guess, like all of us kind of got into this when it didn't seem like there was- - Mm-hmm. - Like we thought that it could happen, but yeah, it was like Fermi being skeptical of the atomic bomb. It was like he was just a good scientist and there was some evidence that it could happen, but there also was a lot of evidence against it happening. - Mm-hmm. - And he, I guess, decided that it would
be worthwhile because if it was true, then it would be a big deal. And I think for all of us, it was like, yeah, like 2015, 2016, 2017, there was some evidence and increasing evidence that this might be a big deal, but I remember in 2016, like talking to all my advisors- - Yeah, yeah. - And I was like, "I've done startup stuff. I wanna help out with AI safety, but I'm not great at math. I don't exactly know how I can do it." And I think at the time people either were like, "Well, you
need to be super good at decision theory in order to help out." And I was like, "Eh, that's probably not gonna work." Or they were like, "It doesn't really seem like we're gonna get some crazy AI thing," and so I had only a few people, basically, that were like, "Yeah, okay, that seems like a good thing to do." - I remember in 2014 making graphs of ImageNet results over time when I was a journalist and trying to get stories published about it, and people thought I was completely mad. And then I remember in 2015 trying to
persuade Bloomberg to let me write a story about NVIDIA, because every AI research paper had started mentioning the use of GPUs, and they said that was completely mad. And then in 2016 when I left journalism to go into AI, I have these emails saying, "You're making the worst mistake of your life," which I now occasionally look back on, but it was like it all seemed crazy at the time, from many perspectives, to go and take this seriously, that scaling was gonna work, and something was maybe different about the technology paradigm. - You're like Michael Jordan
and that coach that didn't believe in him in high school. - How did you actually make the decision, though? Was it- did you feel torn, or was it obvious to you? - I did a crazy counter-bet where I said, "Let me become your full-time AI reporter and double my salary," which I knew that they wouldn't say yes to. And then I went to sleep and then I woke up and resigned. It was all fairly relaxed. - You're just a decisive guy. - In that instance, I was. I think it's because I was, like, going to
work, reading arXiv papers, and then printing arXiv papers off and coming home and reading arXiv papers, including Dario's papers from the Baidu stuff, and being like, "Something completely crazy is happening here." And at some point I thought you should bet with conviction- which I think everyone here has done in their careers- just betting with conviction that this is gonna work. - Yeah. - I definitely was not as decisive as you. I spent, like, six months flip-flopping, like, "Okay, should I, actually? Should I do it? Should I try to do a startup? Should I
try to do this thing?" - But I also feel like back then, there wasn't as much talk of engineers and the impact that an engineer can have on AI, right? - Yeah, yeah, no way. - That feels so natural to us now. And we're in the same sort of talent race for engineers of all different types, but at the time, it was like, you're a researcher, and researchers are the only people that can work on AI. - Yeah. - So I don't think it was crazy that you were spending time thinking about that. - Yeah. Yeah,
and I think that that was basically the thing that got me to join OpenAI, was like I messaged the people there and they were like, "Yeah, we actually think that you can help out-" - Yeah. - "By doing engineering work." - Yeah. - "And that you can help out with AI safety in that way." - Mm-hmm. - Which I think there hadn't really been an opportunity for that, so that was what- - That's right. - Brought me there. - You were my manager at OpenAI. - I was, that's right. - I think I joined after
you'd been there for a while. - A little bit. - 'Cause I was at Brain for a bit. - Yeah. - I don't know if I ever asked you what it was that got you to join? - Yeah, so I had been at Stripe for about five and a half years and I knew Greg; he had been my boss at Stripe for a while, and I actually introduced him and Dario, because when he was starting OpenAI, I said, "The smartest person that I know is Dario. You would be
really lucky to get him." So Dario was at OpenAI, I had a few friends from Stripe that had gone there, too. And I think sort of like you, I'd been thinking about what I wanted to do after Stripe. I had gone there just 'cause I wanted to get more skills after working in, you know, nonprofit and international development, and I actually thought I was gonna go back to doing that. Like, essentially, I had always been working. I was like, "I really wanna help people that have, you know, less than I do," but I didn't have
the skills when I was doing it before Stripe. - Yep. - And so I looked at going back to public health. I thought about going back into politics very briefly, but I was also looking around at other tech companies and other sort of ways of having impact, and OpenAI, at the time, felt like it was a really nice intersection. It was a nonprofit, they were working on this really big, lofty mission. I really believed in sort of the AI, you know, potential, because, I mean, I know Dario a little bit, and so he was- -
And they needed management help. - They definitely needed help. That is a fact. And so I think that it felt very me-shaped, right? I was like, "Oh, there's this nonprofit and there's all these really great people with these really good intentions, but it seems like they're a little bit of a mess." - Yeah. And that felt really exciting to me, to get to come in and- even, you know, I was just such a utility player, right? I was running people, but I was also running some of the technical teams- - Scaling orgs, yep. - Yeah,
the scaling org, I worked on the language team, I took over- - You worked on policy- - I worked on some policy stuff, I worked with Chris, and I felt like there was just so much goodness in so many of the employees there, and I felt a very strong desire to come and sort of just try to help make the company a little more functional. - I remember towards the end, after we'd done GPT-3, you were like, "Have you guys heard of something called trust and safety?" - Yes, I remember that! That did happen. -
Yeah. - Yeah. - I said, you know, "I used to run some trust and safety teams at Stripe. There's a thing called trust and safety that you might wanna consider for a technology like this." And it's funny because it sort of is the intermediary step between, you know, AI safety research, right- which is how do you actually make the model safe- and something just much more practical. I do think there was a value in saying, you know, this is gonna be a big thing; we also have to be doing this sort of practical work day
to day to build the muscles for when things are gonna be a lot higher stakes. - That might be a good transition point to talk about things like the responsible scaling policy and how we came up with that, or why we came up with it and how we're using it now, especially given how much trust and safety work we do on today's models. So whose idea was the RSP? Was it you and Paul? - So, yeah, it was me and Paul- Paul Christiano- we first talked about it in late 2022. - Mm-hmm. - First it
was like, oh, should we cap scaling at a, you know, particular point until we've discovered how to solve certain safety problems? And then it was like, well, you know, it's kind of strange to have this one place where you cap it and then you uncap it, so let's have a bunch of thresholds, and then at each threshold you have to do certain tests to see if the model is capable and you have to take increasing safety and security measures. But originally we had this idea, and then the thought was just look, you know, this'll go
better if it's done by some third party. Like, we shouldn't be the ones to do it, right? It shouldn't come from one company, 'cause then other companies are less likely to adopt it. So Paul actually went off and designed it, and, you know, many, many features of it changed and we were kind of, on our side, working on how it should work. And, you know, once Paul had something together, then pretty much immediately after he announced the concept, we announced ours within a month or two. I mean, many of us were heavily involved in it.
I remember writing at least one draft of it myself, but there were several drafts of it. - There were so many drafts. - I think it's gone through the most drafts of any doc. - Yeah. - Which makes sense, right? It's like- I feel like it is, in the same way that the US treats the Constitution as, like, the holy document. - Yeah. - Which, like, I think is just a big thing that, like, strengthens the US. - Yes. - And, like, we don't expect the US to go off the rails, in part, because
just like every single person in the US is like, "The Constitution is a big deal, and if you tread on that-" - Yeah. - "Like, I'm mad." - Yeah, yeah. - Like, I think that the RSP is our- like, it holds that place. It's like the holy document for Anthropic. So it's worth doing a lot of iterations to get it right. - Some of what I think has been so cool to watch about the RSP development at Anthropic, too, is it feels like it has gone through so many different phases and there's so many different skills
that are needed to make it work, right? There's like the big ideas, which I feel like Dario and Paul and Sam and Jared and so many others are like, "What are the principles? Like what are we trying to say? How do we know if we're right?" But there's also this very operational approach to just iterating where we're like, "Well, we thought that we were gonna see this at this, you know, safety level, and we didn't, so should we change it so that we're making sure that we're holding ourselves accountable?" And then there's all kinds of
organizational things, right? We just were like, "Let's change the structure of the RSP organization for clearer accountability." And I think my sense is that for a document that's as important as this, right, I love the Constitution analogy, it's like there's all of these bodies and systems that exist in the US to make sure that we follow the Constitution, right? There's the courts, there's the Supreme Court, there's the presidency, there's, you know, both houses of Congress and they do all kinds of other things, of course, but there's like all of this infrastructure that you need around
this one document, and I feel like we're also learning that lesson here. - I think it sort of reflects a view a lot of us have about safety, which is that it's a solvable problem. - Mm-hmm. - It's just a very, very hard problem that's gonna take tons and tons of work. - Mm-hmm. - Yeah. - All of these institutions that we need to build up, like there's all sorts of institutions built up around like automotive safety, built up over many, many years. But we're like, "Do we have the time to do that? We've gotta
go as fast as we can to figure out what the institutions we need for AI safety are, and build those and try to build them here first, but make it exportable." - That's right. - It forces unity also, because if any part of the org is not kind of in line with our safety values, it shows up through kind of the RSP, right? The RSP is gonna block them from doing what they want to do, and so it's a way to remind everyone over and over again, basically, to make safety a product requirement, part
of the product planning process. And so, like, it's not just a bunch of kind of like bromides that we repeat; it's something that you actually- if you show up here and you're not aligned, you actually run into it. - Yeah. - And you either have to learn to get with the program or it doesn't work out. - Yeah. - The RSP's become kind of funny over time because we spend thousands of hours of work on it, and then I go and talk to senators and I explain the RSP, and I'm like, "We have some stuff
that means it's hard to steal what we make, and also that it's safe." And they're like, "Yes, that's a completely normal thing to do. Are you telling me not everyone does this?" You're like, "Oh, okay, yeah." Yeah. - It's half true that not everyone does this. - Yeah. - But it's kind of amazing, because we've spent so much effort on it here, and when you boil it down, they're like, "Yes, that sounds like a normal way to do that." - Yeah, that sounds good. - That's been the goal. Like Daniela was saying, "Let's
make this as boring and normal as possible. Like, let's make this a finance thing." - Yeah, imagine it's like an audit. - Yeah, yeah. - Right? Yeah. - No, boring and normal is what we want, certainly in retrospect. - Yeah. Well also, Dario, I think in addition to driving alignment, it also drives clarity- - Mm-hmm. - Because it's written down what we're trying to do, and it's legible to everyone in the company, and it's legible externally, what we think we're supposed to be aiming towards from a safety perspective. It's not perfect. We're iterating on it, we're making
it better, but I think there's some value in saying like, "This is what we're worried about, this thing over here." Like you can't just use this word to sort of derail something in either direction, right? To say, "Oh, because of safety, we can't do X, or because of safety, we have to do X." We're really trying to make it clearer what we mean. - Yeah, it prevents you from worrying about every last little thing under the sun. - That's right. - Because it's actually the fire drills that damage the cause of safety in the long
run. - Right. - I've said like, "If there's a building, and, you know, the fire alarm goes off every week, like, that's a really unsafe building." - Mm-hmm. - 'Cause when there's like actually a fire, you're just gonna be like- - No one's gonna care. - "Oh, it just goes off all the time." So- - Yeah. - It's very important to be calibrated. - Yeah. - That's right. - Yeah. A slightly different frame that I find kind of clarifying is that I think that the RSP creates healthy incentives at a lot of levels. - Mm. -
So I think internally it aligns the incentives of every team with safety because it means that if we don't make progress on safety, we're gonna block. I also think that externally it creates a lot of healthier incentives than other possibilities, at least that I see, because it means that, you know, if we at some point have to take some kind of dramatic action, like, if at some point we have to say, "You know, our model, we've reached some point and we can't yet make a model safe," it aligns that with sort of the point where
there's evidence that supports that decision and there's sort of a preexisting framework for thinking about it, and it's legible. And so I think there are a lot of levels at which the RSP- I think in ways that maybe I didn't initially understand when we were talking about the early versions of it- creates a better framework than any of the other ones that I've thought about. - I think this is all true, but I feel like it undersells sort of, like, how challenging it's been to figure out what the right policies and evaluations and lines should be. I think that we have and continue to sort of iterate a lot on that, and I think there's also a difficult question of- you could be at a point where it's very clear something's dangerous or very clear that something's safe, but with a technology that's so new, there's actually, like, a big gray area. And so- like, all the things that we're saying are things that made me really, really excited about the RSP at the beginning, and still do, but also I think
enacting this in a clear way and making it work has been much harder and more complicated than I anticipated. - Yeah, I think this is exactly the point. Like- - Yeah. - Like the gray areas are impossible to predict. There's so many of them. Like, until you actually try to implement everything, you don't know what's going to go wrong. So what we're trying to do is go and implement everything so we can see as early as possible what's going to go wrong, so- - Yeah, you have to- - The gray areas are- - You have
to do three or four passes before- - Yeah. - Yeah. - Before you really, really get it right. Like, iteration is just very powerful and, you know, you're not gonna get it right on the first time. And so, you know, if the stakes are increasing, you want to get your iterations in early; you don't want to get them in late. - You're also building the internal institutions and processes, so the specifics might change a lot, but building the muscle of just doing it is the really valuable thing. - I'm responsible for, like, compute at Anthropic,
and so- - That's important. - So, thank you. I think so. So, like, for me, I guess we have to deal with external folks- - Yeah. - And different external folks are kind of at different points on the spectrum of, like, how fast they think stuff is gonna go. - Mm-hmm. - And, like, I think that's also been a thing, where I started out not thinking stuff would be that fast and have changed over time. And so, I have sympathy for that. And so I think the RSP has been extremely useful for me in
communicating with people who think that things might take longer because then we have a thing where it's like, we don't need to do extreme safety measures until stuff gets really intense, and then they might be like, "I don't think stuff will get intense for a long time." And then I'll be like, "Okay, yeah, we don't have to do extreme safety measures." And so that makes it a lot easier to communicate with other folks externally. - Yeah, yeah, it makes it like a normal thing you can talk about, rather than something really strange. - Yeah. -
Yeah. How else is it showing up for people? You are- - Evals, evals, evals. - Good. - It's all about evals. Everyone's doing evals. Like, your training team is doing evals all the time. We're trying to figure out, like, has this model gotten enough better that it has the potential to be dangerous? So how many teams do we have that are evals teams? You have Frontier Red Team. There must be, I mean there's a lot of people- - Every team produces evals, basically. - And that means you're just measuring against the RSP, like measuring for
certain signs of things that would concern you or not concern you. - Exactly. Like it's easy to lower bound the abilities of a model, but it's hard to upper bound them, so we just put tons and tons of research effort into saying, like, "Can this model do this dangerous thing or not? Maybe there's some trick that we haven't thought of, like chain of thought or best-of-n or some kind of tool use that's gonna make it so it can help you do something very dangerous." - It's been really useful in policy, because it's been a really
abstract concept, what safety is, and when I'm like, "We have an eval which changes whether we deploy the model or not," then you can go and calibrate with policymakers or experts in national security or some of these CBRN areas that we work in, to actually help us build evals that are well-calibrated and that, counterfactually, just wouldn't have happened otherwise. Once you've got the specific thing, people are a lot more motivated to help you make it accurate, so it's been useful for that. How has it shown up for- - The RSP shows up for me,
for sure. Often. I actually think, weirdly, the way that I think about the RSP the most is like what it sounds like- - Mm-hmm? - Just like the tone. I think we just did a big rewrite of the tone of the RSP because it felt overly technocratic, and even a little bit adversarial. I spent a lot of time thinking about like, how do you build a system that people just wanna be a part of? - Mm-hmm. - Right? It's so much better if the RSP is something that everyone in the company can walk around and
tell you, you know, just like with OKRs like we do right now- - Yeah. - Like, what are the top goals of the RSP? How do we know if we're meeting them? What AI safety level are we at right now? Are we at ASL-2? Are we at ASL-3? That people know what to look for because that is how you're going to have good, common knowledge of if something's going wrong, right? If it's overly technocratic, and it's something that only particular people in the company feel is accessible to them, it's just like not as productive, right?
And I think it's been really cool to watch it sort of transition into this document where I actually think most, if not everybody at the company, regardless of their role, could read it and say, "This feels really reasonable. I wanna make sure that we're building AI in the following ways, and I see why I would be worried about these things, and I also kind of know what to look for if I bump into something," right? It's almost like make it simple enough that if you are working at a manufacturing plant and you're like, "Huh, it
looks like the safety seatbelt on this should connect this way, but it doesn't connect," that you can spot it. - Mm-hmm. - And that there's just like healthy feedback flow between leadership and the board and the rest of the company and the people that are actually building it, because I actually think the way this stuff goes wrong in most cases is just, like, the wires don't connect or like they get crossed, and that would just be like a really sad way for things to go wrong, right? It's just all about operationalizing it, making it easy
for people to understand. - Yeah, the thing I would say is none of us wanted to found a company. We just felt like it was our duty, right? - It felt like we had to. - Like, we have to do this thing. This is the way we're gonna make things go better with AI. Like that's also why we did the pledge, right? - Yeah. - Because we're like the reason we're doing this is it feels like our duty. - I wanted to invent and discover things in some kind of beneficial way. That was how I
came to it, and that led to working on AI, and AI required a lot of engineering and eventually AI required a lot of capital, but what I found was that if you don't do this in a way where you're setting the environment, where you set up the company, then a lot of it gets done in a way that repeats the same mistakes that I found so alienating about the tech community. It's the same people, it's the same attitude, it's the same pattern-matching. And so at some point it just seemed inevitable that we needed to do
it in a different way. - When we were hanging out in graduate school, I remember you had kind of this whole program of trying to figure out how to do science in a way that would sort of advance the public good. And I think that's pretty similar to how we think about this. I think you had this, like, Project Vannevar or something, to do that. I was a professor. I think basically I just looked at the situation and I was convinced that AI was on a very, very, very steep trajectory in terms of impact, and it didn't seem like, because of the necessity for capital, I could continue doing that as a physics professor, and I kind of wanted to work with people that I trusted in building an institution to try to make AI go well. But yeah, I would never recommend founding a company. Or really want to do it. I mean, yeah, I think it's just a means to an end. I mean, I think that's, like, usually how things go well, though. If you're doing something just to sort of, like, enrich yourself or gain power- like, you have to
sort of actually care about accomplishing a real goal in the world and then you find whatever means you have to. - Well, something I think about a lot as just a strategic advantage for us is, I mean, it sounds really funny to say, but just like how much trust there is at this table, right? - Mm-hmm. - Like I think that's not, I mean, Tom, you were at other startups. I was never a founder before, but it's actually really hard to get a group of, like, a big group of people, to have like the same
mission. Right? And I think the thing that I feel like the happiest about when I come into work, and probably the most proud of at Anthropic, is how well that has scaled to a lot of people. It feels to me like in this group and with the rest of leadership, everyone is here for the mission, and our mission is really clear- - Yep. - And it's very pure, right? And I think that is something that I don't see as often, to Dario's point, in sort of the tech industry. It feels like there's just a wholesomeness
to what we're trying to do. Like, no, I agree, none of us were like, "Let's just go found a company!" I felt like we had to do it, right? It just felt like we couldn't keep doing what we were doing at the place we were doing it. We had to do it by ourselves. - And it felt like with GPT-3, you know, which all of us had like touched or worked on, and scaling laws and everything else, we could see it in front of us in 2020. And it felt like, well, if we don't do
something soon, all together, you're gonna hit the point of no return. And you have to do something to have any ability to change the environment. - Mm-hmm. - I think, building off what Daniela said, I do think that there's just, like, a lot of trust- - Mm-hmm. - In this group. I think each of us knows that we got into this because we wanna help out with the world. - Yeah. - We did the 80% pledge thing, and that was, like, a thing that everybody was just like, "Yes, obviously we're gonna do this." - Mm-hmm. -
Yeah, yeah. - And yeah, I do think that the trust thing is a special thing that's extremely rare. - Yeah. - I credit Daniela with keeping the bar high. I credit you with the fact- - Keeping out the clowns. Keeping out the clowns. - Chief clown wrangler! That's my job. - No, but you're the reason the culture scaled, I think. - Yeah. People say how nice people are here. - Yeah. - Which is actually a wildly important thing. - I think Anthropic is really low politics, and of course, we all have a different vantage point
than average, and I try to remember that. - It's because of low ego. - But it's low ego, and I do think, with our interview process and just the type of people who work here, like, there's almost, like, an allergic reaction to politics. - And unity. - Mm-hmm. - Unity is so important. The idea that the product team, the research team- - Yes. - The trust and safety team, you know, the go-to-market team, the policy team- - Yeah. - Like, the safety folks, they're all trying to contribute to kind of the same goal,
the same mission of the company, right? - Yes. - I think it's dysfunctional when different parts of the company think they're trying to accomplish different things- - Yeah. - Think the company's about different things or think that other parts of the company are trying to undermine what they're doing. - Yeah. - And I think the most important thing we've managed to preserve is- and again, things like the RSP drive it- this idea that it's not, you know, that there are some parts of the company causing damage and other parts of the company trying to repair it, but
that there are different parts of the company doing different functions and that they all function under a single theory of change. - Extreme pragmatism, right? - Yeah. - You know, the reason I went to OpenAI in the first place, you know, it was a nonprofit, it was a place where I could go and focus on safety, and I think over time, you know, that maybe wasn't as good a fit and there were some difficult decisions. And I think, in a lot of ways, I really trusted Dario and Daniela on that, but I didn't want to
leave. That was something that I think I was actually pretty reluctant to go along with, because, for one thing, I didn't know that it was good for the world to have more AI labs. And I think it was something that I was pretty, pretty reluctant about. And I think, as well, when we did leave, I was, you know, I was reluctant to start a company. I think I was arguing for a long time that we should do a nonprofit instead, and just focus on safety research. - Yes. - And I think
it really took pragmatism and confronting the constraints and just being honest about what the constraints implied for accomplishing that mission- - Mm-hmm. - That led to Anthropic. - I think just a really important lesson that we were good about early on is: make fewer promises and keep more of them. - Yeah. - Right? - Yeah. - Like, try to be calibrated, be realistic, confront the trade-offs, because, you know, trust and credibility are more important than any particular policy. - Yeah. - It is so unusual to have what we have, and watching Mike Krieger defend safety
things- like reasons why we shouldn't ship a product yet- but also then to watch Vinay sort of say, like, "Okay, we have to do the right thing for the business. Like how do we get this across the finish line?" - Mm-hmm, yep. - And to hear, you know, people deep in the technical safety org talking about how it's also important that we build things that are practical for people, and hearing, you know, engineers on inference talk about safety. That's amazing. Like, I think that is, again, one of the most special things about working
here- is that everybody, with that unity, is prioritizing the pragmatism, the safety, the business. That's wild. - The safest move- - I think about it as- - Yeah. - Spreading the trade-offs- - Yeah. - From just the leadership of the company to everyone, right? - Yeah. - I think the dysfunctional world is like, you have a bunch of people who each only see a piece- you know, safety is like, "We always have to do this," and product is like, "We always have to do this," and research is like, you know, "This is the only thing we care about." And then you're stuck at the top, right? - Yeah. - You're stuck at the top. You have to decide between them, and you don't have as much information as either of them. That's the dysfunctional world. The functional world is when you're able to communicate to everyone, "There are these trade-offs we're all facing together." - Yeah. - The world is a far from perfect place. There are just trade-offs. Everything you do is gonna be suboptimal. Everything you do is gonna be some attempt to get the best of both worlds that, you know, doesn't work out as well as you thought it would, and everyone is on the same page about confronting those trade-offs together, and they just feel like they're confronting them from a particular post. - Mm-hmm. - From a particular job, as part of the overall job of confronting all the trade-offs. - It's a bet on a race to the top, right? - It's a bet on a race to the top, yeah. - Like it's not a pure upside bet. Things could go wrong, but- - Yeah. - Like, we're all aligned on like, "This is the bet that we're making." - And markets are pragmatic,
so the more successful Anthropic becomes as a company, the more incentive there is for people to copy the things that make us successful. And the more that success is tied to actual safety stuff we do, the more it just creates a gravitational force in the industry that will actually get the rest of the industry to compete. And it's like, "Sure, we'll build seat belts and everyone else can copy them." That's good. Yeah. - That's, like, the good world. - That's really good. - Yeah. This is the race to the top, right? But if you're saying, "Well, we're not gonna build the technology, or we're not gonna build it better than someone else," in the end, that just doesn't work, because you're not proving that it's possible to get from here to there. - Mm-hmm. - Where the world needs to get- never mind the industry, never mind one company- is it needs to get us successfully from "this technology doesn't exist" to "the technology exists in a very powerful way and society has actually managed it." And I think the only way that's gonna happen is if, at the level of a single company,
and eventually at the level of the industry, you're actually confronting those trade-offs. You have to find a way to actually be competitive, to actually lead the industry, in some cases, and yet manage to do things safely. And if you can do that, the gravitational pull you exert is so great. There's so many factors, from the regulatory environment, to the kinds of people who wanna work at different places, to, even sometimes, the views of customers- - Yeah. - That kind of drive in the direction of if you can show that you can do well on safety
without sacrificing competitiveness, right, if you can find these kind of win-wins, then others are incentivized to do the same thing. - Yeah, I mean I think that's why getting things like the RSP right is so important, because I think that we ourselves, seeing where the technology is headed, have often thought, "Oh wow, we need to be really careful of this thing," but at the same time we have to be even more careful not to be crying wolf, saying that like, "Innovation needs to stop here." We need to sort of find a way to make AI
useful, innovative, delightful for customers, but also figure out what the constraints really have to be that we can stand behind, that make systems safe, so that it's possible for others to think that they can do that too, and they can succeed, they can compete with us. - We're not doomers, right? Like, we wanna build the positive thing. - Yeah. - We wanna, like, build the good thing. - And we've seen it happen in practice. A few months after we came out with our RSP, the three most prominent AI companies had one, right? Interpretability research- that's
another area where we've done it. Just the focus on safety overall, like collaboration with the AI safety institutes, other areas. - Yeah, the Frontier Red Team got cloned almost immediately, which is good. You want all the labs to be testing for, like, very, very security-scary risks. - Export the seat belts. - Yeah, exactly. - Mm-hmm, export the seat belts. Well, Jack also mentioned it earlier, but customers also really care about safety, right? Customers don't want models that are hallucinating. They don't want models that are easy to jailbreak. They want models that are helpful and harmless, right?
- Yeah. - And so a lot of the time, what we hear in customer calls is just, "We're going with Claude because we know it's safer." I think that also has a huge market impact, right, because our ability to have models that are trustworthy and reliable, that matters for the market pressure that it puts on competitors, too. - Maybe to unpack something that Dario said a little bit more, I think there's this narrative or this idea that maybe the virtuous thing is to almost, like, nobly fail, right? It's like you should go and, like, put safety first- you should sort of demonstrate it in an unpragmatic way, so that you can sort of demonstrate your purity to the cause or something like this. And I think if you do that, it's actually very self-defeating. For one thing, it means that you're gonna have the people who are deciding, making decisions, be self-selected for being people who don't care- people who aren't prioritizing safety and who don't care about it. And I think, on the other hand, if you try really hard to find the way to align
the incentives and make it so that if there are hard decisions, they happen at the points where there is the most force to go and support making the correct hard decisions, and where there's the most evidence, then you can sort of start to trigger this race to the top that Dario is describing, where instead of going and having, you know, the people who care get pushed out of influence, you instead pull other people to have to go and follow. - So what are you all excited about when it comes to the next things we'll be
working on? - Well, I think there's a bunch of reasons you can be excited about interpretability. One is obviously safety, but there's another one that I find, at an emotional level, you know, equally exciting or equally meaningful to me, which is just that I think neural networks are beautiful, and I think that there's a lot of beauty in them that we don't see. We treat them like these black boxes- we're not particularly interested in the internal stuff- but when you start to go and look inside them, they're just full of amazing, beautiful
structure. You know, it's sort of like if people looked at biology and they were like, "You know, evolution is really boring. It's just a simple thing that goes and runs for a long time, and then it makes animals." But instead, it's like, actually, each one of those animals that evolution produces- and it's an optimization process, like training a neural network- is full of incredible complexity and structure. And we have an entire sort of artificial biology inside of neural networks. If you're just willing to look inside
them, there's all of this amazing stuff. And I think that we're just starting to slowly unpack it, and it's incredible, and there's so much there- there's just so much still to discover there. We're just starting to crack it open, and I think it's gonna be amazing and beautiful. And sometimes I imagine, you know, like a decade in the future, walking into a bookstore and buying, you know, the textbook on neural network interpretability or really, like, on the biology of neural networks, and just the kind of wild things that are gonna be inside of it. And
I think that in the next decade, in the next couple of years even, we're gonna go and start to go in and really discover all of those things. And it's gonna be wild and incredible. - It's also gonna be great that you get to buy your own textbook. - Just have your face on it. - I mean, yeah. - I'm excited that- a few years ago, if you had said, like, "Governments will set up new bodies to test and evaluate AI systems and they will actually be competent and good," you would not have thought that was
going to be the case. But it's happened, it's kind of like governments have built these new embassies, almost, to deal with this new kind of class of technology or, like, thing that Chris studies. And I'm just very excited to see where that goes. I think it actually means that we have state capacity to deal with this kind of societal transition, so it's not just companies. And I'm excited to help with that. - I'm already excited about this to, you know, to a certain extent today, but I think just imagining the future world of what AI
is going to be able to do for people- it's impossible not to feel excited about that. Dario talks about this a lot, but I think even just the sort of glimmers of Claude being able to help with, you know, vaccine development and cancer research and biological research are crazy- like, just to be able to watch what it can do now. But when I fast forward, you know, three years in the future or five years in the future, imagining that Claude could actually solve so many of the fundamental problems that we just face as humans,
even just from a health perspective alone, even if you sort of take everything else out, feels really exciting to me, just, like, thinking back to my international development times. It would be amazing if Claude were responsible for helping to do a lot of the work that I was trying to do a lot less effectively when I was, like, 25. - I mean, I guess similarly, I'm excited to build Claude for work. Like, I'm excited to build Claude into the company and into companies all over the world. - I guess I'm excited just, I guess,
like, personally- I like using Claude a lot at work, and so, like, definitely, there's been increasing amounts of, like, home time with me just, like, chatting with Claude about stuff. I think the biggest recent thing has been code- - Mm-hmm. - Where, like, six months ago, I didn't use Claude to do any coding work. Like, our teams didn't really use Claude that much for coding, and now it's just, like, a phase change. Like, I gave a talk at YC, like, the week before last, and at the beginning I just asked, like, "Okay, so how many
folks here use Claude for coding now?" And literally 95% of hands went up- - Wow. - Wow. - Like, all the hands in the room, which is just totally different than how it was four months ago or whatever. - Mm-hmm. - Yeah. - So when I think about what I'm excited about, I think about places where, you know, like I said before, there's this kind of consensus that, again, seems like consensus, seems like what everyone who's wise thinks, and then it just kind of breaks- and so places where I think that's about to happen and it hasn't
happened yet. One of them is interpretability. I think interpretability is both the key to steering and making safe AI systems- and we're about to understand them- and interpretability contains insights about intelligence and optimization and about how the human brain works. I've said, and I'm really not joking, Chris Olah is gonna be a future Nobel Medicine Laureate. - Aw, yeah. - I'm serious. I'm serious, because- I used to be a neuroscientist- a lot of these mental illnesses, the ones we haven't figured out, right, schizophrenia or the mood disorders, I suspect there's some higher-level system thing going on, and it's hard to make sense of those with brains, because brains are so mushy and hard to open up and interact with. Neural nets are not like this; they're not a perfect analogy, but as time goes on, they will be a better analogy. That's one area. Second is, related to that, I think just the use of AI for biology. Biology is an incredibly difficult problem. People continue to be skeptical, for a number of reasons. I think that consensus is starting to break. We saw a Nobel Prize in Chemistry awarded
for AlphaFold; remarkable accomplishment. We should be trying to build things that can help us create a hundred AlphaFolds. And then finally, using AI to enhance democracy. We worry that if AI is built in the wrong way, it can be a tool for authoritarianism. How can AI be a tool for freedom and self-determination? I think that one is earlier than the other two, but it's gonna be just as important. - Yeah, I mean, I guess two things that at least connect to what you were saying earlier. I mean, one is I feel like people frequently join
Anthropic because they're sort of scientifically really curious about AI and then kind of get convinced by AI progress to sort of share the vision of the need, not just to advance the technology, but to understand it more deeply and to make sure that it's safe. And I feel like it's actually just sort of exciting to have people that you're working with kind of more and more united in their vision for both what AI development looks like and the sort of sense of responsibility associated with it. And I feel like that's been happening a lot due
to a lot of advances that have happened in the last year, like what Tom talked about. Another is that, I mean, going back really to concrete problems, I feel like we've done a lot of work on AI safety up until this point. A lot of it's really important, but I think we're now, with some recent developments, really getting a glimmer of what kinds of risks might literally come about from systems that are very, very advanced- - Mm-hmm. - So that we can investigate and study them directly with interpretability, with other kinds of safety mechanisms, and
really understand what the risks from very advanced AI might look like. And I think that that's something that is really gonna allow us to sort of further the mission in a really deeply scientific, empirical way. And so I'm excited about sort of the next six months of how we use our understanding of what can go wrong with advanced systems to characterize that and figure out how to avoid those pitfalls. - Perfect. Fin! - Yay! - We did it! - Woo! - Good time. - I know. We gotta do this more often.