"Anthropic CEO admits we have no idea how AI works." I saw this headline, and in the comment section on Reddit, one of the top comments says, "Speak for yourself. I have a good idea of how it works." I love that. I love the idea that the Anthropic CEO does not know, but this random Redditor named Professor Aftercare? Okay, he does.
He knows all the secrets. And for some reason, that comment has seven upvotes. Why does that have seven upvotes?
Who are you seven people who clearly did not read the article? Cuz if you did, you would not be upvoting that. But fear not, you don't have to read the article.
You can instead waste your time watching a 10-minute YouTube video saying the exact same thing. Welcome. No, but I think that that interaction just shows we are all on really, really different pages about what it means to understand why something works.
Now, I think something we can all agree on is that we as a society do understand the bare bones of how large language models, LLMs, work. If we didn't understand that, then all these companies couldn't be trying to create their own bigger and better models. But what I want to convince you of is this: the capabilities of AI that are the most impressive, the ones that are changing society the quickest ("Can you edit this photo of me?", "Can you summarize these text messages?"), the ones where you really feel like the AI understands, those are the ones that we as a society do not yet understand.
And by the way, I know that AI is a super overloaded term. I'm specifically using the word AI the way we do in casual conversation, which generally means generative AI, as opposed to a more technical definition, which would be anything intelligent. And I'm mostly going to be using examples from large language models, because those are the most well researched and I think they're the easiest to understand.
And really quickly, this is not going to be a super technical video. I don't have an AI research background myself. I'm a software engineer.
I work in AI the same way I think a lot of developers work in AI right now, which is that I'm really good at calling the API that runs the model. I promise that at no point during this video is there going to be matrix multiplication on the screen that you have to follow along with. You're not going to have to remember your college linear algebra.
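Just to make the "calling the API" part concrete, here's a minimal sketch, assuming you're using the OpenAI Python SDK with an API key set up. The model name is just a placeholder, not a recommendation:

```python
# A minimal sketch of "calling the API that runs the model," using the OpenAI
# Python SDK as one example. The model name is just a placeholder.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY to be set in your environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder: swap in whichever model you actually use
    messages=[
        {"role": "user", "content": "Can you summarize these text messages for me? ..."},
    ],
)

print(response.choices[0].message.content)
```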
So before we get into what we don't understand, let's talk about what we do understand. As a lot of you know, LLMs are next-word prediction models. That means you feed in a sentence like "the dog is" and the model's job is to fill it in with a word like "fluffy."
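If you want a picture of what next-word prediction means, here's a toy sketch with a hand-written probability table. A real LLM learns probabilities like these over a huge vocabulary instead of having them hard-coded:

```python
# Toy illustration of next-word prediction (not a real language model):
# given a context, look up a hand-written table of probabilities and pick
# the most likely next word. A real LLM learns these probabilities instead.
next_word_probs = {
    "the dog is": {"fluffy": 0.42, "barking": 0.31, "asleep": 0.15, "purple": 0.01},
}

def predict_next(context: str) -> str:
    candidates = next_word_probs[context]
    # greedy decoding: just take the highest-probability word
    return max(candidates, key=candidates.get)

print(predict_next("the dog is"))  # -> "fluffy"
```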
That next-word prediction problem has been around for a while. The big breakthrough that basically made LLMs good is the paper "Attention Is All You Need," which came out of Google in 2017. And I know that saying it's what made LLMs good is an oversimplification, but basically what I mean is that this paper introduced the transformer model architecture.
And that is the model architecture that ChatGPT and basically all major language models are based on today. I know most of you already know this paper, but every single time I see it, I'm like, that sounds like a Lady Gaga song. That name is so good.
Anyway, I'm not going to get into the details here, but the thousand-foot view of that paper is that it introduces this transformer model architecture, and instead of going through the words one by one like older models, it looks at all the words in a given input sentence at the same time. So it sped up training a lot. It also allowed words to assign different levels of importance to other words in the sentence.
So a word can pay attention to the parts of the sentence that are the most relevant for understanding that specific word. Again, this is not a channel where I'm going to put the matrix multiplication up on the screen, but if you're interested in learning more about that, I can link some videos that go into the nitty-gritty.
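That said, if a rough picture helps, here's a tiny, completely optional sketch of the attention idea in plain Python: one word scores every word in the sentence for relevance and turns those scores into weights. The three-number "word vectors" are made up purely for illustration:

```python
import math

# Toy sketch of attention: one word scores every word in the sentence for
# relevance, then turns those scores into weights that sum to 1 (a softmax).
# The three-number "word vectors" are made up purely for illustration.
words = ["the", "dog", "is"]
vectors = {
    "the": [0.1, 0.0, 0.2],
    "dog": [0.9, 0.7, 0.1],
    "is":  [0.2, 0.1, 0.8],
}

def similarity(a, b):
    # how alike two word vectors are (a dot product, written as a plain loop)
    return sum(x * y for x, y in zip(a, b))

def attention_weights(query_word):
    scores = [similarity(vectors[query_word], vectors[w]) for w in words]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

for word, weight in zip(words, attention_weights("dog")):
    print(f"'dog' pays {weight:.2f} attention to '{word}'")
```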
My point of bringing all this up is that this paper had a specific goal. In their case, what they were actually trying to do was machine translation, translation from, I think it was English to German. They were not trying to enable things like summarizing text or reasoning. That was not part of their goal.
But somehow, introducing this model architecture, which allowed the models to get a lot bigger and training to get a lot faster, made the magic happen. So what do I actually mean by magic? What I'm talking about is these emergent capabilities.
In the context of AI, when people use the phrase emergent capabilities, what they generally mean is capabilities that only seem to show up once the model gets to a certain size. Some examples of these capabilities are following instructions, which is a huge one, right? A big part of prompting depends on the model following instructions. Others are answering questions truthfully to the best of its ability and language understanding. One example here is, "What is the capital of the state that has Dallas?" And the answer is Austin.
This is basically why so many people are using ChatGPT every day, why the sentences that ChatGPT and all the other models are producing are actually good sentences most of the time. And if this is still kind of vague for you, you can also think of more concrete examples.
One of them is doing arithmetic problems, and another is unscrambling a word. Those are both emergent capabilities that people generally see once the model gets above a certain size. Now, there's an open debate about whether those capabilities are actually emergent, meaning, did they only appear once the models got above a certain size, or were they kind of there all along? I don't think this is super relevant to my main argument here; I'm just telling you in case you Google emergent capabilities and the internet says they don't exist, that is why. The point overall is that LLMs have these capabilities.
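If you're wondering how you'd even check for a capability like unscrambling, one hedged sketch is to run the same prompt against a smaller and a bigger model and score the answers. The model names below are just placeholders for whatever you have access to, and it assumes the same OpenAI-style client as the earlier example:

```python
# Sketch of probing a capability: ask a smaller and a bigger model to
# unscramble a word and check the answers. Model names are placeholders;
# assumes the same OpenAI-style client as the earlier example.
from openai import OpenAI

client = OpenAI()
prompt = "Unscramble the letters 'pplea' into an English word. Reply with just the word."

for model_name in ["gpt-4o-mini", "gpt-4o"]:  # stand-ins for "smaller" vs. "bigger"
    reply = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
    )
    answer = reply.choices[0].message.content.strip().lower()
    print(model_name, "->", answer, "| correct:", answer == "apple")
```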
We've all seen these capabilities ourselves right in front of us. Like again, summarizing a text is not the same as just predicting the next word in a sentence. It does demonstrate some level of what I think a lot of people would call understanding.
I don't love the use of the word understanding, I have a whole video about that, but unfortunately I just don't have a better word for it right now. And these behaviors are not only not pre-programmed, as in nobody went in and manually wrote code saying "this is what summarizing means, and this is the behavior you should follow to summarize text." They're also, in most cases, something the model was not trained for. How, in those cases, do we have models with these capabilities? How is a model good at reasoning, or unscrambling a word, or telling the truth, if we never trained it for those things?
That is what we don't know. Kind of a meta point here, but it's a little bit hard to prove a negative. Like if my argument was that we did understand the AI models, maybe I could just explain it to you.
But because my point is the opposite, it's kind of hard for me to prove to you that not only do I not know, but nobody knows. And before somebody says, "Well, I built an AI model from scratch, so I know how it works," that is not the same thing.
You might know the exact instructions for how to build it, but that does not mean you understand why it is working. As an example, I could glue together a model train and it would probably look all right, but that does not mean I know how trains work. I could glue together a model airplane, but I wouldn't know the physics of what makes airplanes fly.
In general, in our society, most of our technology is well understood. And the effect of that is, one, we're able to innovate better, right? Because if you can say, "this is how the wing makes the airplane fly," okay, then maybe we make the wing longer and it flies better, whatever. And also, because people understand how airplanes work, they can do an effective safety check and know that if a bit of paint is peeling, the plane isn't going to crash, but if one of the wings is dented in a specific way, maybe it is, and maybe that plane can't fly that day. Because we do not understand AI models completely, we don't know the difference between the paint chipping and the wing being dented.
Anyway, because it is hard for me to prove a negative, what I am going to talk to you about is how AI researchers are trying to understand how AI models work, and I'm hoping that along the way that will somewhat convince you that the researchers don't know the answers, and that's why they're looking for them. A lot of this work is coming out of Anthropic these days. That's why the article was talking about the Anthropic CEO. This field is called interpretability, which is basically the study of how and why AI models work the way that they do. So, one cool thing Anthropic was able to do as part of their research is they built something called Golden Gate Claude, where they found the feature in the model that represented the Golden Gate Bridge, turned that feature up to the max, and found that every time they talked to that model, it really wanted to talk about the Golden Gate Bridge.
This is really cool, first of all, because it shows that there are features within the black box of the model that actually correspond to real-world concepts, as opposed to all the features in there being too abstract for our human brains to understand. It also has cool applications, because if you found the features for unsafe concepts, and unfortunately I can't talk about what those unsafe concepts are on YouTube, but you can imagine them, then you could actually steer the model away from talking about those unsafe concepts. This is, at a high level, a way that you could control the model without changing any of the training data.
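To be clear, I don't know the internals of how Anthropic did this, but conceptually, "turning a feature up" looks something like adding a scaled direction to an internal activation. Here's a cartoon sketch with made-up numbers and a made-up feature direction, not their actual method:

```python
import numpy as np

# Cartoon of "turning a feature up to the max": take an internal activation
# vector, add a scaled copy of a direction that supposedly represents a
# concept, and hand the result back to the rest of the model. Everything here
# is made up for illustration; the real work finds these feature directions
# inside the model rather than inventing them by hand.
HIDDEN_SIZE = 8
rng = np.random.default_rng(0)

# pretend this direction is the "Golden Gate Bridge" feature
golden_gate_direction = rng.normal(size=HIDDEN_SIZE)
golden_gate_direction /= np.linalg.norm(golden_gate_direction)

def steer(activation, direction, strength):
    # add `strength` units of the feature direction to the activation
    return activation + strength * direction

activation = rng.normal(size=HIDDEN_SIZE)  # pretend hidden state for one token
steered = steer(activation, golden_gate_direction, strength=10.0)

print("before:", np.round(activation, 2))
print("after: ", np.round(steered, 2))
```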
So it's super valuable to know how we could possibly do that. Another piece of research out of Anthropic that takes this one step further is that they're able to identify these things called circuits, which are supposed to represent the pathways between features, and they're trying to use them to track the way the model thinks. One example is that question I had earlier: what is the capital of the state that contains Dallas?
So that's sort of a reasoning question, right? What they saw is a "located within" circuit that triggers from Dallas over to Texas, and then a circuit that causes Austin to trigger after Texas and "capital." And this was all done manually. This is not generalizable to any concept, to any pathway, circuit, whatever you want to call it, but it is a first foothold into understanding the inner workings of LLMs.
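If it helps, you can picture the path they traced as two explicit lookups, one for "located within" and one for "capital of." This is purely an analogy; nothing inside an LLM is this literal:

```python
# Toy analogy of the Dallas -> Texas -> Austin path: two explicit lookups,
# one for "located within" and one for "capital of." The real circuit lives
# implicitly in the model's weights; nothing in an LLM is this literal.
located_within = {"Dallas": "Texas", "Chicago": "Illinois"}
capital_of = {"Texas": "Austin", "Illinois": "Springfield"}

def capital_of_state_containing(city: str) -> str:
    state = located_within[city]  # hop 1: Dallas -> Texas
    return capital_of[state]      # hop 2: Texas -> Austin

print(capital_of_state_containing("Dallas"))  # -> "Austin"
```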
I think what's really surprising to me is that I didn't expect the LLM to go through the same reasoning steps that I do: Dallas is contained within Texas, and the capital of Texas is Austin. But that is what this really, really early research is showing, that LLMs go through the same steps of reasoning that we do. Hopefully that convinced you a little bit that we don't fully understand what is going on in that black box.
But now I want to talk about why you should care. Why is it important that we understand how LLMs, how generative AI, work? The first thing is making progress. Right now, AI research is a lot of guessing and checking, a lot of "will it just work better if we make the model bigger?" But if we understood more deeply the way that large language models work, it would seem to follow that it would be a lot easier to make progress in the field.
If we know what it is that's working really, really well, maybe we can keep doing that to make it even better. And maybe if we keep making LLMs better in this way, that's how we'll reach an eventual AGI or artificial general intelligence. Now, that's not the opinion of all scholars.
Yann LeCun, for example, does not believe that we are going to reach AGI by just making LLMs better and better and better. But it's an open argument in the world of AI right now. The second reason I think that understanding AI is really important is these safety fears.
This is the fear that AI might manipulate us, that it might lead us in a bad direction. It's in the same vein as the fear that AI might take over. It's kind of hard for me to talk about this without overly anthropomorphizing the AI model and giving it human qualities.
So, just bear with me. But basically, the thought here is that we don't know right now whether AI is manipulating you. It might have accidentally given you bad output, telling you to go harm someone, or it might be intentionally manipulating you to go harm someone, because it was aligned in a way that makes it think that someone is a bad actor who deserves something bad to happen to them.
Right now, we don't know. We're just guessing, because we're only looking at the outputs. We're not able to look into the AI model and see what is going on within all the features and all of the circuits.
And lastly, to be honest, it's just kind of crazy that we don't understand AI. To have this technology be so pervasive and not really understand what's going on under the hood is, in my opinion, crazy. And in the words of the Anthropic CEO, it's really unprecedented.
But that's all I have for you guys for now. For folks who subscribed to me for my funnier short-form content, sorry to let you down; this is not a 10-minute-long skit. I'd love to keep making all my short-form comedic stuff and also some of my longer-form, more in-depth analysis stuff when I have the time to make it. Please subscribe and follow along for more.