[MUSIC PLAYING] TOM CRUISE: I'm going to show you some magic. It's the real thing. I mean, it's all the real thing. DAVID MALAN: All right, this is CS50. And this is our Family Weekend here at Harvard College. So we have lots of parents and siblings and other relatives here in the group. And this is meant to be a family-friendly lecture on Artificial Intelligence, or AI. My name is David Malan. I am your instructor today. And in CS50, for some time, we have had this tradition of giving every student in the class a rubber duck like this one here, whereby, around the third or so week of the class, we hand these out. And the idea is that if students are struggling with some concept, or they have some bug or mistake in their code, they are encouraged to literally talk to the rubber duck. And in that process of verbalizing whatever confusion and questions they're having, invariably, that proverbial light bulb tends to go off. Now, some years ago, we actually virtualized this rubber duck and implemented a software version of it, whereby students, for instance, could open up a chat window in CS50's programming
interface and begin to converse with this here virtual duck-- I'm hoping you can help me solve a problem-- and the duck would randomly quack back once, twice, or three times in response. We have anecdotal evidence that this was sufficient, educationally, to actually prompt the student to figure out what was going wrong, because they had already verbalized their confusion. But much to the surprise of some of these students' predecessors, just over a year ago, in spring of 2023, the duck literally overnight started talking back to them in English. And that is all because
there is now, under the hood, indeed, some artificial intelligence. So among our goals for today is to give you a taste of artificial intelligence and, in turn, CS50 itself, but to also give you a sense of how this technology itself works, because it's certainly not going anywhere. It's only going to become all the more omnipresent, most likely. So hopefully at the end of today's hour, you will exit here all the more of a computer person. But the talk of the town has been specifically something called generative AI. AI as a field of computer science
has been with us for decades. But it has really made exponential improvements in recent months and recent years. And the focus of late has been on, indeed, generative AI, whereby we're using some form of artificial intelligence to generate stuff. That stuff might be text. That stuff might be images. That stuff might be video content and so much more in the years to come. In fact, the Tom Cruise who you saw greet us just a moment ago was not, in fact, the real Tom Cruise, but a so-called deepfake, with which we introduced today's class,
playfully, of course. But there are actually serious implications, suffice it to say, when it comes to information and disinformation in the world. For today, though, we'll focus really on the underlying technology and know-how. And we want to make this as participatory, too, as we can. Over the past couple of years, lots of publications, the New York Times among them, have tested people's ability to discern true from false, reality from artificial intelligence. And the New York Times, for instance, put together a sequence of images that we thought we'd share with you. I'm joined by
CS50's preceptor here up on stage, Julia, who's going to help guide us through a sequence of multiple-choice problems, if you will. The first of these is going to be this one here. Two images-- one left, one right-- which was generated by AI, left or right? And if, Julia, you want to switch over to our special software, we'll see the votes coming in. Looks like 80% of you are voting at the moment for right. The left is making some progress here. 4% or so are unsure. I think that's pretty close to the right winning. Let's see
what the correct answer is, if, Julia, we can switch back to this. The answer was, indeed, the one on the right. And why is that? Someone who voted right, why did you say so? Feel free to just shout it out. Yeah. Why right? AUDIENCE: It's more clear. DAVID MALAN: OK, so it's more clear. Seems a little more vivid, maybe a little too good. OK, so a pretty good heuristic. But still, 20% of you got that wrong. So let's try one more. Left or right-- which was generated now by
AI? Let's toggle over and see what the responses are looking like. This time, it looks like you've just switched. Your most confident answer yet: just under 70% are voting now for the person on the left, 30% for the person on the right, and about 5% still unsure. If we toggle back to the two photographs-- unfortunately, trick question. Both of them were generated by AI. So it's getting harder already, indeed. And it's only going to get, dare say, impossible before long. Well, let's turn our attention to text, because clearly that's what underlies something like CS50's own rubber duck. Here
is a headline from the New York Times some months ago: did a fourth grader write this, or the new chatbot? And a chatbot is just a piece of software that converses with you textually and, invariably soon, via voice, as well. This test is textual, and I'll read it aloud. Essay 1-- I like to bring a yummy sandwich and a cold juice box for lunch, and sometimes I'll even pack a tasty piece of fruit or a bag of crunchy chips. As we eat, we chat and laugh and catch up on each other's day. Essay
2-- my mother packs me a sandwich, a drink, fruit, and a treat. When I get in the lunchroom, I find an empty table and sit there and eat my lunch. My friends come and sit down with me. So one of those was written by a fourth grader, and one of those was written by AI. Which was written by AI, essay 1 or essay 2? Let's see the votes as they come in. So similar percentages. So maybe it's roughly the same group of people for each of these votes, having switched last time and now stayed in the
lane. About 76% essay 1, 23% essay 2. Fewer people are now unsure-- 2% to 3%-- so that's progress, indeed. Let's go back and take a look. The answer in this case: essay 1 was the AI. And here, too, it's not necessarily obvious. But I'm not sure how many fourth graders say they catch up on each other's day, for instance. So this, too, is only going to become more of a challenge. Thank you to CS50's preceptor, Julia, here-- and maybe a round of applause for having choreographed that
so perfectly. Thank you, Julia. So where do we begin? In CS50, in spring of 2023, we began to embrace artificial intelligence in some form. We were not sure quite how. We weren't quite sure how well it would work. This has all been very much an experiment. But ChatGPT itself, as you might recall, only came out about 23 months ago, in November of 2022. And how quickly the world seems to have changed already. At least in our educational context, this is the premise from which we began all of CS50's work and development with AI:
ChatGPT and Bing and Claude and tools like them are just too helpful out of the box. Educationally, as you yourselves might have experienced, they're all too willing to just hand students answers to problems, which is great if students just want that answer, but not if they want to learn the material, and certainly not if a teacher wants to assess their understanding of the material-- all too willing to just hand out answers rather than lead students to successful outcomes. And so we actually put in CS50's syllabus, almost two years ago, that it's not reasonable to
use ChatGPT or similar tools. We can't prevent it technologically, but we do communicate, both ethically and through policy, that it's, indeed, not allowed, and thus not reasonable. But we do think it reasonable to use CS50's own AI-based software, including that virtual rubber duck in its several different forms, only one of which you've glimpsed so far-- software that is designed to lead students to solutions, but not to hand them to them outright. So we thought we'd share with you, then, a taste of how this duck is implemented, but then, in turn, how artificial intelligence is making all
of this work. And here, for the more engineering folks in the audience, the computer persons, is an architectural diagram of what CS50's team has been building over the past couple of years, including CS50.ai, which is like the central server that runs all of this. It provides, of course, students with a very user-friendly interface, including a rubber duck. But we also have a local vector database, as they're called these days, where we actually, after every lecture, convert, for better or for worse, everything that comes out of my mouth to text, then run it
through a database of ours so that it can be searched not only by students but, in turn, by this underlying AI. And all of this is built on top of third-party tools. So we have not reinvented the wheel. We have not built our own large language model, as these things are called, as we'll soon see. Rather, we're building on top of things called APIs, Application Programming Interfaces, which are third-party services that the OpenAIs, Googles, and Microsofts of the world provide so that you can build your own tools-- educational in nature, in our case here. Now, as
for what this looks like, for instance, this is just one of the views. Your own student or child can perhaps give you a better sense. But this is the chat interface as students experience it in CS50, similar in spirit to ChatGPT. And indeed, for now, we do have disclaimers that it's not going to be perfect. There are things called hallucinations that we, on occasion, might suffer from, as well. More on that soon. But here is a representative question that a student might ask. Bless your hearts, but unfortunately, this is about as detailed as the questions
get sometimes. My code is not working as expected. Any ideas? And so the duck, upon seeing not only that question but also the student's code-- and with a wave of the hand, I'll stipulate that today-- the duck debugger, or DDB, doesn't just answer the question outright, but responds with something like this. It looks like you're trying to add two integers, but there's an issue with how you're handling the input. What data type does input return, and how might that affect your addition? So it's ideally behaving like a good teacher, a good tutor, and interactively having the conversation
with the student. Not always perfect, but pretty darn good already out of the box. And surely, as industry progresses, it's only going to get better. And, indeed, the conversations will only get richer and more involved for students. Now, besides our students here in Cambridge, and besides our students down the road in New Haven at Yale, we actually have a very rich history of OpenCourseWare in CS50, whereby everything we do, curricularly and technologically, is freely available to anyone around the world, teachers and students alike. So to date, since spring of 2023, we have some 201,000 students and
teachers already using this here duck. That's averaging 35,000 prompts, or questions, per day-- a total of 9.4 million as of this morning. So not only are our own students here using these tools; really, the world is starting to embrace them, whether off the shelf, like ChatGPT, or ducks like ours here. The overarching pedagogical goal, though, and the utility, as our own students probably know by now, is really to provide students with 24/7 office hours-- one-on-one, or duck-on-one, opportunities for help with the course's homework assignments and more-- and to approximate, really, what is the
Holy Grail, I dare say, educationally: a one-to-one student-to-teacher ratio, which even at a place like Harvard or Yale, where we have the luxury of lots of teaching fellows or teaching assistants, many of whom are students themselves, we still can't achieve-- we might have in-class ratios of 1 to 6, 1 to 10, 1 to 20, which is great. But as you think about this, as we often do, even if you have just six students in a room at office hours, and that office hour is, indeed, an hour, that's, like, 10 minutes per student, which, for the
struggling student, has historically never been enough time, necessarily, for that in-person interaction. And so with software now, we hope to continue to leverage the exact same humans we have, but allocate those same resources ideally to the students who need them and benefit from them most, while allowing those students who are more comfortable interacting virtually to do so, any time of day, with this duck. So as for what is powering this duck and similar technologies, underneath it all is a new term of art that you might have heard in the real world:
prompt engineering. So we've got these AIs out there, ChatGPT among them. And it's become a skill-- sort of a LinkedIn thing, dare say, for better or for worse-- to know prompt engineering, which essentially means knowing how to ask good questions of an AI, generally in English, but really in any human language. It's a little bit hackish. This is not really an engineering skill as much as it is just getting acclimated to what kinds of patterns of questions tend to induce the AI to respond to you better. It's like being good at Google searches. This is not
something that's probably going to be a necessary skill before long, because the software is just going to get better. But in essence, what prompt engineering means is that someone somewhere has at least written for the AI a system prompt, a set of instructions, usually in English, that tell the AI how to behave, what domain of information to focus on, and so forth. So for instance, in the case of CS50, when we built our duck on top of OpenAI's APIs, we literally tell OpenAI this in CS50's so-called system prompt, quote unquote, "you are a friendly and
supportive teaching assistant for CS50," essentially coercing the underlying version of ChatGPT to behave in a CS50-specific way. But our second sentence in our so-called system prompt is this-- "you are also a rubber duck." And that is enough to coerce some degree of quacking or other similarly adorable behavior. But we further go on to tell the AI, "answer student questions only about CS50 in the field of computer science. Do not answer questions about unrelated topics. Do not provide full answers to problem sets, as this would violate academic honesty." And then in essence, we say, answer this
question. And we copy and paste whatever question the student has typed into the window-- what is generally known as a user prompt, much like you might type into ChatGPT-- so that not only does the underlying AI know what the question is, it also has this system prompt for additional context, so that it behaves unlike the default and more like a pedagogically designed rubber duck, in our case. In fact, let's see how we might implement this in code. Let me go over to a program I've got running on my machine here already, which students know as
VS Code, Visual Studio Code, which is the programming environment we and lots of folks in industry use. I'm going to run a command, code chat.py. And I'm going to implement the simplest of chatbots here, live, in a language called Python, which we just learned a few days ago. I'm going to go ahead and say message equals input, maybe something like, what's your question, question mark? And what this line of code is doing, for the parents and siblings in the room: this is what's called a variable, similar in spirit to mathematics, like x, y, z, but I'm using an English word instead. Input is a function, like a verb, that will tell the computer what to do for me-- in this case, get input from the user. And in quotes here, I have the prompt, or the question, that I want the computer to ask of the human. Then I'm going to go ahead and do this-- print, quote unquote, "quack, quack, quack." So in essence, this was version one of our rubber duck. And if I run this program now with a command at the bottom of my screen, python of chat.py, and hit Enter, you'll see that the cursor is now waiting for me to provide my user prompt, if you will. How about, what is AI, question mark? And that was it for sort of version 1.
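Rendered as an actual file, that first version amounts to just two lines-- a faithful sketch of what was typed:

```python
# chat.py, version 1: a decidedly unintelligent duck

# Store whatever the human types in a variable called message.
message = input("What's your question? ")

# Respond the only way this version knows how.
print("quack, quack, quack")
```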
But I dare say, if you'll humor me and let me just type, somewhat quickly, a little more advanced code that even CS50 students this past week have not yet seen, I can actually turn this into an artificially intelligent duck, as well. So let me clear the bottom of my window and hide that for a moment. And let me start doing a few things that I'm going to wave my hand at to some extent, though I'll explain a few of the lines as we go. I'm going to first import some functionality relating to the underlying operating system. I'm then going to import, from dotenv, a function called load_dotenv, which is just going to make it easier for me to talk to OpenAI without having to log in and do a bunch of stuff that I did in advance of class already. And I'm going to call that function, load_dotenv, right away. After that, I'm going to import from a library I already installed, called openai, which they make freely available to us and anyone else, a feature called OpenAI-- capital O, capital AI. I'm then going to create another variable called client, which refers to the software I'm writing, and set that equal to whatever that feature of OpenAI's does for me. And I'm going to specify that the API key I want to use is equivalent to whatever is in my operating system's environment in a special variable called API_KEY, which, again, I configured before class, just saving myself the trouble of logging in with a username and password here. All right, what do I do next? Let's define a system prompt now, in a third variable. And that system prompt will be reminiscent of what we actually do in CS50: you are a friendly and supportive teaching assistant for CS50, period. You are also a duck, period. And that's it. Now I'm going to go ahead and create a fourth variable, called user_prompt, and set that equal to the input function,
which we saw briefly earlier. And I'm just going to say, again, what's your question, question mark? But now I'm going to do something whereby I'm talking to OpenAI's API, passing to OpenAI my system prompt and my user prompt together, so that it behaves in the way I want. I'm going to create another variable, called chat_completion, and set that equal to this function, which is a mouthful: client dot chat dot completions dot create, open parenthesis. Then, on a new line, just so it doesn't look messy on the screen, I'm going to say that the messages I want to send to OpenAI are this list of messages-- messages being a named parameter, which is just a parameter to a function, for students from this past week, and the square brackets meaning this is a list. The list I'm going to provide has two things in it, two Python dictionaries-- sets of key-value pairs. The first is going to say a special keyword here, role colon system, and the content for this role is going to be the system_prompt variable that I defined earlier. So this is passing to OpenAI the system prompt. Then I'm going to pass in one more of these things, where the role this time is going to be that of, quote unquote, "user," and the content of that is going to be the user_prompt, which the human typed in. After that, I'm going to specify one other named parameter, which is to say that the model I want to use is something called GPT-4o, the latest and greatest version, with which some of you might be familiar in the real world. I know it's a mouthful, but we're almost done. Now, I'm going to go ahead and create
a final variable, response_text, to literally hold the text that comes back from OpenAI. I'm going to set that equal to the chat_completion variable's choices attribute, the first element therein, starting at 0, the message therein, and the content thereof. And then, lastly, finally, I'm going to print out that response_text. Now, I don't normally write this many lines of code all at once in class. So I'm going to cross my fingers big time now, reopen my terminal window, and run python of chat.py. I'll increase the size of my terminal just so we focus on this. Hopefully I made no typographical errors. All right, let's ask again. What is AI, question mark? Enter. OK, some of you knew that. Thanks a lot. OK, so response text. The last line, too. OK, all right. So that's a bug, as we call it in programming. Let's run this again: python of chat.py. What is AI, question mark? There we go. AI, or Artificial Intelligence, refers to the simulation of human intelligence in machines that are programmed to think and learn like humans, dot, dot, dot, some other educational stuff, and a quack at the end, which was generated for us. So thank you. [APPLAUSE]
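For the curious, here is the whole program as just narrated, reconstructed-- the variable names follow the walkthrough, while the API_KEY environment-variable name and the 1.x interface of OpenAI's openai package for Python are assumptions:

```python
# chat.py, version 2: an artificially intelligent duck

import os

from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables (including the API key configured before class) from .env.
load_dotenv()

# Create a client for OpenAI's API, authenticating with that key.
client = OpenAI(api_key=os.environ["API_KEY"])

# System prompt, reminiscent of CS50's own, to coerce duck-like behavior.
system_prompt = "You are a friendly and supportive teaching assistant for CS50. You are also a duck."

# User prompt, i.e., whatever question the human types in.
user_prompt = input("What's your question? ")

# Pass both prompts to OpenAI, specifying which model to use.
chat_completion = client.chat.completions.create(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    model="gpt-4o",
)

# Extract and print the text of the model's response.
response_text = chat_completion.choices[0].message.content
print(response_text)
```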
So suffice it to say, we've spent a lot more time-- the whole team of CS50 has spent a lot more time-- building the fancier version of chat.py that is CS50's duck itself. But that is it, in essence, as to how people like us are building on top of today's AIs to build very specific applications that hopefully leverage them for better instead of for worse. But how did OpenAI make all of that possible? How do these large language models, ChatGPT and, in turn, CS50's duck, work? Well, let's consider, ultimately, what has been getting developed over the decades, underneath the hood, that defines what AI is. But before we go in that direction, let me propose that we look at not only generative artificial intelligence, which is, again, the use of AI to generate content, as we just did with text, but artificial intelligence more generally. So here's where we take a spin through the world of AI. AI's been with us for years, even though we only really started talking about it every day in the past couple of years. Any of us
who've been using Gmail or Outlook or the like know the spam filters have been getting pretty darn good. It's pretty rare, at least with most big mail programs nowadays, that you have to manually deal with spam; it's often going automatically into your spam folder. But there isn't some human who works at Google or Microsoft looking at all of your email, or even all of the email coming in, and saying spam, not spam, spam. That would just not be feasible nowadays. So somehow, AI is inferring, using some kind of techniques or algorithms-- step-by-step instructions for
solving problems-- what is spam and what is not. It's not perfect, but hopefully it's usually correct. And indeed, that's the behavior we might want. Handwriting recognition on tablets and phones and the like-- this has been using AI for years, because no one at Microsoft or Google knows what your specific handwriting looks like. It looks similar, though, to that of the other people who have trained the AI. Watch history-- Netflix and other streaming services are getting pretty darn good at recommending, based on things you have watched and maybe upvoted or downvoted, similar shows or movies that you might
want to watch, as well. That, too, has been AI. And then, of course, all of these voice assistants, like Siri and Alexa and Google Assistant-- they don't know your voice specifically, but it's pretty similar to other humans' voices. And so they learn and figure out how to respond not only to the Google and Microsoft employees who trained them, but to your and my voices, as well. And then, of course, we can go way back to one of the very earliest arcade games, which some of you might have played as a kid, this here being
Pong. It's sort of like a tennis game, in which two people move their paddles up and down to bounce the ball back and forth. It turns out that games are a nice domain in which to start talking about AI because, one, it's fun, but, two, games also tend to have very well-defined rules and goals, like maximize your score or minimize the other person's score. In fact, here's another incarnation of really the same idea. This was an arcade game that came out later, called Breakout, which exists in many different flavors. But in this case
here, things are sort of flipped around. There only needs to be one player. And the idea is that, with this paddle, you bounce this ball against the bricks, and the bricks disappear once you break them. The goal, therefore, is to get rid of all of these bricks. But even just based on this screenshot, odds are all of us, as humans, have an instinct for which way the paddle should move. If the ball just left the paddle and went this way, which way should the paddle be moved by the human, left or right? I mean, obviously
to the left, if this thing is going to bounce or reflect off of that first brick. So there's a very well-defined heuristic that even we have ingrained in us already. Maybe we can translate that to code, into something a computer can ultimately do. Well, the first way we'll try thinking about this actually comes from the class before us, EC10-- decision trees, or strategic thinking more generally-- whereby you can actually draw a picture, like a tree in programming, which we've seen in week five of CS50, where you have some root node where you
begin, and then all of these different children via which you can decide, yes or no, whether to do some thing. So a decision tree for something like the game we just saw could be a drawing like this: is the ball to the left of the paddle? If so, then go ahead and move the paddle left, which is what everyone's instinct here was. But if the ball is not to the left of the paddle, then you should ask a second question. You shouldn't just blindly move to the right, because there are actually three scenarios here, even if non-obvious. Is the ball to the right of the paddle? That, too, has a yes/no answer. If yes, then obviously move the paddle to the right. But if no, then you probably shouldn't move the paddle at all, and should stay where you are. So not very hard. But now you can imagine, especially if you're in CS50 or you're a computer person, that we could probably translate this to some kind of code in some language, because it's very much based on conditionals-- if, else if, else, so to speak. So what does that do for us? Well, we could translate this to pseudocode, English-like code that we might write in CS50. While the game is ongoing, if the ball is to the left of the paddle, move the paddle left. Else if the ball is to the right of the paddle-- that's a typo on the slide; second bug, sorry-- move the paddle right. Else, don't move the paddle. So there's a perfect translation from decision trees, as a concept and as a picture, to the code that we've been writing over the past several weeks in any number of languages.
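In Python itself, that same decision tree might look like the following-- a sketch, where game_is_ongoing, ball, paddle, and the two move functions are hypothetical stand-ins for the game's actual code:

```python
# A sketch of the paddle's decision tree; all names here are hypothetical.
while game_is_ongoing():
    if ball.x < paddle.x:
        move_paddle_left()
    elif ball.x > paddle.x:
        move_paddle_right()
    else:
        pass  # the ball is aligned with the paddle, so don't move
```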
But let's try a slightly more sophisticated game that most of us probably grew up playing on pieces of paper and napkins and the like, that of Tic-Tac-Toe. For those unfamiliar, it's a 3-by-3 grid. Two different people play X's and O's. And the goal is to get three in a row-- three X's or three O's-- horizontally, vertically, or diagonally. And whoever achieves that first wins, unless it's a tie. Well, Tic-Tac-Toe lends itself to a pretty juicy discussion of decision making, because there are a lot of different ways we could play even the simplest of these games. So, for
instance, if we are considering here this board, where X and O have each gone once, let's consider what should happen next. In particular, if we translate this into a decision tree, whoever's turn it is should ask themselves, can I get three in a row on this turn? Because if yes, well, then they should play in the square to get three in a row, period. But if they can't get three in a row on this turn, they shouldn't just choose randomly. They should probably answer another question for themselves like this. Can my opponent get three in
a row on their next turn? Because if they can, then I want to block them by playing in the square to block the opponent's three in a row. But here's where things get interesting. If the answer is no, where do I go? Well, if there's only one space left, it's pretty obvious, go there. If there's two left, which is better? If there's three left, which of those is better? And the further you rewind in the game, the less obvious it gets where you, the human, should go. That said, there is an algorithm for solving this
problem optimally. In fact, I'll disclose now: if you ever lose Tic-Tac-Toe, you are objectively bad at Tic-Tac-Toe. Because while there is no strategy that guarantees you'll always win, there is a strategy that guarantees you will never lose. You can at least force a tie if you're not going to win. So with that set up, and that bubble burst, perhaps, let's consider how we can go about answering the question-mark part of the tree, especially when the game starts super early, like this. Where does O go to play optimally? Well, what
we could do is this. Recognize, first, that with this particular game, we have inputs and outputs that can be represented mathematically. More on that in just a moment. But the goal of Tic-Tac-Toe, then, can be said to be to maximize or minimize a score. Maybe X wants the biggest score possible, and O wants the smallest score possible, for instance. Specifically, let's do this. Let's talk about an algorithm in computing called minimax. And as the name implies, this is all about minimizing and maximizing something, which is really what Tic-Tac-Toe, we can see, is all
about. For instance, here are three sample boards of Tic-Tac-Toe. On the left is one in which O has won. On the right is one in which X has won. And in the middle is one that's a tie. It doesn't matter what numbers we use, but we need to agree on them. So let me propose that if O wins, we're going to assign the board a value of negative 1. If X wins, it's a positive 1. And if no one wins, it's a 0. As such, it stands to reason that X's goal in life now is clearly going to
be to maximize its score, and O's goal in life is to minimize its score. We could flip the numbers around; it doesn't matter. But we've reduced a fairly fun childhood game now to boring mathematics, if you will-- but in a way that we can now reason about and quantify, just how easy or hard this game is, as each of X and O aspires to maximize or minimize its score, respectively. So here, then, with the board in the middle here, is one little sanity check. What is the value of
this board on the screen, per my definition a second ago? AUDIENCE: 1. AUDIENCE: 0. DAVID MALAN: I heard 1. I heard 0. [INTERPOSING VOICES] DAVID MALAN: I heard minus 1. Great. We're seeing the range of answers. The answer here is going to be 1, because I do see that X has won. And I proposed earlier that if X has won, the value of the board is a 1. So again, if X wins, it's a 1. If O wins, it's a negative 1. Those are the correct answers for Tic-Tac-Toe here, too. And a tie means 0.
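In Python, that scoring convention might be rendered like so-- a sketch, assuming a hypothetical helper, winner(board), that returns "X", "O", or None:

```python
# Score a board per the convention above; winner(board) is hypothetical.
def utility(board):
    """X has won -> 1, O has won -> -1, tie or unfinished -> 0."""
    w = winner(board)
    if w == "X":
        return 1
    if w == "O":
        return -1
    return 0
```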
So let's back up one step. Here is a board that's got only two moves left. Suppose now that it's O's turn. Where should O go? Now, you might, as an expert Tic-Tac-Toe player, have an immediate instinct for this. But let's break it down into some decisions. If it's O's turn, O can ask itself, well, what is the value of this board? Because I want it to be negative 1, or, barring that, I want it to be 0. Well, what's it going to be? Well, if O goes in the top corner, what's the value of this board? We're not sure yet, because no one has won. Well, if X then invariably goes in the bottom location there-- darn it. Now X has won. So the value of this board way down here is, by definition, 1. And there's only one way, logically, to get from this board to this board. So we might as well, by transitivity, say that the value of this board is 1, too, even though it hasn't been finished yet, because we know where we're going with it. So that, then, invites the question: is this other board any better? If O instead goes bottom middle, what's the value? Well, the only place X can then go is top left. And now the value of that bottom-right board is actually better. It's a 0, because no one has won. Logically, the value of this board might as well be considered a 0 as well, because that's the only way to get from one to the other. So now the decision for O is, do I want value 1, or do I want value 0? O's goal, as I defined it, is to minimize its score. 0 is less than 1. So O had
better go to-- sorry-- O had better go, then, to the bottom-middle location. And the value, therefore, of this board is ultimately going to be 0. So this is to say, if you play like O just did, you won't always win, but you will never lose, because you can choose between the right paths in the tree. Now, the problem is, even if you're on board with that algorithm, let's go one step back, such that there are not two places left, but three. Unfortunately, the size of this decision tree essentially explodes. In fact, it's an exponential relationship, because now there are 1, 2, 3 spaces on the board. Rewind one more move, to four moves left, and the tree grows again. Rewind another, and the tree grows again. And so the initial tree of decisions that you might need to consider for Tic-Tac-Toe actually gets really, really darn big-- more so than you might imagine. In fact, in the world of Tic-Tac-Toe, we might implement it exactly like this: if the player is X, for each possible move-- that is, in a loop, in CS50 speak-- consider every possible move, calculate the score for the board, and then choose the move with the highest score, just like I did. So this is the pseudocode for what we just walked through verbally. If the player is O, though, for each possible move, calculate the score for the board, and then choose the move with the lowest score. So in other words, if both X and O are thinking as many steps ahead as they can, they can each win or force a tie and, I claim, never lose, according to this algorithm.
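Sketched in Python, building on the utility function from a moment ago, minimax might look like this-- with moves(board) and apply(board, move, player) as hypothetical helpers for the legal moves and the board that results from playing one:

```python
# A minimax sketch; moves(board) and apply(board, move, player) are hypothetical.
def value(board, player):
    """The best score the given player can force from this board onward."""
    if winner(board) or not moves(board):  # game over, so just score the board
        return utility(board)
    opponent = "O" if player == "X" else "X"
    scores = [value(apply(board, move, player), opponent) for move in moves(board)]
    return max(scores) if player == "X" else min(scores)

def best_move(board, player):
    """X picks the move with the highest value; O, the lowest."""
    opponent = "O" if player == "X" else "X"
    pick = max if player == "X" else min
    return pick(moves(board), key=lambda move: value(apply(board, move, player), opponent))
```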
But here's the rub. How many different ways are there to play Tic-Tac-Toe? You might have gotten bored of it years ago as a kid. But you surely did not play all possible versions of Tic-Tac-Toe with your brother or sister growing up, for instance. Why? There are 255,168 ways to play a 3-by-3 grid of Tic-Tac-Toe, back and forth, back and forth, which means that's a really big decision tree, certainly for a kid, to keep in their mind, let alone to waste the time figuring all of that out. To be fair, for computers, it's no problem considering 255,168 different ways to play a game. They have lots of memory. They have
very fast processors nowadays. It's a drop in the bucket for a computer, but a big deal for a human. But what about other games that are more sophisticated than Tic-Tac-Toe? Some of you might play chess. And we'll keep chess simple. If you consider only the first four pairs of moves-- so one player goes, then the other, again and again, for four pairs of moves-- how many ways are there for those two humans to play the first four moves of chess? Over 85 billion, because of the various permutations on a normal chess board. If you're familiar
with the game of Go, there are 266 quintillion ways to play its first four moves. There's no way, with our current computers, that they can possibly think that many steps ahead and then make a decision tree and an optimal decision. So even the IBM Watsons of the world, with which you might be familiar from playing Jeopardy years ago and the like, were really doing their best to approximate correct answers. They were not taking all day long, or all of our lifetimes, to crunch all of those numbers. So here, really, then, is the motivation for
actual AI. Everything we've talked about thus far, the world has called AI, or they've called the computer the CPU player. But it was really just code, written with ifs, else ifs, and elses, to dictate how the ball moves, how the paddle moves, who goes where in Tic-Tac-Toe, in what order, and the like. It's all very deterministic. But today, it's really about artificial intelligence learning and figuring out on its own how to win a game when it can't possibly have enough memory or enough time to figure out, deterministically, the perfect answer. Thus was born machine learning. We're
not writing code to tell computers exactly what to do; rather, we're writing code that tries to teach computers how to figure out solutions to problems, even if we, ourselves, don't know the correct answers. And so machine learning is, indeed, about trying to get computers to learn from available data. In fact, what we feed to them as input is training data of some sort. And there are different ways we can train computers. One way that we thought we'd introduce today is called reinforcement learning. And it's actually relatively simple. In fact, could I get
one volunteer who's comfortable coming up on stage, on the internet? OK, come on down. Come on over here. Maybe a round of applause, because this is always awkward. [APPLAUSE] All right, we have no microphone today, so just talk near me. What's your name? AUDIENCE: Max. DAVID MALAN: And do you want to say a little something about yourself? AUDIENCE: Hi, I'm Max. I'm a senior in high school. I'm here for family weekend. DAVID MALAN: Nice. Well, welcome. Come on over here. We're going to teach Max how to flip a pancake. So we've got an actual pan
here, and we've got a fake pancake here. And what I'd like you, Max, to do is to figure out how to flip this pancake so it goes up and around, but stays in the pan. And I will either reward or punish you, so to speak, by saying good or bad each time. That was actually very good. OK, do more of that. That was bad. Do less of that. Getting worse. [LAUGHTER] Didn't really flip. One more try. All right, a round of applause. Thank you to Max. [APPLAUSE] Here, come on in here. We
have a parting gift for you here, too, if you would like. Thank you. So the point here is that even though Max sort of peaked early there and did really well with the first one, I was rewarding and punishing him by giving some kind of feedback, like good or bad-- somehow reinforcing the positive behavior, and not reinforcing the bad behavior, if you will. This is actually representative of what we mean by reinforcement learning. And if you're a parent, you've presumably done this in some form with your kids over time to
get them to do more good behavior and, ideally, less bad behavior. But how might we do this, then, with code? Well, here, for instance, is a visualization of a researcher working in a lab with, not Max this time, but an actual robot. And we'll see that the more we reward and reinforce good behavior, the better even a robot controlled by software can get over time-- without being programmed to move up or down or left or right, but by just trying movements and then doing more of the good movements and less of the bad movements.
So the human goes first, once, just to show it some good movements. But there's no code here in question; there's no one way to flip a pancake correctly. And so the first time, the robot does worse than Max. The third time, still not so good. Fifth time, not so good. But it's trying different movements again and again. That's 10 trials. The human is now fixing things again-- 11 trials. Now it's getting a bit more air. 15 trials. Still bad. But if we start to fast-forward to 20 trials-- [LAUGHTER] Now, just another angle of the same. So all
of these movements can be broken down into x, y, and z movements. So when we say, do more of that, the robot can do more of the x, more of the y, more of the z. It's really trying to infer, and now it's sort of picking up, what to do more of. And it seems to be repeating the good behavior, such that after 50 such trials, the robot is able to do this again and again and again. So that, then, is reinforcement learning. And Max, you were well taught growing up, it would seem, for that
particular exercise. But let's consider now the implications of using reinforcement learning in other contexts and see if this solves all problems for us. Well, here's a very boring distillation of a game that's like a maze, whereby this might be the player in yellow here. The goal is to get to this green exit. And then the rest of the floor may or may not be lava, whereby there could be some red lava pits that the yellow dot does not want to fall into. So the best this player can do is really try randomly up, down, left,
right. And when it falls into a lava pit, do less of that. But if it doesn't, do more of that. So for instance, suppose that we know, with the bird's-eye view here, where the lava pits are. Suppose that the yellow dot gets unlucky and trips into one first. So now we say, don't do that. And it can use a little bit of memory, as represented by this darker red line: let me not go there again; that was a bad decision to make. So now it has a new life. Just try again. So
the yellow dot now tries this. Maybe it tries this. Maybe it then falls into the same lava pit, not realizing, because it does not have the same bird's-eye view as us, that it fell into a lava pit again. So let's remember, with a bit of memory, or RAM, to do less of that. Again, again, again-- lava pit. That's OK. Let's use a bit more memory to negatively reinforce that bad behavior. Again, again, again. OK, bad, but we're making progress. Again, again, again. And now, I think, the yellow dot, just by luck, might find its way
to the green exit. And so this is a success. We've now won the game. But we haven't necessarily maximized our score. Why? That was a correct solution, but-- AUDIENCE: It could get there in much less moves. DAVID MALAN: Yeah. It could get there in fewer moves by going in more of a straight line. Even though it still has to go up, down, left, or right, it didn't really need to take this circuitous route. But the problem is that if you only reinforce the good behavior and then stick to your guns, you may never maximize your score
by just following the path with which you're most familiar. And so there's this principle in computing whereby, ideally, this thing would know that, yes, this is a correct solution, as per the green recollections, but what if we start exploring a little bit nonetheless, whereby each time I play this game, even if I know how I can win, let me just, with 5%, 10% probability, try a different path? This is something you can actually practice in the real world. As soon as I learned about this principle in computing, I realized that it explains my own
behavior in restaurants, whereby if I go to a restaurant for the first time, choose something off the menu, and really like it, I will never try anything else off of that restaurant's menu, because I liked it well enough. But who knows if there's an even better dish on that menu? The problem is, I tend to, in the real world, exploit knowledge I already have. I really reinforce that first process of learning. But I rarely explore. Maybe we can find better solutions to problems by just exploring a little bit. Maybe we'll fail sometimes, but
maybe we'll get lucky, too. And so here, in pseudocode, is how we might distill this idea. Let's choose some epsilon value-- just a variable, set to 10% or whatever it is-- to sprinkle a little bit of randomness in here. And then, if a random number we choose is less than that value, which will not happen often if epsilon is so small, make a random move. Instead of going right and following the path already traveled, go up this time and see what happens. Else, make the move with the highest value, so to speak.
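In Python, that epsilon-greedy idea might be distilled as follows-- a sketch, with q as a hypothetical dictionary of learned values for each state and action:

```python
# A sketch of epsilon-greedy action selection; q is a hypothetical dictionary
# mapping (state, action) pairs to learned values.
import random

EPSILON = 0.1  # explore 10% of the time

def choose_action(state, actions, q):
    if random.random() < EPSILON:
        return random.choice(actions)                 # explore: make a random move
    return max(actions, key=lambda a: q[(state, a)])  # exploit: the highest-value move
```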
So sometimes you will fall into another lava pit. But again, if you do this again and again and again over time, probabilistically, you might, in fact, find a better path. And if you let your mind wander for a moment and consider why tools like ChatGPT are sometimes wrong, well, maybe they're doing a little bit of exploration just for me, and, darn it, they gave me a wrong answer as a result. You can think about it being a little bit like that in the real world. And so now, if we try again, sprinkling in a bit of randomness, I might very well find
a path that, ah, as you noted, gets me to the green exit all the more quickly. So we can still reinforce good behaviors and punish bad behaviors. But by sprinkling in a little bit of randomness, we can ensure that, maybe over time, we will find an even better solution. Now, we can see this in other contexts, as well. If we revisit Breakout-- let me go back now to a video version thereof-- you might think that, over time, the best way to play Breakout is, again, to move the paddle left and right, very
deterministically, like we proposed earlier with the screenshot. And you will just gradually work your way through the blue bricks. Then you can work your way through the green, then through the yellow, and so forth. But what this video shows is a computer learning, by reinforcement learning, what to do more of. So somewhere, there's probably a human giving it feedback-- that was good, do more of that; or maybe, don't do that. Or the feedback can be baked into the score, based on the color of these bricks. And I dare say, if we
give more points to the top bricks and fewer points to the bottom ones, that's equivalent to rewarding the best strategy and maybe punishing the worst, because you really want to get to those higher bricks first. But here, to the surprise of the researcher, if you will, the AI, a little creepily, finds out that the best strategy is to let the game play itself. And you can perhaps see where this is going. Now it's sort of in hands-off mode, racking up the highest score on its own. And it only would have done that if
it had maybe tried a few things randomly-- like, oh my God, I found a better strategy than just mindlessly going back and forth. And so this exists in so many different domains: not just restaurants, but games, the world of these large language models, and more. But what we've seen thus far are examples of, really, reinforcement learning. There's got to be some point system associated with it, maybe a human supervising the whole process. And indeed, in the context of learning, any time you have a human providing feedback-- whether it's the
fellow in the video of the pancakes giving feedback to the robot, or someone working behind the scenes at Google initially trying to teach the servers what is spam and what is not-- the catch with such supervised learning is that there's only one of that guy, and there's only a finite number of humans working at Google doing this. And once the data exceeds what a human can do, or wants to do, we probably need to transition from supervised learning to unsupervised, where we still write the code that ideally teaches the machines how to learn, but
we don't have to tell them constantly what is good, what is bad, what is correct, what is incorrect. Heck, let's let the software figure that out, too, and take us, for better or for worse, out of the picture altogether. So what we're really transitioning to as a society now, if you will, is something called deep learning, which goes beyond the reinforcement learning, supervised learning, and unsupervised learning that we just saw. Deep learning is often grounded in what we might call neural networks. And a neural network is really inspired by real
world biology, whereby governing our human nervous system are all these little neurons that have some kind of physical connections to each other. And somehow there are electrical signals traveling through us, such that if I think a thought, that's how I know to stick out my hand, or shake someone's hand, or the like. There's some kind of control system going on here. So here's a rough picture of what a neuron might look like, and here's a pair of them, close enough to somehow communicate. Being computer scientists, we're going to abstract this away and really just
think of each such neuron as a circle. And if they have a means of communicating, we're going to draw an edge between them, turning these, really, into mathematical graphs. So these are nodes, in our CS50 speak, and these are edges, in this case. But it's still the same idea: there's something communicating with something. And heck, we can think of this as maybe the input, and this, now, as maybe the output. We can really map this biological world to the computing world. Suppose you have a world of two inputs, though. That's fine. Maybe, based on this
input and this input, this here output can give us some answer to some question. Well, what does this mean? Let's make it more real. Let's shrink that down and move it to the left. Let's come up with a Cartesian plane here, with x- and y-coordinates. And let's assume that in this world there exist dots, and those dots are either blue or red. It would be nice, in this world, to be able to predict, based on the x- and y-coordinates of some dot, whether it is going to be blue or red. And you can
imagine this representing other types of questions to which we might want answers. So here's our y-axis; here's our x-axis. If I only have a limited amount of training data, though-- two dots, one blue, one red-- the best I can really do is guess at what defines red versus blue in this world. So maybe I can think of this neuron as representing x and this neuron as representing y. And the output I want is an answer-- red or blue, based on x comma y. Well, how do I come up with this? Well, best guess: maybe it's
just a straight line. So maybe everything to the left of this line is going to be blue in this world, and everything to the right is going to be red. So what I'm really trying to figure out, and learn, here is: what is the best-fit line that somehow separates the red from the blue? What I'm really trying to do, then, over time is adjust this. The more training data I get-- the more actual dots in the real world-- the more I might need to modify my best guess. And so if blue is now over
here, I think I want to maybe tilt this line and come up with a different slope for it. And if you give me more and more dots, I bet I can do better and better and refine the line. It might not be perfect, but this is pretty darn good: I've got most of the blue at left and most of the red at right. And frankly, if I want to really get it correct, you're going to have to let me do more than just a straight line. I'm going to need to somehow use some quadratics or
something, to have curly lines in there to capture this perfectly. But you can see where this might be going: if I've got x's and y's, I'm trying to find the best-fit line. And if you think of this as representing a neural network, what I'm really trying to do is something like this: can I come up with, mathematically, a value for a, a value for b, and a value for c that gives me an answer that is either red or blue-- yes or no, if you will? Well, how might I do that? Well, again, not to dwell too much on grade school or high school math here, but ax plus by plus c is a line of some sort. And maybe we can just arbitrarily say that, if you give me an x and a y value, and you've given me enough training data to figure out what a and b and c should be, then if the answer to that mathematical expression is, say, greater than 0, I'm going to say the dot should be blue. And if it's less than or equal to 0, I'm going to say it should be red instead. It doesn't matter. It's just like Tic-Tac-Toe: we just have to agree how to represent these games, or these questions, mathematically, so that we can then get a Boolean answer-- yes or no, red or blue.
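That single neuron, so to speak, might be sketched in Python like this-- with a, b, and c standing in for whatever coefficients training would have settled on:

```python
# Classify a dot by which side of the learned line ax + by + c = 0 it falls on;
# the coefficients a, b, and c would come from training.
def classify(x, y, a, b, c):
    return "blue" if a * x + b * y + c > 0 else "red"
```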
And so what these neural networks are really trying to do, based on lots and lots of data, is plug in a whole bunch of parameters-- coefficients, if you will-- for those x's and y's, so that when you pass in inputs like these, you get back a correct answer over here. And what's funny about neural networks, at least right now in computing circles, is that even for the best engineers at Google, Microsoft, and OpenAI, who are using neural networks to, given your input or your question, produce an answer thereto, a la ChatGPT-- even though there are millions, if not billions, of numbers involved underneath the hood, no computer scientist, not even the head engineer, can point at this and say, well, this node represents such and such, and this edge has this value because of this reason. It's all sort of a black box, in the sense of abstraction. And so, because we're just throwing lots and lots
of data at these things, the computer is figuring out what all of these interconnections should be, mathematically. And what we're really just trying to do, probabilistically, is, with high confidence, spit out the right answer. But even we humans don't really know how these things work, step by step, underneath the hood. And therein lies this idea of machine learning. So with that said, how might we apply this to some real-world domains? Well, maybe you're in meteorology. And so, given a humidity level and a pressure, the goal is to output a rainfall prediction. Maybe you can do that with a neural network, by feeding your computer a whole bunch of sample humidity values, sample pressure values, and historical rainfall amounts, and just letting it figure out how to represent that kind of pattern. Alternatively, in the world of advertising, if you know how much you spent in a month and what month that is, I bet, if you give me enough historical data, I can crunch those numbers somehow and predict what your sales are going to be based on that data-- not 100% correctly, but probably confidently correctly most of the time.
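As a toy sketch of that meteorology example-- the data here is entirely made up, and scikit-learn's small neural network stands in for the real thing:

```python
# A toy neural network fit on invented (humidity, pressure) -> rainfall data.
# In practice, one would scale the features and use far more data.
from sklearn.neural_network import MLPRegressor

X = [[0.90, 1008], [0.40, 1025], [0.75, 1010], [0.20, 1030]]  # humidity, pressure (hPa)
y = [12.0, 0.0, 8.5, 0.0]                                     # rainfall (mm), invented

model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
model.fit(X, y)

print(model.predict([[0.80, 1012]]))  # predicted rainfall for new, unseen conditions
```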
Well, what we have, then, in ChatGPT, and what we have, then, in CS50's duck, is what's called a Large Language Model, whereby the inputs to these neural networks have been, like, all of the content of the internet-- so think Google search results and Reddit and Stack Overflow, dictionaries and encyclopedias, and any such works that it's just been consuming as input. And what these large language models are trying to do is figure out, based on patterns and frequencies of text in all of those inputs: well, if someone asks me, how are you, question mark, probabilistically, based on all of this data, I bet 99% of the time I, ChatGPT or CS50's duck, am supposed to reply, good, thanks. How are you? So not always the correct answer, but a probabilistic one. And that's why ChatGPT is sometimes wrong. There might be misinformation on the internet. Maybe there's a bit of exploration sprinkled in randomly. So it's not always going to give you the right answer. But probabilistically, it's going to.
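To distill that idea into a toy sketch-- the replies and counts below are entirely invented-- you might imagine a lookup of which reply followed a given prompt most often in training text:

```python
# A toy distillation: reply with whatever followed this prompt most often
# in (invented) training data.
from collections import Counter

replies_to_how_are_you = Counter({
    "good, thanks. How are you?": 99,  # observed 99 times, say
    "quack.": 1,                       # observed once
})

def most_likely_reply(counts):
    """Return the most frequently observed reply."""
    return counts.most_common(1)[0][0]

print(most_likely_reply(replies_to_how_are_you))
```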
And this stuff is truly hot off the presses. In 2017, folks at Google proposed what is now generally called attention, a feature underlying AI whereby you can figure out dynamically what the relationship is between words in an English paragraph, or an English text, or really any human language. And giving weight to words in that way has actually fed a lot of these new large language models. In 2020, OpenAI published its GPT model. And most recently, in 2022, ChatGPT itself came out. What underlies all of this, technically, is this big mouthful-- generative pre-trained transformers-- whereby the purpose of these AIs is to generate stuff. They've been pre-trained on lots of publicly available data. And the goal is to transform the user's
input into, ideally, correct output. And if you see where I'm going with this, that's the GPT in ChatGPT, which itself was never meant to be, like, a branded product. It's a little weird that GPT has entered the human vernacular. But it does evoke exactly these ideas. So here's a sample paragraph, for instance. Massachusetts is a state in the New England region of the Northeastern United States. It borders on the Atlantic Ocean to the east. The state's capital is, dot, dot, dot-- essentially inviting us now to answer this question. Well, historically, prior to 2017-ish,
it was actually pretty hard for machines to learn that this mention of Massachusetts is actually related to this mention of state. Why? Because they're pretty far apart-- this one is in a whole new sentence. And unless the machine already knows what Massachusetts is-- and technically, it's a Commonwealth-- it might not give much attention to these two words, much weight to the relationship thereof. But if you train these GPTs on enough data and you start to break down the input into sequences of words, for instance, well, you might have an array, or a list, of words here, in CS50 speak. You might be able to figure out, based on your training data, that if you number these words from 1 to 27, or whatnot in this case, you could represent them mathematically somehow. As an aside, the way that these large language models are representing words like Massachusetts is literally with numbers like this. This is 1,536 floating-point values in a vector, a.k.a. a list or array, that literally represents the word Massachusetts, according to one of these algorithms.
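For the curious, here's one way such a vector, called an embedding, might be fetched-- a sketch assuming OpenAI's embeddings API and its text-embedding-ada-002 model, whose vectors happen to have exactly 1,536 dimensions:

```python
# A sketch of fetching an embedding for one word via OpenAI's API.
from openai import OpenAI

client = OpenAI()  # assumes an API key in the environment

vector = client.embeddings.create(
    model="text-embedding-ada-002",
    input="Massachusetts",
).data[0].embedding

print(len(vector))  # 1536 floating-point values
```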
Let's take a step back, abstract each word away as a little rectangle instead, and use these little edges to imply that, if there's a bolder edge here, there's really a relationship in the training data between Massachusetts and state-- one of those words is giving more attention to the other-- as opposed to is, which gets maybe a thin line, because there's not much going on between Massachusetts and is, as opposed to between those two nouns, in that case. All of this input, all of these vectors, are fed into large neural networks that have lots and lots of inputs, far more than 1 and 2 and 3, the output of which, ideally,
then, is the answer to this question, or a whole answer to your question. And so when you ask the duck a question, or you ask ChatGPT a question, essentially the software is navigating this neural network, trying to find the best path through it to give you the most correct answer. And ideally, it's going to get it correct. But it might not necessarily do so every time. And so, in fact, there are these things-- and we'll end where we began-- known as hallucinations, whereby sometimes ChatGPT, and admittedly, even CS50's own duck, might just make something up
and tell you such very confidently. Those are, indeed, known as hallucinations. And what I thought we'd end on is a note that was actually published quite some time ago, perhaps in your childhood, as well, from Shel Silverstein here-- The Homework Machine-- from which we have this here poem. "The Homework Machine, oh, the Homework Machine, most perfect contraption that's ever been seen. Just put in your homework, then drop in a dime. Snap on the switch, and in ten seconds' time, your homework comes out, quick and clean as can be. Here it is--
nine plus four? and the answer is three. Three, oh me. I guess it's not as perfect as I thought it would be." So it foretold, decades ago, what we're here now talking about. But this, then, was CS50. This was, then, AI. And this is the URL at which you parents and siblings and others are welcome to take the course, so to speak, in any way. Feel free, though, to come on up with any hellos or questions. That, then, is our class. And we will see you, hopefully, next time. [APPLAUSE]