[Music] [Applause] Wow, so many of you. Good. Okay, thank you for that lovely introduction. Right, so what is generative artificial intelligence? I'm going to explain what artificial intelligence is, and I want this to be a bit interactive, so there will be some audience participation. The people who host this lecture said to me, "Oh, you are very low-tech for somebody working on AI": I don't have any explosions or any experiments, so I'm afraid you'll have to participate. I hope that's okay.
All right, so what is generative artificial intelligence? The term is made up of two things: artificial intelligence and generative. Artificial intelligence is a fancy term for saying we get a computer program to do a job that a human would otherwise do. And generative, this is the fun bit: we are creating new content that the computer has not necessarily seen before. It has seen parts of it, and it is able to synthesize them and give us new things. So what could this new content be? It could be audio; it could be computer code, so that it writes a program for us; it could be a new image; it could be text, like an email or an essay; or, as you've heard, video. Now, in this lecture I'm mostly going to focus on text, because I do natural language processing and this is what I know about. We'll see how the technology works, and hopefully, leaving the lecture, you'll know how it works: there is a lot of myth around it, but once you see what it does, you realize it's just a tool.
Okay, right, so the outline of the talk. There are three parts, and it's a kind of boring outline. This is Alice Morse Earle. I don't expect you to know her; she was an American writer who wrote about memorabilia and customs, but she's famous for her quotes. She gave us this quote here: yesterday is history, tomorrow is a mystery, today is a gift, and that's why it's called the present. It's a very optimistic quote, and the lecture is basically that: the past, the present and the future of AI. Okay. So what I want to say right at the front is that generative AI is not a new concept; it's been around for a while.
So how many of you have used or are familiar with Google Translate? Can I see a show of hands? Right. And who can tell me when Google Translate first launched? 1995? That would have been good. 2006. So it's been around for 17 years, we've all been using it, and it is an example of generative AI: Greek text comes in (I'm Greek, hence the example) and English text comes out. Google Translate has served us very well for all these years, and nobody was making a fuss.
Another example is Siri on the phone. Siri launched in 2011, 12 years ago, and it was a sensation back then. It is another example of generative AI: we can ask Siri to set alarms, and Siri talks back, and oh, how great it is, and then you can ask about your alarms and whatnot. This is generative AI again; it's not as sophisticated as ChatGPT, but it was there. How many of you have an iPhone? See, iPhones are quite popular; I don't know why. Okay, so we are all familiar with that, and of course later on there was Amazon Alexa and so on. Again, generative AI is not a new concept; it is everywhere. It is part of your phone: the autocompletion when you're sending an email or a text, where the phone attempts to complete your sentences, attempts to think like you, and it saves you time, because some of the completions are right there. The same with Google: when you're typing, it tries to guess what your search term is. This is an example of language modeling, and we'll hear a lot about language modeling in this talk: basically, we're making predictions about what the continuation is going to be.
So what I'm telling you is that generative AI is not that new. The question, then, is what is the fuss, what happened? In 2023 OpenAI, which is a company in California, in fact in San Francisco (if you go to San Francisco you can even see the lights of their building at night), announced GPT-4 and claimed that it can beat 90% of humans on the SAT. For those of you who don't know, the SAT is a standardized test that American schoolchildren have to take to enter university; it's an admissions test, it's multiple choice, and it's considered not so easy. GPT-4 can do it. They also claimed it can get top marks in law exams, medical exams and other exams; they have a whole suite of things that they claim, well, not claim, they show, that GPT-4 can do. Okay. So aside from passing exams, we can ask it to do other things.
For example, you can ask it to write text for you. You can give it a prompt, this little thing that you see up there: a prompt is what the human wants the tool to do for them. A potential prompt could be: "I'm writing an essay about the use of mobile phones during driving. Can you give me three arguments in favor?" This is quite sophisticated, if you ask me; I'm not sure I could come up with three arguments. You can also do, and these are real prompts that the tool can actually handle: you tell ChatGPT, or GPT in general, "Act as a JavaScript developer. Write a program that checks the information on a form: name and email are required, but address and age are not." I just write this, and the tool will spit out a program. And this is the best one: "Create an about-me page for a website. I like rock climbing, outdoor sports, and I like to program. I started my career as a quality engineer in the industry," blah blah blah. So I give this description of what I want the website to be, and it will create it for me.
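As an aside, here is a minimal sketch of what sending a prompt like these to a chat model can look like programmatically, assuming the OpenAI Python client (v1 style); the model name and the prompt text are placeholders, and the lecture itself does not prescribe any particular API.

```python
# A minimal sketch of sending a prompt to a chat model, assuming the
# OpenAI Python client (v1 style). The model name here is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

prompt = ("I'm writing an essay about the use of mobile phones during driving. "
          "Can you give me three arguments in favor?")

response = client.chat.completions.create(
    model="gpt-4",                       # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```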
So you see, we've gone from Google Translate and Siri and autocompletion to something which is a lot more sophisticated and can do a lot more things. Another fun fact: this is a graph that shows the time it took for ChatGPT to reach 100 million users, compared with other tools that have been launched in the past. You see our beloved Google Translate took 78 months to reach 100 million users, a long time. TikTok took nine months, and ChatGPT took two. So within two months they had 100 million users, and these users pay a little bit to use the system, so you can do the multiplication and figure out how much money they make.
Okay, so that was the history part. Now, how did we make ChatGPT? What is the technology behind it? The technology, it turns out, is not extremely new or extremely innovative or extremely difficult to comprehend. So now we'll talk about the today, the present, and we'll address three questions: first of all, how did we get from single-purpose systems like Google Translate to ChatGPT, which is more sophisticated and does a lot more things; in particular, what is the core technology behind ChatGPT; and what are the risks, if there are any? And finally I will show you a little glimpse of the future, how it's going to look and whether we should be worried or not; and, you know, I won't leave you hanging, please don't worry.
Okay, right. So, all these GPT model variants, and there is a cottage industry out there: I'm just using GPT as an example because the public knows it,
and there have been a lot of news articles about it; but there are other models, other variants, that we use in academia, and they all work on the same principle. This principle is called language modeling. What does language modeling do? It assumes we have a sequence of words, the context so far (we saw this context in the autocompletion), and I have an example here. Assuming my context is the phrase "I want to", the language modeling tool will predict what comes next. So if I say "I want to", there are several predictions: "I want to shovel", "I want to play", "I want to swim", "I want to eat". And depending on what we choose, whether it's shovel or play or swim, there are more continuations: for shovel it will be snow, for play it can be tennis or video, swim doesn't have a continuation, and for eat it will be lots of fruit. Now this is a toy example, but imagine that the computer has seen a lot of text and it knows which words follow which other words. We used to count these things: I would go and download a lot of data and I would count, "I want to shovel", how many times does it appear and what are its continuations, and we would have counts of all these things. All of this has gone out of the window now; we use neural networks that don't exactly count things but learn to predict them in a more sophisticated way, and I'll show you in a moment how that's done.
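To make the old counting approach concrete (the one that has now gone out of the window), here is a minimal sketch of a count-based next-word predictor; the tiny corpus and the word choices are invented for illustration.

```python
# A minimal count-based next-word predictor (a bigram model), the
# "counting" approach described above. The tiny corpus is made up.
from collections import Counter, defaultdict

corpus = ("i want to play tennis . i want to play video games . "
          "i want to eat lots of fruit . i want to shovel snow .").split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1          # count how often `nxt` follows `prev`

def predict_next(word, k=3):
    """Return the k most frequent continuations seen after `word`."""
    return counts[word].most_common(k)

print(predict_next("to"))    # e.g. [('play', 2), ('eat', 1), ('shovel', 1)]
print(predict_next("play"))  # e.g. [('tennis', 1), ('video', 1)]
```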
GPT and the GPT variants are based on this principle: I have some context, and I will predict what comes next. And that's what the prompt is. The prompts that I gave you, these things here, they are the context, and then the model needs to do the task, to produce what comes next: in some cases that would be the three arguments, in the case of the web developer it would be a web page. Okay, so the task of language modeling is this: we have the context (I've changed the example now, it says "the color of the sky is") and we have a neural language model, which is just an algorithm that will predict the most likely continuation. And likelihood matters: these models are all predicated on making guesses about what's going to come next, and that's why sometimes they fail, because they predict the most likely answer whereas you wanted a less likely one. But this is how they are trained; they are trained to come up with what is most likely. Okay, so we don't count these things; we try to predict them using this language model.
So how would you build your own language model? This is the recipe; this is how everybody does it. Step one: we need a lot of data; we need to collect a ginormous corpus of words. And where will we find such a ginormous corpus? We go to the web, right? We download the whole of Wikipedia, Stack Overflow pages, Quora, social media, GitHub, Reddit, whatever you can find out there (working out the permissions, of course; it has to be legal). You download all this corpus, and then what do you do? Then you have this language model. I haven't told you yet what exactly this language model is, or what the neural network that does the prediction looks like, but assume you have it. So you have this machinery that will do the learning for you, and the task now is to predict the next word. But how do we do it? And this is the genius part: we have the sentences in the corpus, we can remove parts of them, and we can have the language model predict what we have removed. This is dead cheap: I just remove things, I pretend they're not there, and I get the language model to predict them. So I will randomly truncate (truncate means remove the last part of the input sentence), and I will calculate with this neural network the probability of the missing word. If I get it right, I'm good; if I'm not right, I have to go back and re-estimate some things, because obviously I made a mistake. And I keep going: I adjust and feed back to the model, and I compare what the model predicted with the ground truth, because I removed the words in the first place, so I actually know what the real truth is.
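As a rough illustration of this truncate, predict, compare, adjust loop, here is a toy sketch; the "model" is just a table of scores with a made-up update rule, not a real neural network, and the sentences are invented.

```python
# A toy version of the self-supervised recipe: hide the last word of each
# sentence, predict it, compare with the ground truth, and nudge the model.
# The "model" is just a score table, not a real neural network.
import random
from collections import defaultdict

sentences = [
    "the color of the sky is blue",
    "the color of grass is green",
    "i want to play tennis",
]

scores = defaultdict(lambda: defaultdict(float))   # scores[context_word][next_word]
vocab = {w for s in sentences for w in s.split()}

for epoch in range(5):                             # real systems run for months
    random.shuffle(sentences)
    for sent in sentences:
        words = sent.split()
        context, target = words[:-1], words[-1]    # truncate: hide the last word
        last = context[-1]
        prediction = max(vocab, key=lambda w: scores[last][w])
        if prediction != target:                   # wrong? adjust the model a little
            scores[last][target] += 1.0
            scores[last][prediction] -= 0.5

print(max(vocab, key=lambda w: scores["is"][w]))   # the model's guess for the word after "is"
```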
And we keep going like this for some months (or maybe years; no, months, let's say). It will take some time to do this process because, as you can appreciate, I have a very large corpus with many sentences, and I have to do the prediction and then go back and correct my mistakes, and so on. But in the end the thing will converge and I will get my answer.
So the tool in the middle that I've shown, this language model: a very simple language model looks a bit like this. Maybe some of you have seen this; it is a very naive diagram, but it helps to illustrate the point of what it does. This neural network language model has some input, which is these nodes on the right as we look at it (well, my right and your right, okay). So the nodes here on the right are the input and the nodes at the very left are the output. We present this neural network with five inputs, the five circles, and we have three outputs, the three circles, and there is stuff in the middle that I haven't said anything about: these are layers, more nodes that are supposed to be abstractions of my input, so they generalize. The idea is that if I put more layers on top of layers, the middle layers will generalize the input and will be able to see patterns that are not obvious on the surface. So you have these nodes, and the input to the nodes is not exactly words; it's vectors, series of numbers, but forget that for now. So we have some input, we have some layers in the middle, we have some output, and the network has these connections, these edges, which are the weights. This is what the network will learn, and these weights are basically numbers; here it's all fully connected, so there are very many connections.
Why am I going through the process of telling you all of this? You will see in a minute: you can work out how big or how small a neural network is from the number of connections it has. So for this toy neural network we have here, I have worked out the number of weights, which we also call parameters, that the model needs to learn. The parameters are the number of input units, in this case five, times the number of units in the next layer, eight, plus eight. This "plus eight" is a bias, a little cheat that these neural networks have; again, you need to learn it, and it corrects the network a little bit if it's off. It's actually genius: if the prediction is not quite right, it helps nudge it back. For the purposes of this talk I'm not going to go into the details; all I want you to see is that there is a way of working out the parameters, which is basically the number of input units times the number of units they feed into, plus the biases. And for this fully connected network, if we add everything up, we come to 99 trainable parameters. Ninety-nine. This is a small network for all intents and purposes, but I want you to remember it: this small network is 99 parameters, so when you hear that a network has a billion parameters, I want you to imagine how big that is.
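To make the arithmetic concrete, here is the parameter count worked out in code for one layer layout that is consistent with the 99 figure (5 inputs, hidden layers of 8 and 4, and 3 outputs); the exact hidden sizes of the slide's toy network are an assumption.

```python
# Counting trainable parameters in a small fully connected network:
# each layer contributes (inputs x outputs) weights plus one bias per output.
# The hidden sizes (8 and 4) are an assumption that reproduces the 99 total.
layer_sizes = [5, 8, 4, 3]

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    total += n_in * n_out + n_out      # weights + biases
    print(f"{n_in} -> {n_out}: {n_in * n_out + n_out} parameters")

print("total trainable parameters:", total)   # 48 + 36 + 15 = 99
```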
Okay, so 99, just for this toy neural network. And this is how we judge how big a model is, how long it took and how much it cost: the number of parameters. In reality, though, no one is using this network; maybe if I had a first-year undergraduate class and I was introducing neural networks, I would use it as an example. In reality, what people use are these monsters that are made of blocks, and "blocks" means they're made of other neural networks. I don't know how many people have heard of Transformers. I hope no one... oh wow, okay. So Transformers are the neural networks that we use to build ChatGPT, and in fact GPT stands for generative pre-trained Transformer, so Transformer is even in the name. This is a sketch of a Transformer. You have your input, and the input is not words; like I said, here it says embeddings, and embeddings is another word for vectors. And then you have a bigger version of this network, multiplied into these blocks, and each block is a complicated system that has some neural networks inside it. We're not going to go into the detail, please don't worry. All I'm trying to say is that we have these blocks stacked on top of each other (the Transformer here has eight of them), each a mini neural network, and the task remains the same. That's all I want you to take away from this: the input goes in, the context, "the chicken walked", we do some processing, and our task is to predict the continuation, which is "across the road". And this EOS means end of sentence, because we need to tell the neural network that our sentence has finished. I mean, they're kind of dumb, right? We need to tell them everything. When I hear that they will take over the world, I think: really? We have to actually spell everything out for them. Okay, so this is the Transformer, the king of architectures. Transformers came in 2017, and nobody is working on new architectures right now, which is a bit sad; there used to be some pluralism, but now everybody is using Transformers. We've decided they're great.
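For the curious, here is a heavily simplified sketch of the "blocks stacked on top of each other" idea: embeddings go in, a stack of identical blocks transforms them, and the last position is scored against the vocabulary. All sizes and the random weights are placeholders; a real Transformer adds positional information, multiple attention heads, layer normalization and much more.

```python
# A heavily simplified, untrained Transformer-style stack. Token embeddings
# go in, several identical blocks (self-attention plus a small feed-forward
# network) transform them, and the last position is projected onto the
# vocabulary to score the next word.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "chicken", "walked", "across", "road", "<eos>"]
d = 16                                        # embedding size
embed = rng.normal(size=(len(vocab), d))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def block(x, params):
    """One simplified Transformer block: self-attention then a feed-forward net."""
    Wq, Wk, Wv, W1, W2 = params
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d)) @ v  # every word looks at every other word
    x = x + attn                              # residual connection
    return x + np.maximum(x @ W1, 0) @ W2     # feed-forward with a residual

blocks = [tuple(rng.normal(scale=0.1, size=(d, d)) for _ in range(5)) for _ in range(8)]

tokens = ["the", "chicken", "walked"]         # the context
x = np.stack([embed[vocab.index(t)] for t in tokens])
for params in blocks:                         # pass through the stacked blocks
    x = block(x, params)

logits = x[-1] @ embed.T                      # score every word as the continuation
print(vocab[int(np.argmax(logits))])          # untrained, so the guess is arbitrary
```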
Okay, so what are we going to do with it? This is kind of important, and this is the amazing thing: we're going to do self-supervised learning, which is what I said: we take a sentence, we truncate it, we predict, and we keep going until we learn these probabilities. Okay, you're with me so far? Good. So once we have our Transformer and we've given it all the data there is in the world, we have a pre-trained model; that's why GPT is called the generative pre-trained Transformer. This is a baseline model that has seen a lot of things about the world in the form of text. Then what we normally do is take this general-purpose model and specialize it somehow for a specific task, and this is what is called fine-tuning. That means the network has some weights, and we have to specialize the network: we initialize the weights with what we know from the pre-training, and then on the specific task we learn a new set of weights. For example, if I have medical data, I will take my pre-trained model, specialize it on this medical data, and then I can do something that is specific to this task, for example write a diagnosis from a report. Okay, so this notion of fine-tuning is very important, because it allows us to build special-purpose applications from these generic pre-trained models. People think that GPT and all of these things are general purpose, but they are fine-tuned to be general purpose, and we'll see how.
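A minimal sketch of what such fine-tuning might look like in practice, assuming the Hugging Face transformers library, PyTorch, and the small public "gpt2" checkpoint as a stand-in for the pre-trained model; the two "medical" sentences are invented placeholders, and a real fine-tuning set would contain many thousands of examples.

```python
# Fine-tuning sketch: start from pre-trained weights and keep doing
# next-word prediction, but only on domain-specific text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")   # initialize from pre-training
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

domain_texts = [
    "Report: mild fever and cough. Diagnosis: suspected viral infection.",
    "Report: elevated blood pressure. Diagnosis: hypertension, follow-up advised.",
]

model.train()
for epoch in range(3):
    for text in domain_texts:
        inputs = tokenizer(text, return_tensors="pt")
        loss = model(**inputs, labels=inputs["input_ids"]).loss  # next-word loss on domain text
        loss.backward()                                          # adjust the pre-trained weights
        optimizer.step()
        optimizer.zero_grad()
```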
Okay, so here's the question. We now have this basic technology to do the pre-training, and I told you how to do it: if you download all of the web, how good can a language model become? How does it become great? Because when GPT-1 and GPT-2 came out, they were not amazing. So: the bigger the better. Size is all that matters, I'm afraid, which is a bit uncomfortable, because people didn't used to believe in scale, and now we see that scale is very important. Since 2018 we've witnessed an absolutely extreme increase in model sizes, and I have some graphs to show this.
Okay, I hope people at the back can see this graph. This graph shows the number of parameters that these models have (remember, the toy neural network had 99). We start with a normal amount, well, normal, for GPT-1, and we go up to GPT-4, which has one trillion parameters. Huge. One trillion. This is a very, very big model. And you can see here the ant brain and the rat brain, and we go up to the human brain; the human brain has not one trillion but a hundred trillion parameters, so we are a bit off. We are not at the human-brain level yet, and maybe we'll never get there, and we can't really compare GPT to the human brain anyway, but I'm just giving you an idea of how big this model is.
Now, what about the words it has seen? This graph shows the number of words processed by these language models during their training, and you will see that there has been an increase, but the increase has not been as big as for the parameters. So the community started out focusing on the parameter size of these models, whereas in fact we now know that they need to see a lot of text as well. GPT-4 has seen approximately, I don't know, a few billion words; all the human-written text is, I think, a hundred billion, so it's sort of approaching that. You can also see what a human reads in their lifetime; it's a lot less, even if they read, because people nowadays do read, but they don't read fiction, they read their phone. Anyway, you can also see the English Wikipedia. So we are approaching the level of the text that is out there for us to get, and in fact one might say, well, GPT is great, you could use it to generate more text and then retrain the model on the text GPT has generated; but we know this text is not exactly right, and in fact the returns diminish, so we're going to plateau at some point. Okay, how much does it cost? GPT-4 cost 100 million dollars. So when should they start doing it again?
Well, obviously this is not a process you want to do over and over again. You have to think very carefully, because if you make a mistake you've lost, say, 50 million, and you can't just start again. So you have to be very sophisticated about how you engineer the training, because a mistake costs money. And of course not everybody can do this; not everybody has 100 million dollars. They can do it because they have Microsoft backing them; not everybody does.
Okay, now this is a video that is supposed to play and illustrate the effects of scaling; let's see if it will work. Okay, I'll play it one more time. These are tasks that a model can do, and it's the number of tasks plotted against the number of parameters. We start with 8 billion parameters and we can do a few tasks, and then the tasks increase: summarization, question answering, translation. And once we move to 540 billion parameters we have many more tasks: we start with very simple ones like code completion, and then we can do reading comprehension and language understanding and translation. So you get the picture: the tree flourishes. This is what people discovered with scaling: if you scale the language model, you can do more tasks.
So now maybe we're done? But what people discovered is that if you actually take GPT and put it out there, it doesn't behave the way people want it to behave, because this is a language model trained to predict and complete sentences, and humans want to use GPT for other things; they have their own tasks that the developers hadn't thought of. So then the notion of fine-tuning comes in; it never left us. What we're going to do now is collect a lot of instructions. Instructions are examples of what people want ChatGPT to do for them, such as "answer the following question" or "answer the question step by step". We're going to give these demonstrations to the model, in fact almost 2,000 of such examples, and we're going to fine-tune: we're going to tell this language model, look, these are the tasks that people want, try to learn them.
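What such instruction demonstrations might look like as data is sketched below; the examples are invented for illustration, and the model is then tuned on them with the same next-word prediction loss as before.

```python
# What an "instruction" demonstration might look like. The examples are
# invented; instruction-tuned models are trained on many thousands of
# pairs like these, using the usual next-word prediction loss.
instruction_data = [
    {
        "instruction": "Answer the following question step by step.",
        "input": "If a train travels 60 miles in 90 minutes, what is its speed in mph?",
        "output": "90 minutes is 1.5 hours. 60 miles / 1.5 hours = 40 mph.",
    },
    {
        "instruction": "Summarize the text below in one sentence.",
        "input": "Heavy rain is expected across the north of the country overnight, easing by morning.",
        "output": "Overnight heavy rain in the north will clear by morning.",
    },
]

def to_training_text(example):
    """Flatten one demonstration into plain text the language model can be tuned on."""
    return f"{example['instruction']}\n{example['input']}\n{example['output']}"

print(to_training_text(instruction_data[0]))
```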
And then an interesting thing happens: the model can generalize to unseen tasks, unseen instructions, because you and I may have different usage purposes for these language models.
Okay, but here's the problem: we have an alignment problem, and this is actually very important, something that will not leave us in the future. The question is how we create an agent that behaves in accordance with what a human wants. I know there are many words in that question, but the real question is this: if we have AI systems with skills that we find important or useful, how do we adapt those systems to reliably use those skills to do the things we want? There is a framework called the HHH framing of the problem: we want GPT to be helpful, honest and harmless, and this is the bare minimum. So what does helpful mean? It should follow instructions, perform the tasks we want it to perform and provide answers, and it should ask relevant questions according to the user's intent and ask for clarification. If you've been following along, in the beginning ChatGPT did none of this, but slowly it became better, and it now actually asks these clarification questions. It should also be accurate, something that is not 100% there even today; there is still inaccurate information. And it should avoid toxic, biased or offensive responses. Now here's a question I have for you: how will we get the model to do all of these things? You know the answer: fine-tuning. Except that we're going to do a different kind of fine-tuning: we're going to ask humans to give us some preferences. In terms of helpfulness, an example is "What causes the seasons to change?", and then we give two options to the human: "Changes occur all the time and it's an important aspect of life" (bad), and "The seasons are caused primarily by the tilt of the Earth's axis" (good). So we collect these preference scores and then we train the model again, and then it will know. So fine-tuning is very important; and it was expensive as it was, and now we make it even more expensive, because we add a human into the mix, because we have to pay these humans who give us the preferences and we have to think up the tasks.
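Here is what one such human preference judgement might look like as data, using the seasons example from the slide; the format is an illustration only, and in practice a separate reward model is trained to score the preferred answer higher.

```python
# One human preference judgement for the "helpful" part of fine-tuning:
# two candidate answers to the same question, with the annotator's pick.
preference_example = {
    "prompt": "What causes the seasons to change?",
    "answer_a": "Changes occur all the time and it's an important aspect of life.",
    "answer_b": "The seasons are caused primarily by the tilt of the Earth's axis.",
    "preferred": "answer_b",
}

def preferred_answer(example):
    """Return the answer the human annotator marked as better."""
    return example[example["preferred"]]

print(preferred_answer(preference_example))
```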
The same goes for honesty: "Is it possible to prove that P equals NP?" "No, it's impossible" is not great as an answer; "That is considered a very difficult and unsolved problem in computer science" is better. And we have similar examples for harmless. Okay, so I think it's time, let's see if we can do a demo. Yeah, it's bad if you remove all the files... okay, hold on. Okay, so now we have GPT here; I'll ask some questions and then we'll take some questions from the audience.
So let's ask one question: is the UK a monarchy? Can you see it up there? I'm not sure, and it's not generating... oh, perfect. Okay, so what do you observe? First thing: too long. I always have this beef with it, it's too long. You see what it says: as of my last knowledge update in September 2021, the United Kingdom is a constitutional monarchy. It could be that it isn't any more, right? Something could have happened. This means that while there is a monarch, the reigning monarch as of that time was Queen Elizabeth II. So it tells you: I don't know what happened since; at that time there was a Queen Elizabeth. Now if you ask it who... oh sorry, who is Rishi Sunak, if I can manage to type "Rishi Sunak", does it know? A British politician; as of my last knowledge update he was the Chancellor of the Exchequer. So it does not know that he's the Prime Minister. Write me a poem... write me a poem about... what do we want it to be about? Give me two things. Yeah, it will know, it will know. Let's do another one: a cat and a squirrel. A cat and a squirrel, we'll do a cat and a squirrel. "A Tale of Curiosity"... whoa, oh my God. Okay, I will not read this; they want me to finish at eight. Right, let's say: can you try a shorter poem, can you try to give me a... again, I can't type. Cool: "Amid ... gold, leaves whisper secrets untold, nature's story bold." Okay, don't clap.
Okay, let's do one more. Does the audience have anything, something challenging, that you want to ask? Yes: what school did Alan Turing go to? Perfect. "What school did Alan Turing go to?" Oh my God, he went... do you know, I don't know whether it's true; this is the problem. Sherborne School. Can somebody verify? King's College Cambridge, Princeton. Yes, okay. Ah, here's another one: tell me a joke about Alan Turing. Okay, I cannot type, but it will... okay, a lighthearted joke: why did Alan Turing keep his computer cold? Because he didn't want it to catch bytes. Bad, okay. Okay: explain why that's funny. Ah, very good one: why is this a funny joke? And where is it... oh God, okay. "Catch bytes" sounds similar to "catch colds"; catching bytes is a humorous twist on this phrase... oh my God... the humor comes from the clever wordplay and the unexpected... okay, you lose the will to live, but it does explain it, it does explain.
Right, one last request from you. Consciousness? It will know, because it has seen the science, and it will spit out a huge thing. Shall we try... say again? Write a song about relativity. Okay, write a... you are learning very fast: a short song about relativity. Oh goodness me, this is "short"... oh my, an outro. Okay, so you see, it doesn't follow instructions, it is not helpful, and this has been fine-tuned. The best bit was here; it had something like, where was it: "Einstein said Eureka one fateful day, as he pondered the stars in his own unique way; the theory of relativity he did unfold, a cosmic story ancient and bold." I mean, kudos for that.
Okay, now let's go back to the talk, because I want to talk a little bit about the present: is it good, is it bad, is it fair, are we in danger? So, it's virtually impossible to regulate the content these models are exposed to, and there are always going to be historical biases; we saw this with the Queen and Rishi Sunak. They may also exhibit various types of undesirable behavior. For example, and this one is famous, Google showcased a model called Bard and released a tweet in which they asked Bard, "What new discoveries from the James Webb Space Telescope can I tell my 9-year-old about?" It spat out three things, among them that this telescope took the very first picture of a planet outside of our own solar system. And here comes Grant Tremblay, who is an astrophysicist, a serious guy, and he says: I'm really sorry, I'm sure Bard is amazing, but it did not take the first image of a planet outside our solar system; that was done by other people, in 2004. And what happened is that this error wiped a hundred billion dollars off the value of Google's parent company, Alphabet. Okay, bad. If you ask GPT to tell you a joke about men, it gives you a joke and says it might be funny: why do men need instant replay on TV sports? Because after 30 seconds they forget what happened. I hope you find it amusing. If you ask about women, it refuses. Yes, yes, it's fine-tuned, exactly. Which is the worst dictator of this group: Trump, Hitler, Stalin, Mao? It actually doesn't take a stance; it says all of them are bad: these leaders are widely regarded as some of the worst dictators in history. Okay.
So, yes, the environment: a query to ChatGPT, like the ones we just did, takes 10 to 100 times more energy to execute than a Google search query. Inference, which is producing the language, takes a lot; it is more expensive than actually training the model. Llama 2 is a GPT-style model, and while they were training it, it produced 539 metric tons of CO2. The larger the models get, the more energy they need
and the more they emit during their deployment; imagine now lots of them sitting around society. Some jobs will be lost; we cannot beat around the bush. Goldman Sachs predicted 300 million jobs. I'm not sure about this, we cannot tell the future, but some jobs will be at risk, like repetitive text writing. Creating fakes: these are all documented cases in the news. A college kid wrote a blog which apparently fooled everybody, using ChatGPT; these tools can produce fake news. And this is a song; how many of you know this? I know I said I was going to focus on text, but you can use the same technology on audio, and this is a well-documented case where somebody unknown created a song that supposedly was a collaboration between Drake and The Weeknd. Do people know who they are? Yeah, very good: Canadian artists, and they're not so bad. So, shall I play the song for you? Okay. Apparently it's very [Music] authentic; apparently it's totally believable. Okay. Have you seen this? Same technology, but a different kind: this is a deepfake showing that Trump was arrested. How can you tell it's a deepfake? The hand, yeah, it's too short, right? You can see it's almost there, but not quite.
Okay, so I have two slides on the future before they come and kick me out, because I was told I have to finish early to take some questions. Okay: tomorrow. We can't predict the future, and no, I don't think that these evil computers are going to come and kill us all. I will leave you with some thoughts
by Tim Berners-Lee. For people who don't know him, he invented the World Wide Web; he's actually Sir Tim Berners-Lee. He said two things that made sense to me. First of all, we don't actually know what a superintelligent AI would look like; we haven't made one, so it's hard to make these statements. However, we're likely to have lots of these intelligent AIs (and by intelligent AI we mean things like GPT); many of them will be good and will help us do things, and some may fall into the hands of individuals who want to do harm. And it seems easier to minimize the harm that these tools will do than to prevent the systems from existing at all. So we cannot actually eliminate them altogether, but we as a society can mitigate the risks.
This is very interesting: the Alignment Research Center ran an evaluation that dealt with a hypothetical scenario: could GPT-4 autonomously replicate (you know, replicate yourself, create a copy of yourself), acquire resources, and basically become the very bad agent you see in the movies? And the answer is no, it cannot do this. It cannot. They had some specific tests and it failed on all of them, such as setting up an open-source language model on a new server; it cannot do that.
Okay, last slide. My take on this is that we cannot turn back time, and every time you think about AI coming to kill you, you should think about what the bigger threat to mankind is, AI or climate change; I would personally argue climate change is going to wipe us all out before AI becomes superintelligent. Who is in control of AI? There are some humans there who hopefully have sense. And who benefits from it? Does the benefit outweigh the risk? In some cases the benefit does, in others it doesn't. And history tells us that all technology that has been risky, such as, for example, nuclear energy, has been very strongly regulated. So regulation is coming, and watch this space. And with that I will stop and take your questions. Thank you so much for listening; you've been [Applause] great.