How to set up RAG - Retrieval Augmented Generation (demo)

39.85k views3226 WordsCopy TextShare
Don Woodlock
When I posted my video on Retrieval Augmented Generation (RAG), I got a lot of requests to show exac...
Video Transcript:
hello everyone welcome back to my code deare uh video series I rotate through education topics uh use case topics and uh bias safety ethics kind of topics and I'm back on my education rotation uh the last education video I did is what is rag retrieval augmented generation and that had way more video viewers than any other topic that I've done uh and a few of you commented that you'd like me to go a little bit deeper into the topic or not just be theoretical but just show you how it actually works so I thought I'd
do a different kind of video one that intersperses uh some of the lightboard stuff here with some coating so you can kind of see how rag actually uh works so I'm going to do this in three different segments and if you um watch my rag video first because I'm going to assume you watched uh that so you can follow uh this one but we're going to dive into this concept of a prompt before the prompt in three different stages so the first stage is let's say you have a user and you're a health system and
you want to provide a patient chatbot uh that allows your patients to ask you questions so they might ask a question do you have parking something like that and you have the prompt uh and then you have the large language model that uh returns an answer to that prompt back to the user so the first thing we're going to do is just a pass through prompt we're going to take literally what the user says and pass it onto the large language model and get a response back so that um that in my prior diagram was
just that bottom area where we just passed the user's prompt do you have parking to the large language model and see how it responds back so let's go to the desktop and take a look at what that uh looks like in code so what we're what you're looking at here is an environment called Jupiter notebook and this is the environment that most data scientists will use and machine learning people will use to do simpler experimental kind of programs like uh like this so the way it works is you have code cells and when you run
a cell so I'm going to run this cell uh it will basically execute all the code and then if there's anything to display it'll display it right underneath the uh the cell so kind of intersperses code with with output which is a nice environment for this kind of work uh this first cell is not doing too much it's importing a bunch of libraries um which will'll be used uh uh later so we're going to do simple pass through prompt and the prompt I chose is just something you might ask chat GPT what temperature should I
should I set my house on should what temperature I set my house on vacation so the pipes don't freeze so that's the question question so if I run that cell it sets this string into this variable and then it'll display uh the value of that variable uh and this is how you call um uh chat GPT in particular most of the models look kind of like this you um there's some uh API uh here you pass the model name that you'd like to use I'm using the um the fast uh cheap gp35 turbo um and
basically I'm asking question here and I'm the role of the user and the content of my package is this uh this question so if I do that uh in a second or so uh it comes back from chat GPT um and it has a bunch of uh parameters and attributes and stuff like that but you can pull out the actual response um by this code the First Choice the message attribute the content attribute of that message it's recommended to keep your house temperature at 55 degrees F okay so that was just taking that question and
then passing it through over to chat GPT and getting a response back in displaying all right so that was pretty straightforward uh now what we're going to do is we're going to add one layer of prop before the prompt and this is just extra instructions so in my prior diagram I drew it here where you might add instructions before you send to the large language model like answer the question as if you're a contact center specialist talking to a patient or answer the question at such and such reading level or if the patient asked the
question in Spanish answer the question in Spanish or um use this kind of style we we want a folky style at our health system so use a folksy style responding back so the user doesn't see that they just see their question but you have added a promp before the prompt with some instructions send that to the large language model I'm me to add this Arrow uh and then got to response back so um so that's a really light promp of for the prompt so let's take a look at what that looks like in code now
so I'm still going to use that same question uh but now I'm just going to um add a little bit of instructions here and you can see the way this is done in this API is that you're telling the system what it is doing and so the system you want you're saying you are an you are an assistant who is helping answer questions please answer as if you're talking to an 8-year-old uh so I wanted to just give you know something that we might be able to identify as an instruction so if I do this
uh and then show the response when you go on vacation it's a good idea to set your thermostat bah blah BL this temperature is cool enough to save energy but warm enough to keep the pipes from getting too cold and possibly first so you can see this was written for um you know for a child and you could do other sorts of things please answer in P in in a p something silly like that and we'll see how it does when actually let me print this new lines come out when you leave your home behind
and worries of frozen pipes fill your mind said your thermost set to 50° interesting it gives a slightly different answer keep your popes uh pipes safe with ease um so you can see that um you can give it commands it's a little bitty prompt before the prompt in this case and that will adjust um the way it will answer your question so that's not quite rag yet but that's just showing a little bit of prop before the prop worked the user still thought they were just saying this uh but you are supplying more information in
your code uh beyond what the user said um to give a particular response that you may uh that you may want okay so now what we're going to do next is retrieval augmented Generation all right so now we're going to do the real rag stuff the real retrieval augmented generation so what we're going to do is you have a database of content and the content we're going to use for this example is the health systems website so the website uh in their case is like 7500 web pages and essentially we want to pull out the
sections of the website that talk about parking so the way that's done is each web page is turned into a bunch of numbers which is called an embedding so it's a list of numbers that represent the essence of that page and then what you do is you create an embedding of the question itself as another set of numbers and the way these numbers work these Vector embeddings is the vectors that are closest to each other and you can think about this like in math the distance between two points the vectors that are closest to each
other have the same kind of essence they're talking about the same things so essentially you have this embedding for the question and you have all the embeddings for the website and you take the four embeddings in my case so the four web pages that have the closest Vector to the question so these would presumably be four web pages that talk about parking you take the text of those web pages and you stick it here inside the prompt and you basically say okay here are your instructions please use this content when answering the patient's question and
here is the patient question and you send that whole bundle to the large language model and then it should do a better job answering the patient question so now let's jump back into the code and I'll show you how that all gets put together and you can see it in action so um here what we want to do is we're going to pretend we're a health system uh and we want to answer patient questions uh in this case I took the entire website of this local health system in Cambridge Massachusetts called Cambridge Health Alliance um
and I downloaded it so in advance of this uh video and I put it in this um CSV file so let me show you what that looks like it input web page text um actually I'll show it in Excel uh it's a little hard to see because it's wrapping around make cells bigger um but every page of the site gets a different uh cell uh and it just says the text of the page okay so some pages are small some pages have more information in it uh covid guidelines um you know independent review of some
event uh how to contact uh Chas Cambridge Health Alliance that sort of uh that sort of thing so um so there's a there's a 7,000 page or 7,000 line CSV file basically that has all the text from this relatively small uh uh website but big enough big enough to make it interesting uh and we want to answer patient questions the example question I have is do you have parking okay so um so you have all these 7,000 Pages let me show you that it's 7,000 or recall yes 7509 Pages um and what we want to
do and it's all in this little database basically a CSV file we want to add a second column with the embeddings so this is taking each text Page uh and creating uh a vector embedding they kind of represent the essence of that text so this is a little function that does that by calling um gpt3 basically gpt3 small called it's just a little embedding model from the same group uh open AI group so that's a little function that a given text it'll return and embedding and for example uh this is taking the very first line
of text here uh in this table and it's getting an embedding for that text you can see it's relatively fast you can see this embedding is huge I forget the length uh of it actually tell you if you're interested um the length of this particular embedding model is 1536 floats uh numbers so that essentially represents the essence of that first uh line of text so what I'm going to do here is I'm going to um I'm just testing things get the embedding for five lines here and this is the embedding for the first five this
takes um uh under a second but nevertheless since there's 7,000 of them it takes a while to get all the embeds so what I did I'm not going to run this cell um but what this cell does is on every single row it applies the get embedding function and puts the output in this column called embedding okay so I have done that I have saved it twice actually CSV file so we can look at um and then a pickle file which is a little bit a little bit faster uh let's take a look at it
actually uh no notebooks website with embeddings open uh this is a substantially bigger file these embeddings actually are larger than the text itself so um here you have that first thing of text um oops move that column over and this is the embeddings this is that very long number so you have kind of two columns each column is the the First Column is the text and the second column is the embedding let me show you so if I read back in the file and show you it you can see there there's the text and now
we have a second column with the embedding okay so now basically we want to we want to ask ask our question essentially so we want to get the embedding for the question uh let me get this new question new question is do you have parking so I'm going to get the embedding for that question do you have parking and this is the first 10 um floats of that 1500 uh um number embedding uh and so now we want to compare each of the pages to the question embedding and find the four closest ones um the
way you can calculate the closest one is the dot product of those two vectors basically and this is a little function from one of the libraries that computes the dot product between the embedding on each row and the question embedding and then adds that as a distance column into the uh data frame it's called the um uh this little database so now you have the text you have the embedding coresponding to that text and then you have a distance from this embedding to the question embedding okay and um for distance probably not the right uh
term here but I typed it anyway uh for dotproduct which is a version of the distance formula you want the largest value the largest value is the closest so you can see here um just by eyesight you can see that ones on the top here I've sorted this uh you can see the word parking a lot public transportation subtle F blah blah blah and you can see the ones on the bottom that's the way this display Works um are not about parking or not in English few those uh you can see so these are the
ones that are farthest farthest away so what we want to do is take the top four which I'm doing here the text of the top four and we'll just concatenate that together so if I do that you can see that the top four is a bunch of things about parking main entrance free parking lot public transportation plan your trip blah blah blah so you can see you have um four web pages there or content from web pages that are all about parking so now what we can do is include that as context so this kind
of completes this rag uh approach so now we have an instruction you are an assistant helping the Cambridge Health Alliance respond to Patient questions when you answer the question you use the first person to refer to Cambridge Health Alliance so I found that that's a nice instruction to give a natural answer then we're asking the question uh and then we're giving some context use this information from Cha's website it as context to answer the question and that context variable will be replaced by all this text up here please stick to this context when answering the
question okay so that basically um completes the prompt before the prompt we have instructions we have some context that uh gp35 turbo should use to answer the question um and then we have the question itself so now it's asking the question and we'll see how it did yes we have parking in our facilities main entrance is on Revolution Drive in Somerville acoss from the Home Depot blah blah blah so uh parking cost all that stuff so see how helpful this answer ended up being um sort of chewing up essentially their whole website but then further
um further analysis of these four pages that we sent it as context and has a nice readable answer so that's the rag uh process I'll just show you just a little bit uh more to make it a little bit more repeatable now I just put everything into one function which takes the question uh and then Returns the answer basically by looking up um uh you know calculating the distance between that question and each of the embeddings uh finding the top four um web pages uh based on that distance um joining those together into this context
string asking the question just like we did before using that context uh string in there uh and then returning the response so um so that's a function that does it all in one step so you can see I'm going to run it again and do you have parking um and um it actually uh just like these open AI models or any of these llms it gives a different answer every time there did a Randomness uh in it um but another good answer and then I'm just going to run everything below run selected cell and all
below uh these are just different questions um that uh I think are popular with uh patients depending on their situation do you offer Addiction Services how do I prepare for my knee uh surgery I picked one of the Physicians that was listed early first on their website is she taking new patients do you take Blue Cross do you take Medicare is there somebody that speaks Spanish what hospitals are you affiliated with and how far Advanced do I have to make a point so that's what it all looks like in code I'll go back to the
lightboard and just uh finish up now okay that was it I hope that uh I hope that clarified how rag really works at one level deeper than my last video you can see it in code there and it's not that complicated that was about 30 40 lines of code to put together this whole system obviously leveraging a lot of code behind a large language model but just putting together your content this stitching together in a rag system uh like I've talked about and then leveraging a large language model you can really create powerful software uh
without a lot of effort uh frankly just just pulling these pieces together so that was it hope that was useful and until next time bye
Copyright © 2025. Made with ♥ in London by YTScribe.com