What is RAG? (Retrieval Augmented Generation)

133.09k views1925 WordsCopy TextShare
Don Woodlock
How do you create an LLM that uses your own internal content? You can imagine a patient visiting y...
Video Transcript:
hello everyone uh welcome to my code deare uh video series um what I'm doing is I'm rotating through three different types of topics educational topics uh use case topics and then kind of bias ethics safety uh topic so now on the education rotation and today what I wanted to talk about is uh what is retrieval augmented generation or rag uh and you may think that I'm going into some kind of nook and cranny of the AI uh field but this is a very important and popular kind of solution pattern that I see um being used
over and over and over again for uh how to leverage large language models so I thought I would explain it uh to you uh and the the the thing that this is used for is basically systems that leverage large language models but on your own content so let me describe that if you think of like the chat GPT experience and if you think about that um relative to like the search engine experience that we had before if you ask a question like um I don't know what color is the sky or how do I fix
this plumbing issue or something like that a search engine would go out uh or appear to go out search the internet find relevant content and then just list that content for you list those links and then you as a user would need to click on the links that seem seem right read it digest it and figure out the answer to your question what a large language model does is it seems to do that first part meaning leverage the content on the whole internet but instead of just listing that content it sort of digests it digests
it combines it assembles it together and answers your question sort of generates an answer um so it's a whole lot better I mean search engines have been great but this is taking the whole experience to another level and in addition the question and answering uh you can also give it instructions like write me this document or write me a lesson plan to teach geometry to seventh graders uh and it will do something similar it will kind of assemble content that it SE that it has seen uh that talks about geometry or seventh graders or how
to do lesson plans or whatever uh pulls that together assembles it and then writes out a lesson plan okay so it's a much better experience than just taking the raw content from the internet but it really uh creates something new from that now let's say you want that same experience but on your own content so it might be a chatbot on your website or you might have a library of PDF documents that this documentation for one of your products uh and instead of just linking the user to parag sections of the documentation you want to
actually answer their question uh it might be your service ticketing uh system so when a new issue comes in you could say how would I resolve this issue and it can assemble past similar issues uh and then come up with a new uh new solution based on that so this is an incredible experience that these large language models offer but how can you create that experience on your own content uh that might not be available to the internet or available to these large language models well the solution to this is this rag um architecture this
retrieval augmented uh generation architecture so now I'm going to do my best to explain that uh to you so let's say you have a um user and I'm going to use the example of a uh patient chatbot and the content source is going to be that content from your website let's say or could be content from PDF documents or or whatever but you want this to be the content to answer the patient's questions so if the patient has a question like how do I prepare for my knee surgery instead of just going to chat sheet
PT and getting a generic answer you'd like to provide an answer that's from your health system or a question like do you have parking you'd like to provide an answer for your health system for your the office where the patient is seen okay so that's a scenario that I'd like to do so the patient has a question uh and I'm going to do do you have parking have parking um you can uh imagine that question being bundled up into a prompt what's called a prompt and I'll describe this more later so there is the question
that prompt is sent to a large language model and that large language model will come up with a response to that question okay now um if you just wanted to use uh chat GPT let's say or some other llm uh without any extra content you could just use this flow how do I prepare for my knee surgery or do you have parking put that into a prompt send that to the uh large language model and get a response back okay but uh but what we want to do is enhance this experience with our own content
so let's say here is your content source and again this might be all the content of your website or PDF documents or internal ticketing system or databases or that uh that sort of thing and what you'd like to do is something called called the prop before the propt so in these systems you don't just send the user question to the large language model you usually have some level of instructions So the instructions might be you are a contact center specialist working for a hospital answering patient questions that come in over the Internet uh please be
uh nice to the patients and responsive and folksy because that fits with our brand or some instructions like that are sometimes sent with the prompt um and then uh Additionally you want to provide the information that the L llm needs to answer the question so what you'd ideally like is information from your website to be included here um and uh and that to be sent to the llm as well so the full prompt might be your instructions it might be something like please use this content um in order to answer the patient question at the
end and then you put in a bunch of information about parking or about knee surgery or whatever the patient asked you put that in the prompt before the prompt then you have the question then you send that whole package to the llm and the llm will give a great response based on your content okay with me so far so um so this notion is the prop before the prompt um and and that's why prompt engineering and these types of things are a big field right now now because you can really hone the um these systems
by doing a better and better job with the actual prompt before the prompt um in uh in this style now the last trick here is your website or your content is huge and it talks about all kinds of topics Beyond parking and Beyond knee surgery so you really want to somehow pull out only the parts of your content that are relevant to the patient's question so this is another um a tricky part of this whole rag architecture uh and the way that works is that um you take all your content and you break it into
chunks or these systems will break it into chunks so chunk might be a paragraph of content or a p or a couple paragraphs a page something like that and then those um chunks are sent to a large language model could be the same one or a different one and they are turned into a vector and uh so each each paragraph or each chunk will have a vector which is just is just a series of numbers and that series of numbers you can think of it as the numeric representation of the essence of that paragraph and
what's uh different about these numbers just they're not random numbers but paragraphs that talk about a similar topic have close by numbers they almost have the same vectors okay so in addition to the uh it's a numera Zed version of the paragraph but it's such that similar paragraphs on similar topics will have similar vectors will have similar numbers so that means that what happens is when um uh a user will ask a question like do you have parking let's say then that is also sent to the llm in real time right after the user asked
the question that comes up with the vector as well you could think of that as the question vector and then what happens we do we do a mathematical comparison real quick between the vector of the question and then the vectors of your content and pick like the top five documents that are closest to this question so do you have parking will be a vector then you have all your content and it's going to try and find the five documents that taught the most about parking basically um and so it'll find those I don't know what
that is it'll find those documents let's say uh from these it'll grab the paragraphs associated with those documents um and it'll use that here so those will be the subset of your content basically that is used as part of the prompt before the prompt okay so this whole uh concept is uh kind of vectorizing your content uh typically that then our storage in something called a vector database which is basically a representation of your content in this numeric form and then this system that you build this rag system will uh take the question find retrieve
the most relevant content make that as part of the prompt before the prompt send that to the llm and then you'll get a good response back actually so it's a little bit confusing but um but it's actually not that confusing um uh I just made it more confusing by this horrible uh horrible drawing but this whole thing is um what is uh called rag retrieval so you're retrieving the relevant documents from your content you're augmenting the generation process so you're augmenting the lm's ability to do generative AI based on the documents that you retrieve so
that's why it's retrieval augmenting generation okay so I hope that made sense uh like I said this is a very popular um solution pattern that I'm seeing over and over again in fact the majority of llm projects that I see are this kind of thing using my content packaging that up with an llm system to create a kind of chat chpt like experience for my employees or for my customers for my users that kind of thing and it works extremely well that's why uh that's why it's so popular so I hope that was interesting and
educational and made sense if you have any questions please leave them for me uh as part of the comments uh thank you very much
Copyright © 2025. Made with ♥ in London by YTScribe.com