Make Your RAG Agents Actually Work! (No More Hallucinations)

14.7k views6181 WordsCopy TextShare
Leon van Zyl
In this Agentic RAG tutorial you will learn how to use the latest RAG techniques to build an AI Agen...
Video Transcript:
<b>In this video, we will push the</b> <b>boundaries of what is possible with llama</b> <b>3. 2, a tiny open-source</b> <b>model that you can run on your own</b> <b>machine by building this advanced</b> <b>multi-stage RAG agent. </b> <b>This flow might look intimidating, but</b> <b>let's simplify it before</b> <b>we jump into the detail.
</b> <b>With a traditional RAG agent, you would</b> <b>have a user sending a</b> <b>question to an AI agent. So the</b> <b>agent will use an LLM to try and answer</b> <b>the user's question and</b> <b>respond with some answer. </b> <b>Now of course, we can also add RAG to</b> <b>this flow by adding some sort of</b> <b>additional knowledge</b> <b>to the flow, like a vector store, a</b> <b>database, a PDF file, or it could just be</b> <b>pretty much anything</b> <b>else.
So the agent will try to do a</b> <b>semantic search to retrieve the most</b> <b>relevant information</b> <b>from the data source and inject that into</b> <b>the LLMs prompt. Now this</b> <b>agent will work fine for</b> <b>very simple use cases, but it can also</b> <b>introduce some problems. For example,</b> <b>when the agent reaches</b> <b>out to the vector store, it's very</b> <b>possible that the documents</b> <b>returned from the database</b> <b>are not of a very high quality or not</b> <b>related to the user's question.
The</b> <b>accuracy of the documents</b> <b>returned can depend on quite a few</b> <b>things, like the vector</b> <b>store being used or the embedding</b> <b>model that was used. So bad data can lead</b> <b>to hallucinations, and</b> <b>that is where the agent will</b> <b>kind of make up its own answers, which</b> <b>are not factually rooted in</b> <b>what is being returned from</b> <b>the knowledge base. And secondly, the</b> <b>agent will try its best to</b> <b>use the data returned from the</b> <b>knowledge store to come up with some sort</b> <b>of answer, and that response might</b> <b>actually not answer the</b> <b>user's original question.
So what we can</b> <b>do instead is build agentic</b> <b>RAG systems. These workflows</b> <b>execute several steps that will ensure</b> <b>that the quality of the</b> <b>responses are greatly improved. </b> <b>This implements several concepts that are</b> <b>being used today to build</b> <b>these advanced RAG agents,</b> <b>first of which is routing.
With routing,</b> <b>we can direct the user's</b> <b>question down different paths</b> <b>to retrieve the data that's the most</b> <b>relevant to the user's question. For</b> <b>example, for certain</b> <b>questions, we might want to go to our</b> <b>vector store or our knowledge base, or</b> <b>for certain questions,</b> <b>we might want to do a web search instead,</b> <b>for example, to retrieve live</b> <b>information. The second</b> <b>technique that we will implement is</b> <b>called fallback.
This means that after</b> <b>we've retrieved the data</b> <b>from the vector store, we will use</b> <b>another LLM to first check if the</b> <b>documents returned are actually</b> <b>relevant to the user's question, and if</b> <b>the documents are not</b> <b>relevant, we can try to do a</b> <b>web search instead. And lastly, we have a</b> <b>technique called self-correction. This</b> <b>means that after we've</b> <b>retrieved the most relevant information,</b> <b>we will use an LLM to try</b> <b>and answer the user's question</b> <b>from the context, but before showing the</b> <b>answer to the user, we will</b> <b>use another LLM to check the</b> <b>answer for any hallucinations and that</b> <b>the answer is actually relevant to the</b> <b>user's question.
And</b> <b>if it's not, we can attempt to rephrase</b> <b>the question or simply</b> <b>respond with something like,</b> <b>"I couldn't find the answer. " Now, let's</b> <b>visualize these three</b> <b>different techniques,</b> <b>and afterwards, we'll build this flow in</b> <b>Flowwise. As with the normal</b> <b>RAG agent, our user will send</b> <b>the question to our RAG pipeline.
Now,</b> <b>this first agent will</b> <b>actually not try to answer the user's</b> <b>question outright, but instead, it will</b> <b>actually look at the user's</b> <b>question and try to determine</b> <b>what the correct data source would be. We</b> <b>will call this a router</b> <b>agent. It could decide to go</b> <b>to the knowledge base like a vector</b> <b>store, or it might determine</b> <b>that a web search would be the</b> <b>best approach.
So we will provide some</b> <b>instructions in this agent system prompt</b> <b>to help it make this</b> <b>decision. So for certain information, go</b> <b>to the knowledge base, and</b> <b>for other info, just do a</b> <b>web search. And of course, this knowledge</b> <b>base could be a vector</b> <b>store, a database, a file, or</b> <b>whatever else.
Or if we do a web search,</b> <b>we can use something like</b> <b>Davily or SERP API or any of</b> <b>those services, effectively allowing our</b> <b>agent to do a Google search. In</b> <b>respective of which route</b> <b>we decided to take, we should have some</b> <b>information available. So you</b> <b>would think that it would make</b> <b>sense to generate an answer at this</b> <b>point, but we still have to</b> <b>risk that the information coming</b> <b>back from the vector store is still not</b> <b>relevant to the user's</b> <b>question.
So what we can do instead</b> <b>is break this connection and then add in</b> <b>another agent into the mix. And this</b> <b>agent is responsible</b> <b>for checking the relevance of the</b> <b>documentation returned from the knowledge</b> <b>base and then provide</b> <b>it with some sort of grade. If these</b> <b>documents are semantically</b> <b>similar to the user's question,</b> <b>then we can go ahead and generate an</b> <b>answer.
Or if this agent</b> <b>determines that these documents are</b> <b>not related to the user's question, then</b> <b>it can fall back to this web search</b> <b>instead. So that means</b> <b>we will discard the documents that we</b> <b>retrieved from the knowledge</b> <b>base and attempt to find the</b> <b>answer from the web instead and then try</b> <b>to generate an answer. </b> <b>So that covers routing and</b> <b>fallback.
Now let's have a look at the</b> <b>final technique and</b> <b>that's called self-correction. </b> <b>So what we can do is add another agent to</b> <b>this flow that's</b> <b>responsible for checking the answer</b> <b>generated from this LLM to ensure that</b> <b>there are no hallucinations</b> <b>and that the answer actually is</b> <b>relevant to the user's question. If the</b> <b>answer is relevant, we will pass that</b> <b>response back to the</b> <b>user.
Or if the answer is not relevant to</b> <b>the user's question or it</b> <b>contains hallucinations,</b> <b>then our agent can simply respond with</b> <b>something like "I don't</b> <b>know". And of course you can take</b> <b>all the techniques that you learned in</b> <b>this video to take this a step further. </b> <b>Where this agent will</b> <b>first try to go to a different data</b> <b>source or even try to rephrase the</b> <b>question for the web search</b> <b>and try again a certain number of times</b> <b>before eventually</b> <b>responding with "I don't know".
So in</b> <b>today's video you will learn a lot. And</b> <b>this is a slightly more</b> <b>complex topic but it's definitely</b> <b>worth learning if you want to take your</b> <b>agent game to the next level. So let's</b> <b>finally build this out</b> <b>in Flow Wise.
I do want to mention that</b> <b>this example is inspired</b> <b>by this article written by</b> <b>the Lang Chain team where they do</b> <b>something extremely similar</b> <b>but coding it out using Lang</b> <b>Graph in Python. So instead of writing</b> <b>this in Python, we will try</b> <b>to build this using a no-code</b> <b>platform called Flow Wise. And by the</b> <b>way, Flow Wise is also using</b> <b>Lang Graph behind the scenes</b> <b>to make this all work.
I also want to</b> <b>mention that you can</b> <b>download this flow for absolutely</b> <b>free and you will find a link to it in</b> <b>the description of this</b> <b>video. This video was a</b> <b>lot of work to create so if you do enjoy</b> <b>it or find any value in it</b> <b>then please hit the like button</b> <b>and subscribe to my channel. So to get</b> <b>started go to Agent Flows and</b> <b>click on Add New.
Let's save</b> <b>this flow by giving it a name something</b> <b>like "Multistage RAG</b> <b>Agent". Let's save this. I will</b> <b>assume that you know the basics of using</b> <b>Agent Flows so if you are new to Agent</b> <b>Flows and sequential</b> <b>agents then check out my crash course</b> <b>over here and then come back to this</b> <b>video.
Let's start by</b> <b>adding a start node to the canvas. So</b> <b>under sequential agents</b> <b>let's go to start and let's</b> <b>add this node to the canvas. Let's start</b> <b>by adding a chat model.
Let's</b> <b>go to Add Nodes, Chat Models,</b> <b>and let's add a chat-alama node because</b> <b>we do want to demonstrate</b> <b>the ability of using small</b> <b>models like Lama 3. 2. But of course</b> <b>you're more than welcome to use OpenAI,</b> <b>Anthropic, or any other</b> <b>model.
If you're unfamiliar with Olama</b> <b>it's an awesome tool that</b> <b>allows you to run open source</b> <b>models on your own machine. Check out my</b> <b>dedicated video on setting up Olama</b> <b>especially this one that</b> <b>shows you how to build a RAG chatbot</b> <b>using Olama and the Lama 3. 2</b> <b>model.
So for the model name</b> <b>let's enter Lama 3. 2 and we'll be using</b> <b>the 3 billion parameter</b> <b>model. It's set to temperature</b> <b>to a lower value like 0.
2 and let's</b> <b>assign the chat node to the</b> <b>start node. We're not going to</b> <b>add ancient memory in this tutorial but</b> <b>you're free to do it in your</b> <b>version as well. I'll simply</b> <b>add state so under Add Nodes let's add</b> <b>the state node and let's</b> <b>assign it to the start node as</b> <b>well.
At the moment the state node does</b> <b>not contain any values but we</b> <b>will use this to control the</b> <b>state and the behavior of this</b> <b>application during the course of this</b> <b>video. So just to test that the</b> <b>LLM is actually up and running I'll</b> <b>simply add an LLM node like so. I'll</b> <b>simply call it LLM and</b> <b>finally let's connect an end node just to</b> <b>complete this process.
Let's save this</b> <b>flow and in the chat</b> <b>let's simply enter hey and I do get a</b> <b>response back meaning that</b> <b>the connection to Olama and</b> <b>the Lama 3. 2 model is working. I just</b> <b>added this node for testing</b> <b>so I'm going to delete it and</b> <b>instead let's have a look at what we want</b> <b>to build.
First we want to</b> <b>receive the user's question and</b> <b>then use an LLM to route the question</b> <b>down different parts. </b> <b>Either we want to retrieve data</b> <b>from the knowledge base or simply perform</b> <b>a web search. So let's</b> <b>start by building this router.
</b> <b>Let's go to Add Nodes then under</b> <b>sequential agents we want to do some</b> <b>routing. So either we</b> <b>can use the condition node or the</b> <b>condition agent node. The difference is</b> <b>that with the condition</b> <b>node we can simply look at a specific</b> <b>value and then hard code the</b> <b>path that we need to follow.
</b> <b>Because we're not looking at a very</b> <b>simple value we have to intelligently</b> <b>look at the question</b> <b>that the user is sending and use an LLM</b> <b>to decide where to route</b> <b>to. So let's add the condition</b> <b>agent node. Let's attach the start node</b> <b>to this condition agent</b> <b>node and let's call this the</b> <b>router agent and under additional</b> <b>parameters let's set the system prompt as</b> <b>well as a human prompt.
</b> <b>In the interest of time I'm actually</b> <b>going to paste in the</b> <b>prompts but as a reminder you can</b> <b>download the entire flow for free from</b> <b>the description of this</b> <b>video. You can then easily</b> <b>import that flow into your flow voice</b> <b>instance and copy across my</b> <b>prompts. But in a nutshell we're</b> <b>saying that if the question is related to</b> <b>things like agents, prompt</b> <b>engineering and adversarial</b> <b>attacks then reach out to the vector</b> <b>store or if the question is</b> <b>not related to those topics</b> <b>then perform a web search.
We're also</b> <b>telling this agent to</b> <b>return a JSON structure with a</b> <b>single key called data source which</b> <b>contains the values web</b> <b>search or vector store. We will use</b> <b>these values to figure out where to route</b> <b>the question to and this</b> <b>is simply telling the agent</b> <b>not to try and answer the question itself</b> <b>but to simply return that</b> <b>single value called data source. </b> <b>For the human prompt we can simply create</b> <b>a variable called</b> <b>question and we can then assign</b> <b>a value to that variable by clicking on</b> <b>format prompt values.
</b> <b>Then let's click on this edit</b> <b>button, let's click on this field and</b> <b>let's select the question from the chat</b> <b>box. Now of course we</b> <b>want this node to produce a JSON</b> <b>structure with this data source key so</b> <b>I'll actually copy that</b> <b>value then under JSON structured output</b> <b>we can tell this node to</b> <b>return a JSON structure as the</b> <b>response and we can define the exact</b> <b>fields that we want to get</b> <b>back. So I am expecting one field</b> <b>called data source which is of type enum</b> <b>so enum allows us to provide</b> <b>specific enum values that we</b> <b>expect like vector store and web search</b> <b>and for the description</b> <b>I'll simply enter data source.
</b> <b>Let's close this pop-up and now all we</b> <b>have to do is set these routing</b> <b>conditions. So we could set</b> <b>them within this table which is easy</b> <b>enough to do but what I don't like about</b> <b>this approach is that</b> <b>we will always have this end output over</b> <b>here. I simply want the</b> <b>routes to either be vector store</b> <b>or web search so under condition I'm</b> <b>actually going to switch</b> <b>over to the condition code menu</b> <b>but don't worry we won't be writing any</b> <b>complex code in this video.
</b> <b>Let's click on see example</b> <b>and what we can do is replace this</b> <b>content part with the value that we are</b> <b>outputting in this JSON</b> <b>structure. So just to have a look at that</b> <b>value again we are</b> <b>creating a property called data</b> <b>source so what we can do in the code is</b> <b>to replace content with data source so</b> <b>result will now give</b> <b>us whatever value is captured in data</b> <b>source and we can now see if result</b> <b>includes any of those</b> <b>values. So if result includes vector</b> <b>store then we want to go down</b> <b>the vector store route.
I like</b> <b>to keep these two values the same then</b> <b>let's also copy this and</b> <b>paste it below and for this one we</b> <b>want to see if result includes web search</b> <b>and if it does we want to</b> <b>go down the web search path</b> <b>and I do not want this end route. Let's</b> <b>save this and now on this</b> <b>note we can see that we only have</b> <b>the vector store and web search paths. To</b> <b>test this out let's add</b> <b>two LLM nodes to this canvas</b> <b>so I'll add one over here let's call this</b> <b>one the vector store</b> <b>LLM and this also attaches</b> <b>end node for now and it's also attaches</b> <b>vector store route to this</b> <b>LLM node.
Let's copy this</b> <b>node let's move it down here let's also</b> <b>copy the end node let's</b> <b>attach it to this LLM node let's</b> <b>attach this web search path to the LLM</b> <b>node as well and let's</b> <b>rename this to web search LLM. Now</b> <b>at the moment these LLM nodes won't do</b> <b>much because we simply want</b> <b>to see if this is actually</b> <b>working so let's give an instruction like</b> <b>respond with vector store</b> <b>route and let's do the same</b> <b>thing for the web search node so we'll</b> <b>just say respond with web</b> <b>search route. Okay let's save</b> <b>this and in the chat let's see if this is</b> <b>working what is a react</b> <b>agent so because this question is</b> <b>referring to agents we should expect this</b> <b>flow to go to the vector</b> <b>search LLM which it does and we</b> <b>can see that our router agent determined</b> <b>that the vector store should</b> <b>be called let's try something</b> <b>else like what is the current weather in</b> <b>New York let's send this</b> <b>and this time we've been</b> <b>sent down the web search path great so</b> <b>that means our router is</b> <b>working so let's go ahead and set up</b> <b>this vector store logic and afterwards</b> <b>we'll work on the web search</b> <b>logic so the first thing we need</b> <b>to do is to set up our knowledge base</b> <b>let's go back to the</b> <b>dashboard let's go to document stores</b> <b>so in the interest of time i've already</b> <b>set up a document store for</b> <b>us and in this document store</b> <b>i'm scraping information from these web</b> <b>pages and i didn't set up the</b> <b>config for this document store</b> <b>to use the olama embedding model and i'm</b> <b>using the faius vector</b> <b>store that simply creates the</b> <b>vector store database on my local machine</b> <b>if you're new to document stores then</b> <b>definitely check out</b> <b>my dedicated video on setting up document</b> <b>stores the right way so</b> <b>back in our agent flow let's</b> <b>improve this LLM node i'm actually going</b> <b>to disconnect this end</b> <b>node and in this vector</b> <b>store LLM node let's do the following for</b> <b>the system prompt we can</b> <b>simply enter use the provided</b> <b>tool to answer the user's question and in</b> <b>the human prompt let's</b> <b>enter a variable called question</b> <b>and in format prompt values let's assign</b> <b>a value to question and that</b> <b>will be the question coming</b> <b>in from the chat window then let's add</b> <b>another note to this</b> <b>canvas and that will be the tool</b> <b>node and then let's assign this LLM node</b> <b>to the tool node i'll</b> <b>simply call this tool node vector</b> <b>store tool now let's go ahead and design</b> <b>our vector store retriever</b> <b>tool so under tools let's</b> <b>add the retriever tool let's assign this</b> <b>to the tool node then for</b> <b>the retriever name i'll call</b> <b>this a knowledge base retriever and for</b> <b>the description we can</b> <b>enter the vector store</b> <b>contains documents related to agents</b> <b>prompt engineering and</b> <b>adversarial attacks finally</b> <b>let's attach our document store so under</b> <b>nodes go all the way down</b> <b>to vector stores and add the</b> <b>document store node and attach it to the</b> <b>retriever tool from the drop</b> <b>down select your document store</b> <b>and i'm also going to enable return</b> <b>source documents so that</b> <b>the user can see where this</b> <b>information was retrieved from at this</b> <b>point let's attach this</b> <b>tool node to this end node</b> <b>let's save this flow and let's test this</b> <b>what is a react agent so in these</b> <b>messages we can see that</b> <b>the vector store tool was indeed called</b> <b>this node called our</b> <b>knowledge base retriever tool</b> <b>passing in the keyword react agent and</b> <b>this is the response that</b> <b>came back from our document store</b> <b>we can also click on this button and this</b> <b>will take us directly to</b> <b>that web page that we scraped</b> <b>there is one change that i do want to</b> <b>make to this tool node the reason for</b> <b>this will make sense</b> <b>as we progress with this video but i do</b> <b>think it's a good idea to set</b> <b>this up now as you saw in the</b> <b>chat this tool node retrieves documents</b> <b>from our document store and</b> <b>then those documents could be</b> <b>passed along to the next node but what i</b> <b>want to do instead is</b> <b>actually write the documents to a</b> <b>state value and we can then use the state</b> <b>value to continue with our</b> <b>process that will also allow</b> <b>other nodes in this flow to override the</b> <b>state value let's go back</b> <b>to our state node let's add</b> <b>a new property called documents and for</b> <b>the operation let's</b> <b>select replace and initially</b> <b>this will have no value so what we can do</b> <b>now in the tool node is</b> <b>once we get a response back</b> <b>we can update the state value by adding a</b> <b>new item then under keys</b> <b>let's select documents and</b> <b>for the value let's select the flow</b> <b>output tool output this</b> <b>property contains the output from</b> <b>calling our vector store tool and we can</b> <b>now close this we are now done</b> <b>setting up the knowledge base</b> <b>retrieval so let's move on to the second</b> <b>route which is the web</b> <b>search so let's go to this web</b> <b>search allo m let's click on additional</b> <b>parameters for the system prompt let's</b> <b>enter use the provider</b> <b>tool to answer the user's question and</b> <b>for the human prompt let's enter a</b> <b>variable for the question</b> <b>then let's click on format prompt values</b> <b>under question let's select</b> <b>the question from the chat</b> <b>box now let's also assign a tool node so</b> <b>under sequential agents</b> <b>let's add the tool node like so</b> <b>let's attach the allo m node to the tool</b> <b>node and let's call this</b> <b>tool node web search tool now</b> <b>thankfully setting up web search is</b> <b>really easy let's go to add nodes then</b> <b>under tools let's add</b> <b>SERP API and let's attach it to the tool</b> <b>node now if this is your</b> <b>first time using SERP API</b> <b>then simply go to SERP API.
Copyright © 2025. Made with ♥ in London by YTScribe.com