Make Your RAG Agents Actually Work! (No More Hallucinations)

14.7k views · 6,181 words
Leon van Zyl
In this Agentic RAG tutorial you will learn how to use the latest RAG techniques to build an AI Agent…
Video Transcript:
In this video, we will push the boundaries of what is possible with Llama 3.2, a tiny open-source model that you can run on your own machine, by building this advanced multi-stage RAG agent. This flow might look intimidating, but let's simplify it before we jump into the detail.

With a traditional RAG agent, you would have a user sending a question to an AI agent. The agent will use an LLM to try and answer the user's question and respond with some answer. Of course, we can also add RAG to this flow by adding some sort of additional knowledge, like a vector store, a database, a PDF file, or pretty much anything else. The agent will then do a semantic search to retrieve the most relevant information from the data source and inject that into the LLM's prompt. This agent will work fine for very simple use cases, but it can also introduce some problems. For example, when the agent reaches out to the vector store, it's very possible that the documents returned from the database are not of high quality or not related to the user's question. The accuracy of the documents returned can depend on quite a few things, like the vector store being used or the embedding model. Bad data can lead to hallucinations, where the agent makes up its own answers that are not factually rooted in what is returned from the knowledge base. Secondly, the agent will try its best to use the data returned from the knowledge store to come up with some sort of answer, and that response might not actually answer the user's original question.

So what we can do instead is build agentic RAG systems. These workflows execute several steps that ensure the quality of the responses is greatly improved. This implements several concepts that are being used today to build these advanced RAG agents, the first of which is routing. With routing, we can direct the user's question down different paths to retrieve the data that's most relevant to it. For example, for certain questions we might want to go to our vector store or knowledge base, and for others we might want to do a web search instead, for example to retrieve live information. The second technique that we will implement is called fallback. This means that after we've retrieved the data from the vector store, we will use another LLM to first check whether the documents returned are actually relevant to the user's question, and if they are not, we can try a web search instead. And lastly, we have a technique called self-correction. This means that after we've retrieved the most relevant information, we will use an LLM to try and answer the user's question from the context, but before showing the answer to the user, we will use another LLM to check the answer for any hallucinations and confirm that the answer is actually relevant to the user's question.
And if it's not, we can attempt to rephrase the question or simply respond with something like, "I couldn't find the answer." Now, let's visualize these three techniques, and afterwards we'll build this flow in Flowise.

As with the normal RAG agent, our user will send the question to our RAG pipeline. This first agent will not try to answer the user's question outright; instead, it will look at the question and try to determine what the correct data source would be. We will call this a router agent. It could decide to go to the knowledge base, like a vector store, or it might determine that a web search would be the best approach. We will provide some instructions in this agent's system prompt to help it make that decision: for certain information, go to the knowledge base, and for other info, do a web search. And of course, this knowledge base could be a vector store, a database, a file, or whatever else. For the web search, we can use something like Tavily or SerpAPI or any of those services, effectively allowing our agent to do a Google search.

Irrespective of which route we decided to take, we should now have some information available. You would think it would make sense to generate an answer at this point, but there is still a risk that the information coming back from the vector store is not relevant to the user's question. So what we can do instead is break this connection and add another agent into the mix. This agent is responsible for checking the relevance of the documents returned from the knowledge base and giving them some sort of grade. If these documents are semantically similar to the user's question, then we can go ahead and generate an answer. If this agent determines that the documents are not related to the user's question, then it can fall back to the web search instead. That means we will discard the documents we retrieved from the knowledge base, attempt to find the answer on the web instead, and then try to generate an answer. So that covers routing and fallback.

Now let's look at the final technique, self-correction. We can add another agent to this flow that's responsible for checking the answer generated by the LLM, to ensure that there are no hallucinations and that the answer is actually relevant to the user's question. If the answer is relevant, we will pass that response back to the user. If the answer is not relevant to the user's question, or it contains hallucinations, then our agent can simply respond with something like "I don't know". And of course, you can take all the techniques that you learn in this video a step further, where this agent will first try a different data source, or even rephrase the question for the web search and try again a certain number of times, before eventually responding with "I don't know".
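Before we build this in Flowise, here is the whole routing → fallback → self-correction loop sketched as plain control flow. Every helper below is a keyword-based stand-in for an LLM call, so the heuristics and strings are illustrative only:

```javascript
// Illustrative sketch of the multi-stage RAG control flow.
// Each helper is a keyword-based stand-in for the LLM call a real
// implementation (e.g. this Flowise flow) would make.

// 1. Routing: pick a data source for the question.
function route(question) {
  const kbTopics = ["agent", "prompt engineering", "adversarial"];
  const q = question.toLowerCase();
  return kbTopics.some((t) => q.includes(t)) ? "vector store" : "web search";
}

// Stand-in for the vector store retriever / web search tool.
function retrieve(question, source) {
  if (source === "vector store") {
    return ["ReAct agents interleave reasoning and acting steps."];
  }
  return [`Web result for: ${question}`];
}

// 2. Fallback check: are the retrieved documents relevant at all?
function grade(question, docs) {
  const words = question.toLowerCase().split(/\s+/);
  return docs.some((d) => words.some((w) => d.toLowerCase().includes(w)));
}

function generate(question, docs) {
  return `Answer based on: ${docs[0]}`;
}

// 3. Self-correction check: grounded and on-topic?
function checkAnswer(answer) {
  return answer.startsWith("Answer based on:"); // trivially true in this sketch
}

function ragPipeline(question) {
  const source = route(question);
  let docs = retrieve(question, source);
  if (source === "vector store" && !grade(question, docs)) {
    docs = retrieve(question, "web search"); // fall back to the web
  }
  const answer = generate(question, docs);
  return checkAnswer(answer) ? answer : "I don't know";
}
```

The three numbered comments map one-to-one onto the router agent, the grading agent, and the answer-checking agent described above.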
So in today's video, you will learn a lot. This is a slightly more complex topic, but it's definitely worth learning if you want to take your agent game to the next level. So let's finally build this out in Flowise.
I do want to mention that this example is inspired by an article written by the LangChain team, where they do something extremely similar but code it out using LangGraph in Python. Instead of writing this in Python, we will build it using a no-code platform called Flowise; and by the way, Flowise is also using LangGraph behind the scenes to make this all work. I also want to mention that you can download this flow for absolutely free; you will find a link to it in the description of this video. This video was a lot of work to create, so if you enjoy it or find any value in it, then please hit the like button and subscribe to my channel.

To get started, go to Agent Flows and click on Add New. Let's save this flow by giving it a name, something like "Multi-Stage RAG Agent". I will assume that you know the basics of using Agent Flows, so if you are new to Agent Flows and sequential agents, then check out my crash course over here and come back to this video.
Let's start by adding a Start node to the canvas: under Sequential Agents, go to Start and add the node to the canvas. Next, let's add a chat model. Go to Add Nodes, Chat Models, and add a ChatOllama node, because we want to demonstrate the ability of small models like Llama 3.2; of course, you're more than welcome to use OpenAI, Anthropic, or any other model. If you're unfamiliar with Ollama, it's an awesome tool that allows you to run open-source models on your own machine. Check out my dedicated video on setting up Ollama, especially the one that shows you how to build a RAG chatbot using Ollama and the Llama 3.2 model. For the model name, let's enter llama3.2; we'll be using the 3-billion-parameter model. Let's set the temperature to a lower value, like 0.2, and assign the chat model node to the Start node.

We're not going to add agent memory in this tutorial, but you're free to do so in your version. I'll simply add state: under Add Nodes, let's add the State node and assign it to the Start node as well. At the moment, the State node does not contain any values, but we will use it to control the state and behavior of this application during the course of this video. Just to test that the LLM is actually up and running, I'll add an LLM node like so; I'll simply call it LLM. Finally, let's connect an End node just to complete this process.
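Under the hood, the ChatOllama node talks to a locally running Ollama server. As a rough sketch of the request it will issue, here is a payload in the shape of Ollama's `/api/chat` endpoint; it is only constructed, never sent, so the exact field set is indicative rather than a dump of Flowise internals:

```javascript
// Rough shape of a chat request to a local Ollama server (default port
// 11434). Built but never sent, so it runs without Ollama installed.
const request = {
  url: "http://localhost:11434/api/chat",
  body: {
    model: "llama3.2",             // the 3B tag pulled via `ollama pull llama3.2`
    options: { temperature: 0.2 }, // lower temperature for more consistent routing
    messages: [{ role: "user", content: "hey" }],
  },
};
```

If the model tag in the node doesn't exactly match a tag you've pulled, Ollama will report that the model is missing, which is the first thing to check when the "hey" test below gets no response.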
Let's save this flow, and in the chat let's simply enter "hey". I do get a response back, meaning that the connection to Ollama and the Llama 3.2 model is working. I just added this node for testing, so I'm going to delete it; instead, let's have a look at what we want to build. First, we want to receive the user's question and then use an LLM to route the question down different paths: either we retrieve data from the knowledge base, or we simply perform a web search. So let's start by building this router.

Let's go to Add Nodes; under Sequential Agents we want to do some routing, so we can use either the Condition node or the Condition Agent node. The difference is that with the Condition node, we simply look at a specific value and hard-code the path that we need to follow. Because we're not looking at a simple value here, we have to intelligently look at the question the user is sending and use an LLM to decide where to route to. So let's add the Condition Agent node, attach the Start node to it, and call it the router agent. Under Additional Parameters, let's set the system prompt as well as the human prompt. In the interest of time, I'm going to paste in the prompts, but as a reminder, you can download the entire flow for free from the description of this video; you can then easily import that flow into your Flowise instance and copy across my prompts. In a nutshell, we're saying that if the question is related to things like agents, prompt engineering, and adversarial attacks, then reach out to the vector store; if the question is not related to those topics, then perform a web search instead. We're also telling this agent to return a JSON structure with a single key called data source, which contains the value web search or vector store. We will use these values to figure out where to route the question to, and we're telling the agent not to try and answer the question itself, but to simply return that single data source value.

For the human prompt, we can simply create a variable called question, and we can assign a value to that variable by clicking on Format Prompt Values. Let's click on the edit button, click on the field, and select the question from the chat box. Now, of course, we want this node to produce a JSON structure with this data source key, so I'll copy that value. Under JSON Structured Output, we can tell this node to return a JSON structure as the response and define the exact fields we want back. I am expecting one field called data source, which is of type enum; enum allows us to provide the specific values that we expect, like vector store and web search. For the description, I'll simply enter "data source".
Let's close this pop-up; now all we have to do is set the routing conditions. We could set them within this table, which is easy enough to do, but what I don't like about that approach is that we will always have this End output over here. I simply want the routes to be either vector store or web search, so under Condition I'm going to switch over to the Condition Code menu. Don't worry, we won't be writing any complex code in this video.
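As a preview, the condition code boils down to a membership check on the router's output. Sketched as a standalone JavaScript function (in Flowise the condition code runs with the node's output already in scope; here it's a parameter so the sketch can run on its own, and the trailing default is my own addition, not part of the flow):

```javascript
// Sketch of the condition logic: route on the "data source" value from
// the router's JSON structured output.
function chooseRoute(output) {
  const result = output["data source"]; // the edit swaps "content" for this key
  if (result.includes("vector store")) return "vector store"; // vector store path
  if (result.includes("web search")) return "web search";     // web search path
  return "web search"; // safety default; unreachable with the two enum values
}
```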
Let's click on See Example. What we can do is replace the content part with the value that we are outputting in this JSON structure. Just to look at that value again: we are creating a property called data source, so in the code we can replace content with data source. result will now give us whatever value is captured in data source, and we can check whether result includes either of those values. So if result includes vector store, then we want to go down the vector store route; I like to keep these two values the same. Let's also copy this and paste it below; for this one we check if result includes web search, and if it does, we go down the web search path. And I do not want this End route. Let's save this, and now on this node we can see that we only have the vector store and web search paths.

To test this out, let's add two LLM nodes to the canvas. I'll add one over here and call it the vector store LLM; let's also attach an End node for now, and attach the vector store route to this LLM node. Let's copy this
Let's copy this</b> <b>node let's move it down here let's also</b> <b>copy the end node let's</b> <b>attach it to this LLM node let's</b> <b>attach this web search path to the LLM</b> <b>node as well and let's</b> <b>rename this to web search LLM. Now</b> <b>at the moment these LLM nodes won't do</b> <b>much because we simply want</b> <b>to see if this is actually</b> <b>working so let's give an instruction like</b> <b>respond with vector store</b> <b>route and let's do the same</b> <b>thing for the web search node so we'll</b> <b>just say respond with web</b> <b>search route. Okay let's save</b> <b>this and in the chat let's see if this is</b> <b>working what is a react</b> <b>agent so because this question is</b> <b>referring to agents we should expect this</b> <b>flow to go to the vector</b> <b>search LLM which it does and we</b> <b>can see that our router agent determined</b> <b>that the vector store should</b> <b>be called let's try something</b> <b>else like what is the current weather in</b> <b>New York let's send this</b> <b>and this time we've been</b> <b>sent down the web search path great so</b> <b>that means our router is</b> <b>working so let's go ahead and set up</b> <b>this vector store logic and afterwards</b> <b>we'll work on the web search</b> <b>logic so the first thing we need</b> <b>to do is to set up our knowledge base</b> <b>let's go back to the</b> <b>dashboard let's go to document stores</b> <b>so in the interest of time i've already</b> <b>set up a document store for</b> <b>us and in this document store</b> <b>i'm scraping information from these web</b> <b>pages and i didn't set up the</b> <b>config for this document store</b> <b>to use the olama embedding model and i'm</b> <b>using the faius vector</b> <b>store that simply creates the</b> <b>vector store database on my local machine</b> <b>if you're new to document stores then</b> <b>definitely check out</b> <b>my dedicated video on setting up document</b> <b>stores the right way so</b> 
back in our agent flow, let's improve this LLM node. I'm going to disconnect this End node, and in this vector store LLM node let's do the following: for the system prompt, enter "Use the provided tool to answer the user's question", and in the human prompt enter a variable called question. In Format Prompt Values, let's assign a value to question, and that will be the question coming in from the chat window. Then let's add another node to this canvas: the Tool node. Let's assign this LLM node to the Tool node; I'll simply call this tool node "vector store tool". Now let's design our vector store retriever tool. Under Tools, add the Retriever tool and assign it to the Tool node. For the retriever name, I'll call this "knowledge base retriever", and for the description we can enter "The vector store contains documents related to agents, prompt engineering and adversarial attacks". Finally, let's attach our document store: under nodes, go all the way down to Vector Stores, add the Document Store node, and attach it to the retriever tool. From the drop-down, select your document store. I'm also going to enable Return Source Documents so that the user can see where the information was retrieved from. At this point, let's attach the Tool node to the End node, save the flow, and test it: "What is a ReAct agent?" In these messages, we can see that the vector store tool was indeed called; this node called our knowledge base retriever tool, passing in the keyword "react agent", and this is the response that came back from our document store.

We can also click on this button, and it will take us directly to the web page that we scraped. There is one change that I do want to make to this Tool node. The reason will make sense as we progress with this video, but I do think it's a good idea to set it up now. As you saw in the chat, this Tool node retrieves documents from our document store, and those documents could simply be passed along to the next node. What I want to do instead is write the documents to a state value, and we can then use that state value to continue with our process; that will also allow other nodes in this flow to override it. Let's go back to our State node and add a new property called documents; for the operation, select replace. Initially this will have no value, so what we can do in the Tool node is, once we get a response back, update the state value by adding a new item. Under keys, select documents, and for the value select the flow output: tool output. This property contains the output from calling our vector store tool, and we can now close this.

We are now done setting up the knowledge base retrieval, so let's move on to the second route, the web search. Let's go to the web search LLM and click on Additional Parameters. For the system prompt, enter "Use the provided tool to answer the user's question", and for the human prompt enter a variable for the question. Then click on Format Prompt Values; under question, select the question from the chat box. Now let's also assign a Tool node, so under Sequential Agents, let's add the Tool node like
so. Let's attach the LLM node to the Tool node, and call this tool node "web search tool". Now, thankfully, setting up web search is really easy: go to Add Nodes, then under Tools add SerpAPI and attach it to the Tool node. Now, if this is your first time using SerpAPI, then simply go to SerpAPI.
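Before continuing with SerpAPI, it's worth pinning down the state update we configured on the vector store tool node: the replace operation simply overwrites whatever the documents key held, which is exactly what lets a later node (such as the web search route) swap in its own results. A minimal model of that behavior (the `updateState` helper is illustrative, not a Flowise API):

```javascript
// Minimal model of the State node's "replace" operation: each write
// overwrites the previous value of the key. Not a Flowise API, just
// a sketch of the behavior configured in the UI.
const state = { documents: null }; // initial state: no documents yet

function updateState(state, key, value, operation = "replace") {
  if (operation === "replace") {
    state[key] = value; // discard the old value entirely
  }
  return state;
}

// After the vector store tool runs, its Tool Output replaces the value:
updateState(state, "documents", ["ReAct combines reasoning and acting."]);

// A later node (e.g. the web search tool) can overwrite it again:
updateState(state, "documents", ["Fresh web results instead."]);
```

This overwrite-on-fallback behavior is why we chose replace rather than an append-style operation: stale vector store documents should never linger once we've decided to discard them.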