In this video, we will push the boundaries of what is possible with Llama 3.2, a tiny open-source model that you can run on your own machine, by building this advanced multi-stage RAG agent. This flow might look intimidating, but let's simplify it before we jump into the detail. <b>
</b> With a traditional RAG agent, you would have a user sending a question to an AI agent. The agent will use an LLM to try and answer the user's question and respond with some answer. Now of course, we can also add RAG to this flow by adding some sort of additional knowledge source, like a vector store, a database, a PDF file, or pretty much anything else. The agent will then do a semantic search to retrieve the most relevant information from the data source and inject that into the LLM's prompt. Now, this agent will work fine for very simple use cases, but it can also introduce some problems. For example, when the agent reaches out to the vector store, it's very possible that the documents returned from the database are not of very high quality, or not related to the user's question. The accuracy of the documents returned can depend on quite a few things, like the vector store being used or the embedding model that was used. So bad data can lead to hallucinations, which is where the agent makes up its own answers that are not factually rooted in what is being returned from the knowledge base. And secondly, the agent will try its best to use the data returned from the knowledge store to come up with some sort of answer, and that response might actually not answer the user's original question. <b>
</b> So what we can do instead is build agentic RAG systems. These workflows execute several steps that ensure the quality of the responses is greatly improved. This implements several concepts that are being used today to build these advanced RAG agents, the first of which is routing. <b>
</b> With routing, we can direct the user's question down different paths to retrieve the data that is most relevant to the user's question. For example, for certain questions we might want to go to our vector store or knowledge base, and for other questions we might want to do a web search instead, for example to retrieve live information. The second technique that we will implement is called fallback. <b>
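To make routing concrete, here is a minimal Python sketch of a router agent. In the real flow, Llama 3.2 makes this decision; the `fake_llm` below is only a deterministic stand-in so the sketch runs on its own, and all names are illustrative:

```python
import json

ROUTER_SYSTEM_PROMPT = (
    "You route user questions. If the question is about agents, prompt "
    "engineering, or adversarial attacks, answer with the JSON "
    '{"datasource": "vectorstore"}; otherwise answer with '
    '{"datasource": "web_search"}. Return only the JSON.'
)

def route_question(question: str, llm) -> str:
    """Ask the LLM which data source to use and parse its JSON reply."""
    reply = llm(ROUTER_SYSTEM_PROMPT, question)
    datasource = json.loads(reply)["datasource"]
    if datasource not in ("vectorstore", "web_search"):
        raise ValueError(f"unexpected datasource: {datasource!r}")
    return datasource

# Stand-in for a real model call (e.g. Llama 3.2 via Ollama); a real
# LLM would follow the system prompt instead of keyword matching.
def fake_llm(system_prompt: str, question: str) -> str:
    topics = ("agent", "prompt engineering", "adversarial")
    if any(t in question.lower() for t in topics):
        return '{"datasource": "vectorstore"}'
    return '{"datasource": "web_search"}'

print(route_question("What is a ReAct agent?", fake_llm))        # vectorstore
print(route_question("Current weather in New York?", fake_llm))  # web_search
```

The key idea is that the router never answers the question itself; it only emits a machine-readable routing decision.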
</b> This means that after we've retrieved the data from the vector store, we will use another LLM to first check whether the documents returned are actually relevant to the user's question, and if they are not, we can try a web search instead. And lastly, we have a technique called self-correction. This means that after we've retrieved the most relevant information, we will use an LLM to try and answer the user's question from the context, but before showing the answer to the user, we will use another LLM to check the answer for any hallucinations and to verify that the answer is actually relevant to the user's question. <b>
</b> And if it's not, we can attempt to rephrase the question or simply respond with something like, "I couldn't find the answer." Now, let's visualize these three different techniques, and afterwards we'll build this flow in Flowise. As with the normal RAG agent, our user will send the question to our RAG pipeline. Now, this first agent will not try to answer the user's question outright; instead, it will look at the user's question and try to determine what the correct data source would be. We will call this a router agent. It could decide to go to the knowledge base, like a vector store, or it might determine that a web search would be the best approach. So we will provide some instructions in this agent's system prompt to help it make this decision: for certain information, go to the knowledge base, and for other info, just do a web search. And of course, this knowledge base could be a vector store, a database, a file, or whatever else. <b>
</b> Or if we do a web search, we can use something like Tavily or SerpAPI or any of those services, effectively allowing our agent to do a Google search. Irrespective of which route we decided to take, we should now have some information available. You would think that it would make sense to generate an answer at this point, but there is still the risk that the information coming back from the vector store is not relevant to the user's question. <b>
</b> So what we can do instead is break this connection and add another agent into the mix. This agent is responsible for checking the relevance of the documents returned from the knowledge base and then giving them some sort of grade. If these documents are semantically similar to the user's question, then we can go ahead and generate an answer. <b>
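The grading step boils down to a filter plus a routing decision. Here is a rough Python sketch; the `keyword_grader` is only a stand-in for the LLM that would assign a binary yes/no grade in the real flow:

```python
def grade_documents(question, documents, grader):
    """Keep only the documents the grader marks relevant; if none
    survive, signal a fallback to web search."""
    relevant = [doc for doc in documents if grader(question, doc) == "yes"]
    if relevant:
        return relevant, "generate"
    return [], "web_search"  # discard everything and search the web instead

# Stand-in grader: the real flow would prompt an LLM for a yes/no grade.
def keyword_grader(question, doc):
    keywords = [w for w in question.lower().split() if len(w) > 3]
    return "yes" if any(w in doc.lower() for w in keywords) else "no"

docs = ["ReAct agents interleave reasoning steps with tool calls.",
        "Banana bread recipe: mash three ripe bananas."]
kept, route = grade_documents("what is a react agent", docs, keyword_grader)
print(route)  # generate (the first document passed the grade)
```

If every retrieved document fails the grade, the function returns the `web_search` route, which is exactly the fallback behavior described above.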
</b> Or if this agent determines that these documents are not related to the user's question, then it can fall back to the web search instead. That means we will discard the documents that we retrieved from the knowledge base, attempt to find the answer on the web instead, and then try to generate an answer. So that covers routing and fallback. Now let's have a look at the final technique, and that's called self-correction. What we can do is add another agent to this flow that's responsible for checking the answer generated by this LLM, to ensure that there are no hallucinations and that the answer actually is relevant to the user's question. If the answer is relevant, we will pass that response back to the user. Or if the answer is not relevant to the user's question, or it contains hallucinations, then our agent can simply respond with something like "I don't know". And of course, you can take all the techniques that you learn in this video a step further, where this agent will first try to go to a different data source, or even try to rephrase the question for the web search, and try again a certain number of times before eventually responding with "I don't know". <b>
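Putting the pieces together, the self-correction loop amounts to something like the sketch below. All function names are illustrative; in the flow, each step is its own LLM node, and the stand-ins here are deterministic so the sketch runs without a model:

```python
def answer_with_self_correction(question, retrieve, generate, grade_answer,
                                max_attempts=2):
    """Generate an answer, grade it for hallucinations and relevance,
    and give up gracefully after a few attempts."""
    for _ in range(max_attempts):
        context = retrieve(question)
        answer = generate(question, context)
        if grade_answer(question, context, answer) == "useful":
            return answer
    return "I couldn't find the answer."

# Deterministic stand-ins for the retriever, generator, and grader LLMs.
def retrieve(question):
    return ["Flowise is a no-code platform for building agent flows."]

def generate(question, context):
    return context[0] if "flowise" in question.lower() else "I'm not sure."

def grade_answer(question, context, answer):
    # "grounded" here means the answer appears in the retrieved context
    return "useful" if answer in context else "not useful"

print(answer_with_self_correction("What is Flowise?",
                                  retrieve, generate, grade_answer))
```

The `max_attempts` cap is what prevents the rephrase-and-retry loop from running forever before falling back to "I couldn't find the answer."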
</b> So in today's video you will learn a lot. This is a slightly more complex topic, but it's definitely worth learning if you want to take your agent game to the next level. So let's finally build this out in Flowise. <b>
</b> I do want to mention that this example is inspired by an article written by the LangChain team, where they do something extremely similar but code it out using LangGraph in Python. So instead of writing this in Python, we will build it using a no-code platform called Flowise. And by the way, Flowise also uses LangGraph behind the scenes to make this all work. <b>
</b> I also want to mention that you can download this flow for absolutely free; you will find a link to it in the description of this video. This video was a lot of work to create, so if you enjoy it or find any value in it, then please hit the like button and subscribe to my channel. To get started, go to Agent Flows and click on Add New. <b>
</b> Let's save this flow by giving it a name, something like "Multi-Stage RAG Agent". Let's save this. I will assume that you know the basics of using Agent Flows, so if you are new to Agent Flows and sequential agents, then check out my crash course over here and then come back to this video. Let's start by adding a Start node to the canvas: under Sequential Agents, go to Start and add this node to the canvas. Next, let's add a chat model. <b>
</b> Let's go to Add Nodes, then Chat Models, and add a ChatOllama node, because we do want to demonstrate the ability to use small models like Llama 3.2. But of course, you're more than welcome to use OpenAI, Anthropic, or any other model. If you're unfamiliar with Ollama, it's an awesome tool that allows you to run open-source models on your own machine. Check out my dedicated video on setting up Ollama, especially the one that shows you how to build a RAG chatbot using Ollama and the Llama 3.2 model. For the model name, let's enter llama3.2, and we'll be using the 3 billion parameter model. Let's set the temperature to a lower value, like 0.2, and assign the chat model node to the Start node. We're not going to add agent memory in this tutorial, but you're free to do so in your version. I'll simply add state: under Add Nodes, let's add the State node and assign it to the Start node as well. At the moment the State node does not contain any values, but we will use it to control the state and behavior of this application during the course of this video. So, just to test that the LLM is actually up and running, I'll simply add an LLM node like so. I'll call it LLM, and finally let's connect an End node just to complete this process. <b>
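Under the hood, a ChatOllama node talks to the local Ollama server's REST API. As a rough sketch, the request it effectively sends looks like the payload below (field names follow Ollama's chat API; the body is built but not sent here, and the default port 11434 is an assumption about your local setup):

```python
import json

def ollama_chat_request(model, user_message, temperature=0.2):
    """Build the request body for Ollama's chat endpoint
    (default: http://localhost:11434/api/chat)."""
    return {
        "model": model,  # e.g. "llama3.2" for the 3B model pulled via Ollama
        "messages": [{"role": "user", "content": user_message}],
        "options": {"temperature": temperature},  # low value keeps answers focused
        "stream": False,
    }

body = ollama_chat_request("llama3.2", "hey")
print(json.dumps(body, indent=2))
```

To send it for real, you would POST this body to `http://localhost:11434/api/chat` (for example with `urllib.request`) while Ollama is running; Flowise handles that round trip for you.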
</b> Let's save this flow, and in the chat let's simply enter "hey". I do get a response back, meaning that the connection to Ollama and the Llama 3.2 model is working. I just added this node for testing, so I'm going to delete it, and instead let's have a look at what we want to build. <b>
</b> First, we want to receive the user's question and then use an LLM to route the question down different paths: either we retrieve data from the knowledge base, or we simply perform a web search. So let's start by building this router. <b>
</b> Let's go to Add Nodes; under Sequential Agents we want to do some routing, so we can use either the Condition node or the Condition Agent node. The difference is that with the Condition node, we simply look at a specific value and then hard-code the path that we need to follow. <b>
</b> Because we're not looking at a simple value, we have to intelligently look at the question that the user is sending and use an LLM to decide where to route to. So let's add the Condition Agent node. Let's attach the Start node to this Condition Agent node, call it the router agent, and under Additional Parameters set the system prompt as well as the human prompt. <b>
</b> In the interest of time, I'm going to paste in the prompts, but as a reminder, you can download the entire flow for free from the description of this video. You can then easily import that flow into your Flowise instance and copy across my prompts. In a nutshell, we're saying that if the question is related to things like agents, prompt engineering, and adversarial attacks, then reach out to the vector store; if the question is not related to those topics, then perform a web search. <b>
</b> We're also telling this agent to return a JSON structure with a single key called data source, which contains the value web search or vector store. We will use these values to figure out where to route the question, and this simply tells the agent not to try and answer the question itself, but to return that single data source value. For the human prompt, we can simply create a variable called question, and we can then assign a value to that variable by clicking on Format Prompt Values. <b>
</b> Then let's click on this edit button, click on this field, and select the question from the chat box. Now of course, we want this node to produce a JSON structure with this data source key, so I'll copy that value. Under JSON Structured Output we can tell this node to return a JSON structure as the response, and we can define the exact fields that we want to get back. I am expecting one field called data source, which is of type enum; enum allows us to provide the specific values that we expect, like vector store and web search. For the description, I'll simply enter "data source". <b>
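The point of the structured output is that downstream logic can rely on the shape of the reply. This sketch shows the validation the enum constraint gives us, assuming the key is spelled `data_source` and the enum values are spelled as below (match whatever names you actually configured in the node):

```python
import json

ALLOWED = {"vector store", "web search"}  # the enum values we configured

def parse_data_source(raw):
    """Parse the router's JSON reply and enforce the enum constraint."""
    value = json.loads(raw)["data_source"]
    if value not in ALLOWED:
        raise ValueError(f"expected one of {ALLOWED}, got {value!r}")
    return value

print(parse_data_source('{"data_source": "web search"}'))  # web search
```

Without the enum, a small model like Llama 3.2 could emit a free-form answer here, and the routing condition downstream would have nothing reliable to match against.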
</b> Let's close this pop-up, and now all we have to do is set these routing conditions. We could set them within this table, which is easy enough to do, but what I don't like about this approach is that we would always have this End output over here. I simply want the routes to be either vector store or web search, so under Condition I'm going to switch over to the condition code menu. But don't worry, we won't be writing any complex code in this video. <b>
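Conceptually, the condition code we are switching to boils down to a small check like this. It is shown here as a Python sketch for readability (Flowise's condition code itself is JavaScript), and the returned route names must match the output labels on the node:

```python
def choose_route(result: str) -> str:
    """Mirror of the routing condition: inspect the router's output
    and return the name of the path to follow."""
    if "vector store" in result:
        return "vector store"
    if "web search" in result:
        return "web search"
    raise ValueError("no recognizable data source in router output")

print(choose_route("vector store"))  # vector store
```

Because the router's structured output is constrained to the two enum values, the error branch should never fire in practice; it is there as a safety net.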
</b> Let's click on See Example, and what we can do is replace this content part with the value that we are outputting in this JSON structure. So, just to look at that value again: we are creating a property called data source, so in the code we can replace content with data source. The result will now give us whatever value is captured in data source, and we can check whether the result includes either of those values. So if the result includes vector store, then we want to go down the vector store route; I like to keep these two values the same. Then let's also copy this and paste it below; for this one, we check whether the result includes web search, and if it does, we go down the web search path. And I do not want this End route. Let's save this, and on this node we can see that we only have the vector store and web search paths. To test this out, let's add two LLM nodes to the canvas. I'll add one over here and call it the vector store LLM, attach an End node for now, and attach the vector store route to this LLM node. Let's copy this node, move it down here, also copy the End node and attach it to this LLM node, attach the web search path to the LLM node as well, and rename it to web search LLM. At the moment these LLM nodes won't do much, because we simply want to see if this is actually working, so let's give an instruction like "respond with vector store route", and do the same thing for the web search node: we'll just say "respond with web search route". Okay, let's save this, and in the chat let's see if this is working: "What is a ReAct agent?" Because this question is referring to agents, we should expect this flow to go to the vector store LLM, which it does, and we can see that our router agent determined that the vector store should be called. Let's try something else, like "What is the current weather in New York?" Let's send this, and this time we've been sent down the web search path. Great, that means our router is working. So let's go ahead and set up the vector store logic, and afterwards we'll work on the web search logic. The first thing we need to do is set up our knowledge base, so let's go back to the dashboard and go to Document Stores. In the interest of time, I've already set up a document store for us, and in this document store I'm scraping information from these web pages. I did set up the config for this document store to use the Ollama embedding model, and I'm using the FAISS vector store, which simply creates the vector database on my local machine. If you're new to document stores, then definitely check out my dedicated video on setting up document stores the right way. So
back in our agent flow, let's improve this LLM node. I'm actually going to disconnect this End node, and in this vector store LLM node let's do the following: for the system prompt, we can simply enter "Use the provided tool to answer the user's question", and in the human prompt let's enter a variable called question. In Format Prompt Values, let's assign a value to question, and that will be the question coming in from the chat window. Then let's add another node to the canvas, and that will be the Tool node; let's assign this LLM node to the Tool node. I'll simply call this tool node vector store tool. Now let's go ahead and design our vector store retriever tool: under Tools, let's add the Retriever Tool and assign it to the Tool node. For the retriever name, I'll call this knowledge base retriever, and for the description we can enter "The vector store contains documents related to agents, prompt engineering, and adversarial attacks". Finally, let's attach our document store: under Nodes, go all the way down to Vector Stores, add the Document Store node, and attach it to the Retriever Tool. From the dropdown, select your document store. I'm also going to enable Return Source Documents so that the user can see where this information was retrieved from. At this point, let's attach this Tool node to the End node, save the flow, and test it: "What is a ReAct agent?" In these messages we can see that the vector store tool was indeed called; this node called our knowledge base retriever tool, passing in the keyword "react agent", and this is the response that came back from our document store.
We can also click on this button, which takes us directly to the web page that we scraped. There is one change that I do want to make to this Tool node. The reason for this will make sense as we progress with this video, but I do think it's a good idea to set it up now. As you saw in the chat, this Tool node retrieves documents from our document store, and those documents could then be passed along to the next node. But what I want to do instead is write the documents to a state value, and we can then use that state value to continue with our process; that will also allow other nodes in this flow to override the state value. Let's go back to our State node and add a new property called documents; for the operation, let's select Replace, and initially this will have no value. So what we can do now in the Tool node is, once we get a response back, update the state value by adding a new item: under Keys, let's select documents, and for the value let's select the flow output called tool output. This property contains the output from calling our vector store tool, and we can now close this. We are now done setting up the knowledge base retrieval, so let's move on to the second route, which is the web search. Let's go to this web search LLM, click on Additional Parameters, and for the system prompt enter "Use the provided tool to answer the user's question". For the human prompt, let's enter a variable for the question, then click on Format Prompt Values, and under question select the question from the chat box. Now let's also assign a Tool node: under Sequential Agents, let's add the Tool node like so. Let's attach the LLM node to the Tool node and call this tool node web search tool. Now thankfully, setting up web search is really easy. Let's go to Add Nodes, then under Tools let's add SerpAPI and attach it to the Tool node. Now if this is your first time using SerpAPI, then simply go to SerpAPI.