Build a FREE AI Chatbot with LLAMA 3.2 & FlowiseAI (NO CODE)

Leon van Zyl
In this Llama 3.2 and Flowise tutorial you will learn how to create a free, local RAG chatbot that c...
Video Transcript:
Hey guys, I have an exciting tutorial for you today. We're going to create a free, local RAG chatbot using the brand new Llama 3.2 model, which was just released by Meta.

But before we jump in, let me quickly break down what makes Llama 3.2 so exciting. It includes vision-capable models, like the 11 billion and 90 billion parameter models, as well as lightweight text-only models, like the 1 billion and 3 billion parameter models.

The smaller models can run on edge and mobile devices, opening up new possibilities for on-device AI. These smaller models also have a massive context length of 128,000 tokens, which is perfect for creating a powerful local chatbot that can chat with your own documents. You'll learn how to download and run the model locally using Ollama, and we'll then use the amazing open-source platform Flowise to build out our chatbot.
By the end of the video, you'll have your very own AI assistant that can access and understand your personal knowledge base. Let's get started.

The first thing we want to do is download and run Llama 3.2 on our own machines. For that, we will be using Ollama. So go over to ollama.com, then download Ollama for your operating system. After that, simply run the file you just downloaded to install Ollama. You can then open up the command prompt or terminal and simply enter ollama.
If you installed Ollama correctly, you should receive a response similar to this. Now that we've installed Ollama, we can go ahead and download Llama 3.2. Back on the Ollama website, we can simply search for the model in the search bar by typing llama 3.2. I'll select the first result. On this page, we get access to both the 3 billion parameter model and the 1 billion parameter model.

Let's select the 3 billion parameter model, and let's copy this run command over here. Back in our terminal, let's paste in that command and press enter. This will now download the Llama 3.2 model. But since I've already downloaded it, it instantly takes me to this prompt to send a message to the LLM. I'll just send something like hello.
If you get a response back, it means the model was downloaded successfully. We can exit out of this by entering /bye. We will create our chatbot in a minute, but I first want to download one more model, which we will use later on in this video.

So back on the Ollama website, search for nomic-embed-text. We will use this as our embeddings model later on. For now, simply copy this command and run it in the terminal.
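Putting the Ollama steps above together, the terminal session looks roughly like this (assuming the installer has put ollama on your PATH; the model tags below are the ones Ollama's site shows for these models):

```shell
# Pull the 3B Llama 3.2 model and open an interactive chat prompt
ollama run llama3.2

# Inside the prompt, type a message to test it, then exit with:
#   /bye

# Pull the embeddings model we'll use later in Flowise
ollama pull nomic-embed-text

# Confirm both models are installed locally
ollama list
```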
Great. Now that we have both Llama 3.2 and our embedding model downloaded, we can move on to Flowise.

If you would like to learn more about using Ollama, I have a dedicated video, which I'll link to in the description of this video. If you're new to Flowise, it's a free, open-source platform that makes it super simple to create AI applications using a drag-and-drop interface. Setting up Flowise is super simple.
We only have one dependency. Go over to nodejs.org and download Node.js, then install Node once the file has downloaded. After that, all you have to do is open up your command prompt or terminal again, run npx flowise, and press enter.

You will now be asked if you want to install the Flowise package. Simply press Y and enter. This will take about a minute to install.
In order to run Flowise going forward, all you have to enter is npx flowise start. You will then be able to access Flowise by going to localhost:3000. The first thing I'm going to do is enable dark mode, as I don't like blinding my audience.
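So the whole Flowise setup boils down to two commands (assuming Node.js is already installed):

```shell
npx flowise        # one-time install: press Y when prompted
npx flowise start  # start Flowise, then open http://localhost:3000
```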
We can use Flowise to create all sorts of AI applications, from simple chatbots to advanced multi-agent flows, and I have several videos on my channel going through many different use cases of Flowise. But what we want to do is create a chatbot that can answer questions based on a custom knowledge base.

To set up this custom knowledge base, we can go to Document Stores and create a new document store. Let's give our document store a name. I'll call mine something like Custom Knowledge Base, and I'll hit Add to create this document store.

Let's now open up this document store. We can use these document stores to easily manage our knowledge base by adding and removing data sources. Let me show you an example of this.
In this folder, I have two documents: a Word document and a CSV file. The Word document is just a simple Q&A document for a fictitious restaurant. The CSV file contains the menu items and their prices.

Let's say I wanted to upload these two documents to my knowledge base so that my chatbot can answer questions from them. Let's start with the Word document. I'll click on Add Document Loader, and here we have many different types of document loaders. We could extract information from Airtable, Confluence, web scrapers, etc. What I want is this Docx file loader. I'll select that Word document on my PC, and if I click on Preview Chunks, we get one chunk back, which contains all the text in that document.

But this is not ideal. These documents can be massive, and it's good practice to break the document up into smaller chunks, so that only the most relevant pieces are sent to the model. This will reduce token usage.
To split the document, we can go down to Text Splitters, and within this dropdown let's select the Recursive Character Text Splitter. We can set the chunk size to something like 500 characters, and I'll set the chunk overlap to something like 20. Now when we run the preview, we get 10 chunks back, and these are more bite-sized pieces of text from our document. Lastly, I'll click on Process to finish loading this document.

Let's add another document loader for that CSV file. In this list, I'll select CSV File and select the file from my PC. I'll leave all the default values, as the CSV loader automatically creates a unique chunk for each record in the CSV file. Lastly, I'll click on Process.
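To make those splitter settings concrete, here is a deliberately simplified Python sketch of what size-500 / overlap-20 chunking does. The real Recursive Character Text Splitter is smarter: it prefers to break on paragraph, sentence, and word boundaries before falling back to raw character positions, so treat this as an illustration, not Flowise's actual implementation.

```python
def split_text(text, chunk_size=500, chunk_overlap=20):
    """Naive sliding-window splitter: each chunk is at most chunk_size
    characters and repeats the last chunk_overlap characters of the
    previous chunk, so sentences cut at a boundary keep some context."""
    chunks = []
    step = chunk_size - chunk_overlap  # how far the window advances
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# A 1200-character stand-in for the Word document's text
doc = "x" * 1200
chunks = split_text(doc)
# Produces 3 chunks of lengths 500, 500, and 240
```

The overlap is why neighbouring chunks share a little text: it reduces the chance that an answer straddling a chunk boundary gets lost.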
I hope you can see that document stores make it super easy to add new data sources, and if you ever want to remove a data source, you can simply click on its options and delete it. All we have to do now is load these data sources into a vector database. Our chatbot will effectively reach out to the vector database to retrieve the documents most relevant to the user's query.

To do that, we can click on Upsert Config, and the first step is to select the embeddings model. We will be using Ollama Embeddings. Now we just have to specify the model name, and as a reminder, we downloaded the nomic-embed-text model to perform the embeddings for us. So back in Flowise, we can simply enter nomic-embed-text as the model name. I'll leave the rest of these fields on their default values. Now we just have to select a vector store, and we will be using FAISS in this tutorial. All we have to do is provide a path where this FAISS database will be created.

I've simply created this Vector folder on my machine, and I'll paste the path to that folder in this field. We can now save our config, and finally we can click on Upsert. This will grab all the documents from our document store and load them into our database. In fact, if I go back to that folder, we can see that a FAISS index was created in it.
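Under the hood, the Ollama Embeddings node turns each chunk into a vector by calling the local Ollama server's REST API. As a rough sketch (assuming Ollama's default port 11434 and its /api/embeddings route), the request it effectively sends looks like this; the helper name is mine, not Flowise's:

```python
import json

def embed_request(prompt, model="nomic-embed-text",
                  host="http://localhost:11434"):
    """Build the URL and JSON body for an Ollama embeddings call."""
    url = f"{host}/api/embeddings"
    payload = json.dumps({"model": model, "prompt": prompt}).encode()
    return url, payload

# To actually call it (requires Ollama running locally):
# import urllib.request
# url, payload = embed_request("What are the current specials?")
# req = urllib.request.Request(url, data=payload,
#                              headers={"Content-Type": "application/json"})
# vector = json.loads(urllib.request.urlopen(req).read())["embedding"]
```

Each chunk's vector is then written into the FAISS index on disk, which is what the Upsert button does in bulk.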
If we wanted to, we could even test the retrieval. If I enter something like "What are the current specials?", this will simulate the retrieval process that our chatbot would execute, and we can see that the four most relevant chunks were returned from our document store. Now that we've created our document store and uploaded our documents into a vector index, we can go ahead and create our RAG chatbot.
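Conceptually, that retrieval test embeds the query, compares the query vector against every stored chunk vector, and returns the closest four. Here is a simplified stand-in for that similarity search (FAISS itself uses optimized index structures and, depending on the index type, inner-product or L2 distance rather than this plain cosine loop):

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, chunk_vecs, k=4):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-D "embeddings": chunk 0 and chunk 2 point the same way as the query
chunks = [[1, 0], [0, 1], [0.9, 0.1], [-1, 0], [0.5, 0.5]]
top = retrieve([1, 0], chunks, k=2)  # -> [0, 2]
```

Only those top chunks are then passed to the LLM as context, which is the whole point of the chunking we did earlier.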
Let's go to Chatflows, click on Add New, and give our chatflow a name by saving it. Let's call it My RAG Chatbot. Let's save this and start by adding a new node to the canvas: go to Chains and add the Conversational Retrieval QA Chain. That was a mouthful.

But it basically means that we can have a back-and-forth conversation with our chatbot, which also includes memory, so the chatbot can recall information from the conversation history. We can also attach a vector store retriever, which will allow this QA chain to reach out to a vector database to retrieve information related to the user's question. Let's start by adding the chat model.
Under Add Nodes, let's go to Chat Models and add the ChatOllama node. We can then connect our chat model to our chain. For the model name, we can either grab the model name from the Ollama website or, alternatively, open your terminal or command prompt and enter ollama list, which will show all the available models on your machine. From there, you can simply copy the model name and paste it into this field.

We can now set the temperature, which is a value between 0 and 1. A value of 0 means the model sticks closely to the prompt, and a value of 1 gives the model full creative control. I'll set it to something like 0.6.

Let's also add a memory node to the canvas: under Memory, let's add the Buffer Memory node and connect that to our chain as well. The Buffer Memory node allows the model to recall information from our conversation history, which lets us ask follow-up questions.
Lastly, let's add our vector store retriever. Click on Add Nodes, and under Vector Stores let's add the Document Store node and attach it to our chain as well. On the Document Store node, we can simply click on the dropdown and select the document store we created earlier.
And believe it or not, that's actually all we need to create this RAG chatbot. Let's save the flow and test it out by clicking on the chat bubble. We can ask our questions in this window, or click on this button to expand the view. Let's enter something like "hello", and look at that, we get a response back.

Now let's ask a question about something that's in the knowledge base, like "What are your current specials?", and that answer is 100% correct. Let's also ask a question related to the menu. Let's ask how much the lamb chops are; we are expecting the answer to be 210 South African rand. So let's enter "How much are your lamb chops?", and this is perfect. And this is all running locally on your own machine, absolutely free.

If you ever want to adjust your knowledge base, all you have to do is open up the document store, add a new document loader or delete any of these items, and, very importantly, remember to go back into Upsert Config and upsert again so the updated data is loaded into your vector database.
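As a side note, everything we just did in the chat bubble can also be done programmatically: Flowise exposes each chatflow over its Prediction REST API. A minimal sketch, assuming Flowise is running on its default port 3000; the chatflow ID below is a placeholder you would replace with the one shown in your chatflow's API endpoint dialog:

```python
import json

# Placeholder: copy your real ID from Flowise's API endpoint dialog
CHATFLOW_ID = "your-chatflow-id"

def prediction_request(question, host="http://localhost:3000"):
    """Build the URL and JSON body for a Flowise prediction call."""
    url = f"{host}/api/v1/prediction/{CHATFLOW_ID}"
    payload = json.dumps({"question": question}).encode()
    return url, payload

# To actually call it while `npx flowise start` is running:
# import urllib.request
# url, payload = prediction_request("How much are your lamb chops?")
# req = urllib.request.Request(url, data=payload,
#                              headers={"Content-Type": "application/json"})
# print(json.loads(urllib.request.urlopen(req).read())["text"])
```

This is handy if you later want to embed the chatbot in your own website or script rather than using the built-in chat widget.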
If you enjoyed this video, then please hit the like button, subscribe to my channel, and check out these other Flowise videos over here. I'll see you in the next one. Bye bye.
Copyright Ā© 2024. Made with ā™„ in London by YTScribe.com