Build a FREE AI Chatbot with LLAMA 3.2 & FlowiseAI (NO CODE)

Leon van Zyl
In this Llama 3.2 and Flowise tutorial you will learn how to create a free, local RAG chatbot that c...
Video Transcript:
Hey guys, I have an exciting tutorial for you today. We're going to create a free, local RAG chatbot using the brand new Llama 3.2 model, which was just released by Meta.

But before we jump in, let me quickly break down what makes Llama 3.2 so exciting. It includes vision-capable models, like the 11 billion and 90 billion parameter models, as well as lightweight text-only models, like the 1 billion and 3 billion parameter models.

The smaller models can run on edge and mobile devices, opening up new possibilities for on-device AI. These smaller models also have a massive context length of 128,000 tokens, which is perfect for creating a powerful local chatbot that can chat with your own documents. You'll learn how to download and run the model locally using Ollama, and we'll then use the amazing open-source platform Flowise to build out our chatbot.
By the end of the video, you'll have your very own AI assistant that can access and understand your personal knowledge base. Let's get started.

The first thing we want to do is download and run Llama 3.2 on our own machines. For that, we will be using Ollama. So go over to ollama.com, then download Ollama for your operating system. After that, simply run the file you just downloaded to install Ollama. You can then open up the command prompt or terminal and simply enter ollama.
If you installed Ollama correctly, you should receive a response similar to this. Now that we've installed Ollama, we can go ahead and download Llama 3.2. Back on the Ollama website, we can simply search for the model in the search bar by typing llama 3.2. I'll select the first result. On this page, we get access to both the 3 billion parameter model and the 1 billion parameter model.

Let's select the 3 billion parameter model, and let's copy this run command over here. Back in our terminal, let's paste in that command and press enter. This will now download the Llama 3.2 model. But since I've already downloaded it, it instantly takes me to this prompt to send a message to the LLM. I'll just send something like hello.
If you get a response back, it means the model was downloaded successfully. We can exit out of this by entering /bye. We will create our chatbot in a minute, but I first want to download one more model, which we will use later on in this video.

So back on the Ollama website, search for nomic-embed-text. We will use this as our embeddings model later on. For now, simply copy this command and run it in the terminal.
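Putting the Ollama steps above together, the terminal session looks roughly like this (assuming the installer has put ollama on your PATH; the model tags below are the ones Ollama's site shows for these models):

```shell
# Pull the 3B Llama 3.2 model and open an interactive chat prompt
ollama run llama3.2

# Inside the prompt, type a message to test it, then exit with:
#   /bye

# Pull the embeddings model we'll use later in Flowise
ollama pull nomic-embed-text

# Confirm both models are installed locally
ollama list
```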
Great. Now that we have both Llama 3.2 and our embedding model downloaded, we can move on to Flowise.

If you would like to learn more about using Ollama, I have a dedicated video, which I'll link to in the description of this video. If you're new to Flowise, it's a free, open-source platform that makes it super simple to create AI applications using a drag-and-drop interface. Setting up Flowise is super simple.
We only have one dependency. Go over to nodejs.org and download Node.js, then install Node once the file has downloaded. After that, all you have to do is open up your command prompt or terminal again, run npx flowise, and press enter.

You will now be asked if you want to install the Flowise package. Simply press Y and enter. This will take about a minute to install.
In order to run Flowise going forward, all you have to enter is npx flowise start. You will then be able to access Flowise by going to localhost:3000. The first thing I'm going to do is enable dark mode, as I don't like blinding my audience.
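So the whole Flowise setup boils down to two commands (assuming Node.js is already installed):

```shell
npx flowise        # one-time install: press Y when prompted
npx flowise start  # start Flowise, then open http://localhost:3000
```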
We can use Flowise to create all sorts of AI applications, from simple chatbots to advanced multi-agent flows, and I have several videos on my channel going through many different use cases of Flowise. But what we want to do is create a chatbot that can answer questions based on a custom knowledge base.

To set up this custom knowledge base, we can go to Document Stores and create a new document store. Let's give our document store a name. I'll call mine something like Custom Knowledge Base, and I'll hit Add to create this document store.

Let's now open up this document store. We can use these document stores to easily manage our knowledge base by adding and removing data sources. Let me show you an example of this.
In this folder, I have two documents: a Word document and a CSV file. The Word document is just a simple Q&A document for a fictitious restaurant. The CSV file contains the menu items and their prices.

Let's say I wanted to upload these two documents to my knowledge base so that my chatbot can answer questions from them. Let's start with the Word document. I'll click on Add Document Loader, and here we have many different types of document loaders. We could extract information from Airtable, Confluence, web scrapers, etc. What I want is this Docx file loader. I'll select that Word document on my PC, and if I click on Preview Chunks, we get one chunk back, which contains all the text in that document.

But this is not ideal. These documents can be massive, and it's good practice to break the document up into smaller chunks, so that only the most relevant pieces are sent to the model. This will reduce token usage.
To split the document, we can go down to Text Splitters, and within this dropdown let's select the Recursive Character Text Splitter. We can set the chunk size to something like 500 characters, and I'll set the chunk overlap to something like 20. Now when we run the preview, we get 10 chunks back, and these are more bite-sized pieces of text from our document. Lastly, I'll click on Process to finish loading this document.

Let's add another document loader for that CSV file. In this list, I'll select CSV File and select the file from my PC. I'll leave all the default values, as the CSV loader automatically creates a unique chunk for each record in the CSV file. Lastly, I'll click on Process.
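To make those splitter settings concrete, here is a deliberately simplified Python sketch of what size-500 / overlap-20 chunking does. The real Recursive Character Text Splitter is smarter: it prefers to break on paragraph, sentence, and word boundaries before falling back to raw character positions, so treat this as an illustration, not Flowise's actual implementation.

```python
def split_text(text, chunk_size=500, chunk_overlap=20):
    """Naive sliding-window splitter: each chunk is at most chunk_size
    characters and repeats the last chunk_overlap characters of the
    previous chunk, so sentences cut at a boundary keep some context."""
    chunks = []
    step = chunk_size - chunk_overlap  # how far the window advances
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# A 1200-character stand-in for the Word document's text
doc = "x" * 1200
chunks = split_text(doc)
# Produces 3 chunks of lengths 500, 500, and 240
```

The overlap is why neighbouring chunks share a little text: it reduces the chance that an answer straddling a chunk boundary gets lost.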
I hope you can see that document stores make it super easy to add new data sources, and if you ever want to remove a data source, you can simply click on its options and delete it. All we have to do now is load these data sources into a vector database. Our chatbot will effectively reach out to the vector database to retrieve the documents most relevant to the user's query.

To do that, we can click on Upsert Config, and the first step is to select the embeddings model. We will be using Ollama Embeddings. Now we just have to specify the model name, and as a reminder, we downloaded the nomic-embed-text model to perform the embeddings for us. So back in Flowise, we can simply enter nomic-embed-text as the model name. I'll leave the rest of these fields on their default values. Now we just have to select a vector store, and we will be using FAISS in this tutorial. All we have to do is provide a path where this FAISS database will be created.

I've simply created this Vector folder on my machine, and I'll paste the path to that folder in this field. We can now save our config, and finally we can click on Upsert. This will grab all the documents from our document store and load them into our database. In fact, if I go back to that folder, we can see that a FAISS index was created in it.
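Under the hood, the Ollama Embeddings node turns each chunk into a vector by calling the local Ollama server's REST API. As a rough sketch (assuming Ollama's default port 11434 and its /api/embeddings route), the request it effectively sends looks like this; the helper name is mine, not Flowise's:

```python
import json

def embed_request(prompt, model="nomic-embed-text",
                  host="http://localhost:11434"):
    """Build the URL and JSON body for an Ollama embeddings call."""
    url = f"{host}/api/embeddings"
    payload = json.dumps({"model": model, "prompt": prompt}).encode()
    return url, payload

# To actually call it (requires Ollama running locally):
# import urllib.request
# url, payload = embed_request("What are the current specials?")
# req = urllib.request.Request(url, data=payload,
#                              headers={"Content-Type": "application/json"})
# vector = json.loads(urllib.request.urlopen(req).read())["embedding"]
```

Each chunk's vector is then written into the FAISS index on disk, which is what the Upsert button does in bulk.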
If we wanted to, we could even test the retrieval. If I enter something like "What are the current specials?", this will simulate the retrieval process that our chatbot would execute, and we can see that the four most relevant chunks were returned from our document store. Now that we've created our document store and uploaded our documents into a vector index, we can go ahead and create our RAG chatbot.
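Conceptually, that retrieval test embeds the query, compares the query vector against every stored chunk vector, and returns the closest four. Here is a simplified stand-in for that similarity search (FAISS itself uses optimized index structures and, depending on the index type, inner-product or L2 distance rather than this plain cosine loop):

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, chunk_vecs, k=4):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-D "embeddings": chunk 0 and chunk 2 point the same way as the query
chunks = [[1, 0], [0, 1], [0.9, 0.1], [-1, 0], [0.5, 0.5]]
top = retrieve([1, 0], chunks, k=2)  # -> [0, 2]
```

Only those top chunks are then passed to the LLM as context, which is the whole point of the chunking we did earlier.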
Let's go to Chatflows, click on Add New, and give our chatflow a name by saving it. Let's call it My RAG Chatbot. Let's save this and start by adding a new node to the canvas: go to Chains and add the Conversational Retrieval QA Chain. That was a mouthful.

But it basically means that we can have a back-and-forth conversation with our chatbot, which also includes memory, so the chatbot can recall information from the conversation history. We can also attach a vector store retriever, which will allow this QA chain to reach out to a vector database to retrieve information related to the user's question. Let's start by adding the chat model.
Under Add Nodes, let's go to Chat Models and add the ChatOllama node. We can then connect our chat model to our chain. For the model name, we can either grab the model name from the Ollama website or, alternatively, open your terminal or command prompt and enter ollama list, which will show all the available models on your machine. From there, you can simply copy the model name and paste it into this field.

We can now set the temperature, which is a value between 0 and 1. A value of 0 means the model sticks closely to the prompt, and a value of 1 gives the model full creative control. I'll set it to something like 0.6.

Let's also add a memory node to the canvas: under Memory, let's add the Buffer Memory node and connect that to our chain as well. The Buffer Memory node allows the model to recall information from our conversation history, which lets us ask follow-up questions.
Lastly, let's add our vector store retriever. Click on Add Nodes, and under Vector Stores let's add the Document Store node and attach it to our chain as well. On the Document Store node, we can simply click on the dropdown and select the document store we created earlier.
And believe it or not, that's actually all we need to create this RAG chatbot. Let's save the flow and test it out by clicking on the chat bubble. We can ask our questions in this window, or click on this button to expand the view. Let's enter something like "hello", and look at that, we get a response back.

Now let's ask a question about something that's in the knowledge base, like "What are your current specials?", and that answer is 100% correct. Let's also ask a question related to the menu. Let's ask how much the lamb chops are; we are expecting the answer to be 210 South African rand. So let's enter "How much are your lamb chops?", and this is perfect. And this is all running locally on your own machine, absolutely free.

If you ever want to adjust your knowledge base, all you have to do is open up the document store, add a new document loader or delete any of these items, and, very importantly, remember to go back into Upsert Config and upsert again so the updated data is loaded into your vector database.
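As a side note, everything we just did in the chat bubble can also be done programmatically: Flowise exposes each chatflow over its Prediction REST API. A minimal sketch, assuming Flowise is running on its default port 3000; the chatflow ID below is a placeholder you would replace with the one shown in your chatflow's API endpoint dialog:

```python
import json

# Placeholder: copy your real ID from Flowise's API endpoint dialog
CHATFLOW_ID = "your-chatflow-id"

def prediction_request(question, host="http://localhost:3000"):
    """Build the URL and JSON body for a Flowise prediction call."""
    url = f"{host}/api/v1/prediction/{CHATFLOW_ID}"
    payload = json.dumps({"question": question}).encode()
    return url, payload

# To actually call it while `npx flowise start` is running:
# import urllib.request
# url, payload = prediction_request("How much are your lamb chops?")
# req = urllib.request.Request(url, data=payload,
#                              headers={"Content-Type": "application/json"})
# print(json.loads(urllib.request.urlopen(req).read())["text"])
```

This is handy if you later want to embed the chatbot in your own website or script rather than using the built-in chat widget.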
If you enjoyed this video, then please hit the like button, subscribe to my channel, and check out these other Flowise videos over here. I'll see you in the next one. Bye bye.
Copyright Ā© 2024. Made with ā™„ in London by YTScribe.com