Flowise makes it easy to create chatbots that can answer questions from our own data sources, like PDF documents, websites, databases, or pretty much whatever you want. This is useful for creating chatbots that can answer questions about our businesses, services, or products, and it can also be used for some interesting use cases, like persona chatbots. We used this technique to create an AI clone of MrBeast in one of my previous videos. So in this video you will learn a lot: we will create a retrieval chatbot that can answer questions from our own data source, and we will also have a look at storing this knowledge base in a Pinecone serverless vector store.
Before we jump into the chat flow, I first want to explain the concept of RAG, or retrieval augmented generation. Let's swap over to ChatGPT to demonstrate this. One of the limitations of AI models is that they can only accurately answer questions based on the data that they were trained on. If we ask questions about recent events or things like the current weather, the model will typically respond with a message saying that it doesn't know the answer. And if I ask it a question from the Flowise documentation, ChatGPT won't be able to provide an answer, since this information was not included in its training data. For instance, let's ask it how we can deploy Flowise to Render. Even worse, the answer ChatGPT gives us is actually incorrect: it's making assumptions about what Flowise is, and it assumes that Flowise uses Python, which is not at all what the recommended steps in the Flowise documentation are. Now let's have a look at how we can fix this response.
What we could do is include context in the prompt. For example, our prompt could say something like, "Answer the user's question from the following context." Then we could inject some context into the prompt, and finally we would ask our question again, which was, "How do I deploy Flowise to Render?" So to improve this response, we can inject the context, which is effectively all the text on this page; I can simply copy it from the page and paste it into the prompt. Let's send this prompt, and we now receive the correct response. We're almost done with the theory, but I do think it's important for you to understand how RAG actually works.
Obviously we will not be hard-coding the context; instead, Flowise will retrieve the context from a data source and then dynamically inject it into the final prompt. I think the diagram in the LangChain documentation explains it best. We will connect our chat flow to a data source, which could be anything: a website, a PDF document, a Word document, pretty much any data source you can imagine. We will then load the content of that data source into our application as one massive piece of text. To reduce token usage, we then split this data up into smaller chunks, and these chunks are converted into a vector representation, which is something a vector store can use to retrieve only the most relevant documents. This will make sense once we start implementing this project. These embeddings are then stored in a vector database. The final step is retrieval: when we ask our chatbot a question, the chatbot reaches out to the vector database, asks it to retrieve the documents most relevant to our query, and those documents are then injected into our prompt as context.
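Here's a rough sketch of that load, split, embed, store, and retrieve pipeline in LangChain.js, which is what Flowise wires up for us visually; the import paths and the docs URL are assumptions and may differ between LangChain versions:

```typescript
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";

// 1. Load: pull the raw page text into the application.
const loader = new CheerioWebBaseLoader("https://docs.flowiseai.com/");
const rawDocs = await loader.load();

// 2. Split: break the text into smaller chunks to reduce token usage.
const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 50 });
const docs = await splitter.splitDocuments(rawDocs);

// 3. Embed + store: convert the chunks to vectors and index them in memory.
const store = await MemoryVectorStore.fromDocuments(docs, new OpenAIEmbeddings());

// 4. Retrieve: fetch only the chunks most relevant to the query.
const relevant = await store.similaritySearch("How do I deploy Flowise to Render?", 4);
```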
I think that's enough theory for now; let's create our retrieval chatbot. In our Flowise dashboard, let's click on Add New. I'll call mine "RAG Chatbot", and let's save this. Now let's start off by adding a chain to our project.
Under Chains, let's add the Conversational Retrieval QA Chain. This is similar to the conversational chain that we created in the previous video, where we add a chat model and, optionally, memory, but this one also allows us to retrieve data from a vector store. We're not going to add memory in this tutorial, as we already covered that in the previous video; instead we'll focus on the vector store retrieval. Let's start by adding the chat model. Under Add Nodes, go to Chat Models, let's add the ChatOpenAI model, and let's attach it to our chain. Let's also select our credentials and change the temperature to something like 0.4, because we don't want the model to be too creative; it should ideally only answer questions from the context.
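For reference, this is roughly the model configuration that node represents; the model name shown here is an assumption, and the exact option names vary between LangChain versions:

```typescript
import { ChatOpenAI } from "@langchain/openai";

// A low temperature keeps the model from being too creative, so it
// sticks closer to the retrieved context when answering.
const model = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0.4 });
```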
We will have a look at adding Pinecone to this project later on in this video, but for now, let's click on Add Nodes and go to Vector Stores. Let's add the In-Memory Vector Store and connect it to our chain. This vector store node takes in two properties: a document as well as embeddings. The embeddings input relates to the embed step in the diagram: this is where we take the data retrieved from our data source and use an embedding function to convert it into a numeric representation, which we can store in the vector store. To add this embeddings function, click on Add Nodes and go to Embeddings. The embedding node you have to add will greatly depend on the model that you're using, and since we're using an OpenAI model, I'll simply add the OpenAI Embeddings node and attach it to the vector store. We can leave the model name as is, and let's select our OpenAI credentials.
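If you're curious what that node does under the hood, this is the general idea; OpenAI's default embedding model returns a 1536-dimension vector, which is the same dimension we'll use for the Pinecone index later:

```typescript
import { OpenAIEmbeddings } from "@langchain/openai";

// Each piece of text is mapped to a fixed-length numeric vector;
// similar texts end up with similar vectors, which is what makes
// similarity search in a vector store possible.
const embeddings = new OpenAIEmbeddings();
const vector = await embeddings.embedQuery("How do I deploy Flowise to Render?");
console.log(vector.length); // 1536 for the default OpenAI embedding model
```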
Now all that's left is to attach our document loader. Let's go to Add Nodes, then Document Loaders, and here you will find a list of document loaders for pretty much any data source you can imagine. We can use the Cheerio Web Scraper to grab information from a website, we can upload CSV files or Word documents, you can even select a folder with multiple different file types, you can connect a Notion database, we can upload PDF files, and so on. Something that's very popular as well is fetching your files from an AWS S3 bucket. In this video, we'll have a look at the PDF loader and the Cheerio Web Scraper. Let's start with the PDF file loader, and let's attach it to our vector store as well.
You might notice that this flow is starting to look very familiar: it's the same flow as this diagram over here, with the exception of splitting the document into smaller chunks. Let me explain why you would want to split this data up into smaller chunks, using the LangChain Expression Language page as an example. This page briefly explains what LCEL is, and then goes on to describe some of its other features. Let's say I wanted to scrape the information from this page and then ask the model what LCEL stands for. The scraper will extract all the information from this page, including the content from the menu, the sidebar, and everything else on the page, and all of this will be injected into the context. We would get an accurate answer from the chatbot, but there is a massive issue with passing all of this information into the context. The first issue is that all models have some sort of context limit, and by passing in the entire page we could very easily exceed that limit. Secondly, in the case of OpenAI, we are charged for the number of tokens that we use. By splitting this content up into smaller chunks, we can create separate documents: one containing the definition of LCEL, perhaps another for streaming support, and so on. Then, when we ask the model about the definition of LCEL, the vector store will only pass back the relevant section, and only that section will be injected into our prompt, not the entire page.
So let's have a look at how we can chunk this data by adding a text splitter. Under Add Nodes, let's go to Text Splitters, and from here let's add the Recursive Character Text Splitter and attach it to our document loader. Here we can specify the chunk size in characters; let's leave it at 1000 characters. We can also specify a chunk overlap; let's make it something like 50 characters. I do recommend playing with these values: if you find that your responses are not accurate, simply increase the chunk size until the responses improve.
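Here's a toy example of what those two settings do; the tiny chunk size is just to make the overlap visible (the actual flow uses 1000 and 50), and the exact import path may differ between LangChain versions:

```typescript
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// With an overlap, neighbouring chunks share a few characters of context,
// so a sentence cut at a chunk boundary still appears intact in one chunk.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 40,    // the flow in the video uses 1000
  chunkOverlap: 10, // the flow in the video uses 50
});
const chunks = await splitter.splitText(
  "LCEL is a declarative way to compose chains out of existing components."
);
console.log(chunks);
```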
Of course, there are different types of text splitters as well, so you might want to experiment with those too. For instance, there is a text splitter that is great at splitting code into chunks, and I've found that the HTML-to-Markdown text splitter works fantastically when scraping information from a website. Now, let's upload a PDF file.
For this example, I've downloaded the latest Tesla financial statement, so I'll go ahead and upload this file. I'm actually going to change this to one document per file and allow the text splitter to break the information up into chunks; alternatively, you could just leave it on one document per page. Let's go ahead and save this chat flow. When dealing with these retrieval chatbots, there are basically two phases. The first phase is upserting our data, which simply means fetching the data from the data source and loading it into the vector store. The second phase is querying that data, which simply means chatting with the data in the vector store. In order to upsert this data into the database, we can simply click on this green button, which will bring up this popup.
Here we can see the different steps involved in loading this data. What's also cool about Flowise is that when we click on Show API, we can get the API endpoint for triggering this upsert from our own custom solutions. But to load this data using Flowise itself, we can simply click on Upsert.
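As a rough sketch, calling that endpoint from code looks something like this; the host, port, and chatflow ID are placeholders, so check the Show API dialog in your own instance for the exact URL and payload:

```typescript
// Trigger the upsert for a chatflow over Flowise's REST API.
// "<chatflow-id>" is a placeholder for the ID shown in the dialog.
const response = await fetch(
  "http://localhost:3000/api/v1/vector/upsert/<chatflow-id>",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({}), // node overrides can go here if needed
  }
);
console.log(await response.json()); // the upsert result
```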
I'm getting a message saying that the upsert was successful, so let's close this popup and try it out. In the chat, let's ask a question relevant to this PDF document, something like, "What were the total assets in 2023?" We should get this amount back, which was 93,941 million USD. And indeed, we get the correct answer back. Now, let's have a look at how we can scrape information from a website; specifically, let's scrape the page in the Flowise documentation that explains how to deploy Flowise to Render. Let's go back to our chat flow and delete the PDF document loader. Instead, let's go to Add Nodes, then Document Loaders, and add the Cheerio Web Scraper.
Let's attach this web scraper to the vector store, and let's also connect our text splitter to the document loader. This web scraper takes in a URL as input, so let's copy the URL from the documentation and paste it into the node. Let's test this out by saving the chat flow, clicking on Upsert Vector Database, and then clicking on Upsert. Let's close the popup and try it: in the chat, let's ask, "How can I deploy Flowise to Render?" And as you can see, we do get the correct response back.
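Just like the upsert, chatting with the flow can also be done over the API; this is a sketch against the prediction endpoint Flowise exposes per chatflow, with the host and ID again as placeholders:

```typescript
// Ask the retrieval chatbot a question over Flowise's prediction API.
const res = await fetch(
  "http://localhost:3000/api/v1/prediction/<chatflow-id>",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question: "How can I deploy Flowise to Render?" }),
  }
);
const data = await res.json();
console.log(data.text); // the chatbot's answer
```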
But let's try asking a question that's actually not in the context, like, "How can I deploy Flowise to AWS?" as an example. Let's send this, and we should get an answer like this. And this is fantastic, because it means the chatbot will not hallucinate answers; it will only provide answers from the context that we provided. I do want to mention that it is possible to fetch more than one URL at a time, so you could scrape all the content off your website.
You can do that by clicking on Additional Parameters and then changing the Get Relative Links Method to Web Crawl or Scrape XML Sitemap; we could select the sitemap option here. You can also specify the maximum number of pages that should be scraped. This can take some time to complete, so I'm going to change it to something like five pages. Let's close this popup, go to Manage Links, and replace this URL with a link to the sitemap. We can then click on Fetch Links, and this will show us which pages will be scraped. I can see that the getting started page is included in this list, which refers to the page in the Flowise documentation that explains how to install Flowise, so that's what we'll be asking questions about.
</b> <b>Let's click on "save", let's save the</b> <b>chat flow, and let's "upsert" this. </b> <b>Let's click on "upsert", and this might</b> <b>take a minute or two to complete. </b> <b>All right, so this is complete.
</b> <b>Let's test this out. </b> <b>So in the chat, let's ask a question</b> <b>about this getting started page. </b> <b>Let's ask something like, "How can I set</b> <b>up Flow-wise as a developer?
"</b> <b>And that is referring to</b> <b>this section over here,</b> <b>where we effectively have</b> <b>to clone the repository,</b> <b>and then run "yarn install, yarn bold",</b> <b>and then "yarn start". </b> <b>Let's see what we got back. </b> <b>Indeed, it's saying we have to clone the</b> <b>repository, CD into Flow-wise,</b> <b>run "yarn install", then</b> <b>"yarn bold", then "yarn start".
Excellent. Now finally, let's have a look at how we can add a Pinecone serverless database to this project. The issue with the in-memory vector store is that if our server gets restarted, we lose our knowledge base. Ideally, we only want to perform the upsert once and then continue to chat with that data going forward. So let's remove this vector store, go to Add Nodes, and under Vector Stores let's add the Pinecone vector store. Let's connect our embeddings to Pinecone, as well as our document loader, and let's also connect Pinecone to our conversational retrieval chain. Now we need to provide our Pinecone credentials, as well as the name of our Pinecone index.
We can find those by going to pinecone.io. I know Pinecone used to be exceptionally expensive in the past, but since the release of Pinecone serverless, it's become one of the most affordable vector store solutions out there. Let's get started by signing up or logging into your account. After signing in, you should be presented with a dashboard similar to this. During this process, you might be asked to upgrade to a paid account, but don't worry: they actually start you off with a hundred dollars in free credits. I've used this database extensively and have barely made a dent in those free credits, so although I'm not sponsored by Pinecone, I highly recommend you try it out for yourself.
Let's start by creating an index and giving it a name; I'll just call mine "flowise". Then for the dimensions, in this tutorial we'll simply stick with 1536, although OpenAI also allows for larger embedding dimensions, up to 3072 at the time of recording. So let's enter 1536 as our dimensions, and we'll leave the metric as cosine. Under capacity mode, if you haven't upgraded to the paid plan, your only option might be the starter capacity mode, which is perfectly fine for following along with this tutorial. For everyone else who's interested in using serverless, simply select Serverless and then click on Create Index.
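The same index can also be created from code with Pinecone's TypeScript client; this is a sketch, and the index name, cloud, and region are assumptions (the video does this in the console UI):

```typescript
import { Pinecone } from "@pinecone-database/pinecone";

// Create a serverless index matching the settings chosen in the console.
const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
await pc.createIndex({
  name: "flowise",
  dimension: 1536, // must match the embedding model's output size
  metric: "cosine",
  spec: { serverless: { cloud: "aws", region: "us-east-1" } },
});
```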
After creating the index, let's go to API Keys and click on Copy Key Value. Back in Flowise, under Credentials, click on Create New, give your credential a name, and paste in that API key. Then for the Pinecone index field, enter the index name; I called mine "flowise", which is this name over here. For this example, I'm actually going to scrape the LangChain Expression Language page, so I'm going to replace the URL under Manage Links.
I'm just going to delete these links. Let's click on Additional Parameters, change the method to Web Crawl, and change the page limit to one page. Let's run the upsert. If we go back to the Pinecone console and refresh the page, we will see that the vectors were indeed added to the Pinecone index. And in our chat, let's ask, "What is LCEL?" And indeed, we do get the correct response back.
Now, just a few more tips on using these retrieval flows. If you want to see the source documents, we can simply enable this option. Let's save our chat flow and ask the question again: "What is LCEL?" This time the bot tells us where it fetched the information from, with links to those web pages.
Now, the benefit of using something like Pinecone is that after we've upserted these documents, we could in theory remove the scraper as well as the text splitter, because the document node is actually optional. So you could have a separate chat flow for the upsert functionality, solely responsible for upserting the data, and a separate flow like this one that is only responsible for retrieving the data, i.e. your chatbot. You will notice that although I deleted the document loader, this still works: let's clear the chat, ask "What is LCEL?", and my chatbot still works. Then finally, I know a lot of you will be asking in the comments how we can add a prompt template to this chain in order to affect the personality or the behavior of the chatbot. There's no way to attach a prompt template node to this chain, but if we click on Additional Parameters, we can set the system prompt. So you could add things in here like the name of the chatbot or the name of the company it's assisting; for example, something like: "You are Ava, a support assistant for Acme Corp. Only answer questions using the provided context." If you found this video useful, then please consider subscribing to my channel, and please hit the like button.