Flowise makes it easy to create chatbots that can answer questions from our own data sources, like PDF documents, websites, databases, or pretty much whatever you want. This is useful for creating chatbots that can answer questions about our businesses, services, or products, and it can also be used for some interesting use cases, like persona chatbots. We used this technique to create an AI clone of MrBeast in one of my previous videos. So in this video you will learn a lot: we will create a retrieval chatbot that can answer questions from our own data source, and we will also have a look at storing this knowledge base in a Pinecone serverless vector store.
Before we jump into the chat flow, I first want to explain the concept of RAG, or retrieval augmented generation. Let's swap over to ChatGPT to demonstrate this. One of the limitations of AI models is that they can only accurately answer questions based on the data that they were trained on. If we ask questions about recent events or things like the current weather, the model will typically respond with a message saying that it doesn't know the answer. And if I ask it a question from the Flowise documentation, ChatGPT won't be able to provide an answer, since this information was not included in its training data. For instance, let's ask it how we can deploy Flowise to Render. Even worse, the answer ChatGPT gives us is actually incorrect: it's making assumptions about what Flowise is, and it assumes that Flowise uses Python, which is not at all what the recommended steps in the Flowise documentation are. Now let's have a look at how we can fix this response.
What we could do is include context in the prompt. For example, our prompt could say something like, "Answer the user's question from the following context." Then we could inject some context into the prompt, and finally we would ask our question again, which was, "How do I deploy Flowise to Render?" So to improve this response, we can inject the context, which is effectively all the text on this page; I can simply copy it from the page and paste it into the prompt. Let's send this prompt, and we now receive the correct response. We're almost done with the theory, but I do think it's important for you to understand how RAG actually works.
Obviously we will not be hard-coding the context; instead, Flowise will retrieve the context from a data source and then dynamically inject it into the final prompt. I think the diagram in the LangChain documentation explains it best. We will connect our chat flow to a data source, which could be anything: a website, a PDF document, a Word document, pretty much any data source you can imagine. We will then load the content of that data source into our application as one massive piece of text. To reduce token usage, we then split this data up into smaller chunks, and these chunks are converted into a vector representation, which is something a vector store can use to retrieve only the most relevant documents. This will make sense once we start implementing this project. These embeddings are then stored in a vector database. The final step is retrieval: when we ask our chatbot a question, the chatbot reaches out to the vector database, asks it to retrieve the documents most relevant to our query, and those documents are then injected into our prompt as context.
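Here's a rough sketch of that load, split, embed, store, and retrieve pipeline in LangChain.js, which is what Flowise wires up for us visually; the import paths and the docs URL are assumptions and may differ between LangChain versions:

```typescript
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";

// 1. Load: pull the raw page text into the application.
const loader = new CheerioWebBaseLoader("https://docs.flowiseai.com/");
const rawDocs = await loader.load();

// 2. Split: break the text into smaller chunks to reduce token usage.
const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 50 });
const docs = await splitter.splitDocuments(rawDocs);

// 3. Embed + store: convert the chunks to vectors and index them in memory.
const store = await MemoryVectorStore.fromDocuments(docs, new OpenAIEmbeddings());

// 4. Retrieve: fetch only the chunks most relevant to the query.
const relevant = await store.similaritySearch("How do I deploy Flowise to Render?", 4);
```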
I think that's enough theory for now; let's create our retrieval chatbot. In our Flowise dashboard, let's click on Add New. I'll call mine "RAG Chatbot", and let's save this. Now let's start off by adding a chain to our project.
Under Chains, let's add the Conversational Retrieval QA Chain. This is similar to the conversational chain that we created in the previous video, where we add a chat model and, optionally, memory, but this one also allows us to retrieve data from a vector store. We're not going to add memory in this tutorial, as we already covered that in the previous video; instead we'll focus on the vector store retrieval. Let's start by adding the chat model. Under Add Nodes, go to Chat Models, let's add the ChatOpenAI model, and let's attach it to our chain. Let's also select our credentials and change the temperature to something like 0.4, because we don't want the model to be too creative; it should ideally only answer questions from the context.
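For reference, this is roughly the model configuration that node represents; the model name shown here is an assumption, and the exact option names vary between LangChain versions:

```typescript
import { ChatOpenAI } from "@langchain/openai";

// A low temperature keeps the model from being too creative, so it
// sticks closer to the retrieved context when answering.
const model = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0.4 });
```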
We will have a look at adding Pinecone to this project later on in this video, but for now, let's click on Add Nodes and go to Vector Stores. Let's add the In-Memory Vector Store and connect it to our chain. This vector store node takes in two properties: a document as well as embeddings. The embeddings input relates to the embed step in the diagram: this is where we take the data retrieved from our data source and use an embedding function to convert it into a numeric representation, which we can store in the vector store. To add this embeddings function, click on Add Nodes and go to Embeddings. The embedding node you have to add will greatly depend on the model that you're using, and since we're using an OpenAI model, I'll simply add the OpenAI Embeddings node and attach it to the vector store. We can leave the model name as is, and let's select our OpenAI credentials.
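If you're curious what that node does under the hood, this is the general idea; OpenAI's default embedding model returns a 1536-dimension vector, which is the same dimension we'll use for the Pinecone index later:

```typescript
import { OpenAIEmbeddings } from "@langchain/openai";

// Each piece of text is mapped to a fixed-length numeric vector;
// similar texts end up with similar vectors, which is what makes
// similarity search in a vector store possible.
const embeddings = new OpenAIEmbeddings();
const vector = await embeddings.embedQuery("How do I deploy Flowise to Render?");
console.log(vector.length); // 1536 for the default OpenAI embedding model
```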
Now all that's left is to attach our document loader. Let's go to Add Nodes, then Document Loaders, and here you will find a list of document loaders for pretty much any data source you can imagine. We can use the Cheerio Web Scraper to grab information from a website, we can upload CSV files or Word documents, you can even select a folder with multiple different file types, you can connect a Notion database, we can upload PDF files, and so on. Something that's very popular as well is fetching your files from an AWS S3 bucket. In this video, we'll have a look at the PDF loader and the Cheerio Web Scraper. Let's start with the PDF file loader, and let's attach it to our vector store as well.
You might notice that this flow is starting to look very familiar: it's the same flow as this diagram over here, with the exception of splitting the document into smaller chunks. Let me explain why you would want to split this data up into smaller chunks, using the LangChain Expression Language page as an example. This page briefly explains what LCEL is, and then goes on to describe some of its other features. Let's say I wanted to scrape the information from this page and then ask the model what LCEL stands for. The scraper will extract all the information from this page, including the content from the menu, the sidebar, and everything else on the page, and all of this will be injected into the context. We would get an accurate answer from the chatbot, but there is a massive issue with passing all of this information into the context. The first issue is that all models have some sort of context limit, and by passing in the entire page we could very easily exceed that limit. Secondly, in the case of OpenAI, we are charged for the number of tokens that we use. By splitting this content up into smaller chunks, we can create separate documents: one containing the definition of LCEL, perhaps another for streaming support, and so on. Then, when we ask the model about the definition of LCEL, the vector store will only pass back the relevant section, and only that section will be injected into our prompt, not the entire page.
So let's have a look at how we can chunk this data by adding a text splitter. Under Add Nodes, let's go to Text Splitters, and from here let's add the Recursive Character Text Splitter and attach it to our document loader. Here we can specify the chunk size in characters; let's leave it at 1000 characters. We can also specify a chunk overlap; let's make it something like 50 characters. I do recommend playing with these values: if you find that your responses are not accurate, simply increase the chunk size until the responses improve.
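Here's a toy example of what those two settings do; the tiny chunk size is just to make the overlap visible (the actual flow uses 1000 and 50), and the exact import path may differ between LangChain versions:

```typescript
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// With an overlap, neighbouring chunks share a few characters of context,
// so a sentence cut at a chunk boundary still appears intact in one chunk.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 40,    // the flow in the video uses 1000
  chunkOverlap: 10, // the flow in the video uses 50
});
const chunks = await splitter.splitText(
  "LCEL is a declarative way to compose chains out of existing components."
);
console.log(chunks);
```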
Of course, there are different types of text splitters as well, so you might want to experiment with those too. For instance, there is a text splitter that is great at splitting code into chunks, and I've found that the HTML-to-Markdown text splitter works fantastically when scraping information from a website. Now, let's upload a PDF file.
For this example, I've downloaded the latest Tesla financial statement, so I'll go ahead and upload this file. I'm actually going to change this to one document per file and allow the text splitter to break the information up into chunks; alternatively, you could just leave it on one document per page. Let's go ahead and save this chat flow. When dealing with these retrieval chatbots, there are basically two phases. The first phase is upserting our data, which simply means fetching the data from the data source and loading it into the vector store. The second phase is querying that data, which simply means chatting with the data in the vector store. In order to upsert this data into the database, we can simply click on this green button, which will bring up this popup.
Here we can see the different steps involved in loading this data. What's also cool about Flowise is that when we click on Show API, we can get the API endpoint for triggering this upsert from our own custom solutions. But to load this data using Flowise itself, we can simply click on Upsert.
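As a rough sketch, calling that endpoint from code looks something like this; the host, port, and chatflow ID are placeholders, so check the Show API dialog in your own instance for the exact URL and payload:

```typescript
// Trigger the upsert for a chatflow over Flowise's REST API.
// "<chatflow-id>" is a placeholder for the ID shown in the dialog.
const response = await fetch(
  "http://localhost:3000/api/v1/vector/upsert/<chatflow-id>",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({}), // node overrides can go here if needed
  }
);
console.log(await response.json()); // the upsert result
```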
I'm getting a message saying that the upsert was successful, so let's close this popup and try it out. In the chat, let's ask a question relevant to this PDF document, something like, "What were the total assets in 2023?" We should get this amount back, which was 93,941 million USD. And indeed, we get the correct answer back. Now, let's have a look at how we can scrape information from a website; specifically, let's scrape the page in the Flowise documentation that explains how to deploy Flowise to Render. Let's go back to our chat flow and delete the PDF document loader. Instead, let's go to Add Nodes, then Document Loaders, and add the Cheerio Web Scraper.
Let's attach this web scraper to the vector store, and let's also connect our text splitter to the document loader. This web scraper takes in a URL as input, so let's copy the URL from the documentation and paste it into the node. Let's test this out by saving the chat flow, clicking on Upsert Vector Database, and then clicking on Upsert. Let's close the popup and try it: in the chat, let's ask, "How can I deploy Flowise to Render?" And as you can see, we do get the correct response back.
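Just like the upsert, chatting with the flow can also be done over the API; this is a sketch against the prediction endpoint Flowise exposes per chatflow, with the host and ID again as placeholders:

```typescript
// Ask the retrieval chatbot a question over Flowise's prediction API.
const res = await fetch(
  "http://localhost:3000/api/v1/prediction/<chatflow-id>",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question: "How can I deploy Flowise to Render?" }),
  }
);
const data = await res.json();
console.log(data.text); // the chatbot's answer
```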
But let's try asking a question that's actually not in the context, like, "How can I deploy Flowise to AWS?" as an example. Let's send this, and we should get an answer like this. And this is fantastic, because it means the chatbot will not hallucinate answers; it will only provide answers from the context that we provided. I do want to mention that it is possible to fetch more than one URL at a time, so you could scrape all the content off your website.
You can do that by clicking on Additional Parameters and then changing the Get Relative Links Method to Web Crawl or Scrape XML Sitemap; we could select the sitemap option here. You can also specify the maximum number of pages that should be scraped. This can take some time to complete, so I'm going to change it to something like five pages. Let's close this popup, go to Manage Links, and replace this URL with a link to the sitemap. We can then click on Fetch Links, and this will show us which pages will be scraped. I can see that the getting started page is included in this list, which refers to the page in the Flowise documentation that explains how to install Flowise, so that's what we'll be asking questions about.
</b> <b>Let's click on "save", let's save the</b> <b>chat flow, and let's "upsert" this. </b> <b>Let's click on "upsert", and this might</b> <b>take a minute or two to complete. </b> <b>All right, so this is complete.
</b> <b>Let's test this out. </b> <b>So in the chat, let's ask a question</b> <b>about this getting started page. </b> <b>Let's ask something like, "How can I set</b> <b>up Flow-wise as a developer?
"</b> <b>And that is referring to</b> <b>this section over here,</b> <b>where we effectively have</b> <b>to clone the repository,</b> <b>and then run "yarn install, yarn bold",</b> <b>and then "yarn start". </b> <b>Let's see what we got back. </b> <b>Indeed, it's saying we have to clone the</b> <b>repository, CD into Flow-wise,</b> <b>run "yarn install", then</b> <b>"yarn bold", then "yarn start".
Excellent. Now finally, let's have a look at how we can add a Pinecone serverless database to this project. The issue with the in-memory vector store is that if our server gets restarted, we lose our knowledge base. Ideally, we only want to perform the upsert once and then continue to chat with that data going forward. So let's remove this vector store, go to Add Nodes, and under Vector Stores let's add the Pinecone vector store. Let's connect our embeddings to Pinecone, as well as our document loader, and let's also connect Pinecone to our conversational retrieval chain. Now we need to provide our Pinecone credentials, as well as the name of our Pinecone index.
We can find those by going to pinecone.io. I know Pinecone used to be exceptionally expensive in the past, but since the release of Pinecone serverless, it's become one of the most affordable vector store solutions out there. Let's get started by signing up or logging into your account. After signing in, you should be presented with a dashboard similar to this. During this process, you might be asked to upgrade to a paid account, but don't worry: they actually start you off with a hundred dollars in free credits. I've used this database extensively and have barely made a dent in those free credits, so although I'm not sponsored by Pinecone, I highly recommend you try it out for yourself.
Let's start by creating an index and giving it a name; I'll just call mine "flowise". Then for the dimensions, in this tutorial we'll simply stick with 1536, although OpenAI also allows for larger embedding dimensions, up to 3072 at the time of recording. So let's enter 1536 as our dimensions, and we'll leave the metric as cosine. Under capacity mode, if you haven't upgraded to the paid plan, your only option might be the starter capacity mode, which is perfectly fine for following along with this tutorial. For everyone else who's interested in using serverless, simply select Serverless and then click on Create Index.
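The same index can also be created from code with Pinecone's TypeScript client; this is a sketch, and the index name, cloud, and region are assumptions (the video does this in the console UI):

```typescript
import { Pinecone } from "@pinecone-database/pinecone";

// Create a serverless index matching the settings chosen in the console.
const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
await pc.createIndex({
  name: "flowise",
  dimension: 1536, // must match the embedding model's output size
  metric: "cosine",
  spec: { serverless: { cloud: "aws", region: "us-east-1" } },
});
```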
After creating the index, let's go to API Keys and click on Copy Key Value. Back in Flowise, under Credentials, click on Create New, give your credential a name, and paste in that API key. Then for the Pinecone index field, enter the index name; I called mine "flowise", which is this name over here. For this example, I'm actually going to scrape the LangChain Expression Language page, so I'm going to replace the URL under Manage Links.
I'm just going to delete these links. Let's click on Additional Parameters, change the method to Web Crawl, and change the page limit to one page. Let's run the upsert. If we go back to the Pinecone console and refresh the page, we will see that the vectors were indeed added to the Pinecone index. And in our chat, let's ask, "What is LCEL?" And indeed, we do get the correct response back.
Now, just a few more tips on using these retrieval flows. If you want to see the source documents, we can simply enable this option. Let's save our chat flow and ask the question again: "What is LCEL?" This time the bot tells us where it fetched the information from, with links to those web pages.
Now, the benefit of using something like Pinecone is that after we've upserted these documents, we could in theory remove the scraper as well as the text splitter, because the document node is actually optional. So you could have a separate chat flow for the upsert functionality, solely responsible for upserting the data, and a separate flow like this one that is only responsible for retrieving the data, i.e. your chatbot. You will notice that although I deleted the document loader, this still works: let's clear the chat, ask "What is LCEL?", and my chatbot still works. Then finally, I know a lot of you will be asking in the comments how we can add a prompt template to this chain in order to affect the personality or the behavior of the chatbot. There's no way to attach a prompt template node to this chain, but if we click on Additional Parameters, we can set the system prompt. So you could add things in here like the name of the chatbot or the name of the company it's assisting; for example, something like: "You are Ava, a support assistant for Acme Corp. Only answer questions using the provided context." If you found this video useful, then please consider subscribing to my channel, and please hit the like button.