Here are three ways to set up a RAG chatbot today, and the first takes just 90 seconds. I find a lot of people make RAG, or retrieval-augmented generation, out to be a lot harder than it actually is. In reality, RAG is easy as hell, and I'll show you how to do it all in just a minute. My name is Nick, I scaled an automation agency to $72k a month using no-code tools like n8n and Make.com, and my whole thing is cutting the fluff, so let's get into it.

All right, so there are three ways to do RAG, and I'm going to show you all of them in due time. The first is going to take you just 90 seconds, so that's our 90-second method. The second is a lot more detailed: we're going to use n8n and Pinecone to set up our own database and then run retrieval using a customized strategy. The third is going to use a couple of third-party tools, which I believe are probably the simplest and easiest if you want really wide reach.

But this first one is definitely the fastest. By far the simplest way to get up and running is using an OpenAI Assistant, so head over to platform.openai.com/playground.
Log in if you haven't already, and you'll be faced with a screen that looks something like this. OpenAI is always changing their user experience, so it may look a little different when you check it out; this is what it looks like for me. Then head over to the Assistants tab. You may have to make your first assistant if you haven't already done so, which is what I'm going to do here.

I'm going to build a very simple example for the purposes of this video: a Bitcoin assistant. It's very archetypal in RAG to use the 2008 Bitcoin paper; there are a lot of n8n templates out there that use it as the document-chat example, and a couple of Make.com ones as well. Basically, I'm going to upload the 2008 Bitcoin white paper, which is quite the dry read if I do say so myself, and have the model tell me something about it. You can substitute the Bitcoin white paper for whatever knowledge base you want, which I'll show you how to do when we move over to n8n in a moment.

Under system instructions, I'm going to say: "You are a helpful, intelligent assistant that answers questions related to the seminal 2008 Bitcoin paper by Satoshi Nakamoto." Beautiful. Under model, you need to select GPT-4o; that'll enable file search, so give that a click. And what file search is, is your RAG: it's essentially running retrieval-augmented generation under the hood for you without really exposing the inner workings. For most use cases that's actually totally fine, and you can get away with just doing that, which is why I'm front-loading this video with it. Click files, then go to upload, and upload your knowledge source. In my case it's this Bitcoin white paper; I also have a fictional company that I created over here, which I'll show you in a moment when we do n8n.

You'll also have some advanced options here. The two big parameters in retrieval-augmented generation are what's called chunk size and chunk overlap. If you're unfamiliar with what these mean: you have some document, okay, and we're going rainbow mode because rainbow mode's fun. This is your document, with a bunch of colorful lines on it. The reason RAG was invented to begin with is that back in the day, AI models had very small context windows. A context window is just how much stuff you can fit in a prompt before the model starts going haywire. Because we had very small context windows, if we wanted to ask a question about a particular document and that document was longer than the context window — say the document was 10,000 words, 10,000 rainbow words, and the context window was only 5,000 — how are you going to get all the information from the document into the prompt in order to ask questions about it? And that's before even considering the quality reduction as prompts get longer, because that's something you need to keep in mind when working with AI: generally speaking, there's an inverse relationship between the length of a prompt and the quality of the output, all else held equal. If you add a bunch of great training examples, quality definitely goes up to a point, but then it comes back down. Anyway, if you're faced with this problem of a 5,000-word context window and you want to fit 10,000 words in there, what do you do? Well, this is where RAG comes in.

Chunk size means you take the document and split it into chunks. So instead of this being one document with one, two, three, four, five, six lines in it, it's actually three documents with two lines per document — and these documents are really just like text files, you can think of them that way. That's set by our chunk size: hypothetically, if the document had 2,400 tokens, this would be 800 up here, 800 over here, and 800 over here. Hopefully that makes sense; that's what chunk size refers to.

The next setting is chunk overlap. Because of the way language works, a quick hack people found to improve the quality of retrieval-augmented generation outputs was to make the chunks not divvy up the article or PDF or file perfectly. Instead, each chunk takes a little bit of the next chunk and appends it to its own end, and we just repeat that. So it's not like each chunk gets exactly two of those six lines; it actually gets more like two and a half — the last half of one chunk is the first half of the next. The overlap setting is just how much they overlap. If you have any questions on this, I'll talk a little more about it when we get to the n8n side of things, but that's more or less how it works, and that's a brief overview of how RAG functions under the hood.
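To make those two parameters concrete, here's a minimal sketch of a splitter. It splits on words rather than tokens to keep things simple (real splitters count tokens), but the sliding-window idea is identical:

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 400) -> list[str]:
    """Split text into overlapping chunks (word-based for simplicity)."""
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the document
    return chunks
```

With a 2,400-word document, chunk_size=800 and overlap=400 gives you windows starting at words 0, 400, 800, and so on — each chunk shares its second half with the start of the next one, which is exactly the "two and a half lines" effect I just described.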
The good news is that the OpenAI Assistant obfuscates a lot of that away, which is why I'm starting with it. Okay, cool. Once you're done with that, you've actually created what's called a vector store under the hood. We've uploaded a file, and that file has undergone a process called vectorization, or embedding, which turns that big string of words and lovely concepts into a giant list of numbers — and that giant list of numbers can now be communicated with pretty easily by GPT-4o.

Let me open up the paper so we can take a look at it together, and then let's ask something hyper-specific about the paper that the model would only know if it had access to that exact paper. I'm going to go to downloads, Bitcoin paper. Okay: "The traditional banking model achieves a level of privacy by limiting access to information to the parties involved and the trusted third party." Awesome.

Let's see what happens when I turn file search off and ask: "How does the traditional banking model achieve privacy?" Right now this doesn't have access to our paper; it's just working off its own brain, more or less. And as you can see, the answer here is sort of unrelated to anything of value. It's not really answering the question based on the source data; it's pointing us to some accessory resource called msearch and giving us details on how we could look for key terms or whatever. Now, if I'm a customer interacting with this assistant, I ask a question and get that sort of response, it's going to suck, right?

But if I just turn file search on, ask the same question, and run it, what you'll see is that instead of just answering the question like it did a moment ago, it's actually going to run retrieval. That means it's now accessing our specific data set.
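Under the hood, that retrieval step is just comparing those giant lists of numbers. Here's a hedged sketch of the idea using the openai Python SDK — it assumes an OPENAI_API_KEY in your environment, and the chunk strings are stand-ins, not the real paper text:

```python
import math
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    """Turn a string into its vector — the 'giant list of numbers'."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding  # 1,536 floats for this model

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Score how closely two vectors point the same way — higher is more related."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Retrieval in miniature: embed the question once, score every chunk,
# and hand the best match to the chat model as context.
question_vec = embed("How does the traditional banking model achieve privacy?")
chunks = [
    "The traditional banking model achieves a level of privacy by ...",
    "We propose a solution to the double-spending problem using ...",
]
best_chunk = max(chunks, key=lambda c: cosine_similarity(question_vec, embed(c)))
```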
It's finding the chunk of text that most closely corresponds to the question — which, if we're just being logical about it, is probably this paragraph right here. It then takes that, feeds it into the model, and says: hey, here's the info, can you answer the person's question based on that? In addition, we also get a citation. If you click on it, it'll just go to the Bitcoin PDF, because that's the only source we've uploaded, but if you had a thousand of these things it would find the specific one.

Now, the reason I'm showing you this first is that, honestly, there are many fewer use cases for RAG today than there were just a year ago, and my expectation is that a year or two from now there will be even fewer. The pragmatic reason is that context windows are getting enormously long. You don't really have to run this whole strategy of chunking a document, uploading it to a file store, vectorizing the contents, and running a similarity search over it like retrieval-augmented generation does, because you could just copy and paste something like 700 books' worth of stuff, stick it right in the prompt itself, and say: hey, use that info to answer all my questions. RAG solved a need that was itself a symptom of a limitation, not a deeper problem, and as the technology has improved over the last year or two we just haven't needed it as much. Nowadays you can achieve pretty similar results by copying the entire Bitcoin white paper — this whole thing — and pasting it directly into the prompt. I know it's pretty long, and there are going to be some cost issues associated with doing that, so I'm not saying it's perfect, but you can achieve a pretty similar result. So that is number one.
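By the way, everything we just clicked through in the playground can also be driven from code. This is a rough sketch using the Assistants API as it existed in the beta namespace of the openai Python SDK at the time of writing — method names may have shifted since, so treat it as illustrative rather than definitive:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Create a vector store and upload the knowledge source into it.
#    Chunking (800-token chunks, 400-token overlap by default) and
#    embedding both happen server-side.
store = client.beta.vector_stores.create(name="bitcoin-paper")
with open("bitcoin.pdf", "rb") as f:
    client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=store.id, files=[f]
    )

# 2. Create the assistant with file_search pointed at that store.
assistant = client.beta.assistants.create(
    name="Bitcoin assistant",
    instructions="You are a helpful, intelligent assistant that answers "
                 "questions related to the seminal 2008 Bitcoin paper "
                 "by Satoshi Nakamoto.",
    model="gpt-4o",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [store.id]}},
)

# 3. Ask a question; retrieval runs under the hood, citations included.
thread = client.beta.threads.create(messages=[{
    "role": "user",
    "content": "How does the traditional banking model achieve privacy?",
}])
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)
latest = client.beta.threads.messages.list(thread_id=thread.id).data[0]
print(latest.content[0].text.value)
```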
The second way to get up and running with RAG is in n8n, and this is a much deeper version of the RAG process. I'm going to run you through setting up a Pinecone store — and let's not use rainbow today, I'm going to use a nice blue. We're going to use Pinecone, and we're also going to use an AI agent. The way it's going to work is: we'll have our little chat window over here, this will be the agent, and the agent will have a tool, and that tool is a vector store. The way agents work in n8n, as many of you are probably familiar, is that the agent decides what to do based on the prompt you give it. You say, "Hey, can you tell me a little bit about this thing?" — if the thing isn't related to the data it has access to, it won't use the vector store. But if we upload our Bitcoin paper, or in our case a business knowledge base, and the agent has a feeling the information might be in there, it'll trigger a search. It'll go do the RAG stuff, grab the appropriate and relevant data from our Pinecone database, and bring it back to the agent to get answered. So it's pretty neat, and I'm going to show you how it works under the hood. Do I use this stuff in my own day-to-day? I'm mostly concerned with making money with this, so no, I don't — but there's been a lot of demand for it, so I figure I might as well run you through what it looks like.

First things first, click "add first step," then type "agent." I'm just going to use the default tools agent; that's fine for my purposes. Next, I'm going to add OpenAI as the chat model. I always use OpenAI as my chat model just because it's the easiest and simplest, but that doesn't mean you have to — if there's a better or simpler model, then by all means use it. There are a bunch of options here; I'm going to leave everything at the defaults and let you take this base and customize it however you want. For the purposes of this demo we're just going to use window buffer memory, but there are many different types of memory you could use if you wanted a longer-context chat agent — say, using the 15 previous responses as context. Again, the reason you can do that is that the context windows of modern models, like GPT-4o mini and whatever you're looking at as of the time you're watching this, are a lot longer than when the agent concept was invented.

Okay, great. The next thing we need to do is add a tool, and the specific tool we're going to use is all the way down at the bottom: there's one called Vector Store. Keep in mind there are a variety of tools that could answer similar sorts of questions. Say you wanted to do that Bitcoin example from a moment ago in n8n — Wikipedia would probably suffice for that; you could just have it connect to Wikipedia and ask Wikipedia questions instead of your own vector store. But in this hypothetical example, I'm going to take a big text file for a hypothetical SaaS company I've called OneCone and ask it a bunch of questions about OneCone using that file. Because this file has so much specific information about this fictional company, the only way the model would know anything is if it's accessing the file — and this is something you can actually sell to people, which is ultimately what I care about.

So: Vector Store tool. I'm going to call it exactly what the demo says, "company knowledge base," and for the description I'll say this gets data about OneCone, the SaaS company, with relevant knowledge bases attached. The limit field here just tells you how many chunks to pull from the knowledge base. The default is four, and I'm going to stick with four for this example, but you can actually make this pretty long nowadays — since, as I mentioned, all of these chunks go into your prompt anyway and context windows are huge, there's no real reason you can't set it to 20 and get away with it, although that does depend on your chunk sizes, which I'll cover shortly.
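Under the hood, a tools agent is doing something very close to OpenAI function calling: the tool's name and description are what the model reads when deciding whether a question warrants a knowledge-base lookup. Here's a hedged sketch of the equivalent tool definition — the schema mirrors the name and description we just configured, but it's an illustration, not n8n's literal internals:

```python
# What the agent sees: a tool whose description tells it when to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "company_knowledge_base",
        # The model reads this description when routing questions:
        # "hey what's up" gets answered directly, while "tell me about
        # OneCone pricing" triggers a call to this tool.
        "description": "Gets data about OneCone, the SaaS company, "
                       "with relevant knowledge bases attached.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string",
                          "description": "What to look up in the vector store."},
            },
            "required": ["query"],
        },
    },
}]
```

The limit of four then becomes the top_k of the similarity search the tool runs when called — more on that in the retrieval sketch further down.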
Next up, you'll see this node is red. That's because we need to add a vector store, which is going to be our Pinecone instance, and then a model. Under model, I'm going to click the plus button and again choose OpenAI, connecting it to my credential. The way this works under the hood is that it uses OpenAI to fetch the data via embeddings and then tell you something about that data; that's why you need it, and we're actually going to need to do this again in a moment.

When you click Vector Store, there are a variety of options you could choose: an in-memory store, Postgres, Qdrant, Supabase — a lot of people are using Supabase now, it's awesome, very similar to Pinecone — but I'm going to choose Pinecone. I already have a connection set up, but I'm going to create a fresh account and show you how to do it live. Operation mode we'll just leave as "retrieve documents (for agent/chain)." The Pinecone index I'll run you through in a second, but the last thing to do before that: notice you have to connect this to what's called an embedding model. The embedding model is what converts the text in the documents we're uploading into vectors, which is pretty neat. There are a variety of good ones we have access to — text-embedding-3-large, text-embedding-3-small, text-embedding-ada-002 — and I'm just going to leave it as 3-small for now. This choice is going to influence the setup of our Pinecone vector store. The node is still red because I haven't actually done the connection.

But just before I set everything up in Pinecone, I want you to take a look at just how many AI instances we have here — how many AI tools we're using to accomplish this one thing. We have the OpenAI chat model over here, which is basically our general-purpose logic engine: it chooses which tools to use, chooses when to access our vector store of knowledge-base data, and is responsible for formulating answers. Then we have this Vector Store tool, which retrieves data from our Pinecone vector store — but we've also attached another OpenAI model to that. What happens is: every time we get a request that needs the vector store, the request goes to the AI agent; the agent thinks a little and decides it needs some data, so it goes to the Vector Store tool; the Vector Store tool calls the specific Pinecone instance, which runs a search fed by the embeddings model; Pinecone returns a result to the Vector Store tool, which this second model then interprets, before finally returning it to the agent — back to the OpenAI chat model — which sends it to you. So there's a lot going on under the hood. I understand this may seem pretty simple, but what we're asking our AI models to do here involves quite a few steps, which is usually why I recommend just using an OpenAI Assistant. Anyway, I'll stop harping on it. We need to set up a Pinecone account, so I'm going to go to pinecone.io.
In my case I'm going to pretend I haven't set this up yet, so we'll sign up with a new account — I'm just going to use this one here, click continue. "Welcome, Nick." I'll open up my email, accept this, thank you very much, beautiful, and we'll go back to Pinecone. Now — I don't know what the hell happened here; that copy-paste did not work. Let me go back. Oh, I think I pasted in the entire Bitcoin paper. Nice. That definitely isn't the code — or who knows, maybe it is, maybe Pinecone's crazy like that. Okay, now we need to add some additional information, so I'm setting that up right now: I don't code, no coding for me; Q&A AI agents; less than 100K. Their onboarding will probably look pretty different by the time you're watching this.

Now we have our API key, which is cool, so I'm going to copy it. As you can see, it says they won't show you this key again after you close this dialog, so please copy it somewhere — otherwise you basically have to create a new account, and I'm saying this because I've done that a couple of times at this point. Then we need to add our credential in n8n: create new credential, paste the API key in, and save it. I didn't expose it at all because I just left it as those obfuscated stars, but maybe you want to save it to some file somewhere, or some messaging app, just in case.

Back in Pinecone, the first thing it tells you to do is set up the API via a quick start with pip, the Python package manager, and all that — but since we're doing everything through n8n, we don't need to worry about any of it. We just go over to "create index." Name it whatever you want; I'm going to name it after our company, OneCone. What we want to get right is the configuration. There are a variety of options here, and the number of dimensions that's standard as of this recording is 1,536. If you click on text-embedding-3-large — the bigger, slightly smarter model — it goes up to 3,072. You basically need to match the number of dimensions to whatever model you're using to do the embeddings. How do you figure that out? Go back to the specific model we selected. I'm using OpenAI, so there are three options: the large model at 3,072 dimensions, the small at 1,536, and I believe ada-002 is the same — we can check by searching "text-embedding-ada-002 dimensions," and yes, that one's also 1,536. You can look up the same figure for whatever embedding model you're using. I'm going to use small. Leave the metric as cosine, that's fine. It looks like the name can only contain lowercase, so I'll go with "onecone" — that's a little better. Capacity mode will be serverless, we'll leave the cloud provider as AWS (cute little smiley face — man, I wish my brand looked like that), and I'll leave the region as Virginia. Then I create the index. An index is basically — well, I don't even know how to finish that sentence; it's like a high-level table that organizes the rest of your database.
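For reference, the same index setup in code with the pinecone Python SDK looks roughly like this — a sketch where the name and region mirror what I just clicked in the UI, and the key placeholder is yours to fill in:

```python
from pinecone import Pinecone, ServerlessSpec  # pip install pinecone

pc = Pinecone(api_key="YOUR_API_KEY")  # the key you copied at signup

# The dimension must match the embedding model you'll use:
#   text-embedding-3-small -> 1,536   text-embedding-3-large -> 3,072
#   text-embedding-ada-002 -> 1,536
pc.create_index(
    name="onecone",          # lowercase only, as the UI enforces
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # Virginia
)
index = pc.Index("onecone")  # handle we'll upsert to and query later
```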
What we need to do next is add some sort of record to the index. You can add records manually — I could click "add a record" and enter all of my records here — but what you want to do instead is do this through n8n: upload a file to n8n, embed it, and add it to Pinecone directly that way. So for right now, this is fine; we're good with this. I'm going to go back over to n8n and connect to the specific Pinecone instance, which is my Pinecone API credential, and then under index — because we've added one now — just click "onecone."

At this point we've set up everything we need to get this running except the document itself. Remember, earlier I looked at the record panel and tried adding something manually: to get something into Pinecone, you have to embed the document yourself and then add it. If that seems like witchcraft, don't worry about it. Just add a new trigger — add it manually here — and what we need is a way to download a file, get it into n8n, and send it over to Pinecone using this node: the Pinecone Vector Store node that says "add documents to a vector store." I can reuse the same credentials I had; the operation mode is going to be "insert documents," and I'm going to insert said documents into "onecone." There are a couple of other properties here that I'll leave blank. Now we add our embedding model, which has to be the same embedding model we'll be retrieving with — so that's text-embedding-3-small. Perfect.

Then we have the document itself, which we get using this default data loader. There are a bunch of different types of data you can load; what we're going to do is download a file as binary and then add it. You can technically do JSON data too, but I'm just going to load the data, automatically detect it by MIME type, and leave the input data field name as "data." The text splitter is where you define your chunks: you can split things into chunks based on tokens. I'm going to leave the default chunk size of 800 with a chunk overlap of 400 — if you remember, these are the same settings OpenAI's Assistant used to define its chunking. I don't want to say this is the best, but it's definitely the standard. Right now, I believe chunk overlap should be about half of whatever the chunk size is — don't quote me on that, and if you do, make sure not to take it wildly out of context and make me seem super evil.
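Here's roughly what that insert step is doing on your behalf: chunk the file, embed every chunk with the same model the retrieval side will use, and upsert the vectors with the raw text stashed in metadata. A sketch that reuses the chunk_text helper and the index handle from the earlier snippets (the filename is this demo's):

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("one cone knowledge base.txt") as f:
    chunks = chunk_text(f.read(), chunk_size=800, overlap=400)

# Embed every chunk in one batch call, with the SAME model the
# retrieval side uses — mismatched models mean mismatched vectors.
resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)

index.upsert(vectors=[
    {
        "id": f"chunk-{i}",
        "values": record.embedding,
        "metadata": {"text": chunk},  # keep the raw text for retrieval
    }
    for i, (chunk, record) in enumerate(zip(chunks, resp.data))
])
```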
Okay, great. Now all we have to do is add a file, so I'm going to go between these two nodes. There are a variety of ways to get a file; I'm just going to upload it to Google Drive and then download it from Google Drive here. So I'll add a Drive node and choose "download" — that's how you get stuff into binary in n8n. I've already connected my Google Drive account using OAuth. If you don't have one, you basically have to go set up the Google Drive API in the Google Cloud console, which can be pretty annoying, so just click the documentation link, click Google Drive, and go to the Google Drive credentials page. What you need to do there is set up OAuth 2.0, which involves creating a Google API console account, more or less. That takes you to a page that looks something like this, where you'll create an account or project as well as some credentials. From there you'll have a client ID, a client creation date, and a client secret — make sure not to expose these to everybody like I'm doing right now; I can only get away with it because I always rotate mine afterwards. Grab those two pieces of data, go back to our n8n RAG workflow, and paste the client ID and client secret in over here. That opens a page that lets you connect to Google, and voilà — after that, we can download a file. Obviously I'm going to have to select the file, so I'm going into my own Google Drive right now to upload it so I can download it later. There are obviously automated ways you could do this as well, but this is me.
I'm going to drag in this "one cone knowledge base.txt" file. It should be uploading now... it's definitely still not uploading... there we go. The knowledge base file is called "one cone knowledge base.txt" — looks good to me. Back in n8n, I'll select a file, type in "one cone," and it grabs that file. Great. If we test this now, we'll have the file in binary, and because it's in binary we can feed it into this Pinecone store. (I don't think you can pin binary data — right, no, you can't; that's fine.) We're then going to insert this data into Pinecone by executing the node — and now we've inserted the file into our records. This is what it looks like: it's been parsed, it looks really nice, it's been run through the default data loader, and it's been chunked just as we discussed. I believe we only got two chunks here, since our file wasn't super long to begin with, but if you upload a longer file you'll get many more. Actually, for the purposes of this demo, why don't we do a chunk size of 200 with an overlap of 50 — I don't want to just leave it there. Let me go back to Pinecone and remove what we have; I'll delete both of these records and run it one more time with a much smaller chunk size so you can see what it looks like. Great: now we have much smaller chunks, and many more of them. This is probably what yours will look like when you upload a really big company knowledge base. And now we have our knowledge base inside Pinecone — how cool is that?

The only thing left to do is talk with it. That's what I'm going to do up here: I'll open the chat, clear out all the test data, and say "hey, what's up" — a basic question first. Notice how we went down the chat model route and the window buffer memory route but did not access the vector store. That's because our agent determined it didn't need to consult this third-party database to answer the question. Now I'm going to find some data that's hyper-specific to OneCone and ask about that. Let's see — I'll have it tell me about OneCone's pricing. You see up here where it says "OneCone pricing follows a subscription model"? So let's ask: "tell me about OneCone pricing." Now it's going to consult our vector store, access Pinecone, get the embeddings back, consult the next chat model, return the vector store data to the original agent, consult the model on how to formulate all of this, and send it back. And if we go back to the chat, you can see we've now retrieved all of the data we just chunked, using our own custom chunk size of 200 with a 50 overlap.
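That whole round trip, expressed in code, is surprisingly short — which is part of why I keep saying RAG is easier than people make it out to be. A sketch continuing the assumptions from the earlier snippets (same client and index handles, same embedding model):

```python
question = "Tell me about OneCone's pricing."

# 1. Embed the question with the same model used at insert time.
q_vec = client.embeddings.create(
    model="text-embedding-3-small", input=question
).data[0].embedding

# 2. Similarity search — the tool's "limit" of 4 becomes top_k here.
results = index.query(vector=q_vec, top_k=4, include_metadata=True)
context = "\n\n".join(m.metadata["text"] for m in results.matches)

# 3. Have the chat model formulate an answer from the retrieved chunks.
reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Answer the question using only the provided context."},
        {"role": "user",
         "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(reply.choices[0].message.content)
```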
So, a couple of benefits and drawbacks to this. If we go down the right-hand side of this log — this is our AI agent log, for those of you who aren't familiar; it shows the stack of things that were called, starting with the very first thing and then everything that was needed to answer it. In our case, "tell me about OneCone's pricing" was fed into our OpenAI chat model, which then called the Vector Store tool with the query "What is OneCone's pricing model?" The vector store consulted the settings we created — bring four results back — and fed the four most relevant results back here. In this case it looks like it was really only one result, because the others probably weren't relevant; it looks like we actually got all the data in one chunk. That fed back into the OpenAI chat model, which was then capable of answering our question down here, which is pretty cool. So we're doing a lot of stuff, and things are working in concert with each other — pretty cool to see.

As you can imagine, if you were to create some sort of sales agent — which I think is one of the main purposes nowadays — you could have your main route be a wonderful prompt that answers most general-purpose questions, and then if somebody asks a hyper-specific question about your product or business, you could go through the whole vector store flow: chunk those documents, add them all to Pinecone, and add the logic like we've done here to dynamically retrieve the specific ones that matter. Although, keep in mind this entire time: if you do that, don't just contrast it against nothing — contrast it against the other option, where you could have just copied and pasted all of that into the prompt to begin with. It's not all-or-nothing here; there's an opportunity cost to all of it, I guess, is what I'm trying to say. Okay, so that's method number two.

Why don't we cover method number three, which I think is much, much easier and probably a little faster as well. Method number three is using a third party. In my case I'll use a simple third party called Botpress, but there are many third parties you could use to get this done. I'm not affiliated with Botpress — well, I believe I actually do have an affiliate code somewhere, but I don't think I'm going to put it in this video, not that I don't absolutely love Botpress. Basically, it's an AI RAG platform where you can create your own chatbots and upload specific sorts of data to them. Whereas the OpenAI Assistant and the n8n RAG chatbots are sort of constrained — the OpenAI Assistant you can only really communicate with through the OpenAI playground, and the n8n chatbot you can only really communicate with if you make it live and chat with it through some sort of hosted link, or embed it on your website — the really cool thing about Botpress is that you can go one step further. Instead of being constrained to what we have here, you can have your Botpress chatbot live in WhatsApp or Telegram or something like that. A lot of people like this, because there are a lot of use cases out there that require WhatsApp or Telegram as a tool, and lots of communities are cropping up there. You can also do stuff with Make.com.