Step-By-Step: Add 100 Files to Pinecone for RAG AI Agent with n8n


Nate Herk | AI Automation

JOIN THE FREE SKOOL COMMUNITY👇 https://www.skool.com/ai-automation-society-3440/about 🚧 Start Bui...

Video Transcript:

A few weeks ago I made a video about a RAG AI agent, where we pushed a PDF into a vector database and were then able to chat with the agent to get answers from the PDF. After that video, a lot of people started asking me: what if we have hundreds of PDFs, or hundreds of different documents, that we want to push into a vector database, and we want that database to keep growing without having to manually hit test workflow in order to send each one through into Pinecone

every single time? So in this video that's what we're going to be doing, and it may be a lot simpler than you think. If you haven't already watched that RAG video, I would maybe do that real quick so that you understand the whole process of retrieval-augmented generation and pushing data into Pinecone. You'll understand indexes and namespaces a little bit more, and then you can come over here and this video will make a bit more sense. It also covers things like setting up Pinecone, the API keys, and

all that kind of stuff. And finally, before we get into the video, I just wanted to say thank you guys so much. The amount of support and feedback I've been getting from you all has been overwhelming, and I've been super happy to hear that you guys are learning from my content and enjoying it. Please hop into the Skool community; the link will be down in the description. It's 100% free, it's really nice to get to interact with you guys, and I'll see you in there. So for the purpose of this video I'm going to

be pretending that I own a restaurant, and I have all this information that I want to use for an internal sort of agent where my staff can ask questions about our current reviews, current menu changes, promotions, policies, stuff like that. It could also be great for training. What we have here is all the information: a bunch of Google Docs that I had ChatGPT write up with some dummy data for me. So we've got these documents, and then I just wanted to put them within a single folder in my Google Drive, so

I've got a folder here just called data, and I've got all of the documents in here. This is really how we're going to be able to get everything we want into Pinecone without having to manually do it for each document, each file, whatever it may be. So now that you've got all the information that you want in your database in a single folder, let's hop into n8n. Now that we're ready to start building the workflow, we have to add the first step, which is always going to be a trigger, so

we're going to do trigger manually here, just so we can hit test workflow and it will search the folder, grab the files, and put them into Pinecone for us. But in the future, once you already have your original database set up and all you need to do is add more information to it, you could have this trigger be something like a Google Drive folder trigger. That way, every time you add a file to the folder it's pulling from, it will run through and put it in the database for you automatically,

so you don't have to come back here and test it manually. But that's in the future; right now we're just setting up the original database. So we've got our trigger, and now we want to go and grab a Google Drive node to actually find the folder that we're searching through. We're going to grab search files and folders. I know it's a little confusing, because you see things like download file and you see these folder actions, but we're going to be searching files and folders to find the folder, and then we

can have another one after this to download the information. So as always, configure this node and get set up with your account. The resource is file/folder, the operation is search, and then you have this query field here, which is a little confusing; just don't touch it. This is where you may think you need to put in parameters to find a folder, but all we're going to do is go to filter and click on folder. Then we have from list, which is super nice: n8n lets you just choose from a list,

and the only folder I've got in my Drive right now is data, so we're going to grab that one. I'm going to turn on return all, because we want to get all the information back from this folder, and then we'll just hit test step. I think it's 14 documents, and we've got all 14 documents coming back here. We'll switch to JSON real quick: as you can see, all we're getting back about these files is the ID and the name of each file. So we've got feedback, special events, reservations,

loyalty program, promotions, stuff like that. But it's important to know we're not actually getting any content from these files here; all we're getting is ID and name. That's why we need to come in here and grab another node, which is going to be another Google Drive node, but this time we're going to grab download file, because this one will actually give us the information from the file. I'm just going to call this one get content so we know what's going on. We'll have this configured already: we're looking for a file, we

want to download the file, and instead of choosing from a list we're going to do by ID. This way we can just grab the IDs that are coming in from the previous node. We'll hit schema up here in the top left so we can drag and drop the ID from the previous Google Drive node right here. Now we have ID, and it's going to be an expression, so it'll get content for every single file ID that's coming in from the previous Google Drive node. So we'll hit

test step up here. This one's going to take a little longer because it's going to be going through those 14 files and pulling the information back, so we'll just give it a second. Okay, this just finished running. As you can see, we've got 14 files and all of them are here: this last one is for orders, the 13th one is for staff, this one's for reservation policy. But what's important to note here is that this is coming out as binary. In the JSON we're just getting the original IDs and names.
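To make the JSON-versus-binary distinction concrete, here is a rough Python sketch of what one item leaving the download node might look like. n8n items carry a `json` part and an optional `binary` part; the file ID, file name, MIME type, and text below are made up for illustration, and the binary property name `data` and the base64 payload are assumptions about the default in-memory setup, not something shown in the video.

```python
import base64

# One hypothetical item as it might leave the "get content" node.
# The ID, name, and file text are invented for illustration.
item = {
    "json": {"id": "FILE_ID_PLACEHOLDER", "name": "Reservation Policy"},
    "binary": {
        "data": {  # "data" is assumed to be the binary property name
            "fileName": "Reservation Policy.txt",
            "mimeType": "text/plain",
            "data": base64.b64encode(
                b"Reservations are held for 15 minutes."
            ).decode("ascii"),
        }
    },
}


def file_text(item: dict, prop: str = "data") -> str:
    """Decode the base64 payload stored under the given binary property."""
    payload = item["binary"][prop]["data"]
    return base64.b64decode(payload).decode("utf-8")


print(sorted(item["json"]))  # only metadata (id, name) lives in the JSON part
print(file_text(item))       # the actual document content lives in the binary part
```

This is why the document loader later has to be pointed at the binary data: a loader reading only the `json` part would embed nothing but IDs and names.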

But the binary data that we're getting is the actual content. That's important to know for future steps, so just keep that in mind: this is binary data. Now what we want to do is add a loop. We're doing it this way because I have 14 files, but if you did have hundreds and hundreds, I don't know if you'd be able to just take all of them and put them straight into Pinecone. I haven't tested with hundreds, but you definitely could do that; I've done that in the

past where you don't loop, and it's fine. But for the purpose of this video, let me just show you how the loop is going to work. It's going to be taking batch sizes of one, which basically means it's going to grab the first file right here and run it through Pinecone once we configure that. Once it's put that first file into Pinecone, it's going to come back here, grab the second file, put it in Pinecone, then grab the third

one, put it in Pinecone. So that's how it's going to work, and we'll see all of that play out. We've got the loop node here, we'll get rid of this, and we're going to set up the actual loop. In here, what we want to loop is the Pinecone vector store, so we're going to add a document. Set up your credential, which is an API key. In Pinecone you've got your indexes, and then you have your API keys right here; you'll just copy this value

and paste it back in n8n; that's all you're going to do there. For the operation, as you can see, we can retrieve, update, or insert. Right now we're just setting up our database, so we're going to insert. And then in Pinecone, I don't know if you saw, but my index was called sample, so we're going to grab sample. Finally, you have the option to add this to a namespace if you want. I think it's beneficial to keep all your information organized; it can also help your agent find

things quicker if it knows which namespace to look in within each index. We're going to be calling this namespace restaurant, because it's information pertaining to the restaurant. Now we're good to configure the rest of this node and then we can test it out. We're going to grab an embeddings node and set up the credential. When we set up our index called sample, we set it up with the text-embedding-3-small embedding model, so we're going to choose text-embedding-3-small. And then this is where, as I was saying, we need to keep in mind that the information

we're getting from each file is binary. So we're setting up our document loader; we've got the default document loader right here. For the type of data coming in, we want to change this from JSON to binary, because the JSON, if you remember, was only the ID of the file and the name of the file, but the binary data is going to be the actual content that it finds within each document. So we've got binary, we can leave the rest as is, and then finally all we need to do is add a text splitter down here,

character text splitter. We're going to leave the chunk size... or sorry, actually, I don't want to use character, I want to use recursive. As you can see, it says: split text into chunks by characters recursively, recommended for most use cases. It's going to keep the context of what's going on, so if text is getting chunked in between sentences, it'll kind of keep those related. That's just better for giving the agent context of what's going on, rather than just a ton of information with something like a

token splitter. So we're going to do recursive here. We'll just keep the chunk size, so it's going to grab 1,000 characters at a time and it's not going to overlap anything; we're going to keep that as is. We'll hit save, and then finally we need to make it loop back. As it stands, after it grabs the first item and goes through here it would just end, so we need to grab this output and connect it back to the loop so that it keeps going until all 14 are done. So let's test this out real quick.
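For anyone curious what recursive splitting means in practice, here is a minimal Python sketch of the idea. It is not the n8n or LangChain implementation; the function name `recursive_split`, the separator order, and the chunk size of 1,000 with no overlap are assumptions chosen to mirror the settings described above. It tries coarse separators first (paragraphs), and only falls back to finer ones (lines, words, single characters) when a piece is still too long.

```python
def recursive_split(text, chunk_size=1000, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters, no overlap."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too long: retry with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaaa bbbb cccc", chunk_size=9))  # ['aaaa bbbb', 'cccc']
```

Because whole paragraphs and words are kept together where possible, each chunk tends to stay readable on its own, which is the context-keeping behavior the recursive option is recommended for.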

As you can see, it's going to search the folder, it got all the IDs, and now it's going to grab the content. This is again going to take maybe 30 or 40 seconds, so we'll give it a sec, but then we'll see it actually loop through. Okay, that just finished. Now it's got the first item and it's embedding it, then the second item, third, fourth, fifth. So as you can see, it's going through and doing that for each one. This is the way you want to do it: if you have, say, 100 documents within this folder, it'll grab all

100 here, and then it's going to loop through and embed each one into your vector database. So that's done. Let's head over to Pinecone. This is our index called sample; we'll click into it and we can see the data that's come through. We've got our namespace called restaurant, and it's got 14 vectors in there, so we're good to go. Now we can hop back into n8n and quickly set up a super simple agent to talk to our restaurant information and see what's coming back. So, new workflow here, and this is

going to be a demo restaurant agent. Once again we need to set up the first step, which is a trigger, and since we're going to be talking to this agent, we're going to set up a chat message received trigger. Then we're going to come in here and add the actual agent itself, so AI agent, and let's just keep this one as a tools agent for now. We can do a simple prompt once we get things set up. For the chat model I'm going to be using for this agent, I'm pretty loyal to OpenAI. I've been

using GPT-4o pretty much for everything, unless maybe it's a smaller task like labeling things, but 4o for reasoning and for conversational things. So 4o here. For memory, I'm going to grab the window buffer memory. It's super easy to set up: you literally just click on it and it's going to give your agent context of the conversation. The context window length is going to be five, which is how many chats it will remember. If you don't set this up it'll still work, but you won't be able to reference a previous question; it's just going

to reset its memory completely after each question-and-answer exchange. So we've got our memory, we've got the model, and now we just need to add the tool, which in this case is a vector store. We've got our vector store tool and we'll just call it data. The description for this tool is going to be: call this tool to, let's say, access the database to answer the user's question. Okay, so that's good to go. We need to add the model real quick, once again OpenAI, and we're going to grab 4o. Then we need to set

up the vector store. We did Pinecone, it's right here, so set up the credentials. This time we're not inserting, obviously; we're going to be retrieving for the agent. So we've got that, and now we finally just need to set up the embedding. Once again we did text-embedding-3-small, so we're going to come in here and grab text-embedding-3-small real quick, and we should be good to go. What else did we forget here? Oh, the actual index, of course. The index is sample, and we need to make sure it's pulling from the namespace

of restaurant, and make sure that it's spelled the same way it was before. So restaurant, restaurant, okay. Restaurant's a tough word to spell; I always have to Google it. So we're good here. All right, I'm just going to give this guy a super quick prompt and I'll be back in a sec. So I just came in here and said: you're a restaurant assistant, your job is to answer questions from the staff about the menu, feedback, policies, hours of operation, etc. All this information can be found in the vector store tool, so call

that tool each time you are asked a question to ensure you are providing the staff with accurate information. Please be friendly and throw in some jokes and emojis. This is not the optimal way to prompt an agent, obviously, but for the sake of this video I just wanted to show you guys how to get a ton of documents into Pinecone in one fell swoop rather than doing it all manually, so this should suffice in this case. Let's talk to this agent and see what it's got for us. So, what's on the

menu? Let's just try that. I know it's really simple, but let's see what it says. Okay: so here's the delicious lineup we've got for you today. We've got appetizers, and it even gives us the price. We've got courses, tiramisu, and a heads up for cheesecake lovers: cheesecake is not available today. Feast your eyes and your taste buds. So it gives us information, it's friendly, and it threw in some emojis. Let's see what else we can ask it. Let me just go back to these documents: we've got special events, loyalty programs. Let's ask about some customer reviews. So, what

are the customers thinking about us? Okay: so here's what our fabulous customers are saying about us. John Doe was impressed with Michael Brown, our waiter, praising him for being attentive and polite. Jane Smith gave a shout-out to Sarah Williams, our chef, for preparing the steak exactly as she wanted. And then at the end it says: looks like we're serving satisfaction with every dish and service, keep up the great work, team. Nice. Let's just do one more real quick. I don't know, let's ask about the suppliers. So, who are

our suppliers? Okay, so yeah, I know this is a simple example with simple data, but it just goes to show how quick it is to get something like this set up, and once you have it set up, how you can expand on it. But let's see: our culinary creators are supported by some top-notch suppliers. We've got Fresh Farm Produce, Laura Green; we've got Prime Meats from John Carter; Baker's Best. Lots of emojis, very colorful. With this dream team, it's no wonder our dishes are always a hit. Okay, so as you can see,

it's doing what we wanted: it's friendly and it's giving emojis, and that was just a really simple prompt. It's working as it should, grabbing all that information from the vector store tool, which we set up really quickly; we put 14 documents in there with a loop. So I hope that answered your questions about how you can get lots of documents, rather than just one, into Pinecone. The things to keep in mind: first, when you're searching through the folder, remember it's not actually grabbing content, so you need

to add another node to grab the content. And then, with the content coming through, make sure you know if it's binary or if it's going to be stored in JSON; that way you're embedding it correctly and you're actually getting the information you want. So that's all I've got for you guys today. I really appreciate all the support. Like I said, I'm excited to keep making videos for you all, so let me know what you want to see, and thanks, guys.
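As a recap of the ingestion workflow from this video, here is a rough Python sketch of the same flow: take the search results (ID and name only), download the content for each ID, then loop over the documents in batches of one and insert each into the store. Every name here (`batches`, `ingest`, `fake_drive`, `fake_store`) is hypothetical, and the download, split, and insert steps are passed in as plain functions so that nothing below pretends to be the real Google Drive, n8n, or Pinecone API.

```python
from typing import Callable, Iterator


def batches(items: list, size: int = 1) -> Iterator[list]:
    """Yield consecutive batches, mirroring a loop with a batch size of one."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def ingest(
    files: list[dict],
    download: Callable[[str], str],
    split: Callable[[str], list[str]],
    insert: Callable[[list[str], str], None],
    namespace: str = "restaurant",
) -> int:
    """Download every file, then split and insert them one at a time."""
    # The search step only returns id and name, so content is fetched by ID.
    documents = [{"name": f["name"], "text": download(f["id"])} for f in files]
    total = 0
    for batch in batches(documents, size=1):
        for doc in batch:
            chunks = split(doc["text"])
            insert(chunks, namespace)
            total += len(chunks)
    return total


# Stand-ins for Google Drive and the vector store (made-up data).
fake_drive = {"a1": "Menu: soup, steak, tiramisu", "b2": "Staff: Michael Brown, waiter"}
fake_store: list[tuple[str, str]] = []

files = [{"id": "a1", "name": "Menu"}, {"id": "b2", "name": "Staff"}]
count = ingest(
    files,
    download=fake_drive.__getitem__,
    split=lambda text: [text],  # one chunk per file, for simplicity
    insert=lambda chunks, ns: fake_store.extend((ns, c) for c in chunks),
)
print(count, len(fake_store))
```

Swapping the stand-ins for real download, splitting, and embedding calls is left open on purpose, since those details depend on your own accounts and setup.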