In recent videos we looked at what RAG, or retrieval augmented generation, is at a high level, and then we explored one of the core components of that: embeddings. In this video, let's put those two elements together to actually build a very simple RAG system. We'll approach this in a few different ways. First, we'll start with a list of the tasks that we need to accomplish. Then we'll build the system using Python, and build the same thing in TypeScript. Finally, we'll take a really brief look at some of the different tools that include RAG, so that you don't even have to do any coding.

Welcome back to the Ollama course. This is a free course, available here on this YouTube channel, that will teach you everything you need to know about how to use Ollama to run artificial intelligence models locally on your laptop, on your computer, or even on an instance that you own up in the cloud. If you find this content interesting, be sure to subscribe to ensure you see the next entry in the course when it comes out.

So what are the different tasks that we need to solve to achieve
RAG? First, we need to set up the environment. Next, we need to bring text into the database. Then we need to find the relevant document chunks from the database, and when we do that, we will populate a prompt and send it to the model to generate an answer.

So, task one: set up the environment. The big thing here is the vector store. There are a lot of ways to go with this, and no matter what I choose, someone is going to be upset with my choice. You can choose to go with a self-hosted database or a hosted one. For hosted, there's Supabase, which is amazing, and there are so many others. For self-hosted, there are also so many, but my go-to a lot of the time is Chroma. It's just the easiest one to work with, and it does all the basics. There are a number of ways to run Chroma, but I think the best and easiest is to run it as a Docker container. If you have Docker up and running, just run: docker run -d -p 8000:8000 -v ./chroma-data:/chroma/chroma chromadb/chroma
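If you'd rather manage the container with Docker Compose, a roughly equivalent service definition might look like the sketch below. This is an assumption based on the flags above; in particular, the container-side data path (/chroma/chroma here) should be double-checked against the current Chroma image documentation.

```yaml
services:
  chroma:
    image: chromadb/chroma
    ports:
      - "8000:8000"        # expose the Chroma API on localhost:8000
    volumes:
      - ./chroma-data:/chroma/chroma   # persist the database on the host
```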
Now we have Chroma up and running and can use it in different projects, and we are writing data to a volume, so we can update the database without losing our work. Sweet.

So now we need to get text into the database. I have a directory of sample content in my videoprojects repo called scripts; there are a bunch of scripts for my YouTube videos. The code for this project is in 2024-09-10-build-rag in that same videoprojects repo, which you can find at github.com/technovangelist/videoprojects.

Let's start with Python. To add chunks to Chroma, we first need to create a collection, and then we can add either a single document or a bunch of them all at once. Each document should include a unique ID, the source text of the document, the embedding, and the metadata, which in our case is just stating the source file name. To read the file contents, I have a function that takes a path and then returns a dictionary where the keys are file names and the values are the text in that file. Then it chunks up the text, so I made a function that takes the text and returns a list of chunks for that file. Finally, since Ollama's embed can work with a single chunk of text or a list of chunks, we can call the embed function once and get back the entire list of embeddings. So to use these functions, we set up the Chroma client with a collection, and then for each file we create the chunks and embeddings, generate some IDs and the metadata, and add everything to the database all at once. To run it, run python importdocs.py.
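Sketched in Python, the import flow described above might look roughly like this. This is a sketch, not the repo's actual code: the collection name buildrag, the nomic-embed-text embedding model, and the directory path are assumptions you'd adjust to your setup, and running import_docs needs the chromadb and ollama packages plus the Chroma container and Ollama server running.

```python
import os
import uuid

def read_files(path: str) -> dict[str, str]:
    """Return a dict mapping each file name in path to its text contents."""
    contents = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            with open(full, encoding="utf-8") as f:
                contents[name] = f.read()
    return contents

def chunk_text(text: str, words_per_chunk: int = 100) -> list[str]:
    """Split text into chunks of roughly words_per_chunk words each."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]

def import_docs(path: str) -> None:
    # Third-party imports are deferred so the pure helpers above can be
    # used and tested without these packages or services installed.
    import chromadb
    import ollama

    client = chromadb.HttpClient(host="localhost", port=8000)
    collection = client.get_or_create_collection("buildrag")
    for filename, text in read_files(path).items():
        chunks = chunk_text(text)
        # ollama.embed accepts a list of strings and returns one
        # embedding per chunk, in the same order.
        embeddings = ollama.embed(model="nomic-embed-text", input=chunks).embeddings
        collection.add(
            ids=[str(uuid.uuid4()) for _ in chunks],
            documents=chunks,
            embeddings=embeddings,
            metadatas=[{"source": filename} for _ in chunks],
        )

# With Chroma and Ollama running, you would call: import_docs("./scripts")
```

Because embed, like collection.add, takes the whole list at once, there is no need to loop over chunks one at a time.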
How can we verify that something got into the database? Well, with Chroma we can visit the URL and tack on the docs endpoint, so http://localhost:8000/docs. Find the GET /api/v1/collections endpoint and try running it by clicking Try it out and then Execute. We should see a response body with info about our collection. Copy the ID, then scroll down to the POST /api/v1/collections/{collection_id}/get endpoint, click Try it out, and paste in the collection ID. Remove everything above limit in the request body and then execute. Then scroll down, and we
should see everything in the database.

Now let's move on to doing the same thing with TypeScript. I'll use Deno for this, which is just a great way to work with TypeScript. You can see I'm basically doing the same thing I did in Python. Here's the function to read the text files, then here we chunk up the text, and then this embeds each chunk. Then, to use it, create the Chroma client, call each of the functions, and add everything to the database. To run that, run deno task import. In this pair of examples, I
chunk the source text by 100 words. There are a lot of variations you could have done here; I talked about a lot of the options in the previous video on embeddings. Try some of them out and see what works better for you and your data.

Now we can do a query to the database. This time, let's use TypeScript first. Create the Chroma client and grab the collection, get the question from the command line, and create the embedding. Now query the database using that embedding, and then generate our prompt. Now make the call to the generate endpoint and print out the results. I'm printing out results without and with RAG, so we can see the benefits. Also note that you can use the chat endpoint instead of generate. They both do essentially the same thing; generate is easier to work with, while chat offers the benefit of being able to tweak the history if you want to update it. To run the search, run deno task search "what is ollama", and we should see one answer without using the scripts as input and a second answer with RAG. We can also look at the
Python version and see that it's pretty much the same thing: create the client, point to the collection, embed the query, get the related chunks from the database, build the prompt, query the model, and print out the results. And that's it. In all the examples, you'll notice I use the native Ollama API and not the OpenAI-compatible API. You're always going to have a better experience using the native API, and you should always reach for it unless you have a specific need for that alternative.
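The query flow just described might be sketched in Python like this. Again, this is a sketch under assumptions, not the repo's code: it reuses the hypothetical buildrag collection and nomic-embed-text model from the import sketch, and the llama3.1 generation model is a placeholder for whatever model you have pulled.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Stuff the retrieved chunks and the user's question into one prompt."""
    context = "\n\n".join(chunks)
    return ("Use only the following context to answer the question.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}")

def search(question: str, n_results: int = 5) -> str:
    # Deferred imports, as in the import sketch, so build_prompt stays
    # usable without these packages or services installed.
    import chromadb
    import ollama

    client = chromadb.HttpClient(host="localhost", port=8000)
    collection = client.get_collection("buildrag")
    # Embedding a single string still yields a list of embeddings,
    # which is the shape query_embeddings expects.
    embedding = ollama.embed(model="nomic-embed-text", input=question).embeddings
    results = collection.query(query_embeddings=embedding, n_results=n_results)
    # results["documents"] holds one list of matching chunks per query.
    prompt = build_prompt(question, results["documents"][0])
    return ollama.generate(model="llama3.1", prompt=prompt).response

# With Chroma and Ollama running: print(search("what is ollama"))
```

To see the with/without comparison from the video, you could also call ollama.generate once with the bare question and once with the built prompt, and print both.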
So, as you can see, it's pretty easy to build out a RAG solution, but you can also just implement one of the many tools out there that do RAG super easily. My favorites are Open WebUI and Msty. Page Assist is another great tool with RAG built in, and then AnythingLLM is also popular. You can find a list of tools on the Ollama GitHub repo; scroll down toward the end. A lot of these tools have some sort of RAG solution built in. What do you think? Are you ready to build a RAG solution yourself, or would you prefer to use one of the pre-built solutions?
In some of the upcoming videos in the Ollama course, we'll look at those RAG solutions in a bit more detail. I hope you're finding these videos useful and that they're helping you get better with Ollama. Be sure to subscribe to the channel to find out about new videos as they come out. Thanks so much for being here. Goodbye.