RAG Langchain Python Project: Easy AI/Chat For Your Docs

pixegami
Learn how to build a "retrieval augmented generation" (RAG) app with Langchain and OpenAI in Python....
Video Transcript:
Hey everyone, welcome to this video where I'm going to show you how to build a retrieval augmented generation app using Langchain and OpenAI. You can then use this app to interact with your own documents or your own data source. This type of application is great for when you have a lot of text data to work with.
For example, a collection of books, documents, or lectures. And you want to be able to interact with that data using AI. For example, you might want to be able to ask questions about that data, or perhaps build something like a customer support chatbot that follows a set of instructions.
Today, we're going to learn how to build this using OpenAI and the Langchain library in Python. We're going to be using a technique called RAG, which stands for retrieval augmented generation. In this example, the data source I've given it is the AWS documentation for Lambda.
And here I'm asking it a question based on that documentation. The agent will be able to use that documentation to give me a response as well as quote the source where it got that information from originally. This way, you always know that it's using data from the sources you provided it with rather than hallucinating the response.
If this project sounds complex or difficult to you, then don't worry because it's a lot easier than you think. I'll walk you through every step of the project, starting with how to prepare the data that you want to use and then how to turn that into a vector database. Then we'll also look at how to query that database for relevant pieces of data.
Finally, you can then put all those pieces together to form a coherent response. If that sounds good, then let's get started. To begin, we'll first need a data source like a PDF or a collection of text or markdown files.
This can be anything. For example, it could be documentation files for your software. It could be a customer support handbook, or it could even be transcripts from a podcast.
First, find some markdown files you want to use as data for this project. But if you want some ideas, then here I've got the Alice in Wonderland book as a markdown file, or I also have the AWS documentation as a bunch of markdown files. And I have each of them in their own separate folder under this data folder in my project.
So make sure you have a setup like this first before you start. Once you have that source material, we're going to need to load it up and then split it into different chunks of text. To load some markdown data from your folder into Python, you can use this directory loader module from Langchain.
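Here's a minimal sketch of that loader, assuming the markdown files live under data/books and a Langchain version where DirectoryLoader is importable from langchain.document_loaders (on newer versions it lives in langchain_community instead):

```python
from langchain.document_loaders import DirectoryLoader

DATA_PATH = "data/books"  # change this to wherever your markdown files live

def load_documents():
    # Load every .md file in the folder; each file becomes one Document
    loader = DirectoryLoader(DATA_PATH, glob="*.md")
    documents = loader.load()
    return documents
```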
Just update this data path variable with wherever you've decided to put your data. Here I'm using data/books. If you only have one markdown file in that folder, it's okay.
Or if you have multiple markdown files, then this will load everything and turn each of those files into something called a document. If I use this piece of code on my AWS Lambda documents folder instead, then each of these markdown files will become a document. And a document is going to contain all of the content on this page.
So basically all of the text you see here. And it's also going to contain a bunch of metadata. For example, the name of the source file where the text originally came from.
And after you've created your document, you can also choose to add any other metadata you want to that document. Now the next problem we encounter is that a single document can be really, really long. So it's not enough that we load each markdown file into one document.
We also have to split each document if it's too long on its own. With something as long as this, we're going to want to split this big document into smaller chunks. And a chunk could be a paragraph, it could be a sentence, or it could even be several pages.
It depends on what we want. By doing this, the outcome that we're looking for is that when we search through all of this data, each chunk is going to be more focused and more relevant to what we're looking for. To achieve this, we can use a recursive character text splitter.
And here we can set the chunk size in number of characters and then the overlap between each chunk. So in this example, we're going to make the chunk size about 1000 characters, and each chunk is going to have an overlap of 500 characters. So I've just run the script to split up my text into several chunks.
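A minimal sketch of that splitter might look like this (the chunk size and overlap are just the values used here; tune them for your own data):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_text(documents):
    # Split each document into chunks of roughly 1000 characters,
    # with 500 characters of overlap between neighbouring chunks.
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=500,
        length_function=len,
        add_start_index=True,  # record where each chunk starts in its source
    )
    chunks = text_splitter.split_documents(documents)
    print(f"Split {len(documents)} documents into {len(chunks)} chunks.")
    return chunks
```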
And here I've printed out the number of original documents and the number of chunks it was split into. Since I used this on the Alice in Wonderland text, it split one document into 282 chunks. And down here, I've just picked a random chunk as a document and I printed out the page content and the metadata so you could see what it looks like.
So the page content is literally the piece of text that went into that chunk. Here you can see that it's about one or two paragraphs of the story. And the metadata right now only has the source, which is the path of the file this came from, and the start index.
So where in that source does this particular chunk begin? And if you try the same code with the AWS Lambda docs instead, you'll see that the source also points to the file that each chunk's information came from. So this is also useful if you have a lot of different files, rather than just one big file split into smaller chunks.
To be able to query each chunk, we're going to need to turn this into a database. We'll be using ChromaDB for this, which is a special kind of database that uses vector embeddings as the key. This is the code that you can use to create a Chroma database from our chunks.
For this, you're going to need an OpenAI account because we're going to use the OpenAI embeddings function to generate the vector embeddings for each chunk. I'm also going to create a Chroma path and set that as the persistent directory so that when we create this database, I have a bunch of folders on my disk that I can use to load the data later on. This is useful because normally I might want to put this database into a Lambda function or I might want to put it in the cloud somewhere.
So I want to be able to save it to disk so that I can copy it or deploy it as a file. Now before I create the database or before I save it to disk, I can also use this code snippet to remove it first if it already exists. This is useful if I want to clear all of my previous versions of the database before I run the script to create a new one.
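Putting that together, a sketch of the database-creation step might look like this (CHROMA_PATH is just the folder name assumed here, and the imports may differ slightly depending on your Langchain version):

```python
import os
import shutil

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.chroma import Chroma

CHROMA_PATH = "chroma"  # where the database files will be written

def save_to_chroma(chunks):
    # Clear out any previous version of the database first.
    if os.path.exists(CHROMA_PATH):
        shutil.rmtree(CHROMA_PATH)

    # Embed each chunk with OpenAI and write the vectors to disk
    # (requires OPENAI_API_KEY to be set in your environment).
    db = Chroma.from_documents(
        chunks, OpenAIEmbeddings(), persist_directory=CHROMA_PATH
    )
    db.persist()  # force a save to disk
    print(f"Saved {len(chunks)} chunks to {CHROMA_PATH}.")
```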
Now the database should save automatically after we create it, but you can also force it to save using this persist method. So once you've put all of that together and then you've run your script to generate your database, you should see this line where it's saved all of your chunks to the Chroma database. And you can see here on your disk that the data should be there as well.
And here it's going to be saved as a SQLite3 file. So now at this point we have our vector database created and we're ready to start using it. But first you're probably going to want to know what a vector embedding is.
If you already know what embedding vectors are, then feel free to skip the section entirely. Otherwise, I'll give you a really quick explanation just to bring you up to speed. Embeddings are vector representations of text that capture their meaning.
In Python, this is literally a list of numbers. You can think of them as coordinates in a multi-dimensional space, and if two pieces of text are closely related in meaning, then their coordinates will also be close together. The distance between these vectors can then be calculated pretty easily using cosine similarity or Euclidean distance.
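Just to make that concrete, here's roughly what the math looks like in plain numpy (this is only for intuition; it isn't part of the project code):

```python
import numpy as np

def cosine_distance(a, b):
    # 0 means the vectors point in the same direction (very similar meaning);
    # values closer to 1 mean the texts are less related.
    a, b = np.array(a), np.array(b)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```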
We don't need to do that ourselves though, because there are plenty of existing functions that can do it for us already. And this will give us a single number that tells us how far apart these two vectors are. To actually generate a vector from a piece of text, we'll need a model, like OpenAI's embedding API.
And this is usually just an API or a function we can call. For example, you can use this code to turn the word "apple" into a vector embedding. And this is the result I get from using that function.
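As a rough sketch of what that call looks like (assuming OPENAI_API_KEY is set in your environment and a Langchain version where OpenAIEmbeddings lives under langchain.embeddings):

```python
from langchain.embeddings import OpenAIEmbeddings

# Turn a single word into an embedding vector (a plain Python list of floats)
embedding_function = OpenAIEmbeddings()
vector = embedding_function.embed_query("apple")
print(vector[:3])   # the first few numbers of the vector
print(len(vector))  # 1536 dimensions for OpenAI's default embedding model
```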
So you can see that the vector here is literally a really long list of numbers. And the first number is 0.007-something, but I've truncated the rest because the list is quite long.
In fact, if you print the length of the vector, you can see that the list has 1536 elements. So this is basically a list of about one and a half thousand numbers. The numbers themselves aren't that interesting, though.
What's really interesting is the distance between two vectors. We don't have to wire that up from scratch, because Langchain gives us a utility function to compare the embedding distance directly using OpenAI. It's called an evaluator, and this is how you can create one.
And here's the code to run an evaluation. So here I'm comparing the distance of the word "apple" to the word "orange". Running this, the result is a score of 0.13. Now, we don't actually know whether that's good or not just from comparing an apple to an orange, because we don't know where 0.13 sits on the scale relative to other words.
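Here's a minimal sketch of those two steps, assuming a Langchain version that still exposes load_evaluator (the exact score you get will depend on the embedding model):

```python
from langchain.evaluation import load_evaluator

# Create an evaluator that measures the distance between two embeddings
# (this calls OpenAI under the hood, so OPENAI_API_KEY must be set).
evaluator = load_evaluator("embedding_distance")

# Compare two words; a smaller score means the meanings are closer together.
result = evaluator.evaluate_strings(prediction="apple", prediction_b="orange")
print(result)  # e.g. {'score': 0.13...}
```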
So let's try a couple of other words just to see what's a better match with "apple" than "orange", and what's a worse match. Here, if I compare "apple" to the word "beach", the score is actually 0.2. So "beach" is further away from "apple" than "orange" is, I suppose because an orange is also a fruit, which naturally makes it more similar to an apple. Now if I compare the word "apple" to itself, this should technically be 0 because it's literally the same word.
But in this case, it's not exactly zero, just very close: about 2.5 x 10^-6.
Now what about if we compare the word "apple" to "iPhone"? In this case, the score is even better than when we compared it with "orange": it's 0.09. And this is really interesting as well, because in our first example with apples and oranges, they were both fruits, so they were similar in that respect. But here, we're sort of interpreting the word "apple" from a different perspective.
We're seeing it as the name of the company Apple instead. So when you compare it with the word "iPhone", the association is actually much stronger. So now that you understand what embeddings are, let's see how we can use them to fetch data.
To query for relevant data, our objective is to find the chunks in our database that will most likely contain the answer to the question we want to ask. To do that, we'll need the database we created earlier, and we'll need the same embedding function that we used to create it. Our goal now is to take a query, like the one on the left here, turn that into an embedding using the same function, and then scan through our database to find maybe five chunks of information that are closest in embedding distance to our query.
So here, in this example, I might ask a question like, "How does Alice meet the Mad Hatter in Alice in Wonderland?" And when we scan our database, we might get maybe four or five snippets of text that we think are similar to this question. From that, we can put it all together, have the AI read all of that information, and decide what response to give to the user.
So we're not just returning the chunks of information verbatim; we're using them to craft a more custom response that is still based on our source information. To load the Chroma database that we created, we're first going to need the path, which we have from earlier, and we're going to need an embedding function, which should be the same one we used to create the database in the first place. So here, I'm just going to use the OpenAI embeddings function again.
This should load your database from that path. If it doesn't, check that the path exists, or go back to the previous chapter and run the script to create the database again. Once the database is loaded, we can then search for the chunks that best match our query by using this method.
We need to pass in our query text as an argument and specify the number of results we want to retrieve. In this example, we want to retrieve the three best matches for our query. The results of the search will be a list of tuples, where each tuple contains a document and its relevance score.
Before actually processing the results though, we can also add some checks. For example, if there are no matches, or if the relevance score of the first result is below a certain threshold, we can return early. This helps us make sure that we actually have good, relevant information before moving to the next step of the process.
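Here's a rough sketch of the loading and querying steps put together (the 0.7 relevance threshold is just an example value, not anything Langchain prescribes):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.chroma import Chroma

CHROMA_PATH = "chroma"  # the same folder we saved the database to earlier

query_text = "How does Alice meet the Mad Hatter?"

# Load the database with the same embedding function used to build it.
embedding_function = OpenAIEmbeddings()
db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

# Find the three chunks whose embeddings are closest to the query.
results = db.similarity_search_with_relevance_scores(query_text, k=3)
if len(results) == 0 or results[0][1] < 0.7:
    print("Unable to find matching results.")
```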
So now let's go to our code editor and put all that together and see what we get. So here I've got the main function. I just made a quick argument parser so I can input the query text in the command line.
I've got my embeddings function and I'm going to search the database that I've loaded and I'm just going to print the content for each page. So I'm going to find the top three results for my query. So that's my script.
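For reference, a small self-contained version of that query script could look something like this (the file name and argument name are just placeholders I've chosen):

```python
import argparse

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.chroma import Chroma

CHROMA_PATH = "chroma"

def main():
    # Read the query text from the command line,
    # e.g. python query_data.py "How does Alice meet the Mad Hatter?"
    parser = argparse.ArgumentParser()
    parser.add_argument("query_text", type=str, help="The query text.")
    args = parser.parse_args()

    # Load the database and print the content of the top three matches.
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=OpenAIEmbeddings())
    results = db.similarity_search_with_relevance_scores(args.query_text, k=3)
    for doc, score in results:
        print(doc.page_content)
        print(f"(relevance score: {score})\n")

if __name__ == "__main__":
    main()
```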
Let's give it a go. So here I'm running my script with the query "How does Alice meet the Mad Hatter?" Here it's returned the three most relevant chunks in the text that it thought best matched our query.
So we have this piece of information, this piece of information, and then this one here. Now, the chunk size here is quite small, so each chunk doesn't have the full context of that part of the text. If you want to adjust that, you can play with the chunk size variable and make it either bigger or smaller, depending on what you think will give you the best results.
But for now, let's move on to the next step and see if we can get the AI to use this information and give us a direct response. Now that we have found relevant data chunks for our query, we can feed this into OpenAI to create a high quality response using that data as our source. First, we'll need a prompt template to create a prompt with.
You can use something like this. Notice that there are placeholders in this template. The first is the context that we're going to pass in.
So that's going to be the pieces of information that we got from the database. And the second is the query itself. Next, here's the code to create the actual prompt by formatting the template with our keys.
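A sketch of the template and formatting step, assuming the results and query_text variables from the search step above:

```python
from langchain.prompts import ChatPromptTemplate

PROMPT_TEMPLATE = """
Answer the question based only on the following context:

{context}

---

Answer the question based on the above context: {question}
"""

# Join the retrieved chunks into one context string, then fill in the template.
context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = prompt_template.format(context=context_text, question=query_text)
print(prompt)
```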
So after running this, you should have a single string. It's going to be quite a long string, but it's going to be the entire prompt with all the chunks of information and the query that you asked at the beginning. After running that piece of code, you should get a prompt that looks something like this.
So you're going to have this initial prompt, which is to answer the question based on the following context. And then we're going to have our three pieces of information. And this can be as big or as little as we want it to be, but here this is what we've chosen.
And then the question that we originally asked. So here's our query. How does Alice meet the Mad Hatter?
So this is the overall prompt that we're about to send to OpenAI. This is actually the easy part. Simply call the LLM of your choice with that prompt.
So here I'm using ChatOpenAI, and then you'll have your response. Finally, if you want to provide references back to your source material, you can also find that in the metadata of each of those document chunks. So here's the code on how you can extract that and print it out as well.
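A sketch of that last step, again assuming the prompt and results variables from above (the exact model class and method names vary between Langchain versions):

```python
from langchain.chat_models import ChatOpenAI

# Send the assembled prompt to the chat model and get a text response back.
model = ChatOpenAI()
response_text = model.predict(prompt)

# Pull the source file of each chunk out of its metadata for attribution.
sources = [doc.metadata.get("source", None) for doc, _score in results]
print(f"Response: {response_text}\nSources: {sources}")
```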
And going back to our code editor, this is what my script looks like with all of those pieces put together. So I've got my prompt template here. I've got my main argument here, which takes the query, searches the database for the relevant chunks, creates the prompt, and then uses the LLM to answer the question.
And then here I'm collecting all the sources that were used to answer the prompt and printing out the entire response. Let's go ahead and run that. And here's the result of running that script.
So again, we see the entire prompt here, and this is the final response. The response is that Alice meets the Mad Hatter by walking in the direction where the March Hare was said to live. And obviously it took this from the first piece of the context.
And here we also have a list of the source references that it got it from. This is pretty much pointing to the same file because I only made it print out the actual file itself and not the index. But this is pretty good already because you can see how it's using our query to search for pieces of information from our source material and then answer based on that information.
Now let me switch up my data source and show you a different example just so you can see what else you can do with something like this. I switched my database to one I prepared earlier, which uses the AWS Lambda documentation as a source. And here the query I'm going to ask it is what languages or runtimes does AWS Lambda support?
So after I ran this, you can see that the chunks used here are much bigger than in the previous example, but it still managed to find three relevant chunks of information, and it's produced a response that summarizes that information. So here it says AWS Lambda supports Java, C#, Python, and so on. You can read the rest here.
But this is more interesting because, unlike in the first example, the sources were actually from different files. You can see here that each of the sources is its own file. So this is also useful if you have a data source that's spread out across a lot of different files and you want to reference the sources. So we just covered how you can use Langchain and OpenAI to create a retrieval augmented generation app. I'll post a link to the GitHub code in the video description, and I encourage you to try this out for yourself with your own data set.
If you want to see more tutorials like this, then please let me know what type of topics you'd be interested to see next. Otherwise, I hope you found this useful and thank you for watching.