Graph RAG: Improving RAG with Knowledge Graphs

Prompt Engineering
Discover Microsoft’s groundbreaking GraphRAG, an open-source system combining knowledge graphs with ...
Video Transcript:
Graph RAG works great, but there was one major issue and that is the cost. Microsoft just open sourced GraphRAG, a system that they presented almost a year ago. This is a groundbreaking system that combines knowledge graphs with Retrieval Augmented Generation or RAG.
And the goal is to address some of the limitations of current RAG systems. The code is available on GitHub and you can start using it in your own projects right now. You can use this with both proprietary models like GPT-4 and local models like Llama 3.
In this video, I'm going to show you how GraphRAG works and then guide you through setting it up on your local machine to run some example tests. We will also take a look at the cost implications of a run. But before we dive into GraphRAG, let's first understand the motivation behind it by looking at the traditional RAG approach.
Traditional RAG is a method where the language model retrieves relevant documents from a large corpus to generate more accurate and contextually relevant responses. There are three steps and here is how it works. In the first step we process the documents and convert them into vectors.
So we take our original documents, we divide them into sub documents using a chunking strategy. We compute embeddings for each of the chunks and then we store the chunks plus the embeddings in a vector store. That becomes our knowledge base.
The next phase is the query phase: the user asks a question, we compute embeddings for that query, do a similarity search over all the vectors present in the vector database, and retrieve the most relevant chunks or sub-documents from our vector store. Then we combine the query with the retrieved context and give it to the large language model to generate a final response. A minimal sketch of both phases follows.
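To make the two phases concrete, here is a minimal, illustrative sketch. The embed() and ask_llm() functions are stand-ins for a real embedding model and chat model; nothing here is any particular library's API.

```python
# Minimal sketch of the traditional RAG pipeline described above.
from typing import List, Tuple
import math

def embed(text: str) -> List[float]:
    # Stand-in embedding: bag-of-words hashed into a fixed-size vector.
    # In practice you would call a real embedding model here.
    vec = [0.0] * 64
    for token in text.lower().split():
        vec[hash(token) % 64] += 1.0
    return vec

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def ask_llm(prompt: str) -> str:
    # Stand-in for a chat-completion call to GPT-4o, Llama 3, etc.
    return f"[LLM response to a {len(prompt)}-char prompt]"

# Indexing phase: chunk the documents, embed each chunk, store (chunk, vector).
def build_index(documents: List[str], chunk_size: int = 300) -> List[Tuple[str, List[float]]]:
    index = []
    for doc in documents:
        words = doc.split()  # chunking by words as a rough proxy for tokens
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            index.append((chunk, embed(chunk)))
    return index

# Query phase: embed the query, retrieve the most similar chunks,
# then hand query + retrieved context to the LLM.
def answer(query: str, index: List[Tuple[str, List[float]]], k: int = 3) -> str:
    qvec = embed(query)
    top = sorted(index, key=lambda item: cosine(qvec, item[1]), reverse=True)[:k]
    context = "\n\n".join(chunk for chunk, _ in top)
    return ask_llm(f"Context:\n{context}\n\nQuestion: {query}")
```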
As you can see, there are three major limitations with this approach. The first is limited contextual understanding: RAG can sometimes miss the nuances in the data due to its reliance on retrieved documents alone. It doesn't have a holistic overview of the document, so it doesn't really understand the overall picture.
The second is scalability: as the corpus grows, the retrieval process can become less efficient. And the third is complexity.
Integrating external knowledge sources in a meaningful way can be complex and cumbersome. With GraphRAG, Microsoft is trying to address some of these issues. Along with the code, Microsoft also released a highly detailed technical report titled "From Local to Global: A Graph RAG Approach to Query-Focused Summarization."
In this section, we are going to look at the technical details of how this works. If you are just interested in using the package, skip to the next section, but I highly recommend sticking around to understand how this whole thing actually works. Here's a quick representation of the approach in the form of a flowchart that I created with the help of Claude 3.5 Sonnet. Just like RAG, there are two parts or phases: the indexing phase and the query phase.
During the indexing phase, we take our original source documents and convert them into sub-documents using a chunking strategy. This step is very similar to traditional RAG approaches, but then, within each chunk, we try to identify different entities.
These entities can be people, places, companies, and so on, depending on the context you're providing. We also look for relationships between these entities across different chunks.
So we do two things in parallel: entity extraction and relationship extraction.
And we use that information to create a knowledge graph. A knowledge graph is basically a set of nodes and edges that preserves the relationships between different entities. Based on the knowledge graph, we create communities, and I'll explain this step in a lot more detail in the subsequent section.
Basically, we detect entities that are closely connected to each other, and then we describe the relationships between these entities or communities at different levels. The paper talks about three different levels of communities, and I'll explain what those are.
For each one of those, we create summaries. Think about it this way: we look at a set of chunks and create summaries for those, then combine them with another set of chunks using a map-reduce approach, create summaries for those, and so on, until we have a holistic overview of whatever is in the set of documents. A rough sketch of this indexing flow follows.
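Here is that rough sketch; extract_entities() and extract_relationships() are hypothetical stand-ins for the LLM-driven extraction prompts shown later in the video, and the response parsing is deliberately simplified.

```python
# Rough sketch of the GraphRAG indexing phase (illustrative only).
from typing import Dict, List, Set, Tuple

def ask_llm(prompt: str) -> str:
    return "[LLM response]"  # stand-in for a chat-completion call

def extract_entities(chunk: str) -> Set[str]:
    # In GraphRAG this is an LLM call with an entity-extraction prompt;
    # splitting the raw response here is a toy parser.
    return set(ask_llm(f"List the entities in:\n{chunk}").split())

def extract_relationships(chunk: str) -> List[Tuple[str, str, str]]:
    # Likewise an LLM call; would return (source, relation, target) triples.
    return []  # left as a stub here

def build_graph(chunks: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    # Adjacency list: entity -> [(relation, other_entity), ...]
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for chunk in chunks:
        for entity in extract_entities(chunk):
            graph.setdefault(entity, [])
        for src, rel, dst in extract_relationships(chunk):
            graph.setdefault(src, []).append((rel, dst))
    return graph

# Communities are then detected over this graph (the paper uses the Leiden
# algorithm) and summarized level by level with further LLM calls.
```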
Now, during the query phase, we take the user query. Then we select the community level, basically deciding what level of detail we want. Think of this again as a retrieval process, but rather than retrieving chunks, you're now retrieving communities. We look at the summaries of those communities, which generate partial responses for us. If multiple communities are involved, we combine those partial responses into a single response, and that becomes the final answer from the model. A minimal sketch of this map-reduce query flow follows.
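And a matching sketch of the global query flow. Again, ask_llm() is a stand-in, and this is my reading of the paper's map-reduce description, not GraphRAG's actual code.

```python
# Minimal sketch of GraphRAG's global (map-reduce) query flow.
from typing import List

def ask_llm(prompt: str) -> str:
    return f"[LLM response to a {len(prompt)}-char prompt]"  # stand-in

def global_search(query: str, community_summaries: List[str]) -> str:
    # Map step: each community summary at the chosen level produces
    # a partial answer to the query.
    partial_answers = [
        ask_llm(f"Community summary:\n{summary}\n\n"
                f"Answer this question using only the summary: {query}")
        for summary in community_summaries
    ]
    # Reduce step: combine the partial answers into one final response.
    combined = "\n\n".join(partial_answers)
    return ask_llm(f"Partial answers:\n{combined}\n\n"
                   f"Combine these into a single final answer to: {query}")
```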
As we will learn in this video, GraphRAG is awesome, but there are still use cases for traditional RAG systems, especially when it comes to the cost of running GraphRAG. If you want to learn about RAG beyond the basics, I have a course dedicated to the topic, in which we start with basic techniques and then go into advanced techniques for building robust RAG systems.
If that interests you, the link is in the video description. Now, back to the video. I hope this gives you a very good understanding of how GraphRAG works.
Now let's set this up on our local machine and we can start experimenting with it. They have provided very detailed instructions on how to get started, so we're going to be using those. So first I'm going to create a conda virtual environment.
I'm going to call it graphrag. Then we need to activate this virtual environment with conda activate graphrag, and our virtual environment is ready to go. Next, we need to install the package, so we're going to use pip install graphrag.
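Put together, the setup steps look like this; pinning python=3.10 is my assumption, since GraphRAG needs a recent Python 3:

```bash
# Create and activate the environment, then install the package
conda create -n graphrag python=3.10   # Python version is an assumption
conda activate graphrag
pip install graphrag
```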
This is going to install the graphrag Python package for us. Okay, next we need to run the indexing process. For that, we need our own dataset.
But before copying the dataset, we're going to create another folder within the current working directory, which as you can see is completely empty. They recommend creating a folder called ragtest and, within that, another folder called input, but you can essentially provide any path that you want.
So what I did here was just create those folders, ragtest/input, and we're going to put our data in there. Next, we need a source document. Currently I think they only support plain text, and they have provided a link to Charles Dickens's book, A Christmas Carol, so we're going to use that as our source of information.
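The folder setup and download look like this; the Gutenberg URL is the one from the GraphRAG getting-started guide:

```bash
# Create the workspace input folder and fetch A Christmas Carol
mkdir -p ./ragtest/input
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt > ./ragtest/input/book.txt
```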
So if I run this command, it will download the text of the book. Here's the Project Gutenberg ebook of A Christmas Carol. I believe they currently support only plain text, so something like Markdown, which is plain text anyway, could potentially work as well.
And this is a pretty huge book. Okay, next we're going to set up our workspace variables. For that we use python -m graphrag.index with the --init flag, which initializes the configuration, passing the root directory where the data is stored.
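That initialization command is:

```bash
# Initialize the workspace: creates the .env, settings.yaml, and prompt files
python -m graphrag.index --init --root ./ragtest
```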
When we run this, you're going to see that it creates a whole bunch of different files in our current workspace. We can see the input folder, but apart from that it also created an output folder where we can see a log. It hasn't actually run the indexing process yet, though, because we still need to provide our LLM.
It also created different prompt files; these are the prompts it's going to use internally to create the knowledge graph for us. There has been a lot of discussion regarding these prompts. They are very comprehensive: they are used not only to extract the different entities from the provided corpus, but also to create the communities as well as the summaries for each community.
Next, we need to provide our GraphRAG API key (the GRAPHRAG_API_KEY entry in the generated .env file); this is basically the OpenAI API key. So you can select your OpenAI model and provide the key in here.
You also have a settings.yaml file. This is where you set the different configurations.
For example, it references our API key, so it's going to get that information from the environment. We want to use GPT-4o in this case because that's faster and will hopefully cost us less.
You can also set the maximum number of tokens it's going to process; there are a whole bunch of settings you can adjust in here. And if you were to use a local model, such as one you are serving through Ollama, you can also change the API base path.
So, for example, if you were using Groq to serve Llama 3, you would just provide that base URL here. For embeddings, it currently uses OpenAI's small embedding model (text-embedding-3-small).
But you can change that if you want to use another provider. Currently the chunk size defaults to 300 tokens with an overlap of 100 tokens; we can play around with these values, but we're going to just go with the defaults.
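For reference, here is an illustrative excerpt of those settings as they appear in settings.yaml; the keys follow my reading of the generated config, so treat the exact field names as approximate:

```yaml
llm:
  api_key: ${GRAPHRAG_API_KEY}      # read from the .env file
  type: openai_chat
  model: gpt-4o
  # api_base: http://localhost:11434/v1   # point at a local server (e.g. Ollama) instead

embeddings:
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: text-embedding-3-small

chunks:
  size: 300
  overlap: 100
```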
Now, as I was showing you, there are different prompts; for entity extraction, for example, here's the prompt it's going to use. You can modify these prompts based on your own needs, which I highly recommend, because that gives you a lot more control compared to the defaults. Okay, so the previous command created the structure for us and set up the different parameters, but now we need to actually run the indexing process, using the same command without the --init flag.
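That indexing run looks like this:

```bash
# Run the indexing pipeline (same command as before, minus --init)
python -m graphrag.index --root ./ragtest
```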
This is going to go through the whole document, identify the different entities present in the corpus, create relationships between them, build a knowledge graph, then create communities on top of it, and finally summarize the different communities at the different levels. This process can take some time. I also want to see how much this is going to cost us, because cost is definitely a factor: you are not only running the embedding model, but also the entity-recognition step as well as the community-summarization step, both of which involve an LLM.
Now in this step, it's currently working on the summarization descriptions. Okay, the index-creation process is complete, so we can look at the output. We're going to look at the different artifacts it created.
So these are just the database files that it created. There is a JSON which keeps track of different stats. So for example, total runtime, that's the number of seconds it took.
So about two minutes. There was a single document, right? So you will get a whole bunch of information here.
Then there is also an indexing-engine log that describes the different parameters. The next step is going to be running queries. Again, we're going to use the examples they have provided.
Now there are different sets of queries you can run. For example, in order to run a query, you're going to use python -m, which runs the module with the current Python environment.
Then, instead of the indexing module, you run the query module. We need to provide the path where the data is stored, and the method, which is basically the community level you want to use.
So if you want to use the root level, which looks at all the information present in the document, you use the global method. A prompt like "What are the main themes in this story?" needs access to global-level information.
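The full global query, mirroring the project's example, looks like this:

```bash
# Global search: answers drawn from the community summaries
python -m graphrag.query --root ./ragtest --method global "What are the main themes in this story?"
```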
If you run this, it will use the global, top-level communities to generate the answer. And here's the response we got: it says success, global search response, and lists the top themes in the story.
So it's transformation and redemption, charity and generosity, and so on. We are just looking at the examples they have provided; in subsequent videos, I'll show you a lot more complex examples working with different types of datasets. Now, if you are looking for a specific character within a story, you probably want to use the more local, lower-level communities or information.
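For example (this question is the one from the GraphRAG getting-started guide):

```bash
# Local search: zooms in on a specific entity and its relationships
python -m graphrag.query --root ./ragtest --method local "Who is Scrooge, and what are his main relationships?"
```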
So in this case, we use the local method because we are specifically looking for a single character. It will look at the lower community-level or chunk-level summaries and try to combine multiple of them to generate an answer about this specific character for us. And it was able to identify the different relationships.
Now, a normal traditional RAG system might be able to do something like this, because it will simply look at the different chunks where this specific character is mentioned and whether they describe a relationship with another character. However, if you are looking for the main theme of the document, that's where traditional RAG is going to fail, because it only looks at the specific chunks retrieved during the retrieval process. It doesn't really have an overall big picture of the corpus you are providing.
Also, both at the global and the local level, it will tell you where the information is coming from. So it actually cites its sources, which is pretty neat. GraphRAG works great, but there is one major issue, and that is the cost.
For this specific example, we sent a total of 570 requests through the API, and we are talking about GPT-4o requests; for the embedding model, we only sent about 25 requests. In terms of the total number of tokens processed, it's well over 1 million tokens, which comes out to around $7. So we spent about $7 in total to process this book and create a GraphRAG index, which could be prohibitively expensive for a large corpus of data. This is definitely something you need to consider if you're planning on using GraphRAG in your own application. I think this is substantially more expensive than building a traditional RAG system.
Anyways, I highly recommend you check out GraphRAG. It's an innovative approach. Now, Microsoft is not the only company that has implemented a graph-based RAG.
There are some other options. For example, LlamaIndex has its own implementation, the Knowledge Graph RAG Query Engine, and Neo4j has its own GraphRAG package that you can use to build graph RAGs. If there is interest, I will create some content comparing these different implementations as well.
Let me know in the comment section below. I hope you found this video useful. Thanks for watching and as always, see you in the next one.