RAG vs. Fine Tuning

IBM Technology
Get the guide to GAI, learn more → https://ibm.biz/BdKTbF Learn more about the technology → https://...
Video Transcript:
Let's talk about RAG versus fine-tuning. They're both powerful ways to enhance the capabilities of large language models, and today you're going to learn about their strengths, their use cases, and how you can choose between them. One of the biggest issues with generative AI right now is, one, enhancing the models, and two, dealing with their limitations. For example, I recently asked my favorite LLM a simple question: who won the Euro 2024 World Championship? While this might seem like a simple query for my model, there's a slight issue: because
the model wasn't trained on that specific information, it can't give me an accurate or up-to-date answer. At the same time, these popular models are very general, so how do we specialize them for specific use cases and adapt them for enterprise applications? Your data is one of the most important things you can work with, and in the field of AI, techniques such as RAG or fine-tuning allow you to supercharge the capabilities that your application delivers. So in the next few minutes we're going to learn about both of these techniques,
the differences between them, and where you can start using them. Let's get started. Let's begin with retrieval-augmented generation, which is a way to increase the capabilities of a model by retrieving external, up-to-date information, augmenting the original prompt that was given to the model, and then generating a response using that context and information. This is really powerful because, if we think back to the Euro Cup example, the model didn't have the information in context to provide an answer. This is one of the big
limitations of LLMs, but it is mitigated with RAG, because now, instead of getting an incorrect or possibly hallucinated answer, we're able to work with what's known as a corpus of information. This could be data, PDFs, documents, spreadsheets: things that are relevant to our specific organization or the knowledge we need to specialize in. So when the query comes in this time, we're working with what's known as a retriever, which is able to pull the correct documents and relevant context for the question and then pass
that knowledge, as well as the original prompt, to a large language model. With its intuition and pre-trained data, it's able to give us a response based on that contextualized information, which is really powerful, because we can get better responses from a model with our proprietary and confidential information without needing to do any retraining on the model. This is a great and popular way to enhance the capabilities of a model without having to do any fine-tuning. Now for fine-tuning itself: as the name implies, what
this involves is taking a large language foundation model, but this time we're going to specialize it in a certain domain or area. We're working with labeled and targeted data that's going to be provided to the model, and when we do some processing we'll have a specialized model for a specific use case: to talk in a certain style, or to have a certain tone that could represent our organization or company. Then, when the model is queried by a user or in any other way, we'll have a response that gives
the correct tone and output, or specialty in a domain, that we'd like to receive. This is really important because what we're doing is essentially baking this context and intuition into the model, so it's now part of the model's weights versus being supplemented on top with a technique like RAG. Okay, so we understand how both of these techniques can enhance a model's accuracy, output, and performance, but let's take a look at their strengths and weaknesses in some common use cases, because the direction that you go in can
greatly affect a model's performance, its accuracy, outputs, compute cost, and much more. Let's begin with retrieval-augmented generation. Something I want to point out here is that, because we're working with a corpus of information and data, this is perfect for dynamic data sources such as databases and other data repositories where we want to continuously pull information and keep it up to date for the model to use and understand. At the same time, because we're working with this retriever system and passing in the information as context in the prompt, that
really helps with hallucinations, and providing the sources for this information is really important in systems where we need trust and transparency when we're using AI. So this is fantastic, but let's also think about the whole system, because having an efficient retrieval system is really important in how we select the data that we want to provide in that limited context window, so maintaining it is also something you need to think about. At the same time, what we're doing in this system is effectively supplementing that information on top of
the model, so we're not enhancing the base model itself; we're just giving it the relevant contextual information it needs. Fine-tuning is a little bit different: because we're actually baking that context and intuition into the model, we have greater influence over how the model behaves and reacts in different situations. Is it an insurance adjuster? Can it summarize documents? Whatever we want the model to do, we can use fine-tuning to help with that process. At the same time, because that is baked into the
model's weights itself, that's really great for speed, inference cost, and a variety of other factors that come with running models. For example, we can use smaller prompt context windows to get the responses that we want from the model, and as we begin to specialize these models, they can get smaller and smaller for a specific use case. So it's really great for running these specialized models in a variety of use cases. But at the same time, we have the same issue of cutoff: up until the point where
the model is trained, well, after that we have no more additional information that we can give to the model, so it's the same issue that we had with the Euro 2024 example. Both of these have their strengths and weaknesses, but let's actually see this in some examples and use cases. When you're thinking about choosing between RAG and fine-tuning, it's really important to consider your AI-enabled application's priorities and requirements. This starts with the data: is the data that you're working with slow-moving or fast-moving? For example, if we
need to use up-to-date external information and have it ready contextually every time we use a model, then this could be a great use case for RAG: for example, a product documentation chatbot where we can continually update the responses with up-to-date information. At the same time, let's think about the industry that you might be in. Fine-tuning is really powerful for specific industries that have nuances in their writing styles, terminology, and vocabulary, so, for example, if we have a legal document summarizer, this could be a perfect use case for fine-tuning.
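As an aside on the documentation chatbot example: the retrieve, augment, generate flow described earlier could be sketched roughly as below. This is a minimal illustration, not how any particular product does it. The corpus, the file names, and the functions `tokenize`, `retrieve`, and `augment` are all hypothetical, and the word-overlap scoring is a stand-in for whatever retriever a real system would use. The final "generate" step (sending the prompt to a language model) is deliberately left out.

```python
import re

# Hypothetical in-memory corpus standing in for product documentation.
CORPUS = {
    "install.md": "Run the installer and restart the service to finish installation.",
    "billing.md": "Invoices are issued monthly and can be downloaded from the billing page.",
    "reset.md": "To reset your password open settings and choose reset password.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, corpus, k=1):
    """Return the k (doc_id, text) pairs sharing the most words with the query.

    Word overlap is an assumption made for brevity; it is only a toy retriever.
    """
    query_words = tokenize(query)
    ranked = sorted(
        corpus.items(),
        key=lambda item: len(query_words & tokenize(item[1])),
        reverse=True,
    )
    return ranked[:k]


def augment(query, docs):
    """Build a prompt that contains the retrieved passages, tagged by source."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in docs)
    return (
        "Answer using only the context below and cite the source.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


question = "How do I reset my password?"
docs = retrieve(question, CORPUS)
prompt = augment(question, docs)
print(prompt)  # this prompt is what would be passed to the model to generate an answer
```

Updating the chatbot's knowledge here means editing `CORPUS`, not retraining anything, and because each passage carries its source tag the answer can point back to where the information came from.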
Now let's think about sources. This is really important right now for having transparency behind our models, and with RAG, being able to provide the context and where the information came from is really great. So this could be a great use case, again, for that chatbot for retail, insurance, and a variety of other specialties where having that source and information in the context of the prompt is very important. But at the same time, we may have things such as past data in our organization that we can use to train a
model, so that it's accustomed to the data that we're going to be working with. For example, again, that legal summarizer could have past data on different legal cases and documents that we feed it, so that it understands the situation it's working in and we get better, more desirable outputs. So this is cool, but I think the best situation is a combination of both of these methods. Let's say we have a financial news reporting service. We could fine-tune it to be native to the industry of finance and understand all the
lingo there. We could also give it past data of financial records and let it understand how we work in that specific industry, but also be able to provide the most up-to-date sources for news and data, and provide that with a level of confidence, transparency, and trust to the end user who's making that decision and needs to know the source. This is really where a combination of fine-tuning and RAG is so awesome, because we can build amazing applications taking advantage of both: RAG as a way to retrieve that
information and have it up to date, and fine-tuning to specialize our model in a certain domain. So they're both wonderful techniques and they have their strengths, but the choice to use one or a combination of both is up to you and your specific use case and data. Thank you so much for watching. As always, if you have any questions about fine-tuning, RAG, or any AI-related topics, let us know in the comment section below. Don't forget to like the video and subscribe to the channel for
more content. Thanks so much for watching.
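The four questions raised in the talk (is the data fast-moving, are sources needed, does the domain have its own style, is there past data to train on) can be condensed into a small decision helper. This is only a sketch of the reasoning, with a hypothetical function name and parameters; a real decision would also weigh cost, latency, and maintenance, which the talk mentions but this toy ignores.

```python
def recommend(fast_moving_data, needs_sources, domain_style, has_past_data):
    """Map the four questions from the talk to a suggested approach.

    All four arguments are booleans. The mapping is an illustrative
    simplification, not a rule stated in the talk.
    """
    # Fast-changing data or a need to show sources points toward RAG.
    wants_rag = fast_moving_data or needs_sources
    # Domain-specific style or available past data points toward fine-tuning.
    wants_fine_tuning = domain_style or has_past_data

    if wants_rag and wants_fine_tuning:
        return "both"
    if wants_rag:
        return "rag"
    if wants_fine_tuning:
        return "fine-tuning"
    return "no clear preference"


# The three examples from the talk, encoded under the assumptions above.
print(recommend(True, True, False, False))   # product documentation chatbot
print(recommend(False, False, True, True))   # legal document summarizer
print(recommend(True, True, True, True))     # financial news reporting service
```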