99% of the students who are learning GenAI are absolutely doing it wrong. Most of them are stuck in a loop: they watch 30 or 40 minutes of videos or tutorials around RAG and GenAI, they blindly follow all the code and frameworks, and they think they've mastered it by building a so-called project in just 30 or 40 minutes. But the reality is they haven't learned even 1% of what this field is. You cannot land a job by copy-pasting code from other videos and putting it out there with different tweaks, and you probably can't even add that to your portfolio. You think that will get you a job? Absolutely not. I've seen tutorials like "build a RAG chatbot in 10 minutes" or "build a chatbot in 40 minutes." Do you really think you can learn to build a chatbot in that time and still get to know the foundations and production skills? Let me be very clear: you cannot even get to know 1% of the foundations, forget about the production-grade skills you need for this
domain. Meanwhile, there are immense opportunities in this field. I've seen so many of my students grab very high-paying jobs because of this domain, but they were doing it right. The problem is very simple: most students only talk about how to do things. They don't talk about the whys, the underlying principles, the details, the foundations, and the code behind them, and that's the main problem. That's why I came up with this complete GenAI and RAG systems course. We won't teach you to build some cool application in a few minutes or a few hours. We'll first start by teaching you the core concepts and foundations, which will give you the "why" answers for most of the things we do in the projects, so that you know the inner workings better than most people: how exactly these systems are made and what the intuitions behind them are. That's what I call the core. I've seen that even the tutorial videos out there are literally building basic systems; they're not building production-grade systems.
While we teach you the core, we'll also show you how to build production-grade systems, which will help you become a professional in this field. The first step is to truly learn the foundations and the core of the NLP concepts used in RAG systems. You can find the syllabus in the description box below; I've written down everything you'll be learning throughout. Then we'll work on projects where we utilize all the concepts we learned in the foundations and bring them into RAG systems. We'll use advanced libraries like pgai, pgvectorscale, and pgvectorizer to build production-grade applications which, as I said, don't only work for demos but also scale to millions and millions of records. You'll find that the first sections are theory, but it's not only theory; we have hands-on work as well. As I said, we start by building a Q&A system, taking the concepts we've learned and implementing them, and then we'll work on our very first project, Lexi Chat. Lexi Chat is an application where you can upload your documents and have a conversation with them. First of all, I ask you to watch all the installation guides and make sure you have your system set up, and then get started with the course. I'll catch up with you in the tutorials. Thank you so much. So now let's get started with the course.
So hey everyone, welcome to the first module. In this module we'll specifically cover the foundations of NLP. We start by covering these foundations because they will eventually make your understanding of RAG and other concepts very clear, and when we work on our major project, Lexi Chat, you'll be able to understand it in much more depth. Instead of learning at a surface level, you'll learn something foundational, because I always say the core is very important rather than just learning to write a couple of lines of code. We'll focus a bit on foundations; we won't go too deep into mathematical details, but I'll still make sure you truly understand what each process and step is. You don't need a big mathematics background for this. In module one we'll cover text processing, tokenization, and embeddings in detail, and then we'll cover several other modules after that.
So let's get started with a story: why do we need natural language processing? I know some of you come from an NLP background and might already know this, but for the people who don't, I want to take a very simple example. Let's say you're a detective and you want to analyze thousands of witness statements in a big case. You have a lot of statements, like you can see in the image, and they're all in a raw, unstructured format. You have very limited time to find key patterns: names of suspects, locations, and perhaps connections between people. Basically, you have to make sense out of all these witness statements. If you take these thousand witness statements and read each one manually, step by step, it will take you forever. So humans thought: what if computers could help? Computers could help you find patterns, find connections between texts, and extract insights from
the text much faster. That's where the idea of NLP comes into place. Now say the computer becomes your investigative assistant. The only problem is that it does not understand language like humans do. It won't understand the text the way humans do; it sees words as strings, like the string data structure you know from programming languages. So for the computer to help you efficiently, it needs a way to break down each statement into manageable chunks, understand the meaning, and then connect the dots across documents. It needs the text in a form it can fully understand. We'll talk about what that computer-understandable form is, but the point is it needs a way to convert text into a format
the way computers understand. So what is the way for computers to understand text? For a computer to understand these texts, help you solve witness cases, and find patterns and insights efficiently and productively, it needs three steps; I classify it into three steps. The first step is processing the text, to make sure we're feeding in the right, processed information. If you come from a machine learning background you should know this well: before you feed data into a model to be trained, you first process it. That's what you do here: you process the text so the irrelevant text goes away and only relevant text goes to your computer. The next step is what we call tokenization. As I said, your whole document needs to be broken down into manageable chunks, and that's what tokenization means. Once it's broken down into manageable chunks, it has to be converted into a language the computer can understand, and that's what we call embeddings. Our whole concept in RAG will revolve around text processing, tokenization, and embeddings; there are no other steps we'll focus on beyond these three. Now let's expand each of these steps. If you don't understand something, don't worry; you'll get it over the course of the hour we spend on this. One thing I want to make sure of is that we continuously revisit topics and build a good storyline, so you can connect how these things help, because a lot of people jump straight to a RAG tutorial, learn for 17 or 20 minutes, and it doesn't really help. That's why this course is pretty long: to take you step by step from the start so you understand everything. So let's first go to text processing. If you're already coming from an NLP or machine learning background,
you should know how this helps. So what does text processing really mean? It starts by preparing the text: whenever you get some document, you lowercase it, and you do processing like removing punctuation or removing stop words. Stop words are common words like "the", "is", and "in" that don't really add much value. Of course they may carry some grammatical meaning, but they don't add much to the meaning of the sentence. Removing punctuation is another one of the processing steps. The reason we do lowercasing is this: say the word "quick" appears twice, once in lowercase and once in uppercase. Without lowercasing, the computer would treat them as different words; with lowercasing, it takes them as the same word.
That's why we lowercase the whole sentence: it's basically to ensure that "hello" in lowercase and "HELLO" in uppercase are treated as the same word. Once we've done the lowercasing and stop word removal, there's an interesting step known as stemming and lemmatization. What do they really mean? Stemming and lemmatization reduce your words to their root form. Say you have "quick brown fox running": it turns "running" into "run". The reason it does this is that we want all the added plurals and grammatical variations to come back to the root form, so that, for example, "running" and "run" are not treated as different words. So we apply stemming and it converts words back to a root form. I want to show you how this really looks; this, by the way, is stemming. Say you have a paragraph where a document changes over time, with forms like "change", "changing", "changes". They honestly all mean the same thing but are different grammatical forms. Stemming reduces them all to a root, so "change", "changing", and "changes" all become "chang". Stemming is a little harsh: the difference between stemming and lemmatization is that stemming sometimes gives you something which is not a real word, because it just chops back to a root form, like "chang" here, where you're losing the "e".
Lemmatization is a little more flexible and makes sense of the words it changes, so here it would give you "change". Whether you use stemming or lemmatization depends on the context: if you want a very strict tool that just chops words down to a root form, that's stemming; otherwise lemmatization is the other option. So let's go to a practical example I really want to showcase to everyone. I've taken some reviews, like "I absolutely love this product", and converted them into a DataFrame with a column called "review". The first step is to lowercase it: I simply took the DataFrame and that column, lowercased it, and created a new column so you can see the result. You can see "I absolutely love this product, it's super..." converted into all lowercase.
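To make that concrete, here's a minimal sketch of the lowercasing step, assuming a pandas DataFrame with a "review" column; the sample reviews are made up, so the notebook's exact data may differ:

```python
import pandas as pd

# A couple of sample reviews, similar in spirit to the notebook's data
df = pd.DataFrame({"review": [
    "I ABSOLUTELY love this product!!! It's SUPER efficient :)",
    "Terrible. Would NOT buy again...",
]})

# Lowercase every review into a new column so the original stays visible
df["review_lower"] = df["review"].str.lower()
print(df[["review", "review_lower"]])
```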
The next step is to remove irrelevant punctuation and emojis. Of course, sometimes emojis add really good value, so instead of removing them you can make sense out of them: instead of keeping the emoji, you replace it with its meaning. There are several libraries available to convert a particular emoji into its meaning; for example, two happy-face emojis could be written out as "2x happiness". For simplicity here we're just removing emojis, but remember they can add real value. So you remove the emojis and punctuation, and you can see those characters disappear. Then you remove the stop words. We use the NLTK library and download the stop words: NLTK provides a corpus of stop words, which you can see here; there are a lot of them. We download that corpus, import the English stop words from it, and create a new column, "review_no_stopwords", by removing every stop word with a list comprehension.
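Here's a rough sketch of those two steps, assuming NLTK is installed; the regex and the sample sentence are illustrative, not the notebook's exact code:

```python
import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")  # NLTK ships a corpus of stop words per language
stop_words = set(stopwords.words("english"))

text = "i absolutely love this product! it's super efficient :)"

# Strip punctuation/emojis by keeping only letters, digits, and spaces
no_punct = re.sub(r"[^a-z0-9\s]", "", text)

# Remove stop words with a list comprehension, as in the notebook
no_stop = " ".join(w for w in no_punct.split() if w not in stop_words)
print(no_stop)  # -> "absolutely love product super efficient"
```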
it it it removed I it removed this it removed it the reason why it remove because it does not really add a value so absolutely love product super efficiently reading so it's more like processing as much as you can do so that your computers does a good job or gets a good data in it then there you go with stemming so stemming means as I said making it back to the root word so you use nltk library and use the pter stemmer you can use the lemmatization lemmatizer tool so you can see this absolute has
been converted to this the product super efficient so you see absolutely got converted to Absolute right but but stemming is little harsh that's why the the word is not making sense if you use l lization it will convert this to Absolute so these are few few of the text processing techniques I would love you to know and what you can do you can go ahead use several libraries make sense out of it I'll also link some of the resources for you to explore more about the text processing now the next step comes in now once
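A small sketch comparing the two, using NLTK's PorterStemmer and WordNetLemmatizer; the word list is illustrative:

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet")   # data the lemmatizer needs
nltk.download("omw-1.4")   # some NLTK versions also want this

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["running", "changes", "changing", "absolutely"]:
    print(word,
          "-> stem:", stemmer.stem(word),          # harsh: may not be a real word
          "| lemma:", lemmatizer.lemmatize(word, pos="v"))  # keeps real words
```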
Now the next step: once you have good, clean text data, you break it down into chunks. Once the text is clean, we need to break it into smaller pieces, because a big problem is easier when converted into small pieces. Typically, if you have a sentence, you convert it into words so the computer can go through each one and analyze it individually. So let's talk about tokenization. What does tokenization really mean? It means that once your text is clean, you break it down into smaller pieces, typically words or subwords, so the computer can analyze them individually. Say you have "quick brown fox jumps" and a tokenizer: the tokenizer splits that into separate word-level chunks. So it splits sentences into words or smaller units; it's basically converting a bigger sentence into smaller pieces so your computer has an easy time going through each piece individually. Let's go through some practice around this in the Colab notebook, starting with why tokenization is needed. I'm a big fan of this: for anything you do, first ask what, why, and how; with these small questions you'll truly get the concept.
So why is tokenization truly needed? As I've written in the Colab notebook: to a computer, the sentence "I love NLP" is simply a string of characters without any inherent meaning. What you do is convert it into smaller subparts so computers can analyze, compare, and perform operations on them; it's far easier for them to work with broken-down pieces than with one long sentence. The next point is contextual understanding: tokenization allows models to handle words, subwords, or even individual characters meaningfully; even emojis get their own relevance. I know it might not fully make sense right now why we're doing tokenization; it will start making sense once we reach modules two, three, and beyond. So how do tokenizers actually work? Very simple: if you use NLTK's word tokenizer (you can also do sentence tokenization), then "I don't like NLP" gets converted into tokens like "I", "do", "n't", "like", "NLP", and even the punctuation gets its own token, because punctuation has its own relevance here; that's why we need proper tokenizers.
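Here's a minimal sketch of that with NLTK's tokenizers; the sample sentence is illustrative, and newer NLTK versions may ask you to download "punkt_tab" instead of "punkt":

```python
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

nltk.download("punkt")  # pre-trained tokenizer models NLTK uses

text = "I don't like NLP. Or do I?"
print(word_tokenize(text))
# -> ['I', 'do', "n't", 'like', 'NLP', '.', 'Or', 'do', 'I', '?']
print(sent_tokenize(text))
# -> ["I don't like NLP.", 'Or do I?']
```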
Then there are other tokenizers, such as subword tokenizers like byte-pair encoding (BPE), which handle uncommon words and language variations more effectively. You'll find a lot of pre-trained tokenizers from Hugging Face and from big companies like Google; the foundation of these models is set by their tokenizers, because a model needs to know how it should split a sentence. One option is to just split on spaces, but that doesn't always make sense, because sometimes there are no spaces and the model still has to split the text. What a subword tokenizer does is split a word like "unhappiness" into pieces such as "un" and "happiness", allowing models to recombine subwords and understand them in different contexts. That's why we use subword tokenizers like byte-pair encoding. Another one is the sentence tokenizer, which splits text into sentences rather than words. It's very rarely used; I've rarely used it myself, but there are companies that still do. So tokenizers are nothing but tools for breaking text down into chunks, and there are several ways to do it: words, subwords, sentences. It should make sense now.
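As a sketch of subword tokenization, here's how you might load a pre-trained Hugging Face tokenizer; the model name is just one example, and the exact splits depend on the vocabulary that tokenizer learned:

```python
from transformers import AutoTokenizer

# bert-base-uncased uses WordPiece, a close cousin of byte-pair encoding;
# GPT-2's tokenizer is a true BPE tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

print(tokenizer.tokenize("unhappiness"))
# -> subword pieces the model can recombine, e.g. something like ['un', '##happiness']
```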
Next: say you've processed the text and broken it down into manageable chunks so the computer can work with it. Even though it's broken down, it's still text, a list of strings. So what do you do? That's where embeddings come in. What are embeddings? Embeddings represent words as numbers so computers can understand similarities. If you know about one-hot encoding: a one-hot encoder converts a categorical feature by creating a new column and stating whether that category was present, so it makes sense to the model. Embeddings work toward the same goal but carry meaning. Take "man", "woman", "king", "queen": each tokenized word gets its own vector, and every dimension in that vector has its own relevance. For example, "man" might score 0.6 on a "living being" trait, low on a "feline" trait, and high on "human". The word gets converted into understandable numbers, and that's what we call an embedding, a kind of vector. We'll talk more about vector databases and embeddings in a bit, but it basically turns words into vectors, and similar words like "king" and "queen" end up closer to each other than to other words; likewise "man" and "woman" will be closer, so their vectors sit nearer to each other than to unrelated words.
So let's go back to embeddings. The way to think of an embedding is that it's a way to represent words (the tokenized words, each and every one) so that computers can understand their meaning and find relations between them. Say each student in your class has different traits, like height, hair color, or favorite subject; in the same way, each word has different meanings or traits that need to be represented. Let me take an example: imagine you have the words "king", "queen", "apple", and "banana", and you represent each word with three traits: a royalty score, a fruitness score, and a gender score. Just like you'd rate a student based on strength, speed, and agility, you rate these words on these three traits. King and queen get a high royalty score; apple doesn't really relate to royalty. Since "king" and "queen" both have a high royalty score, the computer knows they're related in terms of being royal. Apple and banana have a high fruitness score but a low royalty score, so they're recognized as fruits, not royalty. You can also see differences in the gender score, say king at 1 and queen at 0.5, while for apple and banana gender doesn't make sense. So an embedding is nothing but a vector built from such traits. Don't worry for now about how we'd do this on a real dataset; just try to understand why embeddings are created.
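Here's a toy sketch of that trait idea; the numbers are hand-made for illustration (real embeddings are learned, not hand-crafted), and the cosine helper previews the similarity measure we cover later:

```python
import numpy as np

# Illustrative, hand-made trait vectors: [royalty, fruitness, gender]
words = {
    "king":   np.array([0.9, 0.0, 1.0]),
    "queen":  np.array([0.9, 0.0, 0.5]),
    "apple":  np.array([0.1, 0.9, 0.0]),
    "banana": np.array([0.1, 0.8, 0.0]),
}

def cosine(a, b):
    # Similarity by direction: 1.0 means the vectors point the same way
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print("king ~ queen  :", round(cosine(words["king"], words["queen"]), 2))
print("king ~ apple  :", round(cosine(words["king"], words["apple"]), 2))
print("apple ~ banana:", round(cosine(words["apple"], words["banana"]), 2))
```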
Embeddings are created for a very simple reason: we want computers to understand words based on their traits; for example, "king" gets a higher royalty score. And the traits don't have to be defined by hand; they're learned during training. We'll talk briefly later about how these models are trained. Embeddings end up as sets of vectors, the kind of thing you'd store in a vector database. So what can you do with them? If you ask a computer to find a word similar to "king", it looks at the number vectors, and whichever word's vector is closest to the king vector is what it returns. "Queen" comes out close to "king" because they both have high royalty scores. These are just a few examples. There are a lot of algorithms available right now that convert your text data into embeddings, because they've already been trained to represent these words. One such algorithm is known as Word2Vec.
We won't go into the architectural details of how Word2Vec is built; that's for research purposes, and we're learning RAG. I just want you to know that Word2Vec is an algorithm which helps you build meaningful vectors, because you can't just put any numbers into a vector: it should be meaningful, with "king" and "queen" close together and "man" and "woman" close together. So we train the Word2Vec model on our sentences and then get the embeddings: this is the embedding for "king". Then we ask the model for the most similar words to "king", and it returns queen, fruit, man, apple, and so on. Of course there are disparities with such a small dataset, but you can see "queen" comes first: king is to queen, with the highest similarity score. We'll talk shortly about how these similarity scores are calculated; for now, focus on understanding the core idea behind embeddings.
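A minimal gensim sketch of that workflow, assuming a tiny made-up corpus; with so little data the neighbors will be rough, just like in the notebook demo:

```python
from gensim.models import Word2Vec

# A tiny toy corpus — real training uses far more text
sentences = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["man", "walks", "in", "the", "city"],
    ["woman", "walks", "in", "the", "city"],
    ["apple", "is", "a", "sweet", "fruit"],
    ["banana", "is", "a", "sweet", "fruit"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=200)

print(model.wv["king"][:5])                   # first few numbers of king's vector
print(model.wv.most_similar("king", topn=3))  # nearest words by cosine similarity
```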
If you plot these high-dimensional vectors (projected down to two dimensions), you see "king" and "queen" close to each other. Of course we have a limited dataset, so it doesn't perform very well, but this is how it looks. Compare this with one-hot encoding: one-hot encoding simply takes a categorical column, creates a new column per category, and fills in 1s and 0s. If you have 100 categorical values, it creates 100 more columns: higher dimensionality, more complexity, a lot of issues, and no semantic meaning; there's no relationship between the words. Here, with embeddings, there is a relationship, which is how we can use the traits to reason that king is to queen, apple is to banana, man is to woman. So embeddings are nothing but your tokenized words converted into meaningful vectors which, when compared, represent the relationships between words. That's a small overview of embeddings. Next, we'll talk in a bit more detail about the attention mechanism and Transformers, and you'll see how the three steps of processing, tokenization, and embeddings keep coming into play; you'll understand them better in a bit. So now let's talk about the attention mechanism. In the last few topics we went through the three steps that are useful
for computers to make sense of language: we process the text, we tokenize it into manageable chunks so computers can look at the words individually, and since those chunks are still strings of characters, we convert the words into embeddings, so every word has its own vector representing its meaning. There are several kinds of embeddings; one algorithm I showed is Word2Vec, which maps words to vectors capturing their meaning. We won't go into detail about what the training process looks like; I want you to get the core idea of what attention, Transformers, and embeddings are, so that when we reach RAG you understand it much better. So, attention. Once we have the embeddings, which are vectors assigned to each word to represent its meaning, think about what happens with a whole document of 1,000, 2,000, or 100,000 words.
You'll have 100,000 vectors, and if a computer needs to make sense of them, it has to go through each vector and try to understand it. That's where the problem arises: with a long text of 100,000 words, the computer has to go through every single vector, each being the embedding of a word, and try to make sense of it all. Look at this example: "The chef, who was famous for his delicious pasta, quickly prepared the dish for the guests." As a human reader, I just focus on the main point: the chef prepared the dish for the guests. That's what a human reader does; given a document, you extract the main point instead of going through every word and losing track of it. Details like "famous for his delicious pasta" add some context, but they're not essential for understanding the main action. You wouldn't recount the whole sentence word for word; the main point is simply that the chef prepared the dish, and details like "famous" don't add
much value to the main action. Just like this, when models process text, they struggle with selective focus, which means they tend to give equal weight to everything: "the chef" gets the same weight as "who was famous". They give equal weight to every word, which makes it hard for them to focus on what's really important. If you have many words and pay equal attention to every one of them, you forget what the main point is. Humans automatically focus on the main point, but a computer struggles because it gives focus to every word in the sentence. So there's something known as attention. Attention solves this problem by helping models focus on the critical parts of a sentence, just like a human pays more attention to the main points when reading. If you read a chapter, you take away a few main points from it. To give models that capability of focusing on what really matters, the concept of attention comes into play. What is attention? It's a process that helps the model decide which words in a sentence are most important in a specific context. It helps the computer know what's really important in that
particular sentence, in that specific context. It's like giving each word a weight that indicates how much attention it deserves. Let's take an example: "The cat, which was sleeping, jumped over the wall." Words like "cat" and "jumped" get higher scores because they carry the main meaning: the cat jumped. Now, if you're trying to understand who performed an action, the model places more weight accordingly. In the chef example, the weight goes to "the chef" and "prepared", and less weight goes to "famous" or "delicious pasta", because they're not essential for understanding what the action was, if your task is to understand that action. So attention assigns each word in a sentence a weight for how important it is in that particular context: higher weight means more attention, lower weight means less. In simple terms, it helps the model zoom in on the words that matter most for the task. If you're translating a sentence, it focuses on the most important words; if you're answering a question, it takes the big question and picks out the important, higher-weighted words to find the answer. It basically zooms in on the words that matter most and then handles the task. So if you have 100,000 word embeddings, it will focus on the few hundred with the highest weights instead of all 100,000, because attending to everything equally is computationally expensive and causes a lot of issues. I don't want to go into detail about architectures, encoders, decoders, and all those things, but I'll quickly skim through how
attention works, and when we get to Transformers you'll see how attention builds up into Transformers. So how does attention really work? I classify it into steps. The first step is that it assigns importance to the words: when processing a sentence, attention gives each word an importance score showing how much that word contributes to the meaning in the current context. In the example "The cat, which was sleeping, jumped over the wall", if we want to know who jumped, the words "cat" and "jumped" receive higher attention scores, while "which was sleeping" receives lower scores because it doesn't relate to the action of jumping in this particular context. The second step is calculating the attention scores: the model computes these importance scores based on the relationships between words. As I told you with embeddings, "king" and "queen" sit closer together and "man" and "woman" sit closer together because their vectors have similar meanings; in the same way, the model calculates scores based on the relationship between two vectors or embeddings. Words that are closely related in meaning or action get a strong connection and higher scores: for "who jumped", "cat" and "jumped" score high because they're directly related, and "wall" has a moderate relationship with "jumped" because it tells you where the jumping happened. In practice this involves some math, which I won't get into right now, but the goal is to figure out which words influence each other the most: higher relationship, higher score; moderate, moderate score; low, low score. "Cat" and "dog" might normally sit close together, but in this particular context they'd be far apart. The third step is that the model adjusts its focus based on these scores: after calculating the attention scores, instead of treating every word equally, the model gives more importance to the higher-scoring words and uses them to interpret the meaning. If you chat with GPT, it understands your context because it keeps adjusting its focus as you go: when you ask one question and then a different question, the different words get different weights, and the model starts focusing on those words. I hope that makes sense; that is how attention works.
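To make this less abstract, here's a toy numpy sketch of the dot-product attention idea: scores come from vector similarity, and a softmax turns them into weights. Real Transformers learn separate query/key/value projections, which I've dropped here for clarity; the embeddings are random, so the weights are only illustrative:

```python
import numpy as np

# Toy "embeddings" for a 5-word sentence: one 8-dim vector per word
np.random.seed(0)
words = ["the", "cat", "which", "slept", "jumped"]
X = np.random.randn(len(words), 8)

# Scaled dot-product self-attention, in its simplest form
scores = X @ X.T / np.sqrt(X.shape[1])
e = np.exp(scores - scores.max(axis=1, keepdims=True))  # stable softmax
weights = e / e.sum(axis=1, keepdims=True)               # rows sum to 1

# How much attention the word "jumped" pays to every other word
for w, a in zip(words, weights[words.index("jumped")]):
    print(f"{w:>6}: {a:.2f}")
```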
Here's an analogy: you're in a class and the teacher is explaining an important topic. When the teacher talks, you pay close attention to the key points, because a good teacher talks about the main ideas from the chapter and ignores small details that don't add much value. In the same way, the attention mechanism helps models focus on the main idea within a sentence, like who did what, rather than getting stuck on every word. That's why attention is so important. Let me give some examples of the attention mechanism in practice. In machine translation, say Google Translate, attention lets the model focus on specific words in one language and find their most important equivalents in another, instead of walking through every word of that language with equal weight. In question answering, GPT-style models can zoom in on the part of a paragraph that has the answer instead of reading everything equally: if you ask "what is data?", instead of going through every word of its relevant paragraphs, the model zooms in on the particular answer. The same goes for text summarization: attention helps summarize a text by focusing on key points rather than every sentence. When we do our project, Lexi Chat, you'll notice how well a model takes a whole big document and answers your question crisply, and you'll wonder how that's possible. It's possible because of the attention mechanism: it focuses on the important words and the main parts of the sentence, assigns weights to words based on their importance in a particular context, and calculates attention scores based on relationships (king is to queen, man is to woman), guiding the model to focus on the key words. That's why I wanted to go a little deeper into the attention mechanism; I hope it makes sense now. Next we'll talk about how we can use attention to build more powerful NLP systems,
and eventually how we arrive at models like retrieval models and generative models. Now we come to another major aspect: Transformers. Before going into Transformers, I want you to notice that for every topic we study three main things: what, why, and how, so that you understand it fully in depth. So think about a problem with the attention mechanism. You have your tokenized text and your embeddings, and attention helps you focus on the words that are really important based on the context. But tell me: how will the model know which words are important based on the context? How will it know the context in the first place? Take "the cat jumped over the wall". For attention to know what's important, you need the full context: who jumped, what was jumped over. The model needs the context, and attention alone doesn't provide it; that's where Transformers help. And that's the next problem that arose: how do we make models understand the context in one go? Here's a story: say you read a long novel one word at a time, without being able to look back or forward. It would be tough for you to understand
the full context. With a 100-page novel read strictly word by word, you'd lose the thread of the story. What you actually do is take in the whole context, like a human. This was the problem for older NLP models, which read sentences like "the cat jumped over the wall" one word at a time, in a specific order. By the time the model reached the tenth page, it had forgotten the first; it didn't keep the context. And as I said, even the attention mechanism needs something that provides the broader context; only then does attention work well. Instead of reading one word at a time, Transformers solve this by letting the model see the entire sentence at once. A Transformer sees the whole of "the cat jumped over the wall", provides that context to the attention mechanism, and attention sees all the words inside it. Because they don't process text word by word, Transformers can capture complex patterns and relationships within a sentence without losing context. Instead of looking at one word embedding at a time, the model looks at all the embeddings at once, which gives the attention mechanism the ability to find the important words. Before Transformers, it was hard for
computers to remember details that appeared earlier in the sentence, to understand complex relationships, or to process text quickly; it used to take a lot of time. Transformers changed this by letting the model see the entire sentence or text at once, which means it can pick out the most important words and understand relationships even when the words are far apart ("the cat" and "the wall" are far apart in the sentence), and process text much faster than before. Say you're reading the sentence "The lion, tired after hunting all day, rested under a tree." Without Transformers, the model might forget the word "lion" by the time it reaches "rested". With Transformers, it can remember that "lion" and "rested" are connected even though there are words in between. So Transformers help the attention mechanism focus on the important parts of the text and capture relationships between words even when they're far apart in a sentence. I hope that makes sense about what Transformers are and why we really need them: to understand context by seeing the sentence in one go. Otherwise it took far too long, and models were forgetting the beginning by the time they reached the end of the sentence. A Transformer sees the whole story at one glance, uses the attention mechanism to find the key ideas immediately, and captures the relationships between the words. That's what Transformers are. It's pretty simple; I don't know why people think it's too hard. It's
like looking at everything at once; if you understand a thing via a story, you get it much more easily without too much mathematical detail. So how does it work? As I said, we cover why, what, and how. The first step is to represent the words with embeddings and positions. As I told you, the words, which are strings of characters, have to be converted into embeddings. Take "the dog chased the ball": every word (we haven't done text processing here, which is why "the" is still present) is converted into a numerical representation, a vector embedding (we'll talk about vector embeddings and vector databases in a bit). You convert the words into sets of numbers, the embeddings, which represent the meaning of the words and capture the relationships. You also add a position number to each word to keep track of the order of the sentence; otherwise the model might forget the order. The position number tells the Transformer where each word lies in the sentence, so it knows the order is "the dog chased the ball" instead of "ball chased the dog"; it shouldn't get scrambled like that. So step one: turn the words into embeddings and give each embedding its position in the sentence.
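Here's a sketch of one common way to do this, the sine/cosine positional encoding from the original Transformer paper; many modern models learn position embeddings instead, so treat this as one option rather than the only one:

```python
import numpy as np

def sinusoidal_positions(seq_len, dim):
    """Classic sine/cosine positional encoding from the Transformer paper."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / dim)
    enc = np.zeros((seq_len, dim))
    enc[:, 0::2] = np.sin(angles[:, 0::2])  # even dims get sine
    enc[:, 1::2] = np.cos(angles[:, 1::2])  # odd dims get cosine
    return enc

# Pretend embeddings for "the dog chased the ball" (5 tokens, 8 dims)
embeddings = np.random.randn(5, 8)

# Adding the position signal gives every token a sense of word order
inputs = embeddings + sinusoidal_positions(5, 8)
print(inputs.shape)  # (5, 8)
```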
Step two is to use attention to find the important words. The Transformer then uses attention to determine which words in the sentence matter most for understanding the meaning, and the attention scores tell the Transformer which words relate most closely to each other. Say the sentence is "The dog, tired after running, chased the ball." Here "dog" and "chased" are directly related, while "tired after running" is extra information, so attention helps the Transformer focus on "dog" and "chased", because they tell the main story. If the question is different, say "what did the dog chase?", then "dog", "chased", and "ball" get the high scores. Every context and every question gets its own attention scores, which tell the model which words are most closely related to what. So when you ask "who chased the ball?", the Transformer knows to look at "dog" and "chased" and ignore the other details, because those words have higher attention scores and are closely related to each other. So: use attention to find the important words, and use the attention scores to know which words relate most closely to each other. Step three is that Transformers have layers, each one refining the model's understanding. Imagine a group of detectives trying to solve a mystery, where each detective specializes in finding certain clues: fingerprints, eyewitnesses,
motives, and so on. You have a group of detectives trying to find specific things to solve a mystery, and every detective in that group has their own specialty, as you see in detective shows. Together they build a full picture of the case: one may be good at fingerprints, one at motives, one at reading certain clues, and together they build a very strong case. In the same way, each layer of a Transformer model adds a new level of understanding, until the model has a complete idea of the sentence's meaning. In the first layer the Transformer might look at basic relationships, like connecting "dog" with "chased", but in the next layer it might understand that "chased the ball" is an action happening because the dog is involved. Every layer of the Transformer improves the model's understanding, better and better; each layer is like a filter that focuses on a different aspect of the sentence. That's step three: the stacked layers refine the model's understanding, so a single layer might only know "dog chased ball", while later layers work out what was chased and how, making it easy for the model to understand the whole context. Now some examples: take Google Translate.
Transformers are heavily used in Google Translate: they understand how words relate across languages, so they can translate phrases accurately even when the sentence structure is different. Transformers also allow chatbots to understand what you're saying and respond naturally: if you type "tell me a joke", the Transformer focuses on the important detail, "joke", responds with something funny, and ignores the unnecessary details. So those are the key steps for understanding Transformers. I hope you're getting the story of how embeddings are useful, how attention helps you build something better, and how Transformers solve specific problems when dealing with text. Next we'll talk a little about text similarity, because that's another major concept. So what is text similarity? It's a way to measure how much two pieces of text are alike or related to each other. Let me tell you a
story. If you remember, I said that attention scores tell you how closely two pieces of text are related, and we kept saying king is to queen as man is to woman: their embeddings are closely related, and you can see it on a plot. But how do you compute this mathematically? I want you to really understand how this happens in code. So, say you have two sentences; each sentence gets tokenized into words and converted into embeddings. Take "I love to play basketball" and "Basketball is my favorite sport". How does text similarity help? Text similarity measures help computers understand this relationship by assigning a score showing how related the two sentences are; you can do the same for word similarity. And if a user asks a question like "What
is your favorite sport?", similarity scores help you locate the text most closely related to that question. So let me quickly tell you about cosine similarity. Cosine similarity gives you a similarity score between two items: say item one is "man" and item two is "woman"; it gives you the score between the two by looking at the angle between their vectors. I don't want to get into heavy mathematical detail; I'd rather you understand it via an example. Say you and your friend are each holding a flashlight. If both flashlights point in the same direction, say north, your paths are similar, so you get a high similarity score. If your flashlights point in different directions, your similarity score drops. That's what cosine similarity does: it measures the angle between two vectors. I hope you know how to plot a vector on a two-dimensional X-Y graph: cosine similarity finds the angle between two text vectors and shows how similar two pieces of text are by comparing the direction of their meanings rather than the exact words. If item one points north and item two also points north, they're similar, facing the same direction; if one vector points north and the other points some other way, they're not similar, and the cosine distance will be high. So if someone asks "tell me about basketball", cosine similarity finds sentences about basketball by comparing the angles of the vectors that represent those sentences: it finds the vectors pointing in a similar direction and returns the relevant results. It considers the overall meaning by looking at direction instead of going through each word: it's not a literal "100% word match" check. Even if two texts use different words but carry a similar overall meaning, their vectors point in the same direction; if the meanings differ, they point in different directions. That's cosine similarity.
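A quick sketch of cosine similarity over sentences, using scikit-learn's TF-IDF vectors in place of learned embeddings (the same cosine formula applies to any vectors; the sentences are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "I love to play basketball",
    "Basketball is my favorite sport",
    "I had pasta for dinner",
]

# Turn each sentence into a vector, then compare directions
vectors = TfidfVectorizer().fit_transform(sentences)
print(cosine_similarity(vectors[0], vectors[1]))  # basketball pair: higher score
print(cosine_similarity(vectors[0], vectors[2]))  # unrelated pair: lower score
```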
Another similarity measure is Jaccard similarity. Say you have two groups of friends: if two people have a lot of friends in common, they get a high similarity score, but if they don't share many friends, their similarity score is low. That's basically what Jaccard does: it compares words directly, measuring the similarity between two sets of words by checking how many words they have in common. For a question like "what is basketball?", Jaccard similarity will match answers that include the same words, like "basketball", "sport", or "game". But it fails on text that uses different words with the same meaning, so Jaccard similarity is best when you want exact word overlap. It compares two sets of words: if the sets share many words, the texts are similar; if not, they're not. It will fail to match different words, say when somebody uses another name for basketball.
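Jaccard is simple enough to write by hand; this small sketch uses the basketball example from above:

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity: |intersection| / |union| of the word sets."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    return len(set_a & set_b) / len(set_a | set_b)

print(jaccard("what is basketball", "basketball is a team sport"))  # some overlap
print(jaccard("what is basketball", "tell me about hoops"))  # 0.0 — same topic,
# different words, so pure word overlap misses the match
```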
Another one is Euclidean distance. Imagine two cities on a map: the shorter the distance between them, the closer they are. Similarly, two sentences that sit close together in the vector space have more related meanings, just like with cosine similarity, but here, instead of finding the angle between the vectors, you find the straight-line distance between them. You've probably already studied Euclidean distance in math; it's pretty easy. So that's another measure for finding the similarity between two sentences or words: you use embeddings, which are vectors that can be plotted, and then use Euclidean distance or cosine similarity to compute an exact score. That's text similarity, covered quickly.
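And a tiny sketch of Euclidean distance between illustrative 2-D vectors:

```python
import numpy as np

# Three illustrative 2-D "sentence vectors"
v1 = np.array([1.0, 2.0])
v2 = np.array([1.5, 2.5])
v3 = np.array([8.0, 0.5])

# Straight-line distance: smaller means the meanings sit closer together
print(np.linalg.norm(v1 - v2))  # small -> similar
print(np.linalg.norm(v1 - v3))  # large -> dissimilar
```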
Now I want to cover something known as information retrieval. We've talked about how computers focus on the relevant words based on context, and how they figure out scores for how similar two pieces of text are. But how will the computer or model retrieve information from a large pool of possible answers? That's where information retrieval comes in. Say you're in a library with millions and millions of books and you want to find a book about space exploration. If you're a non-expert, you'd go through every book to check whether it's about space exploration. But an expert librarian will find it easily, and likewise an information retrieval system finds the information very quickly. Let me take an example and talk about how this works: information retrieval happens in three steps.
The first step is document representation. What does that mean? Document representation converts each document, each set of sentences, into vectors, essentially into embeddings, so that the computer can make sense of it. There are a lot of embedding techniques: one-hot encoding, which carries no semantic meaning, so let's set that aside, Word2Vec, and other embedding techniques we'll learn a bit later. Whatever technique converts the words to numbers, the point is the same: you turn text into data you can search, because an information retrieval system needs to represent each document in a way that makes searching possible, and a raw string of words won't do. Converting text into numbers or vectors makes it easy to compare, sort and rank information. It's like creating a catalog for a library: you summarize each book by its genre, keywords and main topics. That's why I say every chapter has its own main idea: you use attention to find the really important words, convert them into an embedding, and that becomes the representation of the chapter. So document representation captures text as numbers, and those numbers should capture the key concepts. There are different types of representation: TF-IDF, which we'll learn a bit later, and embeddings, which, as I keep saying, convert words into meaningful numbers that tell you how closely related they are. On the slide you can see a small example: "small dog", "cute cat", "cute" are converted into count vectors using a bag-of-words representation, which is exactly the form a computer can search over.
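A minimal sketch of that representation step, assuming scikit-learn is installed; the three toy documents are mine, not from the slide:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["small dog", "cute cat", "cute dog cute"]
vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(docs)      # each row = one document's word-count vector

print(vectorizer.get_feature_names_out())     # ['cat' 'cute' 'dog' 'small']
print(vectors.toarray())
# [[0 0 1 1]    "small dog"
#  [1 1 0 0]    "cute cat"
#  [0 2 1 0]]   "cute dog cute"
```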
The second step is scoring and ranking, which is deciding what's most relevant. Say you search for "planets in our solar system": the IR system scores documents based on how closely they relate to that topic, so a document about Earth and Mars will get a higher score than one about galaxy formation, because it's more relevant. It uses the text similarity scores we just covered, cosine similarity or maybe Euclidean distance, to score each document against the query, and then ranks them. It's that simple: for "planets in our solar system", it compares the query with every document and sees which has the highest similarity score. Then you have something known as indexing. Say you have a book with chapter 1, chapter 2, chapter 3, and within each chapter several sub-chapters: indexing is just like that, a table of contents. If you want information on photosynthesis, you don't flip through every page of the book; you go to the index, find the chapter where photosynthesis is discussed, and search only there. In the same way, instead of computing similarity scores against every single document, every document is assigned to an index, so the system knows, say, exactly where the solar system is discussed. Indexing builds a table of contents for the IR system so it can find documents very quickly instead of going through each and every document and measuring similarity scores, which wastes a hell of a lot of time. Concretely, it creates a list of important words and links them to the relevant documents; whenever a query comes in, the system first consults the index, finds the relevant documents within it, then compares similarity scores among just those and gives you the relevant results.
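Here's a toy inverted index in plain Python; the documents and query are made up for the demo:

```python
from collections import defaultdict

docs = {
    1: "photosynthesis in plants",
    2: "planets in our solar system",
    3: "solar energy and greenhouse gases",
}

# Build the index: word -> ids of the documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

# A query only touches the documents listed under its words; the rest are never scanned.
query = "solar system"
candidates = set().union(*(index.get(w, set()) for w in query.lower().split()))
print(candidates)  # {2, 3} -- only these two would go on to similarity scoring
```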
I want to take one example: Google Search. How does it work? First, Google represents web pages by looking at their content, important words and keywords; if you're from SEO, you know you deliberately put keywords in so Google knows what the article is about and can surface it whenever a matching question is asked. Then Google assigns a score to each page based on how closely it matches your search, and ranks the pages so the most relevant appear first. Finally, Google's index keeps track of which pages cover which topics, making search fast and relevant: if there's machine learning content, data science content, AI content, tech content, it's all indexed, so when a search comes in, Google goes straight to the part of the index that holds the relevant documents. Why am I teaching all of this text similarity and information retrieval? Because when you learn about RAG, it's all retrieval models and generative models, and all of this really helps you understand RAG at a much deeper level; this is where you'll connect the dots of why we studied information retrieval and all the other details.
We've now covered a lot of foundations, which will give you a thorough understanding. Next we'll come to something known as retrieval models. We'll return to information retrieval because there are several ways to do the retrieving, but the process remains the same: representation, scoring and ranking, and indexing; organize, locate and rank is the way to think about it. We'll learn about several retrieval models, because there are several answers to how we represent documents, how we score them, how we rank them and how we index them: traditional retrieval systems and dense retrieval systems, which we'll cover in great detail, and then we'll move on to generative models, then RAG itself, and then the final project we need to build. I hope you're enjoying this; let's catch up. So now let's dive into retrieval models, which are like search engines within a system, within your models.
First, remember the full form of RAG: retrieval augmented generation. We'll break RAG into its two components, retrieval and generation, and talk about each in detail; in this particular module we'll cover retrieval models, which are nothing but a search engine within your database or your system. To understand retrieval models properly, let's first understand the need for them, and only then frame a nice definition. Say you're in a huge library with millions and millions of books and you need information on ancient Egyptian history; it's the same scenario I used when I motivated information retrieval: in a massive library, only an expert can pick the right thing out quickly. So you're in this massive library and you've been asked to write a report on, say, ancient Egyptian history, but instead of a shelf of books on exactly that topic, there are books on everything: ancient Rome, modern science, even cookbooks. The challenge is how to find the exact books that cover ancient Egyptian history without spending hours, sometimes days, flipping through each book to see whether it holds the information you need. That is where retrieval models come into play; that is where they play a crucial role. Think of a retrieval model as a highly trained librarian who understands exactly how to find the information you need, because the collection is organized in a particular way and the librarian knows where each book lives. You ask that librarian for ancient Egypt, and the relevant books are quickly pulled from the shelves and handed to you. That's what retrieval models do: in RAG they act as this kind of librarian, searching through a massive collection, a database, and pinpointing the most relevant pieces of information in the text. If you search "I want to know about ancient Egyptian history", the retrieval model knows exactly where that lives and pulls out exactly that material for you.
If we go a little more into the definition and architecture: a retrieval model helps you locate the right information, providing that data to the generative model so it can answer questions correctly. Any RAG system has two parts, the retrieval model and the generative model. Whenever you ask a question like "I want to know more about Egyptian history", even in GPT-style systems, the retrieval model first retrieves the answer material from an embeddings database, what we call a vector database, which we'll study soon. It pulls the information out of the database of all the data and hands it to the other part, the generative model, which we'll study in a bit; that generative model frames an answer and gives it to you. That's the two-part system: retrieval locates where the information is, and generation turns it into an answer for you. So retrieval ensures that the generative model has the right information to generate useful answers; if the retrieval weren't there, the system might not produce relevant or correct answers. That's the role retrieval models play in RAG. Now, there are several types of retrieval models, and the two types we'll talk about are traditional retrieval models and dense retrieval models. Each type has its own way of locating information, and each works best in different situations. Let's talk about traditional retrieval models first, and then we'll go ahead with dense retrieval models.
A traditional retrieval model is essentially a keyword search. Say you search for "ancient Egyptian history": a traditional retrieval model looks for documents that contain the words "ancient", "Egyptian" and "history", and it only considers documents with those exact keywords. A better way to see it: say you search for "benefits of exercise". The traditional model goes through all the documents, all the data in the database, and returns only those that literally contain the keywords "benefits" and "exercise". A document that discusses the "advantages of physical activity", the same meaning in different words, will not be returned; hold that thought, because it's exactly the gap dense retrieval will fill.
So traditional retrieval is term matching. If you search for "solar energy", it only retrieves documents containing those exact words; even if a document is clearly related to energy, if the words aren't present, it won't be returned. It searches for exact keyword matches to the query, similar to the kind of basic search engine you might build yourself. Take the ancient Egyptian history example again: the retrieval model goes through the entire database and only returns documents containing the keywords "ancient Egyptian history"; it will not produce anything that might be relevant if the words aren't there. The benefit is that the traditional method is fast and simple, because it's just term matching: you pull the documents that share the query's words and you're done. It works well when you need exact matches, and that's it, because it only pulls documents where the exact words appear. The limitation is the flip side: it's bound to the exact words we use. If we ask for "solar energy", it may not find documents that use phrases like "sun-powered energy" or "renewable power from the sun", even though those documents are about the same thing, simply because the query's keywords aren't present in them.
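A tiny sketch of that exact-match behavior and its blind spot (documents invented for the demo):

```python
docs = [
    "solar energy is captured by panels",
    "renewable power from the sun is growing fast",
    "coal plants emit greenhouse gases",
]

def keyword_match(query: str, documents: list[str]) -> list[str]:
    terms = query.lower().split()
    # Keep only documents containing every query term verbatim.
    return [d for d in documents if all(t in d.lower().split() for t in terms)]

print(keyword_match("solar energy", docs))
# ['solar energy is captured by panels'] -- the sun-powered document is missed
# even though it means the same thing, because the literal keywords are absent.
```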
So that's one of the limitations of traditional retrieval: it works well in many cases, but in real-world scenarios it misses documents that are related in meaning but use different words, and that challenge is what gave birth to dense retrieval models. What do dense retrieval models say? Dense retrieval basically identifies meaning instead of words, retrieving information about related concepts even when the wording is different: instead of searching on specific words, it looks at the meaning and concepts of the sentence. So if you search for "benefits of exercise", it may return documents about the "advantages of physical activity" instead of demanding the exact words. How does it do this? By converting text into embeddings; the embeddings carry meanings and relationships, and whatever lies close to the query's embedding gets returned. This is another place where embeddings come into play; dense retrieval is all about embeddings. Dense retrieval is great for finding related ideas: it doesn't rely on exact word matches and can find information that is relevant in meaning even when the wording is different. The downside is that it's more complex and slower than traditional retrieval, because you literally have to compute the embeddings and then find which embeddings lie closest to each other before suggesting anything, so it's more computationally expensive. As for the role of retrieval in RAG: retrieval is essential for grounding responses, feeding relevant data to the generative model so it can produce relevant output; traditional retrieval is for exact facts and definitions, while dense retrieval is for broader context and related ideas. As an exercise, take the search query "how does exercise improve health": traditional retrieval will find direct mentions of "exercise" and "health" and pull that information out, while dense retrieval will also surface phrases like "benefits of physical activity". I hope that makes sense: those are the main types of retrieval models and why we really need them. Now we'll quickly go ahead and learn about each in greater detail, first traditional retrieval models and then dense retrieval models.
So now let's learn about traditional retrieval models. There are two types we'll study today, TF-IDF and BM25, and you'll notice how the problems we face with TF-IDF build up to BM25. These are very much fundamentals: they may not be what you use directly in practice, but they're the building blocks of more advanced retrieval models, and once you notice the key patterns here, you'll understand RAG easily and know its inner workings, which most people don't. I won't repeat why we need retrieval models; instead, let me put a scenario to you, and I want you to think about it. Say you're searching for articles on global warming in a vast collection. The challenge is that not all articles focus on that topic: you might have 100, 200, 1,000 or 2,000 articles from different areas, and only a few mention it, some only briefly. Retrieval models score documents to help identify the most relevant ones very quickly, as we've discussed. In this scenario, a traditional retrieval model depends on keywords to match documents with search terms: it helps by giving each document a score based on how well it matches your search term. So say you have 100 documents and you search for "global warming": the traditional retrieval model scores every document on how well it matches that query; the higher the score, the more likely the document is relevant, and that narrows your search down to the documents that truly focus on global warming. Maybe five documents receive the highest scores; you pull out those five and hand them over. But note that it matches the words, not the meaning: it literally matches the words "global warming" against all the documents out there. These models rely heavily on the keywords in a document matching your search terms. So let's start with something known as TF-IDF, the term frequency-inverse document frequency method. I just said that every document is compared with the search query and given a score; how do we get that score? One way is TF-IDF, a method that scores words based on their importance in an individual document and across all the documents in the collection.
So what does TF-IDF stand for? Term frequency-inverse document frequency, and there are two parts to it: term frequency (TF), which tells you the importance of a particular word within an individual document, and inverse document frequency (IDF), which measures that word across all documents. Let's talk about term frequency first with an example. In an article about global warming, the word "warming" might appear 10 times, so that article gets a high term frequency score for "warming", because the term occurs in that particular document about 10 times. The idea is that words appearing more often in a document are more likely to be important to the document's topic: if an article mentions global warming 10 times, that article is probably about global warming. So TF simply measures the frequency of a word within a document. Now inverse document frequency: common words like "the", "a" and "and" will get low scores, while specific terms like "greenhouse" get higher ones. To understand this better: a document is full of words like "the", "a" and "I", and what IDF does is lower the importance of such common words, while "greenhouse", being a rarer word, gets a higher score. Basically, IDF gives high scores to rare, topic-specific words and low scores to common words; even if "the" appears 100 times, it still gets a very low score. So inverse document frequency measures how rare or common a word is across all documents: rare means a higher score, common means a lower one. Now let's work through an example. The TF-IDF formula is simply TF multiplied by IDF. Take TF for "warming": say the word "warming" appears 10 times in a document of 100 words, so the TF score is 10/100 = 0.1. Now if "warming" appears in only 5 out of 100 documents, its IDF score will be high; remember, TF is for the individual document and IDF is for how rare or common the term is across all the documents. With "warming" in 5 of 100 documents, the IDF comes out to roughly 3 (a logarithm of 100/5), and multiplying TF by IDF gives 0.1 × 3 = 0.3 as the TF-IDF score. That is the score this document gets when you search for "global warming".
Then the TF-IDF score is calculated for every other document in the same way: if a document has zero mentions of "warming", its TF is zero, so its TF-IDF is zero as well; document number three, the same procedure; you calculate TF-IDF for every document out there, collect the scores, and see which are highest. To put it very simply: if a student searches for "global warming", documents with high TF-IDF scores for words like "warming", "emissions" and "greenhouse" are likely to be the most relevant. The system takes your search query and calculates the TF-IDF score of each document against it; if you then issue another query, say "climate change", it goes to the first document, the second, the third, the fourth, the fifth, all the way to the end of whatever documents you have, scores each one, and produces the results. Note that TF-IDF operates purely on words: you're counting the number of mentions within a particular document and the number of mentions across all the documents. That's TF-IDF; it's pretty simple and easy to understand, and if you're having a hard time with it, just recap this part.
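Here's a tiny sketch reproducing that worked example; note that real implementations use several IDF variants (different log bases, smoothing), so treat the natural log here as one reasonable choice:

```python
import math

# "warming" appears 10 times in a 100-word document,
# and shows up in only 5 of the 100 documents in the collection.
tf = 10 / 100              # term frequency within one document
idf = math.log(100 / 5)    # inverse document frequency across the collection, ln(20) ~ 3.0

print(tf, idf, tf * idf)   # 0.1  ~3.0  ~0.3

# A document with zero mentions of "warming" has tf = 0, so its TF-IDF is 0 regardless of IDF.
```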
Now we come to something known as BM25. BM25 is an advanced retrieval model, basically a smarter version of TF-IDF, and it improves on TF-IDF in two important ways: frequency saturation and document length normalization. Let's study both in greater detail. First, frequency saturation. Does a word appearing more often in a document keep increasing its relevance? If a document mentions "warming" 10 times, then yes, it's probably a warming document. But while appearing 5 times is important, appearing 50 times does not make the document proportionally more relevant. BM25 captures this by saturating the score for frequent terms: once "warming" has been mentioned around five times, the score stays near that level, because mentioning it 50 times should not make the document 10 times more important. It's like saying: I know you're capable, you don't need to keep proving it, and proving it further won't help; past a point, the extra mentions are just noise. The second improvement is document length normalization. Long documents naturally contain more words: imagine 10 long documents with thousands of words each and 10 short documents with a few hundred words each. To make sure we aren't overlooking the short documents just because the long ones mention more words, BM25 normalizes for length, much as machine learning uses normalization to keep things comparable, so every document gets its fair share of importance. So BM25 addresses two problems: frequency saturation, so that 50 mentions of "warming" don't count as 10 times more relevant than 5 mentions, and length normalization, so the scores aren't biased toward longer documents.
BM25 does this through two important parameters: k1, which we call the term frequency saturation parameter, and b, the document length normalization parameter. What is k1? k1 controls the impact of frequency, and b adjusts for document length to avoid bias, so that even shorter documents get looked after. Concretely, k1 controls how much the score increases with frequency: if a word appears very frequently, k1 prevents it from unfairly boosting the document's relevance score. To see why that matters, look at what plain TF-IDF would do: if a document contains 100 words and all 100 of them are "warming", TF is a full 1.0, and with an IDF of around 3 you'd get a TF-IDF score of 3, which is very high, even though just writing "warming warming warming" doesn't help anyone. k1 is the parameter that keeps that kind of repetition from unfairly boosting the document's relevance score. The parameter b controls how much document length impacts the score: it prevents a document from scoring too high simply because it contains more words. Between them, BM25 normalizes both the number of times a word is mentioned and the length of the document, so that shorter documents don't get ignored.
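Here's a compact sketch of one common form of the BM25 per-term score, with k1 and b doing exactly the two jobs described above; the numbers are invented to show the saturation effect:

```python
import math

def bm25_term_score(tf, doc_len, avg_len, n_docs, docs_with_term, k1=1.5, b=0.75):
    idf = math.log((n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5) + 1)
    length_norm = k1 * (1 - b + b * doc_len / avg_len)   # b: document length normalization
    return idf * (tf * (k1 + 1)) / (tf + length_norm)    # k1: term frequency saturation

# Frequency saturation in action: 10x the mentions is nowhere near 10x the score.
print(bm25_term_score(tf=5,  doc_len=100, avg_len=100, n_docs=100, docs_with_term=5))  # ~5.6
print(bm25_term_score(tf=50, doc_len=100, avg_len=100, n_docs=100, docs_with_term=5))  # ~7.1
```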
Before the example, I forgot to tell you the full form: BM25 is nothing but Best Matching 25. Now suppose you're searching for "effects of greenhouse gases": with this retrieval model, documents that genuinely cover greenhouse gases in depth, rather than just repeating the words, get ranked higher, whether they happen to be longer or shorter; that's where it improves on TF-IDF. So what's the key difference between TF-IDF and BM25? Say you search for "warming": TF-IDF might give similar scores to all the documents that contain the keyword, but BM25 ranks documents with a more balanced approach, prioritizing relevance rather than sheer length or the raw number of appearances, both of which can bias the results badly. That's the basic difference between the two, and I hope it helps you understand TF-IDF and BM25.
You can see how BM25 builds upon TF-IDF to be more relevant. One key thing I want to mention: when we study dense retrieval models, you'll see how much further they improve on this, because BM25 still has its own limitation. It still works at the level of words; it just extends TF-IDF by saying, "I'll give you somewhat more relevant results than TF-IDF can." Dense retrieval, in turn, says, "Listen, I'll give you the genuinely relevant stuff."
So let's talk about dense retrieval models in greater detail: this is where we go beyond keywords to understand meaning. I think this is my favorite topic; whenever I teach it in class, it's something I genuinely love to teach, because when I was learning RAG myself, I watched a lot of YouTube tutorials and read a lot of blogs, and nobody really told me the inner workings behind the systems we actually use; this is that inner working. And because I always believe in presenting a scenario to my students first, so we see why we really need a thing, I want you to think about a problem. Say you're studying for a science quiz and you want to learn about ways to stay healthy, so you start searching for articles or notes on the topic. Some documents contain exactly "ways to stay healthy", while others use phrases like "tips for a healthy lifestyle" or "steps to improve well-being". These phrases mean exactly the same thing, but they don't contain the exact words, so traditional retrieval models fail here: the words in the documents are very different even though the meaning is the same. Notice that when you search on Google, you do find documents that are relevant without matching your exact words; that's the behavior we want. A traditional model, as I said, will only find documents that match the exact words "ways to stay healthy", but dense retrieval models are different: they look for related meanings and can recognize that "healthy lifestyle" or "improving well-being" are also relevant. So the challenge is that an exact-word search for "ways to stay healthy" can't bridge those phrasings, and the solution is dense retrieval models, which look for related meanings and can recognize them. That ability to understand meaning comes from embeddings, because embeddings are vectors with relationships: a word or phrase converted into a vector becomes like a code that captures the core ideas behind the words and phrases.
This brings us to something known as DPR, Dense Passage Retrieval. In the traditional family we had TF-IDF and BM25; DPR is the dense retrieval model we'll study throughout. DPR is designed to find information based on meaning: it tries to match on meaning instead of relying on exact word matches. How does DPR work in practice? DPR-style dense retrieval is used today in huge systems: question-answering systems and search engines, and the big players use dense passage retrieval in exactly these kinds of applications. The mechanics: when you search for "healthy lifestyle", the query is represented as an embedding, a vector; your documents are also represented as embeddings; the question embedding is then compared against every document embedding using a similarity measure, cosine similarity or Euclidean distance, giving each pair a score; and whichever documents have the highest scores get recommended. It's as simple as that. I won't re-derive the why, because we've covered that at length; I want to talk about how it works inside. DPR creates embeddings for both queries and passages (documents): if you ask a question about the benefits of exercise, an embedding is created for that question, every document has its own embedding, and DPR finds the most similar embeddings for retrieval, comparing the embeddings and identifying the documents closest in meaning to the query. And once more, since it matters here: what are embeddings? Embeddings are dense vectors, sets of numbers, that represent the meaning of a text, the meaning of the words. Say you ask about "benefits of exercise" and a document talks about the "positive effects of physical activity": the words differ, but the meaning is the same, so the embeddings for these phrases will lie close to each other because they share a similar meaning.
Embeddings are so important because they let the dense model understand and retrieve conceptually related information based on meaning, not on exact word matches. Architecturally, DPR uses something known as a dual encoder. What is a dual encoder architecture? As I said, DPR creates embeddings for both sides, so there are two encoders: one encoder for the question and one encoder for the passages. Each encoder transforms its text into an embedding; there are several types of encoders you could use here. If you ask a question, it passes through one encoder, which creates an embedding; the passages pass through the other encoder, which creates their embeddings. Why not use a single encoder for both? Because using two encoders allows the model to treat questions and documents differently, and then you can still go ahead and compare the results in the same space. So we give them different names: the query encoder creates an embedding for the search question, and the passage encoder creates embeddings for all the documents, and then DPR compares the embeddings to find the documents closest to the question's meaning. Say you have the query "how does physical activity benefit health": the query encoder turns it into an embedding first, the passage encoder has embeddings for all the candidate documents, and DPR identifies the document that best matches the question by comparing the embeddings.
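A minimal sketch using the publicly released DPR checkpoints in Hugging Face's transformers library (assuming transformers and torch are installed and the facebook/dpr-* models can be downloaded); the question and passages are invented:

```python
import torch
from transformers import (DPRContextEncoder, DPRContextEncoderTokenizer,
                          DPRQuestionEncoder, DPRQuestionEncoderTokenizer)

# Two separate encoders: one for questions, one for passages.
q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
p_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
p_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

passages = ["Regular physical activity improves heart health and mood.",
            "The Great Pyramid was built in ancient Egypt."]

with torch.no_grad():
    q_emb = q_enc(**q_tok("how does physical activity benefit health",
                          return_tensors="pt")).pooler_output
    p_emb = p_enc(**p_tok(passages, return_tensors="pt", padding=True)).pooler_output

scores = (q_emb @ p_emb.T).squeeze()   # DPR scores query-passage pairs with a dot product
print(scores)                          # the exercise passage should score higher
```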
So how is this dual-encoder architecture trained so that it gives the right answers, the relevant information? To train DPR, the model learns from both positive and negative examples; you can also say it learns from examples of relevant and irrelevant query-document pairs. In the training process, a positive example pairs a query with a document that truly matches, like "what are the benefits of exercise" with a document about exercise benefits, and a negative example pairs a query with a document that doesn't match. You use both to train DPR, and it's as simple as that: positive examples help DPR learn what a good match looks like, while negative examples teach it what is not relevant. It's just like how a human learns: you see positive and negative cases and figure out what to do and what not to do; this way DPR learns to recognize relevant documents and ignore irrelevant ones, because it has been trained on both, with nothing vague in between. As for how DPR learns relevance: matching pairs strengthen the connection between related questions and passages, non-matching pairs weaken the connections between unrelated ones, and this steadily refines the model's ability to find and prioritize relevant information accurately. One note on terminology: I haven't gone into the full encoder-decoder architecture here, where an encoder turns words into representations and a decoder produces the output; we'll talk about decoders a lot in the generative section, because right now we're only encoding, and encoding is what retrieval needs: turning words into embeddings. So dense retrieval, in summary: when you search for something, an embedding is created for your query, you have embeddings for all the candidate documents, you match them, and training happens by giving the model positive and negative examples so it knows how to find and prioritize relevant information accurately. I hope dense retrieval models make sense.
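Below is a toy version of that training signal using in-batch negatives, the trick the original DPR paper relies on; random tensors stand in for the two encoders' outputs:

```python
import torch
import torch.nn.functional as F

# Pretend batch of 4 (query, positive passage) pairs: row i of q matches row i of p,
# and every other passage in the batch serves as a negative for query i.
batch, dim = 4, 8
q = torch.randn(batch, dim)             # stand-in for query-encoder outputs
p = torch.randn(batch, dim)             # stand-in for passage-encoder outputs

scores = q @ p.T                        # (batch, batch) similarity matrix
labels = torch.arange(batch)            # the correct passage for query i sits at column i
loss = F.cross_entropy(scores, labels)  # pulls positive pairs together, pushes negatives apart
print(loss.item())
```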
Now what we'll do is talk a little about vector databases; there are a lot of them, and we'll use something known as PG Vector Scale, all of which we'll cover in great detail. Then we'll jump to generative models, which answer the broader question: even once you've found the documents, now what? That's where the generative model, the decoder in this story, comes into play. So hey everyone, now we'll go ahead and talk about generative models. The generative model is another building block of your whole RAG structure: in the last couple of modules we talked a lot about retrieval models and how they retrieve using embeddings, so now it's time for the generative model.
To better understand generative models, let's first look at the limitations of retrieval models: the problem that gave rise to generative models, and how generative models overcome it. A retrieval model, as we've seen, scans your documents and, whenever you search a query, gives you the relevant information: the documents or sentences that match your query. If you ask "what is photosynthesis", a traditional retrieval system takes "photosynthesis", matches it against your documents, and pulls out the pieces that contain that word. So say you search for "photosynthesis process": you get all the pieces of information that contain the words "photosynthesis" and "process". What really comes out of a retrieval model is a list of sentences or phrases that contain the right words. Suppose the model returns three sentences, and please read them carefully. First: "Photosynthesis is a process used by plants to make food." Second: "It involves sunlight, carbon dioxide and water." Third: "The plant produces glucose and oxygen as a result." Now if somebody asks "tell me about the photosynthesis process", these three retrieved sentences hold very valuable information, but they don't clearly communicate a response. An answer should be smooth and coherent, one complete answer in one shot; if somebody asks you what photosynthesis is, you wouldn't answer with three blunt, unstructured lines, you'd give a clear, coherent, flowing response. So the actual problem with retrieval models is this: you get a list of sentences or phrases that contain the right information, but not in a form that can be handed to the user as a clear, crisp, smooth response. That's where the generative model comes into play. Let me give one more example here: imagine your teacher gives you several pieces of information on photosynthesis from different sources and asks you to write a paragraph explaining it. You'd take all that information, use your understanding and your command of the language, pick out the important parts, and write something that flows; you might start with "Photosynthesis is a process used by plants to make food; it involves sunlight, carbon dioxide and water, and produces glucose and oxygen as a result." Given several sources, you don't just copy them out and push them together; you craft a crisp paragraph that explains them. Retrieval models, though, can only hand over the information, and that's where the foundation of generative models starts: they help make information clear and complete. As you can see on the slide, the issue with retrieval models is that they can retrieve the relevant sentences, but those sentences often lack coherence when pieced together, that is, when combined into one flowing answer.
Users typically want a complete, understandable answer without needing to reorganize the information and assemble it on their own. So how does a generative model create that; what is a generative model in this context? A generative model generates new sentences by predicting one word at a time, based on the words that came before. That phrasing can be confusing, so let's unpack it. Say the text so far is "photosynthesis is the process by which plants use sunlight and": the generative model predicts the next words, "carbon dioxide", based on the previous words. If the last meaningful word was "sunlight", the next should be "carbon dioxide", because that's what the context about photosynthesis calls for; then comes "water", again predicted by looking at how photosynthesis actually proceeds. So, said cleanly: generative models generate new sentences by predicting one word at a time from the words that come before, which is exactly what makes the final answer complete, coherent and smooth; at every step the model knows what should come next given what came before. You can think of them as talented writers who take different pieces of information and turn them into a story, into an understandable answer. As the slide puts it, their purpose is to build sentences word by word into natural, coherent answers by taking in the information from the retrieval model; the process is that the model predicts each word sequentially, creating a smooth, logical sentence: if it has said "photosynthesis is the process", it goes on to predict sunlight, carbon dioxide and water, because it has been talking about the photosynthesis process.
To keep the output smooth, clear and logical, the model really does predict the next word sequentially, in order. The next question is how this solves the three challenges of incoherent, incomplete and non-adaptable answers. First, coherence: generative models connect the pieces of information into full, flowing sentences that are easy to understand; that's the first way they help. Second, completeness: they build a response that includes all the necessary details without you needing to piece together separate sentences, so you get a complete answer without gaps. Third, adaptability: they can answer questions on different topics or in different styles. For instance, if you ask "can you explain photosynthesis in the simplest terms", a retrieval model would hand you several responses that may not be simple at all, but a generative model that has been asked for simplicity will phrase the answer simply instead of using harsh language and jargon. Try it: ask a model to "explain generative models like I'm a fifth grader" and it will explain at a fifth grader's level; ask it to explain like you're a PhD student and it will explain at that level instead. The retrieved information is the same, but the way the model adapts it to your question is what makes generative models so powerful. And there are several generative models: GPT, then T5, then BART, and many more, all of which can frame answers at that level, adapting them while keeping completeness and coherence. So how do they work?
I'm not going to go too deep into the mathematical details, but I want you to know what's happening whenever you use GPT and its relatives. First, a generative model uses attention mechanisms to focus on the most relevant parts of the retrieved information: if somebody asks about photosynthesis, the attention mechanism concentrates on the important words and ignores irrelevant details, so when the model sees the word "photosynthesis" it pays more attention to related terms like "plants", "sunlight" and "glucose" while generating the next words, which keeps the output relevant. Second, once the model knows what the question is about, it predicts one word at a time to form sentences that are coherent, cohesive and matched to the desired tone; it's like writing a story, building on each word to create meaningful sentences. If you ask it to tell you about photosynthesis, it predicts the first word, then the second, third, fourth, fifth, making a continuous flow out of the retrieved information. Third, decoding the response: the model keeps predicting words until it completes a sentence or paragraph that fully answers the question; it keeps predicting the next word until, in effect, it decides "I'm done", and this is what we call decoding: generating a complete, coherent response by adding words one by one.
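Here's a quick way to watch that word-by-word decoding happen, assuming the transformers library is installed; GPT-2 is just a small stand-in generator here, so don't expect a factually careful answer:

```python
from transformers import pipeline

# Greedy decoding: each new token is the model's single best guess given everything so far.
generator = pipeline("text-generation", model="gpt2")
out = generator("Photosynthesis is the process by which plants",
                max_new_tokens=20, do_sample=False)
print(out[0]["generated_text"])
```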
So the way a generative model works: it uses attention to focus on the important details and terms in the retrieved information, generates one word at a time based on the question, and keeps generating until there's a full stop, until your answer is complete. I hope that makes sense. Now a very quick note on the difference between retrieval models and generative models: a retrieval model finds existing information, while a generative model creates new sentences to explain that information. You might find plenty of information, you can use Google for that too, but generative models create new sentences out of the retrieved information: retrieval results are often disjointed, like a list of items, while generative output is smooth, connected and flows naturally, delivering full and clear answers. I hope generative models make sense. Now what we'll do is work through a simple example combining retrieval models and generative models, and see how a system gets built on top of both.
If you didn't fully follow the last lecture, don't worry: this example will help you understand it much better, and I want you to study it, because it's pretty important. The example includes both the retrieval and the generative step. Whenever a question is asked, the model first goes through the retrieval phase to find relevant content in a large database of text. Step one is transforming the question: whenever the question comes in, it is transformed into a vector. Say somebody asks "how does photosynthesis work?": that question is represented as a vector, with embeddings. Once the question has been converted into embeddings, we use a vector database; we've already talked about this, and we use PG Vector. This is a database that stores the vectors representing key information on different topics, such as photosynthesis, climate change, the water cycle and so on. I won't go into the details of vector databases right now, since we've covered the idea and will dig in properly during the project, but picture a database holding lots of information on many different topics, in a format that makes it easy to locate. Step two is finding similar vectors in the database: the vector representation of the question is compared to the vectors stored in the database using a similarity measure; it might use cosine similarity, it might use Euclidean distance, to find the closest matches. Step three is the similarity matching itself: you get back a bunch of retrieved information, the pieces that are most relevant according to the similarity scores. For the photosynthesis question, the retrieved phrases might include "a process plants use to make food" and "involves sunlight, CO2 and water", plus other chunks, because those got the top similarity scores when the question embedding was compared against all the documents. This step narrows the vast amount of data down to just the information most likely to help answer the question that was asked, which was about photosynthesis.
once you have this retrieved information then the generative models comes comes into the play and generative model takes over a task like a GPT or probably U some some T5 or some other generative model which creates a well and cleared and structured response the first step is it understands the retrieved information so the generative model receiv the question and the retrieve content as a context so that it reads through this content to understand what points have been provided so basically first of all it understands so whenever you get a fact and if if here to
make this fact to a paragraph and a flowing sentences you will first of all understand the full context of it and understand the key points and then the next step is generate a coherent answer using attention mechanism so this model uses attention mechanism which means focusing on important words within the retrieved information you are these two information which are retrieved it focuses on those important it uses the attention mechanism to focus on important words which is in the into to this case uh photosynthesis is sunlight and glucose which pays more attention to these words to
Then, step three: it predicts each word in the response one at a time. Once it starts generating, it predicts what should come next based on everything before it, step by step; this process is known as decoding. Let me take an example. Say the generator has produced "photosynthesis is the process by which plants...". The model predicts "use sunlight" as the next words, because based on the previous context that's what should come next; then it predicts "carbon dioxide", then "glucose", and it continues like that to form a full, complete sentence. It keeps predicting until the decoding process is done.
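Here's a toy illustration of that decoding loop. The lookup table below is a fake stand-in for the model's next-word prediction (a real LLM computes a probability distribution over its whole vocabulary using attention), but the greedy, one-word-at-a-time loop is the same idea:

```python
# Fake "next word" table: given the last few words of context, return the
# most likely continuation. Purely illustrative; a real model learns this.
FAKE_NEXT_WORD = {
    ("photosynthesis", "is", "the", "process"): "by",
    ("is", "the", "process", "by"): "which",
    ("the", "process", "by", "which"): "plants",
    ("process", "by", "which", "plants"): "use",
    ("by", "which", "plants", "use"): "sunlight",
    ("which", "plants", "use", "sunlight"): "<end>",
}

def greedy_decode(prompt_words, max_steps=10):
    words = list(prompt_words)
    for _ in range(max_steps):
        context = tuple(words[-4:])       # condition on the last 4 words
        word = FAKE_NEXT_WORD.get(context, "<end>")
        if word == "<end>":               # a stop token ends decoding
            break
        words.append(word)
    return " ".join(words)

print(greedy_decode(["photosynthesis", "is", "the", "process"]))
# -> "photosynthesis is the process by which plants use sunlight"
```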
So, to recap the three main mechanisms: the attention mechanism focuses on the retrieved information to build relevance into the answer; the model predicts each word from the context, forming the full sentence; and it continues predicting until the whole response is generated. I hope that makes sense. Now let me summarize retrieval and generation into two steps. Whenever somebody asks "how does photosynthesis work?", the retrieval model comes into play: it converts the question into a vector, searches the vector database for vectors similar to the question, that is, it compares the question vector against all the vectors in the database, and retrieves the relevant information using similarity scores. Once you have the relevant information, it goes to the generation step: the generative model reads the retrieved content to fully understand it, uses the attention mechanism to focus on what's really important to ensure accuracy, predicts the words sequentially, generating the answer one word at a time based on the retrieved facts, and keeps going, like a decoding process, until the sentence is complete. That makes up the full RAG structure and architecture. I've seen so many people just jump straight into RAG and explain what it is, but we went through the problems one by one: this was the problem, and this is what was invented to solve it.
So I hope it's now very clear why we studied information retrieval, why we studied embeddings, why we studied vector databases, why we studied retrieval methods and all the rest. Next, we'll talk a little more about retrieval models and about RAG, the RAG architecture. Technically you already know everything you need about RAG in theory, but now we'll go a bit deeper into why these systems are so effective, what their benefits are, and why we really need them. We'll revisit the same ideas, but this time I'll give you the formal definitions and a more formal understanding of what RAG is. We'll finish RAG very quickly and then jump to our very first project, and one of the most advanced ones, Lexi Chat; I hope that excites you. Now let's talk about the RAG architecture. It's nothing but combining retrieval and generative models, and that's where the name comes from: retrieval-augmented generation, because we augment the generative model with a retrieval step. The RAG architecture is an approach that combines these two types of AI models, retrieval models and generative models, and this combination gives AI a powerful technique for answering questions, explaining topics, or summarizing information in a way that is both accurate and easy to understand: accurate thanks to the retrieval model, easy to understand thanks to the generative model. We've done a lot of theory, and I want this to be the last theory stretch before we move into practice, because the practical side is what really matters here. But let me take a moment to tell you again what retrieval models do, how the whole RAG architecture fits together, and why there's a need for it. The first question that should come to mind is: why do we need to combine them at all?
At this point I probably don't need to tell you why, and I hope you already have the answer. But suppose you're preparing for a science quiz and want to study photosynthesis. Instead of flipping through every page of your book, you'd prefer a clear, direct answer. So you'd search Google, first find accurate information from trusted sources, and then use those sources to explain it clearly, and that's exactly what retrieval and generative models do together. The capability of retrieval models is finding relevant information from a database or a collection of documents; their problem is that they struggle to produce responses that are coherent, complete and adapted to the question, because all they give you is a list of ideas, phrases and sentences. The generative model then takes all of those sentences, absorbs the context, and crafts a fully connected, grounded answer. And if you don't feed the generative model the retrieved information, its output may lack accuracy, because accuracy is defined by how well the information was extracted; the generative model's job is to turn that extracted information into a well-crafted answer. So what are the benefits of combining both? First, factual accuracy: the retrieval model supplies correct answers from trusted sources, grounding the response. Then the generative model organizes those facts into a smooth, readable answer, adapting it to the tone of the question. Together they deliver responses that are both correct and easy to understand, and that's what makes a retrieval-augmented generation system. So, as I said, there are two systems here, the retrieval model and the generative model, and they combine to produce answers that are factually correct and smoothly articulated.
Now, if the term is retrieval-augmented generation, why the word "augmented"? Here's the reason. Without retrieval, a generative model alone can only guess answers based on its prior training, and the training data of GPT-style models may be quite outdated. Say you want to build something like a chatbot that needs to know your previous chats in order to answer the next message: GPT was never trained on those chats, so it can only guess. But if you tell it, "here is the database you should pick my answer from", that's where retrieval comes in; you say, these are my chats, retrieve the information from them, and only then answer me. Without retrieval the model can only guess, because training doesn't happen in real time; it's all based on prior training, so it may not guess correctly. With retrieval, the generative model has specific, relevant information to work from, which greatly increases the accuracy of its responses. So what is the augmentation effect? Simply that the retrieval step grounds the generative model, making the final answer both correct and clear; that's why the word "augmented" is there.
Now one more example: "what are the effects of global warming?" The retrieval model returns a set of relevant phrases, and the generative model applies the three steps, the attention mechanism, predicting one word at a time, and the decoding process, to turn those phrases into a final, coherent response. So what does the workflow of a simple RAG pipeline look like in action? We'll implement this when we design Lexi Chat, but here's the flow. The user asks "what happens during photosynthesis?". The system converts the question into a vector, compares it against the vectors already stored in the database, and takes the top similarity scores. It then feeds the retrieved information, along with the question, to the generative model as context, and the generative model uses that retrieved data plus the question to create a clear, complete, easy-to-understand answer. That is the combined RAG system. I hope the whole picture makes sense now, how combining the two models helps, and that you understand the essence of the word "augmented". Here's that whole flow sketched in code just below.
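This is a rough sketch of the flow, assuming hypothetical helpers embed(), search_vector_db() and generate() that stand in for the embedding model, the pgvector query, and the chat model we wire up later in the project:

```python
# A minimal sketch of the whole RAG flow. embed(), search_vector_db() and
# generate() are hypothetical placeholders, not a real library's API.
def answer_with_rag(question: str) -> str:
    question_vector = embed(question)                    # 1. question -> embedding
    chunks = search_vector_db(question_vector, top_n=3)  # 2. similarity search
    context = "\n".join(chunks)                          # 3. retrieved facts as context
    prompt = (f"Using the following information:\n{context}\n"
              f"Answer the question: {question}")
    return generate(prompt)                              # 4. generative model writes the answer
```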
From here on we'll focus on the practical aspects of implementing it: we'll talk about pgvector and vector databases, and then move on to our final module, the project, which will help you learn even more. Looking forward to it. Hey everyone, now we come to our first basic project, where we'll implement an interactive Q&A system using retrieval-augmented generation. This is a very basic project; I want you to understand the essence of what RAG looks like step by step, so that when we get to Lexi Chat and build that agentic system that chats with a document using its entire context, you'll easily be able to follow that major project. Before that, I want to make sure you already know some of the pieces, the individual processes, and the whole pipeline of how RAG looks in action, and that you have a solid understanding of how Q&A systems are built. So this is an interactive Q&A system: you ask a question and, based on the available information, it yields an answer. You give it a bunch of documents, and when you ask a question it answers from those documents. The first thing to cover is what this Q&A system looks like: we'll take a small set of sample texts acting as our knowledge base.
Step one is document chunking. If you recall our earlier conversation: you convert text into words, phrases, or chunks so that computers and models can understand it well. In the same way, we convert a bunch of text into chunks. I've already done that here, but in a real-world system where somebody uploads a PDF, step one is converting that PDF into manageable, understandable chunks that can be fed to the models, instead of handing them one giant wall of text. Next: once the chunks exist, computers still can't really understand raw text, so each chunk has to be converted into an embedding. There are three chunks in this project, and each has been converted into an embedding of 1536 dimensions. Our earlier toy examples were tiny, but here there are 1536 dimensions. What does that really mean? The text has been converted into a vector with 1536 dimensions, meaning there are 1536 ways of representing aspects of this text. Yes, you heard me right. If you remember machine learning: a dataset might have 14 features, 14 independent variables, and those are 14 dimensions, where every feature represents something. In the same way, every dimension of the embedding represents something about this chunk. The embedding is generated by one of the best embedding models from OpenAI; you don't have to train it yourself, it's already available, so you just use it. We won't go into the intricate details of how these embeddings are trained; that's a separate process to learn. Basically, each embedding has 1536 dimensions, and every dimension captures some part of, or relationship between, the words. The same goes for the second chunk, and the third.
Once we have the embeddings, we ask a question, say, "what is ML?". The system converts the question into an embedding too, because, if you remember our discussion, the retrieval process is chunking plus generating embeddings; then, when a question comes in, it uses the question's embedding and a similarity search to find the relevant context, the relevant chunks, for that question. Here you'll see three identical results, simply because I've run the same pipeline against the same database several times, so it returns three copies of the same chunks. Once the relevant chunks are retrieved, the generative part kicks in: an OpenAI generative model uses those chunks to produce a cohesive, concise answer. So it says "based on the information provided, ML is a subset of AI that enables...", drawn straight from the retrieved text but phrased concisely. You can increase the number of chunks and watch how the answer changes. That's what we'll build: very simple, a retrieval model plus a generative model, combined into the final retrieval-augmented generation system.
Now let's get into the implementation. I hope you've already installed PostgreSQL, set up a database, and installed pgvector as an extension, because we'll be using pgvector to implement several parts of this and to query the database. You might see a lot of tables in my database that you won't have; those get created in the next project, since I'm reusing the same database. First, I create and enable the pgvector extension. You should have installed it already, so if you run this command it says "pgvector already exists, skipping"; either way, the extension is enabled, and you'll soon see where we use it.
Next, we need somewhere to store the embeddings, because when a question comes in it will be converted into an embedding and compared against the embedding of every chunk, so those chunk embeddings have to live somewhere. We create a table called document_chunks with an id column, a content column, which holds the chunk text, and an embedding column. The embedding column uses one of pgvector's data types, and this is exactly what makes pgvector so powerful and efficient, because plain SQL does not accept "vector" as a data type. You declare it as vector(1536), so the embedding column of this table holds a 1536-dimensional vector.
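In SQL, that setup looks roughly like this (the table and column names follow the walkthrough; adjust to your own naming):

```sql
-- Enable pgvector (safe to re-run: IF NOT EXISTS skips it when installed)
CREATE EXTENSION IF NOT EXISTS vector;

-- One row per chunk: the raw text plus its 1536-dimensional embedding.
-- vector(1536) is the column type pgvector adds on top of plain SQL types.
CREATE TABLE IF NOT EXISTS document_chunks (
    id        SERIAL PRIMARY KEY,
    content   TEXT NOT NULL,
    embedding VECTOR(1536)
);
```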
I want you to see what this actually looks like, so let's open document_chunks. Because I've run the pipeline several times, the same rows were added again and again, but look here: "artificial intelligence is the simulation..." is one of the chunks, that's the content, and next to it is its 1536-dimensional embedding. If you tried to handle 1536-dimensional vectors manually, you'd need specialized operations to do it efficiently: correct indexing, similarity search and so on. pgvector is what makes similarity search and those other important operations work well. So in the embedding column you have the 1536-dimensional vector for each chunk. Perfect. We'll get to how these embeddings are created in code, but this is what the table looks like: content and embedding side by side. Once you've created the table, that's enough on the database side.
Now we need to set up some variables and connections. First, let me clear out the old data so we start fresh; rather than deleting rows, I'll simply create a new table, document_chunks_new, and discard the other changes. Now you can see the database is completely empty, and I'll show you how to fill it, walking through the whole procedure step by step. So let's start working on the project. The first step is bringing in the necessary imports, all the libraries this project needs.
The first library we import is openai, which we'll use extensively: it helps us generate embeddings and also generate responses using the models. Then we import psycopg2, which lets us connect to and interact with the PostgreSQL database we just created, since this system has to talk to it. Then we use streamlit, because we'll build a small dashboard, plus register_adapter and os for handling values and environment variables. os lets us load environment variables, because there are keys involved: I can't show you my API keys, someone could misuse them, and I can't show my database credentials either. To protect them, I've created a .env file where all my important keys live, and I retrieve them from there. So we load the environment variables from the .env file and use os.getenv, because first you need to configure your OpenAI API key. By the way, if you don't have credits, we've announced a credit giveaway, running until around the 27th or 31st, I think somewhere around that time, where we're giving away $6,000 worth of credits to all interested students, so feel free to fill out that form and you'll get yours ASAP.
You can get your API key and paste it in, though of course I'm importing mine from the .env file, where OPENAI_API_KEY is set to the key. Then I use psycopg2 to connect to my database. You will have received all these details when setting up the database, for example in Table Plus: the database name, the username, the password, the host, and the port it's served on. Then conn.cursor() gives you a cursor on the established connection to this PostgreSQL database; the connection parameters identify the database, and the cursor is what lets you execute instructions, because we need to run several SQL queries to store embeddings, retrieve them, and make use of pgvector.
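Putting that together, a minimal connection-setup sketch might look like this; the .env variable names here are my assumption, so match them to whatever you put in your own .env file:

```python
import os
import psycopg2
import openai
from dotenv import load_dotenv

# Load secrets from the .env file so keys never appear in the code itself
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

# Connection details also come from .env; the variable names are assumptions
conn = psycopg2.connect(
    dbname=os.getenv("DB_NAME"),
    user=os.getenv("DB_USER"),
    password=os.getenv("DB_PASSWORD"),
    host=os.getenv("DB_HOST", "localhost"),
    port=os.getenv("DB_PORT", "5432"),
)
cur = conn.cursor()  # the cursor is what actually executes SQL statements
```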
There is also a pgvector Python library, but for ease we'll use the SQL pgvector extension directly. As I said, I've already turned the source text into a sample of chunks; there are only three, for simplicity. You can add more chunks, or add a whole PDF and define a function that converts the PDF into chunks, which is exactly what we'll do in the next project. Then we use Streamlit to give the page its "Interactive Q&A System" title and so on. The first step is document chunking: we loop over each document with its index, which is why we use the enumerate function, and display every chunk. Next, we generate the embeddings and store them in PostgreSQL, with a user-facing display along the way, because the whole point of this Streamlit app is to show the procedure step by step; that's also why I keep a list of the embeddings to display.
So I go through each doc in the documents list and call openai.Embedding.create with the text-embedding-ada-002 model, passing the document as the input. I pull the embedding out of the response and append it to the embeddings list; that part is just for showcasing. Then, because we need to store each embedding so similarity search is possible, I use the cursor, which holds my connection to PostgreSQL, and execute an INSERT into document_chunks_new with two values: the content, which is the chunk itself, and the embedding of that chunk. The loop then runs again for the next chunk, keeps adding rows, shows everything in Streamlit along the way, and finally commits the embeddings to the database with conn.commit(). Here's that loop in compact form.
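A sketch of the embed-and-store loop, using the legacy openai SDK style the course code follows (newer SDK versions use client.embeddings.create instead); the sample texts are made up, and the embedding is passed as a string with an explicit ::vector cast so pgvector can parse it:

```python
documents = [
    "Artificial intelligence is the simulation of human intelligence by machines.",
    "Machine learning is a subset of AI that enables systems to learn from data.",
    "Deep learning uses neural networks with many layers.",
]

embeddings = []
for doc in documents:
    # Legacy openai SDK call, matching the course code
    response = openai.Embedding.create(model="text-embedding-ada-002", input=doc)
    embedding = response["data"][0]["embedding"]   # a list of 1536 floats
    embeddings.append(embedding)                   # kept only for display
    # Store both the raw chunk and its vector so similarity search runs in SQL
    cur.execute(
        "INSERT INTO document_chunks_new (content, embedding) VALUES (%s, %s::vector)",
        (doc, str(embedding)),
    )
conn.commit()  # persist all inserted rows
```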
Now for retrieval: first we need a question, and we decide how many chunks to retrieve, three, for simplicity; if you add more documents you'll have more to draw from. The system then generates the embedding for the question: we use the same embedding model to embed it, then extract the embedding from the response by going into data, taking the zeroth index, and reading the embedding field, since the response carries multiple outputs. Then we query the database, and this is where another strength of pgvector shows up. The embedding column is a vector column, which is why we could insert a 1536-dimensional vector into it in the first place, and pgvector also gives you a similarity operator for fetching the top matches for your question. So we say: SELECT the content column FROM document_chunks_new, ORDER BY the embedding's similarity to the question embedding using the pgvector similarity operator, and LIMIT it to the top three, top four, top one, whatever you want. The %s placeholder is where the question embedding gets substituted in, so the database compares every stored embedding against the question embedding and surfaces the most relevant chunks. That's the power here: it does all of this very quickly, with a single pgvector operator.
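A sketch of that retrieval function, again in the legacy SDK style; the <=> operator is pgvector's cosine-distance operator, so ascending order puts the closest chunks first:

```python
def get_relevant_chunks(question: str, top_n: int = 3) -> list[str]:
    # Embed the question with the same model used for the document chunks;
    # vectors are only comparable within the same embedding space.
    response = openai.Embedding.create(model="text-embedding-ada-002",
                                       input=question)
    question_embedding = response["data"][0]["embedding"]

    # <=> computes cosine distance (smaller = more similar),
    # so ORDER BY ... ascending surfaces the closest chunks first
    cur.execute(
        """
        SELECT content
        FROM document_chunks_new
        ORDER BY embedding <=> %s::vector
        LIMIT %s
        """,
        (str(question_embedding), top_n),
    )
    return [row[0] for row in cur.fetchall()]
```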
Then you fetch the results, and whatever comes out is shown in the app: first the embedding of the question, then the relevant chunks. We call the get_relevant_chunks function, pass in the question that was asked, and display the top relevant chunks retrieved for it: one, two, three, however many you asked for. I'll set the default to one, since there are only three chunks anyway and this display is just for learning purposes. Once you have the relevant chunks, the next step is using the generative model to generate an answer.
We use GPT-3.5-turbo on the relevant chunks. First you prepare the context, because you have to hand all the relevant information to the generative model so it can compose an answer from it. The context is nothing but your relevant chunks, the ones retrieved above; note that it's not embeddings, it's the actual text that comes back after taking the top-n matches. Then you write a prompt. The prompt is an essential part of working with a generative model; it's how you communicate with the system, just like writing a good prompt in ChatGPT gets you a better answer. Here we say: "using the following information, answer the question", including both the context and the question, so the model knows: this is the question, this is the context, answer from it. Then you call openai.ChatCompletion.create; we used the embedding endpoint earlier to create embeddings, and now we use OpenAI's generative model, gpt-3.5-turbo, to create the response. In the messages we state, with role "system", that "you are a helpful assistant", which you can greatly improve with prompt engineering, and on the "user" side we pass the prompt. It then gives you an answer.
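A sketch of the generation step, same legacy SDK style, with max_tokens and temperature values chosen just for illustration:

```python
def generate_answer(question: str, relevant_chunks: list[str]) -> str:
    context = "\n".join(relevant_chunks)   # retrieved text, not embeddings
    prompt = (
        f"Using the following information:\n{context}\n\n"
        f"Answer the question: {question}"
    )
    # System message sets the assistant's behaviour;
    # user message carries the instructions plus the retrieved context
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        max_tokens=150,    # cap the length of the reply
        temperature=0.1,   # low randomness -> focused, deterministic answers
    )
    return response["choices"][0]["message"]["content"]
```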
There are a few more parameters, max_tokens, temperature and quite a bit else; feel free to browse the documentation around them, but they're simple. Just to explain what they mean: the messages parameter provides the input to the model in a chat-like format, so the model sees the conversation as a structured series of messages. The system message sets the tone for the assistant's behavior, how it should act, here, like a helpful assistant; it isn't echoed back in the output, but it steers the model. The user message provides the actual question, in this case the prompt, which includes both the instructions and the context needed to get the answer. max_tokens controls the length of the model's response so it doesn't ramble on. And temperature controls the randomness of the response: lower values like 0.1 make it more focused and deterministic, not too random; if you want the model to be more creative, raise the temperature so it produces more creative answers on its own rather than relying strictly on the context.
Finally, you close the connections; as I heard a famous YouTuber say, if you open a fridge, you also have to close it. So you close the cursor and the connection, and that's pretty much it. One more thing before running: set up your .env file and a virtual environment using venv, then activate it with source; mine is named q&a, so all my project dependencies stay intact in that single environment. Now we run the Streamlit Q&A app: save the file, "document chunking started", and here we go. Looking at the database, this is the output: all the chunks were converted into embeddings, then the app found and compared the context, and you can see that document_chunks_new has been populated. I hope that makes sense. I haven't deleted the old table, and there's some duplication because it's been run several times, but that doesn't matter here; I just needed to show you what the chunking process looks like.
So that was a very basic walkthrough of building a Q&A system, and that's what it looks like. Next we'll move to Lexi Chat, where we'll create a system where you can upload a PDF and chat with it using the relevant context inside it; it will even show you where in the document each answer came from. I'll sign off here and see you in the next one. We're going to start with Lexi Chat, and Lexi Chat is nothing but a platform where you upload your documents, any sort of documents, ask questions about them, and keep a conversational context so you can dig deeper with follow-up questions, just like ChatGPT, except it only answers with information available in those PDFs. You can have multiple documents, and you can also have tags; the nice part is that it automatically assigns those tags using an OpenAI model. There's a lot to build here. The first step, if you look at the code, is setting up our databases, in other words, how and where we store our information, and then we'll move on to the other pages. Throughout this project I'm focusing on walking through the code rather than live coding, because I don't believe in live coding for this: too many things go missing, it takes a huge amount of time, and you can understand prepared code much faster; you can always pull the code from GitHub and work with it.
So that's the chat-with-documents app: upload any number of documents, hold a conversation, get references showing where each answer was drawn from, and get tags; there are multiple features embedded in this project, and I'm super excited to get started, so let's begin coding it ASAP. Hey everyone, now we'll talk about setting up the database, one of the most important parts of our project; this code will set up the database schema for us. We'll be using an object-relational mapping (ORM) library in Python that helps us interact with the database we're about to set up, and we'll also use pgvector, because, as I've said, pgvector gives us the ability to store embeddings in the most efficient and reliable way and then use them for semantic search and everything else we've already discussed. So we'll use those two libraries to set up the database.
But before we do, let me walk you through why we actually need a database, using this example. You've already seen these Streamlit apps, so: when you upload a document here, it has to be stored somewhere, right? It can't just be uploaded once and vanish; even if I come back tomorrow and open the system, it should still have this document. That's the first thing: uploads must be persisted, because if the document isn't saved, then when questions are asked against it, nothing can be retrieved. Second, there are the tags, like "business" here. We need to assign tags to each document so categorization and sorting are easy, and we also need to store the already-existing tags you've created, so we can compare them against newly uploaded documents and assign whichever tags fit best; that's why we need the database for the tags too. And of course we need the documents themselves: their names, their related tags, and, most importantly, the embeddings of the chunks created from the documents. That's why we really need a database, and specifically a PostgreSQL database.
We'll be using PostgreSQL, and there are two ways to create the tables: open Table Plus and create them by hand, or write SQL queries; instead, we'll do it the Python way, so everything lives in the project code itself. The first step is initializing a connection to the database, so our code can create new tables in it on request. To connect, we'll use peewee's PostgresqlDatabase class together with environment variables; as I said, we keep all the sensitive values in a .env file, so the code calls os.getenv to read the PostgreSQL database name, followed by the host, port, user and password. I hope you've already set these up; you can simply go to the .env file and change them there, easy. This piece is important for establishing your connection: it centralizes the database configuration, makes it very easy to connect, and acts as an interface so you don't have to click around in a GUI. Here's a minimal sketch of it.
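A minimal sketch of that connection setup with peewee, assuming these .env variable names (match them to your own file):

```python
import os
from dotenv import load_dotenv
from peewee import PostgresqlDatabase

load_dotenv()

# Centralized database handle; every model below attaches to this one object.
# The environment variable names are assumptions.
db = PostgresqlDatabase(
    os.getenv("DB_NAME"),
    host=os.getenv("DB_HOST", "localhost"),
    port=int(os.getenv("DB_PORT", "5432")),
    user=os.getenv("DB_USER"),
    password=os.getenv("DB_PASSWORD"),
)
```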
Now we'll create three models. Understand that we're only creating the database models, the tables, not filling them with anything yet, and we'll define relationships between them; documents and tags, for example, are related, because tags get assigned to documents, so the schema must be designed with that relationship in mind. The first is the documents table, where we store all our documents in the PostgreSQL database. This creates a table called documents in the database we just connected to, using the TextField type imported from peewee. I won't go into too much detail, but it stores the name of the document as text; that's what TextField means, and peewee has an IntegerField and similar types for other data. So here, the document name is a TextField, the table name is "documents", and it's attached to the connected database. This table serves as the main entry point for storing metadata about each uploaded document. If you look at the documents table, the first one I uploaded is "meeting with imoa CEO", so it stores the name of the document, that's it; the content is not stored yet, only the name, and we'll talk about when the content gets stored in a bit.
Next is the tags table, which is also a TextField, since tags are text, and its table name is "tags", attached to the same database. If you open the tags table, you'll see I've created only one tag so far, "business". So the tags table represents the different tags or categories that can be assigned to documents. Now we need to define the relationship, because tags get assigned to documents, and that's a many-to-many relationship. So we create document_tags, another database model, another table, which represents the many-to-many relationship between the documents and tags tables. In it, document_id is a foreign key linking to a document in the documents table; we use peewee's ForeignKeyField, which establishes that relationship. We also set on_delete to CASCADE, which automatically deletes the related tag links when a document is deleted, so if you delete a document, its tag assignments don't remain orphaned. The ForeignKeyField also takes a backref parameter, which allows reverse lookups, fetching all the tags for a particular document, through what we call document_tags.
If you look at document_tags here: the document with ID five is linked to the tag with ID one, and if you check the tags table, ID one is "business". That's what this table does: it links each document to its related tags. I know it might be confusing if you're new to it, but it's basically just linking tags to the related documents, that's it. Likewise, tag_id is a foreign key linking to a tag in the tags table, just as document_id links to the documents table. If you'd like to know more, go read up on ForeignKeyField, but essentially it ensures the table is built so that tag ID and document ID reference each other intelligently. So when you upload a document, the system searches through the tags, and, as you'll see when we talk about suggesting tags from document content, it takes the document ID and the matching tag IDs and adds the corresponding rows to document_tags. That's the document_tags model. The next database model is what we call document_information_chunks.
It represents the chunks of the documents and their embeddings, and that's the really important one. Looking at a row, you'll see the document ID, the chunk text, and the embedding that goes with it: the embedding is a vector, the chunk is text, and the document ID refers back to the documents table, so again there's a relationship between documents and document_information_chunks. I won't repeat why we split documents into chunks, we've covered that in detail: a document gets divided into chunks. Right now all of these tables are empty; you won't see rows until you actually run something. A document ends up with multiple chunks, and every chunk gets its own embedding. The model again uses a ForeignKeyField to establish the relationship, with a backref of document_information_chunks; the foreign key is the document ID, which links each chunk to its parent document, so this chunk belongs to this document, that chunk to that one. Then there's the chunk field, which stores the actual text of the chunk, and the embedding field, a vector field storing the 1536-dimensional vector representation of the chunk. With that, the document_information_chunks model is complete. The VectorField is provided by pgvector's integration for PostgreSQL, and it's what enables efficient similarity searches, the same idea as the vector column we created in the last project. What's new here is that we're linking several tables so they interact with each other.
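Here's a compact sketch of the four models in peewee; the class and field names are my paraphrase of the walkthrough's tables, db is the PostgresqlDatabase handle defined above, and VectorField comes from the pgvector-python package's peewee integration:

```python
from peewee import Model, TextField, ForeignKeyField
from pgvector.peewee import VectorField  # vector column type from pgvector-python

class BaseModel(Model):
    class Meta:
        database = db  # the PostgresqlDatabase handle defined earlier

class Document(BaseModel):
    name = TextField()  # only the document's name/metadata, not its content

class Tag(BaseModel):
    name = TextField()  # e.g. "business"

class DocumentTag(BaseModel):
    # Many-to-many link; deleting a document also removes its tag links
    document = ForeignKeyField(Document, backref="document_tags",
                               on_delete="CASCADE")
    tag = ForeignKeyField(Tag, backref="document_tags", on_delete="CASCADE")

class DocumentInformationChunk(BaseModel):
    document = ForeignKeyField(Document, backref="document_information_chunks",
                               on_delete="CASCADE")
    chunk = TextField()                        # the extracted fact text
    embedding = VectorField(dimensions=1536)   # vector for similarity search

db.connect()
db.create_tables([Document, Tag, DocumentTag, DocumentInformationChunk])
```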
Then you create the tables: db.create_tables connects to the database and creates the necessary tables, documents, tags, document_tags and document_information_chunks, all completely empty at the start. Next comes another interesting function: setting up OpenAI API key management, and that's a major piece.
It configures the OpenAI API key directly inside PostgreSQL so it can be used in queries. When we get to the generative side, you'll see how we call OpenAI straight from a PostgreSQL query with the help of pgai: since we've enabled these extensions, we can generate an embedding or an answer from OpenAI with a one-liner SQL query, no extra application code needed. That's why we set the key up here, reading it from my environment. You can then use it in database queries, like generating embeddings; we'll see how handy that is when we cover retrieval and generation. Storing the API key this way ensures the integration works properly.
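A hedged sketch of that step, assuming the pgai extension is installed and reads its key from the ai.openai_api_key setting, as its docs describe; verify the setting name against your pgai version:

```python
# set_config's third argument (false) applies the setting for the whole
# session rather than a single transaction; pgai then reads the key when
# SQL queries call its OpenAI functions.
db.execute_sql(
    "SELECT set_config('ai.openai_api_key', %s, false)",
    (os.getenv("OPENAI_API_KEY"),),
)
```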
So how does it all come together? The documents table stores metadata about uploaded documents, the ID and the name. The tags table stores the available tags for categorizing documents. The document_tags table links documents to tags, enabling flexible categorization. And the document_information_chunks table stores the document chunks and their embeddings for efficient retrieval: each row holds a chunk, its embedding, and the ID of the related document, and since we've established the relationships, each chunk is tied to its parent document. Perfect; that was it for the databases. You've now created the four tables, connected to the database, and built them in code. Next, we'll talk about how to start filling this database, which brings in the retrieval and generative systems, two of the crucial parts. We'll design Manage Documents first, then Chat with Documents, then Manage Tags, and then we're mostly done. Manage Documents is one of the most important parts, because that's where you actually fill the database when somebody uploads documents: how a document goes through the chunking phase and then the embedding phase. Currently our tables are empty, so we need to fill them with the relevant information whenever someone uploads something here. So let's go ahead and talk about the next major process: how we handle Manage Documents.
It will feel familiar. First, I want you to ignore all the imports; the list might look lengthy right now, but as you go through it again and again you'll naturally come to understand them all, so I won't go into those details now. One small detail: we're using Streamlit, so we call set_page_config with the page title "Manage Documents", as you can see, and the page heading is "Manage Documents" as well. We'll come back to delete_document in a bit: it's a function that deletes a document; you pass the document ID that needs to be deleted, and, using the documents model imported from our database file, it runs a delete on documents where the document's ID matches the one passed in, then executes that statement. It isn't used just yet; I'll return to it later. Now let's talk about the flow itself.
A document can span many pages, and as I said, the first step is converting a PDF into chunks so that computers can understand it and extract meaningful facts from it. So step one is document chunking, and step two is fact extraction from the large document. Once you've split the large document into smaller chunks, you use an AI model, GPT-4o mini or something like it, to convert those smaller chunks into concise, meaningful statements, extracting the facts and key pieces of information and removing whatever is irrelevant. So you're doing two layers of work: the first layer converts the big document into smaller chunks, and the second layer hands each chunk to an AI with a prompt saying, "here is my chunk; extract only the facts and key pieces of information from it". If you look here: this was the bigger chunk, and it's been reduced to only the relevant facts, the important pieces of information. I'll walk through the key parts of the code rather than every single line; if anything is unclear, please use GPT to explain any line of the code.
Now we have something called IDEAL_CHUNK_LENGTH, which defines how long each chunk should be before the fact-extraction step: around 4,000 characters. You can play with that number, but it keeps things manageable, and smaller chunks are easier for the AI to analyze and turn into meaningful insights. To give you an example: a 12,000-character document gets split into three chunks, characters 0 to 3,999, characters 4,000 to 7,999, and characters 8,000 to 11,999.
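That splitting logic is essentially a one-liner; here's a sketch that reproduces the 12,000-character example:

```python
IDEAL_CHUNK_LENGTH = 4_000  # characters per chunk

def split_into_chunks(text: str) -> list[str]:
    # Fixed-size slicing: a 12,000-character document becomes three chunks
    # covering characters 0-3999, 4000-7999 and 8000-11999
    return [text[i:i + IDEAL_CHUNK_LENGTH]
            for i in range(0, len(text), IDEAL_CHUNK_LENGTH)]

chunks = split_into_chunks("x" * 12_000)
print(len(chunks), [len(c) for c in chunks])  # -> 3 [4000, 4000, 4000]
```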
Next, we create a model class for the generated document information, so the list of facts produced by the model can be captured and later inserted into the database, which is what we ultimately want. Then let's look at generate_chunks, the next piece of logic: this function generates facts from a chunk of text by sending it to the OpenAI API, basically handing the GPT model a smaller chunk and getting back the key pieces of information, the facts. We declare it as an async, asynchronous, function, which lets the system perform other tasks, like processing other chunks, while waiting for the API to respond. Sometimes the API is under heavy load, and without async the whole pipeline would sit idle; this way it keeps working on other chunks in the meantime, which improves efficiency and responsiveness when processing many chunks. If you upload ten documents and the first is stalled, it moves on to the second, then the third, instead of just waiting at the first.
Then there's the while-true loop, which means keep trying, up to a total retry count, because network requests can fail for several reasons: rate limits, timeouts, server issues on their end, slow responses. So we add a retry mechanism to ensure temporary failures don't stop the entire process. The loop retries up to five times, with a one-second delay between attempts; if you look here, total_retries goes up to five inside the try/except, waiting at least one second each time. If it still hasn't succeeded after that, it raises "failed to generate facts for the PDF chunk" with the error, so you get a clear failure instead of a silent hang.
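A sketch of that retry wrapper; generate_facts() here is a hypothetical helper standing in for the actual OpenAI call described next:

```python
import asyncio

TOTAL_RETRIES = 5

async def generate_facts_with_retry(pdf_chunk: str) -> list[str]:
    # Transient failures (rate limits, timeouts, server errors) shouldn't
    # kill the whole upload, so try up to 5 times with a 1-second pause.
    for attempt in range(1, TOTAL_RETRIES + 1):
        try:
            # generate_facts() is a hypothetical helper wrapping the API call
            return await generate_facts(pdf_chunk)
        except Exception as error:
            if attempt == TOTAL_RETRIES:
                raise RuntimeError(
                    f"Failed to generate facts for PDF chunk: {error}"
                )
            await asyncio.sleep(1)  # brief delay before the next attempt
```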
If no error occurs, we send the request: we pass the chunk of text taken from the PDF, not the whole PDF, just one chunk of the larger document, and the loop handles the rest one chunk at a time. The OpenAI request uses our chosen model for the processing, and the messages carry two instructions: the role of the system and the role of the user. First you give the system its role: you set up the AI with a task-specific prompt. You'll see I've imported CREATE_FACT_CHUNKS from constants; you have to tell the model what it needs to do, and if you open constants.py, the prompt says: "you are an expert text analyzer who can take any text, analyze it, and create multiple facts from it; output should be strictly in JSON format". That's the system role: to create those facts. Keeping the prompt in the constants file means you don't have to hunt through the code to change it, just edit it in one place, and prompt tuning can work real magic in drastically improving performance.
Next is the user role, where you provide the actual content to work on: it passes the chunk of PDF text as the user input, and the task defined by the system prompt is to create facts from it. Then you have the temperature, which controls randomness; lower values make the output more deterministic, so you deliberately prevent the model from being too creative. The response format is set to a JSON object. Then we validate the response to make sure the API returned valid data: if no facts were generated, it raises a 'no facts generated' error. Otherwise it validates the JSON response using the GeneratedDocumentInformationChunks model we created earlier, calling model_validate_json on the content and extracting the facts from it. It then logs how many facts were produced, something like 'generated 3 facts for PDF text chunk 8', 'generated 3 facts for PDF text chunk 9', which is exactly what you see whenever you upload a document.
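Putting those pieces together, here is a minimal sketch of the call and the validation, assuming the openai 1.x async client, the pydantic model from above, and gpt-4o-mini as the model; the prompt text below is reconstructed from the description, so treat it as illustrative.

```python
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

CREATE_FACT_CHUNKS_SYSTEM_PROMPT = (
    "You are an expert text analyzer who can take any text, analyze it and "
    "create multiple facts from it. Output should be strictly in JSON "
    'format: {"facts": ["fact 1", "fact 2"]}'
)


async def call_openai_for_facts(chunk: str) -> list[str]:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            # System role: the task-specific prompt from constants.py.
            {"role": "system", "content": CREATE_FACT_CHUNKS_SYSTEM_PROMPT},
            # User role: the actual chunk of PDF text to analyze.
            {"role": "user", "content": chunk},
        ],
        temperature=0.1,  # low randomness keeps the output deterministic
        response_format={"type": "json_object"},
    )
    content = response.choices[0].message.content
    if not content:
        raise ValueError("No facts generated for PDF chunk")
    # Validate that the reply really is JSON matching our model.
    return GeneratedDocumentInformationChunks.model_validate_json(content).facts
```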
So it keeps generating those facts for each of the chunks, and as I said, it keeps retrying on failure. That is generate_chunks: it converts the bigger text into concise, factual key pieces of information. Once we have this, we need to assign the tags associated with these documents; that is the next thing to do, and then we will talk about how to combine the tagging with generate_chunks. We also have to convert those chunks into embeddings, because that is what retrieval needs: all the documents are converted into embeddings, so that whenever a question is asked, the question's embedding is matched against the document embeddings using similarity search. So let's get to the next part.
Once the PDF-to-chunks piece is complete, we also have a feature where each document gets tags attached to it. For example, here I have created a tag called 'business', and I can create different tags too. When I upload a document, say a Memorandum of Association, it automatically gets the 'business' tag; if I create a 'food' tag and then upload something about food, the 'food' tag gets associated with that document. That is why we need to generate tags. So the first thing we did was convert the plain PDF into chunks, and now we want to assign the right tags to each document. That is what this function, get_matching_tags, does, and again it is an asynchronous function, because it creates several tasks and keeps executing further work even when the API does not respond immediately. In a nutshell, the function extracts semantic tags for a document based on its content: it has the model propose candidate tags, matches those against the existing tags you have added to the database, and returns a list of matching tag IDs. Say you have a document about machine learning, and the model suggests 'ml', 'business', and 'ai', but only 'business' exists in your tags table. Then only 'business' is returned, because that is the tag we actually created over here; otherwise the system would keep inventing new tags, which would quickly become unmanageable for you. How does it work? The tags table is imported from our database, so the first step is to fetch all the available tags from the database. Then it converts every tag name to lowercase for case-insensitive matching, so that we don't miss matches over simple capitalization differences. The OpenAI model then suggests tag names, those are compared with the existing tags available in the database, and after matching, the tags are assigned to the document.
First, if there are no tags in the database at all, the function returns an empty list immediately: with nothing to match against, there is no point analyzing the document content. Otherwise we start our asynchronous OpenAI API call. Just as before, there is a retry counter, and we retry up to five times to generate the matching tags (note this runs once per PDF, not per chunk), with a one-second delay between attempts; that is what the while loop with try/except is doing. The request goes through the OpenAI client, which we import from our openai_client module. I didn't walk through that module earlier: it is simply where the AsyncOpenAI client is set up once, with the API key, max retries, and so on, so you can call it from anywhere for generating answers or doing literally anything. So you import the client and make the request to suggest relevant tags based on the document content. We use the GPT-4o mini model, and the system message is GET_MATCHING_TAGS_SYSTEM with .replace() applied to the TAGS_TO_MATCH_WITH placeholder, substituting in the stringified tag list. If you go and look at the constants, the prompt says: you are an expert text analyzer that can take any text, analyze it, and return matching tags from this list, where 'this list' is the lowercase tag list from the database that was substituted into the placeholder. From that list it should return only the tags that make sense for the text, and the output should be strictly in JSON format, like {"tags": ["tag 1", "tag 2", "tag 3"]}.
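A minimal sketch of that tag-suggestion call, reusing the client from the earlier sketch; the prompt text and the function and constant names are reconstructed from the description, so treat them as illustrative.

```python
GET_MATCHING_TAGS_SYSTEM_PROMPT = (
    "You are an expert text analyzer that can take any text, analyze it and "
    "return matching tags from this list: {tags_to_match_with}. Only return "
    "tags that make sense according to the text. Output should be strictly "
    'in JSON format: {"tags": ["tag 1", "tag 2", "tag 3"]}'
)


async def suggest_matching_tags(document_text: str, existing_tags: list[str]) -> str:
    # Lowercase the stored tag names so matching is case-insensitive.
    tags_to_match_with = ", ".join(tag.lower() for tag in existing_tags)
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": GET_MATCHING_TAGS_SYSTEM_PROMPT.replace(
                    "{tags_to_match_with}", tags_to_match_with
                ),
            },
            # Only the start of the document is needed to infer its topics.
            {"role": "user", "content": document_text[:5000]},
        ],
        response_format={"type": "json_object"},
    )
    return response.choices[0].message.content or ""
```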
The user role then provides the actual document text for the analysis; the rest of the parameters stay the same as before. The PDF text has to be passed into this function, but we do not send the full document. A PDF might run to a hundred thousand characters, yet you can tell what a document is about from the first few sentences or paragraphs, so there is no need to burn resources on the rest. Continuing on: the function validates the response, which means it ensures the API returned usable data. If the response is empty, it raises 'empty response for generating matching tags'. We have created a GeneratedMatchingTags class for this step, because we need to turn the reply into a structured list that can later be committed to the database: we use the class to parse and validate the JSON response and extract the suggested tags from it, and that is what matching_tag_names holds. Next we need the matching tag IDs. The reason is that if you look at the document_tags table, it links a document ID to a tag ID, for instance document five related to tag one. So for each suggested tag taken from the OpenAI response, the function finds the corresponding ID in the database. If a tag does not exist in the database, meaning the model suggested something outside the list, it raises 'match not found in the database'; otherwise it appends the tag's ID to matching_tag_ids, the list of integers we created, and finally returns that list.
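A minimal sketch of that name-to-ID resolution; tags_by_name stands in for whatever lookup you build from the tags table, so it is an illustrative shape rather than the project's exact code.

```python
def resolve_tag_ids(suggested: list[str], tags_by_name: dict[str, int]) -> list[int]:
    matching_tag_ids: list[int] = []
    for name in suggested:
        tag_id = tags_by_name.get(name.lower())
        if tag_id is None:
            # The model suggested a tag outside the stored list.
            raise ValueError(f"Match not found in the database: {name}")
        matching_tag_ids.append(tag_id)
    return matching_tag_ids
```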
You can probably already picture how this will work; we will see it properly when we wire it into the pipeline, so that whenever a document is uploaded you can watch the whole process. So the tag system is done; I hope it makes sense to you. Now we take the two functions we just created and put them into something known as upload_document. That is the next major function I want to talk about: what upload_document does, and how it makes everything end to end, converting the PDF into chunks, the chunks into embeddings, and then creating the tags and attaching them to the document. That is the aim of upload_document, and that is where we will end up. We will combine the get_matching_tags and generate_chunks asynchronous functions so that when a document is uploaded, you can see the step-by-step process that happens for every document.
The first step: whenever a PDF file comes in, we use the pdftotext library, which is what we imported, to convert the PDF into text. It converts the PDF pages into a list of text strings, where each string represents one page, whatever that page contains. BytesIO wraps the uploaded PDF file's bytes so it can be passed to the pdftotext library like a file, and then we join the pages with "\n", which combines all the page strings into a single string, separating them with a newline. So here is how it works: you have the parsed PDF, the first step turns it into several strings, one per page, and then you combine all the strings into one by adding that separator. Next, because we still have to convert this into chunks, we create pdf_chunks, an empty list of type list[str], and we loop with `for i in range(...)` over the length of the PDF text to split it into chunks. Basically, it calculates the total length of the combined text, and the loop slices the text into smaller pieces of IDEAL_CHUNK_LENGTH, which is 4,000 characters, or whatever ideal chunk length you have defined. Each iteration appends the slice from the current index i up to i + 4,000. So if you have 12,000 characters, it takes 0 to 4,000, then 4,000 to 8,000, then 8,000 to 12,000; that is how it works. You can do a dry run if you want, but that is how the text gets converted into chunks of 4,000 characters each.
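A minimal sketch of this extraction-and-splitting step, assuming the pdftotext library; the function name is illustrative, and IDEAL_CHUNK_LENGTH matches the constant discussed earlier.

```python
import io

import pdftotext

IDEAL_CHUNK_LENGTH = 4000


def split_pdf(file_bytes: bytes) -> list[str]:
    # Wrap the uploaded bytes so pdftotext can read them like a file.
    pages = pdftotext.PDF(io.BytesIO(file_bytes))
    # Join all pages into one string, separated by newlines.
    pdf_text = "\n".join(pages)
    pdf_chunks: list[str] = []
    # Slice the combined text into 4,000-character chunks.
    for i in range(0, len(pdf_text), IDEAL_CHUNK_LENGTH):
        pdf_chunks.append(pdf_text[i : i + IDEAL_CHUNK_LENGTH])
    return pdf_chunks
```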
Now we need to prepare those chunks for processing. You have converted the big PDF into smaller chunks, but those smaller chunks still have to be distilled into key pieces of information, which is what we were doing in generate_chunks, where we call the OpenAI API and say: hey, you're an expert, please give me only the relevant information. So here we go ahead and request generate_chunks for each chunk, and because it is an asynchronous function, what we are really building is a list of generate_chunks coroutines. To be honest, the name is a convention: it signals that we are collecting calls to an asynchronous function. Each call receives an index and a chunk, which we get from enumerate: enumerating pdf_text_chunks yields the zeroth index, the first index, the second index, and so on, together with the chunk at that position. The important point is that nothing runs yet. Because the function is asynchronous, this step only creates tasks to be processed later rather than executing them immediately, and that is the production-level way to do it. So generate_chunks_coroutines ends up holding one generate_chunks call for every chunk that was created, and remember, I am talking about a single document at this point. Just to repeat why the name ends in 'coroutines': it reflects what the variable stores. These look like function calls, but the elements of the list are asynchronous coroutines, not results.
Next, once we have these tasks for converting the chunks into key pieces of information, we set up the machinery that actually runs them. The first piece is the event loop. You can see we have imported the asyncio library, and we create a new event loop; the loop is responsible for executing asynchronous tasks. Think of it as a control room where all your tasks get managed. There can be multiple control rooms, though, so you have to tell Python which one to use, and that is what asyncio.set_event_loop does: use this particular control room to execute all the tasks. Then you use asyncio.gather to collect all the asynchronous coroutines that need to run; it takes every coroutine we created and, when awaited, waits until all of them have completed, handing you all the generated chunk results. We do the same with the get_matching_tags coroutine: we call get_matching_tags and pass only the first 5,000 characters of the PDF, because you don't need the whole document to work out its tags, and this is a separate task from the chunk generation. Finally, the event loop's run_until_complete executes everything and waits until all tasks are finished. There are only two top-level awaitables here, the gathered generate_chunks calls and the get_matching_tags call, but the first one contains one task per chunk (three, in our example) while the second is a single task. All of them start in parallel, so the chunk generation for chunks one, two, and three and the tag matching all run concurrently, which makes the whole thing faster and more efficient.
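A minimal sketch of that orchestration, using the two functions sketched earlier; pdf_text, pdf_text_chunks, and existing_tags are assumed to come from the upload flow, and wrapping the gather calls in a small async function is one safe way to express what is described above.

```python
import asyncio

# Coroutine objects are created eagerly, but nothing runs yet.
generate_chunks_coroutines = [
    generate_chunks(index, chunk)
    for index, chunk in enumerate(pdf_text_chunks)
]
get_matching_tags_coroutine = suggest_matching_tags(pdf_text[:5000], existing_tags)


async def run_all():
    # gather() starts everything concurrently and waits for all of it.
    return await asyncio.gather(
        asyncio.gather(*generate_chunks_coroutines),
        get_matching_tags_coroutine,
    )


# The "control room": create a loop, tell Python to use it, run to completion.
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
document_information_chunks, matching_tags = loop.run_until_complete(run_all())
```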
When you run this event loop, you get back the fact chunks: fact chunks one, two, three, that is, each chunk condensed into more precise key pieces of information, the output of the asynchronous function. You also get the matching tag IDs, because that is what get_matching_tags returns: it finds the tags that exist in the database and also match the document's content, and returns the IDs that will be associated with the document. That is why we unpack both values when we run the loop: the more concise chunks and the matching tag IDs. I hope it makes sense. Now, the chunk results come back as a nested list. document_information_chunks holds one list of facts per chunk, with the facts inside, and we need to convert that into a single flat list. That is what the next line does: it creates one list containing all the facts generated from every document chunk, converting the nested structure into a flat one. So at this point you have your facts generated from each document chunk, plus the tag IDs that match the document content. And the whole point of running all this asynchronously is parallel execution: the generating and matching tasks are independent, so running them together saves time.
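That flattening is a one-liner; a minimal sketch:

```python
# Each chunk produced its own list of facts, e.g.
# [["fact 1", "fact 2"], ["fact 3"]] becomes ["fact 1", "fact 2", "fact 3"],
# so every fact can later be inserted as its own database row.
facts = [
    fact
    for chunk_facts in document_information_chunks
    for fact in chunk_facts
]
```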
I hope it makes sense. Now let's talk about the next step, which is saving all the generated data. Once you upload a document, everything happens: you generate the chunks and you generate the matching IDs, but all of that also has to be saved. With db.atomic() as transaction, we create a database transaction context, and every database operation inside this block is treated as one atomic operation: either all of it succeeds together or none of it does. First of all it sets the OpenAI API key, which you can see over here, so that the SQL side knows it can call OpenAI, because we now need to generate the embeddings for the chunks as well. If any operation fails, the entire transaction is rolled back and not a single row is inserted; we don't want half-written rows, so if anything fails, the whole thing is abandoned. Then comes document_id: we insert a new row into the documents table, storing the name of the uploaded document, which is passed in from here, execute it, and get back a document ID that we can use for several other purposes. If you look over here, you can see the row has been inserted and the ID it yielded, so we can pick it up anywhere. Then, for each chunk in document_information_chunks, which is nothing but our flat list of more concise information, we insert the document_id, the chunk itself, and its embedding. This is where another powerful feature of Timescale's pgai comes in: we directly call OpenAI inside the SQL, stating that it should generate an embedding using this particular model for this particular chunk. It does that for every chunk, which is why it sits in the loop, and then it executes. That is the document_information_chunks table: a document ID, a chunk, and the embedding associated with it, with multiple chunks sharing the same document ID. Then you need to insert your tags too, so you go to the document_tags table and use insert_many, because there may be multiple tag IDs to insert: for each tag_id in matching_tag_ids, you pair this document's ID with that tag ID. Then you commit the transaction, which pushes it, a bit like committing in git, and it logs something like 'inserted N chunks for the PDF name with the document ID and N matching tags'. Finally, you close the event loop.
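A minimal sketch of this save step, assuming a peewee-style `db` plus `documents`, `document_information_chunks`, and `document_tags` models, and pgai's `ai.openai_embed()` SQL function; the table names, the embedding model, and the way the key is handed over are assumptions based on the description.

```python
import os

with db.atomic():
    # Hand the key to pgai so SQL itself can call OpenAI for embeddings.
    db.execute_sql(
        "SELECT set_config('ai.openai_api_key', %s, true)",
        (os.environ["OPENAI_API_KEY"],),
    )

    # One new row in the documents table; keep its ID for the other inserts.
    document_id = documents.insert(name=file_name).execute()

    for chunk in facts:
        # pgai generates the embedding inside the INSERT itself, so no
        # separate embedding code is needed.
        db.execute_sql(
            """
            INSERT INTO document_information_chunks (document_id, chunk, embedding)
            VALUES (%s, %s, ai.openai_embed('text-embedding-3-small', %s))
            """,
            (document_id, chunk, chunk),
        )

    # One row per matched tag, all tied to the same document.
    document_tags.insert_many(
        [{"document_id": document_id, "tag_id": tag_id} for tag_id in matching_tag_ids]
    ).execute()
```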
I hope that makes sense and that the document upload flow is clear. This is all for a single document: whenever a document comes in, it converts the text into chunks, the chunks into more precise pieces of information, generates the tags for that document, and inserts everything, calling out for the embeddings as part of the insert. So the embedding column gets populated without writing any separate embedding code, and that is why I keep saying that pgvector, pgvectorscale, and Timescale's pgai are super powerful. Perfect.
Now for our final part, where we put a UI on top of this so that anyone can come along, upload a document, and run all these functions: the development of our manage-documents dashboard, the manage-documents Streamlit UI. Let's go step by step; you probably know most of this already, so I'll keep it quick. We are going to create an interactive Streamlit interface for managing uploaded documents, which includes the functionality for uploading PDFs, processing them, displaying the stored documents from the database with their associated tags, and allowing the user to delete documents too. In other words: what was uploaded, the tags associated with it, the upload feature, and the delete feature. If you come over here, we use st.dialog, which defines a modal dialog, and st.dialog is a decorator that handles the dialog interaction, because we literally have to upload something through it. The decorated function, upload_document_dialog_open, implements the functionality for the document upload: it uses st.file_uploader, where you upload the file, with the type restricted to PDF. We check that the file is not None, because we need to ensure it was received correctly and we want to display the uploaded file's name and contents; then the user clicks the upload button and the PDF document gets uploaded. Finally, st.rerun refreshes the page whenever you upload a document, so that you immediately see the updated results. I hope it makes sense.
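A minimal sketch of that dialog, assuming a recent Streamlit with st.dialog; upload_document() stands for the pipeline function built above, and the labels and keys are illustrative.

```python
import streamlit as st


@st.dialog("Upload document")
def upload_document_dialog_open():
    # The dialog body: a PDF picker plus an upload button.
    file = st.file_uploader("Choose a PDF", type="pdf")
    if file is not None and st.button("Upload"):
        upload_document(file.name, file.getvalue())
        # Refresh the page so the new document shows up immediately.
        st.rerun()


# Clicking this button opens the modal dialog.
if st.button("Upload document", key="upload_document_button"):
    upload_document_dialog_open()
```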
Then, as in the sketch above, we have st.button('Upload document') with the key upload_document_button, wired to upload_document_dialog_open, so that the dialog with the upload functionality opens when somebody clicks it. Next we fetch the documents from the database. I don't want to go into too much detail about what we do here, but in a nutshell the query takes the ID and name of each document and gives back the document names along with their tags: it retrieves the stored documents and their associated tags, using a query with joins to connect the documents, document_tags, and tags tables. document.id gives you the unique ID of each document, document.name gives you the name of the document (not unique, just the name), plus a list of associated tag names. That list comes from an array aggregator: array_agg aggregates the tags for each document into an array, and array_remove strips out any null values, so you can see we are removing nulls there. Then the left outer join with the tags table ensures that all documents are included even if they don't have a tag, while still retrieving the tag names linked to each document. I don't want to go into super detail about the SQL; just understand that it is there to store and retrieve the information we show on this page. If there are no documents in the database, it displays an information message prompting the user to upload one, so if you delete every document you will see 'no documents created, please create one'. But if there are documents, we display them: we go through the documents, create a container sort of setting (with border=True), write the name, attach the tags to it, and also give the user the ability to delete it. The delete-document option is simple: you just need to provide the document ID, and it runs documents.delete() where the document ID equals the retrieved ID, then executes the command; that is why we wrote the delete_document function that way. I hope it makes sense; it was pretty simple to understand.
Now let's move on to chat with documents. So far we have built the system for when somebody uploads a document: the ingestion for retrieval is done, meaning the document is converted into embedded chunks. Now we need the generative part. Whenever someone asks a question, the question has to be converted into an embedding, compared against the stored embeddings via semantic search, and then an answer generated from the results. That part is honestly quite easy, so let's get to chat with documents right away. This is our generative part, where we literally interact with the documents we uploaded. We will go step by step through the code. Completely ignore the imports for now; a fair amount of this is just there for Streamlit presentation purposes, so it's okay if you don't grasp every detail. I have used a little bit of advanced Streamlit, and I'd encourage you to learn some Streamlit and frontend basics, after which this will be very easy to follow. We set the page configuration and the title to 'Chat with documents', which is what you see here. Then comes the Message class definition, which fixes the structure of a message object. A message has a role, which says whether it comes from the user or the assistant; content, which is the actual text of the message; and references, which records which chunks were used to create the answer. Every message has these three components. The reason we define this structure is to ensure all messages in the chat flow follow a consistent format; it's like defining a template for your conversation data.
Of course, we are not creating a database for storing the conversation; we just want that, while I keep this page open, all the conversation history stays available, so it can be taken into account when generating answers. Then we have: if 'messages' is not in the session state, create a new empty list. This initializes the session state for storing messages: if the history does not already exist, it starts fresh. The session state ensures the chat persists for this session. It's different from ChatGPT, where your history is always available; here you have a single session in which everything happens, and once you navigate away, it resets, with no history kept. Without this, every message would be wiped on each rerun: you'd send one message, get a response, the app would rerun, and everything would be gone, which is not what we want, because we need a chat history. Then there is push_message. What does it do? It is a helper function we created to append a new message to the chat history we just set up. Say I have one exchange and it gives a response, then I have another exchange: each turn gets appended as well. You'll see how we use push_message in a bit.
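A minimal sketch of the message structure and the helper, assuming pydantic; the field and function names follow the description above.

```python
import streamlit as st
from pydantic import BaseModel


class Message(BaseModel):
    role: str                             # "user" or "assistant"
    content: str                          # the actual text of the message
    references: list[str] | None = None   # chunks used to build the answer


# Initialize the history once per session so it survives reruns.
if "messages" not in st.session_state:
    st.session_state.messages = []


def push_message(role: str, content: str, references: list[str] | None = None):
    # Append one turn to the chat history kept in the session state.
    st.session_state.messages.append(
        Message(role=role, content=content, references=references)
    )
```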
Those are all helper functions, Streamlit plumbing for showcase purposes; this is the major part. As we learned for generative models, we first take the question and convert it into an embedding, and then we use a similarity search, pgvector's similarity search, to compare embeddings and fetch the top five relevant chunks from the chunks already stored for that particular question. So what I'll do here is define an asynchronous function, again because it runs things in parallel, and we'll create an event loop to make it fast. The user's input message is analyzed to find the most relevant document chunks stored in the database, using the pgvector similarity search over embeddings to retrieve the top five matches for the question that was asked. Let's go step by step. First, we push the user's message. related_document_information_chunks is nothing but your retrieved chunks after the embedding search. Then you create one transaction, one SQL query: you set the OpenAI API key so that SQL knows the credentials to interact with OpenAI, and then you query the document_information_chunks table, which holds all the chunks. The SQL takes the embedding column and orders by its distance to ai.openai_embed(...), the pgai function that ships with Timescale, applied to the input message, because we need an embedding for the question that was asked, limits it to five, and executes. In just a single statement you do both the similarity search and the embedding of the question, and that is the power of it. Then you go through every returned chunk, append it to the list, and commit the transaction. I hope it makes sense.
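A minimal sketch of that retrieval query, assuming the peewee-style `db` from earlier, pgvector's cosine-distance operator `<=>`, and pgai's `ai.openai_embed()`; it also assumes `ai.openai_api_key` was set for the session as in the upload step, and the embedding model name is an assumption.

```python
def retrieve_related_chunks(input_message: str) -> list[str]:
    # One statement: embed the question with pgai, then rank the stored
    # chunk embeddings by cosine distance and keep the closest five.
    cursor = db.execute_sql(
        """
        SELECT chunk
        FROM document_information_chunks
        ORDER BY embedding <=> ai.openai_embed('text-embedding-3-small', %s)
        LIMIT 5
        """,
        (input_message,),
    )
    return [row[0] for row in cursor.fetchall()]
```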
Now we push the message: the role is 'user', and the content is the input message, the question, because we literally have to keep adding to our conversation history; the references are the retrieved chunks, and I'll show you what those look like in a moment. Let me demo it. I'll go to manage documents and upload a document; let me see if I have a good PDF around. Yes, I think I have one. First, a bit of setup: I'll delete the existing document, and I'll add a tag, and the tag I'll add is nothing but 'water cycle'. Then I go to manage documents, come over here, and upload my most favorite water-cycle document. Let me show you what happens in the backend: it generates the matching tags, comparing against the existing ones, and comes back with 'water cycle', which is correct in this case; and it generates the chunks, logging that it generated six facts for chunk one of the PDF, then adds them into document_information_chunks and reports that everything was inserted. Now we can come over here and have a conversation. Let's keep it simple and ask: 'what is the summary?'. You'll see something running, and then a response: 'the lesson involves students observing evaporation, condensation...'. Here is what happened: when the question was sent, it was converted into an embedding, that embedding was used to retrieve the relevant chunks, and those became the references. So what gets stored in the message object is your question, the input message, as the content, along with the role and the references, the top five matching chunks that came back from the pgvector statement we just walked through.
That is the first half; but how do we actually generate the response? Now that we have the relevant chunks, we do the same thing as before: we keep trying, because the API might fail somewhere in between, with a while True loop, up to five attempts, and a one-second delay. We call the OpenAI client with our GPT model, and the instruction given to the system, if you go look at the constants, says: you are a chatbot with a specific set of knowledge; you will be asked questions about that knowledge; given the knowledge, don't make up information, and don't answer unless you actually have the knowledge; the knowledge you have is the following. Then we give it the context, which is only the relevant chunks: if you look over here, we loop through the chunks and supply just the related_document_information_chunks, and that's it. That is why we say 'take your knowledge only from here', and that is why this is so powerful: it gives you responses grounded in up-to-date content only. So the system message defines the bot's role, and the user message provides the actual content and instructions, together with all those parameters. Whatever the response is, we take the message content. If there is no response, we do nothing; if there is, we push the message, and this time the role is 'assistant', because the user turn was already pushed earlier with the input message and the chunks. What gets pushed now is the assistant role and the response text, with references set to None in this case, because we already publish the references on the user-side message. So it is very simple: it takes your relevant chunks, produces a nicely generated response using OpenAI, and keeps retrying, up to five times here.
After that, it stores and displays the information that will be used to render the response messages, and then st.rerun kicks in: once a turn finishes, the page reruns so that the next message can be handled. Next, for message in st.session_state.messages: this loops through the conversation history, since we store all our conversation history in messages, and displays each one, the user messages as well as the bot messages. It walks through everything stored in the current session's state: if I refresh the page it will all be gone, but within the session the messages keep displaying, because they are stored in that current state. Then, if the references exist, and references, remember, are nothing but your retrieved chunks, we render an expander (st.expander) and simply write all the references inside it. After that comes st.chat_input with 'say something', to continue the conversation: when an input message comes in, we create an asynchronous task and run_until_complete on send_message with that input. Calling send_message does everything we just walked through, and it keeps appending to the messages in the current state. I hope it makes sense to you.
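A minimal sketch of that rendering loop and the chat input, assuming the Message objects and session state from above; send_message() is the retrieve-and-generate function just described.

```python
import asyncio

import streamlit as st

# Render the chat history kept in the session state.
for message in st.session_state.messages:
    with st.chat_message(message.role):
        st.write(message.content)
        if message.references:
            # References are the retrieved chunks behind this answer.
            with st.expander("References"):
                for reference in message.references:
                    st.write(reference)

# The chat box at the bottom; each new question drives one full
# retrieve-and-generate cycle.
if input_message := st.chat_input("Say something"):
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(send_message(input_message))
```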
Now, the final workflow: the user enters a query in the chat input; the most relevant chunks are retrieved and sent to the OpenAI API along with the query to generate a response; and the assistant's response, the bot response you see over here, is displayed along with any references. I hope you've got it. It is very simple, and we already discussed all of it; we've just made it a bit more advanced, more efficiency-centric. Manage tags is very simple too, so I don't want to talk about it in much detail. Basically, we've built the scaffolding: we've given the user the ability to delete a tag, and we've created a dialog, the decorator function, to add a tag, where the tag comes from a text input and you call the tags table in the database to create the new tag, then rerun to show the updated list. Likewise, you create a container, write the tag into it, and give someone the ability to delete it. It is very simple, and I don't want to go into more detail because this has already taken a lot of time; I thought I would complete this project in 30 or 40 minutes! I hope you understood the tag system, as you can see over here. I have not gone through some of the code that wasn't important enough for me to explain. That is your job: please go ahead, paste it into GPT, and work through it. Streamlit is very easy to learn, just copy and paste; otherwise it would take me another hour to explain each and every line of Streamlit code. So that's it: we are done, and if you come over here you have the chat-with-documents, manage-documents, and manage-tags pages ready, which can be built on further.
Now we will finalize everything, and I'll close with a short note on what improvements can be made and what your possible next steps in this project could be. What I want you to do is think a little about how you can make this project better before you actually put it on your resume. I could add multiple features and pieces of functionality myself, but you should be the one thinking creatively, implementing this project in a way that is more unique than what we have done here; this was one of the major projects we have developed. There are plenty of possible improvements. You could add functionality to segregate chats into separate conversations; add some sort of analytics; accept more file types, such as images or video files. Instead of making somebody type the tags, you could offer tag recommendations based on the document, perhaps using a machine learning model to suggest tags from the document content, or add a tag-search feature that lets users find documents by their tags. You can also experiment with different embedding models and see what works best, keep improving the prompts, work on autocomplete for questions, and add feedback mechanisms so you understand how your users are interacting. You could bring in knowledge graphs, which can look very nice, plus the analytics I mentioned, and domain-specific prompts: somebody coming from a climate background gets one prompt, somebody from another domain gets a different one. You could add collaboration, use a fine-tuned model, or ensemble several models; think about how you would do it. I would strongly suggest going through pgai and pgvector; they have amazing integrations, so explore their platform and learn about them. I'll link more resources in the description box below, which will make it easy for you to go further. I wish you the best, and I hope to come back to you in the next video with more advanced projects like this.
Now we are going to talk about how we can upgrade our Lexi chat. So far you have built a functional Lexi chat system with traditional methods: pgvector for the embeddings and basic retrieval techniques, making OpenAI API calls to retrieve chunks and generate answers. You have done all of that. But what if I told you that you were just scratching the surface? You can make it much faster, smoother, smarter, and production-ready for handling millions and millions of queries efficiently, and that is exactly what we are going to do now: transform Lexi chat into a faster, smarter, production-ready system, which will make your resume and your project look far better than those of people relying on the basic setup. In this section we take Lexi chat to the next level. Just to recap what you have done so far: the documents are split into manageable chunks for processing; OpenAI embeddings are used for chunk vectorization; pgvector's similarity search identifies the relevant chunks; and Lexi chat retrieves and presents answers based on those vector matches. That is the retrieval and generation pipeline we have built. But as the system scales, we will face several challenges. We'll discuss them in greater detail over the next hour of videos, but here is an overview. The first challenge is that the retrieval process gets slower and slower as the database grows: with plain pgvector, the latency and efficiency of similarity search degrade significantly. There is something known as pgvectorscale, which rectifies this issue, keeping retrieval manageable and efficient even when the database becomes very large. Then embedding generation becomes cumbersome too; once you have a large amount of data, everything becomes an issue, whether it is search or embedding generation. And the overall system needs to become more cost-effective and scalable: with millions of rows you may see lots of errors, and things can break down in production. So the problems we will have to face are slower retrieval, manual embedding processes, architectural complexity, and cost efficiency, and we will treat each one in greater detail as we discuss the improvements over the next hour. If you don't understand something yet, don't worry about it. So what will be modified? First, we will address the scaling challenge in embedding generation by automating the whole embedding pipeline; I won't explain right now how we'll do it or the whys and whats, you'll get to know once we are there. Then we'll use pgvectorscale for efficient query handling using StreamingDiskANN; you'll see how efficient it is when you use it, and you'll understand the problem right there. We will also make sure all the workflows are streamlined for scalability and cost-effectiveness. And then we will use pgai chat completions for direct database-to-OpenAI communication, with no extra APIs or layers in between, to remove architectural complexity. These are some of the changes we will make; if you haven't understood a change right now, don't worry, we will talk about everything in detail.
The first improvement is embedding generation, where we automate the entire process with the pgai vectorizer. Previously, we were simply going ahead and calling OpenAI ourselves: we'd make the call, OpenAI would give us the embedding, and we'd store it in the database. Now everything is automated, and you don't even have to call OpenAI directly: it is your pgai vectorizer which, any time a chunk is added, automatically creates a new table and adds the embedding there itself, efficiently. You don't have to do anything beyond a few lines of SQL, which is far more efficient and faster than the manual process could be. The next one is faster retrieval: if you have millions of rows or millions of records in the database, you don't have to use plain pgvector for that. pgvectorscale uses an efficient indexing algorithm called StreamingDiskANN, and we will talk about this algorithm in greater detail, so if you don't understand it yet, that's all right. And the last improvement: instead of having multiple layers of API calls out to OpenAI, we are going to use pgai to let the database interact with OpenAI directly and get its answers. That's it; these are the few changes we will make to our system. We are going to start with our first change, but before that we'll quickly set up a few things. I hope you have the installation guide, so let's quickly get set up before we start with the first improvement.
Before you go ahead, I hope you have followed our installation guide PDF, where we asked you to install PostgreSQL, then to install Docker, which is very important because we are going to use Docker for the Timescale setup, and then to download Python 3 and the required libraries. So let's go to the first step, which is making sure you have the docker-compose file we've listed out here. First, download Docker if you have not, plus a Postgres GUI client; I have used TablePlus as the GUI client for Postgres. Then create a virtual environment. I think we've already covered this in the installation guide, but basically it is `python -m venv lexi-chat`, and then you activate that virtual environment with `source lexi-chat/bin/activate`; once it is activated, you install all the requirements from the listed libraries. Once that is ready, we move on to Docker. You set up this docker-compose.yml, named pgai, and the main service here is timescaledb. You don't have to worry about the contents; the image is provided by Timescale's own Docker image, so it is purely for installation purposes. We're going to install the TimescaleDB image for Postgres 16, and then you have to replace all the placeholders with your own values: the Postgres user and password you use when setting up your Postgres connections, plus your OpenAI API key. The whole container is named pgai, and that is where all the services will run: timescaledb, which we set up first, and then the vectorizer worker, which is the pgai vectorizer we are going to use; don't worry, we'll learn about it. Ignore everything else over here; you just replace the placeholders with your information and your OpenAI key, and that's pretty much it.
delete everything from here first. Yes, I'm going to delete it, let it finish... perfect, there are no containers now. So come back to the terminal and run this command, which I have already written out for you: docker compose up -d timescaledb. Once you run it, the container gets created, and if you go to the Containers tab you will see timescaledb running. That's the first command you have to run, the docker compose one. Next, because we need to run our Streamlit application too, go to db.py: the initial setup section may be commented out, so please uncomment it, and then run the file with Python first. Okay, that's good; now we'll run the app and see if it yields any error.
If you click on chat with documents, it says: schema "ai" does not exist. The reason it says this is that we have not enabled our extensions in the database yet. So go to the GUI client (you will see all the tables have been removed), write the CREATE EXTENSION statement, select it, and run it. It will report installing the required extension pgvector, and it will install the pgvector and pgvectorscale extensions that pgai needs in order to work.
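The exact statements aren't shown on screen, but a minimal sketch of enabling the extensions looks like this (CASCADE pulls in the dependencies, such as pgvector):

```sql
-- Enable pgai; CASCADE also installs what it depends on (e.g. pgvector)
CREATE EXTENSION IF NOT EXISTS ai CASCADE;

-- Enable pgvectorscale for the StreamingDiskANN index used later
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;
```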
If you now refresh the app, it should work. That's the first thing you have to do. Then you come back and bring up your other service, the vectorizer worker. I'm not going to walk through it again, but when you run the compose command for vectorizer-worker you will have the worker ready too; you can see the vectorizer worker also running in your Docker setup. With that, your installation is done. Make sure you have enabled the extensions and everything, so that Postgres knows those extensions are there, and then you can go ahead and follow the course. I hope you have followed the installation guide and this setup so that you have everything running. So hey everyone, the first improvement we will talk about is how to enhance our embedding creation process. What were we doing previously?
We used to manually generate embeddings for our document chunks using the OpenAI API. What did that process look like? First we parsed the document and split it into chunks, so the bigger portion of text was converted into chunks; then for each chunk we made an OpenAI API call to generate that chunk's embedding; and then we stored those embeddings in a PostgreSQL column alongside the chunk. So, to show you what actually happened: you had the document chunks table with the document ID, the chunk, and the embedding of it. It was a pipeline: once a document is uploaded, you parse it, it gets converted into chunks, the chunks are sent by our own code to the OpenAI API to generate embeddings, and we store those embeddings next to those particular documents. That was our previous approach. Now, there are major challenges with this. Every time you upload a document, you have to run custom logic: chunking, then embedding generation, then storing the results; every single upload goes through that whole procedure. Now suppose there are not hundreds or thousands but millions of chunks, which is entirely possible in a real-world scenario; a for-loop approach over millions of chunks is simply not scalable. Embedding generation was also prone to errors due to inconsistent chunking strategies (one run behaves one way, the next time OpenAI behaves very differently), so it was an inconsistent process. And the storage was not optimized for embedding data: even with millions of high-dimensional rows, everything went into an ordinary table, so if you need to search it back, it might take a hell of a lot of time once there are millions of such rows. These were a few of the problems with the older approach, the manual embedding generation process. So now we have something known as pgai Vectorizer, an open-source extension of the
TimescaleDB ecosystem. What does it do here? We have replaced the manual embedding process with pgai Vectorizer, which automates everything with the best efficiency and scalability, keeping costs and the infrastructure-level details in mind. It does everything we just talked about, chunking, embedding creation, and storage, but in a very automated and efficient way. First, once it receives the document, it breaks it into chunks using a built-in character text splitter (of course it has one built in; we will talk about it). Then it creates the embeddings: it automatically calls the OpenAI model of your choice directly from PostgreSQL, so you don't write custom code here; PostgreSQL itself coordinates with OpenAI to generate the embedding. And instead of keeping embeddings in the primary table, pgai creates a separate one. Now I'll quickly show you how this works. Say somebody uploads a document; let me just quickly go through it... yes, the water cycle one here. We have uploaded the document, and it gets uploaded into the database; once you upload it, the chunks are created automatically. But notice it creates a new view (not a table you can modify, just a view) with the chunks as well as the embeddings. That's what I said: it generates an embedding for each chunk using OpenAI. We did not create any new table; it creates it automatically, because PostgreSQL is automatically interacting with OpenAI to create the embeddings. We will come to the code part, but I want you to realize the importance of this. Coming back to it: once the document is uploaded, everything is automated and
very efficient. It creates the chunks, then it creates the embeddings directly from PostgreSQL rather than from custom code, and it has optimized storage. What does that really mean? It means that instead of keeping embeddings in a primary table, pgai Vectorizer stores them separately in an optimized, view-based format: not something you modify by hand, and not something you have to create yourself. Let me quickly show you in the code how exactly this works. Previously we had db.py, and in db.py we created the document_information_chunks table with one more field for the embedding. If you look here, that was document_information_chunks: the document ID, the content related to it, and the embedding computed from it. That's what used to happen, but here we are not doing that; instead we run the initial setup, which is very important.
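For contrast, here is a rough sketch of what that older db.py schema amounted to in SQL; the names follow the project's naming, but treat the exact definition as illustrative:

```sql
-- Old approach: the embedding lives in a column next to the chunk,
-- and custom Python code is responsible for filling it in.
CREATE TABLE document_information_chunks (
    id          SERIAL PRIMARY KEY,
    document_id INTEGER NOT NULL,
    content     TEXT NOT NULL,
    embedding   VECTOR(1536)  -- written manually after an OpenAI API call
);
```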
Just to make sure you understand this well: in the code, this setup is commented out, and on the first run you have to uncomment it, because it is a one-time setup you run only once so that your pgai Vectorizer gets activated. What does activated really mean? It means the vectorizer keeps checking, every five minutes or so, whether new documents or chunks have been uploaded; if they have, it automatically creates the chunks. So here we set things up so that any document that arrives is automatically converted to chunks, then embeddings, and then stored. That is what the table setup is for: we call db.create_tables, which creates the documents table, the tags table, the document_tags table, and the document_information_chunks table, as before, and executes it. Then we also run the new piece of setup, the pgai Vectorizer, which is plain SQL: a SELECT on ai.create_vectorizer. What does it mean? document_information_chunks here is the source, the place the chunks are taken from. Then there is the destination, document_information_chunks_embeddings; that is where the results go. Then you have the embedding part: the chunks taken from the source get converted into embeddings by calling ai.embedding_openai (the same thing we have been doing via pgai), with the model of your choice and your OpenAI API key; I hope you have already run the Docker compose initial setup so your OpenAI key is available. Then chunking: what does chunking mean here? It means that if the source text is big enough, it chunks it automatically too; for example, if your source is four or five sentences, it splits it using its own built-in text splitter method. And formatting means that you can template each chunk's text before embedding: you could write something like 'chunk: $chunk', or wrap any statement you like around it, purely for formatting purposes; of course we keep it simple here.
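Putting those pieces together, the one-time setup call looks roughly like this. The exact ai.create_vectorizer signature varies between pgai versions, and the model name and destination here are assumptions based on what is described above, so check the docs for your installed version:

```sql
-- One-time setup: pgai Vectorizer watches the source table and keeps
-- chunks and embeddings up to date automatically.
SELECT ai.create_vectorizer(
    'document_information_chunks'::regclass,                  -- source table
    destination => 'document_information_chunks_embeddings',  -- view it creates
    embedding   => ai.embedding_openai('text-embedding-3-small', 1536),
    chunking    => ai.chunking_recursive_character_text_splitter('content'),
    formatting  => ai.formatting_python_template('$chunk')
);
```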
So the net effect is simple: any document you upload automatically goes through the pipeline. pgai creates the chunks (splitting the source if it is large enough), converts them into embeddings, and stores them in document_information_chunks_embeddings. That's what it does, nothing else. I hope you really understood what pgai Vectorizer does; now let me talk a little about the difference between the old approach and the new one. The old approach was doing everything manually, while here we are automating it, which means a much more simplified workflow. Previously we were literally writing code to chunk the text and writing code to fetch the embeddings; here we are not writing any code, and that's where the power is. It is also more efficient: you don't need custom chunking or embedding code anymore, the embeddings are stored in a dedicated table optimized for vector data that pgai creates automatically, and everything runs with a single SQL command. It saves time, it reduces errors, and it handles millions and millions of chunks with minimal effort. So I hope pgai Vectorizer makes sense. Now let's run it. I'm going to do one thing first: I'm going to delete this, so everything else gets deleted too; let me quickly show you.
Yes, as you can see, everything is deleted. Now we'll go ahead and upload something, our water cycle document. Perfect; let's run this and wait for it to finish. Then we simply go over here and run the app. Since our initial setup was uncommented, it creates the vectorizer and the tables and so on, and there you have the embeddings: the content and the chunk. This was the content, and this was the chunk derived from that content; our bigger file was converted into smaller pieces, and those pieces were fed as the source for the embeddings. And if you keep uploading, it keeps working, even though we have not written any code. That is exactly why we are using pgai Vectorizer.
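If you want to inspect the same thing yourself, you can query the view the vectorizer created (assuming the destination name from the setup above):

```sql
-- Each row pairs a generated chunk with its embedding; no application
-- code ever inserted these rows.
SELECT chunk, embedding
FROM document_information_chunks_embeddings
LIMIT 5;
```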
I hope it makes sense; I really hope it makes sense to you. Next we will go ahead and make more improvements using pgvectorscale, and we are going to use pgai even more, and I'll help you understand it much better. Thank you so much; we'll see you in the next one. So now let's talk about another problem: what happens when your data comes in millions and millions of rows, the problem of optimization, of working with larger datasets. Of course, I don't have a dataset of that size to show how much it really matters, but I'll explain it in theoretical terms so you understand the real problem. Imagine you are in a huge library with one million books, and you are looking for books that talk about space exploration. How will you search? I'm taking pgvector as the example, since we were using pgvector previously. With pgvector, you essentially use a basic catalog that tells you which books mention space exploration: you go to the catalog and it tells you that these sections of these books cover the topic. That is fast, because you quickly learn where the space exploration books are, but it only works if all the information fits into a small notebook, the catalog. The catalog might say: at this shelf these books are here; at that block those books are there. If the catalog is small and your information fits into it, it works fine.
to find out okay first of all let's first of all figure out the catalog thing there might be millions of topics so you try to find out okay where it is so if that gets um too big then that's where it slows down that your searching capacity slow slows down and that's what is the major problem which comes in so to to talk in a nutshell of if you are in a book of one a library where it has a 1 million books and you're looking for the books that talk about space exploration so basically
you look at a basic basic catalog which is there and find out the book but now if the catalog is too big you'll have a hard time in finding the right uh first of all block to go in so uh now over here if you don't if now let's say that uh if you don't use PG Vector scale so first of all and or P PG Vector it is very timec consuming because you need to manually search through all the chunks millions of Records traditional methods which is scanning going one one by one does not
make sense and even PG Vector method of catalog is eventually they eventually slow down as the DAT datais increases right and basic problems so that's where the PG Vector scale helps so we have something as streaming disn uh algorithm which helps you to do the optimize search uh which is which is one of the so PG vectors scale is nothing but a complimentary solution to PG Vector where it helps in one of the major problem which is optimizing the search procedure for retrieving results if you search something so I'm not going to go in detail
about what P streaming disn does so what it does uh in in the nutshell it's like instead of flipping Pages finding one by one or relying on memory catalog what it does it summarizes the catalog onto a disk right and it organizes the books in a way that helps finds the related topics faster I don't want to go into mathematical details on how it eventually does but it stores it stores the book summaries into disk and which helps and also it organizes the books in a way that it helps the find related topics faster and
much more better which is faster and relevant results uh you can go if you are more interested to know more about streaming dis and and how it helps you can go ahead and learn learn from a different sources but we are using a disk based scalability dis based storage to intelligently handle massive data sets which helps to balance your accuracy and the speed and also helps you to get faster and relevant results without scanning the entire data sets or without looking at the catalog or big catalog if you have the huge data set so I'm
Let me take the example of a chatbot with pgvectorscale. As I said, pgvectorscale is a complementary solution to pgvector that makes search efficient. Say a question is asked, and the chatbot has to look through all the stored chunks of knowledge to answer it. With pgvector alone, it goes to the catalog, the index, and tries to find matches; but once you have millions of chunks, the catalog becomes big, the process slows down, and retrieving relevant results slows down with it. With pgvectorscale, the chunks' index is not kept as an ordinary in-memory structure in the Postgres database; it is stored on disk in a particular format that organizes the vectors efficiently, via the StreamingDiskANN algorithm.
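In SQL terms, that on-disk organization is just an index. A minimal sketch with pgvectorscale, assuming the vectorizer keeps its rows in a companion _store table behind the view (as recent pgai versions do):

```sql
-- pgvectorscale's diskann index lays the vectors out on disk so a
-- query touches only a small, relevant part of the data.
CREATE INDEX document_chunks_embedding_idx
    ON document_information_chunks_embeddings_store
    USING diskann (embedding vector_cosine_ops);
```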
Now say somebody asks: what is the tallest mountain on Mars? pgvectorscale finds the most relevant five to ten chunks quickly, and the reason it can is that the chunks are indexed efficiently on disk using the StreamingDiskANN algorithm. It saves time by skipping irrelevant chunks, and the chatbot gives you the exact answer faster. Let me put a number on it. Say you have one million chunks and it takes one second to check each chunk for whether it relates to the question. The total time to answer a single question would be about one million seconds, which is roughly 11 days. With pgvectorscale, the smart catalog, the DiskANN index, narrows it down to the top ten chunks in milliseconds; it gives you the top ten chunks practically instantly,
and then it takes less than a second to give you the answer. That is why pgvectorscale is much faster, much more efficient, and improves accuracy. Now I'll quickly walk you through the code changes we made; there are no big changes, just a few simple ones that are not hard to understand either. In db.py we set diskann.query_rescore. What does it help with? It is a key parameter for DiskANN. We are obviously using pgvectorscale (I hope you have installed and enabled it), so the chunk index is stored on disk, and this parameter controls how that index is queried: it balances accuracy against performance. A lower rescore value means faster but less accurate retrieval, so less accurate information; a higher rescore value means slower queries but higher accuracy. This is just the definition, so that we can apply the command when we retrieve information.
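The setting itself is a single statement; 100 is the value this project passes in:

```sql
-- Higher rescore: re-check more candidates, better accuracy, slower query.
-- Lower rescore: faster, but retrieval may miss some relevant chunks.
SET diskann.query_rescore = 100;
```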
Now let's quickly look at where we actually retrieve the information. I'm going to go over here to the chat with documents part. When we send a message, we are in the generative phase, where we first retrieve the information. Here we call the function we defined and pass 100 as the rescore value, which keeps a good balance between accuracy and speed; then it sets the OpenAI key so we can get results from OpenAI, and then it executes the command. You don't have to do anything beyond that: just enable pgvectorscale and set the rescore, and pgvectorscale handles everything efficiently. You will realize its importance when you work with millions of rows and see how quick it is; you can compare it with plain pgvector to see the difference. What the query does is select the content from the document information chunks: it creates an embedding for the question, takes out the top five chunks, and then passes those retrieved chunks onward, as we have discussed in detail before. Everything else remains the same; you only need to set the diskann rescore here, and that's pretty much it. I hope it makes sense, I really do. In a nutshell: without pgvectorscale, the SQL query scans all the rows and takes far longer to execute; with pgvectorscale, it uses DiskANN indexing over cosine similarity to find the top matches in milliseconds.
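A rough sketch of that retrieval query, with the table, model, and question carried over from the earlier setup as assumptions:

```sql
-- Embed the user's question, then let the diskann index return the five
-- nearest chunks by cosine distance (<=> is pgvector's distance operator).
SELECT chunk
FROM document_information_chunks_embeddings
ORDER BY embedding <=> ai.openai_embed('text-embedding-3-small',
                                       'What is the tallest mountain on Mars?')
LIMIT 5;
```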
It is all about optimization, and that is what makes it great: if you use pgvectorscale in your project, the project becomes far more capable, because it can handle even millions of rows of data in practically no time. I hope that makes sense. Now let's talk about another improvement: using pgai for the chat completions. Right now we call OpenAI directly for this; instead, we will use pgai. Just as with embedding generation, where Postgres talks to OpenAI directly to get the embedding, without me writing custom code, pgai will in the same way talk to OpenAI for the chat completions, making it faster, more efficient, and more reliable. So hey everyone, we are going to talk about another improvement, a major one that makes your project genuinely better: chat completions. Previously we used pgai to create the embeddings directly from OpenAI; now we are going to use pgai's chat completions method to generate an answer, so the generative model interacts with PostgreSQL directly. What is the traditional approach? You write a separate layer of code, in Python, JavaScript, or a similar language, to handle the OpenAI calls: it talks to OpenAI, OpenAI returns a result, and you extract it and format it. Switching between the application layer and the database adds delays; multiple layers create a higher-latency problem and make the system more prone to errors and failures. That is why, when running the earlier version, you would see 'retrying... retrying...' so often. In the old setup, you write code to call the OpenAI API, that code handles keys, requests, retries, and responses in your application layer, and then it stores and retrieves data separately from the database, adding complexity; first it calls the API, retries until it gets an answer, and then separately stores the information. These become major problems once you move to extensive, large-scale databases and workloads. With pgai chat completions, the database handles everything internally: embedding generation (which we covered with pgai Vectorizer), retrieval of the chunks, and response generation via OpenAI all happen inside, where the PostgreSQL database interacts directly with OpenAI to get the answer and stores the answer in the database, with no layers in between, making it faster, more efficient, and much better.
Coming back to the code, the first call I want to talk about is, sorry, not generate matching tags but generate chunks. We were using OpenAI to generate fact chunks: previously we called the OpenAI API from our own code, handed over the whole bunch of text, and gave it the system prompt. If you look in constants, the prompt is create facts: you are an expert text analyzer; take any text, analyze it, and create multiple facts out of it. Previously we called an API to do this; now, once the document is uploaded, the database internally interacts with OpenAI to generate the response. You are essentially running a SELECT: the method is ai.openai_chat_complete, you pass the model to use, then the system message (the create-fact-chunks instruction) and the user message (the PDF text the instruction should be applied to), and then we fetch everything. This is what pgai does, and it makes this step very efficient, much faster, and better all around.
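A minimal sketch of that call: ai.openai_chat_complete takes the model name and a JSONB array of messages, and the response comes back as JSONB you can unpack with ordinary operators. The prompt text here paraphrases the project's constants rather than quoting them:

```sql
-- The database itself calls OpenAI: no application-layer HTTP code.
SELECT ai.openai_chat_complete(
    'gpt-4o',
    jsonb_build_array(
        jsonb_build_object('role', 'system',
            'content', 'You are an expert text analyzer. Create multiple facts from the text.'),
        jsonb_build_object('role', 'user',
            'content', 'full text of the uploaded PDF goes here')
    )
) -> 'choices' -> 0 -> 'message' ->> 'content' AS facts;
```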
Then we have another one, where we generate the matching tags. Previously, again, we passed in the PDF text and asked the model to generate the matching tags; here we do the same thing, but by calling ai.openai_chat_complete with the model, and everything else remains the same. That is what makes pgai so powerful: you don't need to add any layers. We literally removed all the layers, and it interacts internally within the database. Notice that I am executing a SQL command right here instead of going over to TablePlus; everything happens straight from the code. Then there are the retrievals, where we use pgai to retrieve the chunks and the tags we have to match against; the embeddings were already generated by pgai Vectorizer, the other extension we talked about. And in generate-message, where we send the message, instead of calling an API you run a SELECT with ai.openai_chat_complete, use the GPT-4o model, pass in the messages, and store the result. Everything else remains the same; we are simply calling pgai to generate the answer, and it all happens internally on its own. How does it work? It is a built-in integration: as you have seen, we installed and enabled the pgai extension, so pgai connects PostgreSQL directly to the OpenAI chat API and generates the responses using structured SQL queries over the data we already have.
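The generate-message step follows the same pattern; a sketch, with the system prompt and placeholders being illustrative rather than the project's exact strings:

```sql
-- Final generative step: the retrieved chunks go in as context and
-- Postgres hands back the finished answer text.
SELECT ai.openai_chat_complete(
    'gpt-4o',
    jsonb_build_array(
        jsonb_build_object('role', 'system',
            'content', 'Answer using only the provided context.'),
        jsonb_build_object('role', 'user',
            'content', 'Context: <top-5 chunks> Question: <user question>')
    )
) -> 'choices' -> 0 -> 'message' ->> 'content' AS answer;
```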
As I said, it handles chunks and embeddings very efficiently and delivers responses in near real time; that is why pgai carries so much of the work here. We have talked about this in great detail: pgai is a very powerful library that lets PostgreSQL and OpenAI interact directly, with nobody in the middle. I hope it makes sense, and I really hope you apply all these improvements; they make your project much better and much more powerful, and this might well get you somewhere. We will finalize everything in the next video. That was it: we used pgvectorscale for faster retrieval, pgai for automatic chat completions, and pgai Vectorizer for automatic embedding creation. I hope it makes sense; let's catch up in the next one.