Mastering Natural Language to SQL with LangChain and LangSmith | NL2SQL | With Code 👇

59.86k views13253 WordsCopy TextShare

Pradip Nichite

Embark on a journey to redefine database querying with "Mastering Natural Language to SQL with LangC...

Video Transcript:

hi everyone in today's video we're going to build natural language tosql chatbot using Lang chain and before I show you demo let me tell you what we're going to cover so the first thing we're going to build the basic natural language to SQL model that can take user let say natural query or question and convert into the SQL query next we will see how can we add a few short examples so that we can improve the model performance next we will see instead of having those fixed few short examples how can we select those relevant

examples dynamically based on the user query we will see what if we have a large DB with hundreds of tables how do I make sure I only select the relevant tables because we can't put all the table schema inside the let's say prompt first of all it's not the feasible and the second thing it going to cost a lot next we will see how do we customize this prompts and finally most important how do we add memory to the chat board so that it can answers the followup questions now here is our chat boot let

me ask some basic questions and one more thing we do have something called langmi that will help us to know what's happening you know for these steps under the hood so let me ask some simple questions maybe I ask you know um I will ask how many orders do we have I should also tell you what database I'm using you know that will help so MySQL sample DB um you know I kind of I should have put that link somewhere I think yeah this one is the MyQ sample database that I'm using basically which has

uh you know data related to the customers orders and here products are something related to cars so you can see this is what the schema what we have and I go back here we have total 326 orders so that's how uh it has generated now let's look at what's happening under the hood so let me go back to the Lang speed and let's look at the query that we just fired which is you know the um how many orders do we have and if you look at the final output we got is something called we

have got 326 orders but let's see what's happening you know under the hood so that we understand the first step how did it choose the tables to use so this is the first thing it has limited the tables now you could see see it has only used the table called orders So based on the user query it decided instead of using all of this table only that particular table information should go inside the prompt called the orders and we can verify that thing so if I go to the next thing where the query got generated

which is here if you look at this is where uh sorry not um this is where the query got generated and um the model prompt if we look at which will have only the ORD stable inform information so you can see it has only the orders table information and no other table information only the orders table information we can also see when this particular thing was uh you know thing what few short examples were added so let's say we can see something related to the product and um next was also related to let's say what

list of customers or something so basically think of so I don't I have only few relevant examples so they might not be the good but I I tell you what's happening under the hood so we got this two examples one is called listing product specific something and then we got let's say another thing listing you know customers let's say with certain Credit Now if I go back and now ask let's say different thing now I ask okay about um and employees now let's see what happens because this is a followup question it is assuming that

it knows what is my first question which was I was asking how many orders so ideally now I'm asking how many employees do I have right that's what I'm asking now it is understood that I'm asking about the employee That's What I Call asking the followup question now let's see to answer this question what tables did it did it choose and what let's say the few short examples it choose so let me go back here and um go back to again this is our relatest query called employees again we're going to see all of this

thing I just wanted to give you the Glimpse what's happening so this was our question and this was the final answer we got but let's look at this particular step which were choosing the table and it only choose the employee table you know that's what I mean it is able to dynamically choose what we uh you know should uh use and then we can confirm this if I go to the part where the query is generating and if I look at the openai uh you know call and the prompt I could see now inside the

prompt we have employees table and it's a sample records and nothing else because that's what is you know required and and uh if I go back now let's see what sample examples we selected see the sample example this time is something related to the employee right and this was product because why because uh we are taking anyway two sample examples and we don't have thr short so whatever the bestas matching it is coming but what I wanted to tell you if you have a good library of these few shot examples eventually you will have matching

examples at the moment I have only five or six example for the demo purpose and that is the reason you could saw when I asked ask about the products or the customer we got something related to that when I ask about you know I got something related to the employee and maybe if I ask uh let's say something you know inspired by this particular thing let's say customers in a France with credit something like that okay let's let's ask this one uh we will ask a customers let's say customers with the credit I would ask

how many customer first of all how many uh customers with credit limit or 20,000 and let's see this time what it selects so something 97 but let's say we go back to here and let's see whether we see that something related to the credit that example got selected or not and we know it's here somewhere in the query generation and this time it selected the table called customers what else and it also selected payment so because see now it require multiple table that's why it selected multiple tables what I wanted to show you you can

see now so we got something related to the customer and credit as a few short sample example this is one example this is another example right so both are relevant to what we ask so we ask about how many credit and the customers so it is related to let say payment and all and we saw the examples also uh the two examples that we selected from this these are also relevant so you can see this is working we are able to select dynamically exam examples and dynamic tables that we uh you know know using here

now let me go back and maybe let's say ask you know how many customers I ask right there're 97 so anyway let's go let's go now and actually see how we can implement this particular thing right now what we will do we will start with the basic you know we'll start with the notebook we will start building the basic flow as I mentioned here and then eventually once we have everything with the good understanding of what's Happening under the hood then we will simply take that code and I will create this streate application and you

know this is what we will see finally so I hope you you know have a patience so that we understand what's happening uh you know under the hood right once again this is the schema I will uh give this link so that you know you guys can uh use this particular schema or thing if you want for the practice purpose let me go back to now notebook and first thing I want is to install all the relevant L Chen things like lch open AI because I'm going to use the open AI model the next thing

um let's say the Lang chain Community then we have this Lang chain normal and then MySQL DB because I created and put this database into the AWS RDS so that I can access it remotely and then finally Crow DB as one of the vector database I want so basically this is my credential which I will change later and to start with the basic natural language model what we can do first of all we need to declare a database instance now this is crucial so you you can go and find based on your database what you

want so basically this is SQL Alchemy syntax kind of a thing so uh if you know the uh that particular that you know SQL Alchemy engine so we have a class called SQL database from the URI and we need to construct this URI string so for MySQL this is the string for the let's say postra SQL or for the other database you can check what particular you know documentation and what uh you know thing you require so for us the basic we require is a DP user password and this thing and if I run this

thing I will get the database connection object or let's say engine and then I can print certain information so let me print this thing here so you can see what all things I'm getting here so the first thing Dil whether it's SQL light MySQL post SQL that information is a part of this DB variable next thing get usable table name this is important because this tells us in this particular database we have this TBL the customer employees and offices and all of those things and finally you might be wondering from where does those information used

to come when we saw here those table information right they used to come from this particular variable called db. table info this variable is responsible for that and you can see by default it has all the table information so this is the customers table and then it will have some sample records from the customer so for example so let's say we have something called credit limit and this will help model to understand how does the credit limit value looks like right how does that number looks like something like that that will help those sample record

similarly we have employees table and all of those tables actually down we have all those tables you know including those nine tables or something what we have next how I'm getting this things coming here so this is called something a lang chain Lang Smith Tool which help us to track what's happening you know whenever you run certain things right so you can watch any specific video if you want on the lch uh sorry lsmith whatever it requires for this particular video I'm going to cover and I'm going to explain all the things what we saw

here so let me go back to do that particular thing we need to do this thing first of all we have to activate this tracing by setting this particular flag and then you will get this key if you go and create account on the Lang Smith at the moment it is you you get I think 3,000 runs free so I have been using and I have let's say 200 runs or something so you all can use this freely uh and this actually helped me learn Lang chain much better because now I understand what's happening you

know under the hood and it has improved my understanding next all we know that we're going to require the open API key so I'm just setting those environment variable now this piece of code is our basic uh what I call natural language to SQL thing basically L chain has one chain this particular chain so if you're not familiar with the chain and all of those things I had also created one beginner friendly tutorial called Lang chain expression language this covers everything so basically chain means you are able to you know uh put together a different

components of the language and you can execute them in a particular fashion so you can call them chaining together so what those component large language model is one of the component you have next could be your prompt is another component then you have Vector databas is one component on so all of these things you can you know arrange in this particular fashion and then you can execute think of this what we just discussed right each of this will be one step or component in our chain and we can execute them in a particular fashion or

sequence and we can call this as a chain or you can think of we can create the chain out of other chains so we will see that all of those things so L chain provides a chain called uh let's say SQL chain that we can create using this create SQL query chain function or whatever call it a class what it requires it requires two things the first the llm itself because you need a large language model which will understand the schema and generate that um you know the corresponding SQL query and definitely the DB details

the DB instance we saw it has everything what we require so let's try so we're going to ask you know what is the price for a particular product name because we know that product exists and see what happens this basic piece of code okay we got some error because I think I haven't run the above sale right the open a sale let me run again and you could see we got the SQL query corresponding to this particular natural query what ask which is what is the price of this particular thing so I think to get

the price it needs to go to maybe the you know the product table to get that particular you know price so this piece of code creating SQL query with the llm and DB instance is what you just need if you just want to get that particular SQL query and now we can go since it's a simpler equation we can see what has happened under the hood and since we have activated this now let's go and see this particular query what we just asked so if I go here every time I run any chain I get

this particular run here they call it I think trace or run whatever you know the terminology I might not be the perfect but I can see the select by Price or what is the price is what I just asked now if I click here I can go into that particular sequence what has all this happened this whole thing is one chain execution that we just did everything so when I click on the top it tells me directly what came and what I got as an output so it came as a question variable it came as

what is the price this is the natural query and this is what finally I got as an output then I can see okay but what's happening in the middle so these are the certain you know things happen in the middle but let's say we look at what happened when we called the open AI model so let's click on this and then you can see what input it got so when we call this function you can see it got some prompt now we haven't done we haven't created any prompt it means this is the default prompt

coming from the Lang chain so if I go I can see literally there's a big prompt is there and then it has some format of that prompt then it has details of the customer's table it has details of customer sample record the employees table the employee record and everything it has you can see it has all the tables information and finally you can see our question and we are saying we are now expecting the SQL query and the model has given SQL query which is here this is why it is important to have this thing

selecting or dynamic let's say selecting the relevant table why because we saw to answer this particular question we were only required this products table but we put lot of information there including all the tables you know the schema so let's go back again there and if You observe in the Lang chain or let's say in chat gbt anything we have since it is a conversational model chat is the conversational model like a chat GPT it has something called messages so if you say there is something called human there is something called AI those are nothing

but the message types this tells that this whole request is going inside the message let's say from the human and the model output is called the AI message this is the AI but what you see here is just the plain string but then how does we got this plain string so you can see there is something called the output parcel which simply converted that AI message you can see the input was the AI message but the output is simply a plain string this is we got there so this really help us to know what's happening

at each stage now let's go and create the more complex example now you know what if you want to execute this query so to execute this thing Lang provide a tool to execute the SQL query it is called the SQL let's say the query SQL database tool that tools require your DB instance and then you can invoke any query so I just have this particular query right now since we created let's call this particular step as execute query step as I told you in L chain we can combine this thing and that's what we're going

to do but let's look at this as an individually let let me execute this particular step called execute query and you can see I got two values actually those two values are associated with this Buy price and maybe this one right the MSRP this particular price and I got so this is how I can execute the query now this was I guess also a chain I hope it is a chain let me go and see whether it is recorded there or not yes I guess it is recorded so even this one particular thing what we

just did is a chain what we did this this one was also a chain and this is also in chain so we got something let's see what we got here this will be basic because there is nothing right no intermediate step uh it it has only one action it took this particular query and just executed nothing has intermediate steps go back to this now let's combine this is what called chaining or the L chain now we took one chain which was actually generating the query and then we see hey whatever the output of this particular

chain I want to give the input of this particular thing and you can see what there is expecting as an input so here you know we are passing this particular uh you know input so we basically we are now chaining in the steps and then final chain we got now we can execute so that it should create the query and it should also execute it let's see now I'm asking let's a simple thing see how many orders are there it is saying there are 326 orders so two things happen now right now let's see the

langing graph will be bit complex now let look at it because now it has two steps right so even first thing the final thing I asked this question I got this particular output but it has multiple steps see what's happening so first thing we will see okay so here we can see our uh model part the model got that same human message that what we had and then we got this AI message then we know that finally uh it converted into the simple thing which is the plain thing instead of AI message and then after

that you can see there is something called SQL DB query that's getting executed that took our query as an input and then we got so this is how if I click on the overall thing this and we got output basically the two things happen this one and this one now let's make it more complex okay I was telling you right that we saw there is some prompt already and we haven't given anything but we can print that prompt and see actually what is the format of prompt so we have this chain uh we should be

able to see its prompt you can see your my sare X but based on what we have provided detail this is the format it has and uh you might be wondering this is our input the query that we are asking this goes there and then table info is the variable that we saw earlier right a big information this about the tables and all so this is the prompt if you want to see don't worry we're going to see how can we customize this prompt because that's how we going to add those few short examples and

all those things great now once we have a basic things running but isn't it it's very ugly right so you don't want to show a user this response what we got from the DB you don't see this thing right you see a proper a rephrase response from the for the human let's do that thing the next right now what we do let's create one more prompt which will take all of this information which will take our query what we ask which will take the natural query SQL query and the final result we got and it

should rephrase the answer that's what we're going to do so we're going to do this a uh prompt we're going to create the prompt in the line chain if you're not available so we can import something called in a prompt template and this template we can create from a template so template can be created from message list of messages also from template you can think of it's like an app string or a string with placeholder that's it this particular thing from a template template is nothing but the string with the placeholders what we have here

so let's say we create one prompt where we say that hey we're going to give you the question what user ask the query that we generated and the result we got from the DB can you rephrase this answer for the user and then we can taste that thing here right now this is what that rephrase thing is doing so this is another chain what is happening here we are taking a prompt first thing is a prompt then we are passing our llm which we declare on the top which is the chart GPT uh you know

GPT 3.5 and then something called output parser so you have also seen this output parser coming here right so output parser is basically you know string output parser is going to take this AI message and give it like a plain stream so that's what we have added there hey first Let The Prompt go through the llm and we know that the llm output is actually an AI message we want to make sure that AI message get converted into the plain sentence right but we need to combine this step with the earlier two steps what we

had earlier we have generate execute and now we have a refres answer so let's combine these three steps in a final chain here now you might be wondering what is this what's happening here so basically this synta runable pass through assign so runable pass through is the way we are able to pass this you know uh let's say any input to this particular chain and this is how it is able to use that particular thing the uh this can use using this but this dot assign you know what's happening in the dot assign step that

we can see again in the Lang speed this is what my understanding this particular step when it runs you can see what's happening we are assigning something to this variable called query and what is that thing whatever the output that we going to get from the generate query so the first step of this particular chain is actually executing a generat query chain or a step and take the output of that particular query and assign to a variable called query that's the first thing now we got one variable called query next is what next step we

are creating a variable called result how again we are using assign by the output of this so we are taking first step now this particular step is made up of two small steps one is extracting whatever the query so we are taking whatever we put inside this query variable we are taking that and we are passing that query to the next step is called the execute query and the output of that execute query will go here inside this result so we just created values for the two variables one was the query other one was the

result why because if you look at the prompt which is after that the rephrase answer chain require three things it require question query and result question be anywh passing as the input so it is anywhere available there but then we also created two variable one is called the query variable other called the result variable right see this is what my understanding and this Lang Smith help me understand what's happening under the hood so let me run so that we all are on the you know same page what's happening so let's run this particular so that

instead of seeing the DB result we see the actual output so it is now seeing there are total 326 orders let's go back to the length me and see what's happening let's see the recent one okay this is what we asked now you see it became more complex because now there are multiple steps so this one is the one step let me see I could collapse yeah collapse so this is what my understanding so whenever that we did something called assign I'm thinking like this that this whole state is about assigning something to this variable

query this whole step is about assigning a value to the variable call this result right now let's see look at the query part when I click on the query overall part here is the overall this is what I ask how many orders are there and this is what the query I got so you can see earlier there was only one variable called question now there is an additional variable called query this is what that this one is doing you know assign is actually doing this thing it is making sure this variable query is going there

as an additional variable okay so we can see now whenever we create this final things executed we have the query as a um let's say an extra variable okay and that is what we got next we will see what happen and how we got that result variable when I click on the result variable you can see result has already input those two things question and query and then it got one more variable called result in the output right so when we were at this particular stage of you know up to this point whenever we executed

we already got three variables one question which was already there when we executed the thing and then later we got this two variable called query and the result that's what we could see this is what it is happening and finally once we have all of those things let's look at that final rephrase prompt what has happened now you can see final rephrase prompt what very small because we saw that thing it is saying that hey we have this question we have this query can you formulate the answer based on this result and this is we

got again an AI message and our final output parcel which gives us the plain thing right isn't it beautiful we could see what's happening at each stage and you know that that kind of made me more interesting uh you know about the Lang chain I started spending more time because of that okay good now we have a natural language tosql B ready that actually take a natural query execute sorry generate the query execute the query and rephrase the answer even if you have this thing it is good enough for you that you can build something

which has some you know use case or usability but we don't want to stop here as I told you what if I want to add some few short example because you problem statement may be difficult and by default model is not able to answer them well so maybe you want to add some examples so that it can you know so let's add some couple of example I think I have five six examples what I put here so example format you can see example is nothing but the list and each element in that list is a

dictionary with the two keys one is the input basically what user is saying in that case what human is saying and next is the expected output which is the query which is what let say assistant or AI is say this is what we got right so let's look at now how can we create the F shot prompt to create the F shot prompt there is a space so earlier we have used whatever the default prompt so we haven't used any custom prompt at the moment to generate the SQL query yes we did use the custom

prompt to generate the final refr answer but not for the generate query but now we're going to actually let's say create that particular form so before so this is what few sh uh chart prompt L requires okay I would say few short chart message prompt template requir it requires first thing example prompt it is saying whatever the examples you're going to add how do we format this example how do we format that each example it is explained here so we are saying Hey whenever we going to pass you an example it needs to be formatted

like the set of these two messages the first message will be from the human so whatever inside the input you should call it as a message from the human and then some SQL query it's like an anchoring so that it can generate the SQL query and the AI part will be the whatever query so we are saying hey if we have example like this make it as a human message and then the subsequent AI message that's what we are saying and then it is saying okay where are those examples so these examples are nothing but

the list of those examples finally we have to tell in this whole scenario what exactly the a variable so here uh I'm not sure I sometimes I got confused with this particular thing I think this input is maybe what we're going to send as a question and not basically these input variables something like that so maybe you guys can focus but my understanding is this input is not the input from here this input variable is what this coming what I am passing as an input variable that's what my understanding so let's see how does the

few shot prompt look like right so we can see let me run this thing so this is also another Beauty we saw earlier that we have created chain using other chains this is also chain this is also a chain and we created one big chain similarly you can see we have this prom template and that promt template we're going to reuse here and we're going to create this fuchia chart uhry few short chat template and then you you will see in the further that we will again use this thing and build something on top of

it right so kind of again kind of a composable or something you can see right let me run this thing so you can see what's happening maybe I would say let's see what's happening oh examples is not defined because I haven't run let me run it oh now you can see this is how fot example looks like this is what I mean this is how it is is formatting each example because we told it that hey we want to format like this human message will be like this and then this will be the corresponding AI

message that's what we mention in that example prompt formatting here and you can see all whatever because we have I think six or five six example all we can see here coming uh how many we are say this is one 2 3 4 5 six we have six examples and Al we are seeing now if you see um I think it doesn't require any does it require this all this variable table info I don't see it is using top K it is not using let me remove and see what happens because I might be using

them later okay good so I think it was not required so that input basically this one what we are you know uh sending and I'm sure it is this one like oh it is still running even so whatever we are passing it is actually trading it as a input that's what my understanding okay that this particular um you know input one this value is going there as an input let's see if maybe uh as we go forward it will help but there's no what if instead of six you have this 20 30 examples and we

don't want to put all of this example in the prompt so we want to select let's say if I'm asking something related to the customer payment and something I want payment example if I'm asking something related to the stock of the product then I want this particular example right instead of having all of this example that's where we want to do a dynamic few shot example selection and the good thing lanon has something inbuilt that help us to select the example it's called the example selector a semantic similarity example selection it means whatever the query

that we going to ask it's going to check that query or that natural question semantic score with each of these let's say the queries we have and whatever we want top two top three it's going to select so this is how we can build so the first thing we uh we import this semantic similarity example selector and we are seeing so there is a method called from examples basically we are giving the list of examples here is examples you require some embedding because to calculate that sematic similarity we're going to calculate the embedding then some

Vector store where we going to store those examples so that we can match thematically finally the K is equal to two means how many top example we want to select so I just want to mattion example and here also input key is I think this input now let's run this thing and see what happens what example we get so I'm asking who is highest paying customers and see what top two examples I got it is saying get the highest payment so something related to the highest payment and other thing is related to the credit I

would say still both are related to let's say a payment and credit matching example right who is the highest P customer let's say and again I can ask simply the other simple question maybe you know uh that same one let's say how many employees we have and I hope it will have something related to that employee question matching two questions will be related to the employee retrieve the names of the employee who report as something related to employee you can see we got something credit because there is only one let say the employee related question

and we have given K is equal to two and there is no threshold so basically it is just coming up with its another example right because we want top two example no harm into that so we can simply put them right so this is how now we got some logic something called example selector let's use this example selector inside our few short prompt that what we just created so let's go and see how we can do very simple minor change so when we create this few shot uh you know prompt template that we just created

instead of giving examples sorry here in instead of giving example we say take the instance of the example selector because the example selector already has the examples now let's see whether it works earlier we saw we got all of those examples right we saw five six examples we got this time how much examples we got how many products are there top K is I think again this top k or something I'm not sure why did I put because I was experimenting so much we will see if we need them and let's see good now we

got this uh you know formatted uh particular thing earlier see example selector is only giving list of dictionaries eventually this example prompt formatting this part is actually formatting into the human and AI messages and they will become part of our uh let's say the request right and as I told you this few short chat promp plate take taking another prompt template inside but now we can make this few because we saw our final example is not just only few short example there is some inst instruction before it and then you have few short examples and

maybe we want to add after that right so let's create a final prompt a more customize prompt so we are see this is what so earlier we saw how do we create the prompt let's say template let's see the earlier if you look at the syntax of the earlier one so I think we saw multiple ways of creating the you know promt so let's say when we saw the simple prompt you can see we are creating prompt from the template basically a um know a string with the placeholder the next we could also create the

prompt promt messages so basically it will take the list of messages you can see these are because this one thing is message message can be represented by the role or from whom it came and basically the content or that message so this is nothing but list of messages so this was actually made up from the messages and um this is also another style of creating the let's say we are also creating from the messageer this time we create one system message which has has that long instruction what we have regarding SQL and this is where

that table info where um you know all those table schema and their sample this is going to come here and then our few shot prompt basically this one this small thing will become part of that and finally the question that user is asking so that we can answer right it will be interesting to see it now let's see that thing see this is how now our customized promp become this is our system instruction up to this point yeah there some table info because just for the printing I don't want to actually call that so I

just created some placeholder because we know that there is one placeholder called table info you can see here and for the printing purpose I just put something here so that because my focus is on example and not the table info so uh you can see this is our system example uh some table info here it is going to be big info and this not not this one yeah this is our few short examples and finally this one is human is our final thing is our human input this one if you're not sure I can show

you this is what we said right so I can type something after that and so that you're sure that it's actually let's see that the final one is what we just created yeah you can see the final one is what we uh that one is the final this is what our uh you know prompt looks like now we are ready earlier we created our database chain without giving a prompt now this time let's give it a prompt so let's create again that same create SQL query now with this two earlier variable we going to pass

our prompt now final prompt so we updated our general query everything remains still same nothing has changed we still have our execute query which is there unchange we have a refres answer which is also unchanged only thing we change is the create SQL query chain and that's what I updated here let's run this thing and we're going to go to the L chain to see see whether we are selecting those proper examples or not let's run it with the spelling mistake I see there okay so there are 85 customers with this particular credit thing right

we just saw now go and let's go and look at the Lang chain whether we got those steps properly or not go back to the L chain or sorry the Lang Smith this is what we ask how many customers with the credit and we are more interested in that particular step where actually U you know so that step where we got those few short examples and it has to be the part of the query generation because that's way we are using them for the query generation look at the prompt here so this is the chat

open prompt again you can see that table info has now many tables customers and all of those because we haven't yet selected Dynamic tables we are still working on the examples but after this schema finishes we know we have our few short example what happened below the number of example question answer okay okay sorry so this is one message system message that is finished right because we know that chat take the messages right so system message was those big now you can see list of all customers in France with the credit and then get the

highest payment you know these two these two are our few short example and why they're coming like a human and AI message because we told it needs to be like that so we told it that you know they needs to be formatted like a human and AI message and we could see those two messages coming this is the first pair this is the second pair and finally ours the question that we asked so this is few shot and this is what we are asking and we got this SQL query so we see we are able

to select the dynamic example let let's try something else so that we are you know sure or maybe it will become repetitive uh I think it's working so let's go to the next because I see video is already getting long we haven't reached even the uh stream part so uh okay let me go back to the part where we are dynamically selected those examples good Next Step how do we uh you know select those relevant tables because this is very big if you look at the system prompt has those nine 10 tables whereas we might

require only what the query require only customer stables and nothing else and we have all of those things so let's go back and how can we do that thing you know Rel table so the first thing what I did I created one CSV with the table name and it's a description see whether I have this table description a small description so I have a table called Product lines and I put a couple of line description here saying that hey this particular table hold this information so these are the nine tables I have because I'm going

to use this information to decide based on the user query which are relevant tables this will help ideally even these names are introduced for GPT but what if you have very weird names it's always better to give some description uh to this particular you know table so that GPT can choose so this is just the pre-processing part here what I'm doing in this particular part I'm just reading my CSV and you can see I have this big long string of my table name and its description that I'm going to pass inside the prompt next I'm

creating a prompt called table details prompt that take my table details and I'm asking asking chat GPT since I'm going to share this notebook with you I just don't want to read line by line it's getting late already uh it's um you know long video so uh this will simply help us uh you know what are the uh relevant table based on the query and we are using some Lang chain utility something called create extraction chain P head tick okay the name I don't want to go into that name details what basically takes it takes

an llm instance it takes our prompt and it helps help us to extract those table or let's say object in a format that we specify so I have specifi ptic some kind of let's say data type right we are saying a schema that I want you should extract me a thing called a table and I have declared what does table mean so you can see on the top I have declared a schema this base model coming from y identic model like if you are fast API you might have seen something like this so we are

saying hey there is some let's say The Entity or table called something called table which has only one thing called name and that will be the name of the particular table and if you are giving me something it has to be the table object that's what we it we mean actually this particular thing and so given all of this thing let's see whether we get those table objects so we ask give me details of customers and their order counts so based on this prompt GPT chat GPT thinks that maybe the customers and the orders are

the only relevant table and you can see it has given us in a format that we want now this might be thinking like little thing but actually this will become much more important if you want to extract some other information also so here we are just saying that table give me the table and its name and this is what we got from it now let's say how can we use it so basically one of the thing we haven't seen at the we haven't seen it uh uh you know one thing is that there is a

way to manipulate actually this um you know chain to use certain variables uh for example I should have explained that thing earlier so I did commented something so when we create the DB instance right this database thing we can actually specify the sample rows how many sample rows we want you can see we are actually having the three rows but we can also say that hey only include one rows sample rows in the table we can decide what tables we want to include right and we can also pass a custom table information if you wants

rather than using this table information we can pass some custom table um uh you know information maybe I should show you um you know so that it's clear this again I should have actually shown this thing when we started that time itself so let me comment this thing and let's say I'm saying that I just want one sample row and only include these two tables and we have see that earlier it was many tables right so let's execute and do these things okay so go up what's happening so still it has many tables okay okay

custom table info is not defined okay because I saw I told you right we can pass a custom table info let's say the custom table info is some string and I pass this thing what happened now type error table info must be a dictionary with the table name as a keys and something like that oh my God I should have created that thing maybe I can ignore that variable for now maybe when I did a practice that time I did experiment with this also but okay why not let's quickly create that one if I just

maybe this let's see if it can take only one thing and it solve the problem okay it took anyway so we created now let's see what we got okay you can see we got only two tables because we told it to only use two tables but then uh and we see only one samp what do you call uh only one uh sample row because we also mentioned that you know select only one sample row I think this customer this customer you know what this is that custom table info that we just put right if you

want to put anything additional there right uh so I don't have a good example here maybe you know but you can see you could manipulate this thing you know what needs to be pass right but actually for us let's say we want all of those things right and let's skip it to the default what we have okay why I came here just like we saw this couple of variables uh you know the include tables so in the prompt the variable name is called actually I think um uh tables to include or something we will see

and we're going to manipulate that particular variable that variable we're going to uh manipulate can we see that variable here if it is there actually okay I don't see that variable here but anyway let's go back to the thing what we're going to do okay yeah so we saw that we are able to get those table names but let's do one thing first I want just those tables as a list of string and not uh you know let's say those table instance that what I did a simple simply iterating them and generating whatever I want

another thing I added an extra thing so here you could see I am passing it as an input because the prompt inside it expecting a variable name called input but the thing is that when I'm going to execute uh you see all the way we are using a variable called question so just to handle this case what we are doing hey whatever I'm going to pass to the question assign to the variable input so that we have this input variable here which is get available here just to handle that particular thing now let's create the

uh you know uh chain which will use our new thing called select table uh you know scenario now how we going to use this thing this is what I told you right so we are actually creating one variable called table names to use and this is a specific variable is actually available here inside this generate query so I think in the source code or wherever it is happening it checks did we got any specific set of TBL that we want to include if not use all the defaults so here we're going to manipulate we're going

to say hey we're going to pass you the list of tables that we care about and that's what happening so we just saw about that whenever we do something do assign some extra variable it's going to get available to the next chain that what we have and that's what going to happen here so let's say I'm going to ask how many customers with order count more than five and there are two customers but now let's go to the L speed and see really whether we got only those selected tables or not I go to the

line Smith is the default and uh how many customers are there this is customers right how many customers correct we got the answer but that's not the interesting part of this because we know the first step is actually table names to use let's see what happened there when I click on there you can see table names to use only Customer because that's what it is required so the step what happened table names to use which is this particular promt you can see that prom will have our information so that prompt has our information you can

see the table and their description and finally GPT decided hey we just need one table call customers and that will get used ahead so in the query final if you look at the when the query generated that particular prompt sorry I clicked somewhere else here correct this one now we see our system prompt has become very small now we have only one table you can see the customer and its records and nothing else that's how you can save the token and sometimes as I told you what if you have hundreds of tables you need to

come up with some way now you might say hey anyway we are costing some uh let's say the prompt tokens here also but that is still feasible because you can represent you know one table with let's say two lines of things and you can still handle your 50 tables in one single prom but it will be really tricky and uh uh you know difficult thing to put all that 50 table schema here because the schema it might have hundreds of columns it will be difficult right so at least it is still better to have some

description and then ask GPD to choose what we want even though we require some extra request to do that thing else it will not become feasible for you if you have many such a thing right so we just saw we are able to now pick up you know we are able to uh pick up this particular thing as there are two customers with an order and pick up those relevant tables okay and okay one more thing I wanted to show you as I told you that till the point we never ask any follow-up question we

ask all this questions now this is I want to show you that till the point whatever we have built is not the correct thing actually uh it can't answer the followup question and let's test this thing so I ask there are it is saying there are two customers with the order account more than five now I want to ask a subsequent question can you list let's say their names that sparing might be not be correct maybe this one or is this wrong anyway GPT should understand now I'm asking can you list their name and see

what happens whether it really understand my subsequent question or not the names of the customer in France with credit limit greater than 20,000 now you see I never ask about the credit limit and it is giving me something related to it means it is not understanding my followup question but the demo I saw show you there it is able to understand that followup question it means we need to add some memory to the chat box so that it remembers ke what was my earlier question and what was the earlier answer for that particular question and

that's what we're going to do at the moment to do that thing we have to modify our chat from template those final prompt template and add one more thing to it called the message placeholder so earlier prompt you see we have a system instruction we have few short example and then we directly had our user query what user is asking now in the middle we going to create a place holder which will hold our conversation history so in the Lang chain if you're not sure what going to come here what kind of message going to

come here so you can use something called message placeholder and I'm saying hey I'm going to pass you something called messages that's going to have the list of messages that you have to put it here that's what it mean and we can verify that thing so let's say currently I'm not passing anything so uh I'm not passing anything in that messaging array and that's why uh you don't see anything um you know coming here but let me show you that we can pass something you know uh there so let's go and add some memory to

it so I'm using some simple thing called a chat message history which is basically going to store out this all this history in the form of messages so I declare that particular object right and then what we do we can use this history uh let's see what's happening here repas how many customers and okay okay okay where is this so we created this chat history I think did I miss something so we got this chat invoke and we got this history table select table assign but we are not using this history anywhere right okay okay

we are invoking here down here so we just created that chain here we not invoking that's why I got confused that uh why we are not actually using it here right so you can see we declare our chat message history we again have our same thing but as I told you we updated our prompt because we added something extra so this prompt is a part of our uh you know create SQL query chain so we have to regenerate that particular chain so we created that chain again with the prompt the latest prompt what we just

added with the message placeholder and um next what next we declare the chain so it has everything nothing change it has the same variable what we saw on the top here nothing change only thing is that when I'm going to invoke this particular chain I'm going to pass one more variable or call something message why because one of the step in this particular thing is called generate query is expecting that variable to become and let's run this thing so first let's declare this uh thing which is the same thing so I'm not explaining because nothing

has changed now we are passing this message and let's see what we get I ask how many customers with order count more than five we got this thing and uh let's see whether there was uh there was anything in the messages so maybe I go to the line chain I go back to pick up the one which I just recently asked and you can see when we ran there is no messages because that chat doesn't have any history see we just created that chat uh message history here and we haven't put any message inside it

so there is no history associated with it that's why you see the message is getting empty but the next moment I'm going to add now I'm saying hey let's add some history so we are saying in that chat history this one is the user message and this was the AI message so there are two messages inside it now I'm passing those history do messages so those history. messages will give if I want to show you let me just add it and so that I can show you and maybe I just print this history. messages so

that you are clear they are getting as a messages so this history object has a field called messages that can give me those list of messages see human message and the AI message because uh as I added user message the message and the AI message is the a message is what you know we had asked question and the respon understand the the now let's go back to the messages so that it knows what was my previous question and it should be able to answer see whether this message memory is going inside it or not so

if I come here click on this latest message and go back the names of the customers more than five ERS now it understand my first question was custom understand that this was about the customers who has more than five orders and it it actually able to list now see what's happening here let's go back as I told you let me tell you again this is how I understood each step here this step is actually you know getting a variable called table names to use this variable is let's say this particular state is where we are

getting the query so I'm interested in this part so I'm going into its prompt so I could see the prompt here but I like to see here because I can see input and output both what goes to the model right okay so oh you see all of these tables has come something wrong anyway I will see what is wrong but first see whether we are getting those correct messages or not and um let's see what is what has happened you could see you know our prompt format The Prompt has first few short example then conversation

history and then our input so just above our input we have that conversation history so let's go back and see you can see this was our in uh you know uh it's input can you list the name and just above it we have our conversation history which is earlier message how many customers with the order count and this was the answer so we can see the messages are going inside it right now uh the problem that we saw is that a big system prompt let's see what is the issue with that and we know this

is part of this particular step called table names to use let's click here and see what's happening here oh my God so here when we oh so we ask something like this can you list their names and our prompt which we created you know this one uh where is the prompt that we created uh I think yeah down here here which takes our table details and our question and tells us you know what is the relevant things but this particular prompt don't have access to our history we haven't put the history here so it is

not actually aware what is this thing so when it came across something like this can you list their names what it did it simply decided to use all the tables right because if you look at the prompt prompt will be like this based on you know so this is what open prompt it sent actually but uh to decide what is this this input right so this was the prompt what it generated yeah it generated the query so I don't want to look at the query prompt I want to look at this one so let me

expand this I want to look at the table selection prompt so if you look at the table selection prompt what happened it got all this table description and this I got can you list their names now it does it don't have enough context to decide which tables to use so it simply decided to use all the tables and that's why it outputed all the table right so make sure if you're using some external prompt like what we are using it should also have some access to the history or the refine query so that it can

understand which tables to choose that was good learning so maybe you guys when you build these things make sure this prompt also you do something so that it has access to what was the previous you know uh message so this was good video got long actually uh I have to uh so when I check my recorder was already sto recording so I had to re-record certain part of it because it has some limit how many minutes it so this is the second time I resume now let me go back to the uh let's say the

streamlit code sorry where I have integrated all of those things and I hopefully know quickly explain what is there so let me go back to the stream L uh so streamlet we have already seen here right so if I go back and uh sorry not this this one yeah the stream late we have already seen whatever uh you know this thing so this basically a stream lead chat bot where we could see a set of messages coming this is something coming from the user and this robot icon this is coming from the AI so the

messages that what we are rendering has some properties that decide whether it is a user message or whether it's a let's say AI message and you could see there is something for the new input that I could put basic chatbot functionality and we will see uh I know many of you already seen these kind of things in other videos or even my videos so let me go back and see what's happening here so the first thing we are doing here is actually let's say adding some title to our stream it that's what we saw the

Lang and natural language to SQL part next we are declaring our open key next we are storing this open model in the session variable but actually I'm not using it anywhere I think I'm not using anywhere this particular model name as such because yeah I think I could have just you know ignored it the important thing is the messages so whatever you see it is coming on here this this coming from let's say session messages whatever we have and so we are declaring a stream L session variable called messages and we are seeing for here

we are seeing if there is no existing message variable it means there is no conversation and those messages will be empty and if it is there some thing we want to iterate through these messages and based on their role which is this stream lead chat message role I want to display this content and these three lines are making all of that what you just saw here right it is deciding which role and rendering that particular message here right so this is what uh that's happening next we are taking input from the user here and if

I got the input I'm updating my that message array so that next time so whenever I ask something right so whenever I'm going to ask something it's going to appear there and that's what um so whenever I'm going to ask something it's going to appear there and that appearing is happening here this particular part so whenever I'm going to ask something it again get display and eventually we wait for the GPT response or let's say this is where whole Lang chain is happening here I'm calling a function called invoke l chain and eventually whenever I

got the response I also update that response in this message array so that it get display on the UI like this some error can okay uh maybe um something Rel how many customers with order count more than five anyway some error has come maybe it was not the correct ver or something some intermediate step has failed so it requires it means it require better you know handling this stream has a lot of issues how many customers with the order count more than five anyway so we can always go and check what's happening under under the

hood uh right and debug it as we just saw you know earlier so that should not um you know scar us so let's go and see what's happening in the invoke chain function so invoke chain is simply taking the uh my prompt and my messages prompt here is the prompt is actually the user input so I should have used some different name I just copy this code from the uh stream L official website the sample code what they have now if I go to the invoke chain function see what's happening in the invoke chain function

first I'm getting my chain the find chain what we saw here I'm getting that particular chain I will show you this one the final chain next I'm creating in history because in streamlet it is very tricky to I make sure every time I'm answering something I'm creating uh you know that message array based on whatever the session messages I had inside the stream blade so that history get populated based on the session messages um that's what I'm doing and finally I'm invoking that particular chain and then I'm adding those things to our history so that

um you know bot is able to handle because I'm passing that history so let's see exactly so gate chain what it is doing gate chain it is doing that same thing what we have done it created our database object it decare our openi model it declare our generate query chain execute query chain rephrase chain and then this is our final Chain by using all of those things and I'm returning this chain here that's what it did create chain next create history basically create history will take the list of messages which is coming from that stream

L sessions and we see if there is a user role and this we add the user message yes AI message and we know that this history we are actually using um while creating the response based on those followup questions and then finally whatever the chain I got I'm invoking the question that I got from the user because you see in the main we have passed uh the first thing as a question and top K3 again did we use this variable anywhere I think I miss uh maybe you guys guys can check whether we are even

using this variable I don't think we are using it anywhere this particular variable somewhere but let's it be now there and this message is particular the history do messages what we are passing the same thing what we did and we are updating our history to make sure it Loop in the loop it runs everything is you know fine and all of those details now you might be saying okay when I'm getting change when is the other details right where are those details are coming from so basically this DV user and all of those information coming

from the enir variable which I declared so here's an environment file which has all of this as I said I will be sharing the code and there is nothing much um you know that is trickier simple thing what we uh you know just saw and then again create SQL query chain this thing answer prompt so where from where this prompt is coming so I put all the prompt inside this prompt. py so you can see that we have how to format our few short example prompt this is our F shot prompt this is our final

prompt which uses the F shot prompt which we saw and finally this is the prompt which rephrase our answer all of those prompt we already saw inside the notebook and from where those prompts we are getting and you know all of the things what we see here here I store you know all those examples actually stored uh you know here the example selector code is also here because uh that's let's say related to this particular thing you know what this cash resource since whenever we execute something on stream L it it reload that whole page

and we don't want to you know keep creating this chroma and this thing again and again so we can do something called cach resource so that it does only once that's what is happening here so go back to this um you know Lang chain utilities and I think um invoke query messages nothing else yeah is there anything that step you see okay this one you know select table so we have this select table that we imported which is coming from a table details file so if I go to the table details basically here I have

that code which actually read the you know that table description then create that table pent model then get those table name of the list and eventually that table details prompt I could put this prompt here inside the prompts but I just decided to put here in the right and then here we have our table chain that we are importing nothing new I just simply reorganize this code so that it works well with the stream LD right and the only stream L part which I already explained basically uh iterating to those messages making sure that message

State uh in know object has all of this information once you have the code I think that will be easier for you to uh you know understand and we just saw if you want we can just rerun that code again let's re Let's uh you know kill that process and rerun that again and see whether everything is fine so we could run something called stream L run you know main.py and uh yeah it's running maybe some warning we got related to the cach okay let's let's ask those questions again what we were asking where we

got the error how many customers with order count more than five you can see gate chain the first time it is getting that chain we are creating right what we saw and it is now generating the response so we saw the gate chain which was uh what is this again some [Music] warning see the gate chain also we have a cash resource so the next time if I ask any query we don't want to again create that chain so there are uh let's say two customers with an order I think the two and three is

getting confusing not sure let's ask the followup questions can you list their names this time that query I know that chain creation didn't happen because it it is cached already right uh that's what it is uh doing the names of those particular customer so working good I will be sharing all of this code uh with you including uh you know this things I I haven't spent much time on it but I think uh it's pretty easy everything is there already structure uh anyone can go through line by line and understand you know uh what's happening

hopefully you found this uh video useful because I try to use all my knowledge whatever I learn uh you know creating this natural language to SQL chatbot for our client most of the times I used to build this thing from the scratch all this functionality but then I observe lot of those things are already part of the Lang chain and this time I created those all things what I used to do using scratch uh you know uh using now L chain and I could see how efficient this particular thing but it requires a bit patience

and understand the underhood concepts in the Lang chain so that you become productive again right hopefully if you haven't subscribed to um you know my channel you guys can you know subscribe and let me know if you have any queries or you know any next video ideas you have thank you