Analysing Chatflows using LangSmith - FlowiseAI Tutorial #7

9.78k views · 2,214 words
Leon van Zyl
#flowiseai #flowise #openai #langchain Observability in LLM applications is a critical component to...
Video Transcript:
In this series, we've had a look at creating quite a variety of chat flows using Flowise, and I think you will agree that the complexity is increasing quite a bit. One of the biggest challenges is keeping track of what these LLM applications are doing behind the scenes. And this will become an even bigger issue once we start moving into more complex applications like agents with tools. But thankfully, there is a solution for debugging and monitoring these LLM applications in detail, and that is called LangSmith.

If you're unfamiliar with LangSmith, it's a platform that was developed by LangChain. And if you weren't aware, Flowise actually uses LangChain as one of its underlying frameworks. What this product allows us to do is monitor each and every step within our application in detail.
And there are some other benefits as well. For instance, we can view the amount of tokens used by our application. You might have noticed that I was using LangSmith in a few of my videos, but it wasn't available to the public at that time, so there was no point in trying to demonstrate it. However, LangSmith launched to the public just a few days ago, so I highly recommend signing up for a LangSmith account. And if I go to pricing, you will notice that it's free to use as a developer. Of course, for more serious implementations, you can upgrade to one of these packages, but in this series we will be using the free developer account so that you can follow along. So go ahead and click on Get Started and sign up for your free account. After logging in, you should be presented with a dashboard like this, and at the moment we do not have any projects.
What we'll do is I'll actually show you the behavior of four different projects. Let's start with a very simple implementation, which is just a standard LLM Chain. This is nothing fancy, but I do want to use it to show you the basic functionality of LangSmith. So this is simply an LLM Chain node with a very basic prompt template that will accept a subject as input and then generate a joke. And for the model, I'm simply using the GPT-3.5 Turbo Instruct model.
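For reference, here is roughly what that same chain looks like in LangChain's Python SDK, which is what Flowise assembles behind the scenes. This is a minimal sketch; the exact prompt wording is an assumption, since the flow only tells us it takes a subject and produces a joke.

```python
# Minimal sketch of the equivalent LLM chain in LangChain (Python).
# The prompt text is an assumption based on the description above.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate.from_template("Tell me a joke about {subject}")
llm = OpenAI(model_name="gpt-3.5-turbo-instruct")
chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run(subject="horse"))
```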
If we test this, we'll simply get a response back as per usual. And if we go over to LangSmith, we won't see anything within these projects. Now, in order to analyze this chain, we can simply go to Settings, and within Settings we have this Analyze Chat Flow option. Here we have a few options of different service providers; we are interested in LangSmith. Let's go to connect credentials and create our LangSmith credential by giving it a name like LangSmith API. Then let's get our API key from LangSmith by going to API Keys. Let's create a new key and copy it. We can close this pop-up, and note that I will be deleting this key after this recording, so please use your own key. Then, back in Flowise, we can simply paste in that key and hit Add.

Providing a project name is optional, but I do recommend it. Otherwise, all of these logs will be stored against the default namespace. And of course, you can specify a project for each and every chat flow, but I'll just create one project name for all of these flows called Flowise. We can then enable this analysis by setting this toggle to on. Let's save this change.
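Under the hood, this toggle corresponds to LangSmith's standard tracing environment variables. In a plain LangChain script you would enable the same tracing like this (a sketch; the key value is a placeholder):

```python
# Enabling LangSmith tracing in a plain LangChain script.
# Flowise's Analyze Chat Flow toggle configures the same thing for you.
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"   # turn tracing on
os.environ["LANGCHAIN_API_KEY"] = "ls__..."   # placeholder: your LangSmith API key
os.environ["LANGCHAIN_PROJECT"] = "Flowise"   # traces land in this project
```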
Now let's test this again. I'll just enter horse. And when we go back to LangSmith, we can go to Projects, and we will now see that Flowise project. When we open it, we can see the trace for the chat that we just executed. We can see that it was indeed an LLM chain that triggered this trace, and that our input was horse. We can see the date and time, and the time it took for this to execute. And something that most of you will find extremely valuable is the cost of this execution and the amount of tokens that were used. I know a lot of you have been asking me in the comments how we can keep track of token usage or cost in these Flowise chat flows, and this is exactly how it's done.
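If you ever want the same numbers from code rather than the LangSmith UI, LangChain's OpenAI callback reports them too. A small sketch, reusing the chain from the earlier example:

```python
# Token usage and cost from code, via LangChain's OpenAI callback.
# LangSmith shows the same figures per trace in the UI.
from langchain.callbacks import get_openai_callback

with get_openai_callback() as cb:
    chain.run(subject="horse")

print(cb.total_tokens, cb.prompt_tokens, cb.completion_tokens)
print(f"${cb.total_cost:.6f}")
```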
Now we can see additional information about this by clicking on that record. On the left-hand side, we can see all the different steps that were executed as part of this trace. Since this was a very simple application, there really only was one step, and that was the call to OpenAI. So we can see the content of that prompt template, we can see that the placeholder was indeed populated with horse, and we can then see the generation from the OpenAI model.

You can actually take this one step further by tweaking this prompt within LangSmith itself by clicking on Playground. Within Playground, you have the option of tweaking this prompt and then executing it, and you can also play with the different settings so that you can copy these back into your Flowise application. If you want to use this, you simply have to click on Secrets and then paste in your OpenAI API key. Now let's have a look at a few more examples, and you might also find this fascinating.
Let's have a look at this LLM chain with an output parser. Here we have a very simple LLM chain with a prompt template of "generate a comma-separated list of synonyms based on the following word", and we will grab the word from the chat box. For the model, we are using the GPT-3.5 Turbo Instruct model. And lastly, we also have a custom list output parser in this flow, which will take the output from the LLM and convert it into a JSON structure with a list. So before we run this, let's enable LangSmith by clicking on Settings and then Analyze Chat Flow. Within LangSmith, let's provide our LangSmith credentials, let's provide a project name like Flowise, and let's set this to on. Let's save this. Then let's run this in the chat and enter the word happy. And this returns this JSON list of values.
Now, what's very interesting in this example is that, based on this output, we actually have absolutely no idea how this list output parser works. But if we go to LangSmith and click on this trace, we can see that the word placeholder was passed into the model as happy. We can also see an additional property which we didn't specify in the prompt template: there's this placeholder called format instructions. What the output parser module did was inject this piece of text to say your response should be a list of items separated by a comma, along with an example of values. And that meant that the output of the model is now in this JSON list format. How awesome is that?
And there's a lot of</b> <b>information in here. But we</b> <b>can find some useful information</b> <b>like the model that was used during</b> <b>runtime, the temperature and the output</b> <b>parser that was used. </b> <b>Let's have a look at two more examples.
Let's have a look at this conversation chain. This is nothing too complex: it's simply a conversation chain with buffer memory assigned to it. For the model, I'm using the GPT-3.5 Turbo model, and I'm using a chat prompt template with a system message of "you are a helpful AI assistant". For the human message, we're simply grabbing the input from the chat window.
Let's enable LangSmith by clicking on Settings, then Analyze Chat Flow. Within LangSmith, let's select our credentials, enter the project name, and turn this on. Let's save this. Then, in the chat, let's enter something like "the passphrase is LangSmith". The reason I'm showing this example is to show you how memory works within these applications.
If we go back to LangSmith, we can see that the conversation chain is actually considered a runnable sequence. A runnable sequence is simply a LangChain concept, which we won't really get into in this video, but we can still see our input. And if we click on this, we can see a very simple list of steps at this level. The highest node simply shows us the input as well as the response that was returned. But we can also see that there are actually five hidden steps within this. So if we click on Most Relevant, we can click on Show All.

What we can see here is this runnable map step, which effectively receives our input as its input and then returns this output structure, which includes the chat history. This output is then passed along to the next step in the trace, which is the chat prompt template. And if you recall, this chat prompt template has a system message as well as a human message, which is a dynamic value. So if we have a look at the trace, we can see that the system message is indeed the hard-coded value that we provided, while for the human message, this input value is added to the human field. So far, this is very straightforward.
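That chat prompt template can be sketched with LangChain primitives like this (the system message is the one from the flow; the input variable name is an assumption):

```python
# Sketch of the chat prompt template seen in the trace: a hard-coded
# system message plus a dynamic human message.
from langchain.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant."),
    ("human", "{input}"),
])

print(prompt.format_messages(input="The passphrase is LangSmith."))
```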
If we go to chat OpenAI, we can see the output from the chat prompt template being passed in as the input for the model, and we can see the output from the model. We can also see this output parser step, and that is simply because the conversation chain will inherently convert its output to a string. This is the final response that we saw in the chat window. But if we continue this conversation, and enter something like "what is the passphrase", our model is able to answer this question because of the memory.
If we go back to LangSmith, we can now see exactly how memory works. When we click on this latest trace, let's change this to Show All. And if we now click on runnable map, we will notice that this looks a little bit different to the first time we executed this chain. Of course, we do see our initial input, but the output now contains our input message as well as the chat history, which contains the previous human and AI messages. And we can go into these individual traces to see how this works. Firstly, this step simply accepts the human input and passes it back as a passthrough. Then the second step in the process is responsible for fetching the chat history and passing that back in its response. So now when we click on chat prompt template, we will notice that the input contains our message along with the chat history. And that affects the final structure of the chat prompt template, which not only includes the system message and the human message, but also the chat history. So if we go to the next step, which is the chat OpenAI call, we will see that the input now includes everything, including the history. And that is why the model was able to answer this question.
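The runnable map step in this trace can be approximated with LangChain's runnable primitives: one branch passes the input straight through while the other fetches the history. A sketch, with the memory lookup stubbed out:

```python
# Sketch of the runnable map: pass the input through unchanged and
# fetch the chat history in parallel.
from langchain_core.runnables import (
    RunnableLambda,
    RunnableParallel,
    RunnablePassthrough,
)

chat_history = []  # stand-in for the buffer memory store

runnable_map = RunnableParallel(
    input=RunnablePassthrough(),                           # echo the human input
    history=RunnableLambda(lambda _: list(chat_history)),  # fetch prior messages
)

print(runnable_map.invoke("What is the passphrase?"))
# -> {'input': 'What is the passphrase?', 'history': [...]}
```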
Now let's have a look at the final example, and that is this RAG chatbot that retrieves information from a Pinecone database in order to provide an answer. This is the exact same example that we used in the previous video, where we scraped the LangChain documentation so that the model could answer questions on LangChain Expression Language. So let's enable LangSmith by clicking on Settings, let's go to Analyze Chat Flow, let's load our credentials, give a project name, and turn this on. Let's save this. And in the chat, let's ask "what is LCEL?" As expected, we are receiving the correct response.

We can click on this latest trace and see that it is quite an extensive list of steps, and we won't go through all of it. But what we are typically interested in is which documents were used as part of the context, and we can get those by clicking on find docs. Within this output, we can see information about the documents that were used, along with their metadata. We can scroll through these and see all of those different documents, which can be very useful for troubleshooting situations where a problematic or outdated document is returning the wrong information. We could then go and delete that document from the knowledge base.
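You can pull the same document list from code by querying the retriever directly. A sketch, assuming a classic LangChain Pinecone setup; the index name and credentials below are placeholders:

```python
# Sketch: inspect the retrieved documents directly, outside the trace.
import pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

pinecone.init(api_key="YOUR_PINECONE_KEY", environment="YOUR_ENV")
vectorstore = Pinecone.from_existing_index("langchain-docs", OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

for doc in retriever.get_relevant_documents("What is LCEL?"):
    print(doc.metadata, doc.page_content[:80])
```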
I hope you found this video on using LangSmith in Flowise useful, and please let me know if you would like a video that goes over LangSmith in more detail. Also, please hit the like button and subscribe to my channel for more Flowise and LangSmith content.