Hey, welcome back. About a week ago Anthropic open-sourced the Model Context Protocol (MCP), their new way of connecting LLMs to external data sources and tools, and we as a community went absolutely wild about it. But one of the problems we've had is that it can only talk to the Claude Sonnet model, and you can only really access it through Claude Desktop. We haven't had a way of easily connecting our own LLMs, especially ones on Ollama or OpenAI, and we really want to be able to use the Model Context Protocol within our own applications. That has not been easy to do. So today I'm going to show you how you can build your own native Model Context Protocol application which connects to LLMs hosted on Ollama and on OpenAI.

As a reminder, in my previous video I showed you how to get started with the Model Context Protocol on Claude Desktop and connect to a SQLite database that has a products table. What we really want is to use MCP with our own applications, and we also want to be able to use different large language models; we don't always want to use Claude Sonnet. So I'm going to do the exact same demo, but this time within my own native CLI.

If we just run that for a second, you'll see my CLI has a bunch of commands that let me talk to MCP servers. I type in "chat", which opens an interactive chat mode, and as you can see I've entered chat mode using provider OpenAI and GPT-4o mini; I'm not even using GPT-4o, I'm going to use mini here. Then I just say "what tables are available", and just above you can see it's making a tool call: tool list_tables, exactly what Claude Desktop did, and it's come back with the three tables: products, users, and sqlite_sequence. Next I say "describe products", and once again you can see a tool command: describe_table invoked with arguments table_name=products, and it tells me the schema. Then I say "select top 10 products ordered by price", and as you see it does another tool invoke there: tool read_query invoked with arguments query="SELECT * FROM products ORDER BY price LIMIT 10", and it's come back with screen protector, mouse pad, etc.

The key thing here is that I'm talking to the same MCP server as before, but it's running in my own CLI: there is no Claude Desktop, and more importantly I'm actually using GPT-4o mini rather than Claude 3 Sonnet. And I'm not restricted to GPT-4o mini: if I want to, I can use Ollama and talk to any of my open-source models. To do that I need to pick a model that can do function calling; I'll go through that in a second. In this case I'm going to pick Llama 3.2, but it works equally well with things like Qwen, with IBM's Granite models, with any model that is good at function calling. So I pick Llama 3.2 and do the exact same demo: I go into chat mode and say "what tables are available", and again you can see it calls list_tables, it calls describe_table, and it comes back with the columns. Then I say "select top 10 products ordered by price in descending order", and there you go: it's working with Ollama and the Llama 3.2 model. As I said before, I'm using my repo here, which is on GitHub at
github.com/chrishayuk/mcp-cli. If you want to get that running with OpenAI, all you really need to do is set OPENAI_API_KEY in your .env file to your actual key; if you want it to work with Ollama, you won't need to do that. The key thing is you just run it with uv. If you want to use OpenAI, you don't need to set a provider or a model; you can just run it as-is and that will work. If you want to use Ollama directly, you just need to set the provider, which would be ollama, and the model, llama3.2.
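To make that concrete, here's roughly what the setup looks like. The exact flag names here are my shorthand for what's happening in the demo rather than confirmed documentation, so check the repo's README for the definitive options:

```bash
# .env: only needed when using the OpenAI provider
OPENAI_API_KEY=sk-...

# OpenAI is the default provider/model (gpt-4o-mini), so no extra flags:
uv run main.py --server sqlite

# Or point it at a local model served by Ollama:
uv run main.py --server sqlite --provider ollama --model llama3.2
```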
But what's cool about this CLI, and the way I've built it, is that I've gone down to the protocol level. So I'm going to break down exactly how this works, and show you a little bit of how the CLI works, so that you can go and create your own applications.

The first thing I want to do is go through the architecture a little, then look at the CLI, and then show you how to get started. In the demo I showed you earlier (and I covered this in my other video), we have this idea of a host: in this case the mcp-cli, or the Claude Desktop application, is effectively the host. Within that host we can run client applications (client 1, client 2, etc.), and then on the right-hand side we have our servers. That SQLite database we were talking to was actually behind an MCP server, a SQLite server, and that server talks to the resources; the host and its clients are effectively talking to these servers. That's really what's going on under the hood.

For that to work, the server I'm running in this case is a standard IO (stdio) server. If you look in my server config, you'll see I've got "mcpServers" and a server called "sqlite", and really what I'm doing is running uvx, calling mcp-server-sqlite, and passing in test.db as the database path. In my folder I have that test.db; you can see it there. And if I go back into my terminal (again, I covered this in my other video) and type sqlite3 test.db, I can run "select * from products" and see the items that are in the database, because that's what's actually happening under the hood. So the test.db database is essentially acting as the database on this diagram, and mcp-server-sqlite is acting as the server on the right-hand side. If I do a uv run of main.py against the sqlite server, it's going to connect using that configuration; if I want to change the configuration, I just change my server_config.json.
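For reference, that server_config.json has this shape, matching the standard SQLite quickstart server invocation:

```json
{
  "mcpServers": {
    "sqlite": {
      "command": "uvx",
      "args": ["mcp-server-sqlite", "--db-path", "test.db"]
    }
  }
}
```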
As I said, there's a set of commands here: you see "ping" checks if the server is responsive, and that's probably the first thing to understand about the protocol. When a client connects to a server, there's a message it needs to send, an initialize message, which gets the server's capabilities; after that, if I want to check whether a server is available, I can send it a ping command, and it will respond if it's there. So you see "pinging server", and the server is up and running. If I go into my main.py for a second, you can see the handle_command function: if the command equals ping, we print "pinging server", as we just saw, and then I send a message via send_ping, passing in the read stream and the write stream as parameters; in this case those are the stdio read and write streams, my input and output, and it sends the ping message. You can see send_ping comes from messages.ping; I'll show you what that looks like in a second. You can also see send_initialize there, which we'll go through shortly.

If I go down a little further in my main, just before I call handle_command, you see the first thing that happens is loading my server configuration (await load_config). Then I establish stdio communication: you see this async with stdio_client, passing in my server parameters, i.e. the name of the server I'm going to connect to, which creates a read stream and a write stream. The first thing I do is initialize the server and get the capabilities, then handle the command, and then I just run around in a loop to pick up more messages.

We can see this if I switch out of critical logging mode for a second, switch to debug mode, and do this one more time. With debug mode on, you can see the first thing that happens: I'm loading config from server_config.json, which I just showed you; there's that command uvx running mcp-server-sqlite, passing test.db as the db path; and you see "subprocess started with PID 4795". If I run ps -a here for a second, there's 4795, and you can see the uvx process from the uv tool install running mcp-server-sqlite with the db path. So in this particular case, the stdio transport is effectively running the server as a subprocess on my machine: it runs in the background, and I'm just using standard in and standard out to send commands back and forth. Now, MCP does support HTTP servers as well, in particular using server-sent events; I'll maybe cover that in another video. But effectively you can see what's happening: it's just a process running on my machine, and I communicate with it over standard in and standard out.

The next thing you can see here is that I send this piece of JSON, the initialize command; it's JSON-RPC, so you see jsonrpc "2.0".
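Here's roughly what that initialize request looks like on the wire. The shape comes from the MCP specification; the clientInfo values are illustrative:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "roots": { "listChanged": true },
      "sampling": {}
    },
    "clientInfo": { "name": "mcp-cli", "version": "0.1.0" }
  }
}
```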
I'm sending an id, in this case 1; the method, which is the important part of these messages, is initialize; and then I tell it what I support. I'm not going to go through every detail, but I'm basically saying: here's my protocol version, here are the capabilities my client supports (in this case roots list-changed, sampling, etc.), and here are my client's name and version. That's not important at the moment; maybe in a deeper dive I'll go through it. Once I've sent that, it goes across to the server, and the server responds back to me: you see I get a message back, jsonrpc 2.0 with id 1, so it's acknowledged my message, and it tells me what capabilities it has as well: it's saying experimental, list-changed, etc., all the details of what it supports. That initialization is really the first exchange that happens. Then finally I send a notification message: it's told me its capabilities, I've told it mine, and now I notify it that my client is initialized. If I want to quit the client after that, I don't need to send any other messages; I can just close the connection.

If I come back into the MCP specification for a second, just to explain what we've seen: if I click on the lifecycle part of the specification, it talks about the lifecycle of client-server connections. Initialization is the thing we just did, where we exchanged capabilities back and forth: you see the initialize request, then I got the response, and then I sent a notification back to the server to say I was initialized. That's effectively what we just saw. The next part is the operation phase, where I send it protocol messages, and then if I want to disconnect I can just close the connection and we're done; normal protocol operations are things like pings, etc. To go even further, the spec shows exactly this: jsonrpc 2.0, id 1, method initialize, params with capabilities, sampling, and client info; the server then has to respond with its own capability information, which it just did. As I showed you, this is my initialize message, this is the capability set sent back by the server, and then eventually I send the notification. I'm literally walking through the spec here and how it works, and once that's done I can send my own operations.
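To make the handshake concrete, this is roughly the shape of the server's reply and the client's follow-up notification. The exact capability contents vary by server, so treat the values as illustrative:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "experimental": {},
      "tools": { "listChanged": false }
    },
    "serverInfo": { "name": "sqlite", "version": "0.1.0" }
  }
}

{ "jsonrpc": "2.0", "method": "notifications/initialized" }
```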
Now, if you're thinking that's a little bit complicated, I've done a super-simplified version: I've stripped it down into a file called test.py. You can see I've got my messages (send_initialize, ping, etc.), and I've written this stripped-down main method where I just pass in the config and the server name, which in this case is sqlite. I call load_config; then there's that async with stdio_client, passing in the parameters I've just loaded from the config, giving me the read and write streams; I initialize the server by sending that send_initialize command (we'll go through it in a second); and then we check, and if it works, we're connected. So if I close this down for a second and clear the screen, then do a ps -a, you'll see that 4795 has now disappeared, so we're no longer running that subprocess. Now if I do a uv run, and this time rather than calling main.py I call test.py, you can see the same sort of thing: it sends its initialize, receives the response, sends its initialized notification, and then I just close the stream because I was done; I didn't have anything else to do. If I do a ps -a, you can see it's no longer running. And of course, if I wanted to extend this and talk to the server a little bit more, I could simply await a send_ping there.
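If you want a feel for just how little is needed, here's a minimal, self-contained sketch of that same from-scratch idea: spawn the server as a subprocess and speak newline-delimited JSON-RPC over its stdin and stdout. This is my simplified illustration, not the actual test.py from the repo (no async streams, no retries), but it's the same handshake:

```python
import json
import subprocess

# Spawn the SQLite MCP server as a subprocess, exactly as the stdio
# transport does: JSON-RPC messages travel over stdin/stdout, one per line.
proc = subprocess.Popen(
    ["uvx", "mcp-server-sqlite", "--db-path", "test.db"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def send(msg: dict) -> None:
    proc.stdin.write(json.dumps(msg) + "\n")
    proc.stdin.flush()

def recv() -> dict:
    return json.loads(proc.stdout.readline())

# 1. initialize: exchange capabilities with the server.
send({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {"roots": {"listChanged": True}, "sampling": {}},
        "clientInfo": {"name": "scratch-client", "version": "0.1"},
    },
})
print("server capabilities:", recv())

# 2. Tell the server we're ready; notifications carry no id.
send({"jsonrpc": "2.0", "method": "notifications/initialized"})

# 3. Operation phase: a ping, then close down and we're done.
send({"jsonrpc": "2.0", "id": "ping-1", "method": "ping"})
print("ping response:", recv())

proc.stdin.close()
proc.terminate()
```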
So if I save that and run my test.py one more time, you can see it now does everything it did before, but it also sends a new message, and that new message is the ping: jsonrpc 2.0, id ping-1, method ping. It's a very simple message format. The server receives it, does whatever it's going to do, and comes back with the response to the ping: id ping-1, no method, no params, just a success.

That leads me on to what these messages look like. If I come into my messages folder for a second, you can see I've got this base JSONRPCMessage; it's a Pydantic class I've created, and the key thing is you see jsonrpc "2.0", and it accepts an id, method, params, result, and error. It's a base message type you can just use. The ping message is then super simple, because to send a ping all I do is call send_message, passing the read stream and the write stream, with method "ping" and message id "ping-1", and get the answer back. It's really simple: I'm just passing in a method and a message id and calling send_message. You can imagine what send_message does: it takes all of this and creates that JSON-RPC message we had a second ago, with my message id, my method, and my parameters; there's a little bit of retry logic in there, where it attempts to send the message onto the write stream (it's just doing a stdio write), and if that fails I get a timeout. Once it's done, we're good to go. If I look at send_initialize, it's a bit more complicated, because I've got all of these client capabilities and initialize parameters: I initialize those, set my protocol version, and eventually send that message with those parameters as JSON and the method "initialize"; once I do that send I expect a result, and once I've got that result I eventually send the notification to say it's done. The key thing I want you to notice is that I'm not actually using any of the libraries from Anthropic on the Model Context Protocol website: I've literally implemented this from scratch by hand, just using JSON-RPC, which shows you that it's pretty easy to do yourself.
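As a rough sketch of what a Pydantic base class along those lines can look like (my reconstruction, so the repo's actual class may differ in its details):

```python
from typing import Any, Optional, Union
from pydantic import BaseModel

class JSONRPCMessage(BaseModel):
    """Base shape for every JSON-RPC 2.0 message sent or received."""
    jsonrpc: str = "2.0"
    id: Optional[Union[str, int]] = None     # notifications carry no id
    method: Optional[str] = None             # requests/notifications only
    params: Optional[dict[str, Any]] = None
    result: Optional[dict[str, Any]] = None  # responses only
    error: Optional[dict[str, Any]] = None

# A ping is then just a tiny instance of that base type:
msg = JSONRPCMessage(id="ping-1", method="ping")
print(msg.model_dump_json(exclude_none=True))
# -> {"jsonrpc":"2.0","id":"ping-1","method":"ping"}
```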
Now I'll come back into the CLI for a second: I'll set my logging back to critical, because we don't want to look at the debug output, and rerun my main.py. Back in my CLI, we'll bring up help again, and you can see there are a few things available. "ping" checks if the server is responsive; we've done that one already, and the server is up and running. I can clear the screen. But there are a few other commands here, things like "list prompts": if you've got something like VS Code, or Slack, or similar, and you want to see what prompts a server supports, you can have a look at the prompt list. Now, the SQLite server I downloaded, which is one of the default Anthropic-provided ones, doesn't give you a lot of prompts: it just has a prompt to seed the database with initial data, so it's not much use here; it's very generic. Again, if I do "list resources": resources are meant to be file-system-like things, so they could be files, or content from a content management server. Because this is a SQLite example project, it doesn't do anything particularly useful either; it just comes back with a memo insights resource, but you can imagine files and so on. I'll clear that.

The thing you're probably most interested in is "list tools". You can see from the tools list that these are the same tools we were executing before: things like read_query, and you can see create_table and describe_table, for example. If you remember the demo we did with Claude Desktop, or within the CLI itself, what was actually happening under the hood when I asked "what tables are available in my database" was: a result from list_tables; to get a schema, a result from describe_table; and for "select top 10 products", read_query. There is a one-to-one mapping. So hopefully anybody who's ever built agents before is starting to understand what's actually happening under the hood: effectively, I make a call to get the list of tools the model can interact with, and those tools get placed in the context. The descriptions you're seeing here end up in the LLM's context, and once they're there, the LLM knows what tools it has available to work with. Then when I ask something like "list all the tables in my database", it knows the tools it's got and makes the appropriate tool calls, and the way it does that is via function calling, because under the hood I'm giving it all the information it needs to make a function call. You can see for each tool a name, a description, and an input schema: type object, properties (query, of type string, with a description), and the required fields. It's basically describing everything needed to do a function call.
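Under the hood, "list tools" is just another JSON-RPC exchange, with method tools/list. Trimmed to a single tool, the response from this server looks roughly like this (description paraphrased):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "tools": [
      {
        "name": "read_query",
        "description": "Execute a SELECT query on the SQLite database",
        "inputSchema": {
          "type": "object",
          "properties": {
            "query": { "type": "string", "description": "SELECT SQL query to execute" }
          },
          "required": ["query"]
        }
      }
    ]
  }
}
```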
So how did I actually implement this in chat mode? When I open up chat, what actually happens is I call a handler called handle_chat_mode, and if I go into my chat handler, this is where the magic happens; you're going to see exactly what's going on. The very first thing I do, as you see, is call this fetch_tools method, and you can guess what fetch_tools does: within my tools handler, it calls send_tools_list, so I'm making exactly the same list-tools call we just saw, getting all the tools back, and collecting them up. The key thing I do once I've got the tools list is this thing called generate_system_prompt, where I take the tools that were returned and generate a system prompt from them: I'm putting them into the context. If I go into generate_system_prompt, this one calls a system prompt generator, generates a prompt, and adds it to the system prompt. And if I look at that generator, anyone who's ever worked with Claude and browsed Anthropic's documentation will find this very, very familiar, because the template reads: "In this environment you have access to a set of tools you can use to answer the question", followed by format instructions ("string and scalar parameters should be specified..." and so on), the tool definitions in JSON schema, the user system prompt, and the tool configuration. So I literally generate a system prompt from that, and I also add on some general guidelines: step-by-step reasoning, analyze the task systematically, clear communication, and examples of how tool usage works. That big description is effectively what the LLM is dealing with: by the time I enter chat mode, I've got a very large system prompt that says exactly what tools the model has and how I want it to do tool usage. That's exactly what happens when you're working with agents: you fill the context so the agent knows exactly what tools it has available and how to call them.

Back in my chat handler, having generated the system prompt, the next thing I need to be able to do is function calling itself. So what I do here is convert the tools into OpenAI format: I take the schema that came back from the server and turn it into OpenAI's function-calling format so that function calling will work. Then I initialize my LLM client; I've got a nice little abstraction here (I could have used an existing client library, but I wanted to hook something up really quickly). You can see I've set my provider to OpenAI with a default model of gpt-4o-mini, and inside it's just the standard native OpenAI client and the standard native Ollama client: create_completion calls the OpenAI completion, or, if the provider is Ollama, the Ollama completion, and then it just does a normal chat completion. The key thing is in the OpenAI version: the tools I got back from my tools call to the MCP server are passed in as "tools" on the chat completion. Natively, that is function calling: I'm just exposing the MCP tools as functions, that's it. Then I get the response back, along with any tool calls the model wants to make, and return it.
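Here's a sketch of that conversion and the completion call, assuming the OpenAI Python SDK; the helper name is mine, and the tool list stands in for whatever came back from tools/list:

```python
from openai import OpenAI

def mcp_tools_to_openai(mcp_tools: list[dict]) -> list[dict]:
    """Wrap each MCP tool definition in OpenAI's function-calling format."""
    return [
        {
            "type": "function",
            "function": {
                "name": tool["name"],
                "description": tool.get("description", ""),
                # MCP's inputSchema is already JSON Schema, which is what
                # OpenAI expects for the function parameters.
                "parameters": tool["inputSchema"],
            },
        }
        for tool in mcp_tools
    ]

# Stand-in for the list returned by the tools/list call above.
mcp_tools = [{"name": "list_tables", "description": "List all tables",
              "inputSchema": {"type": "object", "properties": {}}}]

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "what tables are available?"}],
    tools=mcp_tools_to_openai(mcp_tools),
)
# If the model decided to use a tool, the call details show up here:
print(response.choices[0].message.tool_calls)
```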
The Ollama completion is a similar thing: in that client I default to Qwen, but as I said, I pass the model through, in this case Llama 3.2, and that worked. I get the response and then fill in the tool calls; the format I get back from Ollama is slightly different, so I just convert it into a generic representation, and then that's good.

So, coming back into my chat handler: I set up my conversation history. So far I've set up my tool definitions and my conversation history; I've entered chat mode, and I just chat as normal, adding to the conversation history. When I process the conversation, I call the LLM completion and get the response back, but the key thing is this: if a tool call comes back from the LLM, I then handle it by calling handle_tool_call. You can imagine what happens there; it's a little bit involved, but the key steps are that I parse the tool call message and extract the name of the tool (the function name) and the arguments, because the LLM has sorted all of that out: it's telling me "you need to call this function, and these are the arguments you need to pass". Once I've parsed all of that, I eventually have to call the tool, and what am I doing there? Just sending another message, this time via send_call_tool. If I go to the definition of that, it's just like any other JSON-RPC message: I call send_message with my read stream and write stream; the method in this case is "tools/call"; and I pass in the name of the tool and the arguments. So it might be list_tables with no arguments, or read_query with the SQL statement I want to run passed in the arguments; the server performs it, and it's just another JSON-RPC message. Eventually, once I get the response back, I add the tool call and its response into the conversation history, and then I continue on.
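For example, the send_call_tool message for that last read_query would look something like this on the wire:

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "read_query",
    "arguments": { "query": "SELECT * FROM products ORDER BY price DESC LIMIT 10" }
  }
}
```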
To bring it back around, and as I said before: this is my SQLite database, and this is my SQLite server, which was the default quickstart. In fact, that quickstart is available on modelcontextprotocol.io: if you go to the quickstart you'll see this is the database you create, with the exact same products we've entered there (I have a video on how to do this), and mcp-server-sqlite is downloaded from the sample servers available for the Model Context Protocol. So I didn't create mcp-server-sqlite; it was just available. But with that running as a server, and using the code I created by re-implementing the protocol, I'm able to interact with it directly. At the end of the day, all that has happened is this: the MCP server here is that SQLite server, this is the SQLite database, and the transport in this case is standard IO, though of course it could be HTTP with server-sent events. I run that initialize call and get the capabilities; once I've got the capabilities, I operate commands in chat mode; I list the tools and get them back; and then, when the LLM decides it needs to make a tool call, in the same way agents do, i.e. via function calling, it just makes a send_call_tool back to that server. That is it; that is exactly how it works. The diagram shows the same thing: Claude Desktop, MCP server, SQLite database; initialize, available capabilities, query request, perform the SQL query, get the result, return the formatted results. The only difference in this diagram is that I'm not using Claude Desktop, I'm using my mcp-cli, and everything else is the same; and from an LLM point of view, rather than using Sonnet, I'm using my own LLM. Just to bring this home one more time: if I run the CLI against the SQLite server with provider ollama and Llama 3.2, the reason I picked Llama 3.2, as I said earlier, is that it's a model that's good at function calling.