Stop paying for ChatGPT with these two tools | LMStudio x AnythingLLM
279.53k views · 2,155 words
Tim Carambat
In this video, we are installing two user-friendly tools that make downloading, running, and managin...
Video Transcript:
Hey there, my name is Timothy Carambat, founder of Mintplex Labs and creator of AnythingLLM, and today I want to show you possibly the easiest way to get a very capable, locally running, fully RAG-enabled, "talk to anything with any LLM" application running on your laptop or desktop. If you have something with a GPU, this will be a much better experience; if all you have is a CPU, it's still possible. We're going to use two tools, both of which are single-click installable applications: one is LM Studio, and the other is, of course, AnythingLLM Desktop.

Right now I'm on the LM Studio site. They support three different operating systems; we're going to use the Windows build today because that's the machine I have a GPU in. I'll show you how to set it up, how the chat normally works, and then how to connect it to AnythingLLM to really unlock a lot of its capabilities. If you aren't familiar with AnythingLLM, it's an all-in-one chat-with-anything desktop application. It's fully private, it can connect to pretty much anything, and you get a whole lot for free. AnythingLLM is also fully open source, so if you're capable of programming or have an integration you want to add, you can actually do it, and we're happy to accept contributions. So what we're going to do now is switch over to my Windows machine, and I'm going to show you how to use LM Studio with AnythingLLM, walking through both products so you can get honestly the most comprehensive LLM experience and pay nothing for it.

Okay, here we are on my Windows desktop. The first thing we want to do is click "LM Studio for Windows." This is version 0.2.16; whatever version you're on, things may change a little, but in general this tutorial should stay accurate. Next, go to useanything.com, click "Download AnythingLLM for Desktop," and select your operating system. Once you have these two programs installed, you are actually 50% done with the entire process; that's how quick this is.

Let me get LM Studio installed and running and show you what that looks like. Once LM Studio is installed, you click the icon on your desktop and usually get dropped on this screen. I don't work for LM Studio, so I'm just going to show you some of the capabilities that are relevant to this integration and to really unlocking any LLM you use. They land you on an explore page, and this explore page is great: it shows you some of the more popular models that exist. For example, Google's Gemma just dropped and it's already live, which is really awesome. If you click at the bottom, you'll see I've already downloaded some models, because this takes time; downloading models will probably take you the longest out of this entire operation. I went ahead and downloaded Mistral 7B Instruct, the Q4 model, where "Q4" means 4-bit quantized. Honestly, Q4 is about the lowest end you should go; Q5 is really, really great, and Q8 if you want it. If you look up any model in LM Studio, for example Mistral, you'll see a whole bunch of models of different types, all coming from the Hugging Face repository, published by a bunch of different people. You can see how many times each one has been downloaded; this is a very popular model. Once you click on one, you'll get some options. LM Studio will tell you if the model is compatible with your GPU or your system; this is pretty accurate, though I've found it sometimes doesn't quite work. One thing you'll be interested in is full GPU offloading, which is exactly what it sounds like:
using the GPU as much as you can. You'll get way faster tokens, honestly on the speed level of ChatGPT, if you're working with a small enough model or have a big enough graphics card. I have 12 GB of VRAM available. You can see there are all these Q4 models; again, you probably want to stick with at least the Q5 models for the best experience versus size. As you can see, the Q8 is quite hefty at 7.7 GB, and even fast internet won't matter much because downloading from Hugging Face can take a while, so if you want to get working on this today, you might want to start the download now. For the sake of this video, I've already downloaded a model.

Now that we have a model downloaded, let's try chatting with it. LM Studio actually comes with a chat client inside it; it's very simplistic, though, and really just for experimenting with models. Go to the chat bubble icon and you'll see we have a thread already started. I'll pick the one model I have available, and you'll see the loading bar progress. There are system prompts you can preset for the model; I have GPU offloading enabled and set to max, and as you can see, NVIDIA CUDA is already going. There are some other settings you can mess with, but in general that's really all you need to do. Let's test the chat and just say "Hello, how are you?" You get the pretty standard response from any AI model, and you even get some really cool metrics down here, like time to first token: 1.21 seconds. It even shows the GPU layers in use. However, you really can't get much out of this by itself; if you wanted to add a document, you'd have to copy-paste it into the user prompt. There's just a lot more that can be done to leverage the power of this local LLM I have running, even though it's a quite small one.

To really express how powerful these models can be for your own local use, we're going to use AnythingLLM. I've already downloaded it, so let me show you how to get it running and how to get LM Studio to work with it. I just booted up AnythingLLM after installing it, and you'll usually land on a screen like this. Let's get started; we already know who we're looking for here: LM Studio. You'll see it asks for two pieces of information: a token context window, which is a property of your model that you should already be familiar with, and the LM Studio base URL. If we open LM Studio and go to the local server tab on the side, this is a really cool part of LM Studio. Note that it doesn't support serving multiple models at once, so once you have a model selected, that's the model you'll be using. Here we'll select the exact same model, but this time we'll start a server to run completions against it. The way we do that: configure the server port, which is usually 1234, but you can change it to whatever you want. You probably want to turn off CORS, turn on "allow request queuing" so you can keep sending requests over and over and they don't just fail, and enable logging and prompt formatting; these are all just debugging tools. On the right side, you still want to make sure GPU offloading is enabled, if that's appropriate for your machine. Other than that, just click "Start Server" and you'll see some logs appear here. Now, to connect the LM Studio inference server to AnythingLLM, you just want to copy this string
right here, up to the "/v1" part. Then open AnythingLLM and paste it in. I know my model's max token window is 4096, so I'll enter that and click Next. For embedding preference, we don't really even need one; we can just use the AnythingLLM built-in embedder, which is free and private. Same for the vector database: all of this is going to be running on machines that I own. Then we can skip the survey and make our first workspace; we'll just call it "anything llm." We don't have any documents yet, so if we send a chat asking the model about AnythingLLM, we'll either get a refusal response or it will just make something up. So let's ask: "What is AnythingLLM?" If you watch LM Studio during any part of this, you can actually see that we sent the request to the model and it is now streaming the response: the first token has been generated, and it continues to stream. When AnythingLLM receives that first token of the stream is when we start showing it on our side. You can see we get a response almost instantly, which was very quick, but it is totally wrong, and it is wrong because we don't have any context to give the model on what AnythingLLM actually is.

Now, we can augment the LLM's knowledge of our private documents by clicking and adding them here, or I can just go and scrape a website. I'm going to scrape the useanything.com homepage, since that should give us enough information, and you'll see that we've scraped the page. So now it's time to embed it; we'll just run that embedding, and now our LLM should be smarter. Let's ask the same question again, this time knowing it has information that could be useful. Now you can see we've been given a response that says AnythingLLM is an AI business intelligence tool that forms human-like text responses based on prompts, offering LLM support as well as a variety of enterprise models. This is definitely much more accurate, but we also tell you where this information came from, and you can see that it cited the useanything.com page we scraped.
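The Q4/Q5/Q8 trade-off discussed above comes down to bits per weight. As a rough sanity check on the sizes you'll see in LM Studio, here is a back-of-envelope sketch; the per-weight bit costs below are approximations I'm assuming (real GGUF files keep some tensors at higher precision and carry per-block scale overhead), not exact figures for any specific quant format:

```python
# Rough download-size estimate for a quantized model.
# bits_per_weight is slightly above the nominal quant level (4, 5, 8)
# to account for per-block scaling metadata -- an assumed fudge factor.
def approx_size_gb(params_billion: float, bits_per_weight: float) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal gigabytes

for name, bits in [("Q4", 4.5), ("Q5", 5.5), ("Q8", 8.5)]:
    print(f"{name}: ~{approx_size_gb(7, bits):.1f} GB for a 7B model")
```

For a 7B model this lands in the same ballpark as the files shown in the video (the Q8 estimate comes out near the 7.7 GB mentioned), which is why Q5 is a reasonable sweet spot between quality and download time.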
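AnythingLLM is one client of the LM Studio server, but anything that speaks the OpenAI chat-completions format can talk to it. As a minimal sketch, assuming the default port 1234 from the video (the `model` field value is a placeholder, since the server uses whichever model you loaded):

```python
import json
import urllib.request

# Base URL of the LM Studio local server -- the same string, up to /v1,
# that gets pasted into AnythingLLM. Adjust the port if you changed it.
BASE_URL = "http://localhost:1234/v1"

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for the local server."""
    payload = {
        "model": "local-model",  # placeholder; the loaded model is used
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# To actually send it (requires the LM Studio server to be running):
#   with urllib.request.urlopen(build_request("Hello, how are you?")) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

This is the same request/stream cycle you can watch in the LM Studio server logs when AnythingLLM sends a chat.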