Run ANY Open-Source Model LOCALLY (LM Studio Tutorial)

148.79k views2719 WordsCopy TextShare

Matthew Berman

Get UPDF Pro with an Exclusive 63% Discount Now: https://bit.ly/46bDM38 Use the #UPDF to make your s...

Video Transcript:

this is the easiest way to get open-source large language models running on your local computer it doesn't matter if you've never experimented with AI before you can get this working the software is called LM Studio something that I've used in previous videos and today I'm going to show you how to use it let's go this is the LM Studio website the LM Studio software is available on all platforms Apple windows and Linux now today I'm going to show you how to get it running on a Mac but I've gotten this working on Windows as well

and it is dead simple so really you just download the software and install it there's nothing to it and once you do that this is the actual LM studio so first let's explore the homepage here you're going to get a nice little search box where you can search for different models that you want to try out basically anything available on hugging face is going to be available in LM studio if you scroll down a little bit you get the new and noteworthy model so obviously here's Zephyr 7B beta here's mistal 7B instruct code llama open

Orca these are the top models for various reasons and not only that it tells you a bunch about every single model it pulls in all the information from the model card so it's easily readable from here thank you to the sponsor of this video updf updf is an awesome free alternative to Adobe Acrobat but let me just show it to you so after a few clicks I got it downloaded and installed I loaded up an important PDF and you can do a lot of awesome things with it so let's start with OCR so I clicked

this little button in the top right I select searchable PDF and then perform OCR so that allow me to search through it and do other things with it now that it's a text document very easy and there we go after a few seconds I have the OCR version right here now I can highlight all the text easily switching back to the PDF we can do a bunch of cool stuff so we can easily highlight we can add notes I can easily protect it using a password by clicking this button right here I can easily add

stamps so I could say confidential right there and you can easily edit PDFs check this out and best of all it has a really cool AI feature where you can actually ask questions to this document so it's basically chat with your doc all you have to do is click this little updf AI in the bottom right it loads up the document I click get started and it's going to give me a summary first and then I can ask it any question I want all right so let's ask it something who are the authors of this

paper so be sure to check out updf and they're giving a special offer to my viewers 61% off their premium version which gives you a lot of other features link and code will be down in the description below thank you to updf so let's try it out if I just search for mistol and hit enter I go to the search page and we have every model that has mistel in the keywords and just like hugging face you get the author and then you get the model card information and you get everything else involved too so

you can really think of this as a beautiful interface on top of hugging face so here's the BLS version from 4 days ago let's take a look at that so if I click on it here I can see the date that it was uploaded loaded again 4 days ago I can see it was authored by the bloke and then I have the model name dolphin 2.2.1 Ash Lima RP mistal 7B ggf lot of information in that title on the right side we can see all the different quantized versions of the model so everything from the

smallest Q2 version all the way up to the Q8 version which is the largest now if you're thinking about which model to choose and even within a model which quantized version to use you want to fit the biggest version that can actually work on your machine and it's usually a function of Ram or video RAM so if you're on a Mac it's usually just Ram but if you have a video card on a PC you're going to look at your video RAM from your video card so I'm on a Mac today so let's take a

look and one incredible thing that LM Studio does for you out of the box is that it actually looks at your specs of your computer and right here it has this green check and should work which means the model that I have selected right now should work on my computer given my specs so you no longer have to think about well how much RAM do I have how much video RAM do I have what's the model size which quantization method should I use it'll just tell you it should work now here's another example I just

searched for llama this is the Samantha 1.1 version of llama and it is a 33 billion parameter version and right here it says requires 30 plus GB of RAM now my machine has 32 GB so it should be enough and it's not saying it won't work but it's giving me a little warning that says hey it might not work and back to the search page for mistol let's look at a few other things that we're going to find in here so it tells us the number of results it tells us it's from hugging face Hub

we can sort by the most recent we can sort by the most likes we can sort by the most downloads usually likes and downloads are pretty in line with each other I usually like to sort by most recent because I like to play around with whatever the most recent models are and you can also switch this to least so you click on that and you can find least recent but I don't know why you would want to do that then we also filter by a compatibility guess so it won't even show me models that it

doesn't think I can run and if I click that again now it's showing all models so I like to leave that on filtered by compatibility best guess now again within the list of quantized versions of a specific model we can actually see the specific Quant levels here so this is q2k q2k and so on all the way up to Q8 and the largest one down here is going to be also the largest file size if we hover over this little information icon right here we get a little description of what each of the quantization methods

give us so here Q2 lowest Fidelity extreme loss of quality uses not recommended and up here we can see what the recommended version is which is the Q5 km or KS and it says recommended right there so these are just a little bit of a loss of quality Q5 is usually what I go with here it gives us some tags about the base model the parameter size and the format of the model we can click here to go to the model card if we want but then then we just download so we download it right

here so I'm going to download one of the smaller ones let's give it a try we just click and then you can see on the bottom this blue stripe lit up and if we click it we can actually see the download progress and it really is that easy and you can see right here I've already downloaded the find code llama 34b model and I'm actually going to be doing a video about that and also another coding model called Deep seek coder and what makes LM studio so awesome is that it is just so so easy

to use and the interface is gorgeous it's just super clear how to use this for anybody and it makes it really easy to manage the models manage the different flavors of the models it's a really nice platform to use all right while that's downloading I'm going to load up another model and show it to you so in this tab right here this little chat bubble tab this is essentially a full interface for chatting with a model so up at the top here if we click it you find all the models that you've downloaded and I've

gone ahead and selected this mistal model which is relatively small 3.82 gab so I select that and it loads it up and then I'm really done it's ready to go I'm going to talk about all the settings on the right side though and over here on the right side the first thing we're going to see is the preset which basically sets up all the different parameters pre-done for whatever model you're selecting so for us for this mistal model of course I'm going to select the mistal instruct preset and that's going to set everything here's the

model configuration and you can save a preset and you can also export it and then right here we have a bunch of different model parameters so we have the output Randomness and again what I really like about LM studio is that it can be used even if you're not familiar with all of this terminology so typically you see Temp and end predict and repeat penalty but a lot of people don't know what that stuff actually means so it just tells you output Randomness words to generate repeat penalty and if you hover over it it gives

you even more information about it so here output Randomness also known as Temp and it says provides a balance between Randomness and determin minism at the extreme a temperature of zero will always pick the most likely next token leading to identical outputs each run but again as soon as you select the preset it'll set all of these values for you so you can play around with it as you want here's the actual prompt format so we have the system message user message and the assistant message and you can edit all of that right here here

you can customize your system prompt or a pre- prompt so if you want to do role playing this would be a great place to do it so you could say you are Mario from Super Mario Brothers respond as Mario and then here we have model initialization and this gets into more complex settings some things are keep the entire model in Ram and a lot of these settings you'll probably never have to touch and here we go we have Hardware settings too so I actually do have apple metal I'm going to turn that on and I'll

click reload and apply and there we go next we have the context overflow policy and that means when the response is going to be too long for the context window what does it do so the first option is just stop the second option is keep the system prompt and the first user message truncate the middle and then we also have maintain a rolling window and truncate past messages so so I'll just keep it at stop at limit for now and then we have the chat appearance if we want plain text or markdown I do want

markdown and then at the bottom we have notes and now that we got all those settings ironed out let's give it a try all right and I said tell me a joke Mario knock knock who's there jokes jokes who just kidding I'm not really good at jokes but here's one for you why did the Scarecrow win an award because he was outstanding in his field and so you can export it as a screenshot you can regenerate it or you can just continue and continue is good if you're getting a long response and it gets cut

off over on the left side we have all of our chat history so if you've used chat GPT at all this should feel very familiar if you want to do a new chat you just click right here if you want to continue on the existing chat you just keep typing so for example I can just say tell me another one and it should know that I'm talking about a joke because it's continuing from the history that I previously had in here so why did this made turn red because it saw the salad dressing great now

if I wanted to say new chat and I said tell me another one it wouldn't know what I'm talking about there we go and it's just typing out random stuff now so I'm going to click stop generating and then if we look at the bottom we have all the information about the previous inference that just ran so time to First token generation time tokens per second the reason stopped GPU layers etc etc so it really gives you everything but it keeps it super simple the next thing I want to show you is for developers so

if you want to build an AI application using LM Studio to power the large language model you click this little Double Arrow icon right here which is local server so I click that and all you have to do is click Start server you set the port that you want you can set whether you want cores on and you have a bunch of other settings that you can play with so once I click Start server now I can actually hit the server just like I would open Ai and this is a dropin replacement for open AI

so it says right here start a local HTTP server that behaves like open ai's API so this is such an easy easy way to use use large language models in your application that you're building and it also gives you an example client request right here and so this is curl so we curl to the Local Host endpoint chat completions and we provide everything we need the messages the temperature the max tokens stream and then it also gives us a python example right here so if we wanted to use this python example we could do that

and what's awesome is you can just import the open AI python library and use that but instead replace the base with your local host and it will operate just the same so you get all the benefits of using the open AI library but you can use an open source model and of course on the right side you get all the same settings as before so you can adjust all the different settings for the model and then the last tab over here looks like a little folder so we click it it's the my models tab which

allows you to manage all the different models that you have on your computer so right now it says I have two models taking up 27 GB of space I don't want this fine model anymore it's taking up too much space so let's go ahead and delete it so I just click delete and it's gone just like that it is so easy to manage all of this and I think I covered everything for LM studio if you want to see me cover any other topic related to LM Studio let me know in the comments below if

you liked this video please consider giving a like And subscribe and I'll see you in the next one