FREE Local LLMs on Apple Silicon | FAST!

218.16k views

Alex Ziskind

Step by step setup guide for a totally local LLM with a ChatGPT-like UI, backend and frontend, and a...

Video Transcript:

We're now at a point where we can run a ChatGPT-like thing on our MacBook using the Apple silicon GPU, which makes everything run faster, and we don't have to configure that or go through all the setup that we used to go through. It's much, much easier now, and you can even have your own GPTs, kind of like the GPTs that ChatGPT has, on your own machine. And it looks way better than, uh, at least the mess that I made last week. "Hey, that was a beautiful prompt." Okay, if you missed that video, I'll link to it down below. It didn't look very good. Today it's going to look good, we're going to get a ton of functionality, and it's going to be fast. Let's do it.

Now, there's a couple of ways of going about this. I want to show you the details that go into it because, you know, we're software developers, we want to know what's going on behind the scenes. Plus, this has the added advantage of showing you the code, so you can actually take a look under the hood, because I think it's really cool. This is what we're using right here: it's called Open WebUI. We're going to set up Ollama, and then we're going to set up this on top of it.

The first thing you're going to do is clone this repository. Now look, I'm going to show you the step by step. I'm assuming that you already know how to clone a Git repository, but I will not assume that you have a local Node environment or a Python environment set up. For those, I'll link videos down below in case you need a refresher. I'm going to clone this repository. Let's go: git clone. Boom, and here it is. We're going to go into that directory, and let's have a peek at the code. This has everything. This has the front end written in JavaScript; it's actually written in, uh, Svelte. Here, go to package.json. So this will run on Node, and this is the front end. We're using Vite and we're using SvelteKit, and this is like all the modern stuff, the modern tech stack. The back end is a Python back end, so that's going to be in the backend folder. You can take a look at the Python side of things. And this also has Docker configurations, so yes, you can run this through Docker, but first I'm going to show you how to do this the regular way, without Docker.

We're going to kick things off with Ollama. Go to ollama.com and download it. But you're going to be like, "What the heck is Ollama, and why should I download it, and why should I run things you tell me? This is the internet, you might be doing bad things to me." So Ollama is like an agent that runs on your machine and automatically manages downloading LLM models. You can go to Models here, and these are the models that are available. These are open-source models: you've got Llama 3, you've got Gemma, you've got Mixtral, CodeGemma, all these models that are available. And this is an open-source project, so you can go to the GitHub page and check out this code too. But we don't need to do that. All we need to do is download this tool. So we're going to go to the homepage and click on Download, macOS. This is available for Linux or Windows, but you're going to get the macOS version because this is a macOS tutorial. I'm doing this on macOS, Apple silicon. Let's go. Boom.

In your Downloads folder you're going to get this file. Double-click on it and it's going to extract Ollama, which you can then drag to your Applications folder. Boom. In Applications, find Ollama and run it. "Are you sure?" Yes, we're sure. This is going to put a cute little llama in your menu bar, and it's running. You can tell that it's running by going to localhost; 11434 is the port. Boom, and it says "Ollama is running." So how do you use this thing? Well, if you go back to the command line now, you can say ollama --version. It says the Ollama version is that version right there, whatever version you might have when you're watching this. Now you can use the
Ollama CLI to fetch models, and you can get the model names on the website under Models. Llama 3, for example. Let's do that. So I'm going to say ollama pull llama3. Boom. It goes out, it gets the file. It's 4.7 GB. I've already downloaded it before, so it took me absolutely no time at all, but it might take you a few minutes. If I want another one, like Llama 2, boom, I also had that one. Let's get one that I didn't have: Phi-3. It doesn't matter to you, I'm going to speed up that video, I'm not going to make you watch it, but I just want to see how long it takes. This one is small, it's 2.3 GB, and it says it's going to take like 25 seconds to download.

What's up with all these different models? Well, they have different capabilities; they're trained differently. Llama is from Facebook (Meta), Phi-3 is from Microsoft, Gemma is from Google, Mixtral is from Mistral. All of these companies spend millions of dollars training these models so that you can get them for free and run them locally on your own machine. And you're going to have to play around to see what gives you the best results for your use cases.

All right, I've got three of these models. What do we do next with this? Well, we can run it: ollama run, uh, let's go with phi3. What this will do is create a prompt so you can interact with a model right there. "Hi." Boom. Look how fast that is. That's insane. "How can I assist you today?"

But I want you to see something here. I'm going to open up my Activity Monitor. I have 64 gigs of RAM on this thing, so whatever models I run have to fit within that 64 GB. Actually, it's a little less than that; it's about 75% of that. Earlier I made a video as to why it's less, but we won't get into that here. I want to open up the GPU History here and have that little window open so we can take a look at what's happening, because this is not running on the CPU. It's running on the GPU, completely transparent to us. "Write me a 1,000-word essay on JavaScript." Boom, and there it goes. It starts writing that essay. It's really fast, it's crazy fast. It's probably going to be done before we see anything here. Oh, we do see stuff. It's done with the writing, but at least it gave us a little bit of a blue mark right there. You can see that the GPU
usage (this is the Apple silicon GPU, by the way) was almost fully utilized for that moment it was generating that text. And we didn't have to configure anything like we did previously. To exit this, I'm going to say /bye, because now we need a front end, a pretty UI that looks like ChatGPT.

Let's go back to our Open WebUI project, and I'm going to pop open the terminal here in Visual Studio Code. Control-backtick will do that for me. I have Conda installed. If you don't know what I'm talking about, I'll link a video. Conda lets you set up Python environments where you can run Python code in projects. So that's what I have here. It says "base" because that's the base; I'm not in any active Conda environment. But I will create a new one: conda create --name, and I'm going to call this one open-webui, and I want to use python=3.11. If you follow my steps right there, you'll be fine, but if you want to know more details, you can watch that other video. And now we're going to just use this command to activate that environment: conda activate open-webui. So copy that line, paste it, and now instead of base we have open-webui here. So now we're inside that Python environment that has Python version 3.11. If we say python --version: 3.11, there we go.

Now we can go to the backend folder and mistype all kinds of stuff before we get to the point. So I'm going to do pip install -r requirements.txt. This has a bunch of requirements that we need to install. I'm going to give it the -U flag for upgrade. And if we take a look at what that is, in the backend folder we have the requirements.txt file, so it has all these requirements to run the back end. Wow, youtube-transcript-api. I wonder if it can do it. It probably can. I'm not going to... that's for another time. I'm getting distracted here. pip install -r requirements.txt -U. This actually takes a couple of minutes to do, because there are so many requirements and they all have to be installed within this environment. What's nice about this environment setup is I don't have to mess up my Python installation with all these requirements that I might need only for this thing. That's why I like to use Conda.

All right, we also have to set up the front-end environment, which is a Node application. So while that's happening, I'm going to leave that alone. Oh, it's done. Okay, great. But if it was still working, I could go up here and say I want another terminal. I'm going to use Node and npm, and for Node I also have an environment manager; I use nvm for that. Let's take a look at what version of Node I'm working with here. I'm working with 18. I'm fine with that. You can use a more modern version of Node; this is modern enough for this purpose. For Node version management I linked another video, you can check that out later, or you can just follow my steps here. Just make sure you have Node installed. Okay, just don't install it wrong. Please don't install Node globally. You can if you want, but just don't. Let's move on.

npm i. What does npm i do? For those of you that don't know, it's going to look at package.json and install all the dependencies that are here. Basically, npm i is short for npm install. Once that's done, we're ready to build this: npm run build, which is going to run this script right here. Cool. We've got our front end, we've got our back end. Let's go back to the back-end terminal where we just installed all the Python requirements, and we're going to run bash start.sh. That's in the backend folder. Boom.

When you start it up and go to localhost, port 8080, it takes you to auth automatically, because this has authentication built in, with a database. Really cool for you to look at that code, by the way, check it out. And, uh, you do need to sign up. It's not going to send your credentials anywhere; it's all stored on your machine. This is just for fun. Alex. Let's go with my email. This can be a fake email, by the way, which is what I'm going to do right now. Create Account. Boom, and now I'm signed in. And look at this beautiful interface. Um, I can start chatting, but I can't really until I select a model. So if we go up here to select the model, you'll see all the models that we have installed. You can switch between them, and you can even combine models; I'll show you that in a moment. So we've downloaded Llama 2, you saw me do that, Llama 3, and Phi-3. These two, I'll talk about in a second, where they came from. I can say, let's go with Phi-3, and maybe I want to mix in Llama 3. Let's do those two, and I'll say "hi" to both of them. So it chose Llama 3 for some reason. It says, "Hello, it's nice to meet you. Is there something I can help you with?" This is basically going to Ollama and then returning the result to the back end and then the front end, and that's how you interact with this.

In the settings you have the ability to select a theme, system prompt, advanced parameters. You can play around with that. Here's where you can manage models, and you can pull models directly from this user interface. So if you see another model you like, like Mistral for example, you can type that in here and pull it down this way. Delete models that exist. All sorts of add-ons for the UI. Image generation, by the way: I don't have a video for this yet, but if you want to see an image generation video with AUTOMATIC1111 and how that integrates, let me know in the comments down below. I may just do that here, or maybe as a members-only video. If you're a member, thank you so much for being a member. If you want to
join the channel and support the channel, I make special videos just for members as well as these videos. But don't worry, I'll also make more tutorials and things like that for the main channel as well. If you subscribe, that's completely free; that's your cue to subscribe. And if you like this video, give me a thumbs up so I know.

All right, what do we got here? Chats, imports, exports. Account, this is your user management. If you click on the user, you can sign in, sign out, archive chats. Playground: the playground is a little bit different than chat. You can actually add your system prompt. You might not know this, but when you're using ChatGPT, there is a system prompt that gets sent along with your prompt, in addition to the context.

Now here's a really cool thing: Prompts. This allows you to basically store your own prompts. If you like some prompts you've created, and they work well for you, you can store them in here. You can import them, you can export them, and you can use community prompts. So there's a whole community that shares their prompts that you can access through this UI. It's not working right now; maybe when you try it, it'll work. But what I did try was this right here. If you go to Modelfiles, you have "Made by Open WebUI Community." This is basically like GPTs in ChatGPT, where people put together their own little models. But it's more than just GPTs. With GPTs, basically, you provide a context, you provide some sample prompting. But here people can actually add their own models to it. So for example, Code Companion. This gives you the, uh, Modelfile content, so it gives you a system prompt, it gives you some parameters to start with. When you install this, it loads it up, icon and everything. See this model tag name? This will actually grab the model that's associated, that's been fine-tuned (I'm guessing it's been fine-tuned; if you know any better, let me know in the comments) by this person that created this thing. So if you go to Save & Create, it's going to pull the
manifest and download any associated models, which may be large. See that pull progress right there? It's going to take a little bit of time because this thing is pulling down two pretty large files. I think these are 26-billion-parameter models that it's pulling down, related to code generation specifically. So I'm going to have some coffee. I don't know about you, but that's what I'm going to do. I'm out of coffee. I'm going to have to sit here and wait. We're almost there, folks, we're almost there.

We've got Code Companion. So now when I go to New Chat, I can select that Code Companion model. "Write me some code." I know, super descriptive, but you're probably sick of seeing all the examples. Let's see what it comes up with when just prompted like that. Oh, it chose Go. And there we go. Get it? I'm sure it can write some code, beautiful code, but this is not a video about that.

We're all set up, folks, but there's one more thing, and I told you about Docker. Now, I do have Docker installed. You can go to Docker Desktop: Products, Docker Desktop, Download for Mac. Make sure it's the Mac Apple chip version, if that's in fact what you have. Run Docker Desktop. And again, I have entire development setup videos; I'll link to one of those down below, how I set up my development environment on Mac, and that includes Docker. Once Docker is up and running, it's really easy-peasy. All you've got to do is go right here, openweb ui.
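
The remark earlier in the walkthrough that the web UI is "basically going to Ollama and then returning the result" can be tried directly from Python. What follows is a minimal sketch, not Open WebUI's actual code. It assumes Ollama is listening on its default port 11434 (the same address that shows "Ollama is running" in the browser) and uses Ollama's REST endpoints /api/tags and /api/generate. The function names (is_running, list_models, build_generate_request, generate) are made up for illustration, and phi3 is simply the model pulled in the video.

```python
"""Sketch: talk to a local Ollama server using only the standard library."""
import json
import urllib.request

# Assumption: default Ollama address, as seen in the browser check above.
OLLAMA_URL = "http://localhost:11434"


def is_running(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if something answers with HTTP 200 at the server root."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers connection refused, timeouts, and HTTP errors
        return False


def list_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Names of models that have already been pulled (GET /api/tags)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def build_generate_request(
    model: str, prompt: str, base_url: str = OLLAMA_URL
) -> urllib.request.Request:
    """Build the POST /api/generate request without sending it."""
    # "stream": False asks for one JSON object instead of a token stream.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL) -> str:
    """Send one prompt and return the generated text."""
    req = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    if not is_running():
        print("Ollama does not seem to be running on", OLLAMA_URL)
    else:
        print("Installed models:", list_models())
        print(generate("phi3", "hi"))
```

With Ollama running and phi3 pulled, running the script should print the installed model names followed by the model's reply to "hi". Turning streaming off keeps the sketch short; a chat UI would more likely stream tokens as they arrive.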