host ALL your AI locally

NetworkChuck
Ready to get a job in IT? Start studying RIGHT NOW with ITPro: https://go.acilearning.com/networkch...
Video Transcript:
I built an AI server for my daughters. Well, first it was more for me. I wanted to run all of my AI locally.
And I'm not just talking command line with Ollama. No, no, no. We have a GUI, a beautiful chat interface, and this thing's feature filled.
It's got RBAC, chat histories, multiple models, we can even add Stable Diffusion. And I was able to add this to my notes application, Obsidian, and have my chat interface right there. I'm going to show you how to do this.
Now you don't need something crazy like Terry, that's what I named my AI server. It can be something as simple as this, this laptop. I'll actually demo the entire setup on this laptop. So likely the computer you're using right now, the one you're watching this video on, will probably work.
And seriously, you're going to love this. It's customizable, it's wicked fast, way faster than anything else I've used. Isn't that amazing?
And again, it's local, it's private. I control it, which is important because I'm giving it to my daughters. I want them to be able to use AI to help with school, but I don't want them to cheat or do anything else weird.
But because I have control, I can put in special model files that restrict what they can do, what they can ask, and I'll show you how to do that. So here we go. Get your coffee ready.
We're about to dive in, but first let me have you meet Terry. Now Terry has a lot of muscle. So for the case, I needed something big.
I got the Lian Li O11 Dynamic EVO XL. It's a full tower E-ATX case, perfect to hold my ASUS ProArt X670E-Creator motherboard. This thing's also a beast.
I'll put it in the description so you can look at it. Now, I also gave Terry a big brain. He's got the AMD Ryzen 9 7950X.
That's 4.2 gigahertz and 16 cores. For memory, I went a little crazy.
I've got 128 gigabytes of the G.Skill Trident Z5 Neo, it's DDR5-6000 and way overkill for what I'm doing. I think I got a Lian Li water cooler for the CPU. I'm not sure if I'm saying Lian Li right.
I don't know. Correct me in the comments. You always do.
And then for the stuff AI loves, I got two 4090s. It's the MSI Suprim, and they're liquid cooled so they could fit on my motherboard. 24 gigabytes of memory each, giving me plenty of muscle for my AI models. For storage, we got two Samsung 990 Pros, two terabytes, which you can't see because they're behind stuff. And also a Corsair AX1600i power supply, 1600 watts to power the entire build.
Terry is ready. Now, I'm surprised to say my system actually posted on the first attempt, which is amazing. But what's not amazing is the fact that Ubuntu would not install.
I tried for hours, actually for a whole day, and I almost gave up and installed Windows, but I said, no, Chuck, you're installing Linux. So I tried something new, something I've never messed with before. It's called Pop!_OS by System76.
This thing is awesome. It worked the first time. It even had a special image with Nvidia drivers built in.
It just stinking worked. So I sipped some coffee, didn't question the magic and moved on. Now if you do want to build something similar, I've got all the links below.
But anyways, let's talk about how to build your very own local AI server. First, what do you need? Really all you'll need is a computer.
That's it. It can be any computer running Windows, Mac or Linux. And if you have a GPU, you'll have a much better time.
Now, again, I have to emphasize this, you won't need something as beefy as Terry, but the more powerful your computer is, the better time you'll have. Don't come at me with a Chromebook please. Now step one, Ollama.
This is the foundation for all of our AI stuff and what we'll use to run AI models. So we'll head on over to ollama.ai and click on download, and they've got a flavor for every OS.
I love that. Now if you're on Mac, just download it right now and run it. If you're on Windows, they do have a preview version, but I don't want you to do that.
Instead, I want you to try the Linux version. We can install it with one command. And yes, you can run Linux on Windows with WSL.
Let's get that going real quick. First thing I'll do is go to the start bar and search for terminal and launch my terminal. Now this first bit is for Windows folks only. Linux people, just hang on for a moment. We got to get WSL installed, or the Windows Subsystem for Linux.
It's only one command: wsl --install, and that's it. Actually hit enter and that's going to start doing some stuff. When it's done, we'll set up a username and password.
I got a new keyboard by the way. Do you hear that? Link below. It's my favorite keyboard in the entire world.
Now some of you may have to reboot. That's fine. Just pause the video and come back.
Mine is ready to go though. And we're rocking Ubuntu 22.04, which is still amazing to me, that we're running Linux on Windows.
That's just magic, right? Now we're about to install Ollama, but before we do that, you got to do some best practice stuff like updating our packages. So we'll do a sudo apt update and then we'll do a sudo apt upgrade -y to apply all those updates. And actually while it's updating, can I tell you something about our sponsor, ITPro by ACI Learning?
Now in this video, we're going to be doing lots of heavy Linux things. I'm going to walk you through it. I'm going to hold your hand and you may not really understand what's happening.
That's where ITPro comes in. If you want to learn Linux or really anything in IT, they are your go-to. That's what I use to learn new stuff. So if you want to learn Linux to get better at this stuff or you want to start making this whole hobby thing your career, actually learn some skills, get some certifications, get your A+, get your CCNA, get your AWS certifications, your Azure certifications and go down this crazy IT path, which is incredible.
It's the whole reason I make this channel and make these videos. Check out ITPro. They've got IT training that won't put you to sleep. They have labs, they have practice exams, and if you use my code networkchuck right now, you'll get 30% off forever.
So go learn some Linux and thank you to ITPro for sponsoring this video and making things like this possible. And speaking of, my updates are done. And by the way, I will have a guide for this entire thing.
Every step, all the commands, you can find it at the free NetworkChuck Academy membership. Click the link below to join and get some other cool stuff as well. I can't wait to see you there.
Now we can install Ollama with one command. And again, all commands are below. I'm going to paste this in, a nice little curl command, little magic stuff, and I love how easy this is.
Watch, you just sit there and let it happen. Do you not feel like a wizard when you're installing stuff like this, and the fact that you're installing AI right now? Come on.
I noticed one thing real quick. Ollama did automatically find out that I have an Nvidia GPU and it's like, awesome, you're going to have a great time. If it didn't see that and you do have a GPU, you may have to install some Nvidia CUDA drivers.
I'll put a link for that below, but not everyone will have to do that. And if you're rocking a Mac with an M1 through M3 chip, you're going to have a good time too. They'll use the embedded GPU. Now at this point, our Mac users, our Linux users and our Windows users are all converged.
We're on the same path. Welcome. We can hold hands and sing.
That's getting weird. Anyways, first we have to test a few things to make sure Ollama is working. And for that we're going to open our web browser.
I know it's kind of weird, just stick with me. I'm going to launch Chrome here, and here in my address bar I want to type in localhost, which is looking right here at my computer.
And port 11434, hit enter. And if you see this right here, this message, you're good to go. And you're about to find this out: port 11434 is what Ollama's API service is running on, and it's how our other stuff is going to interact with it.
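The same browser check can be scripted. This is a minimal sketch, assuming Ollama is listening on localhost port 11434 as described above and that its root page answers with the plain-text message "Ollama is running"; the function names are made up for illustration.

```python
import urllib.error
import urllib.request

# Assumption: default local address and port mentioned in the video.
OLLAMA_URL = "http://localhost:11434"


def looks_like_ollama(body: str) -> bool:
    """Return True if the page body contains Ollama's status message."""
    return "Ollama is running" in body


def ollama_is_up(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Fetch the root URL; any connection problem counts as 'not running'."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return looks_like_ollama(resp.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_up())
```

If this prints False while Ollama is installed, the service may simply not be started yet.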
It's so powerful. Just check this out. I'm so excited to show you this.
Now before we move on, let's go ahead and add an AI model to Ollama. And we can do that right now with ollama pull, and we'll pull down llama2, a very popular one. Hit enter and it's ready.
Now let's test it out real quick. We'll do ollama run llama2. And if this is your first time doing this, this is kind of magic.
We're about to interact with a ChatGPT-like AI right here, no internet required. It's all just happening in that five gigabyte file. Tell me about the solar eclipse.
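The chat prompt typed at the command line can also go through the API on port 11434. Here is a rough sketch, assuming the llama2 model has already been pulled and that the server exposes the /api/generate endpoint with a JSON body of model, prompt and stream; the helper names are illustrative only.

```python
import json
import urllib.request


def build_generate_request(prompt, model="llama2",
                           base_url="http://localhost:11434"):
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the model's text answer."""
    req = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (needs a running Ollama with llama2 pulled):
#   print(ask("Tell me about the solar eclipse."))
```

Setting stream to False asks for one complete JSON reply instead of a token-by-token stream, which keeps the sketch short.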
Boom. And you can actually Ctrl+C that to stop it. Now I want to show you this.
I'm going to open up a new window. This is actually an awesome command, and with this wsl command, I'm just connecting to the same instance again, in a new window.
I'm going to type in watch -n 0.5, not four, five, nvidia-smi. This is going to watch the performance of my GPU right here in the terminal and keep refreshing it.
So keep an eye on this right here as I chat with Llama 2. Give me a list of all Adam Sandler movies. And look at that GPU go. Ah, it's so fun.
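For anyone who would rather log GPU load than watch it, nvidia-smi can print machine-readable numbers. A small sketch, assuming an Nvidia driver is installed and that every queried field comes back as a plain number; the dictionary keys are names chosen for this example.

```python
import subprocess

# nvidia-smi query mode: one CSV row per GPU, no header, no unit suffixes.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV rows into a list of dicts (one per GPU)."""
    rows = []
    for line in text.strip().splitlines():
        index, util, used, total = [field.strip() for field in line.split(",")]
        rows.append({
            "gpu": int(index),
            "util_pct": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return rows


def read_gpus():
    """Run nvidia-smi once and return the parsed rows."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(out.stdout)
```

Calling read_gpus() in a loop with a short sleep gives roughly what the watch command shows, one row per card.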
Now can I show you what Terry does? Real quick? I got to show you Terry.
Terry has two GPUs here. They're right here, and Ollama can actually use both of them at the same time. Check this out.
It's so cool. All the Samuel L. Jackson movies. And look at that.
Isn't that amazing? And look how fast it went. That's ridiculous.
This is just the beginning. So anyways, I had to show you Terry. So now we have Ollama installed.
That's just our base. Remember, I'm going to say bye. So /bye to end that session.
Step two is all about the web UI. And this thing is amazing. It's called Open WebUI, and it's actually one of many web UIs you can get for Ollama, but I think Open WebUI is the best.
Now Open WebUI will be run inside a Docker container. So you will need Docker installed and we'll do that right now. So we'll just copy and paste the commands from NetworkChuck Academy.
This is also available on Docker's website. First step is updating our repositories and getting Docker's GPG key. And then with one command we will install Docker and all its goodies.
Ready, set, go. Yes, let's do it. And now with Docker installed, we'll use it to deploy our Open WebUI container.
It'll be one command you can simply copy and paste. This docker run command is going to pull this image to run this container from Open WebUI. It's looking at your local computer for the Ollama base URL, because it's going to integrate and use Ollama, and it's going to be using the host network adapter to make things nice and easy.
Keep in mind this will use port 8080 on whatever system you are using. And all we have to do is hit enter after we add some sudo at the beginning, sudo docker run, and let it do its thing. Let's verify it real quick.
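The transcript never spells out the docker run command, so here is a hedged reconstruction of what such a command might look like. The image name, volume name and OLLAMA_BASE_URL variable are assumptions taken from Open WebUI's published instructions, not from the video, so check the guide before copying.

```python
import shlex


def open_webui_command(ollama_url="http://127.0.0.1:11434", use_sudo=True):
    """Assemble a docker run command line for Open WebUI (assumed values)."""
    cmd = [
        "docker", "run", "-d",
        "--network=host",                      # host network adapter
        "-v", "open-webui:/app/backend/data",  # assumed volume for app data
        "-e", f"OLLAMA_BASE_URL={ollama_url}", # where to find Ollama
        "--name", "open-webui",
        "--restart", "always",
        "ghcr.io/open-webui/open-webui:main",  # assumed image name
    ]
    return (["sudo"] + cmd) if use_sudo else cmd


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it by hand.
    print(shlex.join(open_webui_command()))
```

With host networking the container shares the machine's ports, which is why the interface shows up on port 8080 without any port mapping flag.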
We'll do a little sudo docker ps. We can see that it is indeed running. And now let's go log in.
It's kind of exciting. Okay, let's go to our web browser and we'll simply type in localhost:8080, and whoa, okay, it's really zoomed in. I'm not sure why. Yours shouldn't do that.
Now for the first time you run it, you'll want to click on sign up right here at the bottom and just put your stuff in. This login info is only pertinent to this instance, this local instance, we'll create the account and we're logged in. Now just so you know, the first account you log in with or sign up with will automatically become an admin account.
So right now, you as a first time user logging in, you get the power. But look at this. How amazing is this?
Let's play with it. So the first thing we have to do is select the model. I'll click that drop down and we should have one: llama2.
Awesome. And that's how we know also our connection is working. I'll go ahead and select that.
And by the way, another way to check your connection is by going to your little icon down here at the bottom left and clicking on settings and then connections. And you can see our Ollama base URL is right here, if you ever have to change that for whatever reason.
Now with llama2 selected, we can just start chatting, and just like that, we have our own little ChatGPT that's completely local, and this sucker is beautiful and extremely powerful. Now, first thing, we can download more models. We can go out to Ollama and see what they have available.
Click on their models to see their list of models. CodeGemma is a big one. Let's try that.
So to add CodeGemma, our second model, we'll go back to our command line here and type in ollama pull codegemma. Cool, it's done. Once that's pulled, we can go up here and just change our model by clicking on the little dropdown icon at the top.
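The model dropdown is fed by the list of models Ollama has pulled, and that list can be read directly. A minimal sketch, assuming the server offers an /api/tags endpoint that returns a JSON object with a models array whose entries have a name field.

```python
import json
import urllib.request


def model_names(tags_response):
    """Extract sorted model names from an /api/tags style response."""
    return sorted(m["name"] for m in tags_response.get("models", []))


def fetch_model_names(base_url="http://localhost:11434"):
    """Ask a running Ollama which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.loads(resp.read()))

# Example (needs a running Ollama):
#   print(fetch_model_names())
```

If a freshly pulled model is missing from the web interface, comparing against this list shows whether the pull or the connection is the problem.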
Yep, there's codegemma. We can switch. And actually I've never done this before, so I have no idea what's going to happen.
I want to click on my original model, llama2. You can actually add another model to this conversation. Now we have two here.
What's going to happen? So CodeGemma is answering it first. I'm actually not sure what that does.
Maybe you guys can try it out and tell me. I want to move on though. Now some of the crazy stuff you can see right here, it's almost more featured than ChatGPT in some ways.
You got a bunch of options for editing your responses, copying, liking and disliking it to help it learn. You can also have it read things out to you, continue response, regenerate response, or even just add stuff with your own voice. I can also go down here and this is crazy.
I can mention another model and it's going to respond to this and think about it. Did you see that? I just had my other model talk to my current one.
That's just weird, right? Let's try to make 'em have a conversation.
They're going to have a conversation. What are they going to talk about? Let's bring back in llama2 to ask the question.
This is hilarious. I love this so much. Okay, anyways, I can spend all day doing this.
We can also with this plus sign upload files. This includes a lot of things. Let's try, do I have any documents here?
I'll just copy and paste the contents of an article, save that and that'll be our file. Summarize this. You can see our GPU being used over here.
I love that so much. Running locally. Cool.
We can also add pictures for multimodal models. I'm not sure Llama 2 can do that. Let's try it out real quick.
So Llama 2 can't do it, but there is a multimodal model called LLaVA. Let's pull that down real quick. With llava pulled, let's go to our browser here once more. We'll refresh it, change our model to llava.
Add the image. That's really scary. There we go.
That's pretty cool. Now here in a moment, I will show you how we can generate images right here in this web interface by using stable diffusion. But first let's play around a bit more.
And actually the first place I want to go to is the admin panel. For you, the admin, we have one user, and if we click on the top right, we have admin settings. Here's where a ton of power comes in. First, we can restrict people from signing up.
We can say enabled or disabled. Now, right now, by default it's enabled. That's perfect.
And when they try to sign up initially, they'll be a pending user until they're approved. Lemme show you. So now real quick, if you want to have someone else use this server on your laptop or computer or whatever it is, they can access it from anywhere as long as they have your IP address. So lemme do a new user signup real quick just to show you.
I'll open an incognito window, create account, and look. It's saying, Hey, you got to wait. Your guy has to approve you.
And if we go here and refresh our page on the dashboard, there is Bernard Hackwell. We can say, you know what? He's a user, or click it again, he's an admin.
No, no he's not. He's going to be a user. And if we check again, boom, we have access.
Now what's really cool is if I go to admin settings and I go to users, I can say, hey, you know what? Don't allow chat deletion, which is good if I'm trying to monitor what my daughters are kind of up to on their chats. I can also whitelist models.
So you know what, they're only allowed to use llama2 and that's it. So when I get back to Bernard Hackwell's session over here, I should only have access to llama2. It's pretty sick, and it becomes even better when you can make your own models that are restricted.
We're going to mosey on over to the section called Modelfiles right up here. And we'll click on create a modelfile. You can also go to the community and see what people have created.
That's pretty cool. I'm going to show you what I've done for my daughter, Chloe, to prevent her from cheating. She named her assistant Deborah.
And here's the content. I'm going to paste it in right now. The main thing is up here where it says FROM, and you choose your model.
So FROM llama2. And then you have your system prompt, which is going to be between three double quotes. And I've got all this telling it what it can and can't do, what Chloe's allowed to ask.
And it ends down here with three double quotes. You can do a few more things. I'm just going to say it's an assistant, education. Save and create.
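The layout described here, a FROM line followed by a system prompt wrapped in three double quotes, can be sketched as a small template. The prompt wording below is invented for illustration; it is not the text used in the video.

```python
def render_modelfile(base_model, system_prompt):
    """Render a minimal model file: FROM line plus a triple-quoted SYSTEM block."""
    if '"""' in system_prompt:
        raise ValueError("system prompt must not contain triple double quotes")
    return f'FROM {base_model}\nSYSTEM """\n{system_prompt.strip()}\n"""\n'


# Hypothetical guardrail prompt, only an example of the idea.
EXAMPLE_PROMPT = (
    "You are Deborah, a study assistant. Guide the student with hints and "
    "questions. Do not write essays or complete homework for them."
)

if __name__ == "__main__":
    print(render_modelfile("llama2", EXAMPLE_PROMPT))
```

The printed text is what would be pasted into the create screen, with the restrictions living entirely in the system prompt.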
Then I'll go over to my settings once more and make sure that for the users, this model is whitelisted. I'll add one more, Deborah. Notice she's an option now.
And if Bernard's going to try and use Deborah and say, Deborah, write a paper for me on the Civil War, immediately I was shut down, saying, hey, that's cheating. Now Llama 2, the model we're using, it's okay.
There's a better one called Mixtral. Lemme show you on Terry. I'll use Deborah, or Deb, and say, write me a paper on Benjamin Franklin. Notice how it didn't write it for me, but it says it's going to guide me.
And that's what I told it to do to be a guide. I tried to push it and it said no. So that's pretty cool.
You can customize these prompts, put in some guardrails for people that don't need full access to this kind of stuff right now. I think it's awesome. Now, Open WebUI does have a few more bells and whistles, but I want to move on to getting Stable Diffusion set up.
This thing is so cool and powerful. Step three, Stable Diffusion. I didn't think that image generation locally would be as fun or as powerful as ChatGPT, but it's more. It's crazy.
You got to see it. Now we'll be installing Stable Diffusion with a UI called AUTOMATIC1111. So let's knock it out.
Now before we install it, we got some prereqs, and one of them is an amazing tool I have been using a lot called pyenv, which helps us manage our Python versions and switch between them, which is normally such a pain. Anyways, the first thing we got to do is make sure we have a bunch of prerequisites installed.
Go ahead and copy and paste this from the NetworkChuck Academy. Let it do its thing for a bit. And with the prereqs installed, we'll copy and paste this command, a curl command that'll automatically do everything for us.
I love it. Run that. And then right here it tells us we need to add all this, or just run this command to put this in our .bashrc file.
So we can actually use the pyenv command. I'll just copy this, paste it, and then we'll type in source ~/.bashrc to refresh our terminal. And let's see if pyenv works. pyenv, we'll do a -h to see if it's up and running.
Perfect. Now let's make sure we have a version of Python installed that will work for most of our stuff. We'll do pyenv install 3.10.
This will of course install Python 3.10, the latest 3.10 release. Excellent, Python 3.10 is installed. We'll make it our global Python by typing in pyenv global 3.10.
Perfect. And now we're going to install AUTOMATIC1111. The first thing we'll do is make a new directory, mkdir for make directory, we'll call it stablediff.
And then we'll jump in there. cd stablediff. And then we'll use this wget command to wget this bash script.
We'll type in ls to make sure it's there. There it is. Let's go ahead and make that sucker executable by typing in chmod.
We'll do a +x and then webui.sh. Now it's executable. Now we can run it.
./webui.sh. Ready, set, go. This is going to do a lot of stuff.
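The make-a-directory, download, make-executable sequence can be sketched in a few lines. This assumes the script lives at the AUTOMATIC1111 repository's raw webui.sh address shown below, which the transcript does not state, and the directory name is a guess at what was said on screen.

```python
import os
import stat
import urllib.request

# Assumption: raw URL of the launcher script in the AUTOMATIC1111 repository.
WEBUI_SH_URL = (
    "https://raw.githubusercontent.com/AUTOMATIC1111/"
    "stable-diffusion-webui/master/webui.sh"
)


def make_executable(path):
    """Add execute permission for user, group and others (like chmod +x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def fetch_webui_script(workdir="stablediff"):
    """Create the working directory, download webui.sh and mark it runnable."""
    os.makedirs(workdir, exist_ok=True)
    target = os.path.join(workdir, "webui.sh")
    urllib.request.urlretrieve(WEBUI_SH_URL, target)
    make_executable(target)
    return target

# Example (needs internet access):
#   print(fetch_webui_script())
```

Running the downloaded script is left as a manual step, since the first launch downloads several gigabytes.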
It's going to install everything you need for the Stable Diffusion web UI. It's going to install PyTorch and download Stable Diffusion. It's awesome.
Again, a little coffee break. Okay, that took a minute, a long time. I hope you got plenty of coffee.
Now it might not seem like it's ready, but it actually is running, and you'll see the URL pop up around here. It's kind of messed up, but it's running on port 7860. Let's try it out.
And this is fun. Oh my gosh. So localhost:7860. What you're seeing here is hard to explain.
Lemme just show you. And let's generate. Okay, it got confused. Lemme take away the Oompa Loompa part. But this isn't being sped up.
This is how fast this is. Now, that's a little terrible. What do you say
we make it look a little bit better? Okay, that's terrifying. But it's just one of the many things you can do with your own AI.
Now you can actually download other models. Lemme show you what it looks like on Terry. And my new editor, Mike, told me to do this. That's weird.
Let's make it take more time. But look how fast this is. It's happening in real time as I'm talking to you right now.
But if you've ever made images with GPT-4, it just takes forever. But I just love the fact that this is running on my own hardware and it's kind of powerful. Lemme know in the comments below which is your favorite image. Actually, post on Twitter and tag me.
This is awesome. Now this won't be a deep dive on Stable Diffusion. I barely know what I'm doing.
But let me show you real quick how you can easily integrate AUTOMATIC11111. Did I do enough ones? I'm not sure.
And their Stable Diffusion inside Open WebUI. So it's just right here, back at Open WebUI. If we go down to our little settings here and go to settings, you'll see an option for images here.
We can put our AUTOMATIC1111 base URL, which will simply be http://127.0.0.1, which is the same as saying localhost, port 78... what is it? 7860, I think is what it is. We'll hit the refresh option over here to make sure it works.
And actually no it didn't. And here's why. There's one more thing you got to know.
Here we have the Stable Diffusion web UI running in our terminal. I'll hit Ctrl+C, which is going to stop it from running. In order to make it work with Open WebUI, we got to use two switches.
So let's go ahead and run our script one more time: ./webui.sh, and we'll do --listen and --api. Once we see the URL come up...
Okay, cool, it's running. We can go back over here and say, why don't you try that again, buddy? Perfect.
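Once the web UI is started with its API switch, other programs can request images from it, which is what the chat interface does behind the scenes. A minimal sketch, assuming the AUTOMATIC1111 API serves /sdapi/v1/txt2img on port 7860 and returns base64-encoded images in an images list; the step count is an arbitrary example value.

```python
import base64
import json
import urllib.request


def build_txt2img_request(prompt, base_url="http://127.0.0.1:7860", steps=20):
    """Build (but do not send) a text-to-image request."""
    payload = {"prompt": prompt, "steps": steps}
    return urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_first_image(response_json):
    """Return the raw bytes of the first image in an API response."""
    return base64.b64decode(response_json["images"][0])


def generate(prompt, out_path="image.png", **kwargs):
    """Send the request and write the first returned image to disk."""
    req = build_txt2img_request(prompt, **kwargs)
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    with open(out_path, "wb") as fh:
        fh.write(decode_first_image(data))
    return out_path

# Example (needs the web UI running with the API enabled):
#   generate("a man in a dog suit")
```

If the request is refused, the usual cause is that the script was launched without the API switch.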
And then over here we have Image Generation (Experimental). They're still trying it out. We'll say on and we'll say save.
So now if we go to any prompt, let's do a new chat and we'll chat with llama2. I'll say, describe a man in a dog suit. This is for a Stable Diffusion prompt.
A bit wordy for my taste. But then notice we have a new icon. This is so neat.
Boom. An image icon. And all we have to do is click on that to generate an image based on that prompt.
I clicked on it, it's doing it. And there it is right in line. That is so cool.
And that's really terrifying. I love this. It's so fun.
Now this video is getting way too long, but there are still two more things I want to show you. I'm going to do that really quickly right now. The first one is, it's just magic.
Check it out. There's another option here inside Open WebUI, a little section right here called Documents.
Here we can simply just add a document. I'll add that one from before. It's there, available for us. And now when we have a new chat, I'll chat with CodeGemma.
All I have to do is do a hashtag and say, let's talk about this and say, give me five bullet points about this. Cool. Give me three social media posts.
Okay, go, Gemma. Lemme try it again. What just happened?
Okay, let's do a new prompt. Oh, there we go. And I'm just scratching the surface.
Now the second thing I want to show you, last thing. I am a huge Obsidian nerd. It's my notes application.
It's what I use for everything. It's been very recent. I haven't made a video about it, but I plan to.
But one of the cool things about this very local, private note-taking application is that you can add your own local GPT to it, like what we just deployed. Check this out. I'm going to go to settings.
I'll go to community plugins. I'll browse for one. I'm going to search for one called BMO, B-M-O, Chatbot.
I'm going to install that, enable it. And then I'm going to go to settings. I'll have BMO Chatbot.
And right here I can have an Ollama connection, which is going to connect to, let's say, Terry. So I'll connect it to Terry and I'll choose my model. I'll use llama2, why not?
And now right here in my note, I can have a chat bot come right over here to the side and say like, Hey, how's it going? And I can do things like look at the help file, see what I can use here. Ooh, turn on reference.
So I'm going to say reference on, it's now going to reference the current note I'm in. Tell me about the system prompt. Yep, there it is.
And it's actually going through and telling me about the note I'm in. So I have a chat bot right there, always available for me to ask questions about what I'm doing. And I can even go in here and highlight this, do a little prompt, select generate. It's generating right now, and it just generates some stuff for me.
I'm going to undo that. Let me do another note. So I want to tell a story about a man in a dog suit.
I'll quickly talk to my chat bot and start to do some stuff that's pretty crazy. And this I think for me is just scratching the surface of running local, private AI in your home on your own hardware. This is seriously so powerful and I can't wait to do more stuff with this.
Now, I would love to hear what you've done with your own projects. If you attempted this, if you have this running in your lab, let me know in the comments below.
Also, do you know of any other cool projects I can try that I can make a video about? I would love to hear that. I think AI is just the coolest thing, but also privacy is a big concern for me.
So to be able to run AI locally and play with it this way is just the best thing ever. Anyways, that's all I got. If you want to continue the conversation and talk more about this, please check out our Discord community.
The best way to join that is to jump through our NetworkChuck Academy membership, the free one. And if you do want to join the paid version, we do have some extra stuff for you there too, and it'll help support what we do here. But I'd love to hang out with you and talk more.
That's all I got. I'll catch you guys next time.