I’m changing how I use AI (Open WebUI + LiteLLM)

NetworkChuck
🛠️ Build your own AI Hub!! Run OpenWebUI on your own VPS with Hostinger (code networkchuck10): http...
Video Transcript:
I found a way to access every AI. I'm talking ChatGPT, Claude, Gemini, Grok, all from one self-hosted interface, and no, I'm not paying for any of these plans. Get outta here.
I have unlimited usage and I get access to the newest models as soon as they come out. No more waiting. And the best part is that all my people get to use it.
I can create accounts for my employees, for my wife, for my kids, and they can access all the new stuff. But the best part is that I have control. For example, my kids: I don't want them accessing every AI model, so I can restrict that. I can also restrict what they can ask and what they get help with, so they're not cheating on their homework or letting out some NetworkChuck secrets. And I can see all their chats, which, really, you should be looking at your kids' AI chats if you're letting them have AI. And you should let them have AI, hot take, but I think kids need to learn how to use it because that's kind of the future.
Right now, it's not going anywhere. But seriously, I love this solution. It's better security, my data's a bit more safe, and oh my gosh, the amount of features it has. I'm addicted.
This might be the better way to use AI. This is Open WebUI. Now, you've probably heard of that.
In fact, I've talked about it, but this video is going to come at it in a bit different way. I'm going to try something new and if you've never heard of it, get your coffee ready. I'm going to have you set up in about five minutes.
Let's go. Okay. Open WebUI.
It's an open-source, self-hosted web interface for AI, and it allows you to use whatever LLM, or large language model, you want to use. And it's not just cloud stuff like ChatGPT and Claude, which, by the way, you're probably wondering how we're going to run those. You'll see, it's really awesome. But it's not just those.
We can run self-hosted models with Ollama. I'm talking Llama 3, Mistral, DeepSeek. You can run 'em all, and actually I often run them side by side, two, three, sometimes four. One of my favorite features.
Speaking of features, fair warning: there goes your weekend. There are so many to play with, it's addicting. But it's also simple enough for anyone to start using immediately, so don't worry. But I will say this, asterisk: this isn't for everyone. There's one thing that might scare you away, you'll see, but I'm still here.
I'm still going to use it. We'll cover that later. Now what do we need to get this set up?
As I mentioned, this is self-hosted, which means you yourself are going to host this somewhere. You're going to set it up, you're going to install it, and for that you really have two options. Either the cloud, this is the easiest and fastest method or you can go on prem, host it in your house.
This could be on your laptop, on a NAS, on a Raspberry Pi. I'll show you both options. Whichever you choose is going to be quick and easy, and you're going to be like, how was that so fast?
And how is this so amazing? Trust me, you will. We'll start with the cloud.
Don't blink. It's going to be fast. And for this option, we'll be setting up what's called a VPS, or virtual private server, in the cloud, and we'll be setting it up on Hostinger, the sponsor of this video.
So real quick, in the description I have a link, hostinger.com/networkchuck. Go ahead and go there.
Click on choose your plan, and KVM 2 is my favorite option because you're essentially getting yourself a very healthy home lab. Hey, NetworkChuck from the future here.
I know what you're probably thinking: six bucks a month? Why don't I just pay for ChatGPT?
Hey, I get it, but here's why I still love this. First, it's cheaper than ChatGPT. Second, you're getting your own server that can host your own ChatGPT, which is just cool.
And third, you can host more than just Open WebUI. The server's beefy enough to do a lot more things. It's a home lab.
I'm telling you, a healthy home lab. Just wanted to add that context. Anyways, back to me.
Look at this thing. An AMD EPYC CPU, eight gigs of RAM, and NVMe storage. Plenty of bandwidth, and my favorite feature for all you home labbers: backups and snapshots, because we break stuff and you're going to need this.
So just know, not only will this puppy run Open WebUI just fine, you'll be able to add more stuff to it. More projects, resume-building moments. I just started watching Home Improvement again, so I feel like I need to do this.
Sorry, I couldn't do it. That's embarrassing. Go do some coffee while I deal with that real quick.
You do it at home. See if you can do it. Tim "The Tool Man" Taylor, love that show.
That show still hits. Anyways, let's keep building this. So I'll choose the KVM 2.
If you don't already have an account, it'll ask you to make one. Choose your term. I'm not going to do 24 months; 12 months sounds pretty good to me.
Check this out: coupon code, right over here on the right. Type in networkchuck10, apply that sucker.
It's now cheaper. Now pick where you want it to be. Somewhere close to you. Actually, yeah, Phoenix is good.
I think it'll automatically tell you based on the latency to you. And then we'll choose our OS. Now for us, because we want to run Open WebUI, we're in luck.
We'll actually click on Applications right here and we'll click on show more. I don't see it right now. Where you at, buddy? We're going to be looking for Ollama.
Ah, there it is. He was hiding from me. They probably have one of the cutest logos in the industry.
We're going to go ahead and select this because not only will it install Ollama, which is what people use to run local LLMs, it will also install Open WebUI, just like that, and it's going to be on Ubuntu 24.04. So you can add plenty of stuff on top of that. Alright, let's go ahead and click on confirm.
Continue. Actually, I lied. You're going to get logged in right here.
Enter all your info free malware scanner. Sure. Okay, click continue.
Enter a root password. This will be the password that you'll use to log into your VPS. Click continue and I think we're almost done.
Yeah, finish that up right here. Go, and it's setting it up. Right now you have a virtual private server being spun up in the cloud and they're installing Open WebUI along with Ollama, and all you have to do is sip some coffee. It's pretty cool.
For on-prem, go watch this video right here. I'll walk you through it. Just pause me.
I'll still be here. Come back and see me. Alright, it is done.
We'll click on VPS management page, go look at it, and here is mine. Go ahead and click on Manage over here on the right, and right now Open WebUI is just waiting for us. Click on the manage app button right there.
What that will do is launch another tab. Go and click on that. It's essentially your public IP address on port 8080. And here we are.
This is your Open WebUI. "Unlock mysteries wherever you are." Sounds like AI made that. Alright, go and click on Get started at the bottom there, and here we'll create our first account for Open WebUI.
This first account will be your admin account, so you have godlike powers over everything. Click on create admin account, and celebration.
We're here. Okay, let's go. Now, if you followed along with the Hostinger setup, you'll see by default we've got a nice little AI model to play with:
Llama 3.2 1B. As opposed to an OpenAI model like ChatGPT, Llama 3.2 is a local model.
It'll use your server's resources instead of OpenAI's. Let's talk to it. Hey, how are you?
And it feels like ChatGPT, right? Same kind of familiar interface, except, as you might see, it's slower. Actually, that wasn't too bad, and it wasn't too bad because this is not a very smart model.
It's very small, which means it's going to be a bit dumber than the other ones, and you won't really be able to run bigger, smarter models unless you have some killer hardware. I'm stacking GPUs, Terry. But we don't really care about that right now because we're not done yet. We're about to add some big-boy models from the cloud.
Now, when you want to access AI models like ChatGPT or Claude, you usually have two options. Option one, normie mode: you go out to ChatGPT, you pay for a monthly plan, pay a lot if you want access to all the new stuff, and that's it. You're done.
It's easy. No shame I do it. But then option two is where things get interesting.
APIs, application programming interfaces, are what developers use to integrate AI like ChatGPT into their apps and programs. So what? We're not writing an app, why do we care? Well, it comes down to how they pay for that access.
Normies pay a set price per month. With APIs, you pay as you go, or you pay for what you use. Two reasons why that's amazing.
First, providers normally give API access to all of their models, especially the ones they just released. So think GPT-4.5, which just came out. The people who have access to that on the normie plans are only the $200-a-month people; the $20 users, sorry, you're out of luck. But if you're using an API, you get access to that right now. The second cool thing is that you may end up saving money. Not guaranteed, massive asterisk, but if people on your team or in your house aren't really heavy users of AI, paying for a full plan for them doesn't make any sense if they're only going to be using 50 cents a month. Okay, so what does that look like?
Well, let's get signed up for it right now. Let's go out to OpenAI, and instead of going to ChatGPT, we'll go to openai.com/api and we'll get signed in or create an account.
Whatever you got to do, once you're in, you'll go to the top right and click on start building. And here, yeah, it's going to ask you for a credit card, but you're not going to be charged per month. You're only going to be charged for what you use initially.
You can add just five bucks. That's five bucks that will sit there until you use it. So I'll go and add a credit card right now.
I'll top it off with five bucks, and then I'll go and create what's called an API key. This will actually unlock all these ChatGPT models for us on the Open WebUI interface. To get that API key, we'll go to settings at the top right, just click that little gear there.
Once there we'll go to the left and click on API keys and we'll create a new secret key. Name it, put it in the default project, leave everything else as is and click on create secret key. There's your key.
Copy it. Let's go put it inside Open WebUI. Right now, here in Open WebUI, we're going to go to the top right, click on our profile icon, and click on admin panel. From here, we'll click on settings and then connections.
Connections are what give us additional functionality, additional LLMs, for Open WebUI, and right there, there's a blank space, baby, sitting right there for us. Sorry, Taylor Swift. We're going to paste our API key right there and click on save. Now, no fireworks, nothing crazy.
What happened? Let's click on the little menu thing on the left to open that up, expand it, and then click on the pencil to start a new chat. And at the top there, we'll change our model from Llama to... wait, what?
Look at all these GPT models. We have access to everything, including that new 4.5 model.
Let's search for it real quick. Where's it at? There it is.
Let's start chatting with it. Let's just have fun. And right now, if you followed along, you're using a $200-a-month model for nothing. Well, not for nothing, as we're about to see.
Don't get crazy yet. Lemme cover this part. We got to talk about how we pay for these AI interactions and this is the asterisk, the little, the gotcha you got to be careful about.
So when you're talking to an AI model, specifically an LLM, a large language model, that's going to be text-based, the way they charge us is by tokens. It's like Chuck E. Cheese, just without crappy pizza and a scary mouse.
Now, what's a token? A token is a word, in some cases. So for example, a small word like "you", that's probably going to be one token, while more complex words might be broken up.
Actually let me ask it. How many tokens was your last response? It's 15 tokens.
Break that up so I can see which words were tokens and which were broken up. That's so sick. Okay, it's doing my job for me.
Ooh, punctuation: it's its own token. What a ripoff.
If you want to save money with AI, don't use punctuation. Okay, words equal tokens. I still don't understand how much money we're being charged.
Let's go to the chart. How much you're charged will depend on which model you're using. Certain models are smarter and they require more resources to answer your questions.
And that's on display right here. The o3-mini model, which is a solid model, is going to cost you $1.10 per 1 million tokens. So that's a healthy amount of interaction, right?
On the other hand, the o1 reasoning model will cost you $15 per million tokens. That's not scary? You want to see what's scary?
The model we were just using, GPT-4.5, is their most expensive model: $75 per 1 million tokens, and that's just input. Notice they do have an output section too. I wish I could do that for people, for my kids: charge them for talking to me, and then when I give out my wisdom, make it more expensive.
It's genius. Now I know it's kind of hard to break down what does a million tokens mean? How much money am I going to be spending and am I going to be saving money?
Here's your warning, right? So a casual user, let's say they have 50 conversations a month, about a thousand tokens each. It could be as low as 50 cents, assuming they're using a model like GPT-4o.
Now, if you use AI like me, that's very low usage, but some people are like that. A moderate user might have 200 conversations a month, and this could be anywhere from five to ten bucks a month.
Power users? And keep in mind, these are all very rough estimates. This can be sky's-the-limit, right? Twenty bucks to infinity.
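If you want to sanity-check those estimates yourself, here's a quick back-of-the-napkin sketch. All three numbers are assumptions I picked for illustration, and the $2.50-per-million figure is just an assumed GPT-4o-ish input rate; check OpenAI's pricing page for current numbers:

```shell
# Rough monthly cost: conversations x tokens-per-conversation x price.
# Every number below is an assumption -- plug in your own.
awk 'BEGIN {
  convos = 50          # conversations per month
  tokens = 1000        # tokens per conversation
  price_per_m = 2.50   # dollars per 1M input tokens (assumed)
  printf "~$%.3f per month\n", convos * tokens / 1000000 * price_per_m
}'
```

Swap in 200 conversations and a $15-per-million reasoning model and you'll see how fast "power user" money shows up.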
So hey, draw the infinity symbol. Simple. I think I'm nailing it. Yep, got it.
I can tell you right now, me as a power user, it would not be 20 bucks a month. It'd be a lot more. What impacts that?
Well, what models you choose. I talk to the best models a lot: 4.5? Yeah. o1, o3? Talking all day. And my conversations are long, and that does impact how much it's going to cost. Context: when you're using Open WebUI, the context of our messages is sent each time I say something to the API, so that it knows what I'm talking about. So the number of tokens I'm using grows rapidly with the length of my conversation, and sometimes I sit there and talk for a while with an AI to figure stuff out. Now, notice as part of OpenAI's pricing, and this is very specific to OpenAI,
they do have cached input, which will help offset a lot of those costs. They will cache your input, kind of keep it in memory over time. I think it's like 24 hours by default; they may change that.
Don't quote me on that. So I'll say all that as a warning. Be careful.
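To see why long chats add up, here's a little sketch of that context snowball. The 200-tokens-per-message figure is made up; the point is that every turn re-sends the whole conversation so far:

```shell
# Each turn re-sends all previous messages as context, so the input tokens
# you are billed for grow roughly quadratically with conversation length.
seq 1 10 | awk '{
  sent += $1 * 200   # turn N re-sends ~N x 200 tokens (assumed message size)
  printf "turn %2d: %5d context tokens billed so far\n", $1, sent
}'
```

By turn 10 you've paid for 11,000 context tokens to carry a conversation that is only about 2,000 tokens long, which is exactly why cached input pricing matters.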
Can this save you money? Maybe, but I wouldn't do this with saving money as the primary goal. For me, it's more about: I want to give my family, myself, and my employees access to all the AI, and I don't want to pay for 15 million plans and have to manage all these different things. I want one interface, one place to go, and I want control.
Now, if you're worried about this, I will show you how we can put in budgets with a tool I'm about to show you. It's so cool. You can put in a budget per person so they don't go over; like, you're stuck at 20 bucks a month.
If you use that 4.5 all day, you're done, buddy. You're talking to Ollama for the rest of the day.
Why is Alex's work so crappy after three? I don't know. Let's break this down.
What was this scribbly writing? Beautiful dude, I'm on a roll today. Let's keep going.
Now we're jumping into a very fun part of this tutorial, and it's to solve kind of a big problem with Open WebUI. Check this out. If I go back to my settings where I added the OpenAI API key, in my connections I really only have options for two types of connections:
OpenAI API and Ollama API, Ollama being the local option.
What about Claude? What about Gemini? What about all these fun ones
I want to try? The whole point of this was to try everything. Yeah, that's kind of a problem, because you can't just plug in Claude right here, or Anthropic. It won't happen.
This is where a tool I fell in love with comes in. It's called LiteLLM.
LiteLLM is a proxy for AI, or a gateway. If we go to the webpage real quick, they connect to so many AIs. I think they say a hundred plus, right?
And that's exactly what we're going to do. So check this out. Open WebUI:
all it has to connect to is LiteLLM, and it does that just fine, because LiteLLM has an OpenAI-compatible API. It does great. And then with LiteLLM, we connect everything else:
OpenAI, Anthropic (which is Claude), Gemini, Grok, DeepSeek, and no, not the one hosted in China. You can actually access an American-hosted DeepSeek on another service called Groq, with a Q. Very confusing, but very cool.
Now, LiteLLM will be a proxy server that we'll install alongside Open WebUI. It's not scary. Trust me.
It'll take like three seconds. You ready? Get your coffee.
Let's install LiteLLM. So real quick, we're going to access the same server we installed Open WebUI on. If you followed along with me on the Hostinger side, setting up a VPS, right here in our portal where we're managing our VPS, we're going to access the terminal, which is super easy for us. There's a button right here: Browser terminal.
Go ahead and click on that. For everyone else, just access the terminal of whatever server you want to deploy this on. We'll deploy it via Docker, very similar to how we set up Open WebUI in the other tutorial you watched earlier.
I said "earlier" too much. Alright, we're inside the terminal. I will have the commands below, but the first thing we'll do is use git to clone the LiteLLM proxy server: git clone.
So lemme give myself some room up here. There we go. git clone, and then the address for LiteLLM.
Ready, set, clone. This will clone that repo from GitHub and create a folder for us that we'll jump into here in a moment. Little coffee break and it is done.
Type in ls to see our new folder. There it is. Type in cd litellm to jump into that folder. We're in.
Now we're only two commands away. First thing we'll do is use nano. Type in nano, the best text editor ever, and we'll edit the hidden file, .env, just like that. And we're going to add two lines of config.
First we'll type in LITELLM_MASTER_KEY, all caps, and we'll have that equal, in double quotes, sk- something. Ideally you want it to be a randomly generated key.
Actually I'll just use Dashlane to do that for me right now. I'll just do digits and letters. We'll do 10 of them and you'll want to copy this down somewhere.
This will be your password to log into the server once we build it. I just clicked out of my browser terminal; good thing I copied my password. Alright, we'll close out with double quotes.
Hit enter and we'll add one more line of config. We'll add the LITELLM_SALT_KEY, just like this, and have that equal the same kind of starting point, sk- and then a randomly generated string of characters. This will be used to encrypt and decrypt your LLM API key credentials.
So I'll randomly generate some stuff real quick and copy all of it. Put that somewhere safe, then hit Ctrl+X, Y, Enter to save. And for most scenarios, all we have to do is type in docker compose up -d. Ready, set, go.
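If you'd rather script that .env step than paste keys in by hand, here's a sketch. It assumes you're sitting inside the cloned litellm folder; the variable names are the ones from the video, and openssl stands in for the Dashlane password generator:

```shell
# Generate two random secrets and append them to the hidden .env file
MASTER_KEY="sk-$(openssl rand -hex 16)"
SALT_KEY="sk-$(openssl rand -hex 16)"
printf 'LITELLM_MASTER_KEY="%s"\nLITELLM_SALT_KEY="%s"\n' \
  "$MASTER_KEY" "$SALT_KEY" >> .env

# The master key is also your admin login password -- save it somewhere safe
echo "Master key: $MASTER_KEY"

# Then bring the stack up:
# docker compose up -d
```

Same two lines of config, just with no chance of fat-fingering the quotes.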
And this is literally building our server. We don't have to worry about anything else except making sure we sip some coffee while it's happening. Now, while that's installing, let's get our API keys.
Ready? First we need our OpenAI API key. Easy for me to say.
I normally like to create a new key for every service, so I'll create a new one. Call this "litellm", default project. Create it, copy it, get it ready.
And the same process you can repeat for Anthropic, for the Claude models, and Gemini, for the Google models. I'm just going to do Anthropic for now, and I'll grab Grok too. Grok being xAI, Elon Musk's AI, which is actually pretty amazing. Unfortunately, I don't think Grok 3 is available via the API just yet.
But I'll go and create a key, and it's done. If you see something like this, you're solid. If we type in docker ps, because everything is running through Docker, we'll see all of our healthy containers running.
Now what we'll do is open up a new tab. Actually I need to grab the IP address of my server here, where to go? There it is.
Grab that IP address, and in your address bar, go out to that IP address, port... I think it's 8000? What was it? Oh, it's 4000. Port 4000.
There we go. And then we'll click on LiteLLM admin panel UI, click on that. The username will be admin, and the password will be that master key you set up in the environment variable, the sk- one.
Now, lots of bells and whistles. All we care about right now is doing a few things. First, let's go to models on the left here, and then right here in the top menu you'll see the option to click on add model, and then we'll add our first model.
Let's start with Claude. So I'll click on Anthropic, and we could either choose all models, like just go crazy, select them all, or be very specific. So maybe I only want the 3.7 latest and 2.1, to compare how dumb it is. Then I'll add my API key here and add the model, just like this, at the bottom right. Clicking on all models, you can see them sitting right there: 2.1, 3.7. And then here's the cool part.
This is where the proxying comes in. We'll go to the top left and click on virtual keys. We're going to create our own virtual API keys that can control so many things.
Check this out. We'll create a new key for now. We'll say it's owned by us, we don't need a team or anything.
We'll name the key. I don't know... "kids". So let's say we're making it for my kids, and we'll say the models they can access are 3.7 and 2.1.
Checking out optional settings, you can add a budget, 20 bucks, and this will be a monthly budget. And you can do a lot: you can expire the key, and they have a thing called guardrails, which we're not going to cover right now. But we'll go ahead and create the key, and there's our key.
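By the way, if clicking through the UI isn't your thing, LiteLLM also exposes key creation as an API; I believe it's a POST to /key/generate on the proxy, authenticated with your master key. The model names and budget below are placeholders for whatever you configured:

```shell
# Hypothetical virtual-key request -- swap in your own models and budget
PAYLOAD='{"key_alias":"kids","models":["claude-3-7-sonnet-latest","claude-2.1"],"max_budget":20,"budget_duration":"30d"}'
echo "$PAYLOAD"

# With the proxy running, create the key like so (LITELLM_MASTER_KEY is the
# master key from your .env):
# curl -s http://localhost:4000/key/generate \
#   -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
```

The response includes the new sk- virtual key, same as the one the UI just handed us.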
We'll copy it, and now we'll add it to Open WebUI. So here we are in Open WebUI, at our admin panel, on the connections page, and say I want to delete my OpenAI API key. Delete.
You don't have to do that, by the way. Now I'm going to add my LiteLLM API key under the OpenAI API section. I feel like I've been saying "OpenAI API" so much. The base URL will be http://localhost:4000, so colon 4000. And then we'll put our API key right in here, just like that, and click on... actually no, we'll test it real quick. Verify connection. Verified. And that works because they're on the same server.
Localhost is right there. Click on save. And now if we go back and try to create a new... oh, there it is, new chat.
Claude, sitting right there. Oh, that's so cool. How you doing, Claude?
Ah, love it. Check this out. I'm just going to show it to you right now.
I was going to wait, but click on add model. We can put Claude 2.1 there as well.
Let's do a new chat. Actually, let's add them side by side and say tell me a riddle and they'll answer it side by side. How cool is that?
Now real quick, I'm not going to make you wait. I'm going to add OpenAI and Grok. So I added these models, and now I have Grok and 4o and o3-mini.
But no one inside of Open WebUI will have access unless I give those virtual keys access. So I can edit my key, go to settings, edit settings, and add additional models: o3-mini, Grok, 4o. Save. And now, back in Open WebUI land.
I'm going to refresh and see if they show up. Let's do a new chat. There it is.
Grok is at the party here, 4o, o3-mini. So now I've got four different AIs, and we'll add Llama in for fun too.
How many Rs are in the word strawberry? And now they're all answering, except for o3-mini. It doesn't like it.
Claude got it right, Grok got it right, 4o got it right, and Llama's dumb.
How cool is this? And over here on the LiteLLM side, you can add as many virtual API keys as you want, and add those in Open WebUI.
Actually, check this out on the LiteLLM side: if I go to usage, it'll show me how much is being spent. It probably needs some time to catch up with the other ones, but this is now my AI hub, and this is where I'll control the budget. And then, back in Open WebUI land.
Just a few things I want to cover real quick. First, my kids, let me add my kids to my team here. I go to settings, admin settings, and then users.
I can create groups. Let's create a group, call it kids. I'll go back to overview and create some users here, kid one and kid two.
I can go to my groups, add them to the kids group, and here I can say what permissions they have access to. Can they access models? Can they access knowledge and prompts and tools? Which is a whole world of things I can't talk about right now.
This video would be way too long. I'll click on save. And we can also control who has access to what models. Let's say I only want them to have access to Claude 3.7.
It's the smartest. I can go in here to the model, click on groups and say the kids have it. Everyone else, sorry.
Now, I can also do this: give it a system prompt. You are a school helper.
Your job is to help kids with their school, but you cannot do their work for them. Never let them cheat. Never write an essay or solve a problem.
You must guide them, and you can only talk about school-related subjects. Guardrails in place. Click on save, and I'll just grab this URL real quick, open it up in an incognito window, and log in as my kid: kid1@hotmail.com.
Alright, I've only got access to one model. Write a paper for me about George Washington. And there we go.
It won't write it for me. What is two plus two? Oh, it gave me the answer.
What is nine times seven divided by four? Okay, it'll help with math. Let's ask it something non-school-related.
What is the plot of the movie? The Matrix. Oh, that's answering.
Oh, film studies class. Okay, got it. This is something my daughter would ask, so it just relates it back to school.
That's very cool. I like that. Now, the best part: getting back to the users, on kid one here, who was just having the conversation,
I can click on chats, and there it is. I can jump right in there and see everything that was said. For my employees, I'm not going to monitor that, and I can turn that off. But for my kids? 100%. AI is nuts.
And you got to keep an eye on that kind of stuff. Now this video is way too long. I'm sitting here staring at the screen.
Can I talk about that? No. Can I talk about that?
No, it'd be too long. Let me know if you want me to make another video covering the ins and outs of Open Web UI because it has tools, prompts, functions, pipelines, image generation. Oh, it's so addicting and I would love to hear if you've done anything cool with this as well.
Now, there's one last piece of this I haven't shown you yet, and that's this address here. Right now it's just an IP address. You don't want to give your family and friends an IP address. Like, hey, go out to
185.2.82.24, that's the new AI server. No, that's terrible.
I'm going to walk you through how to set up a DNS name; we'll purchase it on Hostinger. I'm going to walk you through how to set up a friendly domain name for this.
But we'll do that in a separate video, right here. That's all I got. Thanks again to Hostinger for sponsoring this video, and I'll catch you guys next time.
Get control of yourself.