Run AI Models Locally: Ollama Tutorial (Step-by-Step Guide WebUI)

12.16k views2424 WordsCopy TextShare

Leon van Zyl

Ollama Tutorial for Beginners (WebUI Included) In this Ollama Tutorial you will learn how to run Op...

Video Transcript:

you will learn everything you need to know about using Olama, from basic setup to downloading and running open source models to more advanced topics. As an added bonus, we will also install Open Web UI, which is a beautiful user interface for interacting with your models. And yes, this also includes RAG, so you can upload and chat with your own documents.

First, let's download and install Olama. Go to olama. com, then click on download, then select your operating system, and click on download.

Then execute the file that you just downloaded, and from this pop-up, click on install, and Olama will now be installed on your machine. You can start Olama in one of two ways. The first option is to run the Olama desktop app.

You should then see the Olama icon in your system tray. The second option is to open your command prompt or terminal, and then in the terminal run olamaserve. But because I'm already running Olama, I do receive this message, which we can simply ignore.

We can verify that Olama is running by entering Olama in the terminal, and you should see a list of all possible commands if Olama is working correctly. Now let's get familiar with some of the most common commands that you will use with Olama. First, we can view all the available models by entering Olama list, and at the moment we don't have any models installed.

So let's have a look at how we can download our first model. Back on the Olama website, let's click on models. Here we can search for models, or sort by featured models, or the most popular models, or the newest models.

We can see the details of a model by clicking on its name, and from this page we can see that there are two versions of this model available. The 9 billion parameter model, and a 27 billion parameter model. The hardware requirements for running these models will greatly depend on which size model you download.

For most of us, the smaller models, like 8 billion or 9 billion, will be perfectly fine, whereas the larger models are typically meant for enterprise grade hardware. From this drop down, we can select the model that we are interested in. Let's simply stay on the 9 billion parameter model.

Scrolling down, we can see some additional information about the model, like its license, and some other information. And we can see the instructions for downloading and running the model on the right hand side over here. And if we had to change this model, you will also see that this command changes on the right hand side.

But let's simply swap back to the 9 billion parameter model. In order to download and run this model, you can simply copy this command and paste it into the command prompt. This will both download and immediately run this model.

But instead, I just want to show you how you can download this model without running it. Let's copy the name of this model, and then in our command prompt, let's enter "olama pool" and then the model name. And let's run this to start the download.

Our model was now successfully downloaded. Now this is optional, but I am going to download two additional models. And this will come into play later on in the video, but you do not have to follow this step as well.

So I'm going to download the JMR 2 billion parameter model, as well as a super impressive llama 3 model. So now let's run "olama list" again. And this time we will see the models that we just downloaded.

In your case, it might only be one model. To view the details of any specific model, we can run the command "olama show" followed by the model name. Take note that it's not necessary to enter "latest" as well.

Running this will show you the base model that was used for this model, the amount of parameters, the context size, and some other important information. We can also remove a model by running "olama rm" followed by the model name. And done.

If I run "olama list" again, you will notice the JMR is now gone. Now let's move on to the fun stuff. We can run our model by typing "olama run" followed by the model name.

Let's use the JMR 2 model. Now let's test this out by sending a message like "hello there" and we do get our response back from the model. Excellent.

Now let's enter something like "Write the lyrics to a song about automating my life using AI" and our model is hard at work writing our song for us. Great stuff. It's important to know that our conversation history is being stored for the session.

For example, if I say something like "my name is Leon" I can then ask our model what is my name and the model will be able to recall information from our conversation history. Now let's have a look at a few special commands that we can run within this chat window. Let's have a look at the "set" command.

We can access the "set" command by entering front slash set. This will show you a whole bunch of attributes that you can set about the session. For this video let's have a look at the "parameters" command as well as the "system" command.

We won't have a look at all of these but I do want to show you a very important parameter. So let's enter "set parameters". Actually it should be "set parameter" and within parameter we have this temperature value that we can set.

Temperature is a value between 0 and 1. 1 means that the model will be completely creative where 0 means that it will be factual and stick to the system prompt. So for something like a chatbot we might want to set the temperature to something like 0.

7 or if you want a chatbot that is factual for instance a math tutor then set the temperature to a low value like 0 or maybe 0. 2. I'm going to set the temperature to 0.

7. Now let's set one more value and that is the system message. We can use the system message to give a personality or role to our model or give the model very specific instructions on how to respond.

For instance let's call "set system" and in quotes something like "You are a pirate your name is John". Let's set this. We can also display these values by entering front slash show.

Here we can show information about the model that we're currently using, the model's license, the model file which we will have a look at later on in this video, the parameters, the system message etc. For example let's show the parameters and here we can see that the temperature was set to 0. 7.

Let's also show the system message which we set to "You are a pirate your name is John". Great! We can also save these changes as a brand new model.

We can do that by entering front slash save and the name of our new model which I will call "John the pirate" and let's press enter and this has now saved our changes as a brand new model. To exit the chat we can simply enter front slash buy and now we're back in the terminal. So now if we enter "Olama List" we can actually see the model that we just created.

So if we ever wanted to continue our conversation we can enter "Olama run John the pirate" and let's press enter and now we can see the conversation history and if we send our model a message like "Who are you? " our model will respond like a pirate and it knows that its name is John. How cool is that?

I do want to show you one more way in which you can create your very own models. Open up a folder on your machine and then create a new file. It really doesn't matter how you create this file so feel free to use any text editor that you want.

I'll simply use VS Code then within this folder create a new file and give it the name of the character that you want to create like "Mario" as an example. First we need to start by telling "Olama which model we want to use as the base model" then we can set any of the parameters just like we did in the command prompt by first entering parameter by the name of the parameter that we want to change followed by the value that we want to set like 0. 7.

Then we can also set the system message which we need to set within three of these double quotes like so. Then we can enter something like "You are Mario from Super Mario Bros. Answer as Mario.

" Let's go ahead and save this file and I'm actually going to close the text editor. Now within this folder we can simply open the command prompt by either right clicking on this folder and clicking on "Open in terminal" or in the address bar "Enter CMD". So now that we have the command prompt open in the same folder as this file we can create our model by running "Olama create" followed by the name that we want to give this model which I'll call "Mario".

Then enter -f followed by period front slash and the name of our model file which we called "Mario". That is this file over here. Now let's press enter.

Let's confirm that our model was created by running "Olama list" and indeed we can see that Mario was created. Let's also view the details of this model by running "Olama show Mario" and for this model we can see that the JM2 model was used as the base model and then other information like the temperature as well as our system message. Let's also run this model by entering "Olama run Mario" and let's say in something like "Who are you?

" and that sounds like Mario to me. We still have a lot to cover but first if you're finding this video useful then please hit the like button and subscribe to my channel. Also let me know down in the comments which open source models you prefer and why.

Now let's move on. So far we've only interacted with our models in the terminal which might be perfect for most use cases but I'm sure that you would prefer to use a way more attractive UI than this and we will have a look at installing the web UI in a minute. The specific topic might be slightly more technical but bear with me this is an important topic for any developers watching this who might be interested in calling the "Olama models" from their own applications.

"Olama" provides several API endpoints that you can use to create messages, create models and pretty much everything else. I'm not going to cover all of these APIs in this video as I do want to move on to setting up the web UI but I will leave a link in the description of this video to this GitHub repo. Let's simply have a look at creating a completion using this API endpoint.

To demonstrate this I'll simply call the endpoint from Postman by changing the method to "Post" and then enter the following URL. This is assuming that your instance of "Olama" is running on port 11434. Let's switch over to body and let's have a look at the JSON structure that we need to pass in.

Within the body let's enter the model name as "jmar2" and the prompt as "y is the sky blue". Let's run this and you will notice that the response is coming through as chunks and this is because the response is streaming back to us. We can disable streaming by adding another property called "stream" which we can set to "false".

Now when we run this again we will receive the response in a single structure. Now let's move on to installing the web UI for Olama. For this we will install open web UI which was previously called "Olama web UI".

Thankfully this is very simple. There really is only one dependency. You need to have Docker installed on your machine.

To install Docker go to docker. com then under products go to Docker desktop. Then go to download and download Docker for your operating system.

And finally execute the installer. After installation is completed you should be able to find Docker desktop on your machine. After installing Docker you should be able to start the Docker desktop and you will be presented with a screen similar to this.

And that means that Docker is up and running. So all we have to do then is run a command that we can find on the open web UI page by scrolling down to the installation section. So here we've already installed Docker and now we simply want to copy this command over here.

Then we can open the command prompt or terminal and paste in the command that we just copied. Let's press enter. Now if I switch back to Docker desktop I can see that open web UI was added and it's currently running on this port.

And if I open this I will be presented with the sign in screen. Let's click on sign up then enter your name. I'll also enter my email and a password.

Let's create this account and now we can actually start chatting to our models. We can select from all the available models by clicking on this drop down and here we can see JMR2, LAMR3 as well as our custom models. After selecting the model we can actually start chatting to it like hello and this seems to be working.

We can also chat to documents by uploading files. So I can click on more upload files and I'll simply upload this tesla financial statement. Then I can ask questions like what was the revenue for 2023 and the model will use RAG to answer questions from this document.

If you would like to see how I used Olama in building an advanced AI application in combination with Flow-wise then check out this video over here.