Hi, I'm Matt, and I want to help you get up to speed on what Ollama is all about. In this free course, you're going to learn all the different aspects of Ollama and what you can do with it. This first video will just get you started in the most basic way: we'll install Ollama, verify it's running, download a model, try out a prompt, and find and download another model. It's not going to be everything, but that'll come as the rest of the course is released. So let's get started. The first thing we want to do
is visit the ollama.com web page. You can also get to this by going to ollama.ai, because all the cool kids have AI URLs. Okay, let's take a quick look at what's on this page. At the top, we have a link to the Discord. You can join and ask any questions you have, and you'll probably get a decent answer. GitHub has the source code and documentation for Ollama, which you can review, but if you have an issue with Ollama, it's best to keep the questions on the Discord and not the GitHub issues. GitHub issues
are for actual problems in the project and not really support issues. Start out with the question in the Discord, and if you need to escalate, GitHub is a great place to go. The search box will let you search for both official models and user-contributed models; we'll talk more about models soon. Next to that is the link to the community models. Since I'm logged in, you can see my username on Ollama. This is all model related, and we'll come back to that. Down at the bottom, we can see a link to the docs, which is
just a folder in the GitHub repo. One other interesting link is to meetups, and these are events held around the world with the Ollama team. Keep an eye out; there may be one close to you at some point in the future.

Right in the middle is a link to download Ollama. Click that to get three choices: Mac, Linux, and Windows. I'll go into more detail on this in another video, but just choose your platform and follow the instructions. I'm on a Mac right now, so I'll click the download button and then run the installer. So
once it's installed, there are a few different things you can do to ensure that Ollama is running. The easiest is just to run ollama run phi3. That's phi, as in p-h-i, and the number three. The reason I chose that model is that it's short and easy to spell, and small, so we can be up and running quickly. You probably don't have the model, so you'll see it download the various layers of the model. You'll learn more about layers later in this course. If you're on a Mac or Windows and the Ollama service wasn't running, just
running ollama run will start up that service. If, however, you're on Linux and the ollama run command fails, you may not have the service running. You can refer to this page to get a little bit more information about how to get it started. It's always best to let the service run that piece rather than running it locally in a command prompt that you start.

At this point, you may still have to wait a little bit longer for that model to download, so let's talk for a moment about what a model is. A model is
made up of a number of pieces, the biggest of which is the weights file. This is a collection of nodes, and they have connections between them called weights and biases. Those weights and biases combined are referred to as parameters. A node is often a concept, maybe a word or a phrase, and when the model is trained, the parameters connect each of these different concepts together by different amounts. Sometimes they get a little closer and other times they get a little further away as the model is trained more and more. Two nodes won't just
have one weight between them; they might have many combinations of weights, depending on the context of what the node does. Although it feels like magic, this is how much of the world's knowledge can be stuffed into a relatively tiny little file. How big that file is depends on how the parameters are represented. When the file is originally developed, it's probably going to use 16- or 32-bit floating point numbers. These can be incredibly big and precise, but if we group those numbers into smaller sets, we can abstract them down to much smaller numbers while retaining an
incredible amount of precision. The most common size is four bits, and that's what's referred to as 4-bit quantization. There'll be a more advanced video in this course that goes into a lot more detail about quantization in the future. When each parameter is represented by a 32-bit number, Llama 3 8B, or 8 billion parameters, will take roughly 32 GB of VRAM to run. There are eight bits in a byte, so that's four bytes per parameter, and 8 billion times 4 adds up to roughly 32 GB. There's some extra overhead as well, but that's the simple way of calculating it. If
we quantize to 4 bits per parameter, that gets close to 4 to 5 GB of VRAM required, which is a whole lot more accessible. There are a few other components to the model, and we'll cover those later in this course.

After all that, your model should be downloaded and Ollama will have dropped you into the REPL. REPL is a coding concept and means read-eval-print loop. This is a place where you can enter some code interactively and it'll be processed right away. In the Ollama REPL, we can enter a question and get
it answered immediately. So try asking a question: why is the sky blue? Within a few seconds, the model will spit out, or generate, an answer. The answer is streamed out token by token. A token is a word or common part of a word, and there are a number of factors that go into how long that generation will take. You can continue the conversation and the model will remember much of what was said, limited by the size of the context window that the model supports. Often this context size is 2048 tokens by default in Ollama models,
but that's easily modifiable. If your conversation goes longer than 2048 tokens, the model will start to forget the earlier parts of the conversation, and if you restart the CLI or REPL, that entire history will be wiped. Often users will work with Ollama through a third-party UI. Open WebUI is a common one, as is Msty, and so many others. One thing some of the UIs offer is better ways of leveraging memory so you can continue those conversations for longer. We'll see that in future topics. Now, back at the command line, type /bye to exit out
of the REPL.

Let's go to the ollama.com website and click on Models. Right now the list of models is sorted by Featured. Try sorting by Newest. One of the more recent models at the time of this recording is InternLM, which attempts to be better at math and math reasoning. That's not actually saying that much, because models tend to be terrible at these things and aren't the best tool to use. Thankfully, it's also good at all the usual things models do. So click on the link for InternLM. We have a few bits of info
on this page. First, there's a short description of the model. We see how popular the model is, as well as how recently it was updated. Then there's a dropdown with different variants of the model. It defaults to the most common one, which will be a 4-bit quantized model. To the right is the command to run to get this model. Below that is the hash of the model and the overall size. Below that, we see the various layers of the model. There's that layer term again, and there will be more on that later in
this course. In the dropdown with the different tags, or variants, find the one that is 7b-chat-v2.5-q2_K. Copy the command to run this model and paste it into the terminal. If you're still running phi3, then type /bye to exit, and then run that command. You'll see it download the model, which is a bit larger than the last one. When it's done, try asking: what is a black hole? Soon after, you will get an answer describing a black hole in a way that's a little different from phi3's style.
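To put the different quantization levels in perspective, the rough memory arithmetic (number of parameters times bytes per parameter) can be sketched in Python. This is only a back-of-the-envelope estimate: the function name weights_size_gb is made up for illustration, 1 GB is assumed to be a billion bytes, and real quantized files carry extra overhead, so actual downloads will be somewhat larger than these figures.

```python
def weights_size_gb(num_params: float, bits_per_param: int) -> float:
    """Rough size of the weights alone, in GB (assuming 1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8  # there are eight bits in a byte
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # Illustrative parameter counts: 8 billion and 7 billion.
    for label, params in [("8B model", 8e9), ("7B model", 7e9)]:
        for bits in (32, 16, 4, 2):
            size = weights_size_gb(params, bits)
            print(f"{label} at {bits}-bit: about {size:.2f} GB before overhead")
```

At 32 bits, the 8-billion-parameter case comes out to 32 GB, and at 4 bits it comes out to 4 GB before overhead, which lines up with the 4 to 5 GB range mentioned for a 4-bit model.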
But what's most incredible about this is that this one has been quantized from the original 32-bit floating point numbers to a two-bit quantization. You will usually see much better answers from the 4-bit model, but it's pure magic that this even works at all. While we're still in the REPL, type /? and you'll get a list of all the commands you can run. Then try typing /? shortcuts. This shows us different keyboard shortcuts you can use in the REPL, though I still prefer exiting with /bye. So exit the REPL however you prefer.

Now type
ollama ls to see a list of your two models. ollama ps will show us which models, if any, are currently loaded. Models stay in memory for 5 minutes by default, and several can be loaded at once, depending on your hardware. We'll look at concurrency in more detail in a future video. If you want to remove one of the models, you can use ollama rm and the model name.

There is so much more you can do with Ollama, but this video is already long enough. Watch out for the next video in this course, coming in the
next few days. If you have any specific questions about what's covered in this course, join us on a brand new Discord that you can find at this URL. Thanks so much for watching. Goodbye.
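As a recap of how a limited context window makes a model forget the start of a long conversation, here is a minimal Python sketch. It is not how Ollama implements this internally: trim_to_context and count_tokens are hypothetical helpers, and counting whitespace-separated words is a crude stand-in for a real tokenizer. The 2048-token budget is an assumed default that can be changed.

```python
def count_tokens(text: str) -> int:
    """Hypothetical token counter: treats each whitespace-separated word as one token."""
    return len(text.split())


def trim_to_context(messages: list[str], max_tokens: int = 2048) -> list[str]:
    """Keep only the most recent messages whose combined size fits the budget."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # this message and everything older is "forgotten"
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = ["why is the sky blue", "tell me more", "what about sunsets then"]
    # With a tiny budget of 7 tokens, only the two newest messages fit.
    print(trim_to_context(history, max_tokens=7))
```

The design choice here is to drop whole messages from the oldest end, which mirrors the idea that the earliest parts of a conversation are the first to fall out of the window.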