I love small and awesome models

Matt Williams
As one of the original Ollama team members, I'm excited to dive into the latest update and share my ...
Video Transcript:
Llama 3.2 just got released a couple of days ago, and folks are pretty excited about it. A few YouTubers were sponsored by Meta for this release. Notably, I wasn't, and looking at some of the videos that were sponsored, well, it seems like they made a few odd choices. But anyway, Meta, next time, I'm available. [Music]

My name is Matt Williams, and I was a founding member of the Ollama team. I left in January, and now I'm focused on building up this YouTube channel, where I tend to look at all things local AI, and Llama 3.2 definitely fits into that category. So I thought I'd take a look at the model from a few different perspectives. First, I'll just start asking some questions. Then we'll summarize some content, and then we'll look at the tools capability. Finally, I'll take a look at one of the ways I actually use models. I think it should be kind of fun.

So let's first take a look at the announcement blog post from Meta: "Llama 3.2: Revolutionizing edge AI and vision with open, customizable models." Right up at the top, it talks about the 11B and 90B vision models as well as the 1B and 3B text-only models. The vision models are also really good at text. The next bullet point points out a context length of 128K tokens for the smaller models but doesn't mention a context size for the larger models. We see that the vision models do the exact same thing as the 3.1 models, in addition to supporting vision. Scroll down and we see a bit more about the image reasoning use cases supported by the vision models, such as document-level understanding, including charts and graphs; captioning of images; and visual grounding tasks, such as directionally pinpointing objects in images based on natural language descriptions. This sounds pretty cool, because this has always been a problem when adding docs with graphs to a RAG system. Of course, they include some benchmarks, but we all know my opinion on those, so we'll just get right past that. There are some cool demo GIFs in the article further down, so it's definitely worth taking a look at.

But let's move on to actually using it. Of course, I'm going to be using Ollama, which is the best way to run models locally. You can find out more about Ollama at ollama.com. Click the button in the middle of the page to download and install it. It's super easy, and I've done tons of tutorials on how to get started, so you should definitely check that out. Then run ollama pull llama3.2:1b to grab the 1 billion parameter model.

Okay, so what questions do we want to ask? I can't look at any model without asking my favorite: why is the sky blue? And that is blazingly fast. Turn on verbose and try that again. 127 tokens per second is kind of amazing for an answer that is that good.

Now, a lot of folks like to go straight to the riddles and logic puzzles to test a model. In fact, they will grade a model purely on its ability to count the number of Rs in a word, something that models were not traditionally designed to be able to answer. It's like me grading your worthiness based on whether you can hold your breath for more than 2 minutes. That's a skill that isn't even relevant for most of us, though I used to work with a guy whose dad held a world record in free diving, which required that skill. They even made a movie about him.

Oh, while I'm mid-tangent, let me remind you to like and subscribe. I like to say I'm working on my first million subscribers and only need about 967,000 more to get there. Your help to achieve that would be greatly appreciated. Let's stop the tangent and go back to the model.

Now, a much more relevant question is something you might actually ask: I have a new coffee shop on Bainbridge Island. Come up with five great photo ideas that I can take for an advertisement that would showcase the shop. And it comes up with some great answers. How about generating a few tweets to help get folks to the shop? Again, awesome answers. But some folks need a riddle question, so let's try that classic, three murderers in a room, and guess what, it gets it wrong. It's a stupid question, and I'm not all that bothered by it.

Let's go back to a question that is actually useful. I have a 5-year-old daughter who loves asking questions that I learned the answer to 40 years ago but have since forgotten, and so getting a model to answer this is fantastic, though they all do pretty good at it. Explain photosynthesis. The fact that I get an answer in less than 3
seconds is the magic here.

Let's go to another favorite question I see a lot. But you know, most times I see folks ask the question and they get what they think is the wrong answer, but they don't seem to realize the problem is mostly how they ask the question. Often when we ask a model to do something, we make assumptions that the model knows what we're thinking. Asking a model to take two numbers and decide which is bigger is actually a bit ambiguous, because depending on the type of number, either one could be bigger. If it's a version number, then 8.21 is the right answer, but if it's a floating-point number, then 8.8 is the right answer. So I like to be a little bit more specific in my questions, and it gets it right. If you ask the right question, most models tend to get it right. Not always, but they tend to get it right.

Let's try another: the sum of two numbers is 10 and the product is 25. What is the difference between the two numbers? Explain each step in your solution. And we see it does a pretty great job at this. Finally, let's ask the Rs in "strawberry" question, even though it doesn't really say much about the abilities of the model. It gets it wrong in a way that I haven't really seen before. It's so wrong, but who cares.

I'll exit out of here and try running the 3 billion parameter model. We won't do all the questions, but a few should be fun. Let's start with the R question, and it's wrong in the usual way. The difference between the two numbers is nice, and it gets it perfectly right. The three killers question is interesting, getting the right answer but with a terrible process to get there.

So let's move on to a different use case. One of the things that blog post talks about was summarization, so let's try that out. Now, the 3 billion and 1 billion models support a context size of 128K, but Ollama configures models to be 2K by default. So let's create a new model that uses a larger context size. I'll set it to 16K, save the Modelfile, and then run the ollama create command. Now I can run
that new model. Summarize this video script down to a single paragraph, and then I'll add the script for my recent video on environment variables. And the result is pretty great.

Another thing it's apparently really good at is tool use. Now, this specifically refers to the newer, less reliable approach versus the tool use approach added about a year ago. I look forward to the newer approach being as good as that older one. But anyway, we can go to the blog post on tool use on the page, and it points to some sample code for Python and JavaScript. So I'll grab the JavaScript code and paste it into my VS Code editor.

Now, one of the interesting things about using large language models is that the answers can be different each time you ask. So just because you get an answer that's right or wrong one time doesn't mean you'll always get a similar answer. So I'll update this code to repeat the call 10 times. What this example does is make a call to the model and provide info about a tool that returns flight information. Now, the actual function is using mock data, so it doesn't actually query any real site, but it has the same ultimate results. We have a question hardcoded of "What is the flight time from New York to LA?" The model should tell the app to call the function with NYC and LAX as parameters, get back a JSON blob with a flight time of 5 hours and 30 minutes, and then turn that into some good text.

Let's try it first with Llama 3.2 3 billion parameters. In 10 tries, it gets it right every single time. Often, even with larger models, this hasn't been the case, so that's pretty impressive. Let's try it with the 1 billion parameter model, and in 10 tries it got it wrong every single time. That said, I tried this when I first started playing with it and it got it wrong 20% of the time, so hey. But even larger models get it wrong a lot. Let's try it with Mistral Small, and that's a 22 billion parameter model. Look at the results: it failed four out of 10 times. I'll try with one other model, Firefunction v2. In other tests with this model I would get good answers 80% of the time, but now it seems to be getting it right 100% of the time.

Okay, so now I want to change gears a bit and take a look at one of my actual use cases for models: to help create new content. I write my scripts in Obsidian, and one of the plugins I use is called Companion. It offers the ability to complete the text I'm actually writing. I've modified it a bit to use a different prompt, but you can see I'm using the 3 billion parameter model with the larger context. Now I'll paste the first few paragraphs of the script for the video on environment variables, and every time I pause, it continues writing for me. When I see some text I like, I can just press Tab to accept each word, and then I can continue. It's not always perfect, but it works well often enough that I find it really helpful.

So that's a quick intro to the new Llama 3.2 models. I was hoping that the vision models would be available by now, but they aren't. The Ollama team is working hard on making them compatible, and it looks like they're getting really, really close. So hopefully by the time you see this video, it's out and I have to do an update. What do I think of these small Llama 3.
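
Going back to the number comparison question from earlier in the video: the point was that "which is bigger, 8.21 or 8.8?" has two defensible answers depending on what kind of number you mean. Here is a minimal Python sketch of that ambiguity. It is not from the video, and the function names bigger_as_float and bigger_as_version are hypothetical, chosen only for illustration.

```python
def bigger_as_float(a: str, b: str) -> str:
    """Treat both strings as decimal numbers and return the larger one."""
    return a if float(a) > float(b) else b


def bigger_as_version(a: str, b: str) -> str:
    """Treat both strings as dotted version numbers and compare part by part."""
    def parts(s: str) -> tuple[int, ...]:
        # "8.21" becomes (8, 21); tuples compare element by element.
        return tuple(int(p) for p in s.split("."))

    return a if parts(a) > parts(b) else b


print(bigger_as_float("8.21", "8.8"))    # 8.8
print(bigger_as_version("8.21", "8.8"))  # 8.21
```

The same two inputs give opposite answers, which is why being specific in the prompt about the type of number matters.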