New Llama 3.3 Shocks the AI World - Crushes GPT-4 and Costs Almost Nothing

AI Revolution
Meta's Llama 3.3 is a groundbreaking AI model with just 70 billion parameters, delivering near top-tier performance.
Video Transcript:
So Meta just unleashed Llama 3.3, and get this: it uses only about a sixth of the parameters of their previous 405-billion-parameter colossus, just 70 billion, but it still hits almost the same performance. With that kind of efficiency, we're talking lower costs, smaller GPU demands, and the power to turbocharge everything from everyday AI tools to immersive VR worlds. So let's talk about it.

First off, Llama 3.3 is Meta's new multilingual large language model, and the headline here is that it has 70 billion parameters. That might sound huge, and it is, but what's really wild is that it's performing close to Meta's previous giant, Llama 3.
1 model, which had a staggering 405 billion parameters. In other words, Llama 3.3 gets you near top-tier performance while being way more efficient. It's like having a sports car that's nearly as fast as the ultra-expensive model but consumes way less fuel and space.

Mark Zuckerberg himself announced that Llama is now the most adopted AI model in the world; Meta claims it's had over 650 million downloads. That's massive uptake, especially considering how many developers are now building on these open-source AI models. Open source is key here: Meta wants to be the backbone for tons of AI projects. Although open source might sound like they're just giving it all away, remember that if everyone is building on Meta's foundation, Meta becomes sort of the underlying infrastructure, and that could mean a whole lot of market influence over time.

Now, alongside this AI push, they're also expanding in VR. They're working on making their VR tools the industry standard. Think about it: if they control the leading AI tools and the leading VR tools, they're kind of setting the stage for what the future of digital connectivity looks like, maybe even the metaverse they keep talking about. They're teaming up with third parties to do this, because the more people rely on Meta's tools, whether that's for AI or VR, the more Meta becomes embedded in the future of digital interaction.

On the practical side, Zuckerberg also mentioned plans for a new AI data center in Louisiana and a new undersea cabling project to support all this. Meta AI apparently has around 600 million monthly active users, but let's be real: Meta has over 3 billion users across its apps (Facebook, Instagram, Messenger, WhatsApp), and they've basically stuck their AI assistant right into all of them. They're also nudging people to try AI image generation features and other AI-powered experiences. Sure, that might drive up the usage stats, but how meaningful is that usage if people aren't exactly flocking to it voluntarily? This is a fair point: maybe AI
assistants inside social apps don't feel super natural. Yes, you can generate neat images or ask questions, but for many everyday users, what's the compelling reason? Still, it's early days, and maybe the real value is going to come down the road, especially as VR becomes more mainstream. All these puzzle pieces (AI, VR, wearables, the undersea cables) are building towards something bigger.

Okay, let's move into more of the technical weeds. Llama 3.3 is open-source and multilingual, supporting English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. It's been trained on a whopping 15 trillion tokens, compared to Llama 2's 2-trillion-token training set. That's a huge jump in training data, leading to better performance on reasoning tasks, coding benchmarks, STEM problem solving, and even trivia. Another neat trick is that this new model supports super long context windows: 128,000 tokens. That's about the length of a decent-sized book, meaning the model can handle really long documents and keep track of what's going on over many pages. Meta's also done a lot under the hood to make sure Llama 3.
3 can run efficiently. For one, the new model uses something called grouped-query attention (GQA), which helps with memory efficiency and speeds things up during inference, that is, the process of generating answers or predictions when you're actually using the model rather than training it. GQA improves scalability and makes the model cheaper to run. According to Meta, Llama 3.3 can be super cost-effective for developers, generating text at as little as 1 cent per million tokens, way cheaper than the likes of GPT-4 or Claude 3.5 in many scenarios. There are some incredible hardware savings too: the older 405B-parameter model required a massive amount of GPU memory, up to nearly 2 terabytes for inference. In contrast, the 70B Llama 3.
3 might only need as little as tens of gigabytes. We're talking about potential GPU load reductions that could save developers hundreds of thousands of dollars in upfront GPU costs, plus huge ongoing power savings. Imagine trimming down from something like $600,000 in GPU costs to something far more manageable because of the reduced memory footprint.

For developers and researchers, Llama 3.3 also comes under a specific community license agreement. It's mostly free and open source, but there's a catch: if your organization has over 700 million monthly active users, you need a commercial license directly from Meta. Also, if you use it, you must credit Meta with something like "Built with Llama" and follow their acceptable use policy. This policy aims to prevent harmful content generation, cyber attacks, or any activity that breaks the law. So it's open, but with certain guardrails.

Meta is not just throwing this model out there without any safety measures; they've really leaned into safety and trust. The model uses things like supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to ensure it's aligned with helpfulness and safety standards. They've built a bunch of safeguards, like Llama Guard 3 and Prompt Guard, to keep the model from doing harmful things or spitting out unsafe content. They've also done extensive red teaming, where security experts try to trick the model into doing bad stuff to find weaknesses and fix them. They've looked at risks from child safety issues to cyber-attack enablement and tried to mitigate them.

They've even factored in environmental considerations. Training Llama 3.3 took a lot of GPU hours, about 39.
3 million GPU hours on H100 80 GB hardware. This generated around 11,390 tons of CO2-equivalent emissions, but Meta claims they used renewable energy to achieve net-zero emissions for the training phase. They've also publicly shared how much energy it took, so others can understand the environmental cost of these massive models.

Performance-wise, Llama 3.3 kicks butt on a range of benchmarks, often beating similarly sized models. On MMLU, which tests knowledge across various subjects, it hits around 86% accuracy. On math benchmarks like MATH, it scores around 77%. For coding tasks such as HumanEval, it gets a pass@1 of 88.4%, which is pretty darn good. It even handles multilingual reasoning tasks well, scoring about 91.
1% on MGSM. While it might not beat the biggest models on every single test, like GPT-4 on coding tasks, it gets impressively close.

In terms of ecosystem, you can grab Llama 3.3 from Meta's site, Hugging Face, GitHub, or other platforms. Developers can integrate it with various tools like LangChain or Weights & Biases, and it's cloud-friendly on AWS, GCP, or Azure. You can also fine-tune it for your own applications with Meta's torchtune library. It's flexible enough for a wide range of scenarios, everything from natural language understanding to coding assistance to maybe even VR experiences down the road. Safety remains a big talking point: Meta wants developers to take responsibility for how they use the model, for example, if you integrate Llama 3.
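To make the grouped-query attention point from the transcript concrete, here's a back-of-the-envelope sketch of why GQA shrinks the KV cache at inference time. The layer and head counts are the commonly reported Llama 3 70B configuration (80 layers, 64 query heads, 8 key/value heads, head dimension 128); treat them as illustrative assumptions rather than official figures from the video.

```python
# Back-of-the-envelope KV-cache size: multi-head attention (MHA) vs
# grouped-query attention (GQA), using Llama-3-70B-like numbers.

def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """GiB held by the K and V caches for one sequence at 16-bit precision."""
    elems = 2 * n_layers * n_kv_heads * head_dim * context_len  # 2 = K and V
    return elems * bytes_per_elem / 2**30

LAYERS, Q_HEADS, KV_HEADS, HEAD_DIM = 80, 64, 8, 128
CONTEXT = 128_000  # Llama 3.3's advertised context window

mha = kv_cache_gib(LAYERS, Q_HEADS, HEAD_DIM, CONTEXT)   # all 64 heads cached
gqa = kv_cache_gib(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)  # only 8 shared KV heads

print(f"MHA cache: {mha:.1f} GiB, GQA cache: {gqa:.1f} GiB "
      f"({mha / gqa:.0f}x smaller)")
# → MHA cache: 312.5 GiB, GQA cache: 39.1 GiB (8x smaller)
```

With 8 key/value heads shared across 64 query heads, the cache shrinks 8x, which is a big part of why long 128k-token contexts become affordable to serve.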
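The hardware-savings claims above are easy to sanity-check with simple arithmetic, since model weights dominate inference memory: roughly parameter count times bytes per parameter. The precision choices below (fp16 and 4-bit quantization) are common deployment options I'm assuming for illustration, not figures Meta quotes.

```python
def weight_gb(params_billion, bytes_per_param):
    """Approximate GB of memory needed just to hold the model weights."""
    return params_billion * 1e9 * bytes_per_param / 1e9

for params in (405, 70):
    fp16 = weight_gb(params, 2)     # 16-bit weights
    int4 = weight_gb(params, 0.5)   # 4-bit quantized weights
    print(f"{params}B model: ~{fp16:.1f} GB at fp16, ~{int4:.1f} GB at 4-bit")
# 405B: ~810.0 GB fp16 / ~202.5 GB 4-bit; 70B: ~140.0 GB fp16 / ~35.0 GB 4-bit
```

That ~35 GB figure for a 4-bit 70B model is consistent with the "tens of gigabytes" claim, while 405B at fp16 lands near the multi-hundred-gigabyte range the transcript describes. Real deployments need extra headroom on top of this for the KV cache and activations.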
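The two license conditions mentioned above, the 700-million-MAU commercial-license threshold and the "Built with Llama" attribution, can be sketched as a small compliance helper. The function names and structure here are invented for illustration; the actual Llama community license text is the only authoritative source.

```python
# Hypothetical helper mirroring two conditions of the Llama community
# license as described above. Illustrative only, not legal guidance.

MAU_THRESHOLD = 700_000_000  # above this, a commercial license from Meta is required

def needs_commercial_license(monthly_active_users: int) -> bool:
    """Orgs above the MAU threshold must license directly from Meta."""
    return monthly_active_users > MAU_THRESHOLD

def attribution_ok(product_notice: str) -> bool:
    """Products are asked to credit Meta, e.g. with 'Built with Llama'."""
    return "built with llama" in product_notice.lower()

print(needs_commercial_license(600_000_000))       # → False
print(attribution_ok("MyApp (Built with Llama)"))  # → True
```

Notably, even Meta AI's reported ~600 million monthly active users would sit under the threshold, which shows how few organizations the commercial-license clause actually targets.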