So AI21 Labs, the brains behind the Jurassic language models, has just dropped two brand-new open-source LLMs called Jamba 1.5 Mini and Jamba 1.5 Large. These models are designed with a unique hybrid architecture that incorporates cutting-edge techniques to enhance AI performance, and since they're open source, you can try them out yourself on platforms like Hugging Face or run them on cloud services like Google Cloud Vertex AI, Microsoft Azure, and NVIDIA NIM. Definitely worth checking out.

All right, so what's this hybrid architecture all about? Let's break it down in simple terms. Most of the language models you know, like the ones behind ChatGPT, are based on the Transformer architecture. These models are awesome for a lot of tasks, but they've got one big limitation: they struggle when it comes to handling really large context windows. Think about trying to process a super long document or the full transcript of a long meeting. Regular Transformers get bogged down because they have to deal with all of that data at once. That's where these new Jamba models from AI21 Labs come in, with a totally new, game-changing approach.

AI21 has cooked up a hybrid architecture they're calling the SSM-Transformer. What's cool about it is that it combines the classic Transformer model with something called a structured state space model, or SSM. The SSM builds on older, more efficient techniques like recurrent neural networks and convolutional neural networks, which handle computation more cheaply. By using this mix, the Jamba models can handle much longer sequences of data without slowing down. That's a massive win for tasks that need a lot of context, like complex generative AI reasoning or summarizing a super long document.

Now, why is handling a long context window such a big deal? Well, think about it: when you're using AI for real-world applications, especially in business, you're often dealing with complex tasks. Maybe you're analyzing long meeting transcripts, summarizing a giant policy document, or running a chatbot that needs to remember a lot of past conversation. The ability to process large amounts of context efficiently means these models can give you more accurate and meaningful responses. Or Dagan, the VP of Product at AI21 Labs, nailed it when he said that an AI model that can effectively handle long context is crucial for many enterprise generative AI applications. And he's right: without that ability, AI models often hallucinate, or just make stuff up, because they're missing important information. The Jamba models, with their unique architecture, can keep more relevant info in memory, leading to way better outputs and less repetitive data processing. And you know what that means: better quality at lower cost.

All right, let's get into the nuts and bolts of what makes this hybrid architecture so efficient. One key part of the model is called Mamba. It was developed with insights from researchers at Carnegie Mellon and Princeton, and it has a much lower memory footprint and a more efficient way of processing long sequences than a standard Transformer's attention. Unlike Transformers, which have to look back at the entire context at every single step, which slows things down, Mamba keeps a small state that gets updated as it processes the data. This makes it way faster and far less resource-intensive.
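To make that concrete, here's a toy sketch of the core idea behind an SSM-style layer. It's nowhere near a real Mamba implementation (which uses learned, input-dependent parameters and fast parallel scans), but it shows why the memory cost stays flat: instead of attending over every past token, the layer folds each token into a fixed-size state.

```python
import numpy as np

# Toy state space model recurrence (illustration only, not Mamba itself).
# The key property: the hidden state has a fixed size, so memory per step
# stays constant no matter how long the input sequence gets.

d_state, d_model = 16, 8                     # small, fixed-size state
A = np.eye(d_state) * 0.95                   # state transition (slow decay)
B = np.random.randn(d_state, d_model) * 0.1  # how an input token writes to the state
C = np.random.randn(d_model, d_state) * 0.1  # how the state is read back out

def ssm_forward(tokens):
    """One cheap update per token: O(1) memory per step.

    Contrast with Transformer attention, which revisits every previous
    token at each step (O(n) memory per step, O(n^2) total work).
    """
    state = np.zeros(d_state)
    outputs = []
    for x in tokens:
        state = A @ state + B @ x  # fold the new token into the running state
        outputs.append(C @ state)  # read a prediction out of the state
    return np.stack(outputs)

# Works the same whether the input is 1,000 tokens or 100,000 -- the state never grows.
sequence = np.random.randn(1000, d_model)
print(ssm_forward(sequence).shape)  # (1000, 8)
```

In the actual Jamba architecture, layers like this are interleaved with regular Transformer attention layers (plus mixture-of-experts layers), so you get the long-context efficiency of the SSM alongside the modeling power of attention.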
Now, you might be wondering: how do these models actually perform? Well, AI21 Labs didn't just hype them up; they put them to the test. They evaluated the models on the RULER benchmark, which covers tasks like multi-hop tracing, retrieval, aggregation, and question answering, and guess what? The Jamba models came out on top, consistently outperforming models like Llama 3.1 70B, Llama 3.1 405B, and Mistral Large 2.
On the Arena Hard benchmark, which is all about testing models on really tough tasks, Jamba 1.5 Mini and Large beat some of the biggest names in AI. Jamba 1.5 Mini scored an impressive 46.1, besting models like Mixtral 8x22B and Command R+,
while Jamba 1.5 Large scored a whopping 65.4, outshining even big guns like Llama 3.1 70B and 405B.

One of the standout features of these models is their speed. In enterprise applications, speed is everything: whether you're running a customer support chatbot or an AI-powered virtual assistant, the model needs to respond quickly and efficiently.
The Jamba 1.5 models are reportedly up to 2.5 times faster on long context than their competitors, so not only are they powerful, they're also super practical for high-scale operations. And it's not just about speed. The Mamba component lets these models operate with a lower memory footprint, meaning they're less demanding on hardware. For example, Jamba 1.5 Mini can handle context lengths of up to 140,000 tokens on a single GPU. That's huge for developers who want to deploy these models without massive infrastructure.

All right, here's where it gets even cooler. To make these massive models more efficient, AI21 Labs developed a new quantization technique called ExpertsInt8. That might sound a bit technical, but here's the gist: quantization reduces the precision of the numbers used in the model's computations, which saves memory and compute without really sacrificing quality. ExpertsInt8 is special because it specifically targets the weights in the mixture-of-experts (MoE) layers of the model, which in many cases account for about 85% of the model's weights. By quantizing those weights to an 8-bit format and then dequantizing them directly inside the GPU at runtime, AI21 Labs managed to cut down the model's size and speed up its processing.
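To give you a feel for the mechanics, here's a toy sketch of the 8-bit round trip on a single weight matrix. It's not AI21's actual ExpertsInt8 implementation (which performs the dequantization inside a fused GPU kernel at runtime), just the basic idea of trading precision for a 4x smaller footprint.

```python
import numpy as np

# Toy INT8 weight quantization (illustration only, not AI21's ExpertsInt8 kernel).

def quantize_int8(weights):
    """Map float32 weights to int8 values plus a per-row scale factor."""
    scale = np.abs(weights).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights when it's time to compute."""
    return q.astype(np.float32) * scale

# A hypothetical MoE "expert" weight matrix: stored at 1 byte per value
# instead of 4, then expanded back to floats on the fly at compute time.
w = np.random.randn(4096, 1024).astype(np.float32)
q, scale = quantize_int8(w)

print(f"storage: {w.nbytes / 1e6:.1f} MB -> {q.nbytes / 1e6:.1f} MB")
print(f"max reconstruction error: {np.abs(w - dequantize(q, scale)).max():.4f}")
```

Since the MoE expert weights make up the bulk of the parameter count, shrinking just those layers to a quarter of their size is what lets the full model squeeze onto far less hardware.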
The result? Jamba 1.5 Large fits on a single 8-GPU node while still using its full context length of 256K tokens, making Jamba one of the most resource-efficient models out there, especially if you're working with limited hardware.

Besides English, these models also support multiple languages, including Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew, which makes them super versatile for global applications. And here's the cherry on top: AI21 Labs made these models developer-friendly. Both Jamba 1.5 Mini and Large come with built-in support for structured JSON output, function calling, and even citation generation. That means you can build more sophisticated AI applications that call external tools, digest structured documents, and provide reliable references, all of which are super useful in enterprise settings.

One of the coolest things about Jamba 1.5 is AI21 Labs' commitment to keeping these models open. They're released under the Jamba Open Model License, which means developers, researchers, and businesses can experiment with them freely. And with availability across multiple platforms and cloud partners, like AI21 Studio, Google Cloud, Microsoft Azure, NVIDIA NIM, and soon Amazon Bedrock, Databricks Marketplace, and more, you've got tons of options for how you want to deploy and experiment with these models.

Looking ahead, it's pretty clear that AI models that can handle extensive context windows are going to be a big deal in the future of AI. As Or Dagan from AI21 Labs pointed out, these models are simply better suited for the complex, data-heavy tasks that are becoming more common in enterprise settings. They're efficient, fast, and versatile, making them a fantastic choice for developers and businesses looking to push the boundaries of AI. So if you haven't checked out Jamba 1.5 yet, now's a great time to give it a try.
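And if you want a quick taste of those developer features, here's a minimal sketch of requesting structured JSON output from Jamba 1.5 Mini through AI21's Python SDK. Fair warning: the model id, imports, and parameter names here are from memory of AI21's documentation, so treat them as assumptions and verify against the current SDK reference before relying on them.

```python
# pip install ai21  -- sketch only; the names below are assumed, check AI21's docs.
from ai21 import AI21Client
from ai21.models.chat import ChatMessage, ResponseFormat

client = AI21Client(api_key="YOUR_API_KEY")  # key from AI21 Studio

response = client.chat.completions.create(
    model="jamba-1.5-mini",  # assumed id; "jamba-1.5-large" for the bigger model
    messages=[
        ChatMessage(
            role="user",
            content=(
                "Summarize this meeting transcript as JSON with keys "
                "'decisions' and 'action_items': <transcript here>"
            ),
        )
    ],
    # Ask the model for well-formed JSON instead of free text.
    response_format=ResponseFormat(type="json_object"),
)

print(response.choices[0].message.content)  # should be a parseable JSON string
```

Function calling and citation generation are exposed through the same chat interface, which is what makes it straightforward to wire these models into tool-using enterprise applications.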