[Music]

Elon Musk is once again making headlines with his latest venture, xAI, which has recently introduced Grok-2, a new AI language model that's been getting a lot of attention, and there's good reason for that. Beyond its technical capabilities, Grok-2 stands out as one of the few AI models that operates with very little censorship. The kinds of images people have been generating with it are proof of just how unrestricted this model is. Before we dive into the technical details, let's take a moment to look at some of these controversial examples.

All right, so, launched just under two years after the company was founded, Grok-2 is grabbing attention not only because of Musk's involvement but also because it's performing really well in an already crowded and competitive field. It's been tested against some of the top AI models out there, including OpenAI's GPT-4, Google's Gemini, and Anthropic's Claude, and here's the thing: it's not just keeping pace with these models. In some key areas, it's actually outperforming them.

One way to measure how these models stack up is by looking at their Elo scores. Originally created for ranking chess players, the Elo system has been adapted for comparing AI models too. Grok-2 has been doing really well on the LMSYS leaderboard, which is a popular platform for these kinds of comparisons. It's currently outperforming GPT-4 in several important benchmarks, including GPQA, which tests graduate-level science knowledge, and MATH, which involves solving pretty tough math problems. For example, on the GPQA benchmark, Grok-2 scored 56.0%. To put that in perspective, GPT-4 Turbo scored 48.0% and Claude 3.5 Sonnet scored 59.6%. Now, these might seem like small differences, but in the world of AI, even a few percentage points can make a big difference in terms of understanding and problem-solving abilities. Grok-2 also did well on the MMLU benchmark, which stands for Massive Multitask Language Understanding, scoring 87.5%. That's just ahead of GPT-4 Turbo's 86.5% and Gemini Pro's 85.9%.

In practical terms, Grok-2 is designed to be easy to use, flexible, and capable of handling some pretty complex tasks. It's not just about generating text. It can also handle real-time information pulled straight from X, the social media platform that used to be known as Twitter. This makes Grok-2 particularly powerful for applications where having up-to-the-minute information is crucial, or where you're dealing with fast-changing real-world situations.

Along with Grok-2, xAI also rolled out Grok-2 mini. This is a smaller version of the main model, designed to work faster while still delivering accurate results. It's not just a stripped-down version. It's optimized for situations where speed is key, making it perfect for scenarios where quick responses are more important than having every last detail. Even though it's smaller, Grok-2 mini still holds its own in the benchmarks. Take the MATH benchmark, for instance: Grok-2 mini scored 73.0%. That's better than some of the other top models out there, like Claude 3.5 Sonnet, which scored 71.1%. This shows that even the light version of Grok-2 can outperform much of the competition in tough areas like math and science.

Benchmarks are really important in the AI world because they give us a clear idea of how one model compares to another. Grok-2 has been put through its paces with a series of tough tests, and the results are pretty impressive. On the HumanEval benchmark, which tests the model's ability to generate correct Python code, Grok-2 achieved a pass@1 score of 88.4%. That's slightly lower than GPT-4 Turbo's score of 90.2%, but it's still ahead of Claude 3 Opus, which scored 84.9%. This puts Grok-2 among the top performers in coding tasks, showing that it's not just about generating text or solving math problems. It's also about handling practical, real-world coding challenges.

Grok-2 also shines in visual tasks. On the MathVista benchmark, which tests the model's ability to solve math problems using visual reasoning, Grok-2 scored 69.0%. That's well above GPT-4 Turbo's 58.1% and even ahead of Claude 3.5 Sonnet, which scored 67.7%. In terms of document-based question answering (DocVQA), Grok-2 scored 93.6%, which is just shy of the top score of 95.2% achieved by Claude 3.