Introducing Claude 3.7 & Claude Code
Anthropic has unveiled Claude 3.7, their most advanced hybrid ...
Video Transcript:
Anthropic has just released Claude 3.7 Sonnet, its most intelligent model to date. This is what they're calling a hybrid reasoning model. What's unique about this model is that it can either produce near-instant responses or, if you enable it, do extended step-by-step thinking. As they describe it, it is one model but effectively two different ways to think. In addition to this language model, they're also releasing a tool called Claude Code, which is now in research preview.

In terms of the benchmarks, 3.7 Sonnet is a significant improvement over its predecessor. If we just look at the chart here, basically across the board we can see that Claude 3.7 Sonnet, even without any extended thinking, ranks right up with the performance of the reasoning models. Where this model really shines is in its extended thinking capabilities, in areas like math, physics, instruction following, coding, and other tasks. Even though this model just came out today, I'm going to go out on a limb and say that this is probably going to be the best coding model and the preferred model for developers over the coming weeks.

Claude 3.5 originally came on the scene in, I believe, June of last year, and ever since it came out, that model has enabled a ton of different companies to reach product-market fit. We saw Cursor really take off, we saw tools like Bolt.new really take off, and a large part of those tools' success was, at its core, Sonnet 3.5. This isn't to discount what those teams have done, but having a really strong foundation model with those coding capabilities allows you to build out applications that you weren't previously able to. With that in mind, we can still see what a significant leap 3.5 to 3.7 is.

The other thing that's interesting here is that Anthropic says that in developing it, they optimized somewhat less for math and computer science competition problems and instead shifted focus toward real-world tasks that better reflect the needs of their users, rather than code competition tasks that might not be as applicable to a software engineer's day-to-day work.

Next, in addition to the model, they also announced Claude Code. Claude Code is an agentic tool that you can run within your terminal, and here's just a quick demonstration of it. Once you have it installed, you'll be able to run claude. This is going to work through the Anthropic API. Once you're within the root of your project, it will not only be able to answer questions about your repository but also implement changes. If you have a change that involves multiple files, it will search for those files, read through them, and then ultimately update them. The tool can also run terminal commands: if you're trying to compile, push to GitHub, or run a series of tests, you'll be able to do all of that within the terminal app. I also plan to make another video on this specifically in the coming days.

It is currently in a limited preview, and access is first come, first served at the time of recording. I will put that in the description of the video. I was able to get access just hours after trying, but with that being said, I'm not sure exactly how many seats there are for this tool. In terms of system requirements, they are pretty modest. To set it up, you run through the installation steps, go to the root of the project that you want to run the tool on, launch Claude Code, and set up authentication through the Anthropic API. Then you'll be off to the races.

Another thing to note: within Claude.ai, you will be able to go over to GitHub and directly add the context of your application with the input here, which is a really novel use case. Even if your repositories are private, you'll be able to access and select all of the context that you want within the repository. You are also going to be able to access this model from the artifacts pane.

The other great thing with the model is that, from the API, you will be able to choose when to turn extended thinking mode on or off. As a developer, you will be able to set the quote-unquote "thinking budget" to control precisely how long Claude spends on a particular problem. They mention that this extended mode is not an option that switches to a different model with a separate strategy. Instead, it's the very same model; it just gives itself more time to think and expends more effort in coming to an answer.

Another thing that a lot of people will appreciate is that they are going to give a visible thought process. They're not abstracting the thought process away; they have decided to make it visible in raw form. They mention that this has several benefits. First, trust: being able to observe the way that Claude thinks makes it easier to understand and check its answers, and it might help users get better outputs. Second, alignment: in some of their previous alignment science research, they used contradictions between what the model inwardly thinks and what it outwardly says to identify when it might be engaging in concerning behaviors like deception. Third, interest: it's often fascinating to watch Claude think. This is something I have to agree with. When I originally saw the DeepSeek R1 model, probably one of the most interesting parts of using it was actually being able to read through the thought process and understand why it was making the particular decisions that it made.

Another interesting piece of the news release here is that 3.7 benefits from what they call action scaling, an improved capability that allows it to iteratively call functions, respond to environmental changes, and continue until its open-ended task is complete. One example of such a task is using a computer: Claude can issue virtual mouse clicks and virtual keyboard presses to solve tasks on a user's behalf. In terms of the OSWorld benchmark, which measures its ability to actually use the computer (that's the computer use agent that they released in the fall), we can see that this is a significant improvement over the previous model. It is still somewhere between about 25 and 30%, so there is still a ways to go, but I'd imagine that over the coming months and years we'll increasingly see this benchmark become saturated as well.

Another interesting and really funny example that they had is Claude's ability to play the game Pokemon. We can see some of the previous models, how they ranked on this particular task, and how far each could get within the game. Claude 3.