NEW Claude 3.7 Sonnet Is Simply The Best Coding AI (Claude Code Testing)


The AI Advantage

Making apps has never been easier. Join the community for personalized help with getting started: ht...

Video Transcript:

All right, ladies and gents, I want to show you something here. Claude 3.7 came out, it's probably the best LLM for code generation, and it came with this thing called Claude Code. This is some serious competition for a lot of the AI-powered IDEs out there. All of them charge subscription fees; this thing is available just through Claude. So rather than talk about it, let me just show you.

I'm genuinely excited about this thing, because I just built this in like 2 minutes. And I'm not saying it's revolutionary that I built this in 2 minutes, okay? This has been possible for months now, since 3.5 Sonnet; building things like this is not that hard with the appropriate applications. But look at that: expenses, income, reports, savings tips generated by AI. All of this was generated in the Claude interface from a one-shot prompt like this. Then I took all the code, dropped it into a folder, and rather than going through the, I don't know, 30-step process of actually setting up all the files and splitting it all up, I just used Claude Code. That's what it's called; it's a research preview that came with this release, and we'll talk about it in a second. I just navigated to the folder and told it, what did I tell it, you know, "set this all up for me," something like that. Then it gave me an error message when I ran it, I just gave it the error message, and the whole thing works.

Before this, I spent about an hour trying to make it work manually, and I was still not done. This is something that only apps like Cursor could have done up until now. Correct me if I'm wrong; I'm not a professional software engineer, but I feel like I have a pretty good grasp on all these apps that are coming out, and I'm a novice coder myself, I taught myself Python three or four years ago. But essentially it just ran the whole thing, and voilà, here we are: a working web app, with
authentication and everything else. Look, I can log out, I can register new accounts. Let's go through this step by step, no cuts, no edits. I want to show you how this works, and I want to use Claude Code to actually improve this application in the middle of this video. How about that?

So I'm going to start right there. I'm going to open up this terminal so you can see how this Claude Code thing works. And again, this is not revolutionary, right? These things have been accessible through applications like Cursor or Windsurf or Lovable or Pythagora. All of these apps, which are not really a copilot but more like a full IDE with native AI integrated, could do this. This comes for free; they just put it out there as a research preview.

To play around with it, you only need to install it. I'm not going to go through the installation; if you've dabbled with code just a little bit, you'll manage to figure it out. They have great documentation on it over here on their blog: you just go to this link under Claude Code, and I'm sure you'll be able to figure it out, maybe with a little bit of AI help. Here under "joining this preview" are all the steps. You need Node.js and Git installed, then you basically run this, navigate to the folder that you want to be in, and then you just type claude, and that's it. So once you have it installed, you literally just open up a terminal in the folder that you want to work in, you type claude, and boom, Claude is working with the code.

Point being, you can do things like this. You can go in here and use my favorite prompt of all time: "make it better." "Make it better" would be good, but I don't want to mess up too much here, so I'm just going to say "make it prettier." Now Claude 3.7 Sonnet is going to figure out what that means, implement all the changes, change my folder structure, whatever it has to do in all of these folders that it just created for me, to make it prettier. And here in the terminal I'm talking to it in natural, human language rather than prompting it and giving it code. Again, nothing revolutionary.

What is revolutionary, though, I suppose, is this new model. So while it does this, let's look at the new model. This is going to keep working for a while; it's going to update my code base, and we can look at the new version of the app in a second. What is brand new is Claude 3.7 Sonnet and Claude Code. I started with Claude Code because I thought it was really sick, but Claude 3.7 Sonnet is their new model.

If you've been following the space, you might know that 3.5 Sonnet was, up until o1 Pro, or o1, basically considered the best model to generate code with. It wasn't even close. Everybody actually using it and building apps was like, "Hey, all of them are great, but Claude 3.5 Sonnet for code generation, that's where it's at." Now we have 3.7 Sonnet, and it's much better. I should say that since the release of 3.5 Sonnet, people have been talking about o1 and o1 Pro and DeepSeek R1, and now o3-mini, as superior in terms of code generation. There were still debates, right? It wasn't black and white. It wasn't like people turned on Claude and were like, "Oh, we don't use that anymore." Many people, up until yesterday, were still using Claude 3.5 Sonnet to generate their code, and many of these apps, like Cursor, were still running on it to write code. But now we have a way better version of what was previously the best model for code, so now it should be pretty clear.

The benchmarks speak a clear language. If you compare it to 3.5 Sonnet, well, on all of these benchmarks it just beats it. Look at this: this is o1, this is o3-mini on the high setting, which gets 49.3% on the software engineering benchmark. And then this one is with the custom scaffolding. I looked into this, and basically what it means is that they added some prompting and let it think more: by default it thinks up to 30 times, and they let it think up to 100 times. In most cases it just thought 40 or 50 times, but basically they let it reason more and gave it some extra prompting to store its answers, and it reached 70%, versus OpenAI's o3-mini high at 49.3%. It should be noted that this does not include Grok 3, because Grok 3 hasn't published this yet and its API is not out yet, so we can't run these benchmarks ourselves, and it doesn't include the full o3, because that isn't released yet. Fair enough. So compared to all the other options right now: killer model in terms of coding.

This is all coding focused, but I think this video is going to be relevant for non-technical folks too, as you should know what's going on here. This is wild, and maybe you can even use it for yourself. Agentic tool use, look at that, amazing. Of all these other benchmarks, I think this is the main one. It just wins on almost everything compared to some of the competition. And, you know, Grok is strong competition. On some things, like visual reasoning, oh, Grok has it; high school math competitions, oh, Grok completely kills it over here. But we don't have all the Grok benchmarks. On things like graduate-level reasoning, this actually comes out on top. Look at that, the top number, with the scaffolding, comes out on top.

So legitimately, everybody loved 3.5 Sonnet already, and this is the main comparison. I wouldn't even compare it to the others; I would compare it to 3.5 Sonnet, because it's not just about the benchmarks, it's also about the vibes. It's about the taste test, if you will, and on that it was always killing it. Although some of these were far ahead in benchmarks, people were still using 3.5 Sonnet, and now 3.7 Sonnet is just a ginormous upgrade, even in terms of benchmarks.

Now, how does it perform in practice? Only time will really show. But what we can do today, instead of running all of these random prompts like "generate me a visualization of this dashboard" (that's all well and good, and I think there's value to that), is actually build an application for ourselves. I actually did that in advance, in preparation for this video, and we're going to improve the application with this thing they call Claude Code.

Claude Code is basically simple. I'm going to mute this; you can check out this video. Once you have it installed, which, for people who have the very basics of coding, or if you know how to use a terminal and run a few basic commands, shouldn't take you more than three to four minutes, you basically just right-click any folder, go to Services, open up a terminal window, and type claude. Boom, that's it. Once you have that, you can talk to it in natural language, just like you would talk to me or a fellow engineer, or just like you would talk to Cursor or one of the other apps. I keep throwing up my hands a lot because I'm like, "this is big," but it is. So that's what Claude Code is, and working with it is as simple as we pointed out.

Let me just round out this blog article so we can close it and leave it behind. They're saying, basically, we had assistants in 2024; now this is the first milestone where we actually have collaborators that we work together with, what other companies call agents,
and then eventually we're going to have pioneers, which come up with breakthrough solutions by themselves, what others, like OpenAI, would call something like AGI. So yeah, there you go, this is the blog post. Very interesting.

Let's look at what this thing does in practice and how it performs versus some of the competition, plus a few more details at the end. The outputs are massive; it just gobbles up 20,000 words like it's nothing. I've been testing the same prompts between ChatGPT and Claude. You tell ChatGPT to write you a 50,000-word essay and it comes back with like a thousand words, and it's like, "Well, bro, that is all I could do." Claude created 21,000 words, and I can show you the example conversation here if you want, and was like, "Hey, this is all I could do, but if you just type 'continue' I'll keep going." Then you type "continue" and it generates another 20,000 words for you. Insane.

I was running some prompts here, as you can see; I was obsessing over it for the past few hours. By the way, we just spent three hours in my office hours, which I hold every second Monday in the community, just playing with this thing, so it was really fun. But here it is, check it out: it just wrote all of this. Look at how long I have to scroll. It wrote all of this from a one-shot prompt. Correct me if I'm wrong, but I believe this is the longest output length of any web AI assistant LLM that we have. If you use it through the API, by the way, you can output 128,000 tokens in one shot, which is like, wow. So if you use the Anthropic console, you can output even more.

We'll just throw this into the tokenizer and then move on to the app; I just wanted to cover this point. If we throw it into the tokenizer: this was the ChatGPT output, by the way, 1,300 tokens, 7,000
characters, and this right here: 20,000 tokens, 110,000 characters. My apologies, I said words; it's tokens. So I guess the words would be like 16 or 17K, if my napkin math is right. Anyway, 110,000 characters in one shot versus what? These are the o1 Pro and ChatGPT 4o outputs: 7,000 characters versus 6,000 characters, versus Sonnet 3.7 at 110,000 characters. So the output length is insane, which means it can create these large code bases and large apps for you in one shot. And if that's not long enough, it tells you, "Hey, I can't write 500,000 characters, but just tell me 'continue' at the end and I'll continue." I just told it "continue," basically, and it kept going. There you go, right here it said you can write "continue" to keep the chat going. So you get the point: massive, massive output, really good at coding.

What does it look like in practice, though? Let's see. It went ahead and did something here; what did it do while I was talking about all this? It came up with a bunch of improvements and shows you all the code it changed: "The app now has a more professional, cohesive look while maintaining full functionality. Users will find the interface more engaging and intuitive." All right, so while I was blabbering away, this upgraded the app. Let's see how that went. Can I just refresh here? Yeah, look at the visual upgrade. I had a username, was it test one? I think it was test one. No? Okay, so I'll just register a new account. Create a new account, say test two, give it some random password, create account, and there we are. How about that? That is an upgraded interface, right?

These are the types of things that you could only do with something like Cursor up until now, or you had to piece it all together manually, which meant updating every single file. It was basically a pain; that's why everybody's using apps like Cursor and the competition that I named. This is free, though, right? I just installed this from the page that I showed you. You literally just install these two things, then you install it with this command, and then on your computer you can go into a folder, a thing you built, and just type claude, and then Claude does things for you.
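The output-length figures quoted for the tokenizer comparison can be sanity-checked with a few lines of arithmetic. This is a rough sketch, not a measurement: the token and character counts are simply the ones read out in the video, and the 0.75 words-per-token ratio is a common rule of thumb for English text that is assumed here, not something the tokenizer reported.

```python
# Figures as read out in the video; nothing here is re-measured.
CHATGPT = {"tokens": 1_300, "chars": 7_000}
CLAUDE = {"tokens": 20_000, "chars": 110_000}

# Assumption: roughly 0.75 English words per token (rule of thumb only).
WORDS_PER_TOKEN = 0.75


def chars_per_token(output: dict) -> float:
    """Average number of characters per token for one output."""
    return output["chars"] / output["tokens"]


def estimated_words(output: dict, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Rough word count derived from the token count."""
    return round(output["tokens"] * words_per_token)


def length_ratio(longer: dict, shorter: dict) -> float:
    """How many times longer one output is, measured in characters."""
    return longer["chars"] / shorter["chars"]


if __name__ == "__main__":
    print(f"Claude chars/token:  {chars_per_token(CLAUDE):.2f}")
    print(f"ChatGPT chars/token: {chars_per_token(CHATGPT):.2f}")
    print(f"Claude estimated words: {estimated_words(CLAUDE):,}")
    print(f"Claude vs ChatGPT length: {length_ratio(CLAUDE, CHATGPT):.1f}x")
```

Under that assumed ratio the estimate lands around 15,000 words, in the same ballpark as the 16 to 17K napkin figure, and the Claude output comes out at a little under 16 times the length of the ChatGPT one.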
Pretty insane. I mean, what am I supposed to say? We're doing this live, sort of, right? No edits, just results. Insane. Look, more saving tips, and various things. So I'm going to just briefly, no thinking involved, give it random numbers. Okay, test two, random numbers, maybe change the date. You can see all of this is working; this works as well as, or better than, code generation with Sonnet did. Whatever, add some of those, and then add some expenses. You have categories. This is amazing: it created a personalized expense tracker app for me, with login management, with a database in the background. Okay, this might be too much, whatever, it doesn't matter, you get the point. So add all of those, and then if I look at the reports, voilà. Obviously this one is way above, but you get the point. And then it does forecasting for you and everything, and where you'll be with your savings rate. Obviously these numbers are a mess now, but you get the point: you get a beautiful dashboard, you get your top expenses, and then based on those you have saving tips.

Now, I think the one thing that didn't work is this AI implementation. But again, I could easily tell it, "Hey, implement the Anthropic API." I would have to mess with the API key, which I don't want to do right now, but you get the point: if something doesn't work, you just follow up here and it just works. And this is free, there's no subscription. Anthropic just put this out, and this model is so damn good and outputs something so long that some of these other apps... I'm not saying they're redundant, there are certainly a lot of uses for them, but if you just want to build something quickly and try something out, and you don't want to pay a subscription fee, then there you go: Claude Code. Pretty awesome. So I think that's a pretty neat demo. What else could we do? We could follow up and do random stuff.
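To give a rough picture of what sits behind a dashboard like this, the sketch below totals expenses by category and derives simple saving tips from them. It is not the code Claude generated in the video: the `Expense` type, the `category_totals` and `saving_tips` helpers, and the 30% threshold are all hypothetical, and the tips are plain rules rather than calls to any AI API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Expense:
    """One logged expense (hypothetical shape, for illustration only)."""
    category: str
    amount: float


def category_totals(expenses: list[Expense]) -> dict[str, float]:
    """Sum the logged amounts per category."""
    totals: dict[str, float] = defaultdict(float)
    for expense in expenses:
        totals[expense.category] += expense.amount
    return dict(totals)


def saving_tips(expenses: list[Expense], income: float,
                threshold: float = 0.30) -> list[str]:
    """Rule-based tips: flag categories above `threshold` of income.

    The threshold is an arbitrary illustrative value, not financial advice.
    """
    tips: list[str] = []
    totals = category_totals(expenses)
    # Largest categories first, so the most expensive ones lead the list.
    for category, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        if income > 0 and total / income > threshold:
            share = total / income
            tips.append(f"{category} takes {share:.0%} of income; consider a budget cap.")
    if sum(totals.values()) > income:
        tips.append("Spending exceeds income this period.")
    return tips


if __name__ == "__main__":
    demo = [Expense("rent", 1200), Expense("food", 300), Expense("food", 150)]
    for tip in saving_tips(demo, income=3000):
        print(tip)
```

A rule-based approach like this needs no API key, which is one plausible way an app could show saving tips without one; swapping the rules for a model call would be a separate step.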
I mean, I don't want to make this video too long, but I want to show you the process that I created this with, because it's as simple as could be. It's this one. I prompted it to build a personal finance tracker using Python and a web-based dashboard. "Users should be able to log expenses..." and now hold up, we're going to compare this to o1 Pro and Grok. So: "Users should be able to log expenses, categorize spending, and visualize trends with Matplotlib. This app should include an AI feature that suggests ways to save money based on spending patterns."

So how about this, just for fun: I'm going to follow up with "the AI feature doesn't generate new recommendations." Let's just follow up. It should probably ask me for my API key, which I'm not going to have here, but that's fine.

I think you get the point: it just generated all of this in one shot. It gave me one file, which was yea long. Impressive. I just downloaded this file, threw it in a folder, and with Claude Code I told it, "set all of this up for me," and it created the file structure, it did everything. When I hit play, when I started it, it didn't work; it actually gave me an error. Then I just copied the error that I got in my browser into Claude Code, hit enter, and it was like, "Ah, sorry, fixed that for you," and then it worked. So it was the simplest process, just like you're used to, but the point is this thing works.

Here I was trying to set it up manually before. I was like, okay, what folder structure do I need? And it tells you that you need to run all of this and create all of these folders. Let's look at the comparison. What does this look like inside of... yeah, here I was generating some prompts to run, so that's where I have this from; I was running a deep research. But basically, if I compare it to o1 Pro: o1 Pro thinks for three minutes and comes up with, yeah, okay, you need this project structure, you need to set this up, okay, cool, then you need
to set up a database, then you need to set up your Flask app, then you need to do this, you need to do step number seven, you need to put it all together. By the time I've done this... I'll build you a house with my bare hands and start a family with multiple children by the time you put all of this together manually as a noob, as a coding novice. This is ridiculous; nobody's going to be doing this. I mean, I guess you can, but come on, the point of AI is that it's supposed to be easy, not complicated.

Grok, what did Grok tell us? Grok basically goes in and it's like, "Hey, here are all the requirements that you asked for," cool, cool, cool. And then again: here's the implementation plan, install all of these packages. It doesn't give you the commands; it's like, "You've got to install these, bro, and you've got to know what I mean by that. This is the project structure I want from you. Then start Flask." Then, you know, do all of this again, like step number 2,745: implement the visualizations, implement these, fetch the expenses for the current, like, embed it in the dashboard. I mean, don't get me wrong, Grok is amazing. I've actually been using it over the weekend, and hats off, holy, great model. But this takes a lot of effort to implement, whereas Claude 3.7 was just like, "Yo, here's everything," and then inside of Claude Code I just navigate to the folder and tell it, "go."

So let's see, what is it doing here? "This implementation provides diverse, helpful financial recommendations." So it did all of this as I was talking, again: added variety, implemented refresh functionality, added the advanced analysis. It didn't even want an API key. How does that work? I don't know. I'm going to refresh, though, and look at our saving tips. I was test two, right? Was it test two? Not to yap on too long here, let's round this video out, but the saving tips, let's see if this works better. Yeah, now they refresh, there's more variety. It improved it, it did it right. It's that simple, and you don't need a subscription, you can just run this. It's amazing.

So I think you see the point. We looked at the comparison, we looked at the token output, we looked at Claude Code, we looked at the model itself and how good it is at generating stuff.

By the way, I want to show you one more example, which is kind of a funny one. As Chris and the community pointed out, it's sort of an Easter egg. When you ask it one of these standard problems, like "how many R's are in strawberry," it's like, "Click the strawberry to find out," and it creates this. And then you're like, huh, wait, what is going on here? And there are, like, three. So this is what you get from Claude 3.7, versus, you know, I don't know, if I run this inside of o1 Pro it will probably be a bit drier. So the combination of Anthropic's 3.7 and just the taste test, the vibes of, like, 3.