DeepSeek R1 vs ChatGPT o1 - Ultimate Test

Skill Leap AI
Prompts used in the video: Multi-Step Reasoning & Logic You have a row of 100 light bulbs, all init...
Video Transcript:
I compare the DeepSeek R1 reasoning model with the ChatGPT o1 reasoning model side by side. I ran them across 10 different reasoning prompts, and the results were completely surprising to me. I'm going to specifically use prompts that require reasoning. You wouldn't use these types of models for every single task; you would use them when a task needs reasoning, when the model needs to think through a problem step by step before it gives you an answer.

A couple of things to mention before I start prompting. On the left side here, we're at chat.deepseek.com. This is the official DeepSeek chat website, but to use R1 you have to make sure you turn this on right here. If you don't turn that on, it uses the V3 model, which is more comparable to the ChatGPT 4o model. I'll save that for a different video; right now I want to really focus on the reasoning model.

The second thing I want to mention is that DeepSeek is an open-source large language model, meaning you could actually download versions of it and run it privately and locally. The version on the website is not going to be private. The data will be stored by this company, and I will cover the terms of service at the very end. We're going to use both of these models and compare them side by side, so you see how DeepSeek uses your data and how ChatGPT uses your data. ChatGPT o1, especially if you use it for work, especially if you're going to upload data here with this little clip to analyze, opts out of training data by default because it is a paid upgrade. I'm using the Team plan, but any of their paid upgrades opts you out of data training. DeepSeek's free version and ChatGPT's free version do use your data for training their models by default. Keep in mind that access to ChatGPT o1 requires a plan that starts at $20 a month; DeepSeek is free.

Now, for the very first prompt, we're going to run both of them through a logic and multi-step reasoning question: You have a row of 100 light bulbs, all initially off. When you pass through the row for the first time, you toggle every bulb (turn them all on). On the second pass, you toggle every second bulb. On the third pass, you toggle every third bulb, and so on, up to the 100th pass. Which bulbs end up turned on?

They're both going to go to work. DeepSeek is a little bit slower right now just because it's going completely viral, so the server load on the DeepSeek website is pretty significant, and I'm not going to compare them on speed right now. I want to show you why these models are a lot different from regular ChatGPT or other large language models: they actually go through a process of thinking through all the steps. This whole gray part inside DeepSeek, for example, is all the steps it had to think through before it gave an answer. If I go all the way down, look at all the thinking it had to do, and here is the actual answer in white. I could leave this whole area collapsed and only see the answer. Here's the answer it gave us, and ChatGPT's answer is right over here: 1, 4, 9, 16, 25, 36, 49. Yep, they both got this right. With ChatGPT's o1 model, you can also click here to see the step-by-step process. It looks like there are only a couple of different things that it thought through. In this case DeepSeek took 45 seconds of thinking, so you can see it had a lot more; ChatGPT o1 here thought for 10 seconds.

Okay, this one is going to be a bit of a math problem with a twist to it: A horse costs $50, a chicken costs $20, and a goat costs $40. You bought four animals for a total of $140. Which animals, and how many of each, did you buy? Again, I have R1 turned on. ChatGPT's reasoning model can be turned on a couple of different ways. You could pick it from the model dropdown (again, you need the Plus, Team, or Pro plan; I'm using the Team plan), and you could also turn it on here to use thinking.
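For the light-bulb puzzle, the answer both models gave can be checked by brute force. The following is a minimal Python sketch and is not something shown in the video; the function name `bulbs_left_on` is made up for illustration. It simply simulates every pass over the row.

```python
def bulbs_left_on(n: int = 100) -> list[int]:
    """Simulate n passes over n bulbs; pass k toggles every k-th bulb."""
    on = [False] * (n + 1)  # index 0 is unused so bulb numbers start at 1
    for step in range(1, n + 1):
        # On pass `step`, toggle bulbs step, 2*step, 3*step, ...
        for bulb in range(step, n + 1, step):
            on[bulb] = not on[bulb]
    return [bulb for bulb in range(1, n + 1) if on[bulb]]


if __name__ == "__main__":
    print(bulbs_left_on())
```

A bulb is toggled once for each divisor of its number, and only perfect squares have an odd number of divisors, so the list read out in the video continues past 49 with 64, 81, and 100.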
Either way, you just need one of those turned on for it to go through the thinking process. Okay, this time DeepSeek actually looks like it's working a little bit faster, because it's showing you the thinking process; ChatGPT hides the thinking process until you click on it. 17 seconds for ChatGPT, and DeepSeek is still thinking, so it looks like DeepSeek thinks longer.

Okay, this is very interesting: DeepSeek got a different answer than o1. DeepSeek says there are two valid combinations: two horses and two chickens, or one chicken and three goats, with really nice formatting down here for the answer. o1 says to buy one chicken and three goats, which is one of the answers, but it did not come up with the second combination. I'm pretty sure DeepSeek's is the right answer here. Let me show you DeepSeek's thinking. Look at the thinking process it went through. How long did this take? 76 seconds. It went through every single step and ran so many different math equations to come up with that. So that's obviously a point for DeepSeek. Let's get to the next one.

This one is domain-specific problem solving, and it's related to physics. I got the answer key from someone who specializes in physics to be able to verify it, because I actually have no idea how to solve this one, but I do know the answer. So let's send this one out. Okay, DeepSeek again has the right answer, and ChatGPT also got the right answer. Let me show you the difference in the way they had to think through it, though. Look at DeepSeek right here: it's incredible how much it has to go through to get to the answer. ChatGPT does get to the answer a lot quicker, and if I click over here, there are pretty much no details on the steps. It used to give me a lot more detail when I was looking at this before. So it's 137 seconds versus 7 seconds. ChatGPT came to the right one, but again, it did miss one so far. They both got this one right.
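The horse, chicken, and goat question is small enough to enumerate completely, which settles whether there are one or two valid combinations. This is a rough Python sketch under the prices stated in the prompt; the names `PRICES` and `animal_combinations` are hypothetical and not from the video.

```python
from itertools import product

# Prices as stated in the prompt read out in the video.
PRICES = {"horse": 50, "chicken": 20, "goat": 40}


def animal_combinations(total_animals: int = 4, total_cost: int = 140) -> list[dict]:
    """Return every way to buy `total_animals` animals for exactly `total_cost`."""
    names = list(PRICES)
    solutions = []
    # Try every count from 0..total_animals for each kind of animal.
    for counts in product(range(total_animals + 1), repeat=len(names)):
        if sum(counts) != total_animals:
            continue
        cost = sum(count * PRICES[name] for count, name in zip(counts, names))
        if cost == total_cost:
            solutions.append(dict(zip(names, counts)))
    return solutions


if __name__ == "__main__":
    for combo in animal_combinations():
        print(combo)
```

The search finds exactly two combinations, one chicken with three goats and two horses with two chickens, which matches the answer DeepSeek gave.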
Next one. This one's a famous one in the thought-experiment category: which came first, the chicken or the egg? Let's see what we get for that. Okay, let's go through DeepSeek's thinking in real time: "This is a classic question I've heard before. First, I know the chicken came from the egg. Wait, maybe evolution plays a role here. But then again, the egg is a chicken egg if it contains a chicken. Alternatively, if you define a chicken egg as an egg laid by a chicken, then the chicken would have to come first." This is really interesting; it's thinking about pretty much every scenario there can be. DeepSeek says that from a scientific standpoint, the egg came first, as the first chicken emerged from an egg laid by a nearly-chicken ancestor. ChatGPT says, well, this can be playful; the straightforward explanation is that eggs existed before what we formally classify as chickens, so the egg came first. So they both said the egg came first. I think DeepSeek does a little better job explaining it in simpler terms, but it's the same answer. This is, I think, a pass for both.

Okay, with this one I want to test their step-by-step reasoning, and it's kind of a trick question: The restaurant bill for three people was $45. They each paid $15, so they paid $45 in total. The waiter put $5 in his pocket and gave $5 back to them. Therefore each ended up paying $14, which sums up to $42. Let's see what we get here. Okay, so 80 seconds for DeepSeek, 7 seconds for ChatGPT. Let's get to the bottom to see the answer. ChatGPT says, in short, there is no actual money missing, and that was what I was testing for. I was trying to confuse it, and it says all the money is accounted for. And DeepSeek, on the missing... oh, strange formatting here... all the money is accounted for. So it's the same conclusion; ChatGPT just gets to it much faster, 80 seconds versus 7 seconds.

Okay, this next one will have a variable answer, so there's no right or wrong. I just want to see how each one deals with something that is more ambiguous.
Consider the sentence "I did not say she stole the money." Interpret the sentence in at least four distinct ways by emphasizing different words, and explain how each emphasis changes the meaning. They both came to a similar conclusion. DeepSeek says: emphasis on "I," emphasis on "didn't," emphasis on "she," and emphasis on "stole." ChatGPT: emphasis on "I," same thing; emphasis on "didn't," same thing; emphasis on "she," same thing; but the last one's a little different: emphasis on "money" instead of "stole." Again, both are accurate. This is not right or wrong; I just wanted to see if they could figure out this kind of ambiguous question that has multiple answers, and they both passed this one.

Now this next one seems very obvious to us as humans, but large language models usually have a very hard time with it: which one is bigger, 9.11 or 9.9? Let's see if they get this one right. Wow, ChatGPT got that one wrong: "9.11 is larger than 9.9." DeepSeek has the right answer, 9.9. Okay, so far DeepSeek has not failed once; ChatGPT has now failed on two different questions.

Here's another classic question: how many R's are in "strawberry"? Let's see if they get this one right. Again, very easy for us to figure out but hard for large language models. Okay: "strawberry" contains three instances of R, in positions three, eight, and nine. Interesting, a few seconds versus 24 seconds this time, but they did both get it right. I'm going to try to confuse them a little bit. Let me start a new chat, and I'm going to misspell "strawberry" this time. Okay, I misspelled it, so there are one, two, three, four R's now. Oh, look at this thinking right here: S, no; T, no; R, yes; and then these other R's, yes, yes, yes. "The letter R appears four times in the misspelled word 'strawberry,'" and right here, "the word 'strawberry,' which contains three consecutive R's, contains a total of four R's." Okay, so this one didn't tell us it was misspelled, but it did reach the same conclusion: four and four. 7 seconds versus, again, 21 seconds. So clearly ChatGPT thinks a lot less and just gives you the answer quicker. DeepSeek thinks longer, but I'd rather wait and get the right answer.

Now a couple of things when it comes to the functionality of these websites. One thing that I really like about the DeepSeek website is that you can turn on search, and then you can ask for something that requires search and requires deep thinking. Otherwise, the training data it has is actually older than even ChatGPT's training data, going back to something like 2023, but with this search icon it does have up-to-date information. ChatGPT, on the other hand, does have search, but you see it's grayed out right now, because search for some reason doesn't work with the o1 model. The one really nice update, which I made a different video about, is that the website Perplexity.ai, which is an AI search engine, does allow you to use the reasoning model with R1 and the reasoning model with o1 if you want to combine it with search. So that does solve the limitation with o1, but to get access to this, it's another $20 a month upgrade that has nothing to do with the ChatGPT subscription for o1; that doesn't give you this. This is a totally different company that uses that model. But there is a way to combine o1 with search, which is through Perplexity right now, and hopefully OpenAI and ChatGPT roll that out so you can do the same kind of thing I'm showing you inside of R1 with search.

DeepSeek also has this other problem with how often the server is busy. But again, it's going completely viral right now and the usage is through the roof, so hopefully this is something that gets solved, or other companies add DeepSeek to their platforms. Because it is open source, they could technically do that: download it and provide servers for it so it doesn't have this issue. But while I was recording this video, this message right here, "the servers are busy," was pretty consistent.

Now I just want to try one last thing. ChatGPT o1 has another model called o1 pro, so I switched my account to ChatGPT Pro, which is $200 a month. With $200 a month you get unlimited o1; o1 on the other plan actually has a limit, which again is probably a plus for R1, because you can just use the website as long as it's up and running. But o1 pro is supposed to beat even o1 in this $200 plan, so let me run the couple of questions that o1 missed and see if we can actually solve those. This one it got right: 9.9 is larger.
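Both the 9.11 versus 9.9 question and the letter-counting question are easy to check outside a chat window. Here is a minimal Python sketch; the helper names `larger_decimal` and `letter_positions` are hypothetical, and the four-R spelling "strawberrry" is an assumption, since the video does not spell out the exact misspelling that was typed.

```python
from decimal import Decimal


def larger_decimal(a: str, b: str) -> str:
    """Return whichever decimal string represents the larger number."""
    # Decimal parses the strings exactly, so 9.9 is treated as 9.90.
    return a if Decimal(a) > Decimal(b) else b


def letter_positions(word: str, letter: str) -> list[int]:
    """Return the 1-based positions where `letter` occurs in `word`."""
    target = letter.lower()
    return [i for i, ch in enumerate(word.lower(), start=1) if ch == target]


if __name__ == "__main__":
    print(larger_decimal("9.11", "9.9"))
    print(letter_positions("strawberry", "r"))
    # Assumed misspelling with three consecutive R's near the end.
    print(len(letter_positions("strawberrry", "r")))
```

The positions for the correctly spelled word come out as 3, 8, and 9, matching what was read out in the video, and the assumed misspelling gives a count of four.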