DeepSeek - The Chinese AI That Crashed The Markets

Matt Wolfe
Let's talk about DeepSeek and DeepSeek-R1. Discover More: 🛠️ Explore AI Tools & News: https://futu...
Video Transcript:
There's one new AI advancement that has the tech world spinning right now. We're getting articles like this: "A shocking Chinese AI advancement called DeepSeek is sending US stocks plunging." On January 27th, the day I'm recording this video, Nvidia lost over 17% of its value; one post puts the DeepSeek-driven crash at 17%, or about $465 billion. Marc Andreessen, one of Silicon Valley's most prominent investors, said DeepSeek-R1 is "one of the most amazing and impressive breakthroughs I've ever seen, and as open source, a profound gift to the world."

So in this video I want to explain what DeepSeek is, why the stock market is freaking out about it, cover some of the speculation around it, share what I believe the long-term result of something like DeepSeek will be, and even show you how to use it yourself. I want to give you the whole picture of DeepSeek so you're looped in on the biggest story in the AI world right now, and so you have the context to speak intelligently about it and form your own opinions.

To really understand DeepSeek-R1 and why people are freaking out about it, we need to go back to a research paper that came out last month, in December of 2024, on DeepSeek-V3. DeepSeek-V3 is a large model with 671 billion parameters, but it uses a mixture-of-experts architecture, meaning it doesn't use all of those parameters every time it's prompted; it activates only 37 billion parameters per token. What made this model really special is this line: despite its excellent performance, DeepSeek-V3 requires only 2.78 million H800 GPU hours for its full training. To put that into perspective: according to Perplexity, GPT-4's training required approximately 60 million GPU hours, compared to just 2.78 million H800 GPU hours here. And when OpenAI trained GPT-4, it was using really high-end A100 GPUs from Nvidia. DeepSeek was using H800s because the US restricts which GPUs can be exported to China; Nvidia developed the H800 to be compliant and still sellable to China, but it's nowhere near as powerful as the GPUs that US-based AI companies have access to. So DeepSeek trained this model with roughly 95% less compute than something like GPT-4o, on less powerful GPUs than companies like OpenAI had access to.

Then look at the benchmarks. DeepSeek-V3 is the blue dashed line, GPT-4o is the darker yellow line (second to last), and Claude 3.5 Sonnet is the final line. In math and coding, DeepSeek-V3 scored close to the leaders: in math it did a lot better than GPT-4o and was about on par with Claude. On MMLU, which tests large language models across a variety of tasks, it was second only to Claude 3.5 Sonnet. In math it scored higher than pretty much everything, on the Codeforces benchmark it crushed the other models, and on SWE-bench, which tests how well AI solves real problems from GitHub, it was only barely outscored by Claude 3.5 Sonnet. So DeepSeek-V3 required about 95% less compute to train and got results on par with GPT-4o and Claude 3.5 Sonnet, all while being open source and publicly available.

However, around the same time it came out in December, we got access to models like o1 and o1 Pro, and OpenAI showed off its next model, o3. So some of those benchmarks didn't look as exciting, because people considered o1 and o3 the new state of the art, and V3 was being compared to the previous generation of state-of-the-art models. So why are people freaking out all of a sudden this week? Last week DeepSeek released new research: DeepSeek-R1. R1 uses DeepSeek-V3, the model we were just looking at that was fast and far less expensive to train on lesser GPUs, as its underlying model, but it puts that model through a new fine-tuning method. If we read the abstract, it describes a model "trained via large-scale reinforcement learning without supervised fine-tuning as a preliminary step" that "demonstrates remarkable reasoning capabilities." Basically, they asked it a whole bunch of questions they already knew the answers to and had it check itself against the answer key. That's a big oversimplification, but that's roughly how the reinforcement learning worked: there was no supervised fine-tuning first, just reinforcement learning where the model would try, say, a math question on its own and its response would be checked against the known answer. They did that with math, with coding, and with the other skills they wanted it to specialize in.

The other thing that makes R1 stand out is that it uses chain-of-thought reasoning at inference time. When you put a prompt in, you'll actually see it think through the problem and even correct itself: it might start down an illogical path and then go, "actually, this might be the better way to do it," and think that through instead.
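To make that answer-key idea concrete, here is a toy sketch of reinforcement learning against verifiable rewards: candidate answers earn a reward of 1 if they match a known solution and 0 otherwise, and the behavior that earns the reward is what gets reinforced. This is an illustration in the spirit of the paper, not DeepSeek's actual pipeline, and all names here are made up.

```python
# Toy sketch of "RL with verifiable rewards": score candidate answers
# against an answer key, as described above. Illustrative only; the real
# setup updates the weights of a huge model, which we skip entirely.

def reward(candidate: str, known_answer: str) -> float:
    """1.0 if the model's final answer matches the answer key, else 0.0."""
    return 1.0 if candidate.strip() == known_answer.strip() else 0.0

def best_of_n(candidates: list[str], known_answer: str) -> tuple[str, float]:
    """Pick the candidate the reward signal would reinforce."""
    scored = [(c, reward(c, known_answer)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])

# A math question with a verifiable answer ("answer key" style):
answer_key = "42"
candidates = ["41", "42", "forty-two-ish"]
best, score = best_of_n(candidates, answer_key)
```

In the real training loop, the reward nudges the model toward the reasoning traces that produced correct answers; here we just select the winning candidate to show the scoring idea.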
So it has this reasoning process it goes through right after you ask it a question. That behavior isn't spelled out in huge detail in the paper, but we can see that the training template requires DeepSeek-R1-Zero to first produce a reasoning process, followed by the final answer.

As I mentioned, we saw o1 from OpenAI and demos of o3, which made DeepSeek-V3 feel a little less impressive. But combine the fact that V3 was very inexpensive to train, on lesser hardware than what we have in the US, with this new R1 model that adds reinforcement-learning fine-tuning plus chain-of-thought reasoning when you give it a prompt, and now this open-source model is giving us results as good as, if not better than, what we're seeing out of OpenAI's state-of-the-art closed model. That's why people are freaking out.

Check out the benchmark comparison. The blue bar with the little lines through it is DeepSeek-R1, the dark gray bar is OpenAI's o1, and the light blue bar on the far right is DeepSeek-V3, the model that preceded R1. In pretty much every benchmark, R1 either outperformed or matched OpenAI's o1: about on par with o1 for code, beating o1 in math, about on par for general-purpose use, and leading the pack at solving GitHub problems. OpenAI's o1: closed source, minimum 20 bucks a month to use, trained on thousands of A100s or H100s from Nvidia. And now we've got this DeepSeek-R1 model, trained on lesser GPUs in way less time, that does just as well. That freaks people out.

Coming back to those headlines ("a shocking Chinese AI advancement called DeepSeek is sending US stocks plunging"): Nvidia had a big drop today, and the thinking behind the drop is, well, maybe we don't need nearly as many
GPUs as we thought we did to train these next-level AI models. If all the big research companies training AI models can now do it at 5% of the time and cost and still get o1-level results, why would people need to buy as many GPUs? And this had a ripple effect: Meta, Google, Oracle, and most of the big tech companies dropped today as a result.

What makes this even more fascinating is that DeepSeek is basically a side project of this company. According to this tweet from Han Xiao (I'm not sure if I mispronounced that), the company that owns DeepSeek is a quant firm: they've been working together for many years already, they're super smart guys with a top math background, and they happen to own a lot of GPUs for trading and mining purposes, so DeepSeek is their side project for squeezing those GPUs. In other words, they bought the GPUs for crypto and quant trading, apparently had more compute than they needed, and started training their own models. Han followed up by saying that nobody in China even takes them seriously: it's not that Chinese AI teams in general are lean and great and can do such great things, it's only DeepSeek that's lean and mean, and Chinese AI companies are just as fat and heavy on marketing as their American counterparts.

I actually do think Nvidia is going to recover. My personal opinion is that people were starting to worry Nvidia was getting overvalued and saw this news as a chance to get out; a few people getting out caused a little panic, more people panicked and got out. Not financial advice, but I think it recovers. That said, there are a few things about DeepSeek that are more ambiguous, and some counterarguments that make me think selling Nvidia right now was probably misguided. Let's start with some of the controversy around it.

According to this Investopedia article, analysts at Citi expressed doubt that DeepSeek achieved its results without the most advanced chips; they maintained their buy rating on Nvidia's stock and said they don't expect major US AI companies to move away from its advanced GPUs. Alexandr Wang, the CEO of Scale AI, also disputes whether DeepSeek really used as few GPUs as it said, and whether they were actually H800s, the dumbed-down versions of the H100, or something else. He thinks it's closer to 50,000 of the more powerful Nvidia Hopper GPUs (H100s), but believes the company can't disclose the truth due to US export controls on AI chips. Again, as I mentioned earlier, the reason DeepSeek claims it used H800s is that the US limits the compute power of the chips sold to China, and Nvidia built the less powerful H800 specifically so it could still sell into China. According to Wang and the Citi analysts, DeepSeek may actually have been using far more GPUs, and far more powerful GPUs, than it claimed, and is saying "H800" to avoid getting itself in trouble.

There have also been rumors that maybe DeepSeek didn't start from scratch: that they used something like a Llama model as a starting point and trained on top of it. From all my digging and research, I can't find real weight behind those claims, other than the fact that the model will sometimes claim it was created by OpenAI, or, if you ask it to troubleshoot something, it might give you instructions for doing so in ChatGPT. The reality is it was likely trained on a ton of open internet data, and there's a lot of ChatGPT and OpenAI instructional content publicly available on the internet.
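As a side note, the rumor about starting from another model's outputs describes what's called distillation: a "student" model is trained on a "teacher" model's answers rather than on raw internet text. Here's a heavily simplified, hypothetical sketch of the idea; the "models" are stand-in lookup functions, not real networks, and the lab name is invented.

```python
# Hypothetical sketch of distillation: harvest a teacher model's outputs
# as training pairs, then fit a student on them. Real distillation
# fine-tunes a neural network on the teacher's generated text; this toy
# just memorizes the pairs to show the data flow.

def teacher(prompt: str) -> str:
    # Stand-in for a large model's completion.
    canned = {
        "capital of France?": "Paris",
        "2 + 2?": "4",
        "who made you?": "I was made by BigLab",  # identity claim leaks into the data
    }
    return canned.get(prompt, "I don't know")

# 1. Collect teacher outputs as (prompt, completion) training pairs.
dataset = {p: teacher(p) for p in ["capital of France?", "2 + 2?", "who made you?"]}

# 2. "Train" the student; here, trivially, by memorizing the pairs.
student = dict(dataset)
```

The last pair illustrates why a model trained on another model's outputs (deliberately, or just via contaminated web data) can end up claiming the wrong creator: that claim was simply in its training data.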
Therefore, a lot of that was probably in its training data by default, just from the way the data was collected. There's also this site, manifold.markets, where you can bet on random questions, and one asks: did DeepSeek lie about the number of GPUs it used in training V3? Right now it shows a 38% chance that they did, so seemingly most people don't believe they cheated. But as of right now, everybody is just taking DeepSeek's word for it; we haven't seen receipts or anything.

I summed up my overall thoughts in this X post. Most people are saying the dip happened because models can now be trained with way less compute, which is not good for Nvidia, and that's most likely the reason for the dip. But here are my counterarguments. First (I just went over this one): many analysts claim DeepSeek either trained on much more powerful GPUs and can't talk about it due to restrictions, or started with an existing set of model weights, like Llama, where the expensive part of the training had already been done. That's just speculation, but it's fairly widespread speculation. Second: even if it did just get much cheaper to train o1-level models with far less compute, many companies will still throw more compute at the problem to train even more powerful models. If we can train this level of model with this little compute, imagine what we can train if we 10x or 100x it. And finally, the point I think is most important (and which, I noticed after posting, plenty of other people made before me): if it really is far cheaper to train new foundation models, that means big companies like OpenAI have even less of a moat than we thought. It opens the door for many new companies and many new open-source models to pop up, all of which need compute. The reduced compute needed to train a single model seems like it will be counterbalanced by more companies buying GPUs, because now they too can create foundation models specifically tailored to their needs. Maybe each company buys fewer GPUs, but a lot more companies get into the game thanks to the lower barrier to entry.

After I posted that, I saw this post from Garry Tan, from several days before mine: "Do people really believe this? If training models gets cheaper, faster, and easier, the demand for inference (actual real-world use of AI) will grow and accelerate even faster, which assures the supply of compute will be used." That was in response to someone saying China's DeepSeek could represent the biggest threat to US equity markets, since the company seems to have built a groundbreaking AI model at an extremely low price and without access to cutting-edge chips. Satya Nadella, the CEO of Microsoft, pointed to the same thing: "Jevons paradox strikes again. As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can't get enough of." That's the same thesis I was making. If we take a peek at what Jevons paradox is, per Wikipedia: it occurs when technological advancements make a resource more efficient to use, reducing the amount needed for a single application; however, because the cost of using the resource drops, overall demand increases to the point where total resource consumption actually rises rather than falls. So basically, if we can do more with less compute, that doesn't mean people will buy less compute; they'll buy more compute to do even more with it, and the barrier to entry just got lower for more companies to develop their own models. I feel like in the end this will be a net win for Nvidia. But again, this is not financial advice, and I would take everything I'm saying with
a grain of salt; I'm just digging through all the resources I've come across and trying to put the puzzle pieces together for you.

Now, anybody can use DeepSeek right now, and there are multiple ways to do it. You can go to deepseek.com and play with it straight on the website: click "Start Now," log in through a Google account, and if you want to use the R1 model, click the button that says "DeepThink (R1)" to make sure you're using R1. I'll tell it to invent a complex logic problem and then solve it, and when I do, you can see it say it's thinking, and you can actually watch it think in real time: "Okay, I need to invent a complex logic problem and then solve it. Let me start by brainstorming," and so on; "wait, I remember there's a classic puzzle type where there are three types of people," "another angle," "wait, here's another idea," "let me outline the problem, but it makes it more complex," "alternative." You can watch it reason through all of this while I'm talking. This is what makes R1 different from V3: again, the underlying model is the V3 I talked about at the very beginning, and R1 is the one that introduced this extra thinking through the problem as well as the reinforcement-learning fine-tuning process. It thought through the process for quite a while, 208 seconds, so close to three and a half minutes; you could see all the thinking it did, and it then created its own logic problem and went on to solve it.

That's not the only way to get it. As of right now, DeepSeek is also the number one app in the free iPhone App Store, so you can use it on mobile as well; all this news and everybody talking about it has actually pushed it past ChatGPT. If you do run into issues trying to use DeepSeek, an article came out on Business Insider: DeepSeek temporarily limited new signups, citing "large-scale malicious attacks." I don't know if that's still ongoing; I didn't see any errors or messages when I logged into DeepSeek, but the article says DeepSeek announced that only users with a China-based phone number could register for a new account, a measure taken because it had recently faced large-scale malicious attacks. The issue seems to have worked itself out as of this recording, but just know the service could be up or down a bit if you're trying to log in and use it.

There are also a couple of ways to use distilled versions of DeepSeek. A distilled version uses a smaller underlying model: instead of DeepSeek-V3 underneath, it might use something like Qwen 7B, Qwen 14B, or one of the Llama models. If you head over to console.groq.com, you can use DeepSeek on Groq, which runs a distilled Llama 70B model: Llama 70B is the underlying model, with R1's thinking ability on top. And because Groq is insanely fast, you get results really quickly. If I ask the same question, telling it to invent a complex logic problem and then solve it, and submit it on Groq, we can see that it's thinking, just really, really fast, because Groq's cloud hardware blows everything else out of the water in terms of speed. When it's done, we can see all the thinking it did (a little less nicely formatted than the main web version), scroll all the way down, and see that it created a complex problem and solved it, all in just a few seconds.

And finally, you can run it completely locally if you want. I recommend a tool called LM Studio for this: a free tool that makes it really easy to download and run models. In LM Studio, go to the Discover tab, type "deepseek" in the search box at the top, and it will list all the versions of DeepSeek you can download; find the one you want, click the download button to grab it onto your computer, and then you can run the model locally on your own machine. Right now I'm using the DeepSeek-R1-Distill-Qwen-14B model, so the underlying model is Qwen 14B. Give it the same prompt, "create a complex logic problem and then solve it," and you can see it open its thinking box and work through the whole problem on its own. It actually thought for one minute and 55 seconds to get through the whole process when I ran it locally.
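A side benefit of LM Studio worth knowing about: once a model is downloaded, LM Studio can also serve it over a local, OpenAI-compatible HTTP API (by default at http://localhost:1234/v1), so your own scripts can query the model without anything leaving your machine. A minimal sketch; the model name below is just an example and must match whatever you've actually loaded in LM Studio.

```python
# Query a model served by LM Studio's local server (Developer tab ->
# "Start Server"). The server speaks the OpenAI chat-completions format;
# the default address is http://localhost:1234/v1, but check your settings.
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """OpenAI-style chat-completions payload understood by LM Studio."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
    }

def ask_local_model(payload: dict, base_url: str = "http://localhost:1234/v1") -> str:
    """POST to the local server; only works while LM Studio's server is running."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

payload = build_chat_request(
    "deepseek-r1-distill-qwen-14b",  # example name; match your loaded model
    "Invent a complex logic problem and then solve it.",
)
# print(ask_local_model(payload))  # uncomment with the local server running
```

Because the endpoint mimics the OpenAI API shape, most tooling that can point at a custom base URL will work against it unchanged.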
Now, I am using an Nvidia 5090 GPU, pretty much the most top-of-the-line consumer GPU you can get, and it did the whole thing in about two minutes, at a rate of 63.42 tokens per second, before finally giving us the logic problem and its answer at the end. The nice thing about LM Studio is that while you need an internet connection to download the model, once it's on your computer you can unplug, turn off your Wi-Fi, and it would have given me the same response in the same amount of time. I can be completely offline; it sends nothing to the cloud. If I were worried about privacy or data protection or anything like that, I could run these models entirely offline with no issues and know for certain that none of my information is getting back to any cloud provider.

So that's how you can use DeepSeek right now: deepseek.com, the DeepSeek mobile app, a distilled Llama version straight from Groq, or any variation of the distilled models on your own local computer via LM Studio.

And if you think the DeepSeek story ends there: the day I'm recording this, January 27th, that same company dropped new research, this time an AI image generation model called Janus-Pro-7B. So not only are they creating top-of-the-line, pretty much state-of-the-art large language models at lower cost and doing it a lot faster, they now appear to be doing the same with AI image generation. I haven't played with this model myself yet, so I don't know too much about it; it literally came out while I was in the process of recording this video. But looking at their benchmarks, the new Janus-Pro (the blue bars with the white lines) pretty much outperformed SDXL, Stable Diffusion 1.5, PixArt-α, DALL-E 3, SD3 Medium, and Emu3-Gen (which I believe is Meta's AI image model) on both benchmarks. So they're not just disrupting large language models; they're now also going after AI image generation. As I learn more about this Janus model, I'll talk about it in future videos; I just wanted to add it to the mix because it's that same DeepSeek company that has people freaking out right now.

So there you have it, the lay of the land. You're going to hear a lot of people talking about DeepSeek and DeepSeek-R1; it's going to be in the news more and more, with a lot of videos and X posts about it. I wanted to break down the facts, what we know about it, and some opinions from other people, so you have the full picture, know exactly what it's about, and can speak intelligently on it. There are probably a few things I've left out; I'm sure they'll get mentioned in the comments if I did. But that is DeepSeek-R1, and now DeepSeek Janus, and that's why Nvidia and the stock market were affected, or at least why people are claiming they were. I think it's a short-term thing, but we'll see how it all plays out.

Hopefully you enjoyed this video, learned something new, and feel more looped in. If you like breakdowns like this and want more AI news, more tutorials, and more cool AI tools, make sure you like this video and subscribe to this channel, and I'll make sure a lot more of this kind of stuff shows up in your YouTube feed. And of course, if you haven't already, check out futuretools.io. It's the site where I curate all the cool AI tools I come across, and I keep the AI news up to date on a daily basis there. I also have a free newsletter where I share just the coolest tools and most important news from the week; it hits your inbox twice a week, and if you sign up you'll also get free access to the AI Income Database, a database of cool ways to make money using various AI tools. Again, it's all free, and you can find it over at futuretools.io.