DeepSeek R1 was released just a few days ago, and it has sent shock waves through the AI industry. R1 is an AI model that has the ability to think, just like OpenAI's cutting-edge, state-of-the-art o1 and o3 models. But here's the thing: it's completely open source and open weights. DeepSeek, a small Chinese company, gave all of it away for free, and they even detailed how to reproduce it. But that's not even the craziest part: it was trained for just $5 million, compared to the tens and hundreds of millions of dollars that most people in the AI industry thought was required to train a model of this caliber, and it has sent everyone in the AI industry scrambling to understand the ramifications. DeepSeek has been called everything from the downfall of major US tech companies like OpenAI and Meta, to the greatest gift to humanity, to a Chinese psyop meant to shake the US to its core. This story is wild, so buckle up.
Just about a week ago, President Trump, Sam Altman (the founder and CEO of OpenAI), the founder of Oracle, and many others got together to announce Project Stargate, a $500 billion investment in AI infrastructure built in the US. That is on top of the billions, and potentially trillions, that have already been spent on GPUs, mostly coming from Nvidia. Right after that, Mark Zuckerberg doubled down on how much his company, Meta, is going to spend on AI infrastructure, stating that they are going to continue to spend many billions of dollars building out energy infrastructure and AI infrastructure. So the theme among the biggest tech companies in the world is: spend as much as we can to win at AI. And then something happened. On January 20th, 2025, a small Chinese research firm called DeepSeek released DeepSeek R1, a completely open-source, open-weights AI model that has the ability to think (also known as test-time compute) and that is directly competitive with, if not slightly better than, the o1 model from OpenAI, which cost hundreds of millions of dollars to train. And just like that, the AI world was flipped upside down.
All of a sudden we had a completely open-source version of a state-of-the-art model that we didn't think we were going to have so soon, let alone one that was absolutely open source and essentially free. The initial reaction was extremely strong. I've made multiple videos about it; I'll drop them in the description below. People looked at this and were stunned. The biggest names in the AI industry realized we now had a completely open-source, state-of-the-art model. And as everybody was taking this in, excited to play around with it and reproduce it, suddenly the tone shifted. In the technical paper released alongside DeepSeek, it was noted that the model was trained for just $5 million. That is a fraction of what every other state-of-the-art model cost to train. Now think about what this means: Meta, Microsoft, OpenAI, and the rest of the Magnificent Seven (basically the seven biggest tech companies in the world) have been investing trillions of dollars building out AI infrastructure, and then all of a sudden this little Chinese company comes along, open-sources a model that's comparable to the best models out there, and not only did they make it completely free, they said it only cost $5 million. Then all of a sudden a lot of analysts are looking at these big companies spending billions of dollars per year and thinking, do we really need that? A lot of people are pointing at these big companies and saying: you guys are about to lose, you've invested so much money and it wasn't even necessary. Now, I will tell you, I do not agree with that whatsoever, but that is a theme going on right now in the AI industry.
Then somebody on Twitter asked: how is DeepSeek actually going to make money if they're giving it away for free? The API endpoint to actually run the model is really, really cheap, and you don't even need it; you can run it on your own hardware. Then this tweet went viral: DeepSeek's holding company, the Chinese parent company, is a quant firm, meaning they are mathematicians tasked with building trading algorithms simply to make money. That's it. "Many years already, super smart guys with top math background, happen to own a lot of GPUs for trading/mining purposes, and DeepSeek is their side project for squeezing those GPUs." Essentially, this is not even the main function of the company; this was a side project. So a handful of smart people got together, figured out how to make a state-of-the-art model incredibly cheaply, upended the entire AI industry, and it was their side project. That's insane to think about. This went viral, and the memes were strong. Let me show you a few of the reactions from people in the industry.
Here's one from Simp for Satoshi: "Sam spent more on this" — referencing an incredible automobile that I know cost multiple millions of dollars, with Sam Altman driving it — "than DeepSeek did to train the model that killed OpenAI." Now again, I don't really believe this; I will explain what I think is going on in a little bit. Here we have Neal Khosla, son of Vinod Khosla, saying: "DeepSeek is a CCP state psyop + economic warfare to make American AI unprofitable. They are faking the cost was low to justify setting the price low and hoping everyone switches to it to damage AI competitiveness in the US. Don't take the bait." There was a community note saying there's zero evidence of this. And that wasn't even the craziest take. In Davos, Alexandr Wang, the CEO of Scale AI, basically called out DeepSeek, saying no, they actually have many more GPUs than they're telling us, simply because there is a US export ban that prevents cutting-edge chips from being shipped to China at scale, and if they admitted in the research paper that they had a bunch of those GPUs, obviously the US would be pretty pissed. In the clip, Alexandr Wang talks about how DeepSeek probably has 50,000 H100s, which are Nvidia's top-of-the-line GPUs, and how they can't talk about it because it goes against the export controls the US has in place. And maybe that's true. Although, again, remember: everything is open-sourced, and DeepSeek went into deep detail about how they actually produced this model for so cheap, and the company Hugging Face is reproducing it right now.
Now let me show you some posts from Emad, the founder of Stability AI, who basically ran the numbers and figured out that, yeah, what they're saying is actually legit: "DeepSeek are not faking the cost of the run. It's pretty much in line with what you'd expect given the data, structure, active parameters and other elements, and other models trained by other people. You can run it independently at the same cost. It's a good lab working hard." Now, that wasn't enough — he didn't put any numbers in that post — but of course he followed up. Check this out: he basically says, for those who want the numbers, here it is — optimized H100s could do it in less than $2.5 million — and he actually used ChatGPT's o1 to figure it out. I'm not going to go through all of it; it's a bit technical for this video.
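Just to give a feel for what that kind of back-of-envelope estimate looks like, here's a minimal sketch. The GPU-hour count and hourly rental rate below are illustrative assumptions, not Emad's or DeepSeek's exact figures; the point is simply that GPU-hours times a rental price is how you land in the single-digit millions.

```python
# Minimal back-of-envelope sketch (assumed numbers, for illustration only):
# total training cost ~= GPU-hours consumed * rental price per GPU-hour.

gpu_hours = 2_800_000       # assumed total GPU-hours for the pre-training run
usd_per_gpu_hour = 2.00     # assumed market rental rate for an H100-class GPU

total_cost = gpu_hours * usd_per_gpu_hour
print(f"Estimated training cost: ${total_cost:,.0f}")   # -> Estimated training cost: $5,600,000
```

Change either assumption and the estimate moves accordingly — that's really all these viral cost numbers are.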
And again, now all of the focus is back on the major tech companies — Anthropic, Meta, OpenAI, Microsoft — who have raised and spent billions and billions of dollars to build out AI infrastructure, only to have the rug pulled out from under them by this tiny Chinese company. Listen to this: "DeepSeek goes omega viral and they can handle the demand on their two Chromebooks they have to use for inference. Meanwhile Anthropic cannot handle the load of their paying customers with billions in funding. Do I get this right?" And that seems to be the sentiment across the board. Here's another one: "I have made over 200,000 requests to the DeepSeek API in the last few hours. Zero rate limiting, and the whole thing cost me like 50 cents. Bless the CCP. OpenAI could never." Now, here's the thing. We've been talking on this channel a lot about test-time compute. A lot of the scaling that's happening in AI right now is not at pre-training — not that $5 million it cost to actually build the model — but at inference: these models can now think, and the more thinking they do, the better the results, and that thinking is actually just compute. So what's interesting is that even at test time — they're hitting the API 200,000 times, zero rate limiting, and extremely inexpensive — unless they are just losing tons of money and have a bunch of GPUs we don't know about, they've figured out something about efficiency that the US companies have not.
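To make that concrete, here's a tiny sketch of why "thinking" is just more inference compute. The token counts and price below are hypothetical, purely to show the shape of the cost: a reasoning model emits a long hidden chain of thought before its visible answer, so every extra thinking token is extra compute you pay for at serving time.

```python
# Hypothetical numbers, for illustration only: cost per query scales with
# how many "thinking" (chain-of-thought) tokens the model generates before
# it produces the visible answer.

usd_per_output_token = 2.00 / 1_000_000    # assumed $2 per million output tokens
answer_tokens = 300                        # assumed length of the visible answer

for thinking_tokens in (0, 2_000, 20_000): # no reasoning vs. light vs. heavy reasoning
    cost = (thinking_tokens + answer_tokens) * usd_per_output_token
    print(f"{thinking_tokens:>6} thinking tokens -> ${cost:.6f} per query")
```

So when a lab serves a thinking model this cheaply at scale, either they're eating the loss or they've genuinely driven the cost per token way down.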
Alexandr Wang followed up with a post: "DeepSeek is a wake-up call for America, but it doesn't change the strategy. USA must out-innovate and race faster, as we have done in the entire history of AI, and tighten export controls on chips so that we can maintain future leads. Every major breakthrough in AI has been American." And then there's this chart that has been going around, claiming that China's DeepSeek could represent the biggest threat to US equity markets, as the company seems to have built a groundbreaking AI model at an extremely low price and without having access to cutting-edge chips, calling into question the utility of the hundreds of billions of dollars' worth of capex being poured into the industry. That is a huge, huge claim. Now, it's one thing to be able to train the model originally at a very cheap and efficient price, but it's another thing to actually be able to run inference at an extremely cheap and efficient price. I said earlier I don't believe it, and let me tell you why. There are two possibilities, so let's go down the two paths. First, let's assume they really were able to figure out how to make this model extremely cheaply — then we're going to be able to replicate that. Awesome, right? Everybody wins. That's the power of open source. And at inference time, at thinking time, if this model is able to run inference extremely cheaply, then we get to Jevons Paradox: as the cost per unit of any technology decreases, the total usage and the total spend actually increase. We've talked about that on this channel. That is because as the unit cost of any tech decreases, the number of use cases it can be applied to in a positive-ROI way increases dramatically. That's what we've seen with every technology throughout history.
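Here's a toy illustration of that dynamic, with completely made-up numbers: if demand grows faster than the unit price falls, total spend on the technology goes up even as each unit gets cheaper.

```python
# Jevons Paradox, illustrated with made-up numbers: the unit price falls 100x,
# demand grows far more than 100x, so total spend rises rather than falls.

price_per_million_tokens = [10.00, 1.00, 0.10]   # hypothetical unit price (USD) over time
million_tokens_demanded  = [1, 50, 5_000]        # hypothetical demand at each price point

for price, demand in zip(price_per_million_tokens, million_tokens_demanded):
    print(f"${price:>5.2f}/M tokens -> {demand:>5}M tokens used -> total spend ${price * demand:,.0f}")
```

Whether AI inference actually behaves this way is the open question, but that's the argument for why cheaper models could mean more total compute spend, not less.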
Then let's think about the other path: they actually do have a bunch of GPUs powering it, and they're simply faking how efficient it is. Well, first of all, we're going to figure that out, because we have AI companies throughout the world replicating DeepSeek R1 right now. But let's just assume they are faking it. Then that's fine — all of this investment is still very valid. And even if it really is that efficient, all of this huge investment by these AI companies in AI infrastructure is still valid, because at the end of the day, whoever has the most compute will have the smartest model. It doesn't matter if it costs $100 per token or a fraction of a penny per token; the more compute, the better, and whoever has the smartest AI will win. And here's Garry Tan, the president of Y Combinator, basically saying the same thing, in reference to the chart we just talked about claiming this is a big threat to US equity markets: "Do people really believe this? If training models get cheaper, faster and easier, the demand for inference (actual real-world use of AI) will grow and accelerate even faster, which assures the supply of compute will be used." Yes, that is the way to think about it; I agree wholeheartedly. But not everybody agrees. Chamath Palihapitiya — billionaire investor, former early Facebook employee, and All-In podcast bestie — has the exact opposite to say, and he actually broke it down pretty well. In his first point, he's saying that in the 1% probability that the CCP has all of these chips that they shouldn't, we need to go investigate that. So that's point one. Next, he talks about training versus inference — and we are in the era of inference right now.
"We always knew this day would come, but it probably surprised many that it would be this weekend, with a model this cheap. Many new products and experiences can now emerge trying to win the hearts and minds of the global populace. Team USA needs to win here. To that point, we may still want to export-control AI training chips, but we should probably view inference chips differently; we should want everyone around the world using our solutions over others." Now I'm going to jump down to point 4, because this is interesting and it's the part that I really disagree with: "There will be volatility in the stock market as capital markets absorb all of this information and reprice the values of the Mag 7" — that's the Magnificent Seven, companies like Tesla and Meta and Microsoft, so keep that in mind — "Tesla is the least exposed; the rest are exposed as a direct function of the amount of capex they have publicly announced." Translating that: it basically means these companies' stock might go down because of how much they have invested in AI infrastructure, because if everything is cheaper now, why did they spend so much? I do not agree with that at all.
Again, let's look at Jevons Paradox: the cheaper the tech, the more it's going to be used, the more inference needs to be served, and thus all of that supply of GPUs is going to be used. "Nvidia is the most at risk, for obvious reasons. That said, markets will love it if Meta, Microsoft, Google, etc. can win without having to spend $50 to $80 billion per year." The markets might love that, but that is not going to be the case. Again, whoever has the smartest AI will win. Eventually, when we reach artificial superintelligence, it is literally a battle of who has the smartest AI. And what does that take? The most inference, or the most compute in general. And what does that take? The most chips, the most spend on chips. If we find really efficient ways to use these chips, great, everybody wins. But ultimately, the cumulative amount of chips — or compute — is really what's going to matter. He goes on to criticize the US, saying that we've been asleep, and I'll just read this because it's an interesting take: "The innovation from China speaks to how asleep we've been for the past 15 years. We've been running towards the big-money, shiny-object spending programs and have thrown hundreds of billions of dollars at a problem, versus thinking through the problem more cleverly and using resource constraints as an enabler." Now, a key concept to know is that when people are faced with bigger restrictions and bigger constraints, they tend to get more creative; they tend to be able to extract more efficiency out of less. That's what he's really referring to here. I think the quote is "constraint is the mother of innovation," something like that.
But not everybody thinks it's just conspiracy theories and the end of US tech companies. Yann LeCun, Meta's chief AI scientist and a big proponent of open source, has this to say: "To people who see the performance of DeepSeek and think 'China is surpassing the US in AI': you are reading this wrong. The correct reading is: 'open-source models are surpassing proprietary ones.' DeepSeek has profited from open research and open source (e.g. PyTorch and Llama from Meta). They came up with new ideas and built them on top of other people's work. Because their work is published and open source, everyone can profit from it. That is the power of open research and open source." And I could not agree more. This is a huge win for open source. It is going to allow many companies to start competing with the closed frontier models by having open-source, state-of-the-art models. This story is still unfolding, and it has been crazy to watch the AI industry react to the news that essentially everything they thought they knew might actually be changing right now. So what do you think?
Do you think they have more GPUs than they're letting on? Do you think they were able to basically come up with this amazing efficiency with just a handful of people as a side project? Did China just jump into the lead in AI, or is this just a great gift to the world because it's open-sourced? I'm going to continue following up on this story. I am enthralled with it; I am absolutely fascinated by what's happening right now in the world, and I hope I broke it down well for you. If you enjoyed this video, please consider giving a like and subscribe, and I'll see you in the next one.