hey I'm Dave welcome to my shop I'm Dave plumber a retired software engineer from Microsoft going back to the MS DOS at Windows 95 days and today we're tackling a seismic shift in the world of technology the release of China's open source AI model deep seek R1 this development has been described as nothing less than a Sputnik Moment by Mark andreon and for good reason just as the launch of Sputnik challenged assumptions about American technological dominance in the 20th century deep SEC car1 is forcing a reckoning in the 21st for years many believed that the
race for AI Supremacy was firmly in the hands of the established players like open Ai and anthropic but with this breakthrough a new competitor has not just entered the field they've also seriously outpaced expectations if you care about the future of AI Innovation and Global technological competition you'll want to understand what deep seek R1 is why it matters whether it's just a giant scop and what it means for the World At Large let's dive in to set the stage here's the part that really upset the industry and sent the stocks of companies like Nvidia and
Microsoft reeling not only does deep seek R1 meet or exceed the performance of the best American AI models like open AI 01 they did it on the cheap reportedly for under $6 million and when you compare that to the tens of billions already invested if not more here to achieve similar results not to mention the $500 billion discussion around Stargate it's cause for alarm because not only does China claimed to have done it cheaply but they reportedly did it without access to the latest of nvidia's chips if true it's akin to building a Ferrari in
your garage out of spare Chevy parts and if you can throw together a Ferrari in your shop on your own and it's really just as good as a regular Ferrari what do you think that does to Ferrari prices so it's a little bit like that and just what is deep SE car1 it's a new language model designed to offer performance that punches above its weight trained on a smaller scale but still capable of answering questions generating text and understanding context and what sets it apart isn't just the capabilities but the way that it's been built
deep seek is designed to be cheap efficient and surprisingly resourceful leveraging larger foundational AIS like open AI gp4 or meta llama as scaffolding to create something much larger let's unpack that because at its core deep seek R1 is a distilled language model when you train a large AI model you end up with something massive hundreds of billions if not a trillion parameters consuming terabytes of data and requiring a data centers worth of gpus just a function but what if you don't need all that power for most tasks and that's where the idea of distillation comes
in you take a larger model like a gp4 or the 671 billion parameter Behemoth R1 and you use it to train the smaller ones it's like a master Craftsman teaching an apprentice you don't need The Apprentice to know everything just enough to do the actual job really well deep seek R1 takes this approach to an extreme by using larger models to guide its training deep seek creators have managed to compress the knowledge and reasoning capabilities of much bigger systems into something far smaller and more lightweight the result a model that doesn't need massive data centers
to operate you can run the smaller variants on a decent consumer grade CPU or even a b laptop and that's a GameChanger but how does this work well it's a bit like teaching by example let's say you have a large model that knows everything about astrophysics Shakespeare and python coding and instead of trying to replicate that raw computational power deep seek R1 is trying to mimic the outputs of the larger model for a wide range of questions and scenarios by carefully selecting examples and iter ating over the training process you can teach the smaller model
to produce similar answers without needing to store all that raw information itself it's kind of like copying the answers without the entire library and here's where it gets even more interesting because deep seek didn't just rely on a single large model for the process it used multiple AIS including some open source ones like meta llama to provide diverse perspectives and solutions during the training thinking of it assembling like a panel of experts to train one exceptionally bright student by combining insights from different architectures and data sets deep seek R1 achieves a level of robustness and
adaptability that's rare in such a small model it's too early to draw very many conclusions but the open- source nature of the model means that any biases or filters built into the model should be discoverable in the publicly available weights which is a fancy way of saying that it's hard to hide that stuff when the model is open source in fact one of my first tests was to ask deep seek what famous photo depicts a man standing in front of a line of Tanks it correctly answered the tonan Square protest tests the significance of the
photo who took it and even the censorship issues surrounding it of course the online version of Deep seek may be completely different cuz I'm running it offline locally and who knows what version they get within China but the public version that you can download seems solid and reliable so why does all this matter well for one it dramatically lowers the barrier to entry for AI instead of requiring massive infrastructure in your own nuclear power plant to deploy a large language model you could potentially get by with a much smaller setup that's good news for smaller
companies research Labs or even hobbyists looking to experiment with AI without breaking the bank in fact I'm running it on our AMD thread Ripper that's equipped with an Nvidia RTX 68 GPU that has 48 GB of vram and I can run the very largest 671 billion parameter model and it still generates more than four tokens per second and even the 32 billion version runs nicely on my MacBook Pro and the smaller ones run down to the Ora Nano for $249 but there's a catch building something on the cheap has some risks for starters smaller models
often struggle with the bread and depth of knowledge that the larger ones have they're more prone to hallucinations generating confident but incorrect responses sometimes and they might not be as good at handling highly Specialized or nuanced queries additionally because these smaller models rely on training data from the larger ones they're only as good as their teachers so if there are errors or biases in the large models that they train on those issues can trickle down into the smaller ones and then there's the issue of scaling deep seeks efficiency is impressive but it also highlights the
trade-offs involved by focusing on cost and accessibility deep seek R1 might not compete directly with the biggest players in term of Cutting Edge capabilities instead it carves out an important Niche for itself as a practical cost-effective alternative in some ways this approach reminds me a bit of the early days of personal Computing back then you had massive mainframes dominating the industry and then Along Came these scrappy little PCS that couldn't quite do everything but were good enough for a lot of the work fast forward a few decades in the PC revolutionized Computing deep seek might
not be GPT 5 but it could pave the way for a more democratized AI landscape where Advanced tools aren't confined to a handful of tech Giants the implications here are huge imagine AI models tailored to specific Industries running on local hardware for privacy and control or even embedded in devices like smartphones and smart home hubs the idea of having your own personal AI assistant one that doesn't rely on a massive Cloud backend suddenly feels a lot more attainable of course the road ahead isn't without its challenges deep seek and models like it must prove that
they can handle real world tasks reliably scale effectively and continue to innovate in a space dominated so far by much larger competitors but if there's one thing we've learned from the history of technology is that Innovation doesn't always come from the biggest players sometimes all it takes is a fresh perspective and a willingness or sometimes the necessity to do things differently deep SE car1 signals that China is not just a participant in the global AI race but a formidable competitor capable of producing Cutting Edge open source models for American AI companies like open AI Google
deep minds and anthropic this creates a dual challenge maintaining technological leadership and justifying the price premium in the face of increasingly capable cost-effective Alternatives so what are the implications for American AI well open source models like deepsea car1 allow developers worldwide to innovate at lower cost this could undermine the competitive advantage of proprietary models particularly in areas like research and small to medium Enterprise adoption us companies that rely heavily on subscription or API based Revenue could feel the squeeze but intentionally dampening investor enthusiasm the release of deep SE R1 is open source software also democratizes
access to powerful AI capabilities companies and governments around the world can build upon its foundation without the licensing fears or the restrictions imposed by us firms this could accelerate AI adoption globally but reduce demand for us developed models impacting revenue streams for firms like open Ai and Google cloud in the stock market companies heavily reliant on AI licensing Cloud infrastructure and invidious chips or API Integrations could face downward pressure as investors factor in lower projected growth or increased competition now in the intro I made a little side reference to the potential of a scop angle
and while I'm not much of a conspiracy theorist myself some have argued that perhaps we should not take the Chinese at their word when it comes to how the model was produced if it really was produced on second tier hardware for just a few million dollars it's major but some argue that perhaps China invested heavily at the state level to assist hoping to upset the status quo in America by by making what is supposed to be very hard look supposedly cheap and easy but only time will tell so that's deep seek R1 in a nutshell
a scrappy little AI punching above its weight built using clever techniques and designed to make advanced AI accessible to more people than ever before it's not perfect it's not trying to be but it's a fascinating glimpse into what the future of AI might look like lightweight efficient and a little rough around the edges but full of potential now if you found this little explainer on deep seek to be any combination of informative or entertaining remember that I'm mostly in this for the subs and likes so I'd be honored if you consider subscribing to my channel
to get more like it and there's also a share button down in the bottom here so somewhere in your toolbar there'll be a forward icon which you can use to click on to send this to somebody else that you think probably wants to be educated and just doesn't know about this channel so if you want to tell them about deep SE car1 send them a link to this video if you have any interest in matters related to the autism spectrum check out the free sample of my book on Amazon it's everything I know now about
living your best life on the spectrum that I wish I'd known long ago in the meantime and in between time hope to see you next time right right here in Dave's Garage do it l do it do it