DeepSeek has hit artificial intelligence like a tsunami, and Silicon Valley is still in shock. In less than a month, a small AI laboratory whose icon is a whale has completely turned the table of artificial intelligence upside down. The first blow came on Christmas Day, with a product similar to ChatGPT; the nuclear charge was in the fine print: in its documentation, DeepSeek claimed it had achieved a level of intelligence comparable to ChatGPT with an investment of only 5 million dollars. The second blow was even greater. It came on January 20, a few hours after Trump presented, with Sam Altman, the Stargate project, the largest financial investment in artificial intelligence in history: 500 billion dollars. That day, China sent a war message to the United States in the form of a language model. The message said something like: "Despite your restrictions on the use of Nvidia chips, we are able to create an AI model as good as OpenAI's o1. By the way, using o1 costs $200 a month; ours is free. There you have it: download it, manipulate it, market it if you want." The result? Millions of people around the world already use DeepSeek; its mobile application is the most downloaded. And we are all asking ourselves a lot of questions. How have they achieved it? Is what they say true? Are users sending data to China, to the Communist Party? And above all, do the gigantic investments that until recently we believed necessary to achieve general artificial intelligence still make sense? Encouraged by hundreds of messages received on social networks, I have had to change this channel's plans: in record time, my team and I have prepared the most complete guide to this phenomenon, and in this video we are going to try to answer rigorously all the questions the world has about DeepSeek. Hello, I'm Gustavo Entrala. By day, I help large companies and startups design their future; and at night I practice a hobby I love: teaching what I know about that future to those who always want to be one step ahead. If you are interested in the future, subscribe, and hit the bell so YouTube notifies you when a new video is released. What is DeepSeek, and why is everyone talking about it? DeepSeek is several things at once. It is an artificial intelligence laboratory in China, certainly not very big, with no more than 200 employees. DeepSeek is also the brand of two artificial intelligence models: V3, which appeared at Christmas, and R1, which appeared on January 20, the same day Donald Trump was inaugurated as president of the United States. DeepSeek is also a website, deepseek.com, and an application you can download from the Google or Apple app stores. DeepSeek reached a very high level of notoriety recently, when it launched the R1 model, a model capable of reflection, capable of elaborating a chain of thought, and which
is practically equivalent to OpenAI's o1 model. Using o1 costs $200 per month; but what is relevant, what has caught the world's attention and caused an earthquake in the stock markets these days, is the fact that DeepSeek offers this most advanced model completely free of charge. The announcement of the model was at first overshadowed by the other announcement, the investment network promoted by the US presidency called Stargate, in which several companies are going to invest 500 billion dollars. But as the days went by, analysts and programmers began to analyze DeepSeek's model thoroughly, and they realized it had been trained at a cost of 5.3 million dollars, in only two months. The shock was immediate. Of course: if the tycoons of the US technology companies have been saying for months that the next model will cost more than a billion dollars to train, and now a Chinese company shows that with an infinitely smaller investment, around five million dollars, and in a very short period, it can launch a reasoning artificial intelligence model, that eliminates the competitive advantage of OpenAI, Sam Altman's firm, which is estimated to be losing 3.5 billion dollars a year. This leads to many questions. The world wonders whether Big Tech will keep buying chips from Nvidia at the rate at which they have been doing so, and whether Nvidia can maintain its hegemony selling chips that the artificial intelligence companies perhaps no longer need. And the answer to this doubt was the biggest single-day stock market fall ever produced in one company: on January 27, 2025, Nvidia lost 600 billion dollars of market value. Furthermore, as that internal reflection continued in the US technology world, new arguments were added. How is it possible that DeepSeek, based in China and therefore subject to the American government's limits on importing the most powerful Nvidia chips, has managed, with much lower-capacity chips, to emulate OpenAI, Anthropic, or Elon Musk's xAI? The state of shock gradually turned into collective hyperventilation, into the biggest self-esteem crisis Silicon Valley has ever seen. "What are we doing?", they must be thinking in Silicon Valley. "How is it possible that we have been so wrong?" And how is it possible, in the context of the artificial intelligence war between China and the United States, when the restrictions until now seemed to have left China far below the capacity of American technology companies, and also far behind in terms of timelines? To the extent
that China lacked the necessary technology, the thinking went, it would take it much longer to reach the quality level of American artificial intelligence. Second question: what restrictions does the United States impose on China in the purchase of chips for artificial intelligence? The United States considers the most advanced artificial intelligence chips, especially Nvidia's, a strategic asset for itself and its allies. The United States is living a cold war with China in the military, commercial, and technological arenas, and it thinks it is essential to beat China in the field of artificial intelligence, because AI applied to military weapons will make a difference. Until the appearance of DeepSeek, it seemed the United States had an advantage of at least five years over China thanks to the highly advanced level of Nvidia's chips. China cannot manufacture chips at the level of Nvidia's GPUs: it has no access to the basic lithography technology for those chips, nor to the most advanced patents. There is a manufacturer, Huawei, with some artificial intelligence chips whose capacity is still a few years behind Nvidia's most advanced ones, and it was thought that this limitation on the Chinese market would make it impossible for Chinese artificial intelligence research, technology, and products to advance. What are those restrictions? These limitations were first introduced by President Joe Biden in October 2022, subjecting Nvidia chips to US export controls toward China. As a result, Nvidia began manufacturing artificial intelligence chip models only for the Chinese market, with much lower memory capacity and data transmission speed. That is how the A800 and the H800 chips emerged. The latter, the H800, is the counterpart of the H100, the chip most used today in US artificial intelligence data centers; but the H800 has restrictions on the volume of data it can handle and on its transfer speed. It is like selling a Ferrari that cannot go over 100 kilometers per hour. And careful: these chips are not
cheap. They are much less capable than Nvidia's most advanced ones, but they are not cheap: an H800 chip has reached a price of more than 70 thousand dollars per unit on the Chinese market. Has there been smuggling? The proof that smuggling has existed is that 15% of Nvidia's global chip sales are destined for Singapore; in other words, Singapore spent more than 7 billion dollars on Nvidia chips in 2024. Is it a coincidence that Singapore imports chips far above its needs? No, it is not. It is suspected that many Chinese technology companies imported these chips into Singapore through third-party companies and then brought them to mainland China. Furthermore, it is known that the Chinese technology companies, the Baidus, the Alibabas, the Tencents, and so on, have been using cloud services with latest-generation Nvidia GPUs to train their models. This was known. And to close the legal loopholes that allow China to use Nvidia's most advanced chips, Joe Biden, just before leaving the presidency, imposed new restrictions. Under them, only the United States and the 18 countries it considers friends can buy Nvidia chips without any type of limitation. Below them is a very large group of countries the United States considers neutral, which can buy Nvidia chips only after first asking the US Department of Commerce for permission. And finally there is a list of seven countries, including the usual suspects: China, Russia, Iran. These countries cannot acquire Nvidia's advanced-generation chips in any way. Ah, so it is very clear: DeepSeek has used chips that are illegal to import? Let's focus the conversation on what DeepSeek says it has done
in this paper I am showing you now, the paper that explains how its models were built. The document explains very well how they designed these models, how they trained them, and how inference is done. We are therefore going to stick to what its documentation says and to what is said by the experts who have already downloaded the model, crunched it, gutted it completely, and understand how it works inside. With those two things in mind, it is proven that in the final training phase of DeepSeek's V3 model they used a cluster, that is, a group of interconnected chips, of 2,048 Nvidia H800s. That is completely proven. A different matter is whether, in earlier phases of training, DeepSeek spent much more money and much more time than it claims, and whether higher-capacity chips were used in those earlier research phases. That statement connects with a persistent rumor, apparently corroborated in some interviews with DeepSeek's founder that exist on the internet, according to which he came to buy 10,000 A100 chips, the generation prior to the H100s used today. Those chips would be worth a very considerable amount, between 100 and 300 million dollars. However, those who have judged the model, studied it in depth, and examined its innovations affirm that the model's architecture and innovations would not make sense if the A100 chips had been used. What DeepSeek tells us is that it spent 5.3 million dollars. And how does it calculate that expense? From what it would cost DeepSeek to rent these lower-capacity chips, the ones that can be imported into China, for the whole training phase, at a price of 2 dollars per GPU-hour. And we know that DeepSeek did not actually rent these chips: it owns them; this has been officially confirmed by Nvidia. What DeepSeek did was buy them, at the price I mentioned earlier of around 70,000 dollars per unit; the total purchase of the necessary chips would then come to around 40 million dollars. That amount is still much lower than what has been invested to train the best-known artificial intelligence models: it is estimated that training DeepSeek's model cost 3% of what it cost OpenAI to train o1. Even inflated, that figure would still be very far from, for example, the billion dollars Elon Musk has had to invest.
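That rental arithmetic can be checked on the back of an envelope. The sketch below assumes the reported 2,048-chip H800 cluster ran around the clock for about 54 days (my assumption, chosen as "just under two months") at the stated rate of 2 dollars per GPU-hour:

```python
# Rough reconstruction of DeepSeek's claimed training cost.
# Assumed inputs: ~54 days of round-the-clock use of the
# 2,048-GPU H800 cluster, billed at $2 per GPU-hour.
gpus = 2048
days = 54                      # assumption: just under the "two months" cited
rate_per_gpu_hour = 2.0        # dollars

gpu_hours = gpus * days * 24   # total GPU-hours consumed
cost = gpu_hours * rate_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours -> ${cost:,.0f}")
# 2,654,208 GPU-hours -> $5,308,416, roughly the $5.3M claimed
```

Note that this arithmetic covers only that final training run; the research, ablation, and data costs discussed next fall outside it.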
His data center, Colossus, which I have told you about in another video, has cost him, as I say, a billion dollars; and the figure is even further from the 500 billion dollars that the Stargate project is expected to invest to build the largest data center in history in a Texas town. So, although what the paper tells us about the cost of developing DeepSeek is true and proven, the paper only talks about the final training phase. Which costs would be left out? Left out, and DeepSeek itself acknowledges this, are all the costs of prior research. Also left out are what are called, and the word is horrible, model-ablation experiments: a chain of experiments run to verify which of the innovations being introduced work and which do not. This ablation phase takes a lot of time and a lot of investment money. Nor do the 5.3 million include the cost of developing the algorithms that were used, nor the cost of the data used to train the model, of which we only know that there are 14.8 trillion tokens. The computation of the model's cost therefore includes no data processing, no earlier training of the models, no reinforcement learning, and no distillation either, a very interesting factor I will talk about below. "So, Gus, DeepSeek must be much worse than OpenAI or Gemini because of the low power of its chips, right?" Well, no. The quality of a model is measured through what are known as benchmarks, comparison metrics. There are
several standard sets of comparison metrics. The paradox of this case is that the benchmark DeepSeek used to calibrate the quality of its models is OpenAI's "Strawberry" benchmark: DeepSeek has used OpenAI's own benchmark system to measure itself against OpenAI. And what do these benchmarks tell us? The Strawberry suite is made up of a mathematics test, a standard American exam called the AIME; a physics, chemistry, and biology test called the GPQA, which is also a standard in the USA; two programming tests; and a logic-and-reasoning test called Zebra. And how does DeepSeek compare with OpenAI? They are practically tied: as you can see in this chart, DeepSeek wins in three tests and OpenAI wins in the other three. Hey, how about a break in the video to tell you about the back room of this channel? We have started using several AI tools to improve our productivity, and we will soon post a video explaining which tools they are and how we use them. By the way, to those who ask in the comments whether I am
actually an AI avatar, I will answer you officially today: I am flesh and blood. These AI applications, like many others we use to manage the channel, such as Slack, Gmail, or Zoom, require user credentials, and being able to share those credentials securely is especially important for this channel's workflow. We have started using NordPass, and it has been a radical change, because NordPass generates and securely stores passwords for your whole team in an encrypted cloud. Cross-device synchronization works perfectly on both desktop and mobile. The data-leak scanner this application includes alerts us if any of our credentials, usernames, or passwords have leaked onto the dark web. Finally, secure sharing lets me share credentials and credit card details safely with any of my teammates without compromising security, because we all know that sending passwords or credit card details by email, or writing them down on paper, is very risky. To eliminate the risks of password management in your company, I highly recommend trying NordPass. They offer a free trial with no credit card required, so you can see how NordPass Business transforms the way you work with your company's applications. For a limited time, NordPass is offering a free three-month trial of the NordPass Business product, only for viewers of this channel; to access the offer, use the code GUSTAVO at nordpass.com/gustavo. And now, back to our video. So, how have they managed to overcome the limitations they had? I will explain it in simple terms: they trained their models by introducing very important innovations in the algorithms, and they managed to squeeze the most out of the capacity of their chips, which as you know are limited, by programming very close to the machine. I will detail
this explanation in a more technical way in the answer to a question later in this video, but now I'll give you an appetizer. On the one hand, they have used a technique called distillation, in which a new model, the apprentice, dialogues with an already existing model; imagine the existing model is OpenAI's GPT-4o, acting as the teacher. The two models open a dialogue in which questions are asked, many questions, millions of questions: the apprentice asks a question, the teacher answers it, and the apprentice distills the knowledge of the older, superior model. Has DeepSeek built on previous work from OpenAI's and Meta's models? Everything seems to indicate that it has, and OpenAI affirms it has evidence that this happened. On the other hand, to design R1, its model capable of reflecting and elaborating a much more accurate answer, the model that reasons, they used a completely revolutionary technique: they created a reward system so that the model learns to think for itself. This methodology is called reinforcement learning, that is, learning through rewards, and they have applied it with very good results.
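The teacher-and-apprentice dialogue I just described can be sketched in a few lines. This is only a toy illustration of the idea (collect the teacher's answers, then fine-tune the apprentice on them), not DeepSeek's actual pipeline; `teacher_answer` is an invented stand-in for calls to a real large model:

```python
# Toy sketch of distillation: the "teacher" (a large existing model)
# answers questions; the "apprentice" is later trained on those pairs.
def teacher_answer(question):
    # Stand-in for querying a real teacher model's API.
    return f"detailed answer to: {question}"

def build_distillation_dataset(questions):
    # A real pipeline would collect millions of such pairs.
    return [(q, teacher_answer(q)) for q in questions]

dataset = build_distillation_dataset(["What is 2+2?", "Explain gravity."])
print(dataset[0])  # ('What is 2+2?', 'detailed answer to: What is 2+2?')
# The smaller apprentice model would now be fine-tuned on `dataset`.
```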
And finally, they have simplified the computation needed for inference, which is what happens when we send a query to an artificial intelligence; inference is therefore a different process from training. Every time we send a query to an artificial intelligence or a chatbot, we are generating inference, and DeepSeek has managed to simplify it by applying completely new compression techniques.
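To build intuition for why compression cuts inference costs, here is some illustrative arithmetic in the spirit of the latent-attention compression DeepSeek's paper describes: instead of caching full keys and values for every attention head on every generated token, a much smaller shared latent vector is cached. All the dimensions below are invented for the example:

```python
# Illustrative (made-up) dimensions for a per-token key/value cache.
n_heads = 128        # attention heads
d_head = 128         # dimension per head
d_latent = 512       # width of an assumed compressed shared latent

full_kv_per_token = 2 * n_heads * d_head  # keys + values across all heads
latent_per_token = d_latent               # one compressed vector instead

ratio = full_kv_per_token / latent_per_token
print(ratio)  # 64.0: the cache shrinks 64x in this toy configuration
```

A smaller cache means each query touches less memory, which is a large part of what makes serving each query cheaper.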
What implications does this change have for artificial intelligence, and for the global geopolitical scene? Until now, we all thought the United States had absolute dominance in generative AI, with some exceptions: France has the Mistral model, and there are other models in other countries. The emergence of DeepSeek tells us that, although the United States is still the leader, the quality of the best models can be replicated sooner and with fewer resources. What direct impact does this have? Lower training costs thanks to new techniques, and lower inference costs for the queries we make to artificial intelligence chatbots. This represents a revolution: as the cost of training a model drops dramatically, so does the cost of putting artificial intelligence into service on the market, and of course more people, more companies, more operators will be able to do it. Second, the emergence of DeepSeek, and the fact that its model is available as open source, reopens the debate between closed and open artificial intelligence models. If there are open models with the capability of DeepSeek R1, completely free, what is going to happen? Probably other companies will develop highly capable open models, and that is already happening. This means competition expands, and also that more and more companies and organizations, instead of hiring a paid cloud service to use artificial intelligence, will start installing these artificial intelligences locally: on-premise, as it is technically known. The cost will be lower, and these companies also make sure to protect both their organization's private data and all the knowledge generated within it.
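As a concrete sketch of what on-premise use looks like: an open model served inside your own network typically exposes an OpenAI-style HTTP endpoint, so the query never leaves the company. Everything here is illustrative; the port and model tag are assumptions (they match Ollama's defaults at the time of writing), and the actual network call is left commented out:

```python
import json

# Build an OpenAI-style chat request for a locally hosted model.
# Hypothetical endpoint and model tag; adjust to your own deployment.
payload = {
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "Summarize our Q3 figures."}],
}
body = json.dumps(payload).encode("utf-8")

# import urllib.request
# urllib.request.urlopen("http://localhost:11434/v1/chat/completions", body)

print(json.loads(body)["model"])  # deepseek-r1
```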
At a geopolitical level, I think this balances the scales of innovation in artificial intelligence between the United States and the rest of the world, and specifically, in the first instance, between the United States and China. Any country can now develop its own models, overcoming the restrictions the United States imposes on certain countries, and knowing that the cost of training and inference for those models is going to be much cheaper. Who is behind DeepSeek? DeepSeek is a side project of an organization called High-Flyer, which operates a quantitative investment fund. Let's go step by step. DeepSeek is a laboratory founded by a man named Liang Wenfeng, whom you are seeing on screen now. He began his professional career by creating a quantitative investment fund: a fund in which mathematical calculations, algorithms, and therefore artificial intelligence play a leading role in investment decisions. He was born in 1985, so he is now 40 years old, in Zhanjiang, a port city in southern China. He was a very outstanding student, like all of these founders: he was already studying calculus in high school, and he then went to Zhejiang University, where he studied artificial intelligence. While still very young, he founded this investment fund, High-Flyer, which currently manages around 8 billion dollars in assets. It was in that environment that Liang Wenfeng learned what he needed about artificial intelligence, acquired, it is rumored, a large quantity of Nvidia chips to operate his investment fund, and began hiring Chinese graduate students in artificial intelligence, putting them to work on an idea of his: a model of general artificial intelligence, that is, artificial intelligence at another level compared to today's models. He is known for being a very discreet and practical leader who actively participates in DeepSeek's research process. There are some interviews where you can get to know him a little: they say he reads and writes scientific articles, writes code, and takes part in group discussions with his team. The people who work with him describe him as just another engineer, much more than as a manager or a businessman.
They also say he has a great capacity for engineering, infrastructure, modeling, and mobilizing resources; in short, a great mind. And does he want to compete with OpenAI, Google, and Microsoft? Well, everything seems to indicate that he does not. At this time, DeepSeek is a side project of the High-Flyer investment fund, so the firm's money is generated by the fund. And everything also seems to indicate that what the founder, Liang Wenfeng, wants to do is what Elon Musk failed to do with OpenAI. All the steps he has taken, the decisions he has made, the documentation available at this time, and the strategy behind the V3 and R1 models seem to indicate that what he wants is to create a very advanced open research entity in artificial intelligence that offers the world its innovations. In other words, what OpenAI was before Sam Altman decided to change its orientation. That is why DeepSeek was born as an open-source project: they have given it to the world. Later I will explain in detail what it means for this project to be open source. DeepSeek has no subscriptions; it does not charge for the service; it is totally free. Wenfeng, from what I have been able to investigate, is known for his commitment to open-source technology and for a desire, and this seems very important to me, to challenge the dominance of the large American technology companies. He believes open source is a way to attract talent and promote innovation, and he seems to believe that DeepSeek's value lies in its team and its capacity for innovation, which is why the company focuses on fundamental research more than on commercial applications. Ten days ago, Liang Wenfeng participated in an event with Li Xinping and stated that he wants to keep DeepSeek as a completely open artificial intelligence model; in other words, he has in principle committed to not closing the model to access by other companies and the public. He has also said that DeepSeek's mission is to unravel the mystery of artificial general intelligence out of pure curiosity. This is what we know, this is what this person says, and time will tell whether what he says is
in good faith or not. What does it mean that DeepSeek is open source? DeepSeek is distributed under the MIT open-source license, the simplest license that exists and also the most permissive on the market. The MIT license allows you to modify DeepSeek, market it, improve it, and download it to a computer; some people have already managed to run the smallest version on a Raspberry Pi, and many things, many experiments, are being done with DeepSeek. Some American companies, such as Perplexity, are already beginning to offer the DeepSeek model in the United States as one of the model options you can use when you run a content search in Perplexity, and Microsoft has incorporated DeepSeek into its Azure cloud so that any company can use it in its developments. The only requirement to do whatever you want with DeepSeek, even market it, is to cite the origin of the model. Many people are concerned that DeepSeek censors content and that we are sending our data to China through DeepSeek. Is this true? We have to distinguish two things: on the one hand, accessing DeepSeek through its own services, meaning deepseek.com, the website through which we can query DeepSeek, and the application DeepSeek offers on both Android and iPhone; and on the other hand, accessing DeepSeek through other services. I mentioned before that DeepSeek is open-source technology, which allows anyone to download DeepSeek onto a computer, a server, or a cloud server. With that distinction made: if you use deepseek.com or the DeepSeek application,
you are sending information to their servers, and a lot of information, because every query we make to DeepSeek, or to any other artificial intelligence model, involves many words. We make very elaborate queries; we ask very intimate questions; we may even ask questions that affect our company or our business, or upload our company's balance sheet or customer list. If we do that through deepseek.com or DeepSeek's apps, we are sending all that information to China, and the Chinese government has express legal permission to store and use all that data. It would therefore be very prudent for companies and institutions to ban access to deepseek.com and the DeepSeek application on their corporate networks. Additionally, some queries on the .com and the apps are censored, and DeepSeek does not respond: up to 1,500 questions have been detected that DeepSeek refuses to answer. It is well known that it does not respond to any query about what happened in Tiananmen Square, that it does not answer questions about members of the Chinese Communist Party, and that it gives you the Communist Party's version of Taiwan's status as an independent country. In other words, what DeepSeek offers us through its .com and its applications is the same thing DeepSeek offers its users in China. And you know that in China there is a restricted version of the internet that does not allow the use of YouTube or Facebook, and that imposes a wall so that all communications between users and all the information a user consumes on the internet in China is controlled by the government. But if you use DeepSeek locally, on your computer, or through a provider
outside of China, you are not sending any data to China, and the restrictions on questions can also be removed. Who wins with DeepSeek, and who loses? Who wins? China wins: it has shown it has the human talent and the training needed to operate at a very advanced level in the field of artificial intelligence. Not only does it have the talent; it also has the ingenuity to work around the restrictions the United States has placed on using the most advanced Nvidia chips. Furthermore, China does have a very powerful electrical system, fed by various energy sources: 60% comes from coal, and it is the world leader in hydroelectric power, wind power, and solar panels; in addition, its number of nuclear installations keeps growing. In other words, China does have an electrical system prepared for large investments in artificial intelligence. With this, artificial intelligence also becomes globalized and democratized: it no longer costs a billion dollars to create a foundational model or modify one, and countries under chip-import restrictions can also develop their own models with less advanced chips. Open-source models also win with DeepSeek, and the great beneficiary of this change is Meta, which from the beginning chose to create open-source artificial intelligence models. This also benefits Microsoft, which has enormous investments in its Azure cloud business, wants to provide inference services, and is investing 80 billion dollars this year in those services. Amazon also wins: Amazon did not have a frontier model, a foundational model of its own, and of course, to the extent that many models like DeepSeek are hosted on Amazon, and that there are many companies and
people who use these services through Amazon, well, Amazon and all the cloud software companies benefit from this change. Organizations that want to use artificial intelligence with greater security also win. An immediate consequence of this change, as I have said, is that inference is going to get cheaper and cheaper. Those who have data and those who have a product to offer also win: the data YouTube has, the data a large pharmaceutical company has, or the data a state has, for example, acquire a value much higher than that of the technology itself. And those with attractive products are also winning: those with easy-to-use products and a large distribution capacity, such as some of the big technology companies, also win, because providing these services will cost them much less money, and that will make it easier for those services to reach many more people. Local artificial intelligence also wins, what is known as Edge AI: artificial intelligence executed not in the cloud but on our own devices. Models fine-tuned down to a very small size and a very low computing cost make Apple a big winner of this change, because Apple is the hardware company with the best-integrated CPUs, GPUs, and memory banks. It is true that Apple has not yet demonstrated a high level of reliability in its Apple Intelligence products, but here we are thinking more about the medium and long term. And as a result of the information bomb that the appearance of DeepSeek has been, and of the shock it has caused in the technology world and in the markets, Satya Nadella, the CEO of Microsoft,
who I consider to be the most brilliant CEO I have ever known, posted a tweet late on the night of Monday the 27th, that black Monday for the stock market, about something I personally did not know: the Jevons paradox, also known as the Jevons effect. That tweet has been widely shared. Why? Because what this paradox says is that when technological progress increases the efficiency with which a resource is used, the fall in the cost of using it can induce increases in demand large enough that total consumption of the resource rises rather than falls. Translated into the field of artificial intelligence, this means that Microsoft, Meta, OpenAI, and Donald Trump with his Stargate project all think right now that DeepSeek accelerates their plans rather than canceling them. Why? Because to the extent that chips and the use of artificial intelligence models get cheaper, many more people will be able to use them, and therefore that resource will grow in both reach and capacity. In other words, according to the Jevons paradox, the big technology companies have found the perfect excuse to press ahead with their plans. And who loses with DeepSeek? Paradoxically, the American government and Big Tech lose. It has been shown that chip import restrictions do not stop innovation outside the United States, and that in fact they have been counterproductive, because they have pushed China to innovate and achieve things that until now seemed impossible. I think OpenAI, Anthropic, and xAI also lose, that is, all the companies that make foundational artificial intelligence models. The training of those models is being
revealed as very expensive, and with the appearance of DeepSeek it also seems that these models are increasingly a commodity. In other words, there is no relevant and lasting competitive advantage in the investment made to train a model. This would partly explain why Microsoft is divorcing OpenAI. What do I mean? Microsoft has allowed OpenAI to take on another company, Oracle, as a partner in its Stargate project. I also think Anthropic is a big loser, for one reason: DeepSeek, in just a few days, has become the most downloaded mobile artificial intelligence application in the Android and Apple stores. Anthropic, however, which has had its Claude artificial intelligence app for years, has never stood out as a consumer product, and therefore, given the competitive context, I would not be surprised if Anthropic were soon sold to one of the large technology companies. On the other hand, although the United States authorities and the companies involved say that the Stargate project is still underway and that the plan to invest those $500 billion still stands, for me it is somewhat in doubt. Unless those behind that project truly believe that we are close to a level of artificial intelligence capable of delivering great advances for humanity, that famous artificial general intelligence, also called superintelligence. If that is true, many artificial intelligence chips will be needed to train and run that general AI. And what about NVIDIA? I think that in the short and medium term, NVIDIA GPUs will continue to sell at very high prices and in very high volumes through 2025 and 2026. But whether NVIDIA will keep such a high level of demand after that, I find doubtful, for a simple reason: if in a year or a year and a half we have seen so many innovations, and we have now witnessed this burst of innovation from a small Chinese company, then what else is going to happen that will further shrink the size of the models, the cost of training, and the cost of inference? So I see Nvidia doing very well in the
medium term, but I am not sure that in the long term it will be able to maintain the revenue growth it has had in the last two years. By the way, what does DeepSeek mean for investors? The emergence of DeepSeek implies that in the AI war, the volume of weaponry and the capability of that weaponry are not everything. So far, the valuations of artificial intelligence companies, and of the large technology companies that invested massively in artificial intelligence, were supported by an idea, by a thesis: the thesis that investments in artificial intelligence are going to be exponential, that they are going to be gigantic. The surprise of DeepSeek's appearance means, as I have already said at several points in this video, that the main competitive advantage is not the hardware, and the realization that this is so may begin to reveal in the financial markets that there is a certain bubble effect. We will see what happens in the coming weeks regarding the question of whether to keep investing in Nvidia, or sell, or buy. I'm not going to give anyone advice from this channel, but I can say one thing that is part of the history of technology: chips are a cyclical business, with periods of decline, slowdown, and growth, innovation cycles that generate great economic expansion. They are a roller coaster, and until now Nvidia seemed to be defying this law of gravity of the semiconductor sector. I think that from now on investors are going to pay more attention to the revenue generated by companies that make artificial intelligence products and applications, that provide services for companies and users, and also to the small companies that are going to start appearing with innovative products in very specific market niches. In other words, we are going to stop thinking so much about infrastructure, chips, and basic investments; I think the growth cycle in applications and services will now start to gain prominence. And the big question the market is going to ask itself from now on is: is such a large CAPEX investment by technology companies necessary to develop artificial intelligence? And the second question, evidently linked to the first, is: how long will it take to recover that investment? What innovations
has DeepSeek brought at a technological level? This part of the video is going to be quite technical, but I'm going to try to explain it as simply as possible. I am going to go through the innovations DeepSeek has introduced, organized by the three artificial intelligence models it has developed to date. The V2 model, which is obviously prior to V3, is the model in which DeepSeek tested a good part of the innovations it has applied to its most recent models. I will also talk about some improvements DeepSeek has introduced in the V3 model. And finally I will talk about the R1 model, the model equivalent to OpenAI's o1, that is, the model that thinks. Let's start with the innovations of the V2 model, the original model prior to V3. Well, it introduced two concepts: DeepSeek MoE, Mixture of Experts, and DeepSeek MLA. Let's take it step by step; I don't want you to get lost in the acronyms I'm going to use. So, on the one hand, DeepSeek MoE, Mixture of Experts. What is the mixture-of-experts architecture? Until a couple of model generations ago, when we made a query, the server loaded the entire model into its memory, all the content of the model, and opened what is known as the context window, which I will explain later in a little more detail. In other words, every time we make a query to ChatGPT, or Claude, or Grok in the case of xAI, every question we ask requires the server to open the entire model simultaneously to know how to respond.
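To get a feel for the numbers involved, here is a back-of-envelope calculation; the model size and precision are hypothetical figures chosen only for illustration, not the specs of any particular model:

```python
# Rough memory estimate for holding an entire dense model in server memory.
# All figures here are illustrative assumptions.

params = 70e9          # a hypothetical 70-billion-parameter dense model
bytes_per_param = 2    # 16-bit (fp16) weights take 2 bytes each

total_gb = params * bytes_per_param / 1e9
print(f"{total_gb:.0f} GB just for the weights")  # 140 GB, before any context
```

And every single query touches all of those weights, which is exactly the cost that splitting the model among experts tries to avoid.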
This implies an astonishing memory capacity and data transmission speed. That is the traditional procedure. But starting with the GPT-4 model, work began on an architecture called mixture of experts. What does that architecture consist of? It consists of dividing the model among different experts, each of which knows about a specific topic. GPT-4, for example, was said to have 16 expert models. And when we ask a question, instead of having to open the entire model simultaneously, with the data capacity and traffic speed that requires, we only open the fragment assigned to the relevant expert. In the case of V2, from the Chinese firm DeepSeek, what they did was divide the load between specialized models and generalist models. That was the innovation they introduced. In other words, there are models specialized in very specific topics, but there are also several models capable of answering a very wide range of questions. And that reduces the cost of inference, the consumption we make of that artificial intelligence model. The second innovation DeepSeek carried out in the V2 model is what is called DeepSeek MLA, Multi-Head Latent Attention, a modification of Google's transformer architecture, whose origin I explained in this video that I'm pointing to now so you can watch it. As I said before, when you do inference, that is, when you make a query to an artificial intelligence model, you load the model and you load something called the context window. The context window contains, among other things, the memory of the conversations I am having with that chatbot, the documents I share with that chatbot, which can be as long as a book, and all the queries I am making. That implies very heavy memory use. The system DeepSeek has invented compresses the context window dramatically, making inference much easier. Let's now look at the innovations DeepSeek has made in its V3 model. First, they created a new approach to balancing data loads and model data traffic. Second, they compressed the next-word-prediction tokens, which will be the response the model gives us
in the training phase. And third, and this is the most important, they distilled other artificial intelligence models. This seems to me the most controversial and most distinctive part of the new features DeepSeek developed for this V3 model. As I mentioned earlier in the video, the distillation technique is used across today's artificial intelligence companies: OpenAI educates and trains its new models with the help of other models it manages internally, Google does the same, and so do all the artificial intelligence companies. But the difference in DeepSeek's case is that they very likely used OpenAI's most advanced models as teachers for their model. What happens in that distillation process? On one side there is a model acting as the student, which sends inputs to the model acting as the teacher; in this case the teacher would probably be OpenAI's o1 model. The teacher model answers the questions the student model asks, and the student distills the way the teacher model arrives at its answers.
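The student-teacher loop just described can be sketched in a few lines of code. This is a toy illustration with made-up numbers, not DeepSeek's actual pipeline: the `teacher` function here is a hard-coded stand-in for what would really be API calls to a large model, and the update rule is deliberately crude.

```python
import math

# Toy distillation sketch (illustrative only, not DeepSeek's method).
# The teacher assigns probabilities to candidate answers; one "training"
# step nudges the student's distribution toward the teacher's.

def teacher(prompt: str) -> dict:
    # Stand-in for an API call to a large teacher model.
    return {"paris": 0.9, "london": 0.1}

def kl_divergence(p: dict, q: dict) -> float:
    """D(p || q): how far the student q is from the teacher p."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p)

student = {"paris": 0.5, "london": 0.5}       # the student starts out unsure
target = teacher("What is the capital of France?")

gap_before = kl_divergence(target, student)

# One crude update step: move the student halfway toward the teacher.
lr = 0.5
student = {k: v + lr * (target[k] - v) for k, v in student.items()}

gap_after = kl_divergence(target, student)
print(gap_before > gap_after)  # True: the student moved closer to the teacher
```

The point of the sketch is the shape of the loop: query the teacher, compare distributions, update the student. Real distillation does this over millions of prompts with gradient descent, but the student still only ever sees the teacher's outputs, never its weights, which is why an API connection is enough.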
In other words, it is a way to greatly speed up learning and to greatly compress the amount of data the new model needs in order to offer sensible answers. And you will ask me: "How is this possible? How is it possible that OpenAI or Gemini allow other companies to use their models for training?" Well, this happens because the big models expose APIs. To connect a model to an organization, a company, an e-commerce store, for example, that wants to use artificial intelligence, you connect to the large models through an API. That is, APIs are the way the large, original model provides its service. It is also possible to run distillation through the chatbots themselves, although that would require many more machines working simultaneously and would therefore be very expensive. The experts say they are convinced that DeepSeek used distillation techniques on other models to cut training time and to improve the quality of the outputs DeepSeek produces. And then there is a very interesting innovation: DeepSeek has optimized the performance of the H800 chips, those NVIDIA chips I told you about before, which NVIDIA had deliberately tuned down in order to sell them in the Chinese market. DeepSeek managed to optimize their performance by bypassing NVIDIA's CUDA architecture, or ecosystem. I know I'm getting very technical, but just so you understand: CUDA is like the operating system of NVIDIA chips, and it is proprietary to NVIDIA, which means that if you want to use NVIDIA chips you have to use CUDA. It is part of the moat NVIDIA has right now, part of what makes it so powerful. Well, DeepSeek managed to work at a deeper level of NVIDIA chips called PTX, which is like the ability to operate directly on the chip without an interface in between, what in programming used to be called the machine code of the chips. They went in there and managed to optimize the performance of those chips, and all those innovations are what allowed the V3 model to be trained in just two months at a cost, as we have already noted, of only about $5.3 million. And what technological innovations were applied to the R1 model, this model that reflects, that gives more thoughtful answers? In DeepSeek's case, moreover, the interface shows you exactly the reasoning the model is carrying out; it's very curious. Okay, so how did they also train this model in such a short time? They achieved it by applying a very creative innovation: the use of reinforcement learning, learning via rewards, without human intervention. And
for this I need to explain a little about reinforcement learning with and without human intervention. Imagine that a child is learning to ride a bicycle. We would have two basic ways to teach him. One would be for him to watch us ride a bicycle while we supervise how he gets on, how he starts pedaling, holding him so he doesn't fall, and so on. That is a form of learning in which the child imitates the behavior of an adult and is supervised by an adult while learning. The other way to learn to ride a bike would be to tell the child, "Get on the bike, start pedaling, and come what may, fall to the ground as many times as necessary," and let the child learn to ride the bicycle on his own. What is the difference between one way of learning to ride a bike and the other? In the first case we would be talking about learning with human supervision. In the second case we would be talking about reinforcement learning, that is, a learning system based on rewards, without human intervention, with rewards or punishments in the case of the bike. This second approach is the one DeepSeek used to train its R1 model. So, let's say, the model learned by itself to think; it learned by itself to reflect, by making mistakes. And how was it verified whether what the model thought was reasonable or not? With a reward system in which a correct answer to a question was rewarded only if the answer contained reasoning in which the model explained how it arrived at that answer. And this produced, and this part of the paper explaining the V3 model, sorry, the R1 model, is very nice, this produced what they call an "aha" moment, a eureka moment, which is how this model learned to think without the help, and without the thinking patterns, of a human being. DeepSeek has discovered that this model offers patterns of reflection that were not known to date. It's very interesting. What does DeepSeek mean for the future of artificial intelligence? First, DeepSeek means more competition from more places around the world. Second, the competitive advantage of the foundational models has lost value. To the extent
that models become commodities, like raw materials, something interchangeable, easily cheapened, easily replaceable, the models of laboratories such as OpenAI, Anthropic, or xAI in principle lose competitive value, and competition is now based on a combination of elements. Chips are still very important, but there is also human talent, and there is also the ability to build those spaces, those data centers, that will house millions of chips. And in the long term a critical element will be the capacity of each country's electrical grid to support the consumption that these laboratories, these artificial intelligence data centers, are going to generate. And so we open a new chapter in this game of thrones that is artificial intelligence. By the way, if you haven't been able to watch the AI game of thrones video, I recommend you do it right now; you're going to have a great time. A hug, and thank you very much for watching this video. I would have liked to make a much richer edit, with music, images, illustrations, and so on, but I believe the urgency and importance of the matter well justified making this effort in record time. Thank you very much, and please subscribe to the channel if you can. Like DeepSeek, it's free.