General-purpose computing as we know it has existed for 60 years. Until now, for the last 30 years, we've had the benefit of Moore's law, an incredible phenomenon: without changing the software, the hardware could continue to improve in an architecturally compatible way. Every single industry has subsequently been built on top of it. But we know now that the scaling of CPUs has reached its limit. The free ride of Moore's law has ended. We can no longer afford to do nothing in software and expect that our computing experience will continue to improve, that costs will decrease
and continue to spread the benefits of it, and that we will keep benefiting from solving greater and greater challenges. We started our company to accelerate software. Our vision was that there are applications that would benefit from acceleration, and that the acceleration benefit has the same qualities as Moore's law. For applications that were impossible or impractical to perform using general-purpose computing, we have the benefits of accelerated computing to realize that capability. For example, real-time computer graphics was made possible because Nvidia came into the world and made possible this new processor we call the GPU. But we felt that, long term,
accelerated computing could be far, far more impactful. There is no such magical processor that can accelerate everything in the world, because if you could build one, you would just call it a CPU. You need to reinvent the computing stack, from the algorithms to the architecture underneath, and connect it to applications on top, in one domain after another. Computer graphics was the beginning, but we've taken this CUDA architecture to one industry after another after another. Today we accelerate so many important industries: cuLitho, which is fundamental to semiconductor manufacturing, for computational lithography; simulation; computer-aided engineering; quantum computing, so that we can invent the future of computing with classical-quantum hybrid computing. With each one of these different libraries, we're able to accelerate the application 20, 30, 50 times. Of course, it takes a rewrite of the software, which is the reason it has taken so long. In each one of these domains we've had to work with the industry, with our ecosystem of software developers and customers, in order to accelerate those applications for their domains. Modulus is teaching an AI the laws of physics: not just to predict the next word, but to
be able to predict the next moment in time of fluid dynamics, particle physics, and so on and so forth. And of course, one of the most famous application libraries we've ever created, called cuDNN, made it possible to democratize artificial intelligence as we know it. These acceleration libraries now cover so many different domains that it appears that accelerated computing is used everywhere, but that's simply because we've applied this architecture to one domain after another until we've covered just about every single industry. Now accelerated computing, or CUDA, has reached the tipping point. The first thing that happened, of course, is to how we do software. Our industry is underpinned by the method by which software is done. The way that software used to be done, call it software 1.0: programmers would code algorithms, what we call functions, to run on a computer, and we would apply them to input information to predict an output. Somebody would write Python or C or Fortran or Pascal or C++ code, algorithms that run on a computer; you apply input to it and output is produced. A very classical computing model that we understood quite well. However, that approach to developing software has been
disrupted. It is now not coding but machine learning: using a computer to study the patterns and relationships in massive amounts of observed data, to essentially learn from it the function that predicts it. We are essentially designing a universal function approximator, using machines to learn, from the expected outputs, the function that would produce them. So going back and forth: this was software 1.0, with human coding, and this is now software 2.0, using machine learning. Notice who is writing the software: the software is now written by the computer. And after you're done training the model, you
inference the model: you then apply that function. That deep learning model, that computer vision model or speech understanding model, is now a neural network that goes into the GPU and can make a prediction given new, unobserved input. We have gone from coding to machine learning, from developing software to creating artificial intelligence, and from software that prefers to run on CPUs to neural networks that run best on GPUs. This, at its core, is what happened to our industry in the last 10 years. We have now seen the complete
reinvention of the computing stack. The whole technology stack has been reinvented: the hardware, the way that software is developed, and what software can do are now fundamentally different. We dedicated ourselves to advancing this field, and this is what we now build. Initially we were building GPUs that fit into a PCI Express card that goes into your PC. This is what a GPU looks like today. This is Blackwell. Yeah, thank you. A massive system designed to study data at an enormous scale, so that we can discover patterns and relationships and learn the meaning of
the data. This is the great breakthrough of the last several years: we have now learned the representation, the meaning, of words and numbers, images, pixels, and videos, chemicals, proteins, amino acids, fluid patterns, particle physics. We have learned the meaning of so many different types of data, and we have learned to represent information in so many different modalities. Not only have we learned the meaning of it, we can translate it to another modality. One great example, of course, is translating English to Hindi, or translating a large body of English text into a shorter English text: summarization.
From pixels to words: image recognition. From words to pixels: image generation. From images and videos to words: captioning. From words to proteins: used for drug discovery. From words to chemicals: discovering new compounds. All because of this one instrument that made it possible for us to study data at enormous scale. Well, I just want to say that in order to build the Blackwell system, of course the Blackwell GPU is involved, but it takes seven other chips. TSMC manufactures all of these chips, and they're doing an extraordinary job ramping the Blackwell system. Blackwell is in full production.
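All of these modality translations rest on the software 2.0 idea described earlier: the function is learned from observed input/output pairs instead of being hand-coded. Here is a minimal sketch of that contrast using an invented toy function; nothing here reflects Nvidia's actual software.

```python
# Software 1.0: a human writes the function directly.
def f_coded(x):
    return 3.0 * x + 1.0

# Software 2.0: the computer learns an approximation of the same
# function purely from observed (input, output) pairs.
data = [(float(x), f_coded(x)) for x in range(10)]

w, b = 0.0, 0.0      # learnable parameters of the approximator
lr = 0.01            # learning rate
for _ in range(2000):            # gradient descent on squared error
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err

print(w, b)          # the learned parameters approach 3.0 and 1.0
```

The same loop, scaled up to billions of parameters and trained on text or pixels instead of a toy line, is what "the software is now written by the computer" refers to.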
And we're expecting to deliver in volume production in Q4. So this is basically Blackwell. Now, this is one of the things that's really incredible about the system; let me show it to you. Nothing's easy this morning. This is NVLink, and it goes across the entire back spine of a rack of GPUs, and these GPUs are all connected: 72 dual-GPU packages of Blackwells, 144 GPUs connected together, so that it's one giant GPU. If I were to spread out all of the chips to show you what this connects together, it's essentially a GPU
so large it would be like this big. But it's obviously impossible to build GPUs that large, so we break it up into the smallest chunks we can, which is the reticle limit, using the most advanced technologies, and we connect them together using NVLink. This is the NVLink back spine; you're looking at all of the GPUs being connected. That's the Quantum switch that connects all of these GPUs together. On top is Spectrum-X if you would like to have Ethernet connecting this together; this is connected to this switch, and this is one of the most advanced switches the world has
ever built. Now, all of this together represents Blackwell, and it runs the software on top: the CUDA software, the cuDNN software, Megatron for training the large language models, TensorRT for doing the inference, TensorRT-LLM for doing distributed multi-GPU inference for large language models. And then on top of that we have two software stacks: one is NVIDIA AI Enterprise, which I'll talk about in a second, and the other is Omniverse. So this is the Blackwell system; this is what Nvidia builds today. Of course, the computation is incredible. Each rack is
3,000 pounds and 120 kilowatts, 120,000 watts, in each rack: the highest density of computing the world has ever known. And what we're trying to do is to learn larger and smarter models each year. We're increasing the amount of data and the model size each by about a factor of two, which means that every single year the computation, which is the product of those two, has to increase by a factor of four. Now remember, there was a time when the word was that Moore's law was two times every year and a half, or 10 times every
5 years, 100 times every 10 years. We are now moving technology at a rate of four times every year. Four times every year, over the course of 10 years, is incredible scaling. The second thing that we've discovered recently, and this is a very big deal: intelligence is not just one-shot. Intelligence requires thinking, and thinking is reasoning; maybe you're doing path planning, maybe you're running some simulations in your mind, reflecting on your own answers. As a result, thinking produces higher-quality answers, and we've now discovered a second scaling law
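The training-side arithmetic above can be checked directly; this just compounds the growth rates quoted in the talk.

```python
# Compare the quoted growth rates over a 10-year span.
years = 10

# Moore's law as quoted: about 2x every 1.5 years (~10x per 5 years).
moore = 2 ** (years / 1.5)

# Data and model size each doubling yearly -> compute grows 2 * 2 = 4x/year.
accelerated = 4 ** years

print(f"Moore's-law pace over {years} years: ~{moore:.0f}x")     # ~102x
print(f"4x-per-year pace over {years} years: {accelerated:,}x")  # 1,048,576x
```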
and this is a scaling law at the time of inference: the longer you think, the higher-quality answer you can produce. This is not illogical; it is very intuitive to all of us. If you were to ask me what my favorite Indian food is, I would tell you chicken biryani, okay? I don't have to think about that very much, and I don't have to reason about it; I just know it. And there are many things like that you can ask, for example, what is Nvidia good at? Nvidia is good at building AI supercomputers.
Nvidia is great at building GPUs. Those are things that you know; they're encoded into your knowledge. However, there are many things that require reasoning. For example, suppose I had to travel from Mumbai to California, and I want to do it in a way that allows me to enjoy four other cities along the way. Say I tell it I would like to go from California to Mumbai, I would like to do it within 3 days, and I give it all kinds of constraints about what time I'm willing
to leave and able to leave, what hotels I like to stay at, and so on and so forth, and the people I have to meet. The number of permutations is of course quite high, and so planning that process, coming up with an optimal plan, is very, very complicated. That's where thinking, reasoning, and planning come in, and the more you compute, the higher-quality answer you can provide. So we now have two fundamental scaling laws driving our technology development: first for training, and now for inference. The number of foundation-model makers
has more than doubled since the beginning of Hopper. More companies have realized that fundamental intelligence is vital to their company and that they have to build foundation-model technology. Second, the size of the models has increased, and the amount of computation necessary to train them has grown 20, 30, 40 times, because of the size of the models, but also because of multimodality capability, reinforcement-learning capability, and synthetic-data-generation capability; the amount of data that we use to train these models has really grown tremendously. That's one reason, and the other reason, of course, is that Blackwell is also used for generating tokens at incredible speeds. Together, all of these factors have led to incredibly high demand for Blackwell. Let's talk now about how we're going to use this technology. Earlier I told you that we have Blackwell and all of the acceleration libraries that we were talking about before, but on top there are two very important platforms that we're working on: one of them is called NVIDIA AI Enterprise and the other is called NVIDIA Omniverse, and I'll explain each one of them very quickly. First, NVIDIA
AI Enterprise. This is a time when large language models and fundamental AI capabilities have reached a level that lets us create what are called agents: large language models that understand the data being presented, which could be streaming data, video data, language data, data of all kinds. The first stage is perception. The second is reasoning: given its observations, what is the mission, and what is the task it has to perform? In order to perform that task, the agent will break down
that task into steps of other tasks, and it reasons about what each would take. It connects with other AI models: maybe a model that understands how to generate images, maybe a model that is able to retrieve semantic data from a proprietary database. Each of these models is connected to the central reasoning large language model we call the agent. These agents are able to perform all kinds of tasks: some of them are maybe marketing agents, some of them are customer service agents,
some of them are chip-design agents. Nvidia has chip-design agents all over our company helping us design chips, and so we're going to have agents helping our employees become super-employees. These agents, or agentic AI models, augment all of our employees to supercharge them and make them more productive. Now, when you think about these agents, the way you would bring them into your company is not unlike the way you would onboard someone who's a new employee. You have to give them a training curriculum, you evaluate them, and
so there are evaluation systems, and you might guardrail them: if you're an accounting agent, don't do marketing. Each one of these agents is guardrailed. That entire process we put into essentially an agent-lifecycle suite of libraries, and this is what we call NVIDIA NeMo. On the one hand we have the libraries; on the other hand, what comes out as the output is an API inference microservice we call a NIM. Essentially, this is a factory that builds AIs, and NeMo is a suite of libraries that onboard and help you operate
the AIs, and ultimately your goal is to create a whole bunch of agents. The next generation of IT is going to be about producing and delivering AI, and as you know, the delivery of software, of code, and the delivery of AI are fundamentally different, but the latter is dramatically more impactful and insanely more exciting. The second part is this: what happens after agents? Now remember, every single company has employees, but in most companies the goal is to build something, to produce something, to make something. It could be factories, it could be warehouses, it could be cars and planes and trains and
ships, all kinds of things. That next generation of AI needs to understand the physical world; we call it physical AI. In order to create physical AI we need three computers, and we created three computers to do so. The DGX computer, for which Blackwell, for example, is a reference design, an architecture for creating things like DGX computers, is for training the model. That model then needs a place to be refined, a place to learn, a place to apply its physical capability, its robotic capability. We call that Omniverse: a virtual world that
obeys the laws of physics, where robots can learn to be robots. And then, when you're done with the training of it, that AI model can run in the actual robotic system. That robotic system could be a car, it could be a robot, it could be an AV, it could be an autonomous mobile robot, it could be a picking arm, it could be an entire factory or an entire warehouse that's robotic. That computer we call AGX Jetson. So: DGX for training, then Omniverse for doing the digital twin, then AGX for deployment. Start locally, grow globally. Right, right.
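The three-computer loop just described — train a model, refine it in a physics simulation, then deploy it on the robot — can be sketched in a purely illustrative way. The one-dimensional "simulator" below is invented for this example and stands in for nothing real.

```python
def simulate(position, action):
    """Toy 'digital twin': the world physics the robot must respect."""
    return position + action      # in this toy world, action = displacement

def train_policy(target):
    """'Training': learn in simulation how far to move to reach the target."""
    action = 0.0
    for _ in range(100):                      # refine against the simulator
        end = simulate(0.0, action)
        action += 0.1 * (target - end)        # nudge toward the goal
    return action

# 'Deployment': run the frozen policy in the (equally toy) real world.
policy_action = train_policy(target=5.0)
final_position = simulate(0.0, policy_action)
print(final_position)                         # very close to the target 5.0
```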
That's fantastic. Okay, thank you. Thank you very much, Michelle.