Video Transcript:
This is how intelligence is made: a new kind of factory, a generator of tokens—the building blocks of AI. Tokens have opened a new frontier, the first step into an extraordinary world where endless possibilities are born. [Music] Tokens transform words into knowledge and breathe life into images. They turn ideas into videos and help us safely navigate any environment. Tokens teach robots to move like the Masters. [Music] Inspire new ways to celebrate our victories: a martini, please; call, light up. Thank you, Adam, and give us peace of mind when we need it most. Hi, Moroka. Hi,
Anna. It's good to see you again. Hi, Emma. We're going to take your blood sample today, okay? Don't worry; I'm going to be here the whole time. They bring meaning to numbers to help us better understand the world around us, [Music] predict the dangers that surround us, [Music] and find cures for the threats within us. [Music] Tokens can bring our visions to [Music] life and restore what we've [Music] lost. [Applause] Zachary, I got my voice back, buddy! They help us move forward one small step at a time [Music] and one giant leap [Music] together. And
here is where it all begins: welcome to the stage, NVIDIA founder and CEO Jensen [Music] [Applause] [Music] [Applause] Huang. Welcome to CES! Are you excited to be in Las Vegas? Do you like my jacket? I thought I'd go the other way from Gary Shapiro. I'm in Las Vegas, after all. If this doesn't work out, if all of you object, well, just get used to it. I really think you have to let this sink in. In another hour or so, you're going to feel good about it. Well, welcome to NVIDIA! In fact, you're inside NVIDIA's digital
twin, and we're going to take you to NVIDIA. Ladies and gentlemen, welcome to NVIDIA! You're inside our digital twin. Everything here is generated by AI. It has been an extraordinary journey—an extraordinary year here—and it started in 1993. Ready? Go! With NV1, we wanted to build computers that could do things that normal computers couldn't, and NV1 made it possible to have a game console in your PC. Our programming architecture was called UDA—missing the letter C until a little while later—the Unified Device Architecture, and the first developer for UDA and the first application that ever worked on UDA was Sega's Virtua Fighter. Six years later, in 1999, we invented the programmable GPU, and it started 20-plus years of incredible advances in this incredible processor called the GPU. It made modern computer graphics possible, and now, 30 years later, Sega's Virtua Fighter is completely cinematic! This is the new Virtua Fighter project that's coming; I just can't wait. Absolutely incredible! Six years after that—six years after 1999—we invented CUDA so that we could express the programmability of our GPUs to a rich set of algorithms that could benefit from it. CUDA initially was difficult to explain, and it took years—in fact, it took approximately six years. Somehow, six years later, in 2012, Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton discovered CUDA, used it to process AlexNet, and the rest is history. AI has been advancing at an incredible pace since: it started with perception AI—we can now understand images and words and sounds; then generative AI—we can generate images and text and sounds; and now agentic AI—AIs that can perceive, reason, plan, and act. And then the next phase, some of which we'll talk about tonight: physical AI. 2012—now,
magically, in 2018 something happened that was pretty incredible. Google's Transformer was released as BERT, and the world of AI really took off. Transformers, as you know, completely changed the landscape for artificial intelligence. In fact, it completely changed the landscape for computing altogether. We recognized properly that AI was not just a new application with a new business opportunity, but AI, more importantly, machine learning enabled by Transformers, was going to fundamentally change how computing works. And today, computing is revolutionized in every single layer—from hand-coding instructions that run on CPUs to creating software tools that humans use. We
now have machine learning that creates and optimizes new networks that process on GPUs and create artificial intelligence. Every single layer of the technology stack has been completely changed—an incredible transformation in just 12 years! Well, we can now understand information of just about any modality. Surely, you've seen text and images and sounds and things like that, but not only can we understand those, we can understand amino acids; we can understand physics. We understand them, we can translate them, and generate them. The applications are just completely endless. In fact, almost any AI application that you see out
there—what modality is the input that it learned from? What modality of information did it translate to, and what modality of information is it generating? If you ask these three fundamental questions, just about every single application could be inferred. And so, when you see application after application that are AI-driven, AI native at the core of it, this fundamental concept is there: machine learning has changed how every application is going to be built, how computing will be done, and the possibilities beyond. Well, GPUs, GeForce, in a lot of ways, all of this with AI is the house
that GeForce built. GeForce enabled AI to reach the masses, and now AI is coming home to GeForce. There are so many things that you can't do without AI. Let me show you some of it now. [Music] [Applause] [Music] [Applause] That was real-time computer graphics. No computer graphics researcher, no computer scientist, would have told you that it is possible for us to ray trace every single pixel at this level. Ray tracing is a simulation of light. The amount of geometry that you saw was absolutely insane; it would have been impossible without artificial intelligence. There are two fundamental things that we did. We used, of course, programmable shading and ray-tracing acceleration to produce incredibly beautiful pixels. But then we had artificial intelligence, conditioned and controlled by those pixels, generate a whole bunch of other pixels. Not only is it able to generate pixels spatially, because it's aware of what the colors should be—it has been trained on a supercomputer back at NVIDIA, so the neural network running on the GPU can infer and predict the pixels that we did not render. And not only can we do that; it's called DLSS. The latest generation of DLSS also generates beyond the current frame; it can predict the future, generating three additional frames for every frame that we calculate. Of what you saw, take any four frames: we render one frame and generate three. Four frames at 4K is about 33 million pixels. Out of those 33 million pixels, we computed only about 2 million. It is an absolute miracle that we can computationally, using programmable shaders and our ray tracing engine, compute 2 million pixels and have AI predict all of the other 33 million.
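A quick back-of-the-envelope check of those numbers; the quarter-resolution internal render is my assumption for illustration, not a figure given in the keynote:

```python
# Back-of-the-envelope check of the "compute ~2M, present ~33M" claim.
# The quarter-resolution internal render is an assumption for illustration.
pixels_per_4k_frame = 3840 * 2160          # ~8.3 million pixels
presented = 4 * pixels_per_4k_frame        # 1 rendered + 3 generated frames

rendered = pixels_per_4k_frame // 4        # assumed upscaling ratio
print(f"presented: {presented / 1e6:.0f}M pixels")
print(f"computed:  {rendered / 1e6:.1f}M pixels "
      f"({100 * rendered / presented:.0f}% of the total)")
```

Run as written, this prints roughly 33M presented pixels versus about 2M computed, which is the ratio being described.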
As a result, we're able to render at incredibly high performance, because the AI does a lot less computation. It takes, of course, an enormous amount of training to produce that model, but once you train it, the generation is extremely efficient. So this is one of the incredible capabilities of artificial intelligence, and that's why there are so many amazing things happening. We used GeForce to enable artificial intelligence, and now artificial intelligence is revolutionizing GeForce. Everyone, today we're announcing our next generation: the RTX Blackwell family. Let's take a look. [Music]
Here it is: our brand-new GeForce RTX 50 Series, Blackwell architecture. The GPU is just a beast—92 billion transistors, 4,000 AI TOPS, four petaflops of AI, three times higher than the last generation, Ada. And we need all of it to generate those pixels that I showed you: 380 ray-tracing teraflops, so that we can compute the most beautiful image possible, and of course 125 shader teraflops. There is actually a concurrent shader pipeline as well as an integer unit of equal performance—dual shaders, one for floating point and one for integer. GDDR7 memory from Micron, 1.8 terabytes per second, twice the performance of our last generation. And we now have the ability to intermix AI workloads with computer graphics workloads. One of the amazing things about this generation is that the programmable shader is now also able to process neural networks. So the shader is able to carry these neural networks, and as a result we invented neural texture compression and neural material shading. As a result of that, you get these amazingly beautiful images that are only possible because we use AI to learn the texture and learn a compression algorithm, and as a result get extraordinary results. Okay, so this is the brand new RTX Blackwell. Now, even the mechanical design is a miracle. Look at this; it's got two fans. This whole graphics card is just one giant fan! You know, so the question is, where's the graphics card? Is it literally this big? The voltage regulator design is state-of-the-art, an incredible design. The engineering team did a great job, so here it is. Thank you. Okay, so those are the speeds and feeds. So how does it compare? Well, this is the RTX 4090. I know, I know many of you have one. I know it; look, it's $1,599.
It is one of the best investments you could possibly make. For $1,599, you bring it home to your $10,000 PC entertainment command center! Isn't that right? Don't tell me that's not true. Don't be ashamed; it's liquid cooled, fancy lights all over it. You lock it when you leave; it's the modern home theater. It makes perfect sense. And now for $1,599, you get to upgrade that and turbocharge the living daylights out of it! Well now, with the Blackwell family, the RTX 5070 brings 4090 performance at $549. [Applause] Impossible without artificial intelligence, impossible without the AI tensor cores, impossible without the GDDR7 memory. Okay, so the 5070 with 4090 performance: $549. And here's the whole family, starting from the 5070 all the way up to the 5090. The 5090 is twice the performance of a 4090. We're producing at very large scale, with availability starting in January. Well, it is incredible, but we've managed to put these gigantic-performance GPUs into a laptop. This is a 5070 laptop for $1,299. This 5070 laptop has 4090 performance. I think there's one here somewhere. Let me show you this; look at this thing here. Let me—here, there are only so
many pockets, ladies and gentlemen. Janine! [Applause] Can you imagine—you get this incredible graphics card here, Blackwell, and we're going to shrink it and put it in a laptop? Does that make any sense? Well, you can't do that without artificial intelligence, and the reason is that we're generating most of the pixels using our tensor cores. We ray trace only the pixels we need, and we generate, using artificial intelligence, all the other pixels. As a result, the energy efficiency is just off the charts. The future of computer graphics is neural rendering—the fusion of artificial intelligence and computer graphics. And what's really amazing is—oh, here we go! Thank you. This is a surprisingly kinetic keynote. What's really amazing is the family of GPUs we're going to put in here. Even the 5090—the 5090 will fit into a laptop, a thin laptop; that last laptop was 14.9 mm. You've got a 5080, a 5070 Ti, and a 5070. Okay, so, ladies and gentlemen, the RTX Blackwell family! [Applause] Well, GeForce brought AI to the world, democratized AI. Now AI has come back and revolutionized GeForce. Let's talk about artificial
intelligence. Let's go somewhere else at NVIDIA. This is literally our office; this is literally NVIDIA's headquarters. Okay, so let's talk about AI. The industry is chasing and racing to scale artificial intelligence. The scaling law is a powerful model; it's an empirical law that has been observed and demonstrated by researchers and the industry over several generations. The scaling law says that the more training data you have, the larger your model, and the more compute you apply to it, the more effective or capable your model will become. And so the scaling law continues. What's really amazing is that the internet is producing about twice the amount of data every single year as it did the year before. I think that in the next couple of years, humanity will produce more data than all of humanity has produced since the beginning. We're still producing a gigantic amount of data, and it's becoming more multimodal—video, images, and sound. All of that data can be used to train the fundamental knowledge, the foundational knowledge, of an AI.
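To make the shape of that empirical law concrete, here is a commonly cited pre-training form from the published scaling-law literature (not a formula quoted on stage); $N$ is the parameter count and $D$ the number of training tokens:

```latex
% Chinchilla-style pre-training scaling law (illustrative; the exponents are
% published fits from the literature, not figures from this keynote)
L(N, D) \;\approx\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}},
\qquad \alpha \approx 0.34,\; \beta \approx 0.28
```

Larger $N$, larger $D$, and the compute to use them all push the loss $L$ down, which is the "more data, larger model, more compute" claim in quantitative form.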
But there are, in fact, two other scaling laws that have emerged, and they're somewhat intuitive. The second is the post-training scaling law. Post-training uses techniques like reinforcement learning from human feedback: basically, the AI generates answers in response to a human query, and the human then gives feedback. It's much more complicated than that, but the reinforcement learning system, with a fair number of very high-quality prompts, causes the AI to refine its skills. It can fine-tune its skills for particular domains—become better at solving math problems, better at reasoning, and so on. It's essentially like having a mentor or a coach give you feedback after you've finished school: you get tested, you get feedback, and you improve yourself. We also have reinforcement learning from AI feedback, and we have synthetic data generation. These techniques are akin to, if you will, self-practice: you know the answer to a particular problem, and you keep trying until you get it right. An AI could be presented with a very complicated and difficult problem that is functionally verifiable and has an answer we understand—maybe proving a theorem, maybe solving a geometry problem. These problems would cause the
AI to produce answers, and using reinforcement learning, it would learn how to improve itself. That's called post-training. Post-training requires an enormous amount of computation, but the end result produces incredible models. We now have a third scaling law, and it has to do with what's called test-time scaling. Test-time scaling applies when you're using the AI: the AI now has the ability to apply a different resource allocation. Instead of improving its parameters, it's focused on deciding how much computation to use to produce the answers it wants. Reasoning is one way of thinking about this; long thinking is another. Instead of direct inference—a one-shot answer—you might reason about the problem, break it down into multiple steps, generate multiple ideas and have your AI system evaluate which of the generated ideas was the best, maybe solve the problem step by step, and so forth. Test-time scaling has proven to be incredibly effective. You're watching this sequence of technology, and all of these scaling laws, emerge as we see incredible achievements from ChatGPT to o1 to o3 and now Gemini Pro. All of these systems are going through this journey step by step: from pre-training to post-training to test-time scaling.
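As an illustration of how extra inference-time compute can buy answer quality, here is a minimal best-of-N sketch; generate and score are toy stand-ins for a model call and a verifier or reward model, not anything announced here:

```python
# Hedged sketch of best-of-N test-time scaling: spend more inference
# compute (larger n) to pick a better answer. The generator and scorer
# below are toy stand-ins, not a real model or verifier.
import random
from typing import Callable

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))

# Toy stand-ins so the sketch runs end to end.
def toy_generate(prompt: str) -> str:
    return f"candidate answer #{random.randint(0, 999)}"

def toy_score(prompt: str, answer: str) -> float:
    return random.random()   # a real system would use a verifier or reward model

print(best_of_n("prove the theorem", toy_generate, toy_score, n=16))
```

The point of the sketch: raising n trades more tokens generated (more compute at inference time) for a better chance of selecting a correct answer.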
Well, the amount of computation that we need is, of course, incredible, and we would like, in fact, for society to have the ability to scale the amount of computation to produce more and more novel and better intelligence. Intelligence, of course, is the most valuable asset that we have, and it can be applied to solve a lot of very challenging problems. The scaling law is driving enormous demand for NVIDIA computing; it's driving an enormous demand for this incredible chip we call Blackwell. Let's take a look at Blackwell. Well, Blackwell is in full production; it is incredible what it looks like. First of all, every single cloud service provider now has systems up and running. We have systems here from about 15 computer makers, being made in about 200 different SKUs, 200 different configurations—liquid cooled, air cooled, x86, NVIDIA Grace CPU versions, NVLink 36x2, NVLink 72x1, a whole bunch of different types of systems, so that we can accommodate just about every single data center in the world. Systems are currently being manufactured
in some 45 factories. It tells you how pervasive artificial intelligence is and how much the industry is jumping onto artificial intelligence in this new computing model. Well, the reason why we're driving it so hard is that we need a lot more computation, and it's very clear—it's very clear that, um, Janine, you know, I—it's hard to tell you. You don't ever want to reach your hands into a dark place. Hang on a second, is this a good idea? All right. [Applause] [Music] Wait for it, wait for it. I thought I was worthy; apparently, you didn't think
I was worthy. All right, this is my show and tell. This is a show and tell. So, this NVLink system—this right here, this NVLink system—this is the GB200 NVLink 72. It is a ton and a half: 600,000 parts, approximately equal to 20 cars, 120 kilowatts. It has a spine behind it that connects all of these GPUs together: two miles of copper cable, 5,000 cables. This is being manufactured in 45 factories around the world. We build them, we liquid cool them, we test them, we disassemble them and ship the parts to the data centers, because it's a ton and a half. We reassemble it outside the data centers and install it. The manufacturing is insane, but the goal of all of this is this: the scaling laws are driving computing so hard that this level of computation is needed. Blackwell, over our last generation, improves performance per watt by a factor of four—performance per watt by a factor of four, performance per dollar by a factor of three. That basically says that in one generation, we reduce the cost of training these models by a factor of three, or if you want to increase the size of your model
by a factor of three, it's about the same cost. But the important thing is this: these are generating tokens that are being used by all of us when we use ChatGPT or when we use Gemini or our phones. In the future, just about all of these applications are going to be consuming these AI tokens, and these AI tokens are being generated by these systems. Every single data center is limited by power, and so if the performance per watt of Blackwell is four times our last generation, then the revenue that could be generated—the amount of business
that can be generated in the data center—is increased by a factor of four. And so these AI factory systems really are factories today. Now, the goal of all of this is so that we can create one giant chip. The amount of computation we need is really quite incredible, and this is basically one giant chip. If we would have had to build a chip—here we go, sorry guys, you see that? That's cool! Look at those disco lights in here! Right? If we had to build this as one chip, obviously, this would be the size of the
wafer, but this doesn't include the impact of yield; it would have to be probably three or four times the size. But what we basically have here is 72 Blackwell GPUs or 144 dies. This one chip here is 1.4 exaflops—the world's largest supercomputer, fastest supercomputer—only recently achieved an exaflop plus. This is 1.4 exaflops of AI floating-point performance. It has 14 terabytes of memory, but here's the amazing thing: the memory bandwidth is 1.2 petabytes per second. That's basically the entire internet traffic that's happening right now. The entire world's internet traffic is being processed across these chips. Okay,
and we have 130 trillion transistors in total, 2,592 CPU cores, a whole bunch of networking, and so—I wish I could do this; I don't think I will. So these are the Blackwells. These are our ConnectX networking chips. These are the NVLink switches—we're trying to represent the NVLink spine here, but that's not possible, okay? And these are all of the HBM memories: 14 terabytes of HBM memory. This is what we're trying to do, and this is the miracle. This is the miracle of the Blackwell system. The Blackwell die right here—it is the largest single chip
the world has ever made! But yet, the miracle is really in addition to that. This is the Grace Blackwell system. Well, the goal of all of this, of course, is so that we can—thank you. Thanks, boy. Is there a chair I could sit down on for a second? Can I have a Michelob Ultra? How is it possible that we're in the Michelob Ultra Arena? It's like coming to NVIDIA and we don't have a GPU for you! So, we need an enormous amount of computation because we want to train larger and larger models. And these inferences—these inferences used to be a single inference, but in the future, the AI is going to be talking to itself. It's going to be thinking; it's going to be internally reflecting and processing. Today, so long as the tokens are coming out at 20 or 30 tokens per second, it's basically as fast as anybody can read. However, in the future—and right now with the new Gemini Pro and the new o1 and o3 models—they're talking to themselves. They're reflecting; they're thinking. And so, as you can imagine, the rate at which
the tokens can be ingested is incredibly high, and so we need the token rates—the token generation rates—to go way up, and we also have to drive the cost way down simultaneously, so that the quality of service can be extraordinary, the cost to customers can continue to be low, and AI will continue to scale. That's the fundamental purpose—the reason why we created NVLink. Well, one of the most important things happening in the world of enterprise is agentic AI. Agentic AI is basically a perfect example of test-time scaling. It's an AI system of models: some of it is understanding and interacting with the customer, interacting with the user; some of it is maybe retrieving information from storage—a semantic AI system, like RAG. Maybe it's going onto the internet; maybe it's studying a PDF file. It might be using tools, it might be using a calculator, and it might be using generative AI to generate charts and such. It's iteratively taking the problem you gave it, breaking it down step by step, and iterating through all these different models. In order for AI to respond to a customer in the future, it used to be: ask a question, and an answer starts spewing out. In the future, you ask a question and a whole bunch of models are going to be working in the background. And so test-time scaling—the amount of computation used for inferencing—is going to go through the roof. It's going to go through the roof because we want better and better answers.
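A minimal sketch of that iterate, break-down, use-tools pattern; every function here (plan, retrieve, evaluate) is a hypothetical stand-in, not an NVIDIA API:

```python
# Hypothetical agentic loop: break a task into steps, call tools or
# retrieval, and iterate until a self-check passes. Illustrative only;
# none of these function names come from an NVIDIA SDK.
def plan(context):           # stand-in for an LLM choosing the next step
    if not any(c.startswith("doc:") for c in context):
        return {"kind": "retrieve", "query": context[0]}
    return {"kind": "answer", "text": "draft answer based on " + context[-1]}

def retrieve(query):         # stand-in for a RAG / semantic search call
    return "doc: relevant passage for " + query

def evaluate(task, draft):   # stand-in for a verifier / reward model
    return "passage" in draft

def run_agent(task, max_steps=5):
    context = [task]
    for _ in range(max_steps):
        step = plan(context)
        if step["kind"] == "retrieve":
            context.append(retrieve(step["query"]))
        else:
            if evaluate(task, step["text"]):
                return step["text"]
            context.append("rejected: " + step["text"])
    return "step budget exhausted"

print(run_agent("summarize the quarterly PDF"))
```

Each extra loop iteration is more inference-time compute, which is exactly why agentic workloads drive test-time scaling.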
Well, to help the industry build agentic AI, our go-to-market is not direct to enterprise customers; our go-to-market is to work with software developers in the IT ecosystem to integrate our technology and make new capabilities possible, just like we did with CUDA libraries. We now want to do that with AI libraries. Just as the computing model of the past had APIs for computer graphics, linear algebra, or fluid dynamics, in the future, on top of those acceleration libraries, we will have AI libraries. We've created three things to help the ecosystem build agentic AI. First, NVIDIA NIMs, which are essentially AI microservices, all packaged up: we take all of this really complicated CUDA software—cuDNN, CUTLASS, TensorRT, Triton, all of these different, really complicated pieces of software—and the model itself, package it up, optimize it, put it into a container, and you can take it wherever you like. We have models for vision, for understanding language, for speech, for animation, for digital biology, and we have some new, exciting models coming for physical AI. These AI models run in every single cloud because NVIDIA's GPUs are now available in every single cloud; they're available from every single OEM. So you could literally take these models, integrate them into your software packages, and create AI agents that run on Cadence, or ServiceNow agents, or SAP agents, and those companies can deploy them to their customers and run them wherever the customers want to run the software.
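As a rough sketch of what "packaged up in a container" buys you in practice: a deployed LLM NIM typically exposes an OpenAI-compatible HTTP endpoint, so calling it from an application can look roughly like this (the URL, port, and model name below are placeholders, not values from the keynote):

```python
# Minimal example of calling a locally deployed LLM microservice over an
# OpenAI-compatible REST API. Endpoint, port, and model id are placeholders.
import json
import urllib.request

payload = {
    "model": "meta/llama-3.1-8b-instruct",          # placeholder model id
    "messages": [{"role": "user", "content": "Summarize our Q4 results."}],
    "max_tokens": 256,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",    # placeholder endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])
```

Because the interface is just HTTP, the same application code works whether the container runs in a cloud, on-prem, or on a workstation.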
The next layer is what we call NVIDIA NeMo. NeMo is essentially a digital-employee onboarding, training, and evaluation system. In the future, these AI agents are essentially digital workers, working alongside your employees and doing things for you on your behalf. The way you would bring these specialized agents into your company is to onboard them, just like you onboard an employee. We have different libraries that help these AI agents be trained for the kind of language used in your company—maybe the vocabulary is unique to your company, the business processes are different, the way you work is different. You would give them examples of what the work product should look like, they would try to generate it, you would give feedback, and then you would evaluate them, and so on. And you would guardrail them: these are the things you're not allowed to do, these are the things you're not allowed to say. And we give them access to certain information. Okay, so that entire pipeline—a digital-employee pipeline—is called NeMo. In a lot of ways, the IT department of every company is going to be the HR department of AI agents in the future. Today they manage and maintain a bunch of software from the IT industry; in the future, they will maintain, nurture, onboard, and improve a whole bunch of digital agents and provision them to the company to use.
Okay? And so your IT department is going to become kind of like AI-agent HR. On top of that, we provide a whole bunch of blueprints that our ecosystem can take advantage of. All of this is completely open source, so you can take it and modify the blueprints. We have blueprints for all kinds of different types of agents. Well, today we're also announcing that we're doing something that's really cool and, I think, really clever: we're announcing a whole family of models based on Llama—the NVIDIA Llama Nemotron language foundation models. Llama 3.1 is a complete phenomenon. Llama 3.1 has been downloaded from Meta some 650,000 times, something like that, and it has been derived and turned into about 60,000 other different models. It is singularly the reason why just about every single enterprise and every single industry has been activated to start working on AI. Well, we realized that the Llama models could be better fine-tuned for enterprise use, so we fine-tuned them using our expertise and our capabilities and turned them into the Llama Nemotron suite of open models. There are small ones that respond with very, very fast latency—extremely small; the Llama Nemotron Supers, which are basically your mainstream models; and your Ultra model. The Ultra model can be used as a teacher model for a whole bunch of other models; it can be a reward-model evaluator, a judge that decides whether other models' answers are good or not—basically, it gives feedback to other models. It can be distilled in a lot of different ways—basically a teacher model, a knowledge-distillation model—very large, very capable.
So all of this is now available online. Well, these models are incredible; they are number one on the leaderboards for chat, for instruction following, and for retrieval—the different types of functionality used in AI agents around the world. These are going to be incredible models for you. We're also working with the ecosystem; all of our NVIDIA AI technologies are integrated into the industry. We have great partners and really great work being done at ServiceNow, at SAP, at Siemens for industrial AI. Cadence is doing great work, Synopsys is doing great work. I'm really proud
of the work that we do with Perplexity. As you know, they revolutionized search—really fantastic stuff. Codeium—every software engineer in the world: this is going to be the next giant AI application, the next giant AI service, period—software coding. Thirty million software engineers around the world, and everybody is going to have a software assistant helping them code. If not, obviously, you're just going to be way less productive and produce lower-quality code. So that's thirty million; there are a billion knowledge workers in the world. It is very, very clear that AI agents are probably the next robotics
industry and likely to be a multi-trillion-dollar opportunity. Well, let me show you some of the blueprints that we've created and some of the work that we've done with our partners. AI agents are the new digital workforce, working for and with us. AI agents are a system of models that reason about a mission, break it down into tasks, and retrieve data or use tools to generate a quality response. NVIDIA's agentic AI building blocks, pre-trained models, and NeMo framework let organizations easily develop AI agents and deploy them anywhere. We will onboard and train our agentic workforces on our company's methods, like we do for employees. AI agents are domain-specific task experts. Let me show you four examples for the billions of knowledge workers and students. AI research assistant agents ingest complex documents like lectures, journals, and financial results, and generate interactive podcasts for easy learning. By combining a U-Net regression model with a diffusion model, CorrDiff can downscale global weather forecasts from 25 km to 2 km. Developers at NVIDIA manage software security AI agents that continuously scan software for vulnerabilities, alerting developers to what action is needed. Virtual lab AI agents help researchers design and screen billions of compounds to find promising drug candidates faster than ever. NVIDIA video analytics AI agents, built on the NVIDIA Metropolis blueprint—including NVIDIA Cosmos Nemotron vision language models, Llama Nemotron LLMs, and NeMo Retriever—analyze content from the billions of cameras generating 100,000 petabytes of video per day. They enable interactive search, summarization, and automated reporting, and help monitor traffic flows, flagging congestion or danger. In industrial facilities, they monitor processes and generate recommendations for improvement. Metropolis agents centralize data from hundreds of cameras and can reroute workers or robots when incidents occur. The age of agentic AI
is here for every organization. Okay, that was like the first pitch at a baseball game. That was not generated; I just felt that none of you were impressed. Okay, so AI was created in the cloud and for the cloud, and for enjoying AI on phones, of course, it's perfect. Very, very soon, we're going to have a continuous AI that's going to be with you, and when you use those Meta glasses, you can, of course, point at something, look at something, and ask it whatever information you want. AI created in the cloud is perfect in the cloud. However, we would love to be able to take that AI everywhere. I've mentioned already that you can take NVIDIA AI to any cloud, and you can also put it inside your company, but the thing that we want to do more than anything is put it on our PC as well. As you know, Windows 95 revolutionized the computer industry. It made possible this new suite of multimedia services, and it changed the way applications were created forever. Windows 95—this model of computing, of
course, is not perfect for AI. And so the thing that we would like to do is we would like to have in the future your AI basically become your AI assistant. Instead of just the 3D APIs, the sound APIs, and the video APIs, you would have generative APIs—generative APIs for 3D, generative APIs for language, and generative AI for sound, and so on and so forth. We need a system that makes that possible while leveraging the massive investment that's in the cloud. There's no way that we—the world—can create yet another way of programming AI models. It's
just not going to happen. And so if we could figure out a way to make the Windows PC a world-class AI PC, it would be completely awesome. It turns out the answer is Windows—it's Windows WSL2. Basically, it's two operating systems within one. It works perfectly; it's built for developers, so that you can have access to bare metal. WSL2 has been optimized for cloud-native applications, and very importantly, it's been optimized for CUDA. So WSL2 supports CUDA perfectly out of the box. As a result, everything that I showed you—NVIDIA NIMs, NVIDIA NeMo, the blueprints we develop that are going to be up on ai.nvidia.com—so long as the computer can fit the model, and we're going to have many models that fit, whether it's vision models, language models, speech models, or these animated digital-human models, all kinds of different types of models are going to be perfect for your PC. You download it, and it should just run.
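A quick way to confirm that models will actually see the GPU from inside WSL2—assuming a CUDA-enabled PyTorch build is installed in that environment, which is my example setup rather than anything specified here:

```python
# Sanity check that CUDA is visible inside WSL2 (assumes a CUDA-enabled
# PyTorch build is installed in the WSL2 environment).
import torch

if torch.cuda.is_available():
    print("CUDA OK:", torch.cuda.get_device_name(0))
    print("device count:", torch.cuda.device_count())
else:
    print("No CUDA device visible; check the NVIDIA driver and WSL2 setup.")
```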
So our focus is to turn the Windows WSL2 PC into a first-class target platform that we will support and maintain for as long as we shall live. This is an incredible thing for engineers and developers everywhere. Let me show you something that we can do with that. This is one of the examples of a blueprint we just made for you: generative AI synthesizes amazing images from simple text prompts. Yet image composition can be challenging to control using only words. With NVIDIA NIM microservices, creators can use simple 3D objects to guide AI image generation. Let's see how a concept artist can use this technology to develop the look of a scene. They start by
laying out 3D assets created by hand or generated with AI, then use an image generation NIM such as Flux to create a visual that adheres to the 3D scene. Add or move objects to refine the composition; change camera angles to frame the perfect shot, or reimagine the whole scene with a new prompt. Assisted by generative AI and NVIDIA NIM, artists can quickly realize their vision. NVIDIA AI for your PCs—hundreds of millions of PCs in the world with Windows—so we could get them ready for AI. OEMs—all the PC OEMs we work with, basically all of the
world's leading PC OEMs—are going to get their PCs ready for this stack. And so AI PCs are coming to a home near you. Linux is good. Okay, let's talk about physical AI. Speaking of Linux, let's talk about physical AI. So, physical AI: imagine how your large language model works. You give it your context, your prompt, on the left, and it generates tokens one at a time to produce the output. The model in the middle is quite large and has billions of parameters. The context length is incredibly large because you might decide to load in a PDF—in my case, I might load in several PDFs—before I ask it a question. Those PDFs are turned into tokens. The basic attention mechanism of a transformer has every single token find its relationship and relevance to every other token, so you could have hundreds of thousands of tokens, and the computational load increases quadratically. It processes all of the parameters and the input sequence through every single layer of the transformer and produces one token. That's the reason we need Blackwell. Then the next token is produced when the current one is done: it puts the current token into the input sequence and takes that whole thing to generate the next token. It does this one token at a time. This is the transformer model, and it's the reason it is so incredibly effective, yet computationally demanding.
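A toy sketch of the two costs just described—attention that grows quadratically with the context, and generation that proceeds strictly one token at a time. The next_token function is a stand-in for a full transformer forward pass, not a real model:

```python
# Illustration of why long contexts and long generations are expensive.
# next_token() is a stand-in for a transformer forward pass, not a real model.
def attention_pair_count(context_tokens: int) -> int:
    # every token attends to every other token -> O(n^2) interactions
    return context_tokens * context_tokens

def generate(prompt_tokens: list[int], steps: int) -> list[int]:
    sequence = list(prompt_tokens)
    for _ in range(steps):
        token = next_token(sequence)   # one full pass over the whole sequence
        sequence.append(token)         # the new token joins the next input
    return sequence

def next_token(sequence: list[int]) -> int:
    return (sum(sequence) + len(sequence)) % 50_000   # toy "model"

print(f"{attention_pair_count(100_000):,} attention pairs for a 100k-token context")
print(generate([1, 2, 3], steps=5))
```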
What if, instead of PDFs, the context is your surroundings? And what if, instead of a prompt or a question, it's a request: "Go over there, pick up that box, and bring it back"? Instead of producing text tokens, it produces action tokens. What I just described is a very sensible thing for the future of robotics, and the technology is right around the corner. But what we need to do is effectively create a world model, as opposed to GPT, which is a language model. This world model has to understand the language of the world; it has to understand physical dynamics—things like gravity, friction, and inertia. It has to understand geometric and spatial relationships. It has to understand cause and effect: if you drop something, it falls to the ground; if you poke at it, it tips over. It has to understand object permanence: if you roll a ball over the kitchen counter and it goes off the other side, the ball didn't leave for another quantum universe; it's still there. All of these types of understanding are intuitive understandings that most models today have a very hard time with. So we would like to create a world foundation model. Today, we're announcing a very big thing: NVIDIA Cosmos, a world foundation model designed to understand the physical world. The only way for you to really understand this is to see it. Let's play it. The next frontier of AI: physical AI. Model performance is directly related to data availability, but physical
world data is costly to capture, curate, and label. NVIDIA Cosmos is a world foundation model development platform to advance physical AI. It includes autoregressive world foundation models, diffusion-based world foundation models, advanced tokenizers, and an NVIDIA CUDA-accelerated data pipeline. Cosmos models ingest text, image, or video prompts and generate virtual world states as videos. Cosmos generations prioritize the unique requirements of AV and robotics use cases, like real-world environments, lighting, and object permanence. Developers use NVIDIA Omniverse to build physics-based, geospatially accurate scenarios, then output Omniverse renders into Cosmos, which generates photoreal, physically based synthetic data—whether diverse objects or environments, conditions like weather or time of day, or edge-case scenarios. Developers use Cosmos to generate worlds for reinforcement learning AI feedback to improve policy models, or to test and validate model performance, even across multi-sensor views. Cosmos can generate tokens in real time, bringing the power of foresight and multiverse simulation to AI models, generating every possible future to help the model select the right path. Working with the world's developer ecosystem, NVIDIA is helping advance the next wave of physical AI. NVIDIA Cosmos—the world's first world foundation model—is trained on 20 million hours
of video. The 20 million hours of video focuses on physical, dynamic themes—humans walking, hands moving, manipulating things, fast camera movements. It's really about teaching the AI to understand the physical world rather than generating creative content. From this physical AI, there are many downstream things that we could do as a result. We could do synthetic data generation to train models; we could distill it and turn it into effectively the seed, the beginnings of a robotics model. You could have it generate multiple physically plausible scenarios for the future—you could do a Doctor Strange. Because this model understands
the physical world, it could also do captioning; it could take videos and caption them incredibly well. That captioning and the video could be used to train large language models, multimodal large language models. You could use this technology, this foundation model, to train robotics as well as larger language models. This is the NVIDIA Cosmos platform, which has an autoregressive model for real-time applications, a diffusion model for very high-quality image generation, and an incredible tokenizer—basically learning the vocabulary of the real world—and a data pipeline. If you would like to take all of this and train it on
your own data, this data pipeline—because there's so much data involved—has been accelerated end-to-end for you. This is the world's first data processing pipeline that's CUDA-accelerated as well as AI-accelerated. All of this is part of the Cosmos platform, and today we're announcing that Cosmos is openly licensed; it's available on GitHub. There are small, medium, and large versions—for very fast models, mainstream models, and teacher models, basically knowledge-transfer models. We hope that this moment—Cosmos becoming an open world foundation model—will do for the world of robotics and industrial AI what Llama 3 has done
for enterprise AI. The magic happens when you connect Cosmos to Omniverse, and the reason, fundamentally, is this: Omniverse is a physics-grounded—not just physically grounded, but physics-grounded—algorithmic, physics-principled simulation system. It's a simulator. When you connect it to Cosmos, it provides the grounding, the ground truth, that can control and condition the Cosmos generation. As a result, what comes out of Cosmos is grounded in truth. This is exactly the same idea as connecting a large language model to a RAG (retrieval-augmented generation) system: you want to ground the AI generation on ground truth. Thus, the combination of the two gives you a physically simulated, physically grounded multiverse generator, and the applications, the use cases, are really quite exciting. Of course, for robotics and industrial applications, it is very clear that Omniverse plus Cosmos represents the third computer that's necessary for building robotic systems. Every robotics company will ultimately have to build three computers. A robotics system could be a factory; it could be a car; it could be a robot. You need three fundamental computers: one computer, of course, to train the AI—we call that the DGX computer. Another, of course, when
you're done to deploy the AI—we call that AGX; that’s inside the car, in the robot, or in an AMR, or you know, at a stadium, or whatever it is. These computers are at the edge and they're autonomous, but to connect the two you need a digital twin. This is all the simulations that you were seeing. The digital twin is where the AI that has been trained goes to practice, to be refined, to do its synthetic data generation, reinforcement learning AI feedback, and such. So, it's the digital twin of the AI. These three computers are going
to be working interactively. NVIDIA's strategy for the industrial world—and we've been talking about this for some time—is this three-computer system. Instead of a three-body problem, we have a three-computer solution. Let me give you three examples. All right, the first example is how we apply all of this to industrial digitalization. There are millions of factories and hundreds of thousands of warehouses—that's basically the backbone of the $50 trillion manufacturing industry—and all of it has to become software-defined. All of it has to have automation in the future, and all of it will be infused with robotics. Well, we're partnering with KION, the world's leading warehouse automation solutions provider, and Accenture, the world's largest professional services provider, which has a big focus on digital manufacturing. We're working together to create something that's really special, and I'll show you that in a second. Our go-to-market is essentially the same as with all of the other software and technology platforms that we have: through developers and ecosystem partners. We have a growing number of ecosystem partners connecting to Omniverse, and the reason is very clear: everybody wants to digitalize the future of industries. There's
so much waste, so much opportunity for automation in that $50 trillion portion of the world's GDP. So let's take a look at one example that we're doing with KION and Accenture. KION, the supply chain solution company; Accenture, a global leader in professional services; and NVIDIA are bringing physical AI to the $1 trillion warehouse and distribution center market. Managing high-performance warehouse logistics involves navigating a complex web of decisions influenced by constantly shifting variables. These include daily and seasonal demand changes, space constraints, workforce availability, and the integration of diverse robotic and automated systems. Predicting operational KPIs of a physical warehouse is nearly impossible today. To tackle these challenges, KION is adopting Mega, an NVIDIA Omniverse blueprint for building industrial digital twins to test and optimize robotic fleets. First, KION's warehouse management solution assigns tasks to the industrial AI brains in the digital twin, such as moving a load from a buffer location to a shuttle storage solution. The robots' brains are in a simulation of a physical warehouse, digitalized into Omniverse using OpenUSD connectors to aggregate CAD, video and image, 3D lidar, point cloud, and AI-generated data. The fleet of robots executes tasks by perceiving and reasoning about their Omniverse digital twin environment, planning their next motion, and acting. The robot brains can see the resulting state through sensor simulation and decide their next action. The loop continues while Mega precisely tracks the state of everything in the digital twin. Now KION can simulate infinite scenarios at scale while measuring operational KPIs such as throughput, efficiency, and utilization—all before deploying changes to the physical warehouse. Together with NVIDIA, KION and Accenture are reinventing industrial autonomy.
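The perceive-plan-act-track loop the blueprint describes can be sketched in a few lines; WarehouseTwin, RobotBrain, and the KPI below are hypothetical illustrations, not the Mega API:

```python
# Hypothetical sketch of the perceive-plan-act loop inside a digital twin,
# with a KPI tracked the whole time. WarehouseTwin and RobotBrain are
# illustrative stand-ins, not the Mega blueprint API.
class WarehouseTwin:
    def __init__(self):
        self.load_at = "buffer"          # where the current load sits
        self.completed = 0
        self.steps = 0

    def sensor_view(self):               # simulated sensor output
        return {"load_at": self.load_at}

    def apply(self, action):             # advance the simulated state
        self.steps += 1
        if action == "pick" and self.load_at == "buffer":
            self.load_at = "robot"
        elif action == "place" and self.load_at == "robot":
            self.load_at = "buffer"      # the next load appears at the buffer
            self.completed += 1

class RobotBrain:
    def plan(self, observation):         # trivial policy for illustration
        return "pick" if observation["load_at"] == "buffer" else "place"

twin, brain = WarehouseTwin(), RobotBrain()
for _ in range(100):                     # run many simulated task cycles
    twin.apply(brain.plan(twin.sensor_view()))

print(f"throughput KPI: {twin.completed / twin.steps:.2f} placements per step")
```

The value of the digital twin is that this loop can be run at scale, and with many scenario variations, before anything is changed on the real warehouse floor.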
In the future, everything is in simulation. In the future, every factory will have a digital twin, and that digital twin operates exactly like the real factory. In fact, you could use Omniverse with Cosmos to generate a whole bunch of future scenarios, and then an AI decides which one of the scenarios is the most optimal for whatever KPIs, and that becomes the programming constraints—the program, if you will—for the AI that will be deployed into the real factories. The next example is autonomous vehicles. The AV revolution has arrived after so many years of Waymo's success and Tesla's success. It is very, very clear: autonomous vehicles have finally arrived. Well, our offering to this industry is the three computers: the training systems that train the AIs; the simulation systems and synthetic data generation systems—Omniverse and Cosmos; and also the computer that's inside the car. Each car company might work with us in a different way and use one, two, or three of the computers. We're working with just about every major car company around the world—Waymo, Zoox, and Tesla, of course, in their data centers; BYD, the largest EV company in the world; JLR, which has a really cool car coming; Mercedes, with a fleet of cars coming with NVIDIA starting this year, going into production. I'm super, super pleased to announce that today, Toyota and NVIDIA are going to partner together to create their next-generation AVs. Just so many cool companies: Lucid, Rivian, Xiaomi, and of course Volvo. Just so many different companies. Waabi is building self-driving trucks. Aurora—we announced this week that Aurora is going to use NVIDIA to build self-driving trucks. There are 100 million cars built each year, a billion vehicles on the road all over the world, and a trillion miles driven around the world each year. That's all going to be either highly autonomous or fully autonomous coming up, and
so this is going to be a very large industry. I predict that this will likely be the first multi-trillion dollar robotics industry, this business for us. Notice, just a few of these cars are starting to ramp into the world. Our business is already $4 billion, and this year, probably on a run rate of about $5 billion, so it's really a significant business already. This is going to be very large. Well, today we're announcing that our next generation processor for the car, our next generation computer for the car, is called Thor. I have one right here.
Hang on a second. Okay, this is Thor. This is Thor. This is a robotics computer—a robotics computer that takes an immense amount of sensor information and processes it: cameras, high-resolution radars, lidars—they're all coming into this chip. This chip has to process all that sensor information, turn it into tokens, put the tokens into a transformer, and predict the next path. This AV computer is now in full production. Thor has 20 times the processing capability of our last generation, Orin, which is really the standard for autonomous vehicles today, and so this is just really quite, quite incredible. Thor is in full production. This robotics processor, by the way, also goes into a full robot, so it could be an AMR, it could be a humanoid robot; it could be the brain, it could be the manipulator. This processor basically is a universal robotics computer. The second part of our DRIVE system that I'm incredibly proud of is the dedication to safety: DRIVE OS. I'm pleased to announce it is now the first software-defined, programmable AI computer that has been certified up to ASIL D, which is the highest standard of functional safety for automobiles—the only one, and the highest. I'm really, really proud of this: ASIL D, ISO 26262. It is the work of some 15,000 engineering years. This is just extraordinary work. As a result of that, CUDA is now a functionally safe computer. So if you're building a robot: NVIDIA CUDA. Okay, so now—I told you I was going to show you what we would use Omniverse and Cosmos to do in the context of self-driving cars. You know, today, instead of showing you a whole bunch of videos of cars
driving on the road, I'll show you some of that too. But I want to show you how we use the car to reconstruct digital twins automatically using AI and use that capability to train future AI models. Okay, let's play it. The autonomous vehicle revolution is here. Building autonomous vehicles, like all robots, requires three computers: NVIDIA DGX to train AI models, Omniverse to test drive and generate synthetic data, and Drive AGX, a supercomputer in the car. Building safe autonomous vehicles means addressing edge scenarios, but real-world data is limited, so synthetic data is essential for training. The
autonomous vehicle data factory, powered by NVIDIA Omniverse AI models and Cosmos, generates synthetic driving scenarios that enhance training data by orders of magnitude. First, OmniMap fuses map and geospatial data to construct drivable 3D environments. Driving scenario variations can be generated from replay drives or AI traffic generators. Next, a neural reconstruction engine uses autonomous vehicle sensor logs to create high-fidelity 4D simulation environments. It replays previous drives in 3D and generates scenario variations to amplify training data. Finally, Edify 3D automatically searches through existing asset libraries or generates new assets to create sim-ready scenes. The Omniverse scenarios are used to condition Cosmos to generate massive amounts of photorealistic data, reducing the sim-to-real gap and, with text prompts, generating near-infinite variations of the driving scenario. With Cosmos Nemotron video search, the massively scaled synthetic dataset, combined with recorded drives, can be curated to train models. NVIDIA's AI data factory scales hundreds of drives into billions of effective miles, setting the standard for safe and advanced autonomous driving. [Music] Is that incredible? We take thousands of drives and turn them into billions of miles. We are going to have mountains of training data for autonomous vehicles. Of course, we still
need actual cars on the road; of course, we will continuously collect data for as long as we shall live. However, synthetic data generation using this multiverse, physically based, physically grounded capability means that we generate data for training AIs that are physically grounded and accurate and plausible. So that we could have an enormous amount of data to train with, the AV industry is here. This is an incredibly exciting time! I'm super, super excited about the next several years. I think you're going to see, just as computer graphics was revolutionized at such an incredible pace, you're going
to see the pace of AV development increasing tremendously over the next several years. I think the next part is robotics. So, humanoid robots—my friends, the ChatGPT moment for general robotics is just around the corner. In fact, all of the enabling technologies that I've been talking about are going to make it possible for us, in the next several years, to see very rapid, surprising breakthroughs in general robotics. Now, the reason why general robotics is so important is that whereas robots with tracks and wheels require special environments to accommodate them, there are three kinds of robots we can build that require no greenfield—brownfield adaptation works perfectly. If we could possibly build these amazing robots, we could deploy them in exactly the world that we've built for ourselves. These three robots are: one, agentic robots—agentic AI—because, you know, they're information workers; so long as they can accommodate the computers that we have in our offices, it's going to be great. Number two, self-driving cars—and the reason is we spent 100-plus years building roads and cities. And number three, humanoid robots. If we have the technology to solve these three, this will be the largest technology industry the world has ever seen. So we think the robotics era is just around the corner. The critical capability is how to train these robots. In the case of humanoid robots, the imitation information is rather hard to collect. With cars, you just drive them—we're driving cars all the time. With humanoid robots, the imitation information—the human demonstration—is rather laborious to collect. And so, we need to come up
with a clever way to take hundreds of demonstrations, thousands of human demonstrations, and somehow use artificial intelligence and Omniverse to synthetically generate millions of motions. From those motions, the AI can learn how to perform a task. Let me show you how that's done. Around the world, developers are building the next wave of physical AI embodied robots. Creating general-purpose robot models requires massive amounts of real-world data, which is costly to capture and curate. NVIDIA Isaac GR00T helps tackle these challenges by providing humanoid robot developers with four things: robot foundation models, data pipelines, simulation frameworks, and a Thor robotics computer. The NVIDIA Isaac GR00T blueprint for synthetic motion generation is a simulation workflow for imitation learning, enabling developers to generate exponentially large datasets from a small number of demonstrations. First, GR00T-Teleop enables skilled human workers to portal into a digital twin of their robot using the Apple Vision Pro. This means operators can capture data even without a physical robot, and they can operate the robot in a risk-free environment, eliminating the chance of physical damage or wear and tear. To teach a robot a single task, operators capture motion trajectories through a handful of teleoperated demonstrations, then use GR00T-Mimic to multiply these trajectories into a much larger dataset. Next, they use GR00T-Gen, built on Omniverse and Cosmos, for domain randomization and 3D-to-real upscaling, generating an exponentially larger dataset. The Omniverse and Cosmos multiverse simulation engine provides a massively scaled dataset to train the robot policy. Once the policy is trained, developers can perform software-in-the-loop testing and validation in Isaac Sim before deploying to the real robot. The age of general robotics is arriving, powered by NVIDIA Isaac GR00T.
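A toy sketch of the multiply-a-few-demonstrations idea: take a handful of recorded trajectories and expand them with small random perturbations, in the spirit of domain randomization. This only illustrates the concept; it is not the GR00T-Mimic algorithm:

```python
# Toy illustration of expanding a few teleoperated demonstrations into a
# much larger synthetic dataset via random perturbation. Not the actual
# GR00T-Mimic method; just the shape of the idea.
import random

def perturb(trajectory, noise=0.02):
    """Return a slightly varied copy of one demonstration."""
    return [[joint + random.gauss(0.0, noise) for joint in pose]
            for pose in trajectory]

def augment(demonstrations, copies_per_demo=1000):
    synthetic = []
    for demo in demonstrations:
        synthetic.extend(perturb(demo) for _ in range(copies_per_demo))
    return synthetic

# A handful of fake demos: each is a short sequence of 3-joint poses.
demos = [[[0.0, 0.1, 0.2], [0.1, 0.2, 0.3]] for _ in range(5)]
dataset = augment(demos)
print(f"{len(demos)} demonstrations -> {len(dataset)} synthetic trajectories")
```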
We're going to have mountains of data to train robots with. NVIDIA Isaac GR00T is our platform for providing technology elements to the robotics industry to accelerate the development of general robotics. I have one more thing that I want to show you. None of this would be possible if not for an incredible project that we started about a decade ago inside the company, called Project DIGITS—Deep Learning GPU Intelligence Training System. Before we launched it, I shrunk it to DGX and harmonized it with RTX, AGX, OVX, and all of the other X's that we have in the company. DGX-1 really revolutionized artificial intelligence. The reason we built it was that we wanted to make it possible for researchers and startups to have an out-of-the-box AI supercomputer. Imagine the way supercomputers were built in the past: you really had to build your own facility and infrastructure and engineer it into existence. So we created a supercomputer for AI development—specifically for researchers and startups—that comes literally right out of the box. I delivered the first one to a startup company in 2016 called OpenAI, where Elon Musk, Ilya Sutskever, and many NVIDIA engineers were present. We celebrated the
arrival of DGX-1, which, as you know, revolutionized artificial intelligence and computing. Now, artificial intelligence is everywhere. It's not just confined to researchers and startup labs; we want artificial intelligence to be embedded in the new way of performing computing. Every software engineer, engineer, and creative artist—everybody who uses computers today as a tool—will need an AI supercomputer. So, I just wish that DGX-1 were smaller. Imagine, ladies and gentlemen, this is NVIDIA's latest AI supercomputer, which is currently called Project DIGITS. If you have a good name for it, reach out to us. Here's the amazing thing: this is
an AI supercomputer that runs the entire NVIDIA AI stack. All of NVIDIA's software runs on this. This unit can sit somewhere and connect wirelessly to your computer; it can even function as a workstation if you prefer. You can access it like a cloud supercomputer. NVIDIA's AI runs on it, and it's based on a super-secret chip that we've been developing called GB110—the smallest Grace Blackwell that we make. You know what? Let's show everybody the inside. Isn't it just so cute? This is the chip that's inside it. It is in production. This top-secret chip was developed
in collaboration with MediaTek, the world’s leading SoC company, who worked with us to build this CPU. This CPU connects with chip-to-chip NVLink to the Blackwell GPU, and this little thing here is in full production. We expect this computer to be available around the May timeframe, so it’s coming your way. It's incredible what we can do, and it's just stunning. I was trying to figure out if I need more hands or more pockets! Imagine, this is what it looks like. Who doesn’t want one of those? If you use a PC or Mac, you know that it’s
a cloud platform—a cloud computing platform—that sits on your desk. You could also use it as a Linux workstation, and if you would like double DIGITS, this is what it looks like: you connect them together with ConnectX, and it has NCCL, GPUDirect, all of that out of the box. It's like a supercomputer. Our entire supercomputing stack is available. And so: NVIDIA Project DIGITS. [Applause] Okay, well, let me tell you what I told you. I told you that we are in production with three new Blackwells. Not only is
the Grace Blackwell supercomputer, NVLink 72, in production all over the world—we now have three new Blackwell systems in production. Two, an amazing AI foundation model—the world's first physical AI foundation model—is open and available to activate the world's industries, robotics and such. And three, there are three robots coming: agentic AI, humanoid robots, and self-driving cars. It's been an incredible year. I want to thank all of you for your partnership, thank all of you for coming. I made you a short video to reflect on last year and look forward to the next year. Play, please. [Music] [Applause] [Music] [Music] [Music] [Music] [Applause] [Music] [Music] Have a great CES, everybody! Happy New Year! Thank you.