I really think you have to let this sink in. In another hour or so, you're going to feel good about it. Well, welcome to NVIDIA. In fact, you're inside NVIDIA's digital twin, and we're going to take you to NVIDIA. Ladies and gentlemen, welcome to NVIDIA. You're inside our digital twin; everything here is generated by AI. It has been an extraordinary journey, an extraordinary year, and it started in 1993. Ready? Go. With NV1, we wanted to build computers that could do things normal computers couldn't, and NV1 made it possible to have a game console in your PC. Our programming architecture was called UDA, missing the letter C until a little while later: UDA, the Unified Device Architecture. The first developer for UDA, and the first application that ever worked on UDA, was Sega's Virtua Fighter. Six years later, in 1999, we invented the programmable GPU, and it started 20-plus years of incredible advances in this incredible processor called the GPU. It made modern computer graphics possible, and now, 30 years later, Sega's Virtua Fighter is completely cinematic. This is the new Virtua Fighter project that's coming. I just can't wait. Absolutely incredible. Six years after 1999, we invented CUDA, so that we could express the programmability of our GPUs to a rich set of algorithms that could benefit from it. CUDA was initially difficult to explain, and it took years, in fact approximately six years. Somehow, six years later or so, in 2012, Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton discovered CUDA, used it to process AlexNet, and the rest is history. AI has been advancing at an incredible pace since. It started with perception AI: we can now understand images and words and sounds. Then generative AI: we can generate images and text and sounds. And now agentic AI: AI that can perceive, reason, plan, and act. And then the next phase, some of which we'll talk about tonight: physical AI. That was 2012. Now, magically, in 2018, something happened that was pretty incredible.
Google's Transformer was released as BERT, and the world of AI really took off. Transformers, as you know, completely changed the landscape for artificial intelligence; in fact, they completely changed the landscape for computing altogether. We recognized, properly, that AI was not just a new application with a new business opportunity, but that machine learning, enabled by Transformers, was going to fundamentally change how computing works. And today, computing is revolutionized at every single layer: from hand-coding instructions that run on CPUs to create software tools that humans use, we now have machine learning that creates and optimizes neural networks that process on GPUs and create artificial intelligence. Every single layer of the technology stack has been completely changed, an incredible transformation in just 12 years. Well, we can now understand information of just about any modality. Surely you've seen text and images and sounds and things like that, but not only can we understand those, we can understand amino acids, we can understand physics. We can understand them, we can translate them, and we can generate them. The applications are just completely endless. In fact, for almost any AI application you see out there, ask three fundamental questions: what modality of information did it learn from? What modality of information did it translate to? And what modality of information is it generating? From those three fundamental questions, just about every single application could be inferred. And so when you see application after application that is AI-driven, AI-native, this fundamental concept is at the core of it. Machine learning has changed how every application is going to be built, how computing will be done, and the possibilities beyond. Well, GPUs, GeForce: in a lot of ways, all of this with AI is the house that GeForce built. GeForce enabled AI to reach the masses, and now AI is coming home to GeForce. There are so many things that you can't do without AI. Let me show you some of it now. [Music]
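The shift described above, from hand-writing instructions to having software learned from examples, can be sketched with a toy (and entirely illustrative) example. The Fahrenheit conversion and the least-squares fit below are made up for this sketch; they stand in for "rules a human codes" versus "rules a machine fits from data":

```python
# Hand-coded software: a human writes the rule explicitly.
def fahrenheit_handcoded(celsius):
    return celsius * 9 / 5 + 32

# Machine-"learned" software: the same rule is fit from input/output examples.
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

# Generate training examples from the hand-written rule, then fit.
samples = [(c, fahrenheit_handcoded(c)) for c in range(0, 101, 10)]
a, b = fit_line([c for c, _ in samples], [f for _, f in samples])
# The fitted coefficients recover the hand-written rule: a = 1.8, b = 32.
```

Real neural networks replace the two fitted coefficients with billions of parameters and the closed-form solve with gradient descent on GPUs, but the inversion is the same: examples in, program out.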
[Applause] [Music] That was real-time computer graphics. No computer graphics researcher, no computer scientist, would have told you that it is possible for us to ray trace every single pixel at this point. Ray tracing is a simulation of light. The amount of geometry that you saw was absolutely insane; it would have been impossible without artificial intelligence. There are two fundamental things that we did. We used, of course, programmable shading and ray-traced acceleration to produce incredibly beautiful pixels. But then we have artificial intelligence, conditioned on and controlled by those pixels, generate a whole bunch of other pixels. Not only is it able to generate other pixels spatially, because it's aware of what the color should be; it has been trained on a supercomputer back at NVIDIA, so the neural network running on the GPU can infer and predict the pixels that we did not render. It's called DLSS. Not only can we do that; the latest generation of DLSS also generates beyond frames. It can predict the future, generating three additional frames for every frame that we calculate. Because we're going to render one frame and generate three, four frames at full 4K is 33 million pixels or so, and out of those 33 million pixels we computed only 2 million. It is an absolute miracle that we can, computationally, using programmable shaders and our ray tracing engine, compute 2 million pixels and have AI predict all of the others. As a result, we're able to render at incredibly high performance, because AI does a lot less computation. It takes, of course, an enormous amount of training to produce that, but once you train it, the generation is extremely efficient. So this is one of the incredible capabilities of artificial intelligence, and that's why there are so many amazing things happening. We used GeForce to enable artificial intelligence.
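The pixel arithmetic behind that claim checks out in a few lines. This is a back-of-envelope sketch, not NVIDIA's pipeline: the 1080p internal render resolution is an assumption (the keynote only gives the roughly 2-million and 33-million totals), chosen because 1920 x 1080 is about 2 million pixels:

```python
# Back-of-envelope check of the DLSS frame-generation numbers:
# one frame is ray traced at an assumed lower internal resolution,
# then three more 4K frames are generated by the neural network.
OUTPUT_W, OUTPUT_H = 3840, 2160            # 4K output resolution
INTERNAL_W, INTERNAL_H = 1920, 1080        # assumed internal render resolution
RENDERED_FRAMES, GENERATED_FRAMES = 1, 3   # render one, generate three

displayed = (RENDERED_FRAMES + GENERATED_FRAMES) * OUTPUT_W * OUTPUT_H
computed = RENDERED_FRAMES * INTERNAL_W * INTERNAL_H

print(f"pixels displayed:   {displayed:,}")    # ~33 million
print(f"pixels ray traced:  {computed:,}")     # ~2 million
print(f"AI-predicted share: {1 - computed / displayed:.1%}")
```

Under these assumptions, roughly 94 percent of displayed pixels are predicted by the network rather than computed by the shaders, which is what makes the performance and energy-efficiency claims plausible.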
And now artificial intelligence is revolutionizing GeForce. Everyone, today we're announcing our next generation: the RTX Blackwell family. Let's take a look. [Music] Here it is, our brand new GeForce RTX 50 Series, Blackwell architecture. The GPU is just a beast: 92 billion transistors. 4,000 TOPS, four petaflops of AI, three times higher than the last generation, Ada, and we need all of it to generate those pixels that I showed you. 380 ray tracing teraflops, so that for the pixels we do have to compute, we compute the most beautiful image we possibly can. And of course, 125 shader teraflops; there's actually a concurrent shader teraflop as well as an integer unit of equal performance, dual shaders: one for floating point, one for integer. GDDR7 memory from Micron, 1.8 terabytes per second, twice the performance of our last generation. And we now have the ability to intermix AI workloads with computer graphics workloads. One of the amazing things about this generation is that the programmable shader is also able to process neural networks; the shader can carry these neural networks, and as a result we invented neural texture compression and neural material shading. You get these amazingly beautiful images that are only possible because we used AI to learn the texture, learn a compression algorithm, and as a result get extraordinary results. Okay, so this is the brand new RTX Blackwell 5090. Even the mechanical design is a miracle. Look at this: it's got two fans; this whole graphics card is just one giant fan. So the question is, where's the graphics card? Is it literally this big? The voltage regulator design is state-of-the-art, an incredible design. The engineering team did a great job. So here it is. Thank you. Okay, so those are the speeds and feeds. So how does it compare? Well, this is the RTX 4090. I know many of you have one. Look, it's $1,599. It is one of the best
investments you could possibly make. For $1,599, you bring it home to your $10,000 PC entertainment command center. Isn't that right? Don't tell me that's not true. Don't be ashamed. It's liquid-cooled, fancy lights all over it; you lock it when you leave. It's the modern home theater. It makes perfect sense. And now, for $1,599, you get to upgrade that and turbocharge the living daylights out of it. Well, now, with the Blackwell family: the RTX 5070, 4090 performance, at $549. Impossible without artificial intelligence, impossible without the AI TOPS of the tensor cores, impossible without the GDDR7 memory. Okay, so 5070, 4090 performance, $549. And here's the whole family, starting from the 5070 all the way up to the 5090; the 5090, twice the performance of a 4090. [Applause] We're producing at very large scale, with availability starting January. Well, it is incredible, but we managed to put these gigantic-performance GPUs into a laptop. This is a 5070 laptop for $1,299, and this 5070 laptop has 4090 performance. I think there's one here somewhere. Let me show you this. Look at this thing here. There's only so many pockets. Ladies and gentlemen, Janine Paul! [Applause] So can you imagine? You get this incredible graphics card, Blackwell, and we're going to shrink it and put it in there. Does that make any sense? Well, you can't do that without artificial intelligence, and the reason is that we're generating most of the pixels using our tensor cores. We ray trace only the pixels we need, and we generate all the other pixels using artificial intelligence. As a result, the energy efficiency is just off the charts. The future of computer graphics is neural rendering, the fusion of artificial intelligence and computer graphics. And what's really amazing... oh, here we go, thank you. This is a surprisingly kinetic keynote. And what's really amazing is the family of GPUs we're going to put in here. The 5090, the 5090 will fit into a laptop, a thin laptop. That last laptop was 14.
9 mm. You've got a 5080, a 5070 Ti, and a 5070. Okay, so, ladies and gentlemen, the RTX Blackwell family. [Applause] Well, GeForce brought AI to the world, democratized AI, and now AI has come back and revolutionized GeForce. Let's talk about artificial intelligence. Let's go somewhere else at NVIDIA. This is literally our office; this is literally NVIDIA's headquarters. Okay, so let's talk about AI. The industry is chasing and racing to scale artificial intelligence, and the scaling law is a powerful model. It's an empirical law that has been observed and demonstrated by researchers and industry over several generations, and the scaling law says this: the more training data you have, the larger the model you have, and the more compute you apply to it, the more effective or the more capable your model will become. And so the scaling law continues. What's really amazing is that the internet is producing about twice the amount of data every single year as it did the year before. I think in the next couple of years, humanity will produce more data than all of humanity has produced since the beginning. We're still producing a gigantic amount of data, and it's becoming multimodal: video and images and sound. All of that data could be used to train the fundamental knowledge, the foundational knowledge, of an AI. But there are in fact two other scaling laws that have now emerged, and they're somewhat intuitive. The second scaling law is the post-training scaling law. Post-training uses techniques like reinforcement learning from human feedback: basically, the AI generates answers based on a human query, and the human then gives feedback. It's much more complicated than that, but that reinforcement learning system, with a fair number of very high-quality prompts, causes the AI to refine its skills. It could
fine-tune its skills for particular domains. It could be better at solving math problems, better at reasoning, and so on and so forth. It's essentially like having a mentor or a coach give you feedback after you're done going to school: you get tested, you get feedback, you improve yourself. We also have reinforcement learning from AI feedback, and we have synthetic data generation. These techniques are akin to, if you will, self-practice: you know the answer to a particular problem, and you continue to try it until you get it right. An AI could be presented with a very complicated and difficult problem that is functionally verifiable and has an answer that we understand, maybe proving a theorem, maybe solving a geometry problem, and these problems would cause the AI to produce answers, and using reinforcement learning, it would learn how to improve itself. That's called post-training. Post-training requires an enormous amount of computation, but the result produces incredible models. We now have a third scaling law, and this third scaling law has to do with what's called test-time scaling. Test-time scaling is basically this: when the AI is being used, it has the ability to apply a different resource allocation. Instead of improving its parameters, it's now focused on deciding how much computation to use to produce the answers it wants to produce. Reasoning is one way of thinking about this; long thinking is another. Instead of a direct inference, a one-shot answer, you might reason about it. You might break the problem down into multiple steps; you might generate multiple ideas, and your AI system would evaluate which of the ideas it generated was the best one. Maybe it solves the problem step by step, and so on and so forth. And so now test-time scaling has proven to be incredibly effective. You're watching this sequence of
technology, and all of these scaling laws emerge, as we see incredible achievements from ChatGPT to o1 to o3, and now Gemini Pro. All of these systems are going through this journey, step by step by step, of pre-training to post-training to test-time scaling. Well, the amount of computation that we need is, of course, incredible, and we would like, in fact, for society to have the ability to scale the amount of computation to produce more and more novel and better intelligence. Intelligence, of course, is the most valuable asset that we have, and it can be applied to solve a lot of very challenging problems. And so the scaling law is driving enormous demand for NVIDIA computing; it's driving enormous demand for this incredible chip we call Blackwell. Let's take a look at Blackwell. Well, Blackwell is in full production. It is incredible what it looks like. First of all, every single cloud service provider now has systems up and running. We have systems here from about 15 computer makers. It's being made in about 200 different SKUs, 200 different configurations: liquid-cooled, air-cooled, x86, NVIDIA Grace CPU versions, NVLink 36x2, NVLink 72x1, a whole bunch of different types of systems, so that we can accommodate just about every single data center in the world. These systems are currently being manufactured in some 45 factories. It tells you how pervasive artificial intelligence is, and how much the industry is jumping onto artificial intelligence and this new computing model. Well, the reason we're driving it so hard is that we need a lot more computation, and it's very clear. So this NVLink system, this right here, this is GB200 NVL72. It is one and a half tons, 600,000 parts, approximately equal to 20 cars, 120 kilowatts. It has a spine behind it that connects all of these GPUs together: two miles of copper cable, 5,000 cables. This is being manufactured
in 45 factories around the world. We build them, we liquid-cool them, we test them, we disassemble them and ship the parts to the data centers, because it's one and a half tons, and we reassemble it outside the data centers and install it. The manufacturing is insane, but the goal of all of this is that the scaling laws are driving computing so hard that we need this level of computation. Blackwell, over our last generation, improves performance per watt by a factor of four, performance per watt by a factor of four, and performance per dollar by a factor of three. That basically says that in one generation, we reduced the cost of training these models by a factor of three; or, if you want to increase the size of your model by a factor of three, it's about the same cost. But the important thing is this: these systems are generating tokens that are being used by all of us, when we use ChatGPT, or when we use Gemini, or when we use our phones. In the future, just about all of these applications are going to be consuming AI tokens, and those AI tokens are being generated by these systems. And every single data center is limited by power. So if the performance per watt of Blackwell is four times our last generation, then the revenue that could be generated, the amount of business that can be generated in the data center, is increased by a factor of four. And so these AI factory systems really are factories today. Now, the goal of all of this is so that we can create one giant chip. The amount of computation we need is really quite incredible, and this is basically one giant chip. If we had to build this as one chip... here we go. Sorry, you guys, you see that? That's cool. Look at that, disco lights in here. If we had to build this as one chip, obviously it would be the size of a wafer, but that doesn't include the impact of yield; it would have to be probably three or four times the size. But what we basically have here is 72 Blackwell GPUs, or 144 dies. This one chip here is 1.
4 exaflops. The world's largest, fastest supercomputer, this entire room of a supercomputer, only recently achieved an exaflop-plus. This is 1.4 exaflops of AI floating-point performance. It has 14 terabytes of memory, but here's the amazing thing: the memory bandwidth is 1.2 petabytes per second. That's basically the entire internet traffic that's happening right now; the entire world's internet traffic could be processed across these chips. And we have 130 trillion transistors in total, 2,592 CPU cores, and a whole bunch of networking. So these... I wish I could do this; I don't think I will. These are the Blackwells, these are our ConnectX networking chips, these are the NVLink chips, and we're trying to represent the NVLink spine, but that's not possible. And these are all of the HBM memories, 14 terabytes of HBM memory. This is what we're trying to do, and this is the miracle of the Blackwell system. The Blackwell die right here is the largest single chip the world's ever made, and yet the miracle is really in addition to that: this is the Grace Blackwell system. Well, the goal of all of this, of course, is so that we can... thank you. Boy, is there a chair I could sit down on for a second? Can I have a Michelob Ultra? [Applause] How is it possible that we're in the Michelob Ultra Stadium? It's like coming to NVIDIA and we don't have a GPU for you. So we need an enormous amount of computation, because we want to train larger and larger models, and because of these inferences. Inference used to be one shot, but in the future, the AI is going to be talking to itself. It's going to be thinking; it's going to be internally reflecting, processing. Today, when the tokens are being generated at you, so long as they're coming out at 20 or 30 tokens per second, it's basically as fast as anybody can read. However, in the future, and right now with o1, with the new Gemini Pro, and with the new
o3 models, they're talking to themselves, reflecting, thinking. As you can imagine, the rate at which the tokens could be ingested is incredibly high, and so we need the token generation rates to go way up, and we also have to drive the cost way down simultaneously, so that the quality of service can be extraordinary, the cost to customers can continue to be low, and AI will continue to scale. That's the fundamental purpose, the reason we created NVLink. Well, one of the most important things happening in the world of enterprise is agentic AI. Agentic AI is basically a perfect example of test-time scaling. An agentic AI is a system of models. Some of it is understanding, interacting with the customer, interacting with the user. Some of it is maybe retrieving information from storage, a semantic AI system like a RAG. Maybe it's going onto the internet; maybe it's studying a PDF file. It might be using tools; it might be using a calculator; and it might be using a generative AI to generate charts and such. It's taking the problem you gave it, breaking it down step by step, and iterating through all of these different models. Well, in order to respond to a customer in the future, in order for AI to respond: it used to be, ask a question and the answer starts spewing out. In the future, you ask a question, and a whole bunch of models are going to be working in the background. And so with test-time scaling, the amount of computation used for inferencing is going to go through the roof. It's going to go through the roof because we want better and better answers. Well, to help the industry build agentic AI, our go-to-market is not direct to enterprise customers. Our go-to-market is to work with software developers and the IT ecosystem to integrate our technology to make new capabilities possible, just like we did with CUDA libraries. We now want to do that with AI libraries. And just as the computing
model of the past has APIs for doing computer graphics, or doing linear algebra, or doing fluid dynamics, in the future, on top of those acceleration libraries, the CUDA acceleration libraries, we'll have AI libraries. We've created three things to help the ecosystem build agentic AI. First, NVIDIA NIMs, which are essentially AI microservices, all packaged up. It takes all of this really complicated CUDA software, cuDNN, CUTLASS, TensorRT-LLM, Triton, all of these different, really complicated pieces of software, and the model itself; we package it up, we optimize it, we put it into a container, and you can take it wherever you like. We have models for vision, for understanding language, for speech, for animation, for digital biology, and we have some exciting new models coming for physical AI. These AI models run in every single cloud, because NVIDIA's GPUs are now available in every single cloud, and they're available from every single OEM. So you could literally take these models, integrate them into your software packages, and create AI agents that run on Cadence, or they might be ServiceNow agents, or they might be SAP agents, and deploy them to your customers and run them wherever the customers want to run the software. The next layer is what we call NVIDIA NeMo. NeMo is essentially a digital-employee onboarding, training, and evaluation system. In the future, these AI agents are essentially a digital workforce working alongside your employees, doing things for you on your behalf. The way you would bring these specialized agents into your company is to onboard them, just like you onboard an employee. And so we have different libraries that help these AI agents be trained for the type of language in your company. Maybe the vocabulary is unique to your company; the business process is different; the way you work is different. So you would give them examples of what the work product should look like, and they would try to
generate it, and you would give feedback, and then you would evaluate them, and so on and so forth. And you would guardrail them: you say, these are the things you're not allowed to do, these are the things you're not allowed to say; and we even give them access to certain information. Okay, so that entire pipeline, a digital-employee pipeline, is called NeMo. In a lot of ways, the IT department of every company is going to be the HR department of AI agents in the future. Today, they manage and maintain a bunch of software from the IT industry. In the future, they'll maintain, nurture, onboard, and improve a whole bunch of digital agents and provision them for the company to use. And so your IT department is going to become kind of like AI-agent HR. On top of that, we provide a whole bunch of blueprints that our ecosystem can take advantage of. All of this is completely open source, so you can take it and modify the blueprints. We have blueprints for all kinds of different types of agents. Well, today we're also announcing that we're doing something that's really cool, and I think really clever. We're announcing a whole family of models that are based off of Llama: the NVIDIA Llama Nemotron language foundation models. Llama 3.
1 is a complete phenomenon. Llama 3.1 has been downloaded from Meta something like 650,000 times, and it has been derived and turned into other models, about 60,000 other different models. It is singularly the reason why just about every single enterprise and every single industry has been activated to start working on AI. Well, the thing that we did was we realized that the Llama models really could be better tuned for enterprise use. So we fine-tuned them using our expertise and our capabilities, and we turned them into the Llama Nemotron suite of open models. There are small ones that interact with very, very fast response times; there are what we call Llama Nemotron Supers, basically your mainstream versions of the models; and there's your Ultra model. The Ultra model could be used as a teacher model for a whole bunch of other models. It could be a reward model, an evaluator, a judge for other models that create answers, to decide whether an answer is good or not, basically giving feedback to other models. It could be distilled in a lot of different ways: basically a teacher model, a knowledge-distillation model, very large and very capable. All of this is now available online. Well, these models are incredible: number one on leaderboards for chat, the leaderboard for instruction, the leaderboard for retrieval, the different types of functionality necessary for AI agents around the world. These are going to be incredible models for you. We're also working with the ecosystem. All of our NVIDIA AI technologies are integrated into the IT industry. We have great partners, and really great work is being done at ServiceNow, at SAP, at Siemens for industrial AI. Cadence is doing great work; Synopsys is doing great work. I'm really proud of the work that we do with Perplexity; as you know, they revolutionized search. Really fantastic stuff. Codeium: every software engineer in the
world. This is going to be the next giant AI application, the next giant AI service, period: software coding. There are 30 million software engineers around the world, and everybody is going to have a software assistant helping them code. If not, obviously, you're going to be way less productive and create lower-quality code. So that's 30 million, and there are a billion knowledge workers in the world. It is very, very clear: AI agents are probably the next robotics industry, and likely to be a multi-trillion-dollar opportunity. Well, let me show you some of the blueprints that we've created and some of the work that we've done with our partners with these AI agents. AI agents are the new digital workforce, working for and with us. AI agents are a system of models that reason about a mission, break it down into tasks, and retrieve data or use tools to generate a quality response. NVIDIA's agentic AI building blocks, NIM pre-trained models, and the NeMo framework let organizations easily develop AI agents and deploy them anywhere. We will onboard and train our agentic workforces on our companies' methods, like we do for employees. AI agents are domain-specific task experts. Let me show you four examples. For the billions of knowledge workers and students, AI research assistant agents ingest complex documents like lectures, journals, and financial results, and generate interactive podcasts for easy learning. By combining a U-Net regression model with a diffusion model, CorrDiff can downscale global weather forecasts from 25 km down to 2 km. Developers, like those at NVIDIA, manage software security with AI agents that continuously scan software for vulnerabilities, alerting developers to what action is needed. Virtual lab AI agents help researchers design and screen billions of compounds to find promising drug candidates faster than ever. NVIDIA analytics AI agents, built on an NVIDIA Metropolis blueprint, including NVIDIA Cosmos Nemotron vision language models, Llama Nemotron LLMs, and NeMo Retriever: Metropolis agents
analyze content from the billions of cameras generating 100,000 petabytes of video per day. They enable interactive search, summarization, and automated reporting, and help monitor traffic flows, flagging congestion or danger. [Music] In industrial facilities, they monitor processes and generate recommendations for improvement. [Music] Metropolis agents centralize data from hundreds of cameras and can reroute workers or robots when incidents occur. The age of agentic AI is here, for every organization. [Music] Okay, that was the first pitch at a baseball game that was not generated. I just felt that none of you were impressed. Okay, so AI was created in the cloud and for the cloud. AI was created in the cloud, for the cloud, and for enjoying AI on phones, of course, it's perfect. Very, very soon, we're going to have a continuous AI that's going to be with you, and when you use those Meta glasses, you can of course point at something, look at something, and ask it whatever information you want. And so AI was created in the cloud, and it's perfect in the cloud. However, we would love to be able to take that AI everywhere. I've mentioned already that you can take NVIDIA AI to any cloud, but you can also put it inside your company. But the thing that we want to do more than anything is put it on our PCs as well. As you know, Windows 95 revolutionized the computer industry. It made possible this new suite of multimedia services, and it changed the way applications were created forever. But Windows 95, this model of computing, is not perfect for AI. And so the thing that we would like to do is to have, in the future, your AI basically become your AI assistant, and instead of just the 3D APIs and the sound APIs and the video APIs, you would have generative APIs: generative APIs for 3D, generative APIs for language, generative AI for sound, and so on and so forth. And we need a system that makes that possible while
leveraging the massive investment that's in the cloud. There's no way the world can create yet another way of programming AI models; it's just not going to happen. And so if we could figure out a way to make the Windows PC a world-class AI PC, it would be completely awesome. And it turns out the answer is Windows. It's Windows WSL2. WSL2 is basically two operating systems within one. It works perfectly. It's developed for developers, and it's developed so that you can have access to bare metal. WSL2 has been optimized for cloud-native applications, and very importantly, it's been optimized for CUDA. So WSL2 supports CUDA perfectly out of the box. As a result, everything that I showed you, the NVIDIA NIMs, NVIDIA NeMo, the blueprints that we develop that are going to be up in ai.