Hello, everyone, and welcome. I'm Vicki Hanson, the president of ACM, and I'm really thrilled to be here tonight. ACM initiated the Turing Award in 1966 to recognize contributions of lasting and major technical importance to the computing field. We're indeed fortunate that the 45th International Symposium on Computer Architecture is the venue for this year's ACM A.M. Turing Lecture. As a result of their fundamental contributions to the development of energy-efficient RISC-based processors, the two recipients of the 2017 Turing Award have played a key role in both the mobile and the internet of things
revolutions. John L. Hennessy, former president of Stanford University, and David A. Patterson, retired professor of the University of California, Berkeley, and former president of ACM, conceived of using a set of simple and general instructions that needed fewer transistors and reduced the amount of work a computer must perform. The concept was revolutionary. The work led by Hennessy at Stanford and Patterson at Berkeley would result in a systematic, quantitative approach to designing faster, lower-power, reduced instruction set microprocessors. Patterson's Berkeley team coined the term RISC and built the RISC-I processor in 1982.
It would later be commercialized by Sun Microsystems as the SPARC architecture. Hennessy co-founded MIPS Computer Systems in 1984 to commercialize the Stanford team's work. Over the past quarter-century, their textbook Computer Architecture: A Quantitative Approach, now in its sixth edition, has influenced generations of engineers and computer designers who would adopt and further refine their ideas. It became the foundation for our ability to model and analyze the architectures of new processors, thus accelerating advances in microprocessor design. Today, 99% of the more than 16 billion microprocessors produced annually are RISC processors.
They are found in nearly all smartphones, tablets, and the billions of embedded devices that comprise the internet of things. It's my privilege to introduce our 2017 Turing Laureates: John Hennessy, Chairman of the Board of Alphabet Inc. and Director of the Knight-Hennessy Scholars program at Stanford University, and Dave Patterson, distinguished engineer at Google and Vice Chair of the Board of the RISC-V Foundation. John and Dave, welcome. Alright. So, when John and I were figuring out how to do this, it just seemed like it would be crazy to give two independent lectures, given our careers.
So, we're going to do this as a tag team. Part one is going to be history: we're going to do the history of computer architecture in about 20 minutes, and I'm going to do that part. John is going to do part two, talking about the challenges facing our field. Then we're going to tag team part three, with John doing the first half on domain-specific architectures and me doing the last part. And then there's time for questions and answers, and we're looking forward to that. Okay, let's go back 50-some years, to the early 1960s. IBM had this problem: IBM had four incompatible
lines of computers, and by incompatible I mean the instruction sets are different, the software stacks are different, the operating systems are different, the I/O is different, and the markets are different. So, IBM engineers had this idea that they would bet the company, bet the whole company, that they could have a single instruction set that would handle all four independent lines, that they'd unify around it. To pull that off, then as now, the hard part is control. But Maurice Wilkes, the second person to win the Turing Award, and from our field, had this idea of
how to make it easier to build control, and his idea was called microprogramming. The insight was that logic in the technology of the time was more expensive than ROM or RAM, and ROM was cheaper than RAM and faster. So: specify control as the contents of a read-only memory, and call each word of that control memory a microinstruction. So, IBM was going to bet the company that they could pull this off using microprogramming. In April 1964 they made the biggest announcement in the company's history, and here's an example of four of the different machines. You can see
the datapaths range from 8 bits to 64 bits, and the microinstructions go from 50 bits to 87 bits. Back then with microcode, the wider the hardware, the wider the microinstruction, but then it didn't take as many microinstructions to interpret an instruction, so the control store was shorter; that was called horizontal microprogramming. The one on the left is only 50 bits wide, but it's longer because it takes more clock cycles to execute; that was called vertical microprogramming. So they bet the company, and back then, in today's dollars, the small one cost about a million and a half dollars.
They won the bet: bet the company, won the bet. IBM dominated the mainframe computing industry, and to this day that instruction set is still available and still dominates mainframes. The second computer architect to get a Turing Award, Fred Brooks, played a big role in that effort. Then Moore's law and semiconductor technology come along, and then the minicomputer. Now logic, RAM, and ROM are all made from the same transistors, so they cost about the same, and RAM is about the same speed as ROM. With Moore's law we can have bigger control stores,
and because the control store could now be in RAM, you could also fix bugs in it. This led to ever more complicated instruction sets, and the classic example was Digital Equipment's VAX instruction set: its microinstructions were 96 bits wide and there were 5,000 of them. Because the control store was in RAM, an idea came along called Writable Control Store: since it was alterable, rather than just running the standard instruction set, you could add tweaks to tailor it exactly to your application.
So, microprogramming became very popular in academia, and that's when I was a graduate student. My PhD thesis was in this area; my first paper was at SIGMICRO, and SIGMICRO, you may not know this, was actually the workshop on microprogramming. It changed its focus later, but it was microprogramming in the beginning. Surely the most famous machine with writable control store was the Xerox Alto, built in 1973. It was the progenitor of the ideas we all use today: the first computer with a graphical user interface, the first computer with Ethernet, and all of that was
written in microcode in the writable control store; the bitmapped graphics and the Ethernet controller were done in microcode. There's a picture of the Alto and of the third computer architect to win the Turing Award, Chuck Thacker, in part for his contributions to the Alto. Now, microprocessors were behind the times; microprocessors would follow what the big machines did. MOS technology was rapidly improving, and they would just imitate what the big machines did. There were these microprocessor wars: because everybody was still writing in assembly language, one company would say, here's my new instruction, look what you can do with it, and they would counter back and forth, inventing new instructions.
Surely the most ambitious microprocessor, maybe of all time but certainly of the 1970s, was the Intel 432. Gordon Moore, of Moore's law fame, was a visionary: they had the 8-bit 8080 microprocessor, and he believed that whatever instruction set they did next, they would be stuck with essentially forever, so it had to last. So, he hired a bunch of PhDs in computer science, sent them up to Oregon to invent the next great instruction set, and it
was a very ambitious project. They did, in that era, a 32-bit capability-based, object-oriented architecture, with a custom operating system written in an exotic programming language. Big ideas. Alas, they were big, but they were late. It didn't fit in one chip, it was spread across a few chips, it had performance problems, and it was going to be years late. So, the people in Oregon had to tell Gordon Moore, sorry, we're not going to be done in time. What Gordon had to do was
start an emergency project. It was called the 8086, and the team had 52 weeks to build a stopgap 16-bit processor: instruction set, architecture, chip, everything in 52 weeks. They took three weeks of elapsed time, about ten person-weeks, to design the instruction set. They basically extended the 8080 to 16 bits, and it was announced to not very much [inaudible]. The great news for Intel was that IBM decided to pick the 8-bit-bus version of the 8086 for the PC. IBM had liked the 68000, which had a more elegant instruction set, closer to the
360, but it was late, so they went with the 8088. IBM thought at the time that they'd be able to sell maybe 250,000 PCs; instead they sold a hundred million. So, the 8086 became an overnight success, and thanks to the binary compatibility idea that IBM had pioneered earlier, it was the instruction set that was binary compatible with all that PC software, so it had a really bright future. Now researchers started taking a look at these microcoded machines. This is a picture of the fourth computer architect to win the award, John Cocke. What was happening was this transition from assembly
language programming to programming in high-level languages. High-level languages were popular, but people thought you couldn't write operating systems in them; UNIX disproved that. UNIX was written in a high-level language, so we could write everything in a high-level language. So, now it wasn't what assembly language programmers did that mattered, it was the output of the compiler. John Cocke and his group at IBM built this hardware, an ECL minicomputer, but in particular they advanced compiler technology. They said, let's take our compiler technology and target the IBM mainframe instruction set, but only use the simple instructions: the loads, the stores,
and the register-register operations. What would happen to performance? It went three times faster using just a subset. Well, that was a shocking result. Then the computer architects Joel Emer and Doug Clark did a study of that VAX architecture I showed you a few slides earlier. What did they find? First of all, the average number of clock cycles per instruction was 10, so with that microcode interpreter it took on average 10 microinstructions to execute an instruction. And the other thing they found is that 20% of the instruction set was 60% of
the microcode yet was almost never executed. So, wow, why are we doing that? I came on the scene, joining Berkeley in 1976, and then, kind of strangely for a systems professor, I did a sabbatical three years later at DEC, because I had done my dissertation in microprogramming and they wanted help with microprogramming bugs. I came away kind of astounded at how hard it was to get the bugs out of the VAX instruction set architecture. So, as soon as I came back, I wrote a paper.
I said, look, if the microprocessor people are going to follow the trends of the minicomputers and mainframes and build more complicated instruction sets, they're going to have a microcode bug problem, and we'd have to be able to repair the microprocessors in the field; I proposed using writable control store for that. So, what happened to that paper? I come back from my sabbatical, do all that work, and it was rejected. I remember the reviewers saying, this is a stupid way to build microprocessors, with repairable microcode. And I thought, well, if they're going to make it more complicated, we need to be able to repair
it; but if it's a stupid way to build microprocessors, well, why are we doing this at all? So, this is the transition from CISC to RISC. Take the SRAM, the fast memory inside the processor: instead of using it for a microcode interpreter, let's just make it a cache of user-visible instructions, so the contents of that memory change depending on what program is running. And let's keep the instruction set so simple that you don't need an interpreter; you can basically get a pipelined implementation. You could think of these simple instructions as being like
microinstructions, just not as wide. By the way, the compilers were only using a few of those complicated CISC instructions anyway, so you're not losing that much. Around that era, chips were getting bigger due to Moore's law, and we could get a whole 32-bit datapath and a cache onto a single chip, which made RISC more attractive. Then there was a breakthrough in register allocation by Greg Chaitin, using graph coloring, that made register-based architectures much more efficient than in the past. So, that's about when Berkeley and Stanford came on the
scene. We did our work originally through a series of graduate courses; four graduate courses investigated the architecture, and that became the RISC-I that was mentioned earlier. Two of the graduate students decided to build a more efficient version, at about the same time that Hennessy and his students at Stanford built the MIPS chip, so these were all done roughly contemporaneously. You know, we wish we'd had this explanation early on, but eventually, I think through Clark's work evaluating that VAX architecture we talked about, we could do this iron law factoring of performance into three terms.
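For reference, the iron law he's alluding to, in its usual textbook form (the equation itself is not reproduced on the slide here):

```latex
\frac{\text{Time}}{\text{Program}} \;=\;
\frac{\text{Instructions}}{\text{Program}} \;\times\;
\frac{\text{Clock cycles}}{\text{Instruction}} \;\times\;
\frac{\text{Time}}{\text{Clock cycle}}
```

RISC gives up a little on the first term and wins a lot on the second, which is exactly the trade the next remark quantifies.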
As Dileep Bhandarkar, who is here, wrote in a paper after the RISC work: look, CISC executes fewer instructions, maybe three-fourths as many as RISC, but it takes a lot more clock cycles per instruction, so there's this factor-of-four advantage. So, now I'm going to go back in history, and you're going to see a couple of things. First of all, people who look something like John and me but with a lot more hair, and you're going to see these little fancy pieces of plastic; this is the way we used to do presentations, for you
younger people. The great thing about these, they were called transparencies, is that you would turn on the projector and start; it would take five seconds. Today it takes about 30 minutes to get started. Alright, here we go. That was the long video, so we'll stop right there. Alright, so what happened? Our colleagues at Intel were able to muster teams of 500 people to build microprocessors, much more than the RISC people could do, and they had great technology, and they had the idea of translating the x86 instructions
into RISC-like internal microinstructions. So, any of the good ideas the RISC people had, they could use, and so they started to dominate and got up to 350 million chips a year, which is amazing, and they dominated not only the desktop but servers as well. But in the post-PC era, which let's say starts with the iPhone in 2007, you're not buying chips from Intel; you're getting intellectual property to integrate into the SoC that you're designing yourself. So that's different, and of course, in this marketplace you value area and energy as much
as plain performance, so the tradeoffs are different. Actually, last year there were more than 20 billion chips shipped with 32-bit processors in them. The x86 peaked in 2011 with the dropping sales of PCs, and they're actually selling fewer now than they used to. The cloud is still big, but as this paper estimated, there are only around 10 million servers in the clouds, so that's not many chips. So, 99% of the processors today are RISC. Okay, what's next in the history of computer architecture? Something that was going to replace both RISC and
CISC, which is VLIW. The champion of VLIW was Josh Fisher; he actually did his dissertation in microprogramming, and you can think of VLIW as horizontal microcode: really wide instructions controlling a lot of datapath, but with the compiler doing all the work. It was time for Intel to expand their architecture, and they decided to go to 64 bits; they had done the transition from 16 to 32, and now it was time for 32 to 64. They decided they were going to bet on VLIW, and they named their bet EPIC. In
fact, they joined forces with Hewlett-Packard, who was also working on VLIW, and together they were doing the EPIC architecture, which is a VLIW with binary compatibility across generations. They started it around 1994 to considerable fanfare. For business reasons this made sense: they had been using that emergency instruction set for 20 years, so it seemed time to get a better technical foundation. There was also a business advantage: AMD had the rights to make the x86, but given a new instruction set, they didn't have the rights to that. So,
they weren't allowed to build it. AMD was forced to just extend the x86 to 64 bits while this new architecture was going to take over the world. When Intel and HP joined forces in the 90s and said, this is the future, a bunch of companies just believed them: wow, that's going to happen whether we like it or not. So they quit what they were doing, dropped their RISC architectures, and started to embrace EPIC. What happened? EPIC failure. Alright, so what happened with the compiler was that code generation that might work well on floating-point code
didn't really work on integer code. The pointers caused problems, and in particular the problems were, first, code size: it's VLIW, so the programs got a lot bigger with those long instructions, and that's a problem. Unpredictable branches were a problem, particularly for these integer codes, and then there were cache misses, which were also unpredictable. So, two of those made it really hard to compile, and the other meant the programs were really a lot bigger. The out-of-order techniques handled the cache latencies better than the VLIW could, so
of out of order subsumed it and... ...but the biggest thing was the compilers... ...which the VLIW bet was on, I think compilers can handle... ...all this complexity, schedule all these wide arms. It turned out as Don Knuth and other Turing... ...award winners said, impossible to write. Now, given all the publicity around... ...the Itanium and EPIC it was called. When started not to work, people noticed... ...and so some wag instead of calling it the Itanium... ...re-christened it the Itanic. So, then you can see the sinking into the future. So, that's kind of what we do in
computer architecture, right. We have these arguments then companies go spend billions of dollars... ...betting on both sides and then we let them architect... ...and figure it out and in this case, it failed. So, wrapping up my part before I hand off to my colleague... ...the consensus on instructions sets today, it's not CISC. No one has... proposed one of these microcode interpreter instruction... ...sets in more than 30 years. VLIW didn’t work for general-purpose for some of the reasons we said. However... ...you know, it found a place in more embedded of RDSP things because... ...you know, it
the branches are easier, there aren't caches, and the programs are smaller. So VLIW worked there; it didn't work for general purpose. So, what's left? RISC. Who would have guessed: 35 years later it's still the best idea. Okay, with that, I will tag John and he'll take over. Okay, and now for something really different. So, what we're going to talk about now is what some of the current challenges are; I know most of you are familiar with these kinds of things. Technology changes: we're in an era of lots of change right now.
The end of Dennard scaling means that power and energy become the key design constraints, and there's the ending of Moore's law, not the complete end, but as Gordon Moore said to me, all exponentials come to an end. So, we're in the slowdown phase of Moore's law. But we're also faced with similar kinds of challenges around our architectural ideas, because our architectural ideas, as we pushed them ever harder, became less and less efficient, whether they were ideas about multicore running into Amdahl's law, or concepts for exploiting instruction-level parallelism. We were pushing the envelope more and
more... ...and as the inefficiencies in those fundamental architectural ideas... ...became larger and larger... ...the fact that we were at the end of Dennard scaling... ...and the end of Moore's law made it more and more difficult. So, what’s happened in terms of processors? Well, we have this early on notion, we're getting early CISC processors about... a 22% performance per year. Then, we got on this incredible high phase... where we were getting dramatic performance goals, 50% improvement per year... ...then we sort of ran out of steam with the ILP [inaudible]. The end of Dennard scaling came along.
We moved to multicore. It worked pretty well, and then things got even slower, and finally, if you look at the last two years, we're basically looking at 3% improvement in processor performance per year. That's the end of a dramatic phase; we've got to rethink what we're doing, we have to approach problems differently. If you break this down and begin to look at things, you can see that turnover in Moore's law with respect to DRAMs. Of course, DRAMs are a very particular technology; they rely on trench capacitor designs, and so they're basically seeing the tail-off faster than conventional logic.
But even if you look at the number of transistors in an Intel processor, you can begin to see the end of Moore's law. First it falls off a little bit, then we gathered steam again, but if you look at that curve since about 2000, we're falling off. If we'd stayed on that curve, we'd have 10 times as many transistors in the typical microprocessor we have today. So, we have really separated from that curve, and that's caused all of us to think about
what we are going to do next. And of course, Dennard scaling is a similar kind of problem, even more acute; I think everybody who does chip design would say energy is job one now, power is job one when you think about the design. So, as the technologies continue to improve, we've got this power-consumption curve going in the other direction, and if you look at how that curve really takes off after about 2008, it just goes up and up and up. Now, of course, that's power; how it translates into energy depends on how efficiently we
are using the transistors. Unfortunately, the techniques we have for using transistors have become increasingly inefficient. Think about caches, something we all love, one of the truly great ideas in computing. But of course, the larger and larger you make your cache, the less and less effective it is at speeding up the program. So that's our challenge: we've got to find new ways to think about how to use the capability we have more efficiently. We're also in a pretty sorry state with respect to security, as we heard from the panel at lunchtime. The simple thing
I'd say about security is this: if airplanes malfunctioned as often as computers malfunction, nobody would be at this conference who didn't live in Southern California. You'd all be home, because you'd never get on a plane. If cars malfunctioned that way, we'd never get in a car. We've got a big problem. Now, some of us are old enough to remember when there was a lot of emphasis on security. In the 70s there were great projects, there was Multics, there was a lot of focus on it. We invented domains, rings, even capabilities, some of the ideas that
are just now coming back into computer architecture; they were piloted back in the 1970s. What happened? Those ideas were not well used. We also had not yet developed the architectural techniques to make them fast, things like translation lookaside buffers, which make virtual memory possible. Imagine virtual memory without that: every single memory access requires two accesses to main memory. So, we didn't have high-performance ways to do it. The techniques didn't seem to help, they had lots of overhead, and they were abandoned. At the same time, we thought formal verification was going to solve all our
problems. We were going to verify all our software. In fact, I remember the rise of kernel-based and microkernel operating systems: the kernel, the part of the operating system that controlled security, was only going to be 1,000 or 2,000 lines of code. There's no kernel with less than a million lines of code out there today. So, we basically didn't get the security thing right the way we thought we were going to solve it. Almost all software has bugs. All of you, when you buy a new piece of software, get a 15-page disclosure which basically says: if this software doesn't work, too
bad for you. That's what it says, and you all check the box and get the software. So we, the hardware community, the architecture community, have to step up, working hand-in-hand with the operating system community and with the people who think about security, to really get this problem right. If we don't, we're going to have a community of users out there who become increasingly unhappy with us. There are a lot of holes in this. Here's just one simple example: Intel processors have this management engine that
runs a bunch of code before any kernel or operating system code runs and sets up the machine. Who reads that code and makes sure that it's doing the right thing? None of us read that code. So, we've got real problems: with large instruction sets, people can test random opcodes and find holes in the instruction set definition. There are lots of issues that we've got to get right and we've got to rethink. And of course, we have the Spectre computer architecture issue, as was pointed out at lunch. This is going to require us to rethink our definition of instruction
set architecture, because we never thought about timing-based attacks. Now, I'm old enough to remember when timing-based attacks were first discovered back in the 1970s. There was a timing-based attack on TOPS-10 and TOPS-20, the OSes that ran on DEC System 10s and 20s back then, and then we just kind of forgot about it and assumed it was an operating system problem, purely a software problem. Now it's an architecture problem. And it's an architecture problem that's been there for 20 years and we really didn't even know it. So, we've got to rethink what we mean
by security, what our role is as the architecture community, and how we work collaboratively, and I think, as was pointed out at lunch, there are lots more microarchitecture attacks on the way, lots more timing-based side-channel attacks coming. This is not going to be an easy problem; it's going to mean really rethinking how we think about security. Okay, so this is a new time to think about this problem, and we're going to have to redefine our notion of computer architecture. So, here we are; it sounds like a tragedy about to unroll: a slowdown in Moore's law, no more
Dennard scaling, security is a mess. What are we going to do about this? I've always taken the view of a great John Gardner quote: what we have before us are breathtaking opportunities disguised as insoluble problems. And that's where we are. We have great opportunities. So, what are those opportunities? Think about software. Maurice Wilkes said to me about 25 years ago, when I asked him, Maurice, what happens if Moore's law ever slows down? He said: then we're going to have to think a lot more carefully about how
we write our software and pay a lot more attention to efficiency. So, we've got software-centric opportunities. We all write in these modern scripting languages: they're interpreted, they're dynamically typed, they encourage reuse. They give lots of power, but they run terribly; they are incredibly inefficient, great for programmers, bad for execution. Then there are hardware-centric approaches. I think Dave and I both believe, and lots of people in the architecture community take this view, that the only path forward is something that's more domain specific: don't try to build a general-purpose processor that does everything
well; build a processor that does a few tasks incredibly well, and figure out how to build a heterogeneous architecture using those techniques. Of course, there are combinations: hardware and software go together. We've got to think not only about domain-specific architectures, but about the languages used to program them at the same time. So, here's a great chart out of a paper called There's Plenty of Room at the Top, by Leiserson and a group of colleagues at MIT, and it looks at a simple example, admittedly a simple example: matrix multiply. So, we'll take the version in
Python. How much faster does the version just rewritten in C run? 47 times faster, 47 times. Now, I worked in compilers before I worked in computer architecture; a factor of two would make you a star in the compiler community. If you just got half of that 47, a factor of 23, you'd be a hero, you'd win the Turing Award. Then you take it onto an 18-core Intel processor and you find the parallel loops, because there's no way our software systems can find them automatically, and that gives you another factor of about
8. Then you lay out the memory so the caches actually work correctly with a large matrix, which they typically don't, and that gives you another factor of 20. And finally, you take advantage of domain-specific hardware, the Intel AVX instructions, you use the vectors, and that gives you another factor of 10. The final version is 62,000 times faster than the initial version. That's a lot of speedup to get without any help from Moore's law. So, this is just a great opportunity.
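To make the bottom and the top of that ladder concrete, here is a small sketch (illustrative code, not the code from the Leiserson paper): a pure-Python triple loop versus the same multiply handed to NumPy, whose BLAS backend already uses the parallel, cache-blocked, vectorized techniques just listed. The exact ratio is machine-dependent, but the gap is routinely in the thousands.

```python
import time
import numpy as np

N = 256  # modest size so the pure-Python version finishes in a few seconds

def matmul_python(A, B):
    """Naive triple-loop matrix multiply: the 'plain Python' rung of the ladder."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += A[i][p] * B[p][j]
            C[i][j] = s
    return C

A = np.random.rand(N, N)
B = np.random.rand(N, N)

t0 = time.perf_counter()
matmul_python(A.tolist(), B.tolist())
t_py = time.perf_counter() - t0

t0 = time.perf_counter()
_ = A @ B  # NumPy dispatches to a tuned BLAS: parallel, cache-blocked, vectorized
t_blas = time.perf_counter() - t0

print(f"pure Python: {t_py:.2f}s  BLAS: {t_blas:.4f}s  ratio: {t_py / t_blas:.0f}x")
```

The remaining factors in the talk come from doing the parallelization, blocking, and vectorization explicitly in C, which is what the BLAS call is doing under the hood.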
Domain-specific architectures achieve higher efficiency by tailoring the architecture to the characteristics of the domain. And by domain-specific I mean to include programmable infrastructure: a processor architecture that does a range of applications characteristic of a particular domain, not just one hardwired application. Neural networks for machine learning, GPUs for graphics, and programmable network switches are all good examples. If you know Dave and me, you know we'd like to see a quantitative explanation of why these techniques are faster, and I think that's important, because it's not black magic. They're effective because they make more effective use of parallelism: SIMD is
less flexible than MIMD, but when it works it's more efficient, no doubt about it; VLIW versus speculative out-of-order is more efficient when the VLIW-style structure fits. So, what about these domain-specific languages? I think the key thing is that it's simply too hard to start with C or Python and extract the level of knowledge you need to map it efficiently onto domain-specific hardware. It's simply too hard a problem; lots of us in the compiler community worked on that problem, and it's just too hard. So, you need a higher-level language that talks
about matrices, vectors, or other high-level structures, and specifies operations at that higher level. That still means there are interesting compiler challenges, because we want a domain-specific program to remain relatively independent of the architecture, so there are interesting compiler problems in mapping that domain-specific program to a particular architecture, and the architecture may vary from version to version. So, I think there are lots of terrific research opportunities here: make Python programs run like C and you'll be a hero. It's sort of déjà vu when I think about it, because what we
were trying to do in the RISC days was to make high-level languages run efficiently on the architectures of the time. This is the same kind of challenge. Domain-specific applications: what are the right targets, what are the right languages, how do you think about the compiler technology, how do you build domain-specific languages and applications that can port from one generation to the next so we're not constantly rewriting software? Well, what problem might you work on? Well, there's one area where the number of papers is growing as fast as Moore's law,
which you can see on this plot, and that's machine learning. So, there's an obvious area: it's computationally intensive, it provides lots of interesting problems, and the number of applications is growing by leaps and bounds. So, why not work on machine learning? Of course, the tensor processing unit is one example of this that Google built and deployed, and I think from the viewpoint of the Google people, their view was: if we don't deploy this technology, we won't be able to afford to run lots of machine learning applications, they'll be
computationally too expensive. The architectures look radically different. Rather than building large caches and spending lots of the chip on control, you instead build memory that's targeted at the application's use, because memory access is such a big power consumer; you think very hard about keeping memory accesses on-chip rather than off-chip. You think about building lots of computational bandwidth to match the kind of application you're running. In the case of neural network inference, it's doing lots of matrix multiplies, so you build a systolic array.
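As a rough sketch of the dataflow behind a systolic array (a simplified model written here for illustration, not the TPU's actual microarchitecture): an output-stationary array keeps one accumulator per processing element, streams one matrix across the rows and the other down the columns, and every operand fetched from memory once is reused by an entire row or column of multiply-accumulate units.

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-by-cycle model of an output-stationary systolic array.

    PE(i, j) holds one accumulator for C[i, j]. A values flow left to right,
    B values flow top to bottom; each operand enters the array once and is
    reused by a whole row or column of PEs, which is the memory-traffic
    saving a TPU-style design is after.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2

    acc = np.zeros((n, m))    # one accumulator per PE
    a_reg = np.zeros((n, m))  # A operand currently held in each PE
    b_reg = np.zeros((n, m))  # B operand currently held in each PE

    for t in range(n + m + k - 2):           # enough cycles to drain the array
        # Shift operands one PE to the right / down (back to front).
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i, j] = a_reg[i, j - 1]
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i, j] = b_reg[i - 1, j]
        # Inject new operands at the edges, skewed by row / column index.
        for i in range(n):
            a_reg[i, 0] = A[i, t - i] if 0 <= t - i < k else 0.0
        for j in range(m):
            b_reg[0, j] = B[t - j, j] if 0 <= t - j < k else 0.0
        # Every PE performs one multiply-accumulate per cycle.
        acc += a_reg * b_reg
    return acc

A = np.random.rand(4, 5)
B = np.random.rand(5, 3)
assert np.allclose(systolic_matmul(A, B), A @ B)  # model matches an ordinary matmul
```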
It's an old idea come back into its prime, 30 years after it was created. And if you look at performance, you can get dramatic improvements in terms of performance per watt. Of course, one of the challenges we'll have here is what applications, what benchmarks to use. Just as in an earlier era we invented SPEC as a way to normalize and compare different architectures, we're going to have to think about how we do that for machine learning and other domain environments as well. The one thing we've learned in the architecture community is that if we have a
good set of benchmarks that is reasonable and can't be tampered with, it provides a stimulus for everybody to bring their ideas out and test them against it. So, in summary: lots of opportunities, but a new approach to computer architecture is needed; we need a renaissance in computer architecture. Instead of having people who only understand a small sliver of the vertical stack, we need to think about how to build teams that put together people who understand applications, people who understand languages, domain-specific languages and the related compiler technology, together with people
who understand architecture and the underlying implementation technology. For me, it's a return to the past, a return to a time when computer companies were vertically integrated rather than horizontally decomposed, and I think that provides an exciting opportunity for people both in academic computer science and in industry. So, thank you for your attention, thanks to everybody for organizing this terrific conference, and thanks to all the colleagues who've collaborated with us over the years. Dave is going to finish. Yes, I forgot to thank everybody, so John filled in for me. We found
out about the award not that long ago, and Pat Ryan at ACM contacted the organizing committee and asked, what would you think, can you squeeze a Turing Award lecture in? And they did, and we are really appreciative. Okay, so this is the last part before questions and answers, and John and I are looking forward to them. So, I was always jealous of my colleagues in operating systems and compilers, because they could work on industrial-strength artifacts and make contributions the whole world could use, because people use open-source
operating systems. So, why can't we do that in architecture? Okay, so let me tell you about RISC-V. It's called RISC-V, with the V, because it's roughly the fifth Berkeley RISC project. Basically, it was a time at Berkeley when Krste Asanovic was leading the way and we needed an instruction set for our research, and the problem with the obvious candidates was that not only were they complicated, we wouldn't be allowed to use them. Intel was one of our sponsors, but they wouldn't let us use the x86 because it was controlled. So, Krste decided that he and the
graduate students were going to do a clean-slate instruction set. Let's start over, kind of a radical idea, but it would only take three months. The lead graduate students were Andrew Waterman and Yunsup Lee, and Andrew is here as well; I saw him and talked to him about it. Well, it took four years, not three months, but they built a lot of chips in that time, and I helped some, but it's really Krste, Andrew, and Yunsup who did it. Then this weird thing happened. You know, if you're in academia,
you always get complaints about what you're doing, but we started getting complaints about changing internal details of the RISC-V instruction set from fall to spring. So, okay, you guys complain about everything, but why do you care if we change our instruction set at Berkeley for our courses? What we discovered in talking to people was that there was this thirst for an open instruction set. They had looked at a whole bunch of them, they uncovered RISC-V, and they had started using it themselves. So, once we heard and understood that there was
a demand for an open instruction set, we thought, well, that's a great idea, let's help make that happen. So, what's different about RISC-V? It's really simple; it is far simpler than other instruction sets. The manual is about 200 pages, and Andrew and Krste wrote most of it; the x86 manual is 10 times that. It's a clean-slate design: it's easier, starting 25 years later, to look at all the mistakes of the past, of the MIPS and SPARC architectures, and not make those mistakes, don't tie the microarchitecture
in, which is easy to do in retrospect. It's a modular instruction set: there's a standard base that everybody has to have, which all the software runs on, and then there are optional extensions that you include or not depending on your application. Because we knew domain-specific architectures were going to be important, there's lots of opcode space set aside; some architectures kind of lose that by spending it on bigger address fields. And a big deal is that it's community designed. The base and the standard extensions are finished, and that's not going to change; if you want to add extensions, it's a
community effort where we bring in the experts beforehand. Typically, what happens in computer architecture is that a company announces a new set of instructions and then all the software people tell it what was wrong with them; here we have those conversations up front. And it's a foundation that's going to maintain it: universities lose attention and move on, so a non-profit foundation will run it, and, like operating systems and compilers, advances happen for technical reasons, not marketing reasons. So there are actually a few different instruction sets:
a 32-bit one, a 64-bit one, and one even for embedded applications. The standard extensions are optional: multiply and divide, atomic instructions, single- and double-precision floating point, compressed instructions, and then vector, which is more elegant than the classic SIMD extensions, all with a simple instruction format, supported by the foundation. The founding members are growing up and to the right; in fact, there are more than 100 of them after a couple of years. NVIDIA announced at a workshop that they were going to replace their microcontrollers with RISC-V cores, and that's, I don't know, 20 or 40 million a year. Western Digital announced at a
workshop that they're going to put them in their disks, bringing computing to the disk, and that's going to be billions per year. And at our practice talk at Stanford on Thursday, two people came up to me, from Changhong and Anyka, and announced they're going to RISC-V and will be shipping 30 million a year starting next year. So, it's really starting to catch on. In terms of the standards groups for the extensions, I think there are people here who have worked on these pieces, but
it's nice that, because it's open, you can get all the experts together and have these conversations, something like a standards committee, before you embrace something into the instruction set. And RISC-V is just one example: NVIDIA, to its credit, has an open domain-specific accelerator, just what John was talking about, and everything is open: the software stack is open, the instruction set architecture is open, the implementations are open. It's a scalable design you can use, and it comes either with a RISC-V core as a host or not, it's up
to you; so that's another example of an open architecture. And then, motivated by this morning's and this lunchtime's discussions: security people like this idea of open architectures. They don't believe in security through obscurity; they believe in openness. Companies and countries are worried about trapdoors, and that's a serious worry: the paper that's referenced here changed one line of Verilog to insert a trap that let them take over a machine. So, they'd like open implementations that you can look at. And then the big thing, clearly, I
think, and what you'd pick up from the lunchtime conversation, is that security is going to be a big challenge for computer architecture. We need everybody who wants to work on it working on it; right now, with the proprietary architectures, you have to work for Intel or Arm to work on those. Here everybody can work on them, including all of academia, which has a lot of value to add. And what's exciting about the opportunity is that, given the great advances in FPGAs, open-source implementations, and open software stacks, you can do novel architectures and put them online. You
can connect them to the internet, where they'll be subject to attacks, or you can offer rewards for attacks, so you really have an active adversary on the other side, not just a defensive exercise. And even though an FPGA runs at maybe 100 MHz, that's fast enough to run real software, so you can have real users, and because it's an FPGA you can iterate in weeks rather than the years it takes with standard hardware. So, my guess, from talking to people, table-talk, is that RISC-V will probably be the exemplar. Probably people will
use RISC-V, co-designing with architects and security people to advance it; it will probably happen there first, and then other people can use the ideas. So, summarizing the open architecture part: it's free, actually free; anybody can use it, there are no contracts or anything, just like Linux. It's simpler out of the gate, and there won't be marketing reasons to expand it; when I talk to people at commercial companies about why they add things, the answer is, well, it's easier to sell a new instruction than
a better implementation. It makes a big difference at the low end; I've been surprised how small people want these cores to be. At the high end, the instruction set architecture doesn't matter as much, but there's no reason it can't be just as fast at the high end. We can support the DSAs, since we have the opcode space for that, and I think with more people building processors, it's going to be a more competitive market, which probably means faster innovation. And, as I said,
I think security experts are going to rally around RISC-V. And our modest goal is world domination. We can't think of any reason why a single instruction set architecture shouldn't work well from the small end on up; after all, it's all RISC processors. So, we're hoping it will be the Linux of processors. So, for the last part of the talk: agile hardware development. A little over 15 years ago, there was a breakthrough in software engineering. Instead of the idea that software would be better with elaborate planning and phases, called the waterfall model,
there was a rebellion, and the rebellion was: we're going to do it agile. We're going to do short development cycles, we're going to make working prototypes that are incomplete, go to the customer, see what they want next, and do rapid iteration. This has been a revolution in software engineering. One of the models is what's called a Scrum organization, the name comes from rugby: you have a small team that does the work in these sprints, where you build a prototype and then pause and see what you want to do next,
then work hard in the next sprint. The good news is that modern CAD software enables us to use some of these software development techniques, so small teams can do a lot of this through abstraction and reuse. Here's an example from Berkeley of three different designs, all RISC-V: the leftmost column is a three-stage-pipeline, 32-bit design; the middle one is a classic five-stage, 64-bit design; and the one on the right is an out-of-order, 64-bit design. Looking at the unique lines of code, less than half the
code in each is unique; you can share across them, and in the Rocket design hardly any of it is unique. So raising the level of abstraction reduces the lines of code, but the big deal is getting code reuse. Now, how do you do this one-month turnaround when you're doing hardware? What the group evolved was an iterative model. First of all, if you can do your innovation in the simulator, do it there; then it's just like software. But in a software simulator you can't run that many clock cycles, you can't
run billions or trillions of clock cycles. If you go to an FPGA, you can. So, if you want to see how it really works, you make the changes on the FPGA, and the great news over the last few years is that there are instances of FPGAs in the cloud. You don't even have to buy hardware to do FPGAs; you can just go to the cloud and use them, somebody else sets it all up and maintains it, and there are examples of that at this conference. But if we're really going
to worry about energy and really going to care about cost, we have to get to layout, and so that's the next iteration. Once the ideas work in the simulator and on the FPGA, you actually do layout, and the layout flow gives you estimates. There's more work you have to do to really be ready to tape out, which we call the tape-in, but then the estimates are pretty good. Now, we could stop there, because at that point you really can estimate the area, the power, and the clock rate with good accuracy.
So, why not stop there? Because we're hardware people. The advantage we have over software people is that they're stuck in cyberspace; the program is never physical. We build things. We get something physical back, and there's the excitement when the chip comes back: is it going to work, how fast will it run, how much power will it use? So, the reason to build a chip is the reward, the excitement for everybody involved, the graduate students or the company, of getting the chips
back. Now, that must be really expensive. No, it's not really expensive; we've had cheap test chips forever. You can get a hundred one-by-one-millimeter chips in 28 nm for $14,000. Why is this exciting now? Because with Moore's law where it is, that's millions of transistors: you can get a RISC-V core and an NVIDIA DLA in a tiny test chip, and it's only $14,000, so everybody can afford it. Now, if you want to build a really big chip, that would be the last step, and of course that will be
more expensive, but everybody can afford to build chips today. As an example, at Berkeley, led by Krste Asanovic, they built 10 chips in five years, so with this agile model you don't wait several years like I did and then check it out and see what happens; you just do the next iteration and tape it out. The graduate students who came out of the program felt very confident their chips were going to work, because they had built a lot of them, each one getting a little bit better. The agile
model is superior. So, wrapping up before we do questions: John and I think we're entering a new golden age. The end of Dennard scaling and Moore's law means architecture is where we can innovate if you want to make things a lot better; it's not going to happen at the microarchitecture level. Security clearly needs innovation as well, since software-only solutions are not going to lead to more secure systems. Domain-specific languages raise the level of abstraction, which makes things easier for the programmer but also makes it easier for architects to innovate,
and domain-specific architectures are getting these factors of 20 or 40, not 5 or 10 percent. Open architectures and open implementations reduce the effort to get involved: you can go in, like your compiler and operating system colleagues do, make enhancements to these devices, and everybody gets to work on them. Cloud FPGAs make it even easier for everybody to build what looks like custom hardware, and agile development means all of us can afford to make chips. So, as John said, like the time when we were young architects in the 1980s, this is
a great time to be an architect, in academia or in industry. And with that, we'll take questions. That was the fastest talk I ever heard, Dave. -- I thought John was going to cut me off, but no. -- So, thanks for the talk. I want to go back to what John said, that it's supposedly easy to get a 62,000x speedup from scripting languages. In reality, a system like Google's V8 compiler is extremely efficient; it has millions of lines of code, and if you look at where the performance is lost, there is no silver bullet. It's all over
the place: there's the type system, there are multiple layers of compilers, there are inline caches, garbage collection, and the caches don't work. -- Nobody said it was going to be easy. I think what he said was that the potential is there for 62,000. -- Yeah, and you don't have to get all of it: if you get a factor of 1,000 you're a hero, and you just have to get one-sixtieth of what's available. -- I think you're right, it is spread around. I think the interesting question will be how to combine compiler approaches with perhaps new
kinds of hardware support, which will help you put the two back together. Some of you are old enough to remember the age of the [inaudible] machine and SPUR and things like that; we had a bunch of ideas there. Well, now we've got programming leverage that's substantial. What are the right architecture ideas to match with that compiler technology? Don't separate the two; don't put the architects over here, walking away from the compiler people. They should be mixed together: get the compiler people in and let them mix. -- Yes, I agree. I'm just finally pointing out
that this kind of development took millions of engineer-hours already. -- We're researchers, right? If somebody had told us when we were in our 30s, you can't build a microprocessor, that takes hundreds of Intel engineers, you can't do that... This is an opportunity. We have no choice. If Moore's law were still going on and microprocessors were doubling every 18 months, I don't know if people would work on this, but we've got to do something, and here's an opportunity. -- And you should think about what
actually happened in the whole VLSI era: the academic community built a set of design tools, because there weren't design tools out there. I remember going down and seeing Zilog designing Z80s, and there was a piece of mylar pasted on the wall that was about 20 feet by 20 feet, and that's how they were doing the design. If we had adopted that design methodology, we never would have been able to design anything; the grad students would have all quit, or they'd still be grad students. -- I'm Ling [inaudible], a student from Duke. I'm working on domain-specific architecture. So, for domain-specific
computing, I think we actually have two approaches here. One is that we run a new domain application on conventional computing [inaudible], so we can find where the [inaudible] is and improve that. So, [inaudible] they first find the computing [inaudible] accelerator, and then they find that the instruction set accelerator is [inaudible], so they proposed the [inaudible]. This is the [inaudible]. The other approach for domain-specific computing is more like a [inaudible] design approach: in this approach we just propose a solution directly from the software [inaudible] hardware architecture. So,
one example is -- Okay, what's the question? -- So, here is the question: for domain-specific computing, we have these two approaches; do you have any comments or suggestions about them? Thank you. -- I think both approaches have merit. To say one approach will dominate the entire space is probably wrong, and in the end, the advantages of benchmarks for the various areas will bear out which approach works better; that's my view. -- Yeah, okay, a quantitative approach. -- Very nice talk. Jason Mars,
Michigan. So, we're kind of in an era right now where we're going specialized; you can think of it as an extreme CISC, where we're really specializing for particular complex use cases. You guys are hard-core advocates for the RISC point of view on the world, so I wonder, having fought this battle before, CISC versus RISC, what kind of RISC lessons can we take, given that the community as a whole has been so focused on specialization, acceleration, and so on? -- Put the hardware and the
software together. Don't separate the hardware from the software; think about the problem in an integrated fashion. I think that's the most important thing. I mean, the RISC ideas just wouldn't have come without the insights about what compiler technology needed to look like; those were the key insights. So, it's the same thing here: don't split them apart, but optimize across that boundary and think about how they fit together. -- And like we said, it's serendipitous that domain-specific languages are being created to make programmers more efficient in these domains, and
that's independent of all of this. But the good news for us is that raising the level of abstraction, which may ease programming, also raises the level at which we can innovate, and, as John said, it takes vertically integrated people, so it means more for us. Back in the old SPEC days, people didn't even need to know what, say, program four was; you didn't win anything by studying a SPEC program, because you weren't allowed to change it.
This is a really different world, and the opportunities we see here are, I think, why it's happening at [inaudible] companies: these companies have the skills at many different levels, and they're finding big opportunities. -- Thank you. -- First and foremost, thank you very much for your contributions, really appreciate it. The question I have is really about domain-specific architectures. I know the entire architecture community at this point in time is talking about domain-specific architectures being the solution for a lot of the problems we have. I've worked on
mobile computer architecture for most of my time over the past several years, and this device that you have here has got more than just a CPU and GPU. These folks have been doing domain specific architectures for over a decade at this point. So, I'm curious: on the surface we've gone from CPUs and GPUs to TPUs today, whereas these devices integrate ten times that number of specialized blocks on a single SoC. So, how do you look at domain specific architectures in this domain, and is it new or is
it really old, and what are the lessons we can borrow? I'll try to keep this shorter -- I usually go on too long -- but basically the difference is programmability, right. Like, I was involved with the project called the Pixel Visual Core at Google on my sabbatical, and there was hardware called ISPs, image signal processors, that just do this all in hardware. It was a fixed hardware pipeline. The idea was to make this more programmable, and there's actually a domain specific language, called [inaudible], that they tie into it, but I think that's the difference. Certainly there are lots of special purpose accelerators that are very energy efficient that kind of do one thing, and so the question is whether we can keep software programmability along with the acceleration, right. That's exactly right.
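(As a concrete aside, not something said in the talk: the image workloads at issue here are mostly stencil computations over pixels. The hypothetical Python sketch below shows the kind of 3x3 blur kernel that a fixed-function ISP bakes into gates and that a programmable image processor would instead accept from a domain specific language.)

# Hypothetical illustration, not from the talk: the stencil-style pixel
# computation that image signal processors hard-wire and that a programmable
# image processor would take from a domain specific language instead.
import numpy as np

def blur3x3(image):
    """Average each pixel with its 3x3 neighborhood (edges clamped)."""
    padded = np.pad(image, 1, mode="edge")
    out = np.zeros_like(image, dtype=np.float32)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + image.shape[0],
                          1 + dx : 1 + dx + image.shape[1]]
    return out / 9.0

frame = np.random.rand(480, 640).astype(np.float32)   # a stand-in camera frame
smoothed = blur3x3(frame)

(A fixed pipeline runs exactly this one loop nest forever; the point of making the hardware programmable is that new kernels can be mapped onto it after the chip ships.)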
So, Vivek [inaudible]. Dave and John, I'd like to tap into your 40 years of experience in education and think about how computer science has been taught over this 40-year period. As you look for opportunities in this Golden Era, do you see gaps in the preparation of undergraduate and graduate students to address this era? So, clearly security. I mean, clearly we're
not teaching our students enough about the importance of security and how to think about it. If you go back to textbooks of 20 or 30 years ago, you'll probably find more on security than you will in some modern textbooks. We need to fix that; it's obviously crucial. I think, for better or worse, our field has expanded so dramatically that it's impossible for an undergraduate or even a graduate student to master everything, right. When I was a graduate student, I could go to basically any PhD defense and understand what was going
on. I could go to any job talk and understand what was going on. To do that today is just impossible; the field has grown by leaps and bounds. So we've got to accept that we have to teach our students some basics -- algorithms, computation, complexity -- but we've also got to teach them some important system concepts, right: parallelism, security, things like that. And we've got to realize that we're going to have to teach them to be lifelong learners, because they're going to have to relearn and reengage as our field continues
to expand and blossom. I think the other thing, because of this excitement about data science, is statistics. Depending on the school, students may get plenty of statistics or they may not, but I think computer science getting closer to statistics, and vice versa, is going to be a thing, so, you know, I guess you'd look at your curriculum and see. Sarita [inaudible], University of Illinois. Thank you. So, you just talked about entering a Golden Age of computer architecture, and you also talked about how it's going to be important for hardware and software
people to work together, right, and I think this group gets that, right. We get that we have to work with software folks; we're actually used to doing that. But the other way around is not clear, right. Generally, culturally, I think software people don't quite get hardware; it's not part of the computer science department's culture. So my question is: what can we do to move that needle? So, the first thing you have to tell them is: guys, we've been giving you a free ride for 30 years while you write your crummy software and we make it
faster, right. That's over, you know. At Berkeley, in like 2003, I went to the faculty lunches and said, you know, I'm going to give a talk: the future is [inaudible], single cores are not getting any faster. I don't know how many of my colleagues believed me when I said it; everybody has got an opinion. But look, I think our job is to tell them it's over. The free ride is over. There's no magic; the cavalry is not coming over the hill. The opportunity going forward is going to be
hardware-software codesign, and architects are going to play this big role. If you want things to get faster and lower energy, this is the only path left, you know, and I think it's actually all of your job to explain that to your colleagues, and the faculty who get that are going to be at an advantage, right, because we're right, right. There's nothing else that's going to happen. We're never wrong. We, the architecture community, you know, we know what's going on in hardware and stuff like this.
And, you know, you may of course get: well, what about quantum computing? No, quantum computing is a really exciting thing, but it's not tomorrow, alright, it's not going to happen tomorrow. It's on about the same schedule as fusion. So, I get that, but you're being live streamed and now everybody is listening to you, so that's my -- You can video clip us. Mark [inaudible], Microsoft. So, clearly it's a renaissance for architects and microarchitects who are building very innovative chips for AI, some of them requiring, you know, thousands if not tens of thousands of cores. What I don't
see as much is a renaissance of compiler folks, you know, [inaudible] going more vertical. Gone are the days when we just needed to compile to a simple RISC ISA; now it's compiling for these gigantic distributed machines. So, do you think that we're feeding the pipe with talent in that space? Well, I think it's an opportunity. My view is that we kind of hit the wall on compilers -- we pushed compiler technology as far as it could go. If we're talking about compiling from relatively low-level high-level languages, right, like C, Fortran, things like that, we
pushed that technology as far as it would go; we couldn't push it any further. I mean, lots of people who worked on it kind of gave up on doing things like reorganizing memory to improve memory performance; we got as far as we could get. For simple programs you can do it -- matrix multiply I can handle automatically -- but take a large application, or something beyond simple matrices, and forget it, the system breaks. So now there's an opportunity to reengage and rejuvenate that whole area, and I think that's an exciting opportunity, and I
think those people will get the most leverage if they're working hand in hand with the hardware people, with the architects. And, you know, the [inaudible] for domain specific languages are hopeful, right; matrices are primitives, right, and that makes it easier to do. There's a project at Google called XLA that is trying to do optimization from this very high level, and that's new -- well, I believe that's a new challenge for the compiler -- It's a different approach. It's a different approach.
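(A minimal sketch of what compiling from this very high level looks like; the example is ours, not the speakers', and it assumes the JAX front end, which they do not mention, as one of the frameworks that hands whole matrix expressions to XLA.)

# Hypothetical sketch, not from the talk: JAX traces this whole matrix-level
# expression and passes it to XLA, which can fuse the matmul, bias add, and
# ReLU and generate optimized code for a CPU, GPU, or TPU.
import jax
import jax.numpy as jnp

@jax.jit            # compile the traced computation with XLA
def layer(x, w, b):
    return jnp.maximum(jnp.dot(x, w) + b, 0.0)   # matmul + bias + ReLU

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (128, 512))
w = jax.random.normal(key, (512, 256))
b = jnp.zeros((256,))
y = layer(x, w, b)  # first call compiles; later calls reuse the compiled code

(Because the compiler sees matrices and operators rather than scalar loops, it can make optimization decisions that are much harder to recover from C or Fortran source.)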
So, yeah, and we can also, you know -- we're probably not going to do all the work, John and I. What we can do is kind of set up opportunities, and this is clearly an opportunity. And, you know, we're going from an era of, you know, papers with tiny little improvements, right, to one where these giant numbers are out there, and, you know -- I like what John said. And if you work in this field -- future Turing Award winners. Exactly. There's only one requirement: you have to have more hair than Dave. I'm a [inaudible] student, and, you know, you talk about this being the Golden Age of architecture, that
we can develop domain specific languages for neural networks, and how do you think neural networks will affect our community, for example with neural networks maybe -- Let me take that one: how can neural networks actually -- So, Cliff Young here gave a keynote address at a workshop called [inaudible], which was about how we can use machine learning to design computers, right, and I think that's a really interesting idea. There's something in machine learning called automatic machine learning, AutoML, where you use machine learning to design the models. So, you know, machine learning is revolutionizing
many fields; it would be interesting to use it here. So, like, should branch predictors be based on machine learning principles, prefetchers on machine learning? It's kind of the question of how well we could design machines using machine learning, and that's another, you know, exciting, potentially revolutionary idea -- go see Cliff's keynote.
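(To make the branch predictor remark concrete, here is a toy sketch, ours rather than anything described in the talk, of the perceptron-style predictors studied in the research literature: predict taken or not-taken from a weighted sum of recent outcomes, and nudge the weights on mispredictions. Real proposals keep a table of such perceptrons indexed by the branch address.)

# Toy perceptron-style branch predictor (hypothetical illustration, not from
# the talk). Outcomes are encoded as +1 (taken) / -1 (not taken).
HISTORY_LEN = 16
THRESHOLD = 30                       # keep training while confidence is low
weights = [0] * (HISTORY_LEN + 1)    # weights[0] is the bias weight
history = [1] * HISTORY_LEN          # most recent branch outcomes

def output():
    return weights[0] + sum(w * h for w, h in zip(weights[1:], history))

def predict():
    """Predict taken if the weighted sum of recent outcomes is non-negative."""
    return output() >= 0

def update(taken):
    """Train on the real outcome, then shift it into the history."""
    t = 1 if taken else -1
    s = output()
    if (s >= 0) != taken or abs(s) <= THRESHOLD:
        weights[0] += t
        for i, h in enumerate(history):
            weights[i + 1] += t * h
    history.pop(0)
    history.append(t)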
Hi, I'm Lee from Hallway. I work on heterogeneous accelerators. Just looking at the system level, you have all these domain specific accelerators, and now we're talking about applications on the cloud on top of all of these. We need something to aggregate all of these accelerators. What's your comment on this? Because in my opinion there are much bigger gaps on the software side and the hardware side for [inaudible]. Yeah, there's a big gap. There's a big gap in both. I mean, I think that's what makes it an exciting time. There's lots of opportunity to rethink how we program, how we organize our architecture, and that's why we think it's a new Golden Age, right. We've had this run-up. We ran up all this curve and we
pushed everything out there. Those ideas are done. Time to rethink the problem, and that means there's a great opportunity for -- Maybe we should make it clear what we mean by a Golden Age. We're researchers, right, and what we're talking about could actually be scary for companies, which make a lot of gold; it's not like, oh, there's money lying on the floor. It's more like, wow, we don't know what to do next. So people like us have an opportunity to lead the way. When it's crystal clear that companies can keep doing the same thing
and make a lot of money, you know, it's very comforting for companies but not so exciting for researchers. So there's just a target-rich environment for researchers in architecture to make big contributions to, you know, society. Okay. They've told us that the reception is about to start, so we've got to thank everybody for all your great questions, and Vicki wants to close. Thank you. I just want to have everyone give them another big hand. Thank you for that wonderful talk.