Andrew Ng Explores The Rise Of AI Agents And Agentic Reasoning | BUILD 2024 Keynote

Snowflake Inc.
In recent years, the spotlight in AI has primarily been on large language models (LLMs) and emerging...
Video Transcript:
Please welcome Andrew. [Applause]

Thank you. It's such a good time to be a builder. I'm excited to be back here at Snowflake BUILD. What I'd like to do today is share with you where I think some of AI's biggest opportunities are. You may have heard me say that I think AI is the new electricity. That's because AI is a general-purpose technology, like electricity. If I ask you what electricity is good for, it's always hard to answer, because it's good for so many different things. And new AI technology is creating a huge set of opportunities for us to build new applications that weren't possible before.

People often ask me, "Hey Andrew, where are the biggest AI opportunities?" This is what I think of as the AI stack. At the lowest level are the semiconductors, and then on top of that a lot of the cloud infrastructure, including of course Snowflake, and then on top of that are many of the foundation model trainers and models. It turns out that a lot of the media hype and excitement and social media buzz has been on these layers of the stack, the new technology layers. Whenever there's a new technology like generative AI, a lot of the buzz is on these technology layers, and there's nothing wrong with that. But I think that almost by definition there's another layer of the stack that has to work out even better, and that's the application layer, because we need the applications to generate even more value and even more revenue to really afford to pay the technology providers below. So I spend a lot of my time thinking about AI applications, and I think that's where a lot of the best opportunities will be to build new things.

One of the trends that has been growing for the last couple of years, in no small part because of generative AI, is faster and faster machine learning model development. In particular, generative AI is letting us build things faster than ever before. Take the problem of, say, building a sentiment classifier: taking text and deciding whether it is a positive or negative sentiment, for
reputation monitoring, say. A typical workflow using supervised learning might be that it takes a month to get some labeled data, then you train an AI model, which might take a few months, and then you find a cloud service or something to deploy on, which takes another few months. So for a long time, very valuable AI systems might take good AI teams six to 12 months to build. There's nothing wrong with that; I think many people create very valuable AI systems this way. But with generative AI, there are certain classes of applications where you can write a prompt in days and then deploy it in, again, maybe days. What this means is that there are a lot of applications that used to take me, and used to take very good AI teams, months to build, that today you can build in maybe 10 days or so. This opens up the opportunity to experiment, build new prototypes, and ship new AI products. That's certainly the prototyping aspect of it.

These are some of the consequences of this trend. Fast experimentation is becoming a more promising path to invention. Previously, if it took six months to build something, then we'd better study it, make sure there's user demand, have product managers look at it and document it, and then spend all that effort to build it and hope it turns out to be worthwhile. But now, for fast-moving AI teams, I see a design pattern where you can say, "You know what, it takes us a weekend to throw together a prototype. Let's build 20 prototypes and see what sticks, and if 18 of them don't work out, we'll just ditch them and stick with what works." So fast iteration and fast experimentation is becoming a new path to inventing new user experiences.

One interesting implication is that evaluations, or evals for short, are becoming a bigger bottleneck for how we build things. Back in the supervised learning world, if you were collecting 10,000 data points anyway to train a model, then if you needed to collect an extra 1,000 data points
for testing, it was fine: that was an extra 10% increase in cost. But for a lot of large language model based apps, where there's no need to have any training data, if you made me slow down to collect a thousand test examples, boy, that seems like a huge bottleneck. So the new development workflow often feels as if we're building and collecting data more in parallel rather than sequentially: we build a prototype, and then as it becomes more important, and as robustness and reliability become more important, we gradually build up that test set in parallel. But I see exciting innovations still to be had in how we build evals.

What I'm seeing as well is that the prototyping of machine learning has become much faster, but building a software application has lots of steps: the product work, the design work, the software integration work, a lot of plumbing work, and then after deployment, DevOps and LLMOps. Some of those other pieces are becoming faster, but they haven't become faster at the same rate that the machine learning modeling part has become faster. So you take a process and one piece of it becomes much faster. What I'm seeing is that prototyping is now really, really fast, but when you take a prototype into robust, reliable production, with guardrails and so on, those other steps still take some time. The interesting dynamic I'm seeing is that the fact that the machine learning part is so fast is putting a lot of pressure on organizations to speed up all of those other parts as well. So that's been exciting progress for our field.

In terms of how machine learning development is speeding things up, I think the mantra "move fast and break things" got a bad rap because, you know, it broke things. I think some people interpret this to mean we shouldn't move fast, but I disagree with that. I think the better mantra is "move fast and be responsible." I'm seeing a lot of teams able to prototype quickly, evaluate, and test robustly, so without shipping
anything out to the wider world that could cause damage or meaningful harm, I'm finding smart teams able to build really quickly and move really fast, but also do this in a very responsible way. I find it exhilarating that you can build things and ship things in a responsible way much faster than ever before.

Now, there's a lot going on in AI, and of all the things going on in AI, in terms of technical trends, the one trend I'm most excited about is agentic AI workflows. If you were to ask what's the one most important AI technology to pay attention to, I would say it is agentic AI. When I started saying this near the beginning of this year, it was a bit of a controversial statement, but now the term "AI agents" has become so widely used, by technical and non-technical people, that it has become a little bit of a hype term. So let me just share with you how I view AI agents and why I think they're important, approaching this just from a technical perspective.

The way that most of us use large language models today is with what is sometimes called zero-shot prompting. That roughly means we give it a prompt and ask it to write an essay or write an output for us. It's a bit like going to a person, or in this case going to an AI, and asking it to type out an essay for us by writing from the first word to the last word all in one go, without ever using backspace, just writing from start to finish like that. It turns out that people don't do their best writing this way. But despite the difficulty of being forced to write this way, LLMs do, you know, not bad, pretty well.

Here's what an agentic workflow is like. To generate an essay, we ask an AI to first write an essay outline, and ask it, "Do you need to do some web research?" If so, let's download some web pages and put them into the context of the large language model. Then let's write the first draft, and then let's read the first draft and critique it
and revise the draft, and so on. This workflow looks more like doing some thinking or some research, and then some revision, and then going back to do more thinking and more research. By going around this loop over and over, it takes longer, but it results in a much better work output. In some teams I work with, we apply this agentic workflow to processing complex, tricky legal documents, or to health care diagnosis assistance, or to very complex compliance with government paperwork. Many times I'm seeing this drive much better results than was ever possible. One thing I want to focus on in this presentation, which I'll talk about later, is the rise of visual AI, where agentic workflows are letting us process image and video data. But I'll get back to that later.

It turns out that there are benchmarks that seem to show agentic workflows deliver much better results. This is the HumanEval benchmark, which is a benchmark from OpenAI that measures a large language model's ability to solve coding puzzles like this one. My team collected some data. It turns out that on this benchmark, I think it was the pass@k metric, GPT-3.5 got 48% right on this coding benchmark. GPT-4 was a huge improvement, 67%. But the improvement from GPT-3.5 to GPT-4 is dwarfed by the improvement from GPT-3.5 to GPT-3.5 using an agentic workflow, which gets up to about 95%. And GPT-4 with an agentic workflow also does much better.

It turns out that in the way builders build agentic reasoning, or agentic workflows, into their applications, there are, I want to say, four major design patterns: reflection, tool use, planning, and multi-agent collaboration. To demystify agentic workflows a little bit, let me quickly step through what these workflows mean. I find that agentic workflows sometimes seem a little bit mysterious until you actually read through the code for one or two of these and go, "Oh, that's it? That's really cool, but oh, that's all it takes?"

Let me step through, for concreteness, what reflection with LLMs looks like. I might start off by prompting an LLM, a coder agent LLM, maybe with a message that its role is to be a coder and write code. So you can tell it, "Please write code for a certain task," and the LLM may generate code. Then it turns out that you can construct a prompt that takes the code that was just generated, copy-pastes the code back into the prompt, and asks it, "Here's some code intended for a task. Examine this code and critique it." It turns out that if you prompt the same LLM this way, it may sometimes find some problems with it, or make some useful suggestions to improve the code. Then you prompt the same LLM with the feedback and ask it to improve the code, and it comes up with a new version. And, maybe foreshadowing tool use, you can have the LLM run some unit tests and give the feedback of the unit tests back to the LLM; that can be additional feedback to help it iterate further to improve the code. It turns out that this type of reflection workflow is not magic and doesn't solve all problems, but it will often take the baseline level of performance and lift it to a better level of performance. And it turns out also that with this type of workflow, where we think of prompting an LLM to
critique its own output and use its own criticism to improve it, this maybe also foreshadows multi-agent planning or multi-agent workflows, where you can prompt an LLM to sometimes play the role of a coder and sometimes play the role of a critic that reviews the code. So it's the same conversation, but we can prompt the LLM differently to tell it sometimes to work on the code and sometimes to try to make helpful suggestions, and this results in improved performance. So this is the reflection design pattern.

The second major design pattern is tool use, in which a large language model can be prompted to generate a request for an API call, to have it decide when it needs to search the web, or execute code, or carry out a task like issuing a customer refund, sending an email, or pulling up a calendar entry. So tool use is a major design pattern that is letting large language models make function calls, and I think this is expanding what we can do with these agentic workflows.

Real quick, here's a planning or reasoning design pattern, in which you give a fairly complex request, such as "generate an image where a girl is reading a book," and so on. Then an LLM, in this example adapted from the HuggingGPT paper, can look at the picture and decide to first use an OpenPose model to detect the pose, then after that generate a picture of a girl, after that describe the image, and after that use text-to-speech, or TTS, to generate the audio. So in planning, you have an LLM look at a complex request and pick a sequence of actions to execute in order to deliver on a complex task.

Lastly, multi-agent collaboration is the design pattern I alluded to, where instead of prompting an LLM to just do one thing, you prompt the LLM to play different roles at different points in time, so the different simulated agents interact with each other and come together to solve a task. I know that some people may wonder, if you're using one LLM, why do you need to make this
one LLM play the role of multiple agents? Many teams have demonstrated significantly improved performance for a variety of tasks using this design pattern. It turns out that if you have an LLM specialize on different tasks, maybe one at a time, and have them interact, many teams seem to really get much better results. I feel like maybe there's an analogy here: if you're running jobs on a processor, on a CPU, why do we need multiple processes? It's all the same processor at the end of the day. But we found that having multiple threads or processes is a useful abstraction for developers, to take a task and break it down into subtasks. I think multi-agent collaboration is a bit like that too. If you have a big task, then if you think of hiring a bunch of agents to do different pieces of the task and then interact, sometimes that helps the developer build complex systems to deliver a good result.

So I think these four major agentic reasoning workflow design patterns give us a huge space to play with, to build rich agents to do things that frankly were just not possible even a year ago.

One aspect of this I'm particularly excited about is the rise of not just large language model based agents, but large multimodal model based agents. Given an image like this, if you wanted to use an LMM, a large multimodal model, you could actually do zero-shot prompting, and that's a bit like telling it, "Take a glance at the image and just tell me the output." For simple image tasks, that's okay. You can actually have it look at the image and give you the numbers of the runners or something. But it turns out that, just as with large language model based agents, large multimodal model based agents can do better with an iterative workflow, where you approach the problem step by step: detect the faces, detect the numbers, put it together. And so with this more iterative
workflow, you can actually get an agent to do some planning, testing, and writing code (plan, test, write code) and come up with a more complex plan, articulated and expressed in code, to deliver on more complex tasks.

So what I'd like to do is show you a demo of some work that Dan Maloney and I and the Landing AI team have been working on, building agentic workflows for visual AI tasks. If we switch to my laptop, let me take an image here of a soccer game, or football game, and I'm going to say, let's see, "count the players in the field." Oh, and just for fun: if you're not sure how to prompt it after uploading an image, this little light bulb here gives some suggested prompts you may ask. But let me run this: "count players on the field." What this kicks off is a process that actually runs for a couple of minutes to think through how to write code, in order to come up with a plan to give an accurate result for counting the number of players on the field. This is actually a little bit complex, because you don't want the players in the background, just the ones on the field. I already ran this earlier, so we'll just jump to the result. It says the code has detected seven players on the field, and I think that should be right: one, two, three, four, five, six, seven. And if I zoom in to the model output now: one, two, three, four, five, six, seven. I think that's actually right.

Part of the output of this is that it has also generated code, actually generated Python code, that if you want you can run over and over on a large collection of images. I think this is exciting because there are a lot of companies and teams that actually have a lot of visual AI data, have a lot of images, have a lot of videos, kind of stored somewhere, and until now it's been really difficult to get value out of this data. So for a lot of small teams or large businesses with a lot of visual data, visual AI capabilities like the Vision Agent let you take all
this data, previously shoved somewhere in blob storage, and get real value out of it. I think this is a big transformation for AI.

Here's another example. This is another soccer game, or football game, and the prompt says: given a video, split the video into clips of 5 seconds, find the clip where a goal is being scored, and display a frame. I ran this already, because it takes a little time to run. This will generate code and evaluate code for a while, and this is the output. It says true, 10 to 15, so it thinks there's a goal scored, you know, around here, right, and there you go, that's the goal. And also, as instructed, it extracted some of the frames associated with this. So this is really useful for processing video data.

Maybe here's one last example of the Vision Agent. You can also ask it for a program to split the input video into small video chunks every 6 seconds, describe each chunk, and store the information in a pandas DataFrame along with the clip name, start time, and end time, and return the pandas DataFrame. This is a way to look at video data that you may have and generate metadata for it, which you can then store in Snowflake or somewhere, to then build other applications on top of. Just to show you the output of this: clip name, start time, end time, and then a lot of text descriptions. And it has actually written code here that you can then run elsewhere if you want, or put in a Streamlit app or something.

Using this capability of the Vision Agent to help write code, my team at Landing AI actually built this little demo app that uses code from the Vision Agent. So instead of us needing to write the code, we have the Vision Agent write the code to build this metadata, and then it indexes a bunch of videos. So let's see, I'll say "skier airborne." I actually ran this earlier; I hope it works. What this demo shows is that we
already ran the code to take the video, split it into chunks, and store the metadata. Then when I do a search for "skier airborne," it shows the clips that have high similarity, marked here with the green. Well, this is getting my heart rate up, seeing them do that. Oh, here's another one. Whoa. All right. And the green parts of the timeline show where the skier is airborne.

Let's see: "gray wolf at night." I actually find it pretty fun, when you have a collection of video, to index it and then just browse through. Here's a gray wolf at night, and this timeline in green shows where a gray wolf at night is. And if I jump to a different part of the video, there's a bunch of other stuff as well, right there, that's not a gray wolf at night. So I think that's pretty cool.

Let's see, just one last example. I've actually been on the road a lot, and if you're searching for your luggage, say "black luggage," there's this. But it turns out there's actually a lot of black luggage, so if you want your luggage, let's say "black luggage with rainbow strap," then there, right: black luggage with rainbow strap. So there are a lot of fun things to do, and I think the nice thing about this is that the work needed to build applications like this is lower than ever before.

So let's go back to the slides. In terms of AI opportunities, I spoke a bit about agentic workflows, and how that is changing the AI stack is as follows. It turns out that in addition to the stack I showed, there's actually a new emerging agentic orchestration layer. There are orchestration layers like LangChain that have been around for a while and that are also becoming increasingly agentic, through LangGraph for example. This new agentic orchestration layer is also making it easier for developers to build applications on top, and I hope that Landing AI's Vision Agent is another contribution to this, one that makes it easier for you to build
visual AI applications to process all this image and video data that possibly you had, but that was really hard to get value out of until more recently.

So, before I wrap up, let me share what I think are maybe four of the most important AI trends. There's a lot going on in AI, and it's impossible to summarize everything in one slide. If you had to make me pick the one most important trend, I would say it is agentic AI, but here are four things I think are worth paying attention to.

First, it turns out agentic workflows need to read a lot of text or images and generate a lot of text, so we say they generate a lot of tokens. There are exciting efforts to speed up token generation, including semiconductor work by SambaNova, Cerebras, Groq, and others, and a lot of software and other types of hardware work as well. This will make agentic workflows work much better.

The second trend I'm excited about: today's large language models started off being optimized to answer human questions and human-generated instructions, things like "Why did Shakespeare write Macbeth?" or "Explain why Shakespeare wrote Macbeth." These are the types of questions that large language models are often asked to answer on the internet. But agentic workflows call for other operations, like tool use. So there's the fact that large language models are often now tuned explicitly to support tool use, or that just a couple of weeks ago Anthropic released a model that can support computer use. I think these exciting developments will create a lot of lift, a much higher ceiling for what we can now get agentic workflows to do, with large language models that are tuned not just to answer human queries but explicitly to fit into these iterative agentic workflows.

Third, data engineering's importance is rising, particularly with unstructured data. It turns out that a lot of the value of machine learning was in structured data, kind of tables of numbers. But with gen AI, we're much better than ever before at processing text and images and video and maybe
audio. So the importance of data engineering is increasing, in terms of how to manage your unstructured data and the metadata for it, and the deployment to get the unstructured data where it needs to go to create value. That will be a major effort for a lot of large businesses.

And then lastly, I think we've all seen that the text processing revolution has already arrived. The image processing revolution is in a slightly earlier phase, but it is coming, and as it comes, many people and many businesses will be able to get a lot more value out of their visual data than was ever possible before. I'm excited because I think that will significantly increase the space of applications we can build as well.

So, just to wrap up: this is a great time to be a builder. Gen AI is letting us experiment faster than ever, agentic AI is expanding the set of things that are now possible, and there are just so many new applications that we can now build, in visual AI or not in visual AI, that just weren't possible ever before. If you're interested in checking out the visual AI demos that I ran, please go to va. landing.
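
Editor's sketch (not part of the talk): the reflection design pattern described earlier (prompt an LLM to write code, feed that code back with a request to critique it, then prompt again with the feedback to get a new version) can be illustrated roughly as below. This is a minimal sketch under stated assumptions, not the speaker's implementation. The names `reflect_and_revise`, `LLM`, and `fake_llm` are hypothetical, the prompt wording is only loosely modeled on the talk, and `fake_llm` is a canned stand-in so the example runs offline; a real model client would replace it.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def reflect_and_revise(llm: LLM, task: str, rounds: int = 2) -> str:
    """Draft code for `task`, then alternate critique and revision `rounds` times."""
    # Step 1: ask the model, in a coder role, for a first draft.
    code = llm(f"Your role is to be a coder. Please write code for this task:\n{task}")
    for _ in range(rounds):
        # Step 2: paste the draft back into a prompt and ask for a critique.
        critique = llm(
            f"Here is some code intended for this task:\n{task}\n\n{code}\n\n"
            "Examine this code and critique it."
        )
        # Step 3: prompt again with the feedback and ask for an improved version.
        code = llm(
            f"Task:\n{task}\n\nCurrent code:\n{code}\n\nFeedback:\n{critique}\n\n"
            "Improve the code using the feedback and return only the new version."
        )
    return code


def fake_llm(prompt: str) -> str:
    """Canned stand-in model (an assumption for illustration, not a real API)."""
    if "critique it" in prompt:
        return "Consider handling the empty-list case."
    if "Improve the code" in prompt:
        return "def mean(xs):\n    return sum(xs) / len(xs) if xs else 0.0"
    return "def mean(xs):\n    return sum(xs) / len(xs)"


if __name__ == "__main__":
    print(reflect_and_revise(fake_llm, "Write mean(xs) for a list of numbers."))
```

Each round costs two model calls on top of the initial draft, which matches the point in the talk that the loop takes longer than a single zero-shot prompt. Unit-test output could be appended to the feedback string in the same way, which is where this pattern starts to overlap with tool use.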