What's next for AI agentic workflows ft. Andrew Ng of AI Fund
316.16k views · 2,576 words
Sequoia Capital
Andrew Ng, founder of DeepLearning.AI and AI Fund, speaks at Sequoia Capital's AI Ascent about what'...
Video Transcript:
All of you know Andrew Ng as a famous computer science professor at Stanford who was really early in the development of neural networks with GPUs, of course a creator of Coursera and popular courses like DeepLearning.AI, and also the founder, creator, and early lead of Google Brain. But one thing I've always wanted to ask you before I hand it over, Andrew, while you're on stage, is a question I think would be relevant to the whole audience: ten years ago, on problem set number two of CS229, you gave me a B, and I looked it over, and I was wondering what you saw that I did incorrectly. So anyway, Andrew.

Thank you, Hansen. I'm looking forward to sharing with all of you what I'm seeing with AI agents, which I think is an exciting trend that everyone building in AI should pay attention to, and I'm also excited about all the other presentations.

So, agents. Today, the way most of us use large language models is like this, with a non-agentic workflow: you type a prompt and it generates an answer. That's a bit like asking a person to write an essay on a topic, and saying, please sit down at the keyboard and type the essay from start to finish without ever using backspace. Despite how hard this is, LLMs do it remarkably well.

In contrast, here's what an agentic workflow may look like. Have an LLM write an essay outline. Ask it, do you need to do any web research? If so, let's do that. Then write the first draft. Then read your own first draft and think about what parts need revision. Then revise your draft, and you go on and on. This workflow is much more iterative: you may have the LLM do some thinking, then revise the article, then do some more thinking, and iterate through this a number of times. And what not many people appreciate is that this delivers remarkably better results. I've actually been really surprised myself, working with these agentic workflows, at how well they work.

Let me do one case study. My team analyzed some data using a coding benchmark called the HumanEval benchmark, released by OpenAI a few years ago. It has coding problems like: given a non-empty list of integers, return the sum of all the elements at even positions. And it turns out the answer is a code snippet like the one sketched below. So today, a lot of us use zero-shot prompting, meaning we tell the AI to write the code and run it on the first try. Who codes like that? No human codes like that; we just type out the code and run it. Maybe you do; I can't do that.
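For reference, here is a minimal sketch of the kind of snippet such a benchmark problem expects, following the task exactly as phrased above; the wording of the actual HumanEval problem may differ slightly:

```python
def solution(lst: list[int]) -> int:
    # Sum the elements at even positions (indices 0, 2, 4, ...).
    return sum(lst[i] for i in range(0, len(lst), 2))

assert solution([5, 8, 7, 1]) == 12  # picks out lst[0] + lst[2] = 5 + 7
```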
It turns out that if you use GPT-3.5 with zero-shot prompting, it gets it 48% right. GPT-4, way better, gets it 67% right. But if you take an agentic workflow and wrap it around GPT-3.5, it actually does better than even GPT-4. And if you were to wrap this type of workflow around GPT-4, it also does very well. You'll notice that GPT-3.5 with an agentic workflow actually outperforms GPT-4, and I think this has significant consequences for how we all approach building applications.

So "agents" is a term that gets tossed around a lot. There are a lot of consultant reports talking about agents, the future of AI, blah blah blah. I want to be a bit concrete and share with you the broad design patterns I'm seeing in agents. It's a very messy, chaotic space, tons of research, tons of open source, there's a lot going on, but I'll try to categorize a bit more concretely what's going on in agents. Reflection is a tool that I think many of us should just use; it just works. Tool use, I think, is more widely appreciated, and it actually works pretty well. I think of these as pretty robust technologies: when I use them, I can almost always get them to work well. Planning and multi-agent collaboration, I think, are more emerging: when I use them, sometimes my mind is blown by how well they work, but at least at this moment in time, I don't feel like I can always get them to work reliably. So let me walk through these four design patterns in the next few slides, and if some of you go back and ask your engineers to use these, I think you'll get a productivity boost quite quickly.

First, reflection. Here's an example. Let's say I ask a system, please write code for me for a given task. Then we have a coder agent, just an LLM that you prompt to write code, so it writes something like "def do_task(...)" and fills in a function. An example of self-reflection would be if you then prompt the LLM with something like this: "Here's code intended for a task," give it back the exact same code it just generated, and then say, "Check the code carefully for correctness, style, and efficiency, and give constructive criticism." You just write a prompt like that. It turns out the same LLM that you prompted to write the code may be able to spot problems like, there's a bug on line five, and you can fix it by blah blah blah. And if you now take its own feedback, give it back to it, and reprompt it, it may come up with a version two of the code that could well work better than the first version. Not guaranteed, but it works often enough to be worth trying for a lot of applications. To foreshadow tool use: if you let it run unit tests and it fails a unit test, you can ask it why it failed the unit test, have that conversation, and it may be able to figure out why it failed, try changing something, and come up with a V3. By the way, for those of you that want to learn more about these technologies, I'm very excited about them, and for each of the four sections I have a little recommended reading section at the bottom that hopefully gives more references.

And again, just to foreshadow multi-agent systems: I've described a single coder agent that you prompt to have this conversation with itself. One natural evolution of this idea is that instead of a single coder agent, you can have two agents, where one is a coder agent and the second is a critic agent. These could be the same base LLM that you prompt in different ways, where you tell one, you're an expert coder, write code, and you tell the other, you're an expert code reviewer, review this code. This type of workflow is actually pretty easy to implement, and I think it's a very general-purpose technique; for a lot of workflows, it will give you a significant boost in the performance of LLMs.
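To make the reflection pattern concrete, here is a minimal sketch, assuming the OpenAI Python client as the backing model (any chat model would work). The `llm()` helper and the prompts are illustrative, not the exact ones from the talk's slides:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def llm(prompt: str) -> str:
    # One chat-completion call; the model choice is illustrative.
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def write_code_with_reflection(task: str, rounds: int = 2) -> str:
    # The "coder agent" is just an LLM prompted to write code.
    code = llm(f"Write a Python function for this task:\n{task}")
    for _ in range(rounds):
        # Reflection: feed the model its own output and ask for criticism.
        critique = llm(
            "Here is code intended for a task. Check it carefully for "
            "correctness, style, and efficiency, and give constructive "
            f"criticism:\n{code}"
        )
        # Revision: hand the model its own feedback and ask for a new version.
        code = llm(
            f"Task: {task}\n\nCode:\n{code}\n\nFeedback:\n{critique}\n\n"
            "Rewrite the code to address the feedback."
        )
    return code
```

The unit-test variant mentioned above is the same loop with one extra step: run the tests between rounds and splice any failures into the feedback prompt.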
The second design pattern is tool use. Many of you will already have seen LLM-based systems using tools. On the left is a screenshot from Copilot; on the right is something I extracted from GPT-4. If you ask an LLM today, what's the best coffee maker, for some problems it will do a web search, and it will generate code and run code. And it turns out there are a lot of different tools that many different people are using: for analysis, for gathering information, for taking action, for personal productivity. It also turns out that a lot of the early work in tool use was in the computer vision community, because before large language models, LLMs couldn't do anything with images, so the only option was for the LLM to generate a function call that could manipulate an image, like generating an image, or doing object detection, or whatever. So if you look at the literature, it's been interesting how much of the work in tool use seems to have originated from vision, because LLMs were blind to images before GPT-4 and LLaVA and so on. So that's tool use, and it expands what an LLM can do.
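Here is a minimal sketch of the tool-use pattern, reusing the `llm()` helper from the reflection sketch above. The `web_search()` function and the small JSON protocol are hypothetical stand-ins; real systems would typically use a model's built-in function-calling support instead:

```python
import json

def web_search(query: str) -> str:
    """Hypothetical stand-in for a real search API."""
    raise NotImplementedError

TOOLS = {"web_search": web_search}

def answer_with_tools(question: str) -> str:
    # Ask the model to answer directly, or to request a tool call using
    # a tiny JSON protocol we define in the prompt.
    reply = llm(
        "Answer the question directly, or reply with only the JSON "
        '{"tool": "web_search", "query": "..."} if you need to search.\n'
        f"Question: {question}"
    )
    try:
        request = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # the model answered directly
    # Run the requested tool and hand its output back to the model.
    result = TOOLS[request["tool"]](request["query"])
    return llm(f"Question: {question}\nSearch results:\n{result}\nAnswer:")
```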
Then there's planning. For those of you who have not yet played a lot with planning algorithms: a lot of people talk about the ChatGPT moment, where you go, wow, I've never seen anything like this. If you've not used planning algorithms, I think many people will have a kind of AI-agent wow moment, where you couldn't imagine an AI agent doing this. I've run live demos where something failed and the AI agent rerouted around the failure. I've actually had quite a few of those moments: wow, I can't believe my AI system just did that autonomously.

One example I adapted from the HuggingGPT paper: you say, please generate a new image where a girl is reading a book, and her pose is the same as the boy in the image example.jpg, then please describe the new image with your voice. Given an example like this, today we have AI agents that can kind of decide: first I need to determine the pose of the boy, then find the right model, maybe on Hugging Face, to extract the pose, then find a pose-to-image model to synthesize a picture of a girl following the instructions, then use image-to-text, and finally use text-to-speech. Today we actually have agents that, I don't want to say work reliably, you know, they're kind of finicky, they don't always work, but when it works, it's actually pretty amazing. And with agentic loops, sometimes you can recover from earlier failures as well. I find myself already using research agents for some of my work, where I want a piece of research but I don't feel like Googling it myself and spending a long time on it. I send it to the research agent, come back a few minutes later, and see what it's come up with. It sometimes works, sometimes doesn't, but that's already part of my personal workflow.

The final design pattern is multi-agent collaboration. This is one of those funny things, but it works much better than you might think. On the left is a screenshot from a paper called ChatDev, which is completely open source. Many of you saw the flashy social media announcements of the Devin demo; ChatDev is open source, and it runs on my laptop. What ChatDev does is an example of a multi-agent system: you prompt one LLM to sometimes act like the CEO of a software engineering company, sometimes act like a designer, sometimes a product manager, sometimes a tester. This flock of agents, which you build by prompting an LLM to tell it, you're now a CEO, you're now a software engineer, will collaborate and have an extended conversation, so that if you tell them, please develop a game, develop a Gomoku game, they'll actually spend a few minutes writing code, testing it, iterating, and then generating surprisingly complex programs. It doesn't always work; I've used it, sometimes it doesn't work and sometimes it's amazing, but this technology is really getting better. And just one more design pattern: it turns out that multi-agent debate, where you have different agents, for example ChatGPT and Gemini, debate each other, actually results in better performance as well. So having multiple simulated AI agents work together has been a powerful design pattern.
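As a minimal sketch of this last pattern, again reusing the hypothetical `llm()` helper from above (and nothing like ChatDev's actual implementation): two "agents" can simply be the same base model prompted into different roles, taking turns:

```python
def coder_critic_loop(task: str, rounds: int = 3) -> str:
    # Two "agents" are one base LLM prompted into two different roles.
    coder = "You are an expert Python coder. Write code for the task."
    critic = "You are an expert code reviewer. Point out bugs and issues."
    code = llm(f"{coder}\n\nTask: {task}")
    for _ in range(rounds):
        # The critic agent reviews the coder agent's latest attempt.
        review = llm(f"{critic}\n\nTask: {task}\n\nCode:\n{code}")
        # The coder agent revises its code in light of the review.
        code = llm(
            f"{coder}\n\nTask: {task}\n\nYour previous code:\n{code}\n\n"
            f"Reviewer feedback:\n{review}\n\nWrite an improved version."
        )
    return code
```

Swapping in different role prompts (CEO, designer, product manager, tester) and more turn-taking structure is the direction systems like ChatDev take this idea.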
So just to summarize: these are the patterns I've seen, and I think that if we use these patterns in our work, a lot of us can get a productivity boost quite quickly. I think agentic reasoning design patterns are going to be important. This is my one small slide: I expect the set of tasks AI can do to expand dramatically this year because of agentic workflows.

One thing that's actually difficult for people to get used to: when we prompt an LLM, we want a response right away. In fact, a decade ago, when I was having discussions at Google about what we called big-box search, where you type a long prompt, one of the reasons I failed to push successfully for that was that when you do a web search, you want a response back in half a second. That's just human nature; we like that instant feedback. But for a lot of agentic workflows, I think we'll need to learn to delegate a task to an AI agent and patiently wait minutes, maybe even hours, for a response. I've seen a lot of novice managers delegate something to someone and then check in five minutes later, and that's not productive. I think we'll need to learn to do that, difficult as it may be, with some of our AI agents as well. I thought I heard some laughs.

And then one other important trend: fast token generation matters, because with these agentic workflows we're iterating over and over, so the LLM is generating tokens for the LLM to read. Being able to generate tokens much faster than any human can read is fantastic. I think generating more tokens really quickly from even a slightly lower-quality LLM might give good results compared to slower tokens from a better LLM. Maybe that's a little bit controversial, because it may let you go around this loop a lot more times, kind of like the results I showed with GPT-3.5 and an agentic architecture on the first slide. And candidly, I'm really looking forward to Claude 5 and Claude 4 and GPT-5 and Gemini 2.0.