Andrew Ng On AI Agentic Workflows And Their Potential For Driving AI Progress

Snowflake Developers
In this Luminary Talk given during Dev Day at Snowflake Summit 2024, Landing AI Founder and CEO Andrew Ng...
Video Transcript:
Hello! Is everyone excited to be here? Welcome to our first-ever Dev Day. I'm really thrilled to have you folks join us; we are humbled by the community response and super excited about this amazing turnout. This week at Snowflake Summit we talked a lot about our new products and our vision for the future with customers, prospects, and partners, but today is for the builder community — for all of you, Snowflake customers or not. We want you to connect, share ideas, and scale up in data and AI, and to get inspiration from each other and from the industry luminaries we have lined up this afternoon. I took a picture of myself with Andrew off to the side, thinking, oh my God, I'm with the demigod.

I'm a developer — a software engineer at heart — and I love the little things that technology can do. I'm genuinely excited when I check out new things. I wrote my first Streamlit app — all of ten lines long — and thought, holy cow, this thing runs inside Snowflake; I don't have to deploy a server, it just works out of the box. And of course I shared that with Adrian, who said, oh my God, you're writing a Streamlit app. I get super inspired when folks like our PM director Jeff Holland make things happen. He had this idea: hey, let's use container services to do some video transcriptions, get structured attributes from those transcriptions, put them into Cortex Search, and add a chat box — and he made that happen in a couple of hours. I could then install an app to do the very same thing in about ten minutes and tinker with it. These are all great things, because growing the community of developers that build on Snowflake is a strategic priority for us. We're evolving and investing to better meet the needs of builders like you, and although we started as a closed-source product with an enterprise focus, we are opening up: we are becoming an application platform with a healthy dose of open source and community-led development. As you heard before, we just concluded our first international AI hackathon featuring Arctic, our own truly open LLM — congrats to the winners.

We began investing in our developer program five years ago to support developers building data-intensive applications; it's our sweet spot, and the growth has been amazing — thousands are Powered by Snowflake already. We partner closely with these companies at every stage to help them build and scale their applications and generate revenue, whether that's providing build and design resources, specialized support, or go-to-market help through the partner program; we are aligned with the growth of these partners on Snowflake. You can have fun building and creating amazing startups that can change the world with our support. Hundreds of startups are building their entire businesses on top of Snowflake, with a handful of them — including folks like Maxa, My Data Outlet, and RelationalAI — earning millions from distributing their apps on the Snowflake Marketplace. I met one of their founders yesterday and said, dude, Snowflake ran an unpaid commercial for you for 25 minutes — that's what the keynote yesterday was. We also make equity investments in these startups, because we want to align long-term incentives. Earlier today on this very stage, BigGeo, Scientific Financial Systems, and SignalFlare.ai were the finalists of our fourth annual Startup Challenge, competing for up to a million dollars in investment from Snowflake — and the big winner is... big congrats to SignalFlare.ai for winning the Startup Challenge; please give them a big round of [Applause] applause.

With our Snowflake Native App Accelerator funding program, we have partnered with 10 leading VC firms to invest up to $100 million in early-stage startups that are building native apps. We are also investing in training for our builders to help them skill up and grow their careers: just this week we launched the Northstar education program, from self-paced online courses to in-person workshops in all regions of the world — all of it for free — and check out the courses we just dropped on Coursera to start building on Snowflake. I feel very fortunate that we are all at center stage while data and AI technology is transforming the world. It's a thrill, it's a privilege, and it's also a responsibility. We are very grateful to the many luminaries — there's no other word for them — who are driving this transformation and joining us here today as we kick off our Luminary Talk series. I am delighted to welcome our first luminary speaker on stage: founder and CEO of Landing AI, co-founder and chairman of Coursera, and a former Google colleague — please welcome Dr. Andrew Ng. [Applause]

Welcome, welcome, Andrew. It's a privilege, an honor, and a thrill to be on the same stage as you. You've been around AI for way longer than most people — what was your AI aha moment? By the way, I went to grad school at Brown, and everybody then — this is 20, 25 years ago — told me, don't touch AI, nothing will come out of it. They were wildly wrong. But what was your big aha moment for AI?

I remember when I was a teenager, my first job was as an office admin, and I just remember doing so much photocopying — photocopying, photocopying, photocopying. Even then, as a teenager, I thought, boy, if only we could do something to automate all the photocopying I had to do, maybe I could spend my time on something else. That's why I wound up pursuing computer
science and AI.

And in fact — I'd actually forgotten until just now — I saw you operate the Google Ads business, and now you're the CEO of a huge company. When you mentioned that you're writing Streamlit code, I had to throw all of that out; building can actually be fun. — That Streamlit one was fun! And I was so excited to watch the video of Landing AI and Snowflake working together — LandingLens — that we posted together on LinkedIn; that, to me, is pure joy. As we're talking about AI, I have to ask: is there a billion-dollar model coming, where people need, I don't know, 50,000 H100s to get started?

Yeah, some people are definitely thinking that way; it'll be interesting to see if we get there. Part of me feels like there could be cheaper, less capital-intensive, less energy-intensive paths to building highly intelligent systems as well. On the other hand, I don't think we've squeezed all the juice we can out of sheer scaling laws, so that's also worth pursuing. And I'll just say, I really appreciate the work Snowflake has been doing on open-sourcing Arctic. We need more contributors doing that kind of thing. To me, good things happen when technology spreads broadly, when lots of people can do the same thing; otherwise it naturally falls into the hands of a few, and we don't get broad-based benefits. So for me, that's the reason I hope models stay somewhat less expensive — so more people can develop, more people can tinker, and push all of us forward.

A couple more questions. You were at the US Capitol recently, where there was this debate over open-source models and AI regulation. Where do you land in this debate?

You know, at this moment I'm actually very worried about California's proposed SB 1047, which I think would be very stifling for innovation and open source. I feel like there's a technology layer — and technologies are useful for many applications — and then there's an application layer, which tends to be specific instantiations of technology to meet a customer need. For a general-purpose technology like AI, it's impossible to stop it from being adapted to potentially harmful use cases. California's SB 1047 poses the specter of liability if, say, someone open-sources a model and someone else finds some way to adapt it to nefarious ends. I wish we could guarantee AI will never be used for bad things; I wish we could guarantee computers will never be used for bad things. But if you say that any computer manufacturer is liable if anyone uses their computer for something bad, then the only rational move is for no one to make any more computers, and that would be awful. I think Washington, DC has fortunately gotten smarter over the last year — I had some concerns with the White House executive order, but the House and Senate have gotten decently smart; Senator Schumer's group has actually figured out AI, and it's more pro-investment than pro-shutting-it-down. But I'm really worried that here in California, which is home to so much AI innovation, there's this truly awful proposal on the ballot — it just passed a Senate vote and goes to the Assembly next — that I think would be awful if it passes. So we'll see. All of you, go fight the fight: SB 1047 is an awful idea.

People forget — and I think it's really important to reiterate what Andrew just said — that AI is a technology. Yes, there will be good things that come from technology, but there will also be bad people that use technology. We need to make sure that laws cover those things, but without making either a hero or a villain out of the technology itself. There are going to be all kinds of different use cases that we as a society need to be ready for.

And to be clear, I'm pro thoughtful regulation — go after the harms, regulate harmful applications. I'm pro thoughtful guardrails. But when regulation puts in place impossible requirements, the only thing that will do is stifle technology and stifle innovation. — That's correct, and that's the thing to remember: premature regulation can be super stifling, because it introduces so much risk.

OK, the topic du jour. We know that language models — whether GPT-3 or 4, or the Llama models, or Arctic — were big steps forward, but the buzz these days, which you have written about and thought a lot about, is agentic AI. Can you tell us what it's all about?

Yeah. I think AI agents — which I'll chat a little about later in the presentation as well — are significantly expanding the set of what can be done with AI. With a set of AI tools and large language models that are working — and the work on Cortex is brilliant, frankly — I find that when you build on top of these tools, you can even further expand what is possible with a large language model. In terms of technology trends, for any builder, anyone building AI: there's more than one thing we should keep an eye on, but if I had to pick my top one, it would be AI agents.

Well, we should all be saying agents, agents, agents. With that, I will leave the floor to you — Andrew is going to give a few remarks you'll all love hearing. As I said, this is an incredible privilege for me, to have Andrew and the other amazing guests that are going to be here. I hope all of you have a lot of fun listening to him, learning from him, asking questions, and of course doing cool things yourself. Thank you.

And I'll just say, I really want to thank Sridhar and the whole Snowflake team. For my team at Landing AI, building LandingLens as a native app on Snowflake, and thinking about how to hopefully do more things with Cortex — it has been such a brilliant platform. We are super
excited to be working with you and your team. Thank you, and congratulations. — Thank you. Good luck.

So, because this is a developer conference, I want to take this opportunity to share with you some things about AI agents that I'm excited about — and I'm actually going to share some things I've never presented before, so there will be new stuff here.

So, AI agents — what are they? Many of us are used to using large language models with what's called zero-shot prompting: asking the model to write an essay or a response to a prompt in one go. That's a bit like going to a person and saying, could you please write an essay on topic X by typing it from start to finish, all in one go, without ever using backspace. Despite the difficulty of writing this way — I can't write that way — LLMs do pretty well. In contrast, an agentic workflow is much more iterative. You might ask an LLM: please write an essay outline; then, do you need to do any web research? If so, go search the web and fetch some information; then write the first draft; then read your own draft to see if you can improve it; then revise the draft. With an agentic workflow, the algorithm may do some thinking, do some research, then revise, then do some more thinking, and this iterative loop actually results in a much better work product.

The same goes for using agents to write code. Today we tend to prompt an LLM: write the code — which is like asking a developer to type out a program from the first character to the last and have it just run. It works surprisingly well, but agentic workflows allow it to work much better. My team collected some data based on the coding benchmark called HumanEval. HumanEval is a standard benchmark released by OpenAI a few years ago that poses coding puzzles like: given a non-empty list of integers, return the sum of all the odd elements that are in even positions — and this turns out to be the solution. It turns out that GPT-3.5, on the pass@k evaluation metric, got 48% right with zero-shot prompting — just writing out the code directly. GPT-4 does way better: 67% accurate. But if you take GPT-3.5 and wrap it in an agentic workflow, it does much better, and GPT-4 with an agentic workflow also does very well. So one thing I hope you take away from this is that, while there was a huge improvement from GPT-3.5 to GPT-4, that improvement is actually dwarfed by the improvement you get from taking GPT-3.5 and wrapping it in an agentic workflow.
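The outline → draft → critique → revise loop described above can be sketched in a few lines. This is a minimal illustration, not Landing AI's implementation; `llm` is an assumed prompt-to-completion callable (plug in Cortex, OpenAI, or any other model client), and the prompt wording is mine.

```python
# Minimal sketch of the iterative agentic-writing loop: outline, draft,
# then critique-and-revise. `llm` is an assumed prompt -> completion
# callable; nothing here is a real API.

def agentic_write(task: str, llm, rounds: int = 2) -> str:
    outline = llm(f"Write a short outline for: {task}")
    draft = llm(f"Write a first draft following this outline:\n{outline}")
    for _ in range(rounds):  # the iterative reflect-and-revise loop
        critique = llm(f"List concrete ways to improve this draft:\n{draft}")
        draft = llm(
            f"Revise the draft to address each point.\n"
            f"Draft:\n{draft}\nCritique:\n{critique}"
        )
    return draft
```

The contrast with zero-shot prompting is just the loop: each round spends extra model calls on reflection instead of demanding a perfect single pass.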
And to all of you building applications: I think this suggests how much promise agentic workflows have.

My team at Landing AI works on visual AI, and I want to share some late-breaking things — I've never presented this before; we just released it as open source a few days ago — about building a vision agent. The lead of this project, Dillon, is an avid surfer, so he looks at a lot of shark videos — that's a shark, and these are surfers swinging around — and Dillon was interested in questions like: in videos like these, how close do sharks get to surfers? So he generated a video like this: the shark is 6.0 meters away, 7.2 meters, 9.4 meters — and now it has swung far enough away that we switch the color from red to green, because the surfer is more than 10 meters away from the shark. If you were to write code to do this — run object detection, do some measurements, find bounding boxes, plot some stuff — you could do it, but it's kind of annoying; it would take several hours to write the code. So let me show you the way we built this video. We wrote a prompt: can you detect surfboards and sharks in the video, draw a green line between the shark and the surfboard, assume 30 pixels is one meter, mark the line red when they're close, and so on — this was the instruction given to the vision agent. Given this, the LLM writes a set of instructions that breaks the task down into a sequence of steps: extract frames using the extract-frames tool, and so on. After that, it retrieves tools — tools means function calls. For example, save_video is a utility function that saves a list of frames, and we retrieve a long description of the save_video tool; similarly for the other tools, like closest_box_distance to measure the distance between the shark and the surfer. Based on that, we end up with fully automatically generated code which, when run, results in the video you just saw.

I'd like to dive a little bit deeper into how this works. We put the vision agent to work as follows: you input a prompt — this is a slightly simpler prompt than the one I used just now — calculate the distance between the shark and the nearest surfboard. The goal of our vision agent is to write code to carry out the task you prompted, so that you can then feed it a single image and have it generate the desired outcome. Similar to agentic workflows for writing non-image code, we find that this works much better than zero-shot prompting for many applications.
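As a rough idea of the kind of code the agent ends up generating for the shark demo, here is a hand-written sketch of the distance and color logic, using the 30-pixels-per-meter assumption from the prompt. The box format and the center-to-center distance are simplifications of my own; the agent's actual detection and distance tools may differ (for example, measuring closest edges rather than centers).

```python
# Hand-written sketch of the shark-demo logic: boxes are [x1, y1, x2, y2]
# in pixels, 30 px = 1 m per the prompt. Center-to-center distance is a
# simplification; the real tools may measure closest edges instead.

PIXELS_PER_METER = 30.0

def box_center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def distance_m(shark_box, surfboard_box):
    ax, ay = box_center(shark_box)
    bx, by = box_center(surfboard_box)
    pixels = ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
    return pixels / PIXELS_PER_METER

def line_color(meters, threshold_m=10.0):
    # red while the shark is close, green once it is 10 m or more away
    return "green" if meters >= threshold_m else "red"
```

Running this per frame over detected boxes, then drawing the line and label, is the few-hours-by-hand job the agent automates.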
Moreover, we found that for a lot of image use cases — for example, if in Snowflake you have 100,000 images — having code that can run efficiently over a very large set of images is important too, because once you have the code, you can take a large stack of images, or many video frames, and run it through a relatively efficient piece of code to process them and get the answers.

I want to share how our vision agent works — it's open source, so take a look, give us feedback, maybe help us improve it. The vision agent is built with two agents: a coder agent and a tester agent. Given a prompt like this, the coder agent first runs a planner to build a plan listing the steps needed to complete the task — load the image, use a tool to detect the objects, calculate the distance, and so on. Then it retrieves a detailed description of each of these tools — tools means functions — and finally it generates the code. I don't know if some of this seems almost a little too magical, but all the code is there, so do take a look at it, and at the specific prompts we use. It all seems magical maybe the first time, but look at the code and look at the prompts.

Here are a few demos. This one says: detect every person in this image wearing a mask, output a Python dictionary. There's a bunch of code, and here's the Python dictionary: eight people are masked, two people are unmasked. Here's a different prompt to actually generate a visualization — plot the detections, and so on — and this is a new piece of automatically generated code. I actually missed the unmasked people myself, but the object detection found them.

One more example — this one's kind of fun: analyze the video every two seconds, classify whether there is a car crash or not, and output JSON showing whether there's a car crash. Car crash videos are always — well, I don't think anyone was hurt — it's a short video, here it comes, there's the car; fortunately no one was hurt, I think. If you do that, here's the code on the right; it processes the video and outputs JSON showing that at this timestamp there was no car crash, and at this timestamp there was. The feedback I'm hearing from quite a lot of people — from my internal team and some users — is: yeah, I could have written the code myself, but it would have taken me a few hours, and you can now get this done much faster. I find that in computer vision we use lots of different functions, and honestly I can never remember which functions to use or what the syntax is, and this really makes the process of building visual AI applications much easier — when it works.

I want to share just one other thing that makes the performance better, which is using the tester agent. I showed you the coder agent; it turns out you can also prompt an LLM to write some tests — write test code — and then execute that test code. Right now our test code is often type checking, so it's a little bit limited, frankly, but even with that, we can execute the test code, and if the test code fails, feed the output back to the coder agent, have it do a reflection, and rewrite the code, and this gives a further performance boost. I should say, in terms of the academic literature, the two research papers we lean on the most are the AgentCoder paper by Huang et al. and the Data Interpreter paper by Hong et al. — take a look at those papers if you want to learn more about these techniques.

One last demo. This takes CCTV-style video and analyzes it every two seconds; we wanted it to highlight things, because for CCTV videos, what a lot of people want is to just highlight the interesting parts to look at.
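The coder-agent flow described above — plan the steps, retrieve descriptions of the relevant tools, then assemble a code-generation prompt — can be sketched like this. The tool registry contents, function signatures, and the `llm` callable are illustrative stand-ins of my own, not the actual Vision Agent API.

```python
# Illustrative sketch of the coder agent's plan -> retrieve-tools -> codegen
# flow. Tool names/signatures and `llm` are assumptions for illustration.

TOOL_DOCS = {
    "extract_frames": "extract_frames(video, fps) -> list of frames",
    "detect_objects": "detect_objects(frame, labels) -> list of boxes",
    "save_video": "save_video(frames, path) -> writes an annotated video",
}

def build_codegen_prompt(task, plan):
    # retrieve only descriptions of tools the plan actually mentions
    tools = "\n".join(
        doc for name, doc in TOOL_DOCS.items()
        if any(name in step for step in plan)
    )
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(plan))
    return (
        f"Task: {task}\nPlan:\n{steps}\n"
        f"Available tools:\n{tools}\n"
        "Write Python code that uses these tools to complete the task."
    )

def generate_code(task, llm):
    # planner step: break the task into tool-naming steps, one per line
    plan = llm(f"Break this task into steps, naming tools: {task}").splitlines()
    return llm(build_codegen_prompt(task, plan))
```

Retrieving only the relevant tool descriptions keeps the final code-generation prompt small even when the tool library is large.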
It's a long prompt with a YouTube link. The agent creates instructions and retrieves tools, and it turns out the code doesn't work right away — maybe I'll show you this one — the code actually fails a few times. When running it, there's an IndexError traceback, so we feed all these error messages back to the LLM. It fails a second time, fails a third time, and the third time it fails with "No module named pytube" — and the last thing that fixes it is that the agent figures out to run pip install pytube. That actually fixes it, the code runs, and then you get this kind of highlighting in the agglomerated CCTV video: which of the four videos has more than 10 vehicles, and which you should look at.

So I'm excited about agentic AI as a direction for many applications, including coding and vision, and the vision agent is what we've been working on. Just to share some limitations: it is very, very far from working all the time. In our experiments, one of the most common failures is that we use a generic object detection system, Grounding DINO, which sometimes fails to detect objects — here it's missing a bunch of yellow tomatoes. That's a common failure. One of the things I was excited about in the Landing AI collaboration with Snowflake is that we recently built LandingLens, a supervised-learning computer vision system, as a Snowflake native app, and I think with supervised learning we're able to mitigate some of these errors. It's also not good at complex reasoning: here, if you say each bird weighs half a kilogram, how much weight is on the fence, the system naively counts all the birds and doesn't realize that one of the birds is flying and won't put weight on the fence. But it turns out that if you modify the prompt to say ignore the flying birds, it actually gets it right. So today, the vision agent we're releasing in beta sometimes works and sometimes doesn't work.
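The fail-and-retry behavior in that demo — run the generated code, feed the traceback back to the model, regenerate, repeat — is the essence of the reflection loop, which can be sketched as follows. `coder` is an assumed prompt-to-code callable; the real system also generates and runs test code, which this sketch omits.

```python
# Sketch of the execute -> feed-back-the-traceback -> regenerate loop
# (the pip-install-pytube fix in the demo came out of a loop like this).
# `coder` is an assumed prompt -> code-string callable.

import traceback

def run_with_reflection(task, coder, max_attempts=4):
    prompt = f"Write Python code to: {task}"
    for _ in range(max_attempts):
        code = coder(prompt)
        try:
            exec(code, {})          # run the generated program
            return code             # success: keep this version
        except Exception:
            error = traceback.format_exc()
            prompt = (
                f"The code below failed.\nCode:\n{code}\n"
                f"Error:\n{error}\nFix the error and return corrected code."
            )
    raise RuntimeError(f"still failing after {max_attempts} attempts")
```

Because the full traceback goes back into the prompt, the model sees exactly which line and exception to address on each retry, which is what lets it discover fixes like a missing dependency.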
It's a little bit finicky about the wording of the prompt, and sometimes you do need to tune the prompt to be more specific about the step-by-step process, so I wouldn't call it rock-solid software yet — but when it works, I've been really delighted and amazed by the results. And I just want to mention — hey guys, stand up — the team that built the vision agent is actually here today: Dillon, the surfer, in the middle, and Asia and Shankar. I hope you catch them and learn more about this, either here or at the Landing AI booth, and it's also online at va.