Okay, let's look at the trends we're seeing with enterprises. Compared with the first three lessons you've had, today's lesson is going to be less about going really deep into algorithms, although we will do a little of that, especially when we talk about the key building blocks for agents; it's more about what we're seeing in the market today. I'll also give you a few hints: if you're looking into a master's or PhD, some ideas you might track, or if you want to do a startup, some interesting ideas you might pursue as well.

So let's start with some of the trends we're seeing, and I'll go quickly over the things you've already seen. Looking at a bit of history, back in my own PhD days, all of my peers, my friends, my classmates were asking: why are neural networks even in these textbooks? At the time, support vector machines, decision trees, more or less all the regression algorithms were working quite a bit better, and we thought neural networks were just the odd ones out; they were really performing horribly. My officemate, who used to play a lot of chess by the way (and if you're trying to finish a master's or PhD in a limited amount of time, don't play chess all the time; maybe that's the lesson for today), he and I were saying we should really take this stuff out of the textbooks. We were both wrong, at the end of the day.

So what changed? How did neural networks become so much better than everything else? First, from an architecture point of view, they're much more easily parallelizable than many of the other algorithms we were working with at the time. Second, beyond that, you need the training set: for all that parallel compute to actually learn something useful, you need lots and lots of data. To give some context, at the time I was working on detecting cancer in CT images, and I believe I had access to a data set of about 50 patients. Every time I ran the training it took roughly 42 seconds, and I thought, oh my god, it's taking so much time, 42 seconds of waiting every time I click restart. That seems pretty funny now, because training some of today's large models takes days, sometimes weeks, sometimes months, depending on the size of the model.

As a side note, one of my worries is that we're not learning fast enough. There was something good about those 42 seconds: every 42 seconds I changed something and saw what it did to the training. Everyone was using support vector machines then, and you built intuition very fast: change this parameter, this happens; change that parameter, that other thing happens. Imagine being a researcher working on large language models: you change something, click start training, and get your results back four weeks from today. That's one of my worries; sometimes many of us don't know why something works, and for all the progress happening in the world, our pace of gaining intuition is much slower than it used to be.

Now let's talk about the data sets. I said neural networks are very parallelizable, so that's great.
But what about the data sets? Imagine taking all the books in the world, or all the websites in the world (not literally all of them, but set that aside for a second). That's a lot of data, but how do you turn it into a training set? You've probably seen this in some of the prior classes: a technique called masking. You take a sentence, say "I love running", or any sentence, and you start masking out words and trying to predict them. If you have the web, or all the books in the world, this creates lots and lots of training data. So now the training-data problem is solved: we no longer have to make do with a 50-patient data set; we can work with a huge amount of data. Combine that with really great improvements in GPUs, in TPUs, in hardware especially, and of course there has also been a lot of algorithmic improvement.
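To make the masking idea concrete, here is a toy sketch, entirely my own illustration rather than any production pipeline, of how a single sentence fans out into many training pairs:

```python
def make_masked_examples(sentence, mask_token="[MASK]"):
    """Turn one sentence into many (input, target) training pairs
    by masking each word in turn; the model's job is to predict
    the hidden word from the surrounding context."""
    words = sentence.split()
    examples = []
    for i, word in enumerate(words):
        masked = words.copy()
        masked[i] = mask_token          # hide exactly one word
        examples.append((" ".join(masked), word))
    return examples

pairs = make_masked_examples("I love running in the morning")
# A 6-word sentence already yields 6 pairs, e.g.
# ("I [MASK] running in the morning", "love"); scale that to the
# whole web and the labeled training data comes essentially for free.
```

Real masked-language-model pipelines mask random subsets of subword tokens rather than each whole word in turn, but the multiplication effect is the same.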
With the Transformer structure, all of a sudden things started working really well, and you got all those awesome results. Now, again looking at history, I think one of the moments that really changed things, especially with enterprises, was when ChatGPT launched in November 2022. After that, everyone realized: oh my god, AI actually works better than we thought it did. I believe lots of companies, including Google, had internal tools with ChatGPT-like chat experiences, but making it public was, (a), a brave move, and, (b), I think it changed the mindset of everyone in the world.

Now, you've seen some of these problem types before: you give an image and detect a label in it, or you give some speech and recognize what was said, or translation. Across lots of AI problems we've seen lots of improvement, especially in
the last decade or so. More than that, with generative AI you can actually go the other way around: what used to be the output you can now give as the input, and it generates images, speech, videos, text, poems, and so on.

To give more context, here is image classification on ImageNet. This graph starts in 2011, at about a 50% accuracy rate for some of the algorithms, and runs until 2022, by which point we had already reached almost 91%; and it has advanced even further since then with the large models. This is a huge amount of advancement. Indeed, back when my classmates and I were doing our master's degrees and PhDs, we made some predictions about where AI would be, and I think we expected to reach today's level in about 100 years; instead we arrived there in roughly 15 years. It has moved that much faster, and in the last few years the acceleration has only increased.

You see it everywhere. In speech recognition, on one data set the error rate moved from 13% in 2016 to a measured 2.5% in 2021, and nowadays it's actually much better: when you look at OpenAI's products, Google's, Anthropic's, their systems work far better than even these numbers. Compared to where we started, and 2016 is only eight years ago, it's a day-and-night difference.

As I mentioned, it's neural networks plus the Transformer structure, along with lots of algorithmic advancement, plus this way of generating a huge training set: the concept of next-token prediction. You take any sentence, mask out words, and that's how you generate lots and lots of training examples. And at the end, what
happens is that you get something versatile and effective at handling lots of different use cases, anywhere from a technical discussion to a friend-to-friend conversation. These models are very versatile in their capabilities, and I don't need to tell you that, because I assume you've been trying them all over.

Now let's look at the main approaches for building the base models or improving on them. "Agent" is an overloaded term, but say you're building an application or agent specific to your use case. One option is supervised fine-tuning: you collect lots of expert responses (here is a prompt, here is the output an expert would give) and you do this many, many times, tens of thousands of times. The good part first: it actually works, once you've done it that many times. The issue is that getting an expert to produce that much training data is not easy. So although supervised fine-tuning is very effective, it requires a large training set, which means finding those experts and spending a lot of their time.

The more practical approach we've seen is reinforcement learning from human feedback (RLHF). Essentially, you have your base model generate multiple options and put them in front of a human, hopefully an expert, who gives a thumbs-up or a thumbs-down. Giving a thumbs-up or thumbs-down is a significantly faster process than writing out a full expert response.
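To show how those thumbs-up/thumbs-down comparisons typically enter a loss function, here is a minimal Bradley-Terry-style sketch of the kind used to train reward models; this is my own toy illustration, not any production system:

```python
import math

def preference_loss(score_preferred, score_rejected):
    """Pairwise loss over a reward model's scores: small when the
    human-preferred response scores higher than the rejected one,
    large when the model ranks the pair the wrong way around."""
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A comparison the reward model already gets right...
good = preference_loss(2.0, 0.5)
# ...versus one it currently ranks the wrong way around.
bad = preference_loss(0.5, 2.0)
# good < bad, so gradient descent pushes the scores toward the
# ordering the human labelers expressed.
```

The reward model trained this way is then what the RLHF loop optimizes the language model against.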
Sometimes you can do both, and then you feed these inputs into the loss function you optimize for your large language model. So these are some of the main approaches we're seeing; I'm going fast here because you've seen this in some of the other classes. And at least on the Google side, you've seen Gemini. I'll go very fast on this one, and it's all public data: this was released in
December, and I think by February there was already another version released. A few interesting things: first, it's multimodal from day one, and second, it has the ability to handle a very large context; indeed, internally we're trying 10 million tokens, and we're getting really interesting, really good results.

Now, maybe an interesting idea for you, for a PhD or master's thesis or a startup: search is very fundamental to our lives, and search algorithms are fundamental to our lives; indeed, you'll see that even making a large language model work well depends on search. One question: as the input context grows, will a large language model with a huge input context be able to replace traditional search algorithms, whether keyword matching, embedding matching, or deep retrieval techniques? That's food for thought; I think it's a very interesting problem, and lots of interesting research is going to keep happening in that area.

So, as I said, it's multimodal from the start: you can input images, text, audio, video, and we actually get really interesting results. Here's one example: you see a physics problem, then you give an image, then you continue with your text, and it's able to provide a pretty good response. You can try this in AI Studio or in many
of these tools. As I said, large context is really interesting, and if you want to try it with videos or with a very large amount of text, I think it's very interesting to experiment with; it's again available at this link in AI Studio.

I think this was one of the reading materials I mentioned: the needle-in-a-haystack test, for any retrieval algorithm or for large language models with a larger context. You drop a piece of a sentence, or a piece of a video, or a piece of an image, the "needle", into a large amount of text, and you see whether the large language model is able to extract it. So if you especially want to do more research on that idea of building a search engine using just large language models, this test becomes really relevant for you; and also for testing all these large language models as their input context increases, this is a very interesting test.
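Here is a minimal sketch of the harness, my own toy version; the published needle-in-a-haystack tests sweep context length and needle depth over a grid and color each cell green or red:

```python
def plant_needle(filler_sentences, needle, depth):
    """Insert the needle sentence at a relative depth in the
    context: 0.0 = very beginning, 1.0 = very end."""
    pos = int(depth * len(filler_sentences))
    return " ".join(filler_sentences[:pos] + [needle] + filler_sentences[pos:])

def needle_found(model_answer, needle_fact):
    """One cell of the grid: green if the model's answer contains
    the planted fact, red otherwise."""
    return needle_fact.lower() in model_answer.lower()

# Build a long context with the needle buried in the middle; a real
# run would send it to the model along with a question like
# "What is the secret code?" and score the reply with needle_found.
filler = ["The sky is blue today."] * 1000
context = plant_needle(filler, "The secret code is 4711.", depth=0.5)
```

Varying `depth` and the amount of filler is what produces the two axes of the grid you see in the published results.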
In the case of Gemini, green means it found the needle and red means it missed it; the results were very promising, and in some of the reading material I sent there are results from other LLMs as well. It's less about the specifics of those results and more about knowing this test and using it in your research when appropriate.

Then there's the LMSYS leaderboard. I think Gemini made it to number one in August; now it's not, then it became number two, then it changed again. It's constantly changing, and that's maybe one artifact of what's happening in the world right now. If you look back 18 months, there was more or less one model, or at least one really high-performance model, available to anyone, through OpenAI. Nowadays that's changing quite a lot, and I think that's actually driving some of the trends we're seeing in enterprises, as you'll see in a second.

All right, let's look at some of the enterprise trends. The very first one, and maybe the obvious one, is that AI is moving so much faster; it's not a surprise, but all the enterprises are all about AI right now. I've been doing this job for a while now: say two and a half or three years ago, when we had a new AI product, we would usually barely be able to find one
or two enterprise customers to pilot it, and we'd think, wow, this is great. That has completely changed: now for every feature we put out, immediately thousands if not tens of thousands of enterprises are calling us wanting to try it, and that's probably true for any of these companies. So that's a big trend change and a mindset change in how enterprises are looking at AI.

But there are also structural reasons. One, as I said, is that people understood that AI could work. Another is that the amount of data needed to get started has come down very, very significantly. Before large models and large language models, a typical enterprise needed to collect a training set, clean it up, and then usually something went wrong and they'd go back and redo it; it would take months, and I've seen cases where it took years, to collect the training set and feed it into the training or machine-learning algorithm of their choice. That has changed, because these large models and large language models are already trained on a huge amount of data. What it means is that many enterprises can just take one of the base models and use it in their application; or, if you have a very custom application you want to build, the amount of additional data you need to provide is very significantly less than it used to be. So this is really exciting: people can now start and try things within days, as opposed to sometimes within years.

Second, and I think this is also a big culture change in the world: it used to be that to develop any AI algorithm or application, you needed to be an AI expert, a data scientist, someone trained through a master's, a PhD, or lots of AI courses. That has completely changed: nowadays any developer, and we even see
lots of middle-school and high-school students doing this, can start building AI applications using large language models, perhaps reading a little about how to fine-tune these models. That's a big change, and indeed a big culture change for all the companies that have been working on AI as well. One thing I tell my teams is that all of our team members were essentially developing products for AI experts, so there's a culture shift happening for the producers of these large language models too: they're trying to focus much more on developers. That's why OpenAI, Anthropic, Google, Microsoft, everyone is doing developer events: any developer can produce AI applications nowadays, and that was much less the case a few years ago. Now, if you think for a second about the number of AI experts or data scientists versus the number of developers in the world, it's maybe a 100x difference. That's also partially why we're seeing exponential growth everywhere: every company working on AI right now is seeing big exponential growth, and this is one of the reasons; literally anyone can go and start producing AI.

Now a few other technical trends. I was partially wrong on this one: I thought we would have a separate model per domain, a different model for healthcare, a different model for finance, and so on. What is happening is that the pace of improvement in the
base large models is so great that if you produce a domain-specific model from a base model, by the time the new base model comes out, it's usually already better than that domain-specific model. We've seen that happen a couple of times, iteratively, and the approach I think many of the larger companies are taking right now is to produce one large model that is very good at reasoning.

Again, another idea for you, whether as a master's or PhD thesis, an undergrad project, or a startup: how do you take these large language models and customize them really well for a specific use case, and potentially a specific domain? Indeed, just a "healthcare model" is not enough: healthcare has thousands of different use cases, and you might need a different model for different use cases. If you're looking for another idea to pursue, I think the world can use more research in that area right now; many people have already started, but that's at least
another idea for you to look into. And as I mentioned, I personally thought there would be single-modality models, one for image, one for video; instead it's all getting combined into one thing, and we're seeing models that are able to understand all kinds of modalities.

Another trend: it started with all-dense models, which means that during inference, while running the large language model you trained, you would execute every node of your Transformer. That's also changing: we're going more and more toward efficient sparse models, where you don't necessarily need to go through every node; you can be much more efficient about it and maybe execute, say, 10% of the paths or pathways. This is showing a transformative effect, because it means, (a), your latency can be much lower and, (b), your cost much lower as well. So that's at least one trend we're seeing, among other things.
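To stay on sparse models for a second: the usual mechanism is mixture-of-experts routing, where a small learned gate picks a few experts per token and only those execute. A toy sketch, my own illustration rather than any particular model's implementation:

```python
def top_k_route(gate_scores, k=2):
    """Return the indices of the k experts with the highest gate
    scores; only these experts will run for this token."""
    return sorted(range(len(gate_scores)),
                  key=lambda i: -gate_scores[i])[:k]

def sparse_forward(x, experts, gate_scores, k=2):
    """Mixture-of-experts step: execute only the routed experts
    and average their outputs, skipping every other path."""
    chosen = top_k_route(gate_scores, k)
    return sum(experts[i](x) for i in chosen) / k

# 8 experts, but only 2 execute per token -- roughly the
# "execute 10% of the pathways" flavour described above.
experts = [lambda x, m=m: x * m for m in range(8)]
out = sparse_forward(10, experts, gate_scores=[0, 0, 5, 1, 9, 0, 0, 0])
# Experts 4 and 2 are chosen: (10*4 + 10*2) / 2 = 30
```

Real mixture-of-experts layers weight each chosen expert's output by its normalized gate probability rather than taking a plain average, but the latency and cost win comes from the same place: most experts simply never run.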
And these multimodal models will probably be able to produce all kinds of modalities as well, not just understand them.

Now, here's an interesting trend. Again, 18 months ago, especially for enterprises, it was all about which model do I use, what is the latest model, what is the largest model; that was the main parameter for choosing what to use. We're seeing that change: it's now really the choice of platform rather than the choice of model. One reason is that every other week there is a new model, from whichever company; you now expect to see another model every few weeks, every month. Enterprises especially are seeing that, and so they really want to be in a place where they can try many models side by side. And you can see, and this is fairly recent, not yesterday but a few weeks ago, that there are several companies producing really great large language models, head to head; again, very different from 18 months ago.

What it meant, when we talked to enterprises (and we talked to maybe thousands of them), is that attributes like access to a broader set of models became very important, the ability to customize on top of these base models became very important, and similarly choice and flexibility at every level is very important. I'll show a diagram of how we see it in Vertex AI; I think you'd see similar diagrams everywhere, but it's a typical architecture, or reference architecture, whatever you want to call it, where
you build on some of these success factors for generative AI. And of course, especially for enterprises, being able to depend on these models is very important: the system needs to return results not 98% of the time but 99.9999% of the time. That becomes very important if you're building an actual production application.

Another trend: API cost is approaching zero, which is also very interesting. I mentioned some of the reasons, like dense versus sparse models, but there are many reasons beyond that: lots of improvement in the chip space, GPUs are getting better, TPUs are getting better. I'm not sure how much longer you'll see this trend, though you'd probably like it to continue. One interesting fact: take the Flash model versus some of the open-source models. We've done lots of calculations, and it was cheaper to run the Flash model, which is supposed to be the paid option, than an open-source model, because with open source you still need to figure out the hardware and chips yourself, and you have some management cost. So maybe the trend will slow down a little, because we've reached a point where running open source is actually almost more expensive than some of these paid models; but at least it's a trend we've seen. I think the prices have come down probably one and a half orders of magnitude since things got started.

The good news is that latency usually went down with it, and that's very important, because the smaller the latency, the broader the set of applications whose latency requirements you can meet. So we're seeing the scale go up as the latency of these models goes down; it means you can build more latency-sensitive applications as a result.

Okay, another huge trend is search. I think every other customer I meet, and probably every other use case, is starting to use large language models and
search together. Some of them just call the large language model and search separately; some of them use grounding to combine the two. But this is a big trend that wasn't there a year ago, and there are multiple reasons for it.

First, if you think about it, large language models are great, but they're trained on data from the past: sometimes two weeks in the past, sometimes two months, sometimes six months, but sometime in the past. So if you're asking for the score of the game last night, you're not going to have much success with a large language model alone.

Second, and I think you all know this: large language models hallucinate. Even the best of the best still hallucinate several percent of the time. Essentially, they're lying to your face, very confidently, because they don't know they're lying. For the many applications where factuality has become more important, we see people deciding they need to also retrieve data through search and triangulate between what the large language model says and what search says.

Third, large language models, and you all know this too, are great at reasoning and at answering questions, able to combine maybe 100 different sources and compose an answer for you. The issue is they cannot tell you exactly which source the answer came from. Search, by contrast, has been pretty great at citations, and for many applications that creates the level of authority the application needs: "I'm saying this because here is the exact blurb from the exact document that says it." Especially enterprises, in many of their use cases, have been using this. So this is a huge trend: every other application and use case seems to be somewhere on it, especially healthcare, finance, and any kind of customer service. "How much is my bill?" "Okay, let me hallucinate about that" is not acceptable: you really need search to make large language models much more factual.
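The grounding loop can be sketched end to end. Here the "search backend" is a toy keyword ranker of my own, standing in for a real retrieval system; the point is the shape: retrieve first, then ask the model to answer only from the retrieved, citable snippets:

```python
def retrieve(query, documents, top_n=2):
    """Toy keyword search standing in for a real backend: rank
    documents by how many query words they share."""
    q = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: -len(q & set(d.lower().split())))
    return ranked[:top_n]

def grounded_prompt(query, documents):
    """Prepend numbered snippets so the model can answer with
    citations instead of answering purely from its weights."""
    snippets = retrieve(query, documents)
    sources = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return ("Answer using only these sources and cite them:\n"
            f"{sources}\n\nQuestion: {query}")

docs = ["Your bill for May is 42 dollars.",
        "Our office is open on weekdays.",
        "Invoices are emailed on the first of the month."]
prompt = grounded_prompt("how much is my bill", docs)
# The billing document ranks first, so the model can answer
# "42 dollars [1]" instead of hallucinating a figure.
```

Swapping the toy ranker for a real search backend, web or enterprise, is exactly the grounding step described above.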
And maybe one last trend: we see that many enterprises are also interested in their own employees' productivity. So we see lots of companies coming out with tools specifically targeting the employees of a company, and the companies are very interested in those tools as well. I think Google is coming out with something too, but other examples are Glean, and we've seen ChatGPT Enterprise, all about making employees more productive.

Now, to summarize some of this: we talked to many, many companies to come up with these trends, and I mentioned that the success factors for generative AI have really come down to these: access to a broader set of models, choice and flexibility, and the ability to customize. With that, as I mentioned, let's look at one possible architecture. Of course Vertex AI is the one my teams are building, so it's the one I'm most familiar with, but I think many companies have similar architectures. So: choice and flexibility
at every level, I think, is very important to provide to enterprises, probably all the way down to the chips; that's why, in the case of Vertex AI, you can use GPUs or TPUs. It matters because at this time there is a huge chip shortage in the world; I would say demand is maybe 100x supply right now, so having a choice there is quite important. The same goes for models: being able to provide first-party models (your own models), other companies' models, and open-source models in one place to your customers, to the enterprises, becomes very important. And then Model Builder and Agent Builder, which are really the tools for customizing on top of these base models.

As I said, on Vertex AI there are lots and lots of different models, including the Gemini models (Gemini Pro now actually with a two-million-token context window) and all kinds of models, not just for text generation but for image generation, speech-to-text and text-to-speech, as well as embedding generation. The key is that you need to provide the choice. And, especially in Vertex AI, and I see it in other platforms as well, providing the open-source models in the same place matters, because depending on the use case, different enterprises are looking for different things, and being able to compare and evaluate models with the evaluation tools becomes really important and really valuable to the enterprises. The same goes for other providers' models: we now actually have Claude 3.5 Sonnet on Vertex AI as well, so that, again, your customers can compare.

And then, once you have these base models and the choice, Model Builder and Agent Builder are how you get into customization and into building successful agents and applications. In the next part of this lesson I'll
focus more on these: tuning and distillation, grounding, and extensions and function calling. Let me give an overview, because given the time we'll probably go fast in the next section and leave time for questions.

Tuning is what you might guess: you have your own use case and your own data, and the question is how to get the base model to perform specifically for your use case.

Distillation: these base models are usually super large, and they're able to answer all the questions in the world; for many use cases, especially application-specific ones, you do not need that. Indeed, you'd rather have a smaller model that is great at your exact, specific use case. Distillation is essentially the concept of starting from a large language model and generating smaller models that are as good as the large model on your specific use case and test data.

Grounding I mentioned a little bit already: combining the model with search to make your large language models much more factual. You might do it with web search and web data, or with your own enterprise data, but essentially it's combining with search; that's what you should remember about grounding.

And then function calling, or extensions. I sometimes describe it via the show Who Wants To Be A Millionaire, which I think you've probably watched: one of the key mechanics there is that you need to
call a friend: you have the option to phone a friend to answer a question. There are two things that are pretty tricky here. One, you need to know that this is a question you do not know and you need to call a friend; and two, you need to choose the right friend to call once you know you don't know the answer. Function calling is like that. A large language model, for instance, is not going to be able to book a flight ticket for you or look up flight schedules; but with function calling, let's say Expedia has an extension for that, your large language model might successfully do it. Again, it's tricky: the model still needs to know, "I can't do this, I need to call an extension," and second, for the flight booking it needs to call the Expedia extension, not, for instance, the math extension, which is great at math but not necessarily at flight booking.
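Here is a toy sketch of those two tricky steps, knowing a tool is needed and picking the right one. The registry, the tool names, and the keyword trigger are all my own illustrative assumptions; in real function calling the model itself emits a structured call against declared tool schemas:

```python
# Hypothetical tool registry; real systems declare schemas with
# typed parameters, and the model emits the call itself.
TOOLS = {
    "book_flight": {"description": "book or look up flights",
                    "triggers": {"flight", "fly", "airport"}},
    "calculator":  {"description": "do arithmetic",
                    "triggers": {"sum", "plus", "multiply"}},
}

def choose_tool(user_request):
    """Step 1: decide the model cannot answer alone; step 2: pick
    the right 'friend to call'.  Returns None when no tool fits,
    i.e. the model should just answer directly."""
    words = set(user_request.lower().split())
    for name, tool in TOOLS.items():
        if words & tool["triggers"]:
            return name
    return None

call = choose_tool("please book a flight to Paris")
```

Picking the wrong entry here is exactly the "math extension instead of the flight extension" failure described above, which is why tool descriptions matter so much in practice.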
booking so I think uh this next section which we will go a little bit faster is going to really look into these four uh areas fine tuning distillation grounding and function calling and then you look at fine tuning I think you see lots of different algorithms here and and as you go from left to right um for anywhere from prompt design which is not really a training by the way it's essentially you're just changing your prompt to all the way to full fine tuning and maybe this table shows some of them a little bit better
In the case of full fine-tuning, you're changing the weights of all of the nodes in your Transformer, but for that you also need lots and lots of training data. In the case of prompt design, you're typically just adding one, two, or up to ten examples of what you would expect from the large language model. In the case of prompt tuning, you can give much more input. Say it's a classification task, finding the brand from the description of a product: you might give a thousand examples, and instead of feeding them one by one, the algorithm learns an embedding vector that it can prepend to the large language model's input, almost as an instruction for the task.
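The prepending idea can be sketched in a few lines. This is a toy illustration of the mechanics only, with made-up shapes; a real prompt-tuning setup would optimize the soft prompt with gradients against a frozen model.

```python
import numpy as np

# Toy sketch of prompt tuning: instead of hand-written examples, we learn a
# small matrix of "soft prompt" vectors that is prepended to the embedded
# input. Only this matrix would be trained; the model itself stays frozen.
rng = np.random.default_rng(0)

d_model = 16        # embedding width of the (frozen) language model
n_soft_tokens = 4   # length of the learned soft prompt

soft_prompt = rng.normal(size=(n_soft_tokens, d_model))   # trainable
token_embeddings = rng.normal(size=(7, d_model))          # embedded user input

# The model sees the learned soft prompt followed by the real tokens.
model_input = np.concatenate([soft_prompt, token_embeddings], axis=0)
print(model_input.shape)  # (11, 16)
```

The key point is that only `n_soft_tokens * d_model` numbers are learned per task, which is tiny compared to the model itself.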
To my surprise, I would have expected prompt tuning to be used much more broadly, but not many people are using it, because prompt design, where you just give a few examples, is much easier and more practical; in practice we're seeing people just doing prompt design. We'll talk about distillation in the next section. Again, with conventional fine-tuning you take a pre-trained model checkpoint, any one of them, you have a new dataset and a new task, and with your loss function you're updating all of the weights, or most of the weights. It requires lots of compute, so in practice, especially for enterprises, we're not seeing many customers doing this; we're seeing much more of the customers doing other types of training, which I will talk about in a second. I mentioned conventional prompt tuning, where from many examples you learn an embedding vector to prepend to your large language model; I don't see much use of it in practice right now, and that was a surprise to me. Now, this is what I see many people doing, and you have probably done it as well: you have a question that you want to give to your large language model, and you provide one, two, five examples. You can provide no example, which is called zero-shot, or one-shot, where you provide exactly one example of a similar question and the answer you expect. Large language models typically get so much better at answering questions as you give even a few examples from your use case. We see lots of people going this path.
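The few-shot pattern just described is mostly string assembly. Here is a minimal sketch, assuming a hypothetical `build_few_shot_prompt` helper; the resulting string would be sent to whatever model you use.

```python
# A minimal sketch of few-shot prompting: prepend a handful of worked
# examples to the new question, then let the model complete the answer.
def build_few_shot_prompt(examples, question):
    """Format (input, output) example pairs followed by the new question."""
    lines = []
    for inp, out in examples:
        lines.append(f"Q: {inp}\nA: {out}")
    lines.append(f"Q: {question}\nA:")
    return "\n\n".join(lines)

# Two-shot example for the brand-extraction task mentioned earlier
# (product descriptions here are invented for illustration).
examples = [
    ("Extract the brand: 'Acme ultra-soft bath towel, 2-pack'", "Acme"),
    ("Extract the brand: 'Wireless mouse by Logi, black'", "Logi"),
]
prompt = build_few_shot_prompt(
    examples, "Extract the brand: 'Zenith 55-inch 4K TV'"
)
print(prompt)
```

With zero examples this degenerates to zero-shot; with one pair it is one-shot.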
One thing that you should know about is parameter-efficient fine-tuning. When I talked about conventional fine-tuning, we talked about updating all of the parameters of your neural network; here you update only a certain set of parameters, or you might even add a layer on top of the network. We're seeing this become much more effective and much more commonly used in practice, and there are many reasons for it. There are some pros and cons in general, but it works with a much smaller amount of compute, because you're not trying to update all of your neural network but only select parts of it, and it typically requires a bit less data as well. We're seeing lots of customers choosing to go this path, and of course low-rank adaptation, or LoRA tuning, is one of the most common ways of doing it. You can read the paper for the details as well.
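The core trick in LoRA can be sketched in a few lines. This is a numeric toy with made-up shapes, not the paper's training recipe: the frozen base weight `W` is augmented with a low-rank product `B @ A`, and only `A` and `B` are trained.

```python
import numpy as np

# Toy sketch of low-rank adaptation (LoRA).
rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 4            # r << d: the low rank
W = rng.normal(size=(d_out, d_in))    # frozen base weights
A = rng.normal(size=(r, d_in)) * 0.01 # trainable, small random init
B = np.zeros((d_out, r))              # trainable, zero init: training starts at W

def forward(x):
    # Effective weight is W + B @ A, applied without materializing a full copy.
    return W @ x + B @ (A @ x)

x = rng.normal(size=d_in)
y = forward(x)

# The adapter (A and B together) is all you store per fine-tuned task:
adapter_params = A.size + B.size   # 4*64 + 64*4 = 512
full_params = W.size               # 64*64 = 4096
print(adapter_params, full_params)
```

Because `B` starts at zero, the adapted model initially behaves exactly like the base model, and the stored adapter is an order of magnitude smaller than the layer it modifies.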
In general, maybe a few things to remember here: it's quite efficient because it holds the base model frozen and adds an extra component to it, learning just that component, which is expressed by the B and A matrices. As you will see in the paper, because of the way it handles the B and A matrices, it also becomes pretty efficient from a storage point of view, and that's something many enterprises care about. When they fine-tune, they want to be able to store the result, and all you need to store here is that small set of matrices, which is called an adapter, since you're not changing the base model at all. That becomes very effective for many reasons: there is less data to store, so it's very storage efficient, and because of that, security and privacy benefits just follow, which enterprises have really liked. And of course, both in terms of inference and in terms of quality it has worked really well; many research papers show that LoRA, and then QLoRA, which is a quantized version of LoRA, have worked very well. You have probably read those papers; if you haven't, please go back and read them. We'll now move on to other topics, but at least as a trend in enterprises we see a lot of usage of LoRA, and I know other platforms see the same. Distillation: as I mentioned, for many of the use cases you do not need these
huge models, and huge models usually mean higher cost and higher latency. For many applications you actually want lower latency, so distillation usually comes to your help here. We'll look at one algorithm; I'm not going to go into the historical advancements, but a typical approach people use is a teacher and a student model: the teacher gives you answers on a set of training data that you have, and the student model learns from that. The alternative would have been, let's say you have a classification task and millions of data samples: you probably would have needed to label them, which we call hard labels; a human usually would need to do that, and then you could train a student model, which could be a small model, based on those labels. As you might guess, it's actually pretty costly to label all of that. Instead, a typical approach is that you might still label a few, but you typically give the training data to a teacher model, a very large model that is great at reasoning (assume it's good), and it essentially helps generate the labels. There are various techniques for it, and I will mention a few; the labels can even be soft labels. Then you start with a smaller student model and tune its weights with a loss function measuring how well its predictions match the teacher's answers.
When you describe it like this, you realize immediately: I do not need to create lots of labels, the teacher model is doing that for me. That's why we see many people use this technique when it comes to distillation, or generating much smaller models. One approach: let's say this is a similarity or classification problem, say image similarity. With the softmax activation function you turn the output layer's outputs, in this case potential similarity between images, into probabilities. This function is very common; the issue is that, as you see in this example, it puts almost all of the probability on the first class. So people have used various techniques, and one of them is temperature: in this function, as you increase the temperature, you smooth out that probability distribution more and more, and the idea is that you're really creating soft labels.
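The temperature effect can be shown concretely. This is a minimal sketch with invented logits; a real distillation loss would then push the student's soft predictions toward the teacher's soft labels.

```python
import numpy as np

# Temperature-scaled softmax: higher temperature smooths the teacher's
# distribution into "soft labels" that carry more information about the
# relative similarity between classes.
def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [8.0, 2.0, 1.0]      # illustrative teacher outputs

hard = softmax(logits, temperature=1.0)  # nearly all mass on class 0
soft = softmax(logits, temperature=4.0)  # smoothed soft labels

print(np.round(hard, 3))
print(np.round(soft, 3))
```

At temperature 1 the first class dominates almost entirely; at temperature 4 the other classes retain visible probability, which is exactly the soft-label signal the student learns from.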
If I go back to the algorithm I was verbally explaining: the teacher model produces the soft labels, the student model produces soft predictions, which again are a vector of potential matches, and the loss function backpropagates into changing the student's weights. You could also do this with hard predictions, by the way; it's just that you need to get the hard labels, which is a time-consuming process, and sometimes you do both. All right, let's look a little bit at grounding too. I mentioned why grounding is needed: LLMs were trained on data from the past, they hallucinate, and they can't cite sources. We've seen many approaches taken here. First of all, making sure you provide the right context to the large language model: through a retrieval algorithm you could, for instance, use web search, in parallel to the LLM, to get the data, and you can also use a search engine over your enterprise's private data, and then combine the two.
With hallucination and factuality, another approach that we see many people take, and we certainly take it at Google as well, is building better models that get better at factuality. This comes a little bit at the expense of reasoning capability, but I think many enterprises would prefer that. And then user experience becomes important as well: you want your results to be authoritative, and you want to be able to show where the sources of the data are for the results you're generating. The right context, as I said, could be anything from private documents to fresh content on the web or other authoritative documents, and it could come from third-party sources that make their data available. People refer to some of this as RAG. A typical way to do it: you get a query or a prompt, you convert it into a query that you can feed into a retrieval engine or a search engine, you get the results, and then you tell your large language model: here is the result I'm getting when I ask a search engine about this, and here is the query; now, taking these results into account, what would you say? That's the typical approach we're seeing. Of course there are lots of details: how do you take a prompt and convert it into a query for a retrieval engine or a search engine, and once you get the results, how exactly do you construct a new prompt to give to the large language model?
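The retrieve-then-prompt loop just described can be sketched end to end. Everything here is a stand-in: the documents are invented, the retriever is a toy word-overlap ranking rather than a real search engine, and the final string would be handed to whatever model you use.

```python
# Toy retrieval-augmented generation (RAG) flow.
DOCS = [
    "Flight AC101 departs Toronto at 08:05 daily.",
    "Our refund policy allows cancellations within 24 hours.",
    "The cafeteria is closed on weekends.",
]

def retrieve(query, docs, k=2):
    """Toy retrieval: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_grounded_prompt(user_question):
    """Turn the question into a search, then fold results into the prompt."""
    results = retrieve(user_question, DOCS)
    context = "\n".join(f"- {r}" for r in results)
    return (
        "Here are results from a search engine for this question:\n"
        f"{context}\n\n"
        f"Taking these results into account, answer: {user_question}"
    )

prompt = build_grounded_prompt("What time does flight AC101 depart?")
print(prompt)
```

The two hard parts from the lecture map directly onto `retrieve` (query formulation and ranking) and `build_grounded_prompt` (how to fold the results back into the prompt).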
I'm not going to go into the details of this, as we're running out of time, but you might also want to take the results and do another post-processing pass to really check the citations, to be sure about your answers as you combine search and large language models. I mentioned better models: very typically, people use reinforcement learning with human feedback, and potentially AI feedback as well, where you could use a very large model to do the thumbs up and thumbs down. You design a reward model that heavily punishes ungrounded responses and create a loss function to feed that back. I'm guessing a big trend from large language model producers will be going more and more into factuality in the next six to twelve months, because for many use cases you definitely want to be more on the factual side than on the reasoning or creative side; there is usually a bit of a cost, or a tradeoff, between the two. And then for user experience, depending on the use case, we see lots of customers using various tools, all the way to looking at the sources in a response, how many supporting sources there are, how many contradicting sources there are, and based on that maybe refining the answer.
In general, being able to cite the sources has been something many enterprises have been asking for again and again. On Vertex AI we took this approach too: you have grounding with Google Search, grounding on enterprise data, grounding on third-party data like Moody's or Thomson Reuters, and a model we call grounding with high fidelity, which is actually a model trained much more for factuality, not even combined with a search. And on top of all of that, here's an interesting concept: remember I mentioned the cost of LLMs is going down; we do not see that trend as much in search. There are lots of search engines out there where the cost of an API call has been constant for years. So this requires a concept of dynamic retrieval: being able to intelligently understand when you want to combine with a search engine versus not. That's another trend we're seeing, and again, if you're looking into thesis work or startups, this is another idea: intelligently deciding what to do based on the LLM response, dynamically understanding that you need to call a search engine, or, as we will see in the next section on function calling, that you need to call a function. All right, you have probably all used large language models and come across many cases where you're not getting the results you want, especially if you want the large language model to do something: give the schedule of trains, or maybe buy a product. There are many things it just cannot do.
Change something in a patient's health record, change something in an employee record: those are the types of things a large language model cannot do. A large language model is great at getting information, but it's not great at taking actions, what we call real-world actions. Function calling, combined with grounding, helps with many of those. The biggest limitation of large language models, I would say, is being able to take these actions, and for many applications you need them; you have probably run into some of these issues and done lots of workarounds. I'm going fast now because I want to leave some time for questions, but there are roughly three easy steps. First, define your functions: let's say you're a travel company, you create an extension or a function for each task, one that looks up flight schedules, one for flight bookings, one for hotel bookings, and so on. Then you register them with a model, whether Gemini, an Anthropic model, or an OpenAI model; there's lots of work going on to make large language models understand "I cannot do this myself" and then choose, from the available functions, the right one to call when needed. And at inference time, that is exactly what the large language model does.
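The three steps above can be sketched as follows. The function names and schemas are hypothetical, and the "model" here is a toy keyword router standing in for the real routing decision an LLM makes; it only illustrates the declare-choose-dispatch shape.

```python
# Step 1: declare functions with names, descriptions, and parameters.
FUNCTIONS = {
    "lookup_flight_schedule": {
        "description": "Look up flight schedules between two cities",
        "parameters": ["origin", "destination"],
    },
    "solve_math": {
        "description": "Evaluate a mathematical expression",
        "parameters": ["expression"],
    },
}

# Step 2/3: the model decides "I cannot do this myself, which declared
# function should I call?" (toy keyword matching, not a real LLM).
def toy_model_choose_function(user_request):
    if "flight" in user_request.lower():
        return "lookup_flight_schedule"
    if any(op in user_request for op in "+-*/"):
        return "solve_math"
    return None  # the model answers directly, no function call needed

choice = toy_model_choose_function("Find me a flight from Ankara to Istanbul")
print(choice)
```

The two failure modes from the "call a friend" analogy show up here too: returning `None` when a call was needed, or picking `solve_math` for a flight request.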
Here is something I cannot do: I cannot book a flight; I need to find an extension; here is my extension library with 100 different extensions; Expedia is the right one to call to do this. That roughly describes what I said in slightly more technical terms, and this is also how it works with Gemini. As a tool provider, it's important to give your customers the ability to provide their own extensions and functions, so that your library of extensions is constantly growing. But at the same time you need to keep working on your large language models to get better at understanding when they need to call an extension and which extension to call. It only becomes more challenging over time, because as people add more and more extensions, the library keeps growing and you need to be even smarter to choose the right extension for the thing you want to do. With that, you can suddenly do lots of things: get structured outputs, do real-time information retrieval; if you're building a customer agent, you can get your customer's data, get a customer's bill, build tools for your employees, book a hotel or a flight, buy a product. You can do lots of things once you have function calling enabled. Essentially, function calling and extensions let large language models do many more things in life and in applications, rather than just providing you data and information. Now, I mentioned many tools, but there are many other tools you really need to build successful agents and applications. You need a great prompt management system: store prompts, keep a revision history, even convert a prompt written for one large language model to another model.
As you give more choice and availability across these large language models, something very important is evaluation, and I will mention a very important trend we're seeing here as well: providing evaluation tools for large language models and other tasks. In Vertex AI we have multiple of them, but I think auto side-by-side is a really important concept, and you will see a growing trend here; this might be another idea for you, again for your thesis work and so on: using the largest models possible for rating purposes. Indeed, there is a shortage of humans in the world right now for all the rating and classification work that needs to happen, and we're definitely seeing a trend toward doing some of it with actual large language models that are large and have very good reasoning capabilities. The side-by-side tool we have does exactly that: you run multiple large language models, and then you ask a very large model to rate them and decide which one is better. Interestingly enough, you see a significant correlation between these large models assessing the tasks and a human assessing the same tasks, and there's lots of work going on across many companies here. So with that, these are some of the announcements we made just last week; they're all public, and I think different companies are seeing a very similar trend.
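The auto side-by-side setup reduces to a rating prompt like the one below. The candidate answers and the instruction wording are invented for illustration; the string would be sent to the large rater model, whose 'A' or 'B' verdict you then parse.

```python
# Toy sketch of an auto side-by-side rating prompt: two candidate model
# answers to the same question, judged by a (much larger) rater model.
def build_rater_prompt(question, answer_a, answer_b):
    return (
        f"Question: {question}\n\n"
        f"Answer A: {answer_a}\n"
        f"Answer B: {answer_b}\n\n"
        "Which answer is better? Reply with exactly 'A' or 'B' "
        "and one sentence of justification."
    )

prompt = build_rater_prompt(
    "What causes tides?",
    "The gravitational pull of the Moon and Sun on Earth's oceans.",
    "Tides happen because of wind.",
)
print(prompt)
```

In practice you would also swap the A/B order across trials, since judge models can show position bias.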
Maybe one thing to remember is that we're seeing huge exponential growth everywhere in AI right now. Just a few numbers we published last week: this year, in the enterprise world alone, from January to August, about 36 times growth in API calls. For a business it usually takes between five and ten years to get that type of growth, and you would call that a very successful business; this is not even a year, it's half a year. The amount of growth all of these AI companies are seeing is just something no one has seen before. If you want to remember one thing: (a) you're in the right place right now, because you have a huge interest in this, and (b) I think this trend is going to continue, so try to be as involved with AI as possible. But also, many students come to me asking what they should study in addition to AI: maybe add components of other sciences, or humanities, or art, something that also brings some creativity to your mind. I think that's going to be really important. Even how we program a few years from now is going to be very different from how we're doing it today, and exercising the creative part of your brain is going to be super valuable for you as well.