Hello community! We have some beautiful new research here: just three days ago, the University of Illinois here in the US published a new research paper, GraphRouter, a graph-based router for LLMs. Now, you know that if we have specific tasks, specific queries, and we have LLMs at a specific price point with a specific performance that we expect, it is rather simple to build a graph. However, it is much more interesting to build, from that graph, a graph neural network, so that we can discover hidden edges or nodes or connections within a dynamic graph network. At best we can say: hey, let's predict future LLMs for unseen tasks — but of course we also want an optimization of the graph data for the LLMs that we already know of. Now, the training of a GraphRouter in itself is not really particularly difficult, and I don't want to focus on this; I want to show you how we build this graph, because there are some complexities that are just beautiful. So what is the task at hand? We start with this: we want
to efficiently select specific LLMs — do you go with Claude, do you go with Sonnet, do you go with o1, with o1-mini — based on your specific task, on your requirements, given the projected performance that we expect from those systems for your task, the anticipated computational cost, and then the price point that, say, Microsoft is asking you. So, easy: how do we build a graph? We have nodes and edges. For the nodes, the researchers went with a simple three-type node model: we have the models — and each and every one of the models you see here, beautiful, is now a node structure — plus we have different tasks. We can go for a question-answering task, a summarization task, a text-generation task, a sentiment analysis, a dialogue system, you get the idea. And then we have specific user queries: let's say in the question-answering class I have open-domain question answering, closed-domain, extractive QA, abstractive QA — so I have here a complete classification with particular queries. This is some dense data set. And then, of course, we need edges. Couldn't be easier, no?
We have the LLM nodes that connect to the query nodes, and we have the task nodes that connect to the query nodes. Yeah, but the beauty is, we have some irregular nodes too, no? We have nodes within the task-node structure, or maybe we have some cross-references — whatever the graph might show us. So how do we build the graph, now that we have the node and edge classification done? Now we just need the data. And of course, on our edges we have information about the price point, about, I don't know, the latency, or the task execution in total — so you have a lot of information that you could put into your graph representation. If you want to learn more about edges and the power of graphs, I have this video for you; if you want to see link prediction, it is this video. If you want to go with a computer model, I would say PyG, for PyTorch Geometric data, is really beautiful — I'll show you the definitions, and this is what we will look at a little bit later on, in about two minutes'
time: if we have a graph, how do we get to a graph machine learning computational graph structure? But for the moment, let me focus here on PyG: we have a heterogeneous graph using a sentence transformer. Great. So here we are, and now the task is easy, no? How to build a graph: we need a graph representation, and a graph is a data structure where the nodes — these are our tasks, our queries, and our LLMs — are connected by edges to represent relationships, but not physical distances; we have no metric here in this space. So edges between nodes are defined based on how the entities interact: the task relates to a specific query, or the LLM generates a response for a specific query. And we have, of course, the edge weights, or the attributes of the edges, and they represent characteristic data like the performance or the cost data — whatever you want, you put it here in the edge-weight structure. But remember, the weights do not correspond to Euclidean distances either; everything here is just on a relational basis. So which edges do we have?
As I told you, we have task-query edges: if you have a task node for mathematical reasoning and a specific user query like "what is the sum of 35 and 58", then the edge is created exactly between those nodes in the training data. And there is no complex edge-feature data — you could just assign a binary value representing that a query belongs to this task. So you see, we can start really easy. Next, we have the LLM-query edges; here you can put your data about performance and cost. You can calculate these values based on your historical data, or you just look them up, or you have them in your training data already. And then it is rather easy — I show you here PyTorch Geometric examples. You import torch_geometric; we have a tensor structure for the task nodes, a tensor structure for the query nodes, and a tensor structure for the LLM nodes; we have the embeddings, we combine them, we have the edges, and we create our graph data object. Just a few lines of code.
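As a minimal sketch, the construction just described might look like this. I use plain Python dicts and lists here, mirroring the layout of PyTorch Geometric's HeteroData so it runs without any dependencies; all node features, edge indices, and attribute values are illustrative toy numbers, not from the paper's code.

```python
# Dependency-free stand-in for a PyG heterogeneous graph.
# In real PyG code you would write something like:
#   data = HeteroData()
#   data["task"].x = task_emb                       # tensor [num_tasks, dim]
#   data["task", "has", "query"].edge_index = ...   # shape [2, num_edges]
# Here: toy 2-d embeddings and (source, target) index pairs.

graph = {
    "nodes": {
        "task":  [[0.1, 0.9], [0.8, 0.2]],               # 2 task embeddings
        "query": [[0.2, 0.8], [0.7, 0.3], [0.5, 0.5]],   # 3 query embeddings
        "llm":   [[0.4, 0.6], [0.6, 0.4]],               # 2 LLM embeddings
    },
    "edges": {
        # task -> query: binary "query belongs to this task" edges
        ("task", "has", "query"): {
            "index": [(0, 0), (0, 2), (1, 1)],
            "attr":  [[1.0], [1.0], [1.0]],
        },
        # llm -> query: [performance, cost] from historical data
        ("llm", "answers", "query"): {
            "index": [(0, 0), (1, 1)],
            "attr":  [[0.92, 0.003], [0.85, 0.001]],
        },
    },
}

num_queries = len(graph["nodes"]["query"])
num_edges = sum(len(e["index"]) for e in graph["edges"].values())
print(num_queries, num_edges)  # 3 5
```

The point is only the shape of the thing: three node types, two relation types, and edge attributes that carry the performance and cost data.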
But you might say: hey, wait a minute, wait a minute — what do you mean, we have task embeddings? Where do the embeddings come from? Okay, no problem at all. Embeddings are simply used to represent nodes and edges in a way that allows multiple models to learn relationships, and to learn those relationships efficiently. Let's take the task node embeddings: these are generated using an LLM. Let's say we have GPT-4 create a descriptive text for each specific task — GPT-4 can do this easily — and those descriptions are then passed through a pre-trained language model, but a very specific one, like here a sentence-transformer BERT model, to produce the task embeddings. So this means we just create a new vector, in a vector space that we also have to create. For example, the task "question answering" is described by GPT-4, and then the resulting text — maybe each sentence, or maybe you take a complete paragraph, or the complete text — you just embed, using the sentence-transformer BERT, in a vector space. Let's go to the query node embeddings: it is rather similar — embeddings using the very same pre-trained BERT model. For example, "what is the capital of France?" is passed through BERT to generate a query embedding, and we get a new vector. And then the LLM node embeddings — now you know it: descriptive embeddings are generated using a prompt to obtain details about the LLM's capabilities. So GPT-4 is prompted to generate a description of o1-mini — which is interesting in itself — and then this textual description is the input to a sentence-transformer model, and this sentence-transformer model outputs a specific vector representation in a specifically created vector space, and that vector is what we call the embedding of this particular information.
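A toy sketch of that pipeline, under stated assumptions: the real system would call GPT-4 for the description and something like SentenceTransformer("all-MiniLM-L6-v2").encode(text) for the embedding; here I substitute a deterministic hash-based encoder so the example is self-contained. The description string and the function names are invented for illustration.

```python
import hashlib
import math

def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Deterministic stand-in for a sentence-transformer encoder.
    In practice you would use a real model, e.g. sentence-transformers'
    SentenceTransformer(...).encode(text). Returns a unit vector."""
    h = hashlib.sha256(text.encode()).digest()
    vec = [b / 255.0 for b in h[:dim]]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

# LLM node: a (hypothetical) GPT-4-written description of the model,
# then embedded. Task and query nodes work exactly the same way.
llm_description = "o1-mini: a small reasoning model, strong at math and code."
query_text = "What is the capital of France?"

llm_emb = toy_embed(llm_description)
query_emb = toy_embed(query_text)
print(len(llm_emb))  # 8
```

So every node type — task, query, LLM — ends up as a vector in the same space, which is exactly what the graph construction above consumes.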
So you see, we have quite a lot of LLMs interacting with other LLMs, passing information through sentence transformers, just to create the initial embeddings of the nodes. And then we are ready to go, and can have a look, for example, at this video that I have — two years old, never mind, it's still current — on how we go from BERT and sentences to a vector representation with a bidirectional encoder stack. And I have here seven videos, so you can have a deep dive in whatever computer code you prefer: you can go with Keras, and I have all the other languages available too. This means we now understand that on the way to our embeddings there is quite some complexity, because the vector space — the quality of the vector space — is important. If you just go and buy some vector space, you cannot be sure that your specific data are within this vector space. Simple example: language — maybe your data are in the French version and you bought an English vector store, or you have topic-specific vectors. So you see, there are a lot of complexities there, not even mentioned here; of course you have to prepare, you have to go through all these steps, but sometimes you do not find this discussed anymore — and there lie massive problems. So now we have a graph — isn't it beautiful? — created here with PyG or whatever software package you want. And now this graph is here, but, as I told you, you want to make predictions, no? We want to have insight into the encoded information
to say: if now a new query comes along that was not in the training data set, what model handles it, at what price point, at what complexity level? Well, imagine we have a new LLM coming out, and we have a description by GPT-4 describing this new "o2" model. Then, theoretically, in the graph — based on the similarity, in the vector space, of the description that was generated by GPT-4 for o2 — we can build (let's take NetworkX, the simplest software package here) a graph structure where we use similarity in the vector space to optimize the graph for new, unseen data, and we can argue via semantic similarities in the vector space. So you see, the logic behind this is maybe not as trivial as it may sound. Our next step is to go from a graph to a graph neural network, a GNN. I already have a video on this — it's so beautiful, you know: from graph data to graph machine learning, how to build a computational graph, the different message-passing aggregation mathematical functions. I explain everything in that video if you want to build it.
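The similarity-based placement of a new, unseen LLM node described above can be sketched like this. I use plain Python instead of NetworkX so it runs anywhere (with NetworkX you would add G.add_edge(...) calls); the model names, embeddings, and threshold are all toy values.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Embeddings of the known LLM nodes (toy 2-d vectors for illustration;
# in reality these come from the description-embedding pipeline above).
known_llms = {"gpt-4": [0.9, 0.1], "o1-mini": [0.3, 0.7]}

# A new, unseen model: embed its generated description, then connect it
# to every existing node whose similarity clears a threshold.
new_llm = ("o2-hypothetical", [0.35, 0.65])

threshold = 0.95
new_edges = [(name, round(cosine(vec, new_llm[1]), 3))
             for name, vec in known_llms.items()
             if cosine(vec, new_llm[1]) >= threshold]
print(new_edges)
```

Here the hypothetical o2 node gets attached next to the model whose description is semantically closest — which is exactly the argument about predicting behaviour for unseen LLMs from description similarity.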
So, for our graph: embeddings allow us to convert high-dimensional data — text, tasks, queries — into lower-dimensional vector representations, if we use for example UMAP, and these vectors encode semantic information about the input in a way that is easy for machine learning models, like graph neural networks, to process. Oh yeah, I forgot to mention: if you really go down to a lower-dimensional vector representation, the mathematics behind these operations — coming down to a lower mathematical space dimension while incorporating the complexity of the original information — is not an easy task, and I have, I think, about ten videos on UMAP. Great. So, for example, instead of representing a task as raw text, embeddings condense all that information into a fixed-size vector — we talked about this — capturing essential features such as the relationships between queries, task difficulty, or capabilities. And those embeddings also enable inductive learning. So you see: wherever we are, we are faced with embeddings; we have to find simpler representations in simpler vector spaces to be able to do anything at all. Now, the GraphRouter, as in this brand-new publication from just
three days ago, also uses BERT-based embeddings for the node representations — bidirectional encoder representations; yes, I think I have 50 videos on sentence transformers, so no problem at all. So, as before, we have the task node embeddings, the query node embeddings, and the LLM node embeddings: generated from the task descriptions using BERT, generated from the queries using BERT, generated from the LLM descriptions using BERT after GPT-4 came up with the new descriptive details. As you see, the interplay of the complexities is amazing, and I think sometimes we forget that we start with a simple complexity in our vector space, in our sentence transformers, in our bidirectional encoders; the next step, our compression of mathematical dimensions; the next complexity, if we have a semantic description; and the next complexity, if we do projections in a graph neural network, or message passing, or we want to have a link prediction. We should be aware that on all levels errors can happen. But let's look here at a complexity that is so huge we hardly see it at all — and there is now a second paper, a beautiful paper from the beginning
of October 2024: AutoML agents. Now this is the real stuff, from KAIST in Korea — DeepAuto: a multi-agent LLM framework for full-pipeline automated machine learning. We have everything that we are dreaming of: not one agent, but a complete multi-agent scenario; multiple LLM frameworks are employed to build for us the complete pipeline, the complete end-to-end exercise, with automated machine learning. It couldn't get more beautiful — what do you think about this? So, yeah, we have here this visualization with the little sheep, but since this will be an economic activity that will bring in hundreds of billions of dollars, I do not go with the sheep — even if the sheep multiply and it becomes a little bit more interesting. No, let's choose something else; let's go with this here, a simple visualization. What do we have? We have the time horizon, and they go from end to end — never mind what the absolute time is, we are just looking at how big the chunks are for the single processes. First we start with prompt parsing, then we have a
request verification, and then we have retrieval and planning — and as you can see, wow, this is a huge chunk — and then we have plan execution, also massive. Then we have a little bit of execution verification, and a selection and summarization (these two colors), and then we have code generation. The first time I saw this, I said: hey, this is interesting — in an end-to-end run the real code generation is rather small, no? This dark blue here. And those two big ones are the new emerging focus points in AI. So let's make it even easier: I call this "coding", I call this "plan execution", and this "retrieval and planning" — three simple tasks in a complete automated machine learning pipeline, generated only by multi-agent AI systems. Let's have a look: what is "retrieve and plan"? According to the authors, it is a critical phase in the AutoML-Agent framework — a multi-agent framework — where the system generates multiple plans to solve the task. You know, that's beautiful, because we do not just go with the highest-probability plan; we say we want a set of options for our task, and then we
want to have a detailed analysis of the risks, for example. So we have at first the parsing of the user requirements: utilizing the user's instructions — they are the beginning of everything — to understand the task at hand, really in all details. What is it that the user wants? Where do we start? What are our initial conditions? What data do we have? Access to what systems do we have? What computer simulations can we call independently? Do we have access to the internet, to different databases? Everything — you have to think about all this, and build at first a plan. Now, you know, in my last video we were talking about user queries with a high complexity — like a complexity of 8 out of 10 — and whether, if we try to decompose those into lower-complexity levels, this really works. But never mind, let's go on to the next step: retrieve. So we retrieve external knowledge, accessing external, up-to-date data relevant to the task. Beautiful. And then we have the generation of multiple plans: creating a set of diverse end-to-end plans, from the data retrieval to the complete model-deployment phase, based on the internal knowledge and everything we have access to with our data. So what does this step do? It simply ensures that the framework considers a wide array of possible solutions, enhancing the chances of finding the most optimal one that satisfies my user requirements. Sounds easy, but as you see, it is almost as powerful as the complete code generation.
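A minimal sketch of that plan-generation step, with everything stubbed out as toy data: the "retrieved" components and their names are invented for illustration, and a real framework would have an LLM propose plans rather than enumerate combinations — the point is only that you end up with a *set* of diverse end-to-end candidates, not a single highest-probability one.

```python
import itertools

# Stub for the knowledge the retrieval step brought back:
# candidate components for each stage of the pipeline (invented names).
retrieved = {
    "models":     ["resnet50", "vit-base"],
    "preprocess": ["resize+normalize", "augment+normalize"],
    "deploy":     ["onnx-export"],
}

# Enumerate diverse end-to-end candidate plans, from data handling
# through to deployment. Each plan is one full pipeline configuration.
candidate_plans = [
    {"model": m, "preprocess": p, "deploy": d}
    for m, p, d in itertools.product(
        retrieved["models"], retrieved["preprocess"], retrieved["deploy"])
]
print(len(candidate_plans))  # 4
```

These candidates are what the later stages — simulated execution, then selection — will compare against each other.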
Second part: the plan execution. Let's have a look — what is it? Plan execution, according to the authors of this beautiful study, refers to the stage where the generated plans from the planning phase are now executed by specialized agents. Now we have our multi-agent scenario — but be careful, because there is a critical distinction. So, four points. We start with point number one, plan decomposition — the same thing I just told you about an 8-out-of-10 complexity: can we decompose those complex tasks at all? But let's assume yes; let's give it a green light. Due to the complexity of end-to-end plans covering the entire AutoML pipeline, the plans are broken down into smaller, manageable subtasks, and each plan is decomposed into subtasks specific to the agents' roles and expertise. This is also in one of my last videos, where I was talking about the super-agent — this here is exactly, if you want, an implementation of the super-agent: you say, hey, I have 20 agents and a specific subtask, and now I have to find the correct agent for that subtask, one that can give me back, I don't know, a 90% confidence level.
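A toy sketch of that decomposition-and-routing idea: each subtask goes to the specialist agent whose role covers it. The role names and skill sets here are illustrative, not the paper's actual agent definitions.

```python
# Map each agent role to the subtasks it is specialized in (invented).
AGENT_ROLES = {
    "data":  {"retrieval", "preprocessing", "augmentation", "analysis"},
    "model": {"model_selection", "hyperparameter_optimization"},
}

def route(subtask: str) -> str:
    """Assign a subtask to the first agent whose skill set covers it."""
    for agent, skills in AGENT_ROLES.items():
        if subtask in skills:
            return agent
    raise ValueError(f"no agent can handle {subtask!r}")

# One decomposed plan: an ordered list of subtasks.
plan = ["retrieval", "preprocessing", "model_selection",
        "hyperparameter_optimization"]
assignment = {t: route(t) for t in plan}
print(assignment["model_selection"])  # model
```

In a real system the routing would itself be a confidence-scored decision (the "90% confidence level" idea), not an exact string match — this only shows the shape of the assignment.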
So data-related tasks are assigned, of course, to the data agent, which does all the data management, and model-related tasks are assigned to the model agent. Now this is great, because this decomposition allows you to reduce the complexity — it's faster, it's less expensive — and it allows each agent to focus on the specialized tasks it has been trained on, improving efficiency and effectiveness. Great. Second step: prompting-based plan execution. Now, as I told you, there is one critical element: we are not really sending the data off to a real-world execution — this is a computer-simulated execution. It would be much too expensive to really connect all these agents and really run this at this level already; we will only do this much later, in the coding phase, when we generate the code for all this. We are here still in the optimization phase, and we work with computer models that simulate the performance of an agent for a particular task, for a particular code generation. So you see, we have another complexity: how good are the agents at simulating? Is this the real thing? Let's say everything works out fine, beautiful: execution of the designed subtasks without actually running their code. This means the agents act as if they were performing the task, to provide expected outcomes based on their knowledge. If your computer simulation of the knowledge and performance of the AI agents is off by just, I don't know, 10 to 20%, you may later end up choosing a plan that is not actually the best plan to solve the task. So: we have the data agent's role — it handles tasks like data retrieval, data pre-processing, data augmentation and data analysis — and the model agent's role, which takes the insights from the data agent to inform the model selection and the hyperparameter
optimization. This now is exactly what I showed you at the beginning of this video: the model agent says, hey, I have access to, let's say, 200 vision-language models on, let's say, Hugging Face, to keep it easy. Okay, so I have a specific task; I know, for those 200 models, their descriptions; I know their performance data on standardized benchmark data; maybe I have my own tests, with my own domain-specific knowledge, for those 200 LLMs or vision-language models or whatever; and I know, if I have to fine-tune them, that I have to choose a particular hyperparameter optimization, a particular LoRA configuration — maybe I choose multiple adapters that I want to use. This model agent must know all the real performance data of the real agents in this computer simulation to anticipate the performance. Step number three: you execute this whole computer simulation. And then, step number four: you aggregate all the results together. Everything is collected, everything is brought in, and now our artificial intelligence can come along and decide. Okay — but before the decision there is, you see this little orange and red segment here, the execution verification.
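Going back one step: the simulated execution and aggregation (steps two to four) can be sketched like this. The scoring function here is a pure stand-in for an LLM agent's self-estimate of its outcome — invented numbers, no real training run happens anywhere.

```python
def simulate_subtask(plan_id: int, subtask: str) -> dict:
    """Stand-in for prompting an agent to act *as if* it executed the
    subtask and report an expected score and cost. A real system would
    query the agent LLM; here the estimate is a toy formula."""
    base = 0.80 + 0.05 * plan_id
    return {"subtask": subtask, "score": base, "cost": 1.0 + plan_id}

# Two candidate plans, each decomposed into the same two subtasks.
plans = {0: ["preprocess", "train"], 1: ["preprocess", "train"]}

# Step 3: run the simulation; step 4: aggregate results per plan.
aggregated = {}
for pid, subtasks in plans.items():
    results = [simulate_subtask(pid, s) for s in subtasks]
    aggregated[pid] = {
        "score": sum(r["score"] for r in results) / len(results),
        "cost":  sum(r["cost"] for r in results),
    }
print(sorted(aggregated))  # [0, 1]
```

The accuracy of these simulated estimates is exactly the risk discussed above: if the agents' self-estimates are off by 10 to 20%, the aggregation can rank the wrong plan on top.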
This is just a step to be extra careful: we say, hey, let's verify this before we really send it off. This is this kind of orange indicator — I don't know if you can see it on your screen. Great. And then we just have selection and summarization, and then we come to the code generation. So you see, those two parts are a massive amount of time that the system will spend on — and please note that coding is not the most important anymore; maybe it is in third place for importance in multi-agent systems. So this plan execution is about breaking down complex plans into specialized subtasks, and the agents that we choose can be simulated — in their performance, in their cost structure, and in their time structure — to gauge expected outcomes without incurring the high computational cost of running them in real time for hours and hours. So it is so important to know the performance data and the cost data of all the agents that you have access to. If you look at
this: this is time. But time is not money, no? What about the real cost? And the authors did a cost presentation, which is beautiful — thank you for doing this — and you immediately see: our retrieval becomes now almost insignificant; if you only look at the money, it's so cheap. But plan execution doubles in size, if you want, and we have a completely new body here, because this was a short time interval but it is really, really costly: the selection and summarization. And you know what? Coding is now just a little bit: retrieval is on the one side of the spectrum, almost pushed to the border, and coding is also pushed to the other border. Retrieval and coding are not as important anymore as they were, I don't know, two years ago. But you see now, for value, those are the elements that you should focus on. So let's do this, let's go now for this selection. What is it? Selection and summarization involves choosing the best plan based on the simulated execution results, and preparing now
this best plan, in detail, for the real-world implementation in the coding phase. Sounds easy, no? So let's start with the execution verification: the agent manager verifies the execution results from all the agents to ensure that they all meet the user requirements, and if an agent's results do not satisfy these requirements, the framework may revise the plan or adjust the parameters — maybe it goes back. Then we have the selection of the best plan: now it is the responsibility of the agent manager to select the most promising plan, one that aligns with your goals, with your price point, with your time constraints, with your knowledge domain, whatever you have. This involves comparing expected performance metrics, model suitability, and the complete compliance with any specified constraints. Great. Then the next step is summarization and instruction generation: the selected plan is now summarized into a set of actionable instructions. Remember, this is some LLM doing all of this in our agent, so it depends on the complexity level that the agent — and therefore our LLM or VLM — can handle, to find the real, perfect set of actionable instructions for a specific configuration of the system.
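A toy sketch of selection and summarization, assuming simulated results and user constraints as plain dicts: filter to the feasible plans, pick the best one, and emit an instruction record for the coding phase. Every field name and number is invented for illustration.

```python
# Simulated execution results for three candidate plans (toy numbers).
simulated = [
    {"plan": "A", "accuracy": 0.93, "cost": 4.0},
    {"plan": "B", "accuracy": 0.96, "cost": 9.0},
    {"plan": "C", "accuracy": 0.97, "cost": 25.0},
]
# User constraints: price point and performance requirement.
constraints = {"min_accuracy": 0.95, "max_cost": 10.0}

# Selection: keep only plans that satisfy the constraints, take the best.
feasible = [p for p in simulated
            if p["accuracy"] >= constraints["min_accuracy"]
            and p["cost"] <= constraints["max_cost"]]
best = max(feasible, key=lambda p: p["accuracy"])

# Summarization: actionable instructions for the coding phase
# (dataset from the user request; model/target are illustrative).
instructions = {
    "plan": best["plan"],
    "dataset": "butterfly-images",
    "model": "resnet50-transfer",
    "target": "cloud-deployment",
}
print(best["plan"])  # B
```

Note that plan C is the most accurate but fails the cost constraint, and plan A is cheap but misses the accuracy bar — this trade-off is why selection carries so much of the value.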
This includes data like the chosen data set, any pre-processing steps — hey, wait a minute, hey hey hey — then we have the selected model architecture and the hyperparameters, all the deployment requirements and the target platforms (maybe you choose Google, maybe you choose Amazon, whatever), and then all those instructions are formatted to guide the next agent that will take over — the operational agent — in the next phase, and this next phase is our coding phase. And this made me think: why are the global players so interested in this now? Because, you know, coding is not that important anymore, now that we have all our code LLMs and whatever. Why should we as humans care at all about coding? Why should we as humans care at all about how the system retrieves information? We have AI systems for this, no? The real intelligence now is here, in this middle sector — look at where the money, the value, is: 80% is just these two. So I have now a simple question for you: what if a Microsoft-owned model that makes the decisions has a positive inclination towards an operational Microsoft-owned LLM, operating on a Microsoft platform for the cloud compute, for example, and chooses Microsoft-specific parameters and hyperparameters for a perfect Microsoft-generated result, which you can then further integrate into your Microsoft-empowered applications like Word or email or whatever you have? You see — can you build here an ecosystem like Apple? Guess what company is trying to do this right now. So you see, this level of aggregation — of retrieval, coding, planning; reduce the complexity, find the right configuration of the system, select the right path — this is now the edge of research where all the global corporations
are interested — not really in the coding, not really in the retrieval, but here, where the value lies. Just to clarify: we are talking now about constraint-free settings and constraint-aware settings. What is this? I'll give you an example. I have a prompt: "I need a very accurate model to classify" — hey, wait a minute — "images in the butterfly image classification data set; the data set has been uploaded." So I just say, hey, I need a very accurate model. And now I will show you why I introduce constraint-aware settings: here I say, hey, I want transfer learning from a pre-trained ResNet-50 model, I want an accuracy of at least 95% on the test split, and provide the final trained model. The more specifications you give, the more constraint-aware your setting is. Did you tell the system what it really needs? You know what happens? Interesting: this here on top is the constraint-free case we just had a look at, but this here at the bottom is now the new constraint-aware case.
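The two settings can be written out side by side as request records. A real system would parse these constraints out of the natural-language prompt; here they are hand-written, and every field name is illustrative.

```python
# Same request, twice: once constraint-free, once constraint-aware.
constraint_free = {
    "task": "image_classification",
    "dataset": "butterfly-images",
    "objective": "maximize accuracy",
    "constraints": {},                       # system decides everything
}

constraint_aware = {
    "task": "image_classification",
    "dataset": "butterfly-images",
    "objective": "maximize accuracy",
    "constraints": {
        "base_model": "resnet50",            # transfer learning, fixed
        "min_test_accuracy": 0.95,
        "deliverable": "final_trained_model",
    },
}

# Each added constraint narrows the planning search space — and shifts
# more of the optimization burden onto selection and summarization.
print(len(constraint_aware["constraints"]))  # 3
```

This is the mechanical difference behind the shift discussed next: with many user-specified constraints, the hard work moves from open-ended planning into constraint-respecting selection.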
And you know what happens here in the selection and the summarization? For the money, this now becomes the absolute dominant factor: because you specified so many parameters, suddenly the optimization problem is not as easy as before, where we had the primary intelligence in the plan execution — now all of this goes into selection and summarization. And you see, in both cases coding is at the border, pushed to the side. So be careful where you focus your attention, your time, and your money, because looking at the latest research papers, it looks like we have a new research topic: to find the right system configuration — and look at what we have to invest before we even get to writing some operational code. I hope you enjoyed it, I hope you got some insights, I hope you found it interesting, and it would be great to see you in my next video.