So Meta might have just introduced the future of large language models well I guess after this video you won't be calling them large language models anymore this is what they call large concept models language modeling in a sentence representation space basically stating that look LLMs aren't exactly the best thing to use and in this new paper by Meta they detail large concept models this is something that is I guess you could say a little bit different from LLMs and it's really cool when you realize the thing that it is tackling so we all know what LLMs
are but if you know how LLMs work you'll know that they work through the process of tokenization so these systems actually work by predicting the next token rather than the next word and you can basically think of them as advanced autocomplete although I do know some researchers are going to hate me saying that but this is essentially how it works and I've used the GPT-4o tokenizer visualizer to show you what the characters actually look like to the LLM system and this is genuinely why there was that big debate as to why these LLMs weren't able to get the famous question how many Rs are in strawberry they'd say there are two Rs in the word strawberry when of course there are three and this is essentially why these LLMs weren't able to get the answer right because they just saw the word strawberry as one or two whole tokens rather than individual letters leading to this big confusion now this is where another AI researcher talks about this entire thing he says he's more and more confident that tokenization will be gone stating that the era of these tokenization models might actually be going away sometime soon he says that humans don't actually think in tokens tokens are hard-coded abstractions in LLMs that lead to weird behavior LLMs can solve PhD level math questions but cannot answer is 9.9 bigger than 9.11 and he says now Meta is shifting LLMs to large concept models changing next token prediction to next concept prediction where a concept is treated as a sentence representing an abstract idea or action
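now just to make that concrete here's a tiny Python sketch I put together this is not from Meta's paper and I'm assuming you have the tiktoken library installed it just prints how a GPT-4o style tokenizer chops these exact examples into tokens before the model ever sees them

```python
# Minimal sketch (not from Meta's paper): inspect how a GPT-4o-style tokenizer
# splits text into tokens. Assumes a recent tiktoken that ships "o200k_base".
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # encoding used by GPT-4o

for text in ["strawberry", " strawberry", "9.9", "9.11"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r} -> {len(ids)} token(s): {pieces}")

# The model only ever predicts the next token id, so counting letters inside
# a chunk like "strawberry" or comparing "9.9" with "9.11" digit by digit is
# not something it natively "sees".
```

run that yourself and you'll see the exact splits depend on which tokenizer you load which is exactly the kind of hard-coded abstraction the researcher is complaining about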
so they talk about how despite the undeniable success of LLMs and continued progress all current LLMs miss a crucial characteristic of human intelligence explicit reasoning and planning at multiple levels of abstraction the human brain does not operate at the word level only which is very true we usually have a top down process to solve a complex task or compose a long document of course we first plan at a higher level like the outline and then step by step we add the details at lower levels of abstraction and one may argue that LLMs are implicitly learning hierarchical representations but they're stipulating that models with an explicit hierarchical architecture are better suited to create coherent long form output and I think this is one of the key things here that you know LLMs do miss some reasoning steps and I think maybe this is why certain models reason better than others because maybe they're already using this kind of architecture of course that is huge speculation but we do know that certain companies like Anthropic with Claude get a ridiculous level of response quality and many are stumped as to how they're managing to do it now basically they actually talk about this example here and don't worry about this huge block of text I'm going to explain this to you in
a simple way it basically says imagine a researcher giving a 15-minute talk or presentation which is kind of like what I'm doing now and in such a situation the researcher doesn't usually prepare a detailed speech by writing out every single word they're going to pronounce and of course this is true it wouldn't really flow well and it wouldn't sound that good it would really sound monotonous instead they outline a flow of higher level ideas that they want to communicate usually on a PowerPoint presentation they'll just have like three or four words maybe with some images and then they'll talk about those and basically they also state here that if they give the same talk multiple times the actual words being spoken are going to be different and they could even be in different languages but the higher level abstract ideas are going to remain exactly the same and this is the same when writing a research paper or essay on a specific topic and this is exactly what humans do by preparing an outline which is of course a structure and then refining it iteratively and of course this
is something that humans do when processing a large document as well rather than reading it word by word they use a hierarchical approach and remember which part of a long document they should search to find a particular piece of information now this is the example that they actually use here so this left side diagram that you see shows how the model transforms a story about Tim into a very simple form the pink dots show us the detailed story so it shows us how Tim wasn't athletic he thought that would change if he joined a sport he tried out for several teams he didn't make the cut for any of them and then he decided to train on his own instead and then for the higher level abstraction the blue dots show how it's summarized into a key concept and it says that because he was not very athletic Tim could not join teams to improve so he decided to train on his own and then this is how on the right hand
side we can see how the LCM processes this language like a sandwich so you can see this bottom layer which is the concept encoder which is of course fixed this is where it takes the regular words and converts them into these concepts so you can think of these as complete ideas then we get the middle layer the large concept model this works with these concepts processing and understanding them like understanding the main ideas without worrying about specific words and then of course we get to the top layer which is the concept decoder which takes the processed concepts and turns them back into regular words that we can actually read this is quite similar to taking a story written in English converting it into universal idea symbols that any language could understand processing those ideas and then converting them back into words which could be you know in any language think of this bottom layer right here as translating into an idea language then this is where you think about those ideas and then this is where you translate it back into human language
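and just to make that sandwich a bit more concrete here's a tiny Python sketch I put together this is absolutely not Meta's actual code the encode_sentence decode_embedding and ConceptModel names are hypothetical placeholders and as I understand it the real system uses a frozen sentence encoder and decoder (SONAR) with a transformer that predicts the next sentence embedding but the three-layer flow of words to concepts to words looks roughly like this

```python
# Toy sketch of the encoder -> LCM -> decoder "sandwich" (illustrative only,
# not Meta's code). encode_sentence / decode_embedding / ConceptModel are
# hypothetical stand-ins for the frozen sentence encoder/decoder and the
# next-concept predictor described in the paper.
from typing import List
import numpy as np

EMB_DIM = 1024  # assumed size of a "concept" (sentence) embedding

def encode_sentence(sentence: str) -> np.ndarray:
    """Bottom layer: frozen concept encoder, sentence -> embedding (stub)."""
    rng = np.random.default_rng(abs(hash(sentence)) % (2**32))
    return rng.standard_normal(EMB_DIM)

def decode_embedding(embedding: np.ndarray) -> str:
    """Top layer: frozen concept decoder, embedding -> sentence (stub)."""
    return "<a sentence rendered from this concept, in any language>"

class ConceptModel:
    """Middle layer: reasons over concepts and predicts the next one."""
    def predict_next(self, concepts: List[np.ndarray]) -> np.ndarray:
        # The real LCM is a transformer trained to predict the next sentence
        # embedding; averaging here is only a placeholder to show the flow.
        return np.mean(np.stack(concepts), axis=0)

story = [
    "Tim wasn't very athletic.",
    "He thought that would change if he joined a sport.",
    "He tried out for several teams but didn't make the cut for any of them.",
]

concepts = [encode_sentence(s) for s in story]        # words -> concepts
next_concept = ConceptModel().predict_next(concepts)  # think in concepts
print(decode_embedding(next_concept))                 # concepts -> words
```

the important part is that the middle layer never touches individual words it only ever sees and predicts those sentence-level embeddings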
and this is actually really different from the current AI models that work directly with words this system works with complete ideas or concepts instead now what's actually fascinating about this is that they state that to some extent the large concept model architecture resembles the JEPA approach from Yann LeCun as you know one of the leads at Meta AI and this JEPA approach is something that also aims to predict the representation of the next observation in an embedding space and this is really interesting because V-JEPA is a fascinating architecture this is the architecture where they talk about a new approach to self-supervised learning and how they actually try to learn the same way that humans do the goal with JEPA which stands for joint embedding predictive architecture is to create highly intelligent machines that can learn as efficiently as humans V-JEPA is pre-trained on video data allowing it to efficiently learn concepts about the physical world similar to how a baby learns by observing its parents it's able to learn new concepts and solve new tasks using only a few examples without full fine-tuning V-JEPA is a non-generative model that learns by predicting missing or masked parts of a video in an abstract representation space unlike generative approaches that try to fill in every missing pixel V-JEPA has the flexibility to discard irrelevant information which leads to more efficient training and they say to allow our fellow researchers to build upon this work we're publicly releasing V-JEPA we believe this work is another important step in the journey towards AI that's able to understand the world plan reason predict and accomplish complex tasks
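to give you a rough feel for what predicting in representation space means here's a toy PyTorch sketch again this is my own illustration with made-up shapes and module names not Meta's released code the point is simply that the loss compares predicted embeddings against embeddings of the masked content rather than reconstructing pixels

```python
# Toy illustration of the JEPA idea (not Meta's released code): predict the
# representation of masked video patches from visible ones, and compute the
# loss in embedding space instead of pixel space.
import torch
import torch.nn as nn

D = 256  # assumed embedding dimension, purely illustrative

context_encoder = nn.Sequential(nn.Linear(768, D), nn.GELU(), nn.Linear(D, D))
target_encoder  = nn.Sequential(nn.Linear(768, D), nn.GELU(), nn.Linear(D, D))
predictor       = nn.Sequential(nn.Linear(D, D), nn.GELU(), nn.Linear(D, D))

# Pretend these are patch features from one video clip.
visible_patches = torch.randn(1, 16, 768)  # what the model gets to see
masked_patches  = torch.randn(1, 4, 768)   # what it must reason about

ctx = context_encoder(visible_patches).mean(dim=1)  # summarize the visible context
with torch.no_grad():  # targets aren't trained directly (typically an EMA copy)
    target = target_encoder(masked_patches).mean(dim=1)

pred = predictor(ctx)
loss = nn.functional.mse_loss(pred, target)  # match representations, not pixels
loss.backward()
print(float(loss))
```

the real V-JEPA uses a transformer over spatio-temporal patches and an exponential moving average target encoder but the key point the quote is making is the same the prediction target is an abstract representation not raw pixels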
now if we actually take a look at the results we can see that LCMs generated coherent and meaningful expansions LCMs also tended to avoid excessive repetition compared to LLMs like Llama 3.1 and LCMs followed instructions better generating expansions of controlled length overall I do think this is one of the most fascinating things because tokenization has really been a big problem with LLMs and we know that these very small problems in AI like how many Rs are in strawberry are quite frustrating when we're trying to look at what these LLMs can do considering that we know these models are pretty smart let me know what you guys think about this research if you guys think this is the future of LLMs or what's going to be next or maybe this is going to be some kind of hybrid architecture I do think Meta is always pioneering innovative research which is why I chose to include it in today's video so if you enjoyed the video I will see you in the next one