Intelligence is very specifically your ability to handle novelty: to deal with situations you've not seen before and come up on the fly with models that make sense in the context of that situation. And this is actually something that you see very little of in LLMs. If you ask them to solve problems that are significantly different from anything they've seen in their training data, they will fail. The Abstraction and Reasoning Corpus for Artificial General Intelligence, or ARC-AGI for short: you can think of it as a kind of IQ test that can be taken by humans or AI agents, and it's actually very easy for humans. Every task you get is novel. It's different from any other task in the dataset, and it's also different from anything you may find online. ARC-AGI is designed to be resistant to memorization, whereas all the other benchmarks can be hacked by memory alone. When I've spoken to AI researchers, I've gone through ARC challenges together with them and tried to look at their introspection. They're saying, "I'm looking at this problem. I know it's got something to do with color, I know it's got something to do with counting," and then they run the program in their mind and say, "one, two, three... no, that doesn't work, that doesn't work." I think introspection is very effective when it comes to getting some idea of how your mind handles system 2 thinking. I think it's not very effective for system 1, because system 1 is inherently not something you have direct access to: it happens unconsciously, instantly, in parts of your brain that you're not directly observing via your own consciousness. But system 2 is not like that. System 2 is very deliberate, very slow, very low bandwidth, and it's very introspectible. But what's not mentioned here is...

François, it's an honor to have you on the show. Honestly, this means so much to me. You're my hero, so thank you so much. It's my pleasure to be here, and I would say you shouldn't have heroes. I shouldn't? No? Why not? It makes for a disappointing experience. Not for me. OK, yeah, not for me, hopefully. I can live up to the expectations.
Oh, definitely, I'm sure you will. François, you've been critical of the idea of "scale is all you need" in AI. Can you tell me about that? Sure. So this idea that scale is all you need is something that comes from the observation of scaling laws when training deep networks. Scaling laws are this relationship between the performance you see in deep learning models, typically LLMs, and how much data and compute went into training them. It's this sort of roughly logarithmic scaling of LLM performance as a function of training compute; typically that's how it's formulated. And many people are extrapolating from that that there's no limit to how much performance we can get out of these models: all we need is to scale up the compute by a few orders of magnitude, and eventually we get much beyond human-level performance, purely via scaling compute, with no change in architecture, with no change in training paradigm.
Well, the major flaw here is the way you measure performance. In this case, performance is measured via exam-style benchmarks, which are effectively memorization games. You're effectively measuring how good the LLM is at memorizing the answers to the questions that you're going to test it on; not necessarily the exact answers, but maybe the sort of program templates that you need to apply to arrive at the answers. And if you're measuring something that's fundamentally driven by memory, then it makes sense that as you increase the amount of memory in the system, like the number of parameters and the amount of training data (and compute is really just a proxy for that), you see higher performance, because of course if you can memorize more, you're going to do better at your memory game. And my take is that this performance increase you're observing is actually orthogonal to intelligence. You are not really measuring intelligence, because your benchmark can be hacked purely by preparing for it, by memorizing things in advance. If you want to benchmark intelligence, you need a different kind of game: a game that you cannot prepare for, something like ARC, for instance. And I think if you look at performance on ARC over time, or as a function of compute, you don't see this relationship. In fact, the highest-performing models on ARC today did not require tons of compute, and some program-search approaches actually did not require any training-time compute, because they were not trained at all. They do require some inference-time compute, but it's not a very large amount.
So you've said that language models are interpolative databases, and I spoke with Subbarao the other day, and he calls them approximate retrieval systems. Many people say to me, "Tim, this is ridiculous, of course they're not databases, they do extrapolation," but I think as an intuition pump around memorization, that is what they do, and you wrote a Substack blog about this as well. Yes, memorization is what they do. I think the part where people get stuck is that when they hear "memorization", they think the LLMs are just memorizing answers to questions, just memorizing content. And of course they do memorize a lot of content, a lot of knowledge and facts and so on, but that's not primarily what they do. What they're primarily memorizing is functions, programs, and these programs do generalize to some extent; they're capable of a degree of generalization. When you query an LLM, you are basically querying a point in program space. You can think of the LLM as a manifold where each point encodes a program, and of course you can interpolate across this manifold to compose programs, to combine programs via interpolation, which means that you have an infinite number of possible programs to choose from. And what happens with LLMs is that you are training these very rich, very flexible models to predict the next token. If you had infinite memory capacity, what you could do is of course just learn a kind of lookup table. But in practice the LLM only has some billions of parameters, so it cannot just learn a lookup table for every sequence in its training data.
It has to compress. And so what it's actually learning is predictive functions, and they take the form of vector functions, of course, because the LLM is a curve, so the only thing you can encode with a curve is a bunch of vector functions. So you're learning these vector functions that take as input elements of the input sequence and output elements of what comes after. For instance, let's say the LLM comes across the works of Shakespeare for the first time, but it has already learned a model of the English language. Well, now the text it's looking at is slightly different, but it's still the English language, so it is possible to model it by reusing a lot of the functions that came from learning to model English in general. It becomes much easier to model Shakespeare by just learning a sort of style-transfer function that goes from the model you already have to this Shakespeare-sounding text. That's kind of how you end up with things like the ability to do textual style transfer with an LLM: it turns out that it is more compressive to learn style independently from content. And based on the same kind of mechanism, the LLM is going to learn millions of independent predictive functions like this, and it can of course combine them via interpolation, because they're all vector functions. They're not like discrete programs, like you might imagine a Python program, for instance; they're not like that.
They're actually vector functions. When you say "program", I think a lot of people think of a program as being something with conditional logic, and with an LLM that's not what they are. Yeah. It's almost like, in an input-sensitive way, you see this kind of traversal through the model, and it's like a mapping, an input-to-output mapping, and that mapping is continuous and implemented via a curve. But we can describe that as a program? Yes, of course; they're functions. And you said they were compositional? Yes, because these functions are vector functions, you can sum them, for instance, or you can interpolate between them to produce new functions.
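A minimal sketch, purely illustrative and not from the conversation, of what composing vector functions by interpolation can mean: two made-up "programs" are represented as weight matrices, and interpolating the weights yields a third, intermediate function.

```python
# A toy sketch: if "programs" are vector functions (here, linear maps),
# new programs can be produced by interpolating the parameters of known ones.
import numpy as np

W_scale_up = np.eye(3) * 2.0    # pretend program 1: "scale the input up"
W_scale_dn = np.eye(3) * 0.5    # pretend program 2: "scale the input down"

def make_program(W):
    return lambda x: W @ x

x = np.array([1.0, 2.0, 3.0])

alpha = 0.5                     # interpolate halfway between the two programs
W_blend = alpha * W_scale_up + (1 - alpha) * W_scale_dn
blend = make_program(W_blend)

print(make_program(W_scale_up)(x))  # [2.  4.  6. ]
print(make_program(W_scale_dn)(x))  # [0.5 1.  1.5]
print(blend(x))                     # [1.25 2.5  3.75]: a new, in-between function
```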
I love this kaleidoscope hypothesis. Can you dramatically introduce the kaleidoscope hypothesis? Sure. So everyone knows what a kaleidoscope is, right? It's this cardboard tube with a few bits of colored glass in it, and these few bits of original information get mirrored and repeated and transformed to create this tremendous richness of complex patterns; it's beautiful. And the kaleidoscope hypothesis is this idea that the world in general, and any domain in particular, follows the same structure: it appears on the surface to be extremely rich and complex and infinitely novel with every passing moment, but in reality it is made from the repetition and composition of just a few atoms of meaning. A big part of intelligence is the process of mining your experience of the world to identify bits that are repeated, and to extract them, extract these unique atoms of meaning. When we extract them, we call them abstractions, and as we build sort of inner banks of such abstractions, we can reuse them to make sense of novel situations, situations that appear to be extremely unique and novel on the surface but can actually be interpreted by composing together these reusable abstractions. That's the fundamental idea behind intelligence: intelligence is a cognitive mechanism that you use to adapt to novelty, to make sense of situations you've never seen before, and it works by creating models of the new situation on the fly, by combining together existing building blocks, abstract building blocks, which were mined from your past experience. And there are two key tricks here. One trick is the synthesis trick, whereby you take these building blocks and quickly assemble them to form a program, a model that matches the current situation you're facing; that's synthesis. And there's abstraction generation, which is the reverse process, in which you're looking at the information you've got available about the world, your experience, your perception, also the models that you've created to respond to it, and you're going to distill that into reusable abstractions, which you then store in your memory so that you can use them the next time around. So: synthesis and abstraction generation, and together they form intelligence, in my model at least, in my architecture of AGI. So you've been prominent in the AI space for many, many years now. What experiences or insights led you to develop such a clear perspective on intelligence so early in your career?
Right. So if you read some of my old blog posts, or the first edition of my deep learning book, you'll see that I started talking about how deep learning could do system 1 very well but could not do system 2, and I started talking about the need for program synthesis, roughly in 2016. I mean, I started writing about it a lot in 2017, but in practice I started forming these ideas in 2016. There are several things that led me to it. I think one of the big catalyst events was working on automated theorem proving using deep learning with Christian Szegedy. The key idea was that theorem proving is very akin to program synthesis: you're basically doing tree search with operators taken from a DSL, and the idea was to use a deep learning model to guide the search process. I tried to do it for a pretty long time, trying lots of different ideas, and everything I was trying basically failed. I mean, it was doing much better than random, but if you analyzed how it was performing, and how it was producing that ability to perform better than random, it was just doing shallow pattern recognition. It was not doing any kind of system 2 reasoning, and it seemed like a huge obstacle that I was just not able to overcome by tweaking the architecture or the training data or anything else. There was this pattern-recognition shortcut available, and this shortcut would be taken every single time. You could not learn generalizable discrete programs via deep learning.
And that came as a big insight to me, because before that point I was, like everybody else in the field, under the assumption that deep learning models were a very general computing substrate, that you could train deep learning models to perform any kind of computation, that they were Turing-complete. And around the same time, 2015, 2016, there were lots of similar ideas floating around, like the concept of the Neural Turing Machine, for instance. People thought, and I thought, that this was a very promising direction, that deep learning could ultimately replace handwritten software. So I subscribed to these ideas very early on, but then, in these experiments trying to get neural networks to do math, I realized that they were actually fundamentally limited, that they were a recognition engine, and that if you wanted to do system 2 thinking you needed something else: you needed program synthesis. That's when I had this realization and started talking about it. But in general, I've been thinking about intelligence and how to create it for quite a long time. My first sort of AGI architecture is something I developed back in the summer of 2010, and the reason I developed it is because I had already been thinking about it for a few years before that. So I've been in the field for quite a while. Yeah, a quick meditation on the shortcut rule, because I think this gets to the core of it:
that deep learning learns, basically, by projecting into Euclidean space, where the only semantic metric is the Euclidean distance, and so these models learn a spectrum of spurious correlations, perhaps more spurious than not. Sure. So in general, the reason they're doing this is that spurious correlations are always available to explain something: no matter what you're looking at, there's always some element of noise which you can wrongly interpret as being meaningful. And it's also because deep learning models are curves, meaning that they are continuous, differentiable surfaces in a high-dimensional space, and we are fitting the parameters of these curves via stochastic gradient descent. You can represent many things with a curve, but it's a very bad substrate for representing any sort of discrete computation. You can do it, you can embed discrete processing on a curve, but it's just not a very good idea: it's not easy to fit generalizable discrete programs in this format. And this is why you end up with things like the fact that it's tremendously difficult to get a deep neural network to learn how to sort a list, or to learn how to add two sequences of digits, for instance. Even state-of-the-art LLMs have a very hard time doing it. They've been trained on millions of examples of adding digits, but they're still only achieving something like 70% accuracy on new digits. So they've memorized a program to do it, but because this program is a vector function embedded on a curve, it is not a very good program; it is not very accurate. And you see this time and time again with any sort of algorithmic processing.
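A small, self-contained illustration of that failure mode (a made-up toy experiment using scikit-learn for convenience, not the digit-addition evaluations being referred to): fit a network on addition over a narrow range of numbers, then demand exact integer answers inside and outside that range.

```python
# Toy version of the observation: a curve fitted on addition interpolates,
# but it has learned a smooth vector function, not the addition algorithm.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.integers(0, 50, size=(20000, 2)).astype(float)   # pairs (a, b), a, b < 50
y_train = X_train.sum(axis=1)

net = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=3000, random_state=0)
net.fit(X_train, y_train)

def exact_match_rate(lo, hi, n=1000):
    X = rng.integers(lo, hi, size=(n, 2)).astype(float)
    return float(np.mean(np.round(net.predict(X)) == X.sum(axis=1)))

print("exact sums, a, b in [0, 50):   ", exact_match_rate(0, 50))
print("exact sums, a, b in [200, 500):", exact_match_rate(200, 500))
# The second rate is typically far lower than the first: outside the training
# range, the fitted curve no longer lands on the exact integer answers.
```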
And just for those of you at home: a piecewise linear function is still a curve. People might get confused by that, because they think of a curve as being this smooth thing, but if you look at the Wikipedia definition of a curve... You're absolutely right, it's still a curve. You mentioned the Neural Turing Machine, which actually isn't a Turing machine, of course, but it behaves a little bit like one. What do you see as the gap there, with neural networks not being Turing machines? Fundamentally, I think fitting a parametric curve via gradient descent is a good fit for what I call value-centric abstraction, which is the idea that you're going to compare things via a continuous distance function. That leads to the idea that you're going to embed things, and by "things" I mean instances of something: could be images, could be discrete concepts, could be words. You're going to embed them on a manifold, a space where two things that are similar end up close together, and where different dimensions of variation on your manifold are semantically meaningful. You can do this with curves, because their continuity naturally leads you to compare things via a continuous distance. But that's a very bad fit for any kind of type-2 abstraction, what I call program-centric abstraction, where you're actually interested in graphs, and you're not interested in comparing graphs via a distance function. You're interested in whether two graphs are exactly identical to each other, or more precisely, whether one graph appears as a subcomponent of a larger graph. So for instance, as a software engineer, if I'm refactoring some code, if I want to compress my code by expressing multiple functions as just one function, I am not interested in how close the functions feel on the perceptual level. I'm interested in whether they are implementing the exact same program, maybe in different forms, and maybe I need to inject some abstraction in there. And this is a comparison that you have to do in a very explicit, step-by-step way. You cannot just look at two pieces of code and instantly say, without having to think about it, "oh yeah, they look similar."
And how would you describe that capability? It's like a kind of epistemic risk rather than an aleatoric risk, or maybe verification is a better way of describing it. Yeah, step-by-step verification is a good way of describing it. And as I just said, it's definitely not this sort of perceptual, continuous-distance-style comparison, that's true, but I think it can also be guided by perception. Doing this step-by-step exact comparison is very costly: it requires all of your attention, expended over some length of time, so you're not going to want to do it in a brute-force way over many different candidate functions. You want to use your intuition to identify just a small number of options, and those options you're going to try to verify exactly. So I do think we have the ability to do approximate distance comparisons between discrete objects, but the key thing to keep in mind is that these fast comparisons are not exact; they're approximate, so they might be wrong. And I think you get the same type of output from an LLM if you're trying to use it for programming: it will often give you things that feel right but aren't exactly right. In general, I think that's the thing to keep in mind when you're using deep learning, or when you're using LLMs: they're very good at giving you things that are directionally accurate but not actually accurate, so if you want to use them well, you need this post-hoc verification step. So, watching your children grow up, how has it influenced your thinking on intelligence and learning? One thing you notice when you watch children grow up is that constructivism is entirely right: they learn things in a very active manner, they try things out.
And from these experiences, these very deliberate experiences, they extract new skills, which they then reinvest in new goals. In general, you see pretty clearly that learning, learning in general but especially in children, is structured as a series of feedback loops, where the child will notice something interesting, come up with an idea, and set that as a goal. Imagine you're there on the floor, crawling, and you notice something that looks intriguing, so you think, "hey, I'm going to grab it." That's your goal, and now you're entering this feedback loop where you're trying to reach that goal: you're doing something towards it, then you get some feedback, and you're evaluating. You have this plan, action, feedback, back-to-plan loop, and if you reach the goal, then in the process you will have learned something, and you'll be able to reinvest that new skill in your next endeavor. The way children set goals is always grounded in the things they already know about, and you start out not knowing much: when you're born, you're animated by just a few reflexes. But when you start forming these goals, they always come from this layer that you've already mastered, and you're building your own mind layer by layer. At first, for instance, one of your most important sensorimotor affordances is your mouth, because you have the sucking reflex, which is extremely important. It's something you're born with, not something that's acquired, and it's extremely important because it's how you feed.
You also have things like the palmar grasp reflex for grabbing things, but you cannot really use it yet, because you're not in full control of your limbs, so you cannot really grasp things. But when you start being more in control of your limbs, you will want to grasp things, and the first thing you try to do after you grasp a thing is to bring it to your mouth, to suck it, because you set this goal, because it sounded interesting with respect to the things you already know how to do, the things you already find to be interesting. And once you know how to grab things, you're going to add that to your inner world, and you're going to build the next layer on top of those things. So next you're learning to crawl, for instance. Why do you crawl, why are you trying to move forward? Because you saw an object that seemed interesting, that you want to grab. So you are learning to crawl in order to grab something, you are learning to grab in order to put things in your mouth, and you're not learning to put things in your mouth, because that's already hardcoded. So you're constructing yourself in this layer-wise fashion: basically everything you know, everything you think about, is built upon lower-level primitives, which are built upon lower-level primitives, and so on, and ultimately it comes back to these extremely basic sensorimotor affordances that newborn children have.
And I do believe we construct, especially young children, they construct their thoughts based on their sensory experiences of the world. You cannot think in a vacuum; you have to construct thoughts out of something, and that something is extracted from your experience. The younger you are, of course, the more grounded your thoughts are; they relate more directly to the things you're experiencing and doing in the world. As you get older, your thoughts get increasingly abstract, increasingly disconnected from physicality, but they are ultimately still built upon the physical layer. It's just that the tower of layers has gotten so tall that you cannot see the ground anymore; but it's still connected. So children see the kaleidoscope, and the kaleidoscope is created from abstractions in the universe, and then children over time derive abstractions from the kaleidoscope and reason over them? Yeah. What they notice is bits in their experience, or in their own actions, that appear to be reusable, that appear to be useful for making sense of novel situations, and as you go, you're building up these vast libraries of reusable bits, and having access to them makes you really effective at making sense of new situations. And you said "constructivist", which is quite interesting. Do you think children construct different abstractions, or do you think there's a kind of attractor towards representing the abstractions which the universe came up with? I mean, do different people come up with different models? To some extent, probably yes, but because these models are ultimately extracted from the same kind of experiences, and they're extracted via the same kind of process, they will end up being very similar, I would think.
I mean, you do definitely see that different children follow slightly different developmental trajectories, but ultimately they are all somewhat parallel; they are all roughly following the same stages, maybe with different timing. So another interesting thing you've said is that language models have near-zero intelligence, and I just wondered: if it's near zero, which part of it is not zero? Sure, yeah. People think that it's a very provocative statement, because they're using LLMs all the time, they find them very useful, they seem to make sense, they seem very human-like, and here I am saying they have near-zero intelligence; that sounds kind of shocking. But the key is to understand that intelligence is a separate concept from skill, from behavior: you can always be skilled at something without that necessarily requiring intelligence. And intelligence is very specifically your ability to handle novelty, to deal with situations you've not seen before and come up on the fly with models that make sense in the context of that situation. This is something that you see very little of in LLMs: if you ask them to solve problems that are significantly different from anything they've seen in their training data, they will fail. So if you define intelligence in this way, and you come up with a way to benchmark it, like ARC-AGI, for instance, and you try LLMs, all the state-of-the-art LLMs, on it, they don't have zero performance, and this is where the non-zero part of my statement comes from.
That said, it's not entirely clear whether that non-zero performance, that ability to adapt to novel problems, is actual intelligence, or whether it's a flaw of the benchmark. Maybe the benchmark was not actually producing entirely novel problems; maybe there was very significant overlap between this or that question and something that the LLM has seen in its training data. It's very difficult to control for that, because the LLM has just memorized so much: it has seen pretty much the entire internet, plus tons of data annotations that were acquired specifically for that LLM, and we don't know, fundamentally, what's in the training data. So it's kind of difficult to tell, but it does seem to me that LLMs are actually capable of some degree of recombination of what they know, to adapt to something they've genuinely not quite seen before. It's just that the degree of this recombination, their generalization power, is very weak, very low. Yeah, this gets to the core of it, because a lot of people argue that this combinatorial creativity,
or this cone of extrapolation, does constitute novel model building, and I interpreted what you said as: if we zoom out and think of the training process as well, that obviously is model building. Yes, obviously. Gradient descent, fitting a curve to a dataset, is model building. The major flaw there is that it's very inefficient model building: to get a good model, you need a dense sampling of pretty much everything the model is going to have to deal with at test time. So the model is effectively only displaying weak generalization: it can adapt to things it has not seen before, but only if they remain very, very close to things it has actually seen before. And where intelligence comes into play is the ability to adapt to things that are way out of the distribution, because the real world is not a distribution. Every day is new, every day is different, but you have to deal with it anyway.
Critics will say, and I can empathize, I mean I use Claude Sonnet all the time for my coding, I'm paying for about, I don't know, 2,000 requests a month on Cursor, so I'm using it a lot, and it appears clairvoyant in many cases. They would argue, I'm sure, that because it's trained on so much stuff, the convex hull is enough to capture any novelty we might need, so what's the problem? Sure, that's something I hear a lot: this idea that maybe novelty is overrated, that you just need to train on everything, the idea that there can exist a dense sampling of everything you might ever want to do, everything you might ever want to know. I disagree with that, because imagine you were training LLMs ten years ago and you're trying to use them now. They're not going to know about the programming languages that you're using, they're not going to know about all the libraries and so on, and they're certainly going to seem much less intelligent, just because of this gap in their knowledge. The world is changing all the time.
And you could say, well, what if you just retrain the model on freshly scraped data every single day? Sure, you can do that, and it will address some of these problems, but it's still likely that at some point you will come up against genuinely novel problems, problems that don't have a solution on the internet, and that's where you need intelligence. I'm actually quite confident that at some point in the future, maybe in the near future, we'll be able to create a system that can actually address this issue of novelty, that can take what it knows and recombine it in truly original ways to address completely new problems. Once we have a system like this, we can start developing new science, for instance. One of the things you cannot do with LLMs today is develop new science, because the best they can do is spit back at you some interpolation of things they've read online; they're not going to set you on the way to some grand discovery.
Again, playing devil's advocate on that: I agree that the creativity and the reasoning come from the prompter, and that because we anthropomorphize the models, we misattribute the role of the human. But still, inside that addressable space in the LLM, with a human supervisor, I'm sure we can creatively explore the convex hull of what is known, if not create new things. Sure, you can do that, and that's a process that, as you said, has to be driven by you, the human, because you are going to be the judge of what's interesting versus what's nonsense, and without this sort of external verification it's difficult to make good use of LLMs. And sure, I think that's the thing you should always keep in mind when using LLMs: they're very good at making useful suggestions, but you should never blindly trust the suggestions they make, especially if it's something like code. You should always use it as a starting point, but verify, make sure that it's actually correct. They're very good at pointing you in the right direction, but they're not very good at outputting exactly correct answers. And perhaps that's why, if we look at all of the successful implementations or applications of LLMs, they always have a human supervisor in the loop. Yes, or it could also be an external verifier; sometimes the verification process is something that you can delegate to a symbolic system. So now is a great segue to the measure of intelligence. Fans of the show will know Yannic and I have already made about eight hours of content on your Measure of Intelligence paper.
Back in the day we pored through it, and it's fascinating, but could you briefly introduce it now, just to give a refresher? Sure. So my definition of intelligence is skill-acquisition efficiency. It's this idea that intelligence is separate from skill, so if you have a benchmark that just measures the skill of an AI at something, it is not a benchmark of intelligence: it is always possible to score highly without that actually displaying any intelligence whatsoever. If you want to actually measure intelligence, you have to look at how efficiently the system acquires new skills given a limited amount of data. So you have to control, in particular, for the data that the system has access to, which usually takes two forms: priors, the information the system has access to before it looks at your benchmark, and experience, the amount of information the system extracts from the tasks of the benchmark you're giving it. And so if you control for priors, and you control for experience, and you measure skill, then you have a measure of skill-acquisition efficiency, the information efficiency with which the system reaches high performance on a novel task. And that's something I've tried to turn into a concrete benchmark, and that was the ARC dataset.
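As a very loose illustration of that idea (a toy sketch only, not the formalism in the paper, which is stated in algorithmic-information-theoretic terms):

```python
# Toy illustration only: rank systems by the skill they reach per unit of
# information consumed (priors plus experience), not by skill alone.
def skill_acquisition_efficiency(skill, prior_bits, experience_bits):
    """skill: performance reached on the new task, in [0, 1];
    prior_bits: information built into the system before it sees the task;
    experience_bits: information it extracts from the task's examples."""
    return skill / (prior_bits + experience_bits)

# A system reaching 0.9 skill thanks to an enormous memorized prior...
memorizer = skill_acquisition_efficiency(0.9, prior_bits=1e9, experience_bits=1e3)
# ...scores lower on this toy measure than one reaching 0.8 from a small prior
# and a handful of demonstration examples.
frugal = skill_acquisition_efficiency(0.8, prior_bits=1e4, experience_bits=1e2)
print(memorizer < frugal)   # True
```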
Just a quick point on that: one of the potential issues with the measure of intelligence is that it's non-computable, because we can't represent the domain of all possible tasks. Sure. So in the paper I had this formalization of my measure of intelligence, and it is non-computable. Its purpose is not to be used as a practical tool; you're not going to actually want to run this equation on a system and get a number out of it. It is a formalism that's useful for thinking about the problem of intelligence precisely. It's a cognitive device, not a practical device. Of course. There's this wonderful figure, which will show up on the screen now, in which you describe the intelligent system as a thing that produces skill programs while adapting to novelty. But one thing I was wondering is, you're talking about it as a kind of meta-learning prior: do humans come with the meta-learning prior baked in, or is that something we also learn, and should it be the same for AI systems? Yeah, so that's a very important question. Intelligence is not skill, it's a kind of meta-skill: it is the skill through which you acquire new skills.
Is this meta-skill also something that is acquired through experience, or is it something you're born with, that comes hardcoded in your brain, by evolution presumably? I think the answer is that it is both. I think you are born intelligent, so you are born with this skill-acquisition mechanism, but the skill-acquisition mechanism does not operate in a vacuum. It's composed of two bits. There's the synthesis engine, which takes a look at a new situation, a new task, and will try to combine existing parts, existing abstractions, into a model for that task, for that domain. And there's the abstraction engine, which looks at the models that have been produced so far, looks basically at the information you have available, and tries to produce reusable abstractions to be added back to the library that's going to be used by the synthesis engine the next time around. This library, of course, is acquired through experience, and the better your library of abstractions becomes, the more effective you are at synthesis, the more effective you are at acquiring new skills efficiently. So I believe this sort of macro-level architecture of intelligence is something that you are born with, but as you use it throughout your lifetime, you are getting better at it; you are polishing it. So you're not acquiring intelligence as a skill from scratch, but you are polishing it. Another mechanism through which I think you improve is that the synthesis mechanism is probably incorporating learned components, so that synthesis, synthesis from existing abstractions, is itself a skill, and you are getting better at it as you use it. So I think, for instance, a 15-year-old is going to be better at skill acquisition than a 10-year-old. This is really interesting, because in a way you're combining rationalism and nativism with empiricism: I think you're saying that there is the creation of de novo skill programs that are not just compositions of the fundamental ones.
But the broader question as well is, we do this library learning, so children develop, they finesse, they refine, they build these abstractions, and surely there must be some trade-off with complexification, because you don't want the library to be too big, right? Then you can't do search with it anymore. So is there some kind of pruning, or does it converge on a certain size? Is that the reason why our cognitive development seems to plateau at a certain point? That's quite possible. You know, that's actually a very deep question, and it's also very practical, I think, for building an AGI: your AGI is going to have this library of reusable primitives, and do you want to expand the size of this library indefinitely, or do you want to cap it at some number, like at most a million programs in it, or something like that? So clearly our ability to efficiently acquire new skills, our intelligence, does not improve over our lifetime in an unbounded fashion; it seems to peak relatively early on. I think there's actually a trade-off here, which is that your raw power, for instance the amount of information that you can integrate in your mind at any given point, kind of trends down as you age, inevitably. But the quality of the abstractions that you work with, and also your intuition for how to combine them, the learned components of the synthesis engine, they do get polished over time, they do get better over time. So you have this one factor that makes you smarter and this other factor that makes you dumber. Empirically, I think intelligence probably peaks in your early 20s; that's when you're the most efficient at acquiring new skills.
But then again, it depends. I think higher-level cognition peaks probably in your early 20s, but there are things that you should be learning earlier than that. As I mentioned, cognition is built layer by layer, each layer built on top of the previous one, and the lower layers in the stack crystallize, they're set in stone, relatively early, before 15 typically. So if you want to acquire any kind of skill that deals with low-level sensorimotor primitives, if you want to get really good at playing an instrument, really good at singing, or you want to acquire a native accent in some language, you should do it before you're 15, typically. Yes. I mean, on the abstractions, you could argue that it's kind of limited by a computational bound, or you could argue that it's just converging towards universal abstractions, but I wanted to comment on what you just said. Personally, I think knowledge is very important. I've spent years doing this thing with Keith Duggar, who's one of the smartest people I know in the world; he did his PhD at MIT, and he's taught me how to be smart, just through the way he thinks about things. I've reprogrammed my brain, and I'd much rather be like this than go back to my early 20s. Better abstractions. Much better abstractions. But then again, I can give counterexamples. I've spoken with, well, I don't want to mention any names, but sometimes professors lean too much on their knowledge and not on their fluid intelligence,
and they can seem quite entrenched. So too much knowledge and not enough fluid intelligence can be a bad thing as well; there seems to be some kind of optimal balance. Yeah, so it depends on whether you believe you already have the answers to the questions, or whether you believe you have templates that you can use to get the answers. Gaining better templates for problem solving, or even for generic learning, makes you more intelligent; that's one of the points of education. If you learn math, you learn physics, you learn programming, now you have all these meta-level templates for problem solving that make you more effective at problem solving, that even make you more effective at learning. I think at 20 I was much more effective, both in the methods I was using and in my approach to language learning, than I would have been at 12, even though at 12 I had more brain plasticity, I had more memory, it was easier to retain things. But I did not have the right toolset, pretty much, and that toolset is very much required. If you think you already have all the answers, then you're not going to be looking to create anything new, or looking for new information, and maybe that's the pitfall some intellectuals fall into: they think they've got everything figured out, so they don't need to search any further. But instead, if you're just carefully collecting and creating ways to solve problems,
or interesting ideas, and you're not quite sure how you're going to use them yet, but they sound useful, they sound intriguing, then when you're faced with something new, you're going to look into your library for the best thing to connect it to. That's how you get insights: if you keep all these things in mind, and then you come across something new, instead of ignoring it because you already know everything, or think you know everything, you're going to try to connect it with these sort of flagged things in your mind that are waiting for the click, and that's how you get big eureka moments. Yes, the templates become activated. I can give an example, actually, with your Measure of Intelligence paper. I spent weeks studying that paper; I read it so carefully and so deeply, and I remember there were a lot of ideas in it that I struggled with, and now I can just flick through it and I get it. And actually it's the same with many other papers, because you learn these abstractions. On MLST we've always focused on the abstractions, but maybe there's a cost to that, because I'm just following a cognitive path where my brain is lighting up and I understand it, but maybe there's something else I'm missing. Sure. I think by abstracting away the details, you're able to focus on the bigger picture the third or the fourth time you're reading it, and then you kind of find something new at a higher level.
Yeah, you don't get stuck in the details. So at the end of the Measure of Intelligence paper, which was from 2019, right, you introduced the ARC challenge, the Abstraction and Reasoning Corpus. Can you bring that in? Sure. So yeah, it's from 2019. The Abstraction and Reasoning Corpus is a dataset, a benchmark, that tries to capture the measure of intelligence that I outlined in the paper. It's basically an IQ test for machines, but it's also intended to be easy for humans. It's a set of reasoning tasks, and for each task you get a couple of demonstration examples, typically two to four, each of which is the combination of an input image and an output image. The input image is basically a grid of colors; they are pretty small grids, typically from around 5 by 5 up to 30 by 30, and 30 by 30 is the largest. So you're seeing some pattern in this input grid, and then you're told that it maps to a certain output grid with some other pattern, and your job is to figure out what the transformation is, what the program is that goes from input to output. You get a few input-output pairs like this to learn the program on the fly, and then you are given a brand new input grid, and you must show that you've understood the program by producing the corresponding output grid yourself. And it's pretty easy for humans.
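For reference, here is a minimal sketch of what an ARC-style task looks like programmatically; the public ARC-AGI tasks are distributed as JSON with "train" and "test" pairs of grids, but the tiny task and candidate program below are made up for illustration, not taken from the dataset.

```python
# A made-up ARC-style task: the hidden transformation is "swap colors 1 and 2".
task = {
    "train": [
        {"input": [[1, 0], [0, 2]], "output": [[2, 0], [0, 1]]},
        {"input": [[1, 1], [2, 2]], "output": [[2, 2], [1, 1]]},
    ],
    "test": [
        {"input": [[0, 1], [2, 0]], "output": [[0, 2], [1, 0]]},
    ],
}

def candidate_program(grid):
    # A program induced on the fly from the demonstration pairs.
    swap = {1: 2, 2: 1}
    return [[swap.get(cell, cell) for cell in row] for row in grid]

# A solver is scored on whether it reproduces the test output exactly.
assert all(candidate_program(p["input"]) == p["output"] for p in task["train"])
assert candidate_program(task["test"][0]["input"]) == task["test"][0]["output"]
print("task solved")
```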
The dataset is split into different subsets. There's a public training subset, which is generally easier; it's intended to demonstrate the sort of core-knowledge priors that the tasks are built upon. Core knowledge is another important concept here. I mentioned that the grids feature patterns; well, these patterns must be referring to something, to building blocks, and these building blocks are core knowledge, this knowledge prior that all humans are expected to have mastered by roughly age four. So there are going to be things like objectness (what is an object), basic geometry (symmetries, rotations and so on), basic topology (things being connected), agentness as well, goal-directedness, just these very simple core knowledge systems, and everything in the ARC-AGI tasks is built upon these atoms of knowledge. The training subset is just intended to demonstrate what core knowledge looks like, in case you want to apply a machine learning approach and, instead of hard-coding core knowledge, you want to learn it from the data. Then there's a public validation subset, which is intended to be as difficult as the private test set, so it's intended for you to test your solutions and see what score you get. And then there's the private test set, which is what we actually evaluate the competition on, on Kaggle. It's pretty easy for humans: we had the private test set verified by two people, and each one scored 97 to 98%. There are only 100 tasks in the private test set, so it means they actually solved, with no prior exposure, 97 to 98 tasks out of 100, and together they get to 100%: the tasks that each did not solve actually had no overlap. So that shows that if you're a smart human, you should be able to do pretty much every task in the dataset. And it turns out this dataset is tremendously difficult for AI systems.
So I released this in 2019, and the state of the art was actually achieved earlier this morning: it's 46%. Yes, nice one Jack and team. Yes, Mohamed, Jack and Michael, congratulations guys. Yeah, congrats. And, by the way, there's actually an approach that's not public but that has a proof of existence, which should do at least 49%: 49% is what you get if you merely ensemble every entry that was made in the 2020 iteration of the competition. Wow, why has nobody done that, then? Well, it's not exactly apples to apples, because we are talking about hundreds of submissions, each using some slightly different tweak on brute-force program search, and each one consuming some number of hours of compute. So even if you had all the notebooks for all these submissions and you put them into one mega-notebook, it would actually take too long to run in the competition. In a way, by ensembling the submissions, you are scaling up brute-force program search to more compute, and you're getting better results. In the limit, if you had infinite compute, you should be able to solve ARC purely via brute-force program search. It is definitely possible to produce domain-specific languages that describe ARC transformations in a relatively concise manner, in a manner so concise that you would never need more than, say, 40 operations to express a solution program, and you're going to have, I don't know, 200 primitives in your DSL. Finding every possible program that's 40 operations deep out of a DSL of 200: if you had infinite compute, you could definitely do that. Well, there's an interesting discussion point on that, which I think I raised with Ryan and Jack, which is that even if you did have an infinite amount of computation, there's still a selection problem, because you could select based on complexity, for example. Selection is comparatively easy, because let's say you have infinite compute: for each task you get, well, technically, an infinite number of matching programs, but let's say realistically you get something like ten. You can simply pick the simplest one, the shortest one. But is the simplest one a good heuristic? Empirically, yeah, it seems to be: Occam's razor, it seems to work in practice.
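A minimal sketch of brute-force program search over a DSL with a shortest-program-first selection rule; the three-primitive DSL and the toy task are invented for illustration, whereas real ARC DSLs have on the order of the hundreds of primitives mentioned above.

```python
# Toy brute-force program search: enumerate compositions of DSL primitives,
# keep programs consistent with the demonstration pairs, prefer the shortest.
from itertools import product

DSL = {
    "flip_h":    lambda g: [row[::-1] for row in g],
    "flip_v":    lambda g: g[::-1],
    "transpose": lambda g: [list(r) for r in zip(*g)],
}

def run(program, grid):
    for op in program:
        grid = DSL[op](grid)
    return grid

def search(demo_pairs, max_depth=3):
    for depth in range(1, max_depth + 1):        # shallowest programs first
        for program in product(DSL, repeat=depth):
            if all(run(program, inp) == out for inp, out in demo_pairs):
                return program                   # Occam's razor: first hit = shortest
    return None

demos = [([[1, 2], [3, 4]], [[2, 1], [4, 3]]),
         ([[5, 6], [7, 8]], [[6, 5], [8, 7]])]
print(search(demos))   # ('flip_h',)
```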
Because the other potential weakness, I mean, you mentioned Elizabeth Spelke, and folks at home, you should read her; she's a professor of psychology at Harvard and came up with those core knowledge priors. But I think you're coming at this very much from the psychology school of thought, which is that we should understand the psychology of the human mind and build AI around that. Is that fair? Yeah, so I'm a little bit cautious about the idea that AI should try to emulate human cognition. I think we don't really understand enough about the human mind for that understanding to be a useful guide when it comes to creating AI. So I have my own ideas about how intelligence might work, and how to create some software version of it, but it's only partially derived from introspection and looking at people. Interesting. And the reason I said it might be a potential weakness: let's say we select the lowest-complexity program, we have an infinite amount of computation, we do the program synthesis, and then we assume that, because all of the generalization space would be in the kind of compositional closure of the priors we start with, it will work. Yes. But that is an assumption. Sure, but it's a reasonable assumption. You could also train a system to judge whether a given program is likely to generalize or not; it would use length on the DSL as one of its features, but not the only feature. One of the other really important things about the ARC challenge is task diversity, and the reason we need task diversity... I think, if I understand correctly, there are about 900 tasks in the original ARC challenge. Now, you spoke about developer-aware generalization. What is it, and why is it so important?
Right, so developer-aware generalization is the idea that, if generalization is the ability to adapt to things that are different from the things you've experienced before, then it kind of matters what frame of reference you're taking. Are you taking the frame of reference of the agent: does it matter whether this agent is able to adapt to things that it has not, in person, experienced before? Or do you take the frame of reference of the developer of the agent: are you trying to get the agent to adapt to things that the developer of the system could not have anticipated? And I think the correct frame of reference is the frame of the developer, because otherwise what you end up with is the developer building into the system, either via hard-coding or via pre-training, the right kind of models and data, so that the agent is capable of performing very well but without actually demonstrating any kind of generalization, just by leveraging the prior knowledge that is built into it. On the current ARC benchmark, I just wondered if you could comment on its weaknesses, and just to cite a couple of examples:
Melanie Mitchell put a piece out saying that it should be a moving benchmark, and Dileep George put an interesting piece out saying that it might be perceptually entangled in a way that we might not want. So what are your reflections on its potential weaknesses? Sure. I mean, ARC is the first attempt at capturing my measure of intelligence, and it's a pretty crude attempt, because of course I'm technically limited in what I can produce, and it has some pretty strong limitations. I think the first limitation is that it might be falling short of its goals in terms of how much diversity there is in it, and how much novelty. Some tasks in version one of ARC-AGI (because, by the way, there's going to be a version two as well) are actually very close to each other; there's some redundancy. And some of them might also be very close to things that exist online, which might actually be one of the reasons why you see LLMs able to solve some percentage of ARC: maybe they're doing it because they've seen similar things in their training data. So I think that's the main flaw. And yeah, Melanie Mitchell mentioned that a benchmark like this should be a moving benchmark; I actually completely agree. I think ultimately, to measure intelligence, you're going to want not a static dataset but a task-generation process.
You're going to ask it for a new task, and it's going to be capable of giving you something that's genuinely unique, different, handcrafted just for you. It's going to give it to you, and then it might try, for instance, to measure how data-efficient you are in solving the task: it's first going to give you maybe one or two examples and challenge you to figure it out, and if you cannot, then maybe it gives you a couple more, and then a couple more, and so on. The reason why something like this would be interesting is that you can start benchmarking approaches that have very low intelligence, like, for instance, curve fitting via gradient descent. Technically, curve fitting via gradient descent is a kind of program synthesis, so you should be able to apply it to ARC. The main reason you cannot is that for each task you only have a couple of examples, and the space is not interpolative, so curve fitting doesn't really work. But if for each task you had 1,000 examples, for instance, it is conceivable that you could fit a curve that would generalize to novel inputs. Well, if you have this dynamic task-generation and example-generation system, then you can start benchmarking techniques like this, and it will be interesting, because then you can start grading on the same scale fitting a Transformer, say, versus program search: brute-force program search, genetic program search, deep-learning-guided program search and so on. And then you can start seeing, very concretely, what it means to be more intelligent, what it means to be more data-efficient in your ability to produce generalization.
And the other thing that you can start grading, when you have this sort of dynamic benchmark-generation process, is how much generalization power different systems have. So you can measure how data-efficient your synthesis, your model-synthesis process, is, but also how much generalization power the output model has, because you can challenge the test-taker with different inputs that are more or less difficult. You start at the lowest level by demonstrating a task with very few examples and, let's say, very simple test inputs, and as you go further, you add more examples to refine the constraints of the problem, but you also send the test-taker much more difficult examples of the problem, to test how far it can generalize, or how complex the models it can produce can be.
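A rough, runnable sketch of that kind of adaptive evaluation loop; the task family ("multiply every cell by a hidden factor"), the tiny solver and the scoring are hypothetical stand-ins, not an existing system.

```python
# Hypothetical adaptive benchmark: a generator hands out a fresh task, demo
# pairs are revealed one at a time, and we record how many pairs the solver
# needed (data efficiency) before probing it on a held-out input.
import random

def generate_task():
    k = random.randint(2, 5)                     # hidden rule: multiply every cell by k
    def make_pair():
        grid = [[random.randint(1, 9) for _ in range(3)] for _ in range(3)]
        return grid, [[c * k for c in row] for row in grid]
    return make_pair

def induce(pairs):
    # Tiny solver: search for a multiplier consistent with all demos seen so far.
    for k in range(1, 10):
        if all(out == [[c * k for c in row] for row in inp] for inp, out in pairs):
            return lambda g, k=k: [[c * k for c in row] for row in g]
    return None

def evaluate(max_examples=6):
    make_pair = generate_task()
    shown, program = [], None
    for n in range(1, max_examples + 1):
        shown.append(make_pair())                # reveal one more demonstration pair
        program = induce(shown)
        if program is not None:
            break
    held_in, held_out = make_pair()              # a fresh input the solver never saw
    return n, program(held_in) == held_out       # (examples needed, held-out solved?)

print(evaluate())
```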
that is similar to the way things work in in the world so there's a a generative function of the universe it produces the kaleidoscope and we go backwards from the Kaleidoscope to the generative function but knowing this is the thing like we in this intelligence process we need to know what the priors are and the priors must be either fundamental or deducible from the fundamental priors that were there in the first place yes that's right and you know I think the the big Pitfall uh to avoid here is and that's actually the reason why um
I did not release ARC 1 as a generative benchmark. This was, by the way, the first direction I investigated when I was trying to come up with the thing that eventually became ARC. I was thinking that I would create a program synthesis benchmark where the test examples would be created by some kind of master program, and I investigated many different directions — things like cellular automata and so on. For instance, you're given the output of a cellular automaton and you need to reverse-engineer the rules that produced it, that sort of thing. And ultimately I did not go with that, for several reasons. One reason is that I wanted the tasks to be easy and intuitive for humans, and that's actually difficult to achieve this way. I also wanted to avoid formalizing too much of the core knowledge, because any formal formulation of core knowledge might be losing something, might be missing something important that you cannot really put into words but that is there. And also because, and
that's very important if you just write down one master program and let it generate uh your data set then the complexity of the tasks in your data set is fundamentally limited by the complexity of the master program and so as someone trying to solve The Benchmark the only thing I have to do is reverse engineer the master program and then I can use it for instance to generate uh infinitely many tasks that it can fit could fit a curve to uh or I just uh hardcode the system that already understand already understands how this uh
master generative function behaves and can anticipate it, right — so I can hack the benchmark. And that's why ultimately I ended up with this model where every task in ARC 1 is actually handcrafted, by me in this case. And I think, you know, that's touching on something that's subtle but very important, which is that I'm a big believer in the idea that the solution to the problem of intelligence must be co-evolved with the challenge, the benchmark. Like, the benchmark should be a tool that
points um researchers in the right direction that's that is asking the right questions but to ask these questions uh that is in in itself that is a complex problem so I think if you if you were capable of coming up with a master program that generates uh a test of intelligence that is uh Rich enough complex enough novel enough interesting enough to be true test of intelligence uh coming up with that program is as hard as coming up with AI it is in fact the same kind of thing you you basically need AI to create
to create the challenge that AGI is a solution to, right. How explainable should these programs be? I mean, as an example, you could explain to me the reason why you got a coffee this morning or something like that, and I would understand, but AGI presumably would be able to build models for things that we don't understand, like economics or financial markets or something like that — it would be an inscrutable mess, so how could that work? Well, yeah, so AGI would be capable of approaching a new problem, a new task, a new domain, and very quickly and very efficiently, from very little data, coming up with a model of that thing. And that model should be predictive, so it should be able to anticipate the evolution of the system it's looking at in the future. I think it should also be causal, so you should be able to use it to plan towards goals — like, you can imagine: I have this model of the economy, for instance, I want to get it towards this state, here are the interventions I can make that will actually causally lead to the desired state. So it should be a predictive model, a causal model, that you can use to sort of simulate the behavior of the system. And I think that actually makes it inherently interpretable: you don't need to explain how the model works, you can just show it in action. So one example — let's say we're looking at ARC, we're not looking at the economy anymore, we're looking at a task in ARC-AGI. Currently most of the program synthesis approaches are looking for input-to-output transformation programs, and if you're not reading the contents of the program, then one way you can interpret them is just running them on a test input and seeing what you get. I think the kind of model that an actual AGI would produce in this case would not just be input-to-output transformations — it would explain the contents of the task. So there would be programs that you could use, for instance, to produce new instances of the task.
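To make the contrast concrete, here is a toy illustration (my own example, not an actual ARC task): an input-to-output transformation on its own, versus a generative model of the same task that can also run backwards and produce brand-new instances on demand.

```python
import random

def transform(grid):                        # input -> output program, and nothing else
    return [row[::-1] for row in grid]      # e.g. mirror every row

class MirrorTask:                           # a model of the whole task
    def forward(self, grid):                # input -> output
        return [row[::-1] for row in grid]

    def inverse(self, grid):                # output -> input (applicable for this task)
        return [row[::-1] for row in grid]

    def sample_instance(self, size=3):      # produce a brand-new example pair
        inp = [[random.randint(0, 9) for _ in range(size)] for _ in range(size)]
        return inp, self.forward(inp)
```

The second form is interpretable in exactly the sense described: you can simply ask it for new examples and look at them.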
Or even to go from output to input, when applicable, instead of just going from input to output. And such a kind of program is extremely interpretable, because you can just ask it for new examples and then look at them, right. Okay, so I can imagine there might be some kind of mediated interface which does encapsulation — you know, we understand the interface. But maybe we should think about this the other way. So when I've spoken to AI researchers, I've gone through ARC challenges together with them, and they are trying
to look at their introspection so they're saying I'm looking at this problem and I I know it's got something to do with color I know it's got something to do with counting and and then they run the program in their mind and they say one two three no that doesn't work that doesn't work and and then they try and formalize that into some kind of an approach do you think that the way we introspect is a useful way to build a solution for the arc challenge I think so I think introspection is very effective when
it comes to getting some idea of how your mind handles System 2 thinking. I think it's not very effective for System 1, because System 1 is inherently not something you have direct access to — it happens unconsciously, instantly, in parts of your brain that you're not directly observing via your own consciousness. But System 2 is not like that: System 2 is very deliberate, it's very slow, very low bandwidth, there's only, you know, a few things happening at any given time, it's very introspectable. So I
think, you know, what you're describing — this idea that you're looking at a new task, you're trying to describe it via a set of properties in your mind, and then you're coming up with a small number of different hypotheses about what could be some programs that match these descriptive constraints, and then you're trying to execute them in your mind to check that your intuition is correct — I mean, that's canonical System 2 thinking, right. I think that's basically how program synthesis works in the brain. But what's not
mentioned here is all the System 1 parts that are in support of this System 2 thinking. I'm really a big believer in the fact that no cognitive process in the human mind is pure System 1 or pure System 2 — everything is a mix of both. So even when you're doing things that seem to be extremely reasoning-heavy, like solving ARC or doing math or playing chess or something, there's actually a ton of pattern recognition and intuition going on, you're just not noticing it. And it takes the form, for instance, of the fact that you're only looking at maybe two to four different possible hypotheses for your ARC task. In reality, the space of potential programs is immense — there are like hundreds of thousands of possible programs you could be looking at — but no, you're only looking at like two or three, and what's doing this reduction is your intuition, right, or pattern recognition. It is System 1. And I think that the reverse is also true: even when you're looking at cognitive processes that seem to be extremely System 1, like perception, for
instance, there are quite a few System 2 elements. I think perception, for instance, is very, very compositional — it's not pure input-to-output matching the way a deep learning model would do it, there's actually quite a bit of composition that happens, and that is actually System 2. I really agree that there's some strange entanglement between the two systems. I mean, there was one task where color certainly had something to do with it — you can almost visualize it as a SQL query, you know: group by the colors, select counts, order in ascending order, skip one, take three, that kind of thing.
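Rendered as code, that kind of candidate program might look something like this (purely my own illustration of the "group by colour, count, sort, skip one, take three" idea, not a real ARC solution):

```python
from collections import Counter

def select_colours(grid):
    """Group non-background cells by colour, sort by count ascending, skip 1, take 3."""
    counts = Counter(cell for row in grid for cell in row if cell != 0)
    ordered = sorted(counts, key=lambda colour: counts[colour])   # ascending by count
    return ordered[1:4]                                           # skip one, take three
```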
And it's similar to abduction, in the sense that there's this perceptual inference happening to get to this set of hypotheses, and then at some point I'm doing some post hoc verification, which really does seem like System 2, but the whole thing seems to work together in a symphony. Yes, and they are so intermingled that maybe saying that we're looking at System 1 plus System 2, or System 1 versus System 2 — maybe that's the wrong framing. Maybe what we're looking for is actually a different kind of data structure or substrate that underlies cognition, one that is inherently both System 1 and System 2. But yeah, what you're doing in your mind, as you describe it, is basically program synthesis — but that program synthesis is very, very heavily guided by perceptual primitives and just by intuition about what you feel might be the correct solution. So when we implement program synthesis in a computer, I mean, we could just do
a naive, greedy brute-force search, and then we have this combinatorial explosion — tell me about that. Right, the primary obstacle that you run into if you're doing program synthesis — so program synthesis, at a very high level: you have a language, typically it's domain-specific because that's a shortcut, so it's not a language like Python, it's a language a little bit more specialized than that, and you have a bunch of functions in this language, and you use them to create programs. A program is basically just a composition of these functions. In the case of ARC, it's typically going to be a program that takes as input an input grid and produces the corresponding output grid. And the way you do program synthesis is that you try a bunch of compositions of these functions, and for each one, each program, you're going to run it — in practice, run it on the target input, look at the corresponding output, and check whether that output is the output you expected. And you do that across all
the examples that you have available, across all the programs that you can come up with, and then you look at which programs actually match, actually produce the correct output across all the examples. And maybe you have one such program that's a match, maybe you have ten, and then you must make a selection — you must try to guess which one is more likely to generalize, and typically it's going to be the shorter one. But the huge bottleneck that you face is that the size of program space — the number of programs you have to look at — grows combinatorially with the number of building blocks in your DSL, but also with the size of the program. So if you're looking for programs that involve, for instance, 40 different function calls, you're looking at a very, very large space, so you could not possibly iterate over every individual element of that space. That's the combinatorial explosion bottleneck.
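A minimal sketch of this kind of brute-force search, using a toy three-primitive DSL (my own toy primitives, not the real ARC DSL): enumerate compositions up to a depth, keep only programs consistent with every demonstration, and prefer the shortest. The space enumerated is len(DSL) ** depth chains per depth, which is exactly the combinatorial blow-up being described.

```python
from itertools import product

DSL = {
    "mirror":    lambda g: [row[::-1] for row in g],
    "flip":      lambda g: g[::-1],
    "transpose": lambda g: [list(r) for r in zip(*g)],
}

def search(examples, max_depth=3):
    """examples: list of (input_grid, output_grid). Returns the shortest matching program."""
    for depth in range(1, max_depth + 1):
        for names in product(DSL, repeat=depth):           # |DSL| ** depth candidate programs
            def run(grid, names=names):
                for name in names:
                    grid = DSL[name](grid)                  # compose the primitives in order
                return grid
            if all(run(inp) == out for inp, out in examples):
                return names                                # shortest consistent program first
    return None
```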
And humans clearly do not suffer from this problem. You describe this introspection process: when you're looking at an ARC task, you're only executing a very small number of programs step by step, and you're only really executing them to verify that they're actually correct. You apparently rely on an extremely powerful kind of intuition that is not entirely reliable — which is why you still have to perform this verification step; it does not give you the exact right answer — kind of like an LLM. I believe what the LLMs are doing is actually the same kind of cognitive process: it's pattern matching, right, it's intuition. So you
still have to verify, but it's directionally correct. It's doing a really, really good job at sifting through this pretty much infinite space of programs and reducing it to just a few possibilities, and I think that's actually the really hard part in cognition — it's this reduction process. So there are some interesting approaches to ARC: I spoke to Jack Cole and Ryan Greenblatt, and then there's the DreamCoder-type approach. Maybe we should start with DreamCoder, because, you know, Josh Tenenbaum's group at MIT — Kevin Ellis was the author of the DreamCoder paper, and he's actually working with Zenna Tavares building a lab called Basis. I spoke with them the other day and they are very much focused on the ARC challenge, and they're implementing a lot of MIT's work on the ARC challenge, which is really cool. But I guess the elephant in the room is that DreamCoder — and please introduce what that is — is a really elegant, beautiful approach to ARC, but unfortunately it doesn't work very well yet. Right, so it's been
a while since I read the paper, but my recollection of DreamCoder is that it's a program synthesis technique that tries to create a bank of reusable primitives that is actually developing, kind of, as it gets used to solve new tasks. And I think that's a fundamentally right idea, and it's probably the only system in which I've seen this idea in action — this idea of abstraction generation, that you're going to use your experience, your problem-solving experience, to try to abstract away functions that you're going to put in your DSL for reuse later. I also remember it had this wake-sleep cycle. Yes, so I think that was to train — so the synthesis component that they had leveraged deep learning, and they were training the deep learning model via the wake-sleep setting — can you correct me? Yes, so they had a neural network generative model for programs, and then they had a sleep phase where they would retrain the generative model, and something called an abstraction sleep, where they
would kind of combine together programs that worked very well and discard ones that weren't being used very much, that kind of thing. Yeah, that's what I usually call abstraction generation. I see intelligence as having two critical components: synthesis, where you're taking your existing building blocks and assembling them, composing them together, to create a program that matches the situation at hand; and then abstraction generation, where you're looking back on the models you generated — or just the data you got about the world — and you're trying to mine it to extract reusable building blocks that you're sending to your memory, where you can reuse them the next time around. And yeah, DreamCoder was actually trying to implement these two components, which I think is really the right direction, so it's very promising.
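A very loose sketch of the abstraction-generation side of that loop (in the spirit of what's described here, not DreamCoder's actual algorithm): mine solved programs for recurring sub-sequences of primitives and promote them to new named building blocks.

```python
from collections import Counter

def mine_abstractions(solved_programs, min_count=2, length=2):
    """solved_programs: list of primitive-name tuples, e.g. ('mirror', 'flip', 'transpose').
    Returns recurring sub-sequences that could be wrapped as new DSL primitives."""
    counts = Counter()
    for prog in solved_programs:
        for i in range(len(prog) - length + 1):
            counts[prog[i:i + length]] += 1        # count every contiguous chunk of primitives
    return [chunk for chunk, c in counts.items() if c >= min_count]

# Each mined chunk, once added to the DSL as a single named function, shortens
# future searches over program space — the memory side of the synthesis loop.
```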
So what about Jack Cole — what do you think of his solution? That's the MindsAI group on the leaderboard, right. So what they're doing is basically an LLM — it's an encoder-decoder model, I think it's based on the T5 architecture. They are pre-training it on a large code and math dataset, because apparently it helps, which, you know, on its own is an interesting finding. And then they are further fine-tuning it on millions of generated ARC-like tasks — so they're producing, procedurally, lots of tasks that look like ARC tasks, and they're fine-tuning the model on them. When I say fine-tuning: basically, for each task, they're tokenizing the task description, they're reducing it to a sequence
of tokens — so that's actually pretty easy — feeding that into the LLM, and expecting it to produce the output grid in tokenized form, and then decoding that back out. And so just the setup I described, on its own, as it turns out, does not perform very well — it does like a few percent — but they added a really powerful twist, which is that they're doing test-time fine-tuning. So they're taking their pre-trained LLM and, at inference time, on each new task, they're producing a fine-tuned version of the LLM. They're doing that by producing variants of the task, by applying a bunch of randomized hardcoded transformations, basically, and they're turning that into a sort of mini training dataset, they're fine-tuning the LLM on that training dataset, and then they're applying that fine-tuned model to the test input and producing a test output.
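Schematically, the test-time fine-tuning trick as described might look like this (the augmentations and the model's `finetune`/`predict` methods are placeholders standing in for whatever MindsAI actually uses):

```python
import copy
import random

AUGMENTATIONS = [
    lambda g: [row[::-1] for row in g],        # mirror
    lambda g: g[::-1],                          # vertical flip
    lambda g: [list(r) for r in zip(*g)],       # transpose
]

def augment(examples, n_variants=100):
    """Turn a task's few demonstrations into a mini training set of transformed variants."""
    variants = []
    for _ in range(n_variants):
        f = random.choice(AUGMENTATIONS)
        variants += [(f(inp), f(out)) for inp, out in examples]
    return variants

def solve_with_test_time_finetuning(pretrained_model, examples, test_input):
    model = copy.deepcopy(pretrained_model)      # fresh copy of the pre-trained LLM per task
    model.finetune(augment(examples))            # assumed fine-tuning interface (placeholder)
    return model.predict(test_input)             # assumed inference interface (placeholder)
```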
And if you think about it, just this test-time fine-tuning trick is getting their model from a very, very low performance — like a small percentage of tasks solved — to, as you know, over 40%, which is very impressive. So if you zoom out a lot, I think what they're doing is not that different from program search — it's basically at a different point on the spectrum. So you can think of program search as a spectrum with two axes: one axis is the richness and complexity of your DSL, of your bank of reusable building blocks, and the other axis is the richness and complexity of the ways that you recombine these building blocks. And discrete program
search typically is going to operate over a very, very small DSL — a DSL with maybe 100 to 500 primitive functions in it — but it's going to recombine them in very complex ways, to get programs that may have depth 20, for instance. And what Jack Cole is doing is basically turning his LLM into a database of reusable vector functions, and it has millions of them, so it's a very broad, very large DSL in a way; and then test-time fine-tuning is using gradient descent to recombine these primitives into
a new program. And the fact that you have this huge performance jump from not using test-time fine-tuning to using test-time fine-tuning really highlights, empirically, the fact that recombination — program search — is a critical component of intelligence. If you're just doing static inference, you're not doing any sort of recombination, or if you're doing it, it must be some form of in-context learning — so basically using a memorized recombination program. If you're only doing static inference, you basically do not display much intelligence at all. If you're doing recombination via test-time fine-tuning, then you are starting to implement the synthesis component of intelligence that I described. And the problem is that gradient descent is a very weak, very data-inefficient way of doing synthesis — it is, in fact, the wrong paradigm — and so what you get is that the resulting programs have a very shallow depth of recombination. So on the program synthesis spectrum, the MindsAI solution is this point where they're really maxing out
on the richness-of-the-DSL axis, but they're very, very low on the depth-of-recombination axis; whereas discrete program search, as it's usually implemented, is on the complete other side of the spectrum, where you have a very small, very concise DSL but very sophisticated recombination. And intuitively, my guess is that what makes human intelligence special is that it's not at either end of the spectrum — it's somewhere in between. You have access to a very large, very rich bank of abstractions, of ideas and patterns of thought, but you're also capable of combining them on the fly to a very meaningful degree. You're not doing test-time fine-tuning in your brain when you're coming up with novel ideas — you're not doing gradient descent at all — you are doing some form of discrete program search, but you're doing it on top of this very rich bank of primitives, and that enables you to solve any ARC problem pretty much within seconds. I remember reading your Deep Learning with Python book many years ago, and you were talking
about the Perils of fine tuning you have to have the learning rate quite low because you might damage those representations in in the Bas model and when I spoke with Jack he said that um I'm not sure how much of it I should say publicly but he encoded the the fine tuning in a kind of language which would reinforce the existing manifold of of of the model so you know he was kind of like saying I want to use it as a foundation model by transforming the descriptions in a way that that reinforces it and
And also the active inference thing — it's not active inference from a Fristonian point of view, but the test-time inference — that is moving away from what you said earlier, which is that it's not a retrieval system: I'm actually now generating new compositions as part of the inference process. That's correct, it's not just a retrieval system. When you're just doing static inference with an LLM — you're just prompting it, getting back some result — that's pure retrieval, and there's very little recombination happening. Any recombination, if it happens, must go through one of these pre-learned recombination programs. Like, you know, some people say that in-context learning is leveraging some kind of hardcoded gradient descent algorithm that's latent in the LLM — so maybe that's happening — but whatever is happening, clearly, empirically, we can see that it doesn't work very well; it doesn't adapt to novelty to a very meaningful extent. But if you add test-time fine-tuning, then you are actually starting to do real recombination. You're not just reapplying the programs stored in the LLM, you are trying
to modify them, to recombine them into something that's custom to the task at hand — that's the process of intelligence, right. I think directionally this is the right idea; the only issue I have with it is that gradient descent is just a terrible way to do recombination. It is a program synthesis algorithm, of course — it's just the wrong approach. So in which case — I mean, I had this discussion with Jack when I interviewed him — but while I accepted that it's a general method,
of course it's still domain-specific in the sense that you have to come up with a prompting technique in order to fine-tune the language model and so on, but it could in principle be applied to, you know, fairly broad domains of problems. But you would agree, though, that it goes against the spirit of your measure of intelligence? So there are elements of the approach that are not quite in line with the spirit of the competition — I think in particular the idea that he is going to pre-train his LLM on millions of generated ARC-like tasks. This kind of makes me think of an attempt to anticipate what might be in the test dataset, in the private test set — trying to generate as many tasks as possible and hoping for collisions between what you've generated and what's actually going to be in the test set. So that, of course, is trying to hack the benchmark via memorization. It is not what we intended, but, you know, ultimately it is up to us, the
creators of The Benchmark to make sure that it cannot actually be hacked via memorization that it is a resistant memorization if we did a bad job with that because it's actually possible to anticipate what's in the the private set then that's on us so in practice by the way I think we did a decent job because uh that so if if you're not doing test time fine tuning right you're only getting a very low uh accuracy on the test set so it kind of shows that yes the test set is actually decently novel right I
think this is also shown by the fact that the best LLMs right now, if you're just doing direct prompting — the best one is Claude 3.5, it's doing 21% — so it kind of implies that about 80% of the dataset is decently novel, even if you use as your frame of reference pretty much the entirety of the internet. So that's actually a good sign. But I think, you know, in Jack Cole's approach also, the overall approach is in the spirit of what I had in mind, because what it's doing is a form of program synthesis — it's just that it's gathering, via learning, this enormous DSL, and then it's doing very, very shallow recombination, and doing it with gradient descent, which I think is not what you should be doing, but it ends up working, right, so why not. I agree with that — so actually, in spirit, it's the right approach, but it's bottlenecked by stochastic gradient descent on a large language
model but um this is just an interesting segue though so again in your deep learning with python book I think around chapter 4 very pedagogical for folks who want to learn about machine learning you spoke about the leakage problem so you know the reason why we have a training set and we have a validation set and a test set is we don't want information to leak between the the sets and it can happen inadvertently so for example every time someone gets a new score on on the The Arc challenge it's tested on the private set
and that's information, and people then modify their approach, and it's as if they've seen something in the private set when they haven't seen it directly. That's correct — and what they've seen is that this approach that they've tested performs better, so now they've learned something about the contents of the private test set. And yeah, many folks — even folks who are machine learning experts — have this misconception that you can only overfit if you are directly training on something, if you're using it as training data. That's not the case. So for instance, some years ago, people were doing neural architecture search research to find new convnet architectures that would perform well on ImageNet. They all used ImageNet as their reference, and what they were doing is mining this enormous space of possible architectures and selecting the ones that ended up performing well when trained on ImageNet — and what you ended up with was an architecture that was, at the architecture level, overfit to the ImageNet evaluation set.
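A tiny simulation of that effect (my own illustration): even with no training on the held-out set, merely selecting the best of many random models by their held-out score inflates the apparent performance well above chance.

```python
import random

random.seed(0)
labels = [random.randint(0, 1) for _ in range(100)]    # a held-out "test set"

def random_model_score():
    """Score of a model that guesses at random — about 0.5 in expectation."""
    preds = [random.randint(0, 1) for _ in labels]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

single_try = random_model_score()                       # roughly chance level
best_of_many = max(random_model_score() for _ in range(300))
print(single_try, best_of_many)                         # the selected "best" is well above 0.5
```

The selection process itself leaks bits of information from the evaluation set back into your choices, which is the overfitting being described.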
In general, if you have any sort of process that extracts information — even just a few bits of information — from your evaluation dataset and reinjects this information back into your model, even if it's not an automated process, even if it's just you looking at the results and then tweaking the approach by hand, you are starting, gradually, to overfit to what you're testing on. And ultimately this would happen with the private test set of ARC-AGI. It's just that, because
the the only bit of information you get each time you submit something is your total score you're really not extracting many bits of information right um but eventually because uh each participant can make three submissions a day um and there are many participants eventually would start over fitting um which is part of the reason why we're going to release uh a version two of the data set and by the way with version two the data set we're going to uh do something that is pretty important that should have been done earlier probably which is that
we are going to have two private test sets. There's going to be the one that we evaluate on when you submit, and for which you see the score — that's going to be the public leaderboard score — but we're also going to have an extra private one, which we're only going to evaluate your solution on at the end of the competition. So you're going to proceed through the competition only getting this feedback that tells you how well you perform on the first private test set, but at the end we're going to swap that out with the new one, and then you're going to hope that your model will generalize to it. Hope being the operative word. Yeah. I mean, now might be a good time to talk about our friend Ryan Greenblatt from Redwood Research. I interviewed him — he's a very, very smart guy, I enjoyed working with him — and he did a kind of, you know, let's generate loads and loads of candidate programs with an LLM and then validate them, in a kind
of — he didn't want to call it a neurosymbolic framework, which I thought was curious — but what do you think about his approach? Yeah, I think that directionally that's the right approach. You know, we kind of described how, when you are solving an ARC task, you are generating a small number of hypotheses — they are programs — and then you are actually executing them in your mind to verify whether they're correct or not. It's the same kind of process, where you're using a big intuition machine to produce candidate programs, and these candidate programs — you're hoping that they're more or less right, but you're not sure, so you still have to verify them via a System 2 type process, which, you know, in this case is going to be a code interpreter; in your case, you're actually literally going to be executing the programs in your head. I think that's basically, again, the same type of program search approach that we're seeing among the folks doing brute-force program search, or the MindsAI approach — it's just a different point on the program search spectrum, but it's the same kind of thing. And in general, you know, I think the research direction that is the most promising to me is combining deep learning with discrete program search — maybe not quite what Ryan Greenblatt is doing, but the idea that you're going to use a deep learning model to guide program search so that it has to look at fewer candidate programs or sub-programs — that is absolutely the right idea.
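Schematically, that "intuition proposes, System 2 verifies" loop might look like this (`llm_propose_programs` is a stand-in for whatever sampling pipeline is used, and candidates are assumed to define a `solve_task` function — both are my assumptions, not Greenblatt's actual code):

```python
def solve(examples, test_input, llm_propose_programs, n_candidates=64):
    """examples: list of (input_grid, output_grid) demonstration pairs."""
    for source in llm_propose_programs(examples, n=n_candidates):
        namespace = {}
        try:
            exec(source, namespace)                          # candidate source defines solve_task(grid)
            candidate = namespace["solve_task"]
            if all(candidate(inp) == out for inp, out in examples):
                return candidate(test_input)                 # only trust candidates verified on all demos
        except Exception:
            continue                                         # malformed candidates are simply discarded
    return None
```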
So I'm not surprised that this is getting good results, and I do expect you're going to keep seeing even better results from variants of this approach. One thing I would change is that instead of generating end-to-end Python programs and then just having a binary check — is it correct or not — I think it might be more interesting, it might be a better use of the LLM, to generate modifiable graphs built on top of an ARC-specific DSL, and then, instead of just checking whether the program is correct or not, you might want to do local discrete search around your candidate programs — basically use your candidate programs as seed points, as starting points, for discrete search, to reduce the amount of work that the discrete program search process has to do.
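A sketch of that seeded local search idea (hypothetical: programs here are chains of DSL primitive names, and `fitness` could be, say, the fraction of demonstration pairs a program reproduces):

```python
import random

def neighbours(program, dsl, n=20):
    """Generate nearby variants of a program by swapping one primitive at a time."""
    out = []
    for _ in range(n):
        p = list(program)
        i = random.randrange(len(p))
        p[i] = random.choice(list(dsl))        # mutate one step of the chain
        out.append(tuple(p))
    return out

def local_search(seed_programs, dsl, fitness, steps=50):
    """Start from LLM-proposed seeds and hill-climb in their neighbourhood."""
    best = max(seed_programs, key=fitness)
    for _ in range(steps):
        candidate = max(neighbours(best, dsl), key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best
```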
And, you know, in general — I keep repeating this — you should use LLMs as a way to get you in the right direction, but you should never trust them to land in the exact right spot. You should assume that where you land is probably close to the solution, but is not exactly the solution — you're still going to have some amount of manual work to do to go from the points, like for instance the candidate programs that the LLM produced, to the actual solution, and that work has to be done by a System 2 type process. Yeah, I discussed this with him, and he is still of the mind that they are doing emergent reasoning, and that given enough scale the divergence between aleatoric risk and epistemic risk will tend towards zero — which of course we don't agree with — but I agree with
you that wouldn't it be interesting if it's quite stateless the system at the moment wouldn't it be interesting if there was some kind of program library and maybe retrieval augmented generation into the library he does have some interesting properties to the solution which maybe you might want to comment on he's using um Vision he's doing some inter prompting he's using self reflection he's got like a candidate evaluation methodology what do you think about the overall thing sure um I think it's promising and uh yeah you know I I think we're going to we're going to
keep seeing variants of this that are going to perform well, and this is the reason why we introduced the public track in the challenge. You know, we kept hearing from folks saying, hey, I'm sure GPT-4 can do this — we were like, well, maybe, let's try it. And of course you cannot enter the private competition with GPT-4o, because it would involve sending the private task data to the OpenAI servers, so it would no longer be private — so that's not possible. So what we did is that we introduced an alternative test set, which we call semi-private. It's private in the sense that we're not publishing it, but it's also not quite private, because it is being sent to OpenAI servers and/or Anthropic servers and so on. And we did this because we want people like Ryan Greenblatt to show up and come up with some sophisticated chain-of-thought pipeline and prove us wrong, if possible. And just before we leave this bit, are you aware of any other
interesting approaches which perhaps aren't in the public domain but you know about? So I am aware of various people making claims about their solutions to ARC, but I'm not aware of specific details — they tend to be very secretive people — and ultimately I only trust what I see. We have two tracks: we have the private track on Kaggle, with a lot of money on the line, and we have the public track, where you can use any state-of-the-art model you want. If you have something, you should
submit it to one to one of the two tracks if it's self-contained then just go for the money uh if if it uses an l then use the public track but if it's not on the leader board I'm probably not going to be I'm I'm not going to believe you are the organizers worried that if someone did reach human level performance that it would be worth more than a million dollars if they sold it somewhere else um sure maybe I I doubt that's what's going to happen though but maybe interesting and and also um just
on the economics of it this is quite an open source approach but what do you think the incentives are because if if I already had a really good solution if I was Jack Cole I mean I would it's worth me spending six months on it because there's a good chance I might win if I have nothing then maybe I'll just have a quick look and see if there's anything but I won't invest much time um versus start up a lab and put the money into that and just hire good people to work on it so
of course there's a big money prize, but, you know, we don't expect that people are going to show up and solve ARC because they want the money specifically — the amount of money is not high enough for that to happen. Instead, the money that we're putting on the line is just a signal to indicate that this challenge matters, that we're serious about it and we think it's important. But ultimately, the real value that there is in submitting a solution and winning is, I would say, reputational value. It's like, you become the first person to crack this open challenge that's been open since 2019, and presumably your solution is a big step forward towards AGI. A lot of people are talking about ARC right now — if you were to solve it, you would definitely make headlines, it would be a big deal. So for instance, you mentioned starting a lab — well, it would be a great opportunity to start a lab around your solution and then raise a bunch of money, and
you could do that just on the momentum generated by your winning entry. Could you comment on — you know, I had Subbarao Kambhampati on recently, and he's got this LLM-Modulo architecture, which is really interesting. Basically you have this neurosymbolic setup — an LLM generating ideas, and critics — what do you think about that general idea? Yeah, I think that's generally the right approach. Like, you should not blindly trust the output of an LLM; instead you should use it as an intuitive suggestion engine. It will give you good candidates, but you should never just blindly believe that these candidates are exactly the correct solution that you're looking for — you should verify. And this is why LLM-Modulo with some external verifier is so powerful: because you are cutting through the combinatorial explosion problem that would come with iteratively trying every possible solution, but you're also not limited by the fact that LLMs are terrible at System 2, because you still have this last-mile verification, and that's going to be done by a true System 2 solution.
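A rough sketch of that kind of LLM-Modulo-style loop (`llm` and `verify` are placeholder callables, not Kambhampati's actual framework code):

```python
def llm_modulo(task, llm, verify, max_rounds=10):
    """llm(task, feedback) -> candidate; verify(task, candidate) -> (ok, critique)."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = llm(task, feedback)          # intuitive suggestion engine proposes
        ok, critique = verify(task, candidate)   # external, domain-specific verifier checks
        if ok:
            return candidate                     # last-mile verification passed
        feedback = critique                      # critique flows back into the next prompt
    return None
```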
The architecture was really interesting because it was bidirectional as well — so the outputs, you know, the verifiers might give you yes, no, maybe, or some additional information, and then the LLM could be fine-tuned and so on. But my read on it, though, is that it brutalizes it a little bit, because the verifiers of course are very domain-specific, and that seems to be slightly different to some of the solutions to the ARC challenge. Yeah, it will tend to be domain-specific, and also it's not always the case that you're operating
in a domain where there can be an external verifier. Sometimes there can be — I think in particular this is true for program synthesis from input-output pairs, so it is true for ARC, in fact, because you know what output you have to expect given a certain input, and what you're producing is programs, so they can actually be executed, they can be verified. For many other problems you have no such guarantees. So, moving on a tiny bit: agency. Yes. Now, I think of agency
as being defined as a virtual partition of a system that has self-causation and intentionality allowing for the control of the future and I assume that it's a necessary condition for intelligence and I know you don't because we spoke about this the other day but what do you think is the relationship between agency and intelligence right so you know many people kind of treat uh agency embodiment intelligence as almost interchangeable Concepts um I like to separate them out uh in in my own model of the Mind um and the way I see it intelligence is a
tool that is used by an agent to complish goals um but it is it is related too but it is separate from uh your sensor motor space for instance um or uh your ability to set goals and I think you can even separate it that from your world model so I don't know if you're an RTS player Maybe yes as in Command and Conquer Warcraft right Warcraft Warcraft exactly uh so all these games are are RTS games and in RTS game well uh you have you have you know units moving around and you can give
them commands um and you have a mini map as well so imagine that you're selecting a unit and you're right clicking somewhere on the mini map to tell the the UN need to go there well um you can think of the mini map as being a a world model like it's a simplify representation of the actual world of the game that captures uh key elements of structure um like where things are typically and where you are and when you're right clicking the mini Maps you are specifying a goal and well in this in this metaphor
intelligence is going to be the pathfinding algorithm. It's taking in this world model, taking in this goal — which are externally provided — and figuring out what is the correct sequence of actions for the agent to reach the goal. Intelligence is about navigating future situation space; it's about pathfinding in future situation space. And in this metaphor you can see that intelligence is a tool — it is not the agent. The agent is made of many things, including a goal-setting mechanism — you know, in this metaphor it's played by you, you are setting the goal — it's made of a world model, which enables the agent to represent what the goal means and maybe simulate planning; it's also going to include a sensorimotor space, like an action space, that can receive sensory feedback as well. But the agent is the combination of all these things, and they're all separate from intelligence.
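In code, the metaphor might look like this (purely my own illustration): the minimap is the world model, the clicked cell is the goal, and the "intelligence" is just the pathfinding routine that turns them into a sequence of moves.

```python
from collections import deque

def find_path(minimap, start, goal):
    """minimap: 2D list, 0 = walkable, 1 = blocked. Returns a list of cells or None."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        r, c = path[-1]
        if (r, c) == goal:
            return path                                     # the plan: a sequence of actions
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(minimap) and 0 <= nc < len(minimap[0])
                    and minimap[nr][nc] == 0 and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(path + [(nr, nc)])
    return None
```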
Intelligence is basically just a way to take in information and turn it into an actionable model, something that you can use for planning — it's a way to convert information about the world into a model that can navigate possible evolutions of the world. I agree with everything you've just said. I think the tension is, after speaking with people like Karl Friston — you know, when we think about the physics of intelligence, and this epic particle system we live in, with functional dynamics and behavior and so on — the agency and the intelligence, it's not explicit, the world model isn't
explicit so there seems to be something else going on which is why in many cases I think of agency and intelligence as being virtual properties rather than explicit physical properties that's not to say that we couldn't build an AI where everything is explicit because that would be useful we could we could build it in computers but there's always the tension of whether we think of the world as this complex simulation of low-level particles and nested agents I have cells which are agents and my heart is an agent and and I'm an agent or whether it's
explicit. All right, well, I think in the first AGI that we're going to build, these different components are going to be explicitly separated out in software, because that's simply the easiest way to get there — at least that's my take on it. The architecture is going to be explicit, yes. So you actually spoke about functional dynamics the other day, which was music to my ears, obviously being a fan of the Fristonian worldview — what's your take on that? So, to be honest with you, this is actually
uh something I've been thinking about but I do not have very crisp ideas about it yet but it is my general intuition as to how the human mind performs program synthesis so I think um there are there are two scales two levels at which the mine um changes itself there's the the long-term scale which which has to do with uh abstraction mining like abstraction generation and memory formation it's um uh it has to do with neuroplasticity as well you are basically changing Connections in your brain uh to store reusable programs your your your formalism of
intelligence focuses a lot on internal representation so this idea of in our minds we we have a we have a world model and so on and when I read some of your blog posts from from years ago you're talking a lot about um this externalist tradition which is that a lot of cognition happens out outside of of the brain how do you reconcile those two World Views right um well I'm I'm a big believer that most of our cognition is external as you say like uh when when we're talking to each other for instance we're
using uh words that we did not invent we're using uh mental images ideas that we just read about somewhere and so on um and if we had to develop all these things on our own you know we would need extremely long lives uh to start being intellectually productive so um I don't think there's really any any contradiction between the two views like the idea that sure like humans uh as individuals are intelligent uh you possess intelligence I possess intelligence uh we can use it sort of like in isolation on our own uh uh and and
we can extract, from our environment, from our lived experiences, reusable bits which we can make use of to make sense of novel situations — that's the process of intelligence, and we run it as individuals. But also, we are able to communicate. We're not just individuals, we're also a society, so these ideas, these reusable abstractions — we can extract them from our brains, we can put them out there in the world, share them with others; we can write books, for instance, we
can type up computer programs that can be uh not even just executed by other brains but even by computers right um and um this process is just the creation of culture and then once culture is out there you can download it into your brain and that's education and as you're doing it uh you're sort of like artificially uh filling up your bank of reusable abstractions uh and it's a use shortcut you know uh it's it's almost like downloading skills like in The Matrix uh it's it's a little bit of that like uh learning about physics
learning about math uh You are downloading these um very very rich uh reusable mental templates like really mental building blocks and then you you can in your own brain you can Rec combine them uh you can reapply them and new problems it makes you uh more intelligent like literally more intelligent it makes you uh more efficient at skill acquisition more efficient at at problem solving and so on yeah beautifully articulated I mean there's a couple of great books I've read on this The Language game and um also Max Bennett's book on intelligence basically talking about
this um the plasticity of mtic information sharing um you know allowing us to stand on the shoulders of of giants I I think there's a there's a an interesting uh angle to the question you asked I know I don't know if if you were aware of it but what I've described there is this idea that humans are the source of uh abstraction uh human individual human brains use their lived experience to extract abstractions and then they're uh externalizing them via language typically not not exclusively but most of the time uh and then other brains can
download these abstractions and kind of make them their own which is a huge shortcut because you you don't have to experience everything on your own uh to start leveraging abstractions um but in this model abstraction generation and abstraction recombination to from new models is always happening inside brains right the only part that's externalized is the memory that you're uh uh moving the abstractions the reable building box out of these individual brains uh putting them in in books and so on uh and then and then downloading them back but to be useful they need to be
internalized in your brain uh a question then is could uh abstraction generation or recombination actually happen outside as well uh not necessarily in the context of creating an AGI because you know that's exactly what what an AI would be it would be uh this uh recombination and abstraction process this synthesis and abstraction process uh encoded in software form but do we have today like external processes that that that implement this well I think we sort of do I think science in particular uh is doing a form of synthesis uh that is that is driven by
humans but uh it is not happening inside human brains like we have the ability to uh do recombinative search over uh spaces that actually cannot fit inside human brains I think you see it uh in a lot of the things that we invent like uh when when you create a better computer for instance uh you are doing some kind of freom search over a space of possible devices but you are not really able to hold a a full model of the device inside your own brain instead the model is distributed uh across some some number
of externalized artifacts um and I do believe that human civilization is implementing this highly distributed um synthesis part of the of the process of intelligence it is implemented it externally across many different brains manipulating externalized symbols and artifacts and this is what's underpinning a lot of our civilization because the systems we we've been creating we've been inventing are uh so complex that no one can really understand them in full so you cannot run uh this uh this invention process inside brains anymore instead you are using brains to uh drive a much bigger externalized process so
I think cogn is externalized not just in the sense that uh we have the we have the power to uh uh write down and then uh read uh ideas abstractions and then reuse them inside our brains we're actually running uh intelligence outside our brains as well I completely agree and you've written about this about how intelligence is collective situated um and and externalized yes but there's always the question of of yeah many of you know like science for example is is is a a kind of collective intelligence which supervenes on us and and languages as
well but do things like mimesis happen outside of um biology I mean certainly it happens in in the world you know the selfish Gene it happens with with genetics but you could argue that a kind of mimesis actually happens just in any open physical system with certain patterns of functional Dynamics and so on so um you know the real question I think with this externalized cognition is where do the abstractions come from perhaps our brains are just very efficient at building the map from the territory and it's it's just a slightly better way of doing
what already happens naturally externally yeah um I think to a large extent uh the way we've externalized cognition is uh not as efficient as the way we've implemented cognition in our in our own brains um these externalized ctive processes they you know so intelligence is is a kind of search process right over a space of possible ACC combinations of a thing um I think right now the search process is to a large extent externalized when you're looking at technology when you're looking at science um but it's not externalized in a very smart way I think
we're roughly implementing brute-force search. I see it a lot, especially in deep learning research: the way the deep learning community as a whole is finding new things is by trying everything and eventually hitting the thing that works, you know. And I believe individual humans would actually do much better, if they had enough brain power to actually model these things in their own brains — they would be much more effective at finding the right solution. Interesting. I mean, Ryan Greenblatt's view was emblematic of some of the existential-risk folks, in that he was arguing that he could be in a hermetically sealed chamber, or be a brain in a vat, and as a pure intelligence he would still be able to reason and solve tasks and so on; and the counter-view is that physicality and embodiment is really important. I mean, when I asked Murray Shanahan this — I said, what's the reason why we need to have physically embodied robots — he said, well, these robots are interacting with the real world, they're understanding the intricate causal relationships between things, and that helps them build models more
efficiently but perhaps in service of just learning about the abstractions which already exist in the physical world yes to exercise intelligence uh it needs to be operating on something like you think out of something uh about something like you need to have some concrete environment and goals in that environment that you want to accomplish and actions that you can take so it's about something it cannot be about nothing but it's also uh made of something you are uh making your plans to reach your goals based out of existing components existing uh sub routines uh if
you have nothing at all uh you not only you have nothing to be intelligent to about but your uh intelligence has nothing to recombine right um and that's why embodiment is important I mean in in humans you know I mentioned this idea that cognition is built layer by layer each new layer which is a little bit more abstract and the the one before it it is built um in terms of the components that came before and if you uh dig uh deep enough if you unfold your mind layer by layer at the very bottom uh
you will find things like the sucking reflex, for instance. Everything starts with your mouth, and then you start having things like grabbing objects to put them in your mouth, and then things like crawling on the floor so that you can reach objects, so you can grab them and put them in your mouth, and so on. And at some point you stop putting objects in your mouth, but the new things you're learning are still expressed in terms of this same concept and skill hierarchy. And when you end up doing abstract math, well, you are using building blocks that eventually resolve to these extremely primitive sensorimotor subroutines. So yeah, embodiment is important, but at the same time, I think the kind of body and sensorimotor affordance space that you have is very much plug-and-play. If you have a true AGI, you could basically plug any environment, any sensorimotor space, any DSL as
well uh into it um and it would start being intelligent about it you know uh so in that sense like embodiment is important uh but what kind of embodiment might not might not necessarily be important um and you know uh another thing that's really important is goal setting by the way which is distinct from embodiment is also distinct from intelligence if you're just a brain in jar uh uh with with nothing to think about well uh you're not going to be very intelligent but also you're not really going to be doing anything because you have
nothing to do you have no goal uh uh to drive your thoughts um and I think this is especially true if you if you're looking at at children the way you learn anything is by setting goals and accomplishing them you cannot really build uh good mental models uh uh good good World models uh passively purely by you know uh observing was going on around you uh with no goals of your own that's not how it works uh goal setting is a critical component of any any intelligent agent I completely agree I think the only unresolved
tension in my mind is that there are many manifestations of intelligence and it is possible for us to build an abstract explicit version which would run on computers essentially it doesn't necessarily need to mimic the type of intelligence we have in the real world yeah I think so and I think it will probably have uh at least in its first few iterations it will probably have significant architectural similarity with the way intelligence is implemented in in people but um you ultimately you know it it might it might Drift Away towards towards entirely new types of
intelligence now you've said that language is the operating system of the Mind what did you mean by that right so what's an operating system right it's not the thing as a computer um it is something that makes your computer more usable uh and more useful it empowers uh Computing for some user um well it it empowers some user uh to to to best leverage uh the capabilities of their computer I think language plays a similar role for the mind I think language is distinct from the mind like it's it's a separate thing from from intelligence
for instance or even from a world model but it is a tool um that you as an agent uh is leveraging to make your mind to make your thinking more useful right so I believe language and thinking are separate things language is is is a tool for thinking and what do you use it for well I think one way is that you can use language to uh make your thoughts uh intros spectable your your thoughts are there they're like programs in your brain which you can uh execute to get their output um but you cannot
really look at them uh by writing them down uh in in in words I don't mean like literally writing them down but just expressing them as words uh suddenly you can start uh reflecting on them you you can start looking at them you can start comparing them and a critical you can start indexing them as well I believe one of the rules of language is to enable you to uh do indexing and retrieval over your own ideas and memories if you did not have language uh then to retrieve memories you would have to rely on
external stimuli. Like, you know, Proust is eating a madeleine and it's reminding him of a specific time and place — and if Proust did not have language, then every time he needs to think about that particular time and place, he would have to eat the madeleine; this would be his only access point to that memory, this external stimulus. If he has language, then he can use language to query his own world model and retrieve the memories that he wants,
So it's a way to express what you want to retrieve inside your own mind. It's also a way to compose together more complex thoughts. If you cannot reflect on thoughts, if you cannot materialize them, look at them, and modify them in your mind, then I think you are also quite limited in the complexity of the thoughts you can formulate. This is a very simple programming analogy, by the way: if you have a computer, you can use it to write programs without an operating system; you can just write assembly code, why not? But you are severely limited in terms of the complexity of the software you can produce. If you have an operating system, and you have high-level programming languages and so on, these are tools that you can use as a programmer to develop much more complex software. Your intelligence as a programmer, your ability to program, has not changed; it's just that your tools have gotten better, and as a result you are much more capable than you were before. So I think intelligence is using language as a similar kind of tool.
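A rough sketch of that tooling analogy (purely illustrative, not something from the conversation; the task and function names are invented): the same small computation written once with only low-level constructs and once with higher-level ones. The programmer is the same; only the tools differ.

```python
# Illustrative only: the same task written in a "low-level" style
# (manual indexing, explicit loop) and a "high-level" style (built-in
# abstractions). The programmer's ability hasn't changed; the tooling has.

def mean_of_squares_low_level(values):
    # Assembly-like mindset: manage every step by hand.
    total = 0.0
    i = 0
    while i < len(values):
        total += values[i] * values[i]
        i += 1
    return total / len(values) if values else 0.0

def mean_of_squares_high_level(values):
    # Same computation, expressed with higher-level tools.
    return sum(v * v for v in values) / len(values) if values else 0.0

assert mean_of_squares_low_level([1, 2, 3]) == mean_of_squares_high_level([1, 2, 3])
```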
Yeah, we have this information architecture of mediated abstractions, almost like concentric circles of complexity. In The Language Game they talk about how scissors are a physical tool and language is the memetic equivalent of scissors, and of course we can compose these tools together and use them in different circumstances. But moving to consciousness a tiny bit: you've suggested that consciousness emerges gradually in children. How does this inform your views on machine consciousness?

Right. To start with, I am not that interested in the idea of machine consciousness; I'm specifically interested in intelligence and related aspects of cognition. I think consciousness is a separate problem. Clearly it has some relationship with intelligence; you see it, for instance, in the fact that any time you use System 2 thinking, you are aware of what you're doing, consciousness is involved. So clearly there is a relationship between consciousness and System 2.
The nature of that relationship is not entirely clear to me, and I don't pretend that I understand consciousness well; honestly, I don't believe anyone does. So I'm always very suspicious when I hear people who have very detailed, precise, and categorical ideas about consciousness. That said, I do believe it's plausible that machine consciousness is possible in principle. I also believe that we don't have anything that resembles machine consciousness today, and we're probably pretty far from it. For a system to be conscious, at the very least it would need to be much more sophisticated than the sort of input-to-output mapping you see in deep learning models, in LLMs. At the very least you would expect the system to have some kind of persistent state that gets influenced by external stimuli but is not fully set by them, that has some consistency and continuity through time, that can influence its own future states, that is not purely reactive. I think consciousness stands in opposition to purely reactive systems, like deep learning models, or maybe insects, and I don't think we have any system that looks like this today. I also think consciousness requires the ability to introspect: this self-consistent state of the system, maintained across time, should have some way to represent and influence itself; it should be self-driving, in a way. We don't have anything like that today, but in principle, maybe it's possible to build it.
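To make that contrast concrete, here is a toy sketch (an editorial illustration under loose assumptions, not a model anyone in the conversation proposes): a purely reactive input-to-output mapping next to a system that keeps a persistent inner state, influenced by stimuli but also by its own previous state, and able to report on that state. All names and numbers are invented.

```python
# Toy contrast: a purely reactive mapping vs. a system with a persistent
# inner state that partly drives its own future states. Illustrative only;
# not a claim about what consciousness is.

def reactive_system(stimulus: float) -> float:
    # Output depends only on the current input: a fixed input-to-output map.
    return 2.0 * stimulus

class StatefulSystem:
    def __init__(self) -> None:
        self.state = 0.0  # persists across time steps

    def step(self, stimulus: float) -> float:
        # The new state is influenced by the stimulus but not fully set by it:
        # the previous state carries through, so the system partly influences
        # its own future states.
        self.state = 0.9 * self.state + 0.1 * stimulus
        return self.state

    def introspect(self) -> float:
        # A crude stand-in for "representing its own state".
        return self.state

system = StatefulSystem()
for s in (1.0, 1.0, 0.0, 0.0):
    system.step(s)

# Same current stimulus (0.0), different outputs: the stateful system
# still carries a trace of its past.
print(reactive_system(0.0), system.introspect())
```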
So, the thing I mentioned on Twitter, this idea that babies are not born conscious, which apparently is extremely controversial, maybe I can say a little more about that. First of all, we have no real way of assessing with 100% certainty whether anyone is conscious at any stage of development; it's basically a guess. It seems to me that babies in the womb are very unlikely to be conscious, because they're basically fully asleep all the time. They're in one of two possible sleep states about 95% of the time: deep sleep, where they're just inert, and active sleep, where they're moving around and the mother can feel them move. When they're moving around they're not actually awake; it's just active sleep. The remaining 5% is not wakefulness, it's just transitions between deep sleep and active sleep. And the reason they're sleeping all the time is that they're being sedated: the womb is a very low oxygen pressure environment, which sedates them, and the placenta and the baby itself are also producing anesthetic compounds. The placenta actually produces anesthetics, and that keeps the babies in this dreamless sleep pretty much the whole time. Which doesn't mean, by the way, that their brain is not learning. Their brain is not just disconnected and doing nothing; they are actually learning, but in a very passive way, just computing statistics about what's going on in the environment, which is what brains do whether you're awake or asleep. But yes, I believe babies in the womb are not conscious, and when they're born they start at consciousness level zero, pretty much. Then, as they start being awake and start experiencing the world, consciousness starts to light up. But it is not an instant switch where they go from being unconscious to being fully conscious; it happens gradually.
So you start at zero. And by the way, you have to start at zero even after you wake up, because when you're born you have nothing to be conscious of. Pretty much everything, not just actions but even perception, is something you have to learn through experience. When you're born you cannot even really see, because you have not learned to see; you have not trained your visual cortex. You can see maybe blobs of light. You do not have a model of yourself, of your own sensorimotor affordances. You have maybe a very crude proto-model that you developed by moving around in the womb and having your brain map correlations in your own space, but it's not a sophisticated model of anything. So you have nothing to be conscious of: you have no world model, no model of yourself, no real incoming perceptual stream, because you have not learned to take control of your senses just yet. You start at zero, and then, as you build up these models, your world model, your model of yourself, and so on, you gradually, bit by bit, become more conscious. At some point you reach a level where you can be said to be fully conscious, the way a dog might be fully conscious, and I think that happens pretty fast, probably significantly earlier than the first clear external signs of consciousness. At around one month old, babies are probably conscious at the same level as most mammals. But that's still not adult-level consciousness. I think adult-level consciousness is something children only start experiencing around age two to three. That doesn't mean they were not conscious the whole time; again, they're conscious pretty much starting on day one, just to a very small degree. So consciousness is something you have to build up over time; at least, that's my theory.
And there are some indications that this is not entirely made up. One example: if you try to measure the attentional blink in young children, you will see that up until about age three they have a significantly slower attentional blink than adults, and they parse the events around them into fewer events, so they have a more coarse-grained resolution of time and of the world. I think that speaks to this idea of levels of consciousness. I also have the probably controversial idea that you reach adult-level consciousness around age two to three, but you don't stop there; you actually keep getting more and more conscious over time, and your consciousness level probably peaks around age nine to ten. Then it goes into reverse: you get less and less conscious with every passing year, though not to a very significant extent, so the difference in degree of consciousness between, say, a 90-year-old, a ten-year-old, and a three-year-old is actually very minor. But it is still there, and I think it plays into things like our subjective perception of time. The more conscious you are, the higher your level of consciousness, the slower your perception of time, because your perception of time is highly dependent on how many things you can notice in any given span of time.
One way you could conceptualize your degree of consciousness is to imagine consciousness as a kind of nexus in your world model: a focus point from which a bunch of connections span out to other things, connections that anchor this focus point and give it meaning. There can be fewer or more of these connections, and they can be more or less deep, and the more of them you have and the deeper they are, the more conscious you are. There's also a temporal component: if you're highly conscious, then even in one second you might be noticing many things and drawing many connections between those things. That's a higher level of consciousness. On the other hand, if you're noticing very few things, if you have a very coarse-grained perception of reality as it evolves, then you have a faster perception of time; things just pass in a blink. That's a lower level of consciousness. If you drink a lot of booze, you have reduced consciousness, and things will actually seem to move faster: you notice fewer things, and the depth of the connections you establish between things is lower. If you're a one-year-old toddler, you have a much slower attentional blink, so your perception of time is likely very fast. We have this idea that children perceive time as passing slower; I think that's true, but it really depends on age. If you're one, time is super fast, because again you're at a lower level of consciousness. If you're three, it's basically adult level. But if you're ten it's actually pretty slow, or if you're seven it's slow as well. It gets slower and slower until it peaks around age nine or ten, and then it starts getting faster again, because you're less and less conscious over time.
I remember being very bored when I was a child, and I've not felt bored in as long as I can remember. I interviewed Professor Mark Solms recently; he has a great book called The Hidden Spring, and his basic idea is that consciousness is prediction error. You're very conscious when you first learn how to drive, and the more things become automated, the less conscious we are, so maybe time goes faster in many ways as we grow up. This idea of being more or less conscious is really interesting; as you say, it's like a dial. But on the machine sentience thing: you came on the show to talk about the Chinese room argument, and you said understanding is a virtual property of the functional dynamics of the system. Presumably you would also argue that consciousness is a virtual property of the functional dynamics of the system?

I think so. I think it is not strongly tied to substrate, so in principle you should be able to implement consciousness using the right functional dynamics in silicon, yes, theoretically. I don't think we have it, or that we're close to having it, but in principle I don't see a problem with that.

And we'll leave the hard problem of consciousness to one side, although Mark Solms was quite dismissive about the hard problem of consciousness, which is that there is something it is like to be conscious.

Well, I think there is. Some people dismiss the problem of consciousness by saying something like consciousness is just what it feels like to be an information-processing system, or things like that. That really means nothing; it's just pushing the problem back to where you can better control it with words, but it's not reducing the problem. There is clearly such a thing as qualia, and you are experiencing them right now, so you cannot deny that they exist, and we have no way to explain or even describe what they are. You can describe many things about consciousness, but the subjective experience is not reducible to those descriptions. There is something, and we don't know what it is.

And you think we have it, and animals have it?
Yes, animals have it. I mean, not all animals, and again, I believe in this idea of degrees of consciousness, and animals probably have it to a lesser extent than we do. It might not be a huge difference, by the way, but it's probably less.

Do you think the Earth could be conscious to some degree?

No, I don't think so. I think non-animal systems typically lack the basic prerequisites that I would want to see in a system to even start entertaining the notion that it might be conscious: for instance, the ability to maintain this self-influencing, self-consistent inner state across time, a state that's influenced by perception but that is also capable of driving itself, influencing its own future state, and capable of representing itself, introspecting, and so on. I don't think you see that in non-biological systems today.

Do you think the collective of all Americans could be seen as a conscious being?

No.

Why not?

Again, because it lacks these basic prerequisites.

So it needs a physical form of connectedness to its surroundings? There couldn't be a virtual version distributed over many agents?

No, you could definitely imagine a distributed version; it's just that I'm not seeing the collective of all Americans implementing this self-influencing, self-consistent state that's capable of representing itself and the world, and so on. And even then, even if you have these things in a software system, for instance, it's not automatically conscious; it just starts being plausible that it might be conscious, if you also see pretty clear signs that it might be.
So what might be such a sign?

Well, it's difficult, and I don't think you're ever going to see a proof of consciousness that works 100% of the time; I think it's always a bit of a guess. But typically, I think it's highly likely that a system is conscious if it has all these prerequisites and it is capable of expressing statements about its own inner state that cannot be purely a product of repeating something the system has heard. If you ask an LLM how it feels, it will answer something, but it's really just rehashing something it has read. What I would want to see is the system making statements about how it feels, where there seems to be a strong correlation between the behavior of the system and what it is telling me, and where what it is telling me is unlike anything the system has seen elsewhere before. Like, I'm holding my two-year-old and trying to console them because they're crying, and I say, hey, you shouldn't cry, stop crying, and they say, but I want to cry, that's how I feel. Well, there's a pretty strong correlation between what the child is doing and what they're saying about themselves, so you can believe them, and they've never heard anyone saying "I want to cry"; they're really expressing something they could not have picked up from anywhere else. So in that situation it is just highly plausible, not proof of anything, but highly plausible, that they do in fact have some awareness of their own mental states, that they're expressing something about them, and that they are actually conscious and experiencing qualia.
So, François, you've been very critical of singularitarianism.

I think these are good stories: stories about the end of the world, this idea that we are living in the end times and maybe that we have a role to play in it. These are good stories, which is why you find them a lot in fiction, in science fiction for instance, and you find them a lot in religion as well. They're not new; they've been around for thousands of years. So I think that's the primary driving force: they're good as memes, they're good stories, people want to believe them, and they're also very easy to retain and propagate. That's really the main thing. Everyone is craving meaning to organize their lives around, which is why cults are still a problem in this day and age, and I think this is just an instance of that.

Do you think there's a bit of a messiah complex as well?

Absolutely, yeah. You see it a lot in the San Francisco Bay Area. There are people who have latched onto this idea of building AGI while using it to picture themselves as messiahs, as you say. Personally, I see creating AGI as a scientific problem, not a religious quest.
This often gets merged with the idea of eternal life, by the way, which is of course very natural, because the story in most religions is always about this combination of... anyway. It's merging with this idea of eternal life: the idea that if you create AGI, it will pretty much make you live forever. So it's a very religious idea, and it has become this religious quest to get there first, where whoever gets there first will become as gods. I'm not subscribing to any of that. I think building AGI is a scientific problem, and once you build AGI, it's basically just going to be a very useful and valuable tool. It is going to be, as I mentioned, a pathfinding algorithm in future situation space: a piece of software that takes in information about a problem and is capable of very efficiently synthesizing a model of that problem, which you can use to make decisions about the problem. So it's a valuable tool, but it does not turn you into a god. Certainly you can use it in scientific research, and maybe you can use it in longevity research, but it does not automatically make you immortal, because it is not omnipotent.
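To ground the "pathfinding in future situation space" metaphor, here is a minimal breadth-first search over a toy graph of situations. This is only an editorial sketch of what pathfinding over a state space looks like; the graph, node names, and function are invented, and it is not a claim about how an actual AGI would be built.

```python
from collections import deque

# Minimal breadth-first search over a toy "situation space": nodes are
# situations, edges are actions that move between them. Illustrative only.

situation_graph = {
    "start": ["gather_data", "ask_expert"],
    "gather_data": ["build_model"],
    "ask_expert": ["build_model"],
    "build_model": ["goal"],
    "goal": [],
}

def find_path(graph, start, goal):
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

print(find_path(situation_graph, "start", "goal"))
# ['start', 'gather_data', 'build_model', 'goal']
```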
I think if you start having very powerful ways to turn information into actionable models, your bottleneck quickly starts becoming the information that you have. For instance, if you have an AGI that can do physics, it can quickly synthesize new physics theories. The thing is, human scientists today are already very, very good at that. In fact, they're too good: their ability to synthesize plausible new theories far exceeds our ability to collect the experimental data needed to validate them. That's what you see with string theory, for instance, and it's a pretty stark illustration of the fact that if you're too smart, you start running free of information, and that stops being very useful. Applied intelligence is grounded in experimental data, and if you are very intelligent, then experimental data becomes the bottleneck. So it's not like you're going to see a runaway intelligence explosion.

Is there anything that would make you change your mind? I mean, again, I had this discussion with Greenblatt, and I try to avoid having x-risk discussions when I'm actually debating, and a lot of it hinges on agency.
So I said: because I don't think these systems are agential, or will be, I don't see the problem, because a lot of the mythos around this, the Bostromian ideas around instrumental convergence and orthogonality, is all about goals, all agency-based. So: no agency, no problem. Presumably you agree, but if there were agency, would you think there was a problem?

Yeah. I think intelligence is separate from agency, which is separate from goal setting. If you just have intelligence in isolation, then again, you have a way to turn information into actionable models, but it is not self-directed; it is not able to set its own goals or anything like that. Goal setting has to be an add-on, an external component you plug into it. Now, you could imagine: what if you combine this AGI with an autonomous goal-setting system, with a value system, turn all of that into an agent, and then give it access to the nuclear codes, for instance? Is that dangerous? Well, yes, but you've engineered that danger in a very deliberate fashion. And I think once we have AGI, we'll have plenty of time to anticipate this kind of potential risk. So I do believe AGI will be a powerful technology; that is exactly what makes it valuable and useful. Anything powerful is also potentially risky, but we are very much going to be the ones in control, because AGI on its own cannot set goals until you actually create an autonomous goal-setting mechanism. And why would you do that?
So the difficult part, the dangerous part, is not the intelligence bit; it's the goal-setting and action-space bits. And if you want to create something very dangerous that sets its own goals and takes action in the real world, you do not actually need very high intelligence to do so; you can already do that with very crude techniques.

The thing is, existential risk is a legitimate form of inquiry, especially nuclear risk for example, and I know many of these folks are not solely focused on AI existential risk; they're looking at other risks as well. But how do you view the incentives? You could be really cynical and say, oh, effective altruism and Open Philanthropy are throwing lots of money at this, and what they actually want is power and control. How do you think about that?

Well, there's definitely a bit of that. I also think a lot of the true believers just buy into it because they want to believe it; it's again very parallel to religious ideas in many ways, so I don't think it's very rational. That said, once we have AGI, because today we don't, and I don't think we're particularly close to it, then we can start thinking about the risks involved. I don't think you're going to see the day where you start training the program and it becomes self-aware and takes control of your lab; I don't think you're going to see anything like that.
Again, AI is just a piece of software that can turn data into models; it's up to you to use it in a certain way.

An abstract way to think about this is to frame it as safetyism and governance in general. So if we take away the hyperbolic x-risk and we talk about misinformation and things like that...

Sure.

What do you think about that? Maybe I should be more specific: deep fakes, misinformation, infringement of copyright, and so on. Do you think we should strongly regulate this, or would it harm innovation if we did?

I think there are definitely harms that can be caused by current technology, by current and near-term uses of AI, and yes, I think some form of regulation might be useful to protect the public against some of these harms. I also think the regulation proposals I've seen so far are not very satisfactory; they lean more towards harming innovation than protecting the public. Ultimately, I think they're more likely to end up concentrating power in the AI space than actually protecting the public. So I think regulating AI is difficult, and just relying on existing, non-AI regulation to protect people might be the better course of action, given that introducing new AI-specific regulation is a difficult problem, and, based on what I've seen so far, I don't think we're going to do a very good job of it.

François, it's been an honor and a pleasure. Thank you so much.

It's my pleasure, thanks so much for having me.

Amazing.