Consider the following scenario: you are at a party when you hear a short snippet of your favorite song. Almost instantly, your brain recalls the lyrics of that song, along with many related memories, such as attending a recent concert featuring that artist. It seems very natural and unimpressive; after all, people recall information all the time. If you think about it, however, this problem is computationally non-trivial. Let's put ourselves in the shoes of evolution and try to come up with an algorithm for the brain to solve it. The first approach that comes to mind is to store some kind of database of all the songs you have heard a sufficient number of times, along with related information like the title and lyrics. When an audio fragment of an unknown song is received, we can scan through all the songs in our database, find the one that is a close enough match, and retrieve its lyrics. However, the search space of every song you have ever heard is astronomically large, and it is even larger when you consider every single memory you have formed since childhood. Performing an exhaustive search would be simply impossible, yet you seem to have no problem instantly recognizing familiar stimuli and finding associations between them. So how does the brain accomplish this so quickly? In this video we will lay the foundation for a new paradigm of information storage and retrieval which is more in line with biology, and actually build one of the simpler models of this process, known as the Hopfield network. It was developed by John Hopfield in 1982 and laid important groundwork for many ideas in both neuroscience and machine learning. If you're interested, stay tuned.
Just to reiterate: we need a way to somehow query what we know and find associations between existing memories and new inputs without explicitly checking individual entries for a match, which seems like an impossible problem. However, we can draw insights from a seemingly unrelated field, molecular biology, and in particular from a concept known as Levinthal's paradox of protein folding. As you may know, proteins are long chains of amino acids that fold into specific three-dimensional structures, which determine their function. The number of possible structural configurations a protein can take, considering all the different ways you can arrange the atoms of an amino acid chain in three-dimensional space, is absolutely enormous. Given the number of possibilities, it seems like it would take an astronomical amount of time for a protein to search through all the possible structures to find its correct folded state. In fact, there are computations showing that even if the protein sampled its different conformations on a nanosecond timescale, it would still require more time than the age of the universe to arrive at the correct configuration. Yet in reality, proteins fold into their native structures in a matter of milliseconds. So how do they accomplish this? When I first heard this paradox in high school, it seemed to me like an ill-posed question. After all, a protein molecule is not a computer, so it doesn't do any sort of search; it just falls into the most stable and favorable configuration according to physical laws. This is similar to how, when you throw a ball, the ball doesn't search through all possible trajectories to select the optimal parabolic one; it simply follows that path because, well, physics works that way. But how can we think about this falling into a "favorable" configuration? Favorable for what, exactly? Let's introduce the concept of energy, as it will come in handy in future videos as well.
If you think back to your high school physics days, you may recall something along the lines of: energy is a quantitative property that describes the state of a system, namely its capacity to do work or cause change. Energy can be stored in a variety of different forms, and in the case of proteins we will be interested in the potential energy stored in the interactions between the atoms of the protein chain. Each possible configuration of the protein chain has a specific potential energy level, determined by the sum of all of these atomic interactions. In other words, we can assign a number to each state, equal to its energy. This is a function over some very high-dimensional space, where the different dimensions correspond to the degrees of freedom you need to uniquely describe a configuration, for example all the possible dihedral angles of the peptide bonds. Let's abstractly visualize it as having just two dimensions. Then the energy function can be thought of as a surface, where each point represents a possible protein configuration and the height of that point represents the potential energy of that configuration. This is what we are going to refer to as the energy landscape. For a protein, it would be a complex, rugged surface with many peaks and valleys. Now, here is the key point: a protein molecule, like any physical system, tends to minimize its potential energy. Guided by the second law of thermodynamics, it will naturally seek out the configuration that has the lowest possible energy level, as this represents the optimal arrangement of its atoms, and this in fact corresponds to the native, correctly folded state. When a protein is folding, it is essentially rolling downhill on the energy landscape, following the steepest path towards a valley. This is why proteins can fold so quickly: they don't need to search through all possible configurations; they simply follow the natural tendency of physical systems to minimize their potential energy. The protein's folding process is guided by the shape of the energy landscape, which in turn is determined by the interactions between its atoms, and the descent along the surface is driven by the underlying physical process of energy minimization.
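To make the picture of rolling downhill concrete, here is a tiny sketch (my own illustration, not something from the video) of steepest descent on a made-up two-dimensional energy surface with two valleys. The surface, step size, and starting points are arbitrary choices; the point is simply that which valley you settle into depends only on where you start.

```python
import numpy as np

# A made-up energy surface with two valleys (minima near x = -2 and x = +2):
# E(x, y) = (x^2 - 4)^2 / 8 + y^2
def energy(p):
    x, y = p
    return (x**2 - 4) ** 2 / 8 + y**2

def gradient(p):
    x, y = p
    return np.array([x * (x**2 - 4) / 2, 2 * y])

def descend(p0, step=0.05, n_steps=500):
    """Roll downhill: repeatedly take a small step against the gradient."""
    p = np.array(p0, dtype=float)
    for _ in range(n_steps):
        p -= step * gradient(p)
    return p

# Two nearby starting points settle into different valleys.
for start in ([-0.5, 1.0], [0.5, 1.0]):
    p = descend(start)
    print(start, "->", np.round(p, 3), "energy:", round(energy(p), 3))
```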
Now, the core idea is to achieve something similar for the case of associative memory. Suppose we have a system that can encode information in its states, and each configuration has a specific potential energy determined by the interactions between those states. Then, first, we need to somehow sculpt the underlying energy landscape so that the memories, the state patterns we want to store, correspond to local minima, wells in the energy surface. Second, we need something that plays the role of the second law of thermodynamics and drives the changes in the states, directing the system towards the nearest local minimum. Once these two things are achieved, retrieving the memory most similar to an input pattern is done by configuring the system to encode the input pattern initially and letting it run to equilibrium, descending into an energy well, from which we can read out the stored memory. Sounds neat, right? So let's get into building it.

Let's consider a set of neurons, which we can think of as abstract units that can be in one of two possible states: plus one or minus one. This is a simplified analogy of how nerve cells in the brain encode information through patterns of firing: they either generate an electrical impulse at a given point in time or remain silent. We'll focus on a fully connected network, where each neuron has connections to every other neuron. These connections have weights associated with them, real numbers that signify the strength of coupling between the corresponding pair of neurons. For a pair of units i and j, we denote the connection weight between them as w_ij, and the states of the neurons themselves as x_i and x_j. In the brain, connections between neurons, or synapses, have a well-defined direction: a pair of neurons is connected asymmetrically, meaning that the synapse from neuron A to neuron B is physiologically separate from the synapse that connects B to A, if that one exists at all, and so they can have different weights. While we could generalize a Hopfield network to account for asymmetric connections, it would introduce complications and potentially unstable behavior. For simplicity, here we will stick to the original formulation of the Hopfield network, which assumes symmetric weights; in other words, neurons i and j are connected by the same weight in both directions.
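As a minimal sketch of this setup (my own illustration, not code from the video), here is one way to represent the states and the symmetric weights with NumPy; the network size and the random weight values are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16  # number of neurons (arbitrary choice for illustration)

# States: a vector of +1 / -1 values, one per neuron.
x = rng.choice([-1, 1], size=N)

# Weights: a symmetric matrix with zero diagonal (no self-connections);
# W[i, j] == W[j, i] is the coupling strength between neurons i and j.
W = rng.normal(size=(N, N))
W = (W + W.T) / 2          # enforce symmetry
np.fill_diagonal(W, 0.0)   # no neuron is connected to itself
```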
Now that we have a set of neurons symmetrically linked with each other through weighted connections, let's explore what these connection weights represent. If w_ij is greater than zero, the connection is said to be excitatory, and it favors alignment between the states of the two neurons. We can think of each connection as being either happy or unhappy, depending on the states of its neurons. For example, if w_ij is a large positive number, it means that neurons i and j are closely coupled and one excites the other. In this case, when one neuron is active, the other tends to be active as well, and when one is silent, the other is more likely to be silent. Configurations where x_i and x_j are both plus one or both minus one agree with the connection weight. However, if we observe, for example, that x_i is equal to +1 while x_j is equal to -1, it conflicts with the excitatory nature of the connection, making such a configuration less likely. Conversely, when w_ij is negative, the connection promotes misalignment between the states. This alignment between the signs can be expressed more concisely using the product x_i * x_j, which is positive when both neurons have the same sign and negative when they have different signs. By multiplying this product further by the connection weight, we obtain an expression for the happiness of that connection: for a positive w_ij, the happiness is positive when the product of the two states is positive. But this is just one edge. We can extend this idea and compute the happiness of the entire network as a whole by summing this quantity across all edges. The larger that number is, the more overall agreement there is between the connection weights and the pairwise states of the neurons. Ultimately, we will search for a set of weights that maximizes this quantity, and maximizing happiness is equivalent to minimizing it with a minus sign, which you can think of as a measure of the overall conflict between the actual configuration of states and what is favored by the connection weights. This total conflict between the weights and the pairwise states is exactly what we are going to define to be the energy of the system.
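In symbols, the energy described here is E(x) = −Σ over all pairs i < j of w_ij · x_i · x_j, i.e. the negative total happiness summed over all edges. A small sketch of this (my own, with names matching the notation above; the two-neuron example weights are arbitrary):

```python
import numpy as np

def energy(W, x):
    """Hopfield energy: negative total 'happiness' summed over all edges.

    E(x) = - sum over pairs i < j of W[i, j] * x[i] * x[j].
    For a symmetric W with zero diagonal this equals -0.5 * x^T W x.
    """
    return -0.5 * (x @ W @ x)

# Tiny usage example: two neurons joined by one excitatory connection.
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(energy(W, np.array([1, 1])))    # -1.0: aligned states, low energy
print(energy(W, np.array([1, -1])))   #  1.0: conflicting states, high energy
```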
As we discussed previously, we want the Hopfield network to be able to gradually evolve towards energy minima, but looking closely at the formula, we can see that the energy value depends on both the states and the weights, so there are a lot of things the system could tweak to change it. What exactly is getting adjusted? As we will see, there are essentially two modes of network updates, and they nicely map onto the two aspects of associative memory. Adjusting the weights corresponds to shaping the energy landscape, defining which configurations are stable by digging energy wells around them; this is the act of learning, when we are writing new memories into the network. Once the weights are fixed, tweaking the states of the neurons to bring them into greater agreement with the weights corresponds to descending along the energy surface; this is the act of inference, when we are recalling the memory that sits at the bottom of the energy well nearest to the configuration of the input pattern.

Let's take a look at inference first. Suppose for a second that someone has already set the weights and handed us the backbone of the network: the neurons themselves with all the connection weights. However, the exact configuration of states, which neurons are active and which are silent, is unknown. The question then becomes: how do we find the state pattern that would minimize the total energy? As we discussed, simply checking all possible states is not an option, so we will start with some initial state, which could be a partial or noisy version of one of the memories, or a random configuration altogether. Once the initial condition is set, we will iteratively try to lower the energy value by updating one neuron at a time.
Let's denote the neuron we are currently considering as neuron i. We will calculate the total weighted input to it from all the other neurons in the network. This input, which we'll denote as h_i, is the sum of the states of all other neurons multiplied by their respective connection weights. If h_i is positive, it means that the weighted sum of the other neurons' states is in favor of neuron i being in the +1 state; conversely, if h_i is negative, it suggests that neuron i should be in the -1 state to minimize the conflict with the other neurons. So we will update the state of neuron i based on the sign of h_i. Notice that this update can never increase the energy of the network, because out of the two candidate states we are selecting the more energetically favorable one. You can think of this as a kind of voting process: each neuron looks at the states of all other neurons, weighted by the strength of their connections, and decides whether to be active or silent based on the majority vote. We'll go through this process for each neuron in the network one by one, chosen in random order, updating their states based on the input from all other neurons. Once we've updated all neurons, we will have completed one iteration of network inference and nudged the system's energy down a little bit. We keep repeating this process, doing these sweeps through all the neurons, updating them one at a time based on the current configuration. As we do this, the network gradually evolves towards a configuration that minimizes the overall energy. At some point, however, we reach a configuration where flipping any neuron would lead to an increase in energy, so no further adjustments are necessary. At that point the network has converged to a stable configuration, in which every neuron's state agrees with the majority vote. This stable configuration represents a local minimum in the energy landscape.
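The procedure just described can be sketched in a few lines (my own illustration of the update rule above, not code from the video). It assumes a symmetric weight matrix W with zero diagonal and a ±1 state vector, as in the earlier sketches:

```python
import numpy as np

def recall(W, x_init, max_sweeps=100, rng=None):
    """Asynchronous Hopfield inference: update one neuron at a time by the
    sign of its weighted input until no neuron wants to flip anymore."""
    rng = rng or np.random.default_rng()
    x = x_init.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(x)):      # sweep neurons in random order
            h_i = W[i] @ x                     # total weighted input to neuron i
            new_state = 1 if h_i >= 0 else -1  # majority vote (ties -> +1)
            if new_state != x[i]:
                x[i] = new_state
                changed = True
        if not changed:                        # stable: a local energy minimum
            break
    return x
```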
Now, you might be wondering: is the network guaranteed to reach such a stable configuration? Could we possibly stumble into a particularly unlucky set of states and get stuck in a never-ending loop of flipping neurons back and forth? In other words, is this iterative flipping of one neuron at a time really equivalent to descending along the energy surface? This is where we come back to the point about symmetric weights. It turns out there is a mathematical proof, which I'm not going to cover here, stating that as long as the weights are symmetric, this simple majority-vote, single-neuron update rule is guaranteed to eventually converge to a stable configuration if you run it for long enough. To restate it: the Hopfield network can settle into different local minima depending on its initial conditions. These local minima in the energy landscape correspond to the distinct memories stored in the network. When we initialize the network with a pattern that is similar to one of these memories in some way and let it evolve, it will fall into the nearest local minimum, effectively recalling the complete stored memory, thus performing pattern completion or noise correction.
But so far we haven't talked about how we come up with the set of connection weights that encode specific memories in the first place, so let's explore the learning process. Before we move on to storing several memories, let's consider memorizing a single pattern of states. That means the network will have a single global minimum, one energy well, and will converge to the same pattern every time, no matter where you initialize it. While this has little practical use, it provides a nice starting point for describing the learning procedure. Let's denote the template pattern we'd like to store as ξ, a vector packing the states of all neurons; ξ_i will denote its i-th component, the state of the i-th neuron encoding the memory, while x_i refers to the state of the i-th neuron in the network in general, which can be tweaked. Revisiting our definition of energy, we want to set the weights w_ij so that this quantity is at its minimal value for the memory pattern. If we plug in x_i = ξ_i, we get the energy of the reference pattern as a function of the weights, which we want to turn into a global minimum. Notice that we don't really care about the absolute value of that energy, as long as the energy of the desired memory pattern is lower than the energy of any other configuration. Now, intuitively, the lowest possible energy is obtained when all the connection weights fully align with the pairs of states, and with just a single pattern this is very easy to do: all we need is to set the weight w_ij to be the product of the corresponding pair of states in the memory pattern, ξ_i * ξ_j. This way every connection is satisfied, and the energy of the network when it is in the state ξ becomes the negative of the total number of edges. When the network is in the state ξ, any single flip of a neuron would increase the energy, which makes it a stable state. I want to reiterate the crucial point here: if we want to come up with a set of weights that digs an energy well around some pattern, then all we need to know are the pairwise relationships between the states in that pattern. If two neurons are active together in the source memory, strengthening the connection between them lowers the Hopfield energy of that memory, effectively storing it in the weights for associative recall. You may have heard the famous statement from neuroscience attributed to Donald Hebb: neurons that fire together wire together. In fact, what we just did is known as the Hebbian learning rule.
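Here is a minimal sketch of this single-pattern Hebbian rule (my own, not from the video), along with a quick check that the stored pattern is a stable state and that its energy equals minus the number of edges; the network size is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20                                   # network size (arbitrary)
xi = rng.choice([-1, 1], size=N)         # the single pattern to store

# Hebbian rule: w_ij = xi_i * xi_j, with no self-connections.
W = np.outer(xi, xi).astype(float)
np.fill_diagonal(W, 0.0)

# Stability check: with x = xi, every neuron already agrees with the sign
# of its weighted input, so no majority-vote update would flip it.
h = W @ xi                               # h_i = (N - 1) * xi_i here
assert np.all(np.sign(h) == xi)

# Energy of the stored pattern equals minus the number of edges, N*(N-1)/2.
print(-0.5 * (xi @ W @ xi))              # -190.0 for N = 20
```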
Great, so we found a way to make a single pattern a stable state of the network. But we want to store multiple patterns; how do we do that? Here is the key idea: we can simply sum the weights we would get for each pattern separately. So if we have three patterns ξ¹, ξ², and ξ³, we set the weights according to w_ij = ξ¹_i ξ¹_j + ξ²_i ξ²_j + ξ³_i ξ³_j. What this does is turn each of the patterns into a local minimum. It is pretty straightforward to show mathematically, and if you're interested, I encourage you to check out the references in the video description. Intuitively, though, if the patterns you want to store are very different, so that they are far away from each other in state space, then if you first independently dig an energy well around each of them and then simply add the energy landscapes together, the resulting surface will have local minima in those same three valleys.
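Putting learning and inference together, here is a compact, self-contained sketch (my own, not from the video; the sizes and noise level are arbitrary) that stores several random patterns with the summed Hebbian rule and then recovers one of them from a corrupted probe using the asynchronous update rule from earlier, restated here so the snippet runs on its own:

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 100, 5                                  # 100 neurons, 5 stored patterns
patterns = rng.choice([-1, 1], size=(P, N))    # each row is one pattern

# Learning: sum the single-pattern Hebbian weights over all patterns.
W = sum(np.outer(xi, xi) for xi in patterns).astype(float)
np.fill_diagonal(W, 0.0)

# Inference: asynchronous majority-vote updates until convergence.
def recall(W, x_init, max_sweeps=100):
    x = x_init.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(x)):
            new_state = 1 if W[i] @ x >= 0 else -1
            if new_state != x[i]:
                x[i], changed = new_state, True
        if not changed:
            break
    return x

# Corrupt 20% of the bits of the first pattern and let the network clean it up.
probe = patterns[0].copy()
flip = rng.choice(N, size=N // 5, replace=False)
probe[flip] *= -1

recovered = recall(W, probe)
print(np.array_equal(recovered, patterns[0]))  # usually True: pattern completion
```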
This nicely brings us to the main limitation of Hopfield networks: there is a limited number of valleys we can sculpt into the energy landscape before they start to interfere with each other. At some point, if we try to store too many patterns, the network will fail to reliably converge to a stored pattern and will instead recall weird, in-between kinds of memories. The maximum number of patterns you can store is thus limited, and it depends only on the size of the network: it is approximately 0.14 times the number of neurons. So if you have a Hopfield network of 100 neurons, you can reliably store fewer than 14 patterns in the best-case scenario. If you are unlucky, however, and some of the patterns are similar to each other, or correlated, their energy wells will begin to interfere even before you reach that capacity.
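If you want to get a feel for that 0.14 N figure, here is a rough, self-contained experiment sketch (my own; the exact numbers you get depend on the noise level, the random seed, and what counts as "reliable" recall). It stores an increasing number of random patterns in a 100-neuron network and checks how many are recovered exactly from slightly corrupted probes; recall typically starts to break down somewhere around 14 patterns:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100  # network size; the capacity estimate is ~0.14 * N, i.e. about 14

def hebbian_weights(patterns):
    W = sum(np.outer(xi, xi) for xi in patterns).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, x_init, max_sweeps=50):
    x = x_init.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(x)):
            new_state = 1 if W[i] @ x >= 0 else -1
            if new_state != x[i]:
                x[i], changed = new_state, True
        if not changed:
            break
    return x

for P in (5, 10, 15, 20, 30):                        # number of stored patterns
    patterns = rng.choice([-1, 1], size=(P, N))
    W = hebbian_weights(patterns)
    successes = 0
    for xi in patterns:
        probe = xi.copy()
        flip = rng.choice(N, size=5, replace=False)  # flip 5 bits of noise
        probe[flip] *= -1
        successes += np.array_equal(recall(W, probe), xi)
    print(f"P = {P:2d}: recovered {successes}/{P} patterns exactly")
```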
All of this makes vanilla Hopfield networks of little use for practical purposes. However, to this day they provide a powerful and intuitive model of associative memory: a simple network of neuron-like units that can store and retrieve patterns through purely local learning and inference rules. Despite their limitations, Hopfield networks laid the groundwork for more advanced energy-based models. In one of the next videos we will look at an extension of Hopfield networks known as Boltzmann machines: these generative architectures introduce additional hidden units and stochastic dynamics, allowing them to learn more complex probability distributions. There is also an extension known as modern Hopfield networks, published in 2016 with John Hopfield himself as one of the authors, but that's a topic for another time.
In the meantime, I'd like to take a moment to thank Shortform, who are kindly sponsoring today's video. Shortform is a platform that lets you supercharge your reading and gain valuable insights from books. Their unique approach of book guides goes way beyond simple summaries by providing a comprehensive overview of the material: not only do you get a concise version of the main points, but you also benefit from related ideas sourced from other books and research papers on the topic. They have an actively growing library of books from all sorts of genres, such as science, health, and technology. Not only that, but there is a useful AI-powered browser extension that allows you to generate similar guides for arbitrary content on the internet. Personally, I have found Shortform to be really helpful, both when I'm choosing books to read and when I'm writing notes and flashcards on a topic. Don't hesitate to bring your reading to the next level by clicking the link down in the description to get five days of unlimited access and 20% off an annual subscription. If you liked the video, share it with your friends, press the like button, and subscribe to the channel if you haven't already.
Stay tuned for more computational neuroscience and machine learning topics coming up. Goodbye, and thank you for your interest in the brain!