From Physics God Equation to Linear Regression in Machine Learning


CompuFlair


Video Transcript:

Hi. In this video I will use the genius idea of a great physicist, whom I will introduce at the end of this video, to derive machine learning's linear regression out of what I like to call the god equation. I will present something that, to me, looks like a miracle: it makes the god equation take a universal form in widely different systems like the stock market, three forces of nature, the neurons in a mouse brain, human blood tests, superfluids, and many more. Hi, I'm Aran B. If it's your first time watching my videos, I have 11 years

of data modeling and data analysis experience. I have a PhD in high energy physics and a master's in mathematical physics. I have also been a data science fellow of the US National Library of Medicine and a postdoctoral researcher at Syracuse University. In the previous video we discussed that the god equation governs any given spreadsheet of data, regardless of what system the data comes from, whether it is a human blood test, the stock market, or physical systems like superfluids. We also discussed that the god equation is nothing but the probability function that

tells us the chance of a given row in our spreadsheet happening again. I also mentioned that the probability function can have any form, and the forms differ from spreadsheet to spreadsheet. Now I have good news and bad news. Let me first tell you the bad news: we never know the true form of the probability function that governs the spreadsheet. Even in physics, which is known as an exact science, the true reality of events is not known. That means even in physics we don't know the true shape of the probability that governs the observations.

Remember Newton? His theory of gravity has a domain of validity. Beyond that domain we need to upgrade to Einstein's general relativity, but this theory also works well only in a certain domain. For example, we know that it fails if we go to very small subatomic distances. So these are our models for reality, but not the reality itself. The same thing is true in machine learning: the true form of the probability function, the god equation that rules our dataset or spreadsheet, will remain unknown forever. OK, done with the bad news. Let's now see the good news. The

good news is that most of the time we don't need to know the true form of the probability function, if we can wait some time for the system to settle and then collect some data in the domain that we want to model. To see what I just said, let's pull up the god equation. We know that this is a probability function. We also know that if we sum it over all the possible values, the result must be equal to one. That means Z is the sum over the numerator of the god equation. Therefore the only unknown in

this equation is F, which, to be consistent with the physics literature, we call the effective free energy of the system. Remember the Taylor series? Regardless of the true form of F, we can always express it in the form of a Taylor series and use data to estimate the constants. But there is a little bit of a problem: the series has an infinite number of constants to estimate. How can I possibly estimate them all? The answer is due to a miracle that happens most of the time, and I will discuss it shortly: the higher the order

of the terms, the lower their importance. In other words, a fair estimate of F would be the following equation, and I can always add a few more terms to improve my estimate of the effective free energy. Let me now discuss the miracle using three examples. The stock market will be my third example, a bunch of particles in a container is the second example, and here is the first example. Imagine I leave this ball here, and let's assume there is no friction to stop it. It will fluctuate around the minimum of the gravitational potential. Let's now define a

variable X to be the ball's distance from this minimum. This X corresponds to the column name in my spreadsheet that I would like to analyze. If at a random time I measure X, what would be the chance that the ball is right at the minimum? Highly probable, right? It's because the ball is fluctuating around the minimum. Now what would be the chance that X would be this far away from the minimum? Almost zero, right? So we just observed that small values of X are very probable. That means X is almost always close to zero.
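The equations shown on screen are not captured in this transcript. As a hedged reconstruction of what the god equation and the truncated Taylor series of F described above might look like, here is a sketch in LaTeX. The exponential form with a minus sign and the coefficient names a_0, a_1, a_2, ... are assumptions chosen to match the spoken description (Z is the sum over the numerator, and the constants of the series are estimated from data), not a copy of the slides.

```latex
% Assumed form of the "god equation": a normalized probability function
P(x) = \frac{e^{-F(x)}}{Z},
\qquad
Z = \sum_{x} e^{-F(x)}
\quad\Longrightarrow\quad
\sum_{x} P(x) = 1 .

% Taylor series of the effective free energy around x = 0,
% with hypothetical constants a_k to be estimated from data
F(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \dots
\;\approx\; a_0 + a_1 x + a_2 x^2 .
```

With this sign convention, the values of x where F is smallest are the most probable ones.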

Let's pick a random number close to zero, for example 0.01. Which one is larger: x to the power 2, x to the power 3, or x to the power 4? We see that x to the power 2 is much larger than x to the power 3, and x to the power 3 is much larger than x to the power 4. That is the reason why, when I expand the effective free energy in the god equation, I can keep the first few terms and throw out the rest. If I need a better estimate, I can add a few of the ignored terms that are more significant than

the rest. Although in this example the ball was moving on gravitational energy and not effective free energy, the example was still a good demonstration of the concept. As the second example, we discuss a system that actually fluctuates around the minimum of its effective free energy. Let's see what I mean. Here you see a bunch of particles released at the corner of a container. After the release, the particles quickly move and fill the entire container. That is because we have this empirical law that a system that is left on its own will evolve to its

maximum disorder state, the maximum entropy state, and will stay there. That maximum entropy state is the state of equilibrium. Let's now say I am interested in analyzing the density of particles inside this area of the container, so the column name in my spreadsheet would be the density of the particles at the bottom left corner of the container. If we watch this system for as long as we want, the density of particles at the bottom left corner will only fluctuate around a constant value, which happens to be the mean density. It never drops to exactly zero

and never rises to very high values. That means the density minus its mean will stay close to zero at any time that you wish to measure it, just like the X variable that we defined in the first example. What this animation is telling us is that the probability of finding the particles' density close to its mean is very high, and the probability of finding the density far from the mean is nearly zero. So I can already feel that the probability as a function of density has a maximum at the mean of the density. From

the mathematical form of the god equation we can immediately say that when the probability is maximum, the effective free energy is minimum. For example, in this plot you can see both the effective free energy and the probability of a Gaussian distribution. As you can see, at the maximum of the probability the effective free energy is at its minimum. So what this animation is telling us is that when the system is at equilibrium, it fluctuates around the minimum of its effective free energy, and the minimum happens to be at the mean of the variable, in this case

the density. Therefore, if I wait for a while to make sure the system is in a steady state, I can define the X variable to be the density minus the mean density, and this variable will always remain close to zero. Now if I expand the effective free energy in the god equation around the mean of the density, the second order term will be the most important one, and higher order terms will have lower significance, so I can drop them for now. Question: why is the second order term the most important one? How about the first

order term? The answer is that in the Taylor series the coefficient of the first order term is equal to the first derivative of the effective free energy at its minimum, and we know that the derivative of a function at its minimum is zero. So far we discussed physical systems. How about non-physical systems? Well, it turns out that even non-physical systems move toward their maximum disorder state, maximum entropy, and stay there as their equilibrium state. And since it's a stable state, it means they fluctuate around their mean value at all times afterward. That means the chance

or probability of finding the system around the mean is highest. Equivalently, that means the system is at the minimum of its effective free energy, and again the second order term in the Taylor series is the most significant. Now let's discuss the stock market a bit. When COVID had just spread, it acted like an external force that blew the stock market away from the minimum of its effective free energy, and it took a little while for the market to evolve and find another minimum of effective free energy to settle at. If a system is

still rapidly changing, which means it hasn't settled at a minimum of its effective free energy yet, we cannot expand its effective free energy and keep only the lower order terms. In other words, the system is kind of unpredictable at that moment. In the stock market, that means trading is highly risky. But if we wait for enough time, the system will fall into a minimum of its free energy and stay there until an external factor blows it away again. Let's now put our Taylor series expansion into the god equation and see what that looks like. The shape is

familiar, isn't it? It is the Gaussian form. There is something called the central limit theorem, which states that under certain conditions the Gaussian distribution governs any spreadsheet, regardless of the nature of the system, be it the stock market, a blood test, or the density of a bunch of particles. OK, so far we have discussed an effective free energy that is a function of just one variable. That means we discussed a spreadsheet or a dataset with just one column. But in most situations we have more than one column in our spreadsheet. Let us label these columns in our spreadsheet by

X1, X2, X3, all the way to Xn. So now F is a function of all these n variables. In this case the Taylor expansion will take a form like this. Again, the system will eventually fall into a minimum of the effective free energy if I give it enough time, and I can neglect higher order terms in my first approximation. So far we have treated all the columns of our spreadsheet equally. From a probabilistic point of view, the god equation point of view, none of these variables are special; these are just variables. But sometimes it is

in our interest to predict one or more of these variables given the rest of the variables. For example, in the stock market we usually want to predict the price, so that column of the spreadsheet becomes the target of my prediction, and we call it the target variable. The rest of the variables are called the predictors, because I want to use them to predict future prices. The predictors are sometimes also called features; it is just naming. The convention is to label the target variable with y and the predictors as X1, X2, etc. Also, let's assume the target

column is placed as the last column. I can move the columns in my spreadsheet arbitrarily, right? Now, getting back to the Taylor series, I keep all the variable names as before except the last one, the target variable, which I now write as y. Now I factor the coefficient of y squared out and rearrange the terms in the following form. To find the beta coefficients I just need to expand the parentheses. For example, beta one is just the following. Now let's put our final form of the effective free energy into the god equation. It is a

one-dimensional Gaussian distribution for y, the target variable. Those who are familiar with the Gaussian distribution already know that this term is the mean of y, and this equation is just what is conventionally known as linear regression. The bar at the top of y is to remind us that it is not y itself but the mean of y that is in this equation. In one sense I have already proven this equation earlier in this video. If you remember, I showed that if I perform a Taylor expansion around the mean of a variable, which now is

named y, I end up with this form. So in principle one can play this video backward, starting from this equation, and go back to where we set the right-hand side of this equation to the mean of the variable. But even if we accept this as a proof, I still haven't shown the estimates for the beta coefficients using our collected data. In the video after next, I will start with something called the maximum likelihood and prove that the mean of y is just the right-hand side. The same maximum likelihood also provides a

practical way to estimate the beta coefficients using the data in our spreadsheet. Before I end this video, I would like to travel back in time and introduce a physicist named Landau. He introduced an idea in physics that is basically the same as the underlying assumption in machine learning. At his time, superfluidity was a hot topic in physics, but it was very hard to explain starting from first principles. He came to the conclusion that the underlying principles of a probabilistic system, like the ones in machine learning, play very little role in their behavior.

Instead, the symmetries of the system determine its behavior. Out of this comes the concept of universality, according to which two systems that are widely different in our eyes can behave the same and be mathematically equivalent if they belong to the same symmetry class. I cannot emphasize enough what a crucial role this symmetry plays in physics and how fundamental it is. Stay tuned, take good care of yourself, and I'll see you in the next video.
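As a rough numerical check of the derivation described in this video, here is a minimal Python sketch. It assumes the god equation has the form exp(-F)/Z and that F is purely quadratic around its minimum, F(v) = 0.5 * v^T A v for v = (x1, x2, y). The matrix A, the variable names, and the sample size are all hypothetical choices made for illustration. Completing the square in y then predicts that the mean of y is a linear function of the predictors, and an ordinary least squares fit to samples should recover the same beta coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical second-order coefficients of the effective free energy,
# F(v) = 0.5 * v^T A v with v = (x1, x2, y). A must be symmetric and
# positive definite so that F has a single minimum at v = 0.
A = np.array([[2.0, 0.3, -0.8],
              [0.3, 1.5, 0.5],
              [-0.8, 0.5, 1.0]])

# Under the assumed form exp(-F)/Z, the data follow a zero-mean Gaussian
# whose covariance is the inverse of A (symmetrized against round-off).
cov = np.linalg.inv(A)
cov = (cov + cov.T) / 2
data = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=100_000)
X, y = data[:, :2], data[:, 2]

# Factoring out the coefficient of y^2 and completing the square gives
# mean(y | x1, x2) = -(A[2,0]*x1 + A[2,1]*x2) / A[2,2].
beta_theory = -A[2, :2] / A[2, 2]

# Ordinary least squares on the sampled "spreadsheet", with an intercept.
design = np.column_stack([np.ones(len(y)), X])
beta_fit, *_ = np.linalg.lstsq(design, y, rcond=None)

print("betas from the free energy:", beta_theory)
print("betas from least squares:  ", beta_fit[1:])
print("fitted intercept:          ", beta_fit[0])
```

Least squares is used here only as a convenient estimator for the check; the video defers the maximum likelihood argument to a later episode.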