If you can fit a line you can fit a squiggle if you can, make me laugh you can, make me giggle stat quest Hello, i'm josh stormer and welcome to stat quest today we're going to talk about logistic regression This is a technique that can be used for traditional statistics as, well as machine learning so let's get right to it Before we dive into logistic regression let's take. A, step back and review, linear regression in Another stat quest, we talked, about linear regression We had some data Weight and size then, we fit a line to it and With that line, we could do a lot of things First we could calculate r-squared and determine if weight and size are correlated large values imply a large effect, and Second calculate a p-value to determine if the r-squared value is statistically significant and Third, we could use the line, to predict, size given weight if a, new, mouse has this weight Then this, is the size that, we predict, from the weight although We didn't mention it at the time using data to predict something falls under the category of machine learning So plain old linear regression is a form of machine learning We also talked a little bit about multiple regression Now, we are trying to predict, size, using weight and blood volume Alternatively we could, say that, we are trying to model size using weight and blood volume Multiple regression, did the same things that normal regression did we calculated r-squared and we calculated the p-value and We could predict, size, using weight and blood volume and This, makes multiple regression a slightly fancier machine learning method We also talked, about how, we can use discrete measurements like genotype to predict size if you're Not familiar with the term genotype don't freak out it's. No, big deal just know that it refers to different types of mice lastly, we could compare models So on the left side we've got normal regression, using weight to predict size and We can, compare those predictions to the ones, we get from multiple regression, where we're using weight and blood volume to predict size Comparing the simple model to the complicated one tells us if we need to measure weight and blood volume to accurately predict Size or if we can get, away, with just weight Now that we remember all the cool, things, we can, do with linear regression Let's talk, about logistic regression Logistic regression is similar to linear regression except Logistic regression predicts whether something, is true or false instead of predicting something continuous, like, sighs these mice are obese and These mice are not Also instead of fitting a line to the data logistic regression fits an s-shaped logistic function The curve goes from zero to one?
And that, means that the curve tells you the probability that a mouse is obese based on its weight If we weighed a very heavy mouse? There is a high probability that the new, mouse is obese? If we weighed an intermediate mouse Then there is only a 50% chance of the mouse is obese?
Lastly, there's only a small probability that a light mouse is obese Although, logistic regression, tells the probability that a mouse is obese or not it's usually used for classification For example if the probability of mouse is obese is greater than 50% Then we'll classify it as obese Otherwise we'll classify it as not obese Just like with linear regression, we can, make simple models in this case, we can have obesity predicted, by weight or? more complicated models in this case obesity is predicted by weight and genotype in This, case, obesity is predicted. By weight and genotype and age and Lastly, obesity is predicted by weight genotype, age and Astrological sign in other words just like linear regression logistic Regression can work with continuous data, like weight and age and discrete data like genotype and astrological sign We can, also test to see if each variable is useful for predicting obesity however Unlike normal regression, we can't easily compare the complicated model to the simple model and we'll talk more about, why in a bit Instead we just test to see if a variables affect on the prediction is significantly different from zero If not it, means that the variable is not helping the prediction We use, wald's tests to figure this out we'll talk, about that in another stat quest in This, case, the astrological sign is totes useless That statistical jargon for not helping That, means we can, save time and space in our study.
By leaving it out Logistic regressions ability to provide probabilities and classify, new samples using continuous and discrete measurements Makes it a popular machine learning method One big difference between linear regression and logistic regression is how the line is fit to the data With linear regression, we fit the line, using least squares In other words, we find the line that minimizes the sum of the squares of these residuals We also use the residuals to calculate r. Squared and to compare simple models to complicated models Logistic regression doesn't have the same concept of a residual so it can't use least squares and it can't calculate r squared instead it uses something called maximum likelihood There's a whole stack quest on maximum likelihood so see that for details but in a nutshell You, pick a probability scaled.