Probability Part 1: Rules and Patterns: Crash Course Statistics #13


CrashCourse

Today we’re going to begin our discussion of probability. We’ll talk about how the addition (OR) rul...

Video Transcript:

Hi, I’m Adriene Hill, and welcome back to Crash Course Statistics. If you’ve ever seen a face in an onion, or a grilled cheese, or any other inanimate object, you’ve experienced pareidolia, a product of our brains that causes us to see the pattern of a face in non-face objects. This happens because our brains are so good at seeing patterns that they sometimes see them when they’re not really there, like a face in this bell pepper.

And faces aren’t the only patterns we see. Our brains recognize patterns in everything, especially in sequences of events, like the kind we’ll see today as we start to talk about Probability.

INTRO

Alright, first let’s just establish a more specific definition of what probability is, because the way we use the word in everyday life can be different from how we use it in Statistics.

Statisticians talk about two types of probability: empirical, and theoretical. Empirical probability is something we observe in actual data, like the ratio of girls in each individual family. It has some uncertainty, because like the samples in experiments, it’s just a small amount of the data that is available.

Empirical probabilities, like sample statistics, give us a glimpse at the true theoretical probability, but they won’t always be equal to it because of the uncertainty and randomness of any sample. The theoretical probability on the other hand, is more of an ideal or a truth out there in the universe that we can’t directly see. Just like we use samples of data to guess what the true mean or standard deviation of the population is, we can use a sample of data to guess what the true probability of an event is.

Say you play a slot machine over and over, you’ll be able to guess the probability of winning the jackpot by counting the number of times you win, and dividing it by the number of times you played. If you play 100 times and win 6 times, you can be pretty sure that the probability of getting a jackpot is around 6/100 or 6%. Now this isn’t to say that you can rule out that the true probability is 5% or even 10%, but you’re relatively sure that it’s close to 6%, and not, say 99%.
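The slot machine idea can be tried out in a short simulation. This is only a sketch: the "true" jackpot probability of 6% is an assumption picked to match the example (a real machine would not tell you), and `empirical_probability` is a made-up helper name. It counts wins and divides by plays, and the estimate tends to settle closer to the true value as the number of plays grows.

```python
import random

def empirical_probability(true_p, plays, rng):
    """Play `plays` times and return wins / plays (the empirical probability)."""
    wins = sum(1 for _ in range(plays) if rng.random() < true_p)
    return wins / plays

# Assumption: a hidden "true" jackpot probability of 6%, matching the example.
TRUE_P = 0.06
rng = random.Random(13)  # fixed seed so repeated runs give the same numbers

for plays in (100, 1_000, 100_000):
    estimate = empirical_probability(TRUE_P, plays, rng)
    print(f"{plays:>7} plays -> estimated probability {estimate:.4f}")
```

With only 100 plays the estimate can easily land at 4% or 9%; that is the uncertainty described above.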

So, the empirical probability can be a good estimation of the theoretical one, even if it’s not exact. So far we’ve been talking about the probability of just one event, but often there may be two or more events that we want to consider, like what if you want to know the probability of picking a purple OR a red skittle from a bag. The proportion of each color in a bag of Skittles is roughly equal, 20% for each of the 5 colors.

So let’s say you randomly select a Skittle without looking. For this, we need the addition rule of probability. Since a Skittle can’t be two different colors at once, the color possibilities are Mutually Exclusive.

That means the probability of a Skittle being red AND purple at the same time is 0. So, we can use the simplified addition rule which says that the probability of getting a Red or Purple Skittle is the sum of the probability of getting a Red, and the probability of getting a purple. Since we’re going to be talking a lot about probability in the next few episodes I’m going to introduce a little notation.

Instead of writing out “the probability of Red” we can use the notation P(Red). The probability of getting a red OR purple would then be written P(Red or Purple). So now we know what the probability of Red or Purple is: P(Red) + P(Purple), or 0.2 + 0.2. That equals 0.40 or 40%.

I like all Skittles, so the probability that I will get a Skittle I like is 0.2 + 0.2 + 0.2 + 0.2 + 0.2.

That’s 100%. Good odds! Red and Purple Skittles are mutually exclusive, but not all the events we’re interested in are.
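Here is the Skittles arithmetic as a small sketch. Only red and purple are named in the video; the other three color names are assumptions, and `p_any_of` is a hypothetical helper. It uses the simplified addition rule, which only works because a single Skittle has exactly one color.

```python
from fractions import Fraction

# Assumption: five colors at exactly 20% each; only red and purple are named
# in the video, the other color names are filled in for illustration.
skittles = {color: Fraction(1, 5)
            for color in ("red", "purple", "green", "yellow", "orange")}

def p_any_of(colors, dist):
    """Simplified addition rule: add the probabilities of mutually exclusive colors."""
    return sum(dist[c] for c in set(colors))

p_red_or_purple = p_any_of(["red", "purple"], skittles)
p_any_color = p_any_of(skittles.keys(), skittles)

print(float(p_red_or_purple))  # 0.4
print(float(p_any_color))      # 1.0
```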

For example, if you roll a die and flip a coin, flipping tails and rolling a 6 are not mutually exclusive, since you can both roll a 6 and flip tails in the same turn. Since P(tails and 6) ≠ 0, we’ll need to adjust our addition rule accordingly. The full version of the addition rule states that P(tails or 6) = P(tails) + P(6) - P(tails and 6).

When two things are mutually exclusive, the probability that they happen together is 0, so we ignored it, but now the probability of both these things happening is not zero, so we’ll need to calculate it. You can see here that there are 12 possible outcomes when flipping a coin and rolling a die. There are 6 outcomes with a tails, and 2 outcomes with a 6.

If we add all of those together we get 8, but by looking through the chart, we can tell that there are only 7 possible outcomes that have either a tails or a 6. When we count T’s and 6’s independently, we double count the outcome that has both. If we didn’t subtract off the probability of (tails and 6), we would double count it. Let’s put these probabilities into a Venn Diagram so we can see even more clearly why we need to subtract P(Tails and 6).

If this circle is all the times we flip tails, and this circle is all the times we roll a 6, this overlapping area is counted twice if we simply add the two circles together. In this simple case, we could easily see what the probability of tails and 6 is, but sometimes it’s not so easy to figure out. That’s why we have the Multiplication Rule, which helps us figure out the probability of two or more things happening at the same time.
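The coin-and-die counting from a moment ago can be checked by listing all 12 outcomes, as in this sketch. It assumes a fair coin and a fair die, so every outcome is equally likely; `prob`, `is_tails` and `is_six` are names made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All 12 equally likely (coin, die) outcomes, assuming a fair coin and fair die.
outcomes = list(product("HT", range(1, 7)))

def prob(event):
    """Probability of an event = matching outcomes / all outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def is_tails(outcome):
    return outcome[0] == "T"

def is_six(outcome):
    return outcome[1] == 6

p_tails = prob(is_tails)                              # 6 of 12
p_six = prob(is_six)                                  # 2 of 12
p_both = prob(lambda o: is_tails(o) and is_six(o))    # 1 of 12 (the overlap)
p_either = prob(lambda o: is_tails(o) or is_six(o))   # 7 of 12, counted directly

# The full addition rule gives the same answer as counting directly.
assert p_either == p_tails + p_six - p_both
print(p_either)  # 7/12
```

Adding `p_tails + p_six` without the subtraction gives 8/12, which is the double count described above.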

Let’s say you just found out that actor Cole Sprouse goes to your local IHOP pretty often, and there’s a 20% chance that he’ll be there for dinner any given night. And yeah, I know that’s not how people work but we’re going to say that’s how Cole Sprouse works. Anyway to top that off, your local IHOP has a promotion where they randomly select certain nights to be “Free Ice Cream Night” in the hopes that customers will keep coming back in case that night is the night.

Each night there’s a 10% chance that it will be “Free Ice Cream Night”. Now, you love ice cream and you like like Cole Sprouse--as do we all--and your perfect night would include them both. So you try to calculate the probability that will happen on your visit tonight.

Using the multiplication rule, multiply the probability that Cole Sprouse will be at IHOP, 0.2, by the probability that it will be Free Ice Cream Night, 0.1.

And you come to the sad realization that there’s only a 2% chance that you’ll get to see Cole and get free dessert tonight. When we want to know the probability of two things happening at the same time, we first need to look at only the times when one thing--Cole is at IHOP--is true, which is 20% of the time. Now that we reduced our options to just Cole nights, out of all these Cole times, how often is it free ice cream time?

Only 10% of Cole nights. 10% of the original 20% leaves only a 2% chance that both will happen at the same time. But you could always change your expectations and calculate the probability of getting either by using the addition rule.

Cole or free ice cream is calculated by adding the probability of Cole to the probability of Free Ice Cream, minus the probability of both--so we don’t double count anything. You realize that there’s a 28% chance that something good will happen tonight, so you decide to still go. No matter what, you’re going to get French Toast. Cole Sprouse and Free Ice Cream Night are independent.
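The 2% and 28% figures can be reproduced in a few lines. This sketch assumes the two events are independent, which is what lets us simply multiply the two probabilities; the variable names are illustrative only.

```python
from fractions import Fraction

p_cole = Fraction(20, 100)       # Cole is at IHOP on a given night
p_ice_cream = Fraction(10, 100)  # it is Free Ice Cream Night

# Multiplication rule, assuming the two events are independent.
p_both = p_cole * p_ice_cream

# Full addition rule: subtract the overlap so it is not counted twice.
p_either = p_cole + p_ice_cream - p_both
p_neither = 1 - p_either

# Cross-check: "neither" is also (no Cole) times (no ice cream).
assert p_neither == (1 - p_cole) * (1 - p_ice_cream)

print(float(p_both), float(p_either), float(p_neither))  # 0.02 0.28 0.72
```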

Cole doesn’t have any secret knowledge about when it’s Free Ice Cream Night, so it has never affected his decision to come. Two events are considered independent if the probability of one event occurring is not changed by whether or not the second event occurred. In more concrete terms, if Cole’s decision to go to IHOP is independent of IHOP’s decision to give out free ice cream, then the probability of Cole showing up should be the same on both ice cream and non-ice-cream nights, since he’s just choosing randomly.

We write conditional probabilities as P(Event 1 | Event 2). Conditional probabilities tell us the probability of Event 1, given that Event 2 has already happened. If two events are independent--like Cole and ice cream night--then we expect P(Cole | Ice Cream Night) to be the same as just plain ole P(Cole), since the two things are unrelated.

If P(Cole | Ice Cream) wasn’t the same as plain ole P(Cole), then that would mean that Ice Cream Night might somehow affect Cole’s decision to show up at IHOP. We calculate the conditional probability P(Event 2 | Event 1) by dividing the probability of Event 1 and Event 2 by the probability of Event 1. The role of conditional probabilities is particularly important when we consider medical screenings.
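To make the conditional probability formula and the independence check concrete, here is a sketch using a hypothetical tally of 100 nights. The counts are invented to match the 20%, 10% and 2% figures from the Cole example, and `conditional` is a made-up helper name.

```python
from fractions import Fraction

# Hypothetical tally of 100 nights, chosen to match the 20% / 10% / 2% figures.
nights = 100
cole_nights = 20
ice_cream_nights = 10
both_nights = 2

def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) is 0."""
    if p_b == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return p_a_and_b / p_b

p_cole = Fraction(cole_nights, nights)
p_ice = Fraction(ice_cream_nights, nights)
p_both = Fraction(both_nights, nights)

p_cole_given_ice = conditional(p_both, p_ice)
print(p_cole_given_ice, p_cole)  # 1/5 1/5

# Independent: knowing it is ice cream night does not change P(Cole).
independent = p_cole_given_ice == p_cole
print(independent)  # True
```

If the tally had shown Cole on, say, 8 of the 10 ice cream nights, P(Cole | Ice Cream) would be 80% instead of 20%, and the two events would not look independent.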

For example, when screening for cervical cancer it used to be recommended that all adult women get screened once a year. But sometimes the results of the screenings are wrong. Either they can say there’s something abnormal when there isn’t (called a false positive) or that everything is all clear when it’s really not (called a false negative).

This is exactly the kind of scenario where knowing the likelihood that something is actually abnormal (in this case, cervical cancer), given that you’ve gotten a positive test result, would be useful. That is P(Cancer | Positive Test). When looking at the data of people who DON’T have cancer, 3% will get a false positive.

And people who DO have cancer will get false negatives 46% of the time. This means we miss a lot. And maybe freak some people who don’t need to be freaked out.

The logic of conditional probabilities can help us make sense of why doctors have recently recommended that these tests be done less frequently in some cases. In the United States, the rate of cervical cancer is about 0.0081%, so only about 8 in 100,000 women get cervical cancer.

Using our rates of false negatives and positives, we can see that for every 100,000 women in the US, only about 4 (and we’re rounding here) of the about 3,004 people with positive tests actually had abnormal growths. That means the conditional probability of having cancer, given that you got a positive test, is only 0.1%. Give or take. We're rounding.

And these positive tests require expensive and invasive follow-up tests.
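The screening numbers can be worked through step by step, as in this sketch. It uses only the rates quoted above (0.0081% prevalence, 3% false positives, 46% false negatives) applied to a group of 100,000 people; the variable names are illustrative, and the results are approximate because the inputs are rounded.

```python
population = 100_000
prevalence = 0.000081        # 0.0081% have cervical cancer
false_positive_rate = 0.03   # share of people WITHOUT cancer who test positive
false_negative_rate = 0.46   # share of people WITH cancer who test negative

with_cancer = population * prevalence                      # about 8 people
without_cancer = population - with_cancer

true_positives = with_cancer * (1 - false_negative_rate)   # about 4 people
false_positives = without_cancer * false_positive_rate     # about 3,000 people
all_positives = true_positives + false_positives           # about 3,004 people

# P(Cancer | Positive Test) = true positives / all positives
p_cancer_given_positive = true_positives / all_positives

print(f"positive tests: {all_positives:.0f}")
print(f"true positives: {true_positives:.1f}")
print(f"P(Cancer | Positive Test): {p_cancer_given_positive:.2%}")
```

Almost all of the positives come from the very large group without cancer, which is why the result is a fraction of one percent even though the false positive rate is only 3%.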

And I just want to point out that conditional probabilities aren’t reciprocal. That is to say P(Cancer | Pos Test) isn’t the same as P(Pos Test | Cancer), which would be about 50%. In real life you’re not always going to know the probability of Cole Sprouse showing up at the IHOP--he’s unpredictable that way.

Unpredictable like... pretty much the rest of life. It can be very difficult to put a specific probability on a lot of everyday situations. Like how likely it is that your teacher will call in sick today.

Like whether or not you’re going to catch all the red lights on your way to school. Probabilities can--as we’ve seen--require a lot of calculations--and there’s not always time for that. But that doesn’t mean they belong only on the school-only side of your brain.

Say you want to go out on a Friday night with friends. More than anything you don’t want it to suck. Last week you wound up on the couch watching Sandy Wexler, again.

You know it’ll be hard to get tickets to see Black Panther, so you make a backup plan just in case. You can always stream Get Out, without stealing it of course. But if you’re determined to see Black Panther in the theater, Probability will help set your expectations.

If you’ll only settle for center row tickets you’re more likely to be disappointed. Your chance of seeing Black Panther is going to be greater if you’re willing to settle for whatever tickets you can get. Probabilities help us understand why it makes sense to apply to more than one college.

Why you shouldn’t expect that the first short story you write will get an A and be published in the New Yorker. And how likely it is that you’ll get mono, given your significant other has mono.

Probability can help you figure that out too. Thanks for watching. We'll see you next time.