Hey guys, it's Mark from Ace Tutors, and in this video I'm going to explain, at a very high level, one of the biggest concepts covered in statistics: hypothesis testing. Now, there are tons of different types of hypothesis tests, and I know they can all be confusing at times. So for this video, I'm just going to describe what hypothesis testing even is, what it's used for, and the general process you can follow for all of the different types. In subsequent videos, I'll dive deeper into each of these different types to explain them in detail and show
you how to solve them with lots of examples. Because this topic is pretty involved, I'm going to go into a good amount of detail to be thorough. If you feel comfortable with certain steps but are fuzzy on other ones, feel free to skip around as you see fit. I'll try to split the video into chapters as best I can to help.

Okay, so first, what even is hypothesis testing? Well, hypothesis testing is a process where you first make an assumption about a population and then use data from a sample to test how plausible your assumption
is. If you haven't yet seen our video explaining populations versus samples, or you're fuzzy on those concepts, I'd recommend you take a minute to peek at that video first. Those two ideas are going to be crucial for hypothesis testing as we move on. Ultimately, in hypothesis testing, the goal is to figure out something about the overall population, and we do that by first making some assumption about the thing we want to know about the population, whether it's its mean or standard deviation or proportion or whatever it is. And since we usually can't gather information for
everything in a population, we use a sample and all the cool things we learn in statistics to test whether that original assumption is plausible.

Okay, so that's the overall idea of hypothesis testing. However, there are a ton of different types of hypothesis tests depending on what you're trying to figure out. To name a few that you may have heard of, there are one-sample z-tests and one-sample t-tests and two-sample t-tests and ANOVA tests and chi-square tests and so many more. I'll dive into each of these in future videos
but for now I want to go over the general process that you can follow to do any of these types. To solve any hypothesis test, all you need to do is remember the five C's. First, you need to create your hypotheses. Then you need to check that certain conditions are true to make sure your data is good. Next, you calculate your test statistic and/or p-value. Then you compare your test statistic or p-value to your critical or alpha value. And finally, conclude based on your findings. Now, if that sounded like a lot of
garbage, don't worry. I'm going to break down each of these steps to explain what they actually mean.

Okay, so first we need to create our hypotheses. This step is the same as making our assumption about the population, like we discussed before. To do this step, we need to make two hypotheses. The first we write as H with a subscript o, and it's called the null hypothesis. The second is called the alternative hypothesis and is written as H with a subscript a. The null hypothesis is what we assume about the population if everything is as expected.
The alternative hypothesis, on the other hand, is what we want to test about the population to see if our assumption is actually wrong. To figure out how to write these, I usually like to write my alternative hypothesis first; then, based on that, the null hypothesis is just the opposite of that one. For example, let's say you saw a report that said the mean income in the US was $70,000, but you think it's actually less than that. Since you want to test to see if the population mean income is less than $70,000, your alternative hypothesis is
mu, or the mean, is less than 70,000. Since this is your alternative hypothesis and the null hypothesis is just the opposite of this, that means the null hypothesis is that mu is greater than or equal to 70,000. Now, for this example we believe the true mean was less than the assumed value, but there are two other common scenarios you might see. The first is where you believe the true population mean is greater than a certain value, which would make our alternative hypothesis mu is greater than 70,000 and the null hypothesis mu is less than or
equal to 70,000. And the other scenario is if you think the true mean is just something different than 70,000. It could be higher or lower, but you just don't think it's equal to 70,000. In that case, your alternative hypothesis would be mu is not equal to 70,000, and your null hypothesis would be mu is equal to 70,000. These are the three main scenarios you'll come across, and we'll see each of these in future examples and videos. But for now, one of the key things to notice about each of these is that no matter the example
the null hypothesis is always going to have some form of an equal sign in it, whether it's greater than or equal to, less than or equal to, or just equal to. Some teachers go a step further and just always set the null hypothesis to mu is equal to whatever, but for consistency throughout our videos, we are going to set our null hypothesis to the opposite of the alternative hypothesis. Additionally, in the couple of examples I laid out, we used means, but you can also see these types of problems for proportions as well as some other measures
in some more complex hypothesis testing problems. I'll cover each of these different types in detail in future videos.

All right, after you've created your hypotheses, the next thing you'll do is check your conditions. Each type of hypothesis test has a different set of specific conditions that need to be satisfied in order to be able to conduct the test. Essentially, what you're trying to do is make sure that the data that you have for your sample will be a good representation of the overall population. There are various conditions based on the randomness of the sample you
picked and the sample size and various other things. I'll go into greater detail about the conditions for each type of test in future videos, but for now, just know that in this step you need to make sure those conditions are met before moving on.

Once you confirm that your conditions are met, we are ready to get into the meat of hypothesis testing. But before you start crunching numbers and making conclusions, let's first draw a picture of the distribution matching our problem. I find it very easy to get lost in all the numbers for hypothesis testing
so I definitely encourage you to draw a picture of your distribution each time. Now, here I chose to draw the normal distribution because it's one of the most common, but depending on the type of hypothesis test you're doing, it might actually be a t distribution, which looks more like this, or an F distribution, or chi-square. I'll go over which distribution to use for each type of problem in other videos, but ultimately the idea is the same, so we'll just use a normal distribution for this explanation. Now let's fill out some of the information we
know so far. But what do we fill in? Well, I mentioned before that the first thing we do for hypothesis testing is make an assumption about our population, so we're going to draw our distribution assuming that assumption's true. For the example we used before, since the assumed population mean was $70,000, we can set the middle of the normal distribution right at 70,000, or mu. Then, when we take a sample of our US population, say we find the mean income of the sample, aka x-bar, was actually $65,000, which might be somewhere like here in the distribution. Clearly
our sample mean of 65,000 is less than the assumed population mean of 70,000. Does this give us enough evidence to say that the true population mean is actually less than $70,000? It might, but we might also have gotten our sample mean of 65,000 just by chance due to the people that we surveyed. We really can't tell just yet, but that brings us to our next C: calculate. Ultimately, we want to calculate what the probability is of getting a sample mean of $65,000 or less just by chance if the real population mean is actually 70,000. But how do
we calculate this probability? Well, the first step is to get our data in a format that's easier to work with. Instead of dealing with our distribution in terms of income data, let's convert this into a standardized format. Instead of centering the distribution at 70,000, we're going to center it at zero, and using the different statistics formulas we learned, or a calculator, we're going to convert our sample mean of 65,000 to something called a test statistic. Now, all a test statistic is is a standardized value for our sample mean or sample proportion. We ultimately want
to calculate what the probability of getting a sample mean of 65,000 or less is, and this is the first step to help with that. So let's say we use the formulas from class or a calculator to determine that our test statistic is some number we'll call z. Now, I used a z for z-score here because we're saying it's a normal distribution, but depending on the type of problem and distribution, it could really be a z or t or F or chi-square or some other form of a test statistic. It doesn't really matter for this high-level discussion.
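As a rough sketch of that conversion, here is the standard one-sample z formula, z = (x-bar - mu) / (sigma / sqrt(n)), written in Python. The video never gives a population standard deviation or a sample size, so the sigma of 20,000 and n of 100 below are made-up values purely for illustration, and `z_statistic` is a hypothetical helper name, not something from the video.

```python
import math

def z_statistic(sample_mean, assumed_mean, pop_std, n):
    """Standardize a sample mean under the null hypothesis (one-sample z formula)."""
    standard_error = pop_std / math.sqrt(n)  # spread of sample means for samples of size n
    return (sample_mean - assumed_mean) / standard_error

# 65,000 and 70,000 come from the income example;
# pop_std and n are assumed values, chosen only to make the sketch runnable.
z = z_statistic(sample_mean=65_000, assumed_mean=70_000, pop_std=20_000, n=100)
print(z)
```

A negative z just means the sample mean sits to the left of the assumed mean, which is now the zero at the center of the standardized picture.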
The important thing to recognize is that we converted our sample mean of 65,000 into a standardized value called a test statistic. Now that we have our test statistic, we can calculate the probability of getting the sample data that we got. This probability that we find is called a p-value. We calculate this p-value by using the tables from class or a calculator to get some number, and we can even add this to our drawing. The p-value can be drawn as a shaded region, but what portion of the distribution do we shade? To
determine this, we can look back to our hypotheses. If you recall from the beginning, the alternative hypothesis is what we're actually trying to test, and since we were trying to test to see whether the population mean was actually less than $70,000, we ultimately calculated the probability of getting a sample mean of 65,000 or less. Because of this, our p-value represents the area to the left of the sample mean or test statistic. So in the end, you can look to your alternative hypothesis to tell you which side to shade. If the alternative hypothesis has a
less-than symbol, you shade the area to the left. If the alternative hypothesis instead had a greater-than symbol, you would shade the area to the right. And when there's a not-equal-to symbol in the alternative hypothesis, this is what they call a two-tailed test, and the area we shade gets a little more complex, so I'll cover that concept in a later video as well.

Okay, at this step we would have just calculated the probability of getting a sample mean of 65,000 or less when the true population mean is in fact 70,000. But how do you
know whether the probability you get is low enough to indicate that the null hypothesis is wrong and the true mean is actually less than 70,000? Well, that brings us to our fourth C: compare. Once you have your test statistic and p-value, you need to compare these values that you get with some threshold to make this decision about your population. This is where the critical value and alpha value that I mentioned earlier come into play. These are what define the threshold for this comparison. The alpha value is the probability threshold below which you would be
sufficiently confident that your original assumption, or the null hypothesis, is wrong. This value is somewhat arbitrary and is decided at the beginning of the problem. Some common values are 0.01, 0.05, and 0.1. The critical value is then just the standardized value (z, t, F, chi-square, or whatever) that corresponds to this alpha value for your distribution. For example, we'll call this z-star. Then you can add this critical value and alpha value to your distribution drawing to help you compare them. Let's say they are here. Once you have your test statistic, p-value, critical value, and alpha value
you can now make your comparison, and there are a couple of ways to do this. You can either compare the alpha value to the p-value or compare the critical value to the test statistic. Either way will work. If you want to compare the test statistic and critical value, you should compare the absolute value of each of these. That way you can account for the fact that these values can be negative. If you're comparing the p-value to your alpha value, then you can just compare them as is, because these are probabilities and probabilities are always
positive. After you compare them, you can make a conclusion about your hypotheses based on which one is larger.

When it comes to your conclusion, there are only two different conclusions you can make. The first is that you reject your null hypothesis in favor of your alternative hypothesis because the probability of getting sample data so different from your original assumption is low enough to give you sufficient evidence that it didn't just happen by chance. The other one is that you fail to reject your null hypothesis, and this wording is very important: you never accept that
the null hypothesis is true. All that you can conclude is that you don't have enough evidence to say that your original assumption, the null hypothesis, is wrong.

Okay, now that we've covered what the two conclusions are, let's discuss when you would conclude each one. First, let's start with the conclusion of rejecting the null hypothesis. If you decide to compare your test statistic and critical value, you would make this conclusion whenever the absolute value of your test statistic is greater than the absolute value of your critical value. Alternatively, if you compare your p-value and alpha
value, you'd make this conclusion if the p-value is less than your alpha value. One trick I always use to remember this rule is the saying "if p is low, reject H-o." In other words, if the p-value is lower than the alpha value, you reject the null hypothesis, H-o. Since I have that saying stuck in my head, I usually prefer to always compare my p-value and alpha value. However, feel free to use whichever technique you find easier. No matter which method you choose, both of these situations will tell the same story. This says
that the probability of getting your sample mean or proportion or whatever, if your original assumption about the population was true, is so low that you can confidently reject your null hypothesis. And it is usually good to conclude in terms of your problem. So for our income example, we could say that because our p-value of blank was less than our alpha value of blank, we have sufficient evidence to reject our null hypothesis in support of the claim that the true mean income for people in the US is less than 70,000. And while you can compare
the numbers directly, I do have one more tip to potentially help you make this conclusion. In your distribution drawing, if you add in your critical value or alpha value, the shaded region representing the alpha value depicts something called the rejection region. Since this is the threshold for determining your conclusion, if your test statistic or p-value ends up in this rejection region, we'd make the conclusion of rejecting the null hypothesis as well. This ultimately tells the same story as the rule we went over, but if you are more of a visual person, this may help
a bit. As you can see, if our test statistic and p-value are in the rejection region, in either the right-tailed or left-tailed cases, this would mean that the absolute value of the test statistic is greater than the absolute value of the critical value and the p-value is less than the alpha value. This also works for the other conclusion, but let's go over the rule first.

For the other conclusion of failing to reject the null hypothesis, everything here is just the opposite. This time, if you're comparing the test statistic and critical value
you would make this decision whenever the absolute value of your test statistic is less than the absolute value of your critical value. If you're comparing the p and alpha values, you would similarly fail to reject the null hypothesis whenever the p-value is greater than the alpha value. In these cases, this says that the probability of getting the sample mean or proportion or whatever, if your original assumption about the population was true, is high enough that you can't confidently reject your null hypothesis and say it isn't true. And to put it in terms of our
income example, we would say that because our p-value of blank was greater than our alpha value of blank, we don't have sufficient evidence to reject our null hypothesis in support of the claim that the true mean income for people in the US is less than 70,000. Going back to our rejection region visualization, this would mean that the test statistic and p-value would be outside of the rejection region, like this, where the absolute value of the test statistic is less than the absolute value of the critical value and the p-value is greater than
the alpha value. And just as a note, this goes for either the right-tailed or the left-tailed cases as well.

Okay, so that's the very high-level explanation of hypothesis testing. I know this concept can be very confusing, so I'd encourage you to rewind and take a look at the parts you're struggling with a couple of times to get the overall idea and process down. There are tons of different types of hypothesis tests, and I'll create videos diving deep into each one of these, but if you learn one thing from this video, remember the five
C's for solving these types of problems. First, create your null and alternative hypotheses. Then check your conditions to see if you can do the test. Next, calculate your test statistic and p-value. Then compare your test statistic and critical value, or p-value and alpha value. And finally, conclude based on your results.

I hope you found this video helpful, and if you did, please hit that subscribe button to support us making more of these videos for you. If you didn't, please leave us a comment down below to let us know what we can do better.
Thanks again for watching, and remember: you have big dreams, don't let a class get in the way.
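For anyone who wants to see the calculate, compare, and conclude steps from the video in one place, here is a minimal Python sketch of a left-tailed one-sample z-test on the income example. It is only an illustration under stated assumptions: the video gives the assumed mean (70,000) and the sample mean (65,000), but the population standard deviation (20,000), sample size (100), and alpha (0.05) are made-up values, and `left_tailed_z_test` is a hypothetical helper name. The create and check steps appear only as comments, since they depend on the specific problem.

```python
import math
from statistics import NormalDist

def left_tailed_z_test(sample_mean, assumed_mean, pop_std, n, alpha=0.05):
    """One-sample, left-tailed z-test.

    Create:  H0: mu >= assumed_mean   vs   Ha: mu < assumed_mean
    Check:   conditions (random sample, sample size, ...) are assumed to hold here.
    """
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

    # Calculate: standardize the sample mean, then take the area to its left.
    z = (sample_mean - assumed_mean) / (pop_std / math.sqrt(n))
    p_value = std_normal.cdf(z)

    # Compare: the critical value z* cuts off an area of alpha in the left tail.
    z_star = std_normal.inv_cdf(alpha)
    reject = p_value < alpha  # same decision as z < z_star for a left-tailed test

    # Conclude: either "reject" or "fail to reject", never "accept".
    decision = "reject H0" if reject else "fail to reject H0"
    return z, p_value, z_star, decision

# pop_std, n, and alpha are illustrative assumptions, not values from the video.
z, p_value, z_star, decision = left_tailed_z_test(65_000, 70_000, pop_std=20_000, n=100)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, z* = {z_star:.3f} -> {decision}")
```

With these made-up inputs the p-value comes out well below 0.05, so the sketch lands on rejecting the null hypothesis; a sample mean much closer to 70,000 would land on failing to reject instead.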