Lec 16, Hypothesis Testing- I

IIT Roorkee July 2018

Hypothesis testing, Null hypothesis, alternative hypothesis, Type I error, Type II error, Population...

Video Transcript:

Welcome, students. Today we are entering a very interesting topic, hypothesis testing. This topic is going to be fundamental for the coming lectures, so you should understand it carefully. The class objectives are as follows. First, I will explain how to develop null and alternative hypotheses, because solving a hypothesis problem is easy; the most important part is how to formulate the hypothesis. Once you are good at formulating the hypothesis, solving the problem is very easy. Then I am going to explain what Type I and Type II errors are and how they are connected with hypothesis testing. Next, we are going to do hypothesis testing when sigma, the population standard deviation, is known; then hypothesis testing when the population standard deviation is not known; and then hypothesis testing for the population proportion.

First, what is hypothesis testing? Hypothesis testing can be used to determine whether a statement about the value of a population parameter should or should not be rejected. So a hypothesis is nothing but an assumption about a population parameter. We know that most of the populations we are going to deal with follow a particular distribution, for example a normal distribution. The normal distribution has two parameters, the mean and the variance. So we can take an assumption about the population mean as a hypothesis, or we can take an assumption about the population variance as a hypothesis. The null hypothesis, denoted by H0, is a tentative assumption about the population parameter. The alternative hypothesis, denoted by Ha, is the opposite of what is stated in the null hypothesis. The hypothesis testing procedure uses data from a sample to test the two competing statements indicated by H0 and Ha. What are the two competing statements? One is the
null hypothesis and the other is the alternative hypothesis.

Next we will see how to develop null and alternative hypotheses. It is not always obvious how the null and alternative hypotheses should be formulated. We should be careful to structure the hypotheses appropriately so that the test conclusion provides the information the researcher wants. The context of the situation is very important in determining how the hypotheses should be stated. In some cases it is easier to identify the alternative hypothesis first; in other cases the null is easier. So correct hypothesis formulation takes practice. In this lecture we are going to take some examples, and I am going to explain how to formulate the null and the alternative hypothesis.

First, start with the alternative hypothesis as a research hypothesis. Most of the time, the researcher wants to prove the alternative hypothesis. Many applications of hypothesis testing involve an attempt to gather evidence in support of a research hypothesis. In such cases it is often best to begin with the alternative hypothesis and make it the conclusion that the researcher hopes to support. So first you write the alternative hypothesis. The conclusion that the research hypothesis is true is made if the sample data provide sufficient evidence to show that the null hypothesis can be rejected. So if we want to accept the alternative hypothesis, the data collected from the sample have to support rejecting the null hypothesis.

Next we will see some examples of the alternative hypothesis as a research hypothesis. In the first example, a new manufacturing method is believed to be better than the current method. Assume that in your manufacturing context someone is proposing a new way of doing the work, a new manufacturing method, and we want to test this assumption. So what is the alternative hypothesis? The new manufacturing method is better, because that new method was
given by the researcher, and the researcher will always believe that there is support for whatever he says. So first we formulate the alternative hypothesis: the new manufacturing method is better. The null hypothesis is just the complement of the alternative hypothesis: the new method is no better than the old method.

We will take another example. A new bonus plan is developed in an attempt to increase sales. An organization is introducing a new bonus plan, and we are going to see whether the bonus has any impact on sales. So what is the alternative hypothesis? The new bonus plan increases sales. Then what is the null hypothesis? The new bonus plan does not increase sales. You see that the null hypothesis always says "does not increase". That is why it is called null: nothing has happened. That is the meaning of null.

We will go for another example of the alternative hypothesis. A new drug is developed with the goal of lowering the cholesterol level more than the existing drug. There are already some drugs available to lower cholesterol, and a researcher has found a drug that reduces cholesterol better than the existing medicine. So the alternative hypothesis is: the new drug lowers the cholesterol level more than the existing drug. The null hypothesis is: the new drug does not lower the cholesterol level more than the existing drug. You see the "does not"; this "does not" represents the null. Nothing significant has happened, and that is why we call it the null hypothesis.

Then, the null hypothesis as an assumption to be challenged. We might begin with the belief or assumption that a statement about the value of a population parameter is true. So in the hypothesis testing context we always start the problem assuming that the null hypothesis is true. For example, in India, before starting a
trial, suppose somebody has been accused. The trial is started assuming that the person is innocent. The police have to bring evidence and show that the person is not innocent. In some other countries the person who is suspected has to prove his innocence, so it is the reverse. What does this mean? Even though something has happened, if there is no evidence, the person goes free. We then use a hypothesis test to challenge the assumption and determine whether there is statistical evidence to conclude that the assumption is incorrect. In this situation it is helpful to develop the null hypothesis first.

We will take an example of how to develop the null hypothesis as an assumption to be challenged. The label on a milk bottle states that it contains 1000 ml. Null hypothesis: the label is correct, so H0: mu ≥ 1000 ml. Another hint is that the null hypothesis is nothing but the status quo, and in the null hypothesis there will always be an equal-to sign. The null hypothesis looks at things from an optimistic perspective: if somebody says the bottle contains 1000 ml, we assume that yes, that statement is correct. So we formulate the null hypothesis as mu ≥ 1000 ml. Alternative hypothesis: the label is incorrect, Ha: mu < 1000 ml.

You see that the signs are complementary. If the null hypothesis has greater than or equal to, the alternative has less than. If the null hypothesis has less than or equal to, the alternative has greater than. If the null hypothesis has equal to, the alternative has not equal to. The null hypothesis always has an equal-to sign; the alternative hypothesis never contains an equal-to sign. The status quo goes to the null hypothesis. When we want to challenge the status quo, that is nothing but the alternative
hypothesis. So you see the nature of the null hypothesis: the equality part of the hypotheses always appears in the null hypothesis. Equal to means it is equivalent to null; nothing has happened, and the status quo is maintained as it is.

In general, a hypothesis test about the value of a population mean mu must take one of the following three forms, where mu0 is the hypothesized value of the population mean.

The first form is H0: mu ≥ mu0 against Ha: mu < mu0. The null hypothesis has greater than or equal to, so we write less than in the alternative hypothesis, because the signs are complementary. This is a one-tailed test, called a lower-tailed test. Why do we call it a lower-tailed test? Look at the sign of the alternative hypothesis. The sign of the alternative hypothesis is less than mu0, so it is a left-tailed test: if the value falls beyond the boundary on the left-hand side, we reject the null hypothesis.

The second form is H0: mu ≤ mu0 against Ha: mu > mu0. Here the null has less than or equal to, so the complementary sign is greater than. This is also one-tailed; it is called an upper-tailed test. Look at the sign of the alternative hypothesis: it is greater than, so it is a right-tailed test. If the value falls beyond the boundary on the right-hand side, the null hypothesis is rejected.

The last form has the equal-to sign: H0: mu = mu0 against Ha: mu ≠ mu0. This is called a two-tailed test. In a two-tailed test the rejection area is on both sides: if the value goes below the lower boundary we reject, and if the value goes above the upper boundary we reject. What I mean by "value" I will explain shortly. We will take an example to do a
hypothesis test. A major hospital in Chennai provides one of the most comprehensive emergency medical services in the world. Operating in a multiple-hospital system with approximately ten mobile medical units (not a hundred, ten), the service goal is to respond to medical emergencies with a mean time of 8 minutes or less. So the situation is that they have ten mobile medical units, and whenever there is an emergency they have to respond in 8 minutes or less. The director of medical services wants to formulate a hypothesis test that could use a sample of emergency response times to determine whether or not the service goal of 8 minutes or less is being achieved.

Look at this problem. The director wants to test whether the service goal of 8 minutes or less is being achieved. The status quo is 8 minutes or less, and the status quo goes to the null hypothesis. So what is the null hypothesis? The emergency service is meeting the response goal, so no follow-up action is required. This is another reason it is called the null hypothesis: when you accept the null hypothesis, no follow-up action, no course of action, is required. Why do we say H0: mu ≤ 8? Because that is the status quo; the null hypothesis always looks at things from the optimistic perspective. The opposite of this is that the emergency service is not meeting the response goal, so appropriate follow-up action is necessary. That is the alternative hypothesis, Ha: mu > 8. The null has less than or equal to 8, so the complementary sign in the alternative is greater than 8. Why do we write mu ≤ 8 in the null? Because the status quo should go to the null hypothesis. Here H0: mu ≤ 8 and Ha: mu > 8,
where mu is the mean response time for the population of medical emergency requests.

Now we will go to what a Type I error is. Because hypothesis tests are based on sample data, we must allow for the possibility of errors. The conclusion of a hypothesis test, to accept or to reject, is based on sample data, so there is always a possibility of error. A Type I error is rejecting H0 when it is true. As I told you in the court context, somebody is pleading that he is innocent, and he really is innocent, but the judge does not accept it. His innocence was rejected; that incorrect rejection is a Type I error. The probability of making a Type I error when the null hypothesis is true as an equality is called the level of significance. We call the level of significance alpha, and most of the time it is 5 percent. What is the meaning of this 5 percent? The probability of an incorrect rejection is only 5 percent. Applications of hypothesis testing that only control the Type I error are often called significance tests.

A Type II error is accepting H0 when it is false. It is difficult to control the probability of making a Type II error. Statisticians avoid the risk of making a Type II error by using "do not reject H0" instead of "accept H0". In the hypothesis testing context, when we conclude, we do not say "accept the null hypothesis"; we say "do not reject the null hypothesis", because there is no proof that the null hypothesis is true.

See the population condition against the conclusion. When the population condition is that H0 is true but you reject H0, that is a Type I error, an incorrect rejection. In the other case, H0 is false but you have accepted it; that is a Type II error. So another name for a Type I error is incorrect rejection, and for a Type II error it is false acceptance. We can take another
example. The producer's risk, which we call alpha, is the Type I error; beta, the consumer's risk, is the Type II error. What is the meaning of producer's risk and consumer's risk? Assume that I am the manufacturer, the producer, and I am producing shafts whose diameter is, say, 50 mm. Suppose a supplier comes and takes a sample from my production lot, and then he rejects my lot. He says that my production is not meeting the specification of 50 mm. There are two possibilities: either the way the supplier has measured is wrong, or the sample I kept for him is not correct. So that is an incorrect rejection. Even though I have good-quality products, they have been rejected. That incorrect rejection is called the producer's risk.

There is another possibility. Assume that I am making shafts of only 49 mm. Again the supplier comes. The diameter is 49, but he measures it as 50, and he accepts my lot. That is a false acceptance; it is called a Type II error. Again there are two possibilities. One is that the sample I kept meets all his requirements, but my whole lot does not meet his requirements; that means my sample is not representative of the population. Otherwise, the way they have measured is wrong. So that is false acceptance, a Type II error. In the next lecture we will see the application of the Type II error in detail.

There are three approaches to hypothesis testing. The first is the p-value approach; most statistical packages follow this method. The second is the critical value method. The third is the confidence interval method, which is mostly used for two-tailed tests.

First we will go for the p-value approach to one-tailed hypothesis testing. What is the p-value? The p-value is a probability, computed using the test statistic (be careful here: the test statistic), that
measures the support, or lack of support, provided by the sample for the null hypothesis. So the p-value says whether the sample supports the null hypothesis or not. If the p-value is high, it supports the null hypothesis, and you do not reject the null hypothesis. If the p-value is small, it does not support the null hypothesis, and we reject the null hypothesis.

What is the test statistic? In the z context, the test statistic is z = (x-bar - mu) / (sigma / sqrt(n)). For a t-test it is t = (x-bar - mu) / (s / sqrt(n)), with n - 1 degrees of freedom. So whatever value you have calculated with the help of the sample is called the test statistic. If the p-value is less than or equal to alpha, then the value of the test statistic is in the rejection region; I will show you on the next slide. Reject H0 if the p-value is less than or equal to alpha, where alpha is the significance level. When the p-value is very small, it is not supporting the null hypothesis, so we have to reject it.

We will see how to do hypothesis testing using the p-value approach. In the first case, assume that alpha equal to 10 percent is given. We have to calculate the test statistic, that is, the z value: x-bar, the sample mean, will be given; mu is the population mean we have assumed; sigma must be given; and we divide by root n. Assume that this value comes out to -1.46. For -1.46, what is the corresponding left-side area? That area is our p-value. How do we get it? When the z value is -1.46, we can get the corresponding area of the normal distribution on the left-hand side with the help of Python. For that you first have to import scipy, so we import the library with from scipy import stats. Then the left
side area for a z statistic of -1.46 comes from the cumulative distribution function: stats.norm.cdf(-1.46) gives a probability of about 0.07. Alpha is 10 percent, so the p-value is less than alpha. On the slide, the region beyond the boundary is the rejection region and the other region is the acceptance region. With z equal to -1.46 we are standing on the left-hand side, on the rejection side, so we have to reject the null hypothesis. This was the left-sided test, the lower-tailed test.

Now suppose it is a right-tailed test, and say the calculated z value is 2.29; we got it from some x-bar, mu, sigma, and root n values. Assume that alpha is 4 percent. When alpha is 4 percent, we mark the area of 0.04 from right to left. With a z value of 2.29, what is the corresponding area towards the right side? We can also find 1.75, the z value that corresponds to alpha equal to 0.04. For a z value of 2.29, stats.norm.cdf(2.29) gives the left-side area, and 1 - stats.norm.cdf(2.29) gives the right-side area. The first computation is actually not required here, because alpha is given directly; it is only a check, to show how we go from a z value to a probability. So the z value is 2.29 and we want the right-side area: 1 minus the corresponding left-side area gives 0.011. Now look at alpha: the p-value is less than alpha. On the slide, this side is the rejection region and that side is the acceptance region. With a p-value of about 0.01 you are still standing in the rejection region, so we have to reject the null hypothesis. If the p-value were 0.05, you would have crossed the boundary; after crossing the boundary you land in the acceptance region, so we would have to accept the null hypothesis.

I will go to another method, the critical value approach for one-tailed hypothesis testing. The test statistic z has the standard normal probability distribution. We can use the standard normal probability distribution table to find the z value with an area of alpha in the lower tail or upper tail of the distribution. For example, say alpha is 0.05, so the tail area is 0.05. With the help of Python, when alpha is 0.05 you can get the z value; this is for a lower-tailed test. For an upper-tailed test with alpha equal to 0.05, you can also find the corresponding z value in Python: if you want the z value for the right side, you use 1 minus 0.05, and for that probability you find the corresponding z value. The value of the test statistic that establishes the boundary
of the rejection region is called the critical value of the test. For 5 percent you get 1.645 on the upper side, and -1.645 on the lower side; this -1.645 is the critical value. The rejection rule: for a left-tailed test, reject H0 if your calculated z value is less than -1.645, because you will be standing on the rejection side. For a right-tailed test, if the calculated z value is greater than the table value, then you have to reject H0.

For example, sigma is known and alpha is 10 percent. When alpha is 10 percent, the corresponding z value is -1.28; this is our critical value. With the help of the sample data you find the z value. If the z value lies on the rejection side of the critical value, you have to reject H0; if it lies on the other side, you have to accept it. When the area is 0.1, the corresponding z value is -1.28, and going back three slides, it is the same -1.28. Now we will go to the upper-tailed test. When alpha is 0.05, the right-side area is 0.05, so the left-side area is 0.95. When the left-side area is 0.95, the corresponding z value is 1.