A hypothesis test is a technique in which sample data are used to determine whether the data, or the corresponding confidence interval, support a particular claim. Hypothesis tests quantify how likely our data are if a particular claim is true.
This page focuses on the use of hypothesis tests in the context of mean comparison.
Procedure (Mean Comparison)
1. Null and Alternative Hypothesis
To perform a hypothesis test for mean comparison, we need two things:
- The null hypothesis <math>H_0</math> is the statement which we assume to be true.
- The alternative hypothesis <math>H_a</math> is the complement of the null hypothesis.
Mean comparison works with the difference in means <math>\mu_1 - \mu_2</math>.
As such, there are three sets of hypotheses:
- <math>H_0: \mu_1 - \mu_2 = 0</math> vs <math>H_a: \mu_1 - \mu_2 \neq 0</math>
- <math>H_0: \mu_1 - \mu_2 = 0</math> vs <math>H_a: \mu_1 - \mu_2 > 0</math>
- <math>H_0: \mu_1 - \mu_2 = 0</math> vs <math>H_a: \mu_1 - \mu_2 < 0</math>
2. Test-Statistic
Next, we need to calculate a test-statistic <math>t_s</math>. This measures how much our sample data differ from <math>H_0</math>. It summarizes our data into a single number on which to perform the hypothesis test.
For mean comparison, the hypothesized difference is 0 (i.e. the means are the same). Therefore, the test-statistic is calculated as follows:
<math>
t_s = \frac{(\bar{Y}_1 - \bar{Y}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
</math>
where the mean difference has 0 subtracted from it since that is the comparison point; all three sets of hypotheses in mean comparison compare against 0.
The larger the magnitude of <math>t_s</math>, the more our data differ from <math>H_0</math>. Notice that it increases with the sample mean difference and decreases with the variance.
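As a rough illustration, here is a minimal Python sketch of this calculation using made-up summary statistics (the numbers and variable names are purely hypothetical):

<syntaxhighlight lang="python">
import math

# Hypothetical summary statistics for two samples (made-up numbers)
y1_bar, s1, n1 = 5.2, 1.1, 30   # sample 1: mean, std deviation, size
y2_bar, s2, n2 = 4.6, 1.4, 25   # sample 2: mean, std deviation, size

# Standard error of the difference in sample means
se = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Test statistic: (observed difference - hypothesized difference of 0) / SE
t_s = ((y1_bar - y2_bar) - 0) / se
print(round(t_s, 3))   # about 1.74 for these numbers
</syntaxhighlight>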
3. Find P-value
The p-value is the probability of observing our data or more extreme if <math>H_0</math> is in fact true. To find this, we first need to know the sampling distribution of our random variable.
Distribution
In the case of mean comparison, because the sample mean has a normal distribution, by linear combination of random variables the sampling distribution of <math>\bar{Y}_1 - \bar{Y}_2</math> is
<math>
\bar{Y}_1 - \bar{Y}_2 \sim N\!\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)
</math>
In practice the population variances are estimated by the sample variances, so both standardized means follow the t-distribution; therefore, the difference also follows a t-distribution.
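To make the sampling distribution of <math>\bar{Y}_1 - \bar{Y}_2</math> concrete, here is a small simulation sketch (the population parameters are hypothetical and not part of the original derivation):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
mu1, sigma1, n1 = 5.0, 1.1, 30   # hypothetical population 1
mu2, sigma2, n2 = 4.5, 1.4, 25   # hypothetical population 2

# Simulate many values of Ybar_1 - Ybar_2
diffs = np.array([
    rng.normal(mu1, sigma1, n1).mean() - rng.normal(mu2, sigma2, n2).mean()
    for _ in range(10_000)
])

print(diffs.mean())                      # close to mu1 - mu2 = 0.5
print(diffs.var())                       # close to the theoretical variance below
print(sigma1**2 / n1 + sigma2**2 / n2)   # sigma1^2/n1 + sigma2^2/n2, about 0.119
</syntaxhighlight>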
We are not going to derive it, but the degrees of freedom of the resulting t-distribution are
<math>
df = \upsilon = \frac{ \left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2 }{ \frac{(s_1^2 / n_1)^2}{n_1 - 1} + \frac{(s_2^2 / n_2)^2}{n_2 - 1} }
</math>
where <math>\upsilon</math> is rounded down when using the t-table.
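Continuing the hypothetical numbers from the earlier sketch, the degrees of freedom could be computed like this (again, just a sketch):

<syntaxhighlight lang="python">
import math

# Same made-up summary statistics as before
s1, n1 = 1.1, 30
s2, n2 = 1.4, 25

v1, v2 = s1**2 / n1, s2**2 / n2

# Welch-Satterthwaite approximation for the degrees of freedom
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
print(df)              # roughly 45 for these numbers
print(math.floor(df))  # round DOWN when using a t-table
</syntaxhighlight>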
Now that we know the degrees of freedom and the test-statistic to compare against, we can calculate the p-value.
P-value
In the case of mean comparison, we have the following p-values:
- For <math>H_a: \mu_1 - \mu_2 \neq 0</math>, the p-value is <math>2 \cdot P(T \geq |t_s|)</math>
  - Two tails
- For <math>H_a: \mu_1 - \mu_2 > 0</math>, the p-value is <math>P(T \geq t_s)</math>
  - Upper tail
- For <math>H_a: \mu_1 - \mu_2 < 0</math>, the p-value is <math>P(T \leq t_s)</math>
  - Lower tail

where <math>T</math> is a t-distributed random variable with <math>\upsilon</math> degrees of freedom.
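A sketch of how these three p-values could be computed with scipy, plugging in the hypothetical test statistic and degrees of freedom from the earlier sketches:

<syntaxhighlight lang="python">
from scipy import stats

t_s = 1.74   # hypothetical test statistic
df = 45      # hypothetical Welch degrees of freedom, rounded down

p_two_tailed = 2 * stats.t.sf(abs(t_s), df)   # H_a: mu_1 - mu_2 != 0
p_upper_tail = stats.t.sf(t_s, df)            # H_a: mu_1 - mu_2 > 0
p_lower_tail = stats.t.cdf(t_s, df)           # H_a: mu_1 - mu_2 < 0

print(p_two_tailed, p_upper_tail, p_lower_tail)
</syntaxhighlight>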
The smaller the p-value, the less likely it is to observe our data or more extreme if <math>H_0</math> is true, meaning that our data are unlikely if the claim in <math>H_0</math> is true.
4. Conclusion
We decide a cutoff point for our p-values, typically at <math>\alpha = 0.05</math>, called the level of significance.
If the p-value is less than <math>\alpha</math>, our data support <math>H_a</math>; therefore, <math>H_0</math> is rejected. Otherwise, we fail to reject <math>H_0</math>.
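In code, the decision step is just a comparison (the p-value below is the hypothetical value from the sketch above):

<syntaxhighlight lang="python">
alpha = 0.05      # chosen level of significance
p_value = 0.088   # hypothetical two-tailed p-value

if p_value < alpha:
    print("Reject H0: the data support H_a")
else:
    print("Fail to reject H0")
</syntaxhighlight>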
A CI that covers <math>0</math> implies that there is no significant difference, as it is plausible for the population means to be equal.
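Finally, for reference, the whole mean-comparison procedure above can be run in one call with scipy's ttest_ind, passing equal_var=False to request the unequal-variance (Welch) version that matches the formulas on this page. The data below are randomly generated and purely illustrative:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample1 = rng.normal(loc=5.2, scale=1.1, size=30)   # hypothetical raw data
sample2 = rng.normal(loc=4.6, scale=1.4, size=25)

# equal_var=False performs Welch's t-test (two-tailed by default)
result = stats.ttest_ind(sample1, sample2, equal_var=False)
print(result.statistic, result.pvalue)
</syntaxhighlight>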