Proportion Estimation
Proportion estimation is another common task for sample statistics.
We define the sample proportion
<math>
\hat{p} = \frac{y}{n}
</math>
where <math>y</math> is the number of subjects in the sample with a
particular trait, and <math>n</math> is the sample size.
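If <math>p</math> is the population proportion, the sampling distribution
of <math>\hat{p}</math> has mean and standard deviation
<math>
\mu_{\hat{p}} = p, \quad \sigma_{\hat{p}} = \sqrt{\frac{p (1 - p)}{n}}
</math>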
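As a check on these two formulas, the minimal Python sketch below computes the exact mean and standard deviation of <math>\hat{p}</math> by summing over every possible count. It assumes the count <math>y</math> follows a Binomial(<math>n</math>, <math>p</math>) distribution, the model behind the normal approximation mentioned under Assumptions. The helper name <code>phat_moments</code> and the values <math>n = 20</math>, <math>p = 0.3</math> are arbitrary illustration choices.

```python
import math


def phat_moments(n: int, p: float) -> tuple[float, float]:
    """Exact mean and SD of p_hat = Y / n when Y ~ Binomial(n, p)."""
    # Probability of each possible count y = 0, 1, ..., n
    pmf = [math.comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)]
    mean = sum(prob * (y / n) for y, prob in enumerate(pmf))
    var = sum(prob * (y / n - mean) ** 2 for y, prob in enumerate(pmf))
    return mean, math.sqrt(var)


n, p = 20, 0.3
mean, sd = phat_moments(n, p)
print(mean, p)                          # both about 0.3
print(sd, math.sqrt(p * (1 - p) / n))   # both about 0.1025
```

The exact moments agree with <math>\mu_{\hat{p}} = p</math> and <math>\sigma_{\hat{p}} = \sqrt{p(1-p)/n}</math> up to floating-point rounding.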
Since <math>p</math> is unknown, we estimate <math>\sigma_{\hat{p}}</math>
with the standard error
<math>
SE = \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}}
</math>
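A minimal Python sketch of the sample proportion and its standard error follows. The function name <code>sample_proportion</code> is hypothetical, and the counts (18 of 60 subjects with the trait) are made-up illustration values.

```python
import math


def sample_proportion(y: int, n: int) -> tuple[float, float]:
    """Return (p_hat, SE) for y subjects with the trait out of n."""
    if not 0 <= y <= n or n <= 0:
        raise ValueError("need 0 <= y <= n and n > 0")
    p_hat = y / n
    # Plug-in standard error: p is unknown, so p_hat takes its place
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se


p_hat, se = sample_proportion(y=18, n=60)
print(p_hat, round(se, 4))  # 0.3 0.0592
```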
= Assumptions =
We assume that
* A random sample was taken
* <math>y \geq 5</math> and <math>n - y \geq 5</math>
** This condition comes from the normal approximation to the binomial distribution.
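The count condition in the list above can be checked mechanically; whether the sample was random cannot, so the sketch below only covers the second assumption. The function name <code>normal_approx_ok</code> and its <code>min_count</code> parameter are hypothetical.

```python
def normal_approx_ok(y: int, n: int, min_count: int = 5) -> bool:
    """Rule of thumb: y >= 5 and n - y >= 5 (randomness is not checkable here)."""
    # Both the number with the trait and the number without it must be large enough
    return y >= min_count and (n - y) >= min_count


print(normal_approx_ok(18, 60))  # True
print(normal_approx_ok(3, 60))   # False: too few subjects with the trait
print(normal_approx_ok(57, 60))  # False: too few subjects without the trait
```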
= Wilson-Adjusted CI for p =
''Adjusting'' the sample proportion gives a confidence interval whose
actual coverage is closer to the stated confidence level. We do this with
the '''Wilson-Adjusted estimate''' for <math>p</math>
<math>
\widetilde{p} = \frac{y + 2}{n + 4}
</math>
This amounts to adding four pseudo-observations to the data: two that have
the trait and two that do not. It has little effect on large samples but
noticeably improves the interval for small samples.
The corresponding standard error is
<math>
SE(\widetilde{p}) = \sqrt{\frac{\widetilde{p} (1 - \widetilde{p})}{n + 4}}
</math>
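The Wilson-adjusted estimate and its standard error can be sketched in Python as below. The sketch also checks the "four pseudo-observations" reading: <math>\widetilde{p}</math> equals the plain sample proportion of the data after appending two subjects with the trait and two without. The helper name <code>wilson_adjusted</code> and the sample (3 of 12) are hypothetical.

```python
import math


def wilson_adjusted(y: int, n: int) -> tuple[float, float]:
    """Return (p_tilde, SE(p_tilde)): add two with the trait and two without."""
    p_tilde = (y + 2) / (n + 4)
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde, se


# Hypothetical small sample: 3 of 12 subjects have the trait (p_hat = 0.25)
y, n = 3, 12
p_tilde, se = wilson_adjusted(y, n)

# Same value as the plain proportion of the augmented data:
# 1 = has the trait, 0 = does not, plus two pseudo-1s and two pseudo-0s
augmented = [1] * y + [0] * (n - y) + [1, 1, 0, 0]
assert math.isclose(p_tilde, sum(augmented) / len(augmented))

print(p_tilde, round(se, 4))  # 0.3125 0.1159
```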
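The 95% confidence interval is the estimate plus or minus 1.96 standard
errors:
<math>
\widetilde{p} \pm 1.96 \, SE(\widetilde{p})
</math>
<math>\widetilde{p}</math> is pulled slightly towards <math>0.5</math>,
but results in better CIs for <math>p</math>. One explanation: for a 95%
interval, <math>z^2 = 1.96^2 \approx 4</math>, and
<math>\frac{y + z^2/2}{n + z^2}</math> is the center of the Wilson score
interval, which has better coverage than the unadjusted interval,
especially when <math>n</math> is small or <math>p</math> is near 0 or 1.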
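To illustrate the link between the +2/+4 rule and the Wilson score interval, the Python sketch below compares <math>\hat{p}</math>, <math>\widetilde{p}</math>, and the exact Wilson score center <math>(y + z^2/2)/(n + z^2)</math> for a few made-up samples. It uses <code>statistics.NormalDist</code> from the standard library for the z value; the helper name <code>wilson_score_center</code> is hypothetical.

```python
from statistics import NormalDist


def wilson_score_center(y: int, n: int, conf: float = 0.95) -> float:
    """Center of the Wilson score interval: (y + z^2/2) / (n + z^2)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    return (y + z**2 / 2) / (n + z**2)


for y, n in [(0, 10), (3, 12), (18, 60)]:
    p_hat = y / n
    p_tilde = (y + 2) / (n + 4)  # Wilson-adjusted estimate
    center = wilson_score_center(y, n)
    print(f"y={y:2d} n={n:2d}  p_hat={p_hat:.4f}  "
          f"p_tilde={p_tilde:.4f}  wilson_center={center:.4f}")
```

In each row <math>\widetilde{p}</math> lands close to the Wilson center, and both sit between <math>\hat{p}</math> and 0.5, which is the "pull towards 0.5" described above.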
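= Confidence Interval =
We use the ''normal distribution'' (a z interval) rather than the t
distribution. The sampling distribution of <math>\hat{p}</math> is
approximately normal by the normal approximation to the binomial, and its
standard error is computed from the estimated proportion alone, so there
is no separate variance parameter to estimate. When estimating a mean, by
contrast, the unknown <math>\sigma</math> must be replaced by the sample
standard deviation, and the t distribution accounts for that extra error.
The confidence interval is the point estimate plus or minus the margin of
error, and the margin of error is the z critical value multiplied by the
standard error:
<math>
\text{estimate} \pm z \cdot SE
</math>
where <math>z = 1.96</math> for a 95% interval.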
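Putting the pieces together, a minimal Python sketch of "estimate plus or minus z times SE" is below. It uses the Wilson-adjusted estimate by default and the plain <math>\hat{p}</math> when <code>adjusted=False</code>. The function name <code>proportion_ci</code> and its parameters are hypothetical, and applying the +2/+4 adjustment at confidence levels other than 95% is an assumption of this sketch, since the rule is motivated by <math>1.96^2 \approx 4</math>.

```python
import math
from statistics import NormalDist


def proportion_ci(y: int, n: int, conf: float = 0.95,
                  adjusted: bool = True) -> tuple[float, float]:
    """Return (lower, upper) = estimate -/+ z * SE for a proportion."""
    if adjusted:
        # Wilson-adjusted: two pseudo-subjects with the trait, two without
        y, n = y + 2, n + 4
    est = y / n
    se = math.sqrt(est * (1 - est) / n)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided critical value
    margin = z * se
    return est - margin, est + margin


print(proportion_ci(3, 12))  # roughly (0.085, 0.540)
```

Bounds are returned as computed; they may fall outside [0, 1] when the estimate is near 0 or 1.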
Notably, a bound can fall ''above 1 or below 0''. This usually happens
when the point estimate is close to 0 or 1. Since <math>p</math> must lie
between 0 and 1, we do not report the impossible bound; instead we cut it
off at 0 or 1 and note that it has been truncated.
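The truncation rule can be sketched in Python as below. The helper name <code>truncate_ci</code> is hypothetical, and the example interval (1 of 20 subjects, unadjusted, 95%) is a made-up case chosen because <math>\hat{p}</math> is close to 0.

```python
import math


def truncate_ci(lower: float, upper: float) -> tuple[float, float, bool]:
    """Cut a proportion CI off at 0 and 1; also report whether anything was cut."""
    new_lower = max(0.0, lower)
    new_upper = min(1.0, upper)
    was_cut = (new_lower != lower) or (new_upper != upper)
    return new_lower, new_upper, was_cut


# Hypothetical unadjusted 95% interval with p_hat near 0 (y = 1, n = 20)
p_hat = 1 / 20
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / 20)
print(p_hat - margin)  # negative, which is impossible for p
print(truncate_ci(p_hat - margin, p_hat + margin))
```

Returning the <code>was_cut</code> flag alongside the bounds makes it easy to report that the interval was truncated rather than silently clipping it.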
[[Category:Sample Statistics]]
Latest revision as of 17:43, 18 March 2024