Proportion Estimation

From Rice Wiki
Latest revision as of 17:43, 18 March 2024

Proportion estimation is another common task for sample statistics.

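We have sample proportion

<math>\hat{p} = \frac{y}{n}</math>

where <math>y</math> is the number of subjects in the sample with a particular trait, and <math>n</math> is the sample size.

We have

<math>E(\hat{p}) = p</math>

and standard error

<math>SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}</math>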
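As a minimal sketch of the two quantities above, assuming the usual definitions <math>\hat{p} = y / n</math> and <math>SE = \sqrt{\hat{p}(1 - \hat{p}) / n}</math>, the calculation might look like this in Python. The function names and the example counts (30 out of 100) are made up for illustration.

```python
import math


def sample_proportion(y: int, n: int) -> float:
    """Fraction of the n sampled subjects that have the trait (y of them)."""
    if n <= 0 or not 0 <= y <= n:
        raise ValueError("need n > 0 and 0 <= y <= n")
    return y / n


def standard_error(p_hat: float, n: int) -> float:
    """Estimated standard error of the sample proportion."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical data: 30 of 100 sampled subjects have the trait.
p_hat = sample_proportion(30, 100)
se = standard_error(p_hat, 100)
print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}")
```

Note that the standard error shrinks with the square root of the sample size, so quadrupling <math>n</math> only halves it.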

Assumptions

We assume that

  • A random sample was taken
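  • <math>n\hat{p}</math> and <math>n(1 - \hat{p})</math> are both large enough (a common rule of thumb is at least 5, sometimes 10)
    • rooted in the normal approximation of the binomial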
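The sample-size assumption can be checked mechanically. The sketch below assumes the usual form of the rule for the normal approximation of the binomial: the count of subjects with the trait and the count without it must both reach some minimum. The function name and the default threshold of 5 are assumptions, not values taken from this page; some texts use 10.

```python
def normal_approx_ok(y: int, n: int, min_count: int = 5) -> bool:
    """Check that both n*p_hat (= y) and n*(1 - p_hat) (= n - y) are large enough.

    min_count is an assumed rule-of-thumb threshold, not a fixed standard.
    """
    if n <= 0 or not 0 <= y <= n:
        raise ValueError("need n > 0 and 0 <= y <= n")
    return y >= min_count and (n - y) >= min_count


print(normal_approx_ok(30, 100))  # both counts are well above the threshold
print(normal_approx_ok(1, 20))    # only one subject has the trait
```

Working with the counts <math>y</math> and <math>n - y</math> directly avoids floating-point rounding in <math>n\hat{p}</math>.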

Wilson-Adjusted CI for p

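Adjusting the sample proportion gives better-behaved confidence intervals, especially for small samples. We do this with the Wilson-adjusted estimate for <math>p</math>:

<math>\widetilde{p} = \frac{y + 2}{n + 4}</math>

This essentially adds four observations to the sample: two that have the trait and two that do not. The adjustment has little effect on large samples but works well with small samples.

The adjusted estimate has standard error

<math>SE_{\widetilde{p}} = \sqrt{\frac{\widetilde{p}(1 - \widetilde{p})}{n + 4}}</math>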

Remember that the confidence interval is the point estimate plus or minus the margin of error (a multiple of the standard error), not the standard error alone.

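<math>\widetilde{p}</math> is slightly skewed towards <math>0.5</math>, but results in better CIs for <math>p</math>. A likely reason: for a 95% interval <math>z^2 \approx 4</math>, so <math>\widetilde{p}</math> is close to <math>\frac{y + z^2/2}{n + z^2}</math>, the center of the Wilson score interval, which holds its coverage well even for small samples.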
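To make the adjustment concrete, here is a small sketch that computes the Wilson-adjusted estimate <math>(y + 2) / (n + 4)</math> and its standard error, using <math>n + 4</math> as the effective sample size. The function name and the example counts (1 out of 10) are hypothetical.

```python
import math


def wilson_adjusted(y: int, n: int) -> tuple[float, float]:
    """Return (p_tilde, standard error) for the Wilson-adjusted estimate.

    Two observations with the trait and two without are added to the sample.
    """
    if n <= 0 or not 0 <= y <= n:
        raise ValueError("need n > 0 and 0 <= y <= n")
    n_adj = n + 4
    p_tilde = (y + 2) / n_adj
    se = math.sqrt(p_tilde * (1 - p_tilde) / n_adj)
    return p_tilde, se


# Hypothetical small sample: 1 of 10 subjects has the trait.
p_hat = 1 / 10
p_tilde, se_tilde = wilson_adjusted(1, 10)
print(f"p_hat = {p_hat:.3f}, p_tilde = {p_tilde:.3f}, SE = {se_tilde:.4f}")
```

With these numbers the adjusted estimate is 3/14, noticeably closer to 0.5 than the raw 1/10. For a large sample the two extra pairs of observations barely move the estimate.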

Confidence Interval

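We use the ''normal distribution'' (z scores) rather than the t distribution: <math>p</math> is bounded between 0 and 1, and the standard error depends only on the proportion itself, so there is no extra error from estimating extra parameters, as there is when a standard deviation must be estimated alongside a sample mean.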

Remember that the confidence interval is just the point estimate plus or minus the margin of error, and the margin of error is the z score multiplied by the standard error (since we are using the normal distribution).

Notably, it is possible to get a bound above 1 or below 0. This usually happens when the point estimate is close to 0 or 1. In this case, instead of listing the impossible bounds, we report that they have been cut off at 0 or 1.
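Putting the pieces together, a confidence interval for a proportion could be sketched as below: point estimate plus or minus z times the standard error, with impossible bounds cut off at 0 or 1 and a flag recording that this happened. The function name, the <code>adjusted</code> switch and the <code>clipped</code> flag are illustrative choices, not part of any particular library; the z score comes from Python's standard <code>statistics.NormalDist</code>.

```python
import math
from statistics import NormalDist


def proportion_ci(y: int, n: int, confidence: float = 0.95,
                  adjusted: bool = False) -> tuple[float, float, bool]:
    """Normal-approximation CI for a proportion.

    Returns (lower, upper, clipped); clipped is True when a bound fell
    outside [0, 1] and was cut off.
    """
    if n <= 0 or not 0 <= y <= n:
        raise ValueError("need n > 0 and 0 <= y <= n")
    if adjusted:
        # Wilson adjustment: add two subjects with the trait and two without.
        y, n = y + 2, n + 4
    p = y / n
    se = math.sqrt(p * (1 - p) / n)
    # Two-sided z score, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * se
    lower, upper = p - margin, p + margin
    clipped = lower < 0 or upper > 1
    return max(0.0, lower), min(1.0, upper), clipped


# Hypothetical sample with a point estimate close to 0: 1 of 20.
lower, upper, clipped = proportion_ci(1, 20)
note = " (lower bound cut off at 0)" if clipped and lower == 0.0 else ""
print(f"95% CI: [{lower:.3f}, {upper:.3f}]{note}")
```

Passing <code>adjusted=True</code> applies the same formula to the Wilson-adjusted counts, which is one way to compare the two intervals on the same data.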