Latest revision as of 17:43, 18 March 2024

Proportion estimation is another common task for sample statistics.

The sample proportion is

${\hat {p}}={\frac {y}{n}}$

where $y$ is the number of subjects in the sample with a particular trait, and $n$ is the sample size.

The sampling distribution of ${\hat {p}}$ has mean and standard deviation

$\mu _{\hat {p}}=p,\sigma _{\hat {p}}={\sqrt {\frac {p(1-p)}{n}}}$

Since $p$ is unknown, we estimate the standard deviation by plugging in ${\hat {p}}$, which gives the standard error

$SE={\sqrt {\frac {{\hat {p}}(1-{\hat {p}})}{n}}}$
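The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function names `sample_proportion` and `standard_error` are made up for this page, and the counts in the usage lines are invented example values.

```python
import math


def sample_proportion(y: int, n: int) -> float:
    """Return p-hat = y / n for y subjects with the trait out of n."""
    if n <= 0 or not 0 <= y <= n:
        raise ValueError("need n > 0 and 0 <= y <= n")
    return y / n


def standard_error(y: int, n: int) -> float:
    """Return SE = sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = sample_proportion(y, n)
    return math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical data: 30 of 100 sampled subjects have the trait.
p_hat = sample_proportion(30, 100)   # 0.3
se = standard_error(30, 100)         # roughly 0.0458
```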

Assumptions

We assume that

- A random sample was taken
- $y\geq 5$ and $n-y\geq 5$
  - rooted in the normal approximation of the binomial distribution
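The count condition is easy to check mechanically. The helper below is a minimal sketch with a hypothetical name, `normal_approximation_ok`; it only tests the two counts, since whether the sample was random cannot be verified from the numbers alone.

```python
def normal_approximation_ok(y: int, n: int, minimum: int = 5) -> bool:
    """Check y >= 5 and n - y >= 5 (the threshold 5 is the one used on this page)."""
    return y >= minimum and (n - y) >= minimum


# Hypothetical counts:
normal_approximation_ok(30, 100)  # True: 30 with the trait, 70 without
normal_approximation_ok(3, 100)   # False: too few subjects with the trait
normal_approximation_ok(97, 100)  # False: too few subjects without the trait
```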

Wilson-Adjusted CI for p

Adjusting the sample proportion gives a more reliable confidence interval, especially for small samples. We do this with the Wilson-Adjusted estimate for $p$

${\widetilde {p}}={\frac {y+2}{n+4}}$

This essentially adds four observations to the sample: two that have the trait and two that do not. The adjustment has little effect on large samples but works well with small samples.

The adjusted estimate has standard error

$SE({\widetilde {p}})={\sqrt {\frac {{\widetilde {p}}(1-{\widetilde {p}})}{n+4}}}$

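Remember that the confidence interval is the point estimate plus or minus a multiple of the standard error.

${\widetilde {p}}$ is pulled slightly towards $0.5$ relative to ${\hat {p}}$, but results in better CIs for $p$. One explanation: at the 95% level $z^{2}\approx 4$, so ${\widetilde {p}}$ is approximately $(y+z^{2}/2)/(n+z^{2})$, the centre of the Wilson score interval, whose actual coverage stays closer to the nominal level than the unadjusted interval when $n$ is small or $p$ is near 0 or 1.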
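To see the pull towards $0.5$ numerically, here is a small sketch of the adjusted estimate and its standard error. The function name `wilson_adjusted` and the example counts are illustrative assumptions, not part of the original page.

```python
import math


def wilson_adjusted(y: int, n: int) -> tuple[float, float]:
    """Return (p-tilde, SE(p-tilde)) using the +2 / +4 adjustment."""
    if n <= 0 or not 0 <= y <= n:
        raise ValueError("need n > 0 and 0 <= y <= n")
    p_tilde = (y + 2) / (n + 4)
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde, se


# Hypothetical small sample: 2 of 10 subjects have the trait.
# p-hat would be 0.2; the adjusted estimate is 4/14, about 0.286,
# which is closer to 0.5.
p_tilde, se = wilson_adjusted(2, 10)
```

Note that with $y=0$ the unadjusted standard error is exactly zero, while the adjusted one is not, which is one practical reason the adjustment behaves better at the extremes.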

Confidence Interval

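We use the normal distribution (a z score rather than a t score) because the standard error of a proportion depends only on $p$, which is bounded between 0 and 1; there is no separate variance parameter to estimate, as there is for a sample mean, so there is no extra error to account for.

Remember that the confidence interval is just the point estimate plus or minus the margin of error, and the margin of error is just the z score multiplied by the standard error (since we are using the normal distribution): ${\widetilde {p}}\pm z_{\alpha /2}\cdot SE({\widetilde {p}})$.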

Notably, it is possible to compute a bound above 1 or below 0. This usually happens when the point estimate is close to 0 or 1. In this case, instead of listing the impossible bound, we cut the interval off at 0 or 1 and report that it has been cut off.
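Putting the pieces together, the following sketch computes a Wilson-Adjusted interval and cuts it off at 0 and 1. The function name `wilson_adjusted_ci` and the returned `truncated` flag are assumptions made for this example. The level is fixed at 95% because the $+2$ and $+4$ adjustment is tied to $z\approx 2$; the z score comes from `statistics.NormalDist` in the Python standard library.

```python
import math
from statistics import NormalDist


def wilson_adjusted_ci(y: int, n: int) -> tuple[float, float, bool]:
    """Return (lower, upper, truncated) for a 95% Wilson-Adjusted CI.

    `truncated` is True when a raw bound fell outside [0, 1] and was cut off.
    """
    if n <= 0 or not 0 <= y <= n:
        raise ValueError("need n > 0 and 0 <= y <= n")
    z = NormalDist().inv_cdf(0.975)          # about 1.96 for a 95% interval
    p_tilde = (y + 2) / (n + 4)
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    margin = z * se                          # margin of error
    raw_lower, raw_upper = p_tilde - margin, p_tilde + margin
    truncated = raw_lower < 0.0 or raw_upper > 1.0
    return max(0.0, raw_lower), min(1.0, raw_upper), truncated


# Hypothetical sample with no subjects showing the trait: the raw lower
# bound is negative, so it is reported as 0 and flagged as cut off.
lower, upper, truncated = wilson_adjusted_ci(0, 10)
```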


Proportion Estimation: Difference between revisions
