Let $Y_{1},Y_{2},\ldots ,Y_{n}$ be a sample of size $n$, where each $Y_{i}$ is a random variable drawn from the population.

Every $Y_{i}$ has the same distribution, which we don't know, and therefore the same mean and variance:

$E(Y_{i})=\mu ,Var(Y_{i})=\sigma ^{2}$

We then have the sample mean

${\bar {Y}}={\frac {1}{n}}\sum _{i=1}^{n}Y_{i}$

The expected value of the sample mean is $\mu$, which follows directly from the linearity of expectation.

The variance of the sample mean is ${\frac {\sigma ^{2}}{n}}$, which also follows directly, provided the $Y_{i}$ are independent.
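Both results can be written out in one line each. The variance step assumes the $Y_{i}$ are independent, so that the variance of the sum is the sum of the variances.

$E({\bar {Y}})={\frac {1}{n}}\sum _{i=1}^{n}E(Y_{i})={\frac {1}{n}}\cdot n\mu =\mu$

$Var({\bar {Y}})={\frac {1}{n^{2}}}\sum _{i=1}^{n}Var(Y_{i})={\frac {1}{n^{2}}}\cdot n\sigma ^{2}={\frac {\sigma ^{2}}{n}}$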

Central limit theorem

The central limit theorem (CLT) states that the distribution of the sample mean is approximately normal:

${\bar {Y}}\sim N(\mu ,{\frac {\sigma ^{2}}{n}})$

CLT applies as long as either of the following two conditions is satisfied:

The population distribution of $Y$ is normal, or
The sample size $n$ is large (as a rule of thumb, $n>30$), regardless of the population's distribution.
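The large-sample condition can be checked with a small simulation. The sketch below assumes an Exponential(1) population (mean 1, variance 1), which is clearly not normal; the function name `simulate_sample_means` and the choices $n=40$ and 5000 replicates are illustrative, not part of the original notes.

```python
import random
import statistics

def simulate_sample_means(n, replicates, seed=0):
    """Draw `replicates` samples of size n from an Exponential(1) population
    (mean 1, variance 1) and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replicates)
    ]

means = simulate_sample_means(n=40, replicates=5000)

# The sample means should be centered near mu = 1 ...
print(statistics.fmean(means))
# ... with variance near sigma^2 / n = 1 / 40 = 0.025.
print(statistics.variance(means))
```

A histogram of `means` should look roughly bell-shaped even though the population itself is strongly skewed.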

By extension, we also have

$S\sim N(\mu _{S}=n\mu ,\sigma _{S}={\sqrt {n}}\sigma )$

where $S=\sum Y_{i}$
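As a quick numeric illustration, the mean and standard deviation of the sum can be computed directly. This is a minimal sketch assuming independent $Y_{i}$, so that $Var(S)=n\sigma ^{2}$; the function name `sum_distribution` and the example numbers are hypothetical.

```python
import math

def sum_distribution(mu, sigma, n):
    """Return (mean, standard deviation) of S = Y_1 + ... + Y_n for
    independent Y_i with mean mu and standard deviation sigma."""
    mean_s = n * mu               # E(S) = n * mu
    sd_s = math.sqrt(n) * sigma   # Var(S) = n * sigma^2, so sd(S) = sqrt(n) * sigma
    return mean_s, sd_s

print(sum_distribution(mu=2.0, sigma=3.0, n=36))  # (72.0, 18.0)
```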

Confidence Interval

Estimation means making an informed guess at an unknown population parameter. A point estimate is a single "best guess" of the population parameter, whereas a confidence interval is a range of reasonable values intended to contain the parameter of interest with a certain degree of confidence. It is calculated as

(point estimate - margin of error, point estimate + margin of error)

Constructing CIs

By CLT, ${\bar {Y}}\sim N(\mu ,{\frac {\sigma ^{2}}{n}})$. The confidence interval is the range of plausible values for $\mu$, centered on the observed sample mean.

If we define the middle 90% of this distribution to be plausible, then the confidence interval runs from its 5th percentile to its 95th percentile.

In general, for a confidence interval covering the middle $(1-\alpha )100\%$, we have

${\bar {y}}\pm Z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}$

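where ${\bar {y}}$ is the observed sample mean and $Z_{\alpha /2}$ is the z-score that leaves an area of $\alpha /2$ in the upper tail of the standard normal distribution, i.e. the $(1-\alpha /2)100$-th percentile.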
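The interval formula can be sketched in a few lines of Python using only the standard library. The function name `z_confidence_interval` and the example numbers are assumptions made for illustration, and $\sigma$ is treated as known.

```python
import math
from statistics import NormalDist

def z_confidence_interval(y_bar, sigma, n, confidence=0.90):
    """Return (lower, upper) for y_bar +/- Z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1.0 - confidence
    # Critical value: the (1 - alpha/2) quantile of the standard normal.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sigma / math.sqrt(n)
    return y_bar - margin, y_bar + margin

# Hypothetical example: sample mean 50, known sigma 10, n = 100, 90% confidence.
low, high = z_confidence_interval(y_bar=50.0, sigma=10.0, n=100, confidence=0.90)
print(round(low, 2), round(high, 2))  # 48.36 51.64
```

With 90% confidence the critical value is about 1.645, which corresponds to the 5th and 95th percentiles of the standard normal distribution.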

T-Distribution

CLT has several restrictions, the biggest one being the need for a large sample size.

Since we usually don't know the population variance $\sigma ^{2}$, we have to use the sample variance $s^{2}$ to estimate it. This introduces more uncertainty, which is accounted for by the t-distribution.

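The t-distribution is the distribution of the sample mean standardized using the population mean and the sample standard deviation, $({\bar {Y}}-\mu )/(s/{\sqrt {n}})$, and its shape depends on the degrees of freedom (covered later). It looks very similar to the normal distribution, but with heavier tails.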

When the sample size $n$ is small, there is greater uncertainty in the estimates. For the same confidence level, the t critical value is therefore larger than the z critical value:

$t_{\alpha /2}>Z_{\alpha /2}$

The spread of the t-distribution depends on the degrees of freedom, which are determined by the sample size. When looking up a t-table that does not list the exact df, round df down to the nearest listed value.

$\upsilon =n-1$

As the sample size increases, the degrees of freedom increase, the spread of the t-distribution decreases, and the t-distribution approaches the normal distribution.
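The convergence toward the normal distribution can be seen by comparing critical values. The sketch below hard-codes a few two-sided 95% values as they appear in a standard t-table (the Python standard library has no t-distribution), and compares them with the z critical value; the dictionary name `T_CRIT_95` is an illustrative choice.

```python
from statistics import NormalDist

# Two-sided 95% critical values t_{0.025, df}, copied from a standard t-table.
T_CRIT_95 = {5: 2.571, 10: 2.228, 30: 2.042, 120: 1.980}

# z critical value for the same confidence level (about 1.960).
z_crit = NormalDist().inv_cdf(0.975)

for df, t_crit in sorted(T_CRIT_95.items()):
    # The gap t - z shrinks as the degrees of freedom grow.
    print(f"df={df:>3}  t={t_crit:.3f}  t - z = {t_crit - z_crit:.3f}")
```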

Based on CLT and normal distribution, we had the confidence interval

${\bar {Y}}\pm Z_{\alpha /2}{\frac {\sigma }{\sqrt {n}}}$

Now, based on T-distribution, we have the CI

${\bar {Y}}\pm t_{\alpha /2}{\frac {s}{\sqrt {n}}}$
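A minimal sketch of the t-based interval follows. Since the Python standard library does not provide t quantiles, the critical value is passed in after being read from a t-table; the function name `t_confidence_interval` and the data are hypothetical.

```python
import math
import statistics

def t_confidence_interval(sample, t_crit):
    """Return (lower, upper) for y_bar +/- t_crit * s / sqrt(n).

    t_crit is t_{alpha/2} for df = n - 1, looked up in a t-table."""
    n = len(sample)
    y_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    margin = t_crit * s / math.sqrt(n)
    return y_bar - margin, y_bar + margin

# Hypothetical data with n = 10, so df = 9; t_{0.025, 9} = 2.262 from a t-table.
sample = [4.8, 5.1, 5.4, 4.9, 5.0, 5.3, 4.7, 5.2, 5.0, 5.1]
low, high = t_confidence_interval(sample, t_crit=2.262)
print(low, high)
```

Using the z critical value 1.960 with the same data would give a narrower interval, which understates the uncertainty for a sample this small.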

Find Sample Size

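To calculate the sample size needed for a desired margin of error $E$, given the sample variance $s^{2}$, assume that $\upsilon =\infty$ (so that the t critical value equals the z critical value) and solve $E=Z_{\alpha /2}{\frac {s}{\sqrt {n}}}$ for $n$: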

$n={\frac {Z_{\alpha /2}^{2}s^{2}}{E^{2}}}$

Always round $n$ up to stay within the error margin: the margin of error shrinks as $n$ grows, so rounding down would leave a margin slightly larger than $E$.
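The sample-size formula and the round-up rule can be sketched as follows. The function name `required_sample_size` and the example values ($s=15$, $E=2$, 95% confidence) are assumptions for illustration.

```python
import math
from statistics import NormalDist

def required_sample_size(s, margin, confidence=0.95):
    """Return the smallest integer n with Z_{alpha/2} * s / sqrt(n) <= margin."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n_exact = (z ** 2) * (s ** 2) / (margin ** 2)
    # Round up: a smaller n would give a margin of error larger than requested.
    return math.ceil(n_exact)

print(required_sample_size(s=15.0, margin=2.0))  # 217
```

Here the exact value is about 216.08. Taking $n=216$ would give a margin slightly above 2, so $n=217$ is the smallest sample size that meets the target.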
