Summary Statistics

From Rice Wiki
Revision as of 23:53, 18 March 2024 by Rice

When we investigate a variable in a dataset, two things summarize it well: its center and its spread.

Measuring Center

There are two main ways to measure center: the mean and the median. The mode is a third, less commonly used measure.

Mean

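The mean is the average (expected) value of a variable. The sample mean is denoted as <math>\bar{X}</math>, whereas the population mean is denoted as <math>\mu_X</math>.

<math>\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i</math>

where <math>n</math> is the sample size.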
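As a minimal sketch of the sample mean formula, the Python snippet below sums the observations and divides by the sample size. The function name sample_mean and the heights data are hypothetical, chosen only for illustration.

```python
def sample_mean(xs):
    """Return the sample mean (1/n) * sum of x_i for a non-empty sequence."""
    n = len(xs)  # n is the sample size
    if n == 0:
        raise ValueError("the sample mean is undefined for an empty sample")
    return sum(xs) / n


# Hypothetical data, for illustration only.
heights = [150, 160, 170, 180]
print(sample_mean(heights))  # 165.0
```

Python's standard library offers statistics.mean for the same calculation.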

Median/Percentiles/Quartiles

The median tells us the literal center of the dataset: 50% of the observations lie below it and 50% lie above it. It is denoted with

The quartiles follow the same idea, except at 25% for the first quartile, 50% for the second (which is also the median), and 75% for the third. They are denoted with

Percentiles generalize this to any percentage. For example, the 80th percentile has 80% of the data below it.

To calculate the P-th percentile (and with it the median and the quartiles), we have

where <math>n</math> is the sample size.
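Several conventions exist for locating a percentile in a sample. The sketch below assumes the nearest-rank convention, which may not be the one intended on this page: sort the data, compute the rank as the ceiling of P/100 times n, and take the value at that position. The function name percentile_nearest_rank and the scores data are hypothetical.

```python
import math


def percentile_nearest_rank(xs, p):
    """Return the p-th percentile of xs under the nearest-rank convention.

    Assumption: rank = ceil(p / 100 * n), counted from 1 in the sorted data.
    Other conventions (for example, interpolating between neighbours) can
    give different answers on the same data.
    """
    if not xs:
        raise ValueError("percentiles are undefined for an empty sample")
    if not 0 < p <= 100:
        raise ValueError("p must satisfy 0 < p <= 100")
    ordered = sorted(xs)
    n = len(ordered)  # n is the sample size
    rank = math.ceil(p * n / 100)  # 1-based position in the sorted data
    return ordered[rank - 1]


# Hypothetical data, for illustration only.
scores = [40, 15, 50, 20, 35]
print(percentile_nearest_rank(scores, 25))  # 20 (first quartile)
print(percentile_nearest_rank(scores, 50))  # 35 (median)
print(percentile_nearest_rank(scores, 75))  # 40 (third quartile)
```

Under this convention the result is always an actual data value. For an even sample size, that means the median here is the lower of the two middle values instead of their average.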

Mode

The mode is the most frequently occurring value. It's pretty neglected compared to the mean and the median.
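As a small illustration, Python's standard library can find the mode directly. The sketch below uses statistics.multimode (Python 3.8+), which returns every value tied for most frequent; the rolls and tied data are hypothetical.

```python
from statistics import multimode

# Hypothetical data, for illustration only.
rolls = [2, 3, 3, 5, 3, 2]
modes = multimode(rolls)
print(modes)  # [3]

# When several values tie for most frequent, all of them are returned,
# in the order they first appear in the data.
tied = multimode([1, 2, 2, 3, 3])
print(tied)  # [2, 3]
```

Returning a list makes ties explicit, which matters because a dataset can have more than one mode.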

Measuring Spread