Summary Statistics
When we investigate a variable in a dataset, two features summarize its distribution especially well: its center and its spread.
Measuring Center
The two most common measures of center are the mean and the median; the mode is a third.
Mean
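The mean is the average value of a variable. The sample mean is denoted $\bar{x}$, whereas the population mean (the expected value of the variable) is denoted $\mu$. The sample mean is computed as

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where $n$ is the sample size.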
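As a quick check of the formula, here is a minimal Python sketch that sums the values and divides by $n$. It compares the result with the standard library's `statistics.mean`. The function name `sample_mean` and the example data are illustrative only.

```python
import statistics

def sample_mean(xs: list[float]) -> float:
    """Sample mean: x-bar = (1/n) * (x_1 + ... + x_n)."""
    n = len(xs)                      # sample size
    if n == 0:
        raise ValueError("the mean needs at least one value")
    return sum(xs) / n

data = [2, 4, 4, 5, 7, 9]
print(sample_mean(data))             # 5.166666666666667  (31 / 6)
print(statistics.mean(data))         # standard-library equivalent
```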
Median/Percentiles/Quartiles
The median is the literal middle of the dataset: half of the observations lie below it and half lie above it. It is commonly denoted $\tilde{x}$.
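The following Python sketch sorts the data and picks the middle value. When $n$ is even there is no single middle value. This sketch follows the usual convention of averaging the two middle values, which is also what Python's `statistics.median` does. The function name `median` is illustrative.

```python
import statistics

def median(xs):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("the median needs at least one value")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                      # odd n: exactly one middle value
    return (s[mid - 1] + s[mid]) / 2       # even n: average the two middle values

print(median([7, 1, 3]))                   # 3
print(median([7, 1, 3, 10]))               # 5.0
print(statistics.median([7, 1, 3, 10]))    # 5.0
```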
The quartiles work the same way, but they split the data at 25% (first quartile), 50% (second quartile, which is also the median), and 75% (third quartile). They are denoted $Q_1$, $Q_2$, and $Q_3$.
Percentiles extend the same idea to any percentage. For example, the 80th percentile has 80% of the data below it.
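To calculate the P-th percentile, and through it the quartiles, the median, and any other quantile, first sort the data. One common convention then finds the position

$$L_P = \frac{P}{100}(n+1)$$

where $n$ is the sample size. If $L_P$ is a whole number, the P-th percentile is the value in that position. Otherwise, interpolate between the values in the two surrounding positions.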
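Percentile formulas are not fully standardized. This Python sketch assumes the common position rule $L_P = \frac{P}{100}(n+1)$ on 1-based positions of the sorted data, with linear interpolation between neighbouring values. The function name `percentile` and the sample data are hypothetical. For quartiles on this data, the sketch agrees with Python's `statistics.quantiles`, whose default `method="exclusive"` is also based on $n+1$.

```python
import math
import statistics

def percentile(xs, p):
    """P-th percentile via position L = (p/100)(n+1) and linear interpolation (assumed convention)."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("need at least one value")
    pos = p / 100 * (n + 1)          # 1-based position in the sorted data
    if pos <= 1:
        return s[0]                  # clamp: at or below the first position
    if pos >= n:
        return s[-1]                 # clamp: at or beyond the last position
    lower = math.floor(pos)          # 1-based position just below pos
    frac = pos - lower               # fractional distance toward the next value
    return s[lower - 1] + frac * (s[lower] - s[lower - 1])

data = [1, 3, 4, 6, 8, 9, 11, 15]    # n = 8, so quartile positions are 2.25, 4.5, 6.75
print([percentile(data, p) for p in (25, 50, 75)])   # [3.25, 7.0, 10.5]
print(statistics.quantiles(data, n=4))               # [3.25, 7.0, 10.5]
```

The clamping at the ends is a choice made in this sketch. Other implementations may handle positions outside $1..n$ differently.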
Mode
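The mode is the most frequently occurring value. It's pretty neglected compared to the mean and median, although it is the only one of the three that makes sense for categorical data.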
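A dataset can have more than one mode when several values tie for the highest count. This minimal Python sketch counts occurrences with `collections.Counter` and returns every value tied for the top count, similar in spirit to the standard library's `statistics.multimode`. The function name `modes` and the example data are illustrative. The second call uses string data, for which a mean or median would not be defined.

```python
from collections import Counter

def modes(xs):
    """Return every value tied for the highest frequency, in first-seen order."""
    counts = Counter(xs)                       # value -> number of occurrences
    if not counts:
        raise ValueError("the mode needs at least one value")
    top = max(counts.values())
    return [value for value, c in counts.items() if c == top]

print(modes([2, 4, 4, 5, 7, 9]))                       # [4]
print(modes(["red", "blue", "red", "blue", "green"]))  # ['red', 'blue']
```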