Summary Statistics

From Rice Wiki
Revision as of 23:53, 18 March 2024 by Rice

When we investigate a variable in a dataset, two quantities do most of the work of summarizing it: the center and the spread.

Measuring Center

There are two main ways to measure center: the mean and the median. A third, the mode, is used less often.

Mean

The mean is the average (expected) value of a variable. The sample mean is denoted as <math>\bar{X}</math>, whereas the population mean is denoted as <math>\mu_X</math>.

<math>\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i</math>

where <math>n</math> is the sample size and <math>x_i</math> is the i-th observation.
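As a quick check of the formula, here is a minimal sketch that computes the sample mean by hand and compares it with Python's standard library. The data in `sample` is made up for illustration.

```python
from statistics import fmean

# Hypothetical sample, made up for illustration.
sample = [2.0, 4.0, 4.0, 5.0, 10.0]

n = len(sample)            # sample size
x_bar = sum(sample) / n    # (1/n) * sum of all x_i

print(x_bar)               # 5.0
print(fmean(sample))       # 5.0, same value from the standard library
```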

Median/Percentiles/Quartiles

The median tells us the literal center of the dataset: 50% of the observations are on the left, 50% on the right. It is denoted with <math>\widetilde{X}</math>.
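A small sketch of finding the median, using made-up samples. It assumes the usual convention (not stated above) that, for an even number of observations, the median is the average of the two middle values.

```python
from statistics import median

# Hypothetical samples, made up for illustration.
odd_sample = [7, 1, 3]       # sorted: 1, 3, 7    -> middle value is 3
even_sample = [7, 1, 3, 5]   # sorted: 1, 3, 5, 7 -> average of 3 and 5

print(median(odd_sample))    # 3
print(median(even_sample))   # 4.0
```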

The quartiles work the same way, except at 25% for the first quartile, 50% for the second (which is also the median), and 75% for the third. They are denoted with <math>Q_1, Q_2, Q_3</math>.

Percentiles work the same way too, except at any particular percentage. For example, the 80th percentile has 80% of the data before it.

To calculate the P-th percentile (and thereby all the other something-tiles), we sort the data and take the value at position

<math>\left( \frac{P}{100} \right) (n + 1)</math>

where <math>n</math> is the sample size and positions are counted from 1.
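The position formula can be sketched in code as follows. The helper names `percentile_position` and `percentile` are hypothetical, and the data is made up. When the position is not a whole number, this sketch assumes linear interpolation between the two neighboring values, which is one common convention and not something the formula itself specifies.

```python
import math
from statistics import quantiles

def percentile_position(p, n):
    """Position (counted from 1) of the p-th percentile in sorted data."""
    return (p / 100) * (n + 1)

def percentile(data, p):
    """p-th percentile via the (n + 1) position rule.

    Assumption: fractional positions are resolved by linear
    interpolation, and positions outside 1..n are clamped to the ends.
    """
    xs = sorted(data)
    n = len(xs)
    pos = percentile_position(p, n)
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    lower = math.floor(pos)   # 1-based position just below pos
    frac = pos - lower        # how far pos sits between the neighbors
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])

# Hypothetical sample with n = 7, so n + 1 = 8.
data = [15, 20, 35, 40, 50, 55, 60]

print(percentile_position(25, len(data)))  # 2.0 -> second sorted value
print(percentile(data, 25))                # 20.0 (Q1)
print(percentile(data, 50))                # 40.0 (Q2, the median)
print(percentile(data, 75))                # 55.0 (Q3)

# The standard library's default "exclusive" method uses the same rule.
print(quantiles(data, n=4))                # [20.0, 40.0, 55.0]
```

Statistical packages differ in how they handle fractional positions, so results for small samples may not match other tools exactly.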

Mode

The mode is the most frequently occurring value. It gets far less use than the mean or the median.
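A minimal sketch of finding the mode with Python's standard library, on made-up data. A dataset can have more than one most-frequent value, in which case `multimode` returns all of them.

```python
from statistics import mode, multimode

# Hypothetical samples, made up for illustration.
one_mode = [1, 2, 2, 3, 3, 3, 4]   # 3 occurs most often
two_modes = [1, 1, 2, 2, 3]        # 1 and 2 tie for most frequent

print(mode(one_mode))        # 3
print(multimode(two_modes))  # [1, 2]
```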

Measuring Spread