Bivariate
Revision as of 18:08, 18 March 2024
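Consider two numerical random variables <math>X</math> and <math>Y</math>. We can measure their covariance.
The correlation of two random variables measures the linear dependence between <math>X</math> and <math>Y</math>:
<math>Cor(X, Y) = \rho = \frac{Cov(X,Y)}{sd(X) sd(Y)}</math>
Correlation is always between -1 and 1.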
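As a rough illustration of the definition, the sketch below computes a sample correlation as covariance divided by the product of the standard deviations. The helper name `correlation` and the toy data are assumptions made for this example only; NumPy is assumed to be available.

```python
import numpy as np

def correlation(x, y):
    """Sample correlation: Cov(X, Y) / (sd(X) * sd(Y)).

    Uses population-style (divide by n) estimates for both the
    covariance and the standard deviations, so the factors cancel.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    return cov_xy / (x.std() * y.std())

# Hypothetical toy data, chosen only to show the two extremes.
x = [1.0, 2.0, 3.0, 4.0]
y_up = [2.0, 4.0, 6.0, 8.0]    # exact increasing linear function of x
y_down = [8.0, 6.0, 4.0, 2.0]  # exact decreasing linear function of x

print(correlation(x, y_up))    # close to 1
print(correlation(x, y_down))  # close to -1
```

An exact linear relationship gives a correlation at one of the two ends of the range; anything less than perfectly linear falls strictly between them.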
Bivariate Normal
The bivariate normal (a.k.a. bivariate Gaussian) is a special type of joint distribution for a pair of continuous random variables.
<math>(X, Y)</math> is bivariate normal if
- The marginal PDFs of both X and Y are normal
- For any <math>x</math>, the conditional PDF of <math>Y</math> given <math>X = x</math> is normal
- The converse also holds: if <math>(X, Y)</math> is bivariate Gaussian, then the conditions above are satisfied
Predicting Y given X
Given a bivariate normal pair, we can predict one variable from the other. Let us estimate the expected value of Y given that X = x:
<math>E(Y | X = x)</math>
There are three main methods:
- Scatter plot approximation
- Joint PDF
- 5 statistics
5 Parameters
We need to know 5 parameters about <math>X</math> and <math>Y</math>:
<math>E(X), sd(X), E(Y), sd(Y), \rho</math>
If <math>X, Y</math> follow a bivariate normal distribution, then we have
<math>\left( \frac{E(Y|X = x) - E(Y)}{sd(Y)} \right) = \rho \left( \frac{x - E(X)}{sd(X)} \right)</math>
The left side is the predicted Z-score for Y, and the right side is the product of the correlation and the Z-score of X = x.
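The Z-score relation can be rearranged into a direct prediction for Y. The following is a minimal sketch under the bivariate normal assumption; the function name `predict_y` and the numeric values are hypothetical and serve only to illustrate the arithmetic.

```python
def predict_y(x, mean_x, sd_x, mean_y, sd_y, rho):
    """Return E(Y | X = x) from the five parameters.

    Step 1: convert x to a Z-score.
    Step 2: multiply by rho to get the predicted Z-score of Y.
    Step 3: convert that Z-score back to the units of Y.
    """
    z_x = (x - mean_x) / sd_x
    z_y = rho * z_x
    return mean_y + z_y * sd_y

# Hypothetical parameters: E(X) = 70, sd(X) = 3, E(Y) = 160, sd(Y) = 20, rho = 0.5.
# x = 76 is 2 standard deviations above E(X), so the predicted Y is
# 0.5 * 2 = 1 standard deviation above E(Y).
print(predict_y(76, mean_x=70, sd_x=3, mean_y=160, sd_y=20, rho=0.5))  # 180.0
```

Since the predicted Z-score is scaled by the correlation, the prediction for Y is never more standard deviations from its mean than x is from E(X).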
The variance is given by
<math>Var(Y | X = x) = (1 - \rho^2) Var(Y)</math>
Because <math>-1 \le \rho \le 1</math>, the factor <math>1 - \rho^2</math> lies between 0 and 1, so the variance of Y given X = x is never larger than the unconditional variance of Y, and it is strictly smaller whenever <math>\rho \ne 0</math>. The conditional standard deviation is the square root of this variance: <math>sd(Y | X = x) = \sqrt{1 - \rho^2} \, sd(Y)</math>
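The effect of the correlation on the conditional spread can be checked numerically. This is a small sketch, again assuming a bivariate normal pair; the helper name `conditional_sd` and the example value sd(Y) = 20 are assumptions introduced here for illustration.

```python
import math

def conditional_sd(sd_y, rho):
    """Return sd(Y | X = x) = sqrt(1 - rho^2) * sd(Y).

    Under the bivariate normal model this does not depend on x.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie between -1 and 1")
    return math.sqrt(1.0 - rho ** 2) * sd_y

# Hypothetical sd(Y) = 20 with a few correlations.
for rho in (0.0, 0.6, -0.6, 1.0):
    print(rho, conditional_sd(20.0, rho))
```

With no correlation the conditional standard deviation equals sd(Y), with a correlation of 0.6 in either direction it drops to roughly 0.8 of sd(Y), and with perfect correlation it is 0.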
