Bivariate: Difference between revisions

From Rice Wiki
No edit summary
No edit summary
Line 10: Line 10:
Cor(X, Y) = \rho = \frac{Cov(X,Y)}{sd(X) sd(Y)}
Cor(X, Y) = \rho = \frac{Cov(X,Y)}{sd(X) sd(Y)}
</math>
</math>
Correlation is always between -1 and 1


= Bivariate Normal =
= Bivariate Normal =
Line 42: Line 44:
<math>E(X), sd(X), E(Y), sd(Y), \rho</math>
<math>E(X), sd(X), E(Y), sd(Y), \rho</math>


If <math>X, Y</math> follows bivariatte normal distribution, then we
If <math>X, Y</math> follows bivariate normal distribution, then we
have
have


Line 49: Line 51:
E(X)}{sd(X)} \right)
E(X)}{sd(X)} \right)
</math>
</math>
The left side is the ''predicted Z-score for Y'', and the right side is
''the product of correlation and Z-score of X = x''
The variance is given by
<math>
Var(Y | X = x) = (1 - \rho^2) Var(Y)
</math>
Due to the range of <math>\rho</math>, the variance of Y given X is
always smaller than the actual variance. The standard deviation is just
rooted that.

Revision as of 18:08, 18 March 2024

Consider two numerica random variables and . We can measure their covariance.

The correlation of two random variables measures the line dependent between and

Correlation is always between -1 and 1

Bivariate Normal

The bivariate normal (aka. bivariate gaussian) is one special type of continuous random variable.

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle (X, Y)} is bivariate normal if

  1. The marginal PDF of both X and Y are normal
  2. For any Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x} , the condition PDF of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Y} given Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X = x} is Normal
    • Works the other way around: Bivariate gaussian means that condition is satisfied

Predicting Y given X

Given bivariate normal, we can predict one variable given another. Let us try estimating the expected Y given X is x

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle E(Y| X = x) }

There are three main methods

  • Scatter plot approximation
  • Joint PDF
  • 5 statistics

5 Parameters

We need to know 5 parameters about Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X} and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Y}

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle E(X), sd(X), E(Y), sd(Y), \rho}

If Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle X, Y} follows bivariate normal distribution, then we have

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \left( \frac{E(Y|X = x) - E(Y)}{sd(Y)} \right) = \rho \left( \frac{x - E(X)}{sd(X)} \right) }

The left side is the predicted Z-score for Y, and the right side is the product of correlation and Z-score of X = x

The variance is given by

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Var(Y | X = x) = (1 - \rho^2) Var(Y) }

Due to the range of Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \rho} , the variance of Y given X is always smaller than the actual variance. The standard deviation is just rooted that.