Maximum likelihood estimation
Maximum likelihood estimation (MLE) is a method for finding the coefficients of a model; in linear regression with Gaussian noise it selects the same coefficients as minimizing the residual sum of squares (RSS). MLE does this by maximizing the likelihood of observing the training data given the model.
Background
Consider the objective function

<math>y = w_0 x_0 + w_1 x_1 + \ldots + w_m x_m + \epsilon = g(x) + \epsilon</math>

where <math>y = g(x)</math> is the true relationship and <math>\epsilon</math> is the residual error/noise.

We assume that <math>x_0 = 1</math>, that the <math>y</math> values are independent of each other, and that <math>\epsilon \sim N(0, \sigma^2)</math>.
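As a concrete illustration of this generative assumption, here is a minimal sketch that simulates data of exactly this form; the weights, noise level, and sample size (true_w, sigma, n, m) are arbitrary example values, not taken from the article.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

n, m = 100, 2                          # number of observations and of non-intercept features
true_w = np.array([1.5, -2.0, 0.7])    # example weights [w_0, w_1, w_2]; w_0 acts as the intercept
sigma = 0.5                            # standard deviation of the noise

# x_0 = 1 for every observation, so the first column of X is all ones
X = np.column_stack([np.ones(n), rng.normal(size=(n, m))])
eps = rng.normal(0.0, sigma, size=n)   # epsilon ~ N(0, sigma^2)
y = X @ true_w + eps                   # y = g(x) + epsilon
</syntaxhighlight>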
Likelihood function
The likelihood function gives the probability of observing the data given the parameters of the model; a higher likelihood indicates a better-fitting model.
Because the observations are assumed independent, the likelihood of observing the data is the product of the probabilities of observing each data point, each given by the probability density function of the normal distribution:

<math>L(w_i, \sigma^2 \mid x, y) = \prod_i \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left( -\frac{(y_i - g(x_i))^2}{2 \sigma^2} \right)</math>
The weights are then adjusted to increase the likelihood, and the process repeats until the likelihood is maximized.
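Continuing the simulated-data sketch from the Background section (and reusing its X, y, true_w, and sigma), the likelihood can be computed directly as a product of normal densities. The function name likelihood is illustrative, not part of any standard API.

<syntaxhighlight lang="python">
from scipy.stats import norm

def likelihood(w, sigma2, X, y):
    """Product of the normal densities of the residuals y_i - g(x_i)."""
    residuals = y - X @ w                      # y_i - g(x_i) for every observation
    return np.prod(norm.pdf(residuals, loc=0.0, scale=np.sqrt(sigma2)))

likelihood(true_w, sigma**2, X, y)             # larger than, e.g., likelihood(np.zeros(3), sigma**2, X, y)
</syntaxhighlight>

Note that the product of a hundred densities is already a tiny number, which motivates the log transformation below.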
Assumptions
MLE makes the following assumptions:

- <math>x_0 = 1</math>
- <math>X</math> is the independent variable and <math>y</math> is the dependent variable
- The residual error <math>\epsilon \sim N(0, \sigma^2)</math> for a constant variance <math>\sigma^2</math>
- <math>y</math> is independent across observations
Optimizations with log
Multiplying many probability densities together is numerically troublesome: the product of many small values quickly underflows, and products are harder to differentiate than sums. Instead, the log of the likelihood function is maximized. Since the log of a product is the sum of the logs of its factors, the expression above simplifies to

<math>\log L(w_i, \sigma^2 \mid x, y) = \sum_i \log\left( \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left( -\frac{(y_i - g(x_i))^2}{2 \sigma^2} \right) \right) = -\frac{n}{2}\log(2 \pi \sigma^2) - \frac{1}{2 \sigma^2}\sum_i (y_i - g(x_i))^2</math>

Because the logarithm is monotonically increasing, the weights that maximize the log-likelihood also maximize the likelihood itself.
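A sketch of the same computation on the log scale, again reusing the simulated X, y, true_w, and sigma from the earlier blocks (log_likelihood is an illustrative name):

<syntaxhighlight lang="python">
def log_likelihood(w, sigma2, X, y):
    """Sum of log normal densities; avoids the underflow of the raw product."""
    residuals = y - X @ w
    return np.sum(norm.logpdf(residuals, loc=0.0, scale=np.sqrt(sigma2)))

log_likelihood(true_w, sigma**2, X, y)   # a moderate negative number instead of a vanishing product
</syntaxhighlight>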
Loss function
Instead of maximizing the log-likelihood, we can minimize the negative log-likelihood and treat it as a loss function. Since <math>\sigma^2</math> is constant, the only term that depends on the weights is the sum of squared residuals, so minimizing the negative log-likelihood is equivalent to ordinary least squares (OLS).
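As a rough check of this equivalence, the sketch below (still reusing the simulated data and the log_likelihood helper from above) minimizes the negative log-likelihood numerically and compares the result with the closed-form least-squares solution; the two agree up to optimizer tolerance.

<syntaxhighlight lang="python">
from scipy.optimize import minimize

# Negative log-likelihood as a loss in the weights, with sigma^2 held fixed.
def nll(w):
    return -log_likelihood(w, sigma**2, X, y)

w_mle = minimize(nll, x0=np.zeros(3)).x           # numerical maximum likelihood estimate
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)     # ordinary least squares solution

print(w_mle)   # approximately equal to w_ols, and close to true_w
print(w_ols)
</syntaxhighlight>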