Revision as of 02:54, 8 May 2024

Maximum likelihood estimation (MLE) is one method for finding the coefficients of a model. In linear regression with normally distributed errors, the coefficients it finds are the ones that minimize the residual sum of squares (RSS). MLE does this by maximizing the likelihood of observing the training data given the model.

Background

Consider the linear model

$y=w_{0}x_{0}+w_{1}x_{1}+\ldots +w_{m}x_{m}+\epsilon =g(x)+\epsilon$

where $y=g(x)$ is the true relationship and $\epsilon$ is the residual error/noise.

We assume that $x_{0}=1$ (so that $w_{0}$ acts as the intercept), that the $y$ values are independent of each other, and that $\epsilon \sim N(0,\sigma ^{2})$.
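As a minimal sketch of these assumptions, the following draws synthetic data from the model above. The function name `simulate_linear_data`, the choice of standard normal features, and the example weights are illustrative assumptions, not part of the original article.

```python
import numpy as np

def simulate_linear_data(w, sigma, n, seed=0):
    """Draw n points from y = w_0*x_0 + ... + w_m*x_m + eps, with x_0 = 1."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    m = len(w) - 1
    # Assumption for illustration: features x_1..x_m are standard normal.
    features = rng.normal(size=(n, m))
    # Prepend the constant column x_0 = 1 so that w_0 acts as the intercept.
    X = np.column_stack([np.ones(n), features])
    # Independent noise eps ~ N(0, sigma^2) for every point.
    eps = rng.normal(loc=0.0, scale=sigma, size=n)
    y = X @ w + eps
    return X, y

X, y = simulate_linear_data(w=[2.0, -1.0, 0.5], sigma=0.3, n=5)
print(X.shape, y.shape)
```

Each row of `X` is one observation and the first column is always 1, matching the assumption $x_{0}=1$.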

Likelihood function

The likelihood function gives the probability density of observing the data given the parameters of the model. A higher likelihood indicates parameters that fit the training data better.

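$L(w,\sigma ^{2}|x,y)=\prod _{i=1}^{n}{\frac {1}{\sqrt {2\pi \sigma ^{2}}}}\exp \left(-{\frac {(y_{i}-g(x_{i}))^{2}}{2\sigma ^{2}}}\right)$

Here $w=(w_{0},\ldots ,w_{m})$ collects the weights and $n$ is the number of training points. Because the $y$ values are independent, the likelihood of observing the data is the product of the likelihoods of observing each data point, each given by the probability density function of the normal distribution with mean $g(x_{i})$ and variance $\sigma ^{2}$.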
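The likelihood can be evaluated directly for a small data set. The sketch below is illustrative only: the function name `gaussian_likelihood` and the toy data are assumptions chosen for the example, and $\sigma ^{2}$ is fixed at 1.

```python
import numpy as np

def gaussian_likelihood(w, sigma2, X, y):
    """Product of normal densities of y_i around the prediction g(x_i) = x_i . w."""
    residuals = y - X @ w
    densities = np.exp(-residuals**2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
    return np.prod(densities)

# Hypothetical toy data lying close to the line y = 1 + 2*x_1.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.1, 2.9, 5.2, 6.8])

good = gaussian_likelihood(np.array([1.0, 2.0]), 1.0, X, y)
bad = gaussian_likelihood(np.array([0.0, 0.0]), 1.0, X, y)
print(good > bad)
```

Weights close to the line that generated the data give a larger likelihood than weights far from it, which is what the maximization exploits.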

The weights are then adjusted to increase the likelihood, and the process repeats.

Assumptions

We assume that the error is normally distributed.

Optimizations with log

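Multiplying many probability densities together is numerically unstable: the product of many small values quickly underflows to zero in floating-point arithmetic. To avoid this, the log of the likelihood function is computed instead. Because the log is monotonically increasing, the parameters that maximize the log-likelihood also maximize the likelihood. Since the log of a product is the sum of the logs of its factors, the likelihood above becomes

$\log L(\theta |x,y)=\sum _{i=1}^{n}\log \left({\frac {1}{\sqrt {2\pi \sigma ^{2}}}}\exp \left(-{\frac {(y_{i}-g(x_{i}))^{2}}{2\sigma ^{2}}}\right)\right)$

where $\theta$ denotes the model parameters (the weights and $\sigma ^{2}$) and $n$ is the number of training points.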
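The numerical benefit of working with logs can be seen in a short experiment. This is a sketch under assumed settings (2000 synthetic points, $\sigma ^{2}=1$, a hypothetical helper `log_likelihood`): the raw product of densities underflows, while the sum of log densities stays finite.

```python
import numpy as np

def log_likelihood(w, sigma2, X, y):
    """Sum of log normal densities; equals the log of the likelihood product."""
    residuals = y - X @ w
    per_point = -0.5 * np.log(2.0 * np.pi * sigma2) - residuals**2 / (2.0 * sigma2)
    return np.sum(per_point)

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w_true = np.array([1.0, 2.0])
y = X @ w_true + rng.normal(scale=1.0, size=n)

# Direct product of 2000 densities, each below 0.4: too small for a float.
residuals = y - X @ w_true
densities = np.exp(-residuals**2 / 2.0) / np.sqrt(2.0 * np.pi)
print(np.prod(densities))                   # prints 0.0 (underflow)
print(log_likelihood(w_true, 1.0, X, y))    # a finite negative number
```

On a handful of points the two computations agree, since the log of the product equals the sum of the logs; only the direct product breaks down as the number of points grows.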

Loss function

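Instead of maximizing the log-likelihood, we can minimize its negative, the negative log-likelihood. Expanding the log gives $-\log L(\theta |x,y)={\frac {n}{2}}\log(2\pi \sigma ^{2})+{\frac {1}{2\sigma ^{2}}}\sum _{i=1}^{n}(y_{i}-g(x_{i}))^{2}$, where $n$ is the number of training points. For a fixed $\sigma ^{2}$, only the second term depends on the weights, and it is the RSS scaled by a positive constant. Minimizing the negative log-likelihood over the weights is therefore the same as minimizing the RSS, so we can just use ordinary least squares (OLS).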
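The link between the loss function and OLS can be checked numerically. The sketch below minimizes the negative log-likelihood with plain gradient descent, repeatedly adjusting the weights, and compares the result with NumPy's least-squares solver. The helper names, learning rate, step count, and synthetic data are assumptions made for this example, and $\sigma ^{2}$ is held fixed.

```python
import numpy as np

def negative_log_likelihood(w, sigma2, X, y):
    """Negative log of the normal likelihood for the linear model."""
    residuals = y - X @ w
    n = len(y)
    return 0.5 * n * np.log(2.0 * np.pi * sigma2) + np.sum(residuals**2) / (2.0 * sigma2)

def fit_by_gradient_descent(X, y, sigma2=1.0, lr=0.1, steps=2000):
    """Minimize the negative log-likelihood over w with sigma2 held fixed."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        # Gradient of the negative log-likelihood with respect to w, divided by n.
        grad = -(X.T @ (y - X @ w)) / (sigma2 * n)
        w -= lr * grad
    return w

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)

w_mle = fit_by_gradient_descent(X, y)
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_mle, w_ols, atol=1e-6))
```

In practice the closed-form least-squares solution is the simpler choice for this model; the gradient loop is shown only to make the equivalence concrete.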
