Maximum likelihood estimation
Maximum likelihood estimation (MLE) is one of the methods to find the coefficients of a model that minimizes the RSS in linear regression. MLE does this by maximizing the likelihood of observing the training data given a model.
Background
Consider objective function
Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle y = w_0 x_0 + w_1 x_1 + \ldots + w_m x_m + \epsilon = g(x) + \epsilon}
where Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle y = g(x)} is the true relationship and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \epsilon} is the residual error/noise
We assume that Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_0 = 1} , and Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \epsilon \sim N(0, \sigma^2)}
Likelihood function
The likelihood function determines the likelihood of observing the data given the parameters of the model. A high likelihood indicates a good model.
Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle L(w_i, \sigma^2|x,y) = \prod \frac{1}{ \sqrt{ 2 \pi \sigma^2}} exp \left( - \frac{(y_i - g(x_i))^2}{2 \sigma^2 } \right)}
For every data point, the likelihood is computed. The product of all likelihoods are taken.
The weights are then changed to fit it better, and the process repeats.
The computation can be simplified to the following
<math
