Stochastic Gradient Descent


How it works

First, a weight vector w is selected, typically at random. This is the starting point from which the solution is iteratively improved.

For each datapoint in the dataset, the gradient of the loss function with respect to the weights is computed. Together with a chosen learning rate, the gradient determines the direction and step size with which the weights move toward a minimum.

Then, the GD update rule is applied: the weights are moved in the direction indicated by the gradient, scaled by the learning rate, so that the loss decreases.

After every datapoint has been processed, the weights have received one update per datapoint and one epoch is complete. The mean squared error (MSE) is then measured to track convergence.
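As a concrete illustration, here is a minimal sketch of one possible SGD loop for linear regression with squared-error loss. The function name, learning rate, and epoch count are illustrative choices, not part of this article.

<syntaxhighlight lang="python">
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=100):
    # Minimal SGD sketch (hypothetical names and hyperparameters):
    # linear regression trained with a squared-error loss.
    n_samples, n_features = X.shape
    w = np.zeros(n_features)                 # starting weight vector w
    for epoch in range(epochs):
        # Visit each datapoint once per epoch, in random order.
        for i in np.random.permutation(n_samples):
            pred = X[i] @ w                  # model prediction for datapoint i
            grad = 2.0 * (pred - y[i]) * X[i]  # gradient of (pred - y_i)^2 w.r.t. w
            w -= lr * grad                   # GD update rule
        mse = np.mean((X @ w - y) ** 2)      # measure MSE after the epoch
    return w, mse
</syntaxhighlight>

For example, calling sgd_linear_regression(np.random.randn(100, 3), np.random.randn(100)) runs 100 epochs over 100 random datapoints and returns the learned weights and final MSE.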

GD Update Rule

The GD update rule is used to update the weights after each iteration.
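For a single datapoint (x_i, y_i), the stochastic update is commonly written as

<math>\mathbf{w} \leftarrow \mathbf{w} - \eta \, \nabla_{\mathbf{w}} L(\mathbf{w}; x_i, y_i)</math>

where <math>\eta</math> is the learning rate and <math>L</math> is the loss evaluated on that single datapoint. Subtracting the scaled gradient moves the weights in the direction of steepest descent of the loss.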