Stochastic Gradient Descent
How it works
First, an initial weight vector is selected. This is the starting point from which the solution is iteratively improved.
For each datapoint in the dataset, the gradient of the loss function with respect to the weights is computed on that single datapoint. Together with a chosen learning rate, the gradient determines the direction and step size of each update.
Then, the GD update rule (see below) is applied to move the weights toward values that minimize the loss, based on the gradient and the learning rate.
After every datapoint has been processed, one epoch is complete, and the MSE is measured to track convergence.
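As a concrete illustration of the steps above, here is a minimal sketch of SGD for a linear model trained with squared error. The function name, the synthetic data, and the learning rate of 0.01 are assumptions made for this example, not details from the original description.

import numpy as np

def sgd(X, y, lr=0.01, epochs=10, seed=0):
    """Fit linear weights w to minimize mean squared error via SGD."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(size=d)                # step 1: select a starting weight vector
    for epoch in range(epochs):
        for i in rng.permutation(n):      # visit each datapoint once per epoch
            # gradient of (x_i . w - y_i)^2 with respect to w
            grad = 2 * (X[i] @ w - y[i]) * X[i]
            w -= lr * grad                # GD update rule: step against the gradient
        mse = np.mean((X @ w - y) ** 2)   # measure MSE after each epoch
        print(f"epoch {epoch + 1}: MSE = {mse:.4f}")
    return w

# Usage on synthetic data where the true weights are (3, -2):
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, -2.0]) + 0.1 * rng.normal(size=100)
w = sgd(X, y)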
GD Update Rule
The GD update rule is used to update the weights at each iteration.
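The original text does not write the rule out; in standard notation, the update for a sampled datapoint $(x_i, y_i)$ takes the form

$$w \leftarrow w - \eta \, \nabla_w L(w; x_i, y_i)$$

where $\eta$ is the learning rate and $\nabla_w L$ is the gradient of the loss with respect to the weights, computed on that single datapoint. Subtracting the gradient moves the weights in the direction of steepest decrease of the loss, with $\eta$ controlling the step size.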