Stochastic Gradient Descent: Revision history

From Rice Wiki


15 April 2024

  • 18:25, 15 April 2024 Rice (talk | contribs) 753 bytes (+753) Created page with " = How it works = First, a weight <math>\bf{w}</math> is selected. This is the starting point from which we iteratively improve the solution. For ''each datapoint'' in the dataset, the ''gradient'' of the loss function with respect to weights is computed and a learning rate is selected. These two statistics determine the speed and direction the model <math>\bf{w}</math> converges to. Then, a '''GD update rule''' is used to converge the weights to the desired outcome ba..." (Tag: Visual edit)
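
The steps quoted in the edit summary (pick a starting weight, compute the gradient of the loss for each datapoint, scale it by a learning rate, apply an update) can be sketched roughly as follows. This is only an illustration under stated assumptions: the summary is cut off before the update rule, so the sketch assumes the standard rule w ← w − η·∇L, a squared-error loss, and a one-feature linear model. The function name `sgd_linear`, its parameters, and the toy dataset are hypothetical and do not come from the page.

```python
def sgd_linear(data, lr=0.05, epochs=200, w0=(0.0, 0.0)):
    """Per-datapoint SGD for the assumed model y ~ w[0] + w[1] * x
    with an assumed squared-error loss (prediction - y) ** 2."""
    w = list(w0)  # starting weights: the point we iteratively improve
    for _ in range(epochs):
        for x, y in data:  # one update per datapoint
            err = (w[0] + w[1] * x) - y
            # Gradient of the squared error with respect to (w[0], w[1]).
            grad = (2.0 * err, 2.0 * err * x)
            # Assumed standard update rule: step against the gradient,
            # with the learning rate controlling the step size.
            w[0] -= lr * grad[0]
            w[1] -= lr * grad[1]
    return w


# Hypothetical toy data generated from y = 1 + 2x (no noise).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w = sgd_linear(data)
```

On this noise-free toy data the weights should approach (1, 2). With noisy data and a fixed learning rate, per-datapoint updates would typically hover around the minimum rather than settle exactly on it.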