Gradient Descent
= How it works =
'''Gradient descent (GD)''' is an iterative method for updating the weights of a linear regression model in order to reach an optimal model.
After all data points have been processed, the weights have been updated and one '''epoch''' is completed.

The downside of GD is that it does not guarantee finding the global optimum; tradeoffs have to be made to reach the global minimum.
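
The following is a minimal sketch of one way this loop can look for a linear model trained on squared error; the function name, data, learning rate, and epoch count are assumptions for illustration and are not part of this article.

<syntaxhighlight lang="python">
import numpy as np

def gd_linear_regression(X, y, lr=0.1, epochs=500):
    """Fit weights w by batch gradient descent on mean squared error (illustrative sketch)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):                  # one full pass over the data = one epoch
        y_hat = X @ w                        # predictions for every instance
        grad = -2.0 / n * X.T @ (y - y_hat)  # gradient of the mean squared error w.r.t. w
        w = w - lr * grad                    # step against the gradient
    return w

# Toy usage: recover w close to [2, -3] from noiseless data
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, -3.0])
print(gd_linear_regression(X, y))
</syntaxhighlight>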


== GD Update Rule ==
The '''GD update rule''' is used to update the weights after each iteration.
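
As a general sketch (the symbols <math>w</math>, <math>\eta</math>, and <math>L</math> are assumptions introduced here, not defined elsewhere in this article), a single gradient descent step on weights <math>w</math> with learning rate <math>\eta</math> and loss <math>L</math> can be written as

<math>w \leftarrow w - \eta \, \nabla_{w} L(w)</math>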


= Epoch =
One '''epoch''' is completed when all instances in the training set have been processed once to update the weights of the model.
 
= Hyperparameters =
The '''learning rate''' is a hyperparameter of gradient descent that controls how large each weight update is.
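
As a sketch of why the learning rate matters (a made-up one-dimensional example, not from this article): on <math>f(w) = w^2</math>, whose gradient is <math>2w</math>, a small rate steps steadily toward the minimum at 0, while too large a rate makes the iterates overshoot and diverge.

<syntaxhighlight lang="python">
def descend(lr, steps=20, w=1.0):
    for _ in range(steps):
        w = w - lr * 2 * w   # gradient of w**2 is 2*w
    return w

print(descend(lr=0.1))   # small step size: w shrinks toward 0
print(descend(lr=1.1))   # step size too large: |w| grows each step
</syntaxhighlight>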
 
= LMS =
Least-mean-squared (LMS) is a GD update rule.
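
A commonly used form of the LMS update, sketched here under the assumption of a linear model with prediction <math>w^\top x</math> on a training instance <math>(x, y)</math> and learning rate <math>\eta</math> (these symbols are not defined elsewhere in this article), is

<math>w \leftarrow w + \eta \, (y - w^\top x) \, x</math>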
= Optimizations =
There are two optimizations of gradient descent (a sketch contrasting where each performs its updates follows the list):
* [[Stochastic Gradient Descent]]
* [[Batch Gradient Descent]]
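
The following is a sketch (assumed names and data, not from this article) of the structural difference: batch gradient descent updates the weights once per epoch using the gradient over all instances, while stochastic gradient descent updates them once per instance.

<syntaxhighlight lang="python">
import numpy as np

def batch_gd_epoch(w, X, y, lr):
    grad = -2.0 / len(y) * X.T @ (y - X @ w)    # gradient over all instances
    return w - lr * grad                        # one weight update per epoch

def stochastic_gd_epoch(w, X, y, lr):
    for x_i, y_i in zip(X, y):
        grad_i = -2.0 * (y_i - x_i @ w) * x_i   # gradient on a single instance
        w = w - lr * grad_i                     # one weight update per instance
    return w

# Toy usage with made-up data
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 0.0])
w0 = np.zeros(2)
print(batch_gd_epoch(w0, X, y, lr=0.01))
print(stochastic_gd_epoch(w0, X, y, lr=0.01))
</syntaxhighlight>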
