Batch Gradient Descent

From Rice Wiki
[[Category:Machine Learning]]

Latest revision as of 19:31, 17 May 2024

In batch gradient descent, the unit of data is the entire dataset, in contrast to stochastic gradient descent (SGD), whose unit of data is a single data point. The gradient is computed for every data point in the dataset, and the average of the computed gradients is used to update the weights once per pass over the data.

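  • Faster per pass over the data than SGD: there is only one weight update per pass, and the gradient computation can be vectorized over the whole dataset
  • Can end with a less accurate model than SGD (not always): the averaged gradient carries none of the noise that can help SGD move out of poor local minima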
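
As an illustration, here is a minimal sketch of batch gradient descent in plain Python. It assumes a one-variable linear model y ≈ w*x + b trained with mean squared error; the function name, the toy data, the learning rate, and the epoch count are all illustrative choices, not part of the original page.

```python
def batch_gradient_descent(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by minimizing mean squared error (illustrative sketch)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Accumulate the per-point gradients over the ENTIRE dataset.
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = (w * x + b) - y
            grad_w += 2 * error * x
            grad_b += 2 * error
        # One weight update per epoch, using the average gradient.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Toy data generated from y = 2x + 1 (assumed for the example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = batch_gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

Note that the inner loop only accumulates gradients; the weights change once per epoch, after every data point has been seen.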

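A variation, mini-batch gradient descent, uses smaller batches rather than the entire dataset: the weights are updated once per batch, using the average gradient over that batch. It mitigates the loss in precision.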
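
A minimal sketch of mini-batch gradient descent in plain Python follows. It assumes a one-variable linear model y ≈ w*x + b with mean squared error; the function name, batch size, learning rate, seed, and toy data are illustrative assumptions.

```python
import random


def mini_batch_gradient_descent(xs, ys, batch_size=2, lr=0.02,
                                epochs=2000, seed=0):
    """Fit y ~ w*x + b, updating once per mini-batch (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        # Reshuffle each epoch so the batches differ between passes.
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Both gradients are computed from the same (w, b) before updating.
            grad_w = sum(2 * ((w * xs[i] + b) - ys[i]) * xs[i] for i in batch)
            grad_b = sum(2 * ((w * xs[i] + b) - ys[i]) for i in batch)
            # Update using the average gradient over this batch only.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b


# Toy data generated from y = 2x + 1 (assumed for the example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = mini_batch_gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

Setting batch_size to len(xs) makes each epoch a single update over the whole dataset, while batch_size=1 makes one update per data point.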