Dropout regularization

[[Category:Machine Learning]]


'''Dropout regularization''' behaves quite differently from other [[regularization]] techniques. Instead of penalizing large weights in the loss function, it adds a layer that randomly ignores neurons on each forward pass during training.
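To illustrate what "adds a layer" looks like in practice (this example is not from the original article), the sketch below inserts a <code>Dropout</code> layer between two dense layers in Keras; the layer sizes and input shape are arbitrary choices for the example.

<syntaxhighlight lang="python">
import tensorflow as tf

# A small fully connected network with a dropout layer after the hidden layer.
# During training, Dropout(0.2) randomly zeroes 20% of its inputs on each
# forward pass; at inference time the layer is a pass-through.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                 # hypothetical input size
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.2),                # dropout rate = 0.2
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
</syntaxhighlight>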


Dropout regularization is controlled by the hyperparameter '''dropout rate'''. For example, a dropout rate of 0.2 means that 20% of the layer's input neurons will be ignored on each pass.

Since each neuron adds complexity/features to the network, ignoring some of them reduces the complexity of the model, which helps prevent overfitting.
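To make the effect of the dropout rate concrete, here is a minimal NumPy sketch (not part of the original article) of what a dropout layer does to its inputs. The "inverted dropout" rescaling of the surviving neurons is an assumption borrowed from common practice, not something the article states.

<syntaxhighlight lang="python">
import numpy as np

def dropout(activations, dropout_rate=0.2, training=True):
    """Zero out a random fraction of activations during training.

    Uses the common "inverted dropout" convention: surviving activations are
    scaled by 1 / (1 - dropout_rate) so their expected value is unchanged,
    which lets inference skip dropout entirely.
    """
    if not training or dropout_rate == 0.0:
        return activations                       # dropout is a no-op at inference
    rng = np.random.default_rng()
    keep_mask = rng.random(activations.shape) >= dropout_rate
    return activations * keep_mask / (1.0 - dropout_rate)

# With dropout_rate=0.2, roughly 20% of the entries become zero on each call.
x = np.ones((4, 5))
print(dropout(x, dropout_rate=0.2))
</syntaxhighlight>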
