<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://ricefriedegg.com:80/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Back_propagation</id>
	<title>Back propagation - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://ricefriedegg.com:80/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Back_propagation"/>
	<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Back_propagation&amp;action=history"/>
	<updated>2026-04-10T00:33:28Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Back_propagation&amp;diff=645&amp;oldid=prev</id>
	<title>Rice: Created page with &quot;&#039;&#039;&#039;Back propagation&#039;&#039;&#039; is an error calculation technique. It consists of passing the gradient of a loss function &#039;&#039;backwards&#039;&#039; through a neural network layer by layer to update its weights.  = Procedure = Consider the following residual sum of squares (RSS) loss function (halved so its derivative is cleaner).  &lt;math&gt; E = \frac{1}{2}\sum_i (y_i - \mathbf{w}^\top \mathbf{x}_i)^2 &lt;/math&gt;  After each feed-forward pass, in which one data point is passed through the neural network, the gradient of the loss function is computed.  We compute the gra...&quot;</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Back_propagation&amp;diff=645&amp;oldid=prev"/>
		<updated>2024-05-01T03:18:36Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;&amp;#039;&amp;#039;&amp;#039;Back propagation&amp;#039;&amp;#039;&amp;#039; is an error calculation technique. It consists of passing the gradient of a loss function &amp;#039;&amp;#039;backwards&amp;#039;&amp;#039; through a &lt;a href=&quot;/mediawiki/index.php/Neural_network&quot; title=&quot;Neural network&quot;&gt;neural network&lt;/a&gt; layer by layer to update its weights.  = Procedure = Consider the following residual sum of squares (RSS) loss function (halved so its derivative is cleaner).  &amp;lt;math&amp;gt; E = \frac{1}{2}\sum_i (y_i - \mathbf{w}^\top \mathbf{x}_i)^2 &amp;lt;/math&amp;gt;  After each feed-forward pass, in which one data point is passed through the neural network, the gradient of the loss function is computed.  We compute the gra...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;Back propagation&amp;#039;&amp;#039;&amp;#039; is an error calculation technique. It consists of passing the gradient of a loss function &amp;#039;&amp;#039;backwards&amp;#039;&amp;#039; through a [[neural network]] layer by layer to update its weights.&lt;br /&gt;
&lt;br /&gt;
= Procedure =&lt;br /&gt;
Consider the following residual sum of squares (RSS) loss function (halved so its derivative is cleaner).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E = \frac{1}{2}\sum_i (y_i - \mathbf{w}^\top \mathbf{x}_i)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After each feed-forward pass, in which one data point is passed through the neural network, the gradient of the loss function is computed.&lt;br /&gt;
&lt;br /&gt;
We compute the gradient separately at each layer, because layers may differ in their [[activation function]], by computing the loss for each neuron &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; of that layer and summing the results.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E = \sum_j \left[ \frac{1}{2}\sum_i (y_{ij} - \mathbf{w}_j^\top \mathbf{x}_i)^2 \right]&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
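As a minimal NumPy sketch for a single linear layer (the values, shapes, and names here are illustrative), the summed layer loss for one data point can be computed as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang="python"&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
# One data point x, per-neuron targets y, and a weight matrix W whose&lt;br /&gt;
# rows are the weight vectors w_j of the neurons in this layer.&lt;br /&gt;
x = np.array([0.5, 1.0, -0.3])&lt;br /&gt;
y = np.array([1.0, 0.0])&lt;br /&gt;
W = np.array([[0.1, 0.2, 0.3],&lt;br /&gt;
              [0.4, -0.1, 0.2]])&lt;br /&gt;
&lt;br /&gt;
# Feed-forward pass of the linear layer, then the halved squared&lt;br /&gt;
# error per neuron, summed over the layer.&lt;br /&gt;
y_hat = W @ x&lt;br /&gt;
E = np.sum(0.5 * (y - y_hat) ** 2)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;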
Then we compute the gradient of the loss function at that layer with respect to the weights feeding into that layer; by the chain rule, this reuses the gradients already computed at the layers above, which is what makes the backward pass efficient.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\frac{\partial E}{\partial \mathbf{w}_j}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
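Continuing the sketch above (the learning rate value is again illustrative), the gradient with respect to the weights of each neuron, followed by a plain gradient-descent update, would be:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang="python"&amp;gt;&lt;br /&gt;
# dE/dw_j = -(y_j - y_hat_j) * x for each neuron j, which stacks&lt;br /&gt;
# into the outer product of the per-neuron error with the input.&lt;br /&gt;
grad = -np.outer(y - y_hat, x)&lt;br /&gt;
&lt;br /&gt;
# Gradient-descent update of the layer weights.&lt;br /&gt;
lr = 0.01&lt;br /&gt;
W = W - lr * grad&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;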
&lt;br /&gt;
[[Category:Machine Learning]]&lt;/div&gt;</summary>
		<author><name>Rice</name></author>
	</entry>
</feed>