Fast Recap:

Gradient Descent
Delta Rule

Recap:

MLP (Multilayer Perceptron) Learning Algorithm: Backpropagation:

Batch Criterion Function:

$$C (τ, w) = C (W) = \frac{1}{2} \sum_{j = 1}^{n} \sum_{i = 1}^{m} (\hat{y}_{i} - y_{i})^{2}$$

Where:

$τ = \{(\underline{x_{1}}, \underline{y_{1}}), \dots, (\underline{x_{n}}, \underline{y_{n}})\}$ : the training set (supervised), made of $n$ input/output pairs
$w$ : weight
$\hat{y}_{i}$ : the $i$-th target (desired) output, compared against the output computed by the ANN, $y_{i} = f_{i} (a_{i})$.

Online Criterion Function:

$$C (W) = \frac{1}{2} \sum_{i = 1}^{m} (\hat{y}_{i} - y_{i})^{2}$$

The difference from the Batch Criterion is that in online mode the cost $C (W)$ is computed from a single training example (summing over its $m$ outputs), while in batch mode the cost also sums over all $n$ examples of the training set.
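To make the two criteria concrete, here is a minimal sketch in plain Python. It assumes that $n$ counts training examples and $m$ counts output units, as the two sums suggest; the function names and the sample values are hypothetical, chosen only for illustration.

```python
def online_cost(target, output):
    """Cost of one training example: half the summed squared error over its m outputs."""
    return 0.5 * sum((t - y) ** 2 for t, y in zip(target, output))


def batch_cost(targets, outputs):
    """Cost of the whole training set: the online cost summed over all n examples."""
    return sum(online_cost(t, y) for t, y in zip(targets, outputs))


# Hypothetical values: n = 2 examples, m = 2 output units each.
targets = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.5, 0.5], [0.0, 0.0]]

first = online_cost(targets[0], outputs[0])  # 0.5 * (0.25 + 0.25) = 0.25
total = batch_cost(targets, outputs)         # 0.25 + 0.5 * (0 + 1) = 0.75
```

Because the error is squared, swapping the roles of target and output does not change either cost.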

Gradient Descent: For a generic weight $w$ we apply the Gradient Descent:

$$w^{'} = w + Δ w \quad \text{where: } Δ w = - η \frac{\partial C}{\partial w}$$

Here $η > 0$ is the learning rate.
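As a rough illustration of the update rule, the sketch below applies it to a toy one-weight cost $C(w) = (w - 3)^2$. The cost function, the starting point and the value of $η$ are assumptions made for the example, not part of the lecture.

```python
def gradient_descent_step(w, grad, eta):
    """Return w' = w + delta_w, with delta_w = -eta * dC/dw."""
    return w - eta * grad


def toy_cost(w):
    # Hypothetical cost with its minimum at w = 3.
    return (w - 3.0) ** 2


def toy_cost_gradient(w):
    # Derivative of the toy cost with respect to w.
    return 2.0 * (w - 3.0)


w = 0.0
eta = 0.1
for _ in range(100):
    w = gradient_descent_step(w, toy_cost_gradient(w), eta)
# After 100 steps w is very close to the minimum at 3.
```

Each step shrinks the distance to the minimum by a factor of $1 - 2η = 0.8$ here; a learning rate that is too large would make the iteration overshoot instead.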

~Ex.: Online Mode, 1-Layer ANN

1-Layer ANN: one input layer and one output layer, so there is only one set of weights.

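$C (W) = \frac{1}{2} \sum_{i = 1}^{m} (\hat{y}_{i} - y_{i})^{2}$
$y_{i} = f_{i} (a_{i})$ $f_{i}$ : $i$-th activation function
$a_{i} = \sum_{k} w_{ik} o_{k}$ $o_{k}$ : output of the previous layer, in this case the $k$-th input.

Applying the chain rule through $y_{i}$ and $a_{i}$, the result is:

$$\frac{\partial C}{\partial w_{ik}} = \frac{\partial C}{\partial y_{i}} \cdot \frac{\partial y_{i}}{\partial a_{i}} \cdot \frac{\partial a_{i}}{\partial w_{ik}} = - (\hat{y}_{i} - y_{i}) \frac{\partial f_{i} (a_{i})}{\partial a_{i}} o_{k}$$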
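The single-layer gradient can be sanity-checked numerically. The sketch below assumes a sigmoid activation (so $f'(a) = y (1 - y)$) and takes $y_i = f_i(a_i)$ as the network output and $\hat{y}_i$ as the target; the weights, inputs and targets are hypothetical values.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(weights, inputs):
    """weights[i][k] connects input k to output unit i; returns the outputs y_i."""
    return [sigmoid(sum(w * o for w, o in zip(row, inputs))) for row in weights]


def cost(weights, inputs, target):
    outputs = forward(weights, inputs)
    return 0.5 * sum((t - y) ** 2 for t, y in zip(target, outputs))


def gradient(weights, inputs, target):
    """dC/dw_ik = -(target_i - y_i) * f'(a_i) * o_k, with f'(a_i) = y_i * (1 - y_i)."""
    outputs = forward(weights, inputs)
    return [[-(t - y) * y * (1.0 - y) * o for o in inputs]
            for t, y in zip(target, outputs)]


# Hypothetical network: 2 inputs, 2 output units.
weights = [[0.1, -0.2], [0.4, 0.3]]
inputs = [1.0, 0.5]
target = [1.0, 0.0]
analytic = gradient(weights, inputs, target)

# Finite-difference estimate of dC/dw for the weight w[0][1].
eps = 1e-6
bumped = [row[:] for row in weights]
bumped[0][1] += eps
numeric = (cost(bumped, inputs, target) - cost(weights, inputs, target)) / eps
```

The analytic and numeric values should agree to several decimal places; a mismatch usually points to a sign or index error in the derivation.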

~Ex.: Online Mode, 2-Layer ANN

Having a hidden layer adds to the computation: for the output layer, the input $o$ in the last equation is no longer the network input but the output of the hidden layer, so it depends in turn on some weights and on an activation function. The $Δ w$ for a weight $w_{jk}$ entering hidden unit $j$ is:

$$Δ w_{jk} = - η \frac{\partial C}{\partial w_{jk}} = η \left( \sum_{i = 1}^{m} w_{ij} δ_{i} \right) \cdot f_{j}^{'} (a_{j}) \cdot o_{k}$$

Where:

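$f_{j}^{'} (a_{j}) = \frac{\partial}{\partial a_{j}} f_{j} (a_{j})$

$$δ_{j} = \begin{cases} (\hat{y}_{j} - y_{j}) f_{j}^{'} (a_{j}) & \text{if } j \in L_{l} \\ \left( \sum_{i \in L_{k + 1}} w_{ij} δ_{i} \right) \cdot f_{j}^{'} (a_{j}) & \text{if } j \in L_{k} \end{cases} \quad \text{where: } k = l - 1, \dots, 0$$

Here $L_{l}$ is the output layer and $L_{k}$ is a layer below it.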
This is known as the Generalized Delta Rule: $Δ w_{jk} = η δ_{j} o_{k}$
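The two cases of $δ_j$ and the resulting update can be sketched for a network with one hidden layer. This is only an illustration under stated assumptions: sigmoid activations in both layers, no bias terms, $y = f(a)$ as the network output and $\hat{y}$ as the target. All names and numeric values are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def layer_forward(weights, inputs):
    """weights[j][k] connects input k to unit j of the layer."""
    return [sigmoid(sum(w * o for w, o in zip(row, inputs))) for row in weights]


def cost(w_hidden, w_out, inputs, target):
    outputs = layer_forward(w_out, layer_forward(w_hidden, inputs))
    return 0.5 * sum((t - y) ** 2 for t, y in zip(target, outputs))


def backprop_deltas(w_hidden, w_out, inputs, target):
    hidden = layer_forward(w_hidden, inputs)
    outputs = layer_forward(w_out, hidden)
    # Output layer: delta_i = (target_i - y_i) * f'(a_i).
    delta_out = [(t - y) * y * (1.0 - y) for t, y in zip(target, outputs)]
    # Hidden layer: delta_j = (sum_i w_ij * delta_i) * f'(a_j).
    delta_hidden = [
        sum(w_out[i][j] * delta_out[i] for i in range(len(w_out))) * h * (1.0 - h)
        for j, h in enumerate(hidden)
    ]
    return hidden, delta_out, delta_hidden


def weight_updates(deltas, layer_inputs, eta):
    """Generalized delta rule: delta_w_jk = eta * delta_j * o_k."""
    return [[eta * d * o for o in layer_inputs] for d in deltas]


# Hypothetical network: 2 inputs, 2 hidden units, 1 output unit.
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [[0.2, -0.1]]
inputs, target, eta = [1.0, 0.5], [1.0], 0.5

hidden, delta_out, delta_hidden = backprop_deltas(w_hidden, w_out, inputs, target)
dw_out = weight_updates(delta_out, hidden, eta)        # o_k = hidden outputs
dw_hidden = weight_updates(delta_hidden, inputs, eta)  # o_k = network inputs
```

Note that the hidden deltas reuse the output deltas, which is why the computation runs backward from the output layer.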
Learning is iterated for a defined number of consecutive cycles called epochs, using only the training set.
An epoch is one full pass of the delta rule over all the data in $τ$.
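A training loop over epochs might look like the sketch below. It assumes a single sigmoid output unit trained in online mode on a hypothetical training set (logical OR, with a constant first input acting as bias); the learning rate and the number of epochs are arbitrary choices for the example.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def predict(weights, inputs):
    return sigmoid(sum(w * o for w, o in zip(weights, inputs)))


def batch_cost(weights, training_set):
    return 0.5 * sum((t - predict(weights, x)) ** 2 for x, t in training_set)


def train(weights, training_set, eta, epochs):
    weights = list(weights)
    for _ in range(epochs):        # one epoch = one pass over the training set
        for x, t in training_set:  # online mode: update after every example
            y = predict(weights, x)
            delta = (t - y) * y * (1.0 - y)
            weights = [w + eta * delta * o for w, o in zip(weights, x)]
    return weights


# Hypothetical training set: logical OR; the first input is a constant bias of 1.0.
training_set = [
    ([1.0, 0.0, 0.0], 0.0),
    ([1.0, 0.0, 1.0], 1.0),
    ([1.0, 1.0, 0.0], 1.0),
    ([1.0, 1.0, 1.0], 1.0),
]
initial = [0.0, 0.0, 0.0]
trained = train(initial, training_set, eta=0.5, epochs=2000)
```

Comparing `batch_cost(initial, training_set)` with `batch_cost(trained, training_set)` shows how much the cost dropped over the training set; only the training set is used, as stated above.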


AI - Lecture 13
