How many steps does a general NN take to find a solution?

Starting from the formula for updating the weights:

$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$$

We consider the updating process continuous and say:

$$\frac{dw_i}{dt} = -\eta \frac{\partial E}{\partial w_i}$$

Now suppose we use a new learning rate:

$$\eta' = \frac{\eta}{\|\nabla E\|^2} = \frac{\eta}{\sum_i \left(\frac{\partial E}{\partial w_i}\right)^2}$$

The idea behind this is that when we are on a plateau ($\|\nabla E\| \approx 0$) the learning rate becomes huge, reducing the number of steps needed to exit the plateau, while when we are on a steep slope the learning rate automatically shrinks, so we "proceed with caution".

We choose the square ($\|\nabla E\|^2$) because:

$$\frac{dE}{dt} = \sum_i \frac{\partial E}{\partial w_i}\frac{dw_i}{dt} = -\frac{\eta}{\|\nabla E\|^2}\sum_i \left(\frac{\partial E}{\partial w_i}\right)^2 = -\eta$$

So the error decreases linearly with $t$.

Given $E(0)$, we can expect the error at time $t$ to be $E(t) = E(0) - \eta t$, so if we want to know how much time is required to bring the error down to $\varepsilon$:

$$t = \frac{E(0) - \varepsilon}{\eta}$$
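A quick numerical sanity check of this result (a sketch using a toy single-weight error function $E(w) = w^2$, chosen here purely for illustration): with the normalized learning rate, gradient descent lowers the error by almost exactly $\eta$ per step.

```python
eta = 0.01
w = 5.0                      # toy single-weight "network", E(w) = w**2
E = lambda w: w ** 2
errors = [E(w)]
for _ in range(100):
    g = 2 * w                # gradient dE/dw
    w -= (eta / g ** 2) * g  # update with eta' = eta / |grad E|**2
    errors.append(E(w))

# per-step drop in error is ~eta, i.e. the error decreases linearly in t
print(errors[0] - errors[1], errors[-2] - errors[-1])
```

The drop per step stays close to $\eta$ throughout the run, matching the continuous-time prediction $dE/dt = -\eta$.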


Solution of non-linear separable problems with NN

~ XOR function

  • The XOR function is not linearly separable
  • A NN with a sigmoid or sign activation function and a hidden layer can solve it.

TODO: Make example of a NN that solves the XOR function
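As a sketch toward that TODO: a minimal two-layer network with step (sign-style) activations that computes XOR. The weights below are hand-chosen for illustration, not the result of training.

```python
def step(x):
    # threshold (sign-style) activation: fires 1 when input > 0
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    # hidden layer: h1 acts as OR, h2 acts as AND (hand-chosen weights)
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    # output unit computes "OR but not AND", which is exactly XOR
    return step(h1 - h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(f"XOR({a}, {b}) = {xor_net(a, b)}")
```

The hidden layer maps the four inputs into a space where the classes become linearly separable, which is why a single-layer perceptron fails on XOR but this two-layer net succeeds.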