How many steps does a general NN take to find a solution?
Starting from the formula for updating the weights:

$$ \Delta w_i = -\eta \frac{\partial E}{\partial w_i} $$

We consider the updating process continuous and say:

$$ \frac{dw_i}{dt} = -\eta \frac{\partial E}{\partial w_i} $$
Now suppose we use a new learning rate:

$$ \eta = \frac{\eta_0}{\lVert \nabla E \rVert^2} = \frac{\eta_0}{\sum_i \left(\frac{\partial E}{\partial w_i}\right)^2} $$
The idea behind this is that when we are on a plateau (where $\lVert \nabla E \rVert$ is small) the learning rate becomes huge, reducing the number of steps needed to exit the plateau, while on a steep slope the learning rate automatically shrinks and we "proceed with caution".
We choose the square ($\lVert \nabla E \rVert^2$) because:

$$ \frac{dE}{dt} = \sum_i \frac{\partial E}{\partial w_i}\frac{dw_i}{dt} = -\eta \sum_i \left(\frac{\partial E}{\partial w_i}\right)^2 = -\eta \lVert \nabla E \rVert^2 = -\eta_0 $$

So the error decreases linearly with $t$.

Given $E(0)$ we can expect the error at time $t$ to be $E(t) = E(0) - \eta_0 t$, so if we want to know how much time is required to bring the error to $\varepsilon$:

$$ t = \frac{E(0) - \varepsilon}{\eta_0} $$
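A quick numerical check of this (my own toy example, not from the notes): gradient descent on the one-dimensional error $E(w) = w^2$ with the adaptive rate $\eta = \eta_0 / \lVert \nabla E \rVert^2$. Each discrete step then removes roughly $\eta_0$ from the error, so reaching error $\varepsilon$ from $E(0)$ takes about $(E(0)-\varepsilon)/\eta_0$ steps.

```python
# Toy example (assumed): adaptive-rate gradient descent on E(w) = w^2.
# With eta = eta0 / ||grad E||^2, each step decreases E by ~eta0,
# so the error falls linearly in the number of steps.

def error(w):
    return w * w

def grad(w):
    return 2.0 * w

eta0 = 0.01
w = 3.0          # so E(0) = 9
target = 1.0     # epsilon: the error level we want to reach
steps = 0

while error(w) > target:
    g = grad(w)
    eta = eta0 / g ** 2      # adaptive learning rate
    w -= eta * g             # usual weight update with this eta
    steps += 1

print(steps)  # close to the predicted (E(0) - eps) / eta0 = (9 - 1) / 0.01 = 800
```

The step count matches the prediction $t = (E(0)-\varepsilon)/\eta_0$ up to the small discretization error of replacing the continuous process with finite steps.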
Solving non-linearly separable problems with a NN
~ XOR function
- The XOR function is not linearly separable
- A NN with a sigmoid or sign activation function and a hidden layer can solve it.
TODO: Make example of a NN that solves the XOR function
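A minimal sketch of such a network (weights chosen by hand rather than learned; the function names are mine): two hidden units with a step (sign-like) activation compute OR and NAND of the inputs, and the output unit ANDs them, which is exactly XOR.

```python
# Hand-wired 2-2-1 network with a step activation that computes XOR.
# Hidden unit h1 acts as OR, h2 as NAND; the output unit is AND(h1, h2).

def step(x):
    # Threshold activation: 1 if x >= 0, else 0.
    return 1 if x >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # OR:   fires unless both inputs are 0
    h2 = step(-x1 - x2 + 1.5)    # NAND: fires unless both inputs are 1
    return step(h1 + h2 - 1.5)   # AND of the two hidden units

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

No single unit can do this, since no single line separates the XOR outputs in the input plane; the hidden layer remaps the inputs so that the final unit faces a linearly separable problem.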