How many steps does a general NN take to find a solution?

Starting from the formula for updating the weights:

$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$$

We consider the updating process continuous and say:

$$\frac{dw_i}{dt} = -\eta \frac{\partial E}{\partial w_i}$$

Now suppose we use a new learning rate:

$$\eta' = \frac{\eta}{\|\nabla E\|^2} = \frac{\eta}{\sum_i \left(\frac{\partial E}{\partial w_i}\right)^2}$$

The idea behind this is that when we are on a plateau ($\|\nabla E\| \approx 0$) the learning rate becomes huge, reducing the number of steps needed to exit the plateau, while when we are on a steep slope the learning rate automatically shrinks, so we "proceed with caution".

We choose the square ($\|\nabla E\|^2$) because:

$$\frac{dE}{dt} = \sum_i \frac{\partial E}{\partial w_i}\frac{dw_i}{dt} = -\frac{\eta}{\|\nabla E\|^2}\sum_i \left(\frac{\partial E}{\partial w_i}\right)^2 = -\eta$$

So the error decreases linearly with $t$.

Given $E(0)$, we can expect the error at time $t$ to be $E(t) = E(0) - \eta t$, so if we want to know how much time is required to bring the error down to $\varepsilon$:

$$t = \frac{E(0) - \varepsilon}{\eta}$$
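A quick numerical sanity check of this result (a sketch using a toy single-weight error function $E(w) = w^2$, chosen here purely for illustration): with the normalized learning rate, gradient descent lowers the error by almost exactly $\eta$ per step.

```python
eta = 0.01
w = 5.0                      # toy single-weight "network", E(w) = w**2
E = lambda w: w ** 2
errors = [E(w)]
for _ in range(100):
    g = 2 * w                # gradient dE/dw
    w -= (eta / g ** 2) * g  # update with eta' = eta / |grad E|**2
    errors.append(E(w))

# per-step drop in error is ~eta, i.e. the error decreases linearly in t
print(errors[0] - errors[1], errors[-2] - errors[-1])
```

The drop per step stays close to $\eta$ throughout the run, matching the continuous-time prediction $dE/dt = -\eta$.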


Solution of non-linear separable problems with NN

~ XOR function

  • The XOR function is not linearly separable
  • A NN with a sigmoid or sign activation function and a hidden layer can solve it.

TODO: Make example of a NN that solves the XOR function
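As a sketch toward that TODO: a minimal two-layer network with step (sign-style) activations that computes XOR. The weights below are hand-chosen for illustration, not the result of training.

```python
def step(x):
    # threshold (sign-style) activation: fires 1 when input > 0
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    # hidden layer: h1 acts as OR, h2 acts as AND (hand-chosen weights)
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    # output unit computes "OR but not AND", which is exactly XOR
    return step(h1 - h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(f"XOR({a}, {b}) = {xor_net(a, b)}")
```

The hidden layer maps the four inputs into a space where the classes become linearly separable, which is why a single-layer perceptron fails on XOR but this two-layer net succeeds.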