Pseudo-Inversion
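From the formula $Xw = y$, solving directly as $w = X^{-1} y$ fails when $X$ is not invertible (an $N \times (d+1)$ matrix is not even square unless $N = d+1$). Maybe $X^T X$ is invertible.
That is because the number of features plus the bias ($d+1$) is usually much smaller than the number of examples ($N$), so $X^T X$ is only a $(d+1) \times (d+1)$ matrix and is usually invertible.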
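So I can change the formula a little by multiplying both sides of $Xw = y$ by $X^T$:
$$X^T X \, w = X^T y$$
So (pseudo-inversion):
$$X^\dagger = (X^T X)^{-1} X^T$$
And:
$$w = X^\dagger y$$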
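A minimal NumPy sketch of the pseudo-inverse solution, assuming synthetic data (the values of `N`, `d`, `true_w` and the noise level are made up for illustration). It builds $X^\dagger = (X^T X)^{-1} X^T$ explicitly and cross-checks the result against NumPy's `np.linalg.pinv`, which computes the Moore-Penrose pseudo-inverse and agrees with this formula when $X$ has full column rank.

```python
import numpy as np

# Synthetic data (illustrative only): N examples, d features plus a bias column.
rng = np.random.default_rng(0)
N, d = 100, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])  # shape (N, d+1)
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=N)                  # noisy targets

# X itself is N x (d+1), not square, so it has no inverse;
# X^T X is only (d+1) x (d+1) and is invertible here.
XtX = X.T @ X
X_dagger = np.linalg.inv(XtX) @ X.T   # pseudo-inverse (X^T X)^{-1} X^T
w = X_dagger @ y                      # w = X^dagger y

# Cross-check against NumPy's Moore-Penrose pseudo-inverse.
w_pinv = np.linalg.pinv(X) @ y
print(XtX.shape)                # (4, 4)
print(np.allclose(w, w_pinv))   # True
print(np.round(w, 2))           # close to true_w
```

Forming the inverse explicitly mirrors the formula above; in practice `np.linalg.solve(X.T @ X, X.T @ y)` or `np.linalg.lstsq` gives the same weights without computing the inverse and is numerically safer.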
Gradient Descent

IDEA: Given the gradient of the error function, $\nabla E_{in}(w)$, we can calculate the next "step" such that updating the weights brings us to a smaller $E_{in}(w)$.
REMEMBER: Our objective is to minimize $E_{in}(w)$, i.e. to bring $\nabla E_{in}(w) \to 0$.
ADVANTAGES: Instead of using the "exact" formula to calculate the weights, we can use gradient descent. It does not need to invert $X^T X$, and it still moves toward a minimum of $E_{in}$ even when no perfect solution (one with $E_{in} = 0$) exists.
Updating the parameter $w$:
$$w(t+1) = w(t) - \eta \, \nabla E_{in}(w(t))$$
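A minimal sketch of this update rule for linear regression, assuming the squared error $E_{in}(w) = \frac{1}{N}\lVert Xw - y \rVert^2$, whose gradient is $\nabla E_{in}(w) = \frac{2}{N} X^T (Xw - y)$. The helper name `gradient_descent`, the starting point $w(0) = 0$, and the values of `eta` and `steps` are illustrative choices, not part of the notes. The result is compared with the pseudo-inverse solution.

```python
import numpy as np

def gradient_descent(X, y, eta=0.1, steps=1000):
    """Minimize E_in(w) = (1/N) * ||X w - y||^2 (assumed squared error)."""
    N = X.shape[0]
    w = np.zeros(X.shape[1])                  # start from w(0) = 0
    for _ in range(steps):
        grad = (2.0 / N) * X.T @ (X @ w - y)  # gradient of E_in at w(t)
        w = w - eta * grad                    # w(t+1) = w(t) - eta * gradient
    return w

# Synthetic data (illustrative only): bias column + d features.
rng = np.random.default_rng(0)
N, d = 100, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=N)

w_gd = gradient_descent(X, y, eta=0.1, steps=1000)
w_exact = np.linalg.pinv(X) @ y               # pseudo-inverse solution
print(np.allclose(w_gd, w_exact, atol=1e-6))  # True
```

No matrix is inverted inside the loop; each step only needs matrix-vector products, which is what makes the method usable when the closed form is expensive or unavailable.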
$\eta$ (the learning rate) defines how big a step we take.
- A bigger $\eta$ can solve the problem faster (fewer steps required), but it can also overshoot the minimum and diverge instead of bringing the gradient to 0.
- A smaller $\eta$ gives a more accurate solution, but it takes longer.
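The trade-off above can be seen in a small experiment, sketched here under the same squared-error assumption as before (data, step count, and helper names `e_in` and `run_gd` are illustrative). For this error the Hessian is $\frac{2}{N} X^T X$, and plain gradient descent is stable only when $\eta$ is below $2 / \lambda_{max}$ of that Hessian, so the code expresses each $\eta$ as a fraction of that threshold.

```python
import numpy as np

def e_in(X, y, w):
    """Mean squared in-sample error (assumed error measure)."""
    return np.mean((X @ w - y) ** 2)

def run_gd(X, y, eta, steps):
    """Plain gradient descent on e_in, starting from w = 0."""
    N = X.shape[0]
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w = w - eta * (2.0 / N) * X.T @ (X @ w - y)
    return w

rng = np.random.default_rng(1)
N, d = 100, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=N)

# Stability threshold for squared error: eta < 2 / largest Hessian eigenvalue.
hessian = (2.0 / N) * X.T @ X
eta_max = 2.0 / np.linalg.eigvalsh(hessian).max()

results = {}
for label, eta in [("small", 0.01 * eta_max),
                   ("moderate", 0.5 * eta_max),
                   ("too big", 1.1 * eta_max)]:
    w = run_gd(X, y, eta, steps=50)
    results[label] = e_in(X, y, w)
    print(f"{label:>8}: eta={eta:.3f}  E_in after 50 steps = {results[label]:.3g}")
```

With the same 50 steps, the small $\eta$ is still far from the minimum, the moderate one has essentially converged, and the one just above the threshold makes $E_{in}$ grow instead of shrink.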