Regression and Classification
Linear Regression: the output is a continuous value, $\hat{y} = \mathbf{w} \cdot \mathbf{x} + b$. Classification: the output is a discrete class label.
Error function: $E(\mathbf{w}) = \sum_i L^{(i)}(\mathbf{w})$, the total error over all training examples.
Loss function: $L^{(i)}(\mathbf{w}) = \big(y^{(i)} - \hat{y}^{(i)}\big)^2$
The loss function measures how far the prediction for one single set of features is from the actual target value.
Notation — inputs: $\mathbf{x}$, weights: $\mathbf{w}$, real outputs: $y$, estimated outputs: $\hat{y}$
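As a minimal sketch of these definitions (assuming the squared loss above; all names here are illustrative):

```python
def predict(w, b, x):
    """Estimated output y_hat = w . x + b for one set of features x."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def loss(w, b, x, y):
    """Loss of a single example: squared distance between prediction and real output."""
    return (y - predict(w, b, x)) ** 2

def error(w, b, X, Y):
    """Error function: the sum of the losses over all examples."""
    return sum(loss(w, b, x, y) for x, y in zip(X, Y))
```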
Gradient Descent
Gradient descent is an iterative algorithm that minimizes the error function, e.g. for the linear regression formula. It works well even with a lot of parameters and runs really fast.
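A minimal sketch of gradient descent for linear regression with the squared loss; the learning rate `lr` and step count here are arbitrary choices:

```python
def gradient_descent(X, Y, lr=0.01, steps=1000):
    """Normal gradient descent: one weight update per pass over ALL examples."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(X, Y):
            err = sum(wj * xj for wj, xj in zip(w, x)) + b - y   # prediction minus real output
            for j in range(n):
                grad_w[j] += 2 * err * x[j]                      # gradient of the squared loss
            grad_b += 2 * err
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b
```

e.g. `gradient_descent([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])` should converge toward `w ≈ [2.0]`, `b ≈ 0`.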
Empirical Derivative
To calculate a derivative, if we don't care too much about speed, we can use its definition:
$$f'(x) = \lim_{\varepsilon \to 0} \frac{f(x + \varepsilon) - f(x)}{\varepsilon}$$
and, using a really small $\varepsilon$, evaluate it empirically.
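For example, a tiny helper (the `eps` value is an arbitrary small number):

```python
def empirical_derivative(f, x, eps=1e-6):
    """Approximate f'(x) from the definition, using a small finite eps."""
    return (f(x + eps) - f(x)) / eps

# e.g. the derivative of x**2 at x = 3 is close to 6:
print(empirical_derivative(lambda x: x**2, 3.0))   # ~6.000001
```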
Stochastic Gradient Descent
Instead of updating the weights after computing the whole error function, we update them after computing the loss function of a single example (1 step of gradient descent for every example taken).
There is a good chance that stochastic gradient descent actually brings us to the local minimum.
Difference from normal gradient descent: with the normal version we have the guarantee that it will bring us to the local minimum; with the stochastic one we only have a good chance.
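A minimal sketch of stochastic gradient descent under the same squared-loss setup; shuffling once per epoch is a common choice, not prescribed by these notes:

```python
import random

def sgd(X, Y, lr=0.01, epochs=100):
    """Stochastic gradient descent: one weight update per SINGLE example."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        order = list(range(len(X)))
        random.shuffle(order)                                     # visit examples in random order
        for i in order:
            x, y = X[i], Y[i]
            err = sum(wj * xj for wj, xj in zip(w, x)) + b - y
            w = [wj - lr * 2 * err * xj for wj, xj in zip(w, x)]  # one step per example
            b -= lr * 2 * err
    return w, b
```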
Batch Gradient Descent
The middle ground (often called mini-batch gradient descent): we update the weights (1 step) after looking at the loss functions of an arbitrary number of examples.
Difference from normal gradient descent: there, the weights are updated (1 step) only after looking at the sum of all loss functions (equal to the error function).
Normal gradient descent is the most mathematically correct one, but it's not always the best choice; sometimes it is not even doable:
~Ex.: when gathering data from an online website, the data stream is effectively infinite, so we cannot use normal gradient descent, only the stochastic or batch one.
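A sketch of the batch variant, assuming a hypothetical `batch_size` parameter; compared with the versions above, only how examples are grouped per update changes:

```python
def batch_gd(X, Y, lr=0.01, epochs=100, batch_size=8):
    """Batch gradient descent: one weight update per batch of examples."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for start in range(0, len(X), batch_size):
            xs, ys = X[start:start + batch_size], Y[start:start + batch_size]
            grad_w, grad_b = [0.0] * n, 0.0
            for x, y in zip(xs, ys):              # sum the losses of this batch only
                err = sum(wj * xj for wj, xj in zip(w, x)) + b - y
                for j in range(n):
                    grad_w[j] += 2 * err * x[j]
                grad_b += 2 * err
            w = [wj - lr * g for wj, g in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b
```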
Forgetting behaviour
With stochastic gradient descent and batch gradient descent, the model learns mostly from the last samples seen and tends to forget the previous ones.
Calculus Chain Rule
Remember that, given the loss function
$$L(\mathbf{w}) = (y - \hat{y})^2 \quad \text{with} \quad \hat{y} = \mathbf{w} \cdot \mathbf{x} + b,$$
its derivative can be rewritten as:
$$\frac{\partial L}{\partial w_j} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_j} = -2\,(y - \hat{y})\,x_j$$
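As a quick sanity check, the chain-rule gradient can be compared against the empirical derivative from earlier (all values here are arbitrary):

```python
def empirical_derivative(f, x, eps=1e-6):
    return (f(x + eps) - f(x)) / eps

w, b, x, y = 0.5, 0.1, 2.0, 3.0               # arbitrary single-feature example
loss = lambda w_: (y - (w_ * x + b)) ** 2     # L(w) = (y - y_hat)^2

analytic = -2 * (y - (w * x + b)) * x         # chain rule: dL/dw = dL/dy_hat * dy_hat/dw
print(analytic, empirical_derivative(loss, w))   # the two values should match closely
```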