Example of Chain Rule
Given the chain rule:
∂E/∂w_ij = (∂E/∂o_j) · (∂o_j/∂net_j) · (∂net_j/∂w_ij)
In the case of a NN with sigmoid activation function we have:
o_j = σ(net_j) = 1 / (1 + e^(−net_j))
So its partial derivative with respect to net_j is:
∂o_j/∂net_j = o_j (1 − o_j)
Also, given:
net_j = Σ_i w_ij o_i
We have that:
∂net_j/∂w_ij = o_i
So:
∂E/∂w_ij = (∂E/∂o_j) · o_j (1 − o_j) · o_i
We also know that, for the squared error E = ½ (t_j − o_j)²:
∂E/∂o_j = o_j − t_j
To help us with notation we define the delta error:
δ_j = (∂E/∂o_j) · o_j (1 − o_j)
So:
∂E/∂w_ij = δ_j · o_i
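As a quick numerical sanity check of the sigmoid derivative identity used in the derivation, σ'(x) = σ(x)(1 − σ(x)), we can compare it against a central finite difference (a minimal sketch; the function names are ours, not from the notes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Analytic derivative from the identity sigma'(x) = sigma(x) * (1 - sigma(x))
def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Central finite difference to cross-check the identity at an arbitrary point
x = 0.7
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(abs(numeric - sigmoid_prime(x)) < 1e-8)  # True: the identity holds
```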
Backpropagation Formula:
The main formula to remember for the backpropagation algorithm using a sigmoidal NN is:
w_ij ← w_ij − η δ_j o_i
with δ_j = (o_j − t_j) o_j (1 − o_j) for an output neuron, and δ_j = o_j (1 − o_j) Σ_k δ_k w_jk for a hidden neuron (η is the learning rate).
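The update rule can be sketched as one gradient step on a tiny sigmoidal network. This is an illustrative example under our own assumptions (network sizes, variable names, and the squared-error loss are ours), not code from the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Tiny 2-3-1 network (sizes are illustrative)
W1 = rng.normal(size=(2, 3))   # input -> hidden weights
W2 = rng.normal(size=(3, 1))   # hidden -> output weights
x = np.array([[0.5, -0.2]])    # one training sample
t = np.array([[1.0]])          # target
eta = 0.1                      # learning rate

# Forward pass
h = sigmoid(x @ W1)            # hidden activations o_i
y = sigmoid(h @ W2)            # network output o_j

# Backward pass: delta errors for the squared-error loss
delta2 = (y - t) * y * (1 - y)             # output delta: (o - t) o (1 - o)
delta1 = (delta2 @ W2.T) * h * (1 - h)     # hidden delta, backpropagated

# Weight update: w <- w - eta * delta * (input feeding that weight)
W2 -= eta * h.T @ delta2
W1 -= eta * x.T @ delta1

# One small gradient step should reduce the error on this sample
y_new = sigmoid(sigmoid(x @ W1) @ W2)
err_before = float(abs(y - t))
err_after = float(abs(y_new - t))
print(err_after < err_before)  # the step moved the output toward the target
```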
One-hot Encoding
Source: LABEL ENCODING (look at the Categorical value column)

| CompanyName | Categorical value | Price |
|-------------|-------------------|-------|
| VW          | 1                 | 20000 |
| Acura       | 2                 | 10011 |
| Honda       | 3                 | 50000 |
| Honda       | 3                 | 10000 |
ONE-HOT ENCODING:

| VW | Acura | Honda | Price |
|----|-------|-------|-------|
| 1  | 0     | 0     | 20000 |
| 0  | 1     | 0     | 10011 |
| 0  | 0     | 1     | 50000 |
| 0  | 0     | 1     | 10000 |
We usually prefer one-hot encoding over label encoding for 2 main reasons:
- Label encoding assumes a hierarchy: if our ML model internally computes averages or distances, label encoding implies VW < Acura < Honda, which doesn't make any sense.
- The one-hot values can also be compared directly to the output of a sigmoid function, or any other activation function whose output belongs to [0, 1].
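The two encodings of the table above can be sketched in plain Python/NumPy (the variable names are ours; the data is the table's):

```python
import numpy as np

# The columns from the table above
companies = ["VW", "Acura", "Honda", "Honda"]

# Label encoding: map each category to an integer (imposes a fake order)
categories = sorted(set(companies), key=companies.index)  # first-seen order: VW, Acura, Honda
label = {c: i + 1 for i, c in enumerate(categories)}
label_encoded = [label[c] for c in companies]
print(label_encoded)  # [1, 2, 3, 3]

# One-hot encoding: one binary column per category, no implied ordering
one_hot = np.zeros((len(companies), len(categories)), dtype=int)
for row, c in enumerate(companies):
    one_hot[row, label[c] - 1] = 1
print(one_hot)
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]
#  [0 0 1]]
```

In practice libraries such as pandas (`get_dummies`) or scikit-learn (`OneHotEncoder`) do this in one call; the loop above just makes the mechanics explicit.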
Entropy Loss
REMEMBER: For classification problems the target t can only be 0 or 1.
OBSERVATION: For the entropy loss the delta error reduces to δ_j = o_j − t_j (the o_j (1 − o_j) factor cancels), so it is null only at the absolute minimum, i.e. when the output equals the target.
OBSERVATION: If the neuron is saturated (o ≈ 0 or o ≈ 1) but the target is the opposite (t = 1 or t = 0 respectively), the entropy loss returns a big value; think of it as notifying the NN that it made a big mistake. When the loss is big, the next step made by the NN will also be big, so with the entropy loss it is much easier to escape the condition of saturation.
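This escape-from-saturation behaviour can be seen numerically. A minimal sketch, assuming the binary cross-entropy E = −[t ln(o) + (1 − t) ln(1 − o)] and comparing the output delta it produces against the squared-error delta for a saturated neuron:

```python
import numpy as np

o = 0.999   # saturated output (close to 1)
t = 0.0     # but the target is the opposite

# Cross-entropy loss: large when output and target disagree
ce_loss = -(t * np.log(o) + (1 - t) * np.log(1 - o))

# Delta errors at the output neuron:
delta_ce = o - t                  # cross-entropy: the o(1 - o) factor cancels
delta_sq = (o - t) * o * (1 - o)  # squared error: multiplied by o(1 - o) ~ 0

print(round(float(ce_loss), 3))  # 6.908: the loss signals a big mistake
print(round(delta_ce, 3))        # 0.999: big delta, easy to escape saturation
print(round(delta_sq, 6))        # 0.000998: tiny delta, the neuron stays stuck
```

With squared error the o(1 − o) factor crushes the gradient exactly where the neuron is most wrong, which is why cross-entropy is preferred for sigmoid outputs.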