Rosenblatt’s Perceptron Algorithm
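Given a training set $\{(x_i, y_i),\ i = 1, \dots, \ell\}$ with targets $y_i$ taking values $\pm 1$, find $\hat{w}$ and $t$ such that the hyperplane perpendicular to $\hat{w}$ correctly separates the examples and $t$ is the number of times that $\hat{w}$ is updated.
- P1 INITIALIZE: Set $\hat{w} \leftarrow 0$, $t \leftarrow 0$, $j \leftarrow 1$, and $m \leftarrow 0$.
- P2 NORMALIZE: Compute $R = \max_i \|x_i\|$ and, for all $i = 1, \dots, \ell$, set $\hat{x}_i \leftarrow (x_i, R)^\top$.
- P3 CARROT OR STICK?: If $y_j \hat{w}^\top \hat{x}_j \le 0$, set $\hat{w} \leftarrow \hat{w} + \eta y_j \hat{x}_j$, $t \leftarrow t + 1$, $m \leftarrow m + 1$.
- P4 ALL TESTED?: Set $j \leftarrow j + 1$; if $j \ne \ell + 1$, go back to step P3.
- P5 NO MISTAKES?: If $m = 0$, the algorithm terminates; set $\hat{w}_t \leftarrow \hat{w}$ and return $(\hat{w}, t)$.
- P6 TRY AGAIN: Set $j \leftarrow 1$, $m \leftarrow 0$, and go back to step P3.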
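Note on the test $y_j \hat{w}^\top \hat{x}_j \le 0$: because $y_j$ can only be $+1$ or $-1$ (two classes), correct classification is defined as sign agreement between $y_j$ and $\hat{w}^\top \hat{x}_j$. The supervisor stops the algorithm only when the signs agree on every example ($y_j \hat{w}^\top \hat{x}_j > 0$).
This algorithm can also be seen as a neural network whose activation function is the sign function.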
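The steps of the algorithm can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: it assumes NumPy, takes $R$ to be the largest input norm, and adds a `max_epochs` safety cap (a hypothetical parameter, not part of the algorithm) so that the loop ends on data that is not linearly separable.

```python
import numpy as np

def perceptron(X, y, eta=1.0, max_epochs=1000):
    """Sketch of Rosenblatt's perceptron on augmented inputs (x_i, R)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # NORMALIZE: R is assumed to be the largest norm among the inputs.
    R = np.max(np.linalg.norm(X, axis=1))
    X_hat = np.hstack([X, np.full((len(X), 1), R)])
    # INITIALIZE: zero weights, no updates yet.
    w_hat = np.zeros(X_hat.shape[1])
    t = 0
    for _ in range(max_epochs):  # safety cap (assumption, not in the algorithm)
        m = 0  # mistakes in this pass over the data
        for x_j, y_j in zip(X_hat, y):
            # CARROT OR STICK: update only when the signs disagree.
            if y_j * (w_hat @ x_j) <= 0:
                w_hat = w_hat + eta * y_j * x_j
                t += 1
                m += 1
        # NO MISTAKES: a full pass without updates means we are done.
        if m == 0:
            return w_hat, t
    raise RuntimeError("no separating hyperplane found within max_epochs")

# Toy data, made up for illustration.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w_hat, t = perceptron(X, y)
```

The returned `w_hat` has one more component than the inputs; the last one multiplies the constant $R$ and plays the role of the bias.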
ReLU
Rectified Linear Unit

Another activation function that can be used in place of the sigmoid function, defined as $\mathrm{ReLU}(x) = \max(0, x)$.
It prevents saturation for $x > 0$ but not for $x < 0$.
Also note that the derivative at $x = 0$ does not exist, so we have to specify its value directly in the code (not too difficult).
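A minimal sketch of ReLU and of a derivative with an explicitly chosen value at $x = 0$ is shown here. The function names and the `at_zero` parameter are illustrative assumptions; setting the value at zero to 0 is a common convention, but any value in $[0, 1]$ is a valid subgradient.

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied elementwise."""
    return np.maximum(0.0, np.asarray(x, dtype=float))

def relu_grad(x, at_zero=0.0):
    """Derivative of ReLU; the value at x == 0 is chosen by the caller."""
    x = np.asarray(x, dtype=float)
    # 1 for x > 0, 0 for x < 0, and the chosen value exactly at 0.
    return np.where(x > 0, 1.0, np.where(x < 0, 0.0, at_zero))

values = np.array([-2.0, 0.0, 3.0])
activations = relu(values)
gradients = relu_grad(values)
```

The zero gradient for negative inputs is exactly the saturation for $x < 0$ mentioned above: units that receive only negative inputs get no update.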
Robust Linear Separation
Rosenblatt's perceptron algorithm performs a robust linear separation.
It is robust because every point lies at a distance of at least $\delta$ from the separating hyperplane. This value is not known a priori; it can be found as the minimum of the distances between the separating line and all the points.

From calculations and theorems we also get that the number of steps the algorithm needs to find a perfect solution is bounded:
$$t \le 2 \left( \frac{R}{\delta} \right)^2$$
where $\delta$ is the separation margin and $R$ is the radius of the space occupied by the points, as shown in the figure.
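As a worked example, the quantities $R$ and $\delta$ can be computed numerically for a small dataset. This is a sketch under stated assumptions: the data and the separator `a` are made up and chosen by hand, inputs are augmented with the constant $R$, `a` has unit norm, and the bound is taken in the form $2 (R / \delta)^2$.

```python
import numpy as np

# Toy data, made up for illustration.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Radius of the data and augmented inputs (x_i, R).
R = np.max(np.linalg.norm(X, axis=1))
X_hat = np.hstack([X, np.full((len(X), 1), R)])

# Hand-picked separating direction (assumption), scaled to unit norm.
a = np.array([1.0, 1.0, 0.0])
a = a / np.linalg.norm(a)

# Margin: the smallest signed distance of any point from the hyperplane.
delta = np.min(y * (X_hat @ a))

# Upper bound on the number of weight updates.
bound = 2 * (R / delta) ** 2
```

A larger margin $\delta$ or a smaller radius $R$ gives a smaller bound, which matches the intuition that well-separated, compact data is easier to learn.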
Linear Separation with more variables
Let's now get an intuition for why having more features can increase the chance of finding a linear separation in the data.
To do this, let's look at the opposite case.
Take 3 points in a 2D plane:

All of these can be linearly separated by a single line. Also, even if we project them onto a 1D line, they can still be linearly separated.
Take the first one, for example:

So these can all be linearly separated.
Now take, for example, these points and their projection:


Notice how the 3 points in the 2D plane can be linearly separated, while the points projected onto the 1D line cannot.
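The same effect can be checked numerically. The sketch below uses made-up coordinates (the points in the figures are not reproduced here) and a hypothetical helper `separable_1d`: three labeled points are separated in 2D by a hand-picked line, then projected onto each axis.

```python
import numpy as np

# Three hypothetical points: the middle one belongs to the other class.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
y = np.array([1, -1, 1])

# In 2D, the horizontal line x2 = 0.5 separates the classes:
# w @ x + b is positive below the line and negative above it.
w = np.array([0.0, -1.0])
b = 0.5
margins_2d = y * (X @ w + b)
separable_2d = bool(np.all(margins_2d > 0))

def separable_1d(z, labels):
    """1D points are separable iff one class lies entirely on one side."""
    pos, neg = z[labels == 1], z[labels == -1]
    return bool(pos.max() < neg.min() or neg.max() < pos.min())

# Dropping the second feature leaves 0, 1, 2 with labels +, -, +.
separable_on_x1 = separable_1d(X[:, 0], y)
# Dropping the first feature leaves 0, 1, 0 with labels +, -, +.
separable_on_x2 = separable_1d(X[:, 1], y)
```

With these coordinates, the projection onto the first axis places the negative point between the two positive ones, so no threshold separates them, while the projection onto the second axis keeps the classes apart. Which feature is dropped therefore matters.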