Ridge Regression
- The power of regularization
- Ridge regression idea
- Statistical notes
The Ridge Classifier, based on the Ridge regression method, converts the class labels into values in {-1, 1} and then solves the problem as a regression task. The class with the highest predicted value is taken as the target class; for multiclass data, multi-output regression is applied.
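The idea can be sketched in a few lines of plain Python. This is only a minimal illustration, not a library implementation: the function names (`solve`, `ridge_fit`, `ridge_classifier_fit`, `ridge_classifier_predict`) are made up for this example, the bias is handled by appending a constant feature (and is regularized along with the weights, a simplification), and one regression target per class is used even in the binary case.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_fit(X, y, alpha=1.0):
    """Return w minimizing ||Xw - y||^2 + alpha * ||w||^2 (closed form)."""
    d = len(X[0])
    # Build (X^T X + alpha * I) and X^T y
    A = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(A, b)


def ridge_classifier_fit(X, labels, alpha=1.0):
    classes = sorted(set(labels))
    Xb = [list(row) + [1.0] for row in X]  # constant feature acts as the bias
    # One regression per class: target is +1 for that class, -1 otherwise
    W = [ridge_fit(Xb, [1.0 if l == c else -1.0 for l in labels], alpha)
         for c in classes]
    return classes, W


def ridge_classifier_predict(model, x):
    classes, W = model
    xb = list(x) + [1.0]
    scores = [sum(wi * xi for wi, xi in zip(w, xb)) for w in W]
    return classes[scores.index(max(scores))]  # highest regression output wins


# Toy data, invented for the example
X = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
labels = ["a", "a", "a", "b", "b", "b"]
model = ridge_classifier_fit(X, labels, alpha=1.0)
print(ridge_classifier_predict(model, [1.0]), ridge_classifier_predict(model, [9.0]))
```

Increasing `alpha` shrinks the weights more strongly, which is where the regularization enters.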
Neurons

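A neuron applies an activation function $\sigma$ to the weighted sum of its inputs. $\sigma$ can be any function, though typically it returns outputs between 0 and 1.
The most common activation function ($\sigma$) is the sigmoid function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$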
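As a rough sketch, a single neuron with a sigmoid activation, 1 / (1 + e^(-z)), could be written as follows. The names `sigmoid` and `neuron` and the presence of a bias term are assumptions made for this example.

```python
import math


def sigmoid(z):
    """Sigmoid activation: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias, passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Hypothetical weights and inputs, chosen only for illustration
print(neuron([1.0, -1.0], 0.5, [2.0, 3.0]))
```

When the weighted sum is 0 the output is exactly 0.5, the midpoint of the sigmoid.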

Saturation
The sigmoid function, like many other activation functions, is subject to a phenomenon called saturation of the neuron.
This happens when the weights become too large in magnitude (positive or negative): regardless of the inputs, the resulting output will always be effectively 0 or 1, with nothing in between.
The learning algorithm does eventually resolve this problem on its own, but it can take a long time, so it is best to adopt strategies that prevent saturation from happening in the first place.
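A small experiment makes the effect visible. The sketch below assumes a single-input neuron with no bias and compares a small weight with a large one; `saturation_report` is a name invented for this example. It also prints the slope of the sigmoid, s * (1 - s): for gradient-based learning, a slope near zero means tiny weight updates, which is one way to see why recovering from saturation is slow.

```python
import math


def sigmoid(z):
    """Numerically stable sigmoid (avoids overflow for very negative z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def saturation_report(weight, inputs):
    """Return (output, slope) pairs for a one-input neuron with the given weight."""
    report = []
    for x in inputs:
        s = sigmoid(weight * x)
        report.append((s, s * (1.0 - s)))  # slope of the sigmoid at this point
    return report


inputs = [-2.0, -0.5, 0.5, 2.0]  # illustrative inputs
for weight in (0.5, 50.0):
    print("weight =", weight)
    for output, slope in saturation_report(weight, inputs):
        print("  output = %.6f  slope = %.6f" % (output, slope))
```

With the small weight the outputs spread over the middle of the range; with the large weight they collapse to the two extremes and the slope is practically zero.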
Classification:
Suppose we want to separate two sets of points, black from gray. We define linear separability as the property that it is possible to draw a straight line that completely separates the two sets.
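Definition: Given a collection of points $D = \{(x_i, y_i)\}$, where $x_i \in \mathbb{R}^n$ are the parameters and $y_i$ is the class, we say that $D$ is linearly separable if there exist $w \in \mathbb{R}^n$ and $b \in \mathbb{R}$ such that $w \cdot x_i + b > 0$ for every point of one class and $w \cdot x_i + b < 0$ for every point of the other.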

Linear separability of Boolean functions:

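The XOR function is not linearly separable: no single straight line can put the inputs where XOR is 1, (0, 1) and (1, 0), on one side and the inputs where it is 0, (0, 0) and (1, 1), on the other.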
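One way to see the difference in practice is to run the classic perceptron learning rule on two Boolean functions. This is a sketch under assumptions: a threshold unit with two inputs and a bias, a learning rate of 1, and a helper named `train_perceptron` invented for this example. An epoch with no mistakes means the current line separates the data; for XOR such an epoch can never occur, so the function returns `None`.

```python
def train_perceptron(samples, max_epochs=100):
    """samples: list of ((x1, x2), target) with target 0 or 1.

    Returns (w1, w2, b) once a full epoch passes with no mistakes,
    or None if that never happens within max_epochs.
    """
    w1 = w2 = b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), target in samples:
            predicted = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
            error = target - predicted
            if error != 0:
                # Move the separating line toward the misclassified point
                w1 += error * x1
                w2 += error * x2
                b += error
                mistakes += 1
        if mistakes == 0:
            return w1, w2, b
    return None


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [(x, x[0] & x[1]) for x in INPUTS]
XOR = [(x, x[0] ^ x[1]) for x in INPUTS]

print("AND:", train_perceptron(AND))  # some separating line is found
print("XOR:", train_perceptron(XOR))  # no separating line exists
```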
Reference to probability
Remember: when measuring a random variable, the observations can often be expected to follow a Gaussian distribution.
Given this, for a problem like “Determine if a person is male or female given their height”, plotting the data we gather we can expect a graph with two bell-shaped curves:
The purple line represents the distribution of female heights.
The red line represents the distribution of male heights.
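Then the point I’m looking for is exactly the one where the two curves intersect.
Because the two distributions have the same (Gaussian) form, we can find that point by setting the two densities equal:

$$p_F(x) = p_M(x)$$

where $p_F$ and $p_M$ are the densities of female and male heights.
Solving this equation for $x$ gives the intersection point.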
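The intersection can also be found numerically. The sketch below is only illustrative: the means and standard deviations are hypothetical values, not measured data, and `intersection_between_means` is a helper invented for this example. It uses bisection and assumes the first mean is smaller than the second and that each curve is the higher one at its own mean (which always holds when the standard deviations are equal).

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def intersection_between_means(mean_a, std_a, mean_b, std_b, iterations=60):
    """Bisection search for the x between the means where both densities are equal.

    Assumes mean_a < mean_b, curve a higher at mean_a and curve b higher at mean_b.
    """
    lo, hi = mean_a, mean_b
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        diff = gaussian_pdf(mid, mean_a, std_a) - gaussian_pdf(mid, mean_b, std_b)
        if diff > 0:
            lo = mid  # curve a still higher: the crossing is to the right
        else:
            hi = mid  # curve b higher (or equal): the crossing is to the left
    return (lo + hi) / 2.0


# Hypothetical parameters in centimetres, for illustration only
threshold = intersection_between_means(162.0, 7.0, 176.0, 7.0)
print(threshold)
```

When the two standard deviations are equal, the crossing lies at the midpoint of the two means; with different standard deviations it shifts toward the narrower curve.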