Ridge Regression

  • The power of regularization
  • Ridge regression idea
  • Statistical notes

The Ridge Classifier, based on the Ridge regression method, converts the label data into {-1, 1} and solves the problem as a regression task. The class with the highest predicted value is accepted as the target class, and for multiclass data multi-output regression is applied.
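The idea can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: it assumes a single input feature, an unpenalized intercept, and one regressor per class even in the two-class case. The names `fit_ridge_1d`, `fit_ridge_classifier`, `predict`, and the toy data are hypothetical.

```python
def fit_ridge_1d(xs, ts, alpha=1.0):
    """Ridge regression for one feature with an unpenalized intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    t_mean = sum(ts) / n
    sxy = sum((x - x_mean) * (t - t_mean) for x, t in zip(xs, ts))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + alpha)  # alpha shrinks the slope toward 0
    b = t_mean - w * x_mean
    return w, b


def fit_ridge_classifier(xs, labels, alpha=1.0):
    """Train one ridge regressor per class on targets in {-1, +1}."""
    models = {}
    for c in sorted(set(labels)):
        targets = [1.0 if y == c else -1.0 for y in labels]
        models[c] = fit_ridge_1d(xs, targets, alpha)
    return models


def predict(models, x):
    """Return the class whose regressor gives the highest value."""
    return max(models, key=lambda c: models[c][0] * x + models[c][1])


# Toy data (made up for illustration): class "a" near 0, class "b" near 5.
xs = [0.0, 0.5, 1.0, 4.0, 4.5, 5.0]
labels = ["a", "a", "a", "b", "b", "b"]
models = fit_ridge_classifier(xs, labels, alpha=1.0)
print(predict(models, 0.2), predict(models, 4.8))
```

Increasing `alpha` shrinks every slope toward zero, which is the regularization effect the outline above refers to.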


Neurons

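A neuron computes its output as $y = \sigma\left(\sum_i w_i x_i + b\right)$, where $\sigma$ can be any function, though typically it returns outputs between 0 and 1. The most common activation function ($\sigma$) is the sigmoid function:

$\sigma(z) = \frac{1}{1 + e^{-z}}$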
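A single sigmoid neuron can be sketched as follows. The function names, weights, and inputs are illustrative assumptions. The neuron takes a weighted sum of its inputs plus a bias and passes it through the sigmoid.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Here z = 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0, and sigmoid(0) = 0.5.
out = neuron([1.0, 2.0], [0.5, -0.25], 0.0)
print(out)
```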


Saturation

The sigmoid function, like many other activation functions, is subject to a phenomenon called saturation of the neuron.

This happens when the weights become too large in magnitude (positive or negative): for almost any input, the resulting output $\sigma(w \cdot x + b)$ will be extremely close to 0 or 1, with nothing in between.

The learning algorithm does correct this problem on its own, but because the gradient of a saturated neuron is close to zero this can take a long time, so it is best to adopt strategies that stop saturation from happening in the first place.
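A small numerical sketch shows why recovery is slow. The derivative of the sigmoid is $\sigma(z)(1 - \sigma(z))$, which peaks at 0.25 when $z = 0$ and collapses toward zero as $|z|$ grows, so gradient-based updates to a saturated neuron become tiny. The weights below are arbitrary values chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable sigmoid: never exponentiates a large positive number."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_grad(z):
    """Derivative of the sigmoid, expressed through its own output."""
    s = sigmoid(z)
    return s * (1.0 - s)


x = 1.0  # a fixed input; only the weight changes
for w in (0.5, 5.0, 50.0):
    z = w * x
    print(f"w={w:5.1f}  output={sigmoid(z):.6f}  gradient={sigmoid_grad(z):.2e}")
```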


Classification:

Suppose we want to separate two sets of points, black from gray. We define linear separability as the property that holds when it is possible to draw a straight line that completely separates the two sets.

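Definition: Given a collection of points $S = \{(x_i, y_i)\}$, where $x_i \in \mathbb{R}^n$ are the parameters and $y_i \in \{0, 1\}$ is the class, we say that $S$ is linearly separable if there exist $w \in \mathbb{R}^n$ and $b \in \mathbb{R}$ such that, for every $i$:

$w \cdot x_i + b > 0$ if $y_i = 1$, and $w \cdot x_i + b < 0$ if $y_i = 0$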
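Checking whether a particular candidate line separates a labelled set is straightforward. The sketch below assumes classes are encoded as 0 and 1 and that the line is given by a weight vector `w` and an offset `b`. The function name and sample points are hypothetical.

```python
def separates(points, labels, w, b):
    """True if w.x + b is strictly positive for every class-1 point
    and strictly negative for every class-0 point."""
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y == 1 and score <= 0:
            return False
        if y == 0 and score >= 0:
            return False
    return True


# Made-up points: class 0 near the origin, class 1 further out.
points = [(0.0, 0.0), (1.0, 0.5), (3.0, 3.0), (4.0, 2.5)]
labels = [0, 0, 1, 1]

print(separates(points, labels, w=(1.0, 1.0), b=-4.0))  # the line x1 + x2 = 4
print(separates(points, labels, w=(1.0, 0.0), b=0.0))   # the line x1 = 0
```

The set is linearly separable if at least one such pair `(w, b)` exists. This function only verifies a given candidate and does not search for one.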


Linear separability of Boolean functions:

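The XOR function is not linearly separable: its true points (0,1) and (1,0) lie on one diagonal of the unit square and its false points (0,0) and (1,1) on the other, so no single straight line can leave each pair on its own side.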
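As an informal check, one can search a coarse grid of weights and biases for a separating line for each Boolean function. Finding nothing on a grid is not a proof, only an illustration consistent with the claim. The grid range and the helper name `find_separator` are assumptions made for this sketch.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def find_separator(truth_table, grid):
    """Return some (w1, w2, b) with w1*x1 + w2*x2 + b > 0 exactly on the
    inputs whose output is 1, or None if no grid point works."""
    for w1, w2, b in product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 + b > 0) == bool(out)
               for (x1, x2), out in zip(INPUTS, truth_table)):
            return (w1, w2, b)
    return None


grid = [k / 2 for k in range(-8, 9)]  # -4.0, -3.5, ..., 4.0

print("AND:", find_separator([0, 0, 0, 1], grid))
print("OR: ", find_separator([0, 1, 1, 1], grid))
print("XOR:", find_separator([0, 1, 1, 0], grid))
```

AND and OR each yield a separator, while the search for XOR comes back empty.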


Reference to probability

Remember: when we measure a random variable many times, the distribution of the observations can often be expected to be approximately Gaussian.

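Given this knowledge, for a problem like “Determine if a person is male or female given their height” we can plot the data we gather and expect a graph with two bell curves:

  • The purple line represents the distribution of female heights
  • The red line represents the distribution of male heights

The point I’m looking for is exactly the one where the two curves cross:

$p(x \mid \text{female}) = p(x \mid \text{male})$

Because the two distributions have the same form (both are Gaussian), the point is found by equating the two densities:

$\frac{1}{\sigma_F \sqrt{2\pi}} \, e^{-\frac{(x - \mu_F)^2}{2\sigma_F^2}} = \frac{1}{\sigma_M \sqrt{2\pi}} \, e^{-\frac{(x - \mu_M)^2}{2\sigma_M^2}}$

Here $\mu_F, \sigma_F$ and $\mu_M, \sigma_M$ are the mean and standard deviation of the female and male heights.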

From this formula we find the intersection point.
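The crossing point of two Gaussian curves can be computed by taking logarithms of both densities, which turns the equality into a quadratic in $x$. The sketch below assumes each group is described only by a mean and a standard deviation. The numbers used are made up for illustration and are not real height statistics.

```python
import math


def gaussian_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and std deviation sd."""
    return math.exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))


def intersections(mu1, sd1, mu2, sd2):
    """Points x where the two Gaussian densities are equal.

    Equating the log-densities gives a*x^2 + b*x + c = 0.
    """
    a = 1 / (2 * sd1 ** 2) - 1 / (2 * sd2 ** 2)
    b = mu2 / sd2 ** 2 - mu1 / sd1 ** 2
    c = mu1 ** 2 / (2 * sd1 ** 2) - mu2 ** 2 / (2 * sd2 ** 2) + math.log(sd1 / sd2)
    if abs(a) < 1e-12:  # equal spreads: the equation is linear
        if abs(b) < 1e-12:
            raise ValueError("the two distributions are identical")
        return [-c / b]
    root = math.sqrt(max(b * b - 4 * a * c, 0.0))
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


# Hypothetical parameters in centimetres.
print(intersections(162.0, 7.0, 176.0, 7.0))  # equal spreads: one crossing
print(intersections(162.0, 6.0, 176.0, 8.0))  # unequal spreads: two crossings
```

When both groups have the same standard deviation the curves cross once, midway between the two means. With different standard deviations there are two crossings, and the one lying between the means is the useful decision threshold.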