Classifier for Probability Estimation - K-NN, Parzen Window and Kernels

Parzen Window: Before training we can β€œfilter” the data using a function, for example using the hypercube function that selects only the data in a specific region, an hypercube, wide , with dimensions (so its volume will be ). We have defined the β€œwindow function” or β€œkernel” as:

The kernel is a kernel having value only within the hypercube centered in and having edge , the number of data in this hypercube () is:

As we have defined before the estimate of the unknown pdf is , then:

NOTE: We have defined as the hypercube function but this form is equivalent for all window functions.


For example one common choice of a kernel is the Gaussian Kernel:

  • : Gaussian pdf.
  • : vector of variables, this is a multivariate gaussian distribution
  • : mean
  • : variance

Sufficient Condition for Asserting that is a PDF :

NOTE: Also, no matter the type of function , we can just take where is an hyperparameter decided arbitrarily, and we can guarantee, as seen in the previous lecture, that the estimated pdf will converge to the actual pdf , as the number of data .


Dirac’s Window Function: Let us define:

Notice how:

  • If is big the function is a very smooth function, having small variation and representing a rectangle centered in high and total area equal to .
  • if is really small, , than is a Dirac’s Delta centered in .

NOTE: This is a useful example to understand how a different kernel can impact the quality of the estimate .


-Nearest Neighbour : Given that we could have some problems:

  • If is too small β‡’ .
  • If is too big β‡’ .

We can solve this problem generalizing the value of , first we want to assert that:

A possible solution to this problem is taking , such that:

Now instead of deciding the Volume we create it this way:

  1. Chose our center
  2. Create an hypersphere of dimension around
  3. Let the sphere expand until it includes data.
  4. Calculate the volume of this hypersphere.

To have more flexibility we can define where is an hyperparameter decided arbitrarily.


Original Files

Where:

  • : number of hypercubes.
  • : hypercube (cube wide with dimension ).
  • : Volume of the hypercube,
  • : window function, or kernel
  • : number of data given, in this case the sum of all the hypercubes’ volumes.
  • : multivariate pdf where is a vector.
  • : -th estimation of the pdf .

NOTE: Gaussian Kernel : : vector of variables, this is a multivariate gaussian distribution : mean : variance

Where:

  • : hyperparameter, width of the hypercubes.
  • : hyperparameter, number of hypercubes we wish to take.

If we change the