Classifier for Probability Estimation - K-NN, Parzen Window and Kernels
Parzen Window: Before training we can βfilterβ the data using a function, for example using the hypercube function that selects only the data in a specific region, an hypercube, wide , with dimensions (so its volume will be ). We have defined the βwindow functionβ or βkernelβ as:
The kernel is a kernel having value only within the hypercube centered in and having edge , the number of data in this hypercube () is:
As we have defined before the estimate of the unknown pdf is , then:
NOTE: We have defined as the hypercube function but this form is equivalent for all window functions.
For example one common choice of a kernel is the Gaussian Kernel:
- : Gaussian pdf.
- : vector of variables, this is a multivariate gaussian distribution
- : mean
- : variance
Sufficient Condition for Asserting that is a PDF :
NOTE: Also, no matter the type of function , we can just take where is an hyperparameter decided arbitrarily, and we can guarantee, as seen in the previous lecture, that the estimated pdf will converge to the actual pdf , as the number of data .
Diracβs Window Function: Let us define:
Notice how:
- If is big the function is a very smooth function, having small variation and representing a rectangle centered in high and total area equal to .
- if is really small, , than is a Diracβs Delta centered in .
NOTE: This is a useful example to understand how a different kernel can impact the quality of the estimate .
-Nearest Neighbour : Given that we could have some problems:
- If is too small β .
- If is too big β .
We can solve this problem generalizing the value of , first we want to assert that:
A possible solution to this problem is taking , such that:
Now instead of deciding the Volume we create it this way:
- Chose our center
- Create an hypersphere of dimension around
- Let the sphere expand until it includes data.
- Calculate the volume of this hypersphere.
To have more flexibility we can define where is an hyperparameter decided arbitrarily.
Original Files

Where:
- : number of hypercubes.
- : hypercube (cube wide with dimension ).
- : Volume of the hypercube,
- : window function, or kernel
- : number of data given, in this case the sum of all the hypercubesβ volumes.
- : multivariate pdf where is a vector.
- : -th estimation of the pdf .

NOTE: Gaussian Kernel : : vector of variables, this is a multivariate gaussian distribution : mean : variance
Where:
- : hyperparameter, width of the hypercubes.
- : hyperparameter, number of hypercubes we wish to take.
If we change the

