Fast Recap:

Recap:

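Can we use an MLP to estimate the class-conditional PDFs? When an MLP is trained as a classifier (one output unit per class, 1-of-$c$ target vectors), its outputs can be seen as estimates of the class posteriors: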

$$y_{i}(\underline{x}) \simeq P(ω_{i} \mid \underline{x}) = \frac{p(\underline{x} \mid ω_{i})\, P(ω_{i})}{p(\underline{x})}$$

Dividing both sides by $P(ω_{i})$, we can write:

$$\frac{y_{i}(\underline{x})}{P(ω_{i})} \simeq \frac{p(\underline{x} \mid ω_{i})}{p(\underline{x})}$$

This ratio is known as the scaled likelihood.
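A minimal sketch of this computation, assuming the MLP outputs for a single input and the class priors (e.g. estimated as class frequencies in the training set) are given as plain Python lists; `scaled_likelihoods` is a hypothetical helper name and the numbers are made up:

```python
def scaled_likelihoods(mlp_outputs, priors):
    """Return y_i(x) / P(w_i), an estimate of p(x | w_i) / p(x), for every class i."""
    if len(mlp_outputs) != len(priors):
        raise ValueError("need exactly one prior per output unit")
    return [y / p for y, p in zip(mlp_outputs, priors)]


# Hypothetical 3-class example: MLP outputs for one input x and the class priors.
y = [0.6, 0.3, 0.1]
priors = [0.5, 0.25, 0.25]
print(scaled_likelihoods(y, priors))  # [1.2, 1.2, 0.4]
```

Classes 1 and 2 end up with the same scaled likelihood even though their posteriors differ: the lower posterior of class 2 is entirely explained by its smaller prior.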

$p(\underline{x})$ is unknown, but it can be estimated.
Moreover, estimates of $p(\underline{x})$ are more robust than estimates of $p(\underline{x} ∣ ω_{i})$: we need to estimate a single PDF instead of $c$ of them ($c$ being the number of classes $ω_{1}, \dots, ω_{c}$), and, by the same logic, the whole training set is available for estimating $p(\underline{x})$, i.e. roughly $c$ times more data than for each class-conditional PDF (with balanced classes).
Finally, if the priors $P(ω_{i})$ change over time (say they take new values $P'(ω_{i})$), we can reuse the same MLP without any retraining:

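$$P'(ω_{i} \mid \underline{x}) = \frac{p(\underline{x} \mid ω_{i})\, P'(ω_{i})}{p'(\underline{x})} \simeq \frac{\dfrac{y_{i}(\underline{x})}{P(ω_{i})}\, P'(ω_{i})}{\sum_{j=1}^{c} \dfrac{y_{j}(\underline{x})}{P(ω_{j})}\, P'(ω_{j})}$$

The denominator renormalizes the result, because the evidence also changes with the priors: $p'(\underline{x}) = \sum_{j=1}^{c} p(\underline{x} ∣ ω_{j})\, P'(ω_{j})$ (dividing numerator and denominator by $p(\underline{x})$ turns every term into a scaled likelihood).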
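A minimal sketch of the prior adaptation, assuming the outputs and both sets of priors are plain lists; `adapt_to_new_priors` is a hypothetical name and the values are illustrative only:

```python
def adapt_to_new_priors(mlp_outputs, old_priors, new_priors):
    """Reuse a trained MLP under new class priors P'(w_i), without retraining.

    Each output becomes a scaled likelihood y_i / P(w_i), is multiplied by the
    new prior P'(w_i), and the results are renormalized to sum to 1
    (the renormalization plays the role of the new evidence p'(x)).
    """
    unnormalized = [y / p_old * p_new
                    for y, p_old, p_new in zip(mlp_outputs, old_priors, new_priors)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


y = [0.6, 0.3, 0.1]             # outputs of the (hypothetical) trained MLP
old_priors = [0.5, 0.25, 0.25]  # priors of the training set
new_priors = [0.2, 0.4, 0.4]    # priors at deployment time
print(adapt_to_new_priors(y, old_priors, new_priors))  # ≈ [0.273, 0.545, 0.182]
```

If the new priors equal the old ones, the function simply returns the original outputs (renormalized), as expected.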

RBF (Radial Basis Function) Networks: A generalized linear discriminant

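All weights between the input layer and the (first) hidden layer are fixed to $1$: every hidden unit receives the input $\underline{x}$ unchanged.
There may also be bias terms $b_{i}$ on the output units.
The RB function (radial basis function), or kernel, of the $k$-th hidden unit is defined as: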

$$φ_{k}(\underline{x}) = \exp\left(-\frac{\lVert \underline{x} - \underline{μ}_{k} \rVert^{2}}{2 σ_{k}^{2}}\right)$$
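The kernel depends on $\underline{x}$ only through its (squared Euclidean) distance from the center $\underline{μ}_{k}$, hence "radial". A direct transcription in plain Python, with vectors as lists; the function name is illustrative:

```python
import math


def gaussian_kernel(x, mu, sigma):
    """phi_k(x) = exp(-||x - mu_k||^2 / (2 sigma_k^2)); x and mu are equal-length lists."""
    squared_distance = sum((x_d - mu_d) ** 2 for x_d, mu_d in zip(x, mu))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


print(gaussian_kernel([1.0, 2.0], [1.0, 2.0], 0.5))  # 1.0 at the center
print(gaussian_kernel([0.0, 0.0], [1.0, 1.0], 1.0))  # exp(-1) ≈ 0.368
```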

A simple RBF network with a single hidden layer of $K$ kernels has this form:

$$y_{i}(\underline{x}) = \sum_{j=1}^{K} w_{ij}\, φ_{j}(\underline{x}) + b_{i}$$
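A minimal sketch of the forward pass, assuming Gaussian kernels, list-of-lists weights `W[i][j]` and a list of biases `b`; all names and values are hypothetical:

```python
import math


def gaussian_kernel(x, mu, sigma):
    squared_distance = sum((x_d - mu_d) ** 2 for x_d, mu_d in zip(x, mu))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def rbf_forward(x, centers, sigmas, W, b):
    """y_i(x) = sum_j W[i][j] * phi_j(x) + b[i].

    The input x reaches every hidden unit unchanged (input-to-hidden weights
    are all 1); only the output layer has weights W and biases b.
    """
    phi = [gaussian_kernel(x, mu, s) for mu, s in zip(centers, sigmas)]
    return [sum(w_ij * phi_j for w_ij, phi_j in zip(row, phi)) + b_i
            for row, b_i in zip(W, b)]


# Hypothetical network: 1-D input, K = 2 kernels, 1 output.
centers = [[0.0], [1.0]]
sigmas = [1.0, 1.0]
W = [[1.0, -1.0]]
b = [0.5]
print(rbf_forward([0.5], centers, sigmas, W, b))  # [0.5]: the two kernels cancel at x = 0.5
```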

With Gaussian kernels, an RBF network realizes a mixture of Gaussians, hence it is particularly suitable for PDF estimation.
Like MLPs, RBF Networks are “universal” approximators.
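To make the mixture interpretation concrete: if each kernel includes the Gaussian normalization constant and the output weights are non-negative and sum to 1, the network output integrates to 1. A quick numerical check in 1-D, with made-up parameters; the trapezoidal rule over a wide interval stands in for the integral over the whole real line:

```python
import math


def normal_pdf(x, mu, sigma):
    """Gaussian kernel including its normalization constant."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def mixture_pdf(x, weights, mus, sigmas):
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))


def trapezoid(f, lo, hi, n=4000):
    """Composite trapezoidal rule with n intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n))
    return total * h


weights = [0.3, 0.7]  # non-negative, sum to 1
mus = [-1.0, 2.0]
sigmas = [0.5, 1.0]
area = trapezoid(lambda x: mixture_pdf(x, weights, mus, sigmas), -20.0, 20.0)
print(round(area, 6))  # 1.0
```

Scaling the weights so that they sum to 2 would double the area, which is why the weight constraint matters when the network is used as a PDF.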

Learning is supervised: over the training set $τ$, we minimize the squared error between the network outputs and the target outputs:

$$C(τ, w) = \frac{1}{2} \sum_{i} \left(\hat{y}_{i} - y_{i}\right)^{2}$$
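A minimal sketch of minimizing this cost by plain batch gradient descent over all parameters (the first of the approaches listed below), in pure Python. Assumptions: Gaussian kernels, a hypothetical toy dataset ($t = \sin x$), $t_i$ denoting the target of output $i$, and hand-derived gradients $\partial C/\partial w_{ij} = (y_i - t_i)\,φ_j$, $\partial C/\partial b_i = y_i - t_i$, $\partial C/\partial \underline{μ}_k = g_k (\underline{x} - \underline{μ}_k)/σ_k^2$ and $\partial C/\partial σ_k = g_k \lVert\underline{x} - \underline{μ}_k\rVert^2/σ_k^3$, with $g_k = φ_k \sum_i (y_i - t_i)\, w_{ik}$:

```python
import math


def forward(x, centers, sigmas, W, b):
    """Return outputs y, kernel activations phi and squared distances r2."""
    r2 = [sum((x_d - m_d) ** 2 for x_d, m_d in zip(x, mu)) for mu in centers]
    phi = [math.exp(-d / (2.0 * s ** 2)) for d, s in zip(r2, sigmas)]
    y = [sum(w * p for w, p in zip(row, phi)) + b_i for row, b_i in zip(W, b)]
    return y, phi, r2


def cost(data, centers, sigmas, W, b):
    """Sum-of-squares error, summed over all (input, target) pairs in data."""
    total = 0.0
    for x, target in data:
        y, _, _ = forward(x, centers, sigmas, W, b)
        total += 0.5 * sum((t - y_i) ** 2 for t, y_i in zip(target, y))
    return total


def gradient_step(data, centers, sigmas, W, b, lr):
    """One batch gradient-descent step on W, b, centers and sigmas (updated in place)."""
    K, n_out = len(centers), len(W)
    g_W = [[0.0] * K for _ in range(n_out)]
    g_b = [0.0] * n_out
    g_mu = [[0.0] * len(mu) for mu in centers]
    g_s = [0.0] * K
    for x, target in data:
        y, phi, r2 = forward(x, centers, sigmas, W, b)
        err = [y_i - t for y_i, t in zip(y, target)]  # dC/dy_i for this pattern
        for i in range(n_out):
            g_b[i] += err[i]
            for k in range(K):
                g_W[i][k] += err[i] * phi[k]
        for k in range(K):
            # dC/dphi_k times phi_k; the chain rule through exp() gives the rest
            g = sum(err[i] * W[i][k] for i in range(n_out)) * phi[k]
            for d in range(len(x)):
                g_mu[k][d] += g * (x[d] - centers[k][d]) / sigmas[k] ** 2
            g_s[k] += g * r2[k] / sigmas[k] ** 3
    # All gradients were computed with the old parameters; now apply them.
    for i in range(n_out):
        b[i] -= lr * g_b[i]
        for k in range(K):
            W[i][k] -= lr * g_W[i][k]
    for k in range(K):
        sigmas[k] -= lr * g_s[k]
        for d in range(len(centers[k])):
            centers[k][d] -= lr * g_mu[k][d]


# Hypothetical toy problem: fit t = sin(x) on [0, 6] with K = 3 kernels.
data = [([0.5 * n], [math.sin(0.5 * n)]) for n in range(13)]
centers = [[1.0], [3.0], [5.0]]
sigmas = [1.0, 1.0, 1.0]
W = [[0.0, 0.0, 0.0]]
b = [0.0]
before = cost(data, centers, sigmas, W, b)
for _ in range(200):
    gradient_step(data, centers, sigmas, W, b, lr=0.01)
after = cost(data, centers, sigmas, W, b)
print(f"cost: {before:.3f} -> {after:.3f}")
```

Gradients are accumulated over the whole training set before any parameter changes, so each step is a true batch step; a small learning rate keeps the widths $σ_k$ positive in this toy setting.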

Two approaches are commonly used:

1. Learn all the parameters $w_{ij}$, $b_{i}$, $\underline{μ}_{k}$ and $σ_{k}$ by gradient descent on $C(τ, w)$.
2. First estimate $\underline{μ}_{k}$ and $σ_{k}$ statistically from the training inputs (e.g. by clustering), then estimate the remaining parameters $w_{ij}$ and $b_{i}$ either with linear algebra methods (such as matrix inversion: once the kernels are fixed, the outputs are linear in $w_{ij}$ and $b_{i}$) or by gradient descent as in approach 1.
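A minimal sketch of the second approach for 1-D inputs and a single output, under several assumptions: the centers come from plain k-means (one possible "statistical" estimate), all kernels share one width set by the common heuristic $σ = d_{max}/\sqrt{2K}$ ($d_{max}$ being the largest distance between centers), and the linear step solves the normal equations with a tiny ridge term added for numerical stability. All function names and data are hypothetical:

```python
import math


def kmeans_1d(xs, K, iters=20):
    """Lloyd's k-means on scalar inputs, with a deterministic initialization."""
    s = sorted(xs)
    centers = [s[(2 * k + 1) * len(s) // (2 * K)] for k in range(K)]
    for _ in range(iters):
        groups = [[] for _ in range(K)]
        for x in xs:
            groups[min(range(K), key=lambda k: abs(x - centers[k]))].append(x)
        # keep the old center if a cluster happens to be empty
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def solve(A, c):
    """Solve A a = c by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [c_i] for row, c_i in zip(A, c)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        a[r] = (M[r][n] - sum(M[r][j] * a[j] for j in range(r + 1, n))) / M[r][r]
    return a


def fit_rbf(xs, ts, K, ridge=1e-6):
    """Two-stage training: kernels from statistics, then linear least squares."""
    centers = kmeans_1d(xs, K)
    sigma = (max(centers) - min(centers)) / math.sqrt(2 * K)  # shared width heuristic

    def features(x):
        # kernel activations plus a constant 1 for the bias b
        return [math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) for m in centers] + [1.0]

    Phi = [features(x) for x in xs]
    n = K + 1
    # normal equations (Phi^T Phi + ridge * I) a = Phi^T t
    A = [[sum(row[i] * row[j] for row in Phi) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    c = [sum(row[i] * t for row, t in zip(Phi, ts)) for i in range(n)]
    a = solve(A, c)  # a[:K] are the weights w_j, a[K] is the bias b

    def predict(x):
        return sum(w * f for w, f in zip(a, features(x)))

    return predict


xs = [2 * math.pi * n / 19 for n in range(20)]
ts = [math.sin(x) for x in xs]
predict = fit_rbf(xs, ts, K=6)
mse = sum((predict(x) - t) ** 2 for x, t in zip(xs, ts)) / len(xs)
print(f"training MSE: {mse:.2e}")
```

Once the kernels are frozen, the weights come from a single linear solve instead of many gradient steps, which is the main appeal of this two-stage scheme.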

NOTE: With RBF networks we can also estimate PDFs directly, by gradient ascent on the likelihood of the training data (the ML, Maximum Likelihood, criterion).

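The ML method only yields a valid PDF if the weights between the hidden layer and the output layer are non-negative and sum up to $1$, with each kernel normalized (i.e. including the Gaussian normalization constant, so that it integrates to $1$); the output is then a proper Gaussian mixture.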
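A minimal sketch of ML gradient ascent on the hidden-to-output weights of a 1-D RBF network with fixed, normalized kernels. The softmax reparametrization $π_k = e^{a_k} / \sum_j e^{a_j}$ is an assumption of this sketch: it is one simple way to keep the weights non-negative and summing to 1 while doing unconstrained gradient ascent on the $a_k$. The data are synthetic and all names are hypothetical:

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Normalized 1-D Gaussian kernel (integrates to 1)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def softmax(a):
    m = max(a)
    e = [math.exp(v - m) for v in a]
    s = sum(e)
    return [v / s for v in e]


def log_likelihood(xs, pis, mus, sigmas):
    return sum(math.log(sum(p * normal_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas)))
               for x in xs)


def ml_fit_weights(xs, mus, sigmas, lr=0.1, steps=200):
    """Gradient ascent on the log-likelihood w.r.t. the output weights only.

    Weights are parametrized as pi = softmax(a), so they stay non-negative and
    sum to 1 at every step; the kernels (mus, sigmas) are kept fixed.
    """
    a = [0.0] * len(mus)
    for _ in range(steps):
        pis = softmax(a)
        grad = [0.0] * len(a)
        for x in xs:
            comps = [p * normal_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas)]
            px = sum(comps)
            for k in range(len(a)):
                # d log p(x) / d a_k = responsibility of kernel k minus pi_k
                grad[k] += comps[k] / px - pis[k]
        a = [a_k + lr * g / len(xs) for a_k, g in zip(a, grad)]
    return softmax(a)


# Hypothetical data: 40 samples around -2 and 160 around +2.
rng = random.Random(0)
xs = [rng.gauss(-2.0, 1.0) for _ in range(40)] + [rng.gauss(2.0, 1.0) for _ in range(160)]
mus, sigmas = [-2.0, 2.0], [1.0, 1.0]

ll_before = log_likelihood(xs, [0.5, 0.5], mus, sigmas)
pis = ml_fit_weights(xs, mus, sigmas)
ll_after = log_likelihood(xs, pis, mus, sigmas)
print(pis, ll_before, ll_after)  # weights should move towards roughly [0.2, 0.8]
```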

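This cannot be done with MLPs: they realize mixtures of activation functions (e.g. sigmoids) that are not PDFs themselves, so the constraint $\int p(\underline{x})\, d\underline{x} = 1$ is violated, and without it the likelihood could be increased without bound simply by scaling the outputs up.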
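A quick numerical illustration, using the logistic function $s(x) = 1/(1 + e^{-x})$ as an example activation: its output never decays to 0, so its area over $[-L, L]$ grows with $L$ (by the symmetry $s(x) + s(-x) = 1$ it is exactly $L$) and cannot be normalized over the whole real line. A minimal sketch with illustrative helper names:

```python
import math


def sigmoid(x):
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def trapezoid(f, lo, hi, n=2000):
    """Composite trapezoidal rule with n intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n))
    return total * h


for L in (10.0, 100.0, 1000.0):
    # the area equals L up to rounding, so it diverges as L grows
    print(L, trapezoid(sigmoid, -L, L))
```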
