Validation of Classifiers: Given our labeled data, we divide it into a training set $Y = Y_{1} \cup Y_{2} \cup \dots \cup Y_{c}$, a test set $τ = τ_{1} \cup τ_{2} \cup \dots \cup τ_{c}$ and a validation set $V = V_{1} \cup V_{2} \cup \dots \cup V_{c}$, so that each set contains samples from every one of the $c$ classes. Then:

1. We select a model $p(\underline{x} ∣ \underline{Θ})$.
2. Using the training set $Y$, we estimate the parameters $\hat{\underline{Θ}}$.
3. After defining a cost function, we compute the estimated error $\hat{P}(error)$ on the validation set $V$.
4. If $\hat{P}(error)$ is not good enough, we restart from step 1 (choosing a different model) or from step 2.
5. We perform the final evaluation of the model error $\hat{P}(error)$ on the test set $τ$, which has not been used in any of the previous steps.
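The five steps above can be sketched in Python as follows. This is a minimal illustration under assumptions that are not in the notes: the data is a synthetic two-class set, the hypothetical model family is a nearest-class-mean classifier (so $\hat{\underline{Θ}}$ is one mean vector per class), the cost is 0-1 loss (so $\hat{P}(error)$ is the misclassification rate), and the "different models" tried when restarting are hypothetical feature subsets. The helper `stratified_split` and its 60/20/20 proportions are illustrative choices as well.

```python
import numpy as np


def stratified_split(labels, fractions=(0.6, 0.2), seed=0):
    """Hypothetical helper: split indices class by class into (Y, V, tau).

    `fractions` gives the training and validation shares; the rest of each
    class goes to the test set, so Y = Y_1 ∪ ... ∪ Y_c (same for V and tau).
    """
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_tr = int(fractions[0] * len(idx))
        n_va = int(fractions[1] * len(idx))
        train.extend(idx[:n_tr])
        val.extend(idx[n_tr:n_tr + n_va])
        test.extend(idx[n_tr + n_va:])
    return np.array(train), np.array(val), np.array(test)


def fit_nearest_mean(X, labels):
    """Assumed model: Theta-hat is the sample mean of each class."""
    classes = np.unique(labels)
    means = np.array([X[labels == c].mean(axis=0) for c in classes])
    return classes, means


def error_rate(theta, X, labels):
    """0-1 cost: fraction of samples assigned to the wrong class."""
    classes, means = theta
    dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[dist.argmin(axis=1)] != labels))


# Synthetic two-class data (an assumption, for illustration only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(4.0, 1.0, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)
tr, va, te = stratified_split(labels)

best = None
for feats in ([0], [1], [0, 1]):                            # step 1: candidate models
    theta = fit_nearest_mean(X[tr][:, feats], labels[tr])   # step 2: fit on Y
    v_err = error_rate(theta, X[va][:, feats], labels[va])  # step 3: error on V
    if best is None or v_err < best[0]:                     # step 4: keep the best so far
        best = (v_err, feats, theta)

v_err, feats, theta = best
test_err = error_rate(theta, X[te][:, feats], labels[te])   # step 5: final error on tau
print(feats, v_err, test_err)
```

The test set is touched exactly once, after model selection is finished; using it inside the loop would turn it into a second validation set and bias the final estimate.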

“Leave One Out” Method: used when the data sample is small and it is difficult or expensive to collect more data.

Let $Y = \{y_{1}, y_{2}, \dots, y_{n}\}$. For each $i = 1, \dots, n$:

1. set $\underline{x} = y_{i}$ and $Y' = Y \setminus \{y_{i}\}$;
2. use $Y'$ to estimate $\hat{\underline{Θ}}$;
3. compute and store $\hat{P}(error ∣ \underline{x})$ using the model with parameters $\hat{\underline{Θ}}$.

The overall error estimate is the average over all $n$ held-out samples:
$\hat{P}(error) = E[\hat{P}(error ∣ \underline{x})] = \frac{1}{n} \sum_{i = 1}^{n} \hat{P}(error ∣ y_{i})$
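A minimal Python sketch of the loop above, assuming a hypothetical nearest-class-mean model (so $\hat{\underline{Θ}}$ is one mean per class) and a 0-1 cost, so that $\hat{P}(error ∣ y_{i})$ is 1 if the held-out sample is misclassified and 0 otherwise. The tiny one-dimensional dataset is made up for illustration.

```python
import numpy as np


def fit_nearest_mean(X, labels):
    """Assumed model: Theta-hat is the sample mean of each class."""
    classes = np.unique(labels)
    return classes, np.array([X[labels == c].mean(axis=0) for c in classes])


def predict(theta, X):
    """Assign each row of X to the class with the closest mean."""
    classes, means = theta
    dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


def leave_one_out_error(X, labels):
    n = len(labels)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                         # Y' = Y \ {y_i}
        theta = fit_nearest_mean(X[keep], labels[keep])  # estimate Theta-hat on Y'
        # 0-1 cost: 1 if the held-out x = y_i is misclassified, else 0
        errors[i] = float(predict(theta, X[i:i + 1])[0] != labels[i])
    return errors.mean()                                 # (1/n) * sum_i P̂(error | y_i)


X = np.array([0.0, 1.0, 2.0, 9.0, 10.0, 11.0, 12.0]).reshape(-1, 1)
labels = np.array([0, 0, 0, 0, 1, 1, 1])
loo = leave_one_out_error(X, labels)
print(loo)  # 1/7: only the class-0 point at 9.0 is misclassified when held out
```

Note that the model is refitted $n$ times, once per sample.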

The error is always evaluated on “new” data, i.e. data that is not in the training set and that the model has not seen yet.
In the end, all data is used both for training and for testing, so no data is “wasted”.
It should only be used on small datasets: the model has to be trained $n$ times, so the method scales poorly.

“Many-Fold Cross-Validation” Method: an alternative to the standard $Y, V, τ$ split. It is based on the same idea as the “Leave One Out” method, but it scales much better, because the model is trained only $k < n$ times.

Let $Y = \{y_{1}, y_{2}, \dots, y_{n}\}$. For each $i = 1, \dots, k$, with $k < n$:

1. create a test set $τ_{i}$ from a fraction of the data not yet used in a previous test set (typically about $n/k$ samples, so that after $k$ iterations every sample has been tested exactly once);
2. use $Y' = Y \setminus τ_{i}$ to estimate $\hat{\underline{Θ}}$;
3. compute and store $\hat{P}(error ∣ τ_{i})$ on $τ_{i}$, using the newly found parameters $\hat{\underline{Θ}}$.

The overall error estimate is the average over the $k$ test sets:
$\hat{P}(error) = E[\hat{P}(error ∣ τ_{i})] = \frac{1}{k} \sum_{i = 1}^{k} \hat{P}(error ∣ τ_{i})$
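The same procedure with $k$ disjoint test sets can be sketched as below. It relies on the same assumptions (hypothetical nearest-class-mean model, 0-1 cost); the folds are built by shuffling the indices and cutting them into $k$ nearly equal parts with `np.array_split`, which is one possible way to pick "still unused" data for each $τ_{i}$.

```python
import numpy as np


def fit_nearest_mean(X, labels):
    """Assumed model: Theta-hat is the sample mean of each class."""
    classes = np.unique(labels)
    return classes, np.array([X[labels == c].mean(axis=0) for c in classes])


def predict(theta, X):
    """Assign each row of X to the class with the closest mean."""
    classes, means = theta
    dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


def k_fold_error(X, labels, k, seed=0):
    n = len(labels)
    rng = np.random.default_rng(seed)
    # Disjoint test sets tau_1..tau_k covering all n samples, ~n/k each.
    folds = np.array_split(rng.permutation(n), k)
    fold_errors = []
    for tau in folds:
        keep = np.ones(n, dtype=bool)
        keep[tau] = False                                 # Y' = Y \ tau_i
        theta = fit_nearest_mean(X[keep], labels[keep])   # estimate Theta-hat on Y'
        fold_errors.append(np.mean(predict(theta, X[tau]) != labels[tau]))  # P̂(error | tau_i)
    return float(np.mean(fold_errors))                    # (1/k) * sum_i P̂(error | tau_i)


X = np.array([0.0, 1.0, 2.0, 9.0, 10.0, 11.0, 12.0]).reshape(-1, 1)
labels = np.array([0, 0, 0, 0, 1, 1, 1])
print(k_fold_error(X, labels, k=3))  # only 3 model fits instead of 7
print(k_fold_error(X, labels, k=7))  # k = n: identical to leave-one-out, 1/7
```

With $k = n$ each $τ_{i}$ holds a single sample and the procedure reduces to "Leave One Out"; choosing $k < n$ keeps the idea while needing only $k$ model fits.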

Original Files:

Referring to the second “Note”:

$P(ω_{i} ∣ \underline{x})$: true probability that $\underline{x}$, the sample we want to classify, belongs to class $ω_{i}$ (e.g. the true probabilities that $\underline{x}$ is male or female are $48\%$ and $52\%$).
$\hat{P}(ω_{i} ∣ \underline{x})$: estimate of $P(ω_{i} ∣ \underline{x})$ (e.g. we estimate $50\%$ and $50\%$, although these are not the true values).
$\sum_{j \neq i} \hat{P}(ω_{j} ∣ \underline{x})$: estimated probability of error when $\underline{x}$ is assigned to class $ω_{i}$; since the posteriors sum to one, it equals $1 - \hat{P}(ω_{i} ∣ \underline{x})$.
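A small numeric check of the last definition, using the male/female figures from the example above and treating them as the posteriors of a single $\underline{x}$ (an interpretive assumption). The helper name `error_if_assigned` is hypothetical.

```python
import numpy as np


def error_if_assigned(posteriors, i):
    """Sum of P(w_j | x) over all classes j != i."""
    return float(np.delete(posteriors, i).sum())


# Index 0 = male, index 1 = female (figures from the example above).
p_true = np.array([0.48, 0.52])  # P(w_i | x)
p_hat = np.array([0.50, 0.50])   # P̂(w_i | x)

for i, name in enumerate(["male", "female"]):
    est_err = error_if_assigned(p_hat, i)
    true_err = error_if_assigned(p_true, i)
    # With posteriors summing to one, the sum over j != i equals 1 - P(w_i | x).
    print(f"assign to {name}: estimated error {est_err:.2f}, true error {true_err:.2f}")
```

Assigning $\underline{x}$ to "male" gives an estimated error of $0.50$, while the true error would be $0.52$: the gap comes entirely from $\hat{P}(ω_{i} ∣ \underline{x})$ differing from $P(ω_{i} ∣ \underline{x})$.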
