Fast Recap:

Recap:

Likelihood : Let $Y = \{\underline{y_{1}}, \dots, \underline{y_{n}}\}$ be a set of data drawn from the distribution $p (\underline{y} ∣ ω_{i})$ , whose unknown parameters are collected in $\underline{Θ}$ . Since all samples come from the same distribution, they are identically distributed; suppose also that they are independent of each other. ⇒ the $\underline{y_{k}}$ are iid (independent and identically distributed).

Due to the independence assumption we can say that:

$$p (Y ∣ \underline{Θ}) = \prod_{k = 1}^{n} p (\underline{y_{k}} ∣ \underline{Θ})$$

This is called the likelihood of $\underline{Θ}$ given $Y$ ; it is a function of $\underline{Θ}$ , since the data $Y$ are fixed.

We call the Maximum Likelihood (ML) estimate $\hat{\underline{Θ}}$ the value of $\underline{Θ}$ that maximizes $p (Y ∣ \underline{Θ})$ .
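A minimal sketch of the likelihood as a product of per-sample densities, assuming a 1-D Gaussian model with fixed variance and made-up toy data (neither comes from the lecture; the names `gaussian_pdf` and `likelihood` are only illustrative). It evaluates $p (Y ∣ Θ)$ on a grid of candidate means and keeps the maximizer:

```python
import math

def gaussian_pdf(y, mu, sigma2):
    """Density of a 1-D Gaussian with mean mu and variance sigma2."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def likelihood(samples, mu, sigma2):
    """p(Y | Theta): product of the per-sample densities (i.i.d. assumption)."""
    result = 1.0
    for y in samples:
        result *= gaussian_pdf(y, mu, sigma2)
    return result

# Hypothetical toy data, not taken from the lecture.
samples = [2.1, 1.9, 2.4, 2.0, 1.6]

# Candidate means on a grid with step 0.01; the variance is fixed at 1.0 (assumption).
candidates = [i / 100.0 for i in range(0, 401)]
best_mu = max(candidates, key=lambda mu: likelihood(samples, mu, 1.0))

sample_mean = sum(samples) / len(samples)
print(best_mu, sample_mean)
```

On this toy set the grid maximizer should agree with the sample mean up to the grid resolution. For large $n$ the raw product can underflow, which is one practical reason to work with the logarithm instead.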

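Log-Likelihood : $l (\underline{Θ}) = \log p (Y ∣ \underline{Θ}) = \sum_{k = 1}^{n} \log p (\underline{y_{k}} ∣ \underline{Θ})$ . Because the logarithm is monotonically increasing, $\hat{\underline{Θ}}$ also maximizes $l$ , so a necessary condition for the ML estimate is:

$$\nabla_{\underline{Θ}} l (\hat{\underline{Θ}}) = 0$$

Where:

$\nabla_{\underline{Θ}} l (\hat{\underline{Θ}})$ is the gradient of the log-likelihood function $l$ with respect to $\underline{Θ}$ , evaluated at $\hat{\underline{Θ}}$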
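As a rough numerical check of this gradient, here is a sketch assuming a 1-D Gaussian model and toy data (both hypothetical, as are the function names). The finite-difference gradient of $l$ should be close to zero at the ML estimate and clearly non-zero away from it:

```python
import math

def log_likelihood(samples, mu, sigma2):
    """l(Theta) = sum_k log p(y_k | Theta) for a 1-D Gaussian, Theta = (mu, sigma2)."""
    n = len(samples)
    squared_error = sum((y - mu) ** 2 for y in samples)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - squared_error / (2.0 * sigma2)

def numerical_gradient(samples, mu, sigma2, eps=1e-6):
    """Central finite-difference approximation of the gradient of l w.r.t. (mu, sigma2)."""
    d_mu = (log_likelihood(samples, mu + eps, sigma2)
            - log_likelihood(samples, mu - eps, sigma2)) / (2.0 * eps)
    d_sigma2 = (log_likelihood(samples, mu, sigma2 + eps)
                - log_likelihood(samples, mu, sigma2 - eps)) / (2.0 * eps)
    return d_mu, d_sigma2

# Hypothetical toy data, not taken from the lecture.
samples = [2.1, 1.9, 2.4, 2.0, 1.6]
n = len(samples)

# ML estimates for the Gaussian case: sample mean and biased sample variance.
mu_hat = sum(samples) / n
sigma2_hat = sum((y - mu_hat) ** 2 for y in samples) / n

grad_at_ml = numerical_gradient(samples, mu_hat, sigma2_hat)
grad_elsewhere = numerical_gradient(samples, mu_hat + 0.5, sigma2_hat)
print(grad_at_ml, grad_elsewhere)
```

A vanishing gradient is only a necessary condition; it does not by itself distinguish a maximum from other stationary points.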

Example : If we know that $p (Y ∣ \underline{Θ})$ is a Gaussian distribution, the maximum likelihood estimates of the mean and variance are, respectively, the sample mean and the biased sample variance.

Sample Mean : $\hat{μ} = \frac{1}{n} \sum_{k = 1}^{n} \underline{y_{k}}$
Biased Sample Variance : $\frac{1}{n} \sum_{k = 1}^{n} (\underline{y_{k}} - \hat{μ})^{2}$

(Bonus) Unbiased Sample Variance : $\frac{1}{n - 1} \sum_{k = 1}^{n} (\underline{y_{k}} - \hat{μ})^{2}$
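The three estimators can be computed directly; the sketch below uses made-up scalar data (an assumption, not lecture data) and compares the hand-written formulas with Python's standard `statistics` module, where `pvariance` divides by $n$ and `variance` divides by $n - 1$:

```python
import statistics

# Hypothetical toy data, not taken from the lecture.
samples = [2.1, 1.9, 2.4, 2.0, 1.6]
n = len(samples)

# Sample mean: (1/n) * sum of the samples.
mu_hat = sum(samples) / n

# Sum of squared deviations from the sample mean, shared by both variances.
squared_deviations = sum((y - mu_hat) ** 2 for y in samples)

biased_variance = squared_deviations / n          # ML estimate, divides by n
unbiased_variance = squared_deviations / (n - 1)  # divides by n - 1

print(mu_hat, biased_variance, unbiased_variance)
print(statistics.pvariance(samples), statistics.variance(samples))
```

The two variances differ only by the factor $\frac{n}{n - 1}$ , so the gap shrinks as the number of samples grows.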

Naming :

$μ, \underline{μ}$ : mean and vector mean
$σ^{2}, Σ$ : variance and covariance matrix
$\underline{Θ}$ : parameter vector, for example: $\underline{Θ} = (\underline{μ}, Σ)$
$ω_{i}$ : the $i$-th class, for example the gender (male/female) we want to identify.
$w_{i}$ : weight
$b_{i}$ : bias
$\underline{x}, \underline{y}$ : data; depending on context, either input data or training data.
$c$ : number of samples used as the training set
$P (ω_{i} ∣ \underline{x})$ : probability that the data $\underline{x}$ belongs to (is identified as) the class $ω_{i}$ .
$g_{i} (\underline{x})$ : discriminant function of class $ω_{i}$ , usually defined as $g_{i} (\underline{x}) = \log p (\underline{x} ∣ ω_{i}) + \log P (ω_{i})$ , or as $g_{i} (\underline{x}) = w_{i}^{t} \underline{x} + b_{i}$ in the linear case
$D (\underline{x}) = ω_{i}$ : decision rule, a simple decision rule could be: $D (\underline{x}) = ω_{i} \text{ iff } g_{i} (\underline{x}) \geq g_{j} (\underline{x}) \ \forall j \neq i$

NOTE: weights are named with $w$ , while classes are named with $ω$ ; the two symbols are easy to confuse.
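The discriminant functions and the decision rule from the naming list can be sketched as follows, assuming two classes with 1-D Gaussian class-conditional densities and equal priors (the class names, means, variances and priors are all hypothetical values chosen for illustration):

```python
import math

def log_gaussian_pdf(x, mu, sigma2):
    """log p(x | omega_i) for a 1-D Gaussian with mean mu and variance sigma2."""
    return -0.5 * math.log(2.0 * math.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2)

# Hypothetical class models: (mean, variance, prior) for each class.
classes = {
    "omega_1": (0.0, 1.0, 0.5),
    "omega_2": (3.0, 1.0, 0.5),
}

def discriminant(x, params):
    """g_i(x) = log p(x | omega_i) + log P(omega_i)."""
    mu, sigma2, prior = params
    return log_gaussian_pdf(x, mu, sigma2) + math.log(prior)

def decide(x):
    """D(x): return the class whose discriminant function is largest at x."""
    return max(classes, key=lambda name: discriminant(x, classes[name]))

print(decide(0.4))
print(decide(2.2))
```

With equal priors and equal variances, the decision boundary of this toy setup lies halfway between the two means, so points below 1.5 go to the first class and points above it go to the second.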

Original Files:


AI - Lecture 5
