Pseudo-Inversion

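Take the formula $(\hat{X}^{T} \hat{X}) \hat{W}^{*} = \hat{X}^{T} Y$: what if $(\hat{X}^{T} \hat{X}) \in R^{d + 1, d + 1}$ is not invertible? Maybe $(\hat{X} \hat{X}^{T}) \in R^{l, l}$ is.

Usually the number of features plus the bias ( $d + 1$ ) is much smaller than the number of examples ( $l$ ), and then $\hat{X}^{T} \hat{X}$ is invertible as long as the columns of $\hat{X}$ are linearly independent. It stops being invertible, for example, when $l < d + 1$ ; in that case $\hat{X} \hat{X}^{T}$ is invertible provided the rows of $\hat{X}$ (the examples) are linearly independent.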

So I can change the formula a little and go back to the linear system we are trying to satisfy:

$$(\hat{X}^{T} \hat{X}) \hat{W}^{*} = \hat{X}^{T} Y \to \hat{X} \hat{W} = Y$$

So (pseudo-inversion):

$$\hat{W}^{*} = \hat{X}^{T} (\hat{X} \hat{X}^{T})^{-1} Y$$

And, checking that this $\hat{W}^{*}$ solves the system:

$$\hat{X} \hat{W}^{*} = \hat{X} \hat{X}^{T} (\hat{X} \hat{X}^{T})^{-1} Y = Y$$
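
As a minimal sketch of the formula above, the following plain-Python snippet computes $\hat{W}^{*} = \hat{X}^{T} (\hat{X} \hat{X}^{T})^{-1} Y$ for a tiny made-up dataset with $l = 2$ examples and $d + 1 = 3$ columns, then checks that $\hat{X} \hat{W}^{*} = Y$. The data values and the helper names (`transpose`, `matmul`, `inverse_2x2`) are assumptions for illustration and do not come from the lecture; the hand-written 2x2 inverse is only enough because $l = 2$ here.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    B_cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols] for row in A]


def inverse_2x2(M):
    """Invert a 2x2 matrix with the closed-form formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular, the pseudo-inverse formula does not apply")
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical data: l = 2 examples, d = 2 features plus a bias column of ones,
# so X_hat has shape (l, d + 1) = (2, 3) and l < d + 1.
X_hat = [[1.0, 2.0, 1.0],
         [3.0, 1.0, 1.0]]
Y = [[1.0],
     [2.0]]

X_hat_T = transpose(X_hat)
gram = matmul(X_hat, X_hat_T)  # X X^T, shape (l, l) = (2, 2)
W_star = matmul(X_hat_T, matmul(inverse_2x2(gram), Y))  # X^T (X X^T)^{-1} Y
Y_pred = matmul(X_hat, W_star)  # should reproduce Y up to rounding

print("W* =", [row[0] for row in W_star])
print("X W* =", [row[0] for row in Y_pred])
```

With a numerical library, one would normally solve the linear system rather than form the inverse explicitly, since that tends to be more stable.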

Gradient Descent

IDEA: Given the gradient $\nabla E$ of the error function, we can compute the next “step” by moving the weights $w$ in the direction opposite to the gradient, so that the update brings us to a smaller $E$ .

REMEMBER: Our objective is to bring $E \to 0$ , or as close to 0 as possible.

ADVANTAGES: Instead of using the “exact” formula to compute the weights, we can use gradient descent. Gradient descent can still find an approximate solution (weights with a small $E$ ) even if a perfect solution with $E = 0$ does not exist.
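
The idea can be sketched in a few lines of plain Python, assuming the update rule $w \leftarrow w - η \nabla E(w)$ , where $η$ is the step size, and assuming the mean squared error $E(w) = \frac{1}{l} \sum_{i} (\hat{x}_i \cdot w - y_i)^2$ as the error function. Both the choice of $E$ and the data below are assumptions made for illustration; these notes do not fix them. The data is deliberately noisy, so no weights reach $E = 0$ , yet the loop still settles on the weights with the smallest error.

```python
def predict(w, x):
    """Linear model: dot product of the weights and one (bias-extended) example."""
    return sum(wj * xj for wj, xj in zip(w, x))


def mse(w, X, Y):
    """Assumed error function E(w): mean squared error over all examples."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(X, Y)) / len(X)


def gradient(w, X, Y):
    """Gradient of the mean squared error with respect to w."""
    l = len(X)
    grad = [0.0] * len(w)
    for x, y in zip(X, Y):
        residual = predict(w, x) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * residual * xj / l
    return grad


def gradient_descent(X, Y, eta, steps):
    """Start from w = 0 and repeat the update w <- w - eta * grad E(w)."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = gradient(w, X, Y)
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical data: one feature plus a bias column of ones, roughly y = 2x + 1
# with small deviations, so no exact solution exists.
X_hat = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
Y = [1.1, 2.9, 5.2, 6.8]

w = gradient_descent(X_hat, Y, eta=0.05, steps=2000)
print("weights:", w)
print("error before:", mse([0.0, 0.0], X_hat, Y))
print("error after: ", mse(w, X_hat, Y))
```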

Updating Parameter $η$ :

$η$ (the learning rate) defines how big a step we take at each update.

A bigger $η$ can solve the problem faster (fewer steps required), but it can also fail to solve it at all: $E \to \infty$ instead of going to 0.
A smaller $η$ gives a more accurate solution, but it takes longer to get there.
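
To make the trade-off concrete, here is a small sketch on a toy error function $E(w) = w^2$ , whose gradient is $2w$ . The function, the starting point and the three values of $η$ are assumptions chosen only for illustration. With these values, the smallest $η$ is still far from the minimum after 50 steps, the middle one gets very close to $E = 0$ , and the largest one overshoots at every step so that $E$ keeps growing.

```python
def run(eta, steps=50, w0=1.0):
    """Gradient descent on the toy error E(w) = w**2, whose gradient is 2*w.

    Returns the error E(w) reached after the given number of steps.
    """
    w = w0
    for _ in range(steps):
        w = w - eta * 2.0 * w  # update rule: w <- w - eta * grad E(w)
    return w * w


# Hypothetical learning rates: small, moderate, and too large.
for eta in (0.01, 0.1, 1.1):
    print(f"eta={eta}: E after 50 steps = {run(eta):.3e}")
```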
