Summary

The prediction error $ε$ at time $t$ depending on the parametric vector $θ$ is equal to:

$$ε(t, θ) = y(t) - \hat{y}(t ∣ t - 1, θ)$$

It can be written in two ways:

$$
ε(t, θ) =
\begin{cases}
y(t) - φ^{T}(t) \cdot θ \\
y(t) - φ^{T}(t, θ) \cdot θ
\end{cases}
$$

The first form applies to ARX models, where the regressor $φ$ does not depend on the parameter vector $θ$. The second applies to ARMAX, OE and BJ models, where $φ(t, θ)$ depends on $θ$ and is called the pseudo-regressor.

Choosing the Best Model

→ We have to predict the stochastic term.

To predict the stochastic term we use the data set $Z$ up to time $t - 1$; then, to remove the stochastic term from $y(t)$, we consider its mean.

We may write the cost function $J$, which depends on $ε(t, θ)$, as a function $V_{N}$ of $θ$ and $Z^{N}$; after all, $ε$ is calculated from those two arguments. We then define the optimal parameter vector $\hat{θ}^{*}$ as the one that minimizes the cost function $J$, or equivalently $V_{N}$.
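As a minimal sketch of this idea, the snippet below evaluates $V_{N}$ for a first-order ARX model. The quadratic form of the cost (mean of squared prediction errors), the model order, the toy data and all function names are assumptions made for illustration; the notes only state that the cost depends on $ε(t, θ)$.

```python
def arx1_prediction_errors(theta, y, u):
    """eps(t, theta) = y(t) - phi(t)^T theta for an illustrative first-order ARX model.

    theta = (a, b) and phi(t) = (-y(t-1), u(t-1)).
    """
    a, b = theta
    return [y[t] - (-a * y[t - 1] + b * u[t - 1]) for t in range(1, len(y))]


def cost(theta, y, u):
    """V_N(theta, Z^N): mean of squared prediction errors (quadratic cost assumed)."""
    eps = arx1_prediction_errors(theta, y, u)
    return sum(e * e for e in eps) / len(eps)


# Noise-free toy data set Z^N generated by y(t) = -a0*y(t-1) + b0*u(t-1).
a0, b0 = -0.5, 2.0
u = [1.0, -1.0, 0.5, 0.0, 2.0, -0.5, 1.5, -2.0]
y = [0.0]
for t in range(1, len(u)):
    y.append(-a0 * y[t - 1] + b0 * u[t - 1])

v_true = cost((a0, b0), y, u)     # cost at the parameters that generated the data
v_wrong = cost((0.3, 1.0), y, u)  # cost at some other parameter vector
```

On this noise-free data the generating parameters give the smallest possible cost, which is what the definition of $\hat{θ}^{*}$ asks for.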

How can I predict the output?

First we assume that $y (t)$ is given by the following function:
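A standard form consistent with the symbols used in this section is sketched below. It assumes that $G(z)$ is the transfer function from the input $u(t)$ to the output, and that $H(z)$ is a monic noise filter (leading coefficient equal to 1).

```latex
y(t) = G(z)\,u(t) + v(t), \qquad v(t) = H(z)\,e(t)
```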

You can think of $v(t)$ as the filtered noise, while $e(t)$ is a generic stochastic process.

→ Inversely stable: all zeros and poles of $H(z)$ lie inside the unit circle.

So: Where you have to remember that $e (t)$ is unpredictable

Then, since $e(t)$ is unpredictable, our best bet is to predict $\hat{v}$ as:
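Assuming $v(t) = H(z)\,e(t)$ with $H(z)$ monic and inversely stable, the usual one-step-ahead prediction of the filtered noise can be sketched as follows. The second equality uses $e(t) = H^{-1}(z)\,v(t)$.

```latex
\hat{v}(t \mid t-1) = [H(z) - 1]\,e(t) = [1 - H^{-1}(z)]\,v(t)
```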

Now we can calculate the prediction $\hat{y}(t ∣ t - 1)$:
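A sketch of the standard derivation, assuming the model $y(t) = G(z)\,u(t) + v(t)$ and the noise prediction $\hat{v}(t ∣ t-1) = [1 - H^{-1}(z)]\,v(t)$, substitutes $v(t) = y(t) - G(z)\,u(t)$:

```latex
\begin{aligned}
\hat{y}(t \mid t-1) &= G(z)\,u(t) + \hat{v}(t \mid t-1) \\
&= G(z)\,u(t) + [1 - H^{-1}(z)]\,[y(t) - G(z)\,u(t)] \\
&= H^{-1}(z)\,G(z)\,u(t) + [1 - H^{-1}(z)]\,y(t)
\end{aligned}
```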

$[1 - H^{-1}(z)]$ has no constant term: its lowest-order term is $z^{-1}$ (or a longer delay, $z^{-2}$, $\dots$). So $\hat{y}(t ∣ t - 1)$ depends only on past values of $y(t)$.
The same also holds for $H^{-1}(z) G(z)$.

The error $ε (t)$ will be
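Taking the predictor in its standard form $\hat{y}(t ∣ t-1) = H^{-1}(z)\,G(z)\,u(t) + [1 - H^{-1}(z)]\,y(t)$ (an assumption consistent with the terms discussed above), the prediction error reduces to:

```latex
ε(t) = y(t) - \hat{y}(t \mid t-1) = H^{-1}(z)\,[y(t) - G(z)\,u(t)]
```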

It depends on both $t$ and $θ$.

→ We will choose $θ$ such that we can minimize the previous cost function.

ARX Model (Error Function)

→ See [ARX Model]

Remember that this is the structure of the ARX model:

So we have that:

And $θ$ is of the form: (already seen)

We can define the regressor vector as:

Such that: → Nothing has changed from the explicit formula; it is just much more compact.
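For reference, the ARX quantities discussed in this section can be sketched in their usual form. The polynomial names $A(z) = 1 + a_1 z^{-1} + \dots + a_{n_a} z^{-n_a}$ and $B(z) = b_1 z^{-1} + \dots + b_{n_b} z^{-n_b}$, and the coefficient names $a_i$, $b_i$, are assumed notation.

```latex
\begin{aligned}
A(z)\,y(t) &= B(z)\,u(t) + e(t) \\
\hat{y}(t \mid t-1, θ) &= -a_1 y(t-1) - \dots - a_{n_a} y(t-n_a) + b_1 u(t-1) + \dots + b_{n_b} u(t-n_b) \\
θ &= [a_1 \ \dots \ a_{n_a} \ \ b_1 \ \dots \ b_{n_b}]^{T} \\
φ(t) &= [-y(t-1) \ \dots \ -y(t-n_a) \ \ u(t-1) \ \dots \ u(t-n_b)]^{T} \\
\hat{y}(t \mid t-1, θ) &= φ^{T}(t) \cdot θ
\end{aligned}
```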

NOTE: The error is linear in $θ$
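Because the error is linear in $θ$, a quadratic cost can be minimized in closed form by least squares. The sketch below does this for a first-order ARX model by solving the 2x2 normal equations by hand. The quadratic cost, the model order, the toy data and the function name are illustrative assumptions.

```python
def arx1_least_squares(y, u):
    """Least-squares estimate of theta = (a, b) for y(t) = -a*y(t-1) + b*u(t-1) + e(t)."""
    # Accumulate the normal equations R * theta = f, where
    # R = sum of phi(t) phi(t)^T, f = sum of phi(t) * y(t), phi(t) = (-y(t-1), u(t-1)).
    r11 = r12 = r22 = f1 = f2 = 0.0
    for t in range(1, len(y)):
        p1, p2 = -y[t - 1], u[t - 1]
        r11 += p1 * p1
        r12 += p1 * p2
        r22 += p2 * p2
        f1 += p1 * y[t]
        f2 += p2 * y[t]
    det = r11 * r22 - r12 * r12  # non-zero only if the data are informative enough
    a_hat = (r22 * f1 - r12 * f2) / det
    b_hat = (r11 * f2 - r12 * f1) / det
    return a_hat, b_hat


# Noise-free toy data generated with a0 = -0.5, b0 = 2.0.
a0, b0 = -0.5, 2.0
u = [1.0, -1.0, 0.5, 0.0, 2.0, -0.5, 1.5, -2.0]
y = [0.0]
for t in range(1, len(u)):
    y.append(-a0 * y[t - 1] + b0 * u[t - 1])

a_hat, b_hat = arx1_least_squares(y, u)
```

No iterative search is needed here, which is the practical payoff of linearity in $θ$.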

ARMAX Model (Error Function)

We can write the regressor as: → pseudo-regressor because it depends on $θ$
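In the usual ARMAX notation this can be sketched as follows. The polynomial $C(z) = 1 + c_1 z^{-1} + \dots + c_{n_c} z^{-n_c}$, its order $n_c$ and the coefficient names are assumed notation; the past prediction errors $ε(t-k, θ)$ are the entries that make the regressor depend on $θ$.

```latex
\begin{aligned}
A(z)\,y(t) &= B(z)\,u(t) + C(z)\,e(t) \\
θ &= [a_1 \ \dots \ a_{n_a} \ \ b_1 \ \dots \ b_{n_b} \ \ c_1 \ \dots \ c_{n_c}]^{T} \\
φ(t, θ) &= [-y(t-1) \ \dots \ -y(t-n_a) \ \ u(t-1) \ \dots \ u(t-n_b) \ \ ε(t-1, θ) \ \dots \ ε(t-n_c, θ)]^{T} \\
\hat{y}(t \mid t-1, θ) &= φ^{T}(t, θ) \cdot θ
\end{aligned}
```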

RESULT: → Unlike [ARX Model (Error Function)], the prediction error is non-linear in $θ$
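The dependence of the pseudo-regressor on $θ$ can be made concrete with a first-order ARMAX recursion. This is a minimal sketch: the model order, the parameter names `a`, `b`, `c`, the zero initial error and the hand-picked noise sequence are all assumptions for illustration.

```python
def armax1_prediction_errors(theta, y, u, eps0=0.0):
    """Prediction errors of a first-order ARMAX model, computed recursively.

    Assumed model: y(t) + a*y(t-1) = b*u(t-1) + e(t) + c*e(t-1), theta = (a, b, c).
    eps0 is the assumed value of the unknown initial error eps(0).
    """
    a, b, c = theta
    eps = [eps0]
    for t in range(1, len(y)):
        # Pseudo-regressor: its last entry is a past prediction error,
        # which was itself computed with theta.
        phi = (-y[t - 1], u[t - 1], eps[t - 1])
        y_hat = phi[0] * a + phi[1] * b + phi[2] * c
        eps.append(y[t] - y_hat)
    return eps


# Toy data from the same model with a known, hand-picked "noise" sequence e(t).
a0, b0, c0 = -0.5, 2.0, 0.4
u = [1.0, -1.0, 0.5, 0.0, 2.0, -0.5, 1.5, -2.0]
e = [0.0, 0.1, -0.2, 0.05, 0.0, 0.3, -0.1, 0.2]
y = [0.0]
for t in range(1, len(u)):
    y.append(-a0 * y[t - 1] + b0 * u[t - 1] + e[t] + c0 * e[t - 1])

eps_true = armax1_prediction_errors((a0, b0, c0), y, u)   # reproduces e(t)
eps_other = armax1_prediction_errors((a0, b0, 0.8), y, u)  # different c, different regressors
```

Changing `c` changes not only the weight on the last regressor entry but also the entry itself, so the error is a product of $θ$-dependent terms rather than a linear function of $θ$.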

OE Model (Error Function)

The pseudo-regressor is: → a pseudo-regressor because $w(t)$ depends on $θ$, and so does $φ(t)$ → $φ(t, θ)$
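In the usual OE notation this can be sketched as follows, where $w(t)$ is the noise-free output. The polynomial $F(z) = 1 + f_1 z^{-1} + \dots + f_{n_F} z^{-n_F}$ and the coefficient names are assumed notation.

```latex
\begin{aligned}
y(t) &= w(t) + e(t), \qquad F(z)\,w(t) = B(z)\,u(t) \\
θ &= [b_1 \ \dots \ b_{n_b} \ \ f_1 \ \dots \ f_{n_F}]^{T} \\
φ(t, θ) &= [u(t-1) \ \dots \ u(t-n_b) \ \ -w(t-1, θ) \ \dots \ -w(t-n_F, θ)]^{T} \\
\hat{y}(t \mid t-1, θ) &= w(t, θ) = φ^{T}(t, θ) \cdot θ
\end{aligned}
```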

→ As in [ARMAX Model (Error Function)], $\hat{y}(t ∣ t - 1)$ is non-linear in $θ$
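A first-order OE model gives a compact illustration. In this sketch the model order, the parameter names `b`, `f`, the zero initial condition `w0` and the toy input are assumptions; the prediction is the simulated noise-free output $w(t, θ)$, built from its own past values.

```python
def oe1_predictions(theta, u, w0=0.0):
    """Predictions y_hat(t) = w(t, theta) of a first-order OE model.

    Assumed model: w(t) + f*w(t-1) = b*u(t-1), y(t) = w(t) + e(t), theta = (b, f).
    w0 is the assumed value of the unknown initial state w(0).
    """
    b, f = theta
    w = [w0]
    for t in range(1, len(u)):
        # Pseudo-regressor: -w(t-1) was itself computed with theta.
        phi = (u[t - 1], -w[t - 1])
        w.append(phi[0] * b + phi[1] * f)
    return w


b0, f0 = 2.0, -0.5
u = [1.0, -1.0, 0.5, 0.0, 2.0, -0.5, 1.5, -2.0]
y = oe1_predictions((b0, f0), u)  # noise-free output: y(t) = w(t, theta0)

eps_true = [yt - wt for yt, wt in zip(y, oe1_predictions((b0, f0), u))]
eps_other = [yt - wt for yt, wt in zip(y, oe1_predictions((b0, -0.2), u))]
```

Note that the measured output never enters the recursion: only the input and the past simulated values do.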

Paradox of the Optimal Prediction

To know the prediction of $y (t - 1)$ we need $y (t - 2)$ ,
But to know the prediction of $y (t - 2)$ we need $y (t - 3)$ ,
And so on …

→ Both the input and the output from time $t - n_{a}$ (or $t - n_{b}$) up to $t - 1$ are known (I can measure them)

For the [OE Model (Error Function)], instead, the pseudo-regressor depends on the $n_{F}$ past predictions, which in turn depend on all the past inputs (starting from $t = - \infty$). So, as for the [ARMAX Model (Error Function)], the problem stands.
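A common practical workaround, which is assumed here and not stated in these notes, is to initialise the unknown past values (for example to zero) and rely on the effect of that choice fading over time. The sketch below runs a first-order ARMAX recursion from two different initial errors; the model, the parameter values and the toy data are illustrative assumptions.

```python
def armax1_prediction_errors(theta, y, u, eps0):
    """Recursive prediction errors for an assumed first-order ARMAX model.

    y_hat(t) = -a*y(t-1) + b*u(t-1) + c*eps(t-1), started from the guess eps(0) = eps0.
    """
    a, b, c = theta
    eps = [eps0]
    for t in range(1, len(y)):
        y_hat = -a * y[t - 1] + b * u[t - 1] + c * eps[t - 1]
        eps.append(y[t] - y_hat)
    return eps


N = 40
u = [1.0 if t % 3 == 0 else -0.5 for t in range(N)]
y = [0.0]
for t in range(1, N):
    y.append(0.5 * y[t - 1] + 2.0 * u[t - 1])

theta = (-0.5, 2.0, 0.5)  # |c| < 1, so the noise filter is inversely stable
eps_a = armax1_prediction_errors(theta, y, u, eps0=0.0)
eps_b = armax1_prediction_errors(theta, y, u, eps0=1.0)

# Distance between the two error sequences at each time step.
gap = [abs(p - q) for p, q in zip(eps_a, eps_b)]
```

The two sequences differ by a term that is multiplied by $-c$ at every step, so with $|c| < 1$ the gap shrinks geometrically. This is where the inverse stability of $H(z)$ matters in practice.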


SI&DA - Lecture 17 'Model Selection Criteria'