# System identification

A branch of science concerned with the construction of mathematical models of dynamical systems from measured input/output data. The constructed models are mostly of finite-dimensional difference or differential equation form. The area has close connections with statistics and time-series analysis, and also offers a very wide spectrum of applications.

From a formal point of view, a system identification method is a mapping from sets of data to sets of models. An example of a simple model is the discrete-time ARX model

\begin{equation} \tag{a1} y ( t ) + a _ { 1 } y ( t - 1 ) + \ldots + a _ { n } y ( t - n ) = b _ { 1 } u ( t - 1 ) + \ldots + b _ { m } u ( t - m ) + e ( t ), \end{equation}

where $y$ and $u$ are the outputs and inputs, respectively, of the system and $e$ is a realization of a stochastic process (often assumed to be a sequence of independent random variables, cf. also Random variable). Another example is the continuous-time state-space model, described by the linear stochastic differential equation

\begin{equation} \tag{a2} \left\{ \begin{array} { l } { d x ( t ) = A x ( t ) d t + B u ( t ) d t + d w ( t ), } \\ { d y ( t ) = C x ( t ) d t + D u ( t ) d t + d v ( t ), } \end{array} \right. \end{equation}

where $x$ is the vector of (internal) state variables and $w$ and $v$ are Wiener processes (cf. also Wiener process). Artificial neural networks form an example of common non-linear black-box models for dynamical systems.
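As an illustration of the ARX model (a1), the following minimal sketch simulates its output for given coefficients. The function name `simulate_arx`, the coefficient values, the noise level, and the convention that all samples before $t = 0$ are zero are assumptions made for this example only.

```python
import random

def simulate_arx(a, b, u, e):
    """Simulate y(t) + a[0] y(t-1) + ... + a[n-1] y(t-n)
                = b[0] u(t-1) + ... + b[m-1] u(t-m) + e(t).

    Samples before t = 0 are taken to be zero (an assumption of this sketch).
    """
    y = []
    for t in range(len(u)):
        acc = e[t]
        for i, a_i in enumerate(a, start=1):
            if t - i >= 0:
                acc -= a_i * y[t - i]      # move the a-terms to the right-hand side
        for j, b_j in enumerate(b, start=1):
            if t - j >= 0:
                acc += b_j * u[t - j]
        y.append(acc)
    return y

# Illustrative first-order example: y(t) - 0.7 y(t-1) = 0.5 u(t-1) + e(t).
rng = random.Random(0)
N = 200
u = [rng.gauss(0.0, 1.0) for _ in range(N)]   # white-noise input
e = [rng.gauss(0.0, 0.1) for _ in range(N)]   # independent disturbances
y = simulate_arx([-0.7], [0.5], u, e)
```

With zero disturbance and a unit pulse as input, the function returns the impulse response of the difference equation, which is a convenient sanity check.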

In any case, the model can be associated with a predictor function $f$ that predicts $y ( t )$ from past (discrete-time) observations

\begin{equation*} Z ^ { t - 1 } = \{ y ( t - 1 ) , u ( t - 1 ) , \dots , y ( 0 ) , u ( 0 ) \}: \end{equation*}

\begin{equation} \tag{a3} \hat{y} ( t | t - 1 ) = f ( Z ^ { t - 1 } , t ). \end{equation}
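For the ARX model (a1) with zero-mean, independent $e ( t )$, the natural one-step-ahead predictor of the form (a3) simply drops the unpredictable term $e ( t )$. The sketch below spells this out; the function name `arx_predict` and the list-based data layout are assumptions of the example.

```python
def arx_predict(a, b, y_hist, u_hist):
    """Return y_hat(t | t-1) for the ARX model, where
    y_hist = [y(0), ..., y(t-1)] and u_hist = [u(0), ..., u(t-1)].

    Samples before t = 0 are treated as zero (an assumption of this sketch).
    """
    t = len(y_hist)
    y_hat = 0.0
    for i, a_i in enumerate(a, start=1):
        if t - i >= 0:
            y_hat -= a_i * y_hist[t - i]
    for j, b_j in enumerate(b, start=1):
        if t - j >= 0:
            y_hat += b_j * u_hist[t - j]
    return y_hat

# Predict y(2) from y(0), y(1), u(0), u(1) for y(t) - 0.5 y(t-1) = u(t-1) + e(t).
prediction = arx_predict([-0.5], [1.0], [0.0, 1.0], [1.0, 2.0])
```

Here the prediction equals $0.5 \cdot y ( 1 ) + u ( 1 )$. The predictor is linear in the coefficients, which is what makes least-squares fitting of ARX models particularly simple.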

A smoothly parametrized set of such predictor functions, $f ( Z ^ { t - 1 } , t , \theta )$, forms a model structure $\mathcal{M}$ as $\theta$ ranges over a subset $D _ { \mathcal{M} }$ of $\mathbf{R} ^ { d }$. The mapping (estimator or identification method) from observed data $Z ^ { N }$ to $D _ { \mathcal{M} }$, yielding the estimate $\hat { \theta } _ { N }$, can be chosen based on a least-squares fit or as a maximum-likelihood estimator (cf. also Least squares, method of; Maximum-likelihood method). This leads to a mapping of the kind

\begin{equation} \tag{a4} \hat { \theta } _ { N } = \operatorname { arg } \operatorname { min } _ { \theta \in D _ { \mathcal{M} } } \sum _ { t = 1 } ^ { N } \ell \left( y ( t ) - f ( Z ^ { t - 1 } , t , \theta ) \right), \end{equation}

with a positive scalar-valued function $\ell$; the choice $\ell ( \varepsilon ) = \varepsilon ^ { 2 }$ gives the least-squares estimate.
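As a concrete instance of (a4), consider a first-order ARX model with a quadratic criterion. The predictor is then linear in $\theta = ( a _ { 1 } , b _ { 1 } )$, and the minimizer solves a $2 \times 2$ system of normal equations. The following is a minimal sketch under these assumptions; the name `fit_arx1` and the simulated data are hypothetical and serve only as illustration.

```python
import random

def fit_arx1(y, u):
    """Least-squares estimate of (a1, b1) in y(t) + a1 y(t-1) = b1 u(t-1) + e(t).

    The predictor is f = phi(t)^T theta with phi(t) = [-y(t-1), u(t-1)].
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(1, len(y)):
        p1, p2 = -y[t - 1], u[t - 1]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        r1 += p1 * y[t]
        r2 += p2 * y[t]
    det = s11 * s22 - s12 * s12            # determinant of the normal matrix
    a1 = (s22 * r1 - s12 * r2) / det
    b1 = (s11 * r2 - s12 * r1) / det
    return a1, b1

# Simulated data from a "true" system with a1 = -0.7, b1 = 0.5 (illustrative values).
rng = random.Random(1)
N = 5000
u = [rng.gauss(0.0, 1.0) for _ in range(N)]
y = [0.0]
for t in range(1, N):
    y.append(0.7 * y[t - 1] + 0.5 * u[t - 1] + rng.gauss(0.0, 0.1))

a1_hat, b1_hat = fit_arx1(y, u)
```

For model structures whose predictor is not linear in $\theta$, the minimization in (a4) generally requires an iterative numerical search instead of a closed-form solution.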

When the data $Z ^ { N }$ are described as random variables, the law of large numbers and the central limit theorem can be applied under weak assumptions to infer the asymptotic (as $N \rightarrow \infty$) properties of the random variable $\hat { \theta } _ { N }$. The covariance matrix of the asymptotic (normal) distribution of the estimate takes the typical form

\begin{equation} \tag{a5} P = \operatorname { lim } _ { N \rightarrow \infty } N \cdot \operatorname{Cov} ( \hat{\theta}_ N ) = \lambda \left[ \operatorname { lim } _ { N \rightarrow \infty } \frac { 1 } { N } \sum _ { t = 1 } ^ { N } \mathsf{E} \frac { \partial } { \partial \theta } f ( Z ^ { t - 1 } , t , \theta ) \left( \frac { \partial } { \partial \theta } f ( Z ^ { t - 1 } , t , \theta ) \right) ^ { T } \right] ^ { - 1 }, \end{equation}

where $\lambda$ is the variance of the resulting model's prediction errors, and $\mathsf{E}$ denotes mathematical expectation. Explicit expressions for $P$ form the basis for experiment design and other user-oriented issues. For general treatments of system identification, see, e.g., [a5], [a7], and [a3].
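For a predictor that is linear in the parameters, $f = \varphi ( t ) ^ { T } \theta$, the gradient with respect to $\theta$ is the regressor $\varphi ( t )$, and the standard least-squares expression $P = \lambda [ \mathsf{E} \varphi ( t ) \varphi ( t ) ^ { T } ] ^ { - 1 }$ can be estimated by replacing expectations with sample averages. The sketch below does this for a simulated first-order ARX system; all numerical values and variable names are assumptions chosen for illustration.

```python
import random

rng = random.Random(2)
N, a1, b1, noise_std = 5000, -0.7, 0.5, 0.1    # illustrative values

u = [rng.gauss(0.0, 1.0) for _ in range(N)]
y = [0.0]
for t in range(1, N):
    y.append(-a1 * y[t - 1] + b1 * u[t - 1] + rng.gauss(0.0, noise_std))

# Regressors phi(t) = [-y(t-1), u(t-1)]; for this predictor df/dtheta = phi(t).
phis = [(-y[t - 1], u[t - 1]) for t in range(1, N)]
targets = y[1:]
n = len(phis)

# Sample averages of phi phi^T and of phi y.
r11 = sum(p[0] * p[0] for p in phis) / n
r12 = sum(p[0] * p[1] for p in phis) / n
r22 = sum(p[1] * p[1] for p in phis) / n
g1 = sum(p[0] * yt for p, yt in zip(phis, targets)) / n
g2 = sum(p[1] * yt for p, yt in zip(phis, targets)) / n
det = r11 * r22 - r12 * r12

# Least-squares estimate and the variance of its prediction errors.
theta = ((r22 * g1 - r12 * g2) / det, (r11 * g2 - r12 * g1) / det)
lam = sum((yt - p[0] * theta[0] - p[1] * theta[1]) ** 2
          for p, yt in zip(phis, targets)) / n

# Estimated P = lam * (average of phi phi^T)^{-1}, using the 2x2 inverse.
P = [[lam * r22 / det, -lam * r12 / det],
     [-lam * r12 / det, lam * r11 / det]]

# Approximate standard errors of the estimates: sqrt(P_ii / n).
std_err = [(P[0][0] / n) ** 0.5, (P[1][1] / n) ** 0.5]
```

In this setup a more exciting input (larger input variance) increases the entries of the averaged $\varphi \varphi ^ { T }$ matrix and hence decreases $P$, which illustrates why such expressions are useful for experiment design.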

By adaptive system identification (also called recursive identification or sequential identification) one means that the mapping from $Z ^ { N }$ to $\hat { \theta } _ { N }$ is constrained to be of the form

\begin{equation} \tag{a6} \left\{ \begin{array} { r l r l } { X _ { N } = H ( N , X _ { N - 1 } , y ( N ) , u ( N ) ), } \\ { \hat{\theta}_{N} = h ( X _ { N } ), } \end{array} \right. \end{equation}

where $X _ { N }$ is a vector of fixed dimension. This structure allows the estimate at step (time) $N$ to be computed with a fixed amount of calculation. This is instrumental in applications where the model is required "on-line", as the data are measured. Such applications include adaptive control, adaptive filtering, supervision, etc. The structure (a6) often takes the more specific form

\begin{equation} \tag{a7} \left\{ \begin{array} { l } { \hat { \theta } _ { N } = \hat { \theta } _ { N - 1 } + \gamma _ { N } Q _ { 1 } ( X _ { N } , y ( N ) , u ( N ) ), } \\ { X _ { N } = X _ { N - 1 } + \mu _ { N } Q _ { 2 } ( X _ { N - 1 } , y ( N ) , u ( N ) ), } \end{array} \right. \end{equation}

to reflect that the estimate is adjusted from the previous one, usually by a small amount. The convergence analysis of algorithms like (a7) is treated in, e.g., [a6], [a1], [a4], and [a8]. The underlying theory is typically based either on averaging, which relates (a7) to an associated differential equation whose stability is then analyzed, or on stochastic Lyapunov functions (cf. also Lyapunov stochastic function). It is also of interest to determine the asymptotic distribution of the estimate as the step sizes $\gamma$ and $\mu$ become small, see, e.g., [a2] and [a8].
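Recursive least squares is a familiar algorithm that fits the fixed-memory structure (a6): the stored quantities are the current estimate, a $2 \times 2$ matrix $P$, and the most recent input/output samples. The sketch below applies it to a first-order ARX model. The function name `rls_step`, the initial value of $P$, and the simulated system are assumptions made for this example.

```python
import random

def rls_step(theta, P, phi, y_t):
    """One recursive least-squares update for a two-parameter linear regression."""
    # P * phi (P is symmetric, so this is also (phi^T P)^T).
    Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
            P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = 1.0 + phi[0] * Pphi[0] + phi[1] * Pphi[1]
    gain = [Pphi[0] / denom, Pphi[1] / denom]
    # Prediction error based on the previous estimate.
    err = y_t - (phi[0] * theta[0] + phi[1] * theta[1])
    theta = [theta[0] + gain[0] * err, theta[1] + gain[1] * err]
    # P <- P - (P phi)(P phi)^T / denom
    P = [[P[i][j] - gain[i] * Pphi[j] for j in range(2)] for i in range(2)]
    return theta, P

# Data arrive one sample at a time from y(t) - 0.7 y(t-1) = 0.5 u(t-1) + e(t).
rng = random.Random(3)
a1, b1 = -0.7, 0.5
theta = [0.0, 0.0]                       # initial estimate of (a1, b1)
P = [[1000.0, 0.0], [0.0, 1000.0]]       # large initial P: little prior confidence
y_prev, u_prev = 0.0, rng.gauss(0.0, 1.0)
for _ in range(2000):
    y_t = -a1 * y_prev + b1 * u_prev + rng.gauss(0.0, 0.1)
    theta, P = rls_step(theta, P, [-y_prev, u_prev], y_t)
    y_prev, u_prev = y_t, rng.gauss(0.0, 1.0)
```

Each update costs the same number of operations regardless of how many samples have been processed. The estimate is corrected in proportion to the latest prediction error, with a gain that shrinks as $P$ decreases, in the spirit of the small adjustments in (a7).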

How to Cite This Entry:
System identification. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=System_identification&oldid=50263
This article was adapted from an original article by L. Ljung (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.