# Algorithmic information theory

The branch of mathematical logic which gives a more exact formulation of the basic concepts of the theory of information, based on the concepts of an algorithm and a computable function. Algorithmic information theory attempts to give a base to these concepts without recourse to probability theory, so that the concepts of entropy and quantity of information might be applicable to individual objects. It also gives rise to its own problems, which are related to the study of the entropy of specific individual objects.

The key concept of algorithmic information theory is that of the entropy of an individual object, also called the (Kolmogorov) complexity of the object. The intuitive meaning of this concept is the minimum amount of information needed to reconstruct a given object. An exact definition of the concept of complexity of an individual object, and (on the basis of this concept) of the concept of the quantity of information in an individual object $y$ relative to an individual object $x$, was given by A.N. Kolmogorov in 1962–1965, after which the development of algorithmic information theory began.

Without loss of generality, only binary words (linear strings of 0's and 1's) are considered. The complexity of a word $x$ is understood to mean the length $l(p)$ of the shortest binary word $p$ which contains all the information required to reconstruct $x$ with the aid of some definite decoding procedure. A precise definition of this concept follows.

Let $F(s)$ be an arbitrary universal partial recursive function. Then the complexity of the word $x$ with respect to $F$ will be

$$K _ {F} ( x ) = \ \left \{ \begin{array}{ll} \mathop{\rm min} l ( p ) & \textrm{ if } F ( p ) = x , \\ \infty &\ \textrm{ if } \textrm{ there } \textrm{ is } \textrm{ no } p \ \textrm{ such } \textrm{ that } F(p) = x. \\ \end{array} \right .$$

The word $p$ such that $F(p) = x$ is known as the code, or the program, by which $F$ reconstructs the word $x$. The following theorem is valid: There exists a partial recursive function $F _ {0}$( an optimal function), such that for any partial recursive function $F$ the following inequality holds:

$$K _ {F _ {0} } ( x ) \leq K _ {F} ( x ) + C _ {F} ,$$

where $C _ {F}$ is a constant which is independent of $x$. This theorem yields an invariant (up to an additive constant) definition of complexity: The complexity $K(x)$ of the word $x$ is the complexity $K _ {F _ {0} } (x)$ with respect to some partial recursive function $F _ {0}$, which has been fixed once and for all. Clearly, $K (x) \leq l(x) + C$, where $C$ is a constant independent of $x$.

Using $K(x)$ one may also determine the complexity $K(x, y)$ of pairs of words $( x, y )$; to do this, the pairs $(x, y)$ are effectively coded as one word $c (x, y)$, and the complexity $K(x, y)$ is understood to mean $K (c (x, y))$.

P. Martin-Löf studied the behaviour of the complexity of the initial segments of infinite binary sequences. Let $\omega _ {n}$ denote the initial segment of a sequence $\omega$, consisting of the $n$ initial characters. Then almost all (according to the Lebesgue measure) infinite binary sequences $\omega$ have the following properties: a) for infinitely many natural numbers $n$ the inequality $K ( \omega _ {n} ) \geq n - c$, where $c$ is some constant, is valid; and b) for all natural numbers $n$ larger than some value $n _ \epsilon$, $K ( \omega _ {n} ) \geq n - (1 + \epsilon ) \mathop{\rm log} _ {2} n$, where $\epsilon$ is an arbitrary positive number (the Martin-Löf theorem). Conversely, for any sequence $\omega$ there exist infinitely many $n$ such that $K ( \omega _ {n} ) \leq n - \mathop{\rm log} _ {2} n$.

The concept of complexity is also employed in the study of algorithmic problems. Here, a more natural concept is the so-called complexity of solution of the word $x$. The intuitive meaning of this concept is the minimum quantity of information required to find, for each number $i \leq l(x)$, the $i$- th character of word $x$( the length of the word $x$ need not necessarily be known). More exactly, the complexity of solution of the word $x = x _ {1} \dots x _ {l(x)}$ by a partial recursive function $G (s, t)$ means that

$$K R _ {G} ( x ) = \ \left \{ \begin{array}{lt} \mathop{\rm min} \{ {l ( p ) } : {\forall i \leq l(x) , G ( p , i ) = x _ {i} } \} , &\infty \ \textrm{ if } \textrm{ there } \textrm{ is } \textrm{ no } \textrm{ such } p , \\ \end{array} \right .$$

The following theorem is valid: There exists a partial recursive function $G _ {0}$( an optimal function) such that any other partial recursive function $G$ satisfies the inequality $KR _ {G _ {0} } (x) \leq KR _ {G} ( x ) + C _ {G}$, where $C _ {G}$ is a constant independent of $x$. The complexity of solution, $KR (x)$, of the word $x$ is the complexity $KR _ {G _ {0} } (x)$ of some optimal partial recursive function $G _ {0}$ which has been fixed once and for all. Clearly, $KR (x) \leq K (x) + C$ and $K(x) \leq KR (x) + 2 l (x) + C$. It is possible, by using $KR (x)$, to determine, for any set $M$ of natural numbers, the complexity $K (M, n)$ for the $n$- segment of the set $M$: $K (M, n) = KR ( \omega _ {n} )$, where $\omega = x _ {1} \dots x _ {i} \dots$ is the characteristic sequence of the set $M$( $x _ {i} = 1$ if $i \in M$, and $x _ {i} = 0$ if $i \notin M$).

Algorithmic problems are usually represented as problems of belonging to some recursively enumerable set $M$. If some value $n$ is given, and the problem of belonging to $M$ is restricted to the first $n$ natural numbers only, then one obtains a bounded algorithmic problem. The magnitude $K (M, n)$ may be intuitively understood as expressing the quantity of information required to solve this bounded problem. In particular, the set $M$ is recursive if and only if $K(M, n)$ is bounded from above by some constant.

It follows from the Martin-Löf theorem that there exist sets for which $K (M, n) \sim n$. It was also found that sets with maximum degree of complexity exist even among sets defined by arithmetic predicates with two quantifiers. However, the following theorem applies to recursively enumerable sets: 1) for any recursively enumerable set $M$ and any value $n$, the inequality $K ( M, n) \leq \mathop{\rm log} _ {2} n + C$, where $C$ does not depend on $n$, is valid; and 2) there exists a recursively enumerable set $M$ such that for any $n$ the inequality $K (M,n) > \mathop{\rm log} _ {2} n$ holds. At the same time there exists a recursively enumerable set $M$ such that if the time $t$ of computation of an arbitrary general recursive function is bounded, the estimate $K ^ {t} (M, n) \geq n/ c _ {t}$, where $c _ {t}$ does not depend on $n$, is valid.

Sets which are universal with respect to a certain type of reducibility may also be defined in these terms (cf. Universal set): A set $M$ is weakly tabularly universal if and only if there exists an unbounded general recursive function $f(n)$ such that the inequality $K (M, n) \geq f(n)$ is valid for all values of $n$.

In studying the complexity of bounded algorithmic problems other measures of complexity may also be employed, e.g. the minimum of the length of the coding of the normal algorithm which solves the bounded problem in question. It was found, however, that asymptotically exact relations (cf. Algorithm, complexity of description of an) exist between the complexities introduced above and the analogues of these complexities expressed as the minimum length of the coding of an algorithm (or as the minimum number of internal states of a Turing machine).

Another concept which is introduced in constructing algorithmic information theory is that of conditional complexity of a word $x$, for a known $y$, using a partial recursive function $G (s, t)$:

$$K _ {G} ( x \mid y ) = \ \left \{ \begin{array}{lt} \mathop{\rm min} \{ {l ( p ) } : {G ( p , y ) = x } \} , &\infty \ \textrm{ if } \textrm{ there } \textrm{ is } \textrm{ no } p \ \textrm{ such } \textrm{ that } G (p, y) = x. \\ \end{array} \right .$$

This concept is related to the theorem on the existence of an optimal function $G _ {0}$, and the conditional complexity $K (x \mid y)$ of a word $x$, if $y$ is known, is defined as the complexity $K _ {G _ {0} } (x \mid y)$ using an optimal function $G _ {0}$ fixed once and for all. The intuitive meaning of $K(x \mid y)$ is the minimum quantity of information which must be added to the information contained in the word $y$ for the reconstitution of the word $x$ to be possible. Clearly, $K (x \mid y) \leq K (x) + C$.

The next central concept of algorithmic information theory is the concept of the amount of information in an individual object $y$ relative to an individual object $x$:

$$I ( y : x ) = K ( x ) - K ( x \mid y ) .$$

The quantity $I (y:x)$ is called the algorithmic amount of information in $y$ about $x$. The quantities $K(x)$ and $K( x \mid y )$ are sometimes called the algorithmic entropy of $x$ and the algorithmic conditional entropy of $x$ for a given $y$. The formulas for the decomposition of the entropy $H(X, Y) = H(X) + H(Y \mid X)$ and of the commutation of information $I(X:Y) = I (Y:X)$ are valid in the algorithmic concept up to terms of order $O ( \mathop{\rm log} H (X, Y) )$:

$$| K ( x , y ) - [ K ( x ) + K ( y \mid x ) ] | \leq O ( \mathop{\rm log} K ( x , y ) ) ,$$

$$| I ( x : y ) - I ( y : x ) | \leq O ( \mathop{\rm log} K ( x , y ) ) .$$

There exists a certain relation (due to Kolmogorov) between the algorithmic and the classical definitions of the amount of information (or, more exactly, between the complexity of a word according to Kolmogorov and the entropy of the Shannon distribution of frequencies): Let a number $r$ be given, let the word $x$ of length $i \cdot r$ consist of $i$ words of length $r$, and let the $k$- th word of length $r$ appear in $x$ with a frequency $q _ {k}$( $k = 1 \dots 2 ^ {r}$); one then has

$$\frac{K ( x ) }{i} \leq H + \alpha ( i ) ,$$

where

$$H = - \sum _ {k = 1 } ^ { {2 } ^ {r} } q _ {k} \mathop{\rm log} _ {2} q _ {k} ,$$

and $\alpha (i) \rightarrow 0$ as $i \rightarrow \infty$. In the general case no closer connection can be established between entropy and complexity. This is because entropy is intended to study sequences subject to frequency laws only. Consequently, one may establish the following relation for random sequences $\omega$ in measure between the magnitudes under study, which corresponds to independent trials:

$$\lim\limits _ {i \rightarrow \infty } \frac{K ( \omega _ {ir} ) }{i} = H .$$

Similar considerations apply to an arbitrary ergodic stationary stochastic process.

#### References

 [1] A.N. Kolmogorov, "Three approaches to the quantitative definition of information" Problems in Information Transmission , 1 (1965) pp. 1–7 Problemy Peredachi Informatsii , 1 : 1 (1965) pp. 3–11 [2] A.N. Kolmogorov, "On the logical foundations of information theory and probability theory" Problems Inform. Transmission , 5 : 3 (1969) pp. 1–4 Problemy Peredachi Informatsii , 5 : 3 (1969) pp. 3–7 [3] A.K. Zvonkin, L.A. Levin, "The complexity of finite objects and the developments of the concepts of information and randomness by means of the theory of algorithms" Russian Math. Surveys , 25 : 6 (1970) pp. 82–124 Uspekhi Mat. Nauk , 25 : 6 (1970) pp. 85–127 [4] Ya.M. Bardzin', "Complexity of programs to determine whether natural numbers not greater than belong to a recursively enumerable set" Soviet Math. Dokl. , 9 : 5 (1968) pp. 1251–1254 Dokl. Akad. Nauk SSSR , 182 : 6 (1968) pp. 1249–1252 [5] M.I. Kanovich, "Complexity of resolution of a recursively enumerable set as a criterion of its universality" Soviet Math. Dokl. , 11 : 5 (1970) pp. 1224–1228 Dokl. Akad. Nauk SSSR , 194 : 3 (1970) pp. 500–503