# Complexity function

2010 Mathematics Subject Classification: Primary: 68R15 [MSN][ZBL]

of a word $w$

The complexity function of a word $w$ (finite, infinite or bi-infinite) over a finite alphabet $A$ is the function $p_w(n)$ that counts the number of distinct factors (substrings of consecutive symbols) of length $n$ in $w$. More generally, the complexity function of a language, a set of finite words over an alphabet, counts the number of distinct words of given length.

For a string $u$ of length at least $n$ over an alphabet of size $k$ we clearly have $$1 \le p_u(n) \le k^n \ ,$$ the bounds being achieved by the constant word and a disjunctive word (for example, the Champernowne word) respectively. For infinite words $u$, we have $p_u(n)$ bounded if $u$ is ultimately periodic (a finite, possibly empty, sequence followed by a finite cycle). Conversely, if $p_u(n) \le n$ for some $n$, then $u$ is ultimately periodic.

An aperiodic sequence is one which is not ultimately periodic. An aperiodic sequence has strictly increasing complexity function (this is the Morse–Hedlund theorem), so $p_u(n) \ge n+1$.

A set $S$ of finite binary words is balanced if for each $n$ the subset $S_n$ of words of length $n$ has the property that the Hamming weight of the words in $S_n$ takes at most two distinct values. A balanced sequence is one for which the set of factors is balanced. A balanced sequence has complexity function $p(n) \le n+1$.

A Sturmian word over a binary alphabet is one with complexity function $p(n) = n+1$ A sequence is Sturmian if and only if it is balanced and aperiodic. An example is the Fibonacci word. More generally, a Sturmian word over an alphabet of size $k$ is one with complexity $p(n) = n+k-1$. An Arnoux-Rauzy word over a ternary alphabet has complexity $2n+1$: an example is the Tribonacci word.

Let $L$ be a language over an alphabet and define the function $P_L(n)$ of a positive integer $n$ to be the number of different words of length $n$ in $L$ The complexity function of a word is thus the complexity function of the language consisting of the factors of that word.

The complexity function of a language is less constrained than that of a word. For example, it may be bounded but not eventually constant: the complexity function of the regular language $a(bb)^*a$ takes values 0 and 1 on odd and even $n \ge 2$ respectively. There is an analogue of the Morse–Hedlund theorem: if the complexity of $L$ satisfies $p_L(n) \le n$ for some $n$, then $p_L$ is bounded and there is a finite language $F$ such that $$L \subseteq \{ x y^k z : x,y,z \in F,\ k \in \mathbb{N} \} \ .$$ A polynomial or sparse language is one for which the complexity function $p(n)$ is bounded by a fixed power of $n$. An exponential language is one for which there exists $k>1$ such that there are infinitely many $n$ for which $p(n) > k^n$. Words exist with complexity functions having growth intermediate between polynomial and exponential; however, a regular language is either polynomial or exponential.

The topological entropy of an infinite sequence $u$ is defined by $$H_{\mathrm{top}}(u) = \lim_{n \rightarrow \infty} \frac{\log p_u(n)}{n \log k} \ .$$

The limit exists as the logarithm of the complexity function is a subadditive function: indeed, $p(m+n) \le p(m) \cdot p(n)$. Every real number between 0 and 1 occurs as the topological entropy of some sequence, which may be taken to be a uniformly recurrent word or even uniquely ergodic.

For $x$ a real number and $b$ an integer $\ge 2$ then the complexity function of $x$ base $b$ is the complexity function $p_{x,b}(n)$ of the sequence of digits of $x$ written in base $b$. If $x$ is an irrational number then $p_{x,b}(n) \ge n+1$; if x is rational then $p_{x,b}(n) \le C$ for some constant $C$ depending on $x$ and $b$. It is conjectured that for algebraic irrational $x$ the complexity is $b^n$ (which would follow if all such numbers were normal) but all that is known in this case is that $p_{x,b}(n)$ grows faster than any linear function of $n$.

The abelian complexity function $p^{\text{ab}}(n)$ similarly counts the number of occurrences of distinct factors of given length $n$, where now we identify factors that differ only by a permutation of the positions. Clearly $p^{\text{ab}}(n) \le p(n)$. The abelian complexity of a Sturmian sequence satisfies $p^{\text{ab}}(n) = 2$.

How to Cite This Entry:
Complexity function. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Complexity_function&oldid=39290