Complexity function
2020 Mathematics Subject Classification: Primary: 68R15
of a word $w$
The complexity function of a word $w$ (finite, infinite or bi-infinite) over a finite alphabet $A$ is the function $p_w(n)$ that counts the number of distinct factors (substrings of consecutive symbols) of length $n$ in $w$. More generally, the complexity function of a language, a set of finite words over an alphabet, counts the number of distinct words of given length.
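For a finite word the definition can be evaluated directly by collecting every window of length $n$ into a set. The following is a minimal Python sketch; the name `factor_complexity` and the sample word are chosen here for illustration only.

```python
def factor_complexity(w: str, n: int) -> int:
    """Return p_w(n): the number of distinct factors of length n of the finite word w."""
    if n > len(w):
        return 0  # a finite word has no factor longer than itself
    # Slide a window of length n over w and keep each distinct window once.
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


sample = "abaababa"  # illustrative word, not taken from the article
print([factor_complexity(sample, n) for n in range(1, 5)])  # prints [2, 3, 4, 5]
```

For an infinite word, applying this to a finite prefix gives only a lower bound on $p_w(n)$, which is exact once the prefix is long enough to contain every factor of length $n$.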
For a string $u$ of length at least $n$ over an alphabet of size $k$ we clearly have $$ 1 \le p_u(n) \le k^n \ , $$ the bounds being achieved by the constant word and a disjunctive word (for example, the Champernowne word) respectively. For infinite words $u$, we have $p_u(n)$ bounded if $u$ is ultimately periodic (a finite, possibly empty, sequence followed by a finite cycle). Conversely, if $p_u(n) \le n$ for some $n$, then $u$ is ultimately periodic.
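As an illustration of the bounded case, the sketch below counts factors in a long prefix of an ultimately periodic word. The word used (pre-period `ab` followed by the cycle `cde`) is an arbitrary example chosen here, and the helper `factor_complexity` is a hypothetical name; since only a finite prefix is examined, the values are meaningful only for lengths well below the prefix length.

```python
def factor_complexity(w: str, n: int) -> int:
    """Number of distinct factors of length n of the finite word w."""
    if n > len(w):
        return 0
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


# Prefix of the ultimately periodic word ab cde cde cde ...
prefix = "ab" + "cde" * 50
periodic_values = [factor_complexity(prefix, n) for n in range(1, 21)]

# The constant word attains the lower bound p(n) = 1.
constant_values = [factor_complexity("a" * 50, n) for n in range(1, 21)]

print(periodic_values)  # every entry is 5: two factors meet the pre-period, three lie in the cycle
print(constant_values)  # every entry is 1
```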
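An aperiodic sequence is one which is not ultimately periodic. An aperiodic sequence has strictly increasing complexity function (this is the Morse–Hedlund theorem); since such a sequence contains at least two distinct letters, $p_u(1) \ge 2$, and hence $p_u(n) \ge n+1$ for all $n \ge 1$.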
A set $S$ of finite binary words is balanced if for each $n$ the subset $S_n$ of words of length $n$ has the property that the Hamming weight of the words in $S_n$ takes at most two distinct values. A balanced sequence is one for which the set of factors is balanced. A balanced sequence has complexity function $p(n) \le n+1$.
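The balance condition can be tested on the factors of a finite binary word by collecting the Hamming weights of all windows of each length. This is a minimal sketch: the function name `is_balanced` and the encoding of words as strings over `0` and `1` are assumptions made here for illustration.

```python
def is_balanced(w: str, max_n: int) -> bool:
    """Check that, for each n = 1..max_n, the factors of length n of the binary
    word w (a string over '0' and '1') have at most two distinct Hamming weights."""
    for n in range(1, min(max_n, len(w)) + 1):
        weights = {w[i:i + n].count("1") for i in range(len(w) - n + 1)}
        if len(weights) > 2:
            return False
    return True


print(is_balanced("0100101001001", 13))  # prefix of the Fibonacci word: True
print(is_balanced("0011", 4))            # factors 00, 01, 11 have weights 0, 1, 2: False
```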
A Sturmian word over a binary alphabet is one with complexity function $p(n) = n+1$. A sequence is Sturmian if and only if it is balanced and aperiodic. An example is the Fibonacci word. More generally, a Sturmian word over an alphabet of size $k$ is one with complexity $p(n) = n+k-1$. An Arnoux–Rauzy word over a ternary alphabet has complexity $2n+1$: an example is the Tribonacci word.
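The complexities of the two examples can be checked numerically on long prefixes. The sketch below assumes the usual generating substitutions ($0 \to 01$, $1 \to 0$ for the Fibonacci word and $0 \to 01$, $1 \to 02$, $2 \to 0$ for the Tribonacci word); the helper names are hypothetical, and the counts are taken on finite prefixes, so they are reliable only for small $n$.

```python
def iterate_substitution(rules: dict, seed: str, steps: int) -> str:
    """Apply the letter-to-word substitution `rules` to `seed`, `steps` times."""
    w = seed
    for _ in range(steps):
        w = "".join(rules[c] for c in w)
    return w


def factor_complexity(w: str, n: int) -> int:
    """Number of distinct factors of length n of the finite word w."""
    if n > len(w):
        return 0
    return len({w[i:i + n] for i in range(len(w) - n + 1)})


# Prefixes of length 1597 (Fibonacci) and 1705 (Tribonacci).
fibonacci = iterate_substitution({"0": "01", "1": "0"}, "0", 15)
tribonacci = iterate_substitution({"0": "01", "1": "02", "2": "0"}, "0", 12)

fib_values = [factor_complexity(fibonacci, n) for n in range(1, 11)]
trib_values = [factor_complexity(tribonacci, n) for n in range(1, 7)]

print(fib_values)   # n + 1 for n = 1..10
print(trib_values)  # 2n + 1 for n = 1..6
```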
Let $L$ be a language over an alphabet and define the function $p_L(n)$ of a positive integer $n$ to be the number of different words of length $n$ in $L$. The complexity function of a word is thus the complexity function of the language consisting of the factors of that word.
The complexity function of a language is less constrained than that of a word. For example, it may be bounded but not eventually constant: the complexity function of the regular language $a(bb)^*a$ takes values 0 and 1 on odd and even $n \ge 2$ respectively.
There is an analogue of the Morse–Hedlund theorem: if the complexity of $L$ satisfies $p_L(n) \le n$ for some $n$, then $p_L$ is bounded and there is a finite language $F$ such that $$ L \subseteq \{ x y^k z : x,y,z \in F,\ k \in \mathbb{N} \} \ . $$
A polynomial or sparse language is one for which the complexity function $p(n)$ is bounded by a fixed power of $n$. An exponential language is one for which there exists $k>1$ such that there are infinitely many $n$ for which $p(n) > k^n$. Words exist with complexity functions having growth intermediate between polynomial and exponential; however, a regular language is either polynomial or exponential.
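The values quoted above for the language $a(bb)^*a$ can be confirmed by brute force for small $n$: enumerate all words of length $n$ over $\{a,b\}$ and count those matching the regular expression. This is only an illustrative sketch with a hypothetical function name; the enumeration is exponential in $n$ and is not a practical way to compute the complexity of a regular language.

```python
import re
from itertools import product


def language_complexity(pattern: str, alphabet: str, n: int) -> int:
    """Count the words of length n over `alphabet` that fully match `pattern`."""
    regex = re.compile(pattern)
    return sum(
        1
        for letters in product(alphabet, repeat=n)
        if regex.fullmatch("".join(letters))
    )


values = [language_complexity(r"a(bb)*a", "ab", n) for n in range(2, 11)]
print(values)  # prints [1, 0, 1, 0, 1, 0, 1, 0, 1]
```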
The topological entropy of an infinite sequence $u$ is defined by $$ H_{\mathrm{top}}(u) = \lim_{n \rightarrow \infty} \frac{\log p_u(n)}{n \log k} \ . $$
The limit exists as the logarithm of the complexity function is a subadditive function: indeed, $p(m+n) \le p(m) \cdot p(n)$. Every real number between 0 and 1 occurs as the topological entropy of some sequence, which may be taken to be a uniformly recurrent word or even uniquely ergodic.
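The existence of the limit can be spelled out as follows; this is the standard argument via Fekete's lemma, written with $p = p_u$. Every factor of length $m+n$ is determined by its prefix of length $m$ and its suffix of length $n$, both of which are again factors, so $$ p(m+n) \le p(m)\, p(n) \quad \Longrightarrow \quad \log p(m+n) \le \log p(m) + \log p(n) \ . $$ For the subadditive sequence $a_n = \log p(n)$, Fekete's lemma gives $$ \lim_{n \rightarrow \infty} \frac{a_n}{n} = \inf_{n \ge 1} \frac{a_n}{n} \ , $$ and since $1 \le p(n) \le k^n$ implies $0 \le a_n \le n \log k$, dividing by $\log k$ shows that $0 \le H_{\mathrm{top}}(u) \le 1$.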
For a real number $x$ and an integer $b \ge 2$, the complexity function of $x$ in base $b$ is the complexity function $p_{x,b}(n)$ of the sequence of digits of $x$ written in base $b$. If $x$ is an irrational number then $p_{x,b}(n) \ge n+1$; if $x$ is rational then $p_{x,b}(n) \le C$ for some constant $C$ depending on $x$ and $b$. It is conjectured that for algebraic irrational $x$ the complexity is $b^n$ (which would follow if all such numbers were normal), but all that is known in this case is that $p_{x,b}(n)$ grows faster than any linear function of $n$.
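The rational case can be illustrated by generating digits with long division and counting distinct blocks. In this sketch the function names are hypothetical, and only the fractional part of $x = p/q$ is expanded, which is a simplifying assumption made here; the counts are taken on a finite run of digits.

```python
def fractional_digits(p: int, q: int, b: int, count: int) -> list:
    """First `count` base-b digits of the fractional part of p/q, by long division."""
    digits = []
    r = p % q
    for _ in range(count):
        r *= b
        digits.append(r // q)  # next digit
        r %= q                 # remainder carried to the next step
    return digits


def block_complexity(seq: list, n: int) -> int:
    """Number of distinct blocks of n consecutive entries of the finite sequence seq."""
    if n > len(seq):
        return 0
    return len({tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)})


one_seventh = fractional_digits(1, 7, 10, 300)  # 142857 142857 ...
one_sixth = fractional_digits(1, 6, 10, 300)    # 1 666666 ...

print([block_complexity(one_seventh, n) for n in range(1, 6)])  # prints [6, 6, 6, 6, 6]
print([block_complexity(one_sixth, n) for n in range(1, 6)])    # prints [2, 2, 2, 2, 2]
```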
The abelian complexity function $p^{\text{ab}}(n)$ similarly counts the number of distinct factors of given length $n$, where now we identify factors that differ only by a permutation of their letters (that is, factors containing the same number of occurrences of each symbol). Clearly $p^{\text{ab}}(n) \le p(n)$. The abelian complexity of a Sturmian sequence satisfies $p^{\text{ab}}(n) = 2$ for all $n \ge 1$.
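Abelian complexity can be computed on a finite word by replacing each factor with a canonical representative of its class, here the factor with its letters sorted. The sketch below uses hypothetical function names and checks the Sturmian value on a prefix of the Fibonacci word, generated by the substitution $0 \to 01$, $1 \to 0$ (an assumption of this example).

```python
def abelian_complexity(w: str, n: int) -> int:
    """Number of distinct length-n factors of w, counted up to permutation of letters."""
    if n > len(w):
        return 0
    # Sorting the letters gives one representative per abelian class.
    return len({"".join(sorted(w[i:i + n])) for i in range(len(w) - n + 1)})


def fibonacci_prefix(steps: int) -> str:
    """Prefix of the Fibonacci word obtained by iterating 0 -> 01, 1 -> 0 on '0'."""
    w = "0"
    for _ in range(steps):
        w = "".join("01" if c == "0" else "0" for c in w)
    return w


fib = fibonacci_prefix(15)  # length 1597
print([abelian_complexity(fib, n) for n in range(1, 11)])  # every entry is 2
print(abelian_complexity("abab", 2))  # ab and ba are identified: prints 1
```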
Morse–Hedlund theorem. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Morse%E2%80%93Hedlund_theorem&oldid=39293