# Thue-Morse sequence

$\newcommand{\calA}{\mathcal{A}}$

A sequence appearing in numerous fields and discovered and rediscovered in different contexts.

The Thue–Morse sequence may be defined as the sequence $u = (u_n)_n$ which counts the sum modulo $2$ of the digits of $n$ in base $2$ ($u_n$ gives the parity of the number of $1$s in the binary expansion of $n$). This sequence can also be generated by an iterative process called substitution. Let $\calA = \{a, b\}$ and let $\calA^*$ denote the set of words defined on the alphabet $\calA$ (cf. also Word). Consider the mapping $\sigma : \calA \to \calA^*$ defined by $\sigma(a) = ab$ and $\sigma(b) = ba$. The mapping $\sigma$ extends to a morphism of $\calA^*$ by concatenation. One can iterate $\sigma$, and the nested words $\sigma^n(a)$ converge in the product topology to the infinite sequence which begins with $\sigma^n(a)$, for every $n$. This sequence is precisely the Thue–Morse sequence.

This sequence was first discovered by E. Prouhet in 1851 as a solution of the so-called Prouhet–Tarry–Escott problem (G. Tarry and E.B. Escott re-introduced this problem after Prouhet in the years 1910–1920): Consider a finite set of integers that can be partitioned into $c$ classes with the same cardinality $s$ such that the sums of the elements, the sums of squares, etc., the sums of the $k$th powers in each class, are independent of the class. If the sums of the $(k+1)$th powers are not equal, then the solution is said to be an exact solution of order $k$. The Thue–Morse sequence provides a solution of degree $k$ exactly (when $c=2$) by considering the classes $\left\{ 0 \le n \le 2^{k+1} - 1 : u_n = a \right\}$ and $\left\{0 \le n \le 2^{k+1} - 1 : u_n = b \right\}$. Note that this partition is conjectured to be the unique partition of the set $\{0, 1, \ldots, 2^{k+1} - 1\}$ into $2$ classes providing an exact solution of degree $k$. For more results on this subject, see [a1].

The sequence $u$ was next rediscovered by A. Thue in 1912 [a7]. Thue tried to construct arbitrarily long words on a two-letter alphabet without cubes, i.e., without factors of the form $www$, where $w$ is a non-empty word. It is easily seen that there are no infinite square-free sequences on two letters. It is then natural to ask whether there are infinite sequences free of powers $2+\epsilon$, for any $\epsilon > 0$. Such a sequence is called overlap-free; in other words, none of its factors is of the form $xuxux$. In fact, the Thue–Morse sequence is an infinite overlap-free word on two letters. Moreover, all overlap-free words are derived from this sequence. Furthermore, if one defines the Thue–Morse sequence on $\{0, 1\}$ (by mapping $a$ to $1$ and $b$ to $0$), then the sequence $(u_{n+1} - u_n)_n$ defined on the three-letter alphabet $\{-1, 0, 1\}$ is square-free. By applying the morphism $\mu(-1) = a$, $\mu(0) = ab$, $\mu(1) = abb$, one also gets an infinite cube-free word on two letters. These results have been extended within the theory of avoidable and unavoidable patterns in strings. Note that such combinatorial properties were used to solve various algebraic problems, and to provide a negative answer to the Burnside problem. For more information on the subject, see [a4].

The sequence $u$ was next introduced by M. Morse [a5] in 1921 in order to show the existence of non-periodic recurrent geodesics over simply-connected surfaces with constant negative curvature (cf. also Simply-connected domain; Surface; Geodesic line), by coding a geodesic by an infinite sequence of $0$s and $1$s, according to which boundary of the surface it meets. Indeed, the Thue–Morse sequence is a uniformly recurrent word, i.e., every factor appears in an infinite number of places with bounded gaps. In other words, the symbolic dynamical system generated by the Thue–Morse sequence is minimal (see, for instance, [a6] or Dynamical system).

The Thue–Morse sequence is a typical example of a $k$-automatic sequence. Actually, like every fixed point of a substitution of constant length, it can be generated by a finite machine, called a finite automaton (cf. Automaton, finite), as follows. A $k$-automaton is given by a finite set of states $S$, one state being called the initial state, by $k$ mappings from $S$ into itself (denoted by $0,\dots,k-1$) and by an output mapping $\phi$ from $S$ into a given set $Y$. Such an automaton generates a sequence with values in $Y$ as follows: Feed the automaton with the digits of the base-$k$ expansion of $n$, starting with the initial state; then define $u_n$ as the image under $\phi$ of the reached state. In the Thue–Morse case, the automaton has two states, say $\{A,B\}$, the mapping $0$ maps each state to itself whereas the mapping $1$ exchanges both states $A \leftrightarrow B$, the output mapping is the identity mapping and the state $A$ is the initial state.

Automatic sequences have many nice characterizations (see, for instance, [a8]). Automatic sequences are exactly the letter-to-letter images of fixed points of constant-length substitutions. Furthermore, this is equivalent to the fact that the following subset of subsequences (called the $k$-kernel) $$\left\lbrace{ \left({ u_{k^ t n+r} }\right)_n : t \ge 0\,,\ 0 \le r \le t^k-1 }\right\rbrace$$ is finite or, in the case $k$ is a prime power, to the fact that the series $\sum u_n Z^n$ is algebraic over $\mathbf{F}_k(Z)$. Note that, on the other hand, the real number that has, as dyadic expansion, the Thue–Morse sequence is transcendental. For more references and connections with physics, see [a3].

Define the Rudin–Shapiro sequence $v = (v_n)$ that counts modulo $2$ the number of $11$s (possibly with overlap) in the base-$2$ expansion of $n$. The sequence $v$ is easily seen to have a finite $2$-kernel and hence to be $2$-automatic. This sequence was introduced independently by W. Rudin and H.S. Shapiro (see the references in [a6]) in order to minimize uniformly $\left|{ \sum_{n=0}^{N-1} a_n e^{int} }\right|$, for a sequence $a_n$ defined over $\{ \pm 1 \}$. The Rudin–Shapiro sequence achieves $$\sup_{t} \left|{ \sum_{n=0}^{N-1} v_n e^{int} }\right| \le (2+\sqrt 2)\sqrt{N} \ .$$

The dynamical system generated by each of the two sequences $u$ and $v$ is strictly ergodic (cf. also Ergodic theory), since both underlying substitutions are primitive (see, for instance, [a6]). But, although very similar in their definition, these two sequences have very distinct spectral properties. The Morse system has a singular simple spectrum, whereas the dynamical system generated by the Rudin–Shapiro sequence provides an example of a system with finite spectral multiplicity and a Lebesgue component in the spectrum. For more references on the ergodic, spectral and harmonic properties of substitutive sequences, see [a6].

If (almost) everything is known concerning the Thue–Morse and the Rudin–Shapiro sequences, then the situation is completely different for the fascinating Kolakoski sequence. The Kolakoski sequence is the self-determined sequence defined over the alphabet $\{1, 2\}$ as follows. The sequence begins with $2$, and the sequence of lengths of the consecutive strings of $2$s and $1$s is the sequence itself. Hence this sequence is equal to $22112122122112\ldots$ For a survey of related properties and conjectures, see [a2].

How to Cite This Entry:
Thue-Morse sequence. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Thue-Morse_sequence&oldid=43402
This article was adapted from an original article by V. BerthÃ© (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article