# Lyndon word

A word is a sequence of letters $(a_1,\ldots,a_n)$, that is, elements chosen from a set $A$ called an alphabet. A word is usually written as $a_1\ldots a_n$, or abbreviated by a single symbol: $u = a_1\ldots a_n$. The length of $u$ is equal to the number of letters in $u$, i.e. $|u| = n$. One may concatenate two words $u = a_1\ldots a_n$, $v = b_1\ldots b_m$, and this operation is concisely written as $uv = a_1\ldots a_n b_1\ldots b_m$ . Words may be ordered lexicographically (cf. Lexicographic order); more precisely, given a total order $<$ on $A$, one may extend it over the set of words by setting $a_1\ldots a_n < b_1\ldots b_m$ if either $n < m$ and $a_1 = b_1, \ldots, a_m = b_m$ or there exists an $i$ such that $1 \le i \le \min(m,n)$ with $a_1 = b_1, \ldots, a_{i-1} = b_{i-1}$ and $a_i < b_i$. This word order is sometimes referred to as alphabetical order. For more details, see [a1].

Lyndon words are words strictly less than any of their non-empty proper right factors (or strict suffixes), that is, $a_1\ldots a_n$ is a Lyndon word if it is strictly less than $a_i\ldots a_n$ for $2 \le i \le n$ with respect to the lexicographic order. These were introduced by R.C. Lyndon [a2] for constructing bases of the lower central series for free groups (cf. Central series of a group). From the definition one can see that single letters are Lyndon words. Equivalently, $a_1\ldots a_n$ is a Lyndon word if it is strictly less than any of the circular shifts $a_i\ldots a_n a_1\ldots a_{i-1}$ ($2 \le i \le n$). One important characterization of these words yields a recursive process to generate them: a word $w$ is a Lyndon word if and only if there exists two Lyndon words $u$,$v$ such that $u < v$ and $w = uv$. J.P. Duval [a3] gave a beautiful, and clever, algorithm that generates Lyndon words of bounded length over a finite alphabet. The number of Lyndon words of length $n$ over a finite alphabet of cardinality $p$ may be explicitly computed, using Witt's formula: $$ \frac{1}{n} \sum_{d | n} \mu(d) p^{n/d} $$ where $\mu(n)$ is the Möbius function.

A multi-homogeneous version of Witt's formula may be found in [a4]. The central result about Lyndon words is the following Chen–Fox–Lyndon theorem: Any word can be expressed as a unique non-increasing product of Lyndon words. The preceding characterization suggests an algorithm to compute this factorization. Duval [a5] gave a simple and efficient algorithm that factorizes a word in linear time.

The applications of Lyndon words, in algebra and combinatorics, seem innumerable. For instance, there are as many Lyndon words over an alphabet of cardinality $p$ ($p$ a prime) as there are irreducible polynomials over a finite field of characteristic $p$; see [a4]. The factorization theorem, when applied to permutations (considered as words without repetitions), yields a variant of the so-called Foata transform [a1], Chap. 10.

Given any Lyndon word $w$, one may compute the factorization of $w$ as a product $uv$, where $v$ is a Lyndon word of maximal length (less than the length of $w$). The strict suffix (or proper right factor) $v$ can also be characterized as the smallest strict suffix of $w$. Then $u$ is also a Lyndon word and $u < uv < v$. This factorization of $w$ is called its right standard factorization. Similarly, one may define its left standard factorization.

Let $\lambda$ be a mapping from the set of Lyndon words into the free associative algebra on $A$ by setting $\lambda(a) = a$, and $\lambda(w) = \lambda(u)\lambda(v) - \lambda(v)\lambda(u)$, where $w = uv$ is its right standard factorization. The polynomial thus associated to $w$ is a Lie polynomial, and the set $\{ \lambda(w) : w \,\text{a Lyndon word of}\,A \}$ forms a linear basis of the free Lie algebra over $A$. Note that the iterated right standard factorizations recursively define a bracketing of $w$ (see Binary tree), and thus define a Hall set. The corresponding basis is called the Chen–Fox–Lyndon basis, or simply the Lyndon basis. It is practically identical with the Shirshov basis.

A good introduction to Lyndon words is [a1], Chap. 5. For applications of Lyndon words related to free Lie algebras, see [a4].

See also Hall word; Hall set; Lazard set.

#### References

[a1] | "Combinatorics on words" M. Lothaire (ed.) , Addison-Wesley (1983) |

[a2] | K.T. Chen, R.H. Fox, R.C. Lyndon, "Free differential calculus, IV. The quotient groups of the lower central series" Ann. of Math. , 68 (1958) pp. 81–95 |

[a3] | J.P. Duval, "Génération d'une section des classes de conjugaison et arbre de mots de Lyndon de longueur bornée" Theor. Comput. Sci. , 60 (1988) pp. 255–283 |

[a4] | C. Reutenauer, "Free Lie algebras" , London Math. Soc. Monographs New Ser. , 7 , Oxford Univ. Press (1993) |

[a5] | J.P. Duval, "Factorizing words over an ordered alphabet" J. Algorithms , 4 (1983) pp. 363–381 |

[a6] | G. Melançon, C. Reutenauer, "Lyndon words, free algebras and shuffles" Canad. J. Math. , XLI : 4 (1989) pp. 577–591 |

**How to Cite This Entry:**

Lyndon word.

*Encyclopedia of Mathematics.*URL: http://encyclopediaofmath.org/index.php?title=Lyndon_word&oldid=37485