# Formal language

*in mathematical linguistics*

An arbitrary set of chains (that is, words, cf. Word) over some (finite or infinite) alphabet $ V $( sometimes also called a dictionary), that is, a set of expressions of the form $ \omega = a _ {1} \dots a _ {k} $, where $ a _ {1} \dots a _ {k} \in V $; the number $ k $, usually denoted by $ | \omega | $, is the length of the chain $ \omega $. One also considers the empty chain, denoted by $ \Lambda $; one sets $ | \Lambda | = 0 $. One often speaks of a language over the alphabet $ V $, omitting the word "formal" . In mathematical linguistics and the theory of automata (cf. Automata, theory of) one considers various effective ways of specifying a formal language, principally by means of formal grammars (cf. Grammar, formal) and automata of various types, which can, in the majority of cases, be described as modifications of non-deterministic Turing machines (cf. Turing machine), often multi-tape, with some restrictions on the ways the machine works on each tape.

## Operations on formal languages.

In addition to the usual set-theoretic operations on formal languages, one carries out: multiplication (or direct multiplication, or concatenation):

$$ L _ {1} L _ {2} = \ \{ {xy } : {x \in L _ {1} , y \in L _ {2} } \} ; $$

left division:

$$ L _ {2} \setminus L _ {1} = \ \{ {x } : {\exists y, z ( y \in L _ {1} \& z \in L _ {2} \& y = zx) } \} ; $$

right division $ L _ {1} /L _ {2} $ is defined by analogy with left division; iteration:

$$ L ^ {*} = L ^ {0} \cup L ^ {1} \cup \dots , $$

where $ L ^ {0} $ denotes $ \{ \Lambda \} $ and $ L ^ {n + 1 } = L ^ {n} L $( in particular, the set of all chains in $ V $ is $ V ^ {*} $); truncated iteration:

$$ L ^ {+} = L ^ {1} \cup L ^ {2} \cup \dots ; $$

and substitution: If $ L $ is a language over the finite alphabet $ \{ a _ {1} \dots a _ {n} \} $ and $ L _ {1} \dots L _ {n} $ are arbitrary languages, then

$$ S ( L; a _ {1} \dots a _ {n} \mid L _ {1} \dots L _ {n} ) = $$

$$ = \ \{ x _ {i _ {1} } \dots x _ {i _ {k} } : \ a _ {i _ {1} } \dots a _ {i _ {k} } \in L \& x _ {i _ {1} } \in L _ {i _ {1} } \& \dots \& x _ {i _ {k} } \in L _ {i _ {k} } \} ; $$

if each of the languages $ L _ {i} $, $ i = 1 \dots n $, consists of one chain $ z _ {i} $, a substitution is called a homomorphism; if all the $ z _ {i} $ are non-empty, one speaks of an unabbreviated homomorphism. If the language $ \{ x \} $ consists of one chain $ x $, then instead of $ \{ x \} L $, $ \{ x \} \setminus L $, etc., one writes $ xL $, $ x \setminus L $, etc., respectively.

A variety of languages is an ordered pair $ ( {\mathcal E} , {\mathcal L} ) $( or $ {\mathcal L} $, if $ {\mathcal E} $ is taken for granted), where $ {\mathcal E} $ is an infinite alphabet and $ {\mathcal L} $ is a class of languages such that: 1) for every $ L \in {\mathcal L} $ there is a finite alphabet $ \Sigma \subset {\mathcal E} $ such that $ L \subset \Sigma ^ {*} $; 2) $ L \neq \emptyset $ for some $ L \in {\mathcal L} $; and 3) $ {\mathcal L} $ is closed relative to union, multiplication, intersection with regular sets, truncated iteration, unabbreviated homomorphisms, and inverses of arbitrary homomorphisms. A variety that is closed relative to arbitrary homomorphisms is called complete.

#### References

[1] | A.V. Gladkii, "Formal grammars and languages" , Moscow (1973) (In Russian) |

[2] | S. Ginsburg, S. Greibach, Y. Hopcroft, "Studies in abstract families of languages" Mem. Amer. Math. Soc. , 87 (1969) pp. 1–32 |

#### Comments

For more details see Formal languages and automata.

**How to Cite This Entry:**

Formal language.

*Encyclopedia of Mathematics.*URL: http://encyclopediaofmath.org/index.php?title=Formal_language&oldid=46955