# Formal language

in mathematical linguistics

An arbitrary set of chains (that is, words, cf. Word) over some (finite or infinite) alphabet $V$( sometimes also called a dictionary), that is, a set of expressions of the form $\omega = a _ {1} \dots a _ {k}$, where $a _ {1} \dots a _ {k} \in V$; the number $k$, usually denoted by $| \omega |$, is the length of the chain $\omega$. One also considers the empty chain, denoted by $\Lambda$; one sets $| \Lambda | = 0$. One often speaks of a language over the alphabet $V$, omitting the word "formal" . In mathematical linguistics and the theory of automata (cf. Automata, theory of) one considers various effective ways of specifying a formal language, principally by means of formal grammars (cf. Grammar, formal) and automata of various types, which can, in the majority of cases, be described as modifications of non-deterministic Turing machines (cf. Turing machine), often multi-tape, with some restrictions on the ways the machine works on each tape.

## Operations on formal languages.

In addition to the usual set-theoretic operations on formal languages, one carries out: multiplication (or direct multiplication, or concatenation):

$$L _ {1} L _ {2} = \ \{ {xy } : {x \in L _ {1} , y \in L _ {2} } \} ;$$

left division:

$$L _ {2} \setminus L _ {1} = \ \{ {x } : {\exists y, z ( y \in L _ {1} \& z \in L _ {2} \& y = zx) } \} ;$$

right division $L _ {1} /L _ {2}$ is defined by analogy with left division; iteration:

$$L ^ {*} = L ^ {0} \cup L ^ {1} \cup \dots ,$$

where $L ^ {0}$ denotes $\{ \Lambda \}$ and $L ^ {n + 1 } = L ^ {n} L$( in particular, the set of all chains in $V$ is $V ^ {*}$); truncated iteration:

$$L ^ {+} = L ^ {1} \cup L ^ {2} \cup \dots ;$$

and substitution: If $L$ is a language over the finite alphabet $\{ a _ {1} \dots a _ {n} \}$ and $L _ {1} \dots L _ {n}$ are arbitrary languages, then

$$S ( L; a _ {1} \dots a _ {n} \mid L _ {1} \dots L _ {n} ) =$$

$$= \ \{ x _ {i _ {1} } \dots x _ {i _ {k} } : \ a _ {i _ {1} } \dots a _ {i _ {k} } \in L \& x _ {i _ {1} } \in L _ {i _ {1} } \& \dots \& x _ {i _ {k} } \in L _ {i _ {k} } \} ;$$

if each of the languages $L _ {i}$, $i = 1 \dots n$, consists of one chain $z _ {i}$, a substitution is called a homomorphism; if all the $z _ {i}$ are non-empty, one speaks of an unabbreviated homomorphism. If the language $\{ x \}$ consists of one chain $x$, then instead of $\{ x \} L$, $\{ x \} \setminus L$, etc., one writes $xL$, $x \setminus L$, etc., respectively.

A variety of languages is an ordered pair $( {\mathcal E} , {\mathcal L} )$( or ${\mathcal L}$, if ${\mathcal E}$ is taken for granted), where ${\mathcal E}$ is an infinite alphabet and ${\mathcal L}$ is a class of languages such that: 1) for every $L \in {\mathcal L}$ there is a finite alphabet $\Sigma \subset {\mathcal E}$ such that $L \subset \Sigma ^ {*}$; 2) $L \neq \emptyset$ for some $L \in {\mathcal L}$; and 3) ${\mathcal L}$ is closed relative to union, multiplication, intersection with regular sets, truncated iteration, unabbreviated homomorphisms, and inverses of arbitrary homomorphisms. A variety that is closed relative to arbitrary homomorphisms is called complete.

#### References

 [1] A.V. Gladkii, "Formal grammars and languages" , Moscow (1973) (In Russian) [2] S. Ginsburg, S. Greibach, Y. Hopcroft, "Studies in abstract families of languages" Mem. Amer. Math. Soc. , 87 (1969) pp. 1–32