Grammar, generative

Chomsky grammar

A type of formal grammar (cf. Grammar, formal); actually it is a special case of a Post calculus (see Post canonical system). A systematic study of this grammar was begun in the 1950s by N. Chomsky, who pointed out its applications to linguistics and isolated the classes of generative grammars which are most important to applications — context-sensitive grammars, context-free grammars, regular grammars (cf. Grammar, context-sensitive; Grammar, context-free; Grammar, regular). These classes proved to be especially interesting from the mathematical point of view.

A generative grammar is an ordered quadruple $\Gamma = \langle V, W, I, R \rangle$, where $V$ and $W$ are disjoint finite sets, known, respectively, as the terminal and non-terminal alphabets, or dictionaries (their elements are called, respectively, terminal, or basic, and non-terminal, or auxiliary, symbols), $I$ is an element of $W$ called the initial symbol, and $R$ is a finite set of rules of the form $\phi \rightarrow \psi$, where $\phi$ and $\psi$ are strings (words, cf. Word) over the alphabet $V \cup W$ and $\rightarrow$ is not a symbol of $V \cup W$; $R$ is called the scheme of the grammar. If two strings $\xi$ and $\eta$ can be represented as $\xi = \chi_1 \phi \chi_2$ and $\eta = \chi_1 \psi \chi_2$, where $\phi \rightarrow \psi$ is one of the rules of $\Gamma$, then one says that $\eta$ is directly derivable from $\xi$ in $\Gamma$ (denoted by $\xi \Rightarrow_\Gamma \eta$ or $\xi \Rightarrow \eta$). A sequence of strings $(\omega_0, \dots, \omega_n)$ is said to be a derivation of $\omega_n$ from $\omega_0$ in $\Gamma$ if $\omega_{i-1} \Rightarrow_\Gamma \omega_i$ for all $i$, $1 \leq i \leq n$; the number $n$ is the length of the derivation. A derivation is said to be complete if $\omega_0 = I$ and $\omega_n$ contains no non-terminal symbols. If there exists a derivation of a string $\eta$ from a string $\xi$ in $\Gamma$, then $\eta$ is said to be derivable from $\xi$ in $\Gamma$. The set of strings over the terminal alphabet that are derivable in $\Gamma$ from $I$ is said to be the language generated by the grammar $\Gamma$, denoted by $L(\Gamma)$. Two generative grammars are equivalent if they generate the same language. The class of languages generated by all possible generative grammars coincides with the class of recursively-enumerable sets of strings.
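As a concrete illustration (a minimal sketch, not part of the article itself), consider the grammar with $V = \{a, b\}$, $W = \{S\}$, $I = S$ and rules $S \rightarrow aSb$, $S \rightarrow ab$, which generates the language $\{a^n b^n : n \geq 1\}$. The definitions above can be simulated directly, enumerating $L(\Gamma)$ up to a length bound by breadth-first search over derivations:

```python
from collections import deque

# A tiny generative grammar Gamma = (V, W, I, R): terminal alphabet
# V = {a, b}, non-terminal alphabet W = {S}, initial symbol I = S, and
# rules R = {S -> aSb, S -> ab}.  It generates {a^n b^n : n >= 1}.
V = {"a", "b"}
W = {"S"}
I = "S"
R = [("S", "aSb"), ("S", "ab")]

def direct_derivations(xi, rules):
    """All strings eta directly derivable from xi: eta = chi1 psi chi2
    whenever xi = chi1 phi chi2 for some rule phi -> psi."""
    for phi, psi in rules:
        start = 0
        while (pos := xi.find(phi, start)) != -1:
            yield xi[:pos] + psi + xi[pos + len(phi):]
            start = pos + 1

def language(rules, initial, max_len):
    """Enumerate the strings of L(Gamma) of length <= max_len by
    breadth-first search over derivations from the initial symbol."""
    seen, result = {initial}, set()
    queue = deque([initial])
    while queue:
        xi = queue.popleft()
        for eta in direct_derivations(xi, rules):
            if eta in seen or len(eta) > max_len:
                continue
            seen.add(eta)
            if all(c in V for c in eta):  # a complete derivation ends here
                result.add(eta)
            else:
                queue.append(eta)
    return result

print(sorted(language(R, I, 8), key=len))  # ['ab', 'aabb', 'aaabbb', 'aaaabbbb']
```

Here `direct_derivations` implements the relation $\xi \Rightarrow_\Gamma \eta$, and a string enters the result exactly when some complete derivation reaches it.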

The so-called complexity functions, the most important being time complexity and space complexity, are used to estimate the complexity of derivations in generative grammars. The time complexity of a grammar $\Gamma$ is the function $\tau_\Gamma(n)$ of a natural argument whose value at each $n$ is the smallest number $k$ with the following property: for any string $x \in L(\Gamma)$ with $|x| \leq n$ (where $|x|$ is the length of $x$) there exists a derivation $D$ of this string from the initial symbol of $\Gamma$ of length at most $k$; if there is no string $x$ with $x \in L(\Gamma)$ and $|x| \leq n$, then $\tau_\Gamma(n) = 0$. The space complexity $\sigma_\Gamma(n)$ of the grammar $\Gamma$ is defined analogously, with the length of a derivation $D = (\omega_0, \dots, \omega_s)$ replaced by the greatest length of the strings $\omega_i$, $i = 0, \dots, s$.
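For the grammar with rules $S \rightarrow aSb \mid ab$ (a hypothetical running example, not from the article), a shortest derivation of $a^k b^k$ has length $k$, so $\tau_\Gamma(n) = \lfloor n/2 \rfloor$. This can be checked by breadth-first search, which visits derivations level by level and hence finds minimal derivation lengths:

```python
from collections import deque

# Computing the time complexity tau_Gamma(n) of the example grammar
# S -> aSb | ab by breadth-first search over its derivations.
RULES = [("S", "aSb"), ("S", "ab")]
TERMINALS = set("ab")
INITIAL = "S"

def shortest_derivations(max_len):
    """Map each terminal string of length <= max_len to the length of a
    shortest derivation producing it.  BFS explores derivations level by
    level, so a string's first visit gives a minimal derivation.  (For this
    grammar intermediate strings never exceed the final string's length,
    so pruning at max_len is safe.)"""
    dist = {INITIAL: 0}
    queue = deque([INITIAL])
    while queue:
        xi = queue.popleft()
        for phi, psi in RULES:
            pos = xi.find(phi)
            while pos != -1:
                eta = xi[:pos] + psi + xi[pos + len(phi):]
                if eta not in dist and len(eta) <= max_len:
                    dist[eta] = dist[xi] + 1
                    queue.append(eta)
                pos = xi.find(phi, pos + 1)
    return {x: d for x, d in dist.items() if all(c in TERMINALS for c in x)}

def tau(n):
    """tau_Gamma(n): the smallest k such that every x in L(Gamma) with
    |x| <= n has a derivation of length at most k (0 if no such x)."""
    return max(shortest_derivations(n).values(), default=0)

print([tau(n) for n in range(1, 9)])  # [0, 1, 1, 2, 2, 3, 3, 4]
```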

Let $M'$ be some set of complete derivations in the grammar $\Gamma$ and let $L'$ be the set of terminal strings of the derivations belonging to $M'$; then $L' \subseteq L(\Gamma)$. If $M'$ is effectively given, one says that a certain way of controlling derivations in $\Gamma$ has been specified. The study of ways of controlling derivations is important in applications, since the possibility of using certain definite derivations rather than arbitrary ones corresponds to the situation that occurs in natural language. Control of a derivation can be realized, in particular, by imposing restrictions on the sequence of rules used in the derivation (e.g. the set of "admissible" sequences of rules may itself be generated by some generative grammar $\Gamma'$; in this case the language is defined by an ordered pair of generative grammars $(\Gamma, \Gamma')$, called a generalized grammar), by imposing restrictions on the strings constituting the derivation, or by some more complicated method (e.g. the rule applied at a given step may depend on the form of the string obtained at the preceding step).
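A hedged sketch of the first kind of control, using a made-up example grammar: used freely, the rules below generate $\{a^m b^n : m, n \geq 1\}$, but restricting derivations to the admissible rule sequences $r_0 (r_1 r_2)^k r_3 r_4$ forces equal counts and yields $\{a^n b^n : n \geq 1\}$:

```python
# Controlling a derivation by restricting the admissible sequences of rules.
# Without control the grammar generates {a^m b^n : m, n >= 1}; with the
# control set r0 (r1 r2)^k r3 r4 it generates {a^n b^n : n >= 1}.
RULES = {
    "r0": ("S", "AB"),
    "r1": ("A", "aA"),
    "r2": ("B", "bB"),
    "r3": ("A", "a"),
    "r4": ("B", "b"),
}

def apply_sequence(seq, initial="S"):
    """Apply the rules named in seq, each to the leftmost occurrence of its
    left-hand side; return None if some rule is inapplicable."""
    s = initial
    for name in seq:
        phi, psi = RULES[name]
        pos = s.find(phi)
        if pos == -1:
            return None
        s = s[:pos] + psi + s[pos + len(phi):]
    return s

def controlled_language(max_k):
    """Terminal strings produced by the admissible sequences
    r0 (r1 r2)^k r3 r4 for k = 0, ..., max_k."""
    return {apply_sequence(["r0"] + ["r1", "r2"] * k + ["r3", "r4"])
            for k in range(max_k + 1)}

print(sorted(controlled_language(3), key=len))  # ['ab', 'aabb', 'aaabbb', 'aaaabbbb']
```

Here the set of admissible sequences is itself a (regular) language, which matches the remark above that the control set may be generated by a second grammar $\Gamma'$.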

Studies of generative grammars necessarily involve algorithmic problems. If $\alpha$ is a property of languages, ${\mathcal T}$ is a decidable class of grammars, and there exists an algorithm which decides, for any grammar $\Gamma \in {\mathcal T}$, whether or not the language $L(\Gamma)$ has the property $\alpha$, then one says that $\alpha$ is decidable in the class ${\mathcal T}$. In the class of all generative grammars no non-trivial property (i.e. one such that the corresponding class of languages contains both languages which have the property and languages which do not) is decidable. One may speak, in a similar manner, of relations which are decidable in a certain class of grammars. Problems of a different type also arise, e.g. the existence, for a given grammar $\Gamma$, of an algorithm by which it is possible to find, for any $n$ strings $x_1, \dots, x_n$ over its terminal alphabet, the value of a given predicate $P(L, \omega_1, \dots, \omega_n)$ for $L = L(\Gamma)$, $\omega_1 = x_1, \dots, \omega_n = x_n$. In particular, if $P(L, \omega)$ denotes $\omega \in L$, then one is asking for an algorithm which recognizes whether an arbitrary string belongs to the language $L(\Gamma)$. If such an algorithm in fact exists for the grammar $\Gamma$, then the problem of its complexity, in other words, of the recognition complexity of the language $L(\Gamma)$, is important.
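For general grammars no recognition algorithm need exist, but for non-contracting grammars (every rule $\phi \rightarrow \psi$ has $|\psi| \geq |\phi|$) membership is decidable: no derivation of $x$ ever passes through a string longer than $|x|$, so a bounded search suffices. A sketch of this decision procedure, with $S \rightarrow aSb \mid ab$ again as a running example:

```python
from collections import deque

# Deciding x in L(Gamma) for a non-contracting grammar: since rules never
# shorten a string, a derivation of x only visits strings of length <= |x|,
# and there are finitely many of those over a finite alphabet.
def member(x, rules, initial="S"):
    """Decide membership of x by BFS over all strings of length <= len(x)
    reachable from the initial symbol."""
    assert all(len(psi) >= len(phi) for phi, psi in rules)  # non-contracting
    seen = {initial}
    queue = deque([initial])
    while queue:
        xi = queue.popleft()
        if xi == x:
            return True
        for phi, psi in rules:
            pos = xi.find(phi)
            while pos != -1:
                eta = xi[:pos] + psi + xi[pos + len(phi):]
                if eta not in seen and len(eta) <= len(x):
                    seen.add(eta)
                    queue.append(eta)
                pos = xi.find(phi, pos + 1)
    return False

rules = [("S", "aSb"), ("S", "ab")]
print(member("aaabbb", rules), member("aabbb", rules))  # True False
```

The search is exponential in $|x|$ in general; the recognition complexity mentioned above asks how much better one can do for particular classes of grammars.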
