Post correspondence problem
This is one of the three basic undecidable problems, the other two being the halting problem and Hilbert's tenth problem. To establish the algorithmic unsolvability of a specific problem, a reduction to the Post correspondence problem, [a3], is in many cases the most natural procedure.
Proofs of undecidability are either direct or indirect. Indirect proofs require some points of reference: problems already known to be undecidable. In regard to language theory, the most useful among such points of reference is the Post correspondence problem. For instance, when dealing with context-free languages, it is rather difficult to use the halting problem as a point of reference, whereas the Post correspondence problem is very suitable.
The Post correspondence problem can be described as follows. One has to decide for two lists, both consisting of words, whether or not some catenation of words in one list equals the corresponding catenation of words in the other list. Here "corresponding" means that words in exactly the same positions in the two lists must be used. For example, if the two lists are
(a1) |
and one catenates the first, second, first, and third words in the lists (in this order), then the word results in both cases. This is expressed by saying that "the sequence of indices 1, 2, 1, 3 is a solution for an instance of the Post correspondence problemsolution for instance of the Post correspondence problem" . On the other hand, the instance consisting of the two lists
(a2) |
possesses no solution. This can be shown by the following argument. A possible solution of (a2) must begin with the index 1. Then the index 2 must follow, because the word coming from the second list must "catch up" the missing . So far the words and are obtained. Since one of the words must always be a prefix of the other, one concludes that the next index in the solution must be 2, yielding the words and . But because neither of the words in the first list begins with , one concludes that no further continuation is possible.
The notion of a morphism provides a convenient way of giving a formal definition for the Post correspondence problem. Let and be two morphisms with domain alphabet and range alphabet . The pair is called an instance of the Post correspondence problem. A non-empty word over such that is referred to as a solution of this instance.
For some instances of the Post correspondence problem it is rather easy to find out whether or not they have solutions. Above it was observed that (a1) has a solution, whereas (a2) has no solution. There are also some classes of instances for which an easy decision procedure exists. One such class consists of all instances for which consists of a single letter. Then the two lists have the form
(a3) |
It is easy to verify that (a3) has a solution if and only if either there is a such that , or else there are and such that and .
By the decidability of the Post correspondence problem is meant the existence of an algorithm that settles every instance. To show the undecidability of the Post correspondence problem, one reduces it to the halting problem, or, equivalently, to the membership problem of recursively enumerable languages. (See Formal languages and automata.)
Assume that the Post correspondence problem is decidable and consider an arbitrary grammar with non-terminal and terminal alphabets and , respectively, start letter , and set of productions. Further, consider an arbitrary word over . Below an instance of the Post correspondence problem is constructed such that has a solution if and only if is in . Since the construction of will be effective, this gives a method of deciding whether or not is in .
The letters of are denoted by , where is the start letter. Let
be the set of productions of . It may be assumed that none of the words and is empty. Let
(It is assumed that , , , and are letters not contained in the other alphabets considered. Intuitively, marks the beginning of a derivation and the end of it, whereas and are markers between two consecutive words in a derivation.) For any word over , let denote the word over obtained from by replacing every letter with .
Consider the instance of the Post correspondence problem. The domain alphabet of the morphisms and is
and the range alphabet is the alphabet defined above. The morphisms themselves are defined by the following table, where ranges over the numbers and ranges over the numbers .
<tbody> </tbody> |
Now possesses a solution if and only if . Hence, if the Post correspondence problem is decidable, then so is the membership problem for recursively enumerable languages.
The idea behind the construction of is that derivations according to will be simulated. Moreover, produces the derivation faster than because it starts with , whereas starts only with . The morphism can catch up at the end of the derivation (by the index 4) if and only if is the last word of the derivation.
For example, assume that the first three productions of are (in this order)
Assume, further, that and are the second and third letter of , respectively. (Recall that .) Consider the word . Clearly, has the following derivation:
(a4) |
Then a solution for is constructed according to the table below.
<tbody> </tbody> |
Hence the word
(where parentheses have been added for clarity) is a solution for , the common value of the morphisms and being for this word:
(a5) |
Clearly, (a5) results from (a4) after the following simple modifications. The boundary markers and are added to the beginning and end; the sign is replaced by ; moreover, every second step is primed, as is every second marker ; finally, an "idle" step (from to ) is added in order to reach the non-primed word corresponding to the final index of (a4).
It is easily seen that an analogous argument is valid in general: whenever is in , then has a solution (see [a4], where this fact is established by a detailed inductive argument). It is somewhat more difficult to establish the converse — that is, to show that if has a solution, then is in , see [a4].
Using a simple encoding it can be shown that the Post correspondence problem remains undecidable if the range alphabet of the morphisms and consists of two letters only. The decidability in case the domain alphabet consists of two letters was established in [a1]. Another non-trivial decidable subclass, [a2], consists of instances where one of the morphisms and is periodic, that is, all of its values are powers of a single word. The Post correspondence problem remains undecidable if consists of 9 letters. The decidability status is open if the cardinality of is a fixed number between 3 and 8 (inclusive).
The Post correspondence problem is closely linked with equality sets, which are important in the representation theory of formal languages. By definition, the equality set of and consists of all words such that . Clearly, an instance possesses a solution if and only if the equality set contains a non-empty word.
References
[a1] | A. Ehrenfeucht, J. Karhumäki, G. Rozenberg, "The (generalized) Post corresponding problem with lists consisting of two words is decidable" Theor. Comp. Sc. , 21 (1982) pp. 119–144 |
[a2] | J. Karhumäki, I. Simon, "A note on elementary homomorphisms and the regularity of equality sets" EATCS Bull. , 9 (1979) pp. 16–24 |
[a3] | E. Post, "A variant of a recursively unsolvable problem" Bull. Amer. Math. Soc. , 52 (1946) pp. 264–268 |
[a4] | A. Salomaa, "Formal languages" , Acad. Press (1987) |
Post correspondence problem. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Post_correspondence_problem&oldid=13231