Post correspondence problem
This is one of the three basic undecidable problems, the other two being the halting problem and Hilbert's tenth problem. To establish the algorithmic unsolvability of a specific problem, a reduction to the Post correspondence problem, [a3], is in many cases the most natural procedure.
Proofs of undecidability are either direct or indirect. Indirect proofs require some points of reference: problems already known to be undecidable. In regard to language theory, the most useful among such points of reference is the Post correspondence problem. For instance, when dealing with context-free languages, it is rather difficult to use the halting problem as a point of reference, whereas the Post correspondence problem is very suitable.
The Post correspondence problem can be described as follows. One has to decide for two lists, both consisting of words, whether or not some catenation of words in one list equals the corresponding catenation of words in the other list. Here "corresponding" means that words in exactly the same positions in the two lists must be used. For example, if the two lists are
![]() | (a1) |
and one catenates the first, second, first, and third words in the lists (in this order), then the word results in both cases. This is expressed by saying that "the sequence of indices 1, 2, 1, 3 is a solution for an instance of the Post correspondence problemsolution for instance of the Post correspondence problem" . On the other hand, the instance consisting of the two lists
![]() | (a2) |
possesses no solution. This can be shown by the following argument. A possible solution of (a2) must begin with the index 1. Then the index 2 must follow, because the word coming from the second list must "catch up" the missing . So far the words
and
are obtained. Since one of the words must always be a prefix of the other, one concludes that the next index in the solution must be 2, yielding the words
and
. But because neither of the words in the first list begins with
, one concludes that no further continuation is possible.
The notion of a morphism provides a convenient way of giving a formal definition for the Post correspondence problem. Let and
be two morphisms with domain alphabet
and range alphabet
. The pair
is called an instance of the Post correspondence problem. A non-empty word
over
such that
is referred to as a solution of this instance.
For some instances of the Post correspondence problem it is rather easy to find out whether or not they have solutions. Above it was observed that (a1) has a solution, whereas (a2) has no solution. There are also some classes of instances for which an easy decision procedure exists. One such class consists of all instances for which consists of a single letter. Then the two lists have the form
![]() | (a3) |
It is easy to verify that (a3) has a solution if and only if either there is a such that
, or else there are
and
such that
and
.
By the decidability of the Post correspondence problem is meant the existence of an algorithm that settles every instance. To show the undecidability of the Post correspondence problem, one reduces it to the halting problem, or, equivalently, to the membership problem of recursively enumerable languages. (See Formal languages and automata.)
Assume that the Post correspondence problem is decidable and consider an arbitrary grammar with non-terminal and terminal alphabets
and
, respectively, start letter
, and set
of productions. Further, consider an arbitrary word
over
. Below an instance
of the Post correspondence problem is constructed such that
has a solution if and only if
is in
. Since the construction of
will be effective, this gives a method of deciding whether or not
is in
.
The letters of are denoted by
, where
is the start letter. Let
![]() |
be the set of productions of . It may be assumed that none of the words
and
is empty. Let
![]() |
![]() |
(It is assumed that ,
,
, and
are letters not contained in the other alphabets considered. Intuitively,
marks the beginning of a derivation and
the end of it, whereas
and
are markers between two consecutive words in a derivation.) For any word
over
, let
denote the word over
obtained from
by replacing every letter
with
.
Consider the instance of the Post correspondence problem. The domain alphabet of the morphisms
and
is
![]() |
and the range alphabet is the alphabet defined above. The morphisms themselves are defined by the following table, where
ranges over the numbers
and
ranges over the numbers
.'
<tbody> </tbody>
|
Now possesses a solution if and only if
. Hence, if the Post correspondence problem is decidable, then so is the membership problem for recursively enumerable languages.
The idea behind the construction of is that derivations according to
will be simulated. Moreover,
produces the derivation faster than
because it starts with
, whereas
starts only with
. The morphism
can catch up at the end of the derivation (by the index 4) if and only if
is the last word of the derivation.
For example, assume that the first three productions of are (in this order)
![]() |
Assume, further, that and
are the second and third letter of
, respectively. (Recall that
.) Consider the word
. Clearly,
has the following derivation:
![]() | (a4) |
Then a solution for is constructed according to the table below.'
<tbody> </tbody>
|
Hence the word
![]() |
![]() |
(where parentheses have been added for clarity) is a solution for , the common value of the morphisms
and
being for this word:
![]() | (a5) |
Clearly, (a5) results from (a4) after the following simple modifications. The boundary markers and
are added to the beginning and end; the sign
is replaced by
; moreover, every second step is primed, as is every second marker
; finally, an "idle" step (from
to
) is added in order to reach the non-primed word
corresponding to the final index of (a4).
It is easily seen that an analogous argument is valid in general: whenever is in
, then
has a solution (see [a4], where this fact is established by a detailed inductive argument). It is somewhat more difficult to establish the converse — that is, to show that if
has a solution, then
is in
, see [a4].
Using a simple encoding it can be shown that the Post correspondence problem remains undecidable if the range alphabet of the morphisms
and
consists of two letters only. The decidability in case the domain alphabet
consists of two letters was established in [a1]. Another non-trivial decidable subclass, [a2], consists of instances
where one of the morphisms
and
is periodic, that is, all of its values are powers of a single word. The Post correspondence problem remains undecidable if
consists of 9 letters. The decidability status is open if the cardinality of
is a fixed number between 3 and 8 (inclusive).
The Post correspondence problem is closely linked with equality sets, which are important in the representation theory of formal languages. By definition, the equality set of and
consists of all words
such that
. Clearly, an instance
possesses a solution if and only if the equality set contains a non-empty word.
References
[a1] | A. Ehrenfeucht, J. Karhumäki, G. Rozenberg, "The (generalized) Post corresponding problem with lists consisting of two words is decidable" Theor. Comp. Sc. , 21 (1982) pp. 119–144 |
[a2] | J. Karhumäki, I. Simon, "A note on elementary homomorphisms and the regularity of equality sets" EATCS Bull. , 9 (1979) pp. 16–24 |
[a3] | E. Post, "A variant of a recursively unsolvable problem" Bull. Amer. Math. Soc. , 52 (1946) pp. 264–268 |
[a4] | A. Salomaa, "Formal languages" , Acad. Press (1987) |
Post correspondence problem. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Post_correspondence_problem&oldid=37351