# Post correspondence problem

This is one of the three basic undecidable problems, the other two being the halting problem and Hilbert's tenth problem. To establish the algorithmic unsolvability of a specific problem, a reduction to the Post correspondence problem, [a3], is in many cases the most natural procedure.

Proofs of undecidability are either direct or indirect. Indirect proofs require some points of reference: problems already known to be undecidable. In regard to language theory, the most useful among such points of reference is the Post correspondence problem. For instance, when dealing with context-free languages, it is rather difficult to use the halting problem as a point of reference, whereas the Post correspondence problem is very suitable.

The Post correspondence problem can be described as follows. One has to decide for two lists, both consisting of $n$ words, whether or not some catenation of words in one list equals the corresponding catenation of words in the other list. Here "corresponding" means that words in exactly the same positions in the two lists must be used. For example, if the two lists are

$$\tag{a1 } ( a ^ {2} , b ^ {2} , ab ^ {2} ) \ \textrm{ and } \ \ ( a ^ {2} b , ba , b )$$

and one catenates the first, second, first, and third words in the lists (in this order), then the word $a ^ {2} b ^ {2} a ^ {3} b ^ {2}$ results in both cases. This is expressed by saying that "the sequence of indices 1, 2, 1, 3 is a solution for an instance of the Post correspondence problem" . On the other hand, the instance consisting of the two lists

$$\tag{a2 } ( a ^ {2} b , a ) \ \textrm{ and } \ ( a ^ {2} , ba ^ {2} )$$

possesses no solution. This can be shown by the following argument. A possible solution of (a2) must begin with the index 1. Then the index 2 must follow, because the word coming from the second list must "catch up" the missing $b$. So far the words $a ^ {2} ba$ and $a ^ {2} ba ^ {2}$ are obtained. Since one of the words must always be a prefix of the other, one concludes that the next index in the solution must be 2, yielding the words $a ^ {2} ba ^ {2}$ and $a ^ {2} ba ^ {2} ba ^ {2}$. But because neither of the words in the first list begins with $b$, one concludes that no further continuation is possible.

The notion of a morphism provides a convenient way of giving a formal definition for the Post correspondence problem. Let $g$ and $h$ be two morphisms with domain alphabet $\Sigma$ and range alphabet $\Delta$. The pair $( g, h )$ is called an instance of the Post correspondence problem. A non-empty word $w$ over $\Sigma$ such that $g( w) = h( w)$ is referred to as a solution of this instance.

For some instances of the Post correspondence problem it is rather easy to find out whether or not they have solutions. Above it was observed that (a1) has a solution, whereas (a2) has no solution. There are also some classes of instances for which an easy decision procedure exists. One such class consists of all instances for which $\Delta$ consists of a single letter. Then the two lists have the form

$$\tag{a3 } ( a ^ {i _ {1} } \dots a ^ {i _ {n} } ) \ \textrm{ and } \ \ ( a ^ {j _ {1} } \dots a ^ {j _ {n} } ) .$$

It is easy to verify that (a3) has a solution if and only if either there is a $t$ such that $i _ {t} = j _ {t}$, or else there are $t$ and $u$ such that $i _ {t} > j _ {t}$ and $i _ {u} < j _ {u}$.

By the decidability of the Post correspondence problem is meant the existence of an algorithm that settles every instance. To show the undecidability of the Post correspondence problem, one reduces it to the halting problem, or, equivalently, to the membership problem of recursively enumerable languages. (See Formal languages and automata.)

Assume that the Post correspondence problem is decidable and consider an arbitrary grammar $G$ with non-terminal and terminal alphabets $\Sigma _ {N}$ and $\Sigma _ {T}$, respectively, start letter $S$, and set $P$ of productions. Further, consider an arbitrary word $w$ over $\Sigma _ {T}$. Below an instance $PCP$ of the Post correspondence problem is constructed such that $PCP$ has a solution if and only if $w$ is in $L( G)$. Since the construction of $PCP$ will be effective, this gives a method of deciding whether or not $w$ is in $L( G)$.

The letters of $\Sigma _ {N} \cup \Sigma _ {T} = \Sigma _ {1}$ are denoted by $a _ {1} \dots a _ {r}$, where $a _ {1}$ is the start letter. Let

$$\{ {\alpha _ {i} \rightarrow \beta _ {i} } : {1 \leq i \leq n } \}$$

be the set of productions of $G$. It may be assumed that none of the words $\alpha _ {i}$ and $\beta _ {i}$ is empty. Let

$$\Sigma _ {1} ^ \prime = \{ {a ^ \prime } : {a \in \Sigma _ {1} } \} ,$$

$$\Delta = \Sigma _ {1} \cup \Sigma _ {1} ^ \prime \cup \{ B, E, \# , \# ^ \prime \} .$$

(It is assumed that $B$, $E$, $\#$, and $\# ^ \prime$ are letters not contained in the other alphabets considered. Intuitively, $B$ marks the beginning of a derivation and $E$ the end of it, whereas $\#$ and $\# ^ \prime$ are markers between two consecutive words in a derivation.) For any word $x$ over $\Sigma _ {1}$, let $x ^ \prime$ denote the word over $\Sigma _ {1} ^ \prime$ obtained from $x$ by replacing every letter $a _ {i}$ with $a _ {i} ^ \prime$.

Consider the instance $PCP=( g, h)$ of the Post correspondence problem. The domain alphabet of the morphisms $g$ and $h$ is

$$\Sigma = \{ 1 \dots 2r+ 2n+ 4 \} ,$$

and the range alphabet is the alphabet $\Delta$ defined above. The morphisms themselves are defined by the following table, where $i$ ranges over the numbers $1 \dots r$ and $j$ ranges over the numbers $1 \dots n$.

<tbody> </tbody>
 $1$ $2$ $3$ $4$ $4+ i$ $4+ r+ i$ $4+ 2r+ j$ $4+ 2r+ n+ j$ $g$ $Ba _ {1} \#$ $\# ^ \prime$ $\#$ $E$ $a _ {i} ^ \prime$ $a _ {i}$ $\beta _ {j} ^ \prime$ $\beta _ {j}$ $h$ $B$ $\#$ $\# ^ \prime$ $\# ^ \prime wE$ $a _ {i}$ $a _ {i} ^ \prime$ $\alpha _ {j}$ $\alpha _ {j} ^ \prime$

Now $PCP$ possesses a solution if and only if $w \in L( G)$. Hence, if the Post correspondence problem is decidable, then so is the membership problem for recursively enumerable languages.

The idea behind the construction of $PCP$ is that derivations according to $G$ will be simulated. Moreover, $g$ produces the derivation faster than $h$ because it starts with $Ba _ {1} \#$, whereas $h$ starts only with $B$. The morphism $h$ can catch up at the end of the derivation (by the index 4) if and only if $w$ is the last word of the derivation.

For example, assume that the first three productions of $G$ are (in this order)

$$S \rightarrow aCD,\ \ CD \rightarrow D ,\ \ D \rightarrow d .$$

Assume, further, that $a$ and $D$ are the second and third letter of $\Sigma _ {1}$, respectively. (Recall that $S = a _ {1}$.) Consider the word $w = ad$. Clearly, $w$ has the following derivation:

$$\tag{a4 } S \Rightarrow aCD \Rightarrow aD \Rightarrow ad.$$

Then a solution for $PCP$ is constructed according to the table below.

<tbody> </tbody>
 $1$ $4+ 2r+ 1$ $2$ $4+ r+ 2$ $4+ 2r+ n+ 2$ $3$ $4+ 2$ $4+ 3$ $2$ $4+ r+ 2$ $4+ 2r+ n+ 3$ $4$ $g$ $BS\#$ $a ^ \prime C ^ \prime D ^ \prime$ $\# ^ \prime$ $a$ $D$ $\#$ $a ^ \prime$ $D ^ \prime$ $\# ^ \prime$ $a$ $d$ $E$ $h$ $B$ $S$ $\#$ $a ^ \prime$ $C ^ \prime D ^ \prime$ $\# ^ \prime$ $a$ $D$ $\#$ $a ^ \prime$ $D ^ \prime$ $\# ^ \prime adE$

Hence the word

$$1( 4+ 2r+ 1) 2( 4+ r+ 2)( 4+ 2r+ n+ 2)$$

$$3( 4+ 2)( 4+ 3) 2( 4+ r+ 2)( 4+ 2r+ n+ 3) 4$$

(where parentheses have been added for clarity) is a solution for $PCP$, the common value of the morphisms $g$ and $h$ being for this word:

$$\tag{a5 } BS \# a ^ \prime C ^ \prime D ^ \prime \# ^ \prime aD\# a ^ \prime D ^ \prime \# ^ \prime adE .$$

Clearly, (a5) results from (a4) after the following simple modifications. The boundary markers $B$ and $E$ are added to the beginning and end; the sign $\Rightarrow$ is replaced by $\#$; moreover, every second step is primed, as is every second marker $\#$; finally, an "idle" step (from $aD$ to $a ^ \prime D ^ \prime$) is added in order to reach the non-primed word $ad$ corresponding to the final index of (a4).

It is easily seen that an analogous argument is valid in general: whenever $w$ is in $L( G)$, then $PCP$ has a solution (see [a4], where this fact is established by a detailed inductive argument). It is somewhat more difficult to establish the converse — that is, to show that if $PCP$ has a solution, then $w$ is in $L( G)$, see [a4].

Using a simple encoding it can be shown that the Post correspondence problem remains undecidable if the range alphabet $\Delta$ of the morphisms $g$ and $h$ consists of two letters only. The decidability in case the domain alphabet $\Sigma$ consists of two letters was established in [a1]. Another non-trivial decidable subclass, [a2], consists of instances $( g, h)$ where one of the morphisms $g$ and $h$ is periodic, that is, all of its values are powers of a single word. The Post correspondence problem remains undecidable if $\Sigma$ consists of 9 letters. The decidability status is open if the cardinality of $\Sigma$ is a fixed number between 3 and 8 (inclusive).

The Post correspondence problem is closely linked with equality sets, which are important in the representation theory of formal languages. By definition, the equality set of $g$ and $h$ consists of all words $w$ such that $g( w) = h( w)$. Clearly, an instance $( g, h)$ possesses a solution if and only if the equality set contains a non-empty word.

#### References

 [a1] A. Ehrenfeucht, J. Karhumäki, G. Rozenberg, "The (generalized) Post corresponding problem with lists consisting of two words is decidable" Theor. Comp. Sc. , 21 (1982) pp. 119–144 [a2] J. Karhumäki, I. Simon, "A note on elementary homomorphisms and the regularity of equality sets" EATCS Bull. , 9 (1979) pp. 16–24 [a3] E. Post, "A variant of a recursively unsolvable problem" Bull. Amer. Math. Soc. , 52 (1946) pp. 264–268 [a4] A. Salomaa, "Formal languages" , Acad. Press (1987)
How to Cite This Entry:
Post correspondence problem. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Post_correspondence_problem&oldid=51066
This article was adapted from an original article by G. RozenbergA. Salomaa (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article