Coding and decoding
The process of representing information in a definite standard form and the inverse process of recovering the information from such a representation. In the mathematical literature an encoding (coding) is a mapping of an arbitrary set $A$ into the set of finite sequences (words) over some alphabet $B$, while the inverse mapping is called a decoding. Examples of an encoding are: the representation of the natural numbers in the $m$-ary number system, where each number $N$ is made to correspond to the word $\alpha_n \alpha_{n-1} \dots \alpha_0$ over the alphabet $\{0, 1, \dots, m-1\}$ for which $\alpha_n \ne 0$ and $N = \alpha_n m^n + \alpha_{n-1} m^{n-1} + \dots + \alpha_0$; the conversion of English texts by means of a telegraphic code into sequences of pulses and pauses of various lengths; and the mapping applied in writing down the digits of a (Russian) postal index (see Fig. a and Fig. b). In this latter case, to each decimal digit there corresponds a word of length 9 over the alphabet $\{0, 1\}$ in which the lines that are to be used in drawing the digit are marked by the symbol 1.
Figure: c022890a
Figure: c022890b
The investigation of various properties of coding and decoding, and the construction of encodings that are effective in some sense and possess specified properties, constitute the subject matter of coding theory. Usually, a criterion for the effectiveness of an encoding is tied up in some way with the minimization of the lengths of the code words (the images of the elements of the set $A$), and the specified properties of the coding relate to the guarantee of a given level of noise immunity, understood in some sense or other. In particular, noise immunity means the possibility of unique decoding in the absence of, or at a tolerable level of, distortion of the code words. Besides noise immunity, a number of additional requirements may be imposed on the code. For example, in the choice of an encoding for the digits of a postal index, agreement with the ordinary way of writing digits is necessary. As additional requirements, restrictions are often applied relating to the admissible complexity of the scheme effecting the coding and decoding. The problems of coding theory were in the main created under the influence of the theory of information transmission as developed by C. Shannon [1]. A source of new problems in coding theory is provided by the creation and improvement of automated systems for gathering, storing, transmitting, and processing information. The methods for solving problems in coding theory are mainly combinatorial, probability-theoretic and algebraic.
An arbitrary coding $f$ of a set (alphabet) $A = \{a_1, \dots, a_r\}$ by words over an alphabet $B$ can be extended to the set $A^*$ of all words over $A$ (messages) as follows:

$$f(a_{i_1} a_{i_2} \dots a_{i_k}) = f(a_{i_1}) f(a_{i_2}) \dots f(a_{i_k}),$$

where $a_{i_j} \in A$, $j = 1, \dots, k$. Such a mapping $f \colon A^* \to B^*$ is called a letter-by-letter encoding of the messages. A more general class of encodings of messages is formed by the automaton encodings realized by initial asynchronous automata (cf. Automaton), which deliver at each instant of time some (possibly empty) word over the alphabet $B$. The importance of this generalization lies in the fact that the automaton realizes, in different states, different encodings of the letters of the alphabet of the messages. A letter-by-letter encoding is an automaton encoding realized by a one-state automaton. One of the branches of coding theory is the study of the general properties of a coding and the construction of algorithms for recognizing these properties (see Coding, alphabetical). In particular, necessary and sufficient conditions have been found for automaton and letter-by-letter encodings in order that 1) the decoding be single-valued; 2) there exist a decoding automaton, that is, an automaton effecting the decoding with a certain bounded delay; or 3) there exist a self-adjusting decoding automaton (making it possible to eliminate, in a limited amount of time, the effect of a mistake in the input sequence or in the functioning of the automaton itself).
The majority of problems in coding theory reduces to the study of finite or countable sets of words over an alphabet $B_q = \{0, 1, \dots, q-1\}$. Such sets are called codes. In particular, to each single-valued encoding $f$ (and letter-by-letter encoding) there corresponds the code $\{f(a_1), \dots, f(a_r)\}$. One of the basic assertions of coding theory is that the condition of injectivity of a letter-by-letter encoding $f$ imposes the following restriction on the lengths $l_i$ of the code words $f(a_i)$:

$$\sum_{i=1}^{r} q^{-l_i} \le 1. \qquad (1)$$

The converse statement also holds: if $l_1, \dots, l_r$ is a set of natural numbers satisfying (1), then there exists a one-to-one letter-by-letter encoding $f$ such that the word $f(a_i)$ has length $l_i$. Furthermore, if the numbers $l_1 \le l_2 \le \dots \le l_r$ are increasingly ordered, then one can take for $f(a_i)$ the first $l_i$ symbols after the point in the expansion of the number $\sum_{j=1}^{i-1} q^{-l_j}$ as a $q$-ary fraction (Shannon's method).
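Condition (1) is the Kraft inequality, and the construction just described can be sketched directly; exact rational arithmetic avoids floating-point digit errors (a hypothetical helper, not part of the article):

```python
from fractions import Fraction

def shannon_code(lengths, q=2):
    """Build code words of the given (sorted, Kraft-feasible) lengths.

    The word for the i-th letter consists of the first lengths[i] digits,
    after the point, of the q-ary expansion of sum_{j < i} q**(-lengths[j])
    (Shannon's method).
    """
    assert lengths == sorted(lengths)
    assert sum(Fraction(1, q**l) for l in lengths) <= 1   # condition (1)
    words, acc = [], Fraction(0)
    for l in lengths:
        x, digits = acc, []
        for _ in range(l):            # extract l q-ary digits after the point
            x *= q
            d = int(x)
            digits.append(str(d))
            x -= d
        words.append("".join(digits))
        acc += Fraction(1, q**l)
    return words
```

For the lengths $1, 2, 3, 3$ with $q = 2$ this yields the prefix code $0, 10, 110, 111$.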
The most definitive results in coding theory relate to the construction of effective one-to-one encodings. The constructions described here are used in practice for the compression of information and for the retrieval of information from memory. The concept of effectiveness of an encoding depends on the choice of the cost criterion. In the definition of the cost of a one-to-one letter-by-letter encoding $f$ it is assumed that to each number $i$, $1 \le i \le r$, there corresponds a positive number $p_i$ and that $\sum_{i=1}^{r} p_i = 1$. The following versions of a definition of the cost $c(f)$ have been investigated:

1) $c(f) = \sum_{i=1}^{r} p_i l_i$;

2) $c_s(f) = \dfrac{1}{s} \log_q \sum_{i=1}^{r} p_i q^{s l_i}$, $0 < s < \infty$;

3) $c(f) = \max_i \left( l_i + \log_q p_i \right)$;

where it is supposed that in the first two cases the $p_i$ are the probabilities with which some Bernoullian source generates the corresponding letters of the alphabet $A$ ($i = 1, \dots, r$), while in the third case the quantities $-\log_q p_i$ are desirable lengths of the code words. In the first definition, the cost is equal to the average length of a code word; in the second definition, as the parameter $s$ increases, the longer code words have a greater influence on the cost ($c_s(f) \to \sum_i p_i l_i$ as $s \to 0$ and $c_s(f) \to \max_i l_i$ as $s \to \infty$); in the third definition, the cost is equal to the maximum excess of the length $l_i$ of a code word over the desired length $-\log_q p_i$. The problem of constructing a one-to-one letter-by-letter encoding minimizing the cost $c(f)$ is equivalent to that of minimizing the corresponding function of $l_1, \dots, l_r$ over the sets of natural numbers satisfying condition (1). The solution of this problem is known for each of the above definitions of the cost.
Suppose that the minimum of the quantity $c(f)$ over the sets $\{l_i\}$ of arbitrary (not necessarily natural) numbers satisfying condition (1) is equal to $c^0$ and is attained on the set $\{l_i^0\}$. The non-negative quantity $c(f) - c^0$ is called the redundancy, and the quantity $(c(f) - c^0)/c(f)$ the relative redundancy of the encoding $f$. The redundancy of a one-to-one encoding constructed by the method of Shannon for the lengths $l_i = \lceil l_i^0 \rceil$, $i = 1, \dots, r$, satisfies the inequality $c(f) - c^0 < 1$. For the first, most usual, definition of the cost as the average number of code symbols necessary for one letter of the message generated by the source, the quantity $c^0$ is equal to the Shannon entropy

$$H = -\sum_{i=1}^{r} p_i \log_q p_i$$

of the source calculated to the base $q$, while $l_i^0 = -\log_q p_i$. The bound on the redundancy, $c(f) - c^0 < 1$, can be improved by using so-called block coding with blocks of length $L$, in which messages of length $L$ (rather than separate letters) are encoded by the Shannon method. The redundancy of such an encoding does not exceed $1/L$. The same method is used for the effective encoding of related sources. Since the determination of the lengths $l_i$ in coding by the Shannon method is based on a knowledge of the statistics of the source, methods have been developed, for certain classes of sources, for the construction of a universal encoding that guarantees a definite upper bound on the redundancy for any source in the class. In particular, a coding by blocks of length $L$ has been constructed whose redundancy, for any Bernoullian source, is asymptotically at most $(r-1) \log_q L / (2L)$ (for fixed $r$ as $L \to \infty$), and this asymptotic bound cannot be improved.
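These facts are easy to observe numerically: with the Shannon lengths $\lceil -\log_q p_i \rceil$ the redundancy stays below 1, and encoding blocks of length $L$ drives the per-letter redundancy below $1/L$. A sketch under those definitions (function names are ours):

```python
import math
from itertools import product

def entropy(p, q=2):
    """Shannon entropy of the distribution p, computed to the base q."""
    return -sum(pi * math.log(pi, q) for pi in p if pi)

def shannon_redundancy(p, q=2):
    """Redundancy of Shannon's method: average length minus entropy."""
    lengths = [math.ceil(-math.log(pi, q)) for pi in p]
    return sum(pi * li for pi, li in zip(p, lengths)) - entropy(p, q)

def block_redundancy(p, L, q=2):
    """Per-letter redundancy when blocks of length L of a Bernoullian
    source are encoded instead of separate letters."""
    blocks = [math.prod(c) for c in product(p, repeat=L)]
    return shannon_redundancy(blocks, q) / L
```

For a binary source with probabilities $0.9, 0.1$ the letter-by-letter redundancy is about $0.83$, while blocks of length 2 already bring the per-letter figure below $1/2$.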
Along with the problems of the effective compression of information, problems of the estimation of the redundancy of concrete types of information are also considered. For example, the relative redundancy of certain natural languages (in particular, English and Russian) has been estimated under the hypothesis that their texts are generated by Markov sources with a large number of states.
In the investigation of problems on the construction of effective noise-immune encodings, one usually considers encodings to which there correspond codes $K$ belonging to the set $B_q^n$ of words of length $n$ over the alphabet $B_q$. It is understood here that the letters of the alphabet $A$ of the messages are equi-probable. The effectiveness of such an encoding is estimated by the redundancy $n - \log_q |K|$ or by the transmission rate $(\log_q |K|)/n$. In the definition of the noise-immunity of an encoding, the concept of an error is formalized and a model for the generation of errors is brought into consideration. An error of substitution type (or simply an error) is a transformation of words consisting of the substitution of a symbol in a word by another symbol of the alphabet $B_q$. For example, the production of a superfluous line in writing out a figure of a Russian postal index leads to the replacement in the code word of the symbol 0 by the symbol 1, while the omission of a necessary line leads to the replacement of 1 by 0. The possibility of detecting and correcting errors is based on the fact that, for an encoding $f$ with non-zero redundancy, the decoding $f^{-1}$ can be extended in an arbitrary fashion onto the $q^n - |K|$ words of $B_q^n$ which are not code words. In particular, if the set $B_q^n$ is partitioned into $|K|$ disjoint sets $D_1, \dots, D_{|K|}$ such that $b_i \in D_i$ for the code words $b_i$, and the decoding $f^{-1}$ is extended so that $f^{-1}(D_i) = f^{-1}(b_i)$, then in decoding, all errors that translate a code word $b_i$ into a word of $D_i$, $i = 1, \dots, |K|$, will be corrected. Similar possibilities arise also in the case of other types of error, such as the erasure of a symbol (substitution by a symbol of a different alphabet), the alteration of the numerical value of a code word by $\pm q^j$, $j = 0, 1, \dots$ (an arithmetic error), the deletion or insertion of a symbol, etc.
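For substitution errors the sets $D_i$ are typically taken to be "nearest code word" regions with respect to the Hamming distance. A toy illustration with the binary repetition code of length 3 (our own example, not from the article):

```python
def hamming(u, v):
    """Number of positions in which the words u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def decode(word, code):
    """Extend the decoding to all of B^n: map a word to the nearest code word."""
    return min(code, key=lambda c: hamming(word, c))

# 3-fold repetition code: the two balls of radius 1 around "000" and "111"
# partition B_2^3, so any single substitution error is corrected
code = ["000", "111"]
```

Here `decode("010", code)` returns `"000"`: the received word differs from `"000"` in one position but from `"111"` in two.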
In the theory of information transmission (see Information, transmission of), probabilistic models of error generation are considered; these are called channels. The simplest memoryless channel is defined by the probabilities $p(\beta \mid \alpha)$ of substitution of a symbol $\alpha$ by a symbol $\beta$. For this channel one defines the quantity (the channel capacity)

$$C = \max \sum_{\alpha, \beta} p_\alpha \, p(\beta \mid \alpha) \, \log_q \frac{p(\beta \mid \alpha)}{\sum_{\alpha'} p_{\alpha'} \, p(\beta \mid \alpha')},$$

where the maximum is taken over all sets $\{p_\alpha\}$ such that $p_\alpha \ge 0$ and $\sum_\alpha p_\alpha = 1$. The effectiveness of an encoding $f$ is characterized by its transmission rate $R(f)$, while its noise-immunity is characterized by the mean probability of error in decoding $P(f)$ (under the optimal partitioning of $B_q^n$ into the subsets $D_i$). The main result of the theory of information transmission (Shannon's theorem) is that the channel capacity $C$ is the least upper bound of the numbers $R$ such that for any $\varepsilon > 0$ and all sufficiently large $n$ there exists an encoding $f$ for which $R(f) \ge R - \varepsilon$ and $P(f) \le \varepsilon$.
Another model for the generation of errors (see Error-correcting code; Code with correction of arithmetical errors; Code with correction of deletions and insertions) is characterized by the property that, in each word of length $n$, not more than a prescribed number $t$ of errors occur. Let $B_t(b)$ be the set of words obtainable from $b$ as a result of $t$ or fewer errors. If for the code

$$K = \{b_1, \dots, b_N\} \subseteq B_q^n$$

the sets $B_t(b_i)$, $i = 1, \dots, N$, are pairwise disjoint, then under a decoding such that $f^{-1}(B_t(b_i)) = f^{-1}(b_i)$, all errors admitted by the above model for the generation of errors will be corrected, and such a code is called a $t$-error-correcting code. For many types of errors (for example, substitutions, arithmetic errors, insertions and deletions) the function $d(b, b')$, equal to the minimal number of errors of the given type taking the word $b$ into the word $b'$, is a metric, and $B_t(b)$ is the metric ball of radius $t$ with centre $b$. Therefore the problem of constructing the most effective code (that is, the code with the maximum number of words $N$) in $B_q^n$ correcting $t$ errors is equivalent to that of the densest packing of the metric space $B_q^n$ by balls of radius $t$. The code for the figures of a Russian postal index is not a one-error-correcting code: the distance between one pair of its code words equals 1 and that between another pair equals 2, while all other distances between code words are at least 3.
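The classical example of a perfect packing of this kind is the binary $[7,4]$ Hamming code: its 16 balls of radius 1, of 8 words each, tile $B_2^7$ exactly. A sketch of its syndrome decoding (the convention of ordering the parity-check columns as the binary representations of $1, \dots, 7$ is ours):

```python
# parity-check matrix of the [7,4] Hamming code: column j is the binary
# representation of j + 1, so a nonzero syndrome names the error position
H = [[(j + 1) >> i & 1 for j in range(7)] for i in range(3)]

def syndrome(word):
    """Parity checks of a 7-bit word (a list of 0s and 1s)."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]

def correct(word):
    """Correct at most one substitution error in a 7-bit word."""
    s = syndrome(word)
    pos = sum(bit << i for i, bit in enumerate(s))   # 0 means "no error"
    if pos:
        word = word.copy()
        word[pos - 1] ^= 1
    return word
```

Reading the syndrome as a binary number gives the position of the corrupted bit, or 0 if the word is a code word.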
The problem of studying the minimal redundancy $r(n,t)$ of a code in $B_q^n$ correcting $t$ errors of substitution type divides into two basic cases. In the first case, when $t$ is fixed as $n \to \infty$, the following asymptotic relation holds:

$$r(n,t) \sim t \log_q n, \qquad (2)$$

where the "power" bound, based on a count of the number of words of length $n$ in a ball of radius $t$, is attained. The finer asymptotic behaviour of $r(n,t)$ for fixed $t$ is not known (1987), except for $t = 1$; the same is true, even for fixed $t$, of many other types of errors (for example, arithmetic errors, deletions and insertions). In the second case, when $t = [pn]$, where $p$ is some fixed number, $0 < p < (q-1)/2q$, and $n \to \infty$, the "power" bound

$$r(n,t) \ge n H_q(p) (1 + o(1)),$$

where $H_q(p) = p \log_q (q-1) - p \log_q p - (1-p) \log_q (1-p)$, has been substantially improved. It is conjectured that the upper bound

$$r(n,t) \le n H_q(2p) (1 + o(1)), \qquad (3)$$

obtained by the method of random sampling of codes, is asymptotically exact, that is, $r(n, [pn]) = n H_q(2p)(1 + o(1))$. The proof or refutation of this conjecture is one of the central problems in coding theory.
The majority of the constructions of noise-immune codes are effective when the length of the code is sufficiently large. In this connection, questions relating to the complexity of the devices that realize the coding and decoding (encoders and decoders) acquire special significance. Restrictions on the admissible type of decoder or on its complexity may lead to an increase in the redundancy necessary to guarantee a prescribed noise-immunity. For example, the minimal redundancy of a code in $B_2^n$ correcting one error for which there is a decoder consisting of a shift register and a single majority element has order $\sqrt{n}$ (compare with (2)). As mathematical models of an encoder and a decoder, schemes of functional elements are usually considered, and by the complexity is meant the number of elements in the scheme. For the known classes of error-correcting codes, investigations have been made of the possible encoding and decoding algorithms, and upper bounds for the complexity of the encoder and decoder have been obtained. Certain relationships have also been obtained among the transmission rate of the encoding, the noise-immunity of the encoding and the complexity of the decoder (see [5]).
Yet another line of investigation in coding theory is connected with the fact that many results (e.g. Shannon's theorem and the upper bound (3)) are not "constructive", but are theorems on the existence of infinite sequences of codes $\{K_n\}$. In this connection, efforts have been made to sharpen these results by proving them for classes of sequences $\{K_n\}$ of codes for which there is a Turing machine that recognizes the membership of an arbitrary word of length $l$ in the set $\bigcup_{n=1}^{\infty} K_n$ in a time having slow order of growth with respect to $l$.
Certain new constructions and methods for obtaining bounds, developed within coding theory, have led to substantial advances in questions that at first sight seem very remote from the traditional problems of coding theory. One should mention here the use of a maximal code correcting one error in an asymptotically-optimal method for realizing functions of the algebra of logic by contact schemes (cf. Contact scheme); the fundamental improvement of the upper bound for the density of packing the $n$-dimensional Euclidean space with identical balls; and the use of inequality (1) in estimating the complexity of realizing a class of functions of the algebra of logic by formulas. The ideas and results of coding theory find further development in the synthesis of self-correcting schemes and of reliable schemes from unreliable elements.
References
[1] | C. Shannon, "A mathematical theory of communication" Bell System Techn. J. , 27 (1948) pp. 379–423; 623–656 |
[2] | E. Berlekamp, "Algebraic coding theory" , McGraw-Hill (1968) |
[3] | W.W. Peterson, E.J. Weldon jr., "Error-correcting codes" , M.I.T. (1972) |
[4] | , Discrete mathematics and mathematical problems in cybernetics , 1 , Moscow (1974) pp. Sect. 5 (In Russian) |
[5] | L.A. Bassalygo, V.V. Zyablov, M.S. Pinsker, "Problems of complexity in the theory of correcting codes" Problems of Information Transmission , 13 : 3 pp. 166–175 Problemy Peredachi Informatsii , 13 : 3 (1977) pp. 5–17 |
[6] | V.M. Sidel'nikov, "New bounds for densest packings of spheres in $n$-dimensional Euclidean space" |
Comments
Two standard references on error-correcting codes and coding theory are [a1], [a2].
References
[a1] | F.J. MacWilliams, N.J.A. Sloane, "The theory of error-correcting codes" , I-II , North-Holland (1977) |
[a2] | J.H. van Lint, "Introduction to coding theory" , Springer (1982) |
Coding and decoding. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Coding_and_decoding&oldid=16981