Difference between revisions of "Information, transmission of"

Revision as of 22:12, 5 June 2020

The part of information theory related to the study of processes of information transfer from a source to a receiver (the addressee), cf. Information, source of. In the theory of information transmission one studies optimum and near-optimum methods of information transmission over a communication channel under the assumption that the methods of encoding the message into the input signal of the channel and of decoding the output signal of the channel into an output message may vary within wide ranges (cf. Coding and decoding).

The general scheme of systems of information transmission, first considered by C. Shannon [1], can be described as follows. The source of information produces the message that has to be transmitted over the communication channel from the source to the receiver. It is usually assumed that this message is a random variable defined on some probability space $ ( \Omega , \mathfrak A , {\mathsf P} ) $, taking values in a measurable space $ ( \mathfrak X , S _ {\mathfrak X } ) $ and having a probability distribution $ p ( \cdot ) $. Often $ \xi = \{ {\xi ( t) } : {t \in \Delta } \} $, where $ \Delta $ is the set of parameter values of $ t $, is a stochastic process in discrete or continuous time and with values in a measurable space $ ( X , S _ {X} ) $. E.g., in the case of discrete time $ \xi = \{ {\xi _ {k} } : {k = 1 , 2 ,\dots } \} $ or $ \xi = \{ {\xi _ {k} } : {\dots , - 1 , 0 , 1 ,\dots } \} $; the random variables $ \xi _ {k} $, taking values in $ ( X , S _ {X} ) $, are called the components of the input message and are often treated as the messages produced by the source at the moment of time $ k $. The sets $ \xi ^ {n} = ( \xi _ {1} \dots \xi _ {n} ) $, taking values in the $ n $- fold product of $ ( X , S _ {X} ) $, $ ( X ^ {n} , S _ {X ^ {n} } ) $, are called segments of length $ n $ of the input message. The corresponding concepts are defined in an analogous way for the case of a continuous-time stochastic process as message.

The output message, received by the receiver, is also a random variable $ \widetilde \xi $, defined on the same probability space $ ( \Omega , \mathfrak A , {\mathsf P} ) $ and taking values in a measurable space $ ( \widetilde{\mathfrak X} , S _ {\widetilde{\mathfrak X} } ) $( in general, different from $ ( \mathfrak X , S _ {\mathfrak X } ) $). When $ \widetilde \xi $ is a stochastic process in discrete or continuous time one introduces in an analogous way the concepts of the space $ ( \widetilde{X} , S _ {\widetilde{X} } ) $ of values of the components at the output, and the space $ ( \widetilde{X} {} ^ {n} , S _ {\widetilde{X} {} ^ {n} } ) $ of values of the segments of length $ n $ of the output message.

The exactness of reproducibility of information is taken as the measure for the quality of the transmission of the message over the communication channel (cf. Information, exactness of reproducibility of). As a rule, if the transmission is over a communication channel with noise, even if the sets $ \mathfrak X $ and $ \widetilde{\mathfrak X} $ coincide, it is impossible to obtain absolute exactness, i.e. complete coincidence of the messages sent and received. Often, the requirement reflecting the exactness is treated statistically, by introducing the class $ W $ of admissible joint probability distributions for pairs $ ( \xi , \widetilde \xi ) $ of messages sent and received in the set of all probability measures on the product $ ( \mathfrak X \times \widetilde{\mathfrak X} , S _ {\mathfrak X } \times S _ {\widetilde{\mathfrak X} } ) $. The class $ W $ is often given by means of a non-negative measurable function $ \rho ( x , \widetilde{x} ) $, $ x \in \mathfrak X $, $ \widetilde{x} \in \widetilde{\mathfrak X} $, and a number $ a > 0 $: It is considered that a probability distribution of $ ( \xi , \widetilde \xi ) $ belongs to $ W $ only if

$$ \tag{1 } {\mathsf E} \rho ( \xi , \widetilde \xi ) \leq a . $$

Thus, the condition of exactness of reproducibility indicates by what amount the message received can differ from the message sent.

Messages produced by a source are sent over a communication channel. A channel $ ( Q , V ) $ is a set of two measurable spaces $ ( \mathfrak Y , S _ {\mathfrak Y } ) $, $ ( \widetilde{\mathfrak Y} , S _ {\widetilde{\mathfrak Y} } ) $; a transition function $ Q ( y , A ) $, $ y \in \mathfrak Y $, $ A \in S _ {\widetilde{\mathfrak Y} } $, that is measurable with respect to the $ \sigma $- algebra $ S _ {\mathfrak Y } $ for fixed $ A \in S _ {\widetilde{\mathfrak Y} } $ and is a probability measure on $ ( \widetilde{\mathfrak Y} , S _ {\widetilde{\mathfrak Y} } ) $ for fixed $ y \in \mathfrak Y $; and a subset $ V $ in the space of all probability measures on $ ( \mathfrak Y , S _ {\mathfrak Y } ) $. The spaces $ ( \mathfrak Y , S _ {\mathfrak Y } ) $, $ ( \widetilde{\mathfrak Y} , S _ {\widetilde{\mathfrak Y} } ) $ are called, respectively, the spaces of input and output signals of the channel, and $ V $ is a restriction on the distribution of the input signal. One says that two random variables $ \eta $ and $ \widetilde \eta $( defined on a probability space $ ( \Omega , \mathfrak A , {\mathsf P} ) $) are related through a channel $ ( Q , V ) $ if they take values in $ ( \mathfrak Y , S _ {\mathfrak Y } ) $ and $ ( \widetilde{\mathfrak Y} , S _ {\widetilde{\mathfrak Y} } ) $, respectively, and if for any $ A \in S _ {\widetilde{\mathfrak Y} } $ with probability 1 the conditional probability

$$ \tag{2 } {\mathsf P} \{ \widetilde \eta \in A \mid \eta \} = \ Q ( \eta , A ) , $$

and the probability distribution of $ \eta $ belongs to $ V $. Most often $ V $ is given by a measurable function $ \pi ( y) $, $ y \in \mathfrak Y $, and a number $ b > 0 $: It is considered that the probability distribution of $ \eta $ belongs to $ V $ only if

$$ \tag{3 } {\mathsf E} \pi ( \eta ) \leq b . $$

In the case of discrete channels $ V $ usually coincides with the set of all probability distributions, i.e. there is no restriction.

In fact, $ \mathfrak Y $ is the set of signals given by the transmitter, and $ \widetilde{\mathfrak Y} $ is the set of signals taken by the receiver (in applications $ \mathfrak Y $ and $ \widetilde{\mathfrak Y} $ often coincide). If the random variable $ \eta $ of the input signal is known, then (2) makes it possible to find the conditional distribution of the output signal $ \widetilde \eta $. The introduction of the restriction $ V $ is related to the fact that in many applications the distributions of the input signal cannot be taken arbitrarily (the case when it is supposed that the mean value of the square (the power) of the input signal does not exceed a fixed constant is typical). The case when the input and output signals are discrete- or continuous-time stochastic processes $ \eta = \{ \eta ( t) \} $, $ \widetilde \eta = \{ \widetilde \eta ( t) \} $ defined on a certain finite or infinite (at one or both sides) interval on the real axis and taking values in certain measurable spaces $ ( Y , S _ {Y} ) $ and $ ( \widetilde{Y} , S _ {\widetilde{Y} } ) $, respectively, is important in applications. E.g., if $ \eta = \{ \eta _ {1} , \eta _ {2} ,\dots \} $ and $ \widetilde \eta = \{ \widetilde \eta _ {1} , \widetilde \eta _ {2} ,\dots \} $ are random sequences, then the communication channel for which $ \eta $ and $ \widetilde \eta $ serve as input and output signals is often regarded as a sequence of channels (in the sense described above), called segments of the given channel; the input and output signals of these segments are the vectors

$$ \eta ^ {n} = ( \eta _ {1} \dots \eta _ {n} ) \ \ \textrm{ and } \ \widetilde \eta {} ^ {n} = \ ( \widetilde \eta _ {1} \dots \widetilde \eta _ {n} ) ,\ \ n = 1 , 2 ,\dots . $$

In order to convert the input message into a signal transmittable over the communication channel and the signal received at the output of the channel into an output message it is necessary to perform operations of encoding and decoding of messages. An encoding is a function $ f ( x) $ of $ x \in \mathfrak X $ with values in $ \mathfrak Y $, and a decoding is a function $ g ( \widetilde{y} ) $ of $ \widetilde{y} \in \widetilde{\mathfrak Y} $ with values in $ \widetilde{\mathfrak X} $. The set of values of $ f ( x) $, $ x \in \mathfrak X $, is called a code, and the individual elements of this set are called code words. Using an encoding $ f ( x) $ and a decoding $ g ( \widetilde{y} ) $ means that if the message took the value $ x \in \mathfrak X $, then one transmits $ y = f ( x) $ over the channel; if $ \widetilde{y} \in \widetilde{\mathfrak Y} $ is received at the output of the channel, then it is decoded into the output message $ \widetilde{x} = g ( \widetilde{y} ) $. One often considers random encodings in the theory of information transmission, i.e. the code words are chosen randomly in correspondence with a certain probability distribution.

A message with probability distribution $ p ( \cdot ) $, produced by the source, can be transmitted with exactness of reproducibility $ W $ over a channel $ ( Q , V ) $ by means of an encoding $ f ( \cdot ) $ and a decoding $ g ( \cdot ) $ if random variables $ \xi $, $ \eta $, $ \widetilde \eta $, $ \widetilde \xi $ forming a Markov chain can be constructed such that $ \xi $ has probability distribution $ p ( \cdot ) $, the probability distribution of $ ( \xi , \widetilde \xi ) $ belongs to $ W $, the pair $ ( \eta , \widetilde \eta ) $ is related through $ ( Q , V ) $, and

$$ \tag{4 } \eta = f ( \xi ) ,\ \ \xi = g ( \widetilde \eta ) . $$

The assumption that $ \xi $, $ \eta $, $ \widetilde \eta $, $ \widetilde \xi $ form a Markov chain reduces to the assumption that the conditional probability of $ \widetilde \eta $ for fixed values of $ \xi $ and $ \eta $ depends on $ \eta $ only, i.e. it means that the output signal depends only on the input signal and not on the value of the message encoded by it.

The basic problem studied in the transmission of information is as follows. One considers known and fixed: a source generating messages with probability densities $ p ( \cdot ) $; a communication channel $ ( Q , V ) $; and a condition of exactness of reproducibility $ W $. The problem is to clarify: Under what conditions does there exist an encoding $ f ( \cdot ) $ and a decoding $ g ( \cdot ) $ such that a message produced by the given source can be transmitted with given $ W $ over $ ( Q , V ) $? Solutions to this problem for various assumptions are called coding theorems, or Shannon theorems. Another naturally arising problem is that when transmittance is possible, to construct the most simple and efficient ways of encoding and decoding the transmission.

Shannon [1] introduced quantities that allow one to formulate an answer to the first problem posed. The amount of information, or simply the information, $ I ( \cdot , \cdot ) $ is the most important among these (cf. Information, amount of). If

$$ C = C ( Q , V ) = \ \sup I ( \eta , \widetilde \eta ) $$

is the capacity of the channel $ ( Q , V ) $( cf. Transmission rate of a channel), where the supremum is over all pairs $ ( \eta , \widetilde \eta ) $ related through $ ( Q , V ) $, and if the number

$$ \tag{6 } H _ {W} ( p) = \inf I ( \xi , \widetilde \xi ) $$

is the $ W $- entropy (cf. Entropy) of the message, where the infimum is over all pairs $ ( \xi , \widetilde \xi ) $ such that the joint probability distribution of $ ( \xi , \widetilde \xi ) $ belongs to $ W $, while $ \xi $ has probability distribution $ p ( \cdot ) $, then the following theorem of Shannon (a converse of coding theorems) holds: If a message with probability distribution $ p ( \cdot ) $ can be transmitted over $ ( Q , V ) $ with exactness of reproducibility $ W $, then

$$ \tag{7 } H _ {W} ( p) \leq \ C ( Q , V ) . $$

Sufficient conditions for the possibility of transmission of information are more difficult to obtain. Thus, (7) is sufficient only in a certain asymptotic sense, in which the main assumption is that $ H _ {W} ( p) \rightarrow \infty $. Hence, (7) is necessary and sufficient only, roughly speaking, when applied to the problem of transmitting a sufficiently large amount of information. The remaining necessary assumptions are of the nature of regularity assumptions, which in concrete cases are usually fulfilled. In order to formulate sufficient conditions for the possibility of transmission in exact terms it is necessary to introduce supplementary concepts.

A sequence of pairs of random variables $ \{ ( \zeta ^ {t} , \widetilde \zeta {} ^ {t} ) \} _ {t=} 1 ^ \infty $ is called information-stable if $ 0 < I ( \zeta ^ {t} , \widetilde \zeta {} ^ {t} ) < \infty $ and

$$ \tag{8 } \lim\limits _ {t \rightarrow \infty } \ \frac{i _ {\zeta ^ {t} \widetilde \zeta {} ^ {t} } ( \zeta ^ {t} , \widetilde \zeta {} ^ {t} ) }{I ( \zeta ^ {t} , \widetilde \zeta {} ^ {t} ) } = 1 $$

in the sense of convergence in probability. Here $ i _ {\zeta ^ {t} \widetilde \zeta {} ^ {t} } ( \cdot , \cdot ) $ is the information density (cf. Information, amount of) of $ ( \zeta ^ {t} , \widetilde \zeta {} ^ {t} ) $. A sequence of channels $ \{ ( Q ^ {t} , V ^ {t} ) \} _ {t=} 1 ^ \infty $ with $ C ( Q ^ {t} , V ^ {t} ) < \infty $ is called information-stable if there exists an information-stable sequence of pairs $ ( \eta ^ {t} , \widetilde \eta {} ^ {t} ) $, related through $ ( Q ^ {t} , V ^ {t} ) $, such that

$$ \tag{9 } \lim\limits _ {t \rightarrow \infty } \ \frac{I ( \eta , \widetilde \eta {} ^ {t} ) }{C ( Q ^ {t} , V ^ {t} ) } = 1 . $$

A sequence of messages with probability distributions $ p ^ {t} ( \cdot ) $ and exactness conditions $ W ^ {t} $ with $ H _ {W ^ {t} } ( p ^ {t} ) < \infty $, $ t = 1 , 2 \dots $ is called information-stable if there exists a sequence of pairs $ ( \xi ^ {t} , \widetilde \xi {} ^ {t} ) $ such that $ \xi ^ {t} $ has probability density $ p ( \cdot ) $, the probability distribution of $ ( \xi ^ {t} , \widetilde \xi {} ^ {t} ) $ belongs to $ W ^ {t} $, and

$$ \tag{10 } \lim\limits _ {t \rightarrow \infty } \ \frac{I ( \xi ^ {t} , \widetilde \xi {} ^ {t} ) }{H _ {W ^ {t} } ( p ^ {t} ) } = 1 . $$

Let $ V _ \epsilon $ be the set of probability distributions for which (3) with $ b $ replaced by $ b + \epsilon $ holds, and let $ W _ \epsilon $ be the exactness condition given by (1) with $ a $ replaced by $ a + \epsilon $, $ \epsilon > 0 $. The following coding theorem (Shannon) holds: Suppose one is given an information-stable sequence of messages with probability densities $ p ^ {t} ( \cdot ) $ and exactness conditions $ W ^ {t} $, as well as an information-stable sequence of channels $ ( Q ^ {t} , V ^ {t} ) $ such that the functions $ \pi ^ {t} ( \cdot ) $ and $ \rho ^ {t} ( \cdot , \cdot ) $ are uniformly bounded in $ t $. Let $ H _ {W ^ {t} } ( p ^ {t} ) \rightarrow \infty $ as $ t \rightarrow \infty $ and let

$$ \tag{11 } \lim\limits _ {t \rightarrow \infty } \textrm{ i nf } \ \frac{C ( Q ^ {t} , V ^ {t} ) }{H _ {W ^ {t} } ( p ^ {t} ) } > 1 . $$

Then for any $ \epsilon > 0 $ there exists an arbitrary large $ t _ {0} $ such that for all $ t \geq t _ {0} $ the messages with probability distributions $ p ^ {t} ( \cdot ) $ can be transmitted over $ ( Q ^ {t} , V ^ {t} ) $ with exactness of reproducibility $ W _ \epsilon ^ {t} $.

This formulation of a direct coding theorem is most general. The assumption that $ \pi ^ {t} ( \cdot ) $ and $ \rho ^ {t} ( \cdot , \cdot ) $ are uniformly bounded in $ t $ can be substantially weakened. The information stability of a sequence of messages or channels is true in a large number of particular cases of practical interest. Finally, under certain conditions one may replace $ W _ \epsilon ^ {t} $ by $ W ^ {t} $ and $ V _ \epsilon ^ {t} $ by $ V ^ {t} $ in the formulation of the theorem.

In the description of real situations the case when the sequence of channels considered in the theorem is the sequence of segments of a fixed channel, while the sequence of messages is the sequence of segments of a message from a fixed source with growing number of components is of most interest. This corresponds to the functioning of a communication system in time. Below some versions of the coding theorem and its converse are given for this situation, in a somewhat different form. Moreover, discrete stationary sources and memoryless communication channels are considered.

Suppose that a discrete stationary source $ U $ produces messages $ \xi = \{ {\xi _ {k} } : {\dots, - 1 , 0 , 1 ,\dots } \} $, where the individual components (the letters of the message) $ \xi _ {k} $ take values from some finite set of an alphabet $ X $ of volume $ M $, and generates letters of messages at the rate of one letter per unit of time. Let the components of the message $ \widetilde \xi = \{ {\widetilde \xi _ {k} } : {\dots , - 1 , 0 , 1 ,\dots } \} $ received by the addressee take values in the same alphabet $ X $( i.e. $ \widetilde{X} = X $). Suppose further that a discrete memoryless channel is being used, and that transmission over it is at the rate of one symbol per time interval $ t $. Suppose that there are no restrictions on the distribution of the input signal of the channel. Suppose that a segment of length $ L = L _ {N} $ of a message, $ \xi ^ {L} = ( \xi _ {1} \dots \xi _ {L} ) $, is transmitted over the channel by intervals of length $ N = [ L / t ] $( where $ [ x] $ is the integer part of a number $ x $) using certain encoding and decoding methods of the type described above. Then if $ \widetilde \xi {} ^ {L} = ( \widetilde \xi _ {1} \dots \widetilde \xi _ {L} ) $ is the corresponding segment of the message obtained by the addressee and $ \widehat{P} _ {e} $ is the average probability of an error in a source letter, defined by

$$ \tag{12 } \widehat{P} _ {e} = \ \frac{1}{L} \sum _ { l= } 1 ^ { L } {\mathsf P} \{ \widetilde \xi _ {1} \neq \xi _ {l} \} , $$

then the following theorem holds.

Converse of the coding theorem: Let $ H ( U) $ be the rate of creation of messages of the given discrete stationary source and let $ C $ be the transmission rate (on transmitted symbols) of the memoryless channel used. Then for all $ L $:

$$ \tag{13 } \widehat{P} _ {e} \mathop{\rm log} ( M - 1 ) + h ( \widehat{P} _ {e} ) \geq H ( U) - \frac{1} \tau C , $$

where

$$ h ( x) = - x \mathop{\rm log} x - ( 1 - x ) \mathop{\rm log} ( 1 - x ) . $$

Thus, if the rate of creation of messages $ H ( U) $ is larger than $ C / \tau $( the transmission rate of the channel on a source letter), then the average probability of an error in a source letter is bounded from below by a non-zero constant, whatever $ L $ and the methods of encoding and decoding. Hence, this probability does not tend to zero as $ L \rightarrow \infty $.

In order to formulate the direct statement of a coding theorem one needs the quantity

$$ \tag{14 } R _ {N} = \ \frac{ \mathop{\rm log} _ {2} M ^ {L _ {N} } }{N} . $$

When $ M = 2 $ and $ L _ {N} / \tau $ is an integer one has $ R _ {N} = \tau $, i.e. $ R _ {N} $ is in bits (cf. Bit) the number of binary symbols produced by the source within the transmission time of one symbol over the channel. Moreover, $ R _ {N} $ coincides with $ H ( U) $ if the components are independent and identically distributed. The probability of an error and the average probability of an error in a block of the source are defined respectively, by,

$$ \tag{15 } P _ {e,x ^ {L} } = \ {\mathsf P} _ {x ^ {L} } \{ \widetilde \xi {} ^ {L} \neq \xi ^ {L} \} , $$

$$ \tag{16 } \overline{P}\; _ {e} = \sum _ {x ^ {L} } {\mathsf P} \{ \xi ^ {L} = x ^ {L} \} P _ {e,x ^ {L} } . $$

Here $ P _ {x ^ {L} } ( \cdot ) $ is the conditional probability under the condition $ \xi ^ {L} = x ^ {L} \in X ^ {L} $. The following coding theorem holds: For all $ N $ and any $ R < C $ there are encoding and decoding methods such that for $ R _ {N} \leq R $ and all $ x ^ {L} \in X ^ {L} $,

$$ \tag{17 } P _ {e,x ^ {L} } \leq e ^ {- N E ( R) } $$

(this also holds for $ \overline{P}\; _ {e} $). Moreover, for $ 0 \leq R < C $ the function $ E ( R) $ is convex, positive and decreases with increasing $ R $( see also Erroneous decoding, probability of). Thus, this theorem also shows that for all $ R < C $ the probability of an error tends to zero, exponentially fast, as $ R \rightarrow \infty $.

There are generalizations of Shannon theorems to the cases of so-called compound channels and messages with independent parameters. Such generalizations are of interest because in practice it is impossible to regard the statistical parameters of the source of messages and the communication channel as completely known, since these parameters can sometimes change in the process of transmission. Therefore, it is appropriate to assure that the source of the messages and the communication channel belong to a certain class of possible sources and channels. One introduces, moreover, the minimax criterion for the quality of transmission, in which the quality of a given transmission method is estimated for the best possible sources and channels belonging to the given classes.

There are also generalizations of Shannon theorems to the transmission of information over a channel with feedback. The presence of complete feedback means that at the moment of time $ t $ it is considered that at the transmitting side of the channel (i.e. at its input) all exact values of the output signals at all moments $ t ^ \prime < t $ are known. In particular, for a memoryless channel with feedback the basic result is that the presence of feedback does not increase the transmission rate of the channel, although it may substantially decrease the complexity of the encoding and decoding devices.

Other generalizations to be mentioned are the theory of information transmission over channels with error synchronization, in which random synchronizations are possible, with as result the disturbance of the one-to-one correspondence between the input and output signal, as well as the theory of transmission over a channel with multiple directions, when there are several sources and receivers of information, and where transmission may proceed over several directions at the same time.

References

[1]	C. Shannon, "A mathematical theory of communication" Bell Systems Techn. J. , 27 (1948) pp. 379–423; 623–656
[2]	P.L. Dobrushin, "A general formulation of the fundamental theorem of Shannon in information theory" Uspekhi Mat. Nauk , 14 : 4 (1959) pp. 3–104 (In Russian)
[3]	J. Wolfowitz, "Coding theorems of information theory" , Springer (1964)
[4]	R. Gallager, "Theory of information and reliable communication" , Wiley (1968)
[5]	A.A. Feinstein, "Foundations of information theory" , McGraw-Hill (1968)
[6]	R.M. Fano, "Transmission of information. Statistical theory of communications" , M.I.T. (1963)
[7]	A.A. Kharkevich, "Channels with noise" , Moscow (1965) (In Russian)
[8]	J.M. Wozencraft, I.M. Jacobs, "Principles of communication engineering" , Wiley (1965)
[9]	A.N. Kolmogorov, "Three approaches to the definition of the concept of "amount of information" " Probl. Peredachi Inform. , 1 : 1 (1965) pp. 3–11 (In Russian)
[10]	M.S. Pinsker, "Information and informational stability of random variables and processes" , Holden-Day (1964) (Translated from Russian)
[11]	B.R. Levin, "Theoretical foundations of statistical radiotechnics" , Moscow (1974) (In Russian)
[12]	A.N. Kolmogorov, "The theory of information transmission" , Meeting of the USSR Acad. Sci. on Scientific Problems of Automatizing Society 15–20 Oct. 1956, Talin. , 1 , Moscow (1957) pp. 66–99 (In Russian)
[13]	D. Slepian (ed.) , Key papers in the development of information theory , IEEE (1974)
[14]	, Proc. 1975 IEEE-USSR Joint Workshop Inform. Theory (Moscow, 15–19 Dec. 1975) , IEEE (1976)

How to Cite This Entry:
Information, transmission of. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Information,_transmission_of&oldid=11654

This article was adapted from an original article by R.L. DobrushinV.V. Prelov (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article

Navigation

Tools

Namespaces

Variants

Views

Actions

Difference between revisions of "Information, transmission of"

Revision as of 22:12, 5 June 2020

References

@@ Line 1: / Line 1: @@
+<!--
+i0511501.png
+$#A+1 = 249 n = 0
+$#C+1 = 249 : ~/encyclopedia/old_files/data/I051/I.0501150 Information, transmission of
+Automatically converted into TeX, above some diagnostics.
+Please remove this comment and the {{TEX|auto}} line below,
+if TeX found to be correct.
+-->
+{{TEX|auto}}
+{{TEX|done}}
 The part of [[Information theory|information theory]] related to the study of processes of information transfer from a source to a receiver (the addressee), cf. [[Information, source of|Information, source of]]. In the theory of information transmission one studies optimum and near-optimum methods of information transmission over a [[Communication channel|communication channel]] under the assumption that the methods of encoding the message into the input signal of the channel and of decoding the output signal of the channel into an output message may vary within wide ranges (cf. [[Coding and decoding|Coding and decoding]]).
-The general scheme of systems of information transmission, first considered by C. Shannon [[#References|[1]]], can be described as follows. The source of information produces the message that has to be transmitted over the communication channel from the source to the receiver. It is usually assumed that this message is a random variable defined on some [[Probability space|probability space]] <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i0511501.png" />, taking values in a [[Measurable space|measurable space]] <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i0511502.png" /> and having a [[Probability distribution|probability distribution]] <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i0511503.png" />. Often <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i0511504.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i0511505.png" /> is the set of parameter values of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i0511506.png" />, is a stochastic process in discrete or continuous time and with values in a measurable space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i0511507.png" />. E.g., in the case of discrete time <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i0511508.png" /> or <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i0511509.png" />; the random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115010.png" />, taking values in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115011.png" />, are called the components of the input message and are often treated as the messages produced by the source at the moment of time <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115012.png" />. The sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115013.png" />, taking values in the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115014.png" />-fold product of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115015.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115016.png" />, are called segments of length <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115018.png" /> of the input message. The corresponding concepts are defined in an analogous way for the case of a continuous-time stochastic process as message.
+The general scheme of systems of information transmission, first considered by C. Shannon [[#References|[1]]], can be described as follows. The source of information produces the message that has to be transmitted over the communication channel from the source to the receiver. It is usually assumed that this message is a random variable defined on some [[Probability space|probability space]]  $  ( \Omega , \mathfrak A , {\mathsf P} ) $,
+taking values in a [[Measurable space|measurable space]]  $  ( \mathfrak X , S _ {\mathfrak X }  ) $
+and having a [[Probability distribution|probability distribution]]  $  p ( \cdot ) $.
+Often  $  \xi = \{ {\xi ( t) } : {t \in \Delta } \} $,
+where  $  \Delta $
+is the set of parameter values of  $  t $,
+is a stochastic process in discrete or continuous time and with values in a measurable space  $  ( X , S _ {X} ) $.
+E.g., in the case of discrete time  $  \xi = \{ {\xi _ {k} } : {k = 1 , 2 ,\dots } \} $
+or  $  \xi = \{ {\xi _ {k} } : {\dots  , - 1 , 0 , 1 ,\dots } \} $;
+the random variables  $  \xi _ {k} $,
+taking values in  $  ( X , S _ {X} ) $,
+are called the components of the input message and are often treated as the messages produced by the source at the moment of time  $  k $.
+The sets  $  \xi  ^ {n} = ( \xi _ {1} \dots \xi _ {n} ) $,
+taking values in the  $  n $-
+fold product of  $  ( X , S _ {X} ) $,
+$  ( X  ^ {n} , S _ {X  ^ {n}  } ) $,
+are called segments of length  $  n $
+of the input message. The corresponding concepts are defined in an analogous way for the case of a continuous-time stochastic process as message.
-The output message, received by the receiver, is also a random variable <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115019.png" />, defined on the same probability space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115020.png" /> and taking values in a measurable space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115021.png" /> (in general, different from <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115022.png" />). When <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115023.png" /> is a stochastic process in discrete or continuous time one introduces in an analogous way the concepts of the space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115024.png" /> of values of the components at the output, and the space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115025.png" /> of values of the segments of length <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115026.png" /> of the output message.
+The output message, received by the receiver, is also a random variable  $  \widetilde \xi   $,
+defined on the same probability space  $  ( \Omega , \mathfrak A , {\mathsf P} ) $
+and taking values in a measurable space  $  ( \widetilde{\mathfrak X}  , S _ {\widetilde{\mathfrak X}  }  ) $(
+in general, different from  $  ( \mathfrak X , S _ {\mathfrak X }  ) $).
+When  $  \widetilde \xi   $
+is a stochastic process in discrete or continuous time one introduces in an analogous way the concepts of the space  $  ( \widetilde{X}  , S _ {\widetilde{X}  }  ) $
+of values of the components at the output, and the space  $  ( \widetilde{X}  {}  ^ {n} , S _ {\widetilde{X}  {}  ^ {n}  } ) $
+of values of the segments of length  $  n $
+of the output message.
-The exactness of reproducibility of information is taken as the measure for the quality of the transmission of the message over the communication channel (cf. [[Information, exactness of reproducibility of|Information, exactness of reproducibility of]]). As a rule, if the transmission is over a communication channel with noise, even if the sets <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115027.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115028.png" /> coincide, it is impossible to obtain absolute exactness, i.e. complete coincidence of the messages sent and received. Often, the requirement reflecting the exactness is treated statistically, by introducing the class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115029.png" /> of admissible joint probability distributions for pairs <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115030.png" /> of messages sent and received in the set of all probability measures on the product <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115031.png" />. The class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115032.png" /> is often given by means of a non-negative measurable function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115033.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115034.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115035.png" />, and a number <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115036.png" />: It is considered that a probability distribution of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115037.png" /> belongs to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115038.png" /> only if
+The exactness of reproducibility of information is taken as the measure for the quality of the transmission of the message over the communication channel (cf. [[Information, exactness of reproducibility of|Information, exactness of reproducibility of]]). As a rule, if the transmission is over a communication channel with noise, even if the sets  $  \mathfrak X $
+and  $  \widetilde{\mathfrak X}  $
+coincide, it is impossible to obtain absolute exactness, i.e. complete coincidence of the messages sent and received. Often, the requirement reflecting the exactness is treated statistically, by introducing the class  $  W $
+of admissible joint probability distributions for pairs  $  ( \xi , \widetilde \xi   ) $
+of messages sent and received in the set of all probability measures on the product  $  ( \mathfrak X \times \widetilde{\mathfrak X}  , S _ {\mathfrak X }  \times S _ {\widetilde{\mathfrak X}  }  ) $.
+The class  $  W $
+is often given by means of a non-negative measurable function  $  \rho ( x , \widetilde{x}  ) $,
+$  x \in \mathfrak X $,
+$  \widetilde{x}  \in \widetilde{\mathfrak X}  $,
+and a number  $  a > 0 $:
+It is considered that a probability distribution of  $  ( \xi , \widetilde \xi   ) $
+belongs to  $  W $
+only if
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115039.png" /></td> <td valign="top" style="width:5%;text-align:right;">(1)</td></tr></table>
+$$ \tag{1 }
+{\mathsf E} \rho
+( \xi , \widetilde \xi   )
+ \leq   a .
+$$
 Thus, the condition of exactness of reproducibility indicates by what amount the message received can differ from the message sent.
-Messages produced by a source are sent over a communication channel. A channel <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115040.png" /> is a set of two measurable spaces <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115041.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115042.png" />; a transition function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115043.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115044.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115045.png" />, that is measurable with respect to the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115046.png" />-algebra <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115047.png" /> for fixed <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115048.png" /> and is a probability measure on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115049.png" /> for fixed <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115050.png" />; and a subset <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115051.png" /> in the space of all probability measures on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115052.png" />. The spaces <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115053.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115054.png" /> are called, respectively, the spaces of input and output signals of the channel, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115055.png" /> is a restriction on the distribution of the input signal. One says that two random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115056.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115057.png" /> (defined on a probability space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115058.png" />) are related through a channel <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115059.png" /> if they take values in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115060.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115061.png" />, respectively, and if for any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115062.png" /> with probability 1 the conditional probability
+Messages produced by a source are sent over a communication channel. A channel  $  ( Q , V ) $
+is a set of two measurable spaces  $  ( \mathfrak Y , S _ {\mathfrak Y }  ) $,
+$  ( \widetilde{\mathfrak Y}  , S _ {\widetilde{\mathfrak Y}  }  ) $;
+a transition function  $  Q ( y , A ) $,
+$  y \in \mathfrak Y $,
+$  A \in S _ {\widetilde{\mathfrak Y}  }  $,
+that is measurable with respect to the  $  \sigma $-
+algebra  $  S _ {\mathfrak Y }  $
+for fixed  $  A \in S _ {\widetilde{\mathfrak Y}  }  $
+and is a probability measure on  $  ( \widetilde{\mathfrak Y}  , S _ {\widetilde{\mathfrak Y}  }  ) $
+for fixed  $  y \in \mathfrak Y $;
+and a subset  $  V $
+in the space of all probability measures on  $  ( \mathfrak Y , S _ {\mathfrak Y }  ) $.
+The spaces  $  ( \mathfrak Y , S _ {\mathfrak Y }  ) $,
+$  ( \widetilde{\mathfrak Y}  , S _ {\widetilde{\mathfrak Y}  }  ) $
+are called, respectively, the spaces of input and output signals of the channel, and  $  V $
+is a restriction on the distribution of the input signal. One says that two random variables  $  \eta $
+and  $  \widetilde \eta   $(
+defined on a probability space  $  ( \Omega , \mathfrak A , {\mathsf P} ) $)
+are related through a channel  $  ( Q , V ) $
+if they take values in  $  ( \mathfrak Y , S _ {\mathfrak Y }  ) $
+and  $  ( \widetilde{\mathfrak Y}  , S _ {\widetilde{\mathfrak Y}  }  ) $,
+respectively, and if for any  $  A \in S _ {\widetilde{\mathfrak Y}  }  $
+with probability 1 the conditional probability
+$$ \tag{2 }
+{\mathsf P} \{ \widetilde \eta   \in A \mid  \eta \}  = \
+Q ( \eta , A ) ,
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115063.png" /></td> <td valign="top" style="width:5%;text-align:right;">(2)</td></tr></table>
+and the probability distribution of  $  \eta $
+belongs to  $  V $.
+Most often  $  V $
+is given by a measurable function  $  \pi ( y) $,
+$  y \in \mathfrak Y $,
+and a number  $  b > 0 $:
+It is considered that the probability distribution of  $  \eta $
+belongs to  $  V $
+only if
-and the probability distribution of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115064.png" /> belongs to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115065.png" />. Most often <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115066.png" /> is given by a measurable function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115067.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115068.png" />, and a number <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115069.png" />: It is considered that the probability distribution of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115070.png" /> belongs to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115071.png" /> only if
+$$ \tag{3 }
+{\mathsf E} \pi ( \eta )  \leq   b .
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115072.png" /></td> <td valign="top" style="width:5%;text-align:right;">(3)</td></tr></table>
+In the case of discrete channels  $  V $
+usually coincides with the set of all probability distributions, i.e. there is no restriction.
-In the case of discrete channels <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115073.png" /> usually coincides with the set of all probability distributions, i.e. there is no restriction.
+In fact,  $  \mathfrak Y $
+is the set of signals given by the transmitter, and  $  \widetilde{\mathfrak Y}  $
+is the set of signals taken by the receiver (in applications  $  \mathfrak Y $
+and  $  \widetilde{\mathfrak Y}  $
+often coincide). If the random variable  $  \eta $
+of the input signal is known, then (2) makes it possible to find the conditional distribution of the output signal  $  \widetilde \eta   $.
+The introduction of the restriction  $  V $
+is related to the fact that in many applications the distributions of the input signal cannot be taken arbitrarily (the case when it is supposed that the mean value of the square (the power) of the input signal does not exceed a fixed constant is typical). The case when the input and output signals are discrete- or continuous-time stochastic processes  $  \eta = \{ \eta ( t) \} $,
+$  \widetilde \eta   = \{ \widetilde \eta   ( t) \} $
+defined on a certain finite or infinite (at one or both sides) interval on the real axis and taking values in certain measurable spaces  $  ( Y , S _ {Y} ) $
+and  $  ( \widetilde{Y}  , S _ {\widetilde{Y}  }  ) $,
+respectively, is important in applications. E.g., if  $  \eta = \{ \eta _ {1} , \eta _ {2} ,\dots \} $
+and  $  \widetilde \eta   = \{ \widetilde \eta   _ {1} , \widetilde \eta   _ {2} ,\dots \} $
+are random sequences, then the communication channel for which  $  \eta $
+and  $  \widetilde \eta   $
+serve as input and output signals is often regarded as a sequence of channels (in the sense described above), called segments of the given channel; the input and output signals of these segments are the vectors
-In fact, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115074.png" /> is the set of signals given by the transmitter, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115075.png" /> is the set of signals taken by the receiver (in applications <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115076.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115077.png" /> often coincide). If the random variable <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115078.png" /> of the input signal is known, then (2) makes it possible to find the conditional distribution of the output signal <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115079.png" />. The introduction of the restriction <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115080.png" /> is related to the fact that in many applications the distributions of the input signal cannot be taken arbitrarily (the case when it is supposed that the mean value of the square (the power) of the input signal does not exceed a fixed constant is typical). The case when the input and output signals are discrete- or continuous-time stochastic processes <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115081.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115082.png" /> defined on a certain finite or infinite (at one or both sides) interval on the real axis and taking values in certain measurable spaces <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115083.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115084.png" />, respectively, is important in applications. E.g., if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115085.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115086.png" /> are random sequences, then the communication channel for which <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115087.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115088.png" /> serve as input and output signals is often regarded as a sequence of channels (in the sense described above), called segments of the given channel; the input and output signals of these segments are the vectors
+$$
+\eta  ^ {n}  =  ( \eta _ {1} \dots \eta _ {n} ) \ \
+\textrm{ and } \  \widetilde \eta   {}  ^ {n}  = \
+( \widetilde \eta   _ {1} \dots \widetilde \eta   _ {n} ) ,\ \
+n = 1 , 2 ,\dots .
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115089.png" /></td> </tr></table>
+In order to convert the input message into a signal transmittable over the communication channel and the signal received at the output of the channel into an output message it is necessary to perform operations of encoding and decoding of messages. An encoding is a function  $  f ( x) $
+of  $  x \in \mathfrak X $
+with values in  $  \mathfrak Y $,
+and a decoding is a function  $  g ( \widetilde{y}  ) $
+of  $  \widetilde{y}  \in \widetilde{\mathfrak Y}  $
+with values in  $  \widetilde{\mathfrak X}  $.
+The set of values of  $  f ( x) $,
+$  x \in \mathfrak X $,
+is called a code, and the individual elements of this set are called code words. Using an encoding  $  f ( x) $
+and a decoding  $  g ( \widetilde{y}  ) $
+means that if the message took the value  $  x \in \mathfrak X $,
+then one transmits  $  y = f ( x) $
+over the channel; if  $  \widetilde{y}  \in \widetilde{\mathfrak Y}  $
+is received at the output of the channel, then it is decoded into the output message  $  \widetilde{x}  = g ( \widetilde{y}  ) $.
+One often considers random encodings in the theory of information transmission, i.e. the code words are chosen randomly in correspondence with a certain probability distribution.
-In order to convert the input message into a signal transmittable over the communication channel and the signal received at the output of the channel into an output message it is necessary to perform operations of encoding and decoding of messages. An encoding is a function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115090.png" /> of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115091.png" /> with values in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115092.png" />, and a decoding is a function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115093.png" /> of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115094.png" /> with values in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115095.png" />. The set of values of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115096.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115097.png" />, is called a code, and the individual elements of this set are called code words. Using an encoding <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115098.png" /> and a decoding <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i05115099.png" /> means that if the message took the value <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150100.png" />, then one transmits <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150101.png" /> over the channel; if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150102.png" /> is received at the output of the channel, then it is decoded into the output message <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150103.png" />. One often considers random encodings in the theory of information transmission, i.e. the code words are chosen randomly in correspondence with a certain probability distribution.
+A message with probability distribution  $  p ( \cdot ) $,
+produced by the source, can be transmitted with exactness of reproducibility  $  W $
+over a channel  $  ( Q , V ) $
+by means of an encoding  $  f ( \cdot ) $
+and a decoding  $  g ( \cdot ) $
+if random variables  $  \xi $,
+$  \eta $,
+$  \widetilde \eta   $,
+$  \widetilde \xi   $
+forming a [[Markov chain|Markov chain]] can be constructed such that  $  \xi $
+has probability distribution  $  p ( \cdot ) $,
+the probability distribution of  $  ( \xi , \widetilde \xi   ) $
+belongs to  $  W $,
+the pair  $  ( \eta , \widetilde \eta   ) $
+is related through  $  ( Q , V ) $,
+and
-A message with probability distribution <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150104.png" />, produced by the source, can be transmitted with exactness of reproducibility <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150105.png" /> over a channel <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150106.png" /> by means of an encoding <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150107.png" /> and a decoding <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150108.png" /> if random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150109.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150110.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150111.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150112.png" /> forming a [[Markov chain|Markov chain]] can be constructed such that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150113.png" /> has probability distribution <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150114.png" />, the probability distribution of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150115.png" /> belongs to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150116.png" />, the pair <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150117.png" /> is related through <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150118.png" />, and
+$$ \tag{4 }
+\eta  =  f ( \xi ) ,\ \
+\xi  =  g ( \widetilde \eta   ) .
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150119.png" /></td> <td valign="top" style="width:5%;text-align:right;">(4)</td></tr></table>
+The assumption that  $  \xi $,
+$  \eta $,
+$  \widetilde \eta   $,
+$  \widetilde \xi   $
+form a Markov chain reduces to the assumption that the [[Conditional probability|conditional probability]] of  $  \widetilde \eta   $
+for fixed values of  $  \xi $
+and  $  \eta $
+depends on  $  \eta $
+only, i.e. it means that the output signal depends only on the input signal and not on the value of the message encoded by it.
-The assumption that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150120.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150121.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150122.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150123.png" /> form a Markov chain reduces to the assumption that the [[Conditional probability|conditional probability]] of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150124.png" /> for fixed values of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150125.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150126.png" /> depends on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150127.png" /> only, i.e. it means that the output signal depends only on the input signal and not on the value of the message encoded by it.
+The basic problem studied in the transmission of information is as follows. One considers known and fixed: a source generating messages with probability densities  $  p ( \cdot ) $;
+a communication channel  $  ( Q , V ) $;
+and a condition of exactness of reproducibility  $  W $.
+The problem is to clarify: Under what conditions does there exist an encoding  $  f ( \cdot ) $
+and a decoding  $  g ( \cdot ) $
+such that a message produced by the given source can be transmitted with given  $  W $
+over  $  ( Q , V ) $?
+Solutions to this problem for various assumptions are called coding theorems, or Shannon theorems. Another naturally arising problem is that when transmittance is possible, to construct the most simple and efficient ways of encoding and decoding the transmission.
-The basic problem studied in the transmission of information is as follows. One considers known and fixed: a source generating messages with probability densities <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150128.png" />; a communication channel <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150129.png" />; and a condition of exactness of reproducibility <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150130.png" />. The problem is to clarify: Under what conditions does there exist an encoding <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150131.png" /> and a decoding <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150132.png" /> such that a message produced by the given source can be transmitted with given <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150133.png" /> over <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150134.png" />? Solutions to this problem for various assumptions are called coding theorems, or Shannon theorems. Another naturally arising problem is that when transmittance is possible, to construct the most simple and efficient ways of encoding and decoding the transmission.
+Shannon [[#References|[1]]] introduced quantities that allow one to formulate an answer to the first problem posed. The amount of information, or simply the information,  $  I ( \cdot , \cdot ) $
+is the most important among these (cf. [[Information, amount of|Information, amount of]]). If
-Shannon [[#References|[1]]] introduced quantities that allow one to formulate an answer to the first problem posed. The amount of information, or simply the information, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150135.png" /> is the most important among these (cf. [[Information, amount of|Information, amount of]]). If
+$$
+C  =  C ( Q , V )  = \
+\sup  I ( \eta , \widetilde \eta   )
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150136.png" /></td> </tr></table>
+is the capacity of the channel  $  ( Q , V ) $(
+cf. [[Transmission rate of a channel|Transmission rate of a channel]]), where the supremum is over all pairs  $  ( \eta , \widetilde \eta   ) $
+related through  $  ( Q , V ) $,
+and if the number
-is the capacity of the channel <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150137.png" /> (cf. [[Transmission rate of a channel|Transmission rate of a channel]]), where the supremum is over all pairs <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150138.png" /> related through <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150139.png" />, and if the number
+$$ \tag{6 }
+H _ {W} ( p)  =  \inf  I ( \xi , \widetilde \xi   )
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150140.png" /></td> <td valign="top" style="width:5%;text-align:right;">(6)</td></tr></table>
+is the  $  W $-
+entropy (cf. [[Entropy|Entropy]]) of the message, where the infimum is over all pairs  $  ( \xi , \widetilde \xi   ) $
+such that the joint probability distribution of  $  ( \xi , \widetilde \xi   ) $
+belongs to  $  W $,
+while  $  \xi $
+has probability distribution  $  p ( \cdot ) $,
+then the following theorem of Shannon (a converse of coding theorems) holds: If a message with probability distribution  $  p ( \cdot ) $
+can be transmitted over  $  ( Q , V ) $
+with exactness of reproducibility  $  W $,
+then
-is the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150141.png" />-entropy (cf. [[Entropy|Entropy]]) of the message, where the infimum is over all pairs <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150142.png" /> such that the joint probability distribution of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150143.png" /> belongs to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150144.png" />, while <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150145.png" /> has probability distribution <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150146.png" />, then the following theorem of Shannon (a converse of coding theorems) holds: If a message with probability distribution <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150147.png" /> can be transmitted over <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150148.png" /> with exactness of reproducibility <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150149.png" />, then
+$$ \tag{7 }
+H _ {W} ( p)  \leq  \
+C ( Q , V ) .
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150150.png" /></td> <td valign="top" style="width:5%;text-align:right;">(7)</td></tr></table>
+Sufficient conditions for the possibility of transmission of information are more difficult to obtain. Thus, (7) is sufficient only in a certain asymptotic sense, in which the main assumption is that  $  H _ {W} ( p) \rightarrow \infty $.
+Hence, (7) is necessary and sufficient only, roughly speaking, when applied to the problem of transmitting a sufficiently large amount of information. The remaining necessary assumptions are of the nature of regularity assumptions, which in concrete cases are usually fulfilled. In order to formulate sufficient conditions for the possibility of transmission in exact terms it is necessary to introduce supplementary concepts.
-Sufficient conditions for the possibility of transmission of information are more difficult to obtain. Thus, (7) is sufficient only in a certain asymptotic sense, in which the main assumption is that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150151.png" />. Hence, (7) is necessary and sufficient only, roughly speaking, when applied to the problem of transmitting a sufficiently large amount of information. The remaining necessary assumptions are of the nature of regularity assumptions, which in concrete cases are usually fulfilled. In order to formulate sufficient conditions for the possibility of transmission in exact terms it is necessary to introduce supplementary concepts.
+A sequence of pairs of random variables  $  \{ ( \zeta  ^ {t} , \widetilde \zeta   {}  ^ {t} ) \} _ {t=} 1  ^  \infty  $
+is called information-stable if  $  0 < I ( \zeta  ^ {t} , \widetilde \zeta   {}  ^ {t} ) < \infty $
+and
-A sequence of pairs of random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150152.png" /> is called information-stable if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150153.png" /> and
+$$ \tag{8 }
+\lim\limits _ {t \rightarrow \infty } \
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150154.png" /></td> <td valign="top" style="width:5%;text-align:right;">(8)</td></tr></table>
+\frac{i _ {\zeta  ^ {t}  \widetilde \zeta   {}  ^ {t} }
+( \zeta  ^ {t} , \widetilde \zeta   {}  ^ {t} ) }{I ( \zeta  ^ {t} , \widetilde \zeta   {}  ^ {t} ) }
-in the sense of convergence in probability. Here <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150155.png" /> is the information density (cf. [[Information, amount of|Information, amount of]]) of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150156.png" />. A sequence of channels <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150157.png" /> with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150158.png" /> is called information-stable if there exists an information-stable sequence of pairs <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150159.png" />, related through <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150160.png" />, such that
+ =  1
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150161.png" /></td> <td valign="top" style="width:5%;text-align:right;">(9)</td></tr></table>
+in the sense of convergence in probability. Here  $  i _ {\zeta  ^ {t}  \widetilde \zeta   {}  ^ {t} } ( \cdot , \cdot ) $
+is the information density (cf. [[Information, amount of|Information, amount of]]) of  $  ( \zeta  ^ {t} , \widetilde \zeta   {}  ^ {t} ) $.
+A sequence of channels  $  \{ ( Q  ^ {t} , V  ^ {t} ) \} _ {t=} 1  ^  \infty  $
+with  $  C ( Q  ^ {t} , V  ^ {t} ) < \infty $
+is called information-stable if there exists an information-stable sequence of pairs  $  ( \eta  ^ {t} , \widetilde \eta   {}  ^ {t} ) $,
+related through  $  ( Q  ^ {t} , V  ^ {t} ) $,
+such that
-A sequence of messages with probability distributions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150162.png" /> and exactness conditions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150163.png" /> with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150164.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150165.png" /> is called information-stable if there exists a sequence of pairs <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150166.png" /> such that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150167.png" /> has probability density <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150168.png" />, the probability distribution of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150169.png" /> belongs to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150170.png" />, and
+$$ \tag{9 }
+\lim\limits _ {t \rightarrow \infty } \
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150171.png" /></td> <td valign="top" style="width:5%;text-align:right;">(10)</td></tr></table>
+\frac{I ( \eta , \widetilde \eta   {}  ^ {t} ) }{C ( Q  ^ {t} , V  ^ {t} ) }
-Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150172.png" /> be the set of probability distributions for which (3) with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150173.png" /> replaced by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150174.png" /> holds, and let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150175.png" /> be the exactness condition given by (1) with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150176.png" /> replaced by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150177.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150178.png" />. The following coding theorem (Shannon) holds: Suppose one is given an information-stable sequence of messages with probability densities <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150179.png" /> and exactness conditions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150180.png" />, as well as an information-stable sequence of channels <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150181.png" /> such that the functions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150182.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150183.png" /> are uniformly bounded in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150184.png" />. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150185.png" /> as <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150186.png" /> and let
+ =  1 .
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150187.png" /></td> <td valign="top" style="width:5%;text-align:right;">(11)</td></tr></table>
+A sequence of messages with probability distributions  $  p  ^ {t} ( \cdot ) $
+and exactness conditions  $  W  ^ {t} $
+with  $  H _ {W  ^ {t}  } ( p  ^ {t} ) < \infty $,
+$  t = 1 , 2 \dots $
+is called information-stable if there exists a sequence of pairs  $  ( \xi  ^ {t} , \widetilde \xi   {}  ^ {t} ) $
+such that  $  \xi  ^ {t} $
+has probability density  $  p ( \cdot ) $,
+the probability distribution of  $  ( \xi  ^ {t} , \widetilde \xi   {}  ^ {t} ) $
+belongs to  $  W  ^ {t} $,
+and
-Then for any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150188.png" /> there exists an arbitrary large <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150189.png" /> such that for all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150190.png" /> the messages with probability distributions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150191.png" /> can be transmitted over <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150192.png" /> with exactness of reproducibility <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150193.png" />.
+$$ \tag{10 }
+\lim\limits _
+{t \rightarrow \infty } \
-This formulation of a direct coding theorem is most general. The assumption that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150194.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150195.png" /> are uniformly bounded in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150196.png" /> can be substantially weakened. The information stability of a sequence of messages or channels is true in a large number of particular cases of practical interest. Finally, under certain conditions one may replace <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150197.png" /> by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150198.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150199.png" /> by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150200.png" /> in the formulation of the theorem.
+\frac{I ( \xi  ^ {t} , \widetilde \xi   {}  ^ {t} ) }{H _ {W  ^ {t}  } ( p  ^ {t} ) }
+ =  1 .
+$$
+Let  $  V _  \epsilon  $
+be the set of probability distributions for which (3) with  $  b $
+replaced by  $  b + \epsilon $
+holds, and let  $  W _  \epsilon  $
+be the exactness condition given by (1) with  $  a $
+replaced by  $  a + \epsilon $,
+$  \epsilon > 0 $.
+The following coding theorem (Shannon) holds: Suppose one is given an information-stable sequence of messages with probability densities  $  p  ^ {t} ( \cdot ) $
+and exactness conditions  $  W  ^ {t} $,
+as well as an information-stable sequence of channels  $  ( Q  ^ {t} , V  ^ {t} ) $
+such that the functions  $  \pi  ^ {t} ( \cdot ) $
+and  $  \rho  ^ {t} ( \cdot , \cdot ) $
+are uniformly bounded in  $  t $.
+Let  $  H _ {W  ^ {t}  } ( p  ^ {t} ) \rightarrow \infty $
+as  $  t \rightarrow \infty $
+and let
+$$ \tag{11 }
+\lim\limits _ {t \rightarrow \infty }  \textrm{ i nf  } \
+\frac{C ( Q  ^ {t} , V  ^ {t} ) }{H _ {W  ^ {t}  } ( p  ^ {t} ) }
+ >  1 .
+$$
+Then for any  $  \epsilon > 0 $
+there exists an arbitrary large  $  t _ {0} $
+such that for all  $  t \geq  t _ {0} $
+the messages with probability distributions  $  p  ^ {t} ( \cdot ) $
+can be transmitted over  $  ( Q  ^ {t} , V  ^ {t} ) $
+with exactness of reproducibility  $  W _  \epsilon   ^ {t} $.
+This formulation of a direct coding theorem is most general. The assumption that  $  \pi  ^ {t} ( \cdot ) $
+and  $  \rho  ^ {t} ( \cdot , \cdot ) $
+are uniformly bounded in  $  t $
+can be substantially weakened. The information stability of a sequence of messages or channels is true in a large number of particular cases of practical interest. Finally, under certain conditions one may replace  $  W _  \epsilon   ^ {t} $
+by  $  W  ^ {t} $
+and  $  V _  \epsilon   ^ {t} $
+by  $  V  ^ {t} $
+in the formulation of the theorem.
 In the description of real situations the case when the sequence of channels considered in the theorem is the sequence of segments of a fixed channel, while the sequence of messages is the sequence of segments of a message from a fixed source with growing number of components is of most interest. This corresponds to the functioning of a communication system in time. Below some versions of the coding theorem and its converse are given for this situation, in a somewhat different form. Moreover, discrete stationary sources and memoryless communication channels are considered.
-Suppose that a discrete stationary source <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150201.png" /> produces messages <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150202.png" />, where the individual components (the letters of the message) <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150203.png" /> take values from some finite set of an alphabet <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150204.png" /> of volume <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150205.png" />, and generates letters of messages at the rate of one letter per unit of time. Let the components of the message <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150206.png" /> received by the addressee take values in the same alphabet <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150207.png" /> (i.e. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150208.png" />). Suppose further that a discrete [[Memoryless channel|memoryless channel]] is being used, and that transmission over it is at the rate of one symbol per time interval <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150209.png" />. Suppose that there are no restrictions on the distribution of the input signal of the channel. Suppose that a segment of length <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150210.png" /> of a message, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150211.png" />, is transmitted over the channel by intervals of length <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150212.png" /> (where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150213.png" /> is the integer part of a number <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150214.png" />) using certain encoding and decoding methods of the type described above. Then if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150215.png" /> is the corresponding segment of the message obtained by the addressee and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150216.png" /> is the average probability of an error in a source letter, defined by
+Suppose that a discrete stationary source  $  U $
+produces messages  $  \xi = \{ {\xi _ {k} } : {\dots, - 1 , 0 , 1 ,\dots } \} $,
+where the individual components (the letters of the message)  $  \xi _ {k} $
+take values from some finite set of an alphabet  $  X $
+of volume  $  M $,
+and generates letters of messages at the rate of one letter per unit of time. Let the components of the message  $  \widetilde \xi   = \{ {\widetilde \xi   _ {k} } : {\dots , - 1 , 0 , 1 ,\dots } \} $
+received by the addressee take values in the same alphabet  $  X $(
+i.e.  $  \widetilde{X}  = X $).
+Suppose further that a discrete [[Memoryless channel|memoryless channel]] is being used, and that transmission over it is at the rate of one symbol per time interval  $  t $.
+Suppose that there are no restrictions on the distribution of the input signal of the channel. Suppose that a segment of length  $  L = L _ {N} $
+of a message,  $  \xi  ^ {L} = ( \xi _ {1} \dots \xi _ {L} ) $,
+is transmitted over the channel by intervals of length  $  N = [ L / t ] $(
+where  $  [ x] $
+is the integer part of a number  $  x $)
+using certain encoding and decoding methods of the type described above. Then if  $  \widetilde \xi   {}  ^ {L} = ( \widetilde \xi   _ {1} \dots \widetilde \xi   _ {L} ) $
+is the corresponding segment of the message obtained by the addressee and  $  \widehat{P}  _ {e} $
+is the average probability of an error in a source letter, defined by
+$$ \tag{12 }
+\widehat{P}  _ {e}  = \
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150217.png" /></td> <td valign="top" style="width:5%;text-align:right;">(12)</td></tr></table>
+\frac{1}{L}
+\sum _ { l= } 1 ^ { L }
+{\mathsf P}
+\{ \widetilde \xi   _ {1} \neq \xi _ {l} \} ,
+$$
 then the following theorem holds.
-Converse of the coding theorem: Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150218.png" /> be the rate of creation of messages of the given discrete stationary source and let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150219.png" /> be the transmission rate (on transmitted symbols) of the memoryless channel used. Then for all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150220.png" />:
+Converse of the coding theorem: Let  $  H ( U) $
+be the rate of creation of messages of the given discrete stationary source and let  $  C $
+be the transmission rate (on transmitted symbols) of the memoryless channel used. Then for all  $  L $:
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150221.png" /></td> <td valign="top" style="width:5%;text-align:right;">(13)</td></tr></table>
+$$ \tag{13 }
+\widehat{P}  _ {e}   \mathop{\rm log} ( M - 1 ) + h ( \widehat{P}  _ {e} )
+ \geq   H ( U) -
+\frac{1} \tau
+ C ,
+$$
 where
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150222.png" /></td> </tr></table>
+$$
+h ( x)  =  - x   \mathop{\rm log}  x -
+( 1 - x )   \mathop{\rm log} ( 1 - x ) .
+$$
-Thus, if the rate of creation of messages <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150223.png" /> is larger than <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150224.png" /> (the transmission rate of the channel on a source letter), then the average probability of an error in a source letter is bounded from below by a non-zero constant, whatever <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150225.png" /> and the methods of encoding and decoding. Hence, this probability does not tend to zero as <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150226.png" />.
+Thus, if the rate of creation of messages  $  H ( U) $
+is larger than  $  C / \tau $(
+the transmission rate of the channel on a source letter), then the average probability of an error in a source letter is bounded from below by a non-zero constant, whatever  $  L $
+and the methods of encoding and decoding. Hence, this probability does not tend to zero as  $  L \rightarrow \infty $.
 In order to formulate the direct statement of a coding theorem one needs the quantity
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150227.png" /></td> <td valign="top" style="width:5%;text-align:right;">(14)</td></tr></table>
+$$ \tag{14 }
+R _ {N}  = \
+\frac{ \mathop{\rm log} _ {2}  M ^ {L _ {N} } }{N}
+ .
+$$
-When <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150228.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150229.png" /> is an integer one has <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150230.png" />, i.e. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150231.png" /> is in bits (cf. [[Bit|Bit]]) the number of binary symbols produced by the source within the transmission time of one symbol over the channel. Moreover, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150232.png" /> coincides with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150233.png" /> if the components are independent and identically distributed. The probability of an error and the average probability of an error in a block of the source are defined respectively, by,
+When  $  M = 2 $
+and  $  L _ {N} / \tau $
+is an integer one has  $  R _ {N} = \tau $,
+i.e.  $  R _ {N} $
+is in bits (cf. [[Bit|Bit]]) the number of binary symbols produced by the source within the transmission time of one symbol over the channel. Moreover,  $  R _ {N} $
+coincides with  $  H ( U) $
+if the components are independent and identically distributed. The probability of an error and the average probability of an error in a block of the source are defined respectively, by,
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150234.png" /></td> <td valign="top" style="width:5%;text-align:right;">(15)</td></tr></table>
+$$ \tag{15 }
+P _ {e,x  ^ {L}  }  = \
+{\mathsf P} _ {x  ^ {L}  }
+\{ \widetilde \xi   {}  ^ {L} \neq \xi  ^ {L} \} ,
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150235.png" /></td> <td valign="top" style="width:5%;text-align:right;">(16)</td></tr></table>
+$$ \tag{16 }
+\overline{P}\; _ {e}  =  \sum _ {x  ^ {L} } {\mathsf P} \{ \xi
+ ^ {L} = x  ^ {L} \} P _ {e,x  ^ {L}  } .
+$$
-Here <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150236.png" /> is the conditional probability under the condition <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150237.png" />. The following coding theorem holds: For all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150238.png" /> and any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150239.png" /> there are encoding and decoding methods such that for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150240.png" /> and all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150241.png" />,
+Here  $  P _ {x  ^ {L}  } ( \cdot ) $
+is the conditional probability under the condition  $  \xi  ^ {L} = x  ^ {L} \in X  ^ {L} $.
+The following coding theorem holds: For all  $  N $
+and any  $  R < C $
+there are encoding and decoding methods such that for  $  R _ {N} \leq  R $
+and all  $  x  ^ {L} \in X  ^ {L} $,
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150242.png" /></td> <td valign="top" style="width:5%;text-align:right;">(17)</td></tr></table>
+$$ \tag{17 }
+P _ {e,x  ^ {L}  }
+ \leq   e ^ {- N E ( R) }
+$$
-(this also holds for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150243.png" />). Moreover, for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150244.png" /> the function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150245.png" /> is convex, positive and decreases with increasing <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150246.png" /> (see also [[Erroneous decoding, probability of|Erroneous decoding, probability of]]). Thus, this theorem also shows that for all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150247.png" /> the probability of an error tends to zero, exponentially fast, as <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150248.png" />.
+(this also holds for  $  \overline{P}\; _ {e} $).
+Moreover, for  $  0 \leq  R < C $
+the function  $  E ( R) $
+is convex, positive and decreases with increasing  $  R $(
+see also [[Erroneous decoding, probability of|Erroneous decoding, probability of]]). Thus, this theorem also shows that for all  $  R < C $
+the probability of an error tends to zero, exponentially fast, as  $  R \rightarrow \infty $.
 There are generalizations of Shannon theorems to the cases of so-called compound channels and messages with independent parameters. Such generalizations are of interest because in practice it is impossible to regard the statistical parameters of the source of messages and the communication channel as completely known, since these parameters can sometimes change in the process of transmission. Therefore, it is appropriate to assure that the source of the messages and the communication channel belong to a certain class of possible sources and channels. One introduces, moreover, the minimax criterion for the quality of transmission, in which the quality of a given transmission method is estimated for the best possible sources and channels belonging to the given classes.
-There are also generalizations of Shannon theorems to the transmission of information over a [[Channel with feedback|channel with feedback]]. The presence of complete feedback means that at the moment of time <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150249.png" /> it is considered that at the transmitting side of the channel (i.e. at its input) all exact values of the output signals at all moments <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051150/i051150250.png" /> are known. In particular, for a [[Memoryless channel|memoryless channel]] with feedback the basic result is that the presence of feedback does not increase the transmission rate of the channel, although it may substantially decrease the complexity of the encoding and decoding devices.
+There are also generalizations of Shannon theorems to the transmission of information over a [[Channel with feedback|channel with feedback]]. The presence of complete feedback means that at the moment of time  $  t $
+it is considered that at the transmitting side of the channel (i.e. at its input) all exact values of the output signals at all moments  $  t  ^  \prime  < t $
+are known. In particular, for a [[Memoryless channel|memoryless channel]] with feedback the basic result is that the presence of feedback does not increase the transmission rate of the channel, although it may substantially decrease the complexity of the encoding and decoding devices.
 Other generalizations to be mentioned are the theory of information transmission over channels with error synchronization, in which random synchronizations are possible, with as result the disturbance of the one-to-one correspondence between the input and output signal, as well as the theory of transmission over a [[Channel with multiple directions|channel with multiple directions]], when there are several sources and receivers of information, and where transmission may proceed over several directions at the same time.