Information distance
A metric or pseudo-metric on the set of probability distributions, characterizing the "non-similarity" of the random phenomena described by these distributions. Most interesting is the information distance related to the measure of informativity of an experiment in the problem of differentiating between two distributions $P$ and $Q$ by observations.
In any concrete statistical problem it is necessary to make inferences on the observed phenomenon. These inferences are, as a rule, not exact, since the outcomes of observations are random. It is intuitively clear that any sample carries some amount of useful information. Moreover: A) information may only get lost in transmission; and B) information presented by different independent sources, e.g. independent samples, can be summed. Thus, if one introduces the informativity of an experiment as the average amount of information (cf. also Information, amount of) in an observation, then for it the axioms A) and B) are fulfilled. Although the concept of information remains intuitive, one can sometimes find a quantity satisfying A) and B) that describes asymptotically the average exactness of inferences in a problem with a growing number of observations, and that therefore can naturally be taken as the informativity. The informativity is either a numerical or a matrix quantity. An important example is the information matrix in the problem of estimating the parameter of a distribution law.
According to axiom B), informativities behave like squares of lengths, i.e. the square of a reasonable information distance must have the property of additivity. The simplest information distances between distributions $P$ and $Q$ are the distance in variation,
$$\rho_V(P,Q) = \int |P(d\omega) - Q(d\omega)|,$$
and the Fisher distance in an invariant Riemannian metric,
$$\rho_F(P,Q) = 2\arccos \int \sqrt{P(d\omega)\,Q(d\omega)}.$$
The latter does not have the property of additivity, and has no proper statistical meaning.
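As a rough numerical illustration on a finite outcome set, the sketch below assumes the standard discrete forms $\sum_i |p_i - q_i|$ for the distance in variation and $2\arccos\sum_i \sqrt{p_i q_i}$ for the Fisher distance. The function names and the example distributions are hypothetical. It compares the squared Fisher distance for two independent observations with twice the squared distance for a single observation; the two values differ, which is what failure of additivity means here.

```python
import math

def variation_distance(p, q):
    """Distance in variation: sum of |p_i - q_i| (no factor 1/2 assumed)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def fisher_distance(p, q):
    """Fisher distance: 2 * arccos of sum_i sqrt(p_i * q_i)."""
    affinity = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 2.0 * math.acos(min(1.0, affinity))  # clamp rounding above 1

def product(p, q):
    """Joint distribution of two independent observations."""
    return [a * b for a in p for b in q]

# Hypothetical example distributions on two outcomes.
p = [0.5, 0.5]
q = [0.9, 0.1]

d_one = fisher_distance(p, q)
d_two = fisher_distance(product(p, p), product(q, q))

print("distance in variation:", variation_distance(p, q))
print("squared Fisher distance, one observation :", d_one ** 2)
print("squared Fisher distance, two observations:", d_two ** 2)
print("twice the one-observation value          :", 2 * d_one ** 2)
```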
According to the Neyman–Pearson theory, all the useful information about differentiating between probability distributions $P$ and $Q$ on a common space $\Omega$ of outcomes $\omega$ is contained in the likelihood ratio or its logarithm
$$\ln\frac{P(d\omega)}{Q(d\omega)} = \ln\frac{p(\omega)}{q(\omega)},$$
where $p$ and $q$ are the densities of $P$ and $Q$ with respect to a common dominating measure; it is determined up to values on a set of outcomes of probability zero. The mathematical expectation
$$I(P:Q) = \int_\Omega \ln\frac{p(\omega)}{q(\omega)}\,P(d\omega) = \int_\Omega \frac{p(\omega)}{q(\omega)}\ln\frac{p(\omega)}{q(\omega)}\,Q(d\omega)$$
is called the (average) information for differentiating (according to Kullback) in favour of $P$ against $Q$, and also the relative entropy, or information deviation. The non-negative (perhaps infinite) quantity $I(P:Q)$ satisfies axioms A) and B). It characterizes the exactness of one-sided differentiation of $P$ against $Q$, having defined the maximal order of decrease of the probability $\beta_N$ of an error of the second kind (i.e. falsely accepting hypothesis $P$ when it is not true). As the number $N$ of independent observations grows, one has
$$\beta_N = \exp\{-N\,I(P:Q) + o(N)\}$$
for a fixed significance level $\alpha$ (the probability of an error of the first kind), $0 < \alpha < 1$.
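The properties claimed for the information deviation can be checked numerically on a finite outcome set. The following is a minimal sketch, assuming the discrete form $I(P:Q) = \sum_i p_i \ln(p_i/q_i)$; the helper names and the example distributions are hypothetical. It checks additivity over two independent observations (axiom B), non-increase when two outcomes are merged, which is one simple case of information loss (axiom A), and the lack of symmetry in the two arguments.

```python
import math

def information_deviation(p, q):
    """I(P:Q) = sum_i p_i * ln(p_i / q_i), with the convention 0 * ln 0 = 0."""
    total = 0.0
    for a, b in zip(p, q):
        if a > 0.0:
            if b == 0.0:
                return math.inf  # P puts mass where Q has none
            total += a * math.log(a / b)
    return total

def product(p, q):
    """Joint distribution of two independent observations."""
    return [a * b for a in p for b in q]

def merge_last_two(p):
    """Coarsen the outcome space by merging the last two outcomes."""
    return p[:-2] + [p[-2] + p[-1]]

# Hypothetical example distributions on three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.7, 0.2, 0.1]

i_pq = information_deviation(p, q)
i_qp = information_deviation(q, p)
i_joint = information_deviation(product(p, p), product(q, q))
i_coarse = information_deviation(merge_last_two(p), merge_last_two(q))

print(f"I(P:Q) = {i_pq:.6f}, I(Q:P) = {i_qp:.6f}")             # not symmetric
print(f"two observations: {i_joint:.6f}, 2*I(P:Q) = {2 * i_pq:.6f}")  # axiom B)
print(f"after merging outcomes: {i_coarse:.6f} <= {i_pq:.6f}")  # axiom A)
```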
The analogous quantity $I(Q:P)$ determines the maximal order of decrease of $\alpha_N$ for a fixed $\beta$, $0 < \beta < 1$. The relation of "similarity", in particular that of "similarity" of random phenomena, is not symmetric and, as a rule, $I(P:Q) \neq I(Q:P)$. The geometric interpretation of $I(P:Q)$ as half the square of the non-symmetric distance from $Q$ to $P$ proved to be natural in a number of problems in statistics. For such information distances the triangle inequality is not true, but a non-symmetric analogue of the Pythagorean theorem holds:
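$$I(R:P) = I(R:Q) + I(Q:P)$$
if the distribution $R$ satisfies
$$\int_\Omega \ln\frac{q(\omega)}{p(\omega)}\,R(d\omega) = \int_\Omega \ln\frac{q(\omega)}{p(\omega)}\,Q(d\omega).$$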
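The Pythagorean relation can be verified on a small example. The sketch below assumes the discrete information deviation $I(A:B) = \sum_i a_i \ln(a_i/b_i)$ and reads the relation as $I(R:P) = I(R:Q) + I(Q:P)$ whenever $\ln(q/p)$ has the same mean under $R$ and under $Q$. The construction is one hypothetical way of meeting that condition: $Q$ is an exponential tilt of $P$ by a statistic `f`, and $R$ is a perturbation of $Q$ that leaves the mean of `f` unchanged.

```python
import math

def information_deviation(a, b):
    """I(A:B) = sum_i a_i * ln(a_i / b_i) for strictly positive b."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0.0)

# P is uniform on four outcomes; f is a hypothetical statistic.
p = [0.25, 0.25, 0.25, 0.25]
f = [0.0, 1.0, 2.0, 3.0]

# Q is an exponential tilt of P: q_i proportional to p_i * exp(t * f_i).
t = 0.5
weights = [a * math.exp(t * x) for a, x in zip(p, f)]
z = sum(weights)
q = [w / z for w in weights]

# R = Q + eps * v, where v sums to zero and is orthogonal to f,
# so R is a distribution with the same mean of f as Q.
v = [1.0, -2.0, 1.0, 0.0]
eps = 0.02
r = [a + eps * b for a, b in zip(q, v)]

def mean_log_ratio(dist):
    """Mean of ln(q/p) under the distribution dist."""
    return sum(d * math.log(b / a) for d, a, b in zip(dist, p, q))

lhs = information_deviation(r, p)
rhs = information_deviation(r, q) + information_deviation(q, p)

print("mean of ln(q/p) under R and Q:", mean_log_ratio(r), mean_log_ratio(q))
print("I(R:P)          =", lhs)
print("I(R:Q) + I(Q:P) =", rhs)
```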
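A symmetric characteristic of similarity of $P$ and $Q$ arises when testing them by a minimax procedure. For an optimal test
$$\alpha_N = \beta_N = \exp\{-N\,C(P,Q) + o(N)\},$$
$$C(P,Q) = \min_R \max\{I(R:P),\ I(R:Q)\}$$
$$= -\ln \min_{0 \le u \le 1} \int_\Omega p^{u}(\omega)\,q^{1-u}(\omega)\,\mu(d\omega),$$
where $p$ and $q$ are the densities of $P$ and $Q$ with respect to a dominating measure $\mu$.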
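The minimax exponent can be computed directly for finite distributions. The sketch below rests on two assumptions: the exponent is taken in the form $-\ln\min_{0\le u\le 1}\sum_i p_i^u q_i^{1-u}$, known in the literature as the Chernoff information, and the minimizing $R$ is taken to be the normalized distribution proportional to $p^u q^{1-u}$ at the optimal $u$. The function names, the ternary search, and the example distributions are illustrative choices, and $p$ and $q$ are assumed strictly positive.

```python
import math

def information_deviation(a, b):
    """I(A:B) = sum_i a_i * ln(a_i / b_i) for strictly positive b."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0.0)

def log_affinity(p, q, u):
    """ln sum_i p_i^u * q_i^(1-u); convex in u and zero at u = 0 and u = 1."""
    return math.log(sum(a ** u * b ** (1.0 - u) for a, b in zip(p, q)))

def minimax_exponent(p, q, tol=1e-10):
    """Return (-min_u log_affinity, minimizing u) by ternary search on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_affinity(p, q, m1) < log_affinity(p, q, m2):
            hi = m2
        else:
            lo = m1
    u = 0.5 * (lo + hi)
    return -log_affinity(p, q, u), u

def tilted(p, q, u):
    """Distribution proportional to p_i^u * q_i^(1-u)."""
    w = [a ** u * b ** (1.0 - u) for a, b in zip(p, q)]
    z = sum(w)
    return [x / z for x in w]

# Hypothetical example distributions on three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.7, 0.2, 0.1]

c, u_star = minimax_exponent(p, q)
r = tilted(p, q, u_star)

print("minimax exponent:", c, "attained at u =", u_star)
print("I(R:P) =", information_deviation(r, p))
print("I(R:Q) =", information_deviation(r, q))
```

At the optimal $u$ the two deviations $I(R:P)$ and $I(R:Q)$ coincide with the exponent up to the search tolerance, which is the sense in which this $R$ balances the two hypotheses.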
Certain other information distances are related to the information deviation (cf. [1], [2]). For infinitesimally close $P$ and $Q$ the principal part of the information deviation, as well as of the square of any reasonable information distance, is given, up to a constant multiple $c$, by the Fisher quadratic form. For the information deviation
$$c = \tfrac{1}{2}.$$
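The local behaviour can be seen in a one-parameter example. The sketch below uses the Bernoulli family, for which the Fisher information at success probability $\theta$ is $1/(\theta(1-\theta))$, so the Fisher quadratic form of a displacement $\delta$ is $\delta^2/(\theta(1-\theta))$. The choice of family, the value of `theta`, and the function names are illustrative assumptions. The ratio of the information deviation to the quadratic form approaches $1/2$ as $\delta$ shrinks.

```python
import math

def bernoulli_deviation(a, b):
    """I(P_a : P_b) for Bernoulli distributions with success probabilities a, b."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

def bernoulli_fisher_information(theta):
    """Fisher information of the Bernoulli family at theta."""
    return 1.0 / (theta * (1.0 - theta))

theta = 0.3  # hypothetical base point
ratios = []
for delta in (1e-1, 1e-2, 1e-3):
    deviation = bernoulli_deviation(theta + delta, theta)
    quadratic_form = bernoulli_fisher_information(theta) * delta ** 2
    ratios.append(deviation / quadratic_form)
    print(f"delta = {delta:g}: deviation / quadratic form = {ratios[-1]:.5f}")
```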
On the other hand, any information deviation satisfying axiom A) only induces a topology which majorizes the topology induced by the distance in variation (defined above), [3], [4].
References
[1] S. Kullback, "Information theory and statistics", Wiley (1959)
[2] N.N. Chentsov [N.N. Čencov], "Statistical decision rules and optimal inference", Amer. Math. Soc. (1982) (Translated from Russian)
[3] I. Csiszár, "On topological properties of $f$-divergences"
[4] E.A. Morozova, N.N. Chentsov [N.N. Čencov], "Markov maps in noncommutative probability theory and mathematical statistics", in: Yu.V. Prokhorov et al. (eds.), Probability theory and mathematical statistics, VNU (1987) pp. 287–310
Information distance. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Information_distance&oldid=18595