
Information correlation coefficient



A measure of the dependence between two random variables $ X $ and $ Y $, defined as a function of the amount of information in one random variable about the other by:

$$ R ( X , Y ) = \sqrt {1 - e ^ {- 2I( X, Y) } } , $$

where $ I ( X , Y ) $ is the amount of information (cf. Information, amount of).
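
Since the amount of information is non-negative, the coefficient always satisfies $ 0 \leq R ( X , Y ) \leq 1 $: it vanishes exactly when $ I ( X , Y ) = 0 $, and it tends to $ 1 $ as $ I ( X , Y ) \rightarrow \infty $.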

The properties of $ R ( X , Y ) $ as a measure of dependence are completely determined by the properties of $ I ( X , Y ) $, which is itself a characteristic of the dependence between $ X $ and $ Y $. Nevertheless, the use of $ R ( X , Y ) $ as an information analogue of the correlation coefficient $ \rho $ is justified: for arbitrary random variables it has the advantage over $ \rho $ that, by the properties of information, $ R ( X , Y ) = 0 $ if and only if $ X $ and $ Y $ are independent. If $ X $ and $ Y $ have a joint normal distribution, then the two coefficients coincide, since in this case

$$ I( X, Y) = - \frac{1}{2} \mathop{\rm ln} ( 1 - \rho ^ {2} ) . $$
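
Substituting this expression into the definition of $ R $ shows directly that the two coefficients agree up to sign:

$$ R ( X , Y ) = \sqrt {1 - e ^ {\mathop{\rm ln} ( 1 - \rho ^ {2} ) } } = \sqrt {\rho ^ {2} } = | \rho | . $$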

The practical investigation of dependence by means of the information correlation coefficient is equivalent to analyzing the amount of information in contingency tables. The sample analogue of $ R $ is the coefficient

$$ \widehat{R} = \sqrt {1 - e ^ {- 2 \widehat{I} } } , $$

computed in terms of the information statistic $ \widehat{I} $:

$$ \widehat{I} = \sum_{i=1}^{s} \sum_{j=1}^{t} \frac{n_{ij}}{n} \mathop{\rm ln} \frac{n n_{ij}}{n_{i \cdot} n_{\cdot j}} , $$

where $ n $ is the number of observations, $ s $ and $ t $ are the numbers of grouping classes for the two characteristics, $ n _ {ij} $ is the number of observations in class $ ( i , j ) $, $ n _ {i \cdot } = \sum_{j=1}^{t} n _ {ij} $ and $ n _ {\cdot j } = \sum_{i=1}^{s} n _ {ij} $. Thus the problem of the distribution of the sample information coefficient reduces to the problem of the distribution of the sample information. The analysis of the sample information as a measure of dependence is complicated by the fact that $ \widehat{I} $ depends strongly on the grouping of the observations.
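
As a practical illustration, the following short Python sketch computes $ \widehat{I} $ and $ \widehat{R} $ from a contingency table of counts. It is a minimal sketch, not part of the original article: the function name information_correlation and the use of NumPy are this illustration's own choices, and empty cells are skipped in accordance with the convention $ 0 \cdot \mathop{\rm ln} 0 = 0 $.

import numpy as np

def information_correlation(table):
    # table: an s-by-t contingency table of counts n_ij
    # (hypothetical helper, not from the original article).
    n_ij = np.asarray(table, dtype=float)
    n = n_ij.sum()
    n_i = n_ij.sum(axis=1, keepdims=True)  # row totals n_{i.}
    n_j = n_ij.sum(axis=0, keepdims=True)  # column totals n_{.j}
    # Information statistic I-hat; empty cells contribute 0 ln 0 = 0.
    mask = n_ij > 0
    I_hat = np.sum((n_ij[mask] / n) * np.log(n * n_ij[mask] / (n_i * n_j)[mask]))
    # Sample information correlation coefficient R-hat.
    return np.sqrt(1.0 - np.exp(-2.0 * I_hat))

For example, a strongly diagonal table such as [[30, 10], [10, 30]] gives a value of about 0.48, while a table with all cell counts equal gives 0, reflecting the independence of the two groupings.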

References

[1] E.H. Linfoot, "An informational measure of correlation", Information and Control, 1 : 1 (1957) pp. 85–89
[2] S. Kullback, "Information theory and statistics", Wiley (1959)