# Bhattacharyya distance


## Latest revision as of 10:58, 29 May 2020

Several indices have been suggested in the statistical literature to reflect the degree of dissimilarity between any two probability distributions (cf. Probability distribution). Such indices have been variously called measures of distance between two distributions (see [a1], for instance), measures of separation (see [a2]), measures of discriminatory information [a3], [a4], and measures of variation-distance [a5]. While these indices have not all been introduced for exactly the same purpose, as the names given to them imply, they have the common property of increasing as the two distributions involved "move apart". An index with this property may be called a measure of divergence of one distribution from another. A general method for generating measures of divergence has been discussed in [a6].

The Bhattacharyya distance is a measure of divergence. It can be defined formally as follows. Let $ ( \Omega, B, \nu ) $ be a measure space, and let $ P $ be the set of all probability measures (cf. Probability measure) on $ B $ that are absolutely continuous with respect to $ \nu $. Consider two such probability measures $ {\mathsf P} _ {1} , {\mathsf P} _ {2} \in P $ and let $ p _ {1} $ and $ p _ {2} $ be their respective density functions with respect to $ \nu $.

The Bhattacharyya coefficient between $ {\mathsf P} _ {1} $ and $ {\mathsf P} _ {2} $, denoted by $ \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) $, is defined by

$$ \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) = \int _ \Omega \left ( \frac{d {\mathsf P} _ {1} }{d \nu } \cdot \frac{d {\mathsf P} _ {2} }{d \nu } \right ) ^ {1/2} \, d \nu , $$

where $ d {\mathsf P} _ {i} / d \nu $ is the Radon–Nikodým derivative (cf. Radon–Nikodým theorem) of $ {\mathsf P} _ {i} $ ($ i = 1, 2 $) with respect to $ \nu $. It is also known as the Kakutani coefficient [a9] and the Matusita coefficient [a10]. Note that $ \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) $ does not depend on the measure $ \nu $ dominating $ {\mathsf P} _ {1} $ and $ {\mathsf P} _ {2} $.
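For concreteness, here is a minimal numerical sketch of the coefficient for discrete distributions, taking $ \nu $ to be the counting measure so that the Radon–Nikodým derivatives are just the probability mass functions. The particular values in `p1` and `p2` are illustrative, not from the article.

```python
import numpy as np

# Illustrative probability mass functions on a three-point sample space;
# with nu the counting measure, dP_i/d nu is the mass function p_i.
p1 = np.array([0.2, 0.5, 0.3])
p2 = np.array([0.3, 0.3, 0.4])

# Bhattacharyya coefficient: rho = sum_x sqrt(p1(x) * p2(x))
rho = np.sum(np.sqrt(p1 * p2))

# rho always lies in [0, 1], and rho(P, P) = 1 for any distribution P.
assert 0.0 <= rho <= 1.0
assert np.isclose(np.sum(np.sqrt(p1 * p1)), 1.0)
```

For these particular mass functions the coefficient comes out close to, but below, one, reflecting that the two distributions overlap heavily but are not identical.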

It is easy to verify that

i) $ 0 \leq \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) \leq 1 $;

ii) $ \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) = 1 $ if and only if $ {\mathsf P} _ {1} = {\mathsf P} _ {2} $;

iii) $ \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) = 0 $ if and only if $ {\mathsf P} _ {1} $ is orthogonal to $ {\mathsf P} _ {2} $.

The Bhattacharyya distance between two probability distributions $ {\mathsf P} _ {1} $ and $ {\mathsf P} _ {2} $, denoted by $ B ( 1, 2 ) $, is defined by

$$ B ( 1, 2 ) = - \ln \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) . $$

Clearly, $ 0 \leq B ( 1,2 ) \leq \infty $. The distance $ B ( 1,2 ) $ does not satisfy the triangle inequality (see [a7]). The Bhattacharyya distance arises as a special case of the Chernoff distance (taking $ t = 1/2 $):

$$ - \ln \inf _ {0 \leq t \leq 1 } \int _ \Omega p _ {1} ^ {t} p _ {2} ^ {1 - t } \, d \nu . $$
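Both claims above can be checked numerically for discrete distributions (counting measure $ \nu $): the $ t = 1/2 $ term of the Chernoff integrand gives exactly $ B(1,2) $, so the Chernoff distance, being an infimum inside the logarithm, is at least as large; and a simple three-distribution example exhibits the failure of the triangle inequality. All distributions below are illustrative values, not from the article.

```python
import numpy as np

def rho(p, q):
    # Bhattacharyya coefficient for discrete distributions (nu = counting measure)
    return np.sum(np.sqrt(p * q))

def bhattacharyya(p, q):
    # B(1,2) = -ln rho(P_1, P_2)
    return -np.log(rho(p, q))

def chernoff(p, q, grid=1001):
    # Chernoff distance: -ln inf_{0 <= t <= 1} sum_x p(x)^t q(x)^(1-t),
    # with the infimum approximated on a grid of t values (t = 1/2 included).
    ts = np.linspace(0.0, 1.0, grid)
    return -np.log(min(np.sum(p ** t * q ** (1 - t)) for t in ts))

p1 = np.array([0.2, 0.5, 0.3])
p2 = np.array([0.3, 0.3, 0.4])

# Since t = 1/2 is one candidate in the infimum, Chernoff >= Bhattacharyya.
assert chernoff(p1, p2) >= bhattacharyya(p1, p2) - 1e-9

# Failure of the triangle inequality: a near-degenerate pair with a
# uniform distribution "between" them.
q1 = np.array([0.99, 0.01])
q2 = np.array([0.5, 0.5])
q3 = np.array([0.01, 0.99])
assert bhattacharyya(q1, q3) > bhattacharyya(q1, q2) + bhattacharyya(q2, q3)
```

In the last check, $ B(q_1, q_3) \approx 1.61 $ while the two legs through $ q_2 $ sum to only about $ 0.51 $, so no triangle inequality can hold.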

The Hellinger distance [a8] between two probability measures $ {\mathsf P} _ {1} $ and $ {\mathsf P} _ {2} $, denoted by $ H ( 1,2 ) $, is related to the Bhattacharyya coefficient by the following relation:

$$ H ( 1, 2 ) = 2 [ 1 - \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) ] . $$
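This relation follows by expanding the square in $ \int_\Omega ( \sqrt{p_1} - \sqrt{p_2} )^2 \, d\nu = 2 - 2 \rho $, which is $ H(1,2) $ in the convention used here. A quick discrete check (illustrative values, counting measure):

```python
import numpy as np

# Illustrative mass functions (nu = counting measure)
p1 = np.array([0.2, 0.5, 0.3])
p2 = np.array([0.3, 0.3, 0.4])

rho = np.sum(np.sqrt(p1 * p2))

# H(1,2) = integral of (sqrt(p1) - sqrt(p2))^2 d nu; expanding the square
# gives 1 + 1 - 2*rho = 2[1 - rho], matching the displayed relation.
H = np.sum((np.sqrt(p1) - np.sqrt(p2)) ** 2)
assert np.isclose(H, 2 * (1 - rho))
```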

$ B ( 1,2 ) $ is called the Bhattacharyya distance since it is defined through the Bhattacharyya coefficient. It should be noted that the distance defined in a statistical context by A. Bhattacharyya [a11] is different from $ B ( 1, 2 ) $.

The Bhattacharyya distance is used successfully in engineering and statistical sciences. In the context of control theory and in the study of the problem of signal selection [a7], $ B ( 1, 2 ) $ has been found superior to the Kullback–Leibler distance (cf. also Kullback–Leibler-type distance measures). If one uses the Bayes criterion for classification and attaches equal costs to each type of misclassification, then it has been shown [a12] that the total probability of misclassification is majorized by $ \exp \{ - B ( 1,2 ) \} $. In the case of equal covariances, maximization of $ B ( 1,2 ) $ yields the Fisher linear discriminant function. The Bhattacharyya distance is also used in evaluating features in a two-class pattern recognition problem [a13]. Furthermore, it has been applied in time series discriminant analysis [a14], [a15], [a16], [a17].
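The misclassification bound can be sketched numerically for a discrete two-class problem with equal priors. The inequality rests on the pointwise bound $ \min(a, b) \leq \sqrt{ab} $, which gives a Bayes error of at most $ \rho / 2 = \tfrac{1}{2} \exp \{ -B(1,2) \} $, and in particular at most $ \exp \{ -B(1,2) \} $ as stated above. The class-conditional mass functions below are illustrative.

```python
import numpy as np

# Illustrative class-conditional mass functions; priors are equal (1/2 each)
p1 = np.array([0.2, 0.5, 0.3])
p2 = np.array([0.3, 0.3, 0.4])

rho = np.sum(np.sqrt(p1 * p2))
B = -np.log(rho)  # note exp(-B) recovers rho

# The Bayes rule with equal priors and equal costs assigns each outcome to
# the class with the larger likelihood, so the error sums the smaller mass.
bayes_error = 0.5 * np.sum(np.minimum(p1, p2))

# Pointwise min(a, b) <= sqrt(a * b) gives error <= rho / 2 <= exp(-B).
assert bayes_error <= 0.5 * np.exp(-B) + 1e-12
assert bayes_error <= np.exp(-B)
```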

See also [a18] and the references therein.

#### References

[a1] B.P. Adhikari, D.D. Joshi, "Distance discrimination et résumé exhaustif" Publ. Inst. Statist. Univ. Paris, 5 (1956) pp. 57–74

[a2] C.R. Rao, "Advanced statistical methods in biometric research", Wiley (1952)

[a3] H. Chernoff, "A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations" Ann. Math. Stat., 23 (1952) pp. 493–507

[a4] S. Kullback, "Information theory and statistics", Wiley (1959)

[a5] A.N. Kolmogorov, "On the approximation of distributions of sums of independent summands by infinitely divisible distributions" Sankhyā, 25 (1963) pp. 159–174

[a6] S.M. Ali, S.D. Silvey, "A general class of coefficients of divergence of one distribution from another" J. Roy. Statist. Soc. B, 28 (1966) pp. 131–142

[a7] T. Kailath, "The divergence and Bhattacharyya distance measures in signal selection" IEEE Trans. Comm. Techn., COM-15 (1967) pp. 52–60

[a8] E. Hellinger, "Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen" J. Reine Angew. Math., 36 (1909) pp. 210–271

[a9] S. Kakutani, "On equivalence of infinite product measures" Ann. Math. Stat., 49 (1948) pp. 214–224

[a10] K. Matusita, "A distance and related statistics in multivariate analysis" P.R. Krishnaiah (ed.), Proc. Internat. Symp. Multivariate Analysis, Acad. Press (1966) pp. 187–200

[a11] A. Bhattacharyya, "On a measure of divergence between two statistical populations defined by probability distributions" Bull. Calcutta Math. Soc., 35 (1943) pp. 99–109

[a12] K. Matusita, "Some properties of affinity and applications" Ann. Inst. Statist. Math., 23 (1971) pp. 137–155

[a13] S. Ray, "On a theoretical property of the Bhattacharyya coefficient as a feature evaluation criterion" Pattern Recognition Letters, 9 (1989) pp. 315–319

[a14] G. Chaudhuri, J.D. Borwankar, P.R.K. Rao, "Bhattacharyya distance-based linear discriminant function for stationary time series" Comm. Statist. (Theory and Methods), 20 (1991) pp. 2195–2205

[a15] G. Chaudhuri, J.D. Borwankar, P.R.K. Rao, "Bhattacharyya distance-based linear discrimination" J. Indian Statist. Assoc., 29 (1991) pp. 47–56

[a16] G. Chaudhuri, "Linear discriminant function for complex normal time series" Statistics and Probability Lett., 15 (1992) pp. 277–279

[a17] G. Chaudhuri, "Some results in Bhattacharyya distance-based linear discrimination and in design of signals" Ph.D. Thesis, Dept. Math., Indian Inst. Technology, Kanpur, India (1989)

[a18] I.J. Good, E.P. Smith, "The variance and covariance of a generalized index of similarity especially for a generalization of an index of Hellinger and Bhattacharyya" Comm. Statist. (Theory and Methods), 14 (1985) pp. 3053–3061

**How to Cite This Entry:**

Bhattacharyya distance. *Encyclopedia of Mathematics.* URL: http://encyclopediaofmath.org/index.php?title=Bhattacharyya_distance&oldid=15124