Difference between revisions of "Bhattacharyya distance"

Latest revision as of 10:58, 29 May 2020

Several indices have been suggested in the statistical literature to reflect the degree of dissimilarity between any two probability distributions (cf. Probability distribution). Such indices have been variously called measures of distance between two distributions (see [a1], for instance), measures of separation (see [a2]), measures of discriminatory information [a3], [a4], and measures of variation-distance [a5]. While these indices have not all been introduced for exactly the same purpose, as the names given to them imply, they have the common property of increasing as the two distributions involved "move apart". An index with this property may be called a measure of divergence of one distribution from another. A general method for generating measures of divergence has been discussed in [a6].

The Bhattacharyya distance is a measure of divergence. It can be defined formally as follows. Let $ ( \Omega, B, \nu ) $ be a measure space, and let $ P $ be the set of all probability measures (cf. Probability measure) on $ B $ that are absolutely continuous with respect to $ \nu $. Consider two such probability measures $ {\mathsf P} _ {1} , {\mathsf P} _ {2} \in P $ and let $ p _ {1} $ and $ p _ {2} $ be their respective density functions with respect to $ \nu $.

The Bhattacharyya coefficient between $ {\mathsf P} _ {1} $ and $ {\mathsf P} _ {2} $, denoted by $ \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) $, is defined by

$$ \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) = \int\limits _ \Omega {\left ( { { \frac{d {\mathsf P} _ {1} }{d \nu } } } \cdot { { \frac{d {\mathsf P} _ {2} }{d \nu } } } \right ) ^ {1/2 } } {d \nu } , $$

where $ { {d {\mathsf P} _ {i} } / {d \nu } } $ is the Radon–Nikodým derivative (cf. Radon–Nikodým theorem) of $ {\mathsf P} _ {i} $ ($ i = 1, 2 $) with respect to $ \nu $. The coefficient is also known as the Kakutani coefficient [a9] and the Matusita coefficient [a10]. Note that $ \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) $ does not depend on the measure $ \nu $ dominating $ {\mathsf P} _ {1} $ and $ {\mathsf P} _ {2} $.

It is easy to verify that

i) $ 0 \leq \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) \leq 1 $;

ii) $ \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) = 1 $ if and only if $ {\mathsf P} _ {1} = {\mathsf P} _ {2} $;

iii) $ \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) = 0 $ if and only if $ {\mathsf P} _ {1} $ is orthogonal to $ {\mathsf P} _ {2} $.
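Properties i)–iii) can be checked numerically in the simplest setting. The following is a minimal sketch, assuming a finite sample space with $ \nu $ taken to be counting measure, so that the integral reduces to a sum; the function name `bhattacharyya_coefficient` and the example vectors are illustrative choices, not part of the original entry.

```python
import math


def bhattacharyya_coefficient(p1, p2):
    """Bhattacharyya coefficient of two discrete distributions.

    Assumption: both distributions live on the same finite set and nu is
    counting measure, so the integral of sqrt(p1 * p2) becomes a sum.
    """
    if len(p1) != len(p2):
        raise ValueError("distributions must be defined on the same finite set")
    return sum(math.sqrt(a * b) for a, b in zip(p1, p2))


p = [0.5, 0.5, 0.0, 0.0]
q = [0.0, 0.0, 0.5, 0.5]      # support disjoint from that of p (orthogonal)
r = [0.25, 0.25, 0.25, 0.25]

rho_pp = bhattacharyya_coefficient(p, p)  # property ii): identical measures
rho_pq = bhattacharyya_coefficient(p, q)  # property iii): orthogonal measures
rho_pr = bhattacharyya_coefficient(p, r)  # property i): value lies in [0, 1]

print(rho_pp, rho_pq, rho_pr)
```

In this discrete setting orthogonality simply means that the two distributions have disjoint supports, which is why every term of the sum vanishes for `p` and `q`.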

The Bhattacharyya distance between two probability distributions $ {\mathsf P} _ {1} $ and $ {\mathsf P} _ {2} $, denoted by $ B ( 1, 2 ) $, is defined by

$$ B ( 1, 2 ) = - { \mathop{\rm ln} } \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) . $$

Clearly, $ 0 \leq B ( 1,2 ) \leq \infty $. The distance $ B ( 1,2 ) $ does not satisfy the triangle inequality (see [a7]). The Bhattacharyya distance is closely tied to the Chernoff distance, which is defined by

$$ - { \mathop{\rm ln} } \inf _ {0 \leq t \leq 1 } \left ( \int\limits _ \Omega {p _ {1} ^ {t} p _ {2} ^ {1 - t } } {d \nu } \right ) . $$

Fixing $ t = {1 / 2 } $ in place of the infimum gives $ B ( 1,2 ) $, so the Chernoff distance is never smaller than the Bhattacharyya distance.
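The relation between the two distances can be illustrated numerically. The sketch below again assumes a finite sample space with counting measure, and it approximates the infimum over $ t $ by a plain grid search; the function names, the grid size and the example distributions are hypothetical choices made for illustration.

```python
import math


def chernoff_integral(p1, p2, t):
    """Sum of p1^t * p2^(1 - t) over a common finite set (counting measure)."""
    return sum((a ** t) * (b ** (1.0 - t)) for a, b in zip(p1, p2))


def bhattacharyya_distance(p1, p2):
    """-ln of the Bhattacharyya coefficient, i.e. the integral at t = 1/2."""
    rho = chernoff_integral(p1, p2, 0.5)
    # Orthogonal distributions have rho = 0 and infinite distance.
    return math.inf if rho == 0.0 else -math.log(rho)


def chernoff_distance(p1, p2, steps=1000):
    """Approximate -ln inf_{0 <= t <= 1} of the integral by a grid over t."""
    best = min(chernoff_integral(p1, p2, k / steps) for k in range(steps + 1))
    return math.inf if best == 0.0 else -math.log(best)


p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

b = bhattacharyya_distance(p, q)
c = chernoff_distance(p, q)
print(f"Bhattacharyya: {b:.6f}, Chernoff (grid approximation): {c:.6f}")
```

Because the grid contains the point $ t = 1/2 $, the approximated Chernoff distance is at least as large as the Bhattacharyya distance, in line with the definition; a finer grid or a one-dimensional minimizer would tighten the approximation.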

The Hellinger distance [a8] between two probability measures $ {\mathsf P} _ {1} $ and $ {\mathsf P} _ {2} $, denoted by $ H ( 1,2 ) $, is related to the Bhattacharyya coefficient by the following relation:

$$ H ( 1, 2 ) = 2 [ 1 - \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) ] . $$
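This relation follows by expanding a square. The short derivation below assumes that $ H ( 1,2 ) $ denotes the integral of the squared difference of the root densities, a convention that the formula implies but the text does not state; under the other common convention, in which the Hellinger distance is the $ L _ {2} ( \nu ) $-norm of $ \sqrt {p _ {1} } - \sqrt {p _ {2} } $, the same quantity is the square of that distance.

$$ H ( 1, 2 ) = \int\limits _ \Omega \left ( \sqrt {p _ {1} } - \sqrt {p _ {2} } \right ) ^ {2} {d \nu } = \int\limits _ \Omega p _ {1} {d \nu } + \int\limits _ \Omega p _ {2} {d \nu } - 2 \int\limits _ \Omega \sqrt {p _ {1} p _ {2} } {d \nu } = 2 [ 1 - \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) ] , $$

since each density integrates to $ 1 $ and the last integral is the Bhattacharyya coefficient.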

$ B ( 1,2 ) $ is called the Bhattacharyya distance since it is defined through the Bhattacharyya coefficient. It should be noted that the distance defined in a statistical context by A. Bhattacharyya [a11] is different from $ B ( 1, 2 ) $.

The Bhattacharyya distance has been used successfully in engineering and the statistical sciences. In the context of control theory and in the study of the problem of signal selection [a7], $ B ( 1, 2 ) $ was found superior to the Kullback–Leibler distance (cf. also Kullback–Leibler-type distance measures). If one uses the Bayes criterion for classification and attaches equal costs to each type of misclassification, then it has been shown [a12] that the total probability of misclassification is bounded above by $ { \mathop{\rm exp} } \{ - B ( 1,2 ) \} $. In the case of normal distributions with equal covariances, maximization of $ B ( 1,2 ) $ yields the Fisher linear discriminant function. The Bhattacharyya distance is also used in evaluating the features in a two-class pattern recognition problem [a13]. Furthermore, it has been applied in time series discriminant analysis [a14], [a15], [a16], [a17].
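The misclassification bound can be illustrated on a simple case. The sketch below assumes two univariate normal distributions with a common variance and equal prior probabilities, none of which is specified in the entry; it relies on the standard closed form $ B ( 1,2 ) = ( \mu _ {1} - \mu _ {2} ) ^ {2} / ( 8 \sigma ^ {2} ) $ for this case, and the function names are hypothetical.

```python
import math


def bhattacharyya_gaussian_equal_var(mu1, mu2, sigma):
    """B(1,2) for N(mu1, sigma^2) and N(mu2, sigma^2).

    Assumption: equal variances, for which B = (mu1 - mu2)^2 / (8 sigma^2).
    """
    return (mu1 - mu2) ** 2 / (8.0 * sigma ** 2)


def bayes_error_equal_priors(mu1, mu2, sigma):
    """Total misclassification probability of the Bayes rule.

    Assumption: equal priors and equal costs, so the rule thresholds at the
    midpoint of the means and the error is Phi(-|mu1 - mu2| / (2 sigma)),
    with Phi the standard normal distribution function.
    """
    d = abs(mu1 - mu2) / sigma
    return 0.5 * math.erfc(d / (2.0 * math.sqrt(2.0)))


for gap in (0.0, 0.5, 1.0, 2.0, 4.0):
    b = bhattacharyya_gaussian_equal_var(0.0, gap, 1.0)
    err = bayes_error_equal_priors(0.0, gap, 1.0)
    bound = math.exp(-b)
    print(f"gap={gap:3.1f}  error={err:.4f}  exp(-B)={bound:.4f}")
```

In every row the error stays below $ { \mathop{\rm exp} } \{ - B ( 1,2 ) \} $, and both quantities decrease as the means separate, which is the sense in which a larger Bhattacharyya distance indicates better separability of the two classes.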

See also [a18] and the references therein.

References

[a1]	B.P. Adhikari, D.D. Joshi, "Distance discrimination et résumé exhaustif" Publ. Inst. Statist. Univ. Paris , 5 (1956) pp. 57–74
[a2]	C.R. Rao, "Advanced statistical methods in biometric research" , Wiley (1952)
[a3]	H. Chernoff, "A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations" Ann. Math. Stat. , 23 (1952) pp. 493–507
[a4]	S. Kullback, "Information theory and statistics" , Wiley (1959)
[a5]	A.N. Kolmogorov, "On the approximation of distributions of sums of independent summands by infinitely divisible distributions" Sankhyā , 25 (1963) pp. 159–174
[a6]	S.M. Ali, S.D. Silvey, "A general class of coefficients of divergence of one distribution from another" J. Roy. Statist. Soc. B , 28 (1966) pp. 131–142
[a7]	T. Kailath, "The divergence and Bhattacharyya distance measures in signal selection" IEEE Trans. Comm. Techn. , COM–15 (1967) pp. 52–60
[a8]	E. Hellinger, "Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen" J. Reine Angew. Math. , 136 (1909) pp. 210–271
[a9]	S. Kakutani, "On equivalence of infinite product measures" Ann. of Math. , 49 (1948) pp. 214–224
[a10]	K. Matusita, "A distance and related statistics in multivariate analysis" P.R. Krishnaiah (ed.) , Proc. Internat. Symp. Multivariate Analysis , Acad. Press (1966) pp. 187–200
[a11]	A. Bhattacharyya, "On a measure of divergence between two statistical populations defined by probability distributions" Bull. Calcutta Math. Soc. , 35 (1943) pp. 99–109
[a12]	K. Matusita, "Some properties of affinity and applications" Ann. Inst. Statist. Math. , 23 (1971) pp. 137–155
[a13]	S. Ray, "On a theoretical property of the Bhattacharyya coefficient as a feature evaluation criterion" Pattern Recognition Letters , 9 (1989) pp. 315–319
[a14]	G. Chaudhuri, J.D. Borwankar, P.R.K. Rao, "Bhattacharyya distance-based linear discriminant function for stationary time series" Comm. Statist. (Theory and Methods) , 20 (1991) pp. 2195–2205
[a15]	G. Chaudhuri, J.D. Borwankar, P.R.K. Rao, "Bhattacharyya distance-based linear discrimination" J. Indian Statist. Assoc. , 29 (1991) pp. 47–56
[a16]	G. Chaudhuri, "Linear discriminant function for complex normal time series" Statistics and Probability Lett. , 15 (1992) pp. 277–279
[a17]	G. Chaudhuri, "Some results in Bhattacharyya distance-based linear discrimination and in design of signals" Ph.D. Thesis Dept. Math. Indian Inst. Technology, Kanpur, India (1989)
[a18]	I.J. Good, E.P. Smith, "The variance and covariance of a generalized index of similarity especially for a generalization of an index of Hellinger and Bhattacharyya" Commun. Statist. (Theory and Methods) , 14 (1985) pp. 3053–3061

How to Cite This Entry:
Bhattacharyya distance. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Bhattacharyya_distance&oldid=15124

This article was adapted from an original article by G. Chaudhuri (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.
