# Bhattacharyya distance

Several indices have been suggested in the statistical literature to reflect the degree of dissimilarity between two probability distributions (cf. Probability distribution). Such indices have been variously called measures of distance between two distributions (see [a1], for instance), measures of separation (see [a2]), measures of discriminatory information [a3], [a4], and measures of variation-distance [a5]. As their names imply, these indices were not all introduced for exactly the same purpose, but they share the property of increasing as the two distributions involved "move apart". An index with this property may be called a measure of divergence of one distribution from another. A general method for generating measures of divergence is discussed in [a6].

The Bhattacharyya distance is a measure of divergence. It can be defined formally as follows. Let $( \Omega, B, \nu )$ be a measure space, and let $P$ be the set of all probability measures (cf. Probability measure) on $B$ that are absolutely continuous with respect to $\nu$. Consider two such probability measures ${\mathsf P} _ {1} , {\mathsf P} _ {2} \in P$ and let $p _ {1}$ and $p _ {2}$ be their respective density functions with respect to $\nu$.

The Bhattacharyya coefficient between ${\mathsf P} _ {1}$ and ${\mathsf P} _ {2}$, denoted by $\rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} )$, is defined by

$$\rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) = \int\limits _ \Omega {\left ( { { \frac{d {\mathsf P} _ {1} }{d \nu } } } \cdot { { \frac{d {\mathsf P} _ {2} }{d \nu } } } \right ) ^ {1/2 } } {d \nu } ,$$

where ${ {d {\mathsf P} _ {i} } / {d \nu } }$ is the Radon–Nikodým derivative (cf. Radon–Nikodým theorem) of ${\mathsf P} _ {i}$ ($i = 1, 2$) with respect to $\nu$. The coefficient is also known as the Kakutani coefficient [a9] and the Matusita coefficient [a10]. Note that $\rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} )$ does not depend on the choice of the measure $\nu$ dominating ${\mathsf P} _ {1}$ and ${\mathsf P} _ {2}$.

It is easy to verify that

i) $0 \leq \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) \leq 1$;

ii) $\rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) = 1$ if and only if ${\mathsf P} _ {1} = {\mathsf P} _ {2}$;

iii) $\rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) = 0$ if and only if ${\mathsf P} _ {1}$ is orthogonal to ${\mathsf P} _ {2}$.
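For intuition, the three properties can be checked numerically in the simplest setting. The following is a minimal sketch, assuming $\Omega$ is a finite set and $\nu$ is counting measure, so that the integral becomes a sum over probability vectors; the function name `bhattacharyya_coefficient` and the sample vectors are illustrative choices, not part of the original text.

```python
import math

def bhattacharyya_coefficient(p, q):
    """Sum of sqrt(p_i * q_i) for two probability vectors on the same finite set.

    Assumption: counting measure on a finite sample space, so the
    integral in the definition reduces to a finite sum.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same support")
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

# Illustrative distributions on a three-point space.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.0, 1.0]      # disjoint support from p (orthogonal)
r = [0.25, 0.25, 0.5]    # overlaps partially with p

print(bhattacharyya_coefficient(p, p))  # identical distributions
print(bhattacharyya_coefficient(p, q))  # orthogonal distributions
print(bhattacharyya_coefficient(p, r))  # strictly between 0 and 1
```

In this discrete setting, orthogonality simply means that the two distributions have disjoint supports.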

The Bhattacharyya distance between two probability distributions ${\mathsf P} _ {1}$ and ${\mathsf P} _ {2}$, denoted by $B ( 1, 2 )$, is defined by

$$B ( 1, 2 ) = - { \mathop{\rm ln} } \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) .$$

Clearly, $0 \leq B ( 1,2 ) \leq \infty$. The distance $B ( 1,2 )$ does not satisfy the triangle inequality (see [a7]). The Bhattacharyya distance is obtained from the Chernoff distance, given by the following expression, by fixing $t = {1 / 2 }$ instead of taking the infimum over $t$:

$$- { \mathop{\rm ln} } \inf _ {0 \leq t \leq 1 } \left ( \int\limits _ \Omega {p _ {1} ^ {t} p _ {2} ^ {1 - t } } {d \nu } \right ) .$$
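Both the failure of the triangle inequality and the relation to the Chernoff distance can be illustrated on a two-point space. The sketch below again assumes counting measure on a finite set; the three distributions are hypothetical examples, and `chernoff_distance` only approximates the infimum by a grid search over $t$, assuming all probabilities are strictly positive.

```python
import math

def bhattacharyya_coefficient(p, q):
    """Discrete Bhattacharyya coefficient (counting measure assumed)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def bhattacharyya_distance(p, q):
    """B = -ln(rho); infinite when the distributions are orthogonal."""
    rho = bhattacharyya_coefficient(p, q)
    return math.inf if rho == 0.0 else -math.log(rho)

def chernoff_distance(p, q, steps=1000):
    """Grid-search approximation of -ln inf_t sum(p_i^t * q_i^(1-t)).

    Assumption: every probability is strictly positive, which avoids
    the 0**0 convention at the endpoints t = 0 and t = 1.
    """
    best = min(
        sum(a ** t * b ** (1.0 - t) for a, b in zip(p, q))
        for t in (i / steps for i in range(steps + 1))
    )
    return -math.log(best)

# Hypothetical distributions on a two-point space.
p1 = [0.99, 0.01]
p2 = [0.50, 0.50]
p3 = [0.01, 0.99]

direct = bhattacharyya_distance(p1, p3)
detour = bhattacharyya_distance(p1, p2) + bhattacharyya_distance(p2, p3)
print(direct, detour)  # the direct distance exceeds the detour via p2

# The grid contains t = 1/2, so the approximate Chernoff distance is
# never smaller than the Bhattacharyya distance.
print(chernoff_distance(p1, p2), bhattacharyya_distance(p1, p2))
```

Since the infimum is taken over a set that contains $t = 1/2$, the Chernoff distance is always at least $B ( 1,2 )$; the two coincide in this example for the mirror-image pair `p1`, `p3`, where the integrand is symmetric about $t = 1/2$.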

The Hellinger distance [a8] between two probability measures ${\mathsf P} _ {1}$ and ${\mathsf P} _ {2}$, denoted by $H ( 1,2 )$, is related to the Bhattacharyya coefficient by the following relation:

$$H ( 1, 2 ) = 2 [ 1 - \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) ] .$$
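This relation follows in one line, assuming $H ( 1,2 )$ is taken to be the integrated squared difference of the square roots of the densities (a convention not stated explicitly above; other normalizations of the Hellinger distance are also in use):

$$\int\limits _ \Omega \left ( \sqrt {p _ {1} } - \sqrt {p _ {2} } \right ) ^ {2} {d \nu } = \int\limits _ \Omega p _ {1} {d \nu } + \int\limits _ \Omega p _ {2} {d \nu } - 2 \int\limits _ \Omega \sqrt {p _ {1} p _ {2} } {d \nu } = 2 [ 1 - \rho ( {\mathsf P} _ {1} , {\mathsf P} _ {2} ) ] ,$$

since each density integrates to $1$ and the last integral is the Bhattacharyya coefficient.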

$B ( 1,2 )$ is called the Bhattacharyya distance since it is defined through the Bhattacharyya coefficient. It should be noted that the distance defined in a statistical context by A. Bhattacharyya [a11] is different from $B ( 1, 2 )$.

The Bhattacharyya distance has been used successfully in engineering and the statistical sciences. In control theory and in the study of the signal selection problem [a7], $B ( 1, 2 )$ has been found superior to the Kullback–Leibler distance (cf. also Kullback–Leibler-type distance measures). If one uses the Bayes criterion for classification and attaches equal costs to each type of misclassification, then the total probability of misclassification is bounded above by ${ \mathop{\rm exp} } \{ - B ( 1,2 ) \}$ [a12]. In the case of equal covariances, maximization of $B ( 1,2 )$ yields the Fisher linear discriminant function. The Bhattacharyya distance is also used in evaluating the features in a two-class pattern recognition problem [a13]. Furthermore, it has been applied in time series discriminant analysis [a14], [a15], [a16], [a17].
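The misclassification bound can be checked in a simple case. The sketch below assumes two univariate normal classes with equal prior probabilities (an illustrative setting, not one specified above) and uses the standard closed form of $B ( 1,2 )$ for normal densities; the function names and parameter values are hypothetical.

```python
import math

def gaussian_bhattacharyya(mu1, sigma1, mu2, sigma2):
    """Closed-form Bhattacharyya distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    mean_term = 0.25 * (mu1 - mu2) ** 2 / var_sum
    spread_term = 0.5 * math.log(var_sum / (2.0 * sigma1 * sigma2))
    return mean_term + spread_term

def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bayes_error_equal_variance(mu1, mu2, sigma):
    """Exact Bayes error for two equally likely classes with a common sigma.

    Assumption: equal priors and equal misclassification costs, so the
    optimal rule thresholds at the midpoint between the two means.
    """
    return normal_cdf(-abs(mu1 - mu2) / (2.0 * sigma))

# Illustrative parameters: unit variance, means two standard deviations apart.
b = gaussian_bhattacharyya(0.0, 1.0, 2.0, 1.0)
bound = math.exp(-b)
error = bayes_error_equal_variance(0.0, 2.0, 1.0)
print(b, bound, error)  # the exact error lies below exp(-B)
```

With a common variance, the closed form reduces to one eighth of the squared standardized distance between the means, so maximizing $B ( 1,2 )$ in this setting amounts to maximizing that separation.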