# Behrens-Fisher problem

An analytical problem which arose in comparing, on the basis of empirical data, the mathematical expectations of two normal distributions whose variances are unknown (the ratio of the variances is assumed to be unknown as well). The problem was posed by W.U. Behrens [1] in connection with the processing of crop data. The modern formulation of the Behrens–Fisher problem is due to R.A. Fisher and is based on the concept of sufficient statistics. Let $X_{11}, \dots, X_{1n_1}$ and $X_{21}, \dots, X_{2n_2}$ be mutually independent normally distributed random variables with ${\mathsf E} X_{1i} = \mu_1$, ${\mathsf E}(X_{1i} - \mu_1)^2 = \sigma_1^2$ ($i = 1, \dots, n_1$) and ${\mathsf E} X_{2j} = \mu_2$, ${\mathsf E}(X_{2j} - \mu_2)^2 = \sigma_2^2$ ($j = 1, \dots, n_2$). The mathematical expectations $\mu_1$, $\mu_2$, the variances $\sigma_1^2$, $\sigma_2^2$ and their ratio $\sigma_1^2/\sigma_2^2$ are all assumed unknown. For $n_1, n_2 \geq 2$ a sufficient statistic is the four-dimensional vector $(\overline{X}_1, \overline{X}_2, S_1^2, S_2^2)$, whose components are given by

$$\overline{X}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} X_{1i}, \qquad \overline{X}_2 = \frac{1}{n_2} \sum_{j=1}^{n_2} X_{2j},$$

$$S_1^2 = \sum_{i=1}^{n_1} (X_{1i} - \overline{X}_1)^2, \qquad S_2^2 = \sum_{j=1}^{n_2} (X_{2j} - \overline{X}_2)^2,$$

and which are mutually independent random variables: $\sqrt{n_1}(\overline{X}_1 - \mu_1)/\sigma_1$ and $\sqrt{n_2}(\overline{X}_2 - \mu_2)/\sigma_2$ have a standard normal distribution, while $S_1^2/\sigma_1^2$ and $S_2^2/\sigma_2^2$ have a chi-squared distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom, respectively. Since the sufficient statistic contains the same information about the unknown parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$ as the original $n_1 + n_2$ random variables $X_{1i}$ and $X_{2j}$, only the sufficient statistic need be considered when testing hypotheses about these parameters. In particular, this idea underlies the modern formulation of the problem of testing the hypothesis $\mu_1 - \mu_2 = \Delta$, where $\Delta$ is a previously given number. Here the Behrens–Fisher problem reduces to finding a set $K_\alpha$ in the space of possible values of the random variables $\overline{X}_1 - \overline{X}_2$, $S_1^2$, $S_2^2$ such that, if the hypothesis being tested is true, the probability of the event $(\overline{X}_1 - \overline{X}_2, S_1^2, S_2^2) \in K_\alpha$ does not depend on any of the unknown parameters and is exactly equal to a given number $\alpha$ in the interval $0 < \alpha < 1$.
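As a concrete illustration, the four components of the sufficient statistic can be computed directly from the two samples. The short Python sketch below is not part of the original article; note that, following the formulas above, $S_1^2$ and $S_2^2$ are sums of squared deviations, not sample variances.

```python
def sufficient_stats(x1, x2):
    """Compute the four-dimensional sufficient statistic
    (mean1, mean2, S1^2, S2^2) for two independent normal samples.

    S1^2 and S2^2 are the sums of squared deviations from the
    sample means (not divided by n - 1), as in the formulas above.
    """
    n1, n2 = len(x1), len(x2)
    mean1 = sum(x1) / n1
    mean2 = sum(x2) / n2
    s1_sq = sum((x - mean1) ** 2 for x in x1)
    s2_sq = sum((x - mean2) ** 2 for x in x2)
    return mean1, mean2, s1_sq, s2_sq

# Illustrative samples (hypothetical data):
stats = sufficient_stats([4.0, 6.0, 5.0], [1.0, 3.0])
# stats == (5.0, 2.0, 2.0, 2.0)
```

Any test of the hypothesis $\mu_1 - \mu_2 = \Delta$ that respects the sufficiency principle is a function of these four numbers only.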

The question of the existence of a solution to the Behrens–Fisher problem was discussed at length by prominent mathematicians, mainly in connection with the approach to the problem taken by R.A. Fisher, which went beyond the borders of probability theory. It was shown by Yu.V. Linnik et al. in 1964 that if the sample sizes $n_1$ and $n_2$ have different parities, a solution $K_\alpha$ to the Behrens–Fisher problem exists. If $n_1$ and $n_2$ have the same parity, the existence of a solution remains an open question.

The Behrens–Fisher problem has often been generalized and modified. A. Wald, in particular, posed the problem of finding a set $K_\alpha$ in the sample space of the two variables $(\overline{X}_1 - \overline{X}_2)/S_1^2$ and $S_1^2/S_2^2$. The question of the existence of a solution to this problem also remains open. However, it is possible to construct effectively a set $K_\alpha^*$ such that, if the hypothesis $\mu_1 - \mu_2 = \Delta$ being tested is in fact true, the probability of the event $((\overline{X}_1 - \overline{X}_2)/S_1^2, S_1^2/S_2^2) \in K_\alpha^*$, while still depending on the unknown ratio $\sigma_1^2/\sigma_2^2$, deviates from the given $\alpha$ only by a small amount. This fact is the basis of modern recommendations for the practical construction of tests comparing $\mu_1$ and $\mu_2$. Simple and computationally convenient tests for the comparison of $\mu_1$ with $\mu_2$ were proposed by V.I. Romanovskii, M. Bartlett, H. Scheffé and others. However, the statistics of these tests are not expressed in terms of sufficient statistics and are, for this reason, usually less powerful than tests based on the solution of the Behrens–Fisher problem and its generalizations.
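A widely used practical approximate test of this kind, not named in the article itself, is Welch's approximation: the difference of sample means is studentized by the estimated standard error, and the result is compared with a Student distribution whose degrees of freedom are estimated from the data (the Welch–Satterthwaite formula). The sketch below uses these standard textbook formulas; the function name and sample data are illustrative.

```python
import math

def welch_statistic(x1, x2, delta=0.0):
    """Welch's approximate statistic for testing H0: mu1 - mu2 = delta.

    Returns (t, nu): the studentized difference of means and the
    Welch--Satterthwaite approximate degrees of freedom; t is then
    compared with Student's t-distribution with nu degrees of freedom.
    """
    n1, n2 = len(x1), len(x2)
    m1 = sum(x1) / n1
    m2 = sum(x2) / n2
    # unbiased sample variances S_i^2 / (n_i - 1)
    v1 = sum((x - m1) ** 2 for x in x1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in x2) / (n2 - 1)
    se_sq = v1 / n1 + v2 / n2
    t = (m1 - m2 - delta) / math.sqrt(se_sq)
    nu = se_sq ** 2 / ((v1 / n1) ** 2 / (n1 - 1)
                       + (v2 / n2) ** 2 / (n2 - 1))
    return t, nu

# Illustrative samples (hypothetical data):
t, nu = welch_statistic([4.0, 6.0, 5.0], [1.0, 3.0])
# t ≈ 2.598, nu ≈ 1.684
```

Because the null distribution of $t$ is only approximately Student, the attained level of such a test depends weakly on the unknown ratio $\sigma_1^2/\sigma_2^2$, in the sense described above for $K_\alpha^*$.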

#### References

[1] W.U. Behrens, Landwirtsch. Jahresber., 68 : 6 (1929) pp. 807–837

[2] Yu.V. Linnik, "Statistical problems with nuisance parameters", Amer. Math. Soc. (1968) (Translated from Russian)

[3] Yu.V. Linnik, I.V. Romanovskii, V.N. Sudakov, "A nonrandomized homogeneous test in the Behrens–Fisher problem", Soviet Math. Dokl., 5 : 2 (1964) pp. 570–572; Dokl. Akad. Nauk SSSR, 155 : 6 (1964) pp. 1262–1264