Rao-Blackwell-Kolmogorov theorem

A proposition from the theory of statistical estimation on which a method for the improvement of unbiased statistical estimators is based.

Let $X$ be a random variable with values in a sample space $( \mathfrak X , {\mathcal B} , {\mathsf P} _ \theta )$, $\theta \in \Theta$, such that the family of probability distributions $\{ { {\mathsf P} _ \theta } : {\theta \in \Theta } \}$ has a sufficient statistic $T = T ( X)$, and let $\phi = \phi ( X)$ be a vector statistic with finite matrix of second moments. Then the mean ${\mathsf E} _ \theta \{ \phi \}$ of $\phi$ exists and, moreover, the conditional mean $\phi ^ {*} = {\mathsf E} _ \theta \{ \phi \mid T \}$ is an unbiased estimator for ${\mathsf E} _ \theta \{ \phi \}$, that is,

$${\mathsf E} _ \theta \{ \phi ^ {*} \} = {\mathsf E} _ \theta \{ {\mathsf E} _ \theta \{ \phi \mid T \} \} = {\mathsf E} _ \theta \{ \phi \} .$$

The Rao–Blackwell–Kolmogorov theorem states that under these conditions the quadratic risk of $\phi ^ {*}$ does not exceed the quadratic risk of $\phi$, uniformly in $\theta \in \Theta$, i.e. for any vector $z$ of the same dimension as $\phi$, the inequality

$$z {\mathsf E} _ \theta \{ ( \phi - {\mathsf E} _ \theta \{ \phi \} ) ^ {T} ( \phi - {\mathsf E} _ \theta \{ \phi \} ) \} z ^ {T} \geq z {\mathsf E} _ \theta \{ ( \phi ^ {*} - {\mathsf E} _ \theta \{ \phi ^ {*} \} ) ^ {T} ( \phi ^ {*} - {\mathsf E} _ \theta \{ \phi ^ {*} \} ) \} z ^ {T}$$

holds for any $\theta \in \Theta$. In particular, if $\phi$ is a one-dimensional statistic, then for any $\theta \in \Theta$ the variance ${\mathsf D} _ \theta \phi ^ {*}$ of $\phi ^ {*}$ does not exceed the variance ${\mathsf D} _ \theta \phi$ of $\phi$.
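The one-dimensional case can be checked exactly on a small discrete model. The following is a minimal sketch, assuming $n$ independent Bernoulli observations with success probability $p$, the initial estimator $\phi = X _ {1}$ and the sufficient statistic $T = X _ {1} + \dots + X _ {n}$; this model and the function name `rao_blackwell_bernoulli` are illustrative assumptions and do not come from the text above. The sketch enumerates all $2 ^ {n}$ outcomes and returns the exact means and variances of $\phi$ and $\phi ^ {*}$.

```python
from fractions import Fraction
from itertools import product


def rao_blackwell_bernoulli(n, p):
    """Exact (mean, variance) of phi = X_1 and of phi* = E{phi | T}, T = sum(X_i).

    Illustrative model: n i.i.d. Bernoulli(p) observations with 0 < p < 1.
    Returns (moments of phi, moments of phi*, table t -> E{X_1 | T = t}).
    """
    prob_t = {}   # P{T = t}
    num_t = {}    # E{X_1 ; T = t}
    outcomes = []
    for x in product((0, 1), repeat=n):
        t = sum(x)
        w = p ** t * (1 - p) ** (n - t)   # probability of this outcome
        outcomes.append((x, t, w))
        prob_t[t] = prob_t.get(t, 0) + w
        num_t[t] = num_t.get(t, 0) + w * x[0]

    # Conditional mean of X_1 given T = t; it turns out not to depend on p.
    cond_mean = {t: num_t[t] / prob_t[t] for t in prob_t}

    def moments(pairs):
        mean = sum(w * v for v, w in pairs)
        var = sum(w * (v - mean) ** 2 for v, w in pairs)
        return mean, var

    phi = moments([(x[0], w) for x, t, w in outcomes])
    phi_star = moments([(cond_mean[t], w) for x, t, w in outcomes])
    return phi, phi_star, cond_mean


if __name__ == "__main__":
    phi, phi_star, table = rao_blackwell_bernoulli(4, Fraction(1, 3))
    print("phi :", phi)
    print("phi*:", phi_star)
    print("E{X_1 | T = t}:", table)
```

With exact rational arithmetic both estimators have the same mean $p$, while the variance drops from $p ( 1 - p )$ to $p ( 1 - p ) / n$; the conditional mean equals $t / n$ and is free of $p$, as sufficiency of $T$ requires.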

In the most general situation the Rao–Blackwell–Kolmogorov theorem states that averaging over a sufficient statistic does not lead to an increase of the risk with respect to any convex loss function. This implies that good statistical estimators should be looked for only in terms of sufficient statistics, that is, in the class of functions of sufficient statistics.
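The convex-loss statement can be traced back to the conditional Jensen inequality. As a brief sketch, let $L ( \theta , \cdot )$ denote a loss function that is convex in the value of the estimator (the symbol $L$ is introduced here only for illustration), and assume that the expectations involved are finite. Then, for every $\theta \in \Theta$,

$${\mathsf E} _ \theta \{ L ( \theta , \phi ) \mid T \} \geq L ( \theta , {\mathsf E} _ \theta \{ \phi \mid T \} ) = L ( \theta , \phi ^ {*} ) ,$$

and taking expectations on both sides gives ${\mathsf E} _ \theta \{ L ( \theta , \phi ) \} \geq {\mathsf E} _ \theta \{ L ( \theta , \phi ^ {*} ) \}$. The quadratic case stated above corresponds to $L ( \theta , d ) = ( z ( d - {\mathsf E} _ \theta \{ \phi \} ) ^ {T} ) ^ {2}$.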

In case the family $\{ {\mathsf P} _ \theta T ^ {-1} \}$ is complete, that is, when the function of $T$ that is almost-everywhere equal to zero is the only unbiased estimator based on $T$ for zero, the unbiased estimator with uniformly minimal risk provided by the Rao–Blackwell–Kolmogorov theorem is unique. Thus, the Rao–Blackwell–Kolmogorov theorem gives a recipe for constructing best unbiased estimators: one has to take some unbiased estimator and then average it over a sufficient statistic. That is how the best unbiased estimator for the distribution function of the normal law is constructed in the following example, which is due to A.N. Kolmogorov.

Example. Given a realization of a random vector $X = ( X _ {1} , \dots , X _ {n} )$ whose components $X _ {i}$, $i = 1 , \dots , n$, $n \geq 3$, are independent random variables subject to the same normal law $N _ {1} ( \xi , \sigma ^ {2} )$, it is required to estimate the distribution function

$$\Phi \left ( \frac{x - \xi } \sigma \right ) = \ \frac{1}{\sqrt {2 \pi } \sigma } \int\limits _ {- \infty } ^ { x } e ^ {- ( u - \xi ) ^ {2} / 2 \sigma ^ {2} } \ d u ,\ | \xi | < \infty ,\ \ \sigma > 0 .$$

The parameters $\xi$ and $\sigma ^ {2}$ are supposed to be unknown. Since the family

$$\left \{ {\Phi \left ( \frac{x - \xi } \sigma \right ) } : { | \xi | < \infty , \sigma > 0 } \right \}$$

of normal laws has a complete sufficient statistic $T = ( \overline{X}\; , S ^ {2} )$, where

$$\overline{X}\; = \frac{X _ {1} + \dots + X _ {n} }{n}$$

and

$$S ^ {2} = \frac{1}{n} \sum _ {i=1} ^ {n} ( X _ {i} - \overline{X}\; ) ^ {2} ,$$

the Rao–Blackwell–Kolmogorov theorem can be used for the construction of the best unbiased estimator for the distribution function $\Phi ( ( x - \xi ) / \sigma )$. As an initial statistic $\phi$ one may use, e.g., the empirical distribution function constructed from an arbitrary component $X _ {1}$ of $X$:

$$\phi = \left \{ \begin{array}{ll} 0 & \textrm{ if } x < X _ {1} , \\ 1 & \textrm{ if } x \geq X _ {1} . \\ \end{array} \right .$$

This is a trivial unbiased estimator for $\Phi ( ( x - \xi ) / \sigma )$, since

$${\mathsf E} \{ \phi \} = {\mathsf P} \{ X _ {1} \leq x \} = \Phi \left ( \frac{x - \xi } \sigma \right ) .$$

Averaging of $\phi$ over the sufficient statistic $T$ gives the estimator

$$\tag{1} \phi ^ {*} = {\mathsf E} \{ \phi \mid T \} = {\mathsf P} \{ X _ {1} \leq x \mid \overline{X}\; , S ^ {2} \} = {\mathsf P} \left \{ \frac{X _ {1} - \overline{X}\; }{S} \leq \frac{x - \overline{X}\; }{S} \mid \overline{X}\; , S ^ {2} \right \} .$$

Since the statistic

$$V = \left ( \frac{X _ {1} - \overline{X}\; }{S} , \dots , \frac{X _ {n} - \overline{X}\; }{S} \right ) ,$$

which is complementary to $T$, has a uniform distribution on the $( n - 2 )$-dimensional sphere of radius $\sqrt n$ (its components sum to zero and their squares sum to $n$) and, therefore, depends neither on the unknown parameters $\xi$ and $\sigma ^ {2}$ nor on $T$, the same is true for $( X _ {1} - \overline{X}\; ) / S$ and

$$\tag{2} {\mathsf P} \left \{ \frac{X _ {1} - \overline{X}\; }{S} \leq u \right \} = T _ {n-2} ( u) ,\ \ | u | < \sqrt {n - 1 } ,$$

where

$$\tag{3} T _ {f} ( u) = \frac{1}{\sqrt {\pi ( f + 1 ) } } \frac{\Gamma ( ( f+ 1) / 2 ) }{\Gamma ( f / 2 ) } \int\limits _ {- \sqrt {f + 1 } } ^ { u } \left ( 1 - \frac{t ^ {2} }{f + 1 } \right ) ^ {( f - 2) / 2 } dt$$

is the Thompson distribution with $f$ degrees of freedom. Thus, (1)–(3) imply that the best unbiased estimator for $\Phi ( ( x - \xi ) / \sigma )$ obtained from $n$ independent observations $X _ {1} , \dots , X _ {n}$ is

$$\phi ^ {*} = T _ {n-2} \left ( \frac{x - \overline{X}\; }{S} \right ) = S _ {n-2} \left ( \frac{x - \overline{X}\; }{S} \sqrt { \frac{n - 2 }{n - 1 - ( ( x - \overline{X}\; ) / S ) ^ {2} } } \right ) ,$$

where $S _ {f} ( \cdot )$ is the Student distribution with $f$ degrees of freedom.
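The estimator of the example can be evaluated numerically. The sketch below is a hedged illustration rather than a reference implementation: it integrates the density in (3) with a composite Simpson rule, evaluates $\phi ^ {*} = T _ {n-2} ( ( x - \overline{X}\; ) / S )$ for a given sample, and also integrates the Student density so that the two forms of $\phi ^ {*}$ can be compared. The function names (`thompson_cdf`, `student_cdf`, `best_unbiased_normal_cdf`) and the number of Simpson subintervals are assumptions made for this illustration. Outside the interval $| u | < \sqrt {f + 1 }$ the distribution function is taken to be 0 or 1. For $f = 1$ (that is, $n = 3$) the density is unbounded at the end-points, so the quadrature is inaccurate there.

```python
import math


def _simpson(g, a, b, m=2000):
    """Composite Simpson rule for the integral of g over [a, b], m even."""
    h = (b - a) / m
    total = g(a) + g(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def thompson_cdf(u, f):
    """T_f(u) of formula (3); the density is symmetric, so integrate from 0."""
    r = math.sqrt(f + 1)
    if u <= -r:
        return 0.0
    if u >= r:
        return 1.0
    c = math.gamma((f + 1) / 2) / (math.sqrt(math.pi * (f + 1)) * math.gamma(f / 2))

    def dens(t):
        # max(..., 0.0) guards against tiny negative round-off near the end-points
        return c * max(1 - t * t / (f + 1), 0.0) ** ((f - 2) / 2)

    return 0.5 + _simpson(dens, 0.0, u)


def student_cdf(t, f):
    """Student distribution function S_f(t) with f degrees of freedom."""
    c = math.gamma((f + 1) / 2) / (math.sqrt(math.pi * f) * math.gamma(f / 2))

    def dens(s):
        return c * (1 + s * s / f) ** (-(f + 1) / 2)

    return 0.5 + _simpson(dens, 0.0, t)


def best_unbiased_normal_cdf(sample, x):
    """phi* = T_{n-2}((x - mean) / S), with S^2 = (1/n) * sum (X_i - mean)^2."""
    n = len(sample)
    if n < 3:
        raise ValueError("at least three observations are required")
    mean = sum(sample) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in sample) / n)  # assumes s > 0
    return thompson_cdf((x - mean) / s, n - 2)


if __name__ == "__main__":
    f, u = 5, 1.2
    via_student = student_cdf(u * math.sqrt(f / (f + 1 - u * u)), f)
    print(thompson_cdf(u, f), via_student)  # the two forms should agree closely
    print(best_unbiased_normal_cdf([0.0, 1.0, 2.0, 3.0], 1.5))
```

For $f = 2$ (that is, $n = 4$) the density in (3) is constant, so $T _ {2} ( u ) = 1 / 2 + u / ( 2 \sqrt 3 )$ for $| u | < \sqrt 3$, which provides a simple closed-form check of the quadrature.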

References

[1] A.N. Kolmogorov, "Unbiased estimates" Izv. Akad. Nauk SSSR Ser. Mat. , 14 : 4 (1950) pp. 303–326 (In Russian)

[2] C.R. Rao, "Linear statistical inference and its applications" , Wiley (1965)

[3] B.L. van der Waerden, "Mathematische Statistik" , Springer (1957)

[4] D. Blackwell, "Conditional expectation and unbiased sequential estimation" Ann. Math. Stat. , 18 (1947) pp. 105–110