
Difference between revisions of "Rao-Cramér inequality"

From Encyclopedia of Mathematics
m (Completed rendering of article in TeX.)
m (Corrected a minor typo error and improved the consistency of wording in the article.)

Revision as of 21:31, 29 January 2017

Cramér–Rao inequality, Fréchet inequality, information inequality

An inequality in mathematical statistics that establishes a lower bound for the risk corresponding to a quadratic loss function in the problem of estimating an unknown parameter.

Suppose that the probability distribution of a random vector $ X = (X_{1},\ldots,X_{n}) $, with values in the $ n $-dimensional Euclidean space $ \mathbb{R}^{n} $, is defined by a density function $ p(x|\theta) $, where $ x = (x_{1},\ldots,x_{n})^{\intercal} $ and $ \theta \in \Theta \subseteq \mathbb{R} $. Suppose that a statistic $ T = T(X) $ such that $$ \mathsf{E}_{\theta}[T] = \theta + b(\theta) $$ is used as an estimator for the unknown scalar parameter $ \theta $, where $ b(\theta) $ is a differentiable function called the bias of $ T $. Then, under certain regularity conditions on the family $ (p(x|\theta))_{\theta \in \Theta} $, one of which is that the Fisher information $$ I(\theta) \stackrel{\text{df}}{=} \mathsf{E}_{\theta} \! \left[ \left\{ \frac{\partial \ln(p(X|\theta))}{\partial \theta} \right\}^{2} \right] $$ is not zero, the so-called Cramér–Rao inequality $$ \mathsf{E}_{\theta} \! \left[ |T - \theta|^{2} \right] \geq \frac{[1 + b'(\theta)]^{2}}{I(\theta)} + b^{2}(\theta) \qquad (1) $$ holds. This inequality gives a lower bound for the mean squared error $ \mathsf{E}_{\theta} \! \left[ |T - \theta|^{2} \right] $ of all estimators $ T $ for the unknown parameter $ \theta $ that have the same bias function $ b(\theta) $.
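
The following is a brief sketch of why (1) holds, assuming the regularity conditions permit differentiation under the integral sign. The symbol $ S $ for the score is introduced here for illustration only and is not part of the original article. Let $ S = \partial \ln(p(X|\theta)) / \partial \theta $. Then $ \mathsf{E}_{\theta}[S] = 0 $ and $ \mathsf{E}_{\theta}[S^{2}] = I(\theta) $, and differentiating the identity $ \mathsf{E}_{\theta}[T] = \theta + b(\theta) $ with respect to $ \theta $ gives $$ \mathsf{E}_{\theta}[T S] = 1 + b'(\theta). $$ Since $ \mathsf{E}_{\theta}[S] = 0 $, the left-hand side is the covariance of $ T $ and $ S $, so the Cauchy–Schwarz inequality yields $$ [1 + b'(\theta)]^{2} \leq \mathsf{D}_{\theta} T \cdot I(\theta). $$ Combining this with the decomposition $$ \mathsf{E}_{\theta} \! \left[ |T - \theta|^{2} \right] = \mathsf{D}_{\theta} T + b^{2}(\theta) $$ gives (1).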

In particular, if $ T $ is an unbiased estimator for $ \theta $, i.e., if $ \mathsf{E}_{\theta}[T] = \theta $, then (1) implies that $$ \mathsf{D} T = \mathsf{E}_{\theta} \! \left[ |T - \theta|^{2} \right] \geq \frac{1}{I(\theta)}. \quad (2) $$ Thus, in this case, the Cramér–Rao inequality provides a lower bound for the variance of the unbiased estimators $ T $ for $ \theta $, equal to $ \dfrac{1}{I(\theta)} $, and also demonstrates that the existence of consistent estimators is connected with unrestricted growth of the Fisher information $ I(\theta) $ as $ n \to \infty $. If equality is attained in (2) for a certain unbiased estimator $ T $, then $ T $ is optimal in the class of all unbiased estimators with regard to minimum quadratic risk; in this case, it is called an efficient estimator. For example, if $ X_{1},\ldots,X_{n} $ are independent random variables subject to the same normal law $ N(\theta,1) $, then $ T = (X_{1} + \cdots + X_{n}) / n $ is an efficient estimator of the unknown mean $ \theta $.
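
The normal-mean example can be checked numerically. The sketch below is only an illustration, not part of the original article: it assumes $ I(\theta) = n $ for $ n $ independent $ N(\theta,1) $ observations, and the function names, sample size, and trial count are arbitrary choices. It estimates the variance of the sample mean by simulation and compares it with the bound $ 1/I(\theta) = 1/n $.

```python
import random
import statistics


def sample_mean_variance(theta, n, trials, seed=0):
    """Monte Carlo estimate of Var(T) for T = (X_1 + ... + X_n) / n,
    where the X_i are independent N(theta, 1) draws."""
    rng = random.Random(seed)
    estimates = [
        sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        for _ in range(trials)
    ]
    return statistics.variance(estimates)


def cramer_rao_bound(n):
    """Bound 1 / I(theta) for n i.i.d. N(theta, 1) observations,
    using the assumption I(theta) = n (it does not depend on theta)."""
    return 1.0 / n


n = 10
empirical = sample_mean_variance(theta=2.0, n=n, trials=20000)
bound = cramer_rao_bound(n)
print(f"empirical variance = {empirical:.4f}, bound = {bound:.4f}")
```

Up to simulation noise, the two printed numbers should agree, which is what efficiency of the sample mean means in this example.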

In general, equality in (2) is attained if and only if $ (p(x|\theta))_{\theta \in \Theta} $ is an exponential family, i.e., if the probability density of $ X $ can be represented in the form $$ p(x|\theta) = c(x) e^{u(\theta) \phi(x) - v(\theta)}, $$ in which case the sufficient statistic $ T = \phi(X) $ is an efficient estimator of its expectation $ \dfrac{v'(\theta)}{u'(\theta)} $. If no efficient estimator exists, then the lower bound for the variance of the unbiased estimators can be refined, as the Cramér–Rao inequality does not necessarily give the greatest lower bound. For example, if $ X_{1},\ldots,X_{n} $ are independent random variables with the same normal distribution $ N(a^{1/3},1) $, then the greatest lower bound for the variance of unbiased estimators of $ a $ is equal to $$ \frac{9 a^{4/3}}{n} + \frac{18 a^{2/3}}{n^{2}} + \frac{6}{n^{3}}, $$ while $$ \frac{1}{I(a)} = \frac{9 a^{4/3}}{n}. $$ In general, absence of equality in (2) does not mean that the estimator found is not optimal, as it may well be the only unbiased estimator.
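
The gap between the two bounds in this example can be illustrated numerically. The sketch below is a hedged illustration, not taken from the original article. It writes $ \theta $ for the mean $ a^{1/3} $, so that $ a = \theta^{3} $ and the bounds are expressed through $ \theta^{4} $ and $ \theta^{2} $. It also assumes that the unbiased estimator $ \bar{X}^{3} - 3\bar{X}/n $ of $ \theta^{3} $ is the one attaining the greatest lower bound; this estimator is not named in the article. All function names are hypothetical.

```python
import math
import random
import statistics


def cramer_rao_bound(theta, n):
    """Cramér-Rao bound for unbiased estimators of a = theta**3."""
    return 9 * theta**4 / n


def greatest_lower_bound(theta, n):
    """Refined bound quoted in the text, written with theta = a**(1/3)."""
    return 9 * theta**4 / n + 18 * theta**2 / n**2 + 6 / n**3


def simulate_estimator(theta, n, trials, seed=0):
    """Simulate xbar**3 - 3*xbar/n (assumed estimator, see lead-in),
    drawing the sample mean xbar directly from N(theta, 1/n)."""
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(n)
    values = []
    for _ in range(trials):
        xbar = rng.gauss(theta, sd)
        values.append(xbar**3 - 3 * xbar / n)
    return statistics.mean(values), statistics.variance(values)


theta, n = 1.0, 5
mean_est, var_est = simulate_estimator(theta, n, trials=200000)
print(f"mean of estimator     = {mean_est:.3f} (target a = {theta**3:.3f})")
print(f"simulated variance    = {var_est:.3f}")
print(f"greatest lower bound  = {greatest_lower_bound(theta, n):.3f}")
print(f"Cramér-Rao bound      = {cramer_rao_bound(theta, n):.3f}")
```

In this sketch the simulated variance should sit near the refined bound and strictly above the Cramér–Rao bound, with the gap shrinking at rate $ 1/n^{2} $ as $ n $ grows.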

There are different generalizations of the Cramér–Rao inequality to the case of a vector parameter, or to that of estimating a function of the parameter. Refinements of the lower bound in (2) play an important role in such cases.

The inequality (1) was independently obtained by M. Fréchet, C.R. Rao and H. Cramér.

References

[1] H. Cramér, “Mathematical methods of statistics”, Princeton Univ. Press (1946).
[2] B.L. van der Waerden, “Mathematische Statistik”, Springer (1957).
[3] L.N. Bol’shev, “A refinement of the Cramér–Rao inequality”, Theory Probab. Appl., 6 (1961), pp. 295–301; Teor. Veroyatnost. Primenen., 6: 3 (1961), pp. 319–326.
[4a] A. Bhattacharyya, “On some analogues of the amount of information and their uses in statistical estimation, Chapt. I”, Sankhyā, 8: 1 (1946), pp. 1–14.
[4b] A. Bhattacharyya, “On some analogues of the amount of information and their uses in statistical estimation, Chapt. II-III”, Sankhyā, 8: 3 (1947), pp. 201–218.
[4c] A. Bhattacharyya, “On some analogues of the amount of information and their uses in statistical estimation, Chapt. IV”, Sankhyā, 8: 4 (1948), pp. 315–328.

References

[a1] E.L. Lehmann, “Theory of point estimation”, Wiley (1983).
How to Cite This Entry:
Rao-Cramér inequality. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Rao-Cram%C3%A9r_inequality&oldid=40199
This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.