# Statistical estimation

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

One of the fundamental parts of mathematical statistics, dedicated to the estimation using random observations of various characteristics of their distribution.

### Example 1.

Let be independent random variables (or observations) with a common unknown distribution on the straight line. The empirical (sample) distribution which ascribes the weight to every random point is a statistical estimator for . The empirical moments

serve as estimators for the moments . In particular,

is an estimator for the mean, and

is an estimator for the variance.

## Basic concepts.

In the general theory of estimation, an observation of is a random element with values in a measurable space , whose unknown distribution belongs to a given family of distributions . The family of distributions can always be parametrized and written in the form . Here the form of dependence on the parameter and the set are assumed to be known. The problem of estimation using an observation of an unknown parameter or of the value of a function at the point consists of constructing a function from the observations made, which gives a sufficiently good approximation of .

A comparison of estimators is carried out in the following way. Let a non-negative loss function be defined on , the sense of this being that the use of for the actual value of leads to losses . The mean losses and the risk function are taken as a measure of the quality of the statistic as an estimator of given the loss function . A partial order relation is thereby introduced on the set of estimators: An estimator is preferable to an estimator if . In particular, an estimator of the parameter is said to be inadmissible (in relation to the loss function ) if an estimator exists such that for all , and for some strict inequality occurs. In this method of comparing the quality of estimators, many estimators prove to be incomparable, and, moreover, the choice of a loss function is to a large extent arbitrary.

It is sometimes possible to find estimators that are optimal within a certain narrower class of estimators. Unbiased estimators form an important class. If the initial experiment is invariant relative to a certain group of transformations, it is natural to restrict to estimators that do not disrupt the symmetry of the problem (see Equivariant estimator).

Estimators can be compared by their behaviour at "worst" points: An estimator of is called a minimax estimator relative to the loss function if

where the lower bound is taken over all estimators .

In the Bayesian formulation of the problem (cf. Bayesian approach), the unknown parameter is considered to represent values of the random variable with a priori distribution on . In this case, the best estimator relative to the loss function is defined by the relation

and the lower bound is taken over all estimators .

There is a distinction between parametric estimation problems, in which is a subset of a finite-dimensional Euclidean space, and non-parametric problems. In parametric problems one usually considers loss functions in the form , where is a non-negative, non-decreasing function on . The most frequently used quadratic loss function plays an important part.

If is a sufficient statistic for the family , then it is often possible to restrict to estimators . Thus, if , , where is a convex function and is any estimator for , an estimator exists that is not worse than ; if is unbiased, can also be chosen unbiased (Blackwell's theorem). If is a complete sufficient statistic for the family and is an unbiased estimator for , then an unbiased estimator in the form with minimum variance in the class of unbiased estimators exists (the Lehmann–Scheffé theorem).

As a rule, it is assumed that in parametric estimation problems the elements of the family are absolutely continuous with respect to a certain -finite measure and that the density exists. If is a sufficiently-smooth function of and the Fisher information matrix

exists, the estimation problem is said to be regular. For regular problems, the accuracy of the estimation is bounded from below by the Cramér–Rao inequality: If , then for any estimator ,

### Examples of estimation problems 2.

The most widespread formulation is that in which a sample of size is observed: are independent identically-distributed variables taking values in a measurable space with common distribution density relative to a measure , and . In regular problems, if is the Fisher information on one observation, then the Fisher information of the whole sample . The Cramér–Rao inequality takes the form

. Let be normal random variables with distribution density

Let the unknown parameter be ; and can serve as estimators for and , and is then a sufficient statistic. The estimator is unbiased, while is biased. If is known, is an unbiased estimator of minimal variance, and is a minimax estimator relative to the quadratic loss function.

. Let be normal random variables in with density

The statistic is an unbiased estimator of ; if , it is admissible relative to the quadratic loss function, if , it is inadmissible.

. Let be random variables in with unknown distribution density belonging to a given family of densities. For a sufficiently broad class , this is a non-parametric problem. The problem of estimating at a point is a problem of estimating the functional .

### Example 3.

The linear regression model. The variables

are observed; the are random disturbances, ; the matrix is known; and the parameter must be estimated.

### Example 4.

A segment of a stationary Gaussian process , , with rational spectral density is observed; the unknown parameters , are to be estimated.

## Methods of producing estimators.

The most widely used maximum-likelihood method recommends that the estimator defined as the maximum point of the random function is taken, the so-called maximum-likelihood estimator. If , the maximum-likelihood estimators are to be found among the roots of the likelihood equation

In example 3, the method of least squares (cf. Least squares, method of) recommends that the minimum point of the function

be used as the estimator.

Another method is to take a Bayesian estimator relative to a loss function and an a priori distribution , although the initial formulation is not Bayesian. For example, if , it is possible to estimate by means of

This is a Bayesian estimator relative to the quadratic loss function and a uniform a priori distribution.

The method of moments (cf. Moments, method of (in probability theory)) consists of the following. Let , and suppose that there are "good" estimators for . Estimators by the method of moments are solutions of the system . Empirical moments are frequently chosen in the capacity of (see example 1).

If the sample is observed, then (see example 1) as an estimator for it is possible to choose . If the function is not defined (for example, , where is Lebesgue measure), appropriate modifications are chosen. For example, for an estimator of the density a histogram or an estimator of the form

is used.

## Asymptotic behaviour of estimators.

For the sake of being explicit a problem such as Example 2 is examined, in which . It is to be expected that when , "good" estimators will get infinitely close to the characteristic being estimated. A sequence of estimators is called a consistent sequence of estimators of if in the probability for all . The above methods of producing estimators lead, under broad hypotheses, to consistent estimators (cf. Consistent estimator). The estimators in example 1 are consistent. For regular estimation problems, maximum-likelihood estimators and Bayesian estimators are asymptotically normal with mean and correlation matrix . Under such conditions, these estimators are asymptotically locally minimax relative to a broad class of loss functions, and they can be considered as being asymptotically optimal (see Asymptotically-efficient estimator).

## Interval estimation.

A random subset of the set is called a confidence region for the estimator with confidence coefficient if (). Many confidence regions with a given usually exist, and the problem is to choose the one possessing certain optimal properties (for example, the interval of minimum length, if ). Under the conditions of example 2.1, let . Then the interval

is a confidence interval with confidence coefficient (see Interval estimator).

#### References

 [1] R.A. Fisher, "On the mathematical foundations of theoretical statistics" Phil. Trans. Roy. Soc. London Ser. A , 222 (1922) pp. 309–368 [2] A.N. Kolmogorov, "Sur l'estimation statistique des paramètres de la loi de Gauss" Izv. Akad. Nauk SSSR Ser. Mat. , 6 : 1 (1942) pp. 3–32 [3] H. Cramér, "Mathematical methods of statistics" , Princeton Univ. Press (1946) [4] M.G. Kendall, A. Stuart, "The advanced theory of statistics" , 2. Inference and relationship , Griffin (1979) [5] I.A. Ibragimov, R.Z. [R.Z. Khas'minskii] Has'minskii, "Statistical estimation: asymptotic theory" , Springer (1981) (Translated from Russian) [6] N.N. Chentsov, "Statistical decision laws and optimal inference" , Amer. Math. Soc. (1982) (Translated from Russian) [7] S. Zacks, "The theory of statistical inference" , Wiley (1975) [8] U. Grenander, "Abstract inference" , Wiley (1981)