Statistical estimator

Latest revision as of 16:41, 13 January 2024

A function of random variables that can be used in estimating unknown parameters of a theoretical probability distribution. Methods of the theory of statistical estimation form the basis of the modern theory of errors; physical constants to be measured are commonly used as the unknown parameters, while the results of direct measurements subject to random errors are taken as the random variables. For example, if $ X _ {1} , \dots, X _ {n} $ are independent, identically normally distributed random variables (the results of equally accurate measurements subject to independent normally distributed random errors), then for the unknown mean value $ a $ (the value of an approximately measurable physical constant) the arithmetical mean

$$ \tag{1 } \overline{X}\; = \frac{X _ {1} + \dots + X _ {n} }{n} $$

is taken as the statistical estimator.

A statistical estimator as a function of random variables is most frequently given by formulas, the choice of which is prescribed by practical requirements. A distinction must be made here between point and interval estimators.

Point estimators.

A point estimator is a statistical estimator whose value can be represented geometrically in the form of a point in the same space as the values of the unknown parameters (the dimension of the space is equal to the number of parameters to be estimated). In fact, point estimators are also used as approximate values for unknown physical variables. For the sake of simplicity, it is further supposed that one natural parameter is subject to estimation; in this case, a point estimator is a function of the results of observations, and takes numerical values.

A point estimator is said to be unbiased if its mathematical expectation coincides with the parameter being estimated, i.e. if the statistical estimation is free of systematic errors. The arithmetical mean (1) is an unbiased statistical estimator for the mathematical expectation of identically-distributed random variables $ X _ {i} $ (not necessarily normal). At the same time, the sample variance

$$ \tag{2 } \widehat{s} {} ^ {2} = \frac{( X _ {1} - \overline{X}\; ) ^ {2} + \dots + ( X _ {n} - \overline{X}\; ) ^ {2} }{n} $$

is a biased statistical estimator for the variance $ \sigma ^ {2} = {\mathsf D} X _ {i} $, since $ {\mathsf E} {\widehat{s} } {} ^ {2} = ( 1- 1/n) \sigma ^ {2} $; the function

$$ s ^ {2} = \frac{n}{n-1} {\widehat{s} } {} ^ {2} $$

is usually taken as the unbiased statistical estimator for $ \sigma ^ {2} $.
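The bias factor $ 1 - 1/n $ can be checked exactly on a small discrete example. The following is a minimal sketch, assuming a hypothetical three-point population $ \{ 0, 1, 2 \} $ with equal probabilities (so $ \sigma ^ {2} = 2/3 $); the function names are illustrative and not part of the article.

```python
from fractions import Fraction
from itertools import product

def biased_var(sample):
    """Sample variance (2): sum of squared deviations divided by n."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / n

def expected_value(statistic, population, n):
    """Exact expectation over all equally likely ordered samples of size n."""
    samples = list(product(population, repeat=n))
    return sum(statistic(s) for s in samples) / len(samples)

population = [0, 1, 2]      # hypothetical distribution, each value with probability 1/3
n = 3
sigma2 = Fraction(2, 3)     # variance of that distribution

mean_biased = expected_value(biased_var, population, n)
mean_unbiased = expected_value(
    lambda s: Fraction(n, n - 1) * biased_var(s), population, n
)
print(mean_biased, mean_unbiased)
```

Under these assumptions the first expectation equals $ ( 1 - 1/n) \sigma ^ {2} $ and the second equals $ \sigma ^ {2} $.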

See also Unbiased estimator.

As a measure of the accuracy of the unbiased statistical estimator $ \alpha $ for a parameter $ a $ one most often uses the variance $ {\mathsf D} \alpha $.

The statistical estimator with smallest variance is called the best. In the example quoted, the arithmetical mean (1) is the best statistical estimator. However, if the probability distribution of the random variables $ X _ {i} $ is different from normal, then (1) need not be the best statistical estimator. For example, if the results of the observations of $ X _ {i} $ are uniformly distributed in an interval $ ( b, c) $, then the best statistical estimator for the mathematical expectation $ a = ( b+ c)/2 $ will be half the sum of the boundary values:

$$ \tag{3 } \alpha = \frac{\min X _ {i} + \max X _ {i} }{2} . $$

The criterion for the comparison of the accuracy of different statistical estimators ordinarily used is the relative efficiency — the ratio of the variances of the best estimator and the given unbiased estimator. For example, if the results of the observations of $ X _ {i} $ are uniformly distributed, then the variances of the estimators (1) and (3) are expressed by the formulas

$$ {\mathsf D} \overline{X}\; = \frac{( c- b) ^ {2} }{12n} $$

and

$$ \tag{4 } {\mathsf D} \alpha = \frac{( c- b) ^ {2} }{2( n+ 1) ( n+ 2) } . $$

Since (3) is the best estimator, the relative efficiency of the estimator (1) in the given case is

$$ e _ {n} ( \overline{X}\; ) = \frac{6n}{( n+ 1)( n+ 2) } \sim \frac{6}{n} . $$
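As an illustration, the two variance formulas and their ratio can be evaluated exactly. This is a small sketch with hypothetical function names; it assumes integer end-points $ b < c $ so that exact rational arithmetic can be used.

```python
from fractions import Fraction

def var_mean(n, b, c):
    """Variance of the arithmetical mean (1) for a uniform distribution on (b, c)."""
    return Fraction((c - b) ** 2, 12 * n)

def var_midrange(n, b, c):
    """Variance (4) of the half-sum of the extreme observations (3)."""
    return Fraction((c - b) ** 2, 2 * (n + 1) * (n + 2))

def relative_efficiency(n, b=0, c=1):
    """Ratio of the variance of the best estimator (3) to that of the mean (1)."""
    return var_midrange(n, b, c) / var_mean(n, b, c)

for n in (1, 2, 10, 100):
    print(n, relative_efficiency(n))
```

For $ n = 1 $ and $ n = 2 $ the two estimators coincide, so the ratio is 1; for larger $ n $ it decreases roughly like $ 6/n $.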

For a large number of observations $ n $, it is usually required that the chosen statistical estimator tends in probability to the true value of the parameter $ a $, i.e. that for every $ \epsilon > 0 $,

$$ \lim\limits _ {n \rightarrow \infty } {\mathsf P} \{ | \alpha - a | > \epsilon \} = 0; $$

such statistical estimators are called consistent (for example, any unbiased estimator with variance tending to zero, when $ n \rightarrow \infty $, is consistent; see also Consistent estimator). Insofar as the order of tendency to the limit is of significance, the asymptotically best estimators are the asymptotically efficient statistical estimators, i.e. those for which

$$ \frac{ {\mathsf E} ( \alpha - a) }{\sqrt { {\mathsf E} ( \alpha - a) ^ {2} } } \rightarrow 0 \ \textrm{ and } \ e _ {n} ( \alpha ) \rightarrow 1, $$

when $ n \rightarrow \infty $. For example, if $ X _ {1} \dots X _ {n} $ are identically normally distributed, then (2) is an asymptotically efficient estimator for the unknown parameter $ \sigma ^ {2} = {\mathsf D} X _ {i} $, since, when $ n \rightarrow \infty $, the variance of $ \widehat{s} {} ^ {2} $ and that of the best estimator $ \widehat{s} {} ^ {2} n/( n- 1) $ are asymptotically equivalent:

$$ \frac{ {\mathsf D} {\widehat{s} } {} ^ {2} }{ {\mathsf D} [ {\widehat{s} } {} ^ {2} n/( n- 1)] } = \ \frac{( n- 1) ^ {2} }{n ^ {2} } ,\ \ {\mathsf D} {\widehat{s} } {} ^ {2} = \ \frac{2( n- 1) \sigma ^ {4} }{n ^ {2} } , $$

and, moreover,

$$ {\mathsf E} ( {\widehat{s} } {} ^ {2} - \sigma ^ {2} ) = \frac{- \sigma ^ {2} }{n} . $$

Of prime importance in the theory of statistical estimation and its applications is the fact that the quadratic deviation of a statistical estimator for a parameter $ a $ is bounded from below by a certain quantity (R. Fisher proposed that this quantity be characterized by the amount of information regarding the unknown parameter $ a $ contained in the results of the observations). For example, if $ X _ {1} \dots X _ {n} $ are independent and identically distributed, with probability density $ p( x; a) $, and if $ \alpha = \phi ( X _ {1} \dots X _ {n} ) $ is a statistical estimator for a certain function $ g( a) $ of the parameter $ a $, then in a broad class of cases

$$ \tag{5 } {\mathsf E} [ \alpha - g( a)] ^ {2} \geq \frac{nb ^ {2} ( a) I( a) + [ g ^ \prime ( a) + b ^ \prime ( a)] ^ {2} }{nI( a) } , $$

where

$$ b( a) = {\mathsf E} [ \alpha - g( a)] \ \textrm{ and } \ \ I( a) = {\mathsf E} \left [ \frac{\partial \mathop{\rm ln} p( X; a) }{\partial a } \right ] ^ {2} . $$

The function $ b( a) $ is called the bias, while the quantity inverse to the right-hand side of inequality (5) is called the Fisher information, with respect to the function $ g( a) $, contained in the results of the observations. In particular, if $ \alpha $ is an unbiased statistical estimator of the parameter $ a $, then

$$ g( a) \equiv a,\ b( a) \equiv 0 , $$

and

$$ \tag{6 } {\mathsf E} [ \alpha - g( a)] ^ {2} = {\mathsf D} \alpha \geq \frac{1}{nI( a) } , $$

whereby the information $ nI( a) $ in this instance is proportional to the number of observations (the function $ I( a) $ is called the information contained in one observation).

The basic conditions under which the inequalities (5) and (6) hold are smoothness of the estimator $ \alpha $ as a function of $ X _ {i} $, and the independence of the parameter $ a $ of the set of those points $ x $ where $ p( x; a) = 0 $. The latter condition is not fulfilled, for example, in the case of a uniform distribution, and the variance of the estimator (3) does therefore not satisfy inequality (6) (according to (4), this variance is a quantity of order $ n ^ {-2} $, while, according to inequality (6), it cannot have an order of smallness higher than $ n ^ {-1} $).

The inequalities (5) and (6) also hold for discretely distributed random variables $ X _ {i} $: In defining the information $ I( a) $, the density $ p( x; a) $ must be replaced by the probability of the event $ \{ X = x \} $.
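A discrete case in which the bound (6) is attained can be verified directly. The sketch below assumes a hypothetical example not taken from the article: observations equal to 1 with probability $ a $ and 0 with probability $ 1 - a $, with the arithmetical mean as the unbiased estimator of $ a $.

```python
from fractions import Fraction

def bernoulli_information(a):
    """I(a) = E[(d/da ln P{X = x})^2] for P{X = 1} = a, P{X = 0} = 1 - a."""
    score_one = 1 / a             # derivative of ln a
    score_zero = -1 / (1 - a)     # derivative of ln(1 - a)
    return a * score_one ** 2 + (1 - a) * score_zero ** 2

def var_sample_mean(a, n):
    """Variance of the arithmetical mean of n such observations."""
    return a * (1 - a) / n

a = Fraction(1, 4)   # hypothetical parameter value
n = 10
bound = 1 / (n * bernoulli_information(a))
print(bernoulli_information(a), var_sample_mean(a, n), bound)
```

Here the variance of the mean coincides with $ [ nI( a)] ^ {-1} $, so inequality (6) holds with equality in this example.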

If the variance of an unbiased statistical estimator $ \alpha ^ {*} $ for the parameter $ a $ coincides with the right-hand side of inequality (6), then $ \alpha ^ {*} $ is the best estimator. The converse assertion, generally speaking, is not true: The variance of the best statistical estimator can exceed $ [ nI( a)] ^ {-1} $. However, as $ n \rightarrow \infty $, the variance of the best estimator, $ {\mathsf D} \alpha ^ {*} $, is asymptotically equivalent to the right-hand side of (6), i.e. $ n {\mathsf D} \alpha ^ {*} \rightarrow 1/I( a) $. In this way, using the Fisher information, it is possible to define the asymptotic efficiency of an unbiased statistical estimator $ \alpha $, by proposing

$$ \tag{7 } e _ \infty ( \alpha ) = \ \lim\limits _ {n \rightarrow \infty } \frac{ {\mathsf D} \alpha ^ {*} }{ {\mathsf D} \alpha } = \ \lim\limits _ {n \rightarrow \infty } \frac{1}{nI( a) {\mathsf D} \alpha } . $$

One information approach to the theory of statistical estimators which proves to be particularly fruitful is that where the density (in the discrete instance, the probability) of the joint distribution of the random variables $ X _ {1} \dots X _ {n} $ can be represented in the form of the product of two functions $ h( x _ {1} \dots x _ {n} ) q[ y( x _ {1} \dots x _ {n} ); a] $, the first of which does not depend on $ a $ while the second is the density of the distribution of a certain random variable $ Z = y( X _ {1} \dots X _ {n} ) $, called a sufficient statistic.

One of the most frequently used methods of finding point estimators is the method of moments (cf. Moments, method of (in probability theory)). According to this method, a theoretical distribution dependent on unknown parameters corresponds to a discrete sample distribution, which is defined by the results of observations of $ X _ {i} $ and which is the probability distribution of a theoretical random variable which takes the values $ X _ {1} , \dots, X _ {n} $ with identical probabilities equal to $ 1/n $ (the sample distribution can be seen as a point estimator for the theoretical distribution). The statistical estimator for the moments of a theoretical distribution is taken to be that of the corresponding moments of the sample distribution; for example, for the mathematical expectation $ a $ and variance $ \sigma ^ {2} $, the method of moments provides the following statistical estimators: the sample mean (1) and the sample variance (2). The unknown parameters are usually expressed (exactly or approximately) in the form of functions of several moments of the theoretical distribution. By replacing theoretical moments in these functions by sample moments, the required statistical estimators are obtained. This method, which in practice often reduces to comparatively simple calculations, generally gives a statistical estimator of low asymptotic efficiency (see the above example of the estimator of the mathematical expectation of a uniform distribution).
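As a hedged illustration of expressing parameters through moments, consider the uniform distribution on $ ( b, c) $, whose mean is $ ( b+ c)/2 $ and whose variance is $ ( c- b) ^ {2} /12 $. Solving for the end-points and substituting the sample moments (1) and (2) gives the sketch below; the function name and the data are hypothetical.

```python
import math

def moments_uniform(xs):
    """Method-of-moments estimators for the end-points of a uniform distribution.

    Uses mean = (b + c) / 2 and variance = (c - b)^2 / 12, with the
    theoretical moments replaced by the sample mean (1) and sample variance (2).
    """
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    half_width = math.sqrt(3 * var)     # (c - b) / 2 = sqrt(3 * variance)
    return mean - half_width, mean + half_width

b_hat, c_hat = moments_uniform([1, 2, 3, 4, 5])   # hypothetical observations
print(b_hat, c_hat)
```

Note that these estimates need not enclose all observations, which is one sign of the low efficiency of the method in this case.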

Another method for finding statistical estimators, which is more complete from the theoretical point of view, is the maximum-likelihood method. According to this method, the likelihood function $ L( a) $ is considered, which is a function of the unknown parameter $ a $, and which is obtained as a result of substituting the random variables $ X _ {i} $ in the density $ p( x _ {1} , \dots, x _ {n} ; a) $ of the joint distribution for the arguments; if the $ X _ {i} $ are independent and identically distributed with probability density $ p( x; a) $, then

$$ L( a) = p( X _ {1} ; a) \dots p( X _ {n} ; a) $$

(if the $ X _ {i} $ are discretely distributed, then in defining the likelihood function $ L $ the density should be replaced by the probability of the events $ \{ X _ {i} = x _ {i} \} $). The value $ \alpha $ for which $ L( \alpha ) $ has its largest value is used as the maximum-likelihood estimator for the unknown parameter $ a $ (instead of $ L $, the so-called logarithmic likelihood function is often considered: $ l( \alpha ) = \mathop{\rm ln} L( \alpha ) $; owing to the monotone nature of the logarithm, the maximum points of $ L( \alpha ) $ and $ l( \alpha ) $ coincide).

The basic merit of maximum-likelihood estimators lies in the fact that, given certain general conditions, they are consistent, asymptotically efficient and approximately normally distributed. These properties mean that if $ \alpha $ is a maximum-likelihood estimator, then, when $ n \rightarrow \infty $,

$$ {\mathsf E} \alpha \sim a \ \textrm{ and } \ \ {\mathsf E} ( \alpha - a) ^ {2} \sim {\mathsf D} \alpha \sim \sigma _ {n} ^ {2} ( a) = \frac{1}{ {\mathsf E} \left [ \frac{d}{da} l ( a) \right ] ^ {2} } $$

(if the $ X _ {i} $ are independent, then $ \sigma _ {n} ^ {2} ( a) = [ nI( a)] ^ {-1} $). Thus, for the distribution function of a normalized statistical estimator $ ( \alpha - a)/ \sigma _ {n} ( a) $, the limit relation

$$ \tag{8 } \lim\limits _ {n \rightarrow \infty } {\mathsf P} \left \{ \frac{\alpha - a }{\sigma _ {n} ( a) } < x \right \} = \ \frac{1}{\sqrt {2 \pi } } \int\limits _ {- \infty } ^ { x } e ^ {- t ^ {2} /2 } dt \equiv \ \Phi ( x) $$

holds.

The advantages of the maximum-likelihood estimator justify the amount of calculation involved in seeking the maximum of the function $ L $ (or $ l $). In certain cases, the amount of calculation is greatly reduced as a result of the following properties: firstly, if $ \alpha ^ {*} $ is a statistical estimator for which inequality (6) becomes an equality, then the maximum-likelihood estimator is unique and coincides with $ \alpha ^ {*} $; secondly, if a sufficient statistic $ Z $ exists, then the maximum-likelihood estimator is a function of $ Z $.

For example, let $ X _ {1} \dots X _ {n} $ be independent and normally distributed, and such that

$$ p( x; a, \sigma ) = \ \frac{1}{\sigma \sqrt {2 \pi } } \mathop{\rm exp} \left \{ - \frac{1}{2 \sigma ^ {2} } ( x - a) ^ {2} \right \} , $$

then

$$ l( a, \sigma ) = \mathop{\rm ln} L( a, \sigma ) = $$

$$ = \ - \frac{n}{2} \mathop{\rm ln} ( 2 \pi ) - n \mathop{\rm ln} \sigma - \frac{1}{2 \sigma ^ {2} } \sum_{i=1}^ { n } ( X _ {i} - a) ^ {2} . $$

The coordinates $ a = a _ {0} $ and $ \sigma = \sigma _ {0} $ of the maximum point of the function $ l( a, \sigma ) $ satisfy the system of equations

$$ \frac{\partial l }{\partial a } \equiv \ \frac{1}{\sigma ^ {2} } \sum ( X _ {i} - a) = 0, $$

$$ \frac{\partial l }{\partial \sigma } \equiv - \frac{n}{\sigma ^ {3} } \left [ \sigma ^ {2} - \frac{1}{n} \sum ( X _ {i} - a) ^ {2} \right ] = 0. $$

Thus, $ a _ {0} = \overline{X}\; = \sum X _ {i} /n $, $ \sigma _ {0} ^ {2} = {\widehat{s} } {} ^ {2} = \sum ( X _ {i} - \overline{X}\; ) ^ {2} /n $, and in the given case (1) and (2) are maximum-likelihood estimators, whereby $ \overline{X}\; $ is the best statistical estimator of the parameter $ a $, normally distributed ( $ {\mathsf E} \overline{X}\; = a $, $ {\mathsf D} \overline{X}\; = \sigma ^ {2} /n $), while $ {\widehat{s} } {} ^ {2} $ is an asymptotically efficient statistical estimator of the parameter $ \sigma ^ {2} $, distributed approximately normally for large $ n $ ( $ {\mathsf E} {\widehat{s} } {} ^ {2} \sim \sigma ^ {2} $, $ {\mathsf D} {\widehat{s} } {} ^ {2} \sim 2 \sigma ^ {4} /n $). Both estimators are independent sufficient statistics.
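The closed-form solution can be checked numerically by comparing the logarithmic likelihood at $ ( a _ {0} , \sigma _ {0} ) $ with its values at nearby points. The following is a minimal sketch; the data are hypothetical measurements and the function names are illustrative.

```python
import math

def log_likelihood(xs, a, sigma):
    """Logarithmic likelihood l(a, sigma) of independent normal observations."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi)
            - n * math.log(sigma)
            - sum((x - a) ** 2 for x in xs) / (2 * sigma ** 2))

def normal_mle(xs):
    """Maximum-likelihood estimators: sample mean (1) and root of sample variance (2)."""
    n = len(xs)
    a0 = sum(xs) / n
    sigma0 = math.sqrt(sum((x - a0) ** 2 for x in xs) / n)
    return a0, sigma0

xs = [4.9, 5.1, 5.0, 5.3, 4.7]      # hypothetical measurements
a0, sigma0 = normal_mle(xs)
best = log_likelihood(xs, a0, sigma0)
print(a0, sigma0, best)
```

Perturbing either argument away from $ ( a _ {0} , \sigma _ {0} ) $ should not increase the value of $ l $.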

As a further example, suppose that

$$ p( x; a) = \frac{1}{\pi [ 1+( x- a) ^ {2} ] } . $$

This density gives a satisfactory description of the distribution of one of the coordinates of the particles reaching a plane screen and emanating from a point outside the screen ( $ a $ is the coordinate of the projection of the source onto the screen, and is presumed to be unknown). The mathematical expectation of this distribution does not exist, since the corresponding integral is divergent. For this reason it is not possible to find a statistical estimator of $ a $ by means of the method of moments. The formal use of the arithmetical mean (1) as a statistical estimator is meaningless, since $ \overline{X}\; $ is distributed in the given instance with the same density $ p( x; a) $ as every single result of the observations. For estimation of $ a $ it is possible to make use of the property that the distribution in question is symmetric relative to the point $ x= a $, where $ a $ is the median of the theoretical distribution. By slightly modifying the method of moments, the sample median $ \mu $ can be used as a statistical estimator. When $ n \geq 3 $, it is unbiased for $ a $ and if $ n $ is large, $ \mu $ is distributed approximately normally with variance

$$ {\mathsf D} \mu \sim \frac{\pi ^ {2} }{4n} . $$

At the same time,

$$ l( a) = - n \mathop{\rm ln} \pi - \sum_{i=1}^ { n } \mathop{\rm ln} [ 1 + ( X _ {i} - a) ^ {2} ], $$

thus $ nI( a) = n/2 $ and, according to (7), the asymptotic efficiency $ e _ \infty ( \mu ) $ is equal to $ 8/ \pi ^ {2} \approx 0.811 $. Thus, in order that the sample median $ \mu $ be as accurate a statistical estimator for $ a $ as the maximum-likelihood estimator $ \alpha $, the number of observations has to be increased by approximately $ 25\% $. If the cost of the experiment is high, then, for the determination of $ a $, the maximum-likelihood estimator $ \alpha $ should be used, which, in the given case, is defined as the root of the equation

$$ \frac{\partial l }{\partial a } \equiv 2 \sum_{i=1}^ { n } \frac{X _ {i} - a }{1 + ( X _ {i} - a) ^ {2} } = 0. $$

As a first approximation, $ \alpha _ {0} = \mu $ is used, and this equation is then solved by successive approximation using the formula

$$ \alpha _ {k+1} = \alpha _ {k} + \frac{4}{n} \sum_{i=1}^ { n } \frac{X _ {i} - \alpha _ {k} }{1 + ( X _ {i} - \alpha _ {k} ) ^ {2} } . $$
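The successive approximation can be sketched as follows, starting from the sample median. The data, the stopping tolerance and the iteration cap are assumptions made for illustration; the iteration is not guaranteed to converge for arbitrary data (for instance, when the observations are much more tightly clustered than a unit-scale distribution of this type), so the cap acts as a safeguard.

```python
import statistics

def score(a, xs):
    """Sum of (x - a) / (1 + (x - a)^2); the likelihood equation sets this to zero."""
    return sum((x - a) / (1 + (x - a) ** 2) for x in xs)

def cauchy_mle(xs, tol=1e-12, max_iter=100):
    """Iterate alpha_{k+1} = alpha_k + (4 / n) * score(alpha_k), from the median."""
    n = len(xs)
    alpha = statistics.median(xs)        # first approximation alpha_0 = mu
    for _ in range(max_iter):
        step = 4 / n * score(alpha, xs)
        alpha += step
        if abs(step) < tol:              # stop once the correction is negligible
            break
    return alpha

xs = [-3.1, -1.4, -0.7, -0.3, 0.1, 0.4, 0.9, 1.6, 5.0]   # hypothetical observations
alpha_hat = cauchy_mle(xs)
print(statistics.median(xs), alpha_hat, score(alpha_hat, xs))
```

For these data the result lies slightly below the sample median, and the printed score is numerically zero.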

Interval estimators.

An interval estimator is a statistical estimator which is represented geometrically as a set of points in the parameter space. An interval estimator can be seen as a set of point estimators. This set depends on the results of observations, and is consequently random; every interval estimator is therefore (partly) characterized by the probability with which this estimator will "cover" the unknown parameter point. This probability, in general, depends on unknown parameters; therefore, as a characteristic of the reliability of an interval estimator a confidence coefficient is used; this is the lowest possible value of the given probability. Interesting statistical conclusions can be drawn for only those interval estimators which have a confidence coefficient close to one.

If a single parameter $ a $ is estimated, then an interval estimator is usually a certain interval $ ( \beta , \gamma ) $ (the so-called confidence interval), the end-points $ \beta $ and $ \gamma $ of which are functions of the observations; the confidence coefficient $ \omega $ in the given case is defined as the lower bound of the probability of the simultaneous realization of the two events $ \{ \beta < a \} $ and $ \{ \gamma > a \} $, which can be calculated using all possible values of the parameter $ a $:

$$ \omega = \inf _ { a } {\mathsf P} \{ \beta < a < \gamma \} . $$

If the mid-point $ ( \beta + \gamma )/2 $ of such an interval is taken as a point estimator for the parameter $ a $, then it can be claimed, with probability not less than $ \omega $, that the absolute error of this statistical estimator does not exceed half the length of the interval, $ ( \gamma - \beta )/2 $. In other words, if one is guided by this rule of estimation of the absolute error, then an erroneous conclusion will be obtained on the average in not more than $ 100( 1- \omega )\% $ of the cases. Given a fixed confidence coefficient $ \omega $, the most suitable are the shortest confidence intervals for which the mathematical expectation of the length $ {\mathsf E} ( \gamma - \beta ) $ attains its lowest value.

If the distribution of random variables $ X _ {i} $ depends only on one unknown parameter $ a $, then the construction of the confidence interval is usually realized by the use of a certain point estimator $ \alpha $. For the majority of cases of practical interest, the distribution function $ {\mathsf P} \{ \alpha < x \} = F( x; a) $ of a sensibly chosen statistical estimator $ \alpha $ depends monotonically on the parameter $ a $. Under these conditions, when seeking an interval estimator it makes sense to insert $ x = \alpha $ in $ F( x; a) $ and to determine the roots $ a _ {1} = a _ {1} ( \alpha , \omega ) $ and $ a _ {2} = a _ {2} ( \alpha , \omega ) $ of the equations

$$ \tag{9 } F( \alpha ; a _ {1} ) = \ \frac{1 - \omega }{2} \ \textrm{ and } \ \ F( \alpha + 0; a _ {2} ) = \frac{1 + \omega }{2} , $$

where

$$ F( x+ 0; a) = \lim\limits _ {\Delta \rightarrow 0 } F( x + \Delta ^ {2} ; a) $$

(for continuous distributions $ F( x+ 0; a) = F( x; a) $). The points with coordinates $ a _ {1} ( \alpha ; \omega ) $ and $ a _ {2} ( \alpha ; \omega ) $ bound the confidence interval with confidence coefficient $ \omega $. It is reasonable to expect that such a simply constructed interval differs in many cases from the optimal (shortest) interval. However, if $ \alpha $ is an asymptotically efficient statistical estimator for $ a $, then, given a sufficiently large number of observations, such an interval estimator differs from the optimal, although in practice the difference is immaterial. This is particularly true for maximum-likelihood estimators, since they are asymptotically normally distributed (see (8)). In cases where solving the equations (9) is difficult, the interval estimator is calculated approximately, using a maximum-likelihood point estimator and the relation (8):

$$ \beta \approx \beta ^ {*} = \alpha - x \sigma _ {n} ( \alpha ) \ \textrm{ and } \ \ \gamma \approx \gamma ^ {*} = \alpha + x \sigma _ {n} ( \alpha ) , $$

where $ x $ is the root of the equation $ \Phi ( x) = ( 1+ \omega )/2 $.
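A minimal sketch of this approximate construction, using the inverse of the standard normal distribution function from the Python standard library. The function name `approx_interval` and the numerical inputs are hypothetical; `sigma_n` stands for the value $ \sigma _ {n} ( \alpha ) $ supplied by the user.

```python
from statistics import NormalDist

def approx_interval(alpha, sigma_n, omega):
    """Approximate confidence interval (beta*, gamma*) based on relation (8).

    alpha   -- value of the maximum-likelihood point estimator
    sigma_n -- its approximate standard deviation sigma_n(alpha)
    omega   -- desired confidence coefficient, 0 < omega < 1
    """
    x = NormalDist().inv_cdf((1 + omega) / 2)   # root of Phi(x) = (1 + omega) / 2
    return alpha - x * sigma_n, alpha + x * sigma_n

# Hypothetical use: an estimate 0.03 with sigma_n = sqrt(2 / n) for n = 50.
beta_star, gamma_star = approx_interval(0.03, (2 / 50) ** 0.5, 0.95)
print(beta_star, gamma_star)
```

The confidence coefficient of such an interval is only approximately $ \omega $ for finite $ n $.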

If $ n \rightarrow \infty $, then the true confidence coefficient of the interval estimator $ ( \beta ^ {*} , \gamma ^ {*} ) $ tends to $ \omega $. In a more general case, the distribution of the results of observations $ X _ {i} $ depends on various parameters $ a, b , . . . $. Then the above rules for the construction of confidence intervals often prove infeasible, since the distribution of a point estimator $ \alpha $ depends, as a rule, not only on $ a $, but also on other parameters. However, in cases of practical interest the statistical estimator $ \alpha $ can be replaced by a function of the observations $ X _ {i} $ and an unknown parameter $ a $, the distribution of which does not depend (or "nearly does not depend") on any of the unknown parameters. An example of such a function is a normalized maximum-likelihood estimator $ ( \alpha - a)/ \sigma _ {n} ( a, b , . . . ) $; if in the denominator the arguments $ a, b , . . . $ are replaced by maximum-likelihood estimators $ \alpha , \beta , \dots $ then the limit distribution will remain the same as in formula (8). The approximate confidence intervals for each parameter in isolation can therefore be constructed in the same way as in the case of a single parameter.

As has already been noted, if $ X _ {1} , \dots, X _ {n} $ are independent and identically normally distributed random variables, then $ \overline{X}\; $ and $ s ^ {2} $ are the best statistical estimators for the parameters $ a $ and $ \sigma ^ {2} $, respectively. The distribution function of the statistical estimator $ \overline{X}\; $ is expressed by the formula

$$ {\mathsf P} \{ \overline{X}\; < x \} = \Phi \left [ \frac{\sqrt n ( x- a) } \sigma \right ] $$

and, consequently, it depends not only on $ a $ but also on $ \sigma $. At the same time, the distribution of the so-called Student statistic

$$ \frac{\sqrt n ( \overline{X}\; - a) }{s} = \tau $$

does not depend on $ a $ or $ \sigma $, and

$$ {\mathsf P} \{ | \tau | \leq t \} = \ \omega _ {n-1} ( t) = C _ {n-1} \int\limits _ { 0 } ^ { t } \left ( 1+ \frac{\nu ^ {2} }{n-1} \right ) ^ {-n/2} d \nu , $$

where the constant $ C _ {n-1} $ is chosen so that the equality $ \omega _ {n-1} ( \infty ) = 1 $ is satisfied. Thus, the confidence coefficient $ \omega _ {n-1} ( t) $ corresponds to the confidence interval

$$ {\overline{X}\; - } \frac{st}{\sqrt n } < a < {\overline{X}\; + } \frac{st}{\sqrt n } . $$
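The confidence coefficient $ \omega _ {n-1} ( t) $ can be evaluated numerically instead of being read from tables. The sketch below assumes the standard normalizing constant $ C _ {n-1} = 2 \Gamma ( n/2) / [ \sqrt {\pi ( n- 1) } \Gamma (( n- 1)/2)] $ and integrates by Simpson's rule; the function names and data are hypothetical.

```python
import math

def student_confidence(t, n, steps=2000):
    """omega_{n-1}(t) = P{|tau| <= t} for n observations (steps must be even)."""
    df = n - 1
    c = 2 * math.gamma(n / 2) / (math.sqrt(math.pi * df) * math.gamma(df / 2))
    f = lambda v: (1 + v * v / df) ** (-n / 2)
    h = t / steps
    total = f(0) + f(t)
    total += 4 * sum(f((2 * i - 1) * h) for i in range(1, steps // 2 + 1))
    total += 2 * sum(f(2 * i * h) for i in range(1, steps // 2))
    return c * total * h / 3

def student_interval(xs, t):
    """Interval mean -/+ s * t / sqrt(n), with s^2 the unbiased variance estimator."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    half = s * t / math.sqrt(n)
    return mean - half, mean + half

xs = [4.9, 5.1, 5.0, 5.3, 4.7]            # hypothetical measurements
t = 2.776                                  # chosen so that omega_4(t) is close to 0.95
print(student_confidence(t, len(xs)), student_interval(xs, t))
```

For $ n = 2 $ the integrand reduces to $ ( 1 + \nu ^ {2} ) ^ {-1} $, so $ \omega _ {1} ( t) = ( 2/ \pi ) \mathop{\rm arctan} t $, which gives a convenient check of the numerical integration.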

The distribution of the estimator $ s ^ {2} $ depends only on $ \sigma ^ {2} $, while the distribution function of $ s ^ {2} $ is defined by the formula

$$ {\mathsf P} \left \{ s ^ {2} < \frac{\sigma ^ {2} x }{n-1} \right \} = \ G _ {n-1} ( x) = \ D _ {n-1} \int\limits _ { 0 } ^ { x } v ^ {( n- 3)/2 } e ^ {- v/2} dv, $$

where the constant $ D _ {n-1} $ is defined by the condition $ G _ {n-1} ( \infty ) = 1 $ (the so-called $ \chi ^ {2} $-distribution with $ n- 1 $ degrees of freedom, cf. Chi-squared distribution). Since the distribution function $ {\mathsf P} \{ s ^ {2} < y \} = G _ {n-1} ( ( n- 1) y/ \sigma ^ {2} ) $ depends monotonically on $ \sigma $, rule (9) can be used to construct an interval estimator. Thus, if $ x _ {1} $ and $ x _ {2} $ are the roots of the equations $ G _ {n-1} ( x _ {1} ) = ( 1- \omega )/2 $ and $ G _ {n-1} ( x _ {2} ) = ( 1+ \omega )/2 $, then the confidence coefficient $ \omega $ corresponds to the confidence interval

$$ \frac{( n- 1) s ^ {2} }{x _ {2} } < \sigma ^ {2} < \frac{( n- 1) s ^ {2} }{x _ {1} } . $$

Hence it follows that the confidence interval for the relative error is defined by the inequalities

$$ \frac{x _ {1} }{n-1} - 1 < \frac{s ^ {2} - \sigma ^ {2} }{\sigma ^ {2} } < \frac{x _ {2} }{n-1} - 1. $$

Detailed tables of the Student distribution function $ \omega _ {n-1} ( t) $ and of the $ \chi ^ {2} $- distribution $ G _ {n-1} ( x) $ can be found in most textbooks on mathematical statistics.

Until now it has been supposed that the distribution function of the results of observations is known up to values of various parameters. However, in practice the form of the distribution function is often unknown. In this case, when estimating the parameters, the so-called non-parametric methods in statistics can prove useful (i.e. methods which do not depend on the initial probability distribution). Suppose, for example, that the median $ m $ of a theoretical continuous distribution of independent random variables $ X _ {1} \dots X _ {n} $ has to be estimated (for symmetric distributions, the median coincides with the mathematical expectation, provided, of course, that it exists). Let $ Y _ {1} \leq \dots \leq Y _ {n} $ be the same variables $ X _ {i} $ arranged in ascending order. Then, if $ k $ is an integer which satisfies the inequalities $ 1 \leq k \leq n/2 $,

$$ {\mathsf P} \{ Y _ {k} < m < Y _ {n- k+ 1} \} = \ 1- 2 \sum_{r=0}^ {k-1} \left ( \begin{array}{c} n \\ r \end{array} \right ) \left ( \frac{1}{2} \right ) ^ {n} = \ \omega _ {n,k} . $$

Thus, $ ( Y _ {k} , Y _ {n- k+ 1} ) $ is an interval estimator for $ m $ with confidence coefficient $ \omega = \omega _ {n,k} $. This conclusion holds for any continuous distribution of the random variables $ X _ {i} $.
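The confidence coefficient $ \omega _ {n,k} $ and the corresponding interval are easy to compute exactly. The following is a small sketch with hypothetical function names and data.

```python
from fractions import Fraction
from math import comb

def median_confidence(n, k):
    """omega_{n,k} = 1 - 2 * sum_{r=0}^{k-1} C(n, r) / 2^n, for 1 <= k <= n/2."""
    tail = sum(comb(n, r) for r in range(k))
    return 1 - 2 * Fraction(tail, 2 ** n)

def median_interval(xs, k):
    """Interval (Y_k, Y_{n-k+1}) formed from the ordered observations."""
    ys = sorted(xs)
    return ys[k - 1], ys[len(ys) - k]

for k in (1, 2, 3):
    print(k, float(median_confidence(10, k)))
print(median_interval([7, 3, 9, 1, 5, 10, 2, 8, 4, 6], 2))   # hypothetical data
```

Increasing $ k $ shortens the interval at the price of a lower confidence coefficient, so in practice $ k $ is taken as the largest value for which $ \omega _ {n,k} $ is still acceptable.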

It has already been noted that a sample distribution is a point estimator for an unknown theoretical distribution. Moreover, the sample distribution function $ F _ {n} ( x) $ is an unbiased estimator for a theoretical distribution function $ F( x) $. Here, as A.N. Kolmogorov demonstrated, the distribution of the statistic

$$ \lambda _ {n} = \sqrt n \max _ {- \infty < x < \infty } | F _ {n} ( x) - F( x) | $$

does not depend on the unknown theoretical distribution and, when $ n \rightarrow \infty $, tends to a limit distribution $ K( y) $, which is called a Kolmogorov distribution. Thus, if $ y $ is the solution of the equation $ K( y) = \omega $, then it can be claimed, with probability $ \omega $, that the graph of the theoretical distribution function $ F( x) $ is completely "covered" by a strip enclosed between the graphs of the functions $ F _ {n} ( x) \pm y/ \sqrt n $ (when $ n \geq 20 $, the difference between the exact and limit distributions of the statistic $ \lambda _ {n} $ is immaterial). An interval estimator of this type is called a confidence region. See also Interval estimator.
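A sketch of the computation of such a strip is given below. It relies on the standard series for the Kolmogorov distribution, $ K( y) = 1 - 2 \sum _ {k \geq 1 } ( - 1) ^ {k-1} e ^ {- 2k ^ {2} y ^ {2} } $ for $ y > 0 $, which is not stated in the text above; the function names, the bisection bracket and the truncation of the series are assumptions made for illustration.

```python
import math

def kolmogorov_cdf(y, terms=100):
    """Limit distribution K(y), by the truncated alternating series."""
    if y <= 0:
        return 0.0
    return 1 - 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * y * y)
                       for k in range(1, terms + 1))

def kolmogorov_quantile(omega, lo=0.2, hi=5.0):
    """Solve K(y) = omega by bisection on the assumed bracket [lo, hi]."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if kolmogorov_cdf(mid) < omega:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_band(xs, omega):
    """Strip F_n(x) -/+ y / sqrt(n) at each ordered observation, clipped to [0, 1]."""
    n = len(xs)
    eps = kolmogorov_quantile(omega) / math.sqrt(n)
    return [(x, max(0.0, (i + 1) / n - eps), min(1.0, (i + 1) / n + eps))
            for i, x in enumerate(sorted(xs))]

y95 = kolmogorov_quantile(0.95)
band = confidence_band(list(range(25)), 0.95)    # hypothetical sample of size 25
print(y95, band[0], band[-1])
```

Since the limit distribution is used, the sketch is meant for samples that are not too small, in line with the remark on $ n \geq 20 $.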

Statistical estimators in the theory of errors.

The theory of errors is an area of mathematical statistics devoted to the numerical determination of unknown variables by means of results of measurements. Owing to the random nature of measurement errors, and possibly of the actual phenomenon being studied, these results are not all equally correct: when measurements are repeated, some results are encountered more frequently, some less frequently.

The theory of errors is based on a mathematical model according to which the totality of all conceivable results of the measurements is treated as the set of values of a certain random variable. The theory of statistical estimators is therefore of considerable importance. The conclusions drawn from the theory of errors are of a statistical character. The sense and content of these conclusions (and indeed of the conclusions of the theory of statistical estimation) become clear only in the light of the law of large numbers (an example of this approach is the statistical interpretation of the sense of the confidence coefficient discussed above).

When the result $ X $ of a measurement is treated as a random variable, three basic types of measurement error are distinguished: systematic, random and gross (qualitative descriptions of these errors are given under Errors, theory of). Here, the difference $ X- a $ is called the error of the measurement of the unknown variable $ a $; the mathematical expectation of this difference, $ {\mathsf E} ( X- a) = b $, is called the systematic error (if $ b= 0 $, then the measurements are said to be free of systematic errors), while the difference $ \delta = X- a- b $ is called the random error ( $ {\mathsf E} \delta = 0 $). Thus, if $ n $ independent measurements of the variable $ a $ are taken, then their results can be written in the form of the equalities

$$ \tag{10 } X _ {i} = a + b + \delta _ {i} ,\ \ i = 1 \dots n, $$

where $ a $ and $ b $ are constants, while $ \delta _ {i} $ are random variables. In a more general case

$$ \tag{11 } X _ {i} = a + ( b + \beta _ {i} ) + \delta _ {i} ,\ \ i = 1 \dots n, $$

where $ \beta _ {i} $ are random variables which do not depend on $ \delta _ {i} $, and which are equal to zero with probability very close to one (every other value $ \beta _ {i} \neq 0 $ is therefore improbable). The values $ \beta _ {i} $ are called the gross errors (or outliers).

The problem of estimating (and eliminating) systematic errors does not normally fall within the limits of mathematical statistics. Two exceptions to this rule are the standard method, in which, when estimating $ b $, a series of measurements of the known value $ a $ is made (in this method, $ b $ is a value to be estimated and $ a $ is a known systematic error) and dispersion analysis, in which the systematic divergence between various series of measurements is estimated.

The fundamental problem in the theory of errors is to find a statistical estimator for an unknown variable $ a $ and to estimate the accuracy of the measurements. If the systematic error is eliminated $ ( b= 0) $ and the observations do not contain gross errors, then according to (10), $ X _ {i} = a + \delta _ {i} $, and in this case the problem of estimating $ a $ reduces to the problem of finding the optimal statistical estimator in one sense or another for the mathematical expectation of the identically distributed random variables $ X _ {i} $. As shown above, the form of such a statistical (point or interval) estimator depends essentially on the distribution law of the random errors. If this law is known up to various unknown parameters, then the maximum-likelihood method can be used to find an estimator for $ a $; in the alternative case, a statistical estimator for an unknown distribution function of the random errors $ \delta _ {i} $ has to be found, using the results of the observations of $ X _ {i} $( the "non-parametric" interval estimator of this function is shown above). In practice, two statistical estimators $ \overline{X}\; \approx a $ and $ s ^ {2} \approx {\mathsf D} \delta _ {i} $ often suffice (see (1) and (2)). If $ \delta _ {i} $ are identically normally distributed, then these statistical estimators are the best; in other cases, these estimators can prove to be quite inefficient.

The appearance of outliers (gross errors) complicates the problem of estimating the parameter $ a $. The proportion of observations in which $ \beta _ {i} \neq 0 $ is usually small, while the mathematical expectation of non-zero $ | \beta _ {i} | $ is significantly higher than $ \sqrt { {\mathsf D} \delta _ {i} } $ (gross errors arise as a result of random miscalculation, incorrect reading of the measuring equipment, etc.). Results of measurements which contain gross errors are often easily spotted, as they differ greatly from the other results. Under these conditions, the most advisable means of identifying (and eliminating) gross errors is to carry out a direct analysis of the measurements, to check carefully that all experiments were carried out under the same conditions, to record the results in duplicate, etc. Statistical methods of finding gross errors are only to be used in cases of doubt.

The simplest example of these methods is the statistical detection of a single outlier, when either $ Y _ {1} = \min X _ {i} $ or $ Y _ {n} = \max X _ {i} $ is open to doubt (it is assumed that in the equalities (11) $ b= 0 $ and that the distribution law of the variables $ \delta _ {i} $ is known). In order to establish whether the suspicion of an outlier is justified, a joint interval estimator (a prediction region) for the pair $ Y _ {1} , Y _ {n} $ is calculated under the assumption that all $ \beta _ {i} $ are equal to zero. If this statistical estimator "covers" the point with coordinates $ ( Y _ {1} , Y _ {n} ) $, then the doubt over the presence of an outlier has to be considered statistically unjustified; in the alternative case, the hypothesis of the presence of an outlier has to be accepted (the rejected observation is then usually discarded, as it is statistically impossible to estimate the value of the outlier reliably from a single observation).

For example, let $ a $ be unknown, let $ b= 0 $ and let $ \delta _ {i} $ be independent and identically normally distributed (the variance is unknown). If all $ \beta _ {i} = 0 $, then the distribution of the random variable

$$ Z = \frac{\max | X _ {i} - \overline{X}\; | }{\widehat{s} } $$

does not depend on unknown parameters (the statistical estimators $ \overline{X}\; $ and $ \widehat{s} $ are calculated, using all $ n $ observations, according to the formulas (1) and (2)). For large values of $ z $,

$$ {\mathsf P} \{ Z > z \} \approx n \left [ 1 - \omega _ {n-2} \left ( z {\sqrt { \frac{n- 2 }{n- 1- z ^ {2} } } } \right ) \right ] , $$

where $ \omega _ {r} ( t) $ is the Student distribution function, as defined above. Thus, with confidence coefficient

$$ \tag{12 } \omega \approx 1 - n \left [ 1 - \omega _ {n-2} \left ( z \sqrt { \frac{n- 2 }{n- 1- z ^ {2} } } \right ) \right ] $$

it can be claimed that in the absence of an outlier the inequality $ Z < z $ is satisfied, or, put another way,

$$ \overline{X}\; - z \widehat{s} < Y _ {1} < Y _ {n} < \overline{X}\; + z \widehat{s} . $$

(The error in the estimation of the confidence coefficient by means of formula (12) does not exceed $ ( 1 - \omega ) ^ {2} /2 $.) Therefore, if all results of the measurements of $ X _ {i} $ fall within the limits $ \overline{X}\; \pm z \widehat{s} $, then there are no grounds for supposing that any measurement contains an outlier.
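The test described above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes that $ \omega _ {r} ( t) $ denotes the two-sided Student function $ {\mathsf P} \{ | T _ {r} | < t \} $ with $ r $ degrees of freedom, evaluates it by Simpson's rule, and uses $ \widehat{s} $ with divisor $ n $ as in formula (2). The function names `student_two_sided` and `outlier_check`, the threshold and the data are hypothetical.

```python
import math


def student_two_sided(t, r, steps=2000):
    """Approximate P(|T_r| < t) for Student's distribution by Simpson's rule.

    Assumption: this two-sided function plays the role of omega_r(t).
    steps must be even.
    """
    if t <= 0:
        return 0.0
    c = math.gamma((r + 1) / 2) / (math.sqrt(r * math.pi) * math.gamma(r / 2))

    def density(v):
        return c * (1 + v * v / r) ** (-(r + 1) / 2)

    h = t / steps
    total = density(0.0) + density(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    return 2 * total * h / 3  # factor 2: the density is symmetric about zero


def outlier_check(xs, z):
    """Return (observed Z, approximate confidence coefficient, no_outlier flag)."""
    n = len(xs)
    if n < 3 or z * z >= n - 1:
        raise ValueError("need n >= 3 and z^2 < n - 1")
    mean = sum(xs) / n                                        # formula (1)
    s_hat = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)   # formula (2)
    z_obs = max(abs(x - mean) for x in xs) / s_hat
    arg = z * math.sqrt((n - 2) / (n - 1 - z * z))
    omega = 1 - n * (1 - student_two_sided(arg, n - 2))       # formula (12)
    return z_obs, omega, z_obs < z


# Hypothetical series in which the last reading looks suspicious.
data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 12.5]
z_obs, omega, no_outlier = outlier_check(data, z=2.5)
print(z_obs, omega, no_outlier)
```

For this hypothetical series the largest deviation exceeds $ z \widehat{s} $, so the sketch reports that an outlier is suspected. The approximation (12) is intended for large $ z $, so small thresholds should not be used with it.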

References

[1]	H. Cramér, M.R. Leadbetter, "Stationary and related stochastic processes" , Wiley (1967) Chapts. 33–34
[2]	N.V. Smirnov, I.V. Dunin-Barkovskii, "Mathematische Statistik in der Technik" , Deutsch. Verlag Wissenschaft. (1969) (Translated from Russian)
[3]	Yu.V. Linnik, "Methode der kleinsten Quadrate in moderner Darstellung" , Deutsch. Verlag Wissenschaft. (1961) (Translated from Russian)
[4]	B.L. van der Waerden, "Mathematische Statistik" , Springer (1957)
[5]	N. Arley, K.R. Buch, "Introduction to the theory of probability and statistics" , Wiley (1950)
[6]	A.N. Kolmogorov, "On the statistical estimation of the parameters of the Gauss distribution" Izv. Akad. Nauk SSSR Ser. Mat. , 6 : 1–2 (1942) pp. 3–32 (In Russian) (French abstract)

Comments

This article does not cover robust estimation, in which procedures for dealing with gross errors ("outliers") are integrated with the estimation of the parameter concerned. See e.g. [a1] and Robust statistics.

References

[a1]	F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw, W.A. Stahel, "Robust statistics. The approach based on influence functions" , Wiley (1986)
[a2]	E.L. Lehmann, "Theory of point estimation" , Wiley (1983)
[a3]	D.R. Cox, D.V. Hinkley, "Theoretical statistics" , Chapman & Hall (1974) p. 21
[a4]	E.A. Nadaraya, "Nonparametric estimation of probability densities and regression curves" , Kluwer (1989) (Translated from Russian)

How to Cite This Entry:
Statistical estimator. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Statistical_estimator&oldid=13861

This article was adapted from an original article by L.N. Bol'shev (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article

@@ Line 1: / Line 1: @@
-A function of random variables that can be used in estimating unknown parameters of a theoretical probability distribution. Methods of the theory of statistical estimation form the basis of the modern theory of errors; physical constants to be measured are commonly used as the unknown parameters, while the results of direct measurements subject to random errors are taken as the random variables. For example, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s0873601.png" /> are independent, identically normally distributed random variables (the results of equally accurate measurements subject to independent normally distributed random errors), then for the unknown mean value <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s0873602.png" /> (the value of an approximately measurable physical constant) the arithmetical mean
+<!--
+s0873601.png
+$#A+1 = 327 n = 0
+$#C+1 = 327 : ~/encyclopedia/old_files/data/S087/S.0807360 Statistical estimator
+Automatically converted into TeX, above some diagnostics.
+Please remove this comment and the {{TEX|auto}} line below,
+if TeX found to be correct.
+-->
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s0873603.png" /></td> <td valign="top" style="width:5%;text-align:right;">(1)</td></tr></table>
+{{TEX|auto}}
+{{TEX|done}}
+A function of random variables that can be used in estimating unknown parameters of a theoretical probability distribution. Methods of the theory of statistical estimation form the basis of the modern theory of errors; physical constants to be measured are commonly used as the unknown parameters, while the results of direct measurements subject to random errors are taken as the random variables. For example, if  $  X _ {1} \dots X _ {n} $
+are independent, identically normally distributed random variables (the results of equally accurate measurements subject to independent normally distributed random errors), then for the unknown mean value  $  a $
+(the value of an approximately measurable physical constant) the arithmetical mean
+$$ \tag{1 }
+\overline{X}\;  =
+\frac{X _ {1} + \dots + X _ {n} }{n}
+$$
 is taken as the statistical estimator.
@@ Line 10: / Line 28: @@
 A point estimator is a statistical estimator whose value can be represented geometrically in the form of a point in the same space as the values of the unknown parameters (the dimension of the space is equal to the number of parameters to be estimated). In fact, point estimators are also used as approximate values for unknown physical variables. For the sake of simplicity, it is further supposed that one natural parameter is subject to estimation; in this case, a point estimator is a function of the results of observations, and takes numerical values.
-A point estimator is said to be unbiased if its mathematical expectation coincides with the parameter being estimated, i.e. if the statistical estimation is free of systematic errors. The arithmetical mean (1) is an unbiased statistical estimator for the mathematical expectation of identically-distributed random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s0873604.png" /> (not necessarily normal). At the same time, the sample variance
+A point estimator is said to be unbiased if its mathematical expectation coincides with the parameter being estimated, i.e. if the statistical estimation is free of systematic errors. The arithmetical mean (1) is an unbiased statistical estimator for the mathematical expectation of identically-distributed random variables  $  X _ {i} $
+(not necessarily normal). At the same time, the sample variance
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s0873605.png" /></td> <td valign="top" style="width:5%;text-align:right;">(2)</td></tr></table>
+$$ \tag{2 }
+\widehat{s}  {}  ^ {2}  =
+\frac{( X _ {1} - \overline{X}\; )  ^ {2} + \dots + ( X _ {n} - \overline{X}\; )  ^ {2} }{n}
-is a biased statistical estimator for the variance <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s0873606.png" />, since <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s0873607.png" />; the function
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s0873608.png" /></td> </tr></table>
+is a biased statistical estimator for the variance  $  \sigma  ^ {2} = {\mathsf D} X _ {i} $,
+since  $  {\mathsf E} {\widehat{s}  } {}  ^ {2} = ( 1- 1/n) \sigma  ^ {2} $;
+the function
-is usually taken as the unbiased statistical estimator for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s0873609.png" />.
+$$
+s  ^ {2}  =  \frac{n}{n-1} {\widehat{s}  } {}  ^ {2}
+$$
+is usually taken as the unbiased statistical estimator for  $  \sigma  ^ {2} $.
 See also [[Unbiased estimator|Unbiased estimator]].
-As a measure of the accuracy of the unbiased statistical estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736010.png" /> for a parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736011.png" /> one most often uses the variance <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736012.png" />.
+As a measure of the accuracy of the unbiased statistical estimator  $  \alpha $
+for a parameter  $  a $
+one most often uses the variance  $  {\mathsf D} \alpha $.
-The statistical estimator with smallest variance is called the best. In the example quoted, the arithmetical mean (1) is the best statistical estimator. However, if the probability distribution of the random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736013.png" /> is different from normal, then (1) need not be the best statistical estimator. For example, if the results of the observations of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736014.png" /> are uniformly distributed in an interval <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736015.png" />, then the best statistical estimator for the mathematical expectation <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736016.png" /> will be half the sum of the boundary values:
+The statistical estimator with smallest variance is called the best. In the example quoted, the arithmetical mean (1) is the best statistical estimator. However, if the probability distribution of the random variables  $  X _ {i} $
+is different from normal, then (1) need not be the best statistical estimator. For example, if the results of the observations of  $  X _ {i} $
+are uniformly distributed in an interval  $  ( b, c) $,
+then the best statistical estimator for the mathematical expectation  $  a = ( b+ c)/2 $
+will be half the sum of the boundary values:
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736017.png" /></td> <td valign="top" style="width:5%;text-align:right;">(3)</td></tr></table>
+$$ \tag{3 }
+\alpha  =
+\frac{\min  X _ {i} + \max  X _ {i} }{2}
+ .
+$$
-The criterion for the comparison of the accuracy of different statistical estimators ordinarily used is the relative efficiency — the ratio of the variances of the best estimator and the given unbiased estimator. For example, if the results of the observations of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736018.png" /> are uniformly distributed, then the variances of the estimators (1) and (3) are expressed by the formulas
+The criterion for the comparison of the accuracy of different statistical estimators ordinarily used is the relative efficiency — the ratio of the variances of the best estimator and the given unbiased estimator. For example, if the results of the observations of  $  X _ {i} $
+are uniformly distributed, then the variances of the estimators (1) and (3) are expressed by the formulas
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736019.png" /></td> </tr></table>
+$$
+{\mathsf D} \overline{X}\;  =
+\frac{( c- b)  ^ {2} }{12n}
+$$
 and
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736020.png" /></td> <td valign="top" style="width:5%;text-align:right;">(4)</td></tr></table>
+$$ \tag{4 }
+{\mathsf D} \alpha  =
+\frac{( c- b)  ^ {2} }{2( n+ 1) ( n+ 2) }
+ .
+$$
 Since (3) is the best estimator, the relative efficiency of the estimator (1) in the given case is
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736021.png" /></td> </tr></table>
+$$
+e _ {n} ( \overline{X}\; )  =
+\frac{6n}{( n+ 1)( n+ 2) }
+  \sim
+\frac{6}{n}
+ .
+$$
+For a large number of observations  $  n $,
+it is usually required that the chosen statistical estimator tends in probability to the true value of the parameter  $  a $,
+i.e. that for every  $  \epsilon > 0 $,
+$$
+\lim\limits _ {n \rightarrow \infty }  {\mathsf P} \{
+| \alpha - a | > \epsilon \}  =  0;
+$$
+such statistical estimators are called consistent (for example, any unbiased estimator with variance tending to zero, when  $  n \rightarrow \infty $,
+is consistent; see also [[Consistent estimator|Consistent estimator]]). Insofar as the order of tendency to the limit is of significance, the asymptotically best estimators are the asymptotically efficient statistical estimators, i.e. those for which
+$$
-For a large number of observations <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736022.png" />, it is usually required that the chosen statistical estimator tends in probability to the true value of the parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736023.png" />, i.e. that for every <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736024.png" />,
+\frac{ {\mathsf E} ( \alpha - a) }{\sqrt { {\mathsf E} ( \alpha - a)  ^ {2} } }
+  \rightarrow  0
+\  \textrm{ and } \  e _ {n} ( \alpha )  \rightarrow  1,
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736025.png" /></td> </tr></table>
+when  $  n \rightarrow \infty $.
+For example, if  $  X _ {1} \dots X _ {n} $
+are identically normally distributed, then (2) is an asymptotically efficient estimator for the unknown parameter  $  \sigma  ^ {2} = {\mathsf D} X _ {i} $,
+since, when  $  n \rightarrow \infty $,
+the variance of  $  \widehat{s}  {}  ^ {2} $
+and that of the best estimator  $  \widehat{s}  {}  ^ {2} n/( n- 1) $
+are asymptotically equivalent:
-such statistical estimators are called consistent (for example, any unbiased estimator with variance tending to zero, when <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736026.png" />, is consistent; see also [[Consistent estimator|Consistent estimator]]). Insofar as the order of tendency to the limit is of significance, the asymptotically best estimators are the asymptotically efficient statistical estimators, i.e. those for which
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736027.png" /></td> </tr></table>
+\frac{ {\mathsf D} {\widehat{s}  } {}  ^ {2} }{ {\mathsf D} [ {\widehat{s}  } {}  ^ {2} n/( n- 1)] }
+  = \
-when <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736028.png" />. For example, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736029.png" /> are identically normally distributed, then (2) is an asymptotically efficient estimator for the unknown parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736030.png" />, since, when <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736031.png" />, the variance of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736032.png" /> and that of the best estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736033.png" /> are asymptotically equivalent:
+\frac{( n- 1)  ^ {2} }{n  ^ {2} }
+ ,\ \
+{\mathsf D} {\widehat{s}  } {}  ^ {2}  = \
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736034.png" /></td> </tr></table>
+\frac{2 ( n- 1) \sigma  ^ {4} }{n  ^ {2} } ,
+$$
 and, moreover,
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736035.png" /></td> </tr></table>
+$$
+{\mathsf E} ( {\widehat{s}  } {}  ^ {2} - \sigma  ^ {2} )  =
+\frac{- \sigma  ^ {2} }{n}
+ .
+$$
-Of prime importance in the theory of statistical estimation and its applications is the fact that the quadratic deviation of a statistical estimator for a parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736036.png" /> is bounded from below by a certain quantity (R. Fisher proposed that this quantity be characterized by the amount of information regarding the unknown parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736037.png" /> contained in the results of the observations). For example, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736038.png" /> are independent and identically distributed, with probability density <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736039.png" />, and if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736040.png" /> is a statistical estimator for a certain function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736041.png" /> of the parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736042.png" />, then in a broad class of cases
+Of prime importance in the theory of statistical estimation and its applications is the fact that the quadratic deviation of a statistical estimator for a parameter  $  a $
+is bounded from below by a certain quantity (R. Fisher proposed that this quantity be characterized by the amount of information regarding the unknown parameter  $  a $
+contained in the results of the observations). For example, if  $  X _ {1} \dots X _ {n} $
+are independent and identically distributed, with probability density  $  p( x;  a) $,
+and if  $  \alpha = \phi ( X _ {1} \dots X _ {n} ) $
+is a statistical estimator for a certain function  $  g( a) $
+of the parameter  $  a $,
+then in a broad class of cases
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736043.png" /></td> <td valign="top" style="width:5%;text-align:right;">(5)</td></tr></table>
+$$ \tag{5 }
+{\mathsf E} [ \alpha - g( a)]  ^ {2}  \geq
+\frac{nb  ^ {2} ( a) I( a) + [ g  ^  \prime  ( a) + b  ^  \prime
+( a)]  ^ {2} }{nI( a) }
+ ,
+$$
 where
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736044.png" /></td> </tr></table>
+$$
+b( a)  =  {\mathsf E} [ \alpha - g( a)] \  \textrm{ and } \ \
+I( a)  =  {\mathsf E} \left [
+\frac{\partial   \mathop{\rm ln}  p( X;  a) }{\partial  a }
+ \right ]  ^ {2} .
+$$
-The function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736045.png" /> is called the bias, while the quantity inverse to the right-hand side of inequality (5) is called the Fisher information, with respect to the function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736046.png" />, contained in the results of the observations. In particular, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736047.png" /> is an unbiased statistical estimator of the parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736048.png" />, then
+The function  $  b( a) $
+is called the bias, while the quantity inverse to the right-hand side of inequality (5) is called the Fisher information, with respect to the function  $  g( a) $,
+contained in the results of the observations. In particular, if  $  \alpha $
+is an unbiased statistical estimator of the parameter  $  a $,
+then
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736049.png" /></td> </tr></table>
+$$
+g( a)  \equiv  a,\  b( a)  \equiv  0 ,
+$$
 and
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736050.png" /></td> <td valign="top" style="width:5%;text-align:right;">(6)</td></tr></table>
+$$ \tag{6 }
+{\mathsf E} [ \alpha - g( a)]  ^ {2}  =  {\mathsf D} \alpha  \geq
+\frac{1}{nI( a) }
+ ,
+$$
+whereby the information  $  nI( a) $
+in this instance is proportional to the number of observations (the function  $  I( a) $
+is called the information contained in one observation).
+The basic conditions under which the inequalities (5) and (6) hold are smoothness of the estimator  $  \alpha $
+as a function of  $  X _ {i} $,
+and the independence of the parameter  $  a $
+of the set of those points  $  x $
+where  $  p( x;  a) = 0 $.
+The latter condition is not fulfilled, for example, in the case of a [[Uniform distribution|uniform distribution]], and the variance of the estimator (3) does therefore not satisfy inequality (6) (according to (4), this variance is a quantity of order  $  n  ^ {-2} $,
+while, according to inequality (6), it cannot have an order of smallness higher than  $  n  ^ {-1} $).
+The inequalities (5) and (6) also hold for discretely distributed random variables  $  X _ {i} $:
+In defining the information  $  I( a) $,
+the density  $  p( x;  a) $
+must be replaced by the probability of the event  $  \{ X = x \} $.
-whereby the information <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736051.png" /> in this instance is proportional to the number of observations (the function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736052.png" /> is called the information contained in one observation).
+If the variance of an unbiased statistical estimator  $  \alpha  ^ {*} $
+for the parameter  $  a $
+coincides with the right-hand side of inequality (6), then  $  \alpha  ^ {*} $
+is the best estimator. The converse assertion, generally speaking, is not true: The variance of the best statistical estimator can exceed  $  [ nI( a)]  ^ {-1} $.
+However, as  $  n \rightarrow \infty $,
+the variance of the best estimator,  $  {\mathsf D} \alpha  ^ {*} $,
+is asymptotically equivalent to the right-hand side of (6), i.e.  $  n {\mathsf D} \alpha  ^ {*} \rightarrow 1/I( a) $.
+In this way, using the Fisher information, it is possible to define the asymptotic efficiency of an unbiased statistical estimator  $  \alpha $,
+by proposing
-The basic conditions under which the inequalities (5) and (6) hold are smoothness of the estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736053.png" /> as a function of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736054.png" />, and the independence of the parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736055.png" /> of the set of those points <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736056.png" /> where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736057.png" />. The latter condition is not fulfilled, for example, in the case of a [[Uniform distribution|uniform distribution]], and the variance of the estimator (3) does therefore not satisfy inequality (6) (according to (4), this variance is a quantity of order <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736058.png" />, while, according to inequality (6), it cannot have an order of smallness higher than <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736059.png" />).
+$$ \tag{7 }
+e _  \infty  ( \alpha )  = \
+\lim\limits _ {n \rightarrow \infty }
+\frac{ {\mathsf D} \alpha  ^ {*} }{ {\mathsf D} \alpha }
+  = \
+\lim\limits _ {n \rightarrow \infty }
+\frac{1}{nI( a) {\mathsf D} \alpha }
+ .
+$$
-The inequalities (5) and (6) also hold for discretely distributed random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736060.png" />: In defining the information <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736061.png" />, the density <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736062.png" /> must be replaced by the probability of the event <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736063.png" />.
+One information approach to the theory of statistical estimators which proves to be particularly fruitful is that where the density (in the discrete instance, the probability) of the joint distribution of the random variables  $  X _ {1} \dots X _ {n} $
+can be represented in the form of the product of two functions  $  h( x _ {1} \dots x _ {n} ) q[ y( x _ {1} \dots x _ {n} );  a] $,
+the first of which does not depend on  $  a $
+while the second is the density of the distribution of a certain random variable  $  Z = y( X _ {1} \dots X _ {n} ) $,
+called a [[Sufficient statistic|sufficient statistic]].
-If the variance of an unbiased statistical estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736064.png" /> for the parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736065.png" /> coincides with the right-hand side of inequality (6), then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736066.png" /> is the best estimator. The converse assertion, generally speaking, is not true: The variance of the best statistical estimator can exceed <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736067.png" />. However, as <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736068.png" />, the variance of the best estimator, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736069.png" />, is asymptotically equivalent to the right-hand side of (6), i.e. <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736070.png" />. In this way, using the Fisher information, it is possible to define the asymptotic efficiency of an unbiased statistical estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736071.png" />, by proposing
+One of the most frequently used methods of finding point estimators is the method of moments (cf. [[Moments, method of (in probability theory)|Moments, method of (in probability theory)]]). According to this method, a theoretical distribution dependent on unknown parameters corresponds to a discrete sample distribution, which is defined by the results of observations of  $  X _ {i} $
+and which is the probability distribution of a theoretical random variable which takes the values  $  X _ {1} \dots X _ {n} $
+with identical probabilities equal to  $  1/n $
+(the sample distribution can be seen as a point estimator for the theoretical distribution). The statistical estimator for the moments of a theoretical distribution is taken to be that of the corresponding moments of the sample distribution; for example, for the mathematical expectation  $  a $
+and variance  $  \sigma  ^ {2} $,
+the method of moments provides the following statistical estimators: the sample mean (1) and the sample variance (2). The unknown parameters are usually expressed (exactly or approximately) in the form of functions of several moments of the theoretical distribution. By replacing theoretical moments in these functions by sample moments, the required statistical estimators are obtained. This method, which in practice often reduces to comparatively simple calculations, generally gives a statistical estimator of low asymptotic efficiency (see the above example of the estimator of the mathematical expectation of a uniform distribution).
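The method of moments described above can be sketched in a few lines. This is a minimal illustration, not part of the original article: the function names are hypothetical, and the uniform distribution on an interval `[lo, hi]` is chosen only as an assumed example of parameters that are functions of the first two moments (mean `(lo + hi)/2`, variance `(hi - lo)^2/12`).

```python
import math


def sample_moments(xs):
    """Mean and variance of the sample distribution (mass 1/n at each X_i)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # divisor n, as in estimator (2)
    return mean, var


def uniform_endpoints_by_moments(xs):
    """Method-of-moments estimators for the end-points of a uniform law.

    Assumed model (illustration only): X_i uniform on [lo, hi], so that
    mean = (lo + hi) / 2 and variance = (hi - lo)**2 / 12.
    Solving for lo and hi and substituting sample moments gives the estimators.
    """
    mean, var = sample_moments(xs)
    half_width = math.sqrt(3.0 * var)
    return mean - half_width, mean + half_width
```

The computation is simple, which is the practical appeal of the method; the price, as noted in the text, is usually a loss of asymptotic efficiency.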
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736072.png" /></td> <td valign="top" style="width:5%;text-align:right;">(7)</td></tr></table>
+Another method for finding statistical estimators, which is more complete from the theoretical point of view, is the [[Maximum-likelihood method|maximum-likelihood method]]. According to this method, the likelihood function  $  L( a) $
+is considered, which is a function of the unknown parameter  $  a $,
+and which is obtained as a result of substituting the random variables  $  X _ {i} $
+in the density  $  p( x _ {1} \dots x _ {n} ;  a) $
+of the joint distribution for the arguments; if the  $  X _ {i} $
+are independent and identically distributed with probability density  $  p( x;  a) $,
+then
-One information approach to the theory of statistical estimators which proves to be particularly fruitful is that where the density (in the discrete instance, the probability) of the joint distribution of the random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736073.png" /> can be represented in the form of the product of two functions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736074.png" />, the first of which does not depend on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736075.png" /> while the second is the density of the distribution of a certain random variable <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736076.png" />, called a [[Sufficient statistic|sufficient statistic]].
+$$
+L( a)  =  p( X _ {1} ;  a) \dots p( X _ {n} ;  a)
+$$
-One of the most frequently used methods of finding point estimators is the method of moments (cf. [[Moments, method of (in probability theory)|Moments, method of (in probability theory)]]). According to this method, a theoretical distribution dependent on unknown parameters corresponds to a discrete sample distribution, which is defined by the results of observations of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736077.png" /> and which is the probability distribution of a theoretical random variable which takes the values <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736078.png" /> with identical probabilities equal to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736079.png" /> (the sample distribution can be seen as a point estimator for the theoretical distribution). The statistical estimator for the moments of a theoretical distribution is taken to be that of the corresponding moments of the sample distribution; for example, for the mathematical expectation <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736080.png" /> and variance <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736081.png" />, the method of moments provides the following statistical estimators: the sample mean (1) and the sample variance (2). The unknown parameters are usually expressed (exactly or approximately) in the form of functions of several moments of the theoretical distribution. By replacing theoretical moments in these functions by sample moments, the required statistical estimators are obtained. 
This method, which in practice often reduces to comparatively simple calculations, generally gives a statistical estimator of low asymptotic efficiency (see the above example of the estimator of the mathematical expectation of a uniform distribution).
+(if the  $  X _ {i} $
+are discretely distributed, then in defining the likelihood function  $  L $
+the density should be replaced by the probability of the events  $  \{ X _ {i} = x _ {i} \} $).
+The value  $  \alpha $
+for which  $  L( \alpha ) $
+has its largest value is used as the maximum-likelihood estimator for the unknown parameter  $  a $(
+instead of  $  L $,
+the so-called logarithmic likelihood function is often considered:  $  l( \alpha ) =  \mathop{\rm ln}  L( \alpha ) $;
+owing to the monotone nature of the logarithm, the maximum points of  $  L( \alpha ) $
+and  $  l( \alpha ) $
+coincide).
-Another method for finding statistical estimators, which is more complete from the theoretical point of view, is the [[Maximum-likelihood method|maximum-likelihood method]]. According to this method, the likelihood function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736082.png" /> is considered, which is a function of the unknown parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736083.png" />, and which is obtained as a result of substituting the random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736084.png" /> in the density <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736085.png" /> of the joint distribution for the arguments; if the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736086.png" /> are independent and identically distributed with probability density <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736087.png" />, then
+The basic merit of maximum-likelihood estimators lies in the fact that, given certain general conditions, they are consistent, asymptotically efficient and approximately normally distributed. These properties mean that if  $  \alpha $
+is a maximum-likelihood estimator, then, when  $  n \rightarrow \infty $,
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736088.png" /></td> </tr></table>
+$$
+{\mathsf E} \alpha  \sim  a \  \textrm{ and } \ \
+{\mathsf E} ( \alpha - a)  ^ {2}  \sim  {\mathsf D} \alpha  \sim  \sigma _ {n}  ^ {2} ( a)
+ =
+\frac{1}{ {\mathsf E} \left [
+\frac{d}{da}
+ l ( a) \right ]  ^ {2} }
-(if the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736089.png" /> are discretely distributed, then in defining the likelihood function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736090.png" /> the density should be replaced by the probability of the events <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736091.png" />). The variable <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736092.png" /> for which <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736093.png" /> has its largest value is used as the maximum-likelihood estimator for the unknown parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736094.png" /> (instead of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736095.png" />, the so-called logarithmic likelihood function is often considered: <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736096.png" />; owing to the monotone nature of the logarithm, the maximum points of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736097.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736098.png" /> coincide).
+$$
-The basic merit of maximum-likelihood estimators lies in the fact that, given certain general conditions, they are consistent, asymptotically efficient and approximately normally distributed. These properties mean that if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s08736099.png" /> is a maximum-likelihood estimator, then, when <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360100.png" />,
+(if the  $  X _ {i} $
+are independent and identically distributed, then  $  \sigma _ {n}  ^ {2} ( a) = [ nI( a)]  ^ {-1} $).
+Thus, for the distribution function of a normalized statistical estimator  $  ( \alpha - a)/ \sigma _ {n} ( a) $,
+the limit relation
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360101.png" /></td> </tr></table>
+$$ \tag{8 }
+\lim\limits _ {n \rightarrow \infty }  {\mathsf P} \left \{
-(if the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360102.png" /> are independent, then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360103.png" />). Thus, for the distribution function of a normalized statistical estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360104.png" />, the limit relation
+\frac{\alpha - a }{\sigma _ {n} ( a) }
+ < x \right \}  = \
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360105.png" /></td> <td valign="top" style="width:5%;text-align:right;">(8)</td></tr></table>
+\frac{1}{\sqrt {2 \pi } }
+ \int\limits _ {- \infty } ^ { x }  e ^ {- t  ^ {2} /2 }  dt  \equiv \
+\Phi ( x)
+$$
 holds.
-The advantages of the maximum-likelihood estimator justify the amount of calculation involved in seeking the maximum of the function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360106.png" /> (or <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360107.png" />). In certain cases, the amount of calculation is greatly reduced as a result of the following properties: firstly, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360108.png" /> is a statistical estimator for which inequality (6) becomes an equality, then the maximum-likelihood estimator is unique and coincides with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360109.png" />; secondly, if a sufficient statistic <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360110.png" /> exists, then the maximum-likelihood estimator is a function of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360111.png" />.
+The advantages of the maximum-likelihood estimator justify the amount of calculation involved in seeking the maximum of the function  $  L $(
+or  $  l $).
+In certain cases, the amount of calculation is greatly reduced as a result of the following properties: firstly, if  $  \alpha  ^ {*} $
+is a statistical estimator for which inequality (6) becomes an equality, then the maximum-likelihood estimator is unique and coincides with  $  \alpha  ^ {*} $;
+secondly, if a sufficient statistic  $  Z $
+exists, then the maximum-likelihood estimator is a function of  $  Z $.
+For example, let  $  X _ {1} \dots X _ {n} $
+be independent and normally distributed, and such that
-For example, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360112.png" /> be independent and normally distributed, and such that
+$$
+p( x;  a, \sigma )  = \
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360113.png" /></td> </tr></table>
+\frac{1}{\sigma \sqrt {2 \pi } }
+  \mathop{\rm exp} \left \{ -
+\frac{1}{2 \sigma  ^ {2} }
+ ( x - a)
+ ^ {2} \right \} ,
+$$
 then
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360114.png" /></td> </tr></table>
+$$
+l( a, \sigma )  =   \mathop{\rm ln}  L( a, \sigma ) =
+$$
+$$
+= \
+-
+\frac{n}{2}
+  \mathop{\rm ln} ( 2 \pi ) - n   \mathop{\rm ln}  \sigma -
+\frac{1}{2
+\sigma  ^ {2} }
+ \sum_{i=1}^ { n }  ( X _ {i} - a)  ^ {2} .
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360115.png" /></td> </tr></table>
+The coordinates  $  a = a _ {0} $
+and  $  \sigma = \sigma _ {0} $
+of the maximum point of the function  $  l( a, \sigma ) $
+satisfy the system of equations
-The coordinates <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360116.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360117.png" /> of the maximum point of the function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360118.png" /> satisfy the system of equations
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360119.png" /></td> </tr></table>
+\frac{\partial  l }{\partial  a }
+  \equiv \
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360120.png" /></td> </tr></table>
+\frac{1}{\sigma  ^ {2} }
+ \sum ( X _ {i} - a)  =  0,
+$$
-Thus, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360121.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360122.png" />, and in the given case (1) and (2) are maximum-likelihood estimators, whereby <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360123.png" /> is the best statistical estimator of the parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360124.png" />, normally distributed (<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360125.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360126.png" />), while <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360127.png" /> is an asymptotically efficient statistical estimator of the parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360128.png" />, distributed approximately normally for large <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360129.png" /> (<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360130.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360131.png" />). Both estimators are independent sufficient statistics.
+$$
+\frac{\partial  l }{\partial  \sigma }
+  \equiv  -
+\frac{n}{\sigma  ^ {3} }
+ \left [
+\sigma  ^ {2} -
+\frac{1}{n}
+ \sum ( X _ {i} - a)  ^ {2} \right ]  =  0.
+$$
+Thus,  $  a _ {0} = \overline{X}\; = \sum X _ {i} /n $,
+$  \sigma _ {0}  ^ {2} = {\widehat{s}  } {}  ^ {2} = \sum ( X _ {i} - \overline{X}\; )  ^ {2} /n $,
+and in the given case (1) and (2) are maximum-likelihood estimators, whereby  $  \overline{X}\; $
+is the best statistical estimator of the parameter  $  a $,
+normally distributed ( $  {\mathsf E} \overline{X}\; = a $,
+$  {\mathsf D} \overline{X}\; = \sigma  ^ {2} /n $),
+while  $  {\widehat{s}  } {}  ^ {2} $
+is an asymptotically efficient statistical estimator of the parameter  $  \sigma  ^ {2} $,
+distributed approximately normally for large  $  n $(
+$  {\mathsf E} {\widehat{s}  } {}  ^ {2} \sim \sigma  ^ {2} $,
+$  {\mathsf D} {\widehat{s}  } {}  ^ {2} \sim 2 \sigma  ^ {4} /n $).
+Both estimators are independent sufficient statistics.
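The normal example can be checked numerically. The sketch below (function names are illustrative, not from the article) evaluates the log-likelihood $ l( a, \sigma ) $ given above and the closed-form maximum point $ ( \overline{X}, \widehat{s} ) $, so that one can verify that small perturbations of either coordinate do not increase $ l $.

```python
import math


def normal_log_likelihood(xs, a, sigma):
    """l(a, sigma) = -(n/2) ln(2 pi) - n ln(sigma) - sum (X_i - a)^2 / (2 sigma^2)."""
    n = len(xs)
    return (-0.5 * n * math.log(2.0 * math.pi)
            - n * math.log(sigma)
            - sum((x - a) ** 2 for x in xs) / (2.0 * sigma ** 2))


def normal_mle(xs):
    """Closed-form maximum-likelihood estimators (a_0, sigma_0)."""
    n = len(xs)
    a0 = sum(xs) / n                                        # sample mean (1)
    sigma0 = math.sqrt(sum((x - a0) ** 2 for x in xs) / n)  # root of sample variance (2)
    return a0, sigma0
```

For instance, with `a0, s0 = normal_mle(xs)` the value `normal_log_likelihood(xs, a0, s0)` should be at least as large as the value at any nearby point such as `(a0 + 0.1, s0)`.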
 As a further example, suppose that
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360132.png" /></td> </tr></table>
+$$
+p( x;  a)  =  \{ \pi [ 1+( x- a)  ^ {2} ] \}  ^ {-1} .
+$$
-This density gives a satisfactory description of the distribution of one of the coordinates of the particles reaching a plane screen and emanating from a point outside the screen (<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360133.png" /> is the coordinate of the projection of the source onto the screen, and is presumed to be unknown). The mathematical expectation of this distribution does not exist, since the corresponding integral is divergent. For this reason it is not possible to find a statistical estimator of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360134.png" /> by means of the method of moments. The formal use of the arithmetical mean (1) as a statistical estimator is meaningless, since <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360135.png" /> is distributed in the given instance with the same density <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360136.png" /> as every single result of the observations. For estimation of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360137.png" /> it is possible to make use of the property that the distribution in question is symmetric relative to the point <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360138.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360139.png" /> is the median of the theoretical distribution. By slightly modifying the method of moments, the sample median <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360140.png" /> can be used as a statistical estimator. 
When <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360141.png" />, it is unbiased for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360142.png" /> and if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360143.png" /> is large, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360144.png" /> is distributed approximately normally with variance
+This density gives a satisfactory description of the distribution of one of the coordinates of the particles reaching a plane screen and emanating from a point outside the screen ( $  a $
+is the coordinate of the projection of the source onto the screen, and is presumed to be unknown). The mathematical expectation of this distribution does not exist, since the corresponding integral is divergent. For this reason it is not possible to find a statistical estimator of  $  a $
+by means of the method of moments. The formal use of the arithmetical mean (1) as a statistical estimator is meaningless, since  $  \overline{X}\; $
+is distributed in the given instance with the same density  $  p( x;  a) $
+as every single result of the observations. For estimation of  $  a $
+it is possible to make use of the property that the distribution in question is symmetric relative to the point  $  x= a $,
+where  $  a $
+is the median of the theoretical distribution. By slightly modifying the method of moments, the sample median  $  \mu $
+can be used as a statistical estimator. When  $  n \geq  3 $,
+it is unbiased for  $  a $
+and if  $  n $
+is large,  $  \mu $
+is distributed approximately normally with variance
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360145.png" /></td> </tr></table>
+$$
+{\mathsf D} \mu  \sim
+\frac{\pi  ^ {2} }{4n}
+ .
+$$
 At the same time,
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360146.png" /></td> </tr></table>
+$$
+l( a)  =  - n   \mathop{\rm ln}  \pi - \sum_{i=1}^ { n }   \mathop{\rm ln} [ 1 + ( X _ {i} - a)  ^ {2} ],
+$$
-thus <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360147.png" /> and, according to (7), the asymptotic efficiency <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360148.png" /> is equal to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360149.png" />. Thus, in order that the sample median <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360150.png" /> is as accurate a statistical estimator for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360151.png" /> as the maximum-likelihood estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360152.png" />, the number of observations has to be increased by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360153.png" />. If the losses in the experiment are great, then, in the definition of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360154.png" />, that statistical estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360155.png" /> must be used, which, in the given case, is defined as the root of the equation
+thus  $  nI( a) = n/2 $
+and, according to (7), the asymptotic efficiency  $  e _  \infty  ( \mu ) $
+is equal to  $  8/ \pi  ^ {2} \approx 0.811 $.
+Thus, in order that the sample median  $  \mu $
+is as accurate a statistical estimator for  $  a $
+as the maximum-likelihood estimator  $  \alpha $,
+the number of observations has to be increased by  $  25\% $.
+If the observations are costly, then, to determine  $  a $,
+the statistical estimator  $  \alpha $
+should be used, which, in the given case, is defined as the root of the equation
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360156.png" /></td> </tr></table>
+$$
-As a first approximation, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360157.png" /> is used, and this equation is then solved by successive approximation using the formula
+\frac{\partial  l }{\partial  a }
+  \equiv  2 \sum_{i=1}^ { n }
+\frac{X _ {i} - a }{1 + ( X _ {i} - a)  ^ {2} }
+  =  0.
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360158.png" /></td> </tr></table>
+As a first approximation,  $  \alpha _ {0} = \mu $
+is used, and this equation is then solved by successive approximation using the formula
+$$
+\alpha _ {k+1}  =  \alpha _ {k} +
+\frac{4}{n}
+ \sum_{i=1}^ { n }
+\frac{X _ {i} - \alpha _ {k} }{1 + ( X _ {i} - \alpha _ {k} )  ^ {2} }
+ .
+$$
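The successive-approximation formula above is straightforward to implement. The following is a minimal sketch under stated assumptions: the function names, the stopping tolerance and the iteration cap are illustrative choices that do not appear in the article, and convergence from the sample median is typical but is not guaranteed for every sample.

```python
import statistics


def cauchy_score(xs, a):
    """Sum of (X_i - a) / (1 + (X_i - a)^2); it vanishes at a root of the likelihood equation."""
    return sum((x - a) / (1.0 + (x - a) ** 2) for x in xs)


def cauchy_mle(xs, tol=1e-10, max_iter=200):
    """Iterate alpha_{k+1} = alpha_k + (4/n) * score(alpha_k), starting from the sample median."""
    n = len(xs)
    alpha = statistics.median(xs)  # first approximation alpha_0 = mu
    for _ in range(max_iter):
        step = (4.0 / n) * cauchy_score(xs, alpha)
        alpha += step
        if abs(step) < tol:        # assumed stopping rule
            break
    return alpha
```

The factor $ 4/n $ is twice $ [ nI( a)] ^ {-1} = 2/n $ because the derivative of $ l $ is twice the sum computed by `cauchy_score`; the step therefore equals the derivative of the log-likelihood divided by $ nI( a) $.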
 See also [[Point estimator|Point estimator]].
@@ Line 147: / Line 429: @@
 An interval estimator is a statistical estimator which is represented geometrically as a set of points in the parameter space. An interval estimator can be seen as a set of point estimators. This set depends on the results of observations, and is consequently random; every interval estimator is therefore (partly) characterized by the probability with which this estimator will  "cover"  the unknown parameter point. This probability, in general, depends on unknown parameters; therefore, as a characteristic of the reliability of an interval estimator a confidence coefficient is used; this is the lowest possible value of the given probability. Interesting statistical conclusions can be drawn for only those interval estimators which have a confidence coefficient close to one.
-If a single parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360159.png" /> is estimated, then an interval estimator is usually a certain interval <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360160.png" /> (the so-called confidence interval), the end-points <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360161.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360162.png" /> of which are functions of the observations; the confidence coefficient <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360163.png" /> in the given case is defined as the lower bound of the probability of the simultaneous realization of the two events <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360164.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360165.png" />, which can be calculated using all possible values of the parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360166.png" />:
+If a single parameter  $  a $
+is estimated, then an interval estimator is usually a certain interval  $  ( \beta , \gamma ) $(
+the so-called confidence interval), the end-points  $  \beta $
+and  $  \gamma $
+of which are functions of the observations; the confidence coefficient  $  \omega $
+in the given case is defined as the lower bound of the probability of the simultaneous realization of the two events  $  \{ \beta < a \} $
+and  $  \{ \gamma > a \} $,
+which can be calculated using all possible values of the parameter  $  a $:
+$$
+\omega  =  \inf _ { a }  {\mathsf P} \{ \beta < a < \gamma \} .
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360167.png" /></td> </tr></table>
+If the mid-point  $  ( \beta + \gamma )/2 $
+of such an interval is taken as a point estimator for the parameter  $  a $,
+then it can be claimed, with probability not less than  $  \omega $,
+that the absolute error of this statistical estimator does not exceed half the length of the interval,  $  ( \gamma - \beta )/2 $.
+In other words, if one is guided by the rule of estimation of the absolute error, then an erroneous conclusion will be obtained on the average in less than  $  100( 1- \omega )\% $
+of the cases. Given a fixed confidence coefficient  $  \omega $,
+the most suitable are the shortest confidence intervals for which the mathematical expectation of the length  $  {\mathsf E} ( \gamma - \beta ) $
+attains its lowest value.
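The meaning of the confidence coefficient can be illustrated by simulation. The sketch below assumes, purely for illustration, normally distributed observations with unknown mean $ a $ and known standard deviation, and uses the standard interval $ \overline{X} \pm z \sigma / \sqrt n $ with $ \Phi ( z) = ( 1 + \omega )/2 $; the function names and the simulation settings are hypothetical. In this model the coverage probability does not depend on $ a $, so the infimum over $ a $ equals the common value.

```python
import math
import random
from statistics import NormalDist


def mean_interval(xs, sigma, omega):
    """Interval (beta, gamma) for the mean of normal data with known sigma."""
    n = len(xs)
    xbar = sum(xs) / n
    z = NormalDist().inv_cdf((1.0 + omega) / 2.0)  # root of Phi(z) = (1 + omega)/2
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half


def coverage(a, sigma, n, omega, trials, seed=0):
    """Fraction of simulated samples whose interval covers the true mean a."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(a, sigma) for _ in range(n)]
        beta, gamma = mean_interval(xs, sigma, omega)
        if beta < a < gamma:
            hits += 1
    return hits / trials
```

With a few thousand trials the observed fraction should lie close to the chosen $ \omega $, up to sampling fluctuation.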
-If the mid-point <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360168.png" /> of such an interval is taken as a point estimator for the parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360169.png" />, then it can be claimed, with probability not less that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360170.png" />, that the absolute error of this statistical estimator does not exceed half the length of the interval, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360171.png" />. In other words, if one is guided by the rule of estimation of the absolute error, then an erroneous conclusion will be obtained on the average in less than <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360172.png" /> of the cases. Given a fixed confidence coefficient <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360173.png" />, the most suitable are the shortest confidence intervals for which the mathematical expectation of the length <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360174.png" /> attains its lowest value.
+If the distribution of random variables  $  X _ {i} $
+depends only on one unknown parameter  $  a $,
+then the construction of the confidence interval is usually realized by the use of a certain point estimator  $  \alpha $.
+For the majority of cases of practical interest, the distribution function  $  {\mathsf P} \{ \alpha < x \} = F( x;  a) $
+of a sensibly chosen statistical estimator  $  \alpha $
+depends monotonically on the parameter  $  a $.
+Under these conditions, when seeking an interval estimator it makes sense to insert  $  x = \alpha $
+in  $  F( x;  a) $
+and to determine the roots  $  a _ {1} = a _ {1} ( \alpha , \omega ) $
+and  $  a _ {2} = a _ {2} ( \alpha , \omega ) $
+of the equations
-If the distribution of random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360175.png" /> depends only on one unknown parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360176.png" />, then the construction of the confidence interval is usually realized by the use of a certain point estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360177.png" />. For the majority of cases of practical interest, the distribution function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360178.png" /> of a sensibly chosen statistical estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360179.png" /> depends monotonically on the parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360180.png" />. Under these conditions, when seeking an interval estimator it makes sense to insert <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360181.png" /> in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360182.png" /> and to determine the roots <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360183.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360184.png" /> of the equations
+$$ \tag{9 }
+F( \alpha ;  a _ {1} )  = \
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360185.png" /></td> <td valign="top" style="width:5%;text-align:right;">(9)</td></tr></table>
+\frac{1 - \omega }{2}
+ \  \textrm{ and } \ \
+F( \alpha + 0;  a _ {2} )  =
+\frac{1 + \omega }{2}
+ ,
+$$
 where
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360186.png" /></td> </tr></table>
+$$
+F( x+ 0;  a)  =  \lim\limits _ {\Delta \rightarrow 0 }  F( x + \Delta  ^ {2} ;  a)
+$$
+(for continuous distributions  $  F( x+ 0;  a) = F( x;  a) $).
+The points with coordinates  $  a _ {1} ( \alpha , \omega ) $
+and  $  a _ {2} ( \alpha , \omega ) $
+bound the confidence interval with confidence coefficient  $  \omega $.
+It is reasonable to expect that such a simply constructed interval differs in many cases from the optimal (shortest) interval. However, if  $  \alpha $
+is an asymptotically efficient statistical estimator for  $  a $,
+then, given a sufficiently large number of observations, such an interval estimator does differ from the optimal one, but in practice the difference is immaterial. This is particularly true for maximum-likelihood estimators, since they are asymptotically normally distributed (see (8)). In cases where solving the equations (9) is difficult, the interval estimator is calculated approximately, using a maximum-likelihood point estimator and the relation (8):
+$$
+\beta  \approx  \beta  ^ {*}  =  \alpha - x \sigma _ {n} ( \alpha ) \  \textrm{ and } \ \
+\gamma  \approx  \gamma  ^ {*}  =  \alpha + x \sigma _ {n} ( \alpha ) ,
+$$
-(for continuous distributions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360187.png" />). The points with coordinates <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360188.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360189.png" /> bound the confidence interval with confidence coefficient <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360190.png" />. It is reasonable to expect that such a simply constructed interval differs in many cases from the optimal (shortest) interval. However, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360191.png" /> is an asymptotically efficient statistical estimator for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360192.png" />, then, given a sufficiently large number of observations, such an interval estimator differs from the optimal, although in practice the difference is immaterial. This is particularly true for maximum-likelihood estimators, since they are asymptotically normally distributed (see (8)). In cases where solving the equations (9) is difficult, the interval estimator is calculated approximately, using a maximum-likelihood point estimator and the relation (8):
+where  $  x $
+is the root of the equation  $  \Phi ( x) = ( 1+ \omega )/2 $.
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360193.png" /></td> </tr></table>
+If  $  n \rightarrow \infty $,
+then the true confidence coefficient of the interval estimator  $  ( \beta  ^ {*} , \gamma  ^ {*} ) $
+tends to  $  \omega $.
+In a more general case, the distribution of the results of observations  $  X _ {i} $
+depends on several parameters  $  a, b, \dots $.
+Then the above rules for the construction of confidence intervals often prove to be not feasible, since the distribution of a point estimator  $  \alpha $
+depends, as a rule, not only on  $  a $,
+but also on other parameters. However, in cases of practical interest the statistical estimator  $  \alpha $
+can be replaced by a function of the observations  $  X _ {i} $
+and an unknown parameter  $  a $,
+the distribution of which does not depend (or  "nearly does not depend" ) on any of the unknown parameters. An example of such a function is a normalized maximum-likelihood estimator  $  ( \alpha - a)/ \sigma _ {n} ( a, b, \dots ) $;
+if in the denominator the arguments  $  a, b, \dots $
+are replaced by maximum-likelihood estimators  $  \alpha , \beta , \dots $,
+then the limit distribution will remain the same as in formula (8). The approximate confidence intervals for each parameter in isolation can therefore be constructed in the same way as in the case of a single parameter.
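The approximate interval $ ( \beta ^ {*} , \gamma ^ {*} ) $ described above can be sketched in a few lines of Python. This is only an illustration: the function name `approx_interval` and the numerical inputs are hypothetical, and the caller is assumed to supply the point estimate $ \alpha $ and the value $ \sigma _ {n} ( \alpha ) $ taken from relation (8).

```python
from statistics import NormalDist

def approx_interval(alpha, sigma_n, omega):
    """Return (beta*, gamma*) = (alpha - x*sigma_n, alpha + x*sigma_n).

    alpha   -- value of the maximum-likelihood point estimator
    sigma_n -- sigma_n(alpha), supplied by the caller (assumption)
    omega   -- desired confidence coefficient, 0 < omega < 1
    """
    # x is the root of Phi(x) = (1 + omega) / 2 for the standard normal law.
    x = NormalDist().inv_cdf((1 + omega) / 2)
    return alpha - x * sigma_n, alpha + x * sigma_n

# Hypothetical numbers, chosen only to show the call.
beta_star, gamma_star = approx_interval(alpha=10.0, sigma_n=2.0, omega=0.95)
```

The true confidence coefficient of this interval only approaches $ \omega $ as $ n \rightarrow \infty $, so for small samples the result should be read as a rough guide.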
-where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360194.png" /> is the root of the equation <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360195.png" />.
+As has already been noted, if  $  X _ {1} \dots X _ {n} \dots $
+are independent and identically normally distributed random variables, then  $  \overline{X}\; $
+and  $  s  ^ {2} $
+are the best statistical estimators for the parameters  $  a $
+and  $  \sigma  ^ {2} $,
+respectively. The distribution function of the statistical estimator  $  \overline{X}\; $
+is expressed by the formula
-If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360196.png" />, then the true confidence coefficient of the interval estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360197.png" /> tends to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360198.png" />. In a more general case, the distribution of the results of observations <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360199.png" /> depends on various parameters <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360200.png" />. Then the above rules for the construction of confidence intervals often prove to be not feasible, since the distribution of a point estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360201.png" /> depends, as a rule, not only on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360202.png" />, but also on other parameters. However, in cases of practical interest the statistical estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360203.png" /> can be replaced by a function of the observations <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360204.png" /> and an unknown parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360205.png" />, the distribution of which does not depend (or  "nearly does not depend" ) on all unknown parameters. 
An example of such a function is a normalized maximum-likelihood estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360206.png" />; if in the denominator the arguments <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360207.png" /> are replaced by maximum-likelihood estimators <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360208.png" /> then the limit distribution will remain the same as in formula (8). The approximate confidence intervals for each parameter in isolation can therefore be constructed in the same way as in the case of a single parameter.
+$$
+{\mathsf P} \{ \overline{X}\; < x \}  =  \Phi \left [
+\frac{\sqrt n ( x- a) } \sigma
+ \right ]
+$$
-As has already been noted, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360209.png" /> are independent and identically normally distributed random variables, then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360210.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360211.png" /> are the best statistical estimators for the parameters <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360212.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360213.png" />, respectively. The distribution function of the statistical estimator is expressed by the formula
+and, consequently, it depends not only on  $  a $
+but also on  $  \sigma $.
+At the same time, the distribution of the so-called Student statistic
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360214.png" /></td> </tr></table>
+$$
-and, consequently, it depends not only on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360215.png" /> but also on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360216.png" />. At the same time, the distribution of the so-called Student statistic
+\frac{\sqrt n ( \overline{X}\; - a) }{s}
+  =  \tau
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360217.png" /></td> </tr></table>
+does not depend on  $  a $
+or  $  \sigma $,
+and
-does not depend on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360218.png" /> or <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360219.png" />, and
+$$
+{\mathsf P} \{ | \tau | \leq  t \}  = \
+\omega _ {n-1} ( t)  =  C _ {n-1}
+\int\limits _ { 0 } ^ { t }  \left ( 1+
+\frac{\nu  ^ {2} }{n-1}
+  \right )  ^ {-n/2}  d \nu ,
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360220.png" /></td> </tr></table>
+where the constant  $  C _ {n-1} $
+is chosen so that the equality  $  \omega _ {n-1} ( \infty ) = 1 $
+is satisfied. Thus, the confidence coefficient  $  \omega _ {n-1} ( t) $
+corresponds to the confidence interval
-where the constant <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360221.png" /> is chosen so that the equality <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360222.png" /> is satisfied. Thus, the confidence coefficient <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360223.png" /> corresponds to the confidence interval
+$$
+{\overline{X}\; - }
+\frac{st}{\sqrt n }
+  <  a  <  {\overline{X}\; + }
+\frac{st}{\sqrt n }
+ .
+$$
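As a rough illustration, the Student interval can be computed without tables by integrating the formula for $ \omega _ {n-1} ( t) $ numerically and solving for $ t $ by bisection. The sketch below assumes that $ s ^ {2} $ is the sample variance with divisor $ n- 1 $ (which is what `statistics.stdev` uses); the function names, the choice of Simpson's rule and the step count are illustrative choices, not part of the article.

```python
import math
from statistics import mean, stdev

def student_omega(t, n, steps=2000):
    """omega_{n-1}(t) = P{|tau| <= t}, by Simpson's rule on [0, t]."""
    # C_{n-1} = 2 * Gamma(n/2) / (sqrt(pi (n-1)) * Gamma((n-1)/2)),
    # which makes omega_{n-1}(infinity) = 1.
    c = 2 * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))
    c /= math.sqrt(math.pi * (n - 1))
    f = lambda v: (1 + v * v / (n - 1)) ** (-n / 2)
    h = t / steps  # steps must be even for Simpson's rule
    total = f(0.0) + f(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return c * total * h / 3

def student_t(omega, n):
    """Root t of omega_{n-1}(t) = omega, found by bisection."""
    lo, hi = 0.0, 1.0
    while student_omega(hi, n) < omega:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if student_omega(mid, n) < omega:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def student_interval(xs, omega):
    """Confidence interval for a with confidence coefficient omega."""
    n = len(xs)
    half_width = stdev(xs) * student_t(omega, n) / math.sqrt(n)
    return mean(xs) - half_width, mean(xs) + half_width
```

The sample must contain at least two observations. For very small $ n $ and $ \omega $ close to one the root $ t $ becomes large, and the fixed step count then limits the accuracy of the integration.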
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360224.png" /></td> </tr></table>
+The distribution of the estimator  $  s  ^ {2} $
+depends only on  $  \sigma  ^ {2} $,
+while the distribution function of  $  s  ^ {2} $
+is defined by the formula
-The distribution of the estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360225.png" /> depends only on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360226.png" />, while the distribution function of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360227.png" /> is defined by the formula
+$$
+{\mathsf P} \left \{ s  ^ {2} <
+\frac{\sigma  ^ {2} x }{n-1} \right \}  = \
+G _ {n-1} ( x)  = \
+D _ {n-1} \int\limits _ { 0 } ^ { x }  v  ^ {( n- 3)/2 } e  ^ {- v/2}  dv,
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360228.png" /></td> </tr></table>
+where the constant  $  D _ {n-1} $
+is defined by the condition  $  G _ {n-1} ( \infty ) = 1 $
+(the so-called  $  \chi  ^ {2} $-distribution with  $  n- 1 $
+degrees of freedom, cf. [[Chi-squared distribution|Chi-squared distribution]]). Since, for a fixed  $  y $,
+the probability  $  {\mathsf P} \{ s  ^ {2} < y \} = G _ {n-1} ( ( n- 1) y/ \sigma  ^ {2} ) $
+depends monotonically on  $  \sigma $,
+rule (9) can be used to construct an interval estimator. Thus, if  $  x _ {1} $
+and  $  x _ {2} $
+are the roots of the equations  $  G _ {n-1} ( x _ {1} ) = ( 1- \omega )/2 $
+and  $  G _ {n-1} ( x _ {2} ) = ( 1+ \omega )/2 $,
+then the confidence coefficient  $  \omega $
+corresponds to the confidence interval
-where the constant <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360229.png" /> is defined by the condition <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360230.png" /> (the so-called <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360231.png" />-distribution with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360232.png" /> degrees of freedom, cf. [["Chi-squared" distribution| "Chi-squared"  distribution]]). Since the probability <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360233.png" /> increases monotonically when <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360234.png" /> increases, rule (9) can be used to construct an interval estimator. Thus, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360235.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360236.png" /> are the roots of the equations <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360237.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360238.png" />, then the confidence coefficient <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360239.png" /> corresponds to the confidence interval
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360240.png" /></td> </tr></table>
+\frac{( n- 1) s  ^ {2} }{x _ {2} }
+  <  \sigma  ^ {2}  <
+\frac{( n- 1) s  ^ {2} }{x _ {1} }
+ .
+$$
 Hence it follows that the confidence interval for the relative error is defined by the inequalities
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360241.png" /></td> </tr></table>
+$$
-Detailed tables of the Student distribution function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360242.png" /> and of the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360243.png" />-distribution <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360244.png" /> can be found in most textbooks on mathematical statistics.
+\frac{x _ {1} }{n-1}
+ - 1  <
+\frac{s  ^ {2} - \sigma  ^ {2} }{\sigma  ^ {2} }
+  <
+\frac{x _ {2} }{n-1}
+  - 1.
+$$
-Until now it has been supposed that the distribution function of the results of observations is known up to values of various parameters. However, in practice the form of the distribution function is often unknown. In this case, when estimating the parameters, the so-called [[Non-parametric methods in statistics|non-parametric methods in statistics]] can prove useful (i.e. methods which do not depend on the initial probability distribution). Suppose, for example, that the median <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360245.png" /> of a theoretical continuous distribution of independent random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360246.png" /> has to be estimated (for symmetric distributions, the median coincides with the mathematical expectation, provided, of course, that it exists). Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360247.png" /> be the same variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360248.png" /> arranged in ascending order. Then, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360249.png" /> is an integer which satisfies the inequalities <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360250.png" />,
+Detailed tables of the Student distribution function  $  \omega _ {n-1} ( t) $
+and of the  $  \chi  ^ {2} $-
+distribution  $  G _ {n-1} ( x) $
+can be found in most textbooks on mathematical statistics.
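Where tables are not at hand, the roots $ x _ {1} $ and $ x _ {2} $ can be found numerically. The sketch below evaluates $ G _ {n-1} ( x) $ through the standard power series for the incomplete gamma function (a substitute for direct integration of the formula above, chosen here because it also works for $ n = 2 $) and solves for the roots by bisection. The function names are hypothetical, and $ s ^ {2} $ is assumed to be the sample variance with divisor $ n- 1 $.

```python
import math
from statistics import variance

def chi2_cdf(x, dof):
    """G_dof(x): chi-squared distribution function, series evaluation."""
    if x <= 0:
        return 0.0
    a, z = dof / 2, x / 2
    # P(a, z) = z^a e^{-z} / Gamma(a+1) * sum_k z^k / ((a+1)...(a+k))
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a + 1))

def chi2_quantile(p, dof):
    """Root x of G_dof(x) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, dof) < p:
        hi *= 2
    for _ in range(80):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def variance_interval(xs, omega):
    """Confidence interval for sigma^2 with confidence coefficient omega."""
    n = len(xs)
    s2 = variance(xs)  # divisor n - 1 (assumption about s^2)
    x1 = chi2_quantile((1 - omega) / 2, n - 1)
    x2 = chi2_quantile((1 + omega) / 2, n - 1)
    return (n - 1) * s2 / x2, (n - 1) * s2 / x1
```

The series is intended for moderate sample sizes; for very large arguments the partial sums can overflow, and a library routine would be the safer choice.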
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360251.png" /></td> </tr></table>
+Until now it has been supposed that the distribution function of the results of observations is known up to values of various parameters. However, in practice the form of the distribution function is often unknown. In this case, when estimating the parameters, the so-called [[Non-parametric methods in statistics|non-parametric methods in statistics]] can prove useful (i.e. methods which do not depend on the initial probability distribution). Suppose, for example, that the median  $  m $
+of a theoretical continuous distribution of independent random variables  $  X _ {1} \dots X _ {n} $
+has to be estimated (for symmetric distributions, the median coincides with the mathematical expectation, provided, of course, that it exists). Let  $  Y _ {1} \leq  \dots \leq  Y _ {n} $
+be the same variables  $  X _ {i} $
+arranged in ascending order. Then, if  $  k $
+is an integer which satisfies the inequalities  $  1 \leq  k \leq  n/2 $,
-Thus, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360252.png" /> is an interval estimator for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360253.png" /> with confidence coefficient <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360254.png" />. This conclusion holds for any continuous distribution of the random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360255.png" />.
+$$
+{\mathsf P} \{ Y _ {k} < m < Y _ {n- k+ 1} \}  = \
+1 - 2 \sum_{r=0}^ {k-1} \binom{n}{r} \left ( \frac{1}{2} \right )  ^ {n}  = \
+\omega _ {n,k} .
+$$
-It has already been noted that a sample distribution is a point estimator for an unknown theoretical distribution. Moreover, the sample distribution function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360256.png" /> is an unbiased estimator for a theoretical distribution function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360257.png" />. Here, as A.N. Kolmogorov demonstrated, the distribution of the statistic
+Thus,  $  ( Y _ {k} , Y _ {n- k+ 1} ) $
+is an interval estimator for  $  m $
+with confidence coefficient  $  \omega = \omega _ {n,k} $.
+This conclusion holds for any continuous distribution of the random variables  $  X _ {i} $.
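The coefficient $ \omega _ {n,k} $ is a finite binomial sum, so it can be computed exactly. The following sketch (function names are hypothetical) evaluates it and picks the largest admissible $ k $, which gives the narrowest interval $ ( Y _ {k} , Y _ {n- k+ 1} ) $ whose confidence coefficient is still at least a requested value.

```python
from math import comb

def median_confidence(n, k):
    """omega_{n,k} = 1 - 2 * sum_{r=0}^{k-1} C(n, r) * (1/2)^n."""
    return 1 - 2 * sum(comb(n, r) for r in range(k)) / 2 ** n

def median_interval(xs, omega):
    """Return (Y_k, Y_{n-k+1}, omega_{n,k}) for the largest k with
    1 <= k <= n/2 and omega_{n,k} >= omega."""
    ys = sorted(xs)
    n = len(ys)
    best = None
    for k in range(1, n // 2 + 1):
        # omega_{n,k} decreases as k grows, so the last hit is the largest k.
        if median_confidence(n, k) >= omega:
            best = k
    if best is None:
        raise ValueError("sample too small for the requested coefficient")
    # Y_k is ys[k-1] and Y_{n-k+1} is ys[n-k] with zero-based indexing.
    return ys[best - 1], ys[n - best], median_confidence(n, best)
```

For $ n = 10 $ and $ k = 2 $ the sum gives $ \omega _ {10,2} = 1 - 22/1024 \approx 0.9785 $, so the interval between the second smallest and second largest of ten observations covers the median with that probability.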
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360258.png" /></td> </tr></table>
+It has already been noted that a sample distribution is a point estimator for an unknown theoretical distribution. Moreover, the sample distribution function  $  F _ {n} ( x) $
+is an unbiased estimator for a theoretical distribution function  $  F( x) $.
+Here, as A.N. Kolmogorov demonstrated, the distribution of the statistic
-does not depend on the unknown theoretical distribution and, when <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360259.png" />, tends to a limit distribution <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360260.png" />, which is called a Kolmogorov distribution. Thus, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360261.png" /> is the solution of the equation <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360262.png" />, then it can be claimed, with probability <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360263.png" />, that the graph of the function of the theoretical distribution function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360264.png" /> is completely  "covered"  by a strip enclosed between the graphs of the functions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360265.png" /> (when <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360266.png" />, the difference between the exact and limit distributions of the statistic <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360267.png" /> is immaterial). An interval estimator of this type is called a confidence region. See also [[Interval estimator|Interval estimator]].
+$$
+\lambda _ {n}  =  \sqrt n \max _ {- \infty < x < \infty }  | F _ {n} ( x) - F( x) |
+$$
+does not depend on the unknown theoretical distribution and, when  $  n \rightarrow \infty $,
+tends to a limit distribution  $  K( y) $,
+which is called a Kolmogorov distribution. Thus, if  $  y $
+is the solution of the equation  $  K( y) = \omega $,
+then it can be claimed, with probability  $  \omega $,
+that the graph of the theoretical distribution function  $  F( x) $
+is completely  "covered"  by a strip enclosed between the graphs of the functions  $  F _ {n} ( x) \pm  y/ \sqrt n $
+(when  $  n \geq  20 $,
+the difference between the exact and limit distributions of the statistic  $  \lambda _ {n} $
+is immaterial). An interval estimator of this type is called a confidence region. See also [[Interval estimator|Interval estimator]].
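A confidence band of this kind can be sketched as follows. The article does not write out $ K( y) $; the code assumes the well-known series $ K( y) = 1 - 2 \sum _ {k=1} ^ \infty (- 1) ^ {k-1} e ^ {- 2k ^ {2} y ^ {2} } $ for $ y > 0 $. The function names, the truncation at 100 terms, the bisection bracket and the clipping of the band to $ [ 0, 1] $ are illustrative choices.

```python
import math

def kolmogorov_cdf(y, terms=100):
    """Limit distribution K(y), using the truncated alternating series."""
    if y <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * y * y)
            for k in range(1, terms + 1))
    return 1 - 2 * s

def kolmogorov_root(omega):
    """Solution y of K(y) = omega, by bisection on [0.1, 5]."""
    lo, hi = 0.1, 5.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if kolmogorov_cdf(mid) < omega:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_band(xs, omega):
    """List of (x, lower, upper): the strip F_n(x) -/+ y/sqrt(n), evaluated
    just after each sorted observation and clipped to [0, 1]."""
    ys = sorted(xs)
    n = len(ys)
    eps = kolmogorov_root(omega) / math.sqrt(n)
    return [(v, max(0.0, (i + 1) / n - eps), min(1.0, (i + 1) / n + eps))
            for i, v in enumerate(ys)]
```

Because the band relies on the limit distribution, it is best used for samples of the size mentioned above ($ n \geq 20 $); for smaller samples the exact distribution of $ \lambda _ {n} $ would be needed.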
 ==Statistical estimators in the theory of errors.==
@@ Line 216: / Line 657: @@
 The theory of errors is based on a mathematical model according to which the totality of all conceivable results of the measurements is treated as the set of values of a certain random variable. The theory of statistical estimators is therefore of considerable importance. The conclusions drawn from the theory of errors are of a statistical character. The sense and content of these conclusions (and indeed of the conclusions of the theory of statistical estimation) become clear only in the light of the [[Law of large numbers|law of large numbers]] (an example of this approach is the statistical interpretation of the sense of the confidence coefficient discussed above).
-In proposing the result of a measurement <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360268.png" /> of a random variable, there are three separate basic types of error measurements: systematic, random and gross (qualitative descriptions of these errors are given under [[Errors, theory of|Errors, theory of]]). Here, the difference <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360269.png" /> is called the error of the measurement of the unknown variable <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360270.png" />; the mathematical expectation of this difference, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360271.png" />, is called the systematic error (if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360272.png" />, then the measurements are said to be free of systematic errors), while the difference <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360273.png" /> is called the random error (<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360274.png" />). Thus, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360275.png" /> independent measurements of the variable <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360276.png" /> are taken, then their results can be written in the form of the equalities
+In treating the result of a measurement  $  X $
+as a random variable, three basic types of measurement error are distinguished: systematic, random and gross (qualitative descriptions of these errors are given under [[Errors, theory of|Errors, theory of]]). Here, the difference  $  X- a $
+is called the error of the measurement of the unknown variable  $  a $;
+the mathematical expectation of this difference,  $  {\mathsf E} ( X- a) = b $,
+is called the systematic error (if  $  b= 0 $,
+then the measurements are said to be free of systematic errors), while the difference  $  \delta = X- a- b $
+is called the random error ($  {\mathsf E} \delta = 0 $).
+Thus, if  $  n $
+independent measurements of the variable  $  a $
+are taken, then their results can be written in the form of the equalities
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360277.png" /></td> <td valign="top" style="width:5%;text-align:right;">(10)</td></tr></table>
+$$ \tag{10 }
+X _ {i}  =  a + b + \delta _ {i} ,\ \
+i = 1 \dots n,
+$$
-where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360278.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360279.png" /> are constants, while <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360280.png" /> are random variables. In a more general case
+where  $  a $
+and  $  b $
+are constants, while  $  \delta _ {i} $
+are random variables. In a more general case
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360281.png" /></td> <td valign="top" style="width:5%;text-align:right;">(11)</td></tr></table>
+$$ \tag{11 }
+X _ {i}  =  a + ( b + \beta _ {i} ) + \delta _ {i} ,\ \
+i = 1 \dots n,
+$$
-where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360282.png" /> are random variables which do not depend on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360283.png" />, and which are equal to zero with probability very close to one (every other value <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360284.png" /> is therefore improbable). The values <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360285.png" /> are called the gross errors (or outliers).
+where  $  \beta _ {i} $
+are random variables which do not depend on  $  \delta _ {i} $,
+and which are equal to zero with probability very close to one (every other value  $  \beta _ {i} \neq 0 $
+is therefore improbable). The values  $  \beta _ {i} $
+are called the gross errors (or outliers).
-The problem of estimating (and eliminating) systematic errors does not normally fall within the limits of mathematical statistics. Two exceptions to this rule are the standard method, in which, when estimating <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360286.png" />, a series of measurements of the known value <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360287.png" /> is made (in this method, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360288.png" /> is a value to be estimated and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360289.png" /> is a known systematic error) and dispersion analysis, in which the systematic divergence between various series of measurements is estimated.
+The problem of estimating (and eliminating) systematic errors does not normally fall within the limits of mathematical statistics. Two exceptions to this rule are the standard method, in which, in order to estimate  $  b $,
+a series of measurements of a known value  $  a $
+is made (in this method, the systematic error  $  b $
+is the value to be estimated and  $  a $
+is known), and dispersion analysis (analysis of variance), in which the systematic divergence between various series of measurements is estimated.
-The fundamental problem in the theory of errors is to find a statistical estimator for an unknown variable <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360290.png" /> and to estimate the accuracy of the measurements. If the systematic error is eliminated <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360291.png" /> and the observations do not contain gross errors, then according to (10), <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360292.png" />, and in this case the problem of estimating <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360293.png" /> reduces to the problem of finding the optimal statistical estimator in one sense or another for the mathematical expectation of the identically distributed random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360294.png" />. As shown above, the form of such a statistical (point or interval) estimator depends essentially on the distribution law of the random errors. If this law is known up to various unknown parameters, then the maximum-likelihood method can be used to find an estimator for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360295.png" />; in the alternative case, a statistical estimator for an unknown distribution function of the random errors <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360296.png" /> has to be found, using the results of the observations of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360297.png" /> (the  "non-parametric"  interval estimator of this function is shown above). 
In practice, two statistical estimators <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360298.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360299.png" /> often suffice (see (1) and (2)). If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360300.png" /> are identically normally distributed, then these statistical estimators are the best; in other cases, these estimators can prove to be quite inefficient.
+The fundamental problem in the theory of errors is to find a statistical estimator for an unknown variable  $  a $
+and to estimate the accuracy of the measurements. If the systematic error is eliminated  $  ( b= 0) $
+and the observations do not contain gross errors, then according to (10),  $  X _ {i} = a + \delta _ {i} $,
+and in this case the problem of estimating  $  a $
+reduces to the problem of finding the optimal statistical estimator in one sense or another for the mathematical expectation of the identically distributed random variables  $  X _ {i} $.
+As shown above, the form of such a statistical (point or interval) estimator depends essentially on the distribution law of the random errors. If this law is known up to various unknown parameters, then the maximum-likelihood method can be used to find an estimator for  $  a $;
+in the alternative case, a statistical estimator for an unknown distribution function of the random errors  $  \delta _ {i} $
+has to be found, using the results of the observations of  $  X _ {i} $
+(the  "non-parametric"  interval estimator of this function is shown above). In practice, two statistical estimators  $  \overline{X}\; \approx a $
+and  $  s  ^ {2} \approx {\mathsf D} \delta _ {i} $
+often suffice (see (1) and (2)). If  $  \delta _ {i} $
+are identically normally distributed, then these statistical estimators are the best; in other cases, these estimators can prove to be quite inefficient.
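As a minimal illustration of the two estimators mentioned above, the following Python sketch computes $ \overline{X} $ and a variance estimate from a list of measurements. The function name `mean_and_variance` and the sample data are hypothetical. Since formula (2) is not reproduced in this part of the text, the sketch returns both the divisor-$ n $ and the divisor-$ (n-1) $ versions rather than assuming which one $ s ^ {2} $ denotes.

```python
import statistics


def mean_and_variance(xs):
    """Return (mean, variance with divisor n, variance with divisor n - 1)."""
    if len(xs) < 2:
        raise ValueError("at least two observations are required")
    x_bar = statistics.fmean(xs)
    var_n = statistics.pvariance(xs, mu=x_bar)    # divisor n
    var_n1 = statistics.variance(xs, xbar=x_bar)  # divisor n - 1
    return x_bar, var_n, var_n1


# Hypothetical measurements of a constant a with small random errors.
measurements = [10.1, 9.9, 10.0, 10.2, 9.8]
x_bar, var_n, var_n1 = mean_and_variance(measurements)
```

The two variance values differ by the factor $ n/(n-1) $, which matters only for small $ n $.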
-The appearance of outliers (gross errors) complicates the problem of estimating the parameter <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360301.png" />. The proportion of observations in which <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360302.png" /> is usually small, while the mathematical expectation of non-zero <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360303.png" /> is significantly higher than <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360304.png" /> (gross errors arise as a result of random miscalculation, incorrect reading of the measuring equipment, etc.). Results of measurements which contain gross errors are often easily spotted, as they differ greatly from the other results. Under these conditions, the most advisable means of identifying (and eliminating) gross errors is to carry out a direct analysis of the measurements, to check carefully that all experiments were carried out under the same conditions, to make a  "double note"  of the results, etc. Statistical methods of finding gross errors are only to be used in cases of doubt.
+The appearance of outliers (gross errors) complicates the problem of estimating the parameter  $  a $.
+The proportion of observations in which  $  \beta _ {i} \neq 0 $
+is usually small, while the mathematical expectation of non-zero  $  | \beta _ {i} | $
+is significantly higher than  $  \sqrt { {\mathsf D} \delta _ {i} } $
+(gross errors arise as a result of random miscalculation, incorrect reading of the measuring equipment, etc.). Results of measurements which contain gross errors are often easily spotted, as they differ greatly from the other results. Under these conditions, the most advisable means of identifying (and eliminating) gross errors is to carry out a direct analysis of the measurements, to check carefully that all experiments were carried out under the same conditions, to make a  "double note"  of the results, etc. Statistical methods of finding gross errors are only to be used in cases of doubt.
-The simplest example of these methods is the statistical occurrence of an outlier, when either <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360305.png" /> or <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360306.png" /> is open to doubt (it is proposed that in the equalities (11) <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360307.png" /> and that the distribution law of the variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360308.png" /> is known). In order to establish whether the hypothesis of the presence of an outlier is justified, a joint interval estimator (or prediction region) for the pair <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360309.png" /> is calculated (a confidence region), by proposing that all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360310.png" /> are equal to zero. If this statistical estimator  "covers"  the point with coordinates <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360311.png" />, then the doubt over the presence of an outlier has to be considered statistically unjustified; in the alternative case, the hypothesis of the absence of an outlier has to be accepted (the rejected theory is then usually discarded, as it is statistically impossible to reliably estimate the value of the outlier at all using one observation).
+The simplest example of these methods is the statistical detection of an outlier, when either  $  Y _ {1} = \min  X _ {i} $
+or  $  Y _ {n} = \max  X _ {i} $
+is open to doubt (it is assumed that in the equalities (11)  $  b= 0 $
+and that the distribution law of the variables  $  \delta _ {i} $
+is known). In order to establish whether the hypothesis of the presence of an outlier is justified, a joint interval estimator (or prediction region) for the pair  $  Y _ {1} , Y _ {n} $
+is calculated (a confidence region), assuming that all  $  \beta _ {i} $
+are equal to zero. If this statistical estimator  "covers"  the point with coordinates  $  ( Y _ {1} , Y _ {n} ) $,
+then the doubt over the presence of an outlier has to be considered statistically unjustified; otherwise, the hypothesis of the presence of an outlier has to be accepted (the rejected observation is then usually discarded, as it is statistically impossible to reliably estimate the value of the outlier from a single observation).
-For example, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360312.png" /> be unknown, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360313.png" /> and let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360314.png" /> be independent and identically normally distributed (the variance is unknown). If all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360315.png" />, then the distribution of the random variable
+For example, let  $  a $
+be unknown, let  $  b= 0 $
+and let  $  \delta _ {i} $
+be independent and identically normally distributed (the variance is unknown). If all  $  \beta _ {i} = 0 $,
+then the distribution of the random variable
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360316.png" /></td> </tr></table>
+$$
+Z  =
+\frac{\max | X _ {i} - \overline{X}\; | }{\widehat{s}  }
-does not depend on unknown parameters (the statistical estimators <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360317.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360318.png" /> are calculated, using all <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360319.png" /> observations, according to the formulas (1) and (2)). For large values
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360320.png" /></td> </tr></table>
+does not depend on unknown parameters (the statistical estimators  $  \overline{X}\; $
+and  $  \widehat{s}  $
+are calculated, using all  $  n $
+observations, according to the formulas (1) and (2)). For large values of  $  z $,
-where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360321.png" /> is the Student distribution function, as defined above. Thus, with confidence coefficient
+$$
+{\mathsf P} \{ Z > z \}  \approx  n \left [ 1 - \omega _ {n-2} \left (
+z {\sqrt {
+\frac{n- 2 }{n- 1- z  ^ {2} }
+ } } \right ) \right ] ,
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360322.png" /></td> <td valign="top" style="width:5%;text-align:right;">(12)</td></tr></table>
+where  $  \omega _ {r} ( t) $
+is the Student distribution function, as defined above. Thus, with confidence coefficient
-it can be claimed that in the absence of an outlier the inequality <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360323.png" /> is satisfied, or, put another way,
+$$ \tag{12 }
+\omega  \approx  1 - n \left [ 1 - \omega _ {n-2} \left ( z \sqrt {
+\frac{n- 2 }{n- 1- z  ^ {2} }
+ } \right ) \right ]
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360324.png" /></td> </tr></table>
+it can be claimed that in the absence of an outlier the inequality  $  Z < z $
+is satisfied, or, put another way,
-(The error in the estimation of the confidence coefficient by means of formula (12) does not exceed <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360325.png" />.) Therefore, if all results of the measurements of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360326.png" /> fall within the limits <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s087/s087360/s087360327.png" />, then there are no grounds for supposing that any measurement contains an outlier.
+$$
+\overline{X}\; - z \widehat{s}   <  Y _ {1}  <  Y _ {n}  <  \overline{X}\; + z \widehat{s}  .
+$$
+(The error in the estimation of the confidence coefficient by means of formula (12) does not exceed  $  \omega  ^ {2} /2 $.)
+Therefore, if all results of the measurements of  $  X _ {i} $
+fall within the limits  $  \overline{X}\; \pm  z \widehat{s}  $,
+then there are no grounds for supposing that any measurement contains an outlier.
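The check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `outlier_check` and the data are hypothetical, $ \widehat{s} $ is taken to be the square root of the variance estimate with divisor $ n-1 $ (formula (2) is not reproduced in this part of the text), and the threshold $ z $ is supplied by the caller, who would choose it from (12) for the desired confidence coefficient. The value `z=2.5` used below is arbitrary and is not derived from (12).

```python
import statistics


def outlier_check(xs, z):
    """Return (Z, lower, upper, suspicious) for the observations xs.

    Z = max|X_i - mean| / s_hat; an observation is flagged as suspicious
    when it lies outside the limits mean +/- z * s_hat.
    """
    x_bar = statistics.fmean(xs)
    s_hat = statistics.stdev(xs, xbar=x_bar)  # assumption: divisor n - 1
    if s_hat == 0:
        raise ValueError("all observations are equal")
    big_z = max(abs(x - x_bar) for x in xs) / s_hat
    lower, upper = x_bar - z * s_hat, x_bar + z * s_hat
    suspicious = [x for x in xs if not (lower < x < upper)]
    return big_z, lower, upper, suspicious


# Hypothetical measurements; the last value is a candidate gross error.
data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 12.0]
big_z, lower, upper, suspicious = outlier_check(data, z=2.5)

# Z is unchanged by a shift and a positive rescaling of the data, which
# reflects its independence of the unknown mean and variance.
rescaled = [3.0 * x + 7.0 for x in data]
big_z_rescaled = outlier_check(rescaled, z=2.5)[0]
```

An empty `suspicious` list corresponds to the case in which all observations fall within the limits, so there are no grounds for suspecting an outlier.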
 ====References====
 <table><TR><TD valign="top">[1]</TD> <TD valign="top">  H. Cramér,   M.R. Leadbetter,   "Stationary and related stochastic processes" , Wiley  (1967)  pp. Chapts. 33–34</TD></TR><TR><TD valign="top">[2]</TD> <TD valign="top">  N.V. Smirnov,   I.V. Dunin-Barkovskii,   "Mathematische Statistik in der Technik" , Deutsch. Verlag Wissenschaft.  (1969)  (Translated from Russian)</TD></TR><TR><TD valign="top">[3]</TD> <TD valign="top">  Yu.V. Linnik,   "Methode der kleinste Quadraten in moderner Darstellung" , Deutsch. Verlag Wissenschaft.  (1961)  (Translated from Russian)</TD></TR><TR><TD valign="top">[4]</TD> <TD valign="top">  B.L. van der Waerden,   "Mathematische Statistik" , Springer  (1957)</TD></TR><TR><TD valign="top">[5]</TD> <TD valign="top">  N. Arley,   K.R. Buch,   "Introduction to the theory of probability and statistics" , Wiley  (1950)</TD></TR><TR><TD valign="top">[6]</TD> <TD valign="top">  A.N. Kolmogorov,   "On the statistical estimation of the parameters of the Gauss distribution"  ''Izv. Akad. Nauk SSSR Ser. Mat.'' , '''6''' :  1–2  (1942)  pp. 3–32  (In Russian)  (French abstract)</TD></TR></table>
 ====Comments====