Difference between revisions of "Order statistic"

From Encyclopedia of Mathematics
A member of the series of order statistics (also called [[variational series]]) based on the results of observations. Let a random vector $X = (X_1, \dots, X_n)$ be observed which assumes values $x = (x_1, \dots, x_n)$ in an $n$-dimensional Euclidean space $\mathbf R^n$, $n \geq 2$, and let, further, a function $\phi(\cdot) : \mathbf R^n \rightarrow \mathbf R^n$ be given on $\mathbf R^n$ by the rule

$$
\phi(x) = x^{(\cdot)}, \ \ x \in \mathbf R^n,
$$
  
where $x^{(\cdot)} = (x_{(n1)}, \dots, x_{(nn)})$ is a vector in $\mathbf R^n$ obtained from $x$ by rearranging its coordinates $x_1, \dots, x_n$ in ascending order of magnitude, i.e. the components $x_{(n1)}, \dots, x_{(nn)}$ of the vector $x^{(\cdot)}$ satisfy the relation

$$ \tag{1}
x_{(n1)} \leq \dots \leq x_{(nn)}.
$$
  
In this case the statistic $X^{(\cdot)} = \phi(X) = (X_{(n1)}, \dots, X_{(nn)})$ is the series (or vector) of order statistics, and its $k$-th component $X_{(nk)}$ ($k = 1, \dots, n$) is called the $k$-th order statistic.
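
The definition above amounts to sorting the observed coordinates. A minimal Python sketch follows; the function names `order_statistics` and `kth_order_statistic` are illustrative and do not come from the article, and $k$ is 1-based as in the text.

```python
from typing import List, Sequence


def order_statistics(x: Sequence[float]) -> List[float]:
    """Return x^(.) = (x_(n1), ..., x_(nn)): the coordinates of x in ascending order."""
    return sorted(x)


def kth_order_statistic(x: Sequence[float], k: int) -> float:
    """Return x_(nk) for k = 1, ..., n (1-based, as in the text)."""
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    return sorted(x)[k - 1]


sample = [2.7, -1.0, 3.5, 0.4]
ordered = order_statistics(sample)          # coordinates satisfying relation (1)
smallest = kth_order_statistic(sample, 1)   # x_(n1), the sample minimum
largest = kth_order_statistic(sample, 4)    # x_(nn), the sample maximum
```

Sorting costs $O(n \log n)$; if only one order statistic is needed, a selection algorithm could be used instead, but that is outside the scope of this sketch.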
 
In the theory of order statistics the best studied case is the one where the components $X_1, \dots, X_n$ of the random vector $X$ are independent random variables having the same distribution, as is assumed hereafter. If $F(u)$ is the distribution function of the random variable $X_i$, $i = 1, \dots, n$, then the distribution function $F_{nk}(u)$ of the $k$-th order statistic $X_{(nk)}$ is given by the formula

$$ \tag{2}
F_{nk}(u) = {\mathsf P} \{ X_{(nk)} \leq u \} = I_{F(u)}(k, n - k + 1),
$$
  
 
where

$$
I_y(a, b) = \frac{1}{B(a, b)} \int\limits_0^y x^{a-1} (1 - x)^{b-1} \, dx
$$
  
is the [[Incomplete beta-function|incomplete beta-function]]. From (2) it follows that if the distribution function $F(u)$ has probability density $f(u)$, then the probability density $f_{nk}(u)$ of the $k$-th order statistic $X_{(nk)}$, $k = 1, \dots, n$, also exists and is given by the formula

$$ \tag{3}
f_{nk}(u) = \frac{n!}{(k - 1)! (n - k)!} [F(u)]^{k-1} [1 - F(u)]^{n-k} f(u),
$$

$$
- \infty < u < \infty.
$$
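
Formulas (2) and (3) can be checked numerically. The Python sketch below relies on the standard identity $I_p(k, n-k+1) = \sum_{j=k}^{n} \binom{n}{j} p^j (1-p)^{n-j}$, which says that $X_{(nk)} \leq u$ exactly when at least $k$ of the observations do not exceed $u$. The function names and the choice of the uniform distribution on $[0, 1]$ are assumptions made for illustration only.

```python
import math


def cdf_order_statistic(p: float, n: int, k: int) -> float:
    """F_nk(u) from formula (2), written in terms of p = F(u).

    Uses the binomial-sum form of the incomplete beta-function I_p(k, n - k + 1).
    """
    return sum(math.comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def pdf_order_statistic(p: float, dens: float, n: int, k: int) -> float:
    """f_nk(u) from formula (3), with p = F(u) and dens = f(u)."""
    coeff = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    return coeff * p ** (k - 1) * (1.0 - p) ** (n - k) * dens


# Uniform distribution on [0, 1]: F(u) = u and f(u) = 1 for 0 < u < 1.
n, k, u, h = 5, 2, 0.3, 1e-6
# Central difference of the distribution function (2) ...
numeric_slope = (cdf_order_statistic(u + h, n, k) - cdf_order_statistic(u - h, n, k)) / (2 * h)
# ... should agree with the density (3) at the same point.
exact_density = pdf_order_statistic(u, 1.0, n, k)
```

For a general $F$, one would pass `p = F(u)` and `dens = f(u)` evaluated by whatever means are available for that distribution.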
  
Assuming the existence of the probability density $f(u)$ one obtains the joint probability density $f_{r_1 \dots r_k}(u_1, \dots, u_k)$ of the order statistics $X_{(nr_1)}, \dots, X_{(nr_k)}$, $1 \leq r_1 < \dots < r_k \leq n$, $k \leq n$, which is given by the formula

$$ \tag{4}
f_{r_1 \dots r_k}(u_1, \dots, u_k) =
$$

$$
= \frac{n!}{(r_1 - 1)! (r_2 - r_1 - 1)! \dots (n - r_k)!} \times
$$

$$
\times F^{r_1 - 1}(u_1) f(u_1) [F(u_2) - F(u_1)]^{r_2 - r_1 - 1} f(u_2) \dots
$$

$$
\dots [1 - F(u_k)]^{n - r_k} f(u_k),
$$

$$
- \infty < u_1 < \dots < u_k < \infty.
$$
  
 
The formulas (2)–(4) allow one, for instance, to find the distribution of the so-called extremal order statistics (or sample minimum and sample maximum)

$$
X_{(n1)} = \min_{1 \leq i \leq n} (X_1, \dots, X_n) \ \textrm{ and } \ \ X_{(nn)} = \max_{1 \leq i \leq n} (X_1, \dots, X_n),
$$
  
and also the distribution of $W_n = X_{(nn)} - X_{(n1)}$, called the range statistic (or sample range). For instance, if the distribution function $F(u)$ is continuous, then the distribution of $W_n$ is given by

$$ \tag{5}
{\mathsf P} \{ W_n < w \} = n \int\limits_{-\infty}^{\infty} [F(u + w) - F(u)]^{n-1} \, dF(u), \ w \geq 0.
$$
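
As an illustration of formula (5), take the uniform distribution on $[0, 1]$ (an assumption made here, not part of the article). Splitting the integral at $u = 1 - w$ gives the closed form ${\mathsf P}\{W_n < w\} = n w^{n-1} - (n-1) w^n$ for $0 \leq w \leq 1$. The Python sketch below compares this with a direct midpoint-rule evaluation of (5); the function names and the number of integration steps are illustrative choices.

```python
def range_cdf_uniform(w: float, n: int, steps: int = 20000) -> float:
    """Evaluate formula (5) for the uniform distribution on [0, 1] by the midpoint rule.

    Here F(u) = min(max(u, 0), 1), and dF(u) = du on [0, 1] (zero elsewhere).
    """
    def F(u: float) -> float:
        return min(max(u, 0.0), 1.0)

    du = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du  # midpoint of the i-th subinterval of [0, 1]
        total += (F(u + w) - F(u)) ** (n - 1) * du
    return n * total


def range_cdf_uniform_closed(w: float, n: int) -> float:
    """Closed form n w^(n-1) - (n-1) w^n, valid for 0 <= w <= 1."""
    return n * w ** (n - 1) - (n - 1) * w**n


numeric = range_cdf_uniform(0.5, 4)
closed = range_cdf_uniform_closed(0.5, 4)
```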
  
Formulas (2)–(5) show that, as in the general theory of sampling methods, exact distributions of order statistics cannot be used to obtain statistical inferences if the distribution function $F(u)$ is unknown. It is precisely for this reason that asymptotic methods for the distribution functions of order statistics, as the dimension $n$ of the vector of observations tends to infinity, have been widely developed in the theory of order statistics. In the asymptotic theory of order statistics one studies the limit distributions of appropriately standardized sequences of order statistics $\{ X_{(nk)} \}$ as $n \rightarrow \infty$; moreover, generally speaking, the order number $k$ can change as a function of $n$. If the order number $k$ changes as $n$ tends to infinity in such a way that the limit $\lim\limits_{n \rightarrow \infty} k/n$ exists and is not equal to $0$ or to $1$, then the corresponding order statistics $X_{(nk)}$ of the considered sequence $\{ X_{(nk)} \}$ are called central or mean order statistics. If, however, $\lim\limits_{n \rightarrow \infty} k/n$ is equal to $0$ or to $1$, then they are called extreme order statistics.
  
In mathematical statistics central order statistics are used to construct consistent sequences of estimators (cf. [[Consistent estimator|Consistent estimator]]) for quantiles (cf. [[Quantile|Quantile]]) of the unknown distribution $F(u)$ based on the realization of a random vector $X$ or, in other words, to estimate the function $F^{-1}(u)$. For instance, let $x_P$ be a quantile of level $P$ ($0 < P < 1$) of the distribution function $F(u)$ about which one knows that its probability density $f(u)$ is continuous and strictly positive in some neighbourhood of the point $x_P$. In this case the sequence of central order statistics $\{ X_{(nk)} \}$ with order numbers $k = [(n + 1) P + 0.5]$, where $[a]$ is the integer part of the real number $a$, is a sequence of consistent estimators for the quantile $x_P$ as $n \rightarrow \infty$. Moreover, this sequence of order statistics $\{ X_{(nk)} \}$ has an asymptotically normal distribution with parameters

$$
x_P \ \textrm{ and } \ \frac{P (1 - P)}{f^2(x_P) (n + 1)},
$$
  
i.e. for any real $x$

$$ \tag{6}
\lim\limits_{n \rightarrow \infty} {\mathsf P} \left\{ \frac{X_{(nk)} - x_P}{\sqrt{P (1 - P) / (n + 1)}} f(x_P) < x \right\} = \Phi(x),
$$

where $\Phi(x)$ is the standard normal distribution function.
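
The quantile estimator described above can be sketched in Python as follows. The function names are illustrative, and clipping $k$ to the range $1, \dots, n$ is an added safeguard for small $n$ that the article does not discuss.

```python
import math
from typing import Sequence


def quantile_order_number(n: int, P: float) -> int:
    """Order number k = [(n + 1) P + 0.5], where [a] is the integer part of a.

    The result is clipped to 1..n (an assumption of this sketch) so that it
    always indexes a valid order statistic.
    """
    k = math.floor((n + 1) * P + 0.5)
    return min(max(k, 1), n)


def quantile_estimate(x: Sequence[float], P: float) -> float:
    """Estimate the quantile x_P by the central order statistic X_(nk)."""
    xs = sorted(x)
    return xs[quantile_order_number(len(xs), P) - 1]


def asymptotic_variance(P: float, density_at_quantile: float, n: int) -> float:
    """Variance parameter P (1 - P) / (f(x_P)^2 (n + 1)) of the limiting normal law."""
    return P * (1.0 - P) / (density_at_quantile**2 * (n + 1))


data = [9.0, 1.0, 4.0, 7.0, 3.0, 8.0, 2.0, 6.0, 5.0]
k_median = quantile_order_number(len(data), 0.5)  # floor(10 * 0.5 + 0.5) = 5
median_estimate = quantile_estimate(data, 0.5)     # fifth smallest value
```

Note that `asymptotic_variance` needs the density value $f(x_P)$, which in practice is itself unknown and would have to be estimated separately.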
  
Example 1. Let $X^{(\cdot)} = (X_{(n1)}, \dots, X_{(nn)})$ be a vector of order statistics based on a random vector $X = (X_1, \dots, X_n)$. The components of this vector are assumed to be independent random variables having the same probability distribution with a probability density that is continuous and positive in some neighbourhood of the median $x_{1/2}$. In this case the sequence of sample medians $\{ \mu_n \}$, defined for any $n \geq 2$ by

$$
\mu_n = \left\{
\begin{array}{ll}
X_{(n,m+1)} & \textrm{ for } n = 2m + 1 \textrm{ odd }, \\
\frac{1}{2} ( X_{(nm)} + X_{(n,m+1)} ) & \textrm{ for } n = 2m \textrm{ even } \\
\end{array}
\right.
$$
  
has an asymptotically normal distribution, as $n \rightarrow \infty$, with parameters

$$
x_{1/2} \ \textrm{ and } \ \{ 4 (n + 1) f^2(x_{1/2}) \}^{-1}.
$$
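
A minimal Python sketch of the sample median $\mu_n$ and of its asymptotic variance parameter is given below. The function names are illustrative, and the density value at the median must be supplied by the caller.

```python
from typing import Sequence


def sample_median(x: Sequence[float]) -> float:
    """Sample median mu_n built from the order statistics of x."""
    xs = sorted(x)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    m = n // 2
    if n % 2 == 1:
        # n = 2m + 1: mu_n = X_(n,m+1), which is index m in 0-based indexing.
        return xs[m]
    # n = 2m: mu_n is the average of X_(nm) and X_(n,m+1).
    return 0.5 * (xs[m - 1] + xs[m])


def median_asymptotic_variance(density_at_median: float, n: int) -> float:
    """Variance parameter {4 (n + 1) f(x_{1/2})^2}^(-1) of the limiting normal law."""
    return 1.0 / (4.0 * (n + 1) * density_at_median**2)


odd_case = sample_median([3.0, 1.0, 2.0])        # middle value of three
even_case = sample_median([4.0, 1.0, 3.0, 2.0])  # average of the two middle values
```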
  
In particular, if

$$
f(x) = \frac{1}{\sqrt{2 \pi} \, \sigma} \mathop{\rm exp} \left\{ - \frac{(x - a)^2}{2 \sigma^2} \right\}, \ \ | a | < \infty, \ \sigma > 0,
$$
  
that is, $X_i$ has the normal distribution $N(a, \sigma^2)$, then the sequence $\{ \mu_n \}$ is asymptotically normally distributed with parameters $x_{1/2} = a$ and $\sigma^2 \pi / (2 (n + 1))$. If the sequence of statistics $\{ \mu_n \}$ is compared with the sequence of best unbiased estimators (cf. [[Unbiased estimator|Unbiased estimator]])

$$
\{ \overline{X}_n \}, \ \overline{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i,
$$
  
for the mean $a$ of the normal distribution, then one should prefer the sequence $\{ \overline{X}_n \}$, since

$$
{\mathsf D} \overline{X}_n = \frac{\sigma^2}{n} < \frac{\sigma^2 \pi}{2 (n + 1)} \approx {\mathsf D} \mu_n
$$

for any $n \geq 2$.
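
The inequality above is equivalent to $2 (n + 1) < \pi n$, i.e. $n > 2 / (\pi - 2) \approx 1.75$, which explains the restriction $n \geq 2$. The Python sketch below tabulates the ratio of the two variance expressions; the function names are illustrative, and the median variance used is the asymptotic approximation from the text, not the exact finite-sample variance.

```python
import math


def var_sample_mean(sigma: float, n: int) -> float:
    """Exact variance sigma^2 / n of the sample mean of n normal observations."""
    return sigma**2 / n


def var_sample_median_asymptotic(sigma: float, n: int) -> float:
    """Asymptotic variance sigma^2 pi / (2 (n + 1)) of the sample median."""
    return sigma**2 * math.pi / (2 * (n + 1))


# Ratio of the two variances; it equals 2 (n + 1) / (pi n) and does not depend on sigma.
ratios = {
    n: var_sample_mean(1.0, n) / var_sample_median_asymptotic(1.0, n)
    for n in (2, 10, 100, 10000)
}
limit_ratio = 2.0 / math.pi  # limiting value of the ratio as n grows
```

The ratio stays below 1 for every $n \geq 2$ and tends to $2/\pi \approx 0.64$, so for normal data the sample mean has the smaller variance in this comparison.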
  
Example 2. Let $X^{(\cdot)} = (X_{(n1)}, \dots, X_{(nn)})$ be the vector of order statistics based on the random vector $X = (X_1, \dots, X_n)$ whose components are independent and uniformly distributed on an interval $[a - h, a + h]$; moreover, suppose that the parameters $a$ and $h$ are unknown. In this case the sequences $\{ Y_n \}$ and $\{ Z_n \}$ of statistics, where
  
====References====
+
$$
<table><TR><TD valign="top">[1]</TD> <TD valign="top">  H. Cramér,  "Mathematical methods of statistics" , Princeton Univ. Press (1946)</TD></TR><TR><TD valign="top">[2]</TD> <TD valign="top">  S.S. Wilks,  "Mathematical statistics" , Princeton Univ. Press  (1950)</TD></TR><TR><TD valign="top">[3]</TD> <TD valign="top">  H.A. David,  "Order statistics" , Wiley  (1970)</TD></TR><TR><TD valign="top">[4]</TD> <TD valign="top">  E.J. Gumble,  "Statistics of extremes" , Columbia Univ. Press  (1958)</TD></TR><TR><TD valign="top">[5]</TD> <TD valign="top"> J. Hájek,  Z. Sidák,  "Theory of rank tests" , Acad. Press (1967)</TD></TR><TR><TD valign="top">[6]</TD> <TD valign="top">  B.V. Gnedenko,  "Limit theorems for the maximal term of a variational series"  ''Dokl. Akad. Nauk SSSR'' , '''32''' : 1 (1941)  pp. 7–9  (In Russian)</TD></TR><TR><TD valign="top">[7]</TD> <TD valign="top">  B.V. Gnedenko,  "Sur la distribution limite du terme maximum d'une série aléatoire"  ''Ann. of Math.'' , '''44''' :  3  (1943)  pp. 423–453</TD></TR><TR><TD valign="top">[8]</TD> <TD valign="top">  N.V. Smirnov,  "Limit distributions for the terms of a variational series"  ''Trudy Mat. Inst. Steklov.'' , '''25'''  (1949)  pp. 5–59  (In Russian)</TD></TR><TR><TD valign="top">[9]</TD> <TD valign="top">  N.V. Smirnov,  "Some remarks on limit laws for order statistics"  ''Theor. Probab. Appl.'' , '''12''' :  2 (1967)  pp. 337–339  ''Teor. Veroyatnost. i Primenen.'' , '''12''' :  2  (1967)  pp. 391–392</TD></TR><TR><TD valign="top">[10]</TD> <TD valign="top">  D.M. Chibisov,  "On limit distributions for order statistics"  ''Theor. Probab. Appl.'' , '''9''' : 1 (1964) pp. 142–148  ''Teor. Veroyatnost. Primenen.'' , '''9''' :  1  (1964)  pp. 159–165</TD></TR><TR><TD valign="top">[11]</TD> <TD valign="top">  A.T. Craig,  "On the distributions of certain statistics"  ''Amer. J. Math.'' , '''54'''  (1932) pp. 353–366</TD></TR><TR><TD valign="top">[12]</TD> <TD valign="top">  L.H.C. 
Tippett,  "On the extreme individuals and the range of samples taken from a normal population"  ''Biometrika'' , '''17'''  (1925) pp. 364–387</TD></TR><TR><TD valign="top">[13]</TD> <TD valign="top">  E.S. Pearson,  "The percentage limits for the distribution of ranges in samples from a normal population (<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/o/o070/o070070/o070070127.png" />)"  ''Biometrika'' , '''24'''  (1932)  pp. 404–417</TD></TR></table>
+
Y _ {n}  =
 +
\frac{1}{2}
 +
  ( X _ {(} n1) + X _ {(} nn) ) \ \textrm{ and } \ \
 +
Z _ {n} n+
 +
\frac{1}{2(}
 +
  n- 1) ( X _ {(} nn) - X _ {(} n1) ),
 +
$$
  
 +
$$
 +
n  \geq  2 ,
 +
$$
  
 +
are consistent sequences of superefficient unbiased estimators (cf. [[Superefficient estimator|Superefficient estimator]]) for  $  a $
 +
and  $  h $,
 +
respectively. Moreover,
  
====Comments====
+
$$
 +
{\mathsf D} Y _ {n}  =
 +
\frac{2 h  ^ {2} }{( n+ 1) ( n+ 2) }
 +
\  \textrm{ and } \ \
 +
{\mathsf D} Z _ {n}  =
 +
\frac{2 h  ^ {2} }{( n- 1) ( n+ 2) }
 +
.
 +
$$
  
 +
One can show that the sequences  $  \{ Y _ {n} \} $
 +
and  $  \{ Z _ {n} \} $
 +
define the best estimators for  $  a $
 +
and  $  h $
 +
in the sense of the minimum of the square risk in the class of linear unbiased estimators expressed in terms of order statistics.
  
 
====References====
 
====References====
<table><TR><TD valign="top">[a1]</TD> <TD valign="top">  R.J. Serfling,  "Approximation theorems of mathematical statistics" , Wiley  (1980)</TD></TR></table>
+
<table><TR><TD valign="top">[1]</TD> <TD valign="top">  H. Cramér,  "Mathematical methods of statistics" , Princeton Univ. Press  (1946)</TD></TR><TR><TD valign="top">[2]</TD> <TD valign="top">  S.S. Wilks,  "Mathematical statistics" , Princeton Univ. Press  (1950)</TD></TR><TR><TD valign="top">[3]</TD> <TD valign="top">  H.A. David,  "Order statistics" , Wiley  (1970)</TD></TR><TR><TD valign="top">[4]</TD> <TD valign="top">  E.J. Gumble,  "Statistics of extremes" , Columbia Univ. Press  (1958)</TD></TR><TR><TD valign="top">[5]</TD> <TD valign="top">  J. Hájek,  Z. Sidák,  "Theory of rank tests" , Acad. Press  (1967)</TD></TR><TR><TD valign="top">[6]</TD> <TD valign="top">  B.V. Gnedenko,  "Limit theorems for the maximal term of a variational series"  ''Dokl. Akad. Nauk SSSR'' , '''32''' :  1  (1941)  pp. 7–9  (In Russian)</TD></TR><TR><TD valign="top">[7]</TD> <TD valign="top">  B.V. Gnedenko,  "Sur la distribution limite du terme maximum d'une série aléatoire"  ''Ann. of Math.'' , '''44''' :  3  (1943)  pp. 423–453</TD></TR><TR><TD valign="top">[8]</TD> <TD valign="top">  N.V. Smirnov,  "Limit distributions for the terms of a variational series"  ''Trudy Mat. Inst. Steklov.'' , '''25'''  (1949)  pp. 5–59  (In Russian)</TD></TR><TR><TD valign="top">[9]</TD> <TD valign="top">  N.V. Smirnov,  "Some remarks on limit laws for order statistics"  ''Theor. Probab. Appl.'' , '''12''' :  2  (1967)  pp. 337–339  ''Teor. Veroyatnost. i Primenen.'' , '''12''' :  2  (1967)  pp. 391–392</TD></TR><TR><TD valign="top">[10]</TD> <TD valign="top">  D.M. Chibisov,  "On limit distributions for order statistics"  ''Theor. Probab. Appl.'' , '''9''' :  1  (1964)  pp. 142–148  ''Teor. Veroyatnost. Primenen.'' , '''9''' :  1  (1964)  pp. 159–165</TD></TR><TR><TD valign="top">[11]</TD> <TD valign="top">  A.T. Craig,  "On the distributions of certain statistics"  ''Amer. J. Math.'' , '''54'''  (1932)  pp. 353–366</TD></TR><TR><TD valign="top">[12]</TD> <TD valign="top">  L.H.C. 
Tippett,  "On the extreme individuals and the range of samples taken from a normal population"  ''Biometrika'' , '''17'''  (1925)  pp. 364–387</TD></TR><TR><TD valign="top">[13]</TD> <TD valign="top">  E.S. Pearson,  "The percentage limits for the distribution of ranges in samples from a normal population (<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/o/o070/o070070/o070070127.png" />)"  ''Biometrika'' , '''24'''  (1932)  pp. 404–417</TD></TR>
 +
<TR><TD valign="top">[a1]</TD> <TD valign="top">  R.J. Serfling,  "Approximation theorems of mathematical statistics" , Wiley  (1980)</TD></TR></table>

Latest revision as of 09:08, 10 April 2023


A member of the series of order statistics (also called variational series) based on the results of observations. Let a random vector $ X = ( X _ {1} \dots X _ {n} ) $ be observed which assumes values $ x = ( x _ {1} \dots x _ {n} ) $ in an $ n $-dimensional Euclidean space $ \mathbf R ^ {n} $, $ n \geq 2 $, and let, further, a function $ \phi ( \cdot ) : \mathbf R ^ {n} \rightarrow \mathbf R ^ {n} $ be given on $ \mathbf R ^ {n} $ by the rule

$$ \phi ( x) = x ^ {( \cdot ) } ,\ \ x \in \mathbf R ^ {n} , $$

where $ x ^ {( \cdot ) } = ( x _ {(n1)} \dots x _ {(nn)} ) $ is a vector in $ \mathbf R ^ {n} $ obtained from $ x $ by rearranging its coordinates $ x _ {1} \dots x _ {n} $ in ascending order of magnitude, i.e. the components $ x _ {(n1)} \dots x _ {(nn)} $ of the vector $ x ^ {( \cdot ) } $ satisfy the relation

$$ \tag{1} x _ {(n1)} \leq \dots \leq x _ {(nn)} . $$

In this case the statistic $ X ^ {( \cdot ) } = \phi ( X) = ( X _ {(n1)} \dots X _ {(nn)} ) $ is the series (or vector) of order statistics, and its $ k $-th component $ X _ {(nk)} $ ($ k = 1 \dots n $) is called the $ k $-th order statistic.

In the theory of order statistics the best studied case is the one where the components $ X _ {1} \dots X _ {n} $ of the random vector $ X $ are independent random variables having the same distribution, as is assumed hereafter. If $ F ( u) $ is the distribution function of the random variable $ X _ {i} $, $ i = 1 \dots n $, then the distribution function $ F _ {nk} ( u) $ of the $ k $-th order statistic $ X _ {(nk)} $ is given by the formula

$$ \tag{2} F _ {nk} ( u) = {\mathsf P} \{ X _ {(nk)} \leq u \} = I _ {F(u)} ( k , n - k + 1 ) , $$

where

$$ I _ {y} ( a , b ) = \frac{1}{B ( a , b ) } \int\limits _ { 0 } ^ { y } x ^ {a-1} ( 1 - x ) ^ {b-1} dx $$

is the incomplete beta-function. From (2) it follows that if the distribution function $ F ( u) $ has probability density $ f ( u) $, then the probability density $ f _ {nk} ( u) $ of the $ k $-th order statistic $ X _ {(nk)} $, $ k = 1 \dots n $, also exists and is given by the formula

$$ \tag{3} f _ {nk} ( u) = \frac{n!}{( k - 1 ) ! ( n - k ) ! } [ F ( u) ] ^ {k-1} [ 1 - F ( u) ] ^ {n-k} f ( u) , $$

$$ - \infty < u < \infty . $$
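
As an illustration of (2) and (3), the following Python sketch evaluates the distribution function and the density of the $ k $-th order statistic from given values of $ F ( u) $ and $ f ( u) $. It uses the standard identity $ I _ {y} ( k , n - k + 1 ) = \sum _ {j=k} ^ {n} \binom{n}{j} y ^ {j} ( 1 - y ) ^ {n-j} $ for integer $ k $, so only the standard library is needed. The function names `order_stat_cdf` and `order_stat_pdf` and the uniform test case are assumptions made for this example and do not come from the article.

```python
from math import comb, factorial


def order_stat_cdf(k, n, F_u):
    """Formula (2): P{X_(nk) <= u}, given the value F_u = F(u).

    Computed as the probability that at least k of the n independent
    observations do not exceed u, which equals I_{F(u)}(k, n - k + 1).
    """
    return sum(comb(n, j) * F_u**j * (1.0 - F_u) ** (n - j)
               for j in range(k, n + 1))


def order_stat_pdf(k, n, F_u, f_u):
    """Formula (3): density of X_(nk), given F_u = F(u) and f_u = f(u)."""
    c = factorial(n) / (factorial(k - 1) * factorial(n - k))
    return c * F_u ** (k - 1) * (1.0 - F_u) ** (n - k) * f_u


# Illustrative case: uniform distribution on [0, 1], so F(u) = u, f(u) = 1.
n, k, u = 5, 2, 0.3
cdf_value = order_stat_cdf(k, n, u)
pdf_value = order_stat_pdf(k, n, u, 1.0)
```

For $ k = n $ the sum reduces to $ [ F ( u) ] ^ {n} $, the distribution function of the sample maximum, and for $ k = 1 $ to $ 1 - [ 1 - F ( u) ] ^ {n} $, that of the sample minimum.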

Assuming the existence of the probability density $ f ( u) $ one obtains the joint probability density $ f _ {r _ {1} \dots r _ {k} } ( u _ {1} \dots u _ {k} ) $ of the order statistics $ X _ {( nr _ {1} ) } \dots X _ {( nr _ {k} ) } $, $ 1 \leq r _ {1} < \dots < r _ {k} \leq n $, $ k \leq n $, which is given by the formula

$$ \tag{4 } f _ {r _ {1} \dots r _ {k} } ( u _ {1} \dots u _ {k} ) = $$

$$ = \ \frac{n!} {( r _ {1} - 1 ) ! ( r _ {2} - r _ {1} - 1 ) ! \dots ( n - r _ {k} ) ! } \times $$

$$ \times F ^ { r _ {1} - 1 } ( u _ {1} ) f ( u _ {1} ) [ F ( u _ {2} ) - F ( u _ {1} ) ] ^ {r _ {2} - r _ {1} - 1 } f ( u _ {2} ) \dots $$

$$ \dots [ 1 - F ( u _ {k} ) ] ^ {n - r _ {k} } f ( u _ {k} ) , $$

$$ - \infty < u _ {1} < \dots < u _ {k} < \infty . $$

The formulas (2)–(4) allow one, for instance, to find the distribution of the so-called extremal order statistics (or sample minimum and sample maximum)

$$ X _ {(n1)} = \min _ {1 \leq i \leq n } \ ( X _ {1} \dots X _ {n} ) \ \textrm{ and } \ \ X _ {(nn)} = \max _ {1 \leq i \leq n } \ ( X _ {1} \dots X _ {n} ) , $$

and also the distribution of $ W _ {n} = X _ {(nn)} - X _ {(n1)} $, called the range statistic (or sample range). For instance, if the distribution function $ F ( u) $ is continuous, then the distribution of $ W _ {n} $ is given by

$$ \tag{5} {\mathsf P} \{ W _ {n} < w \} = n \int\limits _ {- \infty } ^ \infty [ F ( u + w ) - F ( u) ] ^ {n-1} d F ( u) ,\ w \geq 0 . $$
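
Formula (5) can be checked numerically. The sketch below approximates the integral by a midpoint Riemann-Stieltjes sum and compares it, for the uniform distribution on $ [ 0 , 1 ] $, with the closed form $ n w ^ {n-1} - ( n - 1 ) w ^ {n} $ that (5) yields in that case. The function names, the step count and the choice of the uniform distribution are assumptions made for this illustration only.

```python
def range_cdf(w, n, F, lo, hi, steps=20000):
    """Approximate P{W_n < w} from formula (5) by a midpoint Stieltjes sum.

    F is the distribution function; [lo, hi] is assumed to contain
    the whole support of the distribution.
    """
    if w <= 0:
        return 0.0
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        left = lo + i * width
        mid = left + 0.5 * width
        d_F = F(left + width) - F(left)  # increment dF(u) on this cell
        total += (F(mid + w) - F(mid)) ** (n - 1) * d_F
    return n * total


def uniform_cdf(u):
    """Distribution function of the uniform distribution on [0, 1]."""
    return min(max(u, 0.0), 1.0)


n, w = 4, 0.5
numeric = range_cdf(w, n, uniform_cdf, 0.0, 1.0)
closed_form = n * w ** (n - 1) - (n - 1) * w ** n
```

The two values agree up to the discretization error of the sum, which decreases as `steps` grows.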

Formulas (2)–(5) show that, as in the general theory of sampling methods, exact distributions of order statistics cannot be used to obtain statistical inferences if the distribution function $ F ( u) $ is unknown. It is precisely for this reason that asymptotic methods for the distribution functions of order statistics, as the dimension $ n $ of the vector of observations tends to infinity, have been widely developed in the theory of order statistics. In the asymptotic theory of order statistics one studies the limit distributions of appropriately standardized sequences of order statistics $ \{ X _ {(nk)} \} $ as $ n \rightarrow \infty $; moreover, generally speaking, the order number $ k $ can change as a function of $ n $. If the order number $ k $ changes as $ n $ tends to infinity in such a way that the limit $ \lim\limits _ {n \rightarrow \infty } k / n $ exists and is not equal to $ 0 $ or to $ 1 $, then the corresponding order statistics $ X _ {(nk)} $ of the considered sequence $ \{ X _ {(nk)} \} $ are called central or mean order statistics. If, however, $ \lim\limits _ {n \rightarrow \infty } k / n $ is equal to $ 0 $ or to $ 1 $, then they are called extreme order statistics.

In mathematical statistics central order statistics are used to construct consistent sequences of estimators (cf. Consistent estimator) for quantiles (cf. Quantile) of the unknown distribution $ F ( u) $ based on the realization of a random vector $ X $ or, in other words, to estimate the function $ F ^ { - 1 } ( u) $. For instance, let $ x _ {P} $ be a quantile of level $ P $ ($ 0 < P < 1 $) of the distribution function $ F ( u) $, about which one knows that its probability density $ f ( u) $ is continuous and strictly positive in some neighbourhood of the point $ x _ {P} $. In this case the sequence of central order statistics $ \{ X _ {(nk)} \} $ with order numbers $ k = [ ( n+ 1) P + 0.5 ] $, where $ [ a] $ is the integer part of the real number $ a $, is a sequence of consistent estimators for the quantile $ x _ {P} $ as $ n \rightarrow \infty $. Moreover, this sequence of order statistics $ \{ X _ {(nk)} \} $ has an asymptotically normal distribution with parameters

$$ x _ {P} \ \textrm{ and } \ \frac{P ( 1 - P ) }{f ^ { 2 } ( x _ {P} ) ( n + 1 ) } , $$

i.e. for any real $ x $

$$ \tag{6} \lim\limits _ {n \rightarrow \infty } {\mathsf P} \left \{ \frac{X _ {(nk)} - x _ {P} }{\sqrt {P( 1 - P) / ( n + 1) } } f ( x _ {P} ) < x \right \} = \Phi ( x) , $$

where $ \Phi ( x) $ is the standard normal distribution function.
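
The quantile estimator described above can be sketched in Python as follows. The order number is $ k = [ ( n+ 1) P + 0.5 ] $, and the variance parameter is the one appearing in (6). The function names are illustrative, and clamping $ k $ to the range $ 1 \dots n $ is an added safeguard for values of $ P $ very close to $ 0 $ or $ 1 $; neither is taken from the article.

```python
from math import floor


def quantile_order_number(n, P):
    """Order number k = [(n + 1) P + 0.5] for a quantile of level P."""
    k = floor((n + 1) * P + 0.5)
    # Assumption of this sketch: keep k inside 1..n for extreme P.
    return min(max(k, 1), n)


def sample_quantile(xs, P):
    """The central order statistic X_(nk) used as an estimator of x_P."""
    ordered = sorted(xs)  # the vector of order statistics
    k = quantile_order_number(len(ordered), P)
    return ordered[k - 1]  # order statistics are numbered from 1


def asymptotic_variance(n, P, f_xP):
    """Variance parameter P(1 - P) / (f(x_P)^2 (n + 1)) of the normal limit."""
    return P * (1.0 - P) / (f_xP**2 * (n + 1))


# Illustrative data: for n = 9 and P = 1/2 the order number is k = 5.
estimate = sample_quantile([7, 1, 9, 3, 5, 2, 8, 4, 6], 0.5)
```

Note that `asymptotic_variance` requires the density value $ f ( x _ {P} ) $, which is unknown in practice and would itself have to be estimated.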

Example 1. Let $ X ^ {( \cdot ) } = ( X _ {(n1)} \dots X _ {(nn)} ) $ be a vector of order statistics based on a random vector $ X = ( X _ {1} \dots X _ {n} ) $. The components of this vector are assumed to be independent random variables having the same probability distribution with a probability density that is continuous and positive in some neighbourhood of the median $ x _ {1/2} $. In this case the sequence of sample medians $ \{ \mu _ {n} \} $, defined for any $ n \geq 2 $ by

$$ \mu _ {n} = \left \{ \begin{array}{ll} X _ {(n,m+1)} & \textrm{ for } n = 2m+1 \textrm{ odd } , \\ \frac{1}{2} ( X _ {(nm)} + X _ {(n,m+1)} ) & \textrm{ for } n = 2m \textrm{ even } \\ \end{array} \right . $$

has an asymptotically normal distribution, as $ n \rightarrow \infty $, with parameters

$$ x _ {1/2} \ \textrm{ and } \ \{ 4 ( n+ 1) f ^ { 2 } ( x _ {1/2} ) \} ^ {-1} . $$

In particular, if

$$ f ( x) = \frac{1}{\sqrt {2 \pi } \sigma } \mathop{\rm exp} \left \{ - \frac{( x - a ) ^ {2} }{2 \sigma ^ {2} } \right \} ,\ \ | a | < \infty ,\ \sigma > 0 , $$

that is, $ X _ {i} $ has the normal distribution $ N ( a , \sigma ^ {2} ) $, then the sequence $ \{ \mu _ {n} \} $ is asymptotically normally distributed with parameters $ x _ {1/2} = a $ and $ \sigma ^ {2} \pi / ( 2 ( n+ 1) ) $. If the sequence of statistics $ \{ \mu _ {n} \} $ is compared with the sequence of best unbiased estimators (cf. Unbiased estimator)

$$ \{ \overline{X}\; _ {n} \} ,\ \overline{X}\; _ {n} = \frac{1}{n} \sum _ {i=1} ^ {n} X _ {i} , $$

for the mean $ a $ of the normal distribution, then one should prefer the sequence $ \{ \overline{X}\; _ {n} \} $, since

$$ {\mathsf D} \overline{X}\; _ {n} = \frac{\sigma ^ {2} }{n} < \ \frac{\sigma ^ {2} \pi }{2 ( n + 1 ) } \approx {\mathsf D} \mu _ {n} $$

for any $ n \geq 2 $.
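
The comparison in Example 1 can be illustrated by a small Monte Carlo experiment. The sketch below draws repeated normal samples, computes the sample mean and the sample median of each, and estimates the variances of both statistics. The sample size, the number of trials, the seed and the function name `simulate` are arbitrary choices made for this illustration; the result is a simulation, not a proof.

```python
import random
from math import pi
from statistics import fmean, median, pvariance


def simulate(n, trials, a=0.0, sigma=1.0, seed=0):
    """Estimate D(mean) and D(median) for samples of size n from N(a, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means, medians = [], []
    for _ in range(trials):
        xs = [rng.gauss(a, sigma) for _ in range(n)]
        means.append(fmean(xs))
        # statistics.median averages the two middle order statistics
        # for even n, which matches the definition of mu_n.
        medians.append(median(xs))
    return pvariance(means), pvariance(medians)


n = 25
var_mean, var_median = simulate(n=n, trials=4000)
exact_var_mean = 1.0 / n                     # sigma^2 / n
approx_var_median = pi / (2 * (n + 1))       # sigma^2 pi / (2 (n + 1))
```

In such a run the estimated variance of the median should exceed that of the mean, in line with the inequality above.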

Example 2. Let $ X ^ {( \cdot ) } = ( X _ {(n1)} \dots X _ {(nn)} ) $ be the vector of order statistics based on the random vector $ X = ( X _ {1} \dots X _ {n} ) $ whose components are independent and uniformly distributed on an interval $ [ a - h , a + h ] $; moreover, suppose that the parameters $ a $ and $ h $ are unknown. In this case the sequences $ \{ Y _ {n} \} $ and $ \{ Z _ {n} \} $ of statistics, where

$$ Y _ {n} = \frac{1}{2} ( X _ {(n1)} + X _ {(nn)} ) \ \textrm{ and } \ \ Z _ {n} = \frac{n+1}{2 ( n - 1 ) } ( X _ {(nn)} - X _ {(n1)} ) , $$

$$ n \geq 2 , $$

are consistent sequences of superefficient unbiased estimators (cf. Superefficient estimator) for $ a $ and $ h $, respectively. Moreover,

$$ {\mathsf D} Y _ {n} = \frac{2 h ^ {2} }{( n+ 1) ( n+ 2) } \ \textrm{ and } \ \ {\mathsf D} Z _ {n} = \frac{2 h ^ {2} }{( n- 1) ( n+ 2) } . $$

One can show that the sequences $ \{ Y _ {n} \} $ and $ \{ Z _ {n} \} $ define the best estimators for $ a $ and $ h $ in the sense of the minimum of the square risk in the class of linear unbiased estimators expressed in terms of order statistics.
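
The estimators of Example 2 are straightforward to compute. The following sketch evaluates $ Y _ {n} $ and $ Z _ {n} $ for a given sample and the variances stated above for given $ n $ and $ h $; the function names and the sample data are hypothetical and serve only as an illustration.

```python
def midrange_and_halfwidth(xs):
    """Return (Y_n, Z_n): estimators of the centre a and half-width h."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    smallest, largest = min(xs), max(xs)  # X_(n1) and X_(nn)
    y = 0.5 * (smallest + largest)
    z = (n + 1) / (2 * (n - 1)) * (largest - smallest)
    return y, z


def estimator_variances(n, h):
    """Return (D Y_n, D Z_n) for the uniform distribution on [a - h, a + h]."""
    var_y = 2 * h**2 / ((n + 1) * (n + 2))
    var_z = 2 * h**2 / ((n - 1) * (n + 2))
    return var_y, var_z


# Illustrative sample of size n = 5.
y_n, z_n = midrange_and_halfwidth([1.0, 2.0, 3.0, 4.0, 5.0])
```

Both variances decrease at the rate $ n ^ {-2} $, faster than the rate $ n ^ {-1} $ of the sample mean in Example 1.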

References

[1] H. Cramér, "Mathematical methods of statistics" , Princeton Univ. Press (1946)
[2] S.S. Wilks, "Mathematical statistics" , Princeton Univ. Press (1950)
[3] H.A. David, "Order statistics" , Wiley (1970)
[4] E.J. Gumbel, "Statistics of extremes" , Columbia Univ. Press (1958)
[5] J. Hájek, Z. Šidák, "Theory of rank tests" , Acad. Press (1967)
[6] B.V. Gnedenko, "Limit theorems for the maximal term of a variational series" Dokl. Akad. Nauk SSSR , 32 : 1 (1941) pp. 7–9 (In Russian)
[7] B.V. Gnedenko, "Sur la distribution limite du terme maximum d'une série aléatoire" Ann. of Math. , 44 : 3 (1943) pp. 423–453
[8] N.V. Smirnov, "Limit distributions for the terms of a variational series" Trudy Mat. Inst. Steklov. , 25 (1949) pp. 5–59 (In Russian)
[9] N.V. Smirnov, "Some remarks on limit laws for order statistics" Theor. Probab. Appl. , 12 : 2 (1967) pp. 337–339 Teor. Veroyatnost. i Primenen. , 12 : 2 (1967) pp. 391–392
[10] D.M. Chibisov, "On limit distributions for order statistics" Theor. Probab. Appl. , 9 : 1 (1964) pp. 142–148 Teor. Veroyatnost. Primenen. , 9 : 1 (1964) pp. 159–165
[11] A.T. Craig, "On the distributions of certain statistics" Amer. J. Math. , 54 (1932) pp. 353–366
[12] L.H.C. Tippett, "On the extreme individuals and the range of samples taken from a normal population" Biometrika , 17 (1925) pp. 364–387
[13] E.S. Pearson, "The percentage limits for the distribution of ranges in samples from a normal population ()" Biometrika , 24 (1932) pp. 404–417
[a1] R.J. Serfling, "Approximation theorems of mathematical statistics" , Wiley (1980)
How to Cite This Entry:
Order statistic. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Order_statistic&oldid=18995
This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.