Rank vector

A vector statistic (cf. Statistics) $ R = ( R _ {1} \dots R _ {n} ) $ constructed from a random observation vector $ X = ( X _ {1} \dots X _ {n} ) $ with $ i $-th component $ R _ {i} = R _ {i} ( X) $, $ i = 1 \dots n $, defined by

$$ R _ {i} = \sum _ { j= 1} ^ { n } \delta ( X _ {i} - X _ {j} ) , $$

where $ \delta ( x) $ is the characteristic function (indicator function) of $ [ 0 , + \infty ) $, that is,

$$ \delta ( x) = \ \left \{ \begin{array}{ll} 1 & \textrm{ if } x \geq 0 , \\ 0 & \textrm{ if } x < 0 . \\ \end{array} \right .$$
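
For example, if $ n = 4 $ and the observed values are $ x = ( 2.1 ,\ 0.3 ,\ 1.7 ,\ 0.9 ) $, then $ r = ( 4 , 1 , 3 , 2 ) $: the first observation is the largest of the four, the second is the smallest, and so on, each rank counting the number of observations not exceeding the given one.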

The statistic $ R _ {i} $ is called the rank of the $ i $-th component $ X _ {i} $, $ i = 1 \dots n $, of the random vector $ X $. This definition of a rank vector is unambiguous under the condition

$$ {\mathsf P} \{ X _ {i} = X _ {j} \} = 0 ,\ \ i \neq j , $$

which automatically holds if the probability distribution of $ X $ is defined by a density $ p ( x) = p ( x _ {1} \dots x _ {n} ) $. It follows from the definition of a rank vector that, under these conditions, $ R $ takes values in the space $ \mathfrak R = \{ r \} $ of all permutations $ r = ( r _ {1} \dots r _ {n} ) $ of $ 1 \dots n $ and the realization $ r _ {i} $ of the rank $ R _ {i} $ is equal to the number of components of $ X $ whose observed values do not exceed the realization of the $ i $-th component $ X _ {i} $, $ i = 1 \dots n $.

Let $ X ^ {( \cdot ) } = ( X _ {( n1)} \dots X _ {( nn)} ) $ be the vector of order statistics (cf. Order statistic) constructed from the observation vector $ X $. Then the pair $ ( R , X ^ {( \cdot ) } ) $ is a sufficient statistic for the distribution of $ X $, and $ X $ itself can be uniquely recovered from $ ( R , X ^ {( \cdot ) } ) $. Moreover, under the additional assumption that the density $ p ( x) $ of $ X $ is symmetric with respect to permutations of the arguments, the components $ R $ and $ X ^ {( \cdot ) } $ of the sufficient statistic $ ( R , X ^ {( \cdot ) } ) $ are independent and
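
Explicitly, $ X _ {i} = X _ {( n R _ {i} ) } $ for every $ i = 1 \dots n $, which is how $ X $ is recovered from the pair $ ( R , X ^ {( \cdot ) } ) $.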

$$ {\mathsf P} \{ R = r \} = \frac{1}{n ! } ,\ \ r \in \mathfrak R . $$
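
For instance, for $ n = 3 $ each of the $ 3 ! = 6 $ possible rank vectors $ r $ has probability $ 1 / 6 $.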

In particular, if

$$ \tag{1 } p ( x) = p ( x _ {1} \dots x _ {n} ) = \prod _ { i= 1} ^ { n } f ( x _ {i} ) , $$

that is, the components $ X _ {1} \dots X _ {n} $ are independent identically-distributed random variables ( $ f ( x _ {i} ) $ stands for the density of $ X _ {i} $), then

$$ \tag{2 } \left . \begin{array}{c} {\mathsf P} \{ R _ {i} = k \} = \frac{1}{n} ,\ i = 1 \dots n , \\ {\mathsf P} \{ R _ {i} = k , R _ {j} = m \} = \frac{1}{n ( n - 1 ) } , \ i \neq j ,\ k \neq m , \\ {\mathsf E} \{ R _ {i} \} = \frac{n + 1 }{2} ,\ {\mathsf D} \{ R _ {i} \} = \frac{n ^ {2} - 1 }{12} ,\ \ i = 1 \dots n , \\ \end{array} \right \} $$

for any $ k = 1 \dots n $.
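
As a check of (2), for $ n = 3 $ the rank $ R _ {i} $ is uniformly distributed on $ \{ 1 , 2 , 3 \} $, so $ {\mathsf E} \{ R _ {i} \} = 2 = ( 3 + 1 ) / 2 $ and $ {\mathsf D} \{ R _ {i} \} = ( 1 + 4 + 9 ) / 3 - 4 = 2 / 3 = ( 3 ^ {2} - 1 ) / 12 $.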

If (1) holds, there is a joint density $ q ( x _ {i} , k ) $, $ k = 1 \dots n $, of $ X _ {i} $ and $ R _ {i} $, defined by the formula

$$ \tag{3 } q ( x _ {i} , k ) = \frac{( n - 1 ) ! }{( k - 1 ) ! ( n - k ) ! } [ F ( x _ {i} ) ] ^ {k- 1} [ 1 - F ( x _ {i} ) ] ^ {n- k} f ( x _ {i} ) , $$
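
Formula (3) expresses a simple combinatorial fact: the event $ \{ R _ {i} = k \} $ occurs together with $ \{ X _ {i} \in ( x _ {i} , x _ {i} + d x _ {i} ] \} $ precisely when $ k - 1 $ of the remaining $ n - 1 $ components do not exceed $ x _ {i} $ and the other $ n - k $ exceed it; there are $ ( n - 1 ) ! / ( ( k - 1 ) ! ( n - k ) ! ) $ ways to choose these components, each contributing probability $ [ F ( x _ {i} ) ] ^ {k- 1} [ 1 - F ( x _ {i} ) ] ^ {n- k} f ( x _ {i} ) \, d x _ {i} $.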

where $ F ( x _ {i} ) $ is the distribution function of $ X _ {i} $. It follows from (2) and (3) that the conditional density $ q ( x _ {i} \mid R _ {i} = k ) $ of $ X _ {i} $ given $ R _ {i} = k $ ($ k = 1 \dots n $) is expressed by the formula

$$ \tag{4 } q ( x _ {i} \mid R _ {i} = k ) = \frac{n ! }{( k - 1 ) ! ( n - k ) ! } [ F ( x _ {i} ) ] ^ {k- 1} [ 1 - F ( x _ {i} ) ] ^ {n- k} f ( x _ {i} ) . $$
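
Formula (4) is obtained from (2) and (3) by dividing the joint density by the marginal probability of the rank:

$$ q ( x _ {i} \mid R _ {i} = k ) = \frac{q ( x _ {i} , k ) }{ {\mathsf P} \{ R _ {i} = k \} } = n \, q ( x _ {i} , k ) . $$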

The latter formula allows one to trace the internal connection between the observation vector $ X $, the rank vector $ R $ and the vector $ X ^ {( \cdot ) } $ of order statistics, since (4) is just the probability density of the $ k $-th order statistic $ X _ {( nk) } $, $ k = 1 \dots n $. Moreover, it follows from (3) that the conditional distribution of the rank $ R _ {i} $ given $ X _ {i} $ is expressed by the formula

$$ {\mathsf P} \{ R _ {i} = k \mid X _ {i} \} = \frac{( n - 1 ) ! }{( k - 1 ) ! ( n - k ) ! } [ F ( X _ {i} ) ] ^ {k- 1} [ 1 - F ( X _ {i} ) ] ^ {n- k} . $$
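
In other words, given $ X _ {i} $, the statistic $ R _ {i} - 1 = \sum _ {j \neq i } \delta ( X _ {i} - X _ {j} ) $ has the binomial distribution with parameters $ n - 1 $ and $ F ( X _ {i} ) $.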

Finally, under the assumption that the moments $ {\mathsf E} \{ X _ {i} \} $ and $ {\mathsf D} \{ X _ {i} \} $ exist and that (1) holds, (2) and (3) imply that the correlation coefficient $ \rho ( X _ {i} , R _ {i} ) $ between $ X _ {i} $ and $ R _ {i} $ is equal to

$$ \rho ( X _ {i} , R _ {i} ) = \ \sqrt { \frac{12 ( n - 1 ) }{( n + 1 ) {\mathsf D} \{ X _ {i} \} } } \int\limits _ {- \infty } ^ \infty x _ {i} \left [ F ( x _ {i} ) - \frac{1}{2} \right ] d F ( x _ {i} ) . $$
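
Indeed, the binomial representation of $ R _ {i} $ given $ X _ {i} $ yields $ {\mathsf E} \{ R _ {i} \mid X _ {i} \} = 1 + ( n - 1 ) F ( X _ {i} ) $, so the covariance of $ X _ {i} $ and $ R _ {i} $ equals $ ( n - 1 ) \int x _ {i} [ F ( x _ {i} ) - 1 / 2 ] d F ( x _ {i} ) $ (here one uses that $ F ( X _ {i} ) $ is uniformly distributed on $ [ 0 , 1 ] $, so $ {\mathsf E} \{ F ( X _ {i} ) \} = 1 / 2 $); dividing by $ \sqrt { {\mathsf D} \{ X _ {i} \} {\mathsf D} \{ R _ {i} \} } $ with $ {\mathsf D} \{ R _ {i} \} = ( n ^ {2} - 1 ) / 12 $ from (2) gives the stated expression.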

In particular, if $ X _ {i} $ is uniformly distributed on $ [ 0 , 1 ] $, then

$$ \rho ( X _ {i} , R _ {i} ) = \ \sqrt { \frac{n - 1 }{n + 1 } } . $$
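
Indeed, in this case $ {\mathsf D} \{ X _ {i} \} = 1 / 12 $ and $ \int _ {0} ^ {1} x ( x - 1 / 2 ) d x = 1 / 12 $, and substitution into the general formula gives $ \sqrt {( n - 1 ) / ( n + 1 ) } $.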

If $ X $ has the normal distribution $ N ( a , \sigma ^ {2} ) $, then

$$ \rho ( X _ {i} , R _ {i} ) = \ \sqrt { \frac{3 ( n - 1 ) }{\pi ( n + 1 ) } } , $$

and $ \rho ( X _ {i} , R _ {i} ) $ does not depend on the parameters of the normal distribution.
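
Here $ {\mathsf D} \{ X _ {i} \} = \sigma ^ {2} $ and $ \int x _ {i} [ F ( x _ {i} ) - 1 / 2 ] d F ( x _ {i} ) = \sigma / ( 2 \sqrt \pi ) $, so the parameters $ a $ and $ \sigma $ drop out; as $ n \rightarrow \infty $ the correlation tends to $ \sqrt {3 / \pi } \approx 0.977 $.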

References

[1] W. Hoeffding, ""Optimum" nonparametric tests", Proc. 2nd Berkeley Symp. Math. Stat. Probab. (1950), Univ. California Press (1951), pp. 83–92
[2] J. Hájek, Z. Šidák, "Theory of rank tests", Acad. Press (1967)
[3] F.P. Tarasenko, "Non-parametric statistics", Tomsk (1976) (in Russian)
How to Cite This Entry:
Rank vector. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Rank_vector&oldid=52468
This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.