# Rank vector

A vector statistic (cf. Statistics) $R = ( R _ {1} \dots R _ {n} )$ constructed from a random observation vector $X = ( X _ {1} \dots X _ {n} )$ with $i$-th component $R _ {i} = R _ {i} ( X)$, $i = 1 \dots n$, defined by

$$R _ {i} = \sum _ { j= 1} ^ { n } \delta ( X _ {i} - X _ {j} ) ,$$

where $\delta ( x)$ is the characteristic function (indicator function) of $[ 0 , + \infty )$, that is,

$$\delta ( x) = \ \left \{ \begin{array}{ll} 1 & \textrm{ if } x \geq 0 , \\ 0 & \textrm{ if } x < 0 . \\ \end{array} \right .$$

The statistic $R _ {i}$ is called the rank of the $i$-th component $X _ {i}$, $i = 1 \dots n$, of the random vector $X$. This definition of a rank vector is unambiguous, in the sense that ties occur with probability zero, under the condition

$${\mathsf P} \{ X _ {i} = X _ {j} \} = 0 ,\ \ i \neq j ,$$

which automatically holds if the probability distribution of $X$ is defined by a density $p ( x) = p ( x _ {1} \dots x _ {n} )$. It follows from the definition of a rank vector that, under these conditions, $R$ takes values in the space $\mathfrak R = \{ r \}$ of all permutations $r = ( r _ {1} \dots r _ {n} )$ of $1 \dots n$ and the realization $r _ {i}$ of the rank $R _ {i}$ is equal to the number of components of $X$ whose observed values do not exceed the realization of the $i$-th component $X _ {i}$, $i = 1 \dots n$.
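
As an illustration of the definition, the following minimal sketch computes the rank vector of an observed sample directly from the sum of indicators. The function name `rank_vector` and the sample values are assumptions chosen for the example; the sample is assumed to contain no ties.

```python
def rank_vector(x):
    """Return R with R[i] = number of j such that x[j] <= x[i]."""
    def delta(t):
        # Indicator of the half-line t >= 0.
        return 1 if t >= 0 else 0

    return [sum(delta(xi - xj) for xj in x) for xi in x]


x = [0.3, -1.2, 2.5, 0.0]   # hypothetical observations without ties
print(rank_vector(x))       # [3, 1, 4, 2]
```

With no ties the result is a permutation of $1 \dots n$; the direct double sum costs $O ( n ^ {2} )$ operations, and a sort-based computation would be preferable for large $n$.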

Let $X ^ {( \cdot ) } = ( X _ {( n1)} \dots X _ {( nn)} )$ be the vector of order statistics (cf. Order statistic) constructed from the observation vector $X$. Then the pair $( R , X ^ {( \cdot ) } )$ is a sufficient statistic for the distribution of $X$, and $X$ itself can be uniquely recovered from $( R , X ^ {( \cdot ) } )$. Moreover, under the additional assumption that the density $p ( x)$ of $X$ is symmetric with respect to permutations of the arguments, the components $R$ and $X ^ {( \cdot ) }$ of the sufficient statistic $( R , X ^ {( \cdot ) } )$ are independent and

$${\mathsf P} \{ R = r \} = \frac{1}{n ! } ,\ \ r \in \mathfrak R .$$
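
The recovery of $X$ from the pair $( R , X ^ {( \cdot ) } )$ can be sketched as follows: the $i$-th observation is the order statistic whose index equals the rank $R _ {i}$. The helper names `ranks` and `recover` and the sample values are assumptions made for this illustration, and the sample is assumed to contain no ties.

```python
def ranks(x):
    # R[i] = number of components not exceeding x[i].
    return [sum(1 for xj in x if xj <= xi) for xi in x]


def recover(r, order_stats):
    # X_i is the R_i-th smallest observation (ranks are 1-based).
    return [order_stats[ri - 1] for ri in r]


x = [0.3, -1.2, 2.5, 0.0]   # hypothetical observations without ties
r = ranks(x)                # rank vector R
order_stats = sorted(x)     # vector of order statistics
assert recover(r, order_stats) == x
```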

In particular, if

$$\tag{1 } p ( x) = p ( x _ {1} \dots x _ {n} ) = \prod _ { i= 1} ^ { n } f ( x _ {i} ) ,$$

that is, the components $X _ {1} \dots X _ {n}$ are independent identically distributed random variables (with $f ( x _ {i} )$ the density of $X _ {i}$), then

$$\tag{2} \left . \begin{array}{c} {\mathsf P} \{ R _ {i} = k \} = \frac{1}{n} ,\ i = 1 \dots n , \\ {\mathsf P} \{ R _ {i} = k , R _ {j} = m \} = \frac{1}{n ( n - 1 ) } , \ i \neq j ,\ k \neq m , \\ {\mathsf E} \{ R _ {i} \} = \frac{n + 1 }{2} ,\ {\mathsf D} \{ R _ {i} \} = \frac{n ^ {2} - 1 }{12} ,\ \ i = 1 \dots n , \\ \end{array} \right \}$$

for any $k = 1 \dots n$.
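
These values can be checked for small $n$ by exact enumeration, since the rank vector is uniformly distributed over the $n!$ permutations. The sketch below is only such a check; the function name `rank_moments` and the choice $n = 4$ are assumptions for the example. It uses exact rational arithmetic and compares the mean with $( n + 1 ) / 2$ and the variance with $( n ^ {2} - 1 ) / 12$.

```python
from fractions import Fraction
from itertools import permutations


def rank_moments(n):
    """Exact law of R_1 and of (R_1, R_2) under the uniform law on permutations."""
    perms = list(permutations(range(1, n + 1)))
    total = len(perms)  # n!
    # Marginal distribution of the first rank.
    p_first = {
        k: Fraction(sum(1 for r in perms if r[0] == k), total)
        for k in range(1, n + 1)
    }
    mean = sum(k * p for k, p in p_first.items())
    var = sum((k - mean) ** 2 * p for k, p in p_first.items())
    # Joint probability of one fixed pair of distinct values, here (1, 2).
    p_joint = Fraction(sum(1 for r in perms if r[0] == 1 and r[1] == 2), total)
    return p_first, mean, var, p_joint


n = 4
p_first, mean, var, p_joint = rank_moments(n)
assert all(p == Fraction(1, n) for p in p_first.values())
assert mean == Fraction(n + 1, 2)
assert var == Fraction(n * n - 1, 12)
assert p_joint == Fraction(1, n * (n - 1))
```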

If (1) holds, there is a joint density $q ( x _ {i} , k )$, $k = 1 \dots n$, of $X _ {i}$ and $R _ {i}$, defined by the formula

$$\tag{3} q ( x _ {i} , k ) = \frac{( n - 1 ) ! }{( k - 1 ) ! ( n - k ) ! } [ F ( x _ {i} ) ] ^ {k- 1} [ 1 - F ( x _ {i} ) ] ^ {n- k} f ( x _ {i} ) ,$$

where $F ( x _ {i} )$ is the distribution function of $X _ {i}$. It follows from (2) and (3) that the conditional density $q ( x _ {i} \mid R _ {i} = k )$ of $X _ {i}$ given $R _ {i} = k$ ($k = 1 \dots n$) is expressed by the formula

$$\tag{4} q ( x _ {i} \mid R _ {i} = k ) = \frac{n ! }{( k - 1 ) ! ( n - k ) ! } [ F ( x _ {i} ) ] ^ {k- 1} [ 1 - F ( x _ {i} ) ] ^ {n- k} f ( x _ {i} ) .$$

The latter formula allows one to trace the internal connection between the observation vector $X$, the rank vector $R$ and the vector $X ^ {( \cdot ) }$ of order statistics, since (4) is just the probability density of the $k$-th order statistic $X _ {( nk) }$, $k = 1 \dots n$. Moreover, it follows from (3) that the conditional distribution of the rank $R _ {i}$ is given by the formula

$${\mathsf P} \{ R _ {i} = k \mid X _ {i} \} = \frac{( n - 1 ) ! }{( k - 1 ) ! ( n - k ) ! } [ F ( X _ {i} ) ] ^ {k- 1} [ 1 - F ( X _ {i} ) ] ^ {n- k} .$$
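
In other words, given $X _ {i}$, the quantity $R _ {i} - 1$ follows a binomial distribution with $n - 1$ trials and success probability $F ( X _ {i} )$. A minimal sketch evaluating these conditional probabilities is given below; the function name `rank_pmf_given_u` and the argument `u`, standing for the value of $F ( X _ {i} )$, are assumptions introduced for the example.

```python
from math import comb


def rank_pmf_given_u(n, u):
    """Return [P{R_i = k | F(X_i) = u} for k = 1..n]."""
    return [
        comb(n - 1, k - 1) * u ** (k - 1) * (1 - u) ** (n - k)
        for k in range(1, n + 1)
    ]


pmf = rank_pmf_given_u(3, 0.5)
print(pmf)  # [0.25, 0.5, 0.25]
```

The probabilities sum to one for every `u` in $[ 0 , 1 ]$, and the conditional mean is $1 + ( n - 1 ) F ( X _ {i} )$.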

Finally, under the assumption that the moments ${\mathsf E} \{ X _ {i} \}$ and ${\mathsf D} \{ X _ {i} \}$ exist and that (1) holds, (2) and (3) imply that the correlation coefficient $\rho ( X _ {i} , R _ {i} )$ between $X _ {i}$ and $R _ {i}$ is equal to

$$\rho ( X _ {i} , R _ {i} ) = \ \sqrt { \frac{12 ( n - 1 ) }{( n + 1 ) {\mathsf D} \{ X _ {i} \} } } \int\limits _ {- \infty } ^ \infty x _ {i} \left [ F ( x _ {i} ) - \frac{1}{2} \right ] d F ( x _ {i} ) .$$

In particular, if $X _ {i}$ is uniformly distributed on $[ 0 , 1 ]$, then

$$\rho ( X _ {i} , R _ {i} ) = \ \sqrt { \frac{n - 1 }{n + 1 } } .$$
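
As a check of the uniform case, one may substitute $F ( x) = x$ on $[ 0 , 1 ]$ and ${\mathsf D} \{ X _ {i} \} = 1 / 12$ into the general formula for the correlation coefficient:

$$\int\limits _ {0} ^ {1} x \left ( x - \frac{1}{2} \right ) d x = \frac{1}{3} - \frac{1}{4} = \frac{1}{12} ,\ \ \rho ( X _ {i} , R _ {i} ) = \sqrt { \frac{12 ( n - 1 ) }{( n + 1 ) \cdot \frac{1}{12} } } \cdot \frac{1}{12} = \sqrt { \frac{n - 1 }{n + 1 } } .$$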

If $X _ {i}$ has the normal distribution $N ( a , \sigma ^ {2} )$, then

$$\rho ( X _ {i} , R _ {i} ) = \ \sqrt { \frac{3 ( n - 1 ) }{\pi ( n + 1 ) } } ,$$

and $\rho ( X _ {i} , R _ {i} )$ does not depend on the parameters of the normal distribution.
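
The normal case can also be checked numerically. The sketch below evaluates the integral $\int x [ F ( x) - 1 / 2 ] d F ( x)$ by a midpoint rule and compares the resulting correlation coefficient with the closed form $\sqrt {3 ( n - 1 ) / ( \pi ( n + 1 ) ) }$. The function name `rho_normal`, the truncation of the integration range to ten standard deviations, and the step count are assumptions made for this illustration.

```python
import math


def rho_normal(n, a=0.0, sigma=1.0, steps=20000, width=10.0):
    """Approximate rho(X_i, R_i) for N(a, sigma^2) by a midpoint rule."""
    def pdf(x):
        z = (x - a) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - a) / (sigma * math.sqrt(2.0))))

    lo = a - width * sigma              # truncated lower limit (assumption)
    h = 2.0 * width * sigma / steps     # width of one subinterval
    integral = 0.0
    for m in range(steps):
        x = lo + (m + 0.5) * h
        integral += x * (cdf(x) - 0.5) * pdf(x) * h
    # Variance D{X_i} = sigma^2 enters the normalizing factor.
    return math.sqrt(12.0 * (n - 1) / ((n + 1) * sigma ** 2)) * integral


n = 10
closed_form = math.sqrt(3.0 * (n - 1) / (math.pi * (n + 1)))
assert abs(rho_normal(n) - closed_form) < 1e-6
assert abs(rho_normal(n, a=3.0, sigma=2.0) - closed_form) < 1e-6
```

The second assertion illustrates that the value does not change with the location and scale parameters.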

How to Cite This Entry:
Rank vector. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Rank_vector&oldid=52468
This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.