Difference between revisions of "Rank vector"

Revision as of 14:53, 7 June 2020

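A vector statistic (cf. Statistics) $R = (R_1, \dots, R_n)$ constructed from a random observation vector $X = (X_1, \dots, X_n)$, with $i$-th component $R_i = R_i(X)$, $i = 1, \dots, n$, defined by

$$
R_i = \sum_{j=1}^{n} \delta(X_i - X_j),
$$

where $\delta(x)$ is the characteristic function (indicator function) of $[0, +\infty)$, that is,

$$
\delta(x) = \begin{cases} 1, & x \geq 0, \\ 0, & x < 0. \end{cases}
$$

The statistic $R_i$ is called the rank of the $i$-th component $X_i$, $i = 1, \dots, n$, of the random vector $X$. This definition of a rank vector is precise under the condition

$$
\mathsf P\{X_i = X_j\} = 0, \quad i \neq j,
$$

which automatically holds if the probability distribution of $X$ is defined by a density $p(x) = p(x_1, \dots, x_n)$. It follows from the definition of a rank vector that, under these conditions, $R$ takes values in the space $\mathfrak R = \{r\}$ of all permutations $r = (r_1, \dots, r_n)$ of $1, \dots, n$, and the realization $r_i$ of the rank $R_i$ is equal to the number of components of $X$ whose observed values do not exceed the realization of the $i$-th component $X_i$, $i = 1, \dots, n$.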
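
As an illustration, the indicator-sum definition of the rank can be sketched in a few lines of Python. The function name `rank_vector` and the sample values are assumptions made for this example only, and the sketch presumes an observation without ties.

```python
def rank_vector(x):
    """Return [R_1, ..., R_n] with R_i = #{j : x_j <= x_i}."""
    # delta(x_i - x_j) equals 1 exactly when x_i - x_j >= 0;
    # summing it over j counts the components not exceeding x_i.
    return [sum(1 for xj in x if xi - xj >= 0) for xi in x]


sample = [2.7, -1.0, 0.4, 3.1]  # hypothetical observation, no ties
print(rank_vector(sample))      # [3, 1, 2, 4]
```

With distinct observed values the result is a permutation of $1, \dots, n$, in line with the statement above.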

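Let $X^{(\cdot)} = (X_{(n1)}, \dots, X_{(nn)})$ be the vector of order statistics (cf. Order statistic) constructed from the observation vector $X$. Then the pair $(R, X^{(\cdot)})$ is a sufficient statistic for the distribution of $X$, and $X$ itself can be uniquely recovered from $(R, X^{(\cdot)})$. Moreover, under the additional assumption that the density $p(x)$ of $X$ is symmetric with respect to permutations of the arguments, the components $R$ and $X^{(\cdot)}$ of the sufficient statistic $(R, X^{(\cdot)})$ are independent and

$$
\mathsf P\{R = r\} = \frac{1}{n!}, \quad r \in \mathfrak R.
$$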

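In particular, if

$$ \tag{1}
p(x) = p(x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i),
$$

that is, the components $X_1, \dots, X_n$ are independent identically-distributed random variables ($f(x_i)$ stands for the density of $X_i$), then

$$ \tag{2}
\mathsf P\{R_i = k\} = \frac{1}{n}, \quad i = 1, \dots, n,
$$

for any $k = 1, \dots, n$.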
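
The recovery of the observation vector from the pair (rank vector, order statistics) and the uniform distribution of a single rank can be checked by direct enumeration. The sketch below is illustrative only: the helper names `rank_vector` and `recover` are hypothetical, and the enumeration relies on the assumption that, for a permutation-symmetric density, all $n!$ orderings of distinct values are equally likely.

```python
from collections import Counter
from itertools import permutations


def rank_vector(x):
    """R_i = number of components of x that do not exceed x_i."""
    return [sum(1 for xj in x if xj <= xi) for xi in x]


def recover(ranks, order_stats):
    """Rebuild x from its ranks (1-based) and its sorted values."""
    # The i-th observation is the R_i-th smallest value.
    return [order_stats[r - 1] for r in ranks]


x = [2.7, -1.0, 0.4, 3.1]
assert recover(rank_vector(x), sorted(x)) == x

# Enumerate all orderings of three distinct values and count how often
# the first component receives each rank.
values = [0.1, 0.5, 0.9]
counts = Counter(rank_vector(list(p))[0] for p in permutations(values))
print(dict(counts))  # each rank 1, 2, 3 occurs in 2 of the 6 orderings
```

Each rank value occurs in $(n-1)!$ of the $n!$ orderings, which gives the probability $1/n$.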

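If (1) holds, there is a joint density $q(x_i, k)$, $k = 1, \dots, n$, of $X_i$ and $R_i$, defined by the formula

$$ \tag{3}
q(x_i, k) = \frac{(n-1)!}{(k-1)!\,(n-k)!} [F(x_i)]^{k-1} [1 - F(x_i)]^{n-k} f(x_i),
$$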

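where $F(x_i)$ is the distribution function of $X_i$. It follows from (2) and (3) that the conditional density $q(x_i \mid R_i = k)$ of $X_i$ given $R_i = k$ ($k = 1, \dots, n$) is expressed by the formula

$$ \tag{4}
q(x_i \mid R_i = k) = \frac{n!}{(k-1)!\,(n-k)!} [F(x_i)]^{k-1} [1 - F(x_i)]^{n-k} f(x_i).
$$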

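The latter formula allows one to trace the internal connection between the observation vector $X$, the rank vector $R$ and the vector $X^{(\cdot)}$ of order statistics, since (4) is just the probability density of the $k$-th order statistic $X_{(nk)}$, $k = 1, \dots, n$. Moreover, it follows from (3) that the conditional distribution of the rank $R_i$ is given by the formula

$$
\mathsf P\{R_i = k \mid X_i\} = \frac{(n-1)!}{(k-1)!\,(n-k)!} [F(X_i)]^{k-1} [1 - F(X_i)]^{n-k}.
$$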
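
The conditional distribution of the rank given the observed value is of binomial type: given $X_i = x$, exactly $k - 1$ of the other $n - 1$ observations must fall below $x$, each with probability $F(x)$. A minimal Python sketch follows, assuming a hypothetical helper `rank_pmf_given_x` that takes the value $F(x)$ directly rather than a particular distribution.

```python
from math import comb


def rank_pmf_given_x(k, n, F_x):
    """P{R_i = k | X_i = x}, where F_x = F(x) lies in [0, 1]."""
    # comb(n - 1, k - 1) equals (n-1)! / ((k-1)! (n-k)!).
    return comb(n - 1, k - 1) * F_x ** (k - 1) * (1 - F_x) ** (n - k)


n, F_x = 5, 0.3  # illustrative values, not taken from the article
pmf = [rank_pmf_given_x(k, n, F_x) for k in range(1, n + 1)]
total = sum(pmf)                                     # sums to 1 over k
mean_rank = sum(k * p for k, p in enumerate(pmf, start=1))
print(total, mean_rank)
```

Up to floating-point rounding, the probabilities sum to 1 and the conditional mean equals $1 + (n-1) F(x)$, the mean of a shifted binomial distribution.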

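Finally, under the assumption that the moments $\mathsf E\{X_i\}$ and $\mathsf D\{X_i\}$ exist and that (1) holds, (2) and (3) imply that the correlation coefficient $\rho(X_i, R_i)$ between $X_i$ and $R_i$ is equal to

$$
\rho(X_i, R_i) = \sqrt{\frac{12 (n-1)}{(n+1)\, \mathsf D\{X_i\}}} \int\limits_{-\infty}^{\infty} x_i \left[ F(x_i) - \frac{1}{2} \right] d F(x_i).
$$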

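In particular, if $X_i$ is uniformly distributed on $[0, 1]$, then

$$
\rho(X_i, R_i) = \sqrt{\frac{n-1}{n+1}}.
$$

If $X_i$ has the normal distribution $N(a, \sigma^2)$, then

$$
\rho(X_i, R_i) = \sqrt{\frac{3 (n-1)}{\pi (n+1)}},
$$

and $\rho(X_i, R_i)$ does not depend on the parameters of the normal distribution.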
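
The two closed-form correlation coefficients can be compared with a small simulation. The following Python sketch is only an illustration: the function names, the seed, the number of trials and the normal parameters $a = 2$, $\sigma = 3$ are all assumptions chosen for the example.

```python
import math
import random


def rho_uniform(n):
    """Correlation of X_i and R_i for the uniform distribution on [0, 1]."""
    return math.sqrt((n - 1) / (n + 1))


def rho_normal(n):
    """Correlation of X_i and R_i for a normal distribution N(a, sigma^2)."""
    return math.sqrt(3 * (n - 1) / (math.pi * (n + 1)))


def simulated_rho(n, draw, trials=20000, seed=0):
    """Monte Carlo estimate of the correlation between X_1 and R_1."""
    rng = random.Random(seed)
    xs, rs = [], []
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        xs.append(sample[0])
        # Rank of the first component: count of values not exceeding it.
        rs.append(sum(1 for v in sample if v <= sample[0]))
    mx, mr = sum(xs) / trials, sum(rs) / trials
    cov = sum((a - mx) * (b - mr) for a, b in zip(xs, rs)) / trials
    vx = sum((a - mx) ** 2 for a in xs) / trials
    vr = sum((b - mr) ** 2 for b in rs) / trials
    return cov / math.sqrt(vx * vr)


n = 5
est_uniform = simulated_rho(n, lambda rng: rng.random())
est_normal = simulated_rho(n, lambda rng: rng.gauss(2.0, 3.0))
print(rho_uniform(n), est_uniform)
print(rho_normal(n), est_normal)
```

The estimates should agree with the closed forms up to Monte Carlo error, and changing the normal parameters should leave the estimate essentially unchanged.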

References

[1]	W. Hoeffding, "'Optimum' nonparametric tests", Proc. 2nd Berkeley Symp. Math. Stat. Probab. (1950), Univ. California Press (1951) pp. 83–92
[2]	J. Hájek, Z. Šidák, "Theory of rank tests", Acad. Press (1967)
[3]	F.P. Tarasenko, "Non-parametric statistics", Tomsk (1976) (In Russian)

How to Cite This Entry:
Rank vector. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Rank_vector&oldid=48436

This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article

@@ Line 1: / Line 1: @@
-<!--
+A vector statistic (cf. [[Statistics|Statistics]]) <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r0775401.png" /> constructed from a random observation vector <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r0775402.png" /> with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r0775403.png" />-th component <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r0775404.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r0775405.png" />, defined by
-r0775401.png
-$#A+1 = 81 n = 0
-$#C+1 = 81 : ~/encyclopedia/old_files/data/R077/R.0707540 Rank vector
-Automatically converted into TeX, above some diagnostics.
-Please remove this comment and the {{TEX|auto}} line below,
-if TeX found to be correct.
--->
-{{TEX|auto}}
+<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r0775406.png" /></td> </tr></table>
-{{TEX|done}}
-A vector statistic (cf. [[Statistics|Statistics]])  $  R = ( R _ {1} \dots R _ {n} ) $
+where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r0775407.png" /> is the characteristic function (indicator function) of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r0775408.png" />, that is,
-constructed from a random observation vector  $  X = ( X _ {1} \dots X _ {n} ) $
-with  $  i $-
-th component  $  R _ {i} = R _ {i} ( X) $,
-$  i = 1 \dots n $,
-defined by
-$$
+<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r0775409.png" /></td> </tr></table>
-R _ {i}  =  \sum _ {j=1} ^ {n}  \delta ( X _ {i} - X _ {j} ) ,
-$$
-where  $  \delta ( x) $
+The statistic <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754010.png" /> is called the rank of the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754011.png" />-th component <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754012.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754013.png" />, of the random vector <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754014.png" />. This definition of a rank vector is precise under the condition
-is the characteristic function (indicator function) of  $  [ 0 , + \infty ] $,
-that is,
-$$
+<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754015.png" /></td> </tr></table>
-\delta ( x)  = \
-\left \{
-The statistic  $  R _ {i} $
+which automatically holds if the probability distribution of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754016.png" /> is defined by a density <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754017.png" />. It follows from the definition of a rank vector that, under these conditions, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754018.png" /> takes values in the space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754019.png" /> of all permutations <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754020.png" /> of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754021.png" /> and the realization <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754022.png" /> of the rank <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754023.png" /> is equal to the number of components of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754024.png" /> whose observed values do not exceed the realization of the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754025.png" />-th component <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754026.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754027.png" />.
-is called the rank of the  $  i $-
-th component  $  X _ {i} $,
-$  i = 1 \dots n $,
-of the random vector  $  X $.
-This definition of a rank vector is precise under the condition
-$$
+Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754028.png" /> be the vector of order statistics (cf. [[Order statistic|Order statistic]]) constructed from the observation vector <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754029.png" />. Then the pair <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754030.png" /> is a [[Sufficient statistic|sufficient statistic]] for the distribution of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754031.png" />, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754032.png" /> itself can be uniquely recovered from <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754033.png" />. Moreover, under the additional assumption that the density <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754034.png" /> of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754035.png" /> is symmetric with respect to permutations of the arguments, the components <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754036.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754037.png" /> of the sufficient statistic <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754038.png" /> are independent and
-{\mathsf P} \{ X _ {i} = X _ {j} \}  =  0 ,\ \
-i \neq j ,
-$$
-which automatically holds if the probability distribution of  $  X $
+<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754039.png" /></td> </tr></table>
-is defined by a density  $  p ( x) = p ( x _ {1} \dots x _ {n} ) $.
-It follows from the definition of a rank vector that, under these conditions,  $  R $
-takes values in the space  $  \mathfrak R = \{ r \} $
-of all permutations  $  r = ( r _ {1} \dots r _ {n} ) $
-of  $  1 \dots n $
-and the realization  $  r _ {i} $
-of the rank  $  R _ {i} $
-is equal to the number of components of  $  X $
-whose observed values do not exceed the realization of the  $  i $-
-th component  $  X _ {i} $,
-$  i = 1 \dots n $.
-Let  $  X ^ {( \cdot ) } = ( X _ {(n1)} \dots X _ {(nn)} ) $
-be the vector of order statistics (cf. [[Order statistic|Order statistic]]) constructed from the observation vector  $  X $.
-Then the pair  $  ( R , X ^ {( \cdot ) } ) $
-is a [[Sufficient statistic|sufficient statistic]] for the distribution of  $  X $,
-and  $  X $
-itself can be uniquely recovered from  $  ( R , X ^ {( \cdot ) } ) $.
-Moreover, under the additional assumption that the density  $  p ( x) $
-of  $  X $
-is symmetric with respect to permutations of the arguments, the components  $  R $
-and  $  X ^ {( \cdot ) } $
-of the sufficient statistic  $  ( R , X ^ {( \cdot ) } ) $
-are independent and
-$$
-{\mathsf P} \{ R = r \}  =
-\frac{1}{n ! }
- ,\ \
-r \in \mathfrak R .
-$$
 In particular, if
-$$ \tag{1 }
+<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754040.png" /></td> <td valign="top" style="width:5%;text-align:right;">(1)</td></tr></table>
-p ( x)  =  p ( x _ {1} \dots x _ {n} )
- =  \prod _ {i=1} ^ {n}  f ( x _ {i} ) ,
-$$
-that is, the components  $  X _ {1} \dots X _ {n} $
-are independent identically-distributed random variables ( $  f ( x _ {i} ) $
-stands for the density of  $  X _ {i} $),
-then
-$$ \tag{2 }
-\left .
-for any  $  k = 1 \dots n $.
-If (1) holds, there is a joint density  $  q ( x _ {i} , k ) $,
+that is, the components <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754041.png" /> are independent identically-distributed random variables (<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754042.png" /> stands for the density of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754043.png" />), then
-$  k = 1 \dots n $,
-of  $  X _ {i} $
-and  $  R _ {i} $,
-defined by the formula
-$$ \tag{3 }
+<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754044.png" /></td> <td valign="top" style="width:5%;text-align:right;">(2)</td></tr></table>
-q ( x _ {i} , k ) =
-$$
-$$
+for any <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754045.png" />.
-= \
-\frac{( n - 1 ) ! }{( k - 1 ) ! ( n - k ) ! }
+If (1) holds, there is a joint density <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754046.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754047.png" />, of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754048.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754049.png" />, defined by the formula
- [ F (
-x _ {i} ) ]  ^ {k-1} [ 1 - F ( x _ {i} ) ]  ^ {n-k} f ( x _ {i} ) ,
-$$
-where  $  F ( x _ {i} ) $
+<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754050.png" /></td> <td valign="top" style="width:5%;text-align:right;">(3)</td></tr></table>
-is the distribution function of  $  X _ {i} $.
-It follows from (2) and (3) that the conditional density  $  q ( X _ {i} \mid  R _ {i} = k ) $
-of  $  X _ {i} $
-given  $  R _ {i} = k $(
-$  k = 1 \dots n $)
-is expressed by the formula
-$$ \tag{4 }
+<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754051.png" /></td> </tr></table>
-q ( x _ {i} \mid  R _ {i} = k ) =
-$$
-$$
+where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754052.png" /> is the distribution function of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754053.png" />. It follows from (2) and (3) that the conditional density <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754054.png" /> of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754055.png" /> given <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754056.png" /> (<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754057.png" />) is expressed by the formula
-= \
-\frac{n!}{( k - 1 ) ! ( n - k ) ! } [ F ( x _ {i} )
-]  ^ {k-1} [ 1 - F ( x _ {i} ) ]  ^ {n-k} f ( x _ {i} ) .
-$$
-The latter formula allows one to trace the internal connection between the observation vector  $  X $,
+<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754058.png" /></td> <td valign="top" style="width:5%;text-align:right;">(4)</td></tr></table>
-the rank vector  $  R $
-and the vector  $  X ^ {( \cdot ) } $
-of order statistics, since (4) is just the probability density of the  $  k $-
-th order statistic  $  X _ {(nk)} $,
-$  k = 1 \dots n $.
-Moreover, it follows from (3) that the conditional distribution of the rank  $  R _ {i} $
-is given by the formula
-$$
+<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754059.png" /></td> </tr></table>
-{\mathsf P} \{ R _ {i} = k \mid  X _ {i} \} =
-$$
-$$
+The latter formula allows one to trace the internal connection between the observation vector <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754060.png" />, the rank vector <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754061.png" /> and the vector <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754062.png" /> of order statistics, since (4) is just the probability density of the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754063.png" />-th order statistic <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754064.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754065.png" />. Moreover, it follows from (3) that the conditional distribution of the rank <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754066.png" /> is given by the formula
-= \
-\frac{( n - 1 ) ! }{( k - 1 ) ! ( n - k ) ! }
+<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754067.png" /></td> </tr></table>
-[ F ( X _ {i} ) ]  ^ {k-1} [ 1 - F ( X _ {i} ) ]  ^ {n-k} .
+<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754068.png" /></td> </tr></table>
-$$
-Finally, under the assumption that the moments  $  {\mathsf E} \{ X _ {i} \} $
+Finally, under the assumption that the moments <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754069.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754070.png" /> exist and that (1) holds, (2) and (3) imply that the correlation coefficient <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754071.png" /> between <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754072.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754073.png" /> is equal to
-and  $  {\mathsf D} \{ X _ {i} \} $
-exist and that (1) holds, (2) and (3) imply that the correlation coefficient  $  \rho ( X _ {i} , R _ {i} ) $
-between  $  X _ {i} $
-and  $  R _ {i} $
-is equal to
-$$
+<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754074.png" /></td> </tr></table>
-\rho ( X _ {i} , R _ {i} )  = \
-\sqrt {
-\frac{12 ( n - 1 ) }{( n + 1 ) {\mathsf D} \{ X _ {i} \} }
- }
-\int\limits _ {- \infty } ^  \infty
-x _ {i} \left [ F ( x _ {i} ) -
-\frac{1}{2}
- \right ]  d F ( x _ {i} ) .
-$$
-In particular, if  $  X _ {i} $
+In particular, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754075.png" /> is uniformly distributed on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754076.png" />, then
-is uniformly distributed on  $  [ 0 , 1 ] $,
-then
-$$
+<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754077.png" /></td> </tr></table>
-\rho ( X _ {i} , R _ {i} )  = \
-\sqrt {
-\frac{n - 1}{n + 1}
-} .
-$$
-If  $  X $
+If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754078.png" /> has the [[Normal distribution|normal distribution]] <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754079.png" />, then
-has the [[Normal distribution|normal distribution]]  $  N ( a , \sigma  ^ {2} ) $,
-then
-$$
+<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754080.png" /></td> </tr></table>
-\rho ( X _ {i} , R _ {i} )  = \
-\sqrt {
-\frac{3 ( n - 1 ) }{\pi ( n + 1 ) }
- } ,
-$$
-and  $  \rho ( X _ {i} , R _ {i} ) $
+and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/r/r077/r077540/r07754081.png" /> does not depend on the parameters of the normal distribution.
-does not depend on the parameters of the normal distribution.
 ====References====
 <table><TR><TD valign="top">[1]</TD> <TD valign="top">  W. Hoeffding,   " "Optimum"  nonparametric tests" , ''Proc. 2nd Berkeley Symp. Math. Stat. Probab. (1950)'' , Univ. California Press  (1951)  pp. 83–92</TD></TR><TR><TD valign="top">[2]</TD> <TD valign="top">  J. Hájek,   Z. Sidák,   "Theory of rank tests" , Acad. Press  (1967)</TD></TR><TR><TD valign="top">[3]</TD> <TD valign="top">  F.P. Tarasenko,   "Non-parametric statistics" , Tomsk  (1976)  (In Russian)</TD></TR></table>
