Difference between revisions of "Rank vector"
Revision as of 08:09, 6 June 2020
A vector statistic (cf. Statistics) $ R = ( R_1 \dots R_n ) $ constructed from a random observation vector $ X = ( X_1 \dots X_n ) $, with $ i $-th component $ R_i = R_i ( X ) $, $ i = 1 \dots n $, defined by
$$ R_i = \sum_{j=1}^{n} \delta ( X_i - X_j ) , $$
where $ \delta ( x ) $ is the characteristic function (indicator function) of $ [ 0 , + \infty ) $, that is,
$$ \delta ( x ) = \left \{ \begin{array}{ll} 1 & \textrm{ if } x \geq 0 , \\ 0 & \textrm{ if } x < 0 . \end{array} \right . $$
The statistic $ R_i $ is called the rank of the $ i $-th component $ X_i $, $ i = 1 \dots n $, of the random vector $ X $. This definition of a rank vector is precise under the condition
$$ {\mathsf P} \{ X_i = X_j \} = 0 ,\ \ i \neq j , $$
which automatically holds if the probability distribution of $ X $ is defined by a density $ p ( x ) = p ( x_1 \dots x_n ) $. It follows from the definition of a rank vector that, under these conditions, $ R $ takes values in the space $ \mathfrak R = \{ r \} $ of all permutations $ r = ( r_1 \dots r_n ) $ of $ 1 \dots n $, and the realization $ r_i $ of the rank $ R_i $ is equal to the number of components of $ X $ whose observed values do not exceed the realization of the $ i $-th component $ X_i $, $ i = 1 \dots n $.
Let $ X^{( \cdot )} = ( X_{(n1)} \dots X_{(nn)} ) $ be the vector of order statistics (cf. Order statistic) constructed from the observation vector $ X $. Then the pair $ ( R , X^{( \cdot )} ) $ is a sufficient statistic for the distribution of $ X $, and $ X $ itself can be uniquely recovered from $ ( R , X^{( \cdot )} ) $. Moreover, under the additional assumption that the density $ p ( x ) $ of $ X $ is symmetric with respect to permutations of the arguments, the components $ R $ and $ X^{( \cdot )} $ of the sufficient statistic $ ( R , X^{( \cdot )} ) $ are independent and
$$ {\mathsf P} \{ R = r \} = \frac{1}{n!} ,\ \ r \in \mathfrak R . $$
In particular, if
$$ \tag{1} p ( x ) = p ( x_1 \dots x_n ) = \prod_{i=1}^{n} f ( x_i ) , $$
that is, the components $ X_1 \dots X_n $ are independent identically-distributed random variables ($ f ( x_i ) $ stands for the density of $ X_i $), then
$$ \tag{2} \left . \begin{array}{c} {\mathsf P} \{ R_i = k \} = \frac{1}{n} , \\ {\mathsf E} \{ R_i \} = \frac{n + 1}{2} ,\ \ {\mathsf D} \{ R_i \} = \frac{n^2 - 1}{12} , \end{array} \right \} \ \ i = 1 \dots n , $$
for any $ k = 1 \dots n $.
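The definition of the rank vector, the recovery of $ X $ from $ ( R , X^{( \cdot )} ) $, and the uniform law of a single rank can be illustrated with a minimal Python sketch. The function names and the sample values are illustrative assumptions, not part of the article; the observations are assumed to have no ties.

```python
from fractions import Fraction
from itertools import permutations


def rank_vector(x):
    """R_i = number of components x_j with x_j <= x_i (no ties assumed)."""
    return [sum(1 for xj in x if xj <= xi) for xi in x]


def order_statistics(x):
    """Components of x arranged in increasing order."""
    return sorted(x)


def recover(r, ordered):
    """Rebuild x from its ranks and order statistics: x_i = ordered[r_i - 1]."""
    return [ordered[ri - 1] for ri in r]


# Hypothetical observation vector, chosen only for illustration.
x = [2.7, -0.4, 5.1, 1.3]
r = rank_vector(x)
print(r)
print(recover(r, order_statistics(x)) == x)

# If all n! rank vectors are equally likely, the law of R_1 follows by
# counting permutations with a given first entry.
n = 4
perms = list(permutations(range(1, n + 1)))
p_r1 = {k: Fraction(sum(1 for p in perms if p[0] == k), len(perms))
        for k in range(1, n + 1)}
mean_r1 = sum(k * p for k, p in p_r1.items())
var_r1 = sum(k * k * p for k, p in p_r1.items()) - mean_r1 ** 2
print(p_r1, mean_r1, var_r1)
```

Exact rational arithmetic is used for the counting part so that the values $ 1 / n $, $ ( n + 1 ) / 2 $ and $ ( n^2 - 1 ) / 12 $ can be compared without rounding error.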
If (1) holds, there is a joint density $ q ( x _ {i} , k ) $, $ k = 1 \dots n $, of $ X _ {i} $ and $ R _ {i} $, defined by the formula
$$ \tag{3} q ( x_i , k ) = \frac{( n - 1 ) !}{( k - 1 ) ! ( n - k ) !} [ F ( x_i ) ]^{k-1} [ 1 - F ( x_i ) ]^{n-k} f ( x_i ) , $$
where $ F ( x_i ) $ is the distribution function of $ X_i $. It follows from (2) and (3) that the conditional density $ q ( x_i \mid R_i = k ) $ of $ X_i $ given $ R_i = k $ ($ k = 1 \dots n $) is expressed by the formula
$$ \tag{4} q ( x_i \mid R_i = k ) = \frac{n !}{( k - 1 ) ! ( n - k ) !} [ F ( x_i ) ]^{k-1} [ 1 - F ( x_i ) ]^{n-k} f ( x_i ) . $$
The latter formula allows one to trace the internal connection between the observation vector $ X $, the rank vector $ R $ and the vector $ X^{( \cdot )} $ of order statistics, since (4) is just the probability density of the $ k $-th order statistic $ X_{(nk)} $, $ k = 1 \dots n $. Moreover, it follows from (3) that the conditional distribution of the rank $ R_i $ is given by the formula
$$ {\mathsf P} \{ R_i = k \mid X_i \} = \frac{( n - 1 ) !}{( k - 1 ) ! ( n - k ) !} [ F ( X_i ) ]^{k-1} [ 1 - F ( X_i ) ]^{n-k} . $$
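The conditional law of the rank is a binomial distribution with parameters $ n - 1 $ and $ F ( X_i ) $, shifted by one. The following Python sketch evaluates it and checks numerically that averaging over $ X_i $ gives probability $ 1 / n $ for every rank. The helper names, the value $ F ( x ) = 0.3 $ and the midpoint-rule integration are illustrative assumptions; the averaging step relies on $ F ( X_i ) $ being uniform on $ ( 0 , 1 ) $, which holds for continuous $ F $.

```python
from math import comb


def rank_pmf_given_x(k, n, F_x):
    """P{R_i = k | X_i = x}, where F_x = F(x) is the distribution function at x."""
    return comb(n - 1, k - 1) * F_x ** (k - 1) * (1 - F_x) ** (n - k)


n = 5
F_x = 0.3  # hypothetical value of F at the observed point
pmf = [rank_pmf_given_x(k, n, F_x) for k in range(1, n + 1)]
total = sum(pmf)  # probabilities over k = 1..n sum to one
cond_mean = sum(k * p for k, p in zip(range(1, n + 1), pmf))  # 1 + (n - 1) * F_x


def marginal_rank_prob(k, n, steps=20000):
    """Midpoint-rule average of the conditional pmf over u = F(X_i) in (0, 1)."""
    h = 1.0 / steps
    return sum(rank_pmf_given_x(k, n, (j + 0.5) * h) for j in range(steps)) * h


print(total, cond_mean)
print([marginal_rank_prob(k, n) for k in range(1, n + 1)])
```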
Finally, under the assumption that the moments $ {\mathsf E} \{ X _ {i} \} $ and $ {\mathsf D} \{ X _ {i} \} $ exist and that (1) holds, (2) and (3) imply that the correlation coefficient $ \rho ( X _ {i} , R _ {i} ) $ between $ X _ {i} $ and $ R _ {i} $ is equal to
$$ \rho ( X _ {i} , R _ {i} ) = \ \sqrt { \frac{12 ( n - 1 ) }{( n + 1 ) {\mathsf D} \{ X _ {i} \} } } \int\limits _ {- \infty } ^ \infty x _ {i} \left [ F ( x _ {i} ) - \frac{1}{2} \right ] d F ( x _ {i} ) . $$
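One way to see where this expression comes from is to condition on $ X_i $. The intermediate steps below are a sketch that is not taken from the source; they assume that $ F $ is continuous, so that $ {\mathsf E} \{ F ( X_i ) \} = 1 / 2 $, and use the fact that a rank uniformly distributed on $ 1 \dots n $ has variance $ ( n^2 - 1 ) / 12 $.

$$ \begin{aligned} {\mathsf E} \{ R_i \mid X_i \} &= 1 + ( n - 1 ) F ( X_i ) , \\ \mathop{\rm cov} ( X_i , R_i ) &= ( n - 1 ) \mathop{\rm cov} ( X_i , F ( X_i ) ) = ( n - 1 ) \int\limits_{- \infty}^{\infty} x \left [ F ( x ) - \frac{1}{2} \right ] d F ( x ) , \\ \rho ( X_i , R_i ) &= \frac{\mathop{\rm cov} ( X_i , R_i )}{\sqrt{ {\mathsf D} \{ X_i \} ( n^2 - 1 ) / 12 }} = \sqrt{ \frac{12 ( n - 1 )}{( n + 1 ) {\mathsf D} \{ X_i \}} } \int\limits_{- \infty}^{\infty} x \left [ F ( x ) - \frac{1}{2} \right ] d F ( x ) . \end{aligned} $$

The first line is the mean of the shifted binomial conditional distribution of $ R_i $ given $ X_i $, and the last step uses $ ( n - 1 ) / \sqrt{( n - 1 ) ( n + 1 )} = \sqrt{( n - 1 ) / ( n + 1 )} $.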
In particular, if $ X _ {i} $ is uniformly distributed on $ [ 0 , 1 ] $, then
$$ \rho ( X_i , R_i ) = \sqrt{ \frac{n - 1}{n + 1} } . $$
If $ X_i $ has the normal distribution $ N ( a , \sigma^2 ) $, then
$$ \rho ( X _ {i} , R _ {i} ) = \ \sqrt { \frac{3 ( n - 1 ) }{\pi ( n + 1 ) } } , $$
and $ \rho ( X _ {i} , R _ {i} ) $ does not depend on the parameters of the normal distribution.
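Both special cases can be checked numerically against the general integral formula for $ \rho ( X_i , R_i ) $. The sketch below is only an illustration: the function name, the midpoint-rule integration, the sample size $ n = 10 $ and the normal parameters $ a = 2 $, $ \sigma = 3 $ are all assumptions chosen for the example.

```python
from math import erf, exp, pi, sqrt


def rho_from_formula(n, cdf, pdf, variance, lo, hi, steps=100000):
    """sqrt(12(n-1)/((n+1) D X)) times the midpoint-rule integral of x [F(x) - 1/2] f(x)."""
    h = (hi - lo) / steps
    integral = 0.0
    for j in range(steps):
        x = lo + (j + 0.5) * h
        integral += x * (cdf(x) - 0.5) * pdf(x)
    integral *= h
    return sqrt(12 * (n - 1) / ((n + 1) * variance)) * integral


n = 10

# Uniform distribution on [0, 1]: F(x) = x, f(x) = 1, variance 1/12.
rho_uniform = rho_from_formula(n, lambda x: x, lambda x: 1.0, 1 / 12, 0.0, 1.0)

# Normal distribution N(a, sigma^2); a and sigma are arbitrary illustrative values.
a, sigma = 2.0, 3.0


def normal_cdf(x):
    return 0.5 * (1 + erf((x - a) / (sigma * sqrt(2))))


def normal_pdf(x):
    return exp(-0.5 * ((x - a) / sigma) ** 2) / (sigma * sqrt(2 * pi))


# The integration range is truncated to a +/- 10 sigma; the tails are negligible.
rho_normal = rho_from_formula(n, normal_cdf, normal_pdf, sigma ** 2,
                              a - 10 * sigma, a + 10 * sigma)

print(rho_uniform, sqrt((n - 1) / (n + 1)))
print(rho_normal, sqrt(3 * (n - 1) / (pi * (n + 1))))
```

Changing `a` or `sigma` should leave `rho_normal` unchanged up to integration error, in line with the statement that the coefficient does not depend on the parameters of the normal distribution.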
Rank vector. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Rank_vector&oldid=15810