A vector statistic (cf. Statistics) $ R = ( R _ {1} \dots R _ {n} ) $, constructed from a random observation vector $ X = ( X _ {1} \dots X _ {n} ) $, whose $ i $-th component $ R _ {i} = R _ {i} ( X) $, $ i = 1 \dots n $, is defined by

$$
R _ {i} = \sum _ { j= 1 } ^ { n } \delta ( X _ {i} - X _ {j} ) ,
$$

where $ \delta ( x) $ is the characteristic function (indicator function) of $ [ 0 , + \infty ) $, that is,

$$
\delta ( x) = \left \{
\begin{array}{ll}
1 & \textrm{ if } x \geq 0 , \\
0 & \textrm{ if } x < 0 . \\
\end{array}
\right .
$$
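The defining sum can be applied literally; the sketch below (function names are ours, not from the source) also checks that, in the absence of ties, it agrees with the usual "position in the sorted sample" notion of rank:

```python
def delta(x):
    # indicator of [0, +infinity): 1 if x >= 0, else 0
    return 1 if x >= 0 else 0

def rank_vector(xs):
    # R_i = sum over j of delta(x_i - x_j), as in the definition above
    return [sum(delta(xi - xj) for xj in xs) for xi in xs]

def rank_via_sort(xs):
    # with no ties, R_i is the 1-based position of x_i in the sorted sample
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    for pos, i in enumerate(order):
        ranks[i] = pos + 1
    return ranks

sample = [0.3, -1.2, 2.5, 0.7]
print(rank_vector(sample))   # -> [2, 1, 4, 3]
assert rank_vector(sample) == rank_via_sort(sample)
```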
The statistic $ R _ {i} $ is called the rank of the $ i $-th component $ X _ {i} $, $ i = 1 \dots n $, of the random vector $ X $. This definition of a rank vector is precise under the condition

$$
{\mathsf P} \{ X _ {i} = X _ {j} \} = 0 ,\ \ i \neq j ,
$$
which automatically holds if the probability distribution of $ X $ is defined by a density $ p ( x) = p ( x _ {1} \dots x _ {n} ) $. It follows from the definition of a rank vector that, under these conditions, $ R $ takes values in the space $ \mathfrak R = \{ r \} $ of all permutations $ r = ( r _ {1} \dots r _ {n} ) $ of $ 1 \dots n $, and the realization $ r _ {i} $ of the rank $ R _ {i} $ is equal to the number of components of $ X $ whose observed values do not exceed the realization of the $ i $-th component $ X _ {i} $, $ i = 1 \dots n $.
Let $ X ^ {( \cdot ) } = ( X _ {( n1) } \dots X _ {( nn) } ) $ be the vector of order statistics (cf. Order statistic) constructed from the observation vector $ X $. Then the pair $ ( R , X ^ {( \cdot ) } ) $ is a sufficient statistic for the distribution of $ X $, and $ X $ itself can be uniquely recovered from $ ( R , X ^ {( \cdot ) } ) $.
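The recovery is straightforward to illustrate (a sketch; the helper name is ours): with no ties, $ x _ {i} $ is exactly the $ r _ {i} $-th order statistic, so indexing the sorted sample by the ranks rebuilds the observation vector:

```python
def reconstruct(ranks, order_stats):
    # x_i equals the r_i-th order statistic (1-based), assuming no ties
    return [order_stats[r - 1] for r in ranks]

xs = [0.3, -1.2, 2.5, 0.7]
ranks = [sum(y <= x for y in xs) for x in xs]  # rank vector R
order_stats = sorted(xs)                       # vector of order statistics
assert reconstruct(ranks, order_stats) == xs
print(ranks, order_stats)
```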
Moreover, under the additional assumption that the density $ p ( x) $ of $ X $ is symmetric with respect to permutations of the arguments, the components $ R $ and $ X ^ {( \cdot ) } $ of the sufficient statistic $ ( R , X ^ {( \cdot ) } ) $ are independent and

$$
{\mathsf P} \{ R = r \} = \frac{1}{n ! } ,\ \ r \in \mathfrak R .
$$
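The uniformity of $ R $ over the $ n! $ permutations can be probed by simulation (a Monte Carlo sketch for an i.i.d. Gaussian sample; the sample size and tolerance are arbitrary choices of ours):

```python
import random
from collections import Counter
from math import factorial

def rank_vector(xs):
    # rank of x_i = number of components not exceeding it (no ties a.s.)
    return tuple(sum(y <= x for y in xs) for x in xs)

random.seed(0)
n, trials = 3, 60_000
counts = Counter(rank_vector([random.gauss(0.0, 1.0) for _ in range(n)])
                 for _ in range(trials))

# all n! = 6 rank vectors occur, each with frequency close to 1/6
assert len(counts) == factorial(n)
for perm in sorted(counts):
    print(perm, round(counts[perm] / trials, 3))
```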
In particular, if

$$ \tag{1 }
p ( x) = p ( x _ {1} \dots x _ {n} ) = \prod _ { i= 1 } ^ { n } f ( x _ {i} ) ,
$$

that is, the components $ X _ {1} \dots X _ {n} $ are independent identically-distributed random variables ($ f ( x _ {i} ) $ stands for the density of $ X _ {i} $), then
$$ \tag{2 }
\left .
\begin{array}{c}
{\mathsf P} \{ R _ {i} = k \} = \frac{1}{n} ,\ \ i = 1 \dots n , \\
{\mathsf P} \{ R _ {i} = k , R _ {j} = m \} = \frac{1}{n ( n - 1 ) } ,\ \ i \neq j ,\ k \neq m , \\
{\mathsf E} \{ R _ {i} \} = \frac{n + 1 }{2} ,\ \ {\mathsf D} \{ R _ {i} \} = \frac{n ^ {2} - 1 }{12} ,\ \ i = 1 \dots n , \\
\end{array}
\right \}
$$

for any $ k = 1 \dots n $.
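Because $ R $ is uniform over the permutations, the marginals and moments in (2) can be verified exactly by enumeration rather than by sampling (a small sketch for $ n = 5 $):

```python
from itertools import permutations

n = 5
perms = list(permutations(range(1, n + 1)))  # all n! equally likely rank vectors

# marginal distribution of R_1: each value k = 1..n occurs (n-1)! times,
# i.e. with probability 1/n
first = [p[0] for p in perms]
assert all(first.count(k) * n == len(perms) for k in range(1, n + 1))

mean = sum(first) / len(perms)
var = sum((r - mean) ** 2 for r in first) / len(perms)
print(mean, var)  # (n+1)/2 = 3.0 and (n^2 - 1)/12 = 2.0
```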
If (1) holds, there is a joint density $ q ( x _ {i} , k ) $, $ k = 1 \dots n $, of $ X _ {i} $ and $ R _ {i} $, defined by the formula

$$ \tag{3 }
q ( x _ {i} , k ) = \frac{( n - 1 ) ! }{( k - 1 ) ! ( n - k ) ! } [ F ( x _ {i} ) ] ^ {k - 1 } [ 1 - F ( x _ {i} ) ] ^ {n - k } f ( x _ {i} ) ,
$$

where $ F ( x _ {i} ) $ is the distribution function of $ X _ {i} $.
It follows from (2) and (3) that the conditional density $ q ( x _ {i} \mid R _ {i} = k ) $ of $ X _ {i} $ given $ R _ {i} = k $ ($ k = 1 \dots n $) is expressed by the formula

$$ \tag{4 }
q ( x _ {i} \mid R _ {i} = k ) = \frac{n ! }{( k - 1 ) ! ( n - k ) ! } [ F ( x _ {i} ) ] ^ {k - 1 } [ 1 - F ( x _ {i} ) ] ^ {n - k } f ( x _ {i} ) .
$$
The latter formula allows one to trace the internal connection between the observation vector $ X $, the rank vector $ R $ and the vector $ X ^ {( \cdot ) } $ of order statistics, since (4) is just the probability density of the $ k $-th order statistic $ X _ {( nk) } $, $ k = 1 \dots n $. Moreover, it follows from (3) that the conditional distribution of the rank $ R _ {i} $ is given by the formula

$$
{\mathsf P} \{ R _ {i} = k \mid X _ {i} \} = \frac{( n - 1 ) ! }{( k - 1 ) ! ( n - k ) ! } [ F ( X _ {i} ) ] ^ {k - 1 } [ 1 - F ( X _ {i} ) ] ^ {n - k } .
$$
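The identification of (4) with the $ k $-th order statistic can also be checked numerically. For $ X _ {i} $ uniform on $ [ 0 , 1 ] $ the $ k $-th order statistic follows a Beta$( k , n - k + 1 )$ law with mean $ k / ( n + 1 ) $ (a standard fact, used here only as a reference value; the sketch below is a Monte Carlo illustration with arbitrary sample sizes):

```python
import random

random.seed(1)
n, k, trials = 5, 2, 50_000
conditional = []
for _ in range(trials):
    xs = [random.random() for _ in range(n)]
    r1 = sum(x <= xs[0] for x in xs)   # rank of the first component
    if r1 == k:                        # condition on R_1 = k
        conditional.append(xs[0])

# mean of X_1 given R_1 = k should be near k/(n+1) = 1/3
emp = sum(conditional) / len(conditional)
print(round(emp, 3))
```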
Finally, under the assumption that the moments $ {\mathsf E} \{ X _ {i} \} $ and $ {\mathsf D} \{ X _ {i} \} $ exist and that (1) holds, (2) and (3) imply that the correlation coefficient $ \rho ( X _ {i} , R _ {i} ) $ between $ X _ {i} $ and $ R _ {i} $ is equal to

$$
\rho ( X _ {i} , R _ {i} ) = \sqrt {\frac{12 ( n - 1 ) }{( n + 1 ) {\mathsf D} \{ X _ {i} \} } } \int\limits _ {- \infty } ^ \infty x _ {i} \left [ F ( x _ {i} ) - \frac{1}{2} \right ] d F ( x _ {i} ) .
$$
In particular, if $ X _ {i} $ is uniformly distributed on $ [ 0 , 1 ] $, then

$$
\rho ( X _ {i} , R _ {i} ) = \sqrt {\frac{n - 1 }{n + 1 } } .
$$
If $ X $ has the normal distribution $ N ( a , \sigma ^ {2} ) $, then

$$
\rho ( X _ {i} , R _ {i} ) = \sqrt {\frac{3 ( n - 1 ) }{\pi ( n + 1 ) } } ,
$$

and $ \rho ( X _ {i} , R _ {i} ) $ does not depend on the parameters of the normal distribution.
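The closed form for the normal case lends itself to a simulation check (a sketch of ours; sample sizes and the tolerance are arbitrary): pooling $ ( X _ {i} , R _ {i} ) $ pairs across many independent samples gives i.i.d. draws from the joint law of $ ( X _ {1} , R _ {1} ) $, so the empirical Pearson correlation should approach $ \sqrt {3 ( n - 1 ) / ( \pi ( n + 1 ) ) } $ regardless of $ a $ and $ \sigma $:

```python
import math
import random

def empirical_corr(pairs):
    # plain Pearson correlation of a list of (x, r) pairs
    xs, rs = zip(*pairs)
    mx, mr = sum(xs) / len(xs), sum(rs) / len(rs)
    cov = sum((x - mx) * (r - mr) for x, r in pairs) / len(pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / len(xs))
    sr = math.sqrt(sum((r - mr) ** 2 for r in rs) / len(rs))
    return cov / (sx * sr)

random.seed(3)
n, reps = 10, 20_000
pairs = []
for _ in range(reps):
    sample = [random.gauss(3.0, 2.0) for _ in range(n)]  # arbitrary a, sigma
    pairs.extend((x, sum(y <= x for y in sample)) for x in sample)

theory = math.sqrt(3 * (n - 1) / (math.pi * (n + 1)))
print(round(empirical_corr(pairs), 3), round(theory, 3))
```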