Tie
A group of observations in a sample that have the same value. Let $ X_{1}, \dots, X_{n} $
be independent random variables subject to the same absolutely continuous probability law with probability density $ p(x) $.
Then with probability $ 1 $,
none of the observations $ X_{1}, \dots, X_{n} $
will be equal, that is, $ X_{i} \neq X_{j} $
if $ i \neq j $,
and so every member $ X_{(i)} $, $ i \geq 2 $,
of the order statistics (cf. Order statistic)
$$ \tag{*} X_{(1)} < \dots < X_{(n)} $$
constructed from the sample $ X_{1}, \dots, X_{n} $ will be strictly greater than its predecessor $ X_{(i-1)} $.
However, in practice, because of rounding-off errors in the calculation of $ X_{1}, \dots, X_{n} $, several groups of observations can arise, in each of which the observations are all equal. Every such group of coincident observations is called a tie. Thus, instead of (*), the experimenter may observe the order statistics
$$ X_{(1)} = \dots = X_{(\tau_{1})} < X_{(\tau_{1}+1)} = \dots = X_{(\tau_{1}+\tau_{2})} < \dots $$
$$ \dots < X_{(\tau_{1}+\dots+\tau_{k-1}+1)} = \dots = X_{(\tau_{1}+\dots+\tau_{k})} , $$
where all $ \tau _ {i} \geq 1 $ and $ \tau _ {1} + \dots + \tau _ {k} = n $. Thus, when ties occur, that is, when some $ \tau _ {j} \geq 2 $, difficulties arise in defining the rank vector, which plays a basic role in the construction of rank statistics (cf. Rank statistic). As yet (1992) there are no precise recommendations for defining the ranks of coincident observations. There are two common approaches to the solution of this problem. The first consists of randomization. According to this approach, the ranks of the elements
$$ X_{(\tau_{1}+\dots+\tau_{j-1}+1)} = \dots = X_{(\tau_{1}+\dots+\tau_{j})} $$
making up the $ j $-th group are taken to be some permutation of the numbers
$$ \tau_{1}+\dots+\tau_{j-1}+1 ,\ \tau_{1}+\dots+\tau_{j-1}+2 ,\ \dots $$
$$ \dots ,\ \tau_{1}+\dots+\tau_{j} , $$
each permutation having probability $ 1/\tau_{j}! $. The merit of this approach lies in its simplicity, but for certain alternatives with respect to the distribution of the $ X_{i} $, the actual randomization chosen has an effect on the results of the statistical analysis.
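The randomization approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `tie_sizes` and `randomized_ranks` are hypothetical, ties are detected by exact equality of the recorded values, and the standard-library generator `random.Random` supplies the uniform permutation within each tied group.

```python
import random
from itertools import groupby


def tie_sizes(sample):
    """Return the group sizes tau_1, ..., tau_k, ordered by increasing value."""
    return [len(list(group)) for _, group in groupby(sorted(sample))]


def randomized_ranks(sample, rng=None):
    """Assign the ranks 1..n, breaking each tie by a uniform random permutation."""
    rng = rng or random.Random()
    n = len(sample)
    # Indices of the observations, sorted by observed value.
    order = sorted(range(n), key=lambda i: sample[i])
    ranks = [0] * n
    start = 0  # tau_1 + ... + tau_{j-1}
    for _, group in groupby(order, key=lambda i: sample[i]):
        members = list(group)
        # Ranks available to the j-th group: start + 1, ..., start + tau_j.
        block = list(range(start + 1, start + len(members) + 1))
        rng.shuffle(block)  # each of the tau_j! orderings is equally likely
        for index, rank in zip(members, block):
            ranks[index] = rank
        start += len(members)
    return ranks


sample = [2.0, 1.0, 2.0, 3.0, 1.0, 2.0]
print(tie_sizes(sample))         # [2, 3, 1]
print(randomized_ranks(sample))  # varies from run to run within tied groups
```

Repeated calls can return different rank vectors for the same data, which is exactly why the chosen randomization may influence the outcome of the analysis.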
In the second approach, all tied observations
$$ X_{(\tau_{1}+\dots+\tau_{j-1}+1)} = \dots = X_{(\tau_{1}+\dots+\tau_{j})} $$
making up the $ j $-th group are assigned the same, so-called midrank
$$ r_{j} = \tau_{1}+\dots+\tau_{j-1} + \frac{\tau_{j}+1}{2} , $$
equal to the arithmetic mean of the numbers
$$ \tau_{1}+\dots+\tau_{j-1}+1 ,\ \tau_{1}+\dots+\tau_{j-1}+2 ,\ \dots $$
$$ \dots ,\ \tau_{1}+\dots+\tau_{j} . $$
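As a minimal sketch of the midrank rule, the following Python function assigns to every member of a tied group the value $ \tau_{1}+\dots+\tau_{j-1} + (\tau_{j}+1)/2 $. The function name `midranks` is hypothetical, and ties are again assumed to mean exact equality of the recorded values.

```python
from itertools import groupby


def midranks(sample):
    """Return the midrank of each observation, in the original sample order."""
    n = len(sample)
    # Indices of the observations, sorted by observed value.
    order = sorted(range(n), key=lambda i: sample[i])
    ranks = [0.0] * n
    start = 0  # tau_1 + ... + tau_{j-1}
    for _, group in groupby(order, key=lambda i: sample[i]):
        members = list(group)
        tau = len(members)
        # Arithmetic mean of start + 1, ..., start + tau.
        mid = start + (tau + 1) / 2
        for index in members:
            ranks[index] = mid
        start += tau
    return ranks


print(midranks([2.0, 1.0, 2.0, 3.0, 1.0, 2.0]))
# [4.0, 1.5, 4.0, 6.0, 1.5, 4.0]
```

The midranks always sum to $ n(n+1)/2 $, the same total as the ranks $ 1, \dots, n $, which is why replacing ranks by midranks leaves the expectation of a linear rank statistic unchanged.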
It is natural that such a procedure also affects the properties of rank statistics, and this must be taken into account in practice. For example, the second approach is recommended in the construction of the statistic $ W $ of the Wilcoxon test when there are ties. Then the expectation $ {\mathsf E} W $ of $ W $ remains the same as in the case when there are no ties, but its variance $ {\mathsf D} W $ decreases to
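$$ {\mathsf D} W = \frac{mn(m+n+1)}{12} \left\{ 1 - \frac{1}{(m+n)[(m+n)^{2}-1]} \sum_{j=1}^{k} \tau_{j}(\tau_{j}^{2}-1) \right\} , $$
where $ m $ and $ n $ are the sizes of the two samples and $ \tau_{1}, \dots, \tau_{k} $ are the sizes of the tied groups in the pooled sample of $ m+n $ observations, and this must be taken into account when normalizing $ W $.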
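The tie correction can be checked numerically. The sketch below assumes a two-sample setting with samples `x` and `y` of sizes $ m $ and $ n $, takes the tie sizes $ \tau_{j} $ from the pooled sample, and multiplies the usual no-tie variance $ mn(m+n+1)/12 $ by the correction factor. The function name `wilcoxon_variance` is hypothetical and is not taken from any library.

```python
from collections import Counter


def wilcoxon_variance(x, y):
    """Variance of the midrank-based Wilcoxon rank-sum statistic under the null."""
    m, n = len(x), len(y)
    total = m + n
    # Sizes tau_1, ..., tau_k of the groups of equal values in the pooled sample.
    taus = Counter(list(x) + list(y)).values()
    tie_term = sum(t * (t * t - 1) for t in taus)
    correction = 1 - tie_term / (total * (total * total - 1))
    return m * n * (total + 1) / 12 * correction


print(wilcoxon_variance([1, 2, 3], [4, 5]))  # no ties: 3 * 2 * 6 / 12 = 3.0
print(wilcoxon_variance([1, 2, 2], [2, 3]))  # one tie of size 3: smaller variance
```

In the second call the pooled sample has one group of size $ 3 $, so the correction factor is $ 1 - 24/120 = 0.8 $ and the variance drops from $ 3 $ to $ 2.4 $. The same value is obtained by enumerating all $ \binom{5}{3} $ assignments of the midranks $ 1, 3, 3, 3, 5 $ to the first sample.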
Tie. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Tie&oldid=48976