Revision as of 08:25, 6 June 2020
Random intervals constructed from independent identically-distributed random variables with unknown distribution function $F(x)$, containing with given probability $\gamma$ at least a proportion $p$ ($0 < p < 1$) of the probability measure $dF$.
Let $X_1, \dots, X_n$ be independent and identically-distributed random variables with unknown distribution function $F(x)$, and let $T_1 = T_1(X_1, \dots, X_n)$, $T_2 = T_2(X_1, \dots, X_n)$ be statistics such that, for a number $p$ ($0 < p < 1$) fixed in advance, the event $\{ F(T_2) - F(T_1) \geq p \}$ has a given probability $\gamma$, that is,
$$ \tag{1} {\mathsf P} \left\{ \int_{T_1}^{T_2} dF(x) \geq p \right\} = \gamma. $$
In this case the random interval $(T_1, T_2)$ is called a $\gamma$-tolerance interval for the distribution function $F(x)$, its end points $T_1$ and $T_2$ are called tolerance bounds, and the probability $\gamma$ is called a confidence coefficient. It follows from (1) that the one-sided tolerance bounds $T_1$ and $T_2$ (i.e. with $T_2 = +\infty$, respectively $T_1 = -\infty$) are the usual one-sided confidence bounds with confidence coefficient $\gamma$ for the quantiles $x_{1-p} = F^{-1}(1-p)$ and $x_p = F^{-1}(p)$, respectively, that is,
$$ {\mathsf P} \{ x_{1-p} \in [T_1, +\infty) \} = \gamma, $$
$$ {\mathsf P} \{ x_p \in (-\infty, T_2] \} = \gamma. $$
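Definition (1) is easy to check by simulation. A minimal sketch (assuming NumPy; the sample size $n = 46$ is a known tabulated value for which the pair $(\min, \max)$ covers a proportion $p = 0.9$ with probability close to $0.95$): for $U(0,1)$ data $F(x) = x$, so the covered proportion $F(T_2) - F(T_1)$ is simply $T_2 - T_1$.

```python
# Monte Carlo check of definition (1) for uniform data, where F(x) = x and
# the covered proportion F(T_2) - F(T_1) equals T_2 - T_1 directly.
# Here T_1 = sample minimum, T_2 = sample maximum.
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 46, 0.9, 200_000
samples = rng.random((reps, n))                 # reps samples of size n from U(0, 1)
coverage = samples.max(axis=1) - samples.min(axis=1)
gamma_hat = np.mean(coverage >= p)             # estimate of gamma in (1)
```

The estimate `gamma_hat` should land near $0.95$, illustrating that $(T_1, T_2)$ is (approximately) a $0.95$-tolerance interval for $p = 0.9$ at this sample size.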
Example. Let $X_1, \dots, X_n$ be independent random variables having a normal distribution $N(a, \sigma^2)$ with unknown parameters $a$ and $\sigma^2$. In this case it is natural to take the tolerance bounds $T_1$ and $T_2$ to be functions of the sufficient statistic $(\overline{X}, S^2)$, where
$$ \overline{X} = \frac{X_1 + \dots + X_n}{n}, \qquad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \overline{X})^2. $$
Specifically, one takes $T_1 = \overline{X} - kS$ and $T_2 = \overline{X} + kS$, where the constant $k$, called the tolerance multiplier, is obtained as the solution of the equation
$$ {\mathsf P} \left\{ \Phi\left( \frac{\overline{X} + kS - a}{\sigma} \right) - \Phi\left( \frac{\overline{X} - kS - a}{\sigma} \right) \geq p \right\} = \gamma, $$
where $\Phi(x)$ is the distribution function of the standard normal law; moreover, $k = k(n, \gamma, p)$ does not depend on the unknown parameters $a$ and $\sigma^2$. The tolerance interval constructed in this way has the following property: with confidence probability $\gamma$ the interval $(\overline{X} - kS, \overline{X} + kS)$ contains at least a proportion $p$ of the probability mass of the normal distribution of the variables $X_1, \dots, X_n$.
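Exact values of $k(n, \gamma, p)$ are tabulated (cf. [1]); a widely used closed-form approximation due to Wald and Wolfowitz [8] can be sketched as follows. This is a hedged illustration assuming SciPy; the function name `tolerance_multiplier` is ours, not from the article, and the result is an approximation, not the exact solution of the equation above.

```python
# Wald-Wolfowitz approximation to the tolerance multiplier k = k(n, gamma, p)
# for the two-sided normal tolerance interval (Xbar - k*S, Xbar + k*S).
from scipy.optimize import brentq
from scipy.stats import chi2, norm

def tolerance_multiplier(n, gamma, p):
    # Step 1: find r solving Phi(1/sqrt(n) + r) - Phi(1/sqrt(n) - r) = p.
    u = 1.0 / n ** 0.5
    r = brentq(lambda t: norm.cdf(u + t) - norm.cdf(u - t) - p, 1e-9, 20.0)
    # Step 2: scale r by the lower (1 - gamma)-quantile of chi^2 with n - 1 d.f.
    return r * ((n - 1) / chi2.ppf(1 - gamma, n - 1)) ** 0.5
```

For example, `tolerance_multiplier(30, 0.95, 0.90)` comes out close to the tabulated value $k \approx 2.140$, and $k$ decreases toward the central-interval half-width as $n$ grows.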
Assuming the existence of a probability density function $f(x) = F'(x)$, the probability of the event $\{ F(T_2) - F(T_1) \geq p \}$ is independent of $F(x)$ if and only if $T_1$ and $T_2$ are order statistics (cf. Order statistic). Precisely this fact is the basis of a general method for constructing non-parametric, or distribution-free, tolerance intervals. Let $X^{(*)} = (X_{(n1)}, \dots, X_{(nn)})$ be the vector of order statistics constructed from the sample $X_1, \dots, X_n$ and let
$$ T_1 = X_{(nr)}, \qquad T_2 = X_{(ns)}, \qquad 1 \leq r < s \leq n. $$
Since the random variable $F(X_{(ns)}) - F(X_{(nr)})$ has the beta-distribution with parameters $s - r$ and $n - s + r + 1$, the probability of the event $\{ F(X_{(ns)}) - F(X_{(nr)}) \geq p \}$ can be calculated as the integral $I_{1-p}(n - s + r + 1, s - r)$, where $I_x(a, b)$ is the incomplete beta-function, and hence in this case instead of (1) one obtains the relation
$$ \tag{2} I_{1-p}(n - s + r + 1, s - r) = \gamma, $$
which allows one, for given $\gamma$, $p$ and $n$, to determine numbers $r$ and $s$ so that the order statistics $X_{(nr)}$ and $X_{(ns)}$ are the tolerance bounds of the desired tolerance interval. Moreover, for given $\gamma$, $p$, $r$ and $s$, relation (2) allows one to determine the size $n$ of the sample $X_1, \dots, X_n$ necessary for relation (2) to hold. There are statistical tables available for solving such problems.
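For instance, with $r = 1$, $s = n$ (the sample extremes as tolerance bounds), relation (2) reduces to $I_{1-p}(2, n-1) = \gamma$, and the smallest $n$ attaining at least $\gamma$ can be found numerically. A minimal sketch assuming SciPy (the helper name `min_sample_size` is ours, not from the article):

```python
# Smallest sample size n for which (X_(n1), X_(nn)) is a gamma-tolerance
# interval for a proportion p, via relation (2) with r = 1, s = n:
# I_{1-p}(n - s + r + 1, s - r) = I_{1-p}(2, n - 1).
from scipy.special import betainc  # regularized incomplete beta I_x(a, b)

def min_sample_size(p, gamma, n_max=100_000):
    for n in range(2, n_max + 1):
        # coverage probability P{ F(X_(nn)) - F(X_(n1)) >= p }
        if betainc(2, n - 1, 1 - p) >= gamma:
            return n
    raise ValueError("no sample size up to n_max achieves the required gamma")
```

For $p = 0.90$, $\gamma = 0.95$ this gives $n = 46$, and for $p = 0.95$, $\gamma = 0.95$ it gives $n = 93$, in agreement with the classical tables.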
References
[1] L.N. Bol'shev, N.V. Smirnov, "Tables of mathematical statistics", Libr. math. tables, 46, Nauka (1983) (In Russian) (Processed by L.S. Bark and E.S. Kedrova)
[2] S.S. Wilks, "Mathematical statistics", Wiley (1962)
[3] H.A. David, "Order statistics", Wiley (1981)
[4] R.B. Murphy, "Non-parametric tolerance limits", Ann. Math. Stat., 19 (1948) pp. 581–589
[5] P.N. Somerville, "Tables for obtaining non-parametric tolerance limits", Ann. Math. Stat., 29 (1958) pp. 599–601
[6] H. Scheffé, J.W. Tukey, "Non-parametric estimation I. Validation of order statistics", Ann. Math. Stat., 16 (1945) pp. 187–192
[7] D.A.S. Fraser, "Nonparametric methods in statistics", Wiley (1957)
[8] A. Wald, J. Wolfowitz, "Tolerance limits for a normal distribution", Ann. Math. Stat., 17 (1946) pp. 208–215
[9] H. Robbins, "On distribution-free tolerance limits in random sampling", Ann. Math. Stat., 15 (1944) pp. 214–216
Tolerance intervals. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Tolerance_intervals&oldid=48983