Difference between revisions of "Sample method"

Latest revision as of 08:12, 6 June 2020

A statistical method for the study of the general properties of a certain population of objects by studying the properties of only a sample (a part) of these objects. The mathematical theory of sample methods is based on two important sections of mathematical statistics — the theory of sampling from a finite population and the theory of sampling from an infinite population. The fundamental difference between the sampling theory for finite and infinite populations consists in the fact that in the former case the theory is usually applied to objects of a non-random, determined nature (for example, the number of defective articles in a given industrial batch of products is not a random variable: it is an unknown constant which must be estimated from the sampling data). In the latter case the theory is usually employed to study the properties of random objects (for example, to study the properties of continuously-distributed random experimental errors, each one of which may be interpreted, in principle, as the realization of one out of an infinite set of possible results).

Samples from finite populations and their theory form the basis of methods in statistical quality control and are often employed in sociological studies. According to probability theory, the sample will correctly reproduce the properties of the population as a whole if the sampling is conducted at random, i.e. so that each of the possible samples of a given size $ n $ out of a population of size $ N $ (the number of such samples is $ N ! / ( n ! ( N - n) ! ) $) has an equal chance of being selected.
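
As a minimal sketch of simple random sampling, the following Python fragment counts the possible samples and draws one of them with equal probability. The population labels, the sizes $ N = 10 $ and $ n = 3 $, and the function names are illustrative assumptions, not part of the original article.

```python
import math
import random

def count_samples(N, n):
    """Number of distinct samples of size n from a population of size N."""
    return math.comb(N, n)

def draw_simple_random_sample(population, n, rng):
    """Draw n distinct items so that every subset of size n is equally likely."""
    return rng.sample(population, n)

population = list(range(10))   # hypothetical labels 0..9, so N = 10
rng = random.Random(0)         # fixed seed, used only for reproducibility
sample = draw_simple_random_sample(population, 3, rng)
print(count_samples(10, 3), sample)
```

The standard-library function `random.sample` draws without replacement, which matches the definition of a random sample given above.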

The method most often used in practice is sampling without replacement, in which an item already chosen is not returned to the population under study before the next items are drawn (for example, in drawing winning lottery tickets, in statistical quality control and in long-term demographic investigations). Sampling with replacement is usually employed only in theoretical studies (an example of this technique is the recording of the number of particles colliding with the container walls during a given period of time in the study of Brownian motion). If $ n \ll N $, the two techniques give practically equivalent results.
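
One way to see why the two techniques nearly coincide for $ n \ll N $ is to compute the probability that a sample drawn with replacement contains no repeated item. When this probability is close to one, such a sample is almost always also a valid sample without replacement. The sketch below assumes equally likely draws; the function name and the numerical values are hypothetical.

```python
def prob_no_repeats(N, n):
    """Probability that n equally likely draws with replacement from N items are all distinct."""
    p = 1.0
    for i in range(n):
        p *= (N - i) / N
    return p

small_fraction = prob_no_repeats(1_000_000, 30)  # n much smaller than N
large_fraction = prob_no_repeats(50, 30)         # n comparable to N
print(small_fraction, large_fraction)
```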

The properties of the populations which are studied by the sample method may be qualitative or quantitative. In the former case the task consists of estimating the number $ M $ of items in the population having a certain characteristic (e.g. during statistical control the parameter of interest is often the number $ M $ of defective items in a batch of $ N $ items). $ M $ is estimated by the ratio $ mN/n $, where $ m $ is the number of items displaying the characteristic under study in a sample of size $ n $. In the case of a quantitative characteristic the task consists of determining the mean value $ \overline{x}\; = ( x _ {1} + \dots + x _ {N} ) / N $ of the population. The value $ \overline{x}\; $ is estimated by means of the sample average

$$ \overline{X}\; = \frac{X _ {1} + \dots + X _ {n} }{n} , $$

where $ X _ {1} , \dots , X _ {n} $ are those of the population values $ x _ {1} , \dots , x _ {N} $ under study which belong to the sample. From the mathematical point of view, the former situation is a special case of the latter, which occurs if $ M $ of the variables $ x _ {i} $ are equal to one, while the remaining $ N - M $ are zero; in this situation $ \overline{x}\; = M/N $ and $ \overline{X}\; = m/n $.
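
The two estimates can be illustrated with a short sketch. The batch below (1000 items, 50 of them defective, sample size 100) is a made-up example, and the helper names are assumptions; the code only shows that, with items coded as 0 or 1, the sample average equals $ m/n $ and the estimate of $ M $ is $ mN/n $.

```python
import random

def estimate_count(m, n, N):
    """Estimate of M, the number of marked items in the population: m * N / n."""
    return m * N / n

def sample_mean(values):
    """Sample average of the observed values."""
    return sum(values) / len(values)

# Hypothetical batch: N = 1000 items, M = 50 defective (coded 1), the rest coded 0.
N, M, n = 1000, 50, 100
population = [1] * M + [0] * (N - M)

rng = random.Random(1)               # fixed seed, used only for reproducibility
sample = rng.sample(population, n)   # sampling without replacement
m = sum(sample)                      # number of defective items in the sample
M_hat = estimate_count(m, n, N)
print(m, M_hat, sample_mean(sample))
```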

In the mathematical theory of sample methods, estimating the mean value is the key operation, since this value forms the basis of a quantitative description of the variability of the characteristic within the population; in fact, the variability of the characteristic is usually defined as the variance

$$ \sigma ^ {2} = \ \frac{( x _ {1} - \overline{x}\; ) ^ {2} + \dots + ( x _ {N} - \overline{x}\; ) ^ {2} }{N } , $$

which is the average of the squares of the deviations of $ x _ {i} $ from the average value $ \overline{x}\; $. If a qualitative characteristic is studied, then

$$ \sigma ^ {2} = \frac{M ( N - M) }{N ^ {2} } . $$
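
This expression follows from the general definition in one step. For a qualitative characteristic each $ x _ {i} $ is 0 or 1, so that $ x _ {i} ^ {2} = x _ {i} $ and $ \overline{x}\; = M/N $, whence

$$ \sigma ^ {2} = \frac{1}{N} \sum _ {i = 1 } ^ {N} x _ {i} ^ {2} - \overline{x}\; ^ {2} = \frac{M}{N} - \frac{M ^ {2} }{N ^ {2} } = \frac{M ( N - M) }{N ^ {2} } . $$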

The accuracies of the estimates $ m/n $ and $ \overline{X}\; $ are found from their variances

$$ \sigma _ {m/n } ^ {2} = {\mathsf E} \left ( { \frac{m}{n} } - { \frac{M}{N} } \right ) ^ {2} \ \ \textrm{ and } \ \ \sigma _ {\overline{X}\; } ^ {2} = {\mathsf E} ( \overline{X}\; - \overline{x}\; ) ^ {2} , $$

which are expressed in terms of the variance $ \sigma ^ {2} $ of the finite population as $ \sigma ^ {2} / n $ (in the case of sampling with replacement) and $ \sigma ^ {2} ( N - n) / ( n ( N - 1) ) $ (in the case of sampling without replacement). Since in many problems of practical interest the random variables $ m/n $ and $ \overline{X}\; $ roughly follow a normal distribution if $ n \geq 30 $, deviations of $ m/n $ from $ M/N $ and of $ \overline{X}\; $ from $ \overline{x}\; $ with absolute values larger than $ 2 \sigma _ {m/n} $ and $ 2 \sigma _ {\overline{X}\; } $, respectively, occur on average in about one case in twenty.
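
For a small population both variance formulas can be checked by brute force, by enumerating every equally likely sample. The sketch below does this for a made-up population of six values and $ n = 3 $. It also evaluates the two-sided normal tail probability beyond two standard deviations, which is the source of the "one case in twenty" rule of thumb. All names and data are illustrative assumptions.

```python
import math
from itertools import combinations, product

def population_variance(xs):
    """Variance of a finite population, with divisor N."""
    N = len(xs)
    mean = sum(xs) / N
    return sum((x - mean) ** 2 for x in xs) / N

def exact_variance_of_mean(xs, n, replace):
    """E(X_bar - x_bar)^2, computed by enumerating all equally likely samples."""
    mean = sum(xs) / len(xs)
    if replace:
        samples = product(xs, repeat=n)   # N**n ordered samples
    else:
        samples = combinations(xs, n)     # C(N, n) unordered samples
    squared = [(sum(s) / n - mean) ** 2 for s in samples]
    return sum(squared) / len(squared)

xs = [2.0, 3.0, 5.0, 7.0, 11.0, 13.0]     # hypothetical population, N = 6
N, n = len(xs), 3
s2 = population_variance(xs)

with_formula = s2 / n                           # sampling with replacement
without_formula = s2 * (N - n) / (n * (N - 1))  # sampling without replacement

# P(|Z| > 2) for a standard normal Z, roughly 1/20.
two_sigma_tail = math.erfc(2 / math.sqrt(2))

print(with_formula, exact_variance_of_mean(xs, n, True))
print(without_formula, exact_variance_of_mean(xs, n, False))
print(two_sigma_tail)
```

The factor $ ( N - n) / ( N - 1) $ is less than one whenever $ n > 1 $, so sampling without replacement gives the smaller variance; for $ n \ll N $ the factor is close to one.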

More complete information about the distribution of a quantitative characteristic in a given population may be obtained from the empirical distribution of this characteristic in the sample.

Sampling from an infinite population.

It is usual in mathematical statistics to describe as a sample the results of a series of homogeneous observations (mostly independent ones), even though this differs from the concept of a sample from a finite population with or without replacement. Thus, measurements of angles, which involve continuously-distributed random errors, would be described as a sample from an infinite population. It is assumed that it is possible, in principle, to carry out any desired number of such observations. The results obtained form a so-called sample from an infinite set of possible results, which is called the general aggregate. The concept of a general aggregate is neither logically unobjectionable nor indispensable. In solving practical problems, there is no need of the infinite general aggregate itself, but only of certain characteristics corresponding to it.

From the point of view of probability theory, these characteristics are numerical or functional properties of a certain probability distribution, while the sample items are random variables subject to this distribution. Such an interpretation makes it possible to apply the general theory of statistical estimation to sample estimates. This is why, when observations are processed, the concept of an infinite general aggregate is replaced by the concept of a probability distribution involving unknown parameters. The results of the observations are treated as experimentally found values of random variables subject to this distribution. The objective of the processing is to use the results of the observations to compute statistical estimators, optimal in some sense, for the unknown distribution parameters.
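
A minimal sketch of this point of view: repeated measurements of an angle are simulated as draws from a probability distribution with unknown parameters, which are then estimated from the observations. The normal error model, the "true" angle and error size, and the choice of the sample mean and sample standard deviation as estimators are all assumptions made for illustration only.

```python
import random
import statistics

rng = random.Random(42)               # fixed seed, used only for reproducibility

# Hypothetical model: angle of 30 degrees measured with normal errors of
# standard deviation 0.05 degrees. In practice both would be unknown.
true_angle, true_sigma = 30.0, 0.05
observations = [rng.gauss(true_angle, true_sigma) for _ in range(200)]

mean_hat = statistics.fmean(observations)    # estimator of the distribution mean
sigma_hat = statistics.stdev(observations)   # estimator of the standard deviation
print(mean_hat, sigma_hat)
```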

So far, the concern has been with sampling from a single population. In practice, however, sampling is often performed on several populations of the same kind (e.g. in estimating the fraction of defective articles in several batches of finished industrial products). In such a situation the object of study is no longer a single number $ M $, but several unknown numbers $ M _ {1} , M _ {2} , \dots $. For instance, let each batch of the finished product contain $ N $ articles, let $ M _ {1} , M _ {2} , \dots $ be the numbers of defective articles in these batches, and let $ m _ {1} , m _ {2} , \dots $ be the corresponding numbers of defective articles found in samples of size $ n $. If the so-called principle of defect-free acceptance is adopted, the $ i $-th batch is delivered to the customer if $ m _ {i} = 0 $ and is rejected otherwise. If it is assumed that the control of the articles involves their destruction, then the customer obtains either a batch of size $ R _ {i} = 0 $ (if $ m _ {i} > 0 $), so that $ D _ {i} = 0 $, or a batch of size $ R _ {i} = N - n $ containing $ D _ {i} = M _ {i} $ defective articles (if $ m _ {i} = 0 $). The values of $ R _ {1} , R _ {2} , \dots $ (and hence their sum) are known, while the value of $ D _ {1} + D _ {2} + \dots $ is not. The ratio $ ( D _ {1} + D _ {2} + \dots ) / ( R _ {1} + R _ {2} + \dots ) $ is known as the fraction of passed defectives, and its mathematical expectation $ q $ is known as the average fraction of passed defectives. The task of mathematical statistics is to estimate $ q $ from the values of $ R _ {1} , R _ {2} , \dots $, which are determined using the sample method.

If the values $ M _ {1} , M _ {2} , \dots $ may be treated as realizations of independent identically-distributed random variables with a known distribution law $ {\mathsf P} \{ M _ {i} = r \} = p _ {r} $, then, according to the Bayes formula, a statistical estimator of the average number of passed defective articles in an accepted batch can be expressed by the formula

$$ \widetilde{D} = \ {\mathsf E} \{ M \mid m = 0 \} = \ \frac{\left ( \sum _ {r = 1 } ^ { {N } - n } r \frac{C _ {N - r } ^ {n} }{C _ {N} ^ {n} } p _ {r} \right ) }{ {\mathsf P} \{ m = 0 \} } , $$

and

$$ \widetilde{D} \leq \frac{( N - n ) {\mathsf P} \{ m = 1 \} }{n {\mathsf P} \{ m= 0 \} } , $$

where

$$ {\mathsf P} \{ m = k \} = \ \sum _ {r = k } ^ { N - n + k } \frac{C _ {r} ^ {k} C _ {N - r } ^ {n - k} }{C _ {N} ^ {n} } p _ {r} ,\ \ k = 0 , \dots , n. $$

For this reason the estimator

$$ \widetilde{q} = \frac{\widetilde{D} }{( N - n) } $$

of the average fraction of passed defectives in the accepted batches satisfies the inequality

$$ \widetilde{q} \leq \frac{ {\mathsf P} \{ m = 1 \} }{n {\mathsf P} \{ m= 0 \} } \approx \frac{s _ {1} }{ns _ {0} } , $$

where $ s _ {0} $ is the number of accepted batches, while $ s _ {1} $ is the number of batches whose samples yielded exactly one defective article (such batches are rejected).
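
The estimator and its upper bound can be evaluated numerically once a prior $ p _ {r} $ is fixed. The sketch below is only an illustration under stated assumptions: it uses the standard hypergeometric probability $ C _ {r} ^ {k} C _ {N - r } ^ {n - k} / C _ {N} ^ {n} $ of finding $ k $ defective articles in the sample when the batch contains $ r $, and it takes a binomial prior with a defect probability of 0.02, a batch size of 50 and a sample size of 10. None of these numbers or function names come from the original article.

```python
from math import comb

def prob_m(k, N, n, p):
    """P{m = k}: hypergeometric probabilities averaged over the prior p[r]."""
    return sum(
        comb(r, k) * comb(N - r, n - k) / comb(N, n) * p[r]
        for r in range(N + 1)
    )

def passed_defectives(N, n, p):
    """D~ = E{M | m = 0}: expected number of defectives in an accepted batch."""
    numerator = sum(
        r * comb(N - r, n) / comb(N, n) * p[r]
        for r in range(1, N - n + 1)
    )
    return numerator / prob_m(0, N, n, p)

def binomial_prior(N, theta):
    """Hypothetical prior: each article is defective independently with probability theta."""
    return [comb(N, r) * theta ** r * (1 - theta) ** (N - r) for r in range(N + 1)]

N, n, theta = 50, 10, 0.02
p = binomial_prior(N, theta)

D = passed_defectives(N, n, p)
bound = (N - n) * prob_m(1, N, n, p) / (n * prob_m(0, N, n, p))
q = D / (N - n)          # estimator of the average fraction of passed defectives
print(D, bound, q)
```

Under this particular prior the untested articles are independent of the tested ones, so the estimator reduces to $ ( N - n) \theta $ and $ \widetilde{q} $ equals the assumed defect probability; the bound is slightly larger, as the inequality requires.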

References

[1]	N.V. Smirnov, I.V. Dunin-Barkovskii, "Mathematische Statistik in der Technik" , Deutsch. Verlag Wissenschaft. (1969) (Translated from Russian)
[2]	Yu.K. Belyaev, "Probabilistic methods of sample control" , Moscow (1975) (In Russian)
[3]	M.G. Kendall, A. Stuart, "The advanced theory of statistics. Distribution theory" , 3. Design and analysis , Griffin (1969)

Comments

References

[a1]	J.M. Juran (ed.) , Quality control handbook , McGraw-Hill (1962)

How to Cite This Entry:
Sample method. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Sample_method&oldid=16362

This article was adapted from an original article by L.N. Bol'shev (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.

@@ Line 1: / Line 1: @@
+<!--
+s0831701.png
+$#A+1 = 68 n = 0
+$#C+1 = 68 : ~/encyclopedia/old_files/data/S083/S.0803170 Sample method
+Automatically converted into TeX, above some diagnostics.
+Please remove this comment and the {{TEX|auto}} line below,
+if TeX found to be correct.
+-->
+{{TEX|auto}}
+{{TEX|done}}
 A statistical method for the study of the general properties of a certain population of objects by studying the properties of only a sample (a part) of these objects. The mathematical theory of sample methods is based on two important sections of mathematical statistics — the theory of sampling from a finite population and the theory of sampling from an infinite population. The fundamental difference between the sampling theory for finite and infinite populations consists in the fact that in the former case the theory is usually applied to objects of a non-random, determined nature (for example, the number of defective articles in a given industrial batch of products is not a random variable: it is an unknown constant which must be estimated from the sampling data). In the latter case the theory is usually employed to study the properties of random objects (for example, to study the properties of continuously-distributed random experimental errors, each one of which may be interpreted, in principle, as the realization of one out of an infinite set of possible results).
-Samples from finite populations and their theory form the base of methods in [[Statistical quality control|statistical quality control]] and are often employed in sociological studies. According to probability theory, the sample will correctly reproduce the properties of the population as a whole if the sampling is conducted at random, i.e. so that each one of the possible samples of a given size <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s0831701.png" /> out of a population of size <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s0831702.png" /> (the number of such samples is <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s0831703.png" />) has an equal chance of being selected in actual practice.
+Samples from finite populations and their theory form the base of methods in [[Statistical quality control|statistical quality control]] and are often employed in sociological studies. According to probability theory, the sample will correctly reproduce the properties of the population as a whole if the sampling is conducted at random, i.e. so that each one of the possible samples of a given size  $  n $
+out of a population of size  $  N $(
+the number of such samples is  $  N ! / n ! ( N - n) ! $)
+has an equal chance of being selected in actual practice.
-The method which is most often used in practice is sampling without replacement, the item already chosen not being returned to the population under study before taking the next items in the sample (for example, in drawing the winning lottery tickets, in statistical quality control and in long lasting demographic investigations). Sampling with replacement is usually employed in theoretical studies only (an example of this technique is the recording of the number of particles colliding with the container walls during a given period of time in the study of [[Brownian motion|Brownian motion]]). If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s0831704.png" />, practically equivalent results are obtained from the two techniques.
+The method which is most often used in practice is sampling without replacement, the item already chosen not being returned to the population under study before taking the next items in the sample (for example, in drawing the winning lottery tickets, in statistical quality control and in long lasting demographic investigations). Sampling with replacement is usually employed in theoretical studies only (an example of this technique is the recording of the number of particles colliding with the container walls during a given period of time in the study of [[Brownian motion|Brownian motion]]). If  $  n \ll  N $,
+practically equivalent results are obtained from the two techniques.
-The properties of the populations which are studied by the sample method may be qualitative and quantitative. In the former case the task of investigating the sample consists of finding the number <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s0831705.png" /> of items in the population having certain characteristics (e.g. during statistical control the parameter of interest is often the number <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s0831706.png" /> of defective items in a batch of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s0831707.png" /> items). <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s0831708.png" /> is estimated by the ratio <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s0831709.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317010.png" /> is the number of items displaying the characteristic under study in a sample of size <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317011.png" />. In the case of a quantitative characteristic the task consists of determining the mean value <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317012.png" /> of the population. The value <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317013.png" /> is estimated by means of the sample average
+The properties of the populations which are studied by the sample method may be qualitative and quantitative. In the former case the task of investigating the sample consists of finding the number  $  M $
+of items in the population having certain characteristics (e.g. during statistical control the parameter of interest is often the number  $  M $
+of defective items in a batch of  $  N $
+items).  $  M $
+is estimated by the ratio  $  mN/n $,
+where  $  m $
+is the number of items displaying the characteristic under study in a sample of size  $  n $.
+In the case of a quantitative characteristic the task consists of determining the mean value  $  \overline{x}\; = ( x _ {1} + \dots + x _ {N} ) / N $
+of the population. The value  $  \overline{x}\; $
+is estimated by means of the sample average
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317014.png" /></td> </tr></table>
+$$
+\overline{X}\;  =
+\frac{X _ {1} + \dots + X _ {N} }{n}
+ ,
+$$
-where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317015.png" /> are the numbers of quantities in the populations <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317016.png" /> under study which belong to the sample. From the mathematical point of view, the former situation is a special case of the latter, which occurs if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317017.png" /> of the variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317018.png" /> are equal to one, while the remaining <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317019.png" /> are zero; in this situation <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317020.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317021.png" />.
+where  $  X _ {1} \dots X _ {N} $
+are the numbers of quantities in the populations  $  x _ {1} \dots x _ {N} $
+under study which belong to the sample. From the mathematical point of view, the former situation is a special case of the latter, which occurs if  $  M $
+of the variables  $  x _ {i} $
+are equal to one, while the remaining  $  ( N - M ) $
+are zero; in this situation  $  \overline{x}\; = M/N $
+and  $  \overline{X}\; = m/n $.
 In the mathematical theory of sample methods, estimating the mean value is the key operation, since this value forms the base of a quantitative description of the variability of the characteristic within the population; in fact, the variability of the characteristic is usually defined as the variance
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317022.png" /></td> </tr></table>
+$$
+\sigma  ^ {2}  = \
-which is the average of the squares of the deviations of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317023.png" /> from the average value <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317024.png" />. If a qualitative characteristic is studied, then
+\frac{( x _ {1} - \overline{x}\; )  ^ {2} + \dots + ( x _ {N} - \overline{x}\; )  ^ {2} }{N }
+ ,
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317025.png" /></td> </tr></table>
+which is the average of the squares of the deviations of  $  x _ {i} $
+from the average value  $  \overline{x}\; $.
+If a qualitative characteristic is studied, then
-The accuracies of the estimates <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317026.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317027.png" /> are found from their variances
+$$
+\sigma  ^ {2}  =
+\frac{M ( N - M) }{N  ^ {2} }
+ .
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317028.png" /></td> </tr></table>
+The accuracies of the estimates  $  m/n $
+and  $  \overline{X}\; $
+are found from their variances
-which are expressed, in terms of the variances <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317029.png" /> of the finite population, as the ratios <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317030.png" /> (in the case of sampling with replacement) and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317031.png" /> (in the case of sampling without replacement). Since in many problems of practical interest the random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317032.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317033.png" /> roughly follow a [[Normal distribution|normal distribution]] if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317034.png" />, it follows that the deviations of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317035.png" /> from <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317036.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317037.png" /> from <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317038.png" />, with absolute values larger than <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317039.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317040.png" />, respectively, may occur, on the average, in about one case in twenty.
+$$
+\sigma _ {m/n }   ^ {2}  =  {\mathsf E} \left (
+{
+\frac{m}{n}
+ } - {
+\frac{M}{N}
+ } \right )  ^ {2} \ \
+\textrm{ and } \ \
+\sigma _ {\overline{X}\; }   ^ {2}  =  {\mathsf E} ( \overline{X}\; - \overline{x}\; )  ^ {2} ,
+$$
+which are expressed, in terms of the variances  $  \sigma  ^ {2} $
+of the finite population, as the ratios  $  \sigma  ^ {2} / n $(
+in the case of sampling with replacement) and  $  \sigma  ^ {2} ( N - n)/n( N - 1) $(
+in the case of sampling without replacement). Since in many problems of practical interest the random variables  $  m/n $
+and  $  \overline{X}\; $
+roughly follow a [[Normal distribution|normal distribution]] if  $  n \geq  30 $,
+it follows that the deviations of  $  m/n $
+from  $  M/N $
+and  $  \overline{X}\; $
+from  $  \overline{x}\; $,
+with absolute values larger than  $  2 \sigma _ {m/n} $
+and  $  2 \sigma _ {\overline{X}\; }  $,
+respectively, may occur, on the average, in about one case in twenty.
 More complete information about the distribution of a quantitative characteristic in a given population may be obtained from the [[Empirical distribution|empirical distribution]] of this characteristic in the sample.
@@ Line 30: / Line 99: @@
 It is usual in mathematical statistics to describe as a sample the results of given homogeneous observations (mostly independent ones) even through this differs from the concept of a sample from a finite population with or without replacement. Thus, the measurements of angles, which involve continuously-distributed random errors, would be denoted as a sample from an infinite population. It is assumed that it is possible, in principle, to carry out any desired number of such observations. The results obtained form a so-called sample from an infinite set of possible results, which is called the [[General aggregate|general aggregate]]. The concept of a general aggregate is neither logically unobjectionable nor indispensable. In solving practical problems, there is no need of the infinite general aggregate itself, but only of certain characteristics corresponding to it. From the point of view of probability theory, these characteristics are numerical or functional properties of a certain probability distribution, while the sample items are random variables subject to this distribution. Such an interpretation makes it possible to apply the general theory of [[Statistical estimation|statistical estimation]] to sample estimates. This is why, for example, in probability theory, when processing observations, the concept of an infinite general aggregate is replaced by the concept of a probability distribution involving unknown parameters. The results of the observations are treated as experimentally found values of the random variables subject to this distribution. The objective of the processing is to use the results of the observations to compute optimal (in some sense) statistical estimators for the unknown distribution parameters.
-So far, the concern has been with sampling from one population of certain objects. In practice, however, sampling is often performed with several identical populations (e.g. in estimating the fraction of defective articles in several batches of finished industrial products). In such a situation the object of study is no longer a single number <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317041.png" />, but several unknown numbers <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317042.png" />. For instance, let each batch of the finished product contain <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317043.png" /> articles, let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317044.png" /> be the numbers of defective articles in these batches, and let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317045.png" /> be the corresponding numbers of defective articles found in samples of size <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317046.png" />. If the so-called principle of defect-free acceptance is accepted, the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317047.png" />-th batch is delivered to the customer if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317048.png" /> and is rejected otherwise. 
If it is assumed that the control of the articles involves their destruction, then the customer obtains a batch of size <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317049.png" /> (if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317050.png" />) or a batch of size <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317051.png" /> containing <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317052.png" /> (if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317053.png" />) defective articles, the values of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317054.png" /> (thus, their sum as well) being known, while the value of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317055.png" /> is not known. The ratio <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317056.png" /> is known as the fraction of passed defectives, and its mathematical expectation <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317057.png" /> is known as the average fraction of passed defectives. The task of mathematical statistics is to estimate <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317058.png" /> from the values of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317059.png" /> which are determined using the sample method. 
If the values <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317060.png" /> may be treated as the realization of independent identically-distributed random variables with a known distribution law <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317061.png" />, then, according to the [[Bayes formula|Bayes formula]], a statistical estimator of the average number of passed defective articles in the accepted batches can be expressed by the formula
+So far, the concern has been with sampling from one population of certain objects. In practice, however, sampling is often performed with several identical populations (e.g. in estimating the fraction of defective articles in several batches of finished industrial products). In such a situation the object of study is no longer a single number  $  M $,
+but several unknown numbers  $  M _ {1} , M _ {2} , .  .  . $.
+For instance, let each batch of the finished product contain  $  N $
+articles, let  $  M _ {1} , M _ {2} \dots $
+be the numbers of defective articles in these batches, and let  $  m _ {1} , m _ {2} \dots $
+be the corresponding numbers of defective articles found in samples of size  $  n $.
+If the so-called principle of defect-free acceptance is adopted, the  $  i $-th
+batch is delivered to the customer if  $  m _ {i} = 0 $
+and is rejected otherwise. If it is assumed that the control of the articles involves their destruction, then the customer obtains either a batch of size  $  R _ {i} = 0 $
+(if  $  m _ {i} > 0 $)
+or a batch of size  $  R _ {i} = N - n $
+containing  $  D _ {i} = M _ {i} $
+defective articles (if  $  m _ {i} = 0 $).
+The values of  $  R _ {1} , R _ {2} , \dots $
+(and thus their sum as well) are known, while the value of  $  D _ {1} + D _ {2} + \dots $
+is not known. The ratio  $  ( D _ {1} + D _ {2} + \dots ) / ( R _ {1} + R _ {2} + \dots ) $
+is known as the fraction of passed defectives, and its mathematical expectation  $  q $
+is known as the average fraction of passed defectives. The task of mathematical statistics is to estimate  $  q $
+from the values of  $  R _ {1} , R _ {2} \dots $
+which are determined using the sample method. If the values  $  M _ {1} , M _ {2} \dots $
+may be treated as realizations of independent identically-distributed random variables with a known distribution law  $  {\mathsf P} \{ M _ {i} = r \} = p _ {r} $,
+then, according to the [[Bayes formula|Bayes formula]], a statistical estimator of the average number of passed defective articles in the accepted batches can be expressed by the formula
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317062.png" /></td> </tr></table>
+$$
+\widetilde{D}   = \
+{\mathsf E} \{ M \mid  m = 0 \}  = \
+\frac{\left ( \sum _ {r = 1 } ^ { {N }  - n } r
+\frac{C _ {N - r }   ^ {n} }{C _ {N}  ^ {n} }
+ p _ {r} \right ) }{ {\mathsf P} \{ m = 0 \} }
+ ,
+$$
 and
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317063.png" /></td> </tr></table>
+$$
+\widetilde{D}   \leq
+\frac{( N - n ) {\mathsf P} \{ m = 1 \} }{n {\mathsf P} \{ m= 0 \} }
+ ,
+$$
 where
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317064.png" /></td> </tr></table>
+$$
+{\mathsf P} \{ m = k \}  = \
+\sum _ {r = 0 } ^ { N - n + k }
+\frac{C _ {r}  ^ {k} C _ {N - r }   ^ {n - k} }{C _ {N}  ^ {n} }
+ p _ {r} ,\ \
+k = 0 , \dots , n.
+$$
 For this reason the estimator
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317065.png" /></td> </tr></table>
+$$
+\widetilde{q}   =
+\frac{\widetilde{D}  }{( N - n) }
+$$
 of the average fraction of passed defectives in the accepted batches satisfies the inequality
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317066.png" /></td> </tr></table>
-where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317067.png" /> is the number of accepted batches while <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/s/s083/s083170/s08317068.png" /> is the number of defective batches the samples of which yielded exactly one defective article.
+$$
+\widetilde{q}   \leq
+\frac{ {\mathsf P} \{ m = 1 \} }{n {\mathsf P} \{ m= 0 \} }
+ \approx
+\frac{s _ {1} }{ns _ {0} }
+ ,
+$$
+where  $  s _ {0} $
+is the number of accepted batches, while  $  s _ {1} $
+is the number of rejected batches whose samples contained exactly one defective article.
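The quantities above can be checked numerically. The following is a minimal sketch, not part of the original article: it assumes the prior  $  p _ {r} $ is binomial with a hypothetical defect rate `theta`, and the function names (`prob_m`, `posterior_mean_defectives`) and the values of `N`, `n`, `theta` are illustrative choices only. It evaluates  $  {\mathsf P} \{ m = k \} $ as a hypergeometric mixture,  $  \widetilde{D} = {\mathsf E} \{ M \mid m = 0 \} $, and the upper bound  $  ( N - n ) {\mathsf P} \{ m = 1 \} / ( n {\mathsf P} \{ m = 0 \} ) $.

```python
from math import comb


def prob_m(k, N, n, prior):
    """P{m = k}: probability of finding k defectives in a sample of size n.

    prior[r] is P{M = r} for r = 0..N; sampling is without replacement,
    so P{m = k | M = r} is hypergeometric.
    """
    total = sum(comb(r, k) * comb(N - r, n - k) * p for r, p in enumerate(prior))
    return total / comb(N, n)


def posterior_mean_defectives(N, n, prior):
    """D~ = E{M | m = 0}, obtained from the Bayes formula."""
    numerator = sum(r * comb(N - r, n) * p for r, p in enumerate(prior)) / comb(N, n)
    return numerator / prob_m(0, N, n, prior)


# Illustrative assumptions: batch size, sample size and a binomial prior.
N, n, theta = 50, 5, 0.02
prior = [comb(N, r) * theta**r * (1 - theta) ** (N - r) for r in range(N + 1)]

p0 = prob_m(0, N, n, prior)
p1 = prob_m(1, N, n, prior)

d_tilde = posterior_mean_defectives(N, n, prior)  # estimator of passed defectives
d_bound = (N - n) * p1 / (n * p0)                 # upper bound on D~
q_tilde = d_tilde / (N - n)                       # average fraction of passed defectives
q_bound = p1 / (n * p0)                           # upper bound on q~

print(f"D~ = {d_tilde:.6f}, bound = {d_bound:.6f}")
print(f"q~ = {q_tilde:.6f}, bound = {q_bound:.6f}")
```

For a binomial prior the unsampled part of the batch is independent of the sample, so  $  \widetilde{D} $ should agree with  $  ( N - n ) \theta $ up to rounding, which gives a simple sanity check. In practice the prior is rarely known, which is why the bound is approximated by the observed ratio  $  s _ {1} / ( n s _ {0} ) $.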
 ====References====
 <table><TR><TD valign="top">[1]</TD> <TD valign="top">  N.V. Smirnov,   I.V. Dunin-Barkovskii,   "Mathematische Statistik in der Technik" , Deutsch. Verlag Wissenschaft.  (1969)  (Translated from Russian)</TD></TR><TR><TD valign="top">[2]</TD> <TD valign="top">  Yu.K. Belyaev,   "Probabilistic methods of sample control" , Moscow  (1975)  (In Russian)</TD></TR><TR><TD valign="top">[3]</TD> <TD valign="top">  M.G. Kendall,   A. Stuart,   "The advanced theory of statistics. Distribution theory" , '''3. Design and analysis''' , Griffin  (1969)</TD></TR></table>
 ====Comments====
 ====References====
 <table><TR><TD valign="top">[a1]</TD> <TD valign="top">  J.M. Juran (ed.) , ''Quality control handbook'' , McGraw-Hill  (1962)</TD></TR></table>