Difference between revisions of "Variance"

Latest revision as of 20:19, 27 January 2020

in probability theory

2020 Mathematics Subject Classification: Primary: 60-01 [MSN][ZBL]

The measure $\newcommand{\Var}{\operatorname{Var}} \newcommand{\Ex}{\mathop{\mathsf{E}}} \newcommand{\Prob}{\mathop{\mathsf{P}}} \Var X$ of the deviation of a random variable $X$ from its mathematical expectation $\Ex X$ defined by the equation: $$\begin{equation}\label{eq:1} \Var X = \Ex(X-\Ex X)^2. \end{equation}$$

The properties of the variance are: $$\begin{equation} \Var X = \Ex X^2 - (\Ex X)^2; \end{equation}$$ if $c$ is a real number, then $$\begin{equation} \Var (cX) = c^2\Var X, \end{equation}$$ in particular, $\Var(-X) = \Var X$.
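As a quick numerical check of these two properties, the following Python sketch computes the variance of a small discrete distribution both from the definition and from the identity $\Var X = \Ex X^2 - (\Ex X)^2$, and then checks the scaling rule. The distribution `pmf` and the helper names `expectation`, `variance` and `scaled` are illustrative assumptions, not part of the article.

```python
from fractions import Fraction

# Hypothetical discrete distribution: value -> probability (exact rationals).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}


def expectation(pmf, g=lambda x: x):
    """Return E g(X) for a finite discrete distribution."""
    return sum(g(x) * p for x, p in pmf.items())


def variance(pmf):
    """Return Var X = E (X - E X)^2, straight from the definition."""
    mean = expectation(pmf)
    return expectation(pmf, lambda x: (x - mean) ** 2)


def scaled(pmf, c):
    """Return the distribution of cX (c nonzero, so values stay distinct)."""
    return {c * x: p for x, p in pmf.items()}


mean = expectation(pmf)
var_by_definition = variance(pmf)
var_by_moments = expectation(pmf, lambda x: x * x) - mean ** 2

print(var_by_definition, var_by_moments)   # both routes agree
print(variance(scaled(pmf, -3)))           # equals 9 * Var X
```

Exact rational arithmetic is used so that the identities hold with equality rather than up to rounding error.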

In speaking of the variance of a random variable $X$, it is always assumed that its expectation $\Ex X$ exists; the variance $\Var X$ may exist (i.e. be finite) or may not (i.e. be infinite). In modern probability theory the expectation of a random variable is defined in terms of the Lebesgue integral over the sample space. However, formulas expressing the expectation of various functions of a random variable $X$ in terms of the distribution of this variable on the set of real numbers are of importance (cf. Mathematical expectation). For the variance $\Var X$ these formulas are

a) $$\begin{equation} \Var X = \sum_i(a_i-\Ex X)^2p_i, \end{equation}$$ for a discrete random variable $X$ which assumes at most a countable number of different values $a_i$ with probabilities $p_i=\Prob\{X=a_i\}$;

b) $$\begin{equation} \Var X = \int\limits_{-\infty}^{\infty}(x-\Ex X)^2p(x)\,dx, \end{equation}$$ for a random variable $X$ with a density $p$ of the probability distribution;

c) $$\begin{equation} \Var X = \int\limits_{-\infty}^{\infty}(x-\Ex X)^2\,dF(x), \end{equation}$$ in the general case; here $F$ is the distribution function of the random variable $X$, and the integral is understood in the sense of Lebesgue–Stieltjes or Riemann–Stieltjes.
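Formulas a) and b) can be evaluated directly. The Python sketch below applies a) to a fair die and b) to two densities, using a simple midpoint rule for the integrals. The function names, the choice of distributions and the truncation of the exponential density to $[0, 40]$ are assumptions made for illustration only.

```python
import math
from fractions import Fraction


def variance_discrete(values, probs):
    """Formula a): sum of (a_i - E X)^2 * p_i over the values a_i."""
    mean = sum(a * p for a, p in zip(values, probs))
    return sum((a - mean) ** 2 * p for a, p in zip(values, probs))


def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


def variance_from_density(p, lo, hi):
    """Formula b), with the density p assumed negligible outside [lo, hi]."""
    mean = integrate(lambda x: x * p(x), lo, hi)
    return integrate(lambda x: (x - mean) ** 2 * p(x), lo, hi)


# a) Fair die: values 1..6, each with probability 1/6.
die_variance = variance_discrete(range(1, 7), [Fraction(1, 6)] * 6)

# b) Uniform density on [0, 1]; the exact variance is 1/12.
var_uniform = variance_from_density(lambda x: 1.0, 0.0, 1.0)

# b) Exponential density with rate 2, truncated to [0, 40]; exact variance 1/4.
var_exponential = variance_from_density(lambda x: 2.0 * math.exp(-2.0 * x), 0.0, 40.0)

print(die_variance, var_uniform, var_exponential)
```

The midpoint rule is only one possible quadrature; any numerical integration routine would serve equally well here.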

The variance is not the only conceivable measure of the deviation of a random variable from its expectation. Other measures of the deviation, constructed on the same principle, e.g. $\Ex|X-\Ex X|$, $\Ex(X-\Ex X)^4$, etc., are also possible, as are measures of deviation based on quantiles (cf. Quantile). The importance of the variance is mainly due to the role played by this concept in limit theorems. Roughly speaking, one may say that if the expectation and variance of the sum of a large number of random variables are known, it is possible to describe completely the distribution law of this sum: It is (approximately) normal, with corresponding parameters (cf. Normal distribution). Thus, the most important properties of the variance are connected with the expression for the variance $\Var(X_1+\cdots+X_n)$ of the sum of random variables $X_1,\dots, X_n$:

$$ {\mathsf D} ( X _{1} + \dots + X _{n} ) \ = \ \sum _ {i = 1} ^ n {\mathsf D} X _{i} + 2 \sum _ {i < j} \mathop{\rm cov}\nolimits ( X _{i} ,\ X _{j} ) , $$

where

$$ \mathop{\rm cov}\nolimits ( X _{i} ,\ X _{j} ) \ = \ {\mathsf E} \{ ( X _{i} - {\mathsf E} X _{i} ) ( X _{j} - {\mathsf E} X _{j} ) \} $$

denotes the covariance of the random variables $ X _{i} $ and $ X _{j} $. If the random variables $ X _{1} , \dots , X _{n} $ are pairwise independent, then $ \mathop{\rm cov}\nolimits ( X _{i} ,\ X _{j} ) = 0 $ for $ i \neq j $. Accordingly, the equation

$$ \tag{7} {\mathsf D} ( X _{1} + \dots + X _{n} ) \ = \ {\mathsf D} X _{1} + \dots + {\mathsf D} X _{n} $$

is valid for pairwise independent random variables. The converse is false: (7) does not imply independence. Nevertheless, (7) is usually applied on the basis of the independence of the random variables. Strictly speaking, a sufficient condition for the validity of (7) is that $ \mathop{\rm cov}\nolimits ( X _{i} ,\ X _{j} ) = 0 $ for all $ i \neq j $, i.e. the random variables $ X _{1} , \dots , X _{n} $ need only be pairwise uncorrelated.
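The gap between uncorrelated and independent random variables can be seen in a small example. In the Python sketch below, $X$ is uniform on $\{-1, 0, 1\}$ and $Y = X^2$; this pair is an illustrative assumption, chosen because $Y$ is a function of $X$ and yet the covariance vanishes, so the additivity (7) still holds.

```python
from fractions import Fraction

# Joint distribution of (X, Y) with X uniform on {-1, 0, 1} and Y = X^2.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}


def expect(g):
    """Return E g(X, Y) under the joint distribution above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)

var_x = expect(lambda x, y: (x - ex) ** 2)
var_y = expect(lambda x, y: (y - ey) ** 2)
cov_xy = expect(lambda x, y: (x - ex) * (y - ey))
var_sum = expect(lambda x, y: (x + y - ex - ey) ** 2)

# Dependence: P{X = 0, Y = 0} differs from P{X = 0} * P{Y = 0}.
p_joint = joint[(0, 0)]
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)

print(cov_xy)                      # covariance is zero
print(var_sum == var_x + var_y)    # additivity of the variance holds
print(p_joint == p_x0 * p_y0)      # but X and Y are not independent
```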

The applications of the concept of the variance have developed in two directions. The first is in the limit theorems of probability theory. If, for a sequence of random variables $ X _{1} ,\ X _{2} , \dots $ one has $ {\mathsf D} X _{n} \rightarrow 0 $ as $ n \rightarrow \infty $, then for any $ \epsilon > 0 $,

$$ {\mathsf P} \{ | X _{n} - {\mathsf E} X _{n} | > \epsilon \} \ \rightarrow \ 0 $$

as $ n \rightarrow \infty $ (cf. Chebyshev inequality in probability theory), i.e. if $ n $ is large the random variable $ X _{n} $ becomes practically identical with the non-random quantity $ {\mathsf E} X _{n} $. The development of these concepts yields a proof of the law of large numbers, of the consistency of estimators (cf. Consistent estimator) in mathematical statistics, and also leads to other applications in which convergence in probability is established for random variables.

Another application to limit theorems is connected with the concept of normalization. Normalization of a random variable $ X $ is effected by subtracting the expectation and dividing by the square root of the variance $ \sqrt { {\mathsf D} X} $; in other words, the variable $ Y = ( X - {\mathsf E} X ) / \sqrt { {\mathsf D} X} $ is considered. Normalization of a sequence of random variables is usually necessary in order to obtain a convergent sequence of distribution laws, in particular, convergence to the normal law with parameters zero and one.

The second direction consists in the application of the concept of the variance in mathematical statistics to sample processing. If a random variable is considered as the result of a random experiment, an arbitrary change in the numerical scale converts the random variable $ X $ to $ Y = \sigma X + a $, where $ a $ is an arbitrary real number and $ \sigma $ is a positive number. It is accordingly meaningful, in many cases, to consider not the one theoretical distribution law $ F (x) $ of the random variable $ X $ alone, but rather the type of the law, i.e. the family of distribution laws of the type $F((x-a)/\sigma)$, which depends on at least two parameters $ a $ and $ \sigma $. If $ {\mathsf E} X = 0 $, $ {\mathsf D} X = 1 $, then $ {\mathsf E} Y = a $ and $ {\mathsf D} Y = \sigma ^{2} $. Accordingly, the meaning of the parameters in the theoretical law is $ a = {\mathsf E} Y $ and $ \sigma = \sqrt { {\mathsf D} Y} $. This makes it possible to estimate these parameters from a sample.
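The following Python sketch illustrates how the parameters of such a family might be recovered from a sample, and how normalization acts on the data. It assumes, purely for illustration, that $X$ is standard normal and that $a = 3$, $\sigma = 2$; the variable names and the sample size are likewise hypothetical.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical "true" parameters of the law of Y = sigma * X + a.
a_true, sigma_true = 3.0, 2.0

# X has expectation 0 and variance 1 (standard normal, by assumption).
x = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
y = [sigma_true * xi + a_true for xi in x]

# Estimate a = E Y by the sample mean and sigma = sqrt(D Y) by the
# sample standard deviation.
a_hat = statistics.fmean(y)
sigma_hat = statistics.stdev(y)

# Normalization: subtract the (estimated) expectation and divide by the
# square root of the (estimated) variance.
z = [(yi - a_hat) / sigma_hat for yi in y]

print(a_hat, sigma_hat)
print(statistics.fmean(z), statistics.variance(z))
```

The estimates are close to, but not equal to, the assumed parameters; the normalized sample has mean zero and sample variance one up to rounding, by construction.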


Comments

Dispersion is usually termed variance in English, and one accordingly uses $ \mathop{\rm Var}\nolimits \ X $ instead of $ {\mathsf D} X $.

How to Cite This Entry:
Variance. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Variance&oldid=29498

This article was adapted from an original article by V.N. Tutubalin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.
