# Independence

in probability theory

One of the most important notions in probability theory. Other terms occasionally used are statistical independence and stochastic independence. The assumption that the events, trials and random variables being considered are independent has been a common premise since the very beginnings of mathematical probability theory.

Independence of two random events is defined as follows. Let $A$ and $B$ be two random events, and let ${\mathsf P} ( A)$ and ${\mathsf P} ( B)$ be their probabilities. The conditional probability of $B$ given that $A$ has occurred is defined by

$${\mathsf P} ( B \mid A) = \ \frac{ {\mathsf P} ( A \cap B) }{ {\mathsf P} ( A) } ,$$

where ${\mathsf P} ( A \cap B)$ is the probability of the joint occurrence of $A$ and $B$. The events $A$ and $B$ are said to be independent if

$$\tag{1 } {\mathsf P} ( A \cap B) = {\mathsf P} ( A) {\mathsf P} ( B).$$

If ${\mathsf P} ( A) > 0$ this is equivalent to

$$\tag{2 } {\mathsf P} ( B \mid A) = {\mathsf P} ( B).$$

The meaning of this definition can be explained as follows. On the assumption that a large number $N$ of trials is being carried out, and assuming for the moment that (2) refers to relative frequencies rather than probabilities, one may conclude that the relative frequency of the event $B$ in all $N$ trials must be equal to the relative frequency of its occurrences in the trials in which $A$ also occurs. Thus, independence of two events indicates that there is no discernible connection between the occurrence of the one event and that of the other. For example, the event "a randomly-selected person has a family name beginning, say, with the letter A" and the event "the same person will win the grand prize in the next play of the state lottery" are independent.
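Definition (1) can be checked exactly on a small finite probability space. The following is a minimal sketch, assuming two fair dice with equally likely outcomes; the names `OMEGA`, `prob` and `is_independent` are illustrative and not part of the article.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair six-sided dice, all equally likely.
OMEGA = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    favourable = sum(1 for w in OMEGA if event(w))
    return Fraction(favourable, len(OMEGA))

def is_independent(a, b):
    """Check definition (1): P(A and B) == P(A) * P(B), in exact arithmetic."""
    return prob(lambda w: a(w) and b(w)) == prob(a) * prob(b)

first_is_even = lambda w: w[0] % 2 == 0
sum_is_seven = lambda w: w[0] + w[1] == 7
sum_is_eight = lambda w: w[0] + w[1] == 8

print(is_independent(first_is_even, sum_is_seven))  # True
print(is_independent(first_is_even, sum_is_eight))  # False
```

Exact rational arithmetic is used so that the product rule is tested as an equality rather than up to rounding.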

The definition of independence of $n$ random events $A _ {1} \dots A _ {n}$, $n > 2$, may be presented in several equivalent versions. According to one version, these events are said to be independent if, for any $m$, $2 \leq m \leq n$, and for any $m$ pairwise distinct natural numbers $k _ {1} \dots k _ {m} \leq n$, the probability of the joint occurrence of the events $A _ {k _ {1} } \dots A _ {k _ {m} }$ is equal to the product of their probabilities:

$$\tag{3 } {\mathsf P} ( A _ {k _ {1} } \cap \dots \cap A _ {k _ {m} } ) = \ {\mathsf P} ( A _ {k _ {1} } ) \dots {\mathsf P} ( A _ {k _ {m} } ).$$

Hence, as before, one may conclude that the conditional probability of each event given the occurrence of any combination of the others is equal to its "unconditional" probability.

Sometimes, besides the independence (mutual independence) of the events $A _ {1} \dots A _ {n}$, one also considers the notion known as pairwise independence: Any two of these events, say $A _ {i}$ and $A _ {j}$, $i \neq j$, are independent. Independence of events implies pairwise independence, but the converse need not be true.
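A standard counterexample showing that pairwise independence does not imply mutual independence uses two fair coin tosses. The sketch below assumes this setting; the events `A`, `B`, `C` and the helper names are chosen for illustration only.

```python
from fractions import Fraction
from itertools import combinations, product

OMEGA = list(product("HT", repeat=2))  # two fair coin tosses, 4 equally likely outcomes

def prob(event):
    """Probability of an event given as a set of outcomes."""
    return Fraction(len(event), len(OMEGA))

A = {w for w in OMEGA if w[0] == "H"}   # first toss is heads
B = {w for w in OMEGA if w[1] == "H"}   # second toss is heads
C = {w for w in OMEGA if w[0] == w[1]}  # both tosses agree

def product_rule_holds(events):
    """Check (3) for one sub-collection of events."""
    joint = set(OMEGA)
    rhs = Fraction(1)
    for e in events:
        joint &= e
        rhs *= prob(e)
    return prob(joint) == rhs

def pairwise_independent(events):
    return all(product_rule_holds(pair) for pair in combinations(events, 2))

def mutually_independent(events):
    # (3) must hold for every sub-collection of size m, 2 <= m <= n.
    return all(
        product_rule_holds(sub)
        for m in range(2, len(events) + 1)
        for sub in combinations(events, m)
    )

print(pairwise_independent([A, B, C]))  # True
print(mutually_independent([A, B, C]))  # False
```

Here each pair of events has joint probability $1/4$, but the triple intersection also has probability $1/4$ rather than $1/8$.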

Prior to the axiomatic construction of probability theory, the independence concept was not interpreted in an adequately clear-cut fashion. In the words of A.A. Markov ([1], p. 24): "The concept of independent events may be considered quite clear in known theoretical problems; in other problems, however, the concept may of course become quite obscured, in keeping with the obscurity of the fundamental notion of probability".

In the context of an axiomatic approach, the most natural definition of independence is the following. Let $( \Omega , {\mathcal A} , {\mathsf P} )$ be some probability space, where $\Omega$ is the set of elementary events, ${\mathcal A}$ a $\sigma$-algebra of events and ${\mathsf P}$ a probability measure defined on ${\mathcal A}$. One first defines independence of classes of events (the only classes ${\mathcal B}$ considered here will be sub-$\sigma$-algebras of ${\mathcal A}$). Classes ${\mathcal B} _ {1} \dots {\mathcal B} _ {n}$ are said to be independent (relative to ${\mathsf P}$) if any events $A _ {1} \in {\mathcal B} _ {1} \dots A _ {n} \in {\mathcal B} _ {n}$ are independent in the sense of (3); the classes ${\mathcal B} _ {t}$ ($t \in T$, where $T$ is an arbitrary index set) are said to be independent if, for any integer $n \geq 2$ and any pairwise distinct $t _ {1} \dots t _ {n} \in T$, the classes ${\mathcal B} _ {t _ {1} } \dots {\mathcal B} _ {t _ {n} }$ are independent. Independence of events $A _ {k}$, $1 \leq k \leq n$, is equivalent to independence of the classes

$${\mathcal B} _ {k} = \{ \emptyset , A _ {k} , \overline{ {A _ {k} }}\; , \Omega \} .$$
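One step of this equivalence can be written out for two events, as a worked sketch using only (1): if $A$ and $B$ are independent, then so are $A$ and $\overline{B}$, since

$${\mathsf P} ( A \cap \overline{B} ) = {\mathsf P} ( A) - {\mathsf P} ( A \cap B) = {\mathsf P} ( A) - {\mathsf P} ( A) {\mathsf P} ( B) = {\mathsf P} ( A) ( 1 - {\mathsf P} ( B)) = {\mathsf P} ( A) {\mathsf P} ( \overline{B} ).$$

Replacing events by their complements one at a time in (3) gives the general case, while the cases involving $\emptyset$ and $\Omega$ are immediate.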

In the case of trials, independence is precisely the independence of the $\sigma$-algebras generated by the trials.

For random variables $X _ {t}$, $t \in T$, independence is defined as independence of the sub-$\sigma$-algebras ${\mathcal B} ( X _ {t} )$, where ${\mathcal B} ( X _ {t} )$ is the pre-image under $X _ {t}$ of the $\sigma$-algebra of Borel sets on the real line. Independence of random events $A _ {1} \dots A _ {n}$ is equivalent to independence of their indicators $I _ {A _ {k} }$, i.e. independence of the random variables defined by

$$I _ {A _ {k} } ( \omega ) = 1 \ \textrm{ for } \omega \in A _ {k}$$

and

$$I _ {A _ {k} } ( \omega ) = 0 \ \textrm{ for } \omega \notin A _ {k} .$$

There are various necessary and sufficient conditions for the independence of random variables $X _ {1} \dots X _ {n}$:

1) For arbitrary real numbers $a _ {1} \dots a _ {n}$, the value of the distribution function

$$F _ {X _ {1} \dots X _ {n} } ( a _ {1} \dots a _ {n} ) = \ {\mathsf P} \{ \omega : {X _ {1} ( \omega ) < a _ {1} \dots X _ {n} ( \omega ) < a _ {n} } \}$$

is equal to the product of the values of the individual distribution functions

$$F _ {X _ {1} \dots X _ {n} } ( a _ {1} \dots a _ {n} ) = \ F _ {X _ {1} } ( a _ {1} ) \dots F _ {X _ {n} } ( a _ {n} ).$$

2) If there exist densities $p _ {X _ {1} \dots X _ {n} } ( a _ {1} \dots a _ {n} )$ (cf. Density of a probability distribution), then the density is equal to the product $p _ {X _ {1} } ( a _ {1} ) \dots p _ {X _ {n} } ( a _ {n} )$ of the individual densities for almost all $( a _ {1} \dots a _ {n} )$ with respect to Lebesgue measure on $\mathbf R ^ {n}$.

3) The characteristic function

$$f _ {X _ {1} \dots X _ {n} } ( u _ {1} \dots u _ {n} ) = \ {\mathsf E} e ^ {iu _ {1} X _ {1} + \dots + iu _ {n} X _ {n} }$$

is equal, for all real numbers $u _ {1} \dots u _ {n}$, to the product $f _ {X _ {1} } ( u _ {1} ) \dots f _ {X _ {n} } ( u _ {n} )$, $f _ {X _ {k} } ( u _ {k} ) = {\mathsf E} e ^ {iu _ {k} X _ {k} }$, of the individual characteristic functions.
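Criterion 1) can be tested exactly for random variables taking finitely many values. The sketch below assumes two fair dice and uses the strict-inequality convention ${\mathsf P} \{ X < a \}$ of the text; the function name `cdf_factorizes` and the example variables are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

OMEGA = list(product(range(1, 7), repeat=2))  # two fair dice, uniform measure

def prob(pred):
    return Fraction(sum(1 for w in OMEGA if pred(w)), len(OMEGA))

def cdf_factorizes(x, y):
    """Check F_{X,Y}(a, b) == F_X(a) * F_Y(b) with F(a) = P{X < a}.

    For finitely many attained values the distribution functions are step
    functions, so it suffices to test each attained value and one
    threshold above the largest value.
    """
    xs = sorted({x(w) for w in OMEGA})
    ys = sorted({y(w) for w in OMEGA})
    thresholds_x = xs + [xs[-1] + 1]
    thresholds_y = ys + [ys[-1] + 1]
    return all(
        prob(lambda w: x(w) < a and y(w) < b)
        == prob(lambda w: x(w) < a) * prob(lambda w: y(w) < b)
        for a in thresholds_x
        for b in thresholds_y
    )

first = lambda w: w[0]          # value of the first die
second = lambda w: w[1]         # value of the second die
total = lambda w: w[0] + w[1]   # sum of both dice

print(cdf_factorizes(first, second))  # True
print(cdf_factorizes(first, total))   # False
```

The second check fails already at $a = 2$, $b = 3$, where the joint value is $1/36$ but the product is $1/216$.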

The most important schemes of probability theory are based on the assumption that various events and random variables are independent: sequences of independent random variables (see, e.g., Bernoulli random walk; Law of large numbers; Limit theorems of probability theory), stochastic processes with independent increments (see, e.g., Wiener process; Stochastic process), etc. (see also Zero-one law).

## General remarks about the concept of independence

1) Independence of functions of independent random variables. Given the independence of random variables $X _ {1} \dots X _ {n}$ one may deduce various propositions that are fairly obvious and also in full intuitive agreement with the idea of independence. For example, a function of $X _ {1} \dots X _ {k}$ and a function of $X _ {k + 1 } \dots X _ {n}$, $1 \leq k < n$, are independent random variables. Functions of other types may be independent only if certain additional assumptions are made; such independence may serve in the definition of various classes of distributions. For example, if $X _ {1} \dots X _ {n}$ are independent, identically distributed and have a normal distribution, then the functions

$$\tag{4 } \overline{X}\; = \ \frac{X _ {1} + \dots + X _ {n} }{n}$$

and

$$\tag{5 } { \frac{1}{n} } \sum _ {j = 1 } ^ { n } ( X _ {j} - \overline{X}\; ) ^ {2}$$

(these are statistical estimators for the expectation and the variance of the $X _ {k}$, respectively) are independent random variables. The converse is also true: If $X _ {1} \dots X _ {n}$ are independent and identically distributed and if the functions (4) and (5) are independent, then the $X _ {k}$ are normally distributed. In exactly the same way, if it is known that $X _ {1} \dots X _ {n}$ are independent and identically distributed, that the two linear forms

$$Y _ {1} = \ \sum _ {j = 1 } ^ { n } a _ {j} X _ {j} \ \ \textrm{ and } \ \ Y _ {2} = \ \sum _ {j = 1 } ^ { n } b _ {j} X _ {j}$$

are independent random variables, that $( a _ {1} \dots a _ {n} ) \neq ( b _ {1} \dots b _ {n} )$, and that none of the coefficients $a _ {j}$, $b _ {j}$ vanishes, then all the $X _ {j}$ are normally distributed. (This kind of theorem can be used to deduce, under minimal assumptions, say, Maxwell's law for the distribution of molecular velocities.) The above propositions are examples of what are known as characterization theorems, and were most thoroughly studied by Yu.V. Linnik and his school.
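The independence of (4) and (5) for normal samples can be illustrated, though not proved, by simulation. The sketch below estimates the correlation between the two statistics for normal and for exponential samples. Vanishing correlation is only a necessary sign of independence, and the sample size, number of repetitions and seed are arbitrary assumptions.

```python
import random
from statistics import fmean

def mean_and_variance(sample):
    """Return the statistics (4) and (5) for one sample."""
    n = len(sample)
    xbar = sum(sample) / n
    var = sum((x - xbar) ** 2 for x in sample) / n
    return xbar, var

def correlation(us, vs):
    """Sample correlation coefficient of two equally long sequences."""
    mu, mv = fmean(us), fmean(vs)
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    su = sum((u - mu) ** 2 for u in us) ** 0.5
    sv = sum((v - mv) ** 2 for v in vs) ** 0.5
    return cov / (su * sv)

def mean_variance_correlation(draw, n=5, repetitions=20000, seed=0):
    """Correlation between (4) and (5) over many samples of size n."""
    rng = random.Random(seed)
    pairs = [
        mean_and_variance([draw(rng) for _ in range(n)])
        for _ in range(repetitions)
    ]
    means, variances = zip(*pairs)
    return correlation(means, variances)

normal_corr = mean_variance_correlation(lambda rng: rng.gauss(0.0, 1.0))
expo_corr = mean_variance_correlation(lambda rng: rng.expovariate(1.0))
print(f"normal:      {normal_corr:+.3f}")
print(f"exponential: {expo_corr:+.3f}")
```

One would expect the first value to be close to zero and the second to be clearly positive, in line with the characterization theorem.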

2) Existence of independent random variables on a given probability space. If the set of elementary events $\Omega$ consists of three elements, each of which is assigned probability $1 / 3$, then there do not exist non-constant independent random variables on $\Omega$. If, on the other hand, the probability space is the interval $[ 0, 1]$ with Lebesgue measure $m$, then, given any sequence of distribution functions $F _ {1} ( x), F _ {2} ( x) \dots$ one can define measurable functions $X _ {k} ( \omega )$ on $[ 0, 1]$ that are independent random variables with respect to $m$ and such that

$$m \{ \omega : {0 \leq \omega \leq 1, X _ {k} ( \omega ) < x } \} = \ F _ {k} ( x).$$

The simplest example of such statistically-independent functions on $[ 0, 1]$ is furnished by the digits of the binary expansion of $\omega$, $0 \leq \omega \leq 1$, or by the related Rademacher functions:

$$r _ {k} ( \omega ) = \ \mathop{\rm sign} \sin \ ( 2 \pi \cdot 2 ^ {k - 1 } \omega ),\ \ k = 1, 2 ,\dots .$$
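The independence of the first few Rademacher functions with respect to Lebesgue measure can be verified on a dyadic grid. The sketch below is an illustration under the assumption that only $r _ {1} \dots r _ {K}$ with $K = 4$ are considered; the names `rademacher`, `midpoints` and `joint` are not from the article.

```python
import math
from collections import Counter
from fractions import Fraction

def rademacher(k, omega):
    """r_k(omega) = sign sin(2*pi*2**(k-1)*omega), away from dyadic breakpoints."""
    return 1 if math.sin(2 * math.pi * 2 ** (k - 1) * omega) > 0 else -1

K = 4
# r_1, ..., r_K are constant on each open interval (j/2**K, (j+1)/2**K),
# so evaluating at the midpoints captures their joint behaviour exactly.
midpoints = [(j + 0.5) / 2 ** K for j in range(2 ** K)]
patterns = Counter(
    tuple(rademacher(k, w) for k in range(1, K + 1)) for w in midpoints
)

# Each dyadic interval has Lebesgue measure 2**-K.
cell = Fraction(1, 2 ** K)
joint = {pattern: count * cell for pattern, count in patterns.items()}

# Marginals: each r_k equals +1 on a set of measure 1/2.
marginals_ok = all(
    sum(m for p, m in joint.items() if p[k] == 1) == Fraction(1, 2)
    for k in range(K)
)
# Joint law: every sign pattern has measure (1/2)**K, the product of the marginals.
product_ok = len(joint) == 2 ** K and all(
    m == Fraction(1, 2) ** K for m in joint.values()
)
print(marginals_ok, product_ok)  # True True
```

Each of the $2 ^ {K}$ sign patterns occurs on exactly one dyadic interval, which is the product rule for $r _ {1} \dots r _ {K}$.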

It should be noted that the existence of some probability space on which one can define independent random variables with given distributions is a corollary of Kolmogorov's theorem on probabilities in infinite-dimensional spaces (see [2], Chapt. III, Sect. 4).
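The claim about the three-point space at the start of this remark can be checked by brute force. If two non-constant random variables were independent, each would generate an event of probability $1/3$ or $2/3$, and these two events would be independent. The sketch below, with illustrative names, enumerates all pairs of events and finds no such pair.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = (0, 1, 2)  # three elementary events, each of probability 1/3

def prob(event):
    return Fraction(len(event), len(OMEGA))

# All 2**3 = 8 events (subsets of OMEGA).
events = [
    frozenset(c)
    for r in range(len(OMEGA) + 1)
    for c in combinations(OMEGA, r)
]

def trivial(event):
    """True for the empty event and for the whole space."""
    return len(event) in (0, len(OMEGA))

# Pairs of independent events in which neither event is trivial.
nontrivial_independent_pairs = [
    (a, b)
    for a in events
    for b in events
    if not trivial(a) and not trivial(b) and prob(a & b) == prob(a) * prob(b)
]
print(nontrivial_independent_pairs)  # []
```

The list is empty because a product of two probabilities from $\{ 1/3, 2/3 \}$ is never a multiple of $1/3$.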

3) Independent random variables as a source of other schemes. Let $Y _ {1} \dots Y _ {n} \dots$ be a sequence of independent random variables and set

$$X _ {0} = 0 ,\ X _ {n} = \ \sum _ {k = 1 } ^ { n } Y _ {k} \ \ ( n \geq 1 ) ;$$

then one obtains a sequence of random variables forming a Markov chain. A similar procedure will yield a Markov process, for example, beginning with a Wiener process and using a stochastic differential equation. Starting with Gaussian random measures with independent values and using the Fourier transform, one can construct Gaussian stationary stochastic processes, etc.
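For the partial sums above, the Markov property can be verified exactly in a small case. The sketch assumes symmetric steps $Y _ {k} = \pm 1$ and four steps; the helper `conditional_law` is a hypothetical name introduced for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import accumulate, product

N_STEPS = 4
# All equally likely step sequences (Y_1, ..., Y_n) with Y_k = +1 or -1,
# converted to the partial sums (X_1, ..., X_n).
paths = [tuple(accumulate(steps)) for steps in product((1, -1), repeat=N_STEPS)]

def conditional_law(next_index, given_indices):
    """Law of X_{next_index} conditional on the values X_i, i in given_indices.

    Indices are 1-based, matching X_1, ..., X_n; X_0 = 0 is left implicit.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        key = tuple(path[i - 1] for i in given_indices)
        counts[key][path[next_index - 1]] += 1
    return {
        key: {x: Fraction(c, sum(row.values())) for x, c in row.items()}
        for key, row in counts.items()
    }

full_history = conditional_law(4, (1, 2, 3))  # condition on X_1, X_2, X_3
present_only = conditional_law(4, (3,))       # condition on X_3 alone

# Markov property: the conditional law depends on the history only through X_3.
markov = all(
    law == present_only[(history[-1],)]
    for history, law in full_history.items()
)
print(markov)  # True
```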

4) Weak dependence. The asymptotic laws of probability theory that are established for sequences of independent random variables can usually be extended to sequences of so-called weakly-dependent variables, i.e. to sequences $X _ {1} \dots X _ {n} \dots$ in which the dependence between "distant" segments of the sequence, measured in a suitable way, is "small". In the simplest cases, these may be sequences of $m$-dependent random variables, where $X _ {k}$ and $X _ {l}$ are independent if $| k - l | > m$, or sequences of random variables forming an ergodic Markov chain (cf. Markov chain, ergodic). One of the main methods for proving theorems of this type is reduction to the situation of independence.
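A simple instance of $m$-dependence with $m = 1$ is a moving sum of independent variables. The sketch below assumes independent fair bits $Y _ {1} \dots Y _ {4}$ and sets $X _ {k} = Y _ {k} + Y _ {k + 1}$; this construction and the function names are illustrative and do not come from the article.

```python
from fractions import Fraction
from itertools import product

# Y_1, ..., Y_4 independent fair bits; all 16 outcomes equally likely.
OMEGA = list(product((0, 1), repeat=4))

def prob(pred):
    return Fraction(sum(1 for w in OMEGA if pred(w)), len(OMEGA))

def x(k):
    """Moving sum X_k = Y_k + Y_{k+1} for k = 1, 2, 3 (1-based index)."""
    return lambda w: w[k - 1] + w[k]

def independent(u, v):
    """Exact check that P{U=a, V=b} == P{U=a} * P{V=b} for all values a, b."""
    us = {u(w) for w in OMEGA}
    vs = {v(w) for w in OMEGA}
    return all(
        prob(lambda w: u(w) == a and v(w) == b)
        == prob(lambda w: u(w) == a) * prob(lambda w: v(w) == b)
        for a in us
        for b in vs
    )

print(independent(x(1), x(3)))  # True:  |1 - 3| > 1, no shared Y_k
print(independent(x(1), x(2)))  # False: both involve Y_2
```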

5) Independence in number theory. Let $p \geq 2$ and $q \geq 2$ be two relatively-prime numbers. Let $N$ be a natural number, and suppose that a number between 1 and $N$ is chosen at random (the probability of each being chosen is assumed to be $1/N$). Let $A _ {p}$ ($A _ {q}$) be the event that the chosen number is divisible by $p$ (by $q$). Then

$${\mathsf P} ( A _ {p} ) = \ { \frac{1}{N} } \left [ { \frac{N}{p} } \right ] ,\ \ {\mathsf P} ( A _ {q} ) = \ { \frac{1}{N} } \left [ { \frac{N}{q} } \right ] ,$$

$${\mathsf P} ( A _ {p} \cap A _ {q} ) = { \frac{1}{N} } \left [ { \frac{N}{pq} } \right ] ,$$

and if one lets $N \rightarrow \infty$, then the events $A _ {p}$ and $A _ {q}$ become "almost independent". A much more profound proposition is the following: Letting $N \rightarrow \infty$, one can choose $S = S _ {N} \rightarrow \infty$ such that the events $A _ {2} \dots A _ {p _ {S} }$ (where $p _ {j}$ denotes the $j$-th prime) are jointly "almost independent"; this proposition provides the basis for studying the value distribution of arithmetic functions (see Number theory, probabilistic methods in). There are also other branches of number theory in which the idea of independence plays an explicit or implicit part.

6) For the testing of hypotheses of independence on the basis of results of observations, see Statistical hypotheses, verification of.
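The "almost independence" of $A _ {p}$ and $A _ {q}$ can be made concrete by computing the exact deviation from the product rule (1) for finite $N$. In the sketch below, the function name `divisibility_gap` and the choice $p = 2$, $q = 3$ are illustrative assumptions.

```python
from fractions import Fraction

def divisibility_gap(p, q, n):
    """P(A_p and A_q) - P(A_p) * P(A_q) for a uniform choice from 1..n.

    p and q are assumed relatively prime, so that divisibility by both
    is the same as divisibility by p * q.
    """
    pa = Fraction(n // p, n)         # [N/p] / N
    pb = Fraction(n // q, n)         # [N/q] / N
    pab = Fraction(n // (p * q), n)  # [N/(pq)] / N
    return pab - pa * pb

for n in (10, 12, 100, 1000, 10**6):
    print(n, divisibility_gap(2, 3, n))
```

The gap is exactly zero when $N$ is a multiple of $pq$, and in general its absolute value is less than $2/N$, so it tends to zero as $N \rightarrow \infty$.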

#### References

[1] A.A. Markov, "Wahrscheinlichkeitsrechnung", Teubner (1912) (Translated from Russian)

[2] A.N. Kolmogorov, "Foundations of the theory of probability", Chelsea, reprint (1950) (Translated from Russian)

[3] A.N. Kolmogorov, "The theory of probability", Mathematics, its content, methods and meaning, 4, Amer. Math. Soc. (1963) Chapt. 6 (Translated from Russian)

[4] M. Kac, "Statistical independence in probability, analysis and number theory", Math. Assoc. Amer. (1963)

[5] W. Feller, "An introduction to probability theory and its applications", 1–2, Wiley (1957–1971)