Distribution function
Latest revision as of 13:51, 12 December 2013
of a random variable $X$
2020 Mathematics Subject Classification: Primary: 60E05
The function of a real variable $x$ taking at each $x$ the value equal to the probability of the inequality $X < x$.
Every distribution function $F(x)$ has the following properties:
1) $F(x') \le F(x'')$ when $x' < x''$;
2) $F(x)$ is left-continuous at every $x$;
3) $\lim\limits_{x \rightarrow -\infty} F(x) = 0$, $\lim\limits_{x \rightarrow +\infty} F(x) = 1$. (Sometimes a distribution function is defined as the probability of $X \le x$; it is then right-continuous.)
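As a small illustration of properties 1)–3), the following sketch builds the left-continuous distribution function of a fair six-sided die and checks the properties at a few points. The die and the helper name `die_cdf` are assumptions chosen for illustration only.

```python
from fractions import Fraction

# Hypothetical example: a fair die with values 1..6, each with probability 1/6.
VALUES = [1, 2, 3, 4, 5, 6]
PROBS = [Fraction(1, 6)] * 6

def die_cdf(x):
    """F(x) = P(X < x), the left-continuous convention used in the article."""
    return sum((p for v, p in zip(VALUES, PROBS) if v < x), Fraction(0))

# 1) monotone: F(x') <= F(x'') whenever x' < x''
grid = [k / 4 for k in range(-8, 40)]
monotone = all(die_cdf(a) <= die_cdf(b) for a, b in zip(grid, grid[1:]))

# 2) left-continuous at the jump point x = 3: values just below 3 agree with F(3)
left_limit_at_3 = die_cdf(3 - 1e-9)
right_limit_at_3 = die_cdf(3 + 1e-9)

# 3) limits at -infinity and +infinity (here already attained outside [1, 6])
far_left, far_right = die_cdf(-1000), die_cdf(1000)
```

With the alternative definition through $X \le x$, the roles of the one-sided limits at a jump point are exchanged: the value at $x = 3$ would then agree with the limit from the right.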
In mathematical analysis, a distribution function is any function satisfying 1)–3). There is a one-to-one correspondence between the probability distributions $P_{F}$ on the $\sigma$-algebra $\mathcal{B}$ of Borel subsets of the real line $\mathbb{R}^{1}$ and the distribution functions. This correspondence is as follows: For any interval $\left[ a, b \right]$,
$$ P_{F}([a, b]) = F(b+) - F(a-) $$
Any function $F$ satisfying 1)–3) can be regarded as the distribution function of some random variable $X$ (e.g. $X(x) = x$) defined on the probability space $\left( \mathbb{R}^1, \mathcal{B}, P_{F} \right)$.
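The interval formula above can be checked on a discrete example. The sketch below assumes a hypothetical distribution with atoms at 0, 1, 2 (probabilities 1/4, 1/2, 1/4); the function names are illustrative. For a left-continuous $F$, the left limit $F(a-)$ coincides with $F(a)$, and $F(b+)$ equals the probability of $X \le b$.

```python
from fractions import Fraction

# Hypothetical discrete distribution: atoms x_k with probabilities p_k.
ATOMS = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    """F(x) = P(X < x); for a left-continuous F this is also F(x-)."""
    return sum((p for v, p in ATOMS.items() if v < x), Fraction(0))

def F_right_limit(x):
    """F(x+) = P(X <= x), the limit of F from the right."""
    return sum((p for v, p in ATOMS.items() if v <= x), Fraction(0))

def prob_closed_interval(a, b):
    """P_F([a, b]) = F(b+) - F(a-)."""
    return F_right_limit(b) - F(a)

p_1_2 = prob_closed_interval(1, 2)    # atoms 1 and 2
p_point = prob_closed_interval(1, 1)  # the single atom at 1
```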
Any distribution function can be uniquely written as a sum
$$ F(x) = \alpha_{1} F_{1}(x) + \alpha_{2} F_{2}(x) + \alpha_{3} F_{3}(x), $$
where $\alpha_{1}, \alpha_{2}, \alpha_{3}$ are non-negative numbers with sum equal to 1, and $F_{1}, F_{2}, F_{3}$ are distribution functions such that $F_{1}(x)$ is absolutely continuous,
$$ F_{1}(x) = \int\limits_{-\infty}^{x} p(z) dz, $$
$F_{2}(x)$ is a "step-function",
$$ F_{2}(x) = \sum\limits_{x_{k} < x} p_{k}, $$
where the $x_{k}$ are the points where $F(x)$ "jumps" and the $p_{k} > 0$ are proportional to the sizes of these jumps, and $F_{3}(x)$ is the "singular" component, that is, a continuous function whose derivative is zero almost everywhere.
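A minimal sketch of such a decomposition, under assumed weights $\alpha_1 = 3/4$, $\alpha_2 = 1/4$, $\alpha_3 = 0$: the absolutely continuous part is taken to be uniform on $[0, 1]$ and the step part has a single jump at $1/2$. These choices are hypothetical, and the singular component is left out entirely.

```python
from fractions import Fraction

# Assumed weights; ALPHA_3 = 0 means no singular component in this example.
ALPHA_1, ALPHA_2, ALPHA_3 = Fraction(3, 4), Fraction(1, 4), Fraction(0)

def F1(x):
    """Absolutely continuous part: uniform on [0, 1], density p(z) = 1 there."""
    return min(max(Fraction(x), Fraction(0)), Fraction(1))

def F2(x):
    """Step part: a single jump p_1 = 1 at x_1 = 1/2, summed over x_k < x."""
    return Fraction(1) if Fraction(1, 2) < x else Fraction(0)

def F(x):
    """Mixture alpha_1 * F1 + alpha_2 * F2 (the alpha_3 term vanishes)."""
    return ALPHA_1 * F1(x) + ALPHA_2 * F2(x)

eps = Fraction(1, 10 ** 9)
half = Fraction(1, 2)
# Increment of F across x = 1/2, with the continuous increment removed.
jump_at_half = F(half + eps) - F(half) - ALPHA_1 * eps
```

The jump of $F$ at $1/2$ equals $\alpha_2 p_1 = 1/4$, which shows how the jump sizes of $F$ determine the step component up to the factor $\alpha_2$.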
Example. Let $X_{k}$, $k = 1, 2, \ldots,$ be an infinite sequence of independent random variables assuming the values 1 and 0 with probabilities $0 < p_{k} \le \frac{1}{2}$ and $q_{k} = 1 - p_{k}$, respectively. Also, let
$$ X = \sum\limits_{k = 1}^{\infty} \frac{X_{k}}{2^{k}} $$
Now:
1) if $p_k = q_k = \frac{1}{2}$ for all $k$, then $X$ has an absolutely-continuous distribution function (with $p(x) = 1$ for $0 \le x \le 1$, that is, $X$ is uniformly distributed on $\left[ 0, 1 \right]$);
2) if $\sum\limits_{k = 1}^{\infty} p_k < \infty$, then $X$ has a "step" distribution function (it has jumps at all the dyadic-rational points in $\left[ 0, 1 \right)$);
3) if $\sum\limits_{k = 1}^{\infty} p_k = \infty$ and $p_k \rightarrow 0$ as $k \rightarrow \infty$, then $X$ has a "singular" distribution function.
This example serves to illustrate the theorem of P. Lévy asserting that the infinite convolution of discrete distribution functions can contain only one of the components mentioned above.
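Cases 1) and 2) of the example can be explored numerically by truncating the series after $n$ terms and computing the exact law of the partial sum. The sketch below is only an approximation of the infinite sum; the truncation level `n = 10`, the choice $p_k = 2^{-(k+1)}$ for the summable case, and all function names are assumptions.

```python
from fractions import Fraction

def truncated_distribution(ps):
    """Exact law of S_n = sum_{k=1}^n X_k / 2^k, where P(X_k = 1) = ps[k-1]."""
    dist = {Fraction(0): Fraction(1)}
    for k, p in enumerate(ps, start=1):
        step = Fraction(1, 2 ** k)
        new = {}
        for value, prob in dist.items():
            # X_k = 0 keeps the value, X_k = 1 adds 2^(-k).
            new[value] = new.get(value, Fraction(0)) + prob * (1 - p)
            new[value + step] = new.get(value + step, Fraction(0)) + prob * p
        dist = new
    return dist

def cdf(dist, x):
    """F(x) = P(S_n < x), left-continuous convention."""
    return sum((prob for value, prob in dist.items() if value < x), Fraction(0))

n = 10
fair = truncated_distribution([Fraction(1, 2)] * n)
summable = truncated_distribution(
    [Fraction(1, 2 ** (k + 1)) for k in range(1, n + 1)]
)
```

In the fair case every value $j / 2^{n}$ carries mass $2^{-n}$, so the distribution function of the partial sum agrees with $x$ at dyadic points of level at most $n$. In the summable case the mass at 0 is the finite product of the $q_k$, which stays bounded away from zero as $n$ grows, consistent with a jump at 0.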
The "distance" between two distributions $P$ and $Q$ on the real line is often defined in terms of the corresponding distribution functions $F$ and $S$, by putting, for example,
$$ \rho_1(P, Q) = \sup_{x} \left| F(x) - S(x) \right| $$
or
$$ \rho_2(P, Q) = \mathrm{Var} \left( F(x) - S(x) \right) $$
where $\mathrm{Var}$ denotes the total variation over the real line (see Distributions, convergence of; Lévy metric; Characteristic function).
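Both distances can be approximated on a grid. The sketch below compares two normal distribution functions with unit variance and means 0 and 1; the grid, the two distributions and the function names are assumptions, and the results are grid approximations rather than exact values.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Normal distribution function expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def uniform_distance(F, S, grid):
    """Grid approximation of rho_1 = sup_x |F(x) - S(x)|."""
    return max(abs(F(x) - S(x)) for x in grid)

def variation_distance(F, S, grid):
    """Grid approximation of rho_2, the total variation of F - S."""
    diffs = [F(x) - S(x) for x in grid]
    return sum(abs(b - a) for a, b in zip(diffs, diffs[1:]))

grid = [-10.0 + 0.001 * i for i in range(20001)]
F = lambda x: normal_cdf(x, 0.0, 1.0)
S = lambda x: normal_cdf(x, 1.0, 1.0)
rho1 = uniform_distance(F, S, grid)
rho2 = variation_distance(F, S, grid)
```

For this pair the difference $F - S$ rises to its maximum at $x = 1/2$ and then decreases back to zero, so $\rho_2$ is twice $\rho_1$.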
The distribution functions of the probability distributions most often used (e.g. the normal, binomial and Poisson distributions) have been tabulated.
To test hypotheses concerning a distribution function $F$ using results of independent observations, one can use some measure of the deviation of $F$ from the empirical distribution function (see Kolmogorov test; Kolmogorov–Smirnov test; Cramér–von Mises test).
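One such measure of deviation is the largest absolute difference between the empirical distribution function and $F$. The sketch below computes it for a continuous hypothesised $F$; the three-point sample and the function names are hypothetical, and no critical values are computed, so this is not a complete test.

```python
def empirical_cdf(sample):
    """Return F_n with F_n(x) = (number of observations < x) / n."""
    data = sorted(sample)
    n = len(data)

    def F_n(x):
        return sum(1 for v in data if v < x) / n

    return F_n

def sup_deviation(sample, F):
    """D_n = sup_x |F_n(x) - F(x)| for a continuous hypothesised F.

    The supremum is attained at an observation, just before or just after
    the jump of the empirical distribution function.
    """
    data = sorted(sample)
    n = len(data)
    return max(
        max((i + 1) / n - F(x), F(x) - i / n)
        for i, x in enumerate(data)
    )

sample = [0.7, 0.1, 0.4]                        # hypothetical observations
uniform_cdf = lambda x: min(max(x, 0.0), 1.0)   # hypothesis: uniform on [0, 1]
d_n = sup_deviation(sample, uniform_cdf)
F_n = empirical_cdf(sample)
```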
The concept of a distribution function extends in a natural way to the multi-dimensional case, but multi-dimensional distribution functions are used far less often than one-dimensional ones.
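As a sketch of the two-dimensional case, take $F(x_1, x_2)$ to be the probability of the joint event $X_1 < x_1$, $X_2 < x_2$. The probability of a rectangle is then obtained by inclusion-exclusion over its corners. The example below assumes two independent random variables uniform on $[0, 1]$; this choice and the function names are illustrative only.

```python
def clamp01(t):
    """Distribution function of the uniform law on [0, 1]."""
    return min(max(t, 0.0), 1.0)

def joint_cdf(x1, x2):
    """F(x1, x2) = P(X1 < x1, X2 < x2) for independent uniforms (assumed example)."""
    return clamp01(x1) * clamp01(x2)

def rectangle_probability(F, a1, b1, a2, b2):
    """P(a1 <= X1 < b1, a2 <= X2 < b2) by inclusion-exclusion on the corners."""
    return F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)

p_rect = rectangle_probability(joint_cdf, 0.2, 0.5, 0.1, 0.4)
```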
For a more detailed treatment of distribution functions see Gram–Charlier series; Edgeworth series; Limit theorems.
Comments
In the Russian literature distribution functions are taken to be left-continuous. In the Western literature it is common to define them to be right-continuous. In that convention, the distribution function of a random variable $X$ is the function $F(x) = \mathrm{P} \lbrace X \le x \rbrace$. It then has the properties 1); 2') $F(x)$ is right-continuous at every $x$; 3). The unique probability distribution $P_{F}$ corresponding to it is now determined by
$$ P_{F}((a, b]) = F(b) - F(a), $$
while the "step-function" $F_{2}(x)$ in the above-mentioned decomposition $F = \alpha_{1} F_{1} + \alpha_{2} F_{2} + \alpha_{3} F_{3} $ is
$$ F_{2} (x) = \sum\limits_{x_{k} \le x} p_{k}. $$
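The two conventions differ only at the jump points. The sketch below evaluates both versions of a step function for hypothetical jump points 0, 1, 2 with jump sizes 1/4, 1/2, 1/4, and uses the right-continuous version to compute the probability of a half-open interval $(a, b]$ as $F(b) - F(a)$. All names are illustrative.

```python
from fractions import Fraction

# Hypothetical jump points x_k and jump sizes p_k.
JUMPS = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def step_left(x):
    """Left-continuous version: sum of p_k over x_k < x."""
    return sum((p for xk, p in JUMPS.items() if xk < x), Fraction(0))

def step_right(x):
    """Right-continuous version: sum of p_k over x_k <= x."""
    return sum((p for xk, p in JUMPS.items() if xk <= x), Fraction(0))

def prob_half_open(a, b):
    """Probability of (a, b] as F(b) - F(a) for the right-continuous F."""
    return step_right(b) - step_right(a)

at_jump = (step_left(1), step_right(1))          # the versions differ here
between_jumps = (step_left(1.5), step_right(1.5))  # and agree here
```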
Distribution function. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Distribution_function&oldid=30010