From Encyclopedia of Mathematics


Queueing Theory

U. Narayan Bhat Southern Methodist University, Dallas, TX, USA 75275

Keywords: birth and death process; Markov chain; Markov process; matrix analytic method; queue length; steady state probability; stochastic process; transition probability; waiting line.

Queueing is essential to manage congestion in traffic of any type in the modern technological world. This does not mean it is a new phenomenon. More than one hundred years ago, recognizing its importance to telephone traffic, Danish mathematician A. K. Erlang (1909) showed for the first time how probability theory can be used to provide a mathematical model for telephone conversations. From then on, slowly in the first three decades, moderately in the next three decades, and tremendously in the last four decades, the probabilistic approach to modeling queueing phenomena has grown and contributed significantly to technological progress. For a historical perspective of the growth of queueing theory see Chapter 1 of Bhat (2008).

Queueing theory describes probabilistically and mathematically the interaction between the arrival process of customers and the service provided to them in order to manage the system in an efficient manner. The term customer is used in a generic sense representing a unit, human or otherwise, demanding service. The unit providing service is known as the server. Some examples of a queueing system are: a communication system with voice and data traffic demanding transmission; a manufacturing system with several work stations; patients arriving at a doctor's office; vehicles requiring service; and so on.

Since the arrival process and service are random phenomena, we start with a probabilistic model (also known as a stochastic model) of a queueing system. Analyzing such models with mathematical techniques yields properties that help us understand the system's behavior and manage it efficiently.

In order to build a probabilistic model, first we describe the arrival process (called the input process) using probability distributions. For example, the arrival of customers could be in a Poisson process; i.e. the number of customers arriving in a set period of time has a Poisson distribution. Its parameter, say $\lambda$, gives the mean number of customers arriving during a unit of time. The known distribution then identifies the arrival process. The amount of service demanded from the facility is also random, so it is represented by a random variable; the distribution of that random variable identifies the service process. When we talk about service we have to take into consideration the mode of service, such as service provided by several servers, service provided in a network of servers, etc. We must also include factors such as queue discipline (e.g., first come, first served (FCFS), also known as first in, first out (FIFO); last come, first served (LCFS or LIFO); group service; priority service; etc.). Another factor that complicates the model is the system capacity, which may be finite or infinite.

Because of the multitude of factors involved in a queueing system, we use a three or four element symbolic representation in discussing various types of systems. The basic structure of the representation is to use symbols or numbers for the three elements: input/service/number of servers. When the system capacity is finite an additional element is added. The commonly used symbols for distributions are: $M$ for Poisson or exponential, $E_k$ for Erlangian with $k$ phases (gamma distribution with an integer scale parameter $k$), $D$ for deterministic, and $G$ for a general (also $GI$ for general independent) or an unspecified distribution. Thus M/G/1 represents a Poisson arrival, general service, and single server system, and M/G/1/N has the same description as above with a capacity restriction of $N$ customers in the system.

When the arrival process is represented by a random variable with an index parameter $t$, define $A(t)$ as the number of customers arriving and $D(t)$ as the number of customers leaving the system during the time period $(0, t)$. Let the number of customers in the system at time $t$ be $Q(t)$. Then $Q(t) = A(t) - D(t)$. In order to manage the system efficiently one has to understand how the process $Q(t)$ behaves over time. Note that $A(t)$, $D(t)$, and $Q(t)$ are all stochastic processes (families of random variables indexed by the time parameter $t$). The total number of customers leaving the system by time $t$ depends on the number of customers arriving during that time, the mode of their arrival (e.g., there may be time periods with no customers in the system, commonly called idle periods), the service mechanism, the queue discipline (when some customers get preferred treatment), and other factors that affect the operation of the system (e.g., service breakdowns); to analyze $Q(t)$, all these factors need to be taken into account in the model.

In the analysis of a queueing system the stochastic process $W(t)$, representing the waiting time of a customer before getting served, and the random variable, say $B$, representing the busy period (the amount of time the system is continuously busy at a stretch), are also used. The objective of the analysis is to obtain the distributional properties of the stochastic processes $Q(t)$ and $W(t)$ and the random variable $B$ for use in decision making. Analyzing stochastic processes in finite time $t$ often becomes very complex. When the constituent elements such as arrival and service are not time-dependent, we can derive the distributions of the limit random variables $Q = \lim_{t \rightarrow\infty} Q(t)$ and $W = \lim_{t \rightarrow\infty} W(t)$ when they exist. The ratio of the arrival rate to the service rate is commonly known as the traffic intensity of the queueing system (say $\rho$). The condition $\rho< 1$ is generally the requirement for the existence of the limit distributions of $Q(t)$ and $W(t)$ when the arrival and service processes are time-independent. The behavioral performance measures of interest in a queueing system are the transition probability distributions of $Q(t)$ and $W(t)$, the probability distributions of $Q$, $W$, and $B$, and their means and variances.

In addition to the behavioral problems of underlying stochastic processes mentioned above, we are also interested in inference problems such as estimation and tests of hypotheses regarding basic parameters and performance measures, and optimization problems for assistance in decision making. An introduction to these topics and the necessary references may be found in Bhat (2008).

In order to provide an illustration of the behavioral analysis of a queueing system we consider below a system with Poisson arrivals, exponential service, and single server, symbolically known as an M/M/1 queue. This is the simplest and the most used system in applications.

Let customers arrive in a Poisson process with rate $\lambda$. This means that the number $A(t)$ of customers arriving in $(0, t)$ has a Poisson distribution \[ P[A(t)=j] = e^{-\lambda t}\frac{(\lambda t)^j}{j!}, \hspace{.15in} j=0,1,2,... \] It also means that the inter-arrival times have an exponential distribution with probability density $a(x) = \lambda e^{-\lambda x}$ $(x > 0)$. We assume the service times to have an exponential distribution with probability density $b(x)=\mu e^{-\mu x}$ $(x > 0)$. With these assumptions we have E[inter-arrival time] $=(1/\lambda) = 1$/arrival rate and E[service time] $= (1/\mu) = 1$/service rate. The ratio of the arrival rate to the service rate is the traffic intensity $\rho = \lambda/\mu$. Note that we have assumed the processes to be time-independent.
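As a quick sanity check of these distributional facts, one can simulate a Poisson arrival stream by summing exponential inter-arrival gaps. The rate $\lambda = 2$ and the time horizon below are arbitrary illustrative choices, not values from the article; a minimal sketch in Python:

```python
import random

random.seed(1)
lam = 2.0            # illustrative arrival rate (not from the article)
horizon = 100_000.0  # length of the simulated time axis

# A Poisson process of rate lam has i.i.d. Exp(lam) inter-arrival times,
# so one sample path is just a cumulative sum of exponential gaps.
arrivals = []
t = 0.0
while True:
    t += random.expovariate(lam)
    if t > horizon:
        break
    arrivals.append(t)

mean_gap = horizon / len(arrivals)  # should be close to 1/lam = 0.5
rate = len(arrivals) / horizon      # arrivals per unit time, close to lam
print(round(mean_gap, 2), round(rate, 2))
```

The empirical mean gap and empirical arrival rate recover $1/\lambda$ and $\lambda$, as the exponential/Poisson duality predicts.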

Let $Q(t)$ be the number of customers in the system at time $t$ and let its transition probability distribution be defined as \[ P_{ij}(t)=P[Q(t)=j|Q(0)=i] \] Because of the Poisson arrival process and the exponential service distribution, $Q(t)$ can be modeled as a birth and death process (a class of stochastic processes with the major properties that (a) the probability of more than one state change during an infinitesimal interval of time is negligible; (b) the rate of change per unit time is constant; and (c) changes occurring in non-overlapping intervals of time are independent of each other) governed by the following difference-differential equations. \begin{eqnarray} P'_{i0}(t)&=& -\lambda P_{i0}(t)+ \mu P_{i1}(t) \nonumber\\ P'_{in}(t)&=&-(\lambda+ \mu)P_{in}(t)+\lambda P_{i,n-1}(t) \nonumber\\ & & + \mu P_{i,n+1}(t) \hspace{.30in} n=1,2,... \end{eqnarray}

with $P_{in}(0)=1$ when $n = i$ and $P_{in}(0)=0$ otherwise. Solving these equations to obtain $P_{in}(t)$ is not very simple. Readers may refer to Gross et al. (2008) and its earlier editions for the solutions.

When $\rho< 1$, the limit $p_j = \lim_{t \rightarrow\infty} P_{ij}(t)$ exists and is independent of the initial state $i$. It is known as the steady state probability. It can be obtained easily from the following equations, which result from letting $t \rightarrow\infty$ in the above set of difference-differential equations. \begin{eqnarray} \lambda p_0 &=& \mu p_1 \nonumber\\ (\lambda+ \mu)p_n &=& \lambda p_{n-1} + \mu p_{n+1} \hspace{.3in} n=1,2,... \end{eqnarray}

along with $\sum^\infty_{n=0} p_n = 1$. We get $p_n = (1-\rho)\rho^n$, $n=0,1,2,\ldots$; in particular $p_0 = 1 - \rho$. The mean $E(Q)$ and variance $V(Q)$ of this distribution are $E(Q) = \rho/(1-\rho)$ and $V(Q) = \rho/(1-\rho)^2$.
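These closed forms are easy to check numerically. The sketch below uses the illustrative rates $\lambda = 3$, $\mu = 4$ (so $\rho = 0.75$; these values are assumptions, not from the article) and compares a truncated sum over the geometric distribution with the formulas:

```python
lam, mu = 3.0, 4.0  # illustrative rates, chosen so rho < 1
rho = lam / mu      # traffic intensity = 0.75

# M/M/1 steady-state probabilities, truncated at n = 200; the tail
# beyond that carries mass rho**201, which is negligible here.
p = [(1 - rho) * rho ** n for n in range(201)]

EQ = sum(n * pn for n, pn in enumerate(p))                # mean of Q
VQ = sum(n * n * pn for n, pn in enumerate(p)) - EQ ** 2  # variance of Q

print(round(EQ, 3), rho / (1 - rho))       # both 3.0
print(round(VQ, 3), rho / (1 - rho) ** 2)  # both 12.0
```

The truncated moments agree with $E(Q) = \rho/(1-\rho)$ and $V(Q) = \rho/(1-\rho)^2$ to many decimal places.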

The waiting time of an arriving customer, when the queue discipline is FCFS, is the total amount of time required to serve the customers who are already in the system; given $n \geq 1$ customers present, this total time has an Erlang distribution with $n$ phases. Let us denote it as $T_q$ (we use $T$ as the random variable representing the total time the customer is in the system, also known as the total workload). Accordingly we get \begin{eqnarray*} P[T_q \leq t] &=& 1-\rho+ \int^t_0 \sum^\infty_{n=1}p_n e^{-\mu x}\frac{\mu^n x^{n-1}}{(n-1)!}\, dx \\ &=& 1 - \rho e^{-\mu(1-\rho)t}, \\ E[T_q] &=& \rho/[\mu(1-\rho)] \mbox{ and } E[T]=1/[\mu(1-\rho)]. \end{eqnarray*}

Let $E(Q) = L$ and $E(T) = W$. Looking at the above results we can see that $L = \lambda W$, showing how $L$ and $W$ are related in this system. This property is known as Little's Law, and it holds in more complex systems under fairly general conditions. Another property is the exponential form of the limiting waiting time distribution shown above, which also holds in more general queueing systems.
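Little's Law can also be observed in simulation. The sketch below runs an M/M/1 queue through the classical Lindley recursion for FCFS waiting times (the rates $\lambda = 3$, $\mu = 4$ are illustrative assumptions) and compares the simulated mean time in system with $1/[\mu(1-\rho)]$:

```python
import random

random.seed(7)
lam, mu, n = 3.0, 4.0, 400_000  # illustrative rates and number of customers

# Lindley recursion for the FCFS time-in-queue of successive customers:
#   Wq_{k+1} = max(0, Wq_k + S_k - A_{k+1}),
# where S_k is a service time and A_{k+1} the next inter-arrival time.
wq, total_sojourn = 0.0, 0.0
for _ in range(n):
    s = random.expovariate(mu)
    total_sojourn += wq + s  # sojourn time = waiting time + service time
    wq = max(0.0, wq + s - random.expovariate(lam))

W = total_sojourn / n  # simulated E[T]
print(round(W, 2), 1 / (mu * (1 - lam / mu)))  # theory: E[T] = 1.0

# By Little's Law, L = lam * W, so the simulated mean number in system
# is about 3.0, matching E(Q) = rho/(1-rho) from the steady-state analysis.
```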

The derivation of the distribution of the busy period $B$ is more complicated even in this simple system. We refer the reader to Gross et al. (2008) for its derivation.

When systems include more complicated features, more advanced techniques are employed to analyze the resulting stochastic models. For instance, generalizing the concepts introduced earlier, the birth and death process can be used to model a wide class of queueing systems. Using the terminology of birth for the arrival and death for the departure of customers, when there are $n$ customers in the system, let $\lambda_n$ be the rate of birth and $\mu_n$ be the rate of death. The birth and death process is governed by the three properties identified earlier. Writing $P_{in}(t) \equiv P_n(t)$ for ease of notation, the transition probabilities of the stochastic process $Q(t)$ can be shown to satisfy equations similar to (1), with $\lambda$ and $\mu$ replaced by $\lambda_n$ and $\mu_n$ respectively. Denoting the row vector $[P_0(t), P_1(t),...]$ as $\boldsymbol{P}(t)$, the row vector of its derivatives as $\boldsymbol{P}'(t)$, and the transition rate matrix (also called the generator matrix) \begin{equation} \boldsymbol{A} = \left[ \matrix{ -\lambda_0 & \lambda_0 & & \cr \mu_1 & -(\lambda_1 + \mu_1) & \lambda_1 & \cr & \mu_2 & -(\lambda_2 + \mu_2) & \lambda_2 \cr & & \ddots & \ddots } \right] \end{equation}

the generalized Eq. (1) can be written as \begin{equation} \boldsymbol{P}'(t) = \boldsymbol{P}(t) \boldsymbol{A} \end{equation}

As $t \rightarrow\infty$, when the limit exists, writing $\lim_{t \rightarrow\infty} P_n(t) = p_n$ and denoting the row vector of $\{p_n, n=0,1,2,...\}$ as $\boldsymbol{p}$, Eq. (4) becomes \begin{equation} \boldsymbol{0} = \boldsymbol{p A} \end{equation} [Eq. (2) in the M/M/1 case.]

In queueing models of a wide variety of systems from computer, communication, manufacturing, and other areas of applications, birth and death process models can be used with generator matrix $\boldsymbol{A}$ of appropriate structures, sometimes with submatrices taking the place of matrix elements. The problem now consists of solving equations of the type (4) and (5).
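As a concrete illustration, for a birth and death generator the stationary equations $\boldsymbol{0} = \boldsymbol{pA}$ reduce to the balance relations $\lambda_n p_n = \mu_{n+1} p_{n+1}$, which can be solved recursively. The rates below, describing a hypothetical two-server (M/M/2-type) queue, are illustrative assumptions:

```python
# Birth rates lam_n and death rates mu_n for a hypothetical M/M/2 queue:
# arrivals at rate 3, each of two servers working at rate 2.5.
N = 300                                       # truncation level (tail is negligible)
lam = [3.0] * N                               # lam_0, lam_1, ..., lam_{N-1}
mu = [min(n, 2) * 2.5 for n in range(N + 1)]  # mu_n = min(n, 2) * 2.5

# For birth and death processes, 0 = pA reduces to the balance relations
# lam_n * p_n = mu_{n+1} * p_{n+1}; solve them recursively, then normalize.
p = [1.0]
for n in range(N):
    p.append(p[-1] * lam[n] / mu[n + 1])
total = sum(p)
p = [x / total for x in p]

print(round(sum(p), 10))                   # 1.0
L = sum(n * pn for n, pn in enumerate(p))  # mean number in system
print(round(L, 3))
```

Substituting the resulting vector back into the global balance equations $(\lambda_n + \mu_n)p_n = \lambda_{n-1}p_{n-1} + \mu_{n+1}p_{n+1}$ confirms it solves $\boldsymbol{0} = \boldsymbol{pA}$ for this tridiagonal structure.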

In two classical papers Kendall (1951, 1953) introduced the imbedded Markov chain technique for the analysis of queueing systems whose arrival processes are not Poisson and/or whose service time distributions are not exponential. In an M/G/1 queue the queue length process $Q(t)$ is not a Markov process for all $t$. But if we observe the process only at departure points, say $t_0, t_1, t_2,\ldots$, then $\{Q(t_n), n=0,1,2,\ldots\}$ is an imbedded Markov chain, the transition probability matrix of which can be fully described when the arrival rate and the service time distribution are known. In a GI/M/1 queue, to get an imbedded Markov chain we have to observe the system at arrival points. Then, knowing the distribution of inter-arrival times and the rate of service, we can specify the elements of the corresponding transition probability matrix. The analysis of the queueing system then follows using standard techniques for the analysis of Markov chains.

Until the 1970s researchers analyzed the stochastic processes underlying queueing system models using difference, differential and integral equations with the help of transforms, combinatorial relationships, and recursive solution techniques. As queueing models in applications such as computer-communication and manufacturing systems became more complex, the available methods became inadequate. The need for analyzing such processes was answered by the development of the matrix analytic approach, initiated by Neuts (1981) on generalized forms of the matrix structure shown in (3) and of the transition probability matrices of the imbedded Markov chains of M/G/1 and GI/M/1, and advanced by many researchers since then. See Latouche and Ramaswami (1999).

The literature on queueing theory is vast and it is impossible to cover all facets of the analysis of queueing systems using various modeling and sophisticated mathematical techniques in a short article in an encyclopedia. One such system is the queueing network in which customers are served in various types of networks of service nodes. It has spawned a major area of research and applications relevant to systems such as computer, communication, manufacturing and transportation. Retrial, polling and vacation models are some of the models used in the analysis of individual or networks of queues.

As mentioned earlier, analysis of queueing systems involves the study of the interaction between arrival and service processes under various queueing structures and queueing disciplines. These complexities generate a large number of problems in practice. For instance, if the arrival rate is larger than the service rate, the system becomes unstable. Limit theorems on appropriate distributions describe the behavior of such congested systems. The properties of the tail probabilities of queue length and waiting time distributions describe the influence the basic distributions of arrival and service processes have on the system behavior. The complexities of the systems may also require the use of mathematically more sophisticated stochastic processes such as diffusion processes.

The following references provide the basic understanding of the subject at two levels: Bhat (2008) for those who have a background only in probability and statistics at an introductory level and Gross and Harris (1998) or Gross et al. (2008) for those who have some background in stochastic processes. Bibliographies given in these books and specialized books such as Buzacott and Shanthikumar (1993) and Chen and Yao (2001) provide a good selection of references on the subject. The major source for articles in queueing theory is the journal Queueing Systems. Other sources are major operations research journals such as European Journal of Operations Research, Management Science, Operations Research, and Stochastic Models; and other journals in application areas such as electrical and communications engineering, industrial engineering, and manufacturing.

References

Bhat, U. N. (2008). An Introduction to Queueing Theory, Birkhauser, Boston.

Buzacott, J. A. and Shanthikumar, J. G. (1993), Stochastic Models of Manufacturing Systems, Prentice Hall, Upper Saddle River, NJ.

Chen, H. and Yao, D. D. (2001), Fundamentals of Queueing Networks, Springer, New York.

Erlang, A. K. (1909), The theory of probabilities and telephone conversations, Nyt Tidsskrift for Matematik B, 20, 33.

Gross, D. and Harris, C. M. (1998), Fundamentals of Queueing Theory, 3rd Ed., Wiley, New York.

Gross, D., Shortle, J. F., Thompson, J. M. and Harris, C. M. (2008), Fundamentals of Queueing Theory, 4th Ed., Wiley, New York.

Kendall, D. G. (1951), "Some problems in the theory of queues", J. Royal Statist. Soc. B, 13, 151-185.

Kendall, D. G. (1953), "Stochastic processes occurring in the theory of queues and their analysis by the method of imbedded Markov chains", Ann. Math. Statist., 24, 338-354.

Latouche, G. and Ramaswami, V. (1999), Introduction to Matrix Analytic Methods in Stochastic Modeling, ASA-SIAM Series on Statistics and Applied Probability, Philadelphia, PA.

Neuts, M. F. (1981), Matrix-Geometric Solutions in Stochastic Models. An Algorithmic Approach, The Johns Hopkins University Press, Baltimore, MD.

Acknowledgement: Based on an article from Lovric, Miodrag (2011), International Encyclopedia of Statistical Science. Heidelberg: Springer Science + Business Media, LLC.

Monty Hall problem

Introduction

Imagine you are a guest in a TV game show. The host, a certain Mr. Monty Hall, shows you three large doors and tells you that behind one of the doors there is a car while behind the other two there are goats. You want to win the car. He asks you to choose a door. After you have made your choice, he opens another door, revealing a goat. He then asks you whether you want to stay with your initial choice, or switch to the remaining closed door. Would you switch or stay?

The host, naturally, knows in advance which of the three doors hides the car. This means that whatever door you initially choose, he can indeed open a different door and reveal a goat. Stronger still: not only can he do this; you also know he certainly will do this.

The instinctive, but incorrect, answer of almost all newcomers to the problem is: “Stay, since it is equally likely that the car is behind either of the two closed doors”.

However, under very natural assumptions, the good answer is “Switch, since this doubles my chance of winning the car: it goes from one third to two thirds”.

Because of this conflict the Monty Hall problem is often called the Monty Hall paradox.

The key to accepting and understanding the paradox is to realize that the (subjective) probabilities relevant for the decision are not determined by the situation (two doors closed) alone, but also by what is known about the development that led to this situation. In statistical terminology, the data is not an unordered set of two closed doors, but an ordered set, where the ordering corresponds to the roles of the closed doors: the one chosen by you, and the one left un-chosen by the host. We have to model the data-generating mechanism as well as the data.

There are several intuitive arguments why switching is a good strategy. One is the following. The chances are 1 in 3 that the door initially chosen hides the car. When that happens, staying is good: it gives the car. Both of the other two doors hide goats; one is revealed by the host, but switching to the other door just gives the other goat. Complementarily, the chances are 2 in 3 that the door initially chosen hides a goat. When that happens, staying is not good: it gives a goat. On the other hand, switching certainly does give the car: the host is forced to open the other door hiding a goat, and the remaining closed door is the one hiding the car.

In many repetitions, one third of the times the stayer will win and the switcher will lose; two thirds of the time the stayer will lose and the switcher will win.
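This frequency claim is easy to test by simulation. The following sketch plays the game many times under the standard assumptions (car hidden uniformly at random, host always opens a goat door):

```python
import random

random.seed(0)
trials = 100_000
stay_wins = switch_wins = 0

for _ in range(trials):
    car = random.choice([1, 2, 3])
    pick = random.choice([1, 2, 3])
    # The host opens a door that is neither the pick nor the car
    # (choosing at random when both unpicked doors hide goats).
    opened = random.choice([d for d in (1, 2, 3) if d != pick and d != car])
    other = next(d for d in (1, 2, 3) if d != pick and d != opened)
    stay_wins += (pick == car)
    switch_wins += (other == car)

# A stayer wins exactly when a switcher loses, so the two rates sum to 1.
print(round(stay_wins / trials, 2), round(switch_wins / trials, 2))  # about 0.33 0.67
```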

The (wrong) intuitive answer “50–50” is often supported by saying that the host has not provided any new information by opening a door and revealing a goat, since the contestant knows in advance that at least one of the other two doors hides a goat, and that the host will open that door (or one of those doors, as the case may be). The contestant merely gets to know the identity of one of those two. How can this “non-information” change the fact that the two remaining doors are equally likely to hide the car?

However, precisely the same reasoning can be used against this answer: if indeed the host's action does not give away information about what is behind the closed doors, how can his action increase the winning chances for the door first chosen from 1 in 3 to 1 in 2? The paradox is that while initially doors 1 and 2 were equally likely to hide the car, after the player has chosen door 1 and the host has opened door 3, door 2 is twice as likely as door 1 to hide the car. The paradox (apparent, but not actual, contradiction) holds because it is equally true that initially door 1 had chance 1/3 to hide the car, while after the player has chosen door 1 and the host has opened door 3, door 1 still has chance 1/3 to hide the car.

The Monty Hall problem became internationally famous after its publication by vos Savant (1990) in a popular weekly magazine led to a huge controversy in the media. It has been causing endless disputes and arguments ever since.

The origins of the problem

The Monty Hall problem, also known as the Monty Hall paradox, the three doors problem, the quizmaster problem, and the problem of the car and the goats, was introduced by biostatistician Steve Selvin (1975a) in a letter to the journal The American Statistician. Depending on what assumptions are made, it can be seen as mathematically identical to the Three Prisoners Problem of Martin Gardner (1959a,b). It is named after the stage name of the actual quizmaster, Monty Halperin (or Halparin), of the long-running 1960s TV show “Let's Make a Deal”. Selvin's letter provoked a number of people to write to the author, and he published a second letter in response, Selvin (1975b). One of his correspondents was Monty Hall himself, who pointed out that the formulation of the Monty Hall problem did not correspond with reality: in reality, Monty only occasionally offered a player the option to switch to another door, and he did so depending on whether the player had made a good or a bad initial choice.

The problem, true to reality or not, became world famous in 1990 with its presentation in the popular weekly column “Ask Marilyn” in Parade magazine. The author, Marilyn vos Savant, was, according to the Guinness Book of Records at the time, the person with the highest IQ in the world. Rewriting in her own words a problem posed to her by a correspondent, Craig Whitaker, vos Savant asked the following:

Suppose you're on a game show, and you're given the choice of three doors: behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

Vos Savant proceeded to give a number of simple arguments for the good answer: switch, it doubles your chance of winning the car. One of them was the previously mentioned argument that a stayer wins if and only if a switcher loses. A stayer only wins one third of the time; hence a switcher only loses one third of the time, and wins two thirds of the time.

Another intuitive reasoning is the following: one could say that when the contestant initially chooses door 1, the host is offering the contestant a choice between his initial choice door 1, or doors 2 and 3 together, and kindly opens one of doors 2 and 3 in advance.

By changing one aspect of the problem, this way of understanding why the contestant should indeed switch may become even more compelling. Consider the 100-door problem: 99 goats and one car. The player chooses one of the 100 doors, say door number 1. The host, who knows the location of the car, then opens, one by one, all but one of the other 99 doors – let's say that he skips door number 38. Would you switch?

The simple solutions often implicitly use a frequentist picture of probability: probability refers to relative frequency in many repetitions. They also do not address the issue of whether the specific door opened by the host is relevant: if the player has initially chosen door 1, could it be that the decision to switch should depend on whether the host opens door 2 or door 3? Intuition says no, but we have already seen that naive intuition can be misleading.

These topics are taken up in the next section.

A more refined analysis

Marilyn vos Savant was taken to task by Morgan, Chaganty, Dahiya and Doviak (1991a), in another paper in The American Statistician, for not computing the conditional probability that switching will give the car, given the choice of the player and which door was opened by the host. Possibly with a frequentist view of probability in mind, Morgan et al. took it for granted that the car is hidden uniformly at random behind one of the three doors, but did not assume that the probability that the host would open door 3, rather than door 2, given that the player has chosen door 1 and the car is behind door 1, is one half. Vos Savant (1991) responded angrily in a letter to the editor. Further comments were given by Seymann (1991), Bell (1992) and Bhaskara Rao (1992), then (after a long pause) by Hogbin and Nijdam (2010), followed by a final response, Morgan et al. (2010). This last note incidentally revealed Craig Whitaker's original wording of the problem in his letter to vos Savant: “I've worked out two different situations based on whether or not Monty knows what's behind the doors. In one situation it is to your advantage to switch, in the other there is no advantage to switch. What do you think?” (spelling and grammatical errors corrected by RDG).

The solution to be given in this section takes explicit account of which door was opened by the host – door 2 or door 3 – and does so in order to argue that this does not change the answer. This is probably the reason why lay persons, on hearing the simple solutions of the previous section, do not see any need whatsoever for further analysis. Having seen that the strategy of “always switching” gives a success rate of 2/3, while “always staying” gives a success rate of 1/3, there seems little point in pondering whether the success rate of 2/3 could be improved.

This is mathematically a true fact, given the only probabilistic assumption which we have made (and used) so far: initially all doors are equally likely to hide the car. However, a really short rigorous and intuitive proof of this fact does not seem to exist.

Let's make a more careful analysis, in which a further (natural) assumption will indeed be used. In this section, probability is used in its daily-life Bayesian or subjective sense: that is to say, probability statements are supposed to reflect the state of knowledge of one person. That person will be a contestant on the show who initially knows no more than the following: he'll choose a door; the quizmaster (who knows where the car is hidden) will thereupon open a different door revealing a goat, and will offer the contestant the chance to switch to the remaining closed door. The argument will be kept intuitive or informal; however, the student of probability theory will be able to convert every step into formal mathematics, if the need is felt to do so.

For our tabula rasa contestant, initially all doors are equally likely to hide the car. Moreover, if he chooses any particular door, and if the car happens to be behind that particular door, then as far as this contestant is concerned, the host is equally likely to open either of the other two doors.

The contestant initially chooses door number 1. Initially, his odds that the car is behind this door are 2 to 1 against: it is two times as likely for him that his choice is wrong as that it is right.

The host opens one of the other two doors, revealing a goat. Let's suppose, for the moment, that the contestant doesn't take any notice of which door was opened. Since the host is certain to open a door revealing a goat whether or not the car is behind door 1, the information that an unspecified door was opened to reveal a goat cannot change the contestant's odds that the car is behind door 1; they are still 2 to 1 against.

Now here comes the further detail which we will take account of in this solution: the contestant also gets informed which specific door was opened by the host – let's say it was door 3. Does this piece of information influence his odds that the car is behind door 1? No: from the contestant's point of view, the chance that the car is behind door 1 obviously can't depend on whether the host opens door 2 or door 3 – the door numbers are arbitrary, exchangeable.

Therefore, also knowing that the host opened specifically door 3 to reveal a goat, the contestant's odds on the car being behind his initially chosen door 1 still remain 2 to 1 against. He had better switch to door 2.

Explicit computations

Students of probability theory might feel uneasy about the informality (the intuitive nature) of the last argument. Ordinary people's intuition about probability is well known to be often wrong – after all, it is ordinary intuition which makes most people believe there is no point in switching doors! To feel more secure, students of probability theory might consider the mathematical concept of symmetry and use the law of total probability to show how symmetry leads to statistical independence between the events “Car is behind door 1” and “Host opens door 3” when it is given that the contestant chose door 1. Alternatively, they might like to explicitly use Bayes' theorem, in the form known as Bayes' rule: posterior odds equals prior odds times likelihood ratio (aka Bayes factor). They just have to check that under the two competing hypotheses (whether or not the car is behind the door chosen by the contestant, door 1), the fact that it is door 3 (rather than door 2) which gets opened by the host has the same probability 1/2.
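For readers who prefer to see this checked numerically rather than derived, the sketch below estimates the conditional probability that switching wins given that the contestant chose door 1 and the host (unbiased, as assumed in this section) opened door 3:

```python
import random

random.seed(0)
opened3 = switch_wins = 0

for _ in range(200_000):
    car = random.choice([1, 2, 3])    # contestant always picks door 1
    if car == 1:
        host = random.choice([2, 3])  # unbiased choice between goat doors
    else:
        host = 2 if car == 3 else 3   # host's move is forced
    if host == 3:
        opened3 += 1
        switch_wins += (car == 2)     # switching means taking door 2

# Conditioning on "host opened door 3" leaves the odds at 2 to 1 for switching.
print(round(switch_wins / opened3, 2))  # about 0.67
```

The conditional relative frequency matches the unconditional 2/3, illustrating the independence argued above.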

Either of these routes can be used to convert the last step of the argument in the previous section into a formal mathematical proof.

An alternative approach is to use symmetry in advance to dispose of the door numbers. Suppose without loss of generality (since later we will condition on its value anyway) that the contestant's initial choice of door number $X$ is uniformly distributed over the three door numbers $\{1,2,3\}$. Independently of this, the car is hidden behind door $C$, also uniformly at random. Given $X$ and $C$, the host opens door number $H$ uniformly at random from the door numbers different from both $X$ and $C$ (there are either one or two such doors). Let $Y$ be the remaining closed door, so $(X,H,Y)$ is a random permutation of $(1,2,3)$. By symmetry it is uniformly distributed over the set of six permutations. We know that either $C=X$ or $C=Y$, with probabilities 1/3 and 2/3 respectively. By symmetry, the conditional probability that $C=X$ given the value of $(X,H,Y)$ – one of the six permutations of $(1,2,3)$ – cannot depend on that value, and hence the event $\{C=X\}$ is statistically independent of $(X,H,Y)$.

The actual numbers of door chosen and door opened are irrelevant to deciding whether to switch or stay.

Note the use of the trick of symmetrization – randomization over the door initially chosen – in order to simplify the mathematical analysis.
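The symmetrization argument can likewise be verified by exact enumeration. This illustrative Python sketch (not part of the original argument) confirms that $(X,H,Y)$ is uniform over the six permutations and that the event $\{C=X\}$ is independent of it:

```python
from fractions import Fraction
from collections import defaultdict

# Exact enumeration of the symmetrized game: X (contestant's pick) and C
# (car) independent and uniform; the host opens H uniformly among the
# doors different from both X and C; Y is the remaining closed door.
dist = defaultdict(Fraction)          # joint law of ((X,H,Y), {C==X})
for X in (1, 2, 3):
    for C in (1, 2, 3):
        p_xc = Fraction(1, 9)
        hs = [d for d in (1, 2, 3) if d != X and d != C]
        for H in hs:
            Y = ({1, 2, 3} - {X, H}).pop()
            dist[((X, H, Y), C == X)] += p_xc / len(hs)

perms = {k[0] for k in dist}
assert len(perms) == 6
for perm in perms:
    p_perm = sum(p for (q, _), p in dist.items() if q == perm)
    p_win_stay = dist[(perm, True)] / p_perm
    assert p_perm == Fraction(1, 6)      # uniform over the 6 permutations
    assert p_win_stay == Fraction(1, 3)  # same for every permutation
```

Since the conditional probability of $\{C=X\}$ is $1/3$ for every value of $(X,H,Y)$, the actual door numbers indeed carry no information about whether switching wins.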

Almost all introductory statistical texts solve the Monty Hall problem by computing the conditional probability that switching will give the car, from first principles. Arguments for the chosen assumptions, and for the chosen approach to solution, are usually lacking. Gill (2011) argues that Monty Hall can be seen as an exercise in the art of statistical model building, and actually allows many different solutions: as one makes more assumptions, the conclusions are stronger but the scope of application becomes smaller; moreover, the meaning and the meaningfulness of the assumptions and of the result are tied to the user's interpretation of probability. The task of the statistician is to present a menu of solutions; the user is the one who should choose according to his resources and wishes. Rosenthal (2005, 2008) is one of the few who at least uses Bayes' rule to make the solution more insightful.

==Variations==

Many, many variations of the Monty Hall problem have been studied in the enormous literature which has grown up around the problem. The book by Rosenhouse (2009) is a good resource, as are the Wikipedia pages on the topic. We consider just two variations here.

===The biased host===

Morgan et al. (1991a)'s main innovation was to allow the host to have a bias towards one door or the other. Suppose that when he has a choice between doors 2 and 3, he opens door 3 with probability $q$. The Bayes factor for the hypotheses that the car is or is not behind door 1 therefore becomes $q:1/2$. The prior odds were $1:2$, so the posterior odds become $q:1$. These can be anything between $0:1$ and $1:1$, but whatever they are, they are not unfavourable to switching. A frequentist player who knows that the car has been hidden by a true uniform randomization, but knows nothing about the probabilistic nature of Monty's brain processes with regard to choosing a door to open, should switch anyway. He does not know the conditional probability that switching gives him the car, but he does know that the unconditional probability is 2/3.
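The biased-host odds are easy to check numerically. In this illustrative Python sketch the function name and parametrization are ours, not Morgan et al.'s; it computes the conditional probability that switching wins, given that the contestant chose door 1 and the host opened door 3:

```python
from fractions import Fraction

def p_switch_wins_given_door3(q):
    """Conditional probability that switching wins, given the contestant
    chose door 1 and the biased host opened door 3.  When the host can
    choose between doors 2 and 3, he opens door 3 with probability q."""
    # Posterior odds (car behind 1) : (car not behind 1) = q : 1,
    # hence P(switch wins | door 3 opened) = 1 / (1 + q).
    p_door3_car1 = Fraction(1, 3) * q   # host had a choice, opened 3
    p_door3_car2 = Fraction(1, 3) * 1   # car behind 2: host forced to open 3
    return p_door3_car2 / (p_door3_car1 + p_door3_car2)

assert p_switch_wins_given_door3(Fraction(0)) == 1
assert p_switch_wins_given_door3(Fraction(1, 2)) == Fraction(2, 3)
assert p_switch_wins_given_door3(Fraction(1)) == Fraction(1, 2)
# Switching is never unfavourable: the probability is always >= 1/2.
```

The value $1/(1+q)$ ranges from $1$ (at $q=0$) down to $1/2$ (at $q=1$), matching the posterior odds $q:1$ in the text.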

===Game theory===

In the literature of game theory and mathematical economics, starting with Nalebuff (1987), the Monty Hall problem is treated as a finite two-stage, two-person, zero-sum game. The car is hidden by the host (in advance), and the contestant independently chooses a door. The host opens a door revealing a goat. The contestant is allowed to choose again. The contestant wants to win the car; the host wants to keep it. If we allow the two “game-players” (host, contestant) randomized strategies, then according to von Neumann's minimax theorem, they both have a minimax strategy, and the game has a value, say $p$, such that if the contestant uses his minimax strategy, then whatever strategy is used by the host, the contestant will go home with the car with probability at least $p$; while on the other hand, if the host uses his minimax strategy, then whatever strategy is used by the contestant, the contestant will go home with the car with probability at most $p$.

It is not difficult to show (symmetry is one way to establish this) that the minimax strategy of the host is: hide the car uniformly at random, and open either door with equal chance when there is a choice. The minimax strategy of the contestant is: choose a door uniformly at random and thereafter switch, regardless of which door is opened by the host.

With his minimax strategy (random initial choice and thereafter switch), the contestant wins the car with probability exactly $2/3$, whatever strategy is used by the host. With the host's minimax strategy, the contestant cannot do better than $2/3$.

A wise player is well advised to choose a door number in advance, at home, by a fair randomization, and later to switch. He will get the car with probability $2/3$; he cannot do better; and his ego won't be damaged if his initial choice turns out to have been right and yet he switched and lost the car.
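The contestant's guarantee can be verified by enumerating the host's pure strategies; since any randomized host strategy is a mixture of pure ones, winning with probability exactly $2/3$ against every pure strategy implies the same guarantee in general. A small illustrative Python check (the encoding of strategies is ours):

```python
from fractions import Fraction

DOORS = (1, 2, 3)

# Contestant's candidate minimax strategy: pick a door uniformly at
# random, then always switch.  Against ANY pure strategy of the host
# (where to hide the car, and which goat door to open when there is a
# choice) this wins with probability exactly 2/3.
def p_win(hide, opening_rule):
    total = Fraction(0)
    for pick in DOORS:                      # uniform initial pick
        if pick == hide:
            opened = opening_rule[pick]     # host chooses a goat door
        else:
            opened = ({1, 2, 3} - {pick, hide}).pop()  # host is forced
        switched = ({1, 2, 3} - {pick, opened}).pop()
        total += Fraction(1, 3) * (switched == hide)
    return total

for hide in DOORS:
    for choice in (d for d in DOORS if d != hide):
        # opening_rule maps the pick (== hide) to the door the host opens
        assert p_win(hide, {hide: choice}) == Fraction(2, 3)
```

The win probability is $2/3$ for all six pure host strategies, so the value of the game is at least $2/3$; combined with the host's minimax strategy it is exactly $2/3$.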

==References==

Bell, W. (1992), Comment on Morgan et al. (1991a), Am. Statist. 46 241.

Bhaskara Rao, M. (1992), Comment on Morgan et al. (1991a), Am. Statist. 46 241–242.

Gardner, M. (1959a), “Mathematical Games” column, Scientific American, October 1959, 180–182. Reprinted in The Second Scientific American Book of Mathematical Puzzles and Diversions.

Gardner, M. (1959b), “Mathematical Games” column, Scientific American, November 1959, p. 188.

Gill, R.D. (2011), The Monty Hall Problem is not a Probability Puzzle – it's a challenge in mathematical modelling, Statistica Neerlandica 65 58–71.

Hogbin, M. and Nijdam, W. (2010), Letter to the editor, Am. Statist. 64 193.

Morgan, J. P., Chaganty, N. R., Dahiya, R. C., and Doviak, M. J. (1991a), Let's make a deal: The player's dilemma, Am. Statist. 45 284–287.

Morgan, J. P., Chaganty, N. R., Dahiya, R. C., and Doviak, M. J. (1991b), Rejoinder to Seymann's comment on “Let's make a deal: the player's dilemma”, Am. Statist. 45 289.

Morgan, J. P., Chaganty, N. R., Dahiya, R. C., and Doviak, M. J. (2010), Response to Hogbin and Nijdam's letter, Am. Statist. 64 193–194.

Nalebuff, B. (1987), Puzzles: Choose a curtain, duel-ity, two point conversions, and more, J. Econ. Perspectives 1 (2) 157–163.

Rosenhouse, J. (2009), The Monty Hall Problem, Oxford University Press.

Rosenthal, J. S. (2005), Struck by Lightning: The Curious World of Probabilities. Harper Collins, Canada.

Rosenthal, J. S. (2008), Monty Hall, Monty Fall, Monty Crawl, Math Horizons 16 5–7. http://probability.ca/jeff/writing/montyfall.pdf

Selvin, S. (1975a), A problem in probability (letter to the editor), Am. Statist. 29 67.

Selvin, S. (1975b), On the Monty Hall problem (letter to the editor), Am. Statist. 29 134.

Seymann, R. G. (1991), Comment on “Let's make a deal: the player's dilemma”, Am. Statist. 45 287–288.

vos Savant, Marilyn (1990), “Ask Marilyn” column, Parade Magazine, 9 September 1990, p.16.

vos Savant, Marilyn (1991), Letter to the editor “Marilyn vos Savant's reply”, Am. Statist. 45 347. With response by Morgan, Chaganty, Dahiya and Doviak, pp. 347-348.


=Strong Mixing Conditions=

Richard C. Bradley

Department of Mathematics, Indiana University, Bloomington, Indiana, USA

There has been much research on stochastic models that have a well defined, specific structure — for example, Markov chains, Gaussian processes, or linear models, including ARMA (autoregressive – moving average) models. However, it became clear in the middle of the last century that there was a need for a theory of statistical inference (e.g. central limit theory) that could be used in the analysis of time series that did not seem to “fit” any such specific structure but which did seem to have some “asymptotic independence” properties. That motivated the development of a broad theory of “strong mixing conditions” to handle such situations. This note is a brief description of that theory.

The field of strong mixing conditions is a vast area, and a short note such as this cannot even begin to do justice to it. Journal articles (with one exception) will not be cited; and many researchers who made important contributions to this field will not be mentioned here. All that can be done here is to give a narrow snapshot of part of the field.

The strong mixing ($\alpha$-mixing) condition. Suppose $X := (X_k, k \in{\bf Z})$ is a sequence of random variables on a given probability space $(\Omega, {\cal F}, P)$. For $-\infty\leq j \leq\ell\leq\infty$, let ${\cal F}_j^\ell$ denote the $\sigma$-field of events generated by the random variables $X_k,\ j \le k \leq\ell\ (k \in{\bf Z})$. For any two $\sigma$-fields ${\cal A}$ and ${\cal B} \subset{\cal F}$, define the “measure of dependence” \begin{equation} \alpha({\cal A}, {\cal B}) := \sup_{A \in{\cal A}, B \in{\cal B}} |P(A \cap B) - P(A)P(B)|. \end{equation} For the given random sequence $X$, for any positive integer $n$, define the dependence coefficient \begin{equation} \alpha(n) = \alpha(X,n) := \sup_{j \in{\bf Z}} \alpha({\cal F}_{-\infty}^j, {\cal F}_{j + n}^{\infty}). \end{equation} By a trivial argument, the sequence of numbers $(\alpha(n), n \in{\bf N})$ is nonincreasing. The random sequence $X$ is said to be “strongly mixing”, or “$\alpha$-mixing”, if $\alpha(n) \to0$ as $n \to\infty$. This condition was introduced in 1956 by Rosenblatt [Ro1], and was used in that paper in the proof of a central limit theorem. (The phrase “central limit theorem” will henceforth be abbreviated CLT.)
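When ${\cal A}$ and ${\cal B}$ are generated by finite partitions, every event is a union of atoms, so the supremum in (1) can be computed by brute force. The following Python sketch (illustrative only; the examples are ours) evaluates $\alpha({\cal A},{\cal B})$ for small joint tables:

```python
from fractions import Fraction
from itertools import combinations

# alpha(A, B) for sigma-fields generated by finite partitions: every
# event is a union of atoms, so the supremum in (1) runs over all such
# unions.  joint[i][j] = P(A_i intersect B_j) for the partition atoms.
def alpha(joint):
    I, J = len(joint), len(joint[0])
    pA = [sum(row) for row in joint]
    pB = [sum(joint[i][j] for i in range(I)) for j in range(J)]
    best = Fraction(0)
    for r in range(I + 1):
        for S in combinations(range(I), r):        # union of A-atoms
            for s in range(J + 1):
                for T in combinations(range(J), s):  # union of B-atoms
                    pAB = sum(joint[i][j] for i in S for j in T)
                    dev = abs(pAB - sum(pA[i] for i in S)
                                    * sum(pB[j] for j in T))
                    best = max(best, dev)
    return best

# Independent atoms give alpha = 0; a perfectly coupled pair attains the
# well-known maximal value 1/4.
indep = [[Fraction(1, 4), Fraction(1, 4)], [Fraction(1, 4), Fraction(1, 4)]]
coupled = [[Fraction(1, 2), Fraction(0)], [Fraction(0), Fraction(1, 2)]]
assert alpha(indep) == 0
assert alpha(coupled) == Fraction(1, 4)
```

The value $1/4$ for the coupled example reflects the general bound $\alpha({\cal A},{\cal B}) \leq 1/4$.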

In the case where the given sequence $X$ is strictly stationary (i.e. its distribution is invariant under a shift of the indices), eq. (2) also has the simpler form \begin{equation} \alpha(n) = \alpha(X,n) := \alpha({\cal F}_{-\infty}^0, {\cal F}_n^{\infty}). \end{equation} For simplicity, in the rest of this note, we shall restrict to strictly stationary sequences. (Some comments below will have obvious adaptations to nonstationary processes.)

In particular, for strictly stationary sequences, the strong mixing ($\alpha$-mixing) condition implies Kolmogorov regularity (a trivial “past tail” $\sigma$-field), which in turn implies “mixing” (in the ergodic-theoretic sense), which in turn implies ergodicity. (None of the converse implications holds.) For further related information, see e.g. [Br, v1, Chapter 2].

Comments on limit theory under $\alpha$-mixing. Under $\alpha$-mixing and other similar conditions (including ones reviewed below), there has been a vast development of limit theory — for example, CLTs, weak invariance principles, laws of the iterated logarithm, almost sure invariance principles, and rates of convergence in the strong law of large numbers. For example, the CLT in [Ro1] evolved through subsequent refinements by several researchers into the following “canonical” form. (For its history and a generously detailed presentation of its proof, see e.g. [Br, v1, Theorems 1.19 and 10.2].)

Theorem 1. Suppose $(X_k, k \in{\bf Z})$ is a strictly stationary sequence of random variables such that $EX_0 = 0$, $EX_0^2 < \infty$, $\sigma_n^2 := ES_n^2 \to\infty$ as $n \to\infty$, and $\alpha(n) \to0$ as $n \to\infty$. Then the following two conditions (A) and (B) are equivalent:

(A) The family of random variables $(S_n^2/\sigma_n^2, n \in{\bf N})$ is uniformly integrable.

(B) $S_n/\sigma_n \Rightarrow N(0,1)$ as $n \to\infty$.

If (the hypothesis and) these two equivalent conditions (A) and (B) hold, then $\sigma_n^2 = n \cdot h(n)$ for some function $h(t),\ t \in(0, \infty)$ which is slowly varying as $t \to\infty$.

Here $S_n := X_1 + X_2 + \dots+ X_n$; and $\Rightarrow$ denotes convergence in distribution. The assumption $ES_n^2 \to\infty$ is needed here in order to avoid trivial $\alpha$-mixing (or even 1-dependent) counterexamples in which a kind of “cancellation” prevents the partial sums $S_n$ from “growing” (in probability) and becoming asymptotically normal.
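As a concrete illustration of Theorem 1, consider the 1-dependent (hence $\alpha$-mixing, with $\alpha(n)=0$ for $n \geq 2$) stationary sequence $X_k = Z_k + Z_{k+1}$, where the $Z_k$ are i.i.d. symmetric $\pm 1$; then $EX_0 = 0$ and $ES_n^2 = 4n-2 \to \infty$. A quick Monte Carlo sanity check in Python (illustrative, not from the original text; tolerances are generous for the chosen sample sizes):

```python
import math
import random

# X_k = Z_k + Z_{k+1} with Z_k iid +/-1 symmetric: 1-dependent, so
# alpha(n) = 0 for n >= 2, and E S_n^2 = 4n - 2.  Theorem 1 then gives
# S_n / sigma_n => N(0,1).  Monte Carlo check of the normal limit:
random.seed(0)

def normalized_partial_sum(n):
    z = [random.choice((-1, 1)) for _ in range(n + 1)]
    s = sum(z[k] + z[k + 1] for k in range(n))
    return s / math.sqrt(4 * n - 2)        # S_n / sigma_n

samples = [normalized_partial_sum(500) for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
frac = sum(abs(x) <= 1 for x in samples) / len(samples)

assert abs(mean) < 0.1 and abs(var - 1) < 0.2
assert abs(frac - 0.6827) < 0.05   # P(|N(0,1)| <= 1) ~ 0.6827
```

Here uniform integrability of $S_n^2/\sigma_n^2$ is immediate (the summands are bounded), so condition (A) holds and the empirical distribution is close to standard normal, as condition (B) asserts.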

In the context of Theorem 1, if one wants to obtain asymptotic normality of the partial sums (as in condition (B)) without an explicit uniform integrability assumption on the partial sums (as in condition (A)), then as an alternative, one can impose a combination of assumptions on, say, (i) the (marginal) distribution of $X_0$ and (ii) the rate of decay of the numbers $\alpha(n)$ to 0 (the “mixing rate”). This involves a “trade-off”; the weaker one assumption is, the stronger the other has to be. One such CLT of Ibragimov in 1962 involved such a “trade-off” in which it is assumed that for some $\delta> 0$, $E|X_0|^{2 + \delta} < \infty$ and $\sum_{n=1}^\infty[\alpha(n)]^{\delta/(2 + \delta)} < \infty$. Counterexamples of Davydov in 1973 (with just slightly weaker properties) showed that that result is quite sharp. However, it is not at the exact “borderline”. From a covariance inequality of Rio in 1993 and a CLT (in fact a weak invariance principle) of Doukhan, Massart, and Rio in 1994, it became clear that the “exact borderline” CLTs of this kind have to involve quantiles of the (marginal) distribution of $X_0$ (rather than just moments). For a generously detailed exposition of such CLTs, see [Br, v1, Chapter 10]; and for further related results, see also Rio [Ri].

Under the hypothesis (first sentence) of Theorem 1 (with just finite second moments), there is no mixing rate, no matter how fast (short of $m$-dependence), that can insure that a CLT holds. That was shown in 1983 with two different counterexamples, one by the author and the other by Herrndorf. See [Br, v1&3, Theorem 10.25 and Chapter 31].

Several other classic strong mixing conditions. As indicated above, the terms “$\alpha$-mixing” and “strong mixing condition” (singular) both refer to the condition $\alpha(n) \to0$. (A little caution is in order; in ergodic theory, the term “strong mixing” is often used to refer to the condition of “mixing in the ergodic-theoretic sense”, which is weaker than $\alpha$-mixing as noted earlier.) The term “strong mixing conditions” (plural) can reasonably be thought of as referring to all conditions that are at least as strong as (i.e. that imply) $\alpha$-mixing. In the classical theory, five strong mixing conditions (again, plural) have emerged as the most prominent ones: $\alpha$-mixing itself and four others that will be defined here.

Recall our probability space $(\Omega, {\cal F}, P)$. For any two $\sigma$-fields ${\cal A}$ and ${\cal B} \subset{\cal F}$, define the following four “measures of dependence”: \begin{eqnarray} \phi({\cal A}, {\cal B}) &:= & \sup_{A \in{\cal A}, B \in{\cal B}, P(A) > 0} |P(B|A) - P(B)|; \\ \psi({\cal A}, {\cal B}) &:= & \sup_{A \in{\cal A}, B \in{\cal B}, P(A) > 0, P(B) > 0} |P(B \cap A)/[P(A)P(B)]\thinspace-\thinspace1|; \\ \rho({\cal A}, {\cal B}) &:= & \sup_{f \in{\cal L}^2({\cal A}),\thinspace g \in{\cal L}^2({\cal B})} |{\rm Corr}(f,g)|; \quad{\rm and} \\ \beta({\cal A}, {\cal B}) &:=& \sup\ (1/2) \sum_{i=1}^I \sum_{j=1}^J |P(A_i \cap B_j) - P(A_i)P(B_j)| \end{eqnarray} where the latter supremum is taken over all pairs of finite partitions $(A_1, A_2, \dots, A_I)$ and $(B_1, B_2, \dots, B_J)$ of $\Omega$ such that $A_i \in{\cal A}$ for each $i$ and $B_j \in{\cal B}$ for each $j$. In (6), for a given $\sigma$-field ${\cal D} \subset{\cal F}$, the notation ${\cal L}^2({\cal D})$ refers to the space of (equivalence classes of) square-integrable, ${\cal D}$-measurable random variables.

Now suppose $X := (X_k, k \in{\bf Z})$ is a strictly stationary sequence of random variables on $(\Omega, {\cal F}, P)$. For any positive integer $n$, analogously to (3), define the dependence coefficient \begin{equation} \phi(n) = \phi(X,n) := \phi({\cal F}_{-\infty}^0, {\cal F}_n^{\infty}), \end{equation} and define analogously the dependence coefficients $\psi(n)$, $\rho(n)$, and $\beta(n)$. Each of these four sequences of dependence coefficients is trivially nonincreasing. The (strictly stationary) sequence $X$ is said to be
“$\phi$-mixing” if $\phi(n) \to0$ as $n \to\infty$;
“$\psi$-mixing” if $\psi(n) \to0$ as $n \to\infty$;
“$\rho$-mixing” if $\rho(n) \to0$ as $n \to\infty$; and
“absolutely regular”, or “$\beta$-mixing”, if $\beta(n) \to0$ as $n \to\infty$.

The $\phi$-mixing condition was introduced by Ibragimov in 1959 and was also studied by Cogburn in 1960. The $\psi$-mixing condition evolved through papers of Blum, Hanson, and Koopmans in 1963 and Philipp in 1969; and (see e.g. [Io]) it was also implicitly present in earlier work of Doeblin in 1940 involving the metric theory of continued fractions. The $\rho$-mixing condition was introduced by Kolmogorov and Rozanov in 1960. (The “maximal correlation coefficient” $\rho({\cal A}, {\cal B})$ itself was first studied by Hirschfeld in 1935 in a statistical context that had no particular connection with “stochastic processes”.) The absolute regularity ($\beta$-mixing) condition was introduced by Volkonskii and Rozanov in 1959; in the ergodic theory literature it is also called the “weak Bernoulli” condition.

For the five measures of dependence in (1) and (4)–(7), one has the following well known inequalities: \begin{eqnarray*} 2\alpha({\cal A}, {\cal B}) &\leq& \beta({\cal A}, {\cal B}) \leq \phi({\cal A}, {\cal B}) \leq (1/2)\psi({\cal A}, {\cal B}); \\ 4\alpha({\cal A}, {\cal B}) &\leq& \rho({\cal A}, {\cal B}) \leq \psi({\cal A}, {\cal B}); \quad{\rm and} \\ \rho({\cal A}, {\cal B}) &\leq& 2[\phi({\cal A}, {\cal B})]^{1/2}[\phi({\cal B}, {\cal A})]^{1/2} \leq 2[\phi({\cal A}, {\cal B})]^{1/2}. \end{eqnarray*} For a history and proof of these inequalities, see e.g. [Br, v1, Theorem 3.11]. As a consequence of these inequalities and some well known examples, one has the following “hierarchy” of the five strong mixing conditions here:

(i) $\psi$-mixing implies $\phi$-mixing.

(ii) $\phi$-mixing implies both $\rho$-mixing and $\beta$-mixing (absolute regularity).

(iii) $\rho$-mixing and $\beta$-mixing each imply $\alpha$-mixing (strong mixing).

(iv) Aside from “transitivity”, there are in general no other implications between these five mixing conditions. In particular, neither of the conditions $\rho$-mixing and $\beta$-mixing implies the other.
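For $\sigma$-fields generated by a pair of nondegenerate binary events, all five measures of dependence reduce to closed-form expressions, and the inequalities above can be checked numerically. An illustrative Python sketch (the joint table is an arbitrary example of ours):

```python
from math import sqrt

# The five measures of dependence (1) and (4)-(7) for sigma-fields
# generated by two binary events A and B, encoded by the joint table
# p[i][j] = P(1_A = i, 1_B = j).  For two-atom sigma-fields the suprema
# run over {A, A^c} x {B, B^c}, and the maximal correlation rho reduces
# to |Corr(1_A, 1_B)|.  Marginals are assumed nondegenerate.
def measures(p):
    pa = p[1][0] + p[1][1]                    # P(A)
    pb = p[0][1] + p[1][1]                    # P(B)
    marg_a = {0: 1 - pa, 1: pa}
    marg_b = {0: 1 - pb, 1: pb}
    dev = {(i, j): abs(p[i][j] - marg_a[i] * marg_b[j])
           for i in (0, 1) for j in (0, 1)}
    alpha = max(dev.values())
    beta = 0.5 * sum(dev.values())
    phi = max(d / marg_a[i] for (i, j), d in dev.items())
    psi = max(d / (marg_a[i] * marg_b[j]) for (i, j), d in dev.items())
    rho = abs(p[1][1] - pa * pb) / sqrt(pa * (1 - pa) * pb * (1 - pb))
    return alpha, beta, phi, psi, rho

a, b, f, s, r = measures([[0.4, 0.1], [0.2, 0.3]])  # arbitrary example
assert 2 * a <= b + 1e-12 and b <= f + 1e-12 and f <= s / 2 + 1e-12
assert 4 * a <= r + 1e-12 and r <= s + 1e-12
assert r <= 2 * sqrt(f) + 1e-12
```

(The small tolerances only guard against floating-point rounding; with exact rational arithmetic the inequalities hold exactly.)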

For all of these mixing conditions, the “mixing rates” can be essentially arbitrary, and in particular, arbitrarily slow. That general principle was established by Kesten and O'Brien in 1976 with several classes of examples. For further details, see e.g. [Br, v3, Chapter 26].

The various strong mixing conditions above have been used extensively in statistical inference for weakly dependent data. See e.g. [DDLLLP], [DMS], [Ro3], or [Žu].

Ibragimov's conjecture and related material. Suppose (as in Theorem 1) $X := (X_k, k \in{\bf Z})$ is a strictly stationary sequence of random variables such that $$ EX_0 = 0,\ \ EX_0^2 < \infty,\ \ {\rm and}\ \ ES_n^2 \to\infty\ {\rm as}\ n \to\infty. \eqno(9) $$

In the 1960s, I.A. Ibragimov conjectured that under these assumptions, if also $X$ is $\phi$-mixing, then a CLT holds. Technically, this conjecture remains unsolved. Peligrad showed in 1985 that it holds under the stronger “growth” assumption $\liminf_{n \to\infty} n^{-1} ES_n^2 > 0$. (See e.g. [Br, v2, Theorem 17.7].)

Under (9) and $\rho$-mixing (which is weaker than $\phi$-mixing), a CLT need not hold (see [Br, v3, Chapter 34] for counterexamples). However, if one also imposes either the stronger moment condition $E|X_0|^{2 + \delta} < \infty$ for some $\delta> 0$, or else the “logarithmic” mixing rate assumption $\sum_{n=1}^\infty\rho(2^n) < \infty$, then a CLT does hold (results of Ibragimov in 1975). For further limit theory under $\rho$-mixing, see e.g. [LL] or [Br, v1, Chapter 11].

Under (9) and an “interlaced” variant of the $\rho$-mixing condition (i.e. with the two index sets allowed to be “interlaced” instead of just “past” and “future”), a CLT does hold. For this and related material, see e.g. [Br, v1, Sections 11.18-11.28].

There is a vast literature on central limit theory for random fields satisfying various strong mixing conditions. See e.g. [Ro3], [Žu], [Do], and [Br, v3]. In the formulation of mixing conditions for random fields — and also “interlaced” mixing conditions for random sequences — some caution is needed; see e.g. [Br, v1&3, Theorems 5.11, 5.13, 29.9, and 29.12].

Connections with specific types of models. Now let us return briefly to a theme from the beginning of this write-up: the connection between strong mixing conditions and specific structures.

Markov chains. Suppose $X := (X_k, k \in{\bf Z})$ is a strictly stationary Markov chain. In the case where $X$ has finite state space and is irreducible and aperiodic, it is $\psi$-mixing, with at least exponentially fast mixing rate. In the case where $X$ has countable (but not necessarily finite) state space and is irreducible and aperiodic, it satisfies $\beta$-mixing, but the mixing rate can be arbitrarily slow. In the case where $X$ has (say) real (but not necessarily countable) state space, (i) Harris recurrence and “aperiodicity” (suitably defined) together are equivalent to $\beta$-mixing, (ii) the “geometric ergodicity” condition is equivalent to $\beta$-mixing with at least exponentially fast mixing rate, and (iii) one particular version of “Doeblin's condition” is equivalent to $\phi$-mixing (and the mixing rate will then be at least exponentially fast). There exist strictly stationary, countable-state Markov chains that are $\phi$-mixing but not “time reversed” $\phi$-mixing (note the asymmetry in the definition of $\phi({\cal A}, {\cal B})$ in (4)). For this and other information on strong mixing conditions for Markov chains, see e.g. [Ro2, Chapter 7], [Do], [MT], and [Br, v1&2, Chapters 7 and 21].
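As a small illustration of the exponential rates mentioned above: for a stationary finite-state chain the Markov property reduces $\beta(n)$ to the dependence between $X_0$ and $X_n$, namely $\beta(n) = \sum_i \pi_i \cdot \tfrac12 \sum_j |P^n(i,j) - \pi_j|$, computable from powers of the transition matrix. A two-state Python sketch (illustrative; the particular matrix and its rate $0.7$ are our example, not from the text):

```python
# Two-state stationary Markov chain: beta(n) from powers of P.
def mat_mult(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

P = [[0.9, 0.1], [0.2, 0.8]]          # transition matrix (irreducible,
pi = [2 / 3, 1 / 3]                   # aperiodic); pi P = pi

def beta(n):
    Pn = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):                # Pn = P^n
        Pn = mat_mult(Pn, P)
    return sum(pi[i] * 0.5 * sum(abs(Pn[i][j] - pi[j]) for j in range(2))
               for i in range(2))

# The second eigenvalue of P is 0.7, so beta(n) decays exactly like
# a constant times 0.7^n: exponentially fast mixing.
vals = [beta(n) for n in (1, 2, 5, 10)]
assert all(x > y for x, y in zip(vals, vals[1:]))   # decreasing
assert abs(beta(10) / beta(9) - 0.7) < 1e-6         # geometric rate
```

For a two-state chain $P^n = \mathbf{1}\pi + \lambda_2^n\, v w^{\top}$ with $\lambda_2$ the second eigenvalue, which is why the ratio $\beta(n+1)/\beta(n)$ is exactly $\lambda_2$ here.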

Stationary Gaussian sequences. For stationary Gaussian sequences $X := (X_k, k \in{\bf Z})$, Ibragimov and Rozanov [IR] give characterizations of various strong mixing conditions in terms of properties of spectral density functions. Here are just a couple of comments: For stationary Gaussian sequences, the $\alpha$- and $\rho$-mixing conditions are equivalent to each other, and the $\phi$- and $\psi$-mixing conditions are each equivalent to $m$-dependence. If a stationary Gaussian sequence has a continuous positive spectral density function, then it is $\rho$-mixing. For some further closely related information on stationary Gaussian sequences, see also [Br, v1&3, Chapters 9 and 27].

Dynamical systems. Many dynamical systems have strong mixing properties. Certain one-dimensional “Gibbs states” processes are $\psi$-mixing with at least exponentially fast mixing rate. A well known standard “continued fraction” process is $\psi$-mixing with at least exponentially fast mixing rate (see [Io]). For certain stationary finite-state stochastic processes built on piecewise expanding mappings of the unit interval onto itself, the absolute regularity condition holds with at least exponentially fast mixing rate. For more details on the mixing properties of these and other dynamical systems, see e.g. Denker [De].

Linear and related processes. There is a large literature on strong mixing properties of strictly stationary linear processes (including strictly stationary ARMA processes and also “non-causal” linear processes and linear random fields) and also of some other related processes such as bilinear, ARCH, or GARCH models. For details on strong mixing properties of these and other related processes, see e.g. Doukhan [Do, Chapter 2].

However, many strictly stationary linear processes fail to be $\alpha$-mixing. A well known classic example is the strictly stationary AR(1) process (autoregressive process of order 1) $X := (X_k, k \in{\bf Z})$ of the form $X_k = (1/2)X_{k-1} + \xi_k$ where $(\xi_k, k \in{\bf Z})$ is a sequence of independent, identically distributed random variables, each taking the values 0 and 1 with probability 1/2 each. It has long been well known that this random sequence $X$ is not $\alpha$-mixing. For more on this example, see e.g. [Br, v1, Example 2.15] or [Do, Section 2.3.1].
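The failure of $\alpha$-mixing here can be made concrete: iterating the recursion gives $X_k = \sum_{j \geq 0} 2^{-j}\xi_{k-j}$, so the binary expansion of $X_k/2$ *is* the innovation past, and a single "future" value exactly determines events arbitrarily far back, keeping $\alpha(n)$ bounded away from $0$. A finite-horizon Python illustration (ours, using exact dyadic arithmetic within double precision):

```python
import random

# AR(1): X_k = (1/2) X_{k-1} + xi_k, xi_k iid Bernoulli(1/2).  Then
# X_k = sum_{j>=0} 2^{-j} xi_{k-j}, so the binary digits of X_k / 2 are
# xi_k, xi_{k-1}, xi_{k-2}, ...: the whole past is recoverable from X_k.
random.seed(1)
n = 40                                    # keeps x exact in a double
xi = [random.randint(0, 1) for _ in range(n)]

x = 0.0
for e in xi:                              # run the AR(1) recursion
    x = 0.5 * x + e                       # x is a dyadic rational < 2

# Read the innovations back off the binary digits of x / 2.
y = x / 2
recovered = []
for _ in range(30):
    y *= 2
    bit = int(y)                          # next binary digit
    recovered.append(bit)
    y -= bit

assert recovered == xi[::-1][:30]         # xi_k, xi_{k-1}, xi_{k-2}, ...
```

With 40 innovations the value $x$ has at most 40 significand bits, so every step of the digit extraction is exact in IEEE double precision; the stationary chain corresponds to letting the recursion run from the infinite past.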

Further related developments. The AR(1) example spelled out above, together with many other examples that are not $\alpha$-mixing but seem to have some similar “weak dependence” quality, have motivated the development of more general conditions of weak dependence that have the “spirit” of, and most of the advantages of, strong mixing conditions, but are less restrictive, i.e. applicable to a much broader class of models (including the AR(1) example above). There is a substantial development of central limit theory for strictly stationary sequences under weak dependence assumptions explicitly involving characteristic functions in connection with “block sums”; much of that theory is codified in [Ja]. There is a substantial development of limit theory of various kinds under weak dependence assumptions that involve covariances of certain multivariate Lipschitz functions of random variables from the “past” and “future” (in the spirit of, but much less restrictive than, say, the dependence coefficient $\rho(n)$ defined analogously to (3) and (8)); see e.g. [DDLLLP]. There is a substantial development of limit theory under weak dependence assumptions that involve dependence coefficients similar to $\alpha(n)$ in (3) but in which the classes of events are restricted to intersections of finitely many events of the form $\{X_k > c\}$ for appropriate indices $k$ and appropriate real numbers $c$; for the use of such conditions in extreme value theory, see e.g. [LLR]. In recent years, there has been a considerable development of central limit theory under “projective” criteria related to martingale theory (motivated by Gordin's martingale-approximation technique — see [HH]); for details, see e.g. [Pe]. There are far too many other types of weak dependence conditions, of the general spirit of strong mixing conditions but less restrictive, to describe here; for more details, see e.g. [DDLLLP] or [Br, v1, Chapter 13].


==References==

[Br] R.C. Bradley. Introduction to Strong Mixing Conditions, Vols. 1, 2, and 3. Kendrick Press, Heber City (Utah), 2007.

[DDLLLP] J. Dedecker, P. Doukhan, G. Lang, J.R. León, S. Louhichi, and C. Prieur. Weak Dependence: Models, Theory, and Applications. Lecture Notes in Statistics 190. Springer-Verlag, New York, 2007.

[DMS] H. Dehling, T. Mikosch, and M. Sørensen, eds. Empirical Process Techniques for Dependent Data. Birkhäuser, Boston, 2002.

[De] M. Denker. The central limit theorem for dynamical systems. In: Dynamical Systems and Ergodic Theory, (K. Krzyzewski, ed.), pp. 33-62. Banach Center Publications, Polish Scientific Publishers, Warsaw, 1989.

[Do] P. Doukhan. Mixing: Properties and Examples. Springer-Verlag, New York, 1995.

[HH] P. Hall and C.C. Heyde. Martingale Limit Theory and its Application. Academic Press, San Diego, 1980.

[IR] I.A. Ibragimov and Yu.A. Rozanov. Gaussian Random Processes. Springer-Verlag, New York, 1978.

[Io] M. Iosifescu. Doeblin and the metric theory of continued fractions: a functional theoretic solution to Gauss' 1812 problem. In: Doeblin and Modern Probability, (H. Cohn, ed.), pp. 97-110. Contemporary Mathematics 149, American Mathematical Society, Providence, 1993.

[Ja] A. Jakubowski. Asymptotic Independent Representations for Sums and Order Statistics of Stationary Sequences. Uniwersytet Mikołaja Kopernika, Toruń, Poland, 1991.

[LL] Z. Lin and C. Lu. Limit Theory for Mixing Dependent Random Variables. Kluwer Academic Publishers, Boston, 1996.

[LLR] M.R. Leadbetter, G. Lindgren, and H. Rootzén. Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag, New York, 1983.

[MT] S.P. Meyn and R.L. Tweedie. Markov Chains and Stochastic Stability (3rd printing). Springer-Verlag, New York, 1996.

[Pe] M. Peligrad. Conditional central limit theorem via martingale approximation. In: Dependence in Probability, Analysis and Number Theory, (I. Berkes, R.C. Bradley, H. Dehling, M. Peligrad, and R. Tichy, eds.), pp. 295-309. Kendrick Press, Heber City (Utah), 2010.

[Ri] E. Rio. Théorie Asymptotique des Processus Aléatoires Faiblement Dépendants.
Mathématiques & Applications 31. Springer, Paris, 2000.

[Ro1] M. Rosenblatt. A central limit theorem and a strong mixing condition. Proc. Natl. Acad. Sci. USA 42 (1956) 43-47.

[Ro2] M. Rosenblatt. Markov Processes, Structure and Asymptotic Behavior. Springer-Verlag, New York, 1971.

[Ro3] M. Rosenblatt. Stationary Sequences and Random Fields. Birkhäuser, Boston, 1985.

[Žu] I.G. Žurbenko. The Spectral Analysis of Time Series. North-Holland, Amsterdam, 1986.
