Dirichlet process
The Dirichlet process provides one means of placing a probability distribution on the space of distribution functions, as is done in Bayesian statistical analysis (cf. also Bayesian approach). The support of the Dirichlet process is large: For each distribution function there is a set of distributions nearby that receives positive probability. This contrasts with a typical probability distribution on the space of distribution functions where, for example, one might place a probability distribution on the mean and variance of a normal distribution. The support in this example would be contained in the collection of normal distributions. The large support of the Dirichlet process accounts for its use in non-parametric Bayesian analysis. General references are [a4], [a5].
The Dirichlet process is indexed by its parameter, a non-null, finite measure $ \alpha $. Formally, consider a space $ {\mathcal X} $ with a collection of Borel sets $ {\mathcal B} $ on $ {\mathcal X} $. The random probability distribution $ P $ has a Dirichlet process prior distribution with parameter $ \alpha $, denoted by $ {\mathcal D} _ \alpha $, if for every measurable partition $ \{ A _ {1} , \dots, A _ {m} \} $ of $ {\mathcal X} $ the random vector $ ( P ( A _ {1} ) , \dots, P ( A _ {m} ) ) $ has the Dirichlet distribution with parameter vector $ ( \alpha ( A _ {1} ) , \dots, \alpha ( A _ {m} ) ) $.
When such a prior distribution is put on the space of probability distributions on $ {\mathcal X} $, the quantity $ P ( A ) $ is a random variable for every measurable subset $ A $ of $ {\mathcal X} $. The normalized measure $ \alpha _ {0} = {\alpha / {\alpha ( {\mathcal X} ) } } $ is a probability measure on $ {\mathcal X} $. From the definition one sees that if $ P \sim {\mathcal D} _ \alpha $, then $ {\mathsf E} P ( A ) = \alpha _ {0} ( A ) $.
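A minimal numerical sketch of the defining property and of this expectation formula; all concrete choices here are illustrative assumptions, namely $ {\mathcal X} = [0,1] $, $ \alpha $ equal to $ M $ times the uniform distribution with total mass $ M = 5 $, and a partition into three intervals:

```python
# Sketch only: for a measurable partition, (P(A_1), ..., P(A_m)) is
# Dirichlet-distributed with parameters (alpha(A_1), ..., alpha(A_m)).
# Assumed setup: X = [0, 1], alpha = M * Uniform[0, 1], M = 5,
# partition {[0, .3), [.3, .7), [.7, 1]}.
import numpy as np

rng = np.random.default_rng(0)
M = 5.0                                            # total mass alpha(X)
alpha_masses = M * np.array([0.3, 0.4, 0.3])       # alpha(A_1), alpha(A_2), alpha(A_3)

draws = rng.dirichlet(alpha_masses, size=100_000)  # draws of (P(A_1), P(A_2), P(A_3))

# E P(A_j) should match alpha_0(A_j) = alpha(A_j) / alpha(X).
print(draws.mean(axis=0))      # approximately [0.3, 0.4, 0.3]
print(alpha_masses / M)        # exactly [0.3, 0.4, 0.3]
```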
An alternative representation of the Dirichlet process is given in [a6]: Let $ B _ {1} , B _ {2} , \dots $ be independent and identically distributed $ { \mathop{\rm Beta} } ( 1, \alpha ( {\mathcal X} ) ) $ random variables, and let $ V _ {1} , V _ {2} , \dots $ be a sequence of independent and identically distributed random variables with distribution $ \alpha _ {0} $, independent of the $ B _ {i} $. Define $ B _ {0} = 0 $ and $ P _ {i} = B _ {i} \prod _ {j = 0 } ^ {i - 1 } ( 1 - B _ {j} ) $. The random distribution $ \sum _ {i = 1 } ^ \infty P _ {i} \delta _ {V _ {i} } $ has the distribution $ {\mathcal D} _ \alpha $. Here, $ \delta _ {a} $ denotes the point mass at $ a $. This representation makes clear that the Dirichlet process assigns probability one to the set of discrete distributions, and emphasizes the role of the total mass of the measure $ \alpha $. For example, as $ \alpha ( {\mathcal X} ) \rightarrow \infty $, $ {\mathcal D} _ \alpha $ converges to the point mass at $ \alpha _ {0} $ (in the weak topology induced by $ {\mathcal B} $); and as $ \alpha ( {\mathcal X} ) \rightarrow 0 $, $ {\mathcal D} _ \alpha $ converges to the random distribution which is degenerate at a single point $ V $ whose location has distribution $ \alpha _ {0} $.
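This construction is easy to simulate once the infinite series is truncated. The following sketch (again with the illustrative base measure $ \alpha _ {0} = $ uniform on $ [0,1] $; the truncation level makes it an approximation of the infinite sum) returns the atoms $ V _ {i} $ and weights $ P _ {i} $ of one draw from $ {\mathcal D} _ \alpha $:

```python
# Truncated stick-breaking draw from D_alpha, following the representation
# above; all concrete choices (base measure, truncation level) are illustrative.
import numpy as np

def sample_dp_stick_breaking(M, base_sampler, n_atoms, rng):
    """Atoms V_i and weights P_i of a truncated draw from D_alpha,
    where M = alpha(X) and base_sampler draws from alpha_0."""
    B = rng.beta(1.0, M, size=n_atoms)        # B_i ~ Beta(1, alpha(X)), iid
    V = base_sampler(n_atoms)                 # V_i ~ alpha_0, iid
    # P_i = B_i * prod_{j < i} (1 - B_j): the i-th piece of the broken stick.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - B)[:-1]))
    P = B * remaining
    return V, P

rng = np.random.default_rng(1)
V, P = sample_dp_stick_breaking(M=5.0, n_atoms=500, rng=rng,
                                base_sampler=lambda m: rng.uniform(size=m))
print(P.sum())   # close to 1 when the truncation level is large
```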
The Dirichlet process is conjugate, in that if $ P \sim {\mathcal D} _ \alpha $, and data points $ X _ {1} , \dots, X _ {n} $ drawn independently and identically from $ P $ are observed, then the conditional distribution of $ P $ given $ X _ {1} , \dots, X _ {n} $ is $ {\mathcal D} _ {\alpha + \sum _ {i = 1 } ^ {n} \delta _ {X _ {i} } } $. This conjugacy property is an extension of the conjugacy of the Dirichlet distribution for multinomial data. It ensures the existence of analytical results with a simple form for many problems. The combination of simplicity and usefulness has given the Dirichlet process its reputation as the standard non-parametric model for a probability distribution on the space of distribution functions.
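In simulation terms, the posterior parameter $ \alpha + \sum _ {i = 1 } ^ {n} \delta _ {X _ {i} } $ has total mass $ \alpha ( {\mathcal X} ) + n $, and its normalized base measure is a mixture of $ \alpha _ {0} $ and the empirical point masses. A sketch of one posterior draw, reusing the stick-breaking sampler above; the data points and constants are hypothetical:

```python
# Posterior draw via conjugacy: P | X_1, ..., X_n ~ D_{alpha + sum_i delta_{X_i}}.
# Reuses sample_dp_stick_breaking from the sketch above; data are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
M = 5.0
data = np.array([0.12, 0.45, 0.45, 0.80])      # hypothetical observations in [0, 1]
n = len(data)

def posterior_base_sampler(size):
    # Normalized posterior base measure: with probability M / (M + n) draw
    # from alpha_0 = Uniform[0, 1], otherwise repeat an observed data point.
    from_prior = rng.uniform(size=size) < M / (M + n)
    return np.where(from_prior, rng.uniform(size=size), rng.choice(data, size=size))

V, P = sample_dp_stick_breaking(M=M + n, base_sampler=posterior_base_sampler,
                                n_atoms=500, rng=rng)
```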
An important extension of the class of Dirichlet processes is the class of mixtures of Dirichlet processes. A mixture of Dirichlet processes is a Dirichlet process in which the parameter measure is itself random. In applications, the parameter measure ranges over a finite-dimensional parametric family. Formally, one considers a parametric family of probability distributions $ \{ \alpha _ {\theta, 0 } : \theta \in \Theta \} $. Suppose that to every $ \theta \in \Theta $ there is associated a positive constant $ \alpha _ \theta ( {\mathcal X} ) $, and let $ \alpha _ \theta = \alpha _ \theta ( {\mathcal X} ) \cdot \alpha _ {\theta, 0 } $. If $ \nu $ is a probability distribution on $ \Theta $, and if, first, $ \theta $ is chosen from $ \nu $, and then $ P $ is chosen from $ {\mathcal D} _ {\alpha _ \theta } $, one says that the prior on $ P $ is a mixture of Dirichlet processes (with parameter $ ( \{ \alpha _ \theta \} _ {\theta \in \Theta } , \nu ) $). A reference for this is [a1]. Often, $ \alpha _ \theta ( {\mathcal X} ) \equiv M $, i.e., the constants $ \alpha _ \theta ( {\mathcal X} ) $ do not depend on $ \theta $. In this case, large values of $ M $ indicate that the prior on $ P $ is concentrated around the parametric family $ \{ \alpha _ {\theta, 0 } \} _ {\theta \in \Theta } $. More precisely, as $ M \rightarrow \infty $, the distribution of $ P $ converges to $ \int \alpha _ {\theta, 0 } \, \nu ( d \theta ) $, the standard Bayesian model for the parametric family $ \{ \alpha _ {\theta, 0 } : \theta \in \Theta \} $ in which $ \theta $ has prior $ \nu $.
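A draw from a mixture of Dirichlet processes is obtained by composing the two steps. A sketch under illustrative assumptions ($ \theta = ( a, b ) $ indexes $ { \mathop{\rm Beta} } ( a, b ) $ base measures, $ \nu $ is a shifted exponential hyperprior, and $ \alpha _ \theta ( {\mathcal X} ) \equiv M $), reusing the stick-breaking sampler above:

```python
# Mixture of Dirichlet processes: first theta ~ nu, then P ~ D_{alpha_theta}.
# All concrete choices below (nu, M, the Beta family) are illustrative.
import numpy as np

rng = np.random.default_rng(3)
M = 20.0                                  # alpha_theta(X), constant in theta
a, b = 1.0 + rng.exponential(size=2)      # theta = (a, b) drawn from nu
V, P = sample_dp_stick_breaking(M=M, n_atoms=500, rng=rng,
                                base_sampler=lambda m: rng.beta(a, b, size=m))
# For large M the realized P concentrates near its base measure Beta(a, b).
```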
The Dirichlet process has been used in many applications. A particularly interesting one is the Bayesian hierarchical model, which is the Bayesian version of the random effects model. A typical example is as follows. Suppose one is studying the success of a certain type of operation for patients from different hospitals. Suppose one has $ n _ {i} $ patients in hospital $ i $, $ i = 1, \dots, I $. One might model the number of successes $ X _ {i} $ in hospital $ i $ by a binomial distribution, with success probability depending on the hospital, and one might wish to view the $ I $ binomial parameters as independent and identically distributed draws from a common distribution. The typical hierarchical model then is written as
$$ \tag{a1 } \textrm{ given } \theta _ {i} , X _ {i} \sim { \mathop{\rm Bin} } ( n _ {i} , \theta _ {i} ) , $$
$$ \theta _ {i} \sim { \mathop{\rm Beta} } ( a, b ) \textrm{ iid } , $$
$$ ( a, b ) \sim G ( \cdot, \cdot ) . $$
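A forward simulation of (a1) is straightforward; in the sketch below the hyperprior $ G $ and the hospital sizes $ n _ {i} $ are illustrative choices:

```python
# Forward simulation of the hierarchical model (a1); the hyperprior G
# (shifted exponentials) and the sample sizes n_i are hypothetical.
import numpy as np

rng = np.random.default_rng(4)
n = np.array([25, 40, 10, 60, 35])       # patients per hospital, i = 1..I
a, b = 1.0 + rng.exponential(size=2)     # (a, b) ~ G
theta = rng.beta(a, b, size=len(n))      # theta_i ~ Beta(a, b), iid
X = rng.binomial(n, theta)               # X_i ~ Bin(n_i, theta_i)
print(X)                                 # success counts per hospital
```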
Here, the $ \theta _ {i} $ are unobserved, or latent, variables. If the distribution $ G $ were degenerate, then the $ \theta _ {i} $ would be independent, so that data from one hospital would not give any information on the success rate of any other hospital. On the other hand, when $ G $ is not degenerate, data coming from the other hospitals provide some information on the success rate of hospital $ i $.
Consider now the problem of predicting the number of successes for a new hospital, indexed $ I + 1 $. A disadvantage of the model (a1) is that if the $ \theta _ {i} $ are drawn independently from a distribution which is not a Beta distribution, then even as $ I \rightarrow \infty $, the predictive distribution of $ X _ {I + 1 } $ based on the (incorrect) model (a1) need not converge to the actual predictive distribution of $ X _ {I + 1 } $. An alternative model, using a mixture of Dirichlet processes prior, is written as
$$ \tag{a2 } \textrm{ given } \theta _ {i} , X _ {i} \sim { \mathop{\rm Bin} } ( n _ {i} , \theta _ {i} ) , $$
$$ \theta _ {i} \sim P \textrm{ iid } , $$
$$ P \sim {\mathcal D} _ {M \cdot { \mathop{\rm Beta} } ( a,b ) } , $$
$$ ( a, b ) \sim G ( \cdot, \cdot ) . $$
The model (a2) does not have the defect suffered by (a1), because the support of the distribution on $ P $ is the set of all distributions concentrated in the interval $ [0,1] $.
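A forward simulation of (a2) differs from that of (a1) only in the middle layer. A sketch, again with hypothetical hyperprior and sample sizes, reusing the truncated stick-breaking sampler above (sampling an atom $ V _ {k} $ with probability $ P _ {k} $ realizes $ \theta _ {i} \sim P $):

```python
# Forward simulation of (a2): theta_i are drawn from a random P rather than
# directly from Beta(a, b).  Reuses sample_dp_stick_breaking; G, M, n_i are
# hypothetical.  Ties among the theta_i occur with positive probability.
import numpy as np

rng = np.random.default_rng(5)
n = np.array([25, 40, 10, 60, 35])
M = 2.0
a, b = 1.0 + rng.exponential(size=2)                   # (a, b) ~ G
V, P = sample_dp_stick_breaking(M=M, n_atoms=500, rng=rng,
                                base_sampler=lambda m: rng.beta(a, b, size=m))
theta = rng.choice(V, size=len(n), p=P / P.sum())      # theta_i ~ P, iid
X = rng.binomial(n, theta)                             # X_i ~ Bin(n_i, theta_i)
```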
It is not possible to obtain closed-form expressions for the posterior distributions in (a2). Computational schemes to obtain these have been developed by M. Escobar and M. West [a3] and C.A. Bush and S.N. MacEachern [a2].
The parameter $ M $ plays an interesting role. When $ M $ is small, then, with high probability, the $ \theta _ {i} $ are all equal, so that, in effect, one is working with the model in which the $ X _ {i} $ are independent binomial samples with the same success probability. On the other hand, when $ M $ is large, the model (a2) is very close to (a1).
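This behaviour of $ M $ can be seen from the marginal distribution of the $ \theta _ {i} $, which follows the Blackwell–MacQueen urn scheme: $ \theta _ {i} $ is a fresh draw from the base measure with probability $ M / ( M + i - 1 ) $, and otherwise repeats one of the earlier values, chosen uniformly. A sketch counting the number of distinct $ \theta _ {i} $ for several values of $ M $ (the urn scheme is standard; the constants are illustrative):

```python
# Number of distinct theta_i under the Blackwell-MacQueen urn representation
# of iid draws from P ~ D_alpha: small M forces ties, large M avoids them.
import numpy as np

def n_distinct(M, I, rng):
    values = []
    for i in range(I):                               # i earlier draws exist so far
        if rng.uniform() < M / (M + i):              # fresh draw w.p. M / (M + i)
            values.append(rng.uniform())             # new value from alpha_0
        else:
            values.append(values[rng.integers(i)])   # repeat an earlier value
    return len(set(values))

rng = np.random.default_rng(6)
for M in (0.1, 1.0, 100.0):
    print(M, np.mean([n_distinct(M, 10, rng) for _ in range(2000)]))
# Small M: close to 1 distinct value; large M: close to 10 distinct values.
```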
It is interesting to note that when $ M $ is large and the distribution $ G $ is degenerate, the measure on $ P $ is essentially degenerate, so that one is treating the data from the hospitals as independent. Thus, when the distribution $ G $ is degenerate, the parameter $ M $ determines the extent to which data from other hospitals are used when making an inference about hospital $ i $, and in that sense plays the role of a tuning parameter in the bias-variance tradeoff of frequentist analysis.
References
[a1] | C. Antoniak, "Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems" Ann. Statist. , 2 (1974) pp. 1152–1174 |
[a2] | C.A. Bush, S.N. MacEachern, "A semi-parametric Bayesian model for randomized block designs" Biometrika , 83 (1996) pp. 275–285 |
[a3] | M. Escobar, M. West, "Bayesian density estimation and inference using mixtures" J. Amer. Statist. Assoc. , 90 (1995) pp. 577–588 |
[a4] | T.S. Ferguson, "A Bayesian analysis of some nonparametric problems" Ann. Statist. , 1 (1973) pp. 209–230 |
[a5] | T.S. Ferguson, "Prior distributions on spaces of probability measures" Ann. Statist. , 2 (1974) pp. 615–629 |
[a6] | J. Sethuraman, "A constructive definition of Dirichlet priors" Statistica Sinica , 4 (1994) pp. 639–650 |
Dirichlet process. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Dirichlet_process&oldid=37599