%%% Title of object: Marginal Probability. Its use in Bayesian Statistics as the Evidence of Models and Bayes Factors
%%% Canonical Name: MarginalProbabilityItsUseInBayesianStatisticsAsTheEvidenceOfModelsAndBayesFactors3
%%% Type: Topic
%%% Created on: 2010-08-24 22:47:03
%%% Modified on: 2010-08-24 22:47:03
%%% Creator: pericchi
%%% Modifier: jkimmel
%%%
%%% Classification: msc:62F03, msc:62F15
%%% Preamble:
\documentclass[10pt]{article}
% this is the default PlanetMath preamble. as your knowledge
% of TeX increases, you will probably want to edit this, but
% it should be fine as is for beginners.
% almost certainly you want these
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{amsfonts}
% used for TeXing text within eps files
%\usepackage{psfrag}
% need this for including graphics (\includegraphics)
%\usepackage{graphicx}
% for neatly defining theorems and propositions
%\usepackage{amsthm}
% making logically defined graphics
%\usepackage{xypic}
% there are many more packages, add them here as you need them
% define commands here
%%% Content:
\begin{document}
%\documentclass[11pt]{article}
%\usepackage[activeacute,spanish]{babel}
%\usepackage{graphicx}
%\usepackage{amsmath,amsthm,amssymb}
%\oddsidemargin .3 cm \evensidemargin .3 cm \textheight 21 cm
%\topmargin -1 cm \textwidth 6.3 in
%\newcommand{\R}{\mathbb{R}}
\newcommand{\be}{\begin{equation}}
\newcommand{\ee}{\end{equation}}
\newcommand{\bx}{\mathbf{x}}
\newcommand{\btheta}{\boldsymbol{\theta}}
%\theoremstyle{definition}
%\newtheorem{thm}{Theorem}[section]
%\newtheorem{cor}[thm]{Corollary}
%\newtheorem{lem}[thm]{Lemma}
%\newtheorem{exmp}{Example}
%\newtheorem{res}{Result}
%\newtheorem*{defin}{Definition}
%\begin{document}
\title{Marginal Probability. Its use in Bayesian Statistics as the Evidence of
Models and Bayes Factors}
\author{Luis Ra\'ul Pericchi, Department of Mathematics and Biostatistics and Bioinformatics Center,\\
University of Puerto Rico, Rio Piedras, San Juan, Puerto Rico.\thanks{Email address: luarpr@uprrp.edu. This work was sponsored in part by NIH Grant P20-RR016470.}}
\date{}
\maketitle
\textbf{Keywords:} \emph{Bayes Factors, Evidence of Models, Intrinsic Bayes Factors, Intrinsic Priors, Posterior Model Probabilities}
\section{Definition} Suppose that we have a vector of
random variables
$[\textbf{v,w}]=[v_1,v_2,\ldots,v_I,w_1,\ldots,w_J]$ in
$\Re^{(I+J)}$. Denote the \textbf{joint} density function by
$f_{\textbf{v,w}}$, which satisfies $f_{\textbf{v,w}}(v,w) \ge 0$ and \\
$\int^{\infty}_{-\infty}\ldots\int^{\infty}_{-\infty}
f_{\textbf{v,w}}(v,w)\, dv_1\ldots dv_I\, dw_1\ldots dw_J=1$. Then the
probability of the set $[A_v,B_w]$ is given by
\[
P(A_v,B_w)=\int \ldots \int_{A_v,B_w} f_{\textbf{v,w}}(v,w)
\textbf{dv} \textbf{dw}.
\] The \textbf{marginal} density $f_{\textbf{v}}$ is obtained by integrating out $\textbf{w}$:
\[
f_{\textbf{v}}(v)=\int^{\infty}_{-\infty}\ldots
\int^{\infty}_{-\infty}f_{\textbf{v,w}}(v,w)\, dw_1\ldots dw_J.
\]
The \textbf{marginal probability} of the set $A_v$ is then
obtained as
\[
P(A_v)=\int \ldots \int_{A_v} f_{\textbf{v}}(v) dv.
\]
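% Illustrative example (not part of the original text): the density f(v,w)=v+w is an assumption chosen for simplicity.
As a small worked illustration of these definitions, take $I=J=1$ and assume, purely for the sake of example, the joint density $f_{\textbf{v,w}}(v,w)=v+w$ on the unit square $0\le v,w\le 1$ (and zero elsewhere). It is non-negative and integrates to one. Integrating out $w$ gives the marginal density
\[
f_{\textbf{v}}(v)=\int_0^1 (v+w)\, dw = v+\frac{1}{2}, \qquad 0\le v\le 1,
\]
and, for the set $A_v=[0,1/2]$, the marginal probability
\[
P(A_v)=\int_0^{1/2}\left(v+\frac{1}{2}\right) dv=\frac{1}{8}+\frac{1}{4}=\frac{3}{8}.
\]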
We have assumed that the random variables are continuous. When they
are discrete, the integrals are replaced by sums.\\ We now
present an important application: marginal densities are used to construct the \emph{Evidence of the Model}, and marginal probabilities are used to
measure the \emph{Bayesian Probability of a Model}.
\section{Measuring the Evidence in Favor of a Model}
In Statistics, a parametric model is denoted by
$f(x_1,\ldots,x_n|\theta_1,\ldots,\theta_k)$, where
$\textbf{x}=(x_1,\ldots, x_n)$ is the vector of $n$ observations and
${\btheta}=(\theta_1,\ldots,\theta_k)$ is the vector of $k$
parameters. For instance, we may have $n$ normally distributed
observations with parameter vector $(\theta_1,\theta_2)$,
the location and scale respectively, so that
$f_{\textit{Normal}}(\textbf{x}|{\btheta})=\prod_{i=1}^n
\frac{1}{\sqrt{2 \pi}\, \theta_2} \exp\left(-\frac{1}{2 \theta^2_2}
(x_i-\theta_1)^2\right)$.\\ Assume now that there is reason to suspect
that the location is zero.
As a second example, it may be suspected that the sampling model, which has usually been assumed to be Normal, is instead a Cauchy, $f_{\textit{Cauchy}}(\textbf{x}|{\btheta})=\prod_{i=1}^n \frac{1}{\pi \theta_2}\frac{1}{1+\left(\frac{x_i-\theta_1}{\theta_2}\right)^{2}}$. The first problem is a \emph{hypothesis test}, denoted by
\[H_0: \theta_1=0 \mbox{ vs. } H_1: \theta_1 \neq 0, \]
and the second problem is a \emph{model selection} problem:
\[
M_0: f_{\textit{Normal}} \mbox{ vs. } M_1: f_{\textit{Cauchy}}.
\]
How should the evidence in favor of $H_0$ or $M_0$ be measured? Instead of maximizing likelihoods, as is done in traditional significance testing, Bayesian statistics takes as its central concept \textit{the evidence} or \textit{marginal probability density}
\[
m_j({\bx})=\int f_j({\bx}|\btheta_j) \pi(\btheta_j) d\btheta_j,
\]
where $j$ indexes the model or hypothesis and $\pi(\btheta_j)$ denotes the prior
for the parameters under model or hypothesis $j$.\\ The marginal density embodies the likelihood of
a model or hypothesis in great generality, and it can be argued that it is the natural probabilistic quantity for
comparing models.
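% Illustrative sketch: the known scale theta_2=1 and the N(0,tau^2) prior are simplifying assumptions made for this example only.
To see what the evidence looks like in a case where the integral can be done by hand, consider a simplified version of the Normal example in which the scale is assumed known, $\theta_2=1$, and, as an assumption made only for illustration, the location under $H_1$ is given the prior $\theta_1\sim N(0,\tau^2)$ with $\tau$ fixed. Under $H_0$ there is no free parameter, so the evidence is simply the likelihood at $\theta_1=0$,
\[
m_0({\bx})=(2\pi)^{-n/2}\exp\left(-\frac{1}{2}\sum_{i=1}^n x_i^2\right),
\]
while under $H_1$ completing the square in $\theta_1$ gives
\[
m_1({\bx})=(2\pi)^{-n/2}\,(1+n\tau^2)^{-1/2}\exp\left(-\frac{1}{2}\sum_{i=1}^n x_i^2+\frac{n^2\tau^2\,\bar{x}^2}{2(1+n\tau^2)}\right),
\]
where $\bar{x}=\frac{1}{n}\sum_{i=1}^n x_i$ is the sample mean. In this sketch the factor $(1+n\tau^2)^{-1/2}$ can be read as a penalty on the larger model for spreading prior mass over values of $\theta_1$ that the data do not support.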
\section{Marginal Probability of a Model} Once the marginal densities $m_j({\bx})$ of the models $M_j$, $j=1,\ldots,J$, have been calculated, and given prior model probabilities $P(M_j)$, $j=1,\ldots, J$, with $\sum_{j=1}^J P(M_j)=1$, Bayes' Theorem gives \textit{the marginal probability of a model} $P(M_j|{\bx})$ as
\[
P(M_j|{\bx})=\frac{m_j({\bx}) \cdot P(M_j)}{\sum_{i=1}^J m_i({\bx}) \cdot P(M_i)}.
\]
We then have the following formula for any two models or hypotheses:
\[
\frac{P(M_j|\bx)}{P(M_i|\bx)}= \frac{P(M_j)}{P(M_i)} \times \frac{m_j({\bx})}{m_i({\bx})},
\]
or in words: Posterior Odds equals Prior Odds times Bayes Factor, where the Bayes Factor of $M_j$ over $M_i$ is
\[
B_{j,i}=\frac{m_j({\bx})}{m_i({\bx})},
\]
Jeffreys (1961).\\
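% Illustrative sketch: the setting (known unit scale, N(0,tau^2) prior, tau=1, z=1.96) is assumed for this example only.
As an illustration of how a Bayes Factor behaves, consider a simplified test of $H_0:\theta_1=0$ against $H_1:\theta_1\neq 0$ for $n$ Normal observations with the scale assumed known, $\theta_2=1$, and with the prior $\theta_1\sim N(0,\tau^2)$ under $H_1$ (an assumption made only for this example). A standard Gaussian integral then gives
\[
B_{0,1}=\frac{m_0({\bx})}{m_1({\bx})}=\sqrt{1+n\tau^2}\,\exp\left(-\frac{z^2}{2}\cdot\frac{n\tau^2}{1+n\tau^2}\right), \qquad z=\sqrt{n}\,\bar{x},
\]
where $\bar{x}$ is the sample mean. Taking $\tau=1$ and holding the test statistic fixed at $z=1.96$ (a two-sided p-value of about $0.05$), the formula gives $B_{0,1}\approx 0.58$ for $n=10$, $B_{0,1}\approx 1.50$ for $n=100$ and $B_{0,1}\approx 4.64$ for $n=1000$. In this sketch the same p-value therefore corresponds to mild evidence against $H_0$ in a small sample and to evidence in favor of $H_0$ in a large one.\\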
In contrast to \textit{p-values}, whose interpretation depends heavily
on the sample size $n$ and whose definition does not match
the scientific question, posterior probabilities and Bayes
Factors address the scientific question directly: ``how probable is model or
hypothesis $j$ as compared with model or hypothesis $i$?'', and their
interpretation is the same for any sample size, Berger and Pericchi
(1996a, 2001). Bayes Factors and Marginal Posterior Model Probabilities
have several advantages, for example large sample consistency:
as the sample size grows, the Posterior Model Probability of
the sampling model tends to one. Furthermore, if the goal is to
predict future observations $y_f$, it is \textbf{not} necessary to
select one model as \textit{the} predicting model, since we may
predict by the so-called Bayesian Model Averaging. If
quadratic loss is assumed, the optimal predictor takes the form
\[
E[Y_f|\bx]= \sum_{j=1}^J E[Y_f|\bx, M_j] \times P(M_j|\bx),
\]
where $E[Y_f|\bx,M_j]$ is the expected value of a future observation under the model or hypothesis $M_j$.
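% Illustrative numbers only: the evidences, prior probabilities and per-model predictions below are hypothetical.
For a concrete feel for these formulas, suppose, hypothetically, that there are $J=2$ models with evidences $m_1({\bx})=0.02$ and $m_2({\bx})=0.005$ and equal prior probabilities $P(M_1)=P(M_2)=1/2$. Then
\[
P(M_1|{\bx})=\frac{0.02\times 0.5}{0.02\times 0.5+0.005\times 0.5}=0.8,\qquad P(M_2|{\bx})=0.2,
\]
so the Bayes Factor of $M_1$ over $M_2$ is $B_{1,2}=0.02/0.005=4$ and, the prior odds being one, the posterior odds are also $4$. If the two models give the hypothetical predictions $E[Y_f|{\bx},M_1]=1.0$ and $E[Y_f|{\bx},M_2]=3.0$, the model-averaged prediction is
\[
E[Y_f|{\bx}]=1.0\times 0.8+3.0\times 0.2=1.4,
\]
which lies between the two single-model predictions and is pulled towards the model with the higher posterior probability.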
\section{Intrinsic Priors for Model Selection and Hypothesis Testing}
Having stated some of the advantages of the marginal probabilities of
models, the question arises: how should the conditional priors
$\pi(\btheta_j)$ be assigned? Which priors are sensible
to use in the two examples above? The problem is \textbf{not} a simple one: the usual
Uniform priors cannot be used, since they are improper and defined only
up to an arbitrary constant, so that the Bayes
Factors are undetermined. To solve this problem with some
generality, Berger and Pericchi (1996a,b) introduced the concepts of
Intrinsic Bayes Factors and Intrinsic Priors. Start by splitting the
sample into two sub-samples ${\bx}=[{\bx}(l),{\bx}(-l)]$, where the training
sample $\bx(l)$ is as small as possible such that for $j=1,\ldots,J:
0