\documentclass[12pt]{article}
\begin{document}
\noindent{\bf Thomas BAYES}\\
b.~c.~1701 -- d.~7 April 1761

\vspace{.5 cm}
\noindent{\bf Summary.} The problem of passing from a population to
the properties of a sample was one of the first to be studied in
probability. Thomas Bayes, a nonconformist minister, was the first to
solve the inverse problem, the passage from sample to population,
using ideas that remain widely used today.

\vspace{.5 cm}
Thomas Bayes was born in London, the son of a nonconformist minister,
and spent most of his adult life in a similar position in Tunbridge
Wells, England. He was educated at Edinburgh University and was
elected a Fellow of the Royal Society in 1742. During his lifetime he
published a few mathematical papers, of which the best-known is a 1736
defence of Newton's ideas against an attack by Bishop Berkeley. He is
remembered today for a paper that his friend Richard Price claimed to
have found among his possessions after his death. It appeared in the
Society's {\it Philosophical Transactions} in 1763 and has often been
republished. Apart from these bare facts, surprisingly little is known
of Bayes's life.

By the middle of the 18th century it was well understood that if, to
use modern terminology, the chance of success had the same value,
$\theta$ say, in each of $n$ independent trials, then the probability
of exactly $r$ successes was given by the binomial distribution
$$P(r|\theta,n) = {n \choose r}\theta^r(1-\theta)^{n-r}.$$
\noindent Jacob Bernoulli (q.v.) had established the weak law of large
numbers and de Moivre (q.v.) had found the normal approximation to the
binomial. The passage from a known value of $\theta$ to the observed
value of $r$ had therefore been extensively studied.
Bayes studied the inverse problem: what do the data $(r,n)$ say about
the chance $\theta$? Partial answers already existed. For example,
Arbuthnot had observed $r$ male and $n-r$ female births, with $r$
considerably in excess of $n/2$. He argued that, on the basis of the
binomial with $\theta = 1/2$, a value of $r$ as high as this was so
improbable that $\theta$ could not be $1/2$.
That idea has been much extended into the modern form of a
significance test and its associated $P$-value or significance
level.
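
As a small numerical illustration, with figures that are hypothetical
and are not Arbuthnot's own data, suppose that $r = 9$ of $n = 10$
births are male. Under the binomial with $\theta = 1/2$, the chance of
exactly this outcome is
$$P(9|1/2,10) = {10 \choose 9}\left(\frac{1}{2}\right)^{10}
= \frac{10}{1024} \approx 0.0098,$$
\noindent and the chance of a value of $r$ at least as high as that
observed, which is the one-sided $P$-value of the corresponding
significance test, is
$$P(r \geq 9|1/2,10) = \frac{{10 \choose 9} + {10 \choose 10}}{2^{10}}
= \frac{11}{1024} \approx 0.011.$$
\noindent On Arbuthnot's line of reasoning, so small a probability
would be taken as evidence that $\theta$ is not $1/2$.
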

Bayes proceeded differently, using the theorem that now bears his
name, although it does not appear explicitly in the 1763 paper:
$$P(A|B) = P(B|A)P(A)/P(B)$$
\noindent for events $A, B$ with $P(B)\neq 0$. The theorem inverts the
conditioning, passing from $P(B|A)$ to $P(A|B)$. Taking $A$ to refer
to $\theta$ and $B$ to the observed $r$, with $n$ held fixed, gives
$$P(\theta|r,n) \propto P(r|\theta,n)P(\theta|n).$$
\noindent (The missing constant of proportionality does not depend on
$\theta$. It is $P(r|n)^{-1}$, but it is most easily found as the
multiplier that makes the product integrate to one over $\theta$.) The
result effects the passage from the binomial, on the right, to a
probability statement about the chance, on the left. It therefore
becomes possible to pass from the data to a statement about which
values of the chance are probable and which are improbable.
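
To illustrate the normalization, suppose, purely hypothetically, that
the chance can take only the two values $\theta = 1/4$ and
$\theta = 3/4$, each with prior probability $1/2$, and that $r = 3$
successes are observed in $n = 4$ trials. The binomial gives
$$P(3|1/4,4) = 4\left(\frac{1}{4}\right)^3\frac{3}{4} = \frac{3}{64},
\qquad
P(3|3/4,4) = 4\left(\frac{3}{4}\right)^3\frac{1}{4} = \frac{27}{64}.$$
\noindent Multiplying each by the prior probability $1/2$, which here
cancels, and rescaling so that the two products sum to one gives
$$P(\theta = 3/4|3,4) = \frac{27/64}{3/64 + 27/64} = \frac{9}{10},
\qquad P(\theta = 1/4|3,4) = \frac{1}{10}.$$
\noindent The constant
$P(3|4) = \frac{1}{2}\left(\frac{3}{64} + \frac{27}{64}\right) = \frac{15}{64}$
never has to be computed separately, because the rescaling supplies it.
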

Bayes's approach solves the problem simply and elegantly, except for
one difficulty: it requires a value for $P(\theta|n)$, a probability
distribution for the chance before the outcome of the trials has been
observed. This is usually called the prior distribution (prior, that
is, to $r$), and the final result the posterior distribution. Thus
the theorem describes how your views of $\theta$ change, from prior
to posterior, in the light of the data $r$. Bayes discussed the
choice of prior, but his approach is ambiguous. He is usually
supposed to have taken $P(\theta|n)$ to be uniform on $(0,1)$, the
so-called Bayes's postulate, but an alternative reading suggests that
he took $P(r|n)$ to be uniform over $r = 0, 1, \ldots, n$.
Mathematically these lead to the same result.
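
The equivalence can be checked directly. With $P(\theta|n) = 1$ on
$(0,1)$, the beta integral
$\int_0^1 \theta^r(1-\theta)^{n-r}\,d\theta = r!\,(n-r)!/(n+1)!$ gives
$$P(r|n) = \int_0^1 {n \choose r}\theta^r(1-\theta)^{n-r}\,d\theta
= \frac{n!}{r!\,(n-r)!}\cdot\frac{r!\,(n-r)!}{(n+1)!} = \frac{1}{n+1},
\qquad r = 0, 1, \ldots, n,$$
\noindent so that every value of $r$ is equally probable before the
trials, and the posterior distribution is
$$P(\theta|r,n) = \frac{(n+1)!}{r!\,(n-r)!}\,\theta^r(1-\theta)^{n-r},
\qquad 0 < \theta < 1.$$
\noindent Conversely, for a prior $P(\theta)$ that does not depend on
$n$, if $P(r|n) = 1/(n+1)$ for every $n$, the case $r = n$ gives
$\int_0^1 \theta^n P(\theta)\,d\theta = 1/(n+1)$ for all $n$. These are
the moments of the uniform distribution, and on $(0,1)$ they
determine it.
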

Little notice was taken of the 1763 paper at the time. It was first
appreciated in the early years of the next century by Laplace (q.v.),
who used the ideas in his eclectic approach to probability. The
theorem is of basic importance because it provides a solution to the
general problem of inference or induction. Let $H$ be a universal
hypothesis and $E$ empirical evidence bearing on $H$. A simple example
has $H$ as ``all swans are white'' and $E$ the observed colour of a
swan. A more sophisticated one has $H$ as Newton's laws and $E$
observations of the motions of the planets. In either case,
$P(E|H)$ can be calculated. Bayes's theorem says
$$P(H|E) \propto P(E|H)P(H),$$
\noindent expressing a view about the hypothesis, given the evidence,
in terms of the known probability of the evidence, given the
hypothesis, and the prior view about $H$. As evidence accrues that is
much more probable if $H$ is true than if it is false, $P(H|E)$
approaches one, so that even a sceptic, with a low but non-zero
$P(H)$, will become convinced and the hypothesis will be accepted.
Many people, following Jeffreys (q.v.), who developed these ideas
extensively into a practicable scientific tool, hold that this
provides a description of the scientific method. This view differs
from that of Popper, who admits only the refutation of a hypothesis
and whose attitude to probability is regarded as unsound by
supporters of Jeffreys's ideas.
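
A sketch with hypothetical numbers shows how the sceptic is won over.
Take $H$ to be ``all swans are white'', with $\bar H$ denoting that
$H$ is false, and let $E$ be the observation of $k$ white swans, so
that $P(E|H) = 1$. Suppose, for illustration only, that under $\bar H$
each swan is independently white with probability $1/2$, so that
$P(E|\bar H) = 2^{-k}$. For a sceptic with $P(H) = 0.01$, the theorem,
with its constant of proportionality restored, gives after $k = 10$
white swans
$$P(H|E) = \frac{P(E|H)P(H)}{P(E|H)P(H) + P(E|\bar H)P(\bar H)}
= \frac{0.01}{0.01 + 0.99\times 2^{-10}} = \frac{1024}{1123} \approx 0.91,$$
\noindent and $P(H|E) \to 1$ as $k \to \infty$. A single black swan,
for which $P(E|H) = 0$, would instead give $P(H|E) = 0$. This is the
refutation that Popper emphasizes.
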

Recently Bayes's theorem has been used as a means of processing
evidence in a court of law. Let $G$ be the hypothesis that the
defendant is truly guilty of the offence with which he or she has
been charged, and $E$ a new piece of evidence. Applying the theorem
both to $G$ and to $\bar G$, denoting innocence, and dividing gives
$$\frac{P(G|E)}{P(\bar G|E)} = \frac{P(E|G)}{P(E|\bar
G)}\;\frac{P(G)}{P(\bar G)}.$$
\noindent The expression on the left is the odds on guilt, given $E$;
the final factor on the right is the same odds without $E$. The
remaining factor is the likelihood ratio, which compares the
probability of the evidence supposing guilt with its probability
supposing innocence. Forensic scientists can therefore present
evidence to the court in the form of a likelihood ratio, and the
court can multiply its former (prior) odds by this ratio to obtain
new (posterior) odds as a result of hearing the evidence.
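
With hypothetical figures: if the court's prior odds on guilt are 1 to
1000 and the forensic evidence has a likelihood ratio of 10\,000, then
$$\frac{P(G|E)}{P(\bar G|E)} = 10\,000 \times \frac{1}{1000} = 10,$$
\noindent that is, odds of 10 to 1 on guilt, or
$P(G|E) = 10/11 \approx 0.91$. A further item of evidence $F$ that is
independent of $E$, both given $G$ and given $\bar G$, would multiply
these odds by its own likelihood ratio $P(F|G)/P(F|\bar G)$.
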

Bayes's result has attained increased importance following work by
Ramsey, de Finetti and Savage between 1925 and 1955, which
demonstrated that our knowledge had to be based on probability and
that our beliefs must obey the rules of the probability calculus, of
which Bayes's theorem is essentially the multiplication rule. In this
view, the significance levels of the classical school are unsound
because they do not express opinions about hypotheses like $H$, or
parameters like $\theta$, as direct probabilities of $H$ or $\theta$.
The resulting methodology is called Bayesian statistics and has
procedures and results rather different from those of the classical
school. Bayesians regard probability as a measure of a person's
belief, whereas the classical school admits probability only as a
frequency.

Bayes's result is also central to modern ideas on decision-making
under uncertainty. Suppose a choice is to be made among a set
$\{d\}$ of decisions in the presence of uncertainty about a parameter
$\theta$. The work of Ramsey and others leads to the introduction of a
utility function $u(d,\theta)$, describing the worth of decision $d$
when the parameter has the value $\theta$, and to the choice of the
$d$ that maximizes the expected utility
$\sum_{\theta}u(d,\theta)P(\theta)$. Additional evidence $E$ updates
$P(\theta)$ to $P(\theta|E)$ by the theorem, and the expected utility
is then calculated with $P(\theta|E)$ in place of $P(\theta)$, which
improves the decision-making. All this is a long way from Bayes's
original problem and its resolution. He would doubtless be astonished
were he to realize how his wonderful idea has been extended and his
name used.

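
A two-decision sketch, with utilities invented purely for illustration,
shows the rule at work. Let $\theta$ take one of two values,
$\theta_1$ or $\theta_2$. Let decision $d_1$ have
$u(d_1,\theta_1) = 10$ and $u(d_1,\theta_2) = -10$, while $d_2$ has
utility $0$ whatever the value of $\theta$. If $P(\theta_1) = 0.4$,
$$\sum_{\theta}u(d_1,\theta)P(\theta) = 10(0.4) - 10(0.6) = -2
< 0 = \sum_{\theta}u(d_2,\theta)P(\theta),$$
\noindent so $d_2$ is chosen. If evidence $E$ raises the probability
to $P(\theta_1|E) = 0.7$, the expected utility of $d_1$ becomes
$10(0.7) - 10(0.3) = 4 > 0$, and $d_1$ is chosen instead.
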
\vspace{.5 cm}
\begin{thebibliography}{3}
\bibitem{1} The original paper appeared in {\it The Philosophical
Transactions of the Royal Society of London} (1763) {\bf 53},
370--418. There is a reprint in {\it Biometrika} (1958) {\bf 45},
296--315. An illuminating commentary on it is provided by S.M.
Stigler (1982) Thomas Bayes's Bayesian Inference. {\it Journal of
the Royal Statistical Society, Series A}, {\bf 145}, 250--258. The
most complete biography is provided by A.W.F. Edwards in the latest
edition of {\it The Dictionary of National Biography}.
\vspace{.25 cm}
\bibitem{2} Two recent books on modern Bayesian methods are A.~O'Hagan
(1994) {\it Bayesian Inference}, Vol.~2B of Kendall's {\it Advanced
Theory of Statistics}, Edward Arnold, London; John Wiley, New York;
and J.M. Bernardo \& A.F.M. Smith (1994) {\it Bayesian Theory}, John
Wiley, Chichester. The latter is part of a forthcoming 3-volume work
and has an extensive bibliography. The modern `classic' is B. de
Finetti (1974/5) {\it Theory of Probability}, John Wiley, London, in
2 volumes, translated from the Italian.
\vspace{.25 cm}
\bibitem{3} C.G.G. Aitken (1995) {\it Statistics and the Evaluation
of Evidence for Forensic Scientists}. John Wiley, Chichester, deals
with legal applications. D.V. Lindley (1985) {\it Making
Decisions}. John Wiley, London, extends Bayesian ideas to
decision-making.
\vspace{1 cm}
\hfill{D.V. Lindley}
\end{thebibliography}
\end{document}