The natural approach to these computational problems is to use
computer simulation and Monte Carlo techniques, rather than numerical
methods, simply because there is much more to gain from exploiting the
probabilistic properties of the integrands rather than their
analytical properties. In addition, the dimension of most problems
considered in current Bayesian Statistics is such that very involved
numerical methods should be used to provide a satisfactory
approximation in such integration or optimisation problems. Indeed,
down-the-shelf numerical methods cannot handle integrals in dimensions
larger than and more advanced numerical integration methods
require analytical studies on the distribution of interest.
Given the statistical nature of the problem, the approximation of an integral like
![]() |
![]() |
![]() |
![]() |
In addition, if the posterior variance
var is
finite, the Central Limit Theorem applies to the empirical
average (11.4), which is then asymptotically normal
with variance
var
. Confidence regions can
then be built from this normal approximation and, most importantly,
the magnitude of the error remains of order
, whatever the
dimension of the problem, in opposition with numerical
methods.11.5 (See also [37],
Chap. 4, for more details on the convergence assessment based on the
CLT.)
The Monte Carlo method actually applies in a much wider generality
than the above simulation from . For instance, because
can be represented in an infinity of ways as an
expectation, there is no need to simulate from the distributions
or
to get a good approximation
of
. Indeed, if
is a probability density with
supp
including the support of
, the integral
can also be represented as
an expectation against
, namely
![]() |
![]() |
![]() |
While this convergence is guaranteed for all densities with wide
enough support, the choice of the importance function
is crucial. First, simulation from
must be easily implemented.
Moreover, the function
must be close enough to the
function
, in order to reduce the variability
of (11.5) as much as possible; otherwise, most of the weights
will be quite small and a few will be overly
influential. In fact, if
is not finite, the variance of the estimator (11.5) is
infinite (see [37], Chap. 3).
Obviously, the dependence on
of the importance function
can be
avoided by proposing generic choices such as the posterior
distribution
.
In either point estimation or simple testing situations, the computational problem is often expressed as a ratio of integrals. Let us start with a toy example to set up the way Monte Carlo methods proceed and highlight the difficulties of applying a generic approach to the problem.
If the inferential
problem is to decide about the value of , the posterior
expectation is
![]() |
||
![]() |
Now, there is a clear arbitrariness in the choice of in the
sample
for the proposal
. While any of the
's has the same
theoretical validity to ''represent'' the integral and the
integrating density, the choice of
's closer to the posterior
mode (the true value of
is 0) induces less variability in
the estimates, as shown by a further simulation experiment through
Fig. 11.4. It is fairly clear from this comparison that the
choice of extremal values like
and even more
is detrimental to the quality of the
approximation, compared with the median
. The
range of the estimators is much wider for both extremes, but the
influence of this choice is also visible for the average which does
not converge so quickly.11.7
![]() |
![]() |
to12cm![]() ![]() ![]() |
This example thus shows that Monte Carlo methods, while widely
available, may easily face inefficiency problems when the simulated
values are not sufficiently attuned to the distribution of
interest. It also shows that, fundamentally, there is no difference
between importance sampling and regular Monte Carlo, in that the
integral
can naturally be represented in many ways.
Although we do not wish to spend too much space on this issue, let us
note that the choice of the importance function gets paramount when
the support of the function of interest is not the whole
space. For instance, a tail probability, associated with
say, should be estimated with an
importance function whose support is
. (See
[37], Chap. 3, for details.)
![]() |
![]() |
The above example is one-dimensional (in the parameter) and the
problems exhibited there can be found severalfold in multidimensional
settings. Indeed, while Monte Carlo methods do not suffer from the
''curse of dimension''
in the sense that the error of the
corresponding estimators is always decreasing in ,
notwithstanding the dimension, it gets increasingly difficult to come
up with satisfactory importance sampling distributions as the
dimension gets higher and higher. As we will see in
Sect. 11.5, the intuition built on MCMC methods has to be
exploited to derive satisfactory importance
functions.
![]() |
Pr![]() ![]() |
Figure 11.6 illustrates this degeneracy of the importance
sampling
approach as the dimension increases. We simulate parameters 's
and datasets
for dimensions
ranging
from
to
, then represented the histograms of the largest
weight for
. The
's were simulated from
a
distribution, while the importance sampling
distribution was a
distribution.
As explained in Sects. 11.2.2 and 11.2.3, the first computational difficulty associated with Bayesian testing is the derivation of the Bayes factor, of the form
![]() |
Specific Monte Carlo methods for the estimation of ratios of
normalizing constants,
or, equivalently, of Bayes factors, have been developed in the past
five years. See Chen et al. (2000,
Chap. 5) for a complete exposition. In
particular, the importance sampling technique is rather well-adapted
to the computation of those
Bayes factors: Given a importance distribution, with density
proportional to , and a sample
simulated from
, the marginal density for model
,
, is approximated by
![]() |
![]() |
![]() |
However, the variance of
may be infinite, depending
on the choice of
. A possible choice is
,
with wider tails than
, but this is often inefficient
if the data is informative because the prior and the posterior
distributions will be quite different and most of the simulated values
fall outside the modal region of the likelihood. For
the choice
,
![]() |
![]() |
![]() |
![]() |
![]() |
![]() ![]() |
![]() |
As can be seen from the previous developments, such methods require a rather careful tuning to be of any use. Therefore, they are rather difficult to employ outside settings where pairs of models are opposed. In other words, they cannot be directly used in general model choice settings where the parameter space (and in particular the parameter dimension) varies across models, like, for instance, Example 7. To address the computational issues corresponding to these cases requires more advanced techniques introduced in the next section.