In practice, when implementing MCMC methods it is important to construct samplers that mix well, where mixing is measured by the autocorrelation time, because such samplers can be expected to converge more quickly to the invariant distribution. Over the years a number of recipes for designing samplers with low autocorrelation times have been proposed, although the complexity of a given problem can make any of these recipes difficult to apply.
As a general rule, sets of parameters that are highly correlated should be treated as one block when applying the multiple-block M-H algorithm. Otherwise, it would be difficult to develop proposal densities that lead to large moves through the support of the target distribution.
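The effect of blocking on mixing can be illustrated with a small sketch. It assumes a hypothetical bivariate normal target with unit variances and correlation `rho`, and it uses Gibbs updates (a special case of the multiple-block M-H algorithm) rather than any particular sampler from the literature; all function names are illustrative. Updating the two components one at a time gives a chain whose lag-1 autocorrelation is roughly `rho**2`, whereas drawing both components as a single block gives essentially uncorrelated draws.

```python
import math
import random


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[t] - mean) * (xs[t + 1] - mean) for t in range(n - 1))
    return cov / var


def single_site_gibbs(n, rho, seed=0):
    """Two-block Gibbs: update x1 | x2, then x2 | x1 (assumed toy target)."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    x1 = x2 = 0.0
    out = []
    for _ in range(n):
        x1 = rng.gauss(rho * x2, sd)
        x2 = rng.gauss(rho * x1, sd)
        out.append(x1)
    return out


def one_block(n, rho, seed=0):
    """Single block: draw (x1, x2) jointly, so successive draws are independent."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    out = []
    for _ in range(n):
        x2 = rng.gauss(0.0, 1.0)       # marginal draw of x2
        x1 = rng.gauss(rho * x2, sd)   # then x1 given x2
        out.append(x1)
    return out
```

With `rho = 0.95`, for instance, `lag1_autocorr(single_site_gibbs(20000, 0.95))` should come out close to 0.9, while the same statistic for `one_block(20000, 0.95)` should be close to zero.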
Blocks can be combined by the method of composition. For example, suppose that $\psi_1$, $\psi_2$ and $\psi_3$ denote three blocks and that the distribution $\psi_1 | \psi_3$ is tractable (i.e., can be sampled directly). Then, the blocks $(\psi_1, \psi_2)$ can be collapsed by first sampling $\psi_1$ from $\psi_1 | \psi_3$ followed by $\psi_2$ from $\psi_2 | \psi_1, \psi_3$. This amounts to a two-block MCMC algorithm. In addition, if it is possible to sample $(\psi_1, \psi_2)$ marginalized over $\psi_3$ then the number of blocks is reduced to one. [35] discuss the value of these strategies in the context of three-block Gibbs MCMC chains. [52] provide further discussion of the role of blocking in the context of Gibbs Markov chains used to sample multivariate normal target distributions.
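The collapsing step rests on a standard factorization of a joint conditional density. As a sketch, writing the three blocks as $\psi_1$, $\psi_2$, $\psi_3$ and the target density as $\pi$ (labels assumed here for illustration), the first two blocks can be drawn jointly given the third because

```latex
\pi(\psi_1, \psi_2 \mid \psi_3)
  = \pi(\psi_1 \mid \psi_3)\,\pi(\psi_2 \mid \psi_1, \psi_3)
```

so one sweep of the collapsed sampler consists of a draw of $(\psi_1, \psi_2)$ by composition from the two factors on the right, followed by a draw of $\psi_3$ given $(\psi_1, \psi_2)$.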
As mentioned above, the proposal density in an M-H algorithm has an important bearing on the mixing of the MCMC chain. Fortunately, one has great flexibility in the choice of candidate generating density and it is possible to adapt the choice to the given problem. For example, [16] develop and compare four different choices in longitudinal random effects models for count data. In this problem, each cluster (or individual) has its own random effects and each of these has to be sampled from an intractable target distribution. If one lets $n$ denote the number of clusters, where $n$ is typically large, say in excess of a thousand, then the number of blocks in the MCMC implementation is $n + 3$ ($n$ for each of the random effect distributions, two for the fixed effects and one for the variance components matrix). For this problem, the multiple-block M-H algorithm requires $n + 2$ M-H steps within one iteration of the algorithm. Tailored proposal densities are therefore computationally expensive, but one can use a mixture of proposal densities in which a less demanding proposal, for example a random walk proposal, is combined with the tailored proposal to sample each of the $n$ random effect target distributions. Further discussion of mixture proposal densities is contained in [59].
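One way to combine a cheap and a tailored proposal is sketched below for a single random effect. Everything specific here is an assumption made for illustration: the target is a hypothetical Poisson count `y` with log-mean `b` and a normal prior on `b`, the tailored proposal is a normal approximation at the mode found by Newton steps, and the two proposals are mixed at the kernel level (each move uses its own M-H acceptance ratio, and the move type is chosen at random with a fixed probability `p_rw`). It is not the scheme of [16].

```python
import math
import random


def log_target(b, y=3, s2=1.0):
    """Unnormalised log density of one random effect (hypothetical model)."""
    return y * b - math.exp(b) - b * b / (2.0 * s2)


def tailor(y=3, s2=1.0, iters=25):
    """Newton search for the mode; returns (mode, inverse negative curvature)."""
    b = 0.0
    for _ in range(iters):
        grad = y - math.exp(b) - b / s2
        hess = -math.exp(b) - 1.0 / s2
        b -= grad / hess
    return b, 1.0 / (math.exp(b) + 1.0 / s2)


def log_normal(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def mixture_mh(n_draws, p_rw=0.7, rw_sd=0.5, seed=0):
    rng = random.Random(seed)
    b = 0.0
    draws = []
    for _ in range(n_draws):
        if rng.random() < p_rw:
            # Cheap move: symmetric random walk, so the proposal terms cancel.
            cand = b + rng.gauss(0.0, rw_sd)
            log_alpha = log_target(cand) - log_target(b)
        else:
            # Expensive move: in a full sampler the conditioning values change
            # every sweep, so the tailoring is redone each time it is used.
            mode, var = tailor()
            cand = rng.gauss(mode, math.sqrt(var))
            log_alpha = (log_target(cand) - log_normal(cand, mode, var)) - (
                log_target(b) - log_normal(b, mode, var)
            )
        # 1 - random() lies in (0, 1], which keeps the logarithm finite.
        if math.log(1.0 - rng.random()) < log_alpha:
            b = cand
        draws.append(b)
    return draws
```

Raising `p_rw` lowers the cost per iteration, since the mode search is skipped more often, at the price of smaller typical moves; the right balance is problem specific.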
Other approaches have also been discussed in the literature. [37] develop the simulated tempering method whereas [30] develop a related technique that they call the Metropolis-coupled MCMC method. Both these approaches rely on a series of transition kernels $\{K_1, \ldots, K_m\}$ where only $K_1$ has $\pi^*$ as the stationary distribution. The other kernels have equilibrium distributions $\pi_i$, which [30] take to be $\pi_i(\psi) = \pi^*(\psi)^{1/i}$, $i = 2, \ldots, m$. This specification produces a set of target distributions that have higher variance than $\pi^*$. Once the transition kernels and equilibrium distributions are specified then the Metropolis-coupled MCMC method requires that each of the $m$ kernels be used in parallel. At each iteration, after the $m$ draws have been obtained, one randomly selects two chains to see if the states should be swapped. The probability of swap is based on the M-H acceptance condition. At the conclusion of the sampling, inference is based on the sequence of draws that correspond to the distribution $\pi^*$. These methods promote rapid mixing because draws from the various "flatter" target densities have a chance of being swapped with the draws from the base kernel $K_1$. Thus, variates that are unlikely under the transition $K_1$ have a chance of being included in the chain, leading to more rapid exploration of the parameter space.
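The swapping scheme can be sketched in a few lines. The example below makes several assumptions that are not taken from the text: a hypothetical bimodal target (an equal mixture of two unit-variance normals centred at `-mu` and `mu`), tempered targets equal to the target raised to the power `1/i` for chain `i`, and a random-walk M-H kernel for every chain with a step size that grows with `i`. Only the draws of the first (untempered) chain are kept.

```python
import math
import random


def log_target(x, mu=4.0):
    """Unnormalised log density of the assumed bimodal target."""
    a = -0.5 * (x + mu) ** 2
    b = -0.5 * (x - mu) ** 2
    top = max(a, b)  # log-sum-exp for numerical stability
    return top + math.log(math.exp(a - top) + math.exp(b - top))


def metropolis_coupled(n_iter, m=4, rw_sd=1.0, seed=0):
    rng = random.Random(seed)
    powers = [1.0 / i for i in range(1, m + 1)]  # chain i targets target**(1/i)
    x = [-4.0] * m                               # every chain starts in the left mode
    cold = []
    for _ in range(n_iter):
        # One random-walk M-H update per chain on its own tempered target.
        for i in range(m):
            cand = x[i] + rng.gauss(0.0, rw_sd / math.sqrt(powers[i]))
            log_alpha = powers[i] * (log_target(cand) - log_target(x[i]))
            if math.log(1.0 - rng.random()) < log_alpha:
                x[i] = cand
        # Pick two distinct chains and swap their states with the M-H probability.
        j, k = rng.sample(range(m), 2)
        log_swap = (powers[j] - powers[k]) * (log_target(x[k]) - log_target(x[j]))
        if math.log(1.0 - rng.random()) < log_swap:
            x[j], x[k] = x[k], x[j]
        cold.append(x[0])  # inference uses the untempered chain only
    return cold
```

The swap ratio follows from comparing the product of the two tempered densities before and after the exchange, which reduces to the single expression in `log_swap`. Because the flatter chains cross between the modes easily, accepted swaps carry the first chain across a region it would rarely traverse with random-walk moves alone.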