
3.8 Strategies for Improving Mixing

In practice, when implementing MCMC methods it is important to construct samplers that mix well, where mixing is measured by the autocorrelation time, because such samplers can be expected to converge more quickly to the invariant distribution. Over the years a number of recipes for designing samplers with low autocorrelation times have been proposed, although the complexity of a given problem can sometimes make any of these recipes difficult to apply.

3.8.1 Choice of Blocking

As a general rule, sets of parameters that are highly correlated should be treated as one block when applying the multiple-block M-H algorithm. Otherwise, it would be difficult to develop proposal densities that lead to large moves through the support of the target distribution.

Blocks can be combined by the method of composition. For example, suppose that $\boldsymbol{\psi}_1$, $\boldsymbol{\psi}_2$ and $\boldsymbol{\psi}_3$ denote three blocks and that the distribution $\boldsymbol{\psi}_1\vert\boldsymbol{\psi}_3$ is tractable (i.e., can be sampled directly). Then, the blocks $(\boldsymbol{\psi}_1,\boldsymbol{\psi}_2)$ can be collapsed by first sampling $\boldsymbol{\psi}_1$ from $\boldsymbol{\psi}_1\vert\boldsymbol{\psi}_3$, followed by $\boldsymbol{\psi}_2$ from $\boldsymbol{\psi}_2\vert\boldsymbol{\psi}_1,\boldsymbol{\psi}_3$. This amounts to a two-block MCMC algorithm, as sketched below. In addition, if it is possible to sample $(\boldsymbol{\psi}_1,\boldsymbol{\psi}_2)$ marginalized over $\boldsymbol{\psi}_3$, then the number of blocks is reduced to one. [35] discuss the value of these strategies in the context of a three-block Gibbs MCMC chain. [52] provide further discussion of the role of blocking in the context of Gibbs Markov chains used to sample multivariate normal target distributions.
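The following is a minimal sketch of this composition step for a hypothetical three-block model. The Gaussian conditional samplers are illustrative stand-ins, not derived from any particular target; in a real application they would come from the joint density of $(\boldsymbol{\psi}_1,\boldsymbol{\psi}_2,\boldsymbol{\psi}_3)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conditional samplers for a toy problem; in practice these
# are derived from the joint density pi(psi1, psi2, psi3).
def draw_psi1_given_psi3(psi3):
    # psi1 | psi3, with psi2 integrated out (assumed tractable)
    return rng.normal(loc=0.5 * psi3, scale=1.0)

def draw_psi2_given_psi1_psi3(psi1, psi3):
    return rng.normal(loc=0.5 * (psi1 + psi3), scale=1.0)

def draw_psi3_given_psi1_psi2(psi1, psi2):
    return rng.normal(loc=0.5 * (psi1 + psi2), scale=1.0)

def two_block_sweep(psi1, psi2, psi3):
    """One iteration of the collapsed two-block sampler.

    Block 1: (psi1, psi2) drawn jointly by composition,
             psi1 ~ psi1 | psi3, then psi2 ~ psi2 | psi1, psi3.
    Block 2: psi3 ~ psi3 | psi1, psi2.
    """
    psi1 = draw_psi1_given_psi3(psi3)
    psi2 = draw_psi2_given_psi1_psi3(psi1, psi3)
    psi3 = draw_psi3_given_psi1_psi2(psi1, psi2)
    return psi1, psi2, psi3

psi1, psi2, psi3 = 0.0, 0.0, 0.0
for _ in range(1000):
    psi1, psi2, psi3 = two_block_sweep(psi1, psi2, psi3)
```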

3.8.2 Tuning the Proposal Density

As mentioned above, the proposal density in an M-H algorithm has an important bearing on the mixing of the MCMC chain. Fortunately, one has great flexibility in the choice of candidate-generating density, and it is possible to adapt the choice to the given problem. For example, [16] develop and compare four different choices in longitudinal random effects models for count data. In this problem, each cluster (or individual) has its own random effects, and each of these has to be sampled from an intractable target distribution. If one lets $n$ denote the number of clusters, where $n$ is typically large, say in excess of a thousand, then the number of blocks in the MCMC implementation is $n+3$ (one block for each of the $n$ random effect distributions, two for the fixed effects and one for the variance components matrix). For this problem, the multiple-block M-H algorithm requires $n+1$ M-H steps within one iteration of the algorithm. Tailored proposal densities are therefore computationally expensive, but one can use a mixture of proposal densities in which a less demanding proposal, for example a random walk proposal, is combined with the tailored proposal to sample each of the $n$ random effect target distributions. Further discussion of mixture proposal densities is contained in [59].
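To illustrate the idea, here is a hedged sketch of one such update that mixes a cheap random-walk kernel with a tailored independence kernel. The toy log target and the fitted parameters tail_loc and tail_scale are assumptions made purely for illustration. The sketch randomly selects one of the two kernels at each step; since each kernel leaves the target invariant, so does the mixture.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Toy intractable target (log density); stands in for one of the n
# random-effect target distributions discussed above.
def log_target(x):
    return -0.5 * x**2 - 0.1 * x**4

# Tailored independence proposal: a normal assumed to be fitted to the
# target's mode and curvature (hypothetical values).
tail_loc, tail_scale = 0.0, 0.9

def mh_step(x, p_rw=0.5, rw_scale=0.5):
    """One M-H step: with probability p_rw use a random-walk proposal,
    otherwise use the tailored independence proposal."""
    if rng.random() < p_rw:
        y = x + rng.normal(scale=rw_scale)       # symmetric proposal
        log_alpha = log_target(y) - log_target(x)
    else:
        y = rng.normal(tail_loc, tail_scale)     # independence proposal
        log_alpha = (log_target(y) - log_target(x)
                     + norm.logpdf(x, tail_loc, tail_scale)
                     - norm.logpdf(y, tail_loc, tail_scale))
    return y if np.log(rng.random()) < log_alpha else x

x, draws = 0.0, []
for _ in range(5000):
    x = mh_step(x)
    draws.append(x)
```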

3.8.3 Other Strategies

Other approaches have also been discussed in the literature. [37] develop the simulated tempering method, whereas [30] develop a related technique that they call the Metropolis-coupled MCMC method. Both of these approaches rely on a series of transition kernels $\{K_1,\ldots,K_m\}$ where only $K_1$ has $\pi^{\ast}$ as the stationary distribution. The other kernels have equilibrium distributions $\pi_i$, which [30] take to be $\pi_i(\boldsymbol{\psi})=\pi(\boldsymbol{\psi})^{1/i}$, $i=2,\ldots,m$. This specification produces a set of target distributions that have higher variance than $\pi^{\ast}$. Once the transition kernels and equilibrium distributions are specified, the Metropolis-coupled MCMC method requires that each of the $m$ kernels be used in parallel. At each iteration, after the $m$ draws have been obtained, one randomly selects two chains to see if the states should be swapped. The probability of a swap is based on the M-H acceptance condition. At the conclusion of the sampling, inference is based on the sequence of draws that correspond to the distribution $\pi^{\ast}$. These methods promote rapid mixing because draws from the various "flatter" target densities have a chance of being swapped with the draws from the base kernel $K_1$. Thus, variates that are unlikely under the transition $K_1$ have a chance of being included in the chain, leading to more rapid exploration of the parameter space.
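A minimal sketch of Metropolis-coupled MCMC on a toy bimodal target follows, assuming random-walk moves within each chain and the tempered targets $\pi_i(\boldsymbol{\psi})=\pi(\boldsymbol{\psi})^{1/i}$ described above; the swap-acceptance rule is the M-H condition applied to the exchange of states, while the target and tuning values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_pi(x):
    # Toy bimodal target pi*; the base chain (i = 1) samples this.
    return np.log(0.5 * np.exp(-0.5 * (x - 3)**2) +
                  0.5 * np.exp(-0.5 * (x + 3)**2))

m = 4                                  # number of coupled chains
temps = 1.0 / np.arange(1, m + 1)      # chain i targets pi(x)^(1/i)
x = np.zeros(m)
kept = []                              # draws from the base chain only

for it in range(10000):
    # Advance each chain with a random-walk M-H step on its own target.
    for i in range(m):
        y = x[i] + rng.normal(scale=1.0)
        if np.log(rng.random()) < temps[i] * (log_pi(y) - log_pi(x[i])):
            x[i] = y
    # Propose swapping the states of two randomly chosen chains; the
    # acceptance ratio is pi_i(x_j) pi_j(x_i) / (pi_i(x_i) pi_j(x_j)).
    i, j = rng.choice(m, size=2, replace=False)
    log_alpha = (temps[i] - temps[j]) * (log_pi(x[j]) - log_pi(x[i]))
    if np.log(rng.random()) < log_alpha:
        x[i], x[j] = x[j], x[i]
    kept.append(x[0])                  # inference uses chain 1 (target pi*)
```

The flatter chains ($i>1$) cross between the two modes easily, and accepted swaps pass those crossings down to the base chain, which is how the method speeds exploration of the parameter space.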

