Next: 3.5 MCMC Sampling with Up: 3. Markov Chain Monte Previous: 3.3 Metropolis-Hastings Algorithm

3.4 The Gibbs Sampling Algorithm

The Gibbs sampling algorithm is one of the simplest Markov chain Monte Carlo algorithms. It was introduced by [26] in the context of image processing and then discussed in the context of missing data problems by [58]. The paper by [23] helped to demonstrate the value of the Gibbs algorithm for a range of problems in Bayesian analysis.

3.4.1 The Algorithm

To define the Gibbs sampling algorithm, let the set of full conditional distributions be

$\displaystyle \left\{\pi (\boldsymbol{\psi}_1\vert\boldsymbol{\psi}_2,\ldots,\b... ...bol{\psi}_p\vert\boldsymbol{\psi}_1,\ldots,\boldsymbol{\psi} _{d-1})\right\}\;.$

Now one cycle of the Gibbs sampling algorithm is completed by simulating $\{\boldsymbol{\psi }_k\}_{k=1}^p$ from these distributions, recursively refreshing the conditioning variables. When

one obtains the two block Gibbs sampler that appears in [58]. The Gibbs sampler in which each block is revised in fixed order is defined as follows.

Algorithm 3 (Gibbs Sampling)

Specify an initial value $\boldsymbol{\psi}^{(0)}=\left(\boldsymbol{\psi} _1^{(0)},\ldots,\boldsymbol{\psi}_p^{(0)}\right)$ :
Repeat for $j=1,2,\ldots,M$ .

Generate $\boldsymbol{\psi}_1^{(j+1)}$ from $\pi\left(\boldsymbol{\psi}_1\vert\boldsymbol{\psi}_2^{(j)},\boldsymbol{\psi}_3^{(j)}, \ldots,\boldsymbol{\psi}_p^{(j)}\right)$

Generate $\boldsymbol{\psi}_2^{(j+1)}$ from $\pi \left(\boldsymbol{\psi}_2\vert \boldsymbol{\psi}_1^{(j+1)},\boldsymbol{\psi}_3^{(j)},\ldots ,\boldsymbol{\psi} _p^{(j)}\right)$

$\vdots$

Generate $\boldsymbol{\psi}_p^{(j+1)}$ from $\pi \left(\boldsymbol{\psi}_p\vert \boldsymbol{\psi}_1^{(j+1)},\ldots ,\boldsymbol{\psi}_{p-1}^{(j+1)}\right)$ .
Return the values $\left\{\boldsymbol{\psi}^{(1)},\boldsymbol{\psi}^{(2)}, \ldots, \boldsymbol{\psi}^{(M)}\right\}$ .

**Figure 3.4:** Gibbs sampling algorithm in two dimensions starting from an initial point and then completing three iterations
$\includegraphics[width=7.3cm]{text/2-3/fig4.eps}$

It follows that the transition density of moving from $\boldsymbol{\psi}_k^{(j)}$ to $\boldsymbol{\psi}_k^{(j+1)}$ is given by

$\displaystyle \pi \left(\boldsymbol{\psi}_k\vert\boldsymbol{\psi}_1^{(j+1)},\ld... ...}^{(j+1)},\boldsymbol{\psi}_{k+1}^{(j)},\ldots,\boldsymbol{\psi}_p^{(j)}\right)$

since when the

th block is reached, the previous

blocks have been updated. Thus, the transition density of the chain, under the maintained assumption that $\pi$ is absolutely continuous, is given by the product of transition kernels for each block:

$\displaystyle K(\boldsymbol{\psi},\boldsymbol{\psi}^{\prime})=\prod_{k=1}^p\pi ... ...j+1)},\boldsymbol{ \psi}_{k+1}^{(j)},\ldots,\boldsymbol{\psi}_p^{(j)}\right)\;.$

(3.23)

To illustrate the manner in which the blocks are revised, we consider a two block case, each with a single component, and trace out in Fig. 3.4 a possible trajectory of the sampling algorithm. The contours in the plot represent the joint distribution of $\boldsymbol{\psi}$ and the labels '''', '''' etc., denote the simulated values. Note that one iteration of the algorithm is completed after both components are revised. Also notice that each component is revised along the direction of the coordinate axes. This feature can be a source of problems if the two components are highly correlated because then the contours get compressed and movements along the coordinate axes tend to produce small moves. We return to this issue below.

3.4.2 Invariance of the Gibbs Markov Chain

The Gibbs transition kernel is invariant by construction. This is a consequence of the fact that the Gibbs algorithm is a special case of the multiple-block M-H algorithm which is invariant, as was established in the last section. Invariance can also be confirmed directly. Consider for simplicity a two block sampler with transition kernel density

$\displaystyle K(\boldsymbol{\psi},\boldsymbol{\psi}^{\prime})=\pi (\boldsymbol{... ...{\psi}_2)\pi (\boldsymbol{\psi}_2^{\prime}\vert\boldsymbol{\psi}_1^{\prime})\;.$

To check invariance we have to show that

	$\displaystyle \int K(\boldsymbol{\psi},\boldsymbol{\psi}^{\prime})\pi (\boldsymbol{\psi}_1, \boldsymbol{\psi}_2)d\boldsymbol{\psi}_1d\boldsymbol{\psi}_2$
	$\displaystyle =\int \pi (\boldsymbol{\psi}_1^{\prime}\vert\boldsymbol{\psi}_2)\... ...ldsymbol{\psi}_1,\boldsymbol{\psi} _2)d\boldsymbol{\psi }_1d\boldsymbol{\psi}_2$

is equal to $\pi (\boldsymbol{\psi}_1^{\prime},\boldsymbol{\psi}_2^{\prime})$ . This holds because $\pi (\boldsymbol{\psi}_2^{\prime}\vert\boldsymbol{ \psi}_1^{\prime})$ comes out of the integral, and the integral over $\boldsymbol{\psi}_1$ and $\boldsymbol{\psi}_2$ produces $\pi (\boldsymbol{\psi} _1^{\prime})$ . This calculation can be extended to any number of blocks. It may be noted that the Gibbs Markov chain is not reversible. Reversible Gibbs samplers are discussed by [36].

3.4.3 Sufficient Conditions for Convergence

Under rather general conditions, the Markov chain generated by the Gibbs sampling algorithm converges to the target density as the number of iterations become large. Formally, if we let $K(\boldsymbol{\psi}, \boldsymbol{\psi}^{\prime})$ represent the transition density of the Gibbs algorithm and let $K^{(M)}(\boldsymbol{\psi}_0,\boldsymbol{\psi}^{\prime})$ be the density of the draw $\boldsymbol{\psi}^{\prime}$ after iterations given the starting value $\boldsymbol{\psi}_0$ , then

$\displaystyle \left\Vert K^{(M)}\left(\boldsymbol{\psi}^{(0)},\boldsymbol{\psi}^{\prime}\right)-\pi ( \boldsymbol{\psi}^{\prime})\right\Vert \rightarrow 0\;,$ as $\displaystyle \quad M\rightarrow \infty \;.$

(3.24)

[53] (see also [7]) have shown that the conditions of Proposition 2 are satisfied under the following conditions: (1) $\pi (\boldsymbol{\psi})>0$ implies there exists an open neighborhood $N_{\boldsymbol{\psi}}$ containing $\boldsymbol{\psi}$ and $\epsilon > 0$ such that, for all $\boldsymbol{\psi}^{\prime}\in N_{\boldsymbol{\psi}}$ , $\pi (\boldsymbol{\psi }^{\prime})\geq \epsilon >0$ ; (2) $\int f(\boldsymbol{\psi})\,d\boldsymbol{\psi} _k$ is locally bounded for all

, where $\boldsymbol{\psi}_k$ is the

th block of parameters; and (3) the support of $\boldsymbol{\psi}$ is arc connected.

These conditions are satisfied in a wide range of problems.

3.4.4 Example: Simulating a Truncated Multivariate Normal

Consider the question of sampling a trivariate normal distribution truncated to the positive orthant. In particular, suppose that the target distribution is

$\displaystyle \pi (\boldsymbol{\psi})$	$\displaystyle =\frac 1{\Pr (\boldsymbol{\psi}\in A)}f_N(\boldsymbol{\mu}, \boldsymbol{\Sigma})I(\boldsymbol{\psi}\in A)$
	$\displaystyle \propto f_N(\boldsymbol{\mu},\boldsymbol{\Sigma})I(\boldsymbol{\psi}\in A)$

where $\boldsymbol{\mu}=({0.5},1,{1.5})^{\prime}$ , $\boldsymbol{\Sigma}$ is in equi-correlated form with units on the diagonal and

on the off-diagonal, $A=(0,\infty )\times (0,\infty )\times (0,\infty )$ and $\Pr (\boldsymbol{\psi} \in A)$ is the normalizing constant which is difficult to compute. In this case, the Gibbs sampler is defined with the blocks $\psi _1,\psi _2,\psi _3$ and the full conditional distributions

$\displaystyle \pi (\psi _1\vert\psi _2,\psi _3)\;;\quad\pi (\psi _2\vert\psi _1,\psi _3)\;; \quad\pi(\psi _3\vert\psi _1,\psi _2)\;,$

where each of the these full conditional distributions is univariate truncated normal restricted to the interval $(0,\infty )$ :

$\displaystyle \pi (\psi _k\vert\boldsymbol{\psi}_{-k})\propto f_N\left(\psi _k\... ...ldsymbol{\Sigma}_{-k}^{-1}\boldsymbol{C} _k\right)I(\psi _k\in (0,\infty )) \;,$

(3.25)

$\boldsymbol{C}_k=Cov(\psi _k,\boldsymbol{\psi} _{-k})$ , $\boldsymbol{\Sigma}_{-k}=Var(\boldsymbol{\psi }_{-k})$ and $\boldsymbol{\mu} _{-k}=E(\boldsymbol{\psi}_{-k})$ . Figure 3.5 gives the marginal distribution of each component of $\psi _{k}$ from a Gibbs sampling run of

iterations with a burn-in of

cycles. The figures includes both the histograms of the sampled values and the Rao-Blackwellized estimates of the marginal densities (see Sect. 3.6 below) based on the averaging of (3.25) over the simulated values of $\boldsymbol{ \psi}_{-k}$ . The agreement between the two density estimates is close. In the bottom panel of Fig. 3.5 we plot the autocorrelation function of the sampled draws. The rapid decline in the autocorrelations for higher lags indicates that the sampler is mixing well.

**Figure 3.5:** Marginal distributions of $\psi$ in truncated multivariate normal example (*top panel*). Histograms of the sampled values and Rao-Blackwellized estimates of the densities are shown. Autocorrelation plots of the Gibbs MCMC chain are in the *bottom panel*. Graphs are based on iterations following a burn-in of cycles
$\includegraphics[width=11cm]{text/2-3/fig5.eps}$

Next: 3.5 MCMC Sampling with Up: 3. Markov Chain Monte Previous: 3.3 Metropolis-Hastings Algorithm