

3.5 MCMC Sampling with Latent Variables

In designing MCMC simulations, it is sometimes helpful to modify the target distribution by introducing latent or auxiliary variables into the sampling. This idea was called data augmentation by [58] in the context of missing data problems. Slice sampling, which we do not discuss in this chapter, is a particular way of introducing auxiliary variables into the sampling; see, for example, [20].

To fix notation, suppose that $ \boldsymbol{z}$ denotes a vector of latent variables and let the modified target distribution be $ \pi(\boldsymbol{\psi}, \boldsymbol{z})$. If the latent variables are tactically introduced, the conditional distribution of $ \boldsymbol{\psi}$ (or subcomponents of $ \boldsymbol{\psi}$) given $ \boldsymbol{z}$ may be easy to derive. Then, a multiple-block M-H simulation is conducted with the blocks $ \boldsymbol{\psi}$ and $ \boldsymbol{z}$, leading to the sample

$\displaystyle \left( \boldsymbol{\psi}^{(n_{0}+1)},\boldsymbol{z}^{(n_{0}+1)}\right), \ldots, \left( \boldsymbol{\psi}^{(n_{0}+M)},\boldsymbol{z}^{(n_{0}+M)}\right) \sim \pi (\boldsymbol{\psi},\boldsymbol{z})\;,$

where the draws on $ \boldsymbol{\psi}$, ignoring those on the latent data, are from  $ \pi (\boldsymbol{\psi})$, as required.
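As a minimal sketch of this two-block scheme, assume for simplicity that both full conditional distributions can be sampled directly, so that no M-H accept-reject step is needed (as turns out to be the case in the probit example below). The callables draw_psi and draw_z are hypothetical stand-ins for the problem-specific full conditional samplers:

def two_block_sampler(draw_psi, draw_z, psi0, z0, M, n0, rng):
    """Alternate draws from the full conditionals psi | z and z | psi,
    discarding the first n0 iterations as burn-in."""
    psi, z = psi0, z0
    psi_draws = []
    for s in range(n0 + M):
        psi = draw_psi(z, rng)   # block 1: draw psi given z
        z = draw_z(psi, rng)     # block 2: draw z given psi
        if s >= n0:
            psi_draws.append(psi)
    # ignoring the z draws, the retained psi draws are from pi(psi)
    return psi_draws

The probit example that follows instantiates these two updates concretely.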

To demonstrate this technique in action, we return to the probit regression example discussed in Sect. 3.3.2 to show how an MCMC sampler can be developed with the help of latent variables. The approach, introduced by [1], capitalizes on the simplifications afforded by introducing latent or auxiliary data into the sampling.

The model is rewritten as

$\displaystyle z_{i}\vert\boldsymbol{\beta} \sim \mathcal{N}(\boldsymbol{x}_{i}^{\prime}\boldsymbol{\beta},1)\;,$
$\displaystyle y_{i} = I[z_{i}>0]\;,\quad i\leq n\;,$
$\displaystyle \boldsymbol{\beta} \sim \mathcal{N}_{k}(\boldsymbol{\beta}_{0},\boldsymbol{B}_{0})\;.$ (3.26)

This specification is equivalent to the probit regression model since

$\displaystyle \Pr (y_{i}=1\vert\boldsymbol{x}_{i},\boldsymbol{\beta})=\Pr (z_{i}>0\vert\boldsymbol{x}_{i},\boldsymbol{\beta})=\Phi (\boldsymbol{x}_{i}^{\prime}\boldsymbol{\beta})\;.$
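This equivalence is easy to check numerically: since $ z_{i}\vert\boldsymbol{\beta}$ is $ \mathcal{N}(\boldsymbol{x}_{i}^{\prime}\boldsymbol{\beta},1)$, the truncation probability must match the normal cdf. A quick Monte Carlo verification (with a hypothetical value of $ \boldsymbol{x}_{i}^{\prime}\boldsymbol{\beta}$):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
xb = 0.7                               # hypothetical value of x_i' beta
z = rng.normal(loc=xb, scale=1.0, size=10**6)
print((z > 0).mean(), norm.cdf(xb))    # both approximately 0.758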

Now the Albert-Chib algorithm proceeds with the sampling of the full conditional distributions

$\displaystyle \boldsymbol{\beta}\vert\boldsymbol{y},\{z_{i}\}\;;\quad \{z_{i}\}\vert\boldsymbol{y},\boldsymbol{\beta} \;,$    

where both these distributions are tractable (i.e., they require no M-H steps). In particular, the distribution of $ \boldsymbol{\beta}$ conditioned on the latent data is independent of the observed data and has the same form as in the Gaussian linear regression model with response data $ \{z_{i}\}$: it is multivariate normal with mean $ \boldsymbol{\hat{\beta}}=\boldsymbol{B}(\boldsymbol{B}_{0}^{-1}\boldsymbol{\beta}_{0}+\sum_{i=1}^{n}\boldsymbol{x}_{i}z_{i})$ and variance matrix $ \boldsymbol{B}=(\boldsymbol{B}_{0}^{-1}+\sum_{i=1}^{n}\boldsymbol{x}_{i}\boldsymbol{x}_{i}^{\prime})^{-1}$. Next, the distribution of the latent data conditioned on the observed data and the parameters factors into a set of $ n$ independent distributions, each depending on the data only through $ y_{i}$:

$\displaystyle \{z_{i}\}\vert\boldsymbol{y},\boldsymbol{\beta}\overset{d}{=}\prod_{i=1}^{n}z_{i}\vert y_{i}, \boldsymbol{\beta} \;,$    

where the distribution $ z_{i}\vert y_{i},\boldsymbol{\beta}$ is the normal distribution $ z_{i}\vert\boldsymbol{\beta}$ truncated by the knowledge of $ y_{i}$: if $ y_{i}=0$, then $ z_{i}\leq 0$, and if $ y_{i}=1$, then $ z_{i}>0$. Thus, one samples $ z_{i}$ from $ \mathcal{TN}_{(-\infty,0)}(\boldsymbol{x}_{i}^{\prime}\boldsymbol{\beta},1)$ if $ y_{i}=0$ and from $ \mathcal{TN}_{(0,\infty )}(\boldsymbol{x}_{i}^{\prime}\boldsymbol{\beta},1)$ if $ y_{i}=1$, where $ \mathcal{TN}_{(a,b)}(\mu ,\sigma ^{2})$ denotes the $ \mathcal{N}(\mu ,\sigma ^{2})$ distribution truncated to the region $ (a,b)$.
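Putting the two full conditional draws together gives the complete two-block sampler. The following is a minimal sketch under stated assumptions, not the code behind the reported results: all names are hypothetical, X is the $ n\times k$ matrix with rows $ \boldsymbol{x}_{i}^{\prime}$, and scipy.stats.truncnorm expects truncation bounds in standardized units relative to the mean, so the cutoff $ 0$ becomes $ -\boldsymbol{x}_{i}^{\prime}\boldsymbol{\beta}$ on that scale.

import numpy as np
from scipy.stats import truncnorm

def draw_beta(X, z, beta0, B0inv, rng):
    """beta | z: multivariate normal with the moments beta_hat and B above."""
    B = np.linalg.inv(B0inv + X.T @ X)         # B = (B0^{-1} + sum_i x_i x_i')^{-1}
    beta_hat = B @ (B0inv @ beta0 + X.T @ z)   # B (B0^{-1} beta0 + sum_i x_i z_i)
    return rng.multivariate_normal(beta_hat, B)

def albert_chib(X, y, beta0, B0, M=5000, n0=100, seed=0):
    """Two-block Gibbs sampler for the probit model via data augmentation."""
    rng = np.random.default_rng(seed)
    B0inv = np.linalg.inv(B0)
    beta = np.asarray(beta0, dtype=float)
    draws = np.empty((M, beta.size))
    for s in range(n0 + M):
        mu = X @ beta
        # z_i | y_i, beta: TN_(0,inf)(x_i'beta, 1) if y_i = 1,
        #                  TN_(-inf,0)(x_i'beta, 1) if y_i = 0
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = truncnorm.rvs(lo, hi, loc=mu, scale=1.0, random_state=rng)
        beta = draw_beta(X, z, beta0, B0inv, rng)
        if s >= n0:
            draws[s - n0] = beta               # keep only post burn-in draws
    return draws

Because both blocks are drawn exactly from their full conditionals, every iteration is accepted; this is precisely the simplification that the latent data buy.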

The results, based on $ 5000$ MCMC draws beyond a burn-in of $ 100$ iterations, are reported in Fig. 3.6. The results are close to those presented above, especially to those from the tailored M-H chain.

Figure 3.6: Caesarean data with Albert-Chib algorithm: Marginal posterior densities (top panel) and autocorrelation plot (bottom panel)
\includegraphics[width=9cm]{text/2-3/fig3press.eps}

