This powerful algorithm provides a general approach for producing a correlated sequence of draws from a target density that may be difficult to sample by a classical independence method. The goal is to simulate from the $d$-dimensional distribution of $\psi \in \Psi \subseteq \mathbb{R}^d$ that has density $\pi(\psi)$ with respect to some dominating measure. To define the algorithm, let $q(\psi, \psi')$ denote a source density for a candidate draw $\psi'$ given the current value $\psi$ in the sampled sequence. The density $q(\psi, \psi')$ is referred to as the proposal or candidate-generating density. Then, the M-H algorithm is defined by two steps: a first step in which a proposal value $\psi'$ is drawn from the candidate-generating density and a second step in which the proposal value is accepted as the next iterate in the Markov chain according to the probability $\alpha(\psi, \psi')$, where
$$
\alpha(\psi, \psi') = \min\left\{1, \frac{\pi(\psi')\, q(\psi', \psi)}{\pi(\psi)\, q(\psi, \psi')}\right\}.
$$
If the proposal value is rejected, the next sampled value is taken to be the current value.
Typically, a certain number of values (say $n_0$) at the start of this sequence are discarded, after which the chain is assumed to have converged to its invariant distribution and the subsequent draws are taken as approximate variates from $\pi$. Because theoretical calculation of the burn-in is not easy, it is important that the proposal density is chosen to ensure that the chain makes large moves through the support of the invariant distribution without staying at one place for many iterations. Generally, the empirical behavior of the M-H output is monitored by the autocorrelation time of each component of $\psi$ and by the acceptance rate, which is the proportion of times a move is made as the sampling proceeds.
One should observe that the target density appears as a ratio in the probability $\alpha(\psi, \psi')$ and therefore the algorithm can be implemented without knowledge of the normalizing constant of $\pi$. Furthermore, if the candidate-generating density is symmetric, i.e. $q(\psi, \psi') = q(\psi', \psi)$, the acceptance probability only contains the ratio $\pi(\psi')/\pi(\psi)$; hence, if $\pi(\psi') \ge \pi(\psi)$, the chain moves to $\psi'$, otherwise it moves with probability given by $\pi(\psi')/\pi(\psi)$. The latter is the algorithm originally proposed by Metropolis et al. (1953). This version of the algorithm is illustrated in Fig. 3.1.
Different proposal densities give rise to specific versions of the M-H algorithm, each with the correct invariant distribution $\pi$. One family of candidate-generating densities is given by $q(\psi, \psi') = q_1(\psi' - \psi)$. The candidate is thus drawn according to the process $\psi' = \psi + \epsilon$, where $\epsilon$ follows the distribution $q_1$. Since the candidate is equal to the current value plus noise, this case is called a random walk M-H chain. Possible choices for $q_1$ include the multivariate normal density and the multivariate-$t$. The random walk M-H chain is perhaps the simplest version of the M-H algorithm (and was the one used by Metropolis et al. (1953)) and is popular in applications. One has to be careful, however, in setting the variance of $\epsilon$; if it is too large it is possible that the chain may remain stuck at a particular value for many iterations, while if it is too small the chain will tend to make small moves and move inefficiently through the support of the target distribution. In both cases the generated draws will be highly serially correlated. Note that when $q_1$ is symmetric, $q_1(\epsilon) = q_1(-\epsilon)$, and the probability of move only contains the ratio $\pi(\psi')/\pi(\psi)$. As mentioned earlier, the same reduction occurs if $q(\psi, \psi') = q(\psi', \psi)$.
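As a concrete illustration (not from the original text), the random walk version can be sketched in a few lines of Python. The correlated bivariate normal target, the proposal scale of 0.8, and the function names are all illustrative choices; the code also tracks the acceptance rate mentioned earlier as a monitoring device.

```python
import numpy as np

def rw_metropolis(log_target, psi0, scale, n_iter, rng):
    """Random-walk M-H: candidate = current value + scale * N(0, I) noise."""
    psi = np.asarray(psi0, dtype=float)
    logp = log_target(psi)
    draws = np.empty((n_iter, psi.size))
    accepted = 0
    for t in range(n_iter):
        cand = psi + scale * rng.standard_normal(psi.size)
        logp_cand = log_target(cand)
        # Symmetric proposal, so alpha = min{1, pi(cand)/pi(psi)}
        if np.log(rng.uniform()) < logp_cand - logp:
            psi, logp = cand, logp_cand
            accepted += 1
        draws[t] = psi
    return draws, accepted / n_iter

# Illustrative target: bivariate normal with correlation 0.5,
# known only up to its normalizing constant
rho = 0.5
def log_target(psi):
    x, y = psi
    return -(x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho))

rng = np.random.default_rng(0)
draws, acc_rate = rw_metropolis(log_target, [0.0, 0.0], 0.8, 20000, rng)
```

After discarding a burn-in, the sample moments of `draws` approximate those of the target, and `acc_rate` can be used to tune `scale`.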
Hastings (1970) considers a second family of candidate-generating densities that are given by the form $q(\psi, \psi') = q(\psi')$. Tierney (1994) refers to this as an independence M-H chain because, in contrast to the random walk chain, the candidates are drawn independently of the current location $\psi$. In this case, the probability of move becomes
$$
\alpha(\psi, \psi') = \min\left\{1, \frac{w(\psi')}{w(\psi)}\right\}, \quad \text{where } w(\psi) = \frac{\pi(\psi)}{q(\psi)}
$$
is the ratio of the target density to the proposal density.
Chib and Greenberg (1994, 1995) discuss a way of formulating proposal densities in the context of time series autoregressive-moving average models that has a bearing on the choice of proposal density for the independence M-H chain. They suggest matching the proposal density to the target at the mode by a multivariate normal or multivariate-$t$ distribution with location given by the mode of the target and the dispersion given by the inverse of the negative Hessian evaluated at the mode. Specifically, the parameters of the proposal density are taken to be
$$
m = \arg\max_{\psi} \log \pi(\psi) \quad \text{and} \quad V = \tau \left\{ -\frac{\partial^2 \log \pi(\psi)}{\partial \psi \, \partial \psi'} \right\}^{-1}_{\psi = m},
$$
where $\tau$ is a tuning parameter that controls the spread of the proposal.
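A minimal sketch of this tailoring idea in Python follows. The Gamma-kernel target, the finite-difference Hessian, the choice $\tau = 1$, and the use of 15 degrees of freedom are all assumptions made for the example, not prescriptions from the text; the resulting $t$ proposal then drives an independence M-H chain with the weight-ratio acceptance probability given above.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import t as student_t

# Illustrative unnormalized target: Gamma(shape=3, rate=2) kernel, x > 0;
# its mode is 1.0 and the inverse negative Hessian at the mode is 0.5
def log_target(x):
    return 2.0 * np.log(x) - 2.0 * x if x > 0 else -np.inf

# Match the proposal to the target at the mode
res = minimize_scalar(lambda x: -log_target(x), bounds=(1e-6, 20.0),
                      method="bounded")
m = res.x
h = 1e-5  # finite-difference step for the second derivative
neg_hess = -(log_target(m + h) - 2.0 * log_target(m) + log_target(m - h)) / h**2
V = 1.0 / neg_hess

prop = student_t(df=15, loc=m, scale=np.sqrt(V))  # tailored t proposal

def log_w(x):
    # log of the weight w(x) = pi(x) / q(x)
    return log_target(x) - prop.logpdf(x)

rng = np.random.default_rng(1)
n, x = 20000, m
cur = log_w(x)
draws = np.empty(n)
for i in range(n):
    cand = prop.rvs(random_state=rng)
    cand_w = log_w(cand)
    if np.log(rng.uniform()) < cand_w - cur:  # alpha = min{1, w'/w}
        x, cur = cand, cand_w
    draws[i] = x
```

Because the proposal approximates the target near its mode, the chain mixes rapidly and the draws recover the Gamma(3, 2) moments (mean 1.5, variance 0.75).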
Another way to generate proposal values is through a Markov chain version of the accept-reject method. In this version, due to Tierney (1994), and considered in detail by Chib and Greenberg (1995), a pseudo accept-reject step is used to generate candidates for an M-H algorithm. Suppose $c > 0$ is a known constant and $h(\psi)$ a source density. Let $C = \{\psi : \pi(\psi) \le c\, h(\psi)\}$ denote the set of values for which $c\, h(\psi)$ dominates the target density and assume that this set has high probability under $\pi$. Given $\psi^{(j)}$, the next value $\psi^{(j+1)}$ is obtained as follows: First, a candidate value $\psi'$ is obtained, independent of the current value $\psi^{(j)}$, by applying the accept-reject algorithm with $c\, h(\cdot)$ as the ''pseudo dominating'' density. The candidates that are produced under this scheme have density $q(\psi') \propto \min\{\pi(\psi'), c\, h(\psi')\}$. If we let $w(\psi) = \pi(\psi)/(c\, h(\psi))$ then it can be shown that the M-H probability of move is given by
$$
\alpha(\psi, \psi') =
\begin{cases}
1 & \text{if } \psi \in C, \\
1/w(\psi) & \text{if } \psi \notin C,\ \psi' \in C, \\
\min\{1, w(\psi')/w(\psi)\} & \text{if } \psi \notin C,\ \psi' \notin C.
\end{cases}
$$
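The scheme above can be sketched in Python under an illustrative setup: a standard normal target known only up to scale, a Cauchy source density $h$, and a constant $c$ deliberately chosen below the true supremum of $\pi/h$, so that domination genuinely fails near zero and the pseudo accept-reject correction matters. All names and constants are assumptions made for the example.

```python
import numpy as np

c = 3.0  # below sup pi_u/h (about 3.81), so c*h fails to dominate near 0
def pi_u(x): return np.exp(-0.5 * x * x)            # unnormalized N(0,1)
def h(x):    return 1.0 / (np.pi * (1.0 + x * x))   # standard Cauchy density
def in_C(x): return pi_u(x) <= c * h(x)             # region of domination
def w(x):    return pi_u(x) / (c * h(x))

def ar_candidate(rng):
    # Accept-reject step with c*h as the "pseudo dominating" density
    while True:
        z = rng.standard_cauchy()
        if rng.uniform() <= min(1.0, w(z)):
            return z

def alpha(x, y):
    # The three-case M-H probability of move for the accept-reject chain
    if in_C(x):
        return 1.0
    if in_C(y):
        return 1.0 / w(x)
    return min(1.0, w(y) / w(x))

rng = np.random.default_rng(2)
n, x = 20000, 0.0
draws = np.empty(n)
for i in range(n):
    y = ar_candidate(rng)
    if rng.uniform() <= alpha(x, y):
        x = y
    draws[i] = x
```

Despite the failed domination near the origin, the chain has the standard normal as its invariant distribution.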
In the M-H algorithm the transition kernel of the chain is given by
$$
P(\psi, d\psi') = q(\psi, \psi')\,\alpha(\psi, \psi')\, d\psi' + \left[ 1 - \int q(\psi, \psi')\,\alpha(\psi, \psi')\, d\psi' \right] \delta_{\psi}(d\psi'),
$$
where $\delta_{\psi}(d\psi')$ is the Dirac measure at $\psi$; the second term is the probability of remaining at the current point.
Chib and Greenberg (1995) provide a way to derive and interpret the probability of move $\alpha(\psi, \psi')$. Consider the proposal density $q(\psi, \psi')$. This proposal density is not likely to be reversible for $\pi$ (if it were then we would be done and M-H sampling would not be necessary). Without loss of generality, suppose that $\pi(\psi)\,q(\psi, \psi') > \pi(\psi')\,q(\psi', \psi)$, implying that the rate of transitions from $\psi$ to $\psi'$ exceeds that in the reverse direction. To reduce the transitions from $\psi$ to $\psi'$ one can introduce a function $0 \le \alpha(\psi, \psi') \le 1$ such that $\pi(\psi)\,q(\psi, \psi')\,\alpha(\psi, \psi') = \pi(\psi')\,q(\psi', \psi)$. Solving for $\alpha(\psi, \psi')$ yields the probability of move in the M-H algorithm. This calculation reveals the important point that the function $p(\psi, \psi') = q(\psi, \psi')\,\alpha(\psi, \psi')$ is reversible by construction, i.e., it satisfies the condition
$$
\pi(\psi)\, q(\psi, \psi')\, \alpha(\psi, \psi') = \pi(\psi')\, q(\psi', \psi)\, \alpha(\psi', \psi).
$$
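On a small discrete state space the integrals become sums and the reversibility condition can be checked numerically. The following Python snippet (an illustrative construction, not from the text) builds a random target and proposal, forms the M-H probability of move, and verifies detailed balance of $p(i,j) = q(i,j)\,\alpha(i,j)$.

```python
import numpy as np

rng = np.random.default_rng(3)
K = 4
pi = rng.uniform(size=K); pi /= pi.sum()                 # target probabilities
q = rng.uniform(size=(K, K))
q /= q.sum(axis=1, keepdims=True)                        # proposal rows sum to 1

# M-H probability of move: alpha_ij = min{1, pi_j q_ji / (pi_i q_ij)}
alpha = np.ones((K, K))
for i in range(K):
    for j in range(K):
        alpha[i, j] = min(1.0, (pi[j] * q[j, i]) / (pi[i] * q[i, j]))

p = q * alpha
# Detailed balance: pi_i p_ij == pi_j p_ji for every pair (i, j)
flows = pi[:, None] * p
assert np.allclose(flows, flows.T)
```

The check passes for any strictly positive `pi` and `q`, since $\pi_i q_{ij} \alpha_{ij} = \min\{\pi_i q_{ij}, \pi_j q_{ji}\}$ is symmetric in $(i, j)$ by construction.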
It is not difficult to provide conditions under which the Markov chain generated by the M-H algorithm satisfies the conditions of Propositions 1-2. The conditions of Proposition 1 are satisfied by the M-H chain if $q(\psi, \psi')$ is positive and continuous for all $(\psi, \psi')$ and the set $\Psi$ is connected. In addition, the conditions of Proposition 2 are satisfied if $q$ is not reversible (which is the usual situation), which leads to a chain that is aperiodic. Conditions for ergodicity, required for use of the central limit theorem, are satisfied if in addition $\Psi$ is bounded. Other similar conditions are available in the literature.
To illustrate the M-H algorithm, consider the binary response data in Table 3.1, taken from Fahrmeir and Tutz (1997), on the occurrence or non-occurrence of infection following birth by caesarean section. The response variable $y$ is one if the caesarean birth resulted in an infection, and zero if not. There are three covariates: $x_1$, an indicator of whether the caesarean was non-planned; $x_2$, an indicator of whether risk factors were present at the time of birth; and $x_3$, an indicator of whether antibiotics were given as a prophylaxis. The table contains the information from all the births in the study. Under the column of the response, an entry of the form $a/b$ means that there were $a + b$ deliveries with the given covariate configuration, of whom $a$ developed an infection and $b$ did not.
Suppose that the probability of infection for the $i$th birth is modeled through a probit link as
$$
\Pr(y_i = 1 \mid x_i, \beta) = \Phi(x_i' \beta),
$$
where $x_i = (1, x_{i1}, x_{i2}, x_{i3})'$ is the covariate vector, $\beta$ is the vector of unknown coefficients and $\Phi$ is the standard normal cdf.
To define the proposal density, let $\hat\beta$ denote the mode of the log posterior density $\log \pi(\beta \mid y)$ and let $\hat V$ denote the inverse of the negative Hessian of the log posterior evaluated at $\hat\beta$.
[Table 3.2: prior (Mean, Std dev) and posterior (Mean, Std dev, Lower, Upper) summaries of the parameters; numerical entries not recoverable.]
As expected, both the first and second covariates increase the probability of infection while the third covariate (the antibiotics prophylaxis) reduces the probability of infection.
To get an idea of the form of the posterior density we plot in Fig. 3.1 the four marginal posterior densities. The density plots are obtained by smoothing the histogram of the simulated values with a Gaussian kernel. In the same plot we also report the autocorrelation functions (correlation against lag) for each of the sampled parameter values. The autocorrelation plots provide information about the extent of serial dependence in the sampled values. Here we see that the serial correlations start out high but decline to almost zero by lag twenty.
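These diagnostics can be computed directly from the sampled values. The Python sketch below is illustrative: the AR(1) stand-in chain, its coefficient of 0.7, and the function names are assumptions for the example; the inefficiency factor $1 + 2\sum_k \rho_k$ is one common summary of the autocorrelations, equal to about one for a nearly i.i.d. chain.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelations of a 1-D chain at lags 0, 1, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[:x.size - k], xc[k:]) / denom
                     for k in range(max_lag + 1)])

def inefficiency_factor(x, max_lag):
    """1 + 2 * (sum of autocorrelations up to max_lag)."""
    return 1.0 + 2.0 * acf(x, max_lag)[1:].sum()

# Stand-in for serially correlated M-H output: an AR(1) chain with
# coefficient 0.7, whose theoretical autocorrelation at lag k is 0.7**k
rng = np.random.default_rng(4)
chain = np.empty(50000)
chain[0] = 0.0
for t in range(1, chain.size):
    chain[t] = 0.7 * chain[t - 1] + rng.standard_normal()

rho = acf(chain, 20)
ineff = inefficiency_factor(chain, 100)
```

For this chain the autocorrelations decay geometrically and the inefficiency factor is near its theoretical value $1 + 2(0.7/0.3) \approx 5.7$.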
[Table 3.3: prior (Mean, Std dev) and posterior (Mean, Std dev, Lower, Upper) summaries of the parameters under the tailored proposal; numerical entries not recoverable.]
To see the difference in results, the M-H algorithm is next implemented with a tailored proposal density. In this scheme one utilizes both the posterior mode $\hat\beta$ and the curvature matrix $\hat V$ that were defined above. We let the proposal density be $f_T(\beta' \mid \hat\beta, \hat V, 15)$, a multivariate-$t$ density with fifteen degrees of freedom. This proposal density is similar to the random-walk proposal except that the distribution is centered at the fixed point $\hat\beta$ rather than at the current value. The prior-posterior summary based on draws of the M-H algorithm with this proposal density is given in Table 3.3. We see that the marginal posterior moments are similar to those in Table 3.2. The marginal posterior densities are reported in the top panel of Fig. 3.2. These are virtually identical to those computed using the random-walk M-H algorithm. The most notable difference is in the serial correlation plots, which decline much more quickly to zero, indicating that the algorithm is mixing well. The same information is revealed by the inefficiency factors, which are much closer to one than those from the previous algorithm.
The message from this analysis is that the two proposal densities produce similar results, with the differences appearing only in the autocorrelation plots (and inefficiency factors) of the sampled draws.
In applications when the dimension of $\psi$ is large, it can be difficult to construct a single-block M-H algorithm that converges rapidly to the target density. In such cases, it is helpful to break up the variate space into smaller blocks and to then construct a Markov chain with these smaller blocks. Suppose, for illustration, that $\psi$ is split into two vector blocks $(\psi_1, \psi_2)$. For example, in a regression model, one block may consist of the regression coefficients and the other block may consist of the error variance. Next, for each block, let $q_k(\psi_k, \psi_k' \mid \psi_{-k})$ denote the proposal density for block $k$, conditioned on the current value $\psi_{-k}$ of the other block, and define the probability of move for block $k$ as
$$
\alpha(\psi_k, \psi_k' \mid \psi_{-k}) = \min\left\{1, \frac{\pi(\psi_k' \mid \psi_{-k})\, q_k(\psi_k', \psi_k \mid \psi_{-k})}{\pi(\psi_k \mid \psi_{-k})\, q_k(\psi_k, \psi_k' \mid \psi_{-k})}\right\}.
$$
With these inputs, one sweep of the multiple-block M-H algorithm is completed by updating each block, say $\psi_1$ followed by $\psi_2$, sequentially in fixed order, using an M-H step with the above probabilities of move, given the most current value of the other block.
The extension of this method to more than two blocks is straightforward.
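One sweep of the two-block scheme can be sketched as follows; the bivariate normal target, the random-walk proposals within each block, and the scale of 0.5 are illustrative assumptions, not choices made in the text. Each block is updated by an M-H step that conditions on the most current value of the other block.

```python
import numpy as np

rho = 0.8  # correlation of the illustrative bivariate normal target

def log_joint(p1, p2):
    # Log of the unnormalized bivariate normal density with unit variances
    return -(p1 * p1 - 2.0 * rho * p1 * p2 + p2 * p2) / (2.0 * (1.0 - rho * rho))

def two_block_mh(n_iter, scale, rng):
    p1 = p2 = 0.0
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Block 1: random-walk M-H step given the current value of block 2
        cand = p1 + scale * rng.standard_normal()
        if np.log(rng.uniform()) < log_joint(cand, p2) - log_joint(p1, p2):
            p1 = cand
        # Block 2: random-walk M-H step given the just-updated block 1
        cand = p2 + scale * rng.standard_normal()
        if np.log(rng.uniform()) < log_joint(p1, cand) - log_joint(p1, p2):
            p2 = cand
        out[t] = p1, p2
    return out

rng = np.random.default_rng(5)
draws = two_block_mh(40000, 0.5, rng)
```

Note that conditioning on the joint density within each update is equivalent to using the full conditional $\pi(\psi_k \mid \psi_{-k})$, since the terms not involving the updated block cancel in the acceptance ratio.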
The transition kernel of the resulting Markov chain is given by the product of transition kernels
$$
P(\psi, d\psi') = P_1(\psi_1, d\psi_1' \mid \psi_2)\, P_2(\psi_2, d\psi_2' \mid \psi_1').
$$
This result implies that draws can be taken in succession from each of the kernels, instead of having to run each kernel to convergence for every value of the conditioning variable.