In implementing an MCMC method it is important to assess the performance of the sampling algorithm in order to determine the rate of mixing and the size of the burn-in, both of which have implications for the number of iterations required to get reliable answers. A large literature has emerged on these issues (see, for example, Tanner (1996, Sect. 6.3) and Gamerman (1997, Sect. 5.4)), but the ideas, although related in many ways, have not coalesced into a single prescription.
One approach for determining sampler performance and the size of the burn-in is to apply analytical methods to the specified Markov chain prior to sampling. This approach has been pursued by several authors. Two factors have inhibited the growth and application of these methods. The first is that the calculations are difficult and problem-specific; the second is that the upper bounds for the burn-in that emerge from such calculations are usually conservative.
At this time the more popular approach is to use the sampled draws to assess both the performance of the algorithm and its approach to the invariant distribution. Several such relatively informal methods are available. One recommendation is to monitor the evolution of the quantiles as the sampling proceeds. Another useful diagnostic, perhaps the most direct, is the autocorrelation plot (together with the autocorrelation time) of the sampled output. Slowly decaying correlations indicate problems with the mixing of the chain. In connection with M-H Markov chains it is also useful to monitor the acceptance rate of the proposal values, with low rates implying "stickiness" in the sampled values and thus a slower approach to the invariant distribution.
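The autocorrelation and acceptance-rate diagnostics just described can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the text: the target is a standard normal, the sampler is a random-walk M-H chain started at zero (so no burn-in is discarded), the function names are hypothetical, and the autocorrelation time is estimated as one plus twice the sum of the sample autocorrelations, truncated at the first non-positive lag, which is only one of several truncation rules in use.

```python
import math
import random


def random_walk_metropolis(n_draws, step, x0=0.0, seed=0):
    """Random-walk M-H draws from a standard normal target (illustrative)."""
    rng = random.Random(seed)
    x = x0
    draws = []
    n_accepted = 0
    for _ in range(n_draws):
        proposal = x + rng.gauss(0.0, step)
        # Log ratio of N(0, 1) densities at the proposal and the current value.
        log_ratio = 0.5 * (x * x - proposal * proposal)
        # Symmetric proposal, so accept with probability min(1, ratio).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
            n_accepted += 1
        draws.append(x)
    return draws, n_accepted / n_draws


def autocorrelations(draws, max_lag):
    """Sample autocorrelations of the draws at lags 0, 1, ..., max_lag."""
    n = len(draws)
    mean = sum(draws) / n
    centered = [d - mean for d in draws]
    variance = sum(c * c for c in centered) / n
    acf = []
    for lag in range(max_lag + 1):
        cov = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / n
        acf.append(cov / variance)
    return acf


def autocorrelation_time(acf):
    """1 + 2 * (sum of autocorrelations up to the first non-positive lag)."""
    tau = 1.0
    for rho in acf[1:]:
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return tau


# A moderate proposal scale versus a very large one (assumed values).
good_draws, good_rate = random_walk_metropolis(5000, step=2.4)
sticky_draws, sticky_rate = random_walk_metropolis(5000, step=25.0)
good_tau = autocorrelation_time(autocorrelations(good_draws, 100))
sticky_tau = autocorrelation_time(autocorrelations(sticky_draws, 100))

print(f"step=2.4:  acceptance {good_rate:.2f}, autocorrelation time {good_tau:.1f}")
print(f"step=25.0: acceptance {sticky_rate:.2f}, autocorrelation time {sticky_tau:.1f}")
```

In this toy setting the chain with the very large proposal scale rejects most proposals, repeats the same value for long stretches, and shows a much larger estimated autocorrelation time than the chain with the moderate scale.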
Somewhat more formal sample-based diagnostics are summarized in the CODA routines. Although these diagnostics often go under the name "convergence diagnostics", they are in principle approaches that detect lack of convergence. Detection of convergence based entirely on the sampled output, without analysis of the target distribution, is perhaps impossible. A comparative evaluation of thirteen such diagnostics did not arrive at a consensus. Difficulties in evaluating these methods stem from the fact that some of them apply only to Gibbs Markov chains, while others are based on the output not of a single chain but of multiple chains specifically run from "disparate starting values", as in the method of Gelman and Rubin (1992). Finally, some methods assess the behavior of univariate moment estimates, while others are concerned with the behavior of the entire transition kernel.
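As a rough illustration of a multiple-chain diagnostic in the spirit of Gelman and Rubin (1992), the sketch below computes a basic potential scale reduction factor from the between-chain and within-chain variances. It is a simplified version that omits the degrees-of-freedom correction of the original method, and it is not the CODA implementation. The chains are simulated from a Gaussian AR(1) process, an assumption made here only to obtain correlated output with a known N(0, 1) invariant distribution; the function names, starting values, and segment lengths are likewise hypothetical.

```python
import random


def simulate_ar1_chain(start, n_draws, rho=0.9, seed=0):
    """Gaussian AR(1) chain with N(0, 1) invariant distribution (illustrative)."""
    rng = random.Random(seed)
    innovation_sd = (1.0 - rho * rho) ** 0.5
    x = start
    draws = []
    for _ in range(n_draws):
        x = rho * x + rng.gauss(0.0, innovation_sd)
        draws.append(x)
    return draws


def potential_scale_reduction(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    m = len(chains)
    n = len(chains[0])
    chain_means = [sum(chain) / n for chain in chains]
    grand_mean = sum(chain_means) / m
    # Between-chain variance B (scaled by the chain length n).
    between = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    # Within-chain variance W: average of the per-chain sample variances.
    within = sum(
        sum((x - cm) ** 2 for x in chain) / (n - 1)
        for chain, cm in zip(chains, chain_means)
    ) / m
    # Pooled estimate of the target variance, compared with W.
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Three chains run from deliberately disparate starting values.
starts = [-50.0, 0.0, 50.0]
chains = [simulate_ar1_chain(s, 5000, seed=i + 1) for i, s in enumerate(starts)]

# Early draws still reflect the starting values; later draws should not.
r_early = potential_scale_reduction([chain[:20] for chain in chains])
r_late = potential_scale_reduction([chain[1000:] for chain in chains])

print(f"first 20 draws:   {r_early:.2f}")
print(f"draws after 1000: {r_late:.3f}")
```

A value well above one signals that the chains have not yet forgotten their starting values. A value close to one is consistent with convergence but, in keeping with the caveat above, does not establish it.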