Many econometric issues require models that are richer or more flexible than the conventional regression-type models. Several possibilities exist. For example, as explained in Sect. 2.2.3, the logit model is made more realistic by generalizing it to a mixed logit. Many models currently used in econometrics can be generalized in such a way.
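In this section, we assume that the univariate or multivariate observations $y_j$, $j = 1, \ldots, J$, are draws from the finite mixture

$$f(y_j) = \sum_{i=1}^{G} \eta_i \, f(y_j \mid \theta_i), \tag{2.39}$$

with $\eta_i \geq 0$ and $\eta_1 + \cdots + \eta_G = 1$. The densities $f(\cdot \mid \theta_i)$ are called component distributions: each observation comes from one of them, but we do not observe which one.

The structure of (2.39) implies that the likelihood for all the $J$ observations contains $G^J$ terms:

$$L(\eta, \theta \mid y) \propto \prod_{j=1}^{J} \sum_{i=1}^{G} \eta_i \, f(y_j \mid \theta_i), \tag{2.40}$$

where $\eta = (\eta_1, \ldots, \eta_G)'$ and $\theta = (\theta_1, \ldots, \theta_G)$ contain all the parameters and $y$ denotes all the data.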
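To see why the product form of (2.40) matters computationally, the sketch below evaluates its logarithm in $O(JG)$ operations with a log-sum-exp per observation, and checks the result against the explicit expansion into $G^J$ terms, one per allocation of the observations to components. It assumes univariate Gaussian components with $\theta_i = (\mu_i, \sigma_i)$, an illustrative choice; the function names and data are hypothetical.

```python
import itertools
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_loglik(y, eta, mu, sigma):
    """log L = sum_j log sum_i eta_i f(y_j | theta_i), computed in O(J * G) work."""
    total = 0.0
    for yj in y:
        logs = [math.log(e) - 0.5 * ((yj - m) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))
                for e, m, s in zip(eta, mu, sigma)]
        top = max(logs)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(v - top) for v in logs))
    return total

def mixture_lik_expanded(y, eta, mu, sigma):
    """The same likelihood written as a sum of G**J terms, one per allocation."""
    G = len(eta)
    total = 0.0
    for alloc in itertools.product(range(G), repeat=len(y)):
        term = 1.0
        for yj, i in zip(y, alloc):
            term *= eta[i] * normal_pdf(yj, mu[i], sigma[i])
        total += term
    return total

y = [-1.2, 0.3, 2.9, 3.4, -0.7]
print(mixture_loglik(y, [0.5, 0.5], [0.0, 3.0], [1.0, 1.0]))
print(math.log(mixture_lik_expanded(y, [0.5, 0.5], [0.0, 3.0], [1.0, 1.0])))
```

The expanded version is only usable for tiny samples (here $2^5 = 32$ terms); it is included to make the $G^J$ count concrete.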
Bayesian inference on finite mixture distributions by MCMC sampling is explained in [19]. Gibbs sampling on $(\eta, \theta)$ is difficult since the conditional posterior distributions of $\eta \mid \theta, y$ and $\theta \mid \eta, y$ are generally unknown. For the same reason as for the probit model in Sect. 2.2.1 and the stochastic volatility model in Sect. 2.3, inference on the finite mixture model is straightforward once the state or group of each observation is known. Data augmentation is therefore an appropriate way to make inference easier. Define the state indicator $S_j$, which takes the value $i$ when $y_j$ belongs to state or group $i$, where $i \in \{1, \ldots, G\}$. Denote by $S$ the $J$-dimensional discrete vector containing all the state indicators. To facilitate the inference, prior independence, that is $\varphi(\eta, \theta) = \varphi(\eta)\varphi(\theta)$, is usually imposed. As shown in the next examples, the conditional posterior distributions of $S \mid \eta, \theta, y$, $\theta \mid \eta, S, y$ and $\eta \mid \theta, S, y$ are either known distributions that are easy to sample from, or distributions for which a second, simpler MCMC sampler is set up. A Gibbs sampler with three main blocks may therefore be used.
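A minimal sketch of such a three-block Gibbs sampler, assuming (for illustration only) univariate Gaussian components with a known common standard deviation `sigma`, a normal prior $N(m_0, s_0^2)$ on each component mean, and a symmetric Dirichlet(`alpha`) prior on $\eta$. Under these conjugate assumptions every block is a standard distribution: categorical draws for the $S_j$, a Dirichlet draw for $\eta$ (obtained by normalizing gamma variates), and normal draws for the means. The function name and defaults are hypothetical.

```python
import math
import random

def gibbs_mixture(y, G, n_iter=1000, sigma=1.0, m0=0.0, s0=10.0, alpha=1.0, seed=0):
    """Data-augmentation Gibbs sampler for a G-component Gaussian mixture.

    Returns a list of (eta, mu) draws, one per iteration.
    """
    rng = random.Random(seed)
    eta = [1.0 / G] * G
    mu = sorted(rng.sample(y, G))  # start the means at G distinct observations
    draws = []
    for _ in range(n_iter):
        # Block 1: S | eta, mu, y, independent categorical draws per observation.
        S = []
        for yj in y:
            logw = [math.log(eta[i]) - 0.5 * ((yj - mu[i]) / sigma) ** 2 for i in range(G)]
            top = max(logw)
            S.append(rng.choices(range(G), weights=[math.exp(v - top) for v in logw])[0])
        counts = [0] * G
        sums = [0.0] * G
        for yj, s in zip(y, S):
            counts[s] += 1
            sums[s] += yj
        # Block 2: eta | S ~ Dirichlet(alpha + n_1, ..., alpha + n_G) via normalized gammas.
        g = [rng.gammavariate(alpha + counts[i], 1.0) for i in range(G)]
        total = sum(g)
        eta = [gi / total for gi in g]
        # Block 3: mu_i | S, y ~ Normal (conjugate update, one component at a time).
        for i in range(G):
            prec = 1.0 / s0 ** 2 + counts[i] / sigma ** 2
            mean = (m0 / s0 ** 2 + sums[i] / sigma ** 2) / prec
            mu[i] = rng.gauss(mean, 1.0 / math.sqrt(prec))
        draws.append((eta[:], mu[:]))
    return draws
```

Without an identification restriction, nothing prevents the component labels from switching across iterations; the sampler only reports the draws as they come.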
The complete data likelihood of the finite mixture is invariant to a relabeling of the states. This means that we can take the labeling $\{1, 2, \ldots, G\}$ and apply a permutation $\{\rho(1), \rho(2), \ldots, \rho(G)\}$ without changing the value of the likelihood function. If the prior is also invariant to relabeling, then the posterior has this property as well. As a result, the posterior has potentially $G!$ different modes, one for each labeling. To solve this identification or label switching problem, identification restrictions have to be imposed.
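The invariance is easy to check numerically. The sketch below, assuming univariate Gaussian components and hypothetical parameter values, evaluates the log of (2.40) under all $3! = 6$ relabelings of a three-component mixture, then imposes the ordering $\mu_1 < \mu_2 < \mu_3$ as one possible identification restriction (an illustrative choice, not necessarily the restriction appropriate for a given application).

```python
import itertools
import math

def mixture_loglik(y, eta, mu, sigma):
    """Log of (2.40) for univariate Gaussian components (illustrative assumption)."""
    total = 0.0
    for yj in y:
        dens = sum(e * math.exp(-0.5 * ((yj - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
                   for e, m, s in zip(eta, mu, sigma))
        total += math.log(dens)
    return total

def relabel(params, perm):
    """Apply the permutation perm to every component-specific parameter vector."""
    return [[p[k] for k in perm] for p in params]

def order_by_mean(params):
    """Identification restriction mu_1 < ... < mu_G: keep the labeling with increasing means."""
    perm = sorted(range(len(params[1])), key=lambda k: params[1][k])
    return relabel(params, perm)

y = [-1.2, 0.3, 2.9, 3.4, 6.1]
params = [[0.2, 0.5, 0.3], [0.0, 3.0, 6.0], [1.0, 0.8, 1.2]]  # eta, mu, sigma (hypothetical)
values = [mixture_loglik(y, *relabel(params, perm))
          for perm in itertools.permutations(range(3))]
print(len(values), max(values) - min(values))  # 6 labelings; spread is zero up to rounding
```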
Note that the inference described here is conditional on $G$, the number of components. There are two modelling approaches to take care of $G$. First, one can treat $G$ as an extra parameter in the model, as is done in [44], who use reversible jump MCMC methods. In this way, prior information on the number of components can be taken explicitly into account, for example by specifying a Poisson distribution on $G$ that favors a small number of components. A second approach is to treat the choice of $G$ as a model selection problem. This separates the choice of $G$ from estimation with $G$ fixed. For example, one can take $G = 2$ and $G = 3$ and estimate the two models separately. Bayesian model comparison techniques (see Chap. III.11), such as the calculation of the Bayes factor, can then be applied; see [14] and [9] for more details.
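Both approaches rest on the relation $\Pr(G \mid y) \propto p(y \mid G)\Pr(G)$. As a minimal sketch, given log marginal likelihoods $\log p(y \mid G)$ obtained from separate runs with $G$ fixed, the code below computes a log Bayes factor and posterior probabilities over the candidate values of $G$ under a Poisson prior restricted to those candidates. The numerical inputs are hypothetical, not results from [9], [14] or [44].

```python
import math

def log_poisson_prior(G, lam):
    """log Pr(G) under a Poisson(lam) prior (renormalized below over the candidates)."""
    return G * math.log(lam) - lam - math.lgamma(G + 1)

def posterior_over_G(log_marglik, lam=1.0):
    """Pr(G | y) proportional to p(y | G) Pr(G), over the candidate values of G."""
    logpost = {G: lm + log_poisson_prior(G, lam) for G, lm in log_marglik.items()}
    top = max(logpost.values())
    z = sum(math.exp(v - top) for v in logpost.values())
    return {G: math.exp(v - top) / z for G, v in logpost.items()}

def log_bayes_factor(log_marglik, G1, G2):
    """log B_{G1,G2} = log p(y | G1) - log p(y | G2)."""
    return log_marglik[G1] - log_marglik[G2]

# Hypothetical log marginal likelihoods for G = 2 and G = 3.
log_ml = {2: -310.4, 3: -308.1}
print(log_bayes_factor(log_ml, 3, 2))
print(posterior_over_G(log_ml, lam=1.0))
```

A small `lam` tilts the posterior toward fewer components, which is the role the Poisson prior plays in the first approach.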
We review two examples. The first example fits US quarterly GNP data using a Markov switching autoregressive model. The second example is about the clustering of many GARCH models.
[23] uses US quarterly real GNP growth data from 1951:2 to 1984:4. This series was initially used by [30] and is displayed in Fig. 2.2. The argument is that contracting and expanding periods are generated by the same model but with different parameters. These models are called state- (or regime-) switching models.
After some investigation using Bayesian model selection techniques, the adequate specification for the US growth data is found to be the two-state switching AR(2) model
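To make the structure of a two-state switching AR(2) model concrete, here is a simulation sketch. The parameterization (state-specific intercept, two autoregressive coefficients and error standard deviation, with the state following a first-order Markov chain with transition matrix `P`) and all parameter values are assumptions for illustration, not the specification or estimates of [23].

```python
import random

def simulate_ms_ar2(T, beta, sigma, P, seed=0):
    """Simulate y_t = b0[s] + b1[s] y_{t-1} + b2[s] y_{t-2} + sigma[s] e_t.

    beta[s] = (b0, b1, b2) for state s; P[s][k] = Pr(S_t = k | S_{t-1} = s).
    The chain starts from state 0 and the two pre-sample values are 0.
    """
    rng = random.Random(seed)
    G = len(beta)
    s = 0
    y = [0.0, 0.0]
    states = []
    for _ in range(T):
        s = rng.choices(range(G), weights=P[s])[0]  # Markov transition
        b0, b1, b2 = beta[s]
        y.append(b0 + b1 * y[-1] + b2 * y[-2] + sigma[s] * rng.gauss(0.0, 1.0))
        states.append(s)
    return y[2:], states

# Hypothetical "contraction" (state 0) and "expansion" (state 1) regimes.
y, states = simulate_ms_ar2(200, beta=[(-0.4, 0.3, 0.1), (1.0, 0.3, -0.1)],
                            sigma=[1.0, 0.8], P=[[0.75, 0.25], [0.1, 0.9]], seed=1)
print(sum(states) / len(states))  # share of periods spent in state 1
```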
In the second step, this sample is used to identify the model. This is done by visual inspection of the posterior marginal and bivariate densities: identification restrictions need to be imposed to avoid multimodality of the posterior densities. Once suitable restrictions are found, a final MCMC sample is constructed to obtain the moments of the constrained posterior density. This sample is constructed by permutation sampling under the restrictions, which means that step (2) of the first-step sampler is replaced by the one permutation that defines the constrained parameter space.
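A minimal sketch of the two permutation steps, assuming that each MCMC draw is stored as a list of per-state parameter tuples and that step (2) of the first-step sampler relabels the states at random. The constrained version replaces that random relabeling by the single permutation that makes a chosen coordinate increasing across states; which coordinate to order on is an application-specific assumption here, and all names are hypothetical.

```python
import random

def random_permutation_step(draw, rng):
    """Unconstrained sweep: relabel the states by a uniformly random permutation."""
    perm = list(range(len(draw)))
    rng.shuffle(perm)
    return [draw[k] for k in perm]

def constrained_permutation_step(draw, key_index):
    """Constrained sweep: apply the one permutation that makes coordinate key_index increasing."""
    return sorted(draw, key=lambda comp: comp[key_index])

def posterior_means(draws):
    """State-by-state posterior means of a list of draws (states x parameters)."""
    n = len(draws)
    return [[sum(d[i][p] for d in draws) / n for p in range(len(draws[0][i]))]
            for i in range(len(draws[0]))]

rng = random.Random(0)
draw = [(1.0, 0.25, -0.125), (-0.5, 0.25, 0.125)]  # hypothetical (intercept, ar1, ar2) per state
mixed = [random_permutation_step(draw, rng) for _ in range(10)]
print(posterior_means([constrained_permutation_step(d, 0) for d in mixed]))
```

Averaging the raw `mixed` draws would blend the two states; averaging after the constrained step gives state-specific moments.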
In the GNP growth data example, two identification restrictions seem possible (see [23] for details). Table 2.9 provides the posterior means and standard deviations (in parentheses) of the state-specific coefficients of the AR(2) model under each of the two identification restrictions.
Contraction | Expansion | Contraction | Expansion | |
() | () | () | () | |
() | () | () | () | |
() | () | () | () |
GNP growth rates in contraction and expansion periods not only have different unconditional means; they are also driven by different dynamics. Both identification restrictions result in similar posterior moments.
[4] differentiate the component distributions by giving them different conditional heteroskedasticity structures, modeled with GARCH processes. In this framework, the observation $y_j$ is multivariate (a time series of returns) and the $\theta_i$'s are the parameters of GARCH(1,1) models. The purpose is to estimate many GARCH models, of the order of several hundred. Each financial time series belongs to one of the $G$ groups, but it is not known a priori which series belongs to which cluster.
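The block that allocates series to groups needs, for each series, its likelihood under every group's GARCH parameters. A minimal sketch, assuming Gaussian GARCH(1,1) innovations, $h_1$ set to the sample second moment, and series that are independent given their group (assumptions for illustration, not details given in the text); function names are hypothetical.

```python
import math

def garch11_loglik(y, omega, alpha, beta):
    """Gaussian GARCH(1,1): y_t = sqrt(h_t) e_t, h_t = omega + alpha y_{t-1}^2 + beta h_{t-1}."""
    h = sum(v * v for v in y) / len(y)  # h_1 set to the sample second moment (assumption)
    ll = 0.0
    for t, yt in enumerate(y):
        if t > 0:
            h = omega + alpha * y[t - 1] ** 2 + beta * h
        ll -= 0.5 * (math.log(2.0 * math.pi) + math.log(h) + yt * yt / h)
    return ll

def membership_probs(series, thetas, eta):
    """Pr(S = k | theta, eta, series) proportional to eta_k * L(theta_k | series)."""
    logs = [math.log(e) + garch11_loglik(series, *th) for e, th in zip(eta, thetas)]
    top = max(logs)
    w = [math.exp(v - top) for v in logs]
    z = sum(w)
    return [v / z for v in w]

series = [0.5, -1.8, 2.2, -0.3, 0.1, 1.4, -2.5, 0.7]  # hypothetical returns
thetas = [(0.05, 0.05, 0.9), (0.2, 0.25, 0.6)]         # hypothetical (omega, alpha, beta) per group
print(membership_probs(series, thetas, eta=[0.5, 0.5]))
```

Drawing $S_j$ from these probabilities, series by series, is the allocation step of the Gibbs sampler.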
An additional identification problem arises due to the possibility of empty groups. If group $i$ is empty, then the posterior of $\theta_i$ is equal to its prior. Therefore an improper prior is not allowed for $\theta_i$, since the posterior would then be improper as well. The identification problems are solved by using an informative prior on each $\theta_i$. The identification restrictions exploit the fact that we work with GARCH models: we select largely non-overlapping supports for the parameters, so that the prior depends on a labeling choice. Uniform prior densities on each parameter, on finite intervals and possibly subject to stationarity restrictions, are relatively easy to specify.
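A minimal sketch of such a prior, assuming $\theta_k = (\omega_k, \alpha_k, \beta_k)$, independent uniforms on finite intervals truncated to the covariance stationarity region $\alpha_k + \beta_k < 1$, and hypothetical interval bounds in which the $\alpha$ ranges of the two groups do not overlap. The log density is returned up to its normalizing constant, which is enough inside a Gibbs sampler.

```python
import math

def log_uniform_prior(theta, bounds):
    """Unnormalized log prior: uniform on a box, truncated to alpha + beta < 1."""
    _, alpha, beta = theta
    for value, (lo, hi) in zip(theta, bounds):
        if not lo < value < hi:
            return -math.inf
    if alpha + beta >= 1.0:
        return -math.inf
    return 0.0

def disjoint_on(bounds_group, index):
    """True if the intervals for parameter `index` do not overlap across groups."""
    ivs = sorted(b[index] for b in bounds_group)
    return all(prev[1] <= nxt[0] for prev, nxt in zip(ivs, ivs[1:]))

# Hypothetical label-defining supports for G = 2: (omega, alpha, beta) intervals per group.
bounds_group = [
    ((0.0, 0.5), (0.0, 0.1), (0.6, 1.0)),
    ((0.0, 0.5), (0.1, 0.3), (0.4, 0.9)),
]
print(disjoint_on(bounds_group, 1), disjoint_on(bounds_group, 2))
```

Disjoint supports for one parameter are enough to tie each label to a group; the other parameters may overlap.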
Bayesian inference is done by use of the Gibbs sampler and data augmentation. Table 2.10 summarizes the three blocks of the sampler.
Because of the prior independence of the $\theta_i$'s, the griddy-Gibbs sampler is applied separately $G$ times, once for each group.
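The griddy-Gibbs step evaluates a univariate conditional kernel on a grid, integrates it numerically, and draws by inverting the resulting approximate CDF. A minimal sketch, assuming a 101-point grid, trapezoidal integration and linear interpolation within grid cells (all illustrative choices). In the GARCH application the kernel for one parameter of $\theta_k$ would be the prior times the product of GARCH likelihoods of the series allocated to group $k$; here a simple Gaussian-shaped kernel stands in.

```python
import bisect
import math
import random

def griddy_gibbs_draw(log_kernel, lo, hi, rng, n_grid=101):
    """One approximate draw from the density proportional to exp(log_kernel(x)) on [lo, hi]."""
    xs = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]
    logs = [log_kernel(x) for x in xs]
    top = max(logs)  # assumes the kernel is finite at one grid point at least
    ks = [math.exp(v - top) for v in logs]  # exp(-inf) == 0.0 outside the support
    # Cumulative integral of the kernel by the trapezoidal rule.
    cdf = [0.0]
    for k in range(1, n_grid):
        cdf.append(cdf[-1] + 0.5 * (ks[k] + ks[k - 1]) * (xs[k] - xs[k - 1]))
    # Inverse-CDF step: locate u in the cumulative integral, interpolate linearly.
    u = rng.random() * cdf[-1]
    k = max(1, bisect.bisect_left(cdf, u))
    width = cdf[k] - cdf[k - 1]
    frac = (u - cdf[k - 1]) / width if width > 0 else 0.0
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

rng = random.Random(0)
draws = [griddy_gibbs_draw(lambda a: -0.5 * ((a - 0.5) / 0.1) ** 2, 0.0, 1.0, rng)
         for _ in range(2000)]
print(sum(draws) / len(draws))
```

Within one sweep, each parameter of each $\theta_k$ is updated in turn with this step, conditioning on the current values of the others.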
True value | |||
Mean | |||
Standard deviation | |||
Correlation matrix | |||
True value | ||||
Prior interval | , | , | , | |
, | , | , | ||
Mean | ||||
Standard deviation | ||||
Correlation |
As an illustration we show the posterior marginals of the following model
(2.43) | |
(2.44) | |
(2.45) |
[4] successfully apply this model to the return series of 131 US stocks. By comparing the marginal likelihoods of models with different numbers of groups, they determine the appropriate number $G$ of component distributions.
Other interesting examples of finite mixture modelling exist in the literature. Frühwirth-Schnatter and Kaufmann (2002) develop a regime switching panel data model. Their purpose is to cluster many short time series to capture asymmetric effects of monetary policy on bank lending. [18] develop a finite mixture negative binomial count model to estimate six measures of medical care demand by the elderly. [11] offer a flexible Bayesian analysis of the problem of causal inference in models with non-randomly assigned treatments. Their approach is illustrated using hospice data and hip fracture data.