This section deals with models in which the dependent variable is discrete. Many interesting problems, such as labour force participation, presidential voting, transport mode choice and brand choice, are discrete in nature. In particular, we consider discrete choice models for the case where panel data are available. Panel data make it possible, for example, to follow individuals and their choices over time, so that richer behavioural models can be constructed. Although the number of parameters in these models does not necessarily increase, the likelihood function, and therefore estimation, becomes more complex. In this section we describe the multinomial multiperiod probit, the multivariate probit and the mixed multinomial logit model. Examples are given.
We refer to [37] for a general introduction to limited dependent and qualitative variables in econometrics and to [22] for a basic introduction motivating such models in relation to marketing.
Denote by the unobserved utility perceived by individual who chooses alternative at time . This utility may be modelled as follows
(2.2) |
(2.3) |
(2.6) |
As an identification restriction, one usually imposes a unit variance for the last alternative expressed in utility differences. Define
(2.7) |
This section briefly explains how the multinomial multiperiod probit model can be estimated in the classical or Bayesian framework. More details can be found in [25].
Since we assume independent observations on individuals, the likelihood is
Alternative estimation methods are based on simulations of the choice probabilities. The simulated maximum likelihood (SML) method maximizes the simulated likelihood which is obtained by substituting the simulated choice probabilities in (2.9). The method of simulated moments is a simulation based substitute for the generalized method of moments. For further information on these estimation methods we refer to [27].
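To make the idea of simulated choice probabilities concrete, the following is a minimal sketch of a crude frequency simulator for probit choice probabilities. It is an illustration only, not the simulator used in the references: the function name, the vector `v` of deterministic utilities and the error covariance `cov` are assumptions introduced here. Applied work typically relies on smooth simulators such as GHK, because the frequency simulator is a step function of the parameters.

```python
import numpy as np

def simulate_probit_choice_probs(v, cov, n_draws=10_000, seed=0):
    """Frequency simulator: share of draws in which each alternative
    has the highest utility v_j + eps_j, with eps ~ N(0, cov).
    `v` and `cov` are hypothetical inputs chosen for illustration."""
    rng = np.random.default_rng(seed)
    v = np.asarray(v, dtype=float)
    # Draw the random utility shocks for all alternatives at once.
    eps = rng.multivariate_normal(np.zeros(len(v)), cov, size=n_draws)
    # The chosen alternative in each draw is the utility maximiser.
    choices = np.argmax(v + eps, axis=1)
    # Relative frequencies approximate the choice probabilities.
    return np.bincount(choices, minlength=len(v)) / n_draws

if __name__ == "__main__":
    probs = simulate_probit_choice_probs([0.5, 0.0, -0.5], np.eye(3))
    print(probs)
```

A simulated likelihood is then obtained by plugging such simulated probabilities of the observed choices into the likelihood in place of the exact ones.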
It is possible to extend the model in (2.5) in various ways, such as alternative specific 's, individual heterogeneity or a dynamic specification.
[41] propose a dynamic specification
The model parameters are and and are augmented by the latent utilities . Bayesian inference may be done by Gibbs sampling as described in the estimation part above. Table 2.1 describes for each of the nine blocks which posterior distribution is used. For example, has a conditional (on all other parameters) posterior density that is normal.
Parameter | Conditional posterior |
Multivariate normal distributions | |
Inverted Wishart distributions | |
Matrix normal distribution | |
Truncated multivariate normal |
As an illustration we reproduce the results of [41], who provided their Gauss code (which we slightly modified). They use optical scanner data on purchases of four brands of saltine crackers. [13] use the same data set to estimate a static multinomial probit model. The data set contains all purchases (choices) of crackers of households over a period of two years, yielding observations. Variables such as prices of the brands and whether there was a display and/or newspaper feature of the considered brands at the time of purchase are also observed and used as the explanatory variables forming (and then transformed into ). Table 2.2 gives the means of these variables. Display and Feature are dummy variables, e.g. Sunshine was displayed and was featured of the purchase occasions. The average market shares reflect the observed individual choices, with e.g. of the choices on Sunshine.
Sunshine | Keebler | Nabisco | Private Label | |
Market share | ||||
Display | ||||
Feature | ||||
Price |
Table 2.3 shows posterior means and standard deviations for the and parameters. They are computed from draws after dropping initial draws. The prior on is inverted Wishart, denoted by , with and chosen such that . Note that [41] use a prior such that . For the other parameters we put uninformative priors. As expected, Display and Feature have positive effects on the choice probabilities and price has a negative effect. This holds both in the short run and the long run. With respect to the private label (which serves as reference category), the posterior means of the intercepts are positive except for the first label whose intercept is imprecisely estimated.
parameter | parameter | Intercepts | |||||
mean | st. dev. | mean | st. dev. | mean | st. dev. | ||
Display | () | () | Sunshine | () | |||
Feature | () | () | Keebler | () | |||
Price | () | () | Nabisco | () |
Table 2.4 gives the posterior means and standard deviations of , , and . Note that the reported last element of is equal to in order to identify the model. This is done, after running the Gibbs sampler with unrestricted, by dividing the variance related parameter draws by . The other parameter draws are divided by the square root of the same quantity. [39] propose an alternative approach where is fixed to by construction, i.e. a fully identified parameter approach. They write
The relatively large posterior means of the diagonal elements of show that there is persistence in brand choice. The matrices and measure the unobserved heterogeneity. There seems to be substantial heterogeneity across the individuals, especially for the price of the products (see the third diagonal elements of both matrices). The last three elements in are related to the intercepts.
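The post-hoc identification step described for Table 2.4 (dividing the variance-related draws by the last diagonal element of the unrestricted covariance draw, and the other parameter draws by its square root) can be sketched as follows. The array names and shapes are assumptions made for illustration; this is not the Gauss code of [41].

```python
import numpy as np

def normalize_draws(sigma_draws, coef_draws):
    """Rescale Gibbs draws so that the last diagonal element of each
    covariance draw equals one.

    sigma_draws: (n_draws, J, J) unrestricted covariance draws (hypothetical)
    coef_draws:  (n_draws, K) draws of the other parameters (hypothetical)
    """
    # Last diagonal element of each covariance draw.
    scale = sigma_draws[:, -1, -1]
    # Variance-related draws are divided by that element ...
    sigma_norm = sigma_draws / scale[:, None, None]
    # ... and the remaining parameters by its square root.
    coef_norm = coef_draws / np.sqrt(scale)[:, None]
    return sigma_norm, coef_norm
```

Posterior means and standard deviations are then computed from the rescaled draws, draw by draw, rather than by rescaling the posterior means themselves.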
The multinomial probit model is frequently used for marketing purposes. For example, [1] use ketchup purchase data to emphasize the importance of a detailed understanding of the distribution of consumer heterogeneity and identification of preferences at the customer level. In fact, the disaggregate nature of many marketing decisions creates the need for models of consumer heterogeneity which pool data across individuals while allowing for the analysis of individual model parameters. The Bayesian approach is particularly suited to this, in contrast to classical approaches, which yield only aggregate summaries of heterogeneity.
The multivariate probit model relaxes the assumption, made in the multinomial model discussed before, that choices are mutually exclusive. In that case, may contain several 's. [10] discuss classical and Bayesian inference for this model. They also provide examples on voting behavior, on health effects of air pollution and on labour force participation.
The multinomial logit model is defined as in (2.1), except that the random shock is extreme value (or Gumbel) distributed. This gives rise to the independence from irrelevant alternatives (IIA) property which essentially means that . Like the probit model, the mixed multinomial logit (MMNL) model alleviates this restrictive IIA property by treating the parameter as a random vector with density . The latter density is called the mixing density and is usually assumed to be a normal, lognormal, triangular or uniform distribution. To make clear why this model does not suffer from the IIA property, consider the following example. Suppose that there is only one explanatory variable and that . We can then write (2.1) as
(2.14) | ||
The mixed logit probability is given by
Estimation of the MMNL model can be done by SML or the method of simulated moments or simulated scores. To do this, the logit probability in (2.15) is replaced by its simulated counterpart
According to [27] the SML estimator is asymptotically equivalent to the ML estimator if (the total number of observations) and both tend to infinity and . In practice, it is sufficient to fix at a moderate value.
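As a rough illustration of the simulated counterpart of the mixed logit probability, the sketch below averages ordinary logit probabilities over draws of the random coefficient vector. It assumes independent normal mixing distributions; the function names, the attribute matrix `x` and the arguments `mean` and `std` are hypothetical and not taken from the references.

```python
import numpy as np

def logit_probs(x, beta):
    """Multinomial logit probabilities for one choice situation.
    x: (J, K) attributes of the J alternatives, beta: (K,) coefficients."""
    v = x @ beta
    v = v - v.max()            # guard against overflow in exp
    e = np.exp(v)
    return e / e.sum()

def simulated_mixed_logit_probs(x, mean, std, n_draws=200, seed=0):
    """Average of logit probabilities over pseudo-random draws of beta,
    assuming independent normal mixing distributions (an assumption)."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    # One row per draw of the random coefficient vector.
    betas = mean + std * rng.standard_normal((n_draws, len(mean)))
    return np.mean([logit_probs(x, b) for b in betas], axis=0)

if __name__ == "__main__":
    x = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    print(simulated_mixed_logit_probs(x, [0.3, -0.2], [1.0, 2.0]))
```

Because each term in the average is a smooth function of the mixing parameters, the simulated probability is smooth as well, which is convenient for numerical maximization of the simulated likelihood.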
The approximation of an integral like in (2.15) by the use of pseudo-random numbers may be questioned. [6] implements an alternative SML method which uses quasi-random numbers instead. Like pseudo-random sequences, quasi-random sequences, such as Halton sequences, are deterministic, but they are more uniformly distributed in the domain of integration than pseudo-random ones. The numerical experiments indicate that the quasi-random method provides considerably better accuracy with far fewer draws and less computational time than the usual random method.
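For readers unfamiliar with Halton sequences, here is a minimal sketch of how one is generated with the radical-inverse construction and mapped to standard normal draws. The function names are illustrative assumptions and this is not the implementation of [6].

```python
from statistics import NormalDist

def halton(index, base):
    """Radical inverse of a positive integer `index` in a prime `base`:
    the digits of `index` in that base are mirrored around the point."""
    result, f = 0.0, 1.0 / base
    i = index
    while i > 0:
        result += f * (i % base)
        i //= base
        f /= base
    return result

def halton_sequence(n, base):
    """First n Halton points in (0, 1), starting at index 1."""
    return [halton(i, base) for i in range(1, n + 1)]

def halton_normal_draws(n, base):
    """Map Halton points to N(0, 1) draws through the inverse normal cdf."""
    inv = NormalDist().inv_cdf
    return [inv(u) for u in halton_sequence(n, base)]

if __name__ == "__main__":
    print(halton_sequence(3, 2))   # [0.5, 0.25, 0.75]
```

For a multidimensional integral, one would typically use a different prime base for each dimension, so that the coordinates of the points are not aligned.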
Let us suppose that the mixing distribution is Gaussian, that is, the vector is normally distributed with mean and variance matrix . The posterior density for individuals can be written as
For the first two blocks the conditional posterior densities are known and are easy to sample from. The last block is more difficult. To sample from this density, a Metropolis-Hastings (MH) algorithm is set up. Note that only one MH iteration per Gibbs step is necessary, so that simulation within the Gibbs sampler is avoided. See [50], Chap. 12, for a detailed description of the MH algorithm for the mixed logit model and for guidelines about how to deal with other mixing densities. More general information on the MH algorithm can be found in Chap. II.3.
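A single MH update for the coefficient vector of one individual might look like the sketch below. It assumes a random-walk proposal, a Gaussian mixing distribution and a logit kernel for the observed choices; all names (`mh_step`, `x`, `y`, `step`, and so on) are hypothetical, and the details in [50] may differ.

```python
import numpy as np

def log_logit_lik(beta, x, y):
    """Log-likelihood of one individual's observed choices.
    x: (T, J, K) attributes, y: (T,) indices of the chosen alternatives."""
    v = x @ beta                                   # (T, J) utilities
    v = v - v.max(axis=1, keepdims=True)           # numerical stability
    log_p = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
    return log_p[np.arange(len(y)), y].sum()

def log_normal_kernel(beta, mean, cov_inv):
    """Log of the Gaussian mixing density, up to an additive constant."""
    d = beta - mean
    return -0.5 * d @ cov_inv @ d

def mh_step(beta, x, y, mean, cov_inv, step, rng):
    """One random-walk MH iteration for an individual coefficient vector."""
    proposal = beta + step * rng.standard_normal(beta.shape)
    log_ratio = (log_logit_lik(proposal, x, y)
                 + log_normal_kernel(proposal, mean, cov_inv)
                 - log_logit_lik(beta, x, y)
                 - log_normal_kernel(beta, mean, cov_inv))
    # Accept with probability min(1, exp(log_ratio)).
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True
    return beta, False
```

Within the Gibbs sampler, such a step would be applied once per individual in every sweep, with `mean` and `cov_inv` set to the current draws of the hyper-parameters. The scale `step` is a tuning choice that affects the acceptance rate.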
Bayesian inference in the mixed logit model is called hierarchical Bayes because of the hierarchy of parameters. At the first level, there are the individual parameters which are distributed with mean and variance matrix . The latter are called hyper-parameters, on which we also have prior densities. They form the second level of the hierarchy.
We reproduce the results of [40] using their Gauss code available on the web site elsa.berkeley.edu/train/software.html. They analyse the demand for alternative vehicles. There are respondents who choose among six alternatives (two alternatives run on electricity only). There are explanatory variables among which are considered to have a random effect. The mixing distributions for these random coefficients are independent normal distributions. The model is estimated by SML and uses replications per observation. Table 2.6 reports part of the estimation results of the MMNL model. We report the estimates and standard errors of the parameters of the normal mixing distributions, but we do not report the estimates of the fixed effect parameters corresponding to the other explanatory variables. For example, the luggage space error component induces greater covariance in the stochastic part of utility for pairs of vehicles with greater luggage space. We refer to [40] or [8] for more interpretations of the results.
[50] provides more information and pedagogical examples on the mixed multinomial model.
Variable | Mean | Standard deviation | ||
Electric vehicle (EV) dummy | () | () | ||
Compressed natural gas (CNG) dummy | () | () | ||
Size | () | () | ||
Luggage space | () | () |