12.1 Limited-Dependent and Qualitative Dependent Variables

The title of this section is taken from Maddala's (1983) well-known book of the same name, which is still a very good reference for parametric models of this kind. XploRe's metrics library covers some of these parametric models. Its comparative strength, however, lies in the more recently developed semiparametric models for data with limited-dependent and qualitative dependent variables. We will first discuss the parametric models and then turn to their semiparametric competitors.


12.1.1 Probit, Logit and Tobit


{b, s, cv} = tobit(x, y)
two-step estimation of the Tobit model

Probit, Logit and Tobit are three of the most widely used parametric models for analyzing data with limited-dependent or qualitative dependent variables. Probit and Logit can be viewed as special cases of the generalized linear model (GLM), and this is the perspective taken in XploRe. Hence, we ask you to consult GLM (7) for a description of XploRe's Probit and Logit quantlets (standard Logit, conditional Logit, multinomial Logit).

The Tobit model is a parametric censored regression model. Formally, the model is given by

$\displaystyle Y= \left\{\begin{array}{ll} x^T \beta + \varepsilon &\qquad \textrm{if } x^T \beta + \varepsilon \geq c,\\ 0 & \qquad \textrm{otherwise,}\\ \end{array}\right.$ (12.1)

where $ \varepsilon$ is an unobservable error term, assumed to be normally distributed with mean zero and variance $ \sigma^2,$ and $ c$ is a known constant. In Tobin's pioneering study, $ Y$ is a consumer's expenditure on a durable good and $ Y^* = x^T \beta + \varepsilon$ is the consumer's willingness to pay for the good. $ Y$ is equal to $ Y^*$ only if the willingness to pay exceeds $ c,$ the minimum amount necessary to purchase the good. Otherwise, $ Y$ is equal to zero.

It is well known that an OLS regression of the nonzero $ Y$s on the explanatory variables $ x$ will not produce consistent estimates of the regression coefficients in this situation. The model implies the following conditional mean function:

$\displaystyle E(Y\vert x,Y^*>0) = x^T \beta +\sigma \frac{\varphi(x^T \beta/\sigma)}{\Phi(x^T \beta/\sigma)},$ (12.2)

where $ \varphi(\bullet)$ and $ \Phi(\bullet)$ denote the probability density function (pdf) and the cumulative distribution function (cdf) of the standard normal distribution, respectively.

XploRe offers a two-step estimation of $ \beta$: In the first step, a pilot estimate of $ \gamma = \beta/\sigma$ is obtained by fitting the Probit model $ P(Y^*>0\,\vert\,x)=\Phi(x^T \gamma)$ to the censoring indicator $ {\boldsymbol{I}}(Y>0)$ (here, $ {\boldsymbol{I}}(\bullet)$ denotes the indicator function). Using this pilot estimate $ \widehat{\gamma}$, we can compute $ \varphi(x^T\widehat{\gamma})$ and $ \Phi(x^T \widehat{\gamma})$ and use their ratio (the inverse Mills ratio) as an additional regressor when we estimate (12.2) on the part of the sample where $ Y>0.$

The tobit quantlet takes the observed x and y as inputs and returns the vector of estimated coefficients b, the estimated standard deviation of the error term s, and the estimated covariance matrix cv of b and s:

  {b, s, cv} = tobit(x, y)

The dependent variable must be equal to 0 for the censored observations. The known constant $ c$ in (12.1) is subsumed in the constant term of $ \beta.$ That is, the estimated constant term is an estimate of the original constant term minus $ c.$
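To make the two-step procedure concrete, here is a minimal Python sketch of the computation described above (not XploRe code); it assumes numpy, scipy and statsmodels, and it returns only b and s, whereas tobit also delivers the joint covariance matrix cv.

  # Minimal sketch of the two-step Tobit estimator described above.
  import numpy as np
  from scipy.stats import norm
  import statsmodels.api as sm

  def tobit_two_step(x, y):
      # x must contain a constant column; y = 0 marks censored observations.
      # Step 1: Probit of the censoring indicator I(Y > 0) on x gives
      # a pilot estimate of gamma = beta / sigma.
      gamma = sm.Probit((y > 0).astype(float), x).fit(disp=0).params
      v = x @ gamma
      pos = y > 0
      # Step 2: by (12.2), E(Y | x, Y* > 0) = x'beta + sigma * phi/Phi,
      # so regress the uncensored Y on x and the inverse Mills ratio.
      mills = norm.pdf(v[pos]) / norm.cdf(v[pos])
      regressors = np.column_stack([x[pos], mills])
      coef = np.linalg.lstsq(regressors, y[pos], rcond=None)[0]
      return coef[:-1], coef[-1]   # b = estimated beta, s = estimated sigma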

In the following example, simulated data are used to illustrate the use of tobit:

  library("metrics")
  randomize(241200)
  n      =  500
  k      =      2
  x      =      matrix(n)~aseq(1, n ,0.25)
  s      =      8
  u      =      s*normal(n)
  b      =      #(-9, 1)
  ystar  =      x*b+u
  y      =      ystar.*(ystar.>=0)
  tstep  =      tobit(x,y)
  dg     =      matrix(rows(tstep.cv),1)
  dig    =      diag(dg)
  stm    =      dig.*tstep.cv
  std    =      sqrt(sum(stm,2))
  coef   =      tstep.b|tstep.s
  coef~(coef./std)      ; t-ratios
XLGmetric01.xpl

We have generated the data in accordance with the assumptions of the Tobit model, choosing the values -9 and 1 for the components of $ \beta$ and 8 for $ \sigma$. The results in the XploRe output window show the estimated coefficients (first column) and their $ t$-ratios (second column):
  Contents of _tmp
  [1,]  -9.7023  -11.201 
  [2,]   1.0092    92.13 
  [3,]   6.8732   4.5859


12.1.2 Single Index Models

XploRe offers several quantlets to estimate semiparametric regression models in which the conditional mean of $ Y$ is assumed to depend on the explanatory variables $ x$ only via a function of the single (linear) index $ x^T\beta$:

$\displaystyle E(Y\vert x)=g(x^{T}\beta),$ (12.3)

where $ g(\bullet )$ is an unknown function. The Probit and Logit models of the previous section are special cases of such single index models (SIMs) in which $ g(\bullet )$ is assumed to be a known function (the cdf of the standard normal and the logistic distribution, respectively). XploRe offers several alternative procedures to estimate $ \beta$ in (12.3) that require only that $ g(\bullet )$ be a smooth (but otherwise unspecified) function. All procedures are noniterative, easy to compute, and have very desirable large-sample properties.


12.1.3 Average Derivatives

Define the vector of average derivatives of $ Y$ with respect to $ x$ as

$\displaystyle \delta=E\left\{\frac{\partial E(Y\vert x)}{\partial x}\right\},$ (12.4)

i.e., $ \delta$ is the vector of partial derivatives of the regression function $ E(Y\vert x),$ averaged over the support of $ x.$ In a SIM, applying the chain rule to (12.3) shows that

$\displaystyle \delta=E\left\{ g'(x^T\beta)\right\}\;\beta,$ (12.5)

where $ g'$ is the derivative of $ g(\bullet )$. From (12.5) we see that in a SIM, $ \delta$ is proportional to the vector of regression coefficients $ \beta.$ This implies that if we find a way to estimate $ \delta$, then we can estimate $ \beta$ up to scale. Estimating $ \beta$ up to scale is the best we can do under the assumptions of the SIM, even with an infinite number of observations: the scale of $ \beta$ is not identified in a SIM of the form (12.3), because any rescaling of $ \beta$ can be absorbed into the unknown function $ g(\bullet )$. Hence we can focus on estimating $ \delta.$ This is the approach followed by the XploRe quantlets adeind, dwade and adeslp, which return estimates of the average derivative $ \delta.$ Asymptotically, these estimators are equivalent; see Stoker (1991).


Here is an overview of the XploRe commands for estimating the vector of average derivatives $ \delta$ (and thereby $ \beta$ up to scale in a SIM):


{delta, dvar} = adeind(x, y, d, h)
estimates the average derivative of $ Y$ with respect to $ x,$ which is proportional to $ \beta$ in SIMs
delta = dwade(x, y, h)
an alternative to adeind that estimates the density-weighted average derivative
{delta, dvar} = adeslp(x, y, d, m)
another alternative to adeind that estimates the average derivative by an instrumental variables regression
{delta, alpha, lim, hd} = adedis(z, x, y, h, hfac, c0, c1)
estimates the coefficients of the discrete components of $ x$, in addition to dwade estimation of the coefficients of the continuous components

We will cover all of these quantlets except adeslp, which is discussed in Stoker (1991).


12.1.4 Average Derivative Estimation

Using integration by parts, it can be shown that

$\displaystyle \delta= E \{s(x) E(Y\vert x)\} = -E\left\{ \frac{1}{f(x)}\frac{\partial f(x)}{\partial x}\, E(Y\vert x)\right\},$ (12.6)

where $ f(x)$ is the density of $ x$ and $ s(x)=-\partial \log f(x)/\partial x$ is the negative log-density derivative (score) vector. Equation (12.6) says that $ \delta$ is equal to the expectation of the product of $ s(x)$ and the conditional expectation function of $ Y.$ Hence, we can estimate $ \delta$ using the sample analogs of the quantities on the right-hand side of (12.6):

$\displaystyle \widehat{\delta} =\frac{1}{N}\sum_{i=1}^{N}\widehat{s}_{h}(x_{i})\, Y_{i}\, {\boldsymbol{I}}(\widehat{f}_{h}(x_{i})>b_{N}) = -\frac{1}{N}\sum_{i=1}^{N}\frac{1}{\widehat{f}_{h}(x_{i})} \frac{\partial \widehat{f}_{h}(x_i)}{\partial x}\, Y_{i}\, {\boldsymbol{I}}(\widehat{f}_{h}(x_{i})>b_{N}),$ (12.7)

where $ \widehat{f}_{h}(x)$ denotes the kernel density estimator with bandwidth $ h,$ $ \widehat f'_{h}(x)=\partial \widehat{f}_{h}(x)/\partial x$ the vector of kernel estimators of the partial derivatives of $ f(x),$ and $ {\boldsymbol{I}}(\widehat{f}_{h}(x_{i})>b_{N})$ is an indicator that trims out observations at which the estimated density is very small.

Equation (12.7) defines the average derivative estimator (ADE) of Härdle and Stoker (1989), which is computed by the XploRe quantlet adeind:

  {delta, dvar} = adeind(x, y, d, h)
adeind takes the data (x and y) as well as the binwidth d and the bandwidth h as inputs.

The bandwidth appears in (12.7) but the binwidth d does not. This is because binning the data is merely a computational device for speeding up the computation of the estimator. A larger binwidth speeds up the computation but implies a loss of information (due to using binned rather than actual data) that may become prohibitively large.

You may wonder what happened to the trimming bounds $ b_N$ in (12.7). Trimming is necessary for working out the theoretical properties of the estimator (it controls the random denominator of $ \widehat{s}_h(x)$). In adeind, the trimming bounds are implicitly set such that 5% of the data are always trimmed off at the boundaries (this is done by calling trimper within adeind; trimper is an auxiliary quantlet that we do not cover in more detail).
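To see the mechanics of (12.7), here is an illustrative Python sketch of the estimator under simplifying assumptions (a Gaussian product kernel, and quantile-based density trimming as a stand-in for trimper); adeind additionally bins the data and returns the covariance matrix dvar.

  # Illustrative sketch of the ADE in (12.7); not XploRe code.
  import numpy as np

  def ade(x, y, h, trim=0.05):
      n, k = x.shape
      u = (x[:, None, :] - x[None, :, :]) / h          # (x_i - x_j)/h
      kern = (2*np.pi)**(-k/2) * np.exp(-0.5 * (u**2).sum(axis=2))
      fhat = kern.sum(axis=1) / (n * h**k)             # density at each x_i
      # gradient of the density estimate; K'(u) = -u K(u) for the
      # Gaussian kernel
      fgrad = -np.einsum('ijk,ij->ik', u, kern) / (n * h**(k+1))
      keep = fhat > np.quantile(fhat, trim)            # trimming indicator
      shat = -fgrad[keep] / fhat[keep, None]           # estimated score s(x_i)
      return (shat * y[keep, None]).sum(axis=0) / n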

In the following example, simulated data are used to illustrate the use of adeind:

  library("metrics")
  randomize(333)
  n      =  500
  x      =  normal(n,3)
  beta   =  #(0.2 , -0.7 , 1)
  index  =  x*beta
  eps    =  normal(n,1) * sqrt(0.5)
  y      =  2 * index^3 + eps
  d      =  0.2
  m      =  5
  {delta,dvar} = adeind(x,y,d,m)
  (delta/delta[1])~(beta/beta[1])
XLGmetric02.xpl

We have generated the data such that $ g(x^T\beta)=2 (x^T\beta)^3$. Recall that the estimate of $ \delta$ is an estimate of $ \beta$ up to scale. Since the scale of $ \beta$ is not identified, you may normalize the estimated coefficients by dividing each of them by the first coefficient, and then interpret the effects of the explanatory variables relative to the effect of the first explanatory variable (which is normalized to 1). The estimated and true coefficient vectors, both normalized, are shown in the output window:
  Contents of _tmp
  [1,]        1        1 
  [2,]  -4.2739     -3.5 
  [3,]   5.8963        5
Note that this example does not work in the Academic Edition of XploRe.


12.1.5 Weighted Average Derivative Estimation

The need for trimming in ADE is a consequence of its random denominator. This and other difficulties associated with a random denominator are overcome by the density-weighted average derivative estimator (DWADE) of Powell, Stock and Stoker (1989). It is based on the density-weighted average derivative of $ Y$ with respect to $ x$:

$\displaystyle \widetilde{\delta} = E \left \{\frac{\partial g(x^T\beta)}{\partial x}\, f(x) \right \} = E \{ g'(x^T\beta) f(x)\}\, \beta.$ (12.8)

Obviously, $ \widetilde{\delta}$ shares with the (unweighted) average derivative the property of being proportional to the coefficient vector $ \beta$ in single index models. The estimation strategy is therefore basically the same: find a way to estimate $ \widetilde{\delta}$ and you have an estimator of $ \beta$ up to a scaling factor.

Again using integration by parts, it can be shown that

$\displaystyle \widetilde{\delta} = -2 E \{E(Y\vert x) f'(x)\}.$ (12.9)

Thus we may estimate $ \widetilde{\delta}$ (and thereby $ \beta$) by

$\displaystyle \widehat{\widetilde{\delta}}= -\frac{2}{N} \sum_{i=1}^N Y_{i} \widehat f'_{h} (x_{i}),$ (12.10)

where $ \widehat f'_{h}(x_{i})$ again denotes the vector of kernel estimators of the partial derivatives of $ f(x).$ The DWADE estimator defined in (12.10) shares the desirable distributional features of the ADE estimator ($ \sqrt{N}$-consistency, asymptotic normality). It is computed in XploRe by the dwade quantlet:
  d = dwade(x, y, h)
dwade needs the data (x and y) and the bandwidth h as inputs and returns the vector of estimated density-weighted average derivatives.
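A minimal Python sketch of (12.10), again assuming a Gaussian product kernel, shows how simple the estimator is once the density gradient has been estimated; note the absence of trimming and of any random denominator.

  # Illustrative sketch of the DWADE in (12.10); not XploRe code.
  import numpy as np

  def dwade_sketch(x, y, h):
      n, k = x.shape
      u = (x[:, None, :] - x[None, :, :]) / h
      kern = (2*np.pi)**(-k/2) * np.exp(-0.5 * (u**2).sum(axis=2))
      # gradient of the kernel density estimate at each observation x_i
      fgrad = -np.einsum('ijk,ij->ik', u, kern) / (n * h**(k+1))
      return -2.0 * (y[:, None] * fgrad).mean(axis=0)   # (12.10)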

We illustrate dwade with simulated data:

  library("metrics")
  randomize(333)
  n      =  500
  x      =  normal(n,3)
  beta   =  #(0.2 , -0.7 , 1)
  index  =  x*beta
  eps    =  normal(n,1) * sqrt(0.5)
  y      =  2 * index^3 + eps
  h      =  0.3
  d      =  dwade(x,y,h)
  (d/d[1])~(beta/beta[1])
XLGmetric03.xpl

Note that we have used the same data-generating process as in the previous section and that we again show the estimated and true coefficient vectors, both normalized by dividing through by their first element:
  Contents of _tmp
  [1,]        1        1 
  [2,]  -3.4727     -3.5 
  [3,]   4.9435        5


12.1.6 Average Derivatives and Discrete Variables

By definition, derivatives can only be calculated for continuous variables. Thus, adeind and dwade will not produce estimates of those components of $ \beta$ that belong to discrete explanatory variables.

Most discrete explanatory variables are 0/1 dummy variables. How do they enter into SIMs? To give an answer, let us assume that $ x$ consists of several continuous variables and a single dummy variable, and let us split $ x$ into $ x_1$ (continuous components) and $ x_2$ (dummy), and $ \beta$ accordingly into $ \beta_1$ and $ \beta_2.$ Then we have

\begin{displaymath}\begin{array}{rcll}
E(Y\vert x_1,x_2) &=& g(x_1^T\beta_1), &\textrm{if } x_2=0,\\
E(Y\vert x_1,x_2) &=& g(x_1^T\beta_1 +\beta_2), &\textrm{if } x_2=1.
\end{array}\end{displaymath}

Graphically, changing $ x_2$ from 0 to 1 means shifting $ g(\bullet )$ (as a function of $ x_1^T\beta_1$) horizontally by $ \beta_2.$ This is depicted in Figure 12.1, where $ \beta _2$ is labeled $ b_2$ for technical reasons.

Figure 12.1: Shifting $ g(\bullet )$ by $ \beta _2$.
\includegraphics[scale=0.425]{shiftit}

This is the basic idea underlying the estimator of $ \beta _2$ proposed by Horowitz and Härdle (1996): given an estimate of $ \beta_1$ (which can be obtained by using dwade, for instance), we can estimate both curves in Figure 12.1 by running separate kernel regressions of $ Y$ on $ x_1^T\widehat{\beta}_1$, one for the data points with $ x_2=0$ (to get an estimate of $ g(x_1^T\beta_1)$) and one for those with $ x_2=1$ (to get an estimate of $ g(x_1^T\beta_1+\beta_2)$). We can then compute the horizontal difference between the two estimated curves to get an estimate of $ \beta_2.$ This procedure is implemented in the XploRe quantlet adedis:

  {d, a, lim, h} = adedis(x2, x1, y, hd, hfac, c0, c1)
It takes as inputs the data, consisting of x2 (discrete explanatory variables), x1 (continuous explanatory variables) and y, along with several parameters (hd, hfac, c0 and c1) that are needed in the three steps of the adedis estimation procedure.

Whereas you have to specify $ c_0$ and $ c_1,$ the constants $ v_0$ and $ v_1$ are implicitly set to the minimum and maximum of $ x^T\widehat{\beta}_1$, plus or minus the bandwidth used in the kernel estimation of $ g(\bullet).$ The values of $ v_0$ and $ v_1$ are returned by adedis in the vector lim, along with the bandwidth $ h$, calculated according to (12.11). The most important outputs are, of course, the estimates of $ \beta_1$ and $ \beta _2$, which are stored in d and a, respectively.
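To convey the geometry of the horizontal-differencing idea, here is a stylized Python sketch under strong simplifying assumptions (the index is taken as given, and the two curves are clipped to $ [c_0, c_1]$ and compared on a grid). It is not a reimplementation of adedis, whose exact truncation and bandwidth rules follow Horowitz and Härdle (1996).

  # Stylized sketch of estimating the shift beta_2; not XploRe code.
  import numpy as np

  def shift_estimate(v, y, x2, c0, c1, h, gridsize=100):
      # v: estimated index x1'beta1; x2: 0/1 dummy variable
      def kreg(v0, y0, grid):            # Nadaraya-Watson regression
          kern = np.exp(-0.5 * ((grid[:, None] - v0[None, :]) / h)**2)
          return kern @ y0 / kern.sum(axis=1)
      grid = np.linspace(v.min(), v.max(), gridsize)
      g0 = np.clip(kreg(v[x2 == 0], y[x2 == 0], grid), c0, c1)
      g1 = np.clip(kreg(v[x2 == 1], y[x2 == 1], grid), c0, c1)
      # the area between the clipped curves, divided by their height
      # c1 - c0, recovers the horizontal shift beta_2
      return np.trapz(g1 - g0, grid) / (c1 - c0)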


12.1.7 Parametric versus Semiparametric Single Index Models


{t, p} = hhtest(vhat, y, yhat, h {,c {,m}})
tests a parametric against a semiparametric SIM

In the previous sections, we have seen that even the noniterative estimators of SIMs implemented in XploRe require quite a bit of care and computational effort. More importantly, semiparametric estimators are less efficient than parametric estimators if the assumptions underlying the latter are satisfied. It is therefore desirable to know whether the distributional flexibility of the semiparametric models justifies the loss in efficiency and the extra computational cost. That is, we would like to statistically test semiparametric SIMs against easily estimable parametric SIMs.

Horowitz and Härdle (1994) have developed a suitable test procedure, which is implemented in the hhtest quantlet. Formally, the HH-test considers the following hypotheses:

$\displaystyle H_0 : \quad E(Y\vert x) = f(x^T\beta),$
$\displaystyle H_1 : \quad E(Y\vert x) = g(x^T\beta),$

where $ f(\bullet)$ is a known function (such as the cdf of the standard normal distribution) and $ g(\bullet )$ is an unknown, smooth function. Hence, the parametric model constitutes the null hypothesis, whereas the semiparametric model is the alternative.

Here is the main idea underlying the HH-test: if the model under the null is true (and given an estimate of $ \beta$), then a nonparametric regression of $ Y$ on $ v = x^T\widehat\beta$ will give a consistent estimate of the parametric function $ f(\bullet)$. If, however, the parametric model is wrong, then the nonparametric regression of $ Y$ on $ v$ will deviate systematically from $ f(\bullet).$

This insight is reflected in the HH-test statistic:

$\displaystyle T = \sqrt {h} \sum\limits_{i=1}^N w(x_i^T\widehat\beta)\, \lbrace Y_i - f(x_i^T\widehat\beta)\rbrace\, \lbrace\widetilde f_{i}(x_i^T\widehat\beta) - f(x_i^T\widehat\beta)\rbrace,$ (12.12)

where $ h$ is the bandwidth used in the nonparametric (kernel) regression and $ w(\bullet)$ is a weight function that downweighs extreme observations. In practice, $ w(\bullet)$ is defined to be identically equal to one for 90% or 95% of the central values of $ x_i^T\widehat\beta$ and zero otherwise. The term $ \lbrace\widetilde f_{i}(x_i^T\widehat\beta) - f(x_i^T\widehat\beta)\rbrace$ compares the kernel regression of $ Y$ on $ x_i^T\widehat\beta$ (denoted $ \widetilde f_{i}(x_i^T\widehat\beta)$) with the parametric model implied by the null hypothesis.
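For illustration, the following Python sketch computes the statistic (12.12) using a leave-one-out kernel regression and a simple central-quantile weight function; both are assumptions made for the sketch, and hhtest additionally estimates the variance of $ T$, which is needed for the $ p$-value.

  # Illustrative sketch of the statistic T in (12.12); not XploRe code.
  import numpy as np

  def hh_statistic(vhat, y, yhat, h, c=0.05):
      # vhat: estimated index; yhat: parametric fit f(vhat)
      u = (vhat[:, None] - vhat[None, :]) / h
      kern = np.exp(-0.5 * u**2)                 # Gaussian kernel
      np.fill_diagonal(kern, 0.0)                # leave-one-out
      ftilde = kern @ y / kern.sum(axis=1)       # kernel regression of y on vhat
      lo, hi = np.quantile(vhat, [c, 1.0 - c])
      w = (vhat >= lo) & (vhat <= hi)            # weight: one on central values
      return np.sqrt(h) * np.sum(w * (y - yhat) * (ftilde - yhat))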

The HH-test statistic is computed by the XploRe quantlet hhtest:

  {t, p} = hhtest(vhat, y, yhat, h {,c {,m}})

The function hhtest takes as inputs

vhat
the vector with the estimated index $ x_i^T\widehat{\beta}.$ Horowitz and Härdle (1994) suggest that the index can be estimated under the null hypothesis. Hence, if your model in $ H_0$ is the Probit model, then you get vhat by running glmest with the "bipro" option on your data. See GLM (7).
y
the observations on the dependent variable.
yhat
the parametric estimate of $ E(Y\vert x)$, i.e. $ f(x_i^T\widehat{\beta}).$
h
the bandwidth for the kernel regression of y on vhat.
c
optional parameter that must lie in the interval $ 0 \leq c < 1$: the proportion of the sample to be cut off at each extreme. The default is 0.05.
m
optional $ n \times 1$ vector or scalar. m should be given only if $ Y$ is a binomial or binary dependent variable. If it is binomial, then m should be the vector of binomial coefficients. If $ Y$ is binary, then set m equal to 1. This improves the estimation of the variance of the test statistic.

hhtest returns the test statistic t and the corresponding $ p$-value p. Under $ H_0$, the test statistic defined in (12.12) is asymptotically normally distributed with zero mean and finite variance. The test, however, is one-sided, because deviations from $ H_0$ of the semiparametric kind considered in (12.12) lead to large positive values of the test statistic.

We illustrate hhtest using the kyphosis data:

  library("metrics")
  x      =  read("kyphosis")                                        
  y      =  x[,4]                                                 
  x      =  x[,1:3]                                               
  x      =  matrix(rows(x))~x                                     
  h      =  2                                                     
  g      =  glmbilo(x,y)   
  eta    =  x*g.b
  mu     =  g.mu
  {t,p}  =  hhtest(eta,y,mu,h,0.05,1)           
  t~p
XLGmetric04.xpl

The value of the test statistic and the $ p$-value are displayed in the XploRe output window:
  Contents of _tmp
  [1,] -0.79444  0.21346
The null hypothesis is the Logit model that was used to fit the data. We cannot reject $ H_0$ at conventional significance levels, because the $ p$-value is greater than 0.2.