12.1 Limited-Dependent and Qualitative Dependent Variables

The title of this section is taken from Maddala's (1983) well-known book of the same name, which is still a very good reference for parametric models of this kind. XploRe's metrics library covers some of these parametric models. Its comparative strength, however, lies in the more recently developed semiparametric models for data with limited-dependent and qualitative dependent variables. We will first discuss the parametric models and then turn to their semiparametric competitors.


12.1.1 Probit, Logit and Tobit


{b, s, cv} = tobit(x, y)
two-step estimation of the Tobit model

Probit, Logit and Tobit are three of the most widely used parametric models for analyzing data with limited-dependent or qualitative dependent variables. Probit and Logit can be viewed as special cases of the generalized linear model (GLM), and this is the perspective taken in XploRe. Hence, we ask you to consult GLM (7) for a description of XploRe's Probit and Logit quantlets (standard Logit, conditional Logit, multinomial Logit).

The Tobit model is a parametric censored regression model. Formally, the model is given by

$\displaystyle Y= \left\{\begin{array}{ll} x^T \beta + \varepsilon &\qquad \textrm{if } x^T \beta + \varepsilon \geq c,\\ 0 & \qquad \textrm{otherwise,}\\ \end{array}\right.$ (12.1)

where $ \varepsilon$ is an unobservable error term, assumed to be normally distributed with mean zero and variance $ \sigma^2,$ and $ c$ is a known constant. In Tobin's pioneering study, $ Y$ is a consumer's expenditure on a durable good and $ Y^* = x^T \beta + \varepsilon$ is the consumer's willingness to pay for the good. $ Y$ is equal to $ Y^*$ only if the willingness to pay exceeds $ c,$ the minimum amount necessary to purchase the good. Otherwise, $ Y$ is equal to zero.

It is well known that an OLS regression of the nonzero $ Y$s on the explanatory variables $ x$ will not produce consistent estimates of the regression coefficients in this situation. The model implies the following conditional mean function:

$\displaystyle E(Y\vert x,Y^*>0) = x^T \beta +\sigma \frac{\varphi(x^T \beta/\sigma)}{\Phi(x^T \beta/\sigma)},$ (12.2)

where $ \varphi(\bullet)$ and $ \Phi(\bullet)$ denote the probability density function (pdf) and the cumulative distribution function (cdf) of the standard normal distribution, respectively.

XploRe offers a two-step estimation of $ \beta$: In the first step, a pilot estimate of $ \gamma = \beta/\sigma$ is obtained by fitting the Probit model $ P(Y^*>0\,\vert\,x)=\Phi(x^T \gamma)$ to the censoring indicator $ {\boldsymbol{I}}(Y>0)$ (here, $ {\boldsymbol{I}}(\bullet)$ denotes the indicator function). Using this pilot estimate $ \widehat{\gamma}$, we can compute $ \varphi(x^T\widehat{\gamma})$ and $ \Phi(x^T \widehat{\gamma})$ and use their ratio (the inverse Mills ratio) as an additional regressor when we estimate (12.2) on the part of the sample where $ Y>0.$

The tobit quantlet takes the observed x and y as inputs and returns the vector of estimated coefficients b, the estimated standard deviation of the error term s, and the estimated covariance matrix cv of b and s:

  {b, s, cv} = tobit(x, y)

The dependent variable must be equal to 0 for the censored observations. The known constant $ c$ in (12.1) is subsumed in the constant term of $ \beta.$ That is, the estimated constant term is an estimate of the original constant term minus $ c.$
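To make the two-step procedure concrete, here is a minimal Python sketch of the computation described above (not XploRe code); it assumes numpy, scipy and statsmodels, and it returns only b and s, whereas tobit also delivers the joint covariance matrix cv.

  # Minimal sketch of the two-step Tobit estimator described above.
  import numpy as np
  from scipy.stats import norm
  import statsmodels.api as sm

  def tobit_two_step(x, y):
      # x must contain a constant column; y = 0 marks censored observations.
      # Step 1: Probit of the censoring indicator I(Y > 0) on x gives
      # a pilot estimate of gamma = beta / sigma.
      gamma = sm.Probit((y > 0).astype(float), x).fit(disp=0).params
      v = x @ gamma
      pos = y > 0
      # Step 2: by (12.2), E(Y | x, Y* > 0) = x'beta + sigma * phi/Phi,
      # so regress the uncensored Y on x and the inverse Mills ratio.
      mills = norm.pdf(v[pos]) / norm.cdf(v[pos])
      regressors = np.column_stack([x[pos], mills])
      coef = np.linalg.lstsq(regressors, y[pos], rcond=None)[0]
      return coef[:-1], coef[-1]   # b = estimated beta, s = estimated sigma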

In the following example, simulated data are used to illustrate the use of tobit:

  library("metrics")
  randomize(241200)
  n      =  500
  k      =      2
  x      =      matrix(n)~aseq(1, n ,0.25)
  s      =      8
  u      =      s*normal(n)
  b      =      #(-9, 1)
  ystar  =      x*b+u
  y      =      ystar.*(ystar.>=0)
  tstep  =      tobit(x,y)
  dg     =      matrix(rows(tstep.cv),1)
  dig    =      diag(dg)
  stm    =      dig.*tstep.cv
  std    =      sqrt(sum(stm,2))
  coef   =      tstep.b|tstep.s
  coef~(coef./std)      ; t-ratios
XLGmetric01.xpl

We have generated the data in accordance with the assumptions of the Tobit model, choosing the values -9 and 1 for the components of $ \beta$ and 8 for $ \sigma$. The results in the XploRe output window show the estimated coefficients (first column) and their $ t$-ratios (second column):
  Contents of _tmp
  [1,]  -9.7023  -11.201 
  [2,]   1.0092    92.13 
  [3,]   6.8732   4.5859


12.1.2 Single Index Models

XploRe offers several quantlets to estimate semiparametric regression models in which the conditional mean of $ Y$ is assumed to depend on the explanatory variables $ x$ only via a function of the single (linear) index $ x^T\beta$:

$\displaystyle E(Y\vert x)=g(x^{T}\beta),$ (12.3)

where $ g(\bullet )$ is an unknown function. The Probit and Logit models of the previous section are special cases of such single index models (SIMs) in which $ g(\bullet )$ is assumed to be a known function (the cdf of the standard normal and the logistic distribution, respectively). XploRe offers several alternative procedures to estimate $ \beta$ in (12.3) that require only that $ g(\bullet )$ be a smooth (but otherwise unspecified) function. All procedures are noniterative, easy to compute, and have very desirable large-sample properties.


12.1.3 Average Derivatives

Define the vector of average derivatives of $ Y$ with respect to $ x$ as

$\displaystyle \delta=E\left\{\frac{\partial E(Y\vert x)}{\partial x}\right\},$ (12.4)

i.e., $ \delta$ is the vector of partial derivatives of the regression function $ E(Y\vert x),$ averaged over the support of $ x.$ In a SIM, applying the chain rule to (12.3) shows that

$\displaystyle \delta=E\left\{ g'(x^T\beta)\right\}\;\beta,$ (12.5)

where $ g'$ is the derivative of $ g(\bullet )$. From (12.5) we see that in a SIM, $ \delta$ is proportional to the vector of regression coefficients $ \beta.$ This implies that if we find a way to estimate $ \delta$, then we can estimate $ \beta$ up to scale. Estimating $ \beta$ up to scale is the best we can do under the assumptions of the SIM, even with an infinite number of observations: the scale of $ \beta$ is not identified in a SIM of the form (12.3), because any rescaling of $ \beta$ can be absorbed into the unknown function $ g(\bullet )$. Hence we can focus on estimating $ \delta.$ This is the approach followed by the XploRe quantlets adeind, dwade and adeslp, which return estimates of the average derivative $ \delta.$ Asymptotically, these estimators are equivalent; see Stoker (1991).


Here is an overview of the XploRe commands for estimating the vector of average derivatives $ \delta$ (and thereby $ \beta$ up to scale in a SIM):


{delta, dvar} = adeind(x, y, d, h)
estimates the average derivative of $ Y$ with respect to $ x,$ which is proportional to $ \beta$ in SIMs
delta = dwade(x, y, h)
an alternative to adeind that estimates the density-weighted average derivative
{delta, dvar} = adeslp(x, y, d, m)
another alternative to adeind that estimates the average derivative by an instrumental variables regression
{delta, alpha, lim, hd} = adedis(z, x, y, h, hfac, c0, c1)
estimates the coefficients of the discrete components of $ x$, in addition to dwade estimation of the coefficients of the continuous components

We will cover all of these quantlets except adeslp, which is discussed in Stoker (1991).


12.1.4 Average Derivative Estimation

Using integration by parts, it can be shown that

$\displaystyle \delta= E \{s(x) E(Y\vert x)\} = -E\left\{ \frac{1}{f(x)}\frac{\partial f(x)}{\partial x}\, E(Y\vert x)\right\},$ (12.6)

where $ f(x)$ is the density of $ x$ and $ s(x)=-\partial \log f(x)/\partial x$ is the negative log-density derivative (score) vector. Equation (12.6) says that $ \delta$ is equal to the expectation of the product of $ s(x)$ and the conditional expectation function of $ Y.$ Hence, we can estimate $ \delta$ using the sample analogs of the quantities on the right-hand side of (12.6):

$\displaystyle \widehat{\delta} =\frac{1}{N}\sum_{i=1}^{N}\widehat{s}_{h}(x_{i})\, Y_{i}\, {\boldsymbol{I}}(\widehat{f}_{h}(x_{i})>b_{N}) = -\frac{1}{N}\sum_{i=1}^{N}\frac{1}{\widehat{f}_{h}(x_{i})} \frac{\partial \widehat{f}_{h}(x_i)}{\partial x}\, Y_{i}\, {\boldsymbol{I}}(\widehat{f}_{h}(x_{i})>b_{N}),$ (12.7)

where $ \widehat{f}_{h}(x)$ denotes the kernel density estimator with bandwidth $ h,$ $ \widehat f'_{h}(x)=\partial \widehat{f}_{h}(x)/\partial x$ the vector of kernel estimators of the partial derivatives of $ f(x),$ and $ {\boldsymbol{I}}(\widehat{f}_{h}(x_{i})>b_{N})$ is an indicator that trims out observations at which the estimated density is very small.

Equation (12.7) defines the average derivative estimator (ADE) of Härdle and Stoker (1989), which is computed by the XploRe quantlet adeind:

  {delta, dvar} = adeind(x, y, d, h)
adeind takes the data (x and y) as well as the binwidth d and the bandwidth h as inputs.

The bandwidth appears in (12.7) but the binwidth d does not. This is because binning the data is merely a computational device for speeding up the computation of the estimator. A larger binwidth speeds up the computation but implies a loss of information (due to using binned rather than actual data) that may become prohibitively large.

You may wonder what happened to the trimming bounds $ b_N$ in (12.7). Trimming is necessary for working out the theoretical properties of the estimator (it controls the random denominator of $ \widehat{s}_h(x)$). In adeind, the trimming bounds are implicitly set such that 5% of the data are always trimmed off at the boundaries (this is done by calling trimper within adeind; trimper is an auxiliary quantlet that we do not cover in more detail).
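To see the mechanics of (12.7), here is an illustrative Python sketch of the estimator under simplifying assumptions (a Gaussian product kernel, and quantile-based density trimming as a stand-in for trimper); adeind additionally bins the data and returns the covariance matrix dvar.

  # Illustrative sketch of the ADE in (12.7); not XploRe code.
  import numpy as np

  def ade(x, y, h, trim=0.05):
      n, k = x.shape
      u = (x[:, None, :] - x[None, :, :]) / h          # (x_i - x_j)/h
      kern = (2*np.pi)**(-k/2) * np.exp(-0.5 * (u**2).sum(axis=2))
      fhat = kern.sum(axis=1) / (n * h**k)             # density at each x_i
      # gradient of the density estimate; K'(u) = -u K(u) for the
      # Gaussian kernel
      fgrad = -np.einsum('ijk,ij->ik', u, kern) / (n * h**(k+1))
      keep = fhat > np.quantile(fhat, trim)            # trimming indicator
      shat = -fgrad[keep] / fhat[keep, None]           # estimated score s(x_i)
      return (shat * y[keep, None]).sum(axis=0) / n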

In the following example, simulated data are used to illustrate the use of adeind:

  library("metrics")
  randomize(333)
  n      =  500
  x      =  normal(n,3)
  beta   =  #(0.2 , -0.7 , 1)
  index  =  x*beta
  eps    =  normal(n,1) * sqrt(0.5)
  y      =  2 * index^3 + eps
  d      =  0.2
  m      =  5
  {delta,dvar} = adeind(x,y,d,m)
  (delta/delta[1])~(beta/beta[1])
XLGmetric02.xpl

We have generated the data such that $ g(x^T\beta)=2 (x^T\beta)^3$. Recall that the estimate of $ \delta$ is an estimate of $ \beta$ up to scale. Since the scale of $ \beta$ is not identified, you may normalize the estimated coefficients by dividing each of them by the first coefficient, and then interpret the effects of the explanatory variables relative to the effect of the first explanatory variable (which is normalized to 1). The estimated and true coefficient vectors, both normalized, are shown in the output window:
  Contents of _tmp
  [1,]        1        1 
  [2,]  -4.2739     -3.5 
  [3,]   5.8963        5
Note that this example does not work in the Academic Edition of XploRe.


12.1.5 Weighted Average Derivative Estimation

The need for trimming in ADE is a consequence of its random denominator. This and other difficulties associated with a random denominator are overcome by the density-weighted average derivative estimator (DWADE) of Powell, Stock and Stoker (1989). It is based on the density-weighted average derivative of $ Y$ with respect to $ x$:

$\displaystyle \widetilde{\delta} = E \left \{\frac{\partial g(x^T\beta)}{\partial x}\, f(x) \right \} = E \{ g'(x^T\beta) f(x)\}\, \beta.$ (12.8)

Obviously, $ \widetilde{\delta}$ shares with the (unweighted) average derivative the property of being proportional to the coefficient vector $ \beta$ in single index models. The estimation strategy is therefore basically the same: find a way to estimate $ \widetilde{\delta}$ and you have an estimator of $ \beta$ up to a scaling factor.

Again using integration by parts, it can be shown that

$\displaystyle \widetilde{\delta} = -2 E \{E(Y\vert x) f'(x)\}.$ (12.9)

Thus we may estimate $ \widetilde{\delta}$ (and thereby $ \beta$) by

$\displaystyle \widehat{\widetilde{\delta}}= -\frac{2}{N} \sum_{i=1}^N Y_{i} \widehat f'_{h} (x_{i}),$ (12.10)

where $ \widehat f'_{h}(x_{i})$ again denotes the vector of kernel estimators of the partial derivatives of $ f(x).$ The DWADE estimator defined in (12.10) shares the desirable distributional features of the ADE estimator ($ \sqrt{N}$-consistency, asymptotic normality). It is computed in XploRe by the dwade quantlet:
  d = dwade(x, y, h)
dwade needs the data (x and y) and the bandwidth h as inputs and returns the vector of estimated density-weighted average derivatives.
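A minimal Python sketch of (12.10), again assuming a Gaussian product kernel, shows how simple the estimator is once the density gradient has been estimated; note the absence of trimming and of any random denominator.

  # Illustrative sketch of the DWADE in (12.10); not XploRe code.
  import numpy as np

  def dwade_sketch(x, y, h):
      n, k = x.shape
      u = (x[:, None, :] - x[None, :, :]) / h
      kern = (2*np.pi)**(-k/2) * np.exp(-0.5 * (u**2).sum(axis=2))
      # gradient of the kernel density estimate at each observation x_i
      fgrad = -np.einsum('ijk,ij->ik', u, kern) / (n * h**(k+1))
      return -2.0 * (y[:, None] * fgrad).mean(axis=0)   # (12.10)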

We illustrate dwade with simulated data:

  library("metrics")
  randomize(333)
  n      =  500
  x      =  normal(n,3)
  beta   =  #(0.2 , -0.7 , 1)
  index  =  x*beta
  eps    =  normal(n,1) * sqrt(0.5)
  y      =  2 * index^3 + eps
  h      =  0.3
  d      =  dwade(x,y,h)
  (d/d[1])~(beta/beta[1])
XLGmetric03.xpl

Note that we have used the same data-generating process as in the previous section and that we again show the estimated and true coefficient vectors, both normalized by dividing through by their first element:
  Contents of _tmp
  [1,]        1        1 
  [2,]  -3.4727     -3.5 
  [3,]   4.9435        5


12.1.6 Average Derivatives and Discrete Variables

By definition, derivatives can only be calculated for continuous variables. Thus, adeind and dwade will not produce estimates of those components of $ \beta$ that belong to discrete explanatory variables.

Most discrete explanatory variables are 0/1 dummy variables. How do they enter into SIMs? To give an answer, let us assume that $ x$ consists of several continuous variables and a single dummy variable, and let us split $ x$ into $ x_1$ (continuous components) and $ x_2$ (dummy), and $ \beta$ accordingly into $ \beta_1$ and $ \beta_2.$ Then we have

\begin{displaymath}\begin{array}{rcll}
E(Y\vert x_1,x_2) &=& g(x_1^T\beta_1), &\textrm{if } x_2=0,\\
E(Y\vert x_1,x_2) &=& g(x_1^T\beta_1 +\beta_2), &\textrm{if } x_2=1.
\end{array}\end{displaymath}

Graphically, changing $ x_2$ from 0 to 1 means shifting $ g(\bullet )$ (as a function of $ x_1^T\beta_1$) horizontally by $ \beta_2.$ This is depicted in Figure 12.1, where $ \beta _2$ is labeled $ b_2$ for technical reasons.

Figure 12.1: Shifting $ g(\bullet )$ by $ \beta _2$.
\includegraphics[scale=0.425]{shiftit}

This is the basic idea underlying the estimator of $ \beta _2$ proposed by Horowitz and Härdle (1996): given an estimate of $ \beta_1$ (which can be obtained by using dwade, for instance), we can estimate both curves in Figure 12.1 by running separate kernel regressions of $ Y$ on $ x_1^T\widehat{\beta}_1$, one for the data points with $ x_2=0$ (to get an estimate of $ g(x_1^T\beta_1)$) and one for those with $ x_2=1$ (to get an estimate of $ g(x_1^T\beta_1+\beta_2)$). We can then compute the horizontal difference between the two estimated curves to get an estimate of $ \beta_2.$ This procedure is implemented in the XploRe quantlet adedis:

  {d, a, lim, h} = adedis(x2, x1, y, hd, hfac, c0, c1)
It takes as inputs the data, consisting of x2 (discrete explanatory variables), x1 (continuous explanatory variables) and y, along with several parameters (hd, hfac, c0 and c1) that are needed in the three steps of the adedis estimation procedure.

Whereas you have to specify $ c_0$ and $ c_1,$ the constants $ v_0$ and $ v_1$ are implicitly set to the minimum and maximum of $ x^T\widehat{\beta}_1$, plus or minus the bandwidth used in the kernel estimation of $ g(\bullet).$ The values of $ v_0$ and $ v_1$ are returned by adedis in the vector lim, along with the bandwidth $ h$, calculated according to (12.11). The most important outputs are, of course, the estimates of $ \beta_1$ and $ \beta _2$, which are stored in d and a, respectively.
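To convey the geometry of the horizontal-differencing idea, here is a stylized Python sketch under strong simplifying assumptions (the index is taken as given, and the two curves are clipped to $ [c_0, c_1]$ and compared on a grid). It is not a reimplementation of adedis, whose exact truncation and bandwidth rules follow Horowitz and Härdle (1996).

  # Stylized sketch of estimating the shift beta_2; not XploRe code.
  import numpy as np

  def shift_estimate(v, y, x2, c0, c1, h, gridsize=100):
      # v: estimated index x1'beta1; x2: 0/1 dummy variable
      def kreg(v0, y0, grid):            # Nadaraya-Watson regression
          kern = np.exp(-0.5 * ((grid[:, None] - v0[None, :]) / h)**2)
          return kern @ y0 / kern.sum(axis=1)
      grid = np.linspace(v.min(), v.max(), gridsize)
      g0 = np.clip(kreg(v[x2 == 0], y[x2 == 0], grid), c0, c1)
      g1 = np.clip(kreg(v[x2 == 1], y[x2 == 1], grid), c0, c1)
      # the area between the clipped curves, divided by their height
      # c1 - c0, recovers the horizontal shift beta_2
      return np.trapz(g1 - g0, grid) / (c1 - c0)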


12.1.7 Parametric versus Semiparametric Single Index Models


{t, p} = hhtest(vhat, y, yhat, h {,c {,m}})
tests a parametric against a semiparametric SIM

In the previous sections, we have seen that even the noniterative estimators of SIMs implemented in XploRe require quite a bit of care and computational effort. More importantly, semiparametric estimators are less efficient than parametric estimators if the assumptions underlying the latter are satisfied. It is therefore desirable to know whether the distributional flexibility of the semiparametric models justifies the loss in efficiency and the extra computational cost. That is, we would like to statistically test semiparametric SIMs against easily estimable parametric SIMs.

Horowitz and Härdle (1994) have developed a suitable test procedure, which is implemented in the hhtest quantlet. Formally, the HH-test considers the following hypotheses:

$\displaystyle H_0 : \quad E(Y\vert x) = f(x^T\beta),$
$\displaystyle H_1 : \quad E(Y\vert x) = g(x^T\beta),$

where $ f(\bullet)$ is a known function (such as the cdf of the standard normal distribution) and $ g(\bullet )$ is an unknown, smooth function. Hence, the parametric model constitutes the null hypothesis, whereas the semiparametric model is the alternative.

Here is the main idea underlying the HH-test: if the model under the null is true (and given an estimate of $ \beta$), then a nonparametric regression of $ Y$ on $ v = x^T\widehat\beta$ will give a consistent estimate of the parametric function $ f(\bullet)$. If, however, the parametric model is wrong, then the nonparametric regression of $ Y$ on $ v$ will deviate systematically from $ f(\bullet).$

This insight is reflected in the HH-test statistic:

$\displaystyle T = \sqrt {h} \sum\limits_{i=1}^N w(x_i^T\widehat\beta)\, \lbrace Y_i - f(x_i^T\widehat\beta)\rbrace\, \lbrace\widetilde f_{i}(x_i^T\widehat\beta) - f(x_i^T\widehat\beta)\rbrace,$ (12.12)

where $ h$ is the bandwidth used in the nonparametric (kernel) regression and $ w(\bullet)$ is a weight function that downweighs extreme observations. In practice, $ w(\bullet)$ is defined to be identically equal to one for 90% or 95% of the central values of $ x_i^T\widehat\beta$ and zero otherwise. The term $ \lbrace\widetilde f_{i}(x_i^T\widehat\beta) - f(x_i^T\widehat\beta)\rbrace$ compares the kernel regression of $ Y$ on $ x_i^T\widehat\beta$ (denoted $ \widetilde f_{i}(x_i^T\widehat\beta)$) with the parametric model implied by the null hypothesis.
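For illustration, the following Python sketch computes the statistic (12.12) using a leave-one-out kernel regression and a simple central-quantile weight function; both are assumptions made for the sketch, and hhtest additionally estimates the variance of $ T$, which is needed for the $ p$-value.

  # Illustrative sketch of the statistic T in (12.12); not XploRe code.
  import numpy as np

  def hh_statistic(vhat, y, yhat, h, c=0.05):
      # vhat: estimated index; yhat: parametric fit f(vhat)
      u = (vhat[:, None] - vhat[None, :]) / h
      kern = np.exp(-0.5 * u**2)                 # Gaussian kernel
      np.fill_diagonal(kern, 0.0)                # leave-one-out
      ftilde = kern @ y / kern.sum(axis=1)       # kernel regression of y on vhat
      lo, hi = np.quantile(vhat, [c, 1.0 - c])
      w = (vhat >= lo) & (vhat <= hi)            # weight: one on central values
      return np.sqrt(h) * np.sum(w * (y - yhat) * (ftilde - yhat))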

The HH-test statistic is computed by the XploRe quantlet hhtest:

  {t, p} = hhtest(vhat, y, yhat, h {,c {,m}})

The function hhtest takes as inputs

vhat
the vector with the estimated index $ x_i^T\widehat{\beta}.$ Horowitz and Härdle (1994) suggest that the index can be estimated under the null hypothesis. Hence, if your model in $ H_0$ is the Probit model, then you get vhat by running glmest with the "bipro" option on your data. See GLM (7).
y
the observations on the dependent variable.
yhat
the parametric estimate of $ E(Y\vert x)$, i.e. $ f(x_i^T\widehat{\beta}).$
h
the bandwidth for the kernel regression of y on vhat.
c
optional parameter that must lie in the interval $ 0 \leq c < 1$: the proportion of the sample to be cut off at each extreme. The default is 0.05.
m
optional $ n \times 1$ vector or scalar. m should be given only if $ Y$ is a binomial or binary dependent variable. If it is binomial, then m should be the vector of binomial coefficients. If $ Y$ is binary, then set m equal to 1. This improves the estimation of the variance of the test statistic.

hhtest returns the test statistic t and the corresponding $ p$-value p. Under $ H_0$, the test statistic defined in (12.12) is asymptotically normally distributed with zero mean and finite variance. The test, however, is one-sided, because deviations from $ H_0$ of the semiparametric kind considered in (12.12) lead to large positive values of the test statistic.

We illustrate hhtest using the kyphosis data:

  library("metrics")
  x      =  read("kyphosis")                                        
  y      =  x[,4]                                                 
  x      =  x[,1:3]                                               
  x      =  matrix(rows(x))~x                                     
  h      =  2                                                     
  g      =  glmbilo(x,y)   
  eta    =  x*g.b
  mu     =  g.mu
  {t,p}  =  hhtest(eta,y,mu,h,0.05,1)           
  t~p
XLGmetric04.xpl

The value of the test statistic and the $ p$-value are displayed in the XploRe output window:
  Contents of _tmp
  [1,] -0.79444  0.21346
The null hypothesis is the Logit model that was used to fit the data. We cannot reject $ H_0$ at conventional significance levels, because the $ p$-value is greater than 0.2.