The title of this section is taken from Maddala's (1983) well-known book of the same name, which is still a very good reference for parametric models of this kind. XploRe's metrics library covers some of the parametric models. Its comparative strength, however, lies in the more recently developed semiparametric models for data with limited-dependent and qualitative dependent variables. We will first discuss the parametric models and then turn to their semiparametric competitors.
Probit, Logit and Tobit are the three most widely used parametric models for analyzing data with limited-dependent or qualitative dependent variables. Probit and Logit can be viewed as special cases of the generalized linear model (GLM). This is the perspective taken in XploRe. Hence, we ask you to consult GLM (7) for a description of XploRe's Probit and Logit (standard Logit, conditional Logit, multinomial Logit) quantlets.
The Tobit model is a parametric censored regression model. Formally, the model is given by

   y*_i = x_i'β + u_i,   u_i ~ N(0, σ²) i.i.d.,
   y_i  = y*_i · 1(y*_i > c),   i = 1, ..., n,                        (12.1)

where the censoring constant c is known. It is well known that an OLS regression of the nonzero y_i on the explanatory variables x_i will not produce consistent estimates of the regression coefficients β in this situation. The model implies the following conditional mean function (setting c = 0, which is no loss of generality, as explained below):

   E(y_i | x_i, y_i > 0) = x_i'β + σ λ(x_i'β/σ),                      (12.2)

where λ(v) = φ(v)/Φ(v) is the inverse Mills ratio, formed from the standard normal density φ and distribution function Φ.
XploRe offers a two-step estimation of β and σ. In the first step, a pilot estimate of γ = β/σ is obtained by estimating the model

   E(1(y_i > 0) | x_i) = Φ(x_i'γ)

by Probit analysis (here, 1(·) denotes the indicator function). Using this pilot estimate γ̂, we can compute φ(x_i'γ̂) and Φ(x_i'γ̂) and use them as regressors when we estimate (12.2) on the part of the sample where y_i > 0.
The tobit quantlet takes the observed x and y as inputs and returns the vector of estimated coefficients b, the estimated standard deviation s of the error term, and the estimated covariance matrix cv of b and s:
{b, s, cv} = tobit(x, y)
The dependent variable must be equal to 0 for the censored observations. The known constant c in (12.1) is subsumed in the constant term of x_i'β. That is, the estimated constant term is an estimate of the original constant term minus c.
In the following example, simulated data are used to illustrate the use of tobit:
library("metrics")
randomize(241200)
n = 500
k = 2
x = matrix(n)~aseq(1, n, 0.25)
s = 8
u = s*normal(n)
b = #(-9, 1)
ystar = x*b+u
y = ystar.*(ystar.>=0)
tstep = tobit(x,y)
dg = matrix(rows(tstep.cv),1)
dig = diag(dg)
stm = dig.*tstep.cv
std = sqrt(sum(stm,2))
coef = tstep.b|tstep.s
coef~(coef./std)   ; t-ratios
Contents of _tmp
[1,]  -9.7023  -11.201
[2,]   1.0092   92.13
[3,]   6.8732    4.5859
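For readers without XploRe, the two-step procedure can be sketched in Python/NumPy (a hedged illustration, not the tobit quantlet itself; the simulation design mirrors the example above, and all names are ours):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(241200)
n = 500
x = np.column_stack([np.ones(n), 1 + 0.25 * np.arange(n)])   # constant ~ trend
beta_true, sigma_true = np.array([-9.0, 1.0]), 8.0
ystar = x @ beta_true + sigma_true * rng.standard_normal(n)
y = np.where(ystar >= 0, ystar, 0.0)                         # censoring at 0

# Step 1: Probit of 1(y > 0) on x yields a pilot estimate of gamma = beta/sigma
z = (y > 0).astype(float)
def probit_nll(g):
    p = norm.cdf(x @ g).clip(1e-10, 1 - 1e-10)
    return -np.sum(z * np.log(p) + (1 - z) * np.log(1 - p))
gamma = minimize(probit_nll, np.zeros(2), method="Nelder-Mead").x

# Step 2: on the uncensored part, regress y on x and the inverse Mills ratio
pos = y > 0
idx = x[pos] @ gamma
mills = norm.pdf(idx) / norm.cdf(idx)
coef = np.linalg.lstsq(np.column_stack([x[pos], mills]), y[pos], rcond=None)[0]
b_hat, s_hat = coef[:2], coef[2]   # coefficient on the Mills ratio estimates sigma
print(b_hat, s_hat)                # roughly (-9, 1) and 8
```

The coefficient on the Mills ratio plays the role of σ, exactly as in the estimation of (12.2) on the uncensored subsample.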
XploRe offers several quantlets to estimate semiparametric regression models where the conditional mean of y is assumed to depend on the explanatory variables x only via a function of the single (linear) index x'β:

   E(y | x) = g(x'β),                                                 (12.3)

where the link function g(·) is unknown. Define the vector of average derivatives of E(y|x) with respect to x as

   δ = E[ ∂E(y|x)/∂x ] = E[ g'(x'β) ] β.                              (12.4)
Two comments are in order: first, by (12.4) the vector δ of average derivatives is proportional to β, so that an estimate of δ identifies β up to scale; second, δ can be estimated without first estimating the unknown link function g(·).
Here is an overview of the XploRe commands for estimating the vector δ of average derivatives (and thereby β up to scale in a SIM):

   adeind   average derivative estimator (ADE) of Härdle and Stoker (1989)
   adeslp   average derivative slope estimator, see Stoker (1991)
   dwade    density weighted average derivative estimator (DWADE) of Powell, Stock and Stoker (1989)
   adedis   average derivative estimation with discrete explanatory variables (Horowitz and Härdle, 1996)
We will cover all of these quantlets except adeslp, which is discussed in Stoker (1991).
It can be shown that

   δ = -E[ y f'(x)/f(x) ],                                            (12.6)

where f denotes the density of x and f' its vector of partial derivatives. Replacing f by a kernel density estimate f̂_h with bandwidth h and the expectation by a sample average yields

   δ̂ = -(1/n) Σ_{i=1}^{n} y_i [ f̂'_h(x_i)/f̂_h(x_i) ] 1(f̂_h(x_i) > b),   (12.7)

where the indicator 1(f̂_h(x_i) > b) trims off observations whose estimated density falls below the trimming bound b. Equation (12.7) defines the average derivative estimator (ADE) of Härdle and Stoker (1989) which is computed by the XploRe quantlet adeind:
{delta, dvar} = adeind(x, y, d, h)
The bandwidth h is visible in (12.7) but the binwidth d is not. This is because binning the data is merely a computational device for speeding up the computation of the estimator. A larger binwidth will speed up the computation but implies a loss of information (due to using binned data rather than the actual data) that may be prohibitively large.
You may wonder what happened to the trimming bounds in (12.7). Trimming is necessary for working out the theoretical properties of the estimator (to control the random denominator f̂_h(x_i)). In adeind, the trimming bounds are implicitly set such that 5% of the data are always trimmed off at the boundaries (this is done by calling trimper within adeind; trimper is an auxiliary quantlet, not covered here in more detail).
In the following example, simulated data are used to illustrate the use of adeind:
library("metrics")
randomize(333)
n = 500
x = normal(n,3)
beta = #(0.2, -0.7, 1)
index = x*beta
eps = normal(n,1) * sqrt(0.5)
y = 2 * index^3 + eps
d = 0.2
m = 5
{delta,dvar} = adeind(x,y,d,m)
(delta/delta[1])~(beta/beta[1])
Contents of _tmp
[1,]        1        1
[2,]  -4.2739     -3.5
[3,]   5.8963        5

Note that this example does not work in the Academic Edition of XploRe.
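The estimator (12.7) is compact enough to sketch in Python/NumPy (an illustration under our own simplifications — unnormalized Gaussian product kernel, no binning, quantile-based 5% trimming — not the adeind implementation):

```python
import numpy as np

def ade(x, y, h, trim=0.05):
    """Average derivative estimator in the spirit of (12.7)."""
    n, p = x.shape
    u = (x[:, None, :] - x[None, :, :]) / h        # pairwise scaled differences
    k = np.exp(-0.5 * (u ** 2).sum(axis=2))        # Gaussian product kernel
    # density estimate and its gradient; the omitted normalizing constant
    # cancels in the ratio fprime/fhat
    fhat = k.sum(axis=1) / (n * h ** p)
    fprime = -(k[:, :, None] * u).sum(axis=1) / (n * h ** (p + 1))
    keep = fhat > np.quantile(fhat, trim)          # trim off low-density points
    return -np.mean(y[keep, None] * fprime[keep] / fhat[keep, None], axis=0)

rng = np.random.default_rng(333)
n = 500
x = rng.standard_normal((n, 3))
beta = np.array([0.2, -0.7, 1.0])
y = 2 * (x @ beta) ** 3 + np.sqrt(0.5) * rng.standard_normal(n)

delta = ade(x, y, h=0.7)
print(delta / delta[0])   # ratios roughly proportional to beta/beta[0]
```

As in the XploRe example, the estimated ratios recover β only up to scale and with some finite-sample bias.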
The need for trimming in ADE is a consequence of its random denominator. This and other difficulties associated with a random denominator are overcome by the density weighted average derivative estimator (DWADE) of Powell, Stock and Stoker (1989). It is based on the density weighted average derivative of E(y|x) with respect to x:

   δ* = E[ f(x) ∂E(y|x)/∂x ].                                         (12.8)
It can be shown that δ* = -2 E[ y f'(x) ], which suggests the estimator

   δ̂* = -(2/n) Σ_{i=1}^{n} y_i f̂'_h(x_i),                            (12.9)

where f̂'_h is the derivative of a kernel density estimate with bandwidth h. Since (12.9) contains no random denominator, no trimming is required. It is computed by the XploRe quantlet dwade:
d = dwade(x, y, h)
We illustrate dwade with simulated data:
library("metrics")
randomize(333)
n = 500
x = normal(n,3)
beta = #(0.2, -0.7, 1)
index = x*beta
eps = normal(n,1) * sqrt(0.5)
y = 2 * index^3 + eps
h = 0.3
d = dwade(x,y,h)
(d/d[1])~(beta/beta[1])
Contents of _tmp
[1,]        1        1
[2,]  -3.4727     -3.5
[3,]   4.9435        5
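Similarly, (12.9) can be sketched in a few lines of Python/NumPy (again an illustration under our own simplifications, not the dwade implementation):

```python
import numpy as np

def dwade(x, y, h):
    """Density weighted average derivative estimator in the spirit of (12.9)."""
    n, p = x.shape
    u = (x[:, None, :] - x[None, :, :]) / h
    k = np.exp(-0.5 * (u ** 2).sum(axis=2)) / (2 * np.pi) ** (p / 2)
    # gradient of the kernel density estimate at each observation;
    # no ratio is formed, hence no trimming is needed
    fprime = -(k[:, :, None] * u).sum(axis=1) / (n * h ** (p + 1))
    return -2.0 * np.mean(y[:, None] * fprime, axis=0)

rng = np.random.default_rng(333)
n = 500
x = rng.standard_normal((n, 3))
beta = np.array([0.2, -0.7, 1.0])
y = 2 * (x @ beta) ** 3 + np.sqrt(0.5) * rng.standard_normal(n)

dstar = dwade(x, y, h=0.3)
print(dstar / dstar[0])   # roughly proportional to beta/beta[0] = (1, -3.5, 5)
```

Because the density enters as a weight rather than a denominator, the sketch needs no trimming step, mirroring the discussion above.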
By definition, derivatives can only be calculated for continuous variables. Thus, adeind and dwade will not produce estimates of the components of β that belong to discrete explanatory variables. Most discrete explanatory variables are 0/1 dummy variables. How do they enter into SIMs? To give an answer, let us assume that x consists of several continuous variables and a single dummy variable, and let us split x and β accordingly into x1 (continuous components) and x2 (dummy), and β1 and β2, respectively.
Then we have

   E(y | x1, x2 = 0) = g(x1'β1)   and   E(y | x1, x2 = 1) = g(x1'β1 + β2).   (12.10)

That is, viewed as functions of the index v = x1'β1, the two regression curves are horizontally shifted against each other by exactly β2.
This is the basic idea underlying the estimator of β2 proposed by Horowitz and Härdle (1996): given an estimate β̂1 of β1 (which can be obtained by using dwade, for instance) we can estimate both curves in Figure 12.1 by running separate kernel regressions of y on x1'β̂1: one for the data points for which x2 = 0 (to get an estimate of g(x1'β1)) and one for those for which x2 = 1 (to get an estimate of g(x1'β1 + β2)). Then we can compute the horizontal differences between the two estimated curves to get an estimate of β2. This procedure is implemented in the XploRe quantlet adedis:
{d, a, lim, h} = adedis(x2, x1, y, hd, hfac, c0, c1)

It takes as inputs the data, consisting of x2 (discrete explanatory variables), x1 (continuous explanatory variables) and y, and several parameters that are needed in the three steps of the estimation procedure.
Whereas you have to specify hd and hfac, the constants c0 and c1 are implicitly set to the minimum and maximum of x1'β̂1, plus or minus the bandwidth used in the kernel estimation of the two regression curves. The values of c0 and c1 are returned by adedis in the vector lim, along with the bandwidth h calculated according to (12.11). The most important outputs are, of course, the estimates of β2 and β1, which are stored in d and a, respectively.
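The horizontal-difference idea can be sketched in Python/NumPy (an illustration under our own simplifying assumptions: the continuous index is taken as known, and the shift is recovered by a grid search rather than by the actual Horowitz-Härdle procedure; all names are ours, not XploRe's):

```python
import numpy as np

def kreg(v, y, grid, h):
    # Nadaraya-Watson kernel regression of y on v, evaluated at grid
    w = np.exp(-0.5 * ((grid[:, None] - v[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(0)
n = 1000
v = rng.uniform(-2, 2, n)            # continuous index x1'beta1, taken as known
x2 = rng.integers(0, 2, n)           # 0/1 dummy variable
beta2 = 0.8                          # true dummy coefficient
y = np.tanh(v + beta2 * x2) + 0.1 * rng.standard_normal(n)

grid = np.linspace(-1.0, 1.0, 201)   # interior region where both curves overlap
g0 = kreg(v[x2 == 0], y[x2 == 0], grid, 0.2)
shifts = np.linspace(0.0, 2.0, 201)
# choose the horizontal shift that best aligns the x2=1 curve with the x2=0 curve
sse = [np.sum((kreg(v[x2 == 1], y[x2 == 1], grid - b, 0.2) - g0) ** 2)
       for b in shifts]
b2_hat = shifts[np.argmin(sse)]
print(b2_hat)                        # should be close to beta2 = 0.8
```

The grid search makes the geometry of Figure 12.1 explicit: shifting the x2 = 1 curve back by β2 superimposes it on the x2 = 0 curve.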
In the previous sections, we have seen that even the noniterative estimators of SIMs implemented in XploRe require quite a bit of care and computational effort. More importantly, semiparametric estimators are less efficient than parametric estimators if the assumptions underlying the latter are satisfied. It is therefore desirable to know whether the distributional flexibility of these models justifies the loss in efficiency and the extra computational cost. That is, we would like a statistical test of an easily estimable parametric model against its semiparametric SIM competitor.
Horowitz and Härdle (1994) have developed a suitable test procedure that is implemented in the hhtest quantlet. Formally, the HH-test considers the following hypotheses:

   H0: E(y|x) = F(x'β) for a known function F,
   H1: E(y|x) = g(x'β) for some unknown smooth function g,

where under H0 the parametric link F (for instance, the logistic distribution function in the Logit case) is completely specified.
Here is the main idea underlying the HH-test: if the model under the null is true (and given an estimate β̂ of β), then a nonparametric regression of y on x'β̂ will give a consistent estimate of the parametric function F(·). If, however, the parametric model is wrong, then the nonparametric regression of y on x'β̂ will deviate systematically from F(x'β̂). This insight is reflected in the HH-test statistic, which takes the form

   T = h^(1/2) Σ_{i=1}^{n} w(x_i'β̂) [ y_i - F(x_i'β̂) ] [ ĝ_h(x_i'β̂) - F(x_i'β̂) ],   (12.12)

where w(·) is a weight function and ĝ_h(·) denotes a kernel regression of y on the fitted index with bandwidth h.
The HH-test statistic is computed by the XploRe quantlet hhtest:
{t, p} = hhtest(vhat, y, yhat, h {,c {,m}})
The function hhtest takes as inputs the fitted index values vhat, the dependent variable y, the fitted values yhat of the parametric model, and the bandwidth h, along with the optional parameters c and m. hhtest
returns the test statistic t and the corresponding p-value p. Under H0, the test statistic defined in (12.12) is asymptotically normally distributed with zero mean and finite variance. The test, however, is a one-sided test because deviations of E(y|x) from F(x'β) of the semiparametric kind considered in (12.12) will lead to large positive values of the test statistic.
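The intuition behind the test can be sketched with synthetic data in Python/NumPy (a hedged illustration of the idea only, not the hhtest statistic; the Logit fit and the kernel regression are hand-rolled):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
beta = np.array([0.2, 1.0, -0.5])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(x @ beta)))).astype(float)

# Logit fit by Newton-Raphson
b = np.zeros(3)
for _ in range(25):
    mu = 1 / (1 + np.exp(-(x @ b)))
    b += np.linalg.solve(x.T @ ((mu * (1 - mu))[:, None] * x), x.T @ (y - mu))

v = x @ b                                    # fitted index
F = 1 / (1 + np.exp(-v))                     # parametric link at the fitted index
h = 0.3
w = np.exp(-0.5 * ((v[:, None] - v[None, :]) / h) ** 2)
ghat = (w * y).sum(axis=1) / w.sum(axis=1)   # kernel regression of y on the index
dev = np.abs(ghat - F).mean()
print(dev)                                   # small when the Logit null is true
```

When the null is true, as here, the nonparametric regression tracks the parametric link closely; systematic deviations would signal a misspecified link.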
We illustrate hhtest using the kyphosis data:
library("metrics")
x = read("kyphosis")
y = x[,4]
x = x[,1:3]
x = matrix(rows(x))~x
h = 2
g = glmbilo(x,y)
eta = x*g.b
mu = g.mu
{t,p} = hhtest(eta,y,mu,h,0.05,1)
t~p
Contents of _tmp
[1,]  -0.79444   0.21346

The null hypothesis is the Logit model that was used to fit the data. With a p-value of 0.21, we cannot reject