This chapter is motivated by our attempt to answer pertinent questions concerning a number of real data sets, some of which are listed below.
Example 3.1.1. Consider the relationship between the levels of pollutants and weather and the total number ($Y$) of daily hospital admissions for circulatory and respiratory problems. The covariates are the average levels of sulphur dioxide ($X_1$, unit $\mu g/m^3$), nitrogen dioxide ($X_2$, unit $\mu g/m^3$), respirable suspended particulates ($X_3$, unit $\mu g/m^3$), ozone ($X_4$, unit $\mu g/m^3$), temperature ($X_5$, unit $^\circ C$) and humidity ($X_6$, unit %).
The data set was collected daily in Hong Kong from January 1, 1994 to December 31, 1995 (courtesy of Professor T.S. Lau). The basic question is this: are the prevailing levels of the pollutants a cause for concern?
The relationship between $Y$ and the covariates $X_1, \ldots, X_6$ is quite complicated. A naive approach is to start with a simple linear regression of $Y$ on $X_1, \ldots, X_6$. In such a fit, the coefficients of three of the covariates are not significantly different from 0 (at the 5% level of significance), and the negative coefficients of two others are difficult to interpret. Refinements of this model are, of course, possible within the linear framework, but they are unlikely to throw much light on the opening question because, as we shall see, the situation is quite complex.
Example 3.1.2. We revisit the Mackenzie River lynx data for 1821-1934. Following common practice in ecology and statistics, let $Y_t$ denote $\log(\text{number recorded as trapped in year } 1820 + t)$, $t = 1, \ldots, 114$. The series is shown in Figure 1.2. It is interesting to see that the relation between $Y_t$ and $Y_{t-1}$ seems quite linear, as shown in Figure 1.2(b). However, the relation between $Y_t$ and $Y_{t-2}$ shows some nonlinearity. A number of time series models have been proposed in the literature. Do they have points of contact with one another?
Let $X$, $Z$ and $Y$ be respectively $\mathbb{R}^p$-valued, $\mathbb{R}^q$-valued and $\mathbb{R}$-valued random variables. In the absence of any prior knowledge about the relation between $Y$ and $(X, Z)$, a nonparametric regression model is usually adopted, i.e. $Y = g(X, Z) + \varepsilon$, where $E(\varepsilon \mid X, Z) = 0$ almost surely. More recently, there has been a tendency to use semiparametric models instead to fit the relation between $Y$ and $(X, Z)$. There are three reasons for this. The first is to reduce the impact of the curse of dimensionality in nonparametric estimation. The second is that a parametric form allows us to express explicitly, perhaps on the basis of intuition or prior information, part of the relation between the variables. The third is that, for one reason or another (e.g. the availability of some background information), some restrictions have to be imposed on the relation. The latter two reasons also mean that we have some information about some of the explanatory variables but not about the others. A general semiparametric model can be written as
(1) The following model has often been considered.
(2) A slightly more complicated model is the multi-index model
proposed by Ichimura and Lee (1991), namely
(3) The varying-coefficient model proposed by Hastie and Tibshirani (1993),
The above discussion and examples highlight the importance of dimension reduction for semiparametric models. For some special cases of model (3.2), some dimension reduction methods have been introduced. Next, we give a brief review of these methods.
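To make the first of the three reasons for semiparametric modelling concrete, the following sketch (our own illustration, not part of any method discussed here) counts how many points of a uniform sample on $[0,1]^p$ fall in a fixed cube neighbourhood of half-width $h$ around the centre. The expected fraction is $(2h)^p$, so a local smoother with a fixed bandwidth sees exponentially fewer observations as $p$ grows. The function name `local_fraction` and the values $h = 0.1$ and $n = 100000$ are illustrative assumptions.

```python
import numpy as np

def local_fraction(p, h=0.1, n=100_000, seed=0):
    """Fraction of n points drawn uniformly on [0, 1]^p that fall in the
    cube of half-width h centred at (0.5, ..., 0.5)."""
    rng = np.random.default_rng(seed)
    U = rng.random((n, p))
    inside = np.all(np.abs(U - 0.5) <= h, axis=1)
    return inside.mean()

# Simulated fraction next to the theoretical value (2h)^p.
for p in (1, 2, 4, 6):
    print(p, local_fraction(p), (2 * 0.1) ** p)
```

With $h = 0.1$ the neighbourhood holds about 20% of the data in one dimension but only about 0.16% in four, which is why reducing the number of arguments of the unknown function matters.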
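The projection pursuit regression (PPR) was proposed by Friedman and Stuetzle (1981), and Huber (1985) gave a comprehensive discussion. The commonly used PPR aims to find univariate functions $g_1, g_2, \ldots$ and directions $\beta_1, \beta_2, \ldots$ which satisfy the following sequence of minimizations:
$$\min_{\beta_1, g_1} E\{Y - g_1(\beta_1^\top X)\}^2, \quad \ldots, \quad \min_{\beta_k, g_k} E\Big\{Y - \sum_{j=1}^{k-1} g_j(\beta_j^\top X) - g_k(\beta_k^\top X)\Big\}^2, \quad \ldots$$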
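The stagewise PPR minimizations can be sketched as follows for two-dimensional covariates. To keep the example short, each ridge function $g_k$ is a cubic polynomial fitted by least squares and each direction $\beta_k$ is found by a grid search over angles; both are simplifications assumed here for illustration, whereas Friedman and Stuetzle (1981) use a nonparametric smoother and a numerical search. The helper names `fit_ridge` and `ppr_stagewise` and the simulated single-index data are hypothetical.

```python
import numpy as np

def fit_ridge(u, r, degree=3):
    """Least-squares polynomial ridge function g(u); returns fitted values."""
    V = np.vander(u, degree + 1)            # columns u^degree, ..., u, 1
    coef, *_ = np.linalg.lstsq(V, r, rcond=None)
    return V @ coef

def ppr_stagewise(X, y, n_terms=2, n_angles=180, degree=3):
    """Greedy stagewise PPR for 2-D covariates: at stage k, choose the
    direction (on an angle grid) and ridge function that minimise the
    residual sum of squares left by stages 1, ..., k-1."""
    resid = y - y.mean()
    directions, rss_path = [], []
    # Angles in [0, pi) cover every direction up to sign.
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    for _ in range(n_terms):
        best = None
        for a in angles:
            beta = np.array([np.cos(a), np.sin(a)])
            fitted = fit_ridge(X @ beta, resid, degree)
            rss = np.sum((resid - fitted) ** 2)
            if best is None or rss < best[0]:
                best = (rss, beta, fitted)
        rss, beta, fitted = best
        directions.append(beta)
        rss_path.append(rss)
        resid = resid - fitted              # next stage fits what is left over
    return directions, rss_path

# Simulated single-index data: Y = u + u^2 + noise with u = beta^T X.
rng = np.random.default_rng(1)
n = 300
X = rng.standard_normal((n, 2))
beta_true = np.array([np.cos(0.6), np.sin(0.6)])
u = X @ beta_true
y = u + u**2 + 0.1 * rng.standard_normal(n)
dirs, rss = ppr_stagewise(X, y)
print(abs(dirs[0] @ beta_true), rss)
```

The first stage should pick a direction close to $\beta$ (up to the grid resolution and sign), and later stages can only lower the residual sum of squares.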
Another simple approach related to the estimation of the effective dimension reduction (e.d.r.) direction is the average derivative estimation (ADE) proposed by Härdle and Stoker (1989). Suppose that $Y = g(\beta^\top X) + \varepsilon$ with $E(\varepsilon \mid X) = 0$. Then
$$\nabla m(x) = g'(\beta^\top x)\,\beta,$$
where $\nabla m$ is the gradient of the unknown regression function $m(x) = E(Y \mid X = x)$ with respect to its arguments. It follows that $E\nabla m(X) = E\{g'(\beta^\top X)\}\,\beta$. Therefore $E\nabla m(X)$ and $\beta$ differ only by the scalar factor $E\{g'(\beta^\top X)\}$. We can estimate $E\nabla m(X)$ nonparametrically and then estimate $\beta$ by the direction of the estimate of $E\nabla m(X)$.
An interesting result is that the estimator of $\beta$ can achieve root-$n$ consistency even when a high-dimensional kernel smoothing method is used to estimate $E\nabla m(X)$.
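The ADE idea can be sketched as follows, assuming Gaussian covariates, a simulated single-index model with $g(u) = \sin u$ and a Gaussian kernel. This sketch estimates the gradient at every sample point by a local linear fit and then averages; that is a swapped-in choice for illustration, since Härdle and Stoker's (1989) own estimator is based on kernel density estimation (via integration by parts) and is not reproduced here. The names `local_linear_gradients` and `ade_direction` and the bandwidth `h = 0.8` are hypothetical.

```python
import numpy as np

def local_linear_gradients(X, y, h):
    """Estimate the gradient of m(x) = E(y | X = x) at every sample point
    by a local linear fit with a Gaussian kernel of bandwidth h."""
    n, p = X.shape
    grads = np.empty((n, p))
    for i in range(n):
        D = X - X[i]                                  # centred design, (n, p)
        w = np.exp(-0.5 * np.sum(D**2, axis=1) / h**2)
        A = np.hstack([np.ones((n, 1)), D])           # intercept + linear terms
        sw = np.sqrt(w)                               # weighted least squares
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        grads[i] = coef[1:]                           # slopes = gradient estimate
    return grads

def ade_direction(X, y, h):
    """ADE-style estimate: normalised average of the estimated gradients."""
    g_bar = local_linear_gradients(X, y, h).mean(axis=0)
    return g_bar / np.linalg.norm(g_bar)

rng = np.random.default_rng(0)
n, p = 400, 3
beta = np.array([1.0, 2.0, 0.0]) / np.sqrt(5.0)
X = rng.standard_normal((n, p))
y = np.sin(X @ beta) + 0.1 * rng.standard_normal(n)  # E g'(beta^T X) != 0 here
b_hat = ade_direction(X, y, h=0.8)
print(abs(b_hat @ beta))    # near 1 when the direction is recovered
```

Only the direction of the averaged gradient is used, which matches the fact that $E\nabla m(X)$ identifies $\beta$ only up to the scalar factor $E\{g'(\beta^\top X)\}$.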
However, the ADE has several limitations. (i) To obtain the estimate of $\beta$, the condition $E\{g'(\beta^\top X)\} \neq 0$ is needed. This condition is violated when $g$ is an even function and $\beta^\top X$ is symmetrically distributed about zero. (ii) As far as we know, there is no successful extension to the case of more than one e.d.r. direction.
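Limitation (i) can be checked numerically. The sketch below, an illustration under assumed standard Gaussian covariates with a known $g$, uses the exact gradients $\nabla m(x) = g'(\beta^\top x)\,\beta$ and compares the length of their average with their average length. For $g(u) = \sin u$ the ratio is well away from zero, whereas for the even function $g(u) = u^2$ it is close to zero, so the average gradient carries essentially no information about $\beta$. The helper `mean_gradient_ratio` is hypothetical.

```python
import numpy as np

def mean_gradient_ratio(g_prime, beta, n=200_000, seed=0):
    """For m(x) = g(beta^T x) with X ~ N(0, I_p), the gradient is
    g'(beta^T x) * beta. Return ||mean gradient|| / mean ||gradient||,
    a Monte Carlo measure of how much of ADE's target E[grad m(X)] survives."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, beta.size))
    d = g_prime(X @ beta)                      # g'(beta^T X_i)
    grads = d[:, None] * beta[None, :]         # row i is grad m(X_i)
    return np.linalg.norm(grads.mean(axis=0)) / np.linalg.norm(grads, axis=1).mean()

beta = np.array([1.0, 2.0, 2.0]) / 3.0         # unit vector
ratio_sin = mean_gradient_ratio(np.cos, beta)                  # g(u) = sin u
ratio_sq = mean_gradient_ratio(lambda u: 2.0 * u, beta)        # g(u) = u^2
print(ratio_sin, ratio_sq)
```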
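The sliced inverse regression (SIR) method proposed by Li (1991) is perhaps the most powerful method to date for searching for the e.d.r. directions. However, to ensure that such an inverse regression can be taken, the SIR method imposes a strong probabilistic structure on $X$. Specifically, the method requires that for any constant vector $b \in \mathbb{R}^p$ there are constants $c_0$ and $c = (c_1, \ldots, c_D)^\top$, depending on $b$, such that
$$E(b^\top X \mid B_0^\top X) = c_0 + c^\top B_0^\top X,$$
where $B_0 = (\beta_1, \ldots, \beta_D)$ is the $p \times D$ matrix whose columns are the e.d.r. directions. This holds, for example, when $X$ is elliptically symmetric.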
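A minimal sketch of SIR on simulated data, assuming Gaussian (hence elliptically symmetric) covariates so that the linearity condition holds: whiten $X$, slice the sorted response into groups of roughly equal size, take the weighted covariance of the within-slice means, and map its leading eigenvectors back to the original scale. The function name `sir_directions`, the number of slices and the simulated model are illustrative assumptions.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Sliced inverse regression: returns a (p, n_dirs) matrix of unit-length
    estimated e.d.r. directions on the original X scale."""
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Whitening: Z = (X - mu) Sigma^{-1/2}, so Z has identity covariance.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt
    # Slice the sorted response into groups of (roughly) equal size.
    slices = np.array_split(np.argsort(y), n_slices)
    # Weighted covariance of the slice means of Z estimates Cov(E(Z | Y)).
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    _, v = np.linalg.eigh(M)                # eigenvalues in ascending order
    top = v[:, ::-1][:, :n_dirs]            # leading eigenvectors, Z scale
    B = inv_sqrt @ top                      # beta is proportional to Sigma^{-1/2} eta
    return B / np.linalg.norm(B, axis=0)

rng = np.random.default_rng(2)
n, p = 500, 4
A = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.0, 1.0, 0.3, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
X = rng.standard_normal((n, p)) @ A          # correlated Gaussian design
beta = np.array([1.0, -1.0, 1.0, 0.0]) / np.sqrt(3.0)
u = X @ beta
y = u + 0.5 * u**3 + 0.2 * rng.standard_normal(n)   # monotone link
B_hat = sir_directions(X, y, n_slices=10, n_dirs=1)
print(abs(B_hat[:, 0] @ beta))
```

With an even link such as $g(u) = u^2$ the slice means of $Z$ would all be close to zero by symmetry, which is the well-known situation in which SIR, like the ADE, loses the direction.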
For the general model (3.2), the methods listed above may fail in one way or another. For instance, the SIR method fails with most nonlinear time series models, and the ADE fails with model (3.8) when $X$ and $Z$ have common variables. In this chapter, we propose a new method to estimate the e.d.r. directions for the general model (3.2). Our approach is inspired by the SIR method, the ADE method and the idea of local linear smoothers (see, for example, Fan and Gijbels (1996)). It is easy to implement and needs no strong assumptions on the probabilistic structure of $X$. In particular, it can handle time series data. In our simulations, the proposed method performs better than the existing ones. Based on the properties of our direction estimation methods, we shall also propose a method to estimate the number of e.d.r. directions, which again does not require special assumptions on the design $X$ and is applicable to many complicated models.
To explain the basic ideas of our approach, we shall refer mostly to models (3.4) and (3.8). Extension to other models is not difficult. The rest of this chapter is organized as follows. Section 3.2 gives some properties of the e.d.r. directions and extends the ADE method to the average outer product of gradients estimation method. These properties are important for the implementation of our estimation procedure. Section 3.3 describes the minimum average (conditional) variance estimation procedure and gives some results. Some comparisons with the existing methods are also discussed. An algorithm is proposed in Section 3.5. To check the feasibility of our approach, we have conducted a substantial number of simulations, typical examples of which are reported in Section 3.6. Section 3.7 gives some real applications of our method to both independently observed data sets and time series data sets.