The term semiparametric refers to models in which there is an unknown function in addition to an unknown finite-dimensional parameter. For example, the binary response model $P(Y = 1 \mid X = x) = G(\beta'x)$ is semiparametric if the function $G$ and the vector of coefficients $\beta$ are both treated as unknown quantities. This section describes two semiparametric models of conditional mean functions that are important in applications. It also describes a related class of models that has no unknown finite-dimensional parameters but, like semiparametric models, mitigates the disadvantages of fully nonparametric models. Finally, it describes a class of transformation models that is important in the estimation of hazard functions, among other applications. Powell (1994) discusses additional semiparametric models.
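In a semiparametric single index model, the conditional mean function has the form

$$E(Y \mid x) = G(\beta'x), \qquad (10.1)$$

where $\beta$ is an unknown constant vector and $G$ is an unknown function. The quantity $\beta'x$ is called an index.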
Model (10.1) contains many widely used parametric models as special cases. For example, if $G$ is the identity function, then (10.1) is a linear model. If $G$ is the cumulative normal or logistic distribution function, then (10.1) is a binary probit or logit model. When $G$ is unknown, (10.1) provides a specification that is more flexible than a parametric model but retains many of the desirable features of parametric models, as will now be explained.
One important property of single index models is that they avoid the curse of dimensionality. This is because the index $\beta'x$ aggregates the dimensions of $x$, thereby achieving dimension reduction. Consequently, the difference between the estimator of $G$ and the true function can be made to converge to zero at the same rate that would be achieved if $\beta'x$ were observable. Moreover, $\beta$ can be estimated with the same rate of convergence that is achieved in a parametric model. Thus, in terms of the rates of convergence of estimators, a single index model is as accurate as a parametric model for estimating $\beta$ and as accurate as a one-dimensional nonparametric model for estimating $G$. This dimension reduction feature gives single index models a considerable advantage over nonparametric methods in applications where $X$ is multidimensional and the single index structure is plausible.
A single index model permits limited extrapolation. Specifically, it yields predictions of $E(Y \mid x)$ at values of $x$ that are not in the support of $X$ but are in the support of $\beta'X$. Of course, there is a price to be paid for the ability to extrapolate. A single index model makes assumptions that are stronger than those of a nonparametric model. These assumptions are testable on the support of $X$ but not outside of it. Thus, extrapolation (unavoidably) relies on untestable assumptions about the behavior of $E(Y \mid x)$ beyond the support of $X$.
Before $\beta$ and $G$ can be estimated, restrictions must be imposed that ensure their identification. That is, $\beta$ and $G$ must be uniquely determined by the population distribution of $(Y, X)$. Identification of single index models has been investigated by Ichimura (1993) and, for the special case of binary response models, Manski (1988). It is clear that $\beta$ is not identified if $G$ is a constant function or if there is an exact linear relation among the components of $X$ (perfect multicollinearity). In addition, (10.1) is observationally equivalent to the model $E(Y \mid x) = G^*(\gamma + \delta\beta'x)$, where $\gamma$ and $\delta \neq 0$ are arbitrary and $G^*$ is defined by the relation $G^*(\gamma + \delta v) = G(v)$ for all $v$ in the support of $\beta'X$. Therefore, $\beta$ and $G$ are not identified unless restrictions are imposed that uniquely specify $\gamma$ and $\delta$. The restriction on $\gamma$ is called location normalization and can be imposed by requiring $X$ to contain no constant (intercept) component. The restriction on $\delta$ is called scale normalization. Scale normalization can be achieved by setting the coefficient of one component of $X$ equal to one. A further identification requirement is that $X$ must include at least one continuously distributed component whose coefficient is non-zero. Horowitz (1998) gives an example that illustrates the need for this requirement. Other, more technical identification requirements are discussed by Ichimura (1993) and Manski (1988).
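The observational-equivalence argument can be checked numerically. The sketch below assumes a logistic link $G$ purely for illustration (any function would do), builds $G^*(t) = G((t - \gamma)/\delta)$, and confirms that $(G, \beta)$ and $(G^*, \gamma, \delta\beta)$ give identical conditional means. The helper names are hypothetical.

```python
import numpy as np

def G(v):
    """Illustrative link function (logistic CDF); any G would work here."""
    return 1.0 / (1.0 + np.exp(-v))

def make_G_star(gamma, delta):
    """G*(gamma + delta * v) = G(v) implies G*(t) = G((t - gamma) / delta)."""
    return lambda t: G((t - gamma) / delta)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
beta = np.array([1.0, -0.5])

gamma, delta = 3.0, -2.0                  # arbitrary location and non-zero scale
G_star = make_G_star(gamma, delta)

mean_original = G(x @ beta)                           # E(Y|x) under (G, beta)
mean_equivalent = G_star(gamma + delta * (x @ beta))  # E(Y|x) under (G*, gamma, delta*beta)
print(np.allclose(mean_original, mean_equivalent))    # True

# Scale normalization: dividing by the first coefficient gives the same
# normalized vector under either parameterization.
print(beta / beta[0], (delta * beta) / (delta * beta[0]))
```

Because the data cannot distinguish the two parameterizations, only $\beta$ up to location and scale is recoverable. This is why an intercept is excluded and one coefficient is fixed at one.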
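The main estimation challenge in single index models is estimating $\beta$. Given an estimator $b_n$ of $\beta$, $G$ can be estimated by carrying out the nonparametric regression of $Y$ on $b_n'X$ (e.g., by using kernel estimation). Several estimators of $\beta$ are available. Ichimura (1993) describes a nonlinear least squares estimator. Klein and Spady (1993) describe a semiparametric maximum likelihood estimator for the case in which $Y$ is binary. These estimators are difficult to compute because they require solving complicated nonlinear optimization problems. Powell et al. (1989) describe a density-weighted average derivative estimator (DWADE) that is non-iterative and easily computed. The DWADE applies when all components of $X$ are continuous random variables. It is based on the relation

$$\beta \propto E[p(X)\nabla G(\beta'X)] = -2E[Y\nabla p(X)],$$

where $p$ is the probability density function of $X$, $\nabla$ denotes the gradient with respect to $x$, and the equality follows from integration by parts together with $E(Y \mid X) = G(\beta'X)$. Thus, $\beta$ can be estimated up to scale by replacing $p$ with a nonparametric estimate and the expectation with a sample average.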
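A minimal sketch of the two-step procedure. Step 1 estimates $\beta$ up to scale by the sample analog $-2n^{-1}\sum_i Y_i \nabla \hat p_i(X_i)$ of the relation $\beta \propto -2E[Y\nabla p(X)]$, with $\hat p_i$ a leave-one-out kernel density estimate. Step 2 regresses $Y$ on the estimated index with a Nadaraya-Watson smoother to estimate $G$. The Gaussian kernel, the bandwidths, the simulated logit-type data, and the function names `dwade` and `nw_regression` are illustrative assumptions, not tuned or prescribed choices.

```python
import numpy as np

def dwade(X, Y, h):
    """Density-weighted average derivative: -2/n * sum_i Y_i * grad p_i(X_i).

    p_i is a leave-one-out Gaussian product-kernel density estimate with
    bandwidth h (kernel and bandwidth are illustrative, not tuned).
    """
    n, d = X.shape
    U = (X[:, None, :] - X[None, :, :]) / h          # U[i, j] = (X_i - X_j) / h
    K = np.exp(-0.5 * np.sum(U ** 2, axis=2)) / (2 * np.pi) ** (d / 2)
    np.fill_diagonal(K, 0.0)                          # leave observation i out
    # Gradient in x of K((x - X_j)/h) at x = X_i is -U[i, j] * K[i, j] / h.
    grad_p = -np.einsum("ijk,ij->ik", U, K) / ((n - 1) * h ** (d + 1))
    return -2.0 * np.mean(Y[:, None] * grad_p, axis=0)

def nw_regression(v_grid, v, Y, h):
    """Nadaraya-Watson estimate of E(Y | index = v) at each grid point."""
    W = np.exp(-0.5 * ((v_grid[:, None] - v[None, :]) / h) ** 2)
    return W @ Y / W.sum(axis=1)

# Simulated binary single index model with beta = (1, 2).
rng = np.random.default_rng(1)
n = 800
X = rng.normal(size=(n, 2))
beta = np.array([1.0, 2.0])
Y = (X @ beta + rng.logistic(size=n) > 0).astype(float)

delta_hat = dwade(X, Y, h=0.5)
b_n = delta_hat / delta_hat[0]        # scale normalization: first coefficient = 1
index = X @ b_n
grid = np.linspace(-3, 3, 7)
G_hat = nw_regression(grid, index, Y, h=0.3)
print(b_n)
print(G_hat)
```

No optimization is involved: the estimate of $\beta$ is a single weighted average, which is the computational advantage of the DWADE over the nonlinear least squares and semiparametric likelihood estimators.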
The usefulness of single index models can be illustrated with an example taken from Horowitz and Härdle (1996). The example consists of estimating a model of product innovation by German manufacturers of investment goods. The data, assembled in 1989 by the IFO Institute of Munich, consist of observations on 1100 manufacturers. The dependent variable is $Y = 1$ if a manufacturer realized an innovation during 1989 in a specific product category and $Y = 0$ otherwise. The independent variables are the number of employees in the product category (EMPLP), the number of employees in the entire firm (EMPLF), an indicator of the firm's production capacity utilization (CAP), and a variable DEM, which is 1 if a firm expected increasing demand in the product category and 0 otherwise. The first three independent variables are standardized so that they have units of standard deviations from their means. Scale normalization was achieved by setting $\beta_{\mathrm{EMPLP}} = 1$.
Table 10.1 shows the parameter estimates obtained using a binary probit model and the semiparametric method of Horowitz and Härdle (1996). Figure 10.2 shows a kernel estimate of $G'$. There are two important differences between the semiparametric and probit estimates. First, the semiparametric estimate of $\beta_{\mathrm{EMPLF}}$ is small and statistically nonsignificant, whereas the probit estimate is significant at the 0.05 level and similar in size to $\beta_{\mathrm{CAP}}$. Second, in the binary probit model, $G$ is a cumulative normal distribution function, so $G'$ is a normal density function. Figure 10.2 reveals, however, that $G'$ is bimodal. This bimodality suggests that the data may be a mixture of two populations. An obvious next step in the analysis would be to search for variables that characterize these populations. Standard diagnostic techniques for binary probit models would provide no indication that $G'$ is bimodal. Thus, the semiparametric estimate has revealed an important feature of the data that could not easily be found using standard parametric methods.
Table 10.1. Estimated coefficients (standard errors in parentheses)

| Model | EMPLP | EMPLF | CAP | DEM |
|---|---|---|---|---|
| Semiparametric | 1 | 0.032 (0.023) | 0.346 (0.078) | 1.732 (0.509) |
| Probit | 1 | 0.516 (0.024) | 0.520 (0.163) | 1.895 (0.387) |
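In a partially linear model, $X$ is partitioned into two non-overlapping subvectors, $X_1$ and $X_2$. The model has the form

$$E(Y \mid x_1, x_2) = \beta'x_1 + G(x_2), \qquad (10.3)$$

where $\beta$ is an unknown constant vector and $G$ is an unknown function.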
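An estimator of $\beta$ can be obtained by observing that (10.3) implies

$$Y - E(Y \mid X_2) = \beta'[X_1 - E(X_1 \mid X_2)] + U,$$

where $U = Y - E(Y \mid X_1, X_2)$ satisfies $E(U \mid X_1, X_2) = 0$. Therefore, $\beta$ can be estimated by ordinary least squares regression of $Y - E(Y \mid X_2)$ on $X_1 - E(X_1 \mid X_2)$ after replacing the two conditional mean functions with nonparametric estimates.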
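A minimal sketch of this residual-on-residual regression, assuming a scalar $X_2$, leave-one-out Nadaraya-Watson estimates of $E(Y \mid X_2)$ and $E(X_1 \mid X_2)$, and simulated data with $G(x_2) = \sin(2x_2)$. The bandwidth, the data-generating process, and the helper names are illustrative assumptions.

```python
import numpy as np

def nw_loo(x2, target, h):
    """Leave-one-out Nadaraya-Watson estimate of E(target | X2) at each X2_i."""
    W = np.exp(-0.5 * ((x2[:, None] - x2[None, :]) / h) ** 2)
    np.fill_diagonal(W, 0.0)                  # drop observation i from its own fit
    s = W.sum(axis=1)
    return (W @ target) / (s[:, None] if target.ndim == 2 else s)

def partially_linear_beta(X1, x2, Y, h):
    """OLS of Y - E_hat(Y | X2) on X1 - E_hat(X1 | X2)."""
    Y_res = Y - nw_loo(x2, Y, h)
    X1_res = X1 - nw_loo(x2, X1, h)
    beta_hat, *_ = np.linalg.lstsq(X1_res, Y_res, rcond=None)
    return beta_hat

# Simulated data: beta = (1.5, -1.0), G(x2) = sin(2 x2), and the first
# component of X1 is correlated with X2.
rng = np.random.default_rng(2)
n = 500
x2 = rng.uniform(-2, 2, size=n)
X1 = np.column_stack([x2 + rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.5, -1.0])
Y = X1 @ beta + np.sin(2 * x2) + 0.5 * rng.normal(size=n)

beta_hat = partially_linear_beta(X1, x2, Y, h=0.2)
print(beta_hat)
```

Subtracting the conditional means removes $G(X_2)$ from the equation, so the unknown function never has to be estimated explicitly to obtain $\beta$. A plain regression of $Y$ on $X_1$ would instead be biased here, because $X_1$ is correlated with $G(X_2)$ through $X_2$.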
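Let $X$ have $d$ continuously distributed components that are denoted $X^1, \ldots, X^d$. In a nonparametric additive model of the conditional mean function,

$$E(Y \mid x) = \mu + f_1(x^1) + \cdots + f_d(x^d), \qquad (10.5)$$

where $\mu$ is a constant and $f_1, \ldots, f_d$ are unknown functions that satisfy a location normalization condition such as

$$\int f_k(v)\,w_k(v)\,dv = 0, \qquad k = 1, \ldots, d, \qquad (10.6)$$

where $w_k$ is a non-negative weight function.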
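An estimator of $f_k$ can be obtained by observing that (10.5) and (10.6) imply

$$f_k(x^k) = \int [E(Y \mid x) - \mu]\, w_{-k}(x^{-k})\, dx^{-k},$$

where $x^{-k}$ is the vector consisting of all components of $x$ except the $k$'th, $w_{-k}(x^{-k}) = \prod_{j \neq k} w_j(x^j)$, and each $w_j$ is scaled so that $\int w_j(v)\,dv = 1$. Replacing $E(Y \mid x)$ and $\mu$ with estimates and computing the integral yields a marginal integration estimator of $f_k$.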
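A minimal marginal-integration sketch for $d = 2$. It estimates $E(Y \mid x)$ with a two-dimensional Nadaraya-Watson smoother, fixes component $k$ at each grid value, and averages over the sample values of the other component. It then subtracts the sample mean of $Y$ as the estimate of $\mu$. Averaging over the sample corresponds to taking $w_j$ in (10.6) to be the density of $X^j$, so the normalization becomes $E[f_j(X^j)] = 0$. That weighting, the bandwidth, the simulated functions, and the helper names are illustrative assumptions.

```python
import numpy as np

def nw_2d(points, X, Y, h):
    """Nadaraya-Watson estimate of E(Y | X = point) at each row of `points`."""
    D = (points[:, None, :] - X[None, :, :]) / h
    W = np.exp(-0.5 * np.sum(D ** 2, axis=2))          # Gaussian product kernel
    return W @ Y / W.sum(axis=1)

def marginal_integration(k, v_grid, X, Y, h):
    """Estimate f_k on v_grid: average m_hat(v, X_i^{-k}) over i, minus mean(Y)."""
    estimates = np.empty(len(v_grid))
    for g, v in enumerate(v_grid):
        points = X.copy()
        points[:, k] = v                                # fix component k at v
        estimates[g] = nw_2d(points, X, Y, h).mean()    # integrate over sample X^{-k}
    return estimates - Y.mean()                         # subtract the estimate of mu

# Simulated additive model: mu = 1, f_1(x) = x^2 - 1/3, f_2(x) = sin(pi x),
# both of which have mean zero when X^j ~ Uniform(-1, 1).
rng = np.random.default_rng(3)
n = 400
X = rng.uniform(-1, 1, size=(n, 2))
Y = 1.0 + (X[:, 0] ** 2 - 1 / 3) + np.sin(np.pi * X[:, 1]) + 0.3 * rng.normal(size=n)

grid = np.linspace(-0.8, 0.8, 5)
f1_hat = marginal_integration(0, grid, X, Y, h=0.15)
f1_true = grid ** 2 - 1 / 3
print(np.round(f1_hat, 2))
print(np.round(f1_true, 2))
```

The other additive component integrates out because it is centered, which is exactly the role of the normalization (10.6).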
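Linton and Härdle (1996) describe a generalized additive model whose form is

$$E(Y \mid x) = G[\mu + f_1(x^1) + \cdots + f_d(x^d)],$$

where $G$ is a known function and $f_1, \ldots, f_d$ are unknown functions that satisfy a location normalization condition such as (10.6).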
The use of the nonparametric additive specification (10.5) can be illustrated by estimating the model $E(\log W \mid \mathrm{EXP}, \mathrm{EDUC}) = \mu + f_1(\mathrm{EXP}) + f_2(\mathrm{EDUC})$, where $W$ and EXP are defined as in Sect. 10.1, and EDUC denotes years of education. The data are taken from the 1993 CPS and are for white males with or fewer years of education who work full time and live in urban areas of the North Central U.S. The results are shown in Fig. 10.3. The unknown functions $f_1$ and $f_2$ are estimated by the method of Linton and Nielsen (1995) and are location-normalized. The estimates of $f_1$ (Fig. 10.3a) and $f_2$ (Fig. 10.3b) are nonlinear and differently shaped. Functions $f_1$ and $f_2$ with different shapes cannot be produced by a single index model, and a lengthy specification search might be needed to find a parametric model that produces the shapes shown in Fig. 10.3. Some of the fluctuations of the estimates of $f_1$ and $f_2$ may be artifacts of random sampling error rather than features of $E(\log W \mid \mathrm{EXP}, \mathrm{EDUC})$. However, a more elaborate analysis that takes account of the effects of random sampling error rejects the hypothesis that either function is linear.
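A transformation model has the form

$$H(Y) = X'\beta + U, \qquad (10.9)$$

where $H$ is an increasing function, $\beta$ is an unknown finite-dimensional vector of constants, and $U$ is an unobserved random variable that is statistically independent of $X$.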
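Another possibility is to assume that $H$ is unknown but that the distribution of $U$ is known. Cheng, Wei, and Ying (1995, 1997) have developed estimators for this version of (10.9). Consider, first, the problem of estimating $\beta$. Let $F$ denote the (known) cumulative distribution function (CDF) of $U$. Let $(Y_i, X_i)$ and $(Y_j, X_j)$ be two distinct, independent observations of $(Y, X)$. Then it follows from (10.9) that

$$P(Y_i > Y_j \mid X_i, X_j) = \xi[(X_i - X_j)'\beta],$$

where $\xi(z) = \int [1 - F(u - z)]\,dF(u)$ is a known function because $F$ is known.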
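A minimal sketch of estimating $\beta$ from pairwise comparisons, in the spirit of Cheng, Wei, and Ying. It solves the sample analog of $E\{X_{ij}[I(Y_i > Y_j) - \xi(X_{ij}'\beta)]\} = 0$, with $X_{ij} = X_i - X_j$, over all pairs $i < j$. Unit weights are used here as a simplifying assumption; the authors' estimating equations may weight pairs differently. The sketch also assumes $U$ is standard Gumbel. Then $U_i - U_j$ is standard logistic, so $\xi$ is the logistic CDF and the equation can be solved by Newton's method. The function names and simulated data are hypothetical.

```python
import numpy as np

def logistic_cdf(z):
    return 1.0 / (1.0 + np.exp(-z))

def pairwise_beta(X, Y, n_iter=25):
    """Solve sum_{i<j} X_ij [I(Y_i > Y_j) - xi(X_ij' b)] = 0 by Newton's method.

    Assumes U is standard Gumbel, so U_i - U_j is standard logistic and
    xi(z) = P(U_i - U_j > -z) is the logistic CDF. Only the ranks of Y are used.
    """
    i, j = np.triu_indices(len(Y), k=1)       # all distinct pairs i < j
    X_ij = X[i] - X[j]
    d_ij = (Y[i] > Y[j]).astype(float)
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = logistic_cdf(X_ij @ b)
        score = X_ij.T @ (d_ij - p)
        info = X_ij.T @ (X_ij * (p * (1 - p))[:, None])
        b = b + np.linalg.solve(info, score)  # Newton step
    return b

# Simulated data: the true H(y) = log(y) is never used by the estimator.
rng = np.random.default_rng(4)
n = 300
X = rng.normal(size=(n, 2))
beta = np.array([1.0, -0.5])
Y = np.exp(X @ beta + rng.gumbel(size=n))

b_n = pairwise_beta(X, Y)
print(b_n)
```

Because only the indicators $I(Y_i > Y_j)$ enter, the estimate is unchanged by any increasing transformation of $Y$. This is why $H$ need not be known to estimate $\beta$.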
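The problem of estimating the transformation function $H$ is addressed by Cheng, Wei, and Ying (1997). Equation (10.11) implies that for any real $y$ and vector $x$ that is conformable with $X$, $E\{I(Y \le y) - F[H(y) - X'\beta] \mid X = x\} = 0$. Cheng, Wei, and Ying (1997) propose estimating $H(y)$ by the solution to the sample analog of this equation. That is, the estimator $H_n(y)$ solves

$$n^{-1}\sum_{i=1}^{n} \{I(Y_i \le y) - F[H_n(y) - X_i'b_n]\} = 0,$$

where $b_n$ is an estimator of $\beta$.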
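A minimal sketch of solving $n^{-1}\sum_i \{I(Y_i \le y) - F[h - X_i'b]\} = 0$ for $h$ at a given $y$. Since $F$ is a CDF, the average of $F(h - X_i'b)$ is non-decreasing in $h$, so bisection finds the root. The sketch assumes $F$ is the standard Gumbel CDF $\exp(-e^{-u})$ and, for simplicity, plugs in the true $\beta$ where an estimate $b_n$ would be used in practice. The function names, search bracket, and simulated data are illustrative.

```python
import numpy as np

def gumbel_cdf(u):
    """Standard Gumbel CDF, the assumed known distribution of U."""
    return np.exp(-np.exp(-u))

def H_hat(y, X, Y, b, F=gumbel_cdf, lo=-50.0, hi=50.0, tol=1e-10):
    """Solve mean_i {I(Y_i <= y) - F(h - X_i'b)} = 0 for h by bisection."""
    target = np.mean(Y <= y)
    index = X @ b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.mean(F(mid - index)) < target:
            lo = mid              # average fitted CDF too small: increase h
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Simulated data with true H(y) = log(y) and U ~ standard Gumbel.
rng = np.random.default_rng(5)
n = 2000
X = rng.normal(size=(n, 2))
beta = np.array([1.0, -0.5])
Y = np.exp(X @ beta + rng.gumbel(size=n))

for y in [0.5, 1.0, 3.0]:
    print(y, H_hat(y, X, Y, beta), np.log(y))
```

Evaluating $H_n$ on a grid of $y$ values traces out the estimated transformation. Monotonicity of the estimating equation in $h$ is what makes a simple root-finder sufficient.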
A third possibility is to assume that $H$ and $F$ are both nonparametric in (10.9). In this case, certain normalizations are needed to make identification of (10.9) possible. First, observe that (10.9) continues to hold if $H$ is replaced by $cH$, $\beta$ is replaced by $c\beta$, and $U$ is replaced by $cU$ for any positive constant $c$. Therefore, a scale normalization is needed to make identification possible. This will be done here by setting $|\beta_1| = 1$, where $\beta_1$ is the first component of $\beta$. Observe, also, that when $H$ and $F$ are nonparametric, (10.9) is a semiparametric single index model. Therefore, identification of $\beta$ requires $X$ to have at least one component whose distribution conditional on the others is continuous and whose coefficient is non-zero. Assume without loss of generality that the components of $X$ are ordered so that the first satisfies this condition.
It can also be seen that (10.9) is unchanged if $H$ is replaced by $H + d$ and $U$ is replaced by $U + d$ for any positive or negative constant $d$. Therefore, a location normalization is also needed to achieve identification when $H$ and $F$ are nonparametric. Location normalization will be carried out here by assuming that $H(y_0) = 0$ for some finite $y_0$. With this location normalization, there is no centering assumption on $U$ and no intercept term in $X$.
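Now consider the problem of estimating $H$, $\beta$, and $F$. Because (10.9) is a single index model in this case, $\beta$ can be estimated using the methods described in Sect. 10.2.1. Let $b_n$ denote the estimator of $\beta$. One approach to estimating $H$ and $F$ is given by Horowitz (1996). To describe this approach, define $Z = X'\beta$. Let $G(\cdot \mid z)$ denote the CDF of $Y$ conditional on $Z = z$. Set $G_y(y \mid z) = \partial G(y \mid z)/\partial y$ and $G_z(y \mid z) = \partial G(y \mid z)/\partial z$. Then it follows from (10.9) that $H'(y) = -G_y(y \mid z)/G_z(y \mid z)$ and that, because $H(y_0) = 0$,

$$H(y) = -\int_{y_0}^{y} \frac{G_y(v \mid z)}{G_z(v \mid z)}\,dv$$

for any $z$ such that $G_z(v \mid z) \neq 0$ on the interval of integration.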
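The identity behind this approach can be checked numerically. Under $H(Y) = Z + U$ with $U$ independent of $Z$, $G(y \mid z) = F(H(y) - z)$. Hence $G_y = F'\,H'$ and $G_z = -F'$, so $-G_y/G_z = H'(y)$ whatever the value of $z$. The sketch below assumes a logistic $F$ and $H = \log$ (so $y_0 = 1$). It recovers $H(y)$ by integrating the finite-difference ratio with the trapezoidal rule. It only verifies the identity with known $G$; it does not estimate $G$ from data. All names are hypothetical.

```python
import numpy as np

def F(u):
    """Assumed CDF of U (logistic), for illustration only."""
    return 1.0 / (1.0 + np.exp(-u))

def H(y):
    """Assumed transformation; H(1) = 0, so y0 = 1."""
    return np.log(y)

def G(y, z):
    """CDF of Y given Z = z implied by H(Y) = Z + U."""
    return F(H(y) - z)

def H_from_G(y, z, y0=1.0, m=2001, eps=1e-5):
    """H(y) = -int_{y0}^{y} G_y(v|z) / G_z(v|z) dv via central differences."""
    v = np.linspace(y0, y, m)
    G_y = (G(v + eps, z) - G(v - eps, z)) / (2 * eps)
    G_z = (G(v, z + eps) - G(v, z - eps)) / (2 * eps)
    ratio = -G_y / G_z                                    # equals H'(v) = 1/v here
    return np.sum(0.5 * (ratio[1:] + ratio[:-1]) * np.diff(v))  # trapezoidal rule

for z in [-1.0, 0.0, 2.0]:
    print(z, H_from_G(4.0, z), np.log(4.0))
```

The result does not depend on $z$. An estimator can therefore replace $G_y$ and $G_z$ with nonparametric estimates and combine the information from many values of $z$.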
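Other estimators of $H$ when $H$ and $F$ are both nonparametric have been proposed by Ye and Duan (1997) and Chen (2002). Chen uses a rank-based approach that is in some ways simpler than that of Horowitz (1996) and may have better finite-sample performance. To describe this approach, define $d_{iy} = I(Y_i \ge y)$ and $d_{jy_0} = I(Y_j \ge y_0)$. Let $Z_i = X_i'\beta$. Then $E(d_{iy} - d_{jy_0} \mid Z_i, Z_j) \ge 0$ whenever $Z_i - Z_j \ge H(y)$. This holds because $E(d_{iy} - d_{jy_0} \mid Z_i, Z_j) = F(-Z_j) - F[H(y) - Z_i]$, which is non-negative when $H(y) - Z_i \le -Z_j$. This suggests that if $\beta$ were known, then $H(y)$ could be estimated by

$$H_n(y) = \arg\max_{h} \sum_{i \neq j} (d_{iy} - d_{jy_0})\, I(Z_i - Z_j \ge h).$$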
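A minimal sketch of this rank-based estimator with $\beta$ treated as known, so $Z_i = X_i'\beta$. The objective $\sum_{i \neq j}(d_{iy} - d_{jy_0})\,I(Z_i - Z_j \ge h)$ is a step function of $h$. Sorting the pairwise differences $Z_i - Z_j$ in decreasing order turns it into a cumulative sum, and the maximizer is found among those differences (ties are ignored for simplicity). The simulated design, with logistic $U$, $H = \log$, and $y_0 = 1$, and the name `chen_H` are illustrative assumptions.

```python
import numpy as np

def chen_H(y, y0, Y, Z):
    """argmax over h of sum_{i != j} (d_iy - d_jy0) * I(Z_i - Z_j >= h)."""
    d_y = (Y >= y).astype(float)
    d_y0 = (Y >= y0).astype(float)
    diff = Z[:, None] - Z[None, :]               # Z_i - Z_j
    w = d_y[:, None] - d_y0[None, :]             # d_iy - d_jy0
    off_diag = ~np.eye(len(Y), dtype=bool)       # exclude i == j
    diff, w = diff[off_diag], w[off_diag]
    order = np.argsort(-diff)                    # candidate thresholds, largest first
    objective = np.cumsum(w[order])              # objective at h = diff[order][k]
    return diff[order][np.argmax(objective)]

# Simulated data with beta known: H(y) = log(y), so y0 = 1 gives H(y0) = 0.
rng = np.random.default_rng(6)
n = 600
X = rng.normal(size=(n, 2))
beta = np.array([1.0, -0.5])
Z = X @ beta
Y = np.exp(Z + rng.logistic(size=n))

print(chen_H(np.e, 1.0, Y, Z))     # compare with H(e) = 1
print(chen_H(0.5, 1.0, Y, Z))      # compare with H(0.5) = log(0.5)
```

Only indicators of $Y$ and orderings of the index enter the objective, so no density or derivative estimation is needed. In practice $\beta$ would be replaced by an estimate.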