9.1 Additive Partial Linear Models

The APLM can be considered as a modification of the AM by a parametric (linear) part or as a nontrivial extension of the linear model by additive components. The main motivation for this partial linear approach is that explanatory variables are often of mixed discrete-continuous structure. Apart from that, sometimes (economic) theory may guide us how some effects inside the regression have to be modeled. As a positive side effect we will see that parametric terms can be typically estimated more efficiently.

Consider now the APLM from (9.2) in the following way:

$\displaystyle Y = c+ {\boldsymbol{U}}^\top {\boldsymbol{\beta}} + \sum_{\alpha =1}^q g_\alpha ( T_\alpha ) + \varepsilon$ (9.5)

with $ E(\varepsilon\vert{\boldsymbol{U}},{\boldsymbol{T}})=0$ and $ \mathop{\mathit{Var}}(Y\vert{\boldsymbol{U}},{\boldsymbol{T}})=\mathop{\mathit{Var}}(\varepsilon\vert{\boldsymbol{U}},{\boldsymbol{T}})=\sigma^2({\boldsymbol{U}},{\boldsymbol{T}})$. We have already presented estimators for $ {\boldsymbol{\beta}}$ and $ m=c+\sum g_\alpha$ in Section 7.2.2:
$\displaystyle {\widehat{\boldsymbol{\beta}}} = \left\{{\mathbf{U}}^\top ({\mathbf{I}}-{\mathbf{S}})^2 {\mathbf{U}}\right\}^{-1} {\mathbf{U}}^\top ({\mathbf{I}}-{\mathbf{S}})^2 {\boldsymbol{Y}}\,,$ (9.6)

$\displaystyle {\widehat{\boldsymbol{m}}} = {\mathbf{S}}({\boldsymbol{Y}}- {\mathbf{U}}{\widehat{\boldsymbol{\beta}}})\,.$ (9.7)
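The estimators (9.6) and (9.7) are easy to sketch numerically. The following Python snippet is an illustrative sketch, not code from the text: the Nadaraya-Watson smoother matrix, the Gaussian kernel, the bandwidth, and the simulated model are our own choices.

```python
import numpy as np

def speckman_fit(Y, U, T, h=0.1):
    """Partial linear estimators in the spirit of (9.6)-(9.7):
    beta_hat = {U'(I-S)^2 U}^{-1} U'(I-S)^2 Y ,  m_hat = S(Y - U beta_hat).
    S is taken here as a Nadaraya-Watson smoother matrix in a scalar T
    with a Gaussian kernel -- an illustrative choice, not the text's."""
    n = len(Y)
    K = np.exp(-0.5 * ((T[:, None] - T[None, :]) / h) ** 2)
    S = K / K.sum(axis=1, keepdims=True)   # rows sum to one
    R = np.eye(n) - S
    A = R.T @ R                            # (I-S)'(I-S); equals (I-S)^2 for symmetric S
    beta = np.linalg.solve(U.T @ A @ U, U.T @ A @ Y)   # (9.6)
    m = S @ (Y - U @ beta)                             # (9.7)
    return beta, m

# simulated check: Y = c + U*beta + g(T) + eps with known beta = 2
rng = np.random.default_rng(0)
n = 400
U = rng.binomial(1, 0.5, size=(n, 1)).astype(float)
T = rng.uniform(0, 1, n)
Y = 1.0 + 2.0 * U[:, 0] + np.sin(2 * np.pi * T) + 0.1 * rng.normal(size=n)
beta_hat, m_hat = speckman_fit(Y, U, T)
```

Since $U$ is independent of $T$ in this simulation, the smoothing step removes the additive trend and `beta_hat` recovers the slope with small error.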

As known from Denby (1986) and Speckman (1988), the parameter $ {\boldsymbol{\beta}}$ can be estimated at the $ \sqrt{n}$-rate. The problematic part is thus the estimation of the additive components, since the estimate of $ m$ lacks precision due to the curse of dimensionality. When backfitting is used for the additive components, the construction of a smoother matrix is possible in principle; however, it eludes asymptotic theory.

For this reason we consider a procedure based on marginal integration, suggested first by Fan et al. (1998). Their idea is as follows: Assume $ {\boldsymbol{U}}$ has $ M$ realizations $ {\boldsymbol{u}}^{(1)},\ldots ,{\boldsymbol{u}}^{(M)}$. We can then calculate the estimates $ \widetilde g_\alpha^{k}$ on each of the subsamples $ k=1,\ldots,M$. Finally, we average over the $ \widetilde g_\alpha^{k}$ to obtain the estimate $ \widehat g_\alpha$.

Note that this subsampling affects only the pre-estimation, when determining $ \widetilde m$ at the points over which we have to integrate. To estimate $ \widetilde m(t_\alpha,{\boldsymbol{T}}_{i\underline{\alpha}}\vert{\boldsymbol{u}}^{(k)})$, we use a local linear expansion in the direction of interest $ \alpha$ (cf. (8.16)), and minimize

$\displaystyle \sum_{i=1}^n \left\{ Y_i- \gamma_0 - \gamma_1 ( T_{i\alpha}-t_\alpha ) \right\}^2 K_h ( T_{i\alpha}-t_\alpha )\, {\mathcal{K}}_{\widetilde{\mathbf{h}}} ({\boldsymbol{T}}_{i\underline{\alpha}}-{\boldsymbol{T}}_{l\underline{\alpha}}) \Ind\{ {\boldsymbol{U}}_i = {\boldsymbol{u}}^{(k)} \}$ (9.8)

for each $ {\boldsymbol{T}}_{l\underline{\alpha}}$. The notation is the same as in Chapter 8. We estimate $ \widetilde m(t_\alpha,{\boldsymbol{T}}_{i\underline{\alpha}}\vert{\boldsymbol{u}}^{(k)})$ by the minimizing value $ \widehat\gamma_0$. Repeating this for all $ k$ and using the marginal integration technique, we obtain

$\displaystyle \widehat g_\alpha (\bullet ) = \frac 1n \sum_{l=1}^n \widetilde m ( \bullet , {\boldsymbol{T}}_{l \underline{\alpha}} \vert {\boldsymbol{U}}_{l} ) \ ,$ (9.9)

cf. equation (8.14).
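The subsample pre-estimation (9.8) and the averaging step (9.9) can be sketched as follows, here for a bivariate $\boldsymbol{T}$ and a discrete scalar $U$. This is our own illustrative sketch: the Gaussian kernels, the bandwidths, and the simulated model are assumptions, not the text's specification.

```python
import numpy as np

def g_alpha_hat(t0, alpha, Y, T, U, h=0.1, h_til=0.25):
    """Marginal integration estimate of g_alpha at t0 (cf. (9.8)-(9.9)),
    for bivariate T and a discrete scalar U.  For every l we fit a local
    linear regression in direction alpha, weighted by a kernel at the
    nuisance point T_{l,other} and by the subsample indicator U_i == U_l;
    the minimizing gamma_0 is m_tilde, which is then averaged over l."""
    n = len(Y)
    other = 1 - alpha
    m_til = np.empty(n)
    for l in range(n):
        w = (np.exp(-0.5 * ((T[:, alpha] - t0) / h) ** 2)            # K_h
             * np.exp(-0.5 * ((T[:, other] - T[l, other]) / h_til) ** 2)
             * (U == U[l]))                                          # Ind{U_i = u^(k)}
        X = np.column_stack([np.ones(n), T[:, alpha] - t0])
        coef = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * Y))
        m_til[l] = coef[0]                 # minimizing gamma_0 of (9.8)
    return m_til.mean()                    # average over l as in (9.9)

# simulated check: differences of the estimate reproduce differences of g_1
rng = np.random.default_rng(1)
n = 400
U = rng.binomial(1, 0.5, n).astype(float)
T = rng.uniform(0, 1, size=(n, 2))
Y = 0.5 + U + np.sin(np.pi * T[:, 0]) + T[:, 1] + 0.1 * rng.normal(size=n)
diff = g_alpha_hat(0.5, 0, Y, T, U) - g_alpha_hat(0.1, 0, Y, T, U)
```

Note that (9.9) identifies $g_\alpha$ only up to an additive constant (it also absorbs $c$ and the averaged remaining terms), so the check compares differences of the estimate at two points rather than levels.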

Under the same assumptions as in Sections 8.2.1 and 8.2.2, complemented by regularity conditions for $ {\boldsymbol{U}}$ and adjustments for the pdfs $ f_\alpha$, we obtain the same asymptotic results for $ \widehat g_\alpha$ as in Chapter 8. The mentioned adjustments are necessary because we now have to use the conditional densities $ f(\bullet\vert{\boldsymbol{u}})$, $ f_{\alpha}(\bullet \vert{\boldsymbol{u}})$, and $ f_{\underline{\alpha}}(\bullet\vert{\boldsymbol{u}})$; for details we refer to Fan et al. (1998). For example, when $ {\boldsymbol{U}}$ is discrete, we set $ f({\boldsymbol{t}},{\boldsymbol{u}}) = f({\boldsymbol{t}}\vert{\boldsymbol{u}}) P({\boldsymbol{U}}={\boldsymbol{u}})$.

Having estimated the nonparametric additive components $ g_\alpha$, we turn to the estimation of the parameters $ {\boldsymbol{\beta}}$ and $ c$. Note that $ c$ is now no longer the unconditional expectation of $ Y$, but covers also the parametric linear part:

$\displaystyle c=E(Y)-E({\boldsymbol{U}}^\top {\boldsymbol{\beta}})\,.$

Thus, $ c$ could be estimated by

$\displaystyle \widehat c = \overline{Y} - \frac 1n \sum_{i=1}^n {\boldsymbol{U}}_i^\top \widehat {\boldsymbol{\beta}}\,.$ (9.10)

This estimator is unbiased and achieves the parametric $ \sqrt{n}$-rate whenever the same holds for $ \widehat{\boldsymbol{\beta}}$.

How does one obtain such a $ \sqrt{n}$-consistent $ \widehat{\boldsymbol{\beta}}$? The solution is an ordinary LS regression of the partial residuals $ Y_i -\sum_\alpha \widehat g_\alpha (T_{i\alpha})$ on the $ {\boldsymbol{U}}_i$, i.e., minimizing

$\displaystyle \sum_{i=1}^n \left\{ Y_i- \sum_{\alpha} \widehat g_\alpha (T_{i\alpha}) - \gamma- {\boldsymbol{U}}_i^\top {\boldsymbol{\beta}} \right\}^2\,.$ (9.11)

If we define

$\displaystyle {\mathbf{U}}=\left(\begin{array}{cc}
1 & {\boldsymbol{U}}_1^\top \\
\vdots & \vdots \\
1 & {\boldsymbol{U}}_n^\top
\end{array}\right)\,, $

this can be written as

$\displaystyle {\widehat\gamma \choose \widehat{\boldsymbol{\beta}}} = ({\mathbf{U}}^\top{\mathbf{U}})^{-1} {\mathbf{U}}^\top \left\{ {\boldsymbol{Y}}- \sum_{\alpha} \widehat {\boldsymbol{g}}_\alpha \right\}\,.$ (9.12)

Note that $ \gamma$ stands for the constant $ c$ plus a bias caused by the nonparametric estimates $ \widehat g_\alpha$. Since an intercept is always included, this bias does not affect the estimation of the slope vector $ {\boldsymbol{\beta}}$; however, $ \widehat\gamma$ should not be used as an estimate for $ c$. Instead, the constant $ c$ should be estimated as suggested in equation (9.10). The following theorem gives the asymptotic properties of the estimator $ \widehat {\boldsymbol{\delta}}= (\widehat\gamma, \widehat {\boldsymbol{\beta}}^\top)^\top$:
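The partial-residual regression (9.11)-(9.12) together with the constant estimator (9.10) reduces to a few lines of linear algebra. In the sketch below the component estimates `g_hat` are stand-ins (the true curves plus a small perturbation), since computing (9.9) is not the point of this snippet; the simulated model and all numbers are our own illustrative choices.

```python
import numpy as np

# Sketch of (9.10)-(9.12): OLS of the partial residuals on U gives
# (gamma_hat, beta_hat); c is then estimated separately via (9.10).
rng = np.random.default_rng(2)
n = 500
U = rng.normal(size=(n, 2))
T = rng.uniform(0, 1, size=(n, 2))
g = np.column_stack([np.sin(2 * np.pi * T[:, 0]), T[:, 1] ** 2 - 1 / 3])
beta_true, c_true = np.array([1.5, -0.5]), 2.0
Y = c_true + U @ beta_true + g.sum(axis=1) + 0.2 * rng.normal(size=n)
g_hat = g + 0.05 * rng.normal(size=g.shape)       # stand-in for (9.9)

Udes = np.column_stack([np.ones(n), U])           # design matrix of (9.12)
coef, *_ = np.linalg.lstsq(Udes, Y - g_hat.sum(axis=1), rcond=None)
gamma_hat, beta_hat = coef[0], coef[1:]
c_hat = Y.mean() - (U @ beta_hat).mean()          # estimator (9.10)
```

As the text notes, `gamma_hat` absorbs the bias of the nonparametric fits, so `c_hat` from (9.10) is the estimate of the constant one should report.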

THEOREM 9.1  
Under the aforementioned assumptions and $ h=o(n^{-1/4})$ it holds that

$\displaystyle \sqrt{n}(\widehat{{\boldsymbol{\delta}}}-{\boldsymbol{\delta}}) \mathrel{\mathop{\longrightarrow}\limits_{}^{L}} N(0,{\mathbf{V}}^{-1} {\mathbf{\Sigma}}{\mathbf{V}}^{-1})
$

where

$\displaystyle {\mathbf{V}}=E\, {\boldsymbol{U}}{\boldsymbol{U}}^\top\,,\quad
{\mathbf{\Sigma}} = E\left\{ \sigma^2({\boldsymbol{U}},{\boldsymbol{T}})\, {\boldsymbol{Z}}{\boldsymbol{Z}}^\top \right\}
+ \mathop{\mathit{Var}}\left( \sum_{\alpha}{\boldsymbol{V}}_{\alpha} \right)
$

and

$\displaystyle {\boldsymbol{Z}}= {\boldsymbol{U}}- \sum_{\alpha}
\frac{ f_{\underline{\alpha}}({\boldsymbol{T}}_{\underline{\alpha}}) }{ f({\boldsymbol{T}}) }\,
f_{\alpha}(T_{\alpha}) \,E\left( {\boldsymbol{U}}\vert T_{\alpha} \right)\,,
$

$\displaystyle {\boldsymbol{V}}_{\alpha} = E({\boldsymbol{U}}) \left[ \left\{\sum_{j\neq\alpha} g_{j}(T_{j})\right\}
- E\left\{ \sum_{j\neq\alpha} g_{j}(T_{j}) + {\boldsymbol{U}}^\top{\boldsymbol{\delta}}
\right\} \right] \,. $

The condition on the bandwidth means that to obtain a $ \sqrt{n}$-consistent estimator $ \widehat{\boldsymbol{\beta}}$, we have to undersmooth in the nonparametric part. A careful study of the proof reveals a bias term of order $ h^2$, which for $ h=o(n^{-1/4})$ vanishes faster than $ n^{-1/2}$. Otherwise $ \widehat{\boldsymbol{\beta}}$ would inherit the bias from the $ \widehat g_\alpha$s, as it is based on a regression of the partial residuals $ {\boldsymbol{Y}}-\sum_\alpha \widehat{\boldsymbol{g}}_\alpha$.

Unfortunately, in practice this procedure is hardly feasible if $ {\boldsymbol{U}}$ has too many distinct realizations. For that case, Fan et al. (1998) suggest a modification that goes back to an idea of Carroll et al. (1997). Their approach leads to much more complicated asymptotic expressions, so that we only sketch the algorithm.

The problem with (quasi-)continuous $ {\boldsymbol{U}}$ is that we can no longer work on subsamples. Instead, we estimate the impact of $ {\boldsymbol{U}}$ and $ {\boldsymbol{T}}$ simultaneously. More precisely, $ {\boldsymbol{\beta}}$ is estimated by minimizing

$\displaystyle \sum_{i=1}^n \{ Y_{i}- \gamma_0 -\gamma_1 (T_{i\alpha}-t_{\alpha}) - {\boldsymbol{U}}_i^\top {\boldsymbol{\beta}} \}^2\, K_h (T_{i\alpha}-t_{\alpha})\, {\mathcal{K}}_{\widetilde{\mathbf{h}}} ({\boldsymbol{T}}_{i\underline{\alpha}}- {\boldsymbol{T}}_{l\underline{\alpha}})$ (9.13)

when calculating the pre-estimate $ \widetilde m$ for all $ t_\alpha, {\boldsymbol{T}}_{l\underline{\alpha}}$. Note that $ \widehat \gamma_0$ is not really an estimate for $ m$ but rather for $ c+ \sum_{\alpha} g_\alpha$ at the point $ (t_\alpha, {\boldsymbol{T}}_{l\underline{\alpha}})$. Consequently, we will from now on use $ \widehat \gamma_0$ in place of $ \widetilde m$. The centered average over the $ \widehat \gamma_0( t_\alpha , {\boldsymbol{T}}_{l\underline{\alpha}})$ is then the estimate for $ g_\alpha$:

$\displaystyle \widehat{g}_{\alpha} (\bullet)
= \frac 1n \sum_{i=1}^n\widehat{\gamma}_0(\bullet,{\boldsymbol{T}}_{i\underline{\alpha}})
- \frac 1{n^2} \sum_{i=1}^n \sum_{l=1}^n
\widehat{\gamma}_0(T_{i\alpha},{\boldsymbol{T}}_{l\underline{\alpha}})\,. $

The estimation of the parameter $ {\boldsymbol{\beta}}$ can again be done by ordinary LS as in (9.11) and (9.12).
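One building block of this algorithm, the weighted LS problem (9.13) that is local linear in $T_\alpha$ but globally linear in $\boldsymbol{U}$, can be sketched as below. Again this is our own illustration: the Gaussian kernels, bandwidths, and the simulated model are assumptions, not the text's specification.

```python
import numpy as np

def local_fit(t0, l, alpha, Y, T, U, h=0.1, h_til=0.25):
    """One weighted LS problem of the form (9.13): local linear in the
    direction alpha, globally linear in U.  Returns the minimizing
    gamma_0 (an estimate of c + sum g at (t0, T_{l,other})) and the
    locally fitted beta."""
    n = len(Y)
    other = 1 - alpha
    w = (np.exp(-0.5 * ((T[:, alpha] - t0) / h) ** 2)
         * np.exp(-0.5 * ((T[:, other] - T[l, other]) / h_til) ** 2))
    X = np.column_stack([np.ones(n), T[:, alpha] - t0, U])
    sol = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * Y))
    return sol[0], sol[2:]          # gamma_0-hat and the local beta-hat

# simulated check with continuous U
rng = np.random.default_rng(3)
n = 400
U = rng.normal(size=(n, 1))
T = rng.uniform(0, 1, size=(n, 2))
Y = 1.0 + 0.8 * U[:, 0] + np.sin(np.pi * T[:, 0]) + T[:, 1] \
    + 0.1 * rng.normal(size=n)
g0_a, beta_loc = local_fit(0.5, 0, 0, Y, T, U)
g0_b, _ = local_fit(0.1, 0, 0, Y, T, U)
```

Holding $l$ fixed, differences of $\widehat\gamma_0$ in $t_\alpha$ track differences of $g_\alpha$, and the local $\widehat{\boldsymbol{\beta}}$ approximates the true slope; the full algorithm then centers and averages these fits as described above.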

EXAMPLE 9.1  
We consider data from 1992 on female labor supply in Eastern Germany. The aim of this example is to study the impact of various factors on female labor supply, which is measured by weekly hours of work ($ Y$). The underlying data set is a subsample from the German Socio Economic Panel (GSOEP) of $ n=607$ women having a job and living together with a partner. The explanatory variables are:
$ U_1$: female has children younger than $ 16$ years (1 if yes),
$ U_2$: unemployment rate in the state where she lives,
$ T_1$: age (in years),
$ T_2$: wages (per hour),
$ T_3$: "Treiman prestige index" of the job (Treiman, 1975),
$ T_4$: rent or redemption for the flat,
$ T_5$: years of education,
$ T_6$: monthly net income of husband or partner.
Here, variable $ T_5$ reflects human capital, $ T_2$ and $ T_3$ represent the particular attractiveness of the job, and $ T_4$ is the main part of the household's expenditures. Note that since there are only five East German states, $ U_2$ cannot take more than $ 5$ different values.

Figure 9.1: Estimates of additive components with approximate 90% confidence intervals (left), density estimates of the regressor (right), female labor supply data, variables $ T_1$ to $ T_3$
\includegraphics[width=1.3\defpicwidth]{SPMfmhx1.ps}

Figure 9.2: Estimates of additive components with approximate 90% confidence intervals (left), density estimates of the regressor (right), female labor supply data, variables $ T_4$ to $ T_6$
\includegraphics[width=1.3\defpicwidth]{SPMfmhx2.ps}

The parametric coefficients are estimated as $ \widehat\beta_1 = -1.457$ (female has children) and $ \widehat\beta_2 = 0.5119$ (unemployment rate). Figures 9.1 and 9.2 show the estimated curves $ \widehat g_\alpha$. The approximate confidence intervals are constructed as $ 1.64$ times the (estimated) pointwise standard deviation of the curve estimates. The displayed point clouds are the partial residuals, i.e. $ Y-{\boldsymbol{U}}^\top \widehat{{\boldsymbol{\beta}}} -\sum_{j\neq\alpha} \widehat{g}_j (T_j) $.

The plots show clear nonlinearities, a fact that has been confirmed in Härdle et al. (2001) when testing these additive functions for linearity. If the model is chosen correctly, the results quantify the extent to which each variable affects the female labor supply. For a comparison with a parametric model we show in Table 9.1 the results of a parametric least squares analysis.


Table 9.1: OLS coefficients for female labor supply data

Source         Sum of Squares    df    Mean Square    $ F$-ratio
Regression           6526.3      10         652.6          9.24
Residual            42101.1     596          70.6

$ R^2$ = 13.4%,  $ R^2$ (adjusted) = 12.0%

Variable       Coefficient     S.E.     $ t$-value    $ p$-value
constant            1.36       8.95        0.15        0.8797
CHILDREN           -2.63       1.09       -2.41        0.0163
UNEMPLOYMENT        0.48       0.22        2.13        0.0333
AGE                 1.63       0.43        3.75        0.0002
AGE$ ^2$           -0.021      0.0054     -3.82        0.0001
WAGES              -1.07       0.18       -6.11        $ \leq $ 0.0001
WAGES$ ^2$          0.017      0.0033      4.96        $ \leq $ 0.0001
PRESTIGE            0.13       0.034       3.69        0.0002
EDUCATION           0.66       0.19        3.58        0.0004
HOUSING             0.0018     0.0012      1.56        0.1198
NETINCOME          -0.0016     0.0003     -4.75        $ \leq $ 0.0001

As can be seen from the table, $ T^2_1 $ (squared AGE) and $ T^2_2 $ (squared WAGES) have been added, and indeed their presence is highly significant. Their introduction was motivated by the nonparametric estimates $ \widehat g_1$ and $ \widehat g_2$. Clearly, the piecewise linear shapes of $ g_5$ and $ g_6$ are harder to capture in a parametric model. Here, at least the signs of the estimated parameters agree with the slopes of the nonparametric estimates in the upper part. Both factors are highly significant, but the nonparametric analysis suggests that they are less influential for young women and for the low income group. $ \Box$