7.3 Testing the GPLM

Having estimated the function $ m(\bullet)$, it is natural to ask whether the estimate $ \widehat{m}(\bullet)$ differs significantly from a parametric function obtained by a parametric GLM fit. In the simplest case, this amounts to testing

$\displaystyle H_0: \; m({\boldsymbol{T}}) = {\boldsymbol{T}}^\top {\boldsymbol{\gamma}}+ \gamma_0$
$\displaystyle H_1: \; m(\bullet) \textrm{ is an arbitrary smooth function}.$

A test statistic for this test problem is typically based on a semiparametric generalization of the parametric likelihood ratio test.

We will discuss two approaches here: Hastie & Tibshirani (1990) propose to use the difference of the deviances of the linear and the semiparametric model, respectively, and to approximate the degrees of freedom in the semiparametric case. The asymptotic behavior of this method is unknown, though. Härdle, Mammen & Müller (1998) derive an asymptotic normal distribution for a slightly modified test statistic.

7.3.1 Likelihood Ratio Test with Approximate Degrees of Freedom

In the following we denote the semiparametric estimates by $ \widehat{\mu}_i=G\{{\boldsymbol{U}}_i^\top \widehat{{\boldsymbol{\beta}}} + \widehat{m}({\boldsymbol{T}}_i)\}$ and the parametric estimates by $ \widetilde{\mu}_i=G\{{\boldsymbol{U}}_i^\top \widetilde{{\boldsymbol{\beta}}} + {\boldsymbol{T}}_i^\top \widetilde{{\boldsymbol{\gamma}}} + \widetilde{\gamma}_0\}$. A natural approach is to compare both estimates by the likelihood ratio test statistic

$\displaystyle LR = 2\sum\limits^n_{i=1} \left\{ \ell(Y_i,\widehat{\mu}_i,\widehat\psi) - \ell(Y_i,\widetilde{\mu}_i,\widehat\psi) \right\},$ (7.28)

which would have an asymptotic $ \chi^2$ distribution if the estimates $ \widehat{\mu}_i$ came from a nesting parametric fit.

This test statistic can be used in the semiparametric case, too. However, an approximate number of degrees of freedom needs to be defined for the GPLM. The basic idea is as follows. Recall that $ D({\boldsymbol{Y}},\widehat{\boldsymbol{\mu}},\psi)$ is the deviance of the observations $ Y_i$ and fitted values $ \widehat\mu_i$, see (5.19). Abbreviate the estimated index by $ \widehat{\boldsymbol{\eta}}={\mathbf{U}}\widehat{\boldsymbol{\beta}}+ \widehat{{\boldsymbol{m}}}$ and consider the adjusted dependent variable $ {\boldsymbol{Z}}= \widehat{\boldsymbol{\eta}}- {\mathbf{W}}^{-1} {\boldsymbol{v}}$. If at convergence of the iterative estimation $ \widehat{\boldsymbol{\eta}}= {\mathbf{R}}{\boldsymbol{Z}}= {\mathbf{R}}(\widehat{\boldsymbol{\eta}}- {\mathbf{W}}^{-1} {\boldsymbol{v}})$ with a linear operator $ {\mathbf{R}}$, then

$\displaystyle a(\psi)\,D({\boldsymbol{Y}},\widehat{\boldsymbol{\mu}},\psi) \approx ({\boldsymbol{Z}}- \widehat{\boldsymbol{\eta}})^\top {\mathbf{W}}({\boldsymbol{Z}}- \widehat{\boldsymbol{\eta}})$ (7.29)

which has approximately

$\displaystyle df^{err}(\widehat{{\boldsymbol{\mu}}})= n - \mathop{\hbox{tr}}\left(2{\mathbf{R}}- {\mathbf{R}}^\top {\mathbf{W}}{\mathbf{R}}{\mathbf{W}}^{-1} \right)$ (7.30)

degrees of freedom. In practice, the computation of the trace $ \mathop{\hbox{tr}}\left({\mathbf{R}}^\top {\mathbf{W}}{\mathbf{R}}{\mathbf{W}}^{-1} \right)$ can be rather difficult. It is also possible to use the simpler approximation

$\displaystyle df^{err}(\widehat{{\boldsymbol{\mu}}})= n - \mathop{\hbox{tr}}\left({\mathbf{R}}\right),$ (7.31)

which would be correct if $ {\mathbf{R}}$ were a projection operator and $ {\mathbf{W}}$ were the identity matrix. Now, for the comparison of the semiparametric $ \widehat{\boldsymbol{\mu}}$ and the parametric $ \widetilde{\boldsymbol{\mu}}$, the test statistic (7.28) can be expressed as

$\displaystyle LR = D({\boldsymbol{Y}},\widetilde{{\boldsymbol{\mu}}},\psi) - D({\boldsymbol{Y}},\widehat{{\boldsymbol{\mu}}},\psi)$

and should follow approximately a $ \chi^2$ distribution with $ df^{err}(\widetilde{{\boldsymbol{\mu}}})-df^{err}(\widehat{{\boldsymbol{\mu}}})$ degrees of freedom.

Property (7.29) holds for backfitting and the generalized Speckman estimator with matrices $ {\mathbf{R}}^B$ and $ {\mathbf{R}}^S$, respectively. A direct application to the profile likelihood algorithm is not possible because of the more involved estimation of the nonparametric function $ m(\bullet)$. However, a workable approximation can be obtained by using

$\displaystyle {\mathbf{R}}^P = \widetilde{\mathbf{U}}\{\widetilde{\mathbf{U}}^\top {\mathbf{W}}\widetilde{\mathbf{U}}\}^{-1} \widetilde{\mathbf{U}}^\top {\mathbf{W}}({\mathbf{I}}- {\mathbf{S}}^P) + {\mathbf{S}}^P,$ (7.32)

where $ \widetilde{\mathbf{U}}$ denotes $ ({\mathbf{I}}- {\mathbf{S}}^P) {\mathbf{U}}$.
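
To make the procedure concrete, here is a minimal sketch in Python (NumPy/SciPy). The function names are illustrative, and we assume the operator $ {\mathbf{R}}$, the weight matrix $ {\mathbf{W}}$ and the deviances are available from the fitting routines; this is not code from any particular package.

```python
import numpy as np
from scipy import stats

def df_err(R, W=None):
    """Approximate residual degrees of freedom, cf. (7.30) and (7.31).

    R : (n, n) linear operator with eta_hat = R Z at convergence
    W : (n, n) weight matrix from the last iteration; if omitted,
        the simpler approximation (7.31) is used, which would be
        exact for a projection operator R and identity weights
    """
    n = R.shape[0]
    if W is None:
        return n - np.trace(R)                           # (7.31)
    W_inv = np.linalg.inv(W)
    return n - np.trace(2 * R - R.T @ W @ R @ W_inv)     # (7.30)

def lr_test(dev_parametric, dev_semiparametric, df_parametric, df_semi):
    """LR = D(Y, mu_tilde) - D(Y, mu_hat) with approximate chi^2 law."""
    lr = dev_parametric - dev_semiparametric
    df = df_parametric - df_semi
    return lr, df, stats.chi2.sf(lr, df)   # statistic, df, p-value
```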

7.3.2 Modified Likelihood Ratio Test

The direct comparison of the semiparametric estimates $ \widehat{\mu}_i$ and the parametric estimates $ \widetilde{\mu}_i$ can be misleading because $ \widehat{m}(\bullet)$ has a non-negligible smoothing bias, even under the linearity hypothesis. Hence, the key idea is to use an estimate $ \overline{m}({\boldsymbol{T}}_i)$ that introduces a comparable smoothing bias into the parametric fit $ {\boldsymbol{T}}_i^\top \widetilde{{\boldsymbol{\gamma}}}+ \widetilde{\gamma}_0$. This estimate can be obtained by applying the updating procedure for $ m_i$ to the parametric estimate. Note that here the second argument of $ L(\bullet,\bullet)$ should be the parametric estimate of $ E(Y_i\vert {\boldsymbol{U}}_i, {\boldsymbol{T}}_i)$ instead of $ Y_i$, which means applying the smoothing step according to (7.8) to the artificial data set consisting of $ \{G({\boldsymbol{U}}_i^\top \widetilde{{\boldsymbol{\beta}}} +{\boldsymbol{T}}_i^\top \widetilde{{\boldsymbol{\gamma}}} +\widetilde\gamma_0), {\boldsymbol{U}}_i, {\boldsymbol{T}}_i\}$.

Using this ``bias-adjusted'' parametric estimate $ \overline{m}(\bullet)$, one can form the test statistic

$\displaystyle \widetilde{LR} = 2\sum\limits^n_{i=1} \left\{ \ell(\widehat{\mu}_i,\widehat{\mu}_i,\widehat\psi) - \ell(\overline{\mu}_i,\widetilde{\mu}_i,\widehat\psi) \right\},$ (7.33)

where $ \overline{\mu}_i = G\{{\boldsymbol{U}}_i^\top \widetilde{{\boldsymbol{\beta}}} + \overline{m}({\boldsymbol{T}}_i)\}$ is the bias-adjusted parametric GLM fit and $ \widehat{\mu}_i$ is the semiparametric GPLM fit to the observations. Asymptotically, this test statistic is equivalent to

$\displaystyle \widetilde{\widetilde{LR}}= \frac{1}{a(\widehat\psi)}\, \sum\limits^n_{i=1} W_{ii} \left\{{\boldsymbol{U}}_i^\top (\widehat{{\boldsymbol{\beta}}}-\widetilde{{\boldsymbol{\beta}}}) + \widehat{m}({\boldsymbol{T}}_i) - \overline{m}({\boldsymbol{T}}_i)\right\}^2\,.$ (7.34)

Hence, the resulting test statistic can be interpreted as a weighted quadratic difference of the (bias-adjusted) parametric predictor $ {\boldsymbol{U}}_i^\top \widetilde{{\boldsymbol{\beta}}}+\overline{m}({\boldsymbol{T}}_i)$ and the semiparametric predictor $ {\boldsymbol{U}}_i^\top \widehat{{\boldsymbol{\beta}}}+\widehat{m}({\boldsymbol{T}}_i)$.
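
The asymptotic form (7.34) is straightforward to compute once both fits are available. Below is a minimal sketch in Python; all names are illustrative, and using the IRLS weights $ W_{ii}$ as the weighting follows the reconstruction above and is an assumption of this sketch.

```python
import numpy as np

def modified_lr_quadratic(U, w, a_psi, beta_hat, m_hat, beta_tilde, m_bar):
    """Weighted quadratic form (7.34) of the modified LR statistic.

    U     : (n, p) design matrix of the linear part
    w     : (n,) IRLS weights (assumed weighting in this sketch)
    m_hat : semiparametric estimate m_hat(T_i), shape (n,)
    m_bar : bias-adjusted parametric estimate m_bar(T_i), obtained by
            applying the GPLM smoothing step to the parametric fitted
            values instead of to Y
    """
    # difference of semiparametric and bias-adjusted parametric predictors
    diff = U @ (beta_hat - beta_tilde) + (m_hat - m_bar)
    return np.sum(w * diff**2) / a_psi
```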

Both test statistics $ \widetilde{LR}$ and $ \widetilde{\widetilde{LR}}$ have the same asymptotic normal distribution if the profile likelihood algorithm is used. (A $ \chi^2$ approximation does not hold in this case since kernel smoother matrices are not projection operators.) It turns out, however, that the normal approximation does not work well. Therefore, it is recommended to approximate the quantiles of the test statistic by the following bootstrap procedure:

(a)
Generate samples $ Y_1^*, \ldots, Y^*_n$ with

$\displaystyle E^*(Y_i^*) = G({\boldsymbol{U}}_i^\top \widetilde{{\boldsymbol{\beta}}} + {\boldsymbol{T}}_i^\top \widetilde{{\boldsymbol{\gamma}}} + \widetilde\gamma_0),$
$\displaystyle {\mathop{\mathit{Var}}}^*(Y_i^*) = {a(\widehat\psi)}\, V\{G({\boldsymbol{U}}_i^\top \widetilde{{\boldsymbol{\beta}}} + {\boldsymbol{T}}_i^\top \widetilde{{\boldsymbol{\gamma}}} + \widetilde\gamma_0)\}.$

(b)
Calculate the estimates based on the bootstrap samples and finally the test statistics $ \widetilde{LR}^{*}$. The quantiles of the distribution of $ \widetilde{LR}$ are estimated by the quantiles of the conditional distribution of $ \widetilde{LR}^{*}$.

There are several possibilities for the choice of the conditional distribution of the $ Y_i^*$. In a binary response model, the distribution of $ Y_i$ is completely specified by $ \mu_i = G({\boldsymbol{U}}_i^\top {\boldsymbol{\beta}}+ {\boldsymbol{T}}_i^\top {\boldsymbol{\gamma}}+ \gamma_0)$ and a parametric bootstrap procedure can be used. If the distribution of $ Y_i$ cannot be specified (apart from the first two moments), one may use the wild bootstrap procedure of Härdle & Mammen (1993).
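
The whole procedure fits in a few lines of code. The sketch below treats the binary response case, where step (a) reduces to a parametric bootstrap from the fitted GLM means; fit_glm_means and modified_lr are placeholder routines, assumed to return the parametric fitted means $ \widetilde{\mu}_i$ and the statistic $ \widetilde{LR}$, respectively.

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_test(Y, U, T, fit_glm_means, modified_lr,
                   n_boot=400, level=0.05):
    """Bootstrap approximation of the critical value of LR-tilde.

    fit_glm_means : returns the parametric fitted means mu_tilde_i
    modified_lr   : computes the test statistic from a response vector
    (both are placeholder routines in this sketch)
    """
    mu_tilde = fit_glm_means(Y, U, T)      # fit under H0 (GLM)
    lr_obs = modified_lr(Y, U, T)          # observed statistic

    lr_star = np.empty(n_boot)
    for b in range(n_boot):
        # (a) binary response: distribution fully specified by mu_tilde
        Y_star = rng.binomial(1, mu_tilde).astype(float)
        # (b) recompute the statistic on the bootstrap sample
        lr_star[b] = modified_lr(Y_star, U, T)

    critical_value = np.quantile(lr_star, 1 - level)
    return lr_obs, critical_value, np.mean(lr_star >= lr_obs)
```

If only the first two moments of $ Y_i$ can be specified, the resampling line would be replaced by a wild bootstrap draw.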

EXAMPLE 7.2  
Finally, consider the testing problem for Example 7.1. From Figure 7.1 it is difficult to judge the significance of the nonlinearity. For this real data example, it cannot be excluded that the difference between the nonparametric and the linear fit is caused by boundary and bias problems of $ \widehat{m}(\bullet)$. Additionally, the variable age (included in a linear way) has a dominant influence in the model.


Table 7.3: Observed significance levels for testing GLM versus GPLM, migration data, 400 bootstrap replications

$ h$                                                 0.20    0.30    0.40
$ {LR}$ (profile likelihood)                         0.066   0.048   0.035
$ {LR}$ (generalized Speckman)                       0.068   0.047   0.033
$ {LR}$ (backfitting)                                0.073   0.062   0.068
$ {LR}$ (modified backfitting)                       0.068   0.048   0.035
$ \widetilde{LR}$ (profile likelihood, bootstrap)    0.074   0.060   0.052

Table 7.3 shows the results of applying the different test statistics for different choices of the bandwidth $ h$. As we have seen in the simulations, the likelihood ratio test statistic $ LR$ and the modified test statistic $ \widetilde{LR}$ in combination with the bootstrap give very similar results. The number of bootstrap replications was chosen as $ n_{boot}=400$. Linearity is clearly rejected (at the 10% level) for all bandwidths from $ 0.2$ to $ 0.4$.

The different behavior of the tests for different $ h$ gives some indication of how $ m(\bullet)$ may deviate from a linear function. The small wiggles of short length do not appear to be significant under the bootstrap ($ h=0.2$). Also, the bootstrapped $ \widetilde{LR}$ still rejects for large values of $ h$. This is due to the comparison of the semiparametric estimator with a bias-corrected parametric one, which makes the test less dependent on the bandwidth. $ \Box$

Partial linear models were first considered by Green & Yandell (1985), Denby (1986), Speckman (1988) and Robinson (1988b). For a combination with spline smoothing see also Schimek (2000a), Eubank et al. (1998) and the monograph of Green & Silverman (1994).

The extension of the partial linear and additive models to generalized regression models with link function is mainly considered in Hastie & Tibshirani (1986) and their monograph Hastie & Tibshirani (1990). They employed the observation of Nelder & Wedderburn (1972) and McCullagh & Nelder (1989) that the parametric GLM can be estimated by applying a weighted least squares estimator to the adjusted dependent variable and modified the LS estimator in a semi-/nonparametric way. Formal asymptotic results for the GPLM using Nadaraya-Watson type smoothing were first obtained by Severini & Wong (1992) and applied to this specific model by Severini & Staniswalis (1994). An illustration for the use of the profile likelihood and its efficiency is given by Staniswalis & Thall (2001).

The theoretical ideas for testing the GPLM using a likelihood ratio test and approximate degrees of freedom go back to Buja et al. (1989) and Hastie & Tibshirani (1990). The bootstrap procedure for comparing parametric versus nonparametric functions was formally discussed in Härdle & Mammen (1993). The theoretical results of Härdle, Mammen & Müller (1998) have been empirically analyzed by Müller (2001).

EXERCISE 7.1   Explain why (7.21) and (7.22) hold.

EXERCISE 7.2   Derive the backfitting estimators (7.24) and (7.25).

EXERCISE 7.3   Prove that the linear estimation matrix for backfitting in the GPLM case has form (7.26).


Summary
$ \ast$
A partial linear model (PLM) is given by

$\displaystyle E(Y\vert{\boldsymbol{X}})={\boldsymbol{X}}^\top{\boldsymbol{\beta}}+ m({\boldsymbol{T}}),$

where $ {\boldsymbol{\beta}}$ is an unknown parameter vector and $ m(\bullet)$ is an unknown smooth function of a multidimensional argument $ {\boldsymbol{T}}$.
$ \ast$
A generalized partial linear model (GPLM) is of the form

$\displaystyle E(Y\vert{\boldsymbol{X}})=G\{{\boldsymbol{X}}^\top{\boldsymbol{\beta}}+ m({\boldsymbol{T}})\},$

where $ G$ is a known link function, $ {\boldsymbol{\beta}}$ is an unknown parameter vector and $ m(\bullet)$ is an unknown smooth function of a multidimensional argument $ {\boldsymbol{T}}$.
$ \ast$
Partial linear models are usually estimated by Speckman's estimator. This estimator first determines the parametric component by applying an OLS estimator to a nonparametrically modified design matrix and response vector. In a second step, the nonparametric component is estimated by smoothing the residuals w.r.t. the parametric part (see the sketch after this summary).
$ \ast$
The profile likelihood approach is based on the fact that the conditional distribution of $ Y$ given $ {\boldsymbol{U}}$ and $ {\boldsymbol{T}}$ is parametric. Its idea is to estimate the least favorable nonparametric function $ m_{\boldsymbol{\beta}}(\bullet)$ as a function of $ {\boldsymbol{\beta}}$. The resulting estimate of $ m_{\boldsymbol{\beta}}(\bullet)$ is then used to construct the profile likelihood for $ {\boldsymbol{\beta}}$.
$ \ast$
The generalized Speckman estimator can be seen as a simplification of the profile likelihood method. It is based on a combination of a parametric IRLS estimator (applied to a nonparametrically modified design matrix and response vector) and a nonparametric smoothing method (applied to the adjusted dependent variable reduced by its parametric component).
$ \ast$
Generalized partial linear models should be estimated by the profile likelihood method or by a generalized Speckman estimator.
$ \ast$
To check whether the underlying true model is a parametric GLM or a semiparametric GPLM, one can use specification tests that are modifications of the classical likelihood ratio test. In the semiparametric setting, either an approximate number of degrees of freedom is used or the test statistic itself is modified such that bootstrapping its distribution leads to appropriate critical values.
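
As referenced in the summary, the following is a minimal sketch of Speckman's two-step PLM estimator, assuming a precomputed kernel smoother matrix S for the nonparametric direction $ {\boldsymbol{T}}$ (all names are illustrative, not from any particular package).

```python
import numpy as np

def speckman_plm(Y, X, S):
    """Speckman's estimator for E(Y|X,T) = X' beta + m(T).

    S : (n, n) kernel smoother matrix in T (e.g. Nadaraya-Watson
        weights), assumed to be precomputed
    """
    n = len(Y)
    I = np.eye(n)
    X_tilde = (I - S) @ X          # nonparametrically modified design
    Y_tilde = (I - S) @ Y          # ... and response
    # step 1: OLS on the modified data yields the parametric part
    beta_hat = np.linalg.lstsq(X_tilde, Y_tilde, rcond=None)[0]
    # step 2: smooth the residuals w.r.t. the parametric part
    m_hat = S @ (Y - X @ beta_hat)
    return beta_hat, m_hat
```

The generalized Speckman estimator for the GPLM proceeds analogously within each IRLS iteration: the OLS step is replaced by a weighted least squares step on the adjusted dependent variable, and the smoothing step is applied to the adjusted dependent variable reduced by its parametric component.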