6.1 Estimating GPLMs

As mentioned above, a GPLM has the form

$\displaystyle E(Y\vert X,T) = G\{X^T\beta + m(T)\},$

where $ E(Y\vert X,T)$ denotes the expected value of the dependent variable $ Y$ given the vectors of explanatory variables $ X$ and $ T$. The index $ X^T\beta + m(T)$ is linked to the dependent variable $ Y$ via a known function $ G(\bullet)$, called the link function in analogy to generalized linear models (GLM). The parameter vector $ \beta$ and the function $ m(\bullet)$ need to be estimated. Typically, generalized partial linear models are considered for $ Y$ from an exponential family. We therefore assume for the variance $ Var (Y\vert X,T)= \sigma^2 V[G\{X^T\beta + m(T)\}]$, i.e. a dependence on the index $ X^T\beta + m(T)$ and on a dispersion parameter $ \sigma^2$.
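For a binary response $ Y \in \{0,1\}$, for example, one may take the logistic link $ G(u) = \{1+\exp(-u)\}^{-1}$; then $ V(\mu) = \mu(1-\mu)$ and $ \sigma^2 = 1$, so that $ Var (Y\vert X,T) = G\{X^T\beta + m(T)\}\,[1 - G\{X^T\beta + m(T)\}]$.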


6.1.1 Models

It is easy to see that the GPLM covers a range of parametric and semiparametric models, for example:

- the generalized linear model (GLM) $ E(Y\vert X) = G(X^T\beta)$, if $ m(\bullet)$ reduces to a constant,
- the partial linear model $ E(Y\vert X,T) = X^T\beta + m(T)$, if $ G$ is the identity function,
- the generalized nonparametric regression $ E(Y\vert T) = G\{m(T)\}$, if the linear part $ X^T\beta$ is absent.


6.1.2 Semiparametric Likelihood

The estimation methods for the GPLM are based on the idea that an estimate $ \widehat{\beta }$ can be found for known $ m(\bullet)$, and an estimate $ \widehat{m}(\bullet)$ can be found for known $ \beta$. The gplm quantlib implements profile likelihood estimation and backfitting. Details on the estimation procedure can be found in Hastie and Tibshirani (1990), Severini and Staniswalis (1994), Härdle, Mammen, and Müller (1998), Müller (1997).

The default numerical algorithm for likelihood maximization is the Newton-Raphson iteration. Optionally, Fisher scoring can be chosen.
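Both procedures coincide whenever the second derivative of the individual log-likelihood with respect to the index does not depend on $ y_i$, as is the case for canonical links such as the logit link for a binomial response; otherwise Fisher scoring replaces this derivative by its expectation.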


6.1.2.0.1 Profile Likelihood

Denote by $ L(\mu,y)$ the individual log-likelihood or (if the distribution of $ Y$ does not belong to an exponential family) quasi-likelihood function

$\displaystyle L(\mu, y) = \int\limits^y_{\mu} \frac{(s-y)}{V(s)}\,ds.$
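For a Poisson-type variance function $ V(s)=s$, for instance, this integral evaluates to

$\displaystyle L(\mu, y) = \int\limits^y_{\mu} \frac{(s-y)}{s}\,ds = (y-\mu) - y \log\frac{y}{\mu},$

which agrees with the Poisson log-likelihood $ y\log\mu - \mu$ up to terms depending on $ y$ only.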

The profile likelihood method considered in Severini and Wong (1992) and Severini and Staniswalis (1994) is based on the fact that the conditional distribution of $ Y$ given $ X$ and $ T$ is parametric. The essential idea is to fix the parameter $ \beta$ and to estimate the least favorable nonparametric function $ m_\beta (\bullet)$ for this fixed $ \beta$. The resulting estimate is then used to construct the profile likelihood for $ \beta$.

Suppose we have observations $ \{y_i, x_i, t_i\}$, $ i=1, \ldots, n$. Denote the individual log- or quasi-likelihood in $ y_i$ by

$\displaystyle \ell_i(\eta) = L\{G(\eta),y_i\}.$
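For a binary response with logistic link $ G(\eta) = \{1+\exp(-\eta)\}^{-1}$, for example, this is $ \ell_i(\eta) = y_i\,\eta - \log\{1+\exp(\eta)\}$, with first and second derivatives $ y_i - G(\eta)$ and $ -G(\eta)\{1-G(\eta)\}$ with respect to $ \eta$.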

In the following, $ \ell'_i$ and $ \ell''_i$ denote the derivatives of $ \ell_i(\eta)$ with respect to $ \eta$. Abbreviate now $ m_j=m_\beta (t_j)$ and define the smoother matrix $ S^P$ with elements

$\displaystyle S^P_{ij} = \frac{\ell''_i(x_i^T{\beta }+{m}_j) K_{H} (t_i-t_j)} {\sum\limits_{i=1}^n \ell''_i(x_i^T{\beta } + {m}_j)K_{H} (t_i-t_j)}$ (6.1)

and let $ X$ be the design matrix with rows $ x_i^T$. Denote further by $ I$ the identity matrix, by $ v$ the vector and by $ W$ the diagonal matrix containing the first ($ \ell_i'$) and second ($ \ell_i''$) derivatives of $ \ell_i(x_i^T{\beta }+{m}_i)$, respectively.

The Newton-Raphson estimation algorithm (see Severini and Staniswalis; 1994) is then as follows.

Profile Likelihood Algorithm

- updating step for $ \beta$:

$\displaystyle {\beta }^{new}= (\widetilde{X}^T W \widetilde{X})^{-1} \widetilde{X}^T W \widetilde{z},$

with $ \widetilde{X} = (I- S^P) X$ and $ \widetilde{z} = \widetilde{X}{\beta } - W^{-1} v$,

- updating step for $ m_j$:

$\displaystyle {m}_j^{new}= {m}_j - \frac{\sum\limits_{i=1}^n \ell'_i (x_i^T{\beta } + {m}_j)\, K_{H} (t_i-t_j)} {\sum\limits_{i=1}^n \ell''_i (x_i^T{\beta } + {m}_j)\, K_{H} (t_i-t_j)}.$

The variable $ \widetilde{z}$ is a sort of adjusted dependent variable. From the formula for $ {\beta }^{new}$ it becomes clear that the parametric part of the model is updated by a parametric method (with the nonparametrically modified design matrix $ \widetilde{X}$).

Alternatively, the functions $ \ell''_i$ can be replaced by their expectations (with respect to $ y_i$) to obtain a Fisher scoring type procedure.
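To make the iteration concrete, the following sketch implements one such Newton-Raphson step for a binary-logit GPLM in Python with numpy. It is purely illustrative and not the gplm quantlib code; the function names, the univariate $ T$, the Gaussian kernel and the bandwidth h are assumptions made for the example.

import numpy as np

def logit_l1_l2(eta, y):
    """First and second derivatives of l(eta) = y*eta - log(1 + exp(eta))."""
    mu = 1.0 / (1.0 + np.exp(-eta))
    return y - mu, -mu * (1.0 - mu)

def profile_step(y, X, t, beta, m, h):
    """One Newton-Raphson step of the profile likelihood algorithm for a
    binary-logit GPLM (illustrative sketch; univariate t, Gaussian kernel)."""
    n = y.size
    K = np.exp(-0.5 * ((t[None, :] - t[:, None]) / h) ** 2)   # K[j, i] = K_h(t_i - t_j)

    # updating step for m_j: local likelihood step at each t_j
    m_new = np.empty(n)
    for j in range(n):
        l1, l2 = logit_l1_l2(X @ beta + m[j], y)              # derivatives at x_i^T beta + m_j
        m_new[j] = m[j] - (K[j] @ l1) / (K[j] @ l2)

    # smoother matrix S^P as in (6.1); row j holds the local weights used at t_j
    _, L2 = logit_l1_l2((X @ beta)[None, :] + m_new[:, None], y[None, :])
    SP = (L2 * K) / (L2 * K).sum(axis=1, keepdims=True)

    # updating step for beta: weighted LS with nonparametrically modified design
    l1, l2 = logit_l1_l2(X @ beta + m_new, y)                 # derivatives at x_i^T beta + m_i
    Xt = X - SP @ X                                           # (I - S^P) X
    zt = Xt @ beta - l1 / l2                                  # adjusted dependent variable
    W = -l2                                                   # weights; the sign of l'' cancels
    beta_new = np.linalg.solve(Xt.T @ (W[:, None] * Xt), Xt.T @ (W * zt))
    return beta_new, m_new

Iterating profile_step until $ \beta$ and $ m$ stabilize yields the profile likelihood estimate; for the logit link the Fisher scoring variant is identical, since $ \ell''_i$ does not depend on $ y_i$.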


6.1.2.0.2 Generalized Speckman Estimator

The profile likelihood estimator is particularly easy to derive in the case of a model with identity link and normally distributed $ y_i$. Here, $ \ell_i'= y_i - x_i^T\beta - m_j$ and $ \ell_i''\equiv -1$. The latter yields the smoother matrix $ S$ with elements

$\displaystyle S_{ij} = \frac{ K_{H} (t_i-t_j)} {\sum\limits_{i=1}^n K_{H} (t_i-t_j)}.$ (6.2)

Moreover, the update for $ m_j$ simplifies to

$\displaystyle {m}^{new}= S(y- X{\beta })$

using the vector notation $ y= (y_1,\ldots,y_n)^T$, $ m^{new} = \left(m_1^{new},\ldots,m_n^{new}\right)^T$. The parametric component is determined by

$\displaystyle {\beta }^{new}=(\widetilde{X}^T \widetilde{X})^{-1} \widetilde{X}^T \widetilde{y}$

with $ \widetilde{X} = (I-S) X$ and $ \widetilde{y} = (I-S) y$. These estimators for the partial linear model were proposed by Speckman (1988).
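As an illustration, a minimal numpy sketch of Speckman's estimator for the partial linear model could look as follows (not the gplm quantlib code; a Gaussian kernel and bandwidth h are assumed):

import numpy as np

def speckman_plm(y, X, t, h):
    """Speckman (1988) estimator for the partial linear model
    E(Y|X,T) = X^T beta + m(T) (illustrative sketch; Gaussian kernel)."""
    K = np.exp(-0.5 * ((t[None, :] - t[:, None]) / h) ** 2)
    S = K / K.sum(axis=1, keepdims=True)           # kernel weights as in (6.2)
    Xt, yt = X - S @ X, y - S @ y                  # (I - S) X  and  (I - S) y
    beta = np.linalg.lstsq(Xt, yt, rcond=None)[0]  # (Xt' Xt)^{-1} Xt' yt
    m = S @ (y - X @ beta)                         # m^new = S(y - X beta)
    return beta, m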

Recall that each iteration step of a GLM is a weighted least squares regression on an adjusted dependent variable (McCullagh and Nelder; 1989). Hence, in the partial linear model the weighted least squares regression can be replaced by a partial linear fit on the adjusted dependent variable

$\displaystyle z= X\beta + m- W^{-1} v.$ (6.3)
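In the binary-logit case, for instance, (6.3) reduces to the familiar working response of iteratively reweighted least squares,

$\displaystyle z_i = x_i^T\beta + m_i + \frac{y_i-\mu_i}{\mu_i(1-\mu_i)}, \qquad \mu_i = G(x_i^T\beta + m_i).$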

Again, denote by $ v$ the vector and by $ W$ the diagonal matrix containing the first ($ \ell_i'$) and second ($ \ell_i''$) derivatives of $ \ell_i(x_i^T{\beta }+m_i)$, respectively. Then, the Newton-Raphson type Speckman estimator (see Müller; 1997) for the GPLM can be written as:
Generalized Speckman Algorithm

- updating step for $ \beta$:

$\displaystyle {\beta }^{new}= (\widetilde{X}^T W \widetilde{X})^{-1} \widetilde{X}^T W \widetilde{z},$

- updating step for $ m$:

$\displaystyle {m}^{new}= S(z- X{\beta }),$

using the notation

$\displaystyle z = X{\beta } + m- W^{-1} v, \qquad \widetilde{X} = (I- S) X, \qquad \widetilde{z} = (I- S) z = \widetilde{X}{\beta } - W^{-1}v.$
The basic simplification of this approach consists in using the smoothing matrix $ S$ with elements

$\displaystyle S_{ij} = \frac{\ell''_i(x_i^T{\beta }+{m}_i) K_{H} (t_i-t_j)} {\sum\limits_{i=1}^n \ell''_i(x_i^T{\beta } + {m}_i)K_{H} (t_i-t_j)}$ (6.4)

instead of the matrix $ S^P$ from (6.1). As before, a Fisher scoring type procedure is obtained by replacing $ \ell_i''$ by their expectations.
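A corresponding illustrative sketch of the generalized Speckman iteration for a binary-logit GPLM is given below (again not the gplm quantlib implementation; the Gaussian kernel, bandwidth h, and a fixed number of iterations are assumptions):

import numpy as np

def gplm_speckman_logit(y, X, t, h, n_iter=25):
    """Generalized Speckman iteration for a binary-logit GPLM
    (illustrative sketch; Gaussian kernel, fixed number of iterations)."""
    n, p = X.shape
    beta, m = np.zeros(p), np.zeros(n)
    K = np.exp(-0.5 * ((t[None, :] - t[:, None]) / h) ** 2)   # K[j, i] = K_h(t_i - t_j)
    for _ in range(n_iter):
        eta = X @ beta + m
        mu = 1.0 / (1.0 + np.exp(-eta))
        l1, l2 = y - mu, -mu * (1.0 - mu)                     # l'_i and l''_i at x_i^T beta + m_i
        S = (l2 * K) / (l2 * K).sum(axis=1, keepdims=True)    # smoother matrix (6.4)
        z = eta - l1 / l2                                     # adjusted dependent variable (6.3)
        Xt, zt = X - S @ X, z - S @ z                         # (I - S) X  and  (I - S) z
        W = -l2                                               # weights; the sign of l'' cancels
        beta = np.linalg.solve(Xt.T @ (W[:, None] * Xt), Xt.T @ (W * zt))
        m = S @ (z - X @ beta)                                # m^new = S(z - X beta)
    return beta, m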


6.1.2.0.3 Backfitting

The backfitting method was suggested as an iterative algorithm to fit an additive model (Hastie and Tibshirani; 1990). The key idea is to regress the additive components separately on partial residuals. The ordinary partial linear model (with identity link function)

$\displaystyle E(Y\vert X,T) = X^T\beta + m(T)$

is a special case, consisting of only two additive functions. Denote by $ P$ the projection matrix $ P= X(X^TX)^{-1} X^T$ and by $ S$ a smoother matrix. Abbreviate $ m= \left(m_1,\ldots,m_n\right)^T= \left(m(t_1),\ldots,m(t_n)\right)^T$. Then backfitting means to solve
$\displaystyle X\beta = P(y- m),$

$\displaystyle m = S(y- X\beta ).$
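An illustrative numpy sketch of this iteration for the identity-link partial linear model (not the gplm quantlib code; Gaussian kernel and bandwidth h assumed) could be:

import numpy as np

def backfit_plm(y, X, t, h, n_iter=50):
    """Backfitting for the identity-link partial linear model:
    alternate X beta = P(y - m) and m = S(y - X beta) (illustrative sketch)."""
    n, p = X.shape
    K = np.exp(-0.5 * ((t[None, :] - t[:, None]) / h) ** 2)
    S = K / K.sum(axis=1, keepdims=True)                  # kernel smoother matrix
    beta, m = np.zeros(p), np.zeros(n)
    for _ in range(n_iter):
        beta = np.linalg.lstsq(X, y - m, rcond=None)[0]   # X beta = P(y - m)
        m = S @ (y - X @ beta)                            # m = S(y - X beta)
        m = m - m.mean()   # centering identifies the constant if X contains an intercept
    return beta, m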

For a GPLM, backfitting now means performing an additive fit on the adjusted dependent variable $ {z}$ defined in (6.3); see Hastie and Tibshirani (1990). We again use the kernel smoother matrix $ S$ from (6.4).

Backfitting Algorithm

- updating step for $ \beta$:

$\displaystyle {\beta }^{new}= (X^T W \widetilde{X})^{-1} X^T W \widetilde{z},$

- updating step for $ m$:

$\displaystyle {m}^{new}= S(z- X{\beta }),$

using the notation

$\displaystyle z = X{\beta } + m- W^{-1} v, \qquad \widetilde{X} = (I- S) X, \qquad \widetilde{z} = (I- S) z = \widetilde{X}{\beta } - W^{-1}v.$
As for profile likelihood and Speckman estimation, we obtain a Newton-Raphson or Fisher scoring type algorithm by using $ \ell_i''$ or $ E(\ell_i'')$, respectively.