7.1 Brief Theory


7.1.1 Models

An additive model (AM) with response variable Y and explanatory variable $ T \in \mathbb{R}^{d}$ has the form

$\displaystyle E\left( Y\vert T=t \right)=\sum_{j=1}^{d} f_{j} \left( t_{j} \right) + c \; ,$

where $ c$ is a constant with $ E\left( Y \right)=c$ and the univariate functions $ f_{j}$, also called additive components, obey $ E_{T_{j}} \left\{ f_{j} \left( T_{j} \right) \right\}=0$ for all $ j$.
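
For illustration (a constructed example, not taken from the XploRe documentation), with $ d=2$ one could take

$\displaystyle E\left( Y\vert T=t \right)= \left[ \sin \left( t_{1} \right) - E\left\{ \sin \left( T_{1} \right) \right\} \right] + \left\{ t_{2}^{2} - E\left( T_{2}^{2} \right) \right\} + c \; ,$

where the subtracted expectations ensure that both additive components satisfy the centering condition.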

Possible extensions which can be handled in XploRe are additive partially linear models (APLM), generalized additive models (GAM), a mixture of both (GAPLM), models with bivariate additive components and additive models with interaction.

Generalized additive models are of the form

$\displaystyle E \left( Y\vert T=t \right) = G \left\{ \sum_{j=1}^{d} f_{j} \left( t_{j} \right) + c \right\} $

with a known link function $ G$, $ c:=G^{-1} \left\{ E\left(Y\right) \right\}$ and the same conditions on $ f_j$ as above.

Special cases of these models are the well known probit and logit regression models.
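
For a binary response, for instance, the logit model uses the logistic distribution function as link $ G$, while the probit model uses the standard normal distribution function $ \Phi$:

$\displaystyle G_{\mathrm{logit}}\left( u \right)=\frac{\exp\left( u \right)}{1+\exp\left( u \right)} \; , \qquad G_{\mathrm{probit}}\left( u \right)=\Phi\left( u \right) \; .$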

Additive partially linear models extend the additive model by an additional linear part. If $ T \in \mathbb{R}^{d}$ is the explanatory variable whose influence is of unknown functional form and $ X\in \mathbb{R}^{p}$ is an explanatory variable with linear influence, we have

$\displaystyle E\left( Y\vert T=t, X=x \right)= \sum_{j=1}^ {d} f_{j}(t_{j}) + c + x^{T} \beta \; ,$

where the additive components $ f_{j}$ and the parameter $ \beta$ have to be estimated. These models are especially recommended for including discrete and dummy variables in the model, e.g. a dummy variable that shifts the regression function by a constant.

The mixture of both, that is the generalized additive partially linear model, has the form

$\displaystyle E\left( Y\vert T=t, X=x \right)=G\left\{ \sum_{j=1}^{d} f_{j} \left( t_{j} \right) + c + x^{T} \beta \right\} \; .$

Sometimes we know that some explanatory variables, e.g. $ T_{k}$ and $ T_{l}$, have a joint influence that cannot be separated into two additive components. In those cases, the sum $ f_{k}\left( \cdot \right) + f_{l} \left( \cdot \right)$ has to be replaced by a bivariate additive component $ f_{k,l}\left( \cdot , \cdot \right)$.

A further possible extension is to keep the additive separable structure as introduced above but additionally to allow for interaction terms $ f_{kl}$. These should not be confused with the bivariate additive components just mentioned: here we focus on the isolated marginal influences and impose the condition $ E_{T_{k}} \left\{ f_{kl} \left( T_{k}, t_{l} \right) \right\} \equiv E_{T_{l}} \left\{ f_{kl} \left( t_{k}, T_{l} \right) \right\} \equiv 0$. The model we consider then is

$\displaystyle E \left( Y\vert T=t \right)=\sum_{j=1}^{d} f_{j} \left( t_{j} \right) + \sum_{1\leq k<l\leq d} f_{kl} \left( t_{k}, t_{l} \right) + c $

with $ E_{T_{j}} \left\{ f_{j} \left( T_{j} \right) \right\}=0$ as in the models before.


7.1.2 Marginal Integration

As indicated by its name, the marginal integration estimator estimates the marginal influence of a particular explanatory variable in a multidimensional regression. If the true model is additive separable, the functional to be estimated is exactly the additive component. Possible interactions in the model are neglected, and the estimator still yields the marginal, and thus interpretable, influence of the variable under consideration.

The basic idea of the estimation procedure is to compute a pre-estimator of the full-dimensional regression surface and then to integrate out the directions that are not of interest while keeping the direction of interest fixed. Assuming we are interested in the first additive component at point $ t_{1}$, this leads to the formula

$\displaystyle \hat{f}_{1} \left( t_{1} \right) = \frac{1}{n} \sum_{l=1}^{n} \tilde{m} \left( t_{1}, T_{-1,l} \right) \; ,$

where $ \tilde{m}$ is a pre-estimator of $ E \left( Y\vert T \right)$, $ T_{-1,l}$ denotes the $ l$-th observation of the explanatory vector $ T \in \mathbb{R}^{d}$ with the first influence variable removed, and $ \left\{ Y_{i}, T_{i} \right\}_{i=1}^{n}$ are the observations.

In XploRe, the pre-estimators are multidimensional kernel smoothers. For the pre-estimation in generalized additive partially linear models we make use of a variant of profile maximum likelihood.
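
To make the procedure concrete outside XploRe, the following Python sketch implements the integration formula with a Nadaraya-Watson pre-estimator (a local polynomial of degree zero); the function names, the Gaussian product kernel and the bandwidth argument h are illustrative choices, not the XploRe implementation.

  import numpy as np

  def nw_regressor(t, T, Y, h):
      """Nadaraya-Watson pre-estimator of E(Y | T = t) with a Gaussian product kernel."""
      # T: (n, d) design matrix, t: (d,) evaluation point, h: scalar or (d,) bandwidths
      u = (T - t) / h
      w = np.prod(np.exp(-0.5 * u**2), axis=1)
      return np.sum(w * Y) / np.sum(w)

  def marginal_integration(t1, T, Y, h):
      """Estimate the marginal influence of the first variable at t1 by averaging
      the pre-estimator over the observed values of the remaining variables."""
      n = T.shape[0]
      vals = np.empty(n)
      for l in range(n):
          point = T[l].astype(float).copy()
          point[0] = t1                      # keep the direction of interest fixed
          vals[l] = nw_regressor(point, T, Y, h)
      # the average estimates f_1(t1) + c; subtracting the sample mean of Y removes c
      return vals.mean() - Y.mean()

Each call evaluates the pre-estimator at $ n$ points, and each evaluation is itself an $ O(nd)$ kernel sum, which already hints at the computational burden discussed below.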

The calculation of such an estimate requires $ O(n^{3})$ computing steps, so applying this procedure to large data sets takes considerable time. For such data sets an alternative fast procedure can be applied which asymptotically (for a large number of observations) yields the same results but needs only $ O(n^{2})$ computing steps. In that procedure the usual local polynomial estimator used for the pilot estimation is replaced by the fully internalized smoother (see Jones, Davies, and Park; 1994).

Since the asymptotics of the integration estimator are known, different test procedures for component analysis can be constructed. The main idea is always to estimate either the function or its derivative both under the hypothesis and under the alternative and to consider the integrated squared difference of the two estimates. This corresponds to the Euclidean distance between hypothesis and alternative. If this distance is too large, we reject the hypothesis.
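
Schematically, for a single component $ f_{j}$ such a test statistic has the form

$\displaystyle \int \left\{ \hat{f}_{j}^{H_{1}} \left( t \right) - \hat{f}_{j}^{H_{0}} \left( t \right) \right\}^{2} w\left( t \right) \, dt \; ,$

where $ \hat{f}_{j}^{H_{0}}$ and $ \hat{f}_{j}^{H_{1}}$ denote the estimates under the hypothesis and under the alternative and $ w$ is a weight function; the exact form of the statistic and its critical values depend on the particular test.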

For further details, especially on how to estimate in the extended models, we refer to the literature.


7.1.3 Backfitting

The backfitting estimator projects the multidimensional regression problem onto the space of additive models. It searches for the additive model that yields the best regression fit. If the true model is additive separable, the estimated functionals are exactly the additive components of the true model.

If the true model is not additive, one reason for using backfitting is its dimension-reducing effect in high-dimensional regression problems. Even if the model assumption is violated, this method often leads to a reasonable regression fit. In that case, however, the additive components must not be interpreted.

Since this algorithm is directly tied to the whole regression, it is not possible to estimate a single component separately. The implemented iterative procedure works as follows. Given starting values for the $ (n \times 1)$ vectors $ \mathbf{f}_{j}^{0}$, $ j=1,2, \, \ldots , d$, these vectors are updated in the $ r$-th step by

$\displaystyle \mathbf{f}_{k}^{r} = S_{k} \left( \mathbf{y} - \sum_{j \ne k} \mathbf{f}_{j}^{r-1} \right) $

until some tolerance is reached. Here, $ S_{k}$ is the one-dimensional smoothing operator calculating a regression of $ \left( \mathbf{y} - \sum_{j \ne k} \mathbf{f}_{j}^{r-1} \right)$ on $ T_{k}$.
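
As an illustration outside XploRe, the following Python sketch implements this iteration with a Gaussian Nadaraya-Watson smoother in the role of $ S_{k}$; centering the response by its sample mean, zero starting values, and the function names and bandwidth h are illustrative assumptions, not the XploRe implementation.

  import numpy as np

  def smooth(x, y, h):
      """One-dimensional Nadaraya-Watson smoother, evaluated at the sample points."""
      u = (x[:, None] - x[None, :]) / h
      K = np.exp(-0.5 * u**2)
      return (K @ y) / K.sum(axis=1)

  def backfit(T, Y, h, max_iter=100, tol=1e-6):
      """Backfitting for E(Y | T) = sum_j f_j(T_j) + c with kernel smoothers S_k."""
      n, d = T.shape
      c = Y.mean()
      Yc = Y - c                              # centered response
      f = np.zeros((d, n))                    # starting values f_j^0 = 0
      for _ in range(max_iter):
          f_old = f.copy()
          for k in range(d):
              # partial residual y - sum_{j != k} f_j^{r-1}, as in the update formula
              resid = Yc - f_old[np.arange(d) != k].sum(axis=0)
              fk = smooth(T[:, k], resid, h)
              f[k] = fk - fk.mean()           # re-center so that E f_k(T_k) = 0
          if np.max(np.abs(f - f_old)) < tol: # stop once the tolerance is reached
              break
      return c, f                             # constant and (d, n) matrix of fitted f_j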

In generalized additive models we have to work with a quasi-likelihood function, so the Fisher scoring algorithm is applied as an outer iterative procedure around the backfitting algorithm described above.
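
A simplified Python sketch of this combination for a binary response with logit link is given below; the weighted Nadaraya-Watson smoother, the fixed numbers of outer and inner iterations, the clipping of the fitted probabilities and all names are illustrative assumptions, not the XploRe routines.

  import numpy as np

  def wsmooth(x, z, w, h):
      """Weighted Nadaraya-Watson smoother at the sample points."""
      u = (x[:, None] - x[None, :]) / h
      K = np.exp(-0.5 * u**2) * w[None, :]
      return (K @ z) / K.sum(axis=1)

  def gam_logit(T, Y, h, n_outer=25, n_inner=10):
      """Fisher scoring (outer loop) combined with weighted backfitting (inner loop)
      for a binary-response additive model with logit link."""
      n, d = T.shape
      f = np.zeros((d, n))
      c = np.log(Y.mean() / (1.0 - Y.mean()))      # start at the marginal log-odds
      for _ in range(n_outer):
          eta = c + f.sum(axis=0)                  # current additive predictor
          mu = 1.0 / (1.0 + np.exp(-eta))          # G(eta) for the logit link
          mu = np.clip(mu, 1e-6, 1 - 1e-6)         # avoid zero weights
          w = mu * (1.0 - mu)                      # Fisher-scoring weights
          z = eta + (Y - mu) / w                   # adjusted dependent variable
          c = np.average(z, weights=w)
          for _ in range(n_inner):                 # backfit z on T with weights w
              for k in range(d):
                  resid = z - c - f[np.arange(d) != k].sum(axis=0)
                  fk = wsmooth(T[:, k], resid, w, h)
                  f[k] = fk - np.average(fk, weights=w)
      return c, f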

For a more detailed description of the backfitting procedure we refer to Hastie and Tibshirani (1990) or Opsomer and Ruppert (1997).


7.1.4 Orthogonal Series

Another well-known method to estimate regression functions nonparametrically is to approximate them by an orthogonal function basis. You choose a basis of functions that spans, e.g., the $ L_{2}$ space, fix the degree of fineness (usually depending on the number of observations and the dimension of the particular sample), and finally estimate the unknown coefficients of the resulting expansion to obtain an optimal fit for your sample.
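
A univariate Python illustration of this idea follows; it uses the cosine basis of $ L_{2}[0,1]$ rather than the wavelet bases provided in XploRe, the number of basis functions K plays the role of the fineness, and all names are illustrative.

  import numpy as np

  def cosine_basis(t, K):
      """First K non-constant functions of the cosine basis on [0, 1]."""
      k = np.arange(1, K + 1)
      return np.sqrt(2.0) * np.cos(np.pi * k[None, :] * t[:, None])   # shape (n, K)

  def series_fit(t, y, K):
      """Least-squares fit of the regression function in the truncated basis."""
      B = np.column_stack([np.ones_like(t), cosine_basis(t, K)])      # include constant
      coef, *_ = np.linalg.lstsq(B, y, rcond=None)
      def predict(t_new):
          B_new = np.column_stack([np.ones_like(t_new), cosine_basis(t_new, K)])
          return B_new @ coef
      return coef, predict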

Such a basis can be constructed by wavelets. Here, the fineness of the estimation is mainly determined by the chosen level degree. The estimated coefficients of the wavelets are asymptotically normally distributed.

Assuming an additive model for the considered regression problem, each additive component is simply formed by those wavelets that correspond to the particular explanatory variable.

A testing procedure to examine the additive components is based on an analysis of the estimated coefficients.

For an introduction to wavelets and further information we recommend Chapter 14 in Härdle, Klinke, and Müller (2000) and the quantlibs twave and wavelet in XploRe.

More details and theory can be found in Kaiser (1994) and Härdle, Sperlich, and Spokoiny (1997).