An additive model (AM) with response variable $Y$ and explanatory variable $X = (X_1, \ldots, X_d)^\top$ has the form
$$ Y = c + \sum_{\alpha=1}^{d} g_\alpha(X_\alpha) + \varepsilon, \qquad E[\varepsilon \mid X] = 0. $$
Possible extensions which can be handled in XploRe are additive partially linear models (APLM), generalized additive models (GAM), a mixture of both (GAPLM), models with bivariate additive components and additive models with interaction.
Generalized additive models are of the form
$$ E[Y \mid X] = G\Big\{ c + \sum_{\alpha=1}^{d} g_\alpha(X_\alpha) \Big\}, $$
where $G$ is a known link function. Special cases of these models are the well-known probit and logit regression models.
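For instance, for binary $Y$ the logit model takes the logistic distribution function as link, and the probit model the standard normal one:
$$ G(u) = \frac{\exp(u)}{1 + \exp(u)} \quad \text{(logit)}, \qquad G(u) = \Phi(u) \quad \text{(probit)}. $$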
Additive partially linear models allow for an additional linear part in the original additive model. When $T = (T_1, \ldots, T_d)^\top$ is the explanatory variable with an influence of unknown functional form and $U \in \mathbb{R}^k$ an explanatory variable with linear influence, we have
$$ Y = c + \beta^\top U + \sum_{\alpha=1}^{d} g_\alpha(T_\alpha) + \varepsilon. $$
The mixture of both, that is, the generalized additive partially linear model, has the form
$$ E[Y \mid U, T] = G\Big\{ c + \beta^\top U + \sum_{\alpha=1}^{d} g_\alpha(T_\alpha) \Big\}. $$
Sometimes we know of a joint influence of some explanatory variables, e.g. $X_1$ and $X_2$, and thus their influence cannot be separated into two additive components. In those cases the sum of them, $g_1(X_1) + g_2(X_2)$, has to be replaced by a bivariate additive component $g_{1,2}(X_1, X_2)$.
A further possible extension is to keep the additive separable structure as introduced above but to allow additionally for interaction terms $g_{\alpha\beta}(X_\alpha, X_\beta)$. Do not mix them up with the bivariate additive components we just spoke about! Here we focus on the isolated marginal influences, with the identification condition
$$ E\big[ g_{\alpha\beta}(X_\alpha, X_\beta) \mid X_\alpha \big] = E\big[ g_{\alpha\beta}(X_\alpha, X_\beta) \mid X_\beta \big] = 0. $$
The model we consider then is
$$ Y = c + \sum_{\alpha=1}^{d} g_\alpha(X_\alpha) + \sum_{1 \le \alpha < \beta \le d} g_{\alpha\beta}(X_\alpha, X_\beta) + \varepsilon. $$
As indicated by its name, the marginal integration estimator estimates the marginal influence of a particular explanatory variable on a multidimensional regression. If the true model is additive separable, the functional to be estimated is exactly the additive component. Possible interactions in the model are neglected, but the estimator still gives the marginal, and thus interpretable, influence of the considered variable.
The basic idea of the estimation procedure is to compute a pre-estimator of the hyperdimensional regression surface and then to integrate out the dimensions not of interest, keeping the direction of interest fixed. Assuming we are interested in the first additive component at point $x_1$, this leads to the formula
$$ \widehat{g}_1(x_1) + \widehat{c} = \frac{1}{n} \sum_{i=1}^{n} \widehat{m}\,(x_1, X_{i2}, \ldots, X_{id}), $$
where $\widehat{m}$ denotes the pre-estimator of the regression surface.
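The following minimal Python sketch (not an XploRe quantlet; the Nadaraya-Watson pre-estimator, the bandwidth, and the toy data are all illustrative choices) shows this idea: the pre-estimator is evaluated with the first coordinate held fixed and averaged over the observed values of the remaining coordinates.

```python
# Sketch of marginal integration with a Nadaraya-Watson pre-estimator.
import numpy as np

def nw_regression(x, X, Y, h):
    """Multidimensional Nadaraya-Watson estimate of E[Y | X = x]."""
    w = np.exp(-0.5 * np.sum(((X - x) / h) ** 2, axis=1))  # product Gaussian kernel
    return np.sum(w * Y) / np.sum(w)

def marginal_integration(x1, X, Y, h):
    """Estimate c + g_1(x1) by averaging the pre-estimator over X_{i,2..d}."""
    vals = np.empty(len(X))
    for i in range(len(X)):
        xi = X[i].copy()
        xi[0] = x1                            # keep the direction of interest fixed
        vals[i] = nw_regression(xi, X, Y, h)  # pre-estimate at (x1, X_{i,2..d})
    return vals.mean()                        # integrate out the other dimensions

# Illustrative data from an additive model Y = g1(X1) + g2(X2) + noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
Y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(200)

grid = np.linspace(-0.9, 0.9, 7)
g1 = np.array([marginal_integration(u, X, Y, h=0.25) for u in grid])
g1 -= Y.mean()  # crude centering: removes the estimated constant part
```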
In XploRe the pre-estimators are multidimensional kernel smoothers. For the pre-estimation in generalized additive partially linear models we make use of a variant of the profile maximum likelihood method.
The calculation of such an estimate needs $O(n^2)$ computing steps per point of estimation, so applying this procedure to large data sets takes plenty of time. For those data sets an alternative fast procedure can be applied which asymptotically (for a large number of observations) yields the same results but needs only $O(n)$ computing steps per point. In that procedure the usual local polynomial pilot estimator is replaced by the fully internalized smoother
$$ \widetilde{m}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{\mathcal{K}_h(x - X_i)}{\widehat{f}(X_i)}\, Y_i $$
(see Jones, Davies, and Park, 1994), where $\mathcal{K}_h$ is a product kernel with bandwidth $h$ and $\widehat{f}$ is a kernel density estimate of the design density.
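A sketch of why this is fast, under the same illustrative setting as above: internalizing the density turns the pilot smoother into a plain weighted sum, so the integration over the nuisance directions collapses into weights that are precomputed once.

```python
# Sketch of the fast variant: O(n) per grid point after an O(n^2)
# precomputation of internalized weights. Names/bandwidths are illustrative.
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def fast_marginal_integration(grid, X, Y, h):
    n, d = X.shape
    # Kernel density estimates at the observations, precomputed once.
    f_full = np.array([np.mean(np.prod(gauss((X - X[i]) / h), axis=1))
                       for i in range(n)]) / h ** d
    f_rest = np.array([np.mean(np.prod(gauss((X[:, 1:] - X[i, 1:]) / h), axis=1))
                       for i in range(n)]) / h ** (d - 1)
    w = Y * f_rest / f_full  # internalized weights
    # One O(n) pass per evaluation point.
    return np.array([np.mean(gauss((u - X[:, 0]) / h) * w) / h for u in grid])
```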
Since we know the asymptotics of the integration estimator, different test procedures for component analysis can be constructed. The main idea is always to estimate either the function or its derivative both under the hypothesis and under the alternative, and to look at the integrated squared difference of the two estimates. This corresponds to the (squared) Euclidean distance between hypothesis and alternative; if this distance is too large, we reject the hypothesis.
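Schematically, the distance underlying these tests can be approximated on a grid as in the following fragment (purely illustrative; the critical values, which come from the asymptotics or from resampling, are omitted here):

```python
# Grid approximation of the integrated squared difference of two fits.
import numpy as np

def l2_distance(g_null, g_alt, grid):
    """Trapezoid approximation of the integrated squared difference."""
    diff = np.asarray(g_alt) - np.asarray(g_null)
    return np.trapz(diff ** 2, grid)
```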
For further details, especially on how to estimate the extended models, we refer to the list of references.
The backfitting estimator projects the multidimensional regression problem into the space of additive models: it looks for the additive model that yields the best regression fit. If the true model is additive separable, the estimated functionals are just the additive components of the true model.
If the true model is not additive, one reason to use backfitting nevertheless is its dimension-reducing effect in high-dimensional regression problems. Even if the model assumption is false, this method often leads to a reasonable regression fit; but in that case the additive components must not be interpreted.
Since this algorithm is directly related to the whole regression, it is not possible to estimate a single component separately. The implemented iterative procedure works as follows. Given starting values for the vectors
$$ \widehat{g}_\alpha^{(0)} = \big( \widehat{g}_\alpha^{(0)}(X_{1\alpha}), \ldots, \widehat{g}_\alpha^{(0)}(X_{n\alpha}) \big)^\top, \qquad \alpha = 1, \ldots, d, $$
update these vectors in the $l$-th step by
$$ \widehat{g}_\alpha^{(l)} = S_\alpha \Big( Y - \widehat{c} - \sum_{\beta < \alpha} \widehat{g}_\beta^{(l)} - \sum_{\beta > \alpha} \widehat{g}_\beta^{(l-1)} \Big), \qquad \alpha = 1, \ldots, d, $$
where $S_\alpha$ denotes a univariate smoother matrix acting in the direction of $X_\alpha$.
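A compact Python sketch of this cycle (the univariate Nadaraya-Watson smoother, bandwidth, and iteration count are illustrative assumptions, not the XploRe implementation):

```python
# Backfitting sketch: each component is re-smoothed on the partial
# residuals of all other components until the fit stabilizes.
import numpy as np

def nw_smooth(x, r, h):
    """Univariate Nadaraya-Watson smooth of r against x, evaluated at x."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return K @ r / K.sum(axis=1)

def backfit(X, Y, h=0.2, n_iter=20):
    n, d = X.shape
    c = Y.mean()
    g = np.zeros((n, d))          # g[:, a] holds g_a evaluated at X[:, a]
    for _ in range(n_iter):       # outer sweeps
        for a in range(d):
            partial = Y - c - g.sum(axis=1) + g[:, a]  # partial residuals
            g[:, a] = nw_smooth(X[:, a], partial, h)
            g[:, a] -= g[:, a].mean()                  # keep components centered
    return c, g
```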
In generalized additive models we have to work with a quasi-likelihood function, and so the Fisher scoring algorithm is applied as an outer iterative procedure around the above-mentioned backfitting algorithm.
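The outer loop can be sketched as follows for a logit link, shown with a single smooth term for brevity (in the additive case the inner smoothing step is one full pass of the backfitting cycle above; all names and tuning constants are illustrative):

```python
# Outer Fisher scoring ("local scoring") loop for a logit-link model.
import numpy as np

def weighted_nw(x, z, w, h):
    """Weighted univariate Nadaraya-Watson smooth of z against x."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2) * w[None, :]
    return K @ z / K.sum(axis=1)

def local_scoring(x, y, h=0.3, n_iter=15):
    eta = np.zeros(len(y))                         # start at eta = 0
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-eta))            # inverse logit link
        w = np.clip(mu * (1.0 - mu), 1e-6, None)   # Fisher scoring weights
        z = eta + (y - mu) / w                     # adjusted dependent variable
        eta = weighted_nw(x, z, w, h)              # inner smoothing step
    return eta
```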
For a more detailed description of the backfitting procedure we refer to Hastie and Tibshirani (1990) or Opsomer and Ruppert (1997).
Another well-known method to estimate regression functions nonparametrically is to approximate them by an orthogonal function basis. You choose a basis of functions which spans, e.g., the $L_2$ space, fix the degree of fineness (usually depending on the number of observations and the dimension of the particular sample) and finally estimate the unknown coefficients of the resulting expansion to get an optimal fit for your sample.
Such a basis can be constructed by wavelets. Here the fineness of the estimation is mainly determined by the chosen resolution level. The estimated wavelet coefficients are asymptotically normally distributed.
Assuming an additive model for the considered regression problem, the additive components are simply formed by the wavelets which correspond to the same particular explanatory variable.
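As a schematic illustration (a univariate Haar basis fitted by least squares; the basis choice, the level $J$, and all names are assumptions of this sketch, not the interface of the XploRe quantlibs), one coordinate's expansion could look like this:

```python
# Orthogonal-series sketch: Haar basis on [0, 1], coefficients by least squares.
import numpy as np

def haar_basis(x, J):
    """Design matrix of the Haar scaling function and wavelets up to level J."""
    cols = [np.ones_like(x)]                          # father wavelet (scaling fn)
    for j in range(J):
        for k in range(2 ** j):
            lo, mid, hi = k / 2 ** j, (k + 0.5) / 2 ** j, (k + 1) / 2 ** j
            psi = np.where((x >= lo) & (x < mid), 1.0,
                  np.where((x >= mid) & (x < hi), -1.0, 0.0))
            cols.append(2 ** (j / 2) * psi)           # L2-normalized mother wavelets
    return np.column_stack(cols)

def series_fit(x, y, J=3):
    B = haar_basis(x, J)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)      # estimated wavelet coefficients
    return B @ coef, coef
```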
A testing procedure to examine the additive components is based on an analysis of the estimated coefficients.
For an introduction to wavelets and further information we recommend Chapter 14 in Härdle, Klinke, and Müller (2000) and the quantlibs twave and wavelet in XploRe.
More details and theory can be found in Kaiser (1994) and Härdle, Sperlich, and Spokoiny (1997).