Asymptotic statistical properties are only one part of the story. For the applied researcher, knowledge of the finite sample behavior of a method and of its robustness is essential. This section is devoted to this topic.
We now present and examine results on the finite sample performance of the competing backfitting and integration approaches. To keep things simple, we only use local constant and local linear estimates here; for a more detailed discussion see Sperlich et al. (1999) and Nielsen & Linton (1998).
The following two subsections refer to results for specific data generating models.
Note that all estimators presented in this section (two-dimensional backfitting and marginal integration estimators) are linear in the observations, i.e. of the form $\widehat{g}(x)=\sum_{i=1}^n w_i(x)\,Y_i$, where the weights $w_i(x)$ depend only on the design points and the bandwidths.
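To make this linearity explicit, here is a minimal sketch (the function name, Gaussian kernel, and toy data are our own illustrative choices, not from the text) of a Nadaraya-Watson smoother written as a hat matrix, so that the fitted values are a fixed linear transformation of the observations:

```python
import numpy as np

def nw_hat_matrix(x, h):
    """Hat matrix W of a Nadaraya-Watson smoother: fitted values are W @ y."""
    u = (x[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * u**2)                     # Gaussian kernel, bandwidth h
    return k / k.sum(axis=1, keepdims=True)     # each row of weights sums to one

x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x)
W = nw_hat_matrix(x, h=0.05)
fitted = W @ y                                  # linear in the observations y
```

The same structure carries over to local linear smoothers and to the backfitting and integration estimators built from them: each is linear in $Y$ with its own weight functions.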
As we have already discussed in the first part of this book, the choice of the smoothing parameter is crucial in practice. The integration estimator requires the choice of two bandwidths: one for the direction of interest and one for the nuisance directions. Possible practical approaches are the rule of thumb of Linton & Nielsen (1995) and the plug-in method suggested in Severance-Lossin & Sperlich (1999). Both methods aim at the MASE-minimizing bandwidth, the former approximating it by means of parametric pre-estimates, the latter by nonparametric pre-estimates.
For example, the MASE-minimizing (and thus asymptotically optimal) bandwidth in the local linear case is of the well-known form
$$h_{opt}=\left[\frac{\Vert K\Vert_2^2\int\sigma^2(x_\alpha)\,dx_\alpha}{n\,\mu_2^2(K)\int\{g_\alpha''(x_\alpha)\}^2 f_\alpha(x_\alpha)\,dx_\alpha}\right]^{1/5},$$
where $\Vert K\Vert_2^2$ and $\mu_2(K)$ are the usual kernel constants, $\sigma^2$ the conditional variance, $g_\alpha$ the component of interest and $f_\alpha$ the marginal density of the direction of interest.
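A rule-of-thumb implementation replaces the unknown variance and curvature in the MASE-optimal bandwidth formula by estimates from a global parametric pre-fit, in the spirit of Linton & Nielsen (1995). The sketch below is our own minimal one-dimensional version, assuming a Gaussian kernel (for which $\Vert K\Vert_2^2 = 1/(2\sqrt{\pi})$ and $\mu_2(K)=1$) and a quartic polynomial as pre-estimate:

```python
import numpy as np

def rule_of_thumb_bandwidth(x, y):
    """Rule-of-thumb bandwidth for local linear regression (Gaussian kernel).

    The unknown residual variance and curvature in the MASE-optimal formula
    are replaced by estimates from a global quartic polynomial pre-fit.
    """
    n = len(x)
    coef = np.polyfit(x, y, 4)                  # parametric pre-estimate
    resid = y - np.polyval(coef, x)
    sigma2 = resid @ resid / (n - 5)            # residual variance estimate
    curv = np.polyval(np.polyder(coef, 2), x)   # estimated curvature m''(x_i)
    span = x.max() - x.min()
    r_k = 1.0 / (2.0 * np.sqrt(np.pi))          # ||K||_2^2 for the Gaussian kernel
    # sum of squared curvatures approximates n * integral of (m'')^2 f
    return (sigma2 * r_k * span / (curv**2).sum()) ** 0.2

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(200)
h = rule_of_thumb_bandwidth(x, y)
```

The parametric pre-fit makes the bandwidth cheap to compute; the plug-in method of Severance-Lossin & Sperlich (1999) would instead estimate the curvature nonparametrically.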
For the backfitting estimator, bandwidth choice is simplified by the fact that only one-dimensional smoothers are involved. Here the MASE-minimizing bandwidth is commonly approximated by the MASE-minimizing bandwidth of the corresponding one-dimensional kernel regression problem.
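For concreteness, here is a minimal sketch of the backfitting iteration with one-dimensional Nadaraya-Watson smoothers and a quartic kernel (the function names, bandwidth, and toy model are our own illustrative assumptions, not from the text):

```python
import numpy as np

def nw_smooth(x_eval, x, r, h):
    """One-dimensional Nadaraya-Watson smooth of r against x (quartic kernel)."""
    u = (x_eval[:, None] - x[None, :]) / h
    k = np.where(np.abs(u) <= 1, (15 / 16) * (1 - u**2) ** 2, 0.0)
    return k @ r / k.sum(axis=1)

def backfit(X, y, h, n_iter=20):
    """Backfitting for the additive model y = c + g_1(X_1) + ... + g_d(X_d) + error."""
    n, d = X.shape
    c = y.mean()
    g = np.zeros((n, d))                  # component estimates at the design points
    for _ in range(n_iter):
        for j in range(d):
            others = [k for k in range(d) if k != j]
            partial = y - c - g[:, others].sum(axis=1)   # partial residuals
            g[:, j] = nw_smooth(X[:, j], X[:, j], partial, h)
            g[:, j] -= g[:, j].mean()     # identification: centered components
    return c, g

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))
y = X[:, 0] ** 2 + X[:, 1]                # additive truth, noise-free for illustration
c, g = backfit(X, y, h=0.3)
```

Each component is repeatedly re-smoothed against the partial residuals of the others; the one-dimensional smoother is exactly where the single bandwidth per direction enters.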
Obviously, the backfitting estimator is rather sensitive to the choice of bandwidth: to obtain small MASE values it is important to choose the smoothing parameter appropriately. For the integration estimator the results differ depending on the model; this method is nowhere near as sensitive to the bandwidth as backfitting. Focusing on the estimated component functions, the picture is similar to that for the overall MASE but with weaker sensitivity; here the results differ more strongly across the data generating models.
Table 8.1 presents the MASE when using local linear smoothers and the asymptotically optimal bandwidths. To exclude boundary effects, each entry of the table consists of two rows: evaluation on the complete data set in the upper row, and evaluation on trimmed data in the lower row. The trimming was implemented by cutting off a fixed share of the data on each side of the support.
| model | estimator | evaluation |  |  |  |  |  |  |  |  |  |  |  |  |
|-------|-----------|------------|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| 1 | back | full | 0.047 | 0.041 | 0.020 | 0.046 | 0.028 | 0.053 | 0.124 | 0.135 | 0.128 | 0.068 | 0.081 | 0.111 |
| 1 | back | trimmed | 0.038 | 0.031 | 0.014 | 0.037 | 0.018 | 0.033 | 0.107 | 0.116 | 0.099 | 0.046 | 0.055 | 0.081 |
| 1 | int | full | 0.019 | 0.030 | 0.057 | 0.031 | 0.075 | 0.081 | 0.047 | 0.048 | 0.089 | 0.053 | 0.049 | 0.056 |
| 1 | int | trimmed | 0.013 | 0.017 | 0.047 | 0.024 | 0.059 | 0.071 | 0.022 | 0.026 | 0.078 | 0.026 | 0.022 | 0.041 |
| 2 | back | full | 0.083 | 0.079 | 0.047 | 0.073 | 0.053 | 0.058 | 0.112 | 0.121 | 0.110 | 0.051 | 0.062 | 0.096 |
| 2 | back | trimmed | 0.071 | 0.060 | 0.024 | 0.058 | 0.032 | 0.028 | 0.101 | 0.110 | 0.091 | 0.039 | 0.048 | 0.075 |
| 2 | int | full | 0.090 | 0.116 | 0.530 | 0.137 | 0.234 | 0.528 | 0.048 | 0.480 | 1.32 | 0.057 | 0.603 | 2.41 |
| 2 | int | trimmed | 0.028 | 0.029 | 0.205 | 0.027 | 0.031 | 0.149 | 0.032 | 0.061 | 0.151 | 0.040 | 0.265 | 1.02 |
| 3 | back | full | 0.052 | 0.054 | 0.049 | 0.051 | 0.054 | 0.057 | 0.061 | 0.063 | 0.068 | 0.065 | 0.066 | 0.064 |
| 3 | back | trimmed | 0.032 | 0.031 | 0.028 | 0.030 | 0.029 | 0.035 | 0.035 | 0.035 | 0.037 | 0.038 | 0.037 | 0.038 |
| 3 | int | full | 0.115 | 0.145 | 0.619 | 0.175 | 0.285 | 0.608 | 0.118 | 0.561 | 1.37 | 0.085 | 0.670 | 2.24 |
| 3 | int | trimmed | 0.041 | 0.041 | 0.252 | 0.043 | 0.053 | 0.194 | 0.076 | 0.083 | 0.189 | 0.044 | 0.257 | 0.681 |
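The evaluation scheme behind Table 8.1 -- squared error averaged over grid points and Monte Carlo replications, once on the full support and once on a trimmed interior -- can be sketched as follows (the 5% trimming share per side is our own illustrative choice; the text does not specify the value):

```python
import numpy as np

def mase(estimates, truth):
    """Mean averaged squared error over replications (rows) and grid points."""
    return np.mean((estimates - truth) ** 2)

def trimmed_mase(estimates, truth, grid, trim=0.05):
    """MASE restricted to the interior of the support, cutting off a share
    `trim` of the range on each side to exclude boundary effects."""
    lo = grid.min() + trim * (grid.max() - grid.min())
    hi = grid.max() - trim * (grid.max() - grid.min())
    inside = (grid >= lo) & (grid <= hi)
    return np.mean((estimates[:, inside] - truth[inside]) ** 2)

# toy illustration: an estimator that errs only at the boundary
grid = np.linspace(0, 1, 101)
truth = grid ** 2
est = np.tile(truth, (10, 1))
est[:, 0] += 1.0                          # large boundary error in every replication
```

Comparing `mase` and `trimmed_mase` on such an example shows how boundary errors inflate the untrimmed criterion, which is why each table entry has two rows.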
We see that no estimator is uniformly superior to the others. The results depend more on the design distribution and the underlying model than on the particular estimation procedure. The main conclusion is that backfitting almost always fits the overall regression better, whereas marginal integration often does better for the individual additive components. Recalling the construction of the two procedures, this is exactly what one should expect.
Also not surprisingly, the integration estimator suffers more heavily from boundary effects. With increasing correlation both estimators perform worse, but this effect is especially pronounced for the integration estimator. This is in line with the theory, which says that the integration estimator is inefficient for correlated designs, see Linton (1997). A bandwidth matrix with appropriate non-zero off-diagonal elements can help in the case of highly correlated regressors; the corresponding study in Sperlich et al. (1999) points out that the fit can be improved significantly by suitably chosen off-diagonal elements of the bandwidth matrices. A similar analysis would be harder to carry out for the backfitting method, as it relies on one-dimensional smoothers only. We remark that internalized marginal integration estimators (Dette et al., 2004) and smoothed backfitting estimators (Mammen et al., 1999; Nielsen & Sperlich, 2002) are much better suited to deal with correlated regressors.
How do the additive approaches overcome the curse of dimensionality? We now compare the additive estimation methods with the bivariate Nadaraya-Watson kernel smoother. We define equivalent kernels as the linear weights used in the estimates for fitting the regression function at a particular point. In the following we take the center point of the design, which is used in Figures 8.11 to 8.13. All estimators are based on univariate or bivariate Nadaraya-Watson smoothers (in the latter case with a diagonal bandwidth matrix).
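A sketch of such equivalent kernels for the bivariate Nadaraya-Watson case, evaluated at the center of a design assumed uniform on $[-1,1]^2$ (the evaluation point, quartic kernel, and bandwidths are our own illustrative assumptions):

```python
import numpy as np

def bivariate_nw_weights(point, X, h1, h2):
    """Equivalent-kernel weights of the bivariate Nadaraya-Watson smoother
    with a diagonal bandwidth matrix diag(h1, h2) and quartic product kernel."""
    def quartic(u):
        return np.where(np.abs(u) <= 1, (15 / 16) * (1 - u**2) ** 2, 0.0)
    k = quartic((point[0] - X[:, 0]) / h1) * quartic((point[1] - X[:, 1]) / h2)
    return k / k.sum()                    # normalized weights, one per observation

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(500, 2))
w = bivariate_nw_weights(np.array([0.0, 0.0]), X, 0.3, 0.3)
```

Plotting `w` over the design reproduces the picture discussed below: the bivariate smoother spreads its weight uniformly in all directions around the point, whereas additive estimators concentrate their weights in panels along the axes.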
Obviously, the additive methods (Figures 8.12, 8.13) are characterized by local panels of weights along the axes, instead of weights spread uniformly in all directions as for the bivariate Nadaraya-Watson estimator (Figure 8.11). Since additive estimators are built from components that behave like univariate smoothers, they can overcome the curse of dimensionality. The pictures for the two additive smoothers look very similar (apart from some negative weights for the backfitting).
Finally, we see clearly how both additive methods run into problems when the correlation between the regressors increases. For the marginal integration estimator in particular, recall that before we apply the integration over the nuisance directions, we pre-estimate the regression function at all combinations of the realizations of the regressors. For example, with both regressors uniformly distributed, we may have to pre-estimate the regression function at a point that combines a very small realization of one regressor with a very large realization of the other. Now imagine that the regressors are positively correlated: such combinations are then hardly ever observed, so in small samples the pre-estimate at such a point is usually obtained by extrapolation. The poor quality of this pre-estimate then carries over to the final estimate.
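The extrapolation problem can be illustrated by counting how many observations fall into the smoothing window at an interior point versus at such an "empty corner" when the regressors are strongly positively correlated (the design, correlation, and bandwidth below are our own illustrative choices):

```python
import numpy as np

def count_neighbors(point, X, h):
    """Number of observations inside a cubic kernel window of half-width h."""
    return int(np.sum(np.max(np.abs(X - point), axis=1) <= h))

rng = np.random.default_rng(3)
n, rho = 200, 0.9
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
# transform each margin to uniform on [0, 1] via its ranks (Gaussian copula)
U = np.argsort(np.argsort(z, axis=0), axis=0) / (n - 1)

center = count_neighbors(np.array([0.5, 0.5]), U, h=0.1)   # on the diagonal
corner = count_neighbors(np.array([0.0, 1.0]), U, h=0.1)   # small x1, large x2
```

With high positive correlation the data concentrate along the diagonal, so the window at the corner point is essentially empty and any pre-estimate there must extrapolate.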
Consider the empirical model in which log compensation is related additively to the performance measure ROS (return on sales) and to log firm size,
$$\log(\textrm{COMPENSATION}) = c + g_1(\textrm{ROS}) + g_2\{\log(\textrm{SIZE})\} + \varepsilon\,.$$
In the parametric analysis the functions $g_1$ and $g_2$ are specified as linear.
The database for this analysis is drawn from the Kienbaum Vergütungsstudie, containing data on the compensation of top managers of German stock corporations (AGs) and of managing directors (Geschäftsführer) of limited liability companies (GmbHs). Due to the lack of more detailed information, we measure compensation by managerial compensation per capita. The analysis is based on the following four industry groups.
| Group | 1 | 2 | 3 | 4 |
|---|--:|--:|--:|--:|
| # observations | 131 | 148 | 41 | 38 |
| constant | 4.128 | 4.547 | 3.776 | 4.120 |
| ROS | 1.641 | 0.959 | 15.01 | 8.377 |
| log(SIZE) | 0.258 | 0.201 | 0.283 | 0.249 |
We first present the results of the parametric analysis for each group, see Table 8.2. The coefficient of the size variable can be directly interpreted as the size elasticity in each case.
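Table 8.2 corresponds to a log-linear specification in which the coefficient on log(SIZE) is the size elasticity. A sketch with simulated (hypothetical) data, using the group 1 coefficients from the table as generating values:

```python
import numpy as np

# Hypothetical data; in a log-linear model the coefficient on log(SIZE)
# is directly the size elasticity: a 1% increase in SIZE raises
# compensation by roughly that many percent.
rng = np.random.default_rng(4)
n = 131                                            # group 1 sample size from Table 8.2
ros = rng.uniform(0.0, 0.2, n)                     # return on sales (assumed range)
log_size = rng.uniform(4.0, 9.0, n)                # assumed range for log firm size
beta = np.array([4.128, 1.641, 0.258])             # constant, ROS, log(SIZE); cf. group 1
log_comp = beta[0] + beta[1] * ros + beta[2] * log_size + 0.1 * rng.standard_normal(n)

design = np.column_stack([np.ones(n), ros, log_size])
est, *_ = np.linalg.lstsq(design, log_comp, rcond=None)
size_elasticity = est[2]
```

Ordinary least squares recovers the generating elasticity of 0.258, illustrating how the last row of Table 8.2 is read.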
We now check for possible heterogeneous behavior across the groups. A two-dimensional Nadaraya-Watson estimate is shown in Figure 8.14. The estimated surfaces are similar for industry groups 1 and 2 (upper row), while the surfaces for the two other groups clearly differ. Further, we see a strong positive relation of compensation to firm size, at least in groups 1 and 2, and a weaker one to the performance measure, varying over years and groups. Finally, interaction between the regressors -- especially in groups 3 and 4 -- can be recognized.
The backfitting procedure projects the data into the space of additive models. For the backfitting estimators we used univariate local linear kernel smoothers with quartic kernel and bandwidth inflation factors 0.75 and 0.5 for groups 1 and 2, and 1.25 and 1.0 for groups 3 and 4. In Figure 8.15 we compare the nonparametric (additive) components with the parametric (linear) functions. Over all groups we observe a clear nonlinear impact of ROS. Note that the low significance values in the parametric model describe only the linear impact, which here seems to be caused by functional misspecification (or interactions).
Finally, in Figure 8.16 we estimate the marginal effects of the regressors using local linear smoothers. The estimated marginal effects are presented together with pointwise variability bands based on estimated variance functions. Note that for ROS in group 1 the ranges differ slightly from those in Figure 8.15.
Generally, the results are consistent with the findings above. The nonlinear effects in the impact of ROS are stronger, especially in groups 1 and 2. Since the bumps in the firm size effect mentioned above do not appear here, we can conclude that interaction effects are indeed responsible for them. The backfitting results differ substantially from the estimated marginal effects in groups 3 and 4, which again underlines the presence of interaction effects.
To summarize, we conclude that the separation into groups is useful, but groups 1 and 2, and likewise groups 3 and 4, seem to behave similarly. The assumption of additivity appears to be violated for groups 3 and 4. Furthermore, the nonparametric estimates yield different results due to nonlinear effects and interaction, so that the parametric elasticities underestimate the true elasticity in our example.
Additive models were first considered in economics and econometrics by Leontief (1947a,b). Intensive discussions of their application to economics can be found in Deaton & Muellbauer (1980) and Fuss et al. (1978). Wecker & Ansley (1983) in particular introduced the backfitting method into economics.
The backfitting procedure has a long history: it goes back to algorithms of Friedman & Stuetzle (1982) and Breiman & Friedman (1985); see also Buja et al. (1989) and the references therein. Asymptotic theory for backfitting was first studied by Opsomer & Ruppert (1997), and later (under more general conditions) in the above-mentioned paper of Mammen et al. (1999).
The marginal integration estimator was first presented by Tjøstheim & Auestad (1994a) and Linton & Nielsen (1995); the idea can also be found in Newey (1994) and in Boularan et al. (1994) for estimating growth curves. Hengartner et al. (1999) introduce modifications leading to computational efficiency. Masry & Tjøstheim (1995, 1997) use marginal integration and prove its consistency in the context of time series analysis. Dalelane (1999) and Achmus (2000) prove consistency of bootstrap methods for marginal integration. Linton (1997) combines marginal integration and a one-step backfitting iteration to obtain an estimator that is both efficient and easy to analyze.
Interaction models have been considered in several papers. Stone et al. (1997) and Andrews & Whang (1990) developed estimators for interaction terms of any order using polynomial splines. Spline estimators have also been used by Wahba (1990). For series estimation we refer in particular to Newey (1995) and the references therein. Härdle et al. (2001) use wavelets to test for additive models. Testing additivity is a field with a growing body of literature; see, for example, Chen et al. (1995), Eubank et al. (1995) and Gozalo & Linton (2001).
A comprehensive resource for additive modeling is the textbook by Hastie & Tibshirani (1990), who focus on the backfitting approach. Further references are Sperlich (1998) and Ruppert et al. (1990).