We now turn to the problem of estimating the marginal effects of the regressors $X_\alpha$. The marginal effect of an explanatory variable tells how $Y$ changes on average if this variable is varying. In other words, the marginal effect represents the conditional expectation $E_{\varepsilon, X_{\underline{\alpha}}}(Y \mid X_\alpha)$, where the expectation is taken not only over the error distribution but also over all other regressors $X_{\underline{\alpha}}$. (Note that we have usually suppressed the $\varepsilon$ in all expectations up to now. This is the only case where we need to mention explicitly over which distribution the expectation is calculated.)
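As already indicated, in case of true additivity the marginal effects correspond exactly to the additive component functions $g_\alpha$. The estimator here is based on an integration idea, coming from the following observation. Denote by $f_{\underline{\alpha}}$ the marginal density of $X_{\underline{\alpha}}$. We have from (8.2)
$$E_{X_{\underline{\alpha}}}\{m(x_\alpha, X_{\underline{\alpha}})\} = \int m(x_\alpha, x_{\underline{\alpha}})\, f_{\underline{\alpha}}(x_{\underline{\alpha}})\, dx_{\underline{\alpha}} = c + g_\alpha(x_\alpha). \tag{8.12}$$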
Many extensions and modifications of the integration approach have been developed recently. We consider now the simultaneous estimation of both the functions and their derivatives by combining the procedure with a local polynomial approach (Subsections 8.2.1, 8.2.2) and the estimation of interaction terms (Subsection 8.2.3).
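In order to estimate the marginal effect $g_\alpha(x_\alpha)$, equation (8.12) suggests the following idea: first estimate the function $m(\bullet)$ with a multidimensional pre-smoother $\widetilde m$, then integrate out the variables different from $X_\alpha$. In the estimation procedure, integration can be replaced by averaging over the directions not of interest, i.e. $X_{\underline{\alpha}}$, resulting in
$$\widehat{g_\alpha + c}\,(x_\alpha) = \frac{1}{n} \sum_{i=1}^{n} \widetilde m(x_\alpha, X_{i\underline{\alpha}}),$$
where $X_{i\underline{\alpha}}$ denotes the $i$th observation of all regressors other than $X_\alpha$.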
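Note that to get the marginal effects, we just integrate over all other (the nuisance) directions $X_{\underline{\alpha}}$. In case of additivity these marginal effects are the additive component functions $g_\alpha$ plus the constant $c$. As for backfitting, the constant $c$ can be estimated consistently by $\widehat c = \overline{Y}$ at $\sqrt{n}$-rate. Hence, a possible estimate for $g_\alpha$ is
$$\widehat g_\alpha(x_\alpha) = \frac{1}{n} \sum_{i=1}^{n} \widetilde m(x_\alpha, X_{i\underline{\alpha}}) - \overline{Y}. \tag{8.15}$$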
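The averaging idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimator used in the text: the pre-smoother is a plain Nadaraya-Watson estimator with a product Quartic kernel rather than a local linear pre-estimator, and the function names (`quartic`, `nw_presmoother`, `marginal_integration`), the simulated design and the bandwidths are hypothetical choices made only for this example.

```python
import numpy as np

def quartic(u):
    """Quartic (biweight) kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def nw_presmoother(x0, X, Y, h):
    """Nadaraya-Watson estimate of m at the rows of x0 (product kernel).

    Assumption of this sketch: a simple stand-in for the pre-smoother.
    """
    # w[k, i] = prod_j K((x0[k, j] - X[i, j]) / h[j])
    w = np.prod(quartic((x0[:, None, :] - X[None, :, :]) / h), axis=2)
    denom = np.maximum(w.sum(axis=1), 1e-12)  # guard against empty windows
    return (w @ Y) / denom

def marginal_integration(grid, alpha, X, Y, h):
    """Estimate g_alpha on `grid`: average the pre-smoother over the observed
    nuisance directions, then subtract the sample mean of Y."""
    est = np.empty(len(grid))
    for k, x in enumerate(grid):
        pts = X.copy()
        pts[:, alpha] = x  # points (x, X_i,nuisance) for i = 1, ..., n
        est[k] = nw_presmoother(pts, X, Y, h).mean()
    return est - Y.mean()

# Simulated additive model (hypothetical): m(x) = 1 + 2*x1 + (x2^2 - 1/3)
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-1.0, 1.0, size=(n, 2))
Y = 1.0 + 2.0 * X[:, 0] + (X[:, 1] ** 2 - 1.0 / 3.0) + 0.1 * rng.standard_normal(n)

grid = np.linspace(-0.5, 0.5, 5)
g1_hat = marginal_integration(grid, 0, X, Y, h=np.array([0.3, 0.3]))
g1_true = 2.0 * grid  # true first component, centered at zero
```

Comparing `g1_hat` with `g1_true` on the grid gives a quick check that averaging over the nuisance direction recovers the first component up to smoothing and sampling error.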
It remains to discuss how to obtain a reasonable pre-estimator $\widetilde m$. In principle, this could be any multivariate nonparametric estimator. We make use here of a special type of multidimensional local linear kernel estimator, cf. Ruppert & Wand (1994) and Severance-Lossin & Sperlich (1999). This estimator is given by minimizing
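To derive asymptotic properties of these estimators, the concept of equivalent kernels is used, see e.g. Ruppert & Wand (1994). The main idea is that the local polynomial smoother of degree $p$ is asymptotically equivalent (i.e. has the same leading term) to a kernel estimator using a higher order kernel given by
$$K^\star_\nu(u) = \sum_{t=0}^{p} s_{\nu t}\, u^{t} K(u),$$
where $(s_{\nu t})_{0 \le \nu, t \le p} = S^{-1}$ and $S = (\mu_{t+r})_{0 \le t, r \le p}$ with $\mu_j = \int u^j K(u)\, du$.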
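The equivalent kernel can be computed numerically. The sketch below assumes the standard moment-matrix form $K^\star_\nu(u) = e_\nu^\top S^{-1}(1, u, \ldots, u^p)^\top K(u)$ with $S = (\mu_{t+r})$ and $\mu_j = \int u^j K(u)\,du$; the function name `equivalent_kernel`, the Quartic base kernel and the integration grid are illustrative assumptions.

```python
import numpy as np

def quartic(u):
    """Quartic (biweight) kernel, supported on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def equivalent_kernel(u, p, nu=0, kernel=quartic, m=20001):
    """Equivalent kernel of a local polynomial fit of degree p for the
    nu-th coefficient, built from the kernel moments mu_0, ..., mu_{2p}."""
    t = np.linspace(-1.0, 1.0, m)  # grid on the kernel support
    dt = t[1] - t[0]
    # Moments by a Riemann sum (the kernel vanishes at both end points)
    mu = np.array([np.sum(t**j * kernel(t)) * dt for j in range(2 * p + 1)])
    S = np.array([[mu[r + s] for s in range(p + 1)] for r in range(p + 1)])
    coef = np.linalg.inv(S)[nu]  # row nu of S^{-1}
    u = np.atleast_1d(np.asarray(u, dtype=float))
    return (np.vander(u, p + 1, increasing=True) @ coef) * kernel(u)

u = np.linspace(-1.0, 1.0, 20001)
du = u[1] - u[0]
k_lin = equivalent_kernel(u, p=1)   # local linear fit
k_quad = equivalent_kernel(u, p=2)  # local quadratic fit
mass = np.sum(k_quad) * du
second_moment = np.sum(u**2 * k_quad) * du
```

For a symmetric base kernel the local linear case ($p=1$, $\nu=0$) reproduces $K$ itself, while the local quadratic case yields a kernel that still integrates to one but has a vanishing second moment, which is what "higher order" refers to.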
We now extend the marginal integration method to the estimation of derivatives of the functions $g_\alpha$. For additive linear functions the first derivatives are constants, so all higher order derivatives vanish. However, in economics the derivatives of the marginal effects are very often of essential interest, e.g. to determine elasticities or returns to scale as in Example 8.5.
To estimate the derivatives of the additive components, we do not need any further extension of our method, since using a local polynomial estimator of order $p$ for the pre-estimator provides us simultaneously with both component functions and derivative estimates up to degree $p$. The reason is that the optimal $\beta_\nu$ in equation (8.16) is an estimate for $g_\alpha^{(\nu)}/\nu!$, provided that dimension $\alpha$ is separable from the others. In case of additivity this is automatically given.
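Why the coefficients of a local polynomial fit deliver derivative estimates can be seen in a one-dimensional sketch. It is not the multidimensional pre-estimator of the text: it fits a single regressor, uses Quartic kernel weights, and the function name `local_poly_derivatives` as well as the simulated data are hypothetical. The coefficient on $(X - x_0)^\nu$, multiplied by $\nu!$, serves as the estimate of the $\nu$-th derivative at $x_0$.

```python
import numpy as np
from math import factorial

def local_poly_derivatives(x0, X, Y, h, p):
    """Estimates of m(x0), m'(x0), ..., m^(p)(x0) from a local polynomial
    fit of degree p (one regressor, Quartic kernel weights)."""
    u = (X - x0) / h
    # Quartic weights; the normalizing constant cancels in weighted least squares
    w = np.where(np.abs(u) <= 1.0, (1.0 - u**2) ** 2, 0.0)
    Z = np.vander(X - x0, p + 1, increasing=True)  # columns (X - x0)^nu
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Z * sw[:, None], Y * sw, rcond=None)
    # beta[nu] estimates m^(nu)(x0) / nu!, so rescale by nu!
    return np.array([factorial(nu) * beta[nu] for nu in range(p + 1)])

# Noisy illustration (hypothetical data): m(x) = sin(2x), so m'(x) = 2 cos(2x)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 400)
Y = np.sin(2.0 * X) + 0.05 * rng.standard_normal(400)
derivs = local_poly_derivatives(0.5, X, Y, h=0.3, p=2)
# derivs[0] estimates m(0.5), derivs[1] estimates m'(0.5), derivs[2] estimates m''(0.5)
```

In the marginal integration setting the same rescaled coefficient would additionally be averaged over the nuisance directions; that step is omitted here to keep the sketch short.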
Thus, we can use
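Additionally, for the regression function estimate constructed by the additive component estimates $\widehat g_\alpha$ and $\widehat c$, i.e.
$$\widehat m(x) = \widehat c + \sum_{\alpha=1}^{d} \widehat g_\alpha(x_\alpha),$$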
In the following example we illustrate the smoothing properties of this estimator for and .
As pointed out before, marginal integration estimates marginal effects. These are identical to the additive components if the model is truly additive. But what happens if the underlying model is not purely additive? How do the estimators behave when we have some interaction between explanatory variables, for example given by an additional term $g_{\alpha j}(X_\alpha, X_j)$?
An obvious weakness of the truly additive model is that those interactions are completely ignored, and in certain econometric contexts (production function modeling being one of them) the absence of interaction terms has often been criticized. For that reason we will now extend the regression model by pairwise interactions, resulting in
$$m(x) = c + \sum_{\alpha=1}^{d} g_\alpha(x_\alpha) + \sum_{1 \le \alpha < j \le d} g_{\alpha j}(x_\alpha, x_j). \tag{8.20}$$
For the marginal integration estimator bivariate interaction terms have been studied in Sperlich et al. (2002). They provide asymptotic properties and additionally introduce test procedures to check for significance of the interactions. In the following we will only sketch the construction of the relevant estimation procedure and its application. For the theoretical results we remark that they are higher dimensional extensions of Theorem 8.3 and refer to the above mentioned article.
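For the estimation of (8.20) by marginal integration we have to extend our identification condition
$$E_{X_\alpha}\{g_\alpha(X_\alpha)\} = 0 \tag{8.21}$$
by a further identifying condition for the interaction terms:
$$E_{X_\alpha}\{g_{\alpha j}(X_\alpha, X_j)\} = E_{X_j}\{g_{\alpha j}(X_\alpha, X_j)\} = 0. \tag{8.22}$$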
As before, equations (8.21) and (8.22) should not be considered as restrictions. It is always possible to shift the functions $g_\alpha$ and $g_{\alpha j}$ in the vertical direction without changing the functional forms or the overall regression function. Moreover, all models of the form (8.20) are equivalent to exactly one model satisfying (8.21) and (8.22).
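According to the definition of $X_{\underline{\alpha}}$, let now $X_{\underline{\alpha j}}$ denote the $(d-2)$-dimensional random variable obtained by removing $X_\alpha$ and $X_j$ from $X = (X_1, \ldots, X_d)^\top$. With some abuse of notation we will write $X = (X_\alpha, X_j, X_{\underline{\alpha j}})$ to highlight the directions in $d$-dimensional space represented by the $\alpha$ and $j$ coordinates. We denote the marginal densities of $X_{\underline{\alpha j}}$, $X_{\underline{\alpha}}$ and $X$ by $f_{\underline{\alpha j}}$, $f_{\underline{\alpha}}$, and $f$, respectively.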
Again consider marginal integration as used before
Using the same estimation procedure as described above, i.e., replacing the expectations by averages and the function $m$ by an appropriate pre-estimator, we get estimates for $g_\alpha$ and for the interaction terms $g_{\alpha j}$. For ease of notation we give only the formula for $p = 1$, i.e., the local linear estimator, in the pre-estimation step. We obtain
(8.25)
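The construction of an interaction estimate by marginal integration can be sketched as follows. This is an illustration under explicit assumptions and does not reproduce formula (8.25): the pre-smoother is a Nadaraya-Watson estimator instead of the local linear one, the simulated model has three regressors and a single interaction $g_{12}(x_1, x_2) = 2 x_1 x_2$, and the interaction is recovered as the joint marginal effect minus the two one-dimensional marginal effects plus the estimated constant, which is valid for this design. All function names (`theta`, `interaction_estimate`, and so on) are hypothetical.

```python
import numpy as np

def quartic(u):
    """Quartic (biweight) kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def nw_presmoother(x0, X, Y, h):
    """Nadaraya-Watson pre-smoother with a product kernel (sketch assumption)."""
    w = np.prod(quartic((x0[:, None, :] - X[None, :, :]) / h), axis=2)
    return (w @ Y) / np.maximum(w.sum(axis=1), 1e-12)

def theta(fixed, X, Y, h):
    """Average the pre-smoother over the sample while holding the coordinates
    in `fixed` (dict: column index -> value) at the given values."""
    pts = X.copy()
    for col, value in fixed.items():
        pts[:, col] = value
    return nw_presmoother(pts, X, Y, h).mean()

def interaction_estimate(xa, xj, a, j, X, Y, h):
    """Joint marginal effect minus both one-dimensional marginal effects,
    plus the estimated constant (sample mean of Y)."""
    return (theta({a: xa, j: xj}, X, Y, h)
            - theta({a: xa}, X, Y, h)
            - theta({j: xj}, X, Y, h)
            + Y.mean())

# Simulated model (hypothetical) with one interaction term 2 * x1 * x2
rng = np.random.default_rng(1)
n = 600
X = rng.uniform(-1.0, 1.0, size=(n, 3))
Y = (1.0 + X[:, 0] + (X[:, 1] ** 2 - 1.0 / 3.0) + X[:, 2]
     + 2.0 * X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(n))

h = np.full(3, 0.4)
g12_pp = interaction_estimate(0.5, 0.5, 0, 1, X, Y, h)   # true g_12 value is 0.5
g12_pm = interaction_estimate(0.5, -0.5, 0, 1, X, Y, h)  # true g_12 value is -0.5
```

The two evaluation points are chosen so that the true interaction changes sign, which makes it easy to see whether the estimate picks up the interaction at all.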
Finally, let us turn to an example that applies marginal integration and derivative (elasticity) estimation and also illustrates the use of interaction terms.
We use a subset of observations of an original data set of more than 1,000 Wisconsin farms collected by the Farm Credit Service of St. Paul, Minnesota in 1987. Severance-Lossin & Sperlich (1999) removed outliers and incomplete records and selected farms which only produced animal outputs. The data consist of farm level inputs and outputs measured in dollars. In more detail, output is livestock, and the input variables are
A purely additive model (ignoring any possible interaction) is of the form
The results are given in Figures 8.5, 8.6. We use (product) Quartic kernels for all dimensions and bandwidths proportional to the standard deviation of ( , ). It is known that the integration estimator is quite robust against different choices of bandwidths, see e.g. Sperlich et al. (1999).
To highlight the shape of the estimates we display the main part of the point clouds including the function estimates. The graphs give some indication of nonlinearity, in particular for , and . The derivatives seem to indicate that the elasticities for these inputs increase and could finally lead to increasing returns to scale. Note that for all dimensions (especially where the mass of the observations is located) the nonparametric results differ a lot from the parametric case. An obvious conclusion from the economic point of view is that for instance larger farms are more productive (intuitively quite reasonable).
In Figures 8.7 to 8.8 we present the estimates of the bivariate interaction terms . For their estimation and graphical presentation we trimmed the data by removing of the most extreme observations. Again Quartic kernels were used, here with bandwidths and as above.
Such interaction terms are often hard to interpret. But as long as we can visualize the relationships, a careful interpretation can be attempted. Sperlich et al. (2002) find that a weak form of interaction is present. Family labor plays an important role in the interactions; in particular, the interactions between family labor and miscellaneous inputs, between family labor and intermediate run assets, and between miscellaneous inputs and intermediate run assets should be taken into account.