8.2 Marginal Integration Estimator

We now turn to the problem of estimating the marginal effects of the regressors $ X_\alpha$. The marginal effect of an explanatory variable describes how $ Y$ changes on average when this variable varies. In other words, the marginal effect represents the conditional expectation $ E_{\varepsilon,{\boldsymbol{X}}_{\underline{\alpha}}}(Y\vert X_\alpha)$, where the expectation is taken not only over the error distribution but also over all other regressors. (Note that we have usually suppressed the $ \varepsilon$ in all expectations up to now. This is the only case where we need to mention explicitly with respect to which distribution the expectation is calculated.)

As already indicated, in the case of true additivity the marginal effects correspond exactly to the additive component functions $ g_\alpha$. The estimator considered here is based on an integration idea, which stems from the following observation. Denote by $ f_\alpha$ the marginal density of $ X_\alpha$. We have from (8.2)

$\displaystyle E_{X_\alpha} \{ g_\alpha (X_\alpha )\}
= \int g_\alpha(t) f_\alpha (t)\, dt = 0, \quad
\textrm{ for all }\ \alpha=1,\ldots ,d. $

Denote further by $ {\boldsymbol{X}}_{\underline{\alpha}}$ the vector of all explanatory variables but $ X_\alpha$, i.e.

$\displaystyle {\boldsymbol{X}}_{\underline{\alpha}} = \left( X_{1},\ldots ,X_{\alpha-1},X_{\alpha+1},\ldots ,X_{d} \right)^\top $

and $ f_{\underline{\alpha}}$ their joint pdf. If now $ m({\boldsymbol{X}})=m(X_\alpha,{\boldsymbol{X}}_{\underline{\alpha}})$ is of additive form (8.1), then

$\displaystyle \int m({\boldsymbol{x}}) f_{\underline{\alpha}}({\boldsymbol{x}}_{\underline{\alpha}}) \prod_{k \neq \alpha }dx_k = E_{{\boldsymbol{X}}_{\underline{\alpha}}} \{ m(X_\alpha,{\boldsymbol{X}}_{\underline{\alpha}}) \} = E_{{\boldsymbol{X}}_{\underline{\alpha}}} \Big\{ c+g_\alpha (X_\alpha)+\sum_{k\neq \alpha} g_k (X_k) \Big\} = c+g_\alpha (X_\alpha ) \,.$ (8.12)

You see that indeed we calculate $ E_{\varepsilon,{\boldsymbol{X}}_{\underline{\alpha}}}(Y\vert X_\alpha)$ instead of $ E_{\varepsilon}(Y\vert X_\alpha)$. We give a simple example to illustrate marginal integration:

EXAMPLE 8.3  
Suppose we have a data generating process of the form

$\displaystyle Y = 4 + X_1^2 + 2\cdot \sin (X_2) + \varepsilon \,,$

where $ X_1 \sim U[-2,2]$ and $ X_2 \sim U[-3,3]$ are uniformly distributed and $ \varepsilon$ is a regular, possibly heteroscedastic noise term. The regression function obviously is

$\displaystyle m(x_1,x_2) = E(Y\vert{\boldsymbol{X}}={\boldsymbol{x}}) = 4 + x_1^2 + 2\cdot \sin (x_2). $

Consequently, we have the marginal expectations

$\displaystyle E_{X_2}\{m(X_1,X_2)\} = \int_{-3}^3 \frac{1}{6} \left\{
4 + X_1^2 + 2\cdot \sin (u)
\right\} du = 4 + X_1^2 \,, $

$\displaystyle E_{X_1}\{m(X_1,X_2)\} = \int_{-2}^2 \frac{1}{4} \left\{
4 + u^2 + 2\cdot \sin (X_2) \right\}
du = \frac{16}{3} + 2\cdot \sin (X_2)\,. $

This yields the component functions

$\displaystyle g_1 (x_1) =x_1^2-\frac 43\,,\quad
g_2 (x_2)= 2 \sin (x_2)\,,\quad\textrm{and}\quad
c=\frac{16}{3}\,$

which are normalized such that $ E_{X_\alpha}\{g_\alpha(X_\alpha)\}=0$. $ \Box$
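This calculation is easily checked by simulation. The following sketch (plain NumPy; the sample size, seed and evaluation points are arbitrary illustrative choices) approximates both marginal expectations by Monte Carlo averages:

```python
import numpy as np

rng = np.random.default_rng(42)

def m(x1, x2):
    # regression function from Example 8.3
    return 4 + x1**2 + 2 * np.sin(x2)

n = 100_000
x1_draws = rng.uniform(-2, 2, n)   # X1 ~ U[-2, 2]
x2_draws = rng.uniform(-3, 3, n)   # X2 ~ U[-3, 3]

# E_{X2} m(x1, X2) at x1 = 1: should be close to 4 + 1 = 5
print(np.mean(m(1.0, x2_draws)))
# E_{X1} m(X1, x2) at x2 = 2: should be close to 16/3 + 2 sin(2) = 7.152
print(np.mean(m(x1_draws, 2.0)))
```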

Many extensions and modifications of the integration approach have been developed recently. We consider now the simultaneous estimation of both the functions and their derivatives by combining the procedure with a local polynomial approach (Subsections 8.2.1, 8.2.2) and the estimation of interaction terms (Subsection 8.2.3).

8.2.1 Estimation of Marginal Effects

In order to estimate the marginal effect $ g_\alpha (X_\alpha)$, equation (8.12) suggests the following idea: first estimate the function $ m(\bullet)$ with a multidimensional pre-smoother $ \widetilde{m}$, then integrate out the variables different from $ X_\alpha$. In the estimation procedure, the integration can be replaced by averaging over the directions not of interest, i.e. $ {\boldsymbol{X}}_{\underline{\alpha}}$, resulting in

$\displaystyle \widehat{\left\{ g_\alpha (\bullet)+c \right\}} = \frac 1n \sum_{i=1}^n \widetilde{m}(\bullet,{\boldsymbol{X}}_{i\underline{\alpha}}) \ .$ (8.13)

Note that to get the marginal effects, we just integrate $ \widetilde{m}$ over all other (the nuisance) directions $ \underline{\alpha}$. In case of additivity these marginal effects are the additive component functions $ g_\alpha$ plus the constant $ c$. As for backfitting, the constant $ c$ can be estimated consistently by $ \widehat{c} = \overline{Y}$ at $ \sqrt{n}$-rate. Hence, a possible estimate for $ g_\alpha$ is

$\displaystyle \widehat{g}_\alpha (\bullet) = \frac 1n \sum_{i=1}^n \widetilde{m} (\bullet ,{\boldsymbol{X}}_{i\underline{\alpha}}) - \overline{Y}\,.$ (8.14)
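In code, (8.13) and (8.14) amount to a single loop: evaluate a pre-smoother at the point of interest combined with each observed nuisance vector, then average. Below is a minimal sketch assuming, for simplicity, a Nadaraya-Watson pre-smoother with a product Quartic kernel; the text instead uses the local polynomial pre-smoother defined in (8.16), and all function names and the bandwidth $ h$ here are illustrative.

```python
import numpy as np

def quartic(u):
    # Quartic (biweight) kernel with support [-1, 1]
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def m_tilde(x, X, Y, h):
    # Nadaraya-Watson pre-smoother at a single d-dimensional point x
    w = np.prod(quartic((X - x) / h), axis=1)
    return np.sum(w * Y) / np.sum(w)   # assumes some observations fall in the window

def g_hat(x_alpha, alpha, X, Y, h):
    # eq. (8.14): average m_tilde over the observed nuisance directions,
    # then subtract the sample mean of Y as the estimate of c
    n = len(Y)
    vals = np.empty(n)
    for i in range(n):
        point = X[i].copy()
        point[alpha] = x_alpha         # plug the evaluation point into direction alpha
        vals[i] = m_tilde(point, X, Y, h)
    return vals.mean() - Y.mean()
```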

Centering the marginals yields the same asymptotic result, i.e.

$\displaystyle \widehat{g}_\alpha (\bullet) = \widehat{ \left\{ g_\alpha (\bullet)+c \right\}} - \frac 1{n}\sum_{i=1}^n \widehat{ \left\{ g_\alpha (X_{i\alpha})+c \right\}} = \frac 1n \sum_{i=1}^n \widetilde{m}(\bullet ,{\boldsymbol{X}}_{i\underline{\alpha}}) - \frac 1{n^2} \sum_{i=1}^n \sum_{l=1}^n \widetilde{m}(X_{i\alpha} ,{\boldsymbol{X}}_{l\underline{\alpha}}) \,.$ (8.15)

It remains to discuss how to obtain a reasonable pre-estimator $ \widetilde{m}(x_\alpha ,{\boldsymbol{x}}_{l\underline{\alpha}})$. In principle, this could be any multivariate nonparametric estimator. We make use here of a special type of multidimensional local linear kernel estimator, cf. Ruppert & Wand (1994) and Severance-Lossin & Sperlich (1999). This estimator is obtained by minimizing

$\displaystyle \sum_{i=1}^n \left\{ Y_i-\beta_0-\beta_1(X_{i\alpha}-x_\alpha) - \ldots -\beta_p (X_{i\alpha}-x_\alpha)^p \right\}^2 K_h(X_{i\alpha}-x_\alpha)\, {\mathcal{K}}_{\mathbf{H}}({\boldsymbol{X}}_{i\underline{\alpha}} - {\boldsymbol{x}}_{l\underline{\alpha}})$ (8.16)

with respect to $ \beta_0,\ldots,\beta_p$. Here, $ K_h$ denotes a (scaled) univariate and $ {\mathcal{K}}_{\mathbf{H}}$ a (scaled) $ (d-1)$-dimensional kernel function, and $ h$ and $ {\mathbf{H}}=\widetilde{h}{\mathbf{I}}_{d-1}$ are the bandwidth parameters. To obtain the estimated marginal function, we extract the estimated $ \beta_0$. This means we use

$\displaystyle \widetilde{m}(x_\alpha,{\boldsymbol{x}}_{l\underline{\alpha}}) ={\boldsymbol{e}}_0^\top \left({\mathbf{X}}^\top _\alpha {\mathbf{W}}_{l\alpha} {\mathbf{X}}_\alpha \right)^{-1}{\mathbf{X}}^\top _\alpha {\mathbf{W}}_{l\alpha} {\boldsymbol{Y}}, $

where

$\displaystyle {\mathbf{W}}_{l\alpha} = \mathop{\hbox{diag}}\left(\left\{ \frac 1n K_h(X_{i\alpha}-x_\alpha)\, {\mathcal{K}}_{\mathbf{H}}({\boldsymbol{X}}_{i\underline{\alpha}} - {\boldsymbol{x}}_{l\underline{\alpha}}) \right\}_{i=1,\ldots,n}\right)\,, $

$\displaystyle {\mathbf{X}}_\alpha =\left( \begin{array}{ccccc} 1 & X_{1\alpha }-x_\alpha & (X_{1\alpha}-x_\alpha)^2 & \ldots & (X_{1\alpha}-x_\alpha)^p \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n\alpha}-x_\alpha & (X_{n\alpha}-x_\alpha)^2 & \ldots & (X_{n\alpha}-x_\alpha)^p \end{array}\right) , $

and $ {\boldsymbol{e}}_0=(1,0,\ldots,0)^\top$ denotes the first $ (p+1)$-dimensional unit vector, which extracts the intercept $ \widehat{\beta}_0$.

This estimator is a local polynomial smoother of degree $ p$ for the direction $ \alpha$ and a local constant one for all other directions. Note that the resulting estimate is simply a weighted least squares estimate. For a more detailed discussion recall Subsection 4.1.3, where local polynomial estimators have been introduced. Putting the estimator together, we have

$\displaystyle \widehat{g}_\alpha (x_\alpha) = \frac 1n \sum_{l=1}^n {\boldsymbol{e}}_0^\top \left({\mathbf{X}}^\top _\alpha {\mathbf{W}}_{l\alpha} {\mathbf{X}}_\alpha \right)^{-1} {\mathbf{X}}^\top _\alpha {\mathbf{W}}_{l\alpha} {\boldsymbol{Y}} - \overline{Y} \ .$ (8.17)
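In code, (8.17) is a direct transcription of the formulas above: for each nuisance point $ {\boldsymbol{X}}_{l\underline{\alpha}}$ a weighted least squares fit yields the pre-smoother value, and averaging the intercepts over $ l$ gives $ \widehat{g}_\alpha$. The following is only a sketch: degree, kernel and bandwidths are illustrative choices, and no care is taken of nearly singular design matrices in sparse regions.

```python
import numpy as np

def quartic(u):
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def g_hat_alpha(x_alpha, alpha, X, Y, h, h_tilde, p=1):
    # Marginal integration with a local polynomial pre-smoother, eq. (8.17)
    n, d = X.shape
    u = X[:, alpha] - x_alpha
    D = np.vander(u, p + 1, increasing=True)      # design matrix X_alpha
    Kh = quartic(u / h) / h                       # kernel for the direction of interest
    nuis = np.delete(X, alpha, axis=1)            # nuisance coordinates
    total = 0.0
    for l in range(n):
        # (d-1)-dimensional product kernel centred at the l-th nuisance point
        KH = np.prod(quartic((nuis - nuis[l]) / h_tilde), axis=1) / h_tilde**(d - 1)
        w = Kh * KH / n                           # diagonal of W_{l alpha}
        WD = D * w[:, None]
        beta = np.linalg.solve(D.T @ WD, WD.T @ Y)  # weighted least squares
        total += beta[0]                          # e_0^T beta = m_tilde
    return total / n - Y.mean()                   # subtract c_hat = Y-bar
```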

To derive the asymptotic properties of these estimators, the concept of equivalent kernels is used; see e.g. Ruppert & Wand (1994). The main idea is that the local polynomial smoother of degree $ p$ is asymptotically equivalent (i.e. has the same leading term) to a kernel estimator using the higher order kernel

$\displaystyle K^\star_\nu (u) = \sum^p_{t=0} s_{\nu t} u^t K(u),$ (8.18)

where $ {\mathbf{S}}=\left( \int u^{t+r} K(u)\, du \right)_{0\leq t,r \leq p}$ and $ {\mathbf{S}}^{-1} =\left(s_{\nu t}\right)_{0\leq \nu,t \leq p}$. For the resulting asymptotics and some real data examples we refer to the following subsections.
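The entries of $ {\mathbf{S}}$ are simple kernel moments and can be computed numerically. A short sketch for the Quartic kernel (the degree $ p=1$ is an arbitrary illustrative choice):

```python
import numpy as np
from scipy.integrate import quad

def K(u):
    # Quartic kernel on [-1, 1]
    return 15.0 / 16.0 * (1.0 - u**2) ** 2 if abs(u) <= 1 else 0.0

p = 1  # local linear
# S_{t,r} = int u^(t+r) K(u) du for 0 <= t, r <= p
S = np.array([[quad(lambda u, t=t, r=r: u ** (t + r) * K(u), -1, 1)[0]
               for r in range(p + 1)] for t in range(p + 1)])
s = np.linalg.inv(S)  # entries s_{nu, t}

def K_star(nu, u):
    # equivalent kernel of eq. (8.18)
    return sum(s[nu, t] * u**t * K(u) for t in range(p + 1))

# For symmetric K and p = 1 the odd moments vanish, so K_star(0, u) = K(u):
# in the interior, the local linear smoother is asymptotically equivalent
# to the ordinary kernel smoother.
```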

8.2.2 Derivative Estimation for the Marginal Effects

We now extend the marginal integration method to the estimation of derivatives of the functions $ g_\alpha(\bullet) $. For additive linear functions, the first derivatives are constants and all higher order derivatives vanish. In economics, however, the derivatives of the marginal effects are often of essential interest, e.g. for determining elasticities or returns to scale as in Example 8.5.

To estimate the derivatives of the additive components, we do not need any further extension of our method: using a local polynomial estimator of order $ p$ for the pre-estimator $ \widetilde{m}$ provides us simultaneously with estimates of the component functions and of their derivatives up to degree $ p$. The reason is that the optimal $ \beta_\nu$ in equation (8.16) is an estimate of $ g_\alpha^{(\nu)}(x_\alpha ) / \nu !$, provided that dimension $ \alpha$ is separable from the others. In the case of additivity this holds automatically.

Thus, we can use

$\displaystyle \widehat{g}_\alpha^{(\nu)} (x_\alpha) = \frac{\nu !}{n} \sum_{l=1}^n {\boldsymbol{e}}_\nu^\top \left({\mathbf{X}}^\top _\alpha {\mathbf{W}}_{l\alpha} {\mathbf{X}}_\alpha \right)^{-1} {\mathbf{X}}^\top _\alpha {\mathbf{W}}_{l\alpha} {\boldsymbol{Y}}$ (8.19)

for estimating the $ \nu$th derivative. Compare this with equation (8.17). Here, $ {\boldsymbol{e}}_\nu$ is the $ (\nu+1)$th unit vector, used to extract the estimate of $ \beta_\nu$. Asymptotic properties are derived in Severance-Lossin & Sperlich (1999). Recall that the integration estimator requires the choice of two bandwidths: $ h$ for the direction of interest and $ \widetilde{h}$ for the nuisance directions.
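Relative to the sketch following (8.17), only two details change: the $ (\nu+1)$th coefficient of the local polynomial fit is extracted instead of the intercept, and the result is rescaled by $ \nu !$. Again a sketch with illustrative choices (here $ p=2$ and $ \nu=1$, so that $ p-\nu$ is odd as required by Theorem 8.3 below):

```python
import numpy as np
from math import factorial

def quartic(u):
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def g_hat_alpha_deriv(x_alpha, alpha, X, Y, h, h_tilde, p=2, nu=1):
    # nu-th derivative of g_alpha via eq. (8.19)
    n, d = X.shape
    u = X[:, alpha] - x_alpha
    D = np.vander(u, p + 1, increasing=True)
    Kh = quartic(u / h) / h
    nuis = np.delete(X, alpha, axis=1)
    total = 0.0
    for l in range(n):
        KH = np.prod(quartic((nuis - nuis[l]) / h_tilde), axis=1)
        WD = D * (Kh * KH / n)[:, None]
        beta = np.linalg.solve(D.T @ WD, WD.T @ Y)
        total += beta[nu]                  # e_nu^T beta, the (nu+1)-th entry
    return factorial(nu) * total / n       # rescale: beta_nu estimates g^(nu)/nu!
```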

THEOREM 8.3  
Consider kernels $ K$ and $ {\mathcal{K}}$, where $ {\mathcal{K}}$ is a product of univariate kernels of order $ q>2$, and bandwidths $ h$, $ \widetilde{h}$ such that $ nh\widetilde{h}^{(d-1)}/\log^2(n) \to \infty $, $ \widetilde{h}^qh^{\nu-p-1}\to 0$ and $ h=h_0\, n^{-1/(2p+3)}$ for some constant $ h_0>0$. We assume that $ p-\nu $ is odd and that some regularity conditions hold. Then,

$\displaystyle n^{(p+1-\nu)/(2p+3)}\left\{ \widehat{g}_\alpha ^{(\nu)} (x_\alpha) - g_\alpha^{(\nu)} (x_\alpha) \right\} \mathrel{\mathop{\longrightarrow}\limits^{L}} N\left\{ b_\alpha (x_\alpha),v_\alpha (x_\alpha) \right\}, $

where

$\displaystyle b_\alpha (x_\alpha)=\frac{\nu ! \, h_0^{p+1-\nu }}{(p+1)!} \,\mu_{p+1}\left( K_\nu ^{\star}\right) g_\alpha ^{(p+1)}\left( x_\alpha \right)\,, $

and

$\displaystyle v_\alpha (x_\alpha)=\frac{(\nu !)^2}{h_0^{2\nu +1}}\left\Vert K_\nu^{\star}\right\Vert_2^2 \int \frac{\sigma^2\left( x_\alpha ,{\boldsymbol{x}}_{\underline{\alpha}} \right) f_{\underline{\alpha}}^2\left( {\boldsymbol{x}}_{\underline{\alpha}} \right)}{f\left( x_\alpha ,{\boldsymbol{x}}_{\underline{\alpha}} \right) } \,d{\boldsymbol{x}}_{\underline{\alpha}}. $
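To make the rate concrete, consider the leading case $ p=1$, $ \nu=0$, i.e. function estimation with a local linear pre-smoother. Since $ K^\star_0=K$ for a symmetric kernel, the theorem then specializes to

$\displaystyle n^{2/5}\left\{ \widehat{g}_\alpha (x_\alpha) - g_\alpha (x_\alpha) \right\} \mathrel{\mathop{\longrightarrow}\limits^{L}} N\left\{ \frac{h_0^2}{2}\,\mu_2(K)\, g_\alpha''(x_\alpha),\ \frac{\left\Vert K \right\Vert_2^2}{h_0} \int \frac{\sigma^2\left( x_\alpha ,{\boldsymbol{x}}_{\underline{\alpha}} \right) f_{\underline{\alpha}}^2\left( {\boldsymbol{x}}_{\underline{\alpha}} \right)}{f\left( x_\alpha ,{\boldsymbol{x}}_{\underline{\alpha}} \right)} \,d{\boldsymbol{x}}_{\underline{\alpha}} \right\}, $

so that each component is estimated at the univariate rate $ n^{-2/5}$, despite the $ d$-dimensional pre-smoothing step.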

Additionally, for the regression function estimate constructed from the additive component estimates $ \widehat{g}_\alpha$ and $ \widehat{c}$, i.e.

$\displaystyle \widehat{m}({\boldsymbol{x}}) = \widehat{c}+\sum^d_{\alpha=1} \widehat{g}_\alpha (x_\alpha)\,, $

we have:

THEOREM 8.4  
Under the same assumptions as in Theorem 8.3, it holds that

$\displaystyle n^{(p+1)/(2p+3)}\left\{ \widehat{m}\left({\boldsymbol{x}}\right) - m\left({\boldsymbol{x}}\right) \right\} \mathrel{\mathop{\longrightarrow}\limits^{L}} N\left\{ b({\boldsymbol{x}}),v({\boldsymbol{x}})\right\} \,, $

where $ {\boldsymbol{x}}=(x_1,\ldots,x_d)^\top$, $ b({\boldsymbol{x}})=\sum_{\alpha=1}^d b_\alpha \left( x_\alpha \right) $ and $ v({\boldsymbol{x}})=\sum_{\alpha =1}^d v_\alpha \left( x_\alpha \right) .$

In the following example we illustrate the smoothing properties of this estimator for $ \nu = 0$ and $ p=1$.

EXAMPLE 8.4  
Consider the same setup as in Example 8.1, but now with $ n=150$ observations. In Figure 8.4 we have plotted the true functions (at the corresponding observations $ X_{i\alpha}$) and the component functions estimated by marginal integration with local linear pre-smoothers. The bandwidths are $ h=1$ for the first and $ h=1.5$ for the other dimensions. Further, we set $ \widetilde{h}=3$ for all nuisance directions. We used the (product) Quartic kernel for all estimates. As in Example 8.1, the estimated curves match the underlying true curves almost perfectly. $ \Box$

Figure 8.4: Estimated local linear (solid line) versus true additive component functions (circles at the input values)
\includegraphics[width=1.4\defpicwidth]{SPMdemoi1.ps}

8.2.3 Interaction Terms

As pointed out before, marginal integration estimates marginal effects. These are identical to the additive components if the model is truly additive. But what happens if the underlying model is not purely additive? How do the estimators behave when there is some interaction between the explanatory variables, for example an additional term $ g_{\alpha j}(X_\alpha,X_j)$?

An obvious weakness of the truly additive model is that those interactions are completely ignored, and in certain econometric contexts -- production function modeling being one of them -- the absence of interaction terms has often been criticized. For that reason we will now extend the regression model by pairwise interactions resulting in

$\displaystyle m({\boldsymbol{x}})=c+\sum_{\alpha = 1}^d g_\alpha (x_\alpha )+\sum_{1\leq \alpha < j \leq d} g_{\alpha j}(x_\alpha ,x_j) \ .$ (8.20)

Here we use $ 1\le \alpha <j \le d$ to make sure that each pairwise interaction is included only once. In other words, we assume $ g_{\alpha j}= g_{j\alpha}$. In principle, we could also consider interaction terms of higher order than two, but this would make visualization and interpretation hardly possible. Furthermore, the advantage of avoiding the curse of dimensionality would be lost step by step. We therefore restrict ourselves to the case of only bivariate interactions.

For the marginal integration estimator bivariate interaction terms have been studied in Sperlich et al. (2002). They provide asymptotic properties and additionally introduce test procedures to check for significance of the interactions. In the following we will only sketch the construction of the relevant estimation procedure and its application. For the theoretical results we remark that they are higher dimensional extensions of Theorem 8.3 and refer to the above mentioned article.

For the estimation of (8.20) by marginal integration we have to extend our identification condition

$\displaystyle E g_\alpha (X_\alpha )=\int g_\alpha (x_\alpha ) f_\alpha (x_\alpha )\,dx_\alpha =0 \quad\textrm{ for all }\alpha,$ (8.21)

by further ones for the interaction terms:

$\displaystyle \int g_{\alpha j }(x_\alpha ,x_j ) f_\alpha (x_\alpha )\,dx_\alpha =\int g_{\alpha j }(x_\alpha ,x_j ) f_j (x_j )\,dx_j =0 \,,$ (8.22)

with $ f_\alpha (\bullet)$ and $ f_j (\bullet)$ denoting the marginal densities of $ X_\alpha$ and $ X_j$.

As before, equations (8.21) and (8.22) should not be considered as restrictions. It is always possible to shift the functions $ g_{\alpha}$ and $ g_{\alpha j}$ in the vertical direction without changing the functional forms or the overall regression function. Moreover, every model of the form (8.20) is equivalent to exactly one model satisfying (8.21) and (8.22).

Analogously to the definition of $ {\boldsymbol{X}}_{\underline{\alpha}}$, let $ {\boldsymbol{X}}_{\underline{\alpha j}}$ now denote the $ (d-2)$-dimensional random vector obtained by removing $ X_\alpha$ and $ X_j$ from $ {\boldsymbol{X}}=(X_1,\ldots ,X_d)^\top $. With some abuse of notation we write $ {\boldsymbol{X}}=(X_\alpha ,X_j ,{\boldsymbol{X}}_{\underline{\alpha j }})$ to highlight the directions in $ d$-dimensional space represented by the $ \alpha$ and $ j$ coordinates. We denote the marginal densities of $ X_\alpha$, $ {\boldsymbol{X}}_{\underline{\alpha j}}$ and $ {\boldsymbol{X}}$ by $ f_\alpha (x_\alpha )$, $ f_{\underline{\alpha j}}({\boldsymbol{x}}_{\underline{\alpha j}})$, and $ f ({\boldsymbol{x}})$, respectively.

Again consider marginal integration as used before

$\displaystyle \theta_\alpha (x_\alpha )=\int m(x_\alpha ,{\boldsymbol{x}}_{\underline{\alpha}}) f_{\underline{\alpha}}({\boldsymbol{x}}_{\underline{\alpha}})\,d{\boldsymbol{x}}_{\underline{\alpha}}\,, \quad 1\leq \alpha \leq d,$ (8.23)

and in addition

$\displaystyle \theta_{\alpha j }(x_\alpha ,x_j )=\int m(x_\alpha ,x_j ,{\boldsymbol{x}}_{\underline{\alpha j}}) f_{\underline{\alpha j}}({\boldsymbol{x}}_{\underline{\alpha j}})\,d{\boldsymbol{x}}_{\underline{\alpha j}}\,,$ (8.24)

$\displaystyle c_{\alpha j }=\int g_{\alpha j }(x_\alpha,x_j) f_{\alpha j}(x_\alpha,x_j)\,dx_\alpha\,dx_j $

for every pair $ 1\leq \alpha < j \leq d$, where $ f_{\alpha j}$ denotes the joint pdf of $ (X_\alpha ,X_j)$. It can be shown that

$\displaystyle \theta_{\alpha j }(x_\alpha ,x_j ) - \theta_\alpha (x_\alpha ) - \theta_j (x_j ) + \int m({\boldsymbol{x}}) f({\boldsymbol{x}})\,d{\boldsymbol{x}} = g_{\alpha j }(x_\alpha ,x_j )+c_{\alpha j }. $

Centering this function in an appropriate way would hence give us the interaction function of interest.
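This identity is easily verified by Monte Carlo for a toy model. The sketch below assumes a made-up model with three independent $ U[-1,1]$ regressors and interaction term $ g_{12}(x_1,x_2)=x_1x_2$, which satisfies (8.22) and has $ c_{12}=0$; the $ \theta$ functions are approximated by averaging over draws of the integrated-out coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# toy model: m(x) = c + g1(x1) + g2(x2) + g12(x1, x2), no x3 effect
def g12(x1, x2):
    return x1 * x2                       # satisfies (8.22) under U[-1, 1]

def m(x):
    return 1 + x[:, 0] + np.sin(x[:, 1]) + g12(x[:, 0], x[:, 1])

X = rng.uniform(-1, 1, size=(n, 3))
x1, x2 = 0.5, -0.3
c1, c2 = np.full(n, x1), np.full(n, x2)

theta_12 = np.mean(m(np.column_stack([c1, c2, X[:, 2]])))      # integrate out X3
theta_1  = np.mean(m(np.column_stack([c1, X[:, 1], X[:, 2]]))) # integrate out X2, X3
theta_2  = np.mean(m(np.column_stack([X[:, 0], c2, X[:, 2]]))) # integrate out X1, X3
Em       = np.mean(m(X))

# theta_12 - theta_1 - theta_2 + Em should equal g12(x1, x2) + c12 = -0.15
print(theta_12 - theta_1 - theta_2 + Em, g12(x1, x2))
```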

Using the same estimation procedure as described above, i.e., replacing the expectations by averages and the function $ m$ by an appropriate pre-estimator, we get estimates for $ g_{\alpha}$ and for the interaction terms $ g_{\alpha j}$. For ease of notation we give the formula only for $ p=1$, i.e., a local linear estimator in the pre-estimation step. We obtain

$\displaystyle \widehat{\{g_{\alpha j}+c_{\alpha j}\}} = \widehat{\theta}_{\alpha j} - \widehat{\theta}_{\alpha} - \widehat{\theta}_{ j} +\widehat{c},$ (8.25)

where

$\displaystyle \widehat{\theta}_{\alpha j} = \frac 1n \sum_{l=1}^n {\boldsymbol{e}}_0^\top \left({\mathbf{X}}_{\alpha j }^\top {\mathbf{W}}_{l\alpha j }{\mathbf{X}}_{\alpha j }\right)^{-1}{\mathbf{X}}_{\alpha j }^\top {\mathbf{W}}_{l\alpha j }{\boldsymbol{Y}}, $

and

$\displaystyle {\mathbf{W}}_{l\alpha j }=\textrm{diag}\left(\left\{ \frac 1n K_{\mathbf{h}}(X_{i\alpha }-x_\alpha ,X_{ij }-x_j )\, {\mathcal{K}}_{\mathbf{H}}({\boldsymbol{X}}_{i\underline{\alpha j}}-{\boldsymbol{x}}_{l\underline{\alpha j }})\right\} _{i=1,\ldots,n}\right), $

$\displaystyle {\mathbf{X}}_{\alpha j }=\left( \begin{array}{ccc} 1 & X_{1\alpha }-x_\alpha & X_{1j }-x_j \\ \vdots & \vdots & \vdots \\ 1 & X_{n\alpha }-x_\alpha & X_{nj }-x_j \end{array}\right)\,. $

This is a local linear estimator in the directions $ \alpha$ and $ j$ (with $ K_{\mathbf{h}}$ a scaled bivariate kernel) and a local constant one for the nuisance directions $ \underline{\alpha j}$. $ \widehat{\theta}_{\alpha}=\widehat{\{g_\alpha +c\}}$, $ \widehat{\theta}_{j}=\widehat{\{g_j +c\}}$ and $ \widehat{c}$ are exactly as defined above.
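A sketch of $ \widehat{\theta}_{\alpha j}$ along the lines of the earlier code; the bivariate product Quartic kernel for the directions of interest and the bandwidths are illustrative assumptions, and singular designs are again not handled:

```python
import numpy as np

def quartic(u):
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def theta_hat_alpha_j(x_a, x_j, a, j, X, Y, h, h_tilde):
    # local linear in directions (a, j), local constant in the others
    n, d = X.shape
    ua, uj = X[:, a] - x_a, X[:, j] - x_j
    D = np.column_stack([np.ones(n), ua, uj])         # design matrix X_{alpha j}
    Kh = quartic(ua / h) * quartic(uj / h) / h**2     # bivariate product kernel
    nuis = np.delete(X, [a, j], axis=1)               # nuisance coordinates
    total = 0.0
    for l in range(n):
        KH = np.prod(quartic((nuis - nuis[l]) / h_tilde), axis=1)
        WD = D * (Kh * KH / n)[:, None]               # W_{l alpha j} applied to design
        beta = np.linalg.solve(D.T @ WD, WD.T @ Y)    # weighted least squares
        total += beta[0]                              # e_0^T beta
    return total / n

# interaction estimate per (8.25), with c_hat = Y.mean():
# g_aj_hat = theta_hat_alpha_j(...) - theta_hat_a - theta_hat_j + Y.mean()
```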

Finally, let us turn to an example which demonstrates the application of marginal integration estimation and of derivative (elasticity) estimation, and which allows us to illustrate the use of interaction terms.

EXAMPLE 8.5  
Our illustration is based on the example and the data used in Severance-Lossin & Sperlich (1999), who investigated a production function for livestock in Wisconsin. The main interest lies in estimating the impact of the various regressors and their returns to scale, and hence in derivative estimation. Additionally, we check the additivity assumption by estimating the interaction terms $ g_{\alpha j}$.

We use a subset of $ n=250$ observations of an original data set of more than 1,000 Wisconsin farms collected by the Farm Credit Service of St. Paul, Minnesota in 1987. Severance-Lossin & Sperlich (1999) removed outliers and incomplete records and selected farms which only produced animal outputs. The data consist of farm level inputs and outputs measured in dollars. In more detail, output $ Y$ is livestock, and the input variables are

$ X_1$ family labor force,
$ X_2$ hired labor force,
$ X_3$ miscellaneous inputs (e.g. repairs, rent, custom hiring, supplies, insurance, gas),
$ X_4$ animal inputs (e.g. purchased feed, breeding, or veterinary services),
$ X_5$ intermediate run assets, i.e. assets with a useful life of one to ten years.
To get an idea of the distribution of the regressors, one could plot kernel density estimates for each of them. You would recognize that the regressors are approximately normally distributed, so that applying kernel smoothing methods should not cause serious numerical problems.

Figure 8.5: Function estimates for the additive components and observations (left), derivative estimates for the parametric (thin lines) and the nonparametric case (right), variables $ X_1$ to $ X_3$
\includegraphics[width=1.4\defpicwidth]{SPMfafam.ps} \includegraphics[width=1.4\defpicwidth]{SPMfahir.ps} \includegraphics[width=1.4\defpicwidth]{SPMfamis.ps}

Figure 8.6: Function estimate for the additive components and observations (left), derivative estimates for the parametric (thin lines) and the nonparametric case (right), variables $ X_4$ and $ X_5$
\includegraphics[width=1.43\defpicwidth]{SPMfaani.ps} \includegraphics[width=1.43\defpicwidth]{SPMfaass.ps}

A purely additive model (ignoring any possible interaction) is of the form

$\displaystyle \log \left( Y\right) =c +\sum\limits_{\alpha =1}^d g_\alpha \left\{ \log (X_\alpha) \right\} +\varepsilon\,.$ (8.26)

This model can be viewed as a generalization of the Cobb-Douglas production function, for which $ g_\alpha \left\{ \log (X_\alpha) \right\} = \beta_\alpha \log (X_\alpha )$ (see (5.4)). Additionally, we allow for the inclusion of interaction terms $ g_{\alpha j}$ and obtain

$\displaystyle \log \left( Y\right) =c + \sum\limits_{\alpha =1}^d g_\alpha \left\{ \log (X_\alpha) \right\} +\sum\limits_{1\leq \alpha < j \leq d} g_{\alpha j} \left\{ \log (X_\alpha) ,\log (X_j) \right\} +\varepsilon\,.$ (8.27)

The important point to understand about marginal integration is that the estimation of the one-dimensional component functions is not affected by the inclusion of interaction terms. That is, whether we estimate model (8.26) or model (8.27) does not change the results for the estimation of the marginal functions $ g_\alpha$.

The results are given in Figures 8.5 and 8.6. We use (product) Quartic kernels for all dimensions and bandwidths proportional to the standard deviation of $ X_\alpha$ ($ h_\alpha=1.5\,\sigma_\alpha$, $ \widetilde{h}_\alpha=4\,h_\alpha$). It is known that the integration estimator is quite robust against different choices of bandwidths, see e.g. Sperlich et al. (1999).

To highlight the shape of the estimates we display the main part of the point clouds together with the function estimates. The graphs give some indication of nonlinearity, in particular for $ X_1$, $ X_2$ and $ X_5$. The derivatives seem to indicate that the elasticities for these inputs increase, which could eventually lead to increasing returns to scale. Note that for all dimensions (especially where the mass of the observations is located) the nonparametric results differ considerably from the parametric case. An obvious conclusion from the economic point of view is, for instance, that larger farms are more productive (intuitively quite reasonable).

Figure 8.7: Estimates for interaction terms for Wisconsin farm data
\includegraphics[width=1.45\defpicwidth]{SPMinter11.ps}

Figure 8.8: Estimates for interaction terms for Wisconsin farm data
\includegraphics[width=1.45\defpicwidth]{SPMinter12.ps}

In Figures 8.7 and 8.8 we present the estimates of the bivariate interaction terms $ g_{\alpha j}$. For their estimation and graphical presentation we trimmed the data by removing the $ 2\%$ most extreme observations. Again Quartic kernels were used, here with bandwidths $ h_\alpha = 1.7 \,\sigma_\alpha$ and $ \widetilde{h}_\alpha$ as above.

Interaction terms like these are often hard to interpret. But as long as the relationships can be visualized, a careful interpretation can be attempted. Sperlich et al. (2002) find that a weak form of interaction is present. The variable $ X_1$ (family labor) plays an important role in the interactions; in particular $ g_{1,3}$ (family labor and miscellaneous inputs), $ g_{1,5}$ (family labor and intermediate run assets) and also $ g_{3,5}$ (miscellaneous inputs and intermediate run assets) should be taken into account. $ \Box$