3.3 Partially Linear EIV Models


  sf = eivplmnor(w, t, y, sigma, h)
computes statistical characteristics for partially linear EIV models

Partially linear EIV models relate a response $Y$ to predictors $(X,T)$ through the mean function $X^T\beta+g(T)$, where the regressors $X$ are measured with additive errors, that is,

$\displaystyle Y = X^T\beta+g(T)+\varepsilon,$
$\displaystyle W = X+U, \qquad (3.8)$

where the variable $U$ is independent of $(Y, X, T)$ with mean zero and $Var(U)=\Sigma_{uu}$, $E(\varepsilon\vert X, T)=0$, and $E(\varepsilon^2\vert X,T)=\sigma^2(X,T)<\infty$.

Here we state only the main results; the proofs and related discussion can be found in Liang, Härdle, and Carroll (1999).


3.3.1 Known Error Variance

In EIV linear regression, inconsistency caused by the measurement error can be overcome by applying the so-called correction for attenuation. In our context, this suggests that we use the estimator

$\displaystyle \widehat{\beta}_n = ({\widetilde W}^T{\widetilde W}-n\,\Sigma_{uu})^{-1}{\widetilde W}^T{\widetilde Y}. \qquad (3.9)$

Here $\widetilde W$ and $\widetilde Y$ collect the partial residuals $\widetilde W_i = W_i - \widehat{g}_{w,h}(T_i)$ and $\widetilde Y_i = Y_i - \widehat{g}_{y,h}(T_i)$, where $\widehat{g}_{w,h}$ and $\widehat{g}_{y,h}$ denote kernel regressions of $W$ and $Y$ on $T$.

In some cases, we assume that the model errors $ \varepsilon _i$ are homoscedastic with common variance $ \sigma^2$. In this event, since $ E\{Y_i-X_i^T\beta-g(T_i)\}^2=\sigma^2$ and $ E\{Y_i-W_i^T\beta-g(T_i)\}^2=E\{Y_i-X_i^T\beta-g(T_i)\}^2
+\beta^T\Sigma_{uu}\beta$, we define

$\displaystyle \widehat{\sigma}_n^2 = n^{-1}\sum_{i=1}^n (\widetilde Y_i-\widetilde W_i^T\widehat{\beta}_n)^2 - \widehat{\beta}_n^T\Sigma_{uu}\widehat{\beta}_n \qquad (3.10)$

as the estimator of $ \sigma^2$.
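To see the correction at work, here is a minimal numerical sketch in Python/NumPy (our own illustration, outside XploRe; the data-generating process, the Gaussian Nadaraya-Watson smoother, and all names are ours). It simulates model (3.8), partials $T$ out of $W$ and $Y$ by kernel regression, and compares the naive least squares estimate with the corrected estimators (3.9) and (3.10):

  import numpy as np

  rng = np.random.default_rng(0)
  n, p = 2000, 2
  beta = np.array([1.0, 2.0])
  Sigma_uu = 0.25 * np.eye(p)                    # known error covariance

  T = rng.uniform(-1, 1, n)
  X = rng.uniform(-1, 1, (n, p)) + 0.3 * T[:, None]   # X depends on T
  Y = X @ beta + np.cos(np.pi * T) + 0.5 * rng.standard_normal(n)
  W = X + rng.multivariate_normal(np.zeros(p), Sigma_uu, n)

  def nw_smooth(t, V, h):
      # Nadaraya-Watson regression of the columns of V on t (Gaussian kernel)
      K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
      return (K @ V) / K.sum(axis=1, keepdims=True)

  h = 0.15
  W_t = W - nw_smooth(T, W, h)                   # the "tilde" residuals
  Y_t = Y - nw_smooth(T, Y[:, None], h).ravel()

  beta_naive = np.linalg.solve(W_t.T @ W_t, W_t.T @ Y_t)   # attenuated
  beta_hat = np.linalg.solve(W_t.T @ W_t - n * Sigma_uu,   # estimator (3.9)
                             W_t.T @ Y_t)
  sigma2_hat = (np.mean((Y_t - W_t @ beta_hat) ** 2)       # estimator (3.10)
                - beta_hat @ Sigma_uu @ beta_hat)
  print(beta_naive, beta_hat, sigma2_hat)

With measurement error this large, the naive estimate is visibly attenuated toward zero, while (3.9) recovers $\beta$ up to sampling noise.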

THEOREM 3.1   Suppose that certain conditions hold and $E(\varepsilon^4+\Vert U\Vert^4)<\infty$. Then $\widehat{\beta}_n$ is asymptotically normal, i.e.,

$\displaystyle n^{1/2}(\widehat{\beta}_n-\beta)\stackrel{\cal L}{\longrightarrow} N(0, \Sigma^{-1}\Gamma\Sigma^{-1}),$

where $\Sigma=E\{X-E(X\vert T)\}^{\otimes2}$, $\Gamma=E[(\varepsilon-U^T\beta)\{X-E(X\vert T)\}]^{\otimes2}+E\{(UU^T-\Sigma_{uu})\beta\}^{\otimes2}+E(UU^T\varepsilon^2)$, and $A^{\otimes2}=A\cdot A^T$. Note that $\Gamma=E(\varepsilon-U^T\beta)^2\,\Sigma+E\{(UU^T-\Sigma_{uu})\beta\}^{\otimes2}+\Sigma_{uu}\sigma^2$ if $\varepsilon$ is homoscedastic and independent of $(X,T)$.
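The homoscedastic form of $\Gamma$ follows from two factorizations: if $\varepsilon$ is independent of $(X,T)$, then $(\varepsilon, U)$ is independent of $X-E(X\vert T)$, so

$\displaystyle E[(\varepsilon-U^T\beta)\{X-E(X\vert T)\}]^{\otimes2}=E(\varepsilon-U^T\beta)^2\,\Sigma \qquad\hbox{and}\qquad E(UU^T\varepsilon^2)=E(UU^T)\,E(\varepsilon^2)=\Sigma_{uu}\sigma^2.$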

THEOREM 3.2   Under the same conditions as Theorem 3.1, if the $\varepsilon$'s are homoscedastic with variance $\sigma^2$ and independent of $(X,T)$, then

$\displaystyle n^{1/2}(\widehat{\sigma}_n^2-\sigma^2)\stackrel{\cal L}{\longrightarrow} N(0, \sigma_*^2),$

where $\sigma_*^2=E\{(\varepsilon-U^T\beta)^2-(\beta^T\Sigma_{uu}\beta+\sigma^2)\}^2$.
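As a worked special case (ours, with the extra assumption that $\varepsilon$ and $U$ are jointly normal): then $\varepsilon-U^T\beta\sim N(0,\sigma^2+\beta^T\Sigma_{uu}\beta)$, and since the square of a $N(0,\tau^2)$ variable has variance $2\tau^4$,

$\displaystyle \sigma_*^2=\hbox{Var}\{(\varepsilon-U^T\beta)^2\}=2\,(\sigma^2+\beta^T\Sigma_{uu}\beta)^2.$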


3.3.2 Unknown Error Variance

The technique of partial replication is adopted when $\Sigma_{uu}$ is unknown and must be estimated. That is, we observe $W_{ij} = X_i + U_{ij}$, $j=1,\ldots,m_i$.

We consider here only the usual case that $ m_i \leq 2$, and assume that a fraction $ \delta$ of the data has such replicates. Let $ \overline {W}_i$ be the sample mean of the replicates. Then a consistent, unbiased method of moments estimate for $ \Sigma_{uu}$ is

$\displaystyle \widehat \Sigma_{uu}=\frac{\sum_{i=1}^n\sum_{j=1}^{m_i}(W_{ij}-\overline W_i)^{\otimes2}}
{\sum_{i=1}^n(m_i-1)}.$      

The estimator changes only slightly to accommodate the replicates, becoming

$\displaystyle \widehat{\beta}_n = \left[\sum_{i=1}^n \left\{\overline{W}_i - \widehat{g}_{w,h}(T_i)\right\}^{\otimes2} - n(1-\delta/2)\widehat{\Sigma}_{uu}\right]^{-1} \sum_{i=1}^n \left\{\overline{W}_i - \widehat{g}_{w,h}(T_i)\right\} \left\{Y_i - \widehat{g}_{y,h}(T_i)\right\}, \qquad (3.11)$

where $\widehat{g}_{w,h}(\cdot)$ is the kernel regression of the $\overline{W}_i$'s on $T_i$, and $\widehat{g}_{y,h}(\cdot)$ is defined analogously with the $Y_i$'s.
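Continuing the Python sketch from Section 3.3.1 (again our own illustration, not the XploRe implementation), the following generates duplicates for a fraction $\delta$ of the sample, forms the moment estimator $\widehat\Sigma_{uu}$, and applies (3.11):

  import numpy as np

  rng = np.random.default_rng(1)
  n, p, delta = 2000, 2, 0.5
  beta = np.array([1.0, 2.0])
  Sigma_uu = 0.25 * np.eye(p)

  T = rng.uniform(-1, 1, n)
  X = rng.uniform(-1, 1, (n, p)) + 0.3 * T[:, None]
  Y = X @ beta + np.cos(np.pi * T) + 0.5 * rng.standard_normal(n)

  two = rng.uniform(size=n) < delta           # which i have m_i = 2
  U1 = rng.multivariate_normal(np.zeros(p), Sigma_uu, n)
  U2 = rng.multivariate_normal(np.zeros(p), Sigma_uu, n)
  W1, W2 = X + U1, X + U2
  Wbar = np.where(two[:, None], (W1 + W2) / 2, W1)   # replicate means

  # moment estimator of Sigma_uu from the duplicated observations:
  D = (W1 - W2)[two]               # differences have covariance 2*Sigma_uu
  Sigma_hat = (D.T @ D) / (2 * two.sum())

  def nw_smooth(t, V, h):
      K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
      return (K @ V) / K.sum(axis=1, keepdims=True)

  h = 0.15
  Wb_t = Wbar - nw_smooth(T, Wbar, h)
  Y_t = Y - nw_smooth(T, Y[:, None], h).ravel()
  d = two.mean()                   # observed fraction with replicates
  beta_hat = np.linalg.solve(Wb_t.T @ Wb_t - n * (1 - d / 2) * Sigma_hat,
                             Wb_t.T @ Y_t)   # estimator (3.11)
  print(Sigma_hat, beta_hat)

The factor $1-\delta/2$ appears because the average measurement error covariance of $\overline{W}_i$ is $(1-\delta)\Sigma_{uu}+\delta\Sigma_{uu}/2$.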

The limit distribution of (3.11) is $ \hbox{N}(0, \Sigma ^{-1}\Gamma_2 \Sigma ^{-1})$, with

$\displaystyle \Gamma_2 = (1-\delta)\,E\left[(\varepsilon-U^T\beta)\{X-E(X\vert T)\}\right]^{\otimes2} + \delta\,E\left[(\varepsilon-\overline{U}^T\beta)\{X-E(X\vert T)\}\right]^{\otimes2}$
$\displaystyle \phantom{\Gamma_2 =} + (1-\delta)\,E\left(\bigl[\{UU^T-(1-\delta/2)\Sigma_{uu}\}\beta\bigr]^{\otimes2}+UU^T\varepsilon^2\right)$
$\displaystyle \phantom{\Gamma_2 =} + \delta\,E\left(\bigl[\{\overline{U}\,\overline{U}^T-(1-\delta/2)\Sigma_{uu}\}\beta\bigr]^{\otimes2}+\overline{U}\,\overline{U}^T\varepsilon^2\right). \qquad (3.12)$

In (3.12), $\overline{U}$ denotes the mean of the two $U$'s. If $\varepsilon$ is independent of $(X,T)$, the sum of the first two terms simplifies to $\{\sigma^2+(1-\delta/2)\,\beta^T\Sigma_{uu}\beta\}\,\Sigma$.
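This simplification is a direct computation: $\hbox{Var}(\overline{U})=\Sigma_{uu}/2$, so with $\varepsilon$ independent of $(X,T)$ each of the first two terms factorizes into a scalar times $\Sigma$, and

$\displaystyle (1-\delta)E(\varepsilon-U^T\beta)^2+\delta E(\varepsilon-\overline{U}^T\beta)^2 =(1-\delta)(\sigma^2+\beta^T\Sigma_{uu}\beta)+\delta(\sigma^2+\tfrac12\beta^T\Sigma_{uu}\beta) =\sigma^2+(1-\delta/2)\beta^T\Sigma_{uu}\beta.$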


3.3.3 XploRe Calculation and Practical Data

The quantlet eivplmnor estimates the parameters of a partially linear EIV model under the assumption that the conditional distribution of $Y$ given $X$ and $T$ is normal. We illustrate it with the following example:

  library("xplore")
  library("eiv")
  n = 100
  randomize(n)
  sigma = 0.0081
  b = 1|2
  p = rows(b)
  x = 2.*uniform(n,p)-1        ; latent variable
  t = sort(2.*uniform(n)-1,1)  ; observable variable
  w = x+sqrt(sigma)*normal(n,p) ; manifest variable (zero-mean error)
  m = 0.5*cos(pi.*t)+0.5*t
  y = x*b+m+normal(n)./2
  h=0.5
  sf = eivplmnor(w,t,y,sigma,h)
  b~sf.b                       ; true b beside estimated b
  dds = createdisplay(1,1)
  datah1=t~m
  datah2=t~sf.m
  part=grid(1,1,rows(t))'
  setmaskp(datah1,1,0,1)
  setmaskp(datah2,4,0,3)
  setmaskl(datah1,part,1,1,1)
  setmaskl(datah2,part,4,1,3)
  show(dds,1,1,datah1,datah2)
XAGeiv11.xpl

A partially linear fit for $E(y\vert x,t)$ is computed. sf.b contains the coefficients of the linear part, and sf.m contains the estimated nonparametric part evaluated at the observations t; see Figure 3.7, where the thin curve represents the true function and the thick one the nonparametric estimate.

Figure 3.7: Output display for partially linear EIV example
\includegraphics[scale=0.6]{eivplmnortu}

We now use the quantlet eivplmnor to analyze data from the Framingham heart study. In this data set, the response variable $Y$ is the average blood pressure over a fixed two-year period, $T$ is age, and $W$ is the logarithm of the observed cholesterol level, for which there are two replicates.

For the purpose of illustration, we use only the first cholesterol measurement. The measurement error variance was obtained from a previous analysis. The estimate of $\beta$ is 9.438 with standard error $0.187$. For the nonparametric part, we choose the bandwidth by cross-validation of the predicted response: we compute the squared prediction error over a geometric sequence of 191 bandwidths in $[1, 20]$ and select the one minimizing this error. An analysis ignoring measurement error found some curvature in $T$; see Figure 3.8 for the estimate of $g(T)$.
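The bandwidth search can be sketched as follows (illustrative Python, not the code behind Figure 3.8; we substitute a leave-one-out criterion on a toy kernel smoother, and the data and all names are ours):

  import numpy as np

  def loo_cv_score(t, y, h):
      # leave-one-out squared prediction error of a Nadaraya-Watson fit
      K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
      np.fill_diagonal(K, 0.0)     # exclude the i-th point from its own fit
      pred = (K @ y) / K.sum(axis=1)
      return np.mean((y - pred) ** 2)

  bandwidths = np.geomspace(1.0, 20.0, 191)  # geometric sequence of 191 candidates

  rng = np.random.default_rng(2)             # toy stand-in for (age, response)
  t = np.sort(rng.uniform(30, 65, 200))
  y = 0.02 * (t - 45) ** 2 + rng.standard_normal(200)

  scores = [loo_cv_score(t, y, h) for h in bandwidths]
  print(bandwidths[int(np.argmin(scores))])  # selected bandwidth

np.geomspace produces the geometric grid of 191 candidates over $[1, 20]$, and the selected bandwidth minimizes the cross-validated squared error, mirroring the selection rule described above.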

Figure 3.8: Framingham data study
\includegraphics[scale=0.6]{hua02tu}