3.2 Nonlinear EIV Models


res = reca(y, w, z, su2)
    implementation of regression calibration
res = simex(y, w, z, su2, lam, b)
    implementation of simulation extrapolation

When the relationship between the response and the covariates is nonlinear and the covariates are measured with error, the models are called nonlinear EIV models. There is an extensive literature on nonlinear EIV models (the monograph by Carroll, Ruppert, and Stefanski (1995) gives a good overview of the nonlinear methods). In this section we describe two simple approximate techniques for handling measurement error in the analysis of nonlinear EIV models. The presentation here is based on Carroll, Ruppert, and Stefanski (1995).

We denote the dependent variable by $ Y$, the variables observed with error by $ X$, the variables measured without error by $ Z$, and the manifest variable by $ W$. We define a nonlinear errors-in-variables model as:

$\displaystyle E(Y\vert X) = g(X), \qquad W = X + U.$ (3.7)

Two classes of nonlinear EIV models are considered:

From the viewpoint of the measurement error structure, the usual model is restricted to the classical additive measurement error model:
$\displaystyle W = X + U \textrm{ with } E(U\vert X, Z) = 0.$

In the controlled variable (Berkson) model, the measurement error model has the form:
$\displaystyle X = W + U' \textrm{ with } E(U'\vert W) = 0.$
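To make the distinction concrete, the following short simulation sketches both error structures. It is written in Python rather than XploRe, and the sample size, means, and error variance are arbitrary choices for illustration only.

  import numpy as np

  # Illustrative simulation of the two error structures (normal errors assumed).
  rng = np.random.default_rng(0)
  n, su = 500, 0.5

  # Classical additive error: W = X + U with E(U|X,Z) = 0
  x = rng.normal(1.0, 1.0, n)              # true, unobserved covariate
  w_classical = x + rng.normal(0.0, su, n) # observed surrogate

  # Controlled variable (Berkson) error: X = W + U' with E(U'|W) = 0
  w = rng.normal(1.0, 1.0, n)              # observed / controlled value
  x_berkson = w + rng.normal(0.0, su, n)   # true covariate scatters around W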

The example considered in this section is an occupational study on the relationship between dust concentration and chronic bronchitis. In the study, $ N= 499$ workers of a cement plant in Heidelberg were observed from 1960 to 1977. The response $ Y$ is the appearance of chronic bronchitis, and the correctly measured covariates $ Z$ are smoking and duration of exposure. The effect of the dust concentration in the individual working area $ X$ is of primary interest in the study. This concentration was measured several times in a certain time period and averaged, leading to the surrogate $ W$ for the concentration.

Ignoring the measurement error, we fit a logistic regression with the response chronic bronchitis and the regressors log(1 + dust concentration), duration of exposure (in years), and smoking. The calculations were carried out in XploRe with the following commands:

  dat = read("heid.dat")
  y   = dat[,1]
  w   = dat[,2]
  z   = dat[,3]
  library("glm")
  doglm(w~z, y)

In interactive modeling, the binomial distribution and the logistic link have to be chosen for the GLM. The output table from XploRe for the logistic model is given in Figure 3.5.

Figure 3.5: XploRe output for the Heidelberg data
\includegraphics[scale=0.6]{nonlineartu1}
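For readers who want to reproduce the naive fit outside XploRe, the following Python sketch fits the same logistic model with statsmodels. It assumes that heid.dat is available as a plain whitespace-separated text file with the columns $ y$, $ w$, $ z$; this is an assumption about the data format, not a statement about the original file.

  import numpy as np
  import statsmodels.api as sm

  # Naive logistic regression ignoring the measurement error in w
  dat = np.loadtxt("heid.dat")            # assumed text export with columns y, w, z
  y, w, z = dat[:, 0], dat[:, 1], dat[:, 2]

  X = sm.add_constant(np.column_stack([w, z]))   # intercept, surrogate w, covariate z
  naive = sm.GLM(y, X, family=sm.families.Binomial()).fit()
  print(naive.summary())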


3.2.1 Regression Calibration

Regression calibration was suggested as a general approach by Carroll and Stefanski (1990) and Gleser (1992). The idea of this method is to replace the unobserved $ X$ by its expected value $ E(X\vert W, Z)$ and then to perform a standard analysis, the latent variable $ X$ being approximated by the regression $ E(X\vert W, Z)$. The corresponding XploRe quantlet is called reca and is discussed below.
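Before turning to the quantlet, it helps to recall what the calibration function looks like in the simplest setting. Assuming the classical additive error model with no error-free covariates $ Z$ and jointly normal $ (X, U)$, a standard calculation gives

$\displaystyle E(X\vert W) = \mu_X + \frac{\sigma_X^2}{\sigma_X^2+\sigma_u^2}\,(W-\mu_X), \qquad \sigma_X^2 = \sigma_W^2 - \sigma_u^2,$

so regression calibration shrinks the observed $ W$ toward its mean by the reliability ratio before the standard analysis is run; with error-free covariates $ Z$ present, the regression $ E(X\vert W, Z)$ is used instead.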

  res = reca(y, w, z, su2)

Input parameters:

y
$ n \times 1 $ matrix, the response variable,
w
$ n \times 1 $ matrix, the covariate measured with error (the surrogate),
z
$ n \times 1 $ matrix, the correctly measured covariate,
su2
scalar, the variance of measurement error.
Output:
res.beta
vector, the estimated coefficients,
res.bv
matrix, the covariance matrix of the estimate.

We give an example to illustrate this quantlet. Let us return to the Heidelberg data.

  library("xplore")
  library("eiv")
  v=read("heid.dat")
  y=v[,1]
  w=v[,2]
  z=v[,3]
  su2=var(w)/4
  res=reca(y,w,z,su2)
XAGeiv09.xpl

The estimate of the slope parameter for the dust concentration is $ 2.9193$ with standard error $ 0.9603$, compared to the naive estimate $ 2.54428$ (s.e. $ 0.8641$). Here, the shape of the curve is similar to that obtained with the naive model. The quantlet reca uses the binomial distribution with the logistic link. Notice that reca also calls the interactive quantlet doglm, which produces the graphical output shown in Figure 3.6.

Figure 3.6: RECA estimation
\includegraphics[scale=0.55]{eivrecanew}
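The computation above is done entirely by the reca quantlet. Purely for illustration, the calibration step for this example can also be sketched in Python under the same assumptions (classical normal error, $ \sigma_u^2 = \mathrm{var}(W)/4$, and a plain-text export of heid.dat). The sketch ignores $ Z$ in the calibration step and does not correct the standard errors for the calibration, so it is not a substitute for the quantlet.

  import numpy as np
  import statsmodels.api as sm

  # Regression calibration sketch (illustrative; not the reca quantlet)
  dat = np.loadtxt("heid.dat")                  # assumed text export: y, w, z
  y, w, z = dat[:, 0], dat[:, 1], dat[:, 2]

  su2 = np.var(w, ddof=1) / 4                   # assumed measurement error variance
  sx2 = np.var(w, ddof=1) - su2                 # var(X) = var(W) - var(U)

  # Calibrated covariate E(X|W) under joint normality, ignoring Z for simplicity
  x_hat = w.mean() + (sx2 / (sx2 + su2)) * (w - w.mean())

  # Standard logistic fit with X replaced by its calibrated value
  X = sm.add_constant(np.column_stack([x_hat, z]))
  rc_fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
  print(rc_fit.params)                          # s.e. not corrected for calibration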


3.2.2 Simulation Extrapolation

Simulation extrapolation is a complementary approximate method that shares the simplicity of regression calibration and is well suited to problems with additive measurement error. It is a simulation-based method for estimating and reducing the bias due to measurement error. The estimates are obtained by adding additional measurement error to the data in a resampling-like stage, establishing a trend in the estimates as a function of the added measurement error, and extrapolating this trend back to the case of no measurement error. For a detailed explanation of this method, see Carroll, Ruppert, and Stefanski (1995). The quantlet simex implements the calculation in XploRe. Its syntax is

  library("eiv")
  gest = simex(y,w,z,su2,lam,b)

where the input parameters are:

y
$ n \times 1 $ matrix, the response variable,
w
$ n \times 1 $ matrix, the covariate measured with error (the surrogate),
z
$ n \times 1 $ matrix, the correctly measured covariate,
su2
scalar, the variance of the measurement error,
lam
vector, the values of the pseudo-parameter used to generate the pseudo-errors,
b
scalar, the number of replications in each simulation.
The output is the list variable gest containing:
gest.simexl
the estimate based on the linear extrapolant function,
gest.simexq
the estimate based on the quadratic extrapolant function.

Consider the Heidelberg data again.

  library("xplore")  
  library("eiv")     
  V=read("heid.dat") 
  y=V[,1]            
  w=V[,2]            
  z=V[,3]            
  sw2=var(w)         
  su2=sw2/4          
  lam=aseq(0.01,6,0.5)
  B=20                       
  gest=simex(y,w,z,su2,lam,B) 
  gest
XAGeiv10.xpl

As before, we assume that the measurement error is normal with variance $ \sigma_u^2 = 0.25\,\sigma_w^2$. The results for $ \hat{\beta}_{\mathrm{SIMEX}}$ were $ 2.8109$ (linear extrapolant) and $ 3.0051$ (quadratic extrapolant).
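To clarify what the quantlet computes, here is a minimal SIMEX sketch in Python under the same assumptions as above (classical normal error with variance su2, a plain-text export of heid.dat, and simple linear and quadratic extrapolants). It illustrates the idea only and is not a reimplementation of simex.

  import numpy as np
  import statsmodels.api as sm

  # SIMEX sketch: add pseudo-errors with variance lam*su2, refit the naive model,
  # average over B replications, and extrapolate the slope back to lam = -1.
  dat = np.loadtxt("heid.dat")                  # assumed text export: y, w, z
  y, w, z = dat[:, 0], dat[:, 1], dat[:, 2]
  su2 = np.var(w, ddof=1) / 4
  lam = np.arange(0.01, 2.6, 0.5)               # illustrative grid of pseudo-error levels
  B = 20                                        # replications per grid point
  rng = np.random.default_rng(0)

  def naive_slope(wstar):
      # naive logistic fit with the contaminated surrogate; slope of wstar
      X = sm.add_constant(np.column_stack([wstar, z]))
      return sm.GLM(y, X, family=sm.families.Binomial()).fit().params[1]

  means = [np.mean([naive_slope(w + np.sqrt(l * su2) * rng.standard_normal(len(w)))
                    for _ in range(B)]) for l in lam]

  # Extrapolate to lam = -1, the case of no measurement error
  beta_lin  = np.polyval(np.polyfit(lam, means, 1), -1.0)
  beta_quad = np.polyval(np.polyfit(lam, means, 2), -1.0)
  print(beta_lin, beta_quad)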