When the relationship between the response and the covariates is nonlinear and the covariates are measured with error, the models are called nonlinear EIV models. There is an extensive literature on nonlinear EIV models; the monograph by Carroll, Ruppert, and Stefanski (1995) gives a good overview of the available methods. In this section we describe two simple approximate techniques for handling measurement error in nonlinear EIV models: regression calibration and simulation extrapolation. The presentation is based on Carroll, Ruppert, and Stefanski (1995).
We denote the dependent variable by $Y$, the variables observed with error by $X$, the variables measured without error by $Z$, and the manifest variable by $W$. We define a nonlinear errors-in-variables model as:
Two classes of nonlinear EIV models are considered:
The example considered in this section is an occupational study on the relationship between dust concentration and chronic bronchitis. In the study, workers of a cement plant in Heidelberg were observed from 1960 to 1977. The response $Y$ is the appearance of chronic bronchitis, and the correctly measured covariates $Z$ are smoking and duration of exposure. The effect of the dust concentration in the individual working area, $X$, is of primary interest in the study. This concentration was measured several times in a certain time period and averaged, leading to the surrogate $W$ for the concentration.
Ignoring the measurement error (ME), we fitted a logistic regression with chronic bronchitis as the response and log(1 + dust concentration), duration (in years), and smoking as regressors. The calculations were carried out in XploRe with the following commands:
dat = read("heid.dat")
y = dat[,1]
w = dat[,2]
z = dat[,3]
library("glm")
doglm(w~z y)
In interactive modeling, the binomial distribution and the logistic link have to be chosen for the GLM. The output table from XploRe for the logistic model is given in Figure 3.5.
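For orientation, the naive model fitted above can be written out as follows. The coefficient names $\beta_0,\dots,\beta_3$ and the labels for the regressors are our own notation and do not come from the XploRe output. $Y$ stands for the bronchitis indicator, and the measured (surrogate) dust concentration is used as if it were the true concentration.

```latex
\[
  \operatorname{logit} P(Y = 1 \mid \text{dust}, \text{duration}, \text{smoking})
  = \beta_0
  + \beta_1 \log(1 + \text{dust})
  + \beta_2\, \text{duration}
  + \beta_3\, \text{smoking},
  \qquad
  \operatorname{logit}(p) = \log\frac{p}{1-p}.
\]
```

The binomial distribution and the logistic link selected in the interactive dialog correspond to exactly this specification.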
Regression calibration was suggested as a general approach by Carroll and Stefanski (1990) and Gleser (1992). The idea of this method is to replace the unobserved $X$ by its expected value $E(X \mid Z, W)$ and then to perform a standard analysis, since the latent variable $X$ is approximated by the regression $E(X \mid Z, W)$. The corresponding XploRe quantlet is called reca and is discussed below.
res = reca(y, w, z, su2)
Input parameters:
We give an example to explain this code. Let's come back to the Heidelberg data.
library("xplore")
library("eiv")
v=read("heid.dat")
y=v[,1]
w=v[,2]
z=v[,3]
su2=var(w)/4
res=reca(y,w,z,su2)
The estimate of the slope parameter of the dust concentration is
with standard error
, compared to the naive estimates
(s.e. 0.8641).
Here, the shape of the curve is similar to that obtained by the naive model.
The quantlet reca uses the binomial distribution with the logistic link. Note that reca also calls the interactive quantlet doglm, which produces the graphical output given in Figure 3.6.
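The substitution step behind regression calibration can be illustrated with a small simulation. The following Python sketch is not the reca implementation. It assumes a single error-prone covariate, no error-free covariates, and classical additive error W = X + U with known error variance su2, so that the best linear predictor of X given W is mean(W) + rel * (W - mean(W)) with reliability ratio rel = (Var(W) - su2) / Var(W). The helper names fit_logistic and calibrate, the simulated data, and the true coefficients are all hypothetical.

```python
import math
import random
import statistics


def fit_logistic(x, y, iters=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            v = p * (1.0 - p)
            g0 += yi - p            # score, intercept component
            g1 += (yi - p) * xi     # score, slope component
            h00 += v                # entries of the information matrix
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def calibrate(w, su2):
    """Best linear predictor of X given W, assuming W = X + U, Var(U) = su2."""
    mw = statistics.mean(w)
    sw2 = statistics.variance(w)
    rel = (sw2 - su2) / sw2  # reliability ratio (assumed known error variance)
    return [mw + rel * (wi - mw) for wi in w]


# Simulated data (hypothetical): Var(X) = 1 and su2 = 1/3, so that
# su2 is about Var(W)/4, mirroring the choice su2 = var(w)/4 in the text.
random.seed(1)
n, beta0, beta1, su2 = 4000, -1.0, 1.0, 1.0 / 3.0
x = [random.gauss(0.0, 1.0) for _ in range(n)]
w = [xi + random.gauss(0.0, math.sqrt(su2)) for xi in x]
y = [1 if random.random() < 1.0 / (1.0 + math.exp(-(beta0 + beta1 * xi))) else 0
     for xi in x]

naive = fit_logistic(w, y)                # ignores the measurement error
rc = fit_logistic(calibrate(w, su2), y)   # regression calibration
print("naive slope:", round(naive[1], 3))
print("calibrated slope:", round(rc[1], 3))
```

Because the calibrated covariate is here a linear function of W, the calibrated slope equals the naive slope divided by the estimated reliability ratio. With error-free covariates in the model, the calibration function would also depend on them.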
Simulation extrapolation (SIMEX) is a complementary approximate method that shares the simplicity of regression calibration and is well suited to problems with additive measurement error. It is a simulation-based method for estimating and reducing the bias due to measurement error. The estimates are obtained by adding further measurement error to the data in a resampling-like stage, establishing the trend of the estimates as a function of the added error variance, and extrapolating this trend back to the case of no measurement error. For a detailed explanation of this method, see Carroll, Ruppert, and Stefanski (1995).
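The simulation and extrapolation steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of the XploRe quantlet: it uses one error-prone covariate, no error-free covariates, additive normal error with known variance su2, and a quadratic extrapolant evaluated at lam = -1. The function names fit_logistic, quad_extrapolate and simex_slope, the grid of lam values, and the simulated data are hypothetical.

```python
import math
import random


def fit_logistic(x, y, iters=12):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            v = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def quad_extrapolate(lams, ests, target=-1.0):
    """Least-squares quadratic in lam, evaluated at `target`."""
    rows = [[1.0, l, l * l] for l in lams]
    # Augmented normal equations [A | rhs] for the three coefficients.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * e for r, e in zip(rows, ests))] for i in range(3)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for k in range(3):
        piv = max(range(k, 3), key=lambda i: abs(m[i][k]))
        m[k], m[piv] = m[piv], m[k]
        for i in range(k + 1, 3):
            f = m[i][k] / m[k][k]
            for j in range(k, 4):
                m[i][j] -= f * m[k][j]
    c = [0.0, 0.0, 0.0]
    for k in (2, 1, 0):
        c[k] = (m[k][3] - sum(m[k][j] * c[j] for j in range(k + 1, 3))) / m[k][k]
    return c[0] + c[1] * target + c[2] * target ** 2


def simex_slope(y, w, su2, lams, B, rng):
    """Return (naive slope, SIMEX slope) for the error-prone covariate w."""
    naive = fit_logistic(w, y)[1]
    grid, means = [0.0], [naive]
    for lam in lams:
        sd = math.sqrt(lam * su2)  # extra error with variance lam * su2
        slopes = []
        for _ in range(B):
            wb = [wi + rng.gauss(0.0, sd) for wi in w]
            slopes.append(fit_logistic(wb, y)[1])
        grid.append(lam)
        means.append(sum(slopes) / B)
    # Total error variance is (1 + lam) * su2, so lam = -1 means "no error".
    return naive, quad_extrapolate(grid, means, -1.0)


# Simulated data (hypothetical) with Var(X) = 1 and su2 = 1/3.
rng = random.Random(7)
n, beta0, beta1, su2 = 1500, -1.0, 1.0, 1.0 / 3.0
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
w = [xi + rng.gauss(0.0, math.sqrt(su2)) for xi in x]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(beta0 + beta1 * xi))) else 0
     for xi in x]

naive, corrected = simex_slope(y, w, su2, [0.5, 1.0, 1.5, 2.0], 10, rng)
print("naive slope:", round(naive, 3))
print("SIMEX slope:", round(corrected, 3))
```

The quadratic extrapolant is only one possible choice. The extrapolated value depends on the extrapolation function, so the corrected estimate should be read as approximate.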
The quantlet simex implements this calculation in XploRe. Its syntax is
library("eiv")
gest = simex(y,w,z,su2,lam,b)
where the input parameters are:
Consider the Heidelberg data again.
library("xplore")
library("eiv")
V=read("heid.dat")
y=V[,1]
w=V[,2]
z=V[,3]
sw2=var(w)
su2=sw2/4
lam=aseq(0.01,6,0.5)
B=20
gest=simex(y,w,z,su2,lam,B)
gest