In this section, we study the nonlinear regression model
$$y_i = h(x_i, \beta_0) + \varepsilon_i, \qquad i = 1, \dots, n, \qquad (8.16)$$
where $h$ is a known regression function that is nonlinear in the unknown parameter vector $\beta_0$.
We first discuss fitting and inference in nonlinear regression (Sects. 8.2.1 and 8.2.2), where we again concentrate on least squares estimation. For an extensive discussion of the theory and practice of nonlinear least squares regression, see the monographs [3], [8], and [80]. Second, similarly to the linear-modeling section, methods for ill-conditioned nonlinear systems are briefly reviewed in Sect. 8.2.3.
In this section, we concentrate on estimating the vector $\beta_0$ of unknown parameters in (8.16) by nonlinear least squares,
$$\hat\beta = \arg\min_{b} \sum_{i=1}^{n} \left[ y_i - h(x_i, b) \right]^2 = \arg\min_{b} S(b). \qquad (8.17)$$
Contrary to linear model fitting, we cannot express the solution of this optimization problem analytically for a general function $h$. On the other hand, we can try to approximate the nonlinear objective function using a Taylor expansion, because the existence of the first two derivatives of $h(x, b)$ is an often-used condition for the asymptotic normality of $\hat\beta$ and thus can be readily assumed. Denoting $S(b) = \sum_{i=1}^{n} [y_i - h(x_i, b)]^2$ and $J(b)$ the $n \times p$ Jacobian matrix with rows $[\partial h(x_i, b)/\partial b]^\top$, we can state the following asymptotic-normality result from [4]:
$$\sqrt{n}\,(\hat\beta - \beta_0) \xrightarrow{d} N\!\left(0,\; \sigma^2 \Big[\operatorname*{plim}_{n \to \infty} n^{-1} J(\beta_0)^\top J(\beta_0)\Big]^{-1}\right). \qquad (8.18)$$
Hence, although there is no general explicit solution to (8.17), we can assume without much loss of generality that the objective function $S(b)$ is twice differentiable in order to devise a numerical optimization algorithm. The second-order Taylor expansion then provides a quadratic approximation of the minimized function, which can be used for obtaining an approximate minimum of the function; see [3]. As a result, one should search in the direction of steepest descent of the function, that is, along its negative gradient, to get a better approximation of the minimum. We discuss here the incarnations of these methods specifically for the case of the quadratic loss function in (8.17).
The classical method based on the gradient approach is Newton's method; see [54] and [3] for a detailed discussion. Starting from an initial point $b^{(0)}$, a better approximation is found by taking
$$b^{(k+1)} = b^{(k)} - \left[\mathbf{H}\big(b^{(k)}\big)\right]^{-1} \nabla S\big(b^{(k)}\big), \qquad k = 0, 1, 2, \dots, \qquad (8.19)$$
where $\nabla S$ and $\mathbf{H}$ denote the gradient and the Hessian matrix of the objective function $S(b)$.
To find $\hat\beta$, (8.19) is iterated until convergence is achieved. This is often verified by checking whether the relative change from $b^{(k)}$ to $b^{(k+1)}$ is sufficiently small. Unfortunately, this criterion can indicate a lack of progress rather than convergence. Instead, [8] proposed to check convergence by looking at some measure of orthogonality of the residuals $y_i - h(x_i, b^{(k)})$ towards the regression surface given by $h(x, b^{(k)})$, since the identification assumption of model (8.16) is $E[\varepsilon_i \mid x_i] = 0$. See [13], [54] and [92] for more details and further modifications.
To evaluate iteration (8.19), it is necessary to invert the Hessian matrix $\mathbf{H}(b^{(k)})$. From the computational point of view, all issues discussed in Sect. 8.1 apply here too, and one should use a numerically stable procedure, such as the QR or SVD decompositions, to perform the inversion. Moreover, to guarantee that (8.19) leads to a better approximation of the minimum, that is $S(b^{(k+1)}) < S(b^{(k)})$, the Hessian matrix $\mathbf{H}(b^{(k)})$ needs to be positive definite, which in general holds only in a neighborhood of $\hat\beta$ (see the Levenberg-Marquardt method for a remedy). Even if it is so, the step in the gradient direction should not be too long; otherwise we ``overshoot.'' The modified Newton's method addresses this by using only a fraction $\alpha_k$ of the iteration step, $b^{(k+1)} = b^{(k)} - \alpha_k [\mathbf{H}(b^{(k)})]^{-1} \nabla S(b^{(k)})$. See [12], [31] and [54] for some choices of $\alpha_k$.
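One simple choice of the fraction $\alpha_k$ is step-halving: shrink the step until the objective actually decreases. A minimal sketch, with hypothetical names (`S` for the objective, `step` for the Newton direction):

```python
import numpy as np

def damped_update(S, b, step, alpha0=1.0, max_halvings=30):
    """Modified Newton update: halve the step fraction alpha until the
    objective S decreases (simple backtracking)."""
    S_old = S(b)
    alpha = alpha0
    for _ in range(max_halvings):
        b_new = b - alpha * step
        if S(b_new) < S_old:
            return b_new
        alpha *= 0.5
    return b  # no decrease found; keep the current iterate

# toy check: a quadratic objective with a deliberately over-long step
S = lambda b: float(np.sum(b**2))
b_new = damped_update(S, np.array([1.0, 1.0]), step=np.array([3.0, 3.0]))
```

Here the full step overshoots the minimum ($S$ would increase), so one halving is performed before the update is accepted.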
The Gauss-Newton method is designed specifically for least squares problems: it replaces the regression function $h(x_i, b)$ in (8.17) by its first-order Taylor expansion. The resulting iteration step is
$$b^{(k+1)} = b^{(k)} + \left[J\big(b^{(k)}\big)^\top J\big(b^{(k)}\big)\right]^{-1} J\big(b^{(k)}\big)^\top r\big(b^{(k)}\big), \qquad (8.20)$$
where $J(b)$ denotes the Jacobian matrix of $h$ with rows $[\partial h(x_i, b)/\partial b]^\top$ and $r(b)$ the vector of residuals $y_i - h(x_i, b)$. Depending on the data and the current approximation $b^{(k)}$ of $\hat\beta$, the Hessian matrix $\mathbf{H}(b^{(k)})$ or its approximations such as $J(b^{(k)})^\top J(b^{(k)})$ can be badly conditioned or not positive definite, which could even result in divergence of Newton's method (or very slow convergence in the case of the modified Newton's method). The Levenberg-Marquardt method addresses the ill-conditioning by choosing the search direction $d_k$ as a solution of
$$\left[J\big(b^{(k)}\big)^\top J\big(b^{(k)}\big) + \lambda_k \mathbf{I}\right] d_k = J\big(b^{(k)}\big)^\top r\big(b^{(k)}\big). \qquad (8.21)$$
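A minimal sketch of this scheme follows: the damped direction solves $(J^\top J + \lambda I)\,d = J^\top r$, reducing to the Gauss-Newton step for $\lambda = 0$, and $\lambda$ is adapted by the usual accept/reject rule. The exponential model and all numbers are illustrative assumptions, not the chapter's example:

```python
import numpy as np

def lm_step(J, r, lam):
    """Levenberg-Marquardt direction: solve (J'J + lam*I) d = J'r.
    lam = 0 recovers the Gauss-Newton step; a larger lam regularizes an
    ill-conditioned J'J and bends d towards steepest descent."""
    return np.linalg.solve(J.T @ J + lam * np.eye(J.shape[1]), J.T @ r)

def fit_lm(x, y, b_init, lam=1e-3, max_iter=200, tol=1e-10):
    """Damped least squares for the illustrative model h(x,b) = b[0]*exp(b[1]*x)."""
    b = np.asarray(b_init, dtype=float)

    def resid(b):
        return y - b[0] * np.exp(b[1] * x)

    for _ in range(max_iter):
        r = resid(b)
        e = np.exp(b[1] * x)
        J = np.column_stack([e, b[0] * x * e])       # Jacobian of h at b
        d = lm_step(J, r, lam)
        r_new = resid(b + d)
        if r_new @ r_new < r @ r:                    # accepted: relax damping
            b = b + d
            lam = max(lam / 10.0, 1e-12)
            if np.linalg.norm(d) <= tol * (1.0 + np.linalg.norm(b)):
                break
        else:                                        # rejected: increase damping
            lam *= 10.0
    return b

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 200)
y = 1.5 * np.exp(0.8 * x) + 0.01 * rng.standard_normal(x.size)
b_lm = fit_lm(x, y, b_init=[1.0, 1.0])
```

Because the damping makes every accepted step a descent step, this variant tolerates a worse starting point than plain Newton or Gauss-Newton iterations.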
Although Newton's method and its modifications are most frequently used in applications, the fact that they find local minima gives rise to various improvements and alternative methods. These range from simply starting the minimization algorithm from several (randomly chosen) initial points to general global-search optimization methods, such as the genetic algorithms mentioned in Sect. 8.1.3 and discussed in more detail in Chaps. II.5 and II.6.
Similarly to linear modeling, the inference in nonlinear regression models is mainly based, besides the estimate $\hat\beta$ itself, on two quantities: the residual sum of squares $RSS = S(\hat\beta) = \sum_{i=1}^{n} [y_i - h(x_i, \hat\beta)]^2$ and the (asymptotic) variance of the estimate $\operatorname{Var}(\hat\beta)$, see (8.18). Here we discuss how to compute these quantities for $\hat\beta$ and its functions.

$RSS$ will typically be a by-product of a numerical computation procedure, since it constitutes the minimized function. $RSS$ also provides an estimate of $\sigma^2$: $s^2 = RSS/(n - p)$. The same also holds for the matrix $\operatorname*{plim}_{n \to \infty} n^{-1} J(\beta_0)^\top J(\beta_0)$, which can be consistently estimated by $n^{-1} J(\hat\beta)^\top J(\hat\beta)$, that is, by the asymptotic representation of the Hessian matrix $\mathbf{H}(\hat\beta)$. This matrix or its approximations are computed at every step of (quasi-)Newton methods for minimizing (8.17), and thus it will be readily available after the estimation.
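These by-products can be assembled as follows: $s^2 = RSS/(n-p)$ and the estimated asymptotic covariance $s^2 [J(\hat\beta)^\top J(\hat\beta)]^{-1}$. The sketch again assumes the illustrative exponential model $h(x, b) = b_1 e^{b_2 x}$ (our choice, not the chapter's):

```python
import numpy as np

def nls_covariance(x, y, b_hat):
    """Estimate s^2 = RSS/(n-p) and the asymptotic covariance
    s^2 * (J'J)^{-1} at the estimate, for h(x, b) = b[0]*exp(b[1]*x)."""
    n, p = x.size, 2
    e = np.exp(b_hat[1] * x)
    r = y - b_hat[0] * e                         # residuals at the estimate
    J = np.column_stack([e, b_hat[0] * x * e])   # Jacobian of h at b_hat
    s2 = float(r @ r) / (n - p)                  # residual variance estimate
    cov = s2 * np.linalg.inv(J.T @ J)            # estimated Var(b_hat)
    return s2, cov

# simulated data with known sigma = 0.05; evaluate at the true parameters
rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0, 500)
y = 1.5 * np.exp(0.8 * x) + 0.05 * rng.standard_normal(x.size)
s2, cov = nls_covariance(x, y, np.array([1.5, 0.8]))
```

With the true parameters plugged in, $s^2$ should be close to $\sigma^2 = 0.0025$, and the diagonal of `cov` gives the squared standard errors.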
Furthermore, the inference in nonlinear regression models may often involve a nonlinear (vector) function $g(\hat\beta)$ of the estimate; for example, when we test a hypothesis (see [3] for a discussion of hypothesis testing). Contrary to linear functions of estimates, where $\operatorname{Var}(\mathbf{A}\hat\beta) = \mathbf{A}\operatorname{Var}(\hat\beta)\mathbf{A}^\top$, there is no exact expression for $\operatorname{Var}[g(\hat\beta)]$ in the general case. Thus, we usually assume the first-order differentiability of $g$ and use the Taylor expansion to approximate this variance. Since
$$g(\hat\beta) \approx g(\beta_0) + \left[\frac{\partial g(\beta_0)}{\partial \beta}\right]^\top (\hat\beta - \beta_0),$$
it follows that
$$\operatorname{Var}\!\left[g(\hat\beta)\right] \approx \left[\frac{\partial g(\hat\beta)}{\partial \beta}\right]^\top \operatorname{Var}(\hat\beta) \left[\frac{\partial g(\hat\beta)}{\partial \beta}\right].$$
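This delta-method approximation $[\partial g/\partial\beta]^\top \operatorname{Var}(\hat\beta)\, [\partial g/\partial\beta]$ is a one-liner in practice; the covariance matrix and the function $g(b) = b_1 b_2$ below are illustrative numbers, not taken from the chapter:

```python
import numpy as np

def delta_method_var(g_grad, cov):
    """Approximate Var[g(b_hat)] by the delta method:
    grad(g)' * Cov(b_hat) * grad(g), with the gradient evaluated at b_hat."""
    g_grad = np.asarray(g_grad, dtype=float)
    return float(g_grad @ cov @ g_grad)

# illustrative: variance of g(b) = b1 * b2 at b_hat = (1.5, 0.8)
cov = np.array([[0.04, -0.01],
                [-0.01, 0.02]])    # assumed Var(b_hat)
grad = np.array([0.8, 1.5])        # dg/db = (b2, b1) at b_hat
v = delta_method_var(grad, cov)
```

For a vector-valued $g$, `g_grad` becomes the Jacobian of $g$ and the same sandwich formula yields the full approximate covariance matrix of $g(\hat\beta)$.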
Similarly to linear modeling, nonlinear models can also be ill-conditioned when the Hessian matrix $\mathbf{H}(\hat\beta)$ is nearly singular or does not even have full rank, see Sect. 8.1.2. This can be caused either by the nonlinear regression function $h$ itself or by too many explanatory variables relative to the sample size $n$. Here we mention extensions of the methods dealing with ill-conditioned problems in the case of linear models (discussed in Sects. 8.1.5-8.1.9) to nonlinear modeling: ridge regression, the Stein-rule estimator, Lasso, and partial least squares.
First, one of the early nonlinear ridge-regression proposals is due to [21], who simply added a diagonal matrix to the Hessian $\mathbf{H}(b^{(k)})$ in (8.19). Since nonlinear modeling is done by minimizing an objective function, a more straightforward way is to use the alternative formulation (8.11) of ridge regression and to minimize
$$\sum_{i=1}^{n} \left[ y_i - h(x_i, b) \right]^2 + k \|b\|_2^2. \qquad (8.22)$$
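A sketch of minimizing this ridge-penalized sum of squares by Gauss-Newton iterations on the linearized penalized normal equations; the exponential model, penalty $k$, and data are illustrative assumptions:

```python
import numpy as np

def ridge_nls(x, y, b_init, k, max_iter=200, tol=1e-10):
    """Gauss-Newton iterations for the penalized objective
    sum_i (y_i - b[0]*exp(b[1]*x_i))^2 + k * ||b||^2  (illustrative model).
    Each step solves the linearized penalized normal equations
    (J'J + k*I) d = J'r - k*b, so d = 0 exactly at a stationary point."""
    b = np.asarray(b_init, dtype=float)
    p = b.size
    for _ in range(max_iter):
        e = np.exp(b[1] * x)
        r = y - b[0] * e                           # residuals
        J = np.column_stack([e, b[0] * x * e])     # Jacobian of h at b
        d = np.linalg.solve(J.T @ J + k * np.eye(p), J.T @ r - k * b)
        b = b + d
        if np.linalg.norm(d) <= tol * (1.0 + np.linalg.norm(b)):
            break
    return b

rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0, 300)
y = 1.5 * np.exp(0.8 * x) + 0.05 * rng.standard_normal(x.size)
b_ridge = ridge_nls(x, y, b_init=[1.4, 0.9], k=0.1)
```

Note the formal similarity to the Levenberg-Marquardt direction: the penalty $k$ plays the same stabilizing role as the damping parameter, but here it is part of the objective rather than of the algorithm.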
Next, equally straightforward is the application of the Stein-rule estimator (8.8) in nonlinear regression; see [56] for a recent study of the positive-part Stein-rule estimator within the Box-Cox model. The same could possibly apply to the Lasso-type estimators discussed in Sect. 8.1.8 as well: the Euclidean norm $\|b\|_2^2$ in (8.22) would just have to be replaced by the $L_1$ norm $\|b\|_1$. Nevertheless, the behavior of Lasso within linear regression has only recently been studied in more detail, and to my best knowledge, there are no results on Lasso in nonlinear models yet.
Finally, there is a range of modifications of partial least squares (PLS) designed for nonlinear regression modeling, which either try to make the relationship between the dependent and explanatory variables linear in the unknown parameters or deploy an intrinsically nonlinear model. First, the methods using linearization are typically based on approximating a nonlinear relationship by higher-order polynomials (see the quadratic PLS by [107] and the INLR approach by [10]) or by a piecewise constant approximation (the GIFI approach, see [11]). [108] present an overview of these methods. Second, several recent works introduced intrinsic nonlinearity into PLS modeling. Among the most important contributions, there are [77] and [64], modeling the nonlinear relationship using a feed-forward neural network; [106] and [25], transforming the predictors by spline functions; and [5], using a fuzzy-clustering regression approach.