7.6 Auxiliary Routines for Numerical Optimization


7.6.1 Gradient

The first derivative of a one-dimensional function $ f$ at $ x$ can be approximated by the symmetric difference $ f'(x)\approx\frac{1}{2h}\{f(x+h)-f(x-h)\}$ with a precision of $ O(h^2)$. Similarly, the partial derivative of a function $ f$ of $ n$ variables $ x_i$, $ i=1,\dots,n$, with respect to the $ i$-th variable at the point $ x=(x_1,\dots,x_n)$ can be approximated by $ \partial_{x_i} f \approx \frac{1}{2h}\{f(x_1,\dots,x_i+h,\dots,x_n)-f(x_1,\dots,x_i-h,\dots,x_n)\}$ with a precision of $ O(h^2)$.
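The symmetric difference above can be sketched in a few lines; the following is a minimal Python illustration (the function names and step sizes are illustrative choices, not part of the XploRe library):

```python
# Symmetric (central) difference approximation of derivatives, error O(h^2).

def central_diff(f, x, h=1e-5):
    """Approximate f'(x) for a one-dimensional function f."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def partial_diff(f, x, i, h=1e-5):
    """Approximate the i-th partial derivative of f at the point x (a list)."""
    xp = list(x); xm = list(x)
    xp[i] += h          # shift only the i-th coordinate forward ...
    xm[i] -= h          # ... and backward
    return (f(xp) - f(xm)) / (2.0 * h)

# f(x) = x^3, exact f'(0.1) = 0.03; with the coarse step h = 0.05
# the symmetric difference gives 0.0325 (cf. the example below)
approx = central_diff(lambda x: x**3, 0.1, h=0.05)
```

Note that the $ O(h^2)$ error term is visible here: halving $ h$ reduces the error of the approximation roughly by a factor of four.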


7.6.2 Examples

Example XEGnum23.xpl approximates the first derivative of $ f(x) = x^3$ at $ x=0.1$ by the symmetric difference with step $ h=0.05$. It produces Figure 7.12 and the following output:

Contents of string
[1,] "Gradient - analytic computation: 0.030000"

Contents of string
[1,] "Gradient - numeric computation : 0.032500"

Figure: Approximation of the gradient of $ f(x) = x^3$ at $ x=0.1$ by the symmetric difference with $ h=0.05$, XEGnum23.xpl
\includegraphics[width=1.0\defepswidth]{XEGnum23.ps}

This example uses a quantlet


grad = nmgraddiff (fname,x0{,h})

implementing the symmetric difference approximation of the gradient of fname at a given point x0. The input parameter fname should be a string containing the name of the function whose gradient is to be computed; x0 is a vector. The precision of the approximation can be influenced by the optional parameter h, the step of the symmetric difference. The step h can be set separately for each component (input a vector of steps in this case) or it can be the same for every component (a single scalar is then enough). The output parameter grad contains the computed gradient.

Example XEGnum15.xpl computes the gradient of $ f(x,y) = 1 - x + y^2$ at the point $ x_0 = (1,2)^T$. As one can easily see, $ \mathop{\rm grad }f(x,y) = (-1, 2y)^T$ and $ \mathop{\rm grad }f((1,2)) = (-1, 4)^T$:

Contents of grad
[1,]       -1
[2,]        4
XEGnum15.xpl
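The same computation can be reproduced with a short componentwise symmetric-difference loop; this is a hypothetical Python re-implementation in the spirit of nmgraddiff, not the XploRe quantlet itself:

```python
# Numerical gradient by symmetric differences, one component at a time.

def num_gradient(f, x0, h=1e-6):
    grad = []
    for i in range(len(x0)):
        xp = list(x0); xm = list(x0)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad

# f(x, y) = 1 - x + y^2 at (1, 2); exact gradient is (-1, 4)
f = lambda x: 1 - x[0] + x[1] ** 2
g = num_gradient(f, [1.0, 2.0])
```

For this polynomial of degree at most two, the symmetric difference is exact up to rounding, which is why the numerical result matches the analytic gradient so closely.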

If you need to compute the gradient of a given function more precisely, use the following quantlet:


z = nmgraditer (fname,x0{,h})

It calls the quantlet nmgraddiff iteratively with various values of the parameter h and extrapolates the results to obtain the limit value for $ h = 0$; Ridders' polynomial extrapolation is used. The quantlet nmgraditer is used the same way as nmgraddiff . The output parameter z consists of the computed gradient z.grad, a vector of estimated errors of the partial derivatives z.err, and a vector z.hfin of the stepsizes h in the last iteration.

REMARK 7.9   For differentiable functions, nmgraditer gives very precise results, but the iterative process is, of course, more time-consuming. Hence, use nmgraddiff for quick but less accurate results (e.g., within another iterative method) and nmgraditer for precise but slower computation.

REMARK 7.10   An advantage of nmgraditer is that it also computes an estimated error of the computed gradient. In the case of oscillating functions, it is advisable to compare this error with the computed value of the gradient: if the error estimate is relatively high, the initial stepsize should be decreased.
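The iterate-and-extrapolate idea behind nmgraditer can be sketched for a one-dimensional derivative as follows; this follows the classical Ridders scheme (shrink the step, build an extrapolation table, track the error estimate), with illustrative constants that need not match the quantlet's internals:

```python
import math

# Ridders' extrapolation of the symmetric difference to h -> 0.
# Returns the derivative estimate together with an estimated error.

def ridders_diff(f, x, h=0.1, shrink=1.4, ntab=10):
    a = [[0.0] * ntab for _ in range(ntab)]
    a[0][0] = (f(x + h) - f(x - h)) / (2.0 * h)
    ans, err = a[0][0], float("inf")
    for i in range(1, ntab):
        h /= shrink
        a[0][i] = (f(x + h) - f(x - h)) / (2.0 * h)   # new, smaller step
        fac = shrink ** 2
        for j in range(1, i + 1):
            # extrapolate to higher order, comparing each new estimate
            # with both of its parents to estimate the error
            a[j][i] = (a[j - 1][i] * fac - a[j - 1][i - 1]) / (fac - 1.0)
            fac *= shrink ** 2
            e = max(abs(a[j][i] - a[j - 1][i]), abs(a[j][i] - a[j - 1][i - 1]))
            if e <= err:
                err, ans = e, a[j][i]
        if abs(a[i][i] - a[i - 1][i - 1]) >= 2.0 * err:
            break   # further shrinking only adds round-off noise
    return ans, err

d, e = ridders_diff(math.sin, 1.0)   # exact derivative: cos(1)
```

As Remark 7.10 suggests, the returned error estimate e can be compared against the computed value d; a relatively large e signals that the initial step should be decreased.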

Table 7.1, generated by example XEGnum16.xpl, compares the results of several gradient evaluations for $ f(x) = x \sin(1/x^2)$ at $ x_0 = 0.01$ (see Fig. 7.2). The gradient was computed analytically using the formula

$\displaystyle \mathop{\rm grad }f(x) = \sin \left( \frac{1}{x^2} \right) - \frac{2}{x^2}\cdot \cos \left( \frac{1}{x^2} \right)$

and numerically using the quantlets nmgraddiff and nmgraditer . The table summarizes the step $ h$ for nmgraddiff , the initial and final steps used in nmgraditer , the computed gradient, and the error estimated by nmgraditer .


Table: Comparison of gradient evaluations, XEGnum16.xpl



7.6.3 Jacobian

The Jacobian of a vector function $ f$ (see Section 7.3.1) consists of the gradients of its components $ f_i$, $ i=1,\dots,n$. Hence, we can use an approximation analogous to that for gradients, described in Section 7.6.1.
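Since row $ i$ of the Jacobian is just the gradient of $ f_i$, the approximation reuses the componentwise symmetric difference; a minimal Python sketch, assuming f maps a list to a list of component values:

```python
import math

# Jacobian by symmetric differences: row i holds the gradient of the
# i-th component function; only two function evaluations per variable.

def num_jacobian(f, x0, h=1e-6):
    m = len(f(x0))                       # number of component functions
    jac = [[0.0] * len(x0) for _ in range(m)]
    for j in range(len(x0)):
        xp = list(x0); xm = list(x0)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)            # all components at once
        for i in range(m):
            jac[i][j] = (fp[i] - fm[i]) / (2.0 * h)
    return jac

# f(x) = (sin(x1) + x2^2, x1 - x2) at (pi, 2); exact Jacobian
# is [[cos(pi), 2*2], [1, -1]] = [[-1, 4], [1, -1]]
f = lambda x: [math.sin(x[0]) + x[1] ** 2, x[0] - x[1]]
J = num_jacobian(f, [math.pi, 2.0])
```

Note that one forward/backward evaluation pair per variable suffices for all component functions simultaneously, since f returns the whole vector.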


7.6.4 Examples

The quantlet


jacobi = nmjacobian (fname,x0{,h,iter})

computes not only the Jacobian of a vector function; it can also compute the gradients of the components of a vector function fname even when $ \dim f\neq \dim x_0$. Input the name of the function(s) whose gradients should be computed as a string or a vector of strings fname. The gradients are computed at a point x0, using the quantlet nmgraddiff when the parameter iter is equal to zero or not given, or using nmgraditer otherwise (see Remark 7.9). In both cases, h is an input parameter of the gradient-computing quantlet. The rows of the output matrix jacobi contain the gradients of the respective components of fname.

Example XEGnum18.xpl computes the Jacobian of a function $ f$ defined as $ f(x) = (\sin(x_1) + x_2^2, x_1 - x_2)$ at the point $ (\pi,2)^T$.

Contents of jacobi
[1,]       -1        4
[2,]        1       -1
XEGnum18.xpl


7.6.5 Hessian

The partial derivative of the second order of a function $ f$ at a point $ x=(x_1,\dots,x_n)$ with respect to the $ i$-th variable (i.e., the diagonal of the Hessian) can be approximated by the symmetric difference
$\displaystyle \partial_i^2 f(x) \approx
\frac{1}{h^2} \{ f(x_1,\dots,x_i+h,\dots,x_n) - 2 f(x_1,\dots,x_n) + f(x_1,\dots,x_i-h,\dots,x_n) \}$

with a precision of order $ O(h^2)$.

Let us suppose that the partial derivatives of the second order of $ f$ are continuous; then $ \partial^2_{ij} f = \partial^2_{ji} f$ for all $ i,j = 1,\dots,n$. The non-diagonal elements of the Hessian contain the mixed partial derivatives of the second order that can be approximated by the difference

\begin{displaymath}
\begin{array}{rcl}
\partial_{ij}^2 f(x) = \partial_{ji}^2 f(x) & \approx &
\frac{1}{4h^2} \{ f(x_1,\dots,x_i+h,\dots,x_j+h,\dots,x_n) \\
& & \quad -~ f(x_1,\dots,x_i+h,\dots,x_j-h,\dots,x_n) \\
& & \quad -~ f(x_1,\dots,x_i-h,\dots,x_j+h,\dots,x_n) \\
& & \quad +~ f(x_1,\dots,x_i-h,\dots,x_j-h,\dots,x_n) \}
\end{array}\end{displaymath}

provided $ i<j$. The error of the approximation is $ O(h^2)$.


7.6.6 Example

The following quantlet can be used for numerical approximation of the Hessian:


hess = 30722 nmhessian (fname,x0{,h})

The input parameter fname is a string with a name of the function, x0 is a point (vector) at which the Hessian is to be computed. The optional parameter h is a stepsize $ h$ used in the approximation; it can influence the precision of the approximation. h can be either a vector or a scalar; in the first case, $ i$-th component of h is a step size for the difference in the $ i$-th variable, in the latter case, the same value h is used for all variables.

Example 30725 XEGnum19.xpl computes the Hessian of $ f(x) = (\cos(x_1)\cdot\sin(x_2))$ at a point $ (\pi/4,-\pi/4)^T$:

Contents of hess
[1,]      0.5     -0.5
[2,]     -0.5      0.5
30729 XEGnum19.xpl


7.6.7 Restriction of a Function to a Line

It is often useful to restrict a multidimensional function to a line (for example, in multidimensional optimization methods based on a series of one-dimensional optimizations) and deal with it as with a function of only one parameter. The function restricted to a line given by a point $ x_0$ and a direction $ direc$ is defined as $ f_{1D}(t) = f(x_0+t\cdot direc)$. Refer to the Section 7.6.9 for information on a derivative of a restricted function.


7.6.8 Example

The quantlet


ft = 30800 nmfunc1d (t)

restricts the function fname to a line: $ f(t) = fname(x0 + t\cdot direc)$. In the context of optimization, the main goal of a restriction of a function to a line is to get a function of only one variable. Therefore, given a function $ fname$, we construct a new one-dimensional function $ f$ defined as $ f(t) = fname(x0 + t\cdot direc)$. A variable t is the only parameter of $ f$. However, to be able to compute the values of $ f(t)$, the values of x0 and direc have to be given. They should be stored in the global variables nmfunc1dx and nmfunc1dd before calling 30803 nmfunc1d . The global variable nmfunc1dfunc should be a string with the name of a function computing $ fname$. The resulting value $ f(t)$ is returned in an output parameter ft.

Example 30806 XEGnum20.xpl evaluates $ f(x) = x_1^2 + 3(x_2 - 1)^4$ restricted to a line $ (2,-1) + t\cdot(0,1)$ at $ t = 3$ and produces the following result:

Contents of ft
[1,]        7
30810 XEGnum20.xpl


7.6.9 Derivative of a Restricted Function

Some line-minimizing methods use also the derivative of a restricted function (see Section 7.6.7), i.e., the derivative of a multidimensional function in a direction of a given line. Providing one has restricted the multidimensional function $ fname$ to $ f(t) = fname(x0 + t\cdot direc)$, its derivative can be computed either as a derivative of a one-dimensional function $ f'(t) = \frac{df(t)}{dt}$ or by multiplication of a local gradient of the original function $ fname$ by the direction vector of the line: $ f'(t) = direc^T \cdot \mathop{\rm grad } fname(x0 + t\cdot direc)$.


7.6.10 Example

The quantlet


fdert = 30940 nmfder1d (t)

computes the derivative of a function fname restricted to a line using the formula

$\displaystyle \frac{d(fname(x0 + t\cdot direc))}{dt} = direc^T \cdot \mathop{\rm grad } fname(x0 + t\cdot direc).
$

Similarly as 30943 nmfunc1d , it has only one input parameter t. The values x0 and direc are taken from the global variables nmfder1dx and nmfder1dd which are set before calling 30946 nmfder1d . The global variable nmfder1dfunc should be a string with a name of a function to be restricted. If a function computing the gradient of $ fname$ is available, the global variable nmfder1dfder should be set to its name. Otherwise, the gradient is computed numerically using the quantlet 30949 nmgraddiff : if nmfder1dfder is left empty, the default value of a step $ h$ is used in 30952 nmgraddiff or one can set nmfder1dfder to a value of $ h$ for the numerical approximation by 30955 nmgraddiff .

Example 30958 XEGnum21.xpl computes the derivative of $ f(x) = x_1^2 + 3(x_2 - 1)^4$ restricted to a line $ (2,-1) + t\cdot(0,1)$ at $ t = 3$:

Contents of fdert
[1,]       12
30962 XEGnum20.xpl