7.5 Minimization of a Function: Multidimensional Case

Several numerical methods for multidimensional minimization are described in this section. Most of these algorithms, with the exception of Nelder and Mead's method (see Section 7.5.1), use a one-dimensional minimization method within their individual iterations.

REMARK 7.6   Analogously to root-finding, we are generally not able to bracket a minimum in the multidimensional case.


7.5.1 Nelder and Mead's Downhill Simplex Method (Amoeba)

The downhill simplex method is a very simple iterative method for multidimensional minimization (practical up to moderate dimensions, $ n\approx 20$), requiring neither the evaluation nor even the existence of derivatives. On the other hand, it requires many function evaluations. Nevertheless, it can be useful when $ f$ is nonsmooth or when its derivatives are impossible to find.

The simplex method attempts to enclose the minimum inside a simplex, i.e., an $ n$-dimensional convex volume defined by $ n+1$ affinely independent points (vertices). Starting from an initial simplex, the algorithm inspects the function values at the vertices and constructs a new simplex using the operations of reflection, expansion and contraction, so that the final simplex is small enough to contain the minimum with the desired accuracy.
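The reflection-expansion-contraction loop described above can be tried with any generic downhill simplex implementation. The following Python snippet is an illustration only (it uses SciPy's Nelder-Mead, not the XploRe quantlet nelmin), applied to the test function minimized in Section 7.5.2:

```python
import numpy as np
from scipy.optimize import minimize

# Minimize f(x) = sum(x_i^2) with a generic downhill simplex (Nelder-Mead)
# implementation, started from (28, -35, 13, -17) as in Section 7.5.2.
f = lambda x: float(np.sum(x ** 2))

x0 = np.array([28.0, -35.0, 13.0, -17.0])
res = minimize(f, x0, method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-16, "maxiter": 10000})

# res.x approximates the minimum at the origin; res.nit counts iterations
```

Note that, like nelmin, the method needs only function values; the price is the comparatively large number of evaluations.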


7.5.2 Example

The quantlet


x = nelmin(x0,f,maxiter{,eps,step})

finds a minimum of a given function using Nelder and Mead's simplex method. The method can be started from several initial points at the same time; input all the initial points as columns of the input matrix x0. The string parameter f contains the name of the minimized function. It is necessary to specify a maximal number of iterations maxiter. The optional parameter eps sets the termination criterion of the iterative process (it is compared with the variance of the function values at the vertices; hence, a smaller value should be set to get the same precision as with, e.g., the conjugate gradient method described in Section 7.5.3). The parameter step sets the size of the initial simplex. The output parameter x has three components: the columns of x.minimum contain the minima of f found in the searches started from the respective initial points, x.iter is the number of executed iterations and x.converged is equal to 1 if the process converged for all initial points; otherwise it is 0.

Example XEGnum22.xpl implements the minimization of $ f(x) = \sum_{i=1}^n x_i^2$ using the downhill simplex method. Starting from the initial estimate $ (28,-35,13,-17)^T$, amoeba needs 410 iterations to find the following approximation of the minimum at $ (0,0,0,0)^T$:

Contents of minim.minimum
[1,]  8.6837e-19
[2,]  8.9511e-19
[3,]  1.6666e-18
[4,]  2.0878e-18

Contents of minim.iter
[1,]      410

Contents of minim.converged
[1,]        1


7.5.3 Conjugate Gradient Methods

A whole family of conjugate gradient methods exists. Their common principle, as well as some details of the Fletcher-Reeves algorithm and its modification by Polak and Ribiere, are described in this section. As the name of the methods suggests, it is necessary to compute the gradient of the function $ f$ whose minimum is to be found. Gradient information can be incorporated into the minimization procedure in various ways. The common principle of all conjugate gradient methods is the following:

Start at an initial point $ x_0$. In each iteration, compute $ x_{i+1}$ as a point minimizing the function $ f$ along a new direction, derived in some way from the local gradient. The way of choosing a new direction distinguishes the various conjugate gradient methods. For example, the very simple method of steepest descent searches along the line from $ x_i$ in the direction of the local (downhill) gradient $ -\mathop{\rm grad } f(x_i)$, i.e., computes the new iteration as $ x_{i+1} = x_i - \lambda \mathop{\rm grad } f(x_i)$, where $ \lambda$ minimizes the restricted function $ f(x_i - \lambda \mathop{\rm grad } f(x_i))$. However, this method is not very efficient: each step has to go in a direction perpendicular to the previous one, which is usually not a direction leading to the minimum. Hence, we would like to choose a new direction based on the negative local gradient direction but at the same time conjugated, i.e., $ Q$-orthogonal, to the previous direction(s).
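The steepest descent iteration above can be sketched in a few lines of Python (an illustration of the principle, not an XploRe quantlet; the ill-conditioned quadratic test function is our own choice), which makes the slow zigzag behaviour easy to observe:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Steepest descent with (numerically) exact line search.
def f(x):
    return x[0]**2 + 10 * x[1]**2          # a simple ill-conditioned quadratic

def grad_f(x):
    return np.array([2 * x[0], 20 * x[1]])

x = np.array([1.0, 1.0])
for _ in range(100):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-10:
        break
    # lambda minimizing the restricted function f(x - lambda * grad f(x))
    lam = minimize_scalar(lambda t: f(x - t * g)).x
    x = x - lam * g
# consecutive search directions are perpendicular, hence the zigzag path
```

Even on this mildly ill-conditioned quadratic the method needs many iterations, which motivates the conjugate choice of directions below.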

In the original Fletcher-Reeves version of conjugate gradient algorithm, the new direction in all steps is taken as a linear combination of the current gradient and the previous direction, $ h_{i+1} = -\mathop{\rm grad } f(x_{i+1}) + w_i h_i$, with the factor $ w_i$ calculated from the ratio of the magnitudes of the current and previous gradients:

$\displaystyle w_i = \frac{\mathop{\rm grad } f(x_{i+1})^T \mathop{\rm grad } f(x_{i+1})}{\mathop{\rm grad } f(x_i)^T \mathop{\rm grad } f(x_i)}.
$

Polak and Ribiere proposed to use the factor $ w_i$ in the form

$\displaystyle w_i = \frac{(\mathop{\rm grad } f(x_{i+1}) - \mathop{\rm grad } f(x_i))^T \mathop{\rm grad } f(x_{i+1})}{\mathop{\rm grad } f(x_i)^T \mathop{\rm grad } f(x_i)}.
$

There is no difference between these two versions on an exactly quadratic hypersurface; otherwise, the latter version usually converges faster than the former one.
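Both variants can be sketched compactly. The following Python code is an illustration only (not the XploRe quantlet nmcongrad; SciPy's scalar minimizer stands in for the line minimization of Section 7.5.7) and implements the iteration with either choice of $ w_i$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient(f, grad, x0, variant="PR", maxiter=100, tol=1e-10):
    """Nonlinear conjugate gradient sketch.

    variant: 'FR' (Fletcher-Reeves) or 'PR' (Polak-Ribiere) factor w_i.
    """
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    h = -g                                  # first direction: steepest descent
    for _ in range(maxiter):
        if np.linalg.norm(g) < tol:
            break
        lam = minimize_scalar(lambda t: f(x + t * h)).x   # line minimization
        x = x + lam * h
        g_new = grad(x)
        if variant == "FR":
            w = (g_new @ g_new) / (g @ g)                 # Fletcher-Reeves
        else:
            w = ((g_new - g) @ g_new) / (g @ g)           # Polak-Ribiere
        h = -g_new + w * h                  # new direction h_{i+1}
        g = g_new
    return x

# The running example of this section: f(x) = x^2 + 3(y-1)^4 from (1, 2)
x_min = conjugate_gradient(lambda p: p[0]**2 + 3 * (p[1] - 1)**4,
                           lambda p: np.array([2 * p[0], 12 * (p[1] - 1)**3]),
                           [1.0, 2.0])
# x_min approximates the exact minimum (0, 1)
```

On an exactly quadratic function both variants produce the same iterates, in agreement with the remark above.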

Figure: Graph of $ f(x) = x^2 + 3(y-1)^4$; the red line shows the progress of the conjugate gradient method, XEGnum08.xpl
\includegraphics[width=1.0\defepswidth]{XEGnum08.ps}


7.5.4 Examples

Fig. 7.8 illustrates the principle of the conjugate gradient method. It was produced by example XEGnum08.xpl, which minimizes the function $ f(x) = x^2 + 3(y-1)^4$ starting from the point $ (1,2)^T$. The exact solution $ (0,1)^T$ is approximated in 10 iterations as follows:

Contents of minim.xmin
[1,]  7.0446e-12
[2,]        1

Contents of minim.fmin
[1,]  4.9822e-23

Contents of minim.iter
[1,]       10

The example XEGnum08.xpl uses the quantlet


min = nmcongrad(fname,x0{,fder,linmin,ftol,maxiter})

implementing the Polak and Ribiere version of the conjugate gradient method. One can call this function with only two parameters: the string fname containing the name of the function to be minimized and a vector x0 with the initial estimate of the minimum location:

min = nmcongrad(fname,x0)
In this case, the gradient of fname will be computed numerically using the quantlet nmgraddiff . The precision of the gradient computation can be influenced by setting the step h of nmgraddiff ; call nmcongrad in the form
min = nmcongrad(fname,x0,h).
If a function computing the derivatives (gradient) of fname is available, one can input its name as a string fder and call the quantlet nmcongrad in the form
min = nmcongrad(fname,x0,fder).

Another example illustrating the usage of nmcongrad is implemented in XEGnum09.xpl . The function to be minimized is defined as $ f(x) = \sum_{i=1}^n x_i^2$. Starting from the initial estimate $ (28,-35,13,-17)^T$, nmcongrad needs four iterations to find the following approximation of the minimum at $ (0,0,0,0)^T$:

Contents of minim.xmin
[1,] -3.1788e-18
[2,] -4.426e-18
[3,] -4.1159e-18
[4,]  7.2989e-19

Contents of minim.fmin
[1,]  4.7167e-35

Contents of minim.iter
[1,]        4

The conjugate gradient method involves a line minimization (see Section 7.5.7 for more details); the quantlet nmlinminder is used by default. One can specify the name of another line minimization function in the parameter linmin. The line minimum should be computed as precisely as possible, otherwise the convergence of the conjugate gradient method is slower; hence, the quantlet nmlinminappr is not suitable for line minimization in the context of nmcongrad .

The termination of the iterative process can be influenced by the parameters ftol and maxiter, setting the tolerance limit for the function values and maximal number of iterations, respectively.


7.5.5 Quasi-Newton Methods

Recall the steepest descent method mentioned in Section 7.5.3. Its straightforward idea is to choose the search direction always in the direction of the negative gradient $ -\mathop{\rm grad } f(x_i)$ (the steepest descent direction). Another simple idea is based on the Newton-Raphson method for solving systems of equations, used here to find a stationary point of the function $ f$ (i.e., a root of $ f$'s gradient); this yields

$\displaystyle x_{i+1} = x_i - H^{-1} \mathop{\rm grad }f(x_i),
$

where $ H = Hf(x_i)$ denotes the Hessian matrix of $ f$ at $ x_i$. The Newton-Raphson algorithm converges quadratically but, unfortunately, it is not globally convergent. In addition, for $ x_i$ not close enough to a minimum, $ H$ need not be positive definite. In such cases, the Newton-Raphson method is not guaranteed to work. An evaluation of $ H$ can be difficult or time-demanding as well. Consequently, the so-called quasi-Newton methods, producing a sequence of matrices $ H_i$ approximating the Hessian matrix, were developed. To prevent a possible overshooting of the minimum, the same backtracking strategy as in the modified Newton-Raphson method (see Section 7.3.3) is used.
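The pure Newton-Raphson step $ x_{i+1} = x_i - H^{-1} \mathop{\rm grad }f(x_i)$ can be sketched directly; the following Python illustration (not an XploRe quantlet) applies it to the function $ f(x) = x^2 + 3(y-1)^4$ used in the examples of this section. Note that for this particular function the Hessian is singular at the minimum, so the $ y$-coordinate converges only linearly; this is one more reason why safeguards are needed in practice:

```python
import numpy as np

# Pure Newton iteration for minimizing f(x, y) = x**2 + 3*(y - 1)**4
def grad(p):
    x, y = p
    return np.array([2 * x, 12 * (y - 1)**3])

def hess(p):
    x, y = p
    return np.array([[2.0, 0.0],
                     [0.0, 36 * (y - 1)**2]])

p = np.array([1.0, 2.0])
for _ in range(50):
    # Newton step: x_{i+1} = x_i - H^{-1} grad f(x_i)
    p = p - np.linalg.solve(hess(p), grad(p))
# p approximates the minimum (0, 1); the x-coordinate is found in one step,
# while y - 1 shrinks only by the factor 2/3 per iteration here
```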

Together with the conjugate gradient methods, the family of quasi-Newton methods (also called variable metric methods) belongs to the class of conjugate direction methods. The search direction in the $ i$-th step is computed according to the rule

$\displaystyle d_i = -A_i \mathop{\rm grad} f(x_i),
$

where $ A_i$ is a symmetric positive definite matrix (usually $ A_1 = I$, the identity matrix) approximating $ H^{-1}$. One question remains open: given $ A_i$, which $ A_{i+1}$ should we use in the next iteration? Let us return to Newton's method, which gave us $ x - x_i = - H^{-1} \mathop{\rm grad }f(x_i)$; for a quadratic function $ f$, taking the step on the left-hand side would take us to the exact minimum $ x$. The same equation for $ x_{i+1}$ reads $ x-x_{i+1} = - H^{-1} \mathop{\rm grad }f(x_{i+1})$. Subtracting these two equations gives

$\displaystyle x_{i+1} - x_{i} = H^{-1} (\mathop{\rm grad }f_{i+1} - \mathop{\rm grad }f_i),
$

where $ \mathop{\rm grad }f_{i}$ and $ \mathop{\rm grad }f_{i+1}$ stand for $ \mathop{\rm grad }f(x_i)$ and $ \mathop{\rm grad }f(x_{i+1})$, respectively. Hence, a reasonable idea is to take a new approximation $ A_{i+1}$ satisfying

$\displaystyle x_{i+1} - x_{i} = A_{i+1} (\mathop{\rm grad }f_{i+1} - \mathop{\rm grad }f_i)
$

(the quasi-Newton condition). The updating formulas for $ A_i$, usually of the form $ A_{i+1} = A_i + \textrm{correction}$, distinguish the various quasi-Newton methods. The most commonly used is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, which uses the following update:

$\displaystyle A_{i+1} = A_i + \frac{s_i s_i^T}{s_i^T v_i} - \frac{A_i v_i v_i^T A_i}{v_i^T A_i v_i} + (v_i^T A_i v_i)\cdot u_i u_i^T
$

with

$\displaystyle u_i = \frac{s_i}{s_i^T v_i} - \frac{A_i v_i}{v_i^T A_i v_i},
$

where $ s_i = x_{i+1} - x_i$ and $ v_i = \mathop{\rm grad }f_{i+1} - \mathop{\rm grad }f_i$. It can be easily shown that if $ A_i$ is a symmetric positive definite matrix, the new matrix $ A_{i+1}$ is also symmetric positive definite and satisfies the quasi-Newton condition.
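The BFGS update above translates directly into code. The following Python sketch is an illustration only (not the XploRe quantlet nmBFGS; it uses an exact line search instead of the backtracking and approximate line minimization described in the text):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def bfgs(f, grad, x0, maxiter=50, tol=1e-10):
    """Quasi-Newton sketch with the BFGS update of A_i ~ H^{-1}."""
    x = np.asarray(x0, dtype=float)
    A = np.eye(x.size)                      # A_1 = I, the identity matrix
    g = grad(x)
    for _ in range(maxiter):
        if np.linalg.norm(g) < tol:
            break
        d = -A @ g                          # search direction d_i = -A_i grad f
        lam = minimize_scalar(lambda t: f(x + t * d)).x
        x_new = x + lam * d
        g_new = grad(x_new)
        s = x_new - x                       # s_i = x_{i+1} - x_i
        v = g_new - g                       # v_i = grad f_{i+1} - grad f_i
        sv, Av = s @ v, A @ v
        vAv = v @ Av
        if sv <= 0 or vAv <= 0:             # safeguard against degenerate update
            break
        u = s / sv - Av / vAv
        A = (A + np.outer(s, s) / sv
               - np.outer(Av, Av) / vAv
               + vAv * np.outer(u, u))      # the BFGS update formula
        x, g = x_new, g_new
    return x

# The running example: f(x) = x^2 + 3(y-1)^4 from (1, 2)
x_min = bfgs(lambda p: p[0]**2 + 3 * (p[1] - 1)**4,
             lambda p: np.array([2 * p[0], 12 * (p[1] - 1)**3]),
             [1.0, 2.0])
```

One can verify numerically that each updated matrix $A_{i+1}$ stays symmetric positive definite and satisfies the quasi-Newton condition $s_i = A_{i+1} v_i$.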


7.5.6 Examples

Figure: Graph of $ f(x) = x^2 + 3(y-1)^4$; the red line shows the progress of the BFGS method, XEGnum10.xpl
\includegraphics[width=1.0\defepswidth]{XEGnum10.ps}

Fig. 7.9 illustrates the principle of the BFGS method. It was produced by example XEGnum10.xpl, which minimizes the function $ f(x) = x^2 + 3(y-1)^4$ starting from the point $ (1,2)^T$. The exact solution $ (0,1)^T$ is approximated in 25 iterations as follows:

Contents of minim.xmin
[1,]  2.1573e-21
[2,]  0.99967

Contents of minim.fmin
[1,]  3.6672e-14

Contents of minim.iter
[1,]       25

The following quantlet


min = nmBFGS(fname,x0{,fder,linmin,ftol,gtol,maxiter})

is used to find a minimum of a given function using the Broyden-Fletcher-Goldfarb-Shanno method. Similarly to nmcongrad , this quantlet can be called with only two parameters: the string fname containing the name of the function to be minimized and a vector x0 with the initial estimate of the minimum location:

min = nmBFGS(fname,x0)
In this case, the gradient of fname will be computed numerically using the quantlet nmgraddiff (see Section 7.6.1). The precision of this computation can be influenced by setting the step h of nmgraddiff ; call nmBFGS in the form
min = nmBFGS(fname,x0,h).
If a function computing the derivatives (gradient) of fname is available, one can call the quantlet nmBFGS with its name as an input string fder:
min = nmBFGS(fname,x0,fder).

The example XEGnum11.xpl calls the quantlet nmBFGS to minimize $ f(x) = \sum_{i=1}^n x_i^2$ (see Section 7.5.3 for the minimization of the same function by the conjugate gradient method). Starting from the initial estimate $ (28,-35,13,-17)^T$, nmBFGS finds the following approximation of the minimum $ (0,0,0,0)^T$ in two iterations:

Contents of minim.xmin
[1,]  1.0756e-08
[2,]  1.4977e-08
[3,]  1.3926e-08
[4,] -2.47e-09

Contents of minim.fmin
[1,]  5.4004e-16

Contents of minim.iter
[1,]        2

The BFGS method also involves a line minimization; the quantlet nmlinminappr is used by default, because it gives a result quicker than nmlinmin or nmlinminder and its precision is sufficient in the context of nmBFGS . The name of another line minimization function can be specified in the parameter linmin.

The termination of the iterative process can be influenced by the parameters ftol, gtol and maxiter, setting the tolerance limit for the convergence of the function values, the tolerance limit for the convergence of the gradients and the maximal number of iterations, respectively.


7.5.7 Line Minimization

As mentioned at the beginning of Section 7.5, multidimensional optimization routines often call one of the one-dimensional methods to find a minimum on a line going through the last trial point in a given direction. An easy way to include these one-dimensional optimizers in a multidimensional procedure is to minimize the function $ f$ restricted to the given line.

In other words, minimizing $ f$ on the line $ x_i + \lambda g_i$ is equivalent to the one-dimensional minimization of a newly defined (one-dimensional, of course) function $ f_{1D}(t) = f(x_i + t \cdot g_i)$ using any one-dimensional minimization method.
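This restriction trick is easy to express in code. A Python sketch (an illustration, not the quantlet nmlinmin; SciPy's scalar minimizer stands in for the one-dimensional method), applied to the example of this section:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def line_minimize(f, x0, direc):
    """Minimize f along the line x0 + t*direc via the restricted 1D function."""
    f1d = lambda t: f(x0 + t * direc)       # f_1D(t) = f(x0 + t*direc)
    t_min = minimize_scalar(f1d).x          # any 1D minimizer can be used here
    xlmin = x0 + t_min * direc
    return xlmin, f(xlmin), xlmin - x0      # location, value, displacement

# f(x, y) = x^2 + 3(y-1)^4 minimized on the line (1, 2) + t*(3, 1)
f = lambda p: p[0]**2 + 3 * (p[1] - 1)**4
xlmin, flmin, moved = line_minimize(f, np.array([1.0, 2.0]),
                                       np.array([3.0, 1.0]))
```

The result reproduces the line minimum reported for XEGnum12.xpl in Section 7.5.8: xlmin near $(-0.33928, 1.5536)$ with the function value near $0.39683$.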

Figure: Graph of $ f(x) = x^2 + 3(y-1)^4$; the line $ (1,2) + t\cdot (3,1)$ is shown in red with a big point representing the line minimum, XEGnum12.xpl
\includegraphics[width=1.0\defepswidth]{XEGnum12.ps}

REMARK 7.7   If the chosen one-dimensional method also needs the derivative of the function, this has to be the derivative in the direction of the line of minimization. It can be computed using the formula $ f'_{g}(x) = g^T \mathop{\rm grad }f(x)$; of course, one can also compute the (numerical, if necessary) derivative of the restricted function $ f_{1D}$.

REMARK 7.8   Note that there is a substantial difference between the gradient and a derivative in the direction of a given line. For example, the gradient and the derivative in the direction $ d=(1,-1)$ of the function $ f(x,y)=x^2 + y^3$ at $ (1,1)$ can be computed using XEGnum17.xpl , with the following output:
Contents of grad
[1,]        2
[2,]        3

Contents of fdert
[1,]       -1
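These numbers can be checked directly from the formula $ f'_{g}(x) = g^T \mathop{\rm grad }f(x)$ of Remark 7.7; a quick Python check (illustration only, not the XEGnum17 quantlet):

```python
import numpy as np

# f(x, y) = x^2 + y^3; gradient and derivative along d = (1, -1) at (1, 1)
def grad_f(p):
    x, y = p
    return np.array([2 * x, 3 * y**2])

p = np.array([1.0, 1.0])
d = np.array([1.0, -1.0])

g = grad_f(p)          # the gradient (2, 3)
fder_d = d @ g         # directional derivative d^T grad f = 2 - 3 = -1
```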


7.5.8 Examples

The quantlet


lmin = nmlinmin(fname,fder,x0,direc)

serves to find a minimum of a given function along the direction direc from the point x0 without using derivatives of fname. The input string fname contains the name of the function to be minimized; both x0 and direc are vectors. The second parameter fder is a dummy necessary for compatibility with nmlinminder ; it can be set to any value.

Example XEGnum12.xpl illustrates a line minimization using the quantlet nmlinmin . The function $ f(x) = x^2 + 3(y-1)^4$ is minimized on the line $ (1,2) + t\cdot (3,1)$ and a graph of the function is produced (see Fig. 7.10).

Contents of minim.xlmin
[1,] -0.33928
[2,]   1.5536

Contents of minim.flmin
[1,]  0.39683

Contents of minim.moved
[1,]  -1.3393
[2,] -0.44643

The output parameter lmin consists of lmin.xlmin, the location of the minimum of fname on the line $ x0+span\{direc\}$; lmin.flmin, the minimum function value $ flmin = f(xlmin)$; and lmin.moved, the vector displacement during the minimization: $ moved = xlmin - x0$.

The following quantlet is similar to nmlinmin but the implemented method involves the evaluation of the derivatives of fname:


lmin = nmlinminder(fname,fder,x0,direc)

Hence, the string fder should contain the name of a function computing the derivatives (gradient) of fname; if left empty, the quantlet nmgraddiff will be used for computing the gradient.

Example XEGnum13.xpl is equivalent to the example XEGnum12.xpl above except that it uses the quantlet nmlinminder for the line minimization. It produces the same output and Fig. 7.10.

The quantlet nmlinminappr finds an approximate minimum of a function on a given line:


lmin = nmlinminappr(fname,fder,x0,direc,stepmax)

In contrast to the quantlets nmlinmin and nmlinminder described above, it finds a minimum with lower precision; on the other hand, the computation is quicker. Hence, it is used as the default line minimization routine for nmBFGS but it is not suitable for nmcongrad .

Figure: Graph of $ f(x) = x^2 + 3(y-1)^4$; the line $ (1,2) + t\cdot (3,1)$ is shown in red with a big point representing an approximation of the line minimum, XEGnum14.xpl
\includegraphics[width=1.0\defepswidth]{XEGnum14.ps}

Example XEGnum14.xpl is another modification of the example XEGnum12.xpl , using the line-minimizing quantlet nmlinminappr . Fig. 7.11 and the following output show that the approximation of the line minimum found by nmlinminappr is less precise than the approximation found by nmlinmin or nmlinminder :

Contents of minim.xlmin
[1,]     -0.5
[2,]      1.5

Contents of minim.flmin
[1,]   0.4375

Contents of minim.moved
[1,]     -1.5
[2,]     -0.5

Contents of minim.check
[1,]        0

Please note that nmlinminappr searches for a line minimum only in the positive direction of the given direction vector direc. The additional input parameter stepmax can be used to prevent evaluation of the function fname outside its domain. In addition to the output parameters lmin.xlmin, lmin.flmin and lmin.moved described above, nmlinminappr also returns lmin.check, which is equal to zero in the case of numerical convergence of the line minimization and equal to one if lmin.xlmin is too close to x0; $ \texttt{lmin.check} = 1$ usually means convergence when used in a minimization algorithm, but the calling method should check the convergence in the case of a root-finding problem.