An algorithm is a list of directed actions to accomplish a designated task. Cooking recipes are the best examples of algorithms in everyday life. The level of a cookbook reflects the skill of the cook: a gourmet cookbook may include the instruction ''sauté the onion until transparent,'' while a beginner's cookbook would describe how to choose and slice the onion, what kind of pan to use, the level of heat, and so on. Since computers are inanimate objects incapable of thought, instructions for a computer algorithm must go much, much further: they must be completely clear and unambiguous, and include all details.
Most cooking recipes would be called single pass algorithms, since they are a list of commands to be completed in consecutive order. Repeating the execution of the same tasks, as in baking batches of cookies, would be described in algorithmic terms as looping. Looping is the most common feature in mathematical algorithms, where a specific task, or similar tasks, are to be repeated many times. The computation of an inner product is commonly implemented using a loop:
    s = 0
    do i = 1 to n
        s = s + a_i * b_i
    end do
where the range of the loop includes the single statement s = s + a_i * b_i, with a multiplication and an addition. In an iterative algorithm, the number of times the loop is to be repeated is not known in
advance, but determined by some monitoring mechanism. For mathematical
algorithms, the focus is most often on monitoring the convergence of a sequence or series. Care must be taken in implementing iterative algorithms to ensure that, at some point, the loop will be terminated;
otherwise an improperly coded procedure may proceed indefinitely in an
infinite loop. Surprises occur when the convergence of an
algorithm can be proven analytically, but, because of the discrete
nature of floating point arithmetic, the procedure implementing that
algorithm may not converge. For example, in a square-root problem to be examined further momentarily, we cannot find a floating point number t so that t^2 is exactly equal to 2. The square of one number may be just below two, and the square of the next largest floating point number may be larger than two. When monitoring convergence, common practice is to convert any test for equality of two floating point numbers or expressions into a test of closeness, such as |x - y| <= epsilon (|x| + |y|) for a small tolerance epsilon.
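In Python, such a closeness test might be sketched as follows; the function name and tolerance are illustrative (the standard library also provides math.isclose with similar semantics):

```python
import math

def close(x, y, rel_tol=1e-9):
    """Test whether two floating point numbers are close in a
    relative sense, rather than exactly equal."""
    return abs(x - y) <= rel_tol * max(abs(x), abs(y))

# 0.1 + 0.2 is not exactly 0.3 in binary floating point arithmetic...
print(0.1 + 0.2 == 0.3)            # False: exact equality fails
# ...but the two values agree to nearly machine precision.
print(close(0.1 + 0.2, 0.3))       # True
print(math.isclose(0.1 + 0.2, 0.3))  # True
```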
A different strategy, divide and conquer, breaks a problem into subproblems of the same form, as in the following algorithm for sorting:
algorithm sort(list)
    break list into two pieces: first and second
    sort(first)
    sort(second)
    put sorted lists first and second together to form one sorted list
end algorithm sort
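A minimal recursive implementation of this divide-and-conquer sort (a merge sort) might be sketched in Python as:

```python
def sort(lst):
    """Divide-and-conquer (merge) sort: split the list in two,
    sort each half recursively, then merge the sorted halves."""
    if len(lst) <= 1:              # a list of 0 or 1 items is already sorted
        return lst
    mid = len(lst) // 2
    first = sort(lst[:mid])        # sort (first)
    second = sort(lst[mid:])       # sort (second)
    # put the sorted lists together to form one sorted list
    merged = []
    i = j = 0
    while i < len(first) and j < len(second):
        if first[i] <= second[j]:
            merged.append(first[i])
            i += 1
        else:
            merged.append(second[j])
            j += 1
    merged.extend(first[i:])
    merged.extend(second[j:])
    return merged

print(sort([3, 1, 4, 1, 5, 9, 2, 6]))   # [1, 1, 2, 3, 4, 5, 6, 9]
```

The combining step (the merge) does only linear work, which is what makes the recursion pay off.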
Implemented recursively, a big problem is quickly broken
into tiny pieces and the key to the performance of divide-and-conquer
algorithms is in combining the solutions to lots of little problems to
address the big problem. In cases where these solutions can be easily
combined, these recursive algorithms can achieve remarkable
breakthroughs in performance. In the case of sorting, the standard algorithm, known as bubblesort, takes O(n^2) work to sort a problem of size n: if the size of the problem is doubled, the work goes up by a factor of four. The Discrete Fourier Transform, when written as the multiplication of an n x n matrix and a vector, involves n^2 multiplications and additions. In both cases, the problem is broken into two subproblems, and the mathematics of divide and conquer follows a simple recursive relationship: the time/work T(n) to solve a problem of size n is the time/work to solve two subproblems of half the size, plus the time/work, say cn, to put the solutions together:
    T(n) = 2 T(n/2) + cn ,
whose solution grows as O(n log n).
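Taking the recurrence to be T(n) = 2 T(n/2) + cn with T(1) = 0, a quick numerical check (c = 1 for illustration) confirms the n log2 n growth for powers of two:

```python
import math

def T(n, c=1.0):
    """Work to solve a problem of size n by divide and conquer:
    two half-size subproblems plus c*n to combine (T(1) = 0)."""
    if n <= 1:
        return 0.0
    return 2 * T(n // 2, c) + c * n

for n in (8, 64, 1024):
    # For n a power of two, T(n) equals n * log2(n) exactly
    print(n, T(n), n * math.log2(n))
```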
The performance of an algorithm may be measured in many ways, depending on the characteristics of the problems it is intended to solve. The sample variance problem above provides an example. The simple algorithm using (1.7) requires minimal storage and computation, but may lose accuracy when the variance is much smaller than the mean: the common test problem for exhibiting catastrophic cancellation employs data with a large mean and a small variance in single precision. The two-pass method (1.8) requires all of the
observations to be stored, but provides the most accuracy and least
computation. Centering using the first observation (1.9) is
nearly as fast, requires no extra storage, and its accuracy only
suffers when the first observation is unlike the others. The last
method, arising from the use of Givens transformations (1.10)
and (1.11), also requires no extra storage, gives sound
accuracy, but requires more computation. As commonly seen in the
marketplace of ideas, the inferior methods have not survived, and the
remaining competitors all have tradeoffs with speed, storage, and
numerical stability.
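The cancellation hazard of the simple one-pass formula can be demonstrated even in double precision by choosing a mean that is large relative to the spread; the data below are illustrative, not the book's test problem:

```python
def var_one_pass(x):
    """One-pass formula: (sum of squares - n * mean^2) / (n - 1).
    Subtracting two nearly equal large numbers invites
    catastrophic cancellation."""
    n = len(x)
    s = sum(xi * xi for xi in x)
    m = sum(x) / n
    return (s - n * m * m) / (n - 1)

def var_two_pass(x):
    """Two-pass formula: compute the mean first, then sum
    squared deviations from it."""
    n = len(x)
    m = sum(x) / n
    return sum((xi - m) ** 2 for xi in x) / (n - 1)

# Large mean, tiny variance: the true sample variance is 0.01.
x = [1e8, 1e8 + 0.1, 1e8 + 0.2]
print(var_one_pass(x))   # wildly wrong: the 0.02 is lost against 3e16
print(var_two_pass(x))   # close to 0.01
```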
The most common difficult numerical problems in statistics involve optimization or root-finding: maximum likelihood, nonlinear least squares, M-estimation, solving the likelihood equations or generalized estimating equations. The algorithms for solving these problems are typically iterative, using the results from the current step to direct the next step.
To illustrate, consider the problem of computing the square root of a real number c. Following from the previous discussion of floating point arithmetic, we can restrict c to a bounded interval, say [1/4, 1), since the even part of the exponent can be handled exactly. One approach is to view the problem as a root-finding problem; that is, we seek x such that f(x) = x^2 - c = 0. The bisection algorithm is a simple, stable method for finding a root. In this case, we may start with an interval known to contain the root, say (a, b), with f(a) < 0 and f(b) > 0. Then bisection tries m = (a + b)/2, the midpoint of the current interval. If f(m) > 0, then b is replaced by m, and the root is known to belong in the new interval (a, m). The algorithm continues by testing the midpoint of the current interval, and eliminating half of the interval. The rate of convergence of this algorithm is linear, since the interval of uncertainty, in this case, is cut by a constant factor of 1/2 with each step. For other algorithms, we may measure the rate at which the distance from the root decreases. Adapting Newton's method to this root-finding problem yields Heron's iteration
    x_{n+1} = ( x_n + c / x_n ) / 2 .
While we can stop this algorithm when x_n^2 = c, as discussed previously, there may not be any floating point number that will give a zero of the function; hence the stopping rule (1.12). Often in root-finding problems, we stop when |f(x_n)| is small enough. In some problems, the appropriate ''small enough'' quantity to ensure the desired accuracy may depend on parameters of the problem, in this case, the value of c. As a result, the termination criterion for the algorithm is changed to: stop when the relative change in x_n is small,
    | x_{n+1} - x_n | / | x_n | <= epsilon .
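A sketch of Heron's iteration with this relative-change stopping rule follows; the starting value and tolerance are illustrative choices, not prescribed by the text:

```python
def heron_sqrt(c, eps=1e-15, max_iter=100):
    """Newton's method applied to f(x) = x^2 - c (Heron's iteration),
    stopping when the relative change in x is small rather than
    waiting for f(x) to be exactly zero."""
    x = c if c >= 1.0 else 1.0      # any positive starting value works
    for _ in range(max_iter):
        x_new = 0.5 * (x + c / x)
        if abs(x_new - x) <= eps * abs(x):   # relative-change stopping rule
            return x_new
        x = x_new
    return x

print(heron_sqrt(2.0))    # agrees with 2 ** 0.5 to near machine precision
```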
As discussed previously, rounding error with floating point
computation affects the level of accuracy that is possible with
iterative algorithms for root-finding. In general, the relative error
in the root is at the same relative level as the computation of the
function. While optimization problems have many of the same
characteristics as root-finding problems, the effect of computational
error is a bit more substantial: d digits of accuracy in the function to be optimized can produce but d/2 digits in the root/solution.
In the multidimensional case, the common problems are solving a system
of nonlinear equations or optimizing a function of several variables.
The most common tools for these problems are Newton's method or
secant-like variations. Given the appropriate regularity conditions,
again we can achieve quadratic convergence with Newton's method, and
superlinear convergence with secant-like variations. In the case of
optimization, we seek to minimize f(x), and Newton's method is based on minimizing the quadratic approximation
    f(x) ~ f(x_0) + grad f(x_0)^T (x - x_0) + (1/2) (x - x_0)^T H(x_0) (x - x_0) ,
where H denotes the Hessian matrix of second derivatives; setting the gradient of this approximation to zero gives the Newton step
    x_1 = x_0 - H(x_0)^{-1} grad f(x_0) .
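As a sketch, Newton's method for a two-variable minimization, solving the 2 x 2 Newton system by Cramer's rule; the test function, starting point, and tolerance here are illustrative, not from the text:

```python
def newton_2d(grad, hess, x0, y0, tol=1e-12, max_iter=50):
    """Newton's method for minimizing f(x, y): repeatedly solve
    H * step = gradient and subtract the step from the current point."""
    x, y = x0, y0
    for _ in range(max_iter):
        gx, gy = grad(x, y)
        hxx, hxy, hyy = hess(x, y)
        det = hxx * hyy - hxy * hxy
        # Newton step: solve the 2x2 system by Cramer's rule
        dx = (gx * hyy - gy * hxy) / det
        dy = (gy * hxx - gx * hxy) / det
        x, y = x - dx, y - dy
        if abs(dx) + abs(dy) <= tol * (1 + abs(x) + abs(y)):
            break
    return x, y

# Illustrative smooth function: f(x, y) = x^4 + y^2 + x*y - 2x
grad = lambda x, y: (4 * x ** 3 + y - 2, 2 * y + x)
hess = lambda x, y: (12 * x ** 2, 1.0, 2.0)   # (f_xx, f_xy, f_yy)

x, y = newton_2d(grad, hess, 1.0, 0.0)
gx, gy = grad(x, y)
print(x, y, gx, gy)   # gradient is essentially zero at the solution
```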
In both problems, the scaling of the parameters is quite important, as measuring the error with the Euclidean norm presupposes that errors in each component are equally weighted. Most software for optimization includes a parameter vector for suitably scaling the parameters, so that one larger parameter does not dominate the convergence decision. In solving nonlinear equations, the condition of the problem is given by
    kappa(J(x)) = || J(x) || * || J(x)^{-1} || ,
the condition number of the Jacobian matrix of the system. With the optimization problem, there is a natural scaling with the Hessian matrix, in contrast with the Jacobian matrix. Here, the eigenvalues of the Hessian matrix dictate the condition of the problem; see, for example, [4] and [3]. Again, parameter scaling remains one of the most important tools.