This section covers relevant matrix decompositions and basic numerical methods. Decompositions provide a numerically stable way to solve a system of linear equations, as shown already in [65], and to invert a matrix. Additionally, they provide an important tool for analyzing the numerical stability of a system.
Some of the most frequently used decompositions are the Cholesky, QR, LU, and SVD decompositions. We start with the Cholesky and LU decompositions, which work only with positive definite and nonsingular diagonally dominant square matrices, respectively (Sects. 4.1.1 and 4.1.2). Later, we explore the more general and, in statistics, more widely used QR and SVD decompositions, which can be applied to any matrix (Sects. 4.1.3 and 4.1.4). Finally, we briefly describe the use of decompositions for matrix inversion, although one rarely needs to invert a matrix (Sect. 4.1.5). Monographs [15,31,37] and [59] include extensive discussions of matrix decompositions.
All mentioned decompositions allow us to transform a general system of linear equations $Ax = b$ to a system with an upper triangular, a diagonal, or a lower triangular coefficient matrix: $Ux = b$, $Dx = b$, or $Lx = b$, respectively. Such systems are easy to solve with a very high accuracy by back substitution, see [34]. Assuming the respective coefficient matrix $U = \{u_{ij}\}_{i,j=1}^{n}$ has full rank, one can find a solution of $Ux = b$ by evaluating

$$x_n = b_n / u_{nn} \,, \tag{4.1}$$

$$x_i = \Big( b_i - \sum_{j=i+1}^{n} u_{ij} x_j \Big) \Big/ u_{ii} \,, \qquad i = n-1, \ldots, 1 \,. \tag{4.2}$$
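The back substitution of Eqs. (4.1) and (4.2) takes only a few lines of code. The following NumPy sketch is illustrative (the function name is ours, not from the text):

```python
import numpy as np

def back_substitution(U, b):
    """Solve Ux = b for an upper triangular U via Eqs. (4.1)-(4.2)."""
    n = len(b)
    x = np.zeros(n)
    x[n - 1] = b[n - 1] / U[n - 1, n - 1]          # Eq. (4.1)
    for i in range(n - 2, -1, -1):                 # Eq. (4.2), i = n-1, ..., 1
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x
```

A lower triangular system is solved analogously by forward substitution, running the loop from the first row downward.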
The Cholesky factorization, first published by [5], was originally developed to solve least squares problems in geodesy and topography. This factorization, in statistics also referred to as the ''square root method,'' is a triangular decomposition. Provided that the matrix $A$ is positive definite, the Cholesky decomposition finds a triangular matrix $U$ that multiplied by its own transpose leads back to the matrix $A$. That is, $U$ can be thought of as a square root of $A$.

The matrix $U$ is called the Cholesky factor of $A$ and the relation $A = U^{\top}U$ is called the Cholesky factorization.

Obviously, decomposing a system $Ax = b$ to $U^{\top}Ux = b$ allows us to solve two triangular systems: $U^{\top}z = b$ for $z$ and then $Ux = z$ for $x$. This is similar to the original Gauss approach for solving a positive definite system of normal equations $A^{\top}Ax = A^{\top}b$. Gauss solved the normal equations by a symmetry-preserving elimination and used the back substitution to solve for $x$.
Let us now describe the algorithm for finding the Cholesky decomposition, which is illustrated in Fig. 4.1. One of the interesting features of the algorithm is that in the $i$th iteration we obtain the Cholesky decomposition of the $i$th leading principal submatrix of $A$, $\{A_{kl}\}_{k,l=1}^{i}$.

Algorithm 1

for $i = 1$ to $n$

  $U_{ii} = \big( A_{ii} - \sum_{k=1}^{i-1} U_{ki}^{2} \big)^{1/2}$

  for $j = i+1$ to $n$

    $U_{ij} = \big( A_{ij} - \sum_{k=1}^{i-1} U_{ki} U_{kj} \big) \big/ U_{ii}$

  end

end
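The rowwise decomposition can be sketched directly in NumPy as follows (an illustrative sketch; in practice one would call a tested routine such as `numpy.linalg.cholesky`, which returns the lower triangular factor instead):

```python
import numpy as np

def cholesky_rowwise(A):
    """Rowwise Cholesky: return upper triangular U with A = U^T U.

    Assumes A is symmetric positive definite; otherwise the square
    root of a nonpositive number is attempted and the method fails.
    """
    n = A.shape[0]
    U = np.zeros_like(A, dtype=float)
    for i in range(n):
        # diagonal entry: uses only already computed rows 1..i-1 of U
        U[i, i] = np.sqrt(A[i, i] - U[:i, i] @ U[:i, i])
        for j in range(i + 1, n):
            U[i, j] = (A[i, j] - U[:i, i] @ U[:i, j]) / U[i, i]
    return U
```

Solving $Ax = b$ then amounts to one forward and one back substitution with $U^{\top}$ and $U$, as described above.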
The Cholesky decomposition described in Algorithm 1 is a numerically stable procedure, see [44] and [45], which can at the same time be implemented in a very efficient way. The computed values $U_{ij}$ can be stored in place of the original $A_{ij}$, and thus, no extra memory is needed. Moreover, let us note that while Algorithm 1 describes the rowwise decomposition ($U$ is computed row by row), there are also a columnwise version and a version with diagonal pivoting, which is also applicable to semidefinite matrices. Finally, there are also modifications of the algorithm, such as blockwise decomposition, that are suitable for very large problems and parallelization; see [7,14] and [50].
The LU decomposition is another method reducing a square matrix $A$ to a product of two triangular matrices (a lower triangular $L$ and an upper triangular $U$). Contrary to the Cholesky decomposition, it does not require a positive definite matrix $A$, but there is no guarantee that $L = U^{\top}$.

$$A = LU$$

Similarly to the Cholesky decomposition, the LU decomposition reduces solving a system of linear equations $Ax = LUx = b$ to solving two triangular systems: $Lz = b$, where $z = Ux$, and $Ux = z$.
Finding the LU decomposition of $A$ is described in Algorithm 2. Since it is equivalent to solving a system of linear equations by the Gauss elimination, which searches just for $U$ and ignores $L$, we refer the reader to Sect. 4.2, where its advantages (e.g., easy implementation, speed) and disadvantages (e.g., numerical instability without pivoting) are discussed.

Finally, note that there are also generalizations of LU to non-square and singular matrices, such as the rank-revealing LU factorization; see [51] and [47].
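A bare-bones Doolittle-style elimination without pivoting can be sketched as follows (a teaching sketch only: without pivoting it breaks down on a zero pivot and can be numerically unstable, as discussed in Sect. 4.2; `scipy.linalg.lu` provides a pivoted implementation):

```python
import numpy as np

def lu_doolittle(A):
    """LU decomposition without pivoting: A = L U with unit lower triangular L.

    Fails with a division by zero whenever a zero pivot is encountered.
    """
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float).copy()
    for k in range(n):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]       # elimination multiplier
            U[i, k:] -= L[i, k] * U[k, k:]    # zero out U[i, k]
    return L, U
```

Solving $Ax = b$ then proceeds via the two triangular systems $Lz = b$ and $Ux = z$.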
One of the most important matrix transformations is the QR decomposition. It splits a general matrix $A$ into an orthonormal matrix $Q$, that is, a matrix with columns orthogonal to each other and each of Euclidean norm equal to 1, and an upper triangular matrix $R$. Thus, a suitably chosen orthogonal matrix $Q$ will triangularize the given matrix $A$:

$$A = QR \,.$$

If $A$ is a nonsingular square matrix, an even slightly stronger result can be obtained: the uniqueness of the QR decomposition.
The use of the QR decomposition for solving a system of equations $Ax = b$ consists in multiplying the whole system by the orthonormal matrix $Q^{\top}$, $Q^{\top}Q = I$, and then solving the remaining upper triangular system $Rx = Q^{\top}b$. This method guarantees numerical stability by minimizing errors caused by machine roundoffs (see the end of this section for details).
The QR decomposition is usually constructed by finding one orthonormal vector (one column of $Q$) after another. This can be achieved using the so-called elementary orthogonal transformations such as Householder reflections, [39], or Givens rotations, [20], that are described in the following subsections. These transformations are related to the solution of the following standard task: given a vector $x \in \mathbb{R}^{m}$, $x \neq 0$, find an orthogonal matrix $M$ such that $Mx = \pm\|x\|_2 \, e_1$, where $e_1 = (1, 0, \ldots, 0)^{\top}$ denotes the first unit vector (Problem 1).
The QR decomposition using Householder reflections (HR) was developed by [21]. A Householder reflection (or Householder transformation) is a matrix $P$,

$$P = I - 2\,\frac{uu^{\top}}{u^{\top}u} \,,$$

where $u$ is a nonzero vector; $P$ is symmetric and orthogonal. To solve Problem 1 using some HR, we search for a vector $u$ such that $x$ will be reflected to the $x_1$-axis. This holds for the following choice of $u$:

$$u = x + \operatorname{sign}(x_1)\,\|x\|_2\, e_1 \,,$$

where the sign is chosen so that no cancellation occurs in the first component.
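This reflection can be sketched as follows (NumPy; function names are ours, and $P$ is applied without ever being formed explicitly, which is how HRs are used in practice):

```python
import numpy as np

def householder_vector(x):
    """Vector u such that P = I - 2uu^T/(u^T u) reflects x onto the x_1-axis.

    With this sign choice, P x = -sign(x_1) * ||x|| * e_1.
    """
    u = x.astype(float).copy()
    u[0] += np.copysign(np.linalg.norm(x), x[0])  # u = x + sign(x_1)||x|| e_1
    return u

def apply_householder(u, x):
    """Apply P = I - 2 u u^T / (u^T u) to x without forming P."""
    return x - 2.0 * (u @ x) / (u @ u) * u
```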
A Givens rotation (GR) in $m$ dimensions (or Givens transformation) is defined by an orthonormal matrix $R_{ij}(\theta) \in \mathbb{R}^{m \times m}$,

$$R_{ij}(\theta) =
\begin{pmatrix}
I & & & & \\
 & c & & s & \\
 & & I & & \\
 & -s & & c & \\
 & & & & I
\end{pmatrix},
\qquad c = \cos\theta, \quad s = \sin\theta \,,$$

where the entries $c$ and $s$ lie at the intersections of the $i$th and $j$th rows and columns, and the rest of the matrix equals the identity.

Now, let us have a look at how GRs can be used for solving Problem 1. A GR of a vector $x$ by an angle $\theta$ results in $y = R_{ij}(\theta)\,x$ such that

$$y_i = c\,x_i + s\,x_j \,, \qquad y_j = -s\,x_i + c\,x_j \,, \qquad y_k = x_k \ \ \text{for } k \neq i, j \,.$$

To zero the $j$th component, it thus suffices to choose

$$c = \frac{x_i}{\sqrt{x_i^2 + x_j^2}} \,, \qquad s = \frac{x_j}{\sqrt{x_i^2 + x_j^2}} \,,$$

which yields $y_i = \sqrt{x_i^2 + x_j^2}$ and $y_j = 0$.
The next step employs the simple fact that the pre- or postmultiplication of a vector $x$ or a matrix $A$ by a GR $R_{ij}(\theta)$ affects only the $i$th and $j$th rows and columns, respectively. Hence, one can combine several rotations without one rotation spoiling the result of another rotation. (Consequently, GRs are more flexible than HRs.) Two typical ways in which GRs are used for solving Problem 1 mentioned in Sect. 4.1.3 follow.
Finally, there are several algorithms for computing the Givens rotations that improve over the straightforward evaluation of $c$ and $s$. A robust algorithm minimizing the loss of precision is given in [7]. An algorithm minimizing memory requirements was proposed by [58]. On the other hand, [16] and [29] proposed modifications aiming to minimize the number of arithmetic operations.
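A basic Givens rotation for zeroing one component can be sketched as follows (NumPy; `numpy.hypot` evaluates $\sqrt{x_i^2 + x_j^2}$ without intermediate overflow, a simple nod to the robustness concerns above; function names are ours):

```python
import numpy as np

def givens(xi, xj):
    """Return (c, s) such that the rotation maps (xi, xj) to (r, 0)."""
    r = np.hypot(xi, xj)                      # sqrt(xi^2 + xj^2), overflow-safe
    return (1.0, 0.0) if r == 0 else (xi / r, xj / r)

def apply_givens(x, i, j, c, s):
    """Rotate components i and j of x; all other components are unchanged."""
    y = x.astype(float).copy()
    y[i] = c * x[i] + s * x[j]
    y[j] = -s * x[i] + c * x[j]
    return y
```

Because only components $i$ and $j$ change, several such rotations with disjoint index pairs can be applied without interfering with each other, which is exactly the flexibility mentioned above.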
An appropriate combination of HRs or GRs, respectively, can be used to compute the QR decomposition of a given matrix $A \in \mathbb{R}^{m \times n}$, $m \ge n$, in the following way. Let $Q_i$, $i = 1, \ldots, n$, denote an orthonormal matrix in $\mathbb{R}^{m \times m}$ such that premultiplication of $A$ by $Q_i$ can zero all elements in the $i$th column that are below the diagonal and such that the previous columns $1, \ldots, i-1$ are not affected at all. Such a matrix can be a blockwise diagonal matrix with blocks being the identity matrix $I_{i-1}$ and a matrix $M_i$ solving Problem 1 for the vector composed of the elements in the $i$th column of $A$ that lie on and below the diagonal. The first part, $I_{i-1}$, guarantees that the first $i-1$ rows, and hence the already processed columns $1, \ldots, i-1$, of the matrix are not affected by the multiplication, whereas the second block, $M_i$, transforms all elements in the $i$th column that are below the diagonal to zero. Naturally, the matrix $M_i$ can be found by means of HRs or GRs as described in the previous paragraphs.

This way, we construct a series of matrices $Q_1, \ldots, Q_n$ such that

$$Q_n \cdots Q_2 Q_1 A = R \,,$$

where $R$ is upper triangular. Since each $Q_i$ is orthonormal, so is their product, and setting

$$Q = (Q_n \cdots Q_1)^{\top} = Q_1^{\top} \cdots Q_n^{\top}$$

yields the desired decomposition $A = QR$.
We describe now the QR algorithm using HRs or GRs. Let $M(x)$ denote the orthonormal matrix from Problem 1 constructed for a vector $x$ by one of the discussed methods.

for $i = 1$ to $n$

  $x = \{A_{ki}\}_{k=i}^{m}$

  $Q_i = \operatorname{diag}(I_{i-1}, M(x))$

  $A = Q_i A$

end

$R = A$ and $Q = Q_1^{\top} \cdots Q_n^{\top}$
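The whole procedure can be sketched compactly in NumPy, applying each reflection directly to the trailing submatrix instead of forming the block matrices explicitly (an illustrative sketch using HRs; `numpy.linalg.qr` is the tested alternative):

```python
import numpy as np

def qr_householder(A):
    """QR decomposition by successive Householder reflections: A = Q R."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for i in range(min(m - 1, n)):
        x = R[i:, i]                                  # column i on/below diagonal
        u = x.copy()
        u[0] += np.copysign(np.linalg.norm(x), x[0])  # Householder vector
        if u @ u == 0:
            continue                                  # column already zeroed
        # apply P = I - 2uu^T/(u^T u) to the trailing block and accumulate Q
        R[i:, :] -= 2.0 * np.outer(u, u @ R[i:, :]) / (u @ u)
        Q[:, i:] -= 2.0 * np.outer(Q[:, i:] @ u, u) / (u @ u)
    return Q, R
```

Each pass zeroes one column below the diagonal while leaving the previously processed columns intact, exactly as in the blockwise construction above.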
There are also modifications of this basic algorithm employing pivoting for better numerical performance and even revealing the rank of the system, see [38] and [36] for instance. An error analysis of the QR decomposition by HRs and GRs is given by [17] and [36], respectively.
Given a nonsingular matrix $A \in \mathbb{R}^{n \times n}$, the Gram-Schmidt orthogonalization constructs a matrix $Q$ such that the columns of $Q$ are orthonormal to each other and span the same space as the columns of $A$. Thus, $A$ can be expressed as $Q$ multiplied by another matrix $R$, whereby the Gram-Schmidt orthogonalization process (GS) ensures that $R$ is an upper triangular matrix. Consequently, GS can be used to construct the QR decomposition of a matrix $A$. A survey of GS variants and their properties is given by [6].
The classical Gram-Schmidt (CGS) process constructs the orthonormal basis stepwise. The first column $q_1$ of $Q$ is simply the normalized $a_1$, $q_1 = a_1 / \|a_1\|$. Having constructed an orthonormal basis $q_1, \ldots, q_{k-1}$, the next column $q_k$ is proportional to $a_k$ minus its projection to the space $\operatorname{span}\{q_1, \ldots, q_{k-1}\}$. Thus, $q_k$ is by its definition orthogonal to $\operatorname{span}\{q_1, \ldots, q_{k-1}\}$, and at the same time, the first $k$ columns of $A$ and $Q$ span the same linear space. The elements of the triangular matrix $R$ from Theorem 3 are then the coordinates of the columns of $A$ given the columns of $Q$ as a basis.
for $k = 1$ to $n$

  for $j = 1$ to $k-1$

    $r_{jk} = q_j^{\top} a_k$

  end

  $q_k = a_k - \sum_{j=1}^{k-1} r_{jk} q_j$

  $r_{kk} = \|q_k\|$

  $q_k = q_k / r_{kk}$

end
Similarly to many decomposition algorithms, CGS also allows a memory-efficient implementation, since the computed orthonormal columns of $Q$ can rewrite the original columns of $A$. Despite this feature and mathematical correctness, the CGS algorithm does not always behave well numerically because numerical errors can very quickly accumulate. For example, an error made in computing $q_1$ affects $q_2$, errors in both of these terms (although caused initially just by an error in $q_1$) adversely influence $q_3$, and so on. Fortunately, there is a modified Gram-Schmidt (MGS) procedure, which prevents such an error accumulation by subtracting linear combinations of $q_i$ directly from the remaining columns $a_{i+1}, \ldots, a_n$ before constructing the following orthonormal vectors. (Surprisingly, MGS is historically older than CGS.)
for $i = 1$ to $n$

  $r_{ii} = \|a_i\|$, $\quad q_i = a_i / r_{ii}$

  for $j = i+1$ to $n$

    $r_{ij} = q_i^{\top} a_j$

    $a_j = a_j - r_{ij} q_i$

  end

end
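The MGS procedure above can be sketched as follows (NumPy; the function name is ours), with the projection onto each new $q_i$ subtracted from all remaining columns immediately:

```python
import numpy as np

def mgs(A):
    """Modified Gram-Schmidt (row version): A = Q R, Q with orthonormal columns."""
    A = A.astype(float).copy()           # columns a_i are overwritten in place
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        R[i, i] = np.linalg.norm(A[:, i])
        Q[:, i] = A[:, i] / R[i, i]
        for j in range(i + 1, n):
            R[i, j] = Q[:, i] @ A[:, j]
            A[:, j] -= R[i, j] * Q[:, i]  # subtract the projection immediately
    return Q, R
```

CGS differs only in computing all projections of $a_k$ against the original column, which is what lets rounding errors in early $q_i$ propagate unchecked.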
Apart from this algorithm (the row version of MGS), there are also a column version of MGS by [6] and MGS modifications employing iterative orthogonalization and pivoting by [9]. The numerical superiority of MGS over CGS was experimentally established already by [53]. This result is also theoretically supported by the GS error analysis in [6], which uncovered the numerical equivalence of the QR decompositions done by MGS and HRs.
The singular value decomposition (SVD) plays an important role in numerical linear algebra and in many statistical techniques as well. Using two orthonormal matrices, SVD can diagonalize any matrix $A$, and the results of SVD can tell a lot about the (numerical) properties of the matrix. (This is closely related to the eigenvalue decomposition: any symmetric square matrix $A$ can be diagonalized, $A = VDV^{\top}$, where $D$ is a diagonal matrix containing the eigenvalues of $A$ and $V$ is an orthonormal matrix.)
Similarly to the QR decomposition, SVD offers a numerically stable way to solve a system of linear equations. Given a system $Ax = U \Sigma V^{\top} x = b$, one can transform it to $\Sigma V^{\top} x = U^{\top} b$ and solve it in two trivial steps: first, finding a solution $z$ of the diagonal system $\Sigma z = U^{\top} b$, and second, setting $x = Vz$, which is equivalent to $V^{\top} x = z$.
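These two steps can be sketched with NumPy's SVD routine (the helper name is ours; the sketch assumes a nonsingular square $A$, so that all singular values are nonzero):

```python
import numpy as np

def svd_solve(A, b):
    """Solve Ax = b via the SVD A = U diag(s) V^T."""
    U, s, Vt = np.linalg.svd(A)
    z = (U.T @ b) / s      # step 1: solve the diagonal system Sigma z = U^T b
    return Vt.T @ z        # step 2: x = V z
```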
On the other hand, the power of SVD lies in its relation to many important matrix properties; see [62], for instance. First of all, the singular values of a matrix $A$ are equal to the (positive) square roots of the eigenvalues of $A^{\top}A$ and $AA^{\top}$, whereby the associated left and right singular vectors are identical with the corresponding eigenvectors. Thus, one can compute the eigenvalues of $A^{\top}A$ directly from the original matrix $A$. Second, the number of nonzero singular values equals the rank of a matrix. Consequently, SVD can be used to find the effective rank of a matrix, to check near singularity, and to compute the condition number of a matrix. That is, it allows one to assess the conditioning and sensitivity to errors of a given system of equations. Finally, let us note that there are far more uses of SVD: identification of the null space of $A$, $\operatorname{null}(A) = \operatorname{span}\{v_{r+1}, \ldots, v_n\}$, where $r$ denotes the rank of $A$; computation of the matrix pseudo-inverse, $A^{+} = V \Sigma^{+} U^{\top}$; low-rank approximations; and so on. See [7] and [62] for details.
Let us now present an overview of algorithms for computing the SVD decomposition, which are not described in detail due to their extent. The first stable algorithm for computing the SVD was suggested by [22]. It involved reduction of a matrix $A$ to its bidiagonal form by HRs, with singular values and vectors being computed as eigenvalues and eigenvectors of a specific tridiagonal matrix using a method based on Sturm sequences. The final form of the QR algorithm for computing SVD, which has been the preferred SVD method for dense matrices up to now, is due to [23]; see [2,7] or [15] for a description of the algorithm and some modifications. An alternative approach based on the Jacobi algorithm was given by [30]. The latest contributions to the pool of computational methods for SVD, including [63,10] and [36], aim to improve the accuracy of singular values and computational speed using recent advances in the QR decomposition.
In previous sections, we described how matrix decompositions can be used for solving systems of linear equations. Let us now discuss the use of matrix decompositions for inverting a nonsingular square matrix $A$, although matrix inversion is not needed very often. All discussed matrix decompositions construct two or more matrices $Q_1, \ldots, Q_k$ such that $A = Q_1 \cdots Q_k$, where the matrices $Q_i$, $i = 1, \ldots, k$, are orthonormal, triangular, or diagonal. Because $A^{-1} = Q_k^{-1} \cdots Q_1^{-1}$, we just need to be able to invert orthonormal and triangular matrices (a diagonal matrix is a special case of a triangular matrix).

First, an orthonormal matrix $Q$ satisfies by definition $Q^{\top}Q = QQ^{\top} = I$. Thus, inversion is in this case equivalent to the transposition of the matrix: $Q^{-1} = Q^{\top}$.
Second, inverting an upper triangular matrix $U$ can be done by solving directly $UX = I$, which leads to the backward substitution method. Let $X = \{X_{ij}\}_{i,j=1}^{n}$ denote the searched-for inverse matrix $U^{-1}$.

for $j = n$ to $1$

  $X_{jj} = 1 / U_{jj}$

  for $i = j-1$ to $1$

    $X_{ij} = -\big( \sum_{k=i+1}^{j} U_{ik} X_{kj} \big) \big/ U_{ii}$

  end

end
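The backward substitution for $UX = I$ can be sketched as follows (NumPy; the function name is ours), solving for each column of the inverse from the diagonal upward:

```python
import numpy as np

def inv_upper_triangular(U):
    """Invert an upper triangular U by solving U X = I column by column."""
    n = U.shape[0]
    X = np.zeros((n, n))
    for j in range(n - 1, -1, -1):
        X[j, j] = 1.0 / U[j, j]
        for i in range(j - 1, -1, -1):
            # entries X[k, j] with k > j are zero, so the sum stops at k = j
            X[i, j] = -(U[i, i + 1:j + 1] @ X[i + 1:j + 1, j]) / U[i, i]
    return X
```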
There are several other algorithms available, such as forward substitution or blockwise inversion. Although designed for faster and more (time-)efficient computation, their numerical behavior does not significantly differ from that of the presented algorithm. See [8] for an overview and a numerical study.