



4.1 Matrix Decompositions

This section covers relevant matrix decompositions and basic numerical methods. Decompositions provide a numerically stable way to solve a system of linear equations, as shown already in [65], and to invert a matrix. Additionally, they provide an important tool for analyzing the numerical stability of a system.

Some of the most frequently used decompositions are the Cholesky, QR, LU, and SVD decompositions. We start with the Cholesky and LU decompositions, which work only with positive definite and nonsingular diagonally dominant square matrices, respectively (Sects. 4.1.1 and 4.1.2). Later, we explore the more general QR and SVD decompositions, which are more widely used in statistics and can be applied to any matrix (Sects. 4.1.3 and 4.1.4). Finally, we briefly describe the use of decompositions for matrix inversion, although one rarely needs to invert a matrix (Sect. 4.1.5). The monographs [15,31,37] and [59] include extensive discussions of matrix decompositions.

All mentioned decompositions allow us to transform a general system of linear equations to a system with an upper triangular, a diagonal, or a lower triangular coefficient matrix: $ \boldsymbol{U} \boldsymbol{x} = \boldsymbol{b}$, $ \boldsymbol{D} \boldsymbol{x} = \boldsymbol{b}$, or $ \boldsymbol{L} \boldsymbol{x} = \boldsymbol{b}$, respectively. Such systems are easy to solve with very high accuracy by back substitution; see [34]. Assuming the respective coefficient matrix $ \boldsymbol{A}$ has full rank, one can find a solution of $ \boldsymbol{U} \boldsymbol{x} = \boldsymbol{b}$, where $ \boldsymbol{U} = \{U_{ij}\}_{i=1,j=1}^n$, by evaluating

$\displaystyle x_i = U_{ii}^{-1} \left( b_i - \sum_{j=i+1}^{n} U_{ij} x_j \right)$ (4.1)

for $ i=n, \ldots, 1$. Analogously, for $ \boldsymbol{L} \boldsymbol{x} = \boldsymbol{b}$, where $ \boldsymbol{L} = \{L_{ij}\}_{i=1,j=1}^n$, one evaluates for $ i=1,\ldots,n$

$\displaystyle x_i = L_{ii}^{-1} \left( b_i - \sum_{j=1}^{i-1} L_{ij} x_j \right)\;.$ (4.2)

For a discussion of systems of equations that do not have full rank, see, for example, [37].
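
For illustration, the back and forward substitution formulas (4.1) and (4.2) can be sketched in Python/NumPy as follows (a minimal sketch; the function names are chosen for this example and NumPy is assumed to be available):

import numpy as np

def back_substitution(U, b):
    # solve U x = b for an upper triangular U via (4.1)
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):                 # i = n, ..., 1
        x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

def forward_substitution(L, b):
    # solve L x = b for a lower triangular L via (4.2)
    n = len(b)
    x = np.zeros(n)
    for i in range(n):                             # i = 1, ..., n
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

# U = np.triu(np.random.rand(4, 4)) + np.eye(4); b = np.random.rand(4)
# np.allclose(back_substitution(U, b), np.linalg.solve(U, b))   # -> True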


4.1.1 Cholesky Decomposition

The Cholesky factorization, first published by [5], was originally developed to solve least squares problems in geodesy and topography. This factorization, in statistics also referred to as the "square root method," is a triangular decomposition. Provided that the matrix $ \boldsymbol{A}$ is positive definite, the Cholesky decomposition finds a triangular matrix $ \boldsymbol{U}$ that multiplied by its own transpose leads back to the matrix $ \boldsymbol{A}$. That is, $ \boldsymbol{U}$ can be thought of as a square root of $ \boldsymbol{A}$.

Theorem 1   Let matrix $ \boldsymbol{A}\in {\mathbb{R}}^{n\times n}$ be symmetric and positive definite. Then there exists a unique upper triangular matrix $ \boldsymbol{U}$ with positive diagonal elements such that $ \boldsymbol{A} = \boldsymbol{U}^\top \boldsymbol{U}$.

The matrix $ \boldsymbol{U}$ is called the Cholesky factor of $ \boldsymbol{A}$ and the relation $ \boldsymbol{A} = \boldsymbol{U}^\top \boldsymbol{U}$ is called the Cholesky factorization.

Obviously, decomposing a system $ \boldsymbol{A} \boldsymbol{x} = \boldsymbol{b}$ to $ \boldsymbol{U}^{\top} \boldsymbol{U} \boldsymbol{x} = \boldsymbol{b}$ allows us to solve two triangular systems: $ \boldsymbol{U}^{\top} \boldsymbol{z} = \boldsymbol{b}$ for $ \boldsymbol{z}$ and then $ \boldsymbol{U} \boldsymbol{x} = \boldsymbol{z}$ for $ \boldsymbol{x}$. This is similar to the original Gauss approach for solving a positive definite system of normal equations $ \boldsymbol{X}^\top\boldsymbol{X}\boldsymbol{x} = \boldsymbol{X}^\top \boldsymbol{b}$. Gauss solved the normal equations by a symmetry-preserving elimination and used back substitution to solve for $ \boldsymbol{x}$.

Figure 4.1: Rowwise Cholesky algorithm

Let us now describe the algorithm for finding the Cholesky decomposition, which is illustrated in Fig. 4.1. One of the interesting features of the algorithm is that in the $ i$th iteration we obtain the Cholesky decomposition of the $ i$th leading principal submatrix of $ \boldsymbol{A}$, $ \{A_{kl}\}_{k=1,l=1}^{i,i}$.

Algorithm 1  


for $ i = 1$ to $ n$
     $ U_{ii} = \left( A_{ii} - \sum_{k=1}^{i-1} U_{ki}^2 \right)^{1/2}$
     for $ j=i+1$ to $ n$
         $ \left. U_{ij}=\left(A_{ij} - \sum_{k=1}^{i-1} U_{ki} U_{kj} \right) \right/ U_{ii}$
     end
end

The Cholesky decomposition described in Algorithm 1 is a numerically stable procedure, see [44] and [45], which can at the same time be implemented in a very efficient way. The computed values $ U_{ij}$ can be stored in place of the original $ A_{ij}$, and thus no extra memory is needed. Moreover, let us note that while Algorithm 1 describes the rowwise decomposition ( $ \boldsymbol{U}$ is computed row by row), there are also a columnwise version and a version with diagonal pivoting, which is additionally applicable to semidefinite matrices. Finally, there are also modifications of the algorithm, such as the blockwise decomposition, that are suitable for very large problems and parallelization; see [7,14] and [50].
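
For illustration, a minimal Python/NumPy sketch of the rowwise Algorithm 1 might look as follows (the in-place storage of $ U_{ij}$ over $ A_{ij}$ is omitted for clarity; in practice one would rely on a library routine such as numpy.linalg.cholesky, which returns the lower triangular factor $ \boldsymbol{U}^{\top}$):

import numpy as np

def cholesky_rowwise(A):
    # rowwise Cholesky factorization (Algorithm 1): A = U^T U,
    # A assumed symmetric positive definite
    n = A.shape[0]
    U = np.zeros_like(A, dtype=float)
    for i in range(n):
        U[i, i] = np.sqrt(A[i, i] - U[:i, i] @ U[:i, i])
        for j in range(i + 1, n):
            U[i, j] = (A[i, j] - U[:i, i] @ U[:i, j]) / U[i, i]
    return U

# A = np.array([[4., 2.], [2., 3.]])
# U = cholesky_rowwise(A)
# np.allclose(U.T @ U, A)                     # -> True
# np.allclose(U, np.linalg.cholesky(A).T)     # -> True (positive diagonal makes U unique)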


4.1.2 LU Decomposition

The LU decomposition is another method that reduces a square matrix $ \boldsymbol{A}$ to a product of two triangular matrices (a lower triangular $ \boldsymbol{L}$ and an upper triangular $ \boldsymbol{U}$). In contrast to the Cholesky decomposition, it does not require a positive definite matrix $ \boldsymbol{A}$, but there is no guarantee that $ \boldsymbol{L} = \boldsymbol{U}^{\top}$.

Theorem 2   Let the matrix $ \boldsymbol{A}\in {\mathbb{R}}^{n\times n}$ satisfy the following conditions:

$\displaystyle A_{11}\neq 0\;, \quad \det\left(\begin{array}{cc}A_{11} & A_{12} \\ A_{21} & A_{22}\end{array}\right) \neq 0\;, \quad \det\left(\begin{array}{ccc}A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33}\end{array}\right) \neq 0\;, ~~ \ldots\;, ~~ \det\boldsymbol{A}\neq 0\;.$    

Then there exists a unique lower triangular matrix $ \boldsymbol{L}$ with ones on the diagonal, a unique upper triangular matrix $ \boldsymbol{U}$ with ones on the diagonal, and a unique diagonal matrix $ \boldsymbol{D}$ such that $ \boldsymbol{A} = \boldsymbol{L}\boldsymbol{D}\boldsymbol{U}$.

Note that for any nonsingular matrix $ \boldsymbol{A}$ there is always a row permutation $ \boldsymbol{P}$ such that the permuted matrix $ \boldsymbol{P} \boldsymbol{A}$ satisfies the assumptions of Theorem 2. Further, a more frequently used version of this theorem factorizes $ \boldsymbol{A}$ into a lower triangular matrix $ \boldsymbol{L}' = \boldsymbol{L} \boldsymbol{D}$ and an upper triangular matrix $ \boldsymbol{U}' = \boldsymbol{U}$. Finally, [69] gave alternative conditions for the existence of the LU decomposition: $ \boldsymbol{A}\in {\mathbb{R}}^{n\times n}$ is nonsingular and $ \boldsymbol{A}^{\!\top}$ is diagonally dominant (i.e., $ \vert A_{jj}\vert \geq \sum_{i=1,i\not=j}^n \vert A_{ij}\vert$ for $ j = 1,\ldots,n$).

Similarly to the Cholesky decomposition, the LU decomposition reduces solving a system of linear equations $ \boldsymbol{A} \boldsymbol{x} = \boldsymbol{L} \boldsymbol{U} \boldsymbol{x} = \boldsymbol{b}$ to solving two triangular systems: $ \boldsymbol{L} \boldsymbol{z} = \boldsymbol{b}$, where $ \boldsymbol{z} = \boldsymbol{U} \boldsymbol{x}$, and $ \boldsymbol{U} \boldsymbol{x} = \boldsymbol{z}$.

Finding the LU decomposition of $ \boldsymbol{A}$ is described in Algorithm 2. Since it is equivalent to solving a system of linear equations by Gauss elimination, which searches just for $ \boldsymbol{U}$ and ignores $ \boldsymbol{L}$, we refer the reader to Sect. 4.2, where its advantages (e.g., easy implementation, speed) and disadvantages (e.g., numerical instability without pivoting) are discussed.

Algorithm 2  
$ \boldsymbol{L} = \boldsymbol{0}_{n}, \boldsymbol{U} = \boldsymbol{I}_{n}$
for $ i = 1$ to $ n$
     for $ j = i$ to $ n$
         $ L_{ji} = A_{ji} - \sum_{k=1}^{i-1} L_{jk} U_{ki}$
     end
     for $ j=i+1$ to $ n$
         $ U_{ij} = \left( A_{ij} - \sum_{k=1}^{i-1} L_{ik} U_{kj} \right) / L_{ii}$
     end
end
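
A direct Python/NumPy transcription of Algorithm 2 could be sketched as follows (a sketch without the pivoting discussed in Sect. 4.2; the function name is ours):

import numpy as np

def lu_decomposition(A):
    # LU decomposition following Algorithm 2: A = L U with L lower triangular
    # and U unit upper triangular; no pivoting, so A must satisfy Theorem 2
    n = A.shape[0]
    L = np.zeros((n, n))
    U = np.eye(n)
    for i in range(n):
        for j in range(i, n):                      # column i of L
            L[j, i] = A[j, i] - L[j, :i] @ U[:i, i]
        for j in range(i + 1, n):                  # row i of U
            U[i, j] = (A[i, j] - L[i, :i] @ U[:i, j]) / L[i, i]
    return L, U

# A = np.array([[2., 1.], [4., 5.]])
# L, U = lu_decomposition(A)
# np.allclose(L @ U, A)   # -> True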

Finally, note that there are also generalizations of LU to non-square and singular matrices, such as rank revealing LU factorization; see [51] and [47].


4.1.3 QR Decomposition

One of the most important matrix transformations is the QR decomposition. It splits a general matrix $ \boldsymbol{A}$ into an orthonormal matrix $ \boldsymbol{Q}$, that is, a matrix whose columns are orthogonal to each other and have Euclidean norm equal to $ 1$, and an upper triangular matrix $ \boldsymbol{R}$. Thus, a suitably chosen orthogonal matrix $ \boldsymbol{Q}$ will triangularize the given matrix $ \boldsymbol{A}$.

Theorem 3   Let $ \boldsymbol{A}\in {\mathbb{R}}^{m\times n}$ be a matrix with $ m \ge n$. Then there exist an orthonormal matrix $ \boldsymbol{Q}\in {\mathbb{R}}^{m\times m}$ and an upper triangular matrix $ \boldsymbol{R}\in {\mathbb{R}}^{n\times n}$ with nonnegative diagonal elements such that

$\displaystyle \boldsymbol{A} = \boldsymbol{Q} \left(\begin{array}{c}\boldsymbol{R}\\ \boldsymbol{0}\end{array}\right)$    

(the QR decomposition of the matrix $ \boldsymbol{A}$).

If $ \boldsymbol{A}$ is a nonsingular square matrix, a slightly stronger result can be obtained: the uniqueness of the QR decomposition.

Theorem 4   Let matrix $ \boldsymbol{A}\in {\mathbb{R}}^{n\times n}$ be nonsingular. Then there exist a unique orthonormal matrix $ \boldsymbol{Q} \in {\mathbb{R}}^{n\times n}$ and a unique upper triangular matrix $ \boldsymbol{R}\in {\mathbb{R}}^{n\times n}$ with positive diagonal elements such that $ \boldsymbol{A} = \boldsymbol{Q}\boldsymbol{R}$.

The use of the QR decomposition for solving a system of equations $ \boldsymbol{A} \boldsymbol{x} = \boldsymbol{Q} \boldsymbol{R} \boldsymbol{x} = \boldsymbol{b}$ consists in multiplying the whole system by the orthonormal matrix $ \boldsymbol{Q}^\top$, $ \boldsymbol{Q}^\top\boldsymbol{Q}= \boldsymbol{I}$, and then solving the remaining upper triangular system $ \boldsymbol{R} \boldsymbol{x} = \boldsymbol{Q}^\top \boldsymbol{b}$. This method guarantees numerical stability by minimizing errors caused by machine roundoffs (see the end of this section for details).
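
In Python, for example, this could be sketched as follows (assuming NumPy and SciPy are available; numpy.linalg.qr returns the reduced factors $ \boldsymbol{Q}\in{\mathbb{R}}^{m\times n}$ and $ \boldsymbol{R}\in{\mathbb{R}}^{n\times n}$, and the example data are illustrative):

import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[2., 1.], [1., 3.]])      # illustrative system A x = b
b = np.array([3., 5.])

Q, R = np.linalg.qr(A)                  # A = Q R
x = solve_triangular(R, Q.T @ b)        # solve R x = Q^T b by back substitution

# np.allclose(A @ x, b)   # -> True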

The QR decomposition is usually constructed by finding one orthonormal vector (one column of $ \boldsymbol{Q}$) after another. This can be achieved using the so-called elementary orthogonal transformations such as Householder reflections, [39], or Givens rotations, [20], that are described in the following subsections. These transformations are related to the solution of the following standard task.

Problem 1   Given a vector $ \boldsymbol{x}\in {\mathbb{R}}^m$, $ \boldsymbol{x}\neq 0$, find an orthogonal matrix $ \boldsymbol{M}\in {\mathbb{R}}^{m\times m}$ such that $ \boldsymbol{M}^\top \boldsymbol{x} = \Vert\boldsymbol{x}\Vert _2 \cdot \boldsymbol{e}_1$, where $ \boldsymbol{e}_1 = (1,0,\ldots,0)^\top$ denotes the first unit vector.

In the rest of this section, we will first discuss how Householder reflections and Givens rotations can be used for solving Problem 1. Next, using these elementary results, we show how one can construct the QR decomposition. Finally, we briefly mention the Gram-Schmidt orthogonalization method, which also provides a way to find the QR decomposition.


4.1.3.1 Householder Reflections

The QR decomposition using Householder reflections (HR) was developed by [21]. A Householder reflection (or Householder transformation) is a matrix $ \boldsymbol{P}$,

$\displaystyle \boldsymbol{P} = \boldsymbol{I} - \frac{1}{c} \boldsymbol{u}\boldsymbol{u}^{\top}, \quad c = \frac{1}{2}\boldsymbol{u}^{\top} \boldsymbol{u}\;,$ (4.3)

where $ \boldsymbol {u}$ is a Householder vector. By definition, the matrix $ \boldsymbol{P}$ is orthonormal and symmetric. Moreover, for any $ \boldsymbol{x}\in {\mathbb{R}}^m$, it holds that

$\displaystyle \boldsymbol{P}\boldsymbol{x} = \boldsymbol{x} - \frac{1}{c} \left(\boldsymbol{u}^{\top} \boldsymbol{x}\right)\boldsymbol{u}\;.$    

Therefore, to apply an HR one does not need to explicitly compute the matrix $ \boldsymbol{P}$ itself. Additionally, it holds that $ \boldsymbol{P}\boldsymbol{u} = -\boldsymbol{u}$ and $ \boldsymbol{P}\boldsymbol{x}\in$ span$ \{\boldsymbol{x},\boldsymbol{u}\}$. This means that an HR reflects a vector $ \boldsymbol{x}$ with respect to the hyperplane with normal vector $ \boldsymbol{u}$ (hence the name Householder reflection).

Figure 4.2: Reflection with respect to the hyperplane with a normal vector $ \boldsymbol {u}$

To solve Problem 1 using an HR, we search for a vector $ \boldsymbol{u}$ such that $ \boldsymbol{x}$ will be reflected onto the direction of the first unit vector $ \boldsymbol{e}_1$. This holds for the following choice of $ \boldsymbol{u}$:

$\displaystyle \boldsymbol{u} = \boldsymbol{x} + s_1 \Vert\boldsymbol{x}\Vert _2 \cdot \boldsymbol{e}_1\;, \quad s_1 = 2I(x_1 \geq 0) - 1\;,$ (4.4)

where $ x_1 = \boldsymbol{x}^{\top} \boldsymbol{e}_1$ denotes the first element of the vector $ \boldsymbol{x}$ and $ I(\cdot)$ represents the indicator function. For this reflection, $ c$ from (4.3) equals $ \Vert\boldsymbol{x}\Vert _2 (\Vert\boldsymbol{x}\Vert _2 + \vert x_1\vert)$ and $ \boldsymbol{P}\boldsymbol{x} = - s_1 \Vert\boldsymbol{x}\Vert _2 \cdot \boldsymbol{e}_1$, as one can verify by substituting (4.4) into (4.3).
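
A minimal NumPy sketch of this construction, which builds the Householder vector (4.4) and applies the reflection without forming $ \boldsymbol{P}$ explicitly, might read (the function names are ours):

import numpy as np

def householder_vector(x):
    # Householder vector u and constant c from (4.3)-(4.4)
    u = x.astype(float)
    s1 = 1.0 if x[0] >= 0 else -1.0      # s1 = 2 I(x1 >= 0) - 1
    u[0] += s1 * np.linalg.norm(x)       # u = x + s1 ||x||_2 e1
    c = 0.5 * (u @ u)
    return u, c

def apply_householder(u, c, x):
    # P x = x - (u^T x / c) u, without forming P explicitly
    return x - (u @ x) / c * u

# x = np.array([3., 4., 0.])
# u, c = householder_vector(x)
# apply_householder(u, c, x)   # -> [-5., 0., 0.] = -s1 ||x||_2 e1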


4.1.3.2 Givens Rotations

A Givens rotation (GR) in $ m$ dimensions (or Givens transformation) is defined by an orthonormal matrix $ \boldsymbol{R}_{ij}(\alpha) \in {\mathbb{R}}^{m\times m}$,

$\displaystyle \boldsymbol{R}_{ij}(\alpha) = \left(\begin{array}{ccccccc} 1 & & & & & & \\ & \ddots & & & & & \\ & & c & \cdots & s & & \\ & & \vdots & \ddots & \vdots & & \\ & & -s & \cdots & c & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{array}\right) \begin{array}{c} \\ \\ i \\ \\ j \\ \\ \\ \end{array}$ (4.5)

where $ c = \cos\alpha$ and $ s = \sin\alpha$ for $ \alpha \in {\mathbb{R}}$ and $ 1\leq i<j\le m$; that is, $ \boldsymbol{R}_{ij}(\alpha)$ equals the identity matrix except for the elements $ R_{ii} = R_{jj} = c$, $ R_{ij} = s$, and $ R_{ji} = -s$. Thus, the rotation $ \boldsymbol{R}_{ij}(\alpha)$ represents a plane rotation in the space spanned by the unit vectors $ \boldsymbol{e}_i$ and $ \boldsymbol{e}_j$ by an angle $ \alpha $. In two dimensions, the rotation $ \boldsymbol{R}_{12}(\alpha)$,

$\displaystyle \boldsymbol{R}_{12}(\alpha) = \left(\begin{array}{rr} c & s \\ -s & c \end{array}\right)\;, \quad c = \cos\alpha\;, \quad s = \sin\alpha\;,$    

represents a clockwise rotation by an angle $ \alpha $; see Fig. 4.3.

Figure 4.3: Rotation of $ \boldsymbol {x}$ in a plane by an angle $ \alpha $

Now, let us have a look at how GRs can be used for solving Problem 1. A GR of a vector $ \boldsymbol{x} = (x_1,\ldots,x_m)^{\top}\in {\mathbb{R}}^m$ by an angle $ \alpha $ results in $ \boldsymbol{R}_{ij}(\alpha)\boldsymbol{x} = \boldsymbol{y} = (y_1,\ldots,y_m)^{\top}$ such that

$\displaystyle y_k = \begin{cases} x_k & \text{for}\quad k \neq i,j\;, \\ c x_i + s x_j & \text{for}\quad k = i\;, \\ -s x_i + c x_j & \text{for}\quad k = j\;. \end{cases}$    

For a vector $ \boldsymbol{x}$ with nonzero elements $ x_i$ or $ x_j$, setting $ d = (x_i^2 + x_j^2)^{1/2}$, $ c = x_i/d$, $ s = x_j/d$ leads to

$\displaystyle \left(\begin{array}{rr} c & s\\ -s & c \end{array}\right) \left(\begin{array}{c} x_i \\ x_j \end{array}\right) = \left(\begin{array}{c} d \\ 0 \end{array}\right)\;.$    

Thus, using a GR with this specific choice of $ c$ and $ s$ (referred to further as $ \boldsymbol{R}_{ij}^0$) implies that the $ j$th component of the vector $ \boldsymbol{x}$ vanishes. Similarly to HRs, it is not necessary to explicitly construct the whole rotation matrix to transform $ \boldsymbol{x}$, since the rotation is fully described by only two numbers: $ c$ and $ s$. This elementary rotation $ \boldsymbol{R}_{ij}^0$ does not, however, constitute a solution to Problem 1 yet: we need to combine several of them.
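
A minimal NumPy sketch of the elementary rotation $ \boldsymbol{R}_{ij}^0$, storing only the two numbers $ c$ and $ s$, might read (the function name and example data are ours):

import numpy as np

def givens_zero(x, i, j):
    # Givens rotation R_ij^0 that zeroes the j-th component of x;
    # only c and s need to be stored
    d = np.hypot(x[i], x[j])             # d = (x_i^2 + x_j^2)^(1/2)
    c, s = x[i] / d, x[j] / d
    y = x.astype(float)
    y[i] = c * x[i] + s * x[j]           # = d
    y[j] = -s * x[i] + c * x[j]          # = 0
    return c, s, y

# x = np.array([0., 3., 4.])
# c, s, y = givens_zero(x, 1, 2)         # y -> [0., 5., 0.]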

The next step employs the simple fact that premultiplication or postmultiplication of a vector $ \boldsymbol{x}$ or a matrix $ \boldsymbol{A}$ by a GR $ \boldsymbol{R}_{ij}(\alpha)$ affects only the $ i$th and $ j$th rows or columns, respectively. Hence, one can combine several rotations without one rotation spoiling the result of another. (Consequently, GRs are more flexible than HRs.) Two typical ways in which GRs are used for solving Problem 1 of Sect. 4.1.3 follow.

  1. $ \boldsymbol{R}_{1n}^0\boldsymbol{R}_{1,n-1}^0\ldots \boldsymbol{R}_{13}^0\boldsymbol{R}_{12}^0\boldsymbol{x} = d\boldsymbol{e}_1$. Here the $ k$th component of the vector $ \boldsymbol{x}$ vanishes after the Givens rotation $ \boldsymbol{R}_{1k}^0$. The previously zeroed elements $ x_2,\ldots,x_{k-1}$ are not changed because the rotation $ \boldsymbol{R}_{1k}^0$ affects only the first and $ k$th components.
  2. $ \boldsymbol{R}_{12}^0\boldsymbol{R}_{23}^0\ldots \boldsymbol{R}_{n-1,n}^0\boldsymbol{x} = d\boldsymbol{e}_1$. Here the $ k$th component vanishes by the rotation $ \boldsymbol{R}_{k-1,k}^0$.

Finally, there are several algorithms for computing the Givens rotations that improve over the straightforward evaluation of $ \boldsymbol{R}_{ij}^0 \boldsymbol{x}$. A robust algorithm minimizing the loss of precision is given in [7]. An algorithm minimizing memory requirements was proposed by [58]. On the other hand, [16] and [29] proposed modifications aiming to minimize the number of arithmetic operations.


4.1.3.3 QR Decomposition by Householder Reflections or Givens Rotations

An appropriate combination of HRs or GRs, respectively, can be used to compute the QR decomposition of a given matrix $ \boldsymbol{A}\in {\mathbb{R}}^{m\times n}$, $ m \ge n$, in the following way. Let $ \boldsymbol{Q}_{i}$, $ i = 1,\ldots,n$, denote an orthonormal matrix in $ {\mathbb{R}}^{m\times m}$ such that premultiplication of $ \boldsymbol{B} = \boldsymbol{Q}_{i-1} \cdots \boldsymbol{Q}_{1} \boldsymbol{A}$ by $ \boldsymbol{Q}_{i}$ zeroes all elements in the $ i$th column that are below the diagonal and such that the previous columns $ 1,\ldots,i-1$ are not affected at all. Such a matrix can be chosen as a block diagonal matrix with two blocks: the identity matrix $ \boldsymbol{I}_{i-1}$ and a matrix $ \boldsymbol{M}$ solving Problem 1 for the vector composed of the elements in the $ i$th column of $ \boldsymbol{B}$ that lie on and below the diagonal. The first block $ \boldsymbol{I}_{i-1}$ guarantees that the columns $ 1,\ldots,i-1$ of the matrix $ \boldsymbol{B}$ are not affected by the multiplication, whereas the second block $ \boldsymbol{M}$ transforms all elements in the $ i$th column that are below the diagonal to zero. Naturally, the matrix $ \boldsymbol{M}$ can be found by means of HRs or GRs as described in the previous paragraphs.

This way, we construct a series of matrices $ \boldsymbol{Q}_{1},\ldots,\boldsymbol{Q}_{n}$ such that

$\displaystyle \boldsymbol{Q}_{n} \cdots \boldsymbol{Q}_{1} \boldsymbol{A} = \left(\begin{array}{c}\boldsymbol{R}\\ \boldsymbol{0}\end{array}\right)\;.$    

Since all matrices $ \boldsymbol{Q}_{1},\ldots,\boldsymbol{Q}_{n}$ are orthonormal, the product $ \boldsymbol{Q}_{\boldsymbol{t}} = \boldsymbol{Q}_{n} \cdots \boldsymbol{Q}_{1}$ is also orthonormal and its inverse equals its transpose: $ \boldsymbol{Q}_{\boldsymbol{t}}^{-1} = \boldsymbol{Q}_{\boldsymbol{t}}^{\top}$. Hence,

$\displaystyle \boldsymbol{A} = (\boldsymbol{Q}_{n} \cdots \boldsymbol{Q}_{1})^{\top} \left(\begin{array}{c}\boldsymbol{R}\\ \boldsymbol{0}\end{array}\right) = \boldsymbol{Q} \left(\begin{array}{c}\boldsymbol{R}\\ \boldsymbol{0}\end{array}\right)$    

as described in Theorem 3.

We now describe the QR algorithm using HRs or GRs. Let $ \boldsymbol{M}(\boldsymbol{x})$ denote the orthonormal matrix from Problem 1 constructed for a vector $ \boldsymbol{x}$ by one of the discussed methods.

Algorithm 3  
$ \boldsymbol{Q} = \boldsymbol{I}_{m}$
$ \boldsymbol{R} = \boldsymbol{A}$
for $ i = 1$ to $ n$
     $ \boldsymbol{x} = \{R_{ki}\}_{k=i}^m $
     $ \boldsymbol{Q}_{i} = \left(\begin{array}{cc} \boldsymbol{I}_{i-1} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{M}(\boldsymbol{x}) \end{array}\right)$
     $ \boldsymbol{Q} = \boldsymbol{Q}_{i} \boldsymbol{Q}$
     $ \boldsymbol{R} = \boldsymbol{Q}_{i} \boldsymbol{R}$
end
$ \boldsymbol{Q} = \boldsymbol{Q}^{\top}$
$ \boldsymbol{R} = \left\{R_{ij}\right\}_{i=1,j=1}^{n,n}$

There are also modifications of this basic algorithm that employ pivoting for better numerical performance and even reveal the rank of the system; see [38] and [36] for instance. An error analysis of the QR decomposition by HRs and GRs is given in [17] and [36], respectively.
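
For illustration, a compact and unpivoted Python/NumPy sketch of Algorithm 3 based on Householder reflections might look as follows (a sketch only; it assumes full column rank and does not enforce a nonnegative diagonal of $ \boldsymbol{R}$):

import numpy as np

def householder_qr(A):
    # QR decomposition of A (m x n, m >= n, full column rank) by Householder
    # reflections, following Algorithm 3; no pivoting
    m, n = A.shape
    R = A.astype(float)
    Q = np.eye(m)
    for i in range(n):
        u = R[i:, i].copy()                        # x = {R_ki}, k = i, ..., m
        s1 = 1.0 if u[0] >= 0 else -1.0
        u[0] += s1 * np.linalg.norm(R[i:, i])      # Householder vector (4.4)
        c = 0.5 * (u @ u)
        # multiply the trailing blocks by Q_i = diag(I_{i-1}, M(x))
        R[i:, i:] -= np.outer(u, u @ R[i:, i:]) / c
        Q[i:, :] -= np.outer(u, u @ Q[i:, :]) / c  # accumulates Q_n ... Q_1
    return Q.T, R[:n, :n]                          # full Q and the n x n upper triangular R

# A = np.random.rand(5, 3)
# Q, R = householder_qr(A)
# np.allclose(Q[:, :3] @ R, A)   # -> True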


4.1.3.4 Gram-Schmidt Orthogonalization

Given a matrix $ \boldsymbol{A} \in {\mathbb{R}}^{m \times n}$, $ m \ge n$, with full column rank, the Gram-Schmidt orthogonalization constructs a matrix $ \boldsymbol{Q}$ whose columns are orthonormal to each other and span the same space as the columns of $ \boldsymbol{A}$. Thus, $ \boldsymbol{A}$ can be expressed as $ \boldsymbol{Q}$ multiplied by another matrix $ \boldsymbol{R}$, whereby the Gram-Schmidt orthogonalization process (GS) ensures that $ \boldsymbol{R}$ is an upper triangular matrix. Consequently, GS can be used to construct the QR decomposition of a matrix $ \boldsymbol{A}$. A survey of GS variants and their properties is given by [6].

The classical Gram-Schmidt (CGS) process constructs the orthonormal basis stepwise. The first column $ \boldsymbol{Q}_{1}$ of $ \boldsymbol{Q}$ is simply the normalized first column $ \boldsymbol{A}_{1}$. Having constructed an orthonormal basis $ \boldsymbol{Q}_{1:k} = \{ \boldsymbol{Q}_{1},\ldots,\boldsymbol{Q}_{k} \}$, the next column $ \boldsymbol{Q}_{k+1}$ is proportional to $ \boldsymbol{A}_{k+1}$ minus its projection onto the space span$ \{\boldsymbol{Q}_{1:k} \}$. Thus, $ \boldsymbol{Q}_{k+1}$ is by its definition orthogonal to span$ \{\boldsymbol{Q}_{1:k} \}$, and at the same time, the first $ k$ columns of $ \boldsymbol{A}$ and $ \boldsymbol{Q}$ span the same linear space. The elements of the triangular matrix $ \boldsymbol{R}$ from Theorem 3 are then the coordinates of the columns of $ \boldsymbol{A}$ in the basis given by the columns of $ \boldsymbol{Q}$.

Algorithm 4    
for $ i = 1$ to $ n$
     for $ j = 1$ to $ i - 1 $
         $ R_{ji} = \boldsymbol{Q}_{j}^{\top} \boldsymbol{A}_{i}$
     end
     $ \boldsymbol{Q}_{i} = \boldsymbol{A}_{i} - \sum_{j=1}^{i-1} R_{ji} \boldsymbol{Q}_{j} $
     $ R_{ii} = \left(\boldsymbol{Q}_{i}^{\top} \boldsymbol{Q}_{i}\right)^{1/2}$
     $ \boldsymbol{Q}_{i} = \boldsymbol{Q}_{i} / R_{ii} $
end

Similarly to many decomposition algorithms, CGS allows a memory efficient implementation since the computed orthonormal columns of $ \boldsymbol{Q}$ can overwrite the original columns of $ \boldsymbol{A}$. Despite this feature and its mathematical correctness, the CGS algorithm does not always behave well numerically because numerical errors can accumulate very quickly. For example, an error made in computing $ \boldsymbol{Q}_{1}$ affects $ \boldsymbol{Q}_{2}$; errors in both of these terms (although caused initially just by an error in $ \boldsymbol{Q}_{1}$) adversely influence $ \boldsymbol{Q}_{3}$, and so on. Fortunately, there is a modified Gram-Schmidt (MGS) procedure, which prevents such error accumulation by subtracting linear combinations of $ \boldsymbol{Q}_{k}$ directly from $ \boldsymbol{A}$ before constructing the following orthonormal vectors. (Surprisingly, MGS is historically older than CGS.)

Algorithm 5    
for $ i = 1$ to $ n$
     $ \boldsymbol{Q}_{i} = \boldsymbol{A}_{i} $
     $ R_{ii} = \left(\boldsymbol{Q}_{i}^{\top} \boldsymbol{Q}_{i}\right)^{1/2}$
     $ \boldsymbol{Q}_{i} = \boldsymbol{Q}_{i} / R_{ii} $
     for $ j=i+1$ to $ n$
         $ R_{ij} = \boldsymbol{Q}_{i}^{\top} \boldsymbol{A}_{j}$
         $ \boldsymbol{A}_{j} = \boldsymbol{A}_{j} - R_{ij} \boldsymbol{Q}_{i}$
     end
end

Apart from this algorithm (the row version of MGS), there are also a column version of MGS by [6] and MGS modifications employing iterative orthogonalization and pivoting by [9]. The numerical superiority of MGS over CGS was established experimentally already by [53]. This result is also theoretically supported by the GS error analysis in [6], which showed the numerical equivalence of the QR decompositions computed by MGS and by HRs.
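
For illustration, the row version of MGS in Algorithm 5 can be sketched in Python/NumPy as follows (full column rank is assumed and the function name is ours):

import numpy as np

def mgs_qr(A):
    # modified Gram-Schmidt (Algorithm 5): A = Q R with orthonormal columns of Q
    # and upper triangular R; A assumed to have full column rank
    m, n = A.shape
    W = A.astype(float)                  # working copy, updated column by column
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        R[i, i] = np.linalg.norm(W[:, i])
        Q[:, i] = W[:, i] / R[i, i]
        for j in range(i + 1, n):
            R[i, j] = Q[:, i] @ W[:, j]
            W[:, j] -= R[i, j] * Q[:, i]
    return Q, R

# A = np.random.rand(6, 3)
# Q, R = mgs_qr(A)
# np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3))   # -> (True, True)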


4.1.4 Singular Value Decomposition

The singular value decomposition (SVD) plays an important role in numerical linear algebra and in many statistical techniques as well. Using two orthonormal matrices, the SVD can diagonalize any matrix $ \boldsymbol{A}$, and the results of the SVD can tell us a lot about the (numerical) properties of the matrix. (This is closely related to the eigenvalue decomposition: any symmetric square matrix $ \boldsymbol{A}$ can be diagonalized, $ \boldsymbol{A} = \boldsymbol{V} \boldsymbol{D} \boldsymbol{V}^{\top}$, where $ \boldsymbol{D}$ is a diagonal matrix containing the eigenvalues of $ \boldsymbol{A}$ and $ \boldsymbol{V}$ is an orthonormal matrix of the corresponding eigenvectors.)

Theorem 5   Let $ \boldsymbol{A}\in {\mathbb{R}}^{m\times n}$ be a matrix of rank $ r$. Then there exist orthonormal matrices $ \boldsymbol{U}\in {\mathbb{R}}^{m\times m}$ and $ \boldsymbol{V}\in {\mathbb{R}}^{n\times n}$ and a diagonal matrix $ \boldsymbol{D}\in {\mathbb{R}}^{m\times n}$, with the diagonal elements $ \sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r > \sigma_{r+1} = \ldots = \sigma_{\min\{m,n\}} = 0$, such that $ \boldsymbol{A} = \boldsymbol{U} \boldsymbol{D} \boldsymbol{V}^{\top}$.

The numbers $ \sigma_1,\ldots,\sigma_{\min\{m,n\}}$ represent the singular values of $ \boldsymbol{A}$. The columns $ \boldsymbol{U}_i$ and $ \boldsymbol{V}_i$ of the matrices $ \boldsymbol{U}$ and $ \boldsymbol{V}$ are called the left and right singular vectors of $ \boldsymbol{A}$ associated with the singular value $ \sigma_i$, respectively, because $ \boldsymbol{A} \boldsymbol{V}_i = \sigma_i \boldsymbol{U}_i$ and $ \boldsymbol{U}_i^{\top} \boldsymbol{A} = \sigma_i \boldsymbol{V}_i^{\top}$, $ i = 1,\ldots,\min\{m,n\}$.

Similarly to the QR decomposition, the SVD offers a numerically stable way to solve a system of linear equations. Given a system $ \boldsymbol{A} \boldsymbol{x} = \boldsymbol{U} \boldsymbol{D} \boldsymbol{V}^{\top} \boldsymbol{x} = \boldsymbol{b}$, one can transform it to $ \boldsymbol{U}^{\top} \boldsymbol{A} \boldsymbol{x} = \boldsymbol{D} \boldsymbol{V}^{\top} \boldsymbol{x} = \boldsymbol{U}^{\top} \boldsymbol{b}$ and solve it in two trivial steps: first, finding a solution $ \boldsymbol{z}$ of $ \boldsymbol{D} \boldsymbol{z} = \boldsymbol{U}^{\top} \boldsymbol{b}$, and second, setting $ \boldsymbol{x} = \boldsymbol{V} \boldsymbol{z}$, which is equivalent to $ \boldsymbol{V}^{\top} \boldsymbol{x} = \boldsymbol{z}$.

On the other hand, the power of the SVD lies in its relation to many important matrix properties; see [62], for instance. First of all, the singular values of a matrix $ \boldsymbol{A}$ are equal to the (positive) square roots of the eigenvalues of $ \boldsymbol{A}^{\!\top} \boldsymbol{A}$ and $ \boldsymbol{A}\boldsymbol{A}^{\!\top}$, whereby the associated right and left singular vectors are identical with the corresponding eigenvectors, respectively. Thus, one can compute the eigenvalues of $ \boldsymbol{A}^{\!\top} \boldsymbol{A}$ directly from the original matrix $ \boldsymbol{A}$. Second, the number of nonzero singular values equals the rank of a matrix. Consequently, the SVD can be used to find the effective rank of a matrix, to check for near singularity, and to compute the condition number of a matrix. That is, it allows us to assess the conditioning and sensitivity to errors of a given system of equations. Finally, let us note that there are far more uses of the SVD: identification of the null space of $ \boldsymbol{A}$, $ \mathrm{null}(\boldsymbol{A}) =$ span$ \{\boldsymbol{V}_{r+1},\ldots,\boldsymbol{V}_{n}\}$; computation of the matrix pseudo-inverse, $ \boldsymbol{A}^- = \boldsymbol{V} \boldsymbol{D}^- \boldsymbol{U}^{\top}$; low-rank approximations, and so on. See [7] and [62] for details.
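
Some of these uses of the SVD (effective rank, condition number, pseudo-inverse) can be sketched in NumPy as follows (the example matrix and the zero-threshold are illustrative):

import numpy as np

A = np.array([[1., 2.], [2., 4.], [0., 1.]])    # illustrative 3 x 2 matrix
U, sigma, Vt = np.linalg.svd(A)                 # A = U diag(sigma) V^T

rank = int(np.sum(sigma > 1e-12 * sigma[0]))    # effective rank
cond = sigma[0] / sigma[rank - 1]               # condition number
A_pinv = Vt.T[:, :rank] @ np.diag(1.0 / sigma[:rank]) @ U[:, :rank].T  # pseudo-inverse

# np.allclose(A_pinv, np.linalg.pinv(A))   # -> True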

Let us now present an overview of algorithms for computing the SVD, which are not described in detail due to their extent. The first stable algorithm for computing the SVD was suggested by [22]. It involved reduction of a matrix $ \boldsymbol{A}$ to its bidiagonal form by HRs, with singular values and vectors being computed as eigenvalues and eigenvectors of a specific tridiagonal matrix using a method based on Sturm sequences. The final form of the QR algorithm for computing the SVD, which has been the preferred SVD method for dense matrices up to now, is due to [23]; see [2,7] or [15] for a description of the algorithm and some modifications. An alternative approach based on the Jacobi algorithm was given by [30]. The latest contributions to the pool of computational methods for the SVD, including [63,10] and [36], aim to improve the accuracy of singular values and the computational speed using recent advances in the QR decomposition.


4.1.5 Matrix Inversion

In the previous sections, we described how matrix decompositions can be used for solving systems of linear equations. Let us now discuss the use of matrix decompositions for inverting a nonsingular square matrix $ \boldsymbol{A}\in {\mathbb{R}}^{n\times n}$, although matrix inversion is not needed very often. All of the discussed matrix decompositions construct two or more matrices $ \boldsymbol{A}_{1},\ldots,\boldsymbol{A}_{d}$ such that $ \boldsymbol{A} = \boldsymbol{A}_{1}\cdot \ldots \cdot \boldsymbol{A}_{d}$, where the matrices $ \boldsymbol{A}_{l}$, $ l=1,\ldots,d$, are orthonormal, triangular, or diagonal. Because $ \boldsymbol{A}^{-1} = \boldsymbol{A}_{d}^{-1}\cdot \ldots \cdot \boldsymbol{A}_{1}^{-1}$, we just need to be able to invert orthonormal and triangular matrices (a diagonal matrix is a special case of a triangular matrix).

First, an orthonormal matrix $ \boldsymbol{Q}$ satisfies by definition $ \boldsymbol{Q}^{\top} \boldsymbol{Q} = \boldsymbol{Q} \boldsymbol{Q}^{\top} = \boldsymbol{I}_{n}$. Thus, inversion is in this case equivalent to the transposition of the matrix: $ \boldsymbol{Q}^{-1} = \boldsymbol{Q}^{\top}$.

Second, inverting an upper triangular matrix $ \boldsymbol{U}$ can be done by solving directly $ \boldsymbol{U} \boldsymbol{X} = \boldsymbol{I}_{n}$, which leads to the backward substitution method. Let $ \boldsymbol{X} = \{ X_{ij} \}_{i=1,j=1}^{n,n}$ denote the inverse matrix $ \boldsymbol{U}^{-1}$ we are looking for.

Algorithm 6    
$ \boldsymbol{X} = \boldsymbol{0}_{n}$
for $ i = n$ to $ 1$
     $ X_{ii} = 1 / U_{ii}$
     for $ j=i+1$ to $ n$
         $ X_{ij} = - \left( \sum_{k=i+1}^j X_{kj} U_{ik} \right) / U_{ii} $
     end
end

The inversion of a lower triangular matrix $ \boldsymbol{L}$ can be done analogously: the algorithm is applied to $ \boldsymbol{L}^{\top}$, that is, $ U_{ij}$ is replaced by $ L_{ji}$ for $ i,j=1,\ldots,n$, and the resulting matrix is transposed, since $ \boldsymbol{L}^{-1} = \left((\boldsymbol{L}^{\top})^{-1}\right)^{\top}$.
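
A Python/NumPy sketch of Algorithm 6, together with the transposition trick for lower triangular matrices, might read (the function names are ours):

import numpy as np

def invert_upper_triangular(U):
    # inverse of a nonsingular upper triangular matrix U (Algorithm 6)
    n = U.shape[0]
    X = np.zeros((n, n))
    for i in range(n - 1, -1, -1):               # i = n, ..., 1
        X[i, i] = 1.0 / U[i, i]
        for j in range(i + 1, n):
            X[i, j] = -(U[i, i+1:j+1] @ X[i+1:j+1, j]) / U[i, i]
    return X

def invert_lower_triangular(L):
    # apply the algorithm to L^T and transpose the result
    return invert_upper_triangular(L.T).T

# U = np.triu(np.random.rand(4, 4)) + np.eye(4)
# np.allclose(invert_upper_triangular(U) @ U, np.eye(4))   # -> True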

There are several other algorithms available, such as forward substitution or blockwise inversion. Although designed for faster and more (time) efficient computation, their numerical behavior does not differ significantly from that of the presented algorithm. See [8] for an overview and a numerical study.

