4.2 Direct Methods for Solving Linear Systems

A system of linear equations can be written in matrix notation as

$\displaystyle \boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}\;,$ (4.6)

where $ \boldsymbol{A}$ denotes the coefficient matrix, $ \boldsymbol{b}$ is the right-hand side, and $ \boldsymbol{x}$ represents the solution vector we seek. The system (4.6) has a solution if and only if $ \boldsymbol{b}$ belongs to the vector space spanned by the columns of $ \boldsymbol{A}$. From here on, we concentrate on systems of equations with unique solutions.
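
For a concrete illustration, here is a minimal check of (4.6) in Python with NumPy; the matrix and right-hand side are made up for this example, and np.linalg.solve simply stands in for a library direct solver:

```python
import numpy as np

# A small made-up system A x = b with a unique solution.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([5.0, 5.0, 3.0])

# (4.6) is solvable iff b lies in the span of the columns of A,
# i.e. iff augmenting A by b does not increase the rank.
assert np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

x = np.linalg.solve(A, b)       # a library direct solver (LAPACK)
print(np.allclose(A @ x, b))    # True; here x = [1, 1, 1]
```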

There are two basic classes of methods for solving system (4.6). The first class consists of direct methods, which theoretically yield the exact solution in a (predictable) finite number of steps. Unfortunately, this need not hold in computational practice due to rounding errors: an error made in one step propagates through all subsequent steps. Classical direct methods are discussed in this section. Moreover, solving an equation system by means of matrix decompositions, as discussed in Sect. 4.1, can be classified as a direct method as well. The second class consists of iterative methods, which construct a series of solution approximations that (under some assumptions) converges to the solution of the system. Iterative methods are discussed in Sect. 4.3. Finally, note that some methods lie on the borderline between the two classes; for example, gradient methods (Sect. 4.3.5) and iterative refinement (Sect. 4.2.2).

Further, the direct methods discussed in this section are not necessarily optimal for an arbitrary system (4.6). Let us mention the main exceptions. First, even if a unique solution exists, numerical methods can fail to find it: if the number of unknown variables $ n$ is large, rounding errors can accumulate and result in a wrong solution. The same applies, even more strongly, to systems with a nearly singular coefficient matrix. One alternative is to use iterative methods (Sect. 4.3), which are less sensitive to these problems. Another approach is to use the QR or SVD decompositions (Sect. 4.1), which can transform some nearly singular problems into nonsingular ones. Second, very large problems involving hundreds or thousands of equations and unknown variables may be very time-consuming to solve by standard direct methods. On the other hand, their coefficient matrices are often sparse, that is, most of their elements are zeros. Special strategies for storing and solving such problems are discussed in Sect. 4.5.

To conclude these remarks, let us mention a close relation between solving the system (4.6) and computing the inverse matrix $ \boldsymbol{A}^{-1}$: on the one hand, once $ \boldsymbol{A}^{-1}$ is known, the solution of (4.6) is simply $ \boldsymbol{x} = \boldsymbol{A}^{-1}\boldsymbol{b}$; on the other hand, any method that solves (4.6) can also compute $ \boldsymbol{A}^{-1}$ by solving the $ n$ systems $ \boldsymbol{A}\boldsymbol{x}_i = \boldsymbol{e}_i$, $ i = 1,\ldots,n$, where $ \boldsymbol{e}_i$ denotes the $ i$th unit vector, and collecting the solutions $ \boldsymbol{x}_i$ as the columns of $ \boldsymbol{A}^{-1}$.

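A minimal sketch of the second direction, computing $ \boldsymbol{A}^{-1}$ by solving $ n$ systems with the unit vectors as right-hand sides (the matrix is again made up for the example):

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Solving A X = I_n column by column yields X = A^{-1};
# np.linalg.solve accepts all n unit right-hand sides at once.
A_inv = np.linalg.solve(A, np.eye(3))
print(np.allclose(A @ A_inv, np.eye(3)))   # True
```
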
In the rest of this section, we concentrate on the Gauss-Jordan elimination (Sect. 4.2.1) and its modifications and extensions, such as iterative refinement (Sect. 4.2.2). A wealth of information on direct methods can be found in monographs [3,15] and [24].


4.2.1 Gauss-Jordan Elimination

In this subsection, we will simultaneously solve the linear systems

$\displaystyle \boldsymbol{A}\boldsymbol{x}_1 = \boldsymbol{b}_1\;, \quad \boldsymbol{A}\boldsymbol{x}_2 = \boldsymbol{b}_2\;, \quad \ldots\;, \quad \boldsymbol{A}\boldsymbol{x}_k = \boldsymbol{b}_k$

and a matrix equation $ \boldsymbol{A}\boldsymbol{X} = \boldsymbol{B}$, where $ \boldsymbol{X},\boldsymbol{B} \in {\mathbb{R}}^{n \times l}$ (its solution is $ \boldsymbol{X} = \boldsymbol{A}^{-1}\boldsymbol{B}$, yielding the inverse $ \boldsymbol{A}^{-1}$ for the special choice $ \boldsymbol{B} = \boldsymbol{I}_{n}$). They can be written as a single linear matrix equation

$\displaystyle \boldsymbol{A}\,[\,\boldsymbol{x}_1 \vert \boldsymbol{x}_2 \vert \ldots \vert \boldsymbol{x}_k \vert \boldsymbol{X}\,] = [\,\boldsymbol{b}_1 \vert \boldsymbol{b}_2 \vert \ldots \vert \boldsymbol{b}_k \vert \boldsymbol{B}\,]\;,$ (4.7)

where the operator $ \vert$ stands for column augmentation.

The Gauss-Jordan elimination (GJ) is based on elementary operations that do not affect the solution of an equation system. The solution of (4.7) will not change if we perform any of the following operations on the rows of $ \boldsymbol{A}$ and, simultaneously, on the corresponding rows of all right-hand sides $ \boldsymbol{b}_i$ and $ \boldsymbol{B}$:

- interchanging any two rows,
- multiplying a row by a nonzero constant,
- adding a multiple of one row to another row.

Interchanging any two columns of $ \boldsymbol{A}$ is possible too, but it has to be followed by interchanging the corresponding rows of all solutions $ \boldsymbol{x}_i$ and $ \boldsymbol{X}$, $ i=1,\ldots,k$ (the right-hand sides $ \boldsymbol{b}_i$ and $ \boldsymbol{B}$ remain unchanged). Each row or column operation described above is equivalent to pre- or postmultiplication of the system by a certain elementary matrix $ \boldsymbol{R}$ or $ \boldsymbol{C}$, respectively, obtained by applying the same operation to the identity matrix $ \boldsymbol{I}_{n}$.
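
The following snippet illustrates the row-operation case: an elementary matrix $ \boldsymbol{R}$ is built by applying the operation "add twice the first row to the third row" to $ \boldsymbol{I}_{3}$, and premultiplying the system by $ \boldsymbol{R}$ leaves the solution unchanged (the numbers are made up for the example):

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([5.0, 5.0, 3.0])

# Elementary matrix R for "add twice the first row to the third":
# apply the row operation to the identity matrix I_3 ...
R = np.eye(3)
R[2, 0] = 2.0

# ... then premultiplying by R performs the same operation on the
# whole system without changing its solution.
print(np.allclose(np.linalg.solve(R @ A, R @ b),
                  np.linalg.solve(A, b)))   # True
```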

GJ is a technique that applies one or more of these elementary operations to (4.7) so that $ \boldsymbol{A}$ becomes the identity matrix $ \boldsymbol{I}_{n}$. Simultaneously, the right-hand side becomes the set of solutions. Denoting by $ \boldsymbol{R}_{i}$, $ i = 1,\ldots,O$, the matrix corresponding to the $ i$th row operation, the combination of all operations has to constitute the inverse $ \boldsymbol{A}^{-1} = \boldsymbol{R}_{O} \cdot \ldots \cdot \boldsymbol{R}_{3} \boldsymbol{R}_{2} \boldsymbol{R}_{1}$, and hence $ \boldsymbol{x} = \boldsymbol{R}_{O} \cdot \ldots \cdot \boldsymbol{R}_{3} \boldsymbol{R}_{2} \boldsymbol{R}_{1} \boldsymbol{b}$. The exact choice of these elementary operations is described in the following paragraphs.


4.2.1.1 Pivoting in Gauss-Jordan Elimination

Let us now discuss several well-known variants of the Gauss-Jordan elimination. GJ without pivoting does not interchange any rows or columns; only multiplication and addition of rows are permitted. First, the nonzero nondiagonal elements in the first column $ \boldsymbol{A}_1$ are eliminated: the first row of (4.7) is divided by its diagonal element $ A_{11}$, and the $ A_{i1}$-multiple of the modified first row is subtracted from the $ i$th row, $ i = 2,\ldots,n$. Proceeding in the same way for all $ n$ columns of $ \boldsymbol{A}$ transforms $ \boldsymbol{A}$ to the identity matrix $ \boldsymbol{I}_{n}$. It is easy to see that the method fails if the diagonal element in the column to be eliminated, the so-called pivot, is zero at some step. Even when this is not the case, one should be aware that GJ without pivoting is numerically unstable.
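
A sketch of this scheme in Python follows; it is a direct transcription of the steps above, not an implementation for practical use, since it divides by whatever pivot it finds:

```python
import numpy as np

def gauss_jordan_no_pivot(A, B):
    """Textbook sketch of GJ without pivoting: reduce A to I_n by
    row operations applied to [A | B]; B ends up holding A^{-1} B.
    Divides by zero if any pivot A[j, j] vanishes, and is
    numerically unstable in general."""
    A, B = A.astype(float).copy(), B.astype(float).copy()
    n = A.shape[0]
    for j in range(n):
        pivot = A[j, j]            # no row or column interchanges
        A[j] /= pivot
        B[j] /= pivot
        for i in range(n):         # eliminate column j elsewhere
            if i != j:
                factor = A[i, j]
                A[i] -= factor * A[j]
                B[i] -= factor * B[j]
    return B

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([[5.0], [5.0], [3.0]])
print(gauss_jordan_no_pivot(A, b).ravel())   # ~ [1, 1, 1]
```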

On the other hand, the GJ method becomes stable when pivoting is used. This means that one can interchange rows (partial pivoting) or both rows and columns (full pivoting) to bring a suitable matrix element to the position of the current pivot. Since it is desirable to keep the already constructed part of the identity matrix, only rows below and columns to the right of the current pivot are considered. GJ with full pivoting is numerically stable. From the application point of view, GJ with partial pivoting is numerically stable too, although there are artificial examples where it fails. An additional advantage of partial pivoting (compared to full pivoting) is that it does not change the order of the solution components.

There are various strategies for choosing a pivot. A very good choice is the largest available element (in absolute value). This procedure, however, depends on the original scaling of the equations. Implicit pivoting takes scaling into account and chooses a pivot as if the original system had been rescaled so that the largest element of each row equals one.
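
A sketch of GJ with partial pivoting, including the implicit (scaled) pivot choice described above; the function name and the implicit flag are our own, and the code assumes a nonsingular $ \boldsymbol{A}$:

```python
import numpy as np

def gauss_jordan_partial_pivot(A, B, implicit=True):
    """GJ with partial pivoting; assumes a nonsingular A. With
    implicit=True the pivot is chosen as if every row had been
    rescaled so that its largest element (in absolute value) is one."""
    A, B = A.astype(float).copy(), B.astype(float).copy()
    n = A.shape[0]
    scale = np.abs(A).max(axis=1) if implicit else np.ones(n)
    for j in range(n):
        # search only rows j..n-1, keeping the already constructed
        # part of the identity matrix intact
        p = j + int(np.argmax(np.abs(A[j:, j]) / scale[j:]))
        if p != j:                 # row interchange
            A[[j, p]] = A[[p, j]]
            B[[j, p]] = B[[p, j]]
            scale[[j, p]] = scale[[p, j]]
        pivot = A[j, j]
        A[j] /= pivot
        B[j] /= pivot
        for i in range(n):
            if i != j:
                factor = A[i, j]
                A[i] -= factor * A[j]
                B[i] -= factor * B[j]
    return B

# A system on which GJ without pivoting fails (zero first pivot):
A = np.array([[0.0, 1.0], [1.0, 1.0]])
b = np.array([[1.0], [2.0]])
print(gauss_jordan_partial_pivot(A, b).ravel())   # [1, 1]
```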

Finally, let us add several concluding remarks on the efficiency of GJ and its relationship to matrix decompositions. As shown, GJ can efficiently solve problems with multiple right-hand sides known in advance and compute $ \boldsymbol{A}^{-1}$ at the same time. On the other hand, if it is necessary to later solve a new system with the same coefficient matrix $ \boldsymbol{A}$ but a new right-hand side $ \boldsymbol{b}$, one has to either start the whole elimination process again, which is time-consuming, or compute $ \boldsymbol{A}^{-1} \boldsymbol{b}$ using the previously computed inverse matrix $ \boldsymbol{A}^{-1}$, which leads to further error accumulation. In practice, one should prefer matrix decompositions, which do not have this drawback. Specifically, the LU decomposition (Sect. 4.1.2) is equivalent to GJ (with the same kind of pivoting applied in both cases) and allows us to repeatedly solve systems with the same coefficient matrix in an efficient way.
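
The following sketch shows this reuse with SciPy: scipy.linalg.lu_factor computes the LU decomposition with partial pivoting once, and scipy.linalg.lu_solve then handles each new right-hand side cheaply:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lu, piv = lu_factor(A)            # O(n^3) factorization, done once

# A right-hand side that arrives later: no new elimination needed,
# each additional solve costs only O(n^2).
b_new = np.array([1.0, 2.0, 3.0])
x = lu_solve((lu, piv), b_new)
print(np.allclose(A @ x, b_new))  # True
```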


4.2.2 Iterative Refinement

In the introduction to Sect. 4.2, we noted that direct methods are rather sensitive to rounding errors. Iterative refinement offers a way to improve the solution obtained by any direct method, unless the system matrix  $ \boldsymbol{A}$ is too ill-conditioned or even singular.

Let $ \boldsymbol{x}_1$ denote an initially computed (approximate) solution of (4.6). Iterative refinement is a process constructing a series $ \boldsymbol{x}_i$, $ i=1,2,\ldots,$ as described in Algorithm 7. First, given a solution $ \boldsymbol{x}_i$, the residual $ \boldsymbol{r}_i = \boldsymbol{b} - \boldsymbol{A} \boldsymbol{x}_i$ is computed. Then, one obtains the correction $ \boldsymbol{\Delta x}_i$ by solving the original system with the residual $ \boldsymbol{r}_i$ on the right-hand side.

Algorithm 7    
Repeat for $ i = 1,2,\ldots$
     compute $ \boldsymbol{r}_i = \boldsymbol{b} - \boldsymbol{A} \boldsymbol{x}_i$
     solve $ \boldsymbol{A} \boldsymbol{\Delta x}_i = \boldsymbol{r}_i$ for $ \boldsymbol{\Delta x}_i$
     set $ \boldsymbol{x}_{i+1} = \boldsymbol{x}_i + \boldsymbol{\Delta x}_i$
until the desired precision is achieved.
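
A minimal sketch of Algorithm 7 in Python, assuming the system is solved via SciPy's LU routines so the factorization can be reused for every correction; the residual is evaluated in extended precision (np.longdouble), in line with the remark below, and the tolerance and iteration limit are arbitrary example choices:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def iterative_refinement(A, b, tol=1e-12, max_iter=10):
    """Algorithm 7 on top of an LU factorization. The residual
    b - A x is accumulated in extended precision (np.longdouble),
    since heavy cancellation occurs once x is a good approximation."""
    lu, piv = lu_factor(A)                 # factorize A once
    x = lu_solve((lu, piv), b)             # initial solution x_1
    A_ext, b_ext = A.astype(np.longdouble), b.astype(np.longdouble)
    for _ in range(max_iter):
        r = np.asarray(b_ext - A_ext @ x, dtype=float)   # r_i = b - A x_i
        dx = lu_solve((lu, piv), r)        # solve A dx_i = r_i
        x = x + dx                         # x_{i+1} = x_i + dx_i
        if np.linalg.norm(dx) <= tol * np.linalg.norm(x):
            break                          # desired precision achieved
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([5.0, 5.0, 3.0])
print(iterative_refinement(A, b))          # ~ [1. 1. 1.]
```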

It is reasonable to carry out the computation of the residuals $ \boldsymbol{r}_i$ in a higher precision because a lot of cancellation occurs if $ \boldsymbol{x}_i$ is a good approximation. Nevertheless, provided that the coefficient matrix $ \boldsymbol{A}$ is not too ill-conditioned, [57] proved that GJ with partial pivoting and only one step of iterative refinement computed in fixed precision is stable (it has a relative backward error proportional to the working precision). In spite of this result, it is advisable to apply iterative refinement repeatedly until the desired precision is reached.

Additionally, an important feature of iterative refinement is its low computational cost. Provided that the system is solved by means of decompositions (e.g., GJ is implemented as the LU decomposition), a factorization of $ \boldsymbol{A}$ is already available after computing the initial solution $ \boldsymbol{x}_1$. Subsequently, any system with the same coefficient matrix $ \boldsymbol{A}$, such as $ \boldsymbol{A} \boldsymbol{\Delta x}_i = \boldsymbol{r}_i$, can be solved quickly, so the computational cost of iterative refinement is small.

