Direct methods for solving linear systems theoretically give the exact solution in a finite number of steps, see Sect. 4.2. Unfortunately, this is rarely true in applications because of rounding errors: an error made in one step propagates through all following steps! Contrary to direct methods, iterative methods construct a series of approximate solutions that converges to the exact solution of a system. Their main advantage is that they are self-correcting, see Sect. 4.3.1.
In this section, we first discuss general principles of iterative methods for solving the linear system (4.6), Ax = b, whereby we assume that A is nonsingular and the system has exactly one solution (see Sect. 4.2 for more details on other cases). Later, we describe the most common iterative methods: the Jacobi, Gauss-Seidel, successive overrelaxation, and gradient methods (Sects. 4.3.2-4.3.5). Monographs containing a detailed discussion of these methods include [7], [24] and [27]. Although we treat these methods separately from the direct methods, let us mention here that iterative methods can usually benefit from a combination with Gauss elimination, see [46] and [1], for instance.
To unify the presentation of all methods, let D, L, and U denote the diagonal, lower triangular and upper triangular parts of a matrix A throughout this section: A = L + D + U.
An iterative method for solving a linear system constructs an iteration series x^(k), k = 0, 1, 2, ..., that under some conditions converges to the exact solution x = A^(-1)b of the system. Thus, it is necessary to choose a starting point x^(0) and iteratively apply a rule that computes x^(k+1) from an already known x^(k).
A starting vector x^(0) is usually chosen as some approximation of x. (Luckily, its choice cannot cause divergence of a convergent method.) Next, given x^(k), the subsequent element of the series is computed using a rule of the form

x^(k+1) = B_k x^(k) + C_k b,  k = 0, 1, 2, ....  (4.8)
Let us now discuss a minimal set of conditions on B_k and C_k in (4.8) that guarantee the convergence of an iterative method. First of all, it has to hold that x = B_k x + C_k b for all k (the exact solution must be a fixed point of the iteration rule), or equivalently,

C_k A = I - B_k

(substituting b = Ax).
In practice, stationary iterative methods are used, that is, methods with constant B_k = B and C_k = C, k = 0, 1, 2, .... Consequently, an iteration series is then constructed using

x^(k+1) = B x^(k) + C b,  k = 0, 1, 2, ....
Note that the convergence condition of Theorem 7 -- the iteration series converges for an arbitrary starting vector if and only if the spectral radius ρ(B) < 1 -- holds, for example, if ||B|| < 1 in any matrix norm. Moreover, Theorem 7 guarantees the self-correcting property of iterative methods since convergence takes place independently of the starting vector x^(0). Thus, if computational errors adversely affect x^(k) during the kth iteration, x^(k) can be considered as a new starting vector and the iterative method will still converge. Consequently, iterative methods are in general more robust than direct ones.
Apparently, such an iterative process could continue arbitrarily long unless x^(k) = x at some point. This is impractical and usually unnecessary. Therefore, one uses stopping (or convergence) criteria that stop the iterative process when a pre-specified condition is met. Commonly used stopping criteria are based on the change of the solution or residual vector achieved during one iteration. Specifically, given a small ε > 0, the iterative process is stopped after the kth iteration when ||x^(k) - x^(k-1)|| ≤ ε, ||x^(k) - x^(k-1)|| / ||x^(k)|| ≤ ε, or ||r^(k)|| ≤ ε, where r^(k) = b - Ax^(k) is the residual vector. Additionally, a maximum acceptable number of iterations is usually specified.
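To make the general scheme concrete, here is a minimal NumPy sketch of a stationary iteration x^(k+1) = Bx^(k) + Cb with a relative-change stopping criterion and an iteration cap; the function name `stationary_solve` and all parameter names are ours, chosen for illustration only.

```python
import numpy as np

def stationary_solve(B, C, b, x0, eps=1e-10, max_iter=1000):
    """Stationary iteration x_{k+1} = B x_k + C b, stopped when the
    relative change during one iteration drops below eps or when
    max_iter iterations have been performed."""
    x = x0.astype(float)
    for _ in range(max_iter):
        x_new = B @ x + C @ b
        if np.linalg.norm(x_new - x) <= eps * max(np.linalg.norm(x_new), 1.0):
            return x_new
        x = x_new
    return x

# Example: the Jacobi splitting B = -D^{-1}(L+U), C = D^{-1} of a small
# diagonally dominant system (so the iteration is guaranteed to converge).
A = np.array([[4., 1.], [1., 3.]])
b = np.array([1., 2.])
D = np.diag(np.diag(A))
B = -np.linalg.inv(D) @ (A - D)
C = np.linalg.inv(D)
x = stationary_solve(B, C, b, np.zeros(2))
```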
The Jacobi method is motivated by the following observation. Let A have nonzero diagonal elements (the rows of any nonsingular matrix can be reordered to achieve this). Then the diagonal part D of A is nonsingular and the system (4.6) can be rewritten as Dx = b - (L + U)x. Consequently, x = D^(-1)[b - (L + U)x].
The intuition of the Jacobi method is very simple: given an approximation x^(k) of the solution, let us express the ith component x_i of x as a function of the other components from the ith equation and compute x^(k+1) given x^(k):

x_i^(k+1) = (b_i - Σ_{j≠i} a_ij x_j^(k)) / a_ii,  i = 1, ..., n.
The Jacobi method converges for any starting vector x^(0) as long as ρ(D^(-1)(L + U)) < 1, see Theorem 7. This condition is satisfied for a relatively big class of matrices including diagonally dominant matrices (matrices A such that |a_ii| > Σ_{j≠i} |a_ij| for i = 1, ..., n), and symmetric matrices such that D, A = L + D + U, and -L + D - U are all positive definite. Although there are many improvements to the basic principle of the Jacobi method in terms of convergence to x, see Sects. 4.3.3 and 4.3.4, its advantage is an easy and fast implementation (elements of a new iteration x^(k+1) can be computed independently of each other).
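The componentwise update above can be sketched in NumPy as follows (the function name `jacobi` and the test matrix are ours, for illustration); note that the whole of x^(k+1) is computed from x^(k) alone, which is precisely the property that allows a parallel implementation.

```python
import numpy as np

def jacobi(A, b, x0=None, eps=1e-10, max_iter=10_000):
    """Jacobi iteration: x_new_i = (b_i - sum_{j != i} a_ij x_j) / a_ii.
    Every component uses only the previous iterate."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    d = np.diag(A)            # diagonal elements a_ii (assumed nonzero)
    R = A - np.diagflat(d)    # off-diagonal part L + U
    for _ in range(max_iter):
        x_new = (b - R @ x) / d
        if np.linalg.norm(x_new - x) <= eps:
            return x_new
        x = x_new
    return x

# diagonally dominant example, so convergence is guaranteed
A = np.array([[10., -1., 2.],
              [-1., 11., -1.],
              [2., -1., 10.]])
b = np.array([6., 25., -11.])
x = jacobi(A, b)
```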
Analogously to the Jacobi method, we can rewrite system (4.6) as (D + L)x = b - Ux, which further implies x = (D + L)^(-1)(b - Ux). This leads to the iteration formula of the Gauss-Seidel method:

(D + L)x^(k+1) = b - Ux^(k),  k = 0, 1, 2, ....  (4.12)
By Theorem 7, the Gauss-Seidel method converges for any starting vector x^(0) if ρ((D + L)^(-1)U) < 1. This condition holds, for example, for diagonally dominant matrices as well as for positive definite ones.
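In code, solving (D + L)x^(k+1) = b - Ux^(k) amounts to a forward substitution in which each new component is used as soon as it is available. A minimal sketch (names are ours, chosen for illustration):

```python
import numpy as np

def gauss_seidel(A, b, x0=None, eps=1e-10, max_iter=10_000):
    """Gauss-Seidel sweep: components j < i use the NEW iterate,
    components j > i still use the old one."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old) <= eps:
            break
    return x

A = np.array([[4., 1.], [1., 3.]])   # symmetric positive definite
b = np.array([1., 2.])
x = gauss_seidel(A, b)
```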
The successive overrelaxation (SOR) method aims to further accelerate the Gauss-Seidel method. The Gauss-Seidel formula (4.12) can be rewritten as

x^(k+1) = x^(k) + D^(-1)[b - Lx^(k+1) - (D + U)x^(k)],

where the bracketed term is a correction of x^(k). The SOR method scales this correction by a relaxation parameter ω:

x^(k+1) = x^(k) + ωD^(-1)[b - Lx^(k+1) - (D + U)x^(k)],  (4.13)

which reduces to the Gauss-Seidel method for ω = 1.
A good choice of the parameter ω can speed up convergence, as measured by the spectral radius of the corresponding iteration matrix (see Theorem 7; a lower spectral radius means faster convergence). There is a wide choice of literature devoted to the optimal setting of the relaxation parameter: see [28] for a recent overview of the main results concerning SOR. We present just one important result, which is due to [66].
Using SOR with the optimal relaxation parameter ω_opt significantly increases the rate of convergence. Note however that the convergence acceleration is obtained only for ω very close to ω_opt. If ω_opt cannot be computed exactly, it is better to take ω slightly larger rather than smaller. [24] describe an approximation algorithm for ω_opt.
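The SOR sweep differs from Gauss-Seidel only in scaling the correction by ω, as the following sketch shows (names and the test matrix are ours; the value ω = 1.2 below is ad hoc for illustration, not an optimal choice):

```python
import numpy as np

def sor(A, b, omega, x0=None, eps=1e-10, max_iter=10_000):
    """SOR: a Gauss-Seidel sweep whose correction is scaled by the
    relaxation parameter omega (omega = 1 recovers Gauss-Seidel)."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (1 - omega) * x_old[i] + omega * (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old) <= eps:
            break
    return x

A = np.array([[4., -1., 0.],
              [-1., 4., -1.],
              [0., -1., 4.]])        # tridiagonal, positive definite
b = np.array([2., 4., 10.])
x = sor(A, b, omega=1.2)
```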
On the other hand, if the assumptions of Theorem 8 are not satisfied, one can employ the symmetric SOR (SSOR), which performs the SOR iteration twice: once as usual, see (4.13), and once with the roles of L and U interchanged. SSOR requires more computation per iteration and usually converges more slowly, but it works for any positive definite matrix and can be combined with various acceleration techniques. See [7] and [28] for details.
Gradient iterative methods are based on the assumption that A is a symmetric positive definite matrix. They use this assumption to reformulate (4.6) as a minimization problem: x = A^(-1)b is the only minimum of the quadratic form

Q(x) = (1/2) x^T A x - x^T b.
Given this minimization problem, gradient methods construct an iteration series of vectors x^(k) converging to x using the following principle. Having the kth approximation x^(k), choose a direction v^(k) and find a number α_k such that the new vector

x^(k+1) = x^(k) + α_k v^(k)

is a minimum of Q(x) on the line x = x^(k) + αv^(k), α ∈ R.
Interestingly, the Gauss-Seidel method can be seen as a gradient method for the directions v^(k) chosen cyclically as the unit vectors e_1, ..., e_n.
The steepest descent method is based on the direction v^(k) given by the negative gradient of Q at x^(k). Denoting the residuum of the kth approximation r^(k) = b - Ax^(k), the iteration formula is

x^(k+1) = x^(k) + α_k r^(k),  α_k = (r^(k)T r^(k)) / (r^(k)T A r^(k)).
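A minimal sketch of steepest descent, assuming A is symmetric positive definite (the function name and test system are ours, for illustration):

```python
import numpy as np

def steepest_descent(A, b, x0=None, eps=1e-10, max_iter=10_000):
    """Move along the residual r = b - A x, which is the negative
    gradient of Q, with the exact line-search step length."""
    x = np.zeros(len(b)) if x0 is None else x0.astype(float)
    for _ in range(max_iter):
        r = b - A @ x                      # residual = -grad Q(x)
        if np.linalg.norm(r) <= eps:
            break
        alpha = (r @ r) / (r @ (A @ r))    # exact minimizer along r
        x = x + alpha * r
    return x

A = np.array([[3., 1.], [1., 2.]])         # symmetric positive definite
b = np.array([5., 5.])
x = steepest_descent(A, b)
```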
In the conjugate gradient (CG) method proposed by [33], the directions v^(k) are generated by the A-orthogonalization of residuum vectors. Given a symmetric positive definite matrix A, A-orthogonalization is a procedure that constructs a series of linearly independent vectors v^(k) such that v^(i)T A v^(j) = 0 for i ≠ j (conjugacy or A-orthogonality condition). It can be used to solve the system (4.6) as follows (r^(k) represents residuals).
x^(0) given, r^(0) = b - Ax^(0), v^(0) = r^(0)
do
  α_k = (r^(k)T r^(k)) / (v^(k)T A v^(k))
  x^(k+1) = x^(k) + α_k v^(k)
  r^(k+1) = r^(k) - α_k A v^(k)
  β_k = (r^(k+1)T r^(k+1)) / (r^(k)T r^(k))
  v^(k+1) = r^(k+1) + β_k v^(k)
until a stop criterion holds
An interesting theoretical property of CG is that it reaches the exact solution in at most n steps because there are no more than n (A-)orthogonal linearly independent vectors. Thus, CG is not a truly iterative method. (This does not have to be the case if A is a singular or non-square matrix, see [42].) On the other hand, it is usually used as an iterative method, because it can give a solution within the given accuracy much earlier than after n iterations. Moreover, if the approximate solution after n iterations is not accurate enough (due to computational errors), the algorithm can be restarted with x^(0) set to x^(n). Finally, let us note that CG is attractive for use with large sparse matrices because it addresses A only through its multiplication by a vector. This operation can be done very efficiently for a properly stored sparse matrix, see Sect. 4.5.
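The algorithm above can be sketched in NumPy as follows (a hedged illustration with names of our choosing, not a production implementation); note that A enters only through the product A @ v, which is the operation that sparse storage makes cheap.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, eps=1e-10, max_iter=None):
    """CG for symmetric positive definite A; at most n iterations in
    exact arithmetic, and only matrix-vector products with A are used."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x                  # initial residual
    v = r.copy()                   # initial search direction
    rs = r @ r
    for _ in range(max_iter or n):
        Av = A @ v
        alpha = rs / (v @ Av)      # step length along v
        x = x + alpha * v
        r = r - alpha * Av
        rs_new = r @ r
        if np.sqrt(rs_new) <= eps:
            break
        v = r + (rs_new / rs) * v  # A-orthogonal to previous directions
        rs = rs_new
    return x

A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])       # symmetric positive definite
b = np.array([1., 2., 3.])
x = conjugate_gradient(A, b)
```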
The principle of CG has many extensions that are applicable also to nonsymmetric nonsingular matrices: for example, the generalized minimal residual method [55], (stabilized) biconjugate gradients [64], or the quasi-minimal residual method [13].