Iterative methods solve a linear system by producing a sequence of approximate solutions. Instead of factoring the matrix once and solving the system directly, an iterative method starts from an initial guess and repeatedly improves it.
The target problem is the linear system

$$ Ax = b, $$

where $A$ is an $n \times n$ matrix and $b$ is a vector. A direct method, such as Gaussian elimination, tries to compute the solution in a finite sequence of algebraic steps. An iterative method constructs vectors

$$ x^{(0)}, x^{(1)}, x^{(2)}, \dots $$

with the goal that $x^{(k)} \to x$ as $k$ increases.
Iterative methods are especially important for large sparse systems, where direct factorization may require too much memory or computation. Large sparse systems arise in discretized differential equations, graph problems, optimization, inverse problems, simulation, and machine learning. Standard classifications distinguish stationary methods such as Jacobi and Gauss-Seidel from nonstationary Krylov methods such as conjugate gradient and GMRES.
88.1 Direct Methods and Iterative Methods
A direct method attempts to solve

$$ Ax = b $$

by transforming the system into an equivalent system that can be solved exactly in exact arithmetic.
Examples include:
| Method | Main idea |
|---|---|
| Gaussian elimination | eliminate variables |
| LU factorization | factor into triangular matrices |
| Cholesky factorization | factor symmetric positive definite matrices |
| QR factorization | use orthogonal transformations |
An iterative method instead computes a sequence of approximations. It may stop before the exact solution is reached, once the residual or error is small enough.
| Method type | Strength | Weakness |
|---|---|---|
| Direct | predictable, robust for moderate dense systems | expensive for large sparse systems |
| Iterative | memory efficient, often fast for sparse systems | convergence depends on matrix structure |
For large sparse matrices, iterative methods are often the only practical choice.
88.2 The Residual
Given an approximate solution $x^{(k)}$, the residual is

$$ r^{(k)} = b - A x^{(k)}. $$

The residual measures how much the current approximation fails to satisfy the equation.
If $r^{(k)} = 0$, then $A x^{(k)} = b$, so $x^{(k)}$ is an exact solution.
Most iterative methods use the residual to decide how to correct the current approximation.
88.3 The Error
Let $x^*$ be the exact solution, and let

$$ e^{(k)} = x^* - x^{(k)} $$

be the error at iteration $k$.
Because $A x^* = b$, we have

$$ r^{(k)} = b - A x^{(k)} = A x^* - A x^{(k)} = A e^{(k)}. $$

Thus

$$ r^{(k)} = A e^{(k)}. $$

The residual is the image of the error under $A$. A small residual implies a small error only when the matrix is well-conditioned.
88.4 General Fixed-Point Iteration
Many iterative methods can be written as fixed-point iterations.
Starting from

$$ Ax = b, $$

we rewrite the equation in the form

$$ x = Gx + c. $$

Then define

$$ x^{(k+1)} = G x^{(k)} + c. $$

The matrix $G$ is called the iteration matrix.
If this iteration converges, its limit $x$ satisfies

$$ x = Gx + c, $$

which is equivalent to the original system.
88.5 Convergence of Fixed-Point Iteration
Let $x^*$ be the exact solution. Subtracting

$$ x^* = G x^* + c $$

from

$$ x^{(k+1)} = G x^{(k)} + c $$

gives

$$ x^{(k+1)} - x^* = G \left( x^{(k)} - x^* \right). $$

Therefore,

$$ e^{(k+1)} = G\, e^{(k)} = G^{k+1} e^{(0)}. $$

The iteration converges for every starting vector if and only if

$$ \rho(G) < 1, $$

where $\rho(G)$ is the spectral radius of $G$, the largest absolute value of its eigenvalues.
This criterion is central for stationary iterative methods.
88.6 Matrix Splitting
A common construction uses a splitting of the matrix

$$ A = M - N, $$

where $M$ is easy to invert.
Then

$$ Ax = b $$

becomes

$$ (M - N)x = b, $$

so

$$ Mx = Nx + b. $$

This gives the iteration

$$ M x^{(k+1)} = N x^{(k)} + b. $$

Equivalently,

$$ x^{(k+1)} = M^{-1} N x^{(k)} + M^{-1} b. $$

Thus the iteration matrix is

$$ G = M^{-1} N. $$
The quality of the method depends on choosing $M$ so that solving systems with $M$ is cheap and $G = M^{-1} N$ has favorable convergence behavior.
88.7 Diagonal, Lower, and Upper Parts
Write

$$ A = D + L + U, $$

where:

| Symbol | Meaning |
|---|---|
| $D$ | diagonal part of $A$ |
| $L$ | strictly lower triangular part |
| $U$ | strictly upper triangular part |
Different choices of $M$ produce different classical methods.
| Method | Choice of $M$ |
|---|---|
| Jacobi | $M = D$ |
| Gauss-Seidel | $M = D + L$ |
| SOR | $M = \tfrac{1}{\omega} D + L$ |
These methods are called stationary because the same iteration matrix is used at every step.
88.8 Jacobi Method
The Jacobi method solves each equation for its diagonal variable using values from the previous iteration.
For the system

$$ Ax = b $$

with entries $a_{ij}$, the update is

$$ x_i^{(k+1)} = \frac{1}{a_{ii}} \Big( b_i - \sum_{j \ne i} a_{ij} x_j^{(k)} \Big), \qquad i = 1, \dots, n. $$

The Jacobi method corresponds to the splitting

$$ M = D, \qquad N = -(L + U). $$

Its iteration matrix is

$$ G_{\mathrm{J}} = -D^{-1} (L + U). $$
Jacobi is simple and parallelizable because each component of $x^{(k+1)}$ uses only values from the previous iteration.
Its convergence may be slow.
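A minimal dense-matrix sketch of the Jacobi update (the function name, tolerance, and test system here are illustrative, not from the text):

```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, max_iter=500):
    """Jacobi iteration: every component is updated from the
    previous iterate only, so all updates can run in parallel."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    D = np.diag(A)               # diagonal entries a_ii
    R = A - np.diagflat(D)       # off-diagonal part L + U
    for k in range(max_iter):
        x_new = (b - R @ x) / D  # componentwise Jacobi update
        if np.linalg.norm(b - A @ x_new) <= tol * np.linalg.norm(b):
            return x_new, k + 1
        x = x_new
    return x, max_iter

# Strictly diagonally dominant test system, so Jacobi converges.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x, iters = jacobi(A, b)
```

The whole vector `x_new` is computed from the old `x`, which is exactly why the method parallelizes so easily.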
88.9 Gauss-Seidel Method
The Gauss-Seidel method updates components sequentially and immediately uses the newest available values.
The update is

$$ x_i^{(k+1)} = \frac{1}{a_{ii}} \Big( b_i - \sum_{j < i} a_{ij} x_j^{(k+1)} - \sum_{j > i} a_{ij} x_j^{(k)} \Big). $$

The method corresponds to the splitting

$$ M = D + L, \qquad N = -U. $$

Its iteration matrix is

$$ G_{\mathrm{GS}} = -(D + L)^{-1} U. $$
Gauss-Seidel often converges faster than Jacobi because it uses updated values as soon as they are available, though it remains relatively slow compared with modern Krylov methods.
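A sketch of the Gauss-Seidel sweep (names and the test system are illustrative). Because the components of `x` are overwritten in place, each row update automatically sees the newest values:

```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=500):
    """Gauss-Seidel: sweep through the components in order,
    using updated values as soon as they are computed."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for k in range(max_iter):
        for i in range(n):
            # x[:i] already holds iteration k+1 values; x[i+1:] holds k.
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return x, k + 1
    return x, max_iter

# Strictly diagonally dominant test system, so the sweep converges.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x, iters = gauss_seidel(A, b)
```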
88.10 Successive Over-Relaxation
Successive over-relaxation, or SOR, modifies Gauss-Seidel by introducing a relaxation parameter $\omega$.
The update has the form

$$ x_i^{(k+1)} = (1 - \omega)\, x_i^{(k)} + \frac{\omega}{a_{ii}} \Big( b_i - \sum_{j < i} a_{ij} x_j^{(k+1)} - \sum_{j > i} a_{ij} x_j^{(k)} \Big). $$

The parameter $\omega$ controls the step size.
| Range | Interpretation |
|---|---|
| $0 < \omega < 1$ | under-relaxation |
| $\omega = 1$ | Gauss-Seidel |
| $1 < \omega < 2$ | over-relaxation |
When $\omega$ is chosen well, SOR may converge much faster than Gauss-Seidel. A poor choice can slow convergence or cause divergence.
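The SOR sweep can be sketched as follows (the value `omega=1.1` and the test system are illustrative choices, not recommendations):

```python
import numpy as np

def sor(A, b, omega, x0=None, tol=1e-10, max_iter=1000):
    """SOR sweep: blend the Gauss-Seidel value with the old value
    through the relaxation parameter omega (omega = 1 reproduces
    Gauss-Seidel exactly)."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for k in range(max_iter):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            gs = (b[i] - s) / A[i, i]            # Gauss-Seidel candidate
            x[i] = (1 - omega) * x[i] + omega * gs
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return x, k + 1
    return x, max_iter

# Symmetric positive definite test system; SOR converges for 0 < omega < 2.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x, iters = sor(A, b, omega=1.1)
```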
88.11 Convergence Conditions
Convergence depends on the matrix and the method.
Some useful sufficient conditions are:
| Matrix property | Consequence |
|---|---|
| Strict diagonal dominance | Jacobi and Gauss-Seidel converge |
| Symmetric positive definiteness | Gauss-Seidel converges |
| Spectral radius less than one | fixed-point iteration converges |
| Poor conditioning | convergence may be slow |
Strict diagonal dominance means

$$ |a_{ii}| > \sum_{j \ne i} |a_{ij}| $$

for each row.
This condition says that each diagonal entry dominates the off-diagonal entries in its row.
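The row condition above is easy to test directly (the helper name is illustrative):

```python
import numpy as np

def is_strictly_diagonally_dominant(A):
    """Check |a_ii| > sum over j != i of |a_ij| for every row."""
    absA = np.abs(A)
    diag = np.diag(absA)
    off_diag_sums = absA.sum(axis=1) - diag
    return bool(np.all(diag > off_diag_sums))

# Dominant: each diagonal entry exceeds the rest of its row.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
# Not dominant: |1| < |2| in the first row.
B = np.array([[1.0, 2.0],
              [3.0, 1.0]])
```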
88.12 Stopping Criteria
An iterative method must decide when to stop.
Common stopping criteria include:
| Criterion | Form |
|---|---|
| Absolute residual | $\lVert r^{(k)} \rVert \le \varepsilon$ |
| Relative residual | $\lVert r^{(k)} \rVert / \lVert b \rVert \le \varepsilon$ |
| Step size | $\lVert x^{(k+1)} - x^{(k)} \rVert \le \varepsilon$ |
| Maximum iterations | $k \ge k_{\max}$ |
The relative residual is commonly used because it accounts for the scale of $b$.
However, residual-based stopping must be interpreted carefully. If is ill-conditioned, a small residual may still correspond to a large solution error.
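As a small illustration (the iterate `x_k`, previous iterate `x_prev`, and tolerance are made up for the example), the stopping quantities from the table can be computed directly:

```python
import numpy as np

# Hypothetical iterate for A x = b, used only to evaluate
# the stopping quantities.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_k = np.array([0.1, 0.6])       # current approximation
x_prev = np.array([0.0, 0.5])    # previous approximation

r = b - A @ x_k
abs_res = np.linalg.norm(r)                # absolute residual
rel_res = abs_res / np.linalg.norm(b)      # relative residual
step = np.linalg.norm(x_k - x_prev)        # step size

tol = 1e-2
converged = rel_res <= tol
```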
88.13 Stationary and Nonstationary Methods
Stationary methods use the same update rule at every iteration.
Examples:
| Stationary method | Feature |
|---|---|
| Jacobi | parallel, simple |
| Gauss-Seidel | sequential, often faster |
| SOR | relaxation parameter |
| SSOR | symmetric form useful as preconditioner |
Nonstationary methods adapt the iteration using information from previous steps.
Examples:
| Nonstationary method | Typical use |
|---|---|
| Conjugate gradient | symmetric positive definite systems |
| MINRES | symmetric indefinite systems |
| GMRES | nonsymmetric systems |
| BiCGSTAB | nonsymmetric systems |
| Chebyshev iteration | spectrum approximately known |
Modern large-scale solvers usually rely on nonstationary methods, especially Krylov subspace methods.
88.14 Krylov Subspaces
Krylov methods search for approximations in spaces generated by repeated multiplication by $A$.
Given an initial residual

$$ r^{(0)} = b - A x^{(0)}, $$

the $k$-th Krylov subspace is

$$ \mathcal{K}_k(A, r^{(0)}) = \operatorname{span}\{\, r^{(0)}, A r^{(0)}, A^2 r^{(0)}, \dots, A^{k-1} r^{(0)} \,\}. $$

A Krylov method chooses $x^{(k)}$ from

$$ x^{(0)} + \mathcal{K}_k(A, r^{(0)}). $$
This approach is powerful because it uses only matrix-vector products.
For sparse matrices, matrix-vector products are cheap compared with factorization.
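The generating vectors of a Krylov subspace can be built with nothing but repeated matrix-vector products (the matrix and starting vector here are illustrative):

```python
import numpy as np

# Build the vectors r0, A r0, A^2 r0, ...; their span is K_k(A, r0).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
r0 = np.array([1.0, 0.0, 0.0])

k = 3
V = [r0]
for _ in range(k - 1):
    V.append(A @ V[-1])          # one matrix-vector product per new vector
K = np.column_stack(V)           # columns span K_3(A, r0)

dim = np.linalg.matrix_rank(K)   # dimension of the Krylov subspace
```

In practice Krylov methods orthonormalize these vectors (as in Arnoldi or Lanczos) rather than storing the raw powers, which quickly become numerically dependent.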
88.15 Conjugate Gradient Method
The conjugate gradient method solves

$$ Ax = b $$

when $A$ is symmetric positive definite.
It constructs approximations in Krylov subspaces and minimizes the error in the energy norm.
The method uses three main sequences:
| Sequence | Meaning |
|---|---|
| $x^{(k)}$ | approximate solution |
| $r^{(k)}$ | residual |
| $p^{(k)}$ | search direction |
The search directions are $A$-conjugate:

$$ (p^{(i)})^{T} A\, p^{(j)} = 0 $$

for $i \ne j$.
In exact arithmetic, conjugate gradient reaches the exact solution in at most $n$ iterations for an $n \times n$ system. In floating point arithmetic, it is used as an iterative method and often stops much earlier, once the residual is sufficiently small.
88.16 Basic Conjugate Gradient Algorithm
Given $x^{(0)}$, define

$$ r^{(0)} = b - A x^{(0)}, \qquad p^{(0)} = r^{(0)}. $$

For $k = 0, 1, 2, \dots$:

$$ \alpha_k = \frac{(r^{(k)})^{T} r^{(k)}}{(p^{(k)})^{T} A\, p^{(k)}} $$

$$ x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} $$

$$ r^{(k+1)} = r^{(k)} - \alpha_k A p^{(k)} $$

$$ \beta_k = \frac{(r^{(k+1)})^{T} r^{(k+1)}}{(r^{(k)})^{T} r^{(k)}} $$

$$ p^{(k+1)} = r^{(k+1)} + \beta_k p^{(k)} $$
The method requires one matrix-vector product per iteration.
This makes it attractive for large sparse symmetric positive definite systems.
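A direct translation of the updates above into a minimal sketch (the function name, tolerance, and test system are illustrative):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Basic conjugate gradient for symmetric positive definite A,
    following the alpha/beta update formulas above."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    max_iter = n if max_iter is None else max_iter
    for k in range(max_iter):
        Ap = A @ p                       # the one matvec per iteration
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * np.linalg.norm(b):
            return x, k + 1
        beta = rs_new / rs
        p = r + beta * p
        rs = rs_new
    return x, max_iter

# Small symmetric positive definite test system.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x, iters = conjugate_gradient(A, b)
```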
88.17 Convergence of Conjugate Gradient
The convergence of conjugate gradient depends on the eigenvalues of .
For symmetric positive definite $A$, a standard estimate is

$$ \lVert e^{(k)} \rVert_A \le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{k} \lVert e^{(0)} \rVert_A. $$

Here $\kappa = \lambda_{\max}(A) / \lambda_{\min}(A)$ is the condition number of $A$, and $\lVert v \rVert_A = \sqrt{v^{T} A v}$ is the energy norm.
The estimate shows that convergence slows as the condition number grows.
Eigenvalue clustering can make convergence much faster than the worst-case bound suggests.
88.18 GMRES
GMRES stands for generalized minimal residual method.
It is designed for nonsymmetric systems.
At step $k$, GMRES chooses the approximation in the Krylov subspace that minimizes the residual norm:

$$ x^{(k)} = \operatorname*{arg\,min}_{x \,\in\, x^{(0)} + \mathcal{K}_k(A, r^{(0)})} \lVert b - Ax \rVert_2. $$
GMRES is robust, but it stores an increasing number of basis vectors as $k$ grows. To control memory, practical implementations often use restarted GMRES.
Restarting reduces memory cost, but it may slow convergence.
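A minimal unrestarted GMRES sketch (names are illustrative): build an orthonormal Arnoldi basis of the Krylov subspace, then solve the small least-squares problem for the residual-minimizing coefficients.

```python
import numpy as np

def gmres_minimal(A, b, x0=None, m=None, tol=1e-10):
    """Unrestarted GMRES: Arnoldi with modified Gram-Schmidt, then
    minimize ||beta e1 - H y|| over the small Hessenberg system."""
    n = len(b)
    x0 = np.zeros(n) if x0 is None else x0.astype(float)
    m = n if m is None else m
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    if beta <= tol:
        return x0
    Q = np.zeros((n, m + 1))         # orthonormal Krylov basis
    H = np.zeros((m + 1, m))         # Hessenberg matrix
    Q[:, 0] = r0 / beta
    x = x0
    for j in range(m):
        w = A @ Q[:, j]
        for i in range(j + 1):       # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] > 1e-14:
            Q[:, j + 1] = w / H[j + 1, j]
        # Least-squares problem of size (j+2) x (j+1).
        e1 = np.zeros(j + 2)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)
        x = x0 + Q[:, :j + 1] @ y
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return x
    return x

# Nonsymmetric test system.
A = np.array([[3.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0]])
b = np.array([1.0, 0.0, 1.0])
x = gmres_minimal(A, b)
```

Production implementations update the least-squares solution incrementally with Givens rotations instead of re-solving it; the re-solve here keeps the sketch short.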
88.19 Preconditioning
Preconditioning transforms the system into an equivalent system that is easier for an iterative method to solve.
Instead of solving

$$ Ax = b, $$

we choose a matrix $M$ that approximates $A$ but is easier to invert.
A left-preconditioned system is

$$ M^{-1} A x = M^{-1} b. $$

A good preconditioner makes

$$ M^{-1} A $$

better conditioned or more favorable spectrally.
Preconditioning is often the difference between a method that converges quickly and one that does not converge in useful time.
88.20 Examples of Preconditioners
Common preconditioners include:
| Preconditioner | Main idea |
|---|---|
| Jacobi | use diagonal of $A$ |
| SSOR | use symmetric relaxation |
| Incomplete LU | approximate LU factorization |
| Incomplete Cholesky | approximate Cholesky factorization |
| Multigrid | reduce errors across scales |
| Domain decomposition | split problem into subdomains |
The best preconditioner depends on matrix structure.
There is no universally optimal choice.
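As a concrete sketch, the simplest entry in the table, the Jacobi preconditioner $M = D$, can be folded into conjugate gradient; applying $M^{-1}$ is then a componentwise divide (the function name and test system are illustrative):

```python
import numpy as np

def pcg_jacobi(A, b, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient with M = diag(A):
    z = M^{-1} r is a cheap componentwise divide."""
    n = len(b)
    d = np.diag(A)                   # the preconditioner M = D
    x = np.zeros(n)
    r = b - A @ x
    z = r / d                        # z = M^{-1} r
    p = z.copy()
    rz = r @ z
    for k in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k + 1
        z = r / d
        rz_new = r @ z
        beta = rz_new / rz
        p = z + beta * p
        rz = rz_new
    return x, max_iter

# SPD test system with an uneven diagonal, where diagonal
# preconditioning helps most.
A = np.array([[10.0, 1.0, 0.0],
              [1.0, 5.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x, iters = pcg_jacobi(A, b)
```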
88.21 Sparse Matrices
Iterative methods are closely tied to sparse matrices.
A sparse matrix has mostly zero entries. It is stored by recording only the nonzero entries and their locations.
The cost of one matrix-vector product is proportional to the number of nonzero entries:

$$ \mathcal{O}(\mathrm{nnz}(A)). $$

This is much cheaper than dense multiplication when

$$ \mathrm{nnz}(A) \ll n^2. $$
For this reason, iterative methods can solve systems far larger than direct dense methods.
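A hand-rolled sketch of the common compressed sparse row (CSR) storage makes the $\mathcal{O}(\mathrm{nnz}(A))$ cost visible: one matvec touches each stored nonzero exactly once (the arrays below encode a small illustrative matrix):

```python
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """CSR matrix-vector product: data holds the nonzero values,
    indices their column positions, indptr the row boundaries."""
    n = len(indptr) - 1
    y = np.zeros(n)
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

# The sparse matrix [[4, 1, 0], [1, 4, 1], [0, 1, 4]] in CSR form.
data = np.array([4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0])
indices = np.array([0, 1, 0, 1, 2, 1, 2])
indptr = np.array([0, 2, 5, 7])

x = np.array([1.0, 2.0, 3.0])
y = csr_matvec(data, indices, indptr, x)
```

Library implementations (e.g. `scipy.sparse`) use the same layout with compiled inner loops.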
88.22 Matrix-Free Methods
Some applications do not form $A$ explicitly.
Instead, they provide a function that computes

$$ v \mapsto Av. $$
Such methods are called matrix-free.
Krylov methods are well suited to matrix-free computation because they mainly require matrix-vector products.
This is important in:
| Application | Reason |
|---|---|
| PDE solvers | matrix may be too large to assemble |
| optimization | Hessian-vector products are available |
| inverse problems | operator defined by simulation |
| machine learning | Jacobian or Hessian products computed implicitly |
Matrix-free methods reduce storage cost and allow very large-scale computation.
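A sketch of the idea: write conjugate gradient against a matvec callable, then pass an operator that is never assembled as a matrix. The 1-D discrete Laplacian below is an illustrative stand-in for such an operator.

```python
import numpy as np

def cg_matrix_free(matvec, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient that only calls matvec(v); the matrix A
    itself is never stored."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for k in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * np.linalg.norm(b):
            return x
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def laplacian_matvec(v):
    """Apply the SPD tridiagonal operator tridiag(-1, 2, -1)
    without assembling it."""
    y = 2.0 * v
    y[:-1] -= v[1:]
    y[1:] -= v[:-1]
    return y

b = np.ones(30)
x = cg_matrix_free(laplacian_matvec, b)
```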
88.23 Scaling and Normalization
Scaling can improve iterative convergence.
Poorly scaled systems may have entries with vastly different magnitudes. This can slow convergence and worsen numerical behavior.
Common scaling techniques include:
| Technique | Goal |
|---|---|
| Row scaling | balance equations |
| Column scaling | balance variables |
| Diagonal equilibration | reduce magnitude disparities |
| Symmetric scaling | preserve symmetry |
Scaling is often applied before preconditioning.
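As a sketch of symmetric diagonal scaling (the matrix is an illustrative example): with $D = \mathrm{diag}(A)^{1/2}$, the scaled matrix $D^{-1} A D^{-1}$ has unit diagonal, preserves symmetry, and often has a much smaller condition number.

```python
import numpy as np

# Symmetric diagonal equilibration. Solving A x = b is equivalent to
# solving (D^{-1} A D^{-1}) y = D^{-1} b for y = D x, then x = D^{-1} y.
A = np.array([[100.0, 1.0],
              [1.0, 4.0]])
d = np.sqrt(np.diag(A))
A_scaled = A / np.outer(d, d)        # D^{-1} A D^{-1}, unit diagonal

cond_before = np.linalg.cond(A)
cond_after = np.linalg.cond(A_scaled)
```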
88.24 Parallelism
Iterative methods are often suitable for parallel computation.
Matrix-vector products can be parallelized across rows.
Vector updates can be parallelized componentwise.
Dot products require global reductions, which may become bottlenecks in distributed systems.
| Operation | Parallel behavior |
|---|---|
| Sparse matrix-vector product | good locality if partitioned well |
| Vector addition | highly parallel |
| Dot product | requires reduction |
| Preconditioner application | depends on preconditioner |
Designing scalable iterative solvers requires attention to communication cost, not only arithmetic cost.
88.25 Choosing an Iterative Method
A practical choice depends on matrix properties.
| Matrix type | Common method |
|---|---|
| Symmetric positive definite | Conjugate gradient |
| Symmetric indefinite | MINRES |
| Nonsymmetric | GMRES, BiCGSTAB |
| Very structured PDE matrix | multigrid |
| Ill-conditioned sparse matrix | Krylov method with preconditioning |
| Small dense matrix | direct factorization |
Before choosing a method, one should ask:
| Question | Why it matters |
|---|---|
| Is $A$ symmetric? | determines eligible methods |
| Is $A$ positive definite? | enables conjugate gradient |
| Is $A$ sparse? | favors iterative methods |
| Is a good preconditioner available? | controls convergence |
| How accurate must the result be? | determines stopping tolerance |
88.26 Failure Modes
Iterative methods may fail or perform poorly for several reasons.
| Failure mode | Cause |
|---|---|
| Divergence | iteration matrix spectral radius too large |
| Stagnation | residual stops decreasing |
| Slow convergence | poor conditioning or unfavorable spectrum |
| Loss of orthogonality | floating point error in Krylov basis |
| Bad preconditioner | transformed system remains difficult |
| Poor stopping rule | iteration stops too early or too late |
Failure should be diagnosed through residual history, conditioning estimates, and knowledge of matrix structure.
88.27 Residual History
A residual history records values such as

$$ \lVert r^{(k)} \rVert $$

over iterations.
It helps identify the behavior of the method.
| Pattern | Interpretation |
|---|---|
| Steady decrease | normal convergence |
| Rapid early decrease, then stagnation | limited precision or poor preconditioner |
| Oscillation | nonsymmetric or indefinite behavior |
| Growth | divergence or instability |
| Flat residual | no useful progress |
Residual history is one of the main diagnostic tools for iterative solvers.
88.28 Summary
Iterative methods solve linear systems by improving an approximate solution step by step.
They are essential for large sparse systems because they avoid dense factorization and often require only matrix-vector products.
The central objects are:
| Concept | Meaning |
|---|---|
| Residual | $r^{(k)} = b - A x^{(k)}$ |
| Error | $e^{(k)} = x^* - x^{(k)}$ |
| Iteration matrix | controls stationary convergence |
| Spectral radius | determines fixed-point convergence |
| Krylov subspace | search space for modern methods |
| Preconditioner | transforms system for faster convergence |
Classical stationary methods include Jacobi, Gauss-Seidel, and SOR. Modern large-scale methods include conjugate gradient, GMRES, MINRES, and BiCGSTAB. In practice, preconditioning is often the decisive ingredient.