Krylov subspaces are the search spaces used by many modern iterative methods for large linear systems and eigenvalue problems. They are generated by repeatedly applying a matrix to a starting vector.
For a square matrix $A \in \mathbb{R}^{n \times n}$ and a vector $b \in \mathbb{R}^n$, the $k$-th Krylov subspace is
$$\mathcal{K}_k(A, b) = \operatorname{span}\{b,\, Ab,\, A^2 b,\, \dots,\, A^{k-1} b\}.$$
These spaces are central because they use only matrix-vector multiplication. This makes them suitable for large sparse matrices and matrix-free problems, where forming or factoring the full matrix is too expensive. Krylov subspace methods include conjugate gradient, MINRES, GMRES, BiCGSTAB, Arnoldi iteration, and Lanczos iteration.
92.1 Definition
Let
$$A \in \mathbb{R}^{n \times n}$$
and let
$$b \in \mathbb{R}^n, \qquad b \neq 0.$$
The $k$-th Krylov subspace generated by $A$ and $b$ is
$$\mathcal{K}_k(A, b) = \operatorname{span}\{b,\, Ab,\, A^2 b,\, \dots,\, A^{k-1} b\}.$$
Thus:
$$\mathcal{K}_1 = \operatorname{span}\{b\}, \quad \mathcal{K}_2 = \operatorname{span}\{b, Ab\}, \quad \mathcal{K}_3 = \operatorname{span}\{b, Ab, A^2 b\}, \quad \dots$$
The subspace grows by repeatedly applying $A$ to the most recent direction.
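As a concrete sketch (using NumPy; the 4-by-4 matrix, the starting vector, and the helper name `krylov_basis` are illustrative choices, not part of any standard library), the raw Krylov columns can be generated and their span inspected:

```python
import numpy as np

def krylov_basis(A, b, k):
    """Return the matrix whose columns are b, Ab, ..., A^(k-1) b."""
    cols = [b]
    for _ in range(k - 1):
        cols.append(A @ cols[-1])   # one matrix-vector product per new column
    return np.column_stack(cols)

# Small arbitrary example: a 4x4 matrix and a starting vector.
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 3.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [0.0, 0.0, 1.0, 5.0]])
b = np.array([1.0, 0.0, 0.0, 0.0])

K3 = krylov_basis(A, b, 3)           # columns: b, Ab, A^2 b
dim = np.linalg.matrix_rank(K3)      # dimension of K_3(A, b)
```

Note that only products with $A$ are needed; the matrix itself is never factored.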
92.2 Nested Structure
Krylov subspaces are nested:
$$\mathcal{K}_1(A, b) \subseteq \mathcal{K}_2(A, b) \subseteq \mathcal{K}_3(A, b) \subseteq \cdots$$
Each new subspace contains all previous directions and possibly one new direction.
The dimension cannot exceed $n$. Therefore the sequence eventually stops growing: there is a smallest $d \le n$ with
$$\mathcal{K}_d(A, b) = \mathcal{K}_{d+1}(A, b) = \mathcal{K}_{d+2}(A, b) = \cdots$$
If
$$A^k b$$
lies in the span of the previous vectors, then no new direction is added.
92.3 Polynomial View
Every vector in $\mathcal{K}_k(A, b)$ can be written as
$$x = p(A)\, b,$$
where $p$ is a polynomial of degree at most $k - 1$.
Indeed,
$$x = c_0 b + c_1 A b + \cdots + c_{k-1} A^{k-1} b$$
gives
$$x = p(A)\, b \quad \text{with} \quad p(t) = c_0 + c_1 t + \cdots + c_{k-1} t^{k-1}.$$
Thus Krylov methods are polynomial methods.
They search for an approximation by choosing a polynomial in $A$ applied to a starting vector.
This polynomial view explains why eigenvalues influence convergence. A Krylov method succeeds when a low-degree polynomial can reduce the error on the spectrum of $A$.
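A small numerical check of the polynomial identity above (NumPy; the matrix, vector, and coefficients are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)

# A vector in K_3(A, b) with coefficients c0, c1, c2 ...
c = np.array([2.0, -1.0, 0.5])
x = c[0] * b + c[1] * (A @ b) + c[2] * (A @ (A @ b))

# ... equals p(A) b for the polynomial p(t) = 2 - t + 0.5 t^2.
pA = c[0] * np.eye(5) + c[1] * A + c[2] * (A @ A)
assert np.allclose(x, pA @ b)
```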
92.4 Krylov Subspaces for Linear Systems
Consider the linear system
$$A x = b.$$
Let $x_0$ be an initial approximation, and define the initial residual
$$r_0 = b - A x_0.$$
A Krylov method seeks approximations of the form
$$x_k \in x_0 + \mathcal{K}_k(A, r_0).$$
This means the correction
$$x_k - x_0$$
is chosen from the space generated by
$$r_0,\, A r_0,\, A^2 r_0,\, \dots,\, A^{k-1} r_0.$$
The method never needs $A^{-1}$. It only needs products of the form
$$v \mapsto A v.$$
92.5 Why Matrix-Vector Products Matter
For a dense matrix, storing and factoring $A$ may be expensive.
For a sparse matrix, most entries are zero. A matrix-vector product can be computed in approximately
$$\mathcal{O}(\operatorname{nnz}(A))$$
operations, where $\operatorname{nnz}(A)$ is the number of nonzero entries.
For a matrix-free problem, $A$ may not be stored at all. Instead, the application provides a routine that computes
$$v \mapsto A v$$
for any vector $v$.
Krylov methods are effective in these settings because they can operate using this routine alone.
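A minimal matrix-free sketch, assuming SciPy is available: the function `apply_laplacian` is an illustrative operator chosen here (a 1-D discrete Laplacian), and the solver runs entirely on matrix-vector products.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 100

def apply_laplacian(v):
    """Apply the 1-D discrete Laplacian (Dirichlet ends) without storing A."""
    w = 2.0 * v
    w[:-1] -= v[1:]     # subtract right neighbor
    w[1:] -= v[:-1]     # subtract left neighbor
    return w

# Wrap the routine as an operator; the matrix is never formed.
A = LinearOperator((n, n), matvec=apply_laplacian)
b = np.ones(n)

x, info = cg(A, b)      # conjugate gradient driven by matvec alone
```

This is the typical shape of a matrix-free solve: the solver sees only the `matvec` routine.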
92.6 Projection Principle
Krylov methods choose an approximation from a low-dimensional affine space:
$$x_k \in x_0 + \mathcal{K}_k(A, r_0).$$
To select one vector from this space, they impose a condition on the residual.
The residual at step $k$ is
$$r_k = b - A x_k.$$
A projection method usually requires $r_k$ to be orthogonal to a test space $\mathcal{L}_k$:
$$r_k \perp \mathcal{L}_k.$$
Different choices of the test space produce different Krylov methods.
| Method | Search space | Residual condition |
|---|---|---|
| CG | Krylov subspace | energy minimization |
| MINRES | Krylov subspace | minimal residual for symmetric $A$ |
| GMRES | Krylov subspace | minimal residual for general $A$ |
| BiCG | pair of Krylov subspaces | biorthogonality |
| BiCGSTAB | stabilized BiCG structure | smoothed residual behavior |
This projection viewpoint is the common framework behind many Krylov solvers.
92.7 Basis Construction
The raw Krylov sequence
$$b,\, Ab,\, A^2 b,\, A^3 b,\, \dots$$
is usually unsuitable as a computational basis.
The vectors often become nearly linearly dependent. Their magnitudes may also grow or decay rapidly.
For stable computation, Krylov methods build an orthonormal or structured basis.
Two basic procedures are:
| Procedure | Matrix class | Role |
|---|---|---|
| Arnoldi iteration | general matrices | orthonormal Krylov basis |
| Lanczos iteration | symmetric or Hermitian matrices | short-recurrence Krylov basis |
Arnoldi is more general. Lanczos is cheaper when symmetry is available.
92.8 Arnoldi Iteration
Arnoldi iteration constructs an orthonormal basis
$$q_1, q_2, \dots, q_m$$
for
$$\mathcal{K}_m(A, v).$$
Start with
$$q_1 = \frac{v}{\|v\|}.$$
At step $k$, compute
$$w = A q_k.$$
Then orthogonalize against the previous basis vectors:
$$h_{ik} = q_i^{T} w, \qquad i = 1, \dots, k.$$
Update
$$w \leftarrow w - \sum_{i=1}^{k} h_{ik}\, q_i.$$
Set
$$h_{k+1,k} = \|w\|.$$
If
$$h_{k+1,k} \neq 0,$$
define
$$q_{k+1} = \frac{w}{h_{k+1,k}}.$$
The Arnoldi relation is
$$A Q_m = Q_{m+1} \tilde{H}_m,$$
where $Q_m$ has columns $q_1, \dots, q_m$, and $\tilde{H}_m \in \mathbb{R}^{(m+1) \times m}$ is upper Hessenberg.
92.9 Arnoldi Pseudocode
arnoldi(A, v, m):
    q[1] = v / norm(v)
    for k = 1 to m:
        w = A * q[k]
        for i = 1 to k:
            h[i,k] = dot(q[i], w)
            w = w - h[i,k] * q[i]
        h[k+1,k] = norm(w)
        if h[k+1,k] == 0:
            stop
        q[k+1] = w / h[k+1,k]
    return q, h

Arnoldi is the basis construction behind GMRES and several eigenvalue algorithms.
Its cost grows with $k$, because each new vector must be orthogonalized against all previous basis vectors.
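The pseudocode translates almost line for line into NumPy. This sketch also verifies the Arnoldi relation $A Q_m = Q_{m+1} \tilde{H}_m$ and the orthonormality of the basis on an arbitrary random test matrix:

```python
import numpy as np

def arnoldi(A, v, m):
    """Arnoldi iteration: orthonormal basis Q and upper Hessenberg H."""
    n = len(v)
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = v / np.linalg.norm(v)
    for k in range(m):
        w = A @ Q[:, k]
        for i in range(k + 1):            # modified Gram-Schmidt
            H[i, k] = Q[:, i] @ w
            w = w - H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] < 1e-12:           # breakdown: the subspace is invariant
            return Q[:, :k + 1], H[:k + 1, :k]
        Q[:, k + 1] = w / H[k + 1, k]
    return Q, H

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8))
v = rng.standard_normal(8)
Q, H = arnoldi(A, v, 5)

# A Q_m = Q_{m+1} H~_m, and the columns of Q are orthonormal.
assert np.allclose(A @ Q[:, :5], Q @ H)
assert np.allclose(Q.T @ Q, np.eye(6))
```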
92.10 Lanczos Iteration
When $A$ is symmetric, Arnoldi simplifies.
The Hessenberg matrix becomes tridiagonal. The new basis vector can be computed using only a three-term recurrence.
Lanczos iteration has the form:
$$\beta_{k+1}\, q_{k+1} = A q_k - \alpha_k q_k - \beta_k q_{k-1},$$
where
$$\alpha_k = q_k^{T} A q_k, \qquad \beta_{k+1} = \|A q_k - \alpha_k q_k - \beta_k q_{k-1}\|.$$
The tridiagonal relation is
$$A Q_m = Q_m T_m + \beta_{m+1}\, q_{m+1} e_m^{T},$$
where $T_m$ is symmetric tridiagonal with diagonal entries $\alpha_k$ and off-diagonal entries $\beta_{k+1}$.
Lanczos is the basis process behind conjugate gradient and MINRES. For Hermitian matrices, the Arnoldi method reduces to Lanczos, and only the two previous basis vectors are needed in the recurrence.
92.11 Lanczos Pseudocode
lanczos(A, v, m):
    q[0] = zero vector
    q[1] = v / norm(v)
    beta[1] = 0
    for k = 1 to m:
        w = A * q[k] - beta[k] * q[k-1]
        alpha[k] = dot(q[k], w)
        w = w - alpha[k] * q[k]
        beta[k+1] = norm(w)
        if beta[k+1] == 0:
            stop
        q[k+1] = w / beta[k+1]
    return q, alpha, beta

Lanczos is cheaper than Arnoldi, but it is more delicate in floating point arithmetic. Loss of orthogonality may affect computed eigenvalues and convergence diagnostics.
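A NumPy rendering of this pseudocode, with a check of the tridiagonal relation $A Q_m = Q_m T_m + \beta_{m+1} q_{m+1} e_m^{T}$; the symmetric test matrix is an arbitrary construction:

```python
import numpy as np

def lanczos(A, v, m):
    """Lanczos iteration for symmetric A: basis Q plus alpha, beta coefficients."""
    n = len(v)
    Q = np.zeros((n, m + 1))
    alpha = np.zeros(m)
    beta = np.zeros(m + 1)                 # beta[0] = 0, as in the pseudocode
    Q[:, 0] = v / np.linalg.norm(v)
    for k in range(m):
        w = A @ Q[:, k] - beta[k] * Q[:, k - 1]   # beta[0] = 0 kills the k = 0 term
        alpha[k] = Q[:, k] @ w
        w = w - alpha[k] * Q[:, k]
        beta[k + 1] = np.linalg.norm(w)
        if beta[k + 1] < 1e-12:            # breakdown: invariant subspace found
            break
        Q[:, k + 1] = w / beta[k + 1]
    return Q, alpha, beta

rng = np.random.default_rng(2)
M = rng.standard_normal((10, 10))
A = M + M.T                                # symmetric test matrix
v = rng.standard_normal(10)
m = 6
Q, alpha, beta = lanczos(A, v, m)

# Assemble the tridiagonal T_m from the recurrence coefficients.
T = np.diag(alpha) + np.diag(beta[1:m], 1) + np.diag(beta[1:m], -1)
R = A @ Q[:, :m] - Q[:, :m] @ T            # should equal beta_{m+1} q_{m+1} e_m^T
assert np.allclose(R[:, :-1], 0.0, atol=1e-8)
assert np.allclose(R[:, -1], beta[m] * Q[:, m], atol=1e-8)
```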
92.12 GMRES
GMRES stands for generalized minimal residual method.
It applies to general square systems
$$A x = b.$$
At step $k$, GMRES chooses
$$x_k \in x_0 + \mathcal{K}_k(A, r_0)$$
to minimize the residual norm
$$\|b - A x_k\|_2.$$
Arnoldi iteration reduces this large residual minimization problem to a small least squares problem involving the Hessenberg matrix $\tilde{H}_k$. This is the standard computational structure of GMRES.
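A minimal unrestarted GMRES sketch along these lines: run Arnoldi, then solve the small least-squares problem with the Hessenberg matrix. The helper name `gmres_simple` and the well-conditioned test matrix are illustrative choices, and $x_0 = 0$ is assumed.

```python
import numpy as np

def gmres_simple(A, b, m):
    """Unrestarted GMRES sketch: m Arnoldi steps, then a small least-squares
    solve with the (m+1) x m Hessenberg matrix. Assumes x0 = 0."""
    n = len(b)
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)               # r0 = b since x0 = 0
    Q[:, 0] = b / beta
    for k in range(m):
        w = A @ Q[:, k]
        for i in range(k + 1):             # modified Gram-Schmidt
            H[i, k] = Q[:, i] @ w
            w = w - H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        Q[:, k + 1] = w / H[k + 1, k]
    # Minimize || beta e1 - H y ||_2 over y; then x_m = Q_m y.
    e1 = np.zeros(m + 1)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H, e1, rcond=None)
    return Q[:, :m] @ y

rng = np.random.default_rng(3)
n = 30
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))   # well-conditioned test system
b = rng.standard_normal(n)
x = gmres_simple(A, b, 25)
```

A production GMRES updates the least-squares solution incrementally with Givens rotations instead of calling `lstsq` at the end, but the algebra is the same.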
92.13 Conjugate Gradient as a Krylov Method
For symmetric positive definite $A$, conjugate gradient also searches in Krylov spaces:
$$x_k \in x_0 + \mathcal{K}_k(A, r_0).$$
It selects the vector that minimizes the error in the energy norm:
$$\|x_\star - x_k\|_A = \sqrt{(x_\star - x_k)^{T} A\, (x_\star - x_k)},$$
where $x_\star$ is the exact solution. Equivalently, it minimizes the quadratic objective associated with $A$.
CG can be derived from Lanczos iteration. This connection explains its short recurrence and low storage cost.
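A textbook-style CG sketch illustrating the short recurrence and low storage (the helper name `cg_simple` is ours; the tridiagonal 1-D Laplacian is a standard SPD test matrix):

```python
import numpy as np

def cg_simple(A, b, tol=1e-10, maxiter=None):
    """Textbook conjugate-gradient sketch for symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b.copy()                       # residual for x0 = 0
    p = r.copy()                       # first search direction
    rs = r @ r
    for _ in range(maxiter or 10 * len(b)):
        Ap = A @ p
        alpha = rs / (p @ Ap)          # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p      # short recurrence: only r and p are kept
        rs = rs_new
    return x

# Standard SPD test matrix: the tridiagonal 1-D Laplacian.
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = cg_simple(A, b)
```

Note that only two vectors are updated per step; no Krylov basis is stored, which is exactly the benefit of the Lanczos-derived short recurrence.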
92.14 MINRES
MINRES is used for symmetric indefinite systems.
Like CG, it uses Lanczos iteration. Unlike CG, it does not require positive definiteness.
At each step, MINRES chooses an approximation that minimizes the residual norm.
Thus MINRES is often appropriate when
$$A = A^{T}$$
but $A$ has both positive and negative eigenvalues.
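A short example with SciPy's `minres` on a symmetric indefinite matrix built from a chosen spectrum with both signs (the eigenvalues here are arbitrary):

```python
import numpy as np
from scipy.sparse.linalg import minres

rng = np.random.default_rng(4)
# Symmetric indefinite matrix: chosen eigenvalues of both signs.
evals = np.array([-3.0, -1.0, 2.0, 4.0, 5.0])
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = U @ np.diag(evals) @ U.T
b = rng.standard_normal(5)

# CG would require positive definiteness; MINRES does not.
x, info = minres(A, b)
```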
92.15 BiCG and BiCGSTAB
For nonsymmetric systems, one may use two-sided Krylov methods.
BiCG constructs coupled Krylov subspaces for $A$ and $A^{T}$. It enforces biorthogonality between the two residual sequences.
BiCGSTAB modifies this structure to smooth irregular convergence behavior.
These methods can be cheaper per iteration than GMRES because they use short recurrences. However, they may be less robust.
92.16 Eigenvalue Problems
Krylov subspaces are also used to approximate eigenvalues.
Given
$$A x = \lambda x,$$
large-scale algorithms often seek a few eigenvalues rather than the full spectrum.
Arnoldi and Lanczos project onto a lower-dimensional Krylov subspace. The eigenvalues of the projected small matrix approximate selected eigenvalues of $A$.
These approximations are called Ritz values.
This is the basis of many large-scale eigensolvers.
92.17 Ritz Values and Ritz Vectors
Let $Q_m$ be an orthonormal basis for a Krylov subspace.
The projected matrix is
$$H_m = Q_m^{T} A\, Q_m.$$
If
$$H_m y = \theta y,$$
then
$$\theta$$
is a Ritz value, and
$$x = Q_m y$$
is a Ritz vector.
Ritz pairs approximate eigenpairs of $A$.
This reduces a large eigenvalue problem to a small projected eigenvalue problem.
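A small numerical illustration of Ritz values (NumPy; the spectrum is constructed so the dominant eigenvalue 10 is well separated, and the basis is built by Gram-Schmidt on the Krylov sequence rather than by a production Lanczos code):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 60, 10
# Symmetric test matrix with a well-separated largest eigenvalue (10.0).
evals = np.concatenate([np.linspace(1.0, 2.0, n - 1), [10.0]])
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(evals) @ U.T

# Orthonormal basis of K_m(A, v) by Gram-Schmidt on the Krylov sequence.
v = rng.standard_normal(n)
Q = np.zeros((n, m))
Q[:, 0] = v / np.linalg.norm(v)
for k in range(1, m):
    w = A @ Q[:, k - 1]
    w -= Q[:, :k] @ (Q[:, :k].T @ w)   # orthogonalize against earlier vectors
    w -= Q[:, :k] @ (Q[:, :k].T @ w)   # reorthogonalize for stability
    Q[:, k] = w / np.linalg.norm(w)

H = Q.T @ A @ Q                        # small projected matrix (m x m)
ritz = np.linalg.eigvalsh(H)           # Ritz values, ascending

# The largest Ritz value already approximates the dominant eigenvalue well.
assert abs(ritz[-1] - 10.0) < 1e-6
```

This matches the typical behavior noted below: extremal eigenvalues are captured first, long before the subspace dimension approaches $n$.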
92.18 Convergence and Eigenvalues
Krylov convergence depends strongly on spectral properties.
For linear systems, convergence is fast when a low-degree polynomial can be small on the eigenvalues of $A$ while taking the value $1$ at zero, as the residual polynomial must.
For CG, clustered eigenvalues often give rapid convergence.
For GMRES, nonnormality may complicate convergence. Eigenvalues alone may not fully predict behavior.
For eigenvalue problems, extremal eigenvalues often converge first in Lanczos and Arnoldi methods.
92.19 Preconditioning
Preconditioning is essential in many Krylov solvers.
Instead of solving
$$A x = b,$$
one introduces a matrix $M$ that approximates $A$ and is easier to invert.
A left-preconditioned system is
$$M^{-1} A x = M^{-1} b.$$
The Krylov subspace becomes
$$\mathcal{K}_k(M^{-1} A,\, M^{-1} r_0).$$
The goal is to improve the spectrum or geometry of the operator so that fewer Krylov steps are required.
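A sketch of Jacobi (diagonal) preconditioning with SciPy's `cg`, counting iterations through the callback; the SPD test system with a widely varying diagonal is an illustrative construction:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(6)
n = 200
# SPD test system: dominant diagonal ranging over three orders of magnitude.
d = np.logspace(0, 3, n)
A = np.diag(d) + 0.4 * np.eye(n, k=1) + 0.4 * np.eye(n, k=-1)
b = rng.standard_normal(n)

# Jacobi preconditioner: M^{-1} v = v / diag(A), applied as an operator.
M = LinearOperator((n, n), matvec=lambda v: v / d)

iters = {"plain": 0, "jacobi": 0}
x0, info0 = cg(A, b, callback=lambda xk: iters.update(plain=iters["plain"] + 1))
x1, info1 = cg(A, b, M=M, callback=lambda xk: iters.update(jacobi=iters["jacobi"] + 1))
```

Here the preconditioned operator $M^{-1}A$ has eigenvalues clustered near 1, so far fewer Krylov steps are needed.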
92.20 Restarting
Some Krylov methods, especially GMRES, store all previous basis vectors.
After $m$ steps, GMRES stores
$$q_1, q_2, \dots, q_m.$$
The orthogonalization cost also grows with $m$.
Restarting limits memory by running the method for a fixed number of steps, then restarting with the latest approximation.
This gives restarted GMRES, often written GMRES($m$).
Restarting controls memory but may slow or stall convergence.
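With SciPy, restarting is exposed through the `restart` parameter of `gmres`; the test system here is an arbitrary well-conditioned construction:

```python
import numpy as np
from scipy.sparse.linalg import gmres

rng = np.random.default_rng(7)
n = 100
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))   # well-conditioned test system
b = rng.standard_normal(n)

# GMRES(20): restart every 20 steps, so at most ~20 basis vectors
# are stored at a time instead of one per iteration.
x, info = gmres(A, b, restart=20)
```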
92.21 Loss of Orthogonality
In exact arithmetic, Arnoldi and Lanczos produce orthogonal basis vectors.
In floating point arithmetic, orthogonality may degrade.
For Arnoldi, reorthogonalization can control this.
For Lanczos, loss of orthogonality may cause repeated or spurious Ritz values.
Practical implementations must balance accuracy, memory, and cost.
92.22 Matrix-Free Computation
Krylov methods are particularly useful when the matrix is implicit.
Instead of storing $A$, the algorithm only needs a function:
$$v \mapsto A v.$$
This is common in:
| Application | Matrix-vector product |
|---|---|
| PDE simulation | apply discretized operator |
| optimization | Hessian-vector product |
| inverse problems | forward and adjoint model |
| graph algorithms | sparse adjacency action |
| machine learning | Jacobian-vector or Hessian-vector product |
Matrix-free Krylov methods allow computations far beyond the size of explicit dense matrices.
92.23 Practical Method Selection
A practical Krylov method is chosen from matrix structure.
| Matrix property | Typical method |
|---|---|
| Symmetric positive definite | CG |
| Symmetric indefinite | MINRES |
| General nonsymmetric | GMRES |
| Large nonsymmetric with memory pressure | BiCGSTAB |
| Eigenvalues of symmetric matrix | Lanczos |
| Eigenvalues of general matrix | Arnoldi |
| Ill-conditioned system | preconditioned Krylov method |
The method should match the algebraic structure of the operator.
92.24 Failure Modes
Krylov methods may fail or slow down for several reasons.
| Failure mode | Cause |
|---|---|
| Slow convergence | poor spectrum or conditioning |
| Stagnation | restart too small or bad preconditioner |
| Breakdown | recurrence denominator becomes zero |
| Loss of orthogonality | floating point effects |
| Memory growth | long Arnoldi or unrestarted GMRES |
| Misleading residual | ill-conditioned system |
| Non-normality | eigenvalues poorly predict behavior |
Diagnosis usually requires residual history, conditioning estimates, and knowledge of matrix structure.
92.25 Summary
A Krylov subspace is
$$\mathcal{K}_k(A, b) = \operatorname{span}\{b,\, Ab,\, A^2 b,\, \dots,\, A^{k-1} b\}.$$
Krylov methods solve large linear algebra problems by searching in these spaces.
Their main advantages are:
| Advantage | Meaning |
|---|---|
| Matrix-vector only | no factorization required |
| Sparse friendly | cost depends on nonzeros |
| Matrix-free | works with implicit operators |
| Adaptive | subspace grows from the residual |
| Broad | supports systems and eigenvalue problems |
The main basis processes are Arnoldi for general matrices and Lanczos for symmetric or Hermitian matrices. The main solvers include CG, MINRES, GMRES, and BiCGSTAB.
Krylov subspaces provide the algebraic structure behind much of modern large-scale numerical linear algebra.