
Chapter 92. Krylov Subspaces

Krylov subspaces are the search spaces used by many modern iterative methods for large linear systems and eigenvalue problems. They are generated by repeatedly applying a matrix to a starting vector.

For a square matrix A and a vector v, the k-th Krylov subspace is

\mathcal{K}_k(A,v) = \operatorname{span}\{v, Av, A^2v, \ldots, A^{k-1}v\}.

These spaces are central because constructing them requires only matrix-vector multiplication. This makes Krylov methods suitable for large sparse matrices and matrix-free problems, where forming or factoring the full matrix is too expensive. Krylov subspace methods include conjugate gradient (CG), MINRES, GMRES, BiCGSTAB, Arnoldi iteration, and Lanczos iteration.

92.1 Definition

Let

A \in \mathbb{R}^{n\times n}

and let

v \in \mathbb{R}^n.

The k-th Krylov subspace generated by A and v is

\mathcal{K}_k(A,v) = \operatorname{span}\{v, Av, A^2v, \ldots, A^{k-1}v\}.

Thus:

\mathcal{K}_1(A,v) = \operatorname{span}\{v\}, \qquad \mathcal{K}_2(A,v) = \operatorname{span}\{v, Av\}, \qquad \mathcal{K}_3(A,v) = \operatorname{span}\{v, Av, A^2v\}.

The subspace grows by repeatedly applying A to the most recent direction.
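
To make the construction concrete, here is a small NumPy sketch that stacks the raw Krylov vectors as columns of a matrix; the function name krylov_basis is illustrative, not a library routine.

import numpy as np

def krylov_basis(A, v, k):
    # Columns are v, Av, A^2 v, ..., A^(k-1) v.
    K = np.empty((v.shape[0], k))
    K[:, 0] = v
    for j in range(1, k):
        K[:, j] = A @ K[:, j - 1]   # apply A to the most recent direction
    return K

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
v = rng.standard_normal(5)
print(krylov_basis(A, v, 3).shape)   # (5, 3): a spanning set for K_3(A, v)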

92.2 Nested Structure

Krylov subspaces are nested:

\mathcal{K}_1(A,v) \subseteq \mathcal{K}_2(A,v) \subseteq \mathcal{K}_3(A,v) \subseteq \cdots.

Each new subspace contains all previous directions and possibly one new direction.

The dimension cannot exceed n. Therefore the sequence eventually stops growing:

\dim \mathcal{K}_k(A,v) \le n.

If A^k v lies in the span of the previous vectors, then no new direction is added. Once this happens, the subspace is invariant under A and never grows at any later step.

92.3 Polynomial View

Every vector in \mathcal{K}_k(A,v) can be written as

p(A)v

where p is a polynomial of degree at most k-1.

Indeed,

p(t) = c_0 + c_1 t + \cdots + c_{k-1} t^{k-1}

gives

p(A)v = c_0 v + c_1 Av + \cdots + c_{k-1} A^{k-1} v.

Thus Krylov methods are polynomial methods.

They search for an approximation by choosing a polynomial in A applied to a starting vector.

This polynomial view explains why eigenvalues influence convergence. A Krylov method succeeds when a low-degree polynomial can reduce error on the spectrum of A.
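
The polynomial view can be checked numerically. The sketch below evaluates p(A)v for p(t) = 2 - t + 0.5 t^2 by Horner's rule and compares it with the explicit combination of Krylov vectors (all names here are illustrative).

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
v = rng.standard_normal(6)
c = [2.0, -1.0, 0.5]                      # coefficients c0, c1, c2 of p

# Horner's rule: p(A) v = (c2 A^2 + c1 A + c0 I) v without forming A^2
w = c[2] * v
w = A @ w + c[1] * v
w = A @ w + c[0] * v

direct = c[0] * v + c[1] * (A @ v) + c[2] * (A @ (A @ v))
print(np.allclose(w, direct))             # True: p(A)v lies in K_3(A, v)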

92.4 Krylov Subspaces for Linear Systems

Consider the linear system

Ax = b.

Let x^{(0)} be an initial approximation, and define the initial residual

r^{(0)} = b - Ax^{(0)}.

A Krylov method seeks approximations of the form

x^{(k)} \in x^{(0)} + \mathcal{K}_k(A, r^{(0)}).

This means the correction

x^{(k)} - x^{(0)}

is chosen from the space generated by

r^{(0)}, \quad Ar^{(0)}, \quad A^2 r^{(0)}, \quad \ldots

The method never needs A^{-1}. It only needs products of the form

w \mapsto Aw.

92.5 Why Matrix-Vector Products Matter

For a dense matrix, storing and factoring A may be expensive.

For a sparse matrix, most entries are zero. A matrix-vector product can be computed in approximately

O(\operatorname{nnz}(A))

operations, where \operatorname{nnz}(A) is the number of nonzero entries.

For a matrix-free problem, A may not be stored at all. Instead, the application provides a routine that computes

Av

for any vector v.

Krylov methods are effective in these settings because they can operate using this routine alone.
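
As an illustration of the cost model, the following SciPy sketch builds a large sparse tridiagonal matrix and applies it to a vector; the product touches only the roughly 3n stored nonzeros.

import numpy as np
import scipy.sparse as sp

n = 100_000
# Tridiagonal matrix stored in CSR format: about 3n nonzeros instead of n^2 entries
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
v = np.ones(n)

w = A @ v            # cost proportional to nnz(A)
print(A.nnz)         # roughly 3 * n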

92.6 Projection Principle

Krylov methods choose an approximation from a low-dimensional affine space:

x^{(0)} + \mathcal{K}_k(A, r^{(0)}).

To select one vector from this space, they impose a condition on the residual.

The residual at step k is

r^{(k)} = b - Ax^{(k)}.

A projection method usually requires r^{(k)} to be orthogonal to a test space:

r^{(k)} \perp \mathcal{L}_k.

Different choices of the test space produce different Krylov methods.

Method      Search space                Residual condition
CG          Krylov subspace             energy minimization
MINRES      Krylov subspace             minimal residual for symmetric A
GMRES       Krylov subspace             minimal residual for general A
BiCG        pair of Krylov subspaces    biorthogonality
BiCGSTAB    stabilized BiCG structure   smoothed residual behavior

This projection viewpoint is the common framework behind many Krylov solvers.
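
The following NumPy sketch carries out one such projection by hand for the Galerkin choice L_k = K_k, using an explicit QR factorization of the raw Krylov vectors purely for illustration (practical solvers build the basis with Arnoldi or Lanczos instead).

import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)               # symmetric positive definite test matrix
b = rng.standard_normal(n)
x0 = np.zeros(n)
r0 = b - A @ x0

# Orthonormal basis Q for K_k(A, r0)
K = np.column_stack([np.linalg.matrix_power(A, j) @ r0 for j in range(k)])
Q, _ = np.linalg.qr(K)

# Galerkin condition: residual orthogonal to the search space
y = np.linalg.solve(Q.T @ A @ Q, Q.T @ r0)
x = x0 + Q @ y
print(np.linalg.norm(b - A @ x) / np.linalg.norm(r0))   # residual reduction after k = 5 steps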

92.7 Basis Construction

The raw Krylov sequence

v, \quad Av, \quad A^2 v, \quad \ldots

is usually unsuitable as a computational basis.

The vectors often become nearly linearly dependent. Their magnitudes may also grow or decay rapidly.

For stable computation, Krylov methods build an orthonormal or structured basis.

Two basic procedures are:

Procedure            Matrix class                      Role
Arnoldi iteration    general matrices                  orthonormal Krylov basis
Lanczos iteration    symmetric or Hermitian matrices   short-recurrence Krylov basis

Arnoldi is more general. Lanczos is cheaper when symmetry is available.

92.8 Arnoldi Iteration

Arnoldi iteration constructs an orthonormal basis

q_1, q_2, \ldots, q_k

for

\mathcal{K}_k(A,v).

Start with

q_1 = \frac{v}{\|v\|_2}.

At step k, compute

w = Aq_k.

Then orthogonalize w against the previous basis vectors:

h_{ik} = q_i^T w, \qquad i = 1, \ldots, k.

Update

w \leftarrow w - \sum_{i=1}^k h_{ik} q_i.

Set

h_{k+1,k} = \|w\|_2.

If

h_{k+1,k} \ne 0,

define

q_{k+1} = \frac{w}{h_{k+1,k}}.

The Arnoldi relation is

AQ_k = Q_{k+1} H_{k+1,k},

where Q_k has columns q_1, \ldots, q_k, and H_{k+1,k} is the (k+1)-by-k upper Hessenberg matrix with entries h_{ij}.

92.9 Arnoldi Pseudocode

arnoldi(A, v, m):
    q[1] = v / norm(v)

    for k = 1 to m:
        w = A * q[k]

        for i = 1 to k:
            h[i,k] = dot(q[i], w)
            w = w - h[i,k] * q[i]

        h[k+1,k] = norm(w)

        if h[k+1,k] == 0:
            stop

        q[k+1] = w / h[k+1,k]

    return q, h
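
A runnable NumPy version of this pseudocode, offered as a sketch (the array shapes and the breakdown handling are one reasonable choice, not the only one):

import numpy as np

def arnoldi(A, v, m):
    # Builds Q (n x (m+1)) with orthonormal columns and H ((m+1) x m) upper Hessenberg
    # satisfying A Q[:, :m] = Q H.
    n = v.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = v / np.linalg.norm(v)
    for k in range(m):
        w = A @ Q[:, k]
        for i in range(k + 1):               # orthogonalize against previous vectors
            H[i, k] = Q[:, i] @ w
            w = w - H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] == 0.0:               # breakdown: the Krylov subspace is invariant
            return Q[:, :k + 1], H[:k + 1, :k + 1]
        Q[:, k + 1] = w / H[k + 1, k]
    return Q, H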

Arnoldi is the basis construction behind GMRES and several eigenvalue algorithms.

Its cost grows with k, because each new vector must be orthogonalized against all previous basis vectors.

92.10 Lanczos Iteration

When A is symmetric, Arnoldi simplifies.

The Hessenberg matrix becomes tridiagonal. The new basis vector can be computed using only a three-term recurrence.

Lanczos iteration has the form:

\beta_{k+1} q_{k+1} = Aq_k - \alpha_k q_k - \beta_k q_{k-1},

where

\alpha_k = q_k^T A q_k.

The tridiagonal relation is

AQ_k = Q_k T_k + \beta_{k+1} q_{k+1} e_k^T.

Lanczos is the basis process behind conjugate gradient and MINRES. For Hermitian matrices, the Arnoldi method reduces to Lanczos and only the two previous basis vectors are needed in the recurrence.

92.11 Lanczos Pseudocode

lanczos(A, v, m):
    q[0] = zero vector
    q[1] = v / norm(v)
    beta[1] = 0

    for k = 1 to m:
        w = A * q[k] - beta[k] * q[k-1]
        alpha[k] = dot(q[k], w)
        w = w - alpha[k] * q[k]

        beta[k+1] = norm(w)

        if beta[k+1] == 0:
            stop

        q[k+1] = w / beta[k+1]

    return q, alpha, beta
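
A runnable NumPy version of this pseudocode, again as a sketch and without any reorthogonalization (see Section 92.21):

import numpy as np

def lanczos(A, v, m):
    # Three-term recurrence for symmetric A; alpha and beta define the tridiagonal T_m.
    n = v.shape[0]
    Q = np.zeros((n, m + 1))
    alpha = np.zeros(m)
    beta = np.zeros(m + 1)                   # beta[0] = 0 by convention
    Q[:, 0] = v / np.linalg.norm(v)
    q_prev = np.zeros(n)
    for k in range(m):
        w = A @ Q[:, k] - beta[k] * q_prev
        alpha[k] = Q[:, k] @ w
        w = w - alpha[k] * Q[:, k]
        beta[k + 1] = np.linalg.norm(w)
        if beta[k + 1] == 0.0:               # invariant subspace found
            return Q[:, :k + 1], alpha[:k + 1], beta[:k + 2]
        q_prev = Q[:, k]
        Q[:, k + 1] = w / beta[k + 1]
    return Q, alpha, beta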

Lanczos is cheaper than Arnoldi, but it is more delicate in floating point arithmetic. Loss of orthogonality may affect computed eigenvalues and convergence diagnostics.

92.12 GMRES

GMRES stands for generalized minimal residual method.

It applies to general square systems

Ax = b.

At step k, GMRES chooses

x^{(k)} \in x^{(0)} + \mathcal{K}_k(A, r^{(0)})

to minimize the residual norm

\|b - Ax^{(k)}\|_2.

Arnoldi iteration reduces this large residual minimization problem to a small least squares problem involving the Hessenberg matrix H_{k+1,k}. This is the standard computational structure of GMRES.
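
To make this structure explicit, here is a minimal NumPy sketch of a single GMRES(m) cycle; it assumes no breakdown, omits the usual Givens-rotation updates, and the function name is illustrative only.

import numpy as np

def gmres_cycle(A, b, x0, m):
    # Arnoldi on K_m(A, r0), then a small (m+1) x m least-squares problem.
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    n = b.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = r0 / beta
    for k in range(m):
        w = A @ Q[:, k]
        for i in range(k + 1):
            H[i, k] = Q[:, i] @ w
            w = w - H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        Q[:, k + 1] = w / H[k + 1, k]
    # Minimize || beta * e1 - H y ||_2 over y and map back to the full space.
    e1 = np.zeros(m + 1)
    e1[0] = beta
    y = np.linalg.lstsq(H, e1, rcond=None)[0]
    return x0 + Q[:, :m] @ y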

92.13 Conjugate Gradient as a Krylov Method

For symmetric positive definite A, conjugate gradient also searches in Krylov spaces:

x^{(k)} \in x^{(0)} + \mathcal{K}_k(A, r^{(0)}).

It selects the vector that minimizes the error in the energy norm:

\|x - x^{(k)}\|_A.

Equivalently, it minimizes the quadratic objective associated with Ax = b.

CG can be derived from Lanczos iteration. This connection explains its short recurrence and low storage cost.
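
For reference, here is a minimal, unpreconditioned CG sketch in NumPy following the standard textbook recurrences (the tolerance and iteration cap are arbitrary choices):

import numpy as np

def cg(A, b, x0, tol=1e-8, maxiter=500):
    # Conjugate gradient for symmetric positive definite A.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rs / (p @ Ap)            # step length along the search direction
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p        # next A-conjugate search direction
        rs = rs_new
    return x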

92.14 MINRES

MINRES is used for symmetric indefinite systems.

Like CG, it uses Lanczos iteration. Unlike CG, it does not require positive definiteness.

At each step, MINRES chooses an approximation that minimizes the residual norm.

Thus MINRES is often appropriate when

A = A^T

but A has both positive and negative eigenvalues.

92.15 BiCG and BiCGSTAB

For nonsymmetric systems, one may use two-sided Krylov methods.

BiCG constructs coupled Krylov subspaces for A and A^T. It enforces biorthogonality between residual sequences.

BiCGSTAB modifies this structure to smooth irregular convergence behavior.

These methods can be cheaper per iteration than GMRES because they use short recurrences. However, they may be less robust.

92.16 Eigenvalue Problems

Krylov subspaces are also used to approximate eigenvalues.

Given

Av = \lambda v,

large-scale algorithms often seek a few eigenvalues rather than the full spectrum.

Arnoldi and Lanczos project A onto a lower-dimensional Krylov subspace. The eigenvalues of the projected small matrix approximate selected eigenvalues of A.

These approximations are called Ritz values.

This is the basis of many large-scale eigensolvers.

92.17 Ritz Values and Ritz Vectors

Let Q_k be a matrix whose columns form an orthonormal basis for a Krylov subspace.

The projected matrix is

B_k = Q_k^T A Q_k.

If

B_k y = \theta y,

then

\theta

is a Ritz value, and

Q_k y

is a Ritz vector.

Ritz pairs approximate eigenpairs of A.

This reduces a large eigenvalue problem to a small projected eigenvalue problem.
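
A small NumPy sketch of this projection: build an orthonormal Krylov basis for a symmetric test matrix, form the projected matrix, and compare its extreme eigenvalues with those of A (the Gram-Schmidt loop here is a simplified stand-in for Lanczos).

import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 20
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                          # symmetric test matrix
v = rng.standard_normal(n)

# Orthonormal Krylov basis by Gram-Schmidt, with one reorthogonalization pass
Q = np.empty((n, k))
Q[:, 0] = v / np.linalg.norm(v)
for j in range(1, k):
    w = A @ Q[:, j - 1]
    for _ in range(2):
        w = w - Q[:, :j] @ (Q[:, :j].T @ w)
    Q[:, j] = w / np.linalg.norm(w)

B = Q.T @ A @ Q                            # projected k x k matrix
theta = np.linalg.eigvalsh(B)              # Ritz values
lam = np.linalg.eigvalsh(A)                # full spectrum, for comparison
print(theta[-1], lam[-1])                  # extreme Ritz values approach extreme eigenvalues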

92.18 Convergence and Eigenvalues

Krylov convergence depends strongly on spectral properties.

For linear systems, convergence is fast when a low-degree polynomial can be small on most eigenvalues of A while taking the value 1 at zero, which is the constraint satisfied by the residual polynomial.

For CG, clustered eigenvalues often give rapid convergence.

For GMRES, nonnormality may complicate convergence. Eigenvalues alone may not fully predict behavior.

For eigenvalue problems, extremal eigenvalues often converge first in Lanczos and Arnoldi methods.

92.19 Preconditioning

Preconditioning is essential in many Krylov solvers.

Instead of solving

Ax = b,

one introduces a matrix M that approximates A and is easier to invert.

A left-preconditioned system is

M^{-1} A x = M^{-1} b.

The Krylov subspace becomes

\mathcal{K}_k(M^{-1}A, M^{-1}r^{(0)}).

The goal is to improve the spectrum or geometry of the operator so that fewer Krylov steps are required.
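
As one concrete illustration with SciPy, the sketch below solves a sparse symmetric positive definite system with CG and a Jacobi (diagonal) preconditioner; the M argument of scipy.sparse.linalg.cg expects an operator that approximates the inverse of A.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg

n = 10_000
A = sp.diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Jacobi preconditioner: M = diag(A), so applying M^{-1} is an elementwise division
d = A.diagonal()
M_inv = LinearOperator((n, n), matvec=lambda v: v / d)

x, info = cg(A, b, M=M_inv)
print(info)        # 0 indicates the solver reported convergence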

92.20 Restarting

Some Krylov methods, especially GMRES, store all previous basis vectors.

After k steps, GMRES stores

q_1, \ldots, q_k.

The orthogonalization cost also grows with k.

Restarting limits memory by running the method for a fixed number of steps, then restarting with the latest approximation.

This gives restarted GMRES, often written

\operatorname{GMRES}(m).

Restarting controls memory but may slow or stall convergence.

92.21 Loss of Orthogonality

In exact arithmetic, Arnoldi and Lanczos produce orthogonal basis vectors.

In floating point arithmetic, orthogonality may degrade.

For Arnoldi, reorthogonalization can control this.

For Lanczos, loss of orthogonality may cause repeated or spurious Ritz values.

Practical implementations must balance accuracy, memory, and cost.
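
A simple diagnostic is to measure how far the computed basis drifts from orthonormality, for example with the small sketch below (Q is assumed to come from an Arnoldi or Lanczos run such as the ones above).

import numpy as np

def orthogonality_loss(Q):
    # Zero in exact arithmetic; growth signals loss of orthogonality.
    k = Q.shape[1]
    return np.linalg.norm(Q.T @ Q - np.eye(k))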

92.22 Matrix-Free Computation

Krylov methods are particularly useful when the matrix is implicit.

Instead of storing A, the algorithm only needs a function:

v \mapsto Av.

This is common in:

Application         Matrix-vector product
PDE simulation      apply discretized operator
optimization        Hessian-vector product
inverse problems    forward and adjoint model
graph algorithms    sparse adjacency action
machine learning    Jacobian-vector or Hessian-vector product

Matrix-free Krylov methods allow computations far beyond the size of explicit dense matrices.
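
A small NumPy sketch of a matrix-free operator: the action of a 1-D discrete Laplacian is computed directly from the vector, and no matrix is ever formed. Such a function can be handed to any Krylov code that only needs products w = Av, for example by wrapping it in SciPy's LinearOperator.

import numpy as np

def laplacian_1d(v):
    # Action of the tridiagonal [-1, 2, -1] operator without storing a matrix.
    w = 2.0 * v
    w[:-1] -= v[1:]
    w[1:] -= v[:-1]
    return w

v = np.random.default_rng(4).standard_normal(1000)
w = laplacian_1d(v)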

92.23 Practical Method Selection

A practical Krylov method is chosen according to the structure of the matrix.

Matrix property                            Typical method
Symmetric positive definite                CG
Symmetric indefinite                       MINRES
General nonsymmetric                       GMRES
Large nonsymmetric with memory pressure    BiCGSTAB
Eigenvalues of symmetric matrix            Lanczos
Eigenvalues of general matrix              Arnoldi
Ill-conditioned system                     preconditioned Krylov method

The method should match the algebraic structure of the operator.

92.24 Failure Modes

Krylov methods may fail or slow down for several reasons.

Failure mode             Cause
Slow convergence         poor spectrum or conditioning
Stagnation               restart too small or bad preconditioner
Breakdown                recurrence denominator becomes zero
Loss of orthogonality    floating point effects
Memory growth            long Arnoldi or unrestarted GMRES
Misleading residual      ill-conditioned system
Non-normality            eigenvalues poorly predict behavior

Diagnosis usually requires residual history, conditioning estimates, and knowledge of matrix structure.

92.25 Summary

A Krylov subspace is

\mathcal{K}_k(A,v) = \operatorname{span}\{v, Av, A^2v, \ldots, A^{k-1}v\}.

Krylov methods solve large linear algebra problems by searching in these spaces.

Their main advantages are:

Advantage            Meaning
Matrix-vector only   no factorization required
Sparse friendly      cost depends on nonzeros
Matrix-free          works with implicit operators
Adaptive             subspace grows from the residual
Broad                supports systems and eigenvalue problems

The main basis processes are Arnoldi for general matrices and Lanczos for symmetric or Hermitian matrices. The main solvers include CG, MINRES, GMRES, and BiCGSTAB.

Krylov subspaces provide the algebraic structure behind much of modern large-scale numerical linear algebra.