
Chapter 92. Krylov Subspaces

Krylov subspaces are the search spaces used by many modern iterative methods for large linear systems and eigenvalue problems. They are generated by repeatedly applying a matrix to a starting vector.

For a square matrix A and a vector v, the k-th Krylov subspace is

\mathcal{K}_k(A,v) = \operatorname{span}\{v, Av, A^2v, \ldots, A^{k-1}v\}.

These spaces are central because constructing them requires only matrix-vector multiplication. This makes Krylov methods suitable for large sparse matrices and matrix-free problems, where forming or factoring the full matrix is too expensive. Krylov subspace methods include conjugate gradient (CG), MINRES, GMRES, BiCGSTAB, Arnoldi iteration, and Lanczos iteration.

92.1 Definition

Let

A \in \mathbb{R}^{n\times n}

and let

v \in \mathbb{R}^n.

The k-th Krylov subspace generated by A and v is

\mathcal{K}_k(A,v) = \operatorname{span}\{v, Av, A^2v, \ldots, A^{k-1}v\}.

Thus:

\mathcal{K}_1(A,v) = \operatorname{span}\{v\}, \qquad \mathcal{K}_2(A,v) = \operatorname{span}\{v, Av\}, \qquad \mathcal{K}_3(A,v) = \operatorname{span}\{v, Av, A^2v\}.

The subspace grows by repeatedly applying A to the most recent direction.
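
To make the construction concrete, here is a small NumPy sketch that stacks the raw Krylov vectors as columns of a matrix; the function name krylov_basis is illustrative, not a library routine.

import numpy as np

def krylov_basis(A, v, k):
    # Columns are v, Av, A^2 v, ..., A^(k-1) v.
    K = np.empty((v.shape[0], k))
    K[:, 0] = v
    for j in range(1, k):
        K[:, j] = A @ K[:, j - 1]   # apply A to the most recent direction
    return K

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
v = rng.standard_normal(5)
print(krylov_basis(A, v, 3).shape)   # (5, 3): a spanning set for K_3(A, v)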

92.2 Nested Structure

Krylov subspaces are nested:

\mathcal{K}_1(A,v) \subseteq \mathcal{K}_2(A,v) \subseteq \mathcal{K}_3(A,v) \subseteq \cdots.

Each new subspace contains all previous directions and possibly one new direction.

The dimension cannot exceed n. Therefore the sequence eventually stops growing:

\dim \mathcal{K}_k(A,v) \le n.

If A^k v lies in the span of the previous vectors, then no new direction is added. Once this happens, the subspace is invariant under A and never grows at any later step.

92.3 Polynomial View

Every vector in \mathcal{K}_k(A,v) can be written as

p(A)v

where p is a polynomial of degree at most k-1.

Indeed,

p(t) = c_0 + c_1 t + \cdots + c_{k-1} t^{k-1}

gives

p(A)v = c_0 v + c_1 Av + \cdots + c_{k-1} A^{k-1} v.

Thus Krylov methods are polynomial methods.

They search for an approximation by choosing a polynomial in A applied to a starting vector.

This polynomial view explains why eigenvalues influence convergence. A Krylov method succeeds when a low-degree polynomial can reduce error on the spectrum of A.
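
The polynomial view can be checked numerically. The sketch below evaluates p(A)v for p(t) = 2 - t + 0.5 t^2 by Horner's rule and compares it with the explicit combination of Krylov vectors (all names here are illustrative).

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
v = rng.standard_normal(6)
c = [2.0, -1.0, 0.5]                      # coefficients c0, c1, c2 of p

# Horner's rule: p(A) v = (c2 A^2 + c1 A + c0 I) v without forming A^2
w = c[2] * v
w = A @ w + c[1] * v
w = A @ w + c[0] * v

direct = c[0] * v + c[1] * (A @ v) + c[2] * (A @ (A @ v))
print(np.allclose(w, direct))             # True: p(A)v lies in K_3(A, v)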

92.4 Krylov Subspaces for Linear Systems

Consider the linear system

Ax = b.

Let x^{(0)} be an initial approximation, and define the initial residual

r^{(0)} = b - Ax^{(0)}.

A Krylov method seeks approximations of the form

x^{(k)} \in x^{(0)} + \mathcal{K}_k(A, r^{(0)}).

This means the correction

x^{(k)} - x^{(0)}

is chosen from the space generated by

r^{(0)}, \quad Ar^{(0)}, \quad A^2 r^{(0)}, \quad \ldots

The method never needs A^{-1}. It only needs products of the form

w \mapsto Aw.

92.5 Why Matrix-Vector Products Matter

For a dense matrix, storing and factoring A may be expensive.

For a sparse matrix, most entries are zero. A matrix-vector product can be computed in approximately

O(\operatorname{nnz}(A))

operations, where \operatorname{nnz}(A) is the number of nonzero entries.

For a matrix-free problem, A may not be stored at all. Instead, the application provides a routine that computes

Av

for any vector v.

Krylov methods are effective in these settings because they can operate using this routine alone.
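
As an illustration of the cost model, the following SciPy sketch builds a large sparse tridiagonal matrix and applies it to a vector; the product touches only the roughly 3n stored nonzeros.

import numpy as np
import scipy.sparse as sp

n = 100_000
# Tridiagonal matrix stored in CSR format: about 3n nonzeros instead of n^2 entries
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
v = np.ones(n)

w = A @ v            # cost proportional to nnz(A)
print(A.nnz)         # roughly 3 * n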

92.6 Projection Principle

Krylov methods choose an approximation from a low-dimensional affine space:

x^{(0)} + \mathcal{K}_k(A, r^{(0)}).

To select one vector from this space, they impose a condition on the residual.

The residual at step k is

r^{(k)} = b - Ax^{(k)}.

A projection method usually requires r^{(k)} to be orthogonal to a test space:

r^{(k)} \perp \mathcal{L}_k.

Different choices of the test space produce different Krylov methods.

Method      Search space                Residual condition
CG          Krylov subspace             energy minimization
MINRES      Krylov subspace             minimal residual for symmetric A
GMRES       Krylov subspace             minimal residual for general A
BiCG        pair of Krylov subspaces    biorthogonality
BiCGSTAB    stabilized BiCG structure   smoothed residual behavior

This projection viewpoint is the common framework behind many Krylov solvers.
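
The following NumPy sketch carries out one such projection by hand for the Galerkin choice L_k = K_k, using an explicit QR factorization of the raw Krylov vectors purely for illustration (practical solvers build the basis with Arnoldi or Lanczos instead).

import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)               # symmetric positive definite test matrix
b = rng.standard_normal(n)
x0 = np.zeros(n)
r0 = b - A @ x0

# Orthonormal basis Q for K_k(A, r0)
K = np.column_stack([np.linalg.matrix_power(A, j) @ r0 for j in range(k)])
Q, _ = np.linalg.qr(K)

# Galerkin condition: residual orthogonal to the search space
y = np.linalg.solve(Q.T @ A @ Q, Q.T @ r0)
x = x0 + Q @ y
print(np.linalg.norm(b - A @ x) / np.linalg.norm(r0))   # residual reduction after k = 5 steps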

92.7 Basis Construction

The raw Krylov sequence

v, \quad Av, \quad A^2 v, \quad \ldots

is usually unsuitable as a computational basis.

The vectors often become nearly linearly dependent. Their magnitudes may also grow or decay rapidly.

For stable computation, Krylov methods build an orthonormal or structured basis.

Two basic procedures are:

Procedure            Matrix class                      Role
Arnoldi iteration    general matrices                  orthonormal Krylov basis
Lanczos iteration    symmetric or Hermitian matrices   short-recurrence Krylov basis

Arnoldi is more general. Lanczos is cheaper when symmetry is available.

92.8 Arnoldi Iteration

Arnoldi iteration constructs an orthonormal basis

q_1, q_2, \ldots, q_k

for

\mathcal{K}_k(A,v).

Start with

q_1 = \frac{v}{\|v\|_2}.

At step k, compute

w = Aq_k.

Then orthogonalize w against the previous basis vectors:

h_{ik} = q_i^T w, \qquad i = 1, \ldots, k.

Update

w \leftarrow w - \sum_{i=1}^k h_{ik} q_i.

Set

h_{k+1,k} = \|w\|_2.

If

h_{k+1,k} \ne 0,

define

q_{k+1} = \frac{w}{h_{k+1,k}}.

The Arnoldi relation is

AQ_k = Q_{k+1} H_{k+1,k},

where Q_k has columns q_1, \ldots, q_k, and H_{k+1,k} is the (k+1)-by-k upper Hessenberg matrix with entries h_{ij}.

92.9 Arnoldi Pseudocode

arnoldi(A, v, m):
    q[1] = v / norm(v)

    for k = 1 to m:
        w = A * q[k]

        for i = 1 to k:
            h[i,k] = dot(q[i], w)
            w = w - h[i,k] * q[i]

        h[k+1,k] = norm(w)

        if h[k+1,k] == 0:
            stop

        q[k+1] = w / h[k+1,k]

    return q, h
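
A runnable NumPy version of this pseudocode, offered as a sketch (the array shapes and the breakdown handling are one reasonable choice, not the only one):

import numpy as np

def arnoldi(A, v, m):
    # Builds Q (n x (m+1)) with orthonormal columns and H ((m+1) x m) upper Hessenberg
    # satisfying A Q[:, :m] = Q H.
    n = v.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = v / np.linalg.norm(v)
    for k in range(m):
        w = A @ Q[:, k]
        for i in range(k + 1):               # orthogonalize against previous vectors
            H[i, k] = Q[:, i] @ w
            w = w - H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] == 0.0:               # breakdown: the Krylov subspace is invariant
            return Q[:, :k + 1], H[:k + 1, :k + 1]
        Q[:, k + 1] = w / H[k + 1, k]
    return Q, H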

Arnoldi is the basis construction behind GMRES and several eigenvalue algorithms.

Its cost grows with k, because each new vector must be orthogonalized against all previous basis vectors.

92.10 Lanczos Iteration

When A is symmetric, Arnoldi simplifies.

The Hessenberg matrix becomes tridiagonal. The new basis vector can be computed using only a three-term recurrence.

Lanczos iteration has the form:

\beta_{k+1} q_{k+1} = Aq_k - \alpha_k q_k - \beta_k q_{k-1},

where

\alpha_k = q_k^T A q_k.

The tridiagonal relation is

AQ_k = Q_k T_k + \beta_{k+1} q_{k+1} e_k^T.

Lanczos is the basis process behind conjugate gradient and MINRES. For Hermitian matrices, the Arnoldi method reduces to Lanczos and only the two previous basis vectors are needed in the recurrence.

92.11 Lanczos Pseudocode

lanczos(A, v, m):
    q[0] = zero vector
    q[1] = v / norm(v)
    beta[1] = 0

    for k = 1 to m:
        w = A * q[k] - beta[k] * q[k-1]
        alpha[k] = dot(q[k], w)
        w = w - alpha[k] * q[k]

        beta[k+1] = norm(w)

        if beta[k+1] == 0:
            stop

        q[k+1] = w / beta[k+1]

    return q, alpha, beta
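
A runnable NumPy version of this pseudocode, again as a sketch and without any reorthogonalization (see Section 92.21):

import numpy as np

def lanczos(A, v, m):
    # Three-term recurrence for symmetric A; alpha and beta define the tridiagonal T_m.
    n = v.shape[0]
    Q = np.zeros((n, m + 1))
    alpha = np.zeros(m)
    beta = np.zeros(m + 1)                   # beta[0] = 0 by convention
    Q[:, 0] = v / np.linalg.norm(v)
    q_prev = np.zeros(n)
    for k in range(m):
        w = A @ Q[:, k] - beta[k] * q_prev
        alpha[k] = Q[:, k] @ w
        w = w - alpha[k] * Q[:, k]
        beta[k + 1] = np.linalg.norm(w)
        if beta[k + 1] == 0.0:               # invariant subspace found
            return Q[:, :k + 1], alpha[:k + 1], beta[:k + 2]
        q_prev = Q[:, k]
        Q[:, k + 1] = w / beta[k + 1]
    return Q, alpha, beta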

Lanczos is cheaper than Arnoldi, but it is more delicate in floating point arithmetic. Loss of orthogonality may affect computed eigenvalues and convergence diagnostics.

92.12 GMRES

GMRES stands for generalized minimal residual method.

It applies to general square systems

Ax = b.

At step k, GMRES chooses

x^{(k)} \in x^{(0)} + \mathcal{K}_k(A, r^{(0)})

to minimize the residual norm

\|b - Ax^{(k)}\|_2.

Arnoldi iteration reduces this large residual minimization problem to a small least squares problem involving the Hessenberg matrix H_{k+1,k}. This is the standard computational structure of GMRES.
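
To make this structure explicit, here is a minimal NumPy sketch of a single GMRES(m) cycle; it assumes no breakdown, omits the usual Givens-rotation updates, and the function name is illustrative only.

import numpy as np

def gmres_cycle(A, b, x0, m):
    # Arnoldi on K_m(A, r0), then a small (m+1) x m least-squares problem.
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    n = b.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = r0 / beta
    for k in range(m):
        w = A @ Q[:, k]
        for i in range(k + 1):
            H[i, k] = Q[:, i] @ w
            w = w - H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        Q[:, k + 1] = w / H[k + 1, k]
    # Minimize || beta * e1 - H y ||_2 over y and map back to the full space.
    e1 = np.zeros(m + 1)
    e1[0] = beta
    y = np.linalg.lstsq(H, e1, rcond=None)[0]
    return x0 + Q[:, :m] @ y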

92.13 Conjugate Gradient as a Krylov Method

For symmetric positive definite A, conjugate gradient also searches in Krylov spaces:

x^{(k)} \in x^{(0)} + \mathcal{K}_k(A, r^{(0)}).

It selects the vector that minimizes the error in the energy norm:

\|x - x^{(k)}\|_A.

Equivalently, it minimizes the quadratic objective associated with Ax = b.

CG can be derived from Lanczos iteration. This connection explains its short recurrence and low storage cost.
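
For reference, here is a minimal, unpreconditioned CG sketch in NumPy following the standard textbook recurrences (the tolerance and iteration cap are arbitrary choices):

import numpy as np

def cg(A, b, x0, tol=1e-8, maxiter=500):
    # Conjugate gradient for symmetric positive definite A.
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rs / (p @ Ap)            # step length along the search direction
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p        # next A-conjugate search direction
        rs = rs_new
    return x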

92.14 MINRES

MINRES is used for symmetric indefinite systems.

Like CG, it uses Lanczos iteration. Unlike CG, it does not require positive definiteness.

At each step, MINRES chooses an approximation that minimizes the residual norm.

Thus MINRES is often appropriate when

A = A^T

but A has both positive and negative eigenvalues.

92.15 BiCG and BiCGSTAB

For nonsymmetric systems, one may use two-sided Krylov methods.

BiCG constructs coupled Krylov subspaces for A and A^T. It enforces biorthogonality between residual sequences.

BiCGSTAB modifies this structure to smooth irregular convergence behavior.

These methods can be cheaper per iteration than GMRES because they use short recurrences. However, they may be less robust.

92.16 Eigenvalue Problems

Krylov subspaces are also used to approximate eigenvalues.

Given

Av = \lambda v,

large-scale algorithms often seek a few eigenvalues rather than the full spectrum.

Arnoldi and Lanczos project A onto a lower-dimensional Krylov subspace. The eigenvalues of the projected small matrix approximate selected eigenvalues of A.

These approximations are called Ritz values.

This is the basis of many large-scale eigensolvers.

92.17 Ritz Values and Ritz Vectors

Let Q_k be a matrix whose columns form an orthonormal basis for a Krylov subspace.

The projected matrix is

B_k = Q_k^T A Q_k.

If

B_k y = \theta y,

then

\theta

is a Ritz value, and

Q_k y

is a Ritz vector.

Ritz pairs approximate eigenpairs of A.

This reduces a large eigenvalue problem to a small projected eigenvalue problem.
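
A small NumPy sketch of this projection: build an orthonormal Krylov basis for a symmetric test matrix, form the projected matrix, and compare its extreme eigenvalues with those of A (the Gram-Schmidt loop here is a simplified stand-in for Lanczos).

import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 20
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                          # symmetric test matrix
v = rng.standard_normal(n)

# Orthonormal Krylov basis by Gram-Schmidt, with one reorthogonalization pass
Q = np.empty((n, k))
Q[:, 0] = v / np.linalg.norm(v)
for j in range(1, k):
    w = A @ Q[:, j - 1]
    for _ in range(2):
        w = w - Q[:, :j] @ (Q[:, :j].T @ w)
    Q[:, j] = w / np.linalg.norm(w)

B = Q.T @ A @ Q                            # projected k x k matrix
theta = np.linalg.eigvalsh(B)              # Ritz values
lam = np.linalg.eigvalsh(A)                # full spectrum, for comparison
print(theta[-1], lam[-1])                  # extreme Ritz values approach extreme eigenvalues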

92.18 Convergence and Eigenvalues

Krylov convergence depends strongly on spectral properties.

For linear systems, convergence is fast when a low-degree polynomial can be small on most eigenvalues of A while taking the value 1 at zero, which is the constraint satisfied by the residual polynomial.

For CG, clustered eigenvalues often give rapid convergence.

For GMRES, nonnormality may complicate convergence. Eigenvalues alone may not fully predict behavior.

For eigenvalue problems, extremal eigenvalues often converge first in Lanczos and Arnoldi methods.

92.19 Preconditioning

Preconditioning is essential in many Krylov solvers.

Instead of solving

Ax = b,

one introduces a matrix M that approximates A and is easier to invert.

A left-preconditioned system is

M^{-1} A x = M^{-1} b.

The Krylov subspace becomes

\mathcal{K}_k(M^{-1}A, M^{-1}r^{(0)}).

The goal is to improve the spectrum or geometry of the operator so that fewer Krylov steps are required.
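
As one concrete illustration with SciPy, the sketch below solves a sparse symmetric positive definite system with CG and a Jacobi (diagonal) preconditioner; the M argument of scipy.sparse.linalg.cg expects an operator that approximates the inverse of A.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg

n = 10_000
A = sp.diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Jacobi preconditioner: M = diag(A), so applying M^{-1} is an elementwise division
d = A.diagonal()
M_inv = LinearOperator((n, n), matvec=lambda v: v / d)

x, info = cg(A, b, M=M_inv)
print(info)        # 0 indicates the solver reported convergence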

92.20 Restarting

Some Krylov methods, especially GMRES, store all previous basis vectors.

After k steps, GMRES stores

q_1, \ldots, q_k.

The orthogonalization cost also grows with k.

Restarting limits memory by running the method for a fixed number of steps, then restarting with the latest approximation.

This gives restarted GMRES, often written

\operatorname{GMRES}(m).

Restarting controls memory but may slow or stall convergence.

92.21 Loss of Orthogonality

In exact arithmetic, Arnoldi and Lanczos produce orthogonal basis vectors.

In floating point arithmetic, orthogonality may degrade.

For Arnoldi, reorthogonalization can control this.

For Lanczos, loss of orthogonality may cause repeated or spurious Ritz values.

Practical implementations must balance accuracy, memory, and cost.
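
A simple diagnostic is to measure how far the computed basis drifts from orthonormality, for example with the small sketch below (Q is assumed to come from an Arnoldi or Lanczos run such as the ones above).

import numpy as np

def orthogonality_loss(Q):
    # Zero in exact arithmetic; growth signals loss of orthogonality.
    k = Q.shape[1]
    return np.linalg.norm(Q.T @ Q - np.eye(k))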

92.22 Matrix-Free Computation

Krylov methods are particularly useful when the matrix is implicit.

Instead of storing A, the algorithm only needs a function:

v \mapsto Av.

This is common in:

Application         Matrix-vector product
PDE simulation      apply discretized operator
optimization        Hessian-vector product
inverse problems    forward and adjoint model
graph algorithms    sparse adjacency action
machine learning    Jacobian-vector or Hessian-vector product

Matrix-free Krylov methods allow computations far beyond the size of explicit dense matrices.
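
A small NumPy sketch of a matrix-free operator: the action of a 1-D discrete Laplacian is computed directly from the vector, and no matrix is ever formed. Such a function can be handed to any Krylov code that only needs products w = Av, for example by wrapping it in SciPy's LinearOperator.

import numpy as np

def laplacian_1d(v):
    # Action of the tridiagonal [-1, 2, -1] operator without storing a matrix.
    w = 2.0 * v
    w[:-1] -= v[1:]
    w[1:] -= v[:-1]
    return w

v = np.random.default_rng(4).standard_normal(1000)
w = laplacian_1d(v)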

92.23 Practical Method Selection

A practical Krylov method is chosen according to the structure of the matrix.

Matrix property                            Typical method
Symmetric positive definite                CG
Symmetric indefinite                       MINRES
General nonsymmetric                       GMRES
Large nonsymmetric with memory pressure    BiCGSTAB
Eigenvalues of symmetric matrix            Lanczos
Eigenvalues of general matrix              Arnoldi
Ill-conditioned system                     preconditioned Krylov method

The method should match the algebraic structure of the operator.

92.24 Failure Modes

Krylov methods may fail or slow down for several reasons.

Failure mode             Cause
Slow convergence         poor spectrum or conditioning
Stagnation               restart too small or bad preconditioner
Breakdown                recurrence denominator becomes zero
Loss of orthogonality    floating point effects
Memory growth            long Arnoldi or unrestarted GMRES
Misleading residual      ill-conditioned system
Non-normality            eigenvalues poorly predict behavior

Diagnosis usually requires residual history, conditioning estimates, and knowledge of matrix structure.

92.25 Summary

A Krylov subspace is

\mathcal{K}_k(A,v) = \operatorname{span}\{v, Av, A^2v, \ldots, A^{k-1}v\}.

Krylov methods solve large linear algebra problems by searching in these spaces.

Their main advantages are:

Advantage            Meaning
Matrix-vector only   no factorization required
Sparse friendly      cost depends on nonzeros
Matrix-free          works with implicit operators
Adaptive             subspace grows from the residual
Broad                supports systems and eigenvalue problems

The main basis processes are Arnoldi for general matrices and Lanczos for symmetric or Hermitian matrices. The main solvers include CG, MINRES, GMRES, and BiCGSTAB.

Krylov subspaces provide the algebraic structure behind much of modern large-scale numerical linear algebra.