# Chapter 92. Krylov Subspaces

Krylov subspaces are the search spaces used by many modern iterative methods for large linear systems and eigenvalue problems. They are generated by repeatedly applying a matrix to a starting vector.

For a square matrix \(A\) and a vector \(v\), the \(k\)-th Krylov subspace is

$$
\mathcal{K}_k(A,v) =
\operatorname{span}\{v, Av, A^2v,\ldots,A^{k-1}v\}.
$$

These spaces are central because building them requires only matrix-vector products. This makes them suitable for large sparse matrices and matrix-free problems, where forming or factoring the full matrix is too expensive. Krylov subspace methods include conjugate gradient, MINRES, GMRES, BiCGSTAB, Arnoldi iteration, and Lanczos iteration.

## 92.1 Definition

Let

$$
A\in \mathbb{R}^{n\times n}
$$

and let

$$
v\in \mathbb{R}^n.
$$

The \(k\)-th Krylov subspace generated by \(A\) and \(v\) is

$$
\mathcal{K}_k(A,v) =
\operatorname{span}
\{v,Av,A^2v,\ldots,A^{k-1}v\}.
$$

Thus:

$$
\mathcal{K}_1(A,v)=\operatorname{span}\{v\},
$$

$$
\mathcal{K}_2(A,v)=\operatorname{span}\{v,Av\},
$$

$$
\mathcal{K}_3(A,v)=\operatorname{span}\{v,Av,A^2v\}.
$$

The subspace grows by repeatedly applying \(A\) to the most recent direction.
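
As a concrete illustration, the following numpy sketch builds the matrix whose columns are \(v, Av, \ldots, A^{k-1}v\); the helper name `krylov_matrix` is introduced only for this example.

```python
import numpy as np

def krylov_matrix(A, v, k):
    """Return the n-by-k matrix with columns v, Av, ..., A^(k-1) v."""
    cols = [v]
    for _ in range(k - 1):
        cols.append(A @ cols[-1])      # apply A to the most recent direction
    return np.column_stack(cols)

# Small illustration with a random matrix and starting vector.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
v = rng.standard_normal(5)
K3 = krylov_matrix(A, v, 3)            # columns span K_3(A, v)
print(K3.shape)                        # (5, 3)
```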

## 92.2 Nested Structure

Krylov subspaces are nested:

$$
\mathcal{K}_1(A,v)
\subseteq
\mathcal{K}_2(A,v)
\subseteq
\mathcal{K}_3(A,v)
\subseteq
\cdots.
$$

Each new subspace contains all previous directions and adds at most one new direction.

The dimension cannot exceed \(n\). Therefore the sequence eventually stops growing:

$$
\dim \mathcal{K}_k(A,v)\le n.
$$

If \(A^k v\) already lies in \(\mathcal{K}_k(A,v)\), then no new direction is added. The subspace is then invariant under \(A\), so \(\mathcal{K}_j(A,v)=\mathcal{K}_k(A,v)\) for every \(j\ge k\).
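
This saturation is easy to observe numerically. In the contrived numpy sketch below, the matrix has only two distinct eigenvalues, so for this starting vector the Krylov dimension stops at 2.

```python
import numpy as np

# A has two distinct eigenvalues, so A^2 v is already a combination
# of v and Av; the Krylov dimension saturates at 2.
A = np.diag([1.0, 1.0, 1.0, 2.0, 2.0])
v = np.ones(5)

cols = [v]
for _ in range(4):
    cols.append(A @ cols[-1])

for k in range(1, 6):
    K = np.column_stack(cols[:k])
    print(k, np.linalg.matrix_rank(K))   # dimensions: 1, 2, 2, 2, 2
```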

## 92.3 Polynomial View

Every vector in \(\mathcal{K}_k(A,v)\) can be written as

$$
p(A)v
$$

where \(p\) is a polynomial of degree at most \(k-1\).

Indeed,

$$
p(t)=c_0+c_1t+\cdots+c_{k-1}t^{k-1}
$$

gives

$$
p(A)v =
c_0v+c_1Av+\cdots+c_{k-1}A^{k-1}v.
$$

Thus Krylov methods are polynomial methods.

They search for an approximation by choosing a polynomial in \(A\) applied to a starting vector.

This polynomial view explains why eigenvalues influence convergence. A Krylov method succeeds when a low-degree polynomial can reduce error on the spectrum of \(A\).
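
A quick numerical check of the polynomial view: the same vector \(p(A)v\) is obtained either from the explicit Krylov combination or by Horner's rule in the matrix argument, and both ways use only matrix-vector products.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
v = rng.standard_normal(4)

# p(t) = 2 - t + 0.5 t^2, so p(A) v = 2 v - A v + 0.5 A^2 v lies in K_3(A, v).
c = [2.0, -1.0, 0.5]
explicit = c[0] * v + c[1] * (A @ v) + c[2] * (A @ (A @ v))

# Same vector by Horner's rule: ((c2 A + c1 I) A + c0 I) v.
horner = c[2] * v
horner = A @ horner + c[1] * v
horner = A @ horner + c[0] * v
print(np.allclose(explicit, horner))    # True
```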

## 92.4 Krylov Subspaces for Linear Systems

Consider the linear system

$$
Ax=b.
$$

Let \(x^{(0)}\) be an initial approximation, and define the initial residual

$$
r^{(0)}=b-Ax^{(0)}.
$$

A Krylov method seeks approximations of the form

$$
x^{(k)}
\in
x^{(0)}+\mathcal{K}_k(A,r^{(0)}).
$$

This means the correction

$$
x^{(k)}-x^{(0)}
$$

is chosen from the space generated by

$$
r^{(0)},\quad
Ar^{(0)},\quad
A^2r^{(0)},\quad \ldots.
$$

The method never needs \(A^{-1}\). It only needs products of the form

$$
w\mapsto Aw.
$$

## 92.5 Why Matrix-Vector Products Matter

For a dense matrix, storing and factoring \(A\) may be expensive.

For a sparse matrix, most entries are zero. A matrix-vector product can be computed in approximately

$$
O(\operatorname{nnz}(A))
$$

operations, where \(\operatorname{nnz}(A)\) is the number of nonzero entries.

For a matrix-free problem, \(A\) may not be stored at all. Instead, the application provides a routine that computes

$$
Av
$$

for any vector \(v\).

Krylov methods are effective in these settings because they can operate using this routine alone.
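
For instance, a sparse tridiagonal matrix can be applied to a vector in time proportional to its nonzero count. A short sketch, assuming `scipy.sparse` is available:

```python
import numpy as np
import scipy.sparse as sp

# Tridiagonal matrix: nnz is about 3n, so one matvec costs O(n) work
# instead of the O(n^2) of a dense product.
n = 100_000
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr")

v = np.ones(n)
w = A @ v                               # sparse matrix-vector product
print(A.nnz, w[:3])                     # ~3n nonzeros
```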

## 92.6 Projection Principle

Krylov methods choose an approximation from a low-dimensional affine space:

$$
x^{(0)}+\mathcal{K}_k(A,r^{(0)}).
$$

To select one vector from this space, they impose a condition on the residual.

The residual at step \(k\) is

$$
r^{(k)}=b-Ax^{(k)}.
$$

A projection method usually requires \(r^{(k)}\) to be orthogonal to a test space:

$$
r^{(k)}\perp \mathcal{L}_k.
$$

Different choices of the test space produce different Krylov methods.

| Method | Search space | Selection condition |
|---|---|---|
| CG | Krylov subspace | \(r^{(k)}\perp\mathcal{K}_k\), equivalent to energy-norm minimization |
| MINRES | Krylov subspace | minimal residual norm for symmetric \(A\) |
| GMRES | Krylov subspace | minimal residual norm for general \(A\) |
| BiCG | pair of Krylov subspaces | biorthogonality of the two residual sequences |
| BiCGSTAB | stabilized BiCG structure | smoothed residual behavior |

This projection viewpoint is the common framework behind many Krylov solvers.

## 92.7 Basis Construction

The raw Krylov sequence

$$
v,\quad Av,\quad A^2v,\quad \ldots
$$

is usually unsuitable as a computational basis.

The vectors often become nearly linearly dependent. Their magnitudes may also grow or decay rapidly.

For stable computation, Krylov methods build an orthonormal or structured basis.

Two basic procedures are:

| Procedure | Matrix class | Role |
|---|---|---|
| Arnoldi iteration | general matrices | orthonormal Krylov basis |
| Lanczos iteration | symmetric or Hermitian matrices | short-recurrence Krylov basis |

Arnoldi is more general. Lanczos is cheaper when symmetry is available.

## 92.8 Arnoldi Iteration

Arnoldi iteration constructs an orthonormal basis

$$
q_1,q_2,\ldots,q_k
$$

for

$$
\mathcal{K}_k(A,v).
$$

Start with

$$
q_1=\frac{v}{\|v\|_2}.
$$

At step \(k\), compute

$$
w=Aq_k.
$$

Then orthogonalize \(w\) against the previous basis vectors:

$$
h_{ik}=q_i^T w,
\qquad i=1,\ldots,k.
$$

Update

$$
w \leftarrow w-\sum_{i=1}^k h_{ik}q_i.
$$

Set

$$
h_{k+1,k}=\|w\|_2.
$$

If

$$
h_{k+1,k}\ne 0,
$$

define

$$
q_{k+1}=\frac{w}{h_{k+1,k}}.
$$

The Arnoldi relation is

$$
AQ_k=Q_{k+1}H_{k+1,k},
$$

where \(Q_k\) has columns \(q_1,\ldots,q_k\) and \(H_{k+1,k}\) is a \((k+1)\times k\) upper Hessenberg matrix.

## 92.9 Arnoldi Pseudocode

```text
arnoldi(A, v, m):
    q[1] = v / norm(v)

    for k = 1 to m:
        w = A * q[k]

        for i = 1 to k:
            h[i,k] = dot(q[i], w)
            w = w - h[i,k] * q[i]

        h[k+1,k] = norm(w)

        if h[k+1,k] == 0:
            stop

        q[k+1] = w / h[k+1,k]

    return q, h
```

Arnoldi is the basis construction behind GMRES and several eigenvalue algorithms.

Its cost grows with \(k\), because each new vector must be orthogonalized against all previous basis vectors.
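
For readers who prefer runnable code, here is a direct numpy translation of the pseudocode above. It is a minimal sketch that assumes `A` supports the `@` product and omits reorthogonalization.

```python
import numpy as np

def arnoldi(A, v, m):
    """Arnoldi iteration: orthonormal Krylov basis Q and Hessenberg H.

    On normal exit, A @ Q[:, :m] == Q @ H up to rounding error, with
    Q of shape (n, m+1) and H of shape (m+1, m).
    """
    n = v.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = v / np.linalg.norm(v)

    for k in range(m):
        w = A @ Q[:, k]
        for i in range(k + 1):                 # modified Gram-Schmidt
            H[i, k] = Q[:, i] @ w
            w = w - H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] == 0.0:                 # invariant subspace found
            return Q[:, :k + 1], H[:k + 1, :k]
        Q[:, k + 1] = w / H[k + 1, k]
    return Q, H

# Check the Arnoldi relation on a random matrix.
rng = np.random.default_rng(2)
A = rng.standard_normal((50, 50))
Q, H = arnoldi(A, rng.standard_normal(50), 10)
print(np.allclose(A @ Q[:, :10], Q @ H))       # True
```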

## 92.10 Lanczos Iteration

When \(A\) is symmetric, Arnoldi simplifies.

The Hessenberg matrix becomes tridiagonal. The new basis vector can be computed using only a three-term recurrence.

Lanczos iteration has the form:

$$
\beta_{k+1}q_{k+1} =
Aq_k-\alpha_kq_k-\beta_kq_{k-1},
$$

where

$$
\alpha_k=q_k^TAq_k.
$$

The tridiagonal relation is

$$
AQ_k =
Q_kT_k+\beta_{k+1}q_{k+1}e_k^T.
$$

Lanczos is the basis process behind conjugate gradient and MINRES. For Hermitian matrices, the Arnoldi method reduces to Lanczos and only the two previous basis vectors are needed in the recurrence.

## 92.11 Lanczos Pseudocode

```text
lanczos(A, v, m):
    q[0] = zero vector
    q[1] = v / norm(v)
    beta[1] = 0

    for k = 1 to m:
        w = A * q[k] - beta[k] * q[k-1]
        alpha[k] = dot(q[k], w)
        w = w - alpha[k] * q[k]

        beta[k+1] = norm(w)

        if beta[k+1] == 0:
            stop

        q[k+1] = w / beta[k+1]

    return q, alpha, beta
```

Lanczos is cheaper than Arnoldi, but it is more delicate in floating point arithmetic. Loss of orthogonality may affect computed eigenvalues and convergence diagnostics.
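
A matching numpy sketch of the Lanczos recurrence, again assuming symmetric `A` and performing no reorthogonalization:

```python
import numpy as np

def lanczos(A, v, m):
    """Lanczos iteration for symmetric A.

    Returns the orthonormal basis Q (n by m+1 on normal exit) and the
    diagonal alpha and subdiagonal beta of the tridiagonal matrix T.
    """
    n = v.shape[0]
    Q = np.zeros((n, m + 1))
    alpha = np.zeros(m)
    beta = np.zeros(m + 1)                 # beta[0] = 0 by convention
    Q[:, 0] = v / np.linalg.norm(v)
    q_prev = np.zeros(n)

    for k in range(m):
        w = A @ Q[:, k] - beta[k] * q_prev
        alpha[k] = Q[:, k] @ w
        w = w - alpha[k] * Q[:, k]
        beta[k + 1] = np.linalg.norm(w)
        if beta[k + 1] == 0.0:             # invariant subspace found
            return Q[:, :k + 1], alpha[:k + 1], beta[:k + 2]
        q_prev = Q[:, k].copy()
        Q[:, k + 1] = w / beta[k + 1]
    return Q, alpha, beta
```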

## 92.12 GMRES

GMRES stands for generalized minimal residual method.

It applies to general square systems

$$
Ax=b.
$$

At step \(k\), GMRES chooses

$$
x^{(k)}
\in
x^{(0)}+\mathcal{K}_k(A,r^{(0)})
$$

to minimize the residual norm

$$
\|b-Ax^{(k)}\|_2.
$$

Arnoldi iteration reduces this large residual minimization problem to a small least squares problem involving the Hessenberg matrix \(H_{k+1,k}\). This is the standard computational structure of GMRES.
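
The following self-contained sketch shows that structure for a single GMRES cycle with \(x^{(0)}=0\): run \(m\) Arnoldi steps, then solve the small \((m+1)\times m\) least squares problem \(\min_y\|\beta e_1 - H y\|_2\). It is an illustration under the assumption of no breakdown, not a production solver.

```python
import numpy as np

def gmres_cycle(A, b, m):
    """One GMRES cycle of m Arnoldi steps, starting from x0 = 0."""
    n = b.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)                   # r0 = b because x0 = 0
    Q[:, 0] = b / beta

    for k in range(m):                         # Arnoldi, modified Gram-Schmidt
        w = A @ Q[:, k]
        for i in range(k + 1):
            H[i, k] = Q[:, i] @ w
            w = w - H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        Q[:, k + 1] = w / H[k + 1, k]

    # Small least squares problem: the residual norm equals ||beta*e1 - H y||.
    e1 = np.zeros(m + 1)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H, e1, rcond=None)
    return Q[:, :m] @ y                        # x_m = Q_m y

# Diagonally dominant test system; the residual is small after 20 steps.
rng = np.random.default_rng(3)
A = 4.0 * np.eye(40) + 0.1 * rng.standard_normal((40, 40))
b = rng.standard_normal(40)
x = gmres_cycle(A, b, 20)
print(np.linalg.norm(b - A @ x))
```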

## 92.13 Conjugate Gradient as a Krylov Method

For symmetric positive definite \(A\), conjugate gradient also searches in Krylov spaces:

$$
x^{(k)}
\in
x^{(0)}+\mathcal{K}_k(A,r^{(0)}).
$$

It selects the vector that minimizes the error in the energy norm:

$$
\|x-x^{(k)}\|_A.
$$

Equivalently, it minimizes the quadratic objective associated with \(Ax=b\).

CG can be derived from Lanczos iteration. This connection explains its short recurrence and low storage cost.
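
A textbook CG sketch in numpy, assuming \(A\) is symmetric positive definite; the short recurrence visible here is the practical payoff of the Lanczos connection.

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, maxiter=500):
    """Plain conjugate gradient for symmetric positive definite A."""
    x = x0.copy()
    r = b - A @ x                        # residual
    p = r.copy()                         # search direction
    rs = r @ r
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rs / (p @ Ap)            # step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p        # new direction, short recurrence
        rs = rs_new
    return x

# SPD test problem: 1-D discrete Laplacian.
n = 100
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = conjugate_gradient(A, b, np.zeros(n))
print(np.linalg.norm(b - A @ x))
```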

## 92.14 MINRES

MINRES is used for symmetric indefinite systems.

Like CG, it uses Lanczos iteration. Unlike CG, it does not require positive definiteness.

At each step, MINRES chooses an approximation that minimizes the residual norm.

Thus MINRES is often appropriate when

$$
A=A^T
$$

but \(A\) has both positive and negative eigenvalues.

## 92.15 BiCG and BiCGSTAB

For nonsymmetric systems, one may use two-sided Krylov methods.

BiCG constructs coupled Krylov subspaces for \(A\) and \(A^T\). It enforces biorthogonality between residual sequences.

BiCGSTAB modifies this structure to smooth irregular convergence behavior.

These methods can be cheaper per iteration than GMRES because they use short recurrences. However, they may be less robust.

## 92.16 Eigenvalue Problems

Krylov subspaces are also used to approximate eigenvalues.

Given

$$
Av=\lambda v,
$$

large-scale algorithms often seek a few eigenvalues rather than the full spectrum.

Arnoldi and Lanczos project \(A\) onto a lower-dimensional Krylov subspace. The eigenvalues of the projected small matrix approximate selected eigenvalues of \(A\).

These approximations are called Ritz values.

This is the basis of many large-scale eigensolvers.

## 92.17 Ritz Values and Ritz Vectors

Let \(Q_k\) be an orthonormal basis for a Krylov subspace.

The projected matrix is

$$
B_k=Q_k^TAQ_k.
$$

If

$$
B_ky=\theta y,
$$

then

$$
\theta
$$

is a Ritz value, and

$$
Q_ky
$$

is a Ritz vector.

Ritz pairs approximate eigenpairs of \(A\).

This reduces a large eigenvalue problem to a small projected eigenvalue problem.
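
The sketch below (numpy, with a random symmetric test matrix) builds an orthonormal Krylov basis by Gram–Schmidt, forms the small projected matrix, and compares its extremal eigenvalues with those of \(A\).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
X = rng.standard_normal((n, n))
A = (X + X.T) / 2                        # symmetric test matrix

# Orthonormal Krylov basis by Gram-Schmidt (no reorthogonalization).
k = 30
v = rng.standard_normal(n)
Q = np.zeros((n, k))
Q[:, 0] = v / np.linalg.norm(v)
for j in range(1, k):
    w = A @ Q[:, j - 1]
    w -= Q[:, :j] @ (Q[:, :j].T @ w)     # orthogonalize against previous q's
    Q[:, j] = w / np.linalg.norm(w)

B = Q.T @ A @ Q                          # small projected matrix
ritz = np.sort(np.linalg.eigvalsh(B))
eigs = np.sort(np.linalg.eigvalsh(A))

# Extremal Ritz values approximate the extremal eigenvalues of A.
print(ritz[-1], eigs[-1])
print(ritz[0], eigs[0])
```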

## 92.18 Convergence and Eigenvalues

Krylov convergence depends strongly on spectral properties.

For linear systems, convergence is fast when a low-degree polynomial can be small on the eigenvalues of \(A\) while taking the value 1 at zero, the normalization satisfied by the residual polynomial.

For CG, clustered eigenvalues often give rapid convergence.

For GMRES, nonnormality may complicate convergence. Eigenvalues alone may not fully predict behavior.

For eigenvalue problems, extremal eigenvalues often converge first in Lanczos and Arnoldi methods.

## 92.19 Preconditioning

Preconditioning is essential in many Krylov solvers.

Instead of solving

$$
Ax=b,
$$

one introduces a matrix \(M\) that approximates \(A\) and whose inverse is cheap to apply to a vector.

A left-preconditioned system is

$$
M^{-1}Ax=M^{-1}b.
$$

The Krylov subspace becomes

$$
\mathcal{K}_k(M^{-1}A,M^{-1}r^{(0)}).
$$

The goal is to improve the spectrum or geometry of the operator so that fewer Krylov steps are required.
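
As an illustration (scipy assumed), the sketch below solves a poorly scaled SPD system with plain CG and with a simple Jacobi (diagonal) preconditioner; the preconditioned solve typically needs far fewer iterations, though exact counts depend on the problem.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, cg

# SPD test matrix whose diagonal varies over four orders of magnitude.
n = 2000
d = np.linspace(1.0, 1e4, n)
A = diags([-np.ones(n - 1), d, -np.ones(n - 1)], [-1, 0, 1], format="csr")
b = np.ones(n)

# Jacobi preconditioner: applying M^{-1} is a componentwise division by d.
Minv = LinearOperator((n, n), matvec=lambda x: x / d)

counts = {"plain": 0, "jacobi": 0}
def counter(key):
    def cb(xk):
        counts[key] += 1
    return cb

x1, _ = cg(A, b, callback=counter("plain"))
x2, _ = cg(A, b, M=Minv, callback=counter("jacobi"))
print(counts)      # the Jacobi-preconditioned run usually needs fewer steps
```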

## 92.20 Restarting

Some Krylov methods, especially GMRES, store all previous basis vectors.

After \(k\) steps, GMRES stores

$$
q_1,\ldots,q_k.
$$

The orthogonalization cost also grows with \(k\).

Restarting limits memory by running the method for a fixed number of steps, then restarting with the latest approximation.

This gives restarted GMRES, often written

$$
\operatorname{GMRES}(m).
$$

Restarting controls memory but may slow or stall convergence.
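
With scipy's solver (an assumed dependency), restarted GMRES is selected through the `restart` parameter; the sketch below runs GMRES(30) on a nonsymmetric tridiagonal test matrix.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

# Nonsymmetric, well-conditioned tridiagonal test matrix.
n = 5000
A = diags([-1.2 * np.ones(n - 1), 3.0 * np.ones(n), -0.8 * np.ones(n - 1)],
          [-1, 0, 1], format="csr")
b = np.ones(n)

# GMRES(30): at most 30 Arnoldi vectors are kept, which caps both the
# memory and the per-step orthogonalization cost; maxiter counts restarts.
x, info = gmres(A, b, restart=30, maxiter=200)
print(info, np.linalg.norm(b - A @ x))     # info == 0 means converged
```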

## 92.21 Loss of Orthogonality

In exact arithmetic, Arnoldi and Lanczos produce orthogonal basis vectors.

In floating point arithmetic, orthogonality may degrade.

For Arnoldi, reorthogonalization can control this.

For Lanczos, loss of orthogonality may cause repeated or spurious Ritz values.

Practical implementations must balance accuracy, memory, and cost.

## 92.22 Matrix-Free Computation

Krylov methods are particularly useful when the matrix is implicit.

Instead of storing \(A\), the algorithm only needs a function:

$$
v \mapsto Av.
$$

This is common in:

| Application | Matrix-vector product |
|---|---|
| PDE simulation | apply discretized operator |
| optimization | Hessian-vector product |
| inverse problems | forward and adjoint model |
| graph algorithms | sparse adjacency action |
| machine learning | Jacobian-vector or Hessian-vector product |

Matrix-free Krylov methods allow computations far beyond the size of explicit dense matrices.
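
In scipy, for example, a matvec routine can be wrapped in a `LinearOperator` and handed directly to a Krylov solver. The operator below is a hypothetical shifted 1-D Laplacian, chosen only because it is symmetric positive definite and easy to apply without storing a matrix.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 100_000

def apply_op(v):
    """Matrix-free action of A = I + L, with L the 1-D discrete Laplacian."""
    w = 3.0 * v                 # diagonal: 1 (identity) + 2 (Laplacian)
    w[:-1] -= v[1:]             # superdiagonal contribution
    w[1:] -= v[:-1]             # subdiagonal contribution
    return w

# The Krylov solver only ever calls this routine; A is never formed.
A = LinearOperator((n, n), matvec=apply_op, dtype=np.float64)
b = np.ones(n)
x, info = cg(A, b)
print(info, np.linalg.norm(b - apply_op(x)))
```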

## 92.23 Practical Method Selection

A practical Krylov method is chosen from matrix structure.

| Matrix property | Typical method |
|---|---|
| Symmetric positive definite | CG |
| Symmetric indefinite | MINRES |
| General nonsymmetric | GMRES |
| Large nonsymmetric with memory pressure | BiCGSTAB |
| Eigenvalues of symmetric matrix | Lanczos |
| Eigenvalues of general matrix | Arnoldi |
| Ill-conditioned system | preconditioned Krylov method |

The method should match the algebraic structure of the operator.

## 92.24 Failure Modes

Krylov methods may fail or slow down for several reasons.

| Failure mode | Cause |
|---|---|
| Slow convergence | poor spectrum or conditioning |
| Stagnation | restart too small or bad preconditioner |
| Breakdown | recurrence denominator becomes zero |
| Loss of orthogonality | floating point effects |
| Memory growth | long Arnoldi or unrestarted GMRES |
| Misleading residual | ill-conditioned system |
| Non-normality | eigenvalues poorly predict behavior |

Diagnosis usually requires residual history, conditioning estimates, and knowledge of matrix structure.

## 92.25 Summary

A Krylov subspace is

$$
\mathcal{K}_k(A,v) =
\operatorname{span}
\{v,Av,A^2v,\ldots,A^{k-1}v\}.
$$

Krylov methods solve large linear algebra problems by searching in these spaces.

Their main advantages are:

| Advantage | Meaning |
|---|---|
| Matrix-vector only | no factorization required |
| Sparse friendly | cost depends on nonzeros |
| Matrix-free | works with implicit operators |
| Adaptive | subspace grows from the residual |
| Broad | supports systems and eigenvalue problems |

The main basis processes are Arnoldi for general matrices and Lanczos for symmetric or Hermitian matrices. The main solvers include CG, MINRES, GMRES, and BiCGSTAB.

Krylov subspaces provide the algebraic structure behind much of modern large-scale numerical linear algebra.
