Chapter 126. Numerical Linear Algebra

Numerical linear algebra studies algorithms for computing with vectors and matrices using finite-precision arithmetic.

Exact symbolic formulas are often impractical for large systems. Instead, computations are performed approximately on digital hardware. Numerical linear algebra analyzes the accuracy, stability, efficiency, and scalability of these computations.

The subject includes solving linear systems, least squares problems, eigenvalue problems, singular value decompositions, matrix factorizations, iterative methods, sparse matrices, conditioning, and floating-point error analysis.

Modern scientific computing, machine learning, optimization, graphics, signal processing, simulation, and data analysis all depend on numerical linear algebra. Matrix factorizations such as LU, QR, and SVD are central tools throughout the field. (cs.cornell.edu)

126.1 Floating-Point Arithmetic

Computers do not represent real numbers exactly.

Instead, they use floating-point numbers of the form

\pm m \times b^e,

where:

| Symbol | Meaning |
| --- | --- |
| m | Mantissa or significand |
| b | Base |
| e | Exponent |

Most modern systems use binary floating-point arithmetic.

A floating-point number approximates a real number. Arithmetic operations are rounded to nearby representable values.

Thus

(a+b)+c

may differ from

a+(b+c).

Finite precision changes algebraic behavior.
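
A quick illustration, assuming IEEE double precision (ordinary Python floats):

```python
# Floating-point addition is not associative: the grouping changes which
# rounding happens first.
a, b, c = 1e16, -1e16, 1.0

print((a + b) + c)   # 1.0  (the large terms cancel exactly, then 1.0 is added)
print(a + (b + c))   # 0.0  (1.0 is absorbed into -1e16 before the cancellation)
```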

126.2 Machine Precision

Machine precision measures the relative spacing of floating-point numbers.

The machine epsilon \varepsilon_{\text{mach}} is approximately the smallest positive number such that

1+\varepsilon_{\text{mach}} > 1

in floating-point arithmetic.

For IEEE double precision,

\varepsilon_{\text{mach}}\approx 2^{-53}\approx 10^{-16}.

Machine precision limits attainable accuracy.

Even a stable algorithm cannot usually achieve accuracy much better than machine precision scaled by the conditioning of the problem.
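
A small check, assuming NumPy; note that NumPy's `eps` is the spacing from 1 to the next double, 2^{-52}, while the unit roundoff quoted above is half of that, 2^{-53}:

```python
import numpy as np

print(np.finfo(np.float64).eps)      # 2.220446049250313e-16, i.e. 2**-52

# Find the smallest power of two eps with fl(1 + eps) > 1 by repeated halving.
eps = 1.0
while 1.0 + eps / 2 > 1.0:
    eps /= 2
print(eps)                           # also 2**-52; the unit roundoff is eps / 2
```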

126.3 Rounding Error

Each arithmetic operation introduces small rounding error.

A common model is

\operatorname{fl}(x\circ y) = (x\circ y)(1+\delta),

where:

| Symbol | Meaning |
| --- | --- |
| \circ | Arithmetic operation |
| \operatorname{fl} | Floating-point result |
| \delta | Small relative error |

Typically,

|\delta|\leq \varepsilon_{\text{mach}}.

Although each error is tiny, many operations may accumulate substantial error.

Numerical analysis studies how these local errors propagate through algorithms.
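
A minimal sketch of such accumulation, using only the standard library; `math.fsum` computes a compensated, correctly rounded sum for comparison:

```python
import math

n = 1_000_000
total = 0.0
for _ in range(n):
    total += 0.1                                  # each addition is rounded

reference = math.fsum(0.1 for _ in range(n))      # correctly rounded sum of the same terms
print(total - reference)                          # small but clearly nonzero drift
```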

126.4 Catastrophic Cancellation

Cancellation occurs when subtracting nearly equal numbers.

For example,

1.000000000001 - 1.000000000000

may lose many significant digits.

The relative error after subtraction can become large even if the original numbers were accurate.

This is called catastrophic cancellation.

Algorithms should avoid unnecessary cancellation when possible.

For example, the quadratic formula

x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}

can suffer cancellation for one root. Stable rearrangements are preferred in numerical computation.
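
A sketch of one standard rearrangement, assuming real roots; the helper name `quadratic_roots` is illustrative. It chooses the sign so that -b and the square root never nearly cancel, and recovers the other root from the product of roots c/a:

```python
import math

def quadratic_roots(a, b, c):
    """Both roots of ax^2 + bx + c = 0, avoiding cancellation (assumes real roots)."""
    disc = math.sqrt(b * b - 4 * a * c)
    q = -0.5 * (b + math.copysign(disc, b))   # same sign as -b: no cancellation
    return q / a, c / q                       # second root via x1 * x2 = c / a

a, b, c = 1.0, 1e8, 1.0
naive = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)   # small root, naive formula
stable = quadratic_roots(a, b, c)[1]
print(naive)    # loses most of its digits to cancellation
print(stable)   # close to the true value, about -1e-8
```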

126.5 Backward Error

Backward error asks:

What nearby problem did the algorithm solve exactly?

Suppose an algorithm computes \hat{x} for the system

Ax=b.

If

(A+\Delta A)\hat{x}=b+\Delta b,

then \Delta A and \Delta b describe the backward error.

An algorithm is backward stable if it solves a nearby problem with small perturbations.

Backward stability is one of the most important concepts in numerical linear algebra. Cornell's course notes describe a backward stable algorithm as one that solves a nearby problem exactly. (cs.cornell.edu)
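
One way to observe this numerically, assuming NumPy: the normwise backward error of a computed solution can be measured as \|b-A\hat{x}\| / (\|A\|\,\|\hat{x}\|+\|b\|), and for a backward stable solver it sits near machine precision:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
b = rng.standard_normal(100)

x_hat = np.linalg.solve(A, b)        # LU with partial pivoting internally

r = b - A @ x_hat                    # residual of the computed solution
eta = np.linalg.norm(r) / (np.linalg.norm(A, 2) * np.linalg.norm(x_hat) + np.linalg.norm(b))
print(eta)                           # on the order of machine epsilon
```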

126.6 Conditioning

Conditioning measures sensitivity of the solution to perturbations in the data.

For a nonsingular matrix A, the condition number is

\kappa(A)=\|A\|\,\|A^{-1}\|.

If \kappa(A) is large, small perturbations in the data may produce large changes in the solution.

If \kappa(A) is near 1, the problem is well-conditioned.

Conditioning depends on the mathematical problem itself, not on the algorithm.

Algorithms may be stable or unstable, but even a stable algorithm cannot overcome severe ill-conditioning.
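
A sketch contrasting conditioning and algorithmic accuracy, assuming NumPy; the Hilbert matrix is a standard ill-conditioned example:

```python
import numpy as np

n = 10
i, j = np.indices((n, n))
H = 1.0 / (i + j + 1)                # Hilbert matrix: H[i, j] = 1 / (i + j + 1)

print(np.linalg.cond(H))             # roughly 1e13 for n = 10

x_true = np.ones(n)
b = H @ x_true
x_hat = np.linalg.solve(H, b)        # a stable solver, but an ill-conditioned problem
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))   # far larger than 1e-16
```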

126.7 Linear Systems

A central problem is solving

Ax=b.

For dense matrices, direct factorization methods are common.

For sparse or very large matrices, iterative methods are often preferred.

The computational strategy depends on:

| Matrix property | Preferred methods |
| --- | --- |
| Dense | LU, QR |
| Symmetric positive definite | Cholesky, conjugate gradient |
| Sparse | Sparse direct or iterative methods |
| Structured | Specialized algorithms |
| Huge | Krylov methods, multigrid |

Linear system solution is one of the main workloads in scientific computing.

126.8 Gaussian Elimination

Gaussian elimination transforms a matrix into upper triangular form using row operations.

For

Ax=b,

the process computes

A=LU,

where:

| Matrix | Meaning |
| --- | --- |
| L | Lower triangular |
| U | Upper triangular |

Then solve:

Ly=b,

followed by

Ux=y.

Triangular systems are solved efficiently by forward and backward substitution.

Gaussian elimination is one of the most fundamental algorithms in numerical linear algebra.
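
A minimal sketch of the factor-then-substitute pattern, assuming NumPy and nonzero pivots (pivoting is discussed next); the helper names are illustrative:

```python
import numpy as np

def lu_no_pivot(A):
    """Doolittle factorization A = L @ U without pivoting (assumes nonzero pivots)."""
    U = A.astype(float).copy()
    n = U.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        L[k+1:, k] = U[k+1:, k] / U[k, k]               # multipliers
        U[k+1:, k:] -= np.outer(L[k+1:, k], U[k, k:])   # eliminate below the pivot
    return L, np.triu(U)

def forward_sub(L, b):
    y = np.zeros_like(b, dtype=float)
    for i in range(len(b)):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_sub(U, y):
    x = np.zeros_like(y, dtype=float)
    for i in reversed(range(len(y))):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

A = np.array([[4.0, 3.0, 2.0], [2.0, 4.0, 1.0], [1.0, 2.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])
L, U = lu_no_pivot(A)
x = back_sub(U, forward_sub(L, b))   # solve Ly = b, then Ux = y
print(np.allclose(A @ x, b))         # True
```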

126.9 Pivoting

Without pivoting, Gaussian elimination may become numerically unstable.

Pivoting swaps rows or columns to avoid dividing by small numbers.

Partial pivoting chooses the largest available pivot in magnitude within the current column.

The factorization becomes

PA=LU,

where P is a permutation matrix.

Pivoting greatly improves stability in practice.

Most dense linear algebra libraries use pivoted LU factorization by default.
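
A short check, assuming SciPy; note that scipy.linalg.lu returns factors with the convention A = P @ L @ U, so the permutation appearing in PA = LU is its transpose:

```python
import numpy as np
from scipy.linalg import lu

# A tiny (0, 0) entry: without a row swap the first pivot would be 1e-20.
A = np.array([[1e-20, 1.0],
              [1.0,   1.0]])

P, L, U = lu(A)                         # SciPy convention: A = P @ L @ U
print(np.allclose(P.T @ A, L @ U))      # True: this is the PA = LU form
print(np.abs(L).max() <= 1.0)           # True: partial pivoting keeps multipliers <= 1
```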

126.10 Complexity of Dense Elimination

For an n\times n dense matrix, Gaussian elimination requires approximately

\frac{2}{3}n^3

floating-point operations.

Storage requires approximately

n^2

numbers.

Thus direct dense methods become expensive for large n.

For example:

| n | Approximate operations |
| --- | --- |
| 10^3 | 10^9 |
| 10^4 | 10^{12} |
| 10^5 | 10^{15} |

This motivates sparse and iterative methods for large problems.

126.11 Sparse Matrices

A sparse matrix has mostly zero entries.

Examples arise naturally in:

| Area | Source of sparsity |
| --- | --- |
| Finite elements | Local basis support |
| Graphs | Limited adjacency |
| PDE discretization | Local coupling |
| Networks | Sparse connections |
| Markov chains | Few transitions |

Sparse matrices are stored using compressed formats such as CSR or CSC.

Algorithms try to avoid computations involving zeros.

Sparsity changes both computational complexity and memory usage dramatically.
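
A small sketch, assuming SciPy; the one-dimensional Poisson operator is a typical sparse matrix, stored here in CSR format:

```python
import numpy as np
from scipy.sparse import diags

n = 100_000
# Tridiagonal operator with 2 on the diagonal and -1 on the off-diagonals.
A = diags([-1, 2, -1], offsets=[-1, 0, 1], shape=(n, n), format="csr")

print(A.nnz)             # about 3n stored entries instead of n^2
x = np.ones(n)
y = A @ x                # sparse matrix-vector product: O(nnz) work
print(y[:3], y[-3:])     # boundary rows differ from the interior rows
```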

126.12 Fill-In

During elimination, zero entries may become nonzero.

This phenomenon is called fill-in.

For sparse matrices, fill-in can destroy sparsity and increase cost.

Matrix ordering strategies attempt to reduce fill-in.

Examples include:

| Ordering | Purpose |
| --- | --- |
| Minimum degree | Reduce local fill |
| Nested dissection | Exploit graph separators |
| Reverse Cuthill-McKee | Reduce bandwidth |

Sparse matrix computation is strongly connected to graph structure.
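
A hedged sketch of one ordering effect, assuming SciPy: scrambling a tridiagonal matrix spreads its nonzeros far from the diagonal, and reverse Cuthill-McKee recovers a narrow band, which limits where fill-in can appear:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(M):
    r, c = M.nonzero()
    return int(np.abs(r - c).max())

n = 200
A = diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
p = np.random.default_rng(0).permutation(n)
A_scrambled = A[p][:, p]                 # same matrix, scrambled ordering

perm = reverse_cuthill_mckee(A_scrambled, symmetric_mode=True)
A_rcm = A_scrambled[perm][:, perm]
print(bandwidth(A_scrambled), "->", bandwidth(A_rcm))   # wide band -> narrow band
```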

126.13 Cholesky Factorization

If A is symmetric positive definite, then

A=LL^T,

where L is lower triangular.

This is the Cholesky factorization.

Compared with LU factorization, Cholesky:

| Property | Advantage |
| --- | --- |
| Uses symmetry | Half storage |
| Uses definiteness | No pivoting needed |
| Fewer operations | About half the work of LU |

Cholesky is widely used in optimization, statistics, finite elements, and Gaussian process models.
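
A short sketch, assuming NumPy and SciPy; the matrix is made symmetric positive definite by construction:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)                # symmetric positive definite

L = np.linalg.cholesky(A)                  # lower triangular factor, A = L @ L.T
print(np.allclose(A, L @ L.T))             # True

b = rng.standard_normal(5)
y = solve_triangular(L, b, lower=True)     # solve L y = b
x = solve_triangular(L.T, y, lower=False)  # solve L^T x = y
print(np.allclose(A @ x, b))               # True
```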

126.14 QR Factorization

The QR factorization expresses a matrix as

A=QR,

where:

| Matrix | Property |
| --- | --- |
| Q | Orthogonal |
| R | Upper triangular |

QR factorization is fundamental for least squares problems.

Given

Ax\approx b,

compute

QRx\approx b.

Since orthogonal transformations preserve norms,

Rx\approx Q^Tb.

This triangular system gives the least squares solution.

QR factorization is more stable than solving normal equations directly.
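
A minimal least squares sketch via QR, assuming NumPy; the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))                  # tall: more equations than unknowns
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.standard_normal(50)   # noisy right-hand side

Q, R = np.linalg.qr(A)                            # reduced QR: Q is 50x3, R is 3x3
x = np.linalg.solve(R, Q.T @ b)                   # triangular system R x = Q^T b
print(x)                                          # close to x_true

print(np.linalg.lstsq(A, b, rcond=None)[0])       # same answer from the library routine
```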

126.15 Householder Reflections

A Householder reflection has the form

H=I-2vv^T,

where v is a unit vector.

It reflects vectors across a hyperplane.

Householder transformations are orthogonal and numerically stable.

They are widely used for QR factorization because they systematically eliminate subdiagonal entries.

A sequence of Householder reflections transforms a matrix into triangular form.
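
A sketch of a single reflection, assuming NumPy; the helper `householder` is illustrative, and it chooses the sign so the update to the first component does not cancel:

```python
import numpy as np

def householder(x):
    """Unit vector v with (I - 2 v v^T) x = -sign(x[0]) * ||x|| * e_1."""
    v = x.astype(float).copy()
    v[0] += np.copysign(np.linalg.norm(x), x[0])   # shift the first entry, no cancellation
    return v / np.linalg.norm(v)

x = np.array([3.0, 1.0, 4.0, 1.0])
v = householder(x)
H = np.eye(4) - 2.0 * np.outer(v, v)

print(H @ x)                              # [-||x||, ~0, ~0, ~0]: entries below the first are eliminated
print(np.allclose(H @ H.T, np.eye(4)))    # True: H is orthogonal
```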

126.16 Givens Rotations

A Givens rotation acts on two coordinates at a time.

It has the form

G=\begin{bmatrix} c & s \\ -s & c \end{bmatrix},

embedded into a larger identity matrix.

Givens rotations are useful when matrices are sparse because they modify only selected rows or columns.

They are commonly used in updating factorizations and in some eigenvalue algorithms.
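
A sketch of one rotation, assuming NumPy; the rotation acts on rows 0 and 2 only and zeroes a single entry:

```python
import numpy as np

def givens(a, b):
    """c, s with [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0 else (a / r, b / r)

x = np.array([3.0, 0.0, 4.0])
c, s = givens(x[0], x[2])

G = np.eye(3)                    # embed the 2x2 rotation into the identity
G[0, 0], G[0, 2] = c, s
G[2, 0], G[2, 2] = -s, c

print(G @ x)                              # [5, 0, 0]: the entry in position 2 is zeroed
print(np.allclose(G.T @ G, np.eye(3)))    # True: G is orthogonal
```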

126.17 Least Squares Problems

The least squares problem seeks

x = \arg\min_x \|Ax-b\|^2.

The normal equations are

A^TAx=A^Tb.

If A has full column rank, the solution is unique.

However, solving normal equations may square the condition number:

\kappa(A^TA)=\kappa(A)^2.

Therefore QR factorization or SVD is often preferred.

Least squares problems appear throughout data fitting, regression, inverse problems, and machine learning.
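
A small numerical check of the squaring effect, assuming NumPy; the two columns are made nearly dependent on purpose:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 2))
A[:, 1] = A[:, 0] + 1e-6 * rng.standard_normal(100)   # nearly dependent columns

print(np.linalg.cond(A))        # roughly 1e6
print(np.linalg.cond(A.T @ A))  # roughly 1e12: the condition number is squared
```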

126.18 Singular Value Decomposition

The singular value decomposition is

A=U\Sigma V^T.

Here:

| Matrix | Property |
| --- | --- |
| U | Orthogonal |
| V | Orthogonal |
| \Sigma | Diagonal with nonnegative entries |

The diagonal entries

\sigma_1\geq \sigma_2\geq \cdots

are the singular values.

The SVD reveals:

| Quantity | From SVD |
| --- | --- |
| Rank | Number of nonzero singular values |
| Condition number | \sigma_1/\sigma_r |
| Best low-rank approximation | Truncated SVD |
| Null space | Small singular vectors |
| Principal directions | Singular vectors |

The SVD is one of the most important matrix decompositions in numerical computation.
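
A sketch of the truncated SVD as a low-rank approximation, assuming NumPy; the matrix is built to have rank about 20 plus small noise:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20)) @ rng.standard_normal((20, 80))   # rank 20
A += 1e-3 * rng.standard_normal((100, 80))                           # small noise

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.sum(s > 1e-1))              # numerical rank: about 20

k = 20
A_k = U[:, :k] * s[:k] @ Vt[:k]      # best rank-k approximation (Eckart-Young)
print(np.linalg.norm(A - A_k, 2))    # approximately the first discarded singular value
print(s[k])
```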

126.19 Eigenvalue Problems

The eigenvalue problem seeks

Av=\lambda v.

Numerical eigenvalue algorithms include:

| Method | Main idea |
| --- | --- |
| Power iteration | Dominant eigenvector |
| Inverse iteration | Targeted eigenvalues |
| QR algorithm | Orthogonal similarity iteration |
| Lanczos | Sparse symmetric matrices |
| Arnoldi | General sparse matrices |

Eigenvalue problems arise in vibration analysis, quantum mechanics, PCA, graph analysis, stability analysis, and differential equations.

126.20 Power Iteration

The power method repeatedly multiplies by A:

x_{k+1}=\frac{Ax_k}{\|Ax_k\|}.

Under suitable conditions, the vectors converge toward the dominant eigenvector.

The convergence rate depends on the spectral gap:

\left|\frac{\lambda_2}{\lambda_1}\right|.

Power iteration is simple and scalable for large sparse matrices.

It underlies PageRank and many large-scale spectral algorithms.
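
A minimal sketch, assuming NumPy; the function name `power_iteration` is illustrative:

```python
import numpy as np

def power_iteration(A, num_iters=200, seed=0):
    """Approximate the dominant eigenpair by repeated multiplication and normalization."""
    x = np.random.default_rng(seed).standard_normal(A.shape[0])
    for _ in range(num_iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ A @ x, x              # Rayleigh quotient estimate and eigenvector

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, v = power_iteration(A)
print(lam)                                # about 3.618, the larger eigenvalue
print(np.linalg.norm(A @ v - lam * v))    # small residual
```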

126.21 Krylov Subspaces

Iterative methods often work in Krylov subspaces.

The m-th Krylov subspace generated by A and b is

\mathcal{K}_m(A,b) = \operatorname{span}\{b, Ab, A^2b, \ldots, A^{m-1}b\}.

Krylov methods build approximate solutions from these spaces.

Examples include:

| Method | Matrix type |
| --- | --- |
| Conjugate gradient | Symmetric positive definite |
| GMRES | General nonsymmetric |
| MINRES | Symmetric indefinite |
| Lanczos | Eigenvalue problems |

These methods rely mainly on matrix-vector multiplication.
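
A sketch of this matvec-only interface, assuming SciPy: the solver is given only a function that applies the operator, never the matrix entries:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

n = 1000

def matvec(v):
    # Tridiagonal operator with 3 on the diagonal and -1 on the off-diagonals.
    out = 3.0 * v
    out[1:] -= v[:-1]
    out[:-1] -= v[1:]
    return out

A = LinearOperator((n, n), matvec=matvec)
b = np.ones(n)
x, info = gmres(A, b)                    # builds a Krylov subspace from b, Ab, A^2 b, ...
print(info)                              # 0 means the iteration converged
print(np.linalg.norm(matvec(x) - b))     # small residual
```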

126.22 Conjugate Gradient Method

For symmetric positive definite matrices, the conjugate gradient method solves

Ax=b

iteratively.

The method minimizes the quadratic energy

\frac{1}{2}x^TAx - b^Tx.

The search directions are A-orthogonal:

p_i^TAp_j=0 \quad \text{for } i\neq j.

Conjugate gradient often converges much faster than worst-case bounds suggest.

Its efficiency depends strongly on the spectrum of A.
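
A plain implementation sketch, assuming NumPy; the function name `conjugate_gradient` is illustrative:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x                     # residual = negative gradient of the quadratic
    p = r.copy()                      # first search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)         # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p     # next direction, A-orthogonal to the previous ones
        rs = rs_new
    return x

n = 200
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD 1D Poisson matrix
b = np.ones(n)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))      # near zero
```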

126.23 Preconditioning

Preconditioning improves iterative convergence.

Instead of solving

Ax=b,

solve an equivalent transformed system

M^{-1}Ax=M^{-1}b.

The matrix M approximates A but is easier to invert.

A good preconditioner reduces the effective condition number.

Common preconditioners include:

| Preconditioner | Idea |
| --- | --- |
| Jacobi | Diagonal scaling |
| Incomplete LU | Approximate factorization |
| Multigrid | Hierarchical correction |
| Domain decomposition | Subdomain solves |

Preconditioning is often the key to practical large-scale performance.
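
A hedged sketch with a Jacobi (diagonal) preconditioner, assuming SciPy; the badly scaled test matrix and the iteration-counting helper are illustrative:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

n = 1000
scale = np.logspace(0, 4, n)                 # diagonal entries from 1 to 1e4
A = diags([-0.1 * np.ones(n - 1), scale, -0.1 * np.ones(n - 1)], [-1, 0, 1], format="csr")
b = np.ones(n)

def cg_iterations(M=None):
    count = [0]
    def cb(xk):
        count[0] += 1
    cg(A, b, M=M, callback=cb)
    return count[0]

M = diags([1.0 / A.diagonal()], [0], format="csr")   # Jacobi: apply the inverse diagonal

print(cg_iterations())        # unpreconditioned: many iterations
print(cg_iterations(M))       # Jacobi-preconditioned: far fewer iterations
```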

126.24 Stability of Orthogonal Transformations

Orthogonal matrices preserve norms:

\|Qx\|=\|x\|.

This makes orthogonal transformations numerically stable.

Algorithms based on orthogonal transformations, such as QR factorization, tend to have excellent stability properties.

This is one reason QR methods are preferred over explicitly forming normal equations.

Orthogonality is extremely valuable numerically.

126.25 Matrix Norms

Matrix norms measure matrix size.

Common norms include:

| Norm | Definition |
| --- | --- |
| Spectral norm | Largest singular value |
| Frobenius norm | Square root of sum of squares |
| 1-norm | Maximum column sum |
| Infinity norm | Maximum row sum |

Norms are used in error bounds, conditioning, perturbation analysis, and convergence analysis.

Different norms emphasize different aspects of matrix behavior.
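
A quick check of the four norms on one small matrix, assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

print(np.linalg.norm(A, 2))        # spectral norm: largest singular value
print(np.linalg.norm(A, "fro"))    # Frobenius norm: sqrt(1 + 4 + 9 + 16)
print(np.linalg.norm(A, 1))        # maximum absolute column sum: 2 + 4 = 6
print(np.linalg.norm(A, np.inf))   # maximum absolute row sum: 3 + 4 = 7
```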

126.26 Perturbation Theory

Perturbation theory studies how small changes affect solutions.

Questions include:

| Problem | Perturbation question |
| --- | --- |
| Linear systems | How does x change if A changes? |
| Eigenvalues | How stable are eigenvalues? |
| Singular values | How sensitive are low-rank structures? |
| Least squares | How sensitive is the fit? |

Perturbation theory combines linear algebra with analysis and numerical stability.

It explains why some computations are inherently difficult.

126.27 Parallel and High-Performance Computation

Modern numerical linear algebra must exploit hardware efficiently.

Performance depends on:

| Factor | Importance |
| --- | --- |
| Cache locality | Memory efficiency |
| Parallelism | Multiple processors |
| GPU acceleration | Massive throughput |
| Communication cost | Distributed systems |
| Sparse access patterns | Memory bandwidth |

Large-scale computation often becomes limited by data movement rather than arithmetic itself.

This changes algorithm design.

126.28 Numerical Linear Algebra and Applications

Numerical linear algebra is foundational because many applications reduce to matrix problems.

| Area | Typical matrix problem |
| --- | --- |
| Machine learning | Least squares, SVD |
| Finite elements | Sparse linear systems |
| Graphics | Geometric transformations |
| Optimization | Newton systems |
| Signal processing | Spectral analysis |
| Networks | Eigenvectors |
| Statistics | Covariance factorizations |
| Quantum mechanics | Eigenvalue problems |
| PDEs | Sparse operators |

Efficient computation with matrices is therefore central across scientific and engineering disciplines.

126.29 Numerical Linear Algebra and Linear Algebra

The dictionary is direct.

| Numerical concept | Linear algebra object |
| --- | --- |
| Floating-point vector | Approximate vector |
| Stability | Error growth under operations |
| Conditioning | Sensitivity of matrix problems |
| LU factorization | Triangular decomposition |
| QR factorization | Orthogonal-triangular decomposition |
| Cholesky | Symmetric positive definite factorization |
| Iterative solver | Krylov subspace method |
| Sparse matrix | Graph-structured operator |
| SVD | Orthogonal spectral factorization |
| Preconditioner | Approximate inverse operator |
| Eigenvalue algorithm | Spectral computation |

Numerical linear algebra studies how linear algebra behaves under finite computation.

126.30 Summary

Numerical linear algebra develops practical algorithms for matrix computation under finite precision.

Floating-point arithmetic introduces rounding error, cancellation, and stability concerns. Conditioning measures sensitivity of problems. Stable algorithms aim to control error growth.

Linear systems are solved using LU, Cholesky, or iterative Krylov methods. Least squares problems use QR or SVD methods. Sparse matrices require specialized storage and algorithms. Eigenvalue problems use iterative spectral methods.

The central principle is approximation under finite precision. Numerical linear algebra studies how to compute meaningful answers efficiently and reliably even though arithmetic is inexact and matrices may be extremely large.