Chapter 126. Numerical Linear Algebra

Numerical linear algebra studies algorithms for computing with vectors and matrices using finite-precision arithmetic.

Exact symbolic formulas are often impractical for large systems. Instead, computations are performed approximately on digital hardware. Numerical linear algebra analyzes the accuracy, stability, efficiency, and scalability of these computations.

The subject includes solving linear systems, least squares problems, eigenvalue problems, singular value decompositions, matrix factorizations, iterative methods, sparse matrices, conditioning, and floating-point error analysis.

Modern scientific computing, machine learning, optimization, graphics, signal processing, simulation, and data analysis all depend on numerical linear algebra. Matrix factorizations such as LU, QR, and SVD are central tools throughout the field. (cs.cornell.edu)

126.1 Floating-Point Arithmetic

Computers do not represent real numbers exactly.

Instead, they use floating-point numbers of the form

\pm m \times b^e,

where:

| Symbol | Meaning |
| --- | --- |
| m | Mantissa or significand |
| b | Base |
| e | Exponent |

Most modern systems use binary floating-point arithmetic.

A floating-point number approximates a real number. Arithmetic operations are rounded to nearby representable values.

Thus

(a+b)+c

may differ from

a+(b+c).

Finite precision changes algebraic behavior.
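
A quick illustration, assuming IEEE double precision (ordinary Python floats):

```python
# Floating-point addition is not associative: the grouping changes which
# rounding happens first.
a, b, c = 1e16, -1e16, 1.0

print((a + b) + c)   # 1.0  (the large terms cancel exactly, then 1.0 is added)
print(a + (b + c))   # 0.0  (1.0 is absorbed into -1e16 before the cancellation)
```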

126.2 Machine Precision

Machine precision measures the relative spacing of floating-point numbers.

The machine epsilon \varepsilon_{\text{mach}} is approximately the smallest positive number such that

1+\varepsilon_{\text{mach}} > 1

in floating-point arithmetic.

For IEEE double precision,

\varepsilon_{\text{mach}}\approx 2^{-53}\approx 10^{-16}.

Machine precision limits attainable accuracy.

Even a stable algorithm cannot usually achieve accuracy much better than machine precision scaled by the conditioning of the problem.
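
A small check, assuming NumPy; note that NumPy's `eps` is the spacing from 1 to the next double, 2^{-52}, while the unit roundoff quoted above is half of that, 2^{-53}:

```python
import numpy as np

print(np.finfo(np.float64).eps)      # 2.220446049250313e-16, i.e. 2**-52

# Find the smallest power of two eps with fl(1 + eps) > 1 by repeated halving.
eps = 1.0
while 1.0 + eps / 2 > 1.0:
    eps /= 2
print(eps)                           # also 2**-52; the unit roundoff is eps / 2
```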

126.3 Rounding Error

Each arithmetic operation introduces small rounding error.

A common model is

\operatorname{fl}(x\circ y) = (x\circ y)(1+\delta),

where:

| Symbol | Meaning |
| --- | --- |
| \circ | Arithmetic operation |
| \operatorname{fl} | Floating-point result |
| \delta | Small relative error |

Typically,

|\delta|\leq \varepsilon_{\text{mach}}.

Although each error is tiny, many operations may accumulate substantial error.

Numerical analysis studies how these local errors propagate through algorithms.
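
A minimal sketch of such accumulation, using only the standard library; `math.fsum` computes a compensated, correctly rounded sum for comparison:

```python
import math

n = 1_000_000
total = 0.0
for _ in range(n):
    total += 0.1                                  # each addition is rounded

reference = math.fsum(0.1 for _ in range(n))      # correctly rounded sum of the same terms
print(total - reference)                          # small but clearly nonzero drift
```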

126.4 Catastrophic Cancellation

Cancellation occurs when subtracting nearly equal numbers.

For example,

1.000000000001 - 1.000000000000

may lose many significant digits.

The relative error after subtraction can become large even if the original numbers were accurate.

This is called catastrophic cancellation.

Algorithms should avoid unnecessary cancellation when possible.

For example, the quadratic formula

x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}

can suffer cancellation for one root. Stable rearrangements are preferred in numerical computation.
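
A sketch of one standard rearrangement, assuming real roots; the helper name `quadratic_roots` is illustrative. It chooses the sign so that -b and the square root never nearly cancel, and recovers the other root from the product of roots c/a:

```python
import math

def quadratic_roots(a, b, c):
    """Both roots of ax^2 + bx + c = 0, avoiding cancellation (assumes real roots)."""
    disc = math.sqrt(b * b - 4 * a * c)
    q = -0.5 * (b + math.copysign(disc, b))   # same sign as -b: no cancellation
    return q / a, c / q                       # second root via x1 * x2 = c / a

a, b, c = 1.0, 1e8, 1.0
naive = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)   # small root, naive formula
stable = quadratic_roots(a, b, c)[1]
print(naive)    # loses most of its digits to cancellation
print(stable)   # close to the true value, about -1e-8
```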

126.5 Backward Error

Backward error asks:

What nearby problem did the algorithm solve exactly?

Suppose an algorithm computes \hat{x} for the system

Ax=b.

If

(A+\Delta A)\hat{x}=b+\Delta b,

then \Delta A and \Delta b describe the backward error.

An algorithm is backward stable if it solves a nearby problem with small perturbations.

Backward stability is one of the most important concepts in numerical linear algebra. Cornell's course notes describe a backward stable algorithm as one that solves a nearby problem exactly. (cs.cornell.edu)
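
One way to observe this numerically, assuming NumPy: the normwise backward error of a computed solution can be measured as \|b-A\hat{x}\| / (\|A\|\,\|\hat{x}\|+\|b\|), and for a backward stable solver it sits near machine precision:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
b = rng.standard_normal(100)

x_hat = np.linalg.solve(A, b)        # LU with partial pivoting internally

r = b - A @ x_hat                    # residual of the computed solution
eta = np.linalg.norm(r) / (np.linalg.norm(A, 2) * np.linalg.norm(x_hat) + np.linalg.norm(b))
print(eta)                           # on the order of machine epsilon
```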

126.6 Conditioning

Conditioning measures sensitivity of the solution to perturbations in the data.

For a nonsingular matrix A, the condition number is

\kappa(A)=\|A\|\,\|A^{-1}\|.

If \kappa(A) is large, small perturbations in the data may produce large changes in the solution.

If \kappa(A) is near 1, the problem is well-conditioned.

Conditioning depends on the mathematical problem itself, not on the algorithm.

Algorithms may be stable or unstable, but even a stable algorithm cannot overcome severe ill-conditioning.
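
A sketch contrasting conditioning and algorithmic accuracy, assuming NumPy; the Hilbert matrix is a standard ill-conditioned example:

```python
import numpy as np

n = 10
i, j = np.indices((n, n))
H = 1.0 / (i + j + 1)                # Hilbert matrix: H[i, j] = 1 / (i + j + 1)

print(np.linalg.cond(H))             # roughly 1e13 for n = 10

x_true = np.ones(n)
b = H @ x_true
x_hat = np.linalg.solve(H, b)        # a stable solver, but an ill-conditioned problem
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))   # far larger than 1e-16
```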

126.7 Linear Systems

A central problem is solving

Ax=b.

For dense matrices, direct factorization methods are common.

For sparse or very large matrices, iterative methods are often preferred.

The computational strategy depends on:

| Matrix property | Preferred methods |
| --- | --- |
| Dense | LU, QR |
| Symmetric positive definite | Cholesky, conjugate gradient |
| Sparse | Sparse direct or iterative methods |
| Structured | Specialized algorithms |
| Huge | Krylov methods, multigrid |

Linear system solution is one of the main workloads in scientific computing.

126.8 Gaussian Elimination

Gaussian elimination transforms a matrix into upper triangular form using row operations.

For

Ax=b,

the process computes

A=LU,

where:

| Matrix | Meaning |
| --- | --- |
| L | Lower triangular |
| U | Upper triangular |

Then solve:

Ly=b,

followed by

Ux=y.

Triangular systems are solved efficiently by forward and backward substitution.

Gaussian elimination is one of the most fundamental algorithms in numerical linear algebra.
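
A minimal sketch of the factor-then-substitute pattern, assuming NumPy and nonzero pivots (pivoting is discussed next); the helper names are illustrative:

```python
import numpy as np

def lu_no_pivot(A):
    """Doolittle factorization A = L @ U without pivoting (assumes nonzero pivots)."""
    U = A.astype(float).copy()
    n = U.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        L[k+1:, k] = U[k+1:, k] / U[k, k]               # multipliers
        U[k+1:, k:] -= np.outer(L[k+1:, k], U[k, k:])   # eliminate below the pivot
    return L, np.triu(U)

def forward_sub(L, b):
    y = np.zeros_like(b, dtype=float)
    for i in range(len(b)):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_sub(U, y):
    x = np.zeros_like(y, dtype=float)
    for i in reversed(range(len(y))):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

A = np.array([[4.0, 3.0, 2.0], [2.0, 4.0, 1.0], [1.0, 2.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])
L, U = lu_no_pivot(A)
x = back_sub(U, forward_sub(L, b))   # solve Ly = b, then Ux = y
print(np.allclose(A @ x, b))         # True
```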

126.9 Pivoting

Without pivoting, Gaussian elimination may become numerically unstable.

Pivoting swaps rows or columns to avoid dividing by small numbers.

Partial pivoting chooses the largest available pivot in magnitude within the current column.

The factorization becomes

PA=LU,

where P is a permutation matrix.

Pivoting greatly improves stability in practice.

Most dense linear algebra libraries use pivoted LU factorization by default.
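
A short check, assuming SciPy; note that scipy.linalg.lu returns factors with the convention A = P @ L @ U, so the permutation appearing in PA = LU is its transpose:

```python
import numpy as np
from scipy.linalg import lu

# A tiny (0, 0) entry: without a row swap the first pivot would be 1e-20.
A = np.array([[1e-20, 1.0],
              [1.0,   1.0]])

P, L, U = lu(A)                         # SciPy convention: A = P @ L @ U
print(np.allclose(P.T @ A, L @ U))      # True: this is the PA = LU form
print(np.abs(L).max() <= 1.0)           # True: partial pivoting keeps multipliers <= 1
```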

126.10 Complexity of Dense Elimination

For an n\times n dense matrix, Gaussian elimination requires approximately

\frac{2}{3}n^3

floating-point operations.

Storage requires approximately

n^2

numbers.

Thus direct dense methods become expensive for large n.

For example:

| n | Approximate operations |
| --- | --- |
| 10^3 | 10^9 |
| 10^4 | 10^{12} |
| 10^5 | 10^{15} |

This motivates sparse and iterative methods for large problems.

126.11 Sparse Matrices

A sparse matrix has mostly zero entries.

Examples arise naturally in:

| Area | Source of sparsity |
| --- | --- |
| Finite elements | Local basis support |
| Graphs | Limited adjacency |
| PDE discretization | Local coupling |
| Networks | Sparse connections |
| Markov chains | Few transitions |

Sparse matrices are stored using compressed formats such as CSR or CSC.

Algorithms try to avoid computations involving zeros.

Sparsity changes both computational complexity and memory usage dramatically.
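
A small sketch, assuming SciPy; the one-dimensional Poisson operator is a typical sparse matrix, stored here in CSR format:

```python
import numpy as np
from scipy.sparse import diags

n = 100_000
# Tridiagonal operator with 2 on the diagonal and -1 on the off-diagonals.
A = diags([-1, 2, -1], offsets=[-1, 0, 1], shape=(n, n), format="csr")

print(A.nnz)             # about 3n stored entries instead of n^2
x = np.ones(n)
y = A @ x                # sparse matrix-vector product: O(nnz) work
print(y[:3], y[-3:])     # boundary rows differ from the interior rows
```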

126.12 Fill-In

During elimination, zero entries may become nonzero.

This phenomenon is called fill-in.

For sparse matrices, fill-in can destroy sparsity and increase cost.

Matrix ordering strategies attempt to reduce fill-in.

Examples include:

| Ordering | Purpose |
| --- | --- |
| Minimum degree | Reduce local fill |
| Nested dissection | Exploit graph separators |
| Reverse Cuthill-McKee | Reduce bandwidth |

Sparse matrix computation is strongly connected to graph structure.
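
A hedged sketch of one ordering effect, assuming SciPy: scrambling a tridiagonal matrix spreads its nonzeros far from the diagonal, and reverse Cuthill-McKee recovers a narrow band, which limits where fill-in can appear:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(M):
    r, c = M.nonzero()
    return int(np.abs(r - c).max())

n = 200
A = diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
p = np.random.default_rng(0).permutation(n)
A_scrambled = A[p][:, p]                 # same matrix, scrambled ordering

perm = reverse_cuthill_mckee(A_scrambled, symmetric_mode=True)
A_rcm = A_scrambled[perm][:, perm]
print(bandwidth(A_scrambled), "->", bandwidth(A_rcm))   # wide band -> narrow band
```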

126.13 Cholesky Factorization

If A is symmetric positive definite, then

A=LL^T,

where L is lower triangular.

This is the Cholesky factorization.

Compared with LU factorization, Cholesky:

| Property | Advantage |
| --- | --- |
| Uses symmetry | Half storage |
| Uses definiteness | No pivoting needed |
| Fewer operations | About half the work of LU |

Cholesky is widely used in optimization, statistics, finite elements, and Gaussian process models.
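
A short sketch, assuming NumPy and SciPy; the matrix is made symmetric positive definite by construction:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)                # symmetric positive definite

L = np.linalg.cholesky(A)                  # lower triangular factor, A = L @ L.T
print(np.allclose(A, L @ L.T))             # True

b = rng.standard_normal(5)
y = solve_triangular(L, b, lower=True)     # solve L y = b
x = solve_triangular(L.T, y, lower=False)  # solve L^T x = y
print(np.allclose(A @ x, b))               # True
```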

126.14 QR Factorization

The QR factorization expresses a matrix as

A=QR,

where:

| Matrix | Property |
| --- | --- |
| Q | Orthogonal |
| R | Upper triangular |

QR factorization is fundamental for least squares problems.

Given

Ax\approx b,

compute

QRx\approx b.

Since orthogonal transformations preserve norms,

Rx\approx Q^Tb.

This triangular system gives the least squares solution.

QR factorization is more stable than solving normal equations directly.
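
A minimal least squares sketch via QR, assuming NumPy; the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))                  # tall: more equations than unknowns
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.standard_normal(50)   # noisy right-hand side

Q, R = np.linalg.qr(A)                            # reduced QR: Q is 50x3, R is 3x3
x = np.linalg.solve(R, Q.T @ b)                   # triangular system R x = Q^T b
print(x)                                          # close to x_true

print(np.linalg.lstsq(A, b, rcond=None)[0])       # same answer from the library routine
```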

126.15 Householder Reflections

A Householder reflection has the form

H=I-2vv^T,

where v is a unit vector.

It reflects vectors across a hyperplane.

Householder transformations are orthogonal and numerically stable.

They are widely used for QR factorization because they systematically eliminate subdiagonal entries.

A sequence of Householder reflections transforms a matrix into triangular form.
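
A sketch of a single reflection, assuming NumPy; the helper `householder` is illustrative, and it chooses the sign so the update to the first component does not cancel:

```python
import numpy as np

def householder(x):
    """Unit vector v with (I - 2 v v^T) x = -sign(x[0]) * ||x|| * e_1."""
    v = x.astype(float).copy()
    v[0] += np.copysign(np.linalg.norm(x), x[0])   # shift the first entry, no cancellation
    return v / np.linalg.norm(v)

x = np.array([3.0, 1.0, 4.0, 1.0])
v = householder(x)
H = np.eye(4) - 2.0 * np.outer(v, v)

print(H @ x)                              # [-||x||, ~0, ~0, ~0]: entries below the first are eliminated
print(np.allclose(H @ H.T, np.eye(4)))    # True: H is orthogonal
```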

126.16 Givens Rotations

A Givens rotation acts on two coordinates at a time.

It has the form

G=\begin{bmatrix} c & s \\ -s & c \end{bmatrix},

embedded into a larger identity matrix.

Givens rotations are useful when matrices are sparse because they modify only selected rows or columns.

They are commonly used in updating factorizations and in some eigenvalue algorithms.
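
A sketch of one rotation, assuming NumPy; the rotation acts on rows 0 and 2 only and zeroes a single entry:

```python
import numpy as np

def givens(a, b):
    """c, s with [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0 else (a / r, b / r)

x = np.array([3.0, 0.0, 4.0])
c, s = givens(x[0], x[2])

G = np.eye(3)                    # embed the 2x2 rotation into the identity
G[0, 0], G[0, 2] = c, s
G[2, 0], G[2, 2] = -s, c

print(G @ x)                              # [5, 0, 0]: the entry in position 2 is zeroed
print(np.allclose(G.T @ G, np.eye(3)))    # True: G is orthogonal
```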

126.17 Least Squares Problems

The least squares problem seeks

x = \arg\min_x \|Ax-b\|^2.

The normal equations are

A^TAx=A^Tb.

If A has full column rank, the solution is unique.

However, solving normal equations may square the condition number:

\kappa(A^TA)=\kappa(A)^2.

Therefore QR factorization or SVD is often preferred.

Least squares problems appear throughout data fitting, regression, inverse problems, and machine learning.
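
A small numerical check of the squaring effect, assuming NumPy; the two columns are made nearly dependent on purpose:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 2))
A[:, 1] = A[:, 0] + 1e-6 * rng.standard_normal(100)   # nearly dependent columns

print(np.linalg.cond(A))        # roughly 1e6
print(np.linalg.cond(A.T @ A))  # roughly 1e12: the condition number is squared
```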

126.18 Singular Value Decomposition

The singular value decomposition is

A=U\Sigma V^T.

Here:

| Matrix | Property |
| --- | --- |
| U | Orthogonal |
| V | Orthogonal |
| \Sigma | Diagonal with nonnegative entries |

The diagonal entries

\sigma_1\geq \sigma_2\geq \cdots

are the singular values.

The SVD reveals:

| Quantity | From SVD |
| --- | --- |
| Rank | Number of nonzero singular values |
| Condition number | \sigma_1/\sigma_r |
| Best low-rank approximation | Truncated SVD |
| Null space | Small singular vectors |
| Principal directions | Singular vectors |

The SVD is one of the most important matrix decompositions in numerical computation.
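
A sketch of the truncated SVD as a low-rank approximation, assuming NumPy; the matrix is built to have rank about 20 plus small noise:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20)) @ rng.standard_normal((20, 80))   # rank 20
A += 1e-3 * rng.standard_normal((100, 80))                           # small noise

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.sum(s > 1e-1))              # numerical rank: about 20

k = 20
A_k = U[:, :k] * s[:k] @ Vt[:k]      # best rank-k approximation (Eckart-Young)
print(np.linalg.norm(A - A_k, 2))    # approximately the first discarded singular value
print(s[k])
```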

126.19 Eigenvalue Problems

The eigenvalue problem seeks

Av=\lambda v.

Numerical eigenvalue algorithms include:

| Method | Main idea |
| --- | --- |
| Power iteration | Dominant eigenvector |
| Inverse iteration | Targeted eigenvalues |
| QR algorithm | Orthogonal similarity iteration |
| Lanczos | Sparse symmetric matrices |
| Arnoldi | General sparse matrices |

Eigenvalue problems arise in vibration analysis, quantum mechanics, PCA, graph analysis, stability analysis, and differential equations.

126.20 Power Iteration

The power method repeatedly multiplies by A:

x_{k+1}=\frac{Ax_k}{\|Ax_k\|}.

Under suitable conditions, the vectors converge toward the dominant eigenvector.

The convergence rate depends on the spectral gap:

\left|\frac{\lambda_2}{\lambda_1}\right|.

Power iteration is simple and scalable for large sparse matrices.

It underlies PageRank and many large-scale spectral algorithms.
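
A minimal sketch, assuming NumPy; the function name `power_iteration` is illustrative:

```python
import numpy as np

def power_iteration(A, num_iters=200, seed=0):
    """Approximate the dominant eigenpair by repeated multiplication and normalization."""
    x = np.random.default_rng(seed).standard_normal(A.shape[0])
    for _ in range(num_iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ A @ x, x              # Rayleigh quotient estimate and eigenvector

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, v = power_iteration(A)
print(lam)                                # about 3.618, the larger eigenvalue
print(np.linalg.norm(A @ v - lam * v))    # small residual
```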

126.21 Krylov Subspaces

Iterative methods often work in Krylov subspaces.

The m-th Krylov subspace generated by A and b is

\mathcal{K}_m(A,b) = \operatorname{span}\{b, Ab, A^2b, \ldots, A^{m-1}b\}.

Krylov methods build approximate solutions from these spaces.

Examples include:

| Method | Matrix type |
| --- | --- |
| Conjugate gradient | Symmetric positive definite |
| GMRES | General nonsymmetric |
| MINRES | Symmetric indefinite |
| Lanczos | Eigenvalue problems |

These methods rely mainly on matrix-vector multiplication.
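
A sketch of this matvec-only interface, assuming SciPy: the solver is given only a function that applies the operator, never the matrix entries:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

n = 1000

def matvec(v):
    # Tridiagonal operator with 3 on the diagonal and -1 on the off-diagonals.
    out = 3.0 * v
    out[1:] -= v[:-1]
    out[:-1] -= v[1:]
    return out

A = LinearOperator((n, n), matvec=matvec)
b = np.ones(n)
x, info = gmres(A, b)                    # builds a Krylov subspace from b, Ab, A^2 b, ...
print(info)                              # 0 means the iteration converged
print(np.linalg.norm(matvec(x) - b))     # small residual
```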

126.22 Conjugate Gradient Method

For symmetric positive definite matrices, the conjugate gradient method solves

Ax=b

iteratively.

The method minimizes the quadratic energy

\frac{1}{2}x^TAx - b^Tx.

The search directions are A-orthogonal:

p_i^TAp_j=0 \quad \text{for } i\neq j.

Conjugate gradient often converges much faster than worst-case bounds suggest.

Its efficiency depends strongly on the spectrum of A.
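
A plain implementation sketch, assuming NumPy; the function name `conjugate_gradient` is illustrative:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x                     # residual = negative gradient of the quadratic
    p = r.copy()                      # first search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)         # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p     # next direction, A-orthogonal to the previous ones
        rs = rs_new
    return x

n = 200
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD 1D Poisson matrix
b = np.ones(n)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))      # near zero
```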

126.23 Preconditioning

Preconditioning improves iterative convergence.

Instead of solving

Ax=b,

solve an equivalent transformed system

M^{-1}Ax=M^{-1}b.

The matrix M approximates A but is easier to invert.

A good preconditioner reduces the effective condition number.

Common preconditioners include:

| Preconditioner | Idea |
| --- | --- |
| Jacobi | Diagonal scaling |
| Incomplete LU | Approximate factorization |
| Multigrid | Hierarchical correction |
| Domain decomposition | Subdomain solves |

Preconditioning is often the key to practical large-scale performance.
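
A hedged sketch with a Jacobi (diagonal) preconditioner, assuming SciPy; the badly scaled test matrix and the iteration-counting helper are illustrative:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

n = 1000
scale = np.logspace(0, 4, n)                 # diagonal entries from 1 to 1e4
A = diags([-0.1 * np.ones(n - 1), scale, -0.1 * np.ones(n - 1)], [-1, 0, 1], format="csr")
b = np.ones(n)

def cg_iterations(M=None):
    count = [0]
    def cb(xk):
        count[0] += 1
    cg(A, b, M=M, callback=cb)
    return count[0]

M = diags([1.0 / A.diagonal()], [0], format="csr")   # Jacobi: apply the inverse diagonal

print(cg_iterations())        # unpreconditioned: many iterations
print(cg_iterations(M))       # Jacobi-preconditioned: far fewer iterations
```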

126.24 Stability of Orthogonal Transformations

Orthogonal matrices preserve norms:

\|Qx\|=\|x\|.

This makes orthogonal transformations numerically stable.

Algorithms based on orthogonal transformations, such as QR factorization, tend to have excellent stability properties.

This is one reason QR methods are preferred over explicitly forming normal equations.

Orthogonality is extremely valuable numerically.

126.25 Matrix Norms

Matrix norms measure matrix size.

Common norms include:

| Norm | Definition |
| --- | --- |
| Spectral norm | Largest singular value |
| Frobenius norm | Square root of sum of squares |
| 1-norm | Maximum column sum |
| Infinity norm | Maximum row sum |

Norms are used in error bounds, conditioning, perturbation analysis, and convergence analysis.

Different norms emphasize different aspects of matrix behavior.
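
A quick check of the four norms on one small matrix, assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

print(np.linalg.norm(A, 2))        # spectral norm: largest singular value
print(np.linalg.norm(A, "fro"))    # Frobenius norm: sqrt(1 + 4 + 9 + 16)
print(np.linalg.norm(A, 1))        # maximum absolute column sum: 2 + 4 = 6
print(np.linalg.norm(A, np.inf))   # maximum absolute row sum: 3 + 4 = 7
```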

126.26 Perturbation Theory

Perturbation theory studies how small changes affect solutions.

Questions include:

| Problem | Perturbation question |
| --- | --- |
| Linear systems | How does x change if A changes? |
| Eigenvalues | How stable are eigenvalues? |
| Singular values | How sensitive are low-rank structures? |
| Least squares | How sensitive is the fit? |

Perturbation theory combines linear algebra with analysis and numerical stability.

It explains why some computations are inherently difficult.

126.27 Parallel and High-Performance Computation

Modern numerical linear algebra must exploit hardware efficiently.

Performance depends on:

| Factor | Importance |
| --- | --- |
| Cache locality | Memory efficiency |
| Parallelism | Multiple processors |
| GPU acceleration | Massive throughput |
| Communication cost | Distributed systems |
| Sparse access patterns | Memory bandwidth |

Large-scale computation often becomes limited by data movement rather than arithmetic itself.

This changes algorithm design.

126.28 Numerical Linear Algebra and Applications

Numerical linear algebra is foundational because many applications reduce to matrix problems.

| Area | Typical matrix problem |
| --- | --- |
| Machine learning | Least squares, SVD |
| Finite elements | Sparse linear systems |
| Graphics | Geometric transformations |
| Optimization | Newton systems |
| Signal processing | Spectral analysis |
| Networks | Eigenvectors |
| Statistics | Covariance factorizations |
| Quantum mechanics | Eigenvalue problems |
| PDEs | Sparse operators |

Efficient computation with matrices is therefore central across scientific and engineering disciplines.

126.29 Numerical Linear Algebra and Linear Algebra

The dictionary is direct.

| Numerical concept | Linear algebra object |
| --- | --- |
| Floating-point vector | Approximate vector |
| Stability | Error growth under operations |
| Conditioning | Sensitivity of matrix problems |
| LU factorization | Triangular decomposition |
| QR factorization | Orthogonal-triangular decomposition |
| Cholesky | Symmetric positive definite factorization |
| Iterative solver | Krylov subspace method |
| Sparse matrix | Graph-structured operator |
| SVD | Orthogonal spectral factorization |
| Preconditioner | Approximate inverse operator |
| Eigenvalue algorithm | Spectral computation |

Numerical linear algebra studies how linear algebra behaves under finite computation.

126.30 Summary

Numerical linear algebra develops practical algorithms for matrix computation under finite precision.

Floating-point arithmetic introduces rounding error, cancellation, and stability concerns. Conditioning measures sensitivity of problems. Stable algorithms aim to control error growth.

Linear systems are solved using LU, Cholesky, or iterative Krylov methods. Least squares problems use QR or SVD methods. Sparse matrices require specialized storage and algorithms. Eigenvalue problems use iterative spectral methods.

The central principle is approximation under finite precision. Numerical linear algebra studies how to compute meaningful answers efficiently and reliably even though arithmetic is inexact and matrices may be extremely large.