# Chapter 126. Numerical Linear Algebra

Numerical linear algebra studies algorithms for computing with vectors and matrices using finite-precision arithmetic.

Exact symbolic formulas are often impractical for large systems. Instead, computations are performed approximately on digital hardware. Numerical linear algebra analyzes the accuracy, stability, efficiency, and scalability of these computations.

The subject includes solving linear systems, least squares problems, eigenvalue problems, singular value decompositions, matrix factorizations, iterative methods, sparse matrices, conditioning, and floating-point error analysis.

Modern scientific computing, machine learning, optimization, graphics, signal processing, simulation, and data analysis all depend on numerical linear algebra. Matrix factorizations such as LU, QR, and SVD are central tools throughout the field. ([cs.cornell.edu](https://www.cs.cornell.edu/courses/cs6210/2025fa/lec/2025-08-25.pdf))

## 126.1 Floating-Point Arithmetic

Computers do not represent real numbers exactly.

Instead, they use floating-point numbers of the form

$$
\pm m \times b^e,
$$

where:

| Symbol | Meaning |
|---|---|
| \(m\) | Mantissa or significand |
| \(b\) | Base |
| \(e\) | Exponent |

Most modern systems use binary floating-point arithmetic.

A floating-point number approximates a real number. Arithmetic operations are rounded to nearby representable values.

Thus

$$
(a+b)+c
$$

may differ from

$$
a+(b+c).
$$

Finite precision changes algebraic behavior.
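
As a small illustration (a Python sketch, an assumption of this example rather than part of the text), the sum below changes with the order of evaluation:

```python
# A minimal sketch of non-associativity in IEEE double precision.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0: the large terms cancel first, so c survives
print(a + (b + c))  # 0.0: c is absorbed into b before the cancellation
```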

## 126.2 Machine Precision

Machine precision measures the relative spacing of floating-point numbers.

The machine epsilon \(\varepsilon_{\text{mach}}\) is approximately the smallest number such that

$$
1+\varepsilon_{\text{mach}} > 1
$$

in floating-point arithmetic.

For IEEE double precision,

$$
\varepsilon_{\text{mach}}\approx 2^{-53}\approx 10^{-16}.
$$

Machine precision limits attainable accuracy.

Even a stable algorithm generally cannot deliver accuracy much better than machine precision amplified by the conditioning of the problem.
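
The following Python sketch (assuming NumPy is available) checks these values. Note that NumPy reports the spacing between \(1\) and the next double, \(2^{-52}\), which is twice the unit roundoff \(2^{-53}\) quoted above.

```python
import numpy as np

# Machine epsilon for IEEE double precision (illustrative sketch).
eps = np.finfo(np.float64).eps      # 2**-52: spacing between 1.0 and the next double
print(eps)                          # about 2.22e-16
print(1.0 + eps > 1.0)              # True
print(1.0 + eps / 2 == 1.0)         # True: 2**-53 is absorbed by round-to-nearest-even
```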

## 126.3 Rounding Error

Each arithmetic operation introduces small rounding error.

A common model is

$$
\operatorname{fl}(x\circ y) =
(x\circ y)(1+\delta),
$$

where:

| Symbol | Meaning |
|---|---|
| \(\circ\) | Arithmetic operation |
| \(\operatorname{fl}\) | Floating-point result |
| \(\delta\) | Small relative error |

Typically,

$$
|\delta|\leq \varepsilon_{\text{mach}}.
$$

Although each individual error is tiny, substantial error can accumulate over many operations.

Numerical analysis studies how these local errors propagate through algorithms.
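
A simple Python sketch (the quantities and counts are illustrative choices, not from the text) shows accumulation in a long naive summation compared with compensated summation:

```python
import math

# Summing 0.1 one million times; the target value is 1.0e5.
n = 1_000_000
naive = 0.0
for _ in range(n):
    naive += 0.1                    # each addition rounds; the errors accumulate
exact = math.fsum(0.1 for _ in range(n))   # compensated, correctly rounded summation

print(naive - 1.0e5)                # visible drift from accumulated rounding
print(exact - 1.0e5)                # essentially no drift beyond representing 0.1 itself
```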

## 126.4 Catastrophic Cancellation

Cancellation occurs when subtracting nearly equal numbers.

For example,

$$
1.000000000001 - 1.000000000000
$$

may lose many significant digits.

The relative error after subtraction can become large even if the original numbers were accurate.

This is called catastrophic cancellation.

Algorithms should avoid unnecessary cancellation when possible.

For example, the quadratic formula

$$
x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}
$$

can suffer cancellation in one root when \(4ac\) is small compared with \(b^2\). Stable rearrangements are preferred in numerical computation.
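
One standard rearrangement computes the well-behaved root first and recovers the other from the product of roots \(x_1x_2=c/a\). The Python sketch below assumes real roots and \(a\neq 0\):

```python
import math

# A cancellation-avoiding quadratic solver (sketch; assumes real roots, a != 0).
def quadratic_roots(a, b, c):
    disc = math.sqrt(b * b - 4.0 * a * c)
    # Add quantities of the same sign, then use x1 * x2 = c / a for the other root.
    q = -0.5 * (b + math.copysign(disc, b))
    return q / a, c / q

print(quadratic_roots(1.0, 1e8, 1.0))          # roots near -1e8 and -1e-8
print((-1e8 + math.sqrt(1e16 - 4.0)) / 2.0)    # naive formula: the small root loses accuracy
```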

## 126.5 Backward Error

Backward error asks:

What nearby problem did the algorithm solve exactly?

Suppose an algorithm computes \(\hat{x}\) for the system

$$
Ax=b.
$$

If

$$
(A+\Delta A)\hat{x}=b+\Delta b,
$$

then \(\Delta A\) and \(\Delta b\) describe the backward error.

An algorithm is backward stable if it solves a nearby problem with small perturbations.

Backward stability is one of the most important concepts in numerical linear algebra. The Cornell CS 6210 lecture notes emphasize backward stability as exactly solving a nearby problem. ([cs.cornell.edu](https://www.cs.cornell.edu/courses/cs6210/2025fa/lec/2025-08-25.pdf))
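
A common normwise backward error estimate is \(\|b-A\hat{x}\| / (\|A\|\,\|\hat{x}\|+\|b\|)\). The Python sketch below (the random test problem is an illustrative assumption) evaluates it for a solution computed by LU with partial pivoting:

```python
import numpy as np

# Normwise backward error of a computed solution x_hat (sketch).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
b = rng.standard_normal(200)
x_hat = np.linalg.solve(A, b)       # LU with partial pivoting under the hood

r = b - A @ x_hat
eta = np.linalg.norm(r) / (np.linalg.norm(A, 2) * np.linalg.norm(x_hat) + np.linalg.norm(b))
print(eta)                          # typically of the order of machine epsilon
```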

## 126.6 Conditioning

Conditioning measures sensitivity of the solution to perturbations in the data.

For a nonsingular matrix \(A\), the condition number is

$$
\kappa(A)=\|A\|\|A^{-1}\|.
$$

If \(\kappa(A)\) is large, small perturbations in the data may produce large changes in the solution.

If \(\kappa(A)\) is near \(1\), the problem is well-conditioned.

Conditioning depends on the mathematical problem itself, not on the algorithm.

Algorithms may be stable or unstable, but even a stable algorithm cannot overcome severe ill-conditioning.
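
The Hilbert matrix is a classic example. In the Python sketch below (an illustration using SciPy's `hilbert` helper), the computed solution loses many digits even though the solver itself is backward stable:

```python
import numpy as np
from scipy.linalg import hilbert

n = 10
A = hilbert(n)                      # notoriously ill-conditioned
x_true = np.ones(n)
b = A @ x_true
x = np.linalg.solve(A, b)

print(np.linalg.cond(A))            # roughly 1e13 for n = 10
print(np.linalg.norm(x - x_true))   # error far above machine epsilon, as conditioning predicts
```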

## 126.7 Linear Systems

A central problem is solving

$$
Ax=b.
$$

For dense matrices, direct factorization methods are common.

For sparse or very large matrices, iterative methods are often preferred.

The computational strategy depends on:

| Matrix property | Preferred methods |
|---|---|
| Dense | LU, QR |
| Symmetric positive definite | Cholesky, conjugate gradient |
| Sparse | Sparse direct or iterative methods |
| Structured | Specialized algorithms |
| Huge | Krylov methods, multigrid |

Linear system solution is one of the main workloads in scientific computing.

## 126.8 Gaussian Elimination

Gaussian elimination transforms a matrix into upper triangular form using row operations.

For

$$
Ax=b,
$$

the process computes

$$
A=LU,
$$

where:

| Matrix | Meaning |
|---|---|
| \(L\) | Lower triangular |
| \(U\) | Upper triangular |

Then solve:

$$
Ly=b,
$$

followed by

$$
Ux=y.
$$

Triangular systems are solved efficiently by forward and backward substitution.

Gaussian elimination is one of the most fundamental algorithms in numerical linear algebra.
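
The factor-then-substitute workflow looks as follows in a Python sketch using SciPy's LAPACK wrappers (an assumption of this example):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)

lu, piv = lu_factor(A)              # one O(n^3) factorization (with partial pivoting)
x = lu_solve((lu, piv), b)          # each subsequent solve costs only O(n^2)
print(np.allclose(A @ x, b))        # True
```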

## 126.9 Pivoting

Without pivoting, Gaussian elimination may become numerically unstable.

Pivoting swaps rows or columns to avoid dividing by small numbers.

Partial pivoting chooses the largest available pivot in magnitude within the current column.

The factorization becomes

$$
PA=LU,
$$

where \(P\) is a permutation matrix.

Pivoting greatly improves stability in practice.

Most dense linear algebra libraries use pivoted LU factorization by default.
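
The permutation can be inspected directly. The Python sketch below uses `scipy.linalg.lu`, which returns factors with \(A = PLU\); its \(P\) is the transpose of the permutation in the \(PA = LU\) convention above:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[1e-20, 1.0],
              [1.0,   1.0]])
P, L, U = lu(A)
print(np.allclose(P @ L @ U, A))    # True
print(P)                            # the rows were swapped to avoid the tiny pivot 1e-20
```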

## 126.10 Complexity of Dense Elimination

For an \(n\times n\) dense matrix, Gaussian elimination requires approximately

$$
\frac{2}{3}n^3
$$

floating-point operations.

Storage requires approximately

$$
n^2
$$

numbers.

Thus direct dense methods become expensive for large \(n\).

For example:

| \(n\) | Approximate operations |
|---|---:|
| \(10^3\) | \(10^9\) |
| \(10^4\) | \(10^{12}\) |
| \(10^5\) | \(10^{15}\) |

This motivates sparse and iterative methods for large problems.

## 126.11 Sparse Matrices

A sparse matrix has mostly zero entries.

Examples arise naturally in:

| Area | Source of sparsity |
|---|---|
| Finite elements | Local basis support |
| Graphs | Limited adjacency |
| PDE discretization | Local coupling |
| Networks | Sparse connections |
| Markov chains | Few transitions |

Sparse matrices are stored using compressed formats such as CSR or CSC.

Algorithms try to avoid computations involving zeros.

Sparsity changes both computational complexity and memory usage dramatically.
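
A common model problem is the tridiagonal 1-D Laplacian. The Python sketch below (sizes chosen for illustration) stores it in CSR format with SciPy:

```python
import numpy as np
from scipy.sparse import diags

n = 1_000_000
A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

print(A.nnz)                        # about 3n stored entries instead of n^2
x = np.ones(n)
y = A @ x                           # sparse matrix-vector product touches only nonzeros
print(y[:3])                        # [1. 0. 0.]
```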

## 126.12 Fill-In

During elimination, zero entries may become nonzero.

This phenomenon is called fill-in.

For sparse matrices, fill-in can destroy sparsity and increase cost.

Matrix ordering strategies attempt to reduce fill-in.

Examples include:

| Ordering | Purpose |
|---|---|
| Minimum degree | Reduce local fill |
| Nested dissection | Exploit graph separators |
| Reverse Cuthill-McKee | Reduce bandwidth |

Sparse matrix computation is strongly connected to graph structure.

## 126.13 Cholesky Factorization

If \(A\) is symmetric positive definite, then

$$
A=LL^T,
$$

where \(L\) is lower triangular.

This is the Cholesky factorization.

Compared with LU factorization, Cholesky:

| Property | Advantage |
|---|---|
| Uses symmetry | Half storage |
| Uses definiteness | No pivoting needed |
| Fewer operations | About half the work of LU |

Cholesky is widely used in optimization, statistics, finite elements, and Gaussian process models.
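
A brief Python sketch (the SPD test matrix is constructed for illustration):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6.0 * np.eye(6)       # symmetric positive definite by construction
b = rng.standard_normal(6)

L = np.linalg.cholesky(A)           # A = L L^T with L lower triangular
print(np.allclose(L @ L.T, A))      # True

c, low = cho_factor(A)
x = cho_solve((c, low), b)          # two triangular solves
print(np.allclose(A @ x, b))        # True
```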

## 126.14 QR Factorization

The QR factorization expresses a matrix as

$$
A=QR,
$$

where:

| Matrix | Property |
|---|---|
| \(Q\) | Orthogonal |
| \(R\) | Upper triangular |

QR factorization is fundamental for least squares problems.

Given

$$
Ax\approx b,
$$

compute

$$
QRx\approx b.
$$

Since orthogonal transformations preserve norms,

$$
Rx\approx Q^Tb.
$$

This triangular system gives the least squares solution.

QR factorization is more stable than solving normal equations directly.
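
The least squares workflow via the reduced QR factorization, as a Python sketch (the random tall matrix is an illustrative assumption):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 4))   # tall matrix with full column rank
b = rng.standard_normal(100)

Q, R = np.linalg.qr(A)              # reduced QR: Q is 100x4, R is 4x4
x = solve_triangular(R, Q.T @ b)    # solve R x = Q^T b

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```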

## 126.15 Householder Reflections

A Householder reflection has the form

$$
H=I-2vv^T,
$$

where \(v\) is a unit vector.

It reflects vectors across a hyperplane.

Householder transformations are orthogonal and numerically stable.

They are widely used for QR factorization because they systematically eliminate subdiagonal entries.

A sequence of Householder reflections transforms a matrix into triangular form.
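
A single reflector can be constructed explicitly. The Python sketch below (the helper name `householder_vector` is chosen for illustration) zeros out everything below the first entry of a vector:

```python
import numpy as np

def householder_vector(x):
    # Choose the sign of alpha so that x[0] - alpha does not cancel.
    alpha = -np.copysign(np.linalg.norm(x), x[0])
    v = x.copy()
    v[0] -= alpha
    return v / np.linalg.norm(v), alpha

x = np.array([3.0, 1.0, 2.0])
v, alpha = householder_vector(x)
H = np.eye(3) - 2.0 * np.outer(v, v)    # H = I - 2 v v^T
print(H @ x)                            # [alpha, ~0, ~0]
```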

## 126.16 Givens Rotations

A Givens rotation acts on two coordinates at a time.

It has the form

$$
G=
\begin{bmatrix}
c & s \\
-s & c
\end{bmatrix},
$$

embedded into a larger identity matrix.

Givens rotations are useful when matrices are sparse because they modify only selected rows or columns.

They are commonly used in updating factorizations and in some eigenvalue algorithms.
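
A small Python sketch (the helper name `givens` is illustrative) builds the rotation that zeros the second component of a pair:

```python
import numpy as np

def givens(a, b):
    r = np.hypot(a, b)                  # sqrt(a^2 + b^2) without overflow
    c, s = a / r, b / r
    return np.array([[c, s], [-s, c]])

G = givens(3.0, 4.0)
print(G @ np.array([3.0, 4.0]))         # [5., ~0.]
```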

## 126.17 Least Squares Problems

The least squares problem seeks

$$
x =
\arg\min_x \|Ax-b\|^2.
$$

The normal equations are

$$
A^TAx=A^Tb.
$$

If \(A\) has full column rank, the solution is unique.

However, solving normal equations may square the condition number:

$$
\kappa(A^TA)=\kappa(A)^2.
$$

Therefore QR factorization or SVD is often preferred.

Least squares problems appear throughout data fitting, regression, inverse problems, and machine learning.
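
The squaring of the condition number can be observed directly. The Python sketch below constructs a matrix with prescribed singular values (an illustrative choice) and compares \(\kappa(A)\) with \(\kappa(A^TA)\):

```python
import numpy as np

rng = np.random.default_rng(4)
U, _ = np.linalg.qr(rng.standard_normal((50, 3)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
A = U @ np.diag([1.0, 1e-4, 1e-8]) @ V.T   # built to have condition number about 1e8

print(np.linalg.cond(A))            # about 1e8
print(np.linalg.cond(A.T @ A))      # about 1e16, at the edge of double precision
```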

## 126.18 Singular Value Decomposition

The singular value decomposition is

$$
A=U\Sigma V^T.
$$

Here:

| Matrix | Property |
|---|---|
| \(U\) | Orthogonal |
| \(V\) | Orthogonal |
| \(\Sigma\) | Diagonal with nonnegative entries |

The diagonal entries

$$
\sigma_1\geq \sigma_2\geq \cdots
$$

are the singular values.

The SVD reveals:

| Quantity | From SVD |
|---|---|
| Rank | Number of nonzero singular values |
| Condition number | \(\sigma_1/\sigma_r\) |
| Best low-rank approximation | Truncated SVD |
| Null space | Right singular vectors for zero (or tiny) singular values |
| Principal directions | Singular vectors |

The SVD is one of the most important matrix decompositions in numerical computation.
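
A Python sketch of the best rank-\(k\) approximation from the truncated SVD (the test matrix is random, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # truncated SVD
print(np.linalg.norm(A - A_k, 2))   # spectral-norm error ...
print(s[k])                         # ... equals the (k+1)-th singular value
```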

## 126.19 Eigenvalue Problems

The eigenvalue problem seeks

$$
Av=\lambda v.
$$

Numerical eigenvalue algorithms include:

| Method | Main idea |
|---|---|
| Power iteration | Dominant eigenvector |
| Inverse iteration | Targeted eigenvalues |
| QR algorithm | Orthogonal similarity iteration |
| Lanczos | Sparse symmetric matrices |
| Arnoldi | General sparse matrices |

Eigenvalue problems arise in vibration analysis, quantum mechanics, PCA, graph analysis, stability analysis, and differential equations.

## 126.20 Power Iteration

The power method repeatedly multiplies by \(A\):

$$
x_{k+1}=\frac{Ax_k}{\|Ax_k\|}.
$$

Under suitable conditions, the vectors converge toward the dominant eigenvector.

The convergence rate depends on the spectral gap:

$$
\left|\frac{\lambda_2}{\lambda_1}\right|.
$$

Power iteration is simple and scalable for large sparse matrices.

It underlies PageRank and many large-scale spectral algorithms.
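
A Python sketch of power iteration with the Rayleigh quotient as the eigenvalue estimate (the diagonal test matrix and iteration count are illustrative):

```python
import numpy as np

def power_iteration(A, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)      # normalize to prevent overflow or underflow
    return x @ A @ x, x             # Rayleigh quotient approximates lambda_1

A = np.diag([5.0, 2.0, 1.0])
lam, v = power_iteration(A)
print(lam)                          # close to 5.0
print(np.abs(v))                    # close to [1, 0, 0]
```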

## 126.21 Krylov Subspaces

Iterative methods often work in Krylov subspaces.

The \(m\)-th Krylov subspace generated by \(A\) and \(b\) is

$$
\mathcal{K}_m(A,b) =
\operatorname{span}
\{b,Ab,A^2b,\ldots,A^{m-1}b\}.
$$

Krylov methods build approximate solutions from these spaces.

Examples include:

| Method | Matrix type |
|---|---|
| Conjugate gradient | Symmetric positive definite |
| GMRES | General nonsymmetric |
| MINRES | Symmetric indefinite |
| Lanczos | Eigenvalue problems |

These methods rely mainly on matrix-vector multiplication.
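
An orthonormal basis of \(\mathcal{K}_m(A,b)\) is usually built with the Arnoldi process rather than from the raw vectors \(A^kb\), which quickly become nearly parallel. A Python sketch follows (the function name is chosen for illustration; breakdown when a residual vanishes is not handled):

```python
import numpy as np

def arnoldi(A, b, m):
    n = b.size
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ Q[:, j]
        for i in range(j + 1):          # orthogonalize against previous basis vectors
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

rng = np.random.default_rng(6)
A = rng.standard_normal((50, 50))
b = rng.standard_normal(50)
Q, H = arnoldi(A, b, 10)
print(np.allclose(Q.T @ Q, np.eye(11), atol=1e-10))   # orthonormal basis of K_11(A, b)
```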

## 126.22 Conjugate Gradient Method

For symmetric positive definite matrices, the conjugate gradient method solves

$$
Ax=b
$$

iteratively.

The method minimizes the quadratic energy

$$
\frac12 x^TAx-b^Tx.
$$

The search directions are \(A\)-orthogonal:

$$
p_i^TAp_j=0
\quad \text{for } i\neq j.
$$

Conjugate gradient often converges much faster than worst-case bounds suggest.

Its efficiency depends strongly on the spectrum of \(A\).
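
A Python sketch using SciPy's `cg` on the 1-D Laplacian model problem (size and tolerances left at illustrative defaults):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

n = 1_000
A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = cg(A, b)                  # only matrix-vector products with A are needed
print(info)                         # 0 means the default tolerance was reached
print(np.linalg.norm(A @ x - b))    # small residual
```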

## 126.23 Preconditioning

Preconditioning improves iterative convergence.

Instead of solving

$$
Ax=b,
$$

solve an equivalent transformed system

$$
M^{-1}Ax=M^{-1}b.
$$

The matrix \(M\) approximates \(A\) but is easier to invert.

A good preconditioner reduces the effective condition number.

Common preconditioners include:

| Preconditioner | Idea |
|---|---|
| Jacobi | Diagonal scaling |
| Incomplete LU | Approximate factorization |
| Multigrid | Hierarchical correction |
| Domain decomposition | Subdomain solves |

Preconditioning is often the key to practical large-scale performance.
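
A Python sketch comparing CG with and without a Jacobi (diagonal) preconditioner; the test matrix, with a widely spread diagonal, is an illustrative construction:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg, LinearOperator

n = 1_000
main = np.linspace(1.0, 1e4, n)                        # widely spread diagonal
A = diags([main, -np.ones(n - 1), -np.ones(n - 1)], [0, -1, 1], format="csr")
b = np.ones(n)

M = LinearOperator((n, n), matvec=lambda r: r / main)  # apply M^{-1} = D^{-1}

plain, precond = [], []
x1, _ = cg(A, b, callback=lambda xk: plain.append(1))
x2, _ = cg(A, b, M=M, callback=lambda xk: precond.append(1))
print(len(plain), len(precond))     # the preconditioned run needs far fewer iterations
```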

## 126.24 Stability of Orthogonal Transformations

Orthogonal matrices preserve norms:

$$
\|Qx\|=\|x\|.
$$

This makes orthogonal transformations numerically stable.

Algorithms based on orthogonal transformations, such as QR factorization, tend to have excellent stability properties.

This is one reason QR methods are preferred over explicitly forming normal equations.

Orthogonality is extremely valuable numerically.

## 126.25 Matrix Norms

Matrix norms measure matrix size.

Common norms include:

| Norm | Definition |
|---|---|
| Spectral norm | Largest singular value |
| Frobenius norm | Square root of sum of squares |
| 1-norm | Maximum column sum |
| Infinity norm | Maximum row sum |

Norms are used in error bounds, conditioning, perturbation analysis, and convergence analysis.

Different norms emphasize different aspects of matrix behavior.

## 126.26 Perturbation Theory

Perturbation theory studies how small changes affect solutions.

Questions include:

| Problem | Perturbation question |
|---|---|
| Linear systems | How does \(x\) change if \(A\) changes? |
| Eigenvalues | How stable are eigenvalues? |
| Singular values | How sensitive are low-rank structures? |
| Least squares | How sensitive is the fit? |

Perturbation theory combines linear algebra with analysis and numerical stability.

It explains why some computations are inherently difficult.

## 126.27 Parallel and High-Performance Computation

Modern numerical linear algebra must exploit hardware efficiently.

Performance depends on:

| Factor | Importance |
|---|---|
| Cache locality | Memory efficiency |
| Parallelism | Multiple processors |
| GPU acceleration | Massive throughput |
| Communication cost | Distributed systems |
| Sparse access patterns | Memory bandwidth |

Large-scale computation often becomes limited by data movement rather than arithmetic itself.

This changes algorithm design.

## 126.28 Numerical Linear Algebra and Applications

Numerical linear algebra is foundational because many applications reduce to matrix problems.

| Area | Typical matrix problem |
|---|---|
| Machine learning | Least squares, SVD |
| Finite elements | Sparse linear systems |
| Graphics | Geometric transformations |
| Optimization | Newton systems |
| Signal processing | Spectral analysis |
| Networks | Eigenvectors |
| Statistics | Covariance factorizations |
| Quantum mechanics | Eigenvalue problems |
| PDEs | Sparse operators |

Efficient computation with matrices is therefore central across scientific and engineering disciplines.

## 126.29 Numerical Linear Algebra and Linear Algebra

The dictionary is direct.

| Numerical concept | Linear algebra object |
|---|---|
| Floating-point vector | Approximate vector |
| Stability | Error growth under operations |
| Conditioning | Sensitivity of matrix problems |
| LU factorization | Triangular decomposition |
| QR factorization | Orthogonal-triangular decomposition |
| Cholesky | Symmetric positive definite factorization |
| Iterative solver | Krylov subspace method |
| Sparse matrix | Graph-structured operator |
| SVD | Orthogonal spectral factorization |
| Preconditioner | Approximate inverse operator |
| Eigenvalue algorithm | Spectral computation |

Numerical linear algebra studies how linear algebra behaves under finite computation.

## 126.30 Summary

Numerical linear algebra develops practical algorithms for matrix computation under finite precision.

Floating-point arithmetic introduces rounding error, cancellation, and stability concerns. Conditioning measures sensitivity of problems. Stable algorithms aim to control error growth.

Linear systems are solved using LU, Cholesky, or iterative Krylov methods. Least squares problems use QR or SVD methods. Sparse matrices require specialized storage and algorithms. Eigenvalue problems use iterative spectral methods.

The central principle is approximation under finite precision. Numerical linear algebra studies how to compute meaningful answers efficiently and reliably even though arithmetic is inexact and matrices may be extremely large.
