
Chapter 16. Matrix Factorizations Overview

A matrix factorization writes a matrix as a product of simpler matrices. The factors are chosen so that they reveal structure or make computation easier. Common factorizations include LU, PLU, Cholesky, QR, Schur, eigenvalue decomposition, and singular value decomposition. These factorizations are used to solve linear systems, compute determinants, estimate rank, solve least squares problems, analyze geometry, and design stable numerical algorithms.

16.1 What a Factorization Is

A factorization has the form

$$A = BC$$

or, more generally,

$$A = B_1 B_2 \cdots B_k.$$

The matrix $A$ is replaced by factors with useful structure.

For example,

$$A = LU$$

writes $A$ as a product of a lower triangular matrix $L$ and an upper triangular matrix $U$.

This is useful because triangular systems are easy to solve. Instead of solving

$$Ax = b,$$

one solves

$$LUx = b.$$

Let

$$Ux = y.$$

Then solve

$$Ly = b$$

by forward substitution, and solve

$$Ux = y$$

by back substitution.
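Here is a minimal sketch of this two-step solve in NumPy/SciPy (assuming `scipy` is available; the factors $L$ and $U$ are the ones worked out for the small example in Section 16.4):

```python
import numpy as np
from scipy.linalg import solve_triangular

# Solve Ax = b through the factorization A = LU.
L = np.array([[1.0, 0.0],
              [3.0, 1.0]])
U = np.array([[2.0, 1.0],
              [0.0, 4.0]])
b = np.array([5.0, 27.0])

y = solve_triangular(L, b, lower=True)    # forward substitution: Ly = b
x = solve_triangular(U, y, lower=False)   # back substitution:    Ux = y

print(x)  # [1. 3.], the same result as np.linalg.solve(L @ U, b)
```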

16.2 Why Factorizations Matter

A factorization separates a hard problem into simpler problems.

Many matrix problems are expensive when handled directly. Factorization replaces a general matrix by structured factors. These factors may be triangular, diagonal, orthogonal, unitary, symmetric, or low rank.

The common purposes are:

PurposeFactorization examples
Solve linear systemsLU, PLU, Cholesky
Solve least squares problemsQR, SVD
Study eigenvaluesSchur, spectral decomposition
Estimate rankSVD, rank-revealing QR
Compute determinantsLU, Cholesky
Compress dataSVD, low-rank factorization
Improve numerical stabilityQR, SVD, pivoted factorizations

A factorization is therefore both algebraic and computational. It rewrites the matrix while preserving the problem.

16.3 Triangular Matrices as Building Blocks

Triangular matrices are central because triangular systems can be solved directly.

An upper triangular system has the form

$$Ux = b,$$

where

$$U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{bmatrix}.$$

The last equation gives $x_n$. Then the previous equation gives $x_{n-1}$, and so on.

A lower triangular system is solved in the opposite direction, using forward substitution.
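A minimal back-substitution routine can be written directly from this description (plain NumPy; the function name is illustrative, and $U$ is assumed square and nonsingular):

```python
import numpy as np

def back_substitution(U, b):
    """Solve Ux = b for a nonsingular upper triangular U, last row first."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Subtract the contributions of the already-computed unknowns,
        # then divide by the diagonal (pivot) entry.
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = np.array([[2.0, 1.0],
              [0.0, 4.0]])
print(back_substitution(U, np.array([5.0, 12.0])))  # [1. 3.]
```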

This explains why LU, PLU, and Cholesky factorizations are useful.

16.4 LU Factorization

An LU factorization writes a square matrix as

$$A = LU,$$

where $L$ is lower triangular and $U$ is upper triangular.

Usually $L$ is chosen with diagonal entries equal to $1$. Such an $L$ is called unit lower triangular.

LU factorization records Gaussian elimination as a matrix product. The entries of $L$ store the multipliers used during elimination, and $U$ is the resulting upper triangular matrix.

For example, if

$$A = \begin{bmatrix} 2 & 1 \\ 6 & 7 \end{bmatrix},$$

then eliminating the $6$ below the pivot $2$ uses multiplier $3$. We get

$$U = \begin{bmatrix} 2 & 1 \\ 0 & 4 \end{bmatrix}.$$

The corresponding lower triangular factor is

$$L = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}.$$

Indeed,

$$LU = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 0 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 6 & 7 \end{bmatrix}.$$
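The elimination itself can be sketched in a few lines (plain NumPy, without pivoting; the helper name is illustrative, and nonzero pivots are assumed):

```python
import numpy as np

def lu_no_pivot(A):
    """Doolittle LU without pivoting: returns L, U with A = L @ U."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    for k in range(n):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]      # store the multiplier
            U[i, k:] -= L[i, k] * U[k, k:]   # eliminate below the pivot
    return L, U

L, U = lu_no_pivot(np.array([[2.0, 1.0],
                             [6.0, 7.0]]))
print(L)  # [[1. 0.] [3. 1.]]
print(U)  # [[2. 1.] [0. 4.]]
```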

16.5 PLU Factorization

Some matrices require row swaps during elimination. In that case, the factorization is written as

$$PA = LU,$$

or equivalently

$$A = P^T LU,$$

where $P$ is a permutation matrix.

The matrix $P$ records the row swaps. The matrices $L$ and $U$ record the elimination after the swaps.

PLU factorization is more robust than plain LU because it handles zero or unsuitable pivot entries.

For example, the matrix

$$A = \begin{bmatrix} 0 & 1 \\ 2 & 3 \end{bmatrix}$$

cannot start elimination with the entry $0$ in the first pivot position. Swapping rows gives

$$PA = \begin{bmatrix} 2 & 3 \\ 0 & 1 \end{bmatrix}.$$

This is already upper triangular. The permutation matrix records the row swap.
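In practice a pivoted factorization can be computed with `scipy.linalg.lu`. Note the convention: SciPy returns factors with $A = PLU$, so its $P$ is the transpose of the $P$ in $PA = LU$ above.

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[0.0, 1.0],
              [2.0, 3.0]])

P, L, U = lu(A)    # SciPy convention: A = P @ L @ U
print(P)           # permutation matrix recording the row swap
print(L)           # unit lower triangular
print(U)           # upper triangular [[2. 3.] [0. 1.]]
print(np.allclose(P @ L @ U, A))  # True
```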

16.6 Cholesky Factorization

Cholesky factorization applies to real symmetric positive definite matrices and complex Hermitian positive definite matrices. In the real case, it writes

$$A = LL^T,$$

where $L$ is lower triangular with positive diagonal entries. In the complex case, the form is

$$A = LL^*,$$

where $L^*$ is the conjugate transpose. Cholesky is especially useful for solving systems with symmetric positive definite matrices and is more efficient than general LU when its assumptions hold.

For example,

$$\begin{bmatrix} 4 & 2 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 1 & \sqrt{2} \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 0 & \sqrt{2} \end{bmatrix}.$$

The matrix is decomposed into a triangular factor and its transpose.
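This example can be checked with NumPy, whose `np.linalg.cholesky` returns the lower triangular factor:

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

L = np.linalg.cholesky(A)   # lower triangular, A = L @ L.T
print(L)                    # [[2. 0.] [1. 1.41421356]]
print(np.allclose(L @ L.T, A))  # True
```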

Cholesky is common in optimization, statistics, least squares, covariance matrices, and numerical solutions of positive definite systems.

16.7 QR Factorization

QR factorization writes a matrix as

$$A = QR,$$

where $Q$ has orthonormal columns and $R$ is upper triangular.

For a square real matrix with full rank, $Q$ is orthogonal:

$$Q^T Q = I.$$

For a complex matrix, $Q$ is unitary:

$$Q^* Q = I.$$

QR factorization is closely related to Gram-Schmidt orthogonalization. It replaces the columns of $A$ by orthonormal directions and records the coefficients in $R$.

QR is especially important for least squares problems. It avoids forming the normal equations

$$A^T A x = A^T b,$$

which can worsen numerical conditioning.
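A minimal sketch of least squares via QR (using NumPy's reduced QR; the random problem here is only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # tall matrix with full column rank
b = rng.standard_normal(6)

Q, R = np.linalg.qr(A)            # reduced QR: Q is 6x3, R is 3x3

# The minimizer of ||Ax - b|| satisfies the triangular system Rx = Q^T b.
x = np.linalg.solve(R, Q.T @ b)

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```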

16.8 Singular Value Decomposition

The singular value decomposition, or SVD, writes an $m \times n$ matrix as

$$A = U \Sigma V^T$$

in the real case, or

$$A = U \Sigma V^*$$

in the complex case.

Here $U$ and $V$ are orthogonal or unitary matrices, and $\Sigma$ is diagonal except for possible extra zero rows or columns.

The diagonal entries of $\Sigma$ are the singular values:

$$\sigma_1, \sigma_2, \ldots.$$

They are usually ordered as

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge 0.$$

The SVD applies to every matrix. It is one of the most powerful factorizations because it reveals rank, null space, column space, conditioning, and best low-rank approximations. It is also useful for least squares problems, especially when the matrix is close to rank deficient.
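For instance, rank can be read off by counting singular values above a small tolerance (a sketch; the tolerance rule shown is one common choice, not the only one):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],    # a multiple of the first row
              [1.0, 0.0, 1.0]])

U, s, Vt = np.linalg.svd(A)
print(s)                          # the last singular value is ~0

tol = max(A.shape) * np.finfo(float).eps * s[0]
rank = int(np.sum(s > tol))
print(rank, np.linalg.matrix_rank(A))   # both 2
```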

16.9 Eigenvalue Decomposition

For some square matrices, one can write

$$A = PDP^{-1},$$

where $D$ is diagonal.

The columns of $P$ are eigenvectors of $A$, and the diagonal entries of $D$ are eigenvalues.

This factorization is possible when $A$ has enough linearly independent eigenvectors. In that case, $A$ is diagonalizable.

Diagonalization is useful because powers and functions of $A$ become easier:

$$A^k = PD^kP^{-1}.$$

Since $D$ is diagonal, $D^k$ is obtained by raising each diagonal entry to the $k$-th power.
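For example, a matrix power computed through diagonalization (a sketch assuming $A$ is diagonalizable; `np.linalg.eig` returns the eigenvalues and the eigenvector matrix $P$):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])        # eigenvalues 5 and 2

eigvals, P = np.linalg.eig(A)     # columns of P are eigenvectors

k = 5
Dk = np.diag(eigvals ** k)        # raise each eigenvalue to the k-th power
Ak = P @ Dk @ np.linalg.inv(P)

print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True
```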

16.10 Spectral Decomposition

For real symmetric matrices and complex Hermitian matrices, the eigenvalue decomposition has a stronger form.

In the real symmetric case,

$$A = Q\Lambda Q^T,$$

where $Q$ is orthogonal and $\Lambda$ is diagonal.

In the complex Hermitian case,

$$A = Q\Lambda Q^*,$$

where $Q$ is unitary and $\Lambda$ is real diagonal.

This is called spectral decomposition. It is a central result of linear algebra on inner product spaces. It says that symmetric or Hermitian matrices can be understood using orthonormal eigenvectors.

Spectral decomposition is used in quadratic forms, principal component analysis, differential equations, statistics, and optimization.
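For symmetric or Hermitian input, `np.linalg.eigh` computes exactly this decomposition, with real eigenvalues and orthonormal eigenvectors (a small sketch):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # symmetric

lam, Q = np.linalg.eigh(A)          # A = Q @ diag(lam) @ Q.T
print(lam)                          # [1. 3.], real eigenvalues
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))  # True
print(np.allclose(Q.T @ Q, np.eye(2)))         # Q is orthogonal
```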

16.11 Schur Decomposition

The Schur decomposition writes a square matrix as

$$A = QTQ^*,$$

where $Q$ is unitary and $T$ is upper triangular.

For real matrices, a real Schur form also exists, with $Q$ orthogonal and $T$ quasi-upper triangular.

Schur decomposition is more general than diagonalization. Every complex square matrix has a Schur decomposition, even when it is not diagonalizable.

The diagonal entries of $T$ are the eigenvalues of $A$. Thus the Schur form is a stable way to study eigenvalues in numerical linear algebra.
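A sketch with `scipy.linalg.schur`, applied to a matrix that is not diagonalizable and therefore has no eigenvalue decomposition:

```python
import numpy as np
from scipy.linalg import schur

# Eigenvalue 3 is repeated but has only one independent eigenvector,
# so A is not diagonalizable; its Schur form still exists.
A = np.array([[5.0, 4.0],
              [-1.0, 1.0]])

T, Q = schur(A, output='complex')          # A = Q @ T @ Q^*
print(np.diag(T))                          # both diagonal entries are 3
print(np.allclose(Q @ T @ Q.conj().T, A))  # True
```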

16.12 Polar Decomposition

The polar decomposition writes a matrix as

$$A = UP,$$

where $U$ is orthogonal or unitary in the square nonsingular case, and $P$ is positive semidefinite.

It is the matrix analogue of writing a complex number as

$$z = e^{i\theta} r.$$

The factor $U$ represents rotation or reflection. The factor $P$ represents stretching.

Polar decomposition is closely related to the SVD. If

$$A = U_1 \Sigma V^*$$

is an SVD, then

$$A = (U_1 V^*)(V \Sigma V^*).$$

The first factor is orthogonal or unitary, and the second is positive semidefinite.
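A minimal construction of the polar factors from this identity, for real square matrices (the helper name is illustrative; SciPy also provides a `scipy.linalg.polar` routine):

```python
import numpy as np

def polar(A):
    """Polar factors of a real square A: A = U @ P, U orthogonal, P PSD."""
    U1, s, Vt = np.linalg.svd(A)
    U = U1 @ Vt                          # orthogonal factor U_1 V^T
    P = Vt.T @ np.diag(s) @ Vt           # positive semidefinite factor
    return U, P

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
U, P = polar(A)
print(np.allclose(U @ P, A))             # True
print(np.allclose(U.T @ U, np.eye(2)))   # U is orthogonal
```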

16.13 Rank Factorization

If $A$ has rank $r$, then it can be written as

$$A = BC,$$

where $B$ is $m \times r$, $C$ is $r \times n$, and both have rank $r$.

This is called a rank factorization.

It separates the matrix into a map from $F^n$ to $F^r$, followed by a map from $F^r$ to $F^m$. It makes the effective dimension $r$ explicit.

For example, if the columns of $A$ lie in a two-dimensional subspace, then $A$ can be factored through $F^2$.
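One concrete way to build such a factorization is to truncate the SVD at the rank $r$ (a sketch; any other basis for the column space would work just as well):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 1.0, 1.0]])   # rank 2

U, s, Vt = np.linalg.svd(A)
r = np.linalg.matrix_rank(A)

B = U[:, :r] * s[:r]   # m x r: columns of U scaled by singular values
C = Vt[:r, :]          # r x n
print(B.shape, C.shape)         # (3, 2) (2, 3)
print(np.allclose(B @ C, A))    # True
```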

16.14 Low-Rank Approximation

A low-rank approximation replaces AA by a matrix of smaller rank.

The SVD gives the most important construction. If

$$A = U \Sigma V^T$$

and the singular values are ordered, then the best rank-$k$ approximation in both the spectral and Frobenius norms is obtained by keeping only the largest $k$ singular values (the Eckart-Young theorem).

The approximation has the form

$$A_k = U_k \Sigma_k V_k^T.$$

This is used in data compression, noise reduction, recommender systems, principal component analysis, and scientific computing.
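A minimal truncation sketch (the helper name is illustrative):

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation by truncating the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))
A2 = best_rank_k(A, 2)

print(np.linalg.matrix_rank(A2))   # 2
# In the spectral norm, the error equals the first discarded singular value.
s = np.linalg.svd(A, compute_uv=False)
print(np.isclose(np.linalg.norm(A - A2, 2), s[2]))  # True
```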

16.15 Factorizations and Solving Systems

Different factorizations solve different kinds of systems.

If

$$A = LU,$$

then

$$Ax = b$$

becomes

$$LUx = b.$$

Solve

$$Ly = b$$

then

$$Ux = y.$$

If

$$A = QR,$$

then least squares problems are simplified. The problem

$$\min_x \|Ax - b\|$$

becomes

$$\min_x \|QRx - b\|.$$

Since $Q$ preserves length, $\|QRx - b\|$ can be transformed into a triangular least squares problem involving $R$.

If

$$A = U \Sigma V^T,$$

then solving and approximating systems can be expressed in terms of singular values.
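A sketch of least squares through the SVD, using the pseudoinverse formula $x = V\Sigma^{+}U^T b$ (full column rank is assumed here, so every singular value can be inverted):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
x = Vt.T @ ((U.T @ b) / s)   # invert the singular values

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```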

16.16 Factorizations and Determinants

Factorizations can simplify determinant computation.

If

$$A = LU$$

and $L$ has diagonal entries all equal to $1$, then

$$\det(A) = \det(U).$$

Since $U$ is triangular, $\det(U)$ is the product of its diagonal entries.

If row swaps are used,

$$PA = LU.$$

Then

$$\det(P)\det(A) = \det(L)\det(U).$$

Since $\det(P)$ is $1$ or $-1$, this tracks the sign change from row swaps.

For Cholesky,

$$A = LL^T$$

gives

$$\det(A) = \det(L)^2.$$

Since $L$ is triangular,

$$\det(A) = \left(\prod_i l_{ii}\right)^2.$$
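The same bookkeeping in code, using SciPy's $A = PLU$ convention so that $\det(A) = \det(P)\prod_i u_{ii}$:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[0.0, 1.0],
              [2.0, 3.0]])

P, L, U = lu(A)                    # A = P @ L @ U, L has unit diagonal
sign = np.linalg.det(P)            # +1 or -1, the row-swap sign
det_A = sign * np.prod(np.diag(U))

print(det_A, np.linalg.det(A))     # both -2.0
```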

16.17 Factorizations and Geometry

Factorizations often separate geometric actions.

For example, the SVD writes

$$A = U \Sigma V^T.$$

This means that $A$ acts in three stages:

| Stage | Action |
| --- | --- |
| $V^T$ | Rotate or reflect the input coordinates |
| $\Sigma$ | Scale along coordinate axes |
| $U$ | Rotate or reflect the output coordinates |

Thus every real matrix acts like a coordinate rotation or reflection, followed by axis scaling, followed by another coordinate rotation or reflection, allowing for collapse along directions with zero singular values.

This gives a geometric explanation of rank, conditioning, and low-rank approximation.

16.18 Choosing a Factorization

The correct factorization depends on the matrix and the problem.

| Problem or matrix type | Common choice |
| --- | --- |
| General square system | LU or PLU |
| Symmetric positive definite system | Cholesky |
| Least squares | QR or SVD |
| Rank-deficient or nearly rank-deficient problem | SVD |
| Eigenvalue problem | Schur or spectral decomposition |
| Symmetric or Hermitian matrix | Spectral decomposition |
| Data compression | Truncated SVD |
| Repeated solves with same matrix | Factor once, solve many times (see the sketch below) |

The choice is guided by structure, stability, cost, and the information needed.
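The last row of the table deserves a sketch of its own: with `scipy.linalg.lu_factor`, the expensive factorization is done once and each new right-hand side reuses it.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 100))

lu_piv = lu_factor(A)              # factor once: O(n^3) work
for _ in range(5):                 # solve many times: O(n^2) each
    b = rng.standard_normal(100)
    x = lu_solve(lu_piv, b)
    assert np.allclose(A @ x, b)
```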

16.19 Exact and Numerical Factorizations

In exact algebra, a factorization is an identity.

In numerical computation, a computed factorization is approximate. Instead of

$$A = LU,$$

a computer may produce factors satisfying

$$A \approx LU.$$

The quality of a factorization is judged by error, stability, conditioning, and sensitivity to perturbations.

Pivoting is often used to improve numerical behavior. Orthogonal and unitary factors are valuable because they preserve norms and reduce error growth.

This is why QR and SVD are often preferred for least squares and rank-sensitive problems.

16.20 Common Mistakes

| Mistake | Correction |
| --- | --- |
| Assuming every matrix has an LU factorization without pivoting | Some matrices require row swaps |
| Using Cholesky for any symmetric matrix | Cholesky requires positive definiteness |
| Treating QR as only for square matrices | QR also applies to rectangular matrices |
| Assuming every square matrix is diagonalizable | Some square matrices lack enough eigenvectors |
| Confusing eigenvalues with singular values | Singular values are always nonnegative |
| Computing an inverse when a solve is enough | Use the factorization to solve triangular systems |
| Ignoring numerical stability | Pivoting, QR, and SVD often matter in floating point arithmetic |

16.21 Summary

A matrix factorization rewrites a matrix as a product of structured factors. The structure makes computation and theory easier.

The main factorizations are:

| Factorization | Form | Main use |
| --- | --- | --- |
| LU | $A = LU$ | Gaussian elimination, linear systems |
| PLU | $PA = LU$ | Linear systems with pivoting |
| Cholesky | $A = LL^T$ or $A = LL^*$ | Positive definite systems |
| QR | $A = QR$ | Least squares, orthogonalization |
| SVD | $A = U\Sigma V^*$ | Rank, conditioning, low-rank approximation |
| Eigenvalue | $A = PDP^{-1}$ | Diagonalizable square matrices |
| Spectral | $A = Q\Lambda Q^*$ | Hermitian or symmetric matrices |
| Schur | $A = QTQ^*$ | General square matrices and eigenvalues |
| Polar | $A = UP$ | Geometry of rotation and stretching |

Factorizations turn matrices into simpler pieces. They are the main bridge between linear algebra as a theory and linear algebra as a practical computational tool.