# Chapter 16. Matrix Factorizations Overview

A matrix factorization writes a matrix as a product of simpler matrices. The factors are chosen so that they reveal structure or make computation easier. Common factorizations include LU, PLU, Cholesky, QR, Schur, eigenvalue decomposition, and singular value decomposition. These factorizations are used to solve linear systems, compute determinants, estimate rank, solve least squares problems, analyze geometry, and design stable numerical algorithms.

## 16.1 What a Factorization Is

A factorization has the form

$$
A = BC
$$

or, more generally,

$$
A = B_1B_2\cdots B_k.
$$

The matrix \(A\) is replaced by factors with useful structure.

For example,

$$
A = LU
$$

writes \(A\) as a product of a lower triangular matrix \(L\) and an upper triangular matrix \(U\).

This is useful because triangular systems are easy to solve. Instead of solving

$$
Ax=b,
$$

one solves

$$
LUx=b.
$$

Let

$$
Ux=y.
$$

Then solve

$$
Ly=b
$$

by forward substitution, and solve

$$
Ux=y
$$

by back substitution.
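
As a minimal sketch of this two-stage solve, assuming NumPy and SciPy are available (the matrix and right-hand side are illustrative):

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.array([[2.0, 1.0],
              [6.0, 7.0]])
b = np.array([5.0, 9.0])

P, L, U = lu(A)                               # SciPy convention: A = P @ L @ U
y = solve_triangular(L, P.T @ b, lower=True)  # forward substitution: L y = P^T b
x = solve_triangular(U, y)                    # back substitution:    U x = y
print(np.allclose(A @ x, b))                  # True
```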

## 16.2 Why Factorizations Matter

A factorization separates a hard problem into simpler problems.

Many matrix problems are expensive when handled directly. Factorization replaces a general matrix by structured factors. These factors may be triangular, diagonal, orthogonal, unitary, symmetric, or low rank.

The common purposes are:

| Purpose | Factorization examples |
|---|---|
| Solve linear systems | LU, PLU, Cholesky |
| Solve least squares problems | QR, SVD |
| Study eigenvalues | Schur, spectral decomposition |
| Estimate rank | SVD, rank-revealing QR |
| Compute determinants | LU, Cholesky |
| Compress data | SVD, low-rank factorization |
| Improve numerical stability | QR, SVD, pivoted factorizations |

A factorization is therefore both algebraic and computational. It rewrites the matrix while preserving the problem.

## 16.3 Triangular Matrices as Building Blocks

Triangular matrices are central because triangular systems can be solved directly.

An upper triangular system has the form

$$
Ux=b,
$$

where

$$
U=
\begin{bmatrix}
u_{11}&u_{12}&\cdots&u_{1n}\\
0&u_{22}&\cdots&u_{2n}\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&u_{nn}
\end{bmatrix}.
$$

The last equation gives \(x_n\). Then the previous equation gives \(x_{n-1}\), and so on.

A lower triangular system is solved in the opposite direction, using forward substitution.
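
A minimal back-substitution sketch, assuming NumPy, that follows exactly this order (solve for \(x_n\) first, then work upward):

```python
import numpy as np

def back_substitution(U, b):
    """Solve U x = b for upper triangular U, working from the last row up."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Subtract the already-known components, then divide by the diagonal.
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = np.array([[2.0, 1.0],
              [0.0, 4.0]])
print(back_substitution(U, np.array([3.0, 8.0])))  # [0.5, 2.0]
```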

This explains why LU, PLU, and Cholesky factorizations are useful.

## 16.4 LU Factorization

An LU factorization writes a square matrix as

$$
A=LU,
$$

where \(L\) is lower triangular and \(U\) is upper triangular.

Usually \(L\) is chosen with diagonal entries equal to \(1\). Such an \(L\) is called unit lower triangular.

LU factorization records Gaussian elimination as a matrix product. The entries of \(L\) store the multipliers used during elimination, and \(U\) is the resulting upper triangular matrix.

For example, if

$$
A=
\begin{bmatrix}
2&1\\
6&7
\end{bmatrix},
$$

then eliminating the \(6\) below the pivot \(2\) uses multiplier \(3\). We get

$$
U=
\begin{bmatrix}
2&1\\
0&4
\end{bmatrix}.
$$

The corresponding lower triangular factor is

$$
L=
\begin{bmatrix}
1&0\\
3&1
\end{bmatrix}.
$$

Indeed,

$$
LU=
\begin{bmatrix}
1&0\\
3&1
\end{bmatrix}
\begin{bmatrix}
2&1\\
0&4
\end{bmatrix} =
\begin{bmatrix}
2&1\\
6&7
\end{bmatrix}.
$$
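
This elimination can be expressed as a short program. Below is a minimal Doolittle-style sketch without pivoting, assuming NumPy; it recovers the factors above:

```python
import numpy as np

def lu_no_pivot(A):
    """Doolittle LU without pivoting: A = L @ U, L unit lower triangular."""
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]     # multiplier stored in L
            U[i, k:] -= L[i, k] * U[k, k:]  # eliminate below the pivot
    return L, U

L, U = lu_no_pivot(np.array([[2, 1], [6, 7]]))
print(L)  # [[1, 0], [3, 1]]
print(U)  # [[2, 1], [0, 4]]
```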

## 16.5 PLU Factorization

Some matrices require row swaps during elimination. In that case, the factorization is written as

$$
PA=LU,
$$

or equivalently

$$
A=P^TLU,
$$

where \(P\) is a permutation matrix.

The matrix \(P\) records the row swaps. The matrices \(L\) and \(U\) record the elimination after the swaps.

PLU factorization is more robust than plain LU because it handles zero or dangerously small pivot entries.

For example, the matrix

$$
A=
\begin{bmatrix}
0&1\\
2&3
\end{bmatrix}
$$

cannot start elimination with the entry \(0\) in the first pivot position. Swapping rows gives

$$
PA=
\begin{bmatrix}
2&3\\
0&1
\end{bmatrix}.
$$

This is already upper triangular. The permutation matrix records the row swap.
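
A short sketch with SciPy, assuming it is installed. Note that scipy.linalg.lu returns factors with \(A = PLU\), so its \(P\) is the transpose of the \(P\) in \(PA = LU\) above:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[0.0, 1.0],
              [2.0, 3.0]])
P, L, U = lu(A)                   # SciPy convention: A = P @ L @ U
print(P)                          # [[0, 1], [1, 0]]: the two rows are swapped
print(U)                          # [[2, 3], [0, 1]]
print(np.allclose(P @ L @ U, A))  # True
```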

## 16.6 Cholesky Factorization

Cholesky factorization applies to real symmetric positive definite matrices and complex Hermitian positive definite matrices. In the real case, it writes

$$
A=LL^T,
$$

where \(L\) is lower triangular with positive diagonal entries. In the complex case, the form is

$$
A=LL^*,
$$

where \(L^*\) is the conjugate transpose. Cholesky is especially useful for solving systems with symmetric positive definite matrices: it requires roughly half the arithmetic of a general LU factorization when its assumptions hold.

For example,

$$
\begin{bmatrix}
4&2\\
2&3
\end{bmatrix} =
\begin{bmatrix}
2&0\\
1&\sqrt{2}
\end{bmatrix}
\begin{bmatrix}
2&1\\
0&\sqrt{2}
\end{bmatrix}.
$$

The matrix is decomposed into a triangular factor and its transpose.

Cholesky is common in optimization, statistics, least squares, covariance matrices, and numerical solutions of positive definite systems.
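
A minimal check of the worked example, assuming NumPy (np.linalg.cholesky returns the lower triangular factor):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
L = np.linalg.cholesky(A)       # lower triangular, A = L @ L.T
print(L)                        # [[2, 0], [1, sqrt(2)]]
print(np.allclose(L @ L.T, A))  # True
```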

## 16.7 QR Factorization

QR factorization writes a matrix as

$$
A=QR,
$$

where \(Q\) has orthonormal columns and \(R\) is upper triangular.

For a square real matrix with full rank, \(Q\) is orthogonal:

$$
Q^TQ=I.
$$

For a complex matrix, \(Q\) is unitary:

$$
Q^*Q=I.
$$

QR factorization is closely related to Gram-Schmidt orthogonalization. It replaces the columns of \(A\) by orthonormal directions and records the coefficients in \(R\).

QR is especially important for least squares problems. It avoids forming the normal equations

$$
A^TAx=A^Tb,
$$

which can worsen numerical conditioning.
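
A minimal least squares sketch via QR, assuming NumPy and SciPy; the overdetermined data are illustrative:

```python
import numpy as np
from scipy.linalg import solve_triangular

# Overdetermined system: fit a line to three points.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

Q, R = np.linalg.qr(A)            # thin QR: Q is 3x2 with orthonormal columns
x = solve_triangular(R, Q.T @ b)  # solve R x = Q^T b by back substitution
print(x)                          # agrees with np.linalg.lstsq(A, b, rcond=None)
```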

## 16.8 Singular Value Decomposition

The singular value decomposition, or SVD, writes an \(m\times n\) matrix as

$$
A=U\Sigma V^T
$$

in the real case, or

$$
A=U\Sigma V^*
$$

in the complex case.

Here \(U\) and \(V\) are orthogonal or unitary matrices, and \(\Sigma\) is an \(m\times n\) rectangular diagonal matrix: its only nonzero entries lie on the main diagonal.

The diagonal entries of \(\Sigma\) are the singular values:

$$
\sigma_1,\sigma_2,\ldots.
$$

They are usually ordered as

$$
\sigma_1\ge \sigma_2\ge \cdots \ge 0.
$$

The SVD applies to every matrix. It is one of the most powerful factorizations because it reveals rank, null space, column space, conditioning, and best low-rank approximations. It is also useful for least squares problems, especially when the matrix is close to rank deficient.
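
A short sketch, assuming NumPy, showing how the singular values expose rank (the matrix and tolerance are illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])          # rank 1: second column = 2 * first column
U, s, Vt = np.linalg.svd(A)         # A = U @ Sigma @ Vt
print(s)                            # second singular value is essentially 0
rank = np.sum(s > 1e-12 * s[0])     # count singular values above a tolerance
print(rank)                         # 1
```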

## 16.9 Eigenvalue Decomposition

For some square matrices, one can write

$$
A=PDP^{-1},
$$

where \(D\) is diagonal.

The columns of \(P\) are eigenvectors of \(A\), and the diagonal entries of \(D\) are eigenvalues.

This factorization is possible when \(A\) has enough linearly independent eigenvectors. In that case, \(A\) is diagonalizable.

Diagonalization is useful because powers and functions of \(A\) become easier:

$$
A^k=PD^kP^{-1}.
$$

Since \(D\) is diagonal, \(D^k\) is obtained by raising each diagonal entry to the \(k\)-th power.
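
A minimal sketch, assuming NumPy (the matrix is illustrative and happens to be diagonalizable):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
evals, P = np.linalg.eig(A)     # columns of P are eigenvectors of A

k = 5
Ak = P @ np.diag(evals**k) @ np.linalg.inv(P)  # A^k = P D^k P^{-1}
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True
```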

## 16.10 Spectral Decomposition

For real symmetric matrices and complex Hermitian matrices, the eigenvalue decomposition has a stronger form.

In the real symmetric case,

$$
A=Q\Lambda Q^T,
$$

where \(Q\) is orthogonal and \(\Lambda\) is diagonal.

In the complex Hermitian case,

$$
A=Q\Lambda Q^*,
$$

where \(Q\) is unitary and \(\Lambda\) is real diagonal.

This is called the spectral decomposition. It is a central result of linear algebra on inner product spaces: symmetric and Hermitian matrices can be fully understood through orthonormal eigenvectors and real eigenvalues.

Spectral decomposition is used in quadratic forms, principal component analysis, differential equations, statistics, and optimization.
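
A minimal sketch with NumPy's symmetric eigensolver (np.linalg.eigh), which returns real eigenvalues and orthonormal eigenvectors:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                      # symmetric
lam, Q = np.linalg.eigh(A)                      # real eigenvalues, ascending order
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))   # A = Q Lambda Q^T: True
print(np.allclose(Q.T @ Q, np.eye(2)))          # eigenvectors are orthonormal: True
```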

## 16.11 Schur Decomposition

The Schur decomposition writes a square matrix as

$$
A=QTQ^*,
$$

where \(Q\) is unitary and \(T\) is upper triangular.

For real matrices, a real Schur form also exists, with \(Q\) orthogonal and \(T\) quasi-upper triangular: block upper triangular with \(1\times 1\) and \(2\times 2\) diagonal blocks.

Schur decomposition is more general than diagonalization. Every complex square matrix has a Schur decomposition, even when it is not diagonalizable.

The diagonal entries of \(T\) are the eigenvalues of \(A\). Thus Schur form is a stable way to study eigenvalues in numerical linear algebra.
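
A short sketch using SciPy, assuming it is installed; the rotation matrix is illustrative and has no real eigenvalues:

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])                  # 90-degree rotation
T, Q = schur(A, output='complex')            # A = Q @ T @ Q^*, T upper triangular
print(np.diag(T))                            # eigenvalues +i and -i on the diagonal
print(np.allclose(Q @ T @ Q.conj().T, A))    # True
```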

## 16.12 Polar Decomposition

The polar decomposition writes a matrix as

$$
A=UP,
$$

where \(U\) is orthogonal or unitary and \(P\) is positive semidefinite. When \(A\) is square and nonsingular, \(P\) is positive definite and the factors are unique.

It is the matrix analogue of writing a complex number as

$$
z=e^{i\theta}r.
$$

The factor \(U\) represents rotation or reflection. The factor \(P\) represents stretching.

Polar decomposition is closely related to the SVD. If

$$
A=U_1\Sigma V^*
$$

is an SVD, then

$$
A=(U_1V^*)(V\Sigma V^*).
$$

The first factor is orthogonal or unitary, and the second is positive semidefinite.
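
A minimal sketch of this construction, assuming NumPy (SciPy also provides scipy.linalg.polar directly):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
U1, s, Vt = np.linalg.svd(A)
U = U1 @ Vt                             # orthogonal factor
P = Vt.T @ np.diag(s) @ Vt              # positive semidefinite factor
print(np.allclose(U @ P, A))            # True
print(np.allclose(U.T @ U, np.eye(2)))  # U is orthogonal
```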

## 16.13 Rank Factorization

If \(A\) has rank \(r\), then it can be written as

$$
A=BC,
$$

where \(B\) is \(m\times r\), \(C\) is \(r\times n\), and both have rank \(r\).

This is called a rank factorization.

It separates the matrix into a map from \(F^n\) to \(F^r\), followed by a map from \(F^r\) to \(F^m\). It makes the effective dimension \(r\) explicit.

For example, if the columns of \(A\) lie in a two-dimensional subspace, then \(A\) can be factored through \(F^2\).
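
One way to produce a rank factorization in practice is to truncate an SVD at the numerical rank; a minimal sketch, assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 1.0, 2.0]])   # rank 2: second row = 2 * first row

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-12 * s[0])      # numerical rank
B = U[:, :r] * s[:r]              # m x r, columns scaled by singular values
C = Vt[:r, :]                     # r x n
print(np.allclose(B @ C, A))      # True
```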

## 16.14 Low-Rank Approximation

A low-rank approximation replaces \(A\) by a matrix of smaller rank.

The SVD gives the most important construction. If

$$
A=U\Sigma V^T
$$

and the singular values are ordered, then the best rank-\(k\) approximation in both the spectral norm and the Frobenius norm is obtained by keeping only the \(k\) largest singular values. This is the Eckart-Young theorem.

The approximation has the form

$$
A_k=U_k\Sigma_kV_k^T.
$$

This is used in data compression, noise reduction, recommender systems, principal component analysis, and scientific computing.
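
A minimal truncation sketch, assuming NumPy; the random test matrix is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))  # 6x5, rank <= 4

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation

# Eckart-Young: the spectral-norm error equals the first discarded singular value.
print(np.linalg.norm(A - A_k, 2), s[k])       # the two numbers agree
```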

## 16.15 Factorizations and Solving Systems

Different factorizations solve different kinds of systems.

If

$$
A=LU,
$$

then

$$
Ax=b
$$

becomes

$$
LUx=b.
$$

Solve

$$
Ly=b
$$

then

$$
Ux=y.
$$

If

$$
A=QR,
$$

then least squares problems are simplified. The problem

$$
\min_x \|Ax-b\|
$$

becomes

$$
\min_x \|QRx-b\|.
$$

Since \(Q\) preserves length,

$$
\|QRx-b\|
$$

can be transformed into a triangular least squares problem involving \(R\).

If

$$
A=U\Sigma V^T,
$$

then solving and approximating systems can be expressed in terms of singular values.
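
For completeness, a minimal SVD-based least squares sketch, assuming NumPy (for a full-rank \(A\); the data are illustrative):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
x = Vt.T @ ((U.T @ b) / s)      # x = V Sigma^{-1} U^T b
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```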

## 16.16 Factorizations and Determinants

Factorizations can simplify determinant computation.

If

$$
A=LU
$$

and \(L\) has diagonal entries all equal to \(1\), then

$$
\det(A)=\det(U).
$$

Since \(U\) is triangular,

$$
\det(U)
$$

is the product of its diagonal entries.

If row swaps are used,

$$
PA=LU.
$$

Then

$$
\det(P)\det(A)=\det(L)\det(U).
$$

Since \(\det(P)\) is \(1\) or \(-1\), this tracks the sign change from row swaps.

For Cholesky,

$$
A=LL^T
$$

gives

$$
\det(A)=\det(L)^2.
$$

Since \(L\) is triangular,

$$
\det(A)=\left(\prod_i l_{ii}\right)^2.
$$
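
A short determinant sketch via the pivoted factorization, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[0.0, 1.0],
              [2.0, 3.0]])
P, L, U = lu(A)                      # A = P @ L @ U
sign = np.linalg.det(P)              # +1 or -1 from the row swaps
det_A = sign * np.prod(np.diag(U))   # L is unit triangular, so det(L) = 1
print(det_A, np.linalg.det(A))       # both -2.0
```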

## 16.17 Factorizations and Geometry

Factorizations often separate geometric actions.

For example, the SVD writes

$$
A=U\Sigma V^T.
$$

This means that \(A\) acts in three stages:

| Stage | Action |
|---|---|
| \(V^T\) | Rotate or reflect the input coordinates |
| \(\Sigma\) | Scale along coordinate axes |
| \(U\) | Rotate or reflect the output coordinates |

Thus every real matrix acts like a coordinate rotation (or reflection), followed by axis scaling, followed by another coordinate rotation, with collapse along any directions whose singular values are zero.

This gives a geometric explanation of rank, conditioning, and low-rank approximation.

## 16.18 Choosing a Factorization

The correct factorization depends on the matrix and the problem.

| Problem or matrix type | Common choice |
|---|---|
| General square system | LU or PLU |
| Symmetric positive definite system | Cholesky |
| Least squares | QR or SVD |
| Rank-deficient or nearly rank-deficient problem | SVD |
| Eigenvalue problem | Schur or spectral decomposition |
| Symmetric or Hermitian matrix | Spectral decomposition |
| Data compression | Truncated SVD |
| Repeated solves with same matrix | Factor once, solve many times |

The choice is guided by structure, stability, cost, and the information needed.

## 16.19 Exact and Numerical Factorizations

In exact algebra, a factorization is an identity.

In numerical computation, a computed factorization is approximate. Instead of

$$
A=LU,
$$

a computer may produce factors satisfying

$$
A \approx LU.
$$

The quality of a factorization is judged by error, stability, conditioning, and sensitivity to perturbations.

Pivoting is often used to improve numerical behavior. Orthogonal and unitary factors are valuable because they preserve norms and reduce error growth.

This is why QR and SVD are often preferred for least squares and rank-sensitive problems.
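
The gap between the exact identity and the computed factors can be measured directly; a minimal residual check on an illustrative random matrix, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 200))
P, L, U = lu(A)
rel_residual = np.linalg.norm(A - P @ L @ U) / np.linalg.norm(A)
print(rel_residual)   # tiny but nonzero: the computed factors only approximate A
```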

## 16.20 Common Mistakes

| Mistake | Correction |
|---|---|
| Assuming every matrix has an LU factorization without pivoting | Some matrices require row swaps |
| Using Cholesky for any symmetric matrix | Cholesky requires positive definiteness |
| Treating QR as only for square matrices | QR also applies to rectangular matrices |
| Assuming every square matrix is diagonalizable | Some square matrices lack enough eigenvectors |
| Confusing eigenvalues with singular values | Singular values are always nonnegative |
| Computing an inverse when a solve is enough | Use the factorization to solve triangular systems |
| Ignoring numerical stability | Pivoting, QR, and SVD often matter in floating point arithmetic |

## 16.21 Summary

A matrix factorization rewrites a matrix as a product of structured factors. The structure makes computation and theory easier.

The main factorizations are:

| Factorization | Form | Main use |
|---|---|---|
| LU | \(A=LU\) | Gaussian elimination, linear systems |
| PLU | \(PA=LU\) | Linear systems with pivoting |
| Cholesky | \(A=LL^T\) or \(A=LL^*\) | Positive definite systems |
| QR | \(A=QR\) | Least squares, orthogonalization |
| SVD | \(A=U\Sigma V^*\) | Rank, conditioning, low-rank approximation |
| Eigenvalue | \(A=PDP^{-1}\) | Diagonalizable square matrices |
| Spectral | \(A=Q\Lambda Q^*\) | Hermitian or symmetric matrices |
| Schur | \(A=QTQ^*\) | General square matrices and eigenvalues |
| Polar | \(A=UP\) | Geometry of rotation and stretching |

Factorizations turn matrices into simpler pieces. They are the main bridge between linear algebra as a theory and linear algebra as a practical computational tool.
