
Chapter 80. Singular Value Decomposition

Singular value decomposition is a factorization that applies to every real or complex matrix, square or rectangular. It is one of the most important decompositions in linear algebra because it exposes the rank, range, null space, conditioning, and best low-rank approximations of a matrix.

For a real m×n matrix A, the singular value decomposition has the form

A = U\Sigma V^T.

Here U is an m×m orthogonal matrix, V is an n×n orthogonal matrix, and Σ is an m×n rectangular diagonal matrix with nonnegative entries on the diagonal. For complex matrices, the form is A = UΣV*, where U and V are unitary. The nonnegative diagonal entries of Σ are the singular values of A. This decomposition exists for every real or complex matrix.

80.1 The Factorization

Let

A \in \mathbb{R}^{m \times n}.

A singular value decomposition of A is

A = U\Sigma V^T,

where

U^TU = I_m, \qquad V^TV = I_n.

The matrix Σ has the form

\Sigma = \begin{bmatrix} \sigma_1 & 0 & 0 & \cdots \\ 0 & \sigma_2 & 0 & \cdots \\ 0 & 0 & \sigma_3 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix},

with possible extra zero rows or columns depending on the shape of A. The diagonal entries satisfy

\sigma_1 \ge \sigma_2 \ge \cdots \ge 0.

The numbers

\sigma_1, \sigma_2, \ldots

are the singular values of A.

The columns of U are called left singular vectors. The columns of V are called right singular vectors. The factor Σ records how much each right singular direction is stretched into the corresponding left singular direction.

80.2 Geometric Meaning

The SVD describes the action of A as three simpler operations:

A = U\Sigma V^T.

Applied to a vector x, this gives

Ax = U\Sigma V^T x.

The operation proceeds in three stages.

First, V^T changes coordinates by an orthogonal transformation. This rotates or reflects the input space without changing lengths.

Second, Σ scales the coordinate axes by the singular values. It may also change dimension, since A can be rectangular.

Third, U changes coordinates in the output space by another orthogonal transformation.

Thus the SVD says that every linear map can be understood as an orthogonal change of input coordinates, followed by axis-aligned scaling, followed by an orthogonal change of output coordinates. This geometric interpretation is one reason SVD is so useful.

80.3 Singular Vectors

Write

V = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}, \qquad U = \begin{bmatrix} u_1 & u_2 & \cdots & u_m \end{bmatrix}.

The SVD implies

Av_i = \sigma_i u_i

for each singular value σ_i. For nonzero singular values,

u_i = \frac{Av_i}{\sigma_i}.

The right singular vector v_i is an input direction. The left singular vector u_i is the corresponding output direction. The singular value σ_i is the stretching factor.

If σ_i = 0, then

Av_i = 0.

Thus right singular vectors associated with zero singular values lie in the null space of A.
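
The relation Av_i = σ_i u_i is easy to check numerically. The following is a minimal sketch, assuming NumPy; the 4×3 matrix is an arbitrary example chosen here, not one from the text.

    import numpy as np

    # An arbitrary 4x3 example matrix (illustrative choice).
    A = np.array([[2.0, 0.0, 1.0],
                  [0.0, 3.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 1.0, 2.0]])

    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD: U is 4x3, Vt is 3x3

    for i, sigma in enumerate(s):
        v_i = Vt[i]       # i-th right singular vector (row of V^T)
        u_i = U[:, i]     # i-th left singular vector
        # A v_i should equal sigma_i u_i up to rounding error.
        print(i, np.allclose(A @ v_i, sigma * u_i))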

80.4 Relation to Eigenvalues

The SVD is closely related to the eigenvalue decomposition of the symmetric matrices

A^TA

and

AA^T.

Starting from

A = U\Sigma V^T,

we compute

A^TA = V\Sigma^TU^TU\Sigma V^T = V\Sigma^T\Sigma V^T.

Thus the columns of V are eigenvectors of A^T A.

Similarly,

AA^T = U\Sigma V^TV\Sigma^TU^T = U\Sigma\Sigma^TU^T.

Thus the columns of U are eigenvectors of AA^T.

The nonzero eigenvalues of A^T A and AA^T are

\sigma_i^2.

Therefore the singular values of A are the square roots of the eigenvalues of A^T A. This connection is a standard way to understand why singular values are always nonnegative.
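
A quick numerical check of this relation, assuming NumPy and an arbitrary example matrix: the eigenvalues of A^T A should match the squared singular values of A.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])              # arbitrary 3x2 example

    s = np.linalg.svd(A, compute_uv=False)       # singular values, descending
    evals = np.linalg.eigvalsh(A.T @ A)[::-1]    # eigenvalues of A^T A, descending

    print(np.allclose(s**2, evals))              # True: sigma_i^2 are the eigenvalues
    print(np.allclose(s, np.sqrt(evals)))        # True: sigma_i = sqrt(lambda_i)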

80.5 Full SVD

The full SVD writes

A = U\Sigma V^T,

where U is m×m, Σ is m×n, and V is n×n.

If m > n, then Σ has extra zero rows:

\Sigma = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \end{bmatrix}.

If m < n, then Σ has extra zero columns.

The full SVD includes orthonormal bases for all four fundamental subspaces: the range and null space of A and of A^T. It is complete, but often larger than needed in computation.

80.6 Reduced SVD

Suppose A has rank r. Let

\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0

be the positive singular values.

The reduced SVD writes

A = U_r \Sigma_r V_r^T,

where U_r is m×r, Σ_r is r×r, and V_r is n×r.

Here

\Sigma_r = \operatorname{diag}(\sigma_1, \ldots, \sigma_r).

The reduced SVD contains exactly the parts of the decomposition that contribute to A. It omits singular vectors associated with zero singular values. In many applications this is the natural form.

80.7 Thin SVD

If A is m×n, the thin SVD keeps k = min(m, n) singular directions:

A = U_k \Sigma_k V_k^T.

This form is also called the economy-sized SVD. It saves storage when m and n are very different. In applications, reduced and thin SVDs are often preferred over the full SVD because the full orthogonal bases may contain unused directions.
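
In NumPy the difference between the full and thin forms is controlled by the full_matrices flag; a reduced SVD is then obtained by discarding columns beyond the rank. A small sketch with an arbitrary tall matrix:

    import numpy as np

    A = np.random.default_rng(0).standard_normal((6, 3))   # tall example, m=6, n=3

    # Full SVD: U is 6x6, the three singular values describe the 6x3 Sigma, V is 3x3.
    U_full, s, Vt = np.linalg.svd(A, full_matrices=True)
    print(U_full.shape, s.shape, Vt.shape)       # (6, 6) (3,) (3, 3)

    # Thin (economy) SVD: U is 6x3, keeping only min(m, n) = 3 directions.
    U_thin, s, Vt = np.linalg.svd(A, full_matrices=False)
    print(U_thin.shape, s.shape, Vt.shape)       # (6, 3) (3,) (3, 3)

    # Both reconstruct A; for a full-rank A the thin form is already the reduced SVD.
    print(np.allclose(U_thin @ np.diag(s) @ Vt, A))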

80.8 A Two by Two Example

Let

A = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}.

This matrix is already diagonal with nonnegative entries. Its SVD is

A = U\Sigma V^T

with

U = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}, \qquad V = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

The singular values are

\sigma_1 = 3, \qquad \sigma_2 = 2.

The first coordinate direction is stretched by 3. The second coordinate direction is stretched by 2.

This example is simple because the input and output axes already coincide with the singular directions.

80.9 A Non-Diagonal Example

Let

A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.

Compute

A^TA = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}^T \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}.

Thus both eigenvalues of A^T A are 2. Therefore both singular values are

\sigma_1 = \sigma_2 = \sqrt{2}.

In fact,

A^TA = 2I

means that A multiplies the length of every vector by √2 and preserves angles. Hence

A = \sqrt{2}Q

where

Q = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}

is orthogonal.

One valid SVD is

A = Q \begin{bmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \end{bmatrix} I^T.

Because the singular values are equal, the singular vectors are not unique.

80.10 Rank from Singular Values

The rank of A is the number of nonzero singular values:

\operatorname{rank}(A) = \#\{i : \sigma_i > 0\}.

This follows from the reduced SVD

A = U_r\Sigma_rV_r^T.

The diagonal matrix Σ_r has exactly r positive diagonal entries, and the factors U_r and V_r have orthonormal, hence linearly independent, columns, so they do not change the rank.

In numerical computation, singular values also give a practical way to estimate numerical rank. If some singular values are extremely small compared with the largest singular value, the matrix behaves like a lower-rank matrix under finite precision arithmetic.
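
One common way to estimate numerical rank is to count singular values above a tolerance scaled by the largest one. The sketch below, assuming NumPy and an exactly rank-2 example matrix of my own choosing, compares this count with np.linalg.matrix_rank.

    import numpy as np

    # Exactly rank-2 example: the third row is the sum of the first two.
    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [5.0, 7.0, 9.0]])

    s = np.linalg.svd(A, compute_uv=False)
    tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]   # tolerance scaled by the largest singular value

    print(s)                           # the last singular value is at roundoff level
    print(int(np.sum(s > tol)))        # numerical rank, expected to be 2 here
    print(np.linalg.matrix_rank(A))    # NumPy's built-in estimate for comparison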

80.11 Range and Null Space

The SVD gives orthonormal bases for the fundamental subspaces of A.

If

A = U\Sigma V^T

and the rank is r, then

\operatorname{range}(A) = \operatorname{span}(u_1, \ldots, u_r).

The null space is

\mathcal{N}(A) = \operatorname{span}(v_{r+1}, \ldots, v_n).

The row space is

\operatorname{range}(A^T) = \operatorname{span}(v_1, \ldots, v_r).

The left null space is

\mathcal{N}(A^T) = \operatorname{span}(u_{r+1}, \ldots, u_m).

Thus SVD organizes all four fundamental subspaces in one decomposition.
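
Given a full SVD and the numerical rank r, the columns of U and V split into orthonormal bases for the four subspaces. A minimal sketch, assuming NumPy and a rank-2 example matrix chosen here for illustration:

    import numpy as np

    # Rank-2 example with m=4 rows and n=3 columns (third column = first + second).
    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 2.0],
                  [2.0, 1.0, 3.0]])

    U, s, Vt = np.linalg.svd(A)                       # full SVD: U is 4x4, Vt is 3x3
    tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
    r = int(np.sum(s > tol))                          # numerical rank (2 here)

    col_space  = U[:, :r]       # orthonormal basis of range(A)
    left_null  = U[:, r:]       # orthonormal basis of N(A^T)
    row_space  = Vt[:r].T       # orthonormal basis of range(A^T)
    null_space = Vt[r:].T       # orthonormal basis of N(A)

    print(np.allclose(A @ null_space, 0))      # A maps its null space to zero
    print(np.allclose(left_null.T @ A, 0))     # left null space is orthogonal to range(A)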

80.12 Norms from Singular Values

The largest singular value gives the operator norm induced by the Euclidean norm:

\|A\|_2 = \sigma_1.

This means

\sigma_1 = \max_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_2}.

The Frobenius norm is determined by all singular values:

\|A\|_F = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2}.

The singular values therefore measure how strongly A stretches vectors overall and by direction.
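
These identities are easy to confirm numerically; a minimal check, assuming NumPy and an arbitrary random matrix:

    import numpy as np

    A = np.random.default_rng(1).standard_normal((5, 3))
    s = np.linalg.svd(A, compute_uv=False)

    print(np.isclose(np.linalg.norm(A, 2), s[0]))                         # spectral norm = largest singular value
    print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))    # Frobenius norm from all singular values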

80.13 Condition Number

For a full-rank square matrix, the condition number in the Euclidean norm is

\kappa_2(A) = \frac{\sigma_{\max}}{\sigma_{\min}}.

Using ordered singular values, this is

\kappa_2(A) = \frac{\sigma_1}{\sigma_n}.

A condition number close to 11 means the matrix stretches all directions by similar amounts. A large condition number means some directions are stretched much more than others. Such a matrix is close to singular and may produce sensitive solutions in linear systems.

If σ_n = 0, then the matrix is singular and

\kappa_2(A) = \infty.

The condition number is one of the most important numerical quantities derived from the SVD.
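
A small sketch, assuming NumPy and a nearly singular example matrix of my own choosing, comparing the ratio σ_1/σ_n with np.linalg.cond:

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 1.0001]])       # nearly singular 2x2 example

    s = np.linalg.svd(A, compute_uv=False)
    kappa = s[0] / s[-1]                 # sigma_max / sigma_min

    print(kappa)                                      # large: the matrix is close to singular
    print(np.isclose(kappa, np.linalg.cond(A, 2)))    # matches NumPy's 2-norm condition number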

80.14 Solving Linear Systems

Suppose A is square and invertible, with SVD

A = U\Sigma V^T.

Then

A^{-1} = V\Sigma^{-1}U^T.

Therefore the solution of

Ax = b

is

x = V\Sigma^{-1}U^Tb.

Since Σ is diagonal,

\Sigma^{-1} = \operatorname{diag}\left( \frac{1}{\sigma_1}, \ldots, \frac{1}{\sigma_n} \right).

This formula shows why small singular values are dangerous. If σ_i is tiny, then 1/σ_i is large, so noise in the corresponding direction is amplified.
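
A sketch of this solve, assuming NumPy and an arbitrary invertible example matrix; the comparison against np.linalg.solve is just a sanity check.

    import numpy as np

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    b = np.array([1.0, 2.0, 3.0])

    U, s, Vt = np.linalg.svd(A)

    # x = V Sigma^{-1} U^T b, applied factor by factor; dividing by a tiny s[i]
    # is exactly where noise in that direction would be amplified.
    x = Vt.T @ ((U.T @ b) / s)

    print(np.allclose(A @ x, b))                     # residual at roundoff level
    print(np.allclose(x, np.linalg.solve(A, b)))     # agrees with a direct solver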

80.15 The Moore-Penrose Pseudoinverse

For a rectangular or rank-deficient matrix, the SVD gives the Moore-Penrose pseudoinverse.

If

A = U\Sigma V^T,

then

A^+ = V\Sigma^+U^T,

where Σ^+ is formed by replacing each positive singular value σ_i by 1/σ_i, then transposing the rectangular diagonal shape.

If

\Sigma = \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \\ 0 & 0 \end{bmatrix},

then

\Sigma^+ = \begin{bmatrix} 1/\sigma_1 & 0 & 0 \\ 0 & 1/\sigma_2 & 0 \end{bmatrix}.

The pseudoinverse is used to solve least squares problems, minimum norm problems, and rank-deficient systems.
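
A minimal construction of A^+ from the SVD, assuming NumPy and the 3×2 example shape used above; np.linalg.pinv does the same thing, with a tolerance deciding which singular values count as zero.

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [0.0, 2.0],
                  [0.0, 0.0]])          # 3x2 example, rank 2

    U, s, Vt = np.linalg.svd(A, full_matrices=True)

    # Build Sigma^+ : invert the positive singular values and transpose the shape (2x3 here).
    tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
    s_plus = np.where(s > tol, 1.0 / np.maximum(s, tol), 0.0)
    Sigma_plus = np.zeros((A.shape[1], A.shape[0]))
    Sigma_plus[:len(s), :len(s)] = np.diag(s_plus)

    A_plus = Vt.T @ Sigma_plus @ U.T
    print(np.allclose(A_plus, np.linalg.pinv(A)))    # matches NumPy's pseudoinverse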

80.16 Least Squares by SVD

For the least squares problem

\min_x \|Ax - b\|_2,

the SVD gives a complete solution.

If

A = U\Sigma V^T,

then

Ax - b = U\Sigma V^Tx - b.

Multiplying by U^T preserves the Euclidean norm, so the problem becomes

\min_x \|\Sigma V^Tx - U^Tb\|_2.

Let

z = V^Tx.

Then

\min_z \|\Sigma z - U^Tb\|_2.

Because Σ is diagonal, this problem separates into scalar equations. The minimum-norm least squares solution is

x = A^+b.

SVD is more expensive than QR for many least squares problems, but it gives the most reliable information when the matrix is rank deficient or nearly rank deficient.
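
A sketch of the minimum-norm least squares solution via the pseudoinverse, assuming NumPy and a rank-deficient example matrix chosen here; np.linalg.lstsq is the SVD-based library routine for the same problem.

    import numpy as np

    # Overdetermined, rank-deficient example: the second column equals the first.
    A = np.array([[1.0, 1.0],
                  [1.0, 1.0],
                  [2.0, 2.0]])
    b = np.array([1.0, 2.0, 2.0])

    x = np.linalg.pinv(A) @ b                        # x = A^+ b, minimum-norm least squares solution
    x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)    # SVD-based library solver

    print(x)
    print(np.allclose(x, x_ref))                     # both give the same minimum-norm solution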

80.17 Low-Rank Approximation

The SVD writes a matrix as a sum of rank-one matrices:

A = \sum_{i=1}^{r} \sigma_i u_i v_i^T.

The best rank-k approximation is obtained by keeping the first k terms:

A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T.

Equivalently,

A_k = U_k\Sigma_kV_k^T.

The Eckart-Young theorem states that this truncated SVD gives the best rank-k approximation in the Frobenius norm and the spectral norm. This result is central in compression, denoising, dimension reduction, and principal component analysis.
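
A short sketch of truncation, assuming NumPy and a random example matrix: keep the first k singular triplets and compare the approximation errors with the discarded singular values, which is exactly what the Eckart-Young theorem predicts.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((8, 6))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 2
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]         # best rank-k approximation

    err_spectral = np.linalg.norm(A - A_k, 2)
    print(np.isclose(err_spectral, s[k]))            # spectral-norm error equals sigma_{k+1}

    err_fro = np.linalg.norm(A - A_k, 'fro')
    print(np.isclose(err_fro, np.sqrt(np.sum(s[k:]**2))))   # Frobenius error from the discarded values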

80.18 Principal Component Analysis

Principal component analysis is closely related to the SVD.

Suppose a data matrix X has centered columns, so each feature has mean zero. The covariance matrix is proportional to

X^TX.

If

X = U\Sigma V^T,

then

X^TX = V\Sigma^T\Sigma V^T.

Thus the columns of V give the principal directions of the data, and the squared singular values give the variances along those directions, up to the normalization used in the sample covariance (typically 1/(m−1) for m samples).

Keeping only the leading singular vectors gives a lower-dimensional representation that preserves as much variance as possible in the least squares sense.
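
A minimal PCA sketch via the SVD of a centered data matrix, assuming NumPy, synthetic data generated here, rows as samples and columns as features, and the usual 1/(m−1) sample-covariance scaling:

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.standard_normal((100, 3)) @ np.diag([3.0, 1.0, 0.2])   # anisotropic toy data
    Xc = X - X.mean(axis=0)                  # center each feature (column)

    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    principal_directions = Vt                # rows of V^T are the principal directions
    explained_variance = s**2 / (Xc.shape[0] - 1)   # variance along each direction

    scores = Xc @ Vt[:2].T                   # project onto the two leading components
    print(explained_variance)
    print(scores.shape)                      # (100, 2): lower-dimensional representation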

80.19 Image Compression

A grayscale image can be represented as a matrix of pixel intensities. If

A = \sum_{i=1}^{r}\sigma_i u_i v_i^T,

then a compressed approximation is

A_k = \sum_{i=1}^{k}\sigma_i u_i v_i^T.

When the singular values decay quickly, a small number of terms may capture most of the visible structure.

Each term

\sigma_i u_i v_i^T

is a rank-one image component. The first terms capture broad structure. Later terms capture finer details and noise.

This is a concrete example of low-rank approximation. It also illustrates the general principle: SVD separates a matrix into ordered components by importance.
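
A quick storage comparison for the rank-k form, assuming NumPy, a synthetic "image" constructed here rather than a real photograph, and dense floating-point storage: keeping k triplets needs roughly k(m+n+1) numbers instead of mn.

    import numpy as np

    rng = np.random.default_rng(4)
    # Synthetic image: smooth low-rank structure plus a little noise.
    m, n = 256, 256
    base = np.outer(np.sin(np.linspace(0, 3, m)), np.cos(np.linspace(0, 2, n)))
    A = base + 0.01 * rng.standard_normal((m, n))

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 10
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

    storage_ratio = k * (m + n + 1) / (m * n)        # fraction of the original storage
    rel_error = np.linalg.norm(A - A_k, 'fro') / np.linalg.norm(A, 'fro')
    print(storage_ratio, rel_error)                  # small storage, small relative error here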

80.20 Polar Decomposition Connection

The SVD is related to the polar decomposition.

For a square matrix, polar decomposition writes

A = QH,

where Q is orthogonal or unitary and H is symmetric positive semidefinite or Hermitian positive semidefinite.

If

A = U\Sigma V^T,

then

A = (UV^T)(V\Sigma V^T).

Here

Q = UV^T

is orthogonal, and

H = V\Sigma V^T

is symmetric positive semidefinite.

Thus the SVD refines the polar decomposition by diagonalizing the positive semidefinite factor.
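
A sketch of recovering the polar factors from the SVD, assuming NumPy and an arbitrary square example matrix:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])          # arbitrary square example

    U, s, Vt = np.linalg.svd(A)

    Q = U @ Vt                           # orthogonal polar factor
    H = Vt.T @ np.diag(s) @ Vt           # symmetric positive semidefinite factor

    print(np.allclose(Q @ H, A))                       # A = QH
    print(np.allclose(Q.T @ Q, np.eye(2)))             # Q is orthogonal
    print(np.allclose(H, H.T), np.all(np.linalg.eigvalsh(H) >= -1e-12))   # H is symmetric PSD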

80.21 Comparison with Eigenvalue Decomposition

SVD and eigenvalue decomposition are related but different.

| Feature | Eigenvalue decomposition | Singular value decomposition |
| --- | --- | --- |
| Form | A = XΛX^{-1} | A = UΣV^T |
| Applies to rectangular matrices | No | Yes |
| Always exists | No | Yes |
| Orthogonal factors | Only for special matrices | Yes |
| Middle factor | Diagonal eigenvalue matrix | Nonnegative rectangular diagonal matrix |
| Main values | Eigenvalues | Singular values |
| Main directions | Eigenvectors | Left and right singular vectors |

Eigenvalue decomposition studies directions that are mapped to scalar multiples of themselves. SVD studies input directions that are mapped to orthogonal output directions with nonnegative scaling.

For symmetric positive semidefinite matrices, the two decompositions coincide: the eigenvalues are the singular values, and the eigenvectors can serve as both left and right singular vectors. For general matrices, the SVD is usually more robust and more broadly applicable.

80.22 Algorithmic Computation

Practical SVD algorithms are more complex than the definition suggests. A common dense strategy has two main stages.

svd(A):
    reduce A to bidiagonal form using orthogonal transformations

    apply iterative diagonalization to the bidiagonal matrix

    accumulate the orthogonal transformations into U and V

    return U, Sigma, V

The first stage uses orthogonal transformations to preserve numerical stability. The second stage computes the singular values and singular vectors from the simpler bidiagonal form.

For very large sparse matrices, one often computes only the leading singular values and vectors. Krylov subspace methods, Lanczos-type methods, randomized algorithms, and iterative solvers are common in that setting.
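
As a hedged usage sketch (not a description of any particular library's internals), assuming NumPy and SciPy are available: np.linalg.svd typically wraps a LAPACK driver that follows the dense two-stage strategy above, while scipy.sparse.linalg.svds computes only a few leading singular triplets of a large sparse matrix.

    import numpy as np
    from scipy.sparse import random as sparse_random
    from scipy.sparse.linalg import svds

    # Dense case: LAPACK-backed SVD of a modest matrix.
    A = np.random.default_rng(5).standard_normal((200, 50))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Sparse case: only the k = 5 leading singular triplets of a large sparse matrix.
    S = sparse_random(5000, 2000, density=1e-3, format='csr', random_state=0)
    u, sigma, vt = svds(S, k=5)
    sigma = sigma[::-1]                  # svds returns singular values in ascending order

    print(s[:5])                         # leading singular values of the dense matrix
    print(sigma)                         # leading singular values of the sparse matrix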

80.23 Numerical Stability

The SVD is one of the most stable matrix decompositions because it uses orthogonal or unitary transformations. These transformations preserve Euclidean length and do not by themselves amplify errors.

SVD is often used when reliability matters more than speed. It is especially valuable for:

| Task | Why SVD helps |
| --- | --- |
| Rank estimation | Singular values show numerical rank |
| Ill-conditioned systems | Small singular values expose sensitivity |
| Least squares | Handles rank deficiency |
| Compression | Gives best low-rank approximations |
| PCA | Gives principal directions |
| Pseudoinverse | Provides a canonical generalized inverse |

The cost is higher than LU, Cholesky, or QR in many dense problems. The extra cost is often justified when the matrix may be nearly singular or when structural information is needed.

80.24 Summary

Singular value decomposition factors any real matrix as

A = U\Sigma V^T.

For complex matrices, the form is

A = U\Sigma V^*.

The matrices U and V are orthogonal or unitary. The matrix Σ is rectangular diagonal with nonnegative entries. These entries are the singular values.

SVD reveals the geometry of a matrix. The right singular vectors are input directions. The left singular vectors are output directions. The singular values are stretching factors.

The decomposition gives rank, null space, range, condition number, pseudoinverse, least squares solutions, and best low-rank approximations. It applies to every matrix and is one of the most reliable tools in both theoretical and numerical linear algebra.