
Chapter 80. Singular Value Decomposition

Singular value decomposition is a factorization that applies to every real or complex matrix, square or rectangular. It is one of the most important decompositions in linear algebra because it exposes the rank, range, null space, conditioning, and best low-rank approximations of a matrix.

For a real m×n matrix A, the singular value decomposition has the form

A = U\Sigma V^T.

Here U is an m×m orthogonal matrix, V is an n×n orthogonal matrix, and Σ is an m×n rectangular diagonal matrix with nonnegative entries on the diagonal. For complex matrices, the form is A = UΣV*, where U and V are unitary. The nonnegative diagonal entries of Σ are the singular values of A. This decomposition exists for every real or complex matrix.

80.1 The Factorization

Let

A \in \mathbb{R}^{m \times n}.

A singular value decomposition of A is

A = U\Sigma V^T,

where

U^TU = I_m, \qquad V^TV = I_n.

The matrix Σ has the form

\Sigma = \begin{bmatrix} \sigma_1 & 0 & 0 & \cdots \\ 0 & \sigma_2 & 0 & \cdots \\ 0 & 0 & \sigma_3 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix},

with possible extra zero rows or columns depending on the shape of A. The diagonal entries satisfy

\sigma_1 \ge \sigma_2 \ge \cdots \ge 0.

The numbers

\sigma_1, \sigma_2, \ldots

are the singular values of A.

The columns of U are called left singular vectors. The columns of V are called right singular vectors. The factor Σ records how much each right singular direction is stretched into the corresponding left singular direction.

80.2 Geometric Meaning

The SVD describes the action of A as three simpler operations:

A = U\Sigma V^T.

Applied to a vector x, this gives

Ax = U\Sigma V^T x.

The operation proceeds in three stages.

First, V^T changes coordinates by an orthogonal transformation. This rotates or reflects the input space without changing lengths.

Second, Σ scales the coordinate axes by the singular values. It may also change dimension, since A can be rectangular.

Third, U changes coordinates in the output space by another orthogonal transformation.

Thus the SVD says that every linear map can be understood as an orthogonal change of input coordinates, followed by axis-aligned scaling, followed by an orthogonal change of output coordinates. This geometric interpretation is one reason SVD is so useful.

80.3 Singular Vectors

Write

V = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}, \qquad U = \begin{bmatrix} u_1 & u_2 & \cdots & u_m \end{bmatrix}.

The SVD implies

Av_i = \sigma_i u_i

for each singular value σ_i. For nonzero singular values,

u_i = \frac{Av_i}{\sigma_i}.

The right singular vector v_i is an input direction. The left singular vector u_i is the corresponding output direction. The singular value σ_i is the stretching factor.

If σ_i = 0, then

Av_i = 0.

Thus right singular vectors associated with zero singular values lie in the null space of A.
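
The relation Av_i = σ_i u_i is easy to check numerically. The following is a minimal sketch, assuming NumPy; the 4×3 matrix is an arbitrary example chosen here, not one from the text.

    import numpy as np

    # An arbitrary 4x3 example matrix (illustrative choice).
    A = np.array([[2.0, 0.0, 1.0],
                  [0.0, 3.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 1.0, 2.0]])

    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD: U is 4x3, Vt is 3x3

    for i, sigma in enumerate(s):
        v_i = Vt[i]       # i-th right singular vector (row of V^T)
        u_i = U[:, i]     # i-th left singular vector
        # A v_i should equal sigma_i u_i up to rounding error.
        print(i, np.allclose(A @ v_i, sigma * u_i))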

80.4 Relation to Eigenvalues

The SVD is closely related to the eigenvalue decomposition of the symmetric matrices

A^TA

and

AA^T.

Starting from

A = U\Sigma V^T,

we compute

A^TA = V\Sigma^TU^TU\Sigma V^T = V\Sigma^T\Sigma V^T.

Thus the columns of V are eigenvectors of A^T A.

Similarly,

AA^T = U\Sigma V^TV\Sigma^TU^T = U\Sigma\Sigma^TU^T.

Thus the columns of U are eigenvectors of AA^T.

The nonzero eigenvalues of A^T A and AA^T are

\sigma_i^2.

Therefore the singular values of A are the square roots of the eigenvalues of A^T A. This connection is a standard way to understand why singular values are always nonnegative.
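
A quick numerical check of this relation, assuming NumPy and an arbitrary example matrix: the eigenvalues of A^T A should match the squared singular values of A.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])              # arbitrary 3x2 example

    s = np.linalg.svd(A, compute_uv=False)       # singular values, descending
    evals = np.linalg.eigvalsh(A.T @ A)[::-1]    # eigenvalues of A^T A, descending

    print(np.allclose(s**2, evals))              # True: sigma_i^2 are the eigenvalues
    print(np.allclose(s, np.sqrt(evals)))        # True: sigma_i = sqrt(lambda_i)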

80.5 Full SVD

The full SVD writes

A = U\Sigma V^T,

where U is m×m, Σ is m×n, and V is n×n.

If m > n, then Σ has extra zero rows:

\Sigma = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \end{bmatrix}.

If m < n, then Σ has extra zero columns.

The full SVD includes orthonormal bases for all four fundamental subspaces: the range and null space of A and of A^T. It is complete, but often larger than needed in computation.

80.6 Reduced SVD

Suppose A has rank r. Let

\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0

be the positive singular values.

The reduced SVD writes

A = U_r \Sigma_r V_r^T,

where U_r is m×r, Σ_r is r×r, and V_r is n×r.

Here

\Sigma_r = \operatorname{diag}(\sigma_1, \ldots, \sigma_r).

The reduced SVD contains exactly the parts of the decomposition that contribute to A. It omits singular vectors associated with zero singular values. In many applications this is the natural form.

80.7 Thin SVD

If A is m×n, the thin SVD keeps k = min(m, n) singular directions:

A = U_k \Sigma_k V_k^T.

This form is also called the economy-sized SVD. It saves storage when m and n are very different. In applications, reduced and thin SVDs are often preferred over the full SVD because the full orthogonal bases may contain unused directions.
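
In NumPy the difference between the full and thin forms is controlled by the full_matrices flag; a reduced SVD is then obtained by discarding columns beyond the rank. A small sketch with an arbitrary tall matrix:

    import numpy as np

    A = np.random.default_rng(0).standard_normal((6, 3))   # tall example, m=6, n=3

    # Full SVD: U is 6x6, the three singular values describe the 6x3 Sigma, V is 3x3.
    U_full, s, Vt = np.linalg.svd(A, full_matrices=True)
    print(U_full.shape, s.shape, Vt.shape)       # (6, 6) (3,) (3, 3)

    # Thin (economy) SVD: U is 6x3, keeping only min(m, n) = 3 directions.
    U_thin, s, Vt = np.linalg.svd(A, full_matrices=False)
    print(U_thin.shape, s.shape, Vt.shape)       # (6, 3) (3,) (3, 3)

    # Both reconstruct A; for a full-rank A the thin form is already the reduced SVD.
    print(np.allclose(U_thin @ np.diag(s) @ Vt, A))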

80.8 A Two by Two Example

Let

A = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}.

This matrix is already diagonal with nonnegative entries. Its SVD is

A = U\Sigma V^T

with

U = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}, \qquad V = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

The singular values are

\sigma_1 = 3, \qquad \sigma_2 = 2.

The first coordinate direction is stretched by 3. The second coordinate direction is stretched by 2.

This example is simple because the input and output axes already coincide with the singular directions.

80.9 A Non-Diagonal Example

Let

A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.

Compute

A^TA = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}^T \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}.

Thus both eigenvalues of A^T A are 2. Therefore both singular values are

\sigma_1 = \sigma_2 = \sqrt{2}.

In fact,

A^TA = 2I

means that A multiplies the length of every vector by √2 and preserves angles. Hence

A = \sqrt{2}Q

where

Q = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}

is orthogonal.

One valid SVD is

A = Q \begin{bmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \end{bmatrix} I^T.

Because the singular values are equal, the singular vectors are not unique.

80.10 Rank from Singular Values

The rank of A is the number of nonzero singular values:

\operatorname{rank}(A) = \#\{i : \sigma_i > 0\}.

This follows from the reduced SVD

A = U_r\Sigma_rV_r^T.

The diagonal matrix Σ_r has exactly r positive diagonal entries, and the factors U_r and V_r have orthonormal, hence linearly independent, columns, so they do not change the rank.

In numerical computation, singular values also give a practical way to estimate numerical rank. If some singular values are extremely small compared with the largest singular value, the matrix behaves like a lower-rank matrix under finite precision arithmetic.
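
One common way to estimate numerical rank is to count singular values above a tolerance scaled by the largest one. The sketch below, assuming NumPy and an exactly rank-2 example matrix of my own choosing, compares this count with np.linalg.matrix_rank.

    import numpy as np

    # Exactly rank-2 example: the third row is the sum of the first two.
    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [5.0, 7.0, 9.0]])

    s = np.linalg.svd(A, compute_uv=False)
    tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]   # tolerance scaled by the largest singular value

    print(s)                           # the last singular value is at roundoff level
    print(int(np.sum(s > tol)))        # numerical rank, expected to be 2 here
    print(np.linalg.matrix_rank(A))    # NumPy's built-in estimate for comparison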

80.11 Range and Null Space

The SVD gives orthonormal bases for the fundamental subspaces of A.

If

A = U\Sigma V^T

and the rank is r, then

\operatorname{range}(A) = \operatorname{span}(u_1, \ldots, u_r).

The null space is

\mathcal{N}(A) = \operatorname{span}(v_{r+1}, \ldots, v_n).

The row space is

\operatorname{range}(A^T) = \operatorname{span}(v_1, \ldots, v_r).

The left null space is

\mathcal{N}(A^T) = \operatorname{span}(u_{r+1}, \ldots, u_m).

Thus SVD organizes all four fundamental subspaces in one decomposition.
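
Given a full SVD and the numerical rank r, the columns of U and V split into orthonormal bases for the four subspaces. A minimal sketch, assuming NumPy and a rank-2 example matrix chosen here for illustration:

    import numpy as np

    # Rank-2 example with m=4 rows and n=3 columns (third column = first + second).
    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 2.0],
                  [2.0, 1.0, 3.0]])

    U, s, Vt = np.linalg.svd(A)                       # full SVD: U is 4x4, Vt is 3x3
    tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
    r = int(np.sum(s > tol))                          # numerical rank (2 here)

    col_space  = U[:, :r]       # orthonormal basis of range(A)
    left_null  = U[:, r:]       # orthonormal basis of N(A^T)
    row_space  = Vt[:r].T       # orthonormal basis of range(A^T)
    null_space = Vt[r:].T       # orthonormal basis of N(A)

    print(np.allclose(A @ null_space, 0))      # A maps its null space to zero
    print(np.allclose(left_null.T @ A, 0))     # left null space is orthogonal to range(A)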

80.12 Norms from Singular Values

The largest singular value gives the operator norm induced by the Euclidean norm:

\|A\|_2 = \sigma_1.

This means

\sigma_1 = \max_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_2}.

The Frobenius norm is determined by all singular values:

\|A\|_F = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2}.

The singular values therefore measure how strongly A stretches vectors overall and by direction.
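
These identities are easy to confirm numerically; a minimal check, assuming NumPy and an arbitrary random matrix:

    import numpy as np

    A = np.random.default_rng(1).standard_normal((5, 3))
    s = np.linalg.svd(A, compute_uv=False)

    print(np.isclose(np.linalg.norm(A, 2), s[0]))                         # spectral norm = largest singular value
    print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))    # Frobenius norm from all singular values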

80.13 Condition Number

For a full-rank square matrix, the condition number in the Euclidean norm is

\kappa_2(A) = \frac{\sigma_{\max}}{\sigma_{\min}}.

Using ordered singular values, this is

\kappa_2(A) = \frac{\sigma_1}{\sigma_n}.

A condition number close to 11 means the matrix stretches all directions by similar amounts. A large condition number means some directions are stretched much more than others. Such a matrix is close to singular and may produce sensitive solutions in linear systems.

If σ_n = 0, then the matrix is singular and

\kappa_2(A) = \infty.

The condition number is one of the most important numerical quantities derived from the SVD.
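
A small sketch, assuming NumPy and a nearly singular example matrix of my own choosing, comparing the ratio σ_1/σ_n with np.linalg.cond:

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 1.0001]])       # nearly singular 2x2 example

    s = np.linalg.svd(A, compute_uv=False)
    kappa = s[0] / s[-1]                 # sigma_max / sigma_min

    print(kappa)                                      # large: the matrix is close to singular
    print(np.isclose(kappa, np.linalg.cond(A, 2)))    # matches NumPy's 2-norm condition number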

80.14 Solving Linear Systems

Suppose A is square and invertible, with SVD

A = U\Sigma V^T.

Then

A^{-1} = V\Sigma^{-1}U^T.

Therefore the solution of

Ax = b

is

x = V\Sigma^{-1}U^Tb.

Since Σ is diagonal,

\Sigma^{-1} = \operatorname{diag}\left( \frac{1}{\sigma_1}, \ldots, \frac{1}{\sigma_n} \right).

This formula shows why small singular values are dangerous. If σ_i is tiny, then 1/σ_i is large, so noise in the corresponding direction is amplified.
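
A sketch of this solve, assuming NumPy and an arbitrary invertible example matrix; the comparison against np.linalg.solve is just a sanity check.

    import numpy as np

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    b = np.array([1.0, 2.0, 3.0])

    U, s, Vt = np.linalg.svd(A)

    # x = V Sigma^{-1} U^T b, applied factor by factor; dividing by a tiny s[i]
    # is exactly where noise in that direction would be amplified.
    x = Vt.T @ ((U.T @ b) / s)

    print(np.allclose(A @ x, b))                     # residual at roundoff level
    print(np.allclose(x, np.linalg.solve(A, b)))     # agrees with a direct solver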

80.15 The Moore-Penrose Pseudoinverse

For a rectangular or rank-deficient matrix, the SVD gives the Moore-Penrose pseudoinverse.

If

A = U\Sigma V^T,

then

A^+ = V\Sigma^+U^T,

where Σ^+ is formed by replacing each positive singular value σ_i by 1/σ_i, then transposing the rectangular diagonal shape.

If

\Sigma = \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \\ 0 & 0 \end{bmatrix},

then

\Sigma^+ = \begin{bmatrix} 1/\sigma_1 & 0 & 0 \\ 0 & 1/\sigma_2 & 0 \end{bmatrix}.

The pseudoinverse is used to solve least squares problems, minimum norm problems, and rank-deficient systems.
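
A minimal construction of A^+ from the SVD, assuming NumPy and the 3×2 example shape used above; np.linalg.pinv does the same thing, with a tolerance deciding which singular values count as zero.

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [0.0, 2.0],
                  [0.0, 0.0]])          # 3x2 example, rank 2

    U, s, Vt = np.linalg.svd(A, full_matrices=True)

    # Build Sigma^+ : invert the positive singular values and transpose the shape (2x3 here).
    tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
    s_plus = np.where(s > tol, 1.0 / np.maximum(s, tol), 0.0)
    Sigma_plus = np.zeros((A.shape[1], A.shape[0]))
    Sigma_plus[:len(s), :len(s)] = np.diag(s_plus)

    A_plus = Vt.T @ Sigma_plus @ U.T
    print(np.allclose(A_plus, np.linalg.pinv(A)))    # matches NumPy's pseudoinverse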

80.16 Least Squares by SVD

For the least squares problem

\min_x \|Ax - b\|_2,

the SVD gives a complete solution.

If

A = U\Sigma V^T,

then

Ax - b = U\Sigma V^Tx - b.

Multiplying by U^T preserves the Euclidean norm, so the problem becomes

\min_x \|\Sigma V^Tx - U^Tb\|_2.

Let

z = V^Tx.

Then

\min_z \|\Sigma z - U^Tb\|_2.

Because Σ is diagonal, this problem separates into scalar equations. The minimum-norm least squares solution is

x = A^+b.

SVD is more expensive than QR for many least squares problems, but it gives the most reliable information when the matrix is rank deficient or nearly rank deficient.
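
A sketch of the minimum-norm least squares solution via the pseudoinverse, assuming NumPy and a rank-deficient example matrix chosen here; np.linalg.lstsq is the SVD-based library routine for the same problem.

    import numpy as np

    # Overdetermined, rank-deficient example: the second column equals the first.
    A = np.array([[1.0, 1.0],
                  [1.0, 1.0],
                  [2.0, 2.0]])
    b = np.array([1.0, 2.0, 2.0])

    x = np.linalg.pinv(A) @ b                        # x = A^+ b, minimum-norm least squares solution
    x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)    # SVD-based library solver

    print(x)
    print(np.allclose(x, x_ref))                     # both give the same minimum-norm solution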

80.17 Low-Rank Approximation

The SVD writes a matrix as a sum of rank-one matrices:

A = \sum_{i=1}^{r} \sigma_i u_i v_i^T.

The best rank-k approximation is obtained by keeping the first k terms:

A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T.

Equivalently,

A_k = U_k\Sigma_kV_k^T.

The Eckart-Young theorem states that this truncated SVD gives the best rank-k approximation in the Frobenius norm and the spectral norm. This result is central in compression, denoising, dimension reduction, and principal component analysis.
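
A short sketch of truncation, assuming NumPy and a random example matrix: keep the first k singular triplets and compare the approximation errors with the discarded singular values, which is exactly what the Eckart-Young theorem predicts.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((8, 6))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 2
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]         # best rank-k approximation

    err_spectral = np.linalg.norm(A - A_k, 2)
    print(np.isclose(err_spectral, s[k]))            # spectral-norm error equals sigma_{k+1}

    err_fro = np.linalg.norm(A - A_k, 'fro')
    print(np.isclose(err_fro, np.sqrt(np.sum(s[k:]**2))))   # Frobenius error from the discarded values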

80.18 Principal Component Analysis

Principal component analysis is closely related to the SVD.

Suppose a data matrix X has centered columns, so each feature has mean zero. The covariance matrix is proportional to

X^TX.

If

X = U\Sigma V^T,

then

X^TX = V\Sigma^T\Sigma V^T.

Thus the columns of V give the principal directions of the data, and the squared singular values give the variances along those directions, up to the normalization used in the sample covariance (typically 1/(m−1) for m samples).

Keeping only the leading singular vectors gives a lower-dimensional representation that preserves as much variance as possible in the least squares sense.
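
A minimal PCA sketch via the SVD of a centered data matrix, assuming NumPy, synthetic data generated here, rows as samples and columns as features, and the usual 1/(m−1) sample-covariance scaling:

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.standard_normal((100, 3)) @ np.diag([3.0, 1.0, 0.2])   # anisotropic toy data
    Xc = X - X.mean(axis=0)                  # center each feature (column)

    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    principal_directions = Vt                # rows of V^T are the principal directions
    explained_variance = s**2 / (Xc.shape[0] - 1)   # variance along each direction

    scores = Xc @ Vt[:2].T                   # project onto the two leading components
    print(explained_variance)
    print(scores.shape)                      # (100, 2): lower-dimensional representation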

80.19 Image Compression

A grayscale image can be represented as a matrix of pixel intensities. If

A = \sum_{i=1}^{r}\sigma_i u_i v_i^T,

then a compressed approximation is

A_k = \sum_{i=1}^{k}\sigma_i u_i v_i^T.

When the singular values decay quickly, a small number of terms may capture most of the visible structure.

Each term

\sigma_i u_i v_i^T

is a rank-one image component. The first terms capture broad structure. Later terms capture finer details and noise.

This is a concrete example of low-rank approximation. It also illustrates the general principle: SVD separates a matrix into ordered components by importance.
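
A quick storage comparison for the rank-k form, assuming NumPy, a synthetic "image" constructed here rather than a real photograph, and dense floating-point storage: keeping k triplets needs roughly k(m+n+1) numbers instead of mn.

    import numpy as np

    rng = np.random.default_rng(4)
    # Synthetic image: smooth low-rank structure plus a little noise.
    m, n = 256, 256
    base = np.outer(np.sin(np.linspace(0, 3, m)), np.cos(np.linspace(0, 2, n)))
    A = base + 0.01 * rng.standard_normal((m, n))

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 10
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

    storage_ratio = k * (m + n + 1) / (m * n)        # fraction of the original storage
    rel_error = np.linalg.norm(A - A_k, 'fro') / np.linalg.norm(A, 'fro')
    print(storage_ratio, rel_error)                  # small storage, small relative error here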

80.20 Polar Decomposition Connection

The SVD is related to the polar decomposition.

For a square matrix, polar decomposition writes

A = QH,

where Q is orthogonal or unitary and H is symmetric positive semidefinite or Hermitian positive semidefinite.

If

A = U\Sigma V^T,

then

A = (UV^T)(V\Sigma V^T).

Here

Q = UV^T

is orthogonal, and

H = V\Sigma V^T

is symmetric positive semidefinite.

Thus the SVD refines the polar decomposition by diagonalizing the positive semidefinite factor.
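
A sketch of recovering the polar factors from the SVD, assuming NumPy and an arbitrary square example matrix:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])          # arbitrary square example

    U, s, Vt = np.linalg.svd(A)

    Q = U @ Vt                           # orthogonal polar factor
    H = Vt.T @ np.diag(s) @ Vt           # symmetric positive semidefinite factor

    print(np.allclose(Q @ H, A))                       # A = QH
    print(np.allclose(Q.T @ Q, np.eye(2)))             # Q is orthogonal
    print(np.allclose(H, H.T), np.all(np.linalg.eigvalsh(H) >= -1e-12))   # H is symmetric PSD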

80.21 Comparison with Eigenvalue Decomposition

SVD and eigenvalue decomposition are related but different.

| Feature | Eigenvalue decomposition | Singular value decomposition |
| --- | --- | --- |
| Form | A = XΛX^{-1} | A = UΣV^T |
| Applies to rectangular matrices | No | Yes |
| Always exists | No | Yes |
| Orthogonal factors | Only for special matrices | Yes |
| Middle factor | Diagonal eigenvalue matrix | Nonnegative rectangular diagonal matrix |
| Main values | Eigenvalues | Singular values |
| Main directions | Eigenvectors | Left and right singular vectors |

Eigenvalue decomposition studies directions that are mapped to scalar multiples of themselves. SVD studies input directions that are mapped to orthogonal output directions with nonnegative scaling.

For symmetric positive semidefinite matrices, the two decompositions coincide: the eigenvalues are the singular values, and the eigenvectors can serve as both left and right singular vectors. For general matrices, the SVD is usually more robust and more broadly applicable.

80.22 Algorithmic Computation

Practical SVD algorithms are more complex than the definition suggests. A common dense strategy has two main stages.

svd(A):
    reduce A to bidiagonal form using orthogonal transformations

    apply iterative diagonalization to the bidiagonal matrix

    accumulate the orthogonal transformations into U and V

    return U, Sigma, V

The first stage uses orthogonal transformations to preserve numerical stability. The second stage computes the singular values and singular vectors from the simpler bidiagonal form.

For very large sparse matrices, one often computes only the leading singular values and vectors. Krylov subspace methods, Lanczos-type methods, randomized algorithms, and iterative solvers are common in that setting.
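
As a hedged usage sketch (not a description of any particular library's internals), assuming NumPy and SciPy are available: np.linalg.svd typically wraps a LAPACK driver that follows the dense two-stage strategy above, while scipy.sparse.linalg.svds computes only a few leading singular triplets of a large sparse matrix.

    import numpy as np
    from scipy.sparse import random as sparse_random
    from scipy.sparse.linalg import svds

    # Dense case: LAPACK-backed SVD of a modest matrix.
    A = np.random.default_rng(5).standard_normal((200, 50))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Sparse case: only the k = 5 leading singular triplets of a large sparse matrix.
    S = sparse_random(5000, 2000, density=1e-3, format='csr', random_state=0)
    u, sigma, vt = svds(S, k=5)
    sigma = sigma[::-1]                  # svds returns singular values in ascending order

    print(s[:5])                         # leading singular values of the dense matrix
    print(sigma)                         # leading singular values of the sparse matrix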

80.23 Numerical Stability

The SVD is one of the most stable matrix decompositions because it uses orthogonal or unitary transformations. These transformations preserve Euclidean length and do not by themselves amplify errors.

SVD is often used when reliability matters more than speed. It is especially valuable for:

| Task | Why SVD helps |
| --- | --- |
| Rank estimation | Singular values show numerical rank |
| Ill-conditioned systems | Small singular values expose sensitivity |
| Least squares | Handles rank deficiency |
| Compression | Gives best low-rank approximations |
| PCA | Gives principal directions |
| Pseudoinverse | Provides a canonical generalized inverse |

The cost is higher than LU, Cholesky, or QR in many dense problems. The extra cost is often justified when the matrix may be nearly singular or when structural information is needed.

80.24 Summary

Singular value decomposition factors any real matrix as

A = U\Sigma V^T.

For complex matrices, the form is

A = U\Sigma V^*.

The matrices U and V are orthogonal or unitary. The matrix Σ is rectangular diagonal with nonnegative entries. These entries are the singular values.

SVD reveals the geometry of a matrix. The right singular vectors are input directions. The left singular vectors are output directions. The singular values are stretching factors.

The decomposition gives rank, null space, range, condition number, pseudoinverse, least squares solutions, and best low-rank approximations. It applies to every matrix and is one of the most reliable tools in both theoretical and numerical linear algebra.