
Chapter 57. Singular Value Decomposition

The singular value decomposition, usually called the SVD, is one of the central factorizations in linear algebra. It applies to every matrix, including rectangular and rank-deficient matrices. Unlike eigenvalue decomposition, it does not require the matrix to be square or diagonalizable.

The SVD expresses a matrix as

A = U\Sigma V^*,

where:

Factor | Property
U | Unitary or orthogonal
Σ | Diagonal with nonnegative entries
V | Unitary or orthogonal

The diagonal entries of Σ are called the singular values of A.

The SVD explains the geometry of a matrix completely. It shows that every linear transformation can be decomposed into:

  1. a rotation or unitary transformation,
  2. a coordinate scaling,
  3. another rotation or unitary transformation.

This factorization is fundamental in numerical linear algebra, data analysis, signal processing, statistics, optimization, machine learning, and inverse problems.

57.1 Definition of the SVD

Let

A \in \mathbb{C}^{m\times n}.

A singular value decomposition of A is a factorization

A = U\Sigma V^*,

where:

  • U \in \mathbb{C}^{m\times m} is unitary,
  • V \in \mathbb{C}^{n\times n} is unitary,
  • \Sigma \in \mathbb{R}^{m\times n} is diagonal in the rectangular sense.

The diagonal entries satisfy

\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0,

and all remaining diagonal entries are zero.

The numbers

\sigma_1, \ldots, \sigma_r

are the singular values of A.

The number r equals the rank of A.

In the real case, the matrices U and V are orthogonal, and the conjugate transpose becomes an ordinary transpose:

A = U\Sigma V^T.

57.2 Rectangular Diagonal Matrices

The matrix Σ has the form

\Sigma =
\begin{bmatrix}
\sigma_1 & 0 & \cdots & 0\\
0 & \sigma_2 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \sigma_r\\
0 & 0 & \cdots & 0
\end{bmatrix}.

If m > n, then Σ has extra zero rows. If n > m, then it has extra zero columns.

For example,

\Sigma =
\begin{bmatrix}
5 & 0\\
0 & 2\\
0 & 0
\end{bmatrix}

is a 3×2 diagonal matrix in the rectangular sense.

The singular values are always nonnegative.
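
For readers who want to experiment, here is a minimal NumPy sketch; the 3×2 matrix A is an arbitrary example chosen for illustration. It assembles the rectangular Σ from the singular values returned by np.linalg.svd and checks the factorization.

```python
import numpy as np

# An arbitrary 3x2 example; its singular values happen to be 5 and 3.
A = np.array([[3.0, 2.0],
              [2.0, 3.0],
              [2.0, -2.0]])

# full_matrices=True returns U (3x3), the singular values, and Vh = V^T (2x2).
U, s, Vh = np.linalg.svd(A, full_matrices=True)

# Build the rectangular "diagonal" Sigma of shape (3, 2): extra zero row at the bottom.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print("singular values:", s)                 # nonnegative and decreasing
print("reconstruction error:",
      np.linalg.norm(A - U @ Sigma @ Vh))    # ~1e-15
```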

57.3 Singular Values from Eigenvalues

The singular values of A are the square roots of the eigenvalues of

A^*A.

Since

A^*A

is Hermitian positive semidefinite, all its eigenvalues are real and nonnegative.

Suppose

A^*Av = \lambda v.

Then

v^*A^*Av = \lambda v^*v.

But

v^*A^*Av = (Av)^*(Av) = \|Av\|_2^2 \ge 0.

Thus

\lambda \ge 0.

The singular values are defined by

\sigma_i = \sqrt{\lambda_i}.

Therefore:

Matrix | Spectrum
A^*A | Eigenvalues λ_i ≥ 0
A | Singular values σ_i = √λ_i
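
As a quick numerical check of this relationship, the following NumPy sketch (with an arbitrary random matrix) compares the square roots of the eigenvalues of A^*A against the singular values of A.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))                    # arbitrary test matrix

# Eigenvalues of A^*A (Hermitian PSD), sorted in decreasing order.
eigvals = np.linalg.eigvalsh(A.conj().T @ A)[::-1]

# Singular values of A, also returned in decreasing order.
svals = np.linalg.svd(A, compute_uv=False)

print(np.sqrt(eigvals))    # agrees with svals up to rounding
print(svals)
```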

57.4 Right Singular Vectors

The eigenvectors of

A^*A

are called the right singular vectors of A.

Suppose

A^*Av_i = \sigma_i^2 v_i.

Choose orthonormal eigenvectors

v_1, \ldots, v_n.

Collect them into the matrix

V =
\begin{bmatrix}
| & | & & |\\
v_1 & v_2 & \cdots & v_n\\
| & | & & |
\end{bmatrix}.

Then

V^*V = I.

Thus V is unitary.

The matrix V gives the input coordinate directions of the transformation.

57.5 Left Singular Vectors

For every nonzero singular value σ_i, define

u_i = \frac{Av_i}{\sigma_i}.

Then

Av_i = \sigma_i u_i.

The vectors u_i are called the left singular vectors.

They satisfy

A^*u_i = \sigma_i v_i.

Indeed,

A^*u_i = A^*\frac{Av_i}{\sigma_i} = \frac{A^*Av_i}{\sigma_i} = \frac{\sigma_i^2 v_i}{\sigma_i} = \sigma_i v_i.

The vectors u_i are orthonormal:

u_i^*u_j = \frac{1}{\sigma_i\sigma_j}(Av_i)^*(Av_j).

Since

(Av_i)^*(Av_j) = v_i^*A^*Av_j = \sigma_j^2 v_i^*v_j,

we obtain

u_i^*u_j = \delta_{ij}.
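
The following NumPy sketch (using an arbitrary random matrix with full column rank) builds the u_i from the eigenvectors of A^*A exactly as above and confirms that A v_i = σ_i u_i and that the u_i are orthonormal.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))                    # full column rank almost surely

# Right singular vectors from the eigendecomposition of A^*A.
lam, V = np.linalg.eigh(A.conj().T @ A)
order = np.argsort(lam)[::-1]                      # sort eigenpairs in decreasing order
lam, V = lam[order], V[:, order]
sigma = np.sqrt(lam)

# Left singular vectors u_i = A v_i / sigma_i (division acts column by column).
U = A @ V / sigma

print(np.linalg.norm(A @ V - U * sigma))           # A v_i = sigma_i u_i, ~1e-15
print(np.linalg.norm(U.conj().T @ U - np.eye(4)))  # the u_i are orthonormal, ~1e-15
```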

57.6 Constructing the SVD

Let

A^*A = V\Lambda V^*

be a unitary diagonalization.

The diagonal matrix Λ contains the eigenvalues

\lambda_i = \sigma_i^2.

Define

\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r) \in \mathbb{R}^{m\times n},

with the remaining entries equal to zero.

For each nonzero singular value, define

u_i = \frac{Av_i}{\sigma_i}.

Extend the orthonormal set

u_1, \ldots, u_r

to an orthonormal basis of \mathbb{C}^m, and let

U =
\begin{bmatrix}
| & | & & |\\
u_1 & u_2 & \cdots & u_m\\
| & | & & |
\end{bmatrix}.

Then

A = U\Sigma V^*.

Thus every matrix has an SVD.
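
The construction can be traced numerically. The sketch below (an illustration, not a production algorithm) uses an arbitrary random matrix with full column rank, so r = n, and extends u_1, ..., u_r to a basis of C^m with the trailing columns of a complete QR factorization, which is one convenient way to perform the extension.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))                    # r = n = 3 almost surely
m, n = A.shape

# Step 1: unitary diagonalization of A^*A gives V and the sigma_i^2.
lam, V = np.linalg.eigh(A.conj().T @ A)
order = np.argsort(lam)[::-1]
sigma = np.sqrt(lam[order])
V = V[:, order]

# Step 2: u_i = A v_i / sigma_i for the nonzero singular values.
Ur = A @ V / sigma                                  # m x r

# Step 3: extend to an orthonormal basis of C^m; the trailing columns of a
# complete QR factorization of Ur span the orthogonal complement of range(Ur).
Qfull, _ = np.linalg.qr(Ur, mode='complete')
U = np.hstack([Ur, Qfull[:, n:]])

# Step 4: rectangular Sigma and the reconstruction A = U Sigma V^*.
Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(sigma)
print(np.linalg.norm(A - U @ Sigma @ V.conj().T))   # ~1e-14
```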

57.7 Geometric Interpretation

The SVD describes the action of a matrix geometrically.

The transformation

x \mapsto Ax

can be decomposed into three steps:

  1. apply V^*,
  2. scale coordinates by Σ,
  3. apply U.

The matrices U and V^* are unitary or orthogonal, so they preserve lengths and angles.

The matrix Σ stretches space independently along orthogonal coordinate directions.

Thus the singular values measure how strongly the matrix stretches different directions.

57.8 Unit Sphere Interpretation

Consider the unit sphere

\{x : \|x\|_2 = 1\}.

Under the transformation x \mapsto Ax, the sphere becomes an ellipsoid.

The principal axes of this ellipsoid are:

Quantity | Meaning
v_i | Input direction
u_i | Output direction
σ_i | Stretch factor

Specifically,

Av_i = \sigma_i u_i.

Thus the matrix stretches the direction v_i by the factor σ_i and then rotates it into the direction u_i.

This geometric interpretation is one of the most important meanings of the SVD.
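
The ellipse picture can be checked numerically in two dimensions. The sketch below maps a dense sample of the unit circle through an arbitrary 2×2 matrix and compares the longest and shortest image vectors with σ_1 and σ_2.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])                       # arbitrary 2x2 example
U, s, Vh = np.linalg.svd(A)

# Sample the unit circle and map it through A.
theta = np.linspace(0.0, 2.0 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])
image = A @ circle
lengths = np.linalg.norm(image, axis=0)

print(lengths.max(), s[0])                       # longest semi-axis ~ sigma_1
print(lengths.min(), s[1])                       # shortest semi-axis ~ sigma_2
print(A @ Vh[0], s[0] * U[:, 0])                 # A v_1 = sigma_1 u_1
```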

57.9 Rank and Singular Values

The rank of A equals the number of nonzero singular values.

Indeed,

Av_i = \sigma_i u_i.

If

\sigma_i = 0,

then

Av_i = 0,

so v_i lies in the null space of A.

Thus:

Quantity | Equals
Rank of A | Number of positive singular values
Nullity of A | Number of zero singular values

The SVD therefore gives complete rank information.
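
In floating-point arithmetic, "zero" singular values are only zero up to rounding, so the numerical rank is the number of singular values above a tolerance. A short NumPy sketch (the rank-3 test matrix and the tolerance are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
# A 6x5 matrix of exact rank 3, built as a product of thin factors.
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))

s = np.linalg.svd(A, compute_uv=False)
tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]   # one common tolerance choice

print(s)                                            # three large values, two near zero
print("numerical rank:", int(np.sum(s > tol)))      # 3
print("np.linalg.matrix_rank:", np.linalg.matrix_rank(A))
```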

57.10 Compact SVD

If A has rank r, one may keep only the nonzero singular values.

Define:

U_r = \begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix},
\qquad
V_r = \begin{bmatrix} v_1 & \cdots & v_r \end{bmatrix},

and

\Sigma_r = \operatorname{diag}(\sigma_1, \ldots, \sigma_r).

Then

A = U_r\Sigma_r V_r^*.

This is the compact SVD or thin SVD.

It removes directions associated with zero singular values.
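
In NumPy the thin factors are obtained with full_matrices=False; the sketch below (with an arbitrary rank-3 test matrix) further drops the columns belonging to zero singular values and checks the compact reconstruction.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))   # rank 3

# Thin SVD: U is 8x6, s has 6 entries, Vh is 6x6.
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Keep only the r columns belonging to (numerically) nonzero singular values.
r = int(np.sum(s > 1e-10 * s[0]))
Ur, sr, Vr = U[:, :r], s[:r], Vh[:r, :].conj().T

print("r =", r)
print(np.linalg.norm(A - Ur @ np.diag(sr) @ Vr.conj().T))       # ~1e-14
```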

57.11 Outer Product Expansion

The SVD can be written as

A = \sum_{i=1}^r \sigma_i u_i v_i^*.

Each term

u_i v_i^*

is a rank-one matrix.

Thus every matrix is a sum of rank-one matrices weighted by singular values.

This decomposition is fundamental in low-rank approximation and data compression.
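
The expansion can be verified term by term. The sketch below (arbitrary random matrix) accumulates the rank-one matrices σ_i u_i v_i^* and watches the residual shrink to zero.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Rebuild A one rank-one term sigma_i * u_i v_i^* at a time.
# The rows of Vh are already the (conjugated) right singular vectors.
B = np.zeros_like(A)
for i in range(len(s)):
    B += s[i] * np.outer(U[:, i], Vh[i, :])
    print(i + 1, np.linalg.norm(A - B))    # residual drops to ~0 after the last term
```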

57.12 Best Rank-k Approximation

One of the most important theorems about the SVD is the Eckart-Young theorem.

Suppose

A = \sum_{i=1}^r \sigma_i u_i v_i^*.

Define

A_k = \sum_{i=1}^k \sigma_i u_i v_i^*.

Then A_k is the best rank-k approximation to A in both the spectral norm and the Frobenius norm.

The approximation error in the spectral norm is

\|A - A_k\|_2 = \sigma_{k+1}.

Thus truncating the SVD gives the optimal low-rank approximation.

This result is central in compression, PCA, latent semantic analysis, and recommender systems.
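
The truncation and the error formula are easy to test numerically. The sketch below (arbitrary random matrix, k = 3) forms A_k from the first k terms and compares the spectral and Frobenius errors with the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((10, 8))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 3
Ak = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]          # truncated SVD, rank k

# Spectral-norm error equals the first discarded singular value sigma_{k+1}.
print(np.linalg.norm(A - Ak, 2), s[k])
# Frobenius-norm error equals sqrt(sigma_{k+1}^2 + ... + sigma_r^2).
print(np.linalg.norm(A - Ak, 'fro'), np.sqrt(np.sum(s[k:]**2)))
```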

57.13 Example

Consider

A =
\begin{bmatrix}
3 & 0\\
0 & 1
\end{bmatrix}.

Then

A^TA =
\begin{bmatrix}
9 & 0\\
0 & 1
\end{bmatrix}.

The eigenvalues are

9 \quad \text{and} \quad 1.

Thus the singular values are

\sigma_1 = 3, \qquad \sigma_2 = 1.

The standard basis vectors are eigenvectors, so

V = I, \qquad U = I.

Hence

A = I
\begin{bmatrix}
3 & 0\\
0 & 1
\end{bmatrix}
I.

This matrix stretches the x-direction by 3 and the y-direction by 1.
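
A one-line check of this example with NumPy (the computed factors may differ from the identity by sign choices):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [0.0, 1.0]])
U, s, Vh = np.linalg.svd(A)
print(s)     # [3. 1.]
print(U)     # identity, up to signs
print(Vh)    # identity, up to signs
```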

57.14 Pseudoinverse

The SVD gives the Moore-Penrose pseudoinverse.

Suppose

A = U\Sigma V^*.

Define

\Sigma^+

by replacing each nonzero singular value σ_i by

\frac{1}{\sigma_i}

and transposing the rectangular structure.

Then

A^+ = V\Sigma^+U^*.

The pseudoinverse gives:

Problem | Solution
Least squares | x = A^+ b
Minimum-norm solution | x = A^+ b
Orthogonal projection onto the range of A | AA^+

The pseudoinverse exists for every matrix.
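
The construction of A^+ from the SVD can be written out directly. The sketch below (arbitrary random matrix; the relative threshold for "nonzero" is an illustrative choice) compares the result with np.linalg.pinv and checks the identity A A^+ A = A.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))

U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Invert the (numerically) nonzero singular values; leave the rest at zero.
s_inv = np.where(s > 1e-12 * s[0], 1.0 / s, 0.0)
A_pinv = Vh.conj().T @ np.diag(s_inv) @ U.conj().T

print(np.linalg.norm(A_pinv - np.linalg.pinv(A)))   # ~1e-15
print(np.linalg.norm(A @ A_pinv @ A - A))           # Moore-Penrose identity A A^+ A = A
```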

57.15 Least Squares and the SVD

Suppose

A = U\Sigma V^*.

The least squares problem

\min_x \|Ax - b\|_2

becomes

\min_x \|\Sigma V^*x - U^*b\|_2.

Since U and V are unitary, they preserve norms.

The problem reduces to a diagonal system involving singular values.

If all singular values are positive, the least squares solution is

x = V\Sigma^{-1}U^*b.

This equals

A^+b.

The SVD therefore solves least squares problems robustly, even for ill-conditioned or rank-deficient matrices.
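
The formula x = VΣ^{-1}U^*b translates directly into code. The sketch below (arbitrary tall, full-column-rank matrix) solves a least squares problem through the thin SVD and compares with np.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((20, 4))                  # tall, full column rank
b = rng.standard_normal(20)

# x = V Sigma^{-1} U^* b via the thin SVD.
U, s, Vh = np.linalg.svd(A, full_matrices=False)
x_svd = Vh.conj().T @ ((U.conj().T @ b) / s)

# Reference solution from NumPy's least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_svd - x_ref))              # ~1e-15
```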

57.16 Condition Number

The condition number of a full-rank matrix in the Euclidean norm is

\kappa_2(A) = \frac{\sigma_1}{\sigma_r},

where:

  • σ_1 is the largest singular value,
  • σ_r is the smallest positive singular value.

A large condition number indicates near singularity and numerical instability.

The singular values therefore measure how close a matrix is to losing rank.
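
A brief numerical check (square random matrix, chosen only for illustration): the ratio of extreme singular values matches NumPy's built-in condition number in the 2-norm.

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((5, 5))

s = np.linalg.svd(A, compute_uv=False)
print(s[0] / s[-1])              # sigma_1 / sigma_r
print(np.linalg.cond(A, 2))      # same value from np.linalg.cond
```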

57.17 Spectral Norm

The operator 2-norm of a matrix is

\|A\|_2 = \max_{\|x\|_2=1}\|Ax\|_2.

This equals the largest singular value:

\|A\|_2 = \sigma_1.

Thus the largest singular value measures the maximum stretching factor of the transformation.

57.18 Frobenius Norm

The Frobenius norm is

\|A\|_F = \sqrt{\sum_{i,j}|a_{ij}|^2}.

Using the SVD,

\|A\|_F^2 = \sum_{i=1}^r \sigma_i^2.

Thus the singular values completely determine the Frobenius norm.
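
Both norm identities can be confirmed in a few lines; the random matrix below is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((5, 7))
s = np.linalg.svd(A, compute_uv=False)

# Spectral norm = largest singular value.
print(np.linalg.norm(A, 2), s[0])
# Frobenius norm = square root of the sum of squared singular values.
print(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))
```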

57.19 Principal Component Analysis

Principal component analysis, usually called PCA, is closely related to the SVD.

Suppose a data matrix X contains centered observations, one per row.

The sample covariance matrix is proportional to

X^TX.

The eigenvectors of X^TX are the principal directions.

If

X = U\Sigma V^T,

then:

Quantity | PCA interpretation
v_i | Principal direction
σ_i^2 | Variance magnitude
σ_i u_i | Principal component coordinates

Thus PCA is essentially the SVD of the centered data matrix.
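
The sketch below illustrates the correspondence on synthetic data (the mixing matrix, sample size, and 1/(n-1) normalization are illustrative choices): PCA quantities read off from the SVD of the centered data agree with the eigendecomposition of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(11)
# 200 observations (rows) of 3 correlated variables, then column-centered.
X = rng.standard_normal((200, 3)) @ np.array([[2.0, 0.0, 0.0],
                                              [0.5, 1.0, 0.0],
                                              [0.0, 0.3, 0.2]])
X = X - X.mean(axis=0)

# PCA via the SVD of the centered data matrix.
U, s, Vh = np.linalg.svd(X, full_matrices=False)
principal_directions = Vh.T               # columns are the v_i
scores = U * s                            # principal component coordinates sigma_i u_i
explained_variance = s**2 / (X.shape[0] - 1)

# Cross-check against the eigenvalues of the sample covariance matrix.
cov = X.T @ X / (X.shape[0] - 1)
print(explained_variance)
print(np.linalg.eigvalsh(cov)[::-1])      # same values, up to rounding
```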

57.20 Numerical Importance

The SVD is one of the most stable and informative matrix factorizations.

It is used for:

Application | Purpose
Least squares | Stable solution
Rank estimation | Detect numerical rank
Compression | Low-rank approximation
PCA | Dimension reduction
Signal processing | Noise filtering
Machine learning | Feature extraction
Inverse problems | Regularization
Numerical analysis | Conditioning analysis

Unlike many decompositions, the SVD exists for every matrix and reveals complete geometric information about the transformation.

57.21 Summary

Every matrix admits a singular value decomposition:

A = U\Sigma V^*.

The matrices U and V are unitary or orthogonal. The matrix Σ contains the nonnegative singular values.

The singular values are the square roots of the eigenvalues of

A^*A.

The SVD describes the action of a matrix geometrically as:

  1. a unitary transformation,
  2. a coordinate scaling,
  3. another unitary transformation.

It reveals rank, conditioning, principal directions, and optimal low-rank approximations. It is one of the most powerful tools in linear algebra and numerical computation.