
Chapter 51. Orthogonal Projections

An orthogonal projection is the operation of replacing a vector by its closest vector in a chosen subspace. It is the precise linear algebra version of dropping a perpendicular from a point to a line, plane, or higher-dimensional subspace.

If $W$ is a subspace of an inner product space $V$, then every vector $v$ can be decomposed into two parts:

$$v = w + r,$$

where

$$w \in W, \qquad r \in W^\perp.$$

The vector $w$ is the orthogonal projection of $v$ onto $W$. The vector $r$ is the residual. In finite-dimensional inner product spaces, this decomposition exists and is unique for every subspace $W$. The projected vector is the closest vector in $W$ to the original vector.

51.1 Projection onto a Line

Let $u$ be a nonzero vector in an inner product space $V$. The line generated by $u$ is

$$L = \operatorname{span}\{u\}.$$

The projection of $v$ onto $L$ is the vector in the direction of $u$ closest to $v$. It has the form

$$p = cu$$

for some scalar $c$.

The residual is

$$r = v - cu.$$

For $cu$ to be the orthogonal projection, the residual must be orthogonal to the line. Since the line is spanned by $u$, it is enough to require

$$\langle v - cu, u\rangle = 0.$$

Using linearity,

$$\langle v,u\rangle - c\langle u,u\rangle = 0.$$

Therefore

$$c = \frac{\langle v,u\rangle}{\langle u,u\rangle}.$$

So the projection is

$$\operatorname{proj}_u(v) = \frac{\langle v,u\rangle}{\langle u,u\rangle}\,u.$$

This formula is valid whenever $u \ne 0$.
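
A minimal numerical sketch of this formula, assuming the standard dot product on $\mathbb{R}^n$ (the vectors here are illustrative):

```python
import numpy as np

def proj_onto_line(v, u):
    """Orthogonal projection of v onto span{u}, for nonzero u."""
    c = np.dot(v, u) / np.dot(u, u)   # c = <v,u> / <u,u>
    return c * u

u = np.array([3.0, 1.0])
v = np.array([1.0, 1.0])
p = proj_onto_line(v, u)
r = v - p
print(p)             # [1.2 0.4]
print(np.dot(r, u))  # 0.0: the residual is orthogonal to u
```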

51.2 Projection onto a Unit Vector

If $q$ is a unit vector, then

$$\langle q,q\rangle = 1.$$

The projection formula simplifies to

$$\operatorname{proj}_q(v) = \langle v,q\rangle q.$$

This is the simplest projection formula. The scalar

$$\langle v,q\rangle$$

is the coordinate of $v$ in the direction $q$. The vector

$$\langle v,q\rangle\, q$$

is the component of $v$ along $q$.

For example, let

$$v = \begin{bmatrix} 3\\ 4 \end{bmatrix}, \qquad q = \begin{bmatrix} 1\\ 0 \end{bmatrix}.$$

Then

$$\langle v,q\rangle = 3,$$

so

$$\operatorname{proj}_q(v) = 3 \begin{bmatrix} 1\\ 0 \end{bmatrix} = \begin{bmatrix} 3\\ 0 \end{bmatrix}.$$

The projection keeps the horizontal component and removes the vertical component.
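
The same computation in code, as an illustrative check of the unit-vector formula:

```python
import numpy as np

q = np.array([1.0, 0.0])   # unit vector
v = np.array([3.0, 4.0])

p = np.dot(v, q) * q       # <v,q> q
print(p)                   # [3. 0.]
```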

51.3 The Residual

The residual of $v$ after projection onto a subspace $W$ is

$$r = v - \operatorname{proj}_W(v).$$

The defining property of orthogonal projection is

$$r \in W^\perp.$$

Equivalently,

$$\langle r,w\rangle = 0$$

for every $w \in W$.

Thus projection separates a vector into an explained part and an unexplained part:

$$v = \operatorname{proj}_W(v) + r,$$

where

$$\operatorname{proj}_W(v) \in W, \qquad r \in W^\perp.$$

This is the orthogonal decomposition of $v$ with respect to $W$.

51.4 Projection onto an Orthonormal Basis

Let $W$ be a subspace with orthonormal basis

$$q_1, q_2, \ldots, q_k.$$

The projection of $v$ onto $W$ is

$$\operatorname{proj}_W(v) = \sum_{j=1}^k \langle v,q_j\rangle q_j.$$

This formula follows from the coordinate formula for orthonormal bases. The projected vector lies in $W$, and the residual is orthogonal to every $q_j$.

Indeed, let

$$p = \sum_{j=1}^k \langle v,q_j\rangle q_j.$$

Then for each $i$,

$$\langle v-p,q_i\rangle = \langle v,q_i\rangle - \left\langle \sum_{j=1}^k \langle v,q_j\rangle q_j,\; q_i \right\rangle.$$

Using orthonormality,

$$\left\langle \sum_{j=1}^k \langle v,q_j\rangle q_j,\; q_i \right\rangle = \langle v,q_i\rangle.$$

Hence

$$\langle v-p,q_i\rangle = 0.$$

Since the $q_i$ span $W$, the residual is orthogonal to all of $W$.
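
A small sketch of the sum $\sum_j \langle v,q_j\rangle q_j$, using an orthonormal basis obtained from a QR factorization (the input vectors are illustrative):

```python
import numpy as np

# Orthonormal basis for W = Col(A), via QR.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
Q, _ = np.linalg.qr(A)   # columns of Q are orthonormal

v = np.array([1.0, 2.0, 3.0])
p = sum(np.dot(v, Q[:, j]) * Q[:, j] for j in range(Q.shape[1]))

r = v - p
print(np.round(Q.T @ r, 12))   # ~[0. 0.]: residual orthogonal to each q_j
```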

51.5 Projection Matrix for an Orthonormal Basis

Let $Q$ be the matrix whose columns are the orthonormal vectors

$$q_1, \ldots, q_k.$$

Then

$$Q^TQ = I_k.$$

The projection of $v$ onto $\operatorname{Col}(Q)$ is

$$p = QQ^T v.$$

Thus the projection matrix is

$$P = QQ^T.$$

This formula is important because it expresses projection as matrix multiplication.

The matrix $P$ satisfies

$$P^2 = P.$$

Indeed,

$$P^2 = (QQ^T)(QQ^T) = Q(Q^TQ)Q^T = QIQ^T = QQ^T = P.$$

It also satisfies

$$P^T = P.$$

Therefore $P$ is symmetric and idempotent. A real matrix with these two properties is an orthogonal projection matrix.
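
A sketch verifying both properties numerically for a $P = QQ^T$ built from an arbitrary full-rank matrix:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
Q, _ = np.linalg.qr(A)
P = Q @ Q.T                     # projection onto Col(Q)

print(np.allclose(P @ P, P))    # True: idempotent
print(np.allclose(P.T, P))      # True: symmetric
```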

51.6 Idempotence

A projection is idempotent. This means that applying it twice gives the same result as applying it once:

$$P^2 = P.$$

The reason is geometric. Once a vector has been projected onto a subspace, projecting it onto the same subspace again changes nothing.

If

$$p = Pv$$

and $p \in W$, then

$$Pp = p.$$

Therefore

$$P(Pv) = Pv.$$

In matrix form,

$$P^2 v = Pv$$

for every vector $v$, so

$$P^2 = P.$$

Idempotence is the algebraic signature of projection. General projections are idempotent linear maps, while orthogonal projections also respect the inner product geometry.

51.7 Symmetry

A real matrix $P$ is an orthogonal projection matrix precisely when it is both idempotent and symmetric:

$$P^2 = P, \qquad P^T = P.$$

The idempotence condition says that $P$ is a projection. The symmetry condition says that the projection is orthogonal rather than oblique.

For complex matrices, symmetry is replaced by self-adjointness:

$$P^2 = P, \qquad P^* = P.$$

Here $P^*$ denotes the conjugate transpose.

Orthogonal projection matrices preserve the perpendicular relationship between the range and the residual. If $p = Pv$, then

$$v - p \in \operatorname{Range}(P)^\perp.$$

This is the geometric content of the symmetry condition.

51.8 Projection onto a Column Space

Let $A$ be an $m \times n$ real matrix with linearly independent columns. We want the projection of $b \in \mathbb{R}^m$ onto

$$\operatorname{Col}(A).$$

The projected vector has the form

$$p = A\hat{x}$$

for some $\hat{x} \in \mathbb{R}^n$.

The residual is

$$r = b - A\hat{x}.$$

For $p$ to be the orthogonal projection, the residual must be orthogonal to every column of $A$. This condition is

$$A^T(b - A\hat{x}) = 0.$$

Rearranging gives the normal equations:

$$A^TA\hat{x} = A^Tb.$$

Since the columns of $A$ are linearly independent, $A^TA$ is invertible. Thus

$$\hat{x} = (A^TA)^{-1}A^Tb.$$

Therefore

$$p = A(A^TA)^{-1}A^Tb.$$

The projection matrix onto $\operatorname{Col}(A)$ is

$$P = A(A^TA)^{-1}A^T.$$
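
A sketch of this construction in NumPy. In practice one solves the normal equations (or uses QR or `np.linalg.lstsq`) rather than forming the inverse explicitly; the explicit inverse below simply mirrors the formula:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

P = A @ np.linalg.inv(A.T @ A) @ A.T   # P = A (A^T A)^{-1} A^T
p = P @ b

r = b - p
print(np.round(A.T @ r, 12))           # ~[0. 0.]: residual orthogonal to Col(A)
```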

51.9 Why $A^TA$ Is Invertible

Assume the columns of $A$ are linearly independent. Then $A^TA$ is invertible.

To see this, suppose

$$A^TAx = 0.$$

Multiply on the left by $x^T$:

$$x^TA^TAx = 0.$$

But

$$x^TA^TAx = (Ax)^T(Ax) = \|Ax\|^2.$$

Hence

$$\|Ax\|^2 = 0.$$

Therefore

$$Ax = 0.$$

Since the columns of $A$ are linearly independent, the null space of $A$ is trivial. Thus

$$x = 0.$$

So the null space of $A^TA$ is trivial, and $A^TA$ is invertible.
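
A quick numerical illustration (the matrix is arbitrary): when the columns of $A$ are independent, the Gram matrix $A^TA$ is positive definite, hence invertible.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])            # linearly independent columns
G = A.T @ A                           # Gram matrix A^T A

print(np.linalg.matrix_rank(A))       # 2: full column rank
print(np.linalg.det(G) > 0)           # True: A^T A is invertible
```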

51.10 Projection Matrix onto a Column Space

For a full-column-rank matrix $A$, the projection matrix

$$P = A(A^TA)^{-1}A^T$$

has two key properties.

First,

$$P^2 = P.$$

Indeed,

$$P^2 = A(A^TA)^{-1}A^T A(A^TA)^{-1}A^T.$$

Since

$$(A^TA)^{-1}A^TA(A^TA)^{-1} = (A^TA)^{-1},$$

we get

$$P^2 = A(A^TA)^{-1}A^T = P.$$

Second,

$$P^T = P.$$

This follows because $A^TA$ is symmetric, so its inverse is symmetric:

$$P^T = \left(A(A^TA)^{-1}A^T\right)^T = A(A^TA)^{-1}A^T = P.$$

Thus $P$ is an orthogonal projection matrix.

51.11 Closest Vector Property

Orthogonal projection gives the closest vector in a subspace.

Let $W$ be a finite-dimensional subspace of an inner product space $V$. Let

$$p = \operatorname{proj}_W(v), \qquad r = v - p.$$

Then

$$p \in W, \qquad r \in W^\perp.$$

For any other vector $w \in W$,

$$v - w = (v-p) + (p-w).$$

Here

$$v - p \in W^\perp,$$

and

$$p - w \in W.$$

Therefore the two vectors $v-p$ and $p-w$ are orthogonal. By the Pythagorean theorem,

$$\|v-w\|^2 = \|v-p\|^2 + \|p-w\|^2.$$

Since

$$\|p-w\|^2 \ge 0,$$

we have

$$\|v-w\|^2 \ge \|v-p\|^2.$$

Thus

$$\|v-w\| \ge \|v-p\|.$$

So $p$ is the closest vector in $W$ to $v$. Equality occurs only when $w = p$. This is the best approximation property of orthogonal projection.
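
A randomized sanity check of the best approximation property, with $W = \operatorname{Col}(A)$ for an arbitrary full-rank $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
v = np.array([1.0, 2.0, 3.0])

P = A @ np.linalg.inv(A.T @ A) @ A.T
p = P @ v

# ||v - w|| >= ||v - p|| for random vectors w in W = Col(A)
for _ in range(5):
    w = A @ rng.normal(size=2)
    assert np.linalg.norm(v - w) >= np.linalg.norm(v - p) - 1e-12
print("p is the closest point in W among the sampled w")
```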

51.12 Distance to a Subspace

The distance from $v$ to a subspace $W$ is

$$\operatorname{dist}(v,W) = \inf_{w\in W}\|v-w\|.$$

When $W$ is finite-dimensional, this infimum is attained by the orthogonal projection:

$$\operatorname{dist}(v,W) = \|v - \operatorname{proj}_W(v)\|.$$

If $p = \operatorname{proj}_W(v)$, then

$$\operatorname{dist}(v,W) = \|v - p\|.$$

The residual is therefore the shortest error vector. It measures exactly how far $v$ lies from the subspace.

51.13 Least Squares

Orthogonal projection is the geometric core of least squares.

Consider an inconsistent system

$$Ax = b.$$

If $b \notin \operatorname{Col}(A)$, there is no exact solution. Instead, we seek $\hat{x}$ such that

$$A\hat{x}$$

is as close as possible to $b$. This means minimizing

$$\|b - Ax\|_2.$$

The closest vector in $\operatorname{Col}(A)$ is the orthogonal projection of $b$ onto $\operatorname{Col}(A)$. Therefore

$$A\hat{x} = \operatorname{proj}_{\operatorname{Col}(A)}(b).$$

The residual

$$r = b - A\hat{x}$$

must be orthogonal to $\operatorname{Col}(A)$. Hence

$$A^Tr = 0.$$

Substituting $r = b - A\hat{x}$ gives

$$A^T(b - A\hat{x}) = 0,$$

or

$$A^TA\hat{x} = A^Tb.$$

These are the normal equations.
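
A sketch comparing the normal-equations solution with NumPy's least-squares solver (the data is made up):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])                   # b is not in Col(A)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)    # solve A^T A x = A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))           # True: same minimizer
print(np.round(A.T @ (b - A @ x_normal), 12))   # ~[0. 0.]: residual orthogonal to Col(A)
```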

51.14 Example: Projection onto a Line in $\mathbb{R}^2$

Let

$$u = \begin{bmatrix} 1\\ 2 \end{bmatrix}, \qquad v = \begin{bmatrix} 3\\ 1 \end{bmatrix}.$$

The projection of $v$ onto $\operatorname{span}\{u\}$ is

$$p = \frac{v^Tu}{u^Tu}\,u.$$

Compute

$$v^Tu = 3\cdot 1 + 1\cdot 2 = 5,$$

and

$$u^Tu = 1^2 + 2^2 = 5.$$

Thus

$$p = \frac{5}{5} \begin{bmatrix} 1\\ 2 \end{bmatrix} = \begin{bmatrix} 1\\ 2 \end{bmatrix}.$$

The residual is

$$r = v - p = \begin{bmatrix} 3\\ 1 \end{bmatrix} - \begin{bmatrix} 1\\ 2 \end{bmatrix} = \begin{bmatrix} 2\\ -1 \end{bmatrix}.$$

Check orthogonality:

$$r^Tu = 2\cdot 1 + (-1)\cdot 2 = 0.$$

Thus the decomposition is

$$\begin{bmatrix} 3\\ 1 \end{bmatrix} = \begin{bmatrix} 1\\ 2 \end{bmatrix} + \begin{bmatrix} 2\\ -1 \end{bmatrix},$$

with the first vector on the line and the second vector perpendicular to the line.
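
The same arithmetic, checked numerically with the line-projection formula from 51.1:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 1.0])

p = (np.dot(v, u) / np.dot(u, u)) * u
r = v - p
print(p)             # [1. 2.]
print(r)             # [ 2. -1.]
print(np.dot(r, u))  # 0.0
```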

51.15 Example: Projection onto a Plane

Let $W \subseteq \mathbb{R}^3$ be the $xy$-plane:

$$W = \left\{ \begin{bmatrix} x\\ y\\ 0 \end{bmatrix} : x,y \in \mathbb{R} \right\}.$$

For

$$v = \begin{bmatrix} a\\ b\\ c \end{bmatrix},$$

the projection onto $W$ is

$$p = \begin{bmatrix} a\\ b\\ 0 \end{bmatrix}.$$

The residual is

$$r = \begin{bmatrix} 0\\ 0\\ c \end{bmatrix}.$$

The projection matrix is

$$P = \begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&0&0 \end{bmatrix}.$$

Then

$$Pv = \begin{bmatrix} a\\ b\\ 0 \end{bmatrix}.$$

This matrix satisfies

$$P^2 = P, \qquad P^T = P.$$

So it is an orthogonal projection matrix.

51.16 Example: Projection Using a Matrix

Let

$$A = \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix}.$$

The column space of $A$ is the line in $\mathbb{R}^3$ spanned by

$$u = \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix}.$$

The projection matrix is

$$P = A(A^TA)^{-1}A^T.$$

Compute

$$A^TA = \begin{bmatrix} 1&1&0 \end{bmatrix} \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix} = 2.$$

Thus

$$P = \frac12 \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix} \begin{bmatrix} 1&1&0 \end{bmatrix} = \frac12 \begin{bmatrix} 1&1&0\\ 1&1&0\\ 0&0&0 \end{bmatrix}.$$

For

$$b = \begin{bmatrix} 2\\ 4\\ 5 \end{bmatrix},$$

the projection is

$$p = Pb = \frac12 \begin{bmatrix} 1&1&0\\ 1&1&0\\ 0&0&0 \end{bmatrix} \begin{bmatrix} 2\\ 4\\ 5 \end{bmatrix} = \begin{bmatrix} 3\\ 3\\ 0 \end{bmatrix}.$$

The residual is

$$r = b - p = \begin{bmatrix} -1\\ 1\\ 5 \end{bmatrix}.$$

Check orthogonality to the column of $A$:

$$A^Tr = \begin{bmatrix} 1&1&0 \end{bmatrix} \begin{bmatrix} -1\\ 1\\ 5 \end{bmatrix} = 0.$$

Thus $p$ is the orthogonal projection of $b$ onto $\operatorname{Col}(A)$.
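
A numerical check of this example:

```python
import numpy as np

A = np.array([[1.0], [1.0], [0.0]])
b = np.array([2.0, 4.0, 5.0])

P = A @ np.linalg.inv(A.T @ A) @ A.T   # = (1/2) * [[1,1,0],[1,1,0],[0,0,0]]
p = P @ b
print(p)              # [3. 3. 0.]
print(A.T @ (b - p))  # [0.]: residual orthogonal to Col(A)
```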

51.17 Orthogonal Projection and Coordinates

If $W$ has an orthonormal basis $q_1, \ldots, q_k$, then the projection coefficients are

$$c_j = \langle v,q_j\rangle.$$

Thus

$$\operatorname{proj}_W(v) = c_1q_1 + \cdots + c_kq_k.$$

These coefficients are the coordinates of the projected vector in the orthonormal basis of $W$.

The projection discards all components of $v$ in $W^\perp$. If $V = W \oplus W^\perp$, and

$$v = w + z, \qquad w \in W, \qquad z \in W^\perp,$$

then

$$\operatorname{proj}_W(v) = w.$$

Projection is therefore a coordinate-selection operation relative to an orthogonal decomposition.

51.18 Orthogonal Projection and Energy

Let $p = \operatorname{proj}_W(v)$ and $r = v - p$. Since

$$p \perp r,$$

the Pythagorean theorem gives

$$\|v\|^2 = \|p\|^2 + \|r\|^2.$$

Thus projection splits the squared norm into two parts:

| Term | Meaning |
| --- | --- |
| $\lVert p\rVert^2$ | Energy captured by the subspace |
| $\lVert r\rVert^2$ | Energy left outside the subspace |

If $W$ has orthonormal basis $q_1, \ldots, q_k$, then

$$\|p\|^2 = \sum_{j=1}^k |\langle v,q_j\rangle|^2.$$

Therefore

$$\|r\|^2 = \|v\|^2 - \sum_{j=1}^k |\langle v,q_j\rangle|^2.$$

This form appears in approximation theory, signal processing, Fourier analysis, statistics, and numerical linear algebra.
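
A numerical check of the energy split (the data is arbitrary):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
Q, _ = np.linalg.qr(A)
v = np.array([1.0, 2.0, 3.0])

p = Q @ (Q.T @ v)        # projection onto W = Col(Q)
r = v - p

print(np.isclose(v @ v, p @ p + r @ r))           # True: ||v||^2 = ||p||^2 + ||r||^2
print(np.isclose(p @ p, np.sum((Q.T @ v) ** 2)))  # True: energy captured by W
```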

51.19 Oblique Projections

A projection need not be orthogonal.

A linear map $P : V \to V$ is a projection if

$$P^2 = P.$$

This only means that applying $P$ twice is the same as applying it once. It does not require the residual to be orthogonal to the range.

An oblique projection is a projection whose range and null space are complementary but not orthogonal.

For example,

$$P = \begin{bmatrix} 1&1\\ 0&0 \end{bmatrix}$$

satisfies

$$P^2 = P,$$

so it is a projection. But

$$P^T \ne P,$$

so it is not an orthogonal projection.

Orthogonal projections are usually preferred when distance minimization matters. Oblique projections appear in other settings where the decomposition directions are prescribed by constraints rather than perpendicularity.
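
A sketch contrasting this oblique projection with the orthogonality of residuals:

```python
import numpy as np

P = np.array([[1.0, 1.0],
              [0.0, 0.0]])

print(np.allclose(P @ P, P))   # True: P is a projection
print(np.allclose(P.T, P))     # False: P is not symmetric

v = np.array([1.0, 1.0])
p = P @ v                      # [2. 0.]: lands on the x-axis (the range)
r = v - p                      # [-1. 1.]
print(np.dot(r, np.array([1.0, 0.0])))  # -1.0: residual NOT orthogonal to the range
```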

51.20 Projection Theorem

In finite-dimensional inner product spaces, every subspace $W$ has an orthogonal projection. For each $v \in V$, there is a unique vector $p \in W$ such that

$$v - p \in W^\perp.$$

This vector $p$ is the unique closest vector in $W$ to $v$.

In Hilbert spaces, the analogous result requires $W$ to be closed. If $M$ is a closed subspace of a Hilbert space $H$, then every $x \in H$ has a unique best approximation $\hat{x} \in M$, and the error $x - \hat{x}$ lies in $M^\perp$.

This result is called the projection theorem. It is the abstract form of the closest vector property.

51.21 Summary

Orthogonal projection decomposes a vector into a component inside a subspace and a residual perpendicular to that subspace:

$$v = \operatorname{proj}_W(v) + r, \qquad r \in W^\perp.$$

For a line spanned by a nonzero vector $u$,

$$\operatorname{proj}_u(v) = \frac{\langle v,u\rangle}{\langle u,u\rangle}\,u.$$

For a subspace with orthonormal basis $q_1, \ldots, q_k$,

$$\operatorname{proj}_W(v) = \sum_{j=1}^k \langle v,q_j\rangle q_j.$$

For a full-column-rank matrix $A$, the projection matrix onto $\operatorname{Col}(A)$ is

$$P = A(A^TA)^{-1}A^T.$$

Orthogonal projection gives the closest vector in a subspace, produces the residual used in least squares, and gives the geometric meaning of many matrix formulas.