# Chapter 51. Orthogonal Projections

An orthogonal projection is the operation of replacing a vector by its closest vector in a chosen subspace. It is the precise linear algebra version of dropping a perpendicular from a point to a line, plane, or higher-dimensional subspace.

If \(W\) is a finite-dimensional subspace of an inner product space \(V\), then every vector \(v\) can be decomposed into two parts:

$$
v = w + r,
$$

where

$$
w \in W,
\qquad
r \in W^\perp.
$$

The vector \(w\) is the orthogonal projection of \(v\) onto \(W\). The vector \(r\) is the residual. In finite-dimensional inner product spaces, this decomposition exists and is unique for every subspace \(W\). The projected vector is the closest vector in \(W\) to the original vector.

## 51.1 Projection onto a Line

Let \(u\) be a nonzero vector in an inner product space \(V\). The line generated by \(u\) is

$$
L = \operatorname{span}\{u\}.
$$

The projection of \(v\) onto \(L\) is the vector in the direction of \(u\) closest to \(v\). It has the form

$$
p = cu
$$

for some scalar \(c\).

The residual is

$$
r = v - cu.
$$

For \(cu\) to be the orthogonal projection, the residual must be orthogonal to the line. Since the line is spanned by \(u\), it is enough to require

$$
\langle v-cu,u\rangle = 0.
$$

Using linearity,

$$
\langle v,u\rangle - c\langle u,u\rangle = 0.
$$

Therefore

$$
c =
\frac{\langle v,u\rangle}{\langle u,u\rangle}.
$$

So the projection is

$$
\operatorname{proj}_u(v) =
\frac{\langle v,u\rangle}{\langle u,u\rangle}u.
$$

This formula is valid whenever \(u\ne 0\).
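As a sanity check, the scalar formula translates directly into code. A minimal sketch in plain Python (the helper names `dot` and `project_onto_line` are this sketch's own, not from the text):

```python
# Orthogonal projection of v onto the line span{u}, for nonzero u,
# via proj_u(v) = (<v,u> / <u,u>) u.

def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def project_onto_line(v, u):
    c = dot(v, u) / dot(u, u)              # c = <v,u> / <u,u>
    return [c * ui for ui in u]

v = [3.0, 4.0]
u = [2.0, 1.0]
p = project_onto_line(v, u)                # [4.0, 2.0]
r = [vi - pi for vi, pi in zip(v, p)]      # residual [-1.0, 2.0]
print(p, dot(r, u))                        # residual is orthogonal to u
```

Here \(\langle v,u\rangle = 10\) and \(\langle u,u\rangle = 5\), so \(c=2\) and the residual dots to zero against \(u\).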

## 51.2 Projection onto a Unit Vector

If \(q\) is a unit vector, then

$$
\langle q,q\rangle = 1.
$$

The projection formula simplifies to

$$
\operatorname{proj}_q(v) =
\langle v,q\rangle q.
$$

This is the simplest projection formula. The scalar

$$
\langle v,q\rangle
$$

is the coordinate of \(v\) in the direction \(q\). The vector

$$
\langle v,q\rangle q
$$

is the component of \(v\) along \(q\).

For example, let

$$
v =
\begin{bmatrix}
3\\
4
\end{bmatrix},
\qquad
q =
\begin{bmatrix}
1\\
0
\end{bmatrix}.
$$

Then

$$
\langle v,q\rangle = 3,
$$

so

$$
\operatorname{proj}_q(v) =
3
\begin{bmatrix}
1\\
0
\end{bmatrix} =
\begin{bmatrix}
3\\
0
\end{bmatrix}.
$$

The projection keeps the horizontal component and removes the vertical component.
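The arithmetic above can be replayed in a few lines of plain Python (variable names are illustrative):

```python
# Projection onto the unit vector q = e1: proj_q(v) = <v,q> q.
v = [3.0, 4.0]
q = [1.0, 0.0]
c = sum(a * b for a, b in zip(v, q))   # <v,q> = 3
p = [c * qi for qi in q]               # [3.0, 0.0]
print(c, p)
```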

## 51.3 The Residual

The residual of \(v\) after projection onto a subspace \(W\) is

$$
r = v - \operatorname{proj}_W(v).
$$

The defining property of orthogonal projection is

$$
r \in W^\perp.
$$

Equivalently,

$$
\langle r,w\rangle = 0
$$

for every \(w\in W\).

Thus projection separates a vector into an explained part and an unexplained part:

$$
v = \operatorname{proj}_W(v) + r,
$$

where

$$
\operatorname{proj}_W(v)\in W,
\qquad
r\in W^\perp.
$$

This is the orthogonal decomposition of \(v\) with respect to \(W\).

## 51.4 Projection onto an Orthonormal Basis

Let \(W\) be a subspace with orthonormal basis

$$
q_1,q_2,\ldots,q_k.
$$

The projection of \(v\) onto \(W\) is

$$
\operatorname{proj}_W(v) =
\sum_{j=1}^k \langle v,q_j\rangle q_j.
$$

This formula follows from the coordinate formula for orthonormal bases. The projected vector lies in \(W\), and the residual is orthogonal to every \(q_j\).

Indeed, let

$$
p =
\sum_{j=1}^k \langle v,q_j\rangle q_j.
$$

Then for each \(i\),

$$
\langle v-p,q_i\rangle =
\langle v,q_i\rangle -
\left\langle
\sum_{j=1}^k \langle v,q_j\rangle q_j,
q_i
\right\rangle.
$$

Using orthonormality,

$$
\left\langle
\sum_{j=1}^k \langle v,q_j\rangle q_j,
q_i
\right\rangle =
\langle v,q_i\rangle.
$$

Hence

$$
\langle v-p,q_i\rangle = 0.
$$

Since the \(q_i\) span \(W\), the residual is orthogonal to all of \(W\).
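The sum formula carries over line by line into code. A sketch assuming the basis vectors are given as lists and are already orthonormal (helper names are mine):

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project_onto_orthonormal_basis(v, qs):
    """proj_W(v) = sum_j <v, q_j> q_j for an orthonormal basis qs of W."""
    p = [0.0] * len(v)
    for q in qs:
        c = dot(v, q)                              # coordinate <v, q_j>
        p = [pi + c * qi for pi, qi in zip(p, q)]
    return p

# W = xy-plane in R^3, with orthonormal basis e1, e2.
qs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
v = [2.0, 3.0, 5.0]
p = project_onto_orthonormal_basis(v, qs)          # [2.0, 3.0, 0.0]
r = [vi - pi for vi, pi in zip(v, p)]
print(p, [dot(r, q) for q in qs])                  # residual orthogonal to each q_j
```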

## 51.5 Projection Matrix for an Orthonormal Basis

Let \(Q\) be the matrix whose columns are the orthonormal vectors

$$
q_1,\ldots,q_k.
$$

Then

$$
Q^TQ=I_k.
$$

The projection of \(v\) onto \(\operatorname{Col}(Q)\) is

$$
p = QQ^T v.
$$

Thus the projection matrix is

$$
P = QQ^T.
$$

This formula is important because it expresses projection as matrix multiplication.

The matrix \(P\) satisfies

$$
P^2=P.
$$

Indeed,

$$
P^2 =
(QQ^T)(QQ^T) =
Q(Q^TQ)Q^T =
QIQ^T =
QQ^T =
P.
$$

It also satisfies

$$
P^T=P.
$$

Therefore \(P\) is symmetric and idempotent. A real matrix with these two properties is an orthogonal projection matrix.
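Both identities can be confirmed numerically. A sketch that builds \(P=QQ^T\) for two orthonormal columns and checks \(P^2=P\) and \(P^T=P\) up to floating-point tolerance (the matrix helpers are hand-rolled for this sketch):

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

# Q has orthonormal columns q1 = (0.6, 0.8, 0) and q2 = (0, 0, 1).
Q = [[0.6, 0.0],
     [0.8, 0.0],
     [0.0, 1.0]]
P = matmul(Q, transpose(Q))        # projection matrix QQ^T
P2 = matmul(P, P)

sym_err  = max(abs(P[i][j] - P[j][i])  for i in range(3) for j in range(3))
idem_err = max(abs(P2[i][j] - P[i][j]) for i in range(3) for j in range(3))
print(sym_err, idem_err)           # both essentially zero
```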

## 51.6 Idempotence

A projection is idempotent. This means that applying it twice gives the same result as applying it once:

$$
P^2=P.
$$

The reason is geometric. Once a vector has been projected onto a subspace, projecting it onto the same subspace again changes nothing.

If

$$
p = P v
$$

and \(p\in W\), then

$$
Pp=p.
$$

Therefore

$$
P(Pv)=Pv.
$$

In matrix form,

$$
P^2v=Pv
$$

for every vector \(v\), so

$$
P^2=P.
$$

Idempotence is the algebraic signature of projection. General projections are idempotent linear maps, while orthogonal projections also respect the inner product geometry.

## 51.7 Symmetry

A real projection matrix \(P\) is an orthogonal projection matrix precisely when it is both idempotent and symmetric:

$$
P^2=P,
\qquad
P^T=P.
$$

The idempotent condition says that \(P\) is a projection. The symmetry condition says that the projection is orthogonal rather than oblique.

For complex matrices, symmetry is replaced by self-adjointness:

$$
P^2=P,
\qquad
P^*=P.
$$

Here \(P^*\) denotes the conjugate transpose.

Orthogonal projection matrices preserve the perpendicular relationship between the range and the residual. If \(p=Pv\), then

$$
v-p \in \operatorname{Range}(P)^\perp.
$$

This is the geometric content of the symmetry condition.

## 51.8 Projection onto a Column Space

Let \(A\) be an \(m\times n\) real matrix with linearly independent columns. We want the projection of \(b\in\mathbb{R}^m\) onto

$$
\operatorname{Col}(A).
$$

The projected vector has the form

$$
p=A\hat{x}
$$

for some \(\hat{x}\in\mathbb{R}^n\).

The residual is

$$
r=b-A\hat{x}.
$$

For \(p\) to be the orthogonal projection, the residual must be orthogonal to every column of \(A\). This condition is

$$
A^T(b-A\hat{x})=0.
$$

Rearranging gives the normal equations:

$$
A^TA\hat{x}=A^Tb.
$$

Since the columns of \(A\) are linearly independent, \(A^TA\) is invertible. Thus

$$
\hat{x}=(A^TA)^{-1}A^Tb.
$$

Therefore

$$
p=A(A^TA)^{-1}A^Tb.
$$

The projection matrix onto \(\operatorname{Col}(A)\) is

$$
P=A(A^TA)^{-1}A^T.
$$
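For a concrete full-column-rank \(A\), the formula can be carried out end to end. A sketch with a hand-coded \(2\times2\) inverse (all helper names are mine, not a library API):

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A  = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]         # full column rank, 3x2
At = transpose(A)
P  = matmul(matmul(A, inv2(matmul(At, A))), At)   # P = A (A^T A)^{-1} A^T

b = [[1.0], [2.0], [7.0]]                         # b as a 3x1 column
p = matmul(P, b)                                  # projection onto Col(A)
r = [[bi[0] - pi[0]] for bi, pi in zip(b, p)]
print([row[0] for row in p], matmul(At, r))       # A^T r should vanish
```

Here \(A^TA=\begin{bmatrix}2&1\\1&2\end{bmatrix}\), and the projection works out to \(p=(-1,4,5)\) with residual \(r=(2,-2,2)\) orthogonal to both columns.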

## 51.9 Why \(A^TA\) Is Invertible

Assume the columns of \(A\) are linearly independent. Then \(A^TA\) is invertible.

To see this, suppose

$$
A^TAx=0.
$$

Multiply on the left by \(x^T\):

$$
x^TA^TAx=0.
$$

But

$$
x^TA^TAx = (Ax)^T(Ax)=\|Ax\|^2.
$$

Hence

$$
\|Ax\|^2=0.
$$

Therefore

$$
Ax=0.
$$

Since the columns of \(A\) are linearly independent, the null space of \(A\) is trivial. Thus

$$
x=0.
$$

So the null space of \(A^TA\) is trivial, and \(A^TA\) is invertible.

## 51.10 Projection Matrix onto a Column Space

For a full-column-rank matrix \(A\), the projection matrix

$$
P=A(A^TA)^{-1}A^T
$$

has two key properties.

First,

$$
P^2=P.
$$

Indeed,

$$
P^2 =
A(A^TA)^{-1}A^T A(A^TA)^{-1}A^T.
$$

Since

$$
(A^TA)^{-1}A^TA(A^TA)^{-1} =
(A^TA)^{-1},
$$

we get

$$
P^2 =
A(A^TA)^{-1}A^T =
P.
$$

Second,

$$
P^T=P.
$$

This follows because \(A^TA\) is symmetric, so its inverse is symmetric:

$$
P^T =
\left(A(A^TA)^{-1}A^T\right)^T =
A(A^TA)^{-1}A^T =
P.
$$

Thus \(P\) is an orthogonal projection matrix.

## 51.11 Closest Vector Property

Orthogonal projection gives the closest vector in a subspace.

Let \(W\) be a finite-dimensional subspace of an inner product space \(V\). Let

$$
p=\operatorname{proj}_W(v),
\qquad
r=v-p.
$$

Then

$$
p\in W,
\qquad
r\in W^\perp.
$$

For any other vector \(w\in W\),

$$
v-w = (v-p)+(p-w).
$$

Here

$$
v-p \in W^\perp,
$$

and

$$
p-w\in W.
$$

Therefore the two vectors \(v-p\) and \(p-w\) are orthogonal. By the Pythagorean theorem,

$$
\|v-w\|^2 =
\|v-p\|^2+\|p-w\|^2.
$$

Since

$$
\|p-w\|^2\ge 0,
$$

we have

$$
\|v-w\|^2\ge \|v-p\|^2.
$$

Thus

$$
\|v-w\|\ge \|v-p\|.
$$

So \(p\) is the closest vector in \(W\) to \(v\). Equality occurs only when \(w=p\). This is the best approximation property of orthogonal projection.
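The inequality \(\|v-w\|\ge\|v-p\|\) can be spot-checked by scanning many candidates \(w\in W\). A sketch with \(W\) the \(xy\)-plane in \(\mathbb{R}^3\):

```python
import math

v = [1.0, 2.0, 3.0]
p = [1.0, 2.0, 0.0]        # projection of v onto the xy-plane W
best = math.dist(v, p)     # = 3.0

# Every w in W (scanned on a grid) is at least as far from v as p is.
worst_violation = 0.0
for x in range(-5, 6):
    for y in range(-5, 6):
        w = [float(x), float(y), 0.0]
        worst_violation = max(worst_violation, best - math.dist(v, w))
print(best, worst_violation)   # no grid point beats the projection
```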

## 51.12 Distance to a Subspace

The distance from \(v\) to a subspace \(W\) is

$$
\operatorname{dist}(v,W) =
\inf_{w\in W}\|v-w\|.
$$

When \(W\) is finite-dimensional, this infimum is attained by the orthogonal projection:

$$
\operatorname{dist}(v,W) =
\|v-\operatorname{proj}_W(v)\|.
$$

If \(p=\operatorname{proj}_W(v)\), then

$$
\operatorname{dist}(v,W)=\|v-p\|.
$$

The residual is therefore the shortest error vector. It measures exactly how far \(v\) lies from the subspace.

## 51.13 Least Squares

Orthogonal projection is the geometric core of least squares.

Consider an inconsistent system

$$
Ax=b.
$$

If \(b\notin \operatorname{Col}(A)\), there is no exact solution. Instead, we seek \(\hat{x}\) such that

$$
A\hat{x}
$$

is as close as possible to \(b\). This means minimizing

$$
\|b-Ax\|_2.
$$

The closest vector in \(\operatorname{Col}(A)\) is the orthogonal projection of \(b\) onto \(\operatorname{Col}(A)\). Therefore

$$
A\hat{x} =
\operatorname{proj}_{\operatorname{Col}(A)}(b).
$$

The residual

$$
r=b-A\hat{x}
$$

must be orthogonal to \(\operatorname{Col}(A)\). Hence

$$
A^Tr=0.
$$

Substituting \(r=b-A\hat{x}\) gives

$$
A^T(b-A\hat{x})=0,
$$

or

$$
A^TA\hat{x}=A^Tb.
$$

These are the normal equations.
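A tiny least-squares fit makes the normal equations concrete. A sketch fitting a line \(y \approx c_0 + c_1 t\) to three points, solving \(A^TA\hat{x}=A^Tb\) with a hand-coded \(2\times2\) solve (the data are illustrative):

```python
# Fit y ~ c0 + c1 * t to the points (0,1), (1,2), (2,4).
ts = [0.0, 1.0, 2.0]
ys = [1.0, 2.0, 4.0]
A  = [[1.0, t] for t in ts]        # design matrix, columns [1, t]

# Normal equations: (A^T A) x = A^T y, here a 2x2 system.
ata = [[sum(A[i][r] * A[i][c] for i in range(3)) for c in range(2)]
       for r in range(2)]
aty = [sum(A[i][r] * ys[i] for i in range(3)) for r in range(2)]
det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
c0 = (ata[1][1] * aty[0] - ata[0][1] * aty[1]) / det   # Cramer's rule
c1 = (ata[0][0] * aty[1] - ata[1][0] * aty[0]) / det

# The residual must be orthogonal to both columns of A.
res = [ys[i] - (c0 + c1 * ts[i]) for i in range(3)]
checks = [sum(A[i][r] * res[i] for i in range(3)) for r in range(2)]
print(c0, c1, checks)
```

For this data, \(A^TA=\begin{bmatrix}3&3\\3&5\end{bmatrix}\) and the fit is \(c_0=5/6\), \(c_1=3/2\), with \(A^Tr\approx 0\) as required.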

## 51.14 Example: Projection onto a Line in \(\mathbb{R}^2\)

Let

$$
u=
\begin{bmatrix}
1\\
2
\end{bmatrix},
\qquad
v=
\begin{bmatrix}
3\\
1
\end{bmatrix}.
$$

The projection of \(v\) onto \(\operatorname{span}\{u\}\) is

$$
p=
\frac{v^Tu}{u^Tu}u.
$$

Compute

$$
v^Tu = 3\cdot 1 + 1\cdot 2 = 5,
$$

and

$$
u^Tu = 1^2+2^2=5.
$$

Thus

$$
p=
\frac{5}{5}
\begin{bmatrix}
1\\
2
\end{bmatrix} =
\begin{bmatrix}
1\\
2
\end{bmatrix}.
$$

The residual is

$$
r=v-p =
\begin{bmatrix}
3\\
1
\end{bmatrix} -
\begin{bmatrix}
1\\
2
\end{bmatrix} =
\begin{bmatrix}
2\\
-1
\end{bmatrix}.
$$

Check orthogonality:

$$
r^Tu = 2\cdot 1 + (-1)\cdot 2 = 0.
$$

Thus the decomposition is

$$
\begin{bmatrix}
3\\
1
\end{bmatrix} =
\begin{bmatrix}
1\\
2
\end{bmatrix}
+
\begin{bmatrix}
2\\
-1
\end{bmatrix},
$$

with the first vector on the line and the second vector perpendicular to the line.

## 51.15 Example: Projection onto a Plane

Let \(W\subseteq \mathbb{R}^3\) be the \(xy\)-plane:

$$
W=
\left\{
\begin{bmatrix}
x\\
y\\
0
\end{bmatrix}
:x,y\in\mathbb{R}
\right\}.
$$

For

$$
v=
\begin{bmatrix}
a\\
b\\
c
\end{bmatrix},
$$

the projection onto \(W\) is

$$
p=
\begin{bmatrix}
a\\
b\\
0
\end{bmatrix}.
$$

The residual is

$$
r=
\begin{bmatrix}
0\\
0\\
c
\end{bmatrix}.
$$

The projection matrix is

$$
P=
\begin{bmatrix}
1&0&0\\
0&1&0\\
0&0&0
\end{bmatrix}.
$$

Then

$$
Pv=
\begin{bmatrix}
a\\
b\\
0
\end{bmatrix}.
$$

This matrix satisfies

$$
P^2=P,
\qquad
P^T=P.
$$

So it is an orthogonal projection matrix.

## 51.16 Example: Projection Using a Matrix

Let

$$
A=
\begin{bmatrix}
1\\
1\\
0
\end{bmatrix}.
$$

The column space of \(A\) is the line in \(\mathbb{R}^3\) spanned by

$$
u=
\begin{bmatrix}
1\\
1\\
0
\end{bmatrix}.
$$

The projection matrix is

$$
P=A(A^TA)^{-1}A^T.
$$

Compute

$$
A^TA =
\begin{bmatrix}
1&1&0
\end{bmatrix}
\begin{bmatrix}
1\\
1\\
0
\end{bmatrix} =
2.
$$

Thus

$$
P=
\frac12
\begin{bmatrix}
1\\
1\\
0
\end{bmatrix}
\begin{bmatrix}
1&1&0
\end{bmatrix} =
\frac12
\begin{bmatrix}
1&1&0\\
1&1&0\\
0&0&0
\end{bmatrix}.
$$

For

$$
b=
\begin{bmatrix}
2\\
4\\
5
\end{bmatrix},
$$

the projection is

$$
p=Pb =
\frac12
\begin{bmatrix}
1&1&0\\
1&1&0\\
0&0&0
\end{bmatrix}
\begin{bmatrix}
2\\
4\\
5
\end{bmatrix} =
\begin{bmatrix}
3\\
3\\
0
\end{bmatrix}.
$$

The residual is

$$
r=b-p =
\begin{bmatrix}
-1\\
1\\
5
\end{bmatrix}.
$$

Check orthogonality to the column of \(A\):

$$
A^Tr =
\begin{bmatrix}
1&1&0
\end{bmatrix}
\begin{bmatrix}
-1\\
1\\
5
\end{bmatrix} =
0.
$$

Thus \(p\) is the orthogonal projection of \(b\) onto \(\operatorname{Col}(A)\).
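The whole worked example can be replayed in a few lines of plain Python (the entries of \(P\) are exact halves, so the arithmetic is exact):

```python
# P = (1/2) [[1,1,0],[1,1,0],[0,0,0]], projecting onto span{(1,1,0)}.
P = [[0.5, 0.5, 0.0],
     [0.5, 0.5, 0.0],
     [0.0, 0.0, 0.0]]
b = [2.0, 4.0, 5.0]
p = [sum(P[i][j] * b[j] for j in range(3)) for i in range(3)]   # [3, 3, 0]
r = [bi - pi for bi, pi in zip(b, p)]                           # [-1, 1, 5]
print(p, r, r[0] * 1 + r[1] * 1 + r[2] * 0)                     # A^T r = 0
```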

## 51.17 Orthogonal Projection and Coordinates

If \(W\) has an orthonormal basis \(q_1,\ldots,q_k\), then the projection coefficients are

$$
c_j=\langle v,q_j\rangle.
$$

Thus

$$
\operatorname{proj}_W(v) =
c_1q_1+\cdots+c_kq_k.
$$

These coefficients are the coordinates of the projected vector in the orthonormal basis of \(W\).

The projection discards all components of \(v\) in \(W^\perp\). If \(V=W\oplus W^\perp\), and

$$
v=w+z,
\qquad
w\in W,
\qquad
z\in W^\perp,
$$

then

$$
\operatorname{proj}_W(v)=w.
$$

Projection is therefore a coordinate-selection operation relative to an orthogonal decomposition.

## 51.18 Orthogonal Projection and Energy

Let \(p=\operatorname{proj}_W(v)\) and \(r=v-p\). Since

$$
p\perp r,
$$

the Pythagorean theorem gives

$$
\|v\|^2=\|p\|^2+\|r\|^2.
$$

Thus projection splits the squared norm into two parts:

| Term | Meaning |
|---|---|
| \(\|p\|^2\) | Energy captured by the subspace |
| \(\|r\|^2\) | Energy left outside the subspace |

If \(W\) has orthonormal basis \(q_1,\ldots,q_k\), then

$$
\|p\|^2 =
\sum_{j=1}^k |\langle v,q_j\rangle|^2.
$$

Therefore

$$
\|r\|^2 =
\|v\|^2 -
\sum_{j=1}^k |\langle v,q_j\rangle|^2.
$$

This form appears in approximation theory, signal processing, Fourier analysis, statistics, and numerical linear algebra.
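The energy split is easy to check numerically. A sketch with the orthonormal basis \(e_1,e_2\) of the \(xy\)-plane and a vector chosen so the arithmetic is exact:

```python
v  = [3.0, 4.0, 12.0]
qs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # orthonormal basis of W

total    = sum(x * x for x in v)          # ||v||^2 = 169
captured = sum(sum(a * b for a, b in zip(v, q)) ** 2 for q in qs)  # 25
residual = total - captured               # ||r||^2 = 144
print(total, captured, residual)
```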

## 51.19 Oblique Projections

A projection need not be orthogonal.

A linear map \(P:V\to V\) is a projection if

$$
P^2=P.
$$

This only means that applying \(P\) twice is the same as applying it once. It does not require the residual to be orthogonal to the range.

An oblique projection is a projection whose range and null space are complementary but not orthogonal.

For example,

$$
P=
\begin{bmatrix}
1&1\\
0&0
\end{bmatrix}
$$

satisfies

$$
P^2=P,
$$

so it is a projection. But

$$
P^T\ne P,
$$

so it is not an orthogonal projection.
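For this \(P\), the failure of orthogonality is visible directly: the residual \(v-Pv\) need not be perpendicular to the range. A quick check in plain Python:

```python
# Oblique projection: range(P) = span{e1}, null(P) = span{(1,-1)},
# which are complementary but not orthogonal.
P = [[1.0, 1.0],
     [0.0, 0.0]]
v = [0.0, 1.0]
p = [P[0][0] * v[0] + P[0][1] * v[1],
     P[1][0] * v[0] + P[1][1] * v[1]]        # Pv = [1, 0]
r = [v[0] - p[0], v[1] - p[1]]               # residual [-1, 1]
print(p, r, r[0])                            # <r, e1> = -1, not 0
```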

Orthogonal projections are usually preferred when distance minimization matters. Oblique projections appear in other settings where the decomposition directions are prescribed by constraints rather than perpendicularity.

## 51.20 Projection Theorem

In finite-dimensional inner product spaces, every subspace \(W\) has an orthogonal projection. For each \(v\in V\), there is a unique vector \(p\in W\) such that

$$
v-p\in W^\perp.
$$

This vector \(p\) is the unique closest vector in \(W\) to \(v\).

In Hilbert spaces, the analogous result requires \(W\) to be closed. If \(M\) is a closed subspace of a Hilbert space \(H\), then every \(x\in H\) has a unique best approximation \(\hat{x}\in M\), and the error \(x-\hat{x}\) lies in \(M^\perp\).

This result is called the projection theorem. It is the abstract form of the closest vector property.

## 51.21 Summary

Orthogonal projection decomposes a vector into a component inside a subspace and a residual perpendicular to that subspace:

$$
v=\operatorname{proj}_W(v)+r,
\qquad
r\in W^\perp.
$$

For a line spanned by a nonzero vector \(u\),

$$
\operatorname{proj}_u(v) =
\frac{\langle v,u\rangle}{\langle u,u\rangle}u.
$$

For a subspace with orthonormal basis \(q_1,\ldots,q_k\),

$$
\operatorname{proj}_W(v) =
\sum_{j=1}^k \langle v,q_j\rangle q_j.
$$

For a full-column-rank matrix \(A\), the projection matrix onto \(\operatorname{Col}(A)\) is

$$
P=A(A^TA)^{-1}A^T.
$$

Orthogonal projection gives the closest vector in a subspace, produces the residual used in least squares, and gives the geometric meaning of many matrix formulas.
