# Chapter 50. Gram-Schmidt Orthogonalization

The Gram-Schmidt process converts a linearly independent list of vectors into an orthogonal or orthonormal list spanning the same subspace. It is one of the standard constructions in inner product spaces and the theoretical basis of the QR factorization used in numerical linear algebra. The process repeatedly subtracts projections onto previously constructed directions, leaving a residual vector orthogonal to all earlier ones.

## 50.1 The Problem

Let

$$
v_1, v_2, \ldots, v_k
$$

be linearly independent vectors in an inner product space \(V\). We want to construct vectors

$$
q_1, q_2, \ldots, q_k
$$

such that:

| Requirement | Meaning |
|---|---|
| Same span | \(\operatorname{span}(q_1,\ldots,q_j)=\operatorname{span}(v_1,\ldots,v_j)\) for each \(j\) |
| Orthogonality | \(\langle q_i,q_j\rangle=0\) whenever \(i\ne j\) |
| Normalization | \(\|q_j\|=1\) for each \(j\) |

The output is an orthonormal basis for the subspace spanned by the original vectors.

The key idea is simple. Keep the part of each new vector that points in a new direction. Remove all parts that already lie in the directions constructed earlier.

## 50.2 Projection onto a Unit Vector

Suppose \(q\) is a unit vector. The projection of \(v\) onto the line spanned by \(q\) is

$$
\operatorname{proj}_q(v)=\langle v,q\rangle q.
$$

This formula is simple because \(\|q\|=1\). For a general nonzero vector \(u\), the projection is

$$
\operatorname{proj}_u(v) =
\frac{\langle v,u\rangle}{\langle u,u\rangle}u.
$$

Gram-Schmidt uses these projections to remove the parts of a vector that lie in old directions.
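
As a small illustration, here is a minimal NumPy sketch of both projection formulas. The names `proj_unit` and `proj` are ours, not standard library functions.

```python
import numpy as np

def proj_unit(v, q):
    """Projection of v onto the line spanned by a unit vector q."""
    return np.dot(v, q) * q

def proj(v, u):
    """Projection of v onto the line spanned by a nonzero vector u."""
    return (np.dot(v, u) / np.dot(u, u)) * u

v = np.array([1.0, 0.0])
u = np.array([1.0, 1.0])
q = u / np.linalg.norm(u)                        # unit vector along u
assert np.allclose(proj_unit(v, q), proj(v, u))  # same line, same projection
```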

## 50.3 First Step

Start with the first vector \(v_1\). Since the list is linearly independent, \(v_1\ne 0\).

Define

$$
u_1=v_1.
$$

Normalize it:

$$
q_1=\frac{u_1}{\|u_1\|}.
$$

Then \(q_1\) is a unit vector and

$$
\operatorname{span}(q_1)=\operatorname{span}(v_1).
$$

The first direction is unchanged except for scaling.

## 50.4 Second Step

The second vector \(v_2\) may have a component in the direction of \(q_1\). Remove that component:

$$
u_2 =
v_2-\langle v_2,q_1\rangle q_1.
$$

Then \(u_2\) is orthogonal to \(q_1\). Indeed,

$$
\langle u_2,q_1\rangle =
\langle v_2-\langle v_2,q_1\rangle q_1,q_1\rangle.
$$

Using linearity,

$$
\langle u_2,q_1\rangle =
\langle v_2,q_1\rangle -
\langle v_2,q_1\rangle\langle q_1,q_1\rangle.
$$

Since \(q_1\) is a unit vector,

$$
\langle q_1,q_1\rangle=1.
$$

Therefore

$$
\langle u_2,q_1\rangle=0.
$$

Normalize:

$$
q_2=\frac{u_2}{\|u_2\|}.
$$

Now \(q_1,q_2\) are orthonormal and span the same subspace as \(v_1,v_2\).

## 50.5 General Step

Assume that

$$
q_1,\ldots,q_{j-1}
$$

have already been constructed and are orthonormal.

To construct the next vector, subtract from \(v_j\) its projections onto all previous \(q_i\):

$$
u_j =
v_j -
\sum_{i=1}^{j-1}
\langle v_j,q_i\rangle q_i.
$$

Then normalize:

$$
q_j=\frac{u_j}{\|u_j\|}.
$$

The vector \(u_j\) is orthogonal to every earlier \(q_i\). For \(m<j\),

$$
\langle u_j,q_m\rangle =
\left\langle
v_j -
\sum_{i=1}^{j-1}
\langle v_j,q_i\rangle q_i,
q_m
\right\rangle.
$$

By linearity,

$$
\langle u_j,q_m\rangle =
\langle v_j,q_m\rangle -
\sum_{i=1}^{j-1}
\langle v_j,q_i\rangle
\langle q_i,q_m\rangle.
$$

Since the \(q_i\) are orthonormal,

$$
\langle q_i,q_m\rangle=\delta_{im}.
$$

Thus only one term remains:

$$
\langle u_j,q_m\rangle =
\langle v_j,q_m\rangle -
\langle v_j,q_m\rangle =
0.
$$

So \(u_j\) is orthogonal to all previous directions.

## 50.6 Why \(u_j\) Is Nonzero

The vector \(u_j\) must be nonzero. If

$$
u_j=0,
$$

then

$$
v_j =
\sum_{i=1}^{j-1}
\langle v_j,q_i\rangle q_i.
$$

This would imply

$$
v_j\in \operatorname{span}(q_1,\ldots,q_{j-1}).
$$

But

$$
\operatorname{span}(q_1,\ldots,q_{j-1}) =
\operatorname{span}(v_1,\ldots,v_{j-1}).
$$

Thus \(v_j\) would lie in the span of the earlier \(v_i\), contradicting linear independence.

Therefore \(u_j\ne 0\), so normalization is valid.

## 50.7 The Algorithm

Given linearly independent vectors

$$
v_1,\ldots,v_k,
$$

the Gram-Schmidt algorithm is:

$$
u_1=v_1,
\qquad
q_1=\frac{u_1}{\|u_1\|}.
$$

For \(j=2,\ldots,k\),

$$
u_j =
v_j -
\sum_{i=1}^{j-1}
\langle v_j,q_i\rangle q_i,
$$

and

$$
q_j=\frac{u_j}{\|u_j\|}.
$$

The result is an orthonormal list

$$
q_1,\ldots,q_k
$$

such that

$$
\operatorname{span}(q_1,\ldots,q_j) =
\operatorname{span}(v_1,\ldots,v_j)
$$

for every \(j\).
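
The algorithm translates directly into code. The following is a minimal NumPy sketch for the standard inner product on \(\mathbb{R}^m\); the function name `gram_schmidt` is ours, and the columns of the input are assumed to be linearly independent.

```python
import numpy as np

def gram_schmidt(V):
    """Classical Gram-Schmidt on the columns of V (assumed independent).

    Returns Q whose columns are orthonormal and satisfy
    span(Q[:, :j]) = span(V[:, :j]) for every j.
    """
    m, k = V.shape
    Q = np.zeros((m, k))
    for j in range(k):
        # Subtract the projections of v_j onto all previous q_i.
        # For j = 0 the slice Q[:, :0] is empty and nothing is subtracted.
        u = V[:, j] - Q[:, :j] @ (Q[:, :j].T @ V[:, j])
        Q[:, j] = u / np.linalg.norm(u)  # normalize the residual
    return Q
```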

## 50.8 Example in \(\mathbb{R}^2\)

Let

$$
v_1=
\begin{bmatrix}
1\\
1
\end{bmatrix},
\qquad
v_2=
\begin{bmatrix}
1\\
0
\end{bmatrix}.
$$

First,

$$
u_1=v_1=
\begin{bmatrix}
1\\
1
\end{bmatrix}.
$$

Its norm is

$$
\|u_1\| =
\sqrt{1^2+1^2} =
\sqrt{2}.
$$

Thus

$$
q_1=
\frac{1}{\sqrt{2}}
\begin{bmatrix}
1\\
1
\end{bmatrix}.
$$

Now remove from \(v_2\) its projection onto \(q_1\):

$$
\langle v_2,q_1\rangle =
\left\langle
\begin{bmatrix}
1\\
0
\end{bmatrix},
\frac{1}{\sqrt{2}}
\begin{bmatrix}
1\\
1
\end{bmatrix}
\right\rangle =
\frac{1}{\sqrt{2}}.
$$

Therefore

$$
u_2 =
v_2-\langle v_2,q_1\rangle q_1 =
\begin{bmatrix}
1\\
0
\end{bmatrix} -
\frac{1}{\sqrt{2}}
\cdot
\frac{1}{\sqrt{2}}
\begin{bmatrix}
1\\
1
\end{bmatrix}.
$$

So

$$
u_2 =
\begin{bmatrix}
1\\
0
\end{bmatrix} -
\frac{1}{2}
\begin{bmatrix}
1\\
1
\end{bmatrix} =
\begin{bmatrix}
1/2\\
-1/2
\end{bmatrix}.
$$

The norm is

$$
\|u_2\| =
\sqrt{\frac14+\frac14} =
\frac{1}{\sqrt{2}}.
$$

Thus

$$
q_2 =
\frac{u_2}{\|u_2\|} =
\frac{1}{\sqrt{2}}
\begin{bmatrix}
1\\
-1
\end{bmatrix}.
$$

The final orthonormal basis is

$$
q_1=
\frac{1}{\sqrt{2}}
\begin{bmatrix}
1\\
1
\end{bmatrix},
\qquad
q_2=
\frac{1}{\sqrt{2}}
\begin{bmatrix}
1\\
-1
\end{bmatrix}.
$$

## 50.9 Example in \(\mathbb{R}^3\)

Let

$$
v_1=
\begin{bmatrix}
1\\
0\\
1
\end{bmatrix},
\qquad
v_2=
\begin{bmatrix}
1\\
1\\
0
\end{bmatrix},
\qquad
v_3=
\begin{bmatrix}
0\\
1\\
1
\end{bmatrix}.
$$

First,

$$
u_1=v_1.
$$

Since

$$
\|u_1\|=\sqrt{2},
$$

we have

$$
q_1=
\frac{1}{\sqrt{2}}
\begin{bmatrix}
1\\
0\\
1
\end{bmatrix}.
$$

Next,

$$
\langle v_2,q_1\rangle =
\frac{1}{\sqrt{2}}.
$$

Thus

$$
u_2 =
v_2-\frac{1}{\sqrt{2}}q_1 =
\begin{bmatrix}
1\\
1\\
0
\end{bmatrix} -
\frac12
\begin{bmatrix}
1\\
0\\
1
\end{bmatrix} =
\begin{bmatrix}
1/2\\
1\\
-1/2
\end{bmatrix}.
$$

Its norm is

$$
\|u_2\| =
\sqrt{\frac14+1+\frac14} =
\sqrt{\frac32}.
$$

Hence

$$
q_2 =
\frac{1}{\sqrt{6}}
\begin{bmatrix}
1\\
2\\
-1
\end{bmatrix}.
$$

Now compute \(u_3\):

$$
u_3 =
v_3-\langle v_3,q_1\rangle q_1-\langle v_3,q_2\rangle q_2.
$$

We have

$$
\langle v_3,q_1\rangle =
\frac{1}{\sqrt{2}},
$$

and

$$
\langle v_3,q_2\rangle =
\frac{1}{\sqrt{6}}.
$$

Therefore

$$
u_3 =
\begin{bmatrix}
0\\
1\\
1
\end{bmatrix} -
\frac12
\begin{bmatrix}
1\\
0\\
1
\end{bmatrix} -
\frac16
\begin{bmatrix}
1\\
2\\
-1
\end{bmatrix}.
$$

This gives

$$
u_3 =
\begin{bmatrix}
-2/3\\
2/3\\
2/3
\end{bmatrix}.
$$

Its norm is

$$
\|u_3\| =
\sqrt{\frac49+\frac49+\frac49} =
\frac{2}{\sqrt{3}}.
$$

Thus

$$
q_3 =
\frac{1}{\sqrt{3}}
\begin{bmatrix}
-1\\
1\\
1
\end{bmatrix}.
$$

The three vectors \(q_1,q_2,q_3\) form an orthonormal basis of \(\mathbb{R}^3\).
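
Assuming the `gram_schmidt` sketch from Section 50.7 is in scope, the computation can be checked numerically:

```python
import numpy as np

V = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])  # columns are v1, v2, v3

Q = gram_schmidt(V)              # sketch from Section 50.7
print(Q.round(4))                # columns match q1, q2, q3 above
assert np.allclose(Q.T @ Q, np.eye(3))  # orthonormality check
```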

## 50.10 Orthogonal Version

Sometimes one wants an orthogonal basis rather than an orthonormal basis. In that case, stop before normalization.

Define

$$
u_1=v_1,
$$

and

$$
u_j =
v_j -
\sum_{i=1}^{j-1}
\frac{\langle v_j,u_i\rangle}{\langle u_i,u_i\rangle}u_i.
$$

Then

$$
u_1,\ldots,u_k
$$

are nonzero and mutually orthogonal.

The normalized vectors are

$$
q_j=\frac{u_j}{\|u_j\|}.
$$

The orthogonal version defers every square root to the final normalization step, which makes it convenient for hand computation and for exact arithmetic.
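
A minimal sketch of the unnormalized version, under the same assumptions as before:

```python
import numpy as np

def gram_schmidt_orthogonal(V):
    """Unnormalized Gram-Schmidt on the columns of V (assumed independent)."""
    m, k = V.shape
    U = np.zeros((m, k))
    for j in range(k):
        u = V[:, j].copy()
        for i in range(j):
            # Coefficient <v_j, u_i> / <u_i, u_i>, since u_i is not a unit vector.
            u -= (np.dot(V[:, j], U[:, i]) / np.dot(U[:, i], U[:, i])) * U[:, i]
        U[:, j] = u
    return U
```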

## 50.11 Span Preservation

Gram-Schmidt preserves spans step by step:

$$
\operatorname{span}(u_1,\ldots,u_j) =
\operatorname{span}(v_1,\ldots,v_j).
$$

This follows from the construction. Each \(u_j\) is built from \(v_j\) and earlier \(q_i\), which lie in the earlier span. Hence

$$
u_j\in \operatorname{span}(v_1,\ldots,v_j).
$$

Conversely,

$$
v_j =
u_j+
\sum_{i=1}^{j-1}
\langle v_j,q_i\rangle q_i,
$$

so \(v_j\) lies in the span of \(u_1,\ldots,u_j\).

Thus the two lists generate the same subspace at each stage.

## 50.12 QR Factorization

Gram-Schmidt gives a factorization of a matrix.

Let \(A\) be an \(m\times n\) matrix with linearly independent columns

$$
a_1,\ldots,a_n.
$$

Apply Gram-Schmidt to these columns and obtain orthonormal vectors

$$
q_1,\ldots,q_n.
$$

Let \(Q\) be the \(m\times n\) matrix with columns \(q_i\). Then

$$
Q^TQ=I.
$$

Each column \(a_j\) lies in

$$
\operatorname{span}(q_1,\ldots,q_j).
$$

Thus

$$
a_j=r_{1j}q_1+\cdots+r_{jj}q_j.
$$

Collecting these equations gives

$$
A=QR,
$$

where \(R\) is upper triangular.

The entries of \(R\) are

$$
r_{ij}=\langle a_j,q_i\rangle
\quad
\text{for } i\le j,
$$

and

$$
r_{jj}=\|u_j\|.
$$

The QR factorization is one of the main computational uses of Gram-Schmidt.
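
Extending the earlier sketch to record the coefficients \(r_{ij}\) gives a toy QR factorization. The name `qr_gram_schmidt` is ours; in practice one would call a library routine such as `numpy.linalg.qr`.

```python
import numpy as np

def qr_gram_schmidt(A):
    """Toy QR factorization of A (independent columns) via Gram-Schmidt."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        u = A[:, j].copy()
        for i in range(j):
            R[i, j] = np.dot(A[:, j], Q[:, i])  # r_ij = <a_j, q_i>
            u -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(u)             # r_jj = ||u_j||
        Q[:, j] = u / R[j, j]
    return Q, R

A = np.random.default_rng(0).standard_normal((5, 3))
Q, R = qr_gram_schmidt(A)
assert np.allclose(Q @ R, A)       # A = QR
assert np.allclose(R, np.triu(R))  # R is upper triangular
```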

## 50.13 Classical Gram-Schmidt

The formula

$$
u_j =
v_j -
\sum_{i=1}^{j-1}
\langle v_j,q_i\rangle q_i
$$

is called classical Gram-Schmidt.

All projection coefficients are computed against the original vector \(v_j\), and the corresponding projections are then subtracted in a single pass.

In exact arithmetic, this produces an orthonormal basis. In floating point arithmetic, the computed vectors may lose orthogonality when the original vectors are nearly linearly dependent.

This numerical weakness matters in practical computation. For stable algorithms, one often uses modified Gram-Schmidt, Householder reflections, or Givens rotations.

## 50.14 Modified Gram-Schmidt

Modified Gram-Schmidt subtracts projections one at a time from the current residual.

Start with

$$
w=v_j.
$$

For \(i=1,\ldots,j-1\), compute

$$
r_{ij}=\langle w,q_i\rangle,
$$

then replace

$$
w \leftarrow w-r_{ij}q_i.
$$

After all previous directions have been removed, set

$$
u_j=w,
\qquad
q_j=\frac{u_j}{\|u_j\|}.
$$

In exact arithmetic, classical and modified Gram-Schmidt give the same result. In floating point arithmetic, modified Gram-Schmidt usually preserves orthogonality better.

The reason is that each projection is removed from the current residual rather than from the original vector.
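
A minimal sketch of the modified variant, following the same conventions as the earlier `gram_schmidt` sketch; the only change is that each coefficient is computed from the current residual `w`. The comparison at the end uses a nearly dependent input (a standard test case) on which classical Gram-Schmidt typically loses orthogonality while the modified version does not.

```python
import numpy as np

def modified_gram_schmidt(V):
    """Modified Gram-Schmidt on the columns of V (assumed independent)."""
    m, k = V.shape
    Q = np.zeros((m, k))
    for j in range(k):
        w = V[:, j].copy()
        for i in range(j):
            # Coefficient from the *current residual* w, not from V[:, j].
            w -= np.dot(w, Q[:, i]) * Q[:, i]
        Q[:, j] = w / np.linalg.norm(w)
    return Q

eps = 1e-8
V = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])  # nearly dependent columns
for gs in (gram_schmidt, modified_gram_schmidt):  # gram_schmidt: Section 50.7
    Q = gs(V)
    print(np.linalg.norm(Q.T @ Q - np.eye(3)))    # error is far smaller for MGS
```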

## 50.15 Linear Dependence and Zero Residuals

If the input list is linearly dependent, Gram-Schmidt produces a zero residual at some step.

That is, for some \(j\),

$$
u_j=0.
$$

This means

$$
v_j\in \operatorname{span}(v_1,\ldots,v_{j-1}).
$$

In this case, \(v_j\) contributes no new direction and cannot be normalized.

A practical version of the algorithm skips zero residuals. If \(u_j=0\), discard \(v_j\). The remaining nonzero residuals form an orthogonal basis for the span of the original list.

This gives a method for extracting a basis from a spanning set.
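
A sketch of this practical variant; in floating point an exact zero is rare, so a tolerance `tol` (an arbitrary choice of ours) decides when a residual counts as zero.

```python
import numpy as np

def gram_schmidt_basis(vectors, tol=1e-12):
    """Orthonormal basis of span(vectors), skipping near-zero residuals."""
    basis = []
    for v in vectors:
        u = np.asarray(v, dtype=float).copy()
        for q in basis:
            u -= np.dot(u, q) * q  # remove components along earlier q
        norm = np.linalg.norm(u)
        if norm > tol:             # keep only genuinely new directions
            basis.append(u / norm)
    return basis

vs = [np.array([1.0, 0.0]), np.array([2.0, 0.0]), np.array([1.0, 1.0])]
print(len(gram_schmidt_basis(vs)))  # 2: the second vector adds no new direction
```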

## 50.16 Gram-Schmidt as Projection Residual

At each step, let

$$
W_{j-1} =
\operatorname{span}(q_1,\ldots,q_{j-1}).
$$

The vector

$$
\sum_{i=1}^{j-1}
\langle v_j,q_i\rangle q_i
$$

is the orthogonal projection of \(v_j\) onto \(W_{j-1}\).

Thus

$$
u_j =
v_j-\operatorname{proj}_{W_{j-1}}(v_j).
$$

So \(u_j\) is the residual after projecting \(v_j\) onto the old subspace.

This gives the geometric interpretation:

$$
v_j =
\text{old part}
+
\text{new orthogonal part}.
$$

Gram-Schmidt keeps the new orthogonal part and normalizes it.
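
In matrix form, if \(Q_{j-1}\) has columns \(q_1,\ldots,q_{j-1}\), the projection onto \(W_{j-1}\) is \(Q_{j-1}Q_{j-1}^{T}v_j\), so the residual is a single expression. A minimal sketch:

```python
import numpy as np

def residual(v, Q_prev):
    """Residual of v after orthogonal projection onto the columns of Q_prev."""
    return v - Q_prev @ (Q_prev.T @ v)  # v minus proj_W(v)

rng = np.random.default_rng(1)
Q_prev, _ = np.linalg.qr(rng.standard_normal((4, 2)))  # orthonormal columns
v = rng.standard_normal(4)
u = residual(v, Q_prev)
assert np.allclose(Q_prev.T @ u, 0)  # residual is orthogonal to W
```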

## 50.17 Gram Matrices and Nonzero Pivots

Let

$$
G_j =
\big[\langle v_i,v_l\rangle\big]_{i,l=1}^j
$$

be the Gram matrix of the first \(j\) input vectors.

If \(v_1,\ldots,v_j\) are linearly independent, then \(G_j\) is positive definite. This guarantees that the construction has no zero residuals.

The squared lengths

$$
\|u_j\|^2
$$

measure the amount of genuinely new direction contributed by \(v_j\) beyond the earlier span.

If \(\|u_j\|\) is very small, then \(v_j\) is nearly dependent on the earlier vectors. This is a warning sign for numerical instability.
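
A small sketch makes the warning sign concrete: the Gram matrix of nearly dependent vectors has an eigenvalue close to zero, just as the corresponding residual norm is close to zero. The example data is ours.

```python
import numpy as np

V = np.array([[1.0, 1.0],
              [0.0, 1e-8]])   # nearly dependent columns
G = V.T @ V                   # Gram matrix G_2 of the columns
print(np.linalg.eigvalsh(G))  # one eigenvalue is nearly zero
```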

## 50.18 Function Space Example

Gram-Schmidt also applies to function spaces.

Let \(V\) be the space of polynomials on \([-1,1]\) with inner product

$$
\langle f,g\rangle =
\int_{-1}^{1} f(x)g(x)\,dx.
$$

Start with

$$
1,\quad x,\quad x^2,\quad \ldots
$$

Applying Gram-Schmidt produces an orthogonal sequence of polynomials. With a conventional normalization, this gives the Legendre polynomials.

For example,

$$
p_0(x)=1.
$$

Since

$$
\langle x,1\rangle =
\int_{-1}^{1}x\,dx =
0,
$$

the polynomial \(x\) is already orthogonal to \(1\).

For \(x^2\), subtract its projection onto \(1\). Since

$$
\langle x^2,1\rangle =
\int_{-1}^{1}x^2\,dx =
\frac{2}{3},
$$

and

$$
\langle 1,1\rangle=2,
$$

the projection coefficient is

$$
\frac{\langle x^2,1\rangle}{\langle 1,1\rangle} =
\frac13.
$$

Thus the new orthogonal polynomial is

$$
x^2-\frac13.
$$

After scaling so that its value at \(x=1\) equals \(1\), this becomes \(P_2(x)=\tfrac12(3x^2-1)\), the second Legendre polynomial.
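
The same computation can be repeated symbolically. A minimal SymPy sketch of the orthogonal version with the integral inner product above:

```python
import sympy as sp

x = sp.Symbol('x')

def inner(f, g):
    """Inner product <f, g> = integral of f*g over [-1, 1]."""
    return sp.integrate(f * g, (x, -1, 1))

basis = []
for p in [sp.Integer(1), x, x**2]:
    u = p - sum(inner(p, q) / inner(q, q) * q for q in basis)
    basis.append(sp.expand(u))

print(basis)  # [1, x, x**2 - 1/3]
```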

## 50.19 Numerical Remarks

Gram-Schmidt is conceptually important, but its classical form can be numerically fragile. When the vectors are nearly linearly dependent, the subtraction involves nearly equal quantities, which causes catastrophic cancellation and loss of orthogonality.

For numerical QR factorization, Householder transformations are usually preferred because of their stability. Modified Gram-Schmidt is often acceptable and is easier to implement inside iterative methods.

The right choice of method depends on the context:

| Method | Main use |
|---|---|
| Classical Gram-Schmidt | Theory, simple derivations |
| Modified Gram-Schmidt | Iterative methods, improved orthogonality |
| Householder QR | Stable dense QR factorization |
| Givens rotations | Sparse QR, selective zeroing |

The mathematical goal is the same: replace a basis by an orthonormal basis for the same subspace.

## 50.20 Summary

Gram-Schmidt orthogonalization converts a linearly independent list

$$
v_1,\ldots,v_k
$$

into an orthonormal list

$$
q_1,\ldots,q_k
$$

with the same successive spans.

The construction is

$$
u_j =
v_j -
\sum_{i=1}^{j-1}
\langle v_j,q_i\rangle q_i,
\qquad
q_j=\frac{u_j}{\|u_j\|}.
$$

At each step, the algorithm subtracts the projection onto the old subspace and keeps the new orthogonal residual.

Gram-Schmidt explains how orthonormal bases arise, why QR factorization exists, and how projection can be performed by successive removal of old components.
