The Gram-Schmidt process converts a linearly independent list of vectors into an orthogonal or orthonormal list spanning the same subspace. It is one of the standard constructions in inner product spaces and is the basic theoretical source of the QR factorization used in numerical linear algebra. The process repeatedly subtracts projections onto previously constructed directions, leaving a residual vector orthogonal to all earlier ones.
50.1 The Problem
Let
v1,v2,…,vk
be linearly independent vectors in an inner product space V. We want to construct vectors
q1,q2,…,qk
such that:
- Same span: span(q1,…,qj) = span(v1,…,vj) for each j.
- Orthogonality: ⟨qi,qj⟩ = 0 whenever i ≠ j.
- Normalization: ∥qj∥ = 1 for each j.
The output is an orthonormal basis for the same subspace spanned by the original vectors.
The key idea is simple. Keep the part of each new vector that points in a new direction. Remove all parts that already lie in the directions constructed earlier.
50.2 Projection onto a Unit Vector
Suppose q is a unit vector. The projection of v onto the line spanned by q is
projq(v)=⟨v,q⟩q.
This formula is simple because ∥q∥=1. For a non-unit vector u, the projection would be
proj_u(v) = (⟨v,u⟩ / ⟨u,u⟩) u.
Gram-Schmidt uses these projections to remove the parts of a vector that lie in old directions.
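In coordinates, both projection formulas are one-liners. A minimal numerical sketch (using NumPy; the vectors are arbitrary examples):

```python
import numpy as np

v = np.array([3.0, 1.0])
u = np.array([2.0, 0.0])          # a non-unit vector

# proj_u(v) = (<v,u> / <u,u>) u
proj = (v @ u) / (u @ u) * u

# the residual v - proj is orthogonal to u
residual = v - proj
print(proj, residual @ u)          # [3. 0.] 0.0
```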
50.3 First Step
Start with the first vector v1. Since the list is linearly independent, v1 ≠ 0.
Define
u1=v1.
Normalize it:
q1 = u1 / ∥u1∥.
Then q1 is a unit vector and
span(q1)=span(v1).
The first direction is unchanged except for scaling.
50.4 Second Step
The second vector v2 may have a component in the direction of q1. Remove that component:
u2=v2−⟨v2,q1⟩q1.
Then u2 is orthogonal to q1. Indeed,
⟨u2,q1⟩=⟨v2−⟨v2,q1⟩q1,q1⟩.
Using linearity,
⟨u2,q1⟩=⟨v2,q1⟩−⟨v2,q1⟩⟨q1,q1⟩.
Since q1 is a unit vector,
⟨q1,q1⟩=1.
Therefore
⟨u2,q1⟩=0.
Normalize:
q2 = u2 / ∥u2∥.
Now q1,q2 are orthonormal and span the same subspace as v1,v2.
50.5 General Step
Assume that
q1,…,qj−1
have already been constructed and are orthonormal.
To construct the next vector, subtract from vj its projections onto all previous qi:
uj = vj − ∑_{i=1}^{j−1} ⟨vj,qi⟩ qi.
Then normalize:
qj = uj / ∥uj∥.
The vector uj is orthogonal to every earlier qi. For m<j,
⟨uj,qm⟩ = ⟨vj − ∑_{i=1}^{j−1} ⟨vj,qi⟩ qi, qm⟩.
By linearity,
⟨uj,qm⟩ = ⟨vj,qm⟩ − ∑_{i=1}^{j−1} ⟨vj,qi⟩ ⟨qi,qm⟩.
Since the qi are orthonormal,
⟨qi,qm⟩=δim.
Thus only one term remains:
⟨uj,qm⟩=⟨vj,qm⟩−⟨vj,qm⟩=0.
So uj is orthogonal to all previous directions.
50.6 Why uj Is Nonzero
The vector uj must be nonzero. If
uj=0,
then
vj = ∑_{i=1}^{j−1} ⟨vj,qi⟩ qi.
This would imply
vj∈span(q1,…,qj−1).
But
span(q1,…,qj−1)=span(v1,…,vj−1).
Thus vj would lie in the span of the earlier vi, contradicting linear independence.
Therefore uj ≠ 0, so normalization is valid.
50.7 The Algorithm
Given linearly independent vectors
v1,…,vk,
the Gram-Schmidt algorithm is:
u1 = v1, q1 = u1 / ∥u1∥.
For j = 2, …, k,
uj = vj − ∑_{i=1}^{j−1} ⟨vj,qi⟩ qi,
and
qj = uj / ∥uj∥.
The result is an orthonormal list
q1,…,qk
such that
span(q1,…,qj)=span(v1,…,vj)
for every j.
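The algorithm translates directly into code. A sketch in Python with NumPy (the function name is illustrative; the inputs are assumed linearly independent):

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: returns an orthonormal list with the
    same successive spans as the input (assumed linearly independent)."""
    Q = []
    for v in vectors:
        # subtract the projections onto all previously built directions
        u = v - sum((v @ q) * q for q in Q)
        Q.append(u / np.linalg.norm(u))
    return Q
```

For the first vector the sum is empty, so u1 = v1, exactly as in the derivation.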
50.8 Example in R2
Let
v1 = [1, 1]ᵀ, v2 = [1, 0]ᵀ.
First,
u1 = v1 = [1, 1]ᵀ.
Its norm is
∥u1∥ = √(1² + 1²) = √2.
Thus
q1 = (1/√2) [1, 1]ᵀ.
Now remove from v2 its projection onto q1:
⟨v2,q1⟩ = ⟨[1, 0]ᵀ, (1/√2) [1, 1]ᵀ⟩ = 1/√2.
Therefore
u2 = v2 − ⟨v2,q1⟩ q1 = [1, 0]ᵀ − (1/√2)·(1/√2) [1, 1]ᵀ.
So
u2 = [1, 0]ᵀ − (1/2) [1, 1]ᵀ = [1/2, −1/2]ᵀ.
The norm is
∥u2∥ = √(1/4 + 1/4) = 1/√2.
Thus
q2 = u2 / ∥u2∥ = (1/√2) [1, −1]ᵀ.
The final orthonormal basis is
q1 = (1/√2) [1, 1]ᵀ, q2 = (1/√2) [1, −1]ᵀ.
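The hand computation can be checked numerically; a small NumPy sketch mirroring the steps above:

```python
import numpy as np

v1, v2 = np.array([1.0, 1.0]), np.array([1.0, 0.0])

q1 = v1 / np.linalg.norm(v1)        # (1/sqrt(2)) [1, 1]
u2 = v2 - (v2 @ q1) * q1            # remove the q1 component
q2 = u2 / np.linalg.norm(u2)        # (1/sqrt(2)) [1, -1]

print(q1 @ q2)                      # 0 up to rounding
```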
50.9 Example in R3
Let
v1 = [1, 0, 1]ᵀ, v2 = [1, 1, 0]ᵀ, v3 = [0, 1, 1]ᵀ.
First,
u1 = v1.
Since
∥u1∥ = √2,
we have
q1 = (1/√2) [1, 0, 1]ᵀ.
Next,
⟨v2,q1⟩ = 1/√2.
Thus
u2 = v2 − (1/√2) q1 = [1, 1, 0]ᵀ − (1/2) [1, 0, 1]ᵀ = [1/2, 1, −1/2]ᵀ.
Its norm is
∥u2∥ = √(1/4 + 1 + 1/4) = √(3/2).
Hence
q2 = (1/√6) [1, 2, −1]ᵀ.
Now compute u3:
u3 = v3 − ⟨v3,q1⟩ q1 − ⟨v3,q2⟩ q2.
We have
⟨v3,q1⟩ = 1/√2,
and
⟨v3,q2⟩ = 1/√6.
Therefore
u3 = [0, 1, 1]ᵀ − (1/2) [1, 0, 1]ᵀ − (1/6) [1, 2, −1]ᵀ.
This gives
u3 = [−2/3, 2/3, 2/3]ᵀ.
Its norm is
∥u3∥ = √(4/9 + 4/9 + 4/9) = 2/√3.
Thus
q3 = (1/√3) [−1, 1, 1]ᵀ.
The three vectors q1,q2,q3 form an orthonormal basis of R3.
50.10 Orthogonal Version
Sometimes one wants an orthogonal basis rather than an orthonormal basis. In that case, stop before normalization.
Define
u1=v1,
and
uj = vj − ∑_{i=1}^{j−1} (⟨vj,ui⟩ / ⟨ui,ui⟩) ui.
Then
u1,…,uk
are nonzero and mutually orthogonal.
The normalized vectors are
qj = uj / ∥uj∥.
The orthogonal version is often easier to write when the intermediate vectors are not unit vectors.
50.11 Span Preservation
Gram-Schmidt preserves spans step by step:
span(u1,…,uj)=span(v1,…,vj).
This follows from the construction. Each uj is built from vj and earlier qi, which lie in the earlier span. Hence
uj∈span(v1,…,vj).
Conversely,
vj = uj + ∑_{i=1}^{j−1} ⟨vj,qi⟩ qi,
so vj lies in the span of u1,…,uj.
Thus the two lists generate the same subspace at each stage.
50.12 QR Factorization
Gram-Schmidt gives a factorization of a matrix.
Let A be an m×n matrix with linearly independent columns
a1,…,an.
Apply Gram-Schmidt to these columns and obtain orthonormal vectors
q1,…,qn.
Let Q be the m×n matrix with columns qi. Then
QᵀQ = I.
Each column aj lies in
span(q1,…,qj).
Thus
aj=r1jq1+⋯+rjjqj.
Collecting these equations gives
A=QR,
where R is upper triangular.
The entries of R are
rij = ⟨aj,qi⟩ for i ≤ j,
and
rjj=∥uj∥.
The QR factorization is one of the main computational uses of Gram-Schmidt.
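As a sketch, the factorization can be computed by running Gram-Schmidt on the columns and recording the coefficients in R (NumPy; illustrative teaching code, not a production QR):

```python
import numpy as np

def qr_gram_schmidt(A):
    """QR via classical Gram-Schmidt on the columns of A
    (assumed linearly independent)."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        u = A[:, j].copy()
        for i in range(j):
            R[i, j] = A[:, j] @ Q[:, i]   # r_ij = <a_j, q_i>
            u -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(u)       # r_jj = ||u_j||
        Q[:, j] = u / R[j, j]
    return Q, R

A = np.array([[1.0, 1.0],
              [1.0, 0.0]])
Q, R = qr_gram_schmidt(A)
print(np.allclose(Q @ R, A))              # True
```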
50.13 Classical Gram-Schmidt
The formula
uj = vj − ∑_{i=1}^{j−1} ⟨vj,qi⟩ qi
is called classical Gram-Schmidt.
It subtracts all projections of vj using inner products computed against the original vector vj.
In exact arithmetic, this produces an orthonormal basis. In floating point arithmetic, the computed vectors may lose orthogonality when the original vectors are nearly linearly dependent.
This numerical weakness matters in practical computation. For stable algorithms, one often uses modified Gram-Schmidt, Householder reflections, or Givens rotations.
50.14 Modified Gram-Schmidt
Modified Gram-Schmidt subtracts projections one at a time from the current residual.
Start with
w=vj.
For i=1,…,j−1, compute
rij=⟨w,qi⟩,
then replace
w←w−rijqi.
After all previous directions have been removed, set
uj = w, qj = uj / ∥uj∥.
In exact arithmetic, classical and modified Gram-Schmidt give the same result. In floating point arithmetic, modified Gram-Schmidt usually preserves orthogonality better.
The reason is that each projection is removed from the current residual rather than from the original vector.
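A sketch of the modified variant in NumPy, run on a classic near-dependent example (the perturbation eps is chosen so that 1 + eps² rounds to 1 in double precision; the function name is illustrative):

```python
import numpy as np

def modified_gram_schmidt(A):
    """MGS: each projection is removed from the running residual w."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        w = A[:, j].copy()
        for i in range(j):
            w -= (w @ Q[:, i]) * Q[:, i]   # r_ij = <w, q_i>, then update w
        Q[:, j] = w / np.linalg.norm(w)
    return Q

# Nearly dependent columns: small perturbations of the same vector.
eps = 1e-8
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])
Q = modified_gram_schmidt(A)
print(np.linalg.norm(Q.T @ Q - np.eye(3)))  # small (roughly 1e-8); classical
                                            # GS loses orthogonality badly here
```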
50.15 Linear Dependence and Zero Residuals
If the input list is linearly dependent, Gram-Schmidt produces a zero residual at some step.
That is, for some j,
uj=0.
This means
vj∈span(v1,…,vj−1).
In this case, vj contributes no new direction and cannot be normalized.
A practical version of the algorithm skips zero residuals. If uj=0, discard vj. The remaining nonzero residuals form an orthogonal basis for the span of the original list.
This gives a method for extracting a basis from a spanning set.
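A practical sketch with a tolerance for "numerically zero" residuals (NumPy; the threshold value is an illustrative choice):

```python
import numpy as np

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt on a possibly dependent list: residuals below tol
    are discarded, so the output is an orthonormal basis of the span."""
    Q = []
    for v in vectors:
        u = v - sum((v @ q) * q for q in Q)
        norm = np.linalg.norm(u)
        if norm > tol:                 # keep only genuinely new directions
            Q.append(u / norm)
    return Q

vs = [np.array([1.0, 0.0, 1.0]),
      np.array([2.0, 0.0, 2.0]),     # dependent: twice the first vector
      np.array([0.0, 1.0, 0.0])]
print(len(orthonormal_basis(vs)))    # 2
```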
50.16 Gram-Schmidt as Projection Residual
At each step, let
Wj−1=span(q1,…,qj−1).
The vector
∑_{i=1}^{j−1} ⟨vj,qi⟩ qi
is the orthogonal projection of vj onto Wj−1.
Thus
uj = vj − proj_{Wj−1}(vj).
So uj is the residual after projecting vj onto the old subspace.
This gives the geometric interpretation:
vj=old part+new orthogonal part.
Gram-Schmidt keeps the new orthogonal part and normalizes it.
50.17 Gram Matrices and Nonzero Pivots
Let
Gj = [⟨vi,vl⟩], i,l = 1, …, j,
be the Gram matrix of the first j input vectors.
If v1,…,vj are linearly independent, then Gj is positive definite. This guarantees that the construction has no zero residuals.
The squared lengths
∥uj∥2
measure the amount of genuinely new direction contributed by vj beyond the earlier span.
If ∥uj∥ is very small, then vj is nearly dependent on the earlier vectors. This is a warning sign for numerical instability.
50.18 Function Space Example
Gram-Schmidt also applies to function spaces.
Let V be the space of polynomials on [−1,1] with inner product
⟨f,g⟩ = ∫_{−1}^{1} f(x) g(x) dx.
Start with
1, x, x², …
Applying Gram-Schmidt produces an orthogonal sequence of polynomials. With a conventional normalization, this gives the Legendre polynomials.
For example,
p0(x)=1.
Since
⟨x,1⟩ = ∫_{−1}^{1} x dx = 0,
the polynomial x is already orthogonal to 1.
For x², subtract its projection onto 1. Since
⟨x²,1⟩ = ∫_{−1}^{1} x² dx = 2/3,
and
⟨1,1⟩ = 2,
the projection coefficient is
⟨x²,1⟩ / ⟨1,1⟩ = 1/3.
Thus the new orthogonal polynomial is
x² − 1/3.
After scaling, this corresponds to the degree-two Legendre polynomial P2(x) = (3x² − 1)/2.
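The computation above can be reproduced exactly by representing polynomials as coefficient lists and using the closed-form monomial inner products ⟨x^m, x^n⟩ = 2/(m+n+1) for m+n even and 0 otherwise. A sketch with exact rational arithmetic (function names are illustrative):

```python
from fractions import Fraction

def inner(p, q):
    """<p,q> = integral over [-1,1] of p(x) q(x) dx, where p[i] is
    the coefficient of x^i."""
    s = Fraction(0)
    for m, a in enumerate(p):
        for n, b in enumerate(q):
            if (m + n) % 2 == 0:          # odd powers integrate to 0
                s += Fraction(a) * Fraction(b) * Fraction(2, m + n + 1)
    return s

def subtract(p, c, q):
    """Return p - c*q as a coefficient list."""
    out = [Fraction(x) for x in p] + [Fraction(0)] * max(0, len(q) - len(p))
    for n, b in enumerate(q):
        out[n] -= c * Fraction(b)
    return out

basis = []
for v in [[1], [0, 1], [0, 0, 1]]:        # the monomials 1, x, x^2
    u = v
    for w in basis:
        u = subtract(u, inner(u, w) / inner(w, w), w)
    basis.append(u)
print(basis[2])                            # coefficients of x^2 - 1/3
```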
50.19 Numerical Remarks
Gram-Schmidt is conceptually important, but its classical form can be numerically fragile. When vectors are nearly linearly dependent, subtraction may remove almost equal quantities. This causes cancellation and loss of orthogonality.
For numerical QR factorization, Householder transformations are usually preferred for high stability. Modified Gram-Schmidt is often acceptable and easier to implement in iterative methods.
The correct algorithm depends on the context:
- Classical Gram-Schmidt: theory, simple derivations.
- Modified Gram-Schmidt: iterative methods, improved orthogonality.
- Householder QR: stable dense QR factorization.
- Givens rotations: sparse QR, selective zeroing.
The mathematical goal is the same: replace a basis by an orthonormal basis for the same subspace.
50.20 Summary
Gram-Schmidt orthogonalization converts a linearly independent list
v1,…,vk
into an orthonormal list
q1,…,qk
with the same successive spans.
The construction is
uj = vj − ∑_{i=1}^{j−1} ⟨vj,qi⟩ qi, qj = uj / ∥uj∥.
At each step, the algorithm subtracts the projection onto the old subspace and keeps the new orthogonal residual.
Gram-Schmidt explains how orthonormal bases arise, why QR factorization exists, and how projection can be performed by successive removal of old components.