
Chapter 50. Gram-Schmidt Orthogonalization

The Gram-Schmidt process converts a linearly independent list of vectors into an orthogonal or orthonormal list spanning the same subspace. It is one of the standard constructions in inner product spaces and is the basic theoretical source of the QR factorization used in numerical linear algebra. The process repeatedly subtracts projections onto previously constructed directions, leaving a residual vector orthogonal to all earlier ones.

50.1 The Problem

Let

v_1, v_2, \ldots, v_k

be linearly independent vectors in an inner product space V. We want to construct vectors

q_1, q_2, \ldots, q_k

such that:

Same span: \operatorname{span}(q_1, \ldots, q_j) = \operatorname{span}(v_1, \ldots, v_j) for each j.
Orthogonality: \langle q_i, q_j\rangle = 0 whenever i \ne j.
Normalization: \|q_j\| = 1 for each j.

The output is an orthonormal basis for the same subspace spanned by the original vectors.

The key idea is simple. Keep the part of each new vector that points in a new direction. Remove all parts that already lie in the directions constructed earlier.

50.2 Projection onto a Unit Vector

Suppose q is a unit vector. The projection of v onto the line spanned by q is

\operatorname{proj}_q(v) = \langle v, q\rangle q.

This formula is simple because \|q\| = 1. For a non-unit vector u, the projection would be

\operatorname{proj}_u(v) = \frac{\langle v, u\rangle}{\langle u, u\rangle} u.

Gram-Schmidt uses these projections to remove the parts of a vector that lie in old directions.
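These projection formulas are easy to sketch in code. The following plain-Python helper (the function names are illustrative, not from the chapter) computes \operatorname{proj}_u(v) for a not-necessarily-unit u:

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def proj(v, u):
    # proj_u(v) = (<v, u> / <u, u>) u; u need not be a unit vector.
    c = dot(v, u) / dot(u, u)
    return [c * x for x in u]

print(proj([3.0, 4.0], [1.0, 0.0]))  # [3.0, 0.0]: the component of v along u
```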

50.3 First Step

Start with the first vector v_1. Since the list is linearly independent, v_1 \ne 0.

Define

u_1 = v_1.

Normalize it:

q_1 = \frac{u_1}{\|u_1\|}.

Then q_1 is a unit vector and

\operatorname{span}(q_1) = \operatorname{span}(v_1).

The first direction is unchanged except for scaling.

50.4 Second Step

The second vector v_2 may have a component in the direction of q_1. Remove that component:

u_2 = v_2 - \langle v_2, q_1\rangle q_1.

Then u_2 is orthogonal to q_1. Indeed,

\langle u_2, q_1\rangle = \langle v_2 - \langle v_2, q_1\rangle q_1,\ q_1\rangle.

Using linearity,

\langle u_2, q_1\rangle = \langle v_2, q_1\rangle - \langle v_2, q_1\rangle \langle q_1, q_1\rangle.

Since q_1 is a unit vector,

\langle q_1, q_1\rangle = 1.

Therefore

\langle u_2, q_1\rangle = 0.

Normalize:

q_2 = \frac{u_2}{\|u_2\|}.

Now q_1, q_2 are orthonormal and span the same subspace as v_1, v_2.

50.5 General Step

Assume that

q_1, \ldots, q_{j-1}

have already been constructed and are orthonormal.

To construct the next vector, subtract from v_j its projections onto all previous q_i:

u_j = v_j - \sum_{i=1}^{j-1} \langle v_j, q_i\rangle q_i.

Then normalize:

q_j = \frac{u_j}{\|u_j\|}.

The vector u_j is orthogonal to every earlier q_i. For m < j,

\langle u_j, q_m\rangle = \left\langle v_j - \sum_{i=1}^{j-1} \langle v_j, q_i\rangle q_i,\ q_m \right\rangle.

By linearity,

\langle u_j, q_m\rangle = \langle v_j, q_m\rangle - \sum_{i=1}^{j-1} \langle v_j, q_i\rangle \langle q_i, q_m\rangle.

Since the q_i are orthonormal,

\langle q_i, q_m\rangle = \delta_{im}.

Thus only one term remains:

\langle u_j, q_m\rangle = \langle v_j, q_m\rangle - \langle v_j, q_m\rangle = 0.

So u_j is orthogonal to all previous directions.

50.6 Why u_j Is Nonzero

The vector u_j must be nonzero. If

u_j = 0,

then

v_j = \sum_{i=1}^{j-1} \langle v_j, q_i\rangle q_i.

This would imply

v_j \in \operatorname{span}(q_1, \ldots, q_{j-1}).

But

\operatorname{span}(q_1, \ldots, q_{j-1}) = \operatorname{span}(v_1, \ldots, v_{j-1}).

Thus v_j would lie in the span of the earlier v_i, contradicting linear independence.

Therefore u_j \ne 0, so normalization is valid.

50.7 The Algorithm

Given linearly independent vectors

v_1, \ldots, v_k,

the Gram-Schmidt algorithm is:

u_1 = v_1, \qquad q_1 = \frac{u_1}{\|u_1\|}.

For j = 2, \ldots, k,

u_j = v_j - \sum_{i=1}^{j-1} \langle v_j, q_i\rangle q_i,

and

q_j = \frac{u_j}{\|u_j\|}.

The result is an orthonormal list

q_1, \ldots, q_k

such that

\operatorname{span}(q_1, \ldots, q_j) = \operatorname{span}(v_1, \ldots, v_j)

for every j.
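The whole algorithm fits in a few lines. Here is a minimal plain-Python sketch of classical Gram-Schmidt (assuming, as above, that the input list is linearly independent, so no residual is zero):

```python
import math

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: orthonormalize a linearly independent list."""
    qs = []
    for v in vectors:
        # Subtract the projections of v onto all previously built unit vectors.
        u = list(v)
        for q in qs:
            c = sum(a * b for a, b in zip(v, q))  # <v, q>
            u = [x - c * y for x, y in zip(u, q)]
        nrm = math.sqrt(sum(x * x for x in u))    # ||u_j||, nonzero by independence
        qs.append([x / nrm for x in u])
    return qs

qs = gram_schmidt([[1.0, 1.0], [1.0, 0.0]])
print(qs[1])  # close to [1/sqrt(2), -1/sqrt(2)]
```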

50.8 Example in \mathbb{R}^2

Let

v_1 = \begin{bmatrix} 1\\ 1 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 1\\ 0 \end{bmatrix}.

First,

u_1 = v_1 = \begin{bmatrix} 1\\ 1 \end{bmatrix}.

Its norm is

\|u_1\| = \sqrt{1^2 + 1^2} = \sqrt{2}.

Thus

q_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1\\ 1 \end{bmatrix}.

Now remove from v_2 its projection onto q_1:

\langle v_2, q_1\rangle = \left\langle \begin{bmatrix} 1\\ 0 \end{bmatrix}, \frac{1}{\sqrt{2}} \begin{bmatrix} 1\\ 1 \end{bmatrix} \right\rangle = \frac{1}{\sqrt{2}}.

Therefore

u_2 = v_2 - \langle v_2, q_1\rangle q_1 = \begin{bmatrix} 1\\ 0 \end{bmatrix} - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} \begin{bmatrix} 1\\ 1 \end{bmatrix}.

So

u_2 = \begin{bmatrix} 1\\ 0 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 1\\ 1 \end{bmatrix} = \begin{bmatrix} 1/2\\ -1/2 \end{bmatrix}.

The norm is

\|u_2\| = \sqrt{\frac14 + \frac14} = \frac{1}{\sqrt{2}}.

Thus

q_2 = \frac{u_2}{\|u_2\|} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1\\ -1 \end{bmatrix}.

The final orthonormal basis is

q_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1\\ 1 \end{bmatrix}, \qquad q_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1\\ -1 \end{bmatrix}.
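A quick numerical check of this two-vector example: the computed vectors should have inner product 0 and unit length (up to floating-point rounding).

```python
import math

s = 1 / math.sqrt(2)
q1 = [s, s]    # the q_1 computed above
q2 = [s, -s]   # the q_2 computed above

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

print(dot(q1, q2))  # 0.0: the two vectors are orthogonal
print(dot(q1, q1))  # very close to 1.0: q1 has unit length up to rounding
```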

50.9 Example in \mathbb{R}^3

Let

v_1 = \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix}, \qquad v_3 = \begin{bmatrix} 0\\ 1\\ 1 \end{bmatrix}.

First,

u_1 = v_1.

Since

\|u_1\| = \sqrt{2},

we have

q_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix}.

Next,

\langle v_2, q_1\rangle = \frac{1}{\sqrt{2}}.

Thus

u_2 = v_2 - \frac{1}{\sqrt{2}} q_1 = \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix} - \frac12 \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix} = \begin{bmatrix} 1/2\\ 1\\ -1/2 \end{bmatrix}.

Its norm is

\|u_2\| = \sqrt{\frac14 + 1 + \frac14} = \sqrt{\frac32}.

Hence

q_2 = \frac{1}{\sqrt{6}} \begin{bmatrix} 1\\ 2\\ -1 \end{bmatrix}.

Now compute u_3:

u_3 = v_3 - \langle v_3, q_1\rangle q_1 - \langle v_3, q_2\rangle q_2.

We have

\langle v_3, q_1\rangle = \frac{1}{\sqrt{2}},

and

\langle v_3, q_2\rangle = \frac{1}{\sqrt{6}}.

Therefore

u_3 = \begin{bmatrix} 0\\ 1\\ 1 \end{bmatrix} - \frac12 \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix} - \frac16 \begin{bmatrix} 1\\ 2\\ -1 \end{bmatrix}.

This gives

u_3 = \begin{bmatrix} -2/3\\ 2/3\\ 2/3 \end{bmatrix}.

Its norm is

\|u_3\| = \sqrt{\frac49 + \frac49 + \frac49} = \frac{2}{\sqrt{3}}.

Thus

q_3 = \frac{1}{\sqrt{3}} \begin{bmatrix} -1\\ 1\\ 1 \end{bmatrix}.

The three vectors q_1, q_2, q_3 form an orthonormal basis of \mathbb{R}^3.

50.10 Orthogonal Version

Sometimes one wants an orthogonal basis rather than an orthonormal basis. In that case, stop before normalization.

Define

u_1 = v_1,

and

u_j = v_j - \sum_{i=1}^{j-1} \frac{\langle v_j, u_i\rangle}{\langle u_i, u_i\rangle} u_i.

Then

u_1, \ldots, u_k

are nonzero and mutually orthogonal.

The normalized vectors are

q_j = \frac{u_j}{\|u_j\|}.

The orthogonal version is often more convenient by hand, since it postpones all square roots to the final normalization step.
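A sketch of this unnormalized variant, using the \langle v_j, u_i\rangle / \langle u_i, u_i\rangle coefficients directly (names are illustrative):

```python
def gram_schmidt_orthogonal(vectors):
    """Orthogonal (not orthonormal) Gram-Schmidt: skip the normalization step."""
    us = []
    for v in vectors:
        u = list(v)
        for w in us:
            # Coefficient <v, w> / <w, w> against the unnormalized w.
            c = sum(a * b for a, b in zip(v, w)) / sum(a * a for a in w)
            u = [x - c * y for x, y in zip(u, w)]
        us.append(u)
    return us

us = gram_schmidt_orthogonal([[1.0, 1.0], [1.0, 0.0]])
print(us)  # [[1.0, 1.0], [0.5, -0.5]], matching the R^2 example up to scaling
```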

50.11 Span Preservation

Gram-Schmidt preserves spans step by step:

\operatorname{span}(u_1, \ldots, u_j) = \operatorname{span}(v_1, \ldots, v_j).

This follows from the construction. Each u_j is built from v_j and the earlier q_i, which lie in the earlier span. Hence

u_j \in \operatorname{span}(v_1, \ldots, v_j).

Conversely,

v_j = u_j + \sum_{i=1}^{j-1} \langle v_j, q_i\rangle q_i,

so v_j lies in the span of u_1, \ldots, u_j.

Thus the two lists generate the same subspace at each stage.

50.12 QR Factorization

Gram-Schmidt gives a factorization of a matrix.

Let A be an m \times n matrix with linearly independent columns

a_1, \ldots, a_n.

Apply Gram-Schmidt to these columns and obtain orthonormal vectors

q_1, \ldots, q_n.

Let Q be the m \times n matrix with columns q_i. Then

Q^T Q = I.

Each column a_j lies in

\operatorname{span}(q_1, \ldots, q_j).

Thus

a_j = r_{1j} q_1 + \cdots + r_{jj} q_j.

Collecting these equations gives

A = QR,

where R is upper triangular.

The entries of R are

r_{ij} = \langle a_j, q_i\rangle \quad \text{for } i \le j,

and

r_{jj} = \|u_j\|.

The QR factorization is one of the main computational uses of Gram-Schmidt.
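A minimal sketch of QR via Gram-Schmidt, recording the coefficients r_{ij} as it goes. Matrices are plain lists of rows; this is illustrative, not a production factorization:

```python
import math

def qr_gram_schmidt(A):
    """QR via classical Gram-Schmidt on the columns of A (list of rows).

    Assumes the columns are linearly independent. Returns Q (m x n, with
    orthonormal columns) and R (n x n, upper triangular) with A = Q R.
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(cols):
        u = list(a)
        for i, q in enumerate(q_cols):
            R[i][j] = sum(x * y for x, y in zip(a, q))  # r_ij = <a_j, q_i>
            u = [x - R[i][j] * y for x, y in zip(u, q)]
        R[j][j] = math.sqrt(sum(x * x for x in u))      # r_jj = ||u_j||
        q_cols.append([x / R[j][j] for x in u])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R

A = [[1.0, 1.0], [1.0, 0.0]]
Q, R = qr_gram_schmidt(A)
```

Multiplying Q by R should reproduce A up to rounding, and R should be upper triangular by construction.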

50.13 Classical Gram-Schmidt

The formula

u_j = v_j - \sum_{i=1}^{j-1} \langle v_j, q_i\rangle q_i

is called classical Gram-Schmidt.

It subtracts all projections of v_j using inner products computed against the original vector v_j.

In exact arithmetic, this produces an orthonormal basis. In floating point arithmetic, the computed vectors may lose orthogonality when the original vectors are nearly linearly dependent.

This numerical weakness matters in practical computation. For stable algorithms, one often uses modified Gram-Schmidt, Householder reflections, or Givens rotations.

50.14 Modified Gram-Schmidt

Modified Gram-Schmidt subtracts projections one at a time from the current residual.

Start with

w = v_j.

For i = 1, \ldots, j-1, compute

r_{ij} = \langle w, q_i\rangle,

then replace

w \leftarrow w - r_{ij} q_i.

After all previous directions have been removed, set

u_j = w, \qquad q_j = \frac{u_j}{\|u_j\|}.

In exact arithmetic, classical and modified Gram-Schmidt give the same result. In floating point arithmetic, modified Gram-Schmidt usually preserves orthogonality better.

The reason is that each projection is removed from the current residual rather than from the original vector.
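A sketch of modified Gram-Schmidt; the only change from the classical version is that each inner product is taken against the current residual w rather than the original v:

```python
import math

def modified_gram_schmidt(vectors):
    """Modified Gram-Schmidt: project out each direction from the residual."""
    qs = []
    for v in vectors:
        w = list(v)
        for q in qs:
            r = sum(a * b for a, b in zip(w, q))  # <w, q>, not <v, q>
            w = [x - r * y for x, y in zip(w, q)]
        nrm = math.sqrt(sum(x * x for x in w))
        qs.append([x / nrm for x in w])
    return qs

# The R^3 example from earlier in the chapter:
qs = modified_gram_schmidt([[1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
```

In exact arithmetic this returns the same q_1, q_2, q_3 as the classical formula.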

50.15 Linear Dependence and Zero Residuals

If the input list is linearly dependent, Gram-Schmidt produces a zero residual at some step.

That is, for some j,

u_j = 0.

This means

v_j \in \operatorname{span}(v_1, \ldots, v_{j-1}).

In this case, v_j contributes no new direction and cannot be normalized.

A practical version of the algorithm skips zero residuals. If u_j = 0, discard v_j. The remaining nonzero residuals form an orthogonal basis for the span of the original list.

This gives a method for extracting a basis from a spanning set.
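A practical sketch that drops near-zero residuals (the tolerance is an arbitrary illustrative choice), so linearly dependent inputs are simply skipped:

```python
import math

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt that discards (near-)zero residuals, so a dependent
    input list still yields an orthonormal basis of its span."""
    qs = []
    for v in vectors:
        w = list(v)
        for q in qs:
            r = sum(a * b for a, b in zip(w, q))
            w = [x - r * y for x, y in zip(w, q)]
        nrm = math.sqrt(sum(x * x for x in w))
        if nrm > tol:  # keep only genuinely new directions
            qs.append([x / nrm for x in w])
    return qs

# [2, 2, 0] is dependent on [1, 1, 0], so only two basis vectors survive.
basis = orthonormal_basis([[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [0.0, 1.0, 1.0]])
print(len(basis))  # 2
```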

50.16 Gram-Schmidt as Projection Residual

At each step, let

W_{j-1} = \operatorname{span}(q_1, \ldots, q_{j-1}).

The vector

\sum_{i=1}^{j-1} \langle v_j, q_i\rangle q_i

is the orthogonal projection of v_j onto W_{j-1}.

Thus

u_j = v_j - \operatorname{proj}_{W_{j-1}}(v_j).

So u_j is the residual after projecting v_j onto the old subspace.

This gives the geometric interpretation:

v_j = \text{old part} + \text{new orthogonal part}.

Gram-Schmidt keeps the new orthogonal part and normalizes it.

50.17 Gram Matrices and Nonzero Pivots

Let

G_j = \big[\langle v_i, v_l\rangle\big]_{i,l=1}^{j}

be the Gram matrix of the first j input vectors.

If v_1, \ldots, v_j are linearly independent, then G_j is positive definite. This guarantees that the construction has no zero residuals.

The squared lengths

\|u_j\|^2

measure the amount of genuinely new direction contributed by v_j beyond the earlier span.

If \|u_j\| is very small, then v_j is nearly dependent on the earlier vectors. This is a warning sign for numerical instability.

50.18 Function Space Example

Gram-Schmidt also applies to function spaces.

Let V be the space of polynomials on [-1, 1] with inner product

\langle f, g\rangle = \int_{-1}^{1} f(x) g(x)\,dx.

Start with

1, \quad x, \quad x^2, \quad \ldots

Applying Gram-Schmidt produces an orthogonal sequence of polynomials. With a conventional normalization, this gives the Legendre polynomials.

For example,

p_0(x) = 1.

Since

\langle x, 1\rangle = \int_{-1}^{1} x\,dx = 0,

the polynomial x is already orthogonal to 1.

For x^2, subtract its projection onto 1. Since

\langle x^2, 1\rangle = \int_{-1}^{1} x^2\,dx = \frac{2}{3},

and

\langle 1, 1\rangle = 2,

the projection coefficient is

\frac{\langle x^2, 1\rangle}{\langle 1, 1\rangle} = \frac{1}{3}.

Thus the new orthogonal polynomial is

x^2 - \frac{1}{3}.

After scaling, this corresponds to the second Legendre polynomial.
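This computation can be checked with exact rational arithmetic. Representing a polynomial by its coefficient list and using the fact that \int_{-1}^{1} x^k\,dx equals 2/(k+1) for even k and 0 for odd k (function names are illustrative):

```python
from fractions import Fraction

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1], for coefficient lists
    (p[i] is the coefficient of x^i)."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:  # odd powers integrate to zero on [-1, 1]
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

def gs_polynomials(degree):
    """Orthogonal (unnormalized) Gram-Schmidt on 1, x, x^2, ..., x^degree."""
    basis = []
    for d in range(degree + 1):
        u = [Fraction(0)] * d + [Fraction(1)]  # the monomial x^d
        for w in basis:
            c = inner(u, w) / inner(w, w)
            padded = w + [Fraction(0)] * (len(u) - len(w))
            u = [a - c * b for a, b in zip(u, padded)]
        basis.append(u)
    return basis

polys = gs_polynomials(2)
print(polys[2])  # coefficients of x^2 - 1/3, as derived above
```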

50.19 Numerical Remarks

Gram-Schmidt is conceptually important, but its classical form can be numerically fragile. When vectors are nearly linearly dependent, subtraction may remove almost equal quantities. This causes cancellation and loss of orthogonality.

For numerical QR factorization, Householder transformations are usually preferred because of their stability. Modified Gram-Schmidt is often accurate enough and is easier to use inside iterative methods.
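The difference is easy to demonstrate. With three nearly dependent vectors (a standard stress test; the choice of eps near the square root of machine epsilon is illustrative), classical Gram-Schmidt loses orthogonality badly while the modified version does not:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def cgs(vs):
    # Classical: inner products against the ORIGINAL vector v.
    qs = []
    for v in vs:
        u = list(v)
        for q in qs:
            c = dot(v, q)
            u = [x - c * y for x, y in zip(u, q)]
        qs.append(normalize(u))
    return qs

def mgs(vs):
    # Modified: inner products against the CURRENT residual w.
    qs = []
    for v in vs:
        w = list(v)
        for q in qs:
            c = dot(w, q)
            w = [x - c * y for x, y in zip(w, q)]
        qs.append(normalize(w))
    return qs

eps = 1e-8  # near sqrt(machine epsilon), so eps**2 is lost in 1 + eps**2
vs = [[1.0, eps, 0.0, 0.0], [1.0, 0.0, eps, 0.0], [1.0, 0.0, 0.0, eps]]
Q_c, Q_m = cgs(vs), mgs(vs)
print(abs(dot(Q_c[1], Q_c[2])))  # about 0.5: orthogonality is lost
print(abs(dot(Q_m[1], Q_m[2])))  # tiny: orthogonality is preserved
```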

The correct algorithm depends on the context:

Classical Gram-Schmidt: theory and simple derivations.
Modified Gram-Schmidt: iterative methods, improved orthogonality.
Householder QR: stable dense QR factorization.
Givens rotations: sparse QR, selective zeroing.

The mathematical goal is the same: replace a basis by an orthonormal basis for the same subspace.

50.20 Summary

Gram-Schmidt orthogonalization converts a linearly independent list

v_1, \ldots, v_k

into an orthonormal list

q_1, \ldots, q_k

with the same successive spans.

The construction is

u_j = v_j - \sum_{i=1}^{j-1} \langle v_j, q_i\rangle q_i, \qquad q_j = \frac{u_j}{\|u_j\|}.

At each step, the algorithm subtracts the projection onto the old subspace and keeps the new orthogonal residual.

Gram-Schmidt explains how orthonormal bases arise, why QR factorization exists, and how projection can be performed by successive removal of old components.