The Gram-Schmidt process converts a linearly independent list of vectors into an orthogonal or orthonormal list spanning the same subspace. It is one of the standard constructions in inner product spaces and is the basic theoretical source of the QR factorization used in numerical linear algebra. The process repeatedly subtracts projections onto previously constructed directions, leaving a residual vector orthogonal to all earlier ones.
50.1 The Problem
Let
v1,v2,…,vk
be linearly independent vectors in an inner product space V. We want to construct vectors
q1,q2,…,qk
such that:
- Same span: span(q1,…,qj) = span(v1,…,vj) for each j.
- Orthogonality: ⟨qi,qj⟩ = 0 whenever i ≠ j.
- Normalization: ∥qj∥ = 1 for each j.
The output is an orthonormal basis for the same subspace spanned by the original vectors.
The key idea is simple. Keep the part of each new vector that points in a new direction. Remove all parts that already lie in the directions constructed earlier.
50.2 Projection onto a Unit Vector
Suppose q is a unit vector. The projection of v onto the line spanned by q is
projq(v)=⟨v,q⟩q.
This formula is simple because ∥q∥=1. For a non-unit vector u, the projection would be
proj_u(v) = (⟨v,u⟩ / ⟨u,u⟩) u.
Gram-Schmidt uses these projections to remove the parts of a vector that lie in old directions.
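In coordinates, both projection formulas are one-liners. A minimal numerical sketch (using NumPy; the vectors are arbitrary examples):

```python
import numpy as np

v = np.array([3.0, 1.0])
u = np.array([2.0, 0.0])          # a non-unit vector

# proj_u(v) = (<v,u> / <u,u>) u
proj = (v @ u) / (u @ u) * u

# the residual v - proj is orthogonal to u
residual = v - proj
print(proj, residual @ u)          # [3. 0.] 0.0
```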
50.3 First Step
Start with the first vector v1. Since the list is linearly independent, v1 ≠ 0.
Define
u1=v1.
Normalize it:
q1 = u1 / ∥u1∥.
Then q1 is a unit vector and
span(q1)=span(v1).
The first direction is unchanged except for scaling.
50.4 Second Step
The second vector v2 may have a component in the direction of q1. Remove that component:
u2=v2−⟨v2,q1⟩q1.
Then u2 is orthogonal to q1. Indeed,
⟨u2,q1⟩=⟨v2−⟨v2,q1⟩q1,q1⟩.
Using linearity,
⟨u2,q1⟩=⟨v2,q1⟩−⟨v2,q1⟩⟨q1,q1⟩.
Since q1 is a unit vector,
⟨q1,q1⟩=1.
Therefore
⟨u2,q1⟩=0.
Normalize:
q2 = u2 / ∥u2∥.
Now q1,q2 are orthonormal and span the same subspace as v1,v2.
50.5 General Step
Assume that
q1,…,qj−1
have already been constructed and are orthonormal.
To construct the next vector, subtract from vj its projections onto all previous qi:
uj = vj − ∑_{i=1}^{j−1} ⟨vj,qi⟩ qi.
Then normalize:
qj = uj / ∥uj∥.
The vector uj is orthogonal to every earlier qi. For m<j,
⟨uj,qm⟩ = ⟨vj − ∑_{i=1}^{j−1} ⟨vj,qi⟩ qi, qm⟩.
By linearity,
⟨uj,qm⟩ = ⟨vj,qm⟩ − ∑_{i=1}^{j−1} ⟨vj,qi⟩ ⟨qi,qm⟩.
Since the qi are orthonormal,
⟨qi,qm⟩=δim.
Thus only one term remains:
⟨uj,qm⟩=⟨vj,qm⟩−⟨vj,qm⟩=0.
So uj is orthogonal to all previous directions.
50.6 Why uj Is Nonzero
The vector uj must be nonzero. If
uj=0,
then
vj = ∑_{i=1}^{j−1} ⟨vj,qi⟩ qi.
This would imply
vj∈span(q1,…,qj−1).
But
span(q1,…,qj−1)=span(v1,…,vj−1).
Thus vj would lie in the span of the earlier vi, contradicting linear independence.
Therefore uj ≠ 0, so normalization is valid.
50.7 The Algorithm
Given linearly independent vectors
v1,…,vk,
the Gram-Schmidt algorithm is:
u1 = v1, q1 = u1 / ∥u1∥.
For j = 2, …, k,
uj = vj − ∑_{i=1}^{j−1} ⟨vj,qi⟩ qi,
and
qj = uj / ∥uj∥.
The result is an orthonormal list
q1,…,qk
such that
span(q1,…,qj)=span(v1,…,vj)
for every j.
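The algorithm translates directly into code. A sketch in Python with NumPy (the function name is illustrative; the inputs are assumed linearly independent):

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: returns an orthonormal list with the
    same successive spans as the input (assumed linearly independent)."""
    Q = []
    for v in vectors:
        # subtract the projections onto all previously built directions
        u = v - sum((v @ q) * q for q in Q)
        Q.append(u / np.linalg.norm(u))
    return Q
```

For the first vector the sum is empty, so u1 = v1, exactly as in the derivation.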
50.8 Example in R2
Let
v1 = [1, 1]ᵀ, v2 = [1, 0]ᵀ.
First,
u1 = v1 = [1, 1]ᵀ.
Its norm is
∥u1∥ = √(1² + 1²) = √2.
Thus
q1 = (1/√2) [1, 1]ᵀ.
Now remove from v2 its projection onto q1:
⟨v2,q1⟩ = ⟨[1, 0]ᵀ, (1/√2) [1, 1]ᵀ⟩ = 1/√2.
Therefore
u2 = v2 − ⟨v2,q1⟩ q1 = [1, 0]ᵀ − (1/√2)·(1/√2) [1, 1]ᵀ.
So
u2 = [1, 0]ᵀ − (1/2) [1, 1]ᵀ = [1/2, −1/2]ᵀ.
The norm is
∥u2∥ = √(1/4 + 1/4) = 1/√2.
Thus
q2 = u2 / ∥u2∥ = (1/√2) [1, −1]ᵀ.
The final orthonormal basis is
q1 = (1/√2) [1, 1]ᵀ, q2 = (1/√2) [1, −1]ᵀ.
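The hand computation can be checked numerically; a small NumPy sketch mirroring the steps above:

```python
import numpy as np

v1, v2 = np.array([1.0, 1.0]), np.array([1.0, 0.0])

q1 = v1 / np.linalg.norm(v1)        # (1/sqrt(2)) [1, 1]
u2 = v2 - (v2 @ q1) * q1            # remove the q1 component
q2 = u2 / np.linalg.norm(u2)        # (1/sqrt(2)) [1, -1]

print(q1 @ q2)                      # 0 up to rounding
```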
50.9 Example in R3
Let
v1 = [1, 0, 1]ᵀ, v2 = [1, 1, 0]ᵀ, v3 = [0, 1, 1]ᵀ.
First,
u1 = v1.
Since
∥u1∥ = √2,
we have
q1 = (1/√2) [1, 0, 1]ᵀ.
Next,
⟨v2,q1⟩ = 1/√2.
Thus
u2 = v2 − (1/√2) q1 = [1, 1, 0]ᵀ − (1/2) [1, 0, 1]ᵀ = [1/2, 1, −1/2]ᵀ.
Its norm is
∥u2∥ = √(1/4 + 1 + 1/4) = √(3/2).
Hence
q2 = (1/√6) [1, 2, −1]ᵀ.
Now compute u3:
u3 = v3 − ⟨v3,q1⟩ q1 − ⟨v3,q2⟩ q2.
We have
⟨v3,q1⟩ = 1/√2,
and
⟨v3,q2⟩ = 1/√6.
Therefore
u3 = [0, 1, 1]ᵀ − (1/2) [1, 0, 1]ᵀ − (1/6) [1, 2, −1]ᵀ.
This gives
u3 = [−2/3, 2/3, 2/3]ᵀ.
Its norm is
∥u3∥ = √(4/9 + 4/9 + 4/9) = 2/√3.
Thus
q3 = (1/√3) [−1, 1, 1]ᵀ.
The three vectors q1,q2,q3 form an orthonormal basis of R3.
50.10 Orthogonal Version
Sometimes one wants an orthogonal basis rather than an orthonormal basis. In that case, stop before normalization.
Define
u1=v1,
and
uj = vj − ∑_{i=1}^{j−1} (⟨vj,ui⟩ / ⟨ui,ui⟩) ui.
Then
u1,…,uk
are nonzero and mutually orthogonal.
The normalized vectors are
qj = uj / ∥uj∥.
The orthogonal version is often easier to write when the intermediate vectors are not unit vectors.
50.11 Span Preservation
Gram-Schmidt preserves spans step by step:
span(u1,…,uj)=span(v1,…,vj).
This follows from the construction. Each uj is built from vj and earlier qi, which lie in the earlier span. Hence
uj∈span(v1,…,vj).
Conversely,
vj = uj + ∑_{i=1}^{j−1} ⟨vj,qi⟩ qi,
so vj lies in the span of u1,…,uj.
Thus the two lists generate the same subspace at each stage.
50.12 QR Factorization
Gram-Schmidt gives a factorization of a matrix.
Let A be an m×n matrix with linearly independent columns
a1,…,an.
Apply Gram-Schmidt to these columns and obtain orthonormal vectors
q1,…,qn.
Let Q be the m×n matrix with columns qi. Then
QᵀQ = I.
Each column aj lies in
span(q1,…,qj).
Thus
aj=r1jq1+⋯+rjjqj.
Collecting these equations gives
A=QR,
where R is upper triangular.
The entries of R are
rij = ⟨aj,qi⟩ for i ≤ j,
and
rjj=∥uj∥.
The QR factorization is one of the main computational uses of Gram-Schmidt.
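As a sketch, the factorization can be computed by running Gram-Schmidt on the columns and recording the coefficients in R (NumPy; illustrative teaching code, not a production QR):

```python
import numpy as np

def qr_gram_schmidt(A):
    """QR via classical Gram-Schmidt on the columns of A
    (assumed linearly independent)."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        u = A[:, j].copy()
        for i in range(j):
            R[i, j] = A[:, j] @ Q[:, i]   # r_ij = <a_j, q_i>
            u -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(u)       # r_jj = ||u_j||
        Q[:, j] = u / R[j, j]
    return Q, R

A = np.array([[1.0, 1.0],
              [1.0, 0.0]])
Q, R = qr_gram_schmidt(A)
print(np.allclose(Q @ R, A))              # True
```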
50.13 Classical Gram-Schmidt
The formula
uj = vj − ∑_{i=1}^{j−1} ⟨vj,qi⟩ qi
is called classical Gram-Schmidt.
It subtracts all projections of vj using inner products computed against the original vector vj.
In exact arithmetic, this produces an orthonormal basis. In floating point arithmetic, the computed vectors may lose orthogonality when the original vectors are nearly linearly dependent.
This numerical weakness matters in practical computation. For stable algorithms, one often uses modified Gram-Schmidt, Householder reflections, or Givens rotations.
50.14 Modified Gram-Schmidt
Modified Gram-Schmidt subtracts projections one at a time from the current residual.
Start with
w=vj.
For i=1,…,j−1, compute
rij=⟨w,qi⟩,
then replace
w←w−rijqi.
After all previous directions have been removed, set
uj = w, qj = uj / ∥uj∥.
In exact arithmetic, classical and modified Gram-Schmidt give the same result. In floating point arithmetic, modified Gram-Schmidt usually preserves orthogonality better.
The reason is that each projection is removed from the current residual rather than from the original vector.
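A sketch of the modified variant in NumPy, run on a classic near-dependent example (the perturbation eps is chosen so that 1 + eps² rounds to 1 in double precision; the function name is illustrative):

```python
import numpy as np

def modified_gram_schmidt(A):
    """MGS: each projection is removed from the running residual w."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        w = A[:, j].copy()
        for i in range(j):
            w -= (w @ Q[:, i]) * Q[:, i]   # r_ij = <w, q_i>, then update w
        Q[:, j] = w / np.linalg.norm(w)
    return Q

# Nearly dependent columns: small perturbations of the same vector.
eps = 1e-8
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])
Q = modified_gram_schmidt(A)
print(np.linalg.norm(Q.T @ Q - np.eye(3)))  # small (roughly 1e-8); classical
                                            # GS loses orthogonality badly here
```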
50.15 Linear Dependence and Zero Residuals
If the input list is linearly dependent, Gram-Schmidt produces a zero residual at some step.
That is, for some j,
uj=0.
This means
vj∈span(v1,…,vj−1).
In this case, vj contributes no new direction and cannot be normalized.
A practical version of the algorithm skips zero residuals. If uj=0, discard vj. The remaining nonzero residuals form an orthogonal basis for the span of the original list.
This gives a method for extracting a basis from a spanning set.
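A practical sketch with a tolerance for "numerically zero" residuals (NumPy; the threshold value is an illustrative choice):

```python
import numpy as np

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt on a possibly dependent list: residuals below tol
    are discarded, so the output is an orthonormal basis of the span."""
    Q = []
    for v in vectors:
        u = v - sum((v @ q) * q for q in Q)
        norm = np.linalg.norm(u)
        if norm > tol:                 # keep only genuinely new directions
            Q.append(u / norm)
    return Q

vs = [np.array([1.0, 0.0, 1.0]),
      np.array([2.0, 0.0, 2.0]),     # dependent: twice the first vector
      np.array([0.0, 1.0, 0.0])]
print(len(orthonormal_basis(vs)))    # 2
```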
50.16 Gram-Schmidt as Projection Residual
At each step, let
Wj−1=span(q1,…,qj−1).
The vector
∑_{i=1}^{j−1} ⟨vj,qi⟩ qi
is the orthogonal projection of vj onto Wj−1.
Thus
uj = vj − proj_{Wj−1}(vj).
So uj is the residual after projecting vj onto the old subspace.
This gives the geometric interpretation:
vj=old part+new orthogonal part.
Gram-Schmidt keeps the new orthogonal part and normalizes it.
50.17 Gram Matrices and Nonzero Pivots
Let
Gj = [⟨vi,vl⟩], i,l = 1, …, j,
be the Gram matrix of the first j input vectors.
If v1,…,vj are linearly independent, then Gj is positive definite. This guarantees that the construction has no zero residuals.
The squared lengths
∥uj∥2
measure the amount of genuinely new direction contributed by vj beyond the earlier span.
If ∥uj∥ is very small, then vj is nearly dependent on the earlier vectors. This is a warning sign for numerical instability.
50.18 Function Space Example
Gram-Schmidt also applies to function spaces.
Let V be the space of polynomials on [−1,1] with inner product
⟨f,g⟩ = ∫_{−1}^{1} f(x) g(x) dx.
Start with
1, x, x², …
Applying Gram-Schmidt produces an orthogonal sequence of polynomials. With a conventional normalization, this gives the Legendre polynomials.
For example,
p0(x)=1.
Since
⟨x,1⟩ = ∫_{−1}^{1} x dx = 0,
the polynomial x is already orthogonal to 1.
For x², subtract its projection onto 1. Since
⟨x²,1⟩ = ∫_{−1}^{1} x² dx = 2/3,
and
⟨1,1⟩ = 2,
the projection coefficient is
⟨x²,1⟩ / ⟨1,1⟩ = 1/3.
Thus the new orthogonal polynomial is
x² − 1/3.
After scaling, this corresponds to the degree-two Legendre polynomial P2(x) = (3x² − 1)/2.
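The computation above can be reproduced exactly by representing polynomials as coefficient lists and using the closed-form monomial inner products ⟨x^m, x^n⟩ = 2/(m+n+1) for m+n even and 0 otherwise. A sketch with exact rational arithmetic (function names are illustrative):

```python
from fractions import Fraction

def inner(p, q):
    """<p,q> = integral over [-1,1] of p(x) q(x) dx, where p[i] is
    the coefficient of x^i."""
    s = Fraction(0)
    for m, a in enumerate(p):
        for n, b in enumerate(q):
            if (m + n) % 2 == 0:          # odd powers integrate to 0
                s += Fraction(a) * Fraction(b) * Fraction(2, m + n + 1)
    return s

def subtract(p, c, q):
    """Return p - c*q as a coefficient list."""
    out = [Fraction(x) for x in p] + [Fraction(0)] * max(0, len(q) - len(p))
    for n, b in enumerate(q):
        out[n] -= c * Fraction(b)
    return out

basis = []
for v in [[1], [0, 1], [0, 0, 1]]:        # the monomials 1, x, x^2
    u = v
    for w in basis:
        u = subtract(u, inner(u, w) / inner(w, w), w)
    basis.append(u)
print(basis[2])                            # coefficients of x^2 - 1/3
```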
50.19 Numerical Remarks
Gram-Schmidt is conceptually important, but its classical form can be numerically fragile. When vectors are nearly linearly dependent, subtraction may remove almost equal quantities. This causes cancellation and loss of orthogonality.
For numerical QR factorization, Householder transformations are usually preferred for high stability. Modified Gram-Schmidt is often acceptable and easier to implement in iterative methods.
The correct algorithm depends on the context:
- Classical Gram-Schmidt: theory, simple derivations.
- Modified Gram-Schmidt: iterative methods, improved orthogonality.
- Householder QR: stable dense QR factorization.
- Givens rotations: sparse QR, selective zeroing.
The mathematical goal is the same: replace a basis by an orthonormal basis for the same subspace.
50.20 Summary
Gram-Schmidt orthogonalization converts a linearly independent list
v1,…,vk
into an orthonormal list
q1,…,qk
with the same successive spans.
The construction is
uj = vj − ∑_{i=1}^{j−1} ⟨vj,qi⟩ qi, qj = uj / ∥uj∥.
At each step, the algorithm subtracts the projection onto the old subspace and keeps the new orthogonal residual.
Gram-Schmidt explains how orthonormal bases arise, why QR factorization exists, and how projection can be performed by successive removal of old components.