A least squares problem asks for the best approximate solution to a linear system that may have no exact solution. Instead of requiring

$$
Ax = b,
$$

we choose $x$ so that $Ax$ is as close as possible to $b$. The usual measure of closeness is the Euclidean norm of the residual:

$$
\|b - Ax\|.
$$

Equivalently, we minimize the squared residual norm:

$$
\|b - Ax\|^2.
$$

This problem appears whenever there are more equations than unknowns, noisy measurements, inconsistent constraints, or data that should be fitted by a simple model. Standard linear algebra texts describe least squares as solving $Ax = b$ as closely as possible by minimizing the squared error $\|Ax - b\|^2$.
52.1 The Basic Problem
Let

$$
A \in \mathbb{R}^{m \times n}, \qquad b \in \mathbb{R}^m.
$$

The least squares problem is

$$
\min_{x \in \mathbb{R}^n} \|b - Ax\|.
$$

Since the square root is increasing, this is equivalent to

$$
\min_{x \in \mathbb{R}^n} \|b - Ax\|^2.
$$

The vector

$$
r = b - Ax
$$

is called the residual. The least squares solution is a vector $\hat{x}$ for which

$$
\|b - A\hat{x}\| \le \|b - Ax\|
$$

for every $x \in \mathbb{R}^n$.

The vector $A\hat{x}$ is the closest vector to $b$ among all vectors in the column space of $A$.
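As a concrete illustration, here is a minimal NumPy sketch (the matrix and vector are made-up values) that solves a small least squares problem; `np.linalg.lstsq` returns a minimizer of the squared residual norm.

```python
import numpy as np

# An overdetermined system: 3 equations, 2 unknowns (illustrative values).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])

# lstsq returns a minimizer of ||b - A x||^2.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

print("least squares solution:", x_hat)
print("residual norm:", np.linalg.norm(b - A @ x_hat))
```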
52.2 Why Exact Solutions May Fail
The equation

$$
Ax = b
$$

has a solution exactly when

$$
b \in \operatorname{Col}(A).
$$

If $b$ does not lie in the column space, no vector $x$ satisfies the system exactly.

This is common when $m > n$, where there are more equations than unknowns. Such systems are called overdetermined. An overdetermined system may be consistent, but in applications with measurement error it usually is not.

Least squares replaces the exact equation by the approximation problem

$$
\min_x \|b - Ax\|.
$$

The goal is to choose the vector in $\operatorname{Col}(A)$ nearest to $b$.
52.3 Geometric Interpretation
The set of all possible vectors $Ax$ is the column space of $A$:

$$
\operatorname{Col}(A) = \{\, Ax : x \in \mathbb{R}^n \,\}.
$$

Thus the least squares problem is the problem of projecting $b$ onto $\operatorname{Col}(A)$.

Let

$$
\hat{b} = \operatorname{proj}_{\operatorname{Col}(A)} b.
$$

Then

$$
\hat{b} = A\hat{x}
$$

for at least one vector $\hat{x}$. The residual is

$$
r = b - \hat{b}.
$$

The defining condition for orthogonal projection is

$$
b - \hat{b} \perp \operatorname{Col}(A).
$$

Thus least squares is an orthogonal projection problem. The method of least squares can be understood as finding the projection of a vector onto a column space.
52.4 Orthogonality Condition
Since the residual

$$
r = b - A\hat{x}
$$

is orthogonal to the column space of $A$, it is orthogonal to every column of $A$. This condition can be written as

$$
A^T r = 0.
$$

Substituting $r = b - A\hat{x}$, we obtain

$$
A^T (b - A\hat{x}) = 0.
$$

Expanding gives

$$
A^T b - A^T A \hat{x} = 0.
$$

Therefore

$$
A^T A \hat{x} = A^T b.
$$

These equations are called the normal equations. They characterize least squares solutions in the Euclidean norm.
52.5 The Normal Equations
The normal equations are

$$
A^T A \hat{x} = A^T b.
$$

They form an $n \times n$ linear system.

If $A$ has linearly independent columns, then $A^T A$ is invertible, and the least squares solution is unique:

$$
\hat{x} = (A^T A)^{-1} A^T b.
$$

If $A$ does not have linearly independent columns, the normal equations may have infinitely many solutions. In that case, every solution of the normal equations gives the same projected vector $A\hat{x}$, but the coefficient vector $\hat{x}$ may not be unique.

The normal equations are central because they replace an inconsistent $m \times n$ system by a consistent square system.
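A minimal sketch of solving the normal equations directly in NumPy (same illustrative numbers as above); as Section 52.19 explains, this route is shown for exposition rather than as the recommended algorithm.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])

# Form the n x n normal equations A^T A x = A^T b and solve them.
AtA = A.T @ A
Atb = A.T @ b
x_hat = np.linalg.solve(AtA, Atb)

# The answer agrees with NumPy's least squares routine.
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```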
52.6 Why $A^T A$ Is Invertible Under Full Column Rank
Assume the columns of $A$ are linearly independent. Then $A^T A$ is invertible. To see why, suppose

$$
A^T A x = 0.
$$

Multiply on the left by $x^T$:

$$
x^T A^T A x = 0.
$$

But

$$
x^T A^T A x = (Ax)^T (Ax) = \|Ax\|^2.
$$

Thus

$$
\|Ax\|^2 = 0.
$$

So

$$
Ax = 0.
$$

Since $A$ has linearly independent columns,

$$
x = 0.
$$

Therefore $A^T A$ has trivial null space and is invertible.
52.7 Projection Matrix
If $A$ has full column rank, the projection of $b$ onto $\operatorname{Col}(A)$ is

$$
\hat{b} = A\hat{x}.
$$

Using

$$
\hat{x} = (A^T A)^{-1} A^T b,
$$

we get

$$
\hat{b} = A (A^T A)^{-1} A^T b.
$$

Thus the projection matrix onto the column space of $A$ is

$$
P = A (A^T A)^{-1} A^T.
$$

This matrix satisfies

$$
P^2 = P
$$

and

$$
P^T = P.
$$

Hence it is an orthogonal projection matrix.

The residual is

$$
r = b - Pb = (I - P)b.
$$

Since $Pb \in \operatorname{Col}(A)$ and $(I - P)b \perp \operatorname{Col}(A)$, we have

$$
b = Pb + (I - P)b.
$$

This is the orthogonal decomposition behind least squares.
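The projection identities are easy to verify numerically; a short sketch, again with an illustrative matrix:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])

# Projection matrix onto Col(A); fine for a tiny example,
# though one would not form P explicitly at scale.
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P @ P, P))  # idempotent: P^2 = P
print(np.allclose(P.T, P))    # symmetric: P^T = P
```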
52.8 The Objective Function
Define

$$
f(x) = \|b - Ax\|^2.
$$

Then

$$
f(x) = (b - Ax)^T (b - Ax).
$$

Expanding gives

$$
f(x) = b^T b - 2 x^T A^T b + x^T A^T A x.
$$

This is a quadratic function of $x$. Its gradient is

$$
\nabla f(x) = 2 A^T A x - 2 A^T b.
$$

At a minimizer $\hat{x}$, the gradient must vanish:

$$
\nabla f(\hat{x}) = 0.
$$

Hence

$$
A^T A \hat{x} = A^T b.
$$

Thus the normal equations can also be derived from calculus.
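As a sanity check on the gradient formula, here is a sketch comparing it against central finite differences at an arbitrary point (all numbers are made up):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])

f = lambda x: np.sum((b - A @ x) ** 2)
grad = lambda x: 2 * (A.T @ A @ x - A.T @ b)

x0 = np.array([0.3, -0.7])
eps = 1e-6
# Central finite differences, one coordinate at a time.
fd = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
               for e in np.eye(2)])
print(np.allclose(grad(x0), fd, atol=1e-5))  # gradients agree

# At the least squares solution the gradient vanishes.
x_hat = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(grad(x_hat), 0.0, atol=1e-10))
```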
52.9 Example: An Inconsistent System
Consider

$$
A = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 3 \end{bmatrix}.
$$

The system

$$
Ax = b
$$

means

$$
x = 1, \qquad x = 3.
$$

There is no exact solution.

The least squares problem is

$$
\min_x \|b - Ax\|^2.
$$

That is,

$$
\min_x \left[ (1 - x)^2 + (3 - x)^2 \right].
$$

The normal equations are

$$
A^T A \hat{x} = A^T b.
$$

Compute

$$
A^T A = 2
$$

and

$$
A^T b = 4.
$$

Thus

$$
2 \hat{x} = 4,
$$

so

$$
\hat{x} = 2.
$$

The best approximate equation is

$$
A \hat{x} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \approx b.
$$

The residual is

$$
r = b - A\hat{x} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.
$$

Check orthogonality:

$$
A^T r = (1)(-1) + (1)(1) = 0.
$$

Thus the residual is orthogonal to the column space of $A$.
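The same computation in NumPy, confirming the solution and the orthogonality of the residual:

```python
import numpy as np

A = np.array([[1.0], [1.0]])
b = np.array([1.0, 3.0])

x_hat = np.linalg.lstsq(A, b, rcond=None)[0]
r = b - A @ x_hat

print(x_hat)    # [2.]
print(r)        # [-1.  1.]
print(A.T @ r)  # [0.] -- residual orthogonal to Col(A)
```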
52.10 Least Squares Line Fitting
Suppose data points

$$
(t_1, y_1),\ (t_2, y_2),\ \dots,\ (t_m, y_m)
$$

are to be fitted by a line

$$
y = c_0 + c_1 t.
$$

For each data point, the model gives

$$
c_0 + c_1 t_i \approx y_i, \qquad i = 1, \dots, m.
$$

Collecting all $m$ equations gives

$$
\begin{bmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \end{bmatrix}
\approx
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}.
$$

This has the form

$$
A x \approx b, \qquad x = \begin{bmatrix} c_0 \\ c_1 \end{bmatrix}.
$$

The least squares estimate is

$$
\hat{x} = (A^T A)^{-1} A^T b,
$$

provided $A$ has full column rank.

The first column of $A$ represents the intercept. The second column represents the slope. More complicated models add more columns.
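In code, the design matrix is a column of ones stacked beside the sample points; a minimal sketch with made-up data:

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])  # made-up sample points
y = np.array([1.1, 1.9, 4.2, 5.8])  # made-up measurements

# Design matrix: intercept column, then slope column.
A = np.column_stack([np.ones_like(t), t])

c0, c1 = np.linalg.lstsq(A, y, rcond=None)[0]
print(f"fitted line: y = {c0:.3f} + {c1:.3f} t")
```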
52.11 Example: Fitting a Line
Fit a line

$$
y = c_0 + c_1 t
$$

to the three data points

$$
(0, 1), \qquad (1, 2), \qquad (2, 4).
$$

The design matrix and data vector are

$$
A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}, \qquad
b = \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}.
$$

Compute

$$
A^T A = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}.
$$

Also,

$$
A^T b = \begin{bmatrix} 7 \\ 10 \end{bmatrix}.
$$

The normal equations are

$$
\begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \end{bmatrix}
=
\begin{bmatrix} 7 \\ 10 \end{bmatrix}.
$$

Solving,

$$
3 c_0 + 3 c_1 = 7, \qquad 3 c_0 + 5 c_1 = 10.
$$

Subtracting gives

$$
2 c_1 = 3,
$$

so

$$
c_1 = \tfrac{3}{2}.
$$

Then

$$
3 c_0 + 3 \cdot \tfrac{3}{2} = 7,
$$

so

$$
c_0 = \tfrac{5}{6}.
$$

The least squares line is

$$
y = \tfrac{5}{6} + \tfrac{3}{2} t.
$$
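A quick numerical confirmation of these coefficients:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])

c = np.linalg.lstsq(A, b, rcond=None)[0]
print(c)                           # [0.8333... 1.5]
print(np.allclose(c, [5/6, 3/2]))  # True
```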
52.12 Residuals in Data Fitting
For the fitted line above, the predicted values are

$$
\hat{y}_i = \tfrac{5}{6} + \tfrac{3}{2} t_i.
$$

Thus

$$
\hat{y} = \begin{bmatrix} \tfrac{5}{6} \\ \tfrac{7}{3} \\ \tfrac{23}{6} \end{bmatrix}.
$$

The residual vector is

$$
r = b - \hat{y} = \begin{bmatrix} \tfrac{1}{6} \\ -\tfrac{1}{3} \\ \tfrac{1}{6} \end{bmatrix}.
$$

The residual is orthogonal to both columns of $A$:

$$
A^T r = 0.
$$

This gives two equations:

$$
r_1 + r_2 + r_3 = 0
$$

and

$$
t_1 r_1 + t_2 r_2 + t_3 r_3 = 0.
$$

For line fitting, these equations say that the residuals balance against the intercept column and the slope column.
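Both balance conditions can be checked in one line:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])

r = b - A @ np.linalg.lstsq(A, b, rcond=None)[0]
print(r)        # approximately [ 1/6, -1/3,  1/6 ]
print(A.T @ r)  # approximately [0, 0]: both sums vanish
```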
52.13 Least Squares with Orthonormal Columns
If $A$ has orthonormal columns, then

$$
A^T A = I.
$$

The least squares problem

$$
\min_x \|b - Ax\|^2
$$

has normal equations

$$
A^T A \hat{x} = A^T b.
$$

Since $A^T A = I$, this reduces to

$$
\hat{x} = A^T b.
$$

The projected vector is

$$
\hat{b} = A \hat{x} = A A^T b.
$$

This is why orthonormal bases are useful. They make least squares coefficients easy to compute and avoid forming and inverting $A^T A$.
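A short demonstration, borrowing the orthonormal factor of a QR factorization as a stand-in for a matrix with orthonormal columns:

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))  # Q has orthonormal columns
b = rng.standard_normal(5)

# With orthonormal columns the coefficients are just Q^T b.
x_hat = Q.T @ b
print(np.allclose(x_hat, np.linalg.lstsq(Q, b, rcond=None)[0]))  # True
```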
52.14 QR Solution of Least Squares
Let

$$
A = QR,
$$

where $Q$ has orthonormal columns and $R$ is upper triangular and invertible. Then

$$
A^T A = R^T Q^T Q R = R^T R.
$$

Since $Q$ has orthonormal columns, the least squares condition becomes

$$
R^T R \hat{x} = R^T Q^T b, \qquad \text{hence} \qquad R \hat{x} = Q^T b.
$$

Thus

$$
\hat{x} = R^{-1} Q^T b.
$$

In practice, one solves the triangular system

$$
R \hat{x} = Q^T b
$$

by back substitution rather than explicitly computing $R^{-1}$.

QR methods are usually preferred over the normal equations for numerical stability. Forming $A^T A$ can worsen conditioning, while orthogonal transformations preserve Euclidean norms.
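A sketch of the QR route, using `scipy.linalg.solve_triangular` for the back substitution (this assumes SciPy is available; `np.linalg.solve` would also work for exposition):

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])

Q, R = np.linalg.qr(A)                # reduced QR: Q is 3x2, R is 2x2 upper triangular
x_hat = solve_triangular(R, Q.T @ b)  # back substitution on R x = Q^T b

print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```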
52.15 Rank-Deficient Least Squares
If does not have full column rank, the least squares problem still has at least one solution, but the solution vector may not be unique.
The normal equations

$$
A^T A \hat{x} = A^T b
$$

remain valid. However, $A^T A$ is singular.

If $\hat{x}$ is one least squares solution and $z \in \operatorname{Null}(A)$, then

$$
A(\hat{x} + z) = A\hat{x}.
$$

Thus $\hat{x} + z$ gives the same fitted vector and the same residual.

Among all least squares solutions, one often chooses the solution with smallest Euclidean norm. This solution can be computed using the singular value decomposition.
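A sketch with a deliberately rank-deficient matrix (its second column is twice its first); with `rcond=None`, `np.linalg.lstsq` computes the SVD-based minimum-norm solution:

```python
import numpy as np

# Rank-deficient: the second column is twice the first.
A = np.array([[1.0, 2.0],
              [1.0, 2.0],
              [2.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])

x_min = np.linalg.lstsq(A, b, rcond=None)[0]      # minimum-norm solution
print(np.allclose(x_min, np.linalg.pinv(A) @ b))  # True: same as pseudoinverse

# Adding a null-space vector changes x but not the fitted vector A x.
z = np.array([2.0, -1.0])                         # A z = 0
print(np.allclose(A @ (x_min + z), A @ x_min))    # True
```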
52.16 Weighted Least Squares
In some problems, not all residuals should be treated equally. If measurements have different reliability, more reliable measurements should receive greater weight.
Weighted least squares minimizes

$$
(b - Ax)^T W (b - Ax),
$$

where $W$ is a symmetric positive definite weighting matrix.

If $W$ is diagonal,

$$
W = \operatorname{diag}(w_1, \dots, w_m), \qquad w_i > 0,
$$

then the objective is

$$
\sum_{i=1}^{m} w_i \left( b_i - (Ax)_i \right)^2.
$$

The normal equations become

$$
A^T W A \hat{x} = A^T W b.
$$
Weighted least squares appears in statistics, inverse problems, and numerical modeling when observations have unequal variance or importance.
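One common implementation trick, shown here as a sketch with made-up weights rather than as the text's method: scale the rows of $A$ and $b$ by $\sqrt{w_i}$ and solve an ordinary least squares problem.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])
w = np.array([1.0, 4.0, 0.25])  # made-up weights: trust the second row most

# Row scaling by sqrt(w) turns weighted LS into ordinary LS.
sw = np.sqrt(w)
x_w = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]

# Same answer from the weighted normal equations A^T W A x = A^T W b.
W = np.diag(w)
print(np.allclose(x_w, np.linalg.solve(A.T @ W @ A, A.T @ W @ b)))  # True
```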
52.17 Least Squares and the Pseudoinverse
The Moore-Penrose pseudoinverse gives a compact expression for least squares solutions.
If $A$ has full column rank, then

$$
A^{+} = (A^T A)^{-1} A^T,
$$

and

$$
\hat{x} = A^{+} b.
$$

For general $A$, the vector

$$
\hat{x} = A^{+} b
$$

is the minimum-norm least squares solution.
The pseudoinverse is especially useful when is rectangular or rank deficient.
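A quick check that `np.linalg.pinv` matches the full-column-rank formula on an illustrative matrix:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])  # full column rank

A_pinv = np.linalg.pinv(A)
formula = np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(A_pinv, formula))  # True
```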
52.18 Error Decomposition
Let

$$
\hat{b} = Pb
$$

be the projection of $b$ onto $\operatorname{Col}(A)$, and let

$$
r = b - \hat{b}.
$$

Since $\hat{b} \in \operatorname{Col}(A)$ and $r \perp \operatorname{Col}(A)$, the Pythagorean theorem gives

$$
\|b\|^2 = \|\hat{b}\|^2 + \|r\|^2.
$$

The total squared size of $b$ splits into a fitted part and an unfitted part.
In regression language, the projection explains part of the variation, and the residual contains the part left unexplained by the chosen model.
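The decomposition can be confirmed numerically:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])

b_hat = A @ np.linalg.lstsq(A, b, rcond=None)[0]
r = b - b_hat

# ||b||^2 = ||b_hat||^2 + ||r||^2
print(np.isclose(np.dot(b, b), np.dot(b_hat, b_hat) + np.dot(r, r)))  # True
```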
52.19 Conditioning
The normal equations can be numerically sensitive.
If $A$ is ill-conditioned, then $A^T A$ is often much more ill-conditioned. In the Euclidean norm,

$$
\kappa_2(A^T A) = \kappa_2(A)^2.
$$

Thus solving least squares by normal equations may lose accuracy.
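The squaring of the condition number is visible with `np.linalg.cond`; the matrix below is contrived to have nearly dependent columns:

```python
import numpy as np

# Nearly dependent columns make A ill-conditioned.
A = np.array([[1.0, 1.0],
              [1.0, 1.001],
              [1.0, 0.999]])

kA = np.linalg.cond(A)          # kappa_2(A)
kAtA = np.linalg.cond(A.T @ A)  # kappa_2(A^T A)
print(kA, kAtA)
print(np.isclose(kAtA, kA**2, rtol=1e-3))  # True: condition number squares
```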
More stable methods use orthogonal transformations:
| Method | Main idea |
|---|---|
| QR factorization | Reduce least squares to triangular solve |
| SVD | Handles rank deficiency and computes minimum-norm solutions |
| Iterative methods | Useful for large sparse problems |
| Regularization | Stabilizes ill-conditioned problems |
The normal equations are important theoretically, but QR and SVD are often better computational tools.
52.20 Summary
A least squares problem seeks

$$
\hat{x} = \arg\min_{x} \|b - Ax\|^2.
$$

Geometrically, $A\hat{x}$ is the orthogonal projection of $b$ onto $\operatorname{Col}(A)$. The residual

$$
r = b - A\hat{x}
$$

satisfies

$$
A^T r = 0.
$$

This orthogonality gives the normal equations:

$$
A^T A \hat{x} = A^T b.
$$

If $A$ has full column rank, the solution is unique:

$$
\hat{x} = (A^T A)^{-1} A^T b.
$$
Least squares connects linear systems, projection, approximation, data fitting, and numerical computation. It is one of the main uses of inner product geometry in applied linear algebra.