Linear regression studies how one set of variables depends approximately on another through a linear relationship.
Given observations of inputs and outputs, the goal is to construct a linear model that predicts the outputs from the inputs as accurately as possible.
Linear regression is one of the most important applications of linear algebra. It connects vectors, matrices, projections, least squares problems, optimization, statistics, and numerical computation into a single framework.
The central problem is simple:
Given measured data that may contain noise or inconsistency, find the linear relationship that best explains the observations.
109.1 Data and Models
Suppose we observe $n$ pairs of values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
The variable $x$ is called the input, predictor, or feature. The variable $y$ is called the output, response, or target.
A linear regression model assumes that the output is approximately linear in the input:

$$y \approx \beta_0 + \beta_1 x$$
Here:

| Symbol | Meaning |
|---|---|
| $\beta_0$ | Intercept |
| $\beta_1$ | Slope |

The unknown coefficients $\beta_0$ and $\beta_1$ must be estimated from the data.
For example, suppose the data are:

| $x$ | $y$ |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 5 |
| 4 | 4 |
The points do not lie exactly on a single line. Linear regression finds the line that best approximates them.
109.2 The Geometric Problem
Each observed point $(x_i, y_i)$ contributes an equation:

$$y_i \approx \beta_0 + \beta_1 x_i$$

Collecting all $n$ equations gives an overdetermined linear system. In matrix form:

$$y \approx A\beta$$

where

$$A = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$

The matrix $A$ is called the design matrix.
The vector $A\beta$ contains the predicted values.
The problem usually has more equations than unknowns, so an exact solution often does not exist.
Instead, we search for the vector $\beta$ that makes the error $\|y - A\beta\|$ as small as possible.
109.3 Residuals
The residual vector is

$$r = y - A\beta$$

Each component measures the prediction error at one observation:

$$r_i = y_i - (\beta_0 + \beta_1 x_i)$$

Large residuals correspond to poor predictions.
The goal is to minimize the total error.
A natural measure is the squared Euclidean norm:

$$\|r\|^2 = \sum_{i=1}^{n} r_i^2$$

Expanding gives

$$\|y - A\beta\|^2 = \sum_{i=1}^{n} \bigl(y_i - \beta_0 - \beta_1 x_i\bigr)^2$$

This quantity is called the residual sum of squares.
Linear regression minimizes this expression.
109.4 Least Squares
The least squares problem is

$$\min_{\beta} \|y - A\beta\|^2$$

This is the central optimization problem of linear regression.
The solution is the point in the column space of $A$ closest to $y$.
Geometrically, linear regression is an orthogonal projection problem.
The vector $A\hat{\beta}$ is the orthogonal projection of $y$ onto the column space of $A$.
The residual vector is orthogonal to every column of $A$:

$$A^T(y - A\hat{\beta}) = 0$$

This condition produces the normal equations.
109.5 Normal Equations
Starting from

$$A^T(y - A\hat{\beta}) = 0$$

we obtain

$$A^T A \hat{\beta} = A^T y$$

These are called the normal equations.
If $A^T A$ is invertible, then

$$\hat{\beta} = (A^T A)^{-1} A^T y$$

This formula gives the least squares solution.
The matrix

$$A^{+} = (A^T A)^{-1} A^T$$

is called the Moore-Penrose pseudoinverse when $A$ has full column rank.
109.6 Example of Simple Linear Regression
Consider the data:

| $x$ | $y$ |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 2 |
| 4 | 4 |

The design matrix is

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}$$

The target vector is

$$y = \begin{pmatrix} 1 \\ 2 \\ 2 \\ 4 \end{pmatrix}$$

Compute

$$A^T A = \begin{pmatrix} 4 & 10 \\ 10 & 30 \end{pmatrix}$$

and

$$A^T y = \begin{pmatrix} 9 \\ 27 \end{pmatrix}$$

The normal equations become

$$\begin{pmatrix} 4 & 10 \\ 10 & 30 \end{pmatrix} \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = \begin{pmatrix} 9 \\ 27 \end{pmatrix}$$

Solving gives

$$\hat{\beta}_0 = 0, \qquad \hat{\beta}_1 = 0.9$$

Thus the regression line is

$$\hat{y} = 0.9\,x$$
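The computation above can be checked numerically. The following sketch, assuming NumPy is available, builds the design matrix for this data and solves the normal equations:

```python
import numpy as np

# Design matrix for the data (1,1), (2,2), (3,2), (4,4):
# a column of ones for the intercept, then the x values.
A = np.column_stack([np.ones(4), np.array([1.0, 2.0, 3.0, 4.0])])
y = np.array([1.0, 2.0, 2.0, 4.0])

# Normal equations: (A^T A) beta = A^T y
AtA = A.T @ A          # [[4, 10], [10, 30]]
Aty = A.T @ y          # [9, 27]
beta = np.linalg.solve(AtA, Aty)

# beta is approximately [0, 0.9] -> regression line y = 0.9 x
```

The same two lines (`A.T @ A` followed by `np.linalg.solve`) work for any design matrix with linearly independent columns.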
109.7 Multiple Linear Regression
Linear regression extends naturally to multiple variables.
Suppose each observation has $p$ features:

$$x_{i1}, x_{i2}, \ldots, x_{ip}$$

The model becomes

$$y_i \approx \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}$$

In matrix form:

$$y \approx A\beta$$

Now $A$ is an $n \times (p+1)$ matrix whose first column consists of ones.
Each row corresponds to one observation. Each column corresponds to one feature.
The least squares problem remains

$$\min_{\beta} \|y - A\beta\|^2$$

The solution again satisfies

$$A^T A \hat{\beta} = A^T y$$
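Fitting a model with several features follows exactly the same pattern. A NumPy sketch with synthetic data (the feature values and coefficients below are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 2))            # two features per observation
A = np.column_stack([np.ones(n), X])   # design matrix: intercept + features
beta_true = np.array([1.0, 2.0, -3.0])
y = A @ beta_true                      # noiseless responses, for illustration

# Same normal equations as in the single-feature case
beta_hat = np.linalg.solve(A.T @ A, A.T @ y)

# With no noise, beta_hat recovers [1, 2, -3] up to rounding
```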
109.8 Projection Interpretation
Linear regression is fundamentally a projection problem.
The column space of $A$ contains all vectors representable by the model.
The observed vector $y$ may not lie in this subspace.
The least squares solution finds the closest vector inside the subspace.
If

$$\hat{y} = A\hat{\beta}$$

then

$$y = \hat{y} + r$$

where:

| Vector | Meaning |
|---|---|
| $\hat{y}$ | Projection onto the column space |
| $r$ | Orthogonal residual |

The orthogonality condition is

$$A^T r = 0$$
This geometric interpretation explains why least squares works.
109.9 Orthogonal Projection Matrix
The projection matrix onto the column space of $A$ is

$$P = A(A^T A)^{-1} A^T$$

The predicted vector satisfies

$$\hat{y} = Py$$

The matrix $P$ has several important properties:

| Property | Meaning |
|---|---|
| Idempotent | $P^2 = P$ |
| Symmetric | $P^T = P$ |
| Projection property | $PA = A$ |

The residual operator is $I - P$. Thus

$$r = (I - P)y$$
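These properties are easy to verify numerically. A minimal NumPy sketch, reusing the small data set from the worked example:

```python
import numpy as np

A = np.column_stack([np.ones(4), np.array([1.0, 2.0, 3.0, 4.0])])
y = np.array([1.0, 2.0, 2.0, 4.0])

# Projection matrix onto the column space of A
P = A @ np.linalg.inv(A.T @ A) @ A.T

y_hat = P @ y                 # projection of y (the predictions)
r = (np.eye(4) - P) @ y       # residual

# P is idempotent and symmetric, and the residual
# is orthogonal to every column of A
assert np.allclose(P @ P, P)
assert np.allclose(P, P.T)
assert np.allclose(A.T @ r, 0)
```

Forming `P` explicitly is only practical for small examples; solvers avoid the $n \times n$ matrix entirely.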
109.10 Statistical Interpretation
In statistics, linear regression models the response as

$$y = A\beta + \varepsilon$$

where $\varepsilon$ is a random error vector.
Common assumptions are:

| Assumption | Meaning |
|---|---|
| Mean zero | $E[\varepsilon] = 0$ |
| Constant variance | Equal noise variance $\sigma^2$ for every error |
| Independence | Errors independent |
| Normality | Errors Gaussian |

Under these assumptions, least squares estimators have strong statistical properties.
The estimator

$$\hat{\beta} = (A^T A)^{-1} A^T y$$

is unbiased:

$$E[\hat{\beta}] = \beta$$

It is also the maximum likelihood estimator under Gaussian noise.
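Unbiasedness can be illustrated by simulation. The sketch below (NumPy, with an invented true $\beta$ and noise level) averages the least squares estimate over many independently generated noisy data sets:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.column_stack([np.ones(20), np.linspace(0, 1, 20)])
beta = np.array([1.0, 2.0])    # true coefficients (chosen for the demo)

# Repeatedly sample y = A beta + eps and re-estimate beta
estimates = []
for _ in range(2000):
    y = A @ beta + rng.normal(scale=0.5, size=20)
    estimates.append(np.linalg.solve(A.T @ A, A.T @ y))

mean_estimate = np.mean(estimates, axis=0)
# mean_estimate is close to the true beta = [1, 2]
```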
109.11 Rank and Identifiability
The regression problem depends critically on the rank of $A$.
If the columns of $A$ are linearly independent, then

$$\operatorname{rank}(A) = p + 1$$

and $A^T A$ is invertible.
If the columns are dependent, then multiple parameter vectors produce the same predictions.
This phenomenon is called multicollinearity.
For example, if one feature is an exact multiple of another, then the regression coefficients are not uniquely determined.
Rank deficiency causes instability and ill-conditioning.
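A small numerical illustration of rank deficiency (NumPy sketch; the duplicated feature is contrived for the example):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
# Second feature is an exact multiple of the first -> rank-deficient design
A = np.column_stack([x, 2 * x])

rank = np.linalg.matrix_rank(A)     # 1, not 2
det = np.linalg.det(A.T @ A)        # 0 up to rounding: A^T A is singular,
                                    # so the normal equations have no
                                    # unique solution
```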
109.12 Numerical Computation
The normal equations are conceptually simple but numerically unstable in some problems.
Modern numerical linear algebra usually solves regression problems using QR decomposition or singular value decomposition.
QR Method
If

$$A = QR$$

where:

| Matrix | Property |
|---|---|
| $Q$ | Orthonormal columns |
| $R$ | Upper triangular |

then the least squares solution satisfies

$$R\hat{\beta} = Q^T y$$

This avoids forming $A^T A$, which can amplify numerical error.
SVD Method
If

$$A = U\Sigma V^T$$

then least squares solutions can be computed robustly even when $A$ is nearly rank deficient.
The SVD reveals rank, conditioning, and geometric structure simultaneously.
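In practice both approaches are one-liners in a numerical library. A NumPy sketch on the data set from the worked example:

```python
import numpy as np

A = np.column_stack([np.ones(4), np.array([1.0, 2.0, 3.0, 4.0])])
y = np.array([1.0, 2.0, 2.0, 4.0])

# QR approach: solve R beta = Q^T y
Q, R = np.linalg.qr(A)                       # reduced QR, Q is 4x2
beta_qr = np.linalg.solve(R, Q.T @ y)

# SVD-based solver, robust even for (nearly) rank-deficient A
beta_svd, *_ = np.linalg.lstsq(A, y, rcond=None)

# Both agree with the normal-equations solution [0, 0.9]
```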
109.13 Regularization
Regression models with many parameters may overfit the data.
Regularization introduces additional constraints.
Ridge Regression
Ridge regression minimizes

$$\|y - A\beta\|^2 + \lambda\|\beta\|^2$$

The parameter $\lambda > 0$ penalizes large coefficients.
The solution becomes

$$\hat{\beta} = (A^T A + \lambda I)^{-1} A^T y$$
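A NumPy sketch of the ridge solution on the earlier data set (for simplicity the intercept is penalized here as well, which practical implementations usually avoid):

```python
import numpy as np

A = np.column_stack([np.ones(4), np.array([1.0, 2.0, 3.0, 4.0])])
y = np.array([1.0, 2.0, 2.0, 4.0])
lam = 1.0

# Ridge solution: (A^T A + lambda I)^{-1} A^T y
beta_ridge = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ y)

# Ordinary least squares, for comparison
beta_ols = np.linalg.solve(A.T @ A, A.T @ y)

# Shrinkage: the ridge coefficient vector has smaller norm than OLS
```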
Lasso Regression
Lasso regression minimizes

$$\|y - A\beta\|^2 + \lambda\|\beta\|_1$$

The $\ell_1$-penalty encourages sparse coefficients.
Regularization connects linear algebra with optimization and machine learning.
109.14 Polynomial Regression
Regression can model nonlinear relationships while remaining linear algebraically.
Suppose we fit

$$y \approx \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_d x^d$$

The model is nonlinear in $x$ but linear in the coefficients.
The design matrix becomes

$$A = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^d \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^d \end{pmatrix}$$
The least squares framework remains unchanged.
This principle extends to arbitrary basis expansions.
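For instance, fitting a quadratic is still an ordinary least squares problem. A NumPy sketch with invented data drawn from an exact quadratic:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = x**2 - 2 * x + 1                       # exact quadratic, for illustration

# Vandermonde-style design matrix: columns 1, x, x^2
A = np.column_stack([np.ones_like(x), x, x**2])
beta = np.linalg.lstsq(A, y, rcond=None)[0]

# beta recovers the quadratic's coefficients [1, -2, 1]
```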
109.15 Applications
Linear regression appears throughout science and engineering.
| Field | Example |
|---|---|
| Statistics | Trend estimation |
| Economics | Forecasting |
| Physics | Experimental fitting |
| Biology | Growth models |
| Machine learning | Predictive models |
| Signal processing | Parameter estimation |
| Computer vision | Camera calibration |
| Finance | Risk modeling |
| Engineering | System identification |
Many advanced machine learning methods are extensions of linear regression.
109.16 Geometric Summary
Linear regression unifies several major ideas in linear algebra.
| Concept | Role |
|---|---|
| Vectors | Observations and predictions |
| Matrices | Design operators |
| Column space | Model subspace |
| Orthogonality | Residual condition |
| Projection | Best approximation |
| Least squares | Optimization principle |
| Rank | Identifiability |
| Decompositions | Numerical algorithms |
The subject demonstrates how abstract linear algebra directly solves practical approximation problems.
109.17 Summary
Linear regression seeks the linear model that best approximates observed data.
Given a system

$$A\beta \approx y$$

the least squares solution minimizes

$$\|y - A\beta\|^2$$

The solution satisfies the normal equations

$$A^T A \hat{\beta} = A^T y$$
Geometrically, regression projects the observed vector onto the column space of the design matrix.
Computationally, regression relies on matrix factorizations such as QR decomposition and singular value decomposition.
Statistically, regression estimates relationships between variables under uncertainty.
Linear regression is therefore both a practical computational method and a central application of the geometry and algebra of vector spaces.