Optimization studies how to choose variables so that a quantity becomes as small or as large as possible. Linear algebra enters optimization because many objective functions, constraints, approximations, and algorithms are naturally written with vectors and matrices.
A standard optimization problem has the form

$$\min_{x \in \mathbb{R}^{n}} \ f(x),$$

where $x$ is the vector of decision variables and $f$ is the objective function. With constraints, the problem becomes

$$\min_{x \in \mathbb{R}^{n}} \ f(x) \quad \text{subject to} \quad g_i(x) \le 0,\ i = 1, \dots, m, \qquad h_j(x) = 0,\ j = 1, \dots, p.$$
This is the usual mathematical form of a constrained optimization problem. Linear algebra provides the notation and the computational machinery for expressing such problems, especially least squares, linear programming, quadratic programming, Newton methods, and constrained convex optimization.
110.1 Decision Variables
The unknown in an optimization problem is usually a vector

$$x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^{n}.$$

Each component $x_i$ represents a quantity to be chosen.
For example, in portfolio optimization, the entries of $x$ may represent asset weights. In machine learning, they may represent model parameters. In engineering design, they may represent physical dimensions. In numerical linear algebra, they may represent an approximate solution to a system.
The objective function $f$ assigns a scalar value $f(x)$ to each vector $x$. The optimizer searches for a vector that gives the best value.
110.2 Linear Objective Functions
The simplest objective function is linear:

$$f(x) = c^{T} x.$$

Here $c$ is a fixed vector of coefficients.
Expanding,

$$f(x) = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n.$$

A linear objective has constant slope. Its level sets are hyperplanes.
Linear objectives appear in linear programming, resource allocation, transportation problems, scheduling, and economic planning.
A linear optimization problem with linear constraints has the form

$$\min_{x} \ c^{T} x \quad \text{subject to} \quad A x \le b, \quad x \ge 0.$$

Here the feasible set is described by matrix inequalities.
110.3 Quadratic Objective Functions
The next important class is quadratic optimization.
A quadratic objective has the form

$$f(x) = \tfrac{1}{2} x^{T} A x + b^{T} x + c.$$

Here $A$ is an $n \times n$ matrix, $b$ is a vector, and $c$ is a scalar.
The matrix $A$ controls curvature. If $A$ is symmetric positive definite, the function has a unique global minimizer.
The gradient is

$$\nabla f(x) = A x + b,$$

assuming $A$ is symmetric.
The Hessian is

$$\nabla^{2} f(x) = A.$$
Thus quadratic optimization is directly governed by a matrix.
Quadratic functions are important because second-order Taylor approximation replaces a smooth nonlinear function near a point by a quadratic model. In optimization, this is the basis for Newton methods and many local algorithms.
110.4 Positive Definite Matrices
A symmetric matrix $A$ is positive definite if

$$x^{T} A x > 0$$

for every nonzero vector $x$.
Positive definiteness means that the quadratic form curves upward in every direction. In optimization, this corresponds to strict convexity for the quadratic part.
If $A$ is positive definite, then

$$f(x) = \tfrac{1}{2} x^{T} A x + b^{T} x + c$$

has a unique minimizer.
The minimizer $x^{\ast}$ satisfies

$$\nabla f(x^{\ast}) = A x^{\ast} + b = 0.$$

Therefore

$$A x^{\ast} = -b,$$

and

$$x^{\ast} = -A^{-1} b.$$
This formula shows the direct relation between minimization and solving a linear system.
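As a short numerical illustration of this relation, the following NumPy sketch minimizes a small quadratic by solving $A x = -b$; the particular matrix and vector are assumed example values, not taken from the text.

```python
import numpy as np

# Minimize f(x) = 1/2 x^T A x + b^T x + c for a symmetric positive definite A
# by solving the linear system A x = -b. Example data are illustrative.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # symmetric positive definite
b = np.array([1.0, 2.0])

x_star = np.linalg.solve(A, -b)     # minimizer satisfies A x* = -b
grad_at_min = A @ x_star + b        # gradient should vanish at x*

print("minimizer:", x_star)
print("gradient at minimizer:", grad_at_min)
```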
110.5 Gradients
The gradient of a scalar-valued function

$$f : \mathbb{R}^{n} \to \mathbb{R}$$

is the vector

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right).$$

The gradient points in the direction of steepest increase. Therefore the vector $-\nabla f(x)$ points in the direction of steepest local decrease.
Gradient-based methods build sequences

$$x_0, x_1, x_2, \dots$$

that move toward a minimizer.
The most basic method is gradient descent:

$$x_{k+1} = x_k - \alpha \nabla f(x_k).$$

Here $\alpha > 0$ is the step size.
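The sketch below applies this iteration to a small quadratic objective; the matrix, starting point, and step size $\alpha$ are illustrative assumptions.

```python
import numpy as np

# Gradient descent x_{k+1} = x_k - alpha * grad f(x_k) on the quadratic
# f(x) = 1/2 x^T A x + b^T x. Data and step size are illustrative choices.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

def grad_f(x):
    return A @ x + b               # gradient of the quadratic objective

x = np.zeros(2)                    # starting point
alpha = 0.1                        # fixed step size
for k in range(200):
    x = x - alpha * grad_f(x)

print("approximate minimizer:", x)
print("exact minimizer:      ", np.linalg.solve(A, -b))
```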
110.6 Hessians
The Hessian matrix of a twice differentiable scalar function $f$ is the matrix of second partial derivatives:

$$\nabla^{2} f(x) = \left[ \frac{\partial^{2} f}{\partial x_i \, \partial x_j} \right]_{i, j = 1}^{n}.$$
The Hessian describes local curvature. When the second partial derivatives are continuous, the Hessian is symmetric.
The Hessian is central in second-order optimization methods.
At a critical point, where

$$\nabla f(x^{\ast}) = 0,$$

the Hessian helps classify the point.
| Hessian condition | Interpretation |
|---|---|
| Positive definite | Strict local minimum |
| Negative definite | Strict local maximum |
| Indefinite | Saddle point |
| Positive semidefinite | Possible local minimum |
110.7 Taylor Approximation
Near a point $x_k$, a smooth function can be approximated by its second-order Taylor expansion:

$$f(x_k + d) \approx f(x_k) + \nabla f(x_k)^{T} d + \tfrac{1}{2} d^{T} \nabla^{2} f(x_k) \, d.$$

This expression is a quadratic model in the step vector $d$.
Optimization algorithms often choose $d$ by minimizing this local model.
If the Hessian is positive definite, the minimizer of the quadratic model satisfies

$$\nabla^{2} f(x_k) \, d = -\nabla f(x_k).$$
This is a linear system.
Thus each Newton step requires solving a linear algebra problem.
110.8 Newton’s Method
Newton’s method uses first and second derivative information.
At iteration $k$, it solves

$$\nabla^{2} f(x_k) \, d_k = -\nabla f(x_k)$$

for the step $d_k$, then updates

$$x_{k+1} = x_k + d_k.$$
If the Hessian is positive definite near the solution, Newton’s method can converge rapidly. In many settings, Newton’s method has quadratic local convergence, while basic gradient descent often has only linear convergence.
The computational cost lies in forming or approximating the Hessian and solving the linear system.
Linear algebra determines both cost and stability.
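A minimal sketch of the iteration, applied to an assumed smooth test function (not taken from the text), makes the role of the linear solve explicit.

```python
import numpy as np

# Newton's method: at each iteration solve H(x_k) d_k = -grad f(x_k),
# then set x_{k+1} = x_k + d_k. Test function f(x) = exp(x0 + x1) + x0^2 + x1^2
# is an assumed example chosen so the Hessian is positive definite everywhere.
def grad_f(x):
    e = np.exp(x[0] + x[1])
    return np.array([e + 2 * x[0], e + 2 * x[1]])

def hess_f(x):
    e = np.exp(x[0] + x[1])
    return np.array([[e + 2.0, e],
                     [e,       e + 2.0]])

x = np.array([1.0, 1.0])
for k in range(20):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-10:
        break
    d = np.linalg.solve(hess_f(x), -g)   # each Newton step is a linear solve
    x = x + d

print("approximate minimizer:", x)
print("gradient norm:", np.linalg.norm(grad_f(x)))
```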
110.9 Least Squares as Optimization
Least squares is one of the most important optimization problems in linear algebra.
Given $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^{m}$ with $m \ge n$,
the least squares problem is

$$\min_{x} \ \| A x - b \|^{2}.$$

The objective can be expanded as

$$\| A x - b \|^{2} = x^{T} A^{T} A x - 2 b^{T} A x + b^{T} b.$$

Thus least squares is a quadratic optimization problem.
The gradient is

$$\nabla f(x) = 2 A^{T} A x - 2 A^{T} b.$$

Setting the gradient equal to zero gives

$$A^{T} A x = A^{T} b.$$
These are the normal equations. Least squares is therefore both a projection problem and an optimization problem.
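The sketch below solves a small random least squares problem both through the normal equations and through NumPy's library solver; the data are assumed example values.

```python
import numpy as np

# Least squares as optimization: minimize ||Ax - b||^2 by solving the
# normal equations A^T A x = A^T b, compared with np.linalg.lstsq.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))    # tall matrix, full column rank
b = rng.standard_normal(20)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)       # normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)    # QR/SVD-based solver

print("normal equations solution:", x_normal)
print("lstsq solution:           ", x_lstsq)
```

In practice, QR- or SVD-based solvers such as `np.linalg.lstsq` are usually preferred to forming $A^{T} A$ explicitly, because the normal equations square the condition number.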
110.10 Constrained Optimization
Many optimization problems include constraints.
A linear equality constraint has the form

$$A x = b.$$

A linear inequality constraint has the form

$$C x \le d.$$

A constrained problem may be written as

$$\min_{x} \ f(x) \quad \text{subject to} \quad A x = b, \quad C x \le d.$$

Linear algebra appears in the description of the feasible set. The equation $A x = b$ describes an affine subspace when the system is consistent.
The null space of $A$ describes feasible directions. If $x_p$ is one feasible point, then every feasible point has the form

$$x = x_p + z,$$

where

$$A z = 0, \quad \text{that is,} \quad z \in \mathcal{N}(A).$$
Thus constrained optimization can often be reduced to optimization over a subspace.
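The following sketch carries out this reduction numerically: it finds one particular solution of $A x = b$ and a null space basis from the SVD, then checks that $x = x_p + Z y$ remains feasible. The constraint data are assumed example values.

```python
import numpy as np

# Parametrize the feasible set {x : Ax = b} as x = x_p + Z y, where the
# columns of Z span the null space of A (taken from the SVD of A).
A = np.array([[1.0, 2.0,  1.0],
              [0.0, 1.0, -1.0]])
b = np.array([3.0, 1.0])

x_p = np.linalg.lstsq(A, b, rcond=None)[0]   # one particular feasible point

U, s, Vt = np.linalg.svd(A)
rank = np.sum(s > 1e-12)
Z = Vt[rank:].T                              # columns span N(A)

y = np.array([2.5])                          # arbitrary null space coordinates
x = x_p + Z @ y                              # any such x satisfies A x = b
print("residual A x - b:", A @ x - b)        # numerically zero
```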
110.11 Lagrange Multipliers
Consider the equality-constrained problem

$$\min_{x} \ f(x) \quad \text{subject to} \quad A x = b.$$

The Lagrangian is

$$L(x, \lambda) = f(x) + \lambda^{T} (A x - b).$$

Here $\lambda$ is the vector of Lagrange multipliers.
At a constrained optimum, under suitable regularity conditions,

$$\nabla f(x) + A^{T} \lambda = 0$$

and

$$A x = b.$$

For a quadratic objective

$$f(x) = \tfrac{1}{2} x^{T} Q x + c^{T} x,$$

the conditions become

$$Q x + c + A^{T} \lambda = 0, \qquad A x = b.$$

These equations can be written as the block linear system

$$\begin{bmatrix} Q & A^{T} \\ A & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} -c \\ b \end{bmatrix}.$$
This matrix is called a KKT matrix. It is fundamental in constrained optimization.
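As a concrete illustration, the sketch below assembles and solves the KKT system for a small equality-constrained quadratic program; the matrices $Q$, $A$ and the vectors $c$, $b$ are assumed example values.

```python
import numpy as np

# Solve  minimize 1/2 x^T Q x + c^T x  subject to  A x = b
# by assembling and solving the KKT block system.
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])       # symmetric positive definite
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])       # single equality constraint x1 + x2 = 1
b = np.array([1.0])

n, m = Q.shape[0], A.shape[0]
KKT = np.block([[Q, A.T],
                [A, np.zeros((m, m))]])
rhs = np.concatenate([-c, b])

sol = np.linalg.solve(KKT, rhs)
x, lam = sol[:n], sol[n:]
print("optimal x:", x)
print("multipliers:", lam)
print("constraint residual:", A @ x - b)
```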
110.12 Convexity
A function $f$ is convex if for all $x, y$ and all $\theta \in [0, 1]$,

$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y).$$

Convexity means that the function has no false local minima: every local minimum is a global minimum.
For a twice differentiable function, convexity is closely connected to the Hessian:

$$\nabla^{2} f(x) \succeq 0$$

for all $x$ in the domain.
Here $\succeq 0$ means positive semidefinite.
Convex optimization is important because local optimality implies global optimality under standard convex assumptions. Linear algebra enters through gradients, Hessians, affine constraints, cones, projections, and matrix inequalities.
110.13 Projection Methods
Projection methods use linear algebra to enforce constraints.
Given a closed subspace $S$, the projection of a vector $v$ onto $S$ is the closest point in $S$.
If $S$ is the column space of a matrix $A$ with full column rank, the projection matrix is

$$P = A (A^{T} A)^{-1} A^{T}.$$
Projection methods appear in constrained optimization, least squares, alternating projections, projected gradient descent, and signal reconstruction.
Projected gradient descent has the form

$$x_{k+1} = P_C \big( x_k - \alpha \nabla f(x_k) \big),$$

where $P_C$ denotes projection onto the feasible set $C$.
When $C$ is a subspace or affine space, this projection is a linear algebra operation.
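The sketch below runs projected gradient descent over a subspace, forming the projection matrix $P = V (V^{T} V)^{-1} V^{T}$ explicitly; the subspace, objective, and step size are illustrative assumptions.

```python
import numpy as np

# Projected gradient descent onto C = column space of V:
#   x_{k+1} = P (x_k - alpha * grad f(x_k)),  P = V (V^T V)^{-1} V^T.
# Objective f(x) = 1/2 ||x - z||^2 and all data are illustrative.
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                     # columns span the feasible subspace
P = V @ np.linalg.solve(V.T @ V, V.T)          # projection matrix onto col(V)

z = np.array([1.0, 2.0, 0.0])                  # target point
def grad_f(x):
    return x - z                               # gradient of 1/2 ||x - z||^2

x = np.zeros(3)
alpha = 0.5
for k in range(100):
    x = P @ (x - alpha * grad_f(x))

print("projected-gradient solution:", x)
print("direct projection of z:     ", P @ z)   # closed form for this objective
```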
110.14 Quadratic Programming
A quadratic program has the form

$$\min_{x} \ \tfrac{1}{2} x^{T} Q x + c^{T} x \quad \text{subject to} \quad A x = b, \quad C x \le d.$$

If $Q$ is positive semidefinite, the problem is convex.
Quadratic programs appear in support vector machines, portfolio optimization, model predictive control, structural mechanics, and constrained least squares.
Solving quadratic programs usually involves repeated solution of linear systems, factorization of KKT matrices, or iterative methods.
110.15 Eigenvalues and Conditioning
Eigenvalues describe the geometry of quadratic objectives.
For

$$f(x) = \tfrac{1}{2} x^{T} A x + b^{T} x,$$

where $A$ is symmetric positive definite, the level sets are ellipsoids.
The eigenvectors of $A$ give the principal axes. The eigenvalues determine curvature along those axes.
The condition number

$$\kappa(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}$$

measures how elongated the level sets are.
A large condition number makes optimization harder. Gradient descent may move slowly because the function is steep in some directions and flat in others.
This explains why preconditioning is important.
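The following small experiment illustrates the effect: gradient descent on $f(x) = \tfrac{1}{2} x^{T} A x$ needs far more iterations when $\kappa(A)$ is large. The diagonal test matrices are assumed example values.

```python
import numpy as np

# Count gradient descent iterations on f(x) = 1/2 x^T A x for a
# well-conditioned and an ill-conditioned matrix.
def gd_iterations(A, tol=1e-6, max_iter=100000):
    x = np.ones(A.shape[0])
    alpha = 1.0 / np.max(np.linalg.eigvalsh(A))   # safe fixed step size
    for k in range(max_iter):
        g = A @ x                                 # gradient of the quadratic
        if np.linalg.norm(g) < tol:
            return k
        x = x - alpha * g
    return max_iter

A_good = np.diag([1.0, 2.0])       # kappa = 2
A_bad = np.diag([1.0, 200.0])      # kappa = 200

print("kappa =   2, iterations:", gd_iterations(A_good))
print("kappa = 200, iterations:", gd_iterations(A_bad))
```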
110.16 Preconditioning
Preconditioning changes the geometry of an optimization problem.
Instead of moving by

$$x_{k+1} = x_k - \alpha \nabla f(x_k),$$

one may move by

$$x_{k+1} = x_k - \alpha M^{-1} \nabla f(x_k),$$

where $M$ approximates the Hessian or another curvature matrix.
The goal is to transform an ill-conditioned problem into a better-conditioned one.
For linear systems and quadratic optimization, preconditioning often determines whether an iterative method is practical.
A good preconditioner should be cheap to apply and close enough to the original curvature to improve convergence.
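A minimal sketch of a diagonally preconditioned gradient step, using an assumed ill-conditioned test matrix, shows how a cheap approximation of the curvature can accelerate convergence.

```python
import numpy as np

# Preconditioned gradient step x_{k+1} = x_k - alpha * M^{-1} grad f(x_k)
# on f(x) = 1/2 x^T A x, with a diagonal (Jacobi) preconditioner M = diag(A).
A = np.diag([1.0, 100.0]) + 0.5 * np.ones((2, 2))   # ill-conditioned example
M_inv = np.diag(1.0 / np.diag(A))                   # cheap approximate inverse

def run(steps, precondition):
    x = np.ones(2)
    alpha = 0.9 if precondition else 0.9 / np.max(np.linalg.eigvalsh(A))
    for _ in range(steps):
        g = A @ x
        x = x - alpha * (M_inv @ g if precondition else g)
    return np.linalg.norm(A @ x)                    # gradient norm after `steps`

print("plain gradient norm:         ", run(50, precondition=False))
print("preconditioned gradient norm:", run(50, precondition=True))
```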
110.17 Conjugate Gradient Method
The conjugate gradient method solves systems

$$A x = b,$$

where $A$ is symmetric positive definite.
It can also be interpreted as minimizing the quadratic function

$$\phi(x) = \tfrac{1}{2} x^{T} A x - b^{T} x.$$

Conjugate gradient uses search directions $d_0, d_1, d_2, \dots$ that are conjugate with respect to $A$. This means

$$d_i^{T} A \, d_j = 0$$

when $i \ne j$.
Unlike ordinary gradient descent, conjugate gradient accounts for the geometry induced by . It is especially useful for large sparse systems where forming a factorization would be too expensive.
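A compact implementation of the standard conjugate gradient recurrence is sketched below; the symmetric positive definite test matrix is an assumed random example.

```python
import numpy as np

# Conjugate gradient for A x = b with A symmetric positive definite.
def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    n = len(b)
    max_iter = max_iter or n
    x = np.zeros(n)
    r = b - A @ x               # residual, equals -gradient of the quadratic
    d = r.copy()                # first search direction
    for _ in range(max_iter):
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)
        x = x + alpha * d
        r_new = r - alpha * Ad
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d    # new direction, A-conjugate to earlier ones
        r = r_new
    return x

rng = np.random.default_rng(1)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)   # symmetric positive definite test matrix
b = rng.standard_normal(50)

x_cg = conjugate_gradient(A, b)
print("residual norm:", np.linalg.norm(A @ x_cg - b))
```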
110.18 Linear Algebra in Modern Optimization
Modern optimization algorithms rely heavily on linear algebra.
| Optimization idea | Linear algebra object |
|---|---|
| Variables | Vectors |
| Linear constraints | Matrix equations |
| Quadratic objectives | Quadratic forms |
| Curvature | Hessians |
| Newton steps | Linear systems |
| Least squares | Projections |
| Regularization | Norms and penalties |
| Conditioning | Singular values and eigenvalues |
| Large-scale methods | Sparse matrices |
| Preconditioning | Approximate inverses |
| Low-rank methods | Matrix decompositions |
Even nonlinear optimization uses linear algebra locally. At each iteration, the nonlinear problem is approximated by a linear or quadratic problem.
110.19 Summary
Optimization and linear algebra are tightly connected.
Linear algebra supplies the language of vectors, matrices, subspaces, projections, eigenvalues, and decompositions. Optimization uses that language to choose the best vector under an objective and possible constraints.
The most important bridge is the quadratic model. Least squares, Newton methods, quadratic programming, and conjugate gradient methods all reduce optimization to structured linear algebra.
In practice, the success of an optimization method often depends on matrix structure: rank, sparsity, positive definiteness, conditioning, and the availability of stable factorizations.