134.1 Introduction
Numerical optimization studies algorithms for finding minima or maxima of functions using finite computation.
Linear algebra is one of the foundations of numerical optimization. Vectors represent variables, matrices represent constraints and derivatives, and linear systems appear inside nearly every optimization algorithm.
The basic optimization problem is:
$$\min_{x} \; f(x)$$
subject to
$$x \in \Omega,$$
where:
| Symbol | Meaning |
|---|---|
| $x \in \mathbb{R}^n$ | Unknown vector |
| $f : \mathbb{R}^n \to \mathbb{R}$ | Objective function |
| $\Omega \subseteq \mathbb{R}^n$ | Feasible set |
Optimization problems appear throughout science and engineering:
| Area | Optimization task |
|---|---|
| Machine learning | Loss minimization |
| Statistics | Maximum likelihood |
| Control theory | Optimal control |
| Signal processing | Error minimization |
| Finance | Portfolio optimization |
| Physics | Energy minimization |
| Numerical PDEs | Variational methods |
The central challenge is computational: how to efficiently approximate optimal solutions in finite precision arithmetic.
134.2 Optimization Problems
An optimization problem consists of:
- Variables,
- An objective function,
- Constraints.
The objective function assigns a real number to each feasible point:
$$f : \Omega \to \mathbb{R}.$$
The feasible set contains all allowed vectors.
For example:
$$\min_{x \in \mathbb{R}^2} \; x_1^2 + x_2^2$$
subject to
$$x_1 + x_2 = 1.$$
The objective measures squared distance from the origin. The constraint restricts the solution to a line.
Optimization seeks the feasible point with smallest objective value.
134.3 Linear Optimization
A linear optimization problem has:
| Component | Form |
|---|---|
| Objective | Linear |
| Constraints | Linear |
The standard form of linear programming is:
$$\min_{x} \; c^T x$$
subject to
$$Ax = b$$
and
$$x \ge 0.$$
Here:
| Symbol | Meaning |
|---|---|
| $x \in \mathbb{R}^n$ | Decision vector |
| $c \in \mathbb{R}^n$ | Cost vector |
| $A \in \mathbb{R}^{m \times n}$ | Constraint matrix |
| $b \in \mathbb{R}^m$ | Constraint values |
The feasible region is a convex polyhedron.
Linear programming is important because:
- Many practical problems are naturally linear,
- Nonlinear problems are often approximated linearly,
- Efficient algorithms exist.
The geometry of linear programming is fundamentally linear-algebraic.
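As a concrete illustration, a small standard-form problem can be passed to scipy.optimize.linprog, which takes the cost vector, equality constraints, and nonnegativity bounds directly. This is a minimal sketch; the problem data below is made up for illustration.

```python
# Minimal sketch: solve min c^T x subject to Ax = b, x >= 0 with SciPy.
# The cost vector and constraint data are illustrative, not from the text.
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 1.0, 4.0])            # cost vector
A = np.array([[1.0, 1.0, 1.0],           # equality constraint matrix
              [1.0, 0.0, 2.0]])
b = np.array([1.0, 0.5])

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3)
print(res.x, res.fun)                    # optimal point and objective value
```

The optimum lies at a vertex of the feasible polyhedron, which is exactly the linear-algebraic geometry described above.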
134.4 Least Squares Problems
One of the simplest optimization problems is least squares.
Given
$$A \in \mathbb{R}^{m \times n}, \qquad b \in \mathbb{R}^m,$$
the least squares problem minimizes
$$\|Ax - b\|_2^2.$$
This problem appears when the system
$$Ax = b$$
has no exact solution.
The least squares solution satisfies the normal equations:
$$A^T A x = A^T b.$$
The matrix $A^T A$ is symmetric and positive semidefinite.
Least squares is central in regression, data fitting, signal reconstruction, and inverse problems.
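A short NumPy sketch (with synthetic data) shows both routes to the minimizer: a library least squares solver and the normal equations. Forming $A^T A$ explicitly is convenient but squares the condition number of $A$, so factorization-based solvers are usually preferred in practice.

```python
# Minimal sketch: least squares by a library solver and by the normal equations.
# The data is randomly generated for illustration.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))        # overdetermined: 100 equations, 3 unknowns
b = rng.standard_normal(100)

x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # factorization-based solver
x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations A^T A x = A^T b

print(np.allclose(x_lstsq, x_normal))    # both give the same minimizer here
```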
134.5 Convex Optimization
A convex optimization problem minimizes a convex function over a convex set.
The problem is:
$$\min_{x} \; f(x)$$
subject to
$$x \in C,$$
where:
| Property | Meaning |
|---|---|
| $C$ convex | Feasible region has no holes or bends inward |
| $f$ convex | Local minima are global |
Convex optimization is especially important because it avoids many pathological behaviors of general nonlinear optimization.
If $f$ is differentiable and convex, then:
$$\nabla f(x^*) = 0$$
implies $x^*$ is globally optimal.
Convexity transforms optimization from a potentially difficult global problem into a tractable local one.
134.6 Gradient Descent
Gradient descent is one of the most fundamental optimization algorithms.
Given a differentiable function
$$f : \mathbb{R}^n \to \mathbb{R},$$
the gradient vector is
$$\nabla f(x) = \left(\frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_n}(x)\right)^T.$$
The gradient points in the direction of steepest increase.
Therefore:
$$-\nabla f(x)$$
points in the direction of steepest decrease.
Gradient descent updates:
$$x_{k+1} = x_k - \alpha_k \nabla f(x_k),$$
where:
| Symbol | Meaning |
|---|---|
| $x_k$ | Current iterate |
| $\alpha_k$ | Step size |
| $\nabla f(x_k)$ | Gradient |
The algorithm repeatedly moves downhill.
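A minimal sketch of the iteration follows; the step size, tolerance, and quadratic test problem are illustrative choices. For $f(x) = \tfrac12 x^T A x - b^T x$ the gradient is $Ax - b$, so gradient descent converges to the solution of $Ax = b$.

```python
# Minimal sketch of fixed-step gradient descent; test problem and step size are illustrative.
import numpy as np

def gradient_descent(grad, x0, alpha=0.1, tol=1e-8, max_iter=10_000):
    """Iterate x_{k+1} = x_k - alpha * grad(x_k) until the gradient is small."""
    x = x0.copy()
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - alpha * g
    return x

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])               # symmetric positive definite
b = np.array([1.0, 1.0])

x_star = gradient_descent(lambda x: A @ x - b, np.zeros(2))
print(np.allclose(x_star, np.linalg.solve(A, b)))   # True: converged to the solution of Ax = b
```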
134.7 Quadratic Optimization
A quadratic optimization problem minimizes a quadratic function:
$$f(x) = \tfrac{1}{2} x^T A x - b^T x + c,$$
where:
| Symbol | Meaning |
|---|---|
| $A$ | Symmetric matrix |
| $b$ | Linear term |
| $c$ | Constant |
The gradient is
$$\nabla f(x) = Ax - b.$$
Setting the gradient equal to zero gives:
$$Ax = b.$$
Thus quadratic optimization reduces to solving a linear system.
If $A$ is positive definite, then the problem has a unique minimizer.
Quadratic problems are central because many nonlinear functions are locally approximated by quadratics.
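This reduction can be checked numerically: minimizing the quadratic with a general-purpose optimizer and solving $Ax = b$ directly give the same point. A brief sketch with illustrative data:

```python
# Minimal sketch: the minimizer of f(x) = 0.5 x^T A x - b^T x solves Ax = b.
# Matrix and vector below are illustrative.
import numpy as np
from scipy.optimize import minimize

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])               # symmetric positive definite
b = np.array([1.0, 2.0])

f = lambda x: 0.5 * x @ A @ x - b @ x
x_opt = minimize(f, np.zeros(2)).x       # generic nonlinear optimizer
x_lin = np.linalg.solve(A, b)            # one linear solve

print(np.allclose(x_opt, x_lin, atol=1e-4))   # agree to optimizer tolerance
```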
134.8 Hessian Matrices
The Hessian matrix collects second derivatives.
For a twice-differentiable function
$$f : \mathbb{R}^n \to \mathbb{R},$$
the Hessian is
$$H_f(x) = \left[ \frac{\partial^2 f}{\partial x_i \, \partial x_j}(x) \right]_{i,j=1}^{n}.$$
The Hessian measures local curvature.
Important cases:
| Hessian | Interpretation |
|---|---|
| Positive definite | Local minimum |
| Negative definite | Local maximum |
| Indefinite | Saddle point |
Second-order optimization methods use Hessians to accelerate convergence.
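A small sketch of the classification above: compute the eigenvalues of the symmetric Hessian and read off their signs. The function $f(x, y) = x^2 - y^2$ is used as an illustrative example; its Hessian is constant and indefinite.

```python
# Minimal sketch: classify a stationary point from the Hessian's eigenvalue signs.
# The Hessian below is that of f(x, y) = x^2 - y^2 (an illustrative choice).
import numpy as np

H = np.array([[2.0,  0.0],
              [0.0, -2.0]])

eigvals = np.linalg.eigvalsh(H)          # eigenvalues of a symmetric matrix
if np.all(eigvals > 0):
    print("positive definite: local minimum")
elif np.all(eigvals < 0):
    print("negative definite: local maximum")
else:
    print("indefinite: saddle point")    # this example lands here
```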
134.9 Newton’s Method
Newton’s method uses second-order information.
The update rule is:
$$x_{k+1} = x_k - H_f(x_k)^{-1} \nabla f(x_k).$$
The method approximates the objective locally by a quadratic function and minimizes that quadratic approximation.
Advantages:
| Advantage | Reason |
|---|---|
| Fast local convergence | Uses curvature |
| Scale adaptation | Hessian rescales directions |
Disadvantages:
| Disadvantage | Reason |
|---|---|
| Expensive | Requires solving linear systems |
| Hessian storage | Large matrices |
| Possible instability | Hessian may be singular |
Large-scale optimization often approximates Newton steps rather than computing them exactly.
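A minimal sketch of the basic iteration, with the Newton step obtained by solving a linear system rather than forming $H_f(x)^{-1}$ explicitly; the strictly convex test function, starting point, and tolerance are illustrative.

```python
# Minimal sketch of Newton's method for unconstrained minimization.
# Each iteration solves H_f(x) p = grad f(x) instead of inverting the Hessian.
import numpy as np

def newton(grad, hess, x0, tol=1e-12, max_iter=50):
    x = x0.copy()
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - np.linalg.solve(hess(x), g)   # Newton step
    return x

# Strictly convex test function f(x, y) = exp(x) + x^2 + y^2 (illustrative).
grad = lambda v: np.array([np.exp(v[0]) + 2 * v[0], 2 * v[1]])
hess = lambda v: np.array([[np.exp(v[0]) + 2, 0.0],
                           [0.0,              2.0]])

x_star = newton(grad, hess, np.array([2.0, 1.0]))
print(x_star, grad(x_star))              # gradient is numerically zero at the minimizer
```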
134.10 Constrained Optimization
Many optimization problems include constraints.
Equality constraints have the form:
$$g_i(x) = 0.$$
Inequality constraints have the form:
$$h_j(x) \le 0.$$
The feasible set becomes:
$$\Omega = \{\, x : g_i(x) = 0 \text{ for all } i, \;\; h_j(x) \le 0 \text{ for all } j \,\}.$$
Constraints change the geometry of optimization.
The optimum may occur on a boundary rather than in the interior.
134.11 Lagrange Multipliers
Lagrange multipliers solve equality-constrained optimization problems.
Suppose we minimize
$$f(x)$$
subject to
$$g(x) = 0.$$
Define the Lagrangian:
$$\mathcal{L}(x, \lambda) = f(x) + \lambda\, g(x).$$
Optimality conditions become:
$$\nabla_x \mathcal{L}(x, \lambda) = 0$$
and
$$g(x) = 0.$$
This yields:
$$\nabla f(x) = -\lambda\, \nabla g(x).$$
Geometrically, the objective gradient becomes parallel to the constraint gradient.
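For the illustrative problem of minimizing $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$, the stationarity condition $\nabla f(x) + \lambda \nabla g(x) = 0$ and the constraint together form one small linear system in $(x, \lambda)$, sketched below.

```python
# Minimal sketch: Lagrange multipliers for min x1^2 + x2^2 s.t. x1 + x2 = 1.
# Stationarity (2x + lambda*a = 0) and the constraint (a^T x = 1) give a linear system.
import numpy as np

a = np.array([1.0, 1.0])                 # constraint gradient
KKT = np.array([[2.0,  0.0, a[0]],
                [0.0,  2.0, a[1]],
                [a[0], a[1], 0.0]])
rhs = np.array([0.0, 0.0, 1.0])

x1, x2, lam = np.linalg.solve(KKT, rhs)
print(x1, x2, lam)                       # x = (0.5, 0.5), lambda = -1
```

At the solution the gradient of the objective, $(1, 1)$, is indeed parallel to the constraint gradient.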
134.12 Karush-Kuhn-Tucker Conditions
The Karush-Kuhn-Tucker conditions generalize Lagrange multipliers to inequality constraints.
For problems with constraints
$$h_j(x) \le 0,$$
the KKT conditions include:
| Condition | Meaning |
|---|---|
| Stationarity | Gradient balance |
| Primal feasibility | Constraints satisfied |
| Dual feasibility | Multipliers nonnegative |
| Complementary slackness | Active constraints identified |
The KKT system forms the basis of many modern optimization algorithms.
134.13 Interior-Point Methods
Interior-point methods solve constrained optimization problems by moving through the interior of the feasible region.
Rather than enforcing constraints exactly at each step, they use barrier functions such as:
$$\phi(x) = -\sum_{j} \log\bigl(-h_j(x)\bigr).$$
The logarithm approaches infinity near the boundary, preventing the iterates from leaving the feasible region.
Interior-point methods are especially important in:
| Problem class | Application |
|---|---|
| Linear programming | Large sparse systems |
| Quadratic programming | Control and finance |
| Semidefinite programming | Convex matrix optimization |
These methods rely heavily on sparse linear algebra.
134.14 Conjugate Gradient Method
The conjugate gradient method solves large symmetric positive definite systems:
$$Ax = b.$$
It may also be viewed as minimizing the quadratic function:
$$f(x) = \tfrac{1}{2} x^T A x - b^T x.$$
Conjugate gradient constructs search directions that are mutually $A$-orthogonal (conjugate).
Advantages include:
| Advantage | Reason |
|---|---|
| No matrix inversion | Iterative method |
| Exploits sparsity | Efficient for large systems |
| Low memory | Matrix-vector products only |
Conjugate gradient is one of the central algorithms in numerical optimization.
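A minimal sketch of the iteration follows (dense and tiny for clarity; in practice $A$ would be large and sparse, and only matrix-vector products would be used). The test system is illustrative.

```python
# Minimal sketch of the conjugate gradient method for Ax = b, A symmetric positive definite.
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    x = np.zeros_like(b)
    r = b - A @ x                        # residual (negative gradient of the quadratic)
    p = r.copy()                         # first search direction
    rs = r @ r
    for _ in range(len(b)):
        Ap = A @ p
        alpha = rs / (p @ Ap)            # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p        # next direction, A-orthogonal to previous ones
        rs = rs_new
    return x

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(np.allclose(conjugate_gradient(A, b), np.linalg.solve(A, b)))   # True
```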
134.15 Stochastic Gradient Descent
Stochastic gradient descent, or SGD, approximates gradients using random subsets of data.
For an objective that averages over data points, $f(x) = \frac{1}{N}\sum_{i=1}^{N} f_i(x)$, instead of computing the full gradient:
$$\nabla f(x) = \frac{1}{N} \sum_{i=1}^{N} \nabla f_i(x),$$
one computes an estimate from a random mini-batch $B_k$:
$$g_k = \frac{1}{|B_k|} \sum_{i \in B_k} \nabla f_i(x_k).$$
The update becomes:
$$x_{k+1} = x_k - \alpha_k\, g_k.$$
SGD is fundamental in machine learning because exact gradients may be too expensive for massive datasets.
Noise in the gradient introduces randomness but greatly reduces computational cost.
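A brief sketch on a synthetic least squares problem; the batch size, step size, and iteration count below are illustrative rather than tuned, and the data is randomly generated.

```python
# Minimal sketch of SGD for f(x) = (1/N) * sum_i (a_i^T x - b_i)^2 with mini-batches.
import numpy as np

rng = np.random.default_rng(0)
N, n = 1000, 5
A = rng.standard_normal((N, n))
x_true = rng.standard_normal(n)
b = A @ x_true + 0.01 * rng.standard_normal(N)         # slightly noisy observations

x = np.zeros(n)
alpha, batch = 0.01, 32
for _ in range(5000):
    idx = rng.integers(0, N, size=batch)               # random mini-batch
    g = 2 * A[idx].T @ (A[idx] @ x - b[idx]) / batch   # stochastic gradient estimate
    x -= alpha * g

print(np.linalg.norm(x - x_true))        # small: x hovers near the least squares solution
```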
134.16 Regularization
Optimization problems may be ill-posed or unstable.
Regularization adds penalty terms to stabilize solutions.
Examples include:
| Method | Objective |
|---|---|
| Ridge regression | $\lVert Ax - b\rVert_2^2 + \lambda \lVert x\rVert_2^2$ |
| Lasso | $\lVert Ax - b\rVert_2^2 + \lambda \lVert x\rVert_1$ |
Ridge regression penalizes large Euclidean norm.
Lasso promotes sparsity.
Regularization changes both the geometry and numerical conditioning of optimization problems.
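A short sketch of ridge regression via its normal equations $(A^T A + \lambda I)x = A^T b$; the data and the value of $\lambda$ are illustrative. The regularized solution has a smaller norm than the plain least squares solution.

```python
# Minimal sketch: ridge regression via (A^T A + lambda * I) x = A^T b, on synthetic data.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
lam = 0.5                                # illustrative regularization parameter

x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(10), A.T @ b)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]

print(np.linalg.norm(x_ridge) < np.linalg.norm(x_ls))   # True: ridge shrinks the solution
```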
134.17 Duality
Many optimization problems possess dual formulations.
The primal problem:
$$\min_{x} \; f(x) \quad \text{subject to} \quad x \in \Omega$$
may correspond to a dual problem involving multipliers or supporting hyperplanes.
Duality provides:
| Benefit | Interpretation |
|---|---|
| Lower bounds | Optimality certification |
| Sensitivity analysis | Constraint interpretation |
| Alternative algorithms | Easier computation |
Strong duality occurs when primal and dual optimal values coincide.
Convexity is closely tied to duality theory.
134.18 Numerical Stability
Optimization algorithms operate in finite precision arithmetic.
Important issues include:
| Issue | Effect |
|---|---|
| Roundoff error | Loss of accuracy |
| Ill-conditioning | Amplified perturbations |
| Cancellation | Numerical instability |
| Overflow/underflow | Computational failure |
Stable algorithms minimize error growth.
Condition numbers are especially important because they measure sensitivity of solutions to perturbations.
134.19 Sparse Optimization
Many optimization problems involve sparse matrices.
A sparse matrix has mostly zero entries.
Sparse linear algebra allows:
| Advantage | Reason |
|---|---|
| Lower memory usage | Store nonzeros only |
| Faster computation | Skip zero operations |
| Scalable algorithms | Large systems feasible |
Sparse optimization is fundamental in:
| Area | Typical problem |
|---|---|
| PDE discretization | Sparse linear systems |
| Network optimization | Graph sparsity |
| Machine learning | Sparse features |
| Signal processing | Compressed sensing |
134.20 Optimization and Linear Algebra
Nearly every optimization method relies on linear algebra.
| Optimization concept | Linear algebra role |
|---|---|
| Gradient | Vector |
| Hessian | Matrix |
| Constraints | Linear systems |
| Newton step | Solve matrix equation |
| Least squares | Orthogonal projection |
| Convexity | Positive semidefinite Hessian |
| Duality | Linear functionals |
| Iterative methods | Krylov subspaces |
Modern optimization is fundamentally computational linear algebra.
134.21 Summary
Numerical optimization studies computational methods for minimizing or maximizing functions.
The central concepts are:
| Concept | Meaning |
|---|---|
| Objective function | Quantity being optimized |
| Feasible set | Allowed region |
| Convexity | Guarantees global optimality |
| Gradient | First-order change |
| Hessian | Second-order curvature |
| Newton method | Quadratic local approximation |
| Least squares | Error minimization |
| Conjugate gradient | Iterative quadratic solver |
| Interior-point method | Barrier-based constrained optimization |
| Stochastic gradient | Randomized optimization |
| Duality | Complementary optimization formulation |
| Regularization | Stabilization through penalties |
Numerical optimization combines geometry, calculus, probability, and linear algebra into algorithms for solving large computational problems.