139.1 Introduction
Matrix calculus studies derivatives of functions involving vectors and matrices.
It extends ordinary differential calculus to multidimensional linear-algebraic settings. Matrix calculus is fundamental in optimization, statistics, control theory, machine learning, numerical analysis, and differential geometry.
The main objects are:
| Object | Example |
|---|---|
| Scalar-valued functions | f(x) = x^T A x |
| Vector-valued functions | F(x) = Ax |
| Matrix-valued functions | F(X) = X^{-1} |
Matrix calculus provides rules for gradients, Jacobians, Hessians, and derivatives of matrix expressions.
Modern optimization and machine learning rely heavily on matrix derivatives because objective functions are usually expressed using vectors and matrices.
139.2 Scalars, Vectors, and Matrices
Throughout this chapter:
| Symbol | Meaning |
|---|---|
| x \in \mathbb{R}^n | Column vector |
| A \in \mathbb{R}^{m \times n} | Matrix |
| f : \mathbb{R}^n \to \mathbb{R} | Scalar-valued function |
| F : \mathbb{R}^n \to \mathbb{R}^m | Vector-valued function |
A scalar function maps vectors to numbers:
f : \mathbb{R}^n \to \mathbb{R}
A vector function maps vectors to vectors:
F : \mathbb{R}^n \to \mathbb{R}^m
The derivative structures depend on the dimensions of the input and output.
139.3 Directional Derivatives
Let
f : \mathbb{R}^n \to \mathbb{R}
The directional derivative of f at x in direction v is
D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}
It measures the instantaneous rate of change of f in direction v.
For differentiable f, the directional derivative is linear in the direction vector:
D_{\alpha u + \beta v} f(x) = \alpha\, D_u f(x) + \beta\, D_v f(x)
Directional derivatives lead naturally to gradients and Jacobians.
139.4 Gradient
For a differentiable scalar function
f : \mathbb{R}^n \to \mathbb{R},
the gradient is the column vector
\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right)^T
The gradient gives the direction of steepest increase.
The directional derivative satisfies
D_vf(x)=\nabla f(x)^Tv
Thus the gradient converts local change into a linear-algebraic inner product.
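As a quick sanity check (a minimal NumPy sketch; the test function f(x) = sum of sin(x_i) and the step size are illustrative assumptions), a finite-difference directional derivative can be compared with the inner product \nabla f(x)^T v:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
v = rng.standard_normal(4)

f = lambda z: np.sum(np.sin(z))      # test function with gradient cos(z)
grad = np.cos(x)

# Finite-difference directional derivative vs. the inner product form.
t = 1e-6
fd = (f(x + t * v) - f(x)) / t
print(fd, grad @ v)                  # should agree to ~6 digits
```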
139.5 Jacobian Matrix
Let
F : \mathbb{R}^n \to \mathbb{R}^m
with components
F(x) = (F_1(x), \dots, F_m(x))^T
The Jacobian matrix is
J_F(x) = \left[ \frac{\partial F_i}{\partial x_j}(x) \right]_{i = 1, \dots, m;\; j = 1, \dots, n}
The Jacobian represents the best linear approximation of F near x:
F(x + h) \approx F(x) + J_F(x)\, h
The Jacobian generalizes the derivative matrix from single-variable calculus.
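A small numerical illustration of this approximation (a NumPy sketch; the nonlinear map F and the evaluation point are illustrative choices):

```python
import numpy as np

# F(x) = (x1^2, x1*x2): a small nonlinear map R^2 -> R^2.
F = lambda x: np.array([x[0] ** 2, x[0] * x[1]])
J = lambda x: np.array([[2 * x[0], 0.0],
                        [x[1],     x[0]]])

x = np.array([1.5, -0.7])
h = 1e-4 * np.array([0.3, 0.8])

# Best linear approximation: F(x + h) ~ F(x) + J(x) h.
print(F(x + h) - (F(x) + J(x) @ h))   # entries should be O(|h|^2)
```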
139.6 Hessian Matrix
For a twice-differentiable scalar function
f : \mathbb{R}^n \to \mathbb{R},
the Hessian matrix is
\nabla^2 f(x) = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \right]_{i,j = 1}^{n}
Explicitly,
\nabla^2 f(x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix}
If mixed partial derivatives commute, then the Hessian is symmetric:
\nabla^2 f(x) = \nabla^2 f(x)^T
The Hessian describes local curvature.
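For example, the Hessian of the quadratic form f(x) = x^T A x (see Section 139.9) is the constant matrix A + A^T. A minimal NumPy sketch (the test function is an assumed example) recovers it by differencing the gradient:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

grad = lambda z: (A + A.T) @ z        # gradient of f(z) = z^T A z
H_exact = A + A.T                     # its Hessian is constant

# Build the Hessian column by column from finite differences of the gradient.
eps = 1e-6
H_fd = np.column_stack([
    (grad(x + eps * e) - grad(x)) / eps
    for e in np.eye(3)
])
print(np.max(np.abs(H_fd - H_exact)))  # essentially zero
```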
139.7 Differential Notation
Matrix calculus is often cleaner using differentials.
Suppose
f : \mathbb{R}^n \to \mathbb{R}
The differential is written df.
If f is differentiable, then
df = \nabla f(x)^T dx
For vector functions,
dF = J_F(x)\, dx
Differential notation treats derivatives as linear maps acting on infinitesimal increments.
This viewpoint is coordinate-free and especially useful for matrix expressions.
139.8 Derivative of Linear Functions
Let
f(x) = a^T x
where
a \in \mathbb{R}^n
Then
\nabla f(x) = a
The function is already linear, so its derivative is constant.
More generally, for
F(x) = Ax, \qquad A \in \mathbb{R}^{m \times n},
the Jacobian is
J_F(x) = A
Linear maps are their own derivatives.
139.9 Derivative of Quadratic Forms
Consider the quadratic form
f(x) = x^T A x
Its differential is
df = dx^T A x + x^T A\, dx
Using transpose identities,
dx^T A x = x^T A^T dx
Thus:
df = x^T (A + A^T)\, dx
Therefore,
\nabla(x^TAx) = (A + A^T)x
If A is symmetric, this simplifies to
\nabla(x^TAx) = 2Ax
Quadratic forms are central in optimization and statistics.
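A quick finite-difference check of this gradient (a minimal NumPy sketch; the random nonsymmetric A is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))          # not symmetric in general
x = rng.standard_normal(5)

f = lambda z: z @ A @ z
grad_exact = (A + A.T) @ x

# Central-difference gradient, one coordinate at a time.
eps = 1e-6
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(5)
])
print(np.max(np.abs(grad_fd - grad_exact)))  # should be ~1e-9
```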
139.10 Least Squares Derivatives
Let
f(x) = \|Ax - b\|^2
Expand:
f(x) = x^T A^T A x - 2 b^T A x + b^T b
Differentiating gives
\nabla f(x) = 2A^T A x - 2A^T b
Setting the gradient equal to zero yields the normal equations:
A^T A x = A^T b
This derivation is fundamental in least squares theory and machine learning.
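A minimal NumPy sketch with random data (the problem sizes are arbitrary) compares the normal-equations solution with a library least-squares solver; in practice QR- or SVD-based solvers are preferred for numerical stability, so the direct solve below is for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 4))
b = rng.standard_normal(20)

# Solve the normal equations A^T A x = A^T b ...
x_normal = np.linalg.solve(A.T @ A, A.T @ b)
# ... and compare with a library least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # True
```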
139.11 Chain Rule
Suppose
F : \mathbb{R}^n \to \mathbb{R}^m, \qquad G : \mathbb{R}^m \to \mathbb{R}^p
Then the composition is
H(x) = G(F(x))
The Jacobian satisfies
J_H(x)=J_G(F(x))J_F(x)
Thus derivatives compose by matrix multiplication.
This rule is the foundation of backpropagation in neural networks.
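A small numerical check of the composition rule (a NumPy sketch with hand-chosen maps F and G; the specific functions are illustrative only):

```python
import numpy as np

# F: R^2 -> R^2 and G: R^2 -> R, with hand-computed Jacobians.
F = lambda x: np.array([np.sin(x[0]), x[0] * x[1]])
J_F = lambda x: np.array([[np.cos(x[0]), 0.0],
                          [x[1],         x[0]]])
G = lambda y: np.array([y[0] ** 2 + y[1]])
J_G = lambda y: np.array([[2 * y[0], 1.0]])

H = lambda x: G(F(x))
x = np.array([0.3, -1.2])

# Chain rule: J_H(x) = J_G(F(x)) @ J_F(x)
J_chain = J_G(F(x)) @ J_F(x)

# Finite-difference Jacobian of H for comparison.
eps = 1e-6
J_fd = np.column_stack([(H(x + eps * e) - H(x - eps * e)) / (2 * eps)
                        for e in np.eye(2)])
print(np.max(np.abs(J_chain - J_fd)))  # ~1e-10
```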
139.12 Matrix-by-Matrix Derivatives
Sometimes the variable itself is a matrix.
Suppose
f(X) = \operatorname{tr}(AX)
The differential is
df = \operatorname{tr}(A\, dX)
Thus the derivative with respect to X is
\frac{\partial f}{\partial X} = A^T
Matrix derivatives are often expressed using trace identities because traces linearize matrix expressions.
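To check this numerically (a minimal NumPy sketch; the 3x3 random matrices are arbitrary test data), one can compare A^T with a finite-difference derivative of f(X) = tr(AX):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3))

f = lambda Z: np.trace(A @ Z)
grad_exact = A.T                    # derivative of tr(AX) with respect to X

# Finite-difference check, one matrix entry at a time.
eps = 1e-6
grad_fd = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(X)
        E[i, j] = eps
        grad_fd[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
print(np.max(np.abs(grad_fd - grad_exact)))  # ~1e-10
```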
139.13 Trace Identities
Trace identities are heavily used in matrix calculus.
Important formulas include:
| Identity | Formula |
|---|---|
| Cyclic property | \operatorname{tr}(ABC) = \operatorname{tr}(BCA) = \operatorname{tr}(CAB) |
| Transpose invariance | \operatorname{tr}(A) = \operatorname{tr}(A^T) |
| Inner product | \operatorname{tr}(A^T B) = \sum_{i,j} A_{ij} B_{ij} |
These identities allow complicated derivatives to be rewritten in manageable form.
139.14 Derivative of the Determinant
Let
A(t)
be a differentiable family of invertible matrices.
Then
\frac{d}{dt} \det A(t) = \det A(t)\, \operatorname{tr}\!\left(A(t)^{-1} A'(t)\right)
This is Jacobi’s formula.
Equivalently,
\frac{d}{dt} \log \det A(t) = \operatorname{tr}\!\left(A(t)^{-1} A'(t)\right)
Log-determinants appear frequently in statistics, covariance estimation, and optimization.
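Both forms can be verified numerically (a minimal NumPy sketch; the family A(t) = 5I + tB + t^2 C is an arbitrary smooth, well-conditioned choice):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))
C = rng.standard_normal((4, 4))

# A(t) = 5I + t*B + t^2*C stays invertible for small t, with A'(t) = B + 2t*C.
A  = lambda t: 5 * np.eye(4) + t * B + t**2 * C
dA = lambda t: B + 2 * t * C

t = 0.3
lhs = np.trace(np.linalg.solve(A(t), dA(t)))   # tr(A^{-1} A')

# Finite-difference derivative of log det A(t).
eps = 1e-6
logdet = lambda s: np.linalg.slogdet(A(s))[1]
rhs = (logdet(t + eps) - logdet(t - eps)) / (2 * eps)
print(lhs, rhs)  # should agree to ~8 digits
```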
139.15 Derivative of the Matrix Inverse
Suppose
A(t)
is a differentiable family of invertible matrices.
Since
A(t)\, A(t)^{-1} = I,
differentiating gives
A'(t)\, A(t)^{-1} + A(t)\, \bigl(A(t)^{-1}\bigr)' = 0
Solving for the derivative:
(A^{-1})' = -A^{-1} A' A^{-1}
This identity appears constantly in optimization and sensitivity analysis.
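A quick numerical check (a minimal NumPy sketch; the family A(t) = I + tB is an illustrative choice kept near t = 0 so the matrices stay invertible):

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((3, 3))

# A(t) = I + t*B, so A'(t) = B; stay near t = 0 so A(t) is invertible.
A  = lambda t: np.eye(3) + t * B
dA = B

t, eps = 0.1, 1e-6
inv = np.linalg.inv

# Analytic derivative of the inverse vs. a finite-difference estimate.
d_inv_exact = -inv(A(t)) @ dA @ inv(A(t))
d_inv_fd = (inv(A(t + eps)) - inv(A(t - eps))) / (2 * eps)
print(np.max(np.abs(d_inv_exact - d_inv_fd)))  # ~1e-9
```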
139.16 Derivative of Eigenvalues
Suppose
A(t)
is a differentiable family of symmetric matrices.
Let
A(t)\, v(t) = \lambda(t)\, v(t)
with normalized eigenvector
v(t)^T v(t) = 1.
Differentiating gives
\lambda'(t) = v(t)^T A'(t)\, v(t)
Thus eigenvalue sensitivity depends on quadratic forms of the perturbation.
This formula is important in perturbation theory and optimization involving spectra.
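A small numerical illustration (a NumPy sketch with a random symmetric family; the formula assumes the tracked eigenvalue is simple, which holds generically here):

```python
import numpy as np

rng = np.random.default_rng(7)
S = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))
A0, A1 = S + S.T, P + P.T          # symmetric base matrix and perturbation

# Symmetric family A(t) = A0 + t*A1, with A'(t) = A1.
A = lambda t: A0 + t * A1

t, eps = 0.0, 1e-6
w, V = np.linalg.eigh(A(t))
k = 0                              # track the smallest eigenvalue
v = V[:, k]

d_lambda_exact = v @ A1 @ v        # lambda'(t) = v^T A'(t) v
d_lambda_fd = (np.linalg.eigh(A(t + eps))[0][k]
               - np.linalg.eigh(A(t - eps))[0][k]) / (2 * eps)
print(d_lambda_exact, d_lambda_fd)  # should agree to ~6 digits
```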
139.17 Matrix Exponential Derivatives
The matrix exponential is
e^{A} = \sum_{k=0}^{\infty} \frac{A^k}{k!}
If
A = A(t),
then differentiating is more complicated because matrices may not commute.
If
A(t)\, A'(t) = A'(t)\, A(t),
then
\frac{d}{dt}\, e^{A(t)} = A'(t)\, e^{A(t)}
Without commutativity, integral formulas are needed.
Matrix exponentials appear in differential equations and control theory.
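As a sketch of the commuting case (this assumes SciPy's scipy.linalg.expm is available; the family A(t) = tB is the simplest illustrative choice, since A(t) and A'(t) = B then commute):

```python
import numpy as np
from scipy.linalg import expm   # assumes SciPy is available

rng = np.random.default_rng(8)
B = rng.standard_normal((3, 3))

# Commuting family A(t) = t*B: d/dt exp(A(t)) = B exp(t*B).
t, eps = 0.5, 1e-6
d_exact = B @ expm(t * B)
d_fd = (expm((t + eps) * B) - expm((t - eps) * B)) / (2 * eps)
print(np.max(np.abs(d_exact - d_fd)))  # ~1e-8
```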
139.18 Automatic Differentiation
Automatic differentiation computes derivatives algorithmically using the chain rule.
It is neither symbolic differentiation nor finite-difference approximation.
Instead, computations are decomposed into elementary operations, and derivatives propagate through the computation graph.
Two major modes are:
| Mode | Efficient when |
|---|---|
| Forward mode | Few inputs |
| Reverse mode | Few outputs |
Reverse-mode automatic differentiation underlies backpropagation in deep learning.
Matrix calculus provides the mathematical foundation for these algorithms.
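To make the forward mode concrete, here is a minimal dual-number sketch in pure Python (the Dual class and sin helper are illustrative assumptions, not a reference implementation): each value carries its own derivative, and every elementary operation propagates both.

```python
import math

class Dual:
    """Minimal forward-mode AD value: a (value, derivative) pair."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    # Chain rule for the elementary operation sin.
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# f(x1, x2) = x1*x2 + sin(x1); seed x1 with derivative 1 to get df/dx1.
x1, x2 = Dual(0.5, 1.0), Dual(2.0, 0.0)
y = x1 * x2 + sin(x1)
print(y.val, y.dot)   # df/dx1 = x2 + cos(x1) = 2 + cos(0.5)
```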
139.19 Backpropagation
Neural networks are compositions of matrix operations and nonlinearities.
A layer often has the form
z = Wx + b, \qquad a = \sigma(z)
The loss function depends on the final output.
Backpropagation computes gradients efficiently by repeatedly applying the chain rule backward through the network.
At each stage:
- Compute local derivatives,
- Multiply by incoming sensitivities,
- Propagate backward.
This process is fundamentally matrix calculus.
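A minimal NumPy sketch of this procedure for a single tanh layer with a squared-error loss (the layer form z = Wx + b, a = tanh(z) and the random data are assumed for illustration); the analytic gradient is checked against a finite difference:

```python
import numpy as np

rng = np.random.default_rng(9)
W = rng.standard_normal((3, 4))
b = rng.standard_normal(3)
x = rng.standard_normal(4)
y = rng.standard_normal(3)           # target

sigma = np.tanh                      # elementwise nonlinearity

def loss(W):
    a = sigma(W @ x + b)             # forward pass: z = Wx + b, a = sigma(z)
    return 0.5 * np.sum((a - y) ** 2)

# Backward pass: multiply each local derivative by the incoming sensitivity.
z = W @ x + b
a = sigma(z)
delta_a = a - y                              # dL/da
delta_z = delta_a * (1 - np.tanh(z) ** 2)    # dL/dz via sigma'(z)
grad_W = np.outer(delta_z, x)                # dL/dW = delta_z x^T

# Finite-difference check of one entry of grad_W.
eps = 1e-6
E = np.zeros_like(W)
E[1, 2] = eps
fd = (loss(W + E) - loss(W - E)) / (2 * eps)
print(grad_W[1, 2], fd)              # should agree to ~6 digits
```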
139.20 Differential Geometry Perspective
Matrix calculus may also be interpreted geometrically.
The derivative of
F : \mathbb{R}^n \to \mathbb{R}^m
at x is the linear map
DF(x) : \mathbb{R}^n \to \mathbb{R}^m, \qquad h \mapsto J_F(x)\, h,
best approximating F near x.
The Jacobian matrix is simply a coordinate representation of this linear map.
This viewpoint extends naturally to manifolds and tensor calculus.
139.21 Summary
Matrix calculus extends differentiation to vector and matrix expressions.
The main concepts are:
| Concept | Meaning |
|---|---|
| Gradient | First derivative of scalar function |
| Jacobian | Derivative matrix of vector function |
| Hessian | Matrix of second derivatives |
| Differential | Linear approximation notation |
| Chain rule | Composition of derivatives |
| Quadratic-form derivative | \nabla(x^TAx) = (A + A^T)x |
| Least squares gradient | \nabla\|Ax - b\|^2 = 2A^T(Ax - b) |
| Trace calculus | Matrix derivative simplification |
| Determinant derivative | Jacobi formula |
| Inverse derivative | (A^{-1})' = -A^{-1}A'A^{-1} |
| Eigenvalue derivative | Spectral sensitivity |
| Automatic differentiation | Algorithmic chain-rule propagation |
Matrix calculus provides the language for optimization, machine learning, statistics, control theory, and numerical computation. It transforms multivariable differentiation into structured linear algebra.