Eigenvalue problems are fundamental in numerical analysis, optimization, physics, graph methods, control theory, and machine learning. They are also among the most subtle operations in automatic differentiation.
The main difficulty is that eigenvectors are not stable objects. Small perturbations in the matrix can produce large changes in eigenvectors when eigenvalues are close or repeated. The derivative formulas reveal this instability directly.
Eigenvalues and Eigenvectors
For a square matrix $A \in \mathbb{R}^{n \times n}$, an eigenpair satisfies

$$A v = \lambda v,$$

where $v \neq 0$. The scalar $\lambda$ is an eigenvalue and $v$ is an eigenvector.

For symmetric matrices, $A = A^T$, all eigenvalues are real, and there exists an orthonormal basis of eigenvectors:

$$A = V \Lambda V^T,$$

where $V^T V = I$ and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$.
Symmetric eigenproblems are much more stable and better behaved for AD than general nonsymmetric eigenproblems.
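A minimal NumPy sketch of this decomposition (the matrix and seed here are illustrative): it computes an orthonormal eigenbasis with `np.linalg.eigh` and checks the reconstruction $A = V \Lambda V^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                       # symmetrize so eigh applies

lam, V = np.linalg.eigh(A)              # ascending eigenvalues, orthonormal columns

assert np.allclose(V @ np.diag(lam) @ V.T, A)   # A = V Lambda V^T
assert np.allclose(V.T @ V, np.eye(4))          # V^T V = I
```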
Why Eigenproblems Matter
Eigenvalue computations appear in:
| Application | Role |
|---|---|
| PCA | Principal directions |
| Spectral clustering | Graph Laplacian eigenvectors |
| Stability analysis | System modes |
| Differential equations | Modal decomposition |
| Quantum mechanics | Hamiltonian spectra |
| Covariance analysis | Principal components |
| Optimization | Hessian curvature |
| Graph neural networks | Spectral filters |
Many machine learning systems differentiate through eigendecompositions implicitly or explicitly.
Differential of Eigenvalues
Assume $A$ is symmetric with normalized eigenvectors: $v_i^T v_i = 1$.

The eigenvalue equation is

$$A v_i = \lambda_i v_i.$$

Differentiate:

$$dA\, v_i + A\, dv_i = d\lambda_i\, v_i + \lambda_i\, dv_i.$$

Left-multiply by $v_i^T$:

$$v_i^T\, dA\, v_i + v_i^T A\, dv_i = d\lambda_i\, v_i^T v_i + \lambda_i\, v_i^T dv_i.$$

Since

$$v_i^T A = \lambda_i v_i^T, \qquad v_i^T v_i = 1,$$

we obtain

$$v_i^T\, dA\, v_i + \lambda_i\, v_i^T dv_i = d\lambda_i + \lambda_i\, v_i^T dv_i.$$

The $\lambda_i\, v_i^T dv_i$ terms cancel:

$$d\lambda_i = v_i^T\, dA\, v_i.$$

This is the first-order perturbation formula for eigenvalues.

The gradient of the eigenvalue with respect to the matrix is

$$\frac{\partial \lambda_i}{\partial A} = v_i v_i^T.$$
This formula is stable even when eigenvectors are not.
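The formula is easy to verify numerically. The following sketch (illustrative setup) compares the predicted gradient $v_i v_i^T$ with central finite differences of $\lambda_i$ over symmetric perturbations:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2
lam, V = np.linalg.eigh(A)
i, eps = 2, 1e-6

grad = np.outer(V[:, i], V[:, i])       # predicted d(lambda_i)/dA

# Central finite differences over symmetric basis perturbations
fd = np.zeros_like(A)
for k in range(4):
    for l in range(4):
        E = np.zeros_like(A)
        E[k, l] = E[l, k] = 0.5 if k != l else 1.0
        lam_p = np.linalg.eigh(A + eps * E)[0][i]
        lam_m = np.linalg.eigh(A - eps * E)[0][i]
        fd[k, l] = (lam_p - lam_m) / (2 * eps)

assert np.allclose(fd, grad, atol=1e-5)
```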
Reverse Rule for Eigenvalues
Suppose a scalar loss depends only on eigenvalues:

$$L = f(\lambda_1, \dots, \lambda_n).$$

Let

$$\bar{\lambda}_i = \frac{\partial L}{\partial \lambda_i}.$$

Then

$$dL = \sum_i \bar{\lambda}_i \, d\lambda_i.$$

Substitute the perturbation formula:

$$dL = \sum_i \bar{\lambda}_i \, v_i^T\, dA\, v_i.$$

Rewrite using a trace:

$$dL = \operatorname{tr}\!\Big( \Big( \sum_i \bar{\lambda}_i\, v_i v_i^T \Big)\, dA \Big).$$

Therefore,

$$\bar{A} = \sum_i \bar{\lambda}_i\, v_i v_i^T = V \operatorname{diag}(\bar{\lambda})\, V^T.$$
This rule is clean and stable for symmetric matrices.
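A sketch of the rule for the illustrative loss $L = \sum_i \lambda_i^2$, where $\bar\lambda_i = 2\lambda_i$ and the formula predicts $\bar A = 2A$:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2
lam, V = np.linalg.eigh(A)

lam_bar = 2 * lam                       # dL/dlambda for L = sum(lambda^2)
A_bar = V @ np.diag(lam_bar) @ V.T      # reverse rule
assert np.allclose(A_bar, 2 * A)        # sanity: sum(lambda^2) = ||A||_F^2

def loss(M):
    return np.sum(np.linalg.eigh(M)[0] ** 2)

# Check against central finite differences with symmetric perturbations
eps, fd = 1e-6, np.zeros_like(A)
for k in range(4):
    for l in range(4):
        E = np.zeros_like(A)
        E[k, l] = E[l, k] = 0.5 if k != l else 1.0
        fd[k, l] = (loss(A + eps * E) - loss(A - eps * E)) / (2 * eps)

assert np.allclose(fd, A_bar, atol=1e-6)
```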
Differential of Eigenvectors
Eigenvectors are more difficult.
Assume distinct eigenvalues. Differentiate the eigenvalue equation:

$$dA\, v_i + A\, dv_i = d\lambda_i\, v_i + \lambda_i\, dv_i.$$

Rearrange:

$$(A - \lambda_i I)\, dv_i = (d\lambda_i I - dA)\, v_i.$$

Expand $dv_i$ in the eigenbasis:

$$dv_i = \sum_{j \neq i} c_j v_j.$$

Project onto $v_j$ for $j \neq i$ (the $d\lambda_i$ term drops because $v_j^T v_i = 0$). Since

$$v_j^T A = \lambda_j v_j^T,$$

we obtain

$$(\lambda_j - \lambda_i)\, v_j^T dv_i = -\, v_j^T\, dA\, v_i.$$

Thus

$$v_j^T dv_i = \frac{v_j^T\, dA\, v_i}{\lambda_i - \lambda_j}.$$

Therefore

$$dv_i = \sum_{j \neq i} \frac{v_j^T\, dA\, v_i}{\lambda_i - \lambda_j}\, v_j.$$

This formula is extremely important. It shows that eigenvector derivatives contain terms

$$\frac{1}{\lambda_i - \lambda_j}.$$
If eigenvalues are close, derivatives become large. If eigenvalues are equal, the derivative becomes undefined.
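The sketch below (illustrative) checks the formula along a fixed symmetric direction $E$, sign-aligning the perturbed eigenvector since `eigh` fixes eigenvectors only up to sign:

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2
E0 = rng.standard_normal((4, 4))
E = (E0 + E0.T) / 2                     # fixed symmetric perturbation direction

lam, V = np.linalg.eigh(A)
i, eps = 1, 1e-6

# Predicted dv_i = sum_{j != i} (v_j^T E v_i) / (lam_i - lam_j) * v_j
dv = sum((V[:, j] @ E @ V[:, i]) / (lam[i] - lam[j]) * V[:, j]
         for j in range(4) if j != i)

def vec(M):
    """i-th eigenvector of M, sign-aligned with the unperturbed one."""
    u = np.linalg.eigh(M)[1][:, i]
    return u * np.sign(u @ V[:, i])

fd = (vec(A + eps * E) - vec(A - eps * E)) / (2 * eps)
assert np.allclose(fd, dv, atol=1e-4)
```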
Geometric Meaning
Eigenvectors are unstable because only the eigenspace is intrinsic.
Suppose

$$\lambda_i = \lambda_j.$$

Then any orthonormal basis of the corresponding eigenspace is valid. The individual vectors are arbitrary.

Thus the mapping

$$A \mapsto v_i$$

is not continuous at repeated eigenvalues.

But the projector onto the eigenspace,

$$P = \sum_{k \,:\, \lambda_k = \lambda_i} v_k v_k^T,$$

is stable and differentiable under broader conditions.
This distinction matters in AD. Stable objectives should depend on invariant subspaces rather than arbitrary basis vectors.
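The following sketch (with an illustrative near-degenerate matrix) makes the contrast concrete: a tiny perturbation rotates the individual eigenvectors of the close pair, while the projector onto their two-dimensional eigenspace barely moves.

```python
import numpy as np

A = np.diag([1.0, 1.0 + 1e-9, 5.0])     # near-degenerate pair plus a separated mode
rng = np.random.default_rng(4)
E0 = rng.standard_normal((3, 3))
E = 1e-6 * (E0 + E0.T) / 2              # perturbation far larger than the 1e-9 gap

V = np.linalg.eigh(A)[1]
Vp = np.linalg.eigh(A + E)[1]

# Individual eigenvectors of the close pair rotate substantially...
print(abs(V[:, 0] @ Vp[:, 0]))          # typically far from 1

# ...but the projector onto their 2D eigenspace is essentially unchanged
P, Pp = V[:, :2] @ V[:, :2].T, Vp[:, :2] @ Vp[:, :2].T
assert np.linalg.norm(Pp - P) < 1e-5
```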
Symmetric vs Nonsymmetric Problems
For symmetric matrices:
| Property | Result |
|---|---|
| Eigenvalues | Real |
| Eigenvectors | Orthogonal |
| Decomposition | Stable |
| Gradient formulas | Relatively clean |
For general matrices:
| Property | Result |
|---|---|
| Eigenvalues | May be complex |
| Eigenvectors | May be nonorthogonal |
| Jordan blocks | Possible |
| Perturbation theory | Much harder |
Most ML and scientific computing systems restrict differentiable eigendecompositions to symmetric or Hermitian matrices.
Spectral Functions
A spectral function depends only on eigenvalues:

$$f(A) = V\, f(\Lambda)\, V^T,$$

where $f$ acts elementwise on the eigenvalues.
Examples:
| Function | Spectral Form |
|---|---|
| Matrix exponential | $\exp(A) = V e^{\Lambda} V^T$ |
| Matrix logarithm | $\log(A) = V \log(\Lambda)\, V^T$ |
| Matrix square root | $A^{1/2} = V \Lambda^{1/2} V^T$ |
| Matrix inverse | $A^{-1} = V \Lambda^{-1} V^T$ |
Differentiating spectral functions involves differentiating eigenvalues and eigenvectors simultaneously.
The resulting formulas often contain divided differences:

$$\frac{f(\lambda_i) - f(\lambda_j)}{\lambda_i - \lambda_j}.$$
These expressions remain finite when eigenvalues coincide if handled carefully.
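For symmetric matrices this pattern is often called the Daleckii-Krein formula: the derivative of $f(A)$ in direction $E$ is $V\,(G \circ (V^T E V))\,V^T$, where $G$ holds the divided differences off the diagonal and $f'(\lambda_i)$ on it. A minimal sketch, assuming a smooth scalar $f$ (here $\exp$, chosen for illustration):

```python
import numpy as np

def f(x):  return np.exp(x)             # illustrative spectral function
def fp(x): return np.exp(x)             # its scalar derivative

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2
E0 = rng.standard_normal((4, 4))
E = (E0 + E0.T) / 2                     # perturbation direction

lam, V = np.linalg.eigh(A)
n = len(lam)

# Divided-difference matrix: f'(lam_i) on the diagonal, otherwise
# (f(lam_i) - f(lam_j)) / (lam_i - lam_j); stays finite as eigenvalues merge
G = np.empty((n, n))
for i in range(n):
    for j in range(n):
        G[i, j] = fp(lam[i]) if i == j else \
            (f(lam[i]) - f(lam[j])) / (lam[i] - lam[j])

dfA = V @ (G * (V.T @ E @ V)) @ V.T     # derivative of f(A) in direction E

def fmat(M):                            # f(A) = V f(Lambda) V^T
    w, U = np.linalg.eigh(M)
    return U @ np.diag(f(w)) @ U.T

eps = 1e-6
fd = (fmat(A + eps * E) - fmat(A - eps * E)) / (2 * eps)
assert np.allclose(fd, dfA, atol=1e-4)
```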
Example: Largest Eigenvalue
Consider

$$L = \lambda_{\max}(A).$$

If the largest eigenvalue is simple, then

$$d\lambda_{\max} = v_{\max}^T\, dA\, v_{\max}.$$

Thus

$$\frac{\partial \lambda_{\max}}{\partial A} = v_{\max} v_{\max}^T.$$
This appears in:
| Application | Use |
|---|---|
| Spectral norm regularization | Largest singular/eigenvalue |
| Stability constraints | Spectral radius |
| Graph methods | Leading eigenvector |
| Optimization | Convex spectral penalties |
If the largest eigenvalue is repeated, the gradient becomes set-valued.
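A quick directional check of the simple case (illustrative setup):

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2
E0 = rng.standard_normal((5, 5))
E = (E0 + E0.T) / 2

v = np.linalg.eigh(A)[1][:, -1]         # eigh sorts ascending: last column is v_max
grad = np.outer(v, v)                   # d(lambda_max)/dA for a simple top eigenvalue

# Directional finite-difference check: d(lambda_max) = tr(grad E)
eps = 1e-6
num = (np.linalg.eigh(A + eps * E)[0][-1]
       - np.linalg.eigh(A - eps * E)[0][-1]) / (2 * eps)
assert np.isclose(num, np.sum(grad * E), atol=1e-6)
```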
Rayleigh Quotient
The Rayleigh quotient is

$$R(x) = \frac{x^T A x}{x^T x}.$$

For symmetric $A$, its stationary points are eigenvectors.

Differentiate numerator and denominator:

$$\nabla R(x) = \frac{2 A x}{x^T x} - \frac{2\,(x^T A x)\, x}{(x^T x)^2}.$$

When $x$ is normalized,

$$x^T x = 1,$$

this simplifies to

$$\nabla R(x) = 2\big(A x - R(x)\, x\big).$$

At an eigenvector, $A x = \lambda x$ and $R(x) = \lambda$, so the two terms cancel. Then

$$\nabla R(x) = 0.$$
The Rayleigh quotient connects optimization and eigensystems directly.
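A small sketch of the quotient and its gradient, confirming that the gradient vanishes at an eigenvector (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2

def rayleigh(x):
    return (x @ A @ x) / (x @ x)

def rayleigh_grad(x):
    # gradient of the Rayleigh quotient: 2(Ax - R(x) x) / (x^T x)
    return 2 * (A @ x - rayleigh(x) * x) / (x @ x)

x = rng.standard_normal(4)
print(np.linalg.norm(rayleigh_grad(x)))               # generally nonzero

lam, V = np.linalg.eigh(A)
v = V[:, 0]
assert np.allclose(rayleigh_grad(v), 0, atol=1e-10)   # zero at an eigenvector
assert np.isclose(rayleigh(v), lam[0])                # R(v) is the eigenvalue
```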
Orthogonality Constraints
Eigenvector matrices satisfy

$$V^T V = I.$$

Differentiate:

$$dV^T\, V + V^T\, dV = 0.$$

Therefore

$$V^T dV$$

is skew-symmetric.
This constraint appears in all orthogonal matrix derivatives: QR, eigendecomposition, SVD, Stiefel manifolds, and orthogonal parameterizations.
A backward rule must preserve this geometry.
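The constraint can be seen on any smooth orthogonal path. The sketch below (illustrative, using `scipy.linalg.expm`) differentiates $Q(t) = e^{tS} Q_0$ with skew-symmetric $S$ and checks that $Q^T \dot Q$ is skew-symmetric:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(8)
S0 = rng.standard_normal((4, 4))
S = (S0 - S0.T) / 2                      # skew-symmetric generator

Q0 = np.linalg.qr(rng.standard_normal((4, 4)))[0]

def Q(t):
    return expm(t * S) @ Q0              # orthogonal for every t

eps = 1e-6
dQ = (Q(eps) - Q(-eps)) / (2 * eps)      # numerical derivative at t = 0
W = Q0.T @ dQ                            # V^T dV analogue along this path

assert np.allclose(W, -W.T, atol=1e-8)   # skew-symmetric
```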
Reverse Mode for Symmetric Eigendecomposition
Suppose

$$A = V \Lambda V^T,$$

and a loss depends on both $\Lambda$ and $V$.

Let:

$$\bar{\Lambda} = \frac{\partial L}{\partial \Lambda}, \qquad \bar{V} = \frac{\partial L}{\partial V}.$$

Define the matrix $F$:

$$F_{ij} = \begin{cases} \dfrac{1}{\lambda_j - \lambda_i}, & i \neq j, \\ 0, & i = j. \end{cases}$$

Then the symmetric reverse rule is:

$$\bar{A} = V \big( \bar{\Lambda} + F \circ (V^T \bar{V}) \big) V^T,$$

where $\circ$ is the elementwise product. Symmetrization is usually required:

$$\bar{A} \leftarrow \tfrac{1}{2}\big(\bar{A} + \bar{A}^T\big).$$

The term involving $F$ is precisely where instability appears when eigenvalues collide.
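A minimal sketch of this backward rule, assuming `eigh` conventions and distinct eigenvalues. The loss here is a hypothetical sign-invariant choice, $L = \sum_i w_i V_{0i}^2$, so a finite-difference check is well defined despite eigenvector sign ambiguity:

```python
import numpy as np

def eigh_vjp(A, lam_bar, V_bar):
    """Reverse rule for A = V diag(lam) V^T (symmetric A, distinct eigenvalues)."""
    lam, V = np.linalg.eigh(A)
    n = len(lam)
    F = np.zeros((n, n))                 # F_ij = 1/(lam_j - lam_i) off-diagonal
    for i in range(n):
        for j in range(n):
            if i != j:
                F[i, j] = 1.0 / (lam[j] - lam[i])
    A_bar = V @ (np.diag(lam_bar) + F * (V.T @ V_bar)) @ V.T
    return (A_bar + A_bar.T) / 2         # symmetrize

rng = np.random.default_rng(9)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2
w = rng.standard_normal(4)

def loss(M):
    return np.sum(w * np.linalg.eigh(M)[1][0, :] ** 2)   # sign-invariant in V

_, V = np.linalg.eigh(A)
V_bar = np.zeros_like(V)
V_bar[0, :] = 2 * w * V[0, :]            # dL/dV for this loss
A_bar = eigh_vjp(A, np.zeros(4), V_bar)

# Finite-difference check over symmetric perturbations
eps, fd = 1e-6, np.zeros_like(A)
for k in range(4):
    for l in range(4):
        E = np.zeros_like(A)
        E[k, l] = E[l, k] = 0.5 if k != l else 1.0
        fd[k, l] = (loss(A + eps * E) - loss(A - eps * E)) / (2 * eps)

assert np.allclose(fd, A_bar, atol=1e-4)
```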
Eigenvalue Crossing
Suppose two eigenvalues approach each other:

$$\lambda_i - \lambda_j \to 0.$$

Then

$$\frac{1}{\lambda_i - \lambda_j} \to \infty.$$

The gradient can explode.
This is not a bug in AD. The mathematical derivative itself becomes singular.
Near eigenvalue crossings:
| Quantity | Stability |
|---|---|
| Eigenvalues | Stable |
| Eigenspaces | Usually stable |
| Individual eigenvectors | Unstable |
This distinction is critical in spectral machine learning methods.
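The blowup is easy to reproduce. The sketch below (illustrative) shrinks a spectral gap and watches the eigenvector-derivative coupling term grow like the inverse gap:

```python
import numpy as np

rng = np.random.default_rng(10)
E0 = rng.standard_normal((2, 2))
E = (E0 + E0.T) / 2                      # fixed perturbation direction

for gap in [1e-1, 1e-3, 1e-5]:
    A = np.diag([1.0, 1.0 + gap])
    lam, V = np.linalg.eigh(A)
    # magnitude of the eigenvector-derivative coupling term
    coupling = abs(V[:, 1] @ E @ V[:, 0]) / abs(lam[0] - lam[1])
    print(gap, coupling)                 # grows like 1/gap
```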
Numerical Stability
Practical eigendecomposition differentiation often requires:
| Technique | Purpose |
|---|---|
| Eigenvalue regularization | Avoid collisions |
| Symmetrization | Remove asymmetric noise |
| Projector objectives | Avoid unstable basis gradients |
| Truncated spectra | Ignore small unstable modes |
| Stable spectral gaps | Improve conditioning |
Spectral methods are fundamentally conditioning-sensitive.
Power Iteration and Implicit Differentiation
Some systems compute dominant eigenvectors iteratively:

$$x_{k+1} = \frac{A x_k}{\|A x_k\|}.$$
AD can:
- Differentiate through all iterations.
- Use implicit differentiation on the fixed-point equation.
Implicit methods are often cheaper and more stable for long iterative processes.
The fixed-point condition is:

$$x^\ast = \frac{A x^\ast}{\|A x^\ast\|}.$$
Differentiating the fixed-point equation yields a linear system for the derivative.
This approach generalizes to many iterative eigensolvers.
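A minimal sketch of the implicit approach, assuming a simple dominant eigenvalue: rather than unrolling the iterations, it solves the linear system obtained by differentiating $A v = \lambda v$, $v^T v = 1$ (function names are illustrative):

```python
import numpy as np

def power_iteration(A, iters=500):
    """Dominant eigenpair by power iteration (assumes a simple dominant eigenvalue)."""
    x = np.ones(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ A @ x, x

def implicit_jvp(A, dA):
    """Forward derivative of the dominant eigenpair from the fixed-point equations,
    not from differentiating every iteration."""
    lam, v = power_iteration(A)
    n = A.shape[0]
    # Augmented system: (A - lam I) dv - dlam v = -dA v,  v^T dv = 0
    M = np.block([[A - lam * np.eye(n), -v[:, None]],
                  [v[None, :], np.zeros((1, 1))]])
    rhs = np.concatenate([-dA @ v, [0.0]])
    sol = np.linalg.solve(M, rhs)
    return sol[-1], sol[:-1]             # (dlam, dv)

rng = np.random.default_rng(11)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2 + 5 * np.eye(5)        # shift so the top eigenvalue dominates
E0 = rng.standard_normal((5, 5))
dA = (E0 + E0.T) / 2

dlam, dv = implicit_jvp(A, dA)
lam, v = power_iteration(A)
assert np.isclose(dlam, v @ dA @ v, atol=1e-6)   # matches the eigenvalue rule
assert np.isclose(v @ dv, 0.0, atol=1e-8)        # tangent to the normalization
```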
Graph Laplacians
Graph learning frequently uses eigendecompositions of Laplacians:

$$L = D - W,$$

where $W$ is the weighted adjacency matrix and $D$ the diagonal degree matrix. The eigenvectors with the smallest eigenvalues encode graph geometry.
Spectral clustering, diffusion maps, graph embeddings, and manifold learning all rely on these spectra.
Repeated or nearly repeated eigenvalues are common in graphs with weakly connected components. This can make eigenvector gradients unstable.
Using subspace projectors instead of raw eigenvectors often improves robustness.
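A small illustrative example: a graph of two triangles joined by one weak edge has a tiny spectral gap above the zero eigenvalue, so the two lowest eigenvectors are ill-conditioned while their two-dimensional projector is well behaved.

```python
import numpy as np

# Illustrative graph: two triangles joined by one weak edge
W = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 1e-3                 # weak connection -> tiny spectral gap

L = np.diag(W.sum(axis=1)) - W           # unnormalized Laplacian
lam, V = np.linalg.eigh(L)
print(lam[:3])                           # ~0, ~1e-3 scale, then O(1)

# The 2D projector onto the two lowest modes is a stable object
P = V[:, :2] @ V[:, :2].T
assert np.allclose(P, P.T) and np.allclose(P @ P, P)   # orthogonal projector
```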
Spectral Radius
The spectral radius is

$$\rho(A) = \max_i |\lambda_i(A)|.$$
It controls:
| Domain | Meaning |
|---|---|
| Dynamical systems | Stability |
| Recurrent networks | Gradient growth |
| Numerical methods | Convergence |
| Control theory | System response |
Differentiating spectral radius is difficult when multiple eigenvalues attain the maximum magnitude.
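For a simple eigenvalue of a nonsymmetric matrix, first-order perturbation theory gives $d\lambda = \frac{u^T\, dA\, v}{u^T v}$ with left and right eigenvectors $u, v$. The sketch below (illustrative, using `scipy.linalg.eig`; an entrywise-positive matrix so the Perron eigenvalue attains the spectral radius and is real and simple) checks this for the spectral radius:

```python
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(12)
A = rng.random((4, 4)) + 0.1            # positive entries: Perron eigenvalue is
                                        # real, simple, and attains rho(A)
lam, VL, VR = eig(A, left=True, right=True)
i = np.argmax(np.abs(lam))
u, v = VL[:, i].real, VR[:, i].real     # left and right eigenvectors of lambda_max

E = rng.standard_normal((4, 4))         # arbitrary nonsymmetric direction
pred = (u @ E @ v) / (u @ v)            # predicted d(rho) in direction E

eps = 1e-6
rho = lambda M: np.max(np.abs(eig(M, right=False)))
fd = (rho(A + eps * E) - rho(A - eps * E)) / (2 * eps)
assert np.isclose(fd, pred, atol=1e-4)
```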
Implementation Considerations
A production AD implementation for eigendecomposition should track:
- matrix symmetry
- eigenvalue ordering
- degeneracy handling
- batch dimensions
- dtype
- complex vs real arithmetic
- orthogonality conventions

The backward rule should explicitly define behavior when eigenvalues are repeated.
Frameworks typically adopt one of the following strategies:
| Strategy | Behavior |
|---|---|
| Return NaNs | Signal undefined derivative |
| Use approximate stabilization | Numerically smoother |
| Document undefined region | User responsibility |
No implementation can make fundamentally undefined derivatives well-defined.
Practical Guidance
Prefer:
| Stable Objective | Avoid |
|---|---|
| Eigenvalues | Raw eigenvectors |
| Subspace projectors | Arbitrary basis vectors |
| Spectral traces | Unstable spectral coordinates |
| Orthogonally invariant losses | Basis-dependent losses |
| Symmetric problems | General nonsymmetric spectra |
When possible, formulate objectives using the spectral projector

$$P = V_k V_k^T,$$

where $V_k$ collects the selected eigenvectors, rather than $V_k$ itself.
Summary
Eigenvalue problems reveal the geometric limits of automatic differentiation. Eigenvalues have stable first-order perturbations. Eigenvectors do not. Their derivatives contain inverse spectral gaps, making gradients unstable or undefined near repeated eigenvalues.
Differentiating eigendecompositions correctly requires respecting orthogonality constraints, spectral degeneracy, and basis ambiguity. Stable formulations should depend on invariant subspaces or spectral quantities rather than arbitrary eigenvector coordinates.