# Eigenvalue Problems

Eigenvalue problems are fundamental in numerical analysis, optimization, physics, graph methods, control theory, and machine learning. They are also among the most subtle operations in automatic differentiation.

The main difficulty is that eigenvectors are not stable objects. Small perturbations in the matrix can produce large changes in eigenvectors when eigenvalues are close or repeated. The derivative formulas reveal this instability directly.

## Eigenvalues and Eigenvectors

For a square matrix $A \in \mathbb{R}^{n \times n}$, an eigenpair satisfies

$$Av = \lambda v, \qquad v \ne 0.$$

The scalar $\lambda$ is an eigenvalue and $v$ is an eigenvector.

For symmetric matrices ($A = A^T$), all eigenvalues are real, and there exists an orthonormal basis of eigenvectors:

$$A = Q \Lambda Q^T, \qquad Q^T Q = I, \qquad \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n).$$

Symmetric eigenproblems are much more stable and better behaved for AD than general nonsymmetric eigenproblems.

## Why Eigenproblems Matter

Eigenvalue computations appear in:

| Application | Role |
| --- | --- |
| PCA | Principal directions |
| Spectral clustering | Graph Laplacian eigenvectors |
| Stability analysis | System modes |
| Differential equations | Modal decomposition |
| Quantum mechanics | Hamiltonian spectra |
| Covariance analysis | Principal components |
| Optimization | Hessian curvature |
| Graph neural networks | Spectral filters |

Many machine learning systems differentiate through eigendecompositions implicitly or explicitly.

## Differential of Eigenvalues

Assume $A$ is symmetric with normalized eigenvectors, $q_i^T q_i = 1$. The eigenvalue equation is

$$A q_i = \lambda_i q_i.$$

Differentiate:

$$dA\,q_i + A\,dq_i = d\lambda_i\,q_i + \lambda_i\,dq_i.$$

Left-multiply by $q_i^T$:

$$q_i^T dA\,q_i + q_i^T A\,dq_i = d\lambda_i\,q_i^T q_i + \lambda_i\,q_i^T dq_i.$$

Since $A$ is symmetric and $A q_i = \lambda_i q_i$, we have $q_i^T A\,dq_i = \lambda_i\,q_i^T dq_i$, so these terms cancel. With $q_i^T q_i = 1$, what remains is

$$d\lambda_i = q_i^T dA\,q_i.$$

This is the first-order perturbation formula for eigenvalues.

The gradient of the eigenvalue with respect to the matrix is therefore

$$\nabla_A \lambda_i = q_i q_i^T.$$

This formula is stable even when the eigenvectors are not.
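
As a quick numerical sanity check, the following NumPy sketch compares $d\lambda_i = q_i^T dA\,q_i$ against a finite difference (all variable names are illustrative):

```python
import numpy as np

# Finite-difference check of d(lambda_i) = q_i^T dA q_i for symmetric A.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = 0.5 * (A + A.T)                      # symmetric test matrix

lam, Q = np.linalg.eigh(A)               # eigenvalues ascending, Q orthonormal
i = 2
qi = Q[:, i]

dA = rng.standard_normal((5, 5))
dA = 0.5 * (dA + dA.T)                   # symmetric perturbation direction

eps = 1e-6
lam_plus, _ = np.linalg.eigh(A + eps * dA)
fd = (lam_plus[i] - lam[i]) / eps        # finite-difference d(lambda_i)
analytic = qi @ dA @ qi                  # q_i^T dA q_i

print(fd, analytic)                      # agree closely for a simple eigenvalue
```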

## Reverse Rule for Eigenvalues

Suppose a scalar loss depends only on the eigenvalues:

$$L = f(\lambda_1, \ldots, \lambda_n).$$

Let

$$\bar{\lambda}_i = \frac{\partial L}{\partial \lambda_i}.$$

Then

$$dL = \sum_i \bar{\lambda}_i\,d\lambda_i.$$

Substituting the perturbation formula and rewriting with a trace gives

$$dL = \sum_i \bar{\lambda}_i\,q_i^T dA\,q_i = \operatorname{tr}\!\left( Q\,\operatorname{diag}(\bar{\lambda})\,Q^T dA \right).$$

Therefore,

$$\bar{A} = Q\,\operatorname{diag}(\bar{\lambda})\,Q^T.$$

This rule is clean and stable for symmetric matrices.
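
A minimal NumPy sketch of this rule, verified on the loss $L = \sum_i \lambda_i^2$, for which $\bar{A}$ must equal $2A$ since $\sum_i \lambda_i^2 = \|A\|_F^2$ for symmetric $A$ (the function name is illustrative):

```python
import numpy as np

# Reverse rule when the loss depends only on eigenvalues.
def eigvals_vjp(Q, lam_bar):
    return Q @ np.diag(lam_bar) @ Q.T

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)); A = 0.5 * (A + A.T)
lam, Q = np.linalg.eigh(A)

# Example loss L = sum_i lambda_i^2, so lam_bar_i = 2 lambda_i.
A_bar = eigvals_vjp(Q, 2.0 * lam)

# sum_i lambda_i^2 = ||A||_F^2 for symmetric A, whose gradient is 2A.
print(np.allclose(A_bar, 2.0 * A))       # True
```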

## Differential of Eigenvectors

Eigenvectors are more difficult.

Assume distinct eigenvalues and differentiate $A q_i = \lambda_i q_i$, now keeping the eigenvector terms:

$$(A - \lambda_i I)\,dq_i = (d\lambda_i\,I - dA)\,q_i.$$

Expand $dq_i$ in the eigenbasis; the $q_i$ component can be taken to be zero, since $q_i^T q_i = 1$ forces $q_i^T dq_i = 0$:

$$dq_i = \sum_{j \ne i} c_{ji}\,q_j.$$

Project onto $q_j$ for $j \ne i$:

$$q_j^T (A - \lambda_i I)\,dq_i = q_j^T (d\lambda_i\,I - dA)\,q_i.$$

Since $A q_j = \lambda_j q_j$ and $q_j^T q_i = 0$, we obtain

$$(\lambda_j - \lambda_i)\,c_{ji} = -\,q_j^T dA\,q_i,$$

and thus

$$c_{ji} = \frac{q_j^T dA\,q_i}{\lambda_i - \lambda_j}.$$

Therefore

$$dq_i = \sum_{j \ne i} q_j\,\frac{q_j^T dA\,q_i}{\lambda_i - \lambda_j}.$$

This formula is extremely important: it shows that eigenvector derivatives contain factors of

$$\frac{1}{\lambda_i - \lambda_j}.$$

If eigenvalues are close, the derivative becomes large; if they are equal, it is undefined.
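
The series above can be evaluated directly and compared against a sign-aligned finite difference; a hedged NumPy sketch, valid when the spectrum is well separated:

```python
import numpy as np

# Evaluate dq_i from the perturbation series and compare with a finite
# difference; reliable only when the spectrum is well separated.
rng = np.random.default_rng(2)
n, i = 5, 1
A = rng.standard_normal((n, n)); A = 0.5 * (A + A.T)
lam, Q = np.linalg.eigh(A)

dA = rng.standard_normal((n, n)); dA = 0.5 * (dA + dA.T)

dq = np.zeros(n)
for j in range(n):
    if j != i:
        dq += Q[:, j] * (Q[:, j] @ dA @ Q[:, i]) / (lam[i] - lam[j])

eps = 1e-6
_, Qp = np.linalg.eigh(A + eps * dA)
qp = Qp[:, i] * np.sign(Qp[:, i] @ Q[:, i])   # eigenvectors are defined up to sign
fd = (qp - Q[:, i]) / eps

print(np.allclose(dq, fd, atol=1e-4))         # True for a well-separated spectrum
```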

## Geometric Meaning

Eigenvectors are unstable because only the eigenspace is intrinsic.

Suppose $\lambda_i = \lambda_j$. Then any orthonormal basis of the corresponding eigenspace is valid; the individual vectors are arbitrary. Thus the mapping

$$A \mapsto q_i$$

is not continuous at repeated eigenvalues.

But the projector onto the eigenspace,

$$P = Q Q^T$$

(where the columns of $Q$ span the eigenspace in question), is stable and differentiable under much broader conditions.

This distinction matters in AD: stable objectives should depend on invariant subspaces rather than on arbitrary basis vectors, as the sketch below illustrates.
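
A small NumPy demonstration on an assumed toy matrix: under a tiny perturbation near a nearly repeated eigenvalue, the individual top eigenvector can rotate by $O(1)$, while the projector onto the two-dimensional eigenspace barely moves:

```python
import numpy as np

# Illustrative values: a matrix whose top two eigenvalues nearly coincide.
def top2_projector(A):
    _, Q = np.linalg.eigh(A)
    Q2 = Q[:, -2:]                      # orthonormal basis of the top-2 eigenspace
    return Q2 @ Q2.T

base = np.diag([1.0, 2.0, 2.0 + 1e-9])
rng = np.random.default_rng(3)
E = rng.standard_normal((3, 3))
E = 0.5 * (E + E.T)

P0 = top2_projector(base)
P1 = top2_projector(base + 1e-8 * E)
print(np.linalg.norm(P1 - P0))          # tiny: the subspace barely moves

_, Q0 = np.linalg.eigh(base)
_, Q1 = np.linalg.eigh(base + 1e-8 * E)
q0 = Q0[:, -1]
q1 = Q1[:, -1] * np.sign(Q1[:, -1] @ q0)   # remove the sign ambiguity
print(np.linalg.norm(q1 - q0))          # can be O(1): the vector rotates freely
```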

## Symmetric vs Nonsymmetric Problems

For symmetric matrices:

| Property | Result |
| --- | --- |
| Eigenvalues | Real |
| Eigenvectors | Orthogonal |
| Decomposition | Stable |
| Gradient formulas | Relatively clean |

For general matrices:

| Property | Result |
| --- | --- |
| Eigenvalues | May be complex |
| Eigenvectors | May be nonorthogonal |
| Jordan blocks | Possible |
| Perturbation theory | Much harder |

Most ML and scientific computing systems restrict differentiable eigendecompositions to symmetric or Hermitian matrices.

## Spectral Functions

A spectral function depends only on eigenvalues:

$$f(A) = Q\,f(\Lambda)\,Q^T.$$

Examples:

| Function | Spectral form |
| --- | --- |
| Matrix exponential | $e^A = Q e^\Lambda Q^T$ |
| Matrix logarithm | $\log A = Q (\log \Lambda) Q^T$ |
| Matrix square root | $A^{1/2} = Q \Lambda^{1/2} Q^T$ |
| Matrix inverse | $A^{-1} = Q \Lambda^{-1} Q^T$ |

Differentiating spectral functions involves differentiating eigenvalues and eigenvectors simultaneously. The resulting formulas contain divided differences:

$$\frac{f(\lambda_i) - f(\lambda_j)}{\lambda_i - \lambda_j}.$$

Unlike the raw eigenvector derivative, these expressions remain finite when eigenvalues coincide if handled carefully: the divided difference tends to $f'(\lambda_i)$ as $\lambda_j \to \lambda_i$.
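
For symmetric $A$, this is the Daleckii-Krein form of the Fréchet derivative: $df(A)[dA] = Q\,(G \odot (Q^T dA\,Q))\,Q^T$, with $G_{ij}$ the divided difference and $G_{ii} = f'(\lambda_i)$. A hedged NumPy sketch, checked against a finite difference for $f = \exp$:

```python
import numpy as np

# Daleckii-Krein sketch: df(A)[dA] = Q (G * (Q^T dA Q)) Q^T elementwise,
# where G_ij = (f(lam_i) - f(lam_j)) / (lam_i - lam_j) and G_ii = f'(lam_i).
def spectral_fn_jvp(A, dA, f, fprime):
    lam, Q = np.linalg.eigh(A)
    flam = f(lam)
    dlam = lam[:, None] - lam[None, :]
    mask = np.isclose(dlam, 0.0)              # diagonal and (near-)ties
    G = (flam[:, None] - flam[None, :]) / np.where(mask, 1.0, dlam)
    r, c = np.nonzero(mask)
    G[r, c] = fprime(lam)[r]                  # divided-difference limit f'
    return Q @ (G * (Q.T @ dA @ Q)) @ Q.T

def apply_f(A, f):                            # f(A) = Q f(Lambda) Q^T
    lam, Q = np.linalg.eigh(A)
    return Q @ np.diag(f(lam)) @ Q.T

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4)); A = 0.5 * (A + A.T)
dA = rng.standard_normal((4, 4)); dA = 0.5 * (dA + dA.T)

eps = 1e-6
fd = (apply_f(A + eps * dA, np.exp) - apply_f(A, np.exp)) / eps
print(np.allclose(spectral_fn_jvp(A, dA, np.exp, np.exp), fd, atol=1e-4))
```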

## Example: Largest Eigenvalue

Consider $\lambda_{\max}(A)$. If the largest eigenvalue is simple, with unit eigenvector $q$, then

$$d\lambda_{\max} = q^T dA\,q,$$

and thus

$$\nabla_A \lambda_{\max} = q q^T.$$

This appears in:

| Application | Use |
| --- | --- |
| Spectral norm regularization | Largest singular/eigenvalue |
| Stability constraints | Spectral radius |
| Graph methods | Leading eigenvector |
| Optimization | Convex spectral penalties |

If the largest eigenvalue is repeated, the gradient becomes set-valued.
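
As a toy illustration of this gradient in use, the following assumed sketch runs gradient descent on $\lambda_{\max}$ itself; once the top eigenvalues merge, the true derivative becomes set-valued and plain descent starts to chatter between eigenvectors:

```python
import numpy as np

# Toy sketch (illustrative only): minimize lambda_max(A) by stepping along
# -grad lambda_max = -q q^T, restricted to symmetric A.
rng = np.random.default_rng(5)
A = rng.standard_normal((6, 6))
A = 0.5 * (A + A.T)
lam0 = np.linalg.eigh(A)[0][-1]

for _ in range(100):
    _, Q = np.linalg.eigh(A)
    q = Q[:, -1]                     # unit eigenvector of the largest eigenvalue
    A -= 0.1 * np.outer(q, q)        # gradient step; valid while lambda_max is simple

print(lam0, np.linalg.eigh(A)[0][-1])    # the largest eigenvalue has decreased
```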

## Rayleigh Quotient

The Rayleigh quotient is

$$R(x) = \frac{x^T A x}{x^T x}.$$

For symmetric $A$, the stationary points of $R$ are eigenvectors.

Differentiate numerator and denominator:

$$dR = \frac{2 x^T A\,dx + x^T dA\,x}{x^T x} - \frac{x^T A x}{(x^T x)^2}\,2 x^T dx.$$

When $x$ is normalized, $x^T x = 1$, this simplifies to

$$dR = x^T dA\,x + 2\left(Ax - R(x)\,x\right)^T dx.$$

At an eigenvector, $Ax = \lambda x$ with $R(x) = \lambda$, so the second term vanishes and

$$dR = x^T dA\,x.$$

The Rayleigh quotient connects optimization and eigensystems directly.
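
In particular, the gradient in $x$ is $\nabla_x R = 2(Ax - R(x)x)/(x^T x)$; a minimal NumPy check against finite differences (names are illustrative):

```python
import numpy as np

# Gradient of the Rayleigh quotient, grad R = 2(Ax - R(x) x) / (x^T x),
# checked against forward differences.
def rayleigh(A, x):
    return (x @ A @ x) / (x @ x)

def rayleigh_grad(A, x):
    return 2.0 * (A @ x - rayleigh(A, x) * x) / (x @ x)

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5)); A = 0.5 * (A + A.T)
x = rng.standard_normal(5)

eps = 1e-6
fd = np.array([(rayleigh(A, x + eps * e) - rayleigh(A, x)) / eps
               for e in np.eye(5)])
print(np.allclose(fd, rayleigh_grad(A, x), atol=1e-4))   # True
```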

## Orthogonality Constraints

Eigenvector matrices satisfy $Q^T Q = I$. Differentiating gives

$$dQ^T Q + Q^T dQ = 0,$$

so $Q^T dQ$ is skew-symmetric.

This constraint appears in all orthogonal matrix derivatives: QR, eigendecomposition, SVD, Stiefel manifolds, and orthogonal parameterizations.

A backward rule must preserve this geometry.
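
This skew-symmetry can be observed numerically; a small sketch that differentiates `np.linalg.eigh` by finite differences, with the arbitrary column signs aligned first:

```python
import numpy as np

# Finite-difference check that Q^T dQ is skew-symmetric for Q from eigh.
rng = np.random.default_rng(9)
A = rng.standard_normal((4, 4)); A = 0.5 * (A + A.T)
dA = rng.standard_normal((4, 4)); dA = 0.5 * (dA + dA.T)

eps = 1e-6
_, Q0 = np.linalg.eigh(A)
_, Q1 = np.linalg.eigh(A + eps * dA)
Q1 *= np.sign(np.sum(Q1 * Q0, axis=0))   # align each column's arbitrary sign

dQ = (Q1 - Q0) / eps
S = Q0.T @ dQ
print(np.allclose(S, -S.T, atol=1e-4))   # skew-symmetric up to FD error
```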

## Reverse Mode for Symmetric Eigendecomposition

Suppose

$$A = Q \Lambda Q^T,$$

and a loss depends on both $Q$ and $\Lambda$. Let

$$\bar{Q} = \frac{\partial L}{\partial Q}, \qquad \bar{\Lambda} = \frac{\partial L}{\partial \Lambda}.$$

Define the matrix $F$ by

$$F_{ij} = \begin{cases} \dfrac{1}{\lambda_j - \lambda_i}, & i \ne j, \\[1ex] 0, & i = j, \end{cases}$$

with the sign fixed by the eigenvector differential derived above. Then the symmetric reverse rule is

$$\bar{A} = Q \left( \bar{\Lambda} + F \odot (Q^T \bar{Q}) \right) Q^T.$$

Because the input is constrained to be symmetric, the result is usually symmetrized:

$$\bar{A} \leftarrow \tfrac{1}{2}\left(\bar{A} + \bar{A}^T\right).$$

The term involving FF is precisely where instability appears when eigenvalues collide.
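
A minimal NumPy sketch of this backward rule, verified by a finite difference on a loss whose eigenvector term is sign-invariant (all names are illustrative, and distinct eigenvalues are assumed):

```python
import numpy as np

# Symmetric eigendecomposition VJP: A_bar = Q (diag(lam_bar) + F * (Q^T Q_bar)) Q^T.
def eigh_vjp(lam, Q, lam_bar, Q_bar):
    diff = lam[None, :] - lam[:, None]       # diff[i, j] = lam_j - lam_i
    np.fill_diagonal(diff, 1.0)              # placeholder to avoid 0-division
    F = 1.0 / diff
    np.fill_diagonal(F, 0.0)
    A_bar = Q @ (np.diag(lam_bar) + F * (Q.T @ Q_bar)) @ Q.T
    return 0.5 * (A_bar + A_bar.T)           # symmetrize

rng = np.random.default_rng(7)
n = 4
A = rng.standard_normal((n, n)); A = 0.5 * (A + A.T)
dA = rng.standard_normal((n, n)); dA = 0.5 * (dA + dA.T)
c = rng.standard_normal(n)

def loss(A):                                 # depends on lam_max and q_max
    lam, Q = np.linalg.eigh(A)
    return lam[-1] + (Q[:, -1] @ c) ** 2     # (q^T c)^2 is sign-invariant

lam, Q = np.linalg.eigh(A)
lam_bar = np.zeros(n); lam_bar[-1] = 1.0
Q_bar = np.zeros((n, n)); Q_bar[:, -1] = 2.0 * (Q[:, -1] @ c) * c

A_bar = eigh_vjp(lam, Q, lam_bar, Q_bar)
eps = 1e-6
fd = (loss(A + eps * dA) - loss(A)) / eps
print(np.isclose(np.sum(A_bar * dA), fd, atol=1e-4))   # True
```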

## Eigenvalue Crossing

Suppose two eigenvalues approach each other, $\lambda_i \to \lambda_j$. Then

$$\frac{1}{\lambda_i - \lambda_j} \to \infty,$$

and the gradient can explode.

This is not a bug in AD. The mathematical derivative itself becomes singular.

Near eigenvalue crossings:

| Quantity | Stability |
| --- | --- |
| Eigenvalues | Stable |
| Eigenspaces | Usually stable |
| Individual eigenvectors | Unstable |

This distinction is critical in spectral machine learning methods.

## Numerical Stability

Practical eigendecomposition differentiation often requires:

| Technique | Purpose |
| --- | --- |
| Eigenvalue regularization | Avoid collisions |
| Symmetrization | Remove asymmetric noise |
| Projector objectives | Avoid unstable basis gradients |
| Truncated spectra | Ignore small unstable modes |
| Stable spectral gaps | Improve conditioning |

Spectral methods are fundamentally conditioning-sensitive.

## Power Iteration and Implicit Differentiation

Some systems compute dominant eigenvectors iteratively:

$$x_{k+1} = \frac{A x_k}{\|A x_k\|}.$$

AD can:

  1. Differentiate through all iterations.
  2. Use implicit differentiation on the fixed-point equation.

Implicit methods are often cheaper and more stable for long iterative processes.

The fixed-point condition is

$$x = \frac{A x}{\|A x\|}.$$

Differentiating the fixed-point equation yields a linear system for the derivative.

This approach generalizes to many iterative eigensolvers.
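
A hedged NumPy sketch of this idea for the dominant eigenpair: differentiate the optimality conditions $Ax = \lambda x$, $x^T x = 1$, which give $(A - \lambda I)\,dx = (d\lambda\,I - dA)\,x$ with $d\lambda = x^T dA\,x$ and $dx$ orthogonal to $x$ (names and structure are illustrative):

```python
import numpy as np

# Power iteration for the dominant eigenpair, then implicit differentiation
# of the optimality conditions A x = lam x, x^T x = 1.
def dominant_pair(A, iters=1000):
    x = np.ones(A.shape[0]) / np.sqrt(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ A @ x, x

def implicit_jvp(A, dA, lam, x):
    n = len(x)
    dlam = x @ dA @ x                          # eigenvalue perturbation
    rhs = (dlam * np.eye(n) - dA) @ x          # orthogonal to x by construction
    # A - lam*I is singular along x; bordering with x x^T makes it invertible,
    # and since rhs is perpendicular to x, the solution is the dx we want.
    M = A - lam * np.eye(n) + np.outer(x, x)
    return dlam, np.linalg.solve(M, rhs)

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 5))
A = 0.5 * (A + A.T) + 10.0 * np.eye(5)         # shift: dominant eigenvalue simple, positive
dA = rng.standard_normal((5, 5)); dA = 0.5 * (dA + dA.T)

lam, x = dominant_pair(A)
dlam, dx = implicit_jvp(A, dA, lam, x)

eps = 1e-6
lam2, x2 = dominant_pair(A + eps * dA)
x2 *= np.sign(x2 @ x)                          # align the arbitrary sign
print(np.isclose((lam2 - lam) / eps, dlam, atol=1e-4),
      np.allclose((x2 - x) / eps, dx, atol=1e-3))
```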

## Graph Laplacians

Graph learning frequently uses eigendecompositions of Laplacians:

$$L = D - A,$$

where $A$ is the graph's adjacency matrix and $D$ is the diagonal degree matrix.

The eigenvectors of the smallest eigenvalues encode graph geometry.

Spectral clustering, diffusion maps, graph embeddings, and manifold learning all rely on these spectra.

Repeated or nearly repeated eigenvalues are common in graphs with weakly connected components. This can make eigenvector gradients unstable.

Using subspace projectors instead of raw eigenvectors often improves robustness.
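
A small sketch of that recommendation on an assumed toy graph: build $L = D - A$ and formulate objectives through the projector onto the bottom-$k$ eigenspace rather than through the raw eigenvectors:

```python
import numpy as np

# Toy graph (illustrative): adjacency A, Laplacian L = D - A.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

k = 2
_, Q = np.linalg.eigh(L)                 # eigenvalues in ascending order
P = Q[:, :k] @ Q[:, :k].T                # bottom-k subspace projector
# P is invariant to the basis chosen within the subspace, so losses built
# from P avoid the 1/(lambda_i - lambda_j) instability inside the subspace.
```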

## Spectral Radius

The spectral radius is

$$\rho(A) = \max_i |\lambda_i|.$$

It controls:

| Domain | Meaning |
| --- | --- |
| Dynamical systems | Stability |
| Recurrent networks | Gradient growth |
| Numerical methods | Convergence |
| Control theory | System response |

Differentiating spectral radius is difficult when multiple eigenvalues attain the maximum magnitude.

## Implementation Considerations

A production AD implementation for eigendecomposition should track:

- matrix symmetry
- eigenvalue ordering
- degeneracy handling
- batch dimensions
- dtype
- complex vs. real arithmetic
- orthogonality conventions

The backward rule should explicitly define behavior when eigenvalues are repeated.

Many frameworks adopt one of the following strategies:

| Strategy | Behavior |
| --- | --- |
| Return NaNs | Signal the undefined derivative |
| Apply approximate stabilization | Numerically smoother |
| Document the undefined region | User responsibility |

No implementation can make fundamentally undefined derivatives well-defined.

## Practical Guidance

Prefer:

| Stable objective | Avoid |
| --- | --- |
| Eigenvalues | Raw eigenvectors |
| Subspace projectors | Arbitrary basis vectors |
| Spectral traces | Unstable spectral coordinates |
| Orthogonally invariant losses | Basis-dependent losses |
| Symmetric problems | General nonsymmetric spectra |

When possible, formulate objectives in terms of

$$Q Q^T$$

rather than $Q$ itself.

## Summary

Eigenvalue problems reveal the geometric limits of automatic differentiation. Eigenvalues have stable first-order perturbations. Eigenvectors do not. Their derivatives contain inverse spectral gaps, making gradients unstable or undefined near repeated eigenvalues.

Differentiating eigendecompositions correctly requires respecting orthogonality constraints, spectral degeneracy, and basis ambiguity. Stable formulations should depend on invariant subspaces or spectral quantities rather than arbitrary eigenvector coordinates.