# Eigenvalue Problems

Eigenvalue problems are fundamental in numerical analysis, optimization, physics, graph methods, control theory, and machine learning. They are also among the most subtle operations in automatic differentiation.

The main difficulty is that eigenvectors are not stable objects. Small perturbations in the matrix can produce large changes in eigenvectors when eigenvalues are close or repeated. The derivative formulas reveal this instability directly.

## Eigenvalues and Eigenvectors

For a square matrix $A \in \mathbb{R}^{n \times n}$, an eigenpair satisfies

$$Av = \lambda v, \qquad v \ne 0.$$

The scalar $\lambda$ is an eigenvalue and $v$ is an eigenvector.

For symmetric matrices ($A = A^T$), all eigenvalues are real, and there exists an orthonormal basis of eigenvectors:

$$A = Q \Lambda Q^T, \qquad Q^T Q = I, \qquad \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n).$$

Symmetric eigenproblems are much more stable and better behaved for AD than general nonsymmetric eigenproblems.

## Why Eigenproblems Matter

Eigenvalue computations appear in:

| Application | Role |
| --- | --- |
| PCA | Principal directions |
| Spectral clustering | Graph Laplacian eigenvectors |
| Stability analysis | System modes |
| Differential equations | Modal decomposition |
| Quantum mechanics | Hamiltonian spectra |
| Covariance analysis | Principal components |
| Optimization | Hessian curvature |
| Graph neural networks | Spectral filters |

Many machine learning systems differentiate through eigendecompositions implicitly or explicitly.

## Differential of Eigenvalues

Assume $A$ is symmetric with normalized eigenvectors, $q_i^T q_i = 1$. The eigenvalue equation is

$$A q_i = \lambda_i q_i.$$

Differentiate:

$$dA\,q_i + A\,dq_i = d\lambda_i\,q_i + \lambda_i\,dq_i.$$

Left-multiply by $q_i^T$:

$$q_i^T dA\,q_i + q_i^T A\,dq_i = d\lambda_i\,q_i^T q_i + \lambda_i\,q_i^T dq_i.$$

Since $A$ is symmetric and $A q_i = \lambda_i q_i$, we have $q_i^T A\,dq_i = \lambda_i\,q_i^T dq_i$, so these terms cancel. With $q_i^T q_i = 1$, what remains is

$$d\lambda_i = q_i^T dA\,q_i.$$

This is the first-order perturbation formula for eigenvalues.

The gradient of the eigenvalue with respect to the matrix is therefore

$$\nabla_A \lambda_i = q_i q_i^T.$$

This formula is stable even when the eigenvectors are not.
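
As a quick numerical sanity check, the following NumPy sketch compares $d\lambda_i = q_i^T dA\,q_i$ against a finite difference (all variable names are illustrative):

```python
import numpy as np

# Finite-difference check of d(lambda_i) = q_i^T dA q_i for symmetric A.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = 0.5 * (A + A.T)                      # symmetric test matrix

lam, Q = np.linalg.eigh(A)               # eigenvalues ascending, Q orthonormal
i = 2
qi = Q[:, i]

dA = rng.standard_normal((5, 5))
dA = 0.5 * (dA + dA.T)                   # symmetric perturbation direction

eps = 1e-6
lam_plus, _ = np.linalg.eigh(A + eps * dA)
fd = (lam_plus[i] - lam[i]) / eps        # finite-difference d(lambda_i)
analytic = qi @ dA @ qi                  # q_i^T dA q_i

print(fd, analytic)                      # agree closely for a simple eigenvalue
```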

## Reverse Rule for Eigenvalues

Suppose a scalar loss depends only on the eigenvalues:

$$L = f(\lambda_1, \ldots, \lambda_n).$$

Let

$$\bar{\lambda}_i = \frac{\partial L}{\partial \lambda_i}.$$

Then

$$dL = \sum_i \bar{\lambda}_i\,d\lambda_i.$$

Substituting the perturbation formula and rewriting with a trace gives

$$dL = \sum_i \bar{\lambda}_i\,q_i^T dA\,q_i = \operatorname{tr}\!\left( Q\,\operatorname{diag}(\bar{\lambda})\,Q^T dA \right).$$

Therefore,

$$\bar{A} = Q\,\operatorname{diag}(\bar{\lambda})\,Q^T.$$

This rule is clean and stable for symmetric matrices.
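
A minimal NumPy sketch of this rule, verified on the loss $L = \sum_i \lambda_i^2$, for which $\bar{A}$ must equal $2A$ since $\sum_i \lambda_i^2 = \|A\|_F^2$ for symmetric $A$ (the function name is illustrative):

```python
import numpy as np

# Reverse rule when the loss depends only on eigenvalues.
def eigvals_vjp(Q, lam_bar):
    return Q @ np.diag(lam_bar) @ Q.T

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)); A = 0.5 * (A + A.T)
lam, Q = np.linalg.eigh(A)

# Example loss L = sum_i lambda_i^2, so lam_bar_i = 2 lambda_i.
A_bar = eigvals_vjp(Q, 2.0 * lam)

# sum_i lambda_i^2 = ||A||_F^2 for symmetric A, whose gradient is 2A.
print(np.allclose(A_bar, 2.0 * A))       # True
```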

## Differential of Eigenvectors

Eigenvectors are more difficult.

Assume distinct eigenvalues and differentiate $A q_i = \lambda_i q_i$, now keeping the eigenvector terms:

$$(A - \lambda_i I)\,dq_i = (d\lambda_i\,I - dA)\,q_i.$$

Expand $dq_i$ in the eigenbasis; the $q_i$ component can be taken to be zero, since $q_i^T q_i = 1$ forces $q_i^T dq_i = 0$:

$$dq_i = \sum_{j \ne i} c_{ji}\,q_j.$$

Project onto $q_j$ for $j \ne i$:

$$q_j^T (A - \lambda_i I)\,dq_i = q_j^T (d\lambda_i\,I - dA)\,q_i.$$

Since $A q_j = \lambda_j q_j$ and $q_j^T q_i = 0$, we obtain

$$(\lambda_j - \lambda_i)\,c_{ji} = -\,q_j^T dA\,q_i,$$

and thus

$$c_{ji} = \frac{q_j^T dA\,q_i}{\lambda_i - \lambda_j}.$$

Therefore

$$dq_i = \sum_{j \ne i} q_j\,\frac{q_j^T dA\,q_i}{\lambda_i - \lambda_j}.$$

This formula is extremely important: it shows that eigenvector derivatives contain factors of

$$\frac{1}{\lambda_i - \lambda_j}.$$

If eigenvalues are close, the derivative becomes large; if they are equal, it is undefined.
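
The series above can be evaluated directly and compared against a sign-aligned finite difference; a hedged NumPy sketch, valid when the spectrum is well separated:

```python
import numpy as np

# Evaluate dq_i from the perturbation series and compare with a finite
# difference; reliable only when the spectrum is well separated.
rng = np.random.default_rng(2)
n, i = 5, 1
A = rng.standard_normal((n, n)); A = 0.5 * (A + A.T)
lam, Q = np.linalg.eigh(A)

dA = rng.standard_normal((n, n)); dA = 0.5 * (dA + dA.T)

dq = np.zeros(n)
for j in range(n):
    if j != i:
        dq += Q[:, j] * (Q[:, j] @ dA @ Q[:, i]) / (lam[i] - lam[j])

eps = 1e-6
_, Qp = np.linalg.eigh(A + eps * dA)
qp = Qp[:, i] * np.sign(Qp[:, i] @ Q[:, i])   # eigenvectors are defined up to sign
fd = (qp - Q[:, i]) / eps

print(np.allclose(dq, fd, atol=1e-4))         # True for a well-separated spectrum
```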

## Geometric Meaning

Eigenvectors are unstable because only the eigenspace is intrinsic.

Suppose $\lambda_i = \lambda_j$. Then any orthonormal basis of the corresponding eigenspace is valid; the individual vectors are arbitrary. Thus the mapping

$$A \mapsto q_i$$

is not continuous at repeated eigenvalues.

But the projector onto the eigenspace,

$$P = Q Q^T$$

(where the columns of $Q$ span the eigenspace in question), is stable and differentiable under much broader conditions.

This distinction matters in AD: stable objectives should depend on invariant subspaces rather than on arbitrary basis vectors, as the sketch below illustrates.
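
A small NumPy demonstration on an assumed toy matrix: under a tiny perturbation near a nearly repeated eigenvalue, the individual top eigenvector can rotate by $O(1)$, while the projector onto the two-dimensional eigenspace barely moves:

```python
import numpy as np

# Illustrative values: a matrix whose top two eigenvalues nearly coincide.
def top2_projector(A):
    _, Q = np.linalg.eigh(A)
    Q2 = Q[:, -2:]                      # orthonormal basis of the top-2 eigenspace
    return Q2 @ Q2.T

base = np.diag([1.0, 2.0, 2.0 + 1e-9])
rng = np.random.default_rng(3)
E = rng.standard_normal((3, 3))
E = 0.5 * (E + E.T)

P0 = top2_projector(base)
P1 = top2_projector(base + 1e-8 * E)
print(np.linalg.norm(P1 - P0))          # tiny: the subspace barely moves

_, Q0 = np.linalg.eigh(base)
_, Q1 = np.linalg.eigh(base + 1e-8 * E)
q0 = Q0[:, -1]
q1 = Q1[:, -1] * np.sign(Q1[:, -1] @ q0)   # remove the sign ambiguity
print(np.linalg.norm(q1 - q0))          # can be O(1): the vector rotates freely
```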

## Symmetric vs Nonsymmetric Problems

For symmetric matrices:

| Property | Result |
| --- | --- |
| Eigenvalues | Real |
| Eigenvectors | Orthogonal |
| Decomposition | Stable |
| Gradient formulas | Relatively clean |

For general matrices:

| Property | Result |
| --- | --- |
| Eigenvalues | May be complex |
| Eigenvectors | May be nonorthogonal |
| Jordan blocks | Possible |
| Perturbation theory | Much harder |

Most ML and scientific computing systems restrict differentiable eigendecompositions to symmetric or Hermitian matrices.

## Spectral Functions

A spectral function depends only on eigenvalues:

$$f(A) = Q\,f(\Lambda)\,Q^T.$$

Examples:

| Function | Spectral form |
| --- | --- |
| Matrix exponential | $e^A = Q e^\Lambda Q^T$ |
| Matrix logarithm | $\log A = Q (\log \Lambda) Q^T$ |
| Matrix square root | $A^{1/2} = Q \Lambda^{1/2} Q^T$ |
| Matrix inverse | $A^{-1} = Q \Lambda^{-1} Q^T$ |

Differentiating spectral functions involves differentiating eigenvalues and eigenvectors simultaneously. The resulting formulas contain divided differences:

$$\frac{f(\lambda_i) - f(\lambda_j)}{\lambda_i - \lambda_j}.$$

Unlike the raw eigenvector derivative, these expressions remain finite when eigenvalues coincide if handled carefully: the divided difference tends to $f'(\lambda_i)$ as $\lambda_j \to \lambda_i$.
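
For symmetric $A$, this is the Daleckii-Krein form of the Fréchet derivative: $df(A)[dA] = Q\,(G \odot (Q^T dA\,Q))\,Q^T$, with $G_{ij}$ the divided difference and $G_{ii} = f'(\lambda_i)$. A hedged NumPy sketch, checked against a finite difference for $f = \exp$:

```python
import numpy as np

# Daleckii-Krein sketch: df(A)[dA] = Q (G * (Q^T dA Q)) Q^T elementwise,
# where G_ij = (f(lam_i) - f(lam_j)) / (lam_i - lam_j) and G_ii = f'(lam_i).
def spectral_fn_jvp(A, dA, f, fprime):
    lam, Q = np.linalg.eigh(A)
    flam = f(lam)
    dlam = lam[:, None] - lam[None, :]
    mask = np.isclose(dlam, 0.0)              # diagonal and (near-)ties
    G = (flam[:, None] - flam[None, :]) / np.where(mask, 1.0, dlam)
    r, c = np.nonzero(mask)
    G[r, c] = fprime(lam)[r]                  # divided-difference limit f'
    return Q @ (G * (Q.T @ dA @ Q)) @ Q.T

def apply_f(A, f):                            # f(A) = Q f(Lambda) Q^T
    lam, Q = np.linalg.eigh(A)
    return Q @ np.diag(f(lam)) @ Q.T

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4)); A = 0.5 * (A + A.T)
dA = rng.standard_normal((4, 4)); dA = 0.5 * (dA + dA.T)

eps = 1e-6
fd = (apply_f(A + eps * dA, np.exp) - apply_f(A, np.exp)) / eps
print(np.allclose(spectral_fn_jvp(A, dA, np.exp, np.exp), fd, atol=1e-4))
```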

## Example: Largest Eigenvalue

Consider $\lambda_{\max}(A)$. If the largest eigenvalue is simple, with unit eigenvector $q$, then

$$d\lambda_{\max} = q^T dA\,q,$$

and thus

$$\nabla_A \lambda_{\max} = q q^T.$$

This appears in:

| Application | Use |
| --- | --- |
| Spectral norm regularization | Largest singular/eigenvalue |
| Stability constraints | Spectral radius |
| Graph methods | Leading eigenvector |
| Optimization | Convex spectral penalties |

If the largest eigenvalue is repeated, the gradient becomes set-valued.
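
As a toy illustration of this gradient in use, the following assumed sketch runs gradient descent on $\lambda_{\max}$ itself; once the top eigenvalues merge, the true derivative becomes set-valued and plain descent starts to chatter between eigenvectors:

```python
import numpy as np

# Toy sketch (illustrative only): minimize lambda_max(A) by stepping along
# -grad lambda_max = -q q^T, restricted to symmetric A.
rng = np.random.default_rng(5)
A = rng.standard_normal((6, 6))
A = 0.5 * (A + A.T)
lam0 = np.linalg.eigh(A)[0][-1]

for _ in range(100):
    _, Q = np.linalg.eigh(A)
    q = Q[:, -1]                     # unit eigenvector of the largest eigenvalue
    A -= 0.1 * np.outer(q, q)        # gradient step; valid while lambda_max is simple

print(lam0, np.linalg.eigh(A)[0][-1])    # the largest eigenvalue has decreased
```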

## Rayleigh Quotient

The Rayleigh quotient is

$$R(x) = \frac{x^T A x}{x^T x}.$$

For symmetric $A$, the stationary points of $R$ are eigenvectors.

Differentiate numerator and denominator:

$$dR = \frac{2 x^T A\,dx + x^T dA\,x}{x^T x} - \frac{x^T A x}{(x^T x)^2}\,2 x^T dx.$$

When $x$ is normalized, $x^T x = 1$, this simplifies to

$$dR = x^T dA\,x + 2\left(Ax - R(x)\,x\right)^T dx.$$

At an eigenvector, $Ax = \lambda x$ with $R(x) = \lambda$, so the second term vanishes and

$$dR = x^T dA\,x.$$

The Rayleigh quotient connects optimization and eigensystems directly.
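
In particular, the gradient in $x$ is $\nabla_x R = 2(Ax - R(x)x)/(x^T x)$; a minimal NumPy check against finite differences (names are illustrative):

```python
import numpy as np

# Gradient of the Rayleigh quotient, grad R = 2(Ax - R(x) x) / (x^T x),
# checked against forward differences.
def rayleigh(A, x):
    return (x @ A @ x) / (x @ x)

def rayleigh_grad(A, x):
    return 2.0 * (A @ x - rayleigh(A, x) * x) / (x @ x)

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5)); A = 0.5 * (A + A.T)
x = rng.standard_normal(5)

eps = 1e-6
fd = np.array([(rayleigh(A, x + eps * e) - rayleigh(A, x)) / eps
               for e in np.eye(5)])
print(np.allclose(fd, rayleigh_grad(A, x), atol=1e-4))   # True
```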

## Orthogonality Constraints

Eigenvector matrices satisfy $Q^T Q = I$. Differentiating gives

$$dQ^T Q + Q^T dQ = 0,$$

so $Q^T dQ$ is skew-symmetric.

This constraint appears in all orthogonal matrix derivatives: QR, eigendecomposition, SVD, Stiefel manifolds, and orthogonal parameterizations.

A backward rule must preserve this geometry.
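
This skew-symmetry can be observed numerically; a small sketch that differentiates `np.linalg.eigh` by finite differences, with the arbitrary column signs aligned first:

```python
import numpy as np

# Finite-difference check that Q^T dQ is skew-symmetric for Q from eigh.
rng = np.random.default_rng(9)
A = rng.standard_normal((4, 4)); A = 0.5 * (A + A.T)
dA = rng.standard_normal((4, 4)); dA = 0.5 * (dA + dA.T)

eps = 1e-6
_, Q0 = np.linalg.eigh(A)
_, Q1 = np.linalg.eigh(A + eps * dA)
Q1 *= np.sign(np.sum(Q1 * Q0, axis=0))   # align each column's arbitrary sign

dQ = (Q1 - Q0) / eps
S = Q0.T @ dQ
print(np.allclose(S, -S.T, atol=1e-4))   # skew-symmetric up to FD error
```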

## Reverse Mode for Symmetric Eigendecomposition

Suppose

$$A = Q \Lambda Q^T,$$

and a loss depends on both $Q$ and $\Lambda$. Let

$$\bar{Q} = \frac{\partial L}{\partial Q}, \qquad \bar{\Lambda} = \frac{\partial L}{\partial \Lambda}.$$

Define the matrix $F$ by

$$F_{ij} = \begin{cases} \dfrac{1}{\lambda_j - \lambda_i}, & i \ne j, \\[1ex] 0, & i = j, \end{cases}$$

with the sign fixed by the eigenvector differential derived above. Then the symmetric reverse rule is

$$\bar{A} = Q \left( \bar{\Lambda} + F \odot (Q^T \bar{Q}) \right) Q^T.$$

Because the input is constrained to be symmetric, the result is usually symmetrized:

$$\bar{A} \leftarrow \tfrac{1}{2}\left(\bar{A} + \bar{A}^T\right).$$

The term involving FF is precisely where instability appears when eigenvalues collide.
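
A minimal NumPy sketch of this backward rule, verified by a finite difference on a loss whose eigenvector term is sign-invariant (all names are illustrative, and distinct eigenvalues are assumed):

```python
import numpy as np

# Symmetric eigendecomposition VJP: A_bar = Q (diag(lam_bar) + F * (Q^T Q_bar)) Q^T.
def eigh_vjp(lam, Q, lam_bar, Q_bar):
    diff = lam[None, :] - lam[:, None]       # diff[i, j] = lam_j - lam_i
    np.fill_diagonal(diff, 1.0)              # placeholder to avoid 0-division
    F = 1.0 / diff
    np.fill_diagonal(F, 0.0)
    A_bar = Q @ (np.diag(lam_bar) + F * (Q.T @ Q_bar)) @ Q.T
    return 0.5 * (A_bar + A_bar.T)           # symmetrize

rng = np.random.default_rng(7)
n = 4
A = rng.standard_normal((n, n)); A = 0.5 * (A + A.T)
dA = rng.standard_normal((n, n)); dA = 0.5 * (dA + dA.T)
c = rng.standard_normal(n)

def loss(A):                                 # depends on lam_max and q_max
    lam, Q = np.linalg.eigh(A)
    return lam[-1] + (Q[:, -1] @ c) ** 2     # (q^T c)^2 is sign-invariant

lam, Q = np.linalg.eigh(A)
lam_bar = np.zeros(n); lam_bar[-1] = 1.0
Q_bar = np.zeros((n, n)); Q_bar[:, -1] = 2.0 * (Q[:, -1] @ c) * c

A_bar = eigh_vjp(lam, Q, lam_bar, Q_bar)
eps = 1e-6
fd = (loss(A + eps * dA) - loss(A)) / eps
print(np.isclose(np.sum(A_bar * dA), fd, atol=1e-4))   # True
```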

## Eigenvalue Crossing

Suppose two eigenvalues approach each other, $\lambda_i \to \lambda_j$. Then

$$\frac{1}{\lambda_i - \lambda_j} \to \infty,$$

and the gradient can explode.

This is not a bug in AD. The mathematical derivative itself becomes singular.

Near eigenvalue crossings:

| Quantity | Stability |
| --- | --- |
| Eigenvalues | Stable |
| Eigenspaces | Usually stable |
| Individual eigenvectors | Unstable |

This distinction is critical in spectral machine learning methods.

## Numerical Stability

Practical eigendecomposition differentiation often requires:

| Technique | Purpose |
| --- | --- |
| Eigenvalue regularization | Avoid collisions |
| Symmetrization | Remove asymmetric noise |
| Projector objectives | Avoid unstable basis gradients |
| Truncated spectra | Ignore small unstable modes |
| Stable spectral gaps | Improve conditioning |

Spectral methods are fundamentally conditioning-sensitive.

## Power Iteration and Implicit Differentiation

Some systems compute dominant eigenvectors iteratively:

$$x_{k+1} = \frac{A x_k}{\|A x_k\|}.$$

AD can:

  1. Differentiate through all iterations.
  2. Use implicit differentiation on the fixed-point equation.

Implicit methods are often cheaper and more stable for long iterative processes.

The fixed-point condition is

$$x = \frac{A x}{\|A x\|}.$$

Differentiating the fixed-point equation yields a linear system for the derivative.

This approach generalizes to many iterative eigensolvers.
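
A hedged NumPy sketch of this idea for the dominant eigenpair: differentiate the optimality conditions $Ax = \lambda x$, $x^T x = 1$, which give $(A - \lambda I)\,dx = (d\lambda\,I - dA)\,x$ with $d\lambda = x^T dA\,x$ and $dx$ orthogonal to $x$ (names and structure are illustrative):

```python
import numpy as np

# Power iteration for the dominant eigenpair, then implicit differentiation
# of the optimality conditions A x = lam x, x^T x = 1.
def dominant_pair(A, iters=1000):
    x = np.ones(A.shape[0]) / np.sqrt(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ A @ x, x

def implicit_jvp(A, dA, lam, x):
    n = len(x)
    dlam = x @ dA @ x                          # eigenvalue perturbation
    rhs = (dlam * np.eye(n) - dA) @ x          # orthogonal to x by construction
    # A - lam*I is singular along x; bordering with x x^T makes it invertible,
    # and since rhs is perpendicular to x, the solution is the dx we want.
    M = A - lam * np.eye(n) + np.outer(x, x)
    return dlam, np.linalg.solve(M, rhs)

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 5))
A = 0.5 * (A + A.T) + 10.0 * np.eye(5)         # shift: dominant eigenvalue simple, positive
dA = rng.standard_normal((5, 5)); dA = 0.5 * (dA + dA.T)

lam, x = dominant_pair(A)
dlam, dx = implicit_jvp(A, dA, lam, x)

eps = 1e-6
lam2, x2 = dominant_pair(A + eps * dA)
x2 *= np.sign(x2 @ x)                          # align the arbitrary sign
print(np.isclose((lam2 - lam) / eps, dlam, atol=1e-4),
      np.allclose((x2 - x) / eps, dx, atol=1e-3))
```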

## Graph Laplacians

Graph learning frequently uses eigendecompositions of Laplacians:

$$L = D - A,$$

where $A$ is the graph's adjacency matrix and $D$ is the diagonal degree matrix.

The eigenvectors of the smallest eigenvalues encode graph geometry.

Spectral clustering, diffusion maps, graph embeddings, and manifold learning all rely on these spectra.

Repeated or nearly repeated eigenvalues are common in graphs with weakly connected components. This can make eigenvector gradients unstable.

Using subspace projectors instead of raw eigenvectors often improves robustness.
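
A small sketch of that recommendation on an assumed toy graph: build $L = D - A$ and formulate objectives through the projector onto the bottom-$k$ eigenspace rather than through the raw eigenvectors:

```python
import numpy as np

# Toy graph (illustrative): adjacency A, Laplacian L = D - A.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

k = 2
_, Q = np.linalg.eigh(L)                 # eigenvalues in ascending order
P = Q[:, :k] @ Q[:, :k].T                # bottom-k subspace projector
# P is invariant to the basis chosen within the subspace, so losses built
# from P avoid the 1/(lambda_i - lambda_j) instability inside the subspace.
```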

## Spectral Radius

The spectral radius is

$$\rho(A) = \max_i |\lambda_i|.$$

It controls:

| Domain | Meaning |
| --- | --- |
| Dynamical systems | Stability |
| Recurrent networks | Gradient growth |
| Numerical methods | Convergence |
| Control theory | System response |

Differentiating spectral radius is difficult when multiple eigenvalues attain the maximum magnitude.

## Implementation Considerations

A production AD implementation for eigendecomposition should track:

- matrix symmetry
- eigenvalue ordering
- degeneracy handling
- batch dimensions
- dtype
- complex vs. real arithmetic
- orthogonality conventions

The backward rule should explicitly define behavior when eigenvalues are repeated.

Many frameworks adopt one of the following strategies:

| Strategy | Behavior |
| --- | --- |
| Return NaNs | Signal the undefined derivative |
| Apply approximate stabilization | Numerically smoother |
| Document the undefined region | User responsibility |

No implementation can make fundamentally undefined derivatives well-defined.

## Practical Guidance

Prefer:

| Stable objective | Avoid |
| --- | --- |
| Eigenvalues | Raw eigenvectors |
| Subspace projectors | Arbitrary basis vectors |
| Spectral traces | Unstable spectral coordinates |
| Orthogonally invariant losses | Basis-dependent losses |
| Symmetric problems | General nonsymmetric spectra |

When possible, formulate objectives in terms of

$$Q Q^T$$

rather than $Q$ itself.

## Summary

Eigenvalue problems reveal the geometric limits of automatic differentiation. Eigenvalues have stable first-order perturbations. Eigenvectors do not. Their derivatives contain inverse spectral gaps, making gradients unstable or undefined near repeated eigenvalues.

Differentiating eigendecompositions correctly requires respecting orthogonality constraints, spectral degeneracy, and basis ambiguity. Stable formulations should depend on invariant subspaces or spectral quantities rather than arbitrary eigenvector coordinates.