# Eigenvalue Problems

Eigenvalue problems are fundamental in numerical analysis, optimization, physics, graph methods, control theory, and machine learning. They are also among the most subtle operations in automatic differentiation.

The main difficulty is that eigenvectors are not stable objects. Small perturbations in the matrix can produce large changes in eigenvectors when eigenvalues are close or repeated. The derivative formulas reveal this instability directly.

## Eigenvalues and Eigenvectors

For a square matrix

$$
A \in \mathbb{R}^{n \times n},
$$

an eigenpair satisfies

$$
Av = \lambda v,
$$

where

$$
v \ne 0.
$$

The scalar $\lambda$ is an eigenvalue and $v$ is an eigenvector.

For symmetric matrices,

$$
A = A^T,
$$

all eigenvalues are real, and there exists an orthonormal basis of eigenvectors:

$$
A = Q\Lambda Q^T,
$$

where

$$
Q^TQ = I,
$$

and

$$
\Lambda = \operatorname{diag}(\lambda_1,\ldots,\lambda_n).
$$

Symmetric eigenproblems are much more stable and better behaved for AD than general nonsymmetric eigenproblems.

## Why Eigenproblems Matter

Eigenvalue computations appear in:

| Application | Role |
|---|---|
| PCA | Principal directions |
| Spectral clustering | Graph Laplacian eigenvectors |
| Stability analysis | System modes |
| Differential equations | Modal decomposition |
| Quantum mechanics | Hamiltonian spectra |
| Covariance analysis | Principal components |
| Optimization | Hessian curvature |
| Graph neural networks | Spectral filters |

Many machine learning systems differentiate through eigendecompositions implicitly or explicitly.

## Differential of Eigenvalues

Assume $A$ is symmetric with normalized eigenvectors:

$$
q_i^T q_i = 1.
$$

The eigenvalue equation is

$$
Aq_i = \lambda_i q_i.
$$

Differentiate:

$$
dA\,q_i + A\,dq_i =
d\lambda_i\,q_i + \lambda_i\,dq_i.
$$

Left-multiply by $q_i^T$:

$$
q_i^T dA\,q_i
+
q_i^T A\,dq_i =
d\lambda_i\,q_i^T q_i
+
\lambda_i q_i^T dq_i.
$$

Since $A$ is symmetric, $q_i^T A = (Aq_i)^T = \lambda_i q_i^T$, so

$$
q_i^T A\,dq_i =
\lambda_i q_i^T dq_i.
$$

These terms cancel, and since $q_i^T q_i = 1$:

$$
d\lambda_i =
q_i^T dA\,q_i.
$$

This is the first-order perturbation formula for eigenvalues.

The gradient of the eigenvalue with respect to the matrix is

$$
\nabla_A \lambda_i =
q_i q_i^T.
$$

This formula is stable even when eigenvectors are not.
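
A quick numerical check makes this concrete. The following sketch (plain NumPy, not tied to any AD framework) compares the first-order prediction $q_i^T\,dA\,q_i$ against a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                       # random symmetric matrix
dA = rng.standard_normal((n, n))
dA = (dA + dA.T) / 2                    # symmetric perturbation direction

lam, Q = np.linalg.eigh(A)              # ascending eigenvalues, orthonormal Q
i = 2                                   # any index with a simple eigenvalue

predicted = Q[:, i] @ dA @ Q[:, i]      # d(lambda_i) = q_i^T dA q_i

eps = 1e-6
fd = (np.linalg.eigvalsh(A + eps * dA)[i]
      - np.linalg.eigvalsh(A - eps * dA)[i]) / (2 * eps)
print(predicted, fd)                    # agree closely for a simple eigenvalue
```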

## Reverse Rule for Eigenvalues

Suppose a scalar loss depends only on eigenvalues:

$$
L = f(\lambda_1,\ldots,\lambda_n).
$$

Let

$$
\bar{\lambda}_i =
\frac{\partial L}{\partial \lambda_i}.
$$

Then

$$
dL =
\sum_i \bar{\lambda}_i d\lambda_i.
$$

Substitute the perturbation formula:

$$
dL =
\sum_i
\bar{\lambda}_i
q_i^T dA q_i.
$$

Rewrite using trace:

$$
dL =
\operatorname{tr}
\left(
Q\,\operatorname{diag}(\bar{\lambda})\,Q^T dA
\right).
$$

Therefore,

$$
\bar{A} =
Q\,\operatorname{diag}(\bar{\lambda})\,Q^T.
$$

This rule is clean and stable for symmetric matrices.
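
A minimal sketch of the rule, assuming the loss $L = \sum_i \lambda_i^2$ so that $\bar\lambda_i = 2\lambda_i$. For this particular loss $L = \operatorname{tr}(A^2)$, so the exact gradient $2A$ is available as a check:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

lam, Q = np.linalg.eigh(A)
lam_bar = 2 * lam                        # dL/dlambda_i for L = sum(lambda_i^2)

A_bar = Q @ np.diag(lam_bar) @ Q.T       # reverse rule: Q diag(lambda_bar) Q^T

# Since L = sum(lambda_i^2) = tr(A^2), the exact gradient is 2A.
print(np.allclose(A_bar, 2 * A))         # True
```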

## Differential of Eigenvectors

Eigenvectors are more difficult.

Assume distinct eigenvalues. Differentiate:

$$
Aq_i = \lambda_i q_i.
$$

Rearrange:

$$
(A - \lambda_i I)dq_i =
(d\lambda_i I - dA)q_i.
$$

Expand $dq_i$ in the eigenbasis. Normalization forces $q_i^T dq_i = 0$, so there is no component along $q_i$:

$$
dq_i =
\sum_{j \ne i} c_{ji} q_j.
$$

Project onto $q_j$ for $j \ne i$:

$$
q_j^T(A-\lambda_i I)dq_i =
q_j^T(d\lambda_i I - dA)q_i.
$$

Since $A$ is symmetric, $q_j^T A = \lambda_j q_j^T$; with orthogonality $q_j^T q_i = 0$, this gives

$$
(\lambda_j - \lambda_i)c_{ji} =
-q_j^T dA q_i.
$$

Thus

$$
c_{ji} =
\frac{q_j^T dA q_i}{\lambda_i - \lambda_j}.
$$

Therefore

$$
dq_i =
\sum_{j \ne i}
q_j
\frac{q_j^T dA q_i}{\lambda_i - \lambda_j}.
$$

This formula is central to everything that follows.

It shows that eigenvector derivatives contain inverse spectral gaps:

$$
\frac{1}{\lambda_i - \lambda_j}.
$$

If eigenvalues are close, derivatives become large. If eigenvalues are equal, the derivative becomes undefined.
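
The formula can be verified numerically, assuming distinct eigenvalues and after fixing the sign ambiguity of the perturbed eigenvector. A sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
dA = rng.standard_normal((n, n)); dA = (dA + dA.T) / 2

lam, Q = np.linalg.eigh(A)
i = 1

# Perturbation-series prediction for dq_i.
dq = np.zeros(n)
for j in range(n):
    if j != i:
        dq += Q[:, j] * (Q[:, j] @ dA @ Q[:, i]) / (lam[i] - lam[j])

# Finite difference, with the sign of the perturbed eigenvector aligned.
eps = 1e-6
Qp = np.linalg.eigh(A + eps * dA)[1]
qp = Qp[:, i] * np.sign(Qp[:, i] @ Q[:, i])
print(np.max(np.abs((qp - Q[:, i]) / eps - dq)))   # small for distinct eigenvalues
```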

## Geometric Meaning

Eigenvectors are unstable because only the eigenspace is intrinsic.

Suppose

$$
\lambda_i = \lambda_j.
$$

Then any orthonormal basis of the corresponding eigenspace is valid. The individual vectors are arbitrary.

Thus the mapping

$$
A \mapsto q_i
$$

is not continuous at repeated eigenvalues.

But the projector onto the eigenspace,

$$
P = QQ^T,
$$

where the columns of $Q$ span that eigenspace, is stable and differentiable whenever the eigenspace is separated from the rest of the spectrum by a gap.

This distinction matters in AD. Stable objectives should depend on invariant subspaces rather than arbitrary basis vectors.
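
A small experiment illustrates the distinction. With a near-degenerate pair, a tiny symmetric perturbation rotates individual eigenvectors arbitrarily, while the projector onto the two-dimensional eigenspace barely moves:

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.diag([1.0, 1.0 + 1e-9, 3.0])     # two nearly equal eigenvalues

E = rng.standard_normal((3, 3)); E = (E + E.T) / 2
B = A + 1e-6 * E                        # tiny symmetric perturbation

QA = np.linalg.eigh(A)[1]
QB = np.linalg.eigh(B)[1]

# Individual eigenvectors rotate freely inside the near-degenerate pair.
print(abs(QA[:, 0] @ QB[:, 0]))         # typically far from 1

# The projector onto the two-dimensional eigenspace is stable.
PA = QA[:, :2] @ QA[:, :2].T
PB = QB[:, :2] @ QB[:, :2].T
print(np.linalg.norm(PA - PB))          # ~ perturbation size / outer gap
```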

## Symmetric vs Nonsymmetric Problems

For symmetric matrices:

| Property | Result |
|---|---|
| Eigenvalues | Real |
| Eigenvectors | Orthogonal |
| Decomposition | Stable |
| Gradient formulas | Relatively clean |

For general matrices:

| Property | Result |
|---|---|
| Eigenvalues | May be complex |
| Eigenvectors | May be nonorthogonal |
| Jordan blocks | Possible |
| Perturbation theory | Much harder |

Most ML and scientific computing systems restrict differentiable eigendecompositions to symmetric or Hermitian matrices.

## Spectral Functions

A spectral function depends only on eigenvalues:

$$
f(A) =
Q f(\Lambda) Q^T.
$$

Examples (each defined whenever $f$ is defined on the spectrum; the logarithm and square root require $\lambda_i > 0$, the inverse $\lambda_i \ne 0$):

| Function | Spectral Form |
|---|---|
| Matrix exponential | $e^A = Q e^\Lambda Q^T$ |
| Matrix logarithm | $\log A = Q \log \Lambda Q^T$ |
| Matrix square root | $A^{1/2} = Q \Lambda^{1/2} Q^T$ |
| Matrix inverse | $A^{-1} = Q \Lambda^{-1} Q^T$ |

Differentiating spectral functions involves differentiating eigenvalues and eigenvectors simultaneously.

The resulting formulas often contain divided differences:

$$
\frac{f(\lambda_i)-f(\lambda_j)}
{\lambda_i-\lambda_j}.
$$

These expressions remain finite when eigenvalues coincide if handled carefully.
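
A sketch of this divided-difference construction (the Daleckii–Krein formula), here with $f = \exp$, substituting the limit $f'(\lambda)$ wherever eigenvalues numerically coincide:

```python
import numpy as np

def f(x):  return np.exp(x)
def fp(x): return np.exp(x)

def apply_f(A):
    lam, Q = np.linalg.eigh(A)
    return Q @ np.diag(f(lam)) @ Q.T

def df(A, dA):
    """Directional derivative of f(A) along symmetric dA."""
    lam, Q = np.linalg.eigh(A)
    diff = lam[:, None] - lam[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        L = (f(lam)[:, None] - f(lam)[None, :]) / diff
    close = np.abs(diff) < 1e-12
    # On (numerically) coincident eigenvalues use the limit f'(lambda).
    L[close] = fp(0.5 * (lam[:, None] + lam[None, :]))[close]
    return Q @ (L * (Q.T @ dA @ Q)) @ Q.T

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
dA = rng.standard_normal((4, 4)); dA = (dA + dA.T) / 2

eps = 1e-6
fd = (apply_f(A + eps * dA) - apply_f(A - eps * dA)) / (2 * eps)
print(np.max(np.abs(fd - df(A, dA))))    # small
```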

## Example: Largest Eigenvalue

Consider

$$
\lambda_{\max}(A).
$$

If the largest eigenvalue is simple, then

$$
d\lambda_{\max} =
q^T dA q.
$$

Thus

$$
\nabla_A \lambda_{\max} =
qq^T.
$$

This appears in:

| Application | Use |
|---|---|
| Spectral norm regularization | Largest singular value or eigenvalue |
| Stability constraints | Spectral radius |
| Graph methods | Leading eigenvector |
| Optimization | Convex spectral penalties |

If the largest eigenvalue is repeated, $\lambda_{\max}$ is no longer differentiable and the gradient becomes set-valued (a subdifferential).

## Rayleigh Quotient

The Rayleigh quotient is

$$
R(x) =
\frac{x^T A x}{x^T x}.
$$

For symmetric $A$, the stationary points of $R$ are exactly the eigenvectors, and the stationary values are the corresponding eigenvalues.

Differentiate the quotient, using symmetry of $A$ so that $d(x^T A x) = x^T dA\,x + 2x^T A\,dx$:

$$
dR =
\frac{x^T dA\,x + 2x^T A\,dx}{x^T x}
-
\frac{2\,(x^T A x)\,x^T dx}{(x^T x)^2}.
$$

When $x$ is normalized:

$$
x^T x = 1,
$$

this simplifies to

$$
dR =
x^T dA x
+
2(Ax - R(x)x)^T dx.
$$

At an eigenvector,

$$
Ax = \lambda x,
$$

the second term vanishes.

Then

$$
dR = x^T dA x.
$$

The Rayleigh quotient connects optimization and eigensystems directly.
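
A short sketch verifying the gradient of $R$ in $x$, namely $\nabla_x R = 2(Ax - R(x)x)/(x^T x)$ (which follows from the differential above), and the stationarity at eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

def R(x):      return (x @ A @ x) / (x @ x)
def grad_R(x): return 2 * (A @ x - R(x) * x) / (x @ x)

x = rng.standard_normal(n)
eps = 1e-6
fd = np.array([(R(x + eps * e) - R(x - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.max(np.abs(fd - grad_R(x))))        # small

Q = np.linalg.eigh(A)[1]
print(np.linalg.norm(grad_R(Q[:, 0])))       # ~0: stationary at an eigenvector
```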

## Orthogonality Constraints

Eigenvector matrices satisfy

$$
Q^TQ = I.
$$

Differentiate:

$$
dQ^TQ + Q^TdQ = 0.
$$

Therefore

$$
Q^TdQ
$$

is skew-symmetric.

This constraint appears in all orthogonal matrix derivatives: QR, eigendecomposition, SVD, Stiefel manifolds, and orthogonal parameterizations.

A backward rule must preserve this geometry.
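
The constraint is easy to confirm numerically: the first-order change predicted by the perturbation series satisfies it automatically. A sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
dA = rng.standard_normal((n, n)); dA = (dA + dA.T) / 2

lam, Q = np.linalg.eigh(A)
M = Q.T @ dA @ Q

C = np.zeros((n, n))                         # C = Q^T dQ from the series
for i in range(n):
    for j in range(n):
        if i != j:
            C[j, i] = M[j, i] / (lam[i] - lam[j])

print(np.max(np.abs(C + C.T)))               # ~0: skew-symmetric, as required
```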

## Reverse Mode for Symmetric Eigendecomposition

Suppose

$$
A = Q\Lambda Q^T,
$$

and a loss depends on both $Q$ and $\Lambda$.

Let:

$$
\bar{Q} =
\frac{\partial L}{\partial Q},
\qquad
\bar{\Lambda} =
\frac{\partial L}{\partial \Lambda}.
$$

Define the matrix $F$ of inverse spectral gaps (the sign convention follows from the eigenvector differential above):

$$
F_{ij} =
\begin{cases}
\frac{1}{\lambda_j - \lambda_i}, & i \ne j, \\
0, & i=j.
\end{cases}
$$
$$

Then the symmetric reverse rule is:

$$
\bar{A} =
Q
\left(
\bar{\Lambda}
+
F \odot (Q^T \bar{Q})
\right)
Q^T.
$$

Symmetrization is usually required, since $A$ is constrained to be symmetric:

$$
\bar{A}
\leftarrow
\frac{1}{2}(\bar{A}+\bar{A}^T).
$$

The term involving $F$ is precisely where instability appears when eigenvalues collide.
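
A sketch of the full backward rule, checked against finite differences. Eigenvector signs are pinned by making the first row of $Q$ positive, a convention chosen here (not from the text) so the probe loss is locally smooth; it assumes that row has no zeros, which holds generically:

```python
import numpy as np

def eigh_fwd(A):
    lam, Q = np.linalg.eigh(A)
    return lam, Q * np.sign(Q[0])            # pin column signs (first row > 0)

def eigh_bwd(lam, Q, lam_bar, Q_bar):
    diff = lam[None, :] - lam[:, None]       # diff[i, j] = lambda_j - lambda_i
    np.fill_diagonal(diff, 1.0)              # placeholder; diagonal of F is 0
    F = 1.0 / diff
    np.fill_diagonal(F, 0.0)
    A_bar = Q @ (np.diag(lam_bar) + F * (Q.T @ Q_bar)) @ Q.T
    return (A_bar + A_bar.T) / 2             # report a symmetric gradient

rng = np.random.default_rng(7)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
W = rng.standard_normal((n, n))              # Q_bar for a linear probe loss
w = rng.standard_normal(n)                   # lambda_bar

def loss(M):
    lam, Q = eigh_fwd(M)
    return w @ lam + np.sum(W * Q)           # depends on eigenvalues and vectors

A_bar = eigh_bwd(*eigh_fwd(A), w, W)

# Finite-difference check along symmetric coordinate directions.
eps, err = 1e-6, 0.0
for i in range(n):
    for j in range(i, n):
        E = np.zeros((n, n)); E[i, j] = E[j, i] = 1.0
        fd = (loss(A + eps * E) - loss(A - eps * E)) / (2 * eps)
        err = max(err, abs(fd - np.sum(A_bar * E)))
print(err)                                   # small for a well-separated spectrum
```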

## Eigenvalue Crossing

Suppose two eigenvalues approach each other:

$$
\lambda_i \to \lambda_j.
$$

Then

$$
\frac{1}{\lambda_i - \lambda_j}
\to \infty.
$$

The gradient can explode.

This is not a bug in AD. The mathematical derivative itself becomes singular.

Near eigenvalue crossings:

| Quantity | Stability |
|---|---|
| Eigenvalues | Stable |
| Eigenspaces | Usually stable |
| Individual eigenvectors | Unstable |

This distinction is critical in spectral machine learning methods.
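
The blow-up is easy to demonstrate: the sensitivity of an eigenvector to a fixed off-diagonal perturbation grows like the inverse gap. A sketch:

```python
import numpy as np

dA = np.array([[0.0, 1.0],
               [1.0, 0.0]])                 # fixed symmetric perturbation

for gap in [1e-1, 1e-3, 1e-5]:
    A = np.diag([1.0, 1.0 + gap])
    lam, Q = np.linalg.eigh(A)
    # Size of dq_1 from the perturbation formula: |q_2^T dA q_1| / gap.
    sens = abs(Q[:, 1] @ dA @ Q[:, 0]) / (lam[1] - lam[0])
    print(gap, sens)                        # grows like 1 / gap
```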

## Numerical Stability

Practical eigendecomposition differentiation often requires:

| Technique | Purpose |
|---|---|
| Eigenvalue regularization | Avoid collisions |
| Symmetrization | Remove asymmetric noise |
| Projector objectives | Avoid unstable basis gradients |
| Truncated spectra | Ignore small unstable modes |
| Stable spectral gaps | Improve conditioning |

Spectral methods are fundamentally conditioning-sensitive.
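
One illustrative stabilization, presented as an assumption rather than a universal recipe: replace the inverse gap by a smoothed reciprocal so the backward pass stays finite near crossings, at the cost of a biased gradient.

```python
import numpy as np

def safe_F(lam, eps=1e-8):
    """Smoothed version of F_ij = 1/(lambda_j - lambda_i)."""
    diff = lam[None, :] - lam[:, None]       # lambda_j - lambda_i
    F = diff / (diff**2 + eps)               # bounded by 1/(2*sqrt(eps))
    np.fill_diagonal(F, 0.0)
    return F

lam = np.array([1.0, 1.0 + 1e-9, 3.0])
print(safe_F(lam))                            # finite despite the tiny gap
```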

## Power Iteration and Implicit Differentiation

Some systems compute dominant eigenvectors iteratively:

$$
x_{k+1} =
\frac{Ax_k}{\|Ax_k\|}.
$$

AD can:

1. Differentiate through all iterations.
2. Use implicit differentiation on the fixed-point equation.

Implicit methods are often cheaper and more stable for long iterative processes.

The fixed-point condition is:

$$
x = \frac{Ax}{\|Ax\|}.
$$

Differentiating the fixed-point equation yields a linear system for the derivative.

This approach generalizes to many iterative eigensolvers.
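
A sketch of the implicit approach for the dominant eigenvector, assuming a positive semidefinite $A$ (a setup chosen here so the iteration converges). The implicit derivative is compared against differentiating through the iteration itself:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5
B = rng.standard_normal((n, n))
A = B.T @ B                                  # PSD, so power iteration converges
dA = rng.standard_normal((n, n)); dA = (dA + dA.T) / 2

def power(M, x0, iters=2000):
    x = x0.copy()
    for _ in range(iters):
        x = M @ x
        x /= np.linalg.norm(x)
    return x

x = power(A, rng.standard_normal(n))
lam = x @ A @ x

# Differentiate A x = lambda x at the fixed point:
# d(lambda) = x^T dA x and (A - lambda I) dx = (d(lambda) I - dA) x.
dlam = x @ dA @ x
rhs = (dlam * np.eye(n) - dA) @ x
dx = np.linalg.lstsq(A - lam * np.eye(n), rhs, rcond=None)[0]

# Compare with differentiating through the iteration itself.
eps = 1e-6
xp = power(A + eps * dA, x)
xp *= np.sign(xp @ x)                        # fix the sign ambiguity
print(np.max(np.abs((xp - x) / eps - dx)))   # small given a clear spectral gap
```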

## Graph Laplacians

Graph learning frequently uses eigendecompositions of Laplacians:

$$
L = D - A,
$$

where $A$ is the adjacency matrix and $D$ is the diagonal degree matrix.

The smallest eigenvectors encode graph geometry.

Spectral clustering, diffusion maps, graph embeddings, and manifold learning all rely on these spectra.

Repeated or nearly repeated eigenvalues are common in graphs with symmetries or nearly disconnected components. This can make eigenvector gradients unstable.

Using subspace projectors instead of raw eigenvectors often improves robustness.
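
A small, hypothetical example: two triangles joined by one weak edge produce one exact zero eigenvalue and a second near-zero eigenvalue, exactly the regime where raw eigenvector gradients misbehave and the subspace projector is preferable.

```python
import numpy as np

# Adjacency of two triangles joined by one weak edge (hypothetical graph).
Adj = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    Adj[a, b] = Adj[b, a] = 1.0
Adj[2, 3] = Adj[3, 2] = 0.01                 # weak bridge -> tiny spectral gap

D = np.diag(Adj.sum(axis=1))
L = D - Adj                                  # combinatorial graph Laplacian

lam, Q = np.linalg.eigh(L)
print(lam[:3])                               # exact zero, near-zero, then a gap

k = 2
P = Q[:, :k] @ Q[:, :k].T                    # projector onto the bottom-k subspace
```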

## Spectral Radius

The spectral radius is

$$
\rho(A) =
\max_i |\lambda_i|.
$$

It controls:

| Domain | Meaning |
|---|---|
| Dynamical systems | Stability |
| Recurrent networks | Gradient growth |
| Numerical methods | Convergence |
| Control theory | System response |

Differentiating spectral radius is difficult when multiple eigenvalues attain the maximum magnitude.

## Implementation Considerations

A production AD implementation for eigendecomposition should track:

```text
matrix symmetry
eigenvalue ordering
degeneracy handling
batch dimensions
dtype
complex vs real arithmetic
orthogonality conventions
```

The backward rule should explicitly define behavior when eigenvalues are repeated.

Many frameworks either:

| Strategy | Behavior |
|---|---|
| Return NaNs | Signal undefined derivative |
| Use approximate stabilization | Numerically smoother |
| Document undefined region | User responsibility |

No implementation can make fundamentally undefined derivatives well-defined.

## Practical Guidance

Prefer:

| Stable Objective | Avoid |
|---|---|
| Eigenvalues | Raw eigenvectors |
| Subspace projectors | Arbitrary basis vectors |
| Spectral traces | Unstable spectral coordinates |
| Orthogonally invariant losses | Basis-dependent losses |
| Symmetric problems | General nonsymmetric spectra |

When possible, formulate objectives using:

$$
QQ^T
$$

rather than $Q$ itself.

## Summary

Eigenvalue problems reveal the geometric limits of automatic differentiation. Eigenvalues have stable first-order perturbations. Eigenvectors do not. Their derivatives contain inverse spectral gaps, making gradients unstable or undefined near repeated eigenvalues.

Differentiating eigendecompositions correctly requires respecting orthogonality constraints, spectral degeneracy, and basis ambiguity. Stable formulations should depend on invariant subspaces or spectral quantities rather than arbitrary eigenvector coordinates.

