Verified Differentiation

Automatic differentiation systems are usually trusted because they implement mathematically established rules such as the chain rule, product rule, and linearization of elementary operations.

However, practical AD systems are large compiler and runtime systems containing:

| Component | Complexity |
| --- | --- |
| graph transformations | compiler correctness |
| tensor kernels | numerical precision |
| mutation handling | state consistency |
| checkpointing | recomputation logic |
| custom gradients | user-defined semantics |
| distributed execution | synchronization correctness |

A small mistake may silently produce incorrect gradients.

Verified differentiation studies how to prove that derivative computations are correct with respect to precise mathematical and program semantics.

This area connects automatic differentiation with:

| Field | Role |
| --- | --- |
| formal verification | proof of correctness |
| theorem proving | machine-checked reasoning |
| programming language semantics | rigorous program meaning |
| numerical analysis | floating-point guarantees |
| compiler verification | transformation correctness |

The goal is stronger guarantees than empirical testing alone.

What Does Correctness Mean?

A derivative system may appear correct while violating important properties.

There are several possible notions of correctness.

Mathematical correctness

The produced derivative matches the mathematical derivative of the represented function.

Program correctness

The transformed program correctly computes the derivative of the original program.

Numerical correctness

Floating-point execution approximates the exact derivative within known bounds.

Semantic correctness

Compiler optimizations preserve derivative meaning.

These notions are related but distinct.

The Central Property

Suppose a program computes

f : X \to Y.

An AD transform produces another program

Df : X \times X \to Y \times Y

for forward mode.

Correctness informally means:

Df(x, \dot{x}) = (f(x), J_f(x)\dot{x}).

For reverse mode, correctness means the generated adjoint computes the transpose Jacobian action:

\bar{x} = J_f(x)^T \bar{y}.

A verified AD system attempts to prove these properties formally.

Why Testing Is Not Enough

AD systems are difficult to validate purely through testing.

Finite-difference checks are common:

\frac{f(x+h) - f(x-h)}{2h}.

But finite differences have limitations:

| Problem | Consequence |
| --- | --- |
| truncation error | inaccurate reference |
| floating-point cancellation | instability |
| non-smooth functions | undefined comparison |
| high dimensions | expensive checking |

A system may pass numerical tests while still violating formal derivative semantics.
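Such a check can be sketched in a few lines (the helper name `central_difference` is ours):

```python
import math

def central_difference(f, x, h=1e-5):
    """Second-order finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Agreement here raises confidence but proves nothing: truncation error
# shrinks as h**2 while rounding error grows as 1/h, so the reference
# itself is only approximate.
assert abs(central_difference(math.sin, 1.0) - math.cos(1.0)) < 1e-9
```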

Formal verification aims to eliminate such ambiguity.

Formal Semantics of Programs

Verification requires precise semantics for the source language.

A language semantics defines:

| Construct | Meaning |
| --- | --- |
| arithmetic | numerical interpretation |
| control flow | execution behavior |
| recursion | fixed-point meaning |
| mutation | state transition |
| exceptions | error propagation |

Without rigorous semantics, it is impossible to prove that differentiation preserves meaning.

Differentiable programming therefore depends heavily on programming language theory.

Forward Mode Verification

Forward mode is conceptually simpler.

A forward transform augments each value with a tangent:

x \mapsto (x, \dot{x}).

Elementary operations are lifted.

Example:

(x, \dot{x}) + (y, \dot{y}) = (x+y, \dot{x}+\dot{y}).

Multiplication becomes:

(x, \dot{x})(y, \dot{y}) = (xy, x\dot{y} + y\dot{x}).

Verification proves that the transformed operations preserve the derivative interpretation.

The proof usually proceeds structurally over program syntax.

Dual Number Semantics

Forward mode is often verified through dual numbers.

Define:

a + b\varepsilon, \qquad \varepsilon^2 = 0.

Evaluating a function on dual numbers gives:

f(x + \dot{x}\varepsilon) = f(x) + f'(x)\dot{x}\varepsilon.

Thus the tangent component corresponds exactly to the derivative.

Formal verification may prove:

| Property | Meaning |
| --- | --- |
| algebra preservation | operations respect dual structure |
| compositionality | chain rule correctness |
| soundness | extracted tangent equals derivative |

Dual-number semantics provide a clean mathematical model for forward AD.
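A minimal Python model of this semantics, covering only addition and multiplication, might look like the following sketch (the class and helper names are ours):

```python
class Dual:
    """a + b*eps with eps**2 == 0; the b component carries the derivative."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.a + o.a, self.b + o.b)
    __radd__ = __add__

    def __mul__(self, o):
        # product rule: (a1 + b1 eps)(a2 + b2 eps) = a1 a2 + (a1 b2 + b1 a2) eps
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.a * o.a, self.a * o.b + self.b * o.a)
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f on x + 1*eps and extract the tangent component."""
    return f(Dual(x, 1.0)).b

# f(x) = x^3 + 2x has f'(x) = 3x^2 + 2, so f'(2) = 14
assert derivative(lambda x: x * x * x + 2.0 * x, 2.0) == 14.0
```

The soundness claim to verify is exactly that `derivative(f, x)` equals the mathematical derivative of the function `f` denotes, for every program built from the lifted operations.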

Reverse Mode Verification

Reverse mode is harder.

The backward pass depends on:

| Issue | Difficulty |
| --- | --- |
| execution trace | runtime dependency |
| mutation | overwritten values |
| control flow | dynamic structure |
| memory reuse | aliasing |
| checkpointing | recomputation correctness |

A reverse transform effectively constructs a second program that computes cotangent propagation.

Verification must prove that this program computes the transpose Jacobian action correctly.

Pullback Semantics

Reverse mode is often formalized through pullbacks.

Given:

f : X \to Y,

reverse mode computes:

f^* : Y^* \to X^*.

This maps output cotangents back to input cotangents.

The correctness condition is:

f^*(\bar{y}) = J_f(x)^T \bar{y}.

Verification then becomes proof that generated pullback programs implement the correct linear maps.
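One way to make the pullback concrete is a minimal tape-based reverse mode (a sketch; the single global tape would need resetting between uses):

```python
import math

tape = []  # global trace of the forward execution

class Var:
    def __init__(self, value, parents=()):
        # parents: list of (input Var, local partial derivative)
        self.value, self.parents, self.grad = value, parents, 0.0
        tape.append(self)

def mul(a, b):
    # d(ab)/da = b, d(ab)/db = a
    return Var(a.value * b.value, [(a, b.value), (b, a.value)])

def sin(a):
    return Var(math.sin(a.value), [(a, math.cos(a.value))])

def pullback(out, ybar):
    """Propagate an output cotangent backward through the recorded tape."""
    out.grad = ybar
    for v in reversed(tape):
        for parent, local in v.parents:
            parent.grad += v.grad * local

# f(x1, x2) = sin(x1 * x2); check that xbar = J_f(x)^T ybar
x1, x2 = Var(0.5), Var(2.0)
y = sin(mul(x1, x2))
pullback(y, 1.0)
assert math.isclose(x1.grad, math.cos(1.0) * 2.0)
assert math.isclose(x2.grad, math.cos(1.0) * 0.5)
```

The verification obligation is that `pullback` implements the linear map f^* for every recorded trace, not merely for the traces exercised by tests.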

SSA Verification

Many compiler AD systems operate on SSA intermediate representations.

Example:

```
x1 = ...
x2 = ...
x3 = mul(x1, x2)
```

The reverse transform generates:

```
x1_bar += x2 * x3_bar
x2_bar += x1 * x3_bar
```

Verification proves:

| Property | Meaning |
| --- | --- |
| data dependency preservation | correct adjoint flow |
| linearity | cotangent accumulation validity |
| optimization safety | transformed IR preserves semantics |

Compiler verification becomes part of differentiation verification.
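The proof obligation for the multiplication rule can be checked concretely by transliterating the generated adjoint statements into Python and comparing against the transpose Jacobian action (a sketch with arbitrary inputs):

```python
# Forward pass for f(x1, x2) = x1 * x2
x1, x2 = 3.0, 4.0
x3 = x1 * x2

# Backward pass: the generated adjoint statements
x3_bar = 1.0                  # output cotangent
x1_bar, x2_bar = 0.0, 0.0
x1_bar += x2 * x3_bar
x2_bar += x1 * x3_bar

# The Jacobian of f is the row vector [x2, x1],
# so J^T * ybar = (x2 * ybar, x1 * ybar).
assert (x1_bar, x2_bar) == (x2 * x3_bar, x1 * x3_bar)
```

A verified SSA transform proves this kind of agreement for every instruction and every well-formed dependency graph, rather than for one sampled input.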

Verified Compiler Passes

An optimizing compiler may transform code:

x * 2

into:

x + x

Ordinary semantics remain equivalent.

Derivative semantics must also remain equivalent.

A verified differentiable compiler must prove that optimizations preserve both:

| Semantics | Requirement |
| --- | --- |
| primal semantics | same outputs |
| derivative semantics | same derivatives |

This significantly increases compiler complexity.
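The requirement can be stated concretely for this rewrite: under a forward-mode (primal, tangent) semantics, the two programs must agree in both components, not just the first. A minimal check:

```python
# Forward-mode lifts of the two programs under a (primal, tangent) semantics.
def times_two(p, t):   # x * 2
    return p * 2.0, t * 2.0

def plus_self(p, t):   # x + x
    return p + p, t + t

# A verified pass proves this agreement for all inputs; here we sample one.
assert times_two(1.5, 1.0) == plus_self(1.5, 1.0)
```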

Floating-Point Verification

Real AD systems use floating-point arithmetic.

Floating-point derivatives differ from exact real derivatives because of:

| Issue | Example |
| --- | --- |
| rounding | representation error |
| cancellation | subtractive instability |
| overflow | infinite gradients |
| underflow | zero gradients |

Formal verification may include error bounds:

|\hat{g} - g| \leq \epsilon.

Here:

| Symbol | Meaning |
| --- | --- |
| g | exact derivative |
| ĝ | computed derivative |

Verified numerical analysis attempts to bound such errors rigorously.

Differentiating Non-Smooth Programs

Many programs contain non-differentiable operations:

| Operation | Problem |
| --- | --- |
| max | kink |
| abs | undefined derivative at zero |
| sort | permutation discontinuity |
| branching | path discontinuity |

AD systems usually adopt conventions:

| Convention | Example |
| --- | --- |
| subgradient | ReLU derivative at zero |
| one-sided derivative | branch convention |
| arbitrary selection | implementation-defined |

Verification must specify which generalized derivative notion is intended.

Otherwise correctness claims are ambiguous.
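The ReLU case makes the ambiguity concrete. A sketch of one common convention:

```python
def relu(x):
    return x if x > 0.0 else 0.0

def relu_grad(x):
    # At the kink x == 0 the subdifferential is the interval [0, 1];
    # this implementation selects the subgradient 0, a convention,
    # not a mathematical necessity.
    return 1.0 if x > 0.0 else 0.0

assert relu_grad(0.0) == 0.0   # valid only relative to the stated convention
assert relu_grad(2.0) == 1.0
```

A correctness theorem for `relu_grad` must say "returns an element of the subdifferential", not "returns the derivative", because the latter does not exist at zero.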

Verified Higher-Order Differentiation

Nested differentiation introduces subtle problems.

Example:

grad(grad(f))

Potential failures include:

| Failure | Cause |
| --- | --- |
| perturbation confusion | tangent aliasing |
| tape reuse | incorrect adjoints |
| mixed-mode mismatch | semantic inconsistency |

Verified higher-order systems formally separate derivative levels and tangent spaces.
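One standard remedy for perturbation confusion is to tag each derivative level. The following sketch (all names are ours) gives each `derivative` call a fresh tag so tangents from different levels cannot alias:

```python
import itertools

_fresh_tag = itertools.count()

class Dual:
    """Dual number tagged with the derivative() call that created it."""
    def __init__(self, primal, tangent, tag):
        self.primal, self.tangent, self.tag = primal, tangent, tag

    def _split(self, other):
        # Values carrying a different tag (or plain numbers) are
        # constants at this derivative level.
        if isinstance(other, Dual) and other.tag == self.tag:
            return other.primal, other.tangent
        return other, 0

    def __add__(self, other):
        p, t = self._split(other)
        return Dual(self.primal + p, self.tangent + t, self.tag)
    __radd__ = __add__

    def __mul__(self, other):
        p, t = self._split(other)
        return Dual(self.primal * p, self.primal * t + self.tangent * p, self.tag)
    __rmul__ = __mul__

def _tangent(v, tag):
    # Extract the tangent belonging to `tag`, wherever it is nested.
    if isinstance(v, Dual):
        if v.tag == tag:
            return v.tangent
        return Dual(_tangent(v.primal, tag), _tangent(v.tangent, tag), v.tag)
    return 0.0

def derivative(f, x):
    tag = next(_fresh_tag)      # fresh tag per nesting level
    return _tangent(f(Dual(x, 1.0, tag)), tag)

# Second derivative of x^3 is 6x, so 12 at x = 2.
assert derivative(lambda x: derivative(lambda y: y * y * y, x), 2.0) == 12.0
# Classic perturbation-confusion example: the correct answer is 1;
# untagged nesting is known to yield 2 here.
assert derivative(lambda x: x * derivative(lambda y: x + y, 1.0), 1.0) == 1.0
```

A verified higher-order system proves that this tagging discipline is sound for all programs, rather than relying on the two spot checks above.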

Proof Assistants

Formal verification often uses theorem provers.

Examples include:

| System | Purpose |
| --- | --- |
| Coq | constructive proofs |
| Lean | dependent type verification |
| Isabelle | higher-order logic |
| Agda | type-theoretic proofs |

These systems allow machine-checked correctness proofs for differentiation transformations.

A proof assistant can verify:

| Target | Example |
| --- | --- |
| derivative rule | chain rule implementation |
| compiler pass | SSA adjoint generation |
| algebraic law | linearity |
| numerical property | error bound |

Machine-checked proofs reduce reliance on informal reasoning.
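As a small illustration, a derivative fact can be stated and machine-checked in Lean 4 with mathlib. This is a sketch: the import path and the `deriv_pow` lemma are assumptions about the mathlib API, and the `simp` call may need adjustment.

```lean
import Mathlib.Analysis.Calculus.Deriv.Pow

-- Machine-checked statement: the derivative of x ↦ x ^ 2 is 2 * x.
example (x : ℝ) : deriv (fun y : ℝ => y ^ 2) x = 2 * x := by
  simp [deriv_pow]
```

A verified AD system scales this idea from a single lemma to the full transformation: the theorem quantifies over all programs in the source language.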

Categorical Semantics

Category theory provides abstract semantics for differentiation.

A differentiable computation may be modeled as:

| Concept | Interpretation |
| --- | --- |
| morphism | program |
| tangent functor | forward derivative |
| cotangent structure | reverse derivative |
| composition | chain rule |

Differential categories formalize differentiation algebraically.

Verification then becomes proof that transformations satisfy categorical laws.

This gives very general semantic guarantees.

Verified Tensor Systems

Tensor compilers introduce additional correctness issues.

A tensor transformation may involve:

| Transformation | Risk |
| --- | --- |
| fusion | changed evaluation order |
| parallelization | race conditions |
| layout optimization | indexing errors |
| distributed reduction | synchronization bugs |

Verified differentiation must account for tensor algebra semantics as well as scalar calculus.

Probabilistic Verification

Probabilistic differentiation adds stochastic semantics.

Verification questions include:

| Question | Meaning |
| --- | --- |
| estimator unbiasedness | expected gradient correctness |
| variance bounds | optimization stability |
| expectation interchange | legality of differentiation |
| Monte Carlo convergence | asymptotic validity |

The derivative itself becomes a random variable.

Formal reasoning now combines probability theory with calculus.
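A minimal example of the unbiasedness question, using a score-function (REINFORCE-style) estimator for a Gaussian whose mean is the parameter (a sketch; function names are ours):

```python
import random

def score_function_gradient(theta, n, rng):
    """Monte Carlo estimate of d/dtheta E_{x ~ N(theta, 1)}[x^2]
    via the score function d/dtheta log p(x; theta) = x - theta."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(theta, 1.0)
        total += (x * x) * (x - theta)
    return total / n

theta = 1.5
# E[x^2] = theta^2 + 1, so the exact gradient is 2 * theta = 3.0.
est = score_function_gradient(theta, 200_000, random.Random(0))
assert abs(est - 2 * theta) < 0.2   # unbiased, but only correct in expectation
```

Verifying such an estimator means proving unbiasedness and the legality of interchanging differentiation and expectation, not asserting closeness on one sampled run.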

Verified Scientific Computing

Scientific simulation increasingly depends on differentiable solvers.

Verification targets include:

| System | Concern |
| --- | --- |
| PDE solver | adjoint consistency |
| ODE integrator | discretization correctness |
| optimization layer | KKT derivative validity |
| differentiable physics | conservation laws |

A small gradient error may invalidate optimization results in scientific applications.

This makes verification especially important in high-stakes domains.

Certified Gradients

One goal is certified gradients.

Instead of returning only:

\nabla f(x),

the system may return:

(\hat{g}, \epsilon),

where:

| Quantity | Meaning |
| --- | --- |
| ĝ | computed gradient |
| ε | proven error bound |

This resembles interval arithmetic and verified numerical computation.
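A toy illustration of a certified derivative via interval arithmetic with outward rounding (the `Interval` helper below is ours, built on `math.nextafter`; it is not a library API):

```python
import math

def _down(x): return math.nextafter(x, -math.inf)
def _up(x):   return math.nextafter(x, math.inf)

class Interval:
    """Closed interval with outward rounding: the exact real result of
    each operation is guaranteed to lie within the returned bounds."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, o):
        return Interval(_down(self.lo + o.lo), _up(self.hi + o.hi))

    def __mul__(self, o):
        ps = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(_down(min(ps)), _up(max(ps)))

def point(v):
    return Interval(v, v)

# Derivative of f(x) = x*x + 3x, i.e. f'(x) = 2x + 3, evaluated via the
# forward-mode rules entirely in interval arithmetic at the float a = 0.1.
a = 0.1
x, dx = point(a), point(1.0)
dy = x * dx + dx * x + point(3.0) * dx   # product rule plus linear term

g_hat = (dy.lo + dy.hi) / 2.0   # reported gradient
eps = dy.hi - dy.lo             # proven enclosure width
assert dy.lo <= 2 * a + 3 <= dy.hi
```

The pair `(g_hat, eps)` plays the role of the certified gradient above: the exact derivative of the floating-point input is provably inside the enclosure.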

Security and Reliability

Incorrect gradients may create security and reliability risks.

Examples include:

| Domain | Risk |
| --- | --- |
| autonomous systems | unsafe optimization |
| medical systems | unstable training |
| financial optimization | incorrect sensitivity |
| scientific inference | invalid conclusions |

Verification therefore has practical as well as theoretical importance.

Differentiation as a Certified Transformation

A verified differentiable compiler treats AD as a semantics-preserving program transformation.

The compiler must prove:

\text{meaning}(Df) = D(\text{meaning}(f)).

This equation captures the central goal.

Differentiation of the program should equal differentiation of the mathematical function represented by the program.

Failure Modes

Verification itself has limitations.

Incorrect specifications

A proof may verify the wrong property.

Incomplete semantics

The language meaning may omit runtime behavior.

Floating-point mismatch

Real execution may differ from mathematical models.

Trusted kernel assumptions

Proof assistants rely on trusted foundations.

Scalability

Large tensor systems are difficult to verify completely.

Thus verification improves confidence but does not eliminate all risk.

Conceptual Shift

Classical AD assumes derivative rules are implemented correctly.

Verified differentiation treats derivative computation itself as an object requiring proof.

The derivative becomes a formally certified computational artifact.

This shifts AD from:

| Classical view | Verified view |
| --- | --- |
| practical algorithm | formally verified transformation |
| empirical testing | mathematical proof |
| heuristic confidence | machine-checked correctness |

The goal is not only fast gradients, but trustworthy gradients.

Summary

Verified differentiation studies formal correctness of automatic differentiation systems.

This includes proof of derivative semantics, compiler transformations, adjoint generation, floating-point behavior, higher-order differentiation, and stochastic gradient estimators.

The field connects automatic differentiation with theorem proving, formal semantics, compiler verification, and numerical analysis.

As differentiable systems become infrastructure for science, engineering, and autonomous decision systems, verified differentiation becomes increasingly important for reliability, safety, and mathematical trustworthiness.