
Formal Verification

Automatic differentiation systems are trusted infrastructure. Scientific computing, machine learning, optimization, simulation, and control systems depend on gradients being correct. Yet AD implementations are complicated:

  • compiler transformations,
  • graph rewrites,
  • mutation handling,
  • checkpointing,
  • floating-point arithmetic,
  • higher-order differentiation,
  • distributed execution.

A small semantic error may silently produce incorrect gradients.

Formal verification studies how to prove that:

  • differentiation rules are correct,
  • transformed programs preserve semantics,
  • gradients satisfy mathematical definitions,
  • AD compilers are sound.

The goal is not testing a few examples. The goal is proving correctness for entire classes of programs.

Why Verification Matters

A numerical bug in ordinary code may:

  • crash,
  • produce NaNs,
  • visibly fail.

Gradient bugs are harder:

  • training diverges slowly,
  • optimization converges incorrectly,
  • physical simulations drift,
  • probabilistic inference becomes biased.

The system may appear functional while silently computing wrong derivatives.

Large AD systems are especially difficult because correctness depends on:

  • forward semantics,
  • reverse semantics,
  • transformation correctness,
  • memory reconstruction,
  • adjoint accumulation order,
  • control-flow handling.

Verification becomes increasingly important as differentiable systems move into:

  • aerospace,
  • robotics,
  • medicine,
  • finance,
  • scientific infrastructure.

What Must Be Verified

Several distinct properties matter.

  • Primal correctness: the program computes the intended value.
  • Differential correctness: the derivative matches the mathematical derivative.
  • Transformation correctness: the AD transform preserves semantics.
  • Higher-order correctness: nested differentiation remains valid.
  • Numerical correctness: floating-point behavior remains bounded.
  • Memory correctness: reverse-mode storage and reconstruction are valid.
  • Optimization correctness: compiler rewrites preserve gradients.

An AD system may satisfy some while failing others.

Semantic Specification of Derivatives

Verification requires a precise definition of differentiation.

For a function:

f : \mathbb{R}^n \to \mathbb{R}^m,

the derivative at x is the linear map:

Df(x)

satisfying:

f(x+h) = f(x) + Df(x)h + o(\|h\|).

This becomes the reference semantics.

An AD transformation is correct if the transformed program computes this derivative.

For forward mode:

\hat f(x, v) = (f(x),\, Df(x)v).

For reverse mode:

\hat f(x, w) = (f(x),\, w^T Df(x)).

Verification proves equivalence between:

  • transformed computation,
  • mathematical derivative definition.
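Because both modes are specified by the same Jacobian Df(x), they are linked by a checkable consistency law: for any seeds v and w, ⟨w, Df(x)v⟩ must equal ⟨wᵀDf(x), v⟩. A minimal sketch, using hand-written JVP and VJP rules for a small illustrative function (all names here are hypothetical):

```python
import math

# Illustrative function f(x0, x1) = (x0*x1, sin(x0)).
def f(x):
    return [x[0] * x[1], math.sin(x[0])]

# Forward mode: f_hat(x, v) = (f(x), Df(x) v), derivative written by hand.
def jvp(x, v):
    return [x[1] * v[0] + x[0] * v[1], math.cos(x[0]) * v[0]]

# Reverse mode: f_hat(x, w) = (f(x), w^T Df(x)), derivative written by hand.
def vjp(x, w):
    return [w[0] * x[1] + w[1] * math.cos(x[0]), w[0] * x[0]]

# Consistency law implied by the shared specification Df(x):
#   <w, Df(x) v> == <w^T Df(x), v>  for all seeds v, w.
x, v, w = [1.3, -0.7], [0.2, 0.5], [1.0, -2.0]
lhs = sum(wi * ti for wi, ti in zip(w, jvp(x, v)))
rhs = sum(gi * vi for gi, vi in zip(vjp(x, w), v))
assert abs(lhs - rhs) < 1e-12
```

Property-based tests of this law catch many mismatches between hand-written forward and reverse rules, though they do not replace a proof.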

Structural Induction on Programs

Many AD correctness proofs use structural induction.

Programs are built compositionally:

  • constants,
  • variables,
  • primitive operations,
  • composition,
  • control flow.

Correctness is proved recursively.

Example structure:

Base Cases

  • constants,
  • variables.

Inductive Cases

  • addition,
  • multiplication,
  • composition,
  • abstraction,
  • loops.

The chain rule becomes the inductive composition law.

This mirrors the operational structure of AD itself: local derivative rules combine into global correctness.
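The inductive structure can be made concrete as a symbolic differentiator over a tiny expression language (a hypothetical sketch; each branch mirrors one case of the correctness proof):

```python
# Expressions: ("const", c) | ("var",) | ("add", e1, e2) | ("mul", e1, e2)

def d(expr):
    """Symbolic derivative by structural recursion."""
    kind = expr[0]
    if kind == "const":               # base case: D[c] = 0
        return ("const", 0.0)
    if kind == "var":                 # base case: D[x] = 1
        return ("const", 1.0)
    if kind == "add":                 # inductive case: D[f+g] = Df + Dg
        return ("add", d(expr[1]), d(expr[2]))
    if kind == "mul":                 # inductive case: D[fg] = f Dg + g Df
        f, g = expr[1], expr[2]
        return ("add", ("mul", f, d(g)), ("mul", g, d(f)))
    raise ValueError(kind)

def ev(expr, x):
    """Evaluate an expression at the point x."""
    kind = expr[0]
    if kind == "const": return expr[1]
    if kind == "var":   return x
    if kind == "add":   return ev(expr[1], x) + ev(expr[2], x)
    if kind == "mul":   return ev(expr[1], x) * ev(expr[2], x)

# d/dx of x*x + 3 at x = 2 is 2x = 4.
e = ("add", ("mul", ("var",), ("var",)), ("const", 3.0))
assert ev(d(e), 2.0) == 4.0
```

A correctness proof for `d` proceeds by induction on `expr`, exactly following the case split in the code.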

Verification of Forward Mode

Forward-mode correctness is comparatively straightforward.

The transformed computation propagates:

  • primal values,
  • tangent values.

Proof proceeds by induction over evaluation structure.

For multiplication:

f(x, y) = xy.

Forward mode computes:

(x, \dot x) \cdot (y, \dot y) = (xy,\; x\dot y + y\dot x).

Correctness follows directly from the product rule.

Because forward mode follows execution order naturally, its semantics align closely with ordinary evaluation semantics.

This makes forward mode easier to verify formally.
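A minimal dual-number sketch makes the alignment visible: the multiplication case is a direct transcription of the product rule, so its verification is a one-line appeal to calculus (class and variable names are illustrative):

```python
class Dual:
    """Dual number (primal, tangent) for forward-mode AD."""
    def __init__(self, primal, tangent=0.0):
        self.primal, self.tangent = primal, tangent

    def __add__(self, other):
        # sum rule: (x + y)' = x' + y'
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    def __mul__(self, other):
        # product rule: (x, x')(y, y') = (xy, x y' + y x')
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + other.primal * self.tangent)

# d/dx of x*(x + y) at x=3, y=2: seed dx=1, dy=0; expect 2x + y = 8.
x, y = Dual(3.0, 1.0), Dual(2.0, 0.0)
z = x * (x + y)
assert z.primal == 15.0 and z.tangent == 8.0
```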

Verification of Reverse Mode

Reverse mode is substantially harder.

The backward pass:

  • reverses execution,
  • reconstructs intermediate state,
  • propagates adjoints,
  • accumulates gradients.

Correctness must show that reverse accumulation computes:

w^T J_f.

The challenge is that reverse mode does not follow ordinary evaluation order.

Proofs therefore require:

  • tape semantics,
  • continuation semantics,
  • adjoint invariants,
  • dual-space reasoning.

For each operation, one proves:

  • local pullback correctness,
  • composition correctness,
  • accumulation correctness.

The full reverse sweep then follows by induction over the computation graph.
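The three obligations can be seen in a minimal tape sketch (a toy illustration, not a production design): each forward operation records its local partials, and the reverse sweep replays the tape backwards while accumulating adjoints.

```python
tape = []  # entries: (output_id, [(input_id, local_partial), ...])

class Var:
    """Scalar variable that records operations on a global tape."""
    _next = 0
    def __init__(self, value):
        self.value, self.id = value, Var._next
        Var._next += 1

    def __mul__(self, other):
        out = Var(self.value * other.value)
        # local pullback: d(xy)/dx = y, d(xy)/dy = x
        tape.append((out.id, [(self.id, other.value), (other.id, self.value)]))
        return out

    def __add__(self, other):
        out = Var(self.value + other.value)
        tape.append((out.id, [(self.id, 1.0), (other.id, 1.0)]))
        return out

def grad(output, seed=1.0):
    """Reverse sweep: accumulate adjoints in reverse tape order."""
    adj = {output.id: seed}
    for out_id, partials in reversed(tape):
        for in_id, p in partials:
            adj[in_id] = adj.get(in_id, 0.0) + adj.get(out_id, 0.0) * p
    return adj

# z = x*y + x at x=3, y=4: dz/dx = y + 1 = 5, dz/dy = x = 3.
x, y = Var(3.0), Var(4.0)
z = x * y + x
adj = grad(z)
assert adj[x.id] == 5.0 and adj[y.id] == 3.0
```

The invariant a proof would maintain: after processing a tape entry, each adjoint equals the derivative of the output with respect to that variable, restricted to paths already swept.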

Tape Correctness

A reverse-mode tape stores intermediate information required during backpropagation.

Verification must prove:

  • stored values are sufficient,
  • reconstructed values are correct,
  • adjoint propagation uses the correct primal state.

Suppose:

z = x*y

The backward pass requires:

  • the original value of x,
  • the original value of y.

If mutation changes these values before the reverse sweep, gradients become incorrect.

Tape verification therefore interacts strongly with:

  • SSA transformation,
  • immutable semantics,
  • alias analysis,
  • memory lifetime.
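The mutation hazard above can be reproduced in a few lines, assuming a toy tape that stores a reference into a mutable buffer rather than a snapshot of the primal value (all names hypothetical):

```python
buf = [4.0]      # mutable storage holding y
x = 3.0
z = x * buf[0]   # forward pass: z = x*y

# Two ways the tape might record the partial dz/dx = y:
lazy = {"dz_dx": lambda: buf[0]}          # reads y at backprop time (BUG)
snap = {"dz_dx": (lambda v=buf[0]: v)}    # snapshots y at record time

buf[0] = 100.0   # y is mutated before the reverse sweep runs

assert lazy["dz_dx"]() == 100.0           # wrong primal state used
assert snap["dz_dx"]() == 4.0             # correct: dz/dx = y = 4
```

Verified tape designs rule out the first pattern by construction, for example by lowering to an immutable IR before differentiation.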

SSA Form and Verification

Many verified AD systems lower programs into SSA form.

SSA guarantees:

  • each variable assigned once,
  • explicit data dependencies,
  • immutable intermediate values.

This greatly simplifies verification because:

  • tape reconstruction becomes explicit,
  • mutation disappears,
  • dependency graphs become structural.

Differentiable compiler IRs frequently adopt SSA for exactly this reason.

Examples include:

  • MLIR-based systems,
  • Enzyme IR transformations,
  • differentiable functional compilers.

Equational Reasoning

Formal verification often uses equational reasoning.

For example:

D(f + g) = Df + Dg,
D(fg) = f\,Dg + g\,Df,
D(g \circ f) = Dg \circ Df.

The proof system rewrites programs using these identities.

This resembles symbolic algebra, but operates over program semantics rather than expression syntax alone.

Equational reasoning becomes more complicated with:

  • mutation,
  • floating-point arithmetic,
  • side effects,
  • control flow.

Pure functional semantics simplify proofs substantially.

Verification of Higher-Order AD

Higher-order differentiation introduces additional complexity.

Suppose:

D(D[f]).

Correctness now depends on:

  • nested perturbation semantics,
  • perturbation freshness,
  • closure environments,
  • higher-order substitution.

Perturbation confusion becomes a formal correctness issue.

Verification systems must prove:

  • perturbations remain isolated,
  • tangent spaces compose correctly,
  • nested derivatives correspond to higher-order calculus.
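The failure mode being ruled out can be demonstrated with untagged dual numbers (a deliberately buggy sketch; correct implementations attach a fresh tag to each derivative operator so that distinct perturbations cannot mix):

```python
class D:
    """Untagged dual number: sufficient for first order, unsafe when nested."""
    def __init__(self, p, t=0.0):
        self.p, self.t = p, t
    def _lift(self, o):
        return o if isinstance(o, D) else D(o)
    def __add__(self, o):
        o = self._lift(o)
        return D(self.p + o.p, self.t + o.t)
    __radd__ = __add__
    def __mul__(self, o):
        o = self._lift(o)
        return D(self.p * o.p, self.p * o.t + o.p * self.t)
    __rmul__ = __mul__

def deriv(f, x):
    out = f(D(x, 1.0))
    return out.t if isinstance(out, D) else 0.0

# d/dx [ x * d/dy (x + y) ] at x = 1.
# The inner derivative is 1, so the expression equals x and the true
# answer is 1 -- but the outer perturbation leaks into the inner deriv.
ans = deriv(lambda x: x * deriv(lambda y: x + y, 1.0), 1.0)
assert ans == 2.0   # perturbation confusion: wrong by a factor of 2
```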

These proofs often require:

  • dependent types,
  • linear types,
  • categorical semantics.

Floating-Point Verification

Exact real-number differentiation differs from floating-point differentiation.

Verification must distinguish:

  • mathematical correctness,
  • numerical correctness.

Example:

(a + b) + c \ne a + (b + c)

in floating point.

Compiler optimizations preserving real semantics may alter gradients numerically.
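The non-associativity is easy to exhibit with IEEE 754 doubles, which is why a reassociating optimization that is sound over the reals can still change gradients bit-for-bit:

```python
# Catastrophic example: b + c rounds back to -1e16 because 1.0 is
# smaller than the spacing between representable doubles near 1e16.
a, b, c = 1e16, -1e16, 1.0

assert (a + b) + c == 1.0   # exact cancellation first, then + 1
assert a + (b + c) == 0.0   # the 1.0 is lost to rounding
```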

Floating-point verification studies:

  • rounding bounds,
  • cancellation,
  • overflow,
  • underflow,
  • reproducibility.

This becomes particularly difficult in:

  • parallel reductions,
  • GPU kernels,
  • distributed gradient aggregation.

Verified floating-point AD remains an active research area.

Verified Compiler Transformations

AD systems perform many transformations:

  • fusion,
  • checkpointing,
  • operator reassociation,
  • graph simplification,
  • dead-code elimination.

Each transformation must preserve:

  • primal semantics,
  • derivative semantics.

Verification frameworks therefore prove transformation correctness formally.

A compiler rewrite:

P \to P'

must satisfy:

P \equiv P'

and:

D[P] \equiv D[P'].

Without such guarantees, optimization may silently corrupt gradients.

Theorem Provers

Several systems use theorem provers for AD verification.

Examples include:

  • Coq,
  • Lean,
  • Isabelle/HOL,
  • Agda.

These systems allow:

  • machine-checked proofs,
  • formal semantics,
  • verified transformations.

A proof assistant can verify:

  • derivative rules,
  • chain-rule correctness,
  • compiler rewrites,
  • higher-order semantics.

The challenge is scalability.

Real differentiable systems are large:

  • tensor compilers,
  • GPU kernels,
  • distributed graphs,
  • mixed precision.

Bridging formal proofs and industrial-scale systems remains difficult.

Verified Differentiable Languages

Some research languages are designed around verification from the beginning.

Goals include:

  • type-safe differentiation,
  • linear resource tracking,
  • verified reverse mode,
  • higher-order correctness.

These systems often combine:

  • functional semantics,
  • linear logic,
  • dependent types,
  • categorical structure.

The language semantics themselves encode differentiation invariants.

Logical Relations

Logical relations are frequently used to prove equivalence between:

  • original programs,
  • differentiated programs.

The idea is:

  • relate primal computations,
  • relate derivative computations,
  • prove preservation inductively.

Logical relations become especially useful for:

  • higher-order functions,
  • closures,
  • polymorphism,
  • recursive definitions.

They are one of the main semantic tools for proving AD correctness in functional languages.

Verification of Control Flow

Conditionals and loops complicate differentiation.

Suppose:

if x > 0:
    y = x*x
else:
    y = 0

Correctness depends on:

  • branch semantics,
  • differentiability at boundaries,
  • tape reconstruction,
  • control-flow alignment.
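For the branch above, AD differentiates only the branch the primal actually takes, so correctness reduces to showing the branch-wise derivative agrees with the derivative of the whole function wherever it exists. A small sketch (the finite-difference comparison is illustrative, not a proof):

```python
def f(x):
    # the conditional from the example above
    return x * x if x > 0 else 0.0

def df(x):
    # hand-derived, branch by branch: d(x*x)/dx = 2x, d(0)/dx = 0
    return 2.0 * x if x > 0 else 0.0

# Central differences agree with the branch-wise derivative away
# from the boundary...
h = 1e-6
for x in (2.0, -2.0):
    fd = (f(x + h) - f(x - h)) / (2 * h)
    assert abs(fd - df(x)) < 1e-4

# ...and at x = 0 both branches have slope 0, so f happens to be
# differentiable there and the branch-wise answer is still correct.
assert df(0.0) == 0.0
```

Had the true branch been `x*x + 1`, the function would be discontinuous at 0 and no branch-wise derivative could be correct there; verification must identify such boundary cases.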

Loops introduce:

  • iterative state,
  • recurrent dependencies,
  • checkpoint correctness.

Verification often models loops using:

  • fixpoint semantics,
  • induction principles,
  • recurrence invariants.

Certified Gradients

One long-term goal is certified gradients:

  • gradients with machine-checkable correctness guarantees.

Applications include:

  • scientific reproducibility,
  • safety-critical optimization,
  • verified robotics,
  • formally safe controllers,
  • reliable differentiable simulators.

A certified gradient system would guarantee:

  • derivative correctness,
  • numerical bounds,
  • semantic preservation.

Such systems remain expensive today but are becoming increasingly relevant.

Practical Limits of Verification

Formal verification has costs:

  • proof complexity,
  • compiler restrictions,
  • performance overhead,
  • engineering effort.

Large ML systems evolve rapidly:

  • custom kernels,
  • hardware-specific optimizations,
  • dynamic graphs,
  • distributed execution.

Complete formal verification of industrial AD stacks remains difficult.

Most practical systems therefore combine:

  • formal reasoning,
  • testing,
  • gradient checking,
  • runtime assertions,
  • restricted semantics.

Gradient Checking

A widely used practical method is gradient checking.

Compare:

  • AD gradient,
  • numerical finite-difference approximation.

Example:

\frac{f(x+h) - f(x-h)}{2h}.

This does not prove correctness formally, but detects many implementation bugs.
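A minimal checker along these lines compares a supplied gradient against central differences coordinate by coordinate (function names, step size, and tolerance are illustrative choices):

```python
def check_grad(f, grad_f, x, h=1e-6, tol=1e-4):
    """Compare grad_f(x) against central finite differences of f at x."""
    g = grad_f(x)
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        fd = (f(xp) - f(xm)) / (2 * h)
        # relative tolerance, guarded for small derivatives
        if abs(fd - g[i]) > tol * max(1.0, abs(fd)):
            return False
    return True

# f(x) = x0^2 + x0*x1, true gradient = (2*x0 + x1, x0)
f    = lambda x: x[0] ** 2 + x[0] * x[1]
good = lambda x: [2 * x[0] + x[1], x[0]]
bad  = lambda x: [2 * x[0], x[0]]        # missing the x1 term

assert check_grad(f, good, [1.5, -2.0])
assert not check_grad(f, bad, [1.5, -2.0])
```

The step size trades truncation error (O(h²) for central differences) against rounding error, which is one reason such checks cannot be made arbitrarily strict.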

Gradient checking is especially useful for:

  • custom operators,
  • hand-written backward rules,
  • fused kernels.

However:

  • finite differences are approximate,
  • higher-order derivatives remain difficult,
  • numerical instability may hide bugs.

Formal verification is stronger because it proves correctness structurally.

The Central Verification Problem

Automatic differentiation transforms programs into derivative programs.

Formal verification asks whether this transformation is semantically sound.

The challenge spans several interacting layers:

  • calculus,
  • algebra,
  • program semantics,
  • compiler theory,
  • floating-point arithmetic,
  • memory models,
  • higher-order computation.

Forward mode is relatively direct because derivative propagation follows evaluation order.

Reverse mode is harder because it reconstructs and inverts computational structure.

Verified AD systems therefore increasingly rely on:

  • immutable IRs,
  • SSA representations,
  • functional semantics,
  • linear typing,
  • theorem-proving infrastructure.

As differentiable systems become foundational infrastructure, formal verification becomes less optional and more architectural.