# Verified Differentiation

Automatic differentiation systems are usually trusted because they implement mathematically established rules such as the chain rule, product rule, and linearization of elementary operations.

However, practical AD systems are large compiler and runtime systems containing:

| Component | Complexity |
|---|---|
| graph transformations | compiler correctness |
| tensor kernels | numerical precision |
| mutation handling | state consistency |
| checkpointing | recomputation logic |
| custom gradients | user-defined semantics |
| distributed execution | synchronization correctness |

A small mistake may silently produce incorrect gradients.

Verified differentiation studies how to prove that derivative computations are correct with respect to precise mathematical and program semantics.

This area connects automatic differentiation with:

| Field | Role |
|---|---|
| formal verification | proof of correctness |
| theorem proving | machine-checked reasoning |
| programming language semantics | rigorous program meaning |
| numerical analysis | floating-point guarantees |
| compiler verification | transformation correctness |

The goal is guarantees stronger than those empirical testing alone can provide.

## What Does Correctness Mean?

A derivative system may appear correct while violating important properties.

There are several possible notions of correctness.

### Mathematical correctness

The produced derivative matches the mathematical derivative of the represented function.

### Program correctness

The transformed program correctly computes the derivative of the original program.

### Numerical correctness

Floating-point execution approximates the exact derivative within known bounds.

### Semantic correctness

Compiler optimizations preserve derivative meaning.

These notions are related but distinct.

## The Central Property

Suppose a program computes

$$
f : X \to Y.
$$

An AD transform produces another program

$$
Df : X \times X \to Y \times Y
$$

for forward mode.

Correctness informally means:

$$
Df(x,\dot{x}) =
(f(x), J_f(x)\dot{x}).
$$

For reverse mode, correctness means the generated adjoint computes the transpose Jacobian action:

$$
\bar{x} =
J_f(x)^T \bar{y}.
$$

A verified AD system attempts to prove these properties formally.
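For a concrete instance of the forward-mode condition, the property can be spot-checked by hand for a tiny function. The names `f` and `Df` below are illustrative; a verified system would prove the equality for all programs and inputs rather than test it pointwise.

```python
def f(x1, x2):
    # primal program: f(x1, x2) = (x1 + x2, x1 * x2)
    return (x1 + x2, x1 * x2)

def Df(x1, x2, dx1, dx2):
    # hand-written forward transform: returns (f(x), J_f(x) @ xdot),
    # where the Jacobian is J_f(x) = [[1, 1], [x2, x1]]
    primal = f(x1, x2)
    tangent = (dx1 + dx2, x2 * dx1 + x1 * dx2)
    return primal, tangent
```

Feeding in the standard basis tangents `(1, 0)` and `(0, 1)` recovers the Jacobian columns, which is exactly how forward mode is used in practice.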

## Why Testing Is Not Enough

AD systems are difficult to validate purely through testing.

Finite-difference checks are common:

$$
\frac{
f(x+h)-f(x-h)
}{
2h
}.
$$
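A minimal central-difference gradient checker might look like the following sketch; the step size and tolerance are ad hoc choices, which is itself part of the problem.

```python
def central_diff(f, x, h=1e-6):
    # two-sided difference quotient: (f(x+h) - f(x-h)) / (2h)
    return (f(x + h) - f(x - h)) / (2.0 * h)

def check_gradient(f, df, x, h=1e-6, tol=1e-6):
    # compare a claimed derivative df against the numerical reference
    return abs(df(x) - central_diff(f, x, h)) < tol
```

Such a check can pass even when the derivative is wrong on a measure-zero set of points, and can fail on a correct derivative through cancellation.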

But finite differences have limitations:

| Problem | Consequence |
|---|---|
| truncation error | inaccurate reference |
| floating-point cancellation | instability |
| non-smooth functions | undefined comparison |
| high dimensions | expensive checking |

A system may pass numerical tests while still violating formal derivative semantics.

Formal verification aims to eliminate such ambiguity.

## Formal Semantics of Programs

Verification requires precise semantics for the source language.

A language semantics defines:

| Construct | Meaning |
|---|---|
| arithmetic | numerical interpretation |
| control flow | execution behavior |
| recursion | fixed-point meaning |
| mutation | state transition |
| exceptions | error propagation |

Without rigorous semantics, it is impossible to prove that differentiation preserves meaning.

Differentiable programming therefore depends heavily on programming language theory.

## Forward Mode Verification

Forward mode is conceptually simpler.

A forward transform augments each value with a tangent:

$$
x \mapsto (x,\dot{x}).
$$

Elementary operations are lifted.

Example:

$$
(x,\dot{x}) + (y,\dot{y}) =
(x+y,\dot{x}+\dot{y}).
$$

Multiplication becomes:

$$
(x,\dot{x})(y,\dot{y}) =
(xy,x\dot{y}+y\dot{x}).
$$

Verification proves that the transformed operations preserve the derivative interpretation.

The proof usually proceeds structurally over program syntax.

## Dual Number Semantics

Forward mode is often verified through dual numbers.

Define:

$$
a+b\varepsilon,
\qquad
\varepsilon^2=0.
$$

Evaluating a function on dual numbers gives:

$$
f(x+\dot{x}\varepsilon) =
f(x)+f'(x)\dot{x}\varepsilon.
$$

Thus the tangent component corresponds exactly to the derivative.

Formal verification may prove:

| Property | Meaning |
|---|---|
| algebra preservation | operations respect dual structure |
| compositionality | chain rule correctness |
| soundness | extracted tangent equals derivative |

Dual-number semantics provide a clean mathematical model for forward AD.
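These properties can be exercised, though of course not proved, by a small dual-number implementation. The class below is a sketch covering only addition and multiplication.

```python
class Dual:
    """a + b*eps with eps^2 = 0: a primal value paired with a tangent."""
    def __init__(self, primal, tangent=0.0):
        self.primal, self.tangent = primal, tangent

    @staticmethod
    def lift(v):
        return v if isinstance(v, Dual) else Dual(v)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.primal + other.primal,
                    self.tangent + other.tangent)
    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # product rule on the tangent component
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)
    __rmul__ = __mul__

def derivative(f, x):
    # evaluate f at x + 1*eps and extract the tangent component
    return f(Dual(x, 1.0)).tangent
```

For f(x) = 3x² + x, `derivative(f, 2.0)` returns f'(2) = 13; proving that this holds for every program built from the lifted operations is exactly the compositionality obligation above.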

## Reverse Mode Verification

Reverse mode is harder.

The backward pass depends on:

| Issue | Difficulty |
|---|---|
| execution trace | runtime dependency |
| mutation | overwritten values |
| control flow | dynamic structure |
| memory reuse | aliasing |
| checkpointing | recomputation correctness |

A reverse transform effectively constructs a second program that computes cotangent propagation.

Verification must prove that this program computes the transpose Jacobian action correctly.

## Pullback Semantics

Reverse mode is often formalized through pullbacks.

Given:

$$
f : X \to Y,
$$

reverse mode computes:

$$
f^* : Y^* \to X^*.
$$

This maps output cotangents back to input cotangents.

The correctness condition is:

$$
f^*(\bar{y}) =
J_f(x)^T\bar{y}.
$$

Verification then becomes proof that generated pullback programs implement the correct linear maps.
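A graph-recording sketch of this pullback construction, assuming only `+` and `*` as primitives; the class name `Var` and the recursive `backward` traversal are illustrative choices, not a scalable design.

```python
class Var:
    """Reverse-mode value: records parents with local partial derivatives."""
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0           # accumulated input cotangent x_bar
        self.parents = parents    # pairs (parent Var, local partial)

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, cotangent=1.0):
        # propagate the output cotangent back along every path in the graph;
        # exponential on shared subgraphs, so suitable only for small demos
        self.grad += cotangent
        for parent, partial in self.parents:
            parent.backward(cotangent * partial)
```

For z = x·y + x at (3, 4), calling `z.backward()` accumulates `x.grad == 5.0` and `y.grad == 3.0`, which is the transpose Jacobian applied to the unit cotangent.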

## SSA Verification

Many compiler AD systems operate on SSA intermediate representations.

Example:

```text
x1 = ...
x2 = ...
x3 = mul(x1, x2)
```

The reverse transform generates:

```text
x1_bar += x2 * x3_bar
x2_bar += x1 * x3_bar
```

Verification proves:

| Property | Meaning |
|---|---|
| data dependency preservation | correct adjoint flow |
| linearity | cotangent accumulation validity |
| optimization safety | transformed IR preserves semantics |

Compiler verification becomes part of differentiation verification.

## Verified Compiler Passes

An optimizing compiler may transform code:

```text
x * 2
```

into:

```text
x + x
```

The primal semantics of the two programs remain equivalent.

Derivative semantics must also remain equivalent.

A verified differentiable compiler must prove that optimizations preserve both:

| Semantics | Requirement |
|---|---|
| primal semantics | same outputs |
| derivative semantics | same derivatives |

This significantly increases compiler complexity.
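A property-test sketch of this double obligation, with hand-written derivative programs standing in for compiler-generated ones; this checks a few sample points, whereas verification would prove the equalities for all inputs.

```python
def primal_before(x):   # original program: x * 2
    return x * 2.0

def primal_after(x):    # rewritten program: x + x
    return x + x

def deriv_before(x):    # d/dx (x * 2) = 2
    return 2.0

def deriv_after(x):     # d/dx (x + x) = 1 + 1
    return 1.0 + 1.0

for v in (-1.5, 0.0, 3.0):
    assert primal_before(v) == primal_after(v)  # primal semantics preserved
    assert deriv_before(v) == deriv_after(v)    # derivative semantics preserved
```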

## Floating-Point Verification

Real AD systems use floating-point arithmetic.

Floating-point derivatives differ from exact real derivatives because of:

| Issue | Example |
|---|---|
| rounding | representation error |
| cancellation | subtractive instability |
| overflow | infinite gradients |
| underflow | zero gradients |

Formal verification may include error bounds:

$$
|\hat{g}-g| \leq \epsilon.
$$

Here:

| Symbol | Meaning |
|---|---|
| `g` | exact derivative |
| `ĝ` | computed derivative |

Verified numerical analysis attempts to bound such errors rigorously.
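The cancellation row can be made concrete: shrinking the step of a difference quotient eventually increases the error, while a subtraction-free method such as complex-step differentiation does not cancel. Complex step is not part of verified AD itself; it is used here only to expose the floating-point effect.

```python
import cmath
import math

def central_diff(f, x, h):
    # difference quotient: subtracts two nearly equal values
    return (f(x + h) - f(x - h)) / (2.0 * h)

def complex_step(f, x, h=1e-20):
    # f'(x) ~= Im(f(x + i*h)) / h: no subtraction, so no cancellation
    return f(complex(x, h)).imag / h

x = 1.0
exact = math.cos(x)                                        # derivative of sin
fd_error = abs(central_diff(math.sin, x, 1e-10) - exact)   # cancellation-dominated
cs_error = abs(complex_step(cmath.sin, x) - exact)         # near machine precision
```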

## Differentiating Non-Smooth Programs

Many programs contain non-differentiable operations:

| Operation | Problem |
|---|---|
| max | kink |
| abs | undefined derivative at zero |
| sort | permutation discontinuity |
| branching | path discontinuity |

AD systems usually adopt conventions:

| Convention | Example |
|---|---|
| subgradient | ReLU derivative at zero |
| one-sided derivative | branch convention |
| arbitrary selection | implementation-defined |

Verification must specify which generalized derivative notion is intended.

Otherwise correctness claims are ambiguous.
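As a concrete instance, the usual ReLU convention can be written down explicitly; the value chosen at zero is a specification decision that a verification effort must state.

```python
def relu(x):
    return max(x, 0.0)

def d_relu(x):
    # convention: report the subgradient element 0.0 at the kink x == 0;
    # 1.0 (or anything in [0, 1]) would be an equally valid choice
    return 1.0 if x > 0.0 else 0.0
```

A correctness theorem for this pair would state that `d_relu` returns an element of the subdifferential of `relu` at every point, rather than the classical derivative, which does not exist at zero.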

## Verified Higher-Order Differentiation

Nested differentiation introduces subtle problems.

Example:

```text
grad(grad(f))
```

Potential failures include:

| Failure | Cause |
|---|---|
| perturbation confusion | tangent aliasing |
| tape reuse | incorrect adjoints |
| mixed-mode mismatch | semantic inconsistency |

Verified higher-order systems formally separate derivative levels and tangent spaces.
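Perturbation confusion can be reproduced with a minimal untagged dual-number implementation: nesting silently reuses one tangent slot for two distinct derivatives. In the example below the correct answer is 1, but the untagged sketch reports 2.

```python
class Dual:
    # untagged dual number: one shared tangent slot, no derivative levels
    def __init__(self, primal, tangent=0.0):
        self.primal, self.tangent = primal, tangent

    @staticmethod
    def lift(v):
        return v if isinstance(v, Dual) else Dual(v)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)
    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)
    __rmul__ = __mul__

def derivative(f, x):
    out = f(Dual(x, 1.0))
    return out.tangent if isinstance(out, Dual) else 0.0

# d/dx [ x * (d/dy (x + y)) ] at x = 5: the inner derivative is 1,
# so the outer function is just x and the true answer is 1.0.
confused = derivative(lambda x: x * derivative(lambda y: x + y, 2.0), 5.0)
# the untagged implementation reports 2.0, because the inner call also
# picks up the outer perturbation seeded on x
```

A verified higher-order system rules this out by tagging each perturbation with its derivative level and proving the tags are never conflated.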

## Proof Assistants

Formal verification often uses theorem provers.

Examples include:

| System | Purpose |
|---|---|
| Coq | constructive proofs |
| Lean | dependent type verification |
| Isabelle | higher-order logic |
| Agda | type-theoretic proofs |

These systems allow machine-checked correctness proofs for differentiation transformations.

A proof assistant can verify:

| Target | Example |
|---|---|
| derivative rule | chain rule implementation |
| compiler pass | SSA adjoint generation |
| algebraic law | linearity |
| numerical property | error bound |

Machine-checked proofs reduce reliance on informal reasoning.

## Categorical Semantics

Category theory provides abstract semantics for differentiation.

A differentiable computation may be modeled as:

| Concept | Interpretation |
|---|---|
| morphism | program |
| tangent functor | forward derivative |
| cotangent structure | reverse derivative |
| composition | chain rule |

Differential categories formalize differentiation algebraically.

Verification then becomes proof that transformations satisfy categorical laws.

This gives very general semantic guarantees.

## Verified Tensor Systems

Tensor compilers introduce additional correctness issues.

A tensor transformation may involve:

| Transformation | Risk |
|---|---|
| fusion | changed evaluation order |
| parallelization | race conditions |
| layout optimization | indexing errors |
| distributed reduction | synchronization bugs |

Verified differentiation must account for tensor algebra semantics as well as scalar calculus.

## Probabilistic Verification

Probabilistic differentiation adds stochastic semantics.

Verification questions include:

| Question | Meaning |
|---|---|
| estimator unbiasedness | expected gradient correctness |
| variance bounds | optimization stability |
| expectation interchange | legality of differentiation |
| Monte Carlo convergence | asymptotic validity |

The derivative itself becomes a random variable.

Formal reasoning now combines probability theory with calculus.
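Unbiasedness of a reparameterized gradient estimator can at least be tested statistically. The sketch below estimates d/dμ E[x²] for x ~ N(μ, 1), whose exact value is 2μ; the sample count and seed are arbitrary choices.

```python
import random
import statistics

def reparam_grad_estimate(mu, n_samples, seed=0):
    # reparameterize x = mu + z with z ~ N(0, 1);
    # then d f(x) / d mu = f'(x) = 2 * x for f(x) = x**2
    rng = random.Random(seed)
    samples = [2.0 * (mu + rng.gauss(0.0, 1.0)) for _ in range(n_samples)]
    return statistics.fmean(samples)
```

The Monte Carlo mean converges to 2μ; a formal proof would instead justify interchanging differentiation and expectation for this family of densities.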

## Verified Scientific Computing

Scientific simulation increasingly depends on differentiable solvers.

Verification targets include:

| System | Concern |
|---|---|
| PDE solver | adjoint consistency |
| ODE integrator | discretization correctness |
| optimization layer | KKT derivative validity |
| differentiable physics | conservation laws |

A small gradient error may invalidate optimization results in scientific applications.

This makes verification especially important in high-stakes domains.

## Certified Gradients

One goal is certified gradients.

Instead of returning only:

$$
\nabla f(x),
$$

the system may return:

$$
(\hat{g},\epsilon),
$$

where:

| Quantity | Meaning |
|---|---|
| `ĝ` | computed gradient |
| `ε` | proven error bound |

This resembles interval arithmetic and verified numerical computation.
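A very small instance of this idea, assuming the caller supplies analytic bounds that are trusted inputs rather than verified facts: a bound on |f'''| near x and a bound on |f| near x. The rounding term is a crude heuristic, not a formally derived bound.

```python
import sys

def certified_central_diff(f, x, h, third_deriv_bound, value_bound):
    # returns (g_hat, eps) with |g_hat - f'(x)| <= eps under the stated
    # assumptions: |f'''| <= third_deriv_bound and |f| <= value_bound near x,
    # and a faithfully rounded implementation of f
    g_hat = (f(x + h) - f(x - h)) / (2.0 * h)
    truncation = third_deriv_bound * h * h / 6.0  # Taylor remainder bound
    # crude allowance for rounding in the two f-evaluations and in x +/- h;
    # a verified version would derive this from a formal floating-point model
    rounding = 4.0 * value_bound * sys.float_info.epsilon / h
    return g_hat, truncation + rounding
```

For f(x) = x² the third derivative vanishes, so the returned bound consists of the rounding term alone, and the estimate at x = 1 lies within it of the exact derivative 2.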

## Security and Reliability

Incorrect gradients may create security and reliability risks.

Examples include:

| Domain | Risk |
|---|---|
| autonomous systems | unsafe optimization |
| medical systems | unstable training |
| financial optimization | incorrect sensitivity |
| scientific inference | invalid conclusions |

Verification therefore has practical as well as theoretical importance.

## Differentiation as a Certified Transformation

A verified differentiable compiler treats AD as a semantics-preserving program transformation.

The compiler must prove:

$$
\text{meaning}(Df) =
D(\text{meaning}(f)).
$$

This equation captures the central goal.

Differentiation of the program should equal differentiation of the mathematical function represented by the program.

## Failure Modes

Verification itself has limitations.

### Incorrect specifications

A proof may verify the wrong property.

### Incomplete semantics

The language meaning may omit runtime behavior.

### Floating-point mismatch

Real execution may differ from mathematical models.

### Trusted kernel assumptions

Proof assistants rely on trusted foundations.

### Scalability

Large tensor systems are difficult to verify completely.

Thus verification improves confidence but does not eliminate all risk.

## Conceptual Shift

Classical AD assumes derivative rules are implemented correctly.

Verified differentiation treats derivative computation itself as an object requiring proof.

The derivative becomes a formally certified computational artifact.

This shifts AD from:

| Classical view | Verified view |
|---|---|
| practical algorithm | formally verified transformation |
| empirical testing | mathematical proof |
| heuristic confidence | machine-checked correctness |

The goal is not only fast gradients, but trustworthy gradients.

## Summary

Verified differentiation studies formal correctness of automatic differentiation systems.

This includes proof of derivative semantics, compiler transformations, adjoint generation, floating-point behavior, higher-order differentiation, and stochastic gradient estimators.

The field connects automatic differentiation with theorem proving, formal semantics, compiler verification, and numerical analysis.

As differentiable systems become infrastructure for science, engineering, and autonomous decision systems, verified differentiation becomes increasingly important for reliability, safety, and mathematical trustworthiness.

