
Denotational Models

Operational semantics explains how automatic differentiation executes. Denotational semantics explains what differentiable programs mean.

A denotational model assigns mathematical objects to programs:

  • numbers,
  • functions,
  • spaces,
  • morphisms,
  • domains,
  • linear maps.

The meaning of a program becomes independent of:

  • execution order,
  • implementation strategy,
  • compiler structure,
  • hardware behavior.

For automatic differentiation, denotational semantics answers a central question:

What mathematical object does a differentiable program denote?

This question becomes increasingly important for:

  • higher-order differentiation,
  • differentiable programming languages,
  • compiler correctness,
  • formal verification,
  • categorical semantics,
  • differentiable functional languages.

Denotational models provide the semantic foundation connecting:

  • programs,
  • derivatives,
  • linear approximations,
  • compositional structure.

Operational vs Denotational Semantics

Operational semantics describes execution.

Example:

x = x + 1

Operationally:

  • evaluate x,
  • add 1,
  • assign result.

Denotational semantics ignores execution details and assigns a mathematical meaning directly.

The program denotes a function:

f(x) = x + 1.

Differentiation then acts on the denoted mathematical object rather than on execution steps.

This distinction matters because:

  • many implementations may realize the same semantics,
  • differentiation should depend on meaning, not incidental execution structure.
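A minimal sketch of this point, using two hypothetical programs: operationally different implementations can share one denotation, and the derivative of that denotation is the same constant function for both.

```python
# Two operationally different programs with the same denotation f(x) = x + 1.

def p_direct(x):
    # one execution strategy: add 1 directly
    return x + 1

def p_indirect(x):
    # another execution strategy: subtract -1 via a temporary
    y = x
    y = y - (-1)
    return y

# Both denote the same mathematical function, so any semantic notion of
# derivative assigns both the constant function 1.
```

Differentiation that depended on the execution trace, rather than the denotation, could in principle treat these two programs differently; denotational semantics rules that out.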

Programs as Morphisms

A denotational model typically interprets programs as morphisms in a category.

A program:

P : X \to Y

denotes a map between semantic spaces.

Examples:

  • smooth functions,
  • continuous maps,
  • measurable maps,
  • linear maps,
  • domain-theoretic functions.

Automatic differentiation then becomes a semantic transformation:

P \mapsto DP.

The derivative itself is another semantic morphism.

This abstraction separates:

  • syntax,
  • implementation,
  • mathematical meaning.

Smooth Maps as Semantics

The simplest denotational model interprets programs as smooth functions:

f : \mathbb{R}^n \to \mathbb{R}^m.

Differentiation becomes ordinary calculus:

Df(x).

Forward mode computes:

(x, v) \mapsto (f(x), Df(x)v).

Reverse mode computes pullbacks:

(x, w) \mapsto (f(x), Df(x)^T w).
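These two interpretations can be made concrete for a small smooth map with a hand-written Jacobian (an illustrative sketch; the function and its Jacobian are chosen here, not taken from the text):

```python
import numpy as np

# Forward mode pushes a tangent v through the Jacobian:  (x, v) -> (f(x), Df(x) v).
# Reverse mode pulls a cotangent w back:                 (x, w) -> (f(x), Df(x)^T w).
# Illustrated for f(x0, x1) = (x0 * x1, sin(x0)).

def f(x):
    return np.array([x[0] * x[1], np.sin(x[0])])

def jacobian(x):
    # Df(x), written out by hand for this example
    return np.array([[x[1], x[0]],
                     [np.cos(x[0]), 0.0]])

def jvp(x, v):
    # forward-mode semantics: Jacobian-vector product
    return f(x), jacobian(x) @ v

def vjp(x, w):
    # reverse-mode semantics: vector-Jacobian (pullback) product
    return f(x), jacobian(x).T @ w
```

The same Jacobian appears in both, applied covariantly in `jvp` and contravariantly (transposed) in `vjp`.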

This model is mathematically clean, but insufficient for real programming languages because it lacks:

  • higher-order functions,
  • recursion,
  • effects,
  • partiality,
  • infinite structures.

More expressive denotational models are required.

Domain Theory

Programming languages often contain:

  • recursion,
  • loops,
  • nontermination.

A recursive definition:

f(x) = f(x)

has no ordinary smooth-function interpretation.

Domain theory introduces partially ordered semantic spaces where recursive computations correspond to fixpoints.

A domain contains:

  • partial information,
  • approximation structure,
  • least upper bounds.

Programs denote continuous functions between domains.

Recursive definitions are interpreted through least fixpoints:

f = \mathrm{fix}(F).

This allows denotational semantics for:

  • recursion,
  • infinite computations,
  • lazy evaluation.
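The least-fixpoint interpretation can be sketched by Kleene iteration: start from the everywhere-undefined function (bottom) and repeatedly apply the functional. This is a hypothetical illustration in which `None` plays the role of bottom and `F` is a functional whose least fixpoint is the factorial function.

```python
BOTTOM = None  # stands for "undefined" / nontermination

def F(g):
    # Functional whose least fixpoint is the factorial function.
    def h(n):
        if n == 0:
            return 1
        prev = g(n - 1)
        return None if prev is None else n * prev
    return h

# Kleene chain: f_0 = bottom, f_{k+1} = F(f_k)
approx = lambda n: BOTTOM
for _ in range(6):
    approx = F(approx)

# approx is f_6: defined on 0..5, still "bottom" above that.
```

Each iterate adds one more point of definedness; the least fixpoint is the limit (least upper bound) of this chain.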

Differentiable domain theory extends these ideas to differentiable computation.

Continuous Linearization

Differentiation fundamentally produces linear approximations.

Denotational models therefore often incorporate:

  • vector spaces,
  • Banach spaces,
  • locally convex spaces,
  • linear maps.

A program denotes:

  • nonlinear semantic behavior,
  • together with linearized local structure.

The derivative becomes a semantic linear operator:

Df(x) : V \to W.

This linearization structure is central in:

  • differential categories,
  • tangent categories,
  • linear logic semantics,
  • reverse-mode semantics.

Denotational Semantics of Forward Mode

Forward mode admits a direct denotational interpretation.

A function:

f : X \to Y

is lifted into tangent space:

Tf : TX \to TY.

The tangent-space interpretation becomes:

Tf(x, v) = (f(x), Df(x)v).

The tangent bundle is therefore part of the semantic universe itself.

Programs denote tangent-transforming maps.

Forward-mode AD becomes semantic lifting rather than merely operational augmentation.
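The lifting Tf can be sketched as an interpreter over (value, tangent) pairs, with each primitive replaced by its tangent-lifted version (a minimal forward-mode sketch; the primitives and the example function are chosen for illustration):

```python
import math

# Tangent-lifted primitives act on (value, tangent) pairs.

def t_mul(a, b):
    # T(mul): product rule on tangents
    (x, dx), (y, dy) = a, b
    return (x * y, x * dy + dx * y)

def t_sin(a):
    # T(sin): chain rule through sin
    (x, dx) = a
    return (math.sin(x), math.cos(x) * dx)

def Tf(x, v):
    # f(x) = sin(x) * x, lifted pointwise: Tf(x, v) = (f(x), Df(x) v)
    u = (x, v)
    return t_mul(t_sin(u), u)
```

Composing lifted primitives yields the lifted composite, which is exactly the semantic content of forward mode.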

Denotational Semantics of Reverse Mode

Reverse mode is subtler.

Forward mode propagates tangent vectors covariantly:

v \mapsto Df(x)v.

Reverse mode propagates cotangent vectors contravariantly:

w \mapsto Df(x)^T w.

A denotational model must therefore include:

  • dual spaces,
  • pullbacks,
  • adjoint operators.

The backward pass becomes a semantic pullback operation.

This distinction explains why reverse mode is operationally more complicated:

  • execution order reverses,
  • linear maps transpose,
  • cotangent spaces propagate backward.
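The reversal and transposition can be seen directly for a composition of two linear maps (a sketch with hand-chosen matrices): the forward pass composes the maps in order, while the pullback composes their transposes in reverse order.

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 1.0]])   # f(x) = A x, so Df = A
B = np.array([[2.0, 0.0], [1.0, 3.0]])   # g(y) = B y, so Dg = B

def pullback(x, w):
    # forward pass through g ∘ f
    y = A @ x
    z = B @ y
    # backward pass: transposes compose in reverse order,
    # w -> Df(x)^T (Dg(y)^T w)
    w_y = B.T @ w
    w_x = A.T @ w_y
    return z, w_x
```

The result agrees with applying the transpose of the composite map `(B @ A).T` directly, which is the semantic identity `(Dg · Df)^T = Df^T · Dg^T`.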

Several categorical models interpret reverse mode through:

  • continuation semantics,
  • optics,
  • profunctors,
  • dual categories.

Logical Relations

Logical relations are often used to relate:

  • original programs,
  • differentiated programs.

Suppose:

  • P computes a function,
  • D[P] computes derivatives.

A logical relation proves:

D[P]

denotes the mathematical derivative of the meaning of P.

This is especially useful for:

  • higher-order functions,
  • polymorphism,
  • closures,
  • recursive definitions.

Logical relations connect operational differentiation with denotational meaning.
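A logical relation itself is a proof device, not a program, but its first-order instance can be approximated by a numerical check: a candidate derivative program is "related" to the original if it tracks the derivative of its meaning. This is a hypothetical sketch (the names `P`, `D_P`, and `related` are illustrative), using a finite-difference test as a stand-in for the relational property.

```python
def P(x):
    # a program computing x^3
    return x * x * x

def D_P(x):
    # a candidate differentiated program
    return 3 * x * x

def related(prog, d_prog, x, h=1e-6, tol=1e-4):
    # numerical stand-in for the logical relation at base type:
    # d_prog must approximate the derivative of the meaning of prog
    fd = (prog(x + h) - prog(x - h)) / (2 * h)
    return abs(fd - d_prog(x)) < tol
```

At higher types the relation is defined structurally (related inputs go to related outputs), which is precisely what makes the technique scale to closures and higher-order functions.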

Lambda Calculus Semantics

Lambda calculus requires semantic models supporting function spaces.

A higher-order function:

F : (X \to Y) \to Z

must itself be interpreted mathematically.

Cartesian closed categories provide:

  • products,
  • exponentials,
  • higher-order function semantics.

Differential lambda calculi require:

  • cartesian closure,
  • differential structure,
  • tangent semantics.

The denotational model must support:

  • abstraction,
  • application,
  • differentiation,
  • higher-order composition.

This becomes one of the deepest areas in differentiable programming semantics.

Linear Logic Semantics

Differentiation interacts naturally with linear logic.

A derivative is linear in perturbations:

Df(x)(a v + b w) = a Df(x)(v) + b Df(x)(w)
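This linearity in the perturbation argument can be checked concretely (a sketch with a hand-written Jacobian for an example map of our choosing):

```python
import numpy as np

# Df(x) for f(x0, x1) = (x0^2, x0 * x1), written out by hand.
def Df(x):
    return np.array([[2 * x[0], 0.0],
                     [x[1], x[0]]])

x = np.array([1.0, 2.0])
v, w = np.array([1.0, 0.0]), np.array([0.0, 1.0])
a, b = 3.0, -2.0

# Linearity: Df(x)(a v + b w) = a Df(x) v + b Df(x) w
lhs = Df(x) @ (a * v + b * w)
rhs = a * (Df(x) @ v) + b * (Df(x) @ w)
```

Note the asymmetry: f itself is nonlinear in x, but its derivative at a fixed x is exactly linear in the perturbation.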

Linear logic distinguishes:

  • linear resources,
  • duplicable resources.

This matters operationally because:

  • reverse mode accumulates gradients linearly,
  • tensor duplication consumes memory,
  • adjoint propagation is additive.

Denotational models based on linear logic explain:

  • gradient accumulation,
  • checkpointing,
  • memory-sensitive AD,
  • resource-aware differentiation.

Probabilistic Denotational Models

Modern differentiable systems increasingly combine:

  • differentiation,
  • probability,
  • stochastic computation.

Programs may denote:

  • probability measures,
  • stochastic kernels,
  • random processes.

Differentiation then acts on probabilistic semantics.

Examples include:

  • variational inference,
  • differentiable Monte Carlo,
  • probabilistic programming,
  • score-function estimators.

Denotational semantics for probabilistic AD remains an active research area.

Nonsmooth Semantics

Many practical programs are not smooth:

  • ReLU,
  • max,
  • clipping,
  • sparse routing,
  • conditionals.

Classical derivatives may not exist.

Denotational models must then incorporate:

  • generalized derivatives,
  • subgradients,
  • Clarke derivatives,
  • piecewise smooth structure.

Different AD systems implicitly choose different nonsmooth semantics.

A denotational framework makes these choices explicit.
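For ReLU at zero, the Clarke subdifferential is the interval [0, 1], and an AD system must commit to one element of it. The sketch below shows two conventions one might adopt (the function names are illustrative, not from any particular system):

```python
def relu(x):
    return max(x, 0.0)

def d_relu_pick_zero(x):
    # convention A: subgradient 0 at the kink
    return 1.0 if x > 0 else 0.0

def d_relu_pick_one(x):
    # convention B: subgradient 1 at the kink
    return 1.0 if x >= 0 else 0.0
```

Away from the kink the two conventions agree; at x = 0 they denote different elements of the subdifferential, which is exactly the semantic choice a denotational framework makes explicit.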

Denotational Meaning of Computational Graphs

A computational graph operationally represents dependency structure.

Denotationally, the graph denotes a composed morphism:

f_n \circ \cdots \circ f_2 \circ f_1.

Differentiation acts compositionally:

D(f_n \circ \cdots \circ f_1).

The graph itself is not the meaning.

It is an implementation structure realizing a semantic morphism.

This distinction matters because:

  • many graphs may denote the same function,
  • compiler rewrites alter graphs,
  • semantics should remain invariant.
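The graph/meaning distinction can be illustrated with two call structures that denote the same composite (hypothetical component functions chosen for the example):

```python
# Component morphisms of a composite f3 ∘ f2 ∘ f1.
def f1(x): return x + 1.0
def f2(x): return 2.0 * x
def f3(x): return x * x

def graph_a(x):
    # one graph: a single nested composition
    return f3(f2(f1(x)))

def graph_b(x):
    # a rewritten graph: the tail f3 ∘ f2 fused into one node
    tail = lambda y: f3(f2(y))
    return tail(f1(x))
```

A compiler rewrite that turns `graph_a` into `graph_b` changes the graph but not the denoted morphism, so any semantics-preserving notion of differentiation must treat them identically.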

Full Abstraction

A denotational model is fully abstract if two notions of equivalence coincide: semantic equality of denotations, and operational indistinguishability under all observations.

For differentiable programming, this becomes difficult because observations may include:

  • outputs,
  • gradients,
  • Hessians,
  • runtime behavior,
  • memory behavior.

Two programs may:

  • compute identical outputs,
  • produce different gradients,
  • expose different numerical behavior.

Designing fully abstract differentiable semantics remains difficult.

Denotational Models and Compiler Correctness

Compiler transformations should preserve denotational meaning.

Suppose:

P \to P'.

Correctness requires:

[[P]] = [[P']].

For AD systems:

[[D[P]]] = D([[P]]).

This equation states:

The semantics of the differentiated program equal the derivative of the program semantics.

This is one of the central correctness conditions in differentiable compiler theory.
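A minimal sketch of what this condition demands of a rewrite (the programs and the strength-reduction rewrite x*2 → x+x are hypothetical examples): the rewrite must preserve both the denotation and its derivative.

```python
def P(x):
    return x * 2.0

def P_rewritten(x):
    # compiler rewrite: strength reduction x*2 -> x+x
    return x + x

def D_P(x):
    # output of a (hypothetical) AD transform; valid for either program
    return 2.0

def central_difference(f, x, h=1e-6):
    # numerical probe of the derivative of a denotation
    return (f(x + h) - f(x - h)) / (2 * h)
```

Correctness here means `P` and `P_rewritten` denote the same function, and `D_P` agrees with the derivative of that shared denotation, so differentiating before or after the rewrite gives the same meaning.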

Continuation Semantics and Reverse Mode

Reverse mode closely resembles continuation semantics.

A continuation represents:

  • the future use of a value.

An adjoint represents:

  • sensitivity of future computation to a value.

This correspondence leads to denotational models where:

  • reverse mode,
  • CPS transformation,
  • continuation categories,

share semantic structure.

Several modern reverse-mode theories are built around this connection.
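The correspondence can be sketched by a primitive that returns its value together with a backpropagator, a continuation describing how a cotangent on the output pulls back to the inputs (a minimal illustration; the name `mul` and the pair-returning style are chosen for the sketch):

```python
def mul(x, y):
    # value plus backpropagator: the continuation captures how the
    # "future" sensitivity w on the output flows back to x and y
    def back(w):
        return (w * y, w * x)
    return x * y, back

# forward pass produces the value and the continuation
z, back = mul(3.0, 4.0)

# backward pass invokes the continuation with the output cotangent
dz_dx, dz_dy = back(1.0)
```

Composing such primitives nests the continuations, and running them unwinds the composition in reverse, which is exactly the CPS reading of the backward pass.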

Synthetic Differential Geometry

Some denotational models use synthetic differential geometry.

Instead of defining derivatives through limits, infinitesimals become primitive semantic objects.

Dual numbers:

a + \epsilon b

with:

\epsilon^2 = 0

become semantic entities directly.

Differentiation emerges algebraically from infinitesimal structure.

This approach aligns naturally with forward-mode AD semantics.
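Dual numbers can be realized directly as semantic objects: multiplication discards the ε² term, so the ε coefficient automatically carries the derivative (a minimal sketch supporting only addition and multiplication):

```python
class Dual:
    """Dual number a + eps*b with eps^2 = 0."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, other):
        return Dual(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + eps b)(c + eps d) = ac + eps(ad + bc); the eps^2 term vanishes
        return Dual(self.a * other.a,
                    self.a * other.b + self.b * other.a)

x = Dual(3.0, 1.0)   # the point 3 plus a unit infinitesimal
y = x * x * x        # x^3: value 27, and the eps part is the derivative 3x^2 = 27
```

No limits are taken anywhere; differentiation falls out of the algebra ε² = 0, which is why this model matches forward-mode AD so closely.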

The Semantic Equation of AD

A central denotational equation for forward mode is:

T(g \circ f) = T(g) \circ T(f)

Differentiation preserves composition.

This equation explains why local derivative rules combine into globally correct derivatives.

For reverse mode, the corresponding semantic law involves pullbacks and dual composition.

These compositional laws are the semantic heart of automatic differentiation.
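The functoriality law can be checked on (value, tangent) pairs for a concrete pair of functions (f = sin and g = squaring, chosen for illustration): lifting the composite directly agrees with composing the lifts.

```python
import math

def Tsin(p):
    # T(sin): (x, dx) -> (sin x, cos(x) dx)
    x, dx = p
    return (math.sin(x), math.cos(x) * dx)

def Tsq(p):
    # T(square): (x, dx) -> (x^2, 2x dx)
    x, dx = p
    return (x * x, 2 * x * dx)

def T_composite(p):
    # T(g ∘ f) computed directly by the chain rule for g(f(x)) = sin(x)^2
    x, dx = p
    y = math.sin(x)
    return (y * y, 2 * y * math.cos(x) * dx)

p = (0.7, 1.0)
lhs = T_composite(p)   # T(g ∘ f)
rhs = Tsq(Tsin(p))     # T(g) ∘ T(f)
```

That the two sides agree at every point is the compositional law in action: correct local lifts compose into a correct global lift.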

Practical Relevance

Denotational semantics may appear abstract, but it directly influences:

  • differentiable language design,
  • compiler IR structure,
  • higher-order AD,
  • correctness proofs,
  • optimization legality,
  • resource semantics.

Modern differentiable systems increasingly rely on semantic models because:

  • AD transformations are complex,
  • compiler optimizations are aggressive,
  • higher-order programs are common,
  • differentiable infrastructure is becoming foundational.

The Core Semantic Picture

Denotational models interpret differentiable programs as mathematical objects.

Differentiation acts on semantic meaning rather than execution steps.

Forward mode lifts programs into tangent semantics.

Reverse mode propagates dual semantic structure backward.

Computational graphs become implementations of semantic morphisms.

Compiler transformations are correct when semantic meaning remains invariant.

Automatic differentiation therefore becomes a semantic transformation over the mathematical meaning of computation itself.