# Differential Lambda Calculus

Automatic differentiation is deeply connected to functional programming and lambda calculus. Programs can be viewed as mathematical functions, and differentiation can be treated as a transformation on programs themselves.

Differential lambda calculus extends ordinary lambda calculus with explicit notions of infinitesimal variation, linear approximation, and derivative propagation.

This provides a formal foundation for differentiable programming languages.

### Ordinary Lambda Calculus

Lambda calculus is a minimal formal system for describing computation.

Its core constructs are:

| Construct | Meaning |
|---|---|
| variable | symbolic value |
| abstraction | function definition |
| application | function call |

A function is written:

$$
\lambda x . f(x).
$$

Function application:

$$
(\lambda x . f(x))(a).
$$

Reduction substitutes:

$$
x \mapsto a.
$$

Despite its simplicity, lambda calculus forms the basis of:

- functional programming
- type theory
- compiler semantics
- program transformation

Differentiable programming extends this framework to support derivatives.
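As a deliberately minimal sketch, the three constructs above can be represented directly in Python. The class names `Var`, `Lam`, `App` and the capture-naive `substitute` below are illustrative only, not a full implementation:

```python
from dataclasses import dataclass

# Minimal untyped lambda-calculus terms (illustrative names).

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

def substitute(term, name, value):
    """Capture-naive substitution: term[name := value]."""
    if isinstance(term, Var):
        return value if term.name == name else term
    if isinstance(term, Lam):
        if term.param == name:   # the binder shadows `name`
            return term
        return Lam(term.param, substitute(term.body, name, value))
    if isinstance(term, App):
        return App(substitute(term.fn, name, value),
                   substitute(term.arg, name, value))
    raise TypeError(term)

def beta_reduce(term):
    """One beta step: (lambda x. body)(a)  ->  body[x := a]."""
    if isinstance(term, App) and isinstance(term.fn, Lam):
        return substitute(term.fn.body, term.fn.param, term.arg)
    return term

# (lambda x. x)(y) reduces to y
assert beta_reduce(App(Lam("x", Var("x")), Var("y"))) == Var("y")
```

A real implementation would need capture-avoiding substitution; the sketch only shows the shape of reduction.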

### Programs as Mathematical Functions

Automatic differentiation treats programs as differentiable mappings.

Suppose:

$$
f : X \to Y.
$$

Differentiation produces:

$$
Df : X\times X \to Y.
$$

The derivative transforms:

- a value
- a perturbation

into a new perturbation.

Differential lambda calculus internalizes this transformation directly into the programming language.

### Why Lambda Calculus Matters for AD

Most modern AD systems operate on program structure.

Examples:

- tracing computational graphs
- transforming intermediate representations
- source-to-source differentiation
- staged compilation

Lambda calculus provides a precise semantic model for such transformations.

Differentiation becomes a program transformation rule.

### Differential Operator

Differential lambda calculus introduces a differential operator:

$$
D.
$$

Applied to a term:

$$
D(f).
$$

Operationally:

$$
D(f)(x,v) =
Df_x(v).
$$

The derivative is itself a first-class computable object.

This mirrors forward-mode AD directly.
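The operator can be sketched with dual-number arithmetic; the `Dual` class and `D` function below are illustrative names, not any particular library's API:

```python
from dataclasses import dataclass

# Hedged sketch: D(f)(x, v) = Df_x(v) realized with dual numbers.

@dataclass(frozen=True)
class Dual:
    primal: float
    tangent: float

    def _coerce(self, other):
        return other if isinstance(other, Dual) else Dual(float(other), 0.0)

    def __add__(self, other):
        other = self._coerce(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        # product rule on the tangent component
        other = self._coerce(other)
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

    __rmul__ = __mul__

def D(f):
    """D(f)(x, v) = Df_x(v): the derivative as a first-class function."""
    def Df(x, v):
        return f(Dual(float(x), float(v))).tangent
    return Df

f = lambda x: x * x + 3 * x
assert D(f)(2.0, 1.0) == 7.0   # d/dx (x^2 + 3x) = 2x + 3 = 7 at x = 2
```

Note that `D(f)` is an ordinary function value, so derivatives are first-class exactly as the calculus requires.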

### Differential Application

Ordinary application:

$$
f(x).
$$

Differential application propagates perturbations:

$$
D(f(x)).
$$

By chain rule:

$$
D(f(x)) =
D(f)(x,D(x)).
$$

If $x$ is perturbed by $v$:

$$
D(f)(x,v) =
Df_x(v).
$$

Differential lambda calculus formalizes this propagation syntactically.
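The propagation rule can be checked with hand-written pushforwards; `Df`, `Dg`, and `Dh` below are illustrative names for the derivatives of $f(y)=y^3$ and $g(x)=x^2$:

```python
# Hand-written pushforwards (illustrative, not library code).

def g(x): return x ** 2
def Dg(x, v): return 2 * x * v        # Dg_x(v)

def f(y): return y ** 3
def Df(y, v): return 3 * y ** 2 * v   # Df_y(v)

def Dh(x, v):
    # chain rule as propagation: D(f(g(x))) = D(f)(g(x), D(g)(x, v))
    return Df(g(x), Dg(x, v))

assert Dh(2.0, 1.0) == 192.0   # h(x) = x^6, so Dh_2(1) = 6 * 2^5 = 192
```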

### Linear Use of Variables

Differentiation is fundamentally linear in the perturbation.

For small perturbations:

$$
f(x+h)
\approx
f(x)+Df_x(h).
$$

The perturbation $h$ appears linearly.

This motivates linear logic and linear type systems.

A derivative consumes perturbations exactly once.

This becomes important for:

- reverse-mode memory management
- mutation control
- differentiable functional languages
- compiler optimization

### Linear Lambda Calculus

Linear lambda calculus restricts variable usage.

Variables cannot be:

- duplicated freely
- discarded arbitrarily

This matches derivative propagation naturally.

For example, tangent vectors propagate linearly through programs.

Differential lambda calculus often combines:

- ordinary nonlinear computation
- linear perturbation structure

within a unified calculus.

### Differential Substitution

Ordinary substitution replaces variables directly.

Differential substitution tracks infinitesimal variation.

Suppose:

$$
f(x+\varepsilon v).
$$

Expanding:

$$
f(x)+\varepsilon Df_x(v).
$$

The derivative measures how sensitive the term is to the substituted variation.

Differential lambda calculi formalize this process structurally.
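The expansion can be made concrete by treating $\varepsilon$ as a nilpotent element ($\varepsilon^2 = 0$); the `Eps` class below is an illustrative sketch of this idea:

```python
from dataclasses import dataclass

# Sketch: x + eps*v as a pair, with eps nilpotent (eps^2 = 0),
# so evaluation computes f(x) + eps * Df_x(v) automatically.

@dataclass(frozen=True)
class Eps:
    real: float
    eps: float   # coefficient of the infinitesimal

    def __add__(self, other):
        return Eps(self.real + other.real, self.eps + other.eps)

    def __mul__(self, other):
        # the eps^2 term is dropped: nilpotency
        return Eps(self.real * other.real,
                   self.real * other.eps + self.eps * other.real)

eps = Eps(0.0, 1.0)
assert eps * eps == Eps(0.0, 0.0)   # eps^2 = 0

f = lambda t: t * t
assert f(Eps(3.0, 2.0)) == Eps(9.0, 12.0)   # f(x) = 9, Df_x(v) = 2*3*2 = 12
```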

### Ehrhard-Regnier Differential Lambda Calculus

One influential formulation was introduced by Thomas Ehrhard and Laurent Regnier.

This system extends lambda calculus with:

- linear application
- differential operators
- additive structure

The derivative of a term becomes another term inside the language.

Differentiation is no longer external meta-analysis.

It becomes part of computation itself.

### Differential Application Operator

The calculus introduces syntax resembling:

$$
D(s)\cdot t.
$$

Interpretation:

- perturb function $s$
- apply infinitesimal variation $t$

This explicitly represents derivative propagation.

Operational semantics then enforce chain-rule behavior automatically.

### Additive Structure

Differentiation requires addition:

$$
D(f+g)=Df+Dg.
$$

Thus differential lambda calculi introduce additive terms:

$$
s+t.
$$

Programs become algebraic objects supporting:

- linear combination
- infinitesimal variation
- derivative composition

This differs from ordinary lambda calculus, which has no such additive structure on terms.
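The additivity law can be spot-checked with pushforwards written out by hand; `Df`, `Dg`, and `D_sum` below are illustrative names for the derivatives of $f(x)=x^2$ and $g(x)=3x$:

```python
# Hedged check of D(f+g) = Df + Dg on one example.

def Df(x, v):      # pushforward of f(x) = x^2
    return 2 * x * v

def Dg(x, v):      # pushforward of g(x) = 3x
    return 3 * v

def D_sum(x, v):   # pushforward of (f + g)(x) = x^2 + 3x
    return 2 * x * v + 3 * v

assert D_sum(2.0, 1.0) == Df(2.0, 1.0) + Dg(2.0, 1.0)   # 7 = 4 + 3
```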

### Tangent Semantics

Differential lambda calculus corresponds naturally to tangent semantics.

Each term denotes:

$$
(x,\dot{x}).
$$

Function evaluation propagates both:

- primal computation
- tangent computation

This matches forward-mode AD exactly.

The calculus therefore provides a semantic foundation for dual-number evaluation.

### Reverse Mode Semantics

Forward mode propagates tangent vectors.

Reverse mode propagates cotangent vectors.

Differential lambda calculi can be extended with:

- continuation semantics
- adjoint transformations
- linear continuations

These correspond naturally to reverse-mode AD.

Modern differentiable language compilers often formalize reverse mode through continuation-passing style transformations.

### Continuation-Passing Style

In continuation-passing style (CPS), computations explicitly represent future computation context.

Reverse mode naturally aligns with CPS because gradients propagate backward through future dependencies.

Operationally:

| Forward mode | Reverse mode |
|---|---|
| tangent propagation | continuation propagation |
| local pushforward | global pullback |
| direct evaluation | adjoint accumulation |

Many theoretical reverse-mode formulations therefore use CPS-transformed lambda calculi.
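A minimal sketch of this idea, assuming hand-written CPS primitives (`square_cps`, `sin_cps`, and `grad_h` are illustrative names): each primitive computes its output, hands it to the continuation `k` representing the rest of the program, receives the output sensitivity back, and pulls it through its own local derivative.

```python
import math

# Hedged sketch: reverse mode via continuation-passing style.

def square_cps(x, k):
    y = x * x
    ybar = k(y)            # sensitivity of the final result w.r.t. y
    return 2.0 * x * ybar  # pullback through y = x^2

def sin_cps(x, k):
    y = math.sin(x)
    ybar = k(y)
    return math.cos(x) * ybar

def grad_h(x):
    """Gradient of h(x) = sin(x^2), seeded with output cotangent 1.0."""
    return square_cps(x, lambda y: sin_cps(y, lambda z: 1.0))

assert math.isclose(grad_h(1.5), 2 * 1.5 * math.cos(1.5 ** 2))
```

The backward pass is hidden in the call stack: sensitivities flow back as return values, which is exactly the CPS reading of adjoint accumulation.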

### Closures and Lexical Scope

Functional languages use closures:

$$
\lambda x . f(x,y).
$$

The variable $y$ comes from surrounding context.

Differentiation must preserve closure semantics.

This becomes subtle because gradients must flow through captured environments.

Differential lambda calculus provides formal rules for differentiating closures safely.
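The subtlety can be illustrated on the closure above; `make_closure` and `D_closure` below are illustrative names, and the derivative of $x \cdot y$ is written out by hand with perturbations for both the argument and the captured variable:

```python
# Sketch: gradients must also flow into variables captured by a closure.

def make_closure(y):
    return lambda x: x * y   # y is captured from the environment

g = make_closure(3.0)
assert g(2.0) == 6.0         # primal evaluation

def D_closure(x, y, vx, vy):
    # d(x * y) = y*dx + x*dy; the x*dy part flows through the environment
    return y * vx + x * vy

assert D_closure(2.0, 3.0, 1.0, 0.0) == 3.0   # sensitivity to the argument x
assert D_closure(2.0, 3.0, 0.0, 1.0) == 2.0   # sensitivity to the captured y
```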

### Higher-Order Functions

Functions may accept functions as arguments:

$$
f(g).
$$

Differentiating higher-order programs requires differentiating function-valued objects.

This creates deep semantic challenges:

- differentiating closures
- differentiating control flow
- differentiating recursion
- differentiating laziness

Differential lambda calculus provides a foundation for reasoning about these systems formally.

### Example

Consider:

$$
f=\lambda x . x^2.
$$

Its derivative becomes:

$$
Df(x,v)=2xv.
$$

Operationally:

$$
(x,\dot{x})
\mapsto
(x^2,2x\dot{x}).
$$

This is exactly the forward-mode lifting rule.

The derivative is represented as another lambda term.
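In Python notation, both the derivative and the forward-mode lifting are themselves ordinary lambda terms (`lift_f` is an illustrative name):

```python
# The derivative of f = lambda x: x**2 is just another lambda term.

f  = lambda x: x ** 2
Df = lambda x, v: 2 * x * v                    # Df(x, v) = 2xv

lift_f = lambda x, xdot: (f(x), Df(x, xdot))   # (x, xdot) -> (x^2, 2x*xdot)

assert lift_f(3.0, 1.0) == (9.0, 6.0)
```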

### Typed Differential Calculi

Modern systems often add types.

For example:

$$
f : X \to Y.
$$

Then:

$$
Df : X \times X \to Y.
$$

Types encode derivative structure explicitly.

This enables:

- compile-time verification
- differentiation safety
- optimization
- effect tracking

Differentiable programming languages increasingly rely on typed differentiation systems.

### Differential Categories and Lambda Calculus

Differential lambda calculi connect directly to differential categories.

The categorical derivative operator:

$$
D[f]
$$

corresponds to syntactic differential operators inside lambda terms.

This unifies:

- syntax
- semantics
- geometry
- differentiation

within a single framework.

### Resource Sensitivity

Differentiation is resource-sensitive.

Reverse mode especially requires careful management of:

- intermediate values
- memory retention
- recomputation
- mutation

Linear and differential lambda calculi model these constraints explicitly.

This influences:

- gradient checkpointing
- differentiable compilers
- memory-safe AD systems

### Functional AD Systems

Many modern AD frameworks are heavily influenced by functional semantics.

Examples include:

- entity["software","JAX","Python automatic differentiation library"]
- entity["software","Zygote","Julia automatic differentiation system"]
- entity["software","Dex","Differentiable functional programming language"]

These systems emphasize:

- pure functions
- compositional transformations
- higher-order differentiation
- staged compilation

Functional purity simplifies differentiation semantics dramatically.

### Mutation and Effects

Imperative mutation complicates differentiation.

Example:

```text
x = x + 1
```

The original value disappears.

Reverse mode may still need it later.

Differential lambda calculi typically assume:

- immutable values
- referential transparency
- explicit dataflow

Modern differentiable languages often restrict mutation or track it carefully through effect systems.
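The contrast can be sketched in a functional style; `forward` and `pullback` below are illustrative names, not any library's API:

```python
# Why mutation is hostile to reverse mode: an update like `x = x * x`
# destroys the value the backward pass still needs. A functional sketch
# instead returns the pullback as a closure that retains the original x.

def forward(x):
    y = x * x                              # fresh binding; x survives
    pullback = lambda ybar: 2 * x * ybar   # needs the ORIGINAL x
    return y, pullback

y, pullback = forward(3.0)
assert y == 9.0
assert pullback(1.0) == 6.0   # d(x^2)/dx at x = 3 is 6
```

Immutability makes this retention automatic; with mutation, the AD system must save or recompute the overwritten value explicitly.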

### Differentiable Programming Languages

The long-term goal is languages where differentiation is a native semantic operation.

Instead of external AD libraries:

```python
grad(f)
```

the language itself understands differentiation structurally.

Examples of desired capabilities:

- differentiable control flow
- differentiable recursion
- differentiable data structures
- differentiable higher-order functions
- differentiable probabilistic programs

Differential lambda calculus provides theoretical foundations for such systems.

### Operational Interpretation

Automatic differentiation can therefore be viewed as:

| Perspective | Interpretation |
|---|---|
| numerical | derivative computation |
| algebraic | nilpotent propagation |
| geometric | tangent transport |
| categorical | functorial lifting |
| lambda-calculus | program transformation |

These viewpoints are complementary descriptions of the same underlying phenomenon.

### Summary

Differential lambda calculus extends ordinary lambda calculus with explicit differential structure.

Programs become differentiable objects.

Differentiation becomes:

- syntactic transformation
- semantic lifting
- linear perturbation propagation

The derivative operator acts compositionally across program structure.

This framework provides foundations for:

- differentiable programming languages
- typed AD systems
- higher-order differentiation
- compiler-level AD
- functional differentiable computation

Automatic differentiation is therefore not merely a numerical technique. It can be understood as a structural property of computation itself.

