Dual numbers provide an algebraic mechanism for differentiation, but they also have a precise geometric meaning. A dual number represents a point together with an...
Dual numbers provide an algebraic mechanism for differentiation, but they also have a precise geometric meaning. A dual number represents a point together with an infinitesimal displacement. Forward mode automatic differentiation can therefore be interpreted as the propagation of tangent vectors through a computation.
This geometric viewpoint explains why the chain rule appears naturally, why derivatives are linear maps, and why forward mode computes directional derivatives.
Points and Infinitesimal Motion
Consider a smooth function
An ordinary real number identifies a point on the real line.
A dual number
adds an infinitesimal displacement attached to the point .
Geometrically:
- is the base point
- is an infinitesimal velocity or tangent direction
- marks infinitesimal scale
The dual number therefore describes a first-order motion through space.
Instead of representing only location, it represents location plus local movement.
Tangent Vectors
In differential geometry, a tangent vector describes how a curve moves through a point.
Let
be a smooth curve with
Its tangent vector at is
Now consider the infinitesimal expression
This has exactly the structure of a dual number.
Dual numbers therefore encode tangent vectors directly.
The coefficient of is not merely a derivative value. It is a geometric tangent direction attached to a point.
Tangent Propagation Through Functions
Suppose
and consider a tangent vector
Evaluating gives
Geometrically:
- is the transformed point
- is the transformed tangent vector
The derivative acts by transporting infinitesimal motion.
The map
is the tangent map of at .
Forward mode AD computes this tangent transport automatically.
The Tangent Bundle
For a smooth manifold , the tangent bundle contains:
- all points
- all tangent vectors attached at each point
For Euclidean space:
Each element has two components:
This matches exactly the structure used in forward mode AD:
Dual numbers are therefore a coordinate representation of tangent bundle computation.
Forward mode AD evaluates functions on tangent bundles.
Functions Lift to Tangent Maps
Given
there exists an induced tangent map
The tangent map acts as
This is exactly the operational semantics of forward mode automatic differentiation.
Every variable carries:
- a primal value
- a tangent vector
Each operation updates both simultaneously.
AD therefore computes lifted functions on tangent spaces.
Example: A Curve Through a Function
Consider
Take a curve
Applying :
Expanding:
Near , only the linear term matters:
Thus the tangent vector transforms as
Now compare with dual numbers:
The coefficient of gives the same tangent transformation.
Dual-number arithmetic therefore reproduces first-order geometry exactly.
Directional Derivatives
A tangent vector specifies a direction.
For
the directional derivative at in direction is
Forward mode computes this naturally.
Evaluate:
Then:
The infinitesimal coefficient is the directional derivative.
Thus forward mode computes:
where is the Jacobian matrix.
Geometrically, this measures how the function changes when moving infinitesimally in direction .
Jacobians as Local Linear Maps
Near a point , a smooth function behaves approximately linearly:
The Jacobian is the best local linear approximation.
Dual numbers isolate exactly this linear part.
Because
all nonlinear higher-order terms vanish.
Only the local linear transformation survives.
Forward mode therefore computes the local geometry of the function.
Geometric Meaning of the Chain Rule
Suppose
and
Then
induces tangent propagation:
Thus:
This is the chain rule.
Geometrically, tangent vectors move through successive local linearizations.
Forward mode AD performs exactly this process during execution.
Pushforwards
In geometry, transporting tangent vectors through a map is called the pushforward.
For a smooth map , the pushforward at is
It maps tangent vectors from one space into another.
Automatic differentiation computes pushforwards mechanically.
Each program instruction pushes tangent information forward through the computational graph.
This is why forward mode is sometimes called tangent-linear propagation.
Example in Two Dimensions
Consider:
Take a tangent vector:
The Jacobian is:
Applying the tangent map:
Using dual numbers:
Then:
and
The coefficients match exactly.
Forward mode therefore computes tangent-vector transport through multivariate functions.
Infinitesimal Geometry
Dual numbers provide a first-order approximation to local geometry.
At infinitesimal scale:
- curves become tangent vectors
- smooth maps become linear maps
- nonlinear structure disappears
The nilpotent condition
removes curvature terms and retains only first-order behavior.
This is why dual numbers naturally encode differentiation.
Tangent Spaces as Linearization
At every point , the tangent space provides the best linear approximation to the surrounding geometry.
Automatic differentiation operates entirely within this linearized world.
Forward mode propagates tangent vectors:
Reverse mode propagates cotangent vectors:
These correspond to two dual geometric viewpoints:
| Mode | Geometric Object | Operation |
|---|---|---|
| Forward mode | Tangent vectors | Pushforward |
| Reverse mode | Cotangent vectors | Pullback |
This distinction becomes central in large-scale optimization and deep learning.
Relation to Differential Forms
Reverse mode AD is naturally connected to cotangent geometry.
A gradient is not fundamentally a tangent vector. It is a covector:
It acts on tangent vectors:
Forward mode pushes tangent vectors forward.
Reverse mode pulls covectors backward.
The two modes correspond to dual geometric structures.
Computational Graph Geometry
A computational graph can be viewed geometrically as a composition of local maps.
Each node defines a local transformation:
Forward mode propagates tangent information along graph edges.
At each node:
- Compute the primal output
- Apply the local Jacobian to the incoming tangent
The global derivative emerges from repeated local linear transport.
This interpretation explains why AD scales compositionally.
Geometric Summary
Dual numbers are not merely algebraic tricks. They model infinitesimal geometry.
A dual number
represents:
- a point
- an attached tangent vector
Applying a smooth function transports both:
Forward mode automatic differentiation is therefore:
- tangent bundle evaluation
- infinitesimal motion propagation
- local linear transport through programs
The chain rule becomes geometric composition of tangent maps.