Geometric Interpretation

Dual numbers provide an algebraic mechanism for differentiation, but they also have a precise geometric meaning. A dual number represents a point together with an infinitesimal displacement. Forward mode automatic differentiation can therefore be interpreted as the propagation of tangent vectors through a computation.

This geometric viewpoint explains why the chain rule appears naturally, why derivatives are linear maps, and why forward mode computes directional derivatives.

Points and Infinitesimal Motion

Consider a smooth function

f : \mathbb{R} \to \mathbb{R}.

An ordinary real number x identifies a point on the real line.

A dual number

x + v\varepsilon

adds an infinitesimal displacement v attached to the point x.

Geometrically:

  • x is the base point
  • v is an infinitesimal velocity or tangent direction
  • ε marks the infinitesimal scale

The dual number therefore describes a first-order motion through space.

Instead of representing only location, it represents location plus local movement.
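
In code, this is nothing more than a (value, tangent) pair. A minimal Python sketch (the Dual name and fields are illustrative, not from any particular library):

```python
from dataclasses import dataclass

@dataclass
class Dual:
    """A point on the real line together with an infinitesimal displacement."""
    value: float    # the base point x
    tangent: float  # the infinitesimal velocity v attached at x
```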

Tangent Vectors

In differential geometry, a tangent vector describes how a curve moves through a point.

Let

\gamma(t)

be a smooth curve with

\gamma(0) = x.

Its tangent vector at x is

\gamma'(0).

Now consider the infinitesimal expression

\gamma(0) + \gamma'(0)\varepsilon.

This has exactly the structure of a dual number.

Dual numbers therefore encode tangent vectors directly.

The coefficient of ε is not merely a derivative value. It is a geometric tangent direction attached to a point.
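
As a small illustration (reusing the Dual pair sketched above; the curve is arbitrary), a curve's position and velocity at t = 0 can be packaged into a single dual number, with the velocity estimated by finite differences:

```python
import math

def gamma(t: float) -> float:
    """A smooth curve with gamma(0) = 1 and gamma'(0) = 1."""
    return math.exp(t)

# Finite-difference estimate of the tangent gamma'(0).
h = 1e-6
velocity = (gamma(h) - gamma(-h)) / (2 * h)

# The dual number gamma(0) + gamma'(0)*eps as a (value, tangent) pair.
curve_at_zero = Dual(gamma(0.0), velocity)
print(curve_at_zero)  # Dual(value=1.0, tangent=1.0...) up to rounding
```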

Tangent Propagation Through Functions

Suppose

f : \mathbb{R} \to \mathbb{R}

and consider a tangent vector

x + v\varepsilon.

Evaluating f gives

f(x + v\varepsilon) = f(x) + f'(x)v\varepsilon.

Geometrically:

  • f(x) is the transformed point
  • f'(x)v is the transformed tangent vector

The derivative acts by transporting infinitesimal motion.

The map

v \mapsto f'(x)v

is the tangent map of f at x.

Forward mode AD computes this tangent transport automatically.
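
To make the transport executable, the Dual pair above can be extended with arithmetic in which every primitive updates the value by f and the tangent by f'(x)v. This is a minimal sketch, not a library API:

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    value: float
    tangent: float

    def _lift(self, other):
        # Ordinary numbers are points carrying a zero tangent.
        return other if isinstance(other, Dual) else Dual(other, 0.0)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.value + other.value, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        # (x + v*eps)(y + w*eps) = xy + (xw + yv)*eps, since eps^2 = 0.
        other = self._lift(other)
        return Dual(self.value * other.value,
                    self.value * other.tangent + self.tangent * other.value)

    __rmul__ = __mul__

def sin(d: Dual) -> Dual:
    # sin(x + v*eps) = sin x + (cos x)*v*eps: transport by the local derivative.
    return Dual(math.sin(d.value), math.cos(d.value) * d.tangent)
```

Every primitive follows the same pattern: compute the new point, then apply the local derivative to the incoming tangent.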

The Tangent Bundle

For a smooth manifold M, the tangent bundle TM contains:

  • all points x ∈ M
  • all tangent vectors attached at each point

For Euclidean space:

T\mathbb{R}^n = \mathbb{R}^n \times \mathbb{R}^n.

Each element has two components:

(x, v).

This matches exactly the structure used in forward mode AD:

(\text{value}, \text{tangent}).

Dual numbers are therefore a coordinate representation of tangent bundle computation.

Forward mode AD evaluates functions on tangent bundles.

Functions Lift to Tangent Maps

Given

f : \mathbb{R}^n \to \mathbb{R}^m,

there exists an induced tangent map

Tf : T\mathbb{R}^n \to T\mathbb{R}^m.

The tangent map acts as

Tf(x, v) = (f(x), Df_x(v)).

This is exactly the operational semantics of forward mode automatic differentiation.

Every variable carries:

  • a primal value
  • a tangent vector

Each operation updates both simultaneously.

AD therefore computes lifted functions on tangent spaces.
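
With the Dual sketch above in scope, the lifted map can be written directly; tangent_map is a hypothetical helper name:

```python
def tangent_map(f, x: float, v: float) -> tuple[float, float]:
    """Tf(x, v) = (f(x), Df_x(v)), computed by evaluating f on a dual number."""
    out = f(Dual(x, v))  # assumes the Dual class sketched earlier
    return out.value, out.tangent

# Lifting f(x) = x*x + 3*x at x = 2 with v = 1: f(2) = 10, f'(2) = 7.
primal, tangent = tangent_map(lambda x: x * x + 3 * x, 2.0, 1.0)
print(primal, tangent)  # 10.0 7.0
```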

Example: A Curve Through a Function

Consider

f(x) = x^2.

Take a curve

\gamma(t) = x + tv.

Applying f:

f(\gamma(t)) = (x + tv)^2.

Expanding:

= x^2 + 2xvt + v^2 t^2.

Near t = 0, only the linear term matters:

f(\gamma(t)) \approx x^2 + 2xvt.

Thus the tangent vector transforms as

v \mapsto 2xv.

Now compare with dual numbers:

f(x + v\varepsilon) = x^2 + 2xv\varepsilon.

The coefficient of ε gives the same tangent transformation.

Dual-number arithmetic therefore reproduces first-order geometry exactly.
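
Both computations can be checked numerically (a sketch reusing the Dual class above, with illustrative values):

```python
x, v = 3.0, 0.5

# Dual-number evaluation of f(x) = x^2: the tangent should be 2*x*v = 3.0.
out = Dual(x, v) * Dual(x, v)
print(out.tangent)  # 3.0

# Curve-based check: the slope of f(gamma(t)) = (x + t*v)^2 at t = 0 is 2*x*v.
h = 1e-6
slope = ((x + h * v) ** 2 - (x - h * v) ** 2) / (2 * h)
print(slope)        # ~3.0
```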

Directional Derivatives

A tangent vector specifies a direction.

For

f : \mathbb{R}^n \to \mathbb{R},

the directional derivative at x in direction v is

Df_x(v).

Forward mode computes this naturally.

Evaluate:

x + v\varepsilon.

Then:

f(x + v\varepsilon) = f(x) + Df_x(v)\varepsilon.

The infinitesimal coefficient is the directional derivative.

Thus forward mode computes:

Jv,

where J is the Jacobian matrix.

Geometrically, this measures how the function changes when moving infinitesimally in direction v.
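
Concretely, seeding each input with the corresponding component of v yields Df_x(v) in a single pass. A sketch with the Dual class and sin helper above (the function and direction are illustrative):

```python
import math

def f(x, y):
    # f(x, y) = x*y + sin(x), written with Dual-aware operations.
    return x * y + sin(x)

x, y = 1.0, 2.0
vx, vy = 1.0, -1.0  # the direction v

out = f(Dual(x, vx), Dual(y, vy))

# Analytic check: Df_x(v) = (y + cos x)*v_x + x*v_y.
print(out.tangent, (y + math.cos(x)) * vx + x * vy)
```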

Jacobians as Local Linear Maps

Near a point x, a smooth function behaves approximately linearly:

f(x+h) \approx f(x) + J_x h.

The Jacobian is the best local linear approximation.

Dual numbers isolate exactly this linear part.

Because

\varepsilon^2 = 0,

all nonlinear higher-order terms vanish.

Only the local linear transformation survives.

Forward mode therefore computes the local geometry of the function.
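
The nilpotency is visible directly in the Dual arithmetic sketched earlier:

```python
eps = Dual(0.0, 1.0)  # a pure infinitesimal
print(eps * eps)      # Dual(value=0.0, tangent=0.0): eps^2 vanishes

# Consequence: in (x + eps)^3 the quadratic and cubic terms in eps drop out,
# leaving only the linear coefficient 3*x^2 in the tangent slot.
x = Dual(2.0, 1.0)
print(x * x * x)      # Dual(value=8.0, tangent=12.0), and 3*2^2 = 12
```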

Geometric Meaning of the Chain Rule

Suppose

f : \mathbb{R}^n \to \mathbb{R}^m

and

g : \mathbb{R}^m \to \mathbb{R}^k.

Then

g \circ f

induces tangent propagation:

(x, v) \mapsto (f(x), Df_x(v)) \mapsto (g(f(x)), Dg_{f(x)}(Df_x(v))).

Thus:

D(g \circ f)_x(v) = Dg_{f(x)}(Df_x(v)).

This is the chain rule.

Geometrically, tangent vectors move through successive local linearizations.

Forward mode AD performs exactly this process during execution.
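
The same composition, sketched with the Dual class and sin helper above:

```python
import math

def f(x):
    return x * x   # f'(x) = 2x

def g(y):
    return sin(y)  # g'(y) = cos y (the Dual-aware sin from earlier)

x, v = 1.5, 1.0
out = g(f(Dual(x, v)))  # the tangent flows through both local linearizations

# Chain rule by hand: (g o f)'(x) = cos(x^2) * 2x.
print(out.tangent, math.cos(x * x) * 2 * x)
```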

Pushforwards

In geometry, transporting tangent vectors through a map is called the pushforward.

For a smooth map f : M → N between manifolds, the pushforward at x is

f_* : T_x M \to T_{f(x)} N.

It maps tangent vectors from one space into another.

Automatic differentiation computes pushforwards mechanically.

Each program instruction pushes tangent information forward through the computational graph.

This is why forward mode is sometimes called tangent-linear propagation.

Example in Two Dimensions

Consider:

f(x, y) = (xy, \sin x).

Take a tangent vector:

(x, y, v_x, v_y).

The Jacobian is:

J = \begin{bmatrix} y & x \\ \cos x & 0 \end{bmatrix}.

Applying the tangent map:

J \begin{bmatrix} v_x \\ v_y \end{bmatrix} = \begin{bmatrix} y v_x + x v_y \\ (\cos x) v_x \end{bmatrix}.

Using dual numbers:

x \mapsto x + v_x\varepsilon, \quad y \mapsto y + v_y\varepsilon.

Then:

xy = xy + (y v_x + x v_y)\varepsilon

and

\sin x = \sin x + (\cos x) v_x \varepsilon.

The coefficients match exactly.

Forward mode therefore computes tangent-vector transport through multivariate functions.
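
A numerical check (reusing the Dual sketch above, with illustrative inputs): seed x and y with v_x and v_y, then compare the output tangents against the Jacobian-vector product computed by hand.

```python
import math

x, y = 1.2, 0.7
vx, vy = 1.0, 2.0

dx, dy = Dual(x, vx), Dual(y, vy)
out1 = dx * dy  # first component,  x*y
out2 = sin(dx)  # second component, sin(x)

print(out1.tangent, y * vx + x * vy)   # both print y*v_x + x*v_y
print(out2.tangent, math.cos(x) * vx)  # both print (cos x)*v_x
```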

Infinitesimal Geometry

Dual numbers provide a first-order approximation to local geometry.

At infinitesimal scale:

  • curves become tangent vectors
  • smooth maps become linear maps
  • nonlinear structure disappears

The nilpotent condition

\varepsilon^2 = 0

removes curvature terms and retains only first-order behavior.

This is why dual numbers naturally encode differentiation.

Tangent Spaces as Linearization

At every point x, the tangent space provides the best linear approximation to the surrounding geometry.

Automatic differentiation operates entirely within this linearized world.

Forward mode propagates tangent vectors:

v \mapsto Jv.

Reverse mode propagates cotangent vectors:

u \mapsto J^T u.

These correspond to two dual geometric viewpoints:

Mode           Geometric Object     Operation
Forward mode   Tangent vectors      Pushforward
Reverse mode   Cotangent vectors    Pullback

This distinction becomes central in large-scale optimization and deep learning.
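
For the two-dimensional example above, both products can be written out explicitly (a plain-Python sketch with illustrative seed vectors):

```python
import math

x, y = 1.2, 0.7
J = [[y, x],
     [math.cos(x), 0.0]]  # Jacobian of f(x, y) = (xy, sin x)

v = [1.0, 2.0]   # tangent vector: forward-mode seed
u = [1.0, -1.0]  # cotangent vector: reverse-mode seed

Jv  = [sum(J[i][k] * v[k] for k in range(2)) for i in range(2)]  # pushforward
JTu = [sum(J[k][i] * u[k] for k in range(2)) for i in range(2)]  # pullback

print(Jv)   # one forward pass combines Jacobian columns
print(JTu)  # one reverse pass combines Jacobian rows
```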

Relation to Differential Forms

Reverse mode AD is naturally connected to cotangent geometry.

A gradient is not fundamentally a tangent vector. It is a covector:

df_x.

It acts on tangent vectors:

df_x(v) = Df_x(v).

Forward mode pushes tangent vectors forward.

Reverse mode pulls covectors backward.

The two modes correspond to dual geometric structures.

Computational Graph Geometry

A computational graph can be viewed geometrically as a composition of local maps.

Each node defines a local transformation:

x \to y.

Forward mode propagates tangent information along graph edges.

At each node:

  1. Compute the primal output
  2. Apply the local Jacobian to the incoming tangent

The global derivative emerges from repeated local linear transport.

This interpretation explains why AD scales compositionally.
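
Unrolled node by node, the two steps above look like this for f(x) = sin(x) * x (a hand-written sketch; forward-mode AD automates exactly these assignments):

```python
import math

# Forward propagation through the graph of f(x) = sin(x) * x at x = 2, seed v = 1.
x, dx = 2.0, 1.0

# Node 1: s = sin(x). Primal output, then the local Jacobian cos(x) applied to dx.
s, ds = math.sin(x), math.cos(x) * dx

# Node 2: out = s * x. The local Jacobian [x, s] applied to the incoming (ds, dx).
out, dout = s * x, x * ds + s * dx

print(out, dout)  # f(2) and f'(2) = sin(2) + 2*cos(2)
```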

Geometric Summary

Dual numbers are not merely algebraic tricks. They model infinitesimal geometry.

A dual number

x + v\varepsilon

represents:

  • a point x
  • an attached tangent vector v

Applying a smooth function transports both:

f(x + v\varepsilon) = f(x) + Df_x(v)\varepsilon.

Forward mode automatic differentiation is therefore:

  • tangent bundle evaluation
  • infinitesimal motion propagation
  • local linear transport through programs

The chain rule becomes geometric composition of tangent maps.