Tangent Propagation

Forward mode automatic differentiation computes derivatives by propagating tangent values alongside ordinary values. The ordinary value is called the primal. The derivative value is called the tangent.

For every program variable $v$, forward mode tracks the pair:

$$(v, \dot v)$$

The dot notation means "the change in $v$ induced by a chosen change in the input."

If the input changes in direction $\dot x$, then every later value changes according to the chain rule.

Directional Derivatives

Let:

$$f:\mathbb{R}^n\to\mathbb{R}^m$$

At input $x$, choose a direction:

$$\dot x\in\mathbb{R}^n$$

Forward mode computes:

$$\dot y = J_f(x)\,\dot x$$

where $J_f(x)$ is the Jacobian.

Thus forward mode computes a directional derivative. It tells us how the output changes when the input moves infinitesimally in direction $\dot x$.
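Before building forward mode itself, a finite difference gives a concrete feel for what $\dot y = J_f(x)\,\dot x$ means. This is a numerical approximation, not automatic differentiation; `f`, the input point, and the direction below are made-up examples.

```python
def f(x1, x2):
    # A made-up example function from R^2 to R^2.
    return (x1 * x2, x1 + x2 * x2)

def directional_diff(f, x, xdot, eps=1e-6):
    """Approximate the directional derivative J_f(x) @ xdot
    with a central finite difference."""
    plus = f(*(xi + eps * di for xi, di in zip(x, xdot)))
    minus = f(*(xi - eps * di for xi, di in zip(x, xdot)))
    return tuple((p - m) / (2 * eps) for p, m in zip(plus, minus))

# At x = (2, 3) the Jacobian is [[x2, x1], [1, 2*x2]] = [[3, 2], [1, 6]],
# so the directional derivative in direction (1, 0) is its first column.
print(directional_diff(f, (2.0, 3.0), (1.0, 0.0)))  # approximately (3.0, 1.0)
```

Forward mode computes the same quantity exactly, without the step size `eps` or its roundoff error.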

Primal and Tangent Execution

Consider:

f(x)=x2+3x f(x)=x^2+3x

A program evaluates:

v1 = x * x
v2 = 3 * x
v3 = v1 + v2
return v3

Forward mode evaluates a paired program:

v1     = x * x
dot_v1 = x * dot_x + x * dot_x

v2     = 3 * x
dot_v2 = 3 * dot_x

v3     = v1 + v2
dot_v3 = dot_v1 + dot_v2

return v3, dot_v3

If $\dot x=1$, then:

$$\dot v_3=2x+3$$

which is the ordinary derivative.
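The paired program above runs as ordinary Python. A minimal sketch (the function name `f_forward` is ours):

```python
def f_forward(x, dot_x):
    # Primal and tangent evaluated side by side for f(x) = x**2 + 3*x.
    v1 = x * x
    dot_v1 = x * dot_x + x * dot_x   # product rule; both factors are x

    v2 = 3 * x
    dot_v2 = 3 * dot_x               # constant multiple

    v3 = v1 + v2
    dot_v3 = dot_v1 + dot_v2         # sum rule
    return v3, dot_v3

# Seeding dot_x = 1 yields the ordinary derivative 2*x + 3.
print(f_forward(4.0, 1.0))  # (28.0, 11.0)
```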

Local Tangent Rules

Each primitive operation has a tangent rule.

| Operation | Primal | Tangent |
| --- | --- | --- |
| addition | $z=x+y$ | $\dot z=\dot x+\dot y$ |
| subtraction | $z=x-y$ | $\dot z=\dot x-\dot y$ |
| multiplication | $z=xy$ | $\dot z=y\dot x+x\dot y$ |
| division | $z=x/y$ | $\dot z=(y\dot x-x\dot y)/y^2$ |
| sine | $z=\sin x$ | $\dot z=\cos(x)\,\dot x$ |
| exponential | $z=e^x$ | $\dot z=e^x\,\dot x$ |
| logarithm | $z=\log x$ | $\dot z=\dot x/x$ |

These rules are applied in the same order as the program.
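These rules can be packaged once and reused by overloading arithmetic on a (primal, tangent) pair, often called a dual number. A minimal sketch (the class name `Dual` and the helper functions are ours):

```python
import math

class Dual:
    """A (primal, tangent) pair carrying the local tangent rules."""
    def __init__(self, v, dot=0.0):
        self.v, self.dot = v, dot

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v + o.v, self.dot + o.dot)          # sum rule
    __radd__ = __add__

    def __sub__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v - o.v, self.dot - o.dot)          # difference rule

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v * o.v, o.v * self.dot + self.v * o.dot)  # product rule
    __rmul__ = __mul__

    def __truediv__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.v / o.v,
                    (o.v * self.dot - self.v * o.dot) / o.v**2)     # quotient rule

def sin(x):  # tangent rule for sine
    return Dual(math.sin(x.v), math.cos(x.v) * x.dot)

def exp(x):  # tangent rule for the exponential
    return Dual(math.exp(x.v), math.exp(x.v) * x.dot)

def log(x):  # tangent rule for the logarithm
    return Dual(math.log(x.v), x.dot / x.v)

# f(x) = x**2 + 3*x, seeded with tangent 1
x = Dual(4.0, 1.0)
y = x * x + 3 * x
print(y.v, y.dot)  # 28.0 11.0
```

Writing `x * x + 3 * x` then applies the table's rules in exactly the order the primal program runs.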

Tangent Seeding

The initial tangent determines the derivative being computed.

For one scalar input:

$$\dot x=1$$

computes:

$$\frac{df}{dx}$$

For vector input $x=(x_1,\dots,x_n)$, choosing:

$$\dot x=e_i$$

computes the $i$-th column of the Jacobian.

Example:

$$f(x_1,x_2)=\begin{bmatrix} x_1 x_2 \\ \sin(x_1)+x_2 \end{bmatrix}$$

Seed:

$$\dot x=\begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

Then forward mode computes the first Jacobian column:

$$\dot y=\begin{bmatrix} x_2 \\ \cos(x_1) \end{bmatrix}$$

Seed:

$$\dot x=\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

Then it computes the second column:

$$\dot y=\begin{bmatrix} x_1 \\ 1 \end{bmatrix}$$
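Both seedings can be run through one hand-written tangent program. A sketch for this example (the function name `f_jvp` is ours):

```python
import math

def f_jvp(x1, x2, d1, d2):
    """Forward mode for f(x1, x2) = (x1*x2, sin(x1) + x2), seeded with (d1, d2)."""
    dy1 = x2 * d1 + x1 * d2          # multiplication rule
    dy2 = math.cos(x1) * d1 + d2     # sine rule plus sum rule
    return (dy1, dy2)

x1, x2 = 2.0, 5.0
col1 = f_jvp(x1, x2, 1.0, 0.0)   # first Jacobian column: (x2, cos(x1))
col2 = f_jvp(x1, x2, 0.0, 1.0)   # second Jacobian column: (x1, 1)
print(col1, col2)
```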

Tangents Through Control Flow

Forward mode follows the same control flow as the primal program.

Example:

if x > 0:
    y = x * x
else:
    y = -x

If $x>0$:

$$\dot y=2x\,\dot x$$

If $x<0$:

$$\dot y=-\dot x$$

At $x=0$, the function has a corner. Forward mode returns the tangent of the executed branch. It does not infer a symbolic piecewise derivative.

This is important for programs with:

  • thresholds
  • comparisons
  • clipping
  • indexing
  • data-dependent branches
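The branch example above becomes a tangent program simply by differentiating each branch body in place (the function name `branch_forward` is ours):

```python
def branch_forward(x, dot_x):
    # The tangent follows whichever branch the primal value selects.
    if x > 0:
        y = x * x
        dot_y = 2 * x * dot_x   # tangent of the x*x branch
    else:
        y = -x
        dot_y = -dot_x          # tangent of the -x branch
    return y, dot_y

print(branch_forward(3.0, 1.0))   # (9.0, 6.0): positive branch
print(branch_forward(-2.0, 1.0))  # (2.0, -1.0): negative branch
```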

Tangents Through Loops

Forward mode handles loops directly because tangent propagation follows primal execution.

Example:

y = x
for i in range(n):
    y = y * y

The tangent program is:

y = x
dot_y = dot_x

for i in range(n):
    old_y = y
    old_dot_y = dot_y

    y = old_y * old_y
    dot_y = old_y * old_dot_y + old_y * old_dot_y

Each iteration propagates the tangent through the loop body.

For fixed $n$, this computes the derivative of the function represented by the loop.
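As a runnable check of the loop above (the function name `loop_forward` is ours): with `n = 2` the loop computes $y = x^4$, whose derivative is $4x^3$.

```python
def loop_forward(x, dot_x, n):
    """Forward mode through the loop that squares y on each iteration."""
    y, dot_y = x, dot_x
    for _ in range(n):
        old_y, old_dot_y = y, dot_y
        y = old_y * old_y
        dot_y = old_y * old_dot_y + old_y * old_dot_y  # product rule
    return y, dot_y

# x = 2, n = 2: y = 2**4 = 16 and dy/dx = 4 * 2**3 = 32.
print(loop_forward(2.0, 1.0, 2))  # (16.0, 32.0)
```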

Tangents for Structured Values

Real programs use arrays, tuples, structs, and nested containers.

Forward mode assigns tangents only to differentiable components.

Example:

State {
    position: Float
    velocity: Float
    step: Int
}

A tangent state may contain:

StateTangent {
    position: Float
    velocity: Float
}

The integer step has no tangent because discrete values do not vary infinitesimally.

This distinction becomes important in differentiable programming languages and compiler IRs.
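The `State` / `StateTangent` split can be sketched with dataclasses; the `advance` function below is a hypothetical Euler step, not from the original text:

```python
from dataclasses import dataclass

@dataclass
class State:
    position: float
    velocity: float
    step: int                # discrete: carries no tangent

@dataclass
class StateTangent:
    position: float          # only the differentiable fields appear
    velocity: float

def advance(s, t, dt):
    """One hypothetical Euler step, with its tangent propagated alongside."""
    new_s = State(s.position + dt * s.velocity, s.velocity, s.step + 1)
    new_t = StateTangent(t.position + dt * t.velocity, t.velocity)
    return new_s, new_t
```

Note that `step` advances in the primal but has no counterpart in the tangent: a tangent type keeps exactly the fields that vary infinitesimally.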

Multiple Tangent Directions

Forward mode can carry several tangent directions at once.

Instead of:

$$(v,\dot v)$$

use:

$$(v,\dot v_1,\dot v_2,\dots,\dot v_k)$$

This computes:

$$J_f(x)\,S$$

where $S$ is a matrix of seed directions.

Multi-direction forward mode is useful for:

  • computing several Jacobian columns
  • exploiting SIMD and vector hardware
  • reducing interpreter overhead
  • sparse Jacobian recovery

The cost grows roughly linearly with the number of tangent directions, but batching can improve constants.
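A sketch of multi-direction forward mode for the earlier two-output example, carrying a list of seed directions through one primal pass (the function name `f_multi_jvp` is ours):

```python
import math

def f_multi_jvp(x1, x2, seeds):
    """One primal pass carrying k tangent directions for
    f(x1, x2) = (x1*x2, sin(x1) + x2)."""
    y1 = x1 * x2
    y2 = math.sin(x1) + x2
    # Each seed (d1, d2) is pushed through the same tangent rules.
    tangents = [(x2 * d1 + x1 * d2, math.cos(x1) * d1 + d2)
                for (d1, d2) in seeds]
    return (y1, y2), tangents

# Seeding with the identity basis recovers the full Jacobian, column by column.
_, cols = f_multi_jvp(2.0, 5.0, [(1.0, 0.0), (0.0, 1.0)])
print(cols)  # [(x2, cos(x1)), (x1, 1)] at x = (2, 5)
```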

Memory Behavior

Forward mode has simple memory behavior.

At each point, it needs only:

  • current primal values
  • current tangent values

It does not need to store a full tape for a later backward pass.

Thus forward mode is attractive for:

  • streaming computations
  • long simulations
  • online sensitivity analysis
  • low-memory systems

Its weakness is dimensional scaling. A full gradient of a scalar function with $n$ inputs requires $n$ forward passes.

Tangent Propagation as Pushforward

In differential geometry, tangent propagation is a pushforward.

A function:

$$f:X\to Y$$

maps points in $X$ to points in $Y$.

Its derivative maps tangent vectors at $x$ to tangent vectors at $f(x)$:

$$Df_x:T_xX\to T_{f(x)}Y$$

Forward mode computes this map operationally.

In coordinates, this is exactly:

$$\dot y=J_f(x)\,\dot x$$

This geometric view clarifies why forward mode moves in the same direction as computation.

Summary

Tangent propagation is the core mechanism of forward mode.

It evaluates each variable as:

$$(v,\dot v)$$

and applies local tangent rules in program order.

Forward mode computes Jacobian-vector products, supports ordinary control flow naturally, and uses little memory. Its cost is proportional to the number of tangent directions being propagated.