Forward Accumulation

Forward accumulation is the forward-mode form of automatic differentiation. It propagates derivative information in the same order as ordinary program evaluation. Each intermediate value carries two pieces of information:

| Component | Meaning |
|---|---|
| Primal | the ordinary value computed by the program |
| Tangent | the derivative of that value with respect to a chosen input direction |

If a program computes:

$$y = f(x)$$

then forward accumulation computes both the primal output $y$ and the tangent

$$\dot y = J_f(x)\,\dot x$$

where:

  • $J_f(x)$ is the Jacobian of $f$ at $x$
  • $\dot x$ is the seed tangent
  • $\dot y$ is the propagated output tangent

Forward mode therefore computes a Jacobian-vector product.

The Tangent Pair

Forward mode evaluates every variable as a pair:

$$(v, \dot v)$$

The first component is the primal value. The second component is the tangent value.

For an input $x$, we initialize:

$$(x, \dot x)$$

The choice of $\dot x$ determines which derivative direction we compute.

For a scalar input, set:

$$\dot x = 1$$

Then the output tangent equals the ordinary derivative:

$$\dot y = \frac{dy}{dx}$$

For a vector input $x \in \mathbb{R}^n$, set:

$$\dot x = e_i$$

where $e_i$ is the $i$-th standard basis vector. Then the output tangent gives the $i$-th column of the Jacobian.

Simple Scalar Example

Consider:

$$f(x) = x^2 + 3x$$

Introduce intermediate variables:

$$v_1 = x^2 \qquad v_2 = 3x \qquad v_3 = v_1 + v_2$$

Forward accumulation evaluates primal and tangent values together.

| Variable | Primal | Tangent |
|---|---|---|
| $x$ | $x$ | $1$ |
| $v_1 = x^2$ | $x^2$ | $2x$ |
| $v_2 = 3x$ | $3x$ | $3$ |
| $v_3 = v_1 + v_2$ | $x^2 + 3x$ | $2x + 3$ |

At the end:

$$f'(x) = 2x + 3$$

The derivative appears as a byproduct of ordinary evaluation.
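The trace above can be reproduced with a short hand-written tangent-pair evaluation in Python (a sketch; the function name is illustrative):

```python
def f_forward(x, dot_x):
    # Each step computes a (primal, tangent) pair.
    v1, dot_v1 = x * x, 2 * x * dot_x        # v1 = x^2
    v2, dot_v2 = 3 * x, 3 * dot_x            # v2 = 3x
    v3, dot_v3 = v1 + v2, dot_v1 + dot_v2    # v3 = v1 + v2
    return v3, dot_v3

# Seed dot_x = 1 so the tangent equals f'(x) = 2x + 3.
y, dy = f_forward(2.0, 1.0)   # y = 10.0, dy = 7.0
```

Seeding with `dot_x = 1.0` makes the second return value the ordinary derivative, exactly as in the table.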

Operational Rules

Every primitive operation is lifted from values to tangent pairs.

Addition

$$z = x + y$$

Forward rule:

$$(z, \dot z) = (x + y,\ \dot x + \dot y)$$

Multiplication

$$z = xy$$

Forward rule:

$$(z, \dot z) = (xy,\ y\dot x + x\dot y)$$

Sine

$$z = \sin(x)$$

Forward rule:

$$(z, \dot z) = (\sin(x),\ \cos(x)\,\dot x)$$

Exponential

$$z = e^x$$

Forward rule:

$$(z, \dot z) = (e^x,\ e^x\,\dot x)$$

Each rule is local. The AD system only needs the derivative rule for the current primitive.
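These rules translate directly into code. A minimal sketch in Python, representing each variable as a `(primal, tangent)` tuple (all names are illustrative):

```python
import math

# Each primitive is lifted from values to (primal, tangent) pairs.
def add(a, b):
    (x, dx), (y, dy) = a, b
    return (x + y, dx + dy)

def mul(a, b):
    (x, dx), (y, dy) = a, b
    return (x * y, y * dx + x * dy)   # product rule

def sin(a):
    (x, dx) = a
    return (math.sin(x), math.cos(x) * dx)

def exp(a):
    (x, dx) = a
    return (math.exp(x), math.exp(x) * dx)

# Compose primitives: f(x) = e^x * sin(x), seeded with dot_x = 1.
x = (1.0, 1.0)
y, dy = mul(exp(x), sin(x))
# dy matches f'(x) = e^x (sin x + cos x)
```

Each lifted primitive only knows its own local derivative rule; the composition emerges from ordinary function application.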

Forward Mode as Program Transformation

Forward accumulation can be viewed as a program transformation.

Original program:

v1 = x * x
v2 = 3 * x
v3 = v1 + v2
return v3

Forward-transformed program:

v1      = x * x
dot_v1  = x * dot_x + x * dot_x

v2      = 3 * x
dot_v2  = 3 * dot_x

v3      = v1 + v2
dot_v3  = dot_v1 + dot_v2

return v3, dot_v3

This transformation preserves the original computation while adding tangent computation beside it.

The transformed program has the same control flow as the original program. Branches, loops, and function calls execute normally, but each differentiable value carries a tangent.
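Because control flow is preserved, loops need no special handling: the tangent update simply rides along with each iteration. A sketch (assumed names) that differentiates $x^n$ by repeated multiplication:

```python
def pow_forward(x, dot_x, n):
    # The loop runs exactly as in the primal program; each iteration
    # also updates the tangent via the product rule: d(v*x) = x dv + v dx.
    v, dot_v = 1.0, 0.0
    for _ in range(n):
        v, dot_v = v * x, x * dot_v + v * dot_x
    return v, dot_v

pow_forward(2.0, 1.0, 3)   # (8.0, 12.0): x^3 and 3x^2 at x = 2
```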

Jacobian-Vector Products

For a function:

$$f:\mathbb{R}^n \to \mathbb{R}^m$$

the derivative at a point is the Jacobian:

$$J_f(x) \in \mathbb{R}^{m \times n}$$

Forward mode computes:

$$J_f(x)\,\dot x$$

This is a directional derivative.

If:

$$\dot x = e_i$$

then:

$$J_f(x)\,e_i$$

returns the $i$-th column of the Jacobian.

Thus, computing the full Jacobian by forward mode requires $n$ passes, one for each input dimension.
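The column-by-column construction can be sketched in Python. Here `f_forward` is a hypothetical tangent-pair version of $f(x_1, x_2) = (x_1 x_2,\ x_1 + x_2)$:

```python
def f_forward(x, dot_x):
    # f(x1, x2) = (x1*x2, x1 + x2), written with tangent pairs.
    x1, x2 = x
    d1, d2 = dot_x
    y1, dy1 = x1 * x2, x2 * d1 + x1 * d2
    y2, dy2 = x1 + x2, d1 + d2
    return [y1, y2], [dy1, dy2]

def jacobian(f_fwd, x):
    n = len(x)
    cols = []
    for i in range(n):
        e_i = [1.0 if j == i else 0.0 for j in range(n)]  # seed basis vector
        _, col = f_fwd(x, e_i)   # one forward pass per input dimension
        cols.append(col)
    # cols[i] is the i-th Jacobian column; transpose into rows
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]

jacobian(f_forward, [2.0, 3.0])   # [[3.0, 2.0], [1.0, 1.0]]
```

At $(x_1, x_2) = (2, 3)$ the Jacobian is $\begin{pmatrix} x_2 & x_1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 3 & 2 \\ 1 & 1 \end{pmatrix}$, which the two seeded passes recover column by column.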

Cost Model

Let $C(f)$ be the cost of evaluating $f$.

One forward-mode pass usually costs a small constant multiple of $C(f)$.

For one seed direction:

$$C_{\text{forward}} \approx k\, C(f)$$

where $k$ is often between 2 and 5, depending on the primitive set and implementation.

For a full Jacobian of:

$$f:\mathbb{R}^n \to \mathbb{R}^m$$

forward mode requires $n$ seeded evaluations:

$$C_{\text{Jacobian}} \approx n k\, C(f)$$

Forward mode is therefore well suited when:

  • the number of inputs is small
  • the number of outputs is large
  • directional derivatives are sufficient
  • Jacobian-vector products are needed directly

Multi-Seed Forward Mode

Instead of carrying one tangent direction, a system may carry several tangents at once.

Each variable becomes:

$$(v, \dot v_1, \dot v_2, \dots, \dot v_p)$$

where $p$ is the number of seed directions.

This computes:

$$J_f(x)\,S$$

where:

$$S \in \mathbb{R}^{n \times p}$$

contains multiple seed vectors.

Multi-seed forward mode amortizes overhead across directions. It is useful when computing several Jacobian columns together, especially on vector hardware.
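A sketch of multi-seed propagation for $f(x, y) = x^2 y$, carrying both seed directions through a single pass (names are illustrative):

```python
def f_multiseed(x, y, seeds_x, seeds_y):
    # f(x, y) = x^2 * y; each variable carries p tangents at once,
    # where (seeds_x[k], seeds_y[k]) is the k-th seed direction.
    v1 = x * x
    d1 = [2 * x * sx for sx in seeds_x]
    v2 = v1 * y
    d2 = [y * t1 + v1 * sy for t1, sy in zip(d1, seeds_y)]
    return v2, d2

# Seed with the 2x2 identity: one pass yields both partial derivatives.
val, grads = f_multiseed(2.0, 3.0, [1.0, 0.0], [0.0, 1.0])
# val = 12.0, grads = [12.0, 4.0]  (2xy and x^2 at (2, 3))
```

The primal work is done once and shared across all $p$ directions, which is the source of the amortization.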

Example: Two Inputs

Let:

$$f(x,y) = xy + \sin(x)$$

Intermediate variables:

$$v_1 = xy \qquad v_2 = \sin(x) \qquad v_3 = v_1 + v_2$$

To compute the derivative with respect to $x$, seed:

$$\dot x = 1, \quad \dot y = 0$$

Propagation:

$$\dot v_1 = y\dot x + x\dot y = y \qquad \dot v_2 = \cos(x)\,\dot x = \cos(x) \qquad \dot v_3 = y + \cos(x)$$

Therefore:

$$\frac{\partial f}{\partial x} = y + \cos(x)$$

To compute the derivative with respect to $y$, seed:

$$\dot x = 0, \quad \dot y = 1$$

Propagation:

$$\dot v_1 = x \qquad \dot v_2 = 0 \qquad \dot v_3 = x$$

Therefore:

$$\frac{\partial f}{\partial y} = x$$
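The two seeded passes can be checked numerically with a hand-written tangent-pair version of the same function (a sketch):

```python
import math

def f_forward(x, y, dot_x, dot_y):
    # f(x, y) = x*y + sin(x), with tangent pairs mirroring the derivation.
    v1, d1 = x * y, y * dot_x + x * dot_y
    v2, d2 = math.sin(x), math.cos(x) * dot_x
    return v1 + v2, d1 + d2

x, y = 2.0, 3.0
_, df_dx = f_forward(x, y, 1.0, 0.0)   # seed (1, 0): gives y + cos(x)
_, df_dy = f_forward(x, y, 0.0, 1.0)   # seed (0, 1): gives x
```

Each partial derivative costs one full evaluation of the tangent-pair program, one per seed direction.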

Forward Mode and Dual Numbers

Forward accumulation is naturally represented with dual numbers.

A dual number has the form:

$$a + b\varepsilon$$

where:

$$\varepsilon^2 = 0$$

Evaluate a function on:

$$x + \dot x\,\varepsilon$$

Then:

$$f(x + \dot x\,\varepsilon) = f(x) + f'(x)\,\dot x\,\varepsilon$$

The coefficient of $\varepsilon$ is the forward-mode tangent.

This gives an algebraic interpretation of forward accumulation: ordinary arithmetic is extended so that derivative information propagates automatically.
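A minimal dual-number implementation in Python makes this concrete (supporting only `+` and `*` for brevity; a full version would also overload subtraction, division, and the elementary functions):

```python
class Dual:
    """a + b*eps with eps**2 == 0; b carries the tangent."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, since eps^2 = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)
    __rmul__ = __mul__

def derivative(f, x):
    # Evaluate f on x + 1*eps; the eps coefficient is f'(x).
    return f(Dual(x, 1.0)).b

derivative(lambda t: t * t + 3 * t, 2.0)   # 7.0
```

Because `Dual` overloads ordinary arithmetic, any function built from the supported operators is differentiated without being rewritten.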

Control Flow

Forward accumulation follows the same branch decisions as primal execution.

Example:

if x > 0:
    y = x * x
else:
    y = -x

If $x > 0$, the derivative rule is:

$$\dot y = 2x\,\dot x$$

If $x < 0$, the derivative rule is:

$$\dot y = -\dot x$$

At $x = 0$, the program is not differentiable in the classical sense because the active branch changes. AD returns the derivative of the executed branch, not a symbolic piecewise derivative over all branches.

This distinction matters for programs with comparisons, clipping, thresholding, and discrete control.
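The per-branch behavior can be seen directly with a tangent-pair version of the branch above (an illustrative sketch):

```python
def branch_forward(x, dot_x):
    # Forward mode follows whichever branch the primal value selects.
    if x > 0:
        return x * x, 2 * x * dot_x   # derivative rule of the x*x branch
    else:
        return -x, -dot_x             # derivative rule of the -x branch

branch_forward(3.0, 1.0)    # (9.0, 6.0)
branch_forward(-3.0, 1.0)   # (3.0, -1.0)
# At x = 0 the else branch executes, so the returned tangent is -dot_x,
# even though the function is not classically differentiable there.
```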

Strengths of Forward Accumulation

Forward accumulation is simple and predictable.

It has several practical advantages:

  • no reverse tape is required
  • memory usage is low
  • implementation is straightforward
  • works naturally with loops and recursion
  • good for small-input, large-output functions
  • good for online derivative computation

Because tangents are computed immediately, forward mode does not need to store the entire computation graph for a later backward pass.

Limitations

Forward mode becomes expensive when the input dimension is large and the output dimension is small.

For:

$$f:\mathbb{R}^n \to \mathbb{R}$$

computing the full gradient with forward mode requires $n$ passes.

This is inefficient for neural networks, where $n$ may be millions or billions.

Reverse mode is preferred in that case because it computes the full gradient of a scalar-output function in one reverse pass.

Summary

Forward accumulation evaluates a program while propagating tangent values through each primitive operation.

Its core object is the pair:

$$(v, \dot v)$$

Its core computation is:

$$\dot y = J_f(x)\,\dot x$$

It is best understood as:

  • local derivative propagation
  • program transformation
  • dual-number arithmetic
  • Jacobian-vector product evaluation

Forward accumulation is the simplest operational form of automatic differentiation and the basis for many higher-order and mixed-mode techniques.