Forward accumulation is the forward-mode form of automatic differentiation. It propagates derivative information in the same order as ordinary program evaluation. Each intermediate value carries two pieces of information:
| Component | Meaning |
|---|---|
| Primal | the ordinary value computed by the program |
| Tangent | the derivative of that value with respect to a chosen input direction |
If a program computes

$$y = f(x),$$

then forward accumulation computes both the primal output

$$y = f(x)$$

and the tangent output

$$\dot{y} = J_f(x)\,\dot{x},$$

where:

- $J_f(x)$ is the Jacobian of $f$ at $x$
- $\dot{x}$ is the seed tangent
- $\dot{y}$ is the propagated output tangent
Forward mode therefore computes a Jacobian-vector product.
The Tangent Pair
Forward mode evaluates every variable as a pair:

$$(v, \dot{v})$$

The first component is the primal value. The second component is the tangent value.

For an input $x$, we initialize the pair to $(x, \dot{x})$.

The choice of $\dot{x}$ determines which derivative direction we compute.

For a scalar input, set:

$$\dot{x} = 1$$

Then the output tangent equals the ordinary derivative:

$$\dot{y} = f'(x)$$

For a vector input $x \in \mathbb{R}^n$, set:

$$\dot{x} = e_i$$

where $e_i$ is the $i$-th standard basis vector. Then the output tangent gives the $i$-th column of the Jacobian.
Simple Scalar Example
Consider:

$$f(x) = x^2 + 3x$$

Introduce intermediate variables:

$$v_1 = x^2, \qquad v_2 = 3x, \qquad v_3 = v_1 + v_2$$

Forward accumulation evaluates primal and tangent values together. Seeding $\dot{x} = 1$:

| Variable | Primal | Tangent |
|---|---|---|
| $x$ | $x$ | $1$ |
| $v_1$ | $x^2$ | $2x$ |
| $v_2$ | $3x$ | $3$ |
| $v_3$ | $x^2 + 3x$ | $2x + 3$ |

At the end:

$$\dot{v}_3 = 2x + 3 = f'(x)$$
The derivative appears as a byproduct of ordinary evaluation.
Operational Rules
Every primitive operation is lifted from values to tangent pairs.
Addition

Forward rule:

$$(u, \dot{u}) + (v, \dot{v}) = (u + v,\; \dot{u} + \dot{v})$$

Multiplication

Forward rule:

$$(u, \dot{u}) \cdot (v, \dot{v}) = (uv,\; u\dot{v} + v\dot{u})$$

Sine

Forward rule:

$$\sin(u, \dot{u}) = (\sin u,\; \cos(u)\,\dot{u})$$

Exponential

Forward rule:

$$\exp(u, \dot{u}) = (e^{u},\; e^{u}\dot{u})$$
Each rule is local. The AD system only needs the derivative rule for the current primitive.
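These local rules can be sketched as plain Python functions on `(primal, tangent)` pairs; the names `add`, `mul`, `sin`, and `exp` below are illustrative, not from any particular library:

```python
import math

# Each primitive is lifted from values to (primal, tangent) pairs.
def add(a, b):
    (u, du), (v, dv) = a, b
    return (u + v, du + dv)            # sum rule

def mul(a, b):
    (u, du), (v, dv) = a, b
    return (u * v, u * dv + v * du)    # product rule

def sin(a):
    u, du = a
    return (math.sin(u), math.cos(u) * du)

def exp(a):
    u, du = a
    return (math.exp(u), math.exp(u) * du)

# Usage: d/dx [x * sin(x)] at x = 1, seeding the tangent with 1.
x = (1.0, 1.0)
y, dy = mul(x, sin(x))
# dy == sin(1) + cos(1), by the product rule
```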
Forward Mode as Program Transformation
Forward accumulation can be viewed as a program transformation.
Original program:

```
v1 = x * x
v2 = 3 * x
v3 = v1 + v2
return v3
```

Forward-transformed program:

```
v1 = x * x
dot_v1 = x * dot_x + x * dot_x
v2 = 3 * x
dot_v2 = 3 * dot_x
v3 = v1 + v2
dot_v3 = dot_v1 + dot_v2
return v3, dot_v3
```

This transformation preserves the original computation while adding tangent computation beside it.
The transformed program has the same control flow as the original program. Branches, loops, and function calls execute normally, but each differentiable value carries a tangent.
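The transformed program can be run directly; a minimal Python sketch, with `f_and_tangent` as a hypothetical name:

```python
# Runnable version of the forward-transformed program: each primal line is
# followed by the line computing its tangent.
def f_and_tangent(x, dot_x):
    v1 = x * x
    dot_v1 = x * dot_x + x * dot_x
    v2 = 3 * x
    dot_v2 = 3 * dot_x
    v3 = v1 + v2
    dot_v3 = dot_v1 + dot_v2
    return v3, dot_v3

y, dy = f_and_tangent(2.0, 1.0)   # seed dot_x = 1 for d/dx
# y == 10.0 and dy == 7.0, matching f(2) = 10 and f'(2) = 2*2 + 3
```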
Jacobian-Vector Products
For a function:

$$f : \mathbb{R}^n \to \mathbb{R}^m$$

the derivative at a point $x$ is the Jacobian:

$$J_f(x) \in \mathbb{R}^{m \times n}$$

Forward mode computes:

$$\dot{y} = J_f(x)\,\dot{x}$$

This is a directional derivative.

If:

$$\dot{x} = e_i$$

then:

$$J_f(x)\,e_i$$

returns the $i$-th column of the Jacobian.

Thus, computing the full Jacobian by forward mode requires $n$ passes, one for each input dimension.
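A sketch of this column-by-column construction, using a hypothetical $f : \mathbb{R}^2 \to \mathbb{R}^3$ with its tangent lines written out by hand:

```python
# Full Jacobian via one seeded forward pass per input dimension.
def f_jvp(x, dot_x):
    # f(x1, x2) = (x1*x2, x1 + x2, x1**2), differentiated line by line.
    (x1, x2), (d1, d2) = x, dot_x
    y = [x1 * x2, x1 + x2, x1 * x1]                  # primal outputs
    dy = [x1 * d2 + x2 * d1, d1 + d2, 2 * x1 * d1]   # tangent outputs
    return y, dy

x = (3.0, 4.0)
# One pass per standard basis vector; each tangent output is one column.
columns = [f_jvp(x, e)[1] for e in [(1.0, 0.0), (0.0, 1.0)]]
jacobian = [list(row) for row in zip(*columns)]
# jacobian == [[4.0, 3.0], [1.0, 1.0], [6.0, 0.0]]
```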
Cost Model
Let $C(f)$ be the cost of evaluating $f$.

One forward-mode pass usually costs a small constant multiple of $C(f)$.

For one seed direction:

$$\text{cost} \approx c \cdot C(f)$$

where $c$ is often between 2 and 5, depending on the primitive set and implementation.

For a full Jacobian of:

$$f : \mathbb{R}^n \to \mathbb{R}^m$$

forward mode requires $n$ seeded evaluations:

$$\text{cost} \approx n \cdot c \cdot C(f)$$
Forward mode is therefore well suited when:
- the number of inputs is small
- the number of outputs is large
- directional derivatives are sufficient
- Jacobian-vector products are needed directly
Multi-Seed Forward Mode
Instead of carrying one tangent direction, a system may carry several tangents at once.
Each variable becomes:

$$(v,\; \dot{v}_1, \dots, \dot{v}_k)$$

where $k$ is the number of seed directions.

This computes:

$$J_f(x)\,S$$

where:

$$S \in \mathbb{R}^{n \times k}$$

contains multiple seed vectors.
Multi-seed forward mode amortizes overhead across directions. It is useful when computing several Jacobian columns together, especially on vector hardware.
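A minimal sketch of carrying several tangents at once, assuming a small example function $f(x_1, x_2) = x_1 x_2 + \sin(x_1)$ and plain Python lists as tangent vectors:

```python
import math

# Multi-seed forward mode: each variable carries a primal value and a
# list of k tangents, one per seed direction.
def f_multiseed(x, seeds):
    # x = (x1, x2); seeds = (d1, d2), each a length-k tangent list.
    (x1, x2), (d1, d2) = x, seeds
    k = len(d1)
    v3 = x1 * x2
    dv3 = [x1 * d2[j] + x2 * d1[j] for j in range(k)]   # product rule per seed
    v4 = math.sin(x1)
    dv4 = [math.cos(x1) * d1[j] for j in range(k)]      # sine rule per seed
    return v3 + v4, [dv3[j] + dv4[j] for j in range(k)]

x = (3.0, 4.0)
S = ([1.0, 0.0], [0.0, 1.0])   # both basis directions in one pass
y, jac_row = f_multiseed(x, S)
# jac_row == [x2 + cos(x1), x1]: the full 1x2 Jacobian of this scalar f
```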
Example: Two Inputs
Let:

$$f(x_1, x_2) = x_1 x_2 + \sin(x_1)$$

Intermediate variables:

$$v_1 = x_1, \qquad v_2 = x_2, \qquad v_3 = v_1 v_2, \qquad v_4 = \sin(v_1), \qquad v_5 = v_3 + v_4$$

To compute the derivative with respect to $x_1$, seed:

$$\dot{x}_1 = 1, \qquad \dot{x}_2 = 0$$

Propagation:

$$\dot{v}_3 = v_1 \dot{v}_2 + v_2 \dot{v}_1 = x_2, \qquad \dot{v}_4 = \cos(v_1)\,\dot{v}_1 = \cos(x_1), \qquad \dot{v}_5 = \dot{v}_3 + \dot{v}_4 = x_2 + \cos(x_1)$$

Therefore:

$$\frac{\partial f}{\partial x_1} = x_2 + \cos(x_1)$$

To compute the derivative with respect to $x_2$, seed:

$$\dot{x}_1 = 0, \qquad \dot{x}_2 = 1$$

Propagation:

$$\dot{v}_3 = x_1, \qquad \dot{v}_4 = 0, \qquad \dot{v}_5 = x_1$$

Therefore:

$$\frac{\partial f}{\partial x_2} = x_1$$
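The partials can be checked numerically; the sketch below assumes the concrete example $f(x_1, x_2) = x_1 x_2 + \sin(x_1)$ and compares its forward-mode tangents against central finite differences:

```python
import math

# Assumed example function: f(x1, x2) = x1*x2 + sin(x1).
def f(x1, x2):
    return x1 * x2 + math.sin(x1)

def forward_partials(x1, x2):
    # Tangent outputs of two seeded forward passes:
    # seed (1, 0) gives x2 + cos(x1); seed (0, 1) gives x1.
    return x2 + math.cos(x1), x1

x1, x2 = 0.5, 2.0
p1, p2 = forward_partials(x1, x2)

# Central finite differences as an independent check.
h = 1e-6
fd1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
fd2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
# p1 ≈ fd1 and p2 ≈ fd2
```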
Forward Mode and Dual Numbers
Forward accumulation is naturally represented with dual numbers.
A dual number has the form:

$$a + b\varepsilon$$

where:

$$\varepsilon^2 = 0, \qquad \varepsilon \neq 0$$

Evaluate a function on:

$$x + \dot{x}\,\varepsilon$$

Then:

$$f(x + \dot{x}\,\varepsilon) = f(x) + f'(x)\,\dot{x}\,\varepsilon$$

The coefficient of $\varepsilon$ is the forward-mode tangent.
This gives an algebraic interpretation of forward accumulation: ordinary arithmetic is extended so that derivative information propagates automatically.
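A minimal dual-number sketch in Python (the `Dual` class and `dsin` helper are illustrative names, not a library API):

```python
import math

# Dual number a + b*eps with eps**2 == 0: a is the primal, b the tangent.
class Dual:
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a1 + b1*eps)(a2 + b2*eps) = a1*a2 + (a1*b2 + a2*b1)*eps
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__

def dsin(d):
    return Dual(math.sin(d.a), math.cos(d.a) * d.b)

# f(x) = x*x + sin(x); seed the eps coefficient with 1.
x = Dual(2.0, 1.0)
y = x * x + dsin(x)
# y.a == f(2) and y.b == 2*2 + cos(2) == f'(2)
```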
Control Flow
Forward accumulation follows the same branch decisions as primal execution.
Example:
```
if x > 0:
    y = x * x
else:
    y = -x
```

If $x > 0$, the derivative rule is:

$$\dot{y} = 2x\,\dot{x}$$

If $x \le 0$, the derivative rule is:

$$\dot{y} = -\dot{x}$$
At , the program is not differentiable in the classical sense because the active branch changes. AD returns the derivative of the executed branch, not a symbolic piecewise derivative over all branches.
This distinction matters for programs with comparisons, clipping, thresholding, and discrete control.
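The branch behavior above can be sketched as:

```python
# Forward mode follows whichever branch the primal execution takes.
def f_forward(x, dot_x):
    if x > 0:
        return x * x, 2 * x * dot_x   # derivative rule of the x*x branch
    else:
        return -x, -dot_x             # derivative rule of the -x branch

# At x = 0 the else branch executes, so AD reports the tangent -dot_x
# even though the classical derivative does not exist there.
examples = [f_forward(3.0, 1.0), f_forward(-3.0, 1.0), f_forward(0.0, 1.0)]
```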
Strengths of Forward Accumulation
Forward accumulation is simple and predictable.
It has several practical advantages:
- no reverse tape is required
- memory usage is low
- implementation is straightforward
- works naturally with loops and recursion
- good for small-input, large-output functions
- good for online derivative computation
Because tangents are computed immediately, forward mode does not need to store the entire computation graph for a later backward pass.
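For example, a loop differentiated in forward mode updates its tangent in place, with no stored graph (a sketch):

```python
# Computes x**n and its derivative n * x**(n-1) by repeated multiplication;
# the tangent is updated alongside the primal at every iteration.
def pow_forward(x, dot_x, n):
    y, dy = 1.0, 0.0
    for _ in range(n):
        y, dy = y * x, y * dot_x + x * dy   # product rule at each step
    return y, dy

y, dy = pow_forward(2.0, 1.0, 5)
# y == 32.0 and dy == 80.0 == 5 * 2**4
```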
Limitations
Forward mode becomes expensive when the input dimension is large and the output dimension is small.
For:

$$f : \mathbb{R}^n \to \mathbb{R}$$

computing the full gradient with forward mode requires $n$ passes.

This is inefficient for neural networks, where $n$ may be millions or billions.
Reverse mode is preferred in that case because it computes the full gradient of a scalar-output function in one reverse pass.
Summary
Forward accumulation evaluates a program while propagating tangent values through each primitive operation.
Its core object is the pair:

$$(v, \dot{v})$$

Its core computation is the Jacobian-vector product:

$$\dot{y} = J_f(x)\,\dot{x}$$
It is best understood as:
- local derivative propagation
- program transformation
- dual-number arithmetic
- Jacobian-vector product evaluation
Forward accumulation is the simplest operational form of automatic differentiation and the basis for many higher-order and mixed-mode techniques.