# Forward Accumulation

Forward accumulation is the forward-mode form of automatic differentiation. It propagates derivative information in the same order as ordinary program evaluation. Each intermediate value carries two pieces of information:

| Component | Meaning |
|---|---|
| Primal | the ordinary value computed by the program |
| Tangent | the derivative of that value with respect to a chosen input direction |

If a program computes:

$$
y=f(x)
$$

then forward accumulation computes both:

$$
y
$$

and:

$$
\dot y =
J_f(x)\dot x
$$

where:
- $J_f(x)$ is the Jacobian of $f$ at $x$
- $\dot x$ is the seed tangent
- $\dot y$ is the propagated output tangent

Forward mode therefore computes a Jacobian-vector product.

## The Tangent Pair

Forward mode evaluates every variable as a pair:

$$
(v,\dot v)
$$

The first component is the primal value. The second component is the tangent value.

For an input $x$, we initialize:

$$
(x,\dot x)
$$

The choice of $\dot x$ determines which derivative direction we compute.

For a scalar input, set:

$$
\dot x = 1
$$

Then the output tangent equals the ordinary derivative:

$$
\dot y = \frac{dy}{dx}
$$

For a vector input $x\in\mathbb{R}^n$, set:

$$
\dot x = e_i
$$

where $e_i$ is the $i$-th standard basis vector. Then the output tangent gives the $i$-th column of the Jacobian.

## Simple Scalar Example

Consider:

$$
f(x)=x^2+3x
$$

Introduce intermediate variables:

$$
v_1=x^2
$$

$$
v_2=3x
$$

$$
v_3=v_1+v_2
$$

Forward accumulation evaluates primal and tangent values together.

| Variable | Primal | Tangent |
|---|---:|---:|
| $x$ | $x$ | $1$ |
| $v_1=x^2$ | $x^2$ | $2x$ |
| $v_2=3x$ | $3x$ | $3$ |
| $v_3=v_1+v_2$ | $x^2+3x$ | $2x+3$ |

At the end:

$$
f'(x)=2x+3
$$

The derivative appears as a byproduct of ordinary evaluation.
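The table above translates directly into code. A minimal sketch in Python, propagating `(primal, tangent)` pairs by hand for this specific function (the name `f_forward` is illustrative, not a library API):

```python
# Forward accumulation of f(x) = x**2 + 3*x: each intermediate
# carries its primal value and its tangent side by side.

def f_forward(x, dot_x=1.0):
    v1, dot_v1 = x * x, 2 * x * dot_x      # d(x^2) = 2x dx
    v2, dot_v2 = 3 * x, 3 * dot_x          # d(3x)  = 3 dx
    v3, dot_v3 = v1 + v2, dot_v1 + dot_v2  # sum rule
    return v3, dot_v3

y, dy = f_forward(2.0)
# y = 10.0, dy = 7.0  (f'(2) = 2*2 + 3)
```

Seeding `dot_x = 1.0` makes the returned tangent equal to the ordinary derivative, exactly as in the scalar-seed rule above.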

## Operational Rules

Every primitive operation is lifted from values to tangent pairs.

### Addition

$$
z=x+y
$$

Forward rule:

$$
(z,\dot z) =
(x+y,\dot x+\dot y)
$$

### Multiplication

$$
z=xy
$$

Forward rule:

$$
(z,\dot z) =
(xy,y\dot x+x\dot y)
$$

### Sine

$$
z=\sin(x)
$$

Forward rule:

$$
(z,\dot z) =
(\sin(x),\cos(x)\dot x)
$$

### Exponential

$$
z=e^x
$$

Forward rule:

$$
(z,\dot z) =
(e^x,e^x\dot x)
$$

Each rule is local. The AD system only needs the derivative rule for the current primitive.
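These four rules suffice to differentiate any composition built from them. A minimal sketch in Python, representing each variable as a `(primal, tangent)` tuple; the function names here are illustrative, not any library's API:

```python
import math

# Lift each primitive from values to (primal, tangent) pairs.
def add(a, b):
    (x, dx), (y, dy) = a, b
    return (x + y, dx + dy)              # sum rule

def mul(a, b):
    (x, dx), (y, dy) = a, b
    return (x * y, y * dx + x * dy)      # product rule

def sin(a):
    x, dx = a
    return (math.sin(x), math.cos(x) * dx)

def exp(a):
    x, dx = a
    return (math.exp(x), math.exp(x) * dx)

# Compose them: z = x^2 + e^x, so dz/dx = 2x + e^x.
x = (1.0, 1.0)                  # seed dot_x = 1
z = add(mul(x, x), exp(x))
# z == (1 + e, 2 + e) at x = 1
```

Note that each lifted primitive only consults its own inputs; no global graph is needed, which is the locality property stated above.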

## Forward Mode as Program Transformation

Forward accumulation can be viewed as a program transformation.

Original program:

```text
v1 = x * x
v2 = 3 * x
v3 = v1 + v2
return v3
```

Forward-transformed program:

```text
v1      = x * x
dot_v1  = x * dot_x + x * dot_x

v2      = 3 * x
dot_v2  = 3 * dot_x

v3      = v1 + v2
dot_v3  = dot_v1 + dot_v2

return v3, dot_v3
```

This transformation preserves the original computation while adding tangent computation beside it.

The transformed program has the same control flow as the original program. Branches, loops, and function calls execute normally, but each differentiable value carries a tangent.

## Jacobian-Vector Products

For a function:

$$
f:\mathbb{R}^n\to\mathbb{R}^m
$$

the derivative at a point is the Jacobian:

$$
J_f(x)\in\mathbb{R}^{m\times n}
$$

Forward mode computes:

$$
J_f(x)\dot x
$$

This is a directional derivative.

If:

$$
\dot x=e_i
$$

then:

$$
J_f(x)e_i
$$

returns the $i$-th column of the Jacobian.

Thus, computing the full Jacobian by forward mode requires $n$ passes, one for each input dimension.
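The column-by-column construction can be sketched directly. The example below assumes a hand-written JVP for the illustrative function $f(x_0,x_1)=(x_0x_1,\;x_0+x_1)$ and seeds each basis vector in turn:

```python
# Build a full Jacobian from n forward passes, one per basis seed.

def f_jvp(x, dot_x):
    """Return (f(x), J_f(x) @ dot_x) for f(x) = (x0*x1, x0 + x1)."""
    x0, x1 = x
    d0, d1 = dot_x
    y = (x0 * x1, x0 + x1)
    dy = (x1 * d0 + x0 * d1, d0 + d1)
    return y, dy

def jacobian(x, n=2, m=2):
    cols = []
    for i in range(n):
        e = [0.0] * n
        e[i] = 1.0                 # seed the i-th basis vector
        _, col = f_jvp(x, e)
        cols.append(col)           # i-th Jacobian column
    # Reassemble: rows are outputs, columns are seed directions.
    return [[cols[j][i] for j in range(n)] for i in range(m)]

J = jacobian((3.0, 4.0))
# J == [[4.0, 3.0], [1.0, 1.0]], i.e. [[x1, x0], [1, 1]]
```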

## Cost Model

Let $C(f)$ be the cost of evaluating $f$.

One forward-mode pass usually costs a small constant multiple of $C(f)$.

For one seed direction:

$$
C_{\text{forward}} \approx k C(f)
$$

where $k$ is often between 2 and 5, depending on the primitive set and implementation.

For a full Jacobian of:

$$
f:\mathbb{R}^n\to\mathbb{R}^m
$$

forward mode requires $n$ seeded evaluations:

$$
C_{\text{Jacobian}} \approx n k C(f)
$$

Forward mode is therefore well suited when:
- the number of inputs is small
- the number of outputs is large
- directional derivatives are sufficient
- Jacobian-vector products are needed directly

## Multi-Seed Forward Mode

Instead of carrying one tangent direction, a system may carry several tangents at once.

Each variable becomes:

$$
(v,\dot v_1,\dot v_2,\dots,\dot v_p)
$$

where $p$ is the number of seed directions.

This computes:

$$
J_f(x)S
$$

where:

$$
S\in\mathbb{R}^{n\times p}
$$

contains multiple seed vectors.

Multi-seed forward mode amortizes overhead across directions. It is useful when computing several Jacobian columns together, especially on vector hardware.
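A sketch of the idea with NumPy, reusing the illustrative function $f(x_0,x_1)=(x_0x_1,\;x_0+x_1)$: tangents are carried as a matrix whose columns are the $p$ seed directions, so one pass yields $J_f(x)S$.

```python
import numpy as np

def f_multi_jvp(x, S):
    """Return (f(x), J_f(x) @ S) where S has shape (n, p)."""
    x0, x1 = x
    y = np.array([x0 * x1, x0 + x1])
    # Each tangent rule now acts on a whole row of seeds at once.
    dY = np.vstack([x1 * S[0] + x0 * S[1],   # output 0: x0*x1
                    S[0] + S[1]])            # output 1: x0 + x1
    return y, dY

x = np.array([3.0, 4.0])
S = np.eye(2)                 # both basis directions in one pass
y, J = f_multi_jvp(x, S)
# J is the full 2x2 Jacobian [[4, 3], [1, 1]]
```

Setting `S = np.eye(n)` recovers the full Jacobian in a single vectorized pass, which is the amortization described above.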

## Example: Two Inputs

Let:

$$
f(x,y)=xy+\sin(x)
$$

Intermediate variables:

$$
v_1=xy
$$

$$
v_2=\sin(x)
$$

$$
v_3=v_1+v_2
$$

To compute the derivative with respect to $x$, seed:

$$
\dot x=1,\quad \dot y=0
$$

Propagation:

$$
\dot v_1=y\dot x+x\dot y=y
$$

$$
\dot v_2=\cos(x)\dot x=\cos(x)
$$

$$
\dot v_3=y+\cos(x)
$$

Therefore:

$$
\frac{\partial f}{\partial x}=y+\cos(x)
$$

To compute the derivative with respect to $y$, seed:

$$
\dot x=0,\quad \dot y=1
$$

Propagation:

$$
\dot v_1=x
$$

$$
\dot v_2=0
$$

$$
\dot v_3=x
$$

Therefore:

$$
\frac{\partial f}{\partial y}=x
$$

## Forward Mode and Dual Numbers

Forward accumulation is naturally represented with dual numbers.

A dual number has the form:

$$
a+b\varepsilon
$$

where:

$$
\varepsilon^2=0
$$

Evaluate a function on:

$$
x+\dot x\varepsilon
$$

Then:

$$
f(x+\dot x\varepsilon) =
f(x)+f'(x)\dot x\varepsilon
$$

The coefficient of $\varepsilon$ is the forward-mode tangent.

This gives an algebraic interpretation of forward accumulation: ordinary arithmetic is extended so that derivative information propagates automatically.
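The dual-number view can be realized with operator overloading, so unmodified arithmetic code differentiates itself. A minimal sketch (only `+` and `*` are overloaded here; a full implementation would cover the remaining primitives the same way):

```python
# A minimal dual number a + b*eps with eps^2 = 0:
# .a is the primal value, .b is the tangent coefficient.
class Dual:
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps
        return Dual(self.a * other.a,
                    self.a * other.b + self.b * other.a)

    __rmul__ = __mul__

def f(x):
    return x * x + 3 * x        # ordinary-looking code

y = f(Dual(2.0, 1.0))           # evaluate on x + 1*eps
# y.a == 10.0 (primal), y.b == 7.0 (the derivative f'(2))
```

The eps-coefficient falls out of ordinary arithmetic, which is exactly the algebraic interpretation above.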

## Control Flow

Forward accumulation follows the same branch decisions as primal execution.

Example:

```text
if x > 0:
    y = x * x
else:
    y = -x
```

If $x>0$, the derivative rule is:

$$
\dot y=2x\dot x
$$

If $x\le 0$ (the `else` branch), the derivative rule is:

$$
\dot y=-\dot x
$$

At $x=0$, the program is not differentiable in the classical sense because the active branch changes. AD returns the derivative of the executed branch, not a symbolic piecewise derivative over all branches.

This distinction matters for programs with comparisons, clipping, thresholding, and discrete control.
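A sketch of this behavior, writing the tangent rule for each branch explicitly:

```python
# Forward mode applies the tangent rule of whichever branch the
# primal execution actually takes.

def f(x, dot_x):
    if x > 0:
        return x * x, 2 * x * dot_x   # branch y = x*x
    else:
        return -x, -dot_x             # branch y = -x

# f(3.0, 1.0)  -> (9.0, 6.0)
# f(-2.0, 1.0) -> (2.0, -1.0)
# At x = 0 the else branch runs, so AD reports -1:
# the derivative of the executed branch, not a two-sided answer.
```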

## Strengths of Forward Accumulation

Forward accumulation is simple and predictable.

It has several practical advantages:
- no reverse tape is required
- memory usage is low
- implementation is straightforward
- works naturally with loops and recursion
- good for small-input, large-output functions
- good for online derivative computation

Because tangents are computed immediately, forward mode does not need to store the entire computation graph for a later backward pass.

## Limitations

Forward mode becomes expensive when the input dimension is large and the output dimension is small.

For:

$$
f:\mathbb{R}^n\to\mathbb{R}
$$

computing the full gradient with forward mode requires $n$ passes.

This is inefficient for neural networks, where $n$ may be millions or billions.

Reverse mode is preferred in that case because it computes the full gradient of a scalar-output function in one reverse pass.

## Summary

Forward accumulation evaluates a program while propagating tangent values through each primitive operation.

Its core object is the pair:

$$
(v,\dot v)
$$

Its core computation is:

$$
\dot y=J_f(x)\dot x
$$

It is best understood as:
- local derivative propagation
- program transformation
- dual-number arithmetic
- Jacobian-vector product evaluation

Forward accumulation is the simplest operational form of automatic differentiation and the basis for many higher-order and mixed-mode techniques.

