# Chapter 5. Forward Mode Automatic Differentiation

## Tangent Propagation

Forward mode automatic differentiation computes derivatives by carrying two values through a program at the same time: the ordinary value and its tangent. The ordinary value tells us what the program computes. The tangent tells us how that value changes when the input is perturbed.

For a scalar function

$$
f : \mathbb{R} \to \mathbb{R},
$$

we usually write the derivative as $f'(x)$. Forward mode instead asks a slightly more operational question:

Given a small input perturbation $\dot{x}$, what perturbation $\dot{y}$ appears in the output?

If

$$
y = f(x),
$$

then the tangent satisfies

$$
\dot{y} = f'(x)\dot{x}.
$$

The dot notation does not denote a time derivative here; it denotes a tangent value. The pair

$$
(x, \dot{x})
$$

represents both the primal value $x$ and the directional change $\dot{x}$.

For a multivariate function

$$
f : \mathbb{R}^n \to \mathbb{R}^m,
$$

the same idea becomes

$$
\dot{y} = J_f(x)\dot{x},
$$

where $J_f(x)$ is the Jacobian of $f$ at $x$. Forward mode computes this Jacobian-vector product directly, without first constructing the full Jacobian.

### Tangents as first-order perturbations

A tangent can be understood through a first-order expansion. Suppose the input is perturbed from $x$ to

$$
x + \epsilon \dot{x},
$$

where $\epsilon$ is small. Then

$$
f(x + \epsilon \dot{x}) =
f(x) + \epsilon J_f(x)\dot{x} + O(\epsilon^2).
$$

Forward mode keeps the coefficient of $\epsilon$. It discards all higher-order terms. Therefore, each intermediate value carries a first-order local approximation of how it changes.
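As a concrete check, take $f(x) = x^2$ and expand directly:

$$
(x + \epsilon\dot{x})^2 = x^2 + \epsilon\,2x\dot{x} + \epsilon^2\dot{x}^2.
$$

The coefficient of $\epsilon$ is $2x\dot{x}$, which is exactly $f'(x)\dot{x}$ with $f'(x) = 2x$; the $\epsilon^2$ term is the discarded higher-order part.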

For a program variable $v$, forward mode stores

$$
(v, \dot{v}).
$$

The first component is the normal runtime value. The second component is the tangent propagated from the inputs.

### Propagation through primitive operations

Consider a program built from elementary operations. Forward mode assigns each operation a rule for both the primal result and the tangent result.

If

$$
z = x + y,
$$

then

$$
\dot{z} = \dot{x} + \dot{y}.
$$

If

$$
z = xy,
$$

then

$$
\dot{z} = \dot{x}y + x\dot{y}.
$$

If

$$
z = \sin x,
$$

then

$$
\dot{z} = \cos(x)\dot{x}.
$$

If

$$
z = \exp x,
$$

then

$$
\dot{z} = \exp(x)\dot{x}.
$$

Each rule is just the ordinary derivative rule applied locally. The important point is that the rule is applied during normal program execution. There is no separate symbolic expression for the whole function.
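These local rules translate directly into code. The sketch below is illustrative (the names `Dual`, `Sin`, and `Exp` are assumptions, not a fixed API): each primitive computes its primal result and applies its tangent rule in the same step.

```go
package main

import (
	"fmt"
	"math"
)

// Dual pairs a primal value with its tangent.
type Dual struct {
	Value, Tangent float64
}

// Sin computes z = sin(x) and the tangent rule dz = cos(x) * dx.
func Sin(x Dual) Dual {
	return Dual{math.Sin(x.Value), math.Cos(x.Value) * x.Tangent}
}

// Exp computes z = exp(x) and the tangent rule dz = exp(x) * dx.
func Exp(x Dual) Dual {
	v := math.Exp(x.Value)
	return Dual{v, v * x.Tangent}
}

func main() {
	// At x = 0 with seed dx = 1, both tangents are 1,
	// since cos(0) = 1 and exp(0) = 1.
	fmt.Println(Sin(Dual{0, 1})) // {0 1}
	fmt.Println(Exp(Dual{0, 1})) // {1 1}
}
```

Note that each function touches only its own inputs; no global expression for the surrounding program is ever built.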

### Example

Let

$$
f(x) = x^2 + 3x.
$$

Write the program as

$$
a = x^2,
$$

$$
b = 3x,
$$

$$
y = a + b.
$$

To compute $f'(x)$, seed the input tangent with

$$
\dot{x} = 1.
$$

Then propagate:

$$
a = x^2,
\qquad
\dot{a} = 2x\dot{x} = 2x.
$$

$$
b = 3x,
\qquad
\dot{b} = 3\dot{x} = 3.
$$

$$
y = a + b,
\qquad
\dot{y} = \dot{a} + \dot{b} = 2x + 3.
$$

So the output tangent is

$$
\dot{y} = f'(x) = 2x + 3.
$$

At $x = 5$, the program computes

$$
y = 5^2 + 3 \cdot 5 = 40,
$$

and the tangent computes

$$
\dot{y} = 2 \cdot 5 + 3 = 13.
$$

The result of forward mode is therefore the pair

$$
(40, 13).
$$

The first value is the function output. The second value is the derivative in the seeded direction.
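The same trace can be run in code. This is a minimal sketch with illustrative names (`Dual`, `Add`, `Mul`), evaluating $f(x) = x^2 + 3x$ at $x = 5$ with seed $\dot{x} = 1$:

```go
package main

import "fmt"

// Dual pairs a primal value with its tangent.
type Dual struct {
	Value, Tangent float64
}

// Add propagates the sum rule: d(x + y) = dx + dy.
func Add(x, y Dual) Dual {
	return Dual{x.Value + y.Value, x.Tangent + y.Tangent}
}

// Mul propagates the product rule: d(xy) = dx*y + x*dy.
func Mul(x, y Dual) Dual {
	return Dual{x.Value * y.Value, x.Tangent*y.Value + x.Value*y.Tangent}
}

func main() {
	x := Dual{Value: 5, Tangent: 1} // seed dx = 1
	c := Dual{Value: 3, Tangent: 0} // constants carry a zero tangent
	a := Mul(x, x)                  // a = x^2, da = 2x = 10
	b := Mul(c, x)                  // b = 3x,  db = 3
	y := Add(a, b)                  // y = a + b
	fmt.Println(y.Value, y.Tangent) // 40 13
}
```

Constants are seeded with a zero tangent because perturbing the input does not change them.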

### Directional derivatives

For a function with many inputs, the tangent seed chooses a direction.

Let

$$
f(x_1, x_2) = x_1x_2 + \sin x_1.
$$

If we seed

$$
\dot{x}_1 = 1,
\qquad
\dot{x}_2 = 0,
$$

then forward mode computes the derivative with respect to $x_1$. If we seed

$$
\dot{x}_1 = 0,
\qquad
\dot{x}_2 = 1,
$$

then it computes the derivative with respect to $x_2$.
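The two seeded runs can be sketched as follows, again with illustrative dual-number primitives (`Dual`, `Add`, `Mul`, `Sin`), evaluated at the sample point $(x_1, x_2) = (1, 2)$:

```go
package main

import (
	"fmt"
	"math"
)

// Dual pairs a primal value with its tangent.
type Dual struct {
	Value, Tangent float64
}

func Add(x, y Dual) Dual { return Dual{x.Value + y.Value, x.Tangent + y.Tangent} }
func Mul(x, y Dual) Dual {
	return Dual{x.Value * y.Value, x.Tangent*y.Value + x.Value*y.Tangent}
}
func Sin(x Dual) Dual { return Dual{math.Sin(x.Value), math.Cos(x.Value) * x.Tangent} }

// f(x1, x2) = x1*x2 + sin(x1), with tangents propagated alongside.
func f(x1, x2 Dual) Dual {
	return Add(Mul(x1, x2), Sin(x1))
}

func main() {
	// Seed (1, 0): derivative with respect to x1, namely x2 + cos(x1).
	d1 := f(Dual{1, 1}, Dual{2, 0})
	// Seed (0, 1): derivative with respect to x2, namely x1.
	d2 := f(Dual{1, 0}, Dual{2, 1})
	fmt.Println(d1.Tangent) // 2 + cos(1)
	fmt.Println(d2.Tangent) // 1
}
```

Each run reuses the same program; only the seed changes which partial derivative comes out.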

More generally, if

$$
\dot{x} =
\begin{bmatrix}
\dot{x}_1 \\
\dot{x}_2
\end{bmatrix},
$$

then forward mode computes the directional derivative

$$
J_f(x)\dot{x}.
$$

This is the basic reason forward mode is efficient when derivatives are needed in only a few input directions. One run gives one Jacobian-vector product. To compute a full Jacobian with $n$ input dimensions, we usually need $n$ forward-mode runs, one for each basis direction.

### Tangent propagation as program execution

Forward mode can be viewed as a mechanical transformation of a program.

A statement such as

```text
z = x * y
```

becomes

```text
z  = x * y
dz = dx * y + x * dy
```

A statement such as

```text
z = sin(x)
```

becomes

```text
z  = sin(x)
dz = cos(x) * dx
```

The transformed program follows the same control flow as the original program. It executes the primal computation and the tangent computation side by side.

This gives forward mode an important implementation property: it does not need to store the whole computation for a later reverse pass. Once an operation has propagated its tangent, its local derivative information can often be discarded.

### Locality

Tangent propagation is local. Each operation only needs:

1. The primal inputs.
2. The tangent inputs.
3. The derivative rule for that operation.

It does not need to know the full expression that produced its inputs. It also does not need to know how its output will be used later.

This locality makes forward mode simple to implement with operator overloading. A number type can store both a primal value and a tangent value. Arithmetic operators are then overloaded to propagate both fields.

For example, conceptually:

```go
// Dual pairs a primal value with its tangent.
type Dual struct {
    Value   float64
    Tangent float64
}

// Add propagates the sum rule: d(x + y) = dx + dy.
func Add(x, y Dual) Dual {
    return Dual{
        Value:   x.Value + y.Value,
        Tangent: x.Tangent + y.Tangent,
    }
}

// Mul propagates the product rule: d(xy) = dx*y + x*dy.
func Mul(x, y Dual) Dual {
    return Dual{
        Value:   x.Value * y.Value,
        Tangent: x.Tangent*y.Value + x.Value*y.Tangent,
    }
}
```

This structure is the operational core of forward mode. More advanced systems generalize it to vectors, arrays, tensors, sparse directions, and higher-order tangents.

### Cost model

For scalar tangents, forward mode usually adds a small constant factor to the cost of the primal computation. Each primitive operation performs its normal computation and a tangent computation.

For example, multiplication changes from

```text
z = x * y
```

to

```text
z  = x * y
dz = dx * y + x * dy
```

So one multiplication becomes one primal multiplication plus a few extra arithmetic operations.

If the tangent is a vector of $k$ directions, then each variable carries $k$ tangent components. The cost becomes roughly proportional to $k$. This is useful when $k$ is small. It becomes expensive when $k$ approaches the full input dimension.
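The $k$-direction case can be sketched by letting each variable carry a slice of tangent components, one per seeded direction. The names `DualK` and `MulK` below are illustrative:

```go
package main

import "fmt"

// DualK carries one primal value and k tangent components,
// one per seeded direction.
type DualK struct {
	Value   float64
	Tangent []float64
}

// MulK applies the product rule to every tangent component;
// the tangent work grows linearly with k.
func MulK(x, y DualK) DualK {
	t := make([]float64, len(x.Tangent))
	for i := range t {
		t[i] = x.Tangent[i]*y.Value + x.Value*y.Tangent[i]
	}
	return DualK{x.Value * y.Value, t}
}

func main() {
	// Two directions seeded at once (e1 and e2) for inputs x = 3, y = 4.
	x := DualK{3, []float64{1, 0}}
	y := DualK{4, []float64{0, 1}}
	z := MulK(x, y)
	fmt.Println(z.Value, z.Tangent) // 12 [4 3]
}
```

One pass with $k$ components gives $k$ directional derivatives, at roughly $k$ times the tangent cost of a scalar pass.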

### The role of seeding

The tangent seed defines the derivative query.

For

$$
f : \mathbb{R}^n \to \mathbb{R}^m,
$$

the input tangent $\dot{x}$ is a vector in input space. Forward mode computes

$$
\dot{y} = J_f(x)\dot{x}.
$$

To compute the first column of the Jacobian, use

$$
\dot{x} = e_1.
$$

To compute the second column, use

$$
\dot{x} = e_2.
$$

Repeating this for all standard basis vectors gives the full Jacobian. However, when only a directional derivative is needed, one seed is enough.
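Assembling the Jacobian column by column can be sketched as below. The function $f(x_1, x_2) = (x_1 x_2,\; x_1 + x_2)$ and all names are illustrative; each pass seeds one standard basis vector:

```go
package main

import "fmt"

// Dual pairs a primal value with its tangent.
type Dual struct {
	Value, Tangent float64
}

func Add(x, y Dual) Dual { return Dual{x.Value + y.Value, x.Tangent + y.Tangent} }
func Mul(x, y Dual) Dual {
	return Dual{x.Value * y.Value, x.Tangent*y.Value + x.Value*y.Tangent}
}

// f : R^2 -> R^2 with f(x1, x2) = (x1*x2, x1 + x2).
func f(x1, x2 Dual) [2]Dual {
	return [2]Dual{Mul(x1, x2), Add(x1, x2)}
}

// Jacobian runs forward mode once per input, seeding e1 then e2;
// pass j produces column j of the Jacobian.
func Jacobian(x1, x2 float64) [2][2]float64 {
	var J [2][2]float64
	for j := 0; j < 2; j++ {
		seeds := [2]float64{}
		seeds[j] = 1
		y := f(Dual{x1, seeds[0]}, Dual{x2, seeds[1]})
		for i := 0; i < 2; i++ {
			J[i][j] = y[i].Tangent
		}
	}
	return J
}

func main() {
	// At (2, 3): J = [[x2, x1], [1, 1]] = [[3, 2], [1, 1]].
	fmt.Println(Jacobian(2, 3)) // [[3 2] [1 1]]
}
```

The loop makes the cost explicit: one forward-mode run per input dimension.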

This is common in scientific computing, sensitivity analysis, and implicit methods, where the question is often not “what is the whole Jacobian?” but “how does the output change in this particular direction?”

### Summary of the mechanism

Forward mode automatic differentiation propagates tangents through the same computation that produces the primal output. Each variable is extended from a value into a pair:

$$
v \mapsto (v, \dot{v}).
$$

Each primitive operation is extended by its local derivative rule. The final tangent is the derivative of the output in the seeded input direction.

Forward mode is direct, local, and exact up to the floating point behavior of the underlying computation. Its natural output is a Jacobian-vector product, which makes it especially effective for functions with few inputs or for derivative queries involving a small number of directions.

