# Tangent Propagation

Forward mode automatic differentiation computes derivatives by propagating tangent values alongside ordinary values. The ordinary value is called the primal. The derivative value is called the tangent.

For every program variable $v$, forward mode tracks:

$$
(v,\dot v)
$$

The dot notation means “the first-order change in $v$ induced by a chosen change in the input.” In other words, $\dot v$ is the derivative of $v$ along the chosen input direction.

If the input changes in direction $\dot x$, then every later value changes according to the chain rule.

## Directional Derivatives

Let:

$$
f:\mathbb{R}^n\to\mathbb{R}^m
$$

At input $x$, choose a direction:

$$
\dot x\in\mathbb{R}^n
$$

Forward mode computes:

$$
\dot y = J_f(x)\dot x
$$

where $J_f(x)$ is the Jacobian.

Thus forward mode computes a directional derivative. It tells us how the output changes when the input moves infinitesimally in direction $\dot x$.

## Primal and Tangent Execution

Consider:

$$
f(x)=x^2+3x
$$

A program evaluates:

```text
v1 = x * x
v2 = 3 * x
v3 = v1 + v2
return v3
```

Forward mode evaluates a paired program:

```text
v1     = x * x
dot_v1 = x * dot_x + x * dot_x

v2     = 3 * x
dot_v2 = 3 * dot_x

v3     = v1 + v2
dot_v3 = dot_v1 + dot_v2

return v3, dot_v3
```

If $\dot x=1$, then:

$$
\dot v_3=2x+3
$$

which is the ordinary derivative.
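The paired program above translates almost line for line into runnable code. The following is a minimal sketch; the function name `f_forward` is an illustrative choice, not something defined in the text.

```python
def f_forward(x: float, dot_x: float) -> tuple[float, float]:
    """Evaluate f(x) = x^2 + 3x together with its tangent."""
    v1 = x * x
    dot_v1 = x * dot_x + x * dot_x  # product rule applied to x * x

    v2 = 3.0 * x
    dot_v2 = 3.0 * dot_x            # the constant factor passes through

    v3 = v1 + v2
    dot_v3 = dot_v1 + dot_v2        # sum rule

    return v3, dot_v3


# Seeding dot_x = 1 yields the ordinary derivative 2x + 3.
value, derivative = f_forward(2.0, 1.0)
print(value, derivative)  # 10.0 7.0
```

Seeding with a different `dot_x` simply scales the result, since the tangent is linear in the seed.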

## Local Tangent Rules

Each primitive operation has a tangent rule.

| Operation | Primal | Tangent |
|---|---|---|
| addition | $z=x+y$ | $\dot z=\dot x+\dot y$ |
| subtraction | $z=x-y$ | $\dot z=\dot x-\dot y$ |
| multiplication | $z=xy$ | $\dot z=y\dot x+x\dot y$ |
| division | $z=x/y$ | $\dot z=(y\dot x-x\dot y)/y^2$ |
| sine | $z=\sin x$ | $\dot z=\cos(x)\dot x$ |
| exponential | $z=e^x$ | $\dot z=e^x\dot x$ |
| logarithm | $z=\log x$ | $\dot z=\dot x/x$ |

These rules are applied in program order: each tangent is computed alongside its primal, using only primal and tangent values that are already available.
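One common way to implement the table is operator overloading on a (primal, tangent) pair. The sketch below assumes a hypothetical `Dual` class and free functions `sin`, `exp`, and `log`; it only handles `Dual` operands and is meant to illustrate the rules, not to serve as a complete library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A (primal, tangent) pair. The name Dual is illustrative."""
    primal: float
    tangent: float = 0.0

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    def __sub__(self, other: "Dual") -> "Dual":
        return Dual(self.primal - other.primal, self.tangent - other.tangent)

    def __mul__(self, other: "Dual") -> "Dual":
        # z = x * y  ->  dot_z = y * dot_x + x * dot_y
        return Dual(self.primal * other.primal,
                    other.primal * self.tangent + self.primal * other.tangent)

    def __truediv__(self, other: "Dual") -> "Dual":
        # z = x / y  ->  dot_z = (y * dot_x - x * dot_y) / y^2
        x, y = self.primal, other.primal
        return Dual(x / y, (y * self.tangent - x * other.tangent) / (y * y))


def sin(x: Dual) -> Dual:
    return Dual(math.sin(x.primal), math.cos(x.primal) * x.tangent)


def exp(x: Dual) -> Dual:
    e = math.exp(x.primal)
    return Dual(e, e * x.tangent)


def log(x: Dual) -> Dual:
    return Dual(math.log(x.primal), x.tangent / x.primal)


# Derivative of x * sin(x) at x = 1, seeded with tangent 1.
x = Dual(1.0, 1.0)
z = x * sin(x)
print(z.tangent)  # equals sin(1) + cos(1) up to rounding
```

Constants enter as `Dual(c)` with the default tangent of zero, which matches the fact that a constant does not change with the input.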

## Tangent Seeding

The initial tangent determines the derivative being computed.

For one scalar input:

$$
\dot x=1
$$

computes:

$$
\frac{df}{dx}
$$

For vector input $x=(x_1,\dots,x_n)$, choosing:

$$
\dot x=e_i
$$

computes the $i$-th column of the Jacobian.

Example:

$$
f(x_1,x_2)=
\begin{bmatrix}
x_1x_2\\
\sin(x_1)+x_2
\end{bmatrix}
$$

Seed:

$$
\dot x=
\begin{bmatrix}
1\\
0
\end{bmatrix}
$$

Then forward mode computes the first Jacobian column:

$$
\dot y=
\begin{bmatrix}
x_2\\
\cos(x_1)
\end{bmatrix}
$$

Seed:

$$
\dot x=
\begin{bmatrix}
0\\
1
\end{bmatrix}
$$

Then it computes the second column:

$$
\dot y=
\begin{bmatrix}
x_1\\
1
\end{bmatrix}
$$
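The two seeded runs can be reproduced numerically. This sketch hand-writes the tangent program for the example function; the name `f_jvp` and the evaluation point are assumptions chosen for illustration.

```python
import math


def f_jvp(x1, x2, dot_x1, dot_x2):
    """Return f(x1, x2) = (x1*x2, sin(x1) + x2) and its tangent."""
    y1 = x1 * x2
    dot_y1 = x2 * dot_x1 + x1 * dot_x2          # product rule

    y2 = math.sin(x1) + x2
    dot_y2 = math.cos(x1) * dot_x1 + dot_x2     # sine rule, then sum rule

    return (y1, y2), (dot_y1, dot_y2)


x1, x2 = 0.5, 2.0
_, col1 = f_jvp(x1, x2, 1.0, 0.0)  # seed e_1 -> (x2, cos(x1))
_, col2 = f_jvp(x1, x2, 0.0, 1.0)  # seed e_2 -> (x1, 1)
print(col1, col2)
```

Stacking `col1` and `col2` side by side gives the full $2\times 2$ Jacobian at the chosen point.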

## Tangents Through Control Flow

Forward mode follows the same control flow as the primal program.

Example:

```text
if x > 0:
    y = x * x
else:
    y = -x
```

If $x>0$:

$$
\dot y=2x\dot x
$$

If $x<0$:

$$
\dot y=-\dot x
$$

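At $x=0$, the function has a corner: the left derivative is $-1$ and the right derivative is $0$. Forward mode returns the tangent of the executed branch, which here is the `else` branch, so $\dot y=-\dot x$. It does not infer a symbolic piecewise derivative.

This behavior matters for programs with: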
- thresholds
- comparisons
- clipping
- indexing
- data-dependent branches
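The branch example above can be written as a runnable tangent program. This is a sketch with an illustrative function name, `branch_forward`; the branch condition is evaluated on the primal value only.

```python
def branch_forward(x: float, dot_x: float) -> tuple[float, float]:
    """Propagate a tangent through a data-dependent branch."""
    if x > 0:
        y = x * x
        dot_y = 2.0 * x * dot_x
    else:
        y = -x
        dot_y = -dot_x
    return y, dot_y


print(branch_forward(3.0, 1.0))   # (9.0, 6.0)
print(branch_forward(-2.0, 1.0))  # (2.0, -1.0)
print(branch_forward(0.0, 1.0))   # (-0.0, -1.0): the else branch runs at x = 0
```

The last call shows that the reported tangent at the corner is whatever the executed branch produces, with no warning that the function is not differentiable there.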

## Tangents Through Loops

Forward mode handles loops directly because tangent propagation follows primal execution.

Example:

```text
y = x
for i in range(n):
    y = y * y
```

The tangent program is:

```text
y = x
dot_y = dot_x

for i in range(n):
    old_y = y
    old_dot_y = dot_y

    y = old_y * old_y
    dot_y = old_y * old_dot_y + old_y * old_dot_y
```

Each iteration propagates the tangent through the loop body.

For fixed $n$, this computes the derivative of the function represented by the loop. Here the loop computes $y=x^{2^n}$, so the propagated tangent is $\dot y=2^n x^{2^n-1}\dot x$.
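A runnable version of the loop makes the ordering constraint explicit: the tangent update must read the value of `y` from before the iteration's assignment. The sketch below uses an illustrative function name and updates the tangent first, so no temporaries are needed.

```python
def repeated_square_forward(x: float, dot_x: float, n: int) -> tuple[float, float]:
    """Square y a total of n times while propagating its tangent."""
    y, dot_y = x, dot_x
    for _ in range(n):
        # Update the tangent first: it needs y from before this iteration.
        dot_y = 2.0 * y * dot_y
        y = y * y
    return y, dot_y


# With n = 3 the loop computes x**8, whose derivative is 8 * x**7.
y, dy = repeated_square_forward(1.5, 1.0, 3)
print(y, dy)
```

Memory use is independent of `n`, because only the current `y` and `dot_y` are live at any time.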

## Tangents for Structured Values

Real programs use arrays, tuples, structs, and nested containers.

Forward mode assigns tangents only to differentiable components.

Example:

```text
State {
    position: Float
    velocity: Float
    step: Int
}
```

A tangent state may contain:

```text
StateTangent {
    position: Float
    velocity: Float
}
```

The integer `step` has no tangent because discrete values do not vary infinitesimally.

This distinction becomes important in differentiable programming languages and compiler IRs.
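As an illustration, the state and tangent types can be modelled with dataclasses. The update rule `advance` below is a hypothetical explicit Euler step for a simple oscillator, chosen only to have something to differentiate; it is not taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    position: float
    velocity: float
    step: int  # discrete counter, carries no tangent


@dataclass(frozen=True)
class StateTangent:
    position: float
    velocity: float


def advance(s: State, ds: StateTangent, dt: float = 0.1):
    """One hypothetical Euler step: position' = velocity, velocity' = -position."""
    new_state = State(
        position=s.position + dt * s.velocity,
        velocity=s.velocity - dt * s.position,
        step=s.step + 1,  # updated in the primal only
    )
    new_tangent = StateTangent(
        position=ds.position + dt * ds.velocity,
        velocity=ds.velocity - dt * ds.position,
    )
    return new_state, new_tangent


# Sensitivity of the next state to the initial position.
s1, ds1 = advance(State(1.0, 0.0, 0), StateTangent(1.0, 0.0))
print(s1, ds1)
```

The tangent type has no `step` field at all, so there is no placeholder value to maintain or to misread as a derivative.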

## Multiple Tangent Directions

Forward mode can carry several tangent directions at once.

Instead of:

$$
(v,\dot v)
$$

use:

$$
(v,\dot v_1,\dot v_2,\dots,\dot v_k)
$$

This computes:

$$
J_f(x)S
$$

where $S$ is a matrix of seed directions.

Multi-direction forward mode is useful for:
- computing several Jacobian columns
- exploiting SIMD and vector hardware
- reducing interpreter overhead
- sparse Jacobian recovery

The cost grows roughly linearly with the number of tangent directions, but batching can improve the constant factor because the primal computation is shared across all directions.
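A minimal way to carry $k$ directions is to store a tuple of tangents next to each primal. The sketch below uses plain tuples and two hypothetical helpers, `add` and `mul`; a practical implementation would more likely use arrays.

```python
def add(a, b):
    """Sum rule applied to every tangent direction."""
    (x, dxs), (y, dys) = a, b
    return x + y, tuple(dx + dy for dx, dy in zip(dxs, dys))


def mul(a, b):
    """Product rule applied to every tangent direction."""
    (x, dxs), (y, dys) = a, b
    return x * y, tuple(y * dx + x * dy for dx, dy in zip(dxs, dys))


# f(x1, x2) = x1 * x2 + x1, seeded with the columns of the identity (S = I).
x1 = (3.0, (1.0, 0.0))
x2 = (4.0, (0.0, 1.0))

value, tangents = add(mul(x1, x2), x1)
print(value, tangents)  # 15.0 (5.0, 3.0)
```

With $S=I$ the tangents are the full Jacobian row $(x_2+1,\ x_1)$, obtained in one pass in which each primal product and sum is computed once.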

## Memory Behavior

Forward mode has simple memory behavior.

At each point, it needs only:
- current primal values
- current tangent values

It does not need to store a full tape for a later backward pass.

Thus forward mode is attractive for:
- streaming computations
- long simulations
- online sensitivity analysis
- low-memory systems

Its weakness is dimensional scaling. A full gradient of a scalar function with $n$ inputs requires $n$ forward passes, one per seed $e_i$, or equivalently a single pass carrying $n$ tangent directions.
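Both points can be seen in a small sketch. The tangent program below streams over its input with two scalar accumulators, and the gradient is assembled from one pass per basis vector. The names `sum_of_squares_jvp` and `gradient_by_forward_passes` are illustrative.

```python
def sum_of_squares_jvp(x, dot_x):
    """f(x) = sum(x_i^2) with its tangent, using O(1) working memory."""
    y, dot_y = 0.0, 0.0
    for xi, dxi in zip(x, dot_x):
        y += xi * xi
        dot_y += 2.0 * xi * dxi
    return y, dot_y


def gradient_by_forward_passes(jvp, x):
    """Assemble a gradient from n seeded forward passes."""
    n = len(x)
    grad = []
    for i in range(n):
        e_i = [1.0 if j == i else 0.0 for j in range(n)]  # i-th basis vector
        _, dot_y = jvp(x, e_i)
        grad.append(dot_y)
    return grad


grad = gradient_by_forward_passes(sum_of_squares_jvp, [1.0, 2.0, 3.0])
print(grad)  # [2.0, 4.0, 6.0]
```

Each pass repeats the whole primal computation, which is why the total work scales with $n$.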

## Tangent Propagation as Pushforward

In differential geometry, tangent propagation is a pushforward.

A function:

$$
f:X\to Y
$$

maps points in $X$ to points in $Y$.

Its derivative maps tangent vectors at $x$ to tangent vectors at $f(x)$:

$$
Df_x:T_xX\to T_{f(x)}Y
$$

Forward mode computes this map operationally.

In coordinates, this is exactly:

$$
\dot y=J_f(x)\dot x
$$

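This geometric view clarifies why forward mode moves in the same direction as computation. Pushforwards compose in the same order as the functions themselves, $D(g\circ f)_x = Dg_{f(x)}\circ Df_x$, so the tangent of each intermediate value can be computed as soon as its primal is available.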
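The composition property can be sketched in code by representing each function as a map on (point, tangent) pairs. The helper `compose` and the two example pushforwards are assumptions introduced for illustration.

```python
import math


def square_pushforward(x, dot_x):
    """f(x) = x^2 acting on a (point, tangent) pair."""
    return x * x, 2.0 * x * dot_x


def sin_pushforward(x, dot_x):
    """g(x) = sin(x) acting on a (point, tangent) pair."""
    return math.sin(x), math.cos(x) * dot_x


def compose(g_push, f_push):
    """Pushforward of g after f: apply f's pushforward, then g's."""
    def h_push(x, dot_x):
        y, dot_y = f_push(x, dot_x)
        return g_push(y, dot_y)
    return h_push


# h(x) = sin(x^2), so h'(x) = cos(x^2) * 2x.
h_push = compose(sin_pushforward, square_pushforward)
value, tangent = h_push(1.0, 1.0)
print(value, tangent)
```

Composition happens in evaluation order, with no need to revisit earlier steps.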

## Summary

Tangent propagation is the core mechanism of forward mode.

It evaluates each variable as:

$$
(v,\dot v)
$$

and applies local tangent rules in program order.

Forward mode computes Jacobian-vector products, supports ordinary control flow naturally, and uses little memory. Its cost is proportional to the number of tangent directions being propagated.

