# Forward Evaluation Rules

Forward mode automatic differentiation works by replacing each primitive operation with an extended operation on pairs:

$$
x \mapsto (x, \dot{x})
$$

The first component is the primal value. The second component is the tangent value. The evaluation rule for each operation must compute both.

For a primitive operation

$$
z = \phi(x_1, x_2, \ldots, x_k),
$$

the forward rule is

$$
\dot{z} =
\sum_{i=1}^{k}
\frac{\partial \phi}{\partial x_i}
(x_1, \ldots, x_k)\dot{x}_i.
$$

This is the local chain rule. Each primitive consumes primal inputs and tangent inputs, then produces a primal output and a tangent output.

### Unary operations

For a unary operation

$$
z = \phi(x),
$$

the forward rule is

$$
\dot{z} = \phi'(x)\dot{x}.
$$

Examples:

| Operation | Primal rule | Tangent rule |
|---|---:|---:|
| Negation | $z = -x$ | $\dot{z} = -\dot{x}$ |
| Square | $z = x^2$ | $\dot{z} = 2x\dot{x}$ |
| Reciprocal | $z = 1/x$ | $\dot{z} = -\dot{x}/x^2$ |
| Square root | $z = \sqrt{x}$ | $\dot{z} = \dot{x}/(2\sqrt{x})$ |
| Exponential | $z = e^x$ | $\dot{z} = e^x\dot{x}$ |
| Logarithm (natural) | $z = \log x$ | $\dot{z} = \dot{x}/x$ |
| Sine | $z = \sin x$ | $\dot{z} = \cos(x)\,\dot{x}$ |
| Cosine | $z = \cos x$ | $\dot{z} = -\sin(x)\,\dot{x}$ |
| Tangent | $z = \tan x$ | $\dot{z} = (1+\tan^2 x)\,\dot{x}$ |

These rules are local. The rule for $\sin$ does not depend on where $x$ came from or how $z$ will be used.
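The table translates directly into code. Below is a minimal Go sketch of a few of these rules, assuming a `Dual` type like the one in the implementation sketch later in this section (redeclared here so the block compiles on its own); the function names are illustrative. `Sqrt` and `Tan` reuse the already-computed primal result in the tangent, which avoids evaluating the same function twice.

```go
package main

import (
	"fmt"
	"math"
)

// Dual pairs a primal value with its tangent.
type Dual struct {
	Value, Tangent float64
}

// Neg: z = -x, ż = -ẋ.
func Neg(a Dual) Dual { return Dual{-a.Value, -a.Tangent} }

// Recip: z = 1/x, ż = -ẋ/x².
func Recip(a Dual) Dual {
	return Dual{1 / a.Value, -a.Tangent / (a.Value * a.Value)}
}

// Sqrt: z = √x, ż = ẋ/(2√x), reusing the primal result.
func Sqrt(a Dual) Dual {
	s := math.Sqrt(a.Value)
	return Dual{s, a.Tangent / (2 * s)}
}

// Log: z = ln x, ż = ẋ/x.
func Log(a Dual) Dual {
	return Dual{math.Log(a.Value), a.Tangent / a.Value}
}

// Tan: z = tan x, ż = (1 + tan²x)ẋ, reusing the primal result.
func Tan(a Dual) Dual {
	t := math.Tan(a.Value)
	return Dual{t, (1 + t*t) * a.Tangent}
}

func main() {
	x := Dual{Value: 4, Tangent: 1}
	fmt.Println(Sqrt(x))  // {2 0.25}
	fmt.Println(Recip(x)) // {0.25 -0.0625}
	fmt.Println(Log(x))
}
```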

### Binary operations

For a binary operation

$$
z = \phi(x, y),
$$

the forward rule is

$$
\dot{z} =
\frac{\partial \phi}{\partial x}(x,y)\dot{x}
+
\frac{\partial \phi}{\partial y}(x,y)\dot{y}.
$$

Examples:

| Operation | Primal rule | Tangent rule |
|---|---:|---:|
| Addition | $z = x + y$ | $\dot{z} = \dot{x} + \dot{y}$ |
| Subtraction | $z = x - y$ | $\dot{z} = \dot{x} - \dot{y}$ |
| Multiplication | $z = xy$ | $\dot{z} = \dot{x}y + x\dot{y}$ |
| Division | $z = x/y$ | $\dot{z} = (\dot{x}y - x\dot{y})/y^2$ |
| Power, constant exponent | $z = x^c$ | $\dot{z} = cx^{c-1}\dot{x}$ |
| Power, variable exponent | $z = x^y$ | $\dot{z} = x^y(y\dot{x}/x + \dot{y}\log x)$ |

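The variable-exponent power rule assumes $x > 0$, because it uses $\log x$. For $x \le 0$ the real-valued power is defined only for special exponents (for example, integer $y$ when $x < 0$), and implementations differ in how they define its value and tangent there.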

### Constants and input variables

Constants have zero tangent:

$$
c \mapsto (c, 0).
$$

Input variables receive tangent seeds. For a function

$$
f(x_1,\ldots,x_n),
$$

to compute the derivative in direction

$$
v = (v_1,\ldots,v_n),
$$

initialize

$$
x_i \mapsto (x_i, v_i).
$$

Then evaluate the program using forward rules. The output tangent is

$$
J_f(x)v.
$$

To compute the partial derivatives with respect to $x_j$, that is, the $j$-th column $J_f(x)e_j$ of the Jacobian, use the basis seed

$$
v = e_j.
$$
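Seeding can be sketched in Go as follows. The function `f`, chosen only for illustration, is $f(x_1, x_2) = x_1x_2 + \sin x_1$. The hypothetical helper `jvp` seeds every input with a direction $v$, and `gradient` recovers all partial derivatives by running one forward pass per basis seed $e_j$, so a function of $n$ inputs needs $n$ passes.

```go
package main

import (
	"fmt"
	"math"
)

type Dual struct {
	Value, Tangent float64
}

func add(a, b Dual) Dual { return Dual{a.Value + b.Value, a.Tangent + b.Tangent} }

func mul(a, b Dual) Dual {
	return Dual{a.Value * b.Value, a.Tangent*b.Value + a.Value*b.Tangent}
}

func sin(a Dual) Dual { return Dual{math.Sin(a.Value), math.Cos(a.Value) * a.Tangent} }

// f(x1, x2) = x1*x2 + sin(x1), written with forward-mode primitives.
func f(x []Dual) Dual {
	return add(mul(x[0], x[1]), sin(x[0]))
}

// jvp seeds input i with (x_i, v_i) and returns the output tangent J_f(x)v.
func jvp(x, v []float64) float64 {
	in := make([]Dual, len(x))
	for i := range x {
		in[i] = Dual{Value: x[i], Tangent: v[i]}
	}
	return f(in).Tangent
}

// gradient runs one forward pass per input, seeding with the basis vector e_j.
func gradient(x []float64) []float64 {
	g := make([]float64, len(x))
	for j := range x {
		e := make([]float64, len(x))
		e[j] = 1
		g[j] = jvp(x, e)
	}
	return g
}

func main() {
	x := []float64{0, 3}
	fmt.Println(gradient(x))             // [4 0]
	fmt.Println(jvp(x, []float64{1, 1})) // 4
}
```

At $(x_1, x_2) = (0, 3)$ the partials are $x_2 + \cos x_1 = 4$ and $x_1 = 0$; seeding with $v = (1, 1)$ returns their sum in a single pass.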

### Example: composed expression

Consider

$$
f(x) = \exp(\sin x + x^2).
$$

Write it as a sequence of primitive operations:

$$
a = \sin x
$$

$$
b = x^2
$$

$$
c = a + b
$$

$$
y = \exp c
$$

Seed

$$
\dot{x} = 1.
$$

Now evaluate forward:

$$
\dot{a} = \cos x \dot{x} = \cos x
$$

$$
\dot{b} = 2x\dot{x} = 2x
$$

$$
\dot{c} = \dot{a} + \dot{b} = \cos x + 2x
$$

$$
\dot{y} = \exp(c)\dot{c}
$$

Substitute $c = \sin x + x^2$:

$$
\dot{y} =
\exp(\sin x + x^2)(\cos x + 2x).
$$

Forward evaluation has applied the chain rule one primitive at a time.

### Vector-valued primitives

Many AD systems operate on arrays and tensors rather than scalar variables. The same rule applies, but each primitive may represent a vector operation.

For

$$
z = x + y,
$$

where $x,y,z$ are vectors or tensors,

$$
\dot{z} = \dot{x} + \dot{y}.
$$

For elementwise multiplication,

$$
z = x \odot y,
$$

the rule is

$$
\dot{z} = \dot{x} \odot y + x \odot \dot{y}.
$$

For matrix multiplication,

$$
C = AB,
$$

the tangent rule is

$$
\dot{C} = \dot{A}B + A\dot{B}.
$$
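A small Go sketch of the matrix-multiplication rule, using plain `[][]float64` slices rather than any particular tensor library; the helper names are hypothetical. Computing the tangent costs two extra matrix products.

```go
package main

import "fmt"

// matMul returns the product of an m×k and a k×n matrix.
func matMul(a, b [][]float64) [][]float64 {
	m, k, n := len(a), len(b), len(b[0])
	c := make([][]float64, m)
	for i := range c {
		c[i] = make([]float64, n)
		for j := 0; j < n; j++ {
			for p := 0; p < k; p++ {
				c[i][j] += a[i][p] * b[p][j]
			}
		}
	}
	return c
}

// matAdd returns the elementwise sum of two matrices of equal shape.
func matAdd(a, b [][]float64) [][]float64 {
	c := make([][]float64, len(a))
	for i := range a {
		c[i] = make([]float64, len(a[i]))
		for j := range a[i] {
			c[i][j] = a[i][j] + b[i][j]
		}
	}
	return c
}

// matMulJVP returns C = AB and its tangent dC = dA·B + A·dB.
func matMulJVP(a, da, b, db [][]float64) (c, dc [][]float64) {
	c = matMul(a, b)
	dc = matAdd(matMul(da, b), matMul(a, db))
	return c, dc
}

func main() {
	a := [][]float64{{1, 2}, {3, 4}}
	b := [][]float64{{5, 6}, {7, 8}}
	da := [][]float64{{1, 0}, {0, 0}} // perturb A[0][0] only
	db := [][]float64{{0, 0}, {0, 0}} // B held fixed
	c, dc := matMulJVP(a, da, b, db)
	fmt.Println(c)  // [[19 22] [43 50]]
	fmt.Println(dc) // [[5 6] [0 0]]
}
```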

For a linear solve,

$$
x = A^{-1}b,
$$

equivalently

$$
Ax = b,
$$

the tangent satisfies

$$
A\dot{x} + \dot{A}x = \dot{b}.
$$

So

$$
\dot{x} = A^{-1}(\dot{b} - \dot{A}x).
$$

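The rule should be implemented by solving another linear system with right-hand side $\dot{b} - \dot{A}x$, reusing the factorization of $A$ already computed for the primal solve, not by explicitly forming $A^{-1}$.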

### Broadcasting rules

Array programs often use broadcasting. Suppose

$$
z_{ij} = x_{ij} + b_j.
$$

Then

$$
\dot{z}_{ij} = \dot{x}_{ij} + \dot{b}_j.
$$

Forward mode follows the primal broadcasting shape. The tangent of a broadcast value is broadcast in the same way.

Unlike reverse mode, where the adjoint of a broadcast input must be summed over the broadcast axes, forward mode needs no reduction to undo broadcasting. The tangent flows in the same direction as the primal computation.
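A Go sketch of the broadcasting example $z_{ij} = x_{ij} + b_j$; the function name is illustrative. The tangent `db` is indexed exactly like the primal `b`, so every row of `dz` receives the same perturbation and no reduction appears anywhere.

```go
package main

import "fmt"

// broadcastAddJVP computes z[i][j] = x[i][j] + b[j] and its tangent
// dz[i][j] = dx[i][j] + db[j], broadcasting db over rows exactly as b is.
func broadcastAddJVP(x, dx [][]float64, b, db []float64) (z, dz [][]float64) {
	z = make([][]float64, len(x))
	dz = make([][]float64, len(x))
	for i := range x {
		z[i] = make([]float64, len(b))
		dz[i] = make([]float64, len(b))
		for j := range b {
			z[i][j] = x[i][j] + b[j]
			dz[i][j] = dx[i][j] + db[j]
		}
	}
	return z, dz
}

func main() {
	x := [][]float64{{1, 2, 3}, {4, 5, 6}}
	dx := [][]float64{{0, 0, 0}, {0, 0, 0}} // x held fixed
	b := []float64{10, 20, 30}
	db := []float64{0, 1, 0} // perturb b[1] only
	z, dz := broadcastAddJVP(x, dx, b, db)
	fmt.Println(z)  // [[11 22 33] [14 25 36]]
	fmt.Println(dz) // [[0 1 0] [0 1 0]]
}
```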

### Reductions

For a reduction such as

$$
y = \sum_i x_i,
$$

the forward rule is

$$
\dot{y} = \sum_i \dot{x}_i.
$$

For a mean,

$$
y = \frac{1}{n}\sum_i x_i,
$$

the rule is

$$
\dot{y} = \frac{1}{n}\sum_i \dot{x}_i.
$$

For a product,

$$
y = \prod_i x_i,
$$

the rule can be written as

$$
\dot{y} =
\sum_i
\dot{x}_i
\prod_{j \ne i} x_j.
$$

When all $x_i$ are nonzero, this can be computed as

$$
\dot{y} =
y
\sum_i
\frac{\dot{x}_i}{x_i}.
$$

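The second form is cheaper, because it reuses $y$ instead of recomputing a product of $n - 1$ factors for every $i$, but it divides by each $x_i$. If any $x_i = 0$, it divides by zero (producing an infinite or NaN result in floating point) even though the first form is well defined, and it can lose accuracy when $y$ underflows. A robust implementation must handle zeros explicitly.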
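One way to handle zeros is to avoid division altogether. The Go sketch below computes each $\prod_{j \ne i} x_j$ from prefix and suffix products, which evaluates the first form in $O(n)$ time; for comparison, `prodJVPDivide` implements the second form. Both function names are illustrative.

```go
package main

import "fmt"

// prodJVP returns y = ∏ x_i and ẏ = Σ_i ẋ_i ∏_{j≠i} x_j.
// It never divides by x_i, so it stays correct when some x_i are zero.
func prodJVP(x, dx []float64) (y, dy float64) {
	n := len(x)
	suffix := make([]float64, n+1) // suffix[i] = x[i] * ... * x[n-1]
	suffix[n] = 1
	for i := n - 1; i >= 0; i-- {
		suffix[i] = suffix[i+1] * x[i]
	}
	prefix := 1.0 // x[0] * ... * x[i-1]
	for i := 0; i < n; i++ {
		dy += dx[i] * prefix * suffix[i+1]
		prefix *= x[i]
	}
	return suffix[0], dy
}

// prodJVPDivide uses ẏ = y Σ ẋ_i / x_i, which is valid only when every x_i ≠ 0.
func prodJVPDivide(x, dx []float64) (y, dy float64) {
	y = 1
	for _, v := range x {
		y *= v
	}
	for i := range x {
		dy += dx[i] / x[i]
	}
	return y, y * dy
}

func main() {
	x := []float64{2, 0, 5}
	dx := []float64{1, 1, 1}
	fmt.Println(prodJVP(x, dx))       // 0 10
	fmt.Println(prodJVPDivide(x, dx)) // 0 NaN
}
```

With $x = (2, 0, 5)$ and $\dot{x} = (1, 1, 1)$, the first form gives $\dot{y} = 0 + 10 + 0 = 10$, while the division-based form evaluates $0 \cdot \infty$ and returns NaN.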

### Conditionals

For conditionals, forward mode follows the branch taken by the primal execution.

```text
if x > 0:
    y = x * x
else:
    y = 0
```

If $x > 0$, then

$$
\dot{y} = 2x\dot{x}.
$$

If $x \le 0$, then

$$
\dot{y} = 0.
$$

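At the branch boundary $x = 0$, the primal takes the `else` branch, so forward mode returns $\dot{y} = 0$. Here that happens to match the true derivative, because both branches have derivative $0$ at $x = 0$. In general, the mathematical derivative at a branch boundary may be undefined or may depend on the chosen convention. Forward mode differentiates only the executed path; it does not reason about the other branches.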
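A Go sketch of the conditional, assuming the `Dual` type used elsewhere in this section. The branch condition inspects only the primal value. The hypothetical variant `kinked` replaces the `else` branch with `y = x` to show a genuine kink: at $x = 0$ the left derivative is $1$ and the right derivative is $0$, but forward mode simply reports the tangent of the executed branch.

```go
package main

import "fmt"

type Dual struct {
	Value, Tangent float64
}

// branchy implements "if x > 0 { y = x*x } else { y = 0 }" on dual numbers.
// The predicate uses only the primal value; the tangent follows that branch.
func branchy(x Dual) Dual {
	if x.Value > 0 {
		return Dual{x.Value * x.Value, 2 * x.Value * x.Tangent}
	}
	return Dual{0, 0} // the constant 0 has zero tangent
}

// kinked uses y = x in the else branch. At x = 0 the true derivative does
// not exist, yet forward mode returns the else-branch tangent.
func kinked(x Dual) Dual {
	if x.Value > 0 {
		return Dual{x.Value * x.Value, 2 * x.Value * x.Tangent}
	}
	return x
}

func main() {
	fmt.Println(branchy(Dual{3, 1}))  // {9 6}
	fmt.Println(branchy(Dual{-2, 1})) // {0 0}
	fmt.Println(kinked(Dual{0, 1}))   // {0 1}
}
```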

### Loops

Loops are handled by repeated application of local rules. Consider:

```text
y = 1
for i in 1..n:
    y = y * x
```

The primal computes

$$
y = x^n.
$$

The tangent recurrence is

$$
\dot{y}_{k+1} =
\dot{y}_k x + y_k \dot{x}.
$$

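The loop starts from $y_0 = 1$ and $\dot{y}_0 = 0$, since the initial value is a constant. By induction on $k$, $y_k = x^k$ and $\dot{y}_k = kx^{k-1}\dot{x}$, so after $n$ iterations the tangent is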

$$
\dot{y} = nx^{n-1}\dot{x}.
$$

Forward mode does not need to unroll the loop symbolically. It propagates the tangent through each executed iteration.

### Function calls

If a program calls a function,

```text
z = g(x, y)
```

then the AD system needs a forward rule for $g$. There are two common cases.

If $g$ is itself written in operations the AD system knows how to differentiate, the AD transform can enter the function body and propagate tangents through its operations.

If $g$ is a primitive or external library function, the system needs a custom rule:

$$
(z, \dot{z}) = \operatorname{jvp}_g(x, y, \dot{x}, \dot{y}).
$$

This custom rule is often called a JVP rule, because it computes a Jacobian-vector product.

### Rule correctness

A forward rule is correct when it preserves the first-order expansion of the primitive.

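For each primitive $\phi$, let $\epsilon$ be a formal infinitesimal satisfying $\epsilon^2 = 0$, as in dual-number arithmetic. The rule must satisfy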

$$
\phi(x + \epsilon\dot{x}) =
\phi(x)
+
\epsilon\dot{z}.
$$

For multiple inputs:

$$
\phi(x_1 + \epsilon\dot{x}_1,\ldots,x_k + \epsilon\dot{x}_k) =
\phi(x_1,\ldots,x_k)
+
\epsilon\dot{z}.
$$

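Because $\epsilon^2 = 0$, every higher-order term of the Taylor expansion vanishes, so the identity determines $\dot{z}$ exactly. Over the real numbers the same identity holds up to an $O(\epsilon^2)$ error, which gives a practical numerical test: compare $\dot{z}$ with a finite-difference quotient at small $\epsilon$.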
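The numerical version of this test is easy to automate. The Go sketch below (function names are illustrative) compares a rule's tangent with the central difference $(\phi(x + h\dot{x}) - \phi(x - h\dot{x}))/(2h)$, whose truncation error is $O(h^2)$ for smooth $\phi$. The step $h = 10^{-5}$ is an assumed compromise between truncation and rounding error, not a universal choice. A correct `Exp` rule agrees to within a small tolerance, while the deliberately wrong `badExp` is off by roughly $e - 1$ at $x = 1$.

```go
package main

import (
	"fmt"
	"math"
)

type Dual struct {
	Value, Tangent float64
}

// Exp is the rule under test: z = e^x, ż = e^x ẋ.
func Exp(a Dual) Dual {
	v := math.Exp(a.Value)
	return Dual{v, v * a.Tangent}
}

// badExp has a deliberately wrong tangent (the factor e^x is missing).
func badExp(a Dual) Dual {
	return Dual{math.Exp(a.Value), a.Tangent}
}

// checkRule returns |ż - central difference| for the given rule at x in
// direction xdot, using step h. Only the primal output is used for the
// finite difference.
func checkRule(rule func(Dual) Dual, x, xdot, h float64) float64 {
	ad := rule(Dual{x, xdot}).Tangent
	fd := (rule(Dual{x + h*xdot, 0}).Value - rule(Dual{x - h*xdot, 0}).Value) / (2 * h)
	return math.Abs(ad - fd)
}

func main() {
	fmt.Println(checkRule(Exp, 1.0, 1.0, 1e-5))
	fmt.Println(checkRule(badExp, 1.0, 1.0, 1e-5))
}
```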

### Implementation sketch

A minimal forward evaluator can represent values as:

```go
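package dual // illustrative package name

import "math" // used by the Sin and Exp rules below

// Dual pairs a primal value with its tangent (a directional derivative).
type Dual struct {
    Value   float64
    Tangent float64
}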
```

Primitive rules are then ordinary functions:

```go
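// Add propagates z = x + y with tangent ż = ẋ + ẏ.
func Add(a, b Dual) Dual {
    return Dual{
        Value:   a.Value + b.Value,
        Tangent: a.Tangent + b.Tangent,
    }
}

// Mul applies the product rule: ż = ẋy + xẏ.
func Mul(a, b Dual) Dual {
    return Dual{
        Value:   a.Value * b.Value,
        Tangent: a.Tangent*b.Value + a.Value*b.Tangent,
    }
}

// Sin propagates z = sin x with tangent ż = cos(x) ẋ.
func Sin(a Dual) Dual {
    return Dual{
        Value:   math.Sin(a.Value),
        Tangent: math.Cos(a.Value) * a.Tangent,
    }
}

// Exp propagates z = e^x and reuses the primal e^x for the tangent ż = e^x ẋ.
func Exp(a Dual) Dual {
    v := math.Exp(a.Value)
    return Dual{
        Value:   v,
        Tangent: v * a.Tangent,
    }
}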
```

A function written using these operations computes both value and derivative:

```go
func F(x Dual) Dual {
    return Exp(Add(Sin(x), Mul(x, x)))
}
```

Calling it with

```go
x := Dual{Value: 2, Tangent: 1}
y := F(x)
```

returns

```text
y.Value   = exp(sin(2) + 4)
y.Tangent = exp(sin(2) + 4) * (cos(2) + 4)
```

### Summary

Forward evaluation rules extend each primitive operation from primal values to primal-tangent pairs. The tangent rule is the local Jacobian of the primitive applied to the input tangents.

The whole AD computation is obtained by ordinary execution of these extended primitives. Constants receive zero tangent, inputs receive seed tangents, and every intermediate variable receives a tangent computed by the local chain rule.

