Forward mode automatic differentiation works by replacing each primitive operation with an extended operation on pairs:
Forward mode automatic differentiation works by replacing each primitive operation with an extended operation on pairs:
The first component is the primal value. The second component is the tangent value. The evaluation rule for each operation must compute both.
For a primitive operation
the forward rule is
This is the local chain rule. Each primitive consumes primal inputs and tangent inputs, then produces a primal output and a tangent output.
Unary operations
For a unary operation
the forward rule is
Examples:
| Operation | Primal rule | Tangent rule |
|---|---|---|
| Negation | ||
| Square | ||
| Reciprocal | ||
| Square root | ||
| Exponential | ||
| Logarithm | ||
| Sine | ||
| Cosine | ||
| Tangent |
These rules are local. The rule for does not depend on where came from or how will be used.
Binary operations
For a binary operation
the forward rule is
Examples:
| Operation | Primal rule | Tangent rule |
|---|---|---|
| Addition | ||
| Subtraction | ||
| Multiplication | ||
| Division | ||
| Power, constant exponent | ||
| Power, variable exponent |
The variable-exponent power rule assumes . If , real-valued semantics become restricted and implementation-specific.
Constants and input variables
Constants have zero tangent:
Input variables receive tangent seeds. For a function
to compute the derivative in direction
initialize
Then evaluate the program using forward rules. The output tangent is
To compute the derivative with respect to , use the basis seed
Example: composed expression
Consider
Write it as a sequence of primitive operations:
Seed
Now evaluate forward:
Substitute :
Forward evaluation has applied the chain rule one primitive at a time.
Vector-valued primitives
Many AD systems operate on arrays and tensors rather than scalar variables. The same rule applies, but each primitive may represent a vector operation.
For
where are vectors or tensors,
For elementwise multiplication,
the rule is
For matrix multiplication,
the tangent rule is
For a linear solve,
equivalently
the tangent satisfies
So
The rule should be implemented by solving another linear system, not by explicitly forming .
Broadcasting rules
Array programs often use broadcasting. Suppose
Then
Forward mode follows the primal broadcasting shape. The tangent of a broadcast value is broadcast in the same way.
Unlike reverse mode, forward mode usually does not need a reduction to undo broadcasting. The tangent flows in the same direction as the primal computation.
Reductions
For a reduction such as
the forward rule is
For a mean,
the rule is
For a product,
the rule can be written as
When all are nonzero, this can be computed as
The second form is shorter, but it has different numerical behavior near zero. A robust implementation must handle zeros carefully.
Conditionals
For conditionals, forward mode follows the branch taken by the primal execution.
if x > 0:
y = x * x
else:
y = 0If , then
If , then
At the branch boundary , the mathematical derivative may be undefined or may depend on the chosen convention. Forward mode differentiates the executed path. It does not automatically reason about all possible paths.
Loops
Loops are handled by repeated application of local rules. Consider:
y = 1
for i in 1..n:
y = y * xThe primal computes
The tangent recurrence is
After iterations, the tangent is
Forward mode does not need to unroll the loop symbolically. It propagates the tangent through each executed iteration.
Function calls
If a program calls a function,
z = g(x, y)then the AD system needs a forward rule for . There are two common cases.
If is written in differentiable code, the AD transform can enter the function body and propagate tangents through its operations.
If is a primitive or external library function, the system needs a custom rule:
This custom rule is often called a JVP rule, because it computes a Jacobian-vector product.
Rule correctness
A forward rule is correct when it preserves the first-order expansion of the primitive.
For each primitive , the rule must satisfy
For multiple inputs:
Because , only first-order terms remain. This gives a precise test for the rule.
Implementation sketch
A minimal forward evaluator can represent values as:
type Dual struct {
Value float64
Tangent float64
}Primitive rules are then ordinary functions:
func Add(a, b Dual) Dual {
return Dual{
Value: a.Value + b.Value,
Tangent: a.Tangent + b.Tangent,
}
}
func Mul(a, b Dual) Dual {
return Dual{
Value: a.Value * b.Value,
Tangent: a.Tangent*b.Value + a.Value*b.Tangent,
}
}
func Sin(a Dual) Dual {
return Dual{
Value: math.Sin(a.Value),
Tangent: math.Cos(a.Value) * a.Tangent,
}
}
func Exp(a Dual) Dual {
v := math.Exp(a.Value)
return Dual{
Value: v,
Tangent: v * a.Tangent,
}
}A function written using these operations computes both value and derivative:
func F(x Dual) Dual {
return Exp(Add(Sin(x), Mul(x, x)))
}Calling it with
x := Dual{Value: 2, Tangent: 1}
y := F(x)returns
y.Value = exp(sin(2) + 4)
y.Tangent = exp(sin(2) + 4) * (cos(2) + 4)Summary
Forward evaluation rules extend each primitive operation from primal values to primal-tangent pairs. The tangent rule is the local Jacobian of the primitive applied to the input tangents.
The whole AD computation is obtained by ordinary execution of these extended primitives. Constants receive zero tangent, inputs receive seed tangents, and every intermediate variable receives a tangent computed by the local chain rule.