# Minimal Forward Mode Engine
A minimal forward mode automatic differentiation engine has one job: evaluate a program while carrying both a value and its derivative. The engine does not build a graph. It does not store a tape. It computes derivatives in the same order as the original computation.
Forward mode is best understood as ordinary evaluation over a richer number type.
Instead of evaluating with real numbers, we evaluate with pairs:

$$(v, \dot v)$$

The first component $v$ is the primal value. The second component $\dot v$ is the tangent value. The tangent records how the output changes when the input is perturbed in a chosen direction.
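These pairs are the classical dual numbers: write a pair as $v + \dot v\,\varepsilon$ with the formal rule $\varepsilon^2 = 0$, and ordinary algebra reproduces the derivative rules. For multiplication, for example:

$$(a + \dot a\,\varepsilon)(b + \dot b\,\varepsilon) = ab + (\dot a\,b + a\,\dot b)\,\varepsilon + \dot a\,\dot b\,\varepsilon^2 = ab + (\dot a\,b + a\,\dot b)\,\varepsilon.$$

The coefficient of $\varepsilon$ is exactly the product rule.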
For a scalar function

$$f : \mathbb{R} \to \mathbb{R},$$

we initialize the input as

$$(x, 1).$$

Then the final tangent is exactly $f'(x)$.
For a multivariate function

$$f : \mathbb{R}^n \to \mathbb{R}^m,$$

the tangent represents a direction vector $v \in \mathbb{R}^n$. Forward mode computes the Jacobian-vector product:

$$J_f(x)\,v.$$

This is the central operation of forward mode.
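Written component by component, the $i$-th output tangent is a directional derivative:

$$\big(J_f(x)\,v\big)_i = \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}(x)\,v_j.$$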
## The Core Data Type
A minimal engine can start with a single type:
```go
type Dual struct {
    Value float64
    Deriv float64
}
```

Value is the ordinary numeric value. Deriv is the derivative with respect to the chosen seed direction.
For scalar differentiation, the input variable receives derivative 1.
```go
func Var(x float64) Dual {
    return Dual{
        Value: x,
        Deriv: 1,
    }
}
```

Constants receive derivative 0.
```go
func Const(x float64) Dual {
    return Dual{
        Value: x,
        Deriv: 0,
    }
}
```

This distinction matters. A variable changes when the input changes. A constant does not.
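As a quick sanity check (using the Mul rule defined below), multiplying a constant by a variable keeps the constant's zero tangent out of the result:

```go
// d/dx (3 * x) at x = 2: the product rule gives 0*2 + 3*1 = 3.
y := Mul(Const(3), Var(2))
fmt.Println(y.Value, y.Deriv) // 6 3
```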
## Arithmetic Rules
Each primitive operation must define how values and derivatives propagate.
For addition:
```go
func Add(a, b Dual) Dual {
    return Dual{
        Value: a.Value + b.Value,
        Deriv: a.Deriv + b.Deriv,
    }
}
```

For subtraction:

```go
func Sub(a, b Dual) Dual {
    return Dual{
        Value: a.Value - b.Value,
        Deriv: a.Deriv - b.Deriv,
    }
}
```

For multiplication:

```go
func Mul(a, b Dual) Dual {
    return Dual{
        Value: a.Value * b.Value,
        Deriv: a.Deriv*b.Value + a.Value*b.Deriv,
    }
}
```

For division:

```go
func Div(a, b Dual) Dual {
    return Dual{
        Value: a.Value / b.Value,
        Deriv: (a.Deriv*b.Value - a.Value*b.Deriv) / (b.Value * b.Value),
    }
}
```

These are just the ordinary derivative rules encoded as executable code.
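For example, the quotient rule falls out directly: differentiating $1/x$ at $x = 2$ gives $-1/x^2 = -0.25$.

```go
// d/dx (1 / x) at x = 2: (0*2 - 1*1) / (2*2) = -0.25.
y := Div(Const(1), Var(2))
fmt.Println(y.Value, y.Deriv) // 0.5 -0.25
```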
## Elementary Functions
The same pattern extends to functions such as sin, cos, exp, and log.
```go
func Sin(x Dual) Dual {
    return Dual{
        Value: math.Sin(x.Value),
        Deriv: math.Cos(x.Value) * x.Deriv,
    }
}

func Cos(x Dual) Dual {
    return Dual{
        Value: math.Cos(x.Value),
        Deriv: -math.Sin(x.Value) * x.Deriv,
    }
}

func Exp(x Dual) Dual {
    e := math.Exp(x.Value)
    return Dual{
        Value: e,
        Deriv: e * x.Deriv,
    }
}

func Log(x Dual) Dual {
    return Dual{
        Value: math.Log(x.Value),
        Deriv: x.Deriv / x.Value,
    }
}
```

Each rule has the same shape:

$$(v, \dot v) \;\mapsto\; \big(f(v),\; f'(v)\,\dot v\big).$$
Forward mode applies the chain rule locally at every operation.
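To watch the chain rule compose, differentiate $\sin(x^2)$ at $x = 2$: the inner product contributes tangent $2x = 4$, and the Sin rule multiplies it by $\cos(x^2)$, giving $4\cos(4)$ with no symbolic manipulation.

```go
// d/dx sin(x*x) at x = 2: inner tangent 2x = 4, outer factor cos(4).
x := Var(2)
fmt.Println(Sin(Mul(x, x)).Deriv) // same as 4 * math.Cos(4)
```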
## A Complete Example
Consider:

$$f(x) = x^2 + 3x + 2.$$

In Go:
```go
func F(x Dual) Dual {
    return Add(
        Add(
            Mul(x, x),
            Mul(Const(3), x),
        ),
        Const(2),
    )
}
```

Evaluate at $x = 5$:
```go
func main() {
    y := F(Var(5))
    fmt.Println("value:", y.Value)
    fmt.Println("derivative:", y.Deriv)
}
```

The result is:

```
value: 42
derivative: 13
```

The derivative is correct because:

$$f'(x) = 2x + 3,$$

and therefore:

$$f'(5) = 2 \cdot 5 + 3 = 13.$$

The engine never constructs the symbolic expression $2x + 3$. It also never estimates the derivative using finite differences. It computes the derivative exactly through the program structure, subject only to floating point arithmetic.
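To make that claim concrete, here is a sketch (not part of the engine) comparing the forward-mode tangent with a central finite difference on plain float64:

```go
// Central difference of f(x) = x^2 + 3x + 2 at x = 5, step h.
// The estimate suffers rounding error from cancellation; the
// forward-mode tangent is exact for this polynomial.
f := func(x float64) float64 { return x*x + 3*x + 2 }
h := 1e-6
fmt.Println((f(5+h) - f(5-h)) / (2 * h)) // ≈ 13
fmt.Println(F(Var(5)).Deriv)             // exactly 13
```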
## Multivariate Inputs
For a function

$$g(x, y) = xy + \sin(x),$$

we can compute partial derivatives by choosing different seeds.
```go
func G(x, y Dual) Dual {
    return Add(
        Mul(x, y),
        Sin(x),
    )
}
```

To compute $\partial g / \partial x$, seed x with 1 and y with 0.
```go
x := Dual{Value: 2, Deriv: 1}
y := Dual{Value: 3, Deriv: 0}
out := G(x, y)
```

The tangent gives:

$$\frac{\partial g}{\partial x}(2, 3) = y + \cos(x) = 3 + \cos(2).$$
To compute $\partial g / \partial y$, seed x with 0 and y with 1.

```go
x := Dual{Value: 2, Deriv: 0}
y := Dual{Value: 3, Deriv: 1}
out := G(x, y)
```

The tangent gives:

$$\frac{\partial g}{\partial y}(2, 3) = x = 2.$$
A full gradient for $f : \mathbb{R}^n \to \mathbb{R}$ requires $n$ forward passes if we use scalar tangents. Each pass seeds one input direction, as sketched below.
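A helper that loops over seed directions makes this concrete. This is a sketch, and the signature (a slice of Duals in, one Dual out) is an assumption rather than part of the minimal API above:

```go
// Gradient runs n scalar forward passes, seeding one coordinate
// direction per pass, and collects the partial derivatives.
// Hypothetical helper: assumes f maps []Dual to Dual.
func Gradient(f func([]Dual) Dual, x []float64) []float64 {
    grad := make([]float64, len(x))
    for i := range x {
        args := make([]Dual, len(x))
        for j, v := range x {
            args[j] = Dual{Value: v, Deriv: 0}
        }
        args[i].Deriv = 1 // seed the i-th input direction
        grad[i] = f(args).Deriv
    }
    return grad
}
```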
## Vector Tangents
A more general engine stores a vector of derivatives.
```go
type DualVec struct {
    Value float64
    Deriv []float64
}
```

Now one forward pass can carry multiple seed directions.
For example, for two inputs:
```go
x := DualVec{Value: 2, Deriv: []float64{1, 0}}
y := DualVec{Value: 3, Deriv: []float64{0, 1}}
```

The output derivative vector contains both partial derivatives.
This is convenient, but it changes the cost model. Every primitive operation now performs vector arithmetic on the tangent field.
Scalar tangent:

```
cost per primitive: O(1)
```

Vector tangent of width k:

```
cost per primitive: O(k)
```

The choice depends on the shape of the problem.
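The O(k) cost is visible in the primitives themselves: every rule that touched one Deriv now loops over k of them. A sketch of the multiplication rule for DualVec (assuming both operands carry tangents of the same width):

```go
// MulVec applies the product rule componentwise across the
// tangent vector: one loop of width k per primitive operation.
func MulVec(a, b DualVec) DualVec {
    deriv := make([]float64, len(a.Deriv))
    for i := range deriv {
        deriv[i] = a.Deriv[i]*b.Value + a.Value*b.Deriv[i]
    }
    return DualVec{
        Value: a.Value * b.Value,
        Deriv: deriv,
    }
}
```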
## Minimal Engine Interface
A small Go-style API can expose only the essential operations:
```go
type Dual struct {
    Value float64
    Deriv float64
}

func Var(x float64) Dual
func Const(x float64) Dual

func Add(a, b Dual) Dual
func Sub(a, b Dual) Dual
func Mul(a, b Dual) Dual
func Div(a, b Dual) Dual

func Sin(x Dual) Dual
func Cos(x Dual) Dual
func Exp(x Dual) Dual
func Log(x Dual) Dual
```

This is enough to differentiate many scalar programs. More functions can be added incrementally.
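Each addition follows the same two-line pattern: compute the primal, then multiply the incoming tangent by the local derivative. A square root primitive might look like this (one possible extension, not part of the interface above):

```go
// Sqrt extends the engine with d/dx sqrt(x) = 1 / (2*sqrt(x)).
func Sqrt(x Dual) Dual {
    s := math.Sqrt(x.Value)
    return Dual{
        Value: s,
        Deriv: x.Deriv / (2 * s),
    }
}
```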
The important design rule is that every primitive must preserve the invariant:
```
Dual.Value = primal value
Dual.Deriv = derivative of primal value with respect to the seed
```

Once this invariant holds for constants, variables, and primitive operations, it holds for every expression built from them.
## Why Forward Mode Is Simple
Forward mode is simple because the derivative flows in the same direction as evaluation.
Original program:

```
inputs -> intermediate values -> output
```

Forward mode:

```
input tangents -> intermediate tangents -> output tangent
```

There is no need to revisit earlier operations. There is no backward pass. There is no tape. The engine can run with constant extra memory per active value.
This makes forward mode attractive for:
| Problem shape | Why forward mode fits |
|---|---|
| Few inputs, many outputs | One pass propagates a seed direction to all outputs |
| Jacobian-vector products | Directly computed |
| Local sensitivity analysis | Cheap for selected directions |
| Embedded systems | Simple memory model |
| Small numerical kernels | Low implementation overhead |
| Higher-order Taylor methods | Natural extension through richer number types |
## Limitations
Forward mode becomes expensive when the input dimension is large and the output dimension is small.
For a function:

$$f : \mathbb{R}^n \to \mathbb{R},$$

a full gradient needs $n$ scalar forward passes, or one pass with tangent width $n$. Either way, the work scales with the number of inputs.
This is why reverse mode dominates deep learning. Neural networks often have millions or billions of parameters and a scalar loss. Reverse mode can compute the full gradient with work comparable to a small constant multiple of the primal evaluation.
Forward mode remains valuable because it is predictable, local, and easy to implement. It is also the natural primitive for Jacobian-vector products, higher-order methods, and testing reverse-mode systems.
## Minimal Correctness Argument
The correctness proof is structural.
For each expression $e$, the engine computes:

$$\big(e(x),\; \nabla e(x) \cdot v\big),$$

where $v$ is the seed direction.
For variables, this holds by construction. For constants, the derivative is zero. For each primitive operation, the implementation applies the corresponding derivative rule. For composition, the tangent propagation is exactly the chain rule.
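Spelling out one composition step: if an intermediate value $u = f(x)$ carries tangent $\dot u = f'(x)\,\dot x$, then applying the rule for $g$ yields

$$\big(g(u),\; g'(u)\,\dot u\big) = \Big(g(f(x)),\; g'(f(x))\,f'(x)\,\dot x\Big),$$

which is exactly the chain rule for $g \circ f$. Induction over the expression tree does the rest.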
Therefore every expression built from supported primitives has a correct forward-mode derivative.
The engine is small because automatic differentiation does not require symbolic algebra. It only requires local derivative rules and ordinary program evaluation.