Dual numbers give forward mode automatic differentiation a compact algebraic form. Instead of storing a value and a tangent as two unrelated fields, we package them into one object:
$$a + b\varepsilon$$
where $\varepsilon$ is a formal symbol satisfying
$$\varepsilon^2 = 0, \qquad \varepsilon \neq 0.$$
The number $a$ is the primal value. The number $b$ is the tangent. The symbol $\varepsilon$ marks the tangent part.
A dual number is therefore a first-order approximation stored as an algebraic value:
$$\text{value} + \text{tangent} \cdot \varepsilon.$$
It behaves like an ordinary number, except all terms involving $\varepsilon^2$ vanish.
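Why $\varepsilon^2 = 0$
The rule $\varepsilon^2 = 0$ means dual numbers keep only first-order information. This mirrors the first-order Taylor expansion:
$$f(x + h) = f(x) + f'(x)\,h + O(h^2).$$
Dual numbers replace the small perturbation $h$ with $\varepsilon$. Since $\varepsilon^2 = 0$, every second-order and higher-order term disappears exactly.
So
$$f(x + \varepsilon) = f(x) + f'(x)\,\varepsilon.$$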
This is the central identity behind forward mode AD.
Arithmetic with dual numbers
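Let
$$u = a + b\varepsilon, \qquad v = c + d\varepsilon.$$
Addition is componentwise:
$$u + v = (a + c) + (b + d)\varepsilon.$$
Multiplication follows ordinary algebra, then removes the $\varepsilon^2$ term:
$$uv = (a + b\varepsilon)(c + d\varepsilon) = ac + ad\,\varepsilon + bc\,\varepsilon + bd\,\varepsilon^2.$$
Since $\varepsilon^2 = 0$,
$$uv = ac + (ad + bc)\varepsilon.$$
The tangent part is exactly the product rule.
Division works similarly. For
$$\frac{u}{v} = \frac{a + b\varepsilon}{c + d\varepsilon}, \qquad c \neq 0,$$
the dual result is
$$\frac{u}{v} = \frac{a}{c} + \frac{bc - ad}{c^2}\,\varepsilon.$$
The tangent part is exactly the quotient rule.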
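The four rules above can be checked numerically. The following Go program is a minimal sketch, not a reference implementation: the `Dual` type and the function names `Add`, `Sub`, `Mul`, and `Div` are assumptions chosen for illustration, and `Div` does not guard against a zero denominator.

```go
package main

import "fmt"

// Dual is a hypothetical value/tangent pair representing a + b*eps.
type Dual struct {
	Value   float64 // primal part a
	Tangent float64 // tangent part b
}

// Add and Sub act componentwise.
func Add(u, v Dual) Dual { return Dual{u.Value + v.Value, u.Tangent + v.Tangent} }
func Sub(u, v Dual) Dual { return Dual{u.Value - v.Value, u.Tangent - v.Tangent} }

// Mul drops the eps^2 term, which leaves the product rule in the tangent.
func Mul(u, v Dual) Dual {
	return Dual{u.Value * v.Value, u.Tangent*v.Value + u.Value*v.Tangent}
}

// Div applies the quotient rule. Like plain float64 division,
// it does not check for v.Value == 0.
func Div(u, v Dual) Dual {
	return Dual{
		Value:   u.Value / v.Value,
		Tangent: (u.Tangent*v.Value - u.Value*v.Tangent) / (v.Value * v.Value),
	}
}

func main() {
	eps := Dual{Value: 0, Tangent: 1}
	fmt.Println(Mul(eps, eps)) // {0 0}: eps*eps vanishes

	u := Dual{Value: 6, Tangent: 1}
	v := Dual{Value: 2, Tangent: 3}
	fmt.Println(Mul(u, v)) // {12 20}: tangent 1*2 + 6*3
	fmt.Println(Div(u, v)) // {3 -4}: tangent (1*2 - 6*3) / 4
}
```

Multiplying the pair representing $\varepsilon$ by itself gives zero in both components, which is the defining relation expressed in code.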
Elementary functions
Dual numbers extend ordinary elementary functions by Taylor expansion.
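For a smooth scalar function $f$,
$$f(a + b\varepsilon) = f(a) + f'(a)\,b\,\varepsilon.$$
For example:
$$\sin(a + b\varepsilon) = \sin a + (b \cos a)\,\varepsilon,$$
$$\exp(a + b\varepsilon) = e^{a} + (b\,e^{a})\,\varepsilon,$$
$$\log(a + b\varepsilon) = \log a + \frac{b}{a}\,\varepsilon \qquad (a > 0).$$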
Every primitive operation exposes both its value rule and its derivative rule.
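One way to express the pairing of a value rule with a derivative rule is a small lifting helper. The sketch below is an illustration under assumptions: `Lift` is a hypothetical name, not part of any library, and the caller is trusted to supply the correct derivative for each function.

```go
package main

import (
	"fmt"
	"math"
)

// Dual is a hypothetical value/tangent pair representing a + b*eps.
type Dual struct {
	Value   float64
	Tangent float64
}

// Lift turns a scalar function f and its derivative df into a function on
// dual numbers, using f(a + b*eps) = f(a) + f'(a)*b*eps.
func Lift(f, df func(float64) float64) func(Dual) Dual {
	return func(x Dual) Dual {
		return Dual{Value: f(x.Value), Tangent: df(x.Value) * x.Tangent}
	}
}

// Primitives built from the standard math package.
var (
	Sin = Lift(math.Sin, math.Cos)
	Exp = Lift(math.Exp, math.Exp)
	Log = Lift(math.Log, func(a float64) float64 { return 1 / a })
)

func main() {
	x := Dual{Value: 1, Tangent: 1}

	// log(1) has value 0 and derivative 1/1 = 1.
	fmt.Println(Log(x))

	// Composing lifted functions applies the chain rule:
	// the tangent of sin(exp(x)) is cos(exp(x)) * exp(x).
	fmt.Println(Sin(Exp(x)))
}
```

Because each lifted primitive multiplies by the incoming tangent, composition needs no extra machinery.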
Example
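Let
$$f(x) = x^2 + 3x + 1.$$
Evaluate it on the dual input
$$x + \varepsilon.$$
This corresponds to primal input $x$ and tangent seed $1$.
Now compute:
$$f(x + \varepsilon) = (x + \varepsilon)^2 + 3(x + \varepsilon) + 1.$$
Expand:
$$(x + \varepsilon)^2 = x^2 + 2x\varepsilon + \varepsilon^2.$$
Since $\varepsilon^2 = 0$,
$$(x + \varepsilon)^2 = x^2 + 2x\varepsilon.$$
Then
$$f(x + \varepsilon) = x^2 + 2x\varepsilon + 3x + 3\varepsilon + 1.$$
Collect primal and tangent parts:
$$f(x + \varepsilon) = (x^2 + 3x + 1) + (2x + 3)\varepsilon.$$
The primal part is $f(x) = x^2 + 3x + 1$. The tangent part is $f'(x) = 2x + 3$.
At $x = 2$,
$$f(2 + \varepsilon) = 11 + 7\varepsilon.$$
So the function value is $11$, and the derivative is $7$.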
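A hand calculation of this kind can be mirrored in code. The sketch below takes the polynomial $x^2 + 3x + 1$ as an illustrative choice and evaluates it at the dual input $2 + \varepsilon$. The names `Dual`, `Add`, `Mul`, and `Const` are assumptions made for this example.

```go
package main

import "fmt"

// Dual is a hypothetical value/tangent pair representing a + b*eps.
type Dual struct {
	Value   float64
	Tangent float64
}

func Add(a, b Dual) Dual {
	return Dual{Value: a.Value + b.Value, Tangent: a.Tangent + b.Tangent}
}

func Mul(a, b Dual) Dual {
	return Dual{Value: a.Value * b.Value, Tangent: a.Tangent*b.Value + a.Value*b.Tangent}
}

// Const lifts a plain number. Constants do not depend on the input,
// so their tangent is zero.
func Const(c float64) Dual { return Dual{Value: c} }

// f evaluates x*x + 3*x + 1 in dual arithmetic.
func f(x Dual) Dual {
	return Add(Add(Mul(x, x), Mul(Const(3), x)), Const(1))
}

func main() {
	x := Dual{Value: 2, Tangent: 1} // tangent seed 1 means dx/dx = 1
	y := f(x)
	fmt.Println(y.Value, y.Tangent) // 11 7
}
```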
Directional derivatives with dual numbers
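For a function
$$f : \mathbb{R}^n \to \mathbb{R}^m$$
we seed each input variable with a tangent component:
$$x_i \mapsto x_i + v_i\,\varepsilon.$$
The program then computes
$$f(x + v\varepsilon) = f(x) + J_f(x)\,v\,\varepsilon.$$
The coefficient of $\varepsilon$ is the Jacobian-vector product.
For example, let
$$f(x, y) = xy + \sin x.$$
Use the seeded inputs
$$x + v_x\varepsilon, \qquad y + v_y\varepsilon.$$
Then
$$f(x + v_x\varepsilon,\; y + v_y\varepsilon) = (x + v_x\varepsilon)(y + v_y\varepsilon) + \sin(x + v_x\varepsilon).$$
The product term gives
$$xy + (v_x y + x v_y)\,\varepsilon.$$
The sine term gives
$$\sin x + (v_x \cos x)\,\varepsilon.$$
So
$$f(x + v_x\varepsilon,\; y + v_y\varepsilon) = (xy + \sin x) + (v_x y + x v_y + v_x \cos x)\,\varepsilon.$$
The tangent is
$$v_x y + x v_y + v_x \cos x.$$
This equals
$$\frac{\partial f}{\partial x}\,v_x + \frac{\partial f}{\partial y}\,v_y = (y + \cos x)\,v_x + x\,v_y.$$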
Implementation form
A dual number can be represented as a pair:
type Dual struct {
    Value   float64 // primal value a
    Tangent float64 // tangent coefficient b in a + b*eps
}
Addition:
func Add(a, b Dual) Dual {
    return Dual{
        Value:   a.Value + b.Value,
        Tangent: a.Tangent + b.Tangent,
    }
}
Multiplication:
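func Mul(a, b Dual) Dual {
    return Dual{
        Value:   a.Value * b.Value,
        Tangent: a.Tangent*b.Value + a.Value*b.Tangent, // product rule
    }
}
Sine (this function and the exponential after it assume Go's standard math package is imported):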
func Sin(a Dual) Dual {
    return Dual{
        Value:   math.Sin(a.Value),
        Tangent: math.Cos(a.Value) * a.Tangent, // chain rule
    }
}
Exponentiation:
func Exp(a Dual) Dual {
    v := math.Exp(a.Value) // exp is its own derivative, so reuse the value
    return Dual{
        Value:   v,
        Tangent: v * a.Tangent,
    }
}
This representation is enough to build a small forward mode AD system. A user writes ordinary numerical code, but the inputs are dual numbers instead of plain floating point numbers. The overloaded operations then propagate derivatives automatically. In Go, which has no operator overloading, this means calling functions such as Add and Mul explicitly.
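To illustrate what "ordinary numerical code" can look like, the sketch below runs Horner's rule, a plain loop, over dual numbers. The function name `Horner` and the test polynomial are assumptions chosen for this example. The loop itself contains no derivative logic.

```go
package main

import "fmt"

// Dual is a value/tangent pair representing a + b*eps.
type Dual struct {
	Value   float64
	Tangent float64
}

func Add(a, b Dual) Dual {
	return Dual{Value: a.Value + b.Value, Tangent: a.Tangent + b.Tangent}
}

func Mul(a, b Dual) Dual {
	return Dual{Value: a.Value * b.Value, Tangent: a.Tangent*b.Value + a.Value*b.Tangent}
}

// Horner evaluates coeffs[0] + coeffs[1]*x + coeffs[2]*x^2 + ... using
// Horner's rule. Coefficients are constants, so they enter with zero tangent.
func Horner(coeffs []float64, x Dual) Dual {
	acc := Dual{}
	for i := len(coeffs) - 1; i >= 0; i-- {
		acc = Add(Mul(acc, x), Dual{Value: coeffs[i]})
	}
	return acc
}

func main() {
	// p(x) = 1 + 2x + 3x^2, so p'(x) = 2 + 6x.
	y := Horner([]float64{1, 2, 3}, Dual{Value: 2, Tangent: 1})
	fmt.Println(y.Value, y.Tangent) // 17 14
}
```

The derivative falls out of the same loop iterations that compute the value, at roughly twice the arithmetic cost.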
Multiple tangent directions
A scalar dual number stores one tangent direction. To propagate several directions in one pass, replace the scalar tangent with a vector:
type DualVec struct {
    Value   float64   // scalar primal value
    Tangent []float64 // one entry per tangent direction
}
Now the value is still scalar, but the tangent records several directional derivatives at once.
If the tangent vector has length $k$, one execution computes $k$ Jacobian-vector products. This is often called vector forward mode.
For example, to compute the full gradient of a scalar function
$$f : \mathbb{R}^n \to \mathbb{R}$$
one can seed all $n$ basis directions at once by giving each input a tangent vector:
$$x_i \mapsto x_i + e_i\,\varepsilon, \qquad i = 1, \dots, n,$$
where $e_i$ is the $i$-th standard basis vector of $\mathbb{R}^n$.
The output tangent vector then contains the gradient components.
This is practical when $n$ is small or moderate. For very large $n$, reverse mode is usually preferred for scalar outputs.
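The basis-seeding idea can be sketched as follows. The helper names `Seed`, `AddVec`, and `MulVec` are assumptions made for this example, and the code assumes all tangent slices have the same length.

```go
package main

import "fmt"

// DualVec carries a scalar value and one tangent entry per direction.
type DualVec struct {
	Value   float64
	Tangent []float64
}

// Seed builds input number i (0-based) of n inputs, with the basis
// vector e_i as its tangent.
func Seed(x float64, i, n int) DualVec {
	t := make([]float64, n)
	t[i] = 1
	return DualVec{Value: x, Tangent: t}
}

// AddVec adds values and tangents componentwise.
func AddVec(a, b DualVec) DualVec {
	t := make([]float64, len(a.Tangent))
	for k := range t {
		t[k] = a.Tangent[k] + b.Tangent[k]
	}
	return DualVec{Value: a.Value + b.Value, Tangent: t}
}

// MulVec applies the product rule independently in every direction.
func MulVec(a, b DualVec) DualVec {
	t := make([]float64, len(a.Tangent))
	for k := range t {
		t[k] = a.Tangent[k]*b.Value + a.Value*b.Tangent[k]
	}
	return DualVec{Value: a.Value * b.Value, Tangent: t}
}

func main() {
	// f(x, y, z) = x*y + y*z has gradient (y, x + z, y).
	x, y, z := Seed(2, 0, 3), Seed(3, 1, 3), Seed(5, 2, 3)
	out := AddVec(MulVec(x, y), MulVec(y, z))
	fmt.Println(out.Value, out.Tangent) // 21 [3 7 3]
}
```

Each operation now costs work proportional to the number of directions, which is why this approach suits a small or moderate number of inputs.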
Dual numbers versus finite differences
Dual numbers may look similar to finite differences because both involve perturbing the input. The difference is fundamental.
Finite differences evaluate
$$\frac{f(x + h) - f(x)}{h}$$
for a small floating point number $h$. The result depends on the choice of $h$. If $h$ is too large, truncation error dominates. If $h$ is too small, roundoff error dominates.
Dual numbers use a formal perturbation $\varepsilon$ with $\varepsilon^2 = 0$. There is no small numerical step. The derivative is carried exactly through the arithmetic rules of the program, subject only to the normal floating point errors of the primal and tangent computations.
So dual numbers avoid the step-size problem of finite differences.
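The step-size trade-off can be observed directly. The sketch below compares a forward difference of the exponential function at several step sizes against the dual-number tangent. The helper names `FDError` and `DualError` are assumptions, and `math.Exp(x)` is taken as the reference derivative. The printed finite-difference errors depend on the platform's floating point behavior, so no expected output is given for them.

```go
package main

import (
	"fmt"
	"math"
)

// Dual is a value/tangent pair representing a + b*eps.
type Dual struct {
	Value   float64
	Tangent float64
}

func Exp(a Dual) Dual {
	v := math.Exp(a.Value)
	return Dual{Value: v, Tangent: v * a.Tangent}
}

// FDError returns the absolute error of the forward difference
// (exp(x+h) - exp(x)) / h against the reference derivative math.Exp(x).
func FDError(x, h float64) float64 {
	fd := (math.Exp(x+h) - math.Exp(x)) / h
	return math.Abs(fd - math.Exp(x))
}

// DualError returns the same error for the dual-number tangent.
func DualError(x float64) float64 {
	d := Exp(Dual{Value: x, Tangent: 1})
	return math.Abs(d.Tangent - math.Exp(x))
}

func main() {
	x := 1.0
	for _, h := range []float64{1e-1, 1e-4, 1e-8, 1e-12, 1e-15} {
		fmt.Printf("h = %-6g finite-difference error = %.3g\n", h, FDError(x, h))
	}
	fmt.Printf("dual-number error = %.3g\n", DualError(x))
}
```

The finite-difference error typically shrinks as the step decreases and then grows again once roundoff takes over, while the dual-number tangent has no step to tune.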
Dual numbers versus symbolic differentiation
Dual numbers also differ from symbolic differentiation. Symbolic differentiation constructs an expression for the derivative. This expression may become large and difficult to simplify.
Dual numbers execute the original program once with extended arithmetic. They compute derivative values, not derivative formulas. The derivative computation follows the same structure as the primal computation.
This is why dual numbers are well suited to program differentiation. They do not require the whole program to be converted into a symbolic expression.
Algebraic meaning
The dual numbers form the algebra
$$\mathbb{R}[\varepsilon] / (\varepsilon^2).$$
This means polynomials in $\varepsilon$, but with the relation $\varepsilon^2 = 0$. Every element reduces to the form
$$a + b\varepsilon.$$
The primal value is the constant coefficient. The tangent value is the first-order coefficient.
Forward mode AD can be seen as evaluating a program over this algebra instead of over ordinary real numbers. If the original program computes over $\mathbb{R}$, the differentiated program computes over dual numbers.
This view explains why ordinary arithmetic rules automatically become derivative propagation rules. The chain rule is built into composition over the dual number algebra.
Practical limitations
Dual numbers work cleanly for smooth operations. Care is needed for operations that are discontinuous, non-smooth, or undefined at some inputs.
For example:
$$f(x) = |x|$$
has no derivative at $x = 0$. A dual-number implementation must choose what to do at that point. It may return an error, return a conventional subgradient, or follow the derivative of the branch taken by the program.
Conditionals introduce similar issues. If a program contains
if x > 0 {
    y = x
} else {
    y = 0
}
then the derivative follows the executed branch. At the boundary $x = 0$, the mathematical derivative may be undefined even though the program still returns a value.
Thus dual numbers provide exact first-order propagation through the executed operations, but they do not remove the mathematical difficulties of non-smooth programs.
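The choices described above have to be made explicitly in code. The sketch below shows one possible convention, stated as an assumption rather than a standard: `Abs` returns tangent 0 at the kink, and `ReLU` follows whichever branch of the conditional is executed.

```go
package main

import (
	"fmt"
	"math"
)

// Dual is a value/tangent pair representing a + b*eps.
type Dual struct {
	Value   float64
	Tangent float64
}

// Abs follows the derivative of the branch taken. At Value == 0 the true
// derivative does not exist, and this sketch picks the subgradient 0.
func Abs(a Dual) Dual {
	switch {
	case a.Value > 0:
		return a
	case a.Value < 0:
		return Dual{Value: -a.Value, Tangent: -a.Tangent}
	case a.Value == 0:
		return Dual{Value: 0, Tangent: 0} // convention chosen for this sketch
	default:
		// Only NaN reaches this branch, so propagate it.
		return Dual{Value: math.NaN(), Tangent: math.NaN()}
	}
}

// ReLU mirrors the conditional "if x > 0 { y = x } else { y = 0 }".
// The tangent comes from the executed branch, so at x = 0 it is 0.
func ReLU(x Dual) Dual {
	if x.Value > 0 {
		return x
	}
	return Dual{} // the constant 0 has zero tangent
}

func main() {
	fmt.Println(Abs(Dual{Value: -2, Tangent: 1})) // {2 -1}
	fmt.Println(Abs(Dual{Value: 0, Tangent: 1}))  // {0 0}
	fmt.Println(ReLU(Dual{Value: 3, Tangent: 1})) // {3 1}
	fmt.Println(ReLU(Dual{Value: 0, Tangent: 1})) // {0 0}
}
```

A different implementation could return an error at the kink instead. What matters is that the behavior at non-smooth points is a documented decision and not an accident of branch order.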
Summary
Dual numbers are the algebraic core of forward mode automatic differentiation. A value
$$a + b\varepsilon$$
stores both a primal value and a tangent value. The rule
$$\varepsilon^2 = 0$$
removes all higher-order terms, leaving exactly the first-order derivative information.
Evaluating a program on dual numbers computes both the original output and the directional derivative in one execution. This makes dual numbers one of the simplest and most precise ways to implement forward mode AD.