
Dual Numbers

Dual numbers give forward mode automatic differentiation a compact algebraic form. Instead of storing a value and a tangent as two unrelated fields, we package them into one object:

x + \epsilon \dot{x}

where \epsilon is a formal symbol satisfying

\epsilon^2 = 0.

The number x is the primal value. The number \dot{x} is the tangent. The symbol \epsilon marks the tangent part.

A dual number is therefore a first-order approximation stored as an algebraic value:

x + \epsilon \dot{x}.

It behaves like an ordinary number, except that all terms involving \epsilon^2 vanish.

Why \epsilon^2 = 0

The rule \epsilon^2 = 0 means dual numbers keep only first-order information. This mirrors the first-order Taylor expansion:

f(x + h) = f(x) + f'(x)h + O(h^2).

Dual numbers replace the small perturbation h with \epsilon \dot{x}. Since \epsilon^2 = 0, every second-order and higher-order term disappears exactly.

So

f(x + \epsilon \dot{x}) = f(x) + \epsilon f'(x)\dot{x}.

This is the central identity behind forward mode AD.

Arithmetic with dual numbers

Let

a = x + \epsilon \dot{x}, \qquad b = y + \epsilon \dot{y}.

Addition is componentwise:

a + b = (x + y) + \epsilon(\dot{x} + \dot{y}).

Multiplication follows ordinary algebra, then removes the \epsilon^2 term:

ab = (x + \epsilon \dot{x})(y + \epsilon \dot{y}) = xy + \epsilon x\dot{y} + \epsilon y\dot{x} + \epsilon^2\dot{x}\dot{y}.

Since \epsilon^2 = 0,

ab = xy + \epsilon(x\dot{y} + y\dot{x}).

The tangent part is exactly the product rule.

Division works similarly. For

z = \frac{x}{y},

the dual result is

\frac{x + \epsilon\dot{x}}{y + \epsilon\dot{y}} = \frac{x}{y} + \epsilon \frac{\dot{x}y - x\dot{y}}{y^2}.

The tangent part is exactly the quotient rule.
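This rule translates directly into a division primitive. Here is a minimal sketch in Go, using the same pair representation as the `Dual` struct defined later in the implementation section:

```go
package main

import "fmt"

// Dual holds a primal value and its tangent.
type Dual struct {
	Value   float64
	Tangent float64
}

// Div applies the quotient rule: (x/y)' = (x'y - xy') / y^2.
func Div(a, b Dual) Dual {
	return Dual{
		Value:   a.Value / b.Value,
		Tangent: (a.Tangent*b.Value - a.Value*b.Tangent) / (b.Value * b.Value),
	}
}

func main() {
	// d/dx (x / 2) at x = 3: value 1.5, derivative 0.5.
	fmt.Println(Div(Dual{3, 1}, Dual{2, 0})) // {1.5 0.5}
}
```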

Elementary functions

Dual numbers extend ordinary elementary functions by Taylor expansion.

For a smooth scalar function f,

f(x + \epsilon \dot{x}) = f(x) + \epsilon f'(x)\dot{x}.

For example:

\sin(x + \epsilon\dot{x}) = \sin x + \epsilon \cos x \, \dot{x}.

\exp(x + \epsilon\dot{x}) = \exp x + \epsilon \exp x \, \dot{x}.

\log(x + \epsilon\dot{x}) = \log x + \epsilon \frac{\dot{x}}{x}.

\sqrt{x + \epsilon\dot{x}} = \sqrt{x} + \epsilon \frac{\dot{x}}{2\sqrt{x}}.

Every primitive operation exposes both its value rule and its derivative rule.

Example

Let

f(x) = x^2 + 3x.

Evaluate it on the dual input

x + \epsilon.

This corresponds to primal input x and tangent seed \dot{x} = 1.

Now compute:

f(x + \epsilon) = (x + \epsilon)^2 + 3(x + \epsilon).

Expand:

(x + \epsilon)^2 = x^2 + 2\epsilon x + \epsilon^2.

Since \epsilon^2 = 0,

(x + \epsilon)^2 = x^2 + 2\epsilon x.

Then

f(x + \epsilon) = x^2 + 2\epsilon x + 3x + 3\epsilon.

Collect primal and tangent parts:

f(x + \epsilon) = (x^2 + 3x) + \epsilon(2x + 3).

The primal part is f(x). The tangent part is f'(x).

At x = 5,

f(5 + \epsilon) = 40 + 13\epsilon.

So the function value is 40, and the derivative is 13.
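The same computation can be carried out mechanically in code. The following Go sketch repeats the pair representation and the Add/Mul rules described below, then evaluates f on the seeded input:

```go
package main

import "fmt"

// Dual is a (value, tangent) pair with epsilon^2 = 0.
type Dual struct {
	Value   float64
	Tangent float64
}

// Add is componentwise; Mul applies the product rule.
func Add(a, b Dual) Dual {
	return Dual{a.Value + b.Value, a.Tangent + b.Tangent}
}

func Mul(a, b Dual) Dual {
	return Dual{a.Value * b.Value, a.Tangent*b.Value + a.Value*b.Tangent}
}

// f(x) = x^2 + 3x, written against dual arithmetic.
// Constants like 3 carry a zero tangent.
func f(x Dual) Dual {
	return Add(Mul(x, x), Mul(Dual{3, 0}, x))
}

func main() {
	// Seed x = 5 with tangent 1 to get f(5) and f'(5) in one pass.
	y := f(Dual{5, 1})
	fmt.Println(y.Value, y.Tangent) // 40 13
}
```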

Directional derivatives with dual numbers

For a function

f : \mathbb{R}^n \to \mathbb{R}^m,

we seed each input variable with a tangent component:

x_i \mapsto x_i + \epsilon \dot{x}_i.

The program then computes

f(x + \epsilon \dot{x}) = f(x) + \epsilon J_f(x)\dot{x}.

The coefficient of \epsilon is the Jacobian-vector product.

For example, let

f(x, y) = xy + \sin x.

Use the seeded inputs

x + \epsilon \dot{x}, \qquad y + \epsilon \dot{y}.

Then

f(x + \epsilon\dot{x}, y + \epsilon\dot{y}) = (x + \epsilon\dot{x})(y + \epsilon\dot{y}) + \sin(x + \epsilon\dot{x}).

The product term gives

xy + \epsilon(x\dot{y} + y\dot{x}).

The sine term gives

\sin x + \epsilon \cos x \, \dot{x}.

So

f(x + \epsilon\dot{x}, y + \epsilon\dot{y}) = xy + \sin x + \epsilon(x\dot{y} + y\dot{x} + \cos x \, \dot{x}).

The tangent is

\dot{f} = (y + \cos x)\dot{x} + x\dot{y}.

This equals

J_f(x, y) \begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix}.

Implementation form

A dual number can be represented as a pair:

type Dual struct {
    Value   float64
    Tangent float64
}

Addition:

func Add(a, b Dual) Dual {
    return Dual{
        Value:   a.Value + b.Value,
        Tangent: a.Tangent + b.Tangent,
    }
}

Multiplication:

func Mul(a, b Dual) Dual {
    return Dual{
        Value:   a.Value * b.Value,
        Tangent: a.Tangent*b.Value + a.Value*b.Tangent,
    }
}

Sine:

func Sin(a Dual) Dual {
    return Dual{
        Value:   math.Sin(a.Value),
        Tangent: math.Cos(a.Value) * a.Tangent,
    }
}

Exponentiation:

func Exp(a Dual) Dual {
    v := math.Exp(a.Value)
    return Dual{
        Value:   v,
        Tangent: v * a.Tangent,
    }
}

This representation is enough to build a small forward mode AD system. A user writes ordinary numerical code, but the inputs are dual numbers instead of plain floating point numbers. The overloaded operations then propagate derivatives automatically.

Multiple tangent directions

A scalar dual number stores one tangent direction. To propagate several directions in one pass, replace the scalar tangent with a vector:

type DualVec struct {
    Value   float64
    Tangent []float64
}

Now the value is still scalar, but the tangent records several directional derivatives at once.

If the tangent vector has length k, one execution computes k Jacobian-vector products. This is often called vector forward mode.

For example, to compute the full gradient of a scalar function

f : \mathbb{R}^n \to \mathbb{R},

one can seed all n basis directions at once by giving each input a tangent vector:

x_1 \mapsto x_1 + \epsilon e_1, \quad x_2 \mapsto x_2 + \epsilon e_2, \quad \dots, \quad x_n \mapsto x_n + \epsilon e_n.

The output tangent vector then contains the gradient components.

This is practical when n is small or moderate. For very large n, reverse mode is usually preferred for scalar outputs.

Dual numbers versus finite differences

Dual numbers may look similar to finite differences because both involve perturbing the input. The difference is fundamental.

Finite differences evaluate

\frac{f(x + h) - f(x)}{h}

for a small floating point number h. The result depends on the choice of h. If h is too large, truncation error dominates. If h is too small, roundoff error dominates.

Dual numbers use a formal perturbation \epsilon with \epsilon^2 = 0. There is no small numerical step. The derivative is carried exactly through the arithmetic rules of the program, subject only to the normal floating point errors of the primal and tangent computations.

So dual numbers avoid the step-size problem of finite differences.

Dual numbers versus symbolic differentiation

Dual numbers also differ from symbolic differentiation. Symbolic differentiation constructs an expression for the derivative. This expression may become large and difficult to simplify.

Dual numbers execute the original program once with extended arithmetic. They compute derivative values, not derivative formulas. The derivative computation follows the same structure as the primal computation.

This is why dual numbers are well suited to program differentiation. They do not require the whole program to be converted into a symbolic expression.

Algebraic meaning

The dual numbers form the algebra

\mathbb{R}[\epsilon] / (\epsilon^2).

This means polynomials in \epsilon, but with the relation \epsilon^2 = 0. Every element reduces to the form

a + \epsilon b.

The primal value a is the constant coefficient. The tangent value b is the first-order coefficient.

Forward mode AD can be seen as evaluating a program over this algebra instead of over ordinary real numbers. If the original program computes over \mathbb{R}, the differentiated program computes over dual numbers.

This view explains why ordinary arithmetic rules automatically become derivative propagation rules. The chain rule is built into composition over the dual number algebra.

Practical limitations

Dual numbers work cleanly for smooth operations. Care is needed for operations that are discontinuous, non-smooth, or undefined at some inputs.

For example:

|x|

has no derivative at x = 0. A dual-number implementation must choose what to do at that point. It may return an error, return a conventional subgradient, or follow the derivative of the branch taken by the program.
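One possible convention is to follow the branch the primal value selects, picking the right-hand derivative at exactly zero. A hedged Go sketch (other implementations may return 0 at zero or signal an error instead):

```go
package main

import "fmt"

type Dual struct {
	Value   float64
	Tangent float64
}

// Abs follows the derivative of the branch the primal value selects:
// tangent is negated for x < 0, passed through for x >= 0.
// At exactly 0 this picks the right-hand derivative +1 by convention.
func Abs(a Dual) Dual {
	if a.Value < 0 {
		return Dual{-a.Value, -a.Tangent}
	}
	return Dual{a.Value, a.Tangent}
}

func main() {
	fmt.Println(Abs(Dual{-3, 1})) // {3 -1}
}
```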

Conditionals introduce similar issues. If a program contains

if x > 0 {
    y = x
} else {
    y = 0
}

then the derivative follows the executed branch. At the boundary x = 0, the mathematical derivative may be undefined even though the program still returns a value.

Thus dual numbers provide exact first-order propagation through the executed operations, but they do not remove the mathematical difficulties of non-smooth programs.

Summary

Dual numbers are the algebraic core of forward mode automatic differentiation. A value

x + \epsilon\dot{x}

stores both a primal value and a tangent value. The rule

\epsilon^2 = 0

removes all higher-order terms, leaving exactly the first-order derivative information.

Evaluating a program on dual numbers computes both the original output and the directional derivative in one execution. This makes dual numbers one of the simplest and most precise ways to implement forward mode AD.