Chapter 7. Dual Numbers and Algebraic Structures

Algebra of Dual Numbers

Dual numbers give the cleanest algebraic model of forward mode automatic differentiation. They extend ordinary real numbers with a formal infinitesimal part. Instead of carrying only a value, a dual number carries a value and its first-order variation.

A dual number has the form

$$a + b\varepsilon$$

where $a, b \in \mathbb{R}$, and $\varepsilon$ is a formal element satisfying

$$\varepsilon^2 = 0$$

but

$$\varepsilon \neq 0.$$

The element $\varepsilon$ behaves like an infinitesimal direction. It is not a small real number. It is an algebraic marker that records first-order change and automatically deletes all second-order terms.

The Basic Algebra

Let

$$x = a + b\varepsilon$$

and

$$y = c + d\varepsilon.$$

Addition is componentwise:

$$x + y = (a+c) + (b+d)\varepsilon.$$

Multiplication follows ordinary distributivity, together with the rule $\varepsilon^2 = 0$:

$$\begin{aligned}
xy &= (a+b\varepsilon)(c+d\varepsilon) \\
   &= ac + ad\varepsilon + bc\varepsilon + bd\varepsilon^2 \\
   &= ac + (ad+bc)\varepsilon.
\end{aligned}$$

So the product rule is built into the multiplication law:

$$(a,b)(c,d) = (ac, ad+bc).$$

This is already the core of automatic differentiation. The first component stores the primal value. The second component stores the derivative information.
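As a quick numeric check of the multiplication law, here is one product worked out by hand. The numbers are arbitrary and chosen only for illustration.

```latex
\begin{aligned}
(2 + 3\varepsilon)(5 + 7\varepsilon)
  &= 10 + 14\varepsilon + 15\varepsilon + 21\varepsilon^2 \\
  &= 10 + 29\varepsilon.
\end{aligned}
```

In pair form this is $(2,3)(5,7) = (2 \cdot 5,\; 2 \cdot 7 + 3 \cdot 5) = (10, 29)$, which is the product rule applied to values 2 and 5 with tangents 3 and 7.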

Dual Numbers as Value-Derivative Pairs

In forward mode AD, we evaluate a function on a dual input

$$x = a + \varepsilon.$$

More generally, if the input is seeded with tangent $b$, we write

$$x = a + b\varepsilon.$$

For a smooth scalar function $f$, Taylor expansion gives

$$f(a+b\varepsilon) = f(a) + f'(a)b\varepsilon + \frac{1}{2}f''(a)b^2\varepsilon^2 + \cdots.$$

Since $\varepsilon^2 = 0$, all higher-order terms vanish:

$$f(a+b\varepsilon) = f(a) + f'(a)b\varepsilon.$$

Thus a single evaluation over dual numbers computes both the value and the directional derivative.

The rule is:

$$f(a, b) = (f(a), f'(a)b).$$

For one input, choosing $b = 1$ gives the ordinary derivative:

$$f(a+\varepsilon) = f(a) + f'(a)\varepsilon.$$

Example

Let

$$f(x) = x^3 + 2x.$$

Evaluate it at $x = 5 + \varepsilon$:

$$\begin{aligned}
(5+\varepsilon)^3 + 2(5+\varepsilon)
  &= 125 + 75\varepsilon + 10 + 2\varepsilon \\
  &= 135 + 77\varepsilon.
\end{aligned}$$

So

$$f(5) = 135$$

and

$$f'(5) = 77.$$

Checking directly:

$$\begin{aligned}
f'(x) &= 3x^2 + 2, \\
f'(5) &= 3 \cdot 25 + 2 = 77.
\end{aligned}$$

The derivative appears as the coefficient of $\varepsilon$.

Why $\varepsilon^2 = 0$ Matters

The rule $\varepsilon^2 = 0$ is what makes dual numbers represent first-order calculus. When multiplying perturbations, any second-order term disappears.

For example:

$$(a+b\varepsilon)^2 = a^2 + 2ab\varepsilon + b^2\varepsilon^2 = a^2 + 2ab\varepsilon.$$

The coefficient of $\varepsilon$ is exactly the derivative of $x^2$ at $a$, applied to direction $b$:

$$D(x^2)_a[b] = 2ab.$$

This pattern holds for every smooth elementary operation used in a program. Dual arithmetic forces each operation to carry both its value and its local linearization.

Division and Inverses

A dual number $a+b\varepsilon$ has a multiplicative inverse when $a \neq 0$.

We seek

$$(a+b\varepsilon)^{-1} = c+d\varepsilon.$$

The product must equal $1$:

$$(a+b\varepsilon)(c+d\varepsilon) = ac + (ad+bc)\varepsilon = 1.$$

So

$$ac = 1$$

and

$$ad + bc = 0.$$

Hence

$$c = \frac{1}{a}$$

and

$$d = -\frac{b}{a^2}.$$

Therefore

$$(a+b\varepsilon)^{-1} = \frac{1}{a} - \frac{b}{a^2}\varepsilon.$$

This corresponds to the derivative rule

$$\frac{d}{dx}\frac{1}{x} = -\frac{1}{x^2}.$$

Division follows from multiplication by the inverse, provided $c \neq 0$:

$$\begin{aligned}
\frac{a+b\varepsilon}{c+d\varepsilon}
  &= (a+b\varepsilon)\left(\frac{1}{c} - \frac{d}{c^2}\varepsilon\right) \\
  &= \frac{a}{c} + \frac{bc-ad}{c^2}\varepsilon.
\end{aligned}$$
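The inverse and quotient formulas above can be sketched in Go as follows. The `Dual` pair type, the function names `Inv` and `Div`, and the choice to report a zero real part through a boolean are all illustrative assumptions, not a fixed API.

```go
package main

import "fmt"

// Dual represents a + b*ε as the pair (Val, Dot).
type Dual struct {
    Val float64
    Dot float64
}

// Inv returns 1/x = 1/a - (b/a^2)ε. The boolean is false when the real
// part is zero, in which case no inverse exists.
func Inv(x Dual) (Dual, bool) {
    if x.Val == 0 {
        return Dual{}, false
    }
    return Dual{Val: 1 / x.Val, Dot: -x.Dot / (x.Val * x.Val)}, true
}

// Div returns x/y = a/c + ((bc - ad)/c^2)ε for x = (a, b) and y = (c, d),
// with the same zero check on the real part c of the divisor.
func Div(x, y Dual) (Dual, bool) {
    if y.Val == 0 {
        return Dual{}, false
    }
    return Dual{
        Val: x.Val / y.Val,
        Dot: (x.Dot*y.Val - x.Val*y.Dot) / (y.Val * y.Val),
    }, true
}

func main() {
    // d/dx (1/x) at x = 2 is -1/4.
    inv, _ := Inv(Dual{Val: 2, Dot: 1})
    fmt.Println(inv) // {0.5 -0.25}

    // ε itself has real part 0, so it is not invertible.
    _, ok := Inv(Dual{Val: 0, Dot: 1})
    fmt.Println(ok) // false

    // (1 + ε) / (2 + ε): value 1/2, tangent (1*2 - 1*1)/4 = 1/4.
    q, _ := Div(Dual{Val: 1, Dot: 1}, Dual{Val: 2, Dot: 1})
    fmt.Println(q) // {0.5 0.25}
}
```

The second call shows why the condition on the real part matters: $\varepsilon$ is nonzero, yet it has no inverse.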

Elementary Functions

Elementary functions extend naturally to dual numbers. For a smooth function $f$,

$$f(a+b\varepsilon) = f(a) + f'(a)b\varepsilon.$$

This gives direct evaluation rules.

For sine:

$$\sin(a+b\varepsilon) = \sin a + (b\cos a)\varepsilon.$$

For cosine:

$$\cos(a+b\varepsilon) = \cos a - (b\sin a)\varepsilon.$$

For exponential:

$$\exp(a+b\varepsilon) = \exp(a) + b\exp(a)\varepsilon.$$

For logarithm, assuming $a > 0$:

$$\log(a+b\varepsilon) = \log a + \frac{b}{a}\varepsilon.$$

For powers:

$$(a+b\varepsilon)^n = a^n + n a^{n-1}b\varepsilon.$$

Each rule has the same shape: compute the primal value, then multiply the local derivative by the incoming tangent.
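Because every rule has that shape, one small helper can generate all of them. The sketch below assumes a `Dual` pair type and a hypothetical `Lift` function that takes a real function together with its derivative; neither name comes from a particular library.

```go
package main

import (
    "fmt"
    "math"
)

// Dual represents a + b*ε as the pair (Val, Dot).
type Dual struct {
    Val float64
    Dot float64
}

// Lift turns a real function f and its derivative df into a function on
// dual numbers, following f(a + bε) = f(a) + f'(a)bε.
func Lift(f, df func(float64) float64) func(Dual) Dual {
    return func(x Dual) Dual {
        return Dual{Val: f(x.Val), Dot: df(x.Val) * x.Dot}
    }
}

var (
    Sin = Lift(math.Sin, math.Cos)
    Cos = Lift(math.Cos, func(a float64) float64 { return -math.Sin(a) })
    Exp = Lift(math.Exp, math.Exp)
    // Log is only meaningful for a > 0; math.Log returns NaN for a < 0.
    Log = Lift(math.Log, func(a float64) float64 { return 1 / a })
)

// PowInt computes (a + bε)^n = a^n + n a^(n-1) b ε for an integer n.
// It assumes a != 0 whenever n <= 0.
func PowInt(x Dual, n int) Dual {
    return Dual{
        Val: math.Pow(x.Val, float64(n)),
        Dot: float64(n) * math.Pow(x.Val, float64(n-1)) * x.Dot,
    }
}

func main() {
    fmt.Println(Exp(Dual{Val: 0, Dot: 3}))    // {1 3}
    fmt.Println(Log(Dual{Val: 1, Dot: 5}))    // {0 5}
    fmt.Println(PowInt(Dual{Val: 2, Dot: 1}, 3)) // {8 12}
}
```

Passing the derivative explicitly keeps the sketch short. An implementation could just as well write each primitive by hand, as long as the tangent is the local derivative times the incoming tangent.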

Dual Numbers and the Chain Rule

The chain rule is not added as an external algorithm. It follows from function composition over dual numbers.

Let

$$h(x) = f(g(x)).$$

Evaluate at

$$x = a + b\varepsilon.$$

First apply $g$:

$$g(a+b\varepsilon) = g(a) + g'(a)b\varepsilon.$$

Then apply $f$:

$$f(g(a) + g'(a)b\varepsilon) = f(g(a)) + f'(g(a))g'(a)b\varepsilon.$$

Therefore

$$h'(a)b = f'(g(a))g'(a)b.$$

This is exactly the chain rule.

Dual numbers turn the chain rule into ordinary evaluation. A program written over real numbers can often be lifted to dual numbers by replacing each primitive operation with its dual-number version.
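A minimal sketch of this, taking $g = \sin$ and $f = \exp$ as an arbitrary example pair: the composite `H` below contains no chain-rule code, yet its tangent agrees with the hand-derived formula $h'(a) = \exp(\sin a)\cos a$. The type and function names are assumptions made for the illustration.

```go
package main

import (
    "fmt"
    "math"
)

// Dual represents a + b*ε as the pair (Val, Dot).
type Dual struct {
    Val float64
    Dot float64
}

// Sin lifts sine: value sin(a), tangent cos(a) times the incoming tangent.
func Sin(x Dual) Dual {
    return Dual{Val: math.Sin(x.Val), Dot: math.Cos(x.Val) * x.Dot}
}

// Exp lifts the exponential: value and local derivative are both exp(a).
func Exp(x Dual) Dual {
    e := math.Exp(x.Val)
    return Dual{Val: e, Dot: e * x.Dot}
}

// H is the composition h(x) = exp(sin(x)). The tangent produced by Sin
// simply becomes the incoming tangent of Exp.
func H(x Dual) Dual {
    return Exp(Sin(x))
}

func main() {
    a := 0.7
    got := H(Dual{Val: a, Dot: 1})

    // Hand-derived chain rule for comparison: h'(a) = exp(sin a) * cos a.
    want := math.Exp(math.Sin(a)) * math.Cos(a)

    fmt.Printf("dual tangent: %.12f\n", got.Dot)
    fmt.Printf("chain rule:   %.12f\n", want)
}
```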

Computational Interpretation

In an implementation, a dual number is usually represented as a pair:

type Dual struct {
    Val float64
    Dot float64
}

Here Val is the primal value, and Dot is the tangent.

Addition:

func Add(x, y Dual) Dual {
    return Dual{
        Val: x.Val + y.Val,
        Dot: x.Dot + y.Dot,
    }
}

Multiplication:

func Mul(x, y Dual) Dual {
    return Dual{
        Val: x.Val * y.Val,
        // Product rule: the ε² term x.Dot*y.Dot is dropped.
        Dot: x.Dot*y.Val + x.Val*y.Dot,
    }
}

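Sine, using math.Sin and math.Cos from Go's standard math package:

func Sin(x Dual) Dual {
    return Dual{
        Val: math.Sin(x.Val),
        // Local derivative cos(a) times the incoming tangent.
        Dot: math.Cos(x.Val) * x.Dot,
    }
}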

A function written against these operations computes derivatives automatically.

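For example, with a helper Const that lifts a real constant to a dual number with zero tangent:

// Const lifts a real constant c to c + 0ε.
func Const(c float64) Dual { return Dual{Val: c} }
// F computes x^3 + 2x using only the lifted operations.
func F(x Dual) Dual {
    return Add(Mul(Mul(x, x), x), Mul(Const(2), x))
}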

With input

x := Dual{Val: 5, Dot: 1}

the result is

Dual{Val: 135, Dot: 77}

The same execution computes the primal value and the derivative.
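Collected into one file, the pieces above give a small runnable sketch. The `Derivative` helper is an illustrative addition that is not part of the text, and `Const` is assumed to lift a real constant with zero tangent.

```go
package main

import "fmt"

// Dual represents a + b*ε as the pair (Val, Dot).
type Dual struct {
    Val float64
    Dot float64
}

// Const lifts a real constant: its tangent is zero.
func Const(c float64) Dual { return Dual{Val: c} }

func Add(x, y Dual) Dual {
    return Dual{Val: x.Val + y.Val, Dot: x.Dot + y.Dot}
}

func Mul(x, y Dual) Dual {
    return Dual{Val: x.Val * y.Val, Dot: x.Dot*y.Val + x.Val*y.Dot}
}

// F computes x^3 + 2x using only lifted operations.
func F(x Dual) Dual {
    return Add(Mul(Mul(x, x), x), Mul(Const(2), x))
}

// Derivative seeds the input with tangent 1 and reads off the ε coefficient.
func Derivative(f func(Dual) Dual, a float64) float64 {
    return f(Dual{Val: a, Dot: 1}).Dot
}

func main() {
    fmt.Println(F(Dual{Val: 5, Dot: 1})) // {135 77}
    fmt.Println(Derivative(F, 5))        // 77
    fmt.Println(F(Const(5)))             // {135 0}
}
```

The last line shows what happens with a zero seed: the primal value is still computed, but no derivative information flows through.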

Multiple Inputs

For a function

$$f : \mathbb{R}^n \to \mathbb{R},$$

a dual number can propagate one directional derivative at a time. Each input receives a primal value and a tangent seed.

For example, let

$$f(x,y) = xy + \sin x.$$

To compute the derivative in direction

$$(v_x, v_y),$$

evaluate

$$x = a + v_x\varepsilon$$

and

$$y = b + v_y\varepsilon.$$

Then

$$xy = ab + (av_y + bv_x)\varepsilon$$

and

$$\sin x = \sin a + (v_x\cos a)\varepsilon.$$

So

$$f(x,y) = ab + \sin a + (av_y + bv_x + v_x\cos a)\varepsilon.$$

The coefficient of $\varepsilon$ is

$$Df_{(a,b)}[v_x,v_y] = b v_x + a v_y + v_x\cos a.$$

Equivalently,

$$Df_{(a,b)}[v] = \nabla f(a,b) \cdot v.$$

Forward mode naturally computes Jacobian-vector products.
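The two-input example above can be sketched as follows. One dual evaluation yields one directional derivative, so recovering the full gradient of this $f$ takes two passes, seeded with $(1, 0)$ and $(0, 1)$. The helper names `Directional` and `Gradient` are hypothetical.

```go
package main

import (
    "fmt"
    "math"
)

// Dual represents a + b*ε as the pair (Val, Dot).
type Dual struct {
    Val float64
    Dot float64
}

func Add(x, y Dual) Dual {
    return Dual{Val: x.Val + y.Val, Dot: x.Dot + y.Dot}
}

func Mul(x, y Dual) Dual {
    return Dual{Val: x.Val * y.Val, Dot: x.Dot*y.Val + x.Val*y.Dot}
}

func Sin(x Dual) Dual {
    return Dual{Val: math.Sin(x.Val), Dot: math.Cos(x.Val) * x.Dot}
}

// F is f(x, y) = x*y + sin(x).
func F(x, y Dual) Dual {
    return Add(Mul(x, y), Sin(x))
}

// Directional returns the derivative of f at (a, b) in direction (vx, vy)
// using a single dual evaluation.
func Directional(a, b, vx, vy float64) float64 {
    return F(Dual{Val: a, Dot: vx}, Dual{Val: b, Dot: vy}).Dot
}

// Gradient needs one pass per input: seed (1, 0), then (0, 1).
func Gradient(a, b float64) (dfdx, dfdy float64) {
    return Directional(a, b, 1, 0), Directional(a, b, 0, 1)
}

func main() {
    gx, gy := Gradient(0, 3)
    fmt.Println(gx, gy) // 4 0, since df/dx = b + cos a and df/dy = a

    fmt.Println(Directional(0, 3, 2, 5)) // 8, since 3*2 + 0*5 + 2*cos 0 = 8
}
```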

Relation to Forward Mode AD

Forward mode AD is dual-number evaluation generalized to programs.

Each program variable carries two components:

$$\text{variable} = (\text{value}, \text{tangent}).$$

Each primitive instruction updates both components.

For a program statement

$$z = x \cdot y,$$

the lifted statement is

$$\begin{aligned}
z_{\text{val}} &= x_{\text{val}} y_{\text{val}}, \\
z_{\text{dot}} &= x_{\text{dot}} y_{\text{val}} + x_{\text{val}} y_{\text{dot}}.
\end{aligned}$$

For

$$z = \sin x,$$

the lifted statement is

$$\begin{aligned}
z_{\text{val}} &= \sin(x_{\text{val}}), \\
z_{\text{dot}} &= \cos(x_{\text{val}})x_{\text{dot}}.
\end{aligned}$$

This local transformation is enough. The global derivative emerges from executing the transformed program.
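To illustrate the statement-by-statement lifting on a program with a loop, the sketch below takes Horner's rule for polynomial evaluation (an example chosen here, not taken from the text) and writes its lifted counterpart by hand. Function and variable names are assumptions.

```go
package main

import "fmt"

// Horner evaluates a polynomial over plain floats. Coefficients are given
// from the highest degree down.
func Horner(coeffs []float64, x float64) float64 {
    acc := 0.0
    for _, c := range coeffs {
        acc = acc*x + c
    }
    return acc
}

// HornerLifted is the same loop with every variable split into a value and
// a tangent. The control flow is unchanged; only the loop body is lifted.
func HornerLifted(coeffs []float64, xVal, xDot float64) (val, dot float64) {
    for _, c := range coeffs {
        // Lifted form of acc = acc*x + c. The constant c has tangent 0.
        // The tangent is updated first because it needs the old value.
        dot = dot*xVal + val*xDot
        val = val*xVal + c
    }
    return val, dot
}

func main() {
    coeffs := []float64{1, 0, 2, 0} // x^3 + 2x
    fmt.Println(Horner(coeffs, 5))  // 135

    val, dot := HornerLifted(coeffs, 5, 1)
    fmt.Println(val, dot) // 135 77
}
```

The order of the two updates inside the loop matters: computing `val` first would feed the new value into the tangent rule and give a wrong derivative.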

Algebraic Summary

The dual numbers form a commutative algebra over the real numbers:

$$\mathbb{D} = \mathbb{R}[\varepsilon]/(\varepsilon^2).$$

This notation means: take polynomials in $\varepsilon$, but identify every term containing $\varepsilon^2$ or higher powers with zero.
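As a small worked instance of that identification, with coefficients chosen arbitrarily for illustration, a cubic polynomial in $\varepsilon$ reduces to its first two terms:

```latex
3 + 2\varepsilon + 5\varepsilon^2 - \varepsilon^3
  \;\longmapsto\; 3 + 2\varepsilon
  \quad \text{in } \mathbb{R}[\varepsilon]/(\varepsilon^2).
```

Only the constant term and the coefficient of $\varepsilon$ survive, which is why two real numbers are enough to store any dual number.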

Every dual number has a unique form:

$$a + b\varepsilon.$$

The real part $a$ is the value. The dual part $b$ is the first-order coefficient.

This small algebra is powerful because it encodes first-order differential calculus directly into arithmetic. In forward mode AD, differentiation becomes evaluation in the algebra of dual numbers.