Chapter 7. Dual Numbers and Algebraic Structures

Algebra of Dual Numbers

Dual numbers give the cleanest algebraic model of forward mode automatic differentiation. They extend ordinary real numbers with a formal infinitesimal part. Instead of carrying only a value, a dual number carries a value and its first-order variation.

A dual number has the form

a + b\varepsilon

where $a, b \in \mathbb{R}$ and $\varepsilon$ is a formal element satisfying

\varepsilon^2 = 0

but

\varepsilon \neq 0.

The element $\varepsilon$ behaves like an infinitesimal direction. It is not a small real number. It is an algebraic marker that records first-order change and automatically deletes all second-order terms.

The Basic Algebra

Let

x = a + b\varepsilon

and

y = c + d\varepsilon.

Addition is componentwise:

x + y = (a+c) + (b+d)\varepsilon.

Multiplication follows from ordinary distributivity together with the rule $\varepsilon^2 = 0$:

xy = (a+b\varepsilon)(c+d\varepsilon)

= ac + ad\varepsilon + bc\varepsilon + bd\varepsilon^2

= ac + (ad+bc)\varepsilon.

Writing $a + b\varepsilon$ as the pair $(a, b)$, the product rule is built into the multiplication law:

(a,b)(c,d) = (ac, ad+bc).

This is already the core of automatic differentiation. The first component stores the primal value. The second component stores the derivative information.

Dual Numbers as Value-Derivative Pairs

In forward mode AD, we evaluate a function on a dual input

x = a + \varepsilon.

More generally, if the input is seeded with tangent $b$, we write

x = a + b\varepsilon.

For a smooth scalar function $f$, a formal Taylor expansion gives

f(a+b\varepsilon) = f(a) + f'(a)b\varepsilon + \frac{1}{2}f''(a)b^2\varepsilon^2 + \cdots.

Since $\varepsilon^2 = 0$, and therefore $\varepsilon^k = 0$ for every $k \geq 2$, all higher-order terms vanish:

f(a+b\varepsilon) = f(a) + f'(a)b\varepsilon.

Thus a single evaluation over dual numbers computes both the value and the directional derivative.

Written as a map on value-tangent pairs, the rule is:

(a, b) \mapsto (f(a),\ f'(a)\,b).
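The pair rule can be sketched in Go as a higher-order helper that lifts any scalar function, given its derivative, to dual numbers. This is a minimal sketch; the name `Lift` and its signature are illustrative assumptions, not part of any library.

```go
package main

import (
	"fmt"
	"math"
)

// Dual holds a primal value and a tangent (the coefficient of ε).
type Dual struct {
	Val float64
	Dot float64
}

// Lift is a hypothetical helper: given f and its derivative df, it returns
// the dual-number version (a, b) -> (f(a), f'(a)*b).
func Lift(f, df func(float64) float64) func(Dual) Dual {
	return func(x Dual) Dual {
		return Dual{Val: f(x.Val), Dot: df(x.Val) * x.Dot}
	}
}

func main() {
	// exp is its own derivative.
	expD := Lift(math.Exp, math.Exp)

	// Seed b = 1 to get the ordinary derivative at a = 0.
	fmt.Println(expD(Dual{Val: 0, Dot: 1})) // {1 1}

	// Seed b = 2: the tangent is scaled to f'(a)*2.
	fmt.Println(expD(Dual{Val: 0, Dot: 2})) // {1 2}
}
```

Passing the derivative in explicitly keeps the sketch small; a real implementation would instead hard-code one rule per primitive.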

For one input, choosing $b=1$ gives the ordinary derivative:

f(a+\varepsilon) = f(a) + f'(a)\varepsilon.

Example

Let

f(x) = x^3 + 2x.

Evaluate it at $x = 5 + \varepsilon$:

(5+\varepsilon)^3 + 2(5+\varepsilon)

= 125 + 75\varepsilon + 10 + 2\varepsilon

= 135 + 77\varepsilon.

Reading off the two components gives

f(5) = 135

and

f'(5) = 77.

Checking directly:

f'(x) = 3x^2 + 2

f'(5) = 3 \cdot 25 + 2 = 77.

The derivative appears as the coefficient of $\varepsilon$.
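The hand expansion above works for any polynomial. This is a minimal sketch, assuming the polynomial is given as a coefficient slice (highest degree first) and evaluated with Horner's scheme. Horner's scheme is an evaluation order chosen here for illustration; the chapter does not prescribe it. Each Horner step is one dual multiplication and one constant addition, so the $\varepsilon$ coefficient accumulates $p'(a)$ alongside $p(a)$. The names `EvalPoly`, `mul`, and `addConst` are hypothetical.

```go
package main

import "fmt"

// Dual holds a primal value and a tangent.
type Dual struct {
	Val float64
	Dot float64
}

// mul implements (a, b)(c, d) = (ac, ad + bc).
func mul(x, y Dual) Dual {
	return Dual{Val: x.Val * y.Val, Dot: x.Val*y.Dot + x.Dot*y.Val}
}

// addConst adds a real constant c, whose tangent is zero.
func addConst(x Dual, c float64) Dual {
	return Dual{Val: x.Val + c, Dot: x.Dot}
}

// EvalPoly evaluates the polynomial with coefficients coeffs
// (highest degree first) at the dual number x using Horner's scheme.
func EvalPoly(coeffs []float64, x Dual) Dual {
	acc := Dual{} // 0 + 0ε
	for _, c := range coeffs {
		acc = addConst(mul(acc, x), c)
	}
	return acc
}

func main() {
	// f(x) = x^3 + 0x^2 + 2x + 0, evaluated at 5 + ε.
	f := []float64{1, 0, 2, 0}
	fmt.Println(EvalPoly(f, Dual{Val: 5, Dot: 1})) // {135 77}
}
```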

Why $\varepsilon^2 = 0$ Matters

The rule $\varepsilon^2 = 0$ is what makes dual numbers represent first-order calculus. When multiplying perturbations, any second-order term disappears.

For example:

(a+b\varepsilon)^2 = a^2 + 2ab\varepsilon + b^2\varepsilon^2 = a^2 + 2ab\varepsilon.

The coefficient of $\varepsilon$ is exactly the derivative of $x^2$ at $a$, applied to direction $b$:

D(x^2)_a[b] = 2ab.

This pattern holds for every smooth elementary operation used in a program. Dual arithmetic forces each operation to carry both its value and its local linearization.

Division and Inverses

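A dual number $a+b\varepsilon$ has a multiplicative inverse if and only if $a \neq 0$. When $a = 0$, the product $(b\varepsilon)(c+d\varepsilon) = bc\varepsilon$ has zero real part and can never equal $1$.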

We seek

(a+b\varepsilon)^{-1} = c+d\varepsilon.

The product must equal $1 = 1 + 0\varepsilon$:

(a+b\varepsilon)(c+d\varepsilon) = ac + (ad+bc)\varepsilon = 1.

Matching the real parts and the $\varepsilon$ parts gives

ac = 1

and

ad + bc = 0.

Hence

c = \frac{1}{a}

and

d = -\frac{b}{a^2}.

Therefore

(a+b\varepsilon)^{-1} = \frac{1}{a} - \frac{b}{a^2}\varepsilon.

This corresponds to the derivative rule

\frac{d}{dx}\frac{1}{x} = -\frac{1}{x^2}.
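This is a minimal Go sketch of the inverse. It assumes the same `Dual` pair (value, tangent) representation used in the Computational Interpretation section. The name `Inv`, and the choice to return an error rather than divide by zero when $a = 0$, are illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

// Dual holds a primal value and a tangent.
type Dual struct {
	Val float64
	Dot float64
}

// Inv returns 1/(a + bε) = 1/a - (b/a^2)ε.
// It fails when a == 0, because bε has no multiplicative inverse.
func Inv(x Dual) (Dual, error) {
	if x.Val == 0 {
		return Dual{}, errors.New("dual inverse: real part is zero")
	}
	return Dual{Val: 1 / x.Val, Dot: -x.Dot / (x.Val * x.Val)}, nil
}

func main() {
	y, err := Inv(Dual{Val: 2, Dot: 3})
	fmt.Println(y, err) // {0.5 -0.75} <nil>

	_, err = Inv(Dual{Val: 0, Dot: 1})
	fmt.Println(err) // dual inverse: real part is zero
}
```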

For $c \neq 0$, division follows from multiplication by the inverse:

\frac{a+b\varepsilon}{c+d\varepsilon} = (a+b\varepsilon) \left( \frac{1}{c} - \frac{d}{c^2}\varepsilon \right)

= \frac{a}{c} + \frac{bc-ad}{c^2}\varepsilon.
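Division can be coded directly from this formula. The sketch makes the same pair-representation assumption. `Div` is a hypothetical name, and it leaves the $c \neq 0$ check to the caller. The usage line checks the tangent against the quotient rule $(u/v)' = (u'v - uv')/v^2$.

```go
package main

import "fmt"

// Dual holds a primal value and a tangent.
type Dual struct {
	Val float64
	Dot float64
}

// Div computes (a + bε)/(c + dε) = a/c + ((bc - ad)/c^2)ε.
// The caller must ensure c != 0; this sketch does not check.
func Div(x, y Dual) Dual {
	a, b, c, d := x.Val, x.Dot, y.Val, y.Dot
	return Dual{Val: a / c, Dot: (b*c - a*d) / (c * c)}
}

func main() {
	// u(x) = x^2 and v(x) = x + 1 at x = 1 with tangent 1:
	// u = 1 + 2ε, v = 2 + ε.
	u := Dual{Val: 1, Dot: 2}
	v := Dual{Val: 2, Dot: 1}

	// Quotient rule: (u'v - uv')/v^2 = (2*2 - 1*1)/4 = 0.75.
	fmt.Println(Div(u, v)) // {0.5 0.75}
}
```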

Elementary Functions

Elementary functions extend to dual numbers through the same rule. For a smooth function $f$,

f(a+b\varepsilon) = f(a) + f'(a)b\varepsilon.

This gives direct evaluation rules.

For sine:

\sin(a+b\varepsilon) = \sin a + (b\cos a)\,\varepsilon.

For cosine:

\cos(a+b\varepsilon) = \cos a - (b\sin a)\,\varepsilon.

For exponential:

\exp(a+b\varepsilon) = \exp(a) + b\exp(a)\,\varepsilon.

For logarithm, assuming $a>0$:

\log(a+b\varepsilon) = \log a + \frac{b}{a}\varepsilon.

For a positive integer power $n$, the binomial theorem leaves only two terms, because every term containing $\varepsilon^2$ or a higher power vanishes:

(a+b\varepsilon)^n = a^n + n a^{n-1}b\varepsilon.

Each rule has the same shape: compute the primal value, then multiply the local derivative by the incoming tangent.
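These rules translate one-to-one into code. This is a Go sketch that assumes the `Dual` pair type from the Computational Interpretation section. The function names `Cos`, `Exp`, `Log`, and `PowInt` are illustrative, and `Log` leaves the $a > 0$ check to the caller.

```go
package main

import (
	"fmt"
	"math"
)

// Dual holds a primal value and a tangent.
type Dual struct {
	Val float64
	Dot float64
}

// Cos: cos(a + bε) = cos a - (b sin a)ε.
func Cos(x Dual) Dual {
	return Dual{Val: math.Cos(x.Val), Dot: -math.Sin(x.Val) * x.Dot}
}

// Exp: exp(a + bε) = exp(a) + (b exp(a))ε.
func Exp(x Dual) Dual {
	e := math.Exp(x.Val)
	return Dual{Val: e, Dot: e * x.Dot}
}

// Log: log(a + bε) = log a + (b/a)ε, valid only for a > 0.
func Log(x Dual) Dual {
	return Dual{Val: math.Log(x.Val), Dot: x.Dot / x.Val}
}

// PowInt: (a + bε)^n = a^n + (n a^(n-1) b)ε for integer n >= 1.
func PowInt(x Dual, n int) Dual {
	return Dual{
		Val: math.Pow(x.Val, float64(n)),
		Dot: float64(n) * math.Pow(x.Val, float64(n-1)) * x.Dot,
	}
}

func main() {
	fmt.Println(Exp(Dual{Val: 0, Dot: 1}))       // {1 1}
	fmt.Println(Log(Dual{Val: 1, Dot: 2}))       // {0 2}
	fmt.Println(PowInt(Dual{Val: 2, Dot: 1}, 3)) // {8 12}
}
```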

Dual Numbers and the Chain Rule

The chain rule is not added as an external algorithm. It follows from function composition over dual numbers.

Let

h(x) = f(g(x)).

Evaluate at

x = a + b\varepsilon.

First apply $g$:

g(a+b\varepsilon) = g(a) + g'(a)b\varepsilon.

Then apply $f$, whose incoming tangent is now $g'(a)b$:

f(g(a) + g'(a)b\varepsilon) = f(g(a)) + f'(g(a))g'(a)b\varepsilon.

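Expanding $h$ directly with the same rule gives $h(a+b\varepsilon) = h(a) + h'(a)b\varepsilon$. Comparing the two $\varepsilon$ coefficients,

h'(a)b = f'(g(a))g'(a)b.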

This is exactly the chain rule.

Dual numbers turn the chain rule into ordinary evaluation. A program written over real numbers can often be lifted to dual numbers by replacing each primitive operation with its dual-number version.
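A small Go sketch makes this concrete. The code below only lifts two primitives and composes them; no chain-rule logic appears anywhere, yet the tangent of $h(x) = \sin(x^2)$ matches the hand-applied chain rule $h'(a) = \cos(a^2)\cdot 2a$. The helper names `Square`, `Sin`, and `H` are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// Dual holds a primal value and a tangent.
type Dual struct {
	Val float64
	Dot float64
}

// Square lifts g(x) = x^2: (a, b) -> (a^2, 2ab).
func Square(x Dual) Dual {
	return Dual{Val: x.Val * x.Val, Dot: 2 * x.Val * x.Dot}
}

// Sin lifts f(x) = sin x: (a, b) -> (sin a, b cos a).
func Sin(x Dual) Dual {
	return Dual{Val: math.Sin(x.Val), Dot: math.Cos(x.Val) * x.Dot}
}

// H is plain composition, h(x) = sin(x^2). The chain rule arises from
// evaluating Sin on the dual number that Square returns.
func H(x Dual) Dual {
	return Sin(Square(x))
}

func main() {
	a := 1.5
	got := H(Dual{Val: a, Dot: 1})

	// Hand-derived chain rule for comparison.
	want := math.Cos(a*a) * 2 * a
	fmt.Println(got.Dot, want)
}
```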

Computational Interpretation

In an implementation, a dual number is usually represented as a pair:

type Dual struct {
    Val float64
    Dot float64
}

Here Val is the primal value, and Dot is the tangent.

Addition:

func Add(x, y Dual) Dual {
    return Dual{
        Val: x.Val + y.Val,
        Dot: x.Dot + y.Dot,
    }
}

Multiplication:

// Mul applies the product rule: (a, b)(c, d) = (ac, ad + bc).
func Mul(x, y Dual) Dual {
    return Dual{
        Val: x.Val * y.Val,
        Dot: x.Dot*y.Val + x.Val*y.Dot,
    }
}

Sine:

// Sin uses d/dx sin x = cos x; it requires the standard "math" package.
func Sin(x Dual) Dual {
    return Dual{
        Val: math.Sin(x.Val),
        Dot: math.Cos(x.Val) * x.Dot,
    }
}

A function written against these operations computes derivatives automatically.

For example:

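// Const lifts a real constant c to the dual number c + 0ε.
func Const(c float64) Dual { return Dual{Val: c} }

func F(x Dual) Dual {
    return Add(Mul(Mul(x, x), x), Mul(Const(2), x))
}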

With input

x := Dual{Val: 5, Dot: 1}

the call F(x) returns

Dual{Val: 135, Dot: 77}

The same execution computes the primal value and the derivative.

Multiple Inputs

For a function

f : \mathbb{R}^n \to \mathbb{R},

a dual number can propagate one directional derivative at a time. Each input receives a primal value and a tangent seed.

For example, let

f(x,y) = xy + \sin x.

To compute the derivative in direction

(v_x, v_y),

evaluate

x = a + v_x\varepsilon

and

y = b + v_y\varepsilon.

Then

xy = ab + (av_y + bv_x)\varepsilon

and

\sin x = \sin a + v_x\cos a\varepsilon.

Adding the two results gives

f(x,y) = ab + \sin a + (av_y + bv_x + v_x\cos a)\varepsilon.

The coefficient of $\varepsilon$ is

Df_{(a,b)}[v_x,v_y] = b v_x + a v_y + v_x\cos a.

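Equivalently, since $\nabla f(a,b) = (b + \cos a,\ a)$,

Df_{(a,b)}[v] = \nabla f(a,b) \cdot v.

Forward mode therefore computes Jacobian-vector products. For $f : \mathbb{R}^n \to \mathbb{R}^m$, one dual evaluation seeded with $v$ yields $J_f\,v$, the Jacobian at the input point applied to $v$. Recovering the full Jacobian this way takes $n$ evaluations, one per basis direction.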
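Because each pass carries a single tangent, recovering the full gradient of $f(x,y) = xy + \sin x$ takes one pass per input. The passes seed the tangents with the basis directions $(1,0)$ and $(0,1)$. This is a Go sketch under that assumption; the helper names `add`, `mul`, `sin`, `F`, and `Gradient` are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// Dual holds a primal value and a tangent.
type Dual struct {
	Val float64
	Dot float64
}

func add(x, y Dual) Dual { return Dual{x.Val + y.Val, x.Dot + y.Dot} }

func mul(x, y Dual) Dual {
	return Dual{x.Val * y.Val, x.Dot*y.Val + x.Val*y.Dot}
}

func sin(x Dual) Dual { return Dual{math.Sin(x.Val), math.Cos(x.Val) * x.Dot} }

// F lifts f(x, y) = xy + sin x.
func F(x, y Dual) Dual { return add(mul(x, y), sin(x)) }

// Gradient runs one forward pass per input, seeding the basis
// directions (1, 0) and (0, 1).
func Gradient(a, b float64) [2]float64 {
	dx := F(Dual{a, 1}, Dual{b, 0}).Dot // ∂f/∂x = y + cos x
	dy := F(Dual{a, 0}, Dual{b, 1}).Dot // ∂f/∂y = x
	return [2]float64{dx, dy}
}

func main() {
	fmt.Println(Gradient(0, 2)) // [3 0]
}
```

A single pass seeded with an arbitrary $v$ gives $\nabla f \cdot v$ directly. The basis seeding is only needed when every partial derivative is wanted.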

Relation to Forward Mode AD

Forward mode AD is dual-number evaluation generalized to programs.

Each program variable carries two components:

\text{variable} = (\text{value}, \text{tangent}).

Each primitive instruction updates both components.

For a program statement

z = x \cdot y,

the lifted statement is

z_{\text{val}} = x_{\text{val}} y_{\text{val}}

z_{\text{dot}} = x_{\text{dot}} y_{\text{val}} + x_{\text{val}} y_{\text{dot}}.

For

z = \sin x,

the lifted statement is

z_{\text{val}} = \sin(x_{\text{val}})

z_{\text{dot}} = \cos(x_{\text{val}})x_{\text{dot}}.

This local transformation is enough. The global derivative emerges from executing the transformed program.
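The lifted statements can also be written out by hand as separate value and tangent variables, in the spirit of a source-to-source transformation rather than the operator-based `Dual` type. This sketch lifts the two-statement program `z = x*y; w = sin z`. The function name `ProgramJVP` and the `Val`/`Dot` variable suffixes are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// Original program:
//
//	z := x * y
//	w := math.Sin(z)
//
// ProgramJVP is a hand-lifted version: every statement becomes a value
// update followed by a tangent update, using the rules above.
func ProgramJVP(xVal, xDot, yVal, yDot float64) (wVal, wDot float64) {
	// z = x * y
	zVal := xVal * yVal
	zDot := xDot*yVal + xVal*yDot

	// w = sin z
	wVal = math.Sin(zVal)
	wDot = math.Cos(zVal) * zDot
	return wVal, wDot
}

func main() {
	// Seed x with tangent 1 and y with tangent 0 to get ∂w/∂x at (0, 3).
	wVal, wDot := ProgramJVP(0, 1, 3, 0)
	fmt.Println(wVal, wDot) // 0 3
}
```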

Algebraic Summary

The dual numbers form a commutative algebra over the real numbers:

\mathbb{D} = \mathbb{R}[\varepsilon]/(\varepsilon^2).

This notation means: take polynomials in $\varepsilon$, but identify every term containing $\varepsilon^2$ or a higher power with zero.

Every dual number can be written uniquely as

a + b\varepsilon.

The real part $a$ is the value. The dual part $b$ is the first-order coefficient.
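The quotient construction $\mathbb{R}[\varepsilon]/(\varepsilon^2)$ can be checked mechanically. Multiply $a+b\varepsilon$ and $c+d\varepsilon$ as ordinary polynomials in $\varepsilon$, then discard every coefficient of degree 2 or higher. This Go sketch represents a polynomial as a coefficient slice whose index is the power of $\varepsilon$. `PolyMul` and `Truncate` are illustrative names.

```go
package main

import "fmt"

// PolyMul multiplies two non-empty polynomials given as coefficient
// slices, where p[i] is the coefficient of ε^i.
func PolyMul(p, q []float64) []float64 {
	out := make([]float64, len(p)+len(q)-1)
	for i, pi := range p {
		for j, qj := range q {
			out[i+j] += pi * qj
		}
	}
	return out
}

// Truncate maps a polynomial into R[ε]/(ε^2) by keeping only the
// coefficients of ε^0 and ε^1.
func Truncate(p []float64) [2]float64 {
	var r [2]float64
	copy(r[:], p)
	return r
}

func main() {
	x := []float64{2, 3} // 2 + 3ε
	y := []float64{4, 5} // 4 + 5ε

	full := PolyMul(x, y)
	fmt.Println(full)           // [8 22 15]; 15 is the coefficient of ε^2
	fmt.Println(Truncate(full)) // [8 22], matching (ac, ad + bc)

	eps := []float64{0, 1}
	fmt.Println(Truncate(PolyMul(eps, eps))) // [0 0], i.e. ε^2 = 0
}
```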

This small algebra is powerful because it encodes first-order differential calculus directly into arithmetic. In forward mode AD, differentiation becomes evaluation in the algebra of dual numbers.