# Chapter 7. Dual Numbers and Algebraic Structures

## Algebra of Dual Numbers

Dual numbers give the cleanest algebraic model of forward mode automatic differentiation. They extend ordinary real numbers with a formal infinitesimal part. Instead of carrying only a value, a dual number carries a value and its first-order variation.

A dual number has the form

$$
a + b\varepsilon
$$

where $a,b \in \mathbb{R}$, and $\varepsilon$ is a formal element satisfying

$$
\varepsilon^2 = 0
$$

but

$$
\varepsilon \neq 0.
$$

The element $\varepsilon$ behaves like an infinitesimal direction. It is not a small real number. It is an algebraic marker that records first-order change and automatically deletes all second-order terms.

### The Basic Algebra

Let

$$
x = a + b\varepsilon
$$

and

$$
y = c + d\varepsilon.
$$

Addition is componentwise:

$$
x + y = (a+c) + (b+d)\varepsilon.
$$

Multiplication follows ordinary distributivity, together with the rule $\varepsilon^2 = 0$:

$$
xy = (a+b\varepsilon)(c+d\varepsilon)
$$

$$
= ac + ad\varepsilon + bc\varepsilon + bd\varepsilon^2
$$

$$
= ac + (ad+bc)\varepsilon.
$$

So the product rule is built into the multiplication law:

$$
(a,b)(c,d) = (ac, ad+bc).
$$

This is already the core of automatic differentiation. The first component stores the primal value. The second component stores the derivative information.
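As a quick numerical check of the multiplication law, take $x = 2 + 3\varepsilon$ and $y = 4 + 5\varepsilon$ (values chosen only for illustration):

$$
(2+3\varepsilon)(4+5\varepsilon) = 8 + 10\varepsilon + 12\varepsilon + 15\varepsilon^2 = 8 + 22\varepsilon.
$$

In pair form this reads $(2,3)(4,5) = (2\cdot 4,\; 2\cdot 5 + 3\cdot 4) = (8, 22)$, which matches the product rule $(uv)' = u v' + u' v$ with $u=2$, $u'=3$, $v=4$, $v'=5$.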

### Dual Numbers as Value-Derivative Pairs

In forward mode AD, we evaluate a function on a dual input

$$
x = a + \varepsilon.
$$

More generally, if the input is seeded with tangent $b$, we write

$$
x = a + b\varepsilon.
$$

For a smooth scalar function $f$, a formal Taylor expansion around $a$ gives

$$
f(a+b\varepsilon) =
f(a) + f'(a)b\varepsilon + \frac{1}{2}f''(a)b^2\varepsilon^2 + \cdots.
$$

Since $\varepsilon^2 = 0$, all higher-order terms vanish:

$$
f(a+b\varepsilon) = f(a) + f'(a)b\varepsilon.
$$

Thus a single evaluation over dual numbers computes both the value and the directional derivative.

In pair notation, the lifted function maps a value-tangent pair to a value-tangent pair:

$$
(a, b) \mapsto (f(a),\; f'(a)\,b).
$$

For one input, choosing $b=1$ gives the ordinary derivative:

$$
f(a+\varepsilon) = f(a) + f'(a)\varepsilon.
$$
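The value-derivative rule can be sketched in Go as a helper that lifts a real function, together with its known derivative, to dual inputs. This is only an illustrative sketch: the names `Dual`, `Lift`, and `Cube` are assumptions made for this example, not part of any library.

```go
package main

import "fmt"

// Dual is a value-derivative pair (a, b) standing for a + b*eps.
type Dual struct {
    Val float64 // primal value a
    Dot float64 // tangent b
}

// Lift turns a real function f with known derivative df into a function
// on dual numbers, following f(a + b*eps) = f(a) + f'(a)*b*eps.
func Lift(f, df func(float64) float64) func(Dual) Dual {
    return func(x Dual) Dual {
        return Dual{Val: f(x.Val), Dot: df(x.Val) * x.Dot}
    }
}

// Cube is x^3 lifted to dual numbers (a hypothetical example function).
var Cube = Lift(
    func(a float64) float64 { return a * a * a },
    func(a float64) float64 { return 3 * a * a },
)

func main() {
    // Seeding the tangent with 1 yields the ordinary derivative f'(2) = 12.
    fmt.Println(Cube(Dual{Val: 2, Dot: 1}))
    // Seeding with 5 scales it: f'(2) * 5 = 60.
    fmt.Println(Cube(Dual{Val: 2, Dot: 5}))
}
```

The seed `Dot` plays the role of $b$: it is carried through unchanged except for multiplication by the local derivative.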

### Example

Let

$$
f(x) = x^3 + 2x.
$$

Evaluate it at $x = 5 + \varepsilon$:

$$
(5+\varepsilon)^3 + 2(5+\varepsilon)
$$

$$
= (125 + 75\varepsilon + 15\varepsilon^2 + \varepsilon^3) + (10 + 2\varepsilon)
$$

$$
= 125 + 75\varepsilon + 10 + 2\varepsilon
$$

$$
= 135 + 77\varepsilon.
$$

So

$$
f(5) = 135
$$

and

$$
f'(5) = 77.
$$

Checking directly:

$$
f'(x) = 3x^2 + 2
$$

$$
f'(5) = 3 \cdot 25 + 2 = 77.
$$

The derivative appears as the coefficient of $\varepsilon$.

### Why $\varepsilon^2 = 0$ Matters

The rule $\varepsilon^2 = 0$ is what makes dual numbers represent first-order calculus. When multiplying perturbations, any second-order term disappears.

For example:

$$
(a+b\varepsilon)^2 =
a^2 + 2ab\varepsilon + b^2\varepsilon^2 =
a^2 + 2ab\varepsilon.
$$

The coefficient of $\varepsilon$ is exactly the derivative of $x^2$ at $a$, applied to direction $b$:

$$
D(x^2)_a[b] = 2ab.
$$

This pattern holds for every smooth elementary operation used in a program. Dual arithmetic forces each operation to carry both its value and its local linearization.

### Division and Inverses

A dual number $a+b\varepsilon$ has a multiplicative inverse exactly when $a \neq 0$. In particular, $\varepsilon$ itself is nonzero but not invertible.

We seek

$$
(a+b\varepsilon)^{-1} = c+d\varepsilon.
$$

The product must equal $1$:

$$
(a+b\varepsilon)(c+d\varepsilon) =
ac + (ad+bc)\varepsilon =
1.
$$

So

$$
ac = 1
$$

and

$$
ad + bc = 0.
$$

Hence

$$
c = \frac{1}{a}
$$

and

$$
d = -\frac{b}{a^2}.
$$

Therefore

$$
(a+b\varepsilon)^{-1} =
\frac{1}{a} - \frac{b}{a^2}\varepsilon.
$$

This corresponds to the derivative rule

$$
\frac{d}{dx}\frac{1}{x} = -\frac{1}{x^2}.
$$

Division follows from multiplication by the inverse:

$$
\frac{a+b\varepsilon}{c+d\varepsilon} =
(a+b\varepsilon)
\left(
\frac{1}{c} - \frac{d}{c^2}\varepsilon
\right)
$$

$$
= \frac{a}{c} + \frac{bc-ad}{c^2}\varepsilon.
$$
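The inverse and quotient formulas above translate directly into code. The following is a minimal sketch, assuming a `Dual` pair type with fields `Val` and `Dot`. The names `Inv`, `Div`, and `ErrNotInvertible` are hypothetical, and returning an `error` for a zero real part is one possible design choice among several.

```go
package main

import (
    "errors"
    "fmt"
)

// Dual is a value-derivative pair (a, b) standing for a + b*eps.
type Dual struct {
    Val float64
    Dot float64
}

// ErrNotInvertible reports a divisor whose real part is zero.
var ErrNotInvertible = errors.New("dual: real part is zero, no inverse exists")

// Inv returns 1/x = 1/a - (b/a^2)*eps for x = a + b*eps with a != 0.
func Inv(x Dual) (Dual, error) {
    if x.Val == 0 {
        return Dual{}, ErrNotInvertible
    }
    return Dual{Val: 1 / x.Val, Dot: -x.Dot / (x.Val * x.Val)}, nil
}

// Div returns x/y = a/c + ((b*c - a*d)/c^2)*eps for x = a + b*eps, y = c + d*eps.
func Div(x, y Dual) (Dual, error) {
    if y.Val == 0 {
        return Dual{}, ErrNotInvertible
    }
    return Dual{
        Val: x.Val / y.Val,
        Dot: (x.Dot*y.Val - x.Val*y.Dot) / (y.Val * y.Val),
    }, nil
}

func main() {
    // (6 + eps) / (3 + 2*eps): value 2, tangent (1*3 - 6*2)/9 = -1.
    q, err := Div(Dual{Val: 6, Dot: 1}, Dual{Val: 3, Dot: 2})
    fmt.Println(q, err)

    // eps itself has real part 0, so it has no inverse.
    _, err = Inv(Dual{Val: 0, Dot: 1})
    fmt.Println(err)
}
```

An alternative would be to drop the error and let IEEE 754 arithmetic produce infinities or NaN, which keeps the signature composable with the other operations.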

### Elementary Functions

Elementary functions extend naturally to dual numbers. For a smooth function $f$,

$$
f(a+b\varepsilon) = f(a) + f'(a)b\varepsilon.
$$

This gives direct evaluation rules.

For sine:

$$
\sin(a+b\varepsilon) =
\sin a + (b\cos a)\,\varepsilon.
$$

For cosine:

$$
\cos(a+b\varepsilon) =
\cos a - (b\sin a)\,\varepsilon.
$$

For exponential:

$$
\exp(a+b\varepsilon) =
\exp(a) + b\exp(a)\varepsilon.
$$

For logarithm, assuming $a>0$:

$$
\log(a+b\varepsilon) =
\log a + \frac{b}{a}\varepsilon.
$$

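For powers with a constant exponent $n$ (any real $a$ when $n$ is a positive integer, $a > 0$ for general real $n$):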

$$
(a+b\varepsilon)^n =
a^n + n a^{n-1}b\varepsilon.
$$

Each rule has the same shape: compute the primal value, then multiply the local derivative by the incoming tangent.
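The rules above can be sketched as Go functions on a `Dual` pair. The function names `Cos`, `Exp`, `Log`, and `PowInt` are assumptions for this example. The sketch does no domain checking, so `Log` follows the behaviour of `math.Log` for non-positive real parts.

```go
package main

import (
    "fmt"
    "math"
)

// Dual is a value-derivative pair (a, b) standing for a + b*eps.
type Dual struct {
    Val float64
    Dot float64
}

// Cos implements cos(a + b*eps) = cos a - (b sin a)*eps.
func Cos(x Dual) Dual {
    return Dual{Val: math.Cos(x.Val), Dot: -math.Sin(x.Val) * x.Dot}
}

// Exp implements exp(a + b*eps) = exp a + (b exp a)*eps.
func Exp(x Dual) Dual {
    e := math.Exp(x.Val)
    return Dual{Val: e, Dot: e * x.Dot}
}

// Log implements log(a + b*eps) = log a + (b/a)*eps, meaningful for a > 0.
func Log(x Dual) Dual {
    return Dual{Val: math.Log(x.Val), Dot: x.Dot / x.Val}
}

// PowInt implements (a + b*eps)^n = a^n + (n a^(n-1) b)*eps for integer n.
func PowInt(x Dual, n int) Dual {
    if n == 0 {
        return Dual{Val: 1} // constant 1, zero tangent
    }
    p := math.Pow(x.Val, float64(n-1)) // a^(n-1), shared by both components
    return Dual{Val: p * x.Val, Dot: float64(n) * p * x.Dot}
}

func main() {
    x := Dual{Val: 2, Dot: 1}
    fmt.Println(Exp(Log(x)))  // should recover x up to rounding
    fmt.Println(PowInt(x, 3)) // value 8, tangent 3 * 2^2 = 12
}
```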

### Dual Numbers and the Chain Rule

The chain rule is not added as an external algorithm. It follows from function composition over dual numbers.

Let

$$
h(x) = f(g(x)).
$$

Evaluate at

$$
x = a + b\varepsilon.
$$

First apply $g$:

$$
g(a+b\varepsilon) =
g(a) + g'(a)b\varepsilon.
$$

Then apply $f$:

$$
f(g(a) + g'(a)b\varepsilon) =
f(g(a)) + f'(g(a))g'(a)b\varepsilon.
$$

Therefore

$$
h'(a)b = f'(g(a))g'(a)b.
$$

This is exactly the chain rule.

Dual numbers turn the chain rule into ordinary evaluation. A program written over real numbers can often be lifted to dual numbers by replacing each primitive operation with its dual-number version.
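To see the chain rule emerge from plain composition, the sketch below evaluates $h(x) = \exp(\sin x)$ on a dual input and compares the tangent with the hand-derived $h'(a) = \exp(\sin a)\cos a$. The choice of $h$ and the names `H` and `HPrime` are assumptions made for illustration.

```go
package main

import (
    "fmt"
    "math"
)

// Dual is a value-derivative pair (a, b) standing for a + b*eps.
type Dual struct {
    Val float64
    Dot float64
}

// Sin implements sin(a + b*eps) = sin a + (b cos a)*eps.
func Sin(x Dual) Dual {
    return Dual{Val: math.Sin(x.Val), Dot: math.Cos(x.Val) * x.Dot}
}

// Exp implements exp(a + b*eps) = exp a + (b exp a)*eps.
func Exp(x Dual) Dual {
    e := math.Exp(x.Val)
    return Dual{Val: e, Dot: e * x.Dot}
}

// H computes h(x) = exp(sin(x)) by ordinary composition.
// No chain rule code appears anywhere in this function.
func H(x Dual) Dual {
    return Exp(Sin(x))
}

// HPrime is the hand-derived derivative, used only as a reference.
func HPrime(a float64) float64 {
    return math.Exp(math.Sin(a)) * math.Cos(a)
}

func main() {
    a := 0.7
    got := H(Dual{Val: a, Dot: 1})
    // The two numbers should agree up to floating-point rounding.
    fmt.Println(got.Dot, HPrime(a))
}
```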

### Computational Interpretation

In an implementation, a dual number is usually represented as a pair:

```go
type Dual struct {
    Val float64
    Dot float64
}
```

Here `Val` is the primal value, and `Dot` is the tangent.

Addition:

```go
func Add(x, y Dual) Dual {
    return Dual{
        Val: x.Val + y.Val,
        Dot: x.Dot + y.Dot,
    }
}
```

Multiplication:

```go
func Mul(x, y Dual) Dual {
    return Dual{
        Val: x.Val * y.Val,
        Dot: x.Dot*y.Val + x.Val*y.Dot,
    }
}
```

Sine, using `math.Sin` and `math.Cos` from Go's standard `math` package (the file must import `"math"`):

```go
func Sin(x Dual) Dual {
    return Dual{
        Val: math.Sin(x.Val),
        Dot: math.Cos(x.Val) * x.Dot,
    }
}
```

A function written against these operations computes derivatives automatically.

For example:

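```go
// Const lifts a real constant to a dual number with zero tangent.
func Const(c float64) Dual { return Dual{Val: c} }

func F(x Dual) Dual {
    return Add(Mul(Mul(x, x), x), Mul(Const(2), x))
}
```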

With input

```go
x := Dual{Val: 5, Dot: 1}
```

the result is

```go
Dual{Val: 135, Dot: 77}
```

The same execution computes the primal value and the derivative.
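The fragments in this section can be assembled into one runnable file, sketched below. `Const` is taken to lift a real constant with zero tangent, and `Derivative` is a hypothetical convenience helper added for this example.

```go
package main

import "fmt"

// Dual is a value-derivative pair (a, b) standing for a + b*eps.
type Dual struct {
    Val float64
    Dot float64
}

// Const lifts a real constant: constants do not vary, so the tangent is 0.
func Const(c float64) Dual { return Dual{Val: c} }

// Add is componentwise.
func Add(x, y Dual) Dual {
    return Dual{Val: x.Val + y.Val, Dot: x.Dot + y.Dot}
}

// Mul applies the product rule in the tangent component.
func Mul(x, y Dual) Dual {
    return Dual{Val: x.Val * y.Val, Dot: x.Dot*y.Val + x.Val*y.Dot}
}

// F computes f(x) = x^3 + 2x using only dual operations.
func F(x Dual) Dual {
    return Add(Mul(Mul(x, x), x), Mul(Const(2), x))
}

// Derivative evaluates f at a + eps and returns the coefficient of eps.
func Derivative(f func(Dual) Dual, a float64) float64 {
    return f(Dual{Val: a, Dot: 1}).Dot
}

func main() {
    fmt.Println(F(Dual{Val: 5, Dot: 1})) // prints {135 77}
    fmt.Println(Derivative(F, 5))        // prints 77
}
```

Seeding `Dot` with 1 inside `Derivative` corresponds to evaluating at $a + \varepsilon$.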

### Multiple Inputs

For a function

$$
f : \mathbb{R}^n \to \mathbb{R},
$$

a dual number can propagate one directional derivative at a time. Each input receives a primal value and a tangent seed.

For example, let

$$
f(x,y) = xy + \sin x.
$$

To compute the derivative in direction

$$
(v_x, v_y),
$$

evaluate

$$
x = a + v_x\varepsilon
$$

and

$$
y = b + v_y\varepsilon.
$$

Then

$$
xy = ab + (av_y + bv_x)\varepsilon
$$

and

$$
\sin x = \sin a + (v_x\cos a)\,\varepsilon.
$$

So

$$
f(x,y) =
ab + \sin a
+
(av_y + bv_x + v_x\cos a)\varepsilon.
$$

The coefficient of $\varepsilon$ is

$$
Df_{(a,b)}[v_x,v_y] =
b v_x + a v_y + v_x\cos a.
$$

Equivalently,

$$
Df_{(a,b)}[v] =
\nabla f(a,b) \cdot v.
$$

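Forward mode naturally computes Jacobian-vector products: one evaluation yields one directional derivative, so the full gradient of $f : \mathbb{R}^n \to \mathbb{R}$ takes $n$ evaluations, one per coordinate direction.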
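The two-input example can be sketched as follows. Each input gets its own seed, and the gradient is assembled from two passes with seeds $(1,0)$ and $(0,1)$. The helper names `Directional` and `Gradient` are assumptions made for this example.

```go
package main

import (
    "fmt"
    "math"
)

// Dual is a value-derivative pair (a, b) standing for a + b*eps.
type Dual struct {
    Val float64
    Dot float64
}

func Add(x, y Dual) Dual {
    return Dual{Val: x.Val + y.Val, Dot: x.Dot + y.Dot}
}

func Mul(x, y Dual) Dual {
    return Dual{Val: x.Val * y.Val, Dot: x.Dot*y.Val + x.Val*y.Dot}
}

func Sin(x Dual) Dual {
    return Dual{Val: math.Sin(x.Val), Dot: math.Cos(x.Val) * x.Dot}
}

// F computes f(x, y) = x*y + sin(x).
func F(x, y Dual) Dual {
    return Add(Mul(x, y), Sin(x))
}

// Directional returns the derivative of f at (a, b) in direction (vx, vy).
func Directional(a, b, vx, vy float64) float64 {
    return F(Dual{Val: a, Dot: vx}, Dual{Val: b, Dot: vy}).Dot
}

// Gradient runs one forward pass per input, seeding one coordinate at a time.
func Gradient(a, b float64) (dfdx, dfdy float64) {
    return Directional(a, b, 1, 0), Directional(a, b, 0, 1)
}

func main() {
    gx, gy := Gradient(2, 3)
    fmt.Println(gx, 3+math.Cos(2)) // df/dx = b + cos(a); both numbers should agree
    fmt.Println(gy)                // df/dy = a = 2
    // A general direction is the matching linear combination of the partials.
    fmt.Println(Directional(2, 3, 1, 1), gx+gy)
}
```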

### Relation to Forward Mode AD

Forward mode AD is dual-number evaluation generalized to programs.

Each program variable carries two components:

$$
\text{variable} = (\text{value}, \text{tangent}).
$$

Each primitive instruction updates both components.

For a program statement

$$
z = x \cdot y,
$$

the lifted statement is

$$
z_{\text{val}} = x_{\text{val}} y_{\text{val}}
$$

$$
z_{\text{dot}} =
x_{\text{dot}} y_{\text{val}}
+
x_{\text{val}} y_{\text{dot}}.
$$

For

$$
z = \sin x,
$$

the lifted statement is

$$
z_{\text{val}} = \sin(x_{\text{val}})
$$

$$
z_{\text{dot}} = \cos(x_{\text{val}})x_{\text{dot}}.
$$

This local transformation is enough. The global derivative emerges from executing the transformed program.
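The statement-by-statement lifting can be written out by hand for a small straight-line program. The sketch below uses a hypothetical two-statement program, $z = x \cdot y$ followed by $w = \sin z$, and keeps the value and tangent in separate variables rather than in a struct. `AnalyticDx` is a hand-derived reference used only for comparison.

```go
package main

import (
    "fmt"
    "math"
)

// Program is the original code over real numbers: w = sin(x * y).
func Program(x, y float64) float64 {
    z := x * y
    w := math.Sin(z)
    return w
}

// LiftedProgram is the same code after lifting each statement locally.
// Every variable v becomes a pair (vVal, vDot).
func LiftedProgram(xVal, xDot, yVal, yDot float64) (wVal, wDot float64) {
    // z = x * y
    zVal := xVal * yVal
    zDot := xDot*yVal + xVal*yDot
    // w = sin(z)
    wVal = math.Sin(zVal)
    wDot = math.Cos(zVal) * zDot
    return wVal, wDot
}

// AnalyticDx is the hand-derived partial derivative dw/dx = y * cos(x*y).
func AnalyticDx(x, y float64) float64 {
    return y * math.Cos(x*y)
}

func main() {
    // Seed x with tangent 1 and y with tangent 0 to get dw/dx.
    val, dot := LiftedProgram(2, 1, 3, 0)
    fmt.Println(val == Program(2, 3)) // the primal computation is unchanged
    fmt.Println(dot, AnalyticDx(2, 3)) // should agree up to rounding
}
```

No statement in `LiftedProgram` knows about the program as a whole; each line only applies its own local rule.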

### Algebraic Summary

The dual numbers form a commutative algebra over the real numbers:

$$
\mathbb{D} = \mathbb{R}[\varepsilon]/(\varepsilon^2).
$$

This notation means: take polynomials in $\varepsilon$, but identify every term containing $\varepsilon^2$ or higher powers with zero.

Every dual number has a unique form:

$$
a + b\varepsilon.
$$

The real part $a$ is the value. The dual part $b$ is the first-order coefficient.
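A few of these algebraic facts can be spot-checked numerically with the pair representation. The sketch below is only a sanity check on sample values, not a proof, and the sample numbers are arbitrary.

```go
package main

import "fmt"

// Dual is a value-derivative pair (a, b) standing for a + b*eps.
type Dual struct {
    Val float64
    Dot float64
}

// Mul multiplies in R[eps]/(eps^2): (a, b)(c, d) = (ac, ad + bc).
func Mul(x, y Dual) Dual {
    return Dual{Val: x.Val * y.Val, Dot: x.Dot*y.Val + x.Val*y.Dot}
}

func main() {
    eps := Dual{Val: 0, Dot: 1}
    zero := Dual{}
    one := Dual{Val: 1}
    x := Dual{Val: 2, Dot: 3}
    y := Dual{Val: 4, Dot: 5}

    fmt.Println(eps != zero)            // eps is not zero
    fmt.Println(Mul(eps, eps) == zero)  // but eps squared is zero
    fmt.Println(Mul(x, y) == Mul(y, x)) // multiplication commutes
    fmt.Println(Mul(one, x) == x)       // the real number 1 is the identity
}
```

All four lines should print `true` for these inputs.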

This small algebra is powerful because it encodes first-order differential calculus directly into arithmetic. In forward mode AD, differentiation becomes evaluation in the algebra of dual numbers.

