# Chapter 20. Building an AD Engine

## Minimal Forward Mode Engine

A minimal forward mode automatic differentiation engine has one job: evaluate a program while carrying both a value and its derivative. The engine does not build a graph. It does not store a tape. It computes derivatives in the same order as the original computation.

Forward mode is best understood as ordinary evaluation over a richer number type.

Instead of evaluating with real numbers, we evaluate with pairs:

$$
x \mapsto (x, \dot{x})
$$

The first component is the primal value. The second component is the tangent value. The tangent records how the output changes when the input is perturbed in a chosen direction.

For a scalar function

$$
f : \mathbb{R} \to \mathbb{R}
$$

we initialize the input as

$$
x \mapsto (x, 1)
$$

Then the final tangent is exactly $f'(x)$.

For a multivariate function

$$
f : \mathbb{R}^n \to \mathbb{R}
$$

the tangent represents a direction vector $v$. Forward mode computes the Jacobian-vector product:

$$
J_f(x)v
$$

This is the central operation of forward mode.
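A concrete instance makes the shape of the product clear. For $f(x, y) = (xy,\; x + y)$ and a direction $v = (v_1, v_2)$:

$$
J_f(x, y)\,v =
\begin{pmatrix} y & x \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \end{pmatrix}
=
\begin{pmatrix} y\,v_1 + x\,v_2 \\ v_1 + v_2 \end{pmatrix}
$$

Forward mode produces this product directly, without ever forming the matrix $J_f$.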

## The Core Data Type

A minimal engine can start with a single type:

```go
type Dual struct {
    Value float64
    Deriv float64
}
```

`Value` is the ordinary numeric value. `Deriv` is the derivative with respect to the chosen seed direction.

For scalar differentiation, the input variable receives derivative `1`.

```go
func Var(x float64) Dual {
    return Dual{
        Value: x,
        Deriv: 1,
    }
}
```

Constants receive derivative `0`.

```go
func Const(x float64) Dual {
    return Dual{
        Value: x,
        Deriv: 0,
    }
}
```

This distinction matters. A variable changes when the input changes. A constant does not.

## Arithmetic Rules

Each primitive operation must define how values and derivatives propagate.

For addition:

```go
func Add(a, b Dual) Dual {
    return Dual{
        Value: a.Value + b.Value,
        Deriv: a.Deriv + b.Deriv,
    }
}
```

For subtraction:

```go
func Sub(a, b Dual) Dual {
    return Dual{
        Value: a.Value - b.Value,
        Deriv: a.Deriv - b.Deriv,
    }
}
```

For multiplication:

```go
func Mul(a, b Dual) Dual {
    return Dual{
        Value: a.Value * b.Value,
        Deriv: a.Deriv*b.Value + a.Value*b.Deriv,
    }
}
```

For division:

```go
func Div(a, b Dual) Dual {
    return Dual{
        Value: a.Value / b.Value,
        Deriv: (a.Deriv*b.Value - a.Value*b.Deriv) / (b.Value * b.Value),
    }
}
```

These are just the ordinary derivative rules encoded as executable code.

## Elementary Functions

The same pattern extends to functions such as `sin`, `cos`, `exp`, and `log`.

```go
func Sin(x Dual) Dual {
    return Dual{
        Value: math.Sin(x.Value),
        Deriv: math.Cos(x.Value) * x.Deriv,
    }
}

func Cos(x Dual) Dual {
    return Dual{
        Value: math.Cos(x.Value),
        Deriv: -math.Sin(x.Value) * x.Deriv,
    }
}

func Exp(x Dual) Dual {
    e := math.Exp(x.Value)
    return Dual{
        Value: e,
        Deriv: e * x.Deriv,
    }
}

func Log(x Dual) Dual {
    return Dual{
        Value: math.Log(x.Value),
        Deriv: x.Deriv / x.Value,
    }
}
```

Each rule has the same shape:

$$
y = f(x)
$$

$$
\dot{y} = f'(x)\dot{x}
$$

Forward mode applies the chain rule locally at every operation.

## A Complete Example

Consider:

$$
f(x) = x^2 + 3x + 2
$$

In Go:

```go
func F(x Dual) Dual {
    return Add(
        Add(
            Mul(x, x),
            Mul(Const(3), x),
        ),
        Const(2),
    )
}
```

Evaluate at $x = 5$:

```go
func main() {
    y := F(Var(5))

    fmt.Println("value:", y.Value)
    fmt.Println("derivative:", y.Deriv)
}
```

The result is:

```text
value: 42
derivative: 13
```

The derivative is correct because:

$$
f'(x) = 2x + 3
$$

and therefore:

$$
f'(5) = 13
$$

The engine never constructs the symbolic expression $2x + 3$. It also never estimates the derivative using finite differences. It computes the derivative exactly through the program structure, subject only to floating point arithmetic.

## Multivariate Inputs

For a function

$$
f(x, y) = xy + \sin(x)
$$

we can compute partial derivatives by choosing different seeds.

```go
func G(x, y Dual) Dual {
    return Add(
        Mul(x, y),
        Sin(x),
    )
}
```

To compute $\partial f / \partial x$, seed `x` with `1` and `y` with `0`.

```go
x := Dual{Value: 2, Deriv: 1}
y := Dual{Value: 3, Deriv: 0}

out := G(x, y)
```

The tangent gives:

$$
\frac{\partial f}{\partial x} = y + \cos(x)
$$

To compute $\partial f / \partial y$, seed `x` with `0` and `y` with `1`.

```go
x := Dual{Value: 2, Deriv: 0}
y := Dual{Value: 3, Deriv: 1}

out := G(x, y)
```

The tangent gives:

$$
\frac{\partial f}{\partial y} = x
$$

A full gradient for $f : \mathbb{R}^n \to \mathbb{R}$ requires $n$ forward passes if we use scalar tangents. Each pass seeds one input direction.

## Vector Tangents

A more general engine stores a vector of derivatives.

```go
type DualVec struct {
    Value float64
    Deriv []float64
}
```

Now one forward pass can carry multiple seed directions.

For example, for two inputs:

```go
x := DualVec{Value: 2, Deriv: []float64{1, 0}}
y := DualVec{Value: 3, Deriv: []float64{0, 1}}
```

The output derivative vector contains both partial derivatives.

This is convenient, but it changes the cost model. Every primitive operation now performs vector arithmetic on the tangent field.
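The O(k) cost is visible in the code. A sketch of multiplication over `DualVec` (this operation is not part of the minimal engine above; it is the same product rule, applied to each tangent component):

```go
package main

import "fmt"

// DualVec carries a primal value and a width-k tangent vector.
type DualVec struct {
	Value float64
	Deriv []float64
}

// MulVec applies the product rule to every tangent component: O(k) work
// per multiplication, where k = len(Deriv).
func MulVec(a, b DualVec) DualVec {
	d := make([]float64, len(a.Deriv))
	for i := range d {
		d[i] = a.Deriv[i]*b.Value + a.Value*b.Deriv[i]
	}
	return DualVec{Value: a.Value * b.Value, Deriv: d}
}

func main() {
	x := DualVec{Value: 2, Deriv: []float64{1, 0}}
	y := DualVec{Value: 3, Deriv: []float64{0, 1}}

	out := MulVec(x, y) // f(x, y) = xy
	fmt.Println(out.Value, out.Deriv)
	// one pass yields both partials: df/dx = y = 3, df/dy = x = 2
}
```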

Scalar tangent:

```text
cost per primitive: O(1)
```

Vector tangent of width `k`:

```text
cost per primitive: O(k)
```

The choice depends on the shape of the problem.
Scalar tangents fit when only one or two directions are needed; a width-`k` tangent amortizes the shared primal work when many directions are needed at once, at `k` times the per-operation tangent cost.

## Minimal Engine Interface

A small Go-style API can expose only the essential operations:

```go
type Dual struct {
    Value float64
    Deriv float64
}

func Var(x float64) Dual
func Const(x float64) Dual

func Add(a, b Dual) Dual
func Sub(a, b Dual) Dual
func Mul(a, b Dual) Dual
func Div(a, b Dual) Dual

func Sin(x Dual) Dual
func Cos(x Dual) Dual
func Exp(x Dual) Dual
func Log(x Dual) Dual
```

This is enough to differentiate many scalar programs. More functions can be added incrementally.
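Every extension follows the same recipe: compute the primal, then multiply the local derivative by the incoming tangent. As one example, a square root primitive (not in the list above) might look like this:

```go
package main

import (
	"fmt"
	"math"
)

type Dual struct {
	Value float64
	Deriv float64
}

// Sqrt: y = sqrt(x), so the tangent rule is dy = dx / (2*sqrt(x)).
func Sqrt(x Dual) Dual {
	s := math.Sqrt(x.Value)
	return Dual{Value: s, Deriv: x.Deriv / (2 * s)}
}

func main() {
	y := Sqrt(Dual{Value: 4, Deriv: 1})
	fmt.Println(y.Value, y.Deriv) // 2 and 1/(2*2) = 0.25
}
```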

The important design rule is that every primitive must preserve the invariant:

```text
Dual.Value = primal value
Dual.Deriv = derivative of primal value with respect to the seed
```

Once this invariant holds for constants, variables, and primitive operations, it holds for every expression built from them.

## Why Forward Mode Is Simple

Forward mode is simple because the derivative flows in the same direction as evaluation.

Original program:

```text
inputs -> intermediate values -> output
```

Forward mode:

```text
input tangents -> intermediate tangents -> output tangent
```

There is no need to revisit earlier operations. There is no backward pass. There is no tape. The engine can run with constant extra memory per active value.

This makes forward mode attractive for:

| Problem shape | Why forward mode fits |
|---|---|
| Few inputs, many outputs | One pass per input yields derivatives of every output |
| Jacobian-vector products | Directly computed |
| Local sensitivity analysis | Cheap for selected directions |
| Embedded systems | Simple memory model |
| Small numerical kernels | Low implementation overhead |
| Higher-order Taylor methods | Natural extension through richer number types |

## Limitations

Forward mode becomes expensive when the input dimension is large and the output dimension is small.

For a function:

$$
f : \mathbb{R}^n \to \mathbb{R}
$$

a full gradient needs $n$ scalar forward passes, or one pass with tangent width $n$. Either way, the work scales with the number of inputs.

This is why reverse mode dominates deep learning. Neural networks often have millions or billions of parameters and a scalar loss. Reverse mode can compute the full gradient with work comparable to a small constant multiple of the primal evaluation.

Forward mode remains valuable because it is predictable, local, and easy to implement. It is also the natural primitive for Jacobian-vector products, higher-order methods, and testing reverse-mode systems.

## Minimal Correctness Argument

The correctness proof is structural.

For each expression $e$, the engine computes:

$$
(e(x), De(x)v)
$$

where $v$ is the seed direction.

For variables, this holds by construction. For constants, the derivative is zero. For each primitive operation, the implementation applies the corresponding derivative rule. For composition, the tangent propagation is exactly the chain rule.

Therefore every expression built from supported primitives has a correct forward-mode derivative.

The engine is small because automatic differentiation does not require symbolic algebra. It only requires local derivative rules and ordinary program evaluation.

