# Wengert Lists

## Wengert Lists

A Wengert list is a linear representation of a computation in which every intermediate result is assigned to a unique variable. It is one of the earliest and most influential formulations of automatic differentiation.

The central idea is simple:

1. decompose a computation into elementary operations;
2. assign each intermediate result a new identifier;
3. record operations in execution order.

This transforms a program into a sequence of explicit data dependencies.

Wengert lists are foundational because reverse mode automatic differentiation operates naturally on such ordered traces.

### Historical Background

The method is named after Robert Wengert, whose 1964 work introduced a systematic way to propagate derivatives through numerical computations.

The key insight was that derivative computation could follow the exact structure of the original numerical program rather than symbolic algebraic manipulation.

This idea eventually became the basis for modern automatic differentiation systems.

### Basic Structure

Consider

$$
y = \sin(x_1 x_2 + x_3).
$$

A Wengert list expands the computation into elementary steps:

| Step | Assignment |
|---|---|
| 1 | $v_1 = x_1 x_2$ |
| 2 | $v_2 = v_1 + x_3$ |
| 3 | $v_3 = \sin(v_2)$ |
| 4 | $y = v_3$ |

Each variable:

1. is assigned exactly once;
2. depends only on earlier variables.

This property creates an explicit acyclic dependency structure.

### Single Assignment Property

Wengert lists use single assignment semantics:

$$
v_i \text{ assigned once only}.
$$

For example:

```text
v1 = x + y
v1 = v1 + z
```

is not a valid Wengert form.

Instead:

```text
v1 = x + y
v2 = v1 + z
```

must be used.

This property simplifies differentiation because dependencies become immutable.

Modern compiler intermediate representations such as SSA form follow the same principle.

### Wengert Lists as Computational Graphs

A Wengert list is equivalent to a topologically ordered computational graph.

Example:

$$
v_1 = x_1x_2,
$$

$$
v_2 = \sin(v_1),
$$

$$
v_3 = v_1 + v_2.
$$

Graphically:

```text
x1 ----\
        (*) ---- v1 ----\
x2 ----/                 \
                           (+) ---- v3
v1 -------- sin ---- v2 /
```

The Wengert list linearizes this graph:

| Index | Operation |
|---|---|
| 1 | multiply |
| 2 | sine |
| 3 | add |

Forward evaluation processes the list top-to-bottom.

Reverse differentiation processes the list bottom-to-top.

### Forward Evaluation

The forward pass computes primal values sequentially.

Suppose

$$
x_1 = 2,
\qquad
x_2 = 3,
\qquad
x_3 = 1.
$$

Forward execution:

| Variable | Value |
|---|---|
| $v_1 = x_1x_2$ | $6$ |
| $v_2 = v_1 + x_3$ | $7$ |
| $v_3 = \sin(v_2)$ | $\sin(7)$ |

Each step only depends on earlier results.

### Reverse Accumulation on a Wengert List

Reverse mode associates each variable with adjoint

$$
\bar v_i =
\frac{\partial y}{\partial v_i}.
$$

Initialize:

$$
\bar v_3 = 1.
$$

Now process the list backward.

#### Step 3

$$
v_3 = \sin(v_2)
$$

Reverse rule:

$$
\bar v_2
\mathrel{+}=
\bar v_3 \cos(v_2).
$$

#### Step 2

$$
v_2 = v_1 + x_3
$$

Reverse rules:

$$
\bar v_1
\mathrel{+}=
\bar v_2,
$$

$$
\bar x_3
\mathrel{+}=
\bar v_2.
$$

#### Step 1

$$
v_1 = x_1x_2
$$

Reverse rules:

$$
\bar x_1
\mathrel{+}=
\bar v_1 x_2,
$$

$$
\bar x_2
\mathrel{+}=
\bar v_1 x_1.
$$

The final adjoints are the desired derivatives.

### Explicit Dependency Representation

Wengert lists make dependencies explicit.

Each entry records:

| Field | Meaning |
|---|---|
| operation | primitive computation |
| inputs | dependency references |
| output | produced variable |

This explicit structure enables:

1. reverse traversal;
2. derivative propagation;
3. graph optimization;
4. checkpointing;
5. parallel scheduling.

Without explicit dependency structure, reverse mode becomes difficult to implement correctly.

### Example with Shared Dependencies

Consider

$$
y = x^2 + \sin(x^2).
$$

Wengert form:

| Step | Assignment |
|---|---|
| 1 | $v_1 = x^2$ |
| 2 | $v_2 = \sin(v_1)$ |
| 3 | $v_3 = v_1 + v_2$ |
| 4 | $y = v_3$ |

Notice that $v_1$ influences the output through two paths:

1. directly through addition;
2. indirectly through sine.

Reverse accumulation naturally handles this through additive adjoint accumulation:

$$
\bar v_1 =
1 + \cos(v_1).
$$

Then

$$
\bar x =
2x(1+\cos(v_1)).
$$

The Wengert list exposes these dependency paths explicitly.

### Relation to SSA Form

Modern compilers frequently use static single assignment form.

SSA and Wengert lists share the same core property:

$$
\text{every variable assigned once}.
$$

Example SSA:

```text
v1 = x * y
v2 = sin(v1)
v3 = v1 + v2
```

This similarity makes SSA-based compiler infrastructures highly suitable for automatic differentiation systems.

Many modern AD compilers operate directly on SSA intermediate representations.

### Wengert Lists and Operator Overloading

Operator-overloading AD systems dynamically construct Wengert lists during execution.

Example:

```text
a = x * y
b = sin(a)
c = a + b
```

Each overloaded operation appends a node:

```text
tape.append(multiply(x, y))
tape.append(sine(a))
tape.append(add(a, b))
```

The resulting runtime tape is effectively a dynamically generated Wengert list.

### Memory Characteristics

A Wengert list stores every intermediate operation.

Suppose:

$$
N
$$

operations execute.

The list requires:

$$
O(N)
$$

storage.

Large machine learning workloads may generate enormous lists.

Memory costs arise from:

| Stored Data | Purpose |
|---|---|
| operation codes | reverse rules |
| input references | dependency tracking |
| primal values | local derivatives |
| tensor metadata | shapes and layouts |

Reverse mode performance is often constrained more by Wengert list storage than arithmetic throughput.

### Advantages of Wengert Lists

#### Simplicity

The structure is linear and explicit.

#### Deterministic Traversal

Forward and reverse orders are unambiguous.

#### Local Differentiation Rules

Each operation defines its own local derivative behavior.

#### Dynamic Execution Support

Lists naturally capture runtime execution paths.

#### Easy Debugging

The recorded trace exposes the exact executed computation.

### Limitations

#### Memory Growth

Every intermediate operation must be retained.

#### Fine-Grained Overhead

Tiny operations generate large traces.

#### Mutation Complexity

Programs with mutable state complicate dependency tracking.

#### Parallelism Challenges

Reverse accumulation may require synchronization between adjoint updates.

### Wengert Lists and Higher-Order Differentiation

A reverse pass itself can be represented as another Wengert list.

This enables higher-order derivatives:

$$
\frac{d^2f}{dx^2},
\qquad
\frac{d^3f}{dx^3},
\quad \text{etc.}
$$

However, nested differentiation introduces complications:

1. nested tapes;
2. perturbation confusion;
3. adjoint lifetime management.

Higher-order AD systems often require more sophisticated representations than basic Wengert lists.

### Linearized Program Interpretation

A Wengert list transforms a program into an explicit sequence of local transformations.

Each line represents:

$$
v_i = g_i(v_{p_1}, \ldots, v_{p_r}).
$$

The full program becomes a composition:

$$
f = g_k \circ g_{k-1} \circ \cdots \circ g_1.
$$

Forward mode propagates derivatives through this composition from left to right.

Reverse mode propagates adjoints from right to left.

The Wengert list therefore provides the operational structure needed to realize the chain rule algorithmically.

### Conceptual Summary

A Wengert list is a linearized computational graph with explicit single-assignment intermediate variables.

It converts a numerical program into a form suitable for automatic differentiation by:

1. exposing dependencies;
2. preserving evaluation order;
3. recording local operations;
4. enabling reverse traversal.

Most modern reverse mode systems, whether tape-based, graph-based, or compiler-based, still fundamentally rely on the same structural principles introduced by Wengert lists.

