# Tape Design

## Tape Design

A tape is an append-only record of the operations executed during the forward pass. Reverse mode uses the tape to replay derivative rules backward.

A graph stores nodes and edges as linked objects. A tape stores instructions in execution order.

```text
forward pass:
input -> op 0 -> op 1 -> op 2 -> output

tape:
[op 0, op 1, op 2]

backward pass:
op 2 -> op 1 -> op 0
```

The tape is one of the simplest practical representations for reverse mode AD. It is compact, linear, and easy to traverse in reverse.

## Tape Entries

A minimal tape entry records:
- the operation kind
- the output slot
- the input slots
- any saved values needed by the backward rule

For scalar AD:

```go
type OpKind int

const (
    OpAdd OpKind = iota
    OpMul
    OpSin
)

type Instr struct {
    Op  OpKind
    Out int
    A   int
    B   int
}
```

The slots refer to arrays of values and gradients.

```go
type Tape struct {
    Values []float64
    Grads  []float64
    Instrs []Instr
}
```

A new variable allocates a value slot:

```go
func (t *Tape) Var(x float64) int {
    id := len(t.Values)

    t.Values = append(t.Values, x)
    t.Grads = append(t.Grads, 0)

    return id
}
```

Each operation creates a new output slot and appends one instruction.

## Forward Operations

Addition:

```go
func (t *Tape) Add(a, b int) int {
    out := t.Var(t.Values[a] + t.Values[b])

    t.Instrs = append(t.Instrs, Instr{
        Op:  OpAdd,
        Out: out,
        A:   a,
        B:   b,
    })

    return out
}
```

Multiplication:

```go
func (t *Tape) Mul(a, b int) int {
    out := t.Var(t.Values[a] * t.Values[b])

    t.Instrs = append(t.Instrs, Instr{
        Op:  OpMul,
        Out: out,
        A:   a,
        B:   b,
    })

    return out
}
```

Sine:

```go
func (t *Tape) Sin(x int) int {
    out := t.Var(math.Sin(t.Values[x]))

    t.Instrs = append(t.Instrs, Instr{
        Op:  OpSin,
        Out: out,
        A:   x,
    })

    return out
}
```

The user program works with integer handles, not pointers. The tape owns the storage.

## Backward Pass

The backward pass starts by seeding the output gradient:

```go
t.Grads[out] = 1
```

Then it walks instructions in reverse order.

```go
func (t *Tape) Backward(out int) {
    t.Grads[out] = 1

    for i := len(t.Instrs) - 1; i >= 0; i-- {
        ins := t.Instrs[i]

        switch ins.Op {
        case OpAdd:
            t.Grads[ins.A] += t.Grads[ins.Out]
            t.Grads[ins.B] += t.Grads[ins.Out]

        case OpMul:
            t.Grads[ins.A] += t.Values[ins.B] * t.Grads[ins.Out]
            t.Grads[ins.B] += t.Values[ins.A] * t.Grads[ins.Out]

        case OpSin:
            t.Grads[ins.A] += math.Cos(t.Values[ins.A]) * t.Grads[ins.Out]
        }
    }
}
```

This is reverse mode without graph traversal. The tape already gives a valid reverse topological order because instructions were appended in forward execution order.

## Complete Example

For:

$$
f(x) = x^2 + \sin(x)
$$

```go
func main() {
    var t Tape

    x := t.Var(2)

    x2 := t.Mul(x, x)
    sx := t.Sin(x)
    y := t.Add(x2, sx)

    t.Backward(y)

    fmt.Println("value:", t.Values[y])
    fmt.Println("grad:", t.Grads[x])
}
```

The derivative is:

$$
f'(x) = 2x + \cos(x)
$$

At `x = 2`, the tape computes this by replaying local rules backward.

## Why a Tape Is Efficient

A tape has good locality. Values and gradients live in contiguous arrays. Instructions live in another contiguous array.

This avoids many costs of pointer-based graph nodes:
- fewer heap objects
- fewer pointer traversals
- simpler traversal
- better cache locality
- easier serialization
- easier memory reuse

A pointer graph is easier to inspect. A tape is often easier to execute quickly.

## Saved Values

Some backward rules need values from the forward pass.

For multiplication, the backward pass needs both input values:

```go
a.Grad += b.Value * out.Grad
b.Grad += a.Value * out.Grad
```

In the tape version, these are read from:

```go
t.Values[ins.A]
t.Values[ins.B]
```

For operations whose backward rule needs extra data, the instruction can store parameters.

Example for power:

```go
type Instr struct {
    Op  OpKind
    Out int
    A   int
    B   int

    Param float64
}
```

Then:

```go
func (t *Tape) Pow(x int, p float64) int {
    out := t.Var(math.Pow(t.Values[x], p))

    t.Instrs = append(t.Instrs, Instr{
        Op:    OpPow,
        Out:   out,
        A:     x,
        Param: p,
    })

    return out
}
```

Backward:

```go
case OpPow:
    x := t.Values[ins.A]
    p := ins.Param
    t.Grads[ins.A] += p * math.Pow(x, p-1) * t.Grads[ins.Out]
```

The tape must retain whatever the backward rule cannot reconstruct cheaply or safely.

## Constants

Constants can be represented as value slots too.

```go
func (t *Tape) Const(x float64) int {
    return t.Var(x)
}
```

This is simple, but gradients will also be allocated for constants.

A slightly better design separates variables from constants:

```go
type SlotKind int

const (
    SlotVar SlotKind = iota
    SlotConst
    SlotTemp
)

type Tape struct {
    Values []float64
    Grads  []float64
    Kinds  []SlotKind
    Instrs []Instr
}
```

Then the backward pass may skip accumulating gradients into constants. For a minimal engine, the simpler representation is usually enough.

## Reset and Reuse

A tape is naturally reusable if its arrays are cleared between evaluations.

```go
func (t *Tape) Reset() {
    t.Values = t.Values[:0]
    t.Grads = t.Grads[:0]
    t.Instrs = t.Instrs[:0]
}
```

This preserves allocated capacity and avoids repeated allocation.

For iterative optimization, this matters:

```go
for step := 0; step < steps; step++ {
    t.Reset()

    x := t.Var(param)
    y := loss(&t, x)

    t.Backward(y)

    param -= lr * t.Grads[x]
}
```

A tape-based design therefore fits tight numerical loops well.

## Persistent vs Ephemeral Tapes

There are two broad tape lifetimes.

| Tape style | Meaning | Typical use |
|---|---|---|
| Ephemeral tape | Built for one forward pass, discarded after backward | Dynamic eager AD |
| Persistent tape | Recorded once, replayed many times | Static computation, repeated kernels |

An ephemeral tape is simple and matches ordinary program execution.

A persistent tape requires stable inputs, stable control flow, and a way to update input values before replay. It can improve performance when the same computation runs many times.

## Handling Control Flow

A tape records the operations that actually executed.

```go
func H(t *Tape, x int) int {
    if t.Values[x] > 0 {
        return t.Mul(x, x)
    }

    return t.Mul(t.Const(-1), x)
}
```

For positive `x`, the tape contains the square branch. For non-positive `x`, it contains the negation branch.

The derivative is local to that execution path. This matches dynamic graph AD.

At a branch boundary, the mathematical function may have a kink. The tape still returns the derivative of the executed branch.

## Aliasing and Mutation

Tape design becomes harder when the source language allows mutation.

Consider:

```go
x = x + 1
y = x * x
```

A tape should not overwrite the old value if the backward pass may need it. Each assignment should create a new slot, or the tape must explicitly save overwritten values.

A safe minimal rule is:

```text
Every differentiable operation writes to a fresh slot.
```

This gives single-assignment behavior, similar to SSA form.

Mutation can be compiled into fresh slots plus environment updates:

```text
x0 = input
x1 = x0 + 1
y  = x1 * x1
```

This makes reverse mode straightforward.

## Tape vs Graph

| Property | Tape | Graph |
|---|---|---|
| Storage | Linear arrays | Nodes and edges |
| Traversal | Reverse instruction order | Reverse topological order |
| Allocation | Compact | Often many objects |
| Debugging | Less visual | More inspectable |
| Control flow | Records executed path | Records executed path |
| Serialization | Simple | More complex |
| Optimization | Good for local passes | Good for structural passes |

A tape is usually the better first production representation. A graph is usually the better first teaching representation.

## Minimal Tape Invariant

A correct tape must satisfy one invariant:

```text
For every instruction, all input slots were created before the output slot, and the backward rule has access to every primal value it needs.
```

Because instructions are appended during forward execution, reverse iteration gives a valid dependency order.

The backward rule for each instruction applies:

$$
\bar{x_i} \mathrel{+}= \bar{y}\frac{\partial y}{\partial x_i}
$$

The tape is therefore a linearized computation graph. Its advantage is that the linearization is already done during execution.