# SSA Form

Static single assignment form, or SSA, is an intermediate representation where each variable is assigned exactly once.

A source program may reuse the same variable name many times:

```python
def f(x):
    x = x * x
    x = sin(x)
    return x
```

In SSA, each assignment receives a fresh name:

```text
x0 = input
x1 = x0 * x0
x2 = sin(x1)
return x2
```

This small change has a large effect. It turns mutation of names into an explicit dataflow graph. For automatic differentiation, that graph is exactly what the transformation needs.
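The renaming itself is mechanical. The following is a minimal Python sketch (the representation of statements as `(target, tokens)` pairs is illustrative, not a real compiler IR): each use refers to the latest version of a variable, and each definition mints a fresh numbered name.

```python
def to_ssa(params, stmts):
    """Rename a straight-line program into SSA form.

    params: input variable names; each input becomes version 0.
    stmts:  list of (target, tokens), where tokens may contain
            variable names to be renamed.
    """
    counter = {p: 0 for p in params}        # next version per variable
    current = {p: f"{p}0" for p in params}  # latest SSA name per variable
    out = []
    for target, tokens in stmts:
        # Uses refer to the current version of each variable.
        renamed = [current.get(tok, tok) for tok in tokens]
        # The definition receives a fresh name.
        counter[target] = counter.get(target, -1) + 1
        new_name = f"{target}{counter[target]}"
        current[target] = new_name
        out.append((new_name, renamed))
    return out

# The example from the text: x = x * x; x = sin(x)
ssa = to_ssa(["x"], [("x", ["mul", "x", "x"]), ("x", ["sin", "x"])])
# → [('x1', ['mul', 'x0', 'x0']), ('x2', ['sin', 'x1'])]
```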

### Why SSA Helps AD

Automatic differentiation follows dependencies. To compute the derivative of an output with respect to an input, the system must know which intermediate values depend on which earlier values.

SSA makes these dependencies local and explicit.

```text
x1 = mul x0 x0
x2 = sin x1
```

The line defining `x2` depends on `x1`. The line defining `x1` depends on `x0`. There is no ambiguity about which version of `x` is being used, because each version has a unique name.

This is useful for both forward and reverse mode.

Forward mode attaches a tangent to each SSA value:

```text
x0, dx0 = input
x1 = mul x0 x0
dx1 = add (mul dx0 x0), (mul x0 dx0)
x2 = sin x1
dx2 = mul (cos x1), dx1
return x2, dx2
```
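The tangent trace above can be checked numerically by executing it at a concrete point and comparing against the analytic derivative of `sin(x*x)`; a small sanity check, written line for line after the pseudocode:

```python
import math

x0, dx0 = 1.5, 1.0          # input and its seed tangent
x1 = x0 * x0
dx1 = dx0 * x0 + x0 * dx0   # product rule, mirroring the trace
x2 = math.sin(x1)
dx2 = math.cos(x1) * dx1    # chain rule through sin

# Analytic derivative: d/dx sin(x*x) = 2x * cos(x*x)
analytic = 2 * x0 * math.cos(x0 * x0)
assert abs(dx2 - analytic) < 1e-12
```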

Reverse mode attaches an adjoint to each SSA value:

```text
x2_bar = seed
x1_bar += cos(x1) * x2_bar
x0_bar += x0 * x1_bar   (x0 is the left operand of the mul)
x0_bar += x0 * x1_bar   (x0 is also the right operand)
return x0_bar
```

The reverse program is the original dependency graph traversed backward.

### SSA Values and Adjoint Accumulation

In reverse mode, each SSA value may contribute to several later values. Its adjoint is the sum of all downstream contributions.

Consider:

```text
x0 = input
x1 = x0 * x0
x2 = x1 + x0
return x2
```

The input `x0` reaches the output through two paths:

```text
x0 -> x1 -> x2
x0 -> x2
```

The reverse pass accumulates both contributions:

```text
x2_bar = 1
x1_bar += x2_bar        (x2 = x1 + x0)
x0_bar += x2_bar
x0_bar += x0 * x1_bar   (x1 = x0 * x0: one update per operand)
x0_bar += x0 * x1_bar
```

The `+=` form is important. Even though each primal SSA value is defined once, its adjoint may be updated many times.
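A Python dictionary defaulting to zero captures this accumulation pattern directly. This sketch hard-codes the reverse pass for the example above; the derivative of `x*x + x` is `2x + 1`:

```python
from collections import defaultdict

def grad_f(x0):
    # Forward (primal) pass: x1 = x0 * x0; x2 = x1 + x0
    x1 = x0 * x0
    # Reverse pass: adjoints are accumulation variables starting at 0.
    bar = defaultdict(float)
    bar["x2"] = 1.0               # seed the output adjoint
    bar["x1"] += bar["x2"]        # x2 = x1 + x0
    bar["x0"] += bar["x2"]
    bar["x0"] += x0 * bar["x1"]   # x1 = x0 * x0, left operand
    bar["x0"] += x0 * bar["x1"]   # right operand
    return bar["x0"]

assert grad_f(3.0) == 2 * 3.0 + 1   # d/dx (x*x + x) = 2x + 1
```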

This gives a useful rule:

```text
Primal SSA values are single assignment.
Adjoint values are accumulation variables.
```

Some compiler IRs represent adjoints in SSA too, using explicit reduction or phi-like accumulation. Conceptually, however, reverse mode computes sums of cotangent contributions.

### SSA as a Computational Graph

A straight-line SSA program is equivalent to a directed acyclic graph.

```text
x0 = input
x1 = mul x0 x0
x2 = sin x1
x3 = add x2 x0
return x3
```

The graph is:

```text
x0 -> x1 -> x2 -> x3
x0 -----------> x3
```

Forward mode walks this graph from inputs to outputs. Reverse mode walks it from outputs to inputs.

This graph view also explains why SSA is friendly to optimization. If a value has no path to a returned output, dead code elimination can remove it. If two expressions compute the same value under the IR semantics, common subexpression elimination can merge them. If a value is needed by the reverse pass, liveness analysis can decide where it must be saved.
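Dead code elimination, for instance, reduces to reachability from the returned value. A small sketch over the program above, with the def-use information held as a plain dictionary (the representation is illustrative):

```python
def live_values(defs, ret):
    """defs maps each SSA name to the names it uses; ret is the
    returned value. Everything unreachable from ret is dead."""
    live, stack = set(), [ret]
    while stack:
        v = stack.pop()
        if v not in live:
            live.add(v)
            stack.extend(defs.get(v, []))
    return live

defs = {"x1": ["x0"], "x2": ["x1"], "x3": ["x2", "x0"],
        "unused": ["x1"]}   # has no path to the output
assert live_values(defs, "x3") == {"x0", "x1", "x2", "x3"}
```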

### Basic Blocks

Real programs contain branches and loops, so SSA is organized into basic blocks.

A basic block is a sequence of instructions with one entry point and one exit instruction.

```text
entry:
    x0 = input
    c0 = gt x0 0
    branch c0, positive, negative

positive:
    x1 = mul x0 x0
    jump exit(x1)

negative:
    x2 = neg x0
    jump exit(x2)

exit(x3):
    return x3
```

The `exit` block receives a value from either branch. SSA needs a way to express that `x3` may come from `x1` or `x2`.

Traditional SSA uses phi nodes:

```text
x3 = phi(x1 from positive, x2 from negative)
```

Block-argument SSA writes the same idea as parameters to the destination block:

```text
jump exit(x1)
jump exit(x2)

exit(x3):
    return x3
```

Both forms make control-flow joins explicit.

### Phi Nodes and Derivatives

A phi node selects the value produced by the executed predecessor block.

```text
x3 = phi(x1, x2)
```

In forward mode, the tangent follows the same selection:

```text
dx3 = phi(dx1, dx2)
```

In reverse mode, the adjoint must flow back to the value that was selected.

If execution came from `positive`, then:

```text
x1_bar += x3_bar
```

If execution came from `negative`, then:

```text
x2_bar += x3_bar
```

The reverse pass must respect the actual control-flow path. It must not send adjoints into branches that did not execute.

This is one reason branch history matters in reverse mode. The backward pass needs enough information to reconstruct the forward path through the control-flow graph.
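The idea can be sketched concretely for the branch example above (`x*x` for positive inputs, `-x` otherwise): the forward pass records which branch executed, and the reverse pass routes the adjoint only into that branch. This is a hand-written illustration, not a generated transform:

```python
def f_with_grad(x0):
    took_positive = x0 > 0        # branch history, saved for the reverse pass
    x3 = x0 * x0 if took_positive else -x0
    # Reverse pass: the adjoint of the phi flows into the taken branch only.
    x3_bar = 1.0
    if took_positive:
        x0_bar = 2 * x0 * x3_bar  # through x1 = mul x0 x0
    else:
        x0_bar = -x3_bar          # through x2 = neg x0
    return x3, x0_bar

assert f_with_grad(3.0) == (9.0, 6.0)
assert f_with_grad(-2.0) == (2.0, -1.0)
```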

### Loops in SSA

Loops also use phi nodes or block arguments.

A simple loop:

```python
def f(x, n):
    y = x
    for i in range(n):
        y = y * y
    return y
```

may lower to:

```text
entry:
    y0 = x
    i0 = 0
    jump loop(i0, y0)

loop(i, y):
    c = lt i n
    branch c, body, exit

body:
    y_next = mul y y
    i_next = add i 1
    jump loop(i_next, y_next)

exit:
    return y
```

The loop block arguments `i` and `y` represent values that change each iteration while preserving SSA. Each iteration receives new logical values.

Forward mode adds tangent block arguments:

```text
loop(i, y, dy):
    c = lt i n
    branch c, body, exit

body:
    y_next = mul y y
    dy_next = add (mul dy y), (mul y dy)
    i_next = add i 1
    jump loop(i_next, y_next, dy_next)
```

Reverse mode needs to propagate adjoints through loop iterations in reverse order. That often requires saving the sequence of loop values or recomputing them.
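A tape makes this concrete. For the squaring loop above, `y = x ** (2 ** n)`, so the gradient has a closed form to check against. This sketch saves each iteration's value during the forward pass and replays the iterations backward:

```python
def grad_loop(x, n):
    tape = []
    y = x
    for _ in range(n):
        tape.append(y)        # save the primal value the vjp rule will need
        y = y * y
    # Reverse pass: walk the saved iteration values in reverse order.
    y_bar = 1.0
    for y_saved in reversed(tape):
        y_bar = 2 * y_saved * y_bar   # vjp of y_next = y * y
    return y, y_bar

# y = x ** (2 ** n), so dy/dx = 2**n * x ** (2**n - 1)
y, dydx = grad_loop(3.0, 2)
assert y == 3.0 ** 4 and dydx == 4 * 3.0 ** 3
```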

### Reverse Mode over SSA Blocks

Reverse transformation of SSA code has a natural structure.

For each primal block, the AD system constructs a corresponding reverse block. The forward pass records enough information to decide which reverse block to enter later. The reverse pass walks control flow backward.

For straight-line code:

```text
x1 = op1 x0
x2 = op2 x1
x3 = op3 x1 x2
return x3
```

the reverse instruction order is:

```text
reverse op3
reverse op2
reverse op1
```

For control flow, the reverse control-flow graph is related to the primal graph but not identical. Joins become branches. Branches become joins. Loop exits become reverse loop entries.

This is manageable because SSA makes block inputs, block outputs, and dependencies explicit.

### Liveness and Saved Values

Reverse mode often needs primal values during the backward pass.

For example:

```text
x1 = sin x0
```

The reverse rule may need `x0`:

```text
x0_bar += cos(x0) * x1_bar
```

If `x0` will no longer be available when the reverse pass runs, it must be saved.

SSA supports precise liveness analysis. The compiler can ask:

```text
Which primal values are needed by reverse rules?
Where are those values produced?
Where are they last used in the forward pass?
Can they be recomputed instead of saved?
```

This analysis controls memory use.

A naive reverse-mode system saves every intermediate. An SSA-based compiler can save only the values required by derivative rules, and it can combine this with checkpointing or recomputation.
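The first of these questions can be answered by a table-driven pass. The per-op requirements below are illustrative (a mul's vjp reads both operands, an add's reads none, sin's reads its input):

```python
# Which operands each op's vjp rule reads. This table is an assumption
# for illustration, not a real system's rule set.
NEEDS = {"mul": lambda args: set(args),   # d(a*b) uses both operands
         "add": lambda args: set(),       # d(a+b) uses no primal values
         "sin": lambda args: set(args)}   # d(sin a) uses a

def values_to_save(instrs):
    """instrs: list of (result, op, args). Returns the primal values
    that some reverse rule will read."""
    needed = set()
    for _, op, args in instrs:
        needed |= NEEDS[op](args)
    return needed

prog = [("x1", "mul", ["x0", "x0"]),
        ("x2", "sin", ["x1"]),
        ("x3", "add", ["x2", "x0"])]
assert values_to_save(prog) == {"x0", "x1"}   # x2 need not be saved
```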

### SSA and Mutation

SSA does not eliminate memory mutation by itself. It eliminates reassignment of names.

A source update such as:

```text
a[i] = v
```

can be represented in different ways.

A functional SSA representation creates a new array value:

```text
a1 = scatter a0 i v
```

This is clean for AD because the old array and new array are distinct values.
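A pure-Python stand-in for such a functional `scatter` makes the point visible: the old array value survives the update, so a reverse rule can still read it.

```python
def scatter(a, i, v):
    """Return a new array equal to a with index i replaced by v."""
    out = list(a)   # copy: the original value a is untouched
    out[i] = v
    return out

a0 = [1.0, 2.0, 3.0]
a1 = scatter(a0, 1, 9.0)
assert a0 == [1.0, 2.0, 3.0]    # a0 is still a distinct SSA value
assert a1 == [1.0, 9.0, 3.0]
```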

A lower-level memory SSA representation may use explicit memory states:

```text
m1 = store m0, a, i, v
```

Here memory itself is threaded through SSA values. A load depends on the memory state:

```text
x = load m1, a, i
```

This makes mutation analyzable. The compiler can see which store reaches which load.

For reverse mode, this matters because an in-place update may overwrite a value needed later by the backward pass. SSA-style memory representation makes the overwrite explicit.

### SSA and Activity Analysis

Activity analysis classifies which values participate in differentiation.

A value is active when it depends on differentiable inputs and influences differentiable outputs.

SSA makes this a graph reachability problem.

From the inputs, mark values that depend on active inputs. From the outputs, mark values that influence active outputs. The intersection gives values that need derivative tracking.

Example:

```text
x0 = input active
n0 = input inactive
x1 = mul x0 x0
i1 = add n0 1
x2 = sin x1
return x2
```

Here `x1` and `x2` are active, while `n0` and `i1` are inactive. The derivative transformation can skip tangents and adjoints for inactive values such as integer loop counters, sizes, strings, handles, and metadata.
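As a reachability problem, this is two graph traversals and an intersection. A sketch over the dependency edges of the example (the dictionary encoding is illustrative):

```python
def active_values(deps, inputs, outputs):
    """deps maps each value to the values it depends on.
    Active = (reachable forward from active inputs)
             and (reachable backward from active outputs)."""
    def reach(starts, edges):
        seen, stack = set(), list(starts)
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(edges.get(v, []))
        return seen
    uses = {}                            # reverse edges: value -> users
    for v, args in deps.items():
        for a in args:
            uses.setdefault(a, []).append(v)
    varied = reach(inputs, uses)         # depends on an active input
    useful = reach(outputs, deps)        # influences an active output
    return varied & useful

deps = {"x1": ["x0"], "i1": ["n0"], "x2": ["x1"]}
assert active_values(deps, {"x0"}, {"x2"}) == {"x0", "x1", "x2"}
# n0 and i1 are excluded: no path from x0 reaches them.
```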

### SSA and Optimization After AD

Derivative generation often creates redundant code. SSA gives compiler passes a standard representation for cleaning it up.

For example:

```text
dx1 = add (mul dx0 x0), (mul x0 dx0)
```

can be simplified to:

```text
dx1 = mul 2, mul x0 dx0
```

If `dx0` is known to be `1`, this may become:

```text
dx1 = mul 2, x0
```

SSA also enables dead code elimination:

```text
x1 = expensive(x0)
dx1 = derivative_expensive(x0, dx0)
return x1
```

If only the primal result is needed, the derivative value can be removed. If only a gradient is needed and the primal output is unused after differentiation, some primal values may be kept only when required by reverse rules.

### A Minimal SSA AD Algorithm

A forward-mode transformation for straight-line SSA code is direct:

```text
for instruction v = op(args):
    emit primal instruction v = op(args)
    emit tangent instruction dv = jvp(op, args, dargs)
```

A reverse-mode transformation has two phases:

```text
forward:
    emit primal instructions
    save primal values needed by reverse rules

reverse:
    initialize output adjoints
    for primal instructions in reverse order:
        emit vjp rule
        accumulate adjoints into input adjoints
```
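The two-phase scheme can be instantiated as a tiny interpreter over straight-line instructions. This sketch supports only `mul` and `sin`, with vjp rules returned as lists of `(argument, contribution)` pairs so that a repeated operand, as in `mul x0 x0`, accumulates twice:

```python
import math

# vjp rules: op -> function(args, out_bar, env) -> [(arg, contribution)]
VJP = {
    "mul": lambda a, g, env: [(a[0], env[a[1]] * g), (a[1], env[a[0]] * g)],
    "sin": lambda a, g, env: [(a[0], math.cos(env[a[0]]) * g)],
}

def grad(instrs, inputs, out):
    # Forward phase: evaluate primal instructions, keeping all values.
    env = dict(inputs)
    for v, op, args in instrs:
        env[v] = (env[args[0]] * env[args[1]] if op == "mul"
                  else math.sin(env[args[0]]))
    # Reverse phase: seed the output adjoint, then apply vjp rules
    # in reverse order, accumulating into input adjoints.
    bar = {v: 0.0 for v in env}
    bar[out] = 1.0
    for v, op, args in reversed(instrs):
        for a, contrib in VJP[op](args, bar[v], env):
            bar[a] += contrib
    return env[out], bar

prog = [("x1", "mul", ["x0", "x0"]), ("x2", "sin", ["x1"])]
val, bar = grad(prog, {"x0": 1.5}, "x2")
# d/dx sin(x*x) = 2x * cos(x*x)
assert abs(bar["x0"] - 2 * 1.5 * math.cos(1.5 ** 2)) < 1e-12
```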

For basic blocks, the same idea applies with explicit handling of branches, phi nodes, and loops.

The hard parts are not the local derivative rules. The hard parts are saving values, reversing control flow, handling mutation, and producing efficient code.

### Why SSA Became Central

SSA is central in modern compilers because it provides a clean framework for analysis and optimization. AD needs the same qualities.

It needs explicit dependencies. It needs dominance and liveness information. It needs control-flow joins. It needs clean treatment of values whose definitions are unique. It needs a way to represent mutation without hiding it behind variable names.

For these reasons, many compiler-based AD systems either use SSA directly or use an IR with SSA-like properties.

### Summary

SSA form is a natural representation for automatic differentiation.

Forward mode extends each SSA value with a tangent. Reverse mode extends each SSA value with an adjoint and traverses the dependency graph backward. Phi nodes and block arguments make branches and loops explicit. Liveness and activity analysis decide which values need derivative state and which primal values must be saved.

SSA does not solve every problem in AD, but it gives the compiler the right shape of program: explicit values, explicit dependencies, and explicit control flow.

