Static single assignment form, or SSA, is an intermediate representation where each variable is assigned exactly once.
A source program may reuse the same variable name many times:
def f(x):
    x = x * x
    x = sin(x)
    return x

In SSA, each assignment receives a fresh name:
x0 = input
x1 = x0 * x0
x2 = sin(x1)
return x2

This small change has a large effect. It turns mutation of names into an explicit dataflow graph. For automatic differentiation, that graph is exactly what the transformation needs.
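The renaming pass itself is small. Here is a minimal Python sketch, assuming a toy instruction format of (dest, op, args) tuples; the name to_ssa and the tuple shape are illustrative, not any particular compiler's API.

```python
def to_ssa(instructions, inputs):
    """Rename straight-line code so each assignment gets a fresh name."""
    version = {}  # source name -> current SSA name
    counter = {}  # source name -> next version number
    for name in inputs:
        version[name] = f"{name}0"  # x becomes x0 on entry
        counter[name] = 1
    out = []
    for dest, op, args in instructions:
        # Uses refer to the latest version of each source name.
        new_args = tuple(version.get(a, a) for a in args)
        n = counter.get(dest, 0)
        counter[dest] = n + 1
        version[dest] = f"{dest}{n}"
        out.append((version[dest], op, new_args))
    return out

prog = [("x", "mul", ("x", "x")),  # x = x * x
        ("x", "sin", ("x",))]      # x = sin(x)
print(to_ssa(prog, ["x"]))
# [('x1', 'mul', ('x0', 'x0')), ('x2', 'sin', ('x1',))]
```

Each reassignment of x becomes a distinct value, matching the x0, x1, x2 numbering above.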
Why SSA Helps AD
Automatic differentiation follows dependencies. To compute the derivative of an output with respect to an input, the system must know which intermediate values depend on which earlier values.
SSA makes these dependencies local and explicit.
x1 = mul x0 x0
x2 = sin x1

The line defining x2 depends on x1. The line defining x1 depends on x0. There is no ambiguity about which version of x is being used, because each version has a unique name.
This is useful for both forward and reverse mode.
Forward mode attaches a tangent to each SSA value:
x0, dx0 = input
x1 = mul x0 x0
dx1 = add (mul dx0 x0), (mul x0 dx0)
x2 = sin x1
dx2 = mul (cos x1), dx1
return x2, dx2

Reverse mode attaches an adjoint to each SSA value:
x2_bar = seed
x1_bar += cos(x1) * x2_bar
x0_bar += x0 * x1_bar
x0_bar += x0 * x1_bar
return x0_bar

The reverse program is the original dependency graph traversed backward. Note that x0_bar is updated twice because x0 appears twice as an operand of the multiply.
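Written out as runnable Python for this program, f(x) = sin(x * x), the reverse pass looks like the following; the name f_vjp is a hypothetical illustration, and the two x0_bar updates correspond to the two uses of x0.

```python
import math

def f_vjp(x0):
    # forward pass, saving the values the reverse rules will need
    x1 = x0 * x0
    x2 = math.sin(x1)
    # reverse pass
    x2_bar = 1.0                    # seed
    x1_bar = math.cos(x1) * x2_bar  # reverse of x2 = sin x1
    x0_bar = 0.0
    x0_bar += x0 * x1_bar           # first use of x0 in x1 = x0 * x0
    x0_bar += x0 * x1_bar           # second use of x0
    return x2, x0_bar

y, g = f_vjp(2.0)
# Analytic check: d/dx sin(x^2) = 2x cos(x^2).
assert abs(g - 2 * 2.0 * math.cos(4.0)) < 1e-12
```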
SSA Values and Adjoint Accumulation
In reverse mode, each SSA value may contribute to several later values. Its adjoint is the sum of all downstream contributions.
Consider:
x0 = input
x1 = x0 * x0
x2 = x1 + x0
return x2

The input x0 reaches the output through two paths:
x0 -> x1 -> x2
x0 -> x2

The reverse pass accumulates both contributions:
x2_bar = 1
x1_bar += x2_bar
x0_bar += x2_bar
x0_bar += x0 * x1_bar
x0_bar += x0 * x1_bar

The += form is important. Even though each primal SSA value is defined once, its adjoint may be updated many times.
This gives a useful rule:
Primal SSA values are single assignment.
Adjoint values are accumulation variables.

Some compiler IRs represent adjoints in SSA too, using explicit reduction or phi-like accumulation. Conceptually, however, reverse mode computes sums of cotangent contributions.
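A runnable sketch of the accumulation rule for the two-path example above, f(x) = x*x + x, whose derivative is 2x + 1 (the name g_vjp is illustrative):

```python
def g_vjp(x0):
    # forward pass
    x1 = x0 * x0
    x2 = x1 + x0
    # reverse pass: adjoints start at zero and accumulate
    x2_bar = 1.0
    x1_bar = 0.0
    x0_bar = 0.0
    x1_bar += x2_bar        # reverse of x2 = x1 + x0, first operand
    x0_bar += x2_bar        # second operand: the direct path x0 -> x2
    x0_bar += x0 * x1_bar   # reverse of x1 = x0 * x0, first use of x0
    x0_bar += x0 * x1_bar   # second use of x0
    return x2, x0_bar

y, g = g_vjp(3.0)
assert (y, g) == (12.0, 7.0)  # f(3) = 12, f'(3) = 2*3 + 1 = 7
```

Dropping any one of the three x0_bar updates would give a wrong gradient, which is why adjoints must accumulate rather than overwrite.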
SSA as a Computational Graph
A straight-line SSA program is equivalent to a directed acyclic graph.
x0 = input
x1 = mul x0 x0
x2 = sin x1
x3 = add x2 x0
return x3

The graph is:
x0 -> x1 -> x2 -> x3
x0 -----------> x3Forward mode walks this graph from inputs to outputs. Reverse mode walks it from outputs to inputs.
This graph view also explains why SSA is friendly to optimization. If a value has no path to a returned output, dead code elimination can remove it. If two expressions compute the same value under the IR semantics, common subexpression elimination can merge them. If a value is needed by the reverse pass, liveness analysis can decide where it must be saved.
Basic Blocks
Real programs contain branches and loops, so SSA is organized into basic blocks.
A basic block is a sequence of instructions with one entry point and one exit instruction.
entry:
x0 = input
c0 = gt x0 0
branch c0, positive, negative
positive:
x1 = mul x0 x0
jump exit(x1)
negative:
x2 = neg x0
jump exit(x2)
exit(x3):
return x3

The exit block receives a value from either branch. SSA needs a way to express that x3 may come from x1 or x2.
Traditional SSA uses phi nodes:
x3 = phi(x1 from positive, x2 from negative)

Block-argument SSA writes the same idea as parameters to the destination block:
jump exit(x1)
jump exit(x2)
exit(x3):
return x3

Both forms make control-flow joins explicit.
Phi Nodes and Derivatives
A phi node selects the value produced by the executed predecessor block.
x3 = phi(x1, x2)

In forward mode, the tangent follows the same selection:
dx3 = phi(dx1, dx2)

In reverse mode, the adjoint must flow back to the value that was selected.
If execution came from positive, then:
x1_bar += x3_bar

If execution came from negative, then:
x2_bar += x3_bar

The reverse pass must respect the actual control-flow path. It must not send adjoints into branches that did not execute.
This is one reason branch history matters in reverse mode. The backward pass needs enough information to reconstruct the forward path through the control-flow graph.
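A runnable sketch of this idea for the branch example above: the forward pass records which predecessor executed, and the reverse pass routes the adjoint of x3 back along only that edge. The recorded flag took_positive is an illustrative stand-in for real branch-history recording.

```python
def f_vjp(x0):
    # forward pass: record which branch executed
    took_positive = x0 > 0
    if took_positive:
        x3 = x0 * x0   # positive: x1 = mul x0 x0
    else:
        x3 = -x0       # negative: x2 = neg x0
    # reverse pass: adjoint flows only into the branch that ran
    x3_bar = 1.0
    x0_bar = 0.0
    if took_positive:
        x0_bar += 2 * x0 * x3_bar   # reverse of x1 = mul x0 x0
    else:
        x0_bar += -x3_bar           # reverse of x2 = neg x0
    return x3, x0_bar

assert f_vjp(3.0) == (9.0, 6.0)    # positive branch: d/dx x^2 = 2x
assert f_vjp(-2.0) == (2.0, -1.0)  # negative branch: d/dx -x = -1
```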
Loops in SSA
Loops also use phi nodes or block arguments.
A simple loop:
def f(x, n):
    y = x
    for i in range(n):
        y = y * y
    return y

may lower to:
entry:
y0 = x
i0 = 0
jump loop(i0, y0)
loop(i, y):
c = lt i n
branch c, body, exit
body:
y_next = mul y y
i_next = add i 1
jump loop(i_next, y_next)
exit:
return y

The loop block arguments i and y represent values that change each iteration while preserving SSA. Each iteration receives new logical values.
Forward mode adds tangent block arguments:
loop(i, y, dy):
c = lt i n
branch c, body, exit
body:
y_next = mul y y
dy_next = add (mul dy y), (mul y dy)
i_next = add i 1
jump loop(i_next, y_next, dy_next)

Reverse mode needs to propagate adjoints through loop iterations in reverse order. That often requires saving the sequence of loop values or recomputing them.
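A tape-based sketch of this for the squaring loop: the forward pass saves the value of y entering each iteration, and the reverse pass walks the tape backward. Here f(x, n) computes x**(2**n), so the gradient should be 2**n * x**(2**n - 1).

```python
def f_grad(x, n):
    # forward pass: record the value of y entering each iteration
    tape = []
    y = x
    for _ in range(n):
        tape.append(y)
        y = y * y
    # reverse pass: walk the iterations in reverse order
    y_bar = 1.0
    for y_in in reversed(tape):
        # reverse of y_next = mul y_in y_in: two contributions
        y_bar = 2 * y_in * y_bar
    return y, y_bar

y, g = f_grad(2.0, 3)             # f(x) = x^8
assert (y, g) == (256.0, 1024.0)  # 2^8 = 256, 8 * 2^7 = 1024
```

Recomputation-based schemes trade this tape for extra forward work, but the reverse iteration order is the same.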
Reverse Mode over SSA Blocks
Reverse transformation of SSA code has a natural structure.
For each primal block, the AD system constructs a corresponding reverse block. The forward pass records enough information to decide which reverse block to enter later. The reverse pass walks control flow backward.
For straight-line code:
x1 = op1 x0
x2 = op2 x1
x3 = op3 x1 x2
return x3

the reverse instruction order is:
reverse op3
reverse op2
reverse op1

For control flow, the reverse control-flow graph is related to the primal graph but not identical. Joins become branches. Branches become joins. Loop exits become reverse loop entries.
This is manageable because SSA makes block inputs, block outputs, and dependencies explicit.
Liveness and Saved Values
Reverse mode often needs primal values during the backward pass.
For example:
x1 = sin x0

The reverse rule may need x0:
x0_bar += cos(x0) * x1_bar

If x0 will no longer be available when the reverse pass runs, it must be saved.
SSA supports precise liveness analysis. The compiler can ask:
Which primal values are needed by reverse rules?
Where are those values produced?
Where are they last used in the forward pass?
Can they be recomputed instead of saved?

This analysis controls memory use.
A naive reverse-mode system saves every intermediate. An SSA-based compiler can save only the values required by derivative rules, and it can combine this with checkpointing or recomputation.
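The first of those questions can be sketched as a table-driven pass over straight-line SSA code. The NEEDS table records which operand positions each op's reverse rule reads; the entries shown are illustrative assumptions, not a fixed standard.

```python
# Which operand positions does each op's reverse rule read?
NEEDS = {
    "sin": (0,),    # x_bar += cos(x) * y_bar needs x
    "mul": (0, 1),  # each operand's adjoint uses the other operand
    "add": (),      # adjoints pass through unchanged
}

def values_to_save(instructions):
    """Collect the primal SSA values that reverse rules will read."""
    saved = set()
    for dest, op, args in instructions:
        for i in NEEDS[op]:
            saved.add(args[i])
    return saved

prog = [("x1", "mul", ("x0", "x0")),
        ("x2", "sin", ("x1",)),
        ("x3", "add", ("x2", "x0"))]
assert values_to_save(prog) == {"x0", "x1"}  # x2 is never needed
```

A naive system would save x2 as well; the table-driven pass shows it can be dropped.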
SSA and Mutation
SSA does not eliminate memory mutation by itself. It eliminates reassignment of names.
A source update such as:
a[i] = v

can be represented in different ways.
A functional SSA representation creates a new array value:
a1 = scatter a0 i v

This is clean for AD because the old array and new array are distinct values.
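A minimal Python model of why the functional form is clean: scatter builds a new value, so a0 remains intact for the reverse pass. The helper here is a sketch, not a real IR operation.

```python
def scatter(a, i, v):
    """Functional update: return a new array with position i set to v."""
    out = list(a)  # copy rather than mutate
    out[i] = v
    return out

a0 = [1.0, 2.0, 3.0]
a1 = scatter(a0, 1, 5.0)
assert a0 == [1.0, 2.0, 3.0]  # the old value is still available
assert a1 == [1.0, 5.0, 3.0]
```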
A lower-level memory SSA representation may use explicit memory states:
m1 = store m0, a, i, v

Here memory itself is threaded through SSA values. A load depends on the memory state:
x = load m1, a, i

This makes mutation analyzable. The compiler can see which store reaches which load.
For reverse mode, this matters because an in-place update may overwrite a value needed later by the backward pass. SSA-style memory representation makes the overwrite explicit.
SSA and Activity Analysis
Activity analysis classifies which values participate in differentiation.
A value is active when it depends on differentiable inputs and influences differentiable outputs.
SSA makes this a graph reachability problem.
From the inputs, mark values that depend on active inputs. From the outputs, mark values that influence active outputs. The intersection gives values that need derivative tracking.
Example:
x0 = input active
n0 = input inactive
x1 = mul x0 x0
i1 = add n0 1
x2 = sin x1
return x2

Here x1 and x2 are active. n0 and i1 are inactive. The derivative transformation can ignore tangents and adjoints for inactive values such as integer loop counters, sizes, strings, handles, and metadata.
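The two sweeps can be sketched directly over the toy instruction tuples; active_values is an illustrative helper, assuming straight-line code so that each sweep is a single pass.

```python
def active_values(instructions, active_inputs, outputs):
    # forward sweep: values that depend on an active input
    varies = set(active_inputs)
    for dest, op, args in instructions:
        if any(a in varies for a in args):
            varies.add(dest)
    # backward sweep: values that influence an output
    useful = set(outputs)
    for dest, op, args in reversed(instructions):
        if dest in useful:
            useful.update(args)
    # active = depends on an active input AND influences an output
    return varies & useful

prog = [("x1", "mul", ("x0", "x0")),
        ("i1", "add", ("n0", "1")),
        ("x2", "sin", ("x1",))]
assert active_values(prog, {"x0"}, {"x2"}) == {"x0", "x1", "x2"}
```

The inactive counter i1 drops out in the forward sweep, so no tangent or adjoint is ever allocated for it.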
SSA and Optimization After AD
Derivative generation often creates redundant code. SSA gives compiler passes a standard representation for cleaning it up.
For example:
dx1 = add (mul dx0 x0), (mul x0 dx0)

can be simplified to:
dx1 = mul 2, mul x0 dx0

If dx0 is known to be 1, this may become:
dx1 = mul 2, x0

SSA also enables dead code elimination:
x1 = expensive(x0)
dx1 = derivative_expensive(x0, dx0)
return x1

If only the primal result is needed, the derivative value can be removed. If only a gradient is needed and the primal output is unused after differentiation, some primal values may be kept only when required by reverse rules.
A Minimal SSA AD Algorithm
A forward-mode transformation for straight-line SSA code is direct:
for each instruction v = op(args):
    emit primal instruction v = op(args)
    emit tangent instruction dv = jvp(op, args, dargs)

A reverse-mode transformation has two phases:
forward:
    emit primal instructions
    save primal values needed by reverse rules

reverse:
    initialize output adjoints
    for primal instructions in reverse order:
        emit vjp rule
        accumulate adjoints into input adjoints

For basic blocks, the same idea applies with explicit handling of branches, phi nodes, and loops.
The hard parts are not the local derivative rules. The hard parts are saving values, reversing control flow, handling mutation, and producing efficient code.
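The forward-mode loop above is direct enough to write as a small interpreter over the toy instruction tuples. The JVP table and run_jvp are illustrative sketches; a real system would emit instructions rather than interpret them.

```python
import math

# Per-op primal and JVP rules (illustrative subset).
JVP = {
    "mul": (lambda a, b: a * b,
            lambda a, b, da, db: da * b + a * db),
    "sin": (lambda a: math.sin(a),
            lambda a, da: math.cos(a) * da),
}

def run_jvp(instructions, inputs, tangents):
    env = dict(inputs)     # SSA name -> primal value
    denv = dict(tangents)  # SSA name -> tangent value
    for dest, op, args in instructions:
        primal, jvp = JVP[op]
        vals = [env[a] for a in args]
        tans = [denv[a] for a in args]
        env[dest] = primal(*vals)       # primal instruction
        denv[dest] = jvp(*vals, *tans)  # paired tangent instruction
    return env, denv

prog = [("x1", "mul", ("x0", "x0")),
        ("x2", "sin", ("x1",))]
env, denv = run_jvp(prog, {"x0": 2.0}, {"x0": 1.0})
# Analytic check: d/dx sin(x^2) = 2x cos(x^2).
assert abs(denv["x2"] - 2 * 2.0 * math.cos(4.0)) < 1e-12
```

Because every instruction defines a fresh name, the interpreter never has to reason about which version of a variable an argument refers to.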
Why SSA Became Central
SSA is central in modern compilers because it provides a clean framework for analysis and optimization. AD needs the same qualities.
It needs explicit dependencies. It needs dominance and liveness information. It needs control-flow joins. It needs clean treatment of values whose definitions are unique. It needs a way to represent mutation without hiding it behind variable names.
For these reasons, many compiler-based AD systems either use SSA directly or use an IR with SSA-like properties.
Summary
SSA form is a natural representation for automatic differentiation.
Forward mode extends each SSA value with a tangent. Reverse mode extends each SSA value with an adjoint and traverses the dependency graph backward. Phi nodes and block arguments make branches and loops explicit. Liveness and activity analysis decide which values need derivative state and which primal values must be saved.
SSA does not solve every problem in AD, but it gives the compiler the right shape of program: explicit values, explicit dependencies, and explicit control flow.