Loops express repeated computation. Recurrence relations express the same idea mathematically: each state is computed from one or more earlier states.
A loop such as

```python
h = h0
for t in range(n):
    h = tanh(w * h + b)
y = h
```

corresponds to the recurrence

$$h_{t+1} = \tanh(w\,h_t + b), \qquad y = h_n.$$
Automatic differentiation treats this as a sequence of dependent operations. For a fixed number of iterations, the loop can be expanded into a straight-line program.
## Unrolling a Loop

For $n = 3$, the loop becomes:
```python
h0 = input
a0 = w * h0 + b
h1 = tanh(a0)
a1 = w * h1 + b
h2 = tanh(a1)
a2 = w * h2 + b
h3 = tanh(a2)
y = h3
```

The dependency chain is:

```
h0 -> a0 -> h1 -> a1 -> h2 -> a2 -> h3 -> y
```

Forward mode moves left to right. Reverse mode moves right to left.
## Forward Differentiation Through a Recurrence

Let

$$h_{t+1} = f(h_t, w),$$

where $w$ is a parameter. Forward mode propagates both the state and its tangent.

If we differentiate with respect to $w$, define

$$\dot h_t = \frac{\partial h_t}{\partial w}.$$

Then

$$\dot h_{t+1} = \frac{\partial f}{\partial h_t}\,\dot h_t + \frac{\partial f}{\partial w}.$$

For

$$h_{t+1} = \tanh(w\,h_t + b)$$

with respect to $w$:

$$\dot h_{t+1} = \bigl(1 - \tanh^2(w\,h_t + b)\bigr)\,\bigl(h_t + w\,\dot h_t\bigr).$$
Forward mode computes this during the same pass as the original loop.
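A minimal sketch of this pattern (the function name `loop_with_tangent` and the plain-float scalars are illustrative, not a library API):

```python
import math

def loop_with_tangent(h0, w, b, n):
    # Forward-mode sketch for h_{t+1} = tanh(w*h_t + b),
    # differentiated with respect to w.  h0 is assumed
    # independent of w, so its tangent starts at zero.
    h, hdot = h0, 0.0
    for _ in range(n):
        a = w * h + b
        s = 1.0 - math.tanh(a) ** 2              # tanh'(a)
        h, hdot = math.tanh(a), s * (h + w * hdot)
    return h, hdot                               # y and dy/dw
```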
## Reverse Differentiation Through a Recurrence
Reverse mode starts from the final output and propagates adjoints backward through time.
For

$$h_{t+1} = f(h_t, w),$$

the adjoint of $h_t$ receives

$$\bar h_t = \frac{\partial f}{\partial h_t}\,\bar h_{t+1}.$$

The parameter adjoint receives

$$\bar w \mathrel{+}= \frac{\partial f}{\partial w}\,\bar h_{t+1},$$

accumulated across all steps.

For

$$h_{t+1} = \tanh(w\,h_t + b),$$

let

$$a_t = w\,h_t + b.$$

Then

$$\begin{aligned}
\bar h_t &= \bar h_{t+1}\,\bigl(1 - \tanh^2(a_t)\bigr)\,w,\\
\bar w &\mathrel{+}= \bar h_{t+1}\,\bigl(1 - \tanh^2(a_t)\bigr)\,h_t,\\
\bar b &\mathrel{+}= \bar h_{t+1}\,\bigl(1 - \tanh^2(a_t)\bigr).
\end{aligned}$$

These rules are applied from $t = n - 1$ down to $t = 0$.
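A matching sketch of the reverse pass, under the same illustrative assumptions:

```python
import math

def loop_reverse(h0, w, b, n):
    # Forward sweep: record every state h_0 .. h_n.
    hs = [h0]
    for _ in range(n):
        hs.append(math.tanh(w * hs[-1] + b))
    # Backward sweep: adjoint rules from t = n-1 down to t = 0.
    hbar, wbar, bbar = 1.0, 0.0, 0.0         # seed: dy/dh_n = 1
    for t in range(n - 1, -1, -1):
        s = 1.0 - hs[t + 1] ** 2             # tanh'(a_t) = 1 - tanh(a_t)^2
        wbar += hbar * s * hs[t]
        bbar += hbar * s
        hbar *= s * w
    return hs[-1], wbar, bbar, hbar          # y, dy/dw, dy/db, dy/dh0
```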
## Backpropagation Through Time
Backpropagation through time is reverse-mode AD applied to an unrolled recurrence.
The word “time” does not require physical time. It refers to an ordered sequence of repeated computation steps.
Examples include:
| Program structure | Recurrence view |
|---|---|
| Recurrent neural network | Hidden state update |
| Iterative solver | Approximation update |
| Simulation | State transition |
| Optimization loop | Parameter update |
| Dynamic program | Table update |
| Markov process | State evolution |
The same reverse-mode rule applies: store or reconstruct the states, then propagate adjoints backward through the unrolled computation.
## Loop-Carried State
A loop-carried variable is a value that passes from one iteration to the next.
```python
s = 0
for i in range(n):
    s = s + x[i] * x[i]
y = s
```

In single-assignment form:
```python
s0 = 0
p0 = x0 * x0
s1 = s0 + p0
p1 = x1 * x1
s2 = s1 + p1
p2 = x2 * x2
s3 = s2 + p2
```

The adjoint of each state flows backward:

```
s3 -> s2 -> s1 -> s0
```

Each input receives its contribution:

$$\bar x_i = \frac{\partial y}{\partial x_i}\,\bar y.$$

For this sum, $\partial y / \partial x_i = 2 x_i$, so

$$\bar x_i = 2\,x_i\,\bar y.$$
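A sketch of this backward flow (the helper name is hypothetical):

```python
def sum_of_squares_grad(x):
    # Forward: the loop-carried sum.
    s = 0.0
    for xi in x:
        s += xi * xi
    # Backward: addition passes the adjoint through unchanged, so every
    # step sees the same sbar, and each x[i] receives 2*x[i]*sbar.
    sbar = 1.0                               # seed: dy/ds_n = 1
    xbar = [2.0 * xi * sbar for xi in x]
    return s, xbar
```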
## Multiple State Variables
Loops often carry several variables.
```python
a = a0
b = b0
for t in range(n):
    new_a = a + b
    new_b = a * b
    a = new_a
    b = new_b
y = a
```

The recurrence is

$$a_{t+1} = a_t + b_t, \qquad b_{t+1} = a_t\,b_t.$$

The Jacobian of one step is

$$J_t = \begin{pmatrix} \dfrac{\partial a_{t+1}}{\partial a_t} & \dfrac{\partial a_{t+1}}{\partial b_t} \\[4pt] \dfrac{\partial b_{t+1}}{\partial a_t} & \dfrac{\partial b_{t+1}}{\partial b_t} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ b_t & a_t \end{pmatrix}.$$
Forward mode multiplies tangents by this Jacobian at each step. Reverse mode multiplies adjoints by its transpose.
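One step of each mode, written against this Jacobian (both function names are illustrative):

```python
def step_jvp(a, b, adot, bdot):
    # Tangent of (a, b) -> (a + b, a * b): multiply by J_t.
    return adot + bdot, b * adot + a * bdot

def step_vjp(a, b, abar_new, bbar_new):
    # Adjoint of the same step: multiply by the transpose of J_t.
    return abar_new + b * bbar_new, abar_new + a * bbar_new
```

Composing `step_vjp` from the last iteration back to the first reproduces reverse mode for the whole loop.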
## Loops With Arrays
Array loops introduce indexed dependencies.
```python
for i in range(n):
    y[i] = x[i] * x[i]
```

Each output depends only on the corresponding input:

$$\frac{\partial y_i}{\partial x_j} = 2\,x_i\,\delta_{ij}.$$

The Jacobian is diagonal.
By contrast:
```python
s = 0
for i in range(n):
    s = s + x[i]
    y[i] = s
```

defines a prefix sum:

$$y_i = \sum_{j=0}^{i} x_j.$$
Here, each output depends on many earlier inputs. The Jacobian is lower triangular.
AD does not require the programmer to build this Jacobian explicitly. It propagates through the loop structure.
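For the prefix sum, the lower-triangular Jacobian turns the reverse pass into a suffix sum of adjoints; a sketch with a hypothetical helper:

```python
def prefix_sum_vjp(x, ybar):
    # x[j] feeds every y[i] with i >= j, so its adjoint is the
    # suffix sum of the output adjoints: xbar[j] = sum(ybar[j:]).
    xbar = [0.0] * len(x)
    acc = 0.0
    for j in reversed(range(len(x))):
        acc += ybar[j]
        xbar[j] = acc
    return xbar
```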
## Dynamic Loop Counts
A loop may stop based on runtime data.
```python
while norm(r) > eps:
    x = update(x)
    r = residual(x)
```

For one execution, the loop runs some number $k$ of iterations. AD differentiates the trace of length $k$.
This derivative is valid under the assumption that small input perturbations keep the same number of iterations. Near a point where the stopping count changes, the derivative may be discontinuous or misleading.
This is common in solvers, simulations, and optimization routines.
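As a concrete sketch, the Babylonian square-root iteration stands in for a solver here; the function name, initial guess, and tolerance are all illustrative:

```python
def sqrt_with_grad(c, eps=1e-12):
    # Forward: run the data-dependent loop and record the trace.
    xs = [c]                                  # initial guess x0 = c
    while abs(xs[-1] * xs[-1] - c) > eps:
        xs.append(0.5 * (xs[-1] + c / xs[-1]))
    k = len(xs) - 1                           # iterations actually taken
    # Backward: differentiate the fixed k-step trace.  The result is
    # valid only while perturbing c does not change k.
    xbar, cbar = 1.0, 0.0
    for t in range(k - 1, -1, -1):
        x = xs[t]
        cbar += xbar * 0.5 / x                # d/dc of (x + c/x)/2
        xbar *= 0.5 * (1.0 - c / (x * x))     # d/dx of the same step
    return xs[-1], cbar + xbar                # x0 = c also contributes xbar
```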
## Memory Cost of Reverse Mode
Reverse mode usually needs primal values from each iteration.
For

$$h_{t+1} = f(h_t, w),$$

the backward step at time $t$ often needs $h_t$, $w$, and intermediate values inside $f$.

If the loop has $n$ iterations and the state size is $d$, storing all states costs

$$O(n\,d)$$

memory.
This is often the limiting cost in long sequences and simulations.
## Recompute Instead of Store
A system can reduce memory by recomputing states during the backward pass.
Instead of storing

```
h0, h1, h2, ..., hn
```

it may store only checkpoints:

```
h0, h100, h200, ..., hn
```

During backward execution, it recomputes missing states from the nearest checkpoint.
This trades extra compute for lower memory. The technique is essential for long recurrent networks, neural ODEs, differentiable physics, and iterative solvers.
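A sketch of this checkpointing scheme for the scalar tanh recurrence (the stride and helper name are illustrative):

```python
import math

def checkpointed_grad(h0, w, b, n, stride=100):
    def step(h):
        return math.tanh(w * h + b)
    # Forward sweep: keep only every `stride`-th state.
    ckpts, h = [h0], h0
    for t in range(n):
        h = step(h)
        if (t + 1) % stride == 0:
            ckpts.append(h)
    # Backward sweep: recompute each segment from its checkpoint,
    # then apply the usual adjoint rules within it.
    hbar, wbar = 1.0, 0.0                    # seed: dy/dh_n = 1
    starts = list(range(0, n, stride))
    for si in reversed(range(len(starts))):
        start = starts[si]
        end = min(start + stride, n)
        hs = [ckpts[si]]                     # ckpts[si] == h_{start}
        for _ in range(end - start):
            hs.append(step(hs[-1]))
        for t in reversed(range(end - start)):
            s = 1.0 - hs[t + 1] ** 2         # tanh'(a) via the stored output
            wbar += hbar * s * hs[t]
            hbar *= s * w
    return wbar                              # dy/dw
```

Storage drops to roughly $n/\text{stride}$ saved states plus one segment of length `stride`, at the cost of about one extra forward pass.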
## Loop Fusion and Differentiation
Compilers may transform loops before or after differentiation.
For example:
```python
for i in range(n):
    y[i] = sin(x[i])
for i in range(n):
    z[i] = y[i] + 1
```

can be fused into:

```python
for i in range(n):
    z[i] = sin(x[i]) + 1
```

Differentiation interacts with such transformations. A compiler must preserve the primal result and the derivative result.
In many cases, differentiating after optimization gives faster derivative code. In other cases, differentiating before optimization exposes more structure to the optimizer.
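As a small check that fusion leaves the math unchanged, a sketch of the fused loop and its elementwise derivative (the helper name is hypothetical):

```python
import math

def fused_with_grad(x):
    # dz[i]/dx[i] = cos(x[i]), whether the derivative is generated
    # from the fused or the unfused form of the loop.
    z = [math.sin(xi) + 1.0 for xi in x]
    dz = [math.cos(xi) for xi in x]
    return z, dz
```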
## Recurrences and Stability
Repeated multiplication by local Jacobians can amplify or shrink derivatives.
For a scalar recurrence

$$h_{t+1} = f(h_t),$$

the derivative is

$$\frac{\partial h_n}{\partial h_0} = \prod_{t=0}^{n-1} f'(h_t).$$

If the factors tend to have magnitude greater than $1$, gradients may explode. If they tend to have magnitude less than $1$, gradients may vanish.
This is a mathematical property of the recurrence, not only an implementation problem.
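A sketch that measures this product along one trajectory of the tanh recurrence (the helper name is illustrative):

```python
import math

def gradient_magnitude(h0, w, b, n):
    # Accumulate the product of local factors
    # dh_{t+1}/dh_t = w * (1 - tanh(a_t)^2) along the trajectory.
    h, prod = h0, 1.0
    for _ in range(n):
        h = math.tanh(w * h + b)
        prod *= w * (1.0 - h * h)
    return prod                  # dh_n / dh_0 for this trajectory
```

With $|w| < 1$, every factor has magnitude below $1$, so the product decays geometrically with $n$.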
## Core Idea
A loop is a compact syntax for a repeated dependency graph. For a fixed execution, AD can unroll the loop into a straight-line trace and apply the same derivative propagation rules as before.
Forward mode carries tangents through each iteration. Reverse mode stores or reconstructs loop states and propagates adjoints backward. The main difficulties are dynamic stopping conditions, memory cost, and numerical stability.