Straight-Line Programs
A straight-line program is the simplest model of computation used in automatic differentiation. It is a program with a fixed sequence of assignments, no branches, no loops, and no recursion. Every operation is executed exactly once, in a known order.
A typical straight-line program has the form:
```
v1 = x1
v2 = x2
v3 = v1 * v2
v4 = sin(v3)
v5 = v4 + v1
y = v5
```

The program computes a function from inputs to outputs. In this example, the function is

$$y = \sin(x_1 x_2) + x_1.$$
Automatic differentiation treats this program as a sequence of elementary operations. Each assignment introduces an intermediate variable. Each intermediate variable depends only on earlier variables. This gives the program a natural evaluation order.
Straight-Line Programs as Factorizations
A straight-line program factors a complicated function into simple local maps.
For example, instead of differentiating

$$y = \sin(x_1 x_2) + x_1$$

as one expression, AD differentiates the sequence:

$$v_3 = x_1 x_2, \qquad v_4 = \sin(v_3), \qquad v_5 = v_4 + x_1.$$
Each step has a simple derivative. The derivative of the whole program is assembled by the chain rule.
This is the central idea of AD: do not symbolically rewrite the whole expression, and do not approximate derivatives by finite differences. Instead, run the original computation while propagating exact local derivative information.
Inputs, Intermediates, and Outputs
A straight-line program separates variables into three groups.
| Kind | Meaning | Example |
|---|---|---|
| Input variables | Values supplied from outside the program | x1, x2 |
| Intermediate variables | Values computed by the program | v3, v4 |
| Output variables | Final returned values | y |
The intermediate variables are important because AD attaches derivative information to them.
In forward mode, each variable carries a tangent value:

$$(v_i, \dot v_i),$$

where $v_i$ is the primal value and $\dot v_i$ is the directional derivative.

In reverse mode, each variable later receives an adjoint:

$$\bar v_i = \frac{\partial y}{\partial v_i}.$$
The straight-line structure makes both directions easy to define.
Example: Forward Propagation
Consider the same program computing $y = \sin(x_1 x_2) + x_1$:

```
v1 = x1
v2 = x2
v3 = v1 * v2
v4 = sin(v3)
v5 = v4 + v1
y = v5
```

To compute the derivative with respect to $x_1$, seed

$$\dot v_1 = 1, \qquad \dot v_2 = 0.$$

Then propagate:

$$\dot v_3 = \dot v_1 v_2 + v_1 \dot v_2 = v_2,$$
$$\dot v_4 = \cos(v_3)\, \dot v_3,$$
$$\dot v_5 = \dot v_4 + \dot v_1,$$

so

$$\frac{\partial y}{\partial x_1} = x_2 \cos(x_1 x_2) + 1.$$
The derivative was obtained by applying one local rule per line.
Example: Reverse Propagation
For the same program, reverse mode starts from the output:

$$\bar y = 1.$$

Since $y = v_5$,

$$\bar v_5 = 1.$$

Then move backward. The line `v5 = v4 + v1` contributes

$$\bar v_4 \mathrel{{+}{=}} \bar v_5, \qquad \bar v_1 \mathrel{{+}{=}} \bar v_5.$$

Next, `v4 = sin(v3)` contributes

$$\bar v_3 \mathrel{{+}{=}} \cos(v_3)\, \bar v_4.$$

Next, `v3 = v1 * v2` contributes

$$\bar v_1 \mathrel{{+}{=}} v_2\, \bar v_3, \qquad \bar v_2 \mathrel{{+}{=}} v_1\, \bar v_3.$$

At the end,

$$\bar v_1 = \frac{\partial y}{\partial x_1} = 1 + x_2 \cos(x_1 x_2), \qquad \bar v_2 = \frac{\partial y}{\partial x_2} = x_1 \cos(x_1 x_2).$$
Reverse mode computes all input sensitivities for one scalar output in one backward pass.
Why Straight-Line Programs Matter
Most real programs are not straight-line programs. They contain branches, loops, mutation, arrays, function calls, and dynamic allocation. Still, straight-line programs are the right starting point because every execution of a real program produces a concrete trace.
For one particular run, the program executes a definite sequence of operations. That executed trace can be treated as a straight-line program.
For example:
```
if x > 0:
    y = x * x
else:
    y = -x
```

For a particular input, only one branch runs. If $x > 0$, the executed trace is:

```
v1 = x
v2 = v1 * v1
y = v2
```

If $x \le 0$, the executed trace is:

```
v1 = x
v2 = -v1
y = v2
```

AD differentiates the executed computation. This is why dynamic computation graphs can support ordinary control flow: the trace for each execution is a straight-line program.
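A minimal tracing sketch shows how one run through control flow leaves behind a straight-line record. The `Tape` type and its method names are hypothetical, not from any particular framework:

```go
package main

import "fmt"

// Tape records the elementary operations actually executed in one run.
type Tape struct{ Instrs []string }

func (t *Tape) Mul(a, b float64) float64 { t.Instrs = append(t.Instrs, "mul"); return a * b }
func (t *Tape) Neg(a float64) float64    { t.Instrs = append(t.Instrs, "neg"); return -a }

// f is the branchy function above; only the taken branch is recorded.
func f(t *Tape, x float64) float64 {
	if x > 0 {
		return t.Mul(x, x)
	}
	return t.Neg(x)
}

func main() {
	pos := &Tape{}
	f(pos, 3.0)
	fmt.Println(pos.Instrs) // trace for x > 0

	neg := &Tape{}
	f(neg, -3.0)
	fmt.Println(neg.Instrs) // trace for x <= 0
}
```

The branch itself never appears on the tape; each run yields a different straight-line program, and AD differentiates whichever one was recorded.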
Computational Graph View
A straight-line program can also be represented as a directed acyclic graph. Variables are nodes, and operations create edges from inputs to outputs.
For

```
v3 = v1 * v2
v4 = sin(v3)
v5 = v4 + v1
```

the dependencies are:

```
v1 ─┬─> v3 ─> v4 ─┐
    │             ├─> v5
v2 ─┘             │
v1 ───────────────┘
```

The graph is acyclic because each variable depends only on variables computed earlier.
Forward mode moves along the graph from inputs to outputs. Reverse mode moves against the graph from outputs to inputs.
Straight-Line Programs and the Chain Rule
The mathematical basis is repeated composition.
Suppose a program computes:

$$v_1 = f_1(x), \qquad v_2 = f_2(v_1), \qquad y = f_3(v_2).$$

Then

$$y = f_3(f_2(f_1(x))).$$

The derivative is

$$\frac{dy}{dx} = f_3'(v_2)\, f_2'(v_1)\, f_1'(x).$$
AD generalizes this to many variables and many operations. Each line contributes a local derivative. The global derivative is the product or accumulation of these local derivatives according to the program dependency structure.
Locality
The key property of straight-line AD is locality. To differentiate a line, AD only needs:
- the operation,
- the input values,
- the derivative rule for that operation.
For example, to differentiate

```
v = a * b
```

AD uses

$$\frac{\partial v}{\partial a} = b, \qquad \frac{\partial v}{\partial b} = a.$$

It does not need to know where $a$ and $b$ came from. It does not need to know how $v$ will be used later. This local structure is what makes AD compositional and implementable.
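Locality can be made concrete as one tiny function per operation. `mulPartials` below is an illustrative name, not an established API:

```go
package main

import "fmt"

// mulPartials returns the local partials d(a*b)/da and d(a*b)/db.
// It sees only the input values: nothing about where a and b came
// from, and nothing about how the product will be used later.
func mulPartials(a, b float64) (da, db float64) {
	return b, a
}

func main() {
	da, db := mulPartials(3.0, 4.0)
	fmt.Println(da, db) // prints 4 3
}
```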
Operational Form
A simple internal representation for a straight-line program is a list of instructions:
```go
type Op int

const (
	Add Op = iota
	Mul
	Sin
)

type Instr struct {
	Op  Op
	In  []int
	Out int
}
```

A program can then be stored as:

```go
type Program struct {
	NumInputs int
	NumVars   int
	Instrs    []Instr
	Outputs   []int
}
```

For the example $y = \sin(x_1 x_2) + x_1$, the instruction list might be:

```go
Program{
	NumInputs: 2,
	NumVars:   5,
	Instrs: []Instr{
		{Op: Mul, In: []int{0, 1}, Out: 2},
		{Op: Sin, In: []int{2}, Out: 3},
		{Op: Add, In: []int{3, 0}, Out: 4},
	},
	Outputs: []int{4},
}
```

This representation is close to what many AD systems store internally, though production systems add types, shapes, memory locations, aliases, and device placement.
Limits of the Model
Straight-line programs deliberately ignore several hard parts of real computation.
They do not model branches directly. They do not model loops directly. They do not model mutation, aliasing, exceptions, I/O, concurrency, or nondeterminism. They also do not describe how to differentiate through data-dependent program structure.
These limitations are useful. They isolate the core mechanism of AD from the complications of full programming languages. Once the straight-line case is understood, control flow can be treated as a method for generating straight-line traces, or as higher-level structure that an AD compiler must transform.
Core Idea
A straight-line program is the atomic model of automatic differentiation. It turns a function into an ordered sequence of elementary operations. AD then differentiates the program by attaching derivative propagation rules to each operation.
Forward mode propagates tangents in execution order. Reverse mode records the execution and propagates adjoints backward. Both modes depend on the same fact: a computation can be decomposed into local steps, and the chain rule composes the derivatives of those steps.