Chapter 3. Programs as Mathematical Objects

Straight-Line Programs

A straight-line program is the simplest model of computation used in automatic differentiation. It is a program with a fixed sequence of assignments, no branches, no loops, and no recursion. Every operation is executed exactly once, in a known order.

A typical straight-line program has the form:

v1 = x1
v2 = x2
v3 = v1 * v2
v4 = sin(v3)
v5 = v4 + v1
y  = v5

The program computes a function from inputs to outputs. In this example, the function is

y = \sin(x_1 x_2) + x_1.

Automatic differentiation treats this program as a sequence of elementary operations. Each assignment introduces an intermediate variable. Each intermediate variable depends only on earlier variables. This gives the program a natural evaluation order.

Straight-Line Programs as Factorizations

A straight-line program factors a complicated function into simple local maps.

For example, instead of differentiating

f(x_1, x_2) = \sin(x_1 x_2) + x_1

as one expression, AD differentiates the sequence:

v_3 = x_1 x_2
v_4 = \sin(v_3)
v_5 = v_4 + x_1

Each step has a simple derivative. The derivative of the whole program is assembled by the chain rule.

This is the central idea of AD: do not symbolically rewrite the whole expression, and do not approximate derivatives by finite differences. Instead, run the original computation while propagating exact local derivative information.

Inputs, Intermediates, and Outputs

A straight-line program separates variables into three groups.

Kind                    Meaning                                    Example
Input variables         Values supplied from outside the program   x1, x2
Intermediate variables  Values computed by the program             v3, v4
Output variables        Final returned values                      y

The intermediate variables are important because AD attaches derivative information to them.

In forward mode, each variable carries a tangent value:

(v_i, \dot v_i)

where v_i is the primal value and \dot v_i is the directional derivative.

In reverse mode, each variable later receives an adjoint:

\bar v_i = \frac{\partial y}{\partial v_i}.

The straight-line structure makes both directions easy to define.

Example: Forward Propagation

Consider

v1 = x1
v2 = x2
v3 = v1 * v2
v4 = sin(v3)
v5 = v4 + v1
y  = v5

To compute the derivative with respect to x1x_1, seed

\dot v_1 = 1, \qquad \dot v_2 = 0.

Then propagate:

\dot v_3 = \dot v_1 v_2 + v_1 \dot v_2 = x_2
\dot v_4 = \cos(v_3)\,\dot v_3 = \cos(x_1 x_2)\,x_2
\dot v_5 = \dot v_4 + \dot v_1

so

\frac{\partial y}{\partial x_1} = x_2\cos(x_1 x_2) + 1.

The derivative was obtained by applying one local rule per line.

Example: Reverse Propagation

For the same program, reverse mode starts from the output:

\bar y = 1.

Since y = v_5,

\bar v_5 = 1.

Then move backward:

v5 = v4 + v1

contributes

\bar v_4 += \bar v_5, \qquad \bar v_1 += \bar v_5.

Next,

v4 = sin(v3)

contributes

\bar v_3 += \bar v_4 \cos(v_3).

Next,

v3 = v1 * v2

contributes

\bar v_1 += \bar v_3 v_2, \qquad \bar v_2 += \bar v_3 v_1.

At the end,

\bar v_1 = \frac{\partial y}{\partial x_1}, \qquad \bar v_2 = \frac{\partial y}{\partial x_2}.

Reverse mode computes all input sensitivities for one scalar output in one backward pass.

Why Straight-Line Programs Matter

Most real programs are not straight-line programs. They contain branches, loops, mutation, arrays, function calls, and dynamic allocation. Still, straight-line programs are the right starting point because every execution of a real program produces a concrete trace.

For one particular run, the program executes a definite sequence of operations. That executed trace can be treated as a straight-line program.

For example:

if x > 0:
    y = x * x
else:
    y = -x

For a particular input, only one branch runs. If x > 0, the executed trace is:

v1 = x
v2 = v1 * v1
y  = v2

If x \le 0, the executed trace is:

v1 = x
v2 = -v1
y  = v2

AD differentiates the executed computation. This is why dynamic computation graphs can support ordinary control flow: the trace for each execution is a straight-line program.

Computational Graph View

A straight-line program can also be represented as a directed acyclic graph. Variables are nodes, and operations create edges from inputs to outputs.

For

v3 = v1 * v2
v4 = sin(v3)
v5 = v4 + v1

the dependencies are:

v1 ─┬─> v3 ─> v4 ─┐
    │             ├─> v5
v2 ─┘             │
v1 ───────────────┘

The graph is acyclic because each variable depends only on variables computed earlier.

Forward mode moves along the graph from inputs to outputs. Reverse mode moves against the graph from outputs to inputs.

Straight-Line Programs and the Chain Rule

The mathematical basis is repeated composition.

Suppose a program computes:

v_1 = g_1(x)
v_2 = g_2(v_1)
v_3 = g_3(v_2)

Then

y = v_3 = g_3(g_2(g_1(x))).

The derivative is

\frac{dy}{dx} = g_3'(v_2)\,g_2'(v_1)\,g_1'(x).

AD generalizes this to many variables and many operations. Each line contributes a local derivative. The global derivative is the product or accumulation of these local derivatives according to the program dependency structure.

Locality

The key property of straight-line AD is locality. To differentiate a line, AD only needs:

  1. the operation,
  2. the input values,
  3. the derivative rule for that operation.

For example, to differentiate

v = a * b

AD uses

dv = b\,da + a\,db.

It does not need to know where a and b came from. It does not need to know how v will be used later. This local structure is what makes AD compositional and implementable.

Operational Form

A simple internal representation for a straight-line program is a list of instructions:

type Op int

const (
    Add Op = iota
    Mul
    Sin
)

type Instr struct {
    Op   Op
    In   []int
    Out  int
}

A program can then be stored as:

type Program struct {
    NumInputs int
    NumVars   int
    Instrs    []Instr
    Outputs   []int
}

For the example y = \sin(x_1 x_2) + x_1, the instruction list might be:

Program{
    NumInputs: 2,
    NumVars:   5,
    Instrs: []Instr{
        {Op: Mul, In: []int{0, 1}, Out: 2},
        {Op: Sin, In: []int{2},    Out: 3},
        {Op: Add, In: []int{3, 0}, Out: 4},
    },
    Outputs: []int{4},
}

This representation is close to what many AD systems store internally, though production systems add types, shapes, memory locations, aliases, and device placement.

Limits of the Model

Straight-line programs deliberately ignore several hard parts of real computation.

They do not model branches directly. They do not model loops directly. They do not model mutation, aliasing, exceptions, I/O, concurrency, or nondeterminism. They also do not describe how to differentiate through data-dependent program structure.

These limitations are useful. They isolate the core mechanism of AD from the complications of full programming languages. Once the straight-line case is understood, control flow can be treated as a method for generating straight-line traces, or as higher-level structure that an AD compiler must transform.

Core Idea

A straight-line program is the atomic model of automatic differentiation. It turns a function into an ordered sequence of elementary operations. AD then differentiates the program by attaching derivative propagation rules to each operation.

Forward mode propagates tangents in execution order. Reverse mode records the execution and propagates adjoints backward. Both modes depend on the same fact: a computation can be decomposed into local steps, and the chain rule composes the derivatives of those steps.