Tangent Propagation
Forward mode automatic differentiation computes derivatives by carrying two values through a program at the same time: the ordinary value and its tangent. The ordinary value tells us what the program computes. The tangent tells us how that value changes when the input is perturbed.
For a scalar function
$$f : \mathbb{R} \to \mathbb{R},$$
we usually write the derivative as $f'(x)$. Forward mode instead asks a slightly more operational question:
Given a small input perturbation $\dot{x}$, what perturbation $\dot{y}$ appears in the output?
If
$$y = f(x),$$
then the tangent satisfies
$$\dot{y} = f'(x)\,\dot{x}.$$
The dot notation does not mean time derivative here. It means tangent value. The pair
$$(x, \dot{x})$$
represents both the primal value $x$ and the directional change $\dot{x}$.
For a multivariate function
$$f : \mathbb{R}^n \to \mathbb{R}^m,$$
the same idea becomes
$$\dot{y} = J_f(x)\,\dot{x},$$
where $J_f(x)$ is the Jacobian of $f$ at $x$. Forward mode computes this Jacobian-vector product directly, without first constructing the full Jacobian.
Tangents as first-order perturbations
A tangent can be understood through a first-order expansion. Suppose the input is perturbed from $x$ to
$$x + \epsilon\,\dot{x},$$
where $\epsilon$ is small. Then
$$f(x + \epsilon\,\dot{x}) = f(x) + \epsilon\,J_f(x)\,\dot{x} + O(\epsilon^2).$$
Forward mode keeps the coefficient of $\epsilon$. It discards all higher-order terms. Therefore, each intermediate value carries a first-order local approximation of how it changes.
For a program variable $v$, forward mode stores
$$(v, \dot{v}).$$
The first component is the normal runtime value. The second component is the tangent propagated from the inputs.
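As a small worked case, take $f(x) = x^2$, an illustrative choice rather than a function used elsewhere in this section. Perturbing the input $x$ by $\epsilon\,\dot{x}$ and expanding gives
$$(x + \epsilon\,\dot{x})^2 = x^2 + \epsilon\,(2x\,\dot{x}) + \epsilon^2\,\dot{x}^2.$$
Keeping the coefficient of $\epsilon$ and dropping the $\epsilon^2$ term leaves the pair $(x^2,\ 2x\,\dot{x})$: the primal value and the tangent $f'(x)\,\dot{x}$.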
Propagation through primitive operations
Consider a program built from elementary operations. Forward mode assigns each operation a rule for both the primal result and the tangent result.
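If
$$z = x + y,$$
then
$$\dot{z} = \dot{x} + \dot{y}.$$
If
$$z = x\,y,$$
then
$$\dot{z} = \dot{x}\,y + x\,\dot{y}.$$
If
$$z = \sin(x),$$
then
$$\dot{z} = \cos(x)\,\dot{x}.$$
If
$$z = \exp(x),$$
then
$$\dot{z} = \exp(x)\,\dot{x}.$$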
Each rule is just the ordinary derivative rule applied locally. The important point is that the rule is applied during normal program execution. There is no separate symbolic expression for the whole function.
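The local rules can be sketched in Go as functions on a value/tangent pair. This is a minimal sketch: the `Dual` type and the choice of `Add`, `Mul`, `Sin`, and `Exp` as the primitives are assumptions made for illustration, not a fixed API.

```go
package main

import (
	"fmt"
	"math"
)

// Dual pairs a primal value with its tangent (illustrative type).
type Dual struct {
	Value   float64
	Tangent float64
}

// Add: z = x + y, dz = dx + dy.
func Add(x, y Dual) Dual {
	return Dual{x.Value + y.Value, x.Tangent + y.Tangent}
}

// Mul: z = x * y, dz = dx*y + x*dy.
func Mul(x, y Dual) Dual {
	return Dual{x.Value * y.Value, x.Tangent*y.Value + x.Value*y.Tangent}
}

// Sin: z = sin(x), dz = cos(x) * dx.
func Sin(x Dual) Dual {
	return Dual{math.Sin(x.Value), math.Cos(x.Value) * x.Tangent}
}

// Exp: z = exp(x), dz = exp(x) * dx.
func Exp(x Dual) Dual {
	e := math.Exp(x.Value)
	return Dual{e, e * x.Tangent}
}

func main() {
	x := Dual{Value: 2, Tangent: 1} // seeded input
	y := Dual{Value: 3, Tangent: 0} // held fixed
	fmt.Println(Add(x, y))       // {5 1}
	fmt.Println(Mul(x, y))       // {6 3}
	fmt.Println(Sin(Dual{0, 1})) // {0 1}
	fmt.Println(Exp(Dual{0, 1})) // {1 1}
}
```

Each function sees only its own inputs. None of them needs the expression that produced `x` or `y`.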
Example
Let
Write the program as
To compute $f'(x)$, seed the input tangent with
$$\dot{x} = 1.$$
Then propagate:
So the output tangent is
At , the program computes
and the tangent computes
The result of forward mode is therefore the pair
The first value is the function output. The second value is the derivative in the seeded direction.
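The same seed-and-propagate procedure can be written out by hand in Go. The function below, $f(x) = x^2 + 3x$, is a hypothetical stand-in chosen because its derivative $2x + 3$ is easy to check. The intermediate names `v1` and `v2` are likewise assumptions.

```go
package main

import "fmt"

// f evaluates the illustrative function y = x*x + 3*x and, alongside it,
// the tangent dy for a given input tangent dx.
func f(x, dx float64) (y, dy float64) {
	v1 := x * x        // primal
	dv1 := dx*x + x*dx // product rule

	v2 := 3 * x // primal
	dv2 := 3 * dx

	y = v1 + v2 // primal
	dy = dv1 + dv2
	return y, dy
}

func main() {
	// Seed dx = 1 to obtain the derivative with respect to x at x = 2.
	y, dy := f(2, 1)
	fmt.Println(y, dy) // 10 7
}
```

At $x = 2$ the pair is $(10, 7)$, which matches $f(2) = 10$ and $f'(2) = 2 \cdot 2 + 3 = 7$.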
Directional derivatives
For a function with many inputs, the tangent seed chooses a direction.
Let
$$y = f(x_1, x_2).$$
If we seed
$$(\dot{x}_1, \dot{x}_2) = (1, 0),$$
then forward mode computes the derivative with respect to $x_1$. If we seed
$$(\dot{x}_1, \dot{x}_2) = (0, 1),$$
then it computes the derivative with respect to $x_2$.
More generally, if
$$\dot{x} = v,$$
then forward mode computes the directional derivative
$$\dot{y} = J_f(x)\,v.$$
This is the basic reason forward mode is efficient when the number of input directions is small. One run gives one Jacobian-vector product. To compute a full Jacobian with $n$ input dimensions, we usually need $n$ forward-mode runs, one for each basis direction.
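A short Go sketch makes the role of the seed concrete. The function `f(x1, x2) = x1*x1*x2 + x2` and the helper `Directional` are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// Dual pairs a primal value with its tangent (illustrative type).
type Dual struct {
	Value   float64
	Tangent float64
}

func Add(x, y Dual) Dual {
	return Dual{x.Value + y.Value, x.Tangent + y.Tangent}
}

func Mul(x, y Dual) Dual {
	return Dual{x.Value * y.Value, x.Tangent*y.Value + x.Value*y.Tangent}
}

// f is a hypothetical two-input function: f(x1, x2) = x1*x1*x2 + x2.
func f(x1, x2 Dual) Dual {
	return Add(Mul(Mul(x1, x1), x2), x2)
}

// Directional returns the derivative of f at (a, b) in direction (v1, v2).
// The direction is supplied as the tangent seed of the two inputs.
func Directional(a, b, v1, v2 float64) float64 {
	return f(Dual{a, v1}, Dual{b, v2}).Tangent
}

func main() {
	fmt.Println(Directional(2, 3, 1, 0))   // 12: derivative with respect to x1
	fmt.Println(Directional(2, 3, 0, 1))   // 5:  derivative with respect to x2
	fmt.Println(Directional(2, 3, 0.5, 2)) // 16: 0.5*12 + 2*5
}
```

The third call shows that a general seed gives the corresponding linear combination of the partial derivatives in a single run.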
Tangent propagation as program execution
Forward mode can be viewed as a mechanical transformation of a program.
A statement such as

    z = x * y

becomes

    z = x * y
    dz = dx * y + x * dy

A statement such as

    z = sin(x)

becomes

    z = sin(x)
    dz = cos(x) * dx

The transformed program follows the same control flow as the original program. It executes the primal computation and the tangent computation side by side.
This gives forward mode an important implementation property: it does not need to store the whole computation for a later reverse pass. Once an operation has propagated its tangent, its local derivative information can often be discarded.
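The following Go sketch applies the transformation to a loop. The function `PowTangent` is a hypothetical example: it computes `x` raised to a non-negative integer power `n` by repeated multiplication and carries the tangent through the same iterations.

```go
package main

import "fmt"

// PowTangent computes z = x^n by repeated multiplication, together with the
// tangent dz for input tangent dx. The loop runs exactly as in the primal
// program. Only the current pair (z, dz) is kept between iterations.
func PowTangent(x, dx float64, n int) (z, dz float64) {
	z, dz = 1, 0 // the constant 1 has zero tangent
	for i := 0; i < n; i++ {
		// Transformed form of the statement z = z * x.
		// The tangent is updated first because it needs the old z.
		dz = dz*x + z*dx
		z = z * x
	}
	return z, dz
}

func main() {
	z, dz := PowTangent(2, 1, 3)
	fmt.Println(z, dz) // 8 12
}
```

For $x = 2$ and $n = 3$ the result is $(8, 12)$, in agreement with $x^3 = 8$ and $3x^2 = 12$. No record of earlier iterations is retained.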
Locality
Tangent propagation is local. Each operation only needs:
- The primal inputs.
- The tangent inputs.
- The derivative rule for that operation.
It does not need to know the full expression that produced its inputs. It also does not need to know how its output will be used later.
This locality makes forward mode simple to implement with operator overloading. A number type can store both a primal value and a tangent value. Arithmetic operators are then overloaded to propagate both fields.
For example, conceptually:

    // Dual carries a primal value together with its tangent.
    type Dual struct {
        Value   float64
        Tangent float64
    }

    // Add applies the sum rule: the tangent of x + y is dx + dy.
    func Add(x, y Dual) Dual {
        return Dual{
            Value:   x.Value + y.Value,
            Tangent: x.Tangent + y.Tangent,
        }
    }

    // Mul applies the product rule: the tangent of x * y is dx*y + x*dy.
    func Mul(x, y Dual) Dual {
        return Dual{
            Value:   x.Value * y.Value,
            Tangent: x.Tangent*y.Value + x.Value*y.Tangent,
        }
    }

This structure is the operational core of forward mode. More advanced systems generalize it to vectors, arrays, tensors, sparse directions, and higher-order tangents.
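A possible way to use such a type is sketched below. The helpers `Variable` and `Const` and the function `f(x) = x*x*x + 2*x` are assumptions added for illustration. `Dual`, `Add`, and `Mul` are repeated so that the example compiles on its own.

```go
package main

import "fmt"

type Dual struct {
	Value   float64
	Tangent float64
}

func Add(x, y Dual) Dual {
	return Dual{Value: x.Value + y.Value, Tangent: x.Tangent + y.Tangent}
}

func Mul(x, y Dual) Dual {
	return Dual{
		Value:   x.Value * y.Value,
		Tangent: x.Tangent*y.Value + x.Value*y.Tangent,
	}
}

// Variable marks v as the input being differentiated (tangent seed 1).
func Variable(v float64) Dual { return Dual{Value: v, Tangent: 1} }

// Const wraps a constant, whose tangent is 0.
func Const(c float64) Dual { return Dual{Value: c, Tangent: 0} }

// f is a hypothetical function: f(x) = x*x*x + 2*x.
func f(x Dual) Dual {
	return Add(Mul(Mul(x, x), x), Mul(Const(2), x))
}

func main() {
	y := f(Variable(3))
	fmt.Println(y.Value, y.Tangent) // 33 29
}
```

The function `f` is written once in terms of `Add` and `Mul`. The derivative $3x^2 + 2 = 29$ at $x = 3$ falls out of the `Tangent` field without any separate derivative code.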
Cost model
For scalar tangents, forward mode usually adds a small constant factor to the cost of the primal computation. Each primitive operation performs its normal computation and a tangent computation.
For example, multiplication changes from

    z = x * y

to

    z = x * y
    dz = dx * y + x * dy

So one multiplication becomes three multiplications and one addition: the primal multiplication plus the product rule for the tangent.
If the tangent is a vector of $k$ directions, then each variable carries $k$ tangent components. The cost of the tangent computation becomes roughly proportional to $k$. This is useful when $k$ is small. It becomes expensive when $k$ approaches the full input dimension.
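A vector-valued tangent can be sketched in Go by replacing the scalar tangent with a slice. The type name `MultiDual` is hypothetical, and the sketch assumes every operand carries the same number of directions.

```go
package main

import "fmt"

// MultiDual carries one primal value and one tangent component per direction.
type MultiDual struct {
	Value   float64
	Tangent []float64
}

// Add propagates every direction through the sum rule.
// Assumes x and y carry the same number of directions.
func Add(x, y MultiDual) MultiDual {
	t := make([]float64, len(x.Tangent))
	for i := range t {
		t[i] = x.Tangent[i] + y.Tangent[i]
	}
	return MultiDual{x.Value + y.Value, t}
}

// Mul propagates every direction through the product rule.
// The primal product is computed once; the tangent work scales with
// the number of directions.
func Mul(x, y MultiDual) MultiDual {
	t := make([]float64, len(x.Tangent))
	for i := range t {
		t[i] = x.Tangent[i]*y.Value + x.Value*y.Tangent[i]
	}
	return MultiDual{x.Value * y.Value, t}
}

func main() {
	// Two directions: the standard basis of a two-dimensional input space.
	x1 := MultiDual{2, []float64{1, 0}}
	x2 := MultiDual{3, []float64{0, 1}}

	// Illustrative function f(x1, x2) = x1*x2 + x1.
	y := Add(Mul(x1, x2), x1)
	fmt.Println(y.Value, y.Tangent) // 8 [4 2]
}
```

With both basis directions seeded at once, a single run yields the whole gradient $(x_2 + 1,\ x_1) = (4, 2)$. The inner loops make the per-operation cost grow with the number of directions.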
The role of seeding
The tangent seed defines the derivative query.
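For
$$f : \mathbb{R}^n \to \mathbb{R}^m,$$
the input tangent $\dot{x}$ is a vector in input space. Forward mode computes
$$\dot{y} = J_f(x)\,\dot{x}.$$
To compute the first column of the Jacobian, use
$$\dot{x} = e_1 = (1, 0, \dots, 0)^\top.$$
To compute the second column, use
$$\dot{x} = e_2 = (0, 1, \dots, 0)^\top.$$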
Repeating this for all standard basis vectors gives the full Jacobian. However, when only a directional derivative is needed, one seed is enough.
This is common in scientific computing, sensitivity analysis, and implicit methods, where the question is often not “what is the whole Jacobian?” but “how does the output change in this particular direction?”
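The two kinds of query can be compared in a short Go sketch. The map `f(x1, x2) = (x1*x2, x1 + x2)` and the helpers `JVP` and `Jacobian` are hypothetical names chosen for this example.

```go
package main

import "fmt"

type Dual struct {
	Value   float64
	Tangent float64
}

func Add(x, y Dual) Dual {
	return Dual{x.Value + y.Value, x.Tangent + y.Tangent}
}

func Mul(x, y Dual) Dual {
	return Dual{x.Value * y.Value, x.Tangent*y.Value + x.Value*y.Tangent}
}

// f is a hypothetical map from R^2 to R^2: (x1, x2) -> (x1*x2, x1 + x2).
func f(x [2]Dual) [2]Dual {
	return [2]Dual{Mul(x[0], x[1]), Add(x[0], x[1])}
}

// JVP runs f once with seed v and returns the Jacobian-vector product at x.
func JVP(x, v [2]float64) [2]float64 {
	out := f([2]Dual{{x[0], v[0]}, {x[1], v[1]}})
	return [2]float64{out[0].Tangent, out[1].Tangent}
}

// Jacobian assembles the full matrix column by column,
// using one forward run per standard basis vector.
func Jacobian(x [2]float64) [2][2]float64 {
	var J [2][2]float64
	for j := 0; j < 2; j++ {
		var e [2]float64
		e[j] = 1 // seed the j-th basis direction
		col := JVP(x, e)
		for i := 0; i < 2; i++ {
			J[i][j] = col[i]
		}
	}
	return J
}

func main() {
	x := [2]float64{2, 3}
	fmt.Println(Jacobian(x))                // [[3 2] [1 1]]
	fmt.Println(JVP(x, [2]float64{1, -1})) // [1 0]
}
```

`Jacobian` needs two runs of `f`, one per input. The directional query in the last line needs only one.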
Summary of the mechanism
Forward mode automatic differentiation propagates tangents through the same computation that produces the primal output. Each variable is extended from a value into a pair:
$$v \mapsto (v, \dot{v}).$$
Each primitive operation is extended by its local derivative rule. The final tangent is the derivative of the output in the seeded input direction.
Forward mode is direct, local, and exact up to the floating point behavior of the underlying computation. Its natural output is a Jacobian-vector product, which makes it especially effective for functions with few inputs or for derivative queries involving a small number of directions.