The natural output of forward mode automatic differentiation is a Jacobian-vector product. Instead of constructing the full Jacobian matrix explicitly, forward mode computes how a perturbation vector propagates through a function.
For a function

$$f : \mathbb{R}^n \to \mathbb{R}^m,$$

the Jacobian at $x$ is

$$J_f(x) \in \mathbb{R}^{m \times n}, \qquad [J_f(x)]_{ij} = \frac{\partial f_i}{\partial x_j}(x).$$

Given a direction vector

$$v \in \mathbb{R}^n,$$

forward mode computes

$$J_f(x)\,v \in \mathbb{R}^m.$$

This product is called a Jacobian-vector product, usually abbreviated JVP.
Geometric interpretation
A differentiable function locally behaves like a linear map. Around a point $x$,

$$f(x + \delta) \approx f(x) + J_f(x)\,\delta.$$

If we perturb the input in direction $v$,

$$\delta = \varepsilon v,$$

then the first-order output perturbation is

$$f(x + \varepsilon v) - f(x) \approx \varepsilon\, J_f(x)\, v.$$
So the JVP tells us how infinitesimal motion in input space transforms into infinitesimal motion in output space.
Forward mode computes exactly this transformed direction.
Tangent propagation produces JVPs
Suppose the inputs are seeded with tangents:

$$\dot{x} = v.$$

Forward propagation computes tangent values for all intermediate variables. The final output tangent is

$$\dot{y} = J_f(x)\,v.$$

The tangent vector $\dot{y}$ is therefore the directional derivative of the function in direction $v$.
This is why forward mode is sometimes described as directional differentiation.
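Tangent propagation can be sketched with dual numbers in plain Python. The `Dual` class, the `jvp` helper, and the example function below are illustrative, not any particular library's API:

```python
import math

class Dual:
    """Pair (value, tangent): arithmetic propagates tangents by local derivative rules."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # sum rule: tangents add
        return Dual(self.val + other.val, self.tan + other.tan)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: d(ab) = a db + b da
        return Dual(self.val * other.val,
                    self.val * other.tan + other.val * self.tan)
    __rmul__ = __mul__

def sin(x):
    # chain rule through sin: d(sin u) = cos(u) du
    return Dual(math.sin(x.val), math.cos(x.val) * x.tan)

def jvp(f, x, v):
    """Directional derivative of f at x in direction v, via one forward pass."""
    duals = [Dual(xi, vi) for xi, vi in zip(x, v)]
    return f(*duals).tan

# illustrative function: f(x1, x2) = x1 * x2 + sin(x1)
f = lambda x1, x2: x1 * x2 + sin(x1)
d = jvp(f, [1.0, 2.0], [1.0, 0.0])   # directional derivative along e_1
```

Seeding with the basis vector $e_1$ makes `d` the partial derivative with respect to the first input, here $x_2 + \cos(x_1)$ evaluated at $(1, 2)$.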
Example: scalar output
Consider

$$f(x_1, x_2) = \sin(x_1 x_2).$$

The Jacobian is the row vector

$$J_f(x) = \begin{bmatrix} x_2 \cos(x_1 x_2) & x_1 \cos(x_1 x_2) \end{bmatrix}.$$

Choose direction

$$v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}.$$

Then

$$J_f(x)\,v = \cos(x_1 x_2)\,(x_2 v_1 + x_1 v_2).$$

Now compute the same result using forward mode.

Seed:

$$\dot{x}_1 = v_1, \qquad \dot{x}_2 = v_2.$$

Evaluate the intermediate product and its tangent:

$$u = x_1 x_2, \qquad \dot{u} = \dot{x}_1 x_2 + x_1 \dot{x}_2.$$

Substitute the seeds:

$$\dot{u} = v_1 x_2 + x_1 v_2.$$

Next:

$$y = \sin(u), \qquad \dot{y} = \cos(u)\,\dot{u}.$$

Finally:

$$\dot{y} = \cos(x_1 x_2)\,(v_1 x_2 + x_1 v_2).$$

So

$$\dot{y} = J_f(x)\,v.$$

This equals the Jacobian-vector product.
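A quick numerical sanity check of such a forward pass, assuming the example function $f(x_1, x_2) = \sin(x_1 x_2)$: the hand-written tangent rule is compared against a central finite difference in the same direction.

```python
import math

def f(x1, x2):
    return math.sin(x1 * x2)

def jvp_f(x1, x2, v1, v2):
    # tangent propagation written out by hand for f(x1, x2) = sin(x1 * x2)
    u, du = x1 * x2, v1 * x2 + x1 * v2   # product rule
    return math.cos(u) * du              # chain rule through sin

# central finite difference along the same direction, for comparison
x1, x2, v1, v2, eps = 0.7, 1.3, 0.5, -2.0, 1e-6
fd = (f(x1 + eps * v1, x2 + eps * v2) - f(x1 - eps * v1, x2 - eps * v2)) / (2 * eps)
```

Unlike the finite difference, the forward-mode value is exact up to floating-point rounding.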
Example: vector output
Now consider

$$f(x_1, x_2) = \begin{bmatrix} x_1^2 \\ x_1 x_2 \end{bmatrix}.$$

The Jacobian is

$$J_f(x) = \begin{bmatrix} 2x_1 & 0 \\ x_2 & x_1 \end{bmatrix}.$$

For direction

$$v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},$$

the JVP is

$$J_f(x)\,v = \begin{bmatrix} 2x_1 v_1 \\ x_2 v_1 + x_1 v_2 \end{bmatrix}.$$

Forward mode computes this directly.

Seed:

$$\dot{x}_1 = v_1, \qquad \dot{x}_2 = v_2.$$

Then:

$$\dot{y}_1 = 2x_1 \dot{x}_1, \qquad \dot{y}_2 = \dot{x}_1 x_2 + x_1 \dot{x}_2.$$

The output tangent vector $(\dot{y}_1, \dot{y}_2)$ is exactly the JVP.
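For a vector-valued check, assuming the illustrative function $f(x_1, x_2) = (x_1^2,\ x_1 x_2)$, the hand-propagated tangents can be compared against the explicit Jacobian product:

```python
def jvp_vec(x1, x2, v1, v2):
    # hand-written tangent rules for the assumed f(x1, x2) = (x1**2, x1 * x2)
    return (2 * x1 * v1, x2 * v1 + x1 * v2)

def jacobian_times_v(x1, x2, v1, v2):
    # the same product via the explicit 2x2 Jacobian [[2*x1, 0], [x2, x1]]
    J = [[2 * x1, 0.0], [x2, x1]]
    return (J[0][0] * v1 + J[0][1] * v2,
            J[1][0] * v1 + J[1][1] * v2)
```

The second routine forms the matrix only for comparison; forward mode never needs it.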
JVPs without explicit Jacobians
The important point is that forward mode never forms the Jacobian matrix explicitly.
For a large system, the Jacobian may be enormous. Suppose

$$f : \mathbb{R}^n \to \mathbb{R}^m$$

with $n$ and $m$ both large. The full Jacobian contains $mn$ entries. Explicit storage is often impossible.

Forward mode avoids this cost. It computes

$$J_f(x)\,v$$

directly by propagating one tangent vector through the computation graph.
This is especially valuable when:
- Only directional derivatives are needed.
- The Jacobian is sparse or implicit.
- Forming the full matrix would be too expensive.
Computational complexity
Suppose the primal function evaluation costs time $T$.

A forward-mode JVP typically costs approximately

$$\mathcal{O}(T),$$

up to a small constant factor.
The tangent computation follows the same graph as the primal computation. Each primitive performs some extra local derivative work, but the asymptotic complexity is usually unchanged.
Computing the full Jacobian is more expensive.
For

$$f : \mathbb{R}^n \to \mathbb{R}^m,$$

one forward pass computes one JVP. To recover the full Jacobian, we usually evaluate:

$$J_f(x)\,e_1,\quad J_f(x)\,e_2,\quad \ldots,\quad J_f(x)\,e_n,$$

where $e_i$ are standard basis vectors.

Thus full Jacobian construction requires approximately $n$ forward passes.

Forward mode is therefore efficient when

$$n \ll m,$$

or when only a few directional derivatives are required.
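A sketch of this recovery, with a hand-written `jvp` for the illustrative function $f(x_1, x_2) = (x_1 x_2,\ x_1 + x_2)$: one forward pass per basis vector yields one Jacobian column.

```python
def jvp(x, v):
    # hand-coded tangent rules for the assumed f(x1, x2) = (x1 * x2, x1 + x2)
    (x1, x2), (v1, v2) = x, v
    return [x2 * v1 + x1 * v2, v1 + v2]

def jacobian(jvp, x, n):
    """n forward passes, one per standard basis vector: column j is J e_j."""
    cols = [jvp(x, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    # transpose the list of columns into rows
    return [list(row) for row in zip(*cols)]

J = jacobian(jvp, [2.0, 3.0], 2)
```

At $x = (2, 3)$ the analytic Jacobian is $\begin{bmatrix} 3 & 2 \\ 1 & 1 \end{bmatrix}$, which the two basis-seeded passes reproduce column by column.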
Matrix view of tangent propagation
Each intermediate variable $z$ has a tangent:

$$\dot{z} = \frac{\partial z}{\partial x}\,v.$$

If the primitive operation is

$$w = \phi(z),$$

then

$$\dot{w} = J_\phi(z)\,\dot{z}.$$

This is exactly multiplication by the local Jacobian of the primitive.

The entire computation graph therefore performs repeated local matrix-vector multiplications:

$$\dot{y} = J_{\phi_k} \cdots J_{\phi_2}\, J_{\phi_1}\, v.$$
Forward mode composes these local linear maps incrementally during execution.
Relation to the chain rule
Suppose

$$f = g \circ h.$$

Then

$$J_f(x) = J_g(h(x))\,J_h(x).$$

Apply this Jacobian to a vector $v$:

$$J_f(x)\,v = J_g(h(x))\,\bigl(J_h(x)\,v\bigr).$$

Forward mode computes exactly this sequence:
- Push $v$ through $J_h(x)$.
- Push the resulting tangent through $J_g(h(x))$.
The tangent vector flows forward through the composed computation.
This is the operational form of the chain rule.
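The two-stage push can be written out directly. Here `h_jvp` and `g_jvp` are hand-coded tangent rules for the illustrative primitives $h(x) = (\sin x, \cos x)$ and $g(a, b) = ab$:

```python
import math

def h_jvp(x, v):
    # h(x) = (sin x, cos x); local Jacobian applied to the scalar tangent v
    return (math.sin(x), math.cos(x)), (math.cos(x) * v, -math.sin(x) * v)

def g_jvp(ab, dab):
    # g(a, b) = a * b; product rule applied to the incoming tangent pair
    (a, b), (da, db) = ab, dab
    return a * b, a * db + b * da

# f = g ∘ h, so f(x) = sin(x) * cos(x): push v through h, then through g
x, v = 0.3, 1.0
hx, dhx = h_jvp(x, v)
fx, dfx = g_jvp(hx, dhx)
```

Analytically $f(x) = \tfrac{1}{2}\sin(2x)$, so the composed tangent `dfx` should equal $\cos(2x)\,v$.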
Basis seeding
To compute a specific partial derivative, choose a basis direction.
For

$$f : \mathbb{R}^n \to \mathbb{R}^m,$$

suppose we want

$$\frac{\partial f}{\partial x_1}(x).$$

Use seed:

$$v = e_1 = (1, 0, \ldots, 0).$$

Then

$$J_f(x)\,e_1 = \text{first column of } J_f(x).$$

More generally:

| Seed vector | Result |
|---|---|
| $e_1$ | first Jacobian column |
| $e_2$ | second Jacobian column |
| $e_i$ | $i$-th Jacobian column |
| arbitrary $v$ | directional derivative $J_f(x)\,v$ |
Thus the seed determines the derivative query.
Multiple directions simultaneously
Forward mode can propagate several tangent directions at once.
Instead of a single tangent vector

$$\dot{x} = v \in \mathbb{R}^n,$$

use a tangent matrix:

$$\dot{X} \in \mathbb{R}^{n \times k},$$

whose columns are $k$ direction vectors. Each variable now carries $k$ tangent components.

The output becomes

$$\dot{Y} = J_f(x)\,\dot{X},$$

where

$$\dot{Y} \in \mathbb{R}^{m \times k}.$$

This computes $k$ JVPs simultaneously.

If

$$\dot{X} = I_n,$$

the identity matrix, then

$$\dot{Y} = J_f(x),$$

so the full Jacobian is recovered in one vectorized pass. However, this may require large tangent storage and substantial arithmetic overhead.
JVPs in machine learning
Modern machine learning systems frequently use JVPs.
Applications include:
| Application | Use of JVP |
|---|---|
| Sensitivity analysis | perturbation propagation |
| Meta-learning | differentiating parameter updates |
| Implicit layers | linearized solver differentiation |
| Neural ODEs | tangent dynamics |
| Hessian-vector products | nested differentiation |
| Second-order optimization | curvature approximations |
| Physics simulation | variational equations |
Many algorithms only require products with derivatives, not explicit derivative matrices.
This distinction is fundamental in large-scale systems.
JVP versus VJP
Forward mode computes

$$J_f(x)\,v.$$

Reverse mode computes

$$u^\top J_f(x).$$
The reverse-mode product is called a vector-Jacobian product (VJP) or adjoint product.
The two have complementary complexity profiles:
| Mode | Natural product | Efficient when |
|---|---|---|
| Forward mode | $J_f(x)\,v$ | few inputs ($n \ll m$) |
| Reverse mode | $u^\top J_f(x)$ | few outputs ($m \ll n$) |
For scalar-output functions,

$$f : \mathbb{R}^n \to \mathbb{R},$$

reverse mode computes the full gradient in one pass, while forward mode needs $n$ passes.

For scalar-input functions,

$$f : \mathbb{R} \to \mathbb{R}^m,$$

forward mode computes the full derivative vector in one pass.
Linearization viewpoint
A JVP can also be viewed as evaluation of the linearized function.
Define the linearization of $f$ at $x$:

$$\ell_x(v) = f(x) + J_f(x)\,v.$$

Forward mode computes

$$\ell_x(v)$$

without materializing $J_f(x)$ as a matrix.
In many systems, the linearized operator is more important than the Jacobian itself. Optimization methods, Krylov solvers, Newton methods, and sensitivity analysis often only require repeated applications of the linearized operator.
Forward mode naturally exposes this operator form.
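The operator form can be modeled as a closure; the function and its tangent rule below are illustrative:

```python
def linearize(x1, x2):
    """Return the linear operator v -> J_f(x) v for the assumed
    f(x1, x2) = (x1 * x2, x1 + x2), as a plain callable.

    The Jacobian is never stored as a matrix; the closure captures the
    linearization point and applies the tangent rules on demand.
    """
    def apply(v1, v2):
        return (x2 * v1 + x1 * v2, v1 + v2)
    return apply

op = linearize(2.0, 3.0)
# repeated applications, as a Krylov-style solver would issue them
t1 = op(1.0, 0.0)
t2 = op(0.0, 1.0)
```

An iterative solver only needs `op`, so the same code works whether the underlying Jacobian is tiny or enormous.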
Sparse directional propagation
If the seed vector is sparse, tangent propagation only activates dependent computations.
For example, if

$$v_i = 0$$

for most components $i$, many tangent computations remain zero.
This property is useful for:
- sparse Jacobian estimation,
- localized sensitivity analysis,
- block-structured systems,
- PDE discretizations,
- graph-based models.
Efficient sparse forward-mode systems exploit this structure to reduce arithmetic and memory cost.
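A small sketch of this effect, using a hypothetical block-structured function in which each output depends on only half the inputs:

```python
def jvp_blocks(x, v):
    """Block-structured example: y1 depends only on (x1, x2), y2 only on (x3, x4).

    Hypothetical function: y1 = x1 * x2, y2 = x3 + x4.
    """
    x1, x2, x3, x4 = x
    v1, v2, v3, v4 = v
    dy1 = x2 * v1 + x1 * v2   # touched only by the first block's seeds
    dy2 = v3 + v4             # touched only by the second block's seeds
    return [dy1, dy2]

# a seed touching only the first block leaves the second block's tangent exactly zero
out = jvp_blocks([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 0.0, 0.0])
```

A sparse-aware implementation could skip the second block's tangent work entirely, since its seeds are all zero.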
Summary
Forward mode automatic differentiation naturally computes Jacobian-vector products:

$$\dot{y} = J_f(x)\,v.$$
A tangent seed vector defines an infinitesimal perturbation direction. Tangent propagation pushes this perturbation through the computation graph using local derivative rules. The resulting output tangent is the directional derivative of the function.
The key property is that forward mode computes JVPs directly, without explicitly forming Jacobian matrices. This makes it effective for directional sensitivity analysis, sparse systems, higher-order methods, and problems where the number of input directions is small.