Forward mode automatic differentiation computes Jacobian-vector products:

$$ \dot{y} = J_f(x)\, v. $$

The seed vector $v$ determines which derivative information propagates through the program. Choosing good seeds is therefore central to efficiency.
A naive approach computes one Jacobian column at a time using basis vectors:

$$ J_f(x)\, e_i = i\text{-th column of } J_f(x). $$
This always works, but it may waste computation when the Jacobian has structure. Efficient seeding strategies reduce the number of forward passes, reduce tangent storage, or improve hardware utilization.
Basis seeding
The simplest strategy uses standard basis vectors.
For input index $i$, seed:

$$ v = e_i. $$

Then forward mode computes:

$$ \dot{y} = J_f(x)\, e_i, $$

which equals the $i$-th Jacobian column.
For example, if $v = e_1 = (1, 0, \dots, 0)^T$, then:

$$ \dot{y} = J_f(x)\, e_1 = \frac{\partial f}{\partial x_1}(x), $$

the first column of the Jacobian.
The full Jacobian is assembled column-by-column:

$$ J_f(x) = \begin{bmatrix} J_f(x)\, e_1 & \cdots & J_f(x)\, e_n \end{bmatrix}. $$

This method is simple and numerically stable. Its cost is proportional to the input dimension:

$$ \text{cost} \approx n \cdot \text{cost}(f). $$
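As a concrete sketch, the loop below recovers the Jacobian column-by-column with JAX's `jax.jvp`; the function `f` is an illustrative placeholder.

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

def jacobian_by_columns(f, x):
    n = x.shape[0]
    cols = []
    for i in range(n):
        e_i = jnp.zeros(n).at[i].set(1.0)   # basis seed e_i
        _, col = jax.jvp(f, (x,), (e_i,))   # one forward pass per column
        cols.append(col)
    return jnp.stack(cols, axis=1)          # assemble J column by column

x = jnp.array([1.0, 2.0, 3.0])
print(jacobian_by_columns(f, x))            # agrees with jax.jacfwd(f)(x)
```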
Dense identity seeding
Instead of one basis vector per pass, vector forward mode can seed all basis directions simultaneously.
Use:

$$ V = I_n. $$

Then:

$$ \dot{Y} = J_f(x)\, I_n = J_f(x). $$

This computes the entire Jacobian in one execution.

However, each variable now carries an $n$-dimensional tangent vector. The runtime and memory cost become:

$$ \text{cost} \approx n \cdot \text{cost}(f), \qquad \text{memory} \approx n \cdot \text{memory}(f). $$

This strategy is practical only for moderate $n$.
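A minimal sketch of this strategy in JAX, using `jax.vmap` to push all basis tangents through one vectorized pass (again with an illustrative `f`):

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

def jacobian_dense(f, x):
    n = x.shape[0]
    V = jnp.eye(n)                                   # seed matrix V = I_n
    jvp_in_v = lambda v: jax.jvp(f, (x,), (v,))[1]   # tangent output only
    return jax.vmap(jvp_in_v, out_axes=1)(V)         # all columns at once

x = jnp.array([1.0, 2.0, 3.0])
print(jacobian_dense(f, x))
```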
Arbitrary directional seeding
Sometimes only directional sensitivity is needed.
Suppose we want the directional derivative:

$$ \dot{y} = J_f(x)\, v \quad \text{for a specific direction } v. $$

Then we directly seed:

$$ \dot{x} = v. $$

This avoids constructing the full Jacobian.
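In JAX this is a single `jax.jvp` call; the direction `v` below is an arbitrary illustrative choice:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])
v = jnp.array([0.5, -1.0, 2.0])       # chosen perturbation direction

_, jv = jax.jvp(f, (x,), (v,))        # jv = J_f(x) v in one forward pass
print(jv)
```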
Applications include:
| Application | Direction meaning |
|---|---|
| sensitivity analysis | perturbation direction |
| optimization | search direction |
| PDE solvers | linearized update |
| continuation methods | parameter path |
| implicit differentiation | perturbation constraint |
Directional seeding is often the most efficient use of forward mode.
Random seeding
Random seeds are useful for probabilistic estimation and debugging.
Choose:

$$ v \sim \mathcal{N}(0, I). $$

Then compute:

$$ \dot{y} = J_f(x)\, v. $$
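One common use, sketched below, is derivative testing: compare the JVP in a random direction against a central finite difference (the function and tolerance are illustrative):

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])
v = jax.random.normal(jax.random.PRNGKey(0), x.shape)  # v ~ N(0, I)

_, jv = jax.jvp(f, (x,), (v,))                      # exact directional derivative
eps = 1e-3
fd = (f(x + eps * v) - f(x - eps * v)) / (2 * eps)  # central difference
print(jnp.max(jnp.abs(jv - fd)))                    # small if the JVP is correct
```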
Applications include:
| Use case | Goal |
|---|---|
| Jacobian norm estimation | approximate operator norm |
| randomized linear algebra | low-rank structure |
| derivative testing | detect implementation bugs |
| trace estimation | Hutchinson-type methods |
| sensitivity sampling | stochastic analysis |
Random directional derivatives often reveal structural problems without building the full Jacobian.
Structured seeding
Many Jacobians are sparse. Basis seeding wastes work because most derivative entries are zero.
Suppose:

$$ J_f(x) = \begin{bmatrix} * & 0 & * & 0 \\ 0 & * & 0 & * \\ * & 0 & 0 & 0 \end{bmatrix}. $$
Columns 1 and 2 never contribute to the same output row. Their seed directions can therefore share one tangent dimension.
Instead of four separate basis seeds:

$$ e_1, \quad e_2, \quad e_3, \quad e_4, $$

we may use compressed seeds:

$$ v_1 = e_1 + e_2, \qquad v_2 = e_3 + e_4. $$
Now two forward passes recover all columns.
This is the basis of compressed Jacobian computation.
Graph coloring
Efficient sparse seeding is usually formulated as a graph coloring problem.
Define the column interference graph:
- each Jacobian column is a node,
- two columns share an edge if they can affect the same output row.
Columns with no conflict may share a tangent direction.
Coloring assigns independent columns the same seed channel.
If the coloring uses $c$ colors, the Jacobian can be recovered in $c$ forward passes instead of $n$.

For sparse systems:

$$ c \ll n. $$
This can reduce derivative cost dramatically.
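A greedy coloring sketch over a boolean sparsity pattern `S` (rows are outputs, columns are inputs); `greedy_column_coloring` is a hypothetical helper written for illustration, not a library routine:

```python
import numpy as np

def greedy_column_coloring(S):
    m, n = S.shape
    # Columns interfere when they share at least one nonzero row.
    conflict = (S.T.astype(int) @ S.astype(int)) > 0
    colors = -np.ones(n, dtype=int)
    for j in range(n):
        used = {colors[k] for k in range(n) if conflict[j, k] and colors[k] >= 0}
        c = 0
        while c in used:           # smallest color not used by a neighbor
            c += 1
        colors[j] = c
    return colors

S = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 0, 0]], dtype=bool)
print(greedy_column_coloring(S))   # [0 0 1 1]: two colors, two passes
```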
Applications:
| Domain | Sparsity source |
|---|---|
| PDE discretization | local coupling |
| finite element methods | mesh locality |
| circuit simulation | component connectivity |
| robotics | chain topology |
| optimization | block constraints |
Example of compressed seeding
Suppose:

$$ J_f(x) = \begin{bmatrix} * & 0 & * \\ 0 & * & * \end{bmatrix}. $$

Columns 1 and 2 do not overlap in rows.

Use compressed seed:

$$ v_1 = e_1 + e_2. $$

Then:

$$ J_f(x)\, v_1 = (\text{column 1}) + (\text{column 2}), $$

and because the two columns occupy disjoint rows, both can be read off from the single tangent output.

One forward pass recovers both columns simultaneously.

Next seed:

$$ v_2 = e_3. $$

This gives:

$$ J_f(x)\, v_2 = \text{column 3}. $$
Only two passes are needed instead of three.
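The sketch below plays this out with an illustrative `f` whose Jacobian has the sparsity pattern above; the recovery logic is hand-written for this specific pattern:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.array([x[0] ** 2 + x[2], x[1] * x[2]])

x = jnp.array([1.0, 2.0, 3.0])

_, t1 = jax.jvp(f, (x,), (jnp.array([1.0, 1.0, 0.0]),))  # v1 = e1 + e2
_, t2 = jax.jvp(f, (x,), (jnp.array([0.0, 0.0, 1.0]),))  # v2 = e3

# Column 1 lives only in row 1 and column 2 only in row 2, so t1
# disentangles into both columns; t2 is column 3 directly.
col1 = jnp.array([t1[0], 0.0])
col2 = jnp.array([0.0, t1[1]])
col3 = t2
print(jnp.stack([col1, col2, col3], axis=1))  # matches jax.jacfwd(f)(x)
```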
Block seeding
Variables often appear naturally in groups.
Suppose the inputs decompose into blocks:

$$ x = \left( x^{(1)}, x^{(2)}, \dots, x^{(B)} \right). $$
Each block may correspond to:
- one physical subsystem,
- one tensor,
- one parameter group,
- one mesh region,
- one neural layer.
Block seeding assigns tangent dimensions per block rather than per scalar variable.
Advantages:
| Benefit | Explanation |
|---|---|
| better cache locality | contiguous tangent access |
| vectorized kernels | tensor-friendly layout |
| simpler memory management | fewer sparse structures |
| reduced overhead | coarser granularity |
This is common in tensor AD systems.
Tensor seeding
Machine learning systems usually work with tensors rather than scalar variables.
Suppose:

$$ X \in \mathbb{R}^{m \times n}. $$

Instead of seeding each scalar independently, a tangent tensor of the same shape is attached:

$$ \dot{X} \in \mathbb{R}^{m \times n}. $$
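A small sketch of tensor seeding with `jax.jvp`; the layer and its fixed weights are illustrative placeholders:

```python
import jax
import jax.numpy as jnp

def layer(X):
    W = jnp.ones((3, 2))                         # illustrative fixed weights
    return jnp.tanh(X @ W)

X = jnp.ones((4, 3))                             # primal tensor
X_dot = jnp.zeros_like(X).at[0, 0].set(1.0)      # tangent tensor seed

Y, Y_dot = jax.jvp(layer, (X,), (X_dot,))        # whole tangent tensor in one pass
print(Y_dot.shape)                               # same shape as Y: (4, 2)
```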
Tensor-level seeding allows:
- batched JVPs,
- hardware tensor acceleration,
- fused tangent kernels,
- layerwise differentiation.
Modern frameworks frequently optimize around tensor tangent propagation rather than scalar dual arithmetic.
Seeding for Hessian-vector products
Forward mode combines naturally with reverse mode.
Suppose:

$$ f : \mathbb{R}^n \to \mathbb{R}. $$

Its gradient is:

$$ g(x) = \nabla f(x). $$

Differentiate the gradient in direction $v$:

$$ \left. \frac{d}{d\alpha}\, g(x + \alpha v) \right|_{\alpha = 0} = H_f(x)\, v, $$

where $H_f(x)$ is the Hessian.
A common strategy:
- compute gradient with reverse mode,
- apply forward mode to the reverse computation.
The forward seed $v$ defines the Hessian-vector direction.
This avoids explicit Hessian construction.
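This forward-over-reverse composition is a standard JAX idiom (`jax.jvp` applied to `jax.grad`); the objective `f` below is illustrative:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(jnp.sin(x) * x ** 2)

def hvp(f, x, v):
    # Forward seed v applied to the gradient computation gives H_f(x) v.
    return jax.jvp(jax.grad(f), (x,), (v,))[1]

x = jnp.array([1.0, 2.0, 3.0])
v = jnp.array([1.0, 0.0, 0.0])
print(hvp(f, x, v))               # first column of the Hessian, since v = e1
```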
Seed matrices
General vector forward mode uses a seed matrix:

$$ V \in \mathbb{R}^{n \times k}. $$

Each column is one tangent direction.

The output is:

$$ \dot{Y} = J_f(x)\, V. $$
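A sketch of a general seed-matrix JVP: `jax.vmap` maps one forward pass over each column of `V` (the `jvp_matrix` helper is an assumption for illustration, not a framework API):

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

def jvp_matrix(f, x, V):
    push = lambda v: jax.jvp(f, (x,), (v,))[1]
    return jax.vmap(push, in_axes=1, out_axes=1)(V)  # one column per direction

x = jnp.array([1.0, 2.0, 3.0])
V = jnp.array([[1.0, 0.0],
               [1.0, 0.0],
               [0.0, 1.0]])        # two tangent directions
print(jvp_matrix(f, x, V))         # J_f(x) V, shape (2, 2)
```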
Choice of $V$ determines:
| Seed type | Result |
|---|---|
| identity | full Jacobian |
| basis subset | selected columns |
| random | stochastic directional probes |
| sparse compressed | sparse Jacobian recovery |
| low-rank | projected sensitivities |
| block structured | subsystem derivatives |
Thus seeding defines the derivative query algebraically.
Adaptive seeding
Some systems choose seeds dynamically.
Example strategies:
| Strategy | Goal |
|---|---|
| runtime sparsity discovery | reduce tangent dimensions |
| adaptive coloring | exploit observed dependencies |
| low-rank updates | reduce directional count |
| dynamic batching | improve GPU occupancy |
| selective propagation | avoid inactive paths |
Adaptive methods are important when derivative structure changes across executions.
Seed reuse
In iterative algorithms, the same derivative directions often appear repeatedly.
Examples:
| Algorithm | Reused direction |
|---|---|
| Newton-Krylov methods | Krylov basis vectors |
| continuation methods | path tangent |
| optimization | search direction |
| sensitivity analysis | parameter perturbations |
Caching or reusing tangent allocations can reduce overhead.
Some systems preallocate tangent buffers to avoid repeated memory allocation during many JVP evaluations.
Forward-over-forward seeding
Nested forward mode introduces hierarchical tangent structures.
Suppose forward mode is applied to a program that itself computes a JVP:

$$ g(x) = J_f(x)\, u. $$

Differentiating $g$ in a second direction $v$ yields second-order information. Now seeds exist at multiple derivative levels.
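A nested-`jax.jvp` sketch; for scalar `f` the result is the bilinear form $v^T H_f(x)\, u$:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x[0]) * x[1]

x = jnp.array([1.0, 2.0])
u = jnp.array([1.0, 0.0])                    # inner-level seed
v = jnp.array([0.0, 1.0])                    # outer-level seed

g = lambda x: jax.jvp(f, (x,), (u,))[1]      # g(x) = J_f(x) u
_, second = jax.jvp(g, (x,), (v,))           # derivative of g in direction v
print(second)                                # equals v^T H_f(x) u = cos(1.0)
```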
This enables:
| Nesting | Result |
|---|---|
| forward-over-forward | second derivatives |
| vector-forward-over-forward | Hessians |
| nested sparse tangents | structured higher-order tensors |
Seed management becomes increasingly important as derivative order grows.
Numerical conditioning
The seed direction also affects numerical behavior.
If the seed is badly scaled, for example:

$$ \|v\| \approx 10^{-12} \quad \text{or} \quad \|v\| \approx 10^{12}, $$

small floating point errors may dominate. Large or badly scaled seed vectors can amplify instability.
Good practice includes:
- scaling tangent directions,
- normalizing perturbation vectors,
- avoiding extremely sparse tiny magnitudes,
- using structured seeds aligned with problem geometry.
This becomes important in ill-conditioned scientific problems.
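As a small sketch of the first two practices: by linearity of the JVP, the seed can be normalized before propagation and the scale reapplied afterwards, keeping tangent magnitudes in a safe floating point range (the helper name is an assumption):

```python
import jax
import jax.numpy as jnp

def well_scaled_jvp(f, x, v, eps=1e-30):
    scale = jnp.linalg.norm(v)
    v_hat = v / jnp.maximum(scale, eps)   # propagate a unit-norm seed
    _, t = jax.jvp(f, (x,), (v_hat,))
    return scale * t                      # J_f(x) v, by linearity of the JVP

f = lambda x: jnp.sin(x) * x
x = jnp.array([1.0, 2.0])
print(well_scaled_jvp(f, x, jnp.array([1e-9, 2e-9])))
```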
Seeding and hardware layout
Efficient implementations often organize tangent dimensions to match hardware.
CPU implementations may use:
- SIMD lane packing,
- structure-of-arrays layouts,
- cache-aligned tangent blocks.
GPU implementations may use:
- batched tensor kernels,
- warp-aligned tangent dimensions,
- fused primal-tangent kernels.
The algebraic seed structure therefore interacts directly with low-level systems design.
Tradeoffs
Different seeding strategies optimize different goals.
| Strategy | Best for | Weakness |
|---|---|---|
| basis seeding | simplicity | many passes |
| dense identity | small full Jacobians | memory explosion |
| directional | sensitivity queries | incomplete information |
| sparse compressed | sparse systems | coloring overhead |
| block seeding | tensor programs | coarse granularity |
| random | stochastic estimation | approximate results |
No single strategy dominates all workloads.
Summary
The seed vector or seed matrix determines which derivative information forward mode computes. Efficient seeding strategies exploit structure in the derivative problem to reduce runtime, memory usage, and tangent dimensionality.
The simplest strategy uses basis vectors to recover Jacobian columns individually. More advanced methods use compressed sparse seeds, graph coloring, block tangents, tensor seeds, or randomized projections. In large-scale systems, the efficiency of forward mode depends as much on seeding strategy as on the differentiation rules themselves.