
Efficient Seeding Strategies

Forward mode automatic differentiation computes Jacobian-vector products:

$$ J_f(x)v. $$

The seed vector $v$ determines which derivative information propagates through the program. Choosing good seeds is therefore central to efficiency.

A naive approach computes one Jacobian column at a time using basis vectors:

$$ e_1, e_2, \ldots, e_n. $$

This always works, but it may waste computation when the Jacobian has structure. Efficient seeding strategies reduce the number of forward passes, reduce tangent storage, or improve hardware utilization.

Basis seeding

The simplest strategy uses standard basis vectors.

For

$$ f : \mathbb{R}^n \to \mathbb{R}^m, $$

seed:

$$ v = e_i. $$

Then forward mode computes:

$$ J_f(x)e_i, $$

which equals the $i$-th Jacobian column.

For example, if

$$ n = 3, $$

then:

$$ e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \qquad e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \qquad e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}. $$

The full Jacobian is assembled column-by-column:

$$ J_f(x) = \begin{bmatrix} J_f(x)e_1 & J_f(x)e_2 & J_f(x)e_3 \end{bmatrix}. $$

This method is simple and numerically stable. Its cost is proportional to the input dimension:

$$ O(nC_f), $$

where $C_f$ is the cost of one evaluation of $f$.
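
As a minimal sketch in JAX, with `f` and `x` as hypothetical stand-ins, basis seeding computes one Jacobian column per `jax.jvp` call:

```python
import jax
import jax.numpy as jnp

# Hypothetical example function f : R^3 -> R^2.
def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])
n = x.shape[0]

# One forward pass per basis vector: jvp with seed e_i yields column i.
cols = [jax.jvp(f, (x,), (jnp.eye(n)[i],))[1] for i in range(n)]
J = jnp.stack(cols, axis=1)  # Jacobian assembled column by column
```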

Dense identity seeding

Instead of one basis vector per pass, vector forward mode can seed all basis directions simultaneously.

Use:

$$ V = I_n. $$

Then:

$$ J_f(x)V = J_f(x). $$

This computes the entire Jacobian in one execution.

However, each variable now carries an $n$-dimensional tangent vector. The runtime and memory costs become

$$ O(nC_f), \qquad O(nM_f), $$

where $M_f$ is the memory required by one evaluation of $f$.

This strategy is practical only for moderate $n$.
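
A sketch of dense identity seeding in JAX (same stand-in function as above): mapping a JVP over the rows of $I_n$ propagates all tangent channels in one vectorized execution, which is essentially what `jax.jacfwd` does.

```python
import jax
import jax.numpy as jnp

f = lambda x: jnp.array([x[0] * x[1], jnp.sin(x[2])])  # stand-in function
x = jnp.array([1.0, 2.0, 3.0])
n = x.shape[0]

# All n tangent channels at once: vmap over the rows of the identity.
jvp_in = lambda v: jax.jvp(f, (x,), (v,))[1]
J = jax.vmap(jvp_in, out_axes=1)(jnp.eye(n))
# Same result as jax.jacfwd(f)(x), at O(n) tangent storage per variable.
```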

Arbitrary directional seeding

Sometimes only directional sensitivity is needed.

Suppose we want:

$$ J_f(x)v. $$

Then we directly seed:

$$ \dot{x} = v. $$

This avoids constructing the full Jacobian.

Applications include:

  • sensitivity analysis: perturbation direction
  • optimization: search direction
  • PDE solvers: linearized update
  • continuation methods: parameter path
  • implicit differentiation: perturbation constraint

Directional seeding is often the most efficient use of forward mode.
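
A directional query is a single JVP; the sketch below (hypothetical function and direction) never materializes the Jacobian.

```python
import jax
import jax.numpy as jnp

f = lambda x: jnp.array([x[0] * x[1], jnp.sin(x[2])])  # stand-in function
x = jnp.array([1.0, 2.0, 3.0])
v = jnp.array([0.5, -1.0, 2.0])  # e.g. a search direction (hypothetical)

# One forward pass computes J_f(x) v directly.
_, jv = jax.jvp(f, (x,), (v,))
```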

Random seeding

Random seeds are useful for probabilistic estimation and debugging.

Choose:

$$ v_i \sim \mathcal{N}(0,1). $$

Then compute:

$$ J_f(x)v. $$

Applications include:

  • Jacobian norm estimation: approximate the operator norm
  • randomized linear algebra: reveal low-rank structure
  • derivative testing: detect implementation bugs
  • trace estimation: Hutchinson-type methods
  • sensitivity sampling: stochastic analysis

Random directional derivatives often reveal structural problems without building the full Jacobian.
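
A common derivative test, sketched here with a hypothetical function: compare the AD directional derivative along a random probe against a central finite difference in the same direction.

```python
import jax
import jax.numpy as jnp

f = lambda x: jnp.array([x[0] * x[1], jnp.sin(x[2])])  # stand-in function
x = jnp.array([1.0, 2.0, 3.0])

v = jax.random.normal(jax.random.PRNGKey(0), x.shape)  # random probe
_, jv_ad = jax.jvp(f, (x,), (v,))

eps = 1e-4  # central finite difference along the same direction
jv_fd = (f(x + eps * v) - f(x - eps * v)) / (2 * eps)
assert jnp.allclose(jv_ad, jv_fd, atol=1e-3)  # a mismatch signals a bug
```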

Structured seeding

Many Jacobians are sparse. Basis seeding wastes work because most derivative entries are zero.

Suppose:

$$ J_f(x) = \begin{bmatrix} * & 0 & * & 0 \\ 0 & * & 0 & * \\ * & 0 & 0 & 0 \end{bmatrix}, $$

where $*$ marks a structurally nonzero entry.

Columns 1 and 2 never contribute to the same output row. Their seed directions can therefore share one tangent dimension.

Instead of:

$$ e_1, \quad e_2, \quad e_3, \quad e_4, $$

we may use compressed seeds:

$$ v_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}. $$

Now two forward passes recover all columns.

This is the basis of compressed Jacobian computation.

Graph coloring

Efficient sparse seeding is usually formulated as a graph coloring problem.

Define the column interference graph:

  • each Jacobian column is a node,
  • two columns share an edge if they can affect the same output row.

Columns with no conflict may share a tangent direction.

Coloring assigns independent columns the same seed channel.

If the coloring uses $k$ colors, the Jacobian can be recovered in $k$ forward passes instead of $n$.

For sparse systems:

$$ k \ll n. $$

This can reduce derivative cost dramatically.
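
As a sketch of the idea, here is a greedy coloring over a hypothetical boolean sparsity pattern `S`: two columns conflict if some row is nonzero in both, and each column takes the smallest color not used by a conflicting column.

```python
import numpy as np

def greedy_column_coloring(S):
    """Greedy coloring of the column interference graph of pattern S."""
    m, n = S.shape
    colors = [-1] * n
    for j in range(n):
        # Colors used by earlier columns that share a nonzero row with j.
        forbidden = {colors[i] for i in range(j)
                     if np.any(S[:, i] & S[:, j])}
        c = 0
        while c in forbidden:
            c += 1
        colors[j] = c
    return colors

# The 3x4 sparsity pattern from the structured seeding example.
S = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 0, 0]], dtype=bool)
print(greedy_column_coloring(S))  # [0, 0, 1, 1]: two seed channels suffice
```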

Applications:

  • PDE discretization: local coupling
  • finite element methods: mesh locality
  • circuit simulation: component connectivity
  • robotics: chain topology
  • optimization: block constraints

Example of compressed seeding

Suppose:

$$ J = \begin{bmatrix} a & 0 & c \\ 0 & b & 0 \end{bmatrix}. $$

Columns 1 and 2 do not overlap in rows.

Use compressed seed:

$$ v = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}. $$

Then:

$$ Jv = \begin{bmatrix} a \\ b \end{bmatrix}. $$

One forward pass recovers both columns simultaneously.

Next seed:

$$ v = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}. $$

This gives:

$$ \begin{bmatrix} c \\ 0 \end{bmatrix}. $$

Only two passes are needed instead of three.
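
A sketch of these two passes, using a hypothetical function that realizes the stated sparsity (with illustrative constants `a`, `b`, `c`):

```python
import jax
import jax.numpy as jnp

a, b, c = 2.0, 3.0, 5.0
g = lambda x: jnp.array([a * x[0] + c * x[2], b * x[1]])  # J = [[a,0,c],[0,b,0]]
x0 = jnp.zeros(3)

_, p1 = jax.jvp(g, (x0,), (jnp.array([1.0, 1.0, 0.0]),))  # [a, b]
_, p2 = jax.jvp(g, (x0,), (jnp.array([0.0, 0.0, 1.0]),))  # [c, 0]
# p1 holds columns 1 and 2 (in rows 1 and 2); p2 holds column 3.
```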

Block seeding

Variables often appear naturally in groups.

Suppose:

$$ x = (x^{(1)}, x^{(2)}, \ldots, x^{(k)}). $$

Each block may correspond to:

  • one physical subsystem,
  • one tensor,
  • one parameter group,
  • one mesh region,
  • one neural layer.

Block seeding assigns tangent dimensions per block rather than per scalar variable.

Advantages:

  • better cache locality: contiguous tangent access
  • vectorized kernels: tensor-friendly layout
  • simpler memory management: fewer sparse structures
  • reduced overhead: coarser granularity

This is common in tensor AD systems.

Tensor seeding

Machine learning systems usually work with tensors rather than scalar variables.

Suppose:

$$ W \in \mathbb{R}^{m \times n}. $$

Instead of seeding each scalar independently, a tangent tensor is attached:

$$ \dot{W} \in \mathbb{R}^{m \times n}. $$

Tensor-level seeding allows:

  • batched JVPs,
  • hardware tensor acceleration,
  • fused tangent kernels,
  • layerwise differentiation.

Modern frameworks frequently optimize around tensor tangent propagation rather than scalar dual arithmetic.
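
A sketch of tensor-level seeding in JAX, with a hypothetical single layer: the tangent is one matrix `W_dot` attached to `W`, not $mn$ scalar seeds.

```python
import jax
import jax.numpy as jnp

W = jax.random.normal(jax.random.PRNGKey(1), (4, 3))  # hypothetical weights
x_in = jnp.arange(3.0)

layer = lambda W: jnp.tanh(W @ x_in)         # differentiate w.r.t. W only
W_dot = jnp.ones_like(W)                     # tangent tensor attached to W
_, out_dot = jax.jvp(layer, (W,), (W_dot,))  # one fused primal-tangent pass
```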

Seeding for Hessian-vector products

Forward mode combines naturally with reverse mode.

Suppose:

$$ f : \mathbb{R}^n \to \mathbb{R}. $$

Its gradient is:

$$ \nabla f(x). $$

Differentiate the gradient in direction $v$:

$$ H_f(x)v, $$

where $H_f(x)$ is the Hessian.

A common strategy:

  1. compute gradient with reverse mode,
  2. apply forward mode to the reverse computation.

The forward seed defines the Hessian-vector direction.

This avoids explicit Hessian construction.
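
In JAX this strategy is a two-liner; the sketch below uses a simple test function whose Hessian is known.

```python
import jax
import jax.numpy as jnp

def hvp(f, x, v):
    # Reverse mode builds the gradient; forward mode differentiates it along v.
    return jax.jvp(jax.grad(f), (x,), (v,))[1]

f = lambda x: jnp.sum(x**4)                 # Hessian is diag(12 x^2)
x = jnp.array([1.0, 2.0])
print(hvp(f, x, jnp.array([1.0, 0.0])))     # [12., 0.]: first Hessian column
```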

Seed matrices

General vector forward mode uses a seed matrix:

$$ V \in \mathbb{R}^{n \times k}. $$

Each column is one tangent direction.

The output is:

$$ J_f(x)V. $$

Choice of $V$ determines the result:

  • identity: full Jacobian
  • basis subset: selected columns
  • random: stochastic directional probes
  • sparse compressed: sparse Jacobian recovery
  • low-rank: projected sensitivities
  • block structured: subsystem derivatives

Thus seeding defines the derivative query algebraically.
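
A sketch of a general seed matrix in JAX, reusing the stand-in function from earlier: each column of `V` is one tangent channel, propagated together.

```python
import jax
import jax.numpy as jnp

f = lambda x: jnp.array([x[0] * x[1], jnp.sin(x[2])])  # stand-in function
x = jnp.array([1.0, 2.0, 3.0])

V = jnp.array([[1.0, 0.0],   # n = 3 inputs, k = 2 tangent channels
               [1.0, 0.0],
               [0.0, 1.0]])
JV = jax.vmap(lambda v: jax.jvp(f, (x,), (v,))[1],
              in_axes=1, out_axes=1)(V)  # J_f(x) V, shape (m, k)
```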

Adaptive seeding

Some systems choose seeds dynamically.

Example strategies:

  • runtime sparsity discovery: reduce tangent dimensions
  • adaptive coloring: exploit observed dependencies
  • low-rank updates: reduce the directional count
  • dynamic batching: improve GPU occupancy
  • selective propagation: avoid inactive paths

Adaptive methods are important when derivative structure changes across executions.

Seed reuse

In iterative algorithms, the same derivative directions often appear repeatedly.

Examples:

  • Newton-Krylov methods: Krylov basis vectors
  • continuation methods: path tangent
  • optimization: search direction
  • sensitivity analysis: parameter perturbations

Caching or reusing tangent allocations can reduce overhead.

Some systems preallocate tangent buffers to avoid repeated memory allocation during many JVP evaluations.
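
One hedged example of reuse in JAX, with hypothetical names: jit-compile the JVP once and call it with the same direction across iterations, so repeated evaluations avoid retracing.

```python
import jax
import jax.numpy as jnp

f = lambda x: jnp.array([x[0] * x[1], jnp.sin(x[2])])    # stand-in function
jvp_f = jax.jit(lambda x, v: jax.jvp(f, (x,), (v,))[1])  # compiled once

x = jnp.array([1.0, 2.0, 3.0])
v = jnp.array([1.0, 0.0, 0.0])  # direction reused every iteration
for _ in range(5):
    jv = jvp_f(x, v)            # same compiled kernel each call
    x = x - 0.1 * jnp.ones(3)   # placeholder iterative update
```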

Forward-over-forward seeding

Nested forward mode introduces hierarchical tangent structures.

Suppose:

$$ x + \epsilon_1 a + \epsilon_2 b + \epsilon_1 \epsilon_2 c. $$

Now seeds exist at multiple derivative levels.

This enables:

  • forward-over-forward: second derivatives
  • vector-forward-over-forward: Hessians
  • nested sparse tangents: structured higher-order tensors

Seed management becomes increasingly important as derivative order grows.
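
A sketch of forward-over-forward in JAX, with a hypothetical scalar function: the inner JVP seeds direction $u$, and differentiating that result along $w$ yields the second-order quantity $u^\top H_f(x)\, w$.

```python
import jax
import jax.numpy as jnp

f = lambda x: jnp.prod(x)            # hypothetical scalar function
x = jnp.array([1.0, 2.0, 3.0])
u = jnp.array([1.0, 0.0, 0.0])
w = jnp.array([0.0, 1.0, 0.0])

df_u = lambda x: jax.jvp(f, (x,), (u,))[1]   # inner seed: x -> grad f(x) . u
_, d2 = jax.jvp(df_u, (x,), (w,))            # outer seed: u^T H_f(x) w
print(d2)                                    # 3.0 (= d^2 f / dx0 dx1 = x2)
```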

Numerical conditioning

The seed direction also affects numerical behavior.

If:

$$ J_f(x)v \approx 0, $$

small floating point errors may dominate.

Large or badly scaled seed vectors can amplify instability.

Good practice includes:

  • scaling tangent directions,
  • normalizing perturbation vectors (see the sketch below),
  • avoiding seeds whose nonzero entries have extremely small magnitudes,
  • using structured seeds aligned with problem geometry.

This becomes important in ill-conditioned scientific problems.
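
Because the JVP is linear in the tangent, one simple mitigation (a sketch, with a hypothetical badly scaled direction) is to propagate a normalized seed and rescale the result afterwards:

```python
import jax
import jax.numpy as jnp

f = lambda x: jnp.array([x[0] * x[1], jnp.sin(x[2])])  # stand-in function
x = jnp.array([1.0, 2.0, 3.0])

v_raw = jnp.array([1e-8, 0.0, 3e-8])             # badly scaled direction
scale = jnp.linalg.norm(v_raw)
_, jv_unit = jax.jvp(f, (x,), (v_raw / scale,))  # propagate at unit scale
jv = scale * jv_unit                             # linearity recovers J_f(x) v_raw
```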

Seeding and hardware layout

Efficient implementations often organize tangent dimensions to match hardware.

CPU implementations may use:

  • SIMD lane packing,
  • structure-of-arrays layouts,
  • cache-aligned tangent blocks.

GPU implementations may use:

  • batched tensor kernels,
  • warp-aligned tangent dimensions,
  • fused primal-tangent kernels.

The algebraic seed structure therefore interacts directly with low-level systems design.

Tradeoffs

Different seeding strategies optimize different goals.

  • basis seeding: simple, but requires many passes
  • dense identity: full Jacobians of small problems in one pass, but memory explodes with $n$
  • directional: efficient sensitivity queries, but incomplete information
  • sparse compressed: exploits sparse systems, but pays coloring overhead
  • block seeding: fits tensor programs, but coarse granularity
  • random: cheap stochastic estimation, but approximate results
No single strategy dominates all workloads.

Summary

The seed vector or seed matrix determines which derivative information forward mode computes. Efficient seeding strategies exploit structure in the derivative problem to reduce runtime, memory usage, and tangent dimensionality.

The simplest strategy uses basis vectors to recover Jacobian columns individually. More advanced methods use compressed sparse seeds, graph coloring, block tangents, tensor seeds, or randomized projections. In large-scale systems, the efficiency of forward mode depends as much on seeding strategy as on the differentiation rules themselves.