So far, forward mode has propagated a single tangent direction:

$$\dot{x} = v,$$

where $v \in \mathbb{R}^n$ is a seed vector. This computes one directional derivative:

$$\dot{y} = J_f(x)\, v.$$
However, many applications require several directional derivatives simultaneously. Instead of propagating one tangent scalar, we can propagate an entire tangent vector space.
Each variable becomes a pair

$$(u, \dot{U}), \qquad \dot{U} \in \mathbb{R}^k.$$

Now every variable carries $k$ tangent components at once. The resulting computation produces

$$\dot{Y} = J_f(x)\, V,$$

where

$$V = [\, v_1 \mid v_2 \mid \cdots \mid v_k \,] \in \mathbb{R}^{n \times k}$$

contains $k$ seed directions as columns.
This is called higher-dimensional forward mode, vector forward mode, or multidirectional forward mode.
Tangent spaces
For a function

$$f : \mathbb{R}^n \to \mathbb{R}^m,$$

the derivative at $x$ is the linear map

$$J_f(x) : \mathbb{R}^n \to \mathbb{R}^m.$$

The vector space

$$T_x\mathbb{R}^n \cong \mathbb{R}^n$$

acts as the tangent space at $x$. A tangent vector represents an infinitesimal perturbation direction.
Scalar forward mode propagates one tangent vector:

$$v \longmapsto J_f(x)\, v.$$

Higher-dimensional forward mode propagates a collection of tangent vectors simultaneously:

$$v_1, \dots, v_k \longmapsto J_f(x)\, v_1, \dots, J_f(x)\, v_k.$$

Equivalently, it propagates a tangent matrix

$$V = [\, v_1 \mid \cdots \mid v_k \,] \in \mathbb{R}^{n \times k}.$$

The output is

$$J_f(x)\, V \in \mathbb{R}^{m \times k}.$$
Each column of the result is one JVP.
From scalar tangents to vector tangents
In scalar forward mode, each intermediate value carries a scalar tangent:

$$(u, \dot{u}), \qquad \dot{u} \in \mathbb{R}.$$

In vector forward mode, it carries a tangent vector:

$$(u, \dot{U}), \qquad \dot{U} \in \mathbb{R}^k.$$
For addition:

$$w = u + v,$$

the tangent rule becomes

$$\dot{W} = \dot{U} + \dot{V},$$

where all tangents are vectors in $\mathbb{R}^k$.
For multiplication:

$$w = u \cdot v,$$

the tangent rule becomes

$$\dot{W} = v\, \dot{U} + u\, \dot{V}.$$

The scalars $u$ and $v$ multiply every tangent component.
Thus each primitive lifts naturally from scalar tangents to vector tangents.
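These lifted primitives can be sketched in Go; the `VecDual` type and the `Add`/`Mul` helpers below are illustrative names, not part of any particular AD library:

```go
package main

import "fmt"

// VecDual is a variable in vector forward mode: a primal value plus a
// k-dimensional tangent vector (one component per seed direction).
type VecDual struct {
	Val float64
	Tan []float64
}

// Add lifts w = u + v: tangents add componentwise.
func Add(u, v VecDual) VecDual {
	t := make([]float64, len(u.Tan))
	for i := range t {
		t[i] = u.Tan[i] + v.Tan[i]
	}
	return VecDual{u.Val + v.Val, t}
}

// Mul lifts w = u * v: the scalars u and v scale every tangent component.
func Mul(u, v VecDual) VecDual {
	t := make([]float64, len(u.Tan))
	for i := range t {
		t[i] = v.Val*u.Tan[i] + u.Val*v.Tan[i]
	}
	return VecDual{u.Val * v.Val, t}
}

func main() {
	// Two inputs seeded with the standard basis directions (k = 2).
	x := VecDual{2, []float64{1, 0}}
	y := VecDual{3, []float64{0, 1}}
	w := Mul(x, y)            // w = x*y, gradient (y, x)
	fmt.Println(w.Val, w.Tan) // 6 [3 2]
}
```

Note that each primitive loops over all $k$ tangent components, which is exactly where the factor-of-$k$ cost of vector forward mode comes from.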
Example: two tangent directions
Consider

$$f(x_1, x_2) = (x_1 x_2,\; \sin x_1).$$

Suppose we want derivatives in two directions:

$$v_1 = (1, 0)^\top, \qquad v_2 = (0, 1)^\top.$$

These are the standard basis directions. The tangent matrix is

$$V = [\, v_1 \mid v_2 \,] = I_2.$$

Seed the inputs:

$$\dot{X}_1 = (1, 0), \qquad \dot{X}_2 = (0, 1).$$

Now propagate.

First output:

$$y_1 = x_1 x_2.$$

Its tangent:

$$\dot{Y}_1 = x_2\, \dot{X}_1 + x_1\, \dot{X}_2.$$

Substitute:

$$\dot{Y}_1 = x_2\,(1, 0) + x_1\,(0, 1) = (x_2,\; x_1).$$

Second output:

$$y_2 = \sin x_1.$$

Its tangent:

$$\dot{Y}_2 = \cos(x_1)\, \dot{X}_1 = (\cos x_1,\; 0).$$

Collect results:

$$\dot{Y} = \begin{pmatrix} x_2 & x_1 \\ \cos x_1 & 0 \end{pmatrix}.$$
Because the seed matrix was the identity, the output equals the full Jacobian.
Full Jacobians in one pass
If

$$V = I_n,$$

then

$$\dot{Y} = J_f(x)\, I_n = J_f(x).$$
Thus vector forward mode can compute the full Jacobian in one pass.
However, every variable now carries an $n$-dimensional tangent vector. If the input dimension $n$ is large, this becomes expensive. The memory cost becomes

$$O(n) \ \text{per stored intermediate value},$$

and the arithmetic cost becomes

$$O(n \cdot \mathrm{cost}(f)),$$

roughly $n$ times the cost of the primal evaluation. So this strategy is practical only when $n$ is moderate or when the Jacobian has exploitable structure.
Matrix interpretation
Forward propagation with $k$-dimensional tangents can be viewed as propagating a local linear map. Suppose a primitive operation has local Jacobian

$$J_{\mathrm{op}} \in \mathbb{R}^{p \times q}.$$

Instead of multiplying a vector,

$$\dot{w} = J_{\mathrm{op}}\, \dot{u},$$

we now multiply a matrix:

$$\dot{W} = J_{\mathrm{op}}\, \dot{U}, \qquad \dot{U} \in \mathbb{R}^{q \times k}.$$

Each primitive therefore propagates $k$ tangent directions simultaneously. The entire computation graph becomes a sequence of matrix propagations:

$$\dot{Y} = J_L \cdots J_2\, J_1\, V.$$

Scalar forward mode is the special case $k = 1$.
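The chain of matrix propagations can be sketched in Go with a plain dense matrix multiply; the two local Jacobians below are arbitrary illustrative values, standing in for whatever the primitives of a real program produce:

```go
package main

import "fmt"

// matMul returns the product A·B of dense row-major matrices.
func matMul(A, B [][]float64) [][]float64 {
	n, k, m := len(A), len(B[0]), len(B)
	C := make([][]float64, n)
	for i := range C {
		C[i] = make([]float64, k)
		for j := 0; j < k; j++ {
			for l := 0; l < m; l++ {
				C[i][j] += A[i][l] * B[l][j]
			}
		}
	}
	return C
}

func main() {
	// Two local Jacobians and an identity seed matrix (k = n = 2).
	J1 := [][]float64{{1, 2}, {0, 1}}
	J2 := [][]float64{{3, 0}, {1, 1}}
	V := [][]float64{{1, 0}, {0, 1}}

	// Propagating V through the chain computes J2·J1·V column by column.
	Y := matMul(J2, matMul(J1, V))
	fmt.Println(Y) // [[3 6] [1 3]]
}
```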
SIMD and batched execution
Vector forward mode maps naturally onto modern hardware.
If tangent vectors are packed into contiguous arrays, many tangent operations become vectorizable:
| Operation | SIMD behavior |
|---|---|
| addition | vector add |
| multiplication | fused vector multiply-add |
| transcendental functions | batched evaluation |
| tensor primitives | batched kernels |
For example, if $k = 4$,
a CPU SIMD register may compute all four tangent components simultaneously.
On GPUs, tangent dimensions can often be batched across tensor operations.
Thus higher-dimensional tangent spaces may achieve better hardware utilization than repeated scalar forward passes.
Sparse tangent spaces
Large tangent vectors are often sparse.
Suppose each operation in a function depends on only a few of its inputs. Many tangent components then remain zero throughout the computation.
Example:

$$y = x_3 \cdot x_7.$$

Only the tangent components for $x_3$ and $x_7$ contribute.
Instead of storing dense tangent vectors, a sparse representation stores only nonzero entries:
```go
type SparseTangent struct {
	Indices []int     // positions of the nonzero tangent components
	Values  []float64 // the corresponding tangent values
}
```

This can reduce memory and arithmetic cost dramatically for sparse derivative structures.
Sparse forward mode is especially important for:
- sparse Jacobians,
- PDE systems,
- graph computations,
- circuit simulation,
- large optimization problems.
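Addition on this representation reduces to merging the two index lists; a minimal sketch, assuming the indices are kept sorted (the `AddSparse` helper is hypothetical):

```go
package main

import "fmt"

// SparseTangent stores only the nonzero tangent components.
type SparseTangent struct {
	Indices []int
	Values  []float64
}

// AddSparse merges two sparse tangents with sorted, ascending indices.
func AddSparse(a, b SparseTangent) SparseTangent {
	var out SparseTangent
	i, j := 0, 0
	for i < len(a.Indices) || j < len(b.Indices) {
		switch {
		case j >= len(b.Indices) || (i < len(a.Indices) && a.Indices[i] < b.Indices[j]):
			// Component present only in a.
			out.Indices = append(out.Indices, a.Indices[i])
			out.Values = append(out.Values, a.Values[i])
			i++
		case i >= len(a.Indices) || b.Indices[j] < a.Indices[i]:
			// Component present only in b.
			out.Indices = append(out.Indices, b.Indices[j])
			out.Values = append(out.Values, b.Values[j])
			j++
		default:
			// Same index: the tangent components add.
			out.Indices = append(out.Indices, a.Indices[i])
			out.Values = append(out.Values, a.Values[i]+b.Values[j])
			i++
			j++
		}
	}
	return out
}

func main() {
	a := SparseTangent{[]int{3, 7}, []float64{1, 2}}
	b := SparseTangent{[]int{7, 9}, []float64{5, 4}}
	fmt.Println(AddSparse(a, b)) // {[3 7 9] [1 7 4]}
}
```

The cost is proportional to the number of nonzero components rather than the full tangent dimension.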
Block tangent propagation
Some systems use block tangents instead of individual tangent vectors.
Suppose variables are partitioned into blocks:

$$x = (x^{(1)}, x^{(2)}, \dots, x^{(B)}).$$

Each block carries its own tangent subspace. This gives block-Jacobian propagation:

$$\dot{W}^{(i)} = \sum_j J_{ij}\, \dot{U}^{(j)}.$$
Block methods improve locality and reduce overhead when derivatives naturally cluster into subsystems.
Examples:
| Domain | Natural blocks |
|---|---|
| robotics | joints or limbs |
| PDE solvers | spatial regions |
| graphics | object groups |
| optimization | parameter groups |
| databases | partitioned relations |
Hyper-dual interpretation
Higher-dimensional tangents can also be expressed algebraically.
Scalar forward mode uses dual numbers:

$$a + b\,\varepsilon, \qquad \varepsilon^2 = 0.$$

Vector forward mode introduces multiple nilpotent generators:

$$a + b_1\,\varepsilon_1 + b_2\,\varepsilon_2 + \cdots + b_k\,\varepsilon_k,$$

with

$$\varepsilon_i\,\varepsilon_j = 0 \quad \text{for all } i, j.$$
Each generator corresponds to one tangent direction.
This algebra represents a first-order tangent space with independent basis directions.
More advanced systems relax the cross-term condition and allow:

$$\varepsilon_i\,\varepsilon_j \neq 0 \quad \text{for } i \neq j.$$
Those structures lead to hyper-dual numbers and higher-order differentiation.
Tangent dimension explosion
A major limitation of vector forward mode is tangent growth.
If every variable carries an $n$-dimensional tangent vector, memory traffic can dominate runtime.
Suppose the primal program stores $N$ floating point values. If each value carries a tangent vector of dimension $n$, the tangent storage grows to $nN$ values, which can become enormous.
This causes:
| Problem | Effect |
|---|---|
| cache pressure | poor locality |
| memory bandwidth | bottleneck |
| register pressure | spilling |
| GPU occupancy loss | reduced parallel efficiency |
| tensor expansion | large intermediate allocations |
Therefore large tangent dimensions require careful engineering.
Compression techniques
Several techniques reduce tangent overhead.
Directional batching
Instead of propagating all $n$ directions simultaneously, split them into batches:

$$V = [\, V_1 \mid V_2 \mid \cdots \mid V_B \,], \qquad V_b \in \mathbb{R}^{n \times k_b}.$$

Each pass computes only a subset of tangent directions.
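A minimal Go sketch of directional batching; the `jvp` helper hard-codes the directional derivative of an assumed toy function $f(x) = (x_1 x_2,\; x_1 + x_2)$ as a stand-in for a full forward pass:

```go
package main

import "fmt"

// jvp computes one Jacobian-vector product for the toy function
// f(x) = (x1*x2, x1+x2), standing in for an arbitrary forward pass.
func jvp(x, v [2]float64) [2]float64 {
	return [2]float64{
		x[1]*v[0] + x[0]*v[1], // tangent of x1*x2
		v[0] + v[1],           // tangent of x1+x2
	}
}

func main() {
	x := [2]float64{2, 3}

	// All seed directions we want, processed in batches of size k.
	seeds := [][2]float64{{1, 0}, {0, 1}}
	k := 1
	for start := 0; start < len(seeds); start += k {
		end := start + k
		if end > len(seeds) {
			end = len(seeds)
		}
		// Each pass handles only one batch of tangent directions,
		// trading extra passes for lower per-pass memory.
		for _, v := range seeds[start:end] {
			fmt.Println(jvp(x, v))
		}
	}
}
```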
Sparse compression
Store only active tangent components.
Graph coloring
Exploit Jacobian sparsity to combine independent seed directions into fewer passes.
If two columns of the Jacobian never contribute to the same output row, they can share a seed vector.
This reduces the number of required tangent dimensions.
Low-rank approximation
Some systems approximate tangent spaces using low-rank projections:

$$V \approx P\,C, \qquad P \in \mathbb{R}^{n \times r},\quad C \in \mathbb{R}^{r \times k},\quad r \ll k.$$

This is useful when sensitivities lie near a low-dimensional manifold.
Nested tangent spaces
Higher-dimensional tangent spaces compose naturally.
Suppose each tangent component is itself a dual number:

$$(a + b\,\varepsilon_2) + (c + d\,\varepsilon_2)\,\varepsilon_1, \qquad \varepsilon_1^2 = \varepsilon_2^2 = 0.$$
This structure propagates higher-order derivatives.
Nested forward mode uses tangent spaces of tangent spaces.
Examples:
| Nesting | Result |
|---|---|
| dual of dual | second derivatives |
| vector dual of dual | Hessian-vector products |
| nested vector duals | higher-order tensors |
This compositional structure is one reason forward mode is mathematically elegant.
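The dual-of-dual case can be sketched in Go; the `Dual` and `Dual2` types are illustrative. Seeding both nesting levels recovers $f$, $f'$, and $f''$ for $f(x) = x^3$:

```go
package main

import "fmt"

// Dual is a first-order dual number a + b·ε.
type Dual struct{ A, B float64 }

func dAdd(x, y Dual) Dual { return Dual{x.A + y.A, x.B + y.B} }
func dMul(x, y Dual) Dual { return Dual{x.A * y.A, x.A*y.B + x.B*y.A} }

// Dual2 nests duals: each component of the outer dual is itself a Dual,
// so one pass propagates first and second derivatives.
type Dual2 struct{ A, B Dual }

func d2Mul(x, y Dual2) Dual2 {
	return Dual2{dMul(x.A, y.A), dAdd(dMul(x.A, y.B), dMul(x.B, y.A))}
}

func main() {
	// f(x) = x*x*x at x = 2, seeded with 1 in both nesting levels.
	x := Dual2{Dual{2, 1}, Dual{1, 0}}
	y := d2Mul(d2Mul(x, x), x)

	fmt.Println(y.A.A) // f(2)   = 8
	fmt.Println(y.B.A) // f'(2)  = 12
	fmt.Println(y.B.B) // f''(2) = 12
}
```

The second derivative appears in the "tangent of the tangent" slot, exactly as the table above suggests.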
Tangent spaces on manifolds
In Euclidean space, tangents are ordinary vectors.
For manifolds, tangent spaces become geometric objects attached to points.
Example:
| Manifold | Tangent space |
|---|---|
| sphere | tangent plane |
| rotation group | skew-symmetric matrices |
| probability simplex | constrained vectors |
Forward mode generalizes naturally if primitives define how tangent vectors transform between manifolds.
This becomes important in:
- robotics,
- computer graphics,
- geometric optimization,
- physics simulation,
- Lie-group dynamics.
Summary
Higher-dimensional tangent spaces generalize forward mode from one directional derivative to many simultaneous directional derivatives. Each variable carries a tangent vector rather than a scalar tangent. The resulting computation propagates a matrix of directions through the computation graph:

$$\dot{Y} = J_f(x)\, V.$$
This allows efficient batched JVPs, full Jacobian construction for moderate input dimensions, and exploitation of hardware vectorization and derivative sparsity. The main challenge is tangent dimension growth, which increases arithmetic cost, memory usage, and bandwidth pressure.