Higher-Dimensional Tangent Spaces

So far, forward mode has propagated a single tangent direction:

x \mapsto (x, \dot{x}),

where

\dot{x} \in \mathbb{R}.

This computes one directional derivative:

J_f(x)v.

However, many applications require several directional derivatives simultaneously. Instead of propagating one tangent scalar, we can propagate an entire tangent vector space.

Each variable becomes

x \mapsto (x, \dot{x}), \qquad \dot{x} \in \mathbb{R}^k.

Now every variable carries k tangent components at once.

The resulting computation produces

J_f(x)V,

where

V \in \mathbb{R}^{n \times k}

contains k seed directions.

This is called higher-dimensional forward mode, vector forward mode, or multidirectional forward mode.

Tangent spaces

For a function

f : \mathbb{R}^n \to \mathbb{R}^m,

the derivative at x is the linear map

J_f(x) : \mathbb{R}^n \to \mathbb{R}^m.

The vector space

\mathbb{R}^n

acts as the tangent space at x. A tangent vector represents an infinitesimal perturbation direction.

Scalar forward mode propagates one tangent vector:

v \in \mathbb{R}^n.

Higher-dimensional forward mode propagates a collection of tangent vectors simultaneously:

v_1, v_2, \ldots, v_k.

Equivalently, it propagates a tangent matrix

V = \begin{bmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_k \\ | & | & & | \end{bmatrix}.

The output is

J_f(x)V.

Each column of the result is one JVP.

From scalar tangents to vector tangents

In scalar forward mode:

x_i \mapsto (x_i, \dot{x}_i), \qquad \dot{x}_i \in \mathbb{R}.

In vector forward mode:

x_i \mapsto (x_i, \dot{x}_i), \qquad \dot{x}_i \in \mathbb{R}^k.

For addition:

z = x + y,

the tangent rule becomes

\dot{z} = \dot{x} + \dot{y},

where all tangents are vectors in \mathbb{R}^k.

For multiplication:

z = xy,

the tangent rule becomes

\dot{z} = y\dot{x} + x\dot{y}.

The scalars x and y multiply every tangent component.

Thus each primitive lifts naturally from scalar tangents to vector tangents.
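As a concrete sketch of this lifting (type and function names are illustrative, assuming tangents are stored as plain slices), the two rules above might look like:

```go
package main

import "fmt"

// Dual pairs a primal value with a k-dimensional tangent vector.
type Dual struct {
	Val float64
	Dot []float64 // one tangent component per seed direction
}

// Add implements z = x + y with tangent rule zdot = xdot + ydot.
func Add(x, y Dual) Dual {
	dot := make([]float64, len(x.Dot))
	for i := range dot {
		dot[i] = x.Dot[i] + y.Dot[i]
	}
	return Dual{x.Val + y.Val, dot}
}

// Mul implements z = x*y with tangent rule zdot = y*xdot + x*ydot.
func Mul(x, y Dual) Dual {
	dot := make([]float64, len(x.Dot))
	for i := range dot {
		dot[i] = y.Val*x.Dot[i] + x.Val*y.Dot[i]
	}
	return Dual{x.Val * y.Val, dot}
}

func main() {
	// k = 2 tangent directions, seeded with the standard basis.
	x := Dual{3, []float64{1, 0}}
	y := Dual{5, []float64{0, 1}}
	fmt.Println(Mul(x, y).Dot) // tangent of xy is [y, x]: [5 3]
	fmt.Println(Add(x, y).Dot) // tangent of x+y: [1 1]
}
```

Note that the primal arithmetic is unchanged; only the tangent slice is threaded through each primitive.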

Example: two tangent directions

Consider

f(x,y) = \begin{bmatrix} xy \\ x+y \end{bmatrix}.

Suppose we want derivatives in two directions:

v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.

These are the standard basis directions.

The tangent matrix is

V = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

Seed the inputs:

\dot{x} = [1,0], \qquad \dot{y} = [0,1].

Now propagate.

First output:

f_1 = xy.

Its tangent:

\dot{f}_1 = y\dot{x} + x\dot{y}.

Substitute:

\dot{f}_1 = [y, x].

Second output:

f_2 = x+y.

Its tangent:

\dot{f}_2 = \dot{x} + \dot{y} = [1,1].

Collect results:

J_f(x,y)V = \begin{bmatrix} y & x \\ 1 & 1 \end{bmatrix}.

Because the seed matrix was the identity, the output equals the full Jacobian.
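The hand propagation above can be checked numerically. A minimal slice-based sketch (the function name is illustrative) evaluating the example at x = 3, y = 5:

```go
package main

import "fmt"

// jacobianTimesSeed propagates the identity seed through
// f(x, y) = (x*y, x+y) and returns the two tangent rows.
func jacobianTimesSeed(x, y float64) ([2]float64, [2]float64) {
	xdot := [2]float64{1, 0} // seed row of V = I for x
	ydot := [2]float64{0, 1} // seed row of V = I for y
	var f1dot, f2dot [2]float64
	for i := 0; i < 2; i++ {
		f1dot[i] = y*xdot[i] + x*ydot[i] // tangent of f1 = x*y
		f2dot[i] = xdot[i] + ydot[i]     // tangent of f2 = x+y
	}
	return f1dot, f2dot
}

func main() {
	r1, r2 := jacobianTimesSeed(3, 5)
	fmt.Println(r1) // [5 3], i.e. [y, x]
	fmt.Println(r2) // [1 1]
}
```

The two printed rows are exactly the rows of the Jacobian at (3, 5).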

Full Jacobians in one pass

If

V = I_n,

then

J_f(x)V = J_f(x).

Thus vector forward mode can compute the full Jacobian in one pass.

However, every variable now carries an n-dimensional tangent vector. If the input dimension is large, this becomes expensive.

The memory cost becomes

O(nM_f),

and the arithmetic cost becomes

O(nC_f).

So this strategy is practical only when n is moderate or when the Jacobian has exploitable structure.

Matrix interpretation

Forward propagation with k-dimensional tangents can be viewed as propagating a local linear map.

Suppose a primitive operation has local Jacobian

A.

Instead of multiplying a vector,

Av,

we now multiply a matrix:

AV.

Each primitive therefore propagates several tangent directions simultaneously.

The entire computation graph becomes a sequence of matrix propagations:

V \mapsto A_1V \mapsto A_2A_1V \mapsto \cdots \mapsto J_f(x)V.

Scalar forward mode is the special case k=1.
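This chain of matrix propagations can be sketched with small dense matrices (a toy 2x2 example with made-up local Jacobians, not a full implementation):

```go
package main

import "fmt"

// matmul returns the product of two 2x2 matrices.
func matmul(a, b [2][2]float64) [2][2]float64 {
	var c [2][2]float64
	for i := 0; i < 2; i++ {
		for j := 0; j < 2; j++ {
			for l := 0; l < 2; l++ {
				c[i][j] += a[i][l] * b[l][j]
			}
		}
	}
	return c
}

func main() {
	// Local Jacobians of two chained primitives.
	a1 := [2][2]float64{{2, 0}, {0, 3}}
	a2 := [2][2]float64{{1, 1}, {0, 1}}
	// Identity seed: propagate V -> A1 V -> A2 A1 V.
	v := [2][2]float64{{1, 0}, {0, 1}}
	jv := matmul(a2, matmul(a1, v))
	fmt.Println(jv) // equals the composed Jacobian A2 A1
}
```

With the identity seed, the final matrix is the Jacobian of the whole chain, mirroring the full-Jacobian construction above.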

SIMD and batched execution

Vector forward mode maps naturally onto modern hardware.

If tangent vectors are packed into contiguous arrays, many tangent operations become vectorizable:

Operation                  SIMD behavior
addition                   vector add
multiplication             fused vector multiply-add
transcendental functions   batched evaluation
tensor primitives          batched kernels

For example, if

\dot{x} \in \mathbb{R}^4,

a CPU SIMD register may compute all four tangent components simultaneously.

On GPUs, tangent dimensions can often be batched across tensor operations.

Thus higher-dimensional tangent spaces may achieve better hardware utilization than repeated scalar forward passes.

Sparse tangent spaces

Large tangent vectors are often sparse.

Suppose each intermediate value depends on only a few inputs. Then many tangent components remain zero throughout the computation.

Example:

f(x_1,\ldots,x_n) = x_i x_j.

Only tangent components for x_i and x_j contribute.

Instead of storing dense tangent vectors, a sparse representation stores only nonzero entries:

type SparseTangent struct {
    Indices []int     // positions of the nonzero tangent components
    Values  []float64 // the corresponding nonzero values
}

This can reduce memory and arithmetic cost dramatically for sparse derivative structures.
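A hedged sketch of one lifted primitive on this representation: addition becomes a merge of the two index lists (assuming Indices are kept sorted; the function name is illustrative):

```go
package main

import "fmt"

type SparseTangent struct {
	Indices []int
	Values  []float64
}

// addSparse merges two sparse tangents with sorted index lists,
// summing values where an index appears in both.
func addSparse(x, y SparseTangent) SparseTangent {
	var z SparseTangent
	i, j := 0, 0
	for i < len(x.Indices) || j < len(y.Indices) {
		switch {
		case j >= len(y.Indices) || (i < len(x.Indices) && x.Indices[i] < y.Indices[j]):
			z.Indices = append(z.Indices, x.Indices[i])
			z.Values = append(z.Values, x.Values[i])
			i++
		case i >= len(x.Indices) || y.Indices[j] < x.Indices[i]:
			z.Indices = append(z.Indices, y.Indices[j])
			z.Values = append(z.Values, y.Values[j])
			j++
		default: // same index: tangent components add
			z.Indices = append(z.Indices, x.Indices[i])
			z.Values = append(z.Values, x.Values[i]+y.Values[j])
			i++
			j++
		}
	}
	return z
}

func main() {
	x := SparseTangent{Indices: []int{0, 7}, Values: []float64{1, 2}}
	y := SparseTangent{Indices: []int{7, 9}, Values: []float64{3, 4}}
	fmt.Println(addSparse(x, y)) // {[0 7 9] [1 5 4]}
}
```

Cost is proportional to the number of nonzeros rather than the full tangent dimension k.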

Sparse forward mode is especially important for:

  • sparse Jacobians,
  • PDE systems,
  • graph computations,
  • circuit simulation,
  • large optimization problems.

Block tangent propagation

Some systems use block tangents instead of individual tangent vectors.

Suppose variables are partitioned into blocks:

x = (x^{(1)}, x^{(2)}, \ldots).

Each block carries its own tangent subspace.

This gives block-Jacobian propagation:

J_f(x) = \begin{bmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{bmatrix}.

Block methods improve locality and reduce overhead when derivatives naturally cluster into subsystems.

Examples:

Domain         Natural blocks
robotics       joints or limbs
PDE solvers    spatial regions
graphics       object groups
optimization   parameter groups
databases      partitioned relations

Hyper-dual interpretation

Higher-dimensional tangents can also be expressed algebraically.

Scalar forward mode uses dual numbers:

x + \epsilon \dot{x}, \qquad \epsilon^2 = 0.

Vector forward mode introduces multiple nilpotent generators:

x + \sum_{i=1}^{k} \epsilon_i \dot{x}_i,

with

\epsilon_i^2 = 0, \qquad \epsilon_i\epsilon_j = 0.

Each generator corresponds to one tangent direction.

This algebra represents a first-order tangent space with k independent basis directions.

More advanced systems relax the cross-term condition and allow:

\epsilon_i\epsilon_j \ne 0.

Those structures lead to hyper-dual numbers and higher-order differentiation.

Tangent dimension explosion

A major limitation of vector forward mode is tangent growth.

If every variable carries an n-dimensional tangent vector, memory traffic can dominate runtime.

Suppose the primal program stores

10^8

floating point values.

If each value carries a tangent vector of dimension 1000, the tangent storage grows to 10^{11} values, roughly 800 GB at double precision.

This causes:

Problem              Effect
cache pressure       poor locality
memory bandwidth     bottleneck
register pressure    spilling
GPU occupancy loss   reduced parallel efficiency
tensor expansion     large intermediate allocations

Therefore large tangent dimensions require careful engineering.

Compression techniques

Several techniques reduce tangent overhead.

Directional batching

Instead of propagating all directions simultaneously, split them into batches:

V = [V_1 \mid V_2 \mid \cdots].

Each pass computes only a subset of tangent directions.
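A minimal sketch of batching, reusing the earlier example f(x, y) = (xy, x+y) with an analytic JVP (the helper name and batch size are illustrative):

```go
package main

import "fmt"

// jvp computes J_f(x)v analytically for f(x0, x1) = (x0*x1, x0+x1).
func jvp(x, v [2]float64) [2]float64 {
	return [2]float64{x[1]*v[0] + x[0]*v[1], v[0] + v[1]}
}

func main() {
	x := [2]float64{3, 5}
	// Four seed directions, processed in batches of two, so each
	// pass only ever carries two tangent components per variable.
	seeds := [][2]float64{{1, 0}, {0, 1}, {1, 1}, {2, -1}}
	const batch = 2
	for start := 0; start < len(seeds); start += batch {
		end := start + batch
		if end > len(seeds) {
			end = len(seeds)
		}
		for _, v := range seeds[start:end] {
			fmt.Println(jvp(x, v))
		}
	}
}
```

The trade-off is straightforward: smaller batches lower peak tangent storage but require more passes over the primal program.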

Sparse compression

Store only active tangent components.

Graph coloring

Exploit Jacobian sparsity to combine independent seed directions into fewer passes.

If two columns of the Jacobian never contribute to the same output row, they can share a seed vector.

This reduces the number of required tangent dimensions.
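For instance, if f(x)_i = x_i^2 the Jacobian is diagonal, so no two columns touch the same output row and all n seed directions can share the single combined seed v = (1, ..., 1). A sketch of recovering the diagonal in one pass (function name illustrative, sparsity pattern assumed known):

```go
package main

import "fmt"

// diagJVP computes the JVP of f(x)_i = x_i^2 with seed v.
func diagJVP(x, v []float64) []float64 {
	out := make([]float64, len(x))
	for i := range x {
		out[i] = 2 * x[i] * v[i] // tangent of x_i^2
	}
	return out
}

func main() {
	x := []float64{2, 3, 4}
	// One combined seed (1,1,1) replaces n basis seeds: each
	// entry of the JVP is exactly one Jacobian diagonal entry.
	fmt.Println(diagJVP(x, []float64{1, 1, 1})) // [4 6 8]
}
```

General sparsity patterns are handled the same way, with a graph coloring of the column-intersection graph deciding which columns may share a seed.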

Low-rank approximation

Some systems approximate tangent spaces using low-rank projections:

V \approx UV_r.

This is useful when sensitivities lie near a low-dimensional manifold.

Nested tangent spaces

Higher-dimensional tangent spaces compose naturally.

Suppose each tangent component is itself a dual number:

(x + \epsilon_1 a) + \epsilon_2(b + \epsilon_1 c).

This structure propagates higher-order derivatives.

Nested forward mode uses tangent spaces of tangent spaces.

Examples:

Nesting               Result
dual of dual          second derivatives
vector dual of dual   Hessian-vector products
nested vector duals   higher-order tensors

This compositional structure is one reason forward mode is mathematically elegant.
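A hedged sketch of the dual-of-dual case (type and function names are illustrative): nesting one scalar dual inside another and applying the product rule at both levels yields second derivatives.

```go
package main

import "fmt"

// Dual is a first-order dual number.
type Dual struct{ Val, Dot float64 }

// DDual nests duals: each component of the outer dual is itself a Dual,
// representing x + e1*a + e2*b + e1*e2*c with e1^2 = e2^2 = 0.
type DDual struct{ Val, Dot Dual }

// mul applies the product rule at both levels of nesting.
func mul(x, y DDual) DDual {
	return DDual{
		Val: Dual{
			x.Val.Val * y.Val.Val,
			x.Val.Val*y.Val.Dot + x.Val.Dot*y.Val.Val,
		},
		Dot: Dual{
			x.Val.Val*y.Dot.Val + x.Dot.Val*y.Val.Val,
			x.Val.Val*y.Dot.Dot + x.Val.Dot*y.Dot.Val +
				x.Dot.Val*y.Val.Dot + x.Dot.Dot*y.Val.Val,
		},
	}
}

func main() {
	// Lift x = 3, seeding both the inner and outer tangents to 1.
	x := DDual{Val: Dual{3, 1}, Dot: Dual{1, 0}}
	y := mul(mul(x, x), x) // y = x^3
	fmt.Println(y.Val.Val) // 27: x^3
	fmt.Println(y.Val.Dot) // 27: 3x^2, first derivative
	fmt.Println(y.Dot.Dot) // 18: 6x, second derivative
}
```

The e1*e2 coefficient carries the second derivative because each nesting level differentiates once.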

Tangent spaces on manifolds

In Euclidean space, tangents are ordinary vectors.

For manifolds, tangent spaces become geometric objects attached to points.

Example:

Manifold               Tangent space
sphere S^2             tangent plane
rotation group SO(3)   skew-symmetric matrices
probability simplex    constrained vectors

Forward mode generalizes naturally if primitives define how tangent vectors transform between manifolds.

This becomes important in:

  • robotics,
  • computer graphics,
  • geometric optimization,
  • physics simulation,
  • Lie-group dynamics.

Summary

Higher-dimensional tangent spaces generalize forward mode from one directional derivative to many simultaneous directional derivatives. Each variable carries a tangent vector rather than a scalar tangent. The resulting computation propagates a matrix of directions through the computation graph:

J_f(x)V.

This allows efficient batched JVPs, full Jacobian construction for moderate input dimensions, and exploitation of hardware vectorization and derivative sparsity. The main challenge is tangent dimension growth, which increases arithmetic cost, memory usage, and bandwidth pressure.