
Case Studies

Forward mode automatic differentiation appears in many numerical systems where directional derivatives, local sensitivities, or small parameter sets are important. This chapter examines concrete applications and analyzes why forward mode fits their computational structure.

The emphasis is not merely on using derivatives, but on how tangent propagation integrates with the execution model of each system.

Sensitivity Analysis in Scientific Computing

Many scientific simulations depend on a small set of physical parameters:

\theta = (\theta_1,\ldots,\theta_k).

The simulation output may be extremely large:

f : \mathbb{R}^k \to \mathbb{R}^m, \qquad m \gg k.

Examples:

| Simulation | Parameters |
| --- | --- |
| climate models | diffusion coefficients |
| fluid simulation | viscosity |
| orbital dynamics | gravitational constants |
| chemical systems | reaction rates |
| epidemiological models | infection parameters |

The key property is:

k \ll m.

Forward mode is efficient here because a single tangent pass computes the sensitivity of the entire output with respect to one parameter direction, so only k passes cover all parameters.

Example: ODE simulation

Consider the ODE:

\frac{dy}{dt} = \theta y, \qquad y(0)=1.

The solution is:

y(t)=e^{\theta t}.

We want the sensitivity with respect to \theta:

\frac{\partial y}{\partial \theta}.

Instead of symbolic differentiation, propagate tangents directly through the numerical integrator.

Seed:

\dot{\theta}=1.

Euler update:

y_{n+1}=y_n+\Delta t \,\theta y_n.

Tangent update:

\dot{y}_{n+1} = \dot{y}_n + \Delta t ( \dot{\theta}y_n + \theta\dot{y}_n ).

The tangent recurrence evolves alongside the primal recurrence.

This approach generalizes to very large ODE and PDE systems.
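
A minimal Python sketch of this primal-tangent recurrence (the parameter value, step size, and horizon below are illustrative choices):

```python
# Forward-mode sensitivity of dy/dt = theta * y through an explicit Euler loop.
import math

theta, theta_dot = 0.5, 1.0   # seed the tangent direction: d(theta) = 1
y, y_dot = 1.0, 0.0           # y(0) = 1 does not depend on theta, so its tangent is 0
dt, steps = 0.01, 100

for _ in range(steps):
    # The tangent update uses the *old* primal value y_n, so it comes first.
    y_dot = y_dot + dt * (theta_dot * y + theta * y_dot)
    y = y + dt * theta * y

t = dt * steps
print(y_dot, t * math.exp(theta * t))  # Euler sensitivity vs. analytic d/dtheta e^{theta t}
```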

Newton and Krylov Solvers

Many nonlinear solvers require Jacobian-vector products rather than explicit Jacobians.

Suppose:

F(x)=0.

Newton methods solve:

J_F(x)\Delta x = -F(x).

Krylov solvers such as GMRES often require only repeated evaluations of:

J_F(x)v.

Forward mode computes these products naturally.

Jacobian-free Newton-Krylov

Instead of forming the Jacobian explicitly:

  1. choose a direction v,
  2. seed the tangent \dot{x} = v,
  3. run forward mode,
  4. obtain \dot{F} = J_F(x)v.

This avoids explicit matrix construction entirely.
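
A sketch of the idea for a small, invented residual F: each primal statement is paired with its tangent statement, and the result of one tangent pass is exactly J_F(x)v:

```python
import numpy as np

def F_with_tangent(x, x_dot):
    # Illustrative residual F : R^2 -> R^2, evaluated together with its tangent.
    f0     = x[0]**2 + x[1] - 1.0
    f0_dot = 2.0 * x[0] * x_dot[0] + x_dot[1]
    f1     = x[0] - x[1]**2
    f1_dot = x_dot[0] - 2.0 * x[1] * x_dot[1]
    return np.array([f0, f1]), np.array([f0_dot, f1_dot])

x = np.array([0.7, 0.3])
v = np.array([1.0, -2.0])        # direction supplied by the Krylov iteration
_, Jv = F_with_tangent(x, v)     # J_F(x) v, with no Jacobian matrix ever formed
print(Jv)
```

A Krylov method such as GMRES can then be driven entirely by this matrix-free product.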

Benefits:

| Benefit | Explanation |
| --- | --- |
| lower memory | no Jacobian storage |
| matrix-free methods | operator-only access |
| easier parallelization | directional evaluations |
| sparse compatibility | local tangent propagation |

Large PDE solvers frequently use this structure.

Robotics and Kinematics

Robot kinematics naturally form chained transformations.

Suppose a robot arm has joint angles:

\theta_1,\ldots,\theta_n.

Forward kinematics computes end-effector position:

p=f(\theta).

The Jacobian:

J_f(\theta)

maps joint velocities into end-effector velocities.

Forward mode matches this physical interpretation directly.

Tangent interpretation

Seed:

\dot{\theta} = (\omega_1,\ldots,\omega_n).

Then:

\dot{p} = J_f(\theta)\dot{\theta}

is the resulting end-effector velocity.

The tangent is literally the instantaneous physical motion.
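
For a concrete sketch, take a planar two-link arm (link lengths, angles, and joint velocities below are made up); seeding the joint velocities and propagating tangents through the forward kinematics yields the end-effector velocity:

```python
import numpy as np

l1, l2 = 1.0, 0.7                      # illustrative link lengths

def fk_with_tangent(theta, theta_dot):
    # Primal: end-effector position of a two-link planar arm.
    a1, a2 = theta[0], theta[0] + theta[1]
    p = np.array([l1 * np.cos(a1) + l2 * np.cos(a2),
                  l1 * np.sin(a1) + l2 * np.sin(a2)])
    # Tangent: the same chain-rule steps applied to the seeded joint velocities.
    a1_dot, a2_dot = theta_dot[0], theta_dot[0] + theta_dot[1]
    p_dot = np.array([-l1 * np.sin(a1) * a1_dot - l2 * np.sin(a2) * a2_dot,
                       l1 * np.cos(a1) * a1_dot + l2 * np.cos(a2) * a2_dot])
    return p, p_dot

theta = np.array([0.4, 0.9])
omega = np.array([0.1, -0.2])          # joint velocities used as the tangent seed
p, p_dot = fk_with_tangent(theta, omega)
print(p, p_dot)                        # p_dot equals J_f(theta) @ omega
```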

Chain structure

Robot transforms compose sequentially:

T = T_1T_2\cdots T_n.

Forward tangent propagation becomes:

\dot{T} = \dot{T}_1T_2\cdots T_n + T_1\dot{T}_2\cdots T_n +\cdots.

This aligns naturally with forward-mode accumulation.

Computer Graphics and Rendering

Graphics systems frequently optimize low-dimensional parameters controlling large outputs.

Examples:

| Parameters | Outputs |
| --- | --- |
| camera pose | image pixels |
| lighting coefficients | rendered image |
| material properties | shading fields |
| skeletal joints | mesh deformation |

Again:

k \ll m.

Forward mode efficiently propagates parameter perturbations into rendered outputs.

Differentiable transformations

Suppose a mesh vertex undergoes an affine transformation:

p' = Rp+t.

Forward mode propagates tangents:

\dot{p}' = \dot{R}p + R\dot{p} + \dot{t}.

If only camera parameters vary, most scene data has zero tangent.

Sparse tangent propagation becomes highly effective.
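
A small sketch of this rule in which only the pose parameters carry nonzero tangents (all numerical values are placeholders):

```python
import numpy as np

# Primal values: a rotation, a translation, and one mesh vertex.
R = np.eye(3)
t = np.array([0.0, 0.0, 1.0])
p = np.array([1.0, 2.0, 3.0])

# Tangents: the camera/pose parameters vary, the scene vertex does not.
R_dot = np.array([[0.0, -0.1, 0.0],
                  [0.1,  0.0, 0.0],
                  [0.0,  0.0, 0.0]])   # derivative of R along one pose direction (illustrative)
t_dot = np.array([0.0, 0.0, 0.5])
p_dot = np.zeros(3)                    # static scene data: zero tangent

p_prime     = R @ p + t
p_prime_dot = R_dot @ p + R @ p_dot + t_dot   # product rule, term by term
print(p_prime, p_prime_dot)
```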

Optimization with Few Parameters

Some optimization problems have large outputs but few parameters.

Example:

f : \mathbb{R}^3 \to \mathbb{R}^{10^6}.

Suppose:

  • 3 calibration parameters,
  • one million measurements.

Forward mode computes the entire Jacobian in only three passes.

Reverse mode would instead require propagating adjoints from every output.

Forward mode is therefore preferable.
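
Sketched with an invented calibration model, the full Jacobian is assembled column by column from three seeded tangent passes:

```python
import numpy as np

m, k = 1_000_000, 3

def model_with_tangent(theta, theta_dot, x):
    # Hypothetical measurement model: y_i = theta0 * sin(theta1 * x_i) + theta2.
    y     = theta[0] * np.sin(theta[1] * x) + theta[2]
    y_dot = (theta_dot[0] * np.sin(theta[1] * x)
             + theta[0] * np.cos(theta[1] * x) * x * theta_dot[1]
             + theta_dot[2])
    return y, y_dot

x = np.linspace(0.0, 1.0, m)
theta = np.array([2.0, 3.0, 0.5])

J = np.empty((m, k))
for j in range(k):                    # one forward pass per parameter
    e_j = np.zeros(k)
    e_j[j] = 1.0                      # seed the j-th coordinate direction
    _, J[:, j] = model_with_tangent(theta, e_j, x)

print(J.shape)                        # (1000000, 3): the full Jacobian in k = 3 passes
```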

Applications:

| Domain | Parameters |
| --- | --- |
| camera calibration | lens coefficients |
| physical fitting | material constants |
| system identification | low-dimensional models |
| experimental tuning | control parameters |

Automatic Differentiation Inside Solvers

Many numerical algorithms are themselves iterative programs.

Example:

```python
for k in range(N):
    x = g(x)
```

Forward mode differentiates through the iteration directly.

Fixed-point iteration

Suppose:

x_{k+1}=g(x_k,\theta).

The tangent recurrence becomes:

\dot{x}_{k+1} = \frac{\partial g}{\partial x}\dot{x}_k + \frac{\partial g}{\partial \theta}\dot{\theta}.

This computes parameter sensitivity during convergence.
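
A sketch of this recurrence for a simple contraction g (chosen arbitrarily); the tangent is carried along while the iteration converges:

```python
theta, theta_dot = 2.0, 1.0        # seed: differentiate with respect to theta

def g(x, theta):
    return 0.5 * (x + theta / x)   # illustrative fixed-point map, x* = sqrt(theta)

x, x_dot = 1.0, 0.0
for _ in range(50):
    # Partial derivatives of g evaluated at the current iterate x_k.
    dg_dx     = 0.5 * (1.0 - theta / x**2)
    dg_dtheta = 0.5 / x
    x_dot = dg_dx * x_dot + dg_dtheta * theta_dot
    x     = g(x, theta)

print(x, x_dot)                    # x -> sqrt(theta), x_dot -> 1 / (2 * sqrt(theta))
```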

Applications:

| Algorithm | Sensitivity |
| --- | --- |
| nonlinear solvers | parameter dependence |
| iterative PDE solvers | coefficient variation |
| simulation loops | control perturbations |
| optimization iterations | hyperparameter effects |

Circuit Simulation

Electronic circuits naturally produce sparse derivative structures.

Each component interacts only locally.

Example resistor equation:

I=\frac{V}{R}.

Forward tangent:

\dot{I} = \frac{\dot{V}}{R} - \frac{V}{R^2}\dot{R}.
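
As a tiny sketch (values invented), seeding only the voltage gives the local sensitivity dI/dV:

```python
V, V_dot = 5.0, 1.0        # probe sensitivity to the voltage...
R, R_dot = 100.0, 0.0      # ...while holding the resistance fixed

I     = V / R
I_dot = V_dot / R - (V / R**2) * R_dot
print(I, I_dot)            # here I_dot equals dI/dV, since only V was seeded
```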

Large circuits contain millions of local equations. Sparse forward mode propagates only locally active derivatives.

Graph coloring and compressed seeding are heavily used in circuit simulation AD systems.

Computational Fluid Dynamics

Fluid simulations involve massive sparse systems.

Discretized Navier-Stokes equations often have stencil structure:

u_i^{t+1} = F(u_{i-1}^t,u_i^t,u_{i+1}^t).

Each update depends only on neighboring cells.

The Jacobian is sparse and structured.

Forward mode efficiently propagates sensitivities such as:

| Parameter | Meaning |
| --- | --- |
| viscosity | turbulence sensitivity |
| boundary conditions | flow response |
| forcing terms | pressure response |

Sparse seeding dramatically reduces tangent dimension.
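
A sketch of tangent propagation through a one-dimensional stencil update (a toy diffusion-like rule, not an actual Navier-Stokes discretization), seeding the viscosity-like coefficient:

```python
import numpy as np

n, steps = 64, 100
nu, nu_dot = 0.1, 1.0                  # seed the coefficient of interest

u     = np.sin(np.linspace(0.0, 2.0 * np.pi, n))   # illustrative initial field
u_dot = np.zeros(n)                                # initial field does not depend on nu

for _ in range(steps):
    lap     = np.roll(u, 1) - 2.0 * u + np.roll(u, -1)            # neighbor-only stencil
    lap_dot = np.roll(u_dot, 1) - 2.0 * u_dot + np.roll(u_dot, -1)
    u_dot = u_dot + nu_dot * lap + nu * lap_dot    # tangent of u_{t+1} = u_t + nu * lap(u_t)
    u     = u + nu * lap

print(u_dot[:4])                       # sensitivity of the field to the coefficient nu
```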

Neural Ordinary Differential Equations

Neural ODEs define dynamics:

\frac{dh}{dt}=f(h,t,\theta).

Forward mode propagates tangent dynamics:

\frac{d\dot{h}}{dt} = \frac{\partial f}{\partial h}\dot{h} + \frac{\partial f}{\partial \theta}\dot{\theta}.

This is called the variational equation.
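
A minimal sketch of integrating the variational equation alongside the state, for a scalar dynamics function invented for illustration:

```python
import math

# dh/dt = f(h, t, theta) = -theta * h   (illustrative scalar dynamics)
theta, theta_dot = 1.5, 1.0        # seed the parameter direction
h, h_dot = 1.0, 0.0                # h(0) is fixed, so its tangent starts at zero
dt, steps = 0.01, 100

for _ in range(steps):
    f     = -theta * h
    f_dot = -theta * h_dot - h * theta_dot   # (df/dh) h_dot + (df/dtheta) theta_dot
    h_dot += dt * f_dot
    h     += dt * f

t = dt * steps
print(h_dot, -t * math.exp(-theta * t))   # compare with analytic d/dtheta e^{-theta t}
```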

Forward mode is efficient when:

  • parameter dimension is small,
  • only selected sensitivities are needed,
  • directional perturbations matter more than full gradients.

Implicit Differentiation

Consider an implicitly defined system:

F(x,\theta)=0.

Differentiate:

\frac{\partial F}{\partial x}\dot{x} + \frac{\partial F}{\partial \theta}\dot{\theta} = 0.

Rearrange:

\dot{x} = - \left( \frac{\partial F}{\partial x} \right)^{-1} \frac{\partial F}{\partial \theta}\dot{\theta}.

Forward mode computes:

\frac{\partial F}{\partial \theta}\dot{\theta}

directly as a JVP.
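
A sketch for a small invented system: one tangent pass supplies (∂F/∂θ)θ̇, and a single linear solve with ∂F/∂x yields ẋ:

```python
import numpy as np

# Illustrative implicit system F(x, theta) = 0 with x, theta in R^2:
#   F1 = x0 + theta0 * x1 - 1
#   F2 = x1 + theta1 * x0**2

def dF_dx(x, theta):
    return np.array([[1.0,                   theta[0]],
                     [2.0 * theta[1] * x[0], 1.0     ]])

def dF_dtheta_times(x, theta, theta_dot):
    # Forward-mode JVP in the theta direction: (dF/dtheta) @ theta_dot.
    return np.array([x[1] * theta_dot[0],
                     x[0]**2 * theta_dot[1]])

theta = np.array([0.4, 0.2])
x     = np.array([1.0963, -0.2404])    # solves F(x, theta) = 0 (computed beforehand)
theta_dot = np.array([1.0, 0.0])       # probe the first parameter

rhs   = -dF_dtheta_times(x, theta, theta_dot)
x_dot = np.linalg.solve(dF_dx(x, theta), rhs)
print(x_dot)                           # directional sensitivity of the implicit solution
```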

This is fundamental in:

  • constrained optimization,
  • equilibrium models,
  • differentiable physics,
  • differentiable optimization layers.

Probabilistic Programming

Probabilistic systems often involve small parameter perturbations.

Suppose:

\log p(x|\theta)

depends on a moderate number of parameters.

Forward mode efficiently computes:

J_{\log p}(\theta)v.
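
A sketch with an i.i.d. Gaussian log-likelihood (data and parameter values are synthetic), where one tangent pass returns the directional derivative of log p:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # synthetic observations

def loglik_with_tangent(mu, sigma, mu_dot, sigma_dot):
    # log p(x | mu, sigma) up to an additive constant, with its forward tangent.
    r    = (x - mu) / sigma
    logp = -0.5 * np.sum(r**2) - x.size * np.log(sigma)
    logp_dot = (np.sum(r / sigma) * mu_dot
                + (np.sum(r**2) / sigma - x.size / sigma) * sigma_dot)
    return logp, logp_dot

logp, dlogp = loglik_with_tangent(2.0, 1.5, 1.0, 0.0)   # directional derivative along mu
print(logp, dlogp)
```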

Applications:

| Task | Use |
| --- | --- |
| Fisher information | directional curvature |
| sensitivity analysis | posterior perturbation |
| variational inference | local updates |
| uncertainty propagation | parameter effects |

Forward mode integrates well with sampling-based systems because tangent propagation follows the primal execution path.

Real-Time Systems

Forward mode has low memory overhead because it does not require a backward pass.

This is important in:

  • embedded systems,
  • robotics controllers,
  • streaming simulation,
  • online optimization,
  • real-time estimation.

Reverse mode often requires storing intermediate states. Forward mode can propagate tangents online during execution.

Streaming example

Suppose sensor updates arrive continuously:

```python
while True:
    state = update(state, sensor)
```

Forward mode updates tangents incrementally:

```python
state, tangent = update(state, tangent, sensor)
```

No global tape is required.
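
A runnable sketch of this pattern with a toy first-order filter (the filter, gain, and sensor stream are invented); the tangent tracks sensitivity to the gain with constant memory:

```python
import math

alpha, alpha_dot = 0.1, 1.0            # seed: sensitivity to the filter gain

def update(state, tangent, sensor):
    # Primal: exponential smoothing, state' = state + alpha * (sensor - state).
    new_state = state + alpha * (sensor - state)
    # Tangent: the same expression differentiated, using the old state and tangent.
    new_tangent = tangent + alpha_dot * (sensor - state) - alpha * tangent
    return new_state, new_tangent

state, tangent = 0.0, 0.0
for k in range(1000):                  # stands in for an endless sensor loop
    sensor = math.sin(0.01 * k)        # synthetic measurement
    state, tangent = update(state, tangent, sensor)

print(state, tangent)                  # no tape, no stored history
```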

Differentiable Databases

Some differentiable query systems propagate sensitivities through relational operations.

Suppose:

Q_\theta(D)

depends on tunable parameters.

Forward mode propagates tangent information through:

  • joins,
  • aggregations,
  • ranking functions,
  • retrieval scores.

Example:

s_i = \theta^\top x_i.

Tangent:

\dot{s}_i = \dot{\theta}^\top x_i.
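
A short sketch of this rule over a batch of items (feature vectors and parameters are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))                  # five items, four features
theta     = rng.normal(size=4)               # ranking parameters
theta_dot = np.array([1.0, 0.0, 0.0, 0.0])   # probe one parameter direction

scores     = X @ theta                       # s_i = theta^T x_i
scores_dot = X @ theta_dot                   # tangent of every score in one pass
print(scores, scores_dot)
```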

This enables:

  • sensitivity-aware ranking,
  • differentiable retrieval,
  • query optimization,
  • gradient-guided search.

Hyperparameter Sensitivity

Training systems often study sensitivity to hyperparameters:

| Hyperparameter | Example |
| --- | --- |
| learning rate | optimizer stability |
| regularization | generalization |
| scheduler constants | convergence speed |
| physical coefficients | simulation behavior |

Forward mode efficiently computes directional effects of small hyperparameter perturbations without recomputing the entire optimization process separately.

When Forward Mode Fails

Forward mode becomes inefficient when the input dimension is extremely large.

Example:

f : \mathbb{R}^{10^9} \to \mathbb{R}.

Computing the full gradient by forward mode requires one tangent pass per input direction, i.e. on the order of 10^9 forward passes.

This is why deep neural network training uses reverse mode.

Forward mode also struggles when:

  • tangent dimensions become dense,
  • memory bandwidth dominates,
  • tangent vectors exceed cache capacity,
  • derivative structure lacks locality.

Hybrid Systems

Modern AD systems often combine modes.

Examples:

| Combination | Use |
| --- | --- |
| forward-over-reverse | Hessian-vector products |
| reverse-over-forward | Jacobian rows |
| sparse-forward-over-reverse | structured second derivatives |
| block-forward + reverse | mixed tensor systems |

Forward mode is therefore rarely isolated. It acts as a building block inside larger differentiation systems.

Summary

Forward mode automatic differentiation is especially effective when:

  • the number of input directions is small,
  • directional sensitivities are sufficient,
  • Jacobians are sparse,
  • memory efficiency matters,
  • online propagation is needed.

Its natural computation is the Jacobian-vector product:

J_f(x)v.

This operator form appears throughout scientific computing, robotics, simulation, optimization, graphics, probabilistic systems, and differentiable infrastructure.