Sensitivity Analysis

Sensitivity analysis studies how changes in inputs affect the outputs of a system. In differential equations, optimization, simulation, and machine learning, the main object is usually a function

y = f(x,\theta),

where x is an input or initial condition and θ is a parameter vector. The central question is:

How does y change when x or θ changes?

Automatic differentiation provides a systematic way to compute these sensitivities to machine precision.

Sensitivity analysis appears in nearly every computational science domain:

| Domain | Typical sensitivity question |
| --- | --- |
| Physics | How does material stiffness affect deformation? |
| Climate modeling | How sensitive is temperature to emission parameters? |
| Finance | How does option price change with volatility? |
| Pharmacology | How does dosage affect concentration dynamics? |
| Machine learning | How does loss change with model parameters? |
| Robotics | How do control parameters affect trajectories? |

The derivative itself becomes an object of scientific interpretation.

Local Sensitivity

The simplest form is local sensitivity. Suppose

y = f(\theta).

A small perturbation dθ produces a first-order change

dy \approx J\, d\theta,

where

J = \frac{\partial f}{\partial \theta}

is the Jacobian matrix.

If f : \mathbb{R}^m \to \mathbb{R}^n, then

J \in \mathbb{R}^{n \times m}.

Each entry measures a directional influence:

J_{ij} = \frac{\partial y_i}{\partial \theta_j}.

Large magnitude means the output is sensitive to that parameter. Small magnitude means the output changes little under perturbation.

Forward-mode AD computes columns of J. Reverse-mode AD computes rows of J (equivalently, columns of J^\top).
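As an illustration (a minimal sketch, with an assumed toy function f), the full Jacobian can be assembled either from forward-mode JVPs, one column per input direction, or from reverse-mode VJPs, one row per output direction:

```python
import jax
import jax.numpy as jnp

def f(theta):                                  # toy map f : R^3 -> R^2
    return jnp.array([theta[0] * theta[1], jnp.sin(theta[2])])

theta = jnp.array([1.0, 2.0, 3.0])

# Forward mode: one JVP per input direction yields one column of J.
cols = [jax.jvp(f, (theta,), (e,))[1] for e in jnp.eye(3)]
J_forward = jnp.stack(cols, axis=1)            # shape (2, 3)

# Reverse mode: one VJP per output direction yields one row of J.
_, vjp_f = jax.vjp(f, theta)
rows = [vjp_f(e)[0] for e in jnp.eye(2)]
J_reverse = jnp.stack(rows, axis=0)            # shape (2, 3)

print(jnp.allclose(J_forward, J_reverse))      # True
```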

Sensitivity as Linearization

Sensitivity analysis is fundamentally a linear approximation problem.

Around a reference parameter θ₀,

f(\theta_0 + \delta\theta) \approx f(\theta_0) + J(\theta_0)\,\delta\theta.

The Jacobian acts as the local linear model of the system.
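A minimal sketch of this linearization, using an assumed two-parameter toy function and JAX's linearize:

```python
import jax
import jax.numpy as jnp

def f(theta):                                   # illustrative toy model
    return jnp.array([jnp.exp(theta[0]) * theta[1], theta[0] - theta[1] ** 2])

theta0 = jnp.array([0.5, -1.0])
delta = jnp.array([1e-3, 2e-3])

y0, jvp_at_theta0 = jax.linearize(f, theta0)    # y0 = f(θ0); δθ ↦ J(θ0) δθ
print(y0 + jvp_at_theta0(delta))                # local linear prediction
print(f(theta0 + delta))                        # agrees to first order
```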

This viewpoint is important because many large systems are only tractable through local approximations. The full nonlinear model may be expensive, discontinuous, or analytically inaccessible. The linearized sensitivity often provides enough information for:

  • optimization,
  • uncertainty propagation,
  • control design,
  • stability analysis,
  • parameter fitting,
  • experimental design.

Forward Sensitivity Propagation

Suppose a program computes

v_{k+1} = \phi_k(v_k, \theta).

Forward sensitivity propagation augments each variable with its derivative:

S_k = \frac{\partial v_k}{\partial \theta}.

Applying the chain rule gives

S_{k+1} = \frac{\partial \phi_k}{\partial v_k} S_k + \frac{\partial \phi_k}{\partial \theta}.

This is the computational foundation of forward-mode AD.

Conceptually:

| Quantity | Meaning |
| --- | --- |
| v_k | Program state |
| S_k | Sensitivity of the state |
| Jacobian multiplication | Propagation rule |

The sensitivity evolves alongside the state itself.
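The recurrence can be written out directly. A sketch, using a hypothetical step function phi and explicit Jacobian terms obtained from jax.grad:

```python
import jax
import jax.numpy as jnp

def phi(v, theta):                              # one program step v_{k+1} = φ(v_k, θ)
    return jnp.tanh(theta[0] * v + theta[1])

theta = jnp.array([0.8, 0.1])
v = jnp.array(0.3)
S = jnp.zeros_like(theta)                       # S_0 = ∂v_0/∂θ = 0

for _ in range(5):
    dphi_dv = jax.grad(phi, argnums=0)(v, theta)       # ∂φ/∂v at the current state
    dphi_dtheta = jax.grad(phi, argnums=1)(v, theta)   # ∂φ/∂θ
    S = dphi_dv * S + dphi_dtheta                      # sensitivity update
    v = phi(v, theta)                                  # state update

print(S)    # ∂v_5/∂θ, the same value forward-mode AD produces
```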

Directional Sensitivity

Often we do not need the full Jacobian. We only need sensitivity in a particular direction.

Given a perturbation vector u,

Ju

measures how outputs change along that direction.

This is called a Jacobian-vector product (JVP).

Forward-mode AD computes JVPs efficiently. The cost is usually close to one evaluation of the original function, independent of output dimension.
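A sketch with an assumed stand-in model: a single jax.jvp call returns Ju for a million-dimensional input without ever materializing J:

```python
import jax
import jax.numpy as jnp

def simulate(theta):                  # stand-in for an expensive model, R^m -> R^m
    return jnp.cumsum(jnp.sin(theta))

m = 1_000_000
theta = jnp.linspace(0.0, 1.0, m)
u = jnp.ones(m) / m                   # perturbation direction

_, Ju = jax.jvp(simulate, (theta,), (u,))
print(Ju.shape)                       # (1000000,): directional sensitivity of every output
```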

This matters for large-scale systems where:

m \gg 1,

and forming the full Jacobian is impossible.

Examples include:

  • PDE simulations,
  • neural networks,
  • differentiable physics systems,
  • inverse problems with millions of variables.

Adjoint Sensitivity

Sometimes the output is scalar:

L = \ell(f(\theta)).

Now we want

\nabla_\theta L.

Reverse-mode AD propagates adjoints backward:

\bar{v}_k = \frac{\partial L}{\partial v_k}.

The reverse update applies transposed Jacobians:

\bar{v}_k = \bar{v}_{k+1} \frac{\partial \phi_k}{\partial v_k}.

This computes vector-Jacobian products:

v^\top J.
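A sketch of the adjoint pattern with an assumed toy loss: the gradient of a scalar loss is a single VJP, regardless of how many parameters there are:

```python
import jax
import jax.numpy as jnp

def f(theta):
    return jnp.array([theta[0] * theta[1], theta[1] + theta[2] ** 2])

def loss(theta):
    return jnp.sum(f(theta) ** 2)                 # L = ℓ(f(θ)) with ℓ(y) = Σ y²

theta = jnp.array([1.0, 2.0, 3.0])

grad_reverse = jax.grad(loss)(theta)              # one backward pass

y, vjp_f = jax.vjp(f, theta)                      # equivalently, ∇_θ L = (∂ℓ/∂y) J
grad_via_vjp, = vjp_f(2.0 * y)                    # ∂ℓ/∂y = 2y
print(jnp.allclose(grad_reverse, grad_via_vjp))   # True
```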

Adjoint sensitivity methods are essential when:

| Situation | Preferred method |
| --- | --- |
| Few parameters, many outputs | Forward mode |
| Many parameters, scalar loss | Reverse mode |

This distinction determines the architecture of most scientific AD systems.

Continuous Sensitivity Analysis

For differential equations,

\frac{dy}{dt} = f(y, t, \theta),

the sensitivity matrix

S(t) = \frac{\partial y(t)}{\partial \theta}

satisfies

\frac{dS}{dt} = f_y S + f_\theta.

This converts the derivative problem into another differential equation.

The pair

(y, S)

is integrated simultaneously.

The size of S is

\dim(y) \times \dim(\theta).

This becomes expensive for large parameter spaces, which motivates adjoint methods.
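A minimal sketch of forward continuous sensitivity, assuming the toy model dy/dt = -θ y and plain forward Euler integration:

```python
import numpy as np

theta, y, S = 0.5, 1.0, 0.0          # S(0) = 0: the initial condition does not depend on θ
dt, T = 0.001, 2.0

for _ in range(int(T / dt)):
    f_y = -theta                     # ∂f/∂y for f(y, θ) = -θ y
    f_theta = -y                     # ∂f/∂θ
    S += dt * (f_y * S + f_theta)    # sensitivity equation dS/dt = f_y S + f_θ
    y += dt * (-theta * y)           # state equation

print(S, -T * np.exp(-theta * T))    # numerical vs. analytic ∂y(T)/∂θ = -T e^{-θT}
```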

Global Sensitivity

Local derivatives only describe infinitesimal perturbations. Many systems are highly nonlinear, chaotic, or discontinuous. In such cases, local derivatives may fail to capture global behavior.

Global sensitivity analysis studies variation across a parameter distribution.

Suppose parameters are random variables:

\theta \sim p(\theta).

We may ask:

  • which parameters dominate variance?
  • which interactions matter?
  • which parameters are effectively irrelevant?

Classical techniques include:

| Method | Purpose |
| --- | --- |
| Monte Carlo sampling | Estimate output statistics |
| Sobol indices | Variance decomposition |
| Morris method | Screening influential variables |
| Polynomial chaos | Spectral uncertainty expansion |

Automatic differentiation still plays a role. Many global methods require repeated local derivatives, surrogate models, or gradient-based sampling.

Condition Numbers

Sensitivity is closely related to conditioning.

Suppose

y = f(x).

The condition number measures relative amplification of perturbations:

\kappa(x) = \left| \frac{x}{f(x)} \frac{df}{dx} \right|.

Large condition numbers imply instability:

small input error → large output error.

For matrix problems, conditioning often depends on singular values.

For example, solving

Ax = b

has condition number

\kappa(A) = \|A\| \, \|A^{-1}\|.
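A small numerical illustration (with an assumed nearly singular matrix) of how a large condition number amplifies perturbations in Ax = b:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])

kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
print(kappa, np.linalg.cond(A, 2))               # same value, roughly 4e4

x = np.linalg.solve(A, b)                        # [1, 1]
x_perturbed = np.linalg.solve(A, b + np.array([0.0, 1e-4]))
print(x, x_perturbed)                            # a 1e-4 change in b moves x by O(1)
```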

Sensitivity analysis therefore connects directly to numerical stability.

Sensitivity in Optimization

Optimization algorithms are sensitivity systems.

Given a loss

L(\theta),

gradient descent updates

\theta_{k+1} = \theta_k - \eta \nabla_\theta L.

The gradient itself is the sensitivity of the loss to parameter perturbations.

Second-order methods use Hessians:

H = \frac{\partial^2 L}{\partial \theta^2}.

The Hessian describes sensitivity of the gradient.

| Order | Object | Meaning |
| --- | --- | --- |
| First | Gradient | Linear sensitivity |
| Second | Hessian | Curvature sensitivity |
| Higher | Higher tensors | Nonlinear interaction structure |

Automatic differentiation makes these objects computationally accessible.
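A sketch with an assumed toy loss, showing both sensitivity orders in JAX:

```python
import jax
import jax.numpy as jnp

def loss(theta):
    return jnp.sum((theta - jnp.array([1.0, -2.0])) ** 2) + theta[0] * theta[1]

theta = jnp.zeros(2)

g = jax.grad(loss)(theta)        # first order: sensitivity of the loss
H = jax.hessian(loss)(theta)     # second order: sensitivity of the gradient

print(g)                         # [-2.  4.]
print(H)                         # [[2. 1.]
                                 #  [1. 2.]]
```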

Sensitivity of Fixed Points

Many systems compute fixed points:

x^* = g(x^*, \theta).

Examples:

  • nonlinear solvers,
  • equilibrium systems,
  • optimization layers,
  • implicit neural networks.

Differentiating through the fixed point gives

dx^* = \frac{\partial g}{\partial x}\, dx^* + \frac{\partial g}{\partial \theta}\, d\theta.

Rearranging,

\left( I - \frac{\partial g}{\partial x} \right) dx^* = \frac{\partial g}{\partial \theta}\, d\theta.

Thus

\frac{\partial x^*}{\partial \theta} = \left( I - g_x \right)^{-1} g_\theta.

This is implicit differentiation. It avoids differentiating through every iteration of the solver.
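A minimal sketch with an assumed contraction g(x, θ) = tanh(x + θ)/2, comparing the implicit formula against differentiating through the unrolled iteration:

```python
import jax
import jax.numpy as jnp

def g(x, theta):
    return jnp.tanh(x + theta) / 2.0

def fixed_point(theta, x0=0.0, iters=100):
    x = x0
    for _ in range(iters):
        x = g(x, theta)
    return x

theta = 0.7
x_star = fixed_point(theta)

g_x = jax.grad(g, argnums=0)(x_star, theta)
g_theta = jax.grad(g, argnums=1)(x_star, theta)
implicit = g_theta / (1.0 - g_x)              # (I - g_x)^{-1} g_θ in the scalar case

unrolled = jax.grad(fixed_point)(theta)       # differentiate through every iteration
print(implicit, unrolled)                     # agree to high precision
```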

Chaotic Systems

Sensitivity analysis becomes difficult in chaotic systems.

A small perturbation may grow exponentially:

\|\delta y(t)\| \approx e^{\lambda t}\, \|\delta y(0)\|,

where λ is a Lyapunov exponent.

In such systems:

  • gradients may explode,
  • local sensitivities become unreliable,
  • long-term predictions become unstable.

Examples include:

  • weather systems,
  • turbulent fluids,
  • orbital mechanics,
  • nonlinear dynamical systems.

Even exact derivatives may become numerically useless after long horizons.

This is not an AD failure. It is a property of the underlying dynamics.
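A sketch using the logistic map at r = 4 (an assumed standard chaotic example): the derivative of the final state with respect to the initial condition grows roughly exponentially with the horizon:

```python
import jax

def trajectory_end(x0, steps):
    x = x0
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)              # logistic map in its chaotic regime
    return x

x0 = 0.2
for steps in (10, 30, 60):
    d = jax.grad(trajectory_end)(x0, steps)  # ∂x_steps / ∂x_0
    print(steps, float(d))                   # magnitude typically grows like e^{λ·steps}
```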

Sensitivity and Identifiability

A parameter may exist mathematically but still be unidentifiable from observations.

Suppose two parameters always appear together:

f(x, \theta_1, \theta_2) = h(x,\, \theta_1 + \theta_2) \quad \text{for some function } h.

Then individual sensitivities cannot distinguish them.

The Jacobian becomes rank deficient:

\operatorname{rank}(J) < m.

Consequences include:

  • unstable parameter estimates,
  • ill-conditioned optimization,
  • large uncertainty,
  • non-unique solutions.

Sensitivity analysis therefore reveals structural properties of models, not only numerical derivatives.
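A small sketch (with an assumed model in which the parameters enter only through their sum) showing the rank deficiency numerically:

```python
import jax
import jax.numpy as jnp

x = jnp.linspace(0.0, 1.0, 5)

def model(theta):
    return jnp.exp(-(theta[0] + theta[1]) * x)    # depends only on θ1 + θ2

J = jax.jacfwd(model)(jnp.array([0.3, 0.7]))
print(J.shape)                                    # (5, 2)
print(jnp.linalg.matrix_rank(J))                  # 1 < 2: the columns are identical
```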

Sparse Sensitivities

Large scientific systems often have sparse dependence structure.

For example:

y_i depends only on nearby variables.

Then the Jacobian contains mostly zeros.

| Structure | Benefit |
| --- | --- |
| Sparse Jacobian | Reduced memory |
| Sparse Hessian | Faster second-order methods |
| Block structure | Parallel computation |
| Local coupling | Efficient differentiation |

Scientific AD systems exploit sparsity aggressively.

Methods include:

  • graph coloring,
  • compressed Jacobians,
  • sparse linear algebra,
  • local stencil propagation.

Without sparsity exploitation, many large simulations become computationally infeasible.
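A minimal sketch of local coupling (an assumed periodic three-point stencil): the Jacobian is banded, and most of its entries are exactly zero:

```python
import jax
import jax.numpy as jnp

def stencil(y):
    # each output depends only on its own entry and its two neighbours
    return y + 0.1 * (jnp.roll(y, 1) - 2.0 * y + jnp.roll(y, -1))

y = jnp.arange(8.0)
J = jax.jacfwd(stencil)(y)

print(J.shape)                          # (8, 8)
print(int(jnp.sum(J == 0.0)), J.size)   # 40 of 64 entries are exactly zero
```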

Sensitivity in Machine Learning

Deep learning is a sensitivity propagation problem.

Each layer computes

h_{k+1} = \phi_k(h_k, \theta_k).

Backpropagation computes

\frac{\partial L}{\partial \theta_k}.

This is reverse-mode sensitivity analysis through a layered computational graph.

Problems such as vanishing gradients arise because repeated Jacobian products shrink:

\prod_k J_k \to 0.

Exploding gradients occur when repeated products grow rapidly:

\prod_k J_k \to \infty.

Training stability is therefore a sensitivity conditioning problem.
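A deliberately stripped-down sketch (a linear scalar "network", assumed purely for illustration): the end-to-end derivative is literally the product of the per-layer Jacobians, and it either vanishes or explodes with depth:

```python
import jax

def deep_linear(x, w, depth=50):
    for _ in range(depth):
        x = w * x                       # each "layer" has Jacobian w
    return x

x0 = 1.0
print(float(jax.grad(deep_linear)(x0, 0.9)))   # 0.9**50 ≈ 0.005  (vanishing)
print(float(jax.grad(deep_linear)(x0, 1.1)))   # 1.1**50 ≈ 117    (exploding)
```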

Practical Interpretation

Sensitivity values must be interpreted carefully.

A large derivative does not automatically mean a parameter is important. Units matter.

For example:

\frac{\partial y}{\partial \theta}

depends on scaling. Normalized sensitivities are often preferable:

\frac{\theta}{y} \frac{\partial y}{\partial \theta}.

These measure relative influence.
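A brief sketch (toy model with assumed units) of turning a raw derivative into a relative sensitivity:

```python
import jax
import jax.numpy as jnp

def y(theta):
    return theta[0] * jnp.exp(theta[1])    # theta[0] carries units; theta[1] is dimensionless

theta = jnp.array([2.0, 0.5])
raw = jax.grad(y)(theta)                   # components have different units
relative = theta / y(theta) * raw          # dimensionless relative sensitivities
print(raw, relative)                       # relative ≈ [1.0, 0.5]
```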

Interpretation also depends on:

  • parameter ranges,
  • noise levels,
  • model assumptions,
  • numerical stability,
  • identifiability.

Sensitivity analysis is therefore both numerical and statistical.

Summary

Sensitivity analysis studies how perturbations propagate through computational systems. Automatic differentiation provides the machinery for computing these sensitivities accurately and efficiently.

Forward-mode methods propagate perturbations directly. Reverse-mode methods propagate influence backward from outputs. Continuous sensitivity equations extend the same ideas to dynamical systems. Implicit differentiation handles equilibrium and solver-defined systems.

In large scientific applications, sensitivity analysis is not merely a derivative computation technique. It becomes a structural tool for understanding stability, uncertainty, conditioning, identifiability, and control of complex systems.