Sensitivity Analysis

Sensitivity analysis studies how changes in inputs affect the outputs of a system. In differential equations, optimization, simulation, and machine learning, the main object is usually a function

y = f(x,\theta),

where x is an input or initial condition and θ is a parameter vector. The central question is:

How does y change when x or θ changes?

Automatic differentiation provides a systematic way to compute these sensitivities to machine precision.

Sensitivity analysis appears in nearly every computational science domain:

| Domain | Typical sensitivity question |
| --- | --- |
| Physics | How does material stiffness affect deformation? |
| Climate modeling | How sensitive is temperature to emission parameters? |
| Finance | How does option price change with volatility? |
| Pharmacology | How does dosage affect concentration dynamics? |
| Machine learning | How does loss change with model parameters? |
| Robotics | How do control parameters affect trajectories? |

The derivative itself becomes an object of scientific interpretation.

Local Sensitivity

The simplest form is local sensitivity. Suppose

y = f(\theta).

A small perturbation dθ produces a first-order change

dy \approx J\, d\theta,

where

J = \frac{\partial f}{\partial \theta}

is the Jacobian matrix.

If f : \mathbb{R}^m \to \mathbb{R}^n, then

J \in \mathbb{R}^{n \times m}.

Each entry measures a directional influence:

J_{ij} = \frac{\partial y_i}{\partial \theta_j}.

Large magnitude means the output is sensitive to that parameter. Small magnitude means the output changes little under perturbation.

Forward-mode AD computes columns of J. Reverse-mode AD computes rows of J (equivalently, columns of J^\top).
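As an illustration (a minimal sketch, with an assumed toy function f), the full Jacobian can be assembled either from forward-mode JVPs, one column per input direction, or from reverse-mode VJPs, one row per output direction:

```python
import jax
import jax.numpy as jnp

def f(theta):                                  # toy map f : R^3 -> R^2
    return jnp.array([theta[0] * theta[1], jnp.sin(theta[2])])

theta = jnp.array([1.0, 2.0, 3.0])

# Forward mode: one JVP per input direction yields one column of J.
cols = [jax.jvp(f, (theta,), (e,))[1] for e in jnp.eye(3)]
J_forward = jnp.stack(cols, axis=1)            # shape (2, 3)

# Reverse mode: one VJP per output direction yields one row of J.
_, vjp_f = jax.vjp(f, theta)
rows = [vjp_f(e)[0] for e in jnp.eye(2)]
J_reverse = jnp.stack(rows, axis=0)            # shape (2, 3)

print(jnp.allclose(J_forward, J_reverse))      # True
```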

Sensitivity as Linearization

Sensitivity analysis is fundamentally a linear approximation problem.

Around a reference parameter θ₀,

f(\theta_0 + \delta\theta) \approx f(\theta_0) + J(\theta_0)\,\delta\theta.

The Jacobian acts as the local linear model of the system.
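A minimal sketch of this linearization, using an assumed two-parameter toy function and JAX's linearize:

```python
import jax
import jax.numpy as jnp

def f(theta):                                   # illustrative toy model
    return jnp.array([jnp.exp(theta[0]) * theta[1], theta[0] - theta[1] ** 2])

theta0 = jnp.array([0.5, -1.0])
delta = jnp.array([1e-3, 2e-3])

y0, jvp_at_theta0 = jax.linearize(f, theta0)    # y0 = f(θ0); δθ ↦ J(θ0) δθ
print(y0 + jvp_at_theta0(delta))                # local linear prediction
print(f(theta0 + delta))                        # agrees to first order
```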

This viewpoint is important because many large systems are only tractable through local approximations. The full nonlinear model may be expensive, discontinuous, or analytically inaccessible. The linearized sensitivity often provides enough information for:

  • optimization,
  • uncertainty propagation,
  • control design,
  • stability analysis,
  • parameter fitting,
  • experimental design.

Forward Sensitivity Propagation

Suppose a program computes

v_{k+1} = \phi_k(v_k, \theta).

Forward sensitivity propagation augments each variable with its derivative:

S_k = \frac{\partial v_k}{\partial \theta}.

Applying the chain rule gives

S_{k+1} = \frac{\partial \phi_k}{\partial v_k} S_k + \frac{\partial \phi_k}{\partial \theta}.

This is the computational foundation of forward-mode AD.

Conceptually:

| Quantity | Meaning |
| --- | --- |
| v_k | Program state |
| S_k | Sensitivity of the state |
| Jacobian multiplication | Propagation rule |

The sensitivity evolves alongside the state itself.
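The recurrence can be written out directly. A sketch, using a hypothetical step function phi and explicit Jacobian terms obtained from jax.grad:

```python
import jax
import jax.numpy as jnp

def phi(v, theta):                              # one program step v_{k+1} = φ(v_k, θ)
    return jnp.tanh(theta[0] * v + theta[1])

theta = jnp.array([0.8, 0.1])
v = jnp.array(0.3)
S = jnp.zeros_like(theta)                       # S_0 = ∂v_0/∂θ = 0

for _ in range(5):
    dphi_dv = jax.grad(phi, argnums=0)(v, theta)       # ∂φ/∂v at the current state
    dphi_dtheta = jax.grad(phi, argnums=1)(v, theta)   # ∂φ/∂θ
    S = dphi_dv * S + dphi_dtheta                      # sensitivity update
    v = phi(v, theta)                                  # state update

print(S)    # ∂v_5/∂θ, the same value forward-mode AD produces
```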

Directional Sensitivity

Often we do not need the full Jacobian. We only need sensitivity in a particular direction.

Given a perturbation vector u,

Ju

measures how outputs change along that direction.

This is called a Jacobian-vector product (JVP).

Forward-mode AD computes JVPs efficiently. The cost is usually close to one evaluation of the original function, independent of output dimension.
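A sketch with an assumed stand-in model: a single jax.jvp call returns Ju for a million-dimensional input without ever materializing J:

```python
import jax
import jax.numpy as jnp

def simulate(theta):                  # stand-in for an expensive model, R^m -> R^m
    return jnp.cumsum(jnp.sin(theta))

m = 1_000_000
theta = jnp.linspace(0.0, 1.0, m)
u = jnp.ones(m) / m                   # perturbation direction

_, Ju = jax.jvp(simulate, (theta,), (u,))
print(Ju.shape)                       # (1000000,): directional sensitivity of every output
```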

This matters for large-scale systems where:

m \gg 1,

and forming the full Jacobian is impossible.

Examples include:

  • PDE simulations,
  • neural networks,
  • differentiable physics systems,
  • inverse problems with millions of variables.

Adjoint Sensitivity

Sometimes the output is scalar:

L = \ell(f(\theta)).

Now we want

\nabla_\theta L.

Reverse-mode AD propagates adjoints backward:

\bar{v}_k = \frac{\partial L}{\partial v_k}.

The reverse update applies transposed Jacobians:

\bar{v}_k = \bar{v}_{k+1} \frac{\partial \phi_k}{\partial v_k}.

This computes vector-Jacobian products:

v^\top J.
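A sketch of the adjoint pattern with an assumed toy loss: the gradient of a scalar loss is a single VJP, regardless of how many parameters there are:

```python
import jax
import jax.numpy as jnp

def f(theta):
    return jnp.array([theta[0] * theta[1], theta[1] + theta[2] ** 2])

def loss(theta):
    return jnp.sum(f(theta) ** 2)                 # L = ℓ(f(θ)) with ℓ(y) = Σ y²

theta = jnp.array([1.0, 2.0, 3.0])

grad_reverse = jax.grad(loss)(theta)              # one backward pass

y, vjp_f = jax.vjp(f, theta)                      # equivalently, ∇_θ L = (∂ℓ/∂y) J
grad_via_vjp, = vjp_f(2.0 * y)                    # ∂ℓ/∂y = 2y
print(jnp.allclose(grad_reverse, grad_via_vjp))   # True
```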

Adjoint sensitivity methods are essential when:

| Situation | Preferred method |
| --- | --- |
| Few parameters, many outputs | Forward mode |
| Many parameters, scalar loss | Reverse mode |

This distinction determines the architecture of most scientific AD systems.

Continuous Sensitivity Analysis

For differential equations,

\frac{dy}{dt} = f(y, t, \theta),

the sensitivity matrix

S(t) = \frac{\partial y(t)}{\partial \theta}

satisfies

\frac{dS}{dt} = f_y S + f_\theta.

This converts the derivative problem into another differential equation.

The pair

(y, S)

is integrated simultaneously.

The size of S is

\dim(y) \times \dim(\theta).

This becomes expensive for large parameter spaces, which motivates adjoint methods.
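A minimal sketch of forward continuous sensitivity, assuming the toy model dy/dt = -θ y and plain forward Euler integration:

```python
import numpy as np

theta, y, S = 0.5, 1.0, 0.0          # S(0) = 0: the initial condition does not depend on θ
dt, T = 0.001, 2.0

for _ in range(int(T / dt)):
    f_y = -theta                     # ∂f/∂y for f(y, θ) = -θ y
    f_theta = -y                     # ∂f/∂θ
    S += dt * (f_y * S + f_theta)    # sensitivity equation dS/dt = f_y S + f_θ
    y += dt * (-theta * y)           # state equation

print(S, -T * np.exp(-theta * T))    # numerical vs. analytic ∂y(T)/∂θ = -T e^{-θT}
```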

Global Sensitivity

Local derivatives only describe infinitesimal perturbations. Many systems are highly nonlinear, chaotic, or discontinuous. In such cases, local derivatives may fail to capture global behavior.

Global sensitivity analysis studies variation across a parameter distribution.

Suppose parameters are random variables:

\theta \sim p(\theta).

We may ask:

  • which parameters dominate variance?
  • which interactions matter?
  • which parameters are effectively irrelevant?

Classical techniques include:

| Method | Purpose |
| --- | --- |
| Monte Carlo sampling | Estimate output statistics |
| Sobol indices | Variance decomposition |
| Morris method | Screening influential variables |
| Polynomial chaos | Spectral uncertainty expansion |

Automatic differentiation still plays a role. Many global methods require repeated local derivatives, surrogate models, or gradient-based sampling.

Condition Numbers

Sensitivity is closely related to conditioning.

Suppose

y = f(x).

The condition number measures relative amplification of perturbations:

\kappa(x) = \left| \frac{x}{f(x)} \frac{df}{dx} \right|.

Large condition numbers imply instability:

small input error → large output error.

For matrix problems, conditioning often depends on singular values.

For example, solving

Ax = b

has condition number

\kappa(A) = \|A\| \, \|A^{-1}\|.
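A small numerical illustration (with an assumed nearly singular matrix) of how a large condition number amplifies perturbations in Ax = b:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])

kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
print(kappa, np.linalg.cond(A, 2))               # same value, roughly 4e4

x = np.linalg.solve(A, b)                        # [1, 1]
x_perturbed = np.linalg.solve(A, b + np.array([0.0, 1e-4]))
print(x, x_perturbed)                            # a 1e-4 change in b moves x by O(1)
```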

Sensitivity analysis therefore connects directly to numerical stability.

Sensitivity in Optimization

Optimization algorithms are sensitivity systems.

Given a loss

L(\theta),

gradient descent updates

\theta_{k+1} = \theta_k - \eta \nabla_\theta L.

The gradient itself is the sensitivity of the loss to parameter perturbations.

Second-order methods use Hessians:

H = \frac{\partial^2 L}{\partial \theta^2}.

The Hessian describes sensitivity of the gradient.

| Order | Object | Meaning |
| --- | --- | --- |
| First | Gradient | Linear sensitivity |
| Second | Hessian | Curvature sensitivity |
| Higher | Higher tensors | Nonlinear interaction structure |

Automatic differentiation makes these objects computationally accessible.
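A sketch with an assumed toy loss, showing both sensitivity orders in JAX:

```python
import jax
import jax.numpy as jnp

def loss(theta):
    return jnp.sum((theta - jnp.array([1.0, -2.0])) ** 2) + theta[0] * theta[1]

theta = jnp.zeros(2)

g = jax.grad(loss)(theta)        # first order: sensitivity of the loss
H = jax.hessian(loss)(theta)     # second order: sensitivity of the gradient

print(g)                         # [-2.  4.]
print(H)                         # [[2. 1.]
                                 #  [1. 2.]]
```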

Sensitivity of Fixed Points

Many systems compute fixed points:

x^* = g(x^*, \theta).

Examples:

  • nonlinear solvers,
  • equilibrium systems,
  • optimization layers,
  • implicit neural networks.

Differentiating through the fixed point gives

dx^* = \frac{\partial g}{\partial x}\, dx^* + \frac{\partial g}{\partial \theta}\, d\theta.

Rearranging,

\left( I - \frac{\partial g}{\partial x} \right) dx^* = \frac{\partial g}{\partial \theta}\, d\theta.

Thus

\frac{\partial x^*}{\partial \theta} = \left( I - g_x \right)^{-1} g_\theta.

This is implicit differentiation. It avoids differentiating through every iteration of the solver.
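A minimal sketch with an assumed contraction g(x, θ) = tanh(x + θ)/2, comparing the implicit formula against differentiating through the unrolled iteration:

```python
import jax
import jax.numpy as jnp

def g(x, theta):
    return jnp.tanh(x + theta) / 2.0

def fixed_point(theta, x0=0.0, iters=100):
    x = x0
    for _ in range(iters):
        x = g(x, theta)
    return x

theta = 0.7
x_star = fixed_point(theta)

g_x = jax.grad(g, argnums=0)(x_star, theta)
g_theta = jax.grad(g, argnums=1)(x_star, theta)
implicit = g_theta / (1.0 - g_x)              # (I - g_x)^{-1} g_θ in the scalar case

unrolled = jax.grad(fixed_point)(theta)       # differentiate through every iteration
print(implicit, unrolled)                     # agree to high precision
```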

Chaotic Systems

Sensitivity analysis becomes difficult in chaotic systems.

A small perturbation may grow exponentially:

\|\delta y(t)\| \approx e^{\lambda t}\, \|\delta y(0)\|,

where λ is a Lyapunov exponent.

In such systems:

  • gradients may explode,
  • local sensitivities become unreliable,
  • long-term predictions become unstable.

Examples include:

  • weather systems,
  • turbulent fluids,
  • orbital mechanics,
  • nonlinear dynamical systems.

Even exact derivatives may become numerically useless after long horizons.

This is not an AD failure. It is a property of the underlying dynamics.
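A sketch using the logistic map at r = 4 (an assumed standard chaotic example): the derivative of the final state with respect to the initial condition grows roughly exponentially with the horizon:

```python
import jax

def trajectory_end(x0, steps):
    x = x0
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)              # logistic map in its chaotic regime
    return x

x0 = 0.2
for steps in (10, 30, 60):
    d = jax.grad(trajectory_end)(x0, steps)  # ∂x_steps / ∂x_0
    print(steps, float(d))                   # magnitude typically grows like e^{λ·steps}
```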

Sensitivity and Identifiability

A parameter may exist mathematically but still be unidentifiable from observations.

Suppose two parameters always appear together:

f(x, \theta_1, \theta_2) = h(x,\, \theta_1 + \theta_2) \quad \text{for some function } h.

Then individual sensitivities cannot distinguish them.

The Jacobian becomes rank deficient:

\operatorname{rank}(J) < m.

Consequences include:

  • unstable parameter estimates,
  • ill-conditioned optimization,
  • large uncertainty,
  • non-unique solutions.

Sensitivity analysis therefore reveals structural properties of models, not only numerical derivatives.
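A small sketch (with an assumed model in which the parameters enter only through their sum) showing the rank deficiency numerically:

```python
import jax
import jax.numpy as jnp

x = jnp.linspace(0.0, 1.0, 5)

def model(theta):
    return jnp.exp(-(theta[0] + theta[1]) * x)    # depends only on θ1 + θ2

J = jax.jacfwd(model)(jnp.array([0.3, 0.7]))
print(J.shape)                                    # (5, 2)
print(jnp.linalg.matrix_rank(J))                  # 1 < 2: the columns are identical
```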

Sparse Sensitivities

Large scientific systems often have sparse dependence structure.

For example:

y_i depends only on nearby variables.

Then the Jacobian contains mostly zeros.

| Structure | Benefit |
| --- | --- |
| Sparse Jacobian | Reduced memory |
| Sparse Hessian | Faster second-order methods |
| Block structure | Parallel computation |
| Local coupling | Efficient differentiation |

Scientific AD systems exploit sparsity aggressively.

Methods include:

  • graph coloring,
  • compressed Jacobians,
  • sparse linear algebra,
  • local stencil propagation.

Without sparsity exploitation, many large simulations become computationally infeasible.
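A minimal sketch of local coupling (an assumed periodic three-point stencil): the Jacobian is banded, and most of its entries are exactly zero:

```python
import jax
import jax.numpy as jnp

def stencil(y):
    # each output depends only on its own entry and its two neighbours
    return y + 0.1 * (jnp.roll(y, 1) - 2.0 * y + jnp.roll(y, -1))

y = jnp.arange(8.0)
J = jax.jacfwd(stencil)(y)

print(J.shape)                          # (8, 8)
print(int(jnp.sum(J == 0.0)), J.size)   # 40 of 64 entries are exactly zero
```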

Sensitivity in Machine Learning

Deep learning is a sensitivity propagation problem.

Each layer computes

h_{k+1} = \phi_k(h_k, \theta_k).

Backpropagation computes

\frac{\partial L}{\partial \theta_k}.

This is reverse-mode sensitivity analysis through a layered computational graph.

Problems such as vanishing gradients arise because repeated Jacobian products shrink:

\prod_k J_k \to 0.

Exploding gradients occur when repeated products grow rapidly:

\prod_k J_k \to \infty.

Training stability is therefore a sensitivity conditioning problem.
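A deliberately stripped-down sketch (a linear scalar "network", assumed purely for illustration): the end-to-end derivative is literally the product of the per-layer Jacobians, and it either vanishes or explodes with depth:

```python
import jax

def deep_linear(x, w, depth=50):
    for _ in range(depth):
        x = w * x                       # each "layer" has Jacobian w
    return x

x0 = 1.0
print(float(jax.grad(deep_linear)(x0, 0.9)))   # 0.9**50 ≈ 0.005  (vanishing)
print(float(jax.grad(deep_linear)(x0, 1.1)))   # 1.1**50 ≈ 117    (exploding)
```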

Practical Interpretation

Sensitivity values must be interpreted carefully.

A large derivative does not automatically mean a parameter is important. Units matter.

For example:

\frac{\partial y}{\partial \theta}

depends on scaling. Normalized sensitivities are often preferable:

\frac{\theta}{y} \frac{\partial y}{\partial \theta}.

These measure relative influence.
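A brief sketch (toy model with assumed units) of turning a raw derivative into a relative sensitivity:

```python
import jax
import jax.numpy as jnp

def y(theta):
    return theta[0] * jnp.exp(theta[1])    # theta[0] carries units; theta[1] is dimensionless

theta = jnp.array([2.0, 0.5])
raw = jax.grad(y)(theta)                   # components have different units
relative = theta / y(theta) * raw          # dimensionless relative sensitivities
print(raw, relative)                       # relative ≈ [1.0, 0.5]
```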

Interpretation also depends on:

  • parameter ranges,
  • noise levels,
  • model assumptions,
  • numerical stability,
  • identifiability.

Sensitivity analysis is therefore both numerical and statistical.

Summary

Sensitivity analysis studies how perturbations propagate through computational systems. Automatic differentiation provides the machinery for computing these sensitivities accurately and efficiently.

Forward-mode methods propagate perturbations directly. Reverse-mode methods propagate influence backward from outputs. Continuous sensitivity equations extend the same ideas to dynamical systems. Implicit differentiation handles equilibrium and solver-defined systems.

In large scientific applications, sensitivity analysis is not merely a derivative computation technique. It becomes a structural tool for understanding stability, uncertainty, conditioning, identifiability, and control of complex systems.