# Sensitivity Analysis

Sensitivity analysis studies how changes in inputs affect the outputs of a system. In differential equations, optimization, simulation, and machine learning, the main object is usually a function

$$
y = f(x,\theta),
$$

where $x$ is an input or initial condition and $\theta$ is a parameter vector. The central question is:

$$
\text{How does } y \text{ change when } x \text{ or } \theta \text{ changes?}
$$

Automatic differentiation provides a systematic way to compute these sensitivities to machine precision.

Sensitivity analysis appears in nearly every computational science domain:

| Domain | Typical sensitivity question |
|---|---|
| Physics | How does material stiffness affect deformation? |
| Climate modeling | How sensitive is temperature to emission parameters? |
| Finance | How does option price change with volatility? |
| Pharmacology | How does dosage affect concentration dynamics? |
| Machine learning | How does loss change with model parameters? |
| Robotics | How do control parameters affect trajectories? |

The derivative itself becomes an object of scientific interpretation.

### Local Sensitivity

The simplest form is local sensitivity. Suppose

$$
y = f(\theta).
$$

A small perturbation $d\theta$ produces a first-order change

$$
dy \approx J\, d\theta,
$$

where

$$
J = \frac{\partial f}{\partial \theta}
$$

is the Jacobian matrix.

If $f : \mathbb{R}^m \to \mathbb{R}^n$, then

$$
J \in \mathbb{R}^{n \times m}.
$$

Each entry measures a directional influence:

$$
J_{ij} =
\frac{\partial y_i}{\partial \theta_j}.
$$

Large magnitude means the output is sensitive to that parameter. Small magnitude means the output changes little under perturbation.

Forward-mode AD computes columns of $J$ (one Jacobian-vector product per column). Reverse-mode AD computes rows of $J$ (one vector-Jacobian product per row).
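
A minimal sketch of the column-wise view: one forward perturbation per parameter recovers one column of $J$. The function `f` below is purely illustrative, and finite differences stand in for a true AD forward pass.

```python
# Hypothetical model f: R^2 -> R^3, used only for illustration.
def f(theta):
    a, b = theta
    return [a * b, a + b, a - 2.0 * b]

def jacobian_column(func, theta, j, h=1e-6):
    """Approximate column j of J = df/dtheta by one forward difference."""
    bumped = list(theta)
    bumped[j] += h
    y0, y1 = func(theta), func(bumped)
    return [(v1 - v0) / h for v0, v1 in zip(y0, y1)]

theta = [2.0, 3.0]
col0 = jacobian_column(f, theta, 0)  # analytically [b, 1, 1] = [3, 1, 1]
col1 = jacobian_column(f, theta, 1)  # analytically [a, 1, -2] = [2, 1, -2]
```

Each column costs one extra evaluation of `f`, which is why column-oriented (forward) methods scale with the number of parameters.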

### Sensitivity as Linearization

Sensitivity analysis is fundamentally a linear approximation problem.

Around a reference parameter $\theta_0$,

$$
f(\theta_0 + \delta\theta)
\approx
f(\theta_0)
+
J(\theta_0)\delta\theta.
$$

The Jacobian acts as the local linear model of the system.

This viewpoint is important because many large systems are only tractable through local approximations. The full nonlinear model may be expensive, discontinuous, or analytically inaccessible. The linearized sensitivity often provides enough information for:

- optimization,
- uncertainty propagation,
- control design,
- stability analysis,
- parameter fitting,
- experimental design.
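
The linearization can be checked numerically. A sketch with an illustrative two-parameter model: the gradient (the $1 \times 2$ Jacobian) predicts the perturbed output to second-order accuracy.

```python
import math

# Linearization check for the toy model f(t1, t2) = sin(t1) * t2:
# f(theta0 + dtheta) should match f(theta0) + J(theta0) @ dtheta
# up to terms quadratic in the perturbation.

def f(t1, t2):
    return math.sin(t1) * t2

t1, t2 = 0.5, 2.0
J = [math.cos(t1) * t2, math.sin(t1)]   # [df/dt1, df/dt2]

d1, d2 = 1e-3, -2e-3
exact = f(t1 + d1, t2 + d2)
linear = f(t1, t2) + J[0] * d1 + J[1] * d2
error = abs(exact - linear)             # second order in the perturbation
```

Halving the perturbation should roughly quarter `error`, which is a quick diagnostic that a Jacobian implementation is correct.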

### Forward Sensitivity Propagation

Suppose a program computes

$$
v_{k+1} = \phi_k(v_k,\theta).
$$

Forward sensitivity propagation augments each variable with its derivative:

$$
S_k = \frac{\partial v_k}{\partial \theta}.
$$

Applying the chain rule gives

$$
S_{k+1} =
\frac{\partial \phi_k}{\partial v_k} S_k
+
\frac{\partial \phi_k}{\partial \theta}.
$$

This is the computational foundation of forward-mode AD.

Conceptually:

| Quantity | Meaning |
|---|---|
| $v_k$ | Program state |
| $S_k$ | Sensitivity of state |
| Jacobian multiplication | Propagation rule |

The sensitivity evolves alongside the state itself.
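
A minimal sketch of this propagation for the illustrative recurrence $v_{k+1} = \theta v_k$, where $\partial\phi/\partial v = \theta$ and $\partial\phi/\partial\theta = v_k$, so the sensitivity obeys $S_{k+1} = \theta S_k + v_k$ with $S_0 = 0$.

```python
# State and sensitivity evolve together, one chain-rule update per step.
theta, v, S, n = 1.1, 2.0, 0.0, 5
for _ in range(n):
    v, S = theta * v, theta * S + v    # right-hand side uses the old (v, S)

# Closed form: v_n = theta**n * v0, so dv_n/dtheta = n * theta**(n-1) * v0.
closed_form = n * theta ** (n - 1) * 2.0
```

The propagated `S` matches the closed-form derivative exactly (to rounding), which is the defining property of forward-mode AD.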

### Directional Sensitivity

Often we do not need the full Jacobian. We only need sensitivity in a particular direction.

Given a perturbation vector $u$,

$$
Ju
$$

measures how outputs change along that direction.

This is called a Jacobian-vector product (JVP).

Forward-mode AD computes JVPs efficiently. The cost is usually close to one evaluation of the original function, independent of output dimension.

This matters for large-scale systems where:

$$
m \gg 1,
$$

and forming the full Jacobian is computationally infeasible.

Examples include:

- PDE simulations,
- neural networks,
- differentiable physics systems,
- inverse problems with millions of variables.
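
A JVP can be computed in one pass with dual numbers, the classic forward-mode construction: each value carries a directional derivative alongside it. The sketch below supports only `+` and `*`, and the function `f` is an illustrative example.

```python
class Dual:
    """A value paired with its directional derivative (tangent)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def f(x):                # illustrative map R^2 -> R^2
    a, b = x
    return [a * b, a * a + b]

def jvp(func, x, u):
    """One forward evaluation yields J(x) @ u, with no Jacobian formed."""
    duals = [Dual(xi, ui) for xi, ui in zip(x, u)]
    return [out.dot for out in func(duals)]

# J = [[b, a], [2a, 1]] at (2, 3); direction u = (1, 0) extracts column 0.
col0 = jvp(f, [2.0, 3.0], [1.0, 0.0])
```

The cost is one evaluation of `f` regardless of how many outputs it has, which is exactly the property that makes JVPs viable at scale.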

### Adjoint Sensitivity

Sometimes the output is scalar:

$$
L = \ell(f(\theta)).
$$

Now we want

$$
\nabla_\theta L.
$$

Reverse-mode AD propagates adjoints backward:

$$
\bar{v}_k =
\frac{\partial L}{\partial v_k}.
$$

The reverse update applies transposed Jacobians:

$$
\bar{v}_k =
\bar{v}_{k+1}
\frac{\partial \phi_k}{\partial v_k}.
$$

Each backward step is therefore a vector-Jacobian product (VJP):

$$
\bar{v}^\top J.
$$

Adjoint sensitivity methods are essential when:

| Situation | Preferred method |
|---|---|
| Few parameters, many outputs | Forward mode |
| Many parameters, scalar loss | Reverse mode |

This distinction determines the architecture of most scientific AD systems.
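
A minimal sketch of one adjoint sweep through a two-stage chain. The maps and the scalar loss are illustrative; the backward step is a single vector-Jacobian product using the explicitly written Jacobian.

```python
def phi0(v):                     # illustrative map R^2 -> R^2
    a, b = v
    return [a * b, a + b]

def phi0_jac(v):                 # its Jacobian, written by hand
    a, b = v
    return [[b, a], [1.0, 1.0]]

def loss(v1):                    # scalar output L = v1[0] + 2 * v1[1]
    return v1[0] + 2.0 * v1[1]

v0 = [2.0, 3.0]
v1 = phi0(v0)
L = loss(v1)

# Backward sweep: vbar1 = dL/dv1, then vbar0 = vbar1 @ J_phi0 (a VJP).
vbar1 = [1.0, 2.0]
J = phi0_jac(v0)
vbar0 = [sum(vbar1[i] * J[i][j] for i in range(2)) for j in range(2)]
# Analytically L = a*b + 2*(a + b), so dL/dv0 = [b + 2, a + 2] = [5, 4].
```

One backward pass produces the full gradient with respect to all inputs, which is why reverse mode wins when parameters are many and the loss is scalar.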

### Continuous Sensitivity Analysis

For differential equations,

$$
\frac{dy}{dt}=f(y,t,\theta),
$$

the sensitivity matrix

$$
S(t)=\frac{\partial y(t)}{\partial \theta}
$$

satisfies

$$
\frac{dS}{dt} =
f_y S + f_\theta.
$$

This converts the derivative problem into another differential equation.

The pair

$$
(y,S)
$$

is integrated simultaneously.

The size of $S$ is

$$
\dim(y)\times \dim(\theta).
$$

This becomes expensive for large parameter spaces, which motivates adjoint methods.
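
The forward sensitivity equation can be checked on the linear test problem $dy/dt = -\theta y$, where $f_y = -\theta$ and $f_\theta = -y$, so $dS/dt = -\theta S - y$. A sketch integrating $(y, S)$ jointly with forward Euler:

```python
import math

theta, y, S = 0.7, 1.0, 0.0
T, steps = 1.0, 10000
dt = T / steps
for _ in range(steps):
    # State and sensitivity advance together, sharing the same time grid.
    y, S = y + dt * (-theta * y), S + dt * (-theta * S - y)

# Exact: y(T) = y0*exp(-theta*T), so S(T) = dy(T)/dtheta = -T*exp(-theta*T).
exact_S = -T * math.exp(-theta * T)
```

The Euler error is $O(\Delta t)$ for both components; a production solver would use an adaptive integrator, but the structure of the augmented system is the same.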

### Global Sensitivity

Local derivatives only describe infinitesimal perturbations. Many systems are highly nonlinear, chaotic, or discontinuous. In such cases, local derivatives may fail to capture global behavior.

Global sensitivity analysis studies variation across a parameter distribution.

Suppose parameters are random variables:

$$
\theta \sim p(\theta).
$$

We may ask:

- which parameters dominate variance?
- which interactions matter?
- which parameters are effectively irrelevant?

Classical techniques include:

| Method | Purpose |
|---|---|
| Monte Carlo sampling | Estimate output statistics |
| Sobol indices | Variance decomposition |
| Morris method | Screening influential variables |
| Polynomial chaos | Spectral uncertainty expansion |

Automatic differentiation still plays a role. Many global methods require repeated local derivatives, surrogate models, or gradient-based sampling.
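
A crude Monte Carlo screening sketch in the spirit of the table above: vary one parameter at a time over its range (others held at nominal values) and compare output variances. The model, parameter ranges, and nominal point are all illustrative; real Sobol or Morris analyses are more careful about interactions.

```python
import random

random.seed(0)

def model(t1, t2):
    # t1 enters linearly with a large coefficient, t2 weakly and nonlinearly.
    return 10.0 * t1 + 0.1 * t2 * t2

def one_at_a_time_variance(index, n=2000):
    """Output variance when only parameter `index` varies on [0, 1]."""
    nominal = [0.5, 0.5]
    ys = []
    for _ in range(n):
        t = list(nominal)
        t[index] = random.random()
        ys.append(model(*t))
    mean = sum(ys) / n
    return sum((y - mean) ** 2 for y in ys) / n

v1 = one_at_a_time_variance(0)   # dominated by the 10*t1 term
v2 = one_at_a_time_variance(1)   # orders of magnitude smaller
```

Even this naive screening correctly ranks the parameters; variance-based methods such as Sobol indices formalize the same comparison.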

### Condition Numbers

Sensitivity is closely related to conditioning.

Suppose

$$
y=f(x).
$$

The condition number measures relative amplification of perturbations:

$$
\kappa(x) =
\left|
\frac{x}{f(x)}
\frac{df}{dx}
\right|.
$$

Large condition numbers imply instability:

$$
\text{small input error}
\to
\text{large output error}.
$$

For matrix problems, conditioning often depends on singular values.

For example, solving

$$
Ax=b
$$

has condition number

$$
\kappa(A) = \|A\|\,\|A^{-1}\|,
$$

which in the $2$-norm equals the singular-value ratio $\sigma_{\max}/\sigma_{\min}$.

Sensitivity analysis therefore connects directly to numerical stability.
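
A sketch of the scalar formula for the illustrative choice $f(x) = \log x$, where $\kappa(x) = |x \cdot (1/x) / \log x| = 1/|\log x|$, which blows up as $x \to 1$:

```python
import math

def kappa_log(x):
    """Relative condition number of f(x) = log(x): kappa = 1/|log(x)|."""
    return abs(x * (1.0 / x) / math.log(x))

k_far = kappa_log(10.0)     # about 0.43: well conditioned
k_near = kappa_log(1.001)   # about 1000: relative input errors amplified
```

Near $x = 1$ a relative input error of $10^{-6}$ produces a relative output error near $10^{-3}$, even though the derivative of $\log$ is perfectly tame there.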

### Sensitivity in Optimization

Optimization algorithms are sensitivity systems.

Given a loss

$$
L(\theta),
$$

gradient descent updates

$$
\theta_{k+1} =
\theta_k -
\eta \nabla_\theta L.
$$

The gradient itself is the sensitivity of the loss to parameter perturbations.

Second-order methods use Hessians:

$$
H =
\frac{\partial^2 L}{\partial \theta^2}.
$$

The Hessian describes sensitivity of the gradient.

| Order | Object | Meaning |
|---|---|---|
| First | Gradient | Linear sensitivity |
| Second | Hessian | Curvature sensitivity |
| Higher | Higher tensors | Nonlinear interaction structure |

Automatic differentiation makes these objects computationally accessible.
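
A sketch of how curvature sensitivity constrains first-order methods, on an illustrative diagonal quadratic $L = \tfrac{1}{2}(a x^2 + b y^2)$: the gradient is $(a x, b y)$, the Hessian is $\mathrm{diag}(a, b)$, and gradient descent is stable only when $\eta < 2/\max(a, b)$.

```python
a, b = 1.0, 10.0
x, y = 1.0, 1.0
eta = 0.15                         # below 2/b = 0.2, so both modes contract
for _ in range(200):
    x, y = x - eta * a * x, y - eta * b * y
loss = 0.5 * (a * x * x + b * y * y)   # driven to (numerical) zero

y_bad = 1.0
for _ in range(50):
    y_bad -= 0.25 * 10.0 * y_bad   # eta = 0.25 > 2/b: |1 - 2.5| = 1.5 > 1
# y_bad grows without bound: the stiff direction diverges.
```

The largest Hessian eigenvalue, not the gradient magnitude, sets the admissible step size, which is the basic reason second-order information matters.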

### Sensitivity of Fixed Points

Many systems compute fixed points:

$$
x^* = g(x^*,\theta).
$$

Examples:

- nonlinear solvers,
- equilibrium systems,
- optimization layers,
- implicit neural networks.

Differentiating through the fixed point gives

$$
dx^* =
\frac{\partial g}{\partial x}dx^*
+
\frac{\partial g}{\partial \theta}d\theta.
$$

Rearranging,

$$
\left(
I - \frac{\partial g}{\partial x}
\right)
dx^* =
\frac{\partial g}{\partial \theta}d\theta.
$$

Thus

$$
\frac{\partial x^*}{\partial \theta} =
\left(
I-g_x
\right)^{-1}
g_\theta.
$$

This is implicit differentiation. It avoids differentiating through every iteration of the solver.
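
A scalar sketch with the illustrative contraction $g(x, \theta) = \tfrac{1}{2}\cos x + \theta$: the implicit formula $dx^*/d\theta = g_\theta/(1 - g_x) = 1/(1 + \tfrac{1}{2}\sin x^*)$ needs only the converged fixed point, not the solver's iterates.

```python
import math

def solve(theta, iters=200):
    """Fixed-point iteration for x = 0.5*cos(x) + theta (a contraction)."""
    x = 0.0
    for _ in range(iters):
        x = 0.5 * math.cos(x) + theta
    return x

theta = 0.3
xs = solve(theta)

# Implicit differentiation: no differentiation through the 200 iterations.
implicit = 1.0 / (1.0 + 0.5 * math.sin(xs))

# Reference: central finite difference of the solved fixed point.
h = 1e-6
finite = (solve(theta + h) - solve(theta - h)) / (2 * h)
```

The two numbers agree, and the implicit route stays exact even if the solver changes its iteration count or algorithm.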

### Chaotic Systems

Sensitivity analysis becomes difficult in chaotic systems.

A small perturbation may grow exponentially:

$$
\|\delta y(t)\|
\approx
e^{\lambda t}\|\delta y(0)\|,
$$

where $\lambda$ is a Lyapunov exponent.

In such systems:

- gradients may explode,
- local sensitivities become unreliable,
- long-term predictions become unstable.

Examples include:

- weather systems,
- turbulent fluids,
- orbital mechanics,
- nonlinear dynamical systems.

Even exact derivatives may become numerically useless after long horizons.

This is not an AD failure. It is a property of the underlying dynamics.
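
The exponential growth can be seen directly in the chaotic logistic map $x_{k+1} = r x_k (1 - x_k)$ at $r = 4$: the exact sensitivity $dx_n/dx_0$ is a running product of local derivatives $r(1 - 2x_k)$, and its magnitude grows roughly like $e^{\lambda n}$ with $\lambda \approx \ln 2$.

```python
r, x, d = 4.0, 0.2, 1.0
for _ in range(50):
    d *= r * (1.0 - 2.0 * x)   # chain rule: multiply the local Jacobian
    x = r * x * (1.0 - x)      # then advance the state
growth = abs(d)                # astronomically large after 50 steps
```

The derivative is computed exactly at every step; it is the dynamics, not the differentiation, that makes the result practically uninformative over long horizons.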

### Sensitivity and Identifiability

A parameter may exist mathematically but still be unidentifiable from observations.

Suppose two parameters always appear together:

$$
f(x,\theta_1,\theta_2) =
f(x,\theta_1+\theta_2).
$$

Then individual sensitivities cannot distinguish them.

The Jacobian becomes rank deficient:

$$
\operatorname{rank}(J) < m.
$$

Consequences include:

- unstable parameter estimates,
- ill-conditioned optimization,
- large uncertainty,
- non-unique solutions.

Sensitivity analysis therefore reveals structural properties of models, not only numerical derivatives.
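
A sketch of the rank deficiency for a hypothetical model in which the parameters enter only through their sum, $y = e^{-(\theta_1 + \theta_2)x}$: the two sensitivity components coincide, so the $1 \times 2$ Jacobian has rank one.

```python
import math

def model(x, t1, t2):
    # t1 and t2 appear only as t1 + t2: structurally unidentifiable.
    return math.exp(-(t1 + t2) * x)

def sensitivities(x, t1, t2, h=1e-6):
    y0 = model(x, t1, t2)
    g1 = (model(x, t1 + h, t2) - y0) / h   # dy/dt1
    g2 = (model(x, t1, t2 + h) - y0) / h   # dy/dt2
    return g1, g2

g1, g2 = sensitivities(1.5, 0.3, 0.4)
# g1 and g2 agree to rounding: no observation at any x separates t1 from t2.
```

In a fitting problem this shows up as a singular (or nearly singular) Gauss-Newton system, and only the sum $\theta_1 + \theta_2$ can be estimated.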

### Sparse Sensitivities

Large scientific systems often have sparse dependence structure.

For example, each output $y_i$ may depend only on a few nearby variables, as in stencil-based PDE discretizations. The Jacobian then contains mostly zeros.

| Structure | Benefit |
|---|---|
| Sparse Jacobian | Reduced memory |
| Sparse Hessian | Faster second-order methods |
| Block structure | Parallel computation |
| Local coupling | Efficient differentiation |

Scientific AD systems exploit sparsity aggressively.

Methods include:

- graph coloring,
- compressed Jacobians,
- sparse linear algebra,
- local stencil propagation.

Without sparsity exploitation, many large simulations become computationally infeasible.
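
A sketch of graph-coloring compression for a hypothetical tridiagonal system: columns $j$ with $j \bmod 3 = c$ touch disjoint rows, so three seed vectors recover the entire Jacobian from three directional evaluations instead of $n$. Finite differences stand in for AD JVPs.

```python
n = 9

def f(x):
    # 1D stencil: each output couples only neighboring inputs.
    return [x[i] ** 2
            + (x[i - 1] if i > 0 else 0.0)
            + (x[i + 1] if i < n - 1 else 0.0) for i in range(n)]

def jvp_fd(x, u, h=1e-6):
    """Directional derivative J @ u by forward difference."""
    y0 = f(x)
    y1 = f([xi + h * ui for xi, ui in zip(x, u)])
    return [(b - a) / h for a, b in zip(y0, y1)]

x = [float(i + 1) for i in range(n)]
J = [[0.0] * n for _ in range(n)]
for c in range(3):                         # one compressed JVP per color
    seed = [1.0 if j % 3 == c else 0.0 for j in range(n)]
    compressed = jvp_fd(x, seed)
    for j in range(c, n, 3):               # de-compress: column j only
        for i in range(max(0, j - 1), min(n, j + 2)):   # rows j-1..j+1
            J[i][j] = compressed[i]
# Expected structure: diagonal 2*x[j], off-diagonals 1, zeros elsewhere.
```

Three evaluations suffice for any $n$ with this stencil, which is the essential economy that coloring-based sparse differentiation provides.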

### Sensitivity in Machine Learning

Deep learning is a sensitivity propagation problem.

Each layer computes

$$
h_{k+1} = \phi_k(h_k,\theta_k).
$$

Backpropagation computes

$$
\frac{\partial L}{\partial \theta_k}.
$$

This is reverse-mode sensitivity analysis through a layered computational graph.

Problems such as vanishing gradients arise because repeated Jacobian products shrink:

$$
\Bigl\|\prod_k J_k\Bigr\| \to 0.
$$

Exploding gradients occur when the repeated products grow without bound:

$$
\Bigl\|\prod_k J_k\Bigr\| \to \infty.
$$

Training stability is therefore a sensitivity conditioning problem.
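
The scalar version of this effect is a simple sketch: through $K$ identical layers whose Jacobian has scale factor $s$, the backpropagated gradient scales like $s^K$.

```python
def backprop_scale(s, K):
    """Gradient magnitude after K layers, each with Jacobian scale s."""
    g = 1.0
    for _ in range(K):
        g *= s
    return g

vanish = backprop_scale(0.9, 100)    # ~2.7e-5: the gradient vanishes
explode = backprop_scale(1.1, 100)   # ~1.4e4: the gradient explodes
```

A per-layer factor only 10% away from one changes the gradient by four orders of magnitude over 100 layers, which is why normalization and careful initialization aim to keep Jacobian scales near one.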

### Practical Interpretation

Sensitivity values must be interpreted carefully.

A large derivative does not automatically mean a parameter is important. Units matter.

For example:

$$
\frac{\partial y}{\partial \theta}
$$

depends on scaling. Normalized sensitivities are often preferable:

$$
\frac{\theta}{y}
\frac{\partial y}{\partial \theta}.
$$

These measure relative influence.
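
A sketch of the unit-invariance, using the hypothetical linear model $y = c\,\theta$: the raw derivative $dy/d\theta = c$ changes with the units of $\theta$, but the normalized sensitivity $(\theta/y)\,dy/d\theta$ does not.

```python
def relative_sensitivity(theta, c):
    """Normalized sensitivity (theta/y) * dy/dtheta for y = c * theta."""
    y = c * theta
    dyd = c                  # dy/dtheta, known analytically for this model
    return (theta / y) * dyd

s_seconds = relative_sensitivity(2.0, 5.0)        # theta in seconds
s_millis = relative_sensitivity(2000.0, 0.005)    # same quantity, rescaled
```

Both calls return exactly 1, even though the raw derivatives differ by a factor of 1000; this is why normalized sensitivities are the right basis for comparing parameters with different units.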

Interpretation also depends on:

- parameter ranges,
- noise levels,
- model assumptions,
- numerical stability,
- identifiability.

Sensitivity analysis is therefore both numerical and statistical.

### Summary

Sensitivity analysis studies how perturbations propagate through computational systems. Automatic differentiation provides the machinery for computing these sensitivities accurately and efficiently.

Forward-mode methods propagate perturbations directly. Reverse-mode methods propagate influence backward from outputs. Continuous sensitivity equations extend the same ideas to dynamical systems. Implicit differentiation handles equilibrium and solver-defined systems.

In large scientific applications, sensitivity analysis is not merely a derivative computation technique. It becomes a structural tool for understanding stability, uncertainty, conditioning, identifiability, and control of complex systems.

