# Applications Across Science and Engineering

Automatic differentiation became important because derivatives are needed wherever numerical models are optimized, controlled, calibrated, or analyzed. Once a system can compute derivatives automatically, entire classes of algorithms become practical at large scale.

The same mathematical mechanism appears across many fields:

$$
\text{compute value}
\quad \rightarrow \quad
\text{compute sensitivity}.
$$

The value may be a loss, simulation result, likelihood, energy, trajectory, or prediction. The derivative describes how that quantity changes with respect to inputs, parameters, or state variables.

## Optimization

Optimization is the most direct application of derivatives.

Suppose we want to minimize

$$
f(x).
$$

Gradient-based optimization updates parameters according to

$$
x_{k+1} =
x_k -
\eta \nabla f(x_k),
$$

where:

- $\eta$ is a step size
- $\nabla f(x_k)$ is the gradient

The gradient identifies local descent directions. Without derivatives, optimization must rely on derivative-free search methods, which scale poorly in high dimensions.

Automatic differentiation enables large-scale optimization because it computes gradients efficiently for programs with millions or billions of parameters.
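
As a concrete sketch, the update rule above maps directly onto a few lines of code. The example below uses JAX (chosen here purely for illustration; any AD framework follows the same pattern) to minimize a toy quadratic.

```python
import jax
import jax.numpy as jnp

# Toy objective; in practice f may be a large model loss or simulation output.
def f(x):
    return jnp.sum((x - 3.0) ** 2)

grad_f = jax.grad(f)   # AD produces the gradient function automatically

x = jnp.zeros(5)       # initial iterate x_0
eta = 0.1              # step size
for _ in range(100):
    x = x - eta * grad_f(x)   # x_{k+1} = x_k - eta * grad f(x_k)

print(x)  # approaches the minimizer [3, 3, 3, 3, 3]
```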

Applications include:

| Area | Optimization target |
|---|---|
| Machine learning | Loss functions |
| Engineering design | Cost and constraint functions |
| Robotics | Trajectory objectives |
| Finance | Risk-adjusted returns |
| Control systems | Policy and stability objectives |
| Physics | Energy minimization |

In many cases, the optimizer itself becomes part of a larger differentiable system.

## Machine Learning

Machine learning is currently the most visible application of automatic differentiation.

A model defines a function:

$$
\hat{y} = f_\theta(x),
$$

where:

- $x$ is input data
- $\theta$ are parameters
- $\hat{y}$ is the prediction

Training minimizes a loss:

$$
L(\theta) =
\sum_i
\ell(f_\theta(x_i), y_i).
$$

The required gradient is

$$
\nabla_\theta L.
$$

Reverse mode AD computes this efficiently because the loss is scalar while the parameter vector is large.

Neural networks are compositions of differentiable layers:

$$
f(x) =
f_n(f_{n-1}(\cdots f_1(x))).
$$

Backpropagation applies the chain rule through this composition.
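
A minimal sketch of this pattern in JAX (the two-layer model and its dimensions are made up for illustration): the model is a composition of differentiable layers, the loss is scalar, and a single reverse-mode sweep yields gradients for every parameter.

```python
import jax
import jax.numpy as jnp

def predict(params, x):
    # Two-layer network: f(x) = W2 tanh(W1 x + b1) + b2
    h = jnp.tanh(params["W1"] @ x + params["b1"])
    return params["W2"] @ h + params["b2"]

def loss(params, xs, ys):
    preds = jax.vmap(lambda x: predict(params, x))(xs)
    return jnp.mean((preds - ys) ** 2)   # scalar loss L(theta)

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
params = {
    "W1": jax.random.normal(k1, (4, 3)),
    "b1": jnp.zeros(4),
    "W2": jax.random.normal(k2, (1, 4)),
    "b2": jnp.zeros(1),
}
xs = jax.random.normal(k3, (8, 3))   # 8 inputs of dimension 3
ys = jnp.zeros((8, 1))               # matching targets

# One reverse-mode sweep: gradients for all parameters at once.
grads = jax.grad(loss)(params, xs, ys)
```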

Modern machine learning systems depend on AD for:

| Model type | Derivative use |
|---|---|
| Feedforward networks | Parameter gradients |
| Transformers | Attention and layer gradients |
| RNNs | Temporal gradient propagation |
| Diffusion models | Score function learning |
| Reinforcement learning | Policy gradients |
| Meta-learning | Higher-order gradients |

Without AD, training these systems would be computationally infeasible.

## Scientific Simulation

Many scientific simulations solve forward problems:

$$
u = S(p),
$$

where:

- $p$ are parameters
- $S$ is a simulation
- $u$ is the simulated state

Examples include fluid dynamics, climate models, elasticity, electromagnetics, and molecular dynamics.

Scientific computing often requires sensitivities:

$$
\frac{\partial u}{\partial p}.
$$

These sensitivities support:

| Task | Purpose |
|---|---|
| Calibration | Fit parameters to data |
| Sensitivity analysis | Identify influential parameters |
| Inverse problems | Recover hidden causes |
| Design optimization | Improve system behavior |
| Uncertainty quantification | Propagate uncertainty |

Finite differences are often too expensive because each parameter perturbation requires rerunning the simulation.

Adjoint methods, which can be viewed as reverse mode AD specialized to PDEs and large simulations, make these problems tractable.
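
As a toy illustration (not a production adjoint solver), the sketch below differentiates a scalar functional of a simulated state through an explicit time-stepping loop. One reverse sweep yields sensitivities with respect to all ten parameters, where finite differences would need one full rerun per parameter.

```python
import jax
import jax.numpy as jnp

def simulate(p):
    # Toy forward model S(p): explicit Euler on du/dt = -p * u (elementwise).
    u = jnp.ones_like(p)
    dt = 0.01
    for _ in range(100):
        u = u - dt * p * u
    return u

def objective(p):
    return jnp.sum(simulate(p) ** 2)   # scalar functional of the final state

p = jnp.linspace(0.5, 2.0, 10)
# All ten sensitivities d objective / d p_i from a single reverse sweep.
sens = jax.grad(objective)(p)
```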

## Inverse Problems

An inverse problem reconstructs unknown parameters from observations.

Suppose:

$$
y = S(p)
$$

is a forward model. Given observed data $y^*$, we seek parameters $p$ minimizing

$$
L(p) =
\|S(p)-y^*\|^2.
$$

Examples include:

| Field | Unknown quantity |
|---|---|
| Medical imaging | Tissue structure |
| Geophysics | Subsurface properties |
| Astronomy | Physical parameters |
| Seismology | Earth structure |
| Tomography | Internal densities |

The optimization requires derivatives of the simulation with respect to parameters. AD provides these sensitivities automatically or semi-automatically.
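
The sketch below illustrates this loop with a deliberately simple forward model (the function `forward` is made up; a real S would be a PDE solver or imaging operator). AD supplies the gradient of the misfit through the forward model, and plain gradient descent recovers the parameters.

```python
import jax
import jax.numpy as jnp

def forward(p):
    # Stand-in forward model S(p).
    return jnp.sin(p) + 0.5 * p

p_true = jnp.array([0.3, 1.2, -0.7])
y_star = forward(p_true)              # synthetic observations y*

def misfit(p):
    return jnp.sum((forward(p) - y_star) ** 2)   # L(p) = ||S(p) - y*||^2

p = jnp.zeros(3)                      # initial guess
grad_misfit = jax.grad(misfit)
for _ in range(500):
    p = p - 0.1 * grad_misfit(p)      # gradient descent on the misfit
```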

## Computational Fluid Dynamics

Fluid simulations are governed by partial differential equations such as the Navier-Stokes equations.

A simulation may depend on:

- geometry
- boundary conditions
- material parameters
- control variables

Engineers often need gradients of objectives such as drag, lift, pressure loss, or energy efficiency.

For example:

$$
J(\theta) =
\text{drag}(\theta).
$$

Optimizing shape parameters requires

$$
\nabla_\theta J.
$$

Finite differences become impractical because each parameter perturbation requires a full simulation.

Adjoint differentiation computes these gradients at much lower cost.

This has enabled modern aerodynamic optimization for aircraft, turbines, and flow systems.
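
A full Navier-Stokes adjoint is far beyond a short example, but the pattern can be sketched on a 1D channel flow (the parameter names here are illustrative). A drag-like wall-shear objective is differentiated through a linear solve; JAX's ability to differentiate through `jnp.linalg.solve` plays the role that the adjoint solver plays in real CFD codes.

```python
import jax
import jax.numpy as jnp

N = 50                                  # interior grid points
dx = 1.0 / (N + 1)

def velocity_profile(mu, G):
    # Steady 1D channel flow: mu * u'' = -G with no-slip walls (u = 0 at ends).
    main = -2.0 * jnp.ones(N)
    off = jnp.ones(N - 1)
    A = (mu / dx**2) * (jnp.diag(main) + jnp.diag(off, 1) + jnp.diag(off, -1))
    b = -G * jnp.ones(N)
    return jnp.linalg.solve(A, b)       # AD flows through the linear solve

def wall_shear(theta):
    mu, G = theta
    u = velocity_profile(mu, G)
    return mu * u[0] / dx               # drag-like objective: shear at the wall

# Gradient of the objective with respect to viscosity and pressure gradient.
grad_J = jax.grad(wall_shear)(jnp.array([1.0, 2.0]))
```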

## Robotics and Control

Robotic systems involve dynamics, geometry, sensing, and control.

A robot trajectory may be defined by:

$$
x_{t+1} = f(x_t, u_t),
$$

where:

- $x_t$ is state
- $u_t$ is control input

Optimization-based control methods require derivatives of future trajectories with respect to controls.

Applications include:

| Problem | Derivative use |
|---|---|
| Trajectory optimization | State sensitivities |
| Model predictive control | Control gradients |
| Robot calibration | Parameter estimation |
| SLAM | Optimization over geometry |
| Policy learning | Reinforcement gradients |

Differentiable simulation has become especially important in robotics because learning and control increasingly interact.
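
A minimal differentiable-rollout sketch (toy point-mass dynamics; all constants are illustrative): the dynamics are unrolled, a trajectory cost is defined on the final state, and reverse mode propagates sensitivities from the cost back to every control input.

```python
import jax
import jax.numpy as jnp

def step(x, u):
    # Toy point-mass dynamics: x = (position, velocity), u = acceleration.
    pos, vel = x
    dt = 0.1
    return jnp.array([pos + dt * vel, vel + dt * u])

def trajectory_cost(controls, x0, target):
    x = x0
    for u in controls:                 # unrolled rollout x_{t+1} = f(x_t, u_t)
        x = step(x, u)
    return (x[0] - target) ** 2 + 0.01 * jnp.sum(controls ** 2)

controls = jnp.zeros(20)
x0 = jnp.array([0.0, 0.0])
# Gradient of the trajectory cost with respect to all 20 controls at once.
g = jax.grad(trajectory_cost)(controls, x0, 1.0)
```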

## Computational Finance

Financial models often depend on parameters such as interest rates, volatility, and asset prices.

Sensitivities of instrument values with respect to these parameters are called Greeks.

For example:

| Greek | Derivative |
|---|---|
| Delta | $\partial V / \partial S$ |
| Vega | $\partial V / \partial \sigma$ |
| Theta | $\partial V / \partial t$ |

where:

- $V$ is option value
- $S$ is asset price
- $\sigma$ is volatility

AD allows many sensitivities to be computed simultaneously and accurately.

Monte Carlo pricing systems benefit especially from reverse mode methods because many sensitivities can be obtained from a single backward sweep.
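
As an illustration, the sketch below differentiates the closed-form Black-Scholes call price to obtain delta and vega. The same pattern extends to Monte Carlo pricers, where one backward sweep returns sensitivities to all model inputs.

```python
import jax
import jax.numpy as jnp
from jax.scipy.stats import norm

def call_price(S, sigma, T, K=100.0, r=0.05):
    # Closed-form Black-Scholes price of a European call.
    d1 = (jnp.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * jnp.sqrt(T))
    d2 = d1 - sigma * jnp.sqrt(T)
    return S * norm.cdf(d1) - K * jnp.exp(-r * T) * norm.cdf(d2)

S, sigma, T = 105.0, 0.2, 1.0
delta = jax.grad(call_price, argnums=0)(S, sigma, T)   # dV/dS
vega = jax.grad(call_price, argnums=1)(S, sigma, T)    # dV/dsigma
```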

## Probabilistic Programming

Probabilistic models define distributions rather than deterministic outputs.

A typical task is maximizing a log-likelihood:

$$
\log p_\theta(x).
$$

Inference algorithms often require gradients:

$$
\nabla_\theta \log p_\theta(x).
$$

Hamiltonian Monte Carlo, variational inference, and gradient-based Bayesian methods all depend on efficient derivative computation.

Probabilistic programming systems therefore integrate AD deeply into their runtime semantics.
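
A small sketch of the core requirement (a Gaussian model chosen purely for illustration): write the log-density as ordinary code, and AD returns the score function that HMC, variational inference, and maximum likelihood all consume.

```python
import jax
import jax.numpy as jnp

def log_likelihood(theta, x):
    # Gaussian log p_theta(x) with theta = (mean, log standard deviation).
    mu, log_sigma = theta
    sigma = jnp.exp(log_sigma)
    return jnp.sum(
        -0.5 * ((x - mu) / sigma) ** 2 - log_sigma - 0.5 * jnp.log(2 * jnp.pi)
    )

x = jnp.array([0.8, 1.1, 0.9, 1.3])
theta = jnp.array([0.0, 0.0])
# The score: grad_theta log p_theta(x).
score = jax.grad(log_likelihood)(theta, x)
```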

## Computer Graphics

Modern graphics increasingly uses differentiable rendering.

A renderer computes:

$$
I = R(\theta),
$$

where:

- $I$ is an image
- $\theta$ describes scene parameters

Differentiable rendering computes:

$$
\frac{\partial I}{\partial \theta}.
$$

This enables optimization over:

| Parameter | Example |
|---|---|
| Geometry | Shape reconstruction |
| Materials | Reflectance estimation |
| Lighting | Illumination recovery |
| Camera pose | Tracking and calibration |

Applications include inverse graphics, neural rendering, scene reconstruction, and synthetic data generation.
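
Real differentiable renderers must handle visibility and discontinuities carefully; the toy sketch below sidesteps that with a smooth 1D "renderer" (a Gaussian blob on a pixel grid, with made-up values) to show the optimization pattern: render, compare to a target image, and descend the gradient of the scene parameters.

```python
import jax
import jax.numpy as jnp

pixels = jnp.linspace(0.0, 1.0, 64)

def render(theta):
    # Smooth toy renderer: a blob whose center and width are scene parameters.
    center, width = theta
    return jnp.exp(-(((pixels - center) / width) ** 2))

target = render(jnp.array([0.6, 0.15]))   # "observed" image I*

def image_loss(theta):
    return jnp.mean((render(theta) - target) ** 2)

theta = jnp.array([0.4, 0.2])             # initial scene guess
grad_loss = jax.grad(image_loss)
for _ in range(500):
    theta = theta - 0.05 * grad_loss(theta)   # fit the scene to the image
```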

## Signal Processing

Signal processing systems frequently optimize filters, transforms, or latent representations.

Examples include:

| Task | Derivative use |
|---|---|
| Audio synthesis | Parameter optimization |
| Image restoration | Loss minimization |
| Compression | Rate-distortion optimization |
| Communications | Channel estimation |
| Spectral methods | Differentiable transforms |

Many operations are linear, but modern pipelines increasingly include learned or nonlinear components. AD allows gradients to flow through the entire pipeline.
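
One concrete sketch (a toy denoising setup with made-up signals): treat an FIR filter's taps as parameters and fit them by gradient descent, with AD carrying gradients through the convolution.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
t = jnp.linspace(0.0, 1.0, 256)
clean = jnp.sin(2 * jnp.pi * 5 * t)                     # desired output
noisy = clean + 0.3 * jax.random.normal(key, t.shape)   # filter input

def apply_filter(taps, signal):
    # Differentiable FIR filtering via convolution.
    return jnp.convolve(signal, taps, mode="same")

def loss(taps):
    return jnp.mean((apply_filter(taps, noisy) - clean) ** 2)

taps = jnp.zeros(15).at[7].set(1.0)   # start from an identity filter
grad_loss = jax.grad(loss)
for _ in range(500):
    taps = taps - 0.1 * grad_loss(taps)
```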

## Differentiable Physics

Differentiable physics systems combine simulation with optimization or learning.

A simulation becomes part of a computational graph:

$$
x_{t+1} = F(x_t, u_t, \theta).
$$

Unrolling these steps makes the entire trajectory differentiable with respect to state, controls, and parameters.

Applications include:

| Area | Objective |
|---|---|
| Soft robotics | Learn control policies |
| Material systems | Estimate parameters |
| Animation | Physics-constrained optimization |
| Scientific ML | Hybrid simulation-learning models |

Differentiable simulators blur the boundary between numerical solvers and trainable systems.
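
A minimal parameter-estimation sketch (a spring-mass system with illustrative constants): the simulator is ordinary code, so an unknown stiffness can be fit by gradient descent through the entire rollout.

```python
import jax
import jax.numpy as jnp

def simulate(k, steps=200, dt=0.01):
    # Symplectic Euler for a unit mass on a spring with stiffness k.
    pos, vel = 1.0, 0.0
    traj = []
    for _ in range(steps):
        vel = vel - dt * k * pos
        pos = pos + dt * vel
        traj.append(pos)
    return jnp.stack(traj)

observed = simulate(4.0)              # data generated with "true" k = 4

def loss(k):
    return jnp.mean((simulate(k) - observed) ** 2)

k = 1.0
grad_loss = jax.grad(loss)
for _ in range(200):
    k = k - 0.2 * grad_loss(k)        # under these settings k moves toward 4
```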

## Databases and Query Systems

More recent work explores differentiable databases and differentiable query execution.

Traditional databases are discrete systems. But modern applications increasingly combine retrieval, ranking, recommendation, and learning.

Examples include differentiable or learned components such as:

- ranking functions
- embedding retrieval
- approximate joins
- neural query operators
- learned indexes

This area remains experimental, but it reflects a broader trend: treating large software systems as differentiable computational structures.
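
A sketch of one such component (softmax-weighted retrieval, a common differentiable surrogate for a hard top-1 lookup; all names and shapes here are illustrative): the lookup becomes a weighted average, so gradients flow to both the query and the stored embeddings.

```python
import jax
import jax.numpy as jnp

def soft_lookup(query, keys, values, temperature=0.1):
    # Differentiable surrogate for "retrieve the best-matching row":
    # a softmax over similarity scores replaces the hard argmax.
    scores = keys @ query / temperature
    weights = jax.nn.softmax(scores)
    return weights @ values

rng1, rng2 = jax.random.split(jax.random.PRNGKey(0))
keys = jax.random.normal(rng1, (100, 16))    # stored embeddings
values = jax.random.normal(rng2, (100, 8))   # associated records

def loss(query):
    return jnp.sum(soft_lookup(query, keys, values) ** 2)

# Gradients flow through the retrieval step to the query vector.
g = jax.grad(loss)(jnp.zeros(16))
```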

## Why AD Generalizes So Broadly

Automatic differentiation applies broadly because its machinery is not tied to any single domain.

It only requires:

1. A computation
2. Differentiable primitive operations
3. A chain-rule propagation mechanism

Once those ingredients exist, the same machinery works for:

- neural networks
- PDE solvers
- rendering systems
- optimization pipelines
- probabilistic models
- control systems

The derivative engine is domain-independent. Only the primitive operations and computational structure change.
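
The point is visible in code: the same derivative transform applies unchanged to functions from entirely different domains (both functions below are toys).

```python
import jax
import jax.numpy as jnp

def polynomial(x):               # a closed-form mathematical function
    return x**3 - 2 * x

def mini_simulation(x):          # a "simulation": an unrolled update loop
    for _ in range(10):
        x = x + 0.1 * jnp.sin(x)
    return x

# One domain-independent mechanism serves both computations.
print(jax.grad(polynomial)(1.5))
print(jax.grad(mini_simulation)(1.5))
```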

## From Numerical Programs to Differentiable Systems

Originally, automatic differentiation differentiated isolated functions.

Modern systems increasingly differentiate entire pipelines:

```text
data
  -> preprocessing
  -> simulation
  -> neural model
  -> optimization
  -> evaluation
```

Each stage may involve different abstractions:

- tensors
- sparse matrices
- solvers
- probabilistic samplers
- graph operations
- database queries

The long-term direction of AD is therefore broader than gradient computation alone.

The broader goal is differentiable systems infrastructure: computational environments where sensitivity information flows through the same pathways as ordinary computation.

