# Molecular Simulation

Molecular simulation models the behavior of atoms and molecules using physical interaction laws. Automatic differentiation is important because many molecular methods require gradients of energy functions with respect to particle coordinates, force-field parameters, or quantum variables.

A molecular system is typically described by particle positions

$$
x = (x_1,\ldots,x_N),
$$

where each particle position

$$
x_i \in \mathbb{R}^3.
$$

The system energy is

$$
E(x,\theta),
$$

where $\theta$ contains force-field or model parameters.

Most computational tasks reduce to differentiating this energy.

### Forces as Energy Gradients

In classical mechanics, forces are negative energy gradients:

$$
F_i = -\frac{\partial E}{\partial x_i}.
$$

This relation is the foundation of molecular dynamics.

A molecular simulation repeatedly performs:

1. compute energy,
2. compute forces,
3. integrate equations of motion,
4. update positions and velocities.

Automatic differentiation naturally computes forces because force computation is exactly reverse-mode differentiation of the scalar energy function.
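
As a minimal sketch in JAX (a toy harmonic bond stands in for a real force field), a single reverse-mode pass over the scalar energy yields every force component:

```python
import jax
import jax.numpy as jnp

def energy(x):
    # Toy harmonic bond between two particles; any differentiable E(x) works.
    r = jnp.linalg.norm(x[0] - x[1])
    return 0.5 * (r - 1.0) ** 2

x = jnp.array([[0.0, 0.0, 0.0],
               [1.5, 0.0, 0.0]])

# One reverse-mode pass over the scalar energy gives all force components.
forces = -jax.grad(energy)(x)   # shape (2, 3)
```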

### Pairwise Interaction Potentials

A simple molecular model uses pairwise potentials:

$$
E(x) =
\sum_{i<j}
\phi(r_{ij}),
$$

where

$$
r_{ij} = \|x_i - x_j\|.
$$

A common example is the Lennard-Jones potential:

$$
\phi(r) =
4\epsilon
\left[
\left(
\frac{\sigma}{r}
\right)^{12} -
\left(
\frac{\sigma}{r}
\right)^6
\right].
$$

The force follows from differentiation:

$$
F_i = -
\frac{\partial E}{\partial x_i}.
$$

Since the energy is scalar and the number of coordinates is large, reverse-mode AD is ideal.
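
A dense JAX sketch makes this concrete; it is illustrative only, since practical codes exploit the cutoff structure summarized below:

```python
import jax
import jax.numpy as jnp

def lennard_jones(x, epsilon=1.0, sigma=1.0):
    # All pairwise distances; the identity matrix keeps the diagonal away from r = 0.
    diff = x[:, None, :] - x[None, :, :]
    r = jnp.sqrt(jnp.sum(diff**2, axis=-1) + jnp.eye(x.shape[0]))
    sr6 = (sigma / r) ** 6
    pair = 4.0 * epsilon * (sr6**2 - sr6)
    # Count each pair once (i < j); the diagonal is masked out entirely.
    return jnp.sum(jnp.triu(pair, k=1))

x = 2.0 * jax.random.normal(jax.random.PRNGKey(0), (8, 3))
forces = -jax.grad(lennard_jones)(x)   # shape (8, 3)
```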

The computational graph is usually regular and local:

| Structure | Consequence |
|---|---|
| Pairwise interactions | Sparse dependence |
| Short-range cutoffs | Local gradient propagation |
| Neighbor lists | Reduced complexity |
| Shared force law | Repeated derivative structure |

### Molecular Dynamics

Classical molecular dynamics evolves positions according to Newton's laws:

$$
m_i \frac{d^2 x_i}{dt^2} =
F_i(x).
$$

Equivalently,

$$
m_i \frac{d^2 x_i}{dt^2} = -
\frac{\partial E}{\partial x_i}.
$$

A simulation integrates these equations numerically.

For example, velocity Verlet updates are:

$$
x_{k+1} =
x_k
+
v_k \Delta t
+
\frac{1}{2}a_k \Delta t^2,
$$

$$
v_{k+1} =
v_k
+
\frac{1}{2}(a_k+a_{k+1})\Delta t.
$$

Here $a_k = F(x_k)/m$ is the acceleration at step $k$. Differentiating through the trajectory gives sensitivities of observables to parameters or initial conditions.
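
A sketch in JAX: `jax.lax.scan` carries the state forward, the force inside each step is itself a reverse-mode gradient, and the whole trajectory remains differentiable end to end:

```python
import jax
import jax.numpy as jnp

def velocity_verlet(energy, x0, v0, dt, n_steps, mass=1.0):
    # a(x) = F(x)/m = -grad E(x)/m, computed by reverse-mode AD at every step.
    accel = lambda x: -jax.grad(energy)(x) / mass

    def step(state, _):
        x, v, a = state
        x_next = x + v * dt + 0.5 * a * dt**2
        a_next = accel(x_next)
        v_next = v + 0.5 * (a + a_next) * dt
        return (x_next, v_next, a_next), x_next

    state0 = (x0, v0, accel(x0))
    (x_T, v_T, _), trajectory = jax.lax.scan(step, state0, None, length=n_steps)
    return x_T, v_T, trajectory
```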

### Sensitivity of Trajectories

Suppose the trajectory depends on parameters $\theta$:

$$
x(t)=x(t;\theta).
$$

An observable may be

$$
L(\theta)=\ell(x(T)).
$$

AD computes

$$
\nabla_\theta L.
$$
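
Continuing the earlier sketch (reusing the `velocity_verlet` function above, with a toy harmonic energy whose stiffness plays the role of $\theta$), this is one `jax.grad` call through every integration step:

```python
import jax
import jax.numpy as jnp

def loss(theta, x0, v0):
    # Harmonic pair energy with stiffness theta; x(T) depends on theta
    # only through the dynamics.
    energy = lambda x: 0.5 * theta * (jnp.linalg.norm(x[0] - x[1]) - 1.0) ** 2
    x_T, _, _ = velocity_verlet(energy, x0, v0, dt=1e-3, n_steps=1000)
    return jnp.sum(x_T**2)   # a scalar observable l(x(T))

x0 = jnp.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
v0 = jnp.zeros_like(x0)
dL_dtheta = jax.grad(loss)(1.0, x0, v0)   # sensitivity of the observable to theta
```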

Applications include:

| Application | Differentiated quantity |
|---|---|
| Force-field fitting | Energy parameters |
| Drug design | Binding scores |
| Protein folding | Structural objectives |
| Material design | Elastic or thermal properties |
| Enhanced sampling | Bias potentials |
| Inverse molecular design | Atomic configuration objectives |

Trajectory differentiation is difficult because simulations may be long, chaotic, and memory-intensive.

### Chaotic Dynamics

Molecular systems are often chaotic. Nearby trajectories diverge exponentially:

$$
\|\delta x(t)\|
\approx
e^{\lambda t}
\|\delta x(0)\|.
$$

This creates unstable long-horizon gradients.

A gradient may become dominated by numerical noise after enough integration time. This is not a defect of AD; it reflects the sensitivity of the physical system itself.

Practical approaches include:

| Method | Purpose |
|---|---|
| Shorter horizons | Reduce instability |
| Averaged observables | Stabilize statistics |
| Shadowing methods | Improve chaotic sensitivities |
| Reparameterization | Reduce variance |
| Stochastic estimators | Avoid trajectory explosion |
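
As one concrete realization of the shorter-horizon idea, gradient flow can be cut periodically with `jax.lax.stop_gradient`. The sketch below wraps a hypothetical integrator step of the `scan` form used earlier:

```python
import jax
import jax.numpy as jnp

def truncate_every(step, horizon):
    # Wrap an integrator step so gradient flow is cut every `horizon` steps,
    # bounding how far sensitivities propagate backward in time.
    def wrapped(state, k):
        detached = jax.tree_util.tree_map(jax.lax.stop_gradient, state)
        state = jax.lax.cond(k % horizon == 0, lambda: detached, lambda: state)
        return step(state, None)
    return wrapped

# Usage with a step function like the velocity Verlet sketch above:
#   final, xs = jax.lax.scan(truncate_every(step, 100), state0, jnp.arange(n_steps))
```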

Long-time molecular sensitivities remain an active research problem.

### Constraints

Molecular systems frequently contain constraints:

$$
g(x)=0.
$$

Examples include fixed bond lengths or rigid molecular groups.

Constraint algorithms such as SHAKE or RATTLE solve an implicit system at each time step.

Differentiating through constraints requires care. Suppose positions are projected after each step so that $g(x) = 0$ holds. Then derivatives must account for the projection step.

Naively ignoring the projection produces inconsistent forces or sensitivities.

Constraint differentiation is usually implemented with implicit differentiation rather than unrolling every iterative correction.
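
A generic sketch of this pattern in JAX uses `jax.custom_vjp`: the forward pass runs an iterative solver (here a placeholder fixed-point map `T`, standing in for a SHAKE-style projection) to convergence, and the backward pass applies the implicit function theorem rather than unrolling the iterations:

```python
import jax
import jax.numpy as jnp
from functools import partial

@partial(jax.custom_vjp, nondiff_argnums=(0,))
def fixed_point(T, params, y0):
    # Forward: iterate y <- T(params, y) to convergence.
    def cond(carry):
        y_prev, y = carry
        return jnp.linalg.norm(y - y_prev) > 1e-8
    def body(carry):
        _, y = carry
        return y, T(params, y)
    _, y_star = jax.lax.while_loop(cond, body, (y0, T(params, y0)))
    return y_star

def fixed_point_fwd(T, params, y0):
    y_star = fixed_point(T, params, y0)
    return y_star, (params, y_star)

def fixed_point_bwd(T, res, y_bar):
    params, y_star = res
    # Implicit function theorem: solve w = y_bar + (dT/dy)^T w at the solution,
    # then pull w back through dT/dparams. No forward iterations are stored.
    _, vjp_y = jax.vjp(lambda y: T(params, y), y_star)
    def cond(carry):
        w_prev, w = carry
        return jnp.linalg.norm(w - w_prev) > 1e-8
    def body(carry):
        _, w = carry
        return w, y_bar + vjp_y(w)[0]
    _, w = jax.lax.while_loop(cond, body, (y_bar, y_bar + vjp_y(y_bar)[0]))
    _, vjp_p = jax.vjp(lambda p: T(p, y_star), params)
    return vjp_p(w)[0], jnp.zeros_like(y_star)

fixed_point.defvjp(fixed_point_fwd, fixed_point_bwd)

# Example: y* = sqrt(p) as the fixed point of Newton's update.
T = lambda p, y: 0.5 * (y + p / y)
g = jax.grad(lambda p: jnp.sum(fixed_point(T, p, jnp.ones(3))))(jnp.array([2.0, 3.0, 4.0]))
# g equals 1 / (2 sqrt(p)), obtained without differentiating through the iterations.
```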

### Thermodynamic Ensembles

Many simulations model thermal equilibrium rather than deterministic trajectories.

For example, the Boltzmann distribution is

$$
p(x) =
\frac{1}{Z}
e^{-\beta E(x)},
$$

where

$$
Z =
\int e^{-\beta E(x)}dx
$$

is the partition function.

Observable expectations are

$$
\mathbb{E}[f(x)] =
\int f(x)p(x)\,dx.
$$

Differentiating these expectations is central to statistical mechanics and probabilistic modeling.

The derivative becomes

$$
\frac{\partial}{\partial \theta}
\mathbb{E}[f(x)] =
\mathbb{E}
\left[
\frac{\partial f}{\partial \theta}
\right] -
\beta
\operatorname{Cov}
\left(
f,
\frac{\partial E}{\partial \theta}
\right).
$$

This connects molecular simulation with gradient estimation in probabilistic inference.
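
This identity gives a simple sample-based estimator. The sketch below assumes hypothetical signatures `f(x, theta)` and `E(x, theta)` and a batch `xs` of samples already drawn from the Boltzmann distribution:

```python
import jax
import jax.numpy as jnp

def expectation_grad(f, E, theta, beta, xs):
    # xs: samples assumed drawn from p(x) proportional to exp(-beta * E(x, theta)).
    fs = jax.vmap(lambda x: f(x, theta))(xs)                              # f per sample
    df = jax.vmap(jax.grad(f, argnums=1), in_axes=(0, None))(xs, theta)   # df/dtheta
    dE = jax.vmap(jax.grad(E, argnums=1), in_axes=(0, None))(xs, theta)   # dE/dtheta
    cov = jnp.mean(fs[:, None] * dE, axis=0) - jnp.mean(fs) * jnp.mean(dE, axis=0)
    return jnp.mean(df, axis=0) - beta * cov
```

Only the smooth functions $f$ and $E$ are differentiated; the sampler that produced `xs` never enters the gradient computation, a point taken up in the next section.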

### Monte Carlo Methods

Monte Carlo simulation samples configurations instead of integrating trajectories.

A Markov chain produces states:

$$
x_0 \to x_1 \to \cdots.
$$

The acceptance probability often depends on energy differences:

$$
a(x \to x') =
\min
\left(
1,
e^{-\beta(E(x')-E(x))}
\right).
$$

Differentiating Monte Carlo algorithms is difficult because sampling introduces discontinuous accept/reject decisions.

Two approaches are common:

| Approach | Idea |
|---|---|
| Differentiate estimators | Treat samples as fixed |
| Reparameterized samplers | Push gradients through randomness |

Many practical systems differentiate only smooth energy evaluations and avoid differentiating the discrete acceptance step.

### Force-Field Optimization

A force field defines interaction energies through parameterized functions:

$$
E(x,\theta).
$$

Parameters include:

- bond force constants,
- angle-bending constants,
- electrostatic coefficients,
- Lennard-Jones parameters,
- torsion terms.

Given experimental or quantum reference data $z_i$, we define a fitting objective:

$$
L(\theta) =
\sum_i
\|f_i(\theta)-z_i\|^2,
$$

where $f_i(\theta)$ is the model's prediction (an energy, force, or other observable) for reference point $i$.

AD computes parameter gradients automatically.
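
For illustration, a force-matching objective, one common choice of $f_i$, reusing the `lennard_jones` sketch from earlier; `configs` and `ref_forces` are stand-ins for reference data:

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
configs = 2.0 * jax.random.normal(key, (16, 8, 3))   # stand-in reference configurations
ref_forces = jnp.zeros((16, 8, 3))                   # stand-in reference forces

def fit_loss(theta, configs, ref_forces):
    # Predicted forces from the lennard_jones sketch above, parameterized by theta.
    def pred_forces(x):
        e = lambda y: lennard_jones(y, epsilon=theta[0], sigma=theta[1])
        return -jax.grad(e)(x)
    pred = jax.vmap(pred_forces)(configs)
    return jnp.mean(jnp.sum((pred - ref_forces) ** 2, axis=(-2, -1)))

grad_theta = jax.grad(fit_loss)(jnp.array([1.0, 1.0]), configs, ref_forces)
```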

This is increasingly important because modern force fields may contain:

| Model type | Complexity |
|---|---|
| Classical force fields | Hand-designed analytic forms |
| Polarizable models | Coupled field equations |
| Neural potentials | Deep neural networks |
| Hybrid quantum-classical models | Nested differentiable solvers |

### Neural Potentials

Modern molecular simulation increasingly uses learned energy functions.

A neural network predicts energy:

$$
E_\theta(x).
$$

Forces are obtained by differentiation:

$$
F_i = -
\frac{\partial E_\theta}{\partial x_i}.
$$

This architecture has an important property: forces are automatically conservative because they derive from a scalar potential.

Models that predict forces directly, without an underlying scalar potential, can break this consistency, for example by producing non-conservative force fields that add or remove energy along closed paths.

AD makes energy-based modeling practical because the same computation graph produces both energies and forces.
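
A minimal sketch: a deliberately naive multilayer perceptron as $E_\theta$, with forces obtained from the same graph. Production neural potentials use symmetry-respecting architectures, as the next section discusses.

```python
import jax
import jax.numpy as jnp

def mlp_energy(params, x):
    # Naive MLP on flattened coordinates; illustrative only.
    h = x.ravel()
    for W, b in params[:-1]:
        h = jnp.tanh(W @ h + b)
    W, b = params[-1]
    return (W @ h + b).squeeze()   # scalar energy

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
params = [(0.1 * jax.random.normal(k1, (32, 24)), jnp.zeros(32)),
          (0.1 * jax.random.normal(k2, (1, 32)), jnp.zeros(1))]
x = jax.random.normal(jax.random.PRNGKey(1), (8, 3))   # 8 particles -> 24 inputs

# Forces come from the same graph as the energy and are conservative by construction.
forces = -jax.grad(mlp_energy, argnums=1)(params, x)   # shape (8, 3)
```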

### Symmetry Constraints

Molecular systems obey physical symmetries.

The energy should be invariant under:

| Transformation | Requirement |
|---|---|
| Translation | Energy unchanged |
| Rotation | Energy unchanged |
| Particle permutation | Energy respects particle identity rules |

Therefore, for any rotation $R$ and translation $t$ applied uniformly to all particles,

$$
E(x) =
E(Rx+t).
$$

Derivatives must preserve these symmetries.

For example, translational invariance implies:

$$
\sum_i F_i = 0.
$$

AD preserves these relations if the energy implementation itself respects symmetry.
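
This is easy to verify numerically for the `lennard_jones` sketch from earlier, whose energy depends only on interparticle distances and is therefore exactly translation-invariant:

```python
import jax
import jax.numpy as jnp

x = 2.0 * jax.random.normal(jax.random.PRNGKey(1), (8, 3))
F = -jax.grad(lennard_jones)(x)   # lennard_jones from the earlier sketch
print(jnp.sum(F, axis=0))         # approximately [0., 0., 0.] up to floating-point error
```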

### Neighbor Lists and Sparse Structure

Direct evaluation of all pairwise interactions costs

$$
O(N^2).
$$

Practical molecular simulations use neighbor lists or spatial partitioning.

Only nearby particles interact:

$$
r_{ij} < r_c.
$$

This creates sparse dependence structure.

AD systems must preserve locality. Dense Jacobian construction is infeasible for large systems.

Efficient implementations use:

| Technique | Benefit |
|---|---|
| Neighbor lists | Reduced interaction count |
| Cell grids | Local memory access |
| Sparse accumulation | Lower memory use |
| Kernel fusion | Better GPU throughput |
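
Cutoffs interact with differentiation: a hard cutoff makes the energy, and hence the forces, discontinuous at $r_c$. Practical potentials therefore multiply the pair term by a smooth switching function; the polynomial form sketched below is one common choice, not a fixed standard:

```python
import jax.numpy as jnp

def switch(r, r_on, r_c):
    # Smoothstep: 1 below r_on, 0 above r_c, with a continuous first derivative.
    t = jnp.clip((r_c - r) / (r_c - r_on), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)

def smoothed_pair(r, r_on=2.0, r_c=2.5, epsilon=1.0, sigma=1.0):
    # Lennard-Jones pair term that decays smoothly to zero at the cutoff.
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6**2 - sr6) * switch(r, r_on, r_c)
```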

### Quantum Molecular Methods

Quantum chemistry introduces wavefunctions, orbitals, and electronic structure calculations.

For example, Hartree-Fock methods solve nonlinear eigenvalue systems:

$$
F(C)C = SC\varepsilon.
$$

Density functional theory solves related variational systems.

Differentiation appears in:

- geometry optimization,
- force computation,
- parameter fitting,
- differentiable quantum chemistry,
- variational wavefunction learning.

Quantum methods often contain:

| Component | AD challenge |
|---|---|
| Eigenvalue decompositions | Degenerate spectra |
| Self-consistent field loops | Implicit differentiation |
| Basis transforms | Large dense tensors |
| Exchange-correlation models | Complex derivative chains |

Modern systems increasingly combine analytic derivative formulas with AD infrastructure.
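
As a small example of the eigenvalue case: `jnp.linalg.eigh` carries differentiation rules, so the gradient of a ground-state energy with respect to Hamiltonian parameters follows directly, provided the spectrum is non-degenerate:

```python
import jax
import jax.numpy as jnp

def ground_state_energy(theta):
    # Toy 2x2 parameterized Hamiltonian; eigenvalues are +/- sqrt(a^2 + b^2).
    a, b = theta
    H = jnp.array([[a, b],
                   [b, -a]])
    w, _ = jnp.linalg.eigh(H)
    return w[0]   # lowest eigenvalue

g = jax.grad(ground_state_energy)(jnp.array([1.0, 0.5]))
```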

### Enhanced Sampling

Rare molecular events may occur too slowly for direct simulation.

Enhanced sampling introduces bias potentials:

$$
E'(x)=E(x)+V(x,\theta).
$$

The bias helps explore important regions of configuration space.

Differentiable enhanced sampling methods optimize $V$ using gradients of exploration quality, free-energy estimates, or target distributions.

Examples include:

- metadynamics,
- variational enhanced sampling,
- differentiable umbrella sampling,
- learned reaction coordinates.

### Memory and Reverse Mode

Reverse-mode AD through long molecular trajectories has severe memory costs.

A trajectory with millions of atoms and millions of time steps cannot store every intermediate state.

Strategies include:

| Method | Idea |
|---|---|
| Checkpointing | Store selected states |
| Recomputation | Rebuild missing states |
| Reversible integrators | Recover earlier states |
| Adjoint dynamics | Backward sensitivity equations |
| Truncated gradients | Limit time horizon |

The choice depends on whether exact trajectory gradients are required.
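
In JAX, for instance, wrapping the integrator step from the earlier velocity Verlet sketch in `jax.checkpoint` (also exposed as `jax.remat`) discards each step's internal intermediates and recomputes them during the backward pass:

```python
import jax

# `step` and `state0` as in the velocity Verlet sketch above.
step_ckpt = jax.checkpoint(step)
final_state, xs = jax.lax.scan(step_ckpt, state0, None, length=100_000)
```

The carries saved by `scan` itself still grow with trajectory length; nested or multi-level checkpointing schemes address that dimension.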

### Differentiable Molecular Design

Inverse molecular design optimizes molecular structures directly.

A differentiable pipeline may look like:

```text
molecule representation
    -> geometry generation
    -> energy model
    -> simulation or relaxation
    -> property prediction
    -> scalar objective
    -> gradient-based update
```

The objective may target:

- stability,
- binding affinity,
- conductivity,
- catalytic efficiency,
- optical properties,
- synthesizability.

This transforms molecular simulation into a differentiable optimization system.

### Failure Modes

Differentiable molecular systems fail in characteristic ways.

| Failure mode | Cause |
|---|---|
| Exploding trajectory gradients | Chaotic dynamics |
| Nonphysical learned forces | Force model not energy-derived |
| Broken symmetry | Incorrect representation |
| Noisy gradients | Monte Carlo variance |
| Memory explosion | Long reverse trajectories |
| Inconsistent constraints | Ignored projection derivatives |
| Discontinuous derivatives | Hard cutoffs or switching functions |

Many production systems smooth discontinuities specifically to improve gradient behavior.

### Summary

Molecular simulation is fundamentally a differentiable physical system because forces derive from energy gradients. Automatic differentiation provides efficient force computation, parameter sensitivities, differentiable optimization, and learned energy modeling.

The major challenges are scale, chaotic dynamics, sparse interactions, constrained dynamics, and differentiating through long trajectories or quantum solvers. Effective systems combine AD with domain-specific numerical structure rather than treating molecular simulation as a generic dense tensor program.

