Physics-informed models combine data fitting with equations from physics or applied mathematics. The model is trained not only to match observed samples, but also to satisfy known laws such as conservation equations, differential equations, boundary conditions, or constitutive relations.
A common setting is a neural network approximation to an unknown function:

$$u_\theta(x, t) \approx u(x, t),$$

where $x$ is a spatial coordinate, $t$ is time, and $\theta$ are the trainable parameters.
The model is trained against two kinds of constraints:

$$u_\theta(x_i, t_i) \approx u_i$$

for observed data, and

$$\mathcal{F}[u_\theta](x, t) \approx 0$$

for a differential equation operator $\mathcal{F}$. Automatic differentiation is used to compute the derivatives of $u_\theta$ with respect to its inputs.
Differential Equation Residuals
Consider a differential equation:

$$\mathcal{F}[u](x, t) = 0.$$

A physics-informed neural network substitutes $u_\theta$ into the equation and forms a residual:

$$r_\theta(x, t) = \mathcal{F}[u_\theta](x, t).$$

The residual loss is:

$$L_{\text{res}}(\theta) = \frac{1}{N_r} \sum_{j=1}^{N_r} r_\theta(x_j, t_j)^2.$$

The points $(x_j, t_j)$ are often called collocation points. They need not be observed data points. They are locations where the model is asked to satisfy the governing equation.
Example: Heat Equation
For the one-dimensional heat equation,

$$\frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2},$$

the residual is:

$$r_\theta(x, t) = \frac{\partial u_\theta}{\partial t} - \alpha \frac{\partial^2 u_\theta}{\partial x^2}.$$

AD computes both derivatives:
```
u = model(x, t)
u_t = derivative(u, t)
u_x = derivative(u, x)
u_xx = derivative(u_x, x)
residual = u_t - alpha * u_xx
physics_loss = mean(residual * residual)
```

This is a higher-order AD workload: the `u_xx` term requires differentiating a derivative.
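The same workload can be written as a runnable sketch in JAX. Here `trial` is a hypothetical closed-form stand-in for a trained network, chosen so the heat-equation residual is exactly zero, and `alpha` is an assumed diffusivity.

```python
import jax
import jax.numpy as jnp

alpha = 0.1  # assumed diffusivity

def trial(x, t):
    # Stand-in for model(x, t): u = exp(-alpha * t) * sin(x)
    # solves u_t = alpha * u_xx exactly.
    return jnp.exp(-alpha * t) * jnp.sin(x)

u_t = jax.grad(trial, argnums=1)   # first derivative in t
u_x = jax.grad(trial, argnums=0)   # first derivative in x
u_xx = jax.grad(u_x, argnums=0)    # derivative of a derivative: second order in x

def residual(x, t):
    return u_t(x, t) - alpha * u_xx(x, t)

r = residual(0.7, 0.3)  # ~0 for this exact solution
```

Because `trial` solves the equation, the residual vanishes up to floating-point error; for a real network it would be a nonzero training signal.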
Full Training Objective
A typical physics-informed loss combines several terms:

$$L(\theta) = \lambda_{\text{data}} L_{\text{data}} + \lambda_{\text{res}} L_{\text{res}} + \lambda_{\text{bc}} L_{\text{bc}} + \lambda_{\text{ic}} L_{\text{ic}}.$$
The terms are:
| Term | Meaning |
|---|---|
| $L_{\text{data}}$ | Fit observed measurements |
| $L_{\text{res}}$ | Satisfy the equation residual |
| $L_{\text{bc}}$ | Satisfy boundary conditions |
| $L_{\text{ic}}$ | Satisfy initial conditions |
The weights $\lambda$ control the relative scale of each constraint. Poor weighting can make training fail even when all derivatives are correct.

AD differentiates the full scalar loss with respect to $\theta$. It also computes derivatives of $u_\theta$ with respect to coordinates such as $x$ and $t$.
Two Kinds of Derivatives
Physics-informed models use AD in two distinct ways.
First, AD computes derivatives with respect to inputs:

$$\frac{\partial u_\theta}{\partial x}, \quad \frac{\partial u_\theta}{\partial t}, \quad \frac{\partial^2 u_\theta}{\partial x^2}, \;\dots$$

These derivatives build the physics residual.

Second, AD computes derivatives of the training loss with respect to parameters:

$$\nabla_\theta L(\theta).$$

These gradients update the model.
The computation therefore contains nested differentiation. The model output is differentiated with respect to inputs, then the residual loss is differentiated with respect to parameters.
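A minimal JAX sketch of the two levels, using a hypothetical one-parameter model and the toy equation $du/dx = \cos(x)$:

```python
import jax
import jax.numpy as jnp

def model(theta, x):
    # Hypothetical one-parameter model: u_theta(x) = sin(theta * x).
    return jnp.sin(theta * x)

def loss(theta, xs):
    # Inner level: differentiate the model with respect to its input x.
    du_dx = jax.vmap(jax.grad(model, argnums=1), in_axes=(None, 0))
    r = du_dx(theta, xs) - jnp.cos(xs)  # residual of du/dx = cos(x)
    return jnp.mean(r ** 2)

xs = jnp.linspace(0.0, 1.0, 32)
# Outer level: differentiate the residual loss with respect to theta.
g = jax.grad(loss)(1.3, xs)
```

At `theta = 1.0` the equation is satisfied exactly, so the loss and its gradient both vanish; away from it, the outer gradient drives `theta` back toward the solution.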
Boundary and Initial Conditions
Boundary conditions constrain the solution on the edge of the domain. For example:

$$u(0, t) = u(1, t) = 0.$$

The boundary loss may be:

$$L_{\text{bc}}(\theta) = \frac{1}{N_b} \sum_{k=1}^{N_b} u_\theta(x_k^b, t_k)^2,$$

where the $x_k^b$ lie on the boundary. Initial conditions constrain the solution at the starting time:

$$u(x, 0) = u_0(x).$$

The initial-condition loss is:

$$L_{\text{ic}}(\theta) = \frac{1}{N_0} \sum_{m=1}^{N_0} \big(u_\theta(x_m, 0) - u_0(x_m)\big)^2.$$
These terms usually need only ordinary model evaluations. The physics residual may require first, second, or higher derivatives.
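A small numeric sketch of these two terms, assuming a hypothetical surrogate `model`, zero Dirichlet boundaries at $x = 0$ and $x = 1$, and the initial profile $\sin(\pi x)$:

```python
import jax.numpy as jnp

def model(theta, x, t):
    # Hypothetical surrogate; any differentiable function of (x, t) works here.
    return theta * jnp.sin(jnp.pi * x) * jnp.exp(-t)

t_b = jnp.linspace(0.0, 1.0, 8)   # boundary sample times
x_0 = jnp.linspace(0.0, 1.0, 8)   # initial-condition sample locations

# Ordinary evaluations only: no coordinate derivatives are needed.
boundary_loss = (jnp.mean(model(1.0, 0.0, t_b) ** 2)
                 + jnp.mean(model(1.0, 1.0, t_b) ** 2))
initial_loss = jnp.mean((model(1.0, x_0, 0.0) - jnp.sin(jnp.pi * x_0)) ** 2)
```

Both terms are plain mean-squared penalties on forward evaluations, which is why they are cheap relative to the physics residual.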
Strong and Weak Forms
Many physics-informed models use the strong form of a differential equation. The residual is evaluated pointwise:

$$r_\theta(x_j, t_j) = \mathcal{F}[u_\theta](x_j, t_j).$$
This requires the model to be differentiable enough for the derivatives in the equation.
A weak form integrates the equation against test functions. Instead of enforcing pointwise residuals, it enforces integral identities. Weak forms can reduce derivative order and may behave better for rough solutions.
For example, a second-order equation in strong form may become a first-derivative expression after integration by parts.
The choice affects AD cost. Strong forms often require higher-order derivatives. Weak forms often require quadrature, basis functions, and differentiable integration.
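As a one-line illustration of the order reduction, integrating a second derivative against a test function $v$ and applying integration by parts gives:

```latex
\int_\Omega u''(x)\, v(x)\, dx
  = \big[\, u'(x)\, v(x) \,\big]_{\partial\Omega}
  - \int_\Omega u'(x)\, v'(x)\, dx
```

If $v$ vanishes on the boundary, only first derivatives of $u$ remain inside the integral.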
Inverse Problems
Physics-informed models can estimate unknown physical parameters. Suppose the heat coefficient $\alpha$ is unknown. Treat it as a trainable parameter alongside $\theta$:

$$r_{\theta,\alpha}(x, t) = \frac{\partial u_\theta}{\partial t} - \alpha \frac{\partial^2 u_\theta}{\partial x^2}.$$

Then optimize:

$$\min_{\theta,\, \alpha} \; L(\theta, \alpha).$$

AD computes gradients with respect to both neural network parameters and physical parameters:

$$\nabla_\theta L, \quad \frac{\partial L}{\partial \alpha}.$$
This makes inverse modeling natural when the governing equation is differentiable and the unknown parameters enter the residual smoothly.
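A JAX sketch of this gradient, using a hypothetical closed-form `u` generated with a true diffusivity of 0.5 in place of a trained network:

```python
import jax
import jax.numpy as jnp

def u(x, t):
    # Stand-in solution generated with the true coefficient alpha = 0.5.
    return jnp.exp(-0.5 * t) * jnp.sin(x)

def residual_loss(alpha, x, t):
    u_t = jax.grad(u, argnums=1)(x, t)
    u_xx = jax.grad(jax.grad(u, argnums=0), argnums=0)(x, t)
    return (u_t - alpha * u_xx) ** 2

# Gradient with respect to the physical parameter alone; a descent
# step on it moves the wrong guess 0.3 toward the true value 0.5.
g = jax.grad(residual_loss)(0.3, 0.7, 0.2)
```

The residual loss is zero at the true coefficient, so its gradient at a wrong guess carries the identification signal.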
Differentiating Through Solvers
Physics-informed modeling also includes differentiable numerical solvers. Instead of representing $u$ directly with a neural network, the model may define parameters of a simulator and differentiate through the simulation.

For a time-stepping solver:

$$s_{k+1} = \Phi(s_k, \theta), \quad k = 0, \dots, K-1,$$

a loss at the final state can be differentiated through all time steps:

$$\nabla_\theta L(s_K).$$
Reverse mode through many time steps resembles backpropagation through time. It can be memory-intensive because intermediate states are needed for the backward pass.
Adjoint methods and checkpointing are often used to reduce memory.
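A sketch of reverse mode through a time-stepping loop, using JAX's `lax.scan` for the steps; the decay equation $du/dt = -ku$, step size, and step count are illustrative. `jax.checkpoint` could be wrapped around `step` to trade recomputation for memory.

```python
import jax
import jax.numpy as jnp

def simulate(k, u0, dt=0.01, steps=100):
    def step(u, _):
        # One explicit Euler update for du/dt = -k * u.
        return u - dt * k * u, None
    u_final, _ = jax.lax.scan(step, u0, None, length=steps)
    return u_final

def loss(k):
    target = jnp.exp(-1.0)  # exact solution at t = 1 for k = 1
    return (simulate(k, 1.0) - target) ** 2

# Reverse mode propagates through all 100 stored steps.
g = jax.grad(loss)(1.0)
```

The backward pass needs the intermediate states of all 100 steps, which is exactly the memory pressure the surrounding text describes.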
Numerical Precision and Smoothness
Physics-informed models often need higher-order derivatives. This makes smooth activations important.
ReLU networks are piecewise linear. Their second derivative is zero almost everywhere and undefined at kink points. This can be unsuitable for equations involving second derivatives.
Smooth activations such as tanh, sigmoid, softplus, sine, or other differentiable basis functions are often preferred when the physics residual requires higher derivatives.
The choice of activation changes the differentiability class of $u_\theta$. AD can only compute derivatives of the represented program. It cannot create smoothness that the model does not have.
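The point is easy to check with nested derivatives; this sketch compares the second derivative of ReLU and tanh at a sample point:

```python
import jax
import jax.numpy as jnp

relu_2nd = jax.grad(jax.grad(jax.nn.relu))
tanh_2nd = jax.grad(jax.grad(jnp.tanh))

r = relu_2nd(0.5)  # 0.0: piecewise linear, no curvature signal
s = tanh_2nd(0.5)  # nonzero: smooth activation carries second-order information
```

A second-order physics residual built on ReLU would therefore receive no gradient signal from the curvature term almost everywhere.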
Scaling and Conditioning
Physics-informed losses often combine terms with different units and magnitudes. A data loss may be small while a residual loss may be large, or the reverse.
If one term dominates, the optimizer may ignore the others. This is a conditioning problem.
Common techniques include nondimensionalization, adaptive loss weights, residual normalization, curriculum sampling, and separate monitoring of each loss component.
The AD system may compute correct gradients, but the optimizer still follows the geometry induced by the weighted loss.
Collocation Sampling
Collocation points determine where the physics residual is enforced.
Sampling can be uniform, random, grid-based, adaptive, or concentrated near boundaries and discontinuities. Adaptive sampling adds more points where residuals are large.
A typical loop is:
```
sample data points
sample boundary points
sample collocation points
compute data loss
compute boundary loss
compute physics residual loss
combine losses
backward
optimizer step
```

Changing the collocation distribution changes the training objective. It is similar to changing the data distribution in ordinary machine learning.
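A sketch of residual-based adaptive sampling, keeping the candidates where a stand-in residual magnitude is largest (the residual function here is illustrative, not a trained model's):

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
candidates = jax.random.uniform(key, (1000,))   # candidate points in [0, 1]
res_mag = jnp.abs(jnp.sin(8.0 * candidates))    # stand-in for |r_theta(x)|

# Keep the 128 candidates with the largest residual magnitude.
top = jnp.argsort(res_mag)[-128:]
collocation = candidates[top]
```

Re-running this selection every few epochs concentrates collocation points where the current model violates the equation most.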
AD Cost
Physics-informed models can be expensive because they require derivatives with respect to inputs and parameters.
For a scalar model output $u_\theta(x, t)$, computing a first derivative may be cheap. Computing many second derivatives, mixed partials, or Jacobians for vector fields can be costly.

For a vector-valued field:

$$\mathbf{u}_\theta : \mathbb{R}^d \to \mathbb{R}^m,$$
the residual may require divergence, curl, gradients, Hessians, or Laplacians. Efficient AD mode selection matters. Forward mode can be effective for low-dimensional inputs. Reverse mode is effective for many parameters and scalar losses. Mixed-mode AD is often needed.
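A sketch of one such operator: the Laplacian of a scalar field via a Hessian trace in JAX. `jax.hessian` composes forward over reverse mode (`jacfwd` of `jacrev`), which suits a few coordinate inputs and a scalar output.

```python
import jax
import jax.numpy as jnp

def u(p):
    # Scalar field over p = [x, y]; an illustrative choice.
    return jnp.sin(p[0]) * jnp.cos(p[1])

def laplacian(p):
    return jnp.trace(jax.hessian(u)(p))

val = laplacian(jnp.array([0.3, 0.4]))  # analytically -2 sin(0.3) cos(0.4)
```

For higher-dimensional inputs, Hessian-vector products can avoid materializing the full Hessian.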
Common Failure Modes
Physics-informed models can fail for reasons unrelated to derivative correctness.
- The PDE may be stiff or ill-conditioned.
- The residual loss may dominate the data loss.
- Boundary conditions may be underweighted.
- Collocation points may miss important regions.
- The network may lack the smoothness required by the equation.
- Higher-order derivatives may amplify floating-point error.
- The optimizer may converge to a function with small residual at sampled points but poor behavior elsewhere.
- The inverse problem may be non-identifiable, so different parameter values explain the data equally well.
These are modeling and numerical issues. AD supplies derivatives of the chosen computational objective.
Interface to AD Systems
Physics-informed models require AD systems to support derivatives with respect to both inputs and parameters.
A clean implementation separates them:
```
u = model(params, coordinates)
coordinate_derivatives = diff(u, coordinates)
residual = physics_operator(u, coordinate_derivatives)
loss = residual_loss + data_loss + boundary_loss
param_grad = grad(loss, params)
```

This exposes the two derivative levels clearly. Coordinate derivatives define the equation residual. Parameter derivatives train the model.
Physics-informed models are therefore a demanding use case for AD. They require higher-order differentiation, mixed-mode execution, careful numerical scaling, and explicit graph management. The benefit is that scientific structure becomes part of the training signal instead of only post-training evaluation.