# Physics-Informed Models

Physics-informed models combine data fitting with equations from physics or applied mathematics. The model is trained not only to match observed samples, but also to satisfy known laws such as conservation equations, differential equations, boundary conditions, or constitutive relations.

A common setting is a neural network approximation to an unknown function:

$$
u_\theta(x,t)
$$

where $x$ is a spatial coordinate, $t$ is time, and $\theta$ are trainable parameters.

The model is trained against two kinds of constraints:

$$
u_\theta(x_i,t_i) \approx y_i
$$

for observed data, and

$$
\mathcal{N}[u_\theta](x_j,t_j) \approx 0
$$

for a differential equation operator $\mathcal{N}$. Automatic differentiation is used to compute the derivatives of $u_\theta$ with respect to its inputs.

## Differential Equation Residuals

Consider a differential equation:

$$
\mathcal{N}[u](x,t) = 0.
$$

A physics-informed neural network substitutes $u_\theta$ into the equation and forms a residual:

$$
r_\theta(x,t) =
\mathcal{N}[u_\theta](x,t).
$$

The residual loss is:

$$
L_{\text{phys}} =
\frac{1}{M}
\sum_{j=1}^{M}
\left\|
r_\theta(x_j,t_j)
\right\|^2.
$$

The points $(x_j,t_j)$ are often called collocation points. They need not be observed data points. They are locations where the model is asked to satisfy the governing equation.

## Example: Heat Equation

For the one-dimensional heat equation,

$$
\frac{\partial u}{\partial t} =
\alpha
\frac{\partial^2 u}{\partial x^2},
$$

the residual is:

$$
r_\theta(x,t) =
\frac{\partial u_\theta}{\partial t} -
\alpha
\frac{\partial^2 u_\theta}{\partial x^2}.
$$

AD computes the required derivatives:

```text
u = model(x, t)

u_t = derivative(u, t)
u_x = derivative(u, x)
u_xx = derivative(u_x, x)

residual = u_t - alpha * u_xx
physics_loss = mean(residual * residual)
```

This is a higher-order AD workload. The derivative $u_{xx}$ requires differentiating a derivative.
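
As a concrete, runnable version of this sketch, the following JAX snippet builds the same residual with nested `jax.grad` calls. The network shape and names are illustrative choices, not part of the pseudocode above:

```python
import jax
import jax.numpy as jnp

def init_params(key, widths=(2, 32, 32, 1)):
    # Small fully connected network; the sizes are arbitrary.
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (n_in, n_out)) / jnp.sqrt(n_in),
                       jnp.zeros(n_out)))
    return params

def u(params, x, t):
    # Scalar field u_theta(x, t); tanh keeps second derivatives meaningful.
    h = jnp.array([x, t])
    for W, b in params[:-1]:
        h = jnp.tanh(h @ W + b)
    W, b = params[-1]
    return (h @ W + b)[0]

def residual(params, x, t, alpha):
    # r = u_t - alpha * u_xx, via derivatives w.r.t. the coordinates.
    u_t = jax.grad(u, argnums=2)(params, x, t)
    u_xx = jax.grad(jax.grad(u, argnums=1), argnums=1)(params, x, t)
    return u_t - alpha * u_xx

def physics_loss(params, xs, ts, alpha=0.1):
    r = jax.vmap(lambda x, t: residual(params, x, t, alpha))(xs, ts)
    return jnp.mean(r ** 2)
```

Tape-based frameworks express the same nesting differently (for example, `torch.autograd.grad` with `create_graph=True`), but the double differentiation is the same workload.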

## Full Training Objective

A typical physics-informed loss combines several terms:

$$
L(\theta) =
\lambda_{\text{data}} L_{\text{data}}
+
\lambda_{\text{phys}} L_{\text{phys}}
+
\lambda_{\text{bc}} L_{\text{bc}}
+
\lambda_{\text{ic}} L_{\text{ic}}.
$$

The terms are:

| Term | Meaning |
|---|---|
| $L_{\text{data}}$ | Fit observed measurements |
| $L_{\text{phys}}$ | Satisfy equation residual |
| $L_{\text{bc}}$ | Satisfy boundary conditions |
| $L_{\text{ic}}$ | Satisfy initial conditions |

The weights $\lambda$ control the relative scale of each constraint. Poor weighting can make training fail even when all derivatives are correct.

AD differentiates the full scalar loss with respect to $\theta$. It also computes derivatives of $u_\theta$ with respect to coordinates such as $x$ and $t$.

## Two Kinds of Derivatives

Physics-informed models use AD in two distinct ways.

First, AD computes derivatives with respect to inputs:

$$
\frac{\partial u_\theta}{\partial x},
\qquad
\frac{\partial u_\theta}{\partial t},
\qquad
\frac{\partial^2 u_\theta}{\partial x^2}.
$$

These derivatives build the physics residual.

Second, AD computes derivatives of the training loss with respect to parameters:

$$
\nabla_\theta L(\theta).
$$

These gradients update the model.

The computation therefore contains nested differentiation. The model output is differentiated with respect to inputs, then the residual loss is differentiated with respect to parameters.
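
A minimal illustration of the nesting, with a tiny closed-form "model" chosen only so that both derivative levels are visible:

```python
import jax
import jax.numpy as jnp

# Toy model u_theta(x, t) = theta0 * sin(theta1 * x) * exp(-t).
def u(theta, x, t):
    return theta[0] * jnp.sin(theta[1] * x) * jnp.exp(-t)

def phys_loss(theta, x, t, alpha=0.1):
    # Level 1: derivatives with respect to the inputs x and t.
    u_t = jax.grad(u, argnums=2)(theta, x, t)
    u_xx = jax.grad(jax.grad(u, argnums=1), argnums=1)(theta, x, t)
    return (u_t - alpha * u_xx) ** 2

# Level 2: the derivative of the residual loss with respect to the
# parameters, which differentiates through the input derivatives above.
theta = jnp.array([1.0, 2.0])
g = jax.grad(phys_loss)(theta, 0.3, 0.5)
```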

## Boundary and Initial Conditions

Boundary conditions constrain the solution on the edge of the domain. For example:

$$
u(0,t) = a(t),
\qquad
u(1,t) = b(t).
$$

The boundary loss may be:

$$
L_{\text{bc}} =
\frac{1}{M_b}
\sum_j
\left[
\left(
u_\theta(0,t_j)-a(t_j)
\right)^2
+
\left(
u_\theta(1,t_j)-b(t_j)
\right)^2
\right].
$$

Initial conditions constrain the solution at the starting time:

$$
u(x,0) = u_0(x).
$$

The initial-condition loss is:

$$
L_{\text{ic}} =
\frac{1}{M_i}
\sum_j
\left(
u_\theta(x_j,0)-u_0(x_j)
\right)^2.
$$

These terms usually need only ordinary model evaluations. The physics residual may require first, second, or higher derivatives.
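
A sketch of these two losses, with a placeholder scalar model standing in for $u_\theta$ and simple choices of $a$, $b$, and $u_0$ (all assumptions for illustration):

```python
import jax
import jax.numpy as jnp

def u(params, x, t):
    # Placeholder model; any scalar u_theta(x, t) fits here.
    return params[0] * x * (1.0 - x) * jnp.exp(-params[1] * t)

a = lambda t: jnp.zeros_like(t)      # u(0, t) = a(t)
b = lambda t: jnp.zeros_like(t)      # u(1, t) = b(t)
u0 = lambda x: jnp.sin(jnp.pi * x)   # u(x, 0) = u_0(x)

def bc_loss(params, ts):
    # Plain evaluations at the boundary; no coordinate derivatives needed.
    left = jax.vmap(lambda t: u(params, 0.0, t))(ts)
    right = jax.vmap(lambda t: u(params, 1.0, t))(ts)
    return jnp.mean((left - a(ts)) ** 2 + (right - b(ts)) ** 2)

def ic_loss(params, xs):
    init = jax.vmap(lambda x: u(params, x, 0.0))(xs)
    return jnp.mean((init - u0(xs)) ** 2)
```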

## Strong and Weak Forms

Many physics-informed models use the strong form of a differential equation. The residual is evaluated pointwise:

$$
\mathcal{N}[u_\theta](x,t) = 0.
$$

This requires the model to be differentiable enough for the derivatives in the equation.

A weak form integrates the equation against test functions. Instead of enforcing pointwise residuals, it enforces integral identities. Weak forms can reduce derivative order and may behave better for rough solutions.

For example, a second-order equation in strong form may become a first-derivative expression after integration by parts.
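
Concretely, for the heat equation, multiplying the residual by a test function $v$ that vanishes at $x=0$ and $x=1$ and integrating by parts in $x$ gives:

$$
\int_0^1 v \left( \frac{\partial u}{\partial t} - \alpha \frac{\partial^2 u}{\partial x^2} \right) dx
=
\int_0^1 v \, \frac{\partial u}{\partial t} \, dx
+
\alpha \int_0^1 \frac{\partial v}{\partial x} \frac{\partial u}{\partial x} \, dx
= 0,
$$

so the identity involves only first derivatives of $u$.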

The choice affects AD cost. Strong forms often require higher-order derivatives. Weak forms often require quadrature, basis functions, and differentiable integration.

## Inverse Problems

Physics-informed models can estimate unknown physical parameters. Suppose the heat coefficient $\alpha$ is unknown. Treat it as a trainable parameter:

$$
r_{\theta,\alpha}(x,t) =
\frac{\partial u_\theta}{\partial t} -
\alpha
\frac{\partial^2 u_\theta}{\partial x^2}.
$$

Then optimize:

$$
\min_{\theta,\alpha} L(\theta,\alpha).
$$

AD computes gradients with respect to both neural network parameters and physical parameters:

$$
\nabla_\theta L,
\qquad
\frac{\partial L}{\partial \alpha}.
$$

This makes inverse modeling natural when the governing equation is differentiable and the unknown parameters enter the residual smoothly.
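
In code, the physical parameter is simply one more differentiated leaf. A sketch in the style of the earlier JAX examples, with an illustrative toy model:

```python
import jax
import jax.numpy as jnp

def u(net, x, t):
    # Placeholder model; any differentiable u_theta works here.
    return net["w"] * jnp.sin(x) * jnp.exp(-net["k"] * t)

def residual(theta, x, t):
    u_t = jax.grad(u, argnums=2)(theta["net"], x, t)
    u_xx = jax.grad(jax.grad(u, argnums=1), argnums=1)(theta["net"], x, t)
    return u_t - theta["alpha"] * u_xx   # alpha is trainable too

def loss(theta, xs, ts):
    r = jax.vmap(lambda x, t: residual(theta, x, t))(xs, ts)
    return jnp.mean(r ** 2)

theta = {"net": {"w": 1.0, "k": 0.5}, "alpha": 0.1}
grads = jax.grad(loss)(theta, jnp.linspace(0.1, 0.9, 16), jnp.full(16, 0.2))
# grads["alpha"] is dL/dalpha; grads["net"] holds the network gradients.
```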

## Differentiating Through Solvers

Physics-informed modeling also includes differentiable numerical solvers. Instead of representing $u$ directly with a neural network, the model may define parameters of a simulator and differentiate through the simulation.

For a time-stepping solver:

$$
s_{t+1} = \Phi_\theta(s_t),
$$

a loss at the final state can be differentiated through all time steps:

$$
L = \ell(s_T).
$$

Reverse mode through many time steps resembles backpropagation through time. It can be memory-intensive because intermediate states are needed for the backward pass.

Adjoint methods and checkpointing are often used to reduce memory.
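
A sketch of this pattern in JAX: `lax.scan` expresses the time loop, and wrapping the step in `jax.checkpoint` (rematerialization) recomputes intermediate states during the backward pass instead of storing them. The step function is a toy placeholder:

```python
import jax
import jax.numpy as jnp
from jax import lax

def phi(theta, s):
    # Toy explicit update s_{t+1} = Phi_theta(s_t); a real solver goes here.
    return s + 0.01 * (theta * s - s ** 3)

def final_loss(theta, s0, n_steps=100):
    step = jax.checkpoint(lambda s, _: (phi(theta, s), None))
    s_final, _ = lax.scan(step, s0, None, length=n_steps)
    return jnp.sum(s_final ** 2)   # loss on the final state only

g = jax.grad(final_loss)(2.0, jnp.ones(8))   # dL/dtheta through all steps
```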

## Numerical Precision and Smoothness

Physics-informed models often need higher-order derivatives. This makes smooth activations important.

ReLU networks are piecewise linear. Their second derivative is zero almost everywhere and undefined at kink points. This can be unsuitable for equations involving second derivatives.

Smooth activations such as tanh, sigmoid, softplus, sine, or other differentiable basis functions are often preferred when the physics residual requires higher derivatives.

The choice of activation changes the differentiability class of $u_\theta$. AD can only compute derivatives of the represented program. It cannot create smoothness that the model does not have.
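
This is easy to check directly with nested differentiation:

```python
import jax
import jax.numpy as jnp

d2_relu = jax.grad(jax.grad(jax.nn.relu))
d2_tanh = jax.grad(jax.grad(jnp.tanh))

print(d2_relu(1.5))  # 0.0: ReLU is piecewise linear, zero curvature a.e.
print(d2_tanh(1.5))  # nonzero: tanh has a smooth second derivative
```

A ReLU network would report an identically zero $u_{xx}$ almost everywhere, so a diffusion term would contribute nothing to the residual rather than failing loudly.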

## Scaling and Conditioning

Physics-informed losses often combine terms with different units and magnitudes. A data loss may be small while a residual loss may be large, or the reverse.

If one term dominates, the optimizer may ignore the others. This is a conditioning problem.

Common techniques include nondimensionalization, adaptive loss weights, residual normalization, curriculum sampling, and separate monitoring of each loss component.

The AD system may compute correct gradients, but the optimizer still follows the geometry induced by the weighted loss.
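
As one concrete illustration, a common family of heuristics rescales each weight from the gradient norms of the individual terms, so that no single term dominates the parameter update. The balancing rule below is a simplified sketch of that idea, not a standard fixed recipe:

```python
import jax
import jax.numpy as jnp

def grad_norm(loss_fn, params, *args):
    # Global l2 norm of d(loss)/d(params), used to compare term scales.
    g = jax.grad(loss_fn)(params, *args)
    return jnp.sqrt(sum(jnp.sum(leaf ** 2)
                        for leaf in jax.tree_util.tree_leaves(g)))

def balance_weights(loss_fns, params, args_list):
    # Weight each term inversely to its gradient norm, up to a shared scale.
    norms = jnp.array([grad_norm(f, params, *a)
                       for f, a in zip(loss_fns, args_list)])
    return norms.max() / (norms + 1e-8)

# Toy usage: the second term has much larger gradients and is downweighted.
data_loss = lambda p, x: jnp.mean((p * x - 1.0) ** 2)
phys_loss = lambda p, x: 100.0 * jnp.mean((p * x) ** 2)
w = balance_weights([data_loss, phys_loss], 0.5,
                    [(jnp.ones(4),), (jnp.ones(4),)])
```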

## Collocation Sampling

Collocation points determine where the physics residual is enforced.

Sampling can be uniform, random, grid-based, adaptive, or concentrated near boundaries and discontinuities. Adaptive sampling adds more points where residuals are large.

A typical loop is:

```text
sample data points
sample boundary points
sample collocation points

compute data loss
compute boundary loss
compute physics residual loss

combine losses
backward
optimizer step
```

Changing the collocation distribution changes the training objective. It is similar to changing the data distribution in ordinary machine learning.
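
A minimal version of residual-based adaptive sampling: draw a large candidate pool, score it by residual magnitude, and keep the worst points. The residual function and pool sizes below are placeholders:

```python
import jax
import jax.numpy as jnp

def adaptive_collocation(residual_fn, params, key, n_pool=4096, n_keep=512):
    # Score a random candidate pool and keep the largest-residual points.
    kx, kt = jax.random.split(key)
    xs = jax.random.uniform(kx, (n_pool,))
    ts = jax.random.uniform(kt, (n_pool,))
    r = jax.vmap(lambda x, t: residual_fn(params, x, t))(xs, ts)
    idx = jnp.argsort(jnp.abs(r))[-n_keep:]
    return xs[idx], ts[idx]
```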

## AD Cost

Physics-informed models can be expensive because they require derivatives with respect to inputs and parameters.

For a scalar model output $u_\theta(x,t)$, computing a first derivative may be cheap. Computing many second derivatives, mixed partials, or Jacobians for vector fields can be costly.

For a vector-valued field:

$$
u_\theta: \mathbb{R}^d \to \mathbb{R}^m,
$$

the residual may require divergence, curl, gradients, Hessians, or Laplacians. Efficient AD mode selection matters. Forward mode can be effective for low-dimensional inputs. Reverse mode is effective for many parameters and scalar losses. Mixed-mode AD is often needed.
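
In operator terms, the mode choice is a composition choice. For a scalar field on a low-dimensional input, forward-over-reverse is a standard way to obtain second-derivative quantities such as the Laplacian; a sketch with a stand-in field:

```python
import jax
import jax.numpy as jnp

def u(x):
    # Stand-in scalar field on R^d at fixed parameters.
    return jnp.sum(jnp.sin(x)) * jnp.exp(-jnp.sum(x ** 2))

def laplacian(f):
    # Reverse mode for the gradient (1 output, d inputs), then forward
    # mode across the d tangent directions for the Hessian; the trace
    # gives the Laplacian. Fine for small d, wasteful when d is large.
    hess = jax.jacfwd(jax.grad(f))
    return lambda x: jnp.trace(hess(x))

x = jnp.array([0.3, -0.1, 0.7])
print(laplacian(u)(x))
```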

## Common Failure Modes

Physics-informed models can fail for reasons unrelated to derivative correctness.

- The PDE may be stiff or ill-conditioned.
- The residual loss may dominate the data loss.
- Boundary conditions may be underweighted.
- Collocation points may miss important regions.
- The network may lack the smoothness required by the equation.
- Higher-order derivatives may amplify floating-point error.
- The optimizer may converge to a function with small residual at the sampled points but poor behavior elsewhere.
- The inverse problem may be non-identifiable, so different parameter values explain the data equally well.

These are modeling and numerical issues. AD supplies derivatives of the chosen computational objective.

## Interface to AD Systems

Physics-informed models require AD systems to support derivatives with respect to both inputs and parameters.

A clean implementation separates them:

```text
u = model(params, coordinates)

coordinate_derivatives = diff(u, coordinates)

residual = physics_operator(u, coordinate_derivatives)

loss = residual_loss + data_loss + boundary_loss

param_grad = grad(loss, params)
```

This exposes the two derivative levels clearly. Coordinate derivatives define the equation residual. Parameter derivatives train the model.

Physics-informed models are therefore a demanding use case for AD. They require higher-order differentiation, mixed-mode execution, careful numerical scaling, and explicit graph management. The benefit is that scientific structure becomes part of the training signal instead of only post-training evaluation.

