Differentiation describes how a function changes locally. A Taylor expansion extends this idea by approximating a function with a polynomial around a point.
Automatic differentiation is used as a first-order method in most applications, but higher-order AD is closely connected to Taylor expansions. Forward mode generalizes to propagate higher-order coefficients, and many derivative identities become clearer from the Taylor viewpoint.
Local Polynomial Approximation
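Let
$$f : \mathbb{R} \to \mathbb{R}$$
be sufficiently smooth near a point $a$. The Taylor expansion around $a$ is
$$f(a + h) = f(a) + f'(a)\,h + \frac{f''(a)}{2!}\,h^2 + \frac{f'''(a)}{3!}\,h^3 + \cdots$$
More compactly,
$$f(a + h) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}\,h^k$$
The coefficients are the derivatives evaluated at the expansion point, scaled by $1/k!$.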
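The first-order approximation is
$$f(a + h) \approx f(a) + f'(a)\,h$$
This is the linearization used by ordinary AD.
The second-order approximation adds curvature:
$$f(a + h) \approx f(a) + f'(a)\,h + \tfrac{1}{2} f''(a)\,h^2$$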
Higher-order terms capture increasingly fine local structure.
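As a small numerical illustration of how the truncation order affects accuracy, the sketch below evaluates Taylor polynomials of $\sin$ of orders 1 to 3. The function name `taylor_sin` and the sample values of `a` and `h` are choices made for this example only.

```python
import math

def taylor_sin(a, h, order):
    """Evaluate the Taylor polynomial of sin around a, truncated at `order`."""
    # Derivatives of sin cycle with period 4: sin, cos, -sin, -cos.
    derivs = [math.sin(a), math.cos(a), -math.sin(a), -math.cos(a)]
    return sum(derivs[k % 4] / math.factorial(k) * h**k for k in range(order + 1))

a, h = 0.5, 0.1  # illustrative expansion point and displacement
exact = math.sin(a + h)

# Absolute error of the order-1, order-2, and order-3 approximations.
errors = [abs(exact - taylor_sin(a, h, order)) for order in (1, 2, 3)]
for order, err in zip((1, 2, 3), errors):
    print(f"order {order}: error {err:.2e}")
```

For this small displacement, each additional term reduces the error by roughly a factor of order $h$.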
Taylor Expansion in Multiple Variables
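For
$$f : \mathbb{R}^n \to \mathbb{R}$$
the expansion around $a$ in direction $v$ begins as
$$f(a + v) = f(a) + \nabla f(a)^\top v + \tfrac{1}{2}\, v^\top \nabla^2 f(a)\, v + \cdots$$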
The terms have geometric meaning:
| Term | Meaning |
|---|---|
| $f(a)$ | base value |
| $\nabla f(a)^\top v$ | first-order directional change |
| $\tfrac{1}{2}\, v^\top \nabla^2 f(a)\, v$ | second-order curvature |
| higher-order terms | finer local structure |
The first-order term is linear in $v$. The second-order term is quadratic in $v$.
Forward mode computes first-order directional information directly. Higher-order AD computes additional coefficients of this expansion.
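To make the first- and second-order terms concrete, the following sketch compares a linear and a quadratic model against the exact value for a small step. The test function $f(x, y) = x^2 y + y^3$, its hand-derived gradient and Hessian, and the sample point and step are all assumptions chosen for illustration.

```python
def f(x, y):
    return x**2 * y + y**3

def grad(x, y):
    # Hand-derived gradient of f.
    return (2 * x * y, x**2 + 3 * y**2)

def hess(x, y):
    # Hand-derived Hessian of f.
    return ((2 * y, 2 * x), (2 * x, 6 * y))

a = (1.0, 2.0)      # expansion point (illustrative)
v = (0.01, -0.02)   # small displacement (illustrative)

g = grad(*a)
H = hess(*a)
first = sum(g[i] * v[i] for i in range(2))                                      # linear in v
second = 0.5 * sum(v[i] * H[i][j] * v[j] for i in range(2) for j in range(2))   # quadratic in v

exact = f(a[0] + v[0], a[1] + v[1])
err_linear = abs(exact - (f(*a) + first))
err_quadratic = abs(exact - (f(*a) + first + second))
```

Because this particular `f` is a cubic polynomial, the error of the quadratic model is exactly the third-order term, which is much smaller than the error of the linear model for this step.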
Example
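Consider
$$f(x) = e^x$$
Its derivatives satisfy
$$f^{(k)}(x) = e^x$$
for all $k$.
The Taylor expansion around $a = 0$ is
$$e^h = 1 + h + \frac{h^2}{2!} + \frac{h^3}{3!} + \cdots$$
For small $h$, truncating after a few terms gives an accurate approximation.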
Now consider
$$f(x) = \sin x$$
At $a = 0$,
$$f(0) = 0, \quad f'(0) = 1, \quad f''(0) = 0, \quad f'''(0) = -1$$
The derivative information determines the full local series structure.
Truncated Series
Practical Taylor methods use truncated series.
A truncated series of order $d$ has the form
$$c_0 + c_1 h + c_2 h^2 + \cdots + c_d h^d$$
Terms above degree $d$ are discarded.
For example, second-order truncation gives
$$c_0 + c_1 h + c_2 h^2$$
Truncated series behave algebraically. They can be added, multiplied, composed, and differentiated while discarding higher-order terms.
This algebraic structure is the basis of higher-order forward AD.
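A minimal sketch of this algebra, assuming a truncated series is stored as a plain list of coefficients in increasing degree. The helper names `series_add` and `series_mul` are hypothetical and exist only for this example.

```python
def series_add(p, q):
    """Add two truncated series given as coefficient lists of equal length."""
    return [pi + qi for pi, qi in zip(p, q)]

def series_mul(p, q):
    """Multiply two truncated series, discarding terms above the shared order."""
    d = len(p) - 1
    out = [0.0] * (d + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= d:          # drop everything above degree d
                out[i + j] += pi * qj
    return out

# Second-order truncation: p = 1 + 2h + 3h^2, q = 4 + 5h + 6h^2
p = [1.0, 2.0, 3.0]
q = [4.0, 5.0, 6.0]
total = series_add(p, q)      # 5 + 7h + 9h^2
product = series_mul(p, q)    # 4 + 13h + 28h^2, with h^3 and h^4 terms discarded
```

The full product would also contain $h^3$ and $h^4$ terms; the truncation simply never computes them.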
Dual Numbers as First-Order Taylor Series
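Ordinary dual numbers use an infinitesimal $\varepsilon$ satisfying
$$\varepsilon^2 = 0$$
A dual number has the form
$$a + b\,\varepsilon$$
This is exactly a truncated first-order Taylor expansion.
If
$$x = a + b\,\varepsilon$$
then
$$f(x) = f(a) + f'(a)\,b\,\varepsilon$$
because higher-order terms vanish.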
Forward mode AD with dual numbers therefore computes first-order Taylor coefficients automatically.
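The idea can be sketched with a small dual-number class. This is an illustrative toy, not the design of any particular AD library; the names `Dual`, `dual_sin`, and the example function `f` are assumptions made here.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    val: float   # primal value a
    eps: float   # coefficient b of epsilon

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val + other.val, self.eps + other.eps)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, since eps^2 = 0
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)
    __rmul__ = __mul__

def dual_sin(x):
    # f(a + b eps) = f(a) + f'(a) b eps, applied to f = sin
    return Dual(math.sin(x.val), math.cos(x.val) * x.eps)

def f(x):
    return x * x * 3.0 + dual_sin(x)   # f(x) = 3x^2 + sin(x)

# Seed the epsilon coefficient with 1 to obtain df/dx at x = 2.
result = f(Dual(2.0, 1.0))
```

Here `result.val` holds $f(2)$ and `result.eps` holds $f'(2) = 12 + \cos 2$, both obtained in a single evaluation.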
Higher-Order Truncated Algebras
To obtain higher derivatives, we generalize the nilpotent rule.
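Suppose
$$\varepsilon^{d+1} = 0$$
Then a truncated series becomes
$$c_0 + c_1 \varepsilon + c_2 \varepsilon^2 + \cdots + c_d \varepsilon^d$$
Applying a smooth function produces higher-order derivative coefficients automatically.
For example, using third-order truncation,
$$f(a + \varepsilon) = f(a) + f'(a)\,\varepsilon + \frac{f''(a)}{2!}\,\varepsilon^2 + \frac{f'''(a)}{3!}\,\varepsilon^3$$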
This idea leads to Taylor-mode AD.
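As a hedged illustration of third-order truncation, the sketch below evaluates $f(x) = x^3$ on the element $a + \varepsilon$ using only truncated multiplication, then rescales the coefficients by $k!$ to recover derivatives. The helper `mul`, the constant `ORDER`, and the choice of `f` are assumptions for this example.

```python
import math

ORDER = 3  # epsilon^4 = 0, so coefficients up to epsilon^3 are kept

def mul(p, q):
    """Multiply in the truncated algebra: powers above epsilon^ORDER vanish."""
    out = [0.0] * (ORDER + 1)
    for i in range(ORDER + 1):
        for j in range(ORDER + 1 - i):
            out[i + j] += p[i] * q[j]
    return out

a = 2.0
x = [a, 1.0, 0.0, 0.0]       # the element a + epsilon
cube = mul(mul(x, x), x)     # f(a + epsilon) for f(x) = x^3

# Coefficient k equals f^(k)(a) / k!, so multiply by k! to recover derivatives.
derivatives = [math.factorial(k) * c for k, c in enumerate(cube)]
```

At $a = 2$ this gives the values $f = 8$, $f' = 12$, $f'' = 12$, and $f''' = 6$, matching $3a^2$, $6a$, and $6$.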
Taylor Mode Automatic Differentiation
Taylor mode propagates polynomial coefficients rather than single tangents.
Instead of storing
$$(x, \dot{x})$$
we store
$$(x_0, x_1, x_2, \ldots, x_d)$$
where
| Coefficient | Meaning |
|---|---|
| $x_0$ | primal value |
| $x_1$ | first derivative coefficient |
| $x_2$ | second derivative coefficient |
| $x_3, \ldots, x_d$ | higher-order coefficients |
Primitive operations are lifted to operate on truncated series.
For addition:
$$z = x + y$$
gives
$$z_k = x_k + y_k$$
For multiplication:
$$z = x\,y$$
coefficients combine through polynomial convolution:
$$z_k = \sum_{i=0}^{k} x_i\,y_{k-i}$$
This allows one forward pass to compute multiple derivative orders simultaneously.
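The sketch below lifts two primitives to truncated series: multiplication by convolution, and `exp` through the standard recurrence that follows from $y' = x'\,y$. It assumes coefficients are stored as series coefficients (derivatives divided by $k!$); the names `series_mul` and `series_exp` are hypothetical.

```python
import math

def series_mul(x, y):
    """Polynomial convolution truncated at the shared order d."""
    d = len(x) - 1
    return [sum(x[i] * y[k - i] for i in range(k + 1)) for k in range(d + 1)]

def series_exp(x):
    """Lift exp to truncated series using y' = x' y, which gives
    k * y_k = sum_{j=1..k} j * x_j * y_{k-j}."""
    y = [math.exp(x[0])]
    for k in range(1, len(x)):
        y.append(sum(j * x[j] * y[k - j] for j in range(1, k + 1)) / k)
    return y

# Input series 0.5 + t, truncated at order 3 (first-order coefficient seeded with 1).
x = [0.5, 1.0, 0.0, 0.0]

# One forward pass yields the series coefficients of exp(x^2) at x = 0.5.
y = series_exp(series_mul(x, x))
```

Multiplying `y[k]` by $k!$ recovers the $k$-th derivative of $\exp(x^2)$ at $0.5$, so a single pass produces the value and the first three derivatives together.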
Composition and Taylor Series
The chain rule appears naturally inside Taylor expansions.
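Suppose
$$F(x) = f(g(x))$$
Expand $g$ around $a$:
$$g(a + h) = g(a) + g'(a)\,h + \tfrac{1}{2}\, g''(a)\,h^2 + \cdots$$
Substitute into the Taylor expansion of $f$:
$$F(a + h) = f(g(a)) + f'(g(a)) \left( g'(a)\,h + \tfrac{1}{2}\, g''(a)\,h^2 + \cdots \right) + \tfrac{1}{2}\, f''(g(a)) \left( g'(a)\,h + \cdots \right)^2 + \cdots$$
The first-order term already contains the chain rule:
$$F'(a) = f'(g(a))\,g'(a)$$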
Higher-order terms produce generalized chain-rule formulas such as Faà di Bruno expansions.
Taylor methods therefore provide another perspective on derivative composition.
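As a worked instance, write $F = f \circ g$ for the composite (a name assumed here) and collect the second- and third-order terms of the substituted series. This gives the first two Faà di Bruno formulas beyond the ordinary chain rule:

$$
\begin{aligned}
F''(a) &= f''(g(a))\,g'(a)^2 + f'(g(a))\,g''(a) \\
F'''(a) &= f'''(g(a))\,g'(a)^3 + 3\,f''(g(a))\,g'(a)\,g''(a) + f'(g(a))\,g'''(a)
\end{aligned}
$$

Each term pairs a derivative of $f$ with a product of derivatives of $g$ whose orders sum to the total order.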
Hessians from Taylor Expansions
The Hessian appears as the quadratic coefficient in multivariate expansions.
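For
$$f : \mathbb{R}^n \to \mathbb{R}$$
the Hessian determines second-order behavior.
If we choose a direction $v$ and define
$$g(t) = f(a + t\,v)$$
then
$$g''(0) = v^\top \nabla^2 f(a)\, v$$
This is the second directional derivative along $v$.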
Many higher-order AD algorithms operate through directional Taylor coefficients rather than explicit Hessian matrices.
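A minimal sketch of this directional approach: propagate order-2 truncated series through an example function and read the second directional derivative from the quadratic coefficient, without forming the Hessian. The function $f(x, y) = x^2 y + y^3$ and the helpers `add2`, `mul2`, and `f_series` are assumptions for illustration; the explicit Hessian appears only as a check.

```python
def mul2(p, q):
    """Multiply two series truncated at order 2 (coefficients of 1, t, t^2)."""
    return [p[0] * q[0],
            p[0] * q[1] + p[1] * q[0],
            p[0] * q[2] + p[1] * q[1] + p[2] * q[0]]

def add2(p, q):
    """Add two order-2 truncated series coefficient by coefficient."""
    return [pi + qi for pi, qi in zip(p, q)]

def f_series(x, y):
    """f(x, y) = x^2 y + y^3, evaluated on order-2 truncated series."""
    return add2(mul2(mul2(x, x), y), mul2(mul2(y, y), y))

a = (1.0, 2.0)     # expansion point (illustrative)
v = (3.0, -1.0)    # direction (illustrative)

# Each input is the series a_i + v_i t + 0 t^2, so g holds the series of f along v.
g = f_series([a[0], v[0], 0.0], [a[1], v[1], 0.0])

second_directional = 2.0 * g[2]   # second derivative = 2 * (coefficient of t^2)

# Check against the hand-derived Hessian [[2y, 2x], [2x, 6y]] at a.
H = [[2 * a[1], 2 * a[0]], [2 * a[0], 6 * a[1]]]
vHv = sum(v[i] * H[i][j] * v[j] for i in range(2) for j in range(2))
```

The series pass costs a small constant factor over evaluating `f` once, whereas forming the full Hessian scales with the square of the input dimension.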
Taylor Expansions and Numerical Analysis
Taylor expansions are central in numerical methods.
| Area | Use of Taylor expansions |
|---|---|
| Optimization | local quadratic models |
| ODE solvers | local time stepping |
| Stability analysis | perturbation growth |
| Root finding | Newton and higher-order methods |
| Error estimation | truncation analysis |
| Sensitivity analysis | local parameter dependence |
AD provides the derivative information these methods need, exact up to floating-point rounding, without symbolic differentiation.
Radius of Validity
A Taylor expansion is local.
The approximation quality depends on:
- Distance from the expansion point.
- Smoothness of the function.
- Order of truncation.
Near singularities or discontinuities, the expansion may fail.
For example,
$$f(x) = \frac{1}{1 - x}$$
has Taylor expansion around $a = 0$:
$$f(h) = 1 + h + h^2 + h^3 + \cdots$$
This converges only for
$$|h| < 1$$
Outside that radius, the series diverges.
AD computes derivatives locally. It does not guarantee that low-order local approximations remain accurate globally.
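The effect of the convergence radius can be checked numerically. The sketch below uses the geometric series for $1/(1-x)$ around $0$ as an assumed example and compares truncation errors at a point inside and a point outside the unit radius. The function names are hypothetical.

```python
def geometric_partial_sum(x, order):
    """Taylor polynomial of 1 / (1 - x) around 0, truncated at `order`."""
    return sum(x**k for k in range(order + 1))

def truncation_error(x, order):
    """Absolute difference between the function and its Taylor polynomial."""
    return abs(1.0 / (1.0 - x) - geometric_partial_sum(x, order))

orders = (5, 10, 20)
inside = [truncation_error(0.5, n) for n in orders]    # |x| < 1: error shrinks
outside = [truncation_error(1.5, n) for n in orders]   # |x| > 1: error grows
```

Inside the radius, adding terms improves the approximation. Outside it, adding terms makes the approximation worse, even though every coefficient is an exact derivative.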
Taylor Expansions and Floating Point
Taylor coefficients are exact derivative values in exact arithmetic, but practical computation uses floating-point arithmetic.
Higher-order derivatives can become numerically unstable:
| Issue | Effect |
|---|---|
| cancellation | loss of precision |
| large factorial scaling | overflow or underflow |
| repeated differentiation | amplification of rounding error |
| high-order tensors | large memory growth |
For this reason, many AD systems focus primarily on first-order reverse mode and selected second-order products rather than arbitrary high-order expansions.
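The factorial scaling issue can be seen with a small experiment, assuming IEEE double precision (the format of Python's `float`). For $1/(1-x)$ the $k$-th derivative at $0$ is $k!$, while the normalized series coefficient is $1$ at every order. The helper name below is hypothetical.

```python
import math

def raw_derivative_overflows(k):
    """Return True if k!, the k-th derivative of 1/(1-x) at 0, exceeds float range."""
    try:
        float(math.factorial(k))
        return False
    except OverflowError:
        return True

# Largest order whose raw derivative still fits in a double.
largest_ok = max(k for k in range(1, 200) if not raw_derivative_overflows(k))

# The normalized coefficient (derivative divided by k!) is 1 at every order.
normalized_coefficient = 1.0
```

This is one reason a Taylor-mode implementation may prefer to store normalized coefficients rather than raw derivatives.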
Relationship to Reverse Mode
Forward Taylor propagation generalizes naturally to higher orders. Reverse mode becomes much more complicated at higher orders because reverse propagation already represents a transposed linearized computation.
Second-order reverse mode requires differentiating reverse computations themselves. The resulting systems involve nested pullbacks, higher-order adjoints, and large intermediate structures.
This is one reason higher-order forward techniques remain important even in systems dominated by reverse-mode machine learning.
Conceptual View
Taylor expansions provide a unifying interpretation of differentiation.
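A sufficiently smooth function can be viewed locally as a polynomial approximation:
$$f(a + v) \approx f(a) + \nabla f(a)^\top v + \tfrac{1}{2}\, v^\top \nabla^2 f(a)\, v + \cdots$$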
Automatic differentiation computes the coefficients of this local expansion algorithmically from the program structure.
Ordinary forward mode computes the linear term.
Higher-order AD computes additional coefficients.
Reverse mode computes transposed actions of these local linear structures.
In all cases, the program is treated as a composition of primitive operations, and local expansion rules are propagated through the computation.