Taylor mode automatic differentiation computes derivatives by propagating truncated Taylor series through a program.
Taylor mode automatic differentiation computes derivatives by propagating truncated Taylor series through a program.
Instead of tracking only values and first derivatives, Taylor mode tracks entire local polynomial expansions.
For a scalar function
expanded around point ,
Taylor mode computes these coefficients automatically.
This gives direct access to higher-order derivatives.
Basic Idea
Forward mode propagates first-order tangent information:
Taylor mode propagates higher-order polynomial structure:
Each coefficient represents derivative information.
Typically:
The program is evaluated symbolically over truncated power series rather than over ordinary numbers.
Truncated Polynomial Algebra
Taylor mode works in the algebra:
the ring of polynomials truncated at degree .
A Taylor object has form:
Addition is coefficient-wise:
Multiplication uses convolution:
Higher-order derivatives emerge from polynomial coefficient propagation.
Example: Polynomial Function
Let
Suppose:
Expand:
Keeping terms up to degree 2:
The coefficients are:
| Degree | Coefficient |
|---|---|
The second-order term contains the quadratic interaction naturally.
No explicit Hessian rules are required. The polynomial algebra produces the correct structure automatically.
Recovering Derivatives
If we choose:
then:
Therefore:
| Coefficient | Meaning |
|---|---|
| function value | |
| first derivative | |
| second derivative divided by | |
| third derivative divided by |
and so on.
Taylor mode computes all derivatives up to order simultaneously.
Comparison with Nested Forward Mode
Nested forward mode computes higher derivatives by nesting dual numbers repeatedly.
Taylor mode computes all orders together in one algebraic object.
| Method | Representation |
|---|---|
| nested duals | repeated first-order structure |
| Taylor mode | truncated polynomial |
| hyper-duals | structured second-order dual algebra |
| symbolic differentiation | explicit algebraic manipulation |
Taylor mode avoids some nesting complexity and perturbation tagging issues.
Elementary Functions
Taylor mode requires propagation rules for elementary functions.
Suppose:
If
then
must satisfy:
Matching coefficients gives recurrence relations for the Taylor coefficients.
Similarly:
| Function | Defining relation |
|---|---|
Taylor mode implementations derive coefficient recurrences from these identities.
Recurrence Computation
Taylor coefficients are usually computed recursively.
Suppose:
For multiplication:
the coefficient recurrence is:
This is discrete convolution.
For division:
coefficients satisfy:
Solving recursively gives .
Taylor mode therefore becomes a coefficient propagation system.
Directional Higher Derivatives
Taylor mode naturally computes directional higher derivatives.
Suppose:
Then:
The coefficients correspond to repeated directional derivatives.
This is valuable because full higher-order derivative tensors are usually too large to materialize.
Taylor mode computes directional expansions efficiently.
Example: Second Directional Derivative
Let
Choose direction:
Define:
Then:
Expand:
The coefficient of gives:
Taylor mode extracts this automatically.
Complexity
Suppose:
| Symbol | Meaning |
|---|---|
| cost of evaluating original function | |
| Taylor order |
Then Taylor mode often costs roughly:
because coefficient propagation involves convolution-like operations across orders.
More optimized methods can reduce some operations using FFT-like techniques or specialized recurrences.
Still, high-order derivatives become expensive quickly.
Memory Structure
Taylor mode stores coefficients for every intermediate variable.
If the original program has intermediates and Taylor order , storage is roughly:
This is usually better than explicitly materializing high-order derivative tensors.
The representation is compact because it stores directional polynomial structure rather than full multidimensional derivative arrays.
Taylor Mode vs Reverse Mode
Taylor mode and reverse mode have different strengths.
| Property | Taylor mode | Reverse mode |
|---|---|---|
| gradients | moderate | excellent |
| high-order directional derivatives | excellent | difficult |
| full Hessians | possible | possible |
| high-order tensors | structured | memory-heavy |
| implementation complexity | algebraic | tape-heavy |
| nested differentiation | simpler | harder |
| scalar-output optimization | weaker | strong |
Taylor mode is particularly attractive for moderate-order derivatives and local series expansions.
Taylor Models
Taylor mode connects naturally to Taylor models used in validated numerics.
A Taylor model represents a function as:
where:
| Component | Meaning |
|---|---|
| polynomial approximation | |
| rigorous remainder bound |
This allows:
| Capability | Purpose |
|---|---|
| interval arithmetic | rigorous bounds |
| verified ODE solving | guaranteed solutions |
| uncertainty propagation | bounded error |
| validated optimization | correctness guarantees |
Taylor-mode AD provides the polynomial component automatically.
Differential Equation Solvers
Taylor mode is especially useful for ODE solvers.
Suppose:
A Taylor-series integrator computes:
Automatic differentiation computes the required derivatives recursively.
This yields high-order integrators with strong local accuracy.
Taylor integrators are widely used in celestial mechanics, validated numerics, and stiff dynamical systems.
Sparse Higher-Order Structure
Full higher-order derivative tensors are enormous.
For dimension :
| Order | Tensor size |
|---|---|
| 1 | |
| 2 | |
| 3 | |
| 4 |
Taylor mode avoids constructing these directly.
Instead it propagates compressed directional structure through coefficient series.
This is often the only practical way to obtain high-order derivative information.
Compiler Representation
A compiler-based Taylor-mode system may lower each variable into coefficient arrays:
x[0] = primal
x[1] = first-order coefficient
x[2] = second-order coefficient
...
x[p] = pth-order coefficientOperations become coefficient recurrences.
For multiplication:
for k in 0..p:
z[k] = sum(x[i] * y[k-i] for i in 0..k)The transformed program resembles polynomial arithmetic.
Numerical Stability
High-order Taylor coefficients can become unstable.
Common problems include:
| Problem | Cause |
|---|---|
| coefficient explosion | factorial growth |
| cancellation | alternating signs |
| truncation instability | insufficient order |
| floating point amplification | repeated convolutions |
| stiff dynamics | rapidly varying derivatives |
Scaling and normalization techniques are often required.
Practical Uses
Taylor mode appears in:
| Domain | Usage |
|---|---|
| celestial mechanics | high-order integration |
| validated numerics | rigorous bounds |
| dynamical systems | local flow expansion |
| chaos analysis | sensitivity growth |
| PDE solvers | local expansions |
| perturbation methods | asymptotic analysis |
| scientific computing | repeated directional derivatives |
It is less common in large neural network training because reverse-mode gradients dominate that workload.
Conceptual Interpretation
Forward mode propagates tangent vectors.
Taylor mode propagates local polynomial geometry.
Instead of asking:
Taylor mode asks:
The program becomes a transformation over truncated polynomial algebras rather than over ordinary scalar values.