Skip to content

Taylor Mode AD

Taylor mode automatic differentiation computes derivatives by propagating truncated Taylor series through a program.

Taylor mode automatic differentiation computes derivatives by propagating truncated Taylor series through a program.

Instead of tracking only values and first derivatives, Taylor mode tracks entire local polynomial expansions.

For a scalar function

f(x), f(x),

expanded around point x0x_0,

f(x0+h)=f(x0)+f(x0)h+f(x0)2!h2+f(3)(x0)3!h3+ f(x_0 + h) = f(x_0) + f'(x_0)h + \frac{f''(x_0)}{2!}h^2 + \frac{f^{(3)}(x_0)}{3!}h^3 + \cdots

Taylor mode computes these coefficients automatically.

This gives direct access to higher-order derivatives.

Basic Idea

Forward mode propagates first-order tangent information:

x+ϵv. x + \epsilon v.

Taylor mode propagates higher-order polynomial structure:

x(t)=x0+x1t+x2t2++xptp. x(t) = x_0 + x_1 t + x_2 t^2 + \cdots + x_p t^p.

Each coefficient represents derivative information.

Typically:

xk=1k!dkxdtk(0). x_k = \frac{1}{k!} \frac{d^k x}{dt^k}(0).

The program is evaluated symbolically over truncated power series rather than over ordinary numbers.

Truncated Polynomial Algebra

Taylor mode works in the algebra:

R[t]/(tp+1), \mathbb{R}[t]/(t^{p+1}),

the ring of polynomials truncated at degree pp.

A Taylor object has form:

a0+a1t+a2t2++aptp. a_0 + a_1 t + a_2 t^2 + \cdots + a_p t^p.

Addition is coefficient-wise:

(a+b)k=ak+bk. (a+b)_k = a_k + b_k.

Multiplication uses convolution:

(ab)k=i=0kaibki. (ab)_k = \sum_{i=0}^k a_i b_{k-i}.

Higher-order derivatives emerge from polynomial coefficient propagation.

Example: Polynomial Function

Let

f(x)=x3. f(x) = x^3.

Suppose:

x(t)=x0+x1t+x2t2. x(t) = x_0 + x_1 t + x_2 t^2.

Expand:

f(x(t))=(x0+x1t+x2t2)3. f(x(t)) = (x_0 + x_1 t + x_2 t^2)^3.

Keeping terms up to degree 2:

=x03+3x02x1t+(3x02x2+3x0x12)t2. = x_0^3 + 3x_0^2x_1 t + (3x_0^2x_2 + 3x_0x_1^2)t^2.

The coefficients are:

DegreeCoefficient
t0t^0x03x_0^3
t1t^13x02x13x_0^2x_1
t2t^23x02x2+3x0x123x_0^2x_2 + 3x_0x_1^2

The second-order term contains the quadratic interaction naturally.

No explicit Hessian rules are required. The polynomial algebra produces the correct structure automatically.

Recovering Derivatives

If we choose:

x(t)=x0+t, x(t) = x_0 + t,

then:

f(x0+t)=k=0pf(k)(x0)k!tk. f(x_0+t) = \sum_{k=0}^p \frac{f^{(k)}(x_0)}{k!} t^k.

Therefore:

CoefficientMeaning
t0t^0function value
t1t^1first derivative
t2t^2second derivative divided by 2!2!
t3t^3third derivative divided by 3!3!

and so on.

Taylor mode computes all derivatives up to order pp simultaneously.

Comparison with Nested Forward Mode

Nested forward mode computes higher derivatives by nesting dual numbers repeatedly.

Taylor mode computes all orders together in one algebraic object.

MethodRepresentation
nested dualsrepeated first-order structure
Taylor modetruncated polynomial
hyper-dualsstructured second-order dual algebra
symbolic differentiationexplicit algebraic manipulation

Taylor mode avoids some nesting complexity and perturbation tagging issues.

Elementary Functions

Taylor mode requires propagation rules for elementary functions.

Suppose:

y(t)=exp(x(t)). y(t) = \exp(x(t)).

If

x(t)=k=0pxktk, x(t) = \sum_{k=0}^p x_k t^k,

then

y(t)=k=0pyktk y(t) = \sum_{k=0}^p y_k t^k

must satisfy:

y(t)=y(t)x(t). y'(t) = y(t)x'(t).

Matching coefficients gives recurrence relations for the Taylor coefficients.

Similarly:

FunctionDefining relation
expx\exp xy=yxy' = yx'
sinx\sin xy=cos(x)xy' = \cos(x)x'
cosx\cos xy=sin(x)xy' = -\sin(x)x'
logx\log xyx=xy'x = x'

Taylor mode implementations derive coefficient recurrences from these identities.

Recurrence Computation

Taylor coefficients are usually computed recursively.

Suppose:

x(t)=k=0pxktk. x(t) = \sum_{k=0}^p x_k t^k.

For multiplication:

z(t)=x(t)y(t), z(t) = x(t)y(t),

the coefficient recurrence is:

zk=i=0kxiyki. z_k = \sum_{i=0}^k x_i y_{k-i}.

This is discrete convolution.

For division:

z(t)=x(t)y(t), z(t) = \frac{x(t)}{y(t)},

coefficients satisfy:

xk=i=0kyizki. x_k = \sum_{i=0}^k y_i z_{k-i}.

Solving recursively gives zkz_k.

Taylor mode therefore becomes a coefficient propagation system.

Directional Higher Derivatives

Taylor mode naturally computes directional higher derivatives.

Suppose:

x(t)=x0+tv. x(t) = x_0 + tv.

Then:

f(x0+tv)=k=0p1k!Dkf(x0)[v,,v]tk. f(x_0 + tv) = \sum_{k=0}^p \frac{1}{k!} D^k f(x_0)[v,\ldots,v] t^k.

The coefficients correspond to repeated directional derivatives.

This is valuable because full higher-order derivative tensors are usually too large to materialize.

Taylor mode computes directional expansions efficiently.

Example: Second Directional Derivative

Let

f(x,y)=x2y. f(x,y)=x^2y.

Choose direction:

v=(v1,v2). v=(v_1,v_2).

Define:

x(t)=x+tv1,y(t)=y+tv2. x(t)=x+tv_1, \quad y(t)=y+tv_2.

Then:

f(x(t),y(t))=(x+tv1)2(y+tv2). f(x(t),y(t)) = (x+tv_1)^2(y+tv_2).

Expand:

=x2y+(2xyv1+x2v2)t+(yv12+2xv1v2)t2+v12v2t3. = x^2y + (2xyv_1 + x^2v_2)t + (yv_1^2 + 2xv_1v_2)t^2 + v_1^2v_2 t^3.

The coefficient of t2t^2 gives:

12D2f(x,y)[v,v]. \frac{1}{2} D^2f(x,y)[v,v].

Taylor mode extracts this automatically.

Complexity

Suppose:

SymbolMeaning
CCcost of evaluating original function
ppTaylor order

Then Taylor mode often costs roughly:

O(p2C) O(p^2 C)

because coefficient propagation involves convolution-like operations across orders.

More optimized methods can reduce some operations using FFT-like techniques or specialized recurrences.

Still, high-order derivatives become expensive quickly.

Memory Structure

Taylor mode stores coefficients for every intermediate variable.

If the original program has NN intermediates and Taylor order pp, storage is roughly:

O(pN). O(pN).

This is usually better than explicitly materializing high-order derivative tensors.

The representation is compact because it stores directional polynomial structure rather than full multidimensional derivative arrays.

Taylor Mode vs Reverse Mode

Taylor mode and reverse mode have different strengths.

PropertyTaylor modeReverse mode
gradientsmoderateexcellent
high-order directional derivativesexcellentdifficult
full Hessianspossiblepossible
high-order tensorsstructuredmemory-heavy
implementation complexityalgebraictape-heavy
nested differentiationsimplerharder
scalar-output optimizationweakerstrong

Taylor mode is particularly attractive for moderate-order derivatives and local series expansions.

Taylor Models

Taylor mode connects naturally to Taylor models used in validated numerics.

A Taylor model represents a function as:

P(x)+R(x), P(x) + R(x),

where:

ComponentMeaning
P(x)P(x)polynomial approximation
R(x)R(x)rigorous remainder bound

This allows:

CapabilityPurpose
interval arithmeticrigorous bounds
verified ODE solvingguaranteed solutions
uncertainty propagationbounded error
validated optimizationcorrectness guarantees

Taylor-mode AD provides the polynomial component automatically.

Differential Equation Solvers

Taylor mode is especially useful for ODE solvers.

Suppose:

dxdt=f(x). \frac{dx}{dt} = f(x).

A Taylor-series integrator computes:

x(t+h)=x(t)+hx(t)+h22!x(t)+ x(t+h) = x(t) + hx'(t) + \frac{h^2}{2!}x''(t) + \cdots

Automatic differentiation computes the required derivatives recursively.

This yields high-order integrators with strong local accuracy.

Taylor integrators are widely used in celestial mechanics, validated numerics, and stiff dynamical systems.

Sparse Higher-Order Structure

Full higher-order derivative tensors are enormous.

For dimension nn:

OrderTensor size
1nn
2n2n^2
3n3n^3
4n4n^4

Taylor mode avoids constructing these directly.

Instead it propagates compressed directional structure through coefficient series.

This is often the only practical way to obtain high-order derivative information.

Compiler Representation

A compiler-based Taylor-mode system may lower each variable into coefficient arrays:

x[0] = primal
x[1] = first-order coefficient
x[2] = second-order coefficient
...
x[p] = pth-order coefficient

Operations become coefficient recurrences.

For multiplication:

for k in 0..p:
    z[k] = sum(x[i] * y[k-i] for i in 0..k)

The transformed program resembles polynomial arithmetic.

Numerical Stability

High-order Taylor coefficients can become unstable.

Common problems include:

ProblemCause
coefficient explosionfactorial growth
cancellationalternating signs
truncation instabilityinsufficient order
floating point amplificationrepeated convolutions
stiff dynamicsrapidly varying derivatives

Scaling and normalization techniques are often required.

Practical Uses

Taylor mode appears in:

DomainUsage
celestial mechanicshigh-order integration
validated numericsrigorous bounds
dynamical systemslocal flow expansion
chaos analysissensitivity growth
PDE solverslocal expansions
perturbation methodsasymptotic analysis
scientific computingrepeated directional derivatives

It is less common in large neural network training because reverse-mode gradients dominate that workload.

Conceptual Interpretation

Forward mode propagates tangent vectors.

Taylor mode propagates local polynomial geometry.

Instead of asking:

How does the function change infinitesimally? \text{How does the function change infinitesimally?}

Taylor mode asks:

What local power series does the function induce? \text{What local power series does the function induce?}

The program becomes a transformation over truncated polynomial algebras rather than over ordinary scalar values.