Taylor Expansions

Differentiation describes how a function changes locally. A Taylor expansion extends this idea by approximating a function with a polynomial around a point.

Automatic differentiation is fundamentally a first-order method in most applications, but higher-order AD is closely connected to Taylor expansions. Forward mode can be generalized to propagate higher-order coefficients, and many derivative identities become clearer through the Taylor viewpoint.

Local Polynomial Approximation

Let

$$ f : \mathbb{R} \to \mathbb{R} $$

be sufficiently smooth near a point $x$. The Taylor expansion around $x$ is

$$ f(x + h) = f(x) + f'(x)h + \frac{1}{2}f''(x)h^2 + \frac{1}{6}f^{(3)}(x)h^3 + \cdots $$

More compactly,

$$ f(x+h) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x)}{k!}\,h^k $$

The coefficients are derivatives evaluated at the expansion point.

The first-order approximation is

$$ f(x+h) \approx f(x) + f'(x)h $$

This is the linearization used by ordinary AD.

The second-order approximation adds curvature:

$$ f(x+h) \approx f(x) + f'(x)h + \frac{1}{2}f''(x)h^2 $$

Higher-order terms capture increasingly fine local structure.
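These two approximations are easy to compare numerically. The sketch below (illustrative helper names, not from any library) approximates $\sin$ near a point with the linear and quadratic truncations and prints the errors:

```python
import math

def taylor1(f, df, x, h):
    """First-order (linear) approximation of f(x + h)."""
    return f(x) + df(x) * h

def taylor2(f, df, d2f, x, h):
    """Second-order approximation: adds the curvature term."""
    return taylor1(f, df, x, h) + 0.5 * d2f(x) * h**2

# Example: f(x) = sin(x) around x = 1.0 with a small step.
x, h = 1.0, 0.1
exact = math.sin(x + h)
lin = taylor1(math.sin, math.cos, x, h)
quad = taylor2(math.sin, math.cos, lambda t: -math.sin(t), x, h)

print(abs(exact - lin))   # first-order error, O(h^2)
print(abs(exact - quad))  # second-order error, O(h^3)
```

Halving $h$ should shrink the first-order error by roughly 4x and the second-order error by roughly 8x, matching the $O(h^2)$ and $O(h^3)$ remainders.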

Taylor Expansion in Multiple Variables

For

$$ f : \mathbb{R}^n \to \mathbb{R} $$

the expansion around $x$ in direction $\Delta x$ begins as

$$ f(x + \Delta x) = f(x) + \nabla f(x)^T\Delta x + \frac{1}{2} \Delta x^T H_f(x)\Delta x + \cdots $$

The terms have geometric meaning:

| Term | Meaning |
| --- | --- |
| $f(x)$ | base value |
| $\nabla f(x)^T\Delta x$ | first-order directional change |
| $\frac{1}{2}\Delta x^T H_f(x)\Delta x$ | second-order curvature |
| higher-order terms | finer local structure |

The first-order term is linear in $\Delta x$. The second-order term is quadratic.

Forward mode computes first-order directional information directly. Higher-order AD computes additional coefficients of this expansion.

Example

Consider

$$ f(x) = e^x $$

Its derivatives satisfy

$$ f^{(k)}(x) = e^x $$

for all $k$.

The Taylor expansion around $x = 0$ is

$$ e^h = 1 + h + \frac{1}{2}h^2 + \frac{1}{6}h^3 + \cdots $$

For small $h$, truncating after a few terms gives an accurate approximation.
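A quick numeric check of this series (plain Python, illustrative helper name) shows the error dropping sharply with each additional term:

```python
import math

def exp_taylor(h, order):
    """Partial sum of the Taylor series of e^h around 0."""
    return sum(h**k / math.factorial(k) for k in range(order + 1))

h = 0.1
for p in (1, 2, 3, 4):
    # Error of the order-p truncation; shrinks by roughly a factor of h
    # with each additional term.
    print(p, abs(math.exp(h) - exp_taylor(h, p)))
```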

Now consider

$$ f(x) = \sin x $$

At $x = 0$,

$$ \sin h = h - \frac{1}{6}h^3 + \frac{1}{120}h^5 - \cdots $$

The derivative information determines the full local series structure.

Truncated Series

Practical Taylor methods use truncated series.

A truncated series of order $p$ has the form

$$ a_0 + a_1h + a_2h^2 + \cdots + a_ph^p $$

Terms above degree $p$ are discarded.

For example, second-order truncation gives

$$ f(x+h) \approx f(x) + f'(x)h + \frac{1}{2}f''(x)h^2 $$

Truncated series form a closed algebra: they can be added, multiplied, composed, and differentiated, with terms above the truncation order discarded at every step.

This algebraic structure is the basis of higher-order forward AD.

Dual Numbers as First-Order Taylor Series

Ordinary dual numbers use an infinitesimal $\varepsilon$ satisfying

$$ \varepsilon^2 = 0 $$

A dual number has the form

$$ a + b\varepsilon $$

This is exactly a truncated first-order Taylor expansion.

If

$$ x \mapsto x + \varepsilon $$

then

$$ f(x+\varepsilon) = f(x) + f'(x)\varepsilon $$

because higher-order terms vanish.

Forward mode AD with dual numbers therefore computes first-order Taylor coefficients automatically.
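A minimal dual-number sketch makes this concrete (the `Dual` class and `dsin` helper below are illustrative names, not from any particular AD library):

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    val: float   # a: function value
    eps: float   # b: coefficient of epsilon (the derivative)

    def __add__(self, other):
        return Dual(self.val + other.val, self.eps + other.eps)

    def __mul__(self, other):
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, since eps^2 = 0
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

def dsin(x: Dual) -> Dual:
    # Lift sin via f(x + eps) = f(x) + f'(x) eps
    return Dual(math.sin(x.val), math.cos(x.val) * x.eps)

# Seed eps = 1 to differentiate with respect to x.
x = Dual(0.5, 1.0)
y = dsin(x) * dsin(x)   # y = sin(x)^2
print(y.val, y.eps)     # value and derivative 2 sin(x) cos(x)
```

Every primitive applies the same lifting rule, so the derivative coefficient is carried through the whole computation automatically.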

Higher-Order Truncated Algebras

To obtain higher derivatives, we generalize the nilpotent rule.

Suppose

$$ \varepsilon^{p+1} = 0 $$

Then a truncated series becomes

$$ a_0 + a_1\varepsilon + a_2\varepsilon^2 + \cdots + a_p\varepsilon^p $$

Applying a smooth function produces higher-order derivative coefficients automatically.

For example, using third-order truncation,

$$ f(x+\varepsilon) = f(x) + f'(x)\varepsilon + \frac{1}{2}f''(x)\varepsilon^2 + \frac{1}{6}f^{(3)}(x)\varepsilon^3 $$

This idea leads to Taylor-mode AD.

Taylor Mode Automatic Differentiation

Taylor mode propagates polynomial coefficients rather than single tangents.

Instead of storing

$$ (x, \dot{x}) $$

we store

$$ (x_0, x_1, x_2, \ldots, x_p) $$

where

| Coefficient | Meaning |
| --- | --- |
| $x_0$ | primal value |
| $x_1$ | first derivative coefficient |
| $x_2$ | second derivative coefficient |
| $\vdots$ | higher-order coefficients |

Primitive operations are lifted to operate on truncated series.

For addition:

$$ (a_0 + a_1\varepsilon + \cdots) + (b_0 + b_1\varepsilon + \cdots) $$

gives

$$ (a_0+b_0) + (a_1+b_1)\varepsilon + \cdots $$

For multiplication:

$$ (a_0 + a_1\varepsilon + \cdots)(b_0 + b_1\varepsilon + \cdots) $$

the coefficients combine through polynomial convolution: $c_k = \sum_{i=0}^{k} a_i b_{k-i}$, with terms above degree $p$ discarded.

This allows one forward pass to compute multiple derivative orders simultaneously.
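This coefficient arithmetic can be sketched as a tiny truncated-series class in Python (the `Taylor` name and representation are illustrative, not any library's API):

```python
import math

class Taylor:
    """Truncated Taylor series: coeffs[k] is the coefficient of eps^k."""
    def __init__(self, coeffs):
        self.coeffs = list(coeffs)

    def __add__(self, other):
        # Coefficient-wise addition.
        return Taylor([a + b for a, b in zip(self.coeffs, other.coeffs)])

    def __mul__(self, other):
        p = len(self.coeffs) - 1
        out = [0.0] * (p + 1)
        # Polynomial convolution, discarding terms above degree p.
        for i, a in enumerate(self.coeffs):
            for j, b in enumerate(other.coeffs):
                if i + j <= p:
                    out[i + j] += a * b
        return Taylor(out)

# x = x0 + eps, truncated at order 3: coefficients (x0, 1, 0, 0).
x0 = 2.0
x = Taylor([x0, 1.0, 0.0, 0.0])
y = x * x * x   # y = x^3

# coeffs[k] = f^{(k)}(x0) / k!, so derivatives are k! * coeffs[k].
print([math.factorial(k) * c for k, c in enumerate(y.coeffs)])
```

One pass through the computation yields the value and the first three derivatives of $x^3$ at $x_0$ at once.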

Composition and Taylor Series

The chain rule appears naturally inside Taylor expansions.

Suppose

$$ f(x) = h(g(x)) $$

Expand $g$ around $x$ with increment $\delta$ (written $\delta$ here to avoid clashing with the function $h$):

$$ g(x+\delta) = g(x) + g'(x)\delta + \cdots $$

Substitute into the Taylor expansion of $h$:

$$ h(g(x+\delta)) = h(g(x)) + h'(g(x))\,g'(x)\,\delta + \cdots $$

The first-order term already contains the chain rule:

$$ (h \circ g)'(x) = h'(g(x))\,g'(x) $$

Higher-order terms produce generalized chain-rule formulas such as Faà di Bruno expansions.

Taylor methods therefore provide another perspective on derivative composition.
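At second order, this expansion yields the identity $(h \circ g)''(x) = h''(g(x))\,g'(x)^2 + h'(g(x))\,g''(x)$, the simplest Faà di Bruno case. A quick numeric sanity check, using $h = \exp$ and $g = \sin$ (the point $x = 0.3$ and the finite-difference step are arbitrary choices for illustration):

```python
import math

# Second-order chain rule: (h o g)'' = h''(g) g'^2 + h'(g) g''
x = 0.3
g0, dg, d2g = math.sin(x), math.cos(x), -math.sin(x)
eg = math.exp(g0)   # exp equals all of its own derivatives
formula = eg * dg**2 + eg * d2g

# Compare against a central finite difference of f(x) = exp(sin(x)).
f = lambda t: math.exp(math.sin(t))
e = 1e-4
fd = (f(x + e) - 2 * f(x) + f(x - e)) / e**2
print(formula, fd)   # should agree to several digits
```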

Hessians from Taylor Expansions

The Hessian appears as the quadratic coefficient in multivariate expansions.

For

$$ f(x+\Delta x) = f(x) + \nabla f(x)^T\Delta x + \frac{1}{2}\Delta x^T H_f(x)\Delta x + \cdots $$

the Hessian determines second-order behavior.

If we choose a direction $v$ and define

$$ g(t) = f(x+tv) $$

then

$$ g''(0) = v^T H_f(x)\,v $$

This is the second directional derivative along $v$.
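This identity is easy to verify on a small example (the function, point, and direction below are made up for illustration; $g''(0)$ is approximated with a central difference):

```python
# f(x, y) = x^2 y has Hessian [[2y, 2x], [2x, 0]].
def f(p):
    x, y = p
    return x * x * y

x0 = (1.0, 2.0)
v = (0.5, -1.0)

def g(t):
    """Restriction of f to the line through x0 in direction v."""
    return f((x0[0] + t * v[0], x0[1] + t * v[1]))

# Central difference approximates g''(0).
eps = 1e-4
g2 = (g(eps) - 2 * g(0.0) + g(-eps)) / eps**2

# Analytic v^T H v with H = [[4, 2], [2, 0]] at (1, 2).
H = [[4.0, 2.0], [2.0, 0.0]]
vHv = sum(v[i] * H[i][j] * v[j] for i in range(2) for j in range(2))
print(g2, vHv)   # both equal the second directional derivative
```

In a real system the second derivative of $g$ would come from second-order Taylor propagation rather than finite differences; the point is that only the scalar restriction along $v$ is needed, never the full Hessian.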

Many higher-order AD algorithms operate through directional Taylor coefficients rather than explicit Hessian matrices.

Taylor Expansions and Numerical Analysis

Taylor expansions are central in numerical methods.

| Area | Use of Taylor expansions |
| --- | --- |
| Optimization | local quadratic models |
| ODE solvers | local time stepping |
| Stability analysis | perturbation growth |
| Root finding | Newton and higher-order methods |
| Error estimation | truncation analysis |
| Sensitivity analysis | local parameter dependence |

AD provides exact derivative information needed for these methods without symbolic differentiation.

Radius of Validity

A Taylor expansion is local.

The approximation quality depends on:

  1. Distance from the expansion point.
  2. Smoothness of the function.
  3. Order of truncation.

Near singularities or discontinuities, the expansion may fail.

For example,

$$ f(x) = \frac{1}{1-x} $$

has Taylor expansion around $x = 0$:

$$ 1 + x + x^2 + x^3 + \cdots $$

This converges only for

$$ |x| < 1 $$

Outside that radius, the series diverges.
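A quick numeric sketch of this behavior, using partial sums of the geometric series:

```python
def partial_sum(x, p):
    """Partial sum 1 + x + ... + x^p of the series for 1/(1-x)."""
    return sum(x**k for k in range(p + 1))

target = lambda x: 1.0 / (1.0 - x)

# Inside the radius of convergence the error shrinks geometrically...
print(abs(target(0.5) - partial_sum(0.5, 30)))   # tiny

# ...outside it the partial sums grow without bound.
print(partial_sum(1.5, 10), partial_sum(1.5, 30))
```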

AD computes derivatives locally. It does not guarantee that low-order local approximations remain accurate globally.

Taylor Expansions and Floating Point

Taylor coefficients are mathematically exact derivatives, but practical computation uses floating point arithmetic.

Higher-order derivatives can become numerically unstable:

| Issue | Effect |
| --- | --- |
| cancellation | loss of precision |
| large factorial scaling | overflow or underflow |
| repeated differentiation | amplification of rounding error |
| high-order tensors | large memory growth |

For this reason, many AD systems focus primarily on first-order reverse mode and selected second-order products rather than arbitrary high-order expansions.

Relationship to Reverse Mode

Forward Taylor propagation generalizes naturally to higher orders. Reverse mode becomes much more complicated at higher orders because reverse propagation already represents a transposed linearized computation.

Second-order reverse mode requires differentiating reverse computations themselves. The resulting systems involve nested pullbacks, higher-order adjoints, and large intermediate structures.

This is one reason higher-order forward techniques remain important even in systems dominated by reverse-mode machine learning.

Conceptual View

Taylor expansions provide a unifying interpretation of differentiation.

A differentiable function can be viewed locally as a polynomial approximation:

$$ f(x+h) = \text{constant term} + \text{linear term} + \text{quadratic term} + \cdots $$

Automatic differentiation computes the coefficients of this local expansion algorithmically from the program structure.

Ordinary forward mode computes the linear term.

Higher-order AD computes additional coefficients.

Reverse mode computes transposed actions of these local linear structures.

In all cases, the program is treated as a composition of primitive operations, and local expansion rules are propagated through the computation.