# Taylor Expansions

Differentiation describes how a function changes locally. A Taylor expansion extends this idea by approximating a function with a polynomial around a point.

Most applications use automatic differentiation as a first-order method, but higher-order AD is closely connected to Taylor expansions. Forward mode generalizes to propagating higher-order coefficients, and many derivative identities become clearer from the Taylor viewpoint.

### Local Polynomial Approximation

Let

$$
f : \mathbb{R} \to \mathbb{R}
$$

be sufficiently smooth near a point $x$. The Taylor expansion around $x$ is

$$
f(x + h) =
f(x)
+
f'(x)h
+
\frac{1}{2}f''(x)h^2
+
\frac{1}{6}f^{(3)}(x)h^3
+
\cdots
$$

More compactly,

$$
f(x+h) =
\sum_{k=0}^{\infty}
\frac{f^{(k)}(x)}{k!}h^k
$$

The coefficients are derivatives evaluated at the expansion point.

The first-order approximation is

$$
f(x+h)
\approx
f(x) + f'(x)h
$$

This is the linearization used by ordinary AD.

The second-order approximation adds curvature:

$$
f(x+h)
\approx
f(x)
+
f'(x)h
+
\frac{1}{2}f''(x)h^2
$$

Higher-order terms capture increasingly fine local structure.
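
As a quick numerical illustration (a minimal Python sketch, using $f(x) = e^x$ so that every derivative equals $e^x$), the first-order error shrinks roughly like $h^2$ and the second-order error like $h^3$:

```python
import math

# f(x) = exp(x): every derivative is exp(x), so the Taylor models are
# exact to write down and only truncation error remains.
x = 1.0
fx = math.exp(x)

for h in (1e-1, 1e-2, 1e-3):
    exact = math.exp(x + h)
    first = fx + fx * h                # f(x) + f'(x) h
    second = first + 0.5 * fx * h * h  # ... + (1/2) f''(x) h^2
    # The first-order error scales like h^2, the second-order error like h^3.
    print(f"h={h:g}  first-order err={exact - first:.3e}"
          f"  second-order err={exact - second:.3e}")
```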

### Taylor Expansion in Multiple Variables

For

$$
f : \mathbb{R}^n \to \mathbb{R}
$$

the expansion around $x$ for a displacement $\Delta x$ begins as

$$
f(x + \Delta x) =
f(x)
+
\nabla f(x)^T\Delta x
+
\frac{1}{2}
\Delta x^T H_f(x)\Delta x
+
\cdots
$$

The terms have geometric meaning:

| Term | Meaning |
|---|---|
| $f(x)$ | base value |
| $\nabla f(x)^T\Delta x$ | first-order directional change |
| $\frac{1}{2}\Delta x^T H_f(x)\Delta x$ | second-order curvature |
| higher-order terms | finer local structure |

The first-order term is linear in $\Delta x$. The second-order term is quadratic.

Forward mode computes first-order directional information directly. Higher-order AD computes additional coefficients of this expansion.

### Example

Consider

$$
f(x) = e^x
$$

Its derivatives satisfy

$$
f^{(k)}(x) = e^x
$$

for all $k$.

The Taylor expansion around $x=0$ is

$$
e^h =
1 + h + \frac{1}{2}h^2 + \frac{1}{6}h^3 + \cdots
$$

For small $h$, truncating after a few terms gives an accurate approximation.

Now consider

$$
f(x) = \sin x
$$

At $x=0$,

$$
\sin h =
h - \frac{1}{6}h^3 + \frac{1}{120}h^5 - \cdots
$$

The derivative information determines the full local series structure.
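
A short sketch makes the sine example concrete: partial sums of the series converge quickly to `math.sin` at a fixed point (the loop bound and evaluation point are arbitrary choices):

```python
import math

# Partial sums of sin(h) = h - h^3/6 + h^5/120 - ... at h = 0.5.
h = 0.5
term, total = h, 0.0
for k in range(5):
    total += term
    print(f"partial sum through degree {2 * k + 1}: {total:.12f}")
    # Ratio of consecutive odd-degree terms: -h^2 / ((2k+2)(2k+3)).
    term *= -h * h / ((2 * k + 2) * (2 * k + 3))

print(f"math.sin(0.5):                  {math.sin(h):.12f}")
```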

### Truncated Series

Practical Taylor methods use truncated series.

A truncated series of order $p$ has the form

$$
a_0 + a_1h + a_2h^2 + \cdots + a_ph^p
$$

Terms above degree $p$ are discarded.

For example, second-order truncation gives

$$
f(x+h)
\approx
f(x)
+
f'(x)h
+
\frac{1}{2}f''(x)h^2
$$

Truncated series form an algebra: they can be added, multiplied, composed, and differentiated, discarding terms above degree $p$ at each step.

This algebraic structure is the basis of higher-order forward AD.
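
A minimal sketch of this algebra in Python (the names `ts_add` and `ts_mul` are illustrative): a truncated series of order $p$ is a coefficient list of length $p+1$, added componentwise and multiplied by a Cauchy product that drops terms above degree $p$.

```python
def ts_add(a, b):
    """Add two truncated series given as coefficient lists [a0, ..., ap]."""
    return [x + y for x, y in zip(a, b)]

def ts_mul(a, b):
    """Multiply two truncated series, discarding terms above degree p."""
    p = len(a) - 1
    c = [0.0] * (p + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= p:          # terms of degree > p are dropped
                c[i + j] += ai * bj
    return c

# (1 + h)(1 + h) truncated at degree 2 gives 1 + 2h + h^2.
print(ts_mul([1.0, 1.0, 0.0], [1.0, 1.0, 0.0]))  # [1.0, 2.0, 1.0]
```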

### Dual Numbers as First-Order Taylor Series

Ordinary dual numbers adjoin a nilpotent element $\varepsilon$ satisfying

$$
\varepsilon^2 = 0
$$

A dual number has the form

$$
a + b\varepsilon
$$

This is exactly a truncated first-order Taylor expansion.

If

$$
x \mapsto x + \varepsilon
$$

then

$$
f(x+\varepsilon) =
f(x)
+
f'(x)\varepsilon
$$

because higher-order terms vanish.

Forward mode AD with dual numbers therefore computes first-order Taylor coefficients automatically.
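
A minimal dual-number sketch (illustrative, supporting only addition and multiplication, not a complete AD system) shows this directly: seeding the input as $x + \varepsilon$ makes the $\varepsilon$ coefficient of the output equal to $f'(x)$.

```python
class Dual:
    """a + b*eps with eps**2 = 0: value and first derivative coefficient."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.a + o.a, self.b + o.b)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, since eps^2 = 0
        return Dual(self.a * o.a, self.a * o.b + self.b * o.a)
    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1   # f'(x) = 6x + 2

# Seed x -> x + eps; the eps coefficient of the output is f'(x).
y = f(Dual(2.0, 1.0))
print(y.a, y.b)   # 17.0 14.0
```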

### Higher-Order Truncated Algebras

To obtain higher derivatives, we generalize the nilpotent rule.

Suppose

$$
\varepsilon^{p+1} = 0
$$

Then a truncated series becomes

$$
a_0 + a_1\varepsilon + a_2\varepsilon^2 + \cdots + a_p\varepsilon^p
$$

Applying a smooth function to such a series produces its higher-order derivative coefficients automatically.

For example, using third-order truncation,

$$
f(x+\varepsilon) =
f(x)
+
f'(x)\varepsilon
+
\frac{1}{2}f''(x)\varepsilon^2
+
\frac{1}{6}f^{(3)}(x)\varepsilon^3
$$
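
Reusing the illustrative `ts_mul` helper from the truncated-series sketch above with $p = 3$, evaluating $f(x) = x^3$ on the series $x + \varepsilon$ recovers all four Taylor coefficients in one pass:

```python
# Seed x = 2 as the degree-3 truncated series 2 + eps.
x = [2.0, 1.0, 0.0, 0.0]
y = ts_mul(ts_mul(x, x), x)   # evaluate f(x) = x^3 on the series
print(y)                      # [8.0, 12.0, 6.0, 1.0]
# These are f(2), f'(2), f''(2)/2!, f'''(2)/3! = 8, 12, 12/2, 6/6.
```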

This idea leads to Taylor-mode AD.

### Taylor Mode Automatic Differentiation

Taylor mode propagates polynomial coefficients rather than single tangents.

Instead of storing

$$
(x, \dot{x})
$$

we store

$$
(x_0, x_1, x_2, \ldots, x_p)
$$

where

| Coefficient | Meaning |
|---|---|
| $x_0$ | primal value |
| $x_1$ | first derivative coefficient |
| $x_2$ | second derivative coefficient |
| $\vdots$ | higher-order coefficients |

Primitive operations are lifted to operate on truncated series.

For addition:

$$
(a_0 + a_1\varepsilon + \cdots)
+
(b_0 + b_1\varepsilon + \cdots)
$$

gives

$$
(a_0+b_0)
+
(a_1+b_1)\varepsilon
+
\cdots
$$

For multiplication:

$$
(a_0 + a_1\varepsilon + \cdots)
(b_0 + b_1\varepsilon + \cdots)
$$

the coefficients combine through polynomial convolution (a Cauchy product truncated at degree $p$).

This allows one forward pass to compute multiple derivative orders simultaneously.
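
Nonlinear primitives are lifted with coefficient recurrences rather than convolution alone. A hedged sketch for $\exp$ (the name `ts_exp` is illustrative): differentiating $y = e^{x(t)}$ gives $y' = x'y$, which translates into the recurrence $y_k = \frac{1}{k}\sum_{j=1}^{k} j\,x_j\,y_{k-j}$.

```python
import math

def ts_exp(x):
    """Lift exp to truncated series: y' = x'y gives a coefficient recurrence."""
    p = len(x) - 1
    y = [math.exp(x[0])] + [0.0] * p
    for k in range(1, p + 1):
        y[k] = sum(j * x[j] * y[k - j] for j in range(1, k + 1)) / k
    return y

# x(t) = 1 + t, so exp(x(t)) has coefficients e * [1, 1, 1/2, 1/6].
print(ts_exp([1.0, 1.0, 0.0, 0.0]))
```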

### Composition and Taylor Series

The chain rule appears naturally inside Taylor expansions.

Suppose

$$
f(x) = h(g(x))
$$

Expand $g$ around $x$, writing the step as $t$ to avoid a clash with the function name $h$:

$$
g(x+t) =
g(x)
+
g'(x)t
+
\cdots
$$

Substitute into the Taylor expansion of $h$:

$$
h(g(x+t)) =
h(g(x))
+
h'(g(x))g'(x)t
+
\cdots
$$

The first-order term already contains the chain rule:

$$
(h \circ g)'(x) =
h'(g(x))g'(x)
$$

Higher-order terms produce generalized chain-rule formulas such as the Faà di Bruno expansion.

Taylor methods therefore provide another perspective on derivative composition.
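
A small numerical check (reusing the illustrative `ts_exp` sketch from above, with $h = \exp$ and $g = \sin$) confirms the first- and second-order composite coefficients against the chain rule and the order-2 Faà di Bruno formula $(h \circ g)'' = h''(g)\,g'^2 + h'(g)\,g''$.

```python
import math

# Expand g(x) = sin(x) around x = 0.3 to second order: [g, g', g''/2!],
# then push the series through exp to get the series of exp(sin(x + t)).
x = 0.3
g = [math.sin(x), math.cos(x), -math.sin(x) / 2]
c = ts_exp(g)   # coefficients of (exp . sin)(x + t)

e = math.exp(math.sin(x))
print(c[1], e * math.cos(x))                           # chain rule, order 1
print(2 * c[2], e * (math.cos(x) ** 2 - math.sin(x)))  # Faa di Bruno, order 2
```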

### Hessians from Taylor Expansions

The Hessian appears as the quadratic coefficient in multivariate expansions.

In the expansion

$$
f(x+\Delta x) =
f(x)
+
\nabla f(x)^T\Delta x
+
\frac{1}{2}\Delta x^T H_f(x)\Delta x
+
\cdots
$$

the Hessian determines second-order behavior.

If we choose a direction $v$ and define

$$
g(t) = f(x+tv)
$$

then

$$
g''(0) =
v^T H_f(x)v
$$

This is the second directional derivative along $v$.

Many higher-order AD algorithms operate through directional Taylor coefficients rather than explicit Hessian matrices.
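
A hedged sketch using the illustrative `ts_add`/`ts_mul` helpers from earlier, truncated at degree 2: seeding each input with a direction $v$ turns one series evaluation into $g(t) = f(x + tv)$, whose degree-2 coefficient is $\frac{1}{2}v^T H_f(x) v$.

```python
# f(x, y) = x^2 y + y^3, built only from lifted add and mul.
def f(x, y):
    return ts_add(ts_mul(ts_mul(x, x), y), ts_mul(ts_mul(y, y), y))

# Seed g(t) = f(x0 + t*vx, y0 + t*vy) as degree-2 series in t.
x0, y0, vx, vy = 1.0, 2.0, 1.0, 1.0
g = f([x0, vx, 0.0], [y0, vy, 0.0])
print(g[1])       # directional derivative: grad f . v = 17.0
print(2 * g[2])   # g''(0) = v^T H_f v = 20.0
```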

### Taylor Expansions and Numerical Analysis

Taylor expansions are central in numerical methods.

| Area | Use of Taylor expansions |
|---|---|
| Optimization | local quadratic models |
| ODE solvers | local time stepping |
| Stability analysis | perturbation growth |
| Root finding | Newton and higher-order methods |
| Error estimation | truncation analysis |
| Sensitivity analysis | local parameter dependence |

AD provides exact derivative information needed for these methods without symbolic differentiation.
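
As one concrete example (a sketch reusing the illustrative `Dual` class from above): Newton's method needs $f$ and $f'$ at every iterate, and a single dual-number evaluation supplies both.

```python
def newton(f, x, steps=6):
    """Newton iteration with f and f' from one dual-number evaluation."""
    for _ in range(steps):
        y = f(Dual(x, 1.0))   # y.a = f(x), y.b = f'(x)
        x = x - y.a / y.b
    return x

# Root of f(x) = x^2 - 2, i.e. sqrt(2).
print(newton(lambda x: x * x + (-2.0), 1.0))
```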

### Radius of Validity

A Taylor expansion is local.

The approximation quality depends on:

1. Distance from the expansion point.
2. Smoothness of the function.
3. Order of truncation.

Near singularities or discontinuities, the expansion may fail.

For example,

$$
f(x) = \frac{1}{1-x}
$$

has Taylor expansion around $x=0$:

$$
1 + x + x^2 + x^3 + \cdots
$$

This converges only for

$$
|x| < 1
$$

Outside that radius, the series diverges.
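
A brief sketch shows both behaviors: partial sums of the geometric series settle at $1/(1-x)$ inside the radius and blow up outside it.

```python
# Partial sums of 1 + x + x^2 + ... inside and outside the radius |x| < 1.
for x in (0.5, 1.5):
    total, term = 0.0, 1.0
    for _ in range(30):
        total += term
        term *= x
    print(f"x={x}: 30-term partial sum = {total:.4f}, "
          f"1/(1-x) = {1 / (1 - x):.4f}")
```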

AD computes derivatives locally. It does not guarantee that low-order local approximations remain accurate globally.

### Taylor Expansions and Floating Point

Taylor coefficients are mathematically exact derivatives, but practical computation uses floating point arithmetic.

Higher-order derivatives can become numerically unstable:

| Issue | Effect |
|---|---|
| cancellation | loss of precision |
| large factorial scaling | overflow or underflow |
| repeated differentiation | amplification of rounding error |
| high-order tensors | large memory growth |

For this reason, many AD systems focus primarily on first-order reverse mode and selected second-order products rather than arbitrary high-order expansions.

### Relationship to Reverse Mode

Forward Taylor propagation generalizes naturally to higher orders. Reverse mode is much harder to extend, because reverse propagation already represents a transposed linearized computation.

Second-order reverse mode requires differentiating reverse computations themselves. The resulting systems involve nested pullbacks, higher-order adjoints, and large intermediate structures.

This is one reason higher-order forward techniques remain important even in machine-learning systems dominated by reverse mode.

### Conceptual View

Taylor expansions provide a unifying interpretation of differentiation.

A differentiable function can be viewed locally as a polynomial approximation:

$$
f(x+h) =
\text{constant term}
+
\text{linear term}
+
\text{quadratic term}
+
\cdots
$$

Automatic differentiation computes the coefficients of this local expansion algorithmically from the program structure.

Ordinary forward mode computes the linear term.

Higher-order AD computes additional coefficients.

Reverse mode computes transposed actions of these local linear structures.

In all cases, the program is treated as a composition of primitive operations, and local expansion rules are propagated through the computation.

