# Taylor Mode AD

Taylor mode automatic differentiation computes derivatives by propagating truncated Taylor series through a program.

Instead of tracking only values and first derivatives, Taylor mode tracks entire local polynomial expansions.

For a scalar function

$$
f(x),
$$

expanded around point $x_0$,

$$
f(x_0 + h) =
f(x_0)
+
f'(x_0)h
+
\frac{f''(x_0)}{2!}h^2
+
\frac{f^{(3)}(x_0)}{3!}h^3
+
\cdots
$$

Taylor mode computes these coefficients automatically.

This gives direct access to higher-order derivatives.

## Basic Idea

Forward mode propagates first-order tangent information:

$$
x + \epsilon v.
$$

Taylor mode propagates higher-order polynomial structure:

$$
x(t) =
x_0 + x_1 t + x_2 t^2 + \cdots + x_p t^p.
$$

Each coefficient represents derivative information.

Typically:

$$
x_k =
\frac{1}{k!}
\frac{d^k x}{dt^k}(0).
$$

The program is evaluated over truncated power series rather than over ordinary numbers.

## Truncated Polynomial Algebra

Taylor mode works in the algebra:

$$
\mathbb{R}[t]/(t^{p+1}),
$$

the ring of polynomials truncated at degree $p$.

A Taylor object has form:

$$
a_0 + a_1 t + a_2 t^2 + \cdots + a_p t^p.
$$

Addition is coefficient-wise:

$$
(a+b)_k = a_k + b_k.
$$

Multiplication uses convolution:

$$
(ab)_k =
\sum_{i=0}^k a_i b_{k-i}.
$$

Higher-order derivatives emerge from polynomial coefficient propagation.
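This algebra is small enough to implement directly. Below is a minimal Python sketch; the class name `Jet` is illustrative, not taken from any particular library:

```python
# Truncated polynomial ("jet") arithmetic in R[t]/(t^(p+1)).

class Jet:
    def __init__(self, coeffs):
        self.coeffs = list(coeffs)  # coeffs[k] is the coefficient of t^k

    def __add__(self, other):
        # Addition is coefficient-wise: (a+b)_k = a_k + b_k.
        return Jet(a + b for a, b in zip(self.coeffs, other.coeffs))

    def __mul__(self, other):
        # Multiplication is truncated convolution: (ab)_k = sum_i a_i b_{k-i}.
        p = len(self.coeffs)
        return Jet(
            sum(self.coeffs[i] * other.coeffs[k - i] for i in range(k + 1))
            for k in range(p)
        )

a = Jet([1.0, 2.0, 3.0])   # 1 + 2t + 3t^2
b = Jet([4.0, 5.0, 6.0])   # 4 + 5t + 6t^2
print((a * b).coeffs)      # [4.0, 13.0, 28.0]
```

Terms of degree higher than $p$ are simply never computed, which is what working modulo $t^{p+1}$ means operationally.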

## Example: Polynomial Function

Let

$$
f(x) = x^3.
$$

Suppose:

$$
x(t) = x_0 + x_1 t + x_2 t^2.
$$

Expand:

$$
f(x(t)) =
(x_0 + x_1 t + x_2 t^2)^3.
$$

Keeping terms up to degree 2:

$$ =
x_0^3
+
3x_0^2x_1 t
+
(3x_0^2x_2 + 3x_0x_1^2)t^2.
$$

The coefficients are:

| Degree | Coefficient |
|---|---|
| $t^0$ | $x_0^3$ |
| $t^1$ | $3x_0^2x_1$ |
| $t^2$ | $3x_0^2x_2 + 3x_0x_1^2$ |

The second-order term contains the quadratic interaction naturally.

No explicit Hessian rules are required. The polynomial algebra produces the correct structure automatically.
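The table above can be checked numerically with nothing but truncated convolution; this sketch cubes a degree-2 series and compares against the hand-derived coefficients:

```python
def mul(a, b):
    # Truncated convolution: keep coefficients up to the input degree.
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(len(a))]

x0, x1, x2 = 2.0, 1.0, 0.5
x = [x0, x1, x2]                  # x(t) = x0 + x1 t + x2 t^2
cube = mul(mul(x, x), x)          # f(x(t)) = x(t)^3, truncated at degree 2
expected = [x0**3, 3*x0**2*x1, 3*x0**2*x2 + 3*x0*x1**2]
print(cube, expected)             # both [8.0, 12.0, 12.0]
```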

## Recovering Derivatives

If we choose:

$$
x(t) = x_0 + t,
$$

then:

$$
f(x_0+t) =
\sum_{k=0}^p
\frac{f^{(k)}(x_0)}{k!} t^k.
$$

Therefore:

| Coefficient | Meaning |
|---|---|
| $t^0$ | function value |
| $t^1$ | first derivative |
| $t^2$ | second derivative divided by $2!$ |
| $t^3$ | third derivative divided by $3!$ |

and so on.

Taylor mode computes all derivatives up to order $p$ simultaneously.
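A short Python sketch of this recipe, using $f(x) = x^3$ so that no elementary-function rules are needed: seed $x(t) = x_0 + t$, propagate, and rescale coefficient $k$ by $k!$:

```python
import math

def mul(a, b):
    # Truncated convolution of two coefficient lists.
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(len(a))]

# Seed x(t) = x0 + t, truncated at order p = 3, pushed through f(x) = x^3.
x0, p = 2.0, 3
x = [x0, 1.0] + [0.0] * (p - 1)
f = mul(mul(x, x), x)

# Coefficient k equals f^(k)(x0) / k!, so multiply by k! to recover derivatives.
derivs = [math.factorial(k) * c for k, c in enumerate(f)]
print(derivs)   # f(2)=8, f'(2)=12, f''(2)=12, f'''(2)=6
```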

## Comparison with Nested Forward Mode

Nested forward mode computes higher derivatives by nesting dual numbers repeatedly.

Taylor mode computes all orders together in one algebraic object.

| Method | Representation |
|---|---|
| nested duals | repeated first-order structure |
| Taylor mode | truncated polynomial |
| hyper-duals | structured second-order dual algebra |
| symbolic differentiation | explicit algebraic manipulation |

Taylor mode avoids some nesting complexity and perturbation tagging issues.

## Elementary Functions

Taylor mode requires propagation rules for elementary functions.

Suppose:

$$
y(t) = \exp(x(t)).
$$

If

$$
x(t) = \sum_{k=0}^p x_k t^k,
$$

then

$$
y(t) = \sum_{k=0}^p y_k t^k
$$

must satisfy:

$$
y'(t) = y(t)x'(t).
$$

Matching coefficients gives recurrence relations for the Taylor coefficients.

Similarly:

| Function | Defining relation |
|---|---|
| $\exp x$ | $y' = yx'$ |
| $\sin x$ | $y' = \cos(x)x'$ |
| $\cos x$ | $y' = -\sin(x)x'$ |
| $\log x$ | $y'x = x'$ |

Taylor mode implementations derive coefficient recurrences from these identities.
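For $\exp$, matching the $t^{k-1}$ coefficients of $y' = yx'$ gives $k\,y_k = \sum_{j=1}^{k} j\,x_j\,y_{k-j}$. A minimal sketch (the function name `taylor_exp` is illustrative):

```python
import math

def taylor_exp(x):
    # Coefficients of y(t) = exp(x(t)) from the ODE y' = y x':
    # matching t^(k-1) terms gives k*y_k = sum_{j=1}^{k} j * x_j * y_{k-j}.
    p = len(x) - 1
    y = [0.0] * (p + 1)
    y[0] = math.exp(x[0])
    for k in range(1, p + 1):
        y[k] = sum(j * x[j] * y[k - j] for j in range(1, k + 1)) / k
    return y

# Seed x(t) = t (so x0 = 0): the coefficients of exp(t) should be 1/k!.
coeffs = taylor_exp([0.0, 1.0, 0.0, 0.0, 0.0])
print(coeffs)
```

The same pattern, one initial value plus one coefficient recurrence, works for $\sin$, $\cos$, and $\log$ from their defining relations in the table.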

## Recurrence Computation

Taylor coefficients are usually computed recursively.

Suppose:

$$
x(t) = \sum_{k=0}^p x_k t^k.
$$

For multiplication:

$$
z(t) = x(t)y(t),
$$

the coefficient recurrence is:

$$
z_k =
\sum_{i=0}^k x_i y_{k-i}.
$$

This is discrete convolution.

For division:

$$
z(t) = \frac{x(t)}{y(t)},
$$

coefficients satisfy:

$$
x_k =
\sum_{i=0}^k y_i z_{k-i}.
$$

Solving recursively for $z_k$, assuming $y_0 \neq 0$, gives

$$
z_k =
\frac{1}{y_0}
\left(
x_k - \sum_{i=1}^k y_i z_{k-i}
\right).
$$
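A Python sketch of the division recurrence (`taylor_div` is an illustrative name):

```python
def taylor_div(x, y):
    # Solve x_k = sum_{i=0}^{k} y_i z_{k-i} for z_k, assuming y_0 != 0:
    # z_k = (x_k - sum_{i=1}^{k} y_i z_{k-i}) / y_0.
    p = len(x)
    z = [0.0] * p
    for k in range(p):
        z[k] = (x[k] - sum(y[i] * z[k - i] for i in range(1, k + 1))) / y[0]
    return z

# 1 / (1 - t) = 1 + t + t^2 + ... truncated at degree 3.
print(taylor_div([1.0, 0.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]))
# [1.0, 1.0, 1.0, 1.0]
```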

Taylor mode therefore becomes a coefficient propagation system.

## Directional Higher Derivatives

Taylor mode naturally computes directional higher derivatives.

Suppose:

$$
x(t) = x_0 + tv.
$$

Then:

$$
f(x_0 + tv) =
\sum_{k=0}^p
\frac{1}{k!}
D^k f(x_0)[v,\ldots,v] t^k.
$$

The coefficients correspond to repeated directional derivatives.

This is valuable because full higher-order derivative tensors are usually too large to materialize.

Taylor mode computes directional expansions efficiently.

## Example: Second Directional Derivative

Let

$$
f(x,y)=x^2y.
$$

Choose direction:

$$
v=(v_1,v_2).
$$

Define:

$$
x(t)=x+tv_1,
\quad
y(t)=y+tv_2.
$$

Then:

$$
f(x(t),y(t)) =
(x+tv_1)^2(y+tv_2).
$$

Expand:

$$ =
x^2y
+
(2xyv_1 + x^2v_2)t
+
(yv_1^2 + 2xv_1v_2)t^2
+
v_1^2v_2 t^3.
$$

The coefficient of $t^2$ gives:

$$
\frac{1}{2}
D^2f(x,y)[v,v].
$$

Taylor mode extracts this automatically.
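The extraction can be checked numerically: seed each input with its direction component, propagate by convolution, and compare the $t^2$ coefficient against the closed form $y v_1^2 + 2x v_1 v_2$ (the point and direction below are arbitrary test values):

```python
def mul(a, b):
    # Truncated convolution of two coefficient lists.
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(len(a))]

# f(x, y) = x^2 * y at (x, y) = (1, 2) in direction v = (3, 0.5).
x, y, v1, v2 = 1.0, 2.0, 3.0, 0.5
p = 3
xt = [x, v1] + [0.0] * (p - 1)      # x(t) = x + t v1
yt = [y, v2] + [0.0] * (p - 1)      # y(t) = y + t v2
f = mul(mul(xt, xt), yt)

# The t^2 coefficient equals (1/2) D^2 f[v, v] = y v1^2 + 2 x v1 v2.
print(f[2], y * v1**2 + 2 * x * v1 * v2)   # both 21.0
```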

## Complexity

Suppose:

| Symbol | Meaning |
|---|---|
| $C$ | cost of evaluating original function |
| $p$ | Taylor order |

Then Taylor mode often costs roughly:

$$
O(p^2 C)
$$

because coefficient propagation involves convolution-like operations across orders.

Optimized implementations can reduce this cost using FFT-based convolution or specialized recurrences.

Still, high-order derivatives become expensive quickly.

## Memory Structure

Taylor mode stores coefficients for every intermediate variable.

If the original program has $N$ intermediates and Taylor order $p$, storage is roughly:

$$
O(pN).
$$

This is usually better than explicitly materializing high-order derivative tensors.

The representation is compact because it stores directional polynomial structure rather than full multidimensional derivative arrays.

## Taylor Mode vs Reverse Mode

Taylor mode and reverse mode have different strengths.

| Property | Taylor mode | Reverse mode |
|---|---|---|
| gradients | moderate | excellent |
| high-order directional derivatives | excellent | difficult |
| full Hessians | possible | possible |
| high-order tensors | structured | memory-heavy |
| implementation complexity | algebraic | tape-heavy |
| nested differentiation | simpler | harder |
| scalar-output optimization | weaker | strong |

Taylor mode is particularly attractive for moderate-order derivatives and local series expansions.

## Taylor Models

Taylor mode connects naturally to Taylor models used in validated numerics.

A Taylor model represents a function as:

$$
P(x) + R(x),
$$

where:

| Component | Meaning |
|---|---|
| $P(x)$ | polynomial approximation |
| $R(x)$ | rigorous remainder bound |

This allows:

| Capability | Purpose |
|---|---|
| interval arithmetic | rigorous bounds |
| verified ODE solving | guaranteed solutions |
| uncertainty propagation | bounded error |
| validated optimization | correctness guarantees |

Taylor-mode AD provides the polynomial component automatically.

## Differential Equation Solvers

Taylor mode is especially useful for ODE solvers.

Suppose:

$$
\frac{dx}{dt} = f(x).
$$

A Taylor-series integrator computes:

$$
x(t+h) =
x(t)
+
hx'(t)
+
\frac{h^2}{2!}x''(t)
+
\cdots
$$

Automatic differentiation computes the required derivatives recursively.

This yields high-order integrators with strong local accuracy.
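A minimal sketch of one such step for the scalar ODE $x' = x$, where the coefficient recurrence is simply $x_{k+1} = x_k/(k+1)$ (the function name `taylor_step` is illustrative):

```python
import math

def taylor_step(x0, h, p):
    # Taylor-series step for x' = x: the ODE gives the recurrence
    # x_{k+1} = x_k / (k + 1), matching coefficients of x'(t) = x(t).
    c = [x0]
    for k in range(p):
        c.append(c[k] / (k + 1))
    # Evaluate the truncated series at t = h using Horner's rule.
    acc = 0.0
    for coef in reversed(c):
        acc = acc * h + coef
    return acc

# One step of size 0.1 from x(0) = 1 approximates e^0.1.
print(taylor_step(1.0, 0.1, 8), math.exp(0.1))
```

For a general right-hand side $f$, the recurrence for the next coefficient comes from propagating the current truncated series through $f$ in Taylor mode.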

Taylor integrators are widely used in celestial mechanics, validated numerics, and stiff dynamical systems.

## Sparse Higher-Order Structure

Full higher-order derivative tensors are enormous.

For dimension $n$:

| Order | Tensor size |
|---|---:|
| 1 | $n$ |
| 2 | $n^2$ |
| 3 | $n^3$ |
| 4 | $n^4$ |

Taylor mode avoids constructing these directly.

Instead it propagates compressed directional structure through coefficient series.

This is often the only practical way to obtain high-order derivative information.

## Compiler Representation

A compiler-based Taylor-mode system may lower each variable into coefficient arrays:

```text
x[0] = primal
x[1] = first-order coefficient
x[2] = second-order coefficient
...
x[p] = pth-order coefficient
```

Operations become coefficient recurrences.

For multiplication:

```text
for k in 0..p:
    z[k] = sum(x[i] * y[k-i] for i in 0..k)
```

The transformed program resembles polynomial arithmetic.

## Numerical Stability

High-order Taylor coefficients can become unstable.

Common problems include:

| Problem | Cause |
|---|---|
| coefficient explosion | factorial growth |
| cancellation | alternating signs |
| truncation instability | insufficient order |
| floating point amplification | repeated convolutions |
| stiff dynamics | rapidly varying derivatives |

Scaling and normalization techniques are often required.

## Practical Uses

Taylor mode appears in:

| Domain | Usage |
|---|---|
| celestial mechanics | high-order integration |
| validated numerics | rigorous bounds |
| dynamical systems | local flow expansion |
| chaos analysis | sensitivity growth |
| PDE solvers | local expansions |
| perturbation methods | asymptotic analysis |
| scientific computing | repeated directional derivatives |

It is less common in large neural network training because reverse-mode gradients dominate that workload.

## Conceptual Interpretation

Forward mode propagates tangent vectors.

Taylor mode propagates local polynomial geometry.

Instead of asking:

*How does the function change infinitesimally?*

Taylor mode asks:

*What local power series does the function induce?*

The program becomes a transformation over truncated polynomial algebras rather than over ordinary scalar values.

