Hyper-Dual Numbers

Dual numbers compute first derivatives exactly. Truncated polynomial algebras extend this to higher-order derivatives, but practical higher-order differentiation introduces an important problem: extracting second derivatives accurately without symbolic expansion or numerical cancellation.

Hyper-dual numbers solve this problem by introducing multiple nilpotent infinitesimal directions whose mixed products survive.

They provide an exact algebraic mechanism for computing:

  • second derivatives
  • mixed partial derivatives
  • Hessians

without finite differences and without truncation error.

Motivation

Ordinary dual numbers satisfy:

$$\varepsilon^2 = 0.$$

Evaluating

$$f(x+\varepsilon)$$

produces:

$$f(x)+f'(x)\varepsilon.$$

Only first-order information survives.

To recover second derivatives, one possibility is nested dual numbers or truncated polynomial algebras. However, those approaches may:

  • increase implementation complexity
  • require managing higher polynomial coefficients
  • introduce perturbation confusion in nested systems

Hyper-dual numbers provide a cleaner construction for exact second-order differentiation.

The Hyper-Dual Algebra

Introduce two independent infinitesimal generators:

$$\varepsilon_1,\varepsilon_2.$$

Require:

$$\varepsilon_1^2 = 0, \qquad \varepsilon_2^2 = 0.$$

But preserve the mixed product:

$$\varepsilon_1\varepsilon_2 \neq 0.$$

Also:

$$(\varepsilon_1\varepsilon_2)^2 = 0.$$

A hyper-dual number has the form:

$$a + b\varepsilon_1 + c\varepsilon_2 + d\varepsilon_1\varepsilon_2.$$

This algebra stores:

| Component | Meaning |
| --- | --- |
| $a$ | primal value |
| $b$ | first derivative in direction 1 |
| $c$ | first derivative in direction 2 |
| $d$ | mixed second derivative |

Why Mixed Products Matter

The key idea is that:

$$(\varepsilon_1+\varepsilon_2)^2 = 2\varepsilon_1\varepsilon_2.$$

The square does not vanish completely because cross terms survive.

This allows second-order information to appear algebraically.
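
Written out term by term, and assuming the two generators commute (as they do in the polynomial quotient given later in this article), the square and the cube are:

$$
(\varepsilon_1+\varepsilon_2)^2
= \varepsilon_1^2 + \varepsilon_1\varepsilon_2 + \varepsilon_2\varepsilon_1 + \varepsilon_2^2
= 2\varepsilon_1\varepsilon_2,
\qquad
(\varepsilon_1+\varepsilon_2)^3
= 2\varepsilon_1\varepsilon_2(\varepsilon_1+\varepsilon_2)
= 2\varepsilon_1^2\varepsilon_2 + 2\varepsilon_1\varepsilon_2^2
= 0.
$$

So exactly one second-order term survives, and nothing of third order or higher does.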

Taylor Expansion

For a smooth scalar function:

$$f(x+h),$$

the second-order Taylor expansion is:

$$f(x+h) = f(x) + f'(x)h + \frac12 f''(x)h^2 + O(h^3).$$

Now substitute:

$$h = a\varepsilon_1 + b\varepsilon_2.$$

Since:

$$\varepsilon_1^2 = \varepsilon_2^2 = 0,$$

the square becomes:

$$h^2 = 2ab\varepsilon_1\varepsilon_2,$$

and every higher power of $h$ vanishes, so the expansion terminates.

Thus:

$$f(x+h) = f(x) + f'(x)(a\varepsilon_1+b\varepsilon_2) + f''(x)ab\varepsilon_1\varepsilon_2.$$

The coefficient of:

$$\varepsilon_1\varepsilon_2$$

is $f''(x)\,ab$, which is exactly the second derivative when $a = b = 1$.
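
The same expansion gives a recipe for lifting any scalar function whose first and second derivatives are known. The following is a minimal sketch, not a definitive implementation: the helper names `lift`, `Sin`, and `Exp` are assumptions made for illustration, and the struct simply mirrors the four coefficients of a hyper-dual number.

```go
package main

import (
	"fmt"
	"math"
)

// HyperDual holds a + b*e1 + c*e2 + d*e1*e2.
type HyperDual struct{ Val, D1, D2, D12 float64 }

// lift applies a scalar function to x, given f0 = f(a), f1 = f'(a) and
// f2 = f''(a) at the primal value a. It is the Taylor expansion with
// h = b*e1 + c*e2 + d*e1*e2, where h*h = 2*b*c*e1*e2 and h*h*h = 0.
func lift(x HyperDual, f0, f1, f2 float64) HyperDual {
	return HyperDual{
		Val: f0,
		D1:  f1 * x.D1,
		D2:  f1 * x.D2,
		D12: f1*x.D12 + f2*x.D1*x.D2,
	}
}

// Sin uses sin' = cos and sin'' = -sin.
func Sin(x HyperDual) HyperDual {
	s, c := math.Sincos(x.Val)
	return lift(x, s, c, -s)
}

// Exp uses the fact that every derivative of exp is exp.
func Exp(x HyperDual) HyperDual {
	e := math.Exp(x.Val)
	return lift(x, e, e, e)
}

func main() {
	// Seeding both directions with 1 makes D12 carry f''(x).
	x := HyperDual{Val: 0.5, D1: 1, D2: 1}
	y := Sin(x)
	fmt.Println(y.Val, y.D1, y.D12) // sin(0.5), cos(0.5), -sin(0.5)
}
```

The `f1*x.D12` term matters once functions are composed: an input that already carries a mixed coefficient passes it through the first derivative, which is the second-order chain rule.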

Example

Let:

$$f(x)=x^3.$$

Use the hyper-dual input:

$$x+\varepsilon_1+\varepsilon_2.$$

Expand:

$$(x+\varepsilon_1+\varepsilon_2)^3.$$

First compute:

$$(x+h)^3 = x^3 + 3x^2h + 3xh^2 + h^3.$$

Since:

$$h=\varepsilon_1+\varepsilon_2,$$

and:

$$h^2 = 2\varepsilon_1\varepsilon_2,$$

while:

$$h^3=0,$$

we obtain:

$$x^3 + 3x^2(\varepsilon_1+\varepsilon_2) + 6x\varepsilon_1\varepsilon_2.$$

Thus:

| Coefficient | Value |
| --- | --- |
| $1$ | $x^3$ |
| $\varepsilon_1$ | $3x^2$ |
| $\varepsilon_2$ | $3x^2$ |
| $\varepsilon_1\varepsilon_2$ | $6x$ |

Since:

$$f''(x)=6x,$$

the mixed coefficient gives the exact second derivative.
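
The hand expansion can be checked numerically. The sketch below evaluates the cube at $x = 2$ using nothing but hyper-dual multiplication; `Cube` is a hypothetical helper, and the product rule is the one implied by $\varepsilon_1^2 = \varepsilon_2^2 = 0$.

```go
package main

import "fmt"

// HyperDual holds a + b*e1 + c*e2 + d*e1*e2.
type HyperDual struct{ Val, D1, D2, D12 float64 }

// Mul multiplies two hyper-dual numbers, dropping e1^2 and e2^2 terms.
func Mul(x, y HyperDual) HyperDual {
	return HyperDual{
		Val: x.Val * y.Val,
		D1:  x.D1*y.Val + x.Val*y.D1,
		D2:  x.D2*y.Val + x.Val*y.D2,
		D12: x.D12*y.Val + x.D1*y.D2 + x.D2*y.D1 + x.Val*y.D12,
	}
}

// Cube computes x*x*x with two multiplications.
func Cube(x HyperDual) HyperDual { return Mul(Mul(x, x), x) }

func main() {
	x := HyperDual{Val: 2, D1: 1, D2: 1} // x + e1 + e2 at x = 2
	y := Cube(x)
	// Expect x^3 = 8, 3x^2 = 12 (twice), and 6x = 12.
	fmt.Println(y.Val, y.D1, y.D2, y.D12) // 8 12 12 12
}
```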

Multivariable Functions

Hyper-dual numbers naturally extend to multivariate functions.

Suppose:

$$f : \mathbb{R}^n \to \mathbb{R}.$$

Choose two perturbation directions:

$$u,v \in \mathbb{R}^n.$$

Evaluate:

$$x + u\varepsilon_1 + v\varepsilon_2.$$

Then:

$$f(x+u\varepsilon_1+v\varepsilon_2)$$

expands to:

$$f(x) + Df_x(u)\varepsilon_1 + Df_x(v)\varepsilon_2 + u^T H_x v \, \varepsilon_1\varepsilon_2.$$

The mixed coefficient gives the Hessian bilinear form:

$$u^T H_x v.$$

This computes exact second-order directional derivatives.
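
As a rough illustration of the seeding step, the sketch below computes $u^T H_x v$ in a single evaluation. The function `f` and the helper `Bilinear` are hypothetical names chosen for this example, and the test function $f(x_0, x_1) = x_0^2 x_1 + x_1^3$ is an assumption, picked because its Hessian is easy to write by hand.

```go
package main

import "fmt"

// HyperDual holds a + b*e1 + c*e2 + d*e1*e2.
type HyperDual struct{ Val, D1, D2, D12 float64 }

func Add(x, y HyperDual) HyperDual {
	return HyperDual{x.Val + y.Val, x.D1 + y.D1, x.D2 + y.D2, x.D12 + y.D12}
}

func Mul(x, y HyperDual) HyperDual {
	return HyperDual{
		Val: x.Val * y.Val,
		D1:  x.D1*y.Val + x.Val*y.D1,
		D2:  x.D2*y.Val + x.Val*y.D2,
		D12: x.D12*y.Val + x.D1*y.D2 + x.D2*y.D1 + x.Val*y.D12,
	}
}

// f is a hypothetical test function: f(x0, x1) = x0^2*x1 + x1^3.
func f(x []HyperDual) HyperDual {
	return Add(Mul(Mul(x[0], x[0]), x[1]), Mul(Mul(x[1], x[1]), x[1]))
}

// Bilinear returns u^T H v at x by evaluating fn once at x + u*e1 + v*e2.
func Bilinear(fn func([]HyperDual) HyperDual, x, u, v []float64) float64 {
	in := make([]HyperDual, len(x))
	for i := range x {
		in[i] = HyperDual{Val: x[i], D1: u[i], D2: v[i]}
	}
	return fn(in).D12
}

func main() {
	// At (1, 2) the Hessian is [[4, 2], [2, 12]], so for
	// u = (1, 1) and v = (1, -1) the bilinear form is -8.
	fmt.Println(Bilinear(f, []float64{1, 2}, []float64{1, 1}, []float64{1, -1})) // -8
}
```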

Hessian Extraction

To compute a Hessian entry:

$$\frac{\partial^2 f}{\partial x_i \partial x_j},$$

seed:

$$u=e_i, \quad v=e_j.$$

Then the coefficient of:

$$\varepsilon_1\varepsilon_2$$

is exactly:

$$H_{ij}.$$

Repeated evaluation recovers the full Hessian matrix.
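
One way to organise the repeated evaluations is a double loop over the upper triangle, mirroring each entry across the diagonal. This is a sketch under stated assumptions: `Hessian` and the test function `f` are hypothetical, and the mirroring relies on the Hessian being symmetric, which holds for twice continuously differentiable functions.

```go
package main

import "fmt"

// HyperDual holds a + b*e1 + c*e2 + d*e1*e2.
type HyperDual struct{ Val, D1, D2, D12 float64 }

func Add(x, y HyperDual) HyperDual {
	return HyperDual{x.Val + y.Val, x.D1 + y.D1, x.D2 + y.D2, x.D12 + y.D12}
}

func Mul(x, y HyperDual) HyperDual {
	return HyperDual{
		Val: x.Val * y.Val,
		D1:  x.D1*y.Val + x.Val*y.D1,
		D2:  x.D2*y.Val + x.Val*y.D2,
		D12: x.D12*y.Val + x.D1*y.D2 + x.D2*y.D1 + x.Val*y.D12,
	}
}

// f is a hypothetical test function: f(x0, x1) = x0^2*x1 + x1^3.
func f(x []HyperDual) HyperDual {
	return Add(Mul(Mul(x[0], x[0]), x[1]), Mul(Mul(x[1], x[1]), x[1]))
}

// Hessian fills the full matrix with n(n+1)/2 hyper-dual evaluations.
func Hessian(fn func([]HyperDual) HyperDual, x []float64) [][]float64 {
	n := len(x)
	h := make([][]float64, n)
	for i := range h {
		h[i] = make([]float64, n)
	}
	in := make([]HyperDual, n)
	for i := 0; i < n; i++ {
		for j := i; j < n; j++ {
			for k := range in {
				in[k] = HyperDual{Val: x[k]} // reset all seeds
			}
			in[i].D1 = 1 // u = e_i
			in[j].D2 = 1 // v = e_j
			h[i][j] = fn(in).D12
			h[j][i] = h[i][j] // symmetry
		}
	}
	return h
}

func main() {
	fmt.Println(Hessian(f, []float64{1, 2})) // [[4 2] [2 12]]
}
```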

Example: Two Variables

Let:

$$f(x,y)=x^2y+\sin(xy).$$

Choose perturbations:

$$x \mapsto x+\varepsilon_1, \qquad y \mapsto y+\varepsilon_2.$$

Then the product becomes:

$$(x+\varepsilon_1)(y+\varepsilon_2) = xy + y\varepsilon_1 + x\varepsilon_2 + \varepsilon_1\varepsilon_2.$$

Mixed terms appear automatically.

Expanding the entire function produces a coefficient of:

$$\varepsilon_1\varepsilon_2,$$

which equals:

$$\frac{\partial^2 f}{\partial x\partial y}.$$

No symbolic differentiation is needed.
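
To make this concrete, the sketch below evaluates the example function on hyper-dual inputs and compares the mixed coefficient with the hand-derived value $2x + \cos(xy) - xy\sin(xy)$, which appears only as a cross-check. The names `MixedPartial` and `Analytic` are assumptions introduced for this illustration.

```go
package main

import (
	"fmt"
	"math"
)

// HyperDual holds a + b*e1 + c*e2 + d*e1*e2.
type HyperDual struct{ Val, D1, D2, D12 float64 }

func Add(x, y HyperDual) HyperDual {
	return HyperDual{x.Val + y.Val, x.D1 + y.D1, x.D2 + y.D2, x.D12 + y.D12}
}

func Mul(x, y HyperDual) HyperDual {
	return HyperDual{
		Val: x.Val * y.Val,
		D1:  x.D1*y.Val + x.Val*y.D1,
		D2:  x.D2*y.Val + x.Val*y.D2,
		D12: x.D12*y.Val + x.D1*y.D2 + x.D2*y.D1 + x.Val*y.D12,
	}
}

// Sin lifts sine using sin' = cos and sin'' = -sin.
func Sin(x HyperDual) HyperDual {
	s, c := math.Sincos(x.Val)
	return HyperDual{s, c * x.D1, c * x.D2, c*x.D12 - s*x.D1*x.D2}
}

// MixedPartial evaluates f(x, y) = x^2*y + sin(x*y) at (x + e1, y + e2)
// and returns the e1*e2 coefficient.
func MixedPartial(x, y float64) float64 {
	X := HyperDual{Val: x, D1: 1}
	Y := HyperDual{Val: y, D2: 1}
	XY := Mul(X, Y)
	return Add(Mul(X, XY), Sin(XY)).D12
}

// Analytic is the hand-derived mixed partial, used only for comparison.
func Analytic(x, y float64) float64 {
	return 2*x + math.Cos(x*y) - x*y*math.Sin(x*y)
}

func main() {
	fmt.Println(MixedPartial(1, 2), Analytic(1, 2))
}
```

The two printed values should agree up to rounding, since both are evaluated in floating point with slightly different operation orders.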

Exactness

Hyper-dual differentiation is exact up to floating-point arithmetic.

The main approaches differ in where their error or cost comes from:

| Method | Error Source |
| --- | --- |
| Finite differences | truncation + cancellation |
| Symbolic differentiation | expression explosion |
| Hyper-dual numbers | floating-point only |

No step size is required.

No subtractive cancellation between nearly equal function values occurs.

The derivative structure emerges algebraically.
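
The contrast with finite differences can be observed directly. The sketch below estimates the second derivative of $\exp$ at $x = 1$ with a central second difference and with a hyper-dual evaluation; the step sizes are arbitrary choices, and the smallest one is deliberately tiny to expose cancellation. All function names are assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
)

// HyperDual holds a + b*e1 + c*e2 + d*e1*e2.
type HyperDual struct{ Val, D1, D2, D12 float64 }

// Exp lifts the exponential: every derivative of exp is exp itself.
func Exp(x HyperDual) HyperDual {
	e := math.Exp(x.Val)
	return HyperDual{e, e * x.D1, e * x.D2, e*x.D12 + e*x.D1*x.D2}
}

// SecondFD is the central second difference (f(x+h) - 2f(x) + f(x-h)) / h^2.
func SecondFD(f func(float64) float64, x, h float64) float64 {
	return (f(x+h) - 2*f(x) + f(x-h)) / (h * h)
}

// SecondHD reads exp''(x) from the e1*e2 coefficient.
func SecondHD(x float64) float64 {
	return Exp(HyperDual{Val: x, D1: 1, D2: 1}).D12
}

// Errors returns the absolute errors of both estimates of exp''(x) = exp(x).
func Errors(x, h float64) (fdErr, hdErr float64) {
	want := math.Exp(x)
	return math.Abs(SecondFD(math.Exp, x, h) - want), math.Abs(SecondHD(x) - want)
}

func main() {
	for _, h := range []float64{1e-3, 1e-5, 1e-8} {
		fd, hd := Errors(1, h)
		fmt.Printf("h=%g  finite-difference error=%.3g  hyper-dual error=%.3g\n", h, fd, hd)
	}
}
```

The finite-difference error depends strongly on the step: too large a step leaves truncation error, while too small a step leaves only rounding noise in the numerator. The hyper-dual estimate has no step to tune.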

Algebraic Structure

The hyper-dual algebra can be written:

$$\mathbb{R}[\varepsilon_1,\varepsilon_2] / (\varepsilon_1^2,\varepsilon_2^2).$$

Basis elements are:

$$1, \varepsilon_1, \varepsilon_2, \varepsilon_1\varepsilon_2.$$

Dimension is four.

Multiplication rules:

| Product | Result |
| --- | --- |
| $\varepsilon_1^2$ | $0$ |
| $\varepsilon_2^2$ | $0$ |
| $\varepsilon_1\varepsilon_2$ | survives |
| $(\varepsilon_1\varepsilon_2)^2$ | $0$ |

This carefully chosen nilpotent structure isolates second-order interactions.

Computational Interpretation

A hyper-dual number may be represented as:

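```go
// HyperDual represents a + b*ε1 + c*ε2 + d*ε1ε2.
type HyperDual struct {
	Val float64 // primal value
	D1  float64 // coefficient of ε1
	D2  float64 // coefficient of ε2
	D12 float64 // coefficient of ε1ε2
}
```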

Components represent:

| Field | Meaning |
| --- | --- |
| `Val` | primal value |
| `D1` | first derivative along direction 1 |
| `D2` | first derivative along direction 2 |
| `D12` | mixed second derivative |

Multiplication Rule

Suppose:

$$x=(a,b,c,d)$$

and

$$y=(p,q,r,s).$$

Then multiplication becomes:

$$xy = (ap,\; aq+bp,\; ar+cp,\; as+br+cq+dp).$$

The mixed term obeys the second-order product rule automatically.
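
For reference, the rule follows from expanding the product and discarding every term that contains $\varepsilon_1^2$ or $\varepsilon_2^2$:

$$
\begin{aligned}
(a + b\varepsilon_1 + c\varepsilon_2 + d\varepsilon_1\varepsilon_2)
(p + q\varepsilon_1 + r\varepsilon_2 + s\varepsilon_1\varepsilon_2)
&= ap + (aq + bp)\varepsilon_1 + (ar + cp)\varepsilon_2 \\
&\quad + (as + br + cq + dp)\varepsilon_1\varepsilon_2 .
\end{aligned}
$$

The discarded terms are $bq\,\varepsilon_1^2$, $cr\,\varepsilon_2^2$, and the products of $\varepsilon_1\varepsilon_2$ with $\varepsilon_1$, $\varepsilon_2$, or itself, all of which are zero. The two cross terms $br$ and $cq$ are the products of first derivatives that appear in $(fg)'' = f''g + 2f'g' + fg''$ when both directions coincide.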

Example Implementation

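```go
// Mul multiplies two hyper-dual numbers, dropping every term that
// contains ε1² or ε2².
func Mul(x, y HyperDual) HyperDual {
	return HyperDual{
		Val: x.Val * y.Val,
		D1:  x.D1*y.Val + x.Val*y.D1,
		D2:  x.D2*y.Val + x.Val*y.D2,
		// Second-order product rule: both cross terms survive.
		D12: x.D12*y.Val + x.D1*y.D2 + x.D2*y.D1 + x.Val*y.D12,
	}
}
```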

The D12 component contains all mixed second-order interactions.
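
Multiplication alone covers only polynomial expressions. The remaining arithmetic can be sketched in the same style; `Add`, `Recip`, and `Div` below are hypothetical helpers, not part of the original listing, and `Recip` assumes a nonzero primal value.

```go
package main

import "fmt"

// HyperDual holds a + b*e1 + c*e2 + d*e1*e2.
type HyperDual struct{ Val, D1, D2, D12 float64 }

func Mul(x, y HyperDual) HyperDual {
	return HyperDual{
		Val: x.Val * y.Val,
		D1:  x.D1*y.Val + x.Val*y.D1,
		D2:  x.D2*y.Val + x.Val*y.D2,
		D12: x.D12*y.Val + x.D1*y.D2 + x.D2*y.D1 + x.Val*y.D12,
	}
}

// Add sums componentwise, because addition is linear.
func Add(x, y HyperDual) HyperDual {
	return HyperDual{x.Val + y.Val, x.D1 + y.D1, x.D2 + y.D2, x.D12 + y.D12}
}

// Recip computes 1/x from g(a) = 1/a, g'(a) = -1/a^2, g''(a) = 2/a^3.
// It assumes x.Val != 0.
func Recip(x HyperDual) HyperDual {
	g := 1 / x.Val
	g1 := -g * g
	g2 := 2 * g * g * g
	return HyperDual{g, g1 * x.D1, g1 * x.D2, g1*x.D12 + g2*x.D1*x.D2}
}

// Div computes x / y as x * (1/y).
func Div(x, y HyperDual) HyperDual { return Mul(x, Recip(y)) }

func main() {
	x := HyperDual{Val: 2, D1: 1, D2: 1}
	// x/x is the constant 1, so every derivative part is zero.
	fmt.Println(Div(x, x)) // {1 0 0 0}
}
```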

Relation to Hessian-Vector Products

Hyper-dual numbers compute second-order directional derivatives naturally.

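To obtain the scalar

$$u^T H v,$$

evaluate $f$ at:

$$x + u\varepsilon_1 + v\varepsilon_2.$$

The coefficient of:

$$\varepsilon_1\varepsilon_2$$

is the result. Seeding $u = e_i$ for each $i$ in turn yields the components of the Hessian-vector product $Hv$, one per pass.

This avoids explicit Hessian construction.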

For large systems, Hessian-vector products are often preferable to dense Hessians.
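
A single pass yields the scalar $u^T H v$; a full product $Hv$ can be assembled from $n$ passes by taking $u = e_i$ in turn. The sketch below shows that loop. `HessVec` and the test function `f` are hypothetical names, and this is one straightforward arrangement rather than the only one.

```go
package main

import "fmt"

// HyperDual holds a + b*e1 + c*e2 + d*e1*e2.
type HyperDual struct{ Val, D1, D2, D12 float64 }

func Add(x, y HyperDual) HyperDual {
	return HyperDual{x.Val + y.Val, x.D1 + y.D1, x.D2 + y.D2, x.D12 + y.D12}
}

func Mul(x, y HyperDual) HyperDual {
	return HyperDual{
		Val: x.Val * y.Val,
		D1:  x.D1*y.Val + x.Val*y.D1,
		D2:  x.D2*y.Val + x.Val*y.D2,
		D12: x.D12*y.Val + x.D1*y.D2 + x.D2*y.D1 + x.Val*y.D12,
	}
}

// f is a hypothetical test function: f(x0, x1) = x0^2*x1 + x1^3.
func f(x []HyperDual) HyperDual {
	return Add(Mul(Mul(x[0], x[0]), x[1]), Mul(Mul(x[1], x[1]), x[1]))
}

// HessVec returns H*v at x using one hyper-dual pass per component:
// component i is e_i^T H v, read from the e1*e2 coefficient.
func HessVec(fn func([]HyperDual) HyperDual, x, v []float64) []float64 {
	out := make([]float64, len(x))
	in := make([]HyperDual, len(x))
	for i := range x {
		for k := range x {
			in[k] = HyperDual{Val: x[k], D2: v[k]} // direction 2 carries v
		}
		in[i].D1 = 1 // direction 1 is e_i
		out[i] = fn(in).D12
	}
	return out
}

func main() {
	// H at (1, 2) is [[4, 2], [2, 12]]; with v = (1, -1), H*v = (2, -10).
	fmt.Println(HessVec(f, []float64{1, 2}, []float64{1, -1})) // [2 -10]
}
```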

Perturbation Confusion

Nested dual-number systems may accidentally mix perturbation symbols.

Hyper-dual numbers avoid this by explicitly separating infinitesimal generators:

$$\varepsilon_1, \varepsilon_2.$$

Each perturbation direction remains algebraically distinct.

This improves correctness in higher-order implementations.

Relation to Truncated Polynomial Algebras

Hyper-dual numbers differ from ordinary truncated polynomial algebras.

Truncated polynomial algebra:

$$\mathbb{R}[\varepsilon]/(\varepsilon^3)$$

keeps powers:

$$1,\varepsilon,\varepsilon^2.$$

Hyper-dual algebra instead keeps:

$$1, \varepsilon_1, \varepsilon_2, \varepsilon_1\varepsilon_2.$$

This distinction matters:

| Structure | Stores |
| --- | --- |
| Truncated polynomial | repeated derivatives along one direction, as Taylor coefficients ($f'$, $f''/2$) |
| Hyper-dual | mixed derivatives across two directions |

Hyper-dual systems are particularly effective for Hessian computation.
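
For comparison, here is a minimal sketch of arithmetic in $\mathbb{R}[\varepsilon]/(\varepsilon^3)$, using a hypothetical type named `Jet2`. The point to notice is that the $\varepsilon^2$ coefficient is the Taylor coefficient $f''(x)/2$, so it has to be rescaled, whereas the hyper-dual mixed coefficient is the derivative itself.

```go
package main

import "fmt"

// Jet2 holds c0 + c1*e + c2*e^2 in R[e]/(e^3).
type Jet2 struct{ C0, C1, C2 float64 }

// Mul multiplies two jets and truncates terms of order e^3 and higher.
func (x Jet2) Mul(y Jet2) Jet2 {
	return Jet2{
		C0: x.C0 * y.C0,
		C1: x.C0*y.C1 + x.C1*y.C0,
		C2: x.C0*y.C2 + x.C1*y.C1 + x.C2*y.C0,
	}
}

func main() {
	x := Jet2{C0: 2, C1: 1} // x + e at x = 2
	y := x.Mul(x).Mul(x)    // x^3
	// C2 is f''(x)/2, so the second derivative is 2*C2.
	fmt.Println(y.C0, y.C1, 2*y.C2) // 8 12 12
}
```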

Complexity

For $n$ variables:

  • one forward dual pass computes one directional derivative
  • one hyper-dual pass computes one second-order directional interaction

Dense Hessian construction still requires multiple evaluations: $n(n+1)/2$ hyper-dual passes when symmetry is exploited, one per entry $H_{ij}$ with $i \le j$.

However, the method remains exact and compositional.

Geometric Interpretation

Dual numbers represent tangent vectors.

Hyper-dual numbers represent interacting tangent directions.

The mixed product:

$$\varepsilon_1\varepsilon_2$$

captures curvature.

First-order infinitesimals describe local linear geometry.

Second-order mixed infinitesimals describe local quadratic geometry.

Hyper-dual numbers therefore encode second-order local structure.

Summary

Hyper-dual numbers extend dual numbers by introducing multiple independent nilpotent directions whose mixed products survive.

The algebra:

$$\mathbb{R}[\varepsilon_1,\varepsilon_2] / (\varepsilon_1^2,\varepsilon_2^2)$$

produces exact second derivatives through ordinary program evaluation.

Key properties:

| Feature | Result |
| --- | --- |
| Independent infinitesimals | separate derivative directions |
| Mixed products survive | second-order information |
| No finite differences | exact differentiation |
| Local algebraic propagation | automatic Hessian computation |
| Structured nilpotency | stable higher-order AD |

Hyper-dual numbers provide one of the cleanest exact formulations of second-order automatic differentiation.