Elementary Operations

Automatic differentiation reduces differentiation to a finite collection of elementary operations. Every program, regardless of complexity, is decomposed into primitive computational steps with known local derivative rules.

An AD system therefore requires two components:

  1. a representation of computation as primitive operations
  2. derivative propagation rules for each primitive

This section formalizes these elementary operations and shows how derivative rules are attached to them.

Primitive Operations

A primitive operation is an operation whose derivative behavior is directly known.

Typical primitives include:

  • arithmetic operations
  • transcendental functions
  • tensor primitives
  • control primitives
  • linear algebra kernels

Examples:

Category         Operations
Arithmetic       +, -, \times, /
Power            x^n, \sqrt{x}
Exponential      \exp, \log
Trigonometric    \sin, \cos, \tan
Hyperbolic       \sinh, \cosh
Comparison       <, >, =
Tensor           reshape, transpose, broadcast
Linear algebra   matmul, solve, svd

Complex functions are compositions of these primitives.

The Local Differentiation Principle

Each primitive operation defines:

  • output values
  • local derivative transformations

Suppose:

z = \phi(x_1, \dots, x_n)

The operation defines a local Jacobian:

J_\phi = \left[ \frac{\partial z_i}{\partial x_j} \right]

AD systems propagate derivatives through compositions of these local Jacobians.

The global derivative is never derived symbolically; it emerges from chaining these local rules.
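The forward half of this principle can be sketched with a minimal dual-number type. This is an illustrative toy (`Dual` and `d_sin` are hypothetical names, not any particular library's API): each primitive computes its output value and applies its local rule to the incoming tangent.

```python
import math

class Dual:
    """Minimal dual number: a primal value plus a tangent (illustrative sketch)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # sum rule: (x + y)' = x' + y'
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (xy)' = y x' + x y'
        return Dual(self.val * other.val,
                    other.val * self.dot + self.val * other.dot)

def d_sin(x):
    # local rule: d/dx sin(x) = cos(x)
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# d/dx [x * sin(x)] at x = 2 should be sin(2) + 2 cos(2)
x = Dual(2.0, 1.0)      # seed the tangent with 1
z = x * d_sin(x)
```

Note that the composition `x * d_sin(x)` was never differentiated symbolically; the correct global derivative falls out of the two local rules.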

Unary Operations

Unary operations map one input to one output.

z = \phi(x)

Negation

z = -x

Derivative:

\frac{dz}{dx} = -1

Forward propagation:

\dot z = -\dot x

Reverse propagation:

\bar x -= \bar z

Reciprocal

z = \frac{1}{x}

Derivative:

\frac{dz}{dx} = -\frac{1}{x^2}

Forward:

\dot z = -\frac{\dot x}{x^2}

Reverse:

\bar x -= \frac{\bar z}{x^2}

Square Root

z = \sqrt{x}

Derivative:

\frac{dz}{dx} = \frac{1}{2\sqrt{x}}

Forward:

\dot z = \frac{\dot x}{2\sqrt{x}}

Reverse:

\bar x += \frac{\bar z}{2\sqrt{x}}
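These unary rules are easy to sanity-check numerically. A minimal sketch comparing the analytic square-root rule against a central difference (helper names here are illustrative, not part of any AD system):

```python
import math

def sqrt_deriv(x):
    # analytic local derivative of sqrt: 1 / (2 * sqrt(x))
    return 1.0 / (2.0 * math.sqrt(x))

def central_diff(f, x, h=1e-6):
    # symmetric finite difference, O(h^2) accurate
    return (f(x + h) - f(x - h)) / (2.0 * h)

analytic = sqrt_deriv(4.0)            # = 0.25
numeric = central_diff(math.sqrt, 4.0)
```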

Binary Operations

Binary operations combine two inputs.

z = \phi(x, y)

Addition

z = x + y

Local derivatives:

\frac{\partial z}{\partial x} = 1, \qquad \frac{\partial z}{\partial y} = 1

Forward:

\dot z = \dot x + \dot y

Reverse:

\bar x += \bar z, \qquad \bar y += \bar z

Subtraction

z = x - y

Forward:

\dot z = \dot x - \dot y

Reverse:

\bar x += \bar z, \qquad \bar y -= \bar z

Multiplication

z = xy

Local derivatives:

\frac{\partial z}{\partial x} = y, \qquad \frac{\partial z}{\partial y} = x

Forward:

\dot z = y\,\dot x + x\,\dot y

Reverse:

\bar x += y\,\bar z, \qquad \bar y += x\,\bar z

Division

z = \frac{x}{y}

Local derivatives:

\frac{\partial z}{\partial x} = \frac{1}{y}, \qquad \frac{\partial z}{\partial y} = -\frac{x}{y^2}

Forward:

\dot z = \frac{y\,\dot x - x\,\dot y}{y^2}

Reverse:

\bar x += \frac{\bar z}{y}, \qquad \bar y -= \frac{x\,\bar z}{y^2}
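The reverse rules for these binary operations can be sketched with a toy recursive scheme. The `Var` class below is hypothetical; real systems record a tape and traverse it once in topological order, whereas this naive recursion revisits shared subgraphs (correct for small examples, but exponential in the worst case).

```python
class Var:
    """Toy reverse-mode node: records parents and local derivatives (sketch)."""
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents    # list of (parent, local derivative) pairs
        self.grad = 0.0

    def __add__(self, other):
        # local derivatives: dz/dx = 1, dz/dy = 1
        return Var(self.val + other.val, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # local derivatives: dz/dx = y, dz/dy = x
        return Var(self.val * other.val,
                   [(self, other.val), (other, self.val)])

    def backward(self, seed=1.0):
        # accumulate: x_bar += (dz/dx) * z_bar, then recurse
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(local * seed)

x, y = Var(3.0), Var(4.0)
z = x * y + x        # z = xy + x
z.backward()         # dz/dx = y + 1 = 5, dz/dy = x = 3
```

The `+=` accumulation matters: `x` appears twice in the expression, and its two contributions must be summed.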

Exponential and Logarithmic Operations

Exponential

z = e^x

Derivative:

\frac{dz}{dx} = e^x = z

Forward:

\dot z = z\,\dot x

Reverse:

\bar x += z\,\bar z

Natural Logarithm

z = \log(x)

Derivative:

\frac{dz}{dx} = \frac{1}{x}

Forward:

\dot z = \frac{\dot x}{x}

Reverse:

\bar x += \frac{\bar z}{x}

Trigonometric Operations

Sine

z = \sin(x)

Derivative:

\frac{dz}{dx} = \cos(x)

Forward:

\dot z = \cos(x)\,\dot x

Reverse:

\bar x += \cos(x)\,\bar z

Cosine

z = \cos(x)

Derivative:

\frac{dz}{dx} = -\sin(x)

Forward:

\dot z = -\sin(x)\,\dot x

Reverse:

\bar x -= \sin(x)\,\bar z

Tangent

z = \tan(x)

Derivative:

\frac{dz}{dx} = \sec^2(x)

Forward:

\dot z = \sec^2(x)\,\dot x

Reverse:

\bar x += \sec^2(x)\,\bar z

Power Operations

Constant Exponent

z = x^n

Derivative:

\frac{dz}{dx} = nx^{n-1}

Forward:

\dot z = nx^{n-1}\,\dot x

Reverse:

\bar x += nx^{n-1}\,\bar z

Variable Exponent

z = x^y

This operation depends on both variables.

Derivative identities:

\frac{\partial z}{\partial x} = yx^{y-1}, \qquad \frac{\partial z}{\partial y} = x^y \log(x)

Forward:

\dot z = yx^{y-1}\,\dot x + x^y \log(x)\,\dot y

Reverse:

\bar x += yx^{y-1}\,\bar z, \qquad \bar y += x^y \log(x)\,\bar z
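As a sketch, the two reverse contributions for z = x^y can be checked against central differences (assuming x > 0 so that log(x) is defined; function names are illustrative):

```python
import math

def pow_vjp(x, y, z_bar):
    """Reverse rule for z = x**y, valid for x > 0 (illustrative sketch)."""
    z = x ** y
    x_bar = y * x ** (y - 1) * z_bar       # contribution through x
    y_bar = z * math.log(x) * z_bar        # contribution through y
    return x_bar, y_bar

# check against central differences at (x, y) = (2, 3)
h = 1e-6
x_bar, y_bar = pow_vjp(2.0, 3.0, 1.0)
num_x = ((2.0 + h) ** 3.0 - (2.0 - h) ** 3.0) / (2 * h)
num_y = (2.0 ** (3.0 + h) - 2.0 ** (3.0 - h)) / (2 * h)
```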

Vector-Valued Operations

Primitive operations may produce vectors or tensors.

Example:

z = Ax

where:

  • A \in \mathbb{R}^{m \times n}
  • x \in \mathbb{R}^n

Derivative:

\frac{\partial z}{\partial x} = A

Forward:

\dot z = A\dot x

Reverse:

\bar x += A^T\bar z

Reverse mode naturally introduces transposed operators.

This is one reason reverse mode aligns well with linear algebra systems.
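A minimal sketch of this transposition, using plain Python lists rather than a tensor library (function names are illustrative):

```python
def matvec(A, x):
    """Primal operation z = Ax."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matvec_vjp(A, z_bar):
    """Reverse rule x_bar = A^T z_bar: the transposed operator appears."""
    n_cols = len(A[0])
    return [sum(A[i][j] * z_bar[i] for i in range(len(A)))
            for j in range(n_cols)]

A = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
x = [1.0, 1.0]
z = matvec(A, x)                         # [3.0, 7.0, 11.0]
x_bar = matvec_vjp(A, [1.0, 1.0, 1.0])   # A^T applied to ones: column sums
```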

Tensor Operations

Modern AD systems require derivatives for tensor primitives.

Broadcast

Broadcasting conceptually replicates dimensions.

Reverse propagation must reduce along broadcast axes.

Example:

z_{ij} = x_i + y_j

Backward propagation accumulates gradients:

\bar x_i = \sum_j \bar z_{ij}, \qquad \bar y_j = \sum_i \bar z_{ij}
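A minimal sketch of this reduce-along-broadcast-axes rule, with plain Python lists and illustrative helper names:

```python
def broadcast_add(x, y):
    """Primal: z[i][j] = x[i] + y[j] (x broadcast over j, y over i)."""
    return [[xi + yj for yj in y] for xi in x]

def broadcast_add_vjp(z_bar):
    """Reverse: sum the upstream gradient along each broadcast axis."""
    x_bar = [sum(row) for row in z_bar]           # sum over j
    y_bar = [sum(col) for col in zip(*z_bar)]     # sum over i
    return x_bar, y_bar

x, y = [1.0, 2.0], [10.0, 20.0, 30.0]
z = broadcast_add(x, y)                  # a 2x3 result
x_bar, y_bar = broadcast_add_vjp([[1.0] * 3, [1.0] * 3])
```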

Reshape

Reshape changes layout without changing values.

Derivative propagation only changes metadata.

No arithmetic transformation occurs.

Transpose

Transpose reverses axes.

Backward rule:

\bar X += (\bar Y)^T

Linear Algebra Kernels

Efficient AD systems treat matrix operations as primitives.

Matrix Multiplication

C = AB

Forward:

\dot C = \dot A B + A\dot B

Reverse:

\bar A += \bar C B^T, \qquad \bar B += A^T \bar C

These rules are fundamental in neural network training.
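The two matmul rules can be sketched and checked on a small example (plain Python lists; helper names are illustrative). Seeding with the identity makes the result easy to read off: the adjoints reduce to plain transposes.

```python
def matmul(A, B):
    """Primal: C = AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul_vjp(A, B, C_bar):
    """Reverse rules: A_bar = C_bar B^T, B_bar = A^T C_bar."""
    return matmul(C_bar, transpose(B)), matmul(transpose(A), C_bar)

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C_bar = [[1.0, 0.0], [0.0, 1.0]]         # identity seed
A_bar, B_bar = matmul_vjp(A, B, C_bar)   # B^T and A^T, respectively
```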

Local Derivative Tables

AD implementations often store derivative rules in dispatch tables.

Conceptually:

Operation   Forward Rule          Reverse Rule
add         \dot x + \dot y       distribute \bar z
mul         product rule          weighted accumulation
exp         e^x \dot x            multiply by output
sin         \cos(x)\dot x         multiply by \cos(x)

The runtime evaluates:

  1. primal computation
  2. local derivative rule
  3. propagation

This separation allows extensibility.
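Such a dispatch table might be sketched as follows. The `RULES` mapping and `eval_op` helper are hypothetical, and only forward-mode rules are shown; a real system would also register reverse rules per operation.

```python
import math

# dispatch table: op name -> (primal computation, forward tangent rule)
RULES = {
    "add": (lambda x, y: x + y,
            lambda x, y, dx, dy: dx + dy),
    "mul": (lambda x, y: x * y,
            lambda x, y, dx, dy: y * dx + x * dy),   # product rule
    "exp": (lambda x: math.exp(x),
            lambda x, dx: math.exp(x) * dx),
}

def eval_op(name, args, tangents):
    """Run the primal, then the registered local derivative rule."""
    primal, tangent_rule = RULES[name]
    return primal(*args), tangent_rule(*args, *tangents)

# z = x * y at (3, 4), differentiated with respect to x
z, dz = eval_op("mul", (3.0, 4.0), (1.0, 0.0))   # dz = y = 4
```

Extending the system to a new primitive is then just one more table entry, with no change to the propagation machinery.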

Primitive Sets and Closure

An AD system can only differentiate operations whose local derivative rules are defined.

If every operation in a program belongs to the primitive set, then the entire program becomes differentiable under composition.

This closure property is central:

  • differentiable programs are built compositionally
  • local rules induce global derivatives

The entire AD framework depends on this compositional closure.

Numerical Stability of Primitive Rules

Derivative rules may amplify numerical instability.

Example:

\frac{d}{dx}\log(x) = \frac{1}{x}

Near zero:

  • gradients explode
  • floating point error increases

Similarly:

\frac{d}{dx}\sqrt{x} = \frac{1}{2\sqrt{x}}

becomes unstable near zero.

AD systems therefore require:

  • stable primitive implementations
  • numerically safe kernels
  • domain checks
  • fused operations

Modern deep learning systems often implement stabilized primitives directly:

  • logsumexp
  • softmax-crossentropy fusion
  • stable normalization kernels
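As an illustration of a stabilized primitive, here is the standard max-shifted form of logsumexp, sketched in plain Python. A naive `log(sum(exp(x)))` overflows for large inputs; subtracting the maximum first keeps every exponent non-positive.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x_i))): shift by the maximum."""
    m = max(xs)
    # every exponent is now <= 0, so exp() cannot overflow
    return m + math.log(sum(math.exp(x - m) for x in xs))

# math.exp(1000.0) would overflow, but the shifted form is exact here:
stable = logsumexp([1000.0, 1000.0])     # = 1000 + log(2)
```

The derivative of this primitive (softmax) is likewise computed in the shifted form, which is why such fused kernels appear as primitives rather than compositions.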

Primitive Operations as the Foundation of AD

Automatic differentiation does not differentiate arbitrary mathematics directly.

It differentiates programs composed from primitives.

Every AD engine therefore rests on:

  • a computational graph
  • a primitive operator set
  • local derivative propagation rules

All higher abstractions ultimately reduce to these elementary operations.