Automatic differentiation reduces differentiation to a finite collection of elementary operations. Every program, regardless of complexity, is decomposed into primitive computational steps with known local derivative rules.
An AD system therefore requires two components:
- a representation of computation as primitive operations
- derivative propagation rules for each primitive
This section formalizes these elementary operations and shows how derivative rules are attached to them.
Primitive Operations
A primitive operation is one whose derivative rule is known directly, rather than derived from other operations.
Typical primitives include:
- arithmetic operations
- transcendental functions
- tensor primitives
- control primitives
- linear algebra kernels
Examples:
| Category | Operations |
|---|---|
| Arithmetic | add, sub, mul, div, neg |
| Power | pow, sqrt |
| Exponential | exp, log |
| Trigonometric | sin, cos, tan |
| Hyperbolic | sinh, cosh, tanh |
| Comparison | min, max, abs |
| Tensor | reshape, transpose, broadcast |
| Linear algebra | matmul, solve, svd |
Complex functions are compositions of these primitives.
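To make this concrete, here is a minimal sketch (plain Python, hypothetical variable names) of how an AD tracer would decompose $f(x) = x^2 \sin(x)$ into primitive steps:

```python
import math

# Hypothetical decomposition of f(x) = x^2 * sin(x) into primitives.
# An AD system records each step together with its local derivative rule.
def f_traced(x):
    v1 = x * x          # primitive: mul
    v2 = math.sin(x)    # primitive: sin
    v3 = v1 * v2        # primitive: mul
    return v3
```

Each intermediate `v1`, `v2`, `v3` is produced by exactly one primitive, so differentiating `f_traced` reduces to chaining three known local rules.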
The Local Differentiation Principle
Each primitive operation defines:
- output values
- local derivative transformations
Suppose a primitive computes $y = f(x_1, \dots, x_n)$.
The operation defines a local Jacobian:

$$J_{ij} = \frac{\partial y_i}{\partial x_j}$$

AD systems propagate derivatives through compositions of these local Jacobians: forward mode pushes a tangent $\dot{x}$ through $\dot{y} = J\dot{x}$, while reverse mode pulls an adjoint $\bar{y}$ back through $\bar{x} = J^\top\bar{y}$.
The global derivative is never derived symbolically.
Throughout this section, $\dot{x}$ denotes a forward-mode tangent and $\bar{x}$ a reverse-mode adjoint; when an input feeds several operations, its adjoint contributions are summed, so the rules below show a single contribution.
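As a concrete instance, take $y = \sin(u)$ with $u = x^2$. The two local rules $\partial u / \partial x = 2x$ and $\partial y / \partial u = \cos(u)$ compose numerically to $dy/dx = \cos(x^2) \cdot 2x$; no global symbolic expression is ever formed.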
Unary Operations
Unary operations map one input to one output.
Negation

$$y = -x$$

Derivative:

$$\frac{dy}{dx} = -1$$

Forward propagation:

$$\dot{y} = -\dot{x}$$

Reverse propagation:

$$\bar{x} = -\bar{y}$$
Reciprocal

$$y = \frac{1}{x}$$

Derivative:

$$\frac{dy}{dx} = -\frac{1}{x^2} = -y^2$$

Forward:

$$\dot{y} = -y^2\,\dot{x}$$

Reverse:

$$\bar{x} = -y^2\,\bar{y}$$
Square Root

$$y = \sqrt{x}$$

Derivative:

$$\frac{dy}{dx} = \frac{1}{2\sqrt{x}} = \frac{1}{2y}$$

Forward:

$$\dot{y} = \frac{\dot{x}}{2y}$$

Reverse:

$$\bar{x} = \frac{\bar{y}}{2y}$$
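A minimal forward-mode sketch, assuming a hypothetical `Dual` type that pairs a value with its tangent; the three unary rules above translate directly:

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    val: float  # primal value
    tan: float  # forward-mode tangent

def neg(a: Dual) -> Dual:
    return Dual(-a.val, -a.tan)

def reciprocal(a: Dual) -> Dual:
    y = 1.0 / a.val
    return Dual(y, -y * y * a.tan)     # d(1/x)/dx = -1/x^2 = -y^2

def sqrt(a: Dual) -> Dual:
    y = math.sqrt(a.val)
    return Dual(y, a.tan / (2.0 * y))  # d(sqrt x)/dx = 1/(2 sqrt x)
```

For example, `sqrt(Dual(4.0, 1.0))` yields `Dual(val=2.0, tan=0.25)`, matching $1/(2\sqrt{4})$.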
Binary Operations
Binary operations combine two inputs.
Addition

$$y = a + b$$

Local derivatives:

$$\frac{\partial y}{\partial a} = 1, \qquad \frac{\partial y}{\partial b} = 1$$

Forward:

$$\dot{y} = \dot{a} + \dot{b}$$

Reverse:

$$\bar{a} = \bar{y}, \qquad \bar{b} = \bar{y}$$
Subtraction

$$y = a - b$$

Forward:

$$\dot{y} = \dot{a} - \dot{b}$$

Reverse:

$$\bar{a} = \bar{y}, \qquad \bar{b} = -\bar{y}$$
Multiplication

$$y = ab$$

Local derivatives:

$$\frac{\partial y}{\partial a} = b, \qquad \frac{\partial y}{\partial b} = a$$

Forward:

$$\dot{y} = b\,\dot{a} + a\,\dot{b}$$

Reverse:

$$\bar{a} = b\,\bar{y}, \qquad \bar{b} = a\,\bar{y}$$
Division

$$y = \frac{a}{b}$$

Local derivatives:

$$\frac{\partial y}{\partial a} = \frac{1}{b}, \qquad \frac{\partial y}{\partial b} = -\frac{a}{b^2}$$

Forward:

$$\dot{y} = \frac{\dot{a}}{b} - \frac{a\,\dot{b}}{b^2}$$

Reverse:

$$\bar{a} = \frac{\bar{y}}{b}, \qquad \bar{b} = -\frac{a\,\bar{y}}{b^2}$$
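The same rules in reverse mode, as a sketch assuming a hypothetical `Var` node that stores its value, an accumulated adjoint, and a backward closure:

```python
# Hypothetical reverse-mode node: each primitive builds a closure that
# turns the output adjoint into adjoint contributions for its inputs.
class Var:
    def __init__(self, val):
        self.val = val
        self.grad = 0.0
        self.backward = lambda: None

def mul(a: Var, b: Var) -> Var:
    y = Var(a.val * b.val)
    def backward():
        a.grad += b.val * y.grad              # ∂y/∂a = b
        b.grad += a.val * y.grad              # ∂y/∂b = a
    y.backward = backward
    return y

def div(a: Var, b: Var) -> Var:
    y = Var(a.val / b.val)
    def backward():
        a.grad += y.grad / b.val              # ∂y/∂a = 1/b
        b.grad += -a.val * y.grad / b.val**2  # ∂y/∂b = -a/b^2
    y.backward = backward
    return y
```

Adjoints accumulate with `+=` because a single input may feed several downstream operations.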
Exponential and Logarithmic Operations
Exponential

$$y = e^x$$

Derivative:

$$\frac{dy}{dx} = e^x = y$$

Forward:

$$\dot{y} = y\,\dot{x}$$

Reverse:

$$\bar{x} = y\,\bar{y}$$
Natural Logarithm

$$y = \ln x$$

Derivative:

$$\frac{dy}{dx} = \frac{1}{x}$$

Forward:

$$\dot{y} = \frac{\dot{x}}{x}$$

Reverse:

$$\bar{x} = \frac{\bar{y}}{x}$$
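The exponential is a useful special case: its derivative is its own output, so an implementation can cache the primal result and reuse it in the backward pass. A sketch using the hypothetical `Var` class from the earlier binary-operation example:

```python
import math

def exp(a: Var) -> Var:
    y = Var(math.exp(a.val))
    def backward():
        a.grad += y.val * y.grad  # d(exp x)/dx = exp x: reuse the output
    y.backward = backward
    return y
```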
Trigonometric Operations
Sine

$$y = \sin x$$

Derivative:

$$\frac{dy}{dx} = \cos x$$

Forward:

$$\dot{y} = \cos(x)\,\dot{x}$$

Reverse:

$$\bar{x} = \cos(x)\,\bar{y}$$
Cosine

$$y = \cos x$$

Derivative:

$$\frac{dy}{dx} = -\sin x$$

Forward:

$$\dot{y} = -\sin(x)\,\dot{x}$$

Reverse:

$$\bar{x} = -\sin(x)\,\bar{y}$$
Tangent

$$y = \tan x$$

Derivative:

$$\frac{dy}{dx} = \frac{1}{\cos^2 x} = 1 + y^2$$

Forward:

$$\dot{y} = (1 + y^2)\,\dot{x}$$

Reverse:

$$\bar{x} = (1 + y^2)\,\bar{y}$$
Power Operations
Constant Exponent

$$y = x^c$$

Derivative:

$$\frac{dy}{dx} = c\,x^{c-1}$$

Forward:

$$\dot{y} = c\,x^{c-1}\,\dot{x}$$

Reverse:

$$\bar{x} = c\,x^{c-1}\,\bar{y}$$
Variable Exponent

$$y = a^b = e^{b \ln a} \quad (a > 0)$$

This operation depends on both variables.
Derivative identities:

$$\frac{\partial y}{\partial a} = b\,a^{b-1}, \qquad \frac{\partial y}{\partial b} = y \ln a$$

Forward:

$$\dot{y} = b\,a^{b-1}\,\dot{a} + y \ln(a)\,\dot{b}$$

Reverse:

$$\bar{a} = b\,a^{b-1}\,\bar{y}, \qquad \bar{b} = y \ln(a)\,\bar{y}$$
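Because both partials are needed, a pow primitive must propagate into both inputs; a sketch in the same hypothetical `Var` style:

```python
import math

def pow_(a: Var, b: Var) -> Var:
    # Assumes a.val > 0 so that log(a.val) is defined.
    y = Var(a.val ** b.val)
    def backward():
        a.grad += b.val * a.val ** (b.val - 1.0) * y.grad  # ∂y/∂a = b a^(b-1)
        b.grad += y.val * math.log(a.val) * y.grad         # ∂y/∂b = y ln a
    y.backward = backward
    return y
```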
Vector-Valued Operations
Primitive operations may produce vectors or tensors.
Example:

$$y = Ax$$

where $A \in \mathbb{R}^{m \times n}$ is a constant matrix and $x \in \mathbb{R}^n$.
Derivative: the local Jacobian is $A$ itself.
Forward:

$$\dot{y} = A\,\dot{x}$$

Reverse:

$$\bar{x} = A^\top\,\bar{y}$$
Reverse mode naturally introduces transposed operators.
This is one reason reverse mode aligns well with linear algebra systems.
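A NumPy sketch (illustrative matrix and helper names) showing how forward mode applies $A$ while reverse mode applies $A^\top$:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])           # linear map R^2 -> R^3

def jvp(x_dot):
    # Forward mode: push a tangent through the Jacobian (A itself).
    return A @ x_dot

def vjp(y_bar):
    # Reverse mode: pull an adjoint back through the transposed Jacobian.
    return A.T @ y_bar

print(jvp(np.array([1.0, 0.0])))     # [1. 3. 5.] (first column of A)
print(vjp(np.ones(3)))               # [ 9. 12.] (column sums of A)
```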
Tensor Operations
Modern AD systems require derivatives for tensor primitives.
Broadcast
Broadcasting conceptually replicates values along expanded dimensions.
Reverse propagation must reduce along broadcast axes.
Example: a vector $b \in \mathbb{R}^n$ broadcast across the $m$ rows of a matrix yields $Y_{ij} = b_j$.
Backward propagation accumulates gradients over the replicated axis:

$$\bar{b}_j = \sum_{i=1}^{m} \bar{Y}_{ij}$$
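The same reduction in NumPy, with an illustrative upstream adjoint:

```python
import numpy as np

m, n = 4, 3
b = np.array([1.0, 2.0, 3.0])
Y = np.broadcast_to(b, (m, n))   # forward: Y[i, j] = b[j]

Y_bar = np.ones((m, n))          # example upstream adjoint
b_bar = Y_bar.sum(axis=0)        # backward: reduce over the broadcast axis
print(b_bar)                     # [4. 4. 4.]: each b[j] fed m outputs
```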
Reshape
Reshape changes layout without changing values.
Derivative propagation only changes metadata.
No arithmetic transformation occurs.
Transpose
Transpose permutes axes without changing values.
Backward rule: apply the inverse permutation to the adjoint. For a matrix transpose $Y = X^\top$:

$$\bar{X} = \bar{Y}^\top$$
Linear Algebra Kernels
Efficient AD systems treat matrix operations as primitives.
Matrix Multiplication
Forward: for $C = AB$,

$$\dot{C} = \dot{A}\,B + A\,\dot{B}$$

Reverse:

$$\bar{A} = \bar{C}\,B^\top, \qquad \bar{B} = A^\top\,\bar{C}$$
These rules are fundamental in neural network training.
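A NumPy sketch of the reverse rule, with a finite-difference spot check of one entry (random illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
C_bar = rng.standard_normal((3, 2))      # upstream adjoint for C = A @ B

A_bar = C_bar @ B.T                      # adjoint of A
B_bar = A.T @ C_bar                      # adjoint of B

# Check dL/dA[0,0] by finite differences, where L = sum(C_bar * (A @ B)).
eps = 1e-6
A2 = A.copy()
A2[0, 0] += eps
fd = (np.sum(C_bar * (A2 @ B)) - np.sum(C_bar * (A @ B))) / eps
print(np.isclose(A_bar[0, 0], fd, atol=1e-4))  # True
```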
Local Derivative Tables
AD implementations often store derivative rules in dispatch tables.
Conceptually:
| Operation | Forward Rule | Reverse Rule |
|---|---|---|
| add | sum input tangents | distribute adjoint to both inputs |
| mul | product rule | weighted accumulation |
| exp | multiply by output | multiply by output |
| sin | multiply by cosine | multiply by cosine |
The runtime evaluates:
- primal computation
- local derivative rule
- propagation
This separation allows extensibility.
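A sketch of such a table in Python, with hypothetical rule signatures: each entry maps an operation name to its primal function and a reverse rule that converts the output adjoint into input adjoints:

```python
import math

# Hypothetical dispatch table: op name -> (primal, reverse rule).
# Reverse rules map (inputs, output, output adjoint) to input adjoints.
VJP_RULES = {
    "add": (lambda a, b: a + b,
            lambda inputs, y, g: (g, g)),
    "mul": (lambda a, b: a * b,
            lambda inputs, y, g: (inputs[1] * g, inputs[0] * g)),
    "exp": (lambda a: math.exp(a),
            lambda inputs, y, g: (y * g,)),
    "sin": (lambda a: math.sin(a),
            lambda inputs, y, g: (math.cos(inputs[0]) * g,)),
}

# Extensibility: a new primitive is just another table entry.
VJP_RULES["neg"] = (lambda a: -a,
                    lambda inputs, y, g: (-g,))
```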
Primitive Sets and Closure
An AD system only differentiates operations whose local rules are defined.
If every operation in a program belongs to the primitive set, then the entire program becomes differentiable under composition.
This closure property is central:
- differentiable programs are built compositionally
- local rules induce global derivatives
The entire AD framework depends on this compositional closure.
Numerical Stability of Primitive Rules
Derivative rules may amplify numerical instability.
Example: the square-root rule

$$\frac{d}{dx}\sqrt{x} = \frac{1}{2\sqrt{x}}$$

Near zero:
- gradients explode
- floating point error increases
Similarly:

$$\frac{d}{dx}\ln x = \frac{1}{x}$$

becomes unstable near zero.
AD systems therefore require:
- stable primitive implementations
- numerically safe kernels
- domain checks
- fused operations
Modern deep learning systems often implement stabilized primitives directly:
- logsumexp
- softmax-crossentropy fusion
- stable normalization kernels
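For instance, a numerically stable logsumexp shifts by the maximum before exponentiating, which keeps both the primal value and its gradient (the softmax) finite; a minimal NumPy sketch:

```python
import numpy as np

def logsumexp(x):
    # Shift by the max so exp() cannot overflow; the value is unchanged
    # because log(sum(exp(x))) = m + log(sum(exp(x - m))).
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def logsumexp_grad(x):
    # The gradient of logsumexp is softmax, computed in shifted form.
    z = np.exp(x - np.max(x))
    return z / z.sum()

x = np.array([1000.0, 1000.0])   # naive exp(1000.0) would overflow
print(logsumexp(x))              # ~1000.6931
print(logsumexp_grad(x))         # [0.5 0.5]
```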
Primitive Operations as the Foundation of AD
Automatic differentiation does not differentiate arbitrary mathematics directly.
It differentiates programs composed from primitives.
Every AD engine therefore rests on:
- a computational graph
- a primitive operator set
- local derivative propagation rules
All higher abstractions ultimately reduce to these elementary operations.