# Elementary Operations

Automatic differentiation reduces differentiation to a finite collection of elementary operations. Every program, regardless of complexity, is decomposed into primitive computational steps with known local derivative rules.

An AD system therefore requires two components:

1. a representation of computation as primitive operations
2. derivative propagation rules for each primitive

This section formalizes these elementary operations and shows how derivative rules are attached to them.

## Primitive Operations

A primitive operation is one whose derivative rule is known in closed form.

Typical primitives include:
- arithmetic operations
- transcendental functions
- tensor primitives
- control primitives
- linear algebra kernels

Examples:

| Category | Operations |
|---|---|
| Arithmetic | $+,-,\times,/$ |
| Power | $x^n,\sqrt{x}$ |
| Exponential | $\exp,\log$ |
| Trigonometric | $\sin,\cos,\tan$ |
| Hyperbolic | $\sinh,\cosh$ |
| Comparison | $<,>,=$ |
| Tensor | reshape, transpose, broadcast |
| Linear algebra | matmul, solve, svd |

Arbitrarily complex functions are built as compositions of these primitives. Comparison operations produce discrete outputs and carry zero derivative almost everywhere, but they are still registered as primitives so that control flow can be traced.

## The Local Differentiation Principle

Each primitive operation defines:
- output values
- local derivative transformations

Suppose an operation $\phi$, possibly vector-valued, computes:

$$
z=\phi(x_1,\dots,x_n)
$$

The operation defines a local Jacobian:

$$
J_\phi =
\left[
\frac{\partial z_i}{\partial x_j}
\right]
$$

AD systems propagate derivatives by composing these local Jacobians along the computation.

The global derivative is never formed symbolically; it emerges from the chain rule applied one primitive at a time.
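
To make this concrete, forward mode can be sketched with dual numbers: each value carries a tangent, and every primitive applies only its own local rule. This is a minimal illustration; `Dual` and the methods below are illustrative names, not any particular library's API.

```python
import math

# Minimal forward-mode sketch: each Dual carries a primal value and a
# tangent, and every primitive applies only its local derivative rule.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):          # local rule: d(x+y) = dx + dy
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):          # local rule: d(xy) = y dx + x dy
        return Dual(self.val * other.val,
                    other.val * self.dot + self.val * other.dot)

def sin(d):                            # local rule: d(sin x) = cos(x) dx
    return Dual(math.sin(d.val), math.cos(d.val) * d.dot)

# d/dx [sin(x) * x + x] at x = 2, built purely from local rules
x = Dual(2.0, 1.0)                     # seed tangent dx = 1
z = sin(x) * x + x
print(z.val, z.dot)                    # derivative: 2*cos(2) + sin(2) + 1
```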

## Unary Operations

Unary operations map one input to one output.

$$
z=\phi(x)
$$

### Negation

$$
z=-x
$$

Derivative:

$$
\frac{dz}{dx}=-1
$$

Forward:

$$
\dot z=-\dot x
$$

Reverse:

$$
\bar x -= \bar z
$$

---

### Reciprocal

$$
z=\frac{1}{x}
$$

Derivative:

$$
\frac{dz}{dx} =
-\frac{1}{x^2}
$$

Forward:

$$
\dot z =
-\frac{\dot x}{x^2}
$$

Reverse:

$$
\bar x
+=
-\frac{\bar z}{x^2}
$$

---

### Square Root

$$
z=\sqrt{x}
$$

Derivative:

$$
\frac{dz}{dx} =
\frac{1}{2\sqrt{x}}
$$

Forward:

$$
\dot z =
\frac{\dot x}{2\sqrt{x}}
$$

Reverse:

$$
\bar x
+=
\frac{\bar z}{2\sqrt{x}}
$$
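
These unary rules fit a common implementation pattern: one (forward, reverse) rule pair per primitive. The sketch below uses hypothetical names (`UNARY_RULES` and the rule signatures); real systems differ in detail.

```python
import math

# Hypothetical rule table: a forward rule maps (x, x_dot) -> z_dot,
# a reverse rule maps (x, z_bar) -> the increment added to x_bar.
UNARY_RULES = {
    "neg":        (lambda x, xd: -xd,
                   lambda x, zb: -zb),
    "reciprocal": (lambda x, xd: -xd / x**2,
                   lambda x, zb: -zb / x**2),
    "sqrt":       (lambda x, xd: xd / (2.0 * math.sqrt(x)),
                   lambda x, zb: zb / (2.0 * math.sqrt(x))),
}

fwd, rev = UNARY_RULES["sqrt"]
print(fwd(4.0, 1.0))   # 0.25 = 1/(2*sqrt(4))
print(rev(4.0, 1.0))   # 0.25: with unit seeds both equal dz/dx
```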

## Binary Operations

Binary operations combine two inputs.

$$
z=\phi(x,y)
$$

### Addition

$$
z=x+y
$$

Local derivatives:

$$
\frac{\partial z}{\partial x}=1
$$

$$
\frac{\partial z}{\partial y}=1
$$

Forward:

$$
\dot z=\dot x+\dot y
$$

Reverse:

$$
\bar x += \bar z
$$

$$
\bar y += \bar z
$$

---

### Subtraction

$$
z=x-y
$$

Forward:

$$
\dot z=\dot x-\dot y
$$

Reverse:

$$
\bar x += \bar z
$$

$$
\bar y -= \bar z
$$

---

### Multiplication

$$
z=xy
$$

Local derivatives:

$$
\frac{\partial z}{\partial x}=y
$$

$$
\frac{\partial z}{\partial y}=x
$$

Forward:

$$
\dot z =
y\dot x+x\dot y
$$

Reverse:

$$
\bar x += y\bar z
$$

$$
\bar y += x\bar z
$$

---

### Division

$$
z=\frac{x}{y}
$$

Local derivatives:

$$
\frac{\partial z}{\partial x} =
\frac{1}{y}
$$

$$
\frac{\partial z}{\partial y} =
-\frac{x}{y^2}
$$

Forward:

$$
\dot z =
\frac{y\dot x-x\dot y}{y^2}
$$

Reverse:

$$
\bar x += \frac{\bar z}{y}
$$

$$
\bar y -= \frac{x\bar z}{y^2}
$$
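
The binary rules translate directly into a reverse-mode tape: each operation records its local partials, and a backward sweep accumulates adjoints. A minimal sketch assuming the rules above; `Var`, `tape`, and `backward` are illustrative names, not a real library API.

```python
tape = []  # records (output, [(input, local partial), ...]) per operation

class Var:
    def __init__(self, val):
        self.val, self.bar = val, 0.0

    def __mul__(self, other):
        z = Var(self.val * other.val)
        # local partials: dz/dx = y, dz/dy = x
        tape.append((z, [(self, other.val), (other, self.val)]))
        return z

    def __truediv__(self, other):
        z = Var(self.val / other.val)
        # local partials: dz/dx = 1/y, dz/dy = -x/y^2
        tape.append((z, [(self, 1.0 / other.val),
                         (other, -self.val / other.val**2)]))
        return z

def backward(z):
    z.bar = 1.0
    for node, parents in reversed(tape):     # reverse sweep over the tape
        for p, local in parents:
            p.bar += local * node.bar        # accumulate adjoints

x, y = Var(3.0), Var(2.0)
backward((x * y) / y)        # z = x, so dz/dx = 1, dz/dy = 0
print(x.bar, y.bar)          # 1.0, 0.0
```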

## Exponential and Logarithmic Operations

### Exponential

$$
z=e^x
$$

Derivative:

$$
\frac{dz}{dx}=e^x=z
$$

Forward:

$$
\dot z=z\dot x
$$

Reverse:

$$
\bar x += z\bar z
$$

---

### Natural Logarithm

$$
z=\log(x)
$$

Derivative:

$$
\frac{dz}{dx}=\frac{1}{x}
$$

Forward:

$$
\dot z=\frac{\dot x}{x}
$$

Reverse:

$$
\bar x += \frac{\bar z}{x}
$$
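
Note that the exponential rule reuses the primal output $z$, so the derivative costs no extra transcendental evaluation, while the logarithm divides by the saved input. A small sketch with illustrative helper names:

```python
import math

def exp_fwd(x, x_dot):
    z = math.exp(x)
    return z, z * x_dot              # z_dot = z * x_dot, reusing z

def log_fwd(x, x_dot):
    return math.log(x), x_dot / x    # z_dot = x_dot / x

z, z_dot = exp_fwd(1.0, 1.0)
w, w_dot = log_fwd(z, z_dot)         # log(exp(x)): tangents compose to 1
print(w, w_dot)                      # 1.0 1.0
```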

## Trigonometric Operations

### Sine

$$
z=\sin(x)
$$

Derivative:

$$
\frac{dz}{dx}=\cos(x)
$$

Forward:

$$
\dot z=\cos(x)\dot x
$$

Reverse:

$$
\bar x += \cos(x)\bar z
$$

---

### Cosine

$$
z=\cos(x)
$$

Derivative:

$$
\frac{dz}{dx}=-\sin(x)
$$

Forward:

$$
\dot z=-\sin(x)\dot x
$$

Reverse:

$$
\bar x -= \sin(x)\bar z
$$

---

### Tangent

$$
z=\tan(x)
$$

Derivative:

$$
\frac{dz}{dx} =
\sec^2(x)
$$

Forward:

$$
\dot z=\sec^2(x)\dot x
$$

Reverse:

$$
\bar x += \sec^2(x)\bar z
$$
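
Because $\sec^2(x)=1+\tan^2(x)$, an implementation can reuse the primal output instead of recomputing trigonometric functions. A sketch (the helper name is illustrative):

```python
import math

def tan_fwd(x, x_dot):
    z = math.tan(x)
    return z, (1.0 + z * z) * x_dot   # sec^2(x) = 1 + tan^2(x)

z, z_dot = tan_fwd(0.5, 1.0)
print(z_dot)                          # ≈ 1.2984
print(1.0 / math.cos(0.5) ** 2)       # same value via sec^2 directly
```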

## Power Operations

### Constant Exponent

$$
z=x^n
$$

Derivative:

$$
\frac{dz}{dx} =
nx^{n-1}
$$

Forward:

$$
\dot z =
nx^{n-1}\dot x
$$

Reverse:

$$
\bar x
+=
nx^{n-1}\bar z
$$

---

### Variable Exponent

$$
z=x^y
$$

Both inputs contribute to the derivative, and the rules below require $x>0$ so that $\log(x)$ is defined.

Local derivatives:

$$
\frac{\partial z}{\partial x} =
yx^{y-1}
$$

$$
\frac{\partial z}{\partial y} =
x^y\log(x)
$$

Forward:

$$
\dot z =
yx^{y-1}\dot x
+
x^y\log(x)\dot y
$$

Reverse:

$$
\bar x
+=
yx^{y-1}\bar z
$$

$$
\bar y
+=
x^y\log(x)\bar z
$$
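
A sketch of the corresponding reverse rule, valid only for $x>0$ where $\log(x)$ is defined; `pow_vjp` is an illustrative name:

```python
import math

def pow_vjp(x, y, z_bar):
    # Assumes x > 0, matching the identities above.
    x_bar = y * x ** (y - 1) * z_bar      # dz/dx = y x^(y-1)
    y_bar = x ** y * math.log(x) * z_bar  # dz/dy = x^y log(x)
    return x_bar, y_bar

print(pow_vjp(2.0, 3.0, 1.0))  # (12.0, 8*log(2) ≈ 5.545)
```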

## Vector-Valued Operations

Primitive operations may produce vectors or tensors.

Example:

$$
z=Ax
$$

where:
- $A\in\mathbb{R}^{m\times n}$
- $x\in\mathbb{R}^n$

Derivative:

$$
\frac{\partial z}{\partial x}=A
$$

Forward:

$$
\dot z=A\dot x
$$

Reverse:

$$
\bar x += A^T\bar z
$$

Reverse mode naturally introduces transposed operators.

This is one reason reverse mode aligns well with linear algebra systems.
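
A quick NumPy check of the two rules: the pairing $\langle \bar z, \dot z\rangle = \langle \bar x, \dot x\rangle$ must hold for any tangent and adjoint seeds, which is exactly what makes $A^T$ the correct reverse operator.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))      # A in R^{3x4}
x_dot = rng.standard_normal(4)       # input tangent seed
z_bar = rng.standard_normal(3)       # output adjoint seed

z_dot = A @ x_dot                    # forward: z_dot = A x_dot
x_bar = A.T @ z_bar                  # reverse: x_bar += A^T z_bar

# <z_bar, z_dot> == <x_bar, x_dot> confirms the transpose rule
print(np.allclose(z_bar @ z_dot, x_bar @ x_dot))   # True
```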

## Tensor Operations

Modern AD systems require derivatives for tensor primitives.

### Broadcast

Broadcasting conceptually replicates values along new or size-one axes.

Reverse propagation must reduce along broadcast axes.

Example:

$$
z_{ij}=x_i+y_j
$$

Backward propagation accumulates gradients:

$$
\bar x_i += \sum_j \bar z_{ij}
$$

$$
\bar y_j += \sum_i \bar z_{ij}
$$
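
In NumPy terms (an illustrative check, not a library rule):

```python
import numpy as np

x = np.arange(3.0)               # shape (3,)
y = np.arange(2.0)               # shape (2,)
z = x[:, None] + y[None, :]      # z_ij = x_i + y_j, shape (3, 2)

z_bar = np.ones_like(z)          # incoming adjoint
x_bar = z_bar.sum(axis=1)        # reduce over the broadcast j axis
y_bar = z_bar.sum(axis=0)        # reduce over the broadcast i axis
print(x_bar, y_bar)              # [2. 2. 2.] [3. 3.]
```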

### Reshape

Reshape changes a tensor's shape without changing its values.

Derivative propagation simply reshapes the adjoint back to the input shape, a metadata-only change.

No arithmetic transformation occurs.

### Transpose

Transpose permutes axes.

For $Y=X^T$, the backward rule is:

$$
\bar X += (\bar Y)^T
$$

## Linear Algebra Kernels

Efficient AD systems treat matrix operations as primitives.

### Matrix Multiplication

$$
C=AB
$$

Forward:

$$
\dot C =
\dot A B
+
A\dot B
$$

Reverse:

$$
\bar A += \bar C B^T
$$

$$
\bar B += A^T \bar C
$$

These rules are fundamental to neural network training, where matrix multiplication dominates the cost of both the forward and backward passes.
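
The same pairing identity used for $z=Ax$ verifies these rules numerically: $\langle \bar C, \dot C\rangle$ must equal $\langle \bar A, \dot A\rangle + \langle \bar B, \dot B\rangle$. An illustrative NumPy check:

```python
import numpy as np

rng = np.random.default_rng(1)
A, B = rng.standard_normal((2, 3)), rng.standard_normal((3, 4))
A_dot, B_dot = rng.standard_normal((2, 3)), rng.standard_normal((3, 4))
C_bar = rng.standard_normal((2, 4))

C_dot = A_dot @ B + A @ B_dot    # forward rule
A_bar = C_bar @ B.T              # reverse rule for A
B_bar = A.T @ C_bar              # reverse rule for B

lhs = np.sum(C_bar * C_dot)
rhs = np.sum(A_bar * A_dot) + np.sum(B_bar * B_dot)
print(np.allclose(lhs, rhs))     # True
```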

## Local Derivative Tables

AD implementations often store derivative rules in dispatch tables.

Conceptually:

| Operation | Forward Rule | Reverse Rule |
|---|---|---|
| add | $\dot z=\dot x+\dot y$ | $\bar x += \bar z,\ \bar y += \bar z$ |
| mul | $\dot z=y\dot x+x\dot y$ | $\bar x += y\bar z,\ \bar y += x\bar z$ |
| exp | $\dot z=z\dot x$ | $\bar x += z\bar z$ |
| sin | $\dot z=\cos(x)\dot x$ | $\bar x += \cos(x)\bar z$ |

The runtime evaluates:
1. primal computation
2. local derivative rule
3. propagation

This separation allows extensibility.
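
A toy version of such tables, with illustrative names throughout; adding a primitive means extending both tables and nothing else:

```python
import math

PRIMAL = {
    "add": lambda x, y: x + y,
    "mul": lambda x, y: x * y,
    "exp": math.exp,
}
# Reverse rules map (inputs..., output, output adjoint) -> input adjoints.
REVERSE = {
    "add": lambda x, y, z, zb: (zb, zb),
    "mul": lambda x, y, z, zb: (y * zb, x * zb),
    "exp": lambda x, z, zb: (z * zb,),   # reuses the primal output z
}

# Extensibility: registering sin touches only the tables.
PRIMAL["sin"] = math.sin
REVERSE["sin"] = lambda x, z, zb: (math.cos(x) * zb,)

z = PRIMAL["mul"](3.0, 4.0)
print(REVERSE["mul"](3.0, 4.0, z, 1.0))  # (4.0, 3.0)
```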

## Primitive Sets and Closure

An AD system only differentiates operations whose local rules are defined.

If every operation in a program belongs to the primitive set, the entire program can be differentiated by composing local rules, wherever each primitive is itself differentiable.

This closure property is central:
- differentiable programs are built compositionally
- local rules induce global derivatives

The entire AD framework depends on this compositional closure.

## Numerical Stability of Primitive Rules

Derivative rules may amplify numerical instability.

Example:

$$
\frac{d}{dx}\log(x)=\frac{1}{x}
$$

Near zero:
- the gradient magnitude grows without bound
- floating-point rounding error is amplified

Similarly:

$$
\frac{d}{dx}\sqrt{x} =
\frac{1}{2\sqrt{x}}
$$

becomes unstable near zero.

AD systems therefore require:
- stable primitive implementations
- numerically safe kernels
- domain checks
- fused operations

Modern deep learning systems often implement stabilized primitives directly:
- logsumexp
- softmax-crossentropy fusion
- stable normalization kernels
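
As one concrete case, the standard max-shift form of logsumexp keeps `exp` in range, and its derivative is exactly the softmax weights. A sketch, not any particular library's implementation:

```python
import numpy as np

def logsumexp(x):
    m = np.max(x)                        # shift by the max to avoid overflow
    return m + np.log(np.sum(np.exp(x - m)))

def logsumexp_vjp(x, z_bar):
    w = np.exp(x - np.max(x))
    return z_bar * w / w.sum()           # softmax(x) * z_bar

x = np.array([1000.0, 1001.0, 1002.0])   # naive exp(x) would overflow
print(logsumexp(x))                       # ≈ 1002.4076, computed stably
print(logsumexp_vjp(x, 1.0))              # stable softmax weights
```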

## Primitive Operations as the Foundation of AD

Automatic differentiation does not differentiate arbitrary mathematics directly.

It differentiates programs composed from primitives.

Every AD engine therefore rests on:
- a computational graph
- a primitive operator set
- local derivative propagation rules

All higher abstractions ultimately reduce to these elementary operations.

