Tangent Propagation
Forward mode automatic differentiation computes derivatives by propagating tangent values alongside ordinary values. The ordinary value is called the primal. The derivative...
219 notes
This section studies reverse mode automatic differentiation through concrete examples. Each case has the same structure:
Automatic differentiation is easiest to define for pure functions. A pure function behaves like a mathematical mapping: it consumes inputs, produces outputs, and has no...
Physics-informed models combine data fitting with equations from physics or applied mathematics. The model is trained not only to match observed samples, but also to satisfy...
Automatic differentiation began as a numerical technique for computing gradients of scalar functions.
A minimal automatic differentiation engine can compute correct gradients on small programs. A production system must survive long-running workloads, large tensors, distributed...
Automatic differentiation works naturally on pure mathematical functions:
Automatic differentiation is a method for computing derivatives by transforming programs into derivative-propagating computations. It does not approximate derivatives...
Forward mode automatic differentiation appears in many numerical systems where directional derivatives, local sensitivities, or small parameter sets are important. This...
Reverse mode automatic differentiation is the mathematical and systems basis of backpropagation. In deep learning, the objective is usually a scalar loss depending on many...
Automatic differentiation is deeply connected to functional programming and lambda calculus. Programs can be viewed as mathematical functions, and differentiation can be...
Higher-order automatic differentiation faces a fundamental problem: derivative structure grows combinatorially with order.
Modern automatic differentiation systems are fundamentally tensor compiler systems. Their performance depends less on mathematical differentiation rules than on how...
Automatic differentiation interacts deeply with type systems because differentiation changes the structure of computation. A derivative operator maps one function into another...
Reinforcement learning studies learning systems that act in an environment. Unlike supervised learning, the training signal is not a target label for each input. The model...
Probabilistic programming represents uncertainty using executable probabilistic models. A probabilistic program defines a distribution rather than only a deterministic computation.
Differentiable systems architecture extends automatic differentiation beyond isolated functions and neural network layers. The central idea is to treat larger systems as...
Distributed gradient computation appears when a differentiable program no longer fits comfortably on one device or one machine. The reason may be model size, data volume,...
Automatic differentiation systems are usually trusted because they implement mathematically established rules such as the chain rule, product rule, and linearization of...
The preceding sections described automatic differentiation through algebraic, categorical, logical, and denotational models. These viewpoints converge on one central idea:
An automatic differentiation engine is only useful if its derivatives are correct. A small mistake in a backward rule can silently corrupt optimization, training, or...
The systems in this chapter show that automatic differentiation is not one implementation technique. It is a family of program transformations. Each system chooses a different...
A differentiable subprogram is a program fragment that can participate in derivative propagation as a coherent unit. Instead of differentiating an entire application...
Automatic differentiation can be understood as a transformation from one program into another program.
Many real-world Jacobians are sparse. Most derivative entries are zero because outputs depend only on small subsets of inputs.
Checkpointing is a technique for reducing the memory cost of reverse mode automatic differentiation by selectively storing intermediate states and recomputing missing values...
Perturbation confusion is a correctness bug that appears in nested automatic differentiation, especially nested forward mode. It happens when two derivative computations...
Programs do not only branch between valid computations. They also fail, stop early, raise exceptions, return sentinel values, or enter undefined numerical regions. These...
Most real computational problems are sparse. Large matrices and tensors often contain mostly zeros, structured blocks, or local interactions. Sparse representations reduce...
Swift became an important experiment in language-integrated automatic differentiation because it attempted to make differentiation a core compiler feature rather than a...
Meta-learning studies systems that improve how they learn. Instead of only optimizing model parameters for one task, a meta-learning method optimizes some part of the learning...
Robotics and control systems interact with the physical world through sensing, estimation, planning, and actuation. Automatic differentiation is important because modern...
A hybrid symbolic-numeric system combines discrete symbolic reasoning with continuous numerical computation. In the context of automatic differentiation, it means a pipeline...
Modern automatic differentiation systems are built around accelerator hardware. GPUs and TPUs provide enormous throughput for tensor operations, making large-scale...
Automatic differentiation began as a transformation applied to numerical programs. A differentiable programming language instead treats differentiation as a native semantic...
Operational semantics explains how automatic differentiation executes. Denotational semantics explains what differentiable programs mean.
Performance benchmarking measures whether an automatic differentiation engine is fast, memory-efficient, and scalable under realistic workloads. It also protects the engine...
Tinygrad is a small deep learning framework centered around a minimal reverse-mode automatic differentiation engine. It was created by George Hotz...
Differentiation describes how a function changes locally. A Taylor expansion extends this idea by approximating a function with a polynomial around a point.
Automatic differentiation became important because derivatives are required everywhere numerical models are optimized, controlled, calibrated, or analyzed. Once a system can...
Forward mode and reverse mode propagate different kinds of objects.
A pure computation is easier to differentiate because every output is determined only by its explicit inputs. There is no hidden state, no external mutation, and no dependence...
Automatic differentiation computes derivatives exactly with respect to the executed floating point program. This distinguishes AD from numerical differentiation, which...
Forward mode automatic differentiation computes Jacobian-vector products:
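A minimal sketch of this idea, assuming a made-up two-input, two-output toy function: each line carries a value together with the tangent seeded by the direction v, and the returned tangents form the Jacobian-vector product J(x)·v.

    import math

    def f_and_jvp(x, v):
        # Toy function f(x) = (sin(x0*x1), x0 + x1**2), evaluated while
        # propagating the tangent direction v alongside each intermediate.
        a, da = x[0] * x[1], v[0] * x[1] + x[0] * v[1]
        y1, dy1 = math.sin(a), math.cos(a) * da
        y2, dy2 = x[0] + x[1] ** 2, v[0] + 2 * x[1] * v[1]
        return (y1, y2), (dy1, dy2)

    value, jvp = f_and_jvp((1.0, 2.0), (1.0, 0.0))  # jvp is column 0 of the Jacobian at (1, 2)

One pass yields one directional derivative (one Jacobian column when v is a basis vector) at roughly the cost of the original function.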
Reverse mode automatic differentiation is computationally efficient for scalar-output functions, but it has a major systems cost: it needs information from the forward pass...
Automatic differentiation can be described operationally through dual numbers and computational graphs. It can also be described abstractly using category theory.
Higher-order derivatives contain rich geometric information, but naïve computation quickly becomes impractical.
A stateful system is a program whose output depends not only on its explicit inputs, but also on stored state. The state may live in variables, objects, arrays, files, random...
The singular value decomposition (SVD) is one of the most important matrix factorizations in numerical linear algebra. It appears in dimensionality reduction, least squares,...
Julia was designed for high-performance technical computing. It combines interactive syntax with a compiler capable of specializing code aggressively based on types. This...
An implicit layer defines its output as the solution of an equation, not as a fixed sequence of explicit operations. Instead of computing
Signal processing studies how information is represented, transformed, filtered, compressed, reconstructed, and estimated from signals. A signal may be a time series, an...
A differentiable operating system is an execution environment whose resource-management decisions can be optimized using gradients or gradient-like feedback. Instead of...
Automatic differentiation is usually described as a transformation of programs or computational graphs. In real systems, it is also a parallel execution problem. Large...
Quantum computation introduces a computational model fundamentally different from classical programs.
Automatic differentiation systems are trusted infrastructure. Scientific computing, machine learning, optimization, simulation, and control systems depend on gradients being...
A custom gradient gives the user direct control over the backward rule of an operation. The forward computation still produces an ordinary value, but the derivative no longer...
Enzyme is a compiler-based automatic differentiation system for LLVM and MLIR. Instead of differentiating source code directly, or recording tensor operations at runtime,...
Automatic differentiation developed from a simple observation: a numerical computation already contains the structure needed to compute its derivative. The program evaluates...
Linearization is the operation of replacing a nonlinear function by its best local linear approximation at a chosen point. Automatic differentiation can be understood as a...
Automatic differentiation operates on computations, but computations execute inside a memory model. Variables occupy storage locations, arrays are mutated, buffers are reused,...
Automatic differentiation is fundamentally a computational technique. Its practical importance comes from the fact that derivatives can often be computed with asymptotic cost...
So far, forward mode has propagated a single tangent direction:
A Wengert list is a linear representation of a computation in which every intermediate result is assigned to a unique variable. It is one of the earliest and most influential...
Dual numbers and hyper-dual numbers are special cases of a broader algebraic structure called a differential algebra. This framework abstracts differentiation away from...
Taylor mode automatic differentiation computes derivatives by propagating truncated Taylor series through a program.
A non-smooth program contains operations where the derivative is undefined, discontinuous, set-valued, or unstable under small perturbations. These programs arise naturally in...
Eigenvalue problems are fundamental in numerical analysis, optimization, physics, graph methods, control theory, and machine learning. They are also among the most subtle...
Attention is a sequence operation that lets each position read information from other positions. Instead of compressing the whole past into one recurrent hidden state,...
Computational finance uses numerical models to price contracts, measure risk, and optimize portfolios. Automatic differentiation is useful because most financial computations...
A differentiable compiler is a compilation system that supports gradient propagation through compilation decisions, generated programs, or execution behavior. Instead of...
Automatic differentiation systems are often assumed to be deterministic. Given identical inputs, identical parameters, and identical code, many users expect identical...
Classical automatic differentiation computes derivatives of deterministic programs.
Automatic differentiation transforms programs. A fundamental semantic question therefore arises:
An automatic differentiation engine becomes useful only after it supports a sufficiently rich set of primitive operations. The collection of these primitives is the operator...
Zygote is a source-to-source reverse-mode automatic differentiation system for the Julia programming language. It was designed to differentiate high-level Julia code directly,...
Derivative computation is not only a mathematical problem. It is also a numerical and systems problem. A derivative method must answer three questions simultaneously:
A computational graph represents a calculation as nodes and edges. Nodes represent operations or values. Edges represent data dependencies. Automatic differentiation uses this...
Loops express repeated computation. Recurrence relations express the same idea mathematically: each state is computed from one or more earlier states.
Mixed-mode differentiation combines forward accumulation and reverse accumulation in the same derivative computation. It is used when neither pure forward mode nor pure...
Forward mode automatic differentiation has a simple cost model. It evaluates the original program and, at the same time, evaluates the tangent program. Each primitive...
Most reverse mode automatic differentiation systems require a mechanism for recording the forward computation so that the reverse pass can later traverse it backward. This...
Dual numbers compute first derivatives exactly. Truncated polynomial algebras extend this to higher-order derivatives, but practical higher-order differentiation introduces an...
Nested automatic differentiation means applying automatic differentiation inside another automatic differentiation computation.
A piecewise differentiable function is built from several differentiable pieces joined by boundaries. Each piece has an ordinary derivative inside its region. At the...
Matrix factorizations rewrite a matrix into structured factors. They are used because the factors make later computations cheaper, more stable, or easier to interpret. In...
Python became the dominant language for modern machine learning and differentiable computing because it combines a simple programming model with access to high-performance...
Sequence models process ordered data. The input is not one independent vector, but a series:
Molecular simulation models the behavior of atoms and molecules using physical interaction laws. Automatic differentiation is important because many molecular methods require...
Differentiable search and retrieval systems integrate information access into gradient-based learning. Instead of treating retrieval as an external symbolic operation, the...
Gradient-based optimization relies on propagating derivative information through many layers, time steps, or computational transformations. In deep systems, these gradients...
Classical neural networks apply a finite sequence of transformations:
Automatic differentiation becomes substantially more difficult once programs contain higher-order functions.
Memory management is the main systems problem in reverse mode automatic differentiation. The derivative rules are usually small. The hard part is deciding which primal values,...
JAX is an automatic differentiation and array programming system for Python. It combines NumPy-like syntax with composable program transformations. Its core transformations...
Automatic differentiation computes derivatives by applying the chain rule to the operations of a program. The input is ordinary code that computes a value. The output is code,...
The chain rule is the central theorem behind automatic differentiation. Every useful AD algorithm is a disciplined way of applying the chain rule to a program.
Control flow determines which operations a program executes. Straight-line programs have a fixed sequence of operations, but ordinary programs contain branches, loops,...
Reverse accumulation is the reverse-mode form of automatic differentiation. It propagates derivative information backward from outputs to inputs.
The natural output of forward mode automatic differentiation is a Jacobian-vector product. Instead of constructing the full Jacobian matrix explicitly, forward mode computes...
Reverse accumulation is the operational core of reverse mode automatic differentiation. The forward pass evaluates a program and records dependency information. The reverse...
Dual numbers capture first-order derivatives because the infinitesimal element satisfies
Reverse mode is efficient for scalar-output functions because it propagates one adjoint backward through the computation and produces a full gradient. For
A dynamic graph is a computation graph built while the program runs. Its structure depends on ordinary runtime values: branches, loop counts, recursive calls, tensor shapes,...
Linear algebra primitives are tensor operations with algebraic structure: matrix multiplication, triangular solves, factorizations, inverses, determinants, norms, and spectral...
Neural network training is the repeated application of three operations: evaluate a model, differentiate a scalar loss, and update parameters. Automatic differentiation...
Computational fluid dynamics studies fluid motion by solving discretized forms of the governing equations. Automatic differentiation enters CFD when we want gradients of...
A differentiable physics engine computes gradients of physical simulation outputs with respect to inputs, parameters, or control signals. Instead of treating simulation as a...
Reverse-mode automatic differentiation trades computation for memory. To compute gradients efficiently, the backward pass requires access to intermediate values produced...
Many systems evolve continuously over time rather than through discrete layers. A state variable changes according to a differential equation:
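A minimal sketch of this setting, assuming a generic first-order equation dy/dt = f(t, y) and a simple explicit Euler discretization; every update is ordinary arithmetic, so derivative propagation can pass through the time loop.

    def euler_solve(f, y0, t0, t1, steps):
        # Explicit Euler sketch for dy/dt = f(t, y).
        y, t = y0, t0
        h = (t1 - t0) / steps
        for _ in range(steps):
            y = y + h * f(t, y)
            t = t + h
        return y

    y_final = euler_solve(lambda t, y: -y, 1.0, 0.0, 1.0, 100)  # approximates exp(-1)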
Cartesian differential categories model differentiation in categories with products. Differential categories generalize this idea further by shifting attention from cartesian...
A tape is an append-only record of the operations executed during the forward pass. Reverse mode uses the tape to replay derivative rules backward.
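A minimal sketch of a tape, assuming a toy scalar Var type that records only add, mul, and sin; the names and structure are illustrative, not any particular library's API.

    import math

    tape = []  # append-only: (output id, [(input id, local derivative)])

    class Var:
        _count = 0
        def __init__(self, value):
            self.value = value
            self.id = Var._count
            Var._count += 1
        def __add__(self, other):
            out = Var(self.value + other.value)
            tape.append((out.id, [(self.id, 1.0), (other.id, 1.0)]))
            return out
        def __mul__(self, other):
            out = Var(self.value * other.value)
            tape.append((out.id, [(self.id, other.value), (other.id, self.value)]))
            return out

    def sin(x):
        out = Var(math.sin(x.value))
        tape.append((out.id, [(x.id, math.cos(x.value))]))
        return out

    def grad(output, inputs):
        # Replay the tape backward, accumulating adjoints toward the inputs.
        adjoint = {output.id: 1.0}
        for out_id, parents in reversed(tape):
            for in_id, local in parents:
                adjoint[in_id] = adjoint.get(in_id, 0.0) + adjoint.get(out_id, 0.0) * local
        return [adjoint.get(x.id, 0.0) for x in inputs]

    x, y = Var(1.5), Var(0.5)
    z = sin(x * y) + x              # forward pass appends every primitive to the tape
    print(grad(z, [x, y]))          # [cos(0.75)*0.5 + 1.0, cos(0.75)*1.5]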
PyTorch Autograd is a dynamic reverse-mode automatic differentiation system. It records tensor operations as they execute, builds a computation graph at runtime, and then...
Symbolic differentiation computes derivatives by manipulating expressions. The input is a formula. The output is another formula.
The gradient is enough when a function has many inputs and one scalar output. More general programs need more general derivative objects. Two of the most important are the...
A dependency graph describes how values in a computation depend on earlier values. Automatic differentiation operates on these dependencies.
Forward accumulation is the forward-mode form of automatic differentiation. It propagates derivative information in the same order as ordinary program evaluation. Each...
Forward mode automatic differentiation works by replacing each primitive operation with an extended operation on pairs:
Reverse mode automatic differentiation fundamentally computes vector-Jacobian products. The gradient of a scalar function is a special case of this more general operation.
Dual numbers provide an algebraic mechanism for differentiation, but they also have a precise geometric meaning. A dual number represents a point together with an...
A Hessian-vector product computes
Recursion is control flow where a function calls itself. In automatic differentiation, recursion behaves like a loop with a call stack. Each recursive call contributes one...
Broadcasting is the rule system that allows tensor operations between arrays of different shapes without explicitly materializing expanded copies. It is one of the most...
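A minimal sketch of the rule in NumPy terms (NumPy used here only for illustration):

    import numpy as np

    a = np.arange(3.0).reshape(3, 1)   # shape (3, 1)
    b = np.arange(4.0)                 # shape (4,)
    c = a + b                          # result has shape (3, 4); no expanded copies are materialized

In an AD system, the backward rule of a broadcasted operation must sum adjoints over the broadcast dimensions so that each input receives a gradient of its original shape.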
Differentiable programming treats differentiation as a general programming-language feature. A program can contain numerical kernels, control flow, data structures, solvers,...
Backpropagation is reverse mode automatic differentiation applied to neural networks. In most machine learning writing, the term refers to the whole training procedure: run a...
An inverse problem asks for causes from effects. A forward model predicts observations from parameters. An inverse model tries to recover parameters from observations.
Differentiable rendering is the process of computing derivatives of rendered images with respect to scene parameters. A renderer becomes part of the computational graph rather...
Floating point systems represent numbers within a finite range. When a computed value exceeds the largest representable magnitude, overflow occurs. When a value becomes too...
An optimization layer is a program component whose output is the solution of an optimization problem. Instead of computing
Algebraic semantics describes differentiation through derivations, tangent maps, and linear structure. Categorical semantics goes further. It studies differentiation as a...
A graph representation makes the structure of a differentiated computation explicit. In reverse mode, this structure is required because the backward pass must know which...
TensorFlow Autograd refers to TensorFlow’s automatic differentiation system, mainly exposed through tf.GradientTape. It is a reverse-mode AD system designed for tensor...
Numerical differentiation estimates derivatives by evaluating a function at nearby input values. It treats the function as a black box. The method does not need access to the...
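A minimal sketch of the black-box approach, using a central difference with a hand-picked step size h:

    import math

    def central_difference(f, x, h=1e-5):
        # Only needs evaluations of f near x; no access to the program's structure.
        return (f(x + h) - f(x - h)) / (2 * h)

    approx = central_difference(math.sin, 1.0)   # close to cos(1.0), limited by truncation and rounding error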
Automatic differentiation is usually applied to functions with many inputs and many outputs. The calculus needed for this setting is multivariate calculus: the study of how a...
Intermediate variables are the named values created between program inputs and program outputs. They make automatic differentiation mechanical.
Automatic differentiation reduces differentiation to a finite collection of elementary operations. Every program, regardless of complexity, is decomposed into primitive...
Dual numbers give forward mode automatic differentiation a compact algebraic form. Instead of storing a value and a tangent as two unrelated fields, we package them into one...
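A minimal sketch of that packaging, assuming a toy Dual type with only add, mul, and exp defined:

    import math

    class Dual:
        # value and tangent packaged into one object
        def __init__(self, value, tangent=0.0):
            self.value, self.tangent = value, tangent
        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.value + other.value, self.tangent + other.tangent)
        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.value * other.value,
                        self.value * other.tangent + self.tangent * other.value)

    def exp(x):
        # Each primitive applies its local derivative rule to the tangent part.
        return Dual(math.exp(x.value), math.exp(x.value) * x.tangent)

    x = Dual(0.5, 1.0)     # seed dx/dx = 1
    y = exp(x * x)         # y.tangent equals d/dx exp(x**2) at 0.5, i.e. 2*0.5*exp(0.25)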
Reverse mode automatic differentiation operates on a computational graph. The forward pass evaluates the graph from inputs to outputs. The reverse pass traverses the same...
The defining feature of dual numbers is the existence of a nonzero element whose square vanishes:
For a scalar function
A loop repeats a computation until a condition fails or a fixed iteration count is reached. In automatic differentiation, loops are important because many numerical algorithms...
Tensor operations generalize scalar, vector, and matrix operations to arrays with arbitrary rank. In automatic differentiation, a tensor is usually treated as a typed array...
Functional programming languages provide a natural semantic foundation for automatic differentiation. Programs are expressed as compositions of functions, immutable values,...
Stochastic optimization studies optimization when the objective is accessed through samples, noisy estimates, or partial observations. In machine learning, this is the normal...
Sensitivity analysis studies how changes in inputs affect the outputs of a system. In differential equations, optimization, simulation, and machine learning, the main object...
A differentiable database is a data system whose operations participate in gradient-based optimization. Instead of treating storage and querying as external infrastructure,...
Reverse mode automatic differentiation computes gradients by propagating adjoint values backward through a computational graph. In exact arithmetic, the reverse accumulation...
A solver is a program that computes a value by search, iteration, or factorization. Instead of evaluating a closed-form expression, it finds a value that satisfies a condition.
Automatic differentiation is often introduced operationally. A program executes elementary operations, and derivative information propagates alongside the computation. This...
Reverse mode automatic differentiation computes derivatives by traversing the program backward after evaluation. Unlike forward mode, which propagates tangents alongside...
Tapenade is a source-transformation automatic differentiation system developed at INRIA. Like ADIFOR, it takes an existing program and produces a new differentiated program....
A derivative measures how an output changes when an input changes. That sentence is simple, but it is one of the main ideas behind numerical computing, optimization, machine...
Automatic differentiation begins with a simple object: a function.
A straight-line program is the simplest model of computation used in automatic differentiation. It is a program with a fixed sequence of assignments, no branches, no loops,...
Automatic differentiation is built on a simple observation: a complicated derivative can be computed by composing many small local derivatives. Instead of manipulating a full...
Forward mode automatic differentiation computes derivatives by carrying two values through a program at the same time: the ordinary value and its tangent. The ordinary value...
Reverse mode automatic differentiation computes derivatives by propagating sensitivities backward through a computation. In forward mode, each intermediate value carries a...
Dual numbers give the cleanest algebraic model of forward mode automatic differentiation. They extend ordinary real numbers with a formal infinitesimal part. Instead of...
First derivatives describe local rate of change. Second derivatives describe how that rate of change itself changes. In optimization, this is curvature. In dynamics, it is...
A conditional is a program construct that chooses one computation among several possible computations. In ordinary code, this is written as if, else, switch, case, pattern...
Matrix calculus is the notation and rule system used to differentiate functions whose inputs, outputs, or intermediate values are vectors, matrices, or tensors. Automatic...
Gradient descent is the basic optimization procedure behind much of modern machine learning. It is simple enough to state in one line, but rich enough to expose many of the...
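The one-line statement, as a minimal sketch with a hypothetical gradient function grad_f:

    def gradient_descent(grad_f, x0, learning_rate=0.1, steps=100):
        x = x0
        for _ in range(steps):
            x = x - learning_rate * grad_f(x)   # the whole method: step against the gradient
        return x

    minimum = gradient_descent(lambda x: 2 * (x - 3.0), x0=0.0)   # minimizes (x - 3)**2, converges toward 3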
Differential equations are one of the main reasons automatic differentiation matters in scientific computing. Many scientific models are not written as closed-form functions....
An end-to-end differentiable pipeline is a system whose final objective can send derivative information backward through every trainable or tunable stage of computation....
Automatic differentiation computes derivatives by executing arithmetic. On a real machine, arithmetic uses finite precision. This means AD gives the derivative of the...
Many programs do not compute their output by applying a fixed sequence of explicit operations. Instead, they define the output as the solution of another problem.
Automatic differentiation is often described by a simple rule:
A minimal forward mode automatic differentiation engine has one job: evaluate a program while carrying both a value and its derivative. The engine does not build a graph. It...
ADIFOR, short for Automatic Differentiation of Fortran, is one of the classical source-transformation systems for automatic differentiation. It was designed for numerical...
Sparse and structured differentiation studies how to compute derivatives without materializing dense derivative objects. Many real systems have enormous Jacobians and...
Auto Diff book notes exported from ChatGPT, organized into 22 chapters.
Lisp is one of the natural homes of automatic differentiation. It treats programs as data, has a simple expression syntax, and supports macro systems that can transform code...
Source transformation is an implementation strategy for automatic differentiation in which a program that computes a function is rewritten into another program that computes...
Automatic differentiation can be performed before a program runs, while it runs, or in a staged phase between the two.
Kernel fusion combines several small operations into one larger executable unit.
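A minimal sketch of the idea in plain Python, where the "kernels" are just loops over a list:

    xs = [0.1 * i for i in range(1000)]

    # Unfused: two passes over the data and one temporary buffer.
    tmp = [x * x for x in xs]
    out = [t + 1.0 for t in tmp]

    # Fused: one pass and no temporary; this is the shape a fusing compiler aims to produce.
    out_fused = [x * x + 1.0 for x in xs]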
Memory planning determines where values are stored, how long they remain alive, and when storage can be reused.
Staging is the separation of a program into phases.
Tracing is an implementation strategy where an AD system observes a program while it runs and records the operations that occur.
Rust is an attractive language for automatic differentiation because it combines low-level performance with strong static guarantees. It gives the programmer control over...
A graph intermediate representation models a program as nodes and edges.
Static single assignment form, or SSA, is an intermediate representation where each variable is assigned exactly once.
C and C++ are important targets for automatic differentiation because much scientific, engineering, graphics, finance, and machine learning infrastructure is written in these...
An intermediate representation, or IR, is the internal program form used by a compiler or AD system after parsing and before final code generation.
Operator overloading implements automatic differentiation by changing the meaning of ordinary arithmetic operations for special numeric objects.