Euler Products
Euler products are one of the central ideas of analytic number theory. They express infinite sums over integers as infinite products over primes.
476 notes
An arithmetic function $f(n)$ can be encoded into an infinite series of the form
Arithmetic functions often fluctuate strongly from one integer to the next.
Many arithmetic functions are defined through sums over divisors. For example,
Arithmetic functions can be added and multiplied pointwise, but number theory has another product that is better adapted to divisibility.
An arithmetic function is a function defined on the positive integers. Such a function
One of the deepest ideas in algebraic number theory is that prime numbers possess hidden symmetry inside field extensions.
Sieve methods are extremely effective for estimating how many integers avoid small prime factors. They have produced major results about:
A set is a collection of objects called elements.
The Liouville function is an arithmetic function denoted by
Let
The Twin Prime Conjecture states that infinitely many primes satisfy
Euler's totient function is an arithmetic function denoted by
In the ordinary integers, every nonzero integer factors uniquely into prime numbers.
The Prime Number Theorem for arithmetic progressions states that for
The Langlands program is one of the largest and most influential research programs in modern mathematics.
The Möbius function is an arithmetic function denoted by
Classical number theory studies arithmetic globally over fields such as
One of the central problems of analytic number theory is understanding how primes distribute among residue classes.
An elliptic curve over $\mathbb{Q}$ may be written in Weierstrass form
Divisor functions measure the positive divisors of an integer. They are among the first examples of arithmetic functions, because their values depend directly on the prime...
The real numbers arise by completing the rational numbers using the ordinary absolute value. The $p$-adic numbers arise by completing the rational numbers using the $p$-adic...
Classical sieve methods estimate how many integers survive congruence restrictions. The large sieve approaches these problems from a different direction.
The Riemann zeta function is one of the central objects in mathematics.
Modular arithmetic is not only a theoretical language for divisibility. It is also one of the main tools of computation with integers.
In ordinary analysis, the absolute value
Brun's sieve introduced the idea of estimating sifted sets through truncated inclusion-exclusion. However, Brun's method often produced bounds that were technically difficult...
Fermat's Last Theorem states that there are no positive integers
Modular arithmetic often requires computing powers such as
Ordinary integers satisfy several remarkable properties simultaneously:
Sieve methods are techniques for counting integers that remain after removing residue classes modulo primes.
The classical Langlands program relates:
Number theory contains some of the oldest and deepest unsolved problems in mathematics.
The Chinese remainder theorem describes when several congruence conditions can be combined into one congruence. Its cleanest form occurs when the moduli are pairwise coprime.
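A minimal sketch of this combination step (the helper name `crt` and the incremental-merge strategy are illustrative choices; `pow(M, -1, m)` for modular inverses assumes Python 3.8+):

```python
from math import gcd

def crt(residues, moduli):
    """Return x with x ≡ r (mod m) for each pair, 0 <= x < prod(moduli)."""
    x, M = 0, 1
    for r, m in zip(residues, moduli):
        assert gcd(M, m) == 1, "moduli must be pairwise coprime"
        # Solve x + M*t ≡ r (mod m) for t using the inverse of M mod m.
        t = ((r - x) * pow(M, -1, m)) % m
        x += M * t
        M *= m
    return x

# x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7)  ->  x = 23
print(crt([2, 3, 2], [3, 5, 7]))
```

Merging one congruence at a time keeps each step a single linear congruence, which is why pairwise coprimality of the moduli is exactly what the algorithm needs.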
Let $R$ be a commutative ring. An ideal $I\subseteq R$ is called principal if there exists an element $\alpha\in R$ such that
In additive number theory, ordinary asymptotic density is often too weak to control additive behavior.
Modular curves parameterize elliptic curves and connect modular forms with arithmetic geometry.
Arithmetic statistics studies the distribution of arithmetic objects inside large families.
A system of congruences asks for an integer satisfying several congruence conditions simultaneously.
The discriminant is one of the most important invariants of a number field. It measures how the arithmetic of the field differs from ordinary rational arithmetic.
A central question in additive number theory asks whether every integer can be represented as a sum of elements from a fixed set.
Fourier analysis decomposes functions into harmonic frequencies.
Prime numbers are deterministic objects, but many aspects of their distribution resemble random behavior.
In ordinary arithmetic, division by a nonzero number means multiplication by its reciprocal. Modular arithmetic is more delicate. A residue class may or may not have a...
Let $K$ be a number field and let
Exponential sums are among the central tools of analytic number theory.
The Riemann zeta function
The Riemann zeta function is defined for $\operatorname{Re}s>1$ by
A linear congruence is a congruence of the form
In ordinary integers, every ideal is generated by a single element:
Many problems in additive number theory ask whether an integer can be represented in the form
The Langlands program predicts that many different arithmetic objects are connected by systematic transfers.
A probabilistic algorithm uses random choices during its execution. In number theory, this is often a practical advantage rather than a weakness.
Arithmetic modulo $n$ is arithmetic performed on residue classes modulo $n$. Instead of distinguishing all integers separately, we identify integers that have the same...
In ordinary integers, every number factors uniquely into primes. In many rings of algebraic integers, this property fails.
Waring's problem asks whether every sufficiently large positive integer can be written as a sum of a bounded number of fixed powers.
Galois groups encode the symmetries of algebraic equations and field extensions.
A primality test determines whether an integer is prime.
Congruence modulo $n$ groups integers according to their remainders after division by $n$. If two integers have the same remainder, they are congruent modulo $n$.
One of the central properties of the ordinary integers is unique factorization.
Goldbach-type problems ask whether integers can be represented as sums of primes. They are among the oldest and most famous problems in additive number theory.
The Langlands program is one of the most ambitious and influential theories in modern mathematics.
A positive integer is called $y$-smooth if all of its prime factors are at most $y$.
Ordinary equality compares integers exactly. In many arithmetic problems, however, only the remainder after division matters.
Let $K$ be a number field of degree
Additive number theory studies arithmetic structure through addition of integers and subsets of integers.
Classical modular form theory begins with analytic functions satisfying symmetry conditions.
Number theory often studies exact statements about individual integers. For example, one may ask whether a given integer is prime, squarefree, smooth, or representable as a...
The infinitude of primes guarantees that primes continue indefinitely, but it says nothing about how frequently primes occur.
In ordinary arithmetic, the integers
The classical Riemann Hypothesis concerns the zeros of the Riemann zeta function
Modular forms are functions on the upper half-plane satisfying symmetry conditions under the modular group
A zero-knowledge proof allows one party to convince another that a statement is true without revealing why it is true.
Euclid proved that there are infinitely many primes by contradiction. Euler discovered a very different proof based on infinite series and products.
A number field is a finite extension of the rational numbers. Concretely, it is a field $K$ satisfying
A central theme in analytic number theory is determining when an $L$-function is nonzero at a particular point.
For centuries, elliptic curves and modular forms were studied as separate objects.
Modern public-key cryptography relies heavily on two computational assumptions:
Euclid's proof of the infinitude of primes is one of the earliest examples of a general argument in number theory. It does not depend on computation, experimentation, or...
An algebraic number is a complex number that satisfies some nonzero polynomial equation with rational coefficients. Thus $\alpha\in\mathbb{C}$ is algebraic if there exists a...
An arithmetic progression is a sequence of the form
An elliptic curve is simultaneously:
Lattice cryptography is a family of cryptographic systems based on the presumed hardness of computational problems on high-dimensional lattices.
Prime numbers are the building blocks of the positive integers. Once unique prime factorization is known, a natural question arises: are there only finitely many primes, or do...
The ordinary integers
The Riemann zeta function
The modular group acts on the upper half-plane by fractional linear transformations:
Pairing-based cryptography uses special maps defined on elliptic curve groups. A pairing is a function
An arithmetic function is a function whose domain is the positive integers. It assigns a value to each integer
Diophantine approximation studies how closely real numbers can be approximated by rational numbers.
Dirichlet characters behave analogously to exponential functions in Fourier analysis. Just as complex exponentials separate frequencies, characters separate residue classes...
Modular forms already possess symmetry under the modular group. Yet a deeper arithmetic structure emerges through another family of operators: the Hecke operators.
Elliptic curve cryptography is a public-key cryptographic framework based on the arithmetic of elliptic curves over finite fields.
Unique prime factorization says that every integer $n>1$ can be written as a product of primes. The canonical prime decomposition is the ordered and exponentiated version of...
Recall that a Pell equation has the form
The Riemann zeta function studies prime numbers globally, without distinguishing congruence classes. However, many arithmetic questions concern primes satisfying conditions such as
Modular forms satisfy strong symmetry conditions under the modular group. Among them, cusp forms form the deepest and most arithmetic subclass.
Secure communication requires two parties to share secret information. In classical symmetric cryptography, both parties must already possess the same secret key before...
The fundamental theorem of arithmetic states that every integer $n>1$ can be written as a product of prime numbers, and that this product is unique up to the order of the factors.
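A trial-division sketch that produces this prime decomposition (the function name is illustrative; this is fine for small inputs, not a serious factorization method):

```python
def prime_factorization(n):
    """Trial division: return the prime factors of n > 1 in nondecreasing order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains has no divisor <= sqrt(n), so it is prime
    return factors

print(prime_factorization(360))  # [2, 2, 2, 3, 3, 5]
```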
The convergents of a continued fraction are the rational numbers obtained by truncating the expansion at finite stages.
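A short sketch computing convergents via the standard recurrence $p_k = a_k p_{k-1} + p_{k-2}$, $q_k = a_k q_{k-1} + q_{k-2}$ (function name illustrative):

```python
from fractions import Fraction

def convergents(coeffs):
    """Convergents p_k/q_k of the continued fraction [a0; a1, a2, ...]."""
    p0, q0, p1, q1 = 1, 0, coeffs[0], 1
    out = [Fraction(p1, q1)]
    for a in coeffs[1:]:
        p0, q0, p1, q1 = p1, q1, a * p1 + p0, a * q1 + q0
        out.append(Fraction(p1, q1))
    return out

# [1; 2, 2, 2, ...] gives the convergents of sqrt(2):
# 1, 3/2, 7/5, 17/12, 41/29, ...
print(convergents([1, 2, 2, 2, 2]))
```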
The Riemann zeta function was introduced through the series
Among all modular forms, Eisenstein series are the most explicit and computationally accessible.
Classical cryptography uses a shared secret key. Both sender and receiver must know the same secret information in advance.
Two integers $a$ and $b$, not both zero, are called coprime if their greatest common divisor is $1$:
Many important numbers are irrational:
One of the deepest ideas in analytic number theory is that the zeros of the zeta function determine the distribution of prime numbers.
Modular forms are among the central objects of modern number theory.
Modern number theory relies heavily on computation. Two broad computational paradigms dominate the subject:
Let $a$ and $b$ be integers. An integer of the form
Finite continued fractions correspond exactly to rational numbers. When the Euclidean algorithm never terminates, the continued fraction becomes infinite.
The Riemann zeta function has nontrivial zeros inside the critical strip
The modular group acts on the upper half-plane by fractional linear transformations:
Elliptic curves occupy a central position in modern number theory, arithmetic geometry, and cryptography.
The Euclidean algorithm computes the greatest common divisor of two integers. The extended Euclidean algorithm does more. It also expresses the gcd as an integer linear...
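A sketch of the extended algorithm, returning Bézout coefficients alongside the gcd (the iterative formulation shown here is one common choice):

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Each remainder stays an integer combination of a and b;
        # the x- and y-sequences track the coefficients.
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

g, x, y = extended_gcd(240, 46)
print(g, x, y)  # g = 2, with 240*x + 46*y == g
```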
A finite continued fraction is an expression of the form
The zeros of the Riemann zeta function are the complex numbers $s$ satisfying
Modular forms begin with the action of certain matrix groups on the complex upper half-plane.
Modular forms are highly structured analytic functions with deep arithmetic properties. Although their definitions involve complex analysis and group actions, modular forms...
Write the two integers as
The greatest common divisor of two integers can be found by listing divisors, but this method becomes inefficient for large numbers. For example, finding
The Euclidean algorithm is one of the oldest and most important algorithms in mathematics. It computes the greatest common divisor of two integers using repeated division.
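The repeated-division loop can be sketched in a few lines (illustrative; Python's `math.gcd` provides the same functionality):

```python
def gcd(a, b):
    """Euclidean algorithm: repeatedly replace (a, b) by (b, a mod b)."""
    while b:
        a, b = b, a % b
    return abs(a)

# 252 = 2*105 + 42, 105 = 2*42 + 21, 42 = 2*21 + 0, so gcd = 21
print(gcd(252, 105))  # 21
```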
The defining series of the zeta function,
One of the central goals of algebraic number theory is to classify field extensions of a number field
A lattice is a discrete additive subgroup of Euclidean space. More concretely, let
Study empirical properties of prime numbers through computation.
Let $a$ and $b$ be nonzero integers. An integer $m$ is called a common multiple of $a$ and $b$ if
Quadratic residue theory is not only a theoretical subject. It also plays a major role in computational number theory, cryptography, primality testing, and algorithm design.
The defining series of the Riemann zeta function is
Global class field theory studies finite abelian extensions of number fields such as
Integer factorization asks for the prime decomposition of a positive integer. Given
1. Prove that the sum of two even integers is even.
Let $a$ and $b$ be integers, not both zero. An integer $d$ is called a common divisor of $a$ and $b$ if
Quadratic reciprocity describes when one prime is a square modulo another prime. A natural question is whether similar laws exist for higher powers.
The defining series of the Riemann zeta function is
One of the central discoveries of algebraic number theory is that unique factorization may fail in rings of algebraic integers.
A prime number is an integer greater than $1$ whose only positive divisors are
The division algorithm is one of the basic structural facts about the integers. It says that any integer can be divided by a positive integer with a unique quotient and remainder.
Gauss sums arise from combining multiplicative and additive structures modulo a prime. They form one of the fundamental tools of analytic and algebraic number theory.
One of the central objects of analytic number theory is the Riemann zeta function. It connects infinite series, prime numbers, complex analysis, and arithmetic structure into...
One of the oldest themes in number theory is reciprocity: the phenomenon that solvability conditions for one prime are controlled by arithmetic involving another prime.
Modern computational number theory depends fundamentally on efficient arithmetic with large integers.
A positive integer $n>1$ is called composite if it is not prime.
The theory of quadratic residues asks a fundamental question:
A pair of primes
A central goal of algebraic number theory is to understand field extensions of a given base field, especially extensions of the rational numbers
One of the central ideas of modern analysis is that functions may be decomposed spectrally into elementary pieces.
Prime numbers are the fundamental building blocks of arithmetic.
Euler's criterion gives an efficient way to decide whether an integer is a square modulo an odd prime. Let $p$ be an odd prime and let $a$ be an integer not divisible by $p$....
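Because modular exponentiation is fast, the criterion is directly computable; a sketch (function name illustrative):

```python
def is_quadratic_residue(a, p):
    """Euler's criterion: a^((p-1)/2) ≡ 1 (mod p) iff a is a square mod p
    (p an odd prime, p does not divide a)."""
    return pow(a, (p - 1) // 2, p) == 1

p = 11
squares = sorted({x * x % p for x in range(1, p)})  # nonzero squares mod 11
print(squares)                                       # [1, 3, 4, 5, 9]
print([a for a in range(1, p) if is_quadratic_residue(a, p)])  # same list
```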
Let
The rational numbers may be studied through their completions:
Functoriality is the unifying mechanism of the Langlands program. It predicts systematic relationships between automorphic representations attached to different algebraic groups.
Division of integers does not always produce an integer. For example,
The Legendre symbol
The Prime Number Theorem describes the average distribution of primes up to a large number $x$:
A central problem in number theory is determining whether an equation possesses rational or integral solutions.
The Langlands program is a broad collection of conjectures connecting number theory, representation theory, harmonic analysis, and algebraic geometry. Its central idea is that...
A group $G$ is abelian if
The idea of number arose long before formal mathematics. Early civilizations used numbers for counting objects, measuring land, recording trade, and tracking time.
Let $p$ be an odd prime and let $a\in\mathbb{Z}$. The Legendre symbol is defined by
The Prime Number Theorem states that
One of the central ideas of number theory is that congruences modulo powers of a prime often approximate genuine arithmetic solutions.
Number theory studies arithmetic simultaneously at two levels:
Number theory is one of the oldest parts of mathematics, but modern number theory is not a single ancient subject carried forward unchanged. It is a layered discipline....
The integers extend infinitely in both directions:
A quadratic congruence is a congruence involving a square. The basic form is
The logarithmic integral is the function
The rational numbers form a field rich enough for arithmetic, yet insufficient for many limiting processes.
Classically, number theory studied special analytic functions such as modular forms. These functions satisfy strong symmetry conditions under actions of arithmetic groups.
Computation has become an essential part of number theory. Classical arithmetic relied mainly on symbolic reasoning and hand calculations. Modern arithmetic combines rigorous...
Many mathematical objects are defined recursively. A recursive definition specifies:
A Diophantine equation is first an arithmetic object. It asks for solutions in integers or rational numbers. But every polynomial equation also defines a geometric object.
The Prime Number Theorem describes the asymptotic distribution of prime numbers. It states that
Category theory studies mathematical structures through objects and maps between them. Instead of looking only at what objects are made of, it studies how they relate to other...
The real numbers arise by completing the rational numbers with respect to the ordinary absolute value. This completion produces a field suited to Euclidean geometry and...
Representation theory studies abstract algebraic objects by expressing them as linear transformations of vector spaces.
Ordinary induction proves a statement $P(n)$ by showing that truth passes from one case to the next:
A central problem in number theory is to study solutions of polynomial equations whose coordinates belong to a specified number system. Two important cases are:
The prime counting function
A vector space over a field $F$ is a set $V$ equipped with addition and scalar multiplication satisfying the usual algebraic rules.
The ordinary absolute value on the real numbers measures magnitude:
One of the central problems in arithmetic geometry is understanding the number of solutions of polynomial equations over finite fields.
Many statements in number theory concern all natural numbers. For example, one may wish to prove that
An exponential Diophantine equation is a Diophantine equation in which one or more unknowns appear as exponents. Typical examples include
One of the oldest questions in number theory asks how prime numbers are distributed among the positive integers. Since primes become less frequent as numbers grow larger,...
Measure theory extends the ideas of length, area, volume, and integration to more general settings. In number theory, measure appears in probability, harmonic analysis,...
One of the central ideas of algebraic number theory is that prime numbers may behave differently after passing to a larger field.
Classical topology studies geometric spaces using invariants such as homology and cohomology. Over the complex numbers, algebraic varieties can often be viewed as topological...
The order relation distinguishes positive and negative integers, but in many situations the sign of a number is less important than its magnitude. For example, the integers
A Catalan-type equation is a Diophantine equation involving powers whose values differ by a small amount. The classical example is
In analytic number theory, one often studies sums of the form
Topology studies continuity, convergence, connectedness, and geometric structure in an abstract setting. In number theory, topology appears naturally in real analysis, complex...
One of the most important classes of number fields arises from the solutions of the equation
Arithmetic geometry often studies families of algebraic curves varying over arithmetic bases. The most important base is
The integers are not merely a collection of numbers equipped with arithmetic operations. They also possess an order structure. Given two integers $a$ and $b$, one can...
One of the oldest questions in number theory asks which integers can be written as sums of squares. Typical examples are
Analytic number theory studies infinite sums, products, and integrals. Before such expressions can be manipulated safely, one must understand the meaning of convergence.
The real numbers $\mathbb{R}$ extend the rational numbers $\mathbb{Q}$ by filling gaps such as
The familiar fields
An algebraic curve is a geometric object whose dimension is one. Curves are among the oldest and most important objects in number theory and algebraic geometry.
An arithmetic operation is a rule that combines numbers to produce another number. The most basic operations on integers are addition, subtraction, multiplication, and division.
A Pell equation is a Diophantine equation of the form
Euler products arise when an infinite series has coefficients controlled by multiplication. The simplest and most important example is the zeta series
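The series-product identity can be checked numerically at $s = 2$, where both sides approach $\pi^2/6$ (a rough illustration; the truncation cutoffs below are arbitrary):

```python
def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, ok in enumerate(sieve) if ok]

s = 2
series = sum(1 / n ** s for n in range(1, 100001))   # truncated zeta series
product = 1.0
for p in primes_up_to(1000):                          # truncated Euler product
    product *= 1 / (1 - p ** -s)

print(series, product)  # both close to pi^2/6 ≈ 1.6449
```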
Abstract algebra studies sets equipped with operations. In number theory, these structures organize arithmetic behavior.
A polynomial equation may possess several roots related by hidden algebraic symmetries. Consider
Geometry is not only concerned with spaces themselves, but also with maps between spaces. In algebraic geometry and arithmetic geometry, these maps are called morphisms.
The natural numbers are sufficient for counting and addition, but they are not sufficient for subtraction. For example,
A mathematical proof is a logically complete argument establishing the truth of a statement from accepted assumptions, definitions, and previously proved results.
A Pythagorean triple is a triple of positive integers
An infinite product has the form
A central problem in algebra is to determine where a polynomial factors completely into linear terms. Consider the polynomial
Classical algebraic geometry studies varieties defined by polynomial equations. This theory works well over algebraically closed fields, especially over $\mathbb{C}$. However,...
The natural numbers arise from the basic act of counting. When we count objects in a collection, we assign successive numbers:
A Diophantine equation is an equation whose solutions are required to be integers. The unknowns are not allowed to range over the real numbers or complex numbers unless...
The harmonic series is the infinite series
A field is a number system in which addition, subtraction, multiplication, and division by nonzero elements are always possible. The rational numbers $\mathbb{Q}$, the real...
Arithmetic geometry studies solutions of polynomial equations by combining algebra, geometry, and number theory. Its basic objects are spaces defined by polynomial equations....
A set is a collection of objects, called its elements. If $x$ is an element of a set $A$, we write $x \in A$. If $x$ is not an element of $A$, we write $x \notin A$.
Modern Number Theory book notes exported from ChatGPT, organized into 5 chapters.
Modern number theory continues to evolve rapidly.
| Period | Development |
| Definition | Location |
| Theorem | Location |
| Symbol | Meaning |
Forward mode automatic differentiation computes derivatives by propagating tangent values alongside ordinary values. The ordinary value is called the primal. The derivative...
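A minimal dual-number sketch of primal/tangent propagation (the class is reduced to `+` and `*` for brevity; a real system would cover the full operator set):

```python
class Dual:
    """A value carrying a primal and a tangent, propagated together."""
    def __init__(self, primal, tangent=0.0):
        self.primal, self.tangent = primal, tangent

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

    __rmul__ = __mul__

def f(x):
    return x * x * x + 2 * x  # f(x) = x^3 + 2x, so f'(x) = 3x^2 + 2

y = f(Dual(3.0, 1.0))        # seed the tangent with 1.0 to get df/dx
print(y.primal, y.tangent)   # 33.0 29.0
```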
This section studies reverse mode automatic differentiation through concrete examples. Each case has the same structure:
Automatic differentiation is easiest to define for pure functions. A pure function behaves like a mathematical mapping: it consumes inputs, produces outputs, and has no...
Physics-informed models combine data fitting with equations from physics or applied mathematics. The model is trained not only to match observed samples, but also to satisfy...
Automatic differentiation began as a numerical technique for computing gradients of scalar functions.
A minimal automatic differentiation engine can compute correct gradients on small programs. A production system must survive long-running workloads, large tensors, distributed...
Automatic differentiation works naturally on pure mathematical functions:
Automatic differentiation is a method for computing derivatives by transforming programs into derivative-propagating computations. It does not approximate derivatives...
Forward mode automatic differentiation appears in many numerical systems where directional derivatives, local sensitivities, or small parameter sets are important. This...
Reverse mode automatic differentiation is the mathematical and systems basis of backpropagation. In deep learning, the objective is usually a scalar loss depending on many...
Automatic differentiation is deeply connected to functional programming and lambda calculus. Programs can be viewed as mathematical functions, and differentiation can be...
Higher-order automatic differentiation faces a fundamental problem: derivative structure grows combinatorially with order.
Modern automatic differentiation systems are fundamentally tensor compiler systems. Their performance depends less on mathematical differentiation rules than on how...
Automatic differentiation interacts deeply with type systems because differentiation changes the structure of computation. A derivative operator maps one function into another...
Reinforcement learning studies learning systems that act in an environment. Unlike supervised learning, the training signal is not a target label for each input. The model...
Probabilistic programming represents uncertainty using executable probabilistic models. A probabilistic program defines a distribution rather than only a deterministic computation.
Differentiable systems architecture extends automatic differentiation beyond isolated functions and neural network layers. The central idea is to treat larger systems as...
Distributed gradient computation appears when a differentiable program no longer fits comfortably on one device or one machine. The reason may be model size, data volume,...
Automatic differentiation systems are usually trusted because they implement mathematically established rules such as the chain rule, product rule, and linearization of...
The preceding sections described automatic differentiation through algebraic, categorical, logical, and denotational models. These viewpoints converge on one central idea:
An automatic differentiation engine is only useful if its derivatives are correct. A small mistake in a backward rule can silently corrupt optimization, training, or...
The systems in this chapter show that automatic differentiation is not one implementation technique. It is a family of program transformations. Each system chooses a different...
A differentiable subprogram is a program fragment that can participate in derivative propagation as a coherent unit. Instead of differentiating an entire application...
Automatic differentiation can be understood as a transformation from one program into another program.
Many real-world Jacobians are sparse. Most derivative entries are zero because outputs depend only on small subsets of inputs.
Checkpointing is a technique for reducing the memory cost of reverse mode automatic differentiation by selectively storing intermediate states and recomputing missing values...
Perturbation confusion is a correctness bug that appears in nested automatic differentiation, especially nested forward mode. It happens when two derivative computations...
Programs do not only branch between valid computations. They also fail, stop early, raise exceptions, return sentinel values, or enter undefined numerical regions. These...
Most real computational problems are sparse. Large matrices and tensors often contain mostly zeros, structured blocks, or local interactions. Sparse representations reduce...
Swift became an important experiment in language-integrated automatic differentiation because it attempted to make differentiation a core compiler feature rather than a...
Meta-learning studies systems that improve how they learn. Instead of only optimizing model parameters for one task, a meta-learning method optimizes some part of the learning...
Robotics and control systems interact with the physical world through sensing, estimation, planning, and actuation. Automatic differentiation is important because modern...
A hybrid symbolic-numeric system combines discrete symbolic reasoning with continuous numerical computation. In the context of automatic differentiation, it means a pipeline...
Modern automatic differentiation systems are built around accelerator hardware. GPUs and TPUs provide enormous throughput for tensor operations, making large-scale...
Automatic differentiation began as a transformation applied to numerical programs. A differentiable programming language instead treats differentiation as a native semantic...
Operational semantics explains how automatic differentiation executes. Denotational semantics explains what differentiable programs mean.
Performance benchmarking measures whether an automatic differentiation engine is fast, memory-efficient, and scalable under realistic workloads. It also protects the engine...
Tinygrad is a small deep learning framework centered around a minimal reverse-mode automatic differentiation engine. It was created by George Hotz...
Differentiation describes how a function changes locally. A Taylor expansion extends this idea by approximating a function with a polynomial around a point.
Automatic differentiation became important because derivatives are required everywhere numerical models are optimized, controlled, calibrated, or analyzed. Once a system can...
Forward mode and reverse mode propagate different kinds of objects.
A pure computation is easier to differentiate because every output is determined only by its explicit inputs. There is no hidden state, no external mutation, and no dependence...
Automatic differentiation computes derivatives exactly with respect to the executed floating point program. This distinguishes AD from numerical differentiation, which...
Forward mode automatic differentiation computes Jacobian-vector products:
Reverse mode automatic differentiation is computationally efficient for scalar-output functions, but it has a major systems cost: it needs information from the forward pass...
Automatic differentiation can be described operationally through dual numbers and computational graphs. It can also be described abstractly using category theory.
Higher-order derivatives contain rich geometric information, but naïve computation quickly becomes impractical.
A stateful system is a program whose output depends not only on its explicit inputs, but also on stored state. The state may live in variables, objects, arrays, files, random...
The singular value decomposition (SVD) is one of the most important matrix factorizations in numerical linear algebra. It appears in dimensionality reduction, least squares,...
Julia was designed for high-performance technical computing. It combines interactive syntax with a compiler capable of specializing code aggressively based on types. This...
An implicit layer defines its output as the solution of an equation, not as a fixed sequence of explicit operations. Instead of computing
Signal processing studies how information is represented, transformed, filtered, compressed, reconstructed, and estimated from signals. A signal may be a time series, an...
A differentiable operating system is an execution environment whose resource-management decisions can be optimized using gradients or gradient-like feedback. Instead of...
Automatic differentiation is usually described as a transformation of programs or computational graphs. In real systems, it is also a parallel execution problem. Large...
Quantum computation introduces a computational model fundamentally different from classical programs.
Automatic differentiation systems are trusted infrastructure. Scientific computing, machine learning, optimization, simulation, and control systems depend on gradients being...
A custom gradient gives the user direct control over the backward rule of an operation. The forward computation still produces an ordinary value, but the derivative no longer...
Enzyme is a compiler-based automatic differentiation system for LLVM and MLIR. Instead of differentiating source code directly, or recording tensor operations at runtime,...
Automatic differentiation developed from a simple observation: a numerical computation already contains the structure needed to compute its derivative. The program evaluates...
Linearization is the operation of replacing a nonlinear function by its best local linear approximation at a chosen point. Automatic differentiation can be understood as a...
Automatic differentiation operates on computations, but computations execute inside a memory model. Variables occupy storage locations, arrays are mutated, buffers are reused,...
Automatic differentiation is fundamentally a computational technique. Its practical importance comes from the fact that derivatives can often be computed with asymptotic cost...
So far, forward mode has propagated a single tangent direction:
A Wengert list is a linear representation of a computation in which every intermediate result is assigned to a unique variable. It is one of the earliest and most influential...
Dual numbers and hyper-dual numbers are special cases of a broader algebraic structure called a differential algebra. This framework abstracts differentiation away from...
Taylor mode automatic differentiation computes derivatives by propagating truncated Taylor series through a program.
A non-smooth program contains operations where the derivative is undefined, discontinuous, set-valued, or unstable under small perturbations. These programs arise naturally in...
Eigenvalue problems are fundamental in numerical analysis, optimization, physics, graph methods, control theory, and machine learning. They are also among the most subtle...
Attention is a sequence operation that lets each position read information from other positions. Instead of compressing the whole past into one recurrent hidden state,...
Computational finance uses numerical models to price contracts, measure risk, and optimize portfolios. Automatic differentiation is useful because most financial computations...
A differentiable compiler is a compilation system that supports gradient propagation through compilation decisions, generated programs, or execution behavior. Instead of...
Automatic differentiation systems are often assumed to be deterministic. Given identical inputs, identical parameters, and identical code, many users expect identical...
Classical automatic differentiation computes derivatives of deterministic programs.
Automatic differentiation transforms programs. A fundamental semantic question therefore arises:
An automatic differentiation engine becomes useful only after it supports a sufficiently rich set of primitive operations. The collection of these primitives is the operator...
Zygote is a source-to-source reverse-mode automatic differentiation system for the Julia programming language. It was designed to differentiate high-level Julia code directly,...
Derivative computation is not only a mathematical problem. It is also a numerical and systems problem. A derivative method must answer three questions simultaneously:
A computational graph represents a calculation as nodes and edges. Nodes represent operations or values. Edges represent data dependencies. Automatic differentiation uses this...
Loops express repeated computation. Recurrence relations express the same idea mathematically: each state is computed from one or more earlier states.
Mixed-mode differentiation combines forward accumulation and reverse accumulation in the same derivative computation. It is used when neither pure forward mode nor pure...
Forward mode automatic differentiation has a simple cost model. It evaluates the original program and, at the same time, evaluates the tangent program. Each primitive...
Most reverse mode automatic differentiation systems require a mechanism for recording the forward computation so that the reverse pass can later traverse it backward. This...
Dual numbers compute first derivatives exactly. Truncated polynomial algebras extend this to higher-order derivatives, but practical higher-order differentiation introduces an...
Nested automatic differentiation means applying automatic differentiation inside another automatic differentiation computation.
A piecewise differentiable function is built from several differentiable pieces joined by boundaries. Each piece has an ordinary derivative inside its region. At the...
Matrix factorizations rewrite a matrix into structured factors. They are used because the factors make later computations cheaper, more stable, or easier to interpret. In...
Python became the dominant language for modern machine learning and differentiable computing because it combines a simple programming model with access to high-performance...
Sequence models process ordered data. The input is not one independent vector, but a series:
Molecular simulation models the behavior of atoms and molecules using physical interaction laws. Automatic differentiation is important because many molecular methods require...
Differentiable search and retrieval systems integrate information access into gradient-based learning. Instead of treating retrieval as an external symbolic operation, the...
Gradient-based optimization relies on propagating derivative information through many layers, time steps, or computational transformations. In deep systems, these gradients...
Classical neural networks apply a finite sequence of transformations:
Automatic differentiation becomes substantially more difficult once programs contain higher-order functions.
Memory management is the main systems problem in reverse mode automatic differentiation. The derivative rules are usually small. The hard part is deciding which primal values,...
JAX is an automatic differentiation and array programming system for Python. It combines NumPy-like syntax with composable program transformations. Its core transformations...
Automatic differentiation computes derivatives by applying the chain rule to the operations of a program. The input is ordinary code that computes a value. The output is code,...
The chain rule is the central theorem behind automatic differentiation. Every useful AD algorithm is a disciplined way of applying the chain rule to a program.
Control flow determines which operations a program executes. Straight-line programs have a fixed sequence of operations, but ordinary programs contain branches, loops,...
Reverse accumulation is the reverse-mode form of automatic differentiation. It propagates derivative information backward from outputs to inputs.
The natural output of forward mode automatic differentiation is a Jacobian-vector product. Instead of constructing the full Jacobian matrix explicitly, forward mode computes...
Reverse accumulation is the operational core of reverse mode automatic differentiation. The forward pass evaluates a program and records dependency information. The reverse...
Dual numbers capture first-order derivatives because the infinitesimal element satisfies
Reverse mode is efficient for scalar-output functions because it propagates one adjoint backward through the computation and produces a full gradient. For
A dynamic graph is a computation graph built while the program runs. Its structure depends on ordinary runtime values: branches, loop counts, recursive calls, tensor shapes,...
Linear algebra primitives are tensor operations with algebraic structure: matrix multiplication, triangular solves, factorizations, inverses, determinants, norms, and spectral...
Neural network training is the repeated application of three operations: evaluate a model, differentiate a scalar loss, and update parameters. Automatic differentiation...
Computational fluid dynamics studies fluid motion by solving discretized forms of the governing equations. Automatic differentiation enters CFD when we want gradients of...
A differentiable physics engine computes gradients of physical simulation outputs with respect to inputs, parameters, or control signals. Instead of treating simulation as a...
Reverse-mode automatic differentiation trades computation for memory. To compute gradients efficiently, the backward pass requires access to intermediate values produced...
Many systems evolve continuously over time rather than through discrete layers. A state variable changes according to a differential equation:
Cartesian differential categories model differentiation in categories with products. Differential categories generalize this idea further by shifting attention from cartesian...
A tape is an append-only record of the operations executed during the forward pass. Reverse mode uses the tape to replay derivative rules backward.
PyTorch Autograd is a dynamic reverse-mode automatic differentiation system. It records tensor operations as they execute, builds a computation graph at runtime, and then...
Symbolic differentiation computes derivatives by manipulating expressions. The input is a formula. The output is another formula.
The gradient is enough when a function has many inputs and one scalar output. More general programs need more general derivative objects. Two of the most important are the...
A dependency graph describes how values in a computation depend on earlier values. Automatic differentiation operates on these dependencies.
Forward accumulation is the forward-mode form of automatic differentiation. It propagates derivative information in the same order as ordinary program evaluation. Each...
Forward mode automatic differentiation works by replacing each primitive operation with an extended operation on pairs:
Reverse mode automatic differentiation fundamentally computes vector-Jacobian products. The gradient of a scalar function is a special case of this more general operation.
Dual numbers provide an algebraic mechanism for differentiation, but they also have a precise geometric meaning. A dual number represents a point together with an...
A Hessian-vector product computes
Recursion is control flow where a function calls itself. In automatic differentiation, recursion behaves like a loop with a call stack. Each recursive call contributes one...
Broadcasting is the rule system that allows tensor operations between arrays of different shapes without explicitly materializing expanded copies. It is one of the most...
Differentiable programming treats differentiation as a general programming-language feature. A program can contain numerical kernels, control flow, data structures, solvers,...
Backpropagation is reverse mode automatic differentiation applied to neural networks. In most machine learning writing, the term refers to the whole training procedure: run a...
An inverse problem asks for causes from effects. A forward model predicts observations from parameters. An inverse model tries to recover parameters from observations.
Differentiable rendering is the process of computing derivatives of rendered images with respect to scene parameters. A renderer becomes part of the computational graph rather...
Floating point systems represent numbers within a finite range. When a computed value exceeds the largest representable magnitude, overflow occurs. When a value becomes too...
An optimization layer is a program component whose output is the solution of an optimization problem. Instead of computing
Algebraic semantics describes differentiation through derivations, tangent maps, and linear structure. Categorical semantics goes further. It studies differentiation as a...
A graph representation makes the structure of a differentiated computation explicit. In reverse mode, this structure is required because the backward pass must know which...
TensorFlow Autograd refers to TensorFlow’s automatic differentiation system, mainly exposed through tf.GradientTape. It is a reverse-mode AD system designed for tensor...
Numerical differentiation estimates derivatives by evaluating a function at nearby input values. It treats the function as a black box. The method does not need access to the...
Automatic differentiation is usually applied to functions with many inputs and many outputs. The calculus needed for this setting is multivariate calculus: the study of how a...
Intermediate variables are the named values created between program inputs and program outputs. They make automatic differentiation mechanical.
Automatic differentiation reduces differentiation to a finite collection of elementary operations. Every program, regardless of complexity, is decomposed into primitive...
Dual numbers give forward mode automatic differentiation a compact algebraic form. Instead of storing a value and a tangent as two unrelated fields, we package them into one...
Reverse mode automatic differentiation operates on a computational graph. The forward pass evaluates the graph from inputs to outputs. The reverse pass traverses the same...
The defining feature of dual numbers is the existence of a nonzero element whose square vanishes:
For a scalar function
A loop repeats a computation until a condition fails or a fixed iteration count is reached. In automatic differentiation, loops are important because many numerical algorithms...
Tensor operations generalize scalar, vector, and matrix operations to arrays with arbitrary rank. In automatic differentiation, a tensor is usually treated as a typed array...
Functional programming languages provide a natural semantic foundation for automatic differentiation. Programs are expressed as compositions of functions, immutable values,...
Stochastic optimization studies optimization when the objective is accessed through samples, noisy estimates, or partial observations. In machine learning, this is the normal...
Sensitivity analysis studies how changes in inputs affect the outputs of a system. In differential equations, optimization, simulation, and machine learning, the main object...
A differentiable database is a data system whose operations participate in gradient-based optimization. Instead of treating storage and querying as external infrastructure,...
Reverse mode automatic differentiation computes gradients by propagating adjoint values backward through a computational graph. In exact arithmetic, the reverse accumulation...
A solver is a program that computes a value by search, iteration, or factorization. Instead of evaluating a closed-form expression, it finds a value that satisfies a condition.
Automatic differentiation is often introduced operationally. A program executes elementary operations, and derivative information propagates alongside the computation. This...
Reverse mode automatic differentiation computes derivatives by traversing the program backward after evaluation. Unlike forward mode, which propagates tangents alongside...
Tapenade is a source-transformation automatic differentiation system developed at INRIA. Like ADIFOR, it takes an existing program and produces a new differentiated program....
A derivative measures how an output changes when an input changes. That sentence is simple, but it is one of the main ideas behind numerical computing, optimization, machine...
Automatic differentiation begins with a simple object: a function.
A straight-line program is the simplest model of computation used in automatic differentiation. It is a program with a fixed sequence of assignments, no branches, no loops,...
Automatic differentiation is built on a simple observation: a complicated derivative can be computed by composing many small local derivatives. Instead of manipulating a full...
Forward mode automatic differentiation computes derivatives by carrying two values through a program at the same time: the ordinary value and its tangent. The ordinary value...
Reverse mode automatic differentiation computes derivatives by propagating sensitivities backward through a computation. In forward mode, each intermediate value carries a...
Dual numbers give the cleanest algebraic model of forward mode automatic differentiation. They extend ordinary real numbers with a formal infinitesimal part. Instead of...
First derivatives describe local rate of change. Second derivatives describe how that rate of change itself changes. In optimization, this is curvature. In dynamics, it is...
A conditional is a program construct that chooses one computation among several possible computations. In ordinary code, this is written as if, else, switch, case, pattern...
Matrix calculus is the notation and rule system used to differentiate functions whose inputs, outputs, or intermediate values are vectors, matrices, or tensors. Automatic...
Gradient descent is the basic optimization procedure behind much of modern machine learning. It is simple enough to state in one line, but rich enough to expose many of the...
Differential equations are one of the main reasons automatic differentiation matters in scientific computing. Many scientific models are not written as closed-form functions....
An end-to-end differentiable pipeline is a system whose final objective can send derivative information backward through every trainable or tunable stage of computation....
Automatic differentiation computes derivatives by executing arithmetic. On a real machine, arithmetic uses finite precision. This means AD gives the derivative of the...
Many programs do not compute their output by applying a fixed sequence of explicit operations. Instead, they define the output as the solution of another problem.
Automatic differentiation is often described by a simple rule:
A minimal forward mode automatic differentiation engine has one job: evaluate a program while carrying both a value and its derivative. The engine does not build a graph. It...
ADIFOR, short for Automatic Differentiation of Fortran, is one of the classical source-transformation systems for automatic differentiation. It was designed for numerical...
Sparse and structured differentiation studies how to compute derivatives without materializing dense derivative objects. Many real systems have enormous Jacobians and...
Automatic differentiation works naturally on pure mathematical functions:
Auto Diff book notes exported from ChatGPT, organized into 22 chapters.
Lisp is one of the natural homes of automatic differentiation. It treats programs as data, has a simple expression syntax, and supports macro systems that can transform code...
Source transformation is an implementation strategy for automatic differentiation in which a program that computes a function is rewritten into another program that computes...
Automatic differentiation can be performed before a program runs, while it runs, or in a staged phase between the two.
Kernel fusion combines several small operations into one larger executable unit.
Memory planning determines where values are stored, how long they remain alive, and when storage can be reused.
Staging is the separation of a program into phases.
Tracing is an implementation strategy where an AD system observes a program while it runs and records the operations that occur.
Rust is an attractive language for automatic differentiation because it combines low-level performance with strong static guarantees. It gives the programmer control over...
A graph intermediate representation models a program as nodes and edges.
Static single assignment form, or SSA, is an intermediate representation where each variable is assigned exactly once.
C and C++ are important targets for automatic differentiation because much scientific, engineering, graphics, finance, and machine learning infrastructure is written in these...
An intermediate representation, or IR, is the internal program form used by a compiler or AD system after parsing and before final code generation.
Operator overloading implements automatic differentiation by changing the meaning of ordinary arithmetic operations for special numeric objects.
Reference material: set theory and relations, proof techniques, linear algebra for graph theory, probability review, algorithms and complexity, mathematical notation, historical development, common graph families, theorem index, and symbol index.
Bipartite, complete, and regular graphs, interval and chordal graphs, comparability and perfect graphs, tournaments, grid graphs, De Bruijn graphs, Kneser graphs, the Petersen graph, and Ramanujan graphs.
Social networks, web graphs, PageRank, biological and chemical networks, electrical and transportation networks, compiler graphs, knowledge graphs, recommendation systems, graph neural networks, distributed systems, and blockchain networks.
Extremal graph theory, Turán-type problems, Szemerédi regularity lemma, minor theory, Robertson-Seymour theory, infinite graphs, topological and category-theoretic graph theory, simplicial complexes, graph limits, temporal graphs, and quantum graph theory.
Complexity classes, NP-complete graph problems, Hamiltonian paths and cycles, the traveling salesman problem, graph isomorphism, parameterized complexity, fixed-parameter tractability, and approximation hardness.
Graph traversal, BFS, DFS, shortest paths, Dijkstra, Bellman-Ford, Floyd-Warshall, network flow, maximum flow, minimum cut, matching algorithms, union-find, dynamic graph algorithms, approximation, and randomized algorithms.
Random graph models, Erdős-Rényi graphs, threshold phenomena, small-world and scale-free networks, preferential attachment, percolation, probabilistic methods, and random processes on networks.
Adjacency, incidence, and Laplacian matrices, graph spectrum, eigenvalues and eigenvectors, spectral theorems, algebraic connectivity, expander graphs, graph energy, Cayley graphs, and automorphism groups.
Vertex and edge coloring, chromatic number, chromatic polynomial, greedy coloring, Brooks' theorem, perfect graphs, Ramsey theory, list coloring, and fractional coloring.
Matchings, perfect matchings, Hall's marriage theorem, bipartite matching, maximum matching algorithms, vertex and edge covers, independent sets, cliques, and dominating sets.
Vertex and edge connectivity, cuts, bridges, articulation points, Menger's theorem, network reliability, expanders, separators, and random walks on graphs.
Planar graphs, plane embeddings, Euler's formula, faces, Kuratowski's theorem, coloring of planar graphs, dual graphs, geometric graphs, and connections to computational geometry.
Trees, forests, rooted and binary trees, spanning trees, minimum spanning trees, Prüfer sequences, tree traversal, tree decomposition, and applications of trees.
Directed graphs, in-degree and out-degree, directed paths and cycles, strong connectivity, DAGs, topological ordering, weighted graphs, multigraphs, hypergraphs, and labeled graphs.
A comprehensive reference covering graph theory from elementary definitions through structural theory, algorithms, algebraic methods, probabilistic models, computational complexity, and modern applications in fourteen parts with appendices.
Basic definitions, vertices and edges, graph representations, degree, walks, paths, cycles, connectivity, isomorphism, subgraphs, and classes of graphs — the bedrock of graph theory.
A comprehensive book covering linear algebra from foundations through spectral theory, matrix decompositions, numerical methods, and modern applications in ten parts with appendices.
Reference material: set theory, proof techniques, real and complex numbers, polynomial algebra, calculus review, numerical computation, notation, historical notes, glossary, and index.
Complex vector spaces, finite fields, modules, category theory, convex geometry, random matrices, operator theory, spectral graph theory, compressed sensing, tensor decompositions, geometric algebra, and AI applications.
Linear regression, optimization, graphs, Markov chains, differential equations, Fourier transforms, signal processing, computer graphics, robotics, quantum mechanics, machine learning, PCA, and more.
Tensor products, exterior and symmetric algebras, multilinear maps, bilinear forms, Clifford algebras, Lie algebras, representation theory, and infinite-dimensional spaces.
Floating point arithmetic, conditioning, stability, iterative solvers, Jacobi, Gauss-Seidel, conjugate gradient, Krylov subspaces, QR algorithm, sparse and randomized methods.
LU, PLU, Cholesky, QR, Schur, SVD, polar, Hessenberg, tridiagonalization, and canonical matrix forms.
Eigenvalues, eigenvectors, diagonalization, the spectral theorem, Jordan canonical form, Cayley-Hamilton, matrix functions, and Perron-Frobenius theory.
Inner products, norms, orthogonality, Gram-Schmidt, orthogonal projections, least squares, QR factorization, Hermitian spaces, and quadratic forms.
Linear maps, kernel and image, matrix representation, isomorphisms, projections, reflections, rotations, similarity, and invariant subspaces.
Abstract vector spaces, subspaces, span, linear independence, basis, dimension, coordinate systems, dual spaces, and direct sums.
Scalars, vectors, matrices, linear equations, Gaussian elimination, determinants, and matrix factorizations — the bedrock of linear algebra.