Euler Products
Euler products are one of the central ideas of analytic number theory. They express infinite sums over integers as infinite products over primes.
476 notes
An arithmetic function $f(n)$ can be encoded into an infinite series of the form
Arithmetic functions often fluctuate strongly from one integer to the next.
Many arithmetic functions are defined through sums over divisors. For example,
Arithmetic functions can be added and multiplied pointwise, but number theory has another product that is better adapted to divisibility.
An arithmetic function is a function defined on the positive integers. Such a function
One of the deepest ideas in algebraic number theory is that prime numbers possess hidden symmetry inside field extensions.
Sieve methods are extremely effective for estimating how many integers avoid small prime factors. They have produced major results about:
A set is a collection of objects called elements.
The Liouville function is an arithmetic function denoted by
Let
The Twin Prime Conjecture states that infinitely many primes satisfy
Euler's totient function is an arithmetic function denoted by
In the ordinary integers, every nonzero integer factors uniquely into prime numbers.
The Prime Number Theorem for arithmetic progressions states that for
The Langlands program is one of the largest and most influential research programs in modern mathematics.
The Möbius function is an arithmetic function denoted by
Classical number theory studies arithmetic globally over fields such as
One of the central problems of analytic number theory is understanding how primes distribute among residue classes.
An elliptic curve over $\mathbb{Q}$ may be written in Weierstrass form
Divisor functions measure the positive divisors of an integer. They are among the first examples of arithmetic functions, because their values depend directly on the prime...
The real numbers arise by completing the rational numbers using the ordinary absolute value. The $p$-adic numbers arise by completing the rational numbers using the $p$-adic...
Classical sieve methods estimate how many integers survive congruence restrictions. The large sieve approaches these problems from a different direction.
The Riemann zeta function is one of the central objects in mathematics.
Modular arithmetic is not only a theoretical language for divisibility. It is also one of the main tools of computation with integers.
In ordinary analysis, the absolute value
Brun's sieve introduced the idea of estimating sifted sets through truncated inclusion-exclusion. However, Brun's method often produced bounds that were technically difficult...
Fermat's Last Theorem states that there are no positive integers
Modular arithmetic often requires computing powers such as
Ordinary integers satisfy several remarkable properties simultaneously:
Sieve methods are techniques for counting integers that remain after removing residue classes modulo primes.
The classical Langlands program relates:
Number theory contains some of the oldest and deepest unsolved problems in mathematics.
The Chinese remainder theorem describes when several congruence conditions can be combined into one congruence. Its cleanest form occurs when the moduli are pairwise coprime.
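A minimal sketch of this combination step (the helper name `crt` and the incremental-merge strategy are illustrative choices; `pow(M, -1, m)` for modular inverses assumes Python 3.8+):

```python
from math import gcd

def crt(residues, moduli):
    """Return x with x ≡ r (mod m) for each pair, 0 <= x < prod(moduli)."""
    x, M = 0, 1
    for r, m in zip(residues, moduli):
        assert gcd(M, m) == 1, "moduli must be pairwise coprime"
        # Solve x + M*t ≡ r (mod m) for t using the inverse of M mod m.
        t = ((r - x) * pow(M, -1, m)) % m
        x += M * t
        M *= m
    return x

# x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7)  ->  x = 23
print(crt([2, 3, 2], [3, 5, 7]))
```

Merging one congruence at a time keeps each step a single linear congruence, which is why pairwise coprimality of the moduli is exactly what the algorithm needs.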
Let $R$ be a commutative ring. An ideal $I\subseteq R$ is called principal if there exists an element $\alpha\in R$ such that
In additive number theory, ordinary asymptotic density is often too weak to control additive behavior.
Modular curves parameterize elliptic curves and connect modular forms with arithmetic geometry.
Arithmetic statistics studies the distribution of arithmetic objects inside large families.
A system of congruences asks for an integer satisfying several congruence conditions simultaneously.
The discriminant is one of the most important invariants of a number field. It measures how the arithmetic of the field differs from ordinary rational arithmetic.
A central question in additive number theory asks whether every integer can be represented as a sum of elements from a fixed set.
Fourier analysis decomposes functions into harmonic frequencies.
Prime numbers are deterministic objects, but many aspects of their distribution resemble random behavior.
In ordinary arithmetic, division by a nonzero number means multiplication by its reciprocal. Modular arithmetic is more delicate. A residue class may or may not have a...
Let $K$ be a number field and let
Exponential sums are among the central tools of analytic number theory.
The Riemann zeta function
The Riemann zeta function is defined for $\operatorname{Re}s>1$ by
A linear congruence is a congruence of the form
In ordinary integers, every ideal is generated by a single element:
Many problems in additive number theory ask whether an integer can be represented in the form
The Langlands program predicts that many different arithmetic objects are connected by systematic transfers.
A probabilistic algorithm uses random choices during its execution. In number theory, this is often a practical advantage rather than a weakness.
Arithmetic modulo $n$ is arithmetic performed on residue classes modulo $n$. Instead of distinguishing all integers separately, we identify integers that have the same...
In ordinary integers, every number factors uniquely into primes. In many rings of algebraic integers, this property fails.
Waring's problem asks whether every sufficiently large positive integer can be written as a sum of a bounded number of fixed powers.
Galois groups encode the symmetries of algebraic equations and field extensions.
A primality test determines whether an integer is prime.
Congruence modulo $n$ groups integers according to their remainders after division by $n$. If two integers have the same remainder, they are congruent modulo $n$.
One of the central properties of the ordinary integers is unique factorization.
Goldbach-type problems ask whether integers can be represented as sums of primes. They are among the oldest and most famous problems in additive number theory.
The Langlands program is one of the most ambitious and influential theories in modern mathematics.
A positive integer is called $y$-smooth if all of its prime factors are at most $y$.
Ordinary equality compares integers exactly. In many arithmetic problems, however, only the remainder after division matters.
Let $K$ be a number field of degree
Additive number theory studies arithmetic structure through addition of integers and subsets of integers.
Classical modular form theory begins with analytic functions satisfying symmetry conditions.
Number theory often studies exact statements about individual integers. For example, one may ask whether a given integer is prime, squarefree, smooth, or representable as a...
The infinitude of primes guarantees that primes continue indefinitely, but it says nothing about how frequently primes occur.
In ordinary arithmetic, the integers
The classical Riemann Hypothesis concerns the zeros of the Riemann zeta function
Modular forms are functions on the upper half-plane satisfying symmetry conditions under the modular group
A zero-knowledge proof allows one party to convince another that a statement is true without revealing why it is true.
Euclid proved that there are infinitely many primes by contradiction. Euler discovered a very different proof based on infinite series and products.
A number field is a finite extension of the rational numbers. Concretely, it is a field $K$ satisfying
A central theme in analytic number theory is determining when an $L$-function is nonzero at a particular point.
For centuries, elliptic curves and modular forms were studied as separate objects.
Modern public-key cryptography relies heavily on two computational assumptions:
Euclid's proof of the infinitude of primes is one of the earliest examples of a general argument in number theory. It does not depend on computation, experimentation, or...
An algebraic number is a complex number that satisfies some nonzero polynomial equation with rational coefficients. Thus $\alpha\in\mathbb{C}$ is algebraic if there exists a...
An arithmetic progression is a sequence of the form
An elliptic curve is simultaneously:
Lattice cryptography is a family of cryptographic systems based on the presumed hardness of computational problems on high-dimensional lattices.
Prime numbers are the building blocks of the positive integers. Once unique prime factorization is known, a natural question arises: are there only finitely many primes, or do...
The ordinary integers
The Riemann zeta function
The modular group acts on the upper half-plane by fractional linear transformations:
Pairing-based cryptography uses special maps defined on elliptic curve groups. A pairing is a function
An arithmetic function is a function whose domain is the positive integers. It assigns a value to each integer
Diophantine approximation studies how closely real numbers can be approximated by rational numbers.
Dirichlet characters behave analogously to exponential functions in Fourier analysis. Just as complex exponentials separate frequencies, characters separate residue classes...
Modular forms already possess symmetry under the modular group. Yet a deeper arithmetic structure emerges through another family of operators: the Hecke operators.
Elliptic curve cryptography is a public-key cryptographic framework based on the arithmetic of elliptic curves over finite fields.
Unique prime factorization says that every integer $n>1$ can be written as a product of primes. The canonical prime decomposition is the ordered and exponentiated version of...
Recall that a Pell equation has the form
The Riemann zeta function studies prime numbers globally, without distinguishing congruence classes. However, many arithmetic questions concern primes satisfying conditions such as
Modular forms satisfy strong symmetry conditions under the modular group. Among them, cusp forms form the deepest and most arithmetic subclass.
Secure communication requires two parties to share secret information. In classical symmetric cryptography, both parties must already possess the same secret key before...
The fundamental theorem of arithmetic states that every integer $n>1$ can be written as a product of prime numbers, and that this product is unique up to the order of the factors.
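A trial-division sketch that produces this prime decomposition (the function name is illustrative; this is fine for small inputs, not a serious factorization method):

```python
def prime_factorization(n):
    """Trial division: return the prime factors of n > 1 in nondecreasing order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains has no divisor <= sqrt(n), so it is prime
    return factors

print(prime_factorization(360))  # [2, 2, 2, 3, 3, 5]
```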
The convergents of a continued fraction are the rational numbers obtained by truncating the expansion at finite stages.
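A short sketch computing convergents via the standard recurrence $p_k = a_k p_{k-1} + p_{k-2}$, $q_k = a_k q_{k-1} + q_{k-2}$ (function name illustrative):

```python
from fractions import Fraction

def convergents(coeffs):
    """Convergents p_k/q_k of the continued fraction [a0; a1, a2, ...]."""
    p0, q0, p1, q1 = 1, 0, coeffs[0], 1
    out = [Fraction(p1, q1)]
    for a in coeffs[1:]:
        p0, q0, p1, q1 = p1, q1, a * p1 + p0, a * q1 + q0
        out.append(Fraction(p1, q1))
    return out

# [1; 2, 2, 2, ...] gives the convergents of sqrt(2):
# 1, 3/2, 7/5, 17/12, 41/29, ...
print(convergents([1, 2, 2, 2, 2]))
```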
The Riemann zeta function was introduced through the series
Among all modular forms, Eisenstein series are the most explicit and computationally accessible.
Classical cryptography uses a shared secret key. Both sender and receiver must know the same secret information in advance.
Two integers $a$ and $b$, not both zero, are called coprime if their greatest common divisor is $1$:
Many important numbers are irrational:
One of the deepest ideas in analytic number theory is that the zeros of the zeta function determine the distribution of prime numbers.
Modular forms are among the central objects of modern number theory.
Modern number theory relies heavily on computation. Two broad computational paradigms dominate the subject:
Let $a$ and $b$ be integers. An integer of the form
Finite continued fractions correspond exactly to rational numbers. When the Euclidean algorithm never terminates, the continued fraction becomes infinite.
The Riemann zeta function has nontrivial zeros inside the critical strip
The modular group acts on the upper half-plane by fractional linear transformations:
Elliptic curves occupy a central position in modern number theory, arithmetic geometry, and cryptography.
The Euclidean algorithm computes the greatest common divisor of two integers. The extended Euclidean algorithm does more. It also expresses the gcd as an integer linear...
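A sketch of the extended algorithm, returning Bézout coefficients alongside the gcd (the iterative formulation shown here is one common choice):

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Each remainder stays an integer combination of a and b;
        # the x- and y-sequences track the coefficients.
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

g, x, y = extended_gcd(240, 46)
print(g, x, y)  # g = 2, with 240*x + 46*y == g
```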
A finite continued fraction is an expression of the form
The zeros of the Riemann zeta function are the complex numbers $s$ satisfying
Modular forms begin with the action of certain matrix groups on the complex upper half-plane.
Modular forms are highly structured analytic functions with deep arithmetic properties. Although their definitions involve complex analysis and group actions, modular forms...
Write the two integers as
The greatest common divisor of two integers can be found by listing divisors, but this method becomes inefficient for large numbers. For example, finding
The Euclidean algorithm is one of the oldest and most important algorithms in mathematics. It computes the greatest common divisor of two integers using repeated division.
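The repeated-division loop can be sketched in a few lines (illustrative; Python's `math.gcd` provides the same functionality):

```python
def gcd(a, b):
    """Euclidean algorithm: repeatedly replace (a, b) by (b, a mod b)."""
    while b:
        a, b = b, a % b
    return abs(a)

# 252 = 2*105 + 42, 105 = 2*42 + 21, 42 = 2*21 + 0, so gcd = 21
print(gcd(252, 105))  # 21
```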
The defining series of the zeta function,
One of the central goals of algebraic number theory is to classify field extensions of a number field
A lattice is a discrete additive subgroup of Euclidean space. More concretely, let
Study empirical properties of prime numbers through computation.
Let $a$ and $b$ be nonzero integers. An integer $m$ is called a common multiple of $a$ and $b$ if
Quadratic residue theory is not only a theoretical subject. It also plays a major role in computational number theory, cryptography, primality testing, and algorithm design.
The defining series of the Riemann zeta function is
Global class field theory studies finite abelian extensions of number fields such as
Integer factorization asks for the prime decomposition of a positive integer. Given
1. Prove that the sum of two even integers is even.
Let $a$ and $b$ be integers, not both zero. An integer $d$ is called a common divisor of $a$ and $b$ if
Quadratic reciprocity describes when one prime is a square modulo another prime. A natural question is whether similar laws exist for higher powers.
The defining series of the Riemann zeta function is
One of the central discoveries of algebraic number theory is that unique factorization may fail in rings of algebraic integers.
A prime number is an integer greater than $1$ whose only positive divisors are
The division algorithm is one of the basic structural facts about the integers. It says that any integer can be divided by a positive integer with a unique quotient and remainder.
Gauss sums arise from combining multiplicative and additive structures modulo a prime. They form one of the fundamental tools of analytic and algebraic number theory.
One of the central objects of analytic number theory is the Riemann zeta function. It connects infinite series, prime numbers, complex analysis, and arithmetic structure into...
One of the oldest themes in number theory is reciprocity: the phenomenon that solvability conditions for one prime are controlled by arithmetic involving another prime.
Modern computational number theory depends fundamentally on efficient arithmetic with large integers.
A positive integer $n>1$ is called composite if it is not prime.
The theory of quadratic residues asks a fundamental question:
A pair of primes
A central goal of algebraic number theory is to understand field extensions of a given base field, especially extensions of the rational numbers
One of the central ideas of modern analysis is that functions may be decomposed spectrally into elementary pieces.
Prime numbers are the fundamental building blocks of arithmetic.
Euler's criterion gives an efficient way to decide whether an integer is a square modulo an odd prime. Let $p$ be an odd prime and let $a$ be an integer not divisible by $p$....
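Because modular exponentiation is fast, the criterion is directly computable; a sketch (function name illustrative):

```python
def is_quadratic_residue(a, p):
    """Euler's criterion: a^((p-1)/2) ≡ 1 (mod p) iff a is a square mod p
    (p an odd prime, p does not divide a)."""
    return pow(a, (p - 1) // 2, p) == 1

p = 11
squares = sorted({x * x % p for x in range(1, p)})  # nonzero squares mod 11
print(squares)                                       # [1, 3, 4, 5, 9]
print([a for a in range(1, p) if is_quadratic_residue(a, p)])  # same list
```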
Let
The rational numbers may be studied through their completions:
Functoriality is the unifying mechanism of the Langlands program. It predicts systematic relationships between automorphic representations attached to different algebraic groups.
Division of integers does not always produce an integer. For example,
The Legendre symbol
The Prime Number Theorem describes the average distribution of primes up to a large number $x$:
A central problem in number theory is determining whether an equation possesses rational or integral solutions.
The Langlands program is a broad collection of conjectures connecting number theory, representation theory, harmonic analysis, and algebraic geometry. Its central idea is that...
A group $G$ is abelian if
The idea of number arose long before formal mathematics. Early civilizations used numbers for counting objects, measuring land, recording trade, and tracking time.
Let $p$ be an odd prime and let $a\in\mathbb{Z}$. The Legendre symbol is defined by
The Prime Number Theorem states that
One of the central ideas of number theory is that congruences modulo powers of a prime often approximate genuine arithmetic solutions.
Number theory studies arithmetic simultaneously at two levels:
Number theory is one of the oldest parts of mathematics, but modern number theory is not a single ancient subject carried forward unchanged. It is a layered discipline....
The integers extend infinitely in both directions:
A quadratic congruence is a congruence involving a square. The basic form is
The logarithmic integral is the function
The rational numbers form a field rich enough for arithmetic, yet insufficient for many limiting processes.
Classically, number theory studied special analytic functions such as modular forms. These functions satisfy strong symmetry conditions under actions of arithmetic groups.
Computation has become an essential part of number theory. Classical arithmetic relied mainly on symbolic reasoning and hand calculations. Modern arithmetic combines rigorous...
Many mathematical objects are defined recursively. A recursive definition specifies:
A Diophantine equation is first an arithmetic object. It asks for solutions in integers or rational numbers. But every polynomial equation also defines a geometric object.
The Prime Number Theorem describes the asymptotic distribution of prime numbers. It states that
Category theory studies mathematical structures through objects and maps between them. Instead of looking only at what objects are made of, it studies how they relate to other...
The real numbers arise by completing the rational numbers with respect to the ordinary absolute value. This completion produces a field suited to Euclidean geometry and...
Representation theory studies abstract algebraic objects by expressing them as linear transformations of vector spaces.
Ordinary induction proves a statement $P(n)$ by showing that truth passes from one case to the next:
A central problem in number theory is to study solutions of polynomial equations whose coordinates belong to a specified number system. Two important cases are:
The prime counting function
A vector space over a field $F$ is a set $V$ equipped with addition and scalar multiplication satisfying the usual algebraic rules.
The ordinary absolute value on the real numbers measures magnitude:
One of the central problems in arithmetic geometry is understanding the number of solutions of polynomial equations over finite fields.
Many statements in number theory concern all natural numbers. For example, one may wish to prove that
An exponential Diophantine equation is a Diophantine equation in which one or more unknowns appear as exponents. Typical examples include
One of the oldest questions in number theory asks how prime numbers are distributed among the positive integers. Since primes become less frequent as numbers grow larger,...
Measure theory extends the ideas of length, area, volume, and integration to more general settings. In number theory, measure appears in probability, harmonic analysis,...
One of the central ideas of algebraic number theory is that prime numbers may behave differently after passing to a larger field.
Classical topology studies geometric spaces using invariants such as homology and cohomology. Over the complex numbers, algebraic varieties can often be viewed as topological...
The order relation distinguishes positive and negative integers, but in many situations the sign of a number is less important than its magnitude. For example, the integers
A Catalan-type equation is a Diophantine equation involving powers whose values differ by a small amount. The classical example is
In analytic number theory, one often studies sums of the form
Topology studies continuity, convergence, connectedness, and geometric structure in an abstract setting. In number theory, topology appears naturally in real analysis, complex...
One of the most important classes of number fields arises from the solutions of the equation
Arithmetic geometry often studies families of algebraic curves varying over arithmetic bases. The most important base is
The integers are not merely a collection of numbers equipped with arithmetic operations. They also possess an order structure. Given two integers $a$ and $b$, one can...
One of the oldest questions in number theory asks which integers can be written as sums of squares. Typical examples are
Analytic number theory studies infinite sums, products, and integrals. Before such expressions can be manipulated safely, one must understand the meaning of convergence.
The real numbers $\mathbb{R}$ extend the rational numbers $\mathbb{Q}$ by filling gaps such as
The familiar fields
An algebraic curve is a geometric object whose dimension is one. Curves are among the oldest and most important objects in number theory and algebraic geometry.
An arithmetic operation is a rule that combines numbers to produce another number. The most basic operations on integers are addition, subtraction, multiplication, and division.
A Pell equation is a Diophantine equation of the form
Euler products arise when an infinite series has coefficients controlled by multiplication. The simplest and most important example is the zeta series
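The series-product identity can be checked numerically at $s = 2$, where both sides approach $\pi^2/6$ (a rough illustration; the truncation cutoffs below are arbitrary):

```python
def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, ok in enumerate(sieve) if ok]

s = 2
series = sum(1 / n ** s for n in range(1, 100001))   # truncated zeta series
product = 1.0
for p in primes_up_to(1000):                          # truncated Euler product
    product *= 1 / (1 - p ** -s)

print(series, product)  # both close to pi^2/6 ≈ 1.6449
```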
Abstract algebra studies sets equipped with operations. In number theory, these structures organize arithmetic behavior.
A polynomial equation may possess several roots related by hidden algebraic symmetries. Consider
Geometry is not only concerned with spaces themselves, but also with maps between spaces. In algebraic geometry and arithmetic geometry, these maps are called morphisms.
The natural numbers are sufficient for counting and addition, but they are not sufficient for subtraction. For example,
A mathematical proof is a logically complete argument establishing the truth of a statement from accepted assumptions, definitions, and previously proved results.
A Pythagorean triple is a triple of positive integers
An infinite product has the form
A central problem in algebra is to determine where a polynomial factors completely into linear terms. Consider the polynomial
Classical algebraic geometry studies varieties defined by polynomial equations. This theory works well over algebraically closed fields, especially over $\mathbb{C}$. However,...
The natural numbers arise from the basic act of counting. When we count objects in a collection, we assign successive numbers:
A Diophantine equation is an equation whose solutions are required to be integers. The unknowns are not allowed to range over the real numbers or complex numbers unless...
The harmonic series is the infinite series
A field is a number system in which addition, subtraction, multiplication, and division by nonzero elements are always possible. The rational numbers $\mathbb{Q}$, the real...
Arithmetic geometry studies solutions of polynomial equations by combining algebra, geometry, and number theory. Its basic objects are spaces defined by polynomial equations....
A set is a collection of objects, called its elements. If $x$ is an element of a set $A$, we write $x \in A$. If $x$ is not an element of $A$, we write $x \notin A$.
Modern Number Theory book notes exported from ChatGPT, organized into 5 chapters.
Modern number theory continues to evolve rapidly.
| Period | Development |
| Definition | Location |
| Theorem | Location |
| Symbol | Meaning |
Forward mode automatic differentiation computes derivatives by propagating tangent values alongside ordinary values. The ordinary value is called the primal. The derivative...
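A minimal dual-number sketch of primal/tangent propagation (the class is reduced to `+` and `*` for brevity; a real system would cover the full operator set):

```python
class Dual:
    """A value carrying a primal and a tangent, propagated together."""
    def __init__(self, primal, tangent=0.0):
        self.primal, self.tangent = primal, tangent

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

    __rmul__ = __mul__

def f(x):
    return x * x * x + 2 * x  # f(x) = x^3 + 2x, so f'(x) = 3x^2 + 2

y = f(Dual(3.0, 1.0))        # seed the tangent with 1.0 to get df/dx
print(y.primal, y.tangent)   # 33.0 29.0
```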
This section studies reverse mode automatic differentiation through concrete examples. Each case has the same structure:
Automatic differentiation is easiest to define for pure functions. A pure function behaves like a mathematical mapping: it consumes inputs, produces outputs, and has no...
Physics-informed models combine data fitting with equations from physics or applied mathematics. The model is trained not only to match observed samples, but also to satisfy...
Automatic differentiation began as a numerical technique for computing gradients of scalar functions.
A minimal automatic differentiation engine can compute correct gradients on small programs. A production system must survive long-running workloads, large tensors, distributed...
Automatic differentiation works naturally on pure mathematical functions:
Automatic differentiation is a method for computing derivatives by transforming programs into derivative-propagating computations. It does not approximate derivatives...
Forward mode automatic differentiation appears in many numerical systems where directional derivatives, local sensitivities, or small parameter sets are important. This...
Reverse mode automatic differentiation is the mathematical and systems basis of backpropagation. In deep learning, the objective is usually a scalar loss depending on many...
Automatic differentiation is deeply connected to functional programming and lambda calculus. Programs can be viewed as mathematical functions, and differentiation can be...
Higher-order automatic differentiation faces a fundamental problem: derivative structure grows combinatorially with order.
Modern automatic differentiation systems are fundamentally tensor compiler systems. Their performance depends less on mathematical differentiation rules than on how...
Automatic differentiation interacts deeply with type systems because differentiation changes the structure of computation. A derivative operator maps one function into another...
Reinforcement learning studies learning systems that act in an environment. Unlike supervised learning, the training signal is not a target label for each input. The model...
Probabilistic programming represents uncertainty using executable probabilistic models. A probabilistic program defines a distribution rather than only a deterministic computation.
Differentiable systems architecture extends automatic differentiation beyond isolated functions and neural network layers. The central idea is to treat larger systems as...
Distributed gradient computation appears when a differentiable program no longer fits comfortably on one device or one machine. The reason may be model size, data volume,...
Automatic differentiation systems are usually trusted because they implement mathematically established rules such as the chain rule, product rule, and linearization of...
The preceding sections described automatic differentiation through algebraic, categorical, logical, and denotational models. These viewpoints converge on one central idea:
An automatic differentiation engine is only useful if its derivatives are correct. A small mistake in a backward rule can silently corrupt optimization, training, or...
The systems in this chapter show that automatic differentiation is not one implementation technique. It is a family of program transformations. Each system chooses a different...
A differentiable subprogram is a program fragment that can participate in derivative propagation as a coherent unit. Instead of differentiating an entire application...
Automatic differentiation can be understood as a transformation from one program into another program.
Many real-world Jacobians are sparse. Most derivative entries are zero because outputs depend only on small subsets of inputs.
Checkpointing is a technique for reducing the memory cost of reverse mode automatic differentiation by selectively storing intermediate states and recomputing missing values...
Perturbation confusion is a correctness bug that appears in nested automatic differentiation, especially nested forward mode. It happens when two derivative computations...
Programs do not only branch between valid computations. They also fail, stop early, raise exceptions, return sentinel values, or enter undefined numerical regions. These...
Most real computational problems are sparse. Large matrices and tensors often contain mostly zeros, structured blocks, or local interactions. Sparse representations reduce...
Swift became an important experiment in language-integrated automatic differentiation because it attempted to make differentiation a core compiler feature rather than a...
Meta-learning studies systems that improve how they learn. Instead of only optimizing model parameters for one task, a meta-learning method optimizes some part of the learning...
Robotics and control systems interact with the physical world through sensing, estimation, planning, and actuation. Automatic differentiation is important because modern...
A hybrid symbolic-numeric system combines discrete symbolic reasoning with continuous numerical computation. In the context of automatic differentiation, it means a pipeline...
Modern automatic differentiation systems are built around accelerator hardware. GPUs and TPUs provide enormous throughput for tensor operations, making large-scale...
Automatic differentiation began as a transformation applied to numerical programs. A differentiable programming language instead treats differentiation as a native semantic...
Operational semantics explains how automatic differentiation executes. Denotational semantics explains what differentiable programs mean.
Performance benchmarking measures whether an automatic differentiation engine is fast, memory-efficient, and scalable under realistic workloads. It also protects the engine...
Tinygrad is a small deep learning framework centered around a minimal reverse-mode automatic differentiation engine. It was created by George Hotz...
Differentiation describes how a function changes locally. A Taylor expansion extends this idea by approximating a function with a polynomial around a point.
Automatic differentiation became important because derivatives are required everywhere numerical models are optimized, controlled, calibrated, or analyzed. Once a system can...
Forward mode and reverse mode propagate different kinds of objects.
A pure computation is easier to differentiate because every output is determined only by its explicit inputs. There is no hidden state, no external mutation, and no dependence...
Automatic differentiation computes derivatives exactly with respect to the executed floating point program. This distinguishes AD from numerical differentiation, which...
Forward mode automatic differentiation computes Jacobian-vector products:
Reverse mode automatic differentiation is computationally efficient for scalar-output functions, but it has a major systems cost: it needs information from the forward pass...
Automatic differentiation can be described operationally through dual numbers and computational graphs. It can also be described abstractly using category theory.
Higher-order derivatives contain rich geometric information, but naïve computation quickly becomes impractical.
A stateful system is a program whose output depends not only on its explicit inputs, but also on stored state. The state may live in variables, objects, arrays, files, random...
The singular value decomposition (SVD) is one of the most important matrix factorizations in numerical linear algebra. It appears in dimensionality reduction, least squares,...
Julia was designed for high-performance technical computing. It combines interactive syntax with a compiler capable of specializing code aggressively based on types. This...
An implicit layer defines its output as the solution of an equation, not as a fixed sequence of explicit operations. Instead of computing
Signal processing studies how information is represented, transformed, filtered, compressed, reconstructed, and estimated from signals. A signal may be a time series, an...
A differentiable operating system is an execution environment whose resource-management decisions can be optimized using gradients or gradient-like feedback. Instead of...
Automatic differentiation is usually described as a transformation of programs or computational graphs. In real systems, it is also a parallel execution problem. Large...
Quantum computation introduces a computational model fundamentally different from classical programs.
Automatic differentiation systems are trusted infrastructure. Scientific computing, machine learning, optimization, simulation, and control systems depend on gradients being...
A custom gradient gives the user direct control over the backward rule of an operation. The forward computation still produces an ordinary value, but the derivative no longer...
Enzyme is a compiler-based automatic differentiation system for LLVM and MLIR. Instead of differentiating source code directly, or recording tensor operations at runtime,...
Automatic differentiation developed from a simple observation: a numerical computation already contains the structure needed to compute its derivative. The program evaluates...
Linearization is the operation of replacing a nonlinear function by its best local linear approximation at a chosen point. Automatic differentiation can be understood as a...
Automatic differentiation operates on computations, but computations execute inside a memory model. Variables occupy storage locations, arrays are mutated, buffers are reused,...
Automatic differentiation is fundamentally a computational technique. Its practical importance comes from the fact that derivatives can often be computed with asymptotic cost...
So far, forward mode has propagated a single tangent direction:
A Wengert list is a linear representation of a computation in which every intermediate result is assigned to a unique variable. It is one of the earliest and most influential...
Dual numbers and hyper-dual numbers are special cases of a broader algebraic structure called a differential algebra. This framework abstracts differentiation away from...
Taylor mode automatic differentiation computes derivatives by propagating truncated Taylor series through a program.
A non-smooth program contains operations where the derivative is undefined, discontinuous, set-valued, or unstable under small perturbations. These programs arise naturally in...
Eigenvalue problems are fundamental in numerical analysis, optimization, physics, graph methods, control theory, and machine learning. They are also among the most subtle...
Attention is a sequence operation that lets each position read information from other positions. Instead of compressing the whole past into one recurrent hidden state,...
Computational finance uses numerical models to price contracts, measure risk, and optimize portfolios. Automatic differentiation is useful because most financial computations...
A differentiable compiler is a compilation system that supports gradient propagation through compilation decisions, generated programs, or execution behavior. Instead of...
Automatic differentiation systems are often assumed to be deterministic. Given identical inputs, identical parameters, and identical code, many users expect identical...
Classical automatic differentiation computes derivatives of deterministic programs.
Automatic differentiation transforms programs. A fundamental semantic question therefore arises:
An automatic differentiation engine becomes useful only after it supports a sufficiently rich set of primitive operations. The collection of these primitives is the operator...
Zygote is a source-to-source reverse-mode automatic differentiation system for the Julia programming language. It was designed to differentiate high-level Julia code directly,...
Derivative computation is not only a mathematical problem. It is also a numerical and systems problem. A derivative method must answer three questions simultaneously:
A computational graph represents a calculation as nodes and edges. Nodes represent operations or values. Edges represent data dependencies. Automatic differentiation uses this...
Loops express repeated computation. Recurrence relations express the same idea mathematically: each state is computed from one or more earlier states.
Mixed-mode differentiation combines forward accumulation and reverse accumulation in the same derivative computation. It is used when neither pure forward mode nor pure...
Forward mode automatic differentiation has a simple cost model. It evaluates the original program and, at the same time, evaluates the tangent program. Each primitive...
Most reverse mode automatic differentiation systems require a mechanism for recording the forward computation so that the reverse pass can later traverse it backward. This...
Dual numbers compute first derivatives exactly. Truncated polynomial algebras extend this to higher-order derivatives, but practical higher-order differentiation introduces an...
Nested automatic differentiation means applying automatic differentiation inside another automatic differentiation computation.
A piecewise differentiable function is built from several differentiable pieces joined by boundaries. Each piece has an ordinary derivative inside its region. At the...
Matrix factorizations rewrite a matrix into structured factors. They are used because the factors make later computations cheaper, more stable, or easier to interpret. In...
Python became the dominant language for modern machine learning and differentiable computing because it combines a simple programming model with access to high-performance...
Sequence models process ordered data. The input is not one independent vector, but a series:
Molecular simulation models the behavior of atoms and molecules using physical interaction laws. Automatic differentiation is important because many molecular methods require...
Differentiable search and retrieval systems integrate information access into gradient-based learning. Instead of treating retrieval as an external symbolic operation, the...
Gradient-based optimization relies on propagating derivative information through many layers, time steps, or computational transformations. In deep systems, these gradients...
Classical neural networks apply a finite sequence of transformations:
Automatic differentiation becomes substantially more difficult once programs contain higher-order functions.
Memory management is the main systems problem in reverse mode automatic differentiation. The derivative rules are usually small. The hard part is deciding which primal values,...
JAX is an automatic differentiation and array programming system for Python. It combines NumPy-like syntax with composable program transformations. Its core transformations...
Automatic differentiation computes derivatives by applying the chain rule to the operations of a program. The input is ordinary code that computes a value. The output is code,...
The chain rule is the central theorem behind automatic differentiation. Every useful AD algorithm is a disciplined way of applying the chain rule to a program.
Control flow determines which operations a program executes. Straight-line programs have a fixed sequence of operations, but ordinary programs contain branches, loops,...
Reverse accumulation is the reverse-mode form of automatic differentiation. It propagates derivative information backward from outputs to inputs.
The natural output of forward mode automatic differentiation is a Jacobian-vector product. Instead of constructing the full Jacobian matrix explicitly, forward mode computes...
Reverse accumulation is the operational core of reverse mode automatic differentiation. The forward pass evaluates a program and records dependency information. The reverse...
Dual numbers capture first-order derivatives because the infinitesimal element satisfies
Reverse mode is efficient for scalar-output functions because it propagates one adjoint backward through the computation and produces a full gradient. For
A dynamic graph is a computation graph built while the program runs. Its structure depends on ordinary runtime values: branches, loop counts, recursive calls, tensor shapes,...
Linear algebra primitives are tensor operations with algebraic structure: matrix multiplication, triangular solves, factorizations, inverses, determinants, norms, and spectral...
Neural network training is the repeated application of three operations: evaluate a model, differentiate a scalar loss, and update parameters. Automatic differentiation...
Computational fluid dynamics studies fluid motion by solving discretized forms of the governing equations. Automatic differentiation enters CFD when we want gradients of...
A differentiable physics engine computes gradients of physical simulation outputs with respect to inputs, parameters, or control signals. Instead of treating simulation as a...
Reverse-mode automatic differentiation trades computation for memory. To compute gradients efficiently, the backward pass requires access to intermediate values produced...
Many systems evolve continuously over time rather than through discrete layers. A state variable changes according to a differential equation:
Cartesian differential categories model differentiation in categories with products. Differential categories generalize this idea further by shifting attention from cartesian...
A tape is an append-only record of the operations executed during the forward pass. Reverse mode uses the tape to replay derivative rules backward.
PyTorch Autograd is a dynamic reverse-mode automatic differentiation system. It records tensor operations as they execute, builds a computation graph at runtime, and then...
Symbolic differentiation computes derivatives by manipulating expressions. The input is a formula. The output is another formula.
The gradient is enough when a function has many inputs and one scalar output. More general programs need more general derivative objects. Two of the most important are the...
A dependency graph describes how values in a computation depend on earlier values. Automatic differentiation operates on these dependencies.
Forward accumulation is the forward-mode form of automatic differentiation. It propagates derivative information in the same order as ordinary program evaluation. Each...
Forward mode automatic differentiation works by replacing each primitive operation with an extended operation on pairs:
Reverse mode automatic differentiation fundamentally computes vector-Jacobian products. The gradient of a scalar function is a special case of this more general operation.
Dual numbers provide an algebraic mechanism for differentiation, but they also have a precise geometric meaning. A dual number represents a point together with an...
A Hessian-vector product computes
Recursion is control flow where a function calls itself. In automatic differentiation, recursion behaves like a loop with a call stack. Each recursive call contributes one...
Broadcasting is the rule system that allows tensor operations between arrays of different shapes without explicitly materializing expanded copies. It is one of the most...
Differentiable programming treats differentiation as a general programming-language feature. A program can contain numerical kernels, control flow, data structures, solvers,...
Backpropagation is reverse mode automatic differentiation applied to neural networks. In most machine learning writing, the term refers to the whole training procedure: run a...
An inverse problem asks for causes from effects. A forward model predicts observations from parameters. An inverse model tries to recover parameters from observations.
Differentiable rendering is the process of computing derivatives of rendered images with respect to scene parameters. A renderer becomes part of the computational graph rather...
Floating point systems represent numbers within a finite range. When a computed value exceeds the largest representable magnitude, overflow occurs. When a value becomes too...
An optimization layer is a program component whose output is the solution of an optimization problem. Instead of computing
Algebraic semantics describes differentiation through derivations, tangent maps, and linear structure. Categorical semantics goes further. It studies differentiation as a...
A graph representation makes the structure of a differentiated computation explicit. In reverse mode, this structure is required because the backward pass must know which...
TensorFlow Autograd refers to TensorFlow’s automatic differentiation system, mainly exposed through tf.GradientTape. It is a reverse-mode AD system designed for tensor...
Numerical differentiation estimates derivatives by evaluating a function at nearby input values. It treats the function as a black box. The method does not need access to the...
Automatic differentiation is usually applied to functions with many inputs and many outputs. The calculus needed for this setting is multivariate calculus: the study of how a...
Intermediate variables are the named values created between program inputs and program outputs. They make automatic differentiation mechanical.
Automatic differentiation reduces differentiation to a finite collection of elementary operations. Every program, regardless of complexity, is decomposed into primitive...
Dual numbers give forward mode automatic differentiation a compact algebraic form. Instead of storing a value and a tangent as two unrelated fields, we package them into one...
Reverse mode automatic differentiation operates on a computational graph. The forward pass evaluates the graph from inputs to outputs. The reverse pass traverses the same...
The defining feature of dual numbers is the existence of a nonzero element whose square vanishes:
For a scalar function
A loop repeats a computation until a condition fails or a fixed iteration count is reached. In automatic differentiation, loops are important because many numerical algorithms...
Tensor operations generalize scalar, vector, and matrix operations to arrays with arbitrary rank. In automatic differentiation, a tensor is usually treated as a typed array...
Functional programming languages provide a natural semantic foundation for automatic differentiation. Programs are expressed as compositions of functions, immutable values,...
Stochastic optimization studies optimization when the objective is accessed through samples, noisy estimates, or partial observations. In machine learning, this is the normal...
Sensitivity analysis studies how changes in inputs affect the outputs of a system. In differential equations, optimization, simulation, and machine learning, the main object...
A differentiable database is a data system whose operations participate in gradient-based optimization. Instead of treating storage and querying as external infrastructure,...
Reverse mode automatic differentiation computes gradients by propagating adjoint values backward through a computational graph. In exact arithmetic, the reverse accumulation...
A solver is a program that computes a value by search, iteration, or factorization. Instead of evaluating a closed-form expression, it finds a value that satisfies a condition.
Automatic differentiation is often introduced operationally. A program executes elementary operations, and derivative information propagates alongside the computation. This...
Reverse mode automatic differentiation computes derivatives by traversing the program backward after evaluation. Unlike forward mode, which propagates tangents alongside...
Tapenade is a source-transformation automatic differentiation system developed at INRIA. Like ADIFOR, it takes an existing program and produces a new differentiated program....
A derivative measures how an output changes when an input changes. That sentence is simple, but it is one of the main ideas behind numerical computing, optimization, machine...
Automatic differentiation begins with a simple object: a function.
A straight-line program is the simplest model of computation used in automatic differentiation. It is a program with a fixed sequence of assignments, no branches, no loops,...
Automatic differentiation is built on a simple observation: a complicated derivative can be computed by composing many small local derivatives. Instead of manipulating a full...
Forward mode automatic differentiation computes derivatives by carrying two values through a program at the same time: the ordinary value and its tangent. The ordinary value...
Reverse mode automatic differentiation computes derivatives by propagating sensitivities backward through a computation. In forward mode, each intermediate value carries a...
Dual numbers give the cleanest algebraic model of forward mode automatic differentiation. They extend ordinary real numbers with a formal infinitesimal part. Instead of...
First derivatives describe local rate of change. Second derivatives describe how that rate of change itself changes. In optimization, this is curvature. In dynamics, it is...
A conditional is a program construct that chooses one computation among several possible computations. In ordinary code, this is written as if, else, switch, case, pattern...
Matrix calculus is the notation and rule system used to differentiate functions whose inputs, outputs, or intermediate values are vectors, matrices, or tensors. Automatic...
Gradient descent is the basic optimization procedure behind much of modern machine learning. It is simple enough to state in one line, but rich enough to expose many of the...
Differential equations are one of the main reasons automatic differentiation matters in scientific computing. Many scientific models are not written as closed-form functions....
An end-to-end differentiable pipeline is a system whose final objective can send derivative information backward through every trainable or tunable stage of computation....
Automatic differentiation computes derivatives by executing arithmetic. On a real machine, arithmetic uses finite precision. This means AD gives the derivative of the...
Many programs do not compute their output by applying a fixed sequence of explicit operations. Instead, they define the output as the solution of another problem.
Automatic differentiation is often described by a simple rule:
A minimal forward mode automatic differentiation engine has one job: evaluate a program while carrying both a value and its derivative. The engine does not build a graph. It...
ADIFOR, short for Automatic Differentiation of Fortran, is one of the classical source-transformation systems for automatic differentiation. It was designed for numerical...
Sparse and structured differentiation studies how to compute derivatives without materializing dense derivative objects. Many real systems have enormous Jacobians and...
Automatic differentiation works naturally on pure mathematical functions:
Auto Diff book notes exported from ChatGPT, organized into 22 chapters.
Lisp is one of the natural homes of automatic differentiation. It treats programs as data, has a simple expression syntax, and supports macro systems that can transform code...
Source transformation is an implementation strategy for automatic differentiation in which a program that computes a function is rewritten into another program that computes...
Automatic differentiation can be performed before a program runs, while it runs, or in a staged phase between the two.
Kernel fusion combines several small operations into one larger executable unit.
Memory planning determines where values are stored, how long they remain alive, and when storage can be reused.
Staging is the separation of a program into phases.
Tracing is an implementation strategy where an AD system observes a program while it runs and records the operations that occur.
Rust is an attractive language for automatic differentiation because it combines low-level performance with strong static guarantees. It gives the programmer control over...
A graph intermediate representation models a program as nodes and edges.
Static single assignment form, or SSA, is an intermediate representation where each variable is assigned exactly once.
C and C++ are important targets for automatic differentiation because much scientific, engineering, graphics, finance, and machine learning infrastructure is written in these...
An intermediate representation, or IR, is the internal program form used by a compiler or AD system after parsing and before final code generation.
Operator overloading implements automatic differentiation by changing the meaning of ordinary arithmetic operations for special numeric objects.
Reference material: set theory and relations, proof techniques, linear algebra for graph theory, probability review, algorithms and complexity, mathematical notation, historical development, common graph families, theorem index, and symbol index.
Bipartite, complete, and regular graphs, interval and chordal graphs, comparability and perfect graphs, tournaments, grid graphs, De Bruijn graphs, Kneser graphs, the Petersen graph, and Ramanujan graphs.
Social networks, web graphs, PageRank, biological and chemical networks, electrical and transportation networks, compiler graphs, knowledge graphs, recommendation systems, graph neural networks, distributed systems, and blockchain networks.
Extremal graph theory, Turán-type problems, Szemerédi regularity lemma, minor theory, Robertson-Seymour theory, infinite graphs, topological and category-theoretic graph theory, simplicial complexes, graph limits, temporal graphs, and quantum graph theory.
Complexity classes, NP-complete graph problems, Hamiltonian paths and cycles, the traveling salesman problem, graph isomorphism, parameterized complexity, fixed-parameter tractability, and approximation hardness.
Graph traversal, BFS, DFS, shortest paths, Dijkstra, Bellman-Ford, Floyd-Warshall, network flow, maximum flow, minimum cut, matching algorithms, union-find, dynamic graph algorithms, approximation, and randomized algorithms.
Random graph models, Erdős-Rényi graphs, threshold phenomena, small-world and scale-free networks, preferential attachment, percolation, probabilistic methods, and random processes on networks.
Adjacency, incidence, and Laplacian matrices, graph spectrum, eigenvalues and eigenvectors, spectral theorems, algebraic connectivity, expander graphs, graph energy, Cayley graphs, and automorphism groups.
Vertex and edge coloring, chromatic number, chromatic polynomial, greedy coloring, Brooks' theorem, perfect graphs, Ramsey theory, list coloring, and fractional coloring.
Matchings, perfect matchings, Hall's marriage theorem, bipartite matching, maximum matching algorithms, vertex and edge covers, independent sets, cliques, and dominating sets.
Vertex and edge connectivity, cuts, bridges, articulation points, Menger's theorem, network reliability, expanders, separators, and random walks on graphs.
Planar graphs, plane embeddings, Euler's formula, faces, Kuratowski's theorem, coloring of planar graphs, dual graphs, geometric graphs, and connections to computational geometry.
Trees, forests, rooted and binary trees, spanning trees, minimum spanning trees, Prüfer sequences, tree traversal, tree decomposition, and applications of trees.
Directed graphs, in-degree and out-degree, directed paths and cycles, strong connectivity, DAGs, topological ordering, weighted graphs, multigraphs, hypergraphs, and labeled graphs.
A comprehensive reference covering graph theory from elementary definitions through structural theory, algorithms, algebraic methods, probabilistic models, computational complexity, and modern applications in fourteen parts with appendices.
Basic definitions, vertices and edges, graph representations, degree, walks, paths, cycles, connectivity, isomorphism, subgraphs, and classes of graphs — the bedrock of graph theory.
A comprehensive book covering linear algebra from foundations through spectral theory, matrix decompositions, numerical methods, and modern applications in ten parts with appendices.
Reference material: set theory, proof techniques, real and complex numbers, polynomial algebra, calculus review, numerical computation, notation, historical notes, glossary, and index.
Complex vector spaces, finite fields, modules, category theory, convex geometry, random matrices, operator theory, spectral graph theory, compressed sensing, tensor decompositions, geometric algebra, and AI applications.
Linear regression, optimization, graphs, Markov chains, differential equations, Fourier transforms, signal processing, computer graphics, robotics, quantum mechanics, machine learning, PCA, and more.
Tensor products, exterior and symmetric algebras, multilinear maps, bilinear forms, Clifford algebras, Lie algebras, representation theory, and infinite-dimensional spaces.
Floating point arithmetic, conditioning, stability, iterative solvers, Jacobi, Gauss-Seidel, conjugate gradient, Krylov subspaces, QR algorithm, sparse and randomized methods.
LU, PLU, Cholesky, QR, Schur, SVD, polar, Hessenberg, tridiagonalization, and canonical matrix forms.
Eigenvalues, eigenvectors, diagonalization, the spectral theorem, Jordan canonical form, Cayley-Hamilton, matrix functions, and Perron-Frobenius theory.
Inner products, norms, orthogonality, Gram-Schmidt, orthogonal projections, least squares, QR factorization, Hermitian spaces, and quadratic forms.
Linear maps, kernel and image, matrix representation, isomorphisms, projections, reflections, rotations, similarity, and invariant subspaces.
Abstract vector spaces, subspaces, span, linear independence, basis, dimension, coordinate systems, dual spaces, and direct sums.
Scalars, vectors, matrices, linear equations, Gaussian elimination, determinants, and matrix factorizations — the bedrock of linear algebra.