Enzyme

Enzyme is a compiler-based automatic differentiation system for LLVM and MLIR. Instead of differentiating source code directly, or recording tensor operations at runtime, Enzyme works on compiler intermediate representation.

This design gives Enzyme a distinct position among AD systems. ADIFOR and Tapenade transform source programs. PyTorch records a dynamic tape. JAX traces array programs. Zygote transforms Julia IR. Enzyme differentiates low-level optimized IR produced by many frontends.

The project describes Enzyme as an AD plugin for statically analyzable LLVM and MLIR. Because many languages lower to LLVM, Enzyme can in principle support C, C++, Fortran, Julia, Rust, Swift, Python frontends, and other LLVM-compatible systems through one differentiation engine.

AD at the Compiler IR Level

A compiler typically lowers source code through several stages:

source language
  -> frontend IR
  -> LLVM IR or MLIR
  -> optimized IR
  -> machine code

Enzyme inserts automatic differentiation into this pipeline. It analyzes a function in LLVM IR or MLIR and synthesizes a derivative function.

Conceptually, a call such as:

double y = f(x);

can be transformed into code that computes a derivative of f with respect to x.

Enzyme is often invoked through marker calls such as __enzyme_autodiff. The compiler pass later replaces those calls with generated derivative code.

Why Differentiate Optimized IR

A central argument behind Enzyme is that differentiating after compiler optimization can produce better derivative programs.

Traditional source-transformation tools often differentiate before the compiler sees the program. The compiler then optimizes both primal and derivative code. Enzyme can instead let standard compiler passes simplify the primal program first, then differentiate the optimized representation.

This matters because source programs often contain abstraction overhead, temporary values, dead computations, redundant loads, and library abstractions. If these are removed before AD, the derivative program can be smaller and faster.

The Enzyme paper reports that differentiating optimized IR gave a geometric-mean speedup of 4.2x over differentiating unoptimized IR on its benchmark suite.

Reverse Mode

Enzyme is best known for reverse-mode AD. For a scalar-valued function

L = f(x),

reverse mode computes the gradient:

\bar{x} = \frac{\partial L}{\partial x}.

At the IR level, this means Enzyme must transform loads, stores, arithmetic operations, branches, function calls, memory allocation, and compiler intrinsics into corresponding adjoint computations.

For a simple source statement:

z = x * y;

the mathematical adjoint rule is:

x_bar += z_bar * y
y_bar += z_bar * x

At LLVM IR level, the operation may no longer look like a neat source-level assignment. It may be a sequence of instructions involving registers, memory, and optimized control flow. Enzyme performs AD over that lowered representation.

Activity Analysis

Like Tapenade and ADIFOR, Enzyme needs activity analysis. An instruction or value is active if it depends on differentiable inputs and can affect differentiable outputs.

At the IR level, activity analysis must reason about:

| Object | Question |
| --- | --- |
| SSA values | Does this value carry derivative information? |
| memory locations | Does this allocation or pointer contain active data? |
| loads and stores | Does this memory operation read or write active state? |
| function calls | Does this callee need a derivative rule? |
| control flow | Does branch structure affect active computation? |

This analysis is harder than in a pure tensor graph because LLVM IR exposes general-purpose program behavior. It includes pointer operations, mutation, aliasing, and low-level memory effects.

Memory and Mutation

Reverse mode needs forward-pass values during the backward pass. In high-level AD systems, these values are often saved on a tape. In compiler AD, the same issue appears as memory management for adjoint computation.

Enzyme must decide which values to preserve, which values to recompute, and how to handle overwritten memory.

For code such as:

double t = x * y;
double u = sin(t);

the backward pass needs x, y, and often t.

For code with mutation:

a[i] = a[i] + x;

the reverse pass may need the old value of a[i], the index i, and the control path that reached the store. This becomes more complex with aliasing, pointer arithmetic, and external function calls.

Enzyme’s compiler-level position gives it access to alias analysis, memory SSA, optimization passes, and other compiler infrastructure. That is a major advantage over a purely syntactic source transformer.

Cross-Language Differentiation

Because Enzyme operates below the source language, it can differentiate code from multiple languages that compile to LLVM IR.

This is one of its main architectural contributions. A numerical kernel written in C, called from Julia, and compiled through LLVM can be differentiated without rewriting the kernel in a machine learning framework.

The Enzyme project explicitly targets existing code, including scientific and machine-learning workloads, by synthesizing gradients of statically analyzable programs expressed in LLVM IR.

This addresses a practical problem: many valuable numerical codes were not written in PyTorch, TensorFlow, or JAX. They may be legacy Fortran, C++, simulation kernels, or HPC libraries. Enzyme gives these programs a path into differentiable workflows.

Parallel and Accelerator Code

Enzyme is also significant because it targets parallel programs and accelerator kernels. Work on Enzyme has studied reverse-mode AD for OpenMP, MPI, RAJA, Julia tasks, and GPU kernels.

This is difficult because reverse mode over parallelism must respect synchronization, reductions, memory sharing, and communication.

For example, differentiating a parallel reduction requires the adjoint program to distribute the output cotangent back to each contributing input. Differentiating MPI communication requires reasoning about data movement between processes. Differentiating GPU kernels requires generating efficient adjoint kernels, not merely scalar host code.

This places Enzyme closer to HPC compiler research than to ordinary neural network autograd.

Custom Rules and External Functions

No AD system can automatically differentiate every external function. Enzyme still needs derivative rules or annotations for operations whose semantics are opaque at the IR level.

Examples include:

| Case | Requirement |
| --- | --- |
| math library calls | known derivative formulas |
| BLAS or LAPACK calls | custom adjoints for linear algebra kernels |
| external C functions | derivative rule or activity annotation |
| nondifferentiable operations | explicit handling or rejection |
| I/O and system calls | usually passive or unsupported |

The IR-level approach reduces language duplication, but it does not remove the need for semantic knowledge. The compiler can see instructions, but it may not know the mathematical meaning of every external call unless a rule is supplied.

Strengths

Enzyme’s main strength is its placement in the compiler stack. By working on LLVM and MLIR, it can reuse compiler analyses and optimizations while serving many source languages.

It can differentiate code that would be awkward or impossible to rewrite in a tensor framework. This includes legacy scientific code, low-level kernels, mixed-language applications, and performance-critical routines.

It also fits high-performance computing. Many scientific codes already depend on LLVM-compatible toolchains, aggressive optimization, parallel execution, and accelerator backends. Enzyme meets these programs closer to their existing compilation model.

A further strength is performance. Differentiating optimized IR can remove unnecessary derivative work before it is generated, and the resulting derivative code can pass through the rest of the compiler pipeline.

Limitations

Enzyme works best on statically analyzable programs. Highly dynamic behavior, opaque external calls, complex aliasing, undefined behavior, and runtime reflection can block or complicate differentiation.

The low-level IR view can also lose source-level intent. A compiler IR exposes operations precisely, but it may obscure higher-level mathematical structure. Sometimes a high-level custom rule for an operation is better than differentiating its low-level implementation instruction by instruction.

Debugging can be difficult. When a gradient is wrong or unsupported, the user may need to understand source code, compiler IR, AD annotations, and generated derivative code.

Like all reverse-mode systems, Enzyme faces memory pressure. Large programs may require careful recomputation, caching, or checkpointing strategies.

Historical Role

Enzyme represents a major compiler-centric line of automatic differentiation. It treats AD as a transformation over optimized program IR rather than as a feature of one array library or one source language.

This changes the scope of AD. Instead of asking users to move their code into a differentiable framework, Enzyme attempts to bring differentiation to existing compiled programs.

Its historical importance is therefore clear: Enzyme makes automatic differentiation part of the general compiler toolchain. That connects AD to systems programming, HPC, accelerator compilation, and multi-language numerical software.