AD in C and C++

C and C++ are important targets for automatic differentiation because much scientific, engineering, graphics, finance, and machine learning infrastructure is written in these languages. They provide direct control over memory layout, compilation, vectorization, and hardware-specific optimization. They also make AD difficult because programs can contain pointer aliasing, mutation, templates, manual memory management, macros, and calls into external libraries.

In C and C++, automatic differentiation is less a single technique than a family of implementation strategies. The main approaches are operator overloading, source transformation, and compiler intermediate representation transformation.

Operator Overloading in C++

C++ supports operator overloading, so forward mode can be implemented by replacing ordinary scalar types with derivative-aware scalar types.

A simple dual-number type looks like this:

struct Dual {
    double x;   // primal value
    double dx;  // tangent value
};

Dual operator+(Dual a, Dual b) {
    return {a.x + b.x, a.dx + b.dx};
}

Dual operator*(Dual a, Dual b) {
    return {
        a.x * b.x,
        a.dx * b.x + a.x * b.dx
    };
}

A function written generically can then run over either double or Dual:

template <class T>
T f(T x) {
    return (x + T{1}) * sin(x);
}

With suitable overloads for sin, evaluating f(Dual{x, 1}) returns both the function value and the derivative with respect to x.

This works well for small to medium programs. It also composes naturally with templates, expression templates, and generic numeric code.

Reverse Mode with Tapes

Reverse mode in C++ is commonly implemented with a tape. During the forward pass, each primitive operation records enough information to propagate adjoints backward.

A variable type may contain:

struct Var {
    double value;
    int tape_id;
};

The tape stores operations:

struct Node {
    double value;
    double adjoint;
    int left;
    int right;
    Op op;
};

For multiplication, the forward pass stores the input references and the output value. The reverse pass applies:

adjoint[left]  += adjoint[out] * value[right];
adjoint[right] += adjoint[out] * value[left];

Tape-based reverse mode is flexible. It supports dynamic control flow because it records the operations that actually ran. It also has costs: allocation, tape memory, pointer chasing, and limited compiler visibility.

Source Transformation

Source transformation tools parse C or C++ code and generate differentiated code. Instead of overloading arithmetic operators, the tool rewrites the program.

For example, a primal function:

double f(double x) {
    return (x + 1.0) * sin(x);
}

may be transformed into:

void f_d(double x, double x_d, double* y, double* y_d) {
    double v1 = x + 1.0;
    double v1_d = x_d;

    double v2 = sin(x);
    double v2_d = cos(x) * x_d;

    *y = v1 * v2;
    *y_d = v1_d * v2 + v1 * v2_d;
}

This approach can produce efficient plain C or C++ code. It is especially useful in scientific computing, where generated derivative code may be compiled by standard toolchains and integrated into larger solvers.

The difficulty is language coverage. Full C++ is large. Templates, overload resolution, macros, exceptions, virtual dispatch, and undefined behavior make source transformation difficult.

Compiler IR Differentiation

A newer strategy differentiates compiler intermediate representations, especially LLVM IR. The AD tool operates after frontend parsing and type checking.

This has several advantages:

| Benefit | Explanation |
| --- | --- |
| Language coverage | C, C++, Rust, Julia, and Fortran frontends can lower to LLVM |
| Optimization reuse | Existing compiler passes optimize primal and derivative code |
| Library visibility | Inlined code can be differentiated after lowering |
| Hardware targeting | LLVM already supports many CPU and accelerator targets |

The cost is that high-level source information may be lost. Arrays, loops, and mathematical intent are represented as low-level control flow and memory operations. The AD pass must understand loads, stores, aliasing, calls, and activity analysis.

Activity Analysis

Activity analysis determines which values influence derivatives.

A value is active if perturbing it can affect the differentiated output. A value is passive if it has no derivative relevance.

Example:

double f(double x, int n) {
    return n * x * x;
}

Here x is active. n is usually passive because it is an integer control or scale parameter. The derivative with respect to x depends on n, but n itself does not receive a derivative.

Activity analysis reduces work. It prevents the AD system from propagating adjoints through irrelevant values.

Pointer Aliasing

Pointer aliasing is one of the hard parts of AD in C and C++.

Consider:

void axpy(double* y, const double* x, double a, int n) {
    for (int i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}

If x and y point to overlapping memory, the semantics differ from the non-overlapping case. Reverse mode must respect the exact order of reads and writes.

The AD system needs alias information. It may rely on:

| Mechanism | Role |
| --- | --- |
| restrict in C | Promises non-aliasing pointers |
| const qualifiers | Marks read-only data |
| Compiler alias analysis | Infers possible overlap |
| Runtime checks | Selects a safe derivative path |
| Conservative fallback | Stores more intermediate state |

Without correct alias handling, reverse-mode derivatives can be wrong.

Mutation and Reverse Mode

C and C++ programs frequently update arrays in place. Reverse mode must invert the effect of these updates on adjoints.

For assignment:

x[i] = y;

the old value of x[i] may be needed during the reverse pass: operations that read x[i] before the overwrite are differentiated after it in reverse order, and their partial derivatives may depend on the old value. The AD system may need to store snapshots or record enough information to replay the program.

For accumulation:

x[i] += y;

the reverse pass adds the adjoint of x[i] into the adjoint of y while leaving the adjoint of x[i] unchanged; applied across many indices, this resembles a scatter-add.

In array-heavy code, these rules dominate performance. Efficient AD for C and C++ must treat memory operations as first-class derivative operations.

Templates and Expression Templates

C++ templates can help AD. A numeric function templated over scalar type can work with double, float, dual numbers, intervals, arbitrary precision values, or AD variables.

template <typename Scalar>
Scalar energy(const Scalar* x, int n) {
    Scalar e = 0;
    for (int i = 0; i < n; i++) {
        e += x[i] * x[i];
    }
    return e;
}

This style gives forward-mode AD almost for free.

Expression templates go further. They delay evaluation and build expression trees at compile time. AD libraries can exploit these trees to reduce temporaries and fuse operations.

The downsides are longer compile times, complex error messages, and fragile interaction with large codebases.

Custom Adjoints for Libraries

C and C++ programs often call BLAS, LAPACK, FFT libraries, sparse solvers, graphics kernels, and vendor APIs. AD systems rarely differentiate through these binaries directly.

Instead, they use custom derivative rules.

| Library operation | Derivative treatment |
| --- | --- |
| Matrix multiply | Transposed matrix products |
| Linear solve | Solve an adjoint linear system |
| Cholesky | Structured triangular derivative |
| FFT | Linear adjoint transform |
| Convolution | Correlation-like adjoint kernels |
| ODE solver | Sensitivity or adjoint method |

Custom adjoints are essential for performance. Differentiating the implementation of a dense matrix multiply loop would lose optimized BLAS performance. Calling optimized adjoint kernels preserves the performance model.

Memory Management

Manual memory management complicates AD.

Reverse mode extends object lifetimes because intermediate values may be needed after the forward pass. A buffer that can be freed in ordinary execution may need to remain alive until the reverse pass.

C++ RAII helps, but AD systems must still manage:

| Concern | AD consequence |
| --- | --- |
| Stack allocation | Values may disappear before the reverse pass |
| Heap allocation | Tape must own or reference values safely |
| Move semantics | Active values need a valid adjoint identity |
| Destructors | Side effects may occur during unwinding |
| Custom allocators | Tape and adjoint storage must integrate cleanly |

High-performance systems often use arena allocation for tapes and adjoints.

Exceptions and Undefined Behavior

C and C++ contain behavior that is difficult to differentiate safely.

Exceptions create non-local control flow. Reverse mode must know which operations completed before the exception. In many AD systems, differentiable regions are expected to avoid exceptions.

Undefined behavior is worse. If the primal program has undefined behavior, the derivative program has no reliable meaning. Examples include out-of-bounds access, use-after-free, signed integer overflow, and invalid aliasing assumptions.

A robust AD tool cannot repair undefined primal semantics.

Performance Model

C and C++ AD performance depends heavily on the implementation strategy.

| Approach | Strength | Weakness |
| --- | --- | --- |
| Forward operator overloading | Simple, generic, good for few inputs | Expensive for many-input, scalar-output gradients |
| Reverse tape | Good for scalar losses and dynamic control flow | Tape memory and runtime overhead |
| Source transformation | Efficient generated code | Hard language coverage |
| LLVM IR transformation | Broad frontend support | Low-level alias and memory complexity |
| Custom adjoints | High performance for libraries | Requires manual rules |

Production systems often combine these methods.

AD Tools in C and C++

Representative tools include:

| Tool | Main approach |
| --- | --- |
| ADOL-C | Operator overloading and tape-based AD |
| CppAD | Operator overloading with tapes |
| Adept | Expression templates and reverse mode |
| CoDiPack | Operator overloading for scientific computing |
| Stan Math | Reverse-mode autodiff for probabilistic modeling |
| Tapenade | Source transformation, especially C and Fortran |
| Enzyme | LLVM IR-level automatic differentiation |

Each tool reflects a different answer to the same question: should differentiation happen at the library level, the source level, or the compiler level?

Practical Design Guidance

For C and C++ systems, the AD strategy should follow the workload.

Use forward-mode operator overloading when the number of independent variables is small, templates are already used, and development speed matters.

Use reverse-mode tape systems when computing gradients of scalar objectives with many parameters, especially when the program has dynamic control flow.

Use source transformation or compiler-level AD when performance matters and the codebase can tolerate a more structured build pipeline.

Use custom adjoints for large library primitives. This is usually mandatory for linear algebra, solvers, FFTs, and accelerator kernels.

The central engineering problem is not the derivative rule for multiplication. It is preserving C and C++ program semantics while producing derivative code that remains fast, memory-safe, and compatible with existing numerical libraries.