AD in C and C++

C and C++ are important targets for automatic differentiation because much scientific, engineering, graphics, finance, and machine learning infrastructure is written in these languages. They provide direct control over memory layout, compilation, vectorization, and hardware-specific optimization. They also make AD difficult because programs can contain pointer aliasing, mutation, templates, manual memory management, macros, and calls into external libraries.

In C and C++, automatic differentiation is less a single technique than a family of implementation strategies. The main approaches are operator overloading, source transformation, and compiler intermediate representation transformation.

Operator Overloading in C++

C++ supports operator overloading, so forward mode can be implemented by replacing ordinary scalar types with derivative-aware scalar types.

A simple dual-number type looks like this:

struct Dual {
    double x;   // primal value
    double dx;  // tangent value
};

Dual operator+(Dual a, Dual b) {
    return {a.x + b.x, a.dx + b.dx};
}

Dual operator*(Dual a, Dual b) {
    return {
        a.x * b.x,
        a.dx * b.x + a.x * b.dx
    };
}

A function written generically can then run over either double or Dual:

template <class T>
T f(T x) {
    return (x + T{1}) * sin(x);
}

With suitable overloads for sin, evaluating f(Dual{x, 1}) returns both the function value and the derivative with respect to x.

This works well for small to medium programs. It also composes naturally with templates, expression templates, and generic numeric code.

Reverse Mode with Tapes

Reverse mode in C++ is commonly implemented with a tape. During the forward pass, each primitive operation records enough information to propagate adjoints backward.

A variable type may contain:

struct Var {
    double value;
    int tape_id;
};

The tape stores operations:

struct Node {
    double value;
    double adjoint;
    int left;
    int right;
    Op op;
};

For multiplication, the forward pass stores the input references and the output value. The reverse pass applies:

adjoint[left]  += adjoint[out] * value[right];
adjoint[right] += adjoint[out] * value[left];

Tape-based reverse mode is flexible. It supports dynamic control flow because it records the operations that actually ran. It also has costs: allocation, tape memory, pointer chasing, and limited compiler visibility.

Source Transformation

Source transformation tools parse C or C++ code and generate differentiated code. Instead of overloading arithmetic operators, the tool rewrites the program.

For example, a primal function:

double f(double x) {
    return (x + 1.0) * sin(x);
}

may be transformed into:

void f_d(double x, double x_d, double* y, double* y_d) {
    double v1 = x + 1.0;
    double v1_d = x_d;

    double v2 = sin(x);
    double v2_d = cos(x) * x_d;

    *y = v1 * v2;
    *y_d = v1_d * v2 + v1 * v2_d;
}

This approach can produce efficient plain C or C++ code. It is especially useful in scientific computing, where generated derivative code may be compiled by standard toolchains and integrated into larger solvers.

The difficulty is language coverage. Full C++ is large. Templates, overload resolution, macros, exceptions, virtual dispatch, and undefined behavior make source transformation difficult.

Compiler IR Differentiation

A newer strategy differentiates compiler intermediate representations, especially LLVM IR. The AD tool operates after frontend parsing and type checking.

This has several advantages:

| Benefit | Explanation |
| --- | --- |
| Language coverage | C, C++, Rust, Julia, and Fortran frontends can lower to LLVM |
| Optimization reuse | Existing compiler passes optimize primal and derivative code |
| Library visibility | Inlined code can be differentiated after lowering |
| Hardware targeting | LLVM already supports many CPU and accelerator targets |

The cost is that high-level source information may be lost. Arrays, loops, and mathematical intent are represented as low-level control flow and memory operations. The AD pass must understand loads, stores, aliasing, calls, and activity analysis.

Activity Analysis

Activity analysis determines which values influence derivatives.

A value is active if perturbing it can affect the differentiated output. A value is passive if it has no derivative relevance.

Example:

double f(double x, int n) {
    return n * x * x;
}

Here x is active. n is usually passive because it is an integer control or scale parameter. The derivative with respect to x depends on n, but n itself does not receive a derivative.

Activity analysis reduces work. It prevents the AD system from propagating adjoints through irrelevant values.

Pointer Aliasing

Pointer aliasing is one of the hard parts of AD in C and C++.

Consider:

void axpy(double* y, const double* x, double a, int n) {
    for (int i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}

If x and y point to overlapping memory, the semantics differ from the non-overlapping case. Reverse mode must respect the exact order of reads and writes.

The AD system needs alias information. It may rely on:

| Mechanism | Role |
| --- | --- |
| restrict in C | Promises non-aliasing pointers |
| const qualifiers | Marks read-only data |
| Compiler alias analysis | Infers possible overlap |
| Runtime checks | Selects a safe derivative path |
| Conservative fallback | Stores more intermediate state |

Without correct alias handling, reverse-mode derivatives can be wrong.

Mutation and Reverse Mode

C and C++ programs frequently update arrays in place. Reverse mode must invert the effect of these updates on adjoints.

For assignment:

x[i] = y;

the old value of x[i] may be needed during the reverse pass: operations that read x[i] before the overwrite are differentiated after it in reverse order, and their partial derivatives may depend on the old value. The AD system may need to store snapshots or record enough information to replay the program.

For accumulation:

x[i] += y;

the reverse pass adds the adjoint of x[i] into the adjoint of y while leaving the adjoint of x[i] unchanged; applied across many indices, this resembles a scatter-add.

In array-heavy code, these rules dominate performance. Efficient AD for C and C++ must treat memory operations as first-class derivative operations.

Templates and Expression Templates

C++ templates can help AD. A numeric function templated over scalar type can work with double, float, dual numbers, intervals, arbitrary precision values, or AD variables.

template <typename Scalar>
Scalar energy(const Scalar* x, int n) {
    Scalar e = 0;
    for (int i = 0; i < n; i++) {
        e += x[i] * x[i];
    }
    return e;
}

This style gives forward-mode AD almost for free.

Expression templates go further. They delay evaluation and build expression trees at compile time. AD libraries can exploit these trees to reduce temporaries and fuse operations.

The downsides are longer compile times, complex error messages, and fragile interaction with large codebases.

Custom Adjoints for Libraries

C and C++ programs often call BLAS, LAPACK, FFT libraries, sparse solvers, graphics kernels, and vendor APIs. AD systems rarely differentiate through these binaries directly.

Instead, they use custom derivative rules.

| Library operation | Derivative treatment |
| --- | --- |
| Matrix multiply | Transposed matrix products |
| Linear solve | Solve an adjoint linear system |
| Cholesky | Structured triangular derivative |
| FFT | Linear adjoint transform |
| Convolution | Correlation-like adjoint kernels |
| ODE solver | Sensitivity or adjoint method |

Custom adjoints are essential for performance. Differentiating the implementation of a dense matrix multiply loop would lose optimized BLAS performance. Calling optimized adjoint kernels preserves the performance model.

Memory Management

Manual memory management complicates AD.

Reverse mode extends object lifetimes because intermediate values may be needed after the forward pass. A buffer that can be freed in ordinary execution may need to remain alive until the reverse pass.

C++ RAII helps, but AD systems must still manage:

| Concern | AD consequence |
| --- | --- |
| Stack allocation | Values may disappear before the reverse pass |
| Heap allocation | Tape must own or reference values safely |
| Move semantics | Active values need a valid adjoint identity |
| Destructors | Side effects may occur during unwinding |
| Custom allocators | Tape and adjoint storage must integrate cleanly |

High-performance systems often use arena allocation for tapes and adjoints.

Exceptions and Undefined Behavior

C and C++ contain behavior that is difficult to differentiate safely.

Exceptions create non-local control flow. Reverse mode must know which operations completed before the exception. In many AD systems, differentiable regions are expected to avoid exceptions.

Undefined behavior is worse. If the primal program has undefined behavior, the derivative program has no reliable meaning. Examples include out-of-bounds access, use-after-free, signed integer overflow, and invalid aliasing assumptions.

A robust AD tool cannot repair undefined primal semantics.

Performance Model

C and C++ AD performance depends heavily on the implementation strategy.

| Approach | Strength | Weakness |
| --- | --- | --- |
| Forward operator overloading | Simple, generic, good for few inputs | Expensive for many-input, scalar-output gradients |
| Reverse tape | Good for scalar losses and dynamic control flow | Tape memory and runtime overhead |
| Source transformation | Efficient generated code | Hard language coverage |
| LLVM IR transformation | Broad frontend support | Low-level alias and memory complexity |
| Custom adjoints | High performance for libraries | Requires manual rules |

Production systems often combine these methods.

AD Tools in C and C++

Representative tools include:

| Tool | Main approach |
| --- | --- |
| ADOL-C | Operator overloading and tape-based AD |
| CppAD | Operator overloading with tapes |
| Adept | Expression templates and reverse mode |
| CoDiPack | Operator overloading for scientific computing |
| Stan Math | Reverse-mode autodiff for probabilistic modeling |
| Tapenade | Source transformation, especially C and Fortran |
| Enzyme | LLVM IR-level automatic differentiation |

Each tool reflects a different answer to the same question: should differentiation happen at the library level, the source level, or the compiler level?

Practical Design Guidance

For C and C++ systems, the AD strategy should follow the workload.

Use forward-mode operator overloading when the number of independent variables is small, templates are already used, and development speed matters.

Use reverse-mode tape systems when computing gradients of scalar objectives with many parameters, especially when the program has dynamic control flow.

Use source transformation or compiler-level AD when performance matters and the codebase can tolerate a more structured build pipeline.

Use custom adjoints for large library primitives. This is usually mandatory for linear algebra, solvers, FFTs, and accelerator kernels.

The central engineering problem is not the derivative rule for multiplication. It is preserving C and C++ program semantics while producing derivative code that remains fast, memory-safe, and compatible with existing numerical libraries.