# AD in C and C++

C and C++ are important targets for automatic differentiation because much scientific, engineering, graphics, finance, and machine learning infrastructure is written in these languages. They provide direct control over memory layout, compilation, vectorization, and hardware-specific optimization. They also make AD difficult because programs can contain pointer aliasing, mutation, templates, manual memory management, macros, and calls into external libraries.

In C and C++, automatic differentiation is less a single technique than a family of implementation strategies. The main approaches are operator overloading, source transformation, and compiler intermediate representation transformation.

### Operator Overloading in C++

C++ supports operator overloading, so forward mode can be implemented by replacing ordinary scalar types with derivative-aware scalar types.

A simple dual-number type looks like this:

```cpp
struct Dual {
    double x;   // primal value
    double dx;  // tangent value
};

Dual operator+(Dual a, Dual b) {
    return {a.x + b.x, a.dx + b.dx};
}

Dual operator*(Dual a, Dual b) {
    return {
        a.x * b.x,
        a.dx * b.x + a.x * b.dx
    };
}
```

A function written generically can then run over either `double` or `Dual`:

```cpp
template <class T>
T f(T x) {
    using std::sin;  // lets unqualified sin fall back to std::sin for built-in types
    return (x + T{1}) * sin(x);
}
```

With suitable overloads for `sin`, evaluating `f(Dual{x, 1})` returns both the function value and the derivative with respect to `x`.

This works well for small to medium programs. It also composes naturally with templates, expression templates, and generic numeric code.
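As a concrete sketch of the "suitable overloads" mentioned above, `sin` on `Dual` applies the chain rule, and the whole pipeline then runs end to end (the names here mirror the snippets above; this is an illustrative minimal implementation, not a library API):

```cpp
#include <cmath>

struct Dual {
    double x;   // primal value
    double dx;  // tangent value
};

Dual operator+(Dual a, Dual b) { return {a.x + b.x, a.dx + b.dx}; }
Dual operator*(Dual a, Dual b) {
    return {a.x * b.x, a.dx * b.x + a.x * b.dx};
}

// Chain rule: (sin u)' = cos(u) * u'
Dual sin(Dual a) { return {std::sin(a.x), std::cos(a.x) * a.dx}; }

template <class T>
T f(T x) {
    using std::sin;  // built-in types fall back to std::sin; Dual found via ADL
    return (x + T{1}) * sin(x);
}
```

Seeding the tangent with `1.0` selects differentiation with respect to `x`:

```cpp
Dual r = f(Dual{0.5, 1.0});
// r.x  = (0.5 + 1) * sin(0.5)
// r.dx = sin(0.5) + 1.5 * cos(0.5)   (product rule)
```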

### Reverse Mode with Tapes

Reverse mode in C++ is commonly implemented with a tape. During the forward pass, each primitive operation records enough information to propagate adjoints backward.

A variable type may contain:

```cpp
struct Var {
    double value;
    int tape_id;
};
```

The tape stores operations:

```cpp
struct Node {
    double value;
    double adjoint;
    int left;
    int right;
    Op op;
};
```

For multiplication, the forward pass stores the input references and the output value. The reverse pass applies:

```cpp
adjoint[left]  += adjoint[out] * value[right];
adjoint[right] += adjoint[out] * value[left];
```

Tape-based reverse mode is flexible. It supports dynamic control flow because it records the operations that actually ran. It also has costs: allocation, tape memory, pointer chasing, and limited compiler visibility.
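The whole scheme can be sketched in a few dozen lines, using index-based handles into a global tape (the function names `input`, `add`, `mul`, and `backward` are illustrative, not from any particular library):

```cpp
#include <vector>

enum class Op { Input, Add, Mul };

struct Node {
    double value;
    double adjoint;
    int left, right;   // indices of input nodes, -1 for inputs
    Op op;
};

std::vector<Node> tape;

int input(double v)   { tape.push_back({v, 0.0, -1, -1, Op::Input}); return (int)tape.size() - 1; }
int add(int a, int b) { tape.push_back({tape[a].value + tape[b].value, 0.0, a, b, Op::Add}); return (int)tape.size() - 1; }
int mul(int a, int b) { tape.push_back({tape[a].value * tape[b].value, 0.0, a, b, Op::Mul}); return (int)tape.size() - 1; }

// Reverse pass: seed the output adjoint, then walk the tape backward.
void backward(int out) {
    tape[out].adjoint = 1.0;
    for (int i = out; i >= 0; --i) {
        Node& n = tape[i];
        if (n.op == Op::Add) {
            tape[n.left].adjoint  += n.adjoint;
            tape[n.right].adjoint += n.adjoint;
        } else if (n.op == Op::Mul) {
            tape[n.left].adjoint  += n.adjoint * tape[n.right].value;
            tape[n.right].adjoint += n.adjoint * tape[n.left].value;
        }
    }
}
```

Differentiating y = x · (x + 3) at x = 2 gives dy/dx = 2x + 3 = 7:

```cpp
int x = input(2.0);
int y = mul(x, add(x, input(3.0)));
backward(y);
// tape[x].adjoint is now 7.0
```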

### Source Transformation

Source transformation tools parse C or C++ code and generate differentiated code. Instead of overloading arithmetic operators, the tool rewrites the program.

For example, a primal function:

```c
double f(double x) {
    return (x + 1.0) * sin(x);
}
```

may be transformed into:

```c
void f_d(double x, double x_d, double* y, double* y_d) {
    double v1 = x + 1.0;
    double v1_d = x_d;

    double v2 = sin(x);
    double v2_d = cos(x) * x_d;

    *y = v1 * v2;
    *y_d = v1_d * v2 + v1 * v2_d;
}
```

This approach can produce efficient plain C or C++ code. It is especially useful in scientific computing, where generated derivative code may be compiled by standard toolchains and integrated into larger solvers.
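Seeding `x_d = 1.0` recovers df/dx, which can be checked against the analytic derivative f'(x) = sin(x) + (x + 1)·cos(x). A standalone version of the generated function (reproduced here so the check is self-contained):

```cpp
#include <cmath>

// The transformed function from above, with std:: qualification
// so it compiles as a standalone C++ translation unit.
void f_d(double x, double x_d, double* y, double* y_d) {
    double v1 = x + 1.0;
    double v1_d = x_d;

    double v2 = std::sin(x);
    double v2_d = std::cos(x) * x_d;

    *y = v1 * v2;
    *y_d = v1_d * v2 + v1 * v2_d;
}
```

Calling `f_d(0.5, 1.0, &y, &y_d)` yields `y_d == sin(0.5) + 1.5 * cos(0.5)`, matching the product rule applied by hand.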

The difficulty is language coverage. Full C++ is large. Templates, overload resolution, macros, exceptions, virtual dispatch, and undefined behavior make source transformation difficult.

### Compiler IR Differentiation

A newer strategy differentiates compiler intermediate representations, especially LLVM IR. The AD tool operates after frontend parsing and type checking.

This has several advantages:

| Benefit | Explanation |
|---|---|
| Language coverage | C, C++, Rust, Julia, Fortran frontends can lower to LLVM |
| Optimization reuse | Existing compiler passes optimize primal and derivative code |
| Library visibility | Inlined code can be differentiated after lowering |
| Hardware targeting | LLVM already supports many CPU and accelerator targets |

The cost is that high-level source information may be lost. Arrays, loops, and mathematical intent are represented as low-level control flow and memory operations. The AD pass must understand loads, stores, aliasing, calls, and activity analysis.

### Activity Analysis

Activity analysis determines which values influence derivatives.

A value is active if perturbing it can affect the differentiated output. A value is passive if it has no derivative relevance.

Example:

```c
double f(double x, int n) {
    return n * x * x;
}
```

Here `x` is active. `n` is usually passive because it is an integer control or scale parameter. The derivative with respect to `x` depends on `n`, but `n` itself does not receive a derivative.

Activity analysis reduces work. It prevents the AD system from propagating adjoints through irrelevant values.

### Pointer Aliasing

Pointer aliasing is one of the hard parts of AD in C and C++.

Consider:

```c
void axpy(double* y, const double* x, double a, int n) {
    for (int i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

If `x` and `y` point to overlapping memory, the semantics differ from the non-overlapping case. Reverse mode must respect the exact order of reads and writes.

The AD system needs alias information. It may rely on:

| Mechanism | Role |
|---|---|
| `restrict` in C | Promises non-aliasing pointers |
| `const` qualifiers | Marks read-only data |
| Compiler alias analysis | Infers possible overlap |
| Runtime checks | Selects safe derivative path |
| Conservative fallback | Stores more intermediate state |

Without correct alias handling, reverse-mode derivatives can be wrong.
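A concrete illustration of why aliasing matters: shifting `y` one element past `x` makes the loop read values it has just written, so the overlapping and non-overlapping cases produce different results from the same code.

```cpp
// Same loop as above; its semantics depend on whether x and y overlap.
void axpy(double* y, const double* x, double a, int n) {
    for (int i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

With overlap, earlier writes feed later reads:

```cpp
double v[3] = {1.0, 1.0, 1.0};
axpy(v + 1, v, 1.0, 2);   // y aliases x, shifted by one element
// i = 0 sets v[1] = 2; i = 1 then reads the updated v[1], so v = {1, 2, 3}
// with disjoint buffers the result of each y[i] would have been 2
```

A reverse pass that assumed disjoint buffers would propagate adjoints along the wrong dataflow here.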

### Mutation and Reverse Mode

C and C++ programs frequently update arrays in place. Reverse mode must invert the effect of these updates on adjoints.

For assignment:

```c
x[i] = y;
```

the old value of `x[i]` may be needed during the reverse pass if later computations depend on it. The AD system may need to store snapshots or record enough information to replay the program.

For accumulation:

```c
x[i] += y;
```

the adjoint behavior resembles a scatter-add in reverse.

In array-heavy code, these rules dominate performance. Efficient AD for C and C++ must treat memory operations as first-class derivative operations.
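The two adjoint rules can be sketched directly, under standard reverse-mode conventions (function names are illustrative): an overwrite transfers the adjoint to the right-hand side and clears it, while an accumulation only copies it.

```cpp
// Reverse of the overwrite x[i] = y:
// the new x[i] came entirely from y, and the old x[i] is dead,
// so its adjoint transfers to y and is then cleared.
void rev_assign(double* x_adj, int i, double* y_adj) {
    *y_adj += x_adj[i];
    x_adj[i] = 0.0;
}

// Reverse of the accumulation x[i] += y:
// the old x[i] still flows through, so its adjoint stays;
// y picks up a copy (a scatter-add becomes a gather in reverse).
void rev_accumulate(const double* x_adj, int i, double* y_adj) {
    *y_adj += x_adj[i];
}
```

The asymmetry between the two rules is exactly the snapshot question: `rev_assign` destroys the old adjoint path, so any later use of the old primal value must have been saved during the forward pass.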

### Templates and Expression Templates

C++ templates can help AD. A numeric function templated over scalar type can work with `double`, `float`, dual numbers, intervals, arbitrary precision values, or AD variables.

```cpp
template <typename Scalar>
Scalar energy(const Scalar* x, int n) {
    Scalar e{};  // value-initialized zero; works for double and for aggregate AD types
    for (int i = 0; i < n; i++) {
        e += x[i] * x[i];
    }
    return e;
}
```

This style gives forward-mode AD almost for free.
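For instance, pairing the template with a dual-number type like the one above yields one gradient component per forward pass; seeding `x[j].dx = 1` selects component `j` (a self-contained sketch):

```cpp
struct Dual {
    double x, dx;
};
Dual operator+(Dual a, Dual b) { return {a.x + b.x, a.dx + b.dx}; }
Dual operator*(Dual a, Dual b) { return {a.x * b.x, a.dx * b.x + a.x * b.dx}; }
Dual& operator+=(Dual& a, Dual b) { a = a + b; return a; }

template <typename Scalar>
Scalar energy(const Scalar* x, int n) {
    Scalar e{};  // value-initialized zero accumulator
    for (int i = 0; i < n; i++) {
        e += x[i] * x[i];
    }
    return e;
}
```

Seeding the second input gives the second gradient component of Σ xᵢ²:

```cpp
Dual x[3] = {{1, 0}, {2, 1}, {3, 0}};  // dx seeded on x[1]
Dual e = energy(x, 3);
// e.x  = 1 + 4 + 9 = 14
// e.dx = 2 * x[1]  = 4
```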

Expression templates go further. They delay evaluation and build expression trees at compile time. AD libraries can exploit these trees to reduce temporaries and fuse operations.

The downsides are longer compile times, opaque error messages, and fragile interactions with large codebases.

### Custom Adjoints for Libraries

C and C++ programs often call BLAS, LAPACK, FFT libraries, sparse solvers, graphics kernels, and vendor APIs. AD systems rarely differentiate through these binaries directly.

Instead, they use custom derivative rules.

| Library operation | Derivative treatment |
|---|---|
| Matrix multiply | Transposed matrix products |
| Linear solve | Solve adjoint linear system |
| Cholesky | Structured triangular derivative |
| FFT | Linear adjoint transform |
| Convolution | Correlation-like adjoint kernels |
| ODE solver | Sensitivity or adjoint method |

Custom adjoints are essential for performance. Differentiating the implementation of a dense matrix multiply loop would lose optimized BLAS performance. Calling optimized adjoint kernels preserves the performance model.
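For matrix multiply, the rule in the table is: given C = A·B with output adjoint C̄, the input adjoints are Ā += C̄·Bᵀ and B̄ += Aᵀ·C̄. Each is itself a matrix product, so in practice both would be dispatched to an optimized GEMM; the version below is a hand-written row-major sketch, not a BLAS call:

```cpp
// Row-major n x n matrix multiply: C = A * B.
void matmul(const double* A, const double* B, double* C, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++) s += A[i*n + k] * B[k*n + j];
            C[i*n + j] = s;
        }
}

// Reverse rule: A_bar += C_bar * B^T and B_bar += A^T * C_bar.
void matmul_adjoint(const double* A, const double* B, const double* C_bar,
                    double* A_bar, double* B_bar, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++) {
                A_bar[i*n + k] += C_bar[i*n + j] * B[k*n + j];  // C_bar * B^T
                B_bar[k*n + j] += A[i*n + k] * C_bar[i*n + j];  // A^T * C_bar
            }
}
```

Because both adjoint updates are plain matrix products, an AD system that registers this rule keeps the primal and the derivative on the same optimized code path.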

### Memory Management

Manual memory management complicates AD.

Reverse mode extends object lifetimes because intermediate values may be needed after the forward pass. A buffer that can be freed in ordinary execution may need to remain alive until the reverse pass.

C++ RAII helps, but AD systems must still manage:

| Concern | AD consequence |
|---|---|
| Stack allocation | Values may disappear before reverse pass |
| Heap allocation | Tape must own or reference values safely |
| Move semantics | Active values need valid adjoint identity |
| Destructors | Side effects may occur during unwinding |
| Custom allocators | Tape and adjoint storage must integrate cleanly |

High-performance systems often use arena allocation for tapes and adjoints.

### Exceptions and Undefined Behavior

C and C++ contain behavior that is difficult to differentiate safely.

Exceptions create non-local control flow. Reverse mode must know which operations completed before the exception. In many AD systems, differentiable regions are expected to avoid exceptions.

Undefined behavior is worse. If the primal program has undefined behavior, the derivative program has no reliable meaning. Examples include out-of-bounds access, use-after-free, signed integer overflow, and invalid aliasing assumptions.

A robust AD tool cannot repair undefined primal semantics.

### Performance Model

C and C++ AD performance depends heavily on the implementation strategy.

| Approach | Strength | Weakness |
|---|---|---|
| Forward operator overloading | Simple, generic, good for few inputs | Expensive for many-input scalar-output gradients |
| Reverse tape | Good for scalar losses, dynamic control flow | Tape memory and runtime overhead |
| Source transformation | Efficient generated code | Hard language coverage |
| LLVM IR transformation | Broad frontend support | Low-level alias and memory complexity |
| Custom adjoints | High performance for libraries | Requires manual rules |

Production systems often combine these methods.

### AD Tools in C and C++

Representative tools include:

| Tool | Main approach |
|---|---|
| ADOL-C | Operator overloading and tape-based AD |
| CppAD | Operator overloading with tapes |
| Adept | Expression templates and reverse mode |
| CoDiPack | Operator overloading for scientific computing |
| Stan Math | Reverse-mode autodiff for probabilistic modeling |
| Tapenade | Source transformation, especially C and Fortran |
| Enzyme | LLVM IR-level automatic differentiation |

Each tool reflects a different answer to the same question: should differentiation happen at the library level, the source level, or the compiler level?

### Practical Design Guidance

For C and C++ systems, the AD strategy should follow the workload.

Use forward-mode operator overloading when the number of independent variables is small, templates are already used, and development speed matters.

Use reverse-mode tape systems when computing gradients of scalar objectives with many parameters, especially when the program has dynamic control flow.

Use source transformation or compiler-level AD when performance matters and the codebase can tolerate a more structured build pipeline.

Use custom adjoints for large library primitives. This is usually mandatory for linear algebra, solvers, FFTs, and accelerator kernels.

The central engineering problem is not the derivative rule for multiplication. It is preserving C and C++ program semantics while producing derivative code that remains fast, memory-safe, and compatible with existing numerical libraries.

