# Symbolic Differentiation

Symbolic differentiation computes derivatives by manipulating expressions. The input is a formula. The output is another formula.

For example, given

$$
f(x) = x^2 + \sin x,
$$

symbolic differentiation produces

$$
f'(x) = 2x + \cos x.
$$

This is the form of differentiation taught in calculus courses. It is exact over the algebraic expression being manipulated. It uses rules such as the sum rule, product rule, quotient rule, and chain rule.

## Expression Trees

A symbolic differentiator usually represents a formula as a tree.

For

$$
f(x) = x^2 + \sin x,
$$

the root is addition. The left child is exponentiation. The right child is sine.

```text
        +
       / \
      ^   sin
     / \    \
    x   2    x
```

Differentiation walks this tree and rewrites each node according to a derivative rule.

The rules are local:

$$
\frac{d}{dx}(u+v) =
\frac{du}{dx}
+
\frac{dv}{dx},
$$

$$
\frac{d}{dx}(uv) =
u\frac{dv}{dx}
+
v\frac{du}{dx},
$$

$$
\frac{d}{dx}\sin u =
\cos u \frac{du}{dx}.
$$

The differentiator recursively applies these rules until the whole expression has been transformed.
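This recursive walk can be sketched in a few lines of Python. The tuple encoding below (`('const', c)`, `('var', name)`, `('+', u, v)`, and so on) is an illustrative assumption, not a standard representation; the rules themselves are the sum, product, power, and sine rules from the text.

```python
import math

# Minimal symbolic differentiator over tuple-encoded expression trees.
# Encoding (illustrative): ('const', c), ('var', name), ('+', u, v),
# ('*', u, v), ('^', u, n) for integer n, ('sin', u), ('cos', u).

def diff(e, x):
    """Derivative of expression e with respect to the variable named x."""
    op = e[0]
    if op == 'const':
        return ('const', 0)
    if op == 'var':
        return ('const', 1 if e[1] == x else 0)
    if op == '+':
        return ('+', diff(e[1], x), diff(e[2], x))
    if op == '*':  # product rule: (uv)' = u'v + uv'
        u, v = e[1], e[2]
        return ('+', ('*', diff(u, x), v), ('*', u, diff(v, x)))
    if op == '^':  # power rule: (u^n)' = n u^(n-1) u'
        u, n = e[1], e[2]
        return ('*', ('*', ('const', n), ('^', u, n - 1)), diff(u, x))
    if op == 'sin':  # chain rule: (sin u)' = cos(u) u'
        return ('*', ('cos', e[1]), diff(e[1], x))
    raise ValueError(f'unknown operator: {op}')

def evaluate(e, env):
    """Evaluate an expression tree in an environment of variable values."""
    op = e[0]
    if op == 'const': return e[1]
    if op == 'var':   return env[e[1]]
    if op == '+':     return evaluate(e[1], env) + evaluate(e[2], env)
    if op == '*':     return evaluate(e[1], env) * evaluate(e[2], env)
    if op == '^':     return evaluate(e[1], env) ** e[2]
    if op == 'sin':   return math.sin(evaluate(e[1], env))
    if op == 'cos':   return math.cos(evaluate(e[1], env))
    raise ValueError(op)

# f(x) = x^2 + sin x, whose derivative should evaluate to 2x + cos x.
f = ('+', ('^', ('var', 'x'), 2), ('sin', ('var', 'x')))
df = diff(f, 'x')
print(evaluate(df, {'x': 1.3}))
```

Note that `diff` returns an unsimplified tree; the result contains factors like `('const', 1)` that a later simplification pass would remove.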

## Symbolic Rules

A minimal symbolic differentiator needs rules for constants, variables, arithmetic, and elementary functions.

| Expression | Derivative with respect to $x$ |
|---|---|
| $c$ | $0$ |
| $x$ | $1$ |
| $y$, where $y \ne x$ | $0$ |
| $u + v$ | $u' + v'$ |
| $u - v$ | $u' - v'$ |
| $uv$ | $u'v + uv'$ |
| $u/v$ | $(u'v - uv')/v^2$ |
| $u^n$ | $n u^{n-1}u'$ |
| $\sin u$ | $\cos(u)u'$ |
| $\cos u$ | $-\sin(u)u'$ |
| $\exp u$ | $\exp(u)u'$ |
| $\log u$ | $u'/u$ |

These rules are mathematically clean. They preserve exact structure when the expression language is small and well defined.

## A Small Example

Let

$$
f(x) = (x^2 + 1)\sin x.
$$

The product rule gives

$$
f'(x) =
\left[\frac{d}{dx}(x^2+1)\right]\sin x
+
(x^2+1)\frac{d}{dx}\sin x.
$$

Then

$$
\frac{d}{dx}(x^2+1)=2x,
$$

and

$$
\frac{d}{dx}\sin x=\cos x.
$$

So

$$
f'(x) =
2x\sin x
+
(x^2+1)\cos x.
$$

This result is compact. It is easy to inspect and may be useful for analysis, simplification, or exact reasoning.
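The symbolic result is easy to sanity-check numerically. A short sketch comparing it against a central finite difference, an independent approximation of the derivative:

```python
import math

def f(x):
    return (x**2 + 1) * math.sin(x)

def fprime(x):
    # The symbolic derivative obtained from the product rule above.
    return 2*x*math.sin(x) + (x**2 + 1)*math.cos(x)

def central(g, x, h=1e-6):
    # Central finite difference: (g(x+h) - g(x-h)) / 2h.
    return (g(x + h) - g(x - h)) / (2 * h)

x0 = 0.7
print(fprime(x0), central(f, x0))  # should agree to several digits
```

The finite difference carries truncation and rounding error, while the symbolic form is exact up to floating point evaluation; their close agreement confirms the derivation.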

## Expression Swell

Symbolic differentiation can produce derivative expressions far larger than the input. This problem is called expression swell.

Consider a deeply nested composition:

$$
f(x) = g_1(g_2(g_3(\cdots g_k(x)))).
$$

The derivative is

$$
f'(x) =
g_1'(\cdots)
g_2'(\cdots)
g_3'(\cdots)
\cdots
g_k'(x).
$$

If the expression system repeatedly substitutes full subexpressions, intermediate expressions may grow rapidly.

Product and quotient rules are common sources of growth. For example, differentiating a product of many terms expands into a sum of many products:

$$
\frac{d}{dx}(u_1u_2\cdots u_n) =
\sum_{i=1}^{n}
u_1\cdots u_{i-1}u_i'u_{i+1}\cdots u_n.
$$

The derivative has $n$ product terms. Repeated differentiation can make this growth much worse.
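The growth is easy to observe. The sketch below differentiates a right-nested product of copies of $x$ with a naive product rule and no simplification, then counts tree nodes after one and two derivatives. The encoding and node-count metric are illustrative assumptions.

```python
# Expressions (illustrative): ('x',), ('const', c), ('+', a, b), ('*', a, b).

def diff(e):
    """Differentiate with respect to x, with no simplification at all."""
    op = e[0]
    if op == 'x':
        return ('const', 1)
    if op == 'const':
        return ('const', 0)
    if op == '+':
        return ('+', diff(e[1]), diff(e[2]))
    a, b = e[1], e[2]  # '*': naive product rule, both operands duplicated
    return ('+', ('*', diff(a), b), ('*', a, diff(b)))

def size(e):
    """Count nodes in the expression tree."""
    return 1 + sum(size(c) for c in e[1:] if isinstance(c, tuple))

def product(n):
    """Right-nested product of n copies of x."""
    e = ('x',)
    for _ in range(n - 1):
        e = ('*', ('x',), e)
    return e

e = product(8)
d1 = diff(e)
d2 = diff(d1)
print(size(e), size(d1), size(d2))  # each derivative is markedly larger
```

With simplification disabled, every application of the product rule duplicates both operands, so repeated differentiation compounds the growth.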

Symbolic systems fight expression swell using simplification, common subexpression elimination, sharing, factoring, and delayed expansion. These techniques help, but they add complexity.

## Programs Are Not Just Expressions

Symbolic differentiation works best when the function is a formula in a known expression language. Many numerical functions are programs instead.

A program may contain:

| Program feature | Difficulty for symbolic differentiation |
|---|---|
| Assignment | Requires tracking intermediate definitions |
| Mutation | Requires reasoning about changing state |
| Loops | Requires summarizing repeated computation |
| Recursion | Requires fixed-point or recurrence reasoning |
| Branches | Produces piecewise derivatives |
| Arrays | Requires index reasoning |
| Library calls | Requires derivative rules for external operations |
| Floating point behavior | Differs from real arithmetic |
| Randomness | Requires probabilistic semantics or fixed traces |

A symbolic system can handle some of these features if the language is restricted. But ordinary programs are richer than algebraic expressions.

For example:

```text
function f(x, n):
    y = 1
    for i in 1..n:
        y = y * (x + i)
    return y
```

Mathematically, this computes

$$
f(x,n)=\prod_{i=1}^{n}(x+i).
$$

A symbolic differentiator may expand the product or represent it symbolically. But if $n$ is known only at runtime, the expression structure depends on the execution.

Automatic differentiation handles this case naturally by differentiating the operations that actually execute.
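For instance, forward-mode AD with dual numbers differentiates the loop by running it: each value carries a (primal, tangent) pair, and no symbolic expression for the product is ever built. The `Dual` class below is a minimal sketch, not from the text.

```python
class Dual:
    """A value paired with its tangent (derivative with respect to the input)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule applied to the operation as it executes.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def f(x, n):
    # The pseudocode above: y = product of (x + i) for i = 1..n.
    y = Dual(1.0)
    for i in range(1, n + 1):
        y = y * (x + i)
    return y

x = Dual(2.0, 1.0)   # seed the tangent: dx/dx = 1
out = f(x, 5)

# Analytic check: f'(x) = f(x) * sum over i of 1/(x+i).
expected = out.val * sum(1.0 / (2.0 + i) for i in range(1, 6))
print(out.dot, expected)
```

The loop bound `n` can be any runtime value; the derivative is computed for whatever sequence of multiplications actually executes.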

## Symbolic Differentiation and Simplification

Raw symbolic derivatives often contain redundant terms.

For

$$
f(x) = x + 0,
$$

a naive differentiator may produce

$$
1 + 0.
$$

For

$$
f(x) = x \cdot x,
$$

the product rule gives

$$
1 \cdot x + x \cdot 1.
$$

A simplifier can reduce this to

$$
2x.
$$

Simplification is necessary for readable output and efficient evaluation. But simplification is a hard problem in its own right. Some simplifications are local:

$$
u + 0 \to u,
\qquad
u \cdot 1 \to u,
\qquad
u \cdot 0 \to 0.
$$

Others require algebraic reasoning:

$$
x + x \to 2x,
$$

$$
\sin^2 x + \cos^2 x \to 1.
$$

General simplification can be expensive and domain-dependent. A simplifier for polynomials differs from a simplifier for trigonometric identities, matrix expressions, or tensor programs.
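The local rules are straightforward to implement as a bottom-up rewriter; the algebraic ones are not. A minimal sketch of the local rewrites, using an illustrative tuple encoding:

```python
# Expressions (illustrative): ('const', c), ('var', name), ('+', u, v), ('*', u, v).

def simplify(e):
    """Apply the local rules u+0 -> u, u*1 -> u, u*0 -> 0, bottom-up."""
    if e[0] in ('const', 'var'):
        return e
    op, u, v = e[0], simplify(e[1]), simplify(e[2])
    if op == '+':
        if u == ('const', 0): return v          # 0 + v -> v
        if v == ('const', 0): return u          # u + 0 -> u
    if op == '*':
        if u == ('const', 0) or v == ('const', 0):
            return ('const', 0)                 # u * 0 -> 0
        if u == ('const', 1): return v          # 1 * v -> v
        if v == ('const', 1): return u          # u * 1 -> u
    return (op, u, v)

# (1 * x) + (x * 0) simplifies to x.
e = ('+', ('*', ('const', 1), ('var', 'x')), ('*', ('var', 'x'), ('const', 0)))
print(simplify(e))  # ('var', 'x')
```

Rules like $x + x \to 2x$ or $\sin^2 x + \cos^2 x \to 1$ do not fit this pattern-per-node shape; they require recognizing structure across subtrees, which is where simplification becomes domain-dependent.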

## Symbolic Differentiation and Code Generation

Symbolic differentiation can be effective when paired with code generation.

A system may accept a high-level formula, compute symbolic derivatives, simplify them, and emit numerical code. This approach is useful in domains where models are concise and structured:

| Domain | Example derivative use |
|---|---|
| Mechanics | Equations of motion |
| Control | Linearized dynamics |
| Robotics | Kinematic Jacobians |
| Optimization | Constraint gradients |
| Statistics | Log-likelihood gradients |
| PDE solvers | Weak-form residual derivatives |

In these settings, symbolic differentiation can expose structure that helps performance. It can simplify constants, eliminate zero terms, and produce specialized kernels.
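The pipeline can be sketched end to end: differentiate symbolically, print the result as source text, and compile it into a callable. Everything here (the encoding, the printer, the use of `eval` for compilation) is an illustrative assumption; a real generator would emit and simplify far more carefully.

```python
def diff(e):
    """Differentiate w.r.t. x. Expressions: ('x',), ('const', c), ('+', a, b), ('*', a, b)."""
    op = e[0]
    if op == 'x':     return ('const', 1)
    if op == 'const': return ('const', 0)
    if op == '+':     return ('+', diff(e[1]), diff(e[2]))
    a, b = e[1], e[2]                       # '*': product rule
    return ('+', ('*', diff(a), b), ('*', a, diff(b)))

def to_src(e):
    """Print an expression tree as Python source text."""
    op = e[0]
    if op == 'x':     return 'x'
    if op == 'const': return str(e[1])
    return f'({to_src(e[1])} {op} {to_src(e[2])})'

# f(x) = x * x, so the generated derivative should compute 2x.
f = ('*', ('x',), ('x',))
src = to_src(diff(f))                       # "((1 * x) + (x * 1))"
fprime = eval(f'lambda x: {src}')
print(src, fprime(3.0))                     # fprime(3.0) == 6.0
```

Note the unsimplified output: without the local rewrites from the previous section, the generated code carries redundant multiplications by 1, which is exactly why simplification sits between differentiation and emission in practice.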

But code generation becomes harder when the original computation uses dynamic control flow, complex data structures, or large external libraries.

## Symbolic Differentiation Compared with AD

Symbolic differentiation transforms expressions into derivative expressions.

Automatic differentiation transforms or augments computations so that derivatives are computed along with values.

The distinction matters.

Symbolic differentiation asks:

$$
\text{Given an expression for } f, \text{ what is an expression for } f'?
$$

Automatic differentiation asks:

$$
\text{Given a computation of } f(x), \text{ how does derivative information propagate through that computation?}
$$

Symbolic differentiation works at the level of formulas. AD works at the level of executed operations or program transformations.

For small expressions, the two may look similar. For large programs, they behave differently.

## A Shared Foundation: The Chain Rule

Symbolic differentiation and automatic differentiation both rely on the chain rule. The difference is where the chain rule is applied.

Symbolic differentiation applies it to expression syntax.

Automatic differentiation applies it to a computation trace or transformed program.

For example, let

$$
u = x^2,
\qquad
v = \sin u,
\qquad
y = v + 1.
$$

A symbolic system may inline this as

$$
y = \sin(x^2) + 1
$$

and derive

$$
\frac{dy}{dx} =
\cos(x^2) \cdot 2x.
$$

An AD system may keep the intermediate variables and propagate derivative information:

$$
\dot{u}=2x\dot{x},
$$

$$
\dot{v}=\cos(u)\dot{u},
$$

$$
\dot{y}=\dot{v}.
$$

Both compute the same mathematical derivative. But the AD form follows the program structure directly.
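The three tangent equations can be executed directly as straight-line code. This is a minimal sketch of forward-mode propagation, keeping the intermediates $u$ and $v$ and carrying a tangent alongside each value:

```python
import math

x, xdot = 1.5, 1.0                          # seed: dx/dx = 1

u = x**2;        udot = 2 * x * xdot        # u = x^2
v = math.sin(u); vdot = math.cos(u) * udot  # v = sin u
y = v + 1;       ydot = vdot                # y = v + 1

# Same value as the inlined symbolic derivative cos(x^2) * 2x.
print(ydot, math.cos(x**2) * 2 * x)
```

Each line mirrors one assignment in the original program, which is the sense in which AD follows the program structure rather than a flattened formula.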

## Why Symbolic Differentiation Is Not Enough

Symbolic differentiation is exact and valuable, but it has structural limits.

It requires a symbolic representation of the function.

It can produce large derivative expressions.

It needs simplification to produce efficient code.

It struggles with unrestricted programs.

It has difficulty matching the execution behavior of floating point numerical software.

Automatic differentiation was developed to address these limits. It keeps the local exactness of derivative rules while operating on computations rather than only formulas. This makes it better suited for large numerical programs, machine learning systems, simulation codes, and differentiable software infrastructure.

