
Symbolic Versus Dynamic Computation

Deep learning frameworks need a way to represent computation. Some systems build the graph before execution, some build it while ordinary program code runs, and some compile parts of the program into optimized graphs while preserving an imperative programming style.

PyTorch began with a dynamic computation model. This means the graph is built from the operations that actually execute. This design makes PyTorch easy to use for research, debugging, and models with control flow. Modern PyTorch also includes compilation tools, so the same program can often be optimized without giving up the dynamic programming model.

Symbolic Computation

In symbolic computation, the user defines a computation abstractly before running it. The system builds a symbolic graph of operations. Later, the graph is executed with concrete input values.

A symbolic graph might represent:

y = x^2 + 3x + 1.

The graph records the operations:

x → x^2
x → 3x
x^2, 3x, 1 → y

The variable x is a placeholder. The graph can be optimized, transformed, serialized, and executed repeatedly with different inputs.

Symbolic systems are useful because the framework can inspect the whole computation before running it. This can enable optimization, memory planning, device placement, graph rewriting, and deployment to environments where Python is unavailable.
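As a toy illustration of the idea (not any real framework's API; all class names here are invented), a symbolic graph can be modeled as expression nodes that record operations and only compute when concrete input values arrive:

```python
# Toy symbolic-graph sketch: nodes describe operations; nothing runs
# until eval() receives concrete input values. All names hypothetical.

class Node:
    def __add__(self, other): return Op("add", self, const(other))
    def __radd__(self, other): return Op("add", const(other), self)
    def __mul__(self, other): return Op("mul", self, const(other))
    def __rmul__(self, other): return Op("mul", const(other), self)
    def __pow__(self, p): return Op("pow", self, const(p))

class Placeholder(Node):
    def __init__(self, name): self.name = name
    def eval(self, env): return env[self.name]

class Const(Node):
    def __init__(self, value): self.value = value
    def eval(self, env): return self.value

def const(v): return v if isinstance(v, Node) else Const(v)

class Op(Node):
    fns = {"add": lambda a, b: a + b,
           "mul": lambda a, b: a * b,
           "pow": lambda a, b: a ** b}
    def __init__(self, kind, a, b): self.kind, self.a, self.b = kind, a, b
    def eval(self, env):
        return self.fns[self.kind](self.a.eval(env), self.b.eval(env))

x = Placeholder("x")
y = x ** 2 + 3 * x + 1       # builds a graph; nothing is computed yet

print(y.eval({"x": 2.0}))    # 11.0
print(y.eval({"x": -1.0}))   # -1.0
```

Because the whole expression exists as a data structure before evaluation, a system holding such a graph could inspect it, rewrite it, or serialize it, which is exactly what symbolic frameworks exploit.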

Dynamic Computation

In dynamic computation, the program runs normally, and the framework records operations as they happen.

In PyTorch:

import torch

x = torch.tensor(2.0, requires_grad=True)

y = x ** 2 + 3 * x + 1

y.backward()

print(x.grad)  # tensor(7.)

There is no separate graph construction phase. Python executes each line, and PyTorch records the tensor operations involving x because x was created with requires_grad=True.

The graph is concrete. It corresponds to the actual operations that ran for this input.

This is why dynamic systems are often called define-by-run systems. The computation is defined by running the program.

Control Flow

Dynamic computation works naturally with ordinary Python control flow.

x = torch.tensor(2.0, requires_grad=True)

if x.item() > 0:
    y = x ** 2
else:
    y = -x

y.backward()

print(x.grad)  # tensor(4.)

Only the branch that runs becomes part of the graph. If x were negative, PyTorch would build a different graph.

Loops also work naturally:

x = torch.tensor(1.0, requires_grad=True)

y = x
for _ in range(5):
    y = y * 2 + 1

y.backward()

print(x.grad)  # tensor(32.)

The graph contains five repeated applications of the same operation because the loop ran five times.

This is useful for variable-length sequences, recursive structures, search procedures, reinforcement learning rollouts, graph neural networks, and models whose computation depends on data.

Symbolic Graph Advantages

Symbolic graphs have several advantages.

First, the system can optimize the graph before execution. For example, it can remove unused operations, fuse several operations into one kernel, simplify algebraic expressions, or plan memory reuse.

Second, a symbolic graph can be exported. A graph representation can be saved and deployed without the original Python program.

Third, static graph structure can help distributed systems. The framework may know ahead of time where communication happens, which tensors need synchronization, and how memory should be allocated.

Fourth, graph-level compilation can reduce overhead. This matters when many small operations would otherwise pay Python dispatch cost.

Symbolic graph systems are therefore attractive for production serving, mobile inference, accelerators, and large-scale training pipelines.
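The whole-graph optimizations described above can be sketched on a toy op-list IR (the representation and function names here are invented for illustration): constant folding evaluates operations whose inputs are all literals, and dead-code elimination drops operations whose outputs are never used.

```python
# Toy graph-optimizer sketch (hypothetical IR): each op is a tuple
# (output_name, op_kind, args), where args are literals or names.

def constant_fold(ops):
    fns = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    consts, out = {}, []
    for name, kind, args in ops:
        vals = [consts.get(a, a) for a in args]
        if all(isinstance(v, (int, float)) for v in vals):
            consts[name] = fns[kind](*vals)        # fold to a literal
        else:
            out.append((name, kind, tuple(vals)))  # keep, args simplified
    return out, consts

def eliminate_dead(ops, outputs):
    live, kept = set(outputs), []
    for name, kind, args in reversed(ops):         # walk uses backward
        if name in live:
            kept.append((name, kind, args))
            live.update(a for a in args if isinstance(a, str))
    return list(reversed(kept))

ops = [
    ("t0", "mul", (2, 3)),        # all-constant: folds to 6
    ("t1", "add", ("x", "t0")),   # becomes add(x, 6)
    ("t2", "mul", ("x", "x")),    # unused: removed as dead code
]
folded, consts = constant_fold(ops)
optimized = eliminate_dead(folded, outputs=["t1"])
print(optimized)   # [('t1', 'add', ('x', 6))]
```

Real compilers work on much richer IRs, but the shape of the transformation is the same: inspect the whole graph, then rewrite it before any tensor data flows through.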

Dynamic Graph Advantages

Dynamic computation has different strengths.

First, it is easy to debug. Tensors have concrete values at every line of code. A user can print shapes, inspect values, use a Python debugger, and write normal control flow.

Second, it is expressive. Models can use loops, branches, recursion, and data-dependent computation without a special graph language.

Third, it is convenient for research. Experimental architectures often change quickly. A dynamic framework lets the programmer write direct Python code rather than construct a separate symbolic representation.

Fourth, errors tend to occur close to the source. If a tensor shape is wrong, the exception usually appears on the line that performed the invalid operation.

This is one reason PyTorch became popular in research workflows.

Static Versus Dynamic Graphs

The distinction can be summarized as follows:

Aspect | Symbolic or static graph | Dynamic graph
Graph construction | Before execution | During execution
Control flow | Must be represented in graph language or traced | Ordinary Python control flow
Debugging | Often less direct | Direct Python debugging
Optimization | Whole-graph optimization is natural | Requires tracing, scripting, or compilation
Deployment | Graph export is natural | Needs export or compilation path
Research flexibility | Lower | Higher

Modern frameworks blur this distinction. PyTorch supports dynamic execution by default, but can also capture and compile graph regions.

Tracing

Tracing records operations that run for example inputs. The tracer observes tensor operations and builds a graph from them.

A simplified idea:

def f(x):
    return torch.relu(x @ x.T)

example = torch.randn(4, 4)

A tracer runs f(example) and records the operations:

transpose
matrix multiplication
ReLU

Tracing works well when the computation structure does not depend on data values. It can fail or become inaccurate when Python control flow depends on tensor data.

Example:

def f(x):
    if x.sum().item() > 0:
        return x ** 2
    else:
        return -x

A tracer using one example input records only the branch that ran for that example. If a later input should take the other branch, the traced graph may not represent the intended computation.

This limitation is central to tracing-based systems.
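The failure mode can be made concrete with a minimal tracing sketch (this is not PyTorch's tracer; the classes and helper names are invented). The tracer runs the function once on an example input, records the operations that executed, and replays that fixed operation list for new inputs, so the untaken branch is simply absent:

```python
# Minimal tracing sketch (hypothetical): run the function once, record
# the arithmetic ops that executed, then replay that fixed op list.
# Control flow is baked in at trace time.

class Traced:
    def __init__(self, value, tape):
        self.value, self.tape = value, tape
    def __pow__(self, p):
        self.tape.append(("pow", p))
        return Traced(self.value ** p, self.tape)
    def __neg__(self):
        self.tape.append(("neg", None))
        return Traced(-self.value, self.tape)

def f(x):
    if (x.value if isinstance(x, Traced) else x) > 0:
        return x ** 2
    return -x

def trace(fn, example):
    tape = []
    fn(Traced(example, tape))          # record ops for this example
    def replay(v):
        for kind, arg in tape:         # replay the recorded ops only
            v = v ** arg if kind == "pow" else -v
        return v
    return replay

g = trace(f, example=2.0)   # records only the positive branch
print(g(2.0))    # 4.0  (matches f)
print(g(-3.0))   # 9.0  (wrong: f(-3.0) returns 3.0)
```

The trace of f is correct for inputs that take the same branch as the example, and silently wrong for the others.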

Scripting

Scripting attempts to convert a subset of Python code into an intermediate graph representation while preserving control flow.

Unlike tracing, scripting can represent branches and loops in the graph, provided they use supported language features and types.

The advantage is more faithful graph capture for programs with control flow. The cost is that not all Python code can be scripted. The user may need to follow restrictions on types, data structures, and supported operations.

Historically, PyTorch used TorchScript for this purpose. It allowed models to be converted into a form that could be optimized and executed outside ordinary Python. In modern PyTorch, torch.compile and related tools handle many optimization use cases, while export and deployment paths continue to evolve.

Compilation in Modern PyTorch

Modern PyTorch can compile parts of a model to improve performance.

A typical pattern is:

import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

compiled_model = torch.compile(model)

The compiled model can be used like the original model:

x = torch.randn(32, 128)
y = compiled_model(x)

The goal is to preserve the normal PyTorch programming model while allowing the system to capture graph regions, optimize them, and generate faster code.

Compilation may reduce Python overhead, fuse operations, improve memory behavior, and target efficient kernels. The exact benefit depends on model architecture, input shapes, hardware, and backend.

Graph Breaks

A graph break occurs when the compiler cannot capture part of the program into a graph. Execution falls back to ordinary Python for that part, then graph capture may resume later.

Graph breaks can be caused by unsupported Python features, data-dependent control flow, side effects, printing, mutation of Python containers, or operations the compiler cannot analyze.

Example:

def forward(x):
    print(x.shape)       # side effect
    return torch.relu(x)

The print statement may cause a graph break because it is a Python side effect.

A graph break does not necessarily make the program wrong. It may reduce performance because the computation is split into smaller graph regions.

When optimizing PyTorch programs, one often tries to reduce unnecessary graph breaks.

Dynamic Shapes

Many deep learning workloads have variable shapes. Sequence lengths may vary. Image sizes may vary. Batch sizes may vary.

Dynamic computation handles this naturally because the graph is rebuilt each time. Compilation systems need to decide whether to specialize to particular shapes or generate code that supports multiple shapes.

Specialization can be fast but may require recompilation for new shapes. Dynamic-shape support is more flexible but harder to optimize.

For example:

for x in batches:
    y = compiled_model(x)

If x.shape changes often, the compiler may need to generate multiple specialized graphs or use dynamic-shape logic.
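The specialization strategy can be sketched as a cache of per-shape compiled artifacts (all names here are invented; real compilers make this decision with far more nuance):

```python
# Sketch of shape specialization (hypothetical compiler): compile once
# per distinct input shape, cache the result, and "recompile" whenever
# an unseen shape arrives.

compile_count = 0
cache = {}

def compile_for_shape(shape):
    global compile_count
    compile_count += 1          # stands in for expensive code generation
    def kernel(xs):             # kernel specialized to this shape
        return [v * 2 for v in xs]
    return kernel

def run(xs):
    shape = len(xs)             # 1-D "shape" for the sketch
    if shape not in cache:
        cache[shape] = compile_for_shape(shape)   # recompilation cost
    return cache[shape](xs)

run([1, 2, 3]); run([4, 5, 6])   # same shape: compiled once
run([1, 2])                       # new shape: compiles again
print(compile_count)              # 2
```

Each distinct shape pays compilation cost once; a workload that cycles through many shapes pays it many times, which is the trade-off the surrounding text describes.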

For high-performance training, stable shapes are usually easier to optimize. This is one reason sequence models often use padding and bucketing.
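Bucketing can be sketched in a few lines: round each sequence length up to the nearest bucket boundary and pad, so the compiler sees only a handful of distinct shapes (the helper names and bucket size are illustrative choices, not a library API):

```python
# Length-bucketing sketch: pad each sequence up to the nearest bucket
# boundary so batches expose only a few distinct shapes.

def bucket_length(n, bucket=8):
    return ((n + bucket - 1) // bucket) * bucket   # round up to multiple

def pad_to_bucket(seq, bucket=8, pad=0):
    target = bucket_length(len(seq), bucket)
    return seq + [pad] * (target - len(seq))

lengths = [3, 7, 9, 15, 17]
print(sorted({bucket_length(n) for n in lengths}))  # [8, 16, 24]
print(pad_to_bucket([1, 2, 3]))  # [1, 2, 3, 0, 0, 0, 0, 0]
```

Five raw lengths collapse into three padded shapes, trading a little wasted computation on padding for far fewer specialized graphs.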

Eager Execution

PyTorch’s default mode is eager execution. Each operation runs immediately.

x = torch.randn(3, 3)
y = x @ x.T
z = torch.relu(y)

print(z)

After each line, the result exists. This makes debugging direct.

Eager execution also means Python overhead can matter. If a model performs many small tensor operations, the cost of dispatching each operation from Python may become significant. Compilation and kernel fusion reduce this overhead by grouping work.

Kernel Fusion

Kernel fusion combines multiple tensor operations into a single lower-level kernel.

For example, instead of launching separate kernels for

y = torch.relu(x + b)

a compiler may fuse addition and ReLU into one kernel.

This can reduce memory traffic. Without fusion, the program may write the intermediate result x + b to memory and then read it again for ReLU. With fusion, the kernel can compute the addition and apply ReLU before writing the final result.

Kernel fusion is especially useful for elementwise operations, normalization patterns, activation functions, and small operations around large matrix multiplications.
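The memory-traffic argument can be made concrete with a pure-Python stand-in (the pass counts are a simplification that ignores reading the inputs, which both versions share):

```python
# Fusion sketch: count array-sized memory passes for y = relu(x + b).
# Unfused: write the intermediate, read it back, write y (3 passes).
# Fused: compute add + relu per element and write y once (1 pass).

def relu(v):
    return v if v > 0 else 0

def unfused(x, b):
    tmp = [xi + bi for xi, bi in zip(x, b)]   # intermediate written out
    y = [relu(t) for t in tmp]                # intermediate read back
    passes = 3                                # write tmp, read tmp, write y
    return y, passes

def fused(x, b):
    y = [relu(xi + bi) for xi, bi in zip(x, b)]  # one pass, no intermediate
    passes = 1                                    # write y only
    return y, passes

x, b = [1.0, -2.0, 3.0], [0.5, 0.5, -4.0]
print(unfused(x, b)[0])   # [1.5, 0, 0]
print(fused(x, b))        # ([1.5, 0, 0], 1)
```

Both versions compute the same result; fusion wins because memory bandwidth, not arithmetic, dominates elementwise operations on real hardware.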

Export and Deployment

Training code often uses Python. Deployment systems may need a portable representation.

Exporting a model means converting it into a graph or program representation that another runtime can execute. Common deployment targets include servers, mobile devices, browsers, edge devices, and specialized accelerators.

A deployed model usually needs:

Requirement | Meaning
Stable computation | Same operations as training or evaluation code
Known inputs | Shape and dtype constraints
Serialized parameters | Trained weights stored with the graph or loaded separately
Runtime support | Operators available on target platform
Numerical consistency | Acceptable agreement with PyTorch outputs

Dynamic Python code must often be restricted or captured before deployment.
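A toy version of an exported artifact (the format is entirely invented for illustration) shows the idea: the computation and its weights become a portable record that another runtime can execute without the defining Python class.

```python
import json

# Toy export sketch (hypothetical format): serialize a linear model as
# a record of shape constraints plus weights, then execute it from the
# loaded record without the original Python model code.

artifact = json.dumps({
    "input_shape": [2],
    "weights": {"w": [[1.0, 0.0], [0.0, 2.0]], "b": [0.5, -0.5]},
})

def run_artifact(blob, x):
    rec = json.loads(blob)
    w, b = rec["weights"]["w"], rec["weights"]["b"]
    assert len(x) == rec["input_shape"][0]      # enforce shape constraint
    y = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    return [yi + bi for yi, bi in zip(y, b)]    # y = Wx + b

print(run_artifact(artifact, [3.0, 4.0]))   # [3.5, 7.5]
```

Real export formats additionally carry an operator graph and versioning metadata, but the requirements in the table above (stable computation, known inputs, serialized parameters) all appear even in this tiny sketch.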

Symbolic and Dynamic Thinking Together

A practical PyTorch programmer uses both models of thought.

During research and debugging, think dynamically. Write direct Python code. Inspect tensors. Print shapes. Use assertions. Let the graph be built by execution.

During optimization and deployment, think symbolically. Ask which operations form a stable graph. Avoid unnecessary Python side effects in the forward path. Keep shapes predictable when possible. Use compilation or export tools where appropriate.

This combined view is important for modern deep learning. The same model may begin as flexible research code and later become a compiled training kernel or exported inference graph.

Example: Debug First, Compile Later

A common workflow is:

model = MyModel()

# Debug in eager mode first.
y = model(x)
loss = loss_fn(y, target)
loss.backward()

After the model is correct:

model = torch.compile(model)

Then benchmark:

import time

# Warm up: the first calls may trigger compilation.
for _ in range(10):
    y = model(x)

if torch.cuda.is_available():
    torch.cuda.synchronize()  # wait for queued GPU work before timing
start = time.time()

for _ in range(100):
    y = model(x)

if torch.cuda.is_available():
    torch.cuda.synchronize()
elapsed = time.time() - start

print(elapsed)

This pattern keeps correctness and performance separate. First make the model right. Then make it faster.

Common Mistakes

A common mistake is compiling too early. Compilation can make stack traces harder to read. Debug the eager model first.

Another mistake is assuming tracing captures all possible behavior. Tracing captures what happened for example inputs. If the model has data-dependent branches, tracing may miss important paths.

A third mistake is writing forward methods with unnecessary Python side effects. Printing, appending to global lists, mutating external objects, or using data-dependent Python values can interfere with graph capture.

A fourth mistake is using highly variable shapes without considering performance. Dynamic shapes are convenient, but stable shapes are often faster.

Summary

Symbolic computation builds an abstract graph before execution. Dynamic computation builds the graph as the program runs. PyTorch uses dynamic eager execution by default, which makes model development and debugging direct.

Symbolic graphs are useful for optimization and deployment. Modern PyTorch combines both approaches through tools such as tracing, export, and compilation. The practical workflow is to write clear eager code first, then capture or compile stable parts of the computation when performance or deployment requires it.