# 22. Compiler Passes

# 22. Compiler Passes

Compiler passes turn the AST and symbol table information into an executable code object.

Earlier stages answer structural questions.

```text id="lfk0j9"
tokenizer:
    What lexical units are in the source?

parser:
    What syntax tree do these tokens form?

symbol table:
    What scope does each name belong to?
```

The compiler answers execution questions.

```text id="ju2gsy"
Which bytecode instructions should be emitted?
Which constants belong in co_consts?
Which names belong in co_names?
Which local variables belong in co_varnames?
Where should jumps go?
How large can the evaluation stack grow?
Which exception table entries are needed?
Which nested code objects must be created?
```

The output is a `code` object. That code object can be executed by the CPython evaluation loop.

## 22.1 Position in the Compilation Pipeline

The compiler sits after parsing and scope analysis.

```text id="9mc7xw"
source
    ↓
tokenization
    ↓
parsing
    ↓
AST
    ↓
symbol table
    ↓
compiler passes
    ↓
code object
    ↓
interpreter execution
```

The compiler does not execute Python code. It emits bytecode and metadata.

At Python level, the `compile()` function exposes this stage:

```python id="2sweqj"
src = "x = 1 + 2\n"
code = compile(src, "<input>", "exec")

print(code)
```

You can inspect the result with `dis`:

```python id="24znqy"
import dis

code = compile("x = 1 + 2\n", "<input>", "exec")
dis.dis(code)
```

## 22.2 Main Inputs to the Compiler

The compiler consumes two main products from earlier stages.

| Input         | Role                                                               |
| ------------- | ------------------------------------------------------------------ |
| AST           | Describes statements, expressions, operators, and source locations |
| Symbol tables | Describe scope classification for names                            |

The AST says:

```text id="iqtzc2"
there is an assignment
target is Name("x")
value is BinOp(Constant(1), Add, Constant(2))
```

The symbol table says:

```text id="2kd8cu"
x is local/global in this scope
```

Together they allow bytecode generation.

For example, the compiler chooses between:

```text id="1lrzcy"
STORE_FAST
STORE_NAME
STORE_GLOBAL
STORE_DEREF
```

depending on scope information.

## 22.3 Compiler Output: Code Objects

The compiler emits code objects.

A code object contains bytecode and execution metadata.

Important fields include:

```text id="6p1zp9"
co_code             bytecode bytes
co_consts           constants
co_names            global and attribute names
co_varnames         local variable names
co_freevars         free variable names
co_cellvars         cell variable names
co_filename         source filename
co_name             function or module name
co_qualname         qualified name
co_firstlineno      first source line
co_flags            execution flags
co_stacksize        required value stack size
```

Example:

```python id="z2p4ol"
def add(a, b):
    return a + b

code = add.__code__

print(code.co_name)
print(code.co_varnames)
print(code.co_consts)
print(code.co_names)
print(code.co_stacksize)
```

A code object is immutable execution data. A function object wraps a code object with runtime context such as globals, defaults, annotations, and closure cells.

## 22.4 Compilation Is Recursive

Nested functions, lambdas, comprehensions, classes, and generator expressions create nested code objects.

Example:

```python id="nxf9b9"
def outer(x):
    def inner(y):
        return x + y
    return inner
```

The module code object contains a function code object for `outer`.

The `outer` code object contains a function code object for `inner`.

Conceptually:

```text id="g7e4jo"
module code object
    constants:
        code object for outer
            constants:
                code object for inner
```

At runtime, executing `def outer` creates a function object from the `outer` code object. Executing `def inner` inside `outer` creates another function object, with closure cells when needed.

## 22.5 Statement Compilation

Statements usually emit bytecode that performs actions.

Example:

```python id="f8r2kh"
x = 1
```

Compilation pattern:

```text id="vbnwbg"
compile right-hand expression
store result into target
```

Simplified bytecode:

```text id="ozi020"
LOAD_CONST 1
STORE_NAME x
```

Example:

```python id="jflq5g"
return x
```

Compilation pattern:

```text id="j5nviu"
compile return expression
emit RETURN_VALUE
```

Simplified bytecode:

```text id="5k1c90"
LOAD_FAST x
RETURN_VALUE
```

The exact bytecode changes by Python version, but the structural idea is stable.

## 22.6 Expression Compilation

Expressions emit bytecode that leaves a value on the evaluation stack.

Example:

```python id="thj50j"
a + b
```

Compilation pattern:

```text id="f51gg5"
load a
load b
apply binary operation
```

Simplified bytecode:

```text id="esgtwo"
LOAD_FAST a
LOAD_FAST b
BINARY_OP +
```

An expression statement usually discards its result:

```python id="77sckz"
f()
```

Compilation pattern:

```text id="u882gm"
load function
call function
pop result
```

Simplified bytecode:

```text id="fo9egq"
LOAD_NAME f
CALL
POP_TOP
```

The compiler maintains stack discipline. Every expression has a known stack effect.

## 22.7 Name Compilation

Name bytecode depends on scope classification.

Example:

```python id="7v96c6"
def f(a):
    return a
```

`a` is a local variable:

```text id="spceye"
LOAD_FAST a
```

Example:

```python id="o6h1if"
x = 1

def f():
    return x
```

`x` is global from inside `f`:

```text id="xmpdg1"
LOAD_GLOBAL x
```

Example:

```python id="yptzi2"
def outer():
    x = 1

    def inner():
        return x
```

`x` is a closure variable from inside `inner`:

```text id="og9dfx"
LOAD_DEREF x
```

This is why symbol table analysis precedes bytecode emission.

## 22.8 Target Compilation

Assignment targets compile differently from value expressions.

Example:

```python id="xpt040"
x = value
```

Target operation:

```text id="jfu667"
STORE_FAST or STORE_NAME or STORE_GLOBAL or STORE_DEREF
```

Example:

```python id="rtgk3q"
obj.attr = value
```

Target operation:

```text id="r72bwa"
compile obj
compile value
STORE_ATTR attr
```

Example:

```python id="s5jn7u"
items[i] = value
```

Target operation:

```text id="yyzjb4"
compile items
compile i
compile value
STORE_SUBSCR
```

The same AST shape can compile differently depending on context.

Compare:

```python id="i5tdmi"
obj.attr
```

used as a value:

```text id="14wnr4"
LOAD_ATTR attr
```

and:

```python id="c46iv2"
obj.attr = 1
```

used as a target:

```text id="7uswmv"
STORE_ATTR attr
```

AST contexts such as `Load`, `Store`, and `Del` drive this distinction.

## 22.9 Control Flow Compilation

Control flow requires jumps and labels.

Example:

```python id="5u72qc"
if x:
    a = 1
else:
    a = 2
```

Compilation pattern:

```text id="c7me6c"
compile test x
jump to else block if false
compile body
jump to end
else label:
compile else body
end label:
continue
```

Simplified bytecode shape:

```text id="ukb3dq"
LOAD_NAME x
POP_JUMP_IF_FALSE else_label

LOAD_CONST 1
STORE_NAME a
JUMP_FORWARD end_label

else_label:
LOAD_CONST 2
STORE_NAME a

end_label:
```

The compiler first emits symbolic jumps or internal labels. Later assembly resolves them into concrete byte offsets or instruction offsets.

## 22.10 Loops

Loops require entry labels, exit labels, and sometimes continue targets.

Example:

```python id="28xqrv"
while x:
    work()
```

Compilation pattern:

```text id="aq977s"
loop_start:
compile test
jump to loop_end if false
compile body
jump to loop_start
loop_end:
```

Example:

```python id="u0a0kq"
for item in items:
    work(item)
```

Compilation pattern:

```text id="fs8o3k"
compile iterable
get iterator
loop_start:
get next item or jump to loop_end
store item
compile body
jump to loop_start
loop_end:
```

`break` jumps to the loop exit. `continue` jumps to the loop continuation point.

Nested loops require a compiler block stack so `break` and `continue` target the correct enclosing loop.

## 22.11 Exception Handling

Exception handling compilation is more complex because it needs control flow metadata.

Example:

```python id="zbie3d"
try:
    risky()
except ValueError:
    recover()
finally:
    cleanup()
```

The compiler must produce bytecode for:

```text id="7fd19n"
normal execution
exception matching
handler entry
handler cleanup
finally execution
reraising when needed
```

Modern CPython uses an exception table associated with the code object rather than only inline block setup opcodes. The exception table tells the interpreter which handler covers which instruction ranges.

Conceptually:

```text id="9wg6kk"
protected bytecode range:
    risky()

handler:
    if exception matches ValueError:
        recover()

finally:
    cleanup()
```

The compiler must preserve Python’s exact exception semantics, including `else`, `finally`, `raise`, exception chaining, and cleanup of exception handler variables.

## 22.12 Function Definition Compilation

A function definition compiles in two layers.

Example:

```python id="9423bh"
def add(a, b):
    return a + b
```

Layer 1: compile the function body into a nested code object.

```text id="8pamlr"
code object for add:
    LOAD_FAST a
    LOAD_FAST b
    BINARY_OP +
    RETURN_VALUE
```

Layer 2: compile the outer statement that creates the function object.

```text id="ndgkj2"
LOAD_CONST <code object add>
MAKE_FUNCTION
STORE_NAME add
```

This distinction is central.

The function body does not run when the compiler sees `def`. At runtime, executing the `def` statement creates a function object and binds it to the function name.

## 22.13 Defaults, Annotations, and Decorators

Function definitions may include defaults, annotations, and decorators.

Example:

```python id="06culw"
@trace
def f(x: int, y=1) -> int:
    return x + y
```

Compilation must handle:

```text id="st02n8"
default argument value y=1
parameter annotation x: int
return annotation -> int
function body code object
function object creation
decorator application
name binding
```

Conceptual runtime order:

```text id="e9t5yf"
evaluate decorator expressions
evaluate default values
evaluate annotations according to current rules
create function object
apply decorators from bottom to top
bind final function object to name
```

The compiler emits bytecode that implements this order.

## 22.14 Class Definition Compilation

A class definition also compiles in layers.

Example:

```python id="c7bkv3"
class C(Base):
    x = 1

    def f(self):
        return self.x
```

The class body becomes a code object.

At runtime:

```text id="s2rsuu"
evaluate base classes
create class namespace
execute class body code object in that namespace
call metaclass machinery
bind resulting class object to name
```

The compiler emits bytecode that calls the internal class-building protocol.

A class body is executable code. It can contain arbitrary statements:

```python id="we8528"
class C:
    print("building class")
    x = compute()
```

Those statements run when the class definition executes.

## 22.15 Closure Compilation

Closures require coordinated work between symbol table analysis and bytecode generation.

Example:

```python id="ovqkwa"
def outer():
    x = 1

    def inner():
        return x

    return inner
```

The compiler must:

```text id="ovv4rt"
create a cell for x in outer
compile inner with x as a free variable
create inner function with closure tuple
load x through closure access in inner
```

Simplified shape inside `outer`:

```text id="3irffr"
MAKE_CELL x
LOAD_CONST 1
STORE_DEREF x
LOAD_CLOSURE x
BUILD_TUPLE 1
LOAD_CONST <code object inner>
MAKE_FUNCTION closure
STORE_FAST inner
LOAD_FAST inner
RETURN_VALUE
```

Inside `inner`:

```text id="31h0a6"
LOAD_DEREF x
RETURN_VALUE
```

The exact instructions vary by version, but the closure mechanism remains: captured variables live in cells.

## 22.16 Comprehension Compilation

Comprehensions compile as nested code objects.

Example:

```python id="5w7ihk"
values = [x * x for x in range(5)]
```

Conceptually, CPython creates code similar to an implicit nested function for the comprehension body.

The outer code evaluates the iterable and calls the comprehension code. The comprehension code loops and builds the result list.

This explains why comprehension variables do not leak into the surrounding scope.

```python id="in5tuc"
x = 100
values = [x for x in range(3)]
print(x)
```

The outer `x` remains `100`.

The comprehension’s `x` is local to the comprehension code object.

## 22.17 Generator Compilation

Generator functions compile differently from normal functions.

Example:

```python id="esmh0i"
def gen():
    yield 1
    yield 2
```

The compiler marks the code object with generator flags.

When called, `gen()` does not execute the body immediately. It returns a generator object.

The body executes when the generator is advanced.

Compilation must emit yield points:

```text id="xk7fji"
LOAD_CONST 1
YIELD_VALUE
resume point
LOAD_CONST 2
YIELD_VALUE
resume point
RETURN_VALUE
```

A generator frame can suspend and resume. The code object flags tell the runtime how to construct and execute it.

## 22.18 Coroutine and Async Compilation

Async functions produce coroutine code objects.

Example:

```python id="5d36kx"
async def fetch():
    result = await client.get()
    return result
```

The compiler marks the code object as coroutine code.

It emits await machinery around awaitable expressions.

Conceptually:

```text id="g68r60"
call client.get()
obtain awaitable
suspend until awaitable completes
store result
return result
```

Async functions, async generators, `async for`, and `async with` all require special bytecode patterns and code flags.

## 22.19 Pattern Matching Compilation

Pattern matching has dedicated compilation logic.

Example:

```python id="jz9doq"
match value:
    case 0:
        result = "zero"
    case [x, y]:
        result = x + y
```

The compiler must emit code to:

```text id="1gk5a7"
evaluate subject once
try first pattern
jump to body if matched
try next pattern if failed
bind captured names only on successful match
evaluate guards when present
skip remaining cases after match
```

Pattern matching is neither ordinary `if` syntax nor ordinary assignment. It has its own matching and binding rules.

## 22.20 Constant Handling

The compiler stores constants in `co_consts`.

Example:

```python id="j0h9t8"
x = 123
y = "hello"
```

Code object constants may include:

```text id="4721fz"
None
123
"hello"
nested code objects
tuples of constants
frozensets used by optimized membership tests
```

Inspect:

```python id="qrm79o"
code = compile("x = 123\ny = 'hello'\n", "<input>", "exec")
print(code.co_consts)
```

Constants are referenced by index from bytecode instructions.

## 22.21 Name and Variable Tables

The compiler builds several name tables.

| Code object field | Meaning                                         |
| ----------------- | ----------------------------------------------- |
| `co_names`        | Names used for globals, attributes, and imports |
| `co_varnames`     | Fast local variable names                       |
| `co_cellvars`     | Locals captured by nested scopes                |
| `co_freevars`     | Variables captured from enclosing scopes        |

Example:

```python id="m1cvy7"
x = 1

def f(a):
    b = len(a)
    return x + b
```

Inside `f`:

```text id="8bgw2w"
co_varnames:
    a, b

co_names:
    len, x
```

If closures are involved, `co_cellvars` and `co_freevars` appear.

## 22.22 Stack Size Computation

CPython bytecode uses a value stack.

The compiler must compute the maximum stack depth needed by a code object.

Example:

```python id="kt5lcp"
x = a + b * c
```

Possible stack evolution:

```text id="q0x4gy"
LOAD a        stack: a
LOAD b        stack: a, b
LOAD c        stack: a, b, c
MULTIPLY      stack: a, result
ADD           stack: result
STORE x       stack:
```

Maximum stack depth: 3.

The code object stores this as `co_stacksize`.

The interpreter uses it to size frame storage.

## 22.23 Line Number and Position Tables

The compiler records mappings between bytecode and source positions.

These mappings support:

```text id="srmq4m"
tracebacks
debuggers
profilers
coverage tools
stepping behavior
error locations
```

Example:

```python id="st4v12"
def f():
    x = 1
    y = 2
    return x + y
```

Each bytecode range can be associated with source line and column information.

Modern CPython tracks more precise position information than older versions, which improves tracebacks and debugging.

## 22.24 Assembly

The compiler often emits an intermediate instruction representation before final bytecode bytes.

Assembly resolves:

```text id="ic8am2"
labels
jump targets
instruction offsets
extended arguments
line tables
exception tables
final bytecode layout
```

This is necessary because jump offsets depend on instruction sizes, and instruction sizes can change when offsets require extended arguments.

Conceptually:

```text id="nxzhpt"
emit symbolic instructions
compute instruction offsets
resolve jumps
insert extended arguments if needed
recompute if sizes changed
build final bytecode bytes
build metadata tables
create code object
```

## 22.25 AST Optimization

CPython performs some optimization at the AST or compiler level.

Examples may include:

```text id="45a56p"
constant folding
removing unreachable assert code under optimization
simplifying literal containers for membership tests
basic expression simplifications
```

Example:

```python id="hspw89"
x = 1 + 2
```

may compile as if written:

```python id="99zdo5"
x = 3
```

Inspect:

```python id="3ltxiz"
import dis

dis.dis(compile("x = 1 + 2\n", "<input>", "exec"))
```

Optimization is conservative. CPython must preserve Python semantics, including side effects, exceptions, and evaluation order.

## 22.26 Evaluation Order

The compiler must preserve Python’s evaluation order.

Example:

```python id="0rat4c"
result = f() + g()
```

`f()` must execute before `g()`.

Example:

```python id="3cgy5k"
obj.attr = value()
```

The target object and value are evaluated in the order required by the language.

Example:

```python id="9uqpy4"
d[key()] = value()
```

The compiler cannot freely reorder operations just because another order might be faster.

Python’s dynamic semantics make evaluation order part of observable behavior.

## 22.27 Error Handling During Compilation

Some errors are detected during compilation, not parsing.

Examples:

```python id="uxlk51"
return 1
```

at module level.

```python id="nlyacf"
break
```

outside a loop.

```python id="kggjx7"
continue
```

outside a loop.

The parser can build nodes for these statements. The compiler rejects them when context makes them illegal.

This means a syntax-shaped AST may still fail to compile.

## 22.28 Public APIs for Compilation

Python exposes compilation through `compile()`.

Example:

```python id="njqphs"
code = compile("x = 1\n", "<input>", "exec")
exec(code)
```

Modes:

| Mode     | Meaning                                |
| -------- | -------------------------------------- |
| `exec`   | Compile a module or statement sequence |
| `eval`   | Compile a single expression            |
| `single` | Compile interactive input              |

Examples:

```python id="s246ja"
compile("x = 1", "<input>", "exec")
compile("1 + 2", "<input>", "eval")
compile("print(1)", "<input>", "single")
```

`compile()` can also compile an AST:

```python id="mcvj8n"
import ast

tree = ast.parse("x = 1\n")
code = compile(tree, "<ast>", "exec")
exec(code)
```

## 22.29 Important CPython Source Areas

Important compiler-related files include:

```text id="jwz8yn"
Python/compile.c
Python/symtable.c
Python/ast.c
Python/ast_opt.c
Python/flowgraph.c
Python/assemble.c
Include/cpython/code.h
Lib/dis.py
Lib/test/test_compile.py
Lib/test/test_dis.py
```

Conceptual roles:

| Area           | Role                               |
| -------------- | ---------------------------------- |
| `symtable.c`   | Scope classification               |
| `compile.c`    | AST-to-instruction compilation     |
| `flowgraph.c`  | Control-flow graph handling        |
| `assemble.c`   | Final bytecode assembly            |
| `code.h`       | Code object definitions            |
| `dis.py`       | Bytecode inspection                |
| compiler tests | Behavior and regression protection |

The exact file organization can shift across versions, but the conceptual pipeline remains stable.

## 22.30 Minimal Mental Model

Use this model:

```text id="ti9m3l"
The AST gives syntax.
The symbol table gives scope.
The compiler walks the AST and emits bytecode instructions.
Nested executable constructs produce nested code objects.
Control flow becomes jumps and exception tables.
Names become local, global, or closure bytecode.
Constants and names are stored in code object tables.
Assembly resolves labels, jumps, stack size, and metadata.
The result is an immutable code object ready for execution.
```

Compiler passes are where CPython turns syntax and scope into executable instructions.

