Skip to content

22. Compiler Passes

Multi-pass lowering from AST to CFG to linear bytecode, and the role of each compiler pass in Python/compile.c.

Compiler passes turn the AST and symbol table information into an executable code object.

Earlier stages answer structural questions.

tokenizer:
    What lexical units are in the source?

parser:
    What syntax tree do these tokens form?

symbol table:
    What scope does each name belong to?

The compiler answers execution questions.

Which bytecode instructions should be emitted?
Which constants belong in co_consts?
Which names belong in co_names?
Which local variables belong in co_varnames?
Where should jumps go?
How large can the evaluation stack grow?
Which exception table entries are needed?
Which nested code objects must be created?

The output is a code object. That code object can be executed by the CPython evaluation loop.

22.1 Position in the Compilation Pipeline

The compiler sits after parsing and scope analysis.

source
tokenization
parsing
AST
symbol table
compiler passes
code object
interpreter execution

The compiler does not execute Python code. It emits bytecode and metadata.

At Python level, the compile() function exposes this stage:

src = "x = 1 + 2\n"
code = compile(src, "<input>", "exec")

print(code)

You can inspect the result with dis:

import dis

code = compile("x = 1 + 2\n", "<input>", "exec")
dis.dis(code)

22.2 Main Inputs to the Compiler

The compiler consumes two main products from earlier stages.

InputRole
ASTDescribes statements, expressions, operators, and source locations
Symbol tablesDescribe scope classification for names

The AST says:

there is an assignment
target is Name("x")
value is BinOp(Constant(1), Add, Constant(2))

The symbol table says:

x is local/global in this scope

Together they allow bytecode generation.

For example, the compiler chooses between:

STORE_FAST
STORE_NAME
STORE_GLOBAL
STORE_DEREF

depending on scope information.

22.3 Compiler Output: Code Objects

The compiler emits code objects.

A code object contains bytecode and execution metadata.

Important fields include:

co_code             bytecode bytes
co_consts           constants
co_names            global and attribute names
co_varnames         local variable names
co_freevars         free variable names
co_cellvars         cell variable names
co_filename         source filename
co_name             function or module name
co_qualname         qualified name
co_firstlineno      first source line
co_flags            execution flags
co_stacksize        required value stack size

Example:

def add(a, b):
    return a + b

code = add.__code__

print(code.co_name)
print(code.co_varnames)
print(code.co_consts)
print(code.co_names)
print(code.co_stacksize)

A code object is immutable execution data. A function object wraps a code object with runtime context such as globals, defaults, annotations, and closure cells.

22.4 Compilation Is Recursive

Nested functions, lambdas, comprehensions, classes, and generator expressions create nested code objects.

Example:

def outer(x):
    def inner(y):
        return x + y
    return inner

The module code object contains a function code object for outer.

The outer code object contains a function code object for inner.

Conceptually:

module code object
    constants:
        code object for outer
            constants:
                code object for inner

At runtime, executing def outer creates a function object from the outer code object. Executing def inner inside outer creates another function object, with closure cells when needed.

22.5 Statement Compilation

Statements usually emit bytecode that performs actions.

Example:

x = 1

Compilation pattern:

compile right-hand expression
store result into target

Simplified bytecode:

LOAD_CONST 1
STORE_NAME x

Example:

return x

Compilation pattern:

compile return expression
emit RETURN_VALUE

Simplified bytecode:

LOAD_FAST x
RETURN_VALUE

The exact bytecode changes by Python version, but the structural idea is stable.

22.6 Expression Compilation

Expressions emit bytecode that leaves a value on the evaluation stack.

Example:

a + b

Compilation pattern:

load a
load b
apply binary operation

Simplified bytecode:

LOAD_FAST a
LOAD_FAST b
BINARY_OP +

An expression statement usually discards its result:

f()

Compilation pattern:

load function
call function
pop result

Simplified bytecode:

LOAD_NAME f
CALL
POP_TOP

The compiler maintains stack discipline. Every expression has a known stack effect.

22.7 Name Compilation

Name bytecode depends on scope classification.

Example:

def f(a):
    return a

a is a local variable:

LOAD_FAST a

Example:

x = 1

def f():
    return x

x is global from inside f:

LOAD_GLOBAL x

Example:

def outer():
    x = 1

    def inner():
        return x

x is a closure variable from inside inner:

LOAD_DEREF x

This is why symbol table analysis precedes bytecode emission.

22.8 Target Compilation

Assignment targets compile differently from value expressions.

Example:

x = value

Target operation:

STORE_FAST or STORE_NAME or STORE_GLOBAL or STORE_DEREF

Example:

obj.attr = value

Target operation:

compile obj
compile value
STORE_ATTR attr

Example:

items[i] = value

Target operation:

compile items
compile i
compile value
STORE_SUBSCR

The same AST shape can compile differently depending on context.

Compare:

obj.attr

used as a value:

LOAD_ATTR attr

and:

obj.attr = 1

used as a target:

STORE_ATTR attr

AST contexts such as Load, Store, and Del drive this distinction.

22.9 Control Flow Compilation

Control flow requires jumps and labels.

Example:

if x:
    a = 1
else:
    a = 2

Compilation pattern:

compile test x
jump to else block if false
compile body
jump to end
else label:
compile else body
end label:
continue

Simplified bytecode shape:

LOAD_NAME x
POP_JUMP_IF_FALSE else_label

LOAD_CONST 1
STORE_NAME a
JUMP_FORWARD end_label

else_label:
LOAD_CONST 2
STORE_NAME a

end_label:

The compiler first emits symbolic jumps or internal labels. Later assembly resolves them into concrete byte offsets or instruction offsets.

22.10 Loops

Loops require entry labels, exit labels, and sometimes continue targets.

Example:

while x:
    work()

Compilation pattern:

loop_start:
compile test
jump to loop_end if false
compile body
jump to loop_start
loop_end:

Example:

for item in items:
    work(item)

Compilation pattern:

compile iterable
get iterator
loop_start:
get next item or jump to loop_end
store item
compile body
jump to loop_start
loop_end:

break jumps to the loop exit. continue jumps to the loop continuation point.

Nested loops require a compiler block stack so break and continue target the correct enclosing loop.

22.11 Exception Handling

Exception handling compilation is more complex because it needs control flow metadata.

Example:

try:
    risky()
except ValueError:
    recover()
finally:
    cleanup()

The compiler must produce bytecode for:

normal execution
exception matching
handler entry
handler cleanup
finally execution
reraising when needed

Modern CPython uses an exception table associated with the code object rather than only inline block setup opcodes. The exception table tells the interpreter which handler covers which instruction ranges.

Conceptually:

protected bytecode range:
    risky()

handler:
    if exception matches ValueError:
        recover()

finally:
    cleanup()

The compiler must preserve Python’s exact exception semantics, including else, finally, raise, exception chaining, and cleanup of exception handler variables.

22.12 Function Definition Compilation

A function definition compiles in two layers.

Example:

def add(a, b):
    return a + b

Layer 1: compile the function body into a nested code object.

code object for add:
    LOAD_FAST a
    LOAD_FAST b
    BINARY_OP +
    RETURN_VALUE

Layer 2: compile the outer statement that creates the function object.

LOAD_CONST <code object add>
MAKE_FUNCTION
STORE_NAME add

This distinction is central.

The function body does not run when the compiler sees def. At runtime, executing the def statement creates a function object and binds it to the function name.

22.13 Defaults, Annotations, and Decorators

Function definitions may include defaults, annotations, and decorators.

Example:

@trace
def f(x: int, y=1) -> int:
    return x + y

Compilation must handle:

default argument value y=1
parameter annotation x: int
return annotation -> int
function body code object
function object creation
decorator application
name binding

Conceptual runtime order:

evaluate decorator expressions
evaluate default values
evaluate annotations according to current rules
create function object
apply decorators from bottom to top
bind final function object to name

The compiler emits bytecode that implements this order.

22.14 Class Definition Compilation

A class definition also compiles in layers.

Example:

class C(Base):
    x = 1

    def f(self):
        return self.x

The class body becomes a code object.

At runtime:

evaluate base classes
create class namespace
execute class body code object in that namespace
call metaclass machinery
bind resulting class object to name

The compiler emits bytecode that calls the internal class-building protocol.

A class body is executable code. It can contain arbitrary statements:

class C:
    print("building class")
    x = compute()

Those statements run when the class definition executes.

22.15 Closure Compilation

Closures require coordinated work between symbol table analysis and bytecode generation.

Example:

def outer():
    x = 1

    def inner():
        return x

    return inner

The compiler must:

create a cell for x in outer
compile inner with x as a free variable
create inner function with closure tuple
load x through closure access in inner

Simplified shape inside outer:

MAKE_CELL x
LOAD_CONST 1
STORE_DEREF x
LOAD_CLOSURE x
BUILD_TUPLE 1
LOAD_CONST <code object inner>
MAKE_FUNCTION closure
STORE_FAST inner
LOAD_FAST inner
RETURN_VALUE

Inside inner:

LOAD_DEREF x
RETURN_VALUE

The exact instructions vary by version, but the closure mechanism remains: captured variables live in cells.

22.16 Comprehension Compilation

Comprehensions compile as nested code objects.

Example:

values = [x * x for x in range(5)]

Conceptually, CPython creates code similar to an implicit nested function for the comprehension body.

The outer code evaluates the iterable and calls the comprehension code. The comprehension code loops and builds the result list.

This explains why comprehension variables do not leak into the surrounding scope.

x = 100
values = [x for x in range(3)]
print(x)

The outer x remains 100.

The comprehension’s x is local to the comprehension code object.

22.17 Generator Compilation

Generator functions compile differently from normal functions.

Example:

def gen():
    yield 1
    yield 2

The compiler marks the code object with generator flags.

When called, gen() does not execute the body immediately. It returns a generator object.

The body executes when the generator is advanced.

Compilation must emit yield points:

LOAD_CONST 1
YIELD_VALUE
resume point
LOAD_CONST 2
YIELD_VALUE
resume point
RETURN_VALUE

A generator frame can suspend and resume. The code object flags tell the runtime how to construct and execute it.

22.18 Coroutine and Async Compilation

Async functions produce coroutine code objects.

Example:

async def fetch():
    result = await client.get()
    return result

The compiler marks the code object as coroutine code.

It emits await machinery around awaitable expressions.

Conceptually:

call client.get()
obtain awaitable
suspend until awaitable completes
store result
return result

Async functions, async generators, async for, and async with all require special bytecode patterns and code flags.

22.19 Pattern Matching Compilation

Pattern matching has dedicated compilation logic.

Example:

match value:
    case 0:
        result = "zero"
    case [x, y]:
        result = x + y

The compiler must emit code to:

evaluate subject once
try first pattern
jump to body if matched
try next pattern if failed
bind captured names only on successful match
evaluate guards when present
skip remaining cases after match

Pattern matching is neither ordinary if syntax nor ordinary assignment. It has its own matching and binding rules.

22.20 Constant Handling

The compiler stores constants in co_consts.

Example:

x = 123
y = "hello"

Code object constants may include:

None
123
"hello"
nested code objects
tuples of constants
frozensets used by optimized membership tests

Inspect:

code = compile("x = 123\ny = 'hello'\n", "<input>", "exec")
print(code.co_consts)

Constants are referenced by index from bytecode instructions.

22.21 Name and Variable Tables

The compiler builds several name tables.

Code object fieldMeaning
co_namesNames used for globals, attributes, and imports
co_varnamesFast local variable names
co_cellvarsLocals captured by nested scopes
co_freevarsVariables captured from enclosing scopes

Example:

x = 1

def f(a):
    b = len(a)
    return x + b

Inside f:

co_varnames:
    a, b

co_names:
    len, x

If closures are involved, co_cellvars and co_freevars appear.

22.22 Stack Size Computation

CPython bytecode uses a value stack.

The compiler must compute the maximum stack depth needed by a code object.

Example:

x = a + b * c

Possible stack evolution:

LOAD a        stack: a
LOAD b        stack: a, b
LOAD c        stack: a, b, c
MULTIPLY      stack: a, result
ADD           stack: result
STORE x       stack:

Maximum stack depth: 3.

The code object stores this as co_stacksize.

The interpreter uses it to size frame storage.

22.23 Line Number and Position Tables

The compiler records mappings between bytecode and source positions.

These mappings support:

tracebacks
debuggers
profilers
coverage tools
stepping behavior
error locations

Example:

def f():
    x = 1
    y = 2
    return x + y

Each bytecode range can be associated with source line and column information.

Modern CPython tracks more precise position information than older versions, which improves tracebacks and debugging.

22.24 Assembly

The compiler often emits an intermediate instruction representation before final bytecode bytes.

Assembly resolves:

labels
jump targets
instruction offsets
extended arguments
line tables
exception tables
final bytecode layout

This is necessary because jump offsets depend on instruction sizes, and instruction sizes can change when offsets require extended arguments.

Conceptually:

emit symbolic instructions
compute instruction offsets
resolve jumps
insert extended arguments if needed
recompute if sizes changed
build final bytecode bytes
build metadata tables
create code object

22.25 AST Optimization

CPython performs some optimization at the AST or compiler level.

Examples may include:

constant folding
removing unreachable assert code under optimization
simplifying literal containers for membership tests
basic expression simplifications

Example:

x = 1 + 2

may compile as if written:

x = 3

Inspect:

import dis

dis.dis(compile("x = 1 + 2\n", "<input>", "exec"))

Optimization is conservative. CPython must preserve Python semantics, including side effects, exceptions, and evaluation order.

22.26 Evaluation Order

The compiler must preserve Python’s evaluation order.

Example:

result = f() + g()

f() must execute before g().

Example:

obj.attr = value()

The target object and value are evaluated in the order required by the language.

Example:

d[key()] = value()

The compiler cannot freely reorder operations just because another order might be faster.

Python’s dynamic semantics make evaluation order part of observable behavior.

22.27 Error Handling During Compilation

Some errors are detected during compilation, not parsing.

Examples:

return 1

at module level.

break

outside a loop.

continue

outside a loop.

The parser can build nodes for these statements. The compiler rejects them when context makes them illegal.

This means a syntax-shaped AST may still fail to compile.

22.28 Public APIs for Compilation

Python exposes compilation through compile().

Example:

code = compile("x = 1\n", "<input>", "exec")
exec(code)

Modes:

ModeMeaning
execCompile a module or statement sequence
evalCompile a single expression
singleCompile interactive input

Examples:

compile("x = 1", "<input>", "exec")
compile("1 + 2", "<input>", "eval")
compile("print(1)", "<input>", "single")

compile() can also compile an AST:

import ast

tree = ast.parse("x = 1\n")
code = compile(tree, "<ast>", "exec")
exec(code)

22.29 Important CPython Source Areas

Important compiler-related files include:

Python/compile.c
Python/symtable.c
Python/ast.c
Python/ast_opt.c
Python/flowgraph.c
Python/assemble.c
Include/cpython/code.h
Lib/dis.py
Lib/test/test_compile.py
Lib/test/test_dis.py

Conceptual roles:

AreaRole
symtable.cScope classification
compile.cAST-to-instruction compilation
flowgraph.cControl-flow graph handling
assemble.cFinal bytecode assembly
code.hCode object definitions
dis.pyBytecode inspection
compiler testsBehavior and regression protection

The exact file organization can shift across versions, but the conceptual pipeline remains stable.

22.30 Minimal Mental Model

Use this model:

The AST gives syntax.
The symbol table gives scope.
The compiler walks the AST and emits bytecode instructions.
Nested executable constructs produce nested code objects.
Control flow becomes jumps and exception tables.
Names become local, global, or closure bytecode.
Constants and names are stored in code object tables.
Assembly resolves labels, jumps, stack size, and metadata.
The result is an immutable code object ready for execution.

Compiler passes are where CPython turns syntax and scope into executable instructions.