Multi-pass lowering from AST to CFG to linear bytecode, and the role of each compiler pass in Python/compile.c.
Compiler passes turn the AST and symbol table information into an executable code object.
Earlier stages answer structural questions.
tokenizer:
What lexical units are in the source?
parser:
What syntax tree do these tokens form?
symbol table:
What scope does each name belong to?The compiler answers execution questions.
Which bytecode instructions should be emitted?
Which constants belong in co_consts?
Which names belong in co_names?
Which local variables belong in co_varnames?
Where should jumps go?
How large can the evaluation stack grow?
Which exception table entries are needed?
Which nested code objects must be created?The output is a code object. That code object can be executed by the CPython evaluation loop.
22.1 Position in the Compilation Pipeline
The compiler sits after parsing and scope analysis.
source
↓
tokenization
↓
parsing
↓
AST
↓
symbol table
↓
compiler passes
↓
code object
↓
interpreter executionThe compiler does not execute Python code. It emits bytecode and metadata.
At Python level, the compile() function exposes this stage:
src = "x = 1 + 2\n"
code = compile(src, "<input>", "exec")
print(code)You can inspect the result with dis:
import dis
code = compile("x = 1 + 2\n", "<input>", "exec")
dis.dis(code)22.2 Main Inputs to the Compiler
The compiler consumes two main products from earlier stages.
| Input | Role |
|---|---|
| AST | Describes statements, expressions, operators, and source locations |
| Symbol tables | Describe scope classification for names |
The AST says:
there is an assignment
target is Name("x")
value is BinOp(Constant(1), Add, Constant(2))The symbol table says:
x is local/global in this scopeTogether they allow bytecode generation.
For example, the compiler chooses between:
STORE_FAST
STORE_NAME
STORE_GLOBAL
STORE_DEREFdepending on scope information.
22.3 Compiler Output: Code Objects
The compiler emits code objects.
A code object contains bytecode and execution metadata.
Important fields include:
co_code bytecode bytes
co_consts constants
co_names global and attribute names
co_varnames local variable names
co_freevars free variable names
co_cellvars cell variable names
co_filename source filename
co_name function or module name
co_qualname qualified name
co_firstlineno first source line
co_flags execution flags
co_stacksize required value stack sizeExample:
def add(a, b):
return a + b
code = add.__code__
print(code.co_name)
print(code.co_varnames)
print(code.co_consts)
print(code.co_names)
print(code.co_stacksize)A code object is immutable execution data. A function object wraps a code object with runtime context such as globals, defaults, annotations, and closure cells.
22.4 Compilation Is Recursive
Nested functions, lambdas, comprehensions, classes, and generator expressions create nested code objects.
Example:
def outer(x):
def inner(y):
return x + y
return innerThe module code object contains a function code object for outer.
The outer code object contains a function code object for inner.
Conceptually:
module code object
constants:
code object for outer
constants:
code object for innerAt runtime, executing def outer creates a function object from the outer code object. Executing def inner inside outer creates another function object, with closure cells when needed.
22.5 Statement Compilation
Statements usually emit bytecode that performs actions.
Example:
x = 1Compilation pattern:
compile right-hand expression
store result into targetSimplified bytecode:
LOAD_CONST 1
STORE_NAME xExample:
return xCompilation pattern:
compile return expression
emit RETURN_VALUESimplified bytecode:
LOAD_FAST x
RETURN_VALUEThe exact bytecode changes by Python version, but the structural idea is stable.
22.6 Expression Compilation
Expressions emit bytecode that leaves a value on the evaluation stack.
Example:
a + bCompilation pattern:
load a
load b
apply binary operationSimplified bytecode:
LOAD_FAST a
LOAD_FAST b
BINARY_OP +An expression statement usually discards its result:
f()Compilation pattern:
load function
call function
pop resultSimplified bytecode:
LOAD_NAME f
CALL
POP_TOPThe compiler maintains stack discipline. Every expression has a known stack effect.
22.7 Name Compilation
Name bytecode depends on scope classification.
Example:
def f(a):
return aa is a local variable:
LOAD_FAST aExample:
x = 1
def f():
return xx is global from inside f:
LOAD_GLOBAL xExample:
def outer():
x = 1
def inner():
return xx is a closure variable from inside inner:
LOAD_DEREF xThis is why symbol table analysis precedes bytecode emission.
22.8 Target Compilation
Assignment targets compile differently from value expressions.
Example:
x = valueTarget operation:
STORE_FAST or STORE_NAME or STORE_GLOBAL or STORE_DEREFExample:
obj.attr = valueTarget operation:
compile obj
compile value
STORE_ATTR attrExample:
items[i] = valueTarget operation:
compile items
compile i
compile value
STORE_SUBSCRThe same AST shape can compile differently depending on context.
Compare:
obj.attrused as a value:
LOAD_ATTR attrand:
obj.attr = 1used as a target:
STORE_ATTR attrAST contexts such as Load, Store, and Del drive this distinction.
22.9 Control Flow Compilation
Control flow requires jumps and labels.
Example:
if x:
a = 1
else:
a = 2Compilation pattern:
compile test x
jump to else block if false
compile body
jump to end
else label:
compile else body
end label:
continueSimplified bytecode shape:
LOAD_NAME x
POP_JUMP_IF_FALSE else_label
LOAD_CONST 1
STORE_NAME a
JUMP_FORWARD end_label
else_label:
LOAD_CONST 2
STORE_NAME a
end_label:The compiler first emits symbolic jumps or internal labels. Later assembly resolves them into concrete byte offsets or instruction offsets.
22.10 Loops
Loops require entry labels, exit labels, and sometimes continue targets.
Example:
while x:
work()Compilation pattern:
loop_start:
compile test
jump to loop_end if false
compile body
jump to loop_start
loop_end:Example:
for item in items:
work(item)Compilation pattern:
compile iterable
get iterator
loop_start:
get next item or jump to loop_end
store item
compile body
jump to loop_start
loop_end:break jumps to the loop exit. continue jumps to the loop continuation point.
Nested loops require a compiler block stack so break and continue target the correct enclosing loop.
22.11 Exception Handling
Exception handling compilation is more complex because it needs control flow metadata.
Example:
try:
risky()
except ValueError:
recover()
finally:
cleanup()The compiler must produce bytecode for:
normal execution
exception matching
handler entry
handler cleanup
finally execution
reraising when neededModern CPython uses an exception table associated with the code object rather than only inline block setup opcodes. The exception table tells the interpreter which handler covers which instruction ranges.
Conceptually:
protected bytecode range:
risky()
handler:
if exception matches ValueError:
recover()
finally:
cleanup()The compiler must preserve Python’s exact exception semantics, including else, finally, raise, exception chaining, and cleanup of exception handler variables.
22.12 Function Definition Compilation
A function definition compiles in two layers.
Example:
def add(a, b):
return a + bLayer 1: compile the function body into a nested code object.
code object for add:
LOAD_FAST a
LOAD_FAST b
BINARY_OP +
RETURN_VALUELayer 2: compile the outer statement that creates the function object.
LOAD_CONST <code object add>
MAKE_FUNCTION
STORE_NAME addThis distinction is central.
The function body does not run when the compiler sees def. At runtime, executing the def statement creates a function object and binds it to the function name.
22.13 Defaults, Annotations, and Decorators
Function definitions may include defaults, annotations, and decorators.
Example:
@trace
def f(x: int, y=1) -> int:
return x + yCompilation must handle:
default argument value y=1
parameter annotation x: int
return annotation -> int
function body code object
function object creation
decorator application
name bindingConceptual runtime order:
evaluate decorator expressions
evaluate default values
evaluate annotations according to current rules
create function object
apply decorators from bottom to top
bind final function object to nameThe compiler emits bytecode that implements this order.
22.14 Class Definition Compilation
A class definition also compiles in layers.
Example:
class C(Base):
x = 1
def f(self):
return self.xThe class body becomes a code object.
At runtime:
evaluate base classes
create class namespace
execute class body code object in that namespace
call metaclass machinery
bind resulting class object to nameThe compiler emits bytecode that calls the internal class-building protocol.
A class body is executable code. It can contain arbitrary statements:
class C:
print("building class")
x = compute()Those statements run when the class definition executes.
22.15 Closure Compilation
Closures require coordinated work between symbol table analysis and bytecode generation.
Example:
def outer():
x = 1
def inner():
return x
return innerThe compiler must:
create a cell for x in outer
compile inner with x as a free variable
create inner function with closure tuple
load x through closure access in innerSimplified shape inside outer:
MAKE_CELL x
LOAD_CONST 1
STORE_DEREF x
LOAD_CLOSURE x
BUILD_TUPLE 1
LOAD_CONST <code object inner>
MAKE_FUNCTION closure
STORE_FAST inner
LOAD_FAST inner
RETURN_VALUEInside inner:
LOAD_DEREF x
RETURN_VALUEThe exact instructions vary by version, but the closure mechanism remains: captured variables live in cells.
22.16 Comprehension Compilation
Comprehensions compile as nested code objects.
Example:
values = [x * x for x in range(5)]Conceptually, CPython creates code similar to an implicit nested function for the comprehension body.
The outer code evaluates the iterable and calls the comprehension code. The comprehension code loops and builds the result list.
This explains why comprehension variables do not leak into the surrounding scope.
x = 100
values = [x for x in range(3)]
print(x)The outer x remains 100.
The comprehension’s x is local to the comprehension code object.
22.17 Generator Compilation
Generator functions compile differently from normal functions.
Example:
def gen():
yield 1
yield 2The compiler marks the code object with generator flags.
When called, gen() does not execute the body immediately. It returns a generator object.
The body executes when the generator is advanced.
Compilation must emit yield points:
LOAD_CONST 1
YIELD_VALUE
resume point
LOAD_CONST 2
YIELD_VALUE
resume point
RETURN_VALUEA generator frame can suspend and resume. The code object flags tell the runtime how to construct and execute it.
22.18 Coroutine and Async Compilation
Async functions produce coroutine code objects.
Example:
async def fetch():
result = await client.get()
return resultThe compiler marks the code object as coroutine code.
It emits await machinery around awaitable expressions.
Conceptually:
call client.get()
obtain awaitable
suspend until awaitable completes
store result
return resultAsync functions, async generators, async for, and async with all require special bytecode patterns and code flags.
22.19 Pattern Matching Compilation
Pattern matching has dedicated compilation logic.
Example:
match value:
case 0:
result = "zero"
case [x, y]:
result = x + yThe compiler must emit code to:
evaluate subject once
try first pattern
jump to body if matched
try next pattern if failed
bind captured names only on successful match
evaluate guards when present
skip remaining cases after matchPattern matching is neither ordinary if syntax nor ordinary assignment. It has its own matching and binding rules.
22.20 Constant Handling
The compiler stores constants in co_consts.
Example:
x = 123
y = "hello"Code object constants may include:
None
123
"hello"
nested code objects
tuples of constants
frozensets used by optimized membership testsInspect:
code = compile("x = 123\ny = 'hello'\n", "<input>", "exec")
print(code.co_consts)Constants are referenced by index from bytecode instructions.
22.21 Name and Variable Tables
The compiler builds several name tables.
| Code object field | Meaning |
|---|---|
co_names | Names used for globals, attributes, and imports |
co_varnames | Fast local variable names |
co_cellvars | Locals captured by nested scopes |
co_freevars | Variables captured from enclosing scopes |
Example:
x = 1
def f(a):
b = len(a)
return x + bInside f:
co_varnames:
a, b
co_names:
len, xIf closures are involved, co_cellvars and co_freevars appear.
22.22 Stack Size Computation
CPython bytecode uses a value stack.
The compiler must compute the maximum stack depth needed by a code object.
Example:
x = a + b * cPossible stack evolution:
LOAD a stack: a
LOAD b stack: a, b
LOAD c stack: a, b, c
MULTIPLY stack: a, result
ADD stack: result
STORE x stack:Maximum stack depth: 3.
The code object stores this as co_stacksize.
The interpreter uses it to size frame storage.
22.23 Line Number and Position Tables
The compiler records mappings between bytecode and source positions.
These mappings support:
tracebacks
debuggers
profilers
coverage tools
stepping behavior
error locationsExample:
def f():
x = 1
y = 2
return x + yEach bytecode range can be associated with source line and column information.
Modern CPython tracks more precise position information than older versions, which improves tracebacks and debugging.
22.24 Assembly
The compiler often emits an intermediate instruction representation before final bytecode bytes.
Assembly resolves:
labels
jump targets
instruction offsets
extended arguments
line tables
exception tables
final bytecode layoutThis is necessary because jump offsets depend on instruction sizes, and instruction sizes can change when offsets require extended arguments.
Conceptually:
emit symbolic instructions
compute instruction offsets
resolve jumps
insert extended arguments if needed
recompute if sizes changed
build final bytecode bytes
build metadata tables
create code object22.25 AST Optimization
CPython performs some optimization at the AST or compiler level.
Examples may include:
constant folding
removing unreachable assert code under optimization
simplifying literal containers for membership tests
basic expression simplificationsExample:
x = 1 + 2may compile as if written:
x = 3Inspect:
import dis
dis.dis(compile("x = 1 + 2\n", "<input>", "exec"))Optimization is conservative. CPython must preserve Python semantics, including side effects, exceptions, and evaluation order.
22.26 Evaluation Order
The compiler must preserve Python’s evaluation order.
Example:
result = f() + g()f() must execute before g().
Example:
obj.attr = value()The target object and value are evaluated in the order required by the language.
Example:
d[key()] = value()The compiler cannot freely reorder operations just because another order might be faster.
Python’s dynamic semantics make evaluation order part of observable behavior.
22.27 Error Handling During Compilation
Some errors are detected during compilation, not parsing.
Examples:
return 1at module level.
breakoutside a loop.
continueoutside a loop.
The parser can build nodes for these statements. The compiler rejects them when context makes them illegal.
This means a syntax-shaped AST may still fail to compile.
22.28 Public APIs for Compilation
Python exposes compilation through compile().
Example:
code = compile("x = 1\n", "<input>", "exec")
exec(code)Modes:
| Mode | Meaning |
|---|---|
exec | Compile a module or statement sequence |
eval | Compile a single expression |
single | Compile interactive input |
Examples:
compile("x = 1", "<input>", "exec")
compile("1 + 2", "<input>", "eval")
compile("print(1)", "<input>", "single")compile() can also compile an AST:
import ast
tree = ast.parse("x = 1\n")
code = compile(tree, "<ast>", "exec")
exec(code)22.29 Important CPython Source Areas
Important compiler-related files include:
Python/compile.c
Python/symtable.c
Python/ast.c
Python/ast_opt.c
Python/flowgraph.c
Python/assemble.c
Include/cpython/code.h
Lib/dis.py
Lib/test/test_compile.py
Lib/test/test_dis.pyConceptual roles:
| Area | Role |
|---|---|
symtable.c | Scope classification |
compile.c | AST-to-instruction compilation |
flowgraph.c | Control-flow graph handling |
assemble.c | Final bytecode assembly |
code.h | Code object definitions |
dis.py | Bytecode inspection |
| compiler tests | Behavior and regression protection |
The exact file organization can shift across versions, but the conceptual pipeline remains stable.
22.30 Minimal Mental Model
Use this model:
The AST gives syntax.
The symbol table gives scope.
The compiler walks the AST and emits bytecode instructions.
Nested executable constructs produce nested code objects.
Control flow becomes jumps and exception tables.
Names become local, global, or closure bytecode.
Constants and names are stored in code object tables.
Assembly resolves labels, jumps, stack size, and metadata.
The result is an immutable code object ready for execution.Compiler passes are where CPython turns syntax and scope into executable instructions.