End-to-end journey of a .py file through lexing, parsing, compilation, and bytecode evaluation.
CPython does not execute Python source text directly. It transforms source text through several internal representations before the first bytecode instruction runs.
The path is:
source text
↓
tokens
↓
parse tree
↓
abstract syntax tree
↓
symbol table
↓
code object
↓
frame
↓
bytecode evaluation
↓
object operations
Each stage has a separate job. The tokenizer understands characters. The parser understands syntax. The AST represents program structure. The symbol table classifies names. The compiler emits bytecode. The evaluator runs bytecode against Python objects.
6.1 Source Text
The input begins as text.
x = 1 + 2
print(x)
Before CPython can execute this, it must know:
where statements begin and end
which characters form names
which characters form numbers
which indentation levels define blocks
which tokens form expressions
which names are local or global
which bytecode instructions are needed
Python source code is not just a string. It has an encoding, line structure, indentation, comments, string literal rules, and syntax rules.
The first stage converts raw text into tokens.
6.2 Tokenization
The tokenizer reads source characters and produces tokens.
For this code:
x = 1 + 2
the tokenizer produces a stream similar to:
NAME("x")
EQUAL("=")
NUMBER("1")
PLUS("+")
NUMBER("2")
NEWLINE
ENDMARKER
For block structure, indentation becomes tokens too.
if ok:
    run()
else:
    stop()
Conceptually:
NAME("if")
NAME("ok")
COLON
NEWLINE
INDENT
NAME("run")
LPAR
RPAR
NEWLINE
DEDENT
NAME("else")
COLON
NEWLINE
INDENT
NAME("stop")
LPAR
RPAR
NEWLINE
DEDENT
ENDMARKER
This is important. Python block structure is not inferred later from whitespace. The tokenizer emits explicit INDENT and DEDENT tokens.
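The INDENT and DEDENT tokens can be observed from Python with the standard-library tokenize module. This is a minimal sketch: tokenize mirrors the token stream rather than exposing CPython's internal tokenizer directly, it reports operators under the generic OP type, and the helper name token_names is made up for illustration.

```python
import io
import tokenize

SOURCE = "if ok:\n    run()\nelse:\n    stop()\n"


def token_names(text):
    """Return the token type names that tokenize produces for text."""
    readline = io.StringIO(text).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


names = token_names(SOURCE)
print(names)
```

Each indented block contributes one INDENT and one DEDENT, so the parser never has to count spaces itself.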
6.3 Parsing
The parser consumes tokens and checks whether they form valid Python syntax.
For:
x = 1 + 2
the parser recognizes an assignment statement whose right side is a binary expression.
For:
def add(a, b):
    return a + b
the parser recognizes:
function definition
function name
parameter list
function body
return statement
binary expression
The parser rejects invalid token sequences:
x = + * 3
Every token here is valid on its own, so the text reaches the parser, but the sequence cannot be reduced into valid syntax.
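The parser’s output is a structured representation of the program. In older CPython versions this was a concrete parse tree that a separate pass converted into an AST. Since Python 3.9, the PEG parser builds AST nodes directly.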
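A small sketch of the accept/reject behavior, using ast.parse as a stand-in for the parser stage. The helper name is_valid_syntax is hypothetical.

```python
import ast


def is_valid_syntax(source):
    """Return True if the source parses, False if the parser rejects it."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(is_valid_syntax("x = 1 + 2"))
print(is_valid_syntax("x = + * 3"))
```

The second string tokenizes without trouble. The rejection comes from the grammar, not from the characters.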
6.4 Abstract Syntax Tree
The AST represents the semantic structure of the program.
For:
x = 1 + 2
the AST is conceptually:
Module
    Assign
        target: Name("x", Store)
        value:
            BinOp
                left: Constant(1)
                op: Add
                right: Constant(2)
The AST removes many surface details and keeps the structure needed by later compiler stages.
You can inspect the AST from Python:
import ast
tree = ast.parse("x = 1 + 2")
print(ast.dump(tree, indent=4))
Example output shape:
Module(
    body=[
        Assign(
            targets=[
                Name(id='x', ctx=Store())],
            value=BinOp(
                left=Constant(value=1),
                op=Add(),
                right=Constant(value=2)))],
    type_ignores=[])
The AST says what the program means structurally. It does not yet say which bytecode instructions to emit.
6.5 Name Contexts
AST names carry context.
In this code:
x = x + 1
the two uses of x have different roles.
left side x     Store
right side x    Load
Conceptually:
Assign
    target: Name("x", Store)
    value:
        BinOp
            left: Name("x", Load)
            op: Add
            right: Constant(1)
This distinction matters because loading a name and storing a name compile to different operations.
Load context     read a value
Store context    assign a value
Del context      delete a binding
The compiler relies on this information when emitting bytecode.
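The contexts can be read straight off the AST. A minimal sketch that walks the tree for x = x + 1 and records the context class of every Name node:

```python
import ast

tree = ast.parse("x = x + 1")

# Collect (identifier, context class name) for every Name node in the tree.
contexts = [
    (node.id, type(node.ctx).__name__)
    for node in ast.walk(tree)
    if isinstance(node, ast.Name)
]
print(contexts)
```

The same identifier appears twice, once with a Store context and once with a Load context.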
6.6 Symbol Table Analysis
Before bytecode generation, CPython analyzes names.
It decides whether each name is:
local
global
nonlocal
free
cell
implicit builtin lookup
Example:
x = 10
def f(y):
    return x + y
Inside f:
y is local
x is global or builtin lookup
Another example:
def outer():
    x = 10
    def inner():
        return x
    return inner
Here:
x is local in outer
x is free in inner
x becomes a cell variable in outer
A cell variable is a local variable that must survive because an inner function captures it. A free variable is a variable used by a function but stored in an enclosing scope.
This stage is essential for closures.
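The standard-library symtable module exposes this classification. The sketch below assumes the nested tables come back in source order, which holds for this small example.

```python
import symtable

SOURCE = """
def outer():
    x = 10
    def inner():
        return x
    return inner
"""

top = symtable.symtable(SOURCE, "<example>", "exec")
outer_table = top.get_children()[0]          # scope of outer
inner_table = outer_table.get_children()[0]  # scope of inner

outer_x = outer_table.lookup("x")
inner_x = inner_table.lookup("x")
print("outer:", outer_x.is_local(), "inner:", inner_x.is_free())
```

The name x is local where it is assigned and free where it is only read, which is the information the compiler needs to create a cell.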
6.7 Code Objects
After parsing and symbol analysis, CPython compiles code into code objects.
A code object contains:
bytecode
constants
names
local variable names
free variable names
cell variable names
stack size
flags
filename
function name
line mapping information
exception table
You can inspect a code object:
def add(a, b):
    return a + b
code = add.__code__
print(code.co_name)
print(code.co_varnames)
print(code.co_consts)
print(code.co_names)
print(code.co_freevars)
print(code.co_cellvars)
The code object is immutable. It describes executable code, but it does not contain the current runtime values of local variables.
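Immutability can be checked directly. In this sketch, assigning to an attribute fails, while code.replace() returns a modified copy and leaves the original untouched.

```python
def add(a, b):
    return a + b


code = add.__code__

# Attributes of a code object are read-only.
try:
    code.co_name = "renamed"
    mutated = True
except AttributeError:
    mutated = False

# replace() builds a new code object instead of changing the existing one.
renamed = code.replace(co_name="renamed")
print(mutated, code.co_name, renamed.co_name)
```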
6.8 Bytecode
Bytecode is CPython’s instruction format.
For:
def add(a, b):
    return a + b
disassembly may look conceptually like:
LOAD_FAST a
LOAD_FAST b
BINARY_OP +
RETURN_VALUE
Actual bytecode names and layout vary by Python version. The core idea remains: bytecode instructions operate on a frame.
Use dis:
import dis
def add(a, b):
    return a + b
dis.dis(add)
Bytecode is lower-level than the AST. It is close to execution.
The AST says:
return a + b
The bytecode says:
load a
load b
perform addition
return result
6.9 Constants and Names
A code object stores constants and names separately from bytecode.
For:
x = 10
print(x)
the constant 10 is stored in the constants table. The names x and print are stored in the names table.
Conceptually:
co_consts = (10, None)
co_names = ("x", "print")
The bytecode then references these tables by index.
LOAD_CONST 0      load constant 10
STORE_NAME 0      store into name x
LOAD_NAME 1       load name print
LOAD_NAME 0       load name x
CALL 1
POP_TOP
LOAD_CONST 1      load None
RETURN_VALUE
This makes bytecode compact. Instructions store small indexes instead of full strings or objects.
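The tables can be inspected by compiling the same two lines. This is a sketch: the contents of co_consts depend on the Python version (some versions encode small integers directly in the instruction), so only the names table is treated as stable here.

```python
SOURCE = "x = 10\nprint(x)\n"

# Compile as a module; nothing is executed yet.
code = compile(SOURCE, "<example>", "exec")

print("names: ", code.co_names)
print("consts:", code.co_consts)
```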
6.10 Module Execution
A Python file is compiled into a module-level code object.
For:
# demo.py
x = 1
def f():
    return x
CPython compiles the whole file into one code object. Executing that code object creates module bindings.
Conceptually:
create module object
create module dictionary
execute module code object in that dictionary
bind x
create function object f
bind f
The function body is also compiled into its own code object. The module code object contains that function code object as a constant.
This explains why defining a function executes code at module import time: the body does not run, but the function object is created and bound.
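The nesting can be seen by compiling the module source by hand and executing it in a plain dictionary. This is a sketch of the idea, not the import machinery itself. The dictionary stands in for the module namespace.

```python
import types

SOURCE = "x = 1\ndef f():\n    return x\n"

module_code = compile(SOURCE, "demo.py", "exec")

# The function body is stored as a code object among the module's constants.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]

# Executing the module code creates the bindings x and f.
namespace = {}
exec(module_code, namespace)
print(nested[0].co_name, namespace["x"], namespace["f"]())
```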
6.11 Function Definition
A function definition is executable code.
For:
def add(a, b):
    return a + b
CPython does not run the body immediately. It creates a function object.
Conceptually:
load code object for add
load function name
create function object
store function object in current namespace
The function object contains:
code object
globals dictionary
default values
closure cells
annotations
metadata
Later, when the function is called, CPython creates a frame from that function object and executes the function’s code object.
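The split between code object and function object can be made visible with types.FunctionType, which builds a function from an existing code object and a globals dictionary. The name add_clone is made up for this sketch.

```python
import types


def add(a, b):
    return a + b


# Build a second function object around the same code object.
clone = types.FunctionType(add.__code__, globals(), "add_clone")

print(clone(2, 3), clone.__name__, clone.__code__ is add.__code__)
```

Two distinct function objects share one code object. The code describes what to run, and the function object supplies the globals, defaults, and closure it runs with.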
6.12 Frame Creation
A frame is created when CPython executes a code object.
For a function call:
add(2, 3)
CPython creates a frame with:
code object for add
globals from add.__globals__
builtins
local slots
argument values
value stack
instruction pointer
exception state
The arguments are placed into local variable slots.
a = 2
b = 3
Then the bytecode evaluator starts executing the frame.
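A running function can look at its own frame. The sketch below uses inspect.currentframe(), which is available in CPython but may return None on other implementations. The function name describe_call is hypothetical.

```python
import inspect


def describe_call(a, b):
    frame = inspect.currentframe()  # the frame executing this call
    info = {
        "code_name": frame.f_code.co_name,
        "a": frame.f_locals["a"],
        "b": frame.f_locals["b"],
        "shares_globals": frame.f_globals is describe_call.__globals__,
    }
    del frame  # avoid keeping a reference cycle through the frame
    return a + b, info


result, info = describe_call(2, 3)
print(result, info)
```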
6.13 Evaluation Stack
Most bytecode instructions communicate through the frame’s value stack.
For:
return a + b
the execution is:
LOAD_FAST a       push value of a
LOAD_FAST b       push value of b
BINARY_OP +       pop two values, add, push result
RETURN_VALUE      pop result and return it
The local variables are stored separately from temporary stack values.
locals:
    a = 2
    b = 3
stack:
    temporary values used by bytecode
This is why CPython is called a stack-based virtual machine.
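The stack discipline can be imitated with a toy interpreter. This is an illustration only: the instruction tuples and the run function are invented for this sketch and are not CPython's real instruction encoding.

```python
def run(instructions, local_vars):
    """Execute a tiny instruction list against a value stack."""
    stack = []
    for opcode, arg in instructions:
        if opcode == "LOAD_FAST":
            stack.append(local_vars[arg])   # push a local's value
        elif opcode == "BINARY_ADD":
            right = stack.pop()             # operands come off in reverse order
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    raise RuntimeError("instruction list ended without RETURN_VALUE")


PROGRAM = [
    ("LOAD_FAST", "a"),
    ("LOAD_FAST", "b"),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

print(run(PROGRAM, {"a": 2, "b": 3}))
```

The locals dictionary never changes during the run. Only the stack holds intermediate values.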
6.14 Object Operations
Bytecode instructions operate on Python objects, not raw C primitives.
When CPython executes:
a + b
it does not assume that a and b are machine integers.
They may be:
integers
floats
strings
lists
tuples
NumPy arrays
user-defined objects
The operation dispatches through the object protocol.
For int + int, CPython uses integer addition. For str + str, it uses string concatenation. For user-defined classes, it may call __add__.
Conceptually:
BINARY_OP +
    inspect operand types
    find the numeric or sequence operation
    call appropriate slot
    return Python object
This is why bytecode remains generic while types provide concrete behavior.
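The same add function below handles three unrelated types because the operand types supply the behavior. The Meters class is a made-up example of a user-defined type.

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        return NotImplemented  # lets Python try the other operand, then fail


def add(a, b):
    return a + b  # one generic operation, dispatched at runtime


print(add(1, 2), add("a", "b"), add(Meters(1), Meters(2)).value)
```

Returning NotImplemented for an unsupported operand lets the interpreter raise TypeError once both operands have declined.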
6.15 Attribute Access
For:
obj.name
CPython compiles an attribute load.
Conceptually:
LOAD_FAST obj
LOAD_ATTR name
At runtime, LOAD_ATTR applies Python's attribute lookup rules. For ordinary objects the default order is:
check data descriptors on the type
check instance dictionary
check non-data descriptors and class attributes
possibly call __getattr__
raise AttributeError if missing
Attribute access is not a raw field lookup in the general case. It is a protocol operation.
This explains why attribute access can run Python code.
class C:
    @property
    def name(self):
        print("computed")
        return 42
obj = C()
obj.name
The attribute read calls descriptor code.
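The lookup order can be probed by planting conflicting entries in the instance dictionary. In this sketch, the classes NonData and Probe are invented for the demonstration.

```python
class NonData:
    """A descriptor with only __get__, so it is a non-data descriptor."""

    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class Probe:
    plain = NonData()

    @property
    def managed(self):  # property is a data descriptor
        return "from property"

    def __getattr__(self, name):  # consulted only after normal lookup fails
        return f"fallback for {name}"


obj = Probe()
obj.__dict__["managed"] = "from instance dict"
obj.__dict__["plain"] = "from instance dict"

print(obj.managed)
print(obj.plain)
print(obj.missing)
```

The property wins over the instance dictionary, the instance dictionary wins over the non-data descriptor, and __getattr__ runs only when nothing else matched.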
6.16 Calls
For:
result = f(2, 3)
CPython evaluates the callable and arguments, then performs a call.
Conceptually:
load f
load 2
load 3
call with 2 positional arguments
store result
At runtime, the callable may be:
Python function
built-in C function
bound method
class object
object with __call__
partial object
method descriptor
A Python function call creates a new frame. A C built-in call invokes a C function wrapper. A class call allocates and initializes an instance.
The bytecode instruction is generic. Runtime dispatch decides the exact call path.
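A short sketch of that runtime dispatch: the same call expression, target(3), is applied to several kinds of callable. The names increment and Doubler are hypothetical.

```python
import functools


def increment(n):
    return n + 1


class Doubler:
    def __call__(self, n):
        return n * 2

    def triple(self, n):
        return n * 3


callables = {
    "python function": increment,
    "built-in function": abs,
    "class object": str,
    "object with __call__": Doubler(),
    "bound method": Doubler().triple,
    "partial object": functools.partial(pow, 2),
}

# One call site, many call paths.
results = {label: target(3) for label, target in callables.items()}
print(results)
```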
6.17 Control Flow
Control flow is compiled into jumps.
For:
if x:
    a()
else:
    b()
CPython emits bytecode shaped like:
    load x
    jump if false to else
    call a
    jump to end
else:
    call b
end:
For loops compile into iterator protocol operations plus jumps.
for x in items:
    use(x)
Conceptually:
    get iterator
loop_start:
    get next item
    if exhausted, jump to loop_end
    store x
    call use(x)
    jump to loop_start
loop_end:
The language feature is high-level. The execution model is bytecode jumps and protocol calls.
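The loop steps above can be written out by hand with iter() and next(). This is a sketch of the protocol, not of the exact bytecode. The function name manual_for is made up.

```python
def manual_for(items, use):
    """Behave like `for x in items: use(x)` using the iterator protocol."""
    iterator = iter(items)           # get iterator
    while True:                      # loop_start
        try:
            x = next(iterator)       # get next item
        except StopIteration:        # exhausted: jump to loop_end
            break
        use(x)                       # loop body


seen = []
manual_for([1, 2, 3], seen.append)
print(seen)
```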
6.18 Exception Handling
Exception handling compiles into protected bytecode ranges and handler metadata.
For:
try:
    risky()
except ValueError:
    recover()
CPython needs to know:
which bytecode range is protected
where the handler starts
which exception type to match
how to unwind the stack
where execution continues
When an exception occurs, the evaluator consults exception handling metadata and transfers control to the appropriate handler if one matches.
If no handler matches in the current frame, the exception propagates to the caller.
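Propagation to the caller is easy to observe. In this sketch the function names are invented: middle protects the call but only handles ValueError, so the KeyError passes through its frame and is caught one level up.

```python
def risky():
    raise KeyError("missing")


def middle():
    try:
        risky()
    except ValueError:      # does not match KeyError
        return "handled in middle"


def outer():
    try:
        return middle()
    except KeyError:        # matches, so control transfers here
        return "handled in outer"


print(outer())
```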
6.19 Imports
An import statement is executable code.
import math
At runtime, CPython uses the import machinery to:
check sys.modules
find a module spec
load or create the module
execute module code if needed
bind the name
The import system is partly implemented in Python through importlib and partly supported by C runtime code.
A module file is compiled and executed just like other Python code, but its execution namespace is the module dictionary.
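The caching step can be checked with importlib. A minimal sketch: a second import of the same module returns the object already stored in sys.modules instead of executing the module again.

```python
import importlib
import importlib.util
import sys

first = importlib.import_module("math")
second = importlib.import_module("math")   # served from the sys.modules cache

spec = importlib.util.find_spec("math")    # the spec the finders report

print(first is second, first is sys.modules["math"], spec.name)
```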
6.20 Comprehensions
A comprehension has its own execution scope.
For:
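squares = [x * x for x in range(10)]
CPython compiles the comprehension body so that its loop variable stays isolated. Before Python 3.12 the body became a separate nested function with its own code object. Since Python 3.12 (PEP 709), list, dict, and set comprehensions are generally inlined into the enclosing code while keeping that isolation.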
Conceptually:
call range(10)
get iterator
create result list
run comprehension code
append each computed value
store final list in squares
The loop variable x belongs to the comprehension’s internal scope, not the surrounding function scope.
This is why:
[x for x in range(3)]
print(x)
does not bind x in the surrounding scope in Python 3, so print(x) raises NameError unless x was already defined.
6.21 Closures
Closures require cells.
For:
def outer():
    x = 10
    def inner():
        return x
    return inner
inner uses a variable from outer.
CPython cannot store x as an ordinary fast local that disappears when outer returns. It stores x in a cell object.
Conceptually:
outer local x becomes cell variable
inner references x as free variable
inner function stores reference to the cell
cell keeps x alive after outer returns
This is why the returned function still works:
f = outer()
print(f())
The value survives through the closure cell.
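The cell is reachable from Python. This sketch inspects the code objects and the returned function's __closure__ tuple.

```python
def outer():
    x = 10

    def inner():
        return x

    return inner


f = outer()

print(outer.__code__.co_cellvars)       # names outer must keep in cells
print(f.__code__.co_freevars)           # names inner reads from enclosing scopes
print(f.__closure__[0].cell_contents)   # the value held by the shared cell
```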
6.22 Generators
A generator function compiles differently from an ordinary function.
def count():
    yield 1
    yield 2
Calling count() creates a generator object. It does not immediately run the body.
The generator object stores suspended execution state:
code object
frame or frame-like execution state
instruction offset
local variables
value stack state
running status
Each next() resumes execution until the next yield.
g = count()
next(g)
next(g)
A yield is not just a return. It suspends the frame and preserves execution state.
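The suspended state can be watched with inspect.getgeneratorstate. A minimal sketch that records the state before the first next(), between yields, and after exhaustion:

```python
import inspect


def count():
    yield 1
    yield 2


g = count()
states = [inspect.getgeneratorstate(g)]   # body has not started

values = [next(g)]
states.append(inspect.getgeneratorstate(g))   # paused at the first yield

values.append(next(g))
states.append(inspect.getgeneratorstate(g))   # paused at the second yield

try:
    next(g)
except StopIteration:
    pass
states.append(inspect.getgeneratorstate(g))   # body finished

print(values, states)
```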
6.23 Coroutines
A coroutine is similar to a generator, but it participates in the await protocol.
async def fetch():
    value = await operation()
    return value
Calling fetch() creates a coroutine object. The body runs only when the coroutine is awaited or scheduled.
The coroutine stores suspended execution state and resumes around await points.
Conceptually:
create coroutine object
start execution
reach await
suspend coroutine
resume later with result
continue execution
return final value
The event loop is outside the core bytecode model, but coroutine suspension and resumption are runtime features implemented by CPython objects and frames.
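A coroutine can be driven by hand, without an event loop, to make the suspend and resume steps visible. In this sketch the Suspend awaitable is invented and stands in for operation(). A real event loop does the send() calls for you.

```python
class Suspend:
    """A minimal awaitable that pauses once and returns what it is sent."""

    def __await__(self):
        received = yield "waiting"   # suspend; hand a signal to the driver
        return received              # becomes the value of the await expression


async def fetch():
    value = await Suspend()
    return value


coro = fetch()                 # create coroutine object; body has not run
signal = coro.send(None)       # start execution, run until the await suspends

try:
    coro.send(42)              # resume with a result
except StopIteration as stop:
    result = stop.value        # the coroutine's return value

print(signal, result)
```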
6.24 Class Definition
A class statement is executable code.
class C:
    x = 1
    def f(self):
        return self.x
CPython does not simply allocate a static type. It executes the class body in a temporary namespace.
Conceptually:
load class name
prepare class namespace
execute class body code object
collect attributes and methods
call metaclass
bind resulting class object to name C
This explains why class bodies can contain arbitrary code:
class C:
    print("building class")
    x = 1 + 2
The class body executes immediately when the class statement runs.
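The last two steps, calling the metaclass and binding the result, can be approximated by hand. This sketch fills a namespace dictionary the way a class body would and then calls the default metaclass, type, directly. It skips details such as __prepare__ and __qualname__.

```python
def f(self):
    return self.x


# Stand-in for the namespace produced by executing the class body.
namespace = {"x": 1, "f": f}

# Call the metaclass with name, bases, and namespace, then bind the result.
C = type("C", (), namespace)

print(C.x, C().f(), type(C) is type)
```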
6.25 End-to-End Example
Consider:
x = 10
def add(y):
    return x + y
print(add(5))
The pipeline is:
tokenize source
parse tokens
build AST
analyze symbols
compile module code object
execute module frame
bind x = 10
create function object add
bind add
load print
load add
load 5
call add
create function frame
bind y = 5
load global x
load local y
add objects
return 15
call print
finish module execution
The important point is that CPython has already done substantial work before the first line appears to run.
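The stages can be run one at a time from Python. This is a sketch using the public ast.parse, compile, and exec entry points. Tokenization and symbol analysis happen inside those calls, and the dictionary stands in for the module namespace.

```python
import ast
import contextlib
import io

SOURCE = "x = 10\ndef add(y):\n    return x + y\nprint(add(5))\n"

tree = ast.parse(SOURCE, filename="<example>")   # tokenize, parse, build AST
code = compile(tree, "<example>", "exec")        # analyze symbols, emit bytecode

namespace = {}
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(code, namespace)                        # execute the module code object

output = buffer.getvalue()
print(repr(output), namespace["x"])
```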
6.26 Where Each Stage Lives
A useful source map:
Tokenizer          Parser/
Parser             Parser/ and Grammar/
AST support        Python/ast.c
Symbol table       Python/symtable.c
Compiler           Python/compile.c
Code objects       Objects/codeobject.c
Function objects   Objects/funcobject.c
Frames             Python/ and Objects/frameobject.c
Evaluation loop    Python/ceval.c and generated/interpreter files
Objects            Objects/
Imports            Lib/importlib/ and Python/import.c
Exact filenames shift over time, but this map is stable enough for reading the repository.
6.27 Mental Model
Keep this compact model:
Source code becomes tokens.
Tokens become syntax.
Syntax becomes AST.
AST plus scope analysis becomes bytecode.
Bytecode lives in code objects.
Code objects execute inside frames.
Frames use local slots and a value stack.
Bytecode instructions operate on PyObject references.
Types decide concrete behavior.
The full system is large, but this sequence is the backbone.
6.28 Chapter Summary
CPython execution is a pipeline. It starts with source text and ends with object operations inside the bytecode evaluator. The tokenizer handles characters. The parser handles syntax. The AST represents structure. The symbol table classifies names. The compiler emits code objects. Frames execute code objects. Bytecode manipulates Python objects through type-defined behavior.
This pipeline explains how high-level Python constructs become concrete runtime actions.