6. From Source Code to Execution

End-to-end journey of a .py file through lexing, parsing, compilation, and bytecode evaluation.

CPython does not execute Python source text directly. It transforms source text through several internal representations before the first bytecode instruction runs.

The path is:

source text
tokens
parse tree (older versions only; the PEG parser builds the AST directly)
abstract syntax tree
symbol table
code object
frame
bytecode evaluation
object operations

Each stage has a separate job. The tokenizer understands characters. The parser understands syntax. The AST represents program structure. The symbol table classifies names. The compiler emits bytecode. The evaluator runs bytecode against Python objects.

6.1 Source Text

The input begins as text.

x = 1 + 2
print(x)

Before CPython can execute this, it must know:

where statements begin and end
which characters form names
which characters form numbers
which indentation levels define blocks
which tokens form expressions
which names are local or global
which bytecode instructions are needed

Python source code is not just a string. It has encoding, line structure, indentation, comments, string literal rules, and syntax rules.

The first stage converts raw text into tokens.

6.2 Tokenization

The tokenizer reads source characters and produces tokens.

For this code:

x = 1 + 2

the tokenizer produces a stream similar to:

NAME("x")
EQUAL("=")
NUMBER("1")
PLUS("+")
NUMBER("2")
NEWLINE
ENDMARKER

For block structure, indentation becomes tokens too.

if ok:
    run()
else:
    stop()

Conceptually:

NAME("if")
NAME("ok")
COLON
NEWLINE
INDENT
NAME("run")
LPAR
RPAR
NEWLINE
DEDENT
NAME("else")
COLON
NEWLINE
INDENT
NAME("stop")
LPAR
RPAR
NEWLINE
DEDENT
ENDMARKER

This matters for everything downstream. Python block structure is not inferred later from whitespace: the tokenizer turns each change in indentation into an explicit INDENT or DEDENT token, and the parser sees only those tokens.
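The token stream can be observed with the standard-library tokenize module. This is a minimal sketch: tokenize mirrors the internal tokenizer closely but is not identical to it. In particular, it reports punctuation with the generic type OP rather than COLON or LPAR.

```python
import io
import tokenize

source = "if ok:\n    run()\nelse:\n    stop()\n"

# generate_tokens expects a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# tok_name maps numeric token types to their symbolic names.
type_names = [tokenize.tok_name[tok.type] for tok in tokens]

for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))
```

The printed stream contains one INDENT and one DEDENT for each indented block, followed by a final ENDMARKER.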

6.3 Parsing

The parser consumes tokens and checks whether they form valid Python syntax.

For:

x = 1 + 2

the parser recognizes an assignment statement whose right side is a binary expression.

For:

def add(a, b):
    return a + b

the parser recognizes:

function definition
function name
parameter list
function body
return statement
binary expression

The parser rejects invalid token sequences:

x = + * 3

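Every piece of this line is a valid token, so it passes the tokenizer and reaches the parser. The parser finds no grammar rule that matches the sequence and raises SyntaxError.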

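The parser’s output is a structured representation of the program. Older CPython versions built a concrete parse tree and then converted it into an AST. Since the PEG parser became the default in Python 3.9, the parser builds the AST directly.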
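The accept-or-reject behavior is easy to check from Python. In this small sketch, is_valid_syntax is a hypothetical helper name; it relies only on ast.parse raising SyntaxError for input the grammar rejects.

```python
import ast


def is_valid_syntax(source: str) -> bool:
    """Return True if the source parses, False if the parser rejects it."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(is_valid_syntax("x = 1 + 2"))
print(is_valid_syntax("x = + * 3"))
```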

6.4 Abstract Syntax Tree

The AST represents the abstract syntactic structure of the program: which constructs it contains and how they nest.

For:

x = 1 + 2

the AST is conceptually:

Module
    Assign
        target: Name("x", Store)
        value:
            BinOp
                left: Constant(1)
                op: Add
                right: Constant(2)

The AST removes many surface details and keeps the structure needed by later compiler stages.

You can inspect the AST from Python:

import ast

tree = ast.parse("x = 1 + 2")
print(ast.dump(tree, indent=4))

Example output shape:

Module(
    body=[
        Assign(
            targets=[
                Name(id='x', ctx=Store())],
            value=BinOp(
                left=Constant(value=1),
                op=Add(),
                right=Constant(value=2)))],
    type_ignores=[])

The AST says what the program means structurally. It does not yet say which bytecode instructions to emit.

6.5 Name Contexts

AST names carry context.

In this code:

x = x + 1

the two uses of x have different roles.

left side x     Store
right side x    Load

Conceptually:

Assign
    target: Name("x", Store)
    value:
        BinOp
            left: Name("x", Load)
            op: Add
            right: Constant(1)

This distinction matters because loading a name and storing a name compile to different operations.

Load context     read a value
Store context    assign a value
Del context      delete a binding

The compiler relies on this information when emitting bytecode.
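The contexts can be read directly off the AST. This short sketch walks the tree for the statement above and records the context class attached to each Name node.

```python
import ast

tree = ast.parse("x = x + 1")

# Collect (identifier, context class name) for every Name node in the tree.
contexts = [
    (node.id, type(node.ctx).__name__)
    for node in ast.walk(tree)
    if isinstance(node, ast.Name)
]

print(contexts)
```

The same identifier appears twice, once with Store and once with Load.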

6.6 Symbol Table Analysis

Before bytecode generation, CPython analyzes names.

It decides whether each name is:

local
global
nonlocal
free
cell
implicit builtin lookup

Example:

x = 10

def f(y):
    return x + y

Inside f:

y is local
x is global or builtin lookup

Another example:

def outer():
    x = 10

    def inner():
        return x

    return inner

Here:

x is local in outer
x is free in inner
x becomes a cell variable in outer

A cell variable is a local variable that must survive because an inner function captures it. A free variable is a variable used by a function but stored in an enclosing scope.

This stage is essential for closures.
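The standard-library symtable module exposes the result of this analysis. The sketch below is an illustration under the assumption that the module's view matches the compiler's internal classification closely enough for teaching purposes; the filename "<example>" is arbitrary.

```python
import symtable

source = """
def outer():
    x = 10

    def inner():
        return x

    return inner
"""

# Build the symbol tables for the module, then descend into the nested scopes.
top = symtable.symtable(source, "<example>", "exec")
outer_table = top.get_children()[0]
inner_table = outer_table.get_children()[0]

x_in_outer = outer_table.lookup("x")
x_in_inner = inner_table.lookup("x")

print(outer_table.get_name(), "x is local:", x_in_outer.is_local())
print(inner_table.get_name(), "x is free:", x_in_inner.is_free())
```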

6.7 Code Objects

After parsing and symbol analysis, CPython compiles the AST into code objects.

A code object contains:

bytecode
constants
names
local variable names
free variable names
cell variable names
stack size
flags
filename
function name
line mapping information
exception table

You can inspect a code object:

def add(a, b):
    return a + b

code = add.__code__

print(code.co_name)
print(code.co_varnames)
print(code.co_consts)
print(code.co_names)
print(code.co_freevars)
print(code.co_cellvars)

The code object is immutable. It describes executable code, but it does not contain the current runtime values of local variables.

6.8 Bytecode

Bytecode is CPython’s instruction format.

For:

def add(a, b):
    return a + b

disassembly may look conceptually like:

LOAD_FAST a
LOAD_FAST b
BINARY_OP +
RETURN_VALUE

Actual bytecode names and layout vary by Python version. The core idea remains: bytecode instructions operate on a frame.

Use dis:

import dis

def add(a, b):
    return a + b

dis.dis(add)

Bytecode is lower-level than the AST. It is close to execution.

The AST says:

return a + b

The bytecode says:

load a
load b
perform addition
return result

6.9 Constants and Names

A code object stores constants and names separately from bytecode.

For:

x = 10
print(x)

the constant 10 is stored in the constants table. The names x and print are stored in the names table.

Conceptually:

co_consts = (10, None)
co_names  = ("x", "print")

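The bytecode then references these tables by index. The listing below is simplified; recent versions also emit bookkeeping instructions such as RESUME and PUSH_NULL.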

LOAD_CONST 0       load constant 10
STORE_NAME 0       store into name x
LOAD_NAME 1        load name print
LOAD_NAME 0        load name x
CALL 1
POP_TOP
LOAD_CONST 1       load None
RETURN_VALUE

This makes bytecode compact. Instructions store small indexes instead of full strings or objects.
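The tables can be inspected by compiling the snippet as a module. This is a sketch only: the exact contents of co_consts differ between versions (some versions encode small integers directly in the instruction), so the example prints the tables without promising their contents.

```python
source = "x = 10\nprint(x)\n"

# Compile as a module ("exec" mode) without running it.
module_code = compile(source, "<example>", "exec")

print("constants:", module_code.co_consts)
print("names:    ", module_code.co_names)
```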

6.10 Module Execution

A Python file is compiled into a module-level code object.

For:

# demo.py
x = 1

def f():
    return x

CPython compiles the whole file into one code object. Executing that code object creates module bindings.

Conceptually:

create module object
create module dictionary
execute module code object in that dictionary
bind x
create function object f
bind f

The function body is also compiled into its own code object. The module code object contains that function code object as a constant.

This explains why defining a function executes code at module import time: the body does not run, but the function object is created and bound.
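The steps above can be reproduced by hand. This sketch compiles the demo source, finds the nested code object among the module's constants, and runs the module code in the dictionary of a freshly created module object. It skips everything the real import system adds, such as sys.modules registration and module attributes like __file__.

```python
import types

source = "x = 1\n\ndef f():\n    return x\n"

module_code = compile(source, "demo.py", "exec")

# The function body is a separate code object stored as a module constant.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]

# Create an empty module and execute the module code in its dictionary.
module = types.ModuleType("demo")
exec(module_code, module.__dict__)

print(nested[0].co_name, module.x, module.f())
```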

6.11 Function Definition

A function definition is executable code.

For:

def add(a, b):
    return a + b

CPython does not run the body immediately. It creates a function object.

Conceptually:

load code object for add
load function name
create function object
store function object in current namespace

The function object contains:

code object
globals dictionary
default values
closure cells
annotations
metadata

Later, when the function is called, CPython creates a frame from that function object and executes the function’s code object.
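Because a function object is a wrapper around a code object plus context, a second function can be built from the same parts. The sketch below uses types.FunctionType for that; add_clone is a name invented for illustration, and the default value on b is added only to show where defaults live.

```python
import types


def add(a, b=10):
    return a + b


# Reuse the code object, globals, and defaults to build a new function object.
clone = types.FunctionType(
    add.__code__, add.__globals__, "add_clone", add.__defaults__
)

print(clone.__name__, clone(1), clone.__code__ is add.__code__)
```

Both functions share one code object; only the surrounding function object differs.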

6.12 Frame Creation

A frame is created when CPython executes a code object.

For a function call:

add(2, 3)

CPython creates a frame with:

code object for add
globals from add.__globals__
builtins
local slots
argument values
value stack
instruction pointer
exception state

The arguments are placed into local variable slots.

a = 2
b = 3

Then the bytecode evaluator starts executing the frame.
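A running function can look at its own frame. This sketch uses sys._getframe, which is a CPython-specific accessor, and copies out a few of the fields listed above; the dictionary keys are illustrative names, not part of any API.

```python
import sys


def add(a, b):
    frame = sys._getframe()  # CPython-specific: the currently executing frame
    snapshot = {
        "code_name": frame.f_code.co_name,
        "locals": {name: frame.f_locals[name] for name in ("a", "b")},
        "globals": frame.f_globals,
        "has_caller": frame.f_back is not None,
    }
    return a + b, snapshot


result, info = add(2, 3)

print(result, info["code_name"], info["locals"])
print(info["globals"] is add.__globals__)
```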

6.13 Evaluation Stack

Most bytecode instructions communicate through the frame’s value stack.

For:

return a + b

the execution is:

LOAD_FAST a      push value of a
LOAD_FAST b      push value of b
BINARY_OP +      pop two values, add, push result
RETURN_VALUE     pop result and return it

The local variables are stored separately from temporary stack values.

locals:
    a = 2
    b = 3

stack:
    temporary values used by bytecode

This is why CPython is called a stack-based virtual machine.
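The separation between local slots and the value stack can be modeled in a few lines. This is a toy interpreter, not CPython's evaluator: the instruction names and the run function are simplified stand-ins chosen for illustration.

```python
def run(instructions, local_vars):
    """Interpret a tiny instruction list using an explicit value stack."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_FAST":
            stack.append(local_vars[arg])   # push a local's value
        elif opname == "BINARY_ADD":
            right = stack.pop()             # operands come off in reverse
            left = stack.pop()
            stack.append(left + right)      # push the result
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction: {opname}")
    raise RuntimeError("instruction list ended without RETURN_VALUE")


program = [
    ("LOAD_FAST", "a"),
    ("LOAD_FAST", "b"),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

result = run(program, {"a": 2, "b": 3})
print(result)
```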

6.14 Object Operations

Bytecode instructions operate on Python objects, not raw C primitives.

When CPython executes:

a + b

it does not assume that a and b are machine integers.

They may be:

integers
floats
strings
lists
tuples
NumPy arrays
user-defined objects

The operation dispatches through the object protocol.

For int + int, CPython uses integer addition. For str + str, it uses string concatenation. For user-defined classes, it calls __add__ on the left operand or, if that does not handle the operation, __radd__ on the right operand.

Conceptually:

BINARY_OP +
    inspect operand types
    find numeric or sequence slot
    call appropriate slot
    return Python object

This is why bytecode remains generic while types provide concrete behavior.
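The same generic operation can be applied to very different operands. In this sketch operator.add stands in for the + instruction, and Money is a hypothetical class defined only to show a user-supplied __add__.

```python
import operator


class Money:
    """Hypothetical value type that defines its own addition."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented  # let Python try the other operand
        return Money(self.cents + other.cents)


results = [
    operator.add(1, 2),                          # integer addition
    operator.add("a", "b"),                      # string concatenation
    operator.add([1], [2]),                      # list concatenation
    operator.add(Money(150), Money(250)).cents,  # user-defined __add__
]

print(results)
```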

6.15 Attribute Access

For:

obj.name

CPython compiles an attribute load.

Conceptually:

LOAD_FAST obj
LOAD_ATTR name

At runtime, LOAD_ATTR applies Python's attribute lookup rules. For ordinary objects the default order is:

check data descriptors on the type
check instance dictionary
check non-data descriptors and other class attributes
possibly call __getattr__
raise AttributeError if missing

Attribute access is not a raw field lookup in the general case. It is a protocol operation.

This explains why attribute access can run Python code.

class C:
    @property
    def name(self):
        print("computed")
        return 42

obj = C()
obj.name

The attribute read calls descriptor code.
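The lookup order can be demonstrated by planting conflicting entries in the instance dictionary. This sketch assumes the default lookup of ordinary objects; the class and attribute names are illustrative.

```python
class C:
    @property
    def name(self):                # property is a data descriptor
        return "from property"

    def method(self):              # plain functions are non-data descriptors
        return "from class"

    def __getattr__(self, attr):   # called only when normal lookup fails
        return f"fallback for {attr}"


obj = C()

# Write straight into the instance dictionary, bypassing attribute assignment.
obj.__dict__["name"] = "from instance dict"
obj.__dict__["method"] = "from instance dict"

print(obj.name)     # data descriptor wins over the instance dictionary
print(obj.method)   # instance dictionary wins over a non-data descriptor
print(obj.missing)  # nothing found, so __getattr__ answers
```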

6.16 Calls

For:

result = f(2, 3)

CPython evaluates the callable and arguments, then performs a call.

Conceptually:

load f
load 2
load 3
call with 2 positional arguments
store result

At runtime, the callable may be:

Python function
built-in C function
bound method
class object
object with __call__
partial object
method descriptor

A Python function call creates a new frame. A C built-in call invokes a C function wrapper. A class call allocates and initializes an instance.

The bytecode instruction is generic. Runtime dispatch decides the exact call path.
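One call expression can reach many different call paths. In the sketch below, the same call f(2, 3) is applied to several kinds of callable; Adder and the list name candidates are invented for illustration.

```python
import functools


def add(a, b):
    return a + b


class Adder:
    def add(self, a, b):
        return a + b

    def __call__(self, a, b):
        return a + b


candidates = [
    add,                        # Python function
    pow,                        # built-in implemented in C
    Adder().add,                # bound method
    complex,                    # class object: the call constructs an instance
    Adder(),                    # instance with __call__
    functools.partial(max, 0),  # partial object wrapping a built-in
]

# The call syntax is identical; dispatch happens at runtime.
results = [f(2, 3) for f in candidates]
print(results)
```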

6.17 Control Flow

Control flow is compiled into jumps.

For:

if x:
    a()
else:
    b()

CPython emits bytecode shaped like:

load x
jump if false to else
call a
jump to end
else:
call b
end:

For loops compile into iterator protocol operations plus jumps.

for x in items:
    use(x)

Conceptually:

get iterator
loop_start:
    get next item
    if exhausted, jump to loop_end
    store x
    call use(x)
    jump to loop_start
loop_end:

The language feature is high-level. The execution model is bytecode jumps and protocol calls.
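The loop outline above can be written out with the iterator protocol. This is a source-level approximation: the real loop instruction checks for exhaustion internally rather than through a Python-level try block.

```python
items = [1, 2, 3]
seen = []

iterator = iter(items)           # get iterator
while True:                      # loop_start
    try:
        x = next(iterator)       # get next item and store x
    except StopIteration:        # exhausted: jump to loop_end
        break
    seen.append(x)               # loop body

print(seen)
```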

6.18 Exception Handling

Exception handling compiles into protected bytecode ranges and handler metadata.

For:

try:
    risky()
except ValueError:
    recover()

CPython needs to know:

which bytecode range is protected
where the handler starts
which exception type to match
how to unwind the stack
where execution continues

When an exception occurs, the evaluator consults exception handling metadata and transfers control to the appropriate handler if one matches.

If no handler matches in the current frame, the exception propagates to the caller.
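Propagation to the caller can be seen with two nested handlers. In this sketch the function names are illustrative; the inner handler does not match the raised exception type, so the caller's handler runs instead.

```python
def risky():
    raise KeyError("missing")


def inner():
    try:
        risky()
    except ValueError:           # does not match KeyError
        return "recovered in inner"


def outer():
    try:
        return inner()
    except KeyError:             # matches after inner's frame is unwound
        return "recovered in outer"


outcome = outer()
print(outcome)
```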

6.19 Imports

An import statement is executable code.

import math

At runtime, CPython uses the import machinery to:

check sys.modules
find a module spec
load or create the module
execute module code if needed
bind the name

The import system is partly implemented in Python through importlib and partly supported by C runtime code.

A module file is compiled and executed just like other Python code, but its execution namespace is the module dictionary.
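Parts of the import steps are visible through importlib. This sketch uses the standard-library json module as an arbitrary example and shows that a second import returns the cached module instead of executing the file again.

```python
import importlib
import importlib.util
import sys

name = "json"

spec = importlib.util.find_spec(name)    # locate the module's spec
module = importlib.import_module(name)   # load it, or reuse the cache
cached = sys.modules[name]               # the cache entry created by import
again = importlib.import_module(name)    # a second import hits the cache

print(spec.name, module is cached, again is module)
```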

6.20 Comprehensions

A comprehension has its own execution scope.

For:

squares = [x * x for x in range(10)]

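CPython creates code for the comprehension body. Before Python 3.12 this was a separate nested function. Since 3.12 (PEP 709), list, dict, and set comprehensions are inlined into the enclosing code, while the loop variable stays isolated.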

Conceptually:

call range(10)
get iterator
create result list
run comprehension code
append each computed value
store final list in squares

The loop variable x belongs to the comprehension’s internal scope, not the surrounding scope.

This is why the following raises NameError in Python 3, assuming x is not bound elsewhere:

[x for x in range(3)]
print(x)

The comprehension does not bind x in the surrounding scope.
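The isolation can be checked inside a function. In this sketch, loop_variable_leaks and loop_var_n are names made up for the example, and it assumes no global called loop_var_n exists.

```python
def loop_variable_leaks():
    """Return the computed list and whether the loop variable escaped."""
    squares = [loop_var_n * loop_var_n for loop_var_n in range(3)]
    try:
        loop_var_n               # not bound in this function's scope
    except NameError:
        return squares, False
    return squares, True


print(loop_variable_leaks())
```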

6.21 Closures

Closures require cells.

For:

def outer():
    x = 10

    def inner():
        return x

    return inner

inner uses a variable from outer.

CPython cannot store x as an ordinary fast local that disappears when outer returns. It stores x in a cell object.

Conceptually:

outer local x becomes cell variable
inner references x as free variable
inner function stores reference to the cell
cell keeps x alive after outer returns

This is why the returned function still works:

f = outer()
print(f())

The value survives through the closure cell.
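The cell is reachable from the function objects involved. This sketch inspects the code objects and the closure tuple of the example above.

```python
def outer():
    x = 10

    def inner():
        return x

    return inner


f = outer()

print(outer.__code__.co_cellvars)        # names outer stores in cells
print(f.__code__.co_freevars)            # names inner reads from cells
print(f.__closure__[0].cell_contents)    # the value held by the cell
```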

6.22 Generators

A generator function compiles differently from an ordinary function.

def count():
    yield 1
    yield 2

Calling count() creates a generator object. It does not immediately run the body.

The generator object stores suspended execution state:

code object
frame or frame-like execution state
instruction offset
local variables
value stack state
running status

Each next() resumes execution until the next yield.

g = count()
next(g)
next(g)

A yield is not just a return. It suspends the frame and preserves execution state.
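The suspended state can be observed with inspect.getgeneratorstate. This sketch records the state after creation, after the first resume, and after the generator is exhausted.

```python
import inspect


def count():
    yield 1
    yield 2


g = count()
states = [inspect.getgeneratorstate(g)]    # created, body not started

first = next(g)
states.append(inspect.getgeneratorstate(g))  # paused at the first yield

second = next(g)
try:
    next(g)                                # body finishes, StopIteration raised
except StopIteration:
    pass
states.append(inspect.getgeneratorstate(g))  # finished

print(first, second, states)
```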

6.23 Coroutines

A coroutine is similar to a generator, but it participates in the await protocol.

async def fetch():
    value = await operation()
    return value

Calling fetch() creates a coroutine object. The body runs only when the coroutine is awaited or scheduled.

The coroutine stores suspended execution state and resumes around await points.

Conceptually:

create coroutine object
start execution
reach await
suspend coroutine
resume later with result
continue execution
return final value

The event loop is outside the core bytecode model, but coroutine suspension and resumption are runtime features implemented by CPython objects and frames.
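A coroutine can be driven by hand, without an event loop, which makes the suspend and resume steps visible. In this sketch Operation is a hypothetical awaitable that suspends once; send plays the role an event loop would normally play.

```python
class Operation:
    """Hypothetical awaitable: suspends once, then returns what was sent."""

    def __await__(self):
        result = yield "waiting"   # suspension point seen by the driver
        return result


async def fetch():
    value = await Operation()
    return value


coro = fetch()                     # create coroutine object; body not started
signal = coro.send(None)           # start execution, run until the await suspends

final = None
try:
    coro.send(42)                  # resume with a result
except StopIteration as stop:
    final = stop.value             # the coroutine's return value

print(signal, final)
```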

6.24 Class Definition

A class statement is executable code.

class C:
    x = 1

    def f(self):
        return self.x

CPython does not simply allocate a static type. It executes the class body in a temporary namespace.

Conceptually:

load class name
prepare class namespace
execute class body code object
collect attributes and methods
call metaclass
bind resulting class object to name C

This explains why class bodies can contain arbitrary code:

class C:
    print("building class")
    x = 1 + 2

The class body executes immediately when the class statement runs.
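The class-building steps can be approximated by hand. This sketch executes the class body as ordinary code in a fresh namespace and then calls the default metaclass, type, with the collected names. It is not a full equivalent of the class statement, which also sets attributes such as __module__ and __qualname__.

```python
body_source = "x = 1\n\ndef f(self):\n    return self.x\n"

# Prepare the namespace and execute the class body into it.
namespace = {}
exec(body_source, {}, namespace)

# Call the metaclass with the name, bases, and collected namespace.
C = type("C", (), namespace)

print(C.__name__, C.x, C().f())
```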

6.25 End-to-End Example

Consider:

x = 10

def add(y):
    return x + y

print(add(5))

The pipeline is:

tokenize source
parse tokens
build AST
analyze symbols
compile module code object
execute module frame
    bind x = 10
    create function object add
    bind add
    load print
    load add
    load 5
    call add
        create function frame
        bind y = 5
        load global x
        load local y
        add objects
        return 15
    call print
finish module execution

The important point is that CPython has already done substantial work before the first line appears to run.
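Most of these stages can be driven one at a time from Python. The sketch below uses a variant of the example that stores the result instead of printing it. It parses to an AST, compiles the AST to a module code object, executes it in a dictionary, and lists the instruction names of add. Exact instruction names vary by version.

```python
import ast
import dis

source = "x = 10\n\ndef add(y):\n    return x + y\n\nresult = add(5)\n"

tree = ast.parse(source, filename="<example>")    # tokenize, parse, build AST
module_code = compile(tree, "<example>", "exec")  # analyze symbols, emit bytecode

namespace = {}
exec(module_code, namespace)                      # execute the module frame

# Instruction names for the function body, as reported by this interpreter.
opnames = [ins.opname for ins in dis.get_instructions(namespace["add"])]

print(namespace["result"])
print(opnames)
```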

6.26 Where Each Stage Lives

A useful source map:

Tokenizer          Parser/
Parser             Parser/ and Grammar/
AST support        Python/ast.c
Symbol table       Python/symtable.c
Compiler           Python/compile.c
Code objects       Objects/codeobject.c
Function objects   Objects/funcobject.c
Frames             Python/frame.c and Objects/frameobject.c
Evaluation loop    Python/ceval.c, Python/bytecodes.c, and generated files such as Python/generated_cases.c.h
Objects            Objects/
Imports            Lib/importlib/ and Python/import.c

Exact filenames shift over time, but this map is stable enough for reading the repository.

6.27 Mental Model

Keep this compact model:

Source code becomes tokens.
Tokens become syntax.
Syntax becomes AST.
AST plus scope analysis becomes bytecode.
Bytecode lives in code objects.
Code objects execute inside frames.
Frames use local slots and a value stack.
Bytecode instructions operate on PyObject references.
Types decide concrete behavior.

The full system is large, but this sequence is the backbone.

6.28 Chapter Summary

CPython execution is a pipeline. It starts with source text and ends with object operations inside the bytecode evaluator. The tokenizer handles characters. The parser handles syntax. The AST represents structure. The symbol table classifies names. The compiler emits code objects. Frames execute code objects. Bytecode manipulates Python objects through type-defined behavior.

This pipeline explains how high-level Python constructs become concrete runtime actions.