# 6. From Source Code to Execution

CPython does not execute Python source text directly. It transforms source text through several internal representations before the first bytecode instruction runs.

The path is:

```text
source text
    ↓
tokens
    ↓
parse tree
    ↓
abstract syntax tree
    ↓
symbol table
    ↓
code object
    ↓
frame
    ↓
bytecode evaluation
    ↓
object operations
```

Each stage has a separate job. The tokenizer understands characters. The parser understands syntax. The AST represents program structure. The symbol table classifies names. The compiler emits bytecode. The evaluator runs bytecode against Python objects.

## 6.1 Source Text

The input begins as text.

```python
x = 1 + 2
print(x)
```

Before CPython can execute this, it must know:

```text
where statements begin and end
which characters form names
which characters form numbers
which indentation levels define blocks
which tokens form expressions
which names are local or global
which bytecode instructions are needed
```

Python source code is not just a string. It has encoding, line structure, indentation, comments, string literal rules, and syntax rules.

The first stage converts raw text into tokens.

## 6.2 Tokenization

The tokenizer reads source characters and produces tokens.

For this code:

```python
x = 1 + 2
```

the tokenizer produces a stream similar to:

```text
NAME("x")
EQUAL("=")
NUMBER("1")
PLUS("+")
NUMBER("2")
NEWLINE
ENDMARKER
```

For block structure, indentation becomes tokens too.

```python
if ok:
    run()
else:
    stop()
```

Conceptually:

```text
NAME("if")
NAME("ok")
COLON
NEWLINE
INDENT
NAME("run")
LPAR
RPAR
NEWLINE
DEDENT
NAME("else")
COLON
NEWLINE
INDENT
NAME("stop")
LPAR
RPAR
NEWLINE
DEDENT
ENDMARKER
```

This is important. Python block structure is not inferred later from whitespace. The tokenizer emits explicit `INDENT` and `DEDENT` tokens.
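The token stream can be observed from Python with the standard `tokenize` module. This is a minimal sketch: `tokenize` gives a library-level view of tokenization rather than the exact internal C tokenizer, and the variable names are illustrative.

```python
import io
import tokenize

source = "if ok:\n    run()\nelse:\n    stop()\n"

# generate_tokens() takes a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# exact_type distinguishes operators such as COLON, LPAR, and RPAR.
token_names = [tokenize.tok_name[tok.exact_type] for tok in tokens]

for tok, name in zip(tokens, token_names):
    print(f"{name:<10} {tok.string!r}")
```

The printed stream contains one `INDENT`/`DEDENT` pair per indented block, matching the conceptual listing above.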

## 6.3 Parsing

The parser consumes tokens and checks whether they form valid Python syntax.

For:

```python
x = 1 + 2
```

the parser recognizes an assignment statement whose right side is a binary expression.

For:

```python
def add(a, b):
    return a + b
```

the parser recognizes:

```text
function definition
function name
parameter list
function body
return statement
binary expression
```

The parser rejects invalid token sequences:

```python
x = + * 3
```

This tokenizes without error, so it reaches the parser. The parser then fails to match the token sequence against any grammar rule and raises `SyntaxError`.
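The split between the two stages can be checked directly. The sketch below assumes the standard `tokenize` and `ast` modules are a fair stand-in for the internal tokenizer and parser; the variable names are illustrative.

```python
import ast
import io
import tokenize

bad_source = "x = + * 3\n"

# Tokenization succeeds: each piece is a legal token on its own.
bad_tokens = list(tokenize.generate_tokens(io.StringIO(bad_source).readline))

# Parsing fails: no grammar rule accepts "+ *" as an expression.
try:
    ast.parse(bad_source)
except SyntaxError as exc:
    parse_error = exc
else:
    parse_error = None

print(len(bad_tokens), type(parse_error).__name__)
```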

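The parser's output is a structured representation of the program. In CPython 3.9 and later, the PEG parser builds the AST directly as it matches grammar rules. Older versions built a concrete parse tree first and converted it into an AST in a separate pass.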

## 6.4 Abstract Syntax Tree

The AST represents the syntactic structure of the program in a form the compiler can work with.

For:

```python
x = 1 + 2
```

the AST is conceptually:

```text
Module
    Assign
        target: Name("x", Store)
        value:
            BinOp
                left: Constant(1)
                op: Add
                right: Constant(2)
```

The AST removes many surface details and keeps the structure needed by later compiler stages.

You can inspect the AST from Python:

```python
import ast

tree = ast.parse("x = 1 + 2")
print(ast.dump(tree, indent=4))
```

Example output shape:

```text
Module(
    body=[
        Assign(
            targets=[
                Name(id='x', ctx=Store())],
            value=BinOp(
                left=Constant(value=1),
                op=Add(),
                right=Constant(value=2)))],
    type_ignores=[])
```

The AST says what the program means structurally. It does not yet say which bytecode instructions to emit.

## 6.5 Name Contexts

AST names carry context.

In this code:

```python
x = x + 1
```

the two uses of `x` have different roles.

```text
left side x     Store
right side x    Load
```

Conceptually:

```text
Assign
    target: Name("x", Store)
    value:
        BinOp
            left: Name("x", Load)
            op: Add
            right: Constant(1)
```

This distinction matters because loading a name and storing a name compile to different operations.

```text
Load context     read a value
Store context    assign a value
Del context      delete a binding
```

The compiler relies on this information when emitting bytecode.
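The contexts can be read off real AST nodes. A minimal sketch using the standard `ast` module; the `del x` line is added here only so that all three contexts appear.

```python
import ast

tree = ast.parse("x = x + 1\ndel x")

# Collect (name, context class) for every Name node in the tree.
contexts = sorted(
    (node.id, type(node.ctx).__name__)
    for node in ast.walk(tree)
    if isinstance(node, ast.Name)
)

print(contexts)
```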

## 6.6 Symbol Table Analysis

Before bytecode generation, CPython analyzes names.

It decides whether each name is:

```text
local
global
nonlocal
free
cell
implicit builtin lookup
```

Example:

```python
x = 10

def f(y):
    return x + y
```

Inside `f`:

```text
y is local
x is global or builtin lookup
```

Another example:

```python
def outer():
    x = 10

    def inner():
        return x

    return inner
```

Here:

```text
x is local in outer
x is free in inner
x becomes a cell variable in outer
```

A cell variable is a local variable that must survive because an inner function captures it. A free variable is a variable used by a function but stored in an enclosing scope.

This stage is essential for closures.
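The standard `symtable` module exposes the result of this analysis. The sketch below combines both examples into one source string; the closure variable is renamed to `z` so it does not collide with the module-level `x`, and the table variable names are illustrative.

```python
import symtable

source = (
    "x = 10\n"
    "def f(y):\n"
    "    return x + y\n"
    "def outer():\n"
    "    z = 10\n"
    "    def inner():\n"
    "        return z\n"
    "    return inner\n"
)

module_table = symtable.symtable(source, "<demo>", "exec")
children = {table.get_name(): table for table in module_table.get_children()}

f_table = children["f"]
outer_table = children["outer"]
inner_table = outer_table.get_children()[0]

print("y is a parameter of f:", f_table.lookup("y").is_parameter())
print("x is global in f:     ", f_table.lookup("x").is_global())
print("z is local in outer:  ", outer_table.lookup("z").is_local())
print("z is free in inner:   ", inner_table.lookup("z").is_free())
```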

## 6.7 Code Objects

After parsing and symbol analysis, CPython compiles code into code objects.

A code object contains:

```text
bytecode
constants
names
local variable names
free variable names
cell variable names
stack size
flags
filename
function name
line mapping information
exception table
```

You can inspect a code object:

```python
def add(a, b):
    return a + b

code = add.__code__

print(code.co_name)
print(code.co_varnames)
print(code.co_consts)
print(code.co_names)
print(code.co_freevars)
print(code.co_cellvars)
```

The code object is immutable. It describes executable code, but it does not contain the current runtime values of local variables.
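Immutability can be observed from Python. This is a small sketch, assuming Python 3.8 or later for `code.replace()`; the name `renamed` is illustrative.

```python
def add(a, b):
    return a + b

code = add.__code__

# Code object attributes are read-only, so direct assignment is rejected.
try:
    code.co_name = "renamed"
except (AttributeError, TypeError):
    assignment_rejected = True
else:
    assignment_rejected = False

# replace() returns a new code object and leaves the original untouched.
renamed = code.replace(co_name="renamed")

print(assignment_rejected, code.co_name, renamed.co_name)
```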

## 6.8 Bytecode

Bytecode is CPython’s instruction format.

For:

```python
def add(a, b):
    return a + b
```

disassembly may look conceptually like:

```text
LOAD_FAST a
LOAD_FAST b
BINARY_OP +
RETURN_VALUE
```

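Actual bytecode names and layout vary by Python version. For example, Python 3.10 and earlier emit `BINARY_ADD` where 3.11 and later emit the generic `BINARY_OP`. The core idea remains: bytecode instructions operate on a frame.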

Use `dis`:

```python
import dis

def add(a, b):
    return a + b

dis.dis(add)
```

Bytecode is lower-level than the AST. It is close to execution.

The AST says:

```text
return a + b
```

The bytecode says:

```text
load a
load b
perform addition
return result
```
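For programmatic inspection, `dis.get_instructions` yields the same information as structured objects. The exact opcode names depend on the interpreter version, so this sketch only prints what the running interpreter produces.

```python
import dis

def add(a, b):
    return a + b

# Each Instruction carries an opname, an argument, and an offset.
opnames = [instr.opname for instr in dis.get_instructions(add)]

print(opnames)
```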

## 6.9 Constants and Names

A code object stores constants and names separately from bytecode.

For:

```python
x = 10
print(x)
```

the constant `10` is stored in the constants table. The names `x` and `print` are stored in the names table.

Conceptually:

```text
co_consts = (10, None)
co_names  = ("x", "print")
```

The bytecode then references these tables by index.

```text
LOAD_CONST 0       load constant 10
STORE_NAME 0       store into name x
LOAD_NAME 1        load name print
LOAD_NAME 0        load name x
CALL 1
POP_TOP
LOAD_CONST 1       load None
RETURN_VALUE
```

This makes bytecode compact. Instructions store small indexes instead of full strings or objects.
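The two tables can be inspected by compiling the snippet as module code. A minimal sketch; the exact contents and ordering of `co_consts` may differ between Python versions.

```python
source = "x = 10\nprint(x)\n"

# "exec" mode compiles the text as a module body.
module_code = compile(source, "<demo>", "exec")

print("co_consts:", module_code.co_consts)
print("co_names: ", module_code.co_names)
```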

## 6.10 Module Execution

A Python file is compiled into a module-level code object.

For:

```python
# demo.py
x = 1

def f():
    return x
```

CPython compiles the whole file into one code object. Executing that code object creates module bindings.

Conceptually:

```text
create module object
create module dictionary
execute module code object in that dictionary
bind x
create function object f
bind f
```

The function body is also compiled into its own code object. The module code object contains that function code object as a constant.

This explains why defining a function executes code at module import time: the body does not run, but the function object is created and bound.
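The steps above can be approximated by hand. This sketch skips the real import machinery and simply executes the compiled code in a fresh module's dictionary; the module name `demo` mirrors the example file.

```python
import types

source = "x = 1\n\ndef f():\n    return x\n"
module_code = compile(source, "demo.py", "exec")

# The function body is a separate code object stored as a constant.
nested_code = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]

# Execute the module code object in a new module's namespace.
module = types.ModuleType("demo")
exec(module_code, module.__dict__)

print([c.co_name for c in nested_code])
print(module.x, module.f())
```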

## 6.11 Function Definition

A function definition is executable code.

For:

```python
def add(a, b):
    return a + b
```

CPython does not run the body immediately. It creates a function object.

Conceptually:

```text
load code object for add
load function name
create function object
store function object in current namespace
```

The function object contains:

```text
code object
globals dictionary
default values
closure cells
annotations
metadata
```

Later, when the function is called, CPython creates a frame from that function object and executes the function’s code object.
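These parts are visible as attributes on the function object. The sketch below also rebuilds an equivalent function with `types.FunctionType` to show that a function is a code object plus its environment; the default value for `b` and the name `rebuilt` are added for illustration.

```python
import types

def add(a, b=1):
    return a + b

print(add.__code__.co_name)   # the code object
print(add.__defaults__)       # default values
print(add.__closure__)        # closure cells, None when nothing is captured
print(add.__annotations__)    # annotations

# Same code object, same globals, same defaults: a second function object.
rebuilt = types.FunctionType(
    add.__code__, add.__globals__, "rebuilt", add.__defaults__
)

print(rebuilt(2), rebuilt(2, 3))
```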

## 6.12 Frame Creation

A frame is created when CPython executes a code object.

For a function call:

```python
add(2, 3)
```

CPython creates a frame with:

```text
code object for add
globals from add.__globals__
builtins
local slots
argument values
value stack
instruction pointer
exception state
```

The arguments are placed into local variable slots.

```text
a = 2
b = 3
```

Then the bytecode evaluator starts executing the frame.
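A running function can look at its own frame. This sketch relies on `sys._getframe`, which is a CPython implementation detail rather than a portable API; the dictionary keys are illustrative.

```python
import sys

def add(a, b):
    frame = sys._getframe()  # the frame currently executing add
    return {
        "code_name": frame.f_code.co_name,
        "args": {name: frame.f_locals[name] for name in ("a", "b")},
        "has_caller_frame": frame.f_back is not None,
    }

frame_info = add(2, 3)
print(frame_info)
```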

## 6.13 Evaluation Stack

Most bytecode instructions communicate through the frame’s value stack.

For:

```python
return a + b
```

the execution is:

```text
LOAD_FAST a      push value of a
LOAD_FAST b      push value of b
BINARY_OP +      pop two values, add, push result
RETURN_VALUE     pop result and return it
```

The local variables are stored separately from temporary stack values.

```text
locals:
    a = 2
    b = 3

stack:
    temporary values used by bytecode
```

This is why CPython is called a stack-based virtual machine.
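The separation between local slots and the value stack can be modeled with a toy interpreter. This is an illustration only: the `run` function and the instruction tuples are hypothetical and bear no relation to CPython's real evaluator or instruction encoding.

```python
def run(program, local_slots):
    """Execute a tiny subset of stack instructions and return the result."""
    stack = []
    for opname, arg in program:
        if opname == "LOAD_FAST":
            stack.append(local_slots[arg])      # push a local's value
        elif opname == "BINARY_ADD":
            right = stack.pop()                 # operands come off the stack
            left = stack.pop()
            stack.append(left + right)          # result goes back on
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")

program = [
    ("LOAD_FAST", "a"),
    ("LOAD_FAST", "b"),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

result = run(program, {"a": 2, "b": 3})
print(result)
```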

## 6.14 Object Operations

Bytecode instructions operate on Python objects, not raw C primitives.

When CPython executes:

```python
a + b
```

it does not assume that `a` and `b` are machine integers.

They may be:

```text
integers
floats
strings
lists
tuples
NumPy arrays
user-defined objects
```

The operation dispatches through the object protocol.

For `int + int`, CPython uses integer addition. For `str + str`, it uses string concatenation. For user-defined classes, it may call `__add__`.

Conceptually:

```text
BINARY_OP +
    inspect operand types
    find numeric operation
    call appropriate slot
    return Python object
```

This is why bytecode remains generic while types provide concrete behavior.
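One function body, and therefore one bytecode sequence, can serve several types. The `Money` class below is a hypothetical user-defined type added for illustration.

```python
class Money:
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # Returning NotImplemented lets Python try the other operand.
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)

def add(a, b):
    return a + b  # the same bytecode runs for every operand type

print(add(1, 2))
print(add("ab", "cd"))
print(add(Money(150), Money(250)).cents)
```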

## 6.15 Attribute Access

For:

```python
obj.name
```

CPython compiles an attribute load.

Conceptually:

```text
LOAD_FAST obj
LOAD_ATTR name
```

At runtime, `LOAD_ATTR` performs Python attribute lookup rules:

```text
check the type and its base classes for a data descriptor
check instance dictionary
check non-data descriptors and class attributes
possibly call __getattr__
raise AttributeError if missing
```

Attribute access is not a raw field lookup in the general case. It is a protocol operation.

This explains why attribute access can run Python code.

```python
class C:
    @property
    def name(self):
        print("computed")
        return 42

obj = C()
obj.name
```

The attribute read calls descriptor code.
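The lookup order can be probed by placing conflicting entries in the instance dictionary. A sketch with a hypothetical class `Probe`; a `property` counts as a data descriptor even without a setter.

```python
class Probe:
    label = "class attribute"

    @property
    def name(self):
        return "from property"

    def __getattr__(self, attr):
        # Called only after normal lookup has failed.
        return f"fallback for {attr}"

probe = Probe()
probe.__dict__["name"] = "from instance dict"
probe.__dict__["label"] = "from instance dict"

print(probe.name)     # data descriptor on the type wins
print(probe.label)    # instance dict beats a plain class attribute
print(probe.missing)  # nothing found, so __getattr__ runs
```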

## 6.16 Calls

For:

```python
result = f(2, 3)
```

CPython evaluates the callable and arguments, then performs a call.

Conceptually:

```text
load f
load 2
load 3
call with 2 positional arguments
store result
```

At runtime, the callable may be:

```text
Python function
built-in C function
bound method
class object
object with __call__
partial object
method descriptor
```

A Python function call creates a new frame. A C built-in call invokes a C function wrapper. A class call allocates and initializes an instance.

The bytecode instruction is generic. Runtime dispatch decides the exact call path.
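The same call syntax works across very different callables. In this sketch the helper names (`py_add`, `Adder`, `call_results`) are illustrative, and each callable is invoked with the arguments `2, 3`.

```python
import functools

def py_add(a, b):
    return a + b

class Adder:
    def __call__(self, a, b):
        return a + b

callables = {
    "Python function": py_add,
    "built-in C function": pow,
    "bound method": "{}-{}".format,
    "class object": complex,
    "object with __call__": Adder(),
    "partial object": functools.partial(max, 0),
}

# Identical call expression; the runtime picks the path per object.
call_results = {label: fn(2, 3) for label, fn in callables.items()}

for label, value in call_results.items():
    print(f"{label:<22} {value!r}")
```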

## 6.17 Control Flow

Control flow is compiled into jumps.

For:

```python
if x:
    a()
else:
    b()
```

CPython emits bytecode shaped like:

```text
load x
jump if false to else
call a
jump to end
else:
call b
end:
```

For loops compile into iterator protocol operations plus jumps.

```python
for x in items:
    use(x)
```

Conceptually:

```text
get iterator
loop_start:
    get next item
    if exhausted, jump to loop_end
    store x
    call use(x)
    jump to loop_start
loop_end:
```

The language feature is high-level. The execution model is bytecode jumps and protocol calls.
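Both shapes can be checked with `dis`, and the loop can be written out by hand using the iterator protocol. The exact jump opcode names vary by version, so the sketch only looks for the word `JUMP`; `manual_for` is a hypothetical helper that approximates what the loop bytecode does.

```python
import dis

def branch(x, a, b):
    if x:
        a()
    else:
        b()

def loop(items, use):
    for x in items:
        use(x)

branch_ops = [i.opname for i in dis.get_instructions(branch)]
loop_ops = [i.opname for i in dis.get_instructions(loop)]
print([op for op in branch_ops if "JUMP" in op])
print([op for op in loop_ops if op in ("GET_ITER", "FOR_ITER")])

def manual_for(items, use):
    iterator = iter(items)          # get iterator
    while True:
        try:
            x = next(iterator)      # get next item
        except StopIteration:
            break                   # exhausted: leave the loop
        use(x)

collected = []
manual_for([1, 2, 3], collected.append)
print(collected)
```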

## 6.18 Exception Handling

Exception handling compiles into protected bytecode ranges and handler metadata.

For:

```python
try:
    risky()
except ValueError:
    recover()
```

CPython needs to know:

```text
which bytecode range is protected
where the handler starts
which exception type to match
how to unwind the stack
where execution continues
```

When an exception occurs, the evaluator consults exception handling metadata and transfers control to the appropriate handler if one matches.

If no handler matches in the current frame, the exception propagates to the caller.
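Matching and propagation can be seen with two small functions. The function names are illustrative; `co_exceptiontable` exists only on Python 3.11 and later, so the sketch reads it defensively.

```python
def risky():
    raise ValueError("bad input")

def handled():
    try:
        risky()
    except ValueError as exc:
        return f"recovered from {exc}"

def unhandled():
    try:
        risky()
    except KeyError:          # does not match ValueError
        return "not reached"

# No matching handler in unhandled(), so the exception reaches the caller.
try:
    unhandled()
except ValueError:
    propagated = True
else:
    propagated = False

print(handled(), propagated)

# Handler metadata, where the running interpreter exposes it.
print(getattr(handled.__code__, "co_exceptiontable", None))
```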

## 6.19 Imports

An import statement is executable code.

```python
import math
```

At runtime, CPython uses the import machinery to:

```text
check sys.modules
find a module spec
load or create the module
execute module code if needed
bind the name
```

The import system is partly implemented in Python through `importlib` and partly supported by C runtime code.

A module file is compiled and executed just like other Python code, but its execution namespace is the module dictionary.
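The cache and the spec lookup are both reachable from Python. A minimal sketch using `importlib`; `math` is a built-in or extension module depending on the build, so no source file is assumed.

```python
import importlib
import importlib.util
import sys

math_module = importlib.import_module("math")

# After the first import, the module object lives in sys.modules.
cached = sys.modules["math"]

# A spec describes how a module would be found and loaded.
spec = importlib.util.find_spec("math")

# A repeated import returns the cached object without re-executing it.
again = importlib.import_module("math")

print(cached is math_module, again is math_module, spec.name)
```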

## 6.20 Comprehensions

A comprehension has its own execution scope.

For:

```python
squares = [x * x for x in range(10)]
```

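CPython creates code for the comprehension body. Through Python 3.11 this is a separate nested code object that is called like a function. Since Python 3.12, list, dict, and set comprehensions are compiled inline into the enclosing code object, while the loop variable stays isolated.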

Conceptually:

```text
call range(10)
get iterator
create result list
run comprehension code
append each computed value
store final list in squares
```

The loop variable `x` belongs to the comprehension’s internal scope, not the surrounding function scope.

This is why:

```python
[x for x in range(3)]
print(x)
```

does not bind `x` in the surrounding scope in Python 3, so the `print(x)` call raises `NameError` unless `x` is defined elsewhere.
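The isolation can be verified inside a function. In this sketch the names `build_squares` and `loop_item` are illustrative, and it assumes no global called `loop_item` exists.

```python
def build_squares():
    squares = [loop_item * loop_item for loop_item in range(4)]
    try:
        loop_item  # not a local here, so this is a failed global lookup
    except NameError:
        leaked = False
    else:
        leaked = True
    return squares, leaked

squares, leaked = build_squares()
print(squares, leaked)
```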

## 6.21 Closures

Closures require cells.

For:

```python
def outer():
    x = 10

    def inner():
        return x

    return inner
```

`inner` uses a variable from `outer`.

CPython cannot store `x` as an ordinary fast local that disappears when `outer` returns. It stores `x` in a cell object.

Conceptually:

```text
outer local x becomes cell variable
inner references x as free variable
inner function stores reference to the cell
cell keeps x alive after outer returns
```

This is why the returned function still works:

```python
f = outer()
print(f())
```

The value survives through the closure cell.
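The cell is visible through the code objects and the function's `__closure__` attribute. A short sketch repeating the `outer`/`inner` example; the variable name `captured` is illustrative.

```python
def outer():
    x = 10

    def inner():
        return x

    return inner

captured = outer()

print(outer.__code__.co_cellvars)             # x is a cell variable in outer
print(captured.__code__.co_freevars)          # x is a free variable in inner
print(captured.__closure__[0].cell_contents)  # the value held by the cell
```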

## 6.22 Generators

A generator function compiles differently from an ordinary function.

```python
def count():
    yield 1
    yield 2
```

Calling `count()` creates a generator object. It does not immediately run the body.

The generator object stores suspended execution state:

```text
code object
frame or frame-like execution state
instruction offset
local variables
value stack state
running status
```

Each `next()` resumes execution until the next `yield`.

```python
g = count()
next(g)
next(g)
```

A `yield` is not just a return. It suspends the frame and preserves execution state.
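The suspended state can be observed with `inspect.getgeneratorstate`. A minimal sketch; the list `states` is illustrative.

```python
import inspect

def count():
    yield 1
    yield 2

g = count()
states = [inspect.getgeneratorstate(g)]      # created, body not started

first = next(g)
states.append(inspect.getgeneratorstate(g))  # suspended at the first yield

second = next(g)
try:
    next(g)                                  # body finishes
except StopIteration:
    pass
states.append(inspect.getgeneratorstate(g))  # closed

print(first, second, states)
```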

## 6.23 Coroutines

A coroutine is similar to a generator, but it participates in the `await` protocol.

```python
async def fetch():
    value = await operation()
    return value
```

Calling `fetch()` creates a coroutine object. The body runs only when the coroutine is awaited or scheduled.

The coroutine stores suspended execution state and resumes around `await` points.

Conceptually:

```text
create coroutine object
start execution
reach await
suspend coroutine
resume later with result
continue execution
return final value
```

The event loop is outside the core bytecode model, but coroutine suspension and resumption are runtime features implemented by CPython objects and frames.
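A coroutine can be driven by hand with `send()`, which is roughly what an event loop does on its behalf. In this sketch, `Operation` is a hypothetical awaitable standing in for the `operation()` call above, and the string `"waiting"` is an arbitrary signal.

```python
class Operation:
    def __await__(self):
        # Yielding suspends the whole coroutine chain.
        result = yield "waiting"
        return result

async def fetch():
    value = await Operation()
    return value

coro = fetch()

signal = coro.send(None)      # start execution, run until the await suspends

try:
    coro.send(42)             # resume with a result
except StopIteration as stop:
    final_value = stop.value  # the coroutine's return value

print(signal, final_value)
```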

## 6.24 Class Definition

A class statement is executable code.

```python
class C:
    x = 1

    def f(self):
        return self.x
```

CPython does not simply allocate a static type. It executes the class body in a temporary namespace.

Conceptually:

```text
load class name
prepare class namespace
execute class body code object
collect attributes and methods
call metaclass
bind resulting class object to name C
```

This explains why class bodies can contain arbitrary code:

```python
class C:
    print("building class")
    x = 1 + 2
```

The class body executes immediately when the class statement runs.
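The same steps can be approximated manually: run the body in a dictionary, then call the metaclass. This is a simplified sketch. A real `class` statement also sets entries such as `__module__` and `__qualname__`, and `body_source` is an illustrative name.

```python
body_source = "x = 1\n\ndef f(self):\n    return self.x\n"

# Execute the class body with a fresh dict as its local namespace.
namespace = {}
exec(compile(body_source, "<class body>", "exec"), {}, namespace)

# Call the default metaclass with name, bases, and the collected namespace.
C = type("C", (), namespace)

print(sorted(namespace), C.x, C().f(), type(C).__name__)
```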

## 6.25 End-to-End Example

Consider:

```python
x = 10

def add(y):
    return x + y

print(add(5))
```

The pipeline is:

```text
tokenize source
parse tokens
build AST
analyze symbols
compile module code object
execute module frame
    bind x = 10
    create function object add
    bind add
    load print
    load add
    load 5
    call add
        create function frame
        bind y = 5
        load global x
        load local y
        add objects
        return 15
    call print
finish module execution
```
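The front half of the pipeline can be driven step by step from Python. This sketch uses `ast.parse`, `compile`, and `exec` as stand-ins for the internal stages and captures the printed output so it can be inspected; the names `namespace` and `buffer` are illustrative.

```python
import ast
import contextlib
import io

source = "x = 10\n\ndef add(y):\n    return x + y\n\nprint(add(5))\n"

tree = ast.parse(source, filename="<demo>")   # tokenize, parse, build AST
module_code = compile(tree, "<demo>", "exec") # analyze symbols, emit bytecode

namespace = {}
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(module_code, namespace)              # execute the module code object

output = buffer.getvalue()
print(repr(output), namespace["x"], namespace["add"](1))
```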

The important point is that CPython has already done substantial work before the first line appears to run.

## 6.26 Where Each Stage Lives

A useful source map:

```text
Tokenizer          Parser/
Parser             Parser/ and Grammar/
AST support        Python/ast.c
Symbol table       Python/symtable.c
Compiler           Python/compile.c
Code objects       Objects/codeobject.c
Function objects   Objects/funcobject.c
Frames             Python/ and Objects/frameobject.c
Evaluation loop    Python/ceval.c and generated/interpreter files
Objects            Objects/
Imports            Lib/importlib/ and Python/import.c
```

Exact filenames shift over time, but this map is stable enough for reading the repository.

## 6.27 Mental Model

Keep this compact model:

```text
Source code becomes tokens.
Tokens become syntax.
Syntax becomes AST.
AST plus scope analysis becomes bytecode.
Bytecode lives in code objects.
Code objects execute inside frames.
Frames use local slots and a value stack.
Bytecode instructions operate on PyObject references.
Types decide concrete behavior.
```

The full system is large, but this sequence is the backbone.

## 6.28 Chapter Summary

CPython execution is a pipeline. It starts with source text and ends with object operations inside the bytecode evaluator. The tokenizer handles characters. The parser handles syntax. The AST represents structure. The symbol table classifies names. The compiler emits code objects. Frames execute code objects. Bytecode manipulates Python objects through type-defined behavior.

This pipeline explains how high-level Python constructs become concrete runtime actions.

