# 25. Bytecode Generation

Bytecode generation is the stage where CPython transforms structured syntax into executable virtual machine instructions.

The parser builds an AST.

The symbol table determines scope behavior.

The compiler then emits bytecode instructions that implement Python semantics.

For this source:

```python id="bjlwm8"
def add(a, b):
    return a + b
```

CPython generates bytecode shaped like:

```text id="plhxq4"
LOAD_FAST a
LOAD_FAST b
BINARY_OP +
RETURN_VALUE
```

The exact instruction names and formats vary across Python versions, but the model remains stable:

```text id="ecx81v"
bytecode is a low-level instruction stream
executed by the CPython virtual machine
```

## 25.1 Position in the Compilation Pipeline

Bytecode generation happens after AST construction and scope analysis.

```text id="5v2o9v"
source
    ↓
tokenization
    ↓
parsing
    ↓
AST
    ↓
symbol table
    ↓
bytecode generation
    ↓
code object
    ↓
evaluation loop
```

The compiler walks the AST and emits instructions plus metadata.

The output becomes part of a code object.
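The stages above can be observed from Python itself. The following is a minimal sketch, assuming the standard `ast`, `symtable`, and `types` modules and the built-in `compile`; the file name `<example>` is only a placeholder.

```python
import ast
import symtable
import types

source = "def add(a, b):\n    return a + b\n"

# Parsing: source text becomes an AST.
tree = ast.parse(source)

# Scope analysis: the symtable module exposes the compiler's symbol tables.
table = symtable.symtable(source, "<example>", "exec")
add_scope = table.get_children()[0]

# Bytecode generation: the AST becomes a module-level code object.
module_code = compile(tree, "<example>", "exec")

# The function body is a nested code object stored among the module's constants.
add_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))

print(type(tree).__name__)
print(add_scope.get_name(), add_scope.get_parameters())
print(add_code.co_name, add_code.co_varnames)
```

Each printed line corresponds to one stage: the AST root, the function scope with its parameters, and the resulting code object.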

## 25.2 What Bytecode Represents

Bytecode is a virtual instruction set for CPython.

It is not machine code.

It is not source code.

It is an intermediate execution language designed for the CPython interpreter.

Example source:

```python id="mwhlwe"
x = 1 + 2
```

Possible bytecode, after the compiler folds `1 + 2` into the constant `3`:

```text id="cv4c36"
LOAD_CONST 3
STORE_NAME x
LOAD_CONST None
RETURN_VALUE
```

The interpreter later executes these instructions one by one.

The compiler’s job is:

```text id="wlk7tq"
preserve Python semantics
emit correct stack behavior
emit correct control flow
emit correct scope access
record source metadata
```
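Constant folding is one visible case of the compiler preserving semantics while changing form. The sketch below uses a larger expression than the one above, on the assumption that some CPython versions encode very small integers without a `co_consts` entry; the expression `60 * 60 * 24` is chosen only for illustration.

```python
# Compile a module whose only statement assigns a constant expression.
code = compile("seconds = 60 * 60 * 24", "<example>", "exec")

# The folded product is stored as a constant; no multiplication runs at execution time.
print(code.co_consts)
print(code.co_names)

namespace = {}
exec(code, namespace)
print(namespace["seconds"])
```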

## 25.3 Bytecode Is Stack-Based

CPython bytecode uses a stack machine.

Most instructions push or pop values from the evaluation stack.

Example:

```python id="qggm8k"
a + b
```

Compilation:

```text id="jlwm0j"
LOAD_FAST a
LOAD_FAST b
BINARY_OP +
```

Stack evolution:

```text id="4tq4zg"
LOAD_FAST a
    stack: a

LOAD_FAST b
    stack: a, b

BINARY_OP +
    pop a and b
    push result
    stack: result
```

The compiler must keep stack effects consistent.

Every bytecode path must maintain valid stack state.
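The push and pop discipline can be sketched with a toy interpreter. This is an illustration only, not CPython's evaluation loop: the tuple-based instruction format and the `run` helper are hypothetical.

```python
import operator

# Hypothetical operator table for the toy BINARY_OP instruction.
BINARY_OPERATORS = {"+": operator.add, "*": operator.mul}


def run(instructions, local_vars):
    """Execute toy (opname, arg) instructions and return the final stack."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_FAST":
            stack.append(local_vars[arg])      # push one value
        elif opname == "BINARY_OP":
            right = stack.pop()                # pop two operands
            left = stack.pop()
            stack.append(BINARY_OPERATORS[arg](left, right))  # push the result
        else:
            raise ValueError("unknown instruction: " + opname)
    return stack


program = [("LOAD_FAST", "a"), ("LOAD_FAST", "b"), ("BINARY_OP", "+")]
final_stack = run(program, {"a": 2, "b": 3})
print(final_stack)
```

An instruction sequence that popped from an empty stack here would raise `IndexError`, which is the toy equivalent of the invalid stack state the compiler must never produce.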

## 25.4 Expression Compilation

Expressions produce values.

Example:

```python id="nl8bg0"
x + y * z
```

The AST preserves precedence:

```text id="0u6c93"
x + (y * z)
```

Bytecode generation follows that structure.

Conceptual bytecode:

```text id="jlwm0m"
LOAD_FAST x
LOAD_FAST y
LOAD_FAST z
BINARY_OP *
BINARY_OP +
```

Stack evolution:

```text id="f2zv3g"
x
x, y
x, y, z
x, temp
result
```

The compiler recursively compiles subexpressions.
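The recursive walk can be sketched as a post-order traversal of the AST. The `emit` function below is a hypothetical toy that handles only names and a few binary operators; it is not how `compile.c` is organized.

```python
import ast

# Hypothetical mapping from AST operator nodes to display symbols.
OPERATOR_SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def emit(node, out):
    """Append toy instructions for an expression node, operands first."""
    if isinstance(node, ast.Name):
        out.append(("LOAD_FAST", node.id))
    elif isinstance(node, ast.BinOp):
        emit(node.left, out)      # left operand is compiled first
        emit(node.right, out)     # then the right operand
        out.append(("BINARY_OP", OPERATOR_SYMBOLS[type(node.op)]))
    else:
        raise NotImplementedError(type(node).__name__)
    return out


tree = ast.parse("x + y * z", mode="eval")
instructions = emit(tree.body, [])
for instruction in instructions:
    print(instruction)
```

Because the AST already encodes `y * z` as the right child of the addition, the emitter needs no precedence logic of its own.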

## 25.5 Statement Compilation

Statements usually emit side-effect instructions.

Example:

```python id="rj0nwu"
x = value
```

Compilation:

```text id="zyulq0"
compile expression value
store into target x
```

Possible bytecode:

```text id="jlwm0n"
LOAD_FAST value
STORE_FAST x
```

Example:

```python id="9hq2o6"
return x
```

Compilation:

```text id="jlwm0o"
LOAD_FAST x
RETURN_VALUE
```

The compiler distinguishes between:

```text id="n0c2jr"
expressions producing values
statements performing actions
```

## 25.6 Loading Constants

Constants are loaded from `co_consts`.

Example:

```python id="s7hy3e"
x = 123
```

Bytecode:

```text id="jlwm0p"
LOAD_CONST 123
STORE_NAME x
```

The compiler inserts the constant into `co_consts` and emits an index reference.

Conceptually:

```text id="j49sul"
co_consts:
    0: 123

instruction:
    LOAD_CONST 0
```

The interpreter resolves the index during execution.
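The index relationship can be checked with `dis.get_instructions`. This sketch uses the literal `123456` instead of `123`, on the assumption that some recent CPython versions load very small integers with a different instruction.

```python
import dis

code = compile("x = 123456", "<example>", "exec")

# Collect (index, value) for every LOAD_CONST in the module code.
const_loads = [
    (ins.arg, ins.argval)
    for ins in dis.get_instructions(code)
    if ins.opname == "LOAD_CONST"
]

for index, value in const_loads:
    # The instruction argument is a position in co_consts.
    print(index, repr(value), code.co_consts[index] == value)
```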

## 25.7 Loading Locals

Fast locals use indexed slots.

Example:

```python id="bhz0x7"
def f(a, b):
    return a + b
```

Bytecode:

```text id="jlwm0q"
LOAD_FAST a
LOAD_FAST b
BINARY_OP +
RETURN_VALUE
```

The compiler uses `LOAD_FAST` because symbol analysis classified `a` and `b` as local variables.

Fast locals avoid dictionary lookup.
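The slot layout is visible on the code object. A small sketch, assuming only the standard `dis` module; newer CPython versions may fuse adjacent loads into combined instructions, so the example matches on the `LOAD_FAST` prefix rather than an exact name.

```python
import dis


def f(a, b):
    return a + b


code = f.__code__
print(code.co_varnames)   # slot order: index 0 is a, index 1 is b
print(code.co_nlocals)

# Any instruction whose name starts with LOAD_FAST reads from an indexed slot.
fast_loads = [
    ins.opname for ins in dis.get_instructions(f)
    if ins.opname.startswith("LOAD_FAST")
]
print(fast_loads)
```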

## 25.8 Loading Globals

Global and builtin lookups share one instruction, `LOAD_GLOBAL`, which is distinct from the instructions used for locals.

Example:

```python id="y3afqz"
x = 10

def f():
    return x
```

Inside `f`:

```text id="jlwm0r"
LOAD_GLOBAL x
RETURN_VALUE
```

The interpreter checks:

```text id="l1ajyz"
function globals
then builtins
```

The compiler selects `LOAD_GLOBAL` based on symbol table information.
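A quick check that a module-level name and a builtin compile to the same instruction. The function names `read_global` and `read_builtin` are hypothetical; the source is compiled from a string so that both functions are guaranteed to sit at module level.

```python
import dis

source = """
x = 10

def read_global():
    return x

def read_builtin():
    return len
"""

namespace = {}
exec(compile(source, "<example>", "exec"), namespace)


def load_ops(func):
    """Return the names of all LOAD_* instructions in a function."""
    return [
        ins.opname for ins in dis.get_instructions(func)
        if ins.opname.startswith("LOAD_")
    ]


global_ops = load_ops(namespace["read_global"])
builtin_ops = load_ops(namespace["read_builtin"])
print(global_ops)
print(builtin_ops)
```

The compiler cannot know whether `len` will be found in the module globals or in builtins, so it emits the same lookup for both and leaves the decision to run time.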

## 25.9 Loading Closure Variables

Closure variables use dereference operations.

Example:

```python id="cv9ngd"
def outer():
    x = 1

    def inner():
        return x
```

Inside `inner`:

```text id="jlwm0s"
LOAD_DEREF x
RETURN_VALUE
```

The compiler emits dereference bytecode because `x` is a free variable captured from an enclosing scope.

Closure bytecode accesses cell objects rather than ordinary local slots.
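The cell machinery is exposed through code object and function attributes. In this sketch `outer` returns `inner`, which the example above does not do, purely so the inner function can be inspected.

```python
def outer():
    x = 1

    def inner():
        return x

    return inner   # returned only so the closure can be examined


inner = outer()

print(outer.__code__.co_cellvars)            # names outer stores in cells
print(inner.__code__.co_freevars)            # names inner reads through cells
print(inner.__closure__[0].cell_contents)    # the captured value
```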

## 25.10 Assignment Targets

Assignment targets compile differently from ordinary expressions.

Example:

```python id="mav3fq"
x = 1
```

Bytecode:

```text id="jlwm0t"
LOAD_CONST 1
STORE_NAME x
```

But attribute assignment:

```python id="y7g47q"
obj.value = 1
```

Bytecode shape:

```text id="jlwm0u"
LOAD_CONST 1
LOAD_FAST obj
STORE_ATTR value
```

Subscript assignment:

```python id="oxtk8t"
items[i] = value
```

Bytecode shape:

```text id="jlwm0v"
LOAD_FAST value
LOAD_FAST items
LOAD_FAST i
STORE_SUBSCR
```

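Target compilation depends on the AST context (`Load`, `Store`, or `Del`) attached to each node. The assigned value is evaluated before the target's own subexpressions.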
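The evaluation order for assignment targets can be observed at run time, without relying on any particular opcode names. The `note` helper and `Box` class below are hypothetical scaffolding.

```python
import ast

order = []


def note(label, value):
    """Record that an operand was evaluated, then pass it through."""
    order.append(label)
    return value


class Box:
    pass


box = Box()
items = [None]

# Attribute target: the assigned value is evaluated before the object.
note("object", box).value = note("attr value", 1)

# Subscript target: value first, then container, then index.
note("container", items)[note("index", 0)] = note("item value", "v")

print(order)

# The parser marks the target node with a Store context.
target = ast.parse("obj.value = 1").body[0].targets[0]
print(type(target).__name__, type(target.ctx).__name__)
```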

## 25.11 Deletion

Deletion uses dedicated instructions.

Example:

```python id="d8j5gc"
del x
```

Possible bytecode:

```text id="jlwm0w"
DELETE_FAST x
```

Example:

```python id="b4d5gn"
del obj.attr
```

Possible bytecode:

```text id="jlwm0x"
LOAD_FAST obj
DELETE_ATTR attr
```

Deletion is not assignment to `None`. It removes bindings or object entries according to target type.
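The difference between deleting a binding and assigning `None` shows up as soon as the name is read again. A minimal sketch with two hypothetical functions:

```python
def delete_local():
    x = 1
    del x                      # removes the binding from the local slot
    try:
        return x               # reading an unbound local fails
    except UnboundLocalError:
        return "unbound"


def assign_none():
    x = 1
    x = None                   # the binding still exists; it refers to None
    return x


print(delete_local())
print(assign_none())
```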

## 25.12 Function Calls

Function calls generate multiple instructions.

Example:

```python id="a3j0ut"
f(x, y)
```

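Conceptual bytecode inside a function, where `f` is a global and the call is an expression statement whose result is discarded:

```text id="jlwm0y"
LOAD_GLOBAL f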
LOAD_FAST x
LOAD_FAST y
CALL 2
POP_TOP
```

The compiler must:

```text id="e8qjkl"
compile callable expression
compile positional arguments
compile keyword arguments
emit call instruction
handle stack layout
```

Method calls may use specialized bytecode forms.

Example:

```python id="s4m8cq"
obj.run()
```

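Possible shape in releases with a dedicated `LOAD_METHOD` opcode (CPython 3.7 to 3.11; 3.12 merged it into `LOAD_ATTR`):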

```text id="jlwm0z"
LOAD_FAST obj
LOAD_METHOD run
CALL 0
```

Modern CPython versions contain additional specialization and inline cache behavior around calls.
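The argument count carried by the call instruction can be read with `dis`. Because the opcode name has changed across releases, this sketch matches any instruction whose name starts with `CALL`; the `caller` function is hypothetical.

```python
import dis


def caller(f, x, y):
    return f(x, y)


# Opcode names differ by version, so match on the shared prefix.
call_instructions = [
    (ins.opname, ins.arg)
    for ins in dis.get_instructions(caller)
    if ins.opname.startswith("CALL")
]
print(call_instructions)
```

The argument value `2` is the number of positional arguments the compiler pushed before the call.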

## 25.13 Binary Operations

Arithmetic and binary operations emit operation instructions.

Example:

```python id="gt4c4x"
a + b
```

Bytecode:

```text id="jlwm10"
LOAD_FAST a
LOAD_FAST b
BINARY_OP +
```

Other examples:

| Expression | Operation       |
| ---------- | --------------- |
| `a - b`    | subtraction     |
| `a * b`    | multiplication  |
| `a / b`    | division        |
| `a // b`   | floor division  |
| `a % b`    | modulo          |
| `a ** b`   | power           |
| `a @ b`    | matrix multiply |
| `a << b`   | left shift      |
| `a & b`    | bitwise and     |

The compiler emits operation instructions. Runtime type dispatch happens later.

Example:

```python id="x8vwdk"
1 + 2
"a" + "b"
```

Both compile similarly, but runtime object behavior differs.

## 25.14 Comparisons

Comparisons emit comparison operations.

Example:

```python id="e98twa"
a < b
```

Possible bytecode:

```text id="jlwm11"
LOAD_FAST a
LOAD_FAST b
COMPARE_OP <
```

Chained comparisons require more complex control flow.

Example:

```python id="pxo93r"
a < b < c
```

This must evaluate `b` once.

Conceptual compilation:

```text id="jlwm12"
LOAD_FAST a
LOAD_FAST b
duplicate b, keeping the copy beneath the operands
COMPARE_OP <
conditional jump if false (discarding the saved b)
LOAD_FAST c
COMPARE_OP <
```

The compiler preserves Python’s chained comparison semantics.
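The single evaluation of the middle operand, and the early exit when the first comparison fails, can be confirmed at run time. The `operand` helper is hypothetical.

```python
evaluated = []


def operand(name, value):
    """Record each operand evaluation, then return the value."""
    evaluated.append(name)
    return value


# Both comparisons succeed: every operand is evaluated exactly once.
result = operand("a", 1) < operand("b", 5) < operand("c", 10)
first_run = list(evaluated)

evaluated.clear()

# The first comparison fails: c is never evaluated.
short = operand("a", 9) < operand("b", 5) < operand("c", 10)
second_run = list(evaluated)

print(result, first_run)
print(short, second_run)
```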

## 25.15 Boolean Operations

Boolean operations short-circuit.

Example:

```python id="dd8v9m"
a and b
```

Compilation pattern:

```text id="jlwm13"
evaluate a
jump if false
evaluate b
```

`b` is evaluated only if `a` is truthy; otherwise `a` itself is the result.

Similarly:

```python id="a3pgh7"
a or b
```

evaluates `b` only if `a` is falsy.

Short-circuit behavior is implemented through jumps, not ordinary function calls.
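Two consequences of the jump-based design are easy to observe: the second operand may never run, and the result is one of the operands rather than a `bool`. The `operand` helper is hypothetical.

```python
calls = []


def operand(name, value):
    """Record the evaluation and return the value unchanged."""
    calls.append(name)
    return value


and_result = operand("a", 0) and operand("b", 1)
and_calls = list(calls)

calls.clear()

or_result = operand("a", "") or operand("b", "fallback")
or_calls = list(calls)

print(and_result, and_calls)
print(or_result, or_calls)
```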

## 25.16 Conditional Expressions

Example:

```python id="r74d92"
x if cond else y
```

Compilation pattern:

```text id="jlwm14"
evaluate cond
jump to else branch if false
evaluate x
jump to end
evaluate y
```

Conditional expressions are expressions, not statements. They must leave one value on the stack regardless of branch taken.

## 25.17 `if` Statements

Example:

```python id="q70zfx"
if cond:
    a()
else:
    b()
```

Compilation pattern:

```text id="jlwm15"
compile condition
jump to else if false
compile a()
jump to end
compile b()
end
```

Possible bytecode shape:

```text id="jlwm16"
LOAD_NAME cond
POP_JUMP_IF_FALSE else_label

LOAD_NAME a
CALL 0
POP_TOP
JUMP_FORWARD end_label

else_label:
LOAD_NAME b
CALL 0
POP_TOP

end_label:
```

The compiler manages labels and jump targets internally before final assembly.

## 25.18 `while` Loops

Example:

```python id="0wdg4v"
while cond:
    work()
```

Compilation pattern:

```text id="jlwm17"
loop_start:
    evaluate cond
    jump to end if false
    compile body
    jump to loop_start
loop_end:
```

While compiling a loop, the compiler keeps a compile-time stack of enclosing blocks so it can resolve:

```text id="jlwm18"
break
continue
exception cleanup
```

## 25.19 `for` Loops

Example:

```python id="uk10dk"
for item in items:
    work(item)
```

Compilation pattern:

```text id="jlwm19"
load iterable
get iterator

loop_start:
    get next item
    jump to end on StopIteration
    store item
    compile body
    jump to loop_start

loop_end:
```

The compiler emits iterator protocol bytecode.

A Python `for` loop is iterator-driven, not index-driven.
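The pattern above corresponds roughly to the following hand-written loop. This is a sketch of the semantics only: the real `FOR_ITER` instruction detects exhaustion inside the interpreter rather than through a Python-level `try`, and `run_for_loop` is a hypothetical name.

```python
def run_for_loop(items, work):
    """Behave like `for item in items: work(item)` using the iterator protocol."""
    iterator = iter(items)          # "get iterator"
    while True:
        try:
            item = next(iterator)   # "get next item"
        except StopIteration:
            break                   # "jump to end" once the iterator is exhausted
        work(item)                  # loop body


seen = []
run_for_loop(["a", "b", "c"], seen.append)
print(seen)
```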

## 25.20 `break` and `continue`

Example:

```python id="h5f6ov"
while True:
    if stop:
        break

    continue
```

`break` jumps to loop exit.

`continue` jumps to loop continuation point.

The compiler maintains loop context structures so nested loops behave correctly.

Example:

```python id="r7bz0d"
for x in xs:
    for y in ys:
        break
```

The inner `break` exits only the inner loop.

## 25.21 Exception Handling

Exception handling requires structured control flow metadata.

Example:

```python id="12cm6v"
try:
    risky()
except ValueError:
    recover()
finally:
    cleanup()
```

Compilation responsibilities:

```text id="jlwm1a"
protected instruction ranges
exception handler targets
finally cleanup
reraising behavior
stack restoration
```

CPython 3.11 and later use exception tables stored on the code object, rather than pushing handler blocks at run time.

The compiler records:

```text id="jlwm1b"
instruction range
handler entry
whether to push the offset of the raising instruction
stack depth information
```

This metadata lets the interpreter jump into handlers correctly when exceptions occur.
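On versions that have it, the table is available as raw bytes on the code object. This sketch assumes CPython 3.11 or later for the `co_exceptiontable` attribute and falls back to `None` elsewhere; `risky` and `guarded` are hypothetical functions.

```python
def risky():
    raise ValueError("boom")


def guarded():
    try:
        return risky()
    except ValueError:
        return "recovered"


# The attribute is absent on older versions, so read it defensively.
table = getattr(guarded.__code__, "co_exceptiontable", None)

print(guarded())
print(None if table is None else len(table))
```

The encoded bytes are an implementation detail. `dis.dis(guarded)` prints a decoded view of the table on versions that support it.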

## 25.22 `with` Statements

Example:

```python id="wq6o7z"
with open(path) as f:
    data = f.read()
```

Compilation pattern:

```text id="jlwm1c"
evaluate context manager
call __enter__
store result
execute body
ensure __exit__ runs
handle exceptions correctly
```

The compiler emits cleanup logic ensuring `__exit__` executes even when exceptions occur.

`with` compilation is tightly connected to exception handling machinery.
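The pattern can be written out by hand, following the expansion given in the Python language reference. The `run_with` function and the `Recorder` class are hypothetical; the body is passed as a callable because a function cannot take a statement block.

```python
import sys


def run_with(manager, body):
    """Behave like `with manager as value: body(value)`."""
    enter = type(manager).__enter__
    exit_ = type(manager).__exit__
    value = enter(manager)
    hit_except = False
    try:
        return body(value)
    except BaseException:
        hit_except = True
        # __exit__ sees the exception and may suppress it by returning true.
        if not exit_(manager, *sys.exc_info()):
            raise
    finally:
        # Normal exit path: __exit__ is called with no exception details.
        if not hit_except:
            exit_(manager, None, None, None)


class Recorder:
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return "resource"

    def __exit__(self, exc_type, exc, tb):
        self.events.append(("exit", exc_type))
        return False


ok = Recorder()
ok_result = run_with(ok, lambda r: r.upper())

failing = Recorder()
try:
    run_with(failing, lambda r: 1 / 0)
except ZeroDivisionError:
    failing.events.append("propagated")

print(ok_result, ok.events)
print(failing.events)
```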

## 25.23 Function Definitions

Function definitions compile in two stages.

Example:

```python id="8k2d07"
def f(x):
    return x + 1
```

Stage 1:

```text id="jlwm1d"
compile function body into nested code object
```

Stage 2:

```text id="jlwm1e"
emit runtime instructions creating function object
```

Conceptual bytecode:

```text id="jlwm1f"
LOAD_CONST <code object f>
MAKE_FUNCTION
STORE_NAME f
```

The body itself becomes bytecode inside the nested code object.
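Both stages can be reproduced by hand. In this sketch, `types.FunctionType` stands in for what the function-creation instruction does at run time; the empty globals dictionary is an assumption that works only because the body uses no global names.

```python
import types

# Stage 1: compiling the module produces a nested code object for f.
module_code = compile("def f(x):\n    return x + 1\n", "<example>", "exec")
f_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))

# Stage 2: wrap the code object in a function object with some globals.
f = types.FunctionType(f_code, {})

print(f_code.co_name, f_code.co_varnames)
print(f(1))
```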

## 25.24 Closures

Example:

```python id="b4pshk"
def outer():
    x = 1

    def inner():
        return x
```

Compilation responsibilities:

```text id="jlwm1g"
create closure cell for x
compile inner with free variable access
pass closure tuple during function creation
```

Possible bytecode shape inside `outer`:

```text id="jlwm1h"
MAKE_CELL x
LOAD_CONST 1
STORE_DEREF x

LOAD_CLOSURE x
BUILD_TUPLE 1
LOAD_CONST <code object inner>
MAKE_FUNCTION closure
```

Inside `inner`:

```text id="jlwm1i"
LOAD_DEREF x
RETURN_VALUE
```

## 25.25 Class Definitions

Example:

```python id="1br4s3"
class C:
    x = 1
```

The class body becomes a nested code object.

Compilation pattern:

```text id="jlwm1j"
compile class body code object
emit runtime class construction logic
bind resulting class object
```

Class bodies execute like mini modules with their own namespace.

Methods become nested function definitions inside the class body code object.
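A simplified picture of class construction: run the body code in a fresh namespace, then hand that namespace to the metaclass. This sketch deliberately skips metaclass resolution, `__prepare__`, and attributes such as `__qualname__`, all of which the real machinery handles.

```python
# The class body compiles like a small block of module-level code.
body_source = "x = 1\n\ndef method(self):\n    return self.x\n"
body_code = compile(body_source, "<class body>", "exec")

# Execute the body with a separate dictionary as its local namespace.
namespace = {}
exec(body_code, {}, namespace)

# Build the class object from the populated namespace.
C = type("C", (), namespace)

print(sorted(name for name in namespace if not name.startswith("__")))
print(C.x, C().method())
```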

## 25.26 Comprehensions

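Comprehensions have their own scope. Before CPython 3.12 every comprehension compiled into a nested code object; since 3.12 (PEP 709), list, set, and dict comprehensions are inlined into the enclosing code, while generator expressions still use a nested code object.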

Example:

```python id="9w8n9x"
[x * x for x in xs]
```

Compilation responsibilities:

```text id="jlwm1k"
create nested comprehension code object (or inline the comprehension on 3.12+)
iterate input iterable
bind local iteration variable
append results
return constructed container
```

Comprehension variables do not leak into the outer scope. A nested code object isolates them naturally, and an inlined comprehension saves and restores any outer binding with the same name.
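The isolation of the iteration variable holds regardless of how a given version compiles the comprehension. The sketch below checks the behavior and also counts nested code objects, a number that depends on the CPython version.

```python
import types

x = "outer"
xs = [1, 2, 3]

# The comprehension reuses the name x without disturbing the outer binding.
squares = [x * x for x in xs]

print(squares)
print(x)

# Version-dependent detail: how many nested code objects the expression produces.
code = compile("[x * x for x in xs]", "<example>", "eval")
nested = [c for c in code.co_consts if isinstance(c, types.CodeType)]
print(len(nested))
```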

## 25.27 Generators

Generator functions use suspension points.

Example:

```python id="i8gtwx"
def gen():
    yield 1
    yield 2
```

Compilation responsibilities:

```text id="jlwm1l"
mark code object as generator
emit yield instructions
preserve resumable execution state
```

Possible bytecode shape:

```text id="jlwm1m"
LOAD_CONST 1
YIELD_VALUE

LOAD_CONST 2
YIELD_VALUE

LOAD_CONST None
RETURN_VALUE
```

The frame must preserve state across suspension.
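The generator flag and the suspended state can both be inspected from Python. A minimal sketch using the standard `inspect` module:

```python
import inspect


def gen():
    yield 1
    yield 2


# The compiler marks the code object with the generator flag.
is_generator = bool(gen.__code__.co_flags & inspect.CO_GENERATOR)

g = gen()
first = next(g)                              # runs to the first yield
state_mid = inspect.getgeneratorstate(g)     # frame is suspended, not finished
rest = list(g)                               # resumes and runs to completion
state_end = inspect.getgeneratorstate(g)

print(is_generator, first, state_mid)
print(rest, state_end)
```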

## 25.28 Coroutines and `await`

Example:

```python id="oqf8so"
async def fetch():
    return await client.get()
```

Compilation responsibilities:

```text id="jlwm1n"
mark coroutine flags
emit await handling
preserve suspension semantics
```

The compiler emits instructions that obtain the awaitable and suspend the coroutine until it completes. Scheduling itself is left to the event loop at run time.
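A coroutine can be driven by hand to see that `await` is built on the same resumable-frame mechanism. In this sketch the hypothetical `get` coroutine stands in for `client.get()` and never actually suspends, so one `send` call finishes it.

```python
import inspect


async def get():
    return 42


async def fetch():
    return await get()


# The compiler marks the code object with the coroutine flag.
is_coroutine = bool(fetch.__code__.co_flags & inspect.CO_COROUTINE)

coro = fetch()
result = None
try:
    coro.send(None)            # start the coroutine without an event loop
except StopIteration as stop:
    result = stop.value        # the return value travels on StopIteration

print(is_coroutine, result)
```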

## 25.29 Imports

Example:

```python id="3k7d9u"
import os
```

Compilation pattern:

```text id="jlwm1o"
IMPORT_NAME os
STORE_NAME os
```

Example:

```python id="38ef9k"
from math import sin
```

Compilation pattern:

```text id="jlwm1p"
IMPORT_NAME math
IMPORT_FROM sin
STORE_NAME sin
```

The compiler emits import operations, preceded by constants for the relative-import level and the from-list. Actual module loading happens at runtime.
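Compiling an import statement does not execute it, so the emitted instructions can be listed safely. The `import_ops` helper is hypothetical.

```python
import dis


def import_ops(source):
    """Return (opname, name) for each import instruction in the source."""
    code = compile(source, "<example>", "exec")
    return [
        (ins.opname, ins.argval)
        for ins in dis.get_instructions(code)
        if ins.opname.startswith("IMPORT_")
    ]


plain = import_ops("import os")
from_form = import_ops("from math import sin")

print(plain)
print(from_form)
```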

## 25.30 Assertions

Example:

```python id="g2m5qn"
assert x > 0
```

Compilation pattern:

```text id="jlwm1q"
evaluate condition
jump if true
raise AssertionError
```

Under optimization mode (`python -O`), the compiler omits `assert` statements entirely.

This is a compiler-level transformation.
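The effect can be reproduced without restarting the interpreter by passing `optimize` to the built-in `compile`. The `run` helper and the sample source are hypothetical.

```python
source = "assert x > 0, 'x must be positive'\nresult = 'reached'\n"


def run(optimize):
    """Compile at the given optimization level and report what happened."""
    namespace = {"x": -1}
    code = compile(source, "<example>", "exec", optimize=optimize)
    try:
        exec(code, namespace)
    except AssertionError:
        return "AssertionError"
    return namespace["result"]


print(run(0))   # assertions compiled in
print(run(1))   # assertions removed, as with -O
```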

## 25.31 Source Locations and Line Tables

The compiler records mappings between bytecode and source positions.

These mappings support:

```text id="jlwm1r"
tracebacks
debuggers
profilers
coverage tools
stepping
error reporting
```

Each instruction range may correspond to:

```text id="jlwm1s"
line number
column offset
end line
end column
```

The code object stores compressed mapping tables.
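The decoded mapping is available through `dis.findlinestarts`, which avoids parsing the compressed table directly. The source below is compiled from a string so the line numbers are predictable; on some versions an entry's line may be `None`, which the sketch filters out.

```python
import dis

source = "def f(a, b):\n    total = a + b\n    return abs(total)\n"
namespace = {}
exec(compile(source, "<example>", "exec"), namespace)
f_code = namespace["f"].__code__

# Each pair is (bytecode offset, source line that starts there).
line_starts = list(dis.findlinestarts(f_code))
lines_covered = {line for _, line in line_starts if line is not None}

for offset, line in line_starts:
    print(offset, line)
```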

## 25.32 Stack Size Computation

The compiler computes maximum stack depth.

Example:

```python id="7qdzg3"
a + b * c
```

Possible stack evolution:

```text id="jlwm1t"
a
a, b
a, b, c
a, temp
result
```

Maximum depth: 3.

This becomes `co_stacksize`.

Frames allocate enough stack space based on this value.
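The calculation amounts to tracking a running depth and remembering its peak. The `max_stack_depth` helper below is a hypothetical simplification for straight-line code; the real computation also follows jumps.

```python
def max_stack_depth(effects):
    """Return the peak depth for a straight-line list of stack effects."""
    depth = 0
    peak = 0
    for effect in effects:
        depth += effect
        peak = max(peak, depth)
    return peak


# a + b * c: three loads (+1 each), then two binary operations (-1 net each).
effects = [+1, +1, +1, -1, -1]
toy_peak = max_stack_depth(effects)


def f(a, b, c):
    return a + b * c


print(toy_peak)
print(f.__code__.co_stacksize)   # value recorded by the real compiler
```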

## 25.33 Basic Blocks

Internally, the compiler often groups instructions into basic blocks.

A basic block is a sequence of instructions with:

```text id="jlwm1u"
single entry
single exit
no internal jumps
```

Example:

```python id="w2g2q7"
if cond:
    a()
b()
```

Possible block structure:

```text id="jlwm1v"
block 1:
    evaluate cond
    conditional jump

block 2:
    a()
    fall through

block 3:
    b()
```

Basic blocks simplify control-flow analysis and jump resolution.
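Block boundaries can be found by marking "leaders": the first instruction, every jump target, and every instruction that follows a jump. The sketch below applies that rule to a toy instruction list; the instruction names and the `split_into_blocks` helper are hypothetical.

```python
def split_into_blocks(instructions):
    """Split toy (opname, jump_target_index_or_None) instructions into blocks."""
    leaders = {0}
    for index, (_, target) in enumerate(instructions):
        if target is not None:
            leaders.add(target)                 # a jump target starts a block
            if index + 1 < len(instructions):
                leaders.add(index + 1)          # so does the instruction after a jump
    ordered = sorted(leaders)
    ends = ordered[1:] + [len(instructions)]
    return [instructions[start:end] for start, end in zip(ordered, ends)]


program = [
    ("LOAD cond", None),       # 0
    ("JUMP_IF_FALSE", 4),      # 1: skip the body when cond is false
    ("CALL a", None),          # 2
    ("POP", None),             # 3
    ("CALL b", None),          # 4
    ("POP", None),             # 5
]

blocks = split_into_blocks(program)
for number, block in enumerate(blocks, start=1):
    print(number, [opname for opname, _ in block])
```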

## 25.34 Jump Resolution

The compiler initially emits symbolic labels.

Later assembly resolves actual instruction offsets.

Conceptual process:

```text id="jlwm1w"
emit instructions
emit labels
calculate instruction offsets
replace labels with offsets
insert extended arguments if needed
recalculate offsets if sizes changed
```

Jump resolution is one reason compilation is multi-stage.
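The first four steps can be sketched as a two-pass resolver. This toy version counts instructions rather than bytes and ignores extended arguments and relative offsets; the instruction format and `resolve_labels` are hypothetical.

```python
def resolve_labels(items):
    """Replace symbolic jump targets with instruction indexes."""
    # Pass 1: record the instruction index at which each label sits.
    offsets = {}
    position = 0
    for kind, arg in items:
        if kind == "label":
            offsets[arg] = position
        else:
            position += 1

    # Pass 2: drop label markers and patch jump arguments.
    resolved = []
    for kind, arg in items:
        if kind == "label":
            continue
        if "JUMP" in kind:
            arg = offsets[arg]
        resolved.append((kind, arg))
    return resolved


program = [
    ("LOAD_NAME", "cond"),
    ("POP_JUMP_IF_FALSE", "else_label"),
    ("CALL", "a"),
    ("JUMP_FORWARD", "end_label"),
    ("label", "else_label"),
    ("CALL", "b"),
    ("label", "end_label"),
    ("RETURN", None),
]

resolved = resolve_labels(program)
for index, instruction in enumerate(resolved):
    print(index, instruction)
```

In a real assembler a large target may need a wider encoding, which shifts later offsets and forces the calculation to repeat until the sizes stop changing.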

## 25.35 Inline Caches and Specialization Support

Modern CPython bytecode supports adaptive specialization.

The compiler reserves inline cache entries after instructions that can be specialized.

Example operations benefiting from specialization:

```text id="jlwm1x"
attribute access
global lookup
binary operations
calls
method dispatch
```

Initial bytecode is generic.

Runtime specialization may later rewrite individual instructions in place with variants that take optimized fast paths.

The compiler prepares instruction layouts that allow this adaptation.

## 25.36 Bytecode Inspection

Use `dis` to inspect generated bytecode.

Example:

```python id="jlwm1y"
import dis

def f(a, b):
    return a + b

dis.dis(f)
```

Useful inspection functions include:

```text id="jlwm1z"
dis.dis
dis.Bytecode
dis.get_instructions
```

Bytecode inspection is essential for:

```text id="jlwm20"
compiler debugging
performance analysis
tooling
education
reverse engineering Python behavior
```
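For tooling, the structured APIs are usually more convenient than the printed listing. A short sketch using `dis.get_instructions` and `dis.Bytecode`:

```python
import dis


def f(a, b):
    return a + b


# Each Instruction carries the offset, opcode name, and a readable argument.
rows = [(ins.offset, ins.opname, ins.argrepr) for ins in dis.get_instructions(f)]
for row in rows:
    print(row)

# dis.Bytecode wraps the same information and keeps a reference to the code object.
bytecode = dis.Bytecode(f)
print(bytecode.codeobj is f.__code__)
```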

## 25.37 Version Sensitivity

Bytecode changes between CPython versions.

Changes may include:

```text id="jlwm21"
new opcodes
removed opcodes
opcode renaming
instruction format changes
specialization changes
exception handling changes
line table changes
```

Tools should avoid depending on exact bytecode layouts across versions unless version-specific support exists.

Use public APIs rather than parsing raw bytecode manually whenever possible.
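A tool can record which bytecode format it is dealing with before making any assumptions. This sketch reads the interpreter version, the magic number that tags compiled files, and the opcode table of the running interpreter.

```python
import dis
import importlib.util
import sys

version = sys.version_info[:2]

# The magic number changes whenever the bytecode format changes.
magic = importlib.util.MAGIC_NUMBER

# dis.opmap reflects the opcodes of the interpreter that is running.
opcode_count = len(dis.opmap)
has_binary_op = "BINARY_OP" in dis.opmap

print(version, magic, opcode_count, has_binary_op)
```

Checking for an opcode by name in `dis.opmap` is more robust than hard-coding numeric opcode values, which are reassigned between releases.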

## 25.38 Important CPython Source Areas

Important files in recent CPython releases (the layout shifts between versions) include:

```text id="jlwm22"
Python/compile.c
Python/flowgraph.c
Python/assemble.c
Python/bytecodes.c
Include/opcode_ids.h
Lib/dis.py
Lib/opcode.py
```

Conceptual roles:

| Area           | Role                                             |
| -------------- | ------------------------------------------------ |
| `compile.c`    | AST traversal and instruction emission           |
| `flowgraph.c`  | Control-flow graph construction and optimization |
| `assemble.c`   | Bytecode assembly and jump resolution            |
| `bytecodes.c`  | Instruction definitions used by the interpreter  |
| `opcode_ids.h` | Numeric opcode identifiers                       |
| `opcode.py`    | Opcode metadata exposed to Python                |
| `dis.py`       | Bytecode inspection                              |

## 25.39 Minimal Mental Model

Use this model:

```text id="jlwm23"
The compiler walks the AST.
Expressions emit stack-based instructions.
Statements emit side-effect and control-flow instructions.
Constants, names, and locals become indexed table references.
Control flow becomes jumps and exception metadata.
Functions, classes, comprehensions, and generators create nested code objects.
The final instruction stream becomes part of a code object executed by the CPython virtual machine.
```

Bytecode generation is the stage where Python syntax becomes executable virtual machine operations.

