The main evaluation loop in Python/ceval.c, opcode dispatch via computed gotos, and the eval breaker mechanism.
The evaluation loop is the central execution engine of CPython. It takes a compiled code object, executes its bytecode instructions, and produces a result or an exception.
At a high level, CPython execution looks like this:
Python source
↓
tokens
↓
parser
↓
AST
↓
symbol table
↓
compiler
↓
code object
↓
frame
↓
evaluation loop
↓
Python result or exception

This chapter focuses on the last active stage: the loop that runs bytecode.
The evaluation loop lives in CPython’s interpreter implementation. Historically, the key file has been Python/ceval.c, with surrounding interpreter machinery spread across other files. Modern CPython also generates some interpreter code from bytecode definitions. The details move between releases, but the model remains stable: a frame executes a code object by repeatedly dispatching bytecode instructions. CPython’s own developer guide points to the internal documentation and source tree as the current reference because this code changes across versions.
27.1 The Job of the Evaluation Loop
The evaluation loop does not parse Python text. It does not build the AST. It does not usually decide lexical scope. Those jobs are already finished by the time execution begins.
Its job is narrower and more mechanical:
read the next bytecode instruction
decode its operand
perform the operation
update the frame
continue, jump, call, return, or raise

For example, this function:
def add(a, b):
    return a + b

This function is compiled into a code object. The code object contains bytecode. When add(2, 3) is called, CPython creates or initializes a frame for that call, stores the arguments in fast local slots, then runs the frame through the evaluation loop.
The loop eventually reaches a return instruction. That instruction pops or reads the return value, unwinds the frame, and returns the object pointer to the caller.
Conceptually:
call add(2, 3)
create frame
store a = 2
store b = 3
execute LOAD_FAST a
execute LOAD_FAST b
execute BINARY_OP +
execute RETURN_VALUE
return 5

The real implementation is more complex, but this is the core model.
27.2 The Main Runtime Objects
The evaluation loop connects several CPython runtime objects.
| Runtime object | Role |
|---|---|
| Code object | Immutable compiled bytecode and metadata |
| Frame | Mutable execution state for one call |
| Thread state | Per-thread interpreter execution state |
| Interpreter state | Per-interpreter runtime state |
| Python object | Runtime value manipulated by instructions |
| Type object | Runtime behavior table for Python objects |
A code object describes what should run.
A frame stores one active execution of that code.
The evaluation loop executes the frame.
This distinction matters. A single code object can be executed many times. Each call gets its own frame state.
def f(x):
    return x + 1

a = f(10)
b = f(20)

Both calls use the same code object, but each call has separate locals, stack state, and return value.
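The sharing can be observed directly from Python. In this sketch, two closures produced by the same def statement share a single compiled code object while keeping separate execution state (make_adder is an illustrative name, not from the text above):

```python
def make_adder(n):
    # Each call to make_adder creates a new function object and closure cell...
    def add(x):
        return x + n
    return add

add1 = make_adder(1)
add2 = make_adder(2)

# ...but the body was compiled exactly once: both functions share one code object.
assert add1.__code__ is add2.__code__

# Execution state is per call, so the shared bytecode produces different results.
assert add1(10) == 11
assert add2(10) == 12
```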
27.3 Code Objects
A code object contains the compiled representation of Python code.
You can inspect one from Python:
def f(x):
    y = x + 1
    return y
code = f.__code__
print(code.co_name)
print(code.co_varnames)
print(code.co_consts)
print(code.co_names)
print(code.co_stacksize)

Typical fields include:
| Field | Meaning |
|---|---|
| co_code | Bytecode stream, exposed in a version-dependent form |
| co_consts | Literal constants used by the code |
| co_names | Names referenced by bytecode |
| co_varnames | Local variable names |
| co_freevars | Free variables captured from outer scopes |
| co_cellvars | Variables captured by inner scopes |
| co_argcount | Positional argument count |
| co_kwonlyargcount | Keyword-only argument count |
| co_stacksize | Required value stack size |
| co_flags | Code flags |
| co_filename | Source filename |
| co_name | Function or block name |
| co_qualname | Qualified name |
| co_firstlineno | First source line |
| line table | Mapping from bytecode offsets to source lines |
| exception table | Structured exception handling metadata |
The dis module exists specifically to inspect CPython bytecode. Its documentation notes that CPython bytecode is an implementation detail and may change between Python versions.
27.4 Frames
A frame is an execution record.
When a function runs, CPython needs somewhere to store:
the code object being executed
the instruction pointer
local variables
temporary stack values
globals dictionary
builtins dictionary
closure cells
exception state
return state
tracing and profiling state

That structure is the frame.
A simplified frame model looks like this:
frame
code object
globals
builtins
locals / fast locals
instruction pointer
value stack
block and exception state
previous frame / caller relation

A Python call chain creates a chain of frames:
def a():
    return b()
def b():
    return c()
def c():
    return 42

a()

Conceptually:
frame for a
frame for b
frame for c

At any instant, the current thread state points to the currently executing frame or equivalent internal frame representation.
27.5 Fast Locals
Function local variables are not normally stored in a dictionary during execution.
CPython uses an array-like layout for fast local variables. Names are resolved at compile time to local indexes. Bytecode instructions can then access local variables by index instead of doing dictionary lookup.
Example:
def f(a, b):
    c = a + b
    return c

The compiler assigns local slots:
| Name | Slot |
|---|---|
| a | 0 |
| b | 1 |
| c | 2 |
The bytecode can then use slot-based operations:
LOAD_FAST 0 load a
LOAD_FAST 1 load b
BINARY_OP +
STORE_FAST 2 store c
LOAD_FAST 2 load c
RETURN_VALUE

This is why local variable access is generally faster than global variable access. A local access can use a direct frame slot. A global access must search dictionaries and handle builtins fallback.
27.6 The Value Stack
CPython bytecode uses a stack model.
Most instructions read from and write to a frame-local value stack. This stack is separate from the C call stack. It stores PyObject * values during bytecode execution.
For this expression:
x = (a + b) * c

The stack behavior is roughly:
LOAD_FAST a stack: [a]
LOAD_FAST b stack: [a, b]
BINARY_OP + stack: [a_plus_b]
LOAD_FAST c stack: [a_plus_b, c]
BINARY_OP * stack: [product]
STORE_FAST x stack: []

The value stack is central to bytecode design. It avoids needing every instruction to name explicit source and destination registers. Instead, instructions agree on stack effects.
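The stack discipline above can be sketched as a toy interpreter in Python. The instruction names mimic CPython's, but the machine itself is invented for illustration and is far simpler than the real one:

```python
import operator

def run(program, local_slots):
    """Execute a tiny stack-machine program against a dict of local slots."""
    stack = []
    for op, arg in program:
        if op == "LOAD_FAST":
            stack.append(local_slots[arg])      # push a local onto the stack
        elif op == "BINARY_OP":
            right = stack.pop()                 # operands come off the stack...
            left = stack.pop()
            stack.append(arg(left, right))      # ...and the result goes back on
        elif op == "STORE_FAST":
            local_slots[arg] = stack.pop()      # pop into a local slot
    return local_slots

# x = (a + b) * c, mirroring the trace above
slots = {"a": 2, "b": 3, "c": 4}
run([
    ("LOAD_FAST", "a"),
    ("LOAD_FAST", "b"),
    ("BINARY_OP", operator.add),
    ("LOAD_FAST", "c"),
    ("BINARY_OP", operator.mul),
    ("STORE_FAST", "x"),
], slots)
assert slots["x"] == 20
```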
Some instructions push values:
LOAD_CONST
LOAD_FAST
LOAD_GLOBAL
BUILD_LIST

Some instructions pop values:
STORE_FAST
POP_TOP
RETURN_VALUE

Some do both:
BINARY_OP
CALL
LOAD_ATTR
COMPARE_OP

27.7 Instruction Pointer
The frame tracks where execution is inside the bytecode stream.
For straight-line code, the instruction pointer moves forward after each instruction.
For branches, loops, exception handling, and returns, instructions change control flow.
Example:
def sign(x):
    if x < 0:
        return -1
    return 1

Conceptual bytecode flow:
load x
load 0
compare <
jump if false to positive_return
load -1
return
positive_return:
load 1
returnThe instruction pointer is what makes this possible. A branch instruction changes the next instruction to execute.
27.8 Dispatch
Dispatch is the act of choosing the C implementation for the current bytecode instruction.
A simplified interpreter loop looks like this:
for (;;) {
    opcode = read_opcode(frame);
    oparg = read_operand(frame);
    switch (opcode) {
    case LOAD_FAST:
        /* load local variable */
        break;
    case LOAD_CONST:
        /* load constant */
        break;
    case BINARY_OP:
        /* perform binary operation */
        break;
    case RETURN_VALUE:
        /* return from frame */
        break;
    }
}

This is only a teaching model. Modern CPython uses optimized dispatch techniques and generated interpreter code in places. Still, the essential shape remains:
fetch
decode
dispatch
execute
repeat

Dispatch cost matters. Every Python bytecode instruction passes through dispatch. If a loop executes millions of bytecode instructions, dispatch overhead becomes visible.
27.9 Stack Effects
Every bytecode instruction has a stack effect.
A stack effect describes how many values an instruction consumes and produces.
For example:
| Instruction | Input stack | Output stack |
|---|---|---|
| LOAD_CONST | [] | [const] |
| LOAD_FAST | [] | [local] |
| STORE_FAST | [value] | [] |
| BINARY_OP | [left, right] | [result] |
| RETURN_VALUE | [value] | returns from frame |
The compiler must know stack effects to compute the maximum stack size required by a code object. That value appears as co_stacksize.
For:
def f(a, b, c):
    return (a + b) * c

The stack never needs to hold more than two or three temporary values, depending on exact bytecode. CPython records the maximum required stack depth so the frame can reserve enough space.
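Both facts can be checked from Python on a reasonably recent CPython: dis.stack_effect reports a single instruction's net effect, and co_stacksize holds the computed maximum. The exact stack size is version-dependent, so the check below only asserts a lower bound:

```python
import dis

def f(a, b, c):
    return (a + b) * c

# Net stack effect of POP_TOP: it consumes one value and produces none.
assert dis.stack_effect(dis.opmap["POP_TOP"]) == -1

# The compiler summed per-instruction stack effects to compute this bound.
# Evaluating (a + b) * c needs at least two simultaneous temporaries.
assert f.__code__.co_stacksize >= 2
```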
27.10 Bytecode Operands
Many bytecode instructions have operands.
An operand is a small integer argument attached to the instruction. The meaning depends on the opcode.
Examples:
LOAD_CONST 0 load co_consts[0]
LOAD_FAST 1 load fast local slot 1
STORE_FAST 2 store into fast local slot 2
LOAD_GLOBAL 3 load name from name table index 3

The bytecode instruction does not usually store a full pointer to the object or string name. It stores an index into a table owned by the code object.
This keeps bytecode compact and separates immutable metadata from execution state.
27.11 Running a Simple Function
Consider:
def add(a, b):
    return a + b

Disassembly may vary by Python version, but the conceptual instruction sequence is:
load local a
load local b
binary add
return valueExecution proceeds like this:
| Step | Instruction | Stack before | Stack after |
|---|---|---|---|
| 1 | LOAD_FAST a | [] | [a] |
| 2 | LOAD_FAST b | [a] | [a, b] |
| 3 | BINARY_OP + | [a, b] | [a + b] |
| 4 | RETURN_VALUE | [a + b] | return |
At the C level, each stack element is a PyObject *.
For add(2, 3), the stack holds pointers to Python integer objects. The addition operation dispatches through Python object semantics. It does not directly emit a CPU integer addition in the general case.
27.12 Why a + b Is Not Just One CPU Instruction
In Python, a + b is dynamic.
The objects may be integers:
1 + 2

They may be strings:
"hello " + "world"

They may be lists:
[1] + [2]

They may be user-defined objects:
class X:
    def __add__(self, other):
        return "custom"

X() + X()

The bytecode instruction for addition must respect Python’s data model. It must inspect the operand types, find the correct numeric or sequence operation, call special methods when needed, handle errors, and return a Python object.
So the evaluation loop cannot treat + as plain machine addition. It is a dynamic operation over Python objects.
Modern CPython reduces this overhead when it can. The specializing adaptive interpreter can specialize operations after observing stable runtime behavior. PEP 659 describes this as specialization over small regions with rapid adaptation when behavior changes.
27.13 Function Calls
Function calls are among the most important paths in the evaluation loop.
For:
result = f(x, y)

The interpreter must:
load callable f
load arguments x and y
arrange call arguments
check callable type
enter optimized call path if possible
create or initialize callee frame if it is a Python function
execute callee frame
receive return value
continue caller frame

Conceptually:
caller frame
LOAD_FAST f
LOAD_FAST x
LOAD_FAST y
CALL 2
create callee frame
run callee frame
return object
STORE_FAST result

CPython has spent significant optimization effort on calls because calls are frequent and expensive. Important mechanisms include:
vectorcall
fast locals
specialized call bytecodes
inline caches
frame optimizations
reduced temporary tuple/dict creation

The goal is to avoid unnecessary argument packing. Historically, many calls required building tuples and dictionaries for arguments. Modern call paths try to pass arguments in array-like layouts when possible.
27.14 Returning From a Frame
A return instruction ends the current frame.
For:
def f():
    return 42

The return instruction produces a PyObject * result and unwinds the frame.
The caller receives that object as the result of the call expression:
x = f()

Conceptually:
callee frame stack: [42]
RETURN_VALUE
pop result
finish callee frame
give result to caller
caller resumes with stack: [42]
STORE_FAST x

A frame can finish in several ways:
| Exit path | Meaning |
|---|---|
| Normal return | Function returns a value |
| Exception | Function exits by raising |
| Generator yield | Frame suspends and later resumes |
| Coroutine await | Coroutine suspends |
| Fatal error | Runtime-level failure |
The evaluation loop must handle all of these paths.
27.15 Exceptions
Exceptions are part of normal interpreter control flow.
For:
def div(a, b):
    return a / b

If b is zero, the division operation raises ZeroDivisionError.
The bytecode instruction does not return a normal result. Instead, it sets exception state and transfers control to exception handling logic.
Conceptually:
execute BINARY_OP /
operation fails
set current exception
search exception table
jump to handler or unwind frame

Modern CPython uses structured exception tables associated with code objects. These tables describe protected bytecode ranges and handlers. This allows the interpreter to find the correct handler when an exception occurs.
Example:
try:
    x = 1 / y
except ZeroDivisionError:
    x = 0

The evaluation loop must know where the protected range is, where the handler starts, and what stack state is required at the handler.
27.16 Loops and Branches
Python loops compile to jumps.
Example:
def count(n):
    i = 0
    while i < n:
        i += 1
    return i

Conceptual bytecode shape:
i = 0
loop_start:
load i
load n
compare <
jump if false to loop_end
load i
load 1
add
store i
jump to loop_start
loop_end:
load i
return

The evaluation loop does not have a special C-level while-loop for each Python while. It executes bytecode instructions that implement the loop.
A Python loop is therefore an interpreter loop inside the outer interpreter loop:
C evaluation loop
executes Python loop bytecode
jumps backward many times

This is one reason tight Python loops can be expensive. Each iteration may execute many bytecode instructions, and each bytecode instruction has dispatch and dynamic object overhead.
27.17 Iteration
A for loop uses the iteration protocol.
Example:
for item in xs:
    use(item)

Conceptual execution:
iterator = iter(xs)
loop:
item = next(iterator)
if StopIteration:
exit loop
use(item)
jump loop

The evaluation loop executes instructions that call iter(), call the iterator’s next operation, handle StopIteration, and branch.
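The same protocol can be written out in Python. This is a sketch of what the loop's bytecode does, not the actual instruction sequence:

```python
def manual_for(xs, use):
    """A Python-level rendering of for-loop bytecode behavior."""
    iterator = iter(xs)            # what GET_ITER arranges
    while True:
        try:
            item = next(iterator)  # what advancing the iterator does
        except StopIteration:      # exhaustion ends the loop
            break
        use(item)                  # the loop body

seen = []
manual_for([1, 2, 3], seen.append)
assert seen == [1, 2, 3]

# Works for any iterable, including generators, because it is protocol-based.
squares = []
manual_for((n * n for n in range(3)), squares.append)
assert squares == [0, 1, 4]
```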
This means Python-level for loops are protocol-based. They work for lists, tuples, dicts, files, generators, custom iterators, and many extension types because the interpreter dispatches through object protocol slots.
27.18 Attribute Access
Attribute access is also dynamic.
For:
value = obj.name

The interpreter must implement Python’s attribute lookup rules:
look at object type
handle descriptors
look in instance dictionary if applicable
look in class dictionary and base classes
call custom __getattribute__ if present
fall back to __getattr__ if applicable
raise AttributeError if missing

A simple-looking expression can involve significant machinery.
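The later stages of that lookup chain can be demonstrated from Python:

```python
class Demo:
    class_attr = "from class"

    def __getattr__(self, name):
        # Called only when the normal lookup (instance dict, class, bases) fails.
        return f"fallback:{name}"

d = Demo()
d.inst_attr = "from instance"

assert d.inst_attr == "from instance"   # found in the instance __dict__
assert d.class_attr == "from class"     # found in the class dictionary
assert d.missing == "fallback:missing"  # __getattr__ fallback, no AttributeError
```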
Modern CPython uses inline caches and specialization to speed up common attribute access patterns. For example, repeated access to the same attribute on objects with stable shapes can avoid some repeated lookup work.
27.19 Global and Builtin Lookup
Global lookup is more expensive than local lookup.
For:
print(len(xs))

Names such as print and len are not local variables unless assigned locally. CPython looks them up through global and builtin namespaces.
Conceptually:
look in globals dictionary
if missing, look in builtins dictionary
if missing, raise NameError

This is why local binding can be faster in tight loops:
def slow(xs):
    for x in xs:
        len(x)

def faster(xs):
    local_len = len
    for x in xs:
        local_len(x)

Modern CPython can specialize global lookups, so this old micro-optimization is less universally useful than it once was. Still, the underlying distinction remains: local slots are simpler than dictionary-based name lookup.
27.20 The GIL and the Evaluation Loop
In the traditional CPython runtime, the evaluation loop runs while the current thread holds the Global Interpreter Lock.
The GIL protects interpreter state, including reference counts and many object internals. The evaluation loop periodically checks whether it should drop the GIL, handle signals, process pending calls, or allow another thread to run.
This means bytecode execution is cooperative at the interpreter level. A thread does not usually hold the GIL forever. CPython has scheduling checks that allow switching between threads.
The practical consequence:
one thread executes Python bytecode at a time per traditional interpreter
I/O operations may release the GIL
C extensions may release the GIL around long native work
CPU-bound Python threads do not normally execute bytecode in parallel

Newer CPython work includes free-threaded builds and per-interpreter changes, but the evaluation loop remains the central place where thread state, pending work, and bytecode execution meet.
27.21 Reference Counts During Execution
Every value on the stack is a Python object pointer with ownership rules.
The evaluation loop must carefully maintain reference counts. When an instruction pushes a value, stores a value, replaces a value, or discards a value, it must preserve object lifetime correctly.
Example:
x = a + b

Conceptually:
load a obtain reference to object a
load b obtain reference to object b
add produce new reference to result
store x bind result to local slot
discard temporaries

Incorrect reference management would cause either leaks or premature destruction.
At the C level, this means carefully placed operations equivalent to:
Py_INCREF(obj);
Py_DECREF(obj);

The exact implementation often uses specialized macros and ownership conventions. But the invariant is simple: an object must stay alive while it can still be used, and it must be released when the interpreter no longer owns a reference.
27.22 Error Signaling
Most C helper functions in CPython use a common convention:
return a valid pointer or success code on success
return NULL or error code on failure
set an exception on failure

The evaluation loop checks these results.
Simplified example:
PyObject *result = PyNumber_Add(left, right);
if (result == NULL) {
    goto error;
}

The NULL return does not by itself describe the exception. The exception is stored in thread state.
This pattern appears everywhere:
call helper
if failed:
go to error path
else:
push or store result

The evaluation loop contains many error exits because almost any Python operation can fail:
allocation can fail
attribute lookup can fail
function call can fail
comparison can fail
iteration can fail
import can fail
descriptor code can fail
user-defined special method can fail

27.23 Pending Calls, Signals, and Async Events
The evaluation loop also acts as a safe checkpoint for runtime-level work.
CPython cannot handle every signal or pending event at arbitrary C instruction boundaries. Instead, it records that something needs attention and checks at controlled points during evaluation.
Examples:
signal handling
pending calls from C APIs
thread switching requests
async exception injection
tracing and profiling hooks
monitoring hooks
interrupt checks

This keeps the interpreter manageable. The evaluation loop becomes the place where Python execution notices outside events.
27.24 Tracing and Profiling
Python supports tracing and profiling through APIs such as:
sys.settrace(...)
sys.setprofile(...)

These hooks require cooperation from the evaluation loop.
The loop must emit events such as:
call
line
return
exception
opcode, when enabled

Tracing makes execution slower because it adds checks and callback calls. But it enables debuggers, coverage tools, profilers, teaching tools, and observability systems.
A debugger that steps through Python code depends on the evaluation loop’s ability to map bytecode execution back to source lines.
27.25 Specializing Adaptive Interpreter
Since Python 3.11, CPython has included a specializing adaptive interpreter based on PEP 659. The idea is to keep Python semantics dynamic while making common stable cases faster. PEP 659 describes specialization as aggressive over small regions, with adaptation when runtime patterns change.
The interpreter starts with general bytecode. As code runs, CPython observes behavior and may replace or augment generic operations with specialized forms.
For example, a generic binary operation may become optimized for common operand types:
generic BINARY_OP
observed int + int repeatedly
↓
specialized integer-add path

For attribute access:
generic LOAD_ATTR
observed same attribute layout repeatedly
↓
cached attribute access path

For global lookup:
generic LOAD_GLOBAL
observed stable globals and builtins dictionaries
↓
cached global lookup path

Specialization must remain correct. If assumptions fail, the interpreter falls back or adapts.
This is not the same as a traditional full JIT compiler. It still operates inside the interpreter architecture. It specializes bytecode-level execution paths rather than compiling whole functions into native machine code in the general case.
27.26 Inline Caches
Inline caches are small pieces of cache storage associated with bytecode instructions.
Instead of recomputing lookup information every time, the interpreter stores facts near the instruction that needs them.
Example cache information may include:
type version
dictionary version
attribute offset
resolved descriptor
global dictionary version
builtin dictionary version
specialized call target

A simplified attribute cache model:
LOAD_ATTR name
cache:
expected type = User
type version = 123
attribute offset = 2

On the next execution, CPython can check whether the object still matches the cached assumptions. If yes, it uses the fast path. If no, it falls back to the generic path.
Inline caches work well because bytecode instructions at a given source location often see the same kinds of objects repeatedly.
27.27 Why Specialization Is Safe
Python is dynamic, so specialization must be guarded.
A specialized path is valid only while its assumptions remain true.
For example:
obj.x

This access can be specialized if CPython observes a stable object layout. But Python allows mutation:
obj.__dict__["x"] = 10
type(obj).x = property(...)
obj.__class__ = OtherType

So CPython uses version tags, guards, counters, and fallback paths.
The safety rule is:
use fast path only if guards prove assumptions still hold
otherwise use generic Python semantics

This is the same broad strategy used by many dynamic language runtimes, but CPython keeps the machinery relatively close to the bytecode interpreter.
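The guard-and-fallback pattern can be sketched in Python. AttrCache and its fields are invented for illustration; they only loosely mirror the real per-instruction inline-cache machinery:

```python
class AttrCache:
    """Toy guarded cache for one attribute access site."""
    def __init__(self):
        self.expected_type = None   # stand-in for CPython's type version tag
        self.cached_getter = None

    def load(self, obj, name):
        if type(obj) is self.expected_type:   # guard: assumptions still hold
            return self.cached_getter(obj)    # fast path
        # Fallback: generic semantics, then re-specialize for the observed type.
        value = getattr(obj, name)
        self.expected_type = type(obj)
        self.cached_getter = lambda o: getattr(o, name)
        return value

class Point:
    def __init__(self, x):
        self.x = x

cache = AttrCache()
assert cache.load(Point(1), "x") == 1   # generic path, then specialize
assert cache.load(Point(2), "x") == 2   # guarded fast path
assert cache.load("text", "upper")() == "TEXT"  # guard fails, falls back safely
```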
27.28 Generated Interpreter Code
Modern CPython does not treat every bytecode implementation as hand-written switch cases in one file.
Parts of the interpreter are generated from instruction definitions. This helps keep bytecode metadata, stack effects, specialization information, and dispatch code more consistent.
The broad idea:
instruction definitions
↓
generated opcode metadata
↓
generated dispatch support
↓
interpreter execution

For a reader, this means the source of truth may not always be the final generated C file alone. You often need to inspect the instruction definition files, generated headers, and build outputs.
The exact files and generation pipeline can change across CPython releases, so use the source tree for the version you are studying.
27.29 The Evaluation Loop and C Calls
The evaluation loop frequently calls C helper functions.
Examples:
PyNumber_Add
PyObject_GetAttr
PyObject_SetAttr
PyObject_Call
PyDict_GetItem
PyObject_RichCompare
PyIter_Next

These helpers may call user-defined Python code.
For example:
a + b

may call:
a.__add__(b)

And:
obj.name

may call:
obj.__getattribute__("name")

So the evaluation loop can reenter Python execution indirectly. A bytecode instruction may call C helper code, which may call Python code, which creates another frame, which starts another evaluation loop execution.
Conceptually:
frame A
executes BINARY_OP
calls C helper
calls user __add__
frame B
evaluation loop

This recursive execution model is central to Python’s flexibility.
27.30 Recursion and Call Depth
Python protects against uncontrolled recursion.
Example:
def f():
    return f()

f()

Each call creates another Python frame. CPython tracks recursion depth and raises RecursionError when the configured limit is exceeded.
The evaluation loop and call machinery must cooperate with this check. Without it, recursive Python code could exhaust the C stack or process memory.
You can inspect and adjust the limit:
import sys
print(sys.getrecursionlimit())
sys.setrecursionlimit(2000)

Raising the recursion limit should be done carefully. The Python limit exists partly to protect lower-level runtime resources.
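A minimal demonstration of the limit in action, lowering it temporarily and restoring it afterward:

```python
import sys

def recurse(n=0):
    return recurse(n + 1)

hit_limit = False
old_limit = sys.getrecursionlimit()
try:
    # Low enough to trip quickly, high enough to be safe to set at this depth.
    sys.setrecursionlimit(200)
    try:
        recurse()
    except RecursionError:
        hit_limit = True    # CPython refused to grow the frame chain further
finally:
    sys.setrecursionlimit(old_limit)

assert hit_limit
```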
27.31 Generators
Generators change the frame lifecycle.
A normal function call runs until it returns or raises. A generator can suspend and resume.
Example:
def gen():
    yield 1
    yield 2

Calling gen() does not immediately run the function body to completion. It creates a generator object that owns a suspended frame or equivalent execution state.
Each next() resumes execution:
first next()
enter frame
run until yield 1
suspend frame
second next()
resume frame
run until yield 2
suspend frame
third next()
resume frame
finish function
raise StopIteration

The evaluation loop must support suspension. It cannot simply destroy the frame at yield.
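The suspension points are easy to observe by driving the generator manually:

```python
def gen():
    yield 1
    yield 2

g = gen()               # creates the generator; the body has not run yet
assert next(g) == 1     # run until the first yield, then suspend
assert next(g) == 2     # resume, run until the second yield, suspend again

finished = False
try:
    next(g)             # resume, fall off the end of the function
except StopIteration:
    finished = True
assert finished
```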
27.32 Coroutines and Await
Coroutines extend the same suspension model.
Example:
async def fetch():
    data = await read()
    return data

An await may suspend the coroutine until another awaitable completes.
The evaluation loop must support:
coroutine frame creation
suspension at await
resumption with value
resumption with exception
final return
cancellation behavior

Async execution is therefore not a separate interpreter. It is built on the same frame and bytecode machinery, with specific instructions and protocols for suspension and resumption.
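A coroutine can be driven by hand with send(), which makes the suspend/resume cycle visible without an event loop. The Suspend awaitable below is an illustrative stand-in for what a real framework provides:

```python
class Suspend:
    """A minimal awaitable that yields control to whoever drives the coroutine."""
    def __await__(self):
        value = yield "suspended"   # suspend here; resumed value comes from send()
        return value

async def fetch():
    data = await Suspend()
    return data

coro = fetch()
assert coro.send(None) == "suspended"   # run to the await, then suspend

result = None
try:
    coro.send("payload")                # resume the frame with a value
except StopIteration as exc:
    result = exc.value                  # a coroutine returns via StopIteration
assert result == "payload"
```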
27.33 Class Bodies and Module Bodies
The evaluation loop does not only execute functions.
It also executes module bodies and class bodies.
A module file:
x = 1
def f():
    return x

This file is compiled into a module-level code object. Importing or running the module executes that code object.
A class statement also executes code:
class C:
    x = 1
    def method(self):
        return self.x

The class body runs in a namespace prepared for class construction. After execution, CPython builds the class object from that namespace.
So the evaluation loop executes several block kinds:
| Block kind | Example |
|---|---|
| Module | .py file body |
| Function | def f(): ... |
| Class body | class C: ... |
| Lambda | lambda x: x + 1 |
| Comprehension | [x * 2 for x in xs] |
| Generator | (x for x in xs) |
| Coroutine | async def f(): ... |
27.34 Comprehensions
Comprehensions compile to their own code objects in many cases.
Example:
ys = [x * 2 for x in xs if x > 0]

Conceptually:
create list
iterate xs
for each x:
if x > 0:
append x * 2
return list

This means comprehensions often run through a nested frame or specialized internal execution path. They have their own local scope behavior, which is why loop variables inside list comprehensions do not leak into the surrounding scope in Python 3.
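The scoping consequence is directly observable:

```python
x = "outer"
ys = [x * 2 for x in [1, 2, 3]]

assert ys == [2, 4, 6]
# The comprehension's loop variable lived in its own scope,
# so the surrounding binding of x is untouched:
assert x == "outer"
```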
The evaluation loop sees comprehension execution as bytecode execution, not as a special syntax form.
27.35 Import Execution
Imports also eventually execute bytecode.
When Python imports a .py module, the import system finds the module, reads the source or cached bytecode, creates a module object, then executes the module code object.
Conceptually:
import module
find spec
create module object
compile or load code object
execute code object in module namespace

The evaluation loop therefore participates in imports. Importing a module means running code.
This is why import-time side effects happen:
# module.py
print("imported")

import module

The print runs because module body execution is ordinary code execution.
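The final step of that pipeline can be reproduced by hand, without assuming anything about the finder and loader machinery: compile a module body to a code object, then execute it in a fresh module namespace (fake_module is an invented name):

```python
import types

# What import ultimately does: run the module's code object in its namespace.
source = "x = 1\nmessage = 'imported'\n"
code = compile(source, "fake_module.py", "exec")

module = types.ModuleType("fake_module")
exec(code, module.__dict__)   # module body execution is ordinary execution

assert module.x == 1
assert module.message == "imported"
```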
27.36 Performance Model
The evaluation loop explains much of Python performance.
A Python operation often has several layers of cost:
bytecode dispatch
stack manipulation
reference count updates
dynamic type checks
dictionary lookup
descriptor protocol
function call overhead
allocation
error checks

For example:
obj.x + y

may require:
LOAD_FAST obj
LOAD_ATTR x
LOAD_FAST y
BINARY_OP +

Each instruction has interpreter overhead. LOAD_ATTR may involve descriptor lookup. BINARY_OP may involve numeric dispatch. Reference counts must be maintained. Errors must be checked.
This is why moving hot loops into C extensions, vectorized libraries, or built-in operations can be much faster. They reduce the number of bytecode instructions and dynamic dispatches executed by the evaluation loop.
27.37 Built-ins as Evaluation Loop Escape Hatches
Built-in operations can perform large amounts of work below the bytecode level.
Example:
sum(xs)

The evaluation loop executes the call to sum, but the loop over elements may run in C inside the built-in implementation.
Compare:
total = 0
for x in xs:
    total += x

This requires many bytecode instructions per iteration.
The built-in can reduce interpreter overhead because much of the repeated work happens in C.
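The difference in bytecode volume is measurable with dis.get_instructions. Exact counts vary by Python version, but the manual loop reliably compiles to more instructions than the single built-in call, and the per-iteration instructions are what dominate at runtime:

```python
import dis

def builtin_sum(xs):
    return sum(xs)

def manual_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total

# Both compute the same result...
assert builtin_sum([1, 2, 3]) == manual_sum([1, 2, 3]) == 6

# ...but the manual version carries more bytecode, executed per iteration.
n_builtin = len(list(dis.get_instructions(builtin_sum)))
n_manual = len(list(dis.get_instructions(manual_sum)))
assert n_builtin < n_manual
```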
This is a common Python performance principle:
fewer Python bytecode instructions in hot paths usually means better performance

27.38 Inspecting the Evaluation Loop From Python
You can study bytecode with dis:
import dis
def f(a, b):
    c = a + b
    return c

dis.dis(f)

You can inspect frames:
import inspect
def f():
    frame = inspect.currentframe()
    print(frame.f_code.co_name)
    print(frame.f_locals)

f()

You can inspect call depth:
import sys
def f(n):
    frame = sys._getframe()
    print(n, frame.f_code.co_name)
    if n:
        f(n - 1)

f(3)

You can trace execution:
import sys
def trace(frame, event, arg):
    print(event, frame.f_code.co_name, frame.f_lineno)
    return trace

def f(x):
    y = x + 1
    return y

sys.settrace(trace)
f(10)
sys.settrace(None)

These tools expose part of the machinery that the evaluation loop maintains internally.
27.39 A Simplified Evaluation Loop
A teaching version of the loop might look like this:
PyObject *
eval_frame(Frame *frame)
{
    for (;;) {
        Instruction instr = next_instruction(frame);
        switch (instr.opcode) {
        case OP_LOAD_CONST: {
            PyObject *value = frame->code->consts[instr.arg];
            push(frame, value);
            break;
        }
        case OP_LOAD_FAST: {
            PyObject *value = frame->locals[instr.arg];
            if (value == NULL) {
                raise_unbound_local_error();
                goto error;
            }
            push(frame, value);
            break;
        }
        case OP_STORE_FAST: {
            PyObject *value = pop(frame);
            frame->locals[instr.arg] = value;
            break;
        }
        case OP_BINARY_ADD: {
            PyObject *right = pop(frame);
            PyObject *left = pop(frame);
            PyObject *result = PyNumber_Add(left, right);
            if (result == NULL) {
                goto error;
            }
            push(frame, result);
            break;
        }
        case OP_RETURN_VALUE: {
            PyObject *result = pop(frame);
            return result;
        }
        }
    }

error:
    return NULL;
}

This omits most real details:
reference ownership
specialization
inline caches
exception tables
tracing
profiling
GIL checks
pending calls
signals
generators
coroutines
debug builds
statistics
opcode prediction
deoptimization
frame materialization

But it captures the essential idea.
27.40 Common Misunderstandings
| Misunderstanding | Correct model |
|---|---|
| CPython executes source text directly | CPython executes compiled code objects |
| Python variables store raw values | Names and slots hold references to objects |
| Bytecode is stable across versions | Bytecode is a CPython implementation detail |
| a + b is simple machine addition | It is dynamic object protocol dispatch, unless specialized |
| A frame is only a traceback object | A frame is active execution state |
| The GIL only affects user threads | It is deeply connected to interpreter execution and object safety |
| Exceptions are rare side paths only | Exceptions are integrated into normal control flow machinery |
| Generators are special functions only | They are resumable execution frames or equivalent state |
27.41 Reading the Real Source
When reading the real CPython source, use this order:
- Start with dis output for a small Python function.
- Identify the bytecode instructions.
- Find the corresponding opcode definitions.
- Find the generated or handwritten interpreter implementation.
- Follow helper calls for object operations.
- Track reference ownership.
- Track stack effects.
- Track error paths.
- Check specialization and cache behavior.
- Compare behavior across Python versions.
A good study function is:
def example(obj, xs):
    total = 0
    for x in xs:
        total += obj.value + x
    return total

This function touches many interpreter paths:
local variable access
loop iteration
attribute lookup
binary operation
in-place update semantics
jump instructions
return

Disassemble it, then map each instruction to the interpreter machinery.
27.42 Chapter Summary
The evaluation loop is where compiled Python code becomes running Python behavior. It executes code objects through frames, uses a stack-based bytecode model, dispatches instructions, maintains references, handles exceptions, calls functions, checks runtime events, and applies specialization when possible.
The loop is small in concept but large in consequence. It sits at the junction of nearly every CPython subsystem:
compiler
frames
objects
types
reference counting
garbage collection
exceptions
calls
imports
generators
coroutines
tracing
profiling
threading
optimization

To understand CPython, you must understand the evaluation loop. It is the machine inside the machine.