
82. Running the Test Suite

python -m test flags, regrtest test selection, parallel execution (-j), and interpreting test output.

The CPython test suite is one of the most important parts of the project. It protects language semantics, runtime behavior, standard library correctness, ABI stability, memory management invariants, and platform compatibility across operating systems and architectures.

A CPython contributor spends a large amount of time inside the test system. Even small changes to parser logic, object lifetime handling, reference counting, imports, or threading can break unrelated parts of the runtime. The test suite exists to detect those regressions early.

This chapter explains how the CPython test infrastructure works, how tests are organized, how to execute subsets of tests efficiently, and how core developers use the suite during development.

82.1 Purpose of the Test Suite

The CPython test suite serves several independent purposes.

Purpose                  Description
-------                  -----------
Correctness              Verify language and library behavior
Regression prevention    Prevent old bugs from reappearing
Platform validation      Ensure behavior across Linux, macOS, Windows, BSD, mobile, and embedded targets
Memory safety            Detect leaks, corruption, and dangling references
Concurrency validation   Test thread safety and signal handling
API compatibility        Protect the C API and ABI
Performance stability    Detect pathological slowdowns
Build validation         Verify generated artifacts and extension modules

CPython evolves continuously. A change in one subsystem often affects another subsystem indirectly.

For example:

parser change
    → compiler output changes
        → bytecode layout changes
            → traceback formatting changes
                → debugger tests fail

The test suite exists to make those interactions visible.

82.2 Repository Layout

Most tests live in:

Lib/test/

This directory contains thousands of files.

Important directories include:

Path                     Purpose
----                     -------
Lib/test/                Core regression tests
Lib/test/test_*          Standard test modules
Lib/test/support/        Shared test utilities
Lib/test/libregrtest/    The regrtest runner implementation
Modules/_test*           Native (C) test extension modules
Tools/                   Developer tools and helpers

Example:

Lib/test/test_dict.py
Lib/test/test_list.py
Lib/test/test_gc.py
Lib/test/test_asyncio/
Lib/test/test_importlib/

The naming convention is usually:

test_<feature>.py

Each file focuses on one subsystem or module.

82.3 The regrtest Framework

CPython uses a custom test runner called regrtest.

You usually invoke it through:

python -m test

or:

./python -m test

when using a locally built interpreter.

The entry point lives in:

Lib/test/libregrtest/

regrtest handles:

test discovery
parallel execution
timeouts
resource management
reference leak checks
test isolation
reruns
randomization
output formatting
platform skipping

This framework evolved specifically for CPython’s needs. Generic Python test runners are insufficient for many interpreter-level tasks.

82.4 Building Before Running Tests

You typically run tests against a locally built interpreter.

Example:

./configure --with-pydebug
make -j8

Then:

./python -m test

Using the system Python is usually incorrect when modifying CPython internals because:

wrong binary
wrong stdlib
wrong extension modules
wrong bytecode format
wrong ABI

The local build ensures tests execute against the modified runtime.

82.5 Running the Entire Test Suite

To execute the full suite:

./python -m test

This may take a long time depending on hardware and build type.

Typical execution includes:

thousands of test files
tens of thousands of test cases
subprocess spawning
network simulation
filesystem operations
thread scheduling
extension module loading

Parallel execution is common:

./python -m test -j8

where:

-j8

runs eight worker processes.

CPython tests are generally process-isolated rather than thread-isolated.
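
The process-isolation idea can be sketched with the standard library alone. The snippet below is a simplified illustration, not regrtest's actual implementation: each "test" runs in its own child interpreter, so a crash or state leak in one cannot affect the others.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_in_child(test_code):
    # Run one "test" in a fresh child interpreter; the exit code tells
    # us whether the assertions inside passed (0) or failed (nonzero).
    proc = subprocess.run([sys.executable, "-c", test_code],
                          capture_output=True, text=True)
    return proc.returncode

tests = [
    "assert sorted([3, 1, 2]) == [1, 2, 3]",
    "import math; assert math.sqrt(4) == 2.0",
]

# Threads suffice as workers here: each one only waits on a subprocess.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_in_child, tests))

print(results)  # [0, 0]: both child interpreters passed
```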

82.6 Running Individual Tests

During development, you rarely run the full suite repeatedly.

Instead:

./python -m test test_dict

or:

./python -m test test_gc

You can also run multiple tests:

./python -m test test_dict test_list test_set

This workflow is essential for fast iteration.

Example development loop:

edit source
rebuild CPython
run focused tests
inspect failure
repeat

82.7 Verbose Output

Verbose mode:

./python -m test -v test_dict

shows individual test cases as they execute.

Useful for:

debugging hangs
tracking failures
observing execution order
finding flaky tests

Very verbose mode:

./python -m test -vv

prints even more internal details.

82.8 Fail Fast Mode

During debugging:

./python -m test -v --failfast

stops at the first failure. Note that regrtest's --failfast option requires verbose mode (-v or -W), and that -x is not a fail-fast flag at all: it excludes the listed tests from the run.

This reduces noise when diagnosing a regression.

Combined example:

./python -m test -v --failfast test_gc

82.9 Rerunning Failed Tests

Useful option:

--rerun

Example:

./python -m test --rerun

At the end of the run, regrtest re-runs any failing tests in verbose mode. (In older versions this behavior was spelled -w or --verbose2.)

Helpful for:

flaky failures
intermittent race conditions
long test sessions
incremental debugging

82.10 Test Discovery

regrtest discovers tests dynamically.

Convention:

test_*.py

Classes usually inherit from:

unittest.TestCase

Example:

import unittest

class DictTests(unittest.TestCase):

    def test_lookup(self):
        d = {"x": 1}
        self.assertEqual(d["x"], 1)

Discovery scans the test package and imports matching modules.

82.11 Test Isolation

Isolation is critical.

Many tests modify:

environment variables
working directories
signal handlers
sys.modules
thread state
filesystem state
locale settings
warning filters

CPython’s test infrastructure attempts to restore interpreter state after each test.

Utilities in:

test.support

provide helpers for isolation.

Example:

from test.support.os_helper import EnvironmentVarGuard

(Before Python 3.10, EnvironmentVarGuard lived directly in test.support.) Helpers like this prevent global state contamination across tests.
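
What EnvironmentVarGuard does can be approximated in a few lines of standard-library code. The sketch below is a hand-rolled stand-in, not the real test.support implementation:

```python
import contextlib
import os

@contextlib.contextmanager
def env_var_guard(name, value):
    # Save the original value (None if unset), install the temporary
    # one, and restore the environment even if the body raises.
    original = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if original is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = original

os.environ.pop("DEMO_VAR", None)  # start from a known state
with env_var_guard("DEMO_VAR", "temporary"):
    assert os.environ["DEMO_VAR"] == "temporary"

assert "DEMO_VAR" not in os.environ  # fully restored after the block
```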

82.12 Temporary Directories and Files

Tests should avoid polluting the filesystem.

Typical pattern:

import tempfile

with tempfile.TemporaryDirectory() as d:
    ...

CPython also provides helpers:

from test.support import os_helper

These utilities handle platform-specific cleanup issues.

82.13 Skipped Tests

Some tests require optional features:

network access
IPv6
large memory
GUI support
SSL
specific OS behavior

Tests can skip dynamically:

import unittest

class FeatureTests(unittest.TestCase):

    @unittest.skipUnless(condition, "requires feature")
    def test_feature(self):
        ...

or:

self.skipTest("reason")

Skipping is common in CPython because supported platforms vary widely.

82.14 Resource-Intensive Tests

Some tests are disabled by default.

Examples:

network tests
large file tests
CPU-intensive tests
memory-heavy tests

Enable them with:

./python -m test -u all

or specific resources:

./python -m test -u network

Resource categories include:

Resource     Meaning
--------     -------
network      Internet access
largefile    Very large files
audio        Audio devices
gui          GUI interaction
cpu          Expensive CPU workloads

This prevents accidental long-running executions.
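
The resource mechanism reduces to a conditional skip. The sketch below is a hypothetical stand-in that reads a TEST_RESOURCES environment variable, rather than regrtest's real -u bookkeeping:

```python
import os
import unittest

def resource_enabled(name):
    # Hypothetical stand-in for regrtest's resource tracking: a resource
    # counts as enabled when it appears in TEST_RESOURCES.
    return name in os.environ.get("TEST_RESOURCES", "").split(",")

class NetworkTests(unittest.TestCase):

    @unittest.skipUnless(resource_enabled("network"),
                         "network resource not enabled")
    def test_download(self):
        ...  # would touch the real network

suite = unittest.defaultTestLoader.loadTestsFromTestCase(NetworkTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.skipped))  # 1 unless TEST_RESOURCES includes "network"
```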

82.15 Reference Leak Testing

One of the most important CPython-specific features is reference leak detection.

Debug builds support:

./python -m test -R 3:3 test_dict

Meaning:

3 warmup runs
3 measured runs, whose total reference counts are compared

If the totals keep growing across the measured runs, the test leaks references.

This detects leaked references caused by incorrect Py_INCREF or Py_DECREF usage.

A typical leak hides on an error path (the error condition here is illustrative):

PyObject *x = PyLong_FromLong(1);  /* new reference */
if (error_occurred)
    return NULL;  /* BUG: x leaks; missing Py_DECREF(x) */
return x;  /* correct: ownership transfers to the caller */

Reference leaks are critical because CPython relies heavily on deterministic reference counting.
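
The counter behind -R is visible from Python in debug builds as sys.gettotalrefcount(); release builds omit it, so the sketch below probes for it defensively:

```python
import sys

# sys.gettotalrefcount() exists only in --with-pydebug builds.
get_total = getattr(sys, "gettotalrefcount", None)

if get_total is None:
    print("release build: no global refcount tracking")
else:
    before = get_total()
    junk = [object() for _ in range(1000)]
    del junk
    after = get_total()
    # regrtest's -R option looks for a delta that keeps growing
    # across repeated runs of the same test.
    print("refcount delta:", after - before)
```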

82.16 Debug Builds

Many tests are meaningful only under debug builds.

Configure:

./configure --with-pydebug

Debug builds enable:

extra assertions
memory poisoning
reference tracking
debug allocators
interpreter consistency checks

Debug builds are slower but substantially more informative.

Typical debug-only checks include:

negative refcounts
invalid GC state
object lifecycle corruption
API misuse

82.17 Memory Allocator Debugging

CPython has specialized allocator diagnostics.

Environment variables:

PYTHONMALLOC=debug

can detect:

buffer overflows
double frees
invalid memory access
allocator misuse

Combined with tests, these tools expose subtle runtime bugs.

82.18 Running Tests Under Sanitizers

Advanced debugging often uses compiler sanitizers.

Examples:

CFLAGS="-fsanitize=address"

or:

CFLAGS="-fsanitize=undefined"

(CPython's configure script also offers dedicated flags such as --with-address-sanitizer and --with-undefined-behavior-sanitizer.)

Sanitizers help detect:

heap corruption
use-after-free
integer overflow
undefined behavior
stack corruption

These tools are extremely valuable for C-level interpreter work.

82.19 Parallel Testing

Parallel execution:

./python -m test -j0

uses all CPU cores automatically.

Internally, regrtest spawns worker processes.

Benefits:

faster CI
better CPU utilization
reduced wall-clock time

Challenges:

race conditions
filesystem contention
port conflicts
test order assumptions

Flaky tests often appear first under parallel execution.

82.20 Flaky Tests

A flaky test passes sometimes and fails sometimes.

Common causes:

timing assumptions
thread scheduling
signal races
network instability
clock precision
resource exhaustion
platform variance

CPython developers treat flaky tests seriously because they reduce CI reliability.

Strategies include:

timeouts
retry loops
stronger synchronization
reduced timing assumptions
process isolation
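
"Stronger synchronization" usually means replacing sleeps with explicit signalling. A minimal before-and-after sketch:

```python
import threading

results = []
done = threading.Event()

def worker():
    results.append("work done")
    done.set()  # signal completion explicitly instead of relying on timing

t = threading.Thread(target=worker)
t.start()

# Flaky pattern: time.sleep(0.01); assert results  (depends on scheduling).
# Robust pattern: block on the event with a generous timeout.
assert done.wait(timeout=30), "worker never signalled completion"
t.join()
assert results == ["work done"]
print("ok")
```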

82.21 Platform-Specific Behavior

CPython supports many platforms.

Tests often include conditional branches:

import sys

if sys.platform == "win32":
    ...

or a skip decorator on a test method:

import sys
import unittest

@unittest.skipIf(sys.platform == "win32", "POSIX only")
def test_posix_only(self):  # a method on a TestCase
    ...

Platform differences include:

filesystem semantics
path handling
signals
process APIs
thread scheduling
encoding defaults
socket behavior

A test passing on Linux does not guarantee correctness on Windows or macOS.

82.22 Running Tests After Bytecode Changes

Compiler or interpreter modifications often invalidate .pyc files.

Common workflow:

make clean
make

or manually removing:

__pycache__/

Incorrect bytecode caches can produce misleading failures.

82.23 Test Support Utilities

Lib/test/support/ contains many helpers.

Examples:

Utility             Purpose
-------             -------
import_helper       Import isolation
threading_helper    Thread coordination
socket_helper       Network helpers
warnings_helper     Warning capture
os_helper           Filesystem helpers

These utilities reduce duplicated infrastructure across tests.
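
As an illustration of the style of helper these modules provide, here is a hand-rolled approximation of import isolation (the real import_helper utilities are more thorough):

```python
import contextlib
import importlib
import sys

@contextlib.contextmanager
def clean_import(module_name):
    # Drop any cached copy, yield a freshly imported module, then put
    # sys.modules back so other tests see their original state.
    saved = sys.modules.pop(module_name, None)
    try:
        yield importlib.import_module(module_name)
    finally:
        sys.modules.pop(module_name, None)
        if saved is not None:
            sys.modules[module_name] = saved

with clean_import("json") as fresh_json:
    assert fresh_json.loads("[1, 2]") == [1, 2]
# After the block, sys.modules holds exactly what it held before.
```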

82.24 Continuous Integration

CPython uses extensive CI infrastructure.

Typical CI runs include:

Linux
Windows
macOS
debug builds
release builds
sanitizer builds
free-threaded builds
multiple architectures

A patch may pass locally but fail in CI due to platform-specific behavior.

Core developers therefore rely heavily on automated infrastructure.

82.25 Common Contributor Workflow

Typical workflow:

modify source
build interpreter
run focused tests
run broader related tests
run full suite if necessary
check reference leaks
push branch
wait for CI
investigate failures

Small targeted testing is essential for productivity.

Running the full suite after every edit is usually impractical.

82.26 Writing Good Tests

Good CPython tests are:

deterministic
isolated
cross-platform
minimal
fast
clear
specific

Bad tests often:

depend on timing
depend on execution order
leave global state modified
depend on external services
assume specific memory layout

A good regression test usually targets one bug precisely.

82.27 Example Minimal Regression Test

Suppose a dictionary regression existed.

A focused test might look like:

import unittest

class DictRegressionTests(unittest.TestCase):

    def test_resize_preserves_entries(self):
        d = {}

        for i in range(1000):
            d[i] = i

        for i in range(1000):
            self.assertEqual(d[i], i)

if __name__ == "__main__":
    unittest.main()

This directly validates the invariant under investigation.

82.28 Reading Failures

A test failure may indicate:

logic bug
memory corruption
reference leak
ABI mismatch
undefined behavior
test bug
platform assumption

CPython failures are sometimes nonlocal.

Example:

GC corruption
    → unrelated test crashes later

The first visible failure is not always the root cause.

82.29 Core Principle

The CPython test suite is part of the interpreter itself.

It is not an optional accessory.

The runtime, compiler, object system, import machinery, and standard library evolve together with the tests. Every important subsystem in CPython is tightly coupled to regression validation.

Understanding the test infrastructure is therefore a prerequisite for serious CPython development.