# 82. Running the Test Suite

The CPython test suite is one of the most important parts of the project. It protects language semantics, runtime behavior, standard library correctness, ABI stability, memory management invariants, and platform compatibility across operating systems and architectures.

A CPython contributor spends a large amount of time inside the test system. Even small changes to parser logic, object lifetime handling, reference counting, imports, or threading can break unrelated parts of the runtime. The test suite exists to detect those regressions early.

This chapter explains how the CPython test infrastructure works, how tests are organized, how to execute subsets of tests efficiently, and how core developers use the suite during development.

## 82.1 Purpose of the Test Suite

The CPython test suite serves several independent purposes.

| Purpose | Description |
|---|---|
| Correctness | Verify language and library behavior |
| Regression prevention | Prevent old bugs from reappearing |
| Platform validation | Ensure consistent behavior across Linux, macOS, Windows, BSD, mobile, and embedded targets |
| Memory safety | Detect leaks, corruption, dangling references |
| Concurrency validation | Test thread safety and signal handling |
| API compatibility | Protect the C API and ABI |
| Performance stability | Detect pathological slowdowns |
| Build validation | Verify generated artifacts and extension modules |

CPython evolves continuously. A change in one subsystem often affects another subsystem indirectly.

For example:

```text
parser change
    → compiler output changes
        → bytecode layout changes
            → traceback formatting changes
                → debugger tests fail
```

The test suite exists to make those interactions visible.

## 82.2 Repository Layout

Most tests live in:

```text
Lib/test/
```

This directory contains thousands of files.

Important directories include:

| Path | Purpose |
|---|---|
| `Lib/test/` | Core regression tests |
| `Lib/test/test_*` | Standard test modules |
| `Lib/test/support/` | Shared utilities |
| `Modules/_test*` | Native test extensions |
| `Python/` | Interpreter core sources exercised by runtime-level tests |
| `Tools/` | Developer tools and helpers |

Example:

```text
Lib/test/test_dict.py
Lib/test/test_list.py
Lib/test/test_gc.py
Lib/test/test_asyncio/
Lib/test/test_importlib/
```

The naming convention is usually:

```text
test_<feature>.py
```

Each file focuses on one subsystem or module.

## 82.3 The `regrtest` Framework

CPython uses a custom test runner called `regrtest`.

You usually invoke it through:

```bash
python -m test
```

or:

```bash
./python -m test
```

when using a locally built interpreter.

Its implementation lives in:

```text
Lib/test/libregrtest/
```

`regrtest` handles:

```text
test discovery
parallel execution
timeouts
resource management
reference leak checks
test isolation
reruns
randomization
output formatting
platform skipping
```

This framework evolved specifically for CPython’s needs. Generic Python test runners are insufficient for many interpreter-level tasks.

## 82.4 Building Before Running Tests

You typically run tests against a locally built interpreter.

Example:

```bash
./configure --with-pydebug
make -j8
```

Then:

```bash
./python -m test
```

Using the system Python is usually incorrect when modifying CPython internals because:

```text
wrong binary
wrong stdlib
wrong extension modules
wrong bytecode format
wrong ABI
```

The local build ensures tests execute against the modified runtime.

## 82.5 Running the Entire Test Suite

To execute the full suite:

```bash
./python -m test
```

This may take a long time depending on hardware and build type.

Typical execution includes:

```text
hundreds of test modules
tens of thousands of test cases
subprocess spawning
network simulation
filesystem operations
thread scheduling
extension module loading
```

Parallel execution is common:

```bash
./python -m test -j8
```

where:

```text
-j8
```

runs eight worker processes.

With `-j`, each test file runs in its own worker process, so CPython tests are isolated per process rather than per thread.
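The worker-process model can be sketched with the standard library alone. This is not how `regrtest` is implemented. `run_one`, `run_isolated`, and the snippet names are hypothetical, and short inline snippets stand in for real test files so the example stays self-contained.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(name, code):
    """Run one snippet in a fresh interpreter and report whether it passed."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=60,
    )
    return name, proc.returncode == 0


def run_isolated(snippets, workers=2):
    """Run each snippet in its own child process, a few at a time."""
    # The threads only wait on child processes; every snippet still gets
    # its own interpreter, so state cannot leak between them.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_one, name, code)
                   for name, code in snippets.items()]
        return dict(future.result() for future in futures)


results = run_isolated({
    "passes": "assert 1 + 1 == 2",
    "fails": "raise SystemExit(1)",
})
print(results)
```

Because each snippet runs in a separate interpreter, a crash or a modified global in one of them cannot affect the others, which is the property the `-j` workers rely on.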

## 82.6 Running Individual Tests

During development, you rarely run the full suite repeatedly.

Instead:

```bash
./python -m test test_dict
```

or:

```bash
./python -m test test_gc
```

You can also run multiple tests:

```bash
./python -m test test_dict test_list test_set
```

This workflow is essential for fast iteration.

Example development loop:

```text
edit source
rebuild CPython
run focused tests
inspect failure
repeat
```

## 82.7 Verbose Output

Verbose mode:

```bash
./python -m test -v test_dict
```

shows individual test cases as they execute.

Useful for:

```text
debugging hangs
tracking failures
observing execution order
finding flaky tests
```

Very verbose mode:

```bash
./python -m test -vv
```

prints even more internal details.

## 82.8 Fail Fast Mode

During debugging:

```bash
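./python -m test -v -G
```

stops at the first failure. The `-G` (`--failfast`) option requires `-v` or `-W`. Note that `-x` means `--exclude` in `regrtest`, not fail-fast.

This reduces noise when diagnosing a regression.

Focused on a single test module:

```bash
./python -m test -v -G test_gc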
```

## 82.9 Rerunning Failed Tests

Useful option:

```bash
--rerun
```

Example:

```bash
./python -m test --rerun
```

This re-runs, in verbose mode, the tests that failed earlier in the same run. The short form is `-w`.

Helpful for:

```text
flaky failures
intermittent race conditions
long test sessions
incremental debugging
```
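The rerun idea can be illustrated with plain `unittest`: run everything once, collect the failures, and run only those again. This is a simplified sketch, not `regrtest`'s own logic. `SometimesFails`, `attempts`, and `run_quiet` are hypothetical names, and the flaky test is rigged to fail only on its first attempt.

```python
import io
import unittest

attempts = {"count": 0}


class SometimesFails(unittest.TestCase):
    def test_stable(self):
        self.assertTrue(True)

    def test_flaky(self):
        # Fails on the first attempt only, imitating an intermittent failure.
        attempts["count"] += 1
        self.assertGreater(attempts["count"], 1)


def run_quiet(suite):
    """Run a suite without printing the usual progress output."""
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


first = run_quiet(
    unittest.defaultTestLoader.loadTestsFromTestCase(SometimesFails)
)

# Collect the method names of failed tests and rebuild a suite from them.
failed_names = [test.id().rsplit(".", 1)[-1]
                for test, _ in first.failures + first.errors]
rerun_suite = unittest.TestSuite(SometimesFails(name) for name in failed_names)
second = run_quiet(rerun_suite)
print(failed_names, second.wasSuccessful())
```

A test that fails in the first pass and passes in the second is a strong hint of flakiness rather than a deterministic regression.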

## 82.10 Test Discovery

`regrtest` discovers tests dynamically.

Convention:

```text
test_*.py
```

Classes usually inherit from:

```python
unittest.TestCase
```

Example:

```python
import unittest

class DictTests(unittest.TestCase):

    def test_lookup(self):
        d = {"x": 1}
        self.assertEqual(d["x"], 1)
```

Discovery scans the test package and imports matching modules.
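The filename convention can be illustrated with a small directory scan. This is a simplification: the hypothetical `find_test_modules` helper only looks at `test_*.py` files in one directory, and it ignores test packages such as `test_asyncio/`, which the real runner also handles.

```python
import tempfile
from pathlib import Path


def find_test_modules(test_dir):
    """Return sorted module names for files matching test_*.py in test_dir."""
    return sorted(path.stem for path in Path(test_dir).glob("test_*.py"))


# Build a throwaway directory that mimics a tiny test package.
with tempfile.TemporaryDirectory() as tmp:
    for filename in ("test_dict.py", "test_list.py", "helper.py"):
        Path(tmp, filename).write_text("", encoding="utf-8")
    found = find_test_modules(tmp)

print(found)
```

Files that do not match the pattern, such as `helper.py` here, are left out of the result.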

## 82.11 Test Isolation

Isolation is critical.

Many tests modify:

```text
environment variables
working directories
signal handlers
sys.modules
thread state
filesystem state
locale settings
warning filters
```

CPython’s test infrastructure attempts to restore interpreter state after each test.

Utilities in:

```text
test.support
```

provide helpers for isolation.

Example:

```python
from test.support.os_helper import EnvironmentVarGuard
```

This prevents global state contamination across tests.
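The underlying save-and-restore pattern is simple to sketch. The context manager below is a hypothetical stand-in written for illustration, not the `test.support` implementation, and `SKETCH_ONLY_VAR` is an invented variable name assumed to be unset beforehand.

```python
import os
from contextlib import contextmanager


@contextmanager
def preserved_environ():
    """Snapshot os.environ and restore it exactly when the block exits."""
    saved = dict(os.environ)
    try:
        yield os.environ
    finally:
        # Restore even if the body raised, so later tests see a clean state.
        os.environ.clear()
        os.environ.update(saved)


with preserved_environ() as env:
    env["SKETCH_ONLY_VAR"] = "1"
    inside = os.environ.get("SKETCH_ONLY_VAR")

after = os.environ.get("SKETCH_ONLY_VAR")
print(inside, after)
```

The `try`/`finally` is the important part: restoration has to happen on failure paths too, since failing tests are exactly the ones most likely to leave state behind.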

## 82.12 Temporary Directories and Files

Tests should avoid polluting the filesystem.

Typical pattern:

```python
import tempfile

with tempfile.TemporaryDirectory() as d:
    ...
```

CPython also provides helpers:

```python
from test.support import os_helper
```

These utilities handle platform-specific cleanup issues.
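Inside a test case, the same idea is often combined with `addCleanup` so that the directory is removed even when the test fails. The following is a minimal sketch using only `tempfile` and `unittest`; `ScratchFileTests` is a hypothetical test class.

```python
import io
import os
import tempfile
import unittest


class ScratchFileTests(unittest.TestCase):
    def setUp(self):
        # Register cleanup immediately so the directory is removed
        # whether the test passes, fails, or errors.
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.tmpdir = tmp.name

    def test_write_then_read(self):
        path = os.path.join(self.tmpdir, "data.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("hello")
        with open(path, encoding="utf-8") as f:
            self.assertEqual(f.read(), "hello")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ScratchFileTests)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.wasSuccessful())
```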

## 82.13 Skipped Tests

Some tests require optional features:

```text
network access
IPv6
large memory
GUI support
SSL
specific OS behavior
```

Tests can skip dynamically:

```python
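import socket
import unittest

class FeatureTests(unittest.TestCase):

    # socket.has_ipv6 reports whether this build supports IPv6.
    @unittest.skipUnless(socket.has_ipv6, "requires IPv6")
    def test_feature(self):
        ...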
```

or:

```python
self.skipTest("reason")
```

Skipping is common in CPython because supported platforms vary widely.

## 82.14 Resource-Intensive Tests

Some tests are disabled by default.

Examples:

```text
network tests
large file tests
CPU-intensive tests
memory-heavy tests
```

Enable them with:

```bash
./python -m test -u all
```

or specific resources:

```bash
./python -m test -u network
```

Resource categories include:

| Resource | Meaning |
|---|---|
| `network` | Internet access |
| `largefile` | Very large files |
| `audio` | Audio devices |
| `gui` | GUI interaction |
| `cpu` | Expensive CPU workloads |

This prevents accidental long-running executions.
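Resource gating can be pictured as a decorator that skips a test unless its resource was requested. The sketch below is an assumption-laden illustration, not the `test.support` mechanism: `ENABLED_RESOURCES` and `needs_resource` are hypothetical, and the set stands in for whatever was passed with `-u`.

```python
import io
import unittest

# Hypothetical stand-in for the resources selected on the command line.
ENABLED_RESOURCES = {"cpu"}


def needs_resource(name):
    """Skip the decorated test unless `name` is in ENABLED_RESOURCES."""
    return unittest.skipUnless(
        name in ENABLED_RESOURCES, f"resource {name!r} is not enabled"
    )


class ResourceTests(unittest.TestCase):
    @needs_resource("network")
    def test_needs_network(self):
        self.fail("should be skipped in this sketch")

    @needs_resource("cpu")
    def test_needs_cpu(self):
        self.assertEqual(sum(range(1000)), 499500)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ResourceTests)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(len(result.skipped), result.wasSuccessful())
```

A skipped test is reported separately from a failure, so a run without the resource still counts as successful.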

## 82.15 Reference Leak Testing

One of the most important CPython-specific features is reference leak detection.

Debug builds support:

```bash
./python -m test -R 3:3 test_dict
```

The argument `3:3` requests three warm-up runs followed by three measured runs:

```text
warmup runs
measured runs
compare reference counts
```

This detects leaked references caused by incorrect `Py_INCREF` or `Py_DECREF` usage.

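Example leak source:

```c
PyObject *x = PyLong_FromLong(1);   /* new reference owned by this code */
if (PyList_Append(list, x) < 0)
    return NULL;                    /* leak: x is never released on this path */
Py_DECREF(x);
```

The error path returns without a matching `Py_DECREF`. Here `list` stands for any existing list object.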

Reference leaks are critical because CPython relies heavily on deterministic reference counting.
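The warm-up-then-measure idea can be approximated in pure Python. This sketch is a stand-in, not what `-R` does internally: it assumes CPython and tracks one probe object with `sys.getrefcount`, while the real check compares interpreter-wide totals in a debug build. `leaky`, `clean`, and `refcount_deltas` are hypothetical names.

```python
import sys

_cache = []


def leaky(obj):
    """Keep a reference forever, imitating a missing decref."""
    _cache.append(obj)


def clean(obj):
    """Take a temporary reference and release it again."""
    tmp = [obj]
    del tmp


def refcount_deltas(func, warmups=3, runs=3):
    """Return the per-run change in the probe's refcount after warm-up."""
    probe = object()
    for _ in range(warmups):
        func(probe)          # let caches and lazy state settle first
    deltas = []
    for _ in range(runs):
        before = sys.getrefcount(probe)
        func(probe)
        deltas.append(sys.getrefcount(probe) - before)
    return deltas


leaky_deltas = refcount_deltas(leaky)
clean_deltas = refcount_deltas(clean)
print(leaky_deltas, clean_deltas)
```

A steady non-zero delta across the measured runs is the signature of a leak, whereas a correct function should settle at zero once warm-up is over.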

## 82.16 Debug Builds

Many tests are meaningful only under debug builds.

Configure:

```bash
./configure --with-pydebug
```

Debug builds enable:

```text
extra assertions
memory poisoning
reference tracking
debug allocators
interpreter consistency checks
```

Debug builds are slower but substantially more informative.

Typical debug-only checks include:

```text
negative refcounts
invalid GC state
object lifecycle corruption
API misuse
```

## 82.17 Memory Allocator Debugging

CPython has specialized allocator diagnostics.

Setting the environment variable:

```bash
PYTHONMALLOC=debug
```

can detect:

```text
buffer overflows
double frees
invalid memory access
allocator misuse
```

Combined with tests, these tools expose subtle runtime bugs.
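One way to apply the setting to a single run is to launch a child interpreter with the variable in its environment. This is a minimal sketch: `run_with_debug_allocator` is a hypothetical helper, and a one-line snippet stands in for a real test invocation.

```python
import os
import subprocess
import sys


def run_with_debug_allocator(code):
    """Run `code` in a child interpreter with PYTHONMALLOC=debug set."""
    # Copy the current environment and add the allocator setting on top.
    env = dict(os.environ, PYTHONMALLOC="debug")
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        timeout=60,
    )


proc = run_with_debug_allocator("print(sum(range(10)))")
print(proc.returncode, proc.stdout.strip())
```

Setting the variable only for the child keeps the parent process, and any other tests it runs, on the default allocator.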

## 82.18 Running Tests Under Sanitizers

Advanced debugging often uses compiler sanitizers.

Examples:

```bash
./configure --with-pydebug --with-address-sanitizer
```

or:

```bash
./configure --with-pydebug --with-undefined-behavior-sanitizer
```

Sanitizers help detect:

```text
heap corruption
use-after-free
integer overflow
undefined behavior
stack corruption
```

These tools are extremely valuable for C-level interpreter work.

## 82.19 Parallel Testing

Parallel execution:

```bash
./python -m test -j0
```

uses all CPU cores automatically.

Internally, `regrtest` spawns worker processes.

Benefits:

```text
faster CI
better CPU utilization
reduced wall-clock time
```

Challenges:

```text
race conditions
filesystem contention
port conflicts
test order assumptions
```

Flaky tests often appear first under parallel execution.

## 82.20 Flaky Tests

A flaky test passes sometimes and fails sometimes.

Common causes:

```text
timing assumptions
thread scheduling
signal races
network instability
clock precision
resource exhaustion
platform variance
```

CPython developers treat flaky tests seriously because they reduce CI reliability.

Strategies include:

```text
timeouts
retry loops
stronger synchronization
reduced timing assumptions
process isolation
```
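As an illustration of stronger synchronization, the sketch below waits on a `threading.Event` rather than sleeping for a fixed interval and hoping the worker has finished. `start_worker` and the ten-second timeout are illustrative choices, not CPython conventions.

```python
import threading


def start_worker(results, done):
    """Start a thread that records a result and then signals completion."""
    def work():
        results.append("finished")
        done.set()  # explicit signal, no timing assumption needed

    thread = threading.Thread(target=work)
    thread.start()
    return thread


results = []
done = threading.Event()
thread = start_worker(results, done)

# Wait for the signal with a generous upper bound. The wait returns as
# soon as the event is set, so a fast machine does not pay for the timeout.
finished = done.wait(timeout=10)
thread.join()
print(finished, results)
```

The timeout exists only to turn a hang into a visible failure. Correctness does not depend on how long the worker takes.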

## 82.21 Platform-Specific Behavior

CPython supports many platforms.

Tests often include conditional branches:

```python
import sys

if sys.platform == "win32":
    ...
```

or:

```python
import sys
import unittest

@unittest.skipIf(sys.platform == "win32", "POSIX only")
class PosixOnlyTests(unittest.TestCase):
    ...
```

Platform differences include:

```text
filesystem semantics
path handling
signals
process APIs
thread scheduling
encoding defaults
socket behavior
```

A test passing on Linux does not guarantee correctness on Windows or macOS.

## 82.22 Running Tests After Bytecode Changes

Compiler or interpreter modifications often invalidate `.pyc` files.

Common workflow:

```bash
make clean
make
```

or manually removing:

```text
__pycache__/
```

Incorrect bytecode caches can produce misleading failures.
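Removing the caches by hand can be scripted. The helper below is a hypothetical sketch that deletes every `__pycache__` directory under a given root. It is demonstrated on a temporary tree rather than a real checkout, and the `.pyc` filename is invented.

```python
import shutil
import tempfile
from pathlib import Path


def remove_pycache(root):
    """Delete every __pycache__ directory under root; return the count."""
    # Collect the targets first so deletion does not disturb the traversal.
    targets = [p for p in Path(root).rglob("__pycache__") if p.is_dir()]
    for path in targets:
        shutil.rmtree(path)
    return len(targets)


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp, "pkg", "__pycache__")
    cache.mkdir(parents=True)
    (cache / "mod.cpython-313.pyc").write_bytes(b"")
    removed = remove_pycache(tmp)
    still_there = cache.exists()

print(removed, still_there)
```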

## 82.23 Test Support Utilities

`Lib/test/support/` contains many helpers.

Examples:

| Utility | Purpose |
|---|---|
| `import_helper` | Import isolation |
| `threading_helper` | Thread coordination |
| `socket_helper` | Network helpers |
| `warnings_helper` | Warning capture |
| `os_helper` | Filesystem helpers |

These utilities reduce duplicated infrastructure across tests.

## 82.24 Continuous Integration

CPython uses extensive CI infrastructure.

Typical CI runs include:

```text
Linux
Windows
macOS
debug builds
release builds
sanitizer builds
free-threaded builds
multiple architectures
```

A patch may pass locally but fail in CI due to platform-specific behavior.

Core developers therefore rely heavily on automated infrastructure.

## 82.25 Common Contributor Workflow

Typical workflow:

```text
modify source
build interpreter
run focused tests
run broader related tests
run full suite if necessary
check reference leaks
push branch
wait for CI
investigate failures
```

Small targeted testing is essential for productivity.

Running the full suite after every edit is usually impractical.

## 82.26 Writing Good Tests

Good CPython tests are:

```text
deterministic
isolated
cross-platform
minimal
fast
clear
specific
```

Bad tests often:

```text
depend on timing
depend on execution order
leave global state modified
depend on external services
assume specific memory layout
```

A good regression test usually targets one bug precisely.

## 82.27 Example Minimal Regression Test

Suppose a dictionary regression existed.

A focused test might look like:

```python
import unittest

class DictRegressionTests(unittest.TestCase):

    def test_resize_preserves_entries(self):
        d = {}

        for i in range(1000):
            d[i] = i

        for i in range(1000):
            self.assertEqual(d[i], i)

if __name__ == "__main__":
    unittest.main()
```

This directly validates the invariant under investigation.

## 82.28 Reading Failures

A test failure may indicate:

```text
logic bug
memory corruption
reference leak
ABI mismatch
undefined behavior
test bug
platform assumption
```

CPython failures are sometimes nonlocal.

Example:

```text
GC corruption
    → unrelated test crashes later
```

The first visible failure is not always the root cause.

## 82.29 Core Principle

The CPython test suite is part of the interpreter itself.

It is not an optional accessory.

The runtime, compiler, object system, import machinery, and standard library evolve together with the tests. Every important subsystem in CPython is tightly coupled to regression validation.

Understanding the test infrastructure is therefore a prerequisite for serious CPython development.