This chapter covers test_* module conventions, unittest patterns, support helpers in Lib/test/support/, and writing C-level tests.
Core tests in CPython are regression tests for the interpreter, the standard library, and the C API. They are written to protect behavior that CPython promises to users, extension authors, and downstream distributors.
A good CPython test is small, deterministic, isolated, and directly connected to a specific behavior. It should fail before the fix and pass after the fix. It should avoid timing assumptions, external services, machine-specific behavior, and hidden dependency on test order.
87.1 Where Core Tests Live
Most tests live under:

```
Lib/test/
```

Common examples:

```
Lib/test/test_dict.py
Lib/test/test_list.py
Lib/test/test_gc.py
Lib/test/test_importlib/
Lib/test/test_asyncio/
Lib/test/test_capi/
```

Shared helpers live under:

```
Lib/test/support/
```

C-level test helpers often live in:

```
Modules/_test*.c
```

These native helper modules expose controlled C behavior to Python-level tests.
87.2 Test File Naming
Test modules usually follow this pattern:

```
test_<feature>.py
```

Examples:

```
test_unicode.py
test_compile.py
test_descr.py
test_frame.py
test_sys.py
```

The file name should make the tested subsystem obvious.
A test for dictionary behavior belongs in test_dict.py. A test for garbage collector behavior belongs in test_gc.py. A test for interpreter frame behavior may belong in test_frame.py, test_sys.py, or a more specific existing file, depending on the behavior.
87.3 Basic Test Structure
Most CPython tests use unittest.
```python
import unittest

class DictTests(unittest.TestCase):
    def test_lookup_existing_key(self):
        d = {"x": 1}
        self.assertEqual(d["x"], 1)

if __name__ == "__main__":
    unittest.main()
```

When run through regrtest, the test runner discovers and executes the test module:

```
./python -m test test_dict
```

For direct execution:

```
./python Lib/test/test_dict.py
```

87.4 Regression Tests
A regression test protects against a bug returning.
A good regression test has this shape:
```python
def test_specific_bug(self):
    # Minimal setup that used to fail.
    ...
    # Direct assertion for the corrected behavior.
    self.assertEqual(actual, expected)
```

Avoid broad tests that validate many unrelated things at once.

Weak regression test:

```python
def test_many_things(self):
    ...
```

Better:

```python
def test_dict_preserves_value_after_resize(self):
    ...
```

The test name should describe the behavior, not just the bug number.
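A concrete version of that shape might look like this. The 32-insertion loop is an illustrative way to force internal resizes, not a documented threshold:

```python
import unittest

class DictRegressionTests(unittest.TestCase):
    def test_dict_preserves_value_after_resize(self):
        d = {"key": "value"}
        # Insert enough items to force at least one internal resize.
        for i in range(32):
            d[i] = i
        # The original value must survive the resize.
        self.assertEqual(d["key"], "value")
```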
87.5 Testing Exceptions
Use assertRaises for expected exceptions.
```python
with self.assertRaises(TypeError):
    len(10)
```

To check the message:

```python
with self.assertRaisesRegex(TypeError, "object of type"):
    len(10)
```

Exception messages are part of user-visible behavior in many cases, but they are more fragile than exception types. Test the message only when the wording matters.
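When a test needs more than a pattern match, it can capture the raised exception object and inspect it directly. A minimal sketch (the class name is illustrative):

```python
import unittest

class MessageTests(unittest.TestCase):
    def test_exception_details(self):
        with self.assertRaises(TypeError) as cm:
            len(10)
        # cm.exception is the raised exception object.
        self.assertIn("object of type", str(cm.exception))
```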
87.6 Testing Warnings
Use warning helpers rather than relying on global warning state.
```python
import warnings
import unittest

class WarningTests(unittest.TestCase):
    def test_deprecated_path_warns(self):
        with self.assertWarns(DeprecationWarning):
            warnings.warn("old", DeprecationWarning)
```

CPython also has helpers under:

```python
from test.support import warnings_helper
```

These helpers are useful when tests need precise control over warning filters.
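Outside the CPython tree, where test.support may be unavailable, the standard library's warnings.catch_warnings offers similar control. A minimal sketch:

```python
import warnings

# Record warnings locally instead of letting them touch global filter state.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("old api", DeprecationWarning)
```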
87.7 Test Isolation
Tests must clean up after themselves.
Bad tests leave behind:

- modified sys.path
- modified sys.modules
- a changed current directory
- environment variables
- open file descriptors
- running threads
- temporary files
- registered signal handlers
- changed warning filters

Use helpers and context managers.
```python
import os
import unittest
from test.support import os_helper

class EnvTests(unittest.TestCase):
    def test_environment_change(self):
        with os_helper.EnvironmentVarGuard() as env:
            env["CPYTHON_TEST_VALUE"] = "1"
            self.assertEqual(os.environ["CPYTHON_TEST_VALUE"], "1")
```

After the block, the environment is restored.
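For state without a dedicated guard, unittest's addCleanup registers an undo action that runs even if the test body fails. A sketch, using a hypothetical placeholder path:

```python
import sys
import unittest

class PathTests(unittest.TestCase):
    def test_sys_path_entry_added(self):
        saved = list(sys.path)
        # The cleanup runs after the test, even if an assertion below fails.
        self.addCleanup(setattr, sys, "path", saved)
        sys.path.insert(0, "/hypothetical/extra/dir")  # placeholder, never created
        self.assertEqual(sys.path[0], "/hypothetical/extra/dir")
```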
87.8 Temporary Files and Directories
Use temporary directories for filesystem tests.
```python
import pathlib
import tempfile
import unittest

class FileTests(unittest.TestCase):
    def test_write_file(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = pathlib.Path(tmp) / "data.txt"
            path.write_text("hello", encoding="utf-8")
            self.assertEqual(path.read_text(encoding="utf-8"), "hello")
```

Do not write into the repository root or the current working directory unless the test runner explicitly provides a managed location.
87.9 Platform-Specific Tests
CPython supports many platforms. Tests must account for differences.
```python
import sys
import unittest

@unittest.skipUnless(sys.platform == "win32", "Windows only")
class WindowsTests(unittest.TestCase):
    def test_windows_behavior(self):
        ...
```

For POSIX-only behavior:

```python
@unittest.skipIf(sys.platform == "win32", "POSIX only")
def test_posix_behavior(self):
    ...
```

Prefer feature checks over platform checks when possible.

Better:

```python
import os
import unittest

@unittest.skipUnless(hasattr(os, "fork"), "requires fork")
def test_fork_behavior(self):
    ...
```

Feature checks are more accurate across unusual platforms.
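The same idea extends to optional modules: check whether the module exists rather than guessing from the platform. A sketch, with zlib as an arbitrary illustrative module:

```python
import importlib.util
import unittest

# Feature check: is the optional module actually available in this build?
HAS_ZLIB = importlib.util.find_spec("zlib") is not None

@unittest.skipUnless(HAS_ZLIB, "requires zlib")
class ZlibTests(unittest.TestCase):
    def test_roundtrip(self):
        import zlib
        data = b"hello"
        self.assertEqual(zlib.decompress(zlib.compress(data)), data)
```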
87.10 Resource-Gated Tests
Some tests require external or expensive resources.
Examples:

- network
- large memory
- large files
- audio
- GUI
- CPU-intensive work

Tests using these resources should be gated so normal test runs stay fast and reliable.

Conceptual pattern:

```python
from test.support import requires

requires("network")
```

Then the test only runs when that resource is enabled with -u:

```
./python -m test -u network test_socket
```

87.11 Avoiding Timing Bugs
Timing assumptions cause flaky tests.
Bad:

```python
import time

time.sleep(0.1)
self.assertTrue(thread_finished)
```

Better:

```python
thread.join(timeout=5)
self.assertFalse(thread.is_alive())
```

Even better, use synchronization primitives:

```python
import threading

ready = threading.Event()

def worker():
    ready.set()

t = threading.Thread(target=worker)
t.start()
self.assertTrue(ready.wait(timeout=5))
t.join()
```

Tests should wait for a condition, not for an arbitrary amount of time.
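When no natural synchronization primitive exists, a small polling helper can replace a blind sleep. A sketch (wait_for is a hypothetical helper, not a support API):

```python
import threading
import time

def wait_for(predicate, timeout=5.0, interval=0.01):
    """Poll until predicate() is true or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check so a slow scheduler does not cause a false failure.
    return predicate()

results = []
t = threading.Thread(target=lambda: results.append(1))
t.start()
finished = wait_for(lambda: len(results) == 1)
t.join()
```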
87.12 Testing Threads
Thread tests must join all threads they create.
```python
import threading
import unittest

class ThreadTests(unittest.TestCase):
    def test_worker_runs(self):
        seen = []

        def worker():
            seen.append(1)

        t = threading.Thread(target=worker)
        t.start()
        t.join(timeout=5)
        self.assertFalse(t.is_alive())
        self.assertEqual(seen, [1])
```

A test that leaves a thread running can poison later tests.
87.13 Testing Subprocess Behavior
Use subprocesses when testing interpreter startup, process-global state, fatal errors, command-line flags, or environment behavior.
Example shape:
```python
import subprocess
import sys
import unittest

class SubprocessTests(unittest.TestCase):
    def test_command_line_execution(self):
        proc = subprocess.run(
            [sys.executable, "-c", "print('ok')"],
            text=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
        )
        self.assertEqual(proc.stdout, "ok\n")
```

In CPython tests, prefer support helpers where available because they handle build-tree details and platform differences.
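Failure paths deserve the same treatment: a child process can exit with a known code and message, and the test can assert on both. A minimal sketch:

```python
import subprocess
import sys

# Run a child that fails in a controlled, observable way.
proc = subprocess.run(
    [sys.executable, "-c", "import sys; print('boom', file=sys.stderr); sys.exit(3)"],
    capture_output=True,
    text=True,
)
```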
87.14 Testing C API Behavior
Some C API behavior cannot be tested from pure Python.
For that, CPython uses test extension modules such as:
- _testcapi
- _testinternalcapi
- _testlimitedcapi

Python-level tests call functions exposed by those modules.
Example shape:
```python
import _testcapi
import unittest

class CAPITests(unittest.TestCase):
    def test_some_c_api_behavior(self):
        self.assertEqual(_testcapi.some_helper(), expected)
```

The C helper should be minimal. It should expose the specific runtime behavior needed by the test, not a broad unrelated API.
87.15 Testing Reference Leaks
Reference leak tests are run through regrtest.
```
./python -m test -R 3:3 test_name
```

A test suitable for leak detection should be deterministic. It should not keep intentional global state across runs unless that state is warmed up before measurement.
Common causes of apparent leaks:
- global caches initialized during first measured run
- interned strings
- module-level singletons
- warnings registries
- lazy imports
- thread-local state

If a test initializes a cache intentionally, structure it so warmup runs absorb the one-time allocation.
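For quick local checks, sys.getrefcount can show whether an operation is reference-neutral. This is a CPython-specific sketch; the absolute counts are implementation details (and differ on free-threaded builds), so only the delta matters:

```python
import sys

obj = object()
before = sys.getrefcount(obj)

container = []
container.append(obj)  # the list now holds one extra reference
container.pop()        # the extra reference is released

after = sys.getrefcount(obj)
```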
87.16 Testing Garbage Collection
GC tests often need explicit collection.
```python
import gc
import weakref
import unittest

class Node:
    pass

class GCTests(unittest.TestCase):
    def test_cycle_collected(self):
        obj = Node()
        obj.self = obj  # create a reference cycle
        ref = weakref.ref(obj)
        del obj
        gc.collect()
        self.assertIsNone(ref())
```

Built-in lists cannot be weak-referenced, which is why the example uses a small user-defined class. This tests that the cycle becomes unreachable and is collected.
87.17 Testing Imports
Import tests must handle sys.modules carefully.
Bad:
```python
import mymodule
del sys.modules["mymodule"]
```

Better: use the import helper utilities where possible.

Conceptual pattern:

```python
import sys
import unittest
from test.support import import_helper

class ImportTests(unittest.TestCase):
    def test_fresh_import(self):
        with import_helper.CleanImport("json"):
            import json
            self.assertEqual(json.__name__, "json")
```

Import tests are easy to make order-dependent. Keep cleanup explicit.
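Outside the CPython tree, where test.support may be unavailable, the same discipline can be sketched with try/finally (fresh_import is a hypothetical helper, not a stdlib API):

```python
import importlib
import sys

def fresh_import(name):
    """Import name freshly, then restore the original sys.modules entry."""
    saved = sys.modules.pop(name, None)
    try:
        return importlib.import_module(name)
    finally:
        if saved is not None:
            sys.modules[name] = saved
        else:
            sys.modules.pop(name, None)

fresh = fresh_import("json")
```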
87.18 Testing Bytecode and Compiler Behavior
Compiler tests usually validate observable behavior, not exact bytecode, unless the bytecode itself is the target.
Good:
```python
def test_assignment_expression_scope(self):
    ns = {}
    exec("(x := 1)", ns)
    self.assertEqual(ns["x"], 1)
```

Bytecode-specific tests may use dis, but they are more version-sensitive:

```python
import dis

instructions = list(dis.get_instructions(func))
```

Use bytecode assertions only when testing bytecode generation, optimization, tracing, or interpreter dispatch.
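A version-tolerant compromise is to assert on opcode names that have been stable, rather than exact instruction sequences. A sketch (STORE_FAST has long been the opcode for local-variable stores, but this is still version-sensitive):

```python
import dis

def f():
    x = 1
    return x

# Collect the opcode names without fixing their order or arguments.
opnames = {ins.opname for ins in dis.get_instructions(f)}
```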
87.19 Testing Error Paths
Error paths need direct tests.
Many C bugs live in failure cleanup. Force failure when possible.
Examples:

- invalid argument type
- overflow input
- closed file
- bad encoding
- missing attribute
- recursive call limit
- allocation failure helper

A good error-path test checks both exception type and post-error state.

```python
with self.assertRaises(TypeError):
    target(unknown_argument=1)
self.assertEqual(state, expected_state)
```

87.20 Avoiding Over-Specified Tests
Tests should protect behavior, not accidental implementation details.
Bad:
```python
self.assertEqual(repr(d), "{'a': 1, 'b': 2}")
```

This is valid only if order and representation are part of the intended behavior.

Better:

```python
self.assertEqual(d, {"a": 1, "b": 2})
```

Over-specified tests make future implementation changes harder without improving correctness.
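When order genuinely does not matter, assertCountEqual compares contents without freezing an ordering. A sketch:

```python
import unittest

class OrderInsensitiveTests(unittest.TestCase):
    def test_keys_present(self):
        d = {"a": 1, "b": 2}
        # Same elements, any order; repr and ordering stay unconstrained.
        self.assertCountEqual(list(d), ["b", "a"])
```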
87.21 Good Test Names
Test names should say what behavior is protected.
Weak:
```python
def test_bug(self):
    ...
```

Better:

```python
def test_subclass_dict_lookup_uses_override(self):
    ...
```

Weak:

```python
def test_case_1(self):
    ...
```

Better:

```python
def test_gc_untracks_object_before_dealloc(self):
    ...
```

A failing test name should point the maintainer toward the broken invariant.
87.22 Keeping Tests Small
One test should usually check one behavior.
Bad:
```python
def test_everything_about_lists(self):
    ...
```

Better:

```python
def test_append_increases_length(self):
    ...

def test_pop_returns_last_item(self):
    ...

def test_pop_empty_raises_index_error(self):
    ...
```

Small tests make failures easier to diagnose.
87.23 When to Add a New Test File
Add a new test file when:

- the subsystem has no existing test module
- the feature is large enough to deserve its own file
- the existing file would become too broad
- the test needs special setup shared by many cases

Otherwise, prefer adding to an existing targeted file.
A scattered test suite is harder to maintain. A giant unrelated file is also harder to maintain. Choose the narrowest natural home.
87.24 Running the New Test
Run the specific test:
```
./python -m test -v test_name
```

Run related tests:

```
./python -m test -v test_module_a test_module_b
```

Run with fail-fast:

```
./python -m test -v -x test_name
```

Run leak checks when touching C code or object lifetime:

```
./python -m test -R 3:3 test_name
```

Run under a debug build before considering the test reliable.
87.25 Common Mistakes
| Mistake | Better approach |
|---|---|
| Sleeping for timing | Wait for explicit condition |
| Leaving files behind | Use temporary directories |
| Modifying sys.path manually | Use support helpers and cleanup |
| Depending on test order | Make setup explicit |
| Testing implementation details | Test observable behavior |
| Ignoring platform differences | Use feature checks and skips |
| Creating global state | Clean it up or isolate it |
| Large broad regression test | Minimal focused regression test |
87.26 Core Principle
A CPython test should encode one durable invariant.
It should describe the expected behavior precisely enough to catch regressions, but not so tightly that it freezes irrelevant implementation details. Good tests make the runtime safer to change.