This chapter covers test_* module conventions, unittest patterns, support helpers in Lib/test/support/, and writing C-level tests.
Core tests in CPython are regression tests for the interpreter, the standard library, and the C API. They are written to protect behavior that CPython promises to users, extension authors, and downstream distributors.
A good CPython test is small, deterministic, isolated, and directly connected to a specific behavior. It should fail before the fix and pass after the fix. It should avoid timing assumptions, external services, machine-specific behavior, and hidden dependency on test order.
87.1 Where Core Tests Live
Most tests live under:

```
Lib/test/
```

Common examples:

```
Lib/test/test_dict.py
Lib/test/test_list.py
Lib/test/test_gc.py
Lib/test/test_importlib/
Lib/test/test_asyncio/
Lib/test/test_capi/
```

Shared helpers live under:

```
Lib/test/support/
```

C-level test helpers often live in:

```
Modules/_test*.c
```

These native helper modules expose controlled C behavior to Python-level tests.
87.2 Test File Naming
Test modules usually follow this pattern:

```
test_<feature>.py
```

Examples:

```
test_unicode.py
test_compile.py
test_descr.py
test_frame.py
test_sys.py
```

The file name should make the tested subsystem obvious.
A test for dictionary behavior belongs in test_dict.py. A test for garbage collector behavior belongs in test_gc.py. A test for interpreter frame behavior may belong in test_frame.py, test_sys.py, or a more specific existing file, depending on the behavior.
87.3 Basic Test Structure
Most CPython tests use unittest.
```python
import unittest

class DictTests(unittest.TestCase):
    def test_lookup_existing_key(self):
        d = {"x": 1}
        self.assertEqual(d["x"], 1)

if __name__ == "__main__":
    unittest.main()
```

When run through regrtest, the test runner discovers and executes the test module:

```
./python -m test test_dict
```

For direct execution:

```
./python Lib/test/test_dict.py
```

87.4 Regression Tests
A regression test protects against a bug returning.
A good regression test has this shape:
```python
def test_specific_bug(self):
    # Minimal setup that used to fail.
    ...
    # Direct assertion for the corrected behavior.
    self.assertEqual(actual, expected)
```

Avoid broad tests that validate many unrelated things at once.

Weak regression test:

```python
def test_many_things(self):
    ...
```

Better:

```python
def test_dict_preserves_value_after_resize(self):
    ...
```

The test name should describe the behavior, not just the bug number.
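A concrete version of that shape might look like this. The 32-insertion loop is an illustrative way to force internal resizes, not a documented threshold:

```python
import unittest

class DictRegressionTests(unittest.TestCase):
    def test_dict_preserves_value_after_resize(self):
        d = {"key": "value"}
        # Insert enough items to force at least one internal resize.
        for i in range(32):
            d[i] = i
        # The original value must survive the resize.
        self.assertEqual(d["key"], "value")
```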
87.5 Testing Exceptions
Use assertRaises for expected exceptions.
```python
with self.assertRaises(TypeError):
    len(10)
```

To check the message:

```python
with self.assertRaisesRegex(TypeError, "object of type"):
    len(10)
```

Exception messages are part of user-visible behavior in many cases, but they are more fragile than exception types. Test the message only when the wording matters.
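When a test needs more than a pattern match, it can capture the raised exception object and inspect it directly. A minimal sketch (the class name is illustrative):

```python
import unittest

class MessageTests(unittest.TestCase):
    def test_exception_details(self):
        with self.assertRaises(TypeError) as cm:
            len(10)
        # cm.exception is the raised exception object.
        self.assertIn("object of type", str(cm.exception))
```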
87.6 Testing Warnings
Use warning helpers rather than relying on global warning state.
```python
import warnings
import unittest

class WarningTests(unittest.TestCase):
    def test_deprecated_path_warns(self):
        with self.assertWarns(DeprecationWarning):
            warnings.warn("old", DeprecationWarning)
```

CPython also has helpers under:

```python
from test.support import warnings_helper
```

These helpers are useful when tests need precise control over warning filters.
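Outside the CPython tree, where test.support may be unavailable, the standard library's warnings.catch_warnings offers similar control. A minimal sketch:

```python
import warnings

# Record warnings locally instead of letting them touch global filter state.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("old api", DeprecationWarning)
```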
87.7 Test Isolation
Tests must clean up after themselves.
Bad tests leave behind:

- modified sys.path
- modified sys.modules
- a changed current directory
- environment variables
- open file descriptors
- running threads
- temporary files
- registered signal handlers
- changed warning filters

Use helpers and context managers.
```python
import os
import unittest
from test.support import os_helper

class EnvTests(unittest.TestCase):
    def test_environment_change(self):
        with os_helper.EnvironmentVarGuard() as env:
            env["CPYTHON_TEST_VALUE"] = "1"
            self.assertEqual(os.environ["CPYTHON_TEST_VALUE"], "1")
```

After the block, the environment is restored.
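For state without a dedicated guard, unittest's addCleanup registers an undo action that runs even if the test body fails. A sketch, using a hypothetical placeholder path:

```python
import sys
import unittest

class PathTests(unittest.TestCase):
    def test_sys_path_entry_added(self):
        saved = list(sys.path)
        # The cleanup runs after the test, even if an assertion below fails.
        self.addCleanup(setattr, sys, "path", saved)
        sys.path.insert(0, "/hypothetical/extra/dir")  # placeholder, never created
        self.assertEqual(sys.path[0], "/hypothetical/extra/dir")
```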
87.8 Temporary Files and Directories
Use temporary directories for filesystem tests.
```python
import pathlib
import tempfile
import unittest

class FileTests(unittest.TestCase):
    def test_write_file(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = pathlib.Path(tmp) / "data.txt"
            path.write_text("hello", encoding="utf-8")
            self.assertEqual(path.read_text(encoding="utf-8"), "hello")
```

Do not write into the repository root or the current working directory unless the test runner explicitly provides a managed location.
87.9 Platform-Specific Tests
CPython supports many platforms. Tests must account for differences.
```python
import sys
import unittest

@unittest.skipUnless(sys.platform == "win32", "Windows only")
class WindowsTests(unittest.TestCase):
    def test_windows_behavior(self):
        ...
```

For POSIX-only behavior:

```python
@unittest.skipIf(sys.platform == "win32", "POSIX only")
def test_posix_behavior(self):
    ...
```

Prefer feature checks over platform checks when possible.

Better:

```python
import os
import unittest

@unittest.skipUnless(hasattr(os, "fork"), "requires fork")
def test_fork_behavior(self):
    ...
```

Feature checks are more accurate across unusual platforms.
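The same idea extends to optional modules: check whether the module exists rather than guessing from the platform. A sketch, with zlib as an arbitrary illustrative module:

```python
import importlib.util
import unittest

# Feature check: is the optional module actually available in this build?
HAS_ZLIB = importlib.util.find_spec("zlib") is not None

@unittest.skipUnless(HAS_ZLIB, "requires zlib")
class ZlibTests(unittest.TestCase):
    def test_roundtrip(self):
        import zlib
        data = b"hello"
        self.assertEqual(zlib.decompress(zlib.compress(data)), data)
```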
87.10 Resource-Gated Tests
Some tests require external or expensive resources.
Examples:

- network
- large memory
- large files
- audio
- GUI
- CPU-intensive work

Tests using these resources should be gated so normal test runs stay fast and reliable.

Conceptual pattern:

```python
from test.support import requires

requires("network")
```

Then the test only runs when that resource is enabled with -u:

```
./python -m test -u network test_socket
```

87.11 Avoiding Timing Bugs
Timing assumptions cause flaky tests.
Bad:

```python
import time

time.sleep(0.1)
self.assertTrue(thread_finished)
```

Better:

```python
thread.join(timeout=5)
self.assertFalse(thread.is_alive())
```

Even better, use synchronization primitives:

```python
import threading

ready = threading.Event()

def worker():
    ready.set()

t = threading.Thread(target=worker)
t.start()
self.assertTrue(ready.wait(timeout=5))
t.join()
```

Tests should wait for a condition, not for an arbitrary amount of time.
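When no natural synchronization primitive exists, a small polling helper can replace a blind sleep. A sketch (wait_for is a hypothetical helper, not a support API):

```python
import threading
import time

def wait_for(predicate, timeout=5.0, interval=0.01):
    """Poll until predicate() is true or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check so a slow scheduler does not cause a false failure.
    return predicate()

results = []
t = threading.Thread(target=lambda: results.append(1))
t.start()
finished = wait_for(lambda: len(results) == 1)
t.join()
```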
87.12 Testing Threads
Thread tests must join all threads they create.
```python
import threading
import unittest

class ThreadTests(unittest.TestCase):
    def test_worker_runs(self):
        seen = []

        def worker():
            seen.append(1)

        t = threading.Thread(target=worker)
        t.start()
        t.join(timeout=5)
        self.assertFalse(t.is_alive())
        self.assertEqual(seen, [1])
```

A test that leaves a thread running can poison later tests.
87.13 Testing Subprocess Behavior
Use subprocesses when testing interpreter startup, process-global state, fatal errors, command-line flags, or environment behavior.
Example shape:
```python
import subprocess
import sys
import unittest

class SubprocessTests(unittest.TestCase):
    def test_command_line_execution(self):
        proc = subprocess.run(
            [sys.executable, "-c", "print('ok')"],
            text=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
        )
        self.assertEqual(proc.stdout, "ok\n")
```

In CPython tests, prefer support helpers where available because they handle build-tree details and platform differences.
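Failure paths deserve the same treatment: a child process can exit with a known code and message, and the test can assert on both. A minimal sketch:

```python
import subprocess
import sys

# Run a child that fails in a controlled, observable way.
proc = subprocess.run(
    [sys.executable, "-c", "import sys; print('boom', file=sys.stderr); sys.exit(3)"],
    capture_output=True,
    text=True,
)
```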
87.14 Testing C API Behavior
Some C API behavior cannot be tested from pure Python.
For that, CPython uses test extension modules such as:
- _testcapi
- _testinternalcapi
- _testlimitedcapi

Python-level tests call functions exposed by those modules.
Example shape:
```python
import _testcapi
import unittest

class CAPITests(unittest.TestCase):
    def test_some_c_api_behavior(self):
        self.assertEqual(_testcapi.some_helper(), expected)
```

The C helper should be minimal. It should expose the specific runtime behavior needed by the test, not a broad unrelated API.
87.15 Testing Reference Leaks
Reference leak tests are run through regrtest.
```
./python -m test -R 3:3 test_name
```

A test suitable for leak detection should be deterministic. It should not keep intentional global state across runs unless that state is warmed up before measurement.
Common causes of apparent leaks:
- global caches initialized during first measured run
- interned strings
- module-level singletons
- warnings registries
- lazy imports
- thread-local state

If a test initializes a cache intentionally, structure it so warmup runs absorb the one-time allocation.
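For quick local checks, sys.getrefcount can show whether an operation is reference-neutral. This is a CPython-specific sketch; the absolute counts are implementation details (and differ on free-threaded builds), so only the delta matters:

```python
import sys

obj = object()
before = sys.getrefcount(obj)

container = []
container.append(obj)  # the list now holds one extra reference
container.pop()        # the extra reference is released

after = sys.getrefcount(obj)
```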
87.16 Testing Garbage Collection
GC tests often need explicit collection.
```python
import gc
import weakref
import unittest

class Node:
    pass

class GCTests(unittest.TestCase):
    def test_cycle_collected(self):
        obj = Node()
        obj.self = obj  # create a reference cycle
        ref = weakref.ref(obj)
        del obj
        gc.collect()
        self.assertIsNone(ref())
```

Built-in lists cannot be weak-referenced, which is why the example uses a small user-defined class. This tests that the cycle becomes unreachable and is collected.
87.17 Testing Imports
Import tests must handle sys.modules carefully.
Bad:
```python
import mymodule
del sys.modules["mymodule"]
```

Better: use the import helper utilities where possible.

Conceptual pattern:

```python
import sys
import unittest
from test.support import import_helper

class ImportTests(unittest.TestCase):
    def test_fresh_import(self):
        with import_helper.CleanImport("json"):
            import json
            self.assertEqual(json.__name__, "json")
```

Import tests are easy to make order-dependent. Keep cleanup explicit.
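Outside the CPython tree, where test.support may be unavailable, the same discipline can be sketched with try/finally (fresh_import is a hypothetical helper, not a stdlib API):

```python
import importlib
import sys

def fresh_import(name):
    """Import name freshly, then restore the original sys.modules entry."""
    saved = sys.modules.pop(name, None)
    try:
        return importlib.import_module(name)
    finally:
        if saved is not None:
            sys.modules[name] = saved
        else:
            sys.modules.pop(name, None)

fresh = fresh_import("json")
```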
87.18 Testing Bytecode and Compiler Behavior
Compiler tests usually validate observable behavior, not exact bytecode, unless the bytecode itself is the target.
Good:
```python
def test_assignment_expression_scope(self):
    ns = {}
    exec("(x := 1)", ns)
    self.assertEqual(ns["x"], 1)
```

Bytecode-specific tests may use dis, but they are more version-sensitive:

```python
import dis

instructions = list(dis.get_instructions(func))
```

Use bytecode assertions only when testing bytecode generation, optimization, tracing, or interpreter dispatch.
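A version-tolerant compromise is to assert on opcode names that have been stable, rather than exact instruction sequences. A sketch (STORE_FAST has long been the opcode for local-variable stores, but this is still version-sensitive):

```python
import dis

def f():
    x = 1
    return x

# Collect the opcode names without fixing their order or arguments.
opnames = {ins.opname for ins in dis.get_instructions(f)}
```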
87.19 Testing Error Paths
Error paths need direct tests.
Many C bugs live in failure cleanup. Force failure when possible.
Examples:

- invalid argument type
- overflow input
- closed file
- bad encoding
- missing attribute
- recursive call limit
- allocation failure helper

A good error-path test checks both exception type and post-error state.

```python
with self.assertRaises(TypeError):
    target(unknown_argument=1)
self.assertEqual(state, expected_state)
```

87.20 Avoiding Over-Specified Tests
Tests should protect behavior, not accidental implementation details.
Bad:
```python
self.assertEqual(repr(d), "{'a': 1, 'b': 2}")
```

This is valid only if order and representation are part of the intended behavior.

Better:

```python
self.assertEqual(d, {"a": 1, "b": 2})
```

Over-specified tests make future implementation changes harder without improving correctness.
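When order genuinely does not matter, assertCountEqual compares contents without freezing an ordering. A sketch:

```python
import unittest

class OrderInsensitiveTests(unittest.TestCase):
    def test_keys_present(self):
        d = {"a": 1, "b": 2}
        # Same elements, any order; repr and ordering stay unconstrained.
        self.assertCountEqual(list(d), ["b", "a"])
```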
87.21 Good Test Names
Test names should say what behavior is protected.
Weak:
```python
def test_bug(self):
    ...
```

Better:

```python
def test_subclass_dict_lookup_uses_override(self):
    ...
```

Weak:

```python
def test_case_1(self):
    ...
```

Better:

```python
def test_gc_untracks_object_before_dealloc(self):
    ...
```

A failing test name should point the maintainer toward the broken invariant.
87.22 Keeping Tests Small
One test should usually check one behavior.
Bad:
```python
def test_everything_about_lists(self):
    ...
```

Better:

```python
def test_append_increases_length(self):
    ...

def test_pop_returns_last_item(self):
    ...

def test_pop_empty_raises_index_error(self):
    ...
```

Small tests make failures easier to diagnose.
87.23 When to Add a New Test File
Add a new test file when:

- the subsystem has no existing test module
- the feature is large enough to deserve its own file
- the existing file would become too broad
- the test needs special setup shared by many cases

Otherwise, prefer adding to an existing targeted file.
A scattered test suite is harder to maintain. A giant unrelated file is also harder to maintain. Choose the narrowest natural home.
87.24 Running the New Test
Run the specific test:
```
./python -m test -v test_name
```

Run related tests:

```
./python -m test -v test_module_a test_module_b
```

Run with fail-fast:

```
./python -m test -v -x test_name
```

Run leak checks when touching C code or object lifetime:

```
./python -m test -R 3:3 test_name
```

Run under a debug build before considering the test reliable.
87.25 Common Mistakes
| Mistake | Better approach |
|---|---|
| Sleeping for timing | Wait for explicit condition |
| Leaving files behind | Use temporary directories |
| Modifying sys.path manually | Use support helpers and cleanup |
| Depending on test order | Make setup explicit |
| Testing implementation details | Test observable behavior |
| Ignoring platform differences | Use feature checks and skips |
| Creating global state | Clean it up or isolate it |
| Large broad regression test | Minimal focused regression test |
87.26 Core Principle
A CPython test should encode one durable invariant.
It should describe the expected behavior precisely enough to catch regressions, but not so tightly that it freezes irrelevant implementation details. Good tests make the runtime safer to change.