# 87. Writing Core Tests

Core tests in CPython are regression tests for the interpreter, the standard library, and the C API. They are written to protect behavior that CPython promises to users, extension authors, and downstream distributors.

A good CPython test is small, deterministic, isolated, and directly connected to a specific behavior. It should fail before the fix and pass after the fix. It should avoid timing assumptions, external services, machine-specific behavior, and hidden dependencies on test order.

## 87.1 Where Core Tests Live

Most tests live under:

```text
Lib/test/
```

Common examples:

```text
Lib/test/test_dict.py
Lib/test/test_list.py
Lib/test/test_gc.py
Lib/test/test_importlib/
Lib/test/test_asyncio/
Lib/test/test_capi/
```

Shared helpers live under:

```text
Lib/test/support/
```

C-level test helpers often live in:

```text
Modules/_test*.c
```

These native helper modules expose controlled C behavior to Python-level tests.

## 87.2 Test File Naming

Test modules usually follow this pattern:

```text
test_<feature>.py
```

Examples:

```text
test_unicode.py
test_compile.py
test_descr.py
test_frame.py
test_sys.py
```

The file name should make the tested subsystem obvious.

A test for dictionary behavior belongs in `test_dict.py`. A test for garbage collector behavior belongs in `test_gc.py`. A test for interpreter frame behavior may belong in `test_frame.py`, `test_sys.py`, or a more specific existing file, depending on the behavior.

## 87.3 Basic Test Structure

Most CPython tests use `unittest`.

```python
import unittest

class DictTests(unittest.TestCase):

    def test_lookup_existing_key(self):
        d = {"x": 1}
        self.assertEqual(d["x"], 1)

if __name__ == "__main__":
    unittest.main()
```

When run through `regrtest`, the test runner discovers and executes the test module.

```bash
./python -m test test_dict
```

For direct execution:

```bash
./python Lib/test/test_dict.py
```

## 87.4 Regression Tests

A regression test protects against a bug returning.

A good regression test has this shape:

```python
def test_specific_bug(self):
    # Minimal setup that used to fail.
    ...
    # Direct assertion for the corrected behavior.
    self.assertEqual(actual, expected)
```

Avoid broad tests that validate many unrelated things at once.

Weak regression test:

```python
def test_many_things(self):
    ...
```

Better:

```python
def test_dict_preserves_value_after_resize(self):
    ...
```

The test name should describe the behavior, not just the bug number.
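As a concrete sketch, here is the shape such a test can take. The resize scenario and class name are hypothetical illustrations, not taken from an actual CPython bug:

```python
import unittest

class DictRegressionTests(unittest.TestCase):

    def test_dict_preserves_value_after_resize(self):
        # Hypothetical regression scenario: inserting many keys forces
        # the dict to resize; the original entry must survive intact.
        d = {"key": "value"}
        for i in range(1000):
            d[i] = i
        self.assertEqual(d["key"], "value")
```

The name alone tells a maintainer which invariant broke, before reading the body.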

## 87.5 Testing Exceptions

Use `assertRaises` for expected exceptions.

```python
with self.assertRaises(TypeError):
    len(10)
```

To check the message:

```python
with self.assertRaisesRegex(TypeError, "object of type"):
    len(10)
```

Exception messages are part of user-visible behavior in many cases, but they can be more fragile than exception types. Test the message only when the wording matters.
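When a test needs more than a pattern match, the context-manager form of `assertRaises` exposes the caught exception object for further checks:

```python
import unittest

class ExceptionDetailTests(unittest.TestCase):

    def test_len_rejects_int(self):
        with self.assertRaises(TypeError) as cm:
            len(10)
        # cm.exception is the caught exception instance.
        self.assertIn("len()", str(cm.exception))
```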

## 87.6 Testing Warnings

Use warning helpers rather than relying on global warning state.

```python
import warnings
import unittest

class WarningTests(unittest.TestCase):

    def test_deprecated_path_warns(self):
        with self.assertWarns(DeprecationWarning):
            warnings.warn("old", DeprecationWarning)
```

CPython also has helpers under:

```python
from test.support import warnings_helper
```

These helpers are useful when tests need precise control over warning filters.
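When a support helper is not needed, the standard library's `warnings.catch_warnings` gives the same kind of restored-on-exit control over filters; a minimal sketch:

```python
import unittest
import warnings

class FilterTests(unittest.TestCase):

    def test_warning_recorded_under_always_filter(self):
        # record=True captures warnings; the filter state is restored
        # automatically when the block exits.
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            warnings.warn("old api", DeprecationWarning)
        self.assertEqual(len(caught), 1)
        self.assertIs(caught[0].category, DeprecationWarning)
```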

## 87.7 Test Isolation

Tests must clean up after themselves.

Bad tests leave behind:

```text
modified sys.path
modified sys.modules
changed current directory
environment variables
open file descriptors
running threads
temporary files
registered signal handlers
changed warning filters
```

Use helpers and context managers.

```python
import os
import unittest
from test.support import os_helper

class EnvTests(unittest.TestCase):

    def test_environment_change(self):
        with os_helper.EnvironmentVarGuard() as env:
            env["CPYTHON_TEST_VALUE"] = "1"
            self.assertEqual(os.environ["CPYTHON_TEST_VALUE"], "1")
```

After the block, the environment is restored.
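`unittest`'s own `addCleanup` is another standard way to guarantee restoration; the cleanup runs even if the test body fails partway through. The variable name here is hypothetical:

```python
import os
import unittest

class CleanupTests(unittest.TestCase):

    def test_env_var_restored(self):
        # Register the undo action before mutating global state.
        self.addCleanup(os.environ.pop, "CPYTHON_TEST_TMP", None)
        os.environ["CPYTHON_TEST_TMP"] = "1"
        self.assertEqual(os.environ["CPYTHON_TEST_TMP"], "1")
```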

## 87.8 Temporary Files and Directories

Use temporary directories for filesystem tests.

```python
import pathlib
import tempfile
import unittest

class FileTests(unittest.TestCase):

    def test_write_file(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = pathlib.Path(tmp) / "data.txt"
            path.write_text("hello", encoding="utf-8")
            self.assertEqual(path.read_text(encoding="utf-8"), "hello")
```

Do not write into the repository root or the current working directory unless the test runner explicitly provides a managed location.
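A test that must write relative paths can change into a throwaway directory, provided it restores the old one. Cleanups run last-in first-out, so the `chdir` back happens before the temporary directory is removed; this is a sketch, not a CPython helper:

```python
import os
import shutil
import tempfile
import unittest

class CwdTests(unittest.TestCase):

    def test_writes_outside_repo(self):
        tmp = tempfile.mkdtemp()
        self.addCleanup(shutil.rmtree, tmp)          # runs second
        self.addCleanup(os.chdir, os.getcwd())       # runs first
        os.chdir(tmp)
        with open("scratch.txt", "w", encoding="utf-8") as f:
            f.write("ok")
        self.assertTrue(os.path.exists("scratch.txt"))
```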

## 87.9 Platform-Specific Tests

CPython supports many platforms. Tests must account for differences.

```python
import sys
import unittest

@unittest.skipUnless(sys.platform == "win32", "Windows only")
class WindowsTests(unittest.TestCase):

    def test_windows_behavior(self):
        ...
```

For POSIX-only behavior:

```python
@unittest.skipIf(sys.platform == "win32", "POSIX only")
def test_posix_behavior(self):
    ...
```

Prefer feature checks over platform checks when possible.

Better:

```python
import os
import unittest

@unittest.skipUnless(hasattr(os, "fork"), "requires fork")
def test_fork_behavior(self):
    ...
```

Feature checks are more accurate across unusual platforms.

## 87.10 Resource-Gated Tests

Some tests require external or expensive resources.

Examples:

```text
network
large memory
large files
audio
GUI
CPU-intensive work
```

Tests using these resources should be gated so normal test runs stay fast and reliable.

Conceptual pattern:

```python
from test.support import requires

requires("network")
```

Then the test only runs when that resource is enabled with `-u`.

```bash
./python -m test -u network test_socket
```

## 87.11 Avoiding Timing Bugs

Timing assumptions cause flaky tests.

Bad:

```python
import time

time.sleep(0.1)
self.assertTrue(thread_finished)
```

Better:

```python
thread.join(timeout=5)
self.assertFalse(thread.is_alive())
```

Even better, use synchronization primitives:

```python
import threading

ready = threading.Event()

def worker():
    ready.set()

t = threading.Thread(target=worker)
t.start()

self.assertTrue(ready.wait(timeout=5))
t.join()
```

Tests should wait for a condition, not for an arbitrary amount of time.

## 87.12 Testing Threads

Thread tests must join all threads they create.

```python
import threading
import unittest

class ThreadTests(unittest.TestCase):

    def test_worker_runs(self):
        seen = []

        def worker():
            seen.append(1)

        t = threading.Thread(target=worker)
        t.start()
        t.join(timeout=5)

        self.assertFalse(t.is_alive())
        self.assertEqual(seen, [1])
```

A test that leaves a thread running can poison later tests.
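One way to guarantee the join even when an assertion fails mid-test is to register it as a cleanup. The `start_thread` helper below is a hypothetical convenience for illustration; CPython's own tests have comparable utilities under `test.support.threading_helper`:

```python
import threading
import unittest

class ThreadCleanupTests(unittest.TestCase):

    def start_thread(self, target):
        # Hypothetical helper: the join runs during cleanup, so the
        # thread is reaped even if the test body raises.
        t = threading.Thread(target=target)
        t.start()
        self.addCleanup(t.join, 5)
        return t

    def test_worker_completes(self):
        done = threading.Event()
        self.start_thread(done.set)
        self.assertTrue(done.wait(timeout=5))
```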

## 87.13 Testing Subprocess Behavior

Use subprocesses when testing interpreter startup, process-global state, fatal errors, command-line flags, or environment behavior.

Example shape:

```python
import subprocess
import sys
import unittest

class SubprocessTests(unittest.TestCase):

    def test_command_line_execution(self):
        proc = subprocess.run(
            [sys.executable, "-c", "print('ok')"],
            text=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
        )
        self.assertEqual(proc.stdout, "ok\n")
```

In CPython tests, prefer support helpers where available because they handle build-tree details and platform differences.
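The same pattern extends to command-line flags. As a runnable sketch, a test of isolated mode (`-I`) can assert on the value of `sys.flags` reported by the child process:

```python
import subprocess
import sys
import unittest

class IsolatedModeTests(unittest.TestCase):

    def test_isolated_mode_sets_flag(self):
        # -I runs the interpreter in isolated mode; the child reports
        # the flag value on stdout.
        proc = subprocess.run(
            [sys.executable, "-I", "-c",
             "import sys; print(sys.flags.isolated)"],
            text=True,
            capture_output=True,
            check=True,
        )
        self.assertEqual(proc.stdout.strip(), "1")
```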

## 87.14 Testing C API Behavior

Some C API behavior cannot be tested from pure Python.

For that, CPython uses test extension modules such as:

```text
_testcapi
_testinternalcapi
_testlimitedcapi
```

Python-level tests call functions exposed by those modules.

Example shape:

```python
import _testcapi
import unittest

class CAPITests(unittest.TestCase):

    def test_some_c_api_behavior(self):
        self.assertEqual(_testcapi.some_helper(), expected)
```

The C helper should be minimal. It should expose the specific runtime behavior needed by the test, not a broad unrelated API.

## 87.15 Testing Reference Leaks

Reference leak tests are run through `regrtest`.

```bash
./python -m test -R 3:3 test_name
```

A test suitable for leak detection should be deterministic. It should not keep intentional global state across runs unless that state is warmed up before measurement.

Common causes of apparent leaks:

```text
global caches initialized during first measured run
interned strings
module-level singletons
warnings registries
lazy imports
thread-local state
```

If a test initializes a cache intentionally, structure it so warmup runs absorb the one-time allocation.
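A minimal sketch of that warmup idea, using `functools.lru_cache` as a stand-in for a hypothetical module-level cache. Because the cache lives at module level, the one-time allocation happens once and is absorbed by the warmup repetitions of `-R`:

```python
import functools
import unittest

@functools.lru_cache(maxsize=None)
def expensive_lookup(key):
    # Stand-in for a hypothetical global cache.
    return key * 2

class CacheTests(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        # Populate the cache up front so the first measured run does
        # not appear to leak the cached entry.
        expensive_lookup("warm")

    def test_cached_value(self):
        self.assertEqual(expensive_lookup(3), 6)
```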

## 87.16 Testing Garbage Collection

GC tests often need explicit collection.

```python
import weakref

# Naive attempt: built-in list objects do not support weak references.
obj = []
ref = weakref.ref(obj)  # raises TypeError
```

Built-in lists cannot be weak-referenced, so use a weak-referenceable object:

```python
import gc
import weakref
import unittest

class Node:
    pass

class GCTests(unittest.TestCase):

    def test_cycle_collected(self):
        obj = Node()
        obj.self = obj
        ref = weakref.ref(obj)

        del obj
        gc.collect()

        self.assertIsNone(ref())
```

This tests that the cycle becomes unreachable and is collected.

## 87.17 Testing Imports

Import tests must handle `sys.modules` carefully.

Bad:

```python
import mymodule
del sys.modules["mymodule"]
```

Better: use the import helper utilities where possible.

Conceptual pattern:

```python
import sys
import unittest
from test.support import import_helper

class ImportTests(unittest.TestCase):

    def test_fresh_import(self):
        with import_helper.CleanImport("json"):
            import json
            self.assertEqual(json.__name__, "json")
```

Import tests are easy to make order-dependent. Keep cleanup explicit.

## 87.18 Testing Bytecode and Compiler Behavior

Compiler tests usually validate observable behavior, not exact bytecode, unless the bytecode itself is the target.

Good:

```python
def test_assignment_expression_scope(self):
    ns = {}
    exec("result = (x := 1)", ns)
    self.assertEqual(ns["x"], 1)
```

Bytecode-specific tests may use `dis`, but they are more version-sensitive.

```python
import dis

instructions = list(dis.get_instructions(func))
```

Use bytecode assertions only when testing bytecode generation, optimization, tracing, or interpreter dispatch.
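When a bytecode assertion is justified, matching loosely softens the version sensitivity. For example, this check tolerates the 3.11 change that folded `BINARY_ADD` into the generic `BINARY_OP` instruction:

```python
import dis
import unittest

def add_one(x):
    return x + 1

class BytecodeTests(unittest.TestCase):

    def test_addition_opcode_present(self):
        # Opcode names change between versions, so accept either the
        # pre-3.11 name or its 3.11+ replacement.
        opnames = {i.opname for i in dis.get_instructions(add_one)}
        self.assertTrue(opnames & {"BINARY_ADD", "BINARY_OP"})
```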

## 87.19 Testing Error Paths

Error paths need direct tests.

Many C bugs live in failure cleanup. Force failure when possible.

Examples:

```text
invalid argument type
overflow input
closed file
bad encoding
missing attribute
recursive call limit
allocation failure helper
```

A good error-path test checks both exception type and post-error state.

```python
with self.assertRaises(TypeError):
    target(unknown_argument=1)

self.assertEqual(state, expected_state)
```
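A runnable sketch of that shape, using an empty list's `pop` as the forced failure and the list itself as the post-error state:

```python
import unittest

class ErrorPathTests(unittest.TestCase):

    def test_pop_on_empty_list_leaves_list_usable(self):
        items = []
        with self.assertRaises(IndexError):
            items.pop()
        # The failed call must not corrupt the object.
        items.append(1)
        self.assertEqual(items, [1])
```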

## 87.20 Avoiding Over-Specified Tests

Tests should protect behavior, not accidental implementation details.

Bad:

```python
self.assertEqual(repr(d), "{'a': 1, 'b': 2}")
```

This is valid only if order and representation are part of the intended behavior.

Better:

```python
self.assertEqual(d, {"a": 1, "b": 2})
```

Over-specified tests make future implementation changes harder without improving correctness.

## 87.21 Good Test Names

Test names should say what behavior is protected.

Weak:

```python
def test_bug(self):
    ...
```

Better:

```python
def test_subclass_dict_lookup_uses_override(self):
    ...
```

Weak:

```python
def test_case_1(self):
    ...
```

Better:

```python
def test_gc_untracks_object_before_dealloc(self):
    ...
```

A failing test name should point the maintainer toward the broken invariant.

## 87.22 Keeping Tests Small

One test should usually check one behavior.

Bad:

```python
def test_everything_about_lists(self):
    ...
```

Better:

```python
def test_append_increases_length(self):
    ...

def test_pop_returns_last_item(self):
    ...

def test_pop_empty_raises_index_error(self):
    ...
```

Small tests make failures easier to diagnose.

## 87.23 When to Add a New Test File

Add a new test file when:

```text
the subsystem has no existing test module
the feature is large enough to deserve its own file
the existing file would become too broad
the test needs special setup shared by many cases
```

Otherwise, prefer adding to an existing targeted file.

A scattered test suite is harder to maintain. A giant unrelated file is also harder to maintain. Choose the narrowest natural home.

## 87.24 Running the New Test

Run the specific test:

```bash
./python -m test -v test_name
```

Run related tests:

```bash
./python -m test -v test_module_a test_module_b
```

Run with fail-fast:

```bash
./python -m test -v -x test_name
```

Run leak checks when touching C code or object lifetime:

```bash
./python -m test -R 3:3 test_name
```

Run under a debug build before considering the test reliable.

## 87.25 Common Mistakes

| Mistake | Better approach |
|---|---|
| Sleeping for timing | Wait for explicit condition |
| Leaving files behind | Use temporary directories |
| Modifying `sys.path` manually | Use support helpers and cleanup |
| Depending on test order | Make setup explicit |
| Testing implementation details | Test observable behavior |
| Ignoring platform differences | Use feature checks and skips |
| Creating global state | Clean it up or isolate it |
| Large broad regression test | Minimal focused regression test |

## 87.26 Core Principle

A CPython test should encode one durable invariant.

It should describe the expected behavior precisely enough to catch regressions, but not so tightly that it freezes irrelevant implementation details. Good tests make the runtime safer to change.
