Building CPython with AddressSanitizer, UBSan, and ThreadSanitizer to catch memory and concurrency bugs.
Sanitizers are compiler instrumentation tools that detect low-level C and C++ defects at runtime. They are especially useful for CPython development because CPython is a large C program with manual memory management, custom allocators, platform-specific code paths, and many extension-module boundaries.
A debug build catches CPython invariant violations. Sanitizers catch classes of C undefined behavior that may survive ordinary assertions.
85.1 What Sanitizers Detect
Sanitizers insert checks into compiled code. The resulting binary runs slower, but it reports precise failures when dangerous behavior occurs.
| Sanitizer | Common name | Detects |
|---|---|---|
| AddressSanitizer | ASan | Heap and stack buffer overflows, use-after-free, double free |
| UndefinedBehaviorSanitizer | UBSan | Undefined integer, pointer, cast, shift, and alignment behavior |
| ThreadSanitizer | TSan | Data races and unsafe concurrent memory access |
| MemorySanitizer | MSan | Use of uninitialized memory |
| LeakSanitizer | LSan | Memory leaks |
The most common CPython configurations use ASan and UBSan first. TSan is useful for threading and free-threaded work, but it is noisier and more expensive.
85.2 Why Sanitizers Matter for CPython
CPython hides many memory operations behind macros and allocator wrappers.
Examples:
Py_INCREF(op);
Py_DECREF(op);
PyObject_Malloc(size);
PyObject_Free(ptr);
PyMem_Malloc(size);
PyMem_Free(ptr);
PyObject_GC_New(MyObject, type);
PyObject_GC_Del(op);
A bug in these paths may corrupt memory long before the interpreter crashes.
Typical sanitizer findings include:
writing past the end of a variable-sized object
reading a freed object after an incorrect Py_DECREF
using an uninitialized struct field
misaligned pointer access
invalid enum or integer conversion
data race in shared runtime state
Without instrumentation, these bugs often appear as unrelated failures later in execution.
85.3 Sanitizers vs Debug Builds
Debug builds and sanitizer builds overlap, but they answer different questions.
| Build | Best question answered |
|---|---|
| Debug build | Did CPython violate an internal invariant? |
| ASan build | Did C code access invalid memory? |
| UBSan build | Did C code execute undefined behavior? |
| TSan build | Did threads race on shared memory? |
| MSan build | Did code read uninitialized memory? |
A strong development workflow uses more than one configuration.
debug build first
→ fix CPython-level assertions
ASan or UBSan build next
→ fix low-level C memory and UB bugs
TSan build for concurrency changes
→ fix data races
optimized build last
→ measure performance
85.4 Building With AddressSanitizer
A typical Unix build:
make clean
./configure --with-pydebug CFLAGS="-O1 -g -fsanitize=address" LDFLAGS="-fsanitize=address"
make -j8
Then run:
./python -m test test_gc
For a direct script:
./python script.py
ASan usually works best with -O1 or -O0 and debug symbols. Higher optimization can make reports harder to read. Adding -fno-omit-frame-pointer to CFLAGS helps keep stack traces complete.
85.5 AddressSanitizer Report Shape
An ASan report usually includes:
error type
faulting address
stack trace of invalid access
stack trace of allocation
stack trace of deallocation
shadow memory information
summary line
Example shape:
ERROR: AddressSanitizer: heap-use-after-free on address 0x...
READ of size 8 at 0x...
#0 function_a file.c:123
#1 function_b file.c:456
freed by thread T0 here:
#0 free
#1 PyObject_Free
#2 object_dealloc file.c:88
previously allocated by thread T0 here:
#0 malloc
#1 PyObject_Malloc
#2 object_new file.c:44
The most useful parts are usually:
where the invalid access happened
where the object was freed
where the object was allocated
This gives a lifetime history that a normal debugger does not provide automatically.
85.6 Common ASan Findings in CPython
Use-after-free
Typical cause:
Py_DECREF(op);
/* op may now be freed */
return PyObject_Repr(op);
Correct pattern:
PyObject *repr = PyObject_Repr(op);  /* use op while the reference is still held */
Py_DECREF(op);
return repr;
The correct fix depends on ownership. The key rule is simple: do not use a pointer after releasing the last reference that may keep it alive.
Buffer overflow
Example:
char buf[8];
memcpy(buf, src, 16);
ASan reports this as a stack-buffer-overflow. The same mistake on a heap block is reported as a heap-buffer-overflow.
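The overflow above comes from copying without comparing the source length to the destination size. A minimal sketch of a bounds-checked copy follows; `copy_bounded` is a hypothetical helper written for illustration, not a CPython API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src_len bytes into a fixed-size destination,
   refusing the copy instead of writing past the end of dst. */
bool
copy_bounded(char *dst, size_t dst_size, const char *src, size_t src_len)
{
    assert(dst != NULL && src != NULL);
    if (src_len > dst_size) {
        return false;   /* would overflow dst: the caller handles the error */
    }
    memcpy(dst, src, src_len);
    return true;
}
```

With `char buf[8]`, a call such as `copy_bounded(buf, sizeof buf, src, 16)` returns false and leaves `buf` untouched, so the failure becomes an ordinary error path rather than memory corruption.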
In CPython this may happen with:
manual buffer manipulation
Unicode internals
bytes construction
parser buffers
path conversion
extension modules
Wrong deallocation
Example:
void *p = PyMem_Malloc(100);
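PyObject_Free(p);
Allocator families must match.
Use:
void *p = PyMem_Malloc(100);
PyMem_Free(p);
CPython has several allocator domains. Mixing them can corrupt allocator state. CPython's debug memory hooks, which are installed by default in debug builds, also detect this kind of mismatch.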
85.7 Building With UndefinedBehaviorSanitizer
Typical build:
make clean
./configure --with-pydebug CFLAGS="-O1 -g -fsanitize=undefined" LDFLAGS="-fsanitize=undefined"
make -j8
Run:
./python -m test test_long test_float test_struct
UBSan detects undefined or suspicious behavior such as:
signed integer overflow
invalid shift
integer division by zero
misaligned pointer access
invalid enum value
out-of-bounds indexing of fixed-size arrays
null pointer passed to a parameter declared nonnull
C undefined behavior is dangerous because the compiler may optimize under assumptions that make broken code behave unpredictably.
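Several of these cases can be ruled out with an explicit precondition check before the operation. As one illustration, the sketch below guards integer division; `checked_div` is a hypothetical helper, and the `INT_MIN / -1` case is included because its result is not representable in `int`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: integer division that reports failure instead of
   executing undefined behavior. */
bool
checked_div(int a, int b, int *quotient)
{
    assert(quotient != NULL);
    if (b == 0) {
        return false;   /* division by zero */
    }
    if (a == INT_MIN && b == -1) {
        return false;   /* INT_MAX + 1 is not representable in int */
    }
    *quotient = a / b;
    return true;
}
```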
85.8 Common UBSan Findings
Invalid Shift
long x = 1L << shift;
If shift is negative, or is at least the bit width of long, the behavior is undefined.
Safer pattern:
/* Valid shifts for 1L are 0 .. width-2; shifting into the sign bit is also undefined. */
if (shift < 0 || shift >= (int)(sizeof(long) * CHAR_BIT) - 1) {
    PyErr_SetString(PyExc_ValueError, "invalid shift");
    return NULL;
}
Signed Integer Overflow
int n = a + b;
If the result exceeds the range of int, signed overflow is undefined in C.
Use checked arithmetic or wider types when overflow is possible.
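A minimal sketch of checked addition in portable C follows. The name `checked_add` is hypothetical; the range test runs before the addition, so the overflowing sum is never computed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: add two ints, returning false if the mathematical
   result would fall outside [INT_MIN, INT_MAX]. */
bool
checked_add(int a, int b, int *sum)
{
    assert(sum != NULL);
    if (b > 0 && a > INT_MAX - b) {
        return false;   /* would overflow upward */
    }
    if (b < 0 && a < INT_MIN - b) {
        return false;   /* would overflow downward */
    }
    *sum = a + b;
    return true;
}
```

GCC and Clang also provide `__builtin_add_overflow`, which performs the same check as a compiler builtin. The portable form above is shown because it does not depend on a specific compiler.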
Misaligned Access
int *p = (int *)(buffer + 1);
int x = *p;
Some architectures allow this. Others trap. UBSan reports it because the C abstract machine treats it as invalid.
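A common way to read a value from an arbitrary byte offset without a misaligned dereference is to copy the bytes with `memcpy`. The sketch below assumes a hypothetical helper name, `load_u32_unaligned`, and returns the value in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from any byte offset.
   memcpy has no alignment requirement, so no misaligned pointer is
   ever dereferenced. The result is in host byte order. */
uint32_t
load_u32_unaligned(const unsigned char *p)
{
    uint32_t value;
    assert(p != NULL);
    memcpy(&value, p, sizeof value);
    return value;
}
```

Compilers typically turn this small fixed-size `memcpy` into a plain load on architectures that permit unaligned access, so the safe form usually costs nothing.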
85.9 Combining ASan and UBSan
ASan and UBSan are commonly combined:
make clean
./configure --with-pydebug \
CFLAGS="-O1 -g -fsanitize=address,undefined" \
LDFLAGS="-fsanitize=address,undefined"
make -j8
Run a focused test:
./python -m test -v test_gc
Run more broadly:
./python -m test -j0
This combination catches many practical C bugs.
85.10 Useful ASan Environment Variables
ASan behavior can be configured with ASAN_OPTIONS.
Example:
ASAN_OPTIONS=detect_leaks=0 ./python -m test test_gc
Common options:
| Option | Use |
|---|---|
| detect_leaks=0 | Disable leak detection if too noisy |
| abort_on_error=1 | Call abort() instead of exiting after a report, so a core dump or debugger can capture the state |
| symbolize=1 | Produce symbolized stack traces |
| detect_stack_use_after_return=1 | Catch more stack lifetime bugs |
| allocator_may_return_null=1 | Return null instead of aborting on allocation failure |
Example:
ASAN_OPTIONS=abort_on_error=1:symbolize=1 ./python -m test test_dict
85.11 Useful UBSan Environment Variables
UBSan behavior can be configured with UBSAN_OPTIONS.
Example:
UBSAN_OPTIONS=print_stacktrace=1 ./python -m test test_long
Common options:
| Option | Use |
|---|---|
| print_stacktrace=1 | Print stack traces |
| halt_on_error=1 | Stop on first undefined behavior report |
| suppressions=path | Use suppression file |
Example:
UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1 ./python -m test test_float
85.12 Suppression Files
Sanitizers sometimes report known third-party or platform issues.
A suppression file can hide selected reports.
Example ASan suppression shape:
interceptor_via_fun:some_system_function
Example run:
ASAN_OPTIONS=suppressions=asan.supp ./python -m test
Suppressions should be used carefully.
Good uses:
known external library issue
platform runtime issue
temporary local investigation
Bad uses:
hiding a real CPython bug
making CI green without root cause
ignoring new reports in changed code
85.13 ThreadSanitizer
ThreadSanitizer detects data races.
Typical build:
make clean
./configure --with-pydebug \
CFLAGS="-O1 -g -fsanitize=thread" \
LDFLAGS="-fsanitize=thread"
make -j8
TSan is most useful when working on:
free-threaded CPython
GIL changes
subinterpreters
thread state
object synchronization
runtime-global state
extension module thread safety
A data race means two threads access the same memory concurrently, at least one access writes, and there is no valid synchronization.
TSan reports can be noisy because CPython contains deliberate low-level synchronization patterns and platform-dependent primitives.
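The definition of a data race can be made concrete with a shared counter. The sketch below is plain C11 with POSIX threads, not CPython code, and all names in it are hypothetical. It uses an atomic increment, which is race-free; the comment describes the racy variant that TSan would flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define WORKERS 4
#define INCREMENTS 10000

static_assert(WORKERS > 0, "need at least one worker thread");

/* Shared counter. With a plain `long` and `counter++`, increments from
   different threads would be unsynchronized writes to the same memory:
   a data race that TSan reports. */
static atomic_long counter;

static void *
worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        atomic_fetch_add(&counter, 1);  /* synchronized read-modify-write */
    }
    return NULL;
}

/* Run WORKERS threads and return the final count, or -1 on thread error. */
long
run_counter_demo(void)
{
    pthread_t threads[WORKERS];
    atomic_store(&counter, 0);
    for (int i = 0; i < WORKERS; i++) {
        if (pthread_create(&threads[i], NULL, worker, NULL) != 0) {
            /* Join the threads that did start before reporting failure. */
            for (int j = 0; j < i; j++) {
                pthread_join(threads[j], NULL);
            }
            return -1;
        }
    }
    for (int i = 0; i < WORKERS; i++) {
        pthread_join(threads[i], NULL);
    }
    return atomic_load(&counter);
}
```

With the atomic counter the result is always `WORKERS * INCREMENTS`. The racy variant may still produce the right total on some runs, which is why instrumentation is more reliable than observing output.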
85.14 MemorySanitizer
MemorySanitizer detects reads from uninitialized memory.
It is harder to use than ASan or UBSan because all code, including dependencies, should ideally be built with MSan instrumentation.
Typical use cases:
new parser buffers
new object structs
new C extension code
new memory allocation paths
new platform abstraction code
Uninitialized fields are common in partially constructed objects.
Example:
typedef struct {
    PyObject_HEAD
    PyObject *name;
    int flags;
} MyObject;
static PyObject *
my_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    /* PyObject_New initializes only the object header; the remaining fields
       are indeterminate. The default tp_alloc, PyType_GenericAlloc, would
       zero-fill them instead. */
    MyObject *self = PyObject_New(MyObject, type);
    if (self == NULL) {
        return NULL;
    }
    self->name = NULL;
    /* self->flags left uninitialized */
    return (PyObject *)self;
}
If flags is read later, MSan can report it.
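The same idea in plain C, without the CPython API: a constructor should leave every field in a defined state before the object escapes. `Record`, `record_new`, `record_set_flag`, and `record_free` are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record type standing in for a C-level object struct. */
typedef struct {
    char *name;
    int flags;
} Record;

/* Allocate a Record with every field in a defined state.
   calloc zero-fills the block; the explicit assignments document intent
   and stay correct if the allocation is later changed to malloc. */
Record *
record_new(void)
{
    Record *rec = calloc(1, sizeof *rec);
    if (rec == NULL) {
        return NULL;
    }
    rec->name = NULL;
    rec->flags = 0;
    return rec;
}

/* Reads flags before writing it: this read is the one that would use an
   indeterminate value if the constructor skipped initialization. */
void
record_set_flag(Record *rec, int flag)
{
    assert(rec != NULL);
    rec->flags |= flag;
}

void
record_free(Record *rec)
{
    free(rec);  /* free(NULL) is a no-op */
}
```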
85.15 LeakSanitizer
LeakSanitizer is often integrated with ASan.
For CPython, leak reports can be noisy because the interpreter deliberately keeps some objects alive until process exit.
Examples:
interned strings
singletons
caches
module state
allocator arenas
immortal objects
For this reason, CPython reference leak testing with -R (for example, ./python -m test -R 3:3 test_gc) often gives more targeted results for Python object leaks.
Use LSan for C allocation leaks, especially in new native code.
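C allocation leaks in native code often come from early returns on error paths. One common way to avoid them is a single cleanup label that every exit path passes through. The function below is a hypothetical illustration of that pattern, not code from CPython.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function: build left + right in a scratch buffer and
   return its length, or -1 on allocation failure. Every exit path goes
   through the cleanup label, so neither buffer can leak. */
long
joined_length(const char *left, const char *right)
{
    long result = -1;
    char *copy = NULL;
    char *joined = NULL;
    assert(left != NULL && right != NULL);
    size_t left_len = strlen(left);
    size_t right_len = strlen(right);

    copy = malloc(left_len + 1);
    if (copy == NULL) {
        goto cleanup;
    }
    memcpy(copy, left, left_len + 1);

    joined = malloc(left_len + right_len + 1);
    if (joined == NULL) {
        goto cleanup;               /* copy is still freed below */
    }
    memcpy(joined, copy, left_len);
    memcpy(joined + left_len, right, right_len + 1);
    result = (long)strlen(joined);

cleanup:
    free(joined);                   /* free(NULL) is a no-op */
    free(copy);
    return result;
}
```

If the second `malloc` failure path returned directly instead of jumping to `cleanup`, `copy` would leak, which is the kind of report LSan produces at process exit.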
85.16 Running CPython Tests With Sanitizers
Focused run:
ASAN_OPTIONS=abort_on_error=1:symbolize=1 \
UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1 \
./python -m test -v test_gc
Parallel run:
./python -m test -j0
For sanitizer builds, parallel test runs can be memory-heavy. If the machine starts swapping, reduce the worker count:
./python -m test -j4
Sanitizer test runs are slower than runs on a normal debug build. Start with focused tests.
85.17 Sanitizers and CPython Allocators
CPython uses specialized allocators for small objects. These can interfere with sanitizer visibility.
Useful option:
PYTHONMALLOC=malloc ./python -m test test_gc
This asks CPython to use the system allocator instead of its specialized allocator for Python memory domains.
For ASan, this can produce better reports because ASan intercepts system malloc and free.
Common pattern:
PYTHONMALLOC=malloc \
ASAN_OPTIONS=abort_on_error=1:symbolize=1 \
./python -m test test_bytes
For some bugs, use both modes:
default CPython allocator
→ catches CPython allocator-specific issues
PYTHONMALLOC=malloc
→ gives sanitizer better heap visibility
85.18 Sanitizers and Optimization Levels
Optimization affects sanitizer reports.
| Optimization | Effect |
|---|---|
| -O0 | Easier debugging, slower |
| -O1 | Good sanitizer default |
| -O2 | Closer to release behavior, harder traces |
| -Og | Debug-friendly optimization |
Many developers use:
CFLAGS="-O1 -g -fsanitize=address,undefined"
This balances report quality and practical runtime.
85.19 Debugging a Sanitizer Failure
When a sanitizer report appears:
1. Read the error type.
2. Find the invalid access stack.
3. Find the allocation stack.
4. Find the free stack if present.
5. Reduce the test case.
6. Rebuild with symbols.
7. Reproduce under the same environment variables.
8. Fix the first reported bug first.
Do not chase later failures before fixing the first sanitizer report. Memory corruption usually creates cascading symptoms.
85.20 Example Use-After-Free Investigation
Suppose ASan reports:
heap-use-after-free in PyObject_Repr
freed by list_dealloc
allocated by list_new
A likely ownership bug is:
PyObject *item = PyList_GET_ITEM(list, 0); /* borrowed */
Py_DECREF(list);
return PyObject_Repr(item);
item is a borrowed reference. If list owns the only reference to item, destroying list may free item.
Correct pattern:
PyObject *item = PyList_GET_ITEM(list, 0); /* borrowed */
Py_INCREF(item);
Py_DECREF(list);
PyObject *repr = PyObject_Repr(item);
Py_DECREF(item);
return repr;
The sanitizer report points to the invalid read, but the semantic bug is reference ownership.
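The ownership rule can be exercised outside CPython with a toy reference-counted object. The sketch below is not CPython's object model; `Obj`, `obj_new`, `obj_incref`, `obj_decref`, `take_item_value`, and `live_objects` are hypothetical names. It shows the same sequence as the correct pattern: take a strong reference, release the container, use the item, then release the item.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object: NOT CPython's object model. */
typedef struct Obj {
    int refcnt;
    int value;
    struct Obj *item;   /* strong reference owned by a container, or NULL */
} Obj;

int live_objects = 0;   /* allocations not yet freed, for demonstration */

/* Create an object with refcount 1. Takes over the caller's reference
   to item (item may be NULL). */
Obj *
obj_new(int value, Obj *item)
{
    Obj *o = malloc(sizeof *o);
    if (o == NULL) {
        return NULL;
    }
    o->refcnt = 1;
    o->value = value;
    o->item = item;
    live_objects++;
    return o;
}

void
obj_incref(Obj *o)
{
    assert(o != NULL);
    o->refcnt++;
}

void
obj_decref(Obj *o)
{
    assert(o != NULL && o->refcnt > 0);
    if (--o->refcnt == 0) {
        if (o->item != NULL) {
            obj_decref(o->item);    /* release the owned item */
        }
        free(o);
        live_objects--;
    }
}

/* Consume the caller's reference to container; return its item's value. */
int
take_item_value(Obj *container)
{
    assert(container != NULL && container->item != NULL);
    Obj *item = container->item;    /* borrowed from container */
    obj_incref(item);               /* now we hold our own reference */
    obj_decref(container);          /* may free container, but not item */
    int value = item->value;        /* safe: item is still alive */
    obj_decref(item);
    return value;
}
```

Removing the `obj_incref(item)` and the final `obj_decref(item)` reproduces the bug: the container's destruction frees the item, and the later read of `item->value` is the heap-use-after-free that ASan reports.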
85.21 Example Undefined Behavior Investigation
Suppose UBSan reports:
runtime error: shift exponent 64 is too large for 64-bit type
Problem shape:
uint64_t mask = (1ULL << bits) - 1;  /* mask of the low `bits` bits */
If bits == 64, the shift is invalid.
Correct pattern:
uint64_t mask;
if (bits == 64) {
mask = UINT64_MAX;
}
else {
mask = (1ULL << bits) - 1;
}
UBSan reports C-level invalid behavior before it becomes a platform-specific bug.
85.22 Sanitizers in CI
Sanitizer builds are often part of serious runtime CI.
They are slower and more memory-intensive, but they catch bugs ordinary tests miss.
A useful CI matrix includes:
release build
debug build
ASan plus UBSan build
reference leak build
free-threaded build if relevant
platform-specific builds
For CPython development, sanitizer failures should be treated as correctness failures unless clearly caused by an external library or known unsupported configuration.
85.23 Limitations
Sanitizers are powerful, but incomplete.
They may miss:
logic bugs
Python semantic regressions
reference leaks that leave valid memory reachable
data races hidden by scheduling
bugs in uninstrumented dependencies
ABI compatibility problems
performance regressions
They can also report false positives or unsupported patterns in low-level runtime code.
Use sanitizers as part of a toolchain, not as a replacement for tests, review, debug builds, and benchmarks.
85.24 Practical Workflow
A practical CPython sanitizer workflow:
1. Reproduce the issue with a debug build.
2. Build with ASan plus UBSan.
3. Run the smallest relevant test.
4. Use PYTHONMALLOC=malloc if heap reports are unclear.
5. Stop on first sanitizer error.
6. Reduce the failure.
7. Fix ownership, bounds, initialization, or synchronization.
8. Rerun the focused test.
9. Run related tests.
10. Run broader sanitizer tests if the change touches core memory paths.
For threading changes, add a TSan build. For struct initialization changes, consider MSan if practical.
85.25 Core Principle
Sanitizers make C mistakes observable.
CPython’s own debug checks tell you when interpreter invariants are broken. Sanitizers tell you when the underlying C program has stepped outside safe memory, valid arithmetic, or synchronized access. For runtime work, both views are necessary.