Valgrind suppression files, tracemalloc snapshot diffing, and libasan leak detection in CPython builds.
Memory bugs in CPython are difficult because the visible failure often happens far away from the original mistake. A missing Py_INCREF, an extra Py_DECREF, a wrong allocator call, or an incomplete GC traversal may corrupt state silently. The crash may appear thousands of bytecode instructions later.
Tracing memory bugs means reconstructing object lifetime: where an object was allocated, who owns it, when its reference count changed, when it was freed, and why a later access was invalid.
86.1 Classes of Memory Bugs
CPython memory bugs usually fall into a small number of categories.
| Bug type | Typical cause | Common symptom |
|---|---|---|
| Reference leak | Missing Py_DECREF | Growing memory or -R failure |
| Use-after-free | Extra Py_DECREF or stale borrowed reference | Crash in unrelated code |
| Double free | Object deallocated twice | Allocator abort |
| Buffer overflow | Writing past allocated memory | ASan report or later corruption |
| Uninitialized read | Struct field not initialized | Random branch or MSan report |
| Allocator mismatch | PyMem_Free on PyObject_Malloc memory | Allocator corruption |
| GC traversal bug | Missing Py_VISIT or bad tp_clear | Cycle leak or GC crash |
| Borrowed reference misuse | Container mutation invalidates lifetime | Crash after mutation |
The important point: CPython memory bugs are often ownership bugs first and heap bugs second.
86.2 Start With a Reproducer
A memory bug investigation needs the smallest command that reproduces the issue.
Good reproducer:
./python -m test -v test_gc
Better reproducer:
./python Lib/test/test_gc.py GCRegressionTests.test_specific_case
Best reproducer:
./python /tmp/minimal.py
A small reproducer helps you:
run under GDB
run under ASan
add temporary logging
use watchpoints
repeat quickly
remove unrelated noise
Do not start with the full test suite unless the bug only appears under full-suite pressure.
86.3 Use the Right Build
For memory bugs, use a debug build first:
make clean
./configure --with-pydebug CFLAGS="-O0 -g3"
make -j8
Then run:
./python -m test -v test_name
If the failure looks like invalid memory access, build with ASan:
make clean
./configure --with-pydebug \
CFLAGS="-O1 -g -fsanitize=address,undefined" \
LDFLAGS="-fsanitize=address,undefined"
make -j8
Run with:
ASAN_OPTIONS=abort_on_error=1:symbolize=1 \
UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1 \
./python -m test -v test_name
For allocator visibility, try:
PYTHONMALLOC=malloc ./python -m test -v test_name
86.4 Understand the Ownership Contract
Most CPython memory bugs come from violating reference ownership.
There are three core reference categories.
| Reference kind | Meaning | Caller responsibility |
|---|---|---|
| New reference | Caller owns one reference | Must eventually Py_DECREF |
| Borrowed reference | Caller does not own it | Must not decref unless incref first |
| Stolen reference | Callee takes ownership | Caller must not decref after transfer |
Example new reference:
PyObject *x = PyLong_FromLong(10);
if (x == NULL) {
    return NULL;
}
/* use x */
Py_DECREF(x);
Example borrowed reference:
PyObject *item = PyList_GET_ITEM(list, 0); /* borrowed */
If item must survive after list may change or disappear, take ownership:
PyObject *item = PyList_GET_ITEM(list, 0); /* borrowed */
Py_INCREF(item);
/* item is now owned */
Py_DECREF(item);
Example stolen reference:
PyTuple_SET_ITEM(tuple, 0, item); /* steals item */
After this call, do not decref item separately.
86.5 Trace Reference Count Changes
A reference count bug is a history problem.
You need to answer:
Who created the object?
Who owns it?
Who borrowed it?
Who decref'd it?
Who used it after release?
In GDB, inspect a suspicious object:
print op
print Py_REFCNT(op)
print Py_TYPE(op)
If you know the object address, set a watchpoint on its reference count:
watch ((PyObject *)0xADDRESS)->ob_refcnt
continue
Each stop shows where the reference count changed.
This is slow but precise.
Use it only after reducing the bug to a specific object.
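Before reaching for a watchpoint, it can help to see the same reference count history from the Python side. The following is a minimal sketch, assuming a default CPython build; `Payload` and `refcount_deltas` are illustrative names, not part of any API. It reports changes relative to a baseline, because the absolute value returned by `sys.getrefcount` includes temporary references and varies between versions.

```python
import sys


class Payload:
    """Plain heap object; unlike small ints it is not immortal."""


def refcount_deltas():
    obj = Payload()
    base = sys.getrefcount(obj)          # baseline; absolute value is not meaningful

    holder = [obj]                       # the list takes its own strong reference
    after_store = sys.getrefcount(obj) - base

    alias = holder[0]                    # a second name is another strong reference
    after_alias = sys.getrefcount(obj) - base

    del alias                            # drop the extra name
    holder.clear()                       # the list releases its reference
    after_release = sys.getrefcount(obj) - base
    return after_store, after_alias, after_release


if __name__ == "__main__":
    print(refcount_deltas())
```

Each delta corresponds to one owner appearing or disappearing, which is the same question a watchpoint on ob_refcnt answers at the C level.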
86.6 Use sys.gettotalrefcount for Leaks
Debug builds (--with-pydebug) expose sys.gettotalrefcount(), a running total of all reference counts, and the test runner uses it to detect leaks.
Run reference leak tests:
./python -m test -R 3:3 test_name
Meaning:
3 warmup runs
3 measured runs
compare reference count deltas
If the reference count grows consistently across the measured runs, a leak exists.
A typical leak report means some path creates or owns references without releasing them.
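The warmup-then-measure idea can be sketched in pure Python. This is not how the test runner is implemented; it is a rough approximation that counts GC-tracked objects instead of total references, so it works on non-debug builds. `leaky`, `clean`, `_leaked`, and `tracked_deltas` are hypothetical names used only for illustration.

```python
import gc

_leaked = []  # hypothetical global that a buggy function keeps appending to


def leaky():
    # 100 new lists stay reachable through the global after every call.
    _leaked.extend([] for _ in range(100))


def clean():
    scratch = [[] for _ in range(100)]   # released when the function returns
    return len(scratch)


def tracked_deltas(func, warmups=3, runs=3):
    """Return the change in GC-tracked object count for each measured run."""
    for _ in range(warmups):             # let caches and lazy state settle first
        func()
    deltas = []
    for _ in range(runs):
        gc.collect()
        before = len(gc.get_objects())
        func()
        gc.collect()
        deltas.append(len(gc.get_objects()) - before)
    return deltas


if __name__ == "__main__":
    print("leaky:", tracked_deltas(leaky))
    print("clean:", tracked_deltas(clean))
```

A function that leaks shows a positive delta on every measured run, while a clean one stays near zero. Growth that appears only during warmup usually points at a cache rather than a leak.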
Common leak pattern:
PyObject *name = PyUnicode_FromString("field");
if (name == NULL) {
    return NULL;
}
PyObject *value = compute_value();
if (value == NULL) {
    return NULL; /* leaks name */
}
Py_DECREF(name);
return value;
Correct cleanup:
PyObject *name = PyUnicode_FromString("field");
if (name == NULL) {
    return NULL;
}
PyObject *value = compute_value();
if (value == NULL) {
    Py_DECREF(name);
    return NULL;
}
Py_DECREF(name);
return value;
Error paths are the most common leak source.
86.7 Audit All Exit Paths
For every function that owns references, inspect all exits.
Example structure:
static PyObject *
make_pair(PyObject *a, PyObject *b)
{
    PyObject *tuple = NULL;
    PyObject *x = NULL;
    PyObject *y = NULL;
    x = transform(a);
    if (x == NULL) {
        goto error;
    }
    y = transform(b);
    if (y == NULL) {
        goto error;
    }
    tuple = PyTuple_New(2);
    if (tuple == NULL) {
        goto error;
    }
    PyTuple_SET_ITEM(tuple, 0, x);
    PyTuple_SET_ITEM(tuple, 1, y);
    return tuple;
error:
    Py_XDECREF(x);
    Py_XDECREF(y);
    Py_XDECREF(tuple);
    return NULL;
}
As written, this is correct: nothing can fail after PyTuple_SET_ITEM, so the error label is never reached once ownership has moved. The pattern is fragile, though. After PyTuple_SET_ITEM the tuple owns x and y, so any fallible step added later that jumps to error would release them twice, once through Py_XDECREF(x) and Py_XDECREF(y) and again when the tuple is destroyed.
A safer pattern clears variables after stealing:
PyTuple_SET_ITEM(tuple, 0, x);
x = NULL;
PyTuple_SET_ITEM(tuple, 1, y);
y = NULL;
return tuple;
Then cleanup can safely use Py_XDECREF.
86.8 New Reference Returned From Functions
A C function returning PyObject * usually returns a new reference on success.
Example:
static PyObject *
make_number(void)
{
    return PyLong_FromLong(42);
}
The caller owns the result.
If the function returns NULL, it must set an exception unless the exception was already set by a failing API call.
Wrong:
static PyObject *
bad(void)
{
    return NULL;
}
Correct:
static PyObject *
good(void)
{
    PyErr_SetString(PyExc_RuntimeError, "operation failed");
    return NULL;
}
Memory correctness and exception correctness are connected. Error paths must release owned references and preserve a valid exception state.
86.9 Borrowed References and Container Mutation
Borrowed references are safe only while the owner remains alive and unchanged in relevant ways.
Dangerous pattern:
PyObject *item = PyList_GET_ITEM(list, 0); /* borrowed */
if (PyList_SetItem(list, 0, PyLong_FromLong(1)) < 0) {
    return NULL;
}
/* item may now be freed */
return PyObject_Repr(item);
Correct pattern:
PyObject *item = PyList_GET_ITEM(list, 0); /* borrowed */
Py_INCREF(item);
if (PyList_SetItem(list, 0, PyLong_FromLong(1)) < 0) {
    Py_DECREF(item);
    return NULL;
}
PyObject *repr = PyObject_Repr(item);
Py_DECREF(item);
return repr;
Container mutation can release references to old elements. A borrowed element pointer may become invalid.
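The effect of container mutation on an element's lifetime can be observed from Python with a weak reference, which watches an object without owning it. This is an illustrative sketch that relies on CPython's immediate reference counting; `Item`, `replace_first`, and `replace_first_owned` are hypothetical names.

```python
import weakref


class Item:
    """Hypothetical element type; plain instances support weak references."""


def replace_first(items):
    watcher = weakref.ref(items[0])   # observes the element without owning it
    items[0] = Item()                 # the list drops its reference to the old element
    return watcher() is None          # True when the old element has been freed


def replace_first_owned(items):
    keep = items[0]                   # strong reference, the analogue of Py_INCREF
    watcher = weakref.ref(keep)
    items[0] = Item()
    return watcher() is None          # False: 'keep' still owns the old element


if __name__ == "__main__":
    print(replace_first([Item()]))        # old element died during the assignment
    print(replace_first_owned([Item()]))  # old element survived
```

In the first function the old element is destroyed at the moment of assignment, which is exactly when a borrowed C pointer to it would start to dangle.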
86.10 Allocator Domains
CPython has several allocator families.
| Allocation API | Free API | Typical use |
|---|---|---|
| PyObject_Malloc | PyObject_Free | Object-domain allocation |
| PyMem_Malloc | PyMem_Free | Python memory-domain allocation |
| PyMem_RawMalloc | PyMem_RawFree | Raw memory allocation |
| PyObject_GC_New | PyObject_GC_Del | GC-tracked Python objects |
| malloc | free | External C allocation |
Allocator calls must be paired correctly.
Wrong:
void *p = PyMem_Malloc(128);
PyObject_Free(p);
Correct:
void *p = PyMem_Malloc(128);
PyMem_Free(p);
Wrong allocator pairing often appears as heap corruption far from the original call.
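When auditing a small file, the pairing table can be turned into a crude text check. The sketch below is a hypothetical helper, not an existing tool: it only understands simple `var = Alloc(...)` and `Free(var);` lines inside one snippet, ignores scopes and aliases, and leaves out PyObject_GC_New because that call takes a type rather than a size. Treat any hit as a prompt to read the code, not as proof.

```python
import re

# Allocation function -> matching free function, taken from the table above.
FAMILIES = {
    "PyObject_Malloc": "PyObject_Free",
    "PyMem_Malloc": "PyMem_Free",
    "PyMem_RawMalloc": "PyMem_RawFree",
    "malloc": "free",
}

ALLOC_RE = re.compile(r"(\w+)\s*=\s*(\w+)\s*\(")
FREE_RE = re.compile(r"\b(\w+)\s*\(\s*(\w+)\s*\)\s*;")


def find_mismatches(source):
    """Return (line, variable, allocator, wrong_free) for each suspicious free."""
    owner = {}       # variable name -> allocation function that produced it
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        alloc = ALLOC_RE.search(line)
        if alloc and alloc.group(2) in FAMILIES:
            owner[alloc.group(1)] = alloc.group(2)
            continue
        free = FREE_RE.search(line)
        if free and free.group(1) in FAMILIES.values() and free.group(2) in owner:
            allocator = owner[free.group(2)]
            if free.group(1) != FAMILIES[allocator]:
                problems.append((lineno, free.group(2), allocator, free.group(1)))
    return problems


if __name__ == "__main__":
    snippet = "void *p = PyMem_Malloc(128);\nPyObject_Free(p);\n"
    print(find_mismatches(snippet))
```

For real investigations, the debug allocator and ASan remain the reliable detectors; a text scan like this only narrows down where to look.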
86.11 Debug Allocator
Enable allocator debugging:
PYTHONMALLOC=debug ./python script.py
This can detect:
buffer underflow
buffer overflow
use of freed memory
wrong allocator API family
memory block corruption
For ASan investigations, also test:
PYTHONMALLOC=malloc ./python script.py
This routes more allocation through the system allocator, where ASan can observe it directly.
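Repeating a reproducer under each allocator setting is easy to script. The following is a minimal sketch, assuming the reproducer fits in a `-c` snippet and that the interpreter under test is the one running the script; `run_under_allocators` is an illustrative name.

```python
import os
import subprocess
import sys


def run_under_allocators(code, modes=("debug", "malloc")):
    """Run a snippet in a child interpreter once per PYTHONMALLOC mode."""
    results = {}
    for mode in modes:
        env = dict(os.environ, PYTHONMALLOC=mode)   # only this variable changes
        proc = subprocess.run(
            [sys.executable, "-c", code],
            env=env,
            capture_output=True,
            text=True,
        )
        results[mode] = (proc.returncode, proc.stderr)
    return results


if __name__ == "__main__":
    for mode, (code, err) in run_under_allocators("print('ok')").items():
        print(mode, code, err.strip())
```

A nonzero or negative return code in one mode but not another is a useful signal: it suggests the bug depends on allocator behavior rather than on Python-level logic.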
86.12 Garbage Collector Bugs
GC-tracked container objects need correct lifecycle handling.
A GC-aware type must usually implement:
tp_traverse
tp_clear
tp_dealloc
A traversal function must visit every contained Python reference:
static int
MyType_traverse(MyType *self, visitproc visit, void *arg)
{
    Py_VISIT(self->child);
    Py_VISIT(self->callback);
    return 0;
}
A clear function must release strong references and tolerate repeated calls:
static int
MyType_clear(MyType *self)
{
    Py_CLEAR(self->child);
    Py_CLEAR(self->callback);
    return 0;
}
Common GC mistakes:
forgetting a field in tp_traverse
decref instead of Py_CLEAR in tp_clear
tracking object before all fields are initialized
failing to untrack before deallocation
using non-GC allocation for GC-tracked object
86.13 Track and Untrack Correctly
A container object should usually be tracked only after it is fully initialized.
Sketch:
self = PyObject_GC_New(MyType, type);
if (self == NULL) {
    return NULL;
}
self->child = NULL;
self->callback = NULL;
/* initialize fields */
PyObject_GC_Track(self);
return (PyObject *)self;
During deallocation:
static void
MyType_dealloc(MyType *self)
{
    PyObject_GC_UnTrack(self);
    MyType_clear(self);
    Py_TYPE(self)->tp_free((PyObject *)self);
}
Tracking too early exposes a partially initialized object to the collector.
Untracking too late can expose an object while it is being destroyed.
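What tracking buys you is visible from Python: only tracked objects take part in cycle detection. The sketch below is illustrative, with `Node` and `cycle_is_collected` as hypothetical names. It builds a two-object cycle, shows that dropping the names does not free it, and then lets the collector traverse and clear it.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def cycle_is_collected():
    gc.collect()                      # start from a clean state
    was_enabled = gc.isenabled()
    gc.disable()                      # keep automatic collections out of the way
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a       # reference cycle: a -> b -> a
        watcher = weakref.ref(a)
        del a, b                      # refcounts stay above zero because of the cycle
        alive_before = watcher() is not None
        gc.collect()                  # the collector traverses and clears the cycle
        return alive_before, watcher() is None
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(cycle_is_collected())
    print(gc.is_tracked(Node()), gc.is_tracked(42))
```

If a C type forgets a field in tp_traverse, the collector cannot see the edge that closes the cycle, and the equivalent of the second check here would report the object as still alive.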
86.14 Use ASan for Invalid Memory Access
ASan gives allocation and deallocation history.
Run:
ASAN_OPTIONS=abort_on_error=1:symbolize=1 \
PYTHONMALLOC=malloc \
./python -m test -v test_name
A use-after-free report usually contains:
invalid access stack
free stack
allocation stack
Read the stacks in that order.
The invalid access tells you where the bad pointer was used.
The free stack tells you when the object became invalid.
The allocation stack tells you what kind of object it was.
86.15 Use GDB Watchpoints for Specific Objects
When ASan tells you an object address, reproduce under GDB and set a watchpoint.
Example:
watch ((PyObject *)0x12345678)->ob_refcnt
run
When it stops:
bt
py-bt
print Py_REFCNT((PyObject *)0x12345678)
print Py_TYPE((PyObject *)0x12345678)
continue
This reconstructs ownership history.
Watchpoints are expensive, but for a small reproducer they can be decisive.
86.16 Detecting Buffer Overwrites
A buffer overwrite may corrupt adjacent object metadata.
Symptoms:
invalid type pointer
negative refcount
crash in unrelated deallocator
GC list corruption
allocator guard failure
Use:
PYTHONMALLOC=debug ./python script.py
or ASan:
PYTHONMALLOC=malloc \
ASAN_OPTIONS=abort_on_error=1:symbolize=1 \
./python script.py
Look for manual memory operations:
memcpy(dst, src, n);
memmove(dst, src, n);
strcpy(dst, src);
snprintf(buf, size, ...);
Check that sizes are in bytes, not elements, unless the API explicitly expects elements.
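The bytes-versus-elements confusion is easy to reproduce safely with ctypes, whose `memmove` takes a byte count just like the C function. This sketch uses a hypothetical four-element `int32` array and deliberately shows only the harmless direction, where an element count copies too little. Passing a byte count to an API that expects elements is the direction that overruns the buffer.

```python
import ctypes

IntArray4 = ctypes.c_int32 * 4        # 4 elements, 16 bytes in total


def copy_with_length(nbytes):
    src = IntArray4(1, 2, 3, 4)
    dst = IntArray4()                 # ctypes zero-initializes new arrays
    ctypes.memmove(dst, src, nbytes)  # the length argument is in bytes
    return list(dst)


if __name__ == "__main__":
    elements = len(IntArray4())               # element count
    size = ctypes.sizeof(IntArray4)           # byte count
    print(elements, copy_with_length(elements))
    print(size, copy_with_length(size))
```

Only the call that passes `ctypes.sizeof` copies the whole array; the element count moves a single `int32`.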
86.17 Uninitialized Fields
Uninitialized fields often cause nondeterministic failures.
Wrong:
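typedef struct {
    PyObject_HEAD
    PyObject *name;
    int flags;
} MyObject;
static PyObject *
MyObject_new(PyTypeObject *type, PyObject *args, PyObject *kw)
{
    /* PyObject_New initializes only the object header. */
    MyObject *self = PyObject_New(MyObject, type);
    if (self == NULL) {
        return NULL;
    }
    self->name = NULL;
    return (PyObject *)self;
}
flags is uninitialized, because PyObject_New leaves every field after the header untouched.
Correct:
self->name = NULL;
self->flags = 0;
For larger structs, prefer full initialization at allocation time when suitable. The default tp_alloc, PyType_GenericAlloc, zero-fills the new object, so allocating through type->tp_alloc(type, 0) avoids this class of bug.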
86.18 Debugging Memory Growth
Memory growth has several possible causes.
| Cause | How to investigate |
|---|---|
| Python object leak | -R, gc, object counting |
| C allocation leak | ASan/LSan, allocator logs |
| Intentional cache | inspect cache invalidation |
| Fragmentation | compare allocated vs resident memory |
| Immortal/global objects | check runtime initialization paths |
A growing RSS does not always mean a leak. CPython allocators may retain arenas for reuse. C libraries may cache memory. The OS may delay returning pages.
Use targeted evidence:
import gc
gc.collect()  # drop collectable garbage so only live objects are counted
print(len(gc.get_objects()))
For reference leaks, prefer CPython’s -R test mode over raw RSS.
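When the object count grows and you need to know which code is responsible, comparing two tracemalloc snapshots attributes the growth to source lines. The following is a minimal sketch; `grow` stands in for the workload under suspicion and `top_growth` is an illustrative helper name.

```python
import tracemalloc


def grow(store, count=1000, size=1000):
    # Hypothetical workload: keeps 'count' byte strings alive in 'store'.
    store.extend(bytes(size) for _ in range(count))


def top_growth(limit=3):
    store = []
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        grow(store)
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to orders the differences with the largest change first.
    stats = after.compare_to(before, "lineno")
    return stats[:limit]


if __name__ == "__main__":
    for stat in top_growth():
        print(stat)
```

tracemalloc only sees memory requested through Python's allocators. Memory that a C library obtains directly from malloc does not appear here and needs ASan or LSan instead.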
86.19 Temporary Instrumentation
Sometimes the fastest path is temporary logging.
Example:
fprintf(stderr, "new object %p refcnt=%zd\n", (void *)op, Py_REFCNT(op));
Better with context:
fprintf(stderr,
        "%s:%d op=%p type=%s refcnt=%zd\n",
        __FILE__, __LINE__,
        (void *)op,
        Py_TYPE(op)->tp_name,
        Py_REFCNT(op));
Remove instrumentation before committing.
Avoid logging from hot paths unless the reproducer is small. The output can become unusable.
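When the suspect object is reachable from Python, similar instrumentation needs no rebuild. This sketch, with `Resource` and `trace_lifetime` as hypothetical names, records where an object was allocated using tracemalloc and when it was freed using a weakref finalizer.

```python
import tracemalloc
import weakref


class Resource:
    """Hypothetical object whose lifetime is under investigation."""


def trace_lifetime():
    events = []
    tracemalloc.start(5)                          # keep up to 5 frames per allocation
    try:
        obj = Resource()
        origin = tracemalloc.get_object_traceback(obj)
        events.append(("allocated", origin is not None))
        # The finalizer runs when obj is destroyed; it does not keep obj alive.
        weakref.finalize(obj, events.append, ("freed", True))
        del obj                                   # last reference goes away
    finally:
        tracemalloc.stop()
    return events


if __name__ == "__main__":
    print(trace_lifetime())
```

In a real session you would print `origin.format()` to see the allocation site, and log a timestamp or stack from the finalizer to see what released the last reference.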
86.20 Common Failure Patterns
Extra Py_DECREF
PyObject *item = PyList_GET_ITEM(list, 0); /* borrowed */
Py_DECREF(item); /* wrong */
Borrowed references are not owned.
Missing cleanup on error
x = PyLong_FromLong(1);
y = might_fail();
if (y == NULL) {
    return NULL; /* leaks x */
}
Stolen reference decref’d twice
PyTuple_SET_ITEM(tuple, 0, item); /* steals item */
Py_DECREF(item); /* wrong */
GC traversal omission
static int
MyType_traverse(MyType *self, visitproc visit, void *arg)
{
    Py_VISIT(self->a);
    return 0; /* forgot self->b */
}
Wrong allocator pair
p = PyObject_Malloc(n);
PyMem_Free(p); /* wrong */
86.21 A Practical Investigation Checklist
For crashes:
1. Reproduce with a debug build.
2. Run the smallest failing test.
3. Capture C backtrace.
4. Capture Python backtrace if possible.
5. Inspect suspicious PyObject pointers.
6. Check type pointer and refcount.
7. Run under ASan with PYTHONMALLOC=malloc.
8. Use watchpoints only after identifying an object.
For leaks:
1. Run with -R.
2. Reduce to one test case.
3. Audit all new references.
4. Audit all error exits.
5. Check stolen-reference APIs.
6. Check caches and globals.
7. Confirm the leak disappears after the fix.
For GC bugs:
1. Check allocation uses GC APIs.
2. Check object is initialized before tracking.
3. Check tp_traverse visits all references.
4. Check tp_clear uses Py_CLEAR.
5. Check dealloc untracks before clearing.
6. Run gc-focused tests.
86.22 Core Principle
Memory bugs in CPython are ownership bugs until proven otherwise.
Start with reference ownership. Then check allocator pairing, object initialization, GC traversal, and buffer bounds. Use debug builds for invariants, ASan for invalid memory access, -R for reference leaks, and GDB watchpoints for object lifetime history.