
86. Tracing Memory Bugs

Valgrind suppression files, tracemalloc snapshot diffing, and libasan leak detection in CPython builds.

Memory bugs in CPython are difficult because the visible failure often happens far away from the original mistake. A missing Py_INCREF, an extra Py_DECREF, a wrong allocator call, or an incomplete GC traversal may corrupt state silently. The crash may appear thousands of bytecode instructions later.

Tracing memory bugs means reconstructing object lifetime: where an object was allocated, who owns it, when its reference count changed, when it was freed, and why a later access was invalid.

86.1 Classes of Memory Bugs

CPython memory bugs usually fall into a small number of categories.

| Bug type | Typical cause | Common symptom |
| --- | --- | --- |
| Reference leak | Missing Py_DECREF | Growing memory or -R failure |
| Use-after-free | Extra Py_DECREF or stale borrowed reference | Crash in unrelated code |
| Double free | Object deallocated twice | Allocator abort |
| Buffer overflow | Writing past allocated memory | ASan report or later corruption |
| Uninitialized read | Struct field not initialized | Random branch or MSan report |
| Allocator mismatch | PyMem_Free on PyObject_Malloc memory | Allocator corruption |
| GC traversal bug | Missing Py_VISIT or bad tp_clear | Cycle leak or GC crash |
| Borrowed reference misuse | Container mutation invalidates lifetime | Crash after mutation |

The important point: CPython memory bugs are often ownership bugs first and heap bugs second.

86.2 Start With a Reproducer

A memory bug investigation needs the smallest command that reproduces the issue.

Good reproducer:

./python -m test -v test_gc

Better reproducer:

./python Lib/test/test_gc.py GCRegressionTests.test_specific_case

Best reproducer:

./python /tmp/minimal.py

A small reproducer helps you:

run under GDB
run under ASan
add temporary logging
use watchpoints
repeat quickly
remove unrelated noise

Do not start with the full test suite unless the bug only appears under full-suite pressure.
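A reproducer script along the lines of /tmp/minimal.py might look like the sketch below. The function suspect_operation is a hypothetical stand-in for whatever call triggers the bug; the loop count and the forced collection are assumptions that often help a lifetime bug surface sooner, not requirements.

```python
import faulthandler
import gc
import sys


def suspect_operation():
    """Hypothetical stand-in for the call that triggers the bug."""
    data = [object() for _ in range(100)]
    data.reverse()
    return len(data)


def main(iterations=200):
    # On a fatal signal (for example SIGSEGV), dump the Python traceback
    # to the original stderr so the crash has Python-level context.
    faulthandler.enable(file=sys.__stderr__)
    for _ in range(iterations):
        suspect_operation()
        # Force a full collection each round so GC-related bugs show up
        # close to the operation that caused them.
        gc.collect()
    return iterations


if __name__ == "__main__":
    main()
```

Keeping the script free of test-framework machinery makes it easier to run unchanged under GDB, ASan, or a different allocator setting.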

86.3 Use the Right Build

For memory bugs, use a debug build first:

make clean
./configure --with-pydebug CFLAGS="-O0 -g3"
make -j8

Then run:

./python -m test -v test_name

If the failure looks like invalid memory access, build with ASan:

make clean
./configure --with-pydebug \
  CFLAGS="-O1 -g -fsanitize=address,undefined" \
  LDFLAGS="-fsanitize=address,undefined"
make -j8

Run with:

ASAN_OPTIONS=abort_on_error=1:symbolize=1 \
UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1 \
./python -m test -v test_name

For allocator visibility, try:

PYTHONMALLOC=malloc ./python -m test -v test_name

86.4 Understand the Ownership Contract

Most CPython memory bugs come from violating reference ownership.

There are three core reference categories.

| Reference kind | Meaning | Caller responsibility |
| --- | --- | --- |
| New reference | Caller owns one reference | Must eventually Py_DECREF |
| Borrowed reference | Caller does not own it | Must not decref unless incref first |
| Stolen reference | Callee takes ownership | Caller must not decref after transfer |

Example new reference:

PyObject *x = PyLong_FromLong(10);
if (x == NULL) {
    return NULL;
}

/* use x */

Py_DECREF(x);

Example borrowed reference:

PyObject *item = PyList_GET_ITEM(list, 0);  /* borrowed */

If item must survive after list may change or disappear, take ownership:

PyObject *item = PyList_GET_ITEM(list, 0);  /* borrowed */
Py_INCREF(item);

/* item is now owned */

Py_DECREF(item);

Example stolen reference:

PyTuple_SET_ITEM(tuple, 0, item);  /* steals item */

After this call, do not decref item separately.

86.5 Trace Reference Count Changes

A reference count bug is a history problem.

You need to answer:

Who created the object?
Who owns it?
Who borrowed it?
Who decref'd it?
Who used it after release?

In GDB, inspect a suspicious object:

print op
print Py_REFCNT(op)
print Py_TYPE(op)

If you know the object address, set a watchpoint on its reference count:

watch ((PyObject *)0xADDRESS)->ob_refcnt
continue

Each stop shows where the reference count changed.

This is slow but precise.

Use it only after reducing the bug to a specific object.
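Before reaching for a watchpoint, it can help to narrow down which Python-level operation changes an object's reference count. The sketch below is one possible helper, not part of CPython: refcount_delta is a hypothetical name, and it relies only on sys.getrefcount, comparing counts before and after a block rather than trusting absolute values.

```python
import sys
from contextlib import contextmanager


@contextmanager
def refcount_delta(obj, label, log):
    """Record how much the refcount of obj changes inside the with-block."""
    before = sys.getrefcount(obj)
    yield
    after = sys.getrefcount(obj)
    # Only the difference is meaningful; absolute counts include
    # temporary references held by the interpreter.
    log.append((label, after - before))


target = object()
holder = []
log = []

with refcount_delta(target, "append", log):
    holder.append(target)   # the list takes a strong reference

with refcount_delta(target, "pop", log):
    holder.pop()            # the list releases it again

for label, delta in log:
    print(f"{label}: {delta:+d}")
```

A step whose delta does not match the ownership you expect from the API being called is a good candidate for a closer look in GDB.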

86.6 Use sys.gettotalrefcount for Leaks

Debug builds expose total reference count tracking.

Run reference leak tests:

./python -m test -R 3:3 test_name

Meaning:

3 warmup runs
3 measured runs
compare reference count deltas

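If the reference count grows consistently across the measured runs, the test almost certainly leaks references. Growth that appears only in the first runs usually reflects caches filling up, which is why the warmup runs are discarded.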

A typical leak report means some path creates or owns references without releasing them.

Common leak pattern:

PyObject *name = PyUnicode_FromString("field");
if (name == NULL) {
    return NULL;
}

PyObject *value = compute_value();
if (value == NULL) {
    return NULL;       /* leaks name */
}

Py_DECREF(name);
return value;

Correct cleanup:

PyObject *name = PyUnicode_FromString("field");
if (name == NULL) {
    return NULL;
}

PyObject *value = compute_value();
if (value == NULL) {
    Py_DECREF(name);
    return NULL;
}

Py_DECREF(name);
return value;

Error paths are the most common leak source.
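The warmup-then-measure idea behind -R can be sketched in a few lines of Python. This is a simplified illustration, not how regrtest is implemented: measure_growth, count_live, leaky, and _cache are hypothetical names, and on non-debug builds the sketch falls back to counting GC-tracked objects because sys.gettotalrefcount is only available in debug builds.

```python
import gc
import sys


def count_live():
    """Total refcount on debug builds, GC-tracked object count otherwise."""
    total = getattr(sys, "gettotalrefcount", None)
    if total is not None:
        return total()
    return len(gc.get_objects())


def measure_growth(func, warmups=3, runs=3):
    """Call func repeatedly and return the growth seen in each measured run."""
    for _ in range(warmups):
        func()          # let caches and lazy initialization settle
    deltas = []
    for _ in range(runs):
        gc.collect()
        before = count_live()
        func()
        gc.collect()
        deltas.append(count_live() - before)
    return deltas


_cache = []


def leaky():
    # Hypothetical leak: every call stores a list that is never removed.
    _cache.append([0] * 10)


print(measure_growth(leaky))
```

A function that grows by a similar positive amount in every measured run is worth auditing; the real -R mode is more careful about clearing caches and filtering noise.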

86.7 Audit All Exit Paths

For every function that owns references, inspect all exits.

Example structure:

static PyObject *
make_pair(PyObject *a, PyObject *b)
{
    PyObject *tuple = NULL;
    PyObject *x = NULL;
    PyObject *y = NULL;

    x = transform(a);
    if (x == NULL) {
        goto error;
    }

    y = transform(b);
    if (y == NULL) {
        goto error;
    }

    tuple = PyTuple_New(2);
    if (tuple == NULL) {
        goto error;
    }

    PyTuple_SET_ITEM(tuple, 0, x);
    PyTuple_SET_ITEM(tuple, 1, y);
    return tuple;

error:
    Py_XDECREF(x);
    Py_XDECREF(y);
    Py_XDECREF(tuple);
    return NULL;
}

This code is correct as written, because no failing step follows PyTuple_SET_ITEM. The pattern is fragile, though: once the tuple has stolen x and y, any later goto error would decref them once directly and a second time when the tuple itself is released.

A safer pattern clears variables after stealing:

PyTuple_SET_ITEM(tuple, 0, x);
x = NULL;

PyTuple_SET_ITEM(tuple, 1, y);
y = NULL;

return tuple;

Then cleanup can safely use Py_XDECREF.

86.8 New Reference Returned From Functions

A C function returning PyObject * usually returns a new reference on success.

Example:

static PyObject *
make_number(void)
{
    return PyLong_FromLong(42);
}

The caller owns the result.

If the function returns NULL, it must set an exception unless the exception was already set by a failing API call.

Wrong:

static PyObject *
bad(void)
{
    return NULL;
}

Correct:

static PyObject *
good(void)
{
    PyErr_SetString(PyExc_RuntimeError, "operation failed");
    return NULL;
}

Memory correctness and exception correctness are connected. Error paths must release owned references and preserve a valid exception state.

86.9 Borrowed References and Container Mutation

Borrowed references are safe only while the owner remains alive and unchanged in relevant ways.

Dangerous pattern:

PyObject *item = PyList_GET_ITEM(list, 0);  /* borrowed */

if (PyList_SetItem(list, 0, PyLong_FromLong(1)) < 0) {
    return NULL;
}

/* item may now be freed */
return PyObject_Repr(item);

Correct pattern:

PyObject *item = PyList_GET_ITEM(list, 0);  /* borrowed */
Py_INCREF(item);

if (PyList_SetItem(list, 0, PyLong_FromLong(1)) < 0) {
    Py_DECREF(item);
    return NULL;
}

PyObject *repr = PyObject_Repr(item);
Py_DECREF(item);
return repr;

Container mutation can release references to old elements. A borrowed element pointer may become invalid.
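The same lifetime effect can be observed from Python with a weak reference, which, like a borrowed pointer, does not keep its target alive. This is only an analogy: a dead weak reference returns None, whereas a stale borrowed pointer in C dangles. Element is a hypothetical class used for illustration.

```python
import weakref


class Element:
    """Plain class whose instances support weak references."""


items = [Element()]

# The weak reference plays the role of a borrowed pointer:
# it observes the element without owning it.
borrowed = weakref.ref(items[0])
alive_before = borrowed() is not None

# Replacing the slot makes the list release its reference to the old
# element. The list held the only strong reference, so the element dies.
items[0] = Element()
alive_after = borrowed() is not None

print(alive_before, alive_after)
```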

86.10 Allocator Domains

CPython has several allocator families.

| Allocation API | Free API | Typical use |
| --- | --- | --- |
| PyObject_Malloc | PyObject_Free | Object-domain allocation |
| PyMem_Malloc | PyMem_Free | Python memory-domain allocation |
| PyMem_RawMalloc | PyMem_RawFree | Raw memory allocation |
| PyObject_GC_New | PyObject_GC_Del | GC-tracked Python objects |
| malloc | free | External C allocation |

Allocator calls must be paired correctly.

Wrong:

void *p = PyMem_Malloc(128);
PyObject_Free(p);

Correct:

void *p = PyMem_Malloc(128);
PyMem_Free(p);

Wrong allocator pairing often appears as heap corruption far from the original call.

86.11 Debug Allocator

Enable allocator debugging:

PYTHONMALLOC=debug ./python script.py

This can detect:

buffer underflow
buffer overflow
use of freed memory
wrong allocator API family
memory block corruption

For ASan investigations, also test:

PYTHONMALLOC=malloc ./python script.py

This routes more allocation through the system allocator, where ASan can observe it directly.
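Running the same reproducer under several allocator settings is easy to script. The sketch below assumes a POSIX-style convention where a negative return code means the child was killed by a signal; run_under_allocators is a hypothetical helper, and the code string stands in for a real reproducer.

```python
import os
import subprocess
import sys


def run_under_allocators(code, allocators=("debug", "malloc", "malloc_debug")):
    """Run code in a child interpreter once per PYTHONMALLOC setting."""
    results = {}
    for name in allocators:
        env = dict(os.environ, PYTHONMALLOC=name)
        proc = subprocess.run(
            [sys.executable, "-c", code],
            env=env,
            capture_output=True,
            text=True,
        )
        # 0 means a clean exit; on POSIX a negative value is a signal number.
        results[name] = proc.returncode
    return results


results = run_under_allocators("print(sum(range(10)))")
for name, code in results.items():
    print(f"PYTHONMALLOC={name}: exit status {code}")
```

A reproducer that fails under one setting but not another points toward allocator pairing or heap corruption rather than pure reference counting.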

86.12 Garbage Collector Bugs

GC-tracked container objects need correct lifecycle handling.

A GC-aware type must usually implement:

tp_traverse
tp_clear
tp_dealloc

A traversal function must visit every contained Python reference:

static int
MyType_traverse(MyType *self, visitproc visit, void *arg)
{
    Py_VISIT(self->child);
    Py_VISIT(self->callback);
    return 0;
}

A clear function must release strong references and tolerate repeated calls:

static int
MyType_clear(MyType *self)
{
    Py_CLEAR(self->child);
    Py_CLEAR(self->callback);
    return 0;
}

Common GC mistakes:

forgetting a field in tp_traverse
decref instead of Py_CLEAR in tp_clear
tracking object before all fields are initialized
failing to untrack before deallocation
using non-GC allocation for GC-tracked object
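Pure-Python classes get correct traversal for free, so they show the behaviour a correctly written C type should match: a reference cycle survives reference counting and is reclaimed only when the collector runs. The sketch below uses a hypothetical Node class and disables automatic collection temporarily so the result is deterministic.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def make_cycle():
    """Build two nodes that reference each other and return a probe."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)   # lets us observe a without keeping it alive


was_enabled = gc.isenabled()
gc.disable()                # avoid an automatic collection mid-experiment
try:
    probe = make_cycle()
    # Local names are gone, but the cycle keeps both refcounts above zero.
    alive_after_return = probe() is not None
    collected = gc.collect()
    alive_after_collect = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print(alive_after_return, alive_after_collect, collected)
```

If an extension type in the same experiment stays alive after gc.collect(), a missing Py_VISIT in its tp_traverse is a likely cause.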

86.13 Track and Untrack Correctly

A container object should usually be tracked only after it is fully initialized.

Sketch:

self = PyObject_GC_New(MyType, type);
if (self == NULL) {
    return NULL;
}

self->child = NULL;
self->callback = NULL;

/* initialize fields */

PyObject_GC_Track(self);
return (PyObject *)self;

During deallocation:

static void
MyType_dealloc(MyType *self)
{
    PyObject_GC_UnTrack(self);
    MyType_clear(self);
    Py_TYPE(self)->tp_free((PyObject *)self);
}

Tracking too early exposes a partially initialized object to the collector.

Untracking too late can expose an object while it is being destroyed.

86.14 Use ASan for Invalid Memory Access

ASan gives allocation and deallocation history.

Run:

ASAN_OPTIONS=abort_on_error=1:symbolize=1 \
PYTHONMALLOC=malloc \
./python -m test -v test_name

A use-after-free report usually contains:

invalid access stack
free stack
allocation stack

Read the stacks in that order.

The invalid access tells you where the bad pointer was used.

The free stack tells you when the object became invalid.

The allocation stack tells you what kind of object it was.

86.15 Use GDB Watchpoints for Specific Objects

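When ASan tells you an object address, reproduce under GDB and set a watchpoint. The address is only reusable if the object lands in the same place on the next run, so take it from a run inside GDB and keep the reproducer deterministic; GDB disables address-space layout randomization by default on Linux, which helps.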

Example:

watch ((PyObject *)0x12345678)->ob_refcnt
run

When it stops:

bt
py-bt
print Py_REFCNT((PyObject *)0x12345678)
print Py_TYPE((PyObject *)0x12345678)
continue

This reconstructs ownership history.

Watchpoints are expensive, but for a small reproducer they can be decisive.

86.16 Detecting Buffer Overwrites

A buffer overwrite may corrupt adjacent object metadata.

Symptoms:

invalid type pointer
negative refcount
crash in unrelated deallocator
GC list corruption
allocator guard failure

Use:

PYTHONMALLOC=debug ./python script.py

or ASan:

PYTHONMALLOC=malloc \
ASAN_OPTIONS=abort_on_error=1:symbolize=1 \
./python script.py

Look for manual memory operations:

memcpy(dst, src, n);
memmove(dst, src, n);
strcpy(dst, src);
snprintf(buf, size, ...);

Check that sizes are in bytes, not elements, unless the API explicitly expects elements.
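The bytes-versus-elements confusion can be demonstrated safely with ctypes, whose memmove takes a byte count just like the C function. The sketch below deliberately shows the harmless direction of the mistake (copying too little); the same confusion in the other direction is what produces an overwrite.

```python
import ctypes

Int4 = ctypes.c_int32 * 4           # array of four 32-bit integers
src = Int4(1, 2, 3, 4)

n_elements = len(src)               # 4 elements
n_bytes = ctypes.sizeof(src)        # 16 bytes

# Wrong unit: passing the element count where bytes are expected copies
# only the first 4 bytes, which is exactly one int32.
short_copy = Int4()
ctypes.memmove(short_copy, src, n_elements)

# Right unit: the byte size of the whole array.
full_copy = Int4()
ctypes.memmove(full_copy, src, n_bytes)

print(list(short_copy), list(full_copy))
```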

86.17 Uninitialized Fields

Uninitialized fields often cause nondeterministic failures.

Wrong:

typedef struct {
    PyObject_HEAD
    PyObject *name;
    int flags;
} MyObject;

static PyObject *
MyObject_new(PyTypeObject *type, PyObject *args, PyObject *kw)
{
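    /* PyObject_New initializes only the object header. The default
       tp_alloc (PyType_GenericAlloc) would zero-fill the whole object. */
    MyObject *self = PyObject_New(MyObject, type);
    if (self == NULL) {
        return NULL;
    }

    self->name = NULL;
    return (PyObject *)self;
}

flags is uninitialized: PyObject_New leaves every field after the object header with an indeterminate value.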

Correct:

self->name = NULL;
self->flags = 0;

For larger structs, prefer full initialization at allocation time when suitable.

86.18 Debugging Memory Growth

Memory growth has several possible causes.

| Cause | How to investigate |
| --- | --- |
| Python object leak | -R, gc, object counting |
| C allocation leak | ASan/LSan, allocator logs |
| Intentional cache | inspect cache invalidation |
| Fragmentation | compare allocated vs resident memory |
| Immortal/global objects | check runtime initialization paths |

A growing RSS does not always mean a leak. CPython allocators may retain arenas for reuse. C libraries may cache memory. The OS may delay returning pages.

Use targeted evidence:

import gc

gc.collect()                    # reclaim unreachable cycles first
print(len(gc.get_objects()))    # GC-tracked objects that are still alive

For reference leaks, prefer CPython’s -R test mode over raw RSS.
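For growth that comes from Python-level allocations, comparing two tracemalloc snapshots shows which source lines account for the difference. The sketch below is a minimal illustration; leak_some and store are hypothetical names that simulate a leak by keeping byte strings alive.

```python
import tracemalloc


def leak_some(store, count=1000, size=1000):
    """Simulated leak: keep count byte strings of the given size alive."""
    for _ in range(count):
        store.append(bytes(size))


tracemalloc.start()
store = []

before = tracemalloc.take_snapshot()
leak_some(store)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group allocations by source line and sort by how much each line grew.
top = after.compare_to(before, "lineno")
biggest = top[0]
print(biggest)
```

tracemalloc only sees memory allocated through Python's allocators, so leaks from direct malloc calls in C code still need ASan/LSan or allocator logs.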

86.19 Temporary Instrumentation

Sometimes the fastest path is temporary logging.

Example:

fprintf(stderr, "new object %p refcnt=%zd\n", (void *)op, Py_REFCNT(op));

Better with context:

fprintf(stderr,
        "%s:%d op=%p type=%s refcnt=%zd\n",
        __FILE__, __LINE__,
        (void *)op,
        Py_TYPE(op)->tp_name,
        Py_REFCNT(op));

Remove instrumentation before committing.

Avoid logging from hot paths unless the reproducer is small. The output can become unusable.

86.20 Common Failure Patterns

Extra Py_DECREF

PyObject *item = PyList_GET_ITEM(list, 0);  /* borrowed */
Py_DECREF(item);                            /* wrong */

Borrowed references are not owned.

Missing cleanup on error

x = PyLong_FromLong(1);
y = might_fail();
if (y == NULL) {
    return NULL;    /* leaks x */
}

Stolen reference decref’d twice

PyTuple_SET_ITEM(tuple, 0, item);  /* steals item */
Py_DECREF(item);                   /* wrong */

GC traversal omission

static int
MyType_traverse(MyType *self, visitproc visit, void *arg)
{
    Py_VISIT(self->a);
    return 0;       /* forgot self->b */
}

Wrong allocator pair

p = PyObject_Malloc(n);
PyMem_Free(p);      /* wrong */

86.21 A Practical Investigation Checklist

For crashes:

1. Reproduce with a debug build.
2. Run the smallest failing test.
3. Capture C backtrace.
4. Capture Python backtrace if possible.
5. Inspect suspicious PyObject pointers.
6. Check type pointer and refcount.
7. Run under ASan with PYTHONMALLOC=malloc.
8. Use watchpoints only after identifying an object.

For leaks:

1. Run with -R.
2. Reduce to one test case.
3. Audit all new references.
4. Audit all error exits.
5. Check stolen-reference APIs.
6. Check caches and globals.
7. Confirm the leak disappears after the fix.

For GC bugs:

1. Check allocation uses GC APIs.
2. Check object is initialized before tracking.
3. Check tp_traverse visits all references.
4. Check tp_clear uses Py_CLEAR.
5. Check dealloc untracks before clearing.
6. Run gc-focused tests.

86.22 Core Principle

Memory bugs in CPython are ownership bugs until proven otherwise.

Start with reference ownership. Then check allocator pairing, object initialization, GC traversal, and buffer bounds. Use debug builds for invariants, ASan for invalid memory access, -R for reference leaks, and GDB watchpoints for object lifetime history.