# 86. Tracing Memory Bugs

Memory bugs in CPython are difficult because the visible failure often happens far away from the original mistake. A missing `Py_INCREF`, an extra `Py_DECREF`, a wrong allocator call, or an incomplete GC traversal may corrupt state silently. The crash may appear thousands of bytecode instructions later.

Tracing memory bugs means reconstructing object lifetime: where an object was allocated, who owns it, when its reference count changed, when it was freed, and why a later access was invalid.

## 86.1 Classes of Memory Bugs

CPython memory bugs usually fall into a small number of categories.

| Bug type | Typical cause | Common symptom |
|---|---|---|
| Reference leak | Missing `Py_DECREF` | Growing memory or `-R` failure |
| Use-after-free | Extra `Py_DECREF` or stale borrowed reference | Crash in unrelated code |
| Double free | Object deallocated twice | Allocator abort |
| Buffer overflow | Writing past allocated memory | ASan report or later corruption |
| Uninitialized read | Struct field not initialized | Random branch or MSan report |
| Allocator mismatch | `PyMem_Free` on `PyObject_Malloc` memory | Allocator corruption |
| GC traversal bug | Missing `Py_VISIT` or bad `tp_clear` | Cycle leak or GC crash |
| Borrowed reference misuse | Container mutation invalidates lifetime | Crash after mutation |

The important point: CPython memory bugs are often ownership bugs first and heap bugs second.

## 86.2 Start With a Reproducer

A memory bug investigation needs the smallest command that reproduces the issue.

Good reproducer:

```bash
./python -m test -v test_gc
```

Better reproducer:

```bash
./python Lib/test/test_gc.py GCRegressionTests.test_specific_case
```

Best reproducer:

```bash
./python /tmp/minimal.py
```

A small reproducer helps you:

```text
run under GDB
run under ASan
add temporary logging
use watchpoints
repeat quickly
remove unrelated noise
```

Do not start with the full test suite unless the bug only appears under full-suite pressure.
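A minimal Python-level reproducer can be as small as a loop around the suspect operation. The sketch below assumes the bug is triggered by a hypothetical `suspect_operation()`. It enables `faulthandler` so that a fatal crash still prints the Python traceback (`python -X faulthandler` has the same effect). It also calls `gc.collect()` periodically, so corrupted GC state tends to be touched early.

```python
import faulthandler
import gc

# Dump the Python-level traceback if the process dies from a fatal signal.
faulthandler.enable()


def suspect_operation():
    # Hypothetical placeholder: replace with the smallest code that triggers the bug.
    items = [object() for _ in range(10)]
    return len(items)


def main(iterations=1000, collect_every=100):
    for i in range(iterations):
        suspect_operation()
        if i % collect_every == 0:
            # Force a collection so damaged GC state is visited often.
            gc.collect()
    return iterations


if __name__ == "__main__":
    main()
```

Saved as `/tmp/minimal.py`, this is the kind of script the "best reproducer" command above runs.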

## 86.3 Use the Right Build

For memory bugs, use a debug build first:

```bash
make clean
./configure --with-pydebug CFLAGS="-O0 -g3"
make -j8
```

Then run:

```bash
./python -m test -v test_name
```

If the failure looks like invalid memory access, build with ASan:

```bash
make clean
./configure --with-pydebug \
  CFLAGS="-O1 -g -fsanitize=address,undefined" \
  LDFLAGS="-fsanitize=address,undefined"
make -j8
```

Run with:

```bash
ASAN_OPTIONS=abort_on_error=1:symbolize=1 \
UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1 \
./python -m test -v test_name
```

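To take pymalloc out of the picture, so that every Python allocation goes straight to the C library `malloc` where external tools can see it, try: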

```bash
PYTHONMALLOC=malloc ./python -m test -v test_name
```

## 86.4 Understand the Ownership Contract

Most CPython memory bugs come from violating reference ownership.

There are three core reference categories.

| Reference kind | Meaning | Caller responsibility |
|---|---|---|
| New reference | Caller owns one reference | Must eventually `Py_DECREF` |
| Borrowed reference | Caller does not own it | Must not `Py_DECREF` it without first calling `Py_INCREF` |
| Stolen reference | Callee takes ownership | Caller must not decref after transfer |

Example new reference:

```c
PyObject *x = PyLong_FromLong(10);
if (x == NULL) {
    return NULL;
}

/* use x */

Py_DECREF(x);
```

Example borrowed reference:

```c
PyObject *item = PyList_GET_ITEM(list, 0);  /* borrowed */
```

If `item` must survive after `list` may change or disappear, take ownership:

```c
PyObject *item = PyList_GET_ITEM(list, 0);  /* borrowed */
Py_INCREF(item);

/* item is now owned */

Py_DECREF(item);
```

Example stolen reference:

```c
PyTuple_SET_ITEM(tuple, 0, item);  /* steals item */
```

After this call, do not decref `item` separately.
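You can observe the same ownership rules from Python. In this sketch, `sys.getrefcount()` shows that a list owns one strong reference to each element, much as a tuple owns an item after `PyTuple_SET_ITEM`, and releases that reference when the list is destroyed. Absolute counts differ between CPython versions, and `getrefcount` also counts its own argument, so the sketch compares differences only.

```python
import sys


def container_ownership_delta():
    obj = object()                    # fresh, non-immortal object
    before = sys.getrefcount(obj)
    holder = [obj]                    # the list now owns one reference
    while_held = sys.getrefcount(obj)
    del holder                        # list destroyed, its reference released
    after = sys.getrefcount(obj)
    return while_held - before, after - before


print(container_ownership_delta())
```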

## 86.5 Trace Reference Count Changes

A reference count bug is a history problem.

You need to answer:

```text
Who created the object?
Who owns it?
Who borrowed it?
Who decref'd it?
Who used it after release?
```

In GDB, inspect a suspicious object:

```gdb
print op
print op->ob_refcnt
print op->ob_type->tp_name
```

If you know the object address, set a watchpoint on its reference count:

```gdb
watch ((PyObject *)0xADDRESS)->ob_refcnt
continue
```

Each stop shows where the reference count changed.

This is slow but precise.

Use it only after reducing the bug to a specific object.
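Before you reach for watchpoints, you can often answer the "who owns it?" question from Python. `gc.get_referrers()` returns the GC-tracked containers that currently refer to an object. Untracked holders, and C code that keeps a raw pointer, do not appear in the result, so an empty or surprising answer is itself a clue. A small sketch, with `owners_of` as a hypothetical helper name:

```python
import gc


def owners_of(target):
    # Only containers tracked by the cyclic GC are reported; C-level owners are invisible.
    return gc.get_referrers(target)


target = object()
owner = [target]
print(any(ref is owner for ref in owners_of(target)))
```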

## 86.6 Use `sys.gettotalrefcount` for Leaks

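Debug builds (`--with-pydebug`) keep a running total of all reference counts, exposed as `sys.gettotalrefcount()`. Release builds do not provide this function.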

Run reference leak tests:

```bash
./python -m test -R 3:3 test_name
```

Meaning:

```text
3 warmup runs
3 measured runs
compare reference count deltas
```

If the total keeps growing across the measured runs, regrtest reports the test as leaking references.

A typical leak report means some path creates or owns references without releasing them.
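The core idea behind `-R` is to warm up and then compare a global counter across repeated runs. It fits in a few lines, but this sketch is not regrtest's implementation. It uses `sys.gettotalrefcount()` on a debug build and otherwise falls back to counting GC-tracked objects, which is a coarser signal. `leaky()` is a hypothetical function that deliberately leaks one list per call.

```python
import gc
import sys

_leaked = []  # hypothetical global that grows on every call


def leaky():
    _leaked.append([])


def _total():
    gettotal = getattr(sys, "gettotalrefcount", None)
    if gettotal is not None:        # debug build: sum of all reference counts
        return gettotal()
    return len(gc.get_objects())    # release build: number of GC-tracked objects


def leak_deltas(func, warmups=3, runs=3):
    for _ in range(warmups):        # let caches and lazy initialization settle
        func()
    deltas = []
    for _ in range(runs):
        gc.collect()
        before = _total()
        func()
        gc.collect()
        deltas.append(_total() - before)
    return deltas


print(leak_deltas(leaky))
```

A delta that stays positive on every measured run is the pattern that points to a leak.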

Common leak pattern:

```c
PyObject *name = PyUnicode_FromString("field");
if (name == NULL) {
    return NULL;
}

PyObject *value = compute_value();
if (value == NULL) {
    return NULL;       /* leaks name */
}

Py_DECREF(name);
return value;
```

Correct cleanup:

```c
PyObject *name = PyUnicode_FromString("field");
if (name == NULL) {
    return NULL;
}

PyObject *value = compute_value();
if (value == NULL) {
    Py_DECREF(name);
    return NULL;
}

Py_DECREF(name);
return value;
```

Error paths are the most common leak source.

## 86.7 Audit All Exit Paths

For every function that owns references, inspect all exits.

Example structure:

```c
static PyObject *
make_pair(PyObject *a, PyObject *b)
{
    PyObject *tuple = NULL;
    PyObject *x = NULL;
    PyObject *y = NULL;

    x = transform(a);
    if (x == NULL) {
        goto error;
    }

    y = transform(b);
    if (y == NULL) {
        goto error;
    }

    tuple = PyTuple_New(2);
    if (tuple == NULL) {
        goto error;
    }

    PyTuple_SET_ITEM(tuple, 0, x);
    PyTuple_SET_ITEM(tuple, 1, y);
    return tuple;

error:
    Py_XDECREF(x);
    Py_XDECREF(y);
    Py_XDECREF(tuple);
    return NULL;
}
```

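As written, this function is correct: every `goto error` happens before `PyTuple_SET_ITEM`, so the cleanup never releases references the tuple already owns. The hazard is latent. After `PyTuple_SET_ITEM`, the tuple owns `x` and `y`. If a later change adds a failing step after that point and jumps to `error`, the cleanup decrefs `x` and `y` directly and then again when it destroys the tuple.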

A safer pattern clears variables after stealing:

```c
PyTuple_SET_ITEM(tuple, 0, x);
x = NULL;

PyTuple_SET_ITEM(tuple, 1, y);
y = NULL;

return tuple;
```

Then cleanup can safely use `Py_XDECREF`.

## 86.8 New Reference Returned From Functions

A C function returning `PyObject *` usually returns a new reference on success.

Example:

```c
static PyObject *
make_number(void)
{
    return PyLong_FromLong(42);
}
```

The caller owns the result.

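If the function returns `NULL`, it must set an exception, unless a failing API call has already set one. Returning `NULL` with no exception set is a bug; CPython detects it in many call paths and raises `SystemError` instead.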

Wrong:

```c
static PyObject *
bad(void)
{
    return NULL;
}
```

Correct:

```c
static PyObject *
good(void)
{
    PyErr_SetString(PyExc_RuntimeError, "operation failed");
    return NULL;
}
```

Memory correctness and exception correctness are connected. Error paths must release owned references and preserve a valid exception state.

## 86.9 Borrowed References and Container Mutation

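A borrowed reference is safe only while its owner keeps holding the object. Any operation that can drop that reference can invalidate the pointer. This includes mutating the container, and also running arbitrary Python code (a `__del__`, a callback, a comparison) that might mutate it.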

Dangerous pattern:

```c
PyObject *item = PyList_GET_ITEM(list, 0);  /* borrowed */

if (PyList_SetItem(list, 0, PyLong_FromLong(1)) < 0) {
    return NULL;
}

/* item may now be freed */
return PyObject_Repr(item);
```

Correct pattern:

```c
PyObject *item = PyList_GET_ITEM(list, 0);  /* borrowed */
Py_INCREF(item);

if (PyList_SetItem(list, 0, PyLong_FromLong(1)) < 0) {
    Py_DECREF(item);
    return NULL;
}

PyObject *repr = PyObject_Repr(item);
Py_DECREF(item);
return repr;
```

Container mutation can release references to old elements. A borrowed element pointer may become invalid.
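Python code cannot hold a borrowed pointer, but a weak reference behaves much like one: it does not keep the object alive. This sketch relies on CPython's reference counting freeing an object as soon as its last strong reference disappears. It shows the element dying when its list slot is overwritten, and surviving when the caller first takes its own strong reference (the role `Py_INCREF` plays in the correct pattern above). `Item` and `overwrite_slot` are hypothetical names.

```python
import weakref


class Item:
    pass


def overwrite_slot(keep_strong_ref):
    items = [Item()]
    watcher = weakref.ref(items[0])        # like a borrowed pointer: owns nothing
    kept = items[0] if keep_strong_ref else None
    items[0] = 1                           # the list drops its reference to the old element
    return watcher() is not None, kept


alive, _ = overwrite_slot(keep_strong_ref=False)
print(alive)   # element freed when the slot was replaced
alive, kept = overwrite_slot(keep_strong_ref=True)
print(alive)   # the caller's own reference kept it alive
```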

## 86.10 Allocator Domains

CPython has several allocator families.

| Allocation API | Free API | Typical use |
|---|---|---|
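| `PyObject_Malloc` | `PyObject_Free` | Memory for Python objects and small object-sized buffers; GIL must be held |
| `PyMem_Malloc` | `PyMem_Free` | General-purpose buffers used by Python code; GIL must be held |
| `PyMem_RawMalloc` | `PyMem_RawFree` | Memory allocated or freed without holding the GIL |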
| `PyObject_GC_New` | `PyObject_GC_Del` | GC-tracked Python objects |
| `malloc` | `free` | External C allocation |

Allocator calls must be paired correctly.

Wrong:

```c
void *p = PyMem_Malloc(128);
PyObject_Free(p);
```

Correct:

```c
void *p = PyMem_Malloc(128);
PyMem_Free(p);
```

Wrong allocator pairing often appears as heap corruption far from the original call.

## 86.11 Debug Allocator

Enable allocator debugging:

```bash
PYTHONMALLOC=debug ./python script.py
```

This can detect:

```text
buffer underflow
buffer overflow
use of freed memory
wrong allocator API family
memory block corruption
```
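The guard-byte idea behind these checks can be modeled in a few lines. This is a toy Python model, not CPython's code. It is loosely based on the documented debug-hook layout: new data is filled with `0xCD`, freed blocks with `0xDD`, and each block is surrounded by "forbidden" bytes set to `0xFD`, with an API identifier ('r', 'm', or 'o' for the raw, mem, and object domains) stored alongside. The 8-byte guard size and the class names are assumptions for illustration. The model shows how an out-of-bounds write or a free through the wrong API family is caught at free time.

```python
CLEANBYTE = 0xCD      # pattern for newly allocated data
DEADBYTE = 0xDD       # pattern written over freed blocks
FORBIDDENBYTE = 0xFD  # guard bytes on both sides of the data
GUARD = 8             # toy guard size in bytes


class DebugHookError(Exception):
    pass


class ToyBlock:
    """Toy model of one block under the debug hooks (not CPython's code)."""

    def __init__(self, api, size):
        self.api = api    # 'r', 'm' or 'o': raw, mem or object domain
        self.size = size
        self.raw = bytearray([FORBIDDENBYTE] * GUARD
                             + [CLEANBYTE] * size
                             + [FORBIDDENBYTE] * GUARD)

    def write(self, offset, data):
        # Deliberately unchecked, like a store through a C pointer.
        start = GUARD + offset
        self.raw[start:start + len(data)] = data

    def free(self, api):
        if api != self.api:
            raise DebugHookError(
                f"bad API id: block has {self.api!r}, freed with {api!r}")
        if any(b != FORBIDDENBYTE for b in self.raw[:GUARD]):
            raise DebugHookError("buffer underflow: leading guard bytes damaged")
        if any(b != FORBIDDENBYTE for b in self.raw[GUARD + self.size:]):
            raise DebugHookError("buffer overflow: trailing guard bytes damaged")
        # Overwrite everything, including the API id, so stale reads show a
        # recognizable pattern and a second free fails the API id check.
        self.raw[:] = bytes([DEADBYTE]) * len(self.raw)
        self.api = None


block = ToyBlock("m", 16)
block.write(16, b"!")     # one byte past the end lands in the trailing guard
try:
    block.free("m")
except DebugHookError as exc:
    print(exc)
```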

For ASan investigations, also test:

```bash
PYTHONMALLOC=malloc ./python script.py
```

This makes the `PyMem` and `PyObject` allocator domains call the C library `malloc` directly instead of pymalloc, so ASan sees each Python allocation as a separate block.

## 86.12 Garbage Collector Bugs

GC-tracked container objects need correct lifecycle handling.

A GC-aware type must usually implement:

```text
tp_traverse
tp_clear
tp_dealloc
```

A traversal function must visit every contained Python reference:

```c
static int
MyType_traverse(MyType *self, visitproc visit, void *arg)
{
    Py_VISIT(self->child);
    Py_VISIT(self->callback);
    return 0;
}
```

A clear function must release strong references and tolerate repeated calls:

```c
static int
MyType_clear(MyType *self)
{
    Py_CLEAR(self->child);
    Py_CLEAR(self->callback);
    return 0;
}
```

Common GC mistakes:

```text
forgetting a field in tp_traverse
decref instead of Py_CLEAR in tp_clear
tracking object before all fields are initialized
failing to untrack before deallocation
using non-GC allocation for GC-tracked object
```
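`gc.get_referents()` works by calling the object's `tp_traverse`, so it doubles as a quick check of a traversal function. Any strong reference the type holds but does not report is a field missing from `tp_traverse`. The sketch below uses a list so that it runs anywhere. With an extension type, you would compare the result against the fields you expect, such as `child` and `callback` in the `MyType` example above. `untraversed` is a hypothetical helper.

```python
import gc


def untraversed(container, expected):
    """Return the expected referents that tp_traverse did not report."""
    reported = gc.get_referents(container)
    return [obj for obj in expected if not any(obj is r for r in reported)]


child, callback = object(), object()
holder = [child, callback]
print(untraversed(holder, [child, callback]))  # [] when traversal is complete
```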

## 86.13 Track and Untrack Correctly

A container object should usually be tracked only after it is fully initialized.

Sketch:

```c
self = PyObject_GC_New(MyType, type);
if (self == NULL) {
    return NULL;
}

self->child = NULL;
self->callback = NULL;

/* initialize fields */

PyObject_GC_Track(self);
return (PyObject *)self;
```

During deallocation:

```c
static void
MyType_dealloc(MyType *self)
{
    PyObject_GC_UnTrack(self);
    MyType_clear(self);
    Py_TYPE(self)->tp_free((PyObject *)self);
}
```

Tracking too early exposes a partially initialized object to the collector.

Untracking too late can expose an object while it is being destroyed.
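From Python, `gc.is_tracked()` reports whether the collector currently tracks an object. It gives you a quick way to confirm that an extension type's constructor reached its `PyObject_GC_Track` call. The sketch sticks to the general rule the `gc` documentation describes: atomic objects such as integers and strings are not tracked, while containers and instances of user-defined classes are. `Node` and `tracking_report` are hypothetical names.

```python
import gc


class Node:
    pass


def tracking_report(*objects):
    return {type(obj).__name__: gc.is_tracked(obj) for obj in objects}


print(tracking_report(0, "a", [], Node()))
```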

## 86.14 Use ASan for Invalid Memory Access

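For heap errors, ASan reports where the bad access happened and where the memory was allocated. For a use-after-free, it also reports where the memory was freed.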

Run:

```bash
ASAN_OPTIONS=abort_on_error=1:symbolize=1 \
PYTHONMALLOC=malloc \
./python -m test -v test_name
```

A use-after-free report usually contains:

```text
invalid access stack
free stack
allocation stack
```

Read the stacks in that order.

The invalid access tells you where the bad pointer was used.

The free stack tells you when the object became invalid.

The allocation stack tells you what kind of object it was.

## 86.15 Use GDB Watchpoints for Specific Objects

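When ASan tells you an object address, reproduce under GDB and set a watchpoint on that object. An address from an ASan run does not carry over to a different build. Addresses also repeat between runs only while address randomization is disabled, which GDB does by default. So take the address from the same binary running under GDB, and set the watchpoint once the object exists, for example after stopping at a breakpoint in the code that creates it. `suspicious_function` below is a placeholder for that code.

Example:

```gdb
break suspicious_function
run
watch ((PyObject *)0x12345678)->ob_refcnt
continue
```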

When it stops:

```gdb
bt
py-bt
print ((PyObject *)0x12345678)->ob_refcnt
print ((PyObject *)0x12345678)->ob_type->tp_name
continue
```

This reconstructs ownership history.

Watchpoints are expensive, but for a small reproducer they can be decisive.
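When the reproducer is a Python script, the script can supply the address itself: in CPython, `id(obj)` is the object's memory address (an implementation detail). The sketch below prints a ready-to-paste watch command. The address is only meaningful while that process is alive, so pause the script or attach GDB to it. The `ob_refcnt` field matches the commands above and assumes a default (non-free-threaded) build. `gdb_watch_command` is a hypothetical helper.

```python
def gdb_watch_command(obj):
    # CPython implementation detail: id() returns the object's address.
    return f"watch ((PyObject *){id(obj):#x})->ob_refcnt"


suspect = object()
print(gdb_watch_command(suspect))
```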

## 86.16 Detecting Buffer Overwrites

A buffer overwrite may corrupt adjacent object metadata.

Symptoms:

```text
invalid type pointer
negative refcount
crash in unrelated deallocator
GC list corruption
allocator guard failure
```

Use:

```bash
PYTHONMALLOC=debug ./python script.py
```

or ASan:

```bash
PYTHONMALLOC=malloc \
ASAN_OPTIONS=abort_on_error=1:symbolize=1 \
./python script.py
```

Look for manual memory operations:

```c
memcpy(dst, src, n);
memmove(dst, src, n);
strcpy(dst, src);
snprintf(buf, size, ...);
```

Check that sizes are in bytes, not elements, unless the API explicitly expects elements.
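You can reproduce the bytes-versus-elements mistake safely with `ctypes`, whose `memmove` takes a byte count just like the C function. In this sketch, passing the element count copies too little. Passing a byte count larger than the destination would be the overflow. `copy_ints` is a hypothetical helper.

```python
import ctypes

N = 4
IntArray = ctypes.c_int * N


def copy_ints(count_in_bytes):
    src = IntArray(1, 2, 3, 4)
    dst = IntArray()                      # ctypes arrays start zero-filled
    ctypes.memmove(dst, src, count_in_bytes)
    return list(dst)


# Wrong: N is an element count, so only N bytes are copied.
print(copy_ints(N))
# Right: convert the element count to bytes.
print(copy_ints(N * ctypes.sizeof(ctypes.c_int)))
```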

## 86.17 Uninitialized Fields

Uninitialized fields often cause nondeterministic failures.

Wrong:

```c
typedef struct {
    PyObject_HEAD
    PyObject *name;
    int flags;
} MyObject;

static PyObject *
MyObject_new(PyTypeObject *type, PyObject *args, PyObject *kw)
{
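    MyObject *self = PyObject_New(MyObject, type);
    if (self == NULL) {
        return NULL;
    }

    self->name = NULL;
    return (PyObject *)self;
}
```

`flags` is uninitialized. `PyObject_New` initializes only the object header, so the field holds whatever bytes the allocator returned. The default `tp_alloc`, `PyType_GenericAlloc`, zero-fills the whole object and would have hidden this bug.

Correct:

```c
self->name = NULL;
self->flags = 0;
```

For larger structs, prefer an allocation path that zero-fills, such as the default `tp_alloc`, or initialize every field immediately after allocation.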

## 86.18 Debugging Memory Growth

Memory growth has several possible causes.

| Cause | How to investigate |
|---|---|
| Python object leak | `-R`, `gc`, object counting |
| C allocation leak | ASan/LSan, allocator logs |
| Intentional cache | inspect cache invalidation |
| Fragmentation | compare allocated vs resident memory |
| Immortal/global objects | check runtime initialization paths |

A growing RSS does not always mean a leak. CPython allocators may retain arenas for reuse. C libraries may cache memory. The OS may delay returning pages.

Use targeted evidence:

```python
import gc

gc.collect()
# Counts only objects tracked by the cyclic GC (containers), not every object.
print(len(gc.get_objects()))
```

For reference leaks, prefer CPython’s `-R` test mode over raw RSS.
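For growth in memory allocated through Python's allocators, as opposed to reference counts, the standard `tracemalloc` module can diff two snapshots and attribute the growth to source lines. The sketch below uses a hypothetical `grow()` that stands in for an unbounded cache. Memory that C code obtains from plain `malloc` is not traced.

```python
import tracemalloc

_cache = []  # hypothetical cache that is never invalidated


def grow():
    _cache.append(bytearray(1024))


def traced_growth(func, runs=100):
    tracemalloc.start()
    try:
        func()                                   # warm up
        before = tracemalloc.take_snapshot()
        for _ in range(runs):
            func()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    diff = after.compare_to(before, "lineno")    # largest changes first
    return sum(stat.size_diff for stat in diff), diff[:3]


total, top = traced_growth(grow)
print(f"grew by {total} bytes")
for stat in top:
    print(stat)
```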

## 86.19 Temporary Instrumentation

Sometimes the fastest path is temporary logging.

Example:

```c
fprintf(stderr, "new object %p refcnt=%zd\n", (void *)op, Py_REFCNT(op));
```

Better with context:

```c
fprintf(stderr,
        "%s:%d op=%p type=%s refcnt=%zd\n",
        __FILE__, __LINE__,
        (void *)op,
        Py_TYPE(op)->tp_name,
        Py_REFCNT(op));
```

Remove instrumentation before committing.

Avoid logging from hot paths unless the reproducer is small. The output can become unusable.
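When the suspect object is reachable from Python, you can do the same context-rich logging without rebuilding. Here is a sketch of a hypothetical `trace_obj()` helper. `sys.getrefcount()` includes the references held by the call itself, so compare values between log lines rather than reading them as absolute ownership counts.

```python
import inspect
import sys


def trace_obj(obj, label="", stream=None):
    """Log where `obj` was seen and its current refcount; return the line."""
    if stream is None:
        stream = sys.stderr
    frame = inspect.currentframe().f_back   # the caller's frame
    try:
        line = (f"{frame.f_code.co_filename}:{frame.f_lineno} {label} "
                f"op={id(obj):#x} type={type(obj).__name__} "
                f"refcnt={sys.getrefcount(obj)}")
    finally:
        del frame                           # do not keep the frame alive
    print(line, file=stream)
    return line


data = []
trace_obj(data, "after create")
alias = data
trace_obj(data, "after alias")   # compare this refcnt with the previous line
```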

## 86.20 Common Failure Patterns

### Extra `Py_DECREF`

```c
PyObject *item = PyList_GET_ITEM(list, 0);  /* borrowed */
Py_DECREF(item);                            /* wrong */
```

Borrowed references are not owned.

### Missing cleanup on error

```c
x = PyLong_FromLong(1);
y = might_fail();
if (y == NULL) {
    return NULL;    /* leaks x */
}
```

### Stolen reference decref'd twice

```c
PyTuple_SET_ITEM(tuple, 0, item);  /* steals item */
Py_DECREF(item);                   /* wrong */
```

### GC traversal omission

```c
static int
MyType_traverse(MyType *self, visitproc visit, void *arg)
{
    Py_VISIT(self->a);
    return 0;       /* forgot self->b */
}
```

### Wrong allocator pair

```c
p = PyObject_Malloc(n);
PyMem_Free(p);      /* wrong */
```

## 86.21 A Practical Investigation Checklist

For crashes:

```text
1. Reproduce with a debug build.
2. Run the smallest failing test.
3. Capture C backtrace.
4. Capture Python backtrace if possible.
5. Inspect suspicious PyObject pointers.
6. Check type pointer and refcount.
7. Run under ASan with PYTHONMALLOC=malloc.
8. Use watchpoints only after identifying an object.
```

For leaks:

```text
1. Run with -R.
2. Reduce to one test case.
3. Audit all new references.
4. Audit all error exits.
5. Check stolen-reference APIs.
6. Check caches and globals.
7. Confirm the leak disappears after the fix.
```

For GC bugs:

```text
1. Check allocation uses GC APIs.
2. Check object is initialized before tracking.
3. Check tp_traverse visits all references.
4. Check tp_clear uses Py_CLEAR.
5. Check dealloc untracks before clearing.
6. Run gc-focused tests.
```
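A gc-focused test usually builds a reference cycle, drops the external references, and checks that a collection actually frees the cycle. The sketch below uses a plain Python class. For an extension type that supports weak references, you would build the cycle through the type's own fields. A missing `Py_VISIT` makes the collector believe the object is referenced from outside the cycle, so the weak reference typically survives the collection. `Node` and `cycle_is_collected` are hypothetical names.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.peer = None


def cycle_is_collected():
    a, b = Node(), Node()
    a.peer, b.peer = b, a    # reference cycle: refcounts alone never reach zero
    watcher = weakref.ref(a)
    del a, b                 # drop the only external references
    gc.collect()
    return watcher() is None


print(cycle_is_collected())
```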

## 86.22 Core Principle

Memory bugs in CPython are ownership bugs until proven otherwise.

Start with reference ownership. Then check allocator pairing, object initialization, GC traversal, and buffer bounds. Use debug builds for invariants, ASan for invalid memory access, `-R` for reference leaks, and GDB watchpoints for object lifetime history.
