Skip to content

10. The Garbage Collector

Cycle detection algorithm, generational collection, gc thresholds, and the interaction with reference counting.

CPython uses reference counting as its main memory management mechanism. Reference counting destroys most objects as soon as their last strong reference disappears.

Reference counting has one major limitation: it cannot reclaim reference cycles by itself.

The garbage collector exists to find unreachable cycles of container objects and reclaim them. It is a supplement to reference counting, not a replacement for it.

10.1 Why Reference Counting Needs Help

A reference count reaches zero only when no strong references point to an object.

This works for ordinary object graphs:

x = []
del x

The list loses its final reference and can be destroyed.

It does not work for cycles:

a = []
b = []

a.append(b)
b.append(a)

del a
del b

After the two names are deleted, the lists still reference each other:

list A ---> list B
list B ---> list A

Their reference counts remain nonzero. But no live Python code can reach them.

Reference counting sees local ownership. Garbage collection sees reachability.

10.2 What the Garbage Collector Tracks

CPython does not need to track every object in the cyclic garbage collector.

Objects that cannot contain references to other Python objects cannot form cycles by themselves. Examples include many integers, floats, and simple strings.

The collector mainly tracks container-like objects:

list
dict
set
tuple containing references
function
class
instance
frame
generator
coroutine
traceback
some extension objects

An object needs GC tracking when it can participate in a cycle.

For example:

x = 123

The integer object does not point to other Python objects in a way that creates container cycles.

But:

x = []
x.append(x)

The list points to itself. This is a cycle.

10.3 Self-Cycles

The simplest reference cycle is a self-cycle:

x = []
x.append(x)

The structure is:

x ---> list
       ^  |
       |  |
       +--+

Now delete the external name:

del x

The list still contains a reference to itself. Its reference count stays above zero.

No program variable can reach it, but reference counting alone cannot destroy it. The cyclic garbage collector must detect it.

10.4 Multi-Object Cycles

Cycles often involve multiple objects.

class Node:
    def __init__(self):
        self.parent = None
        self.children = []

root = Node()
child = Node()

root.children.append(child)
child.parent = root

del root
del child

The graph becomes unreachable, but references remain inside it:

root node ---> children list ---> child node
    ^                              |
    |                              |
    +----------- parent -----------+

This pattern is common in trees, graphs, object models, ASTs, DOMs, caches, frames, closures, and exception tracebacks.

10.5 The Collector Works on Containers

The cyclic collector only needs to reason about objects that can point to other objects.

It asks each tracked object:

Which Python objects do you reference?

At the C level, extension types answer through traversal support. A GC-aware type provides a traversal function that visits contained Python references.

Conceptually:

static int
Node_traverse(NodeObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->parent);
    Py_VISIT(self->children);
    return 0;
}

The collector uses this to walk object graphs.

If an extension object owns Python references and can be part of a cycle, it must participate in this protocol. Otherwise, it can leak cycles.

10.6 Tracked and Untracked Objects

An object may be tracked or untracked by the cyclic collector.

Tracked means the collector may inspect it during a collection.

Untracked means the collector ignores it for cycle detection.

You can inspect this from Python:

import gc

print(gc.is_tracked([]))
print(gc.is_tracked(123))
print(gc.is_tracked("hello"))

Typical output on CPython may show that containers are tracked and atomic objects are not. Exact results can vary because CPython applies optimizations. For example, some containers may become untracked when they contain only atomic objects.

The important rule is conceptual:

objects that can participate in cycles need tracking
objects that cannot participate in cycles usually do not

10.7 Generational Collection

CPython’s cyclic garbage collector is generational.

The idea is based on a common observation: most objects die young. Objects that survive several collections are likely to live longer.

A generational collector groups tracked objects by age. Younger generations are collected more often. Older generations are collected less often.

Conceptually:

generation 0
    newest tracked objects
    collected most often

generation 1
    objects that survived earlier collection
    collected less often

generation 2
    older tracked objects
    collected least often

The exact generation design can change across CPython versions. The useful mental model is that CPython avoids scanning all tracked containers on every collection.

10.8 Collection Thresholds

The gc module exposes thresholds:

import gc

print(gc.get_threshold())

Thresholds help decide when automatic cyclic collection runs.

You can change them:

gc.set_threshold(700, 10, 10)

You can force a collection:

gc.collect()

You can disable automatic cyclic collection:

gc.disable()

and re-enable it:

gc.enable()

Disabling the collector does not disable reference counting. Objects whose reference counts reach zero are still destroyed. Disabling the collector only disables automatic cycle detection.

10.9 How Cycle Detection Works

The collector does not simply look for objects with nonzero reference counts. Almost every live object has a nonzero reference count.

Instead, it computes whether the references keeping a group alive come only from inside that group.

A simplified cycle detection process:

select tracked candidate objects
copy each object's reference count into a temporary field
for each reference from candidate object to candidate object:
    subtract one from the target's temporary count
objects with temporary count still positive are reachable from outside
propagate reachability from those externally reachable objects
objects never reached are unreachable cycles

Example:

outside ---> A ---> B
             ^     |
             |     v
             D <--- C

Even if A, B, C, and D form a cycle, the outside reference to A makes the whole group reachable.

But:

A ---> B
^     |
|     v
D <--- C

with no outside reference is collectible.

10.10 Reachable Cycles Are Not Garbage

A cycle is not automatically garbage.

a = []
a.append(a)

print(a)

This object is cyclic, but reachable through the name a. It must stay alive.

The collector only reclaims unreachable cycles.

This distinction matters for data structures. Cyclic graphs are normal and valid in Python. The collector exists so they can be used safely without manual breakage in common cases.

10.11 Finalizers

Finalizers complicate garbage collection.

A finalizer is usually a __del__ method:

class Resource:
    def __del__(self):
        print("destroying")

Finalizers run during object destruction. They can execute Python code. That code can access globals, mutate state, acquire locks, create objects, or even resurrect the object being finalized.

Object resurrection means a finalizer makes the object reachable again:

saved = None

class Resurrect:
    def __del__(self):
        global saved
        saved = self

This makes collection more complex.

Modern CPython has specific rules for finalizing cyclic garbage, but the practical guidance is simple:

avoid complex __del__ methods
prefer context managers
prefer weakref.finalize for cleanup hooks

10.12 Context Managers Are Better for Resources

Garbage collection is not a resource management API.

Use with for deterministic cleanup:

with open("data.txt") as f:
    data = f.read()

This closes the file when the block exits.

Do not rely on garbage collection timing:

f = open("data.txt")
data = f.read()
f = None

In CPython, reference counting may close the file quickly. In other implementations, cleanup may happen later.

For locks, sockets, files, transactions, temporary directories, and external handles, use explicit lifetime control.

10.13 gc.collect

gc.collect() forces a cyclic collection.

import gc

n = gc.collect()
print(n)

The return value is the number of unreachable objects found and collected.

You can request a specific generation in versions that support that interface:

gc.collect(0)
gc.collect(1)
gc.collect(2)

Manual collection is useful for:

tests
debugging leaks
interactive experiments
memory-sensitive batch phases
controlled benchmarks

It should rarely be needed in ordinary application code.

10.14 gc.get_objects

gc.get_objects() returns tracked objects known to the collector.

import gc

objs = gc.get_objects()
print(len(objs))

This does not return every live Python object. It returns objects tracked by the cyclic garbage collector.

Many atomic objects may be absent.

Use it for debugging object graphs and memory leaks, not for normal application logic.

10.15 gc.get_referrers

gc.get_referrers(obj) returns objects that directly refer to obj.

import gc

x = []
refs = gc.get_referrers(x)
print(refs)

This can help explain why an object remains alive.

But it must be used carefully. Calling debugging functions creates temporary references and frames. These may appear in results.

gc.get_referrers also exposes implementation details. It can show stack frames, dictionaries, lists, and internal objects.

10.16 gc.get_referents

gc.get_referents(obj) returns objects directly referenced by obj.

import gc

x = [[1], [2]]
print(gc.get_referents(x))

For a list, this includes its elements.

For a dict, this may include keys and values.

This function uses the same traversal support that the collector uses. It sees GC-level referents, not necessarily all semantic relationships a programmer may imagine.

10.17 Uncollectable Garbage

Some objects may be found unreachable but not immediately collectable under certain conditions.

Historically, cycles containing finalizers were especially difficult. Modern CPython handles many of these cases better, but uncollectable objects can still appear with extension types or unusual finalization behavior.

The gc module exposes:

import gc

print(gc.garbage)

When debugging, you can enable debug flags:

gc.set_debug(gc.DEBUG_SAVEALL)

With DEBUG_SAVEALL, unreachable objects are saved in gc.garbage instead of being freed. This is useful for inspection, but it intentionally leaks until you clear the list.

10.18 Weak References

Weak references allow observing an object without keeping it alive.

import weakref

class User:
    pass

u = User()
r = weakref.ref(u)

print(r())      # object

del u

print(r())      # None

The weak reference does not increment the target’s strong reference count.

Weak references are useful for caches, observer lists, parent links, and auxiliary metadata.

A parent pointer can often be weak:

import weakref

class Node:
    def __init__(self):
        self.children = []
        self.parent = None

root = Node()
child = Node()

child.parent = weakref.ref(root)
root.children.append(child)

This avoids forming a strong cycle through parent and child links.

10.19 weakref.finalize

weakref.finalize registers cleanup code without placing cleanup logic directly in __del__.

import weakref

class Resource:
    pass

def cleanup(name):
    print("cleaning", name)

r = Resource()
finalizer = weakref.finalize(r, cleanup, "resource")

When r becomes unreachable, the finalizer can run.

This is often safer than writing complex __del__ methods because the finalizer separates cleanup state from the object being finalized.

Still, external resources should generally be managed with context managers when possible.

10.20 Frames, Tracebacks, and Cycles

Frames and tracebacks often create cycles.

Example:

def f():
    x = []
    raise RuntimeError

try:
    f()
except RuntimeError as exc:
    saved = exc

An exception can hold a traceback. The traceback can hold frames. Frames hold local variables. Local variables may hold the exception or objects related to it.

Conceptually:

exception
    traceback
        frame
            locals
                exception

This can keep large object graphs alive.

Modern Python clears some exception state more aggressively than older versions, but stored exceptions and tracebacks can still retain memory.

Common mitigation:

try:
    ...
except Exception as exc:
    ...
finally:
    exc = None

or avoid storing tracebacks longer than necessary.

10.21 Closures and Cycles

Closures can also create cycles.

def make_func():
    items = []

    def add(x):
        items.append(x)
        return items

    return add

The inner function references a cell. The cell references items.

More complex closures can reference objects that reference the function back.

Cycles through functions, closures, instances, and callbacks are common in real programs.

The collector handles many of them, but understanding the graph helps when memory grows unexpectedly.

10.22 Extension Types and GC Support

A C extension type that owns references to Python objects may need cyclic GC support.

Required pieces often include:

Py_TPFLAGS_HAVE_GC
tp_traverse
tp_clear
GC-aware allocation
GC-aware deallocation

A traversal function tells the collector what objects this object references:

static int
Box_traverse(BoxObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->value);
    return 0;
}

A clear function breaks references during collection:

static int
Box_clear(BoxObject *self)
{
    Py_CLEAR(self->value);
    return 0;
}

The deallocator must untrack the object and clear owned references safely.

Conceptual shape:

static void
Box_dealloc(BoxObject *self)
{
    PyObject_GC_UnTrack(self);
    Box_clear(self);
    Py_TYPE(self)->tp_free((PyObject *)self);
}

This is simplified. Real extension code must follow the exact C API requirements.

10.23 Py_VISIT

Py_VISIT is used inside traversal functions.

static int
Node_traverse(NodeObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->left);
    Py_VISIT(self->right);
    Py_VISIT(self->parent);
    return 0;
}

It checks whether the field is non-NULL and passes it to the visitor.

A correct traversal function must visit every contained Python object reference that can participate in cycles.

Missing one field can make cycles invisible to the collector.

10.24 Py_CLEAR

Py_CLEAR safely clears an owned reference.

A naive clear operation:

Py_DECREF(self->value);
self->value = NULL;

has a subtle problem. The Py_DECREF can run arbitrary code through finalizers. That code might observe self before self->value has been set to NULL.

Py_CLEAR sets the field to NULL first, then decrements the old reference.

Conceptually:

tmp = self->value;
self->value = NULL;
Py_XDECREF(tmp);

Use Py_CLEAR when breaking references inside containers or deallocators.

10.25 Collection and Performance

The cyclic collector has a cost. It scans tracked containers and follows references.

Most programs should leave it enabled. But some workloads may tune it:

short-lived batch jobs
allocation-heavy parsers
large object graph construction
scientific pipelines
services with known allocation phases

Example:

import gc

gc.disable()
try:
    build_large_graph()
finally:
    gc.enable()
    gc.collect()

This pattern can help when you know a phase creates many containers without cycles. It can also hurt if cycles accumulate.

Measure before tuning.

10.26 Memory Leaks vs Retained References

Not all memory growth is a leak in the C sense.

A program may retain references accidentally:

cache = []

def handle(request):
    cache.append(request)

The collector cannot free reachable objects. If cache keeps growing, those objects are live.

A true C-level leak means memory or references are lost by the implementation.

A Python-level retention bug means the program still has reachable references to objects it no longer needs.

Debugging memory often starts by asking:

Is the object unreachable but not collected?
Or is something still referring to it?

Use gc.get_referrers, tracemalloc, object graph tools, and heap inspection to answer that.

10.27 Common Cycle Patterns

Common sources of cycles include:

PatternShape
Parent and child linksparent -> child -> parent
Doubly linked listsnode A -> node B -> node A
Graph structuresarbitrary cycles
Observer callbacksobject -> callback -> bound method -> object
Exceptionsexception -> traceback -> frame -> locals
Closuresfunction -> closure cell -> object -> function
Descriptorsclass -> descriptor -> class-related state
C extension objectsnative object -> Python object -> native wrapper

Cycles are normal. The issue is whether unreachable cycles are collectible and whether they contain external resources that need deterministic cleanup.

10.28 Practical Rules

For Python code:

SituationPreferred approach
External resourceUse with
Parent pointerConsider weakref
CacheUse weakref.WeakValueDictionary when appropriate
Long-lived exceptionAvoid retaining traceback unnecessarily
Memory debuggingUse gc, tracemalloc, and referrer inspection
Cleanup hookPrefer context manager or weakref.finalize
Large allocation phaseTune GC only after measurement

For C extension code:

SituationRequired discipline
Type owns Python referencesImplement proper deallocation
Type can form cyclesAdd GC support
Type supports GCImplement tp_traverse correctly
Type participates in collectionImplement tp_clear correctly
Breaking referencesUse Py_CLEAR
Traversing fieldsUse Py_VISIT
Deallocating GC objectUntrack before clearing

10.29 Mental Model

Use this model:

Reference counting handles local lifetime.
Cyclic GC handles unreachable container cycles.
The collector only sees tracked objects.
Tracked objects must describe their outgoing references.
Reachable cycles remain alive.
Unreachable cycles can be collected.
Finalizers and extension types complicate collection.
Resource lifetime should be explicit.

This model explains why most CPython objects disappear promptly, why some cyclic structures survive until a collection, why __del__ needs care, and why C extension types must participate in the GC protocol when they own Python references.

10.30 Summary

CPython’s garbage collector exists because reference counting cannot reclaim cycles. It tracks container-like objects, analyzes references between them, finds unreachable groups, and clears them safely.

The collector is generational, configurable through the gc module, and deeply tied to the object model. Python programmers mainly need to understand cycles, finalizers, weak references, and resource lifetime. C extension authors must implement traversal and clearing correctly when their types can participate in cycles.