Cycle detection algorithm, generational collection, gc thresholds, and the interaction with reference counting.
CPython uses reference counting as its main memory management mechanism. Reference counting destroys most objects as soon as their last strong reference disappears.
Reference counting has one major limitation: it cannot reclaim reference cycles by itself.
The garbage collector exists to find unreachable cycles of container objects and reclaim them. It is a supplement to reference counting, not a replacement for it.
10.1 Why Reference Counting Needs Help
A reference count reaches zero only when no strong references point to an object.
This works for ordinary object graphs:
x = []
del xThe list loses its final reference and can be destroyed.
It does not work for cycles:
a = []
b = []
a.append(b)
b.append(a)
del a
del bAfter the two names are deleted, the lists still reference each other:
list A ---> list B
list B ---> list ATheir reference counts remain nonzero. But no live Python code can reach them.
Reference counting sees local ownership. Garbage collection sees reachability.
10.2 What the Garbage Collector Tracks
CPython does not need to track every object in the cyclic garbage collector.
Objects that cannot contain references to other Python objects cannot form cycles by themselves. Examples include many integers, floats, and simple strings.
The collector mainly tracks container-like objects:
list
dict
set
tuple containing references
function
class
instance
frame
generator
coroutine
traceback
some extension objectsAn object needs GC tracking when it can participate in a cycle.
For example:
x = 123The integer object does not point to other Python objects in a way that creates container cycles.
But:
x = []
x.append(x)The list points to itself. This is a cycle.
10.3 Self-Cycles
The simplest reference cycle is a self-cycle:
x = []
x.append(x)The structure is:
x ---> list
^ |
| |
+--+Now delete the external name:
del xThe list still contains a reference to itself. Its reference count stays above zero.
No program variable can reach it, but reference counting alone cannot destroy it. The cyclic garbage collector must detect it.
10.4 Multi-Object Cycles
Cycles often involve multiple objects.
class Node:
def __init__(self):
self.parent = None
self.children = []
root = Node()
child = Node()
root.children.append(child)
child.parent = root
del root
del childThe graph becomes unreachable, but references remain inside it:
root node ---> children list ---> child node
^ |
| |
+----------- parent -----------+This pattern is common in trees, graphs, object models, ASTs, DOMs, caches, frames, closures, and exception tracebacks.
10.5 The Collector Works on Containers
The cyclic collector only needs to reason about objects that can point to other objects.
It asks each tracked object:
Which Python objects do you reference?At the C level, extension types answer through traversal support. A GC-aware type provides a traversal function that visits contained Python references.
Conceptually:
static int
Node_traverse(NodeObject *self, visitproc visit, void *arg)
{
Py_VISIT(self->parent);
Py_VISIT(self->children);
return 0;
}The collector uses this to walk object graphs.
If an extension object owns Python references and can be part of a cycle, it must participate in this protocol. Otherwise, it can leak cycles.
10.6 Tracked and Untracked Objects
An object may be tracked or untracked by the cyclic collector.
Tracked means the collector may inspect it during a collection.
Untracked means the collector ignores it for cycle detection.
You can inspect this from Python:
import gc
print(gc.is_tracked([]))
print(gc.is_tracked(123))
print(gc.is_tracked("hello"))Typical output on CPython may show that containers are tracked and atomic objects are not. Exact results can vary because CPython applies optimizations. For example, some containers may become untracked when they contain only atomic objects.
The important rule is conceptual:
objects that can participate in cycles need tracking
objects that cannot participate in cycles usually do not10.7 Generational Collection
CPython’s cyclic garbage collector is generational.
The idea is based on a common observation: most objects die young. Objects that survive several collections are likely to live longer.
A generational collector groups tracked objects by age. Younger generations are collected more often. Older generations are collected less often.
Conceptually:
generation 0
newest tracked objects
collected most often
generation 1
objects that survived earlier collection
collected less often
generation 2
older tracked objects
collected least oftenThe exact generation design can change across CPython versions. The useful mental model is that CPython avoids scanning all tracked containers on every collection.
10.8 Collection Thresholds
The gc module exposes thresholds:
import gc
print(gc.get_threshold())Thresholds help decide when automatic cyclic collection runs.
You can change them:
gc.set_threshold(700, 10, 10)You can force a collection:
gc.collect()You can disable automatic cyclic collection:
gc.disable()and re-enable it:
gc.enable()Disabling the collector does not disable reference counting. Objects whose reference counts reach zero are still destroyed. Disabling the collector only disables automatic cycle detection.
10.9 How Cycle Detection Works
The collector does not simply look for objects with nonzero reference counts. Almost every live object has a nonzero reference count.
Instead, it computes whether the references keeping a group alive come only from inside that group.
A simplified cycle detection process:
select tracked candidate objects
copy each object's reference count into a temporary field
for each reference from candidate object to candidate object:
subtract one from the target's temporary count
objects with temporary count still positive are reachable from outside
propagate reachability from those externally reachable objects
objects never reached are unreachable cyclesExample:
outside ---> A ---> B
^ |
| v
D <--- CEven if A, B, C, and D form a cycle, the outside reference to A makes the whole group reachable.
But:
A ---> B
^ |
| v
D <--- Cwith no outside reference is collectible.
10.10 Reachable Cycles Are Not Garbage
A cycle is not automatically garbage.
a = []
a.append(a)
print(a)This object is cyclic, but reachable through the name a. It must stay alive.
The collector only reclaims unreachable cycles.
This distinction matters for data structures. Cyclic graphs are normal and valid in Python. The collector exists so they can be used safely without manual breakage in common cases.
10.11 Finalizers
Finalizers complicate garbage collection.
A finalizer is usually a __del__ method:
class Resource:
def __del__(self):
print("destroying")Finalizers run during object destruction. They can execute Python code. That code can access globals, mutate state, acquire locks, create objects, or even resurrect the object being finalized.
Object resurrection means a finalizer makes the object reachable again:
saved = None
class Resurrect:
def __del__(self):
global saved
saved = selfThis makes collection more complex.
Modern CPython has specific rules for finalizing cyclic garbage, but the practical guidance is simple:
avoid complex __del__ methods
prefer context managers
prefer weakref.finalize for cleanup hooks10.12 Context Managers Are Better for Resources
Garbage collection is not a resource management API.
Use with for deterministic cleanup:
with open("data.txt") as f:
data = f.read()This closes the file when the block exits.
Do not rely on garbage collection timing:
f = open("data.txt")
data = f.read()
f = NoneIn CPython, reference counting may close the file quickly. In other implementations, cleanup may happen later.
For locks, sockets, files, transactions, temporary directories, and external handles, use explicit lifetime control.
10.13 gc.collect
gc.collect() forces a cyclic collection.
import gc
n = gc.collect()
print(n)The return value is the number of unreachable objects found and collected.
You can request a specific generation in versions that support that interface:
gc.collect(0)
gc.collect(1)
gc.collect(2)Manual collection is useful for:
tests
debugging leaks
interactive experiments
memory-sensitive batch phases
controlled benchmarksIt should rarely be needed in ordinary application code.
10.14 gc.get_objects
gc.get_objects() returns tracked objects known to the collector.
import gc
objs = gc.get_objects()
print(len(objs))This does not return every live Python object. It returns objects tracked by the cyclic garbage collector.
Many atomic objects may be absent.
Use it for debugging object graphs and memory leaks, not for normal application logic.
10.15 gc.get_referrers
gc.get_referrers(obj) returns objects that directly refer to obj.
import gc
x = []
refs = gc.get_referrers(x)
print(refs)This can help explain why an object remains alive.
But it must be used carefully. Calling debugging functions creates temporary references and frames. These may appear in results.
gc.get_referrers also exposes implementation details. It can show stack frames, dictionaries, lists, and internal objects.
10.16 gc.get_referents
gc.get_referents(obj) returns objects directly referenced by obj.
import gc
x = [[1], [2]]
print(gc.get_referents(x))For a list, this includes its elements.
For a dict, this may include keys and values.
This function uses the same traversal support that the collector uses. It sees GC-level referents, not necessarily all semantic relationships a programmer may imagine.
10.17 Uncollectable Garbage
Some objects may be found unreachable but not immediately collectable under certain conditions.
Historically, cycles containing finalizers were especially difficult. Modern CPython handles many of these cases better, but uncollectable objects can still appear with extension types or unusual finalization behavior.
The gc module exposes:
import gc
print(gc.garbage)When debugging, you can enable debug flags:
gc.set_debug(gc.DEBUG_SAVEALL)With DEBUG_SAVEALL, unreachable objects are saved in gc.garbage instead of being freed. This is useful for inspection, but it intentionally leaks until you clear the list.
10.18 Weak References
Weak references allow observing an object without keeping it alive.
import weakref
class User:
pass
u = User()
r = weakref.ref(u)
print(r()) # object
del u
print(r()) # NoneThe weak reference does not increment the target’s strong reference count.
Weak references are useful for caches, observer lists, parent links, and auxiliary metadata.
A parent pointer can often be weak:
import weakref
class Node:
def __init__(self):
self.children = []
self.parent = None
root = Node()
child = Node()
child.parent = weakref.ref(root)
root.children.append(child)This avoids forming a strong cycle through parent and child links.
10.19 weakref.finalize
weakref.finalize registers cleanup code without placing cleanup logic directly in __del__.
import weakref
class Resource:
pass
def cleanup(name):
print("cleaning", name)
r = Resource()
finalizer = weakref.finalize(r, cleanup, "resource")When r becomes unreachable, the finalizer can run.
This is often safer than writing complex __del__ methods because the finalizer separates cleanup state from the object being finalized.
Still, external resources should generally be managed with context managers when possible.
10.20 Frames, Tracebacks, and Cycles
Frames and tracebacks often create cycles.
Example:
def f():
x = []
raise RuntimeError
try:
f()
except RuntimeError as exc:
saved = excAn exception can hold a traceback. The traceback can hold frames. Frames hold local variables. Local variables may hold the exception or objects related to it.
Conceptually:
exception
traceback
frame
locals
exceptionThis can keep large object graphs alive.
Modern Python clears some exception state more aggressively than older versions, but stored exceptions and tracebacks can still retain memory.
Common mitigation:
try:
...
except Exception as exc:
...
finally:
exc = Noneor avoid storing tracebacks longer than necessary.
10.21 Closures and Cycles
Closures can also create cycles.
def make_func():
items = []
def add(x):
items.append(x)
return items
return addThe inner function references a cell. The cell references items.
More complex closures can reference objects that reference the function back.
Cycles through functions, closures, instances, and callbacks are common in real programs.
The collector handles many of them, but understanding the graph helps when memory grows unexpectedly.
10.22 Extension Types and GC Support
A C extension type that owns references to Python objects may need cyclic GC support.
Required pieces often include:
Py_TPFLAGS_HAVE_GC
tp_traverse
tp_clear
GC-aware allocation
GC-aware deallocationA traversal function tells the collector what objects this object references:
static int
Box_traverse(BoxObject *self, visitproc visit, void *arg)
{
Py_VISIT(self->value);
return 0;
}A clear function breaks references during collection:
static int
Box_clear(BoxObject *self)
{
Py_CLEAR(self->value);
return 0;
}The deallocator must untrack the object and clear owned references safely.
Conceptual shape:
static void
Box_dealloc(BoxObject *self)
{
PyObject_GC_UnTrack(self);
Box_clear(self);
Py_TYPE(self)->tp_free((PyObject *)self);
}This is simplified. Real extension code must follow the exact C API requirements.
10.23 Py_VISIT
Py_VISIT is used inside traversal functions.
static int
Node_traverse(NodeObject *self, visitproc visit, void *arg)
{
Py_VISIT(self->left);
Py_VISIT(self->right);
Py_VISIT(self->parent);
return 0;
}It checks whether the field is non-NULL and passes it to the visitor.
A correct traversal function must visit every contained Python object reference that can participate in cycles.
Missing one field can make cycles invisible to the collector.
10.24 Py_CLEAR
Py_CLEAR safely clears an owned reference.
A naive clear operation:
Py_DECREF(self->value);
self->value = NULL;has a subtle problem. The Py_DECREF can run arbitrary code through finalizers. That code might observe self before self->value has been set to NULL.
Py_CLEAR sets the field to NULL first, then decrements the old reference.
Conceptually:
tmp = self->value;
self->value = NULL;
Py_XDECREF(tmp);Use Py_CLEAR when breaking references inside containers or deallocators.
10.25 Collection and Performance
The cyclic collector has a cost. It scans tracked containers and follows references.
Most programs should leave it enabled. But some workloads may tune it:
short-lived batch jobs
allocation-heavy parsers
large object graph construction
scientific pipelines
services with known allocation phasesExample:
import gc
gc.disable()
try:
build_large_graph()
finally:
gc.enable()
gc.collect()This pattern can help when you know a phase creates many containers without cycles. It can also hurt if cycles accumulate.
Measure before tuning.
10.26 Memory Leaks vs Retained References
Not all memory growth is a leak in the C sense.
A program may retain references accidentally:
cache = []
def handle(request):
cache.append(request)The collector cannot free reachable objects. If cache keeps growing, those objects are live.
A true C-level leak means memory or references are lost by the implementation.
A Python-level retention bug means the program still has reachable references to objects it no longer needs.
Debugging memory often starts by asking:
Is the object unreachable but not collected?
Or is something still referring to it?Use gc.get_referrers, tracemalloc, object graph tools, and heap inspection to answer that.
10.27 Common Cycle Patterns
Common sources of cycles include:
| Pattern | Shape |
|---|---|
| Parent and child links | parent -> child -> parent |
| Doubly linked lists | node A -> node B -> node A |
| Graph structures | arbitrary cycles |
| Observer callbacks | object -> callback -> bound method -> object |
| Exceptions | exception -> traceback -> frame -> locals |
| Closures | function -> closure cell -> object -> function |
| Descriptors | class -> descriptor -> class-related state |
| C extension objects | native object -> Python object -> native wrapper |
Cycles are normal. The issue is whether unreachable cycles are collectible and whether they contain external resources that need deterministic cleanup.
10.28 Practical Rules
For Python code:
| Situation | Preferred approach |
|---|---|
| External resource | Use with |
| Parent pointer | Consider weakref |
| Cache | Use weakref.WeakValueDictionary when appropriate |
| Long-lived exception | Avoid retaining traceback unnecessarily |
| Memory debugging | Use gc, tracemalloc, and referrer inspection |
| Cleanup hook | Prefer context manager or weakref.finalize |
| Large allocation phase | Tune GC only after measurement |
For C extension code:
| Situation | Required discipline |
|---|---|
| Type owns Python references | Implement proper deallocation |
| Type can form cycles | Add GC support |
| Type supports GC | Implement tp_traverse correctly |
| Type participates in collection | Implement tp_clear correctly |
| Breaking references | Use Py_CLEAR |
| Traversing fields | Use Py_VISIT |
| Deallocating GC object | Untrack before clearing |
10.29 Mental Model
Use this model:
Reference counting handles local lifetime.
Cyclic GC handles unreachable container cycles.
The collector only sees tracked objects.
Tracked objects must describe their outgoing references.
Reachable cycles remain alive.
Unreachable cycles can be collected.
Finalizers and extension types complicate collection.
Resource lifetime should be explicit.This model explains why most CPython objects disappear promptly, why some cyclic structures survive until a collection, why __del__ needs care, and why C extension types must participate in the GC protocol when they own Python references.
10.30 Summary
CPython’s garbage collector exists because reference counting cannot reclaim cycles. It tracks container-like objects, analyzes references between them, finds unreachable groups, and clears them safely.
The collector is generational, configurable through the gc module, and deeply tied to the object model. Python programmers mainly need to understand cycles, finalizers, weak references, and resource lifetime. C extension authors must implement traversal and clearing correctly when their types can participate in cycles.