Skip to content

55. `gc`

gc.collect, gc.get_objects, gc.freeze, callbacks, and tuning generational thresholds.

The gc module exposes CPython’s cyclic garbage collector. It gives Python code a way to inspect, configure, disable, enable, and manually run the collector that handles unreachable reference cycles.

CPython’s main memory management mechanism is reference counting. Most objects are destroyed as soon as their reference count reaches zero. The cyclic garbage collector exists for the cases reference counting cannot solve: groups of objects that keep each other alive even though no live outside reference can reach them.

55.1 Why CPython Needs gc

Reference counting handles simple object lifetime well.

x = []
y = x

del x
del y

After the last reference disappears, the list can be destroyed immediately.

Cycles are different:

a = []
b = []

a.append(b)
b.append(a)

del a
del b

The two lists still reference each other. Their reference counts remain nonzero, but the program can no longer reach them.

Conceptually:

outside references: none

list A → list B
list B → list A

Reference counting alone cannot prove this group is garbage. The cyclic collector finds such groups.

55.2 What the Collector Tracks

The cyclic collector tracks container objects.

A tracked object is an object that may contain references to other Python objects.

Common tracked objects include:

Object kindUsually tracked?Reason
listYesContains references
dictSometimesCan contain references
setYesContains references
tupleSometimesDepends on contents
FunctionYesReferences globals, defaults, closure
ClassYesReferences dict, bases, descriptors
FrameYesReferences locals, stack, globals
IntegerNoDoes not contain object references
Many stringsNoDoes not contain object references

The collector does not need to track every object. An object that cannot point to another object cannot participate in a reference cycle.

55.3 The gc Module Interface

The main control functions are:

FunctionPurpose
gc.enable()Enable automatic collection
gc.disable()Disable automatic collection
gc.isenabled()Return whether automatic collection is enabled
gc.collect()Run a collection manually
gc.get_count()Return allocation counters
gc.get_threshold()Return collection thresholds
gc.set_threshold()Change thresholds
gc.get_objects()Return tracked objects
gc.is_tracked()Test if an object is tracked
gc.get_referrers()Find objects referring to an object
gc.get_referents()Find objects referenced by an object

Example:

import gc

print(gc.isenabled())
print(gc.get_count())
print(gc.get_threshold())

collected = gc.collect()
print(collected)

The module is mostly diagnostic and control-oriented. Normal Python programs rarely need to call it directly.

55.4 Reference Counting and Cyclic GC Together

The two systems cooperate.

reference counting
    handles ordinary lifetime immediately

cyclic GC
    periodically detects unreachable cycles

For most short-lived objects, reference counting does all the work. The cyclic collector runs less frequently and focuses on container objects.

Example:

def f():
    xs = [1, 2, 3]
    return sum(xs)

The list xs usually disappears as soon as the function returns because its reference count drops to zero.

A cycle may survive longer:

def f():
    xs = []
    xs.append(xs)

When f() returns, xs has no external reference, but it references itself. The cyclic collector must clean it later.

55.5 Generations

Traditional CPython cyclic GC uses generations.

The practical idea is simple: most objects die young. Recently allocated objects are checked more often than older objects.

Conceptually:

generation 0: young objects
generation 1: older objects
generation 2: oldest objects

A collection of generation 0 checks young objects. Survivors may be promoted. Older generations are collected less frequently.

This reduces overhead because the collector avoids scanning all tracked objects every time.

The public API exposes generation-oriented functions:

import gc

gc.collect(0)  # collect youngest generation
gc.collect(1)
gc.collect(2)  # collect full set in traditional model

Exact generation behavior is version-dependent. Treat the public API as stable, but avoid depending on internal generation details.

55.6 Thresholds

Automatic collection is driven by allocation counters and thresholds.

import gc

print(gc.get_count())
print(gc.get_threshold())

gc.get_count() returns counters for tracked allocations.

gc.get_threshold() returns threshold values used to decide when collection should run.

Example:

import gc

old = gc.get_threshold()
gc.set_threshold(1000, 10, 10)

try:
    # workload here
    pass
finally:
    gc.set_threshold(*old)

Changing thresholds affects when automatic cyclic collection happens. It does not disable reference counting.

55.7 Manual Collection

gc.collect() forces collection.

import gc

n = gc.collect()
print(n)

The return value is the number of unreachable objects collected.

Manual collection is useful for:

tests that assert cleanup
memory diagnostics
long-running batch phases
debugging cycles
embedding scenarios

It is usually a poor substitute for fixing object lifetime problems.

A common pattern in tests:

import gc
import weakref

class Node:
    pass

obj = Node()
ref = weakref.ref(obj)

del obj
gc.collect()

assert ref() is None

55.8 Tracked vs Untracked Objects

You can ask whether an object is tracked:

import gc

print(gc.is_tracked([]))
print(gc.is_tracked(123))
print(gc.is_tracked("abc"))

A list is usually tracked. An integer usually is not.

Some containers may be untracked as an optimization when they only contain atomic objects.

Example:

import gc

t = (1, 2, 3)
print(gc.is_tracked(t))

The exact result may vary depending on interpreter version and collector state.

Do not write program logic that depends on gc.is_tracked() for ordinary application behavior. Use it for diagnostics.

55.9 Referents

gc.get_referents(obj) returns objects directly referenced by obj.

Example:

import gc

xs = [1, "a", {}]

print(gc.get_referents(xs))

For a list, referents are its elements.

For a function, referents may include its globals, defaults, annotations, code object, closure, and other internal fields.

This function exposes the object graph from the perspective of CPython’s traversal protocol.

Conceptually:

object
objects it directly points to

The collector uses similar traversal logic internally.

55.10 Referrers

gc.get_referrers(obj) returns objects that directly refer to obj.

Example:

import gc

x = []
holder = {"x": x}

refs = gc.get_referrers(x)
print(any(r is holder for r in refs))

This is useful but dangerous.

The returned referrers may include:

current stack frames
locals dictionaries
temporary objects created by inspection
internal interpreter structures
test harness objects
debugger objects

Because calling get_referrers() itself creates references, results can be noisy.

It is primarily a debugging tool.

55.11 Object Graph Traversal

The cyclic collector sees memory as a graph.

object A → object B
object B → object C
object C → object A

A cycle becomes collectable only when no live root can reach it.

Roots include active frames, module globals, thread state, interpreter state, and objects reachable from them.

Simplified model:

live roots
reachable objects survive

unreachable tracked group
candidate garbage

The collector must distinguish internal references within an unreachable group from external references coming from live objects.

55.12 Finalizers

Objects may define finalizers.

class FileLike:
    def __del__(self):
        print("closing")

Finalizers complicate cycle collection because destruction order matters.

Example:

class Node:
    def __del__(self):
        print("finalize")

a = Node()
b = Node()

a.other = b
b.other = a

del a
del b

Modern CPython can handle many cycles with finalizers, but finalization remains a subtle area. The safe rule is simple: avoid depending on __del__ for important resource cleanup.

Use context managers instead:

with open("data.txt") as f:
    data = f.read()

Context managers make cleanup explicit and deterministic.

55.13 Weak References

Weak references help avoid ownership cycles.

Example:

import weakref

class Parent:
    pass

class Child:
    pass

parent = Parent()
child = Child()

child.parent = weakref.ref(parent)

The weak reference does not increase the parent’s reference count. This helps with graphs such as:

parent → child
child → parent weakly

Weak references are common in caches, observer lists, parent pointers, memoization tables, and object registries.

55.14 gc.garbage

gc.garbage contains objects the collector found unreachable but could not collect.

import gc

print(gc.garbage)

In modern CPython, this list is usually empty unless debugging flags or special extension types are involved.

Historically, cycles with finalizers often ended up here. Modern finalization rules reduced that problem.

Still, extension types with incorrect GC support can create uncollectable objects.

55.15 Debug Flags

The gc module supports debug flags.

Example:

import gc

gc.set_debug(gc.DEBUG_STATS)
gc.collect()
gc.set_debug(0)

Common flags include:

FlagMeaning
DEBUG_STATSPrint collection statistics
DEBUG_COLLECTABLEPrint collectable objects
DEBUG_UNCOLLECTABLEPrint uncollectable objects
DEBUG_SAVEALLSave unreachable objects in gc.garbage instead of freeing them

DEBUG_SAVEALL is useful for postmortem analysis:

import gc

gc.set_debug(gc.DEBUG_SAVEALL)
gc.collect()

for obj in gc.garbage:
    print(type(obj))

Clear gc.garbage after inspection to release objects.

55.16 Freezing Objects

gc.freeze() moves currently tracked objects into a permanent generation.

import gc

gc.freeze()

This is useful in some process-forking scenarios. If many long-lived objects are frozen before fork(), the collector avoids touching them later, which can help preserve copy-on-write memory sharing.

Related functions:

gc.freeze()
gc.unfreeze()
gc.get_freeze_count()

This feature targets specialized runtime and deployment cases, not ordinary application logic.

55.17 Frames and Cycles

Frames are common sources of cycles.

Example:

import sys

def f():
    frame = sys._getframe()
    return frame

frame = f()

The frame references its locals. The local variable frame references the frame itself.

Conceptually:

frame
locals
frame

Tracebacks also keep frames alive:

try:
    1 / 0
except Exception as exc:
    saved = exc

The exception holds a traceback. The traceback holds frames. Frames hold locals. This can retain large object graphs.

A cleanup pattern:

try:
    1 / 0
except Exception as exc:
    # inspect exc
    pass
finally:
    exc = None

In modern Python, exception variables in except blocks are cleared automatically after the block, but saved exceptions can still retain tracebacks.

55.18 Extension Types and GC Support

C extension types that contain Python object references must cooperate with the cyclic collector.

A container extension type usually needs:

tp_traverse
tp_clear
Py_TPFLAGS_HAVE_GC
GC-aware allocation
GC-aware deallocation

Conceptual responsibilities:

Slot or mechanismPurpose
tp_traverseVisit contained references
tp_clearBreak internal references during collection
GC allocationRegister object with collector
GC deallocationUntrack object before destruction

If an extension type stores Python object pointers but does not expose them to the collector, cycles involving that type may leak.

Simplified C shape:

static int
Node_traverse(NodeObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->child);
    return 0;
}

static int
Node_clear(NodeObject *self)
{
    Py_CLEAR(self->child);
    return 0;
}

This pattern lets the collector discover and break cycles.

55.19 tp_traverse

tp_traverse tells the collector which Python objects are reachable from a container object.

Conceptually:

for each PyObject* field inside this object:
    visit(field)

The collector uses this to build reachability information.

For a tree node extension type:

typedef struct {
    PyObject_HEAD
    PyObject *left;
    PyObject *right;
    PyObject *value;
} NodeObject;

Traversal should visit all Python object fields:

static int
Node_traverse(NodeObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->left);
    Py_VISIT(self->right);
    Py_VISIT(self->value);
    return 0;
}

Missing one field can produce incorrect collection behavior.

55.20 tp_clear

tp_clear breaks references during cycle collection.

static int
Node_clear(NodeObject *self)
{
    Py_CLEAR(self->left);
    Py_CLEAR(self->right);
    Py_CLEAR(self->value);
    return 0;
}

Py_CLEAR sets the pointer to NULL before decrementing the reference. This is safer than a plain Py_DECREF when breaking cycles because finalization may re-enter code.

The collector calls tp_clear on unreachable containers to break internal reference cycles and allow reference counts to fall to zero.

55.21 Disabling GC

You can disable automatic cyclic collection:

import gc

gc.disable()
try:
    # critical allocation-heavy section
    pass
finally:
    gc.enable()

This does not disable reference counting.

Objects whose reference counts reach zero are still destroyed.

Disabling cyclic GC can help in specialized workloads where:

object graphs are known to be acyclic
latency spikes from GC are unacceptable
manual collection points are controlled

It can also cause memory growth if cycles are created.

55.22 Memory Diagnostics Pattern

A practical leak investigation often follows this pattern:

import gc
import weakref

class Node:
    pass

obj = Node()
ref = weakref.ref(obj)

del obj
gc.collect()

if ref() is not None:
    print("object is still alive")
    print(gc.get_referrers(ref()))

For container graphs:

import gc

gc.set_debug(gc.DEBUG_SAVEALL)
gc.collect()

for obj in gc.garbage:
    print(type(obj), repr(obj)[:80])

gc.garbage.clear()
gc.set_debug(0)

This is intrusive. Use it in tests, debugging sessions, or isolated reproduction scripts.

55.23 Performance Considerations

Cyclic GC has runtime cost because it must scan tracked containers.

Costs come from:

maintaining allocation counters
tracking container objects
traversing object graphs
handling finalizers
promoting survivors
clearing unreachable cycles

For most programs, default settings are sufficient.

Tune GC only with evidence:

latency measurements
allocation profiles
memory growth data
object graph analysis
production traces

Common mistakes include:

calling gc.collect() too frequently
disabling GC globally without cycle analysis
using get_referrers() in production paths
keeping frame objects alive accidentally
storing tracebacks indefinitely

55.24 Relationship to sys

sys.getrefcount() and gc often appear together.

import gc
import sys

x = []
print(sys.getrefcount(x))
print(gc.is_tracked(x))
print(gc.get_referents(x))

They expose different layers:

ModuleLayer
sys.getrefcount()Direct reference count
gc.is_tracked()Cyclic GC tracking state
gc.get_referents()Traversal graph
gc.collect()Cycle collection

Reference counts answer “how many references exist?”

GC traversal answers “what graph can this object reach?”

Both are CPython-specific enough that they should be used carefully in portable code.

55.25 Chapter Summary

The gc module exposes CPython’s cyclic garbage collector. CPython primarily uses reference counting, but reference counting cannot reclaim unreachable cycles. The cyclic collector tracks container objects, traverses object graphs, detects unreachable groups, and breaks cycles so objects can be destroyed.

For normal application code, gc is rarely needed. For CPython internals, extension modules, memory leak debugging, frame and traceback analysis, and long-running systems, it is essential.