# 55. `gc`

# 55. `gc`

The `gc` module exposes CPython’s cyclic garbage collector. It gives Python code a way to inspect, configure, disable, enable, and manually run the collector that handles unreachable reference cycles.

CPython’s main memory management mechanism is reference counting. Most objects are destroyed as soon as their reference count reaches zero. The cyclic garbage collector exists for the cases reference counting cannot solve: groups of objects that keep each other alive even though no live outside reference can reach them.

## 55.1 Why CPython Needs `gc`

Reference counting handles simple object lifetime well.

```python
x = []
y = x

del x
del y
```

After the last reference disappears, the list can be destroyed immediately.

Cycles are different:

```python
a = []
b = []

a.append(b)
b.append(a)

del a
del b
```

The two lists still reference each other. Their reference counts remain nonzero, but the program can no longer reach them.

Conceptually:

```text
outside references: none

list A → list B
list B → list A
```

Reference counting alone cannot prove this group is garbage. The cyclic collector finds such groups.

## 55.2 What the Collector Tracks

The cyclic collector tracks container objects.

A tracked object is an object that may contain references to other Python objects.

Common tracked objects include:

| Object kind | Usually tracked? | Reason |
|---|---:|---|
| `list` | Yes | Contains references |
| `dict` | Sometimes | Can contain references |
| `set` | Yes | Contains references |
| `tuple` | Sometimes | Depends on contents |
| Function | Yes | References globals, defaults, closure |
| Class | Yes | References dict, bases, descriptors |
| Frame | Yes | References locals, stack, globals |
| Integer | No | Does not contain object references |
| Many strings | No | Does not contain object references |

The collector does not need to track every object. An object that cannot point to another object cannot participate in a reference cycle.

## 55.3 The `gc` Module Interface

The main control functions are:

| Function | Purpose |
|---|---|
| `gc.enable()` | Enable automatic collection |
| `gc.disable()` | Disable automatic collection |
| `gc.isenabled()` | Return whether automatic collection is enabled |
| `gc.collect()` | Run a collection manually |
| `gc.get_count()` | Return allocation counters |
| `gc.get_threshold()` | Return collection thresholds |
| `gc.set_threshold()` | Change thresholds |
| `gc.get_objects()` | Return tracked objects |
| `gc.is_tracked()` | Test if an object is tracked |
| `gc.get_referrers()` | Find objects referring to an object |
| `gc.get_referents()` | Find objects referenced by an object |

Example:

```python
import gc

print(gc.isenabled())
print(gc.get_count())
print(gc.get_threshold())

collected = gc.collect()
print(collected)
```

The module is mostly diagnostic and control-oriented. Normal Python programs rarely need to call it directly.

## 55.4 Reference Counting and Cyclic GC Together

The two systems cooperate.

```text
reference counting
    handles ordinary lifetime immediately

cyclic GC
    periodically detects unreachable cycles
```

For most short-lived objects, reference counting does all the work. The cyclic collector runs less frequently and focuses on container objects.

Example:

```python
def f():
    xs = [1, 2, 3]
    return sum(xs)
```

The list `xs` usually disappears as soon as the function returns because its reference count drops to zero.

A cycle may survive longer:

```python
def f():
    xs = []
    xs.append(xs)
```

When `f()` returns, `xs` has no external reference, but it references itself. The cyclic collector must clean it later.

## 55.5 Generations

Traditional CPython cyclic GC uses generations.

The practical idea is simple: most objects die young. Recently allocated objects are checked more often than older objects.

Conceptually:

```text
generation 0: young objects
generation 1: older objects
generation 2: oldest objects
```

A collection of generation 0 checks young objects. Survivors may be promoted. Older generations are collected less frequently.

This reduces overhead because the collector avoids scanning all tracked objects every time.

The public API exposes generation-oriented functions:

```python
import gc

gc.collect(0)  # collect youngest generation
gc.collect(1)
gc.collect(2)  # collect full set in traditional model
```

Exact generation behavior is version-dependent. Treat the public API as stable, but avoid depending on internal generation details.

## 55.6 Thresholds

Automatic collection is driven by allocation counters and thresholds.

```python
import gc

print(gc.get_count())
print(gc.get_threshold())
```

`gc.get_count()` returns counters for tracked allocations.

`gc.get_threshold()` returns threshold values used to decide when collection should run.

Example:

```python
import gc

old = gc.get_threshold()
gc.set_threshold(1000, 10, 10)

try:
    # workload here
    pass
finally:
    gc.set_threshold(*old)
```

Changing thresholds affects when automatic cyclic collection happens. It does not disable reference counting.

## 55.7 Manual Collection

`gc.collect()` forces collection.

```python
import gc

n = gc.collect()
print(n)
```

The return value is the number of unreachable objects collected.

Manual collection is useful for:

```text
tests that assert cleanup
memory diagnostics
long-running batch phases
debugging cycles
embedding scenarios
```

It is usually a poor substitute for fixing object lifetime problems.

A common pattern in tests:

```python
import gc
import weakref

class Node:
    pass

obj = Node()
ref = weakref.ref(obj)

del obj
gc.collect()

assert ref() is None
```

## 55.8 Tracked vs Untracked Objects

You can ask whether an object is tracked:

```python
import gc

print(gc.is_tracked([]))
print(gc.is_tracked(123))
print(gc.is_tracked("abc"))
```

A list is usually tracked. An integer usually is not.

Some containers may be untracked as an optimization when they only contain atomic objects.

Example:

```python
import gc

t = (1, 2, 3)
print(gc.is_tracked(t))
```

The exact result may vary depending on interpreter version and collector state.

Do not write program logic that depends on `gc.is_tracked()` for ordinary application behavior. Use it for diagnostics.

## 55.9 Referents

`gc.get_referents(obj)` returns objects directly referenced by `obj`.

Example:

```python
import gc

xs = [1, "a", {}]

print(gc.get_referents(xs))
```

For a list, referents are its elements.

For a function, referents may include its globals, defaults, annotations, code object, closure, and other internal fields.

This function exposes the object graph from the perspective of CPython’s traversal protocol.

Conceptually:

```text
object
    ↓
objects it directly points to
```

The collector uses similar traversal logic internally.

## 55.10 Referrers

`gc.get_referrers(obj)` returns objects that directly refer to `obj`.

Example:

```python
import gc

x = []
holder = {"x": x}

refs = gc.get_referrers(x)
print(any(r is holder for r in refs))
```

This is useful but dangerous.

The returned referrers may include:

```text
current stack frames
locals dictionaries
temporary objects created by inspection
internal interpreter structures
test harness objects
debugger objects
```

Because calling `get_referrers()` itself creates references, results can be noisy.

It is primarily a debugging tool.

## 55.11 Object Graph Traversal

The cyclic collector sees memory as a graph.

```text
object A → object B
object B → object C
object C → object A
```

A cycle becomes collectable only when no live root can reach it.

Roots include active frames, module globals, thread state, interpreter state, and objects reachable from them.

Simplified model:

```text
live roots
    ↓
reachable objects survive

unreachable tracked group
    ↓
candidate garbage
```

The collector must distinguish internal references within an unreachable group from external references coming from live objects.

## 55.12 Finalizers

Objects may define finalizers.

```python
class FileLike:
    def __del__(self):
        print("closing")
```

Finalizers complicate cycle collection because destruction order matters.

Example:

```python
class Node:
    def __del__(self):
        print("finalize")

a = Node()
b = Node()

a.other = b
b.other = a

del a
del b
```

Modern CPython can handle many cycles with finalizers, but finalization remains a subtle area. The safe rule is simple: avoid depending on `__del__` for important resource cleanup.

Use context managers instead:

```python
with open("data.txt") as f:
    data = f.read()
```

Context managers make cleanup explicit and deterministic.

## 55.13 Weak References

Weak references help avoid ownership cycles.

Example:

```python
import weakref

class Parent:
    pass

class Child:
    pass

parent = Parent()
child = Child()

child.parent = weakref.ref(parent)
```

The weak reference does not increase the parent’s reference count. This helps with graphs such as:

```text
parent → child
child → parent weakly
```

Weak references are common in caches, observer lists, parent pointers, memoization tables, and object registries.

## 55.14 `gc.garbage`

`gc.garbage` contains objects the collector found unreachable but could not collect.

```python
import gc

print(gc.garbage)
```

In modern CPython, this list is usually empty unless debugging flags or special extension types are involved.

Historically, cycles with finalizers often ended up here. Modern finalization rules reduced that problem.

Still, extension types with incorrect GC support can create uncollectable objects.

## 55.15 Debug Flags

The `gc` module supports debug flags.

Example:

```python
import gc

gc.set_debug(gc.DEBUG_STATS)
gc.collect()
gc.set_debug(0)
```

Common flags include:

| Flag | Meaning |
|---|---|
| `DEBUG_STATS` | Print collection statistics |
| `DEBUG_COLLECTABLE` | Print collectable objects |
| `DEBUG_UNCOLLECTABLE` | Print uncollectable objects |
| `DEBUG_SAVEALL` | Save unreachable objects in `gc.garbage` instead of freeing them |

`DEBUG_SAVEALL` is useful for postmortem analysis:

```python
import gc

gc.set_debug(gc.DEBUG_SAVEALL)
gc.collect()

for obj in gc.garbage:
    print(type(obj))
```

Clear `gc.garbage` after inspection to release objects.

## 55.16 Freezing Objects

`gc.freeze()` moves currently tracked objects into a permanent generation.

```python
import gc

gc.freeze()
```

This is useful in some process-forking scenarios. If many long-lived objects are frozen before `fork()`, the collector avoids touching them later, which can help preserve copy-on-write memory sharing.

Related functions:

```python
gc.freeze()
gc.unfreeze()
gc.get_freeze_count()
```

This feature targets specialized runtime and deployment cases, not ordinary application logic.

## 55.17 Frames and Cycles

Frames are common sources of cycles.

Example:

```python
import sys

def f():
    frame = sys._getframe()
    return frame

frame = f()
```

The frame references its locals. The local variable `frame` references the frame itself.

Conceptually:

```text
frame
    ↓
locals
    ↓
frame
```

Tracebacks also keep frames alive:

```python
try:
    1 / 0
except Exception as exc:
    saved = exc
```

The exception holds a traceback. The traceback holds frames. Frames hold locals. This can retain large object graphs.

A cleanup pattern:

```python
try:
    1 / 0
except Exception as exc:
    # inspect exc
    pass
finally:
    exc = None
```

In modern Python, exception variables in `except` blocks are cleared automatically after the block, but saved exceptions can still retain tracebacks.

## 55.18 Extension Types and GC Support

C extension types that contain Python object references must cooperate with the cyclic collector.

A container extension type usually needs:

```text
tp_traverse
tp_clear
Py_TPFLAGS_HAVE_GC
GC-aware allocation
GC-aware deallocation
```

Conceptual responsibilities:

| Slot or mechanism | Purpose |
|---|---|
| `tp_traverse` | Visit contained references |
| `tp_clear` | Break internal references during collection |
| GC allocation | Register object with collector |
| GC deallocation | Untrack object before destruction |

If an extension type stores Python object pointers but does not expose them to the collector, cycles involving that type may leak.

Simplified C shape:

```c
static int
Node_traverse(NodeObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->child);
    return 0;
}

static int
Node_clear(NodeObject *self)
{
    Py_CLEAR(self->child);
    return 0;
}
```

This pattern lets the collector discover and break cycles.

## 55.19 `tp_traverse`

`tp_traverse` tells the collector which Python objects are reachable from a container object.

Conceptually:

```text
for each PyObject* field inside this object:
    visit(field)
```

The collector uses this to build reachability information.

For a tree node extension type:

```c
typedef struct {
    PyObject_HEAD
    PyObject *left;
    PyObject *right;
    PyObject *value;
} NodeObject;
```

Traversal should visit all Python object fields:

```c
static int
Node_traverse(NodeObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->left);
    Py_VISIT(self->right);
    Py_VISIT(self->value);
    return 0;
}
```

Missing one field can produce incorrect collection behavior.

## 55.20 `tp_clear`

`tp_clear` breaks references during cycle collection.

```c
static int
Node_clear(NodeObject *self)
{
    Py_CLEAR(self->left);
    Py_CLEAR(self->right);
    Py_CLEAR(self->value);
    return 0;
}
```

`Py_CLEAR` sets the pointer to `NULL` before decrementing the reference. This is safer than a plain `Py_DECREF` when breaking cycles because finalization may re-enter code.

The collector calls `tp_clear` on unreachable containers to break internal reference cycles and allow reference counts to fall to zero.

## 55.21 Disabling GC

You can disable automatic cyclic collection:

```python
import gc

gc.disable()
try:
    # critical allocation-heavy section
    pass
finally:
    gc.enable()
```

This does not disable reference counting.

Objects whose reference counts reach zero are still destroyed.

Disabling cyclic GC can help in specialized workloads where:

```text
object graphs are known to be acyclic
latency spikes from GC are unacceptable
manual collection points are controlled
```

It can also cause memory growth if cycles are created.

## 55.22 Memory Diagnostics Pattern

A practical leak investigation often follows this pattern:

```python
import gc
import weakref

class Node:
    pass

obj = Node()
ref = weakref.ref(obj)

del obj
gc.collect()

if ref() is not None:
    print("object is still alive")
    print(gc.get_referrers(ref()))
```

For container graphs:

```python
import gc

gc.set_debug(gc.DEBUG_SAVEALL)
gc.collect()

for obj in gc.garbage:
    print(type(obj), repr(obj)[:80])

gc.garbage.clear()
gc.set_debug(0)
```

This is intrusive. Use it in tests, debugging sessions, or isolated reproduction scripts.

## 55.23 Performance Considerations

Cyclic GC has runtime cost because it must scan tracked containers.

Costs come from:

```text
maintaining allocation counters
tracking container objects
traversing object graphs
handling finalizers
promoting survivors
clearing unreachable cycles
```

For most programs, default settings are sufficient.

Tune GC only with evidence:

```text
latency measurements
allocation profiles
memory growth data
object graph analysis
production traces
```

Common mistakes include:

```text
calling gc.collect() too frequently
disabling GC globally without cycle analysis
using get_referrers() in production paths
keeping frame objects alive accidentally
storing tracebacks indefinitely
```

## 55.24 Relationship to `sys`

`sys.getrefcount()` and `gc` often appear together.

```python
import gc
import sys

x = []
print(sys.getrefcount(x))
print(gc.is_tracked(x))
print(gc.get_referents(x))
```

They expose different layers:

| Module | Layer |
|---|---|
| `sys.getrefcount()` | Direct reference count |
| `gc.is_tracked()` | Cyclic GC tracking state |
| `gc.get_referents()` | Traversal graph |
| `gc.collect()` | Cycle collection |

Reference counts answer “how many references exist?”

GC traversal answers “what graph can this object reach?”

Both are CPython-specific enough that they should be used carefully in portable code.

## 55.25 Chapter Summary

The `gc` module exposes CPython’s cyclic garbage collector. CPython primarily uses reference counting, but reference counting cannot reclaim unreachable cycles. The cyclic collector tracks container objects, traverses object graphs, detects unreachable groups, and breaks cycles so objects can be destroyed.

For normal application code, `gc` is rarely needed. For CPython internals, extension modules, memory leak debugging, frame and traceback analysis, and long-running systems, it is essential.
