# 10. The Garbage Collector

# 10. The Garbage Collector

CPython uses reference counting as its main memory management mechanism. Reference counting destroys most objects as soon as their last strong reference disappears.

Reference counting has one major limitation: it cannot reclaim reference cycles by itself.

The garbage collector exists to find unreachable cycles of container objects and reclaim them. It is a supplement to reference counting, not a replacement for it.

## 10.1 Why Reference Counting Needs Help

A reference count reaches zero only when no strong references point to an object.

This works for ordinary object graphs:

```python
x = []
del x
```

The list loses its final reference and can be destroyed.

It does not work for cycles:

```python
a = []
b = []

a.append(b)
b.append(a)

del a
del b
```

After the two names are deleted, the lists still reference each other:

```text
list A ---> list B
list B ---> list A
```

Their reference counts remain nonzero. But no live Python code can reach them.

Reference counting sees local ownership. Garbage collection sees reachability.

## 10.2 What the Garbage Collector Tracks

CPython does not need to track every object in the cyclic garbage collector.

Objects that cannot contain references to other Python objects cannot form cycles by themselves. Examples include many integers, floats, and simple strings.

The collector mainly tracks container-like objects:

```text
list
dict
set
tuple containing references
function
class
instance
frame
generator
coroutine
traceback
some extension objects
```

An object needs GC tracking when it can participate in a cycle.

For example:

```python
x = 123
```

The integer object does not point to other Python objects in a way that creates container cycles.

But:

```python
x = []
x.append(x)
```

The list points to itself. This is a cycle.

## 10.3 Self-Cycles

The simplest reference cycle is a self-cycle:

```python
x = []
x.append(x)
```

The structure is:

```text
x ---> list
       ^  |
       |  |
       +--+
```

Now delete the external name:

```python
del x
```

The list still contains a reference to itself. Its reference count stays above zero.

No program variable can reach it, but reference counting alone cannot destroy it. The cyclic garbage collector must detect it.

## 10.4 Multi-Object Cycles

Cycles often involve multiple objects.

```python
class Node:
    def __init__(self):
        self.parent = None
        self.children = []

root = Node()
child = Node()

root.children.append(child)
child.parent = root

del root
del child
```

The graph becomes unreachable, but references remain inside it:

```text
root node ---> children list ---> child node
    ^                              |
    |                              |
    +----------- parent -----------+
```

This pattern is common in trees, graphs, object models, ASTs, DOMs, caches, frames, closures, and exception tracebacks.

## 10.5 The Collector Works on Containers

The cyclic collector only needs to reason about objects that can point to other objects.

It asks each tracked object:

```text
Which Python objects do you reference?
```

At the C level, extension types answer through traversal support. A GC-aware type provides a traversal function that visits contained Python references.

Conceptually:

```c
static int
Node_traverse(NodeObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->parent);
    Py_VISIT(self->children);
    return 0;
}
```

The collector uses this to walk object graphs.

If an extension object owns Python references and can be part of a cycle, it must participate in this protocol. Otherwise, it can leak cycles.

## 10.6 Tracked and Untracked Objects

An object may be tracked or untracked by the cyclic collector.

Tracked means the collector may inspect it during a collection.

Untracked means the collector ignores it for cycle detection.

You can inspect this from Python:

```python
import gc

print(gc.is_tracked([]))
print(gc.is_tracked(123))
print(gc.is_tracked("hello"))
```

Typical output on CPython may show that containers are tracked and atomic objects are not. Exact results can vary because CPython applies optimizations. For example, some containers may become untracked when they contain only atomic objects.

The important rule is conceptual:

```text
objects that can participate in cycles need tracking
objects that cannot participate in cycles usually do not
```

## 10.7 Generational Collection

CPython’s cyclic garbage collector is generational.

The idea is based on a common observation: most objects die young. Objects that survive several collections are likely to live longer.

A generational collector groups tracked objects by age. Younger generations are collected more often. Older generations are collected less often.

Conceptually:

```text
generation 0
    newest tracked objects
    collected most often

generation 1
    objects that survived earlier collection
    collected less often

generation 2
    older tracked objects
    collected least often
```

The exact generation design can change across CPython versions. The useful mental model is that CPython avoids scanning all tracked containers on every collection.

## 10.8 Collection Thresholds

The `gc` module exposes thresholds:

```python
import gc

print(gc.get_threshold())
```

Thresholds help decide when automatic cyclic collection runs.

You can change them:

```python
gc.set_threshold(700, 10, 10)
```

You can force a collection:

```python
gc.collect()
```

You can disable automatic cyclic collection:

```python
gc.disable()
```

and re-enable it:

```python
gc.enable()
```

Disabling the collector does not disable reference counting. Objects whose reference counts reach zero are still destroyed. Disabling the collector only disables automatic cycle detection.

## 10.9 How Cycle Detection Works

The collector does not simply look for objects with nonzero reference counts. Almost every live object has a nonzero reference count.

Instead, it computes whether the references keeping a group alive come only from inside that group.

A simplified cycle detection process:

```text
select tracked candidate objects
copy each object's reference count into a temporary field
for each reference from candidate object to candidate object:
    subtract one from the target's temporary count
objects with temporary count still positive are reachable from outside
propagate reachability from those externally reachable objects
objects never reached are unreachable cycles
```

Example:

```text
outside ---> A ---> B
             ^     |
             |     v
             D <--- C
```

Even if `A`, `B`, `C`, and `D` form a cycle, the outside reference to `A` makes the whole group reachable.

But:

```text
A ---> B
^     |
|     v
D <--- C
```

with no outside reference is collectible.

## 10.10 Reachable Cycles Are Not Garbage

A cycle is not automatically garbage.

```python
a = []
a.append(a)

print(a)
```

This object is cyclic, but reachable through the name `a`. It must stay alive.

The collector only reclaims unreachable cycles.

This distinction matters for data structures. Cyclic graphs are normal and valid in Python. The collector exists so they can be used safely without manual breakage in common cases.

## 10.11 Finalizers

Finalizers complicate garbage collection.

A finalizer is usually a `__del__` method:

```python
class Resource:
    def __del__(self):
        print("destroying")
```

Finalizers run during object destruction. They can execute Python code. That code can access globals, mutate state, acquire locks, create objects, or even resurrect the object being finalized.

Object resurrection means a finalizer makes the object reachable again:

```python
saved = None

class Resurrect:
    def __del__(self):
        global saved
        saved = self
```

This makes collection more complex.

Modern CPython has specific rules for finalizing cyclic garbage, but the practical guidance is simple:

```text
avoid complex __del__ methods
prefer context managers
prefer weakref.finalize for cleanup hooks
```

## 10.12 Context Managers Are Better for Resources

Garbage collection is not a resource management API.

Use `with` for deterministic cleanup:

```python
with open("data.txt") as f:
    data = f.read()
```

This closes the file when the block exits.

Do not rely on garbage collection timing:

```python
f = open("data.txt")
data = f.read()
f = None
```

In CPython, reference counting may close the file quickly. In other implementations, cleanup may happen later.

For locks, sockets, files, transactions, temporary directories, and external handles, use explicit lifetime control.

## 10.13 `gc.collect`

`gc.collect()` forces a cyclic collection.

```python
import gc

n = gc.collect()
print(n)
```

The return value is the number of unreachable objects found and collected.

You can request a specific generation in versions that support that interface:

```python
gc.collect(0)
gc.collect(1)
gc.collect(2)
```

Manual collection is useful for:

```text
tests
debugging leaks
interactive experiments
memory-sensitive batch phases
controlled benchmarks
```

It should rarely be needed in ordinary application code.

## 10.14 `gc.get_objects`

`gc.get_objects()` returns tracked objects known to the collector.

```python
import gc

objs = gc.get_objects()
print(len(objs))
```

This does not return every live Python object. It returns objects tracked by the cyclic garbage collector.

Many atomic objects may be absent.

Use it for debugging object graphs and memory leaks, not for normal application logic.

## 10.15 `gc.get_referrers`

`gc.get_referrers(obj)` returns objects that directly refer to `obj`.

```python
import gc

x = []
refs = gc.get_referrers(x)
print(refs)
```

This can help explain why an object remains alive.

But it must be used carefully. Calling debugging functions creates temporary references and frames. These may appear in results.

`gc.get_referrers` also exposes implementation details. It can show stack frames, dictionaries, lists, and internal objects.

## 10.16 `gc.get_referents`

`gc.get_referents(obj)` returns objects directly referenced by `obj`.

```python
import gc

x = [[1], [2]]
print(gc.get_referents(x))
```

For a list, this includes its elements.

For a dict, this may include keys and values.

This function uses the same traversal support that the collector uses. It sees GC-level referents, not necessarily all semantic relationships a programmer may imagine.

## 10.17 Uncollectable Garbage

Some objects may be found unreachable but not immediately collectable under certain conditions.

Historically, cycles containing finalizers were especially difficult. Modern CPython handles many of these cases better, but uncollectable objects can still appear with extension types or unusual finalization behavior.

The `gc` module exposes:

```python
import gc

print(gc.garbage)
```

When debugging, you can enable debug flags:

```python
gc.set_debug(gc.DEBUG_SAVEALL)
```

With `DEBUG_SAVEALL`, unreachable objects are saved in `gc.garbage` instead of being freed. This is useful for inspection, but it intentionally leaks until you clear the list.

## 10.18 Weak References

Weak references allow observing an object without keeping it alive.

```python
import weakref

class User:
    pass

u = User()
r = weakref.ref(u)

print(r())      # object

del u

print(r())      # None
```

The weak reference does not increment the target’s strong reference count.

Weak references are useful for caches, observer lists, parent links, and auxiliary metadata.

A parent pointer can often be weak:

```python
import weakref

class Node:
    def __init__(self):
        self.children = []
        self.parent = None

root = Node()
child = Node()

child.parent = weakref.ref(root)
root.children.append(child)
```

This avoids forming a strong cycle through parent and child links.

## 10.19 `weakref.finalize`

`weakref.finalize` registers cleanup code without placing cleanup logic directly in `__del__`.

```python
import weakref

class Resource:
    pass

def cleanup(name):
    print("cleaning", name)

r = Resource()
finalizer = weakref.finalize(r, cleanup, "resource")
```

When `r` becomes unreachable, the finalizer can run.

This is often safer than writing complex `__del__` methods because the finalizer separates cleanup state from the object being finalized.

Still, external resources should generally be managed with context managers when possible.

## 10.20 Frames, Tracebacks, and Cycles

Frames and tracebacks often create cycles.

Example:

```python
def f():
    x = []
    raise RuntimeError

try:
    f()
except RuntimeError as exc:
    saved = exc
```

An exception can hold a traceback. The traceback can hold frames. Frames hold local variables. Local variables may hold the exception or objects related to it.

Conceptually:

```text
exception
    traceback
        frame
            locals
                exception
```

This can keep large object graphs alive.

Modern Python clears some exception state more aggressively than older versions, but stored exceptions and tracebacks can still retain memory.

Common mitigation:

```python
try:
    ...
except Exception as exc:
    ...
finally:
    exc = None
```

or avoid storing tracebacks longer than necessary.

## 10.21 Closures and Cycles

Closures can also create cycles.

```python
def make_func():
    items = []

    def add(x):
        items.append(x)
        return items

    return add
```

The inner function references a cell. The cell references `items`.

More complex closures can reference objects that reference the function back.

Cycles through functions, closures, instances, and callbacks are common in real programs.

The collector handles many of them, but understanding the graph helps when memory grows unexpectedly.

## 10.22 Extension Types and GC Support

A C extension type that owns references to Python objects may need cyclic GC support.

Required pieces often include:

```text
Py_TPFLAGS_HAVE_GC
tp_traverse
tp_clear
GC-aware allocation
GC-aware deallocation
```

A traversal function tells the collector what objects this object references:

```c
static int
Box_traverse(BoxObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->value);
    return 0;
}
```

A clear function breaks references during collection:

```c
static int
Box_clear(BoxObject *self)
{
    Py_CLEAR(self->value);
    return 0;
}
```

The deallocator must untrack the object and clear owned references safely.

Conceptual shape:

```c
static void
Box_dealloc(BoxObject *self)
{
    PyObject_GC_UnTrack(self);
    Box_clear(self);
    Py_TYPE(self)->tp_free((PyObject *)self);
}
```

This is simplified. Real extension code must follow the exact C API requirements.

## 10.23 `Py_VISIT`

`Py_VISIT` is used inside traversal functions.

```c
static int
Node_traverse(NodeObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->left);
    Py_VISIT(self->right);
    Py_VISIT(self->parent);
    return 0;
}
```

It checks whether the field is non-NULL and passes it to the visitor.

A correct traversal function must visit every contained Python object reference that can participate in cycles.

Missing one field can make cycles invisible to the collector.

## 10.24 `Py_CLEAR`

`Py_CLEAR` safely clears an owned reference.

A naive clear operation:

```c
Py_DECREF(self->value);
self->value = NULL;
```

has a subtle problem. The `Py_DECREF` can run arbitrary code through finalizers. That code might observe `self` before `self->value` has been set to `NULL`.

`Py_CLEAR` sets the field to `NULL` first, then decrements the old reference.

Conceptually:

```c
tmp = self->value;
self->value = NULL;
Py_XDECREF(tmp);
```

Use `Py_CLEAR` when breaking references inside containers or deallocators.

## 10.25 Collection and Performance

The cyclic collector has a cost. It scans tracked containers and follows references.

Most programs should leave it enabled. But some workloads may tune it:

```text
short-lived batch jobs
allocation-heavy parsers
large object graph construction
scientific pipelines
services with known allocation phases
```

Example:

```python
import gc

gc.disable()
try:
    build_large_graph()
finally:
    gc.enable()
    gc.collect()
```

This pattern can help when you know a phase creates many containers without cycles. It can also hurt if cycles accumulate.

Measure before tuning.

## 10.26 Memory Leaks vs Retained References

Not all memory growth is a leak in the C sense.

A program may retain references accidentally:

```python
cache = []

def handle(request):
    cache.append(request)
```

The collector cannot free reachable objects. If `cache` keeps growing, those objects are live.

A true C-level leak means memory or references are lost by the implementation.

A Python-level retention bug means the program still has reachable references to objects it no longer needs.

Debugging memory often starts by asking:

```text
Is the object unreachable but not collected?
Or is something still referring to it?
```

Use `gc.get_referrers`, tracemalloc, object graph tools, and heap inspection to answer that.

## 10.27 Common Cycle Patterns

Common sources of cycles include:

| Pattern                | Shape                                            |
| ---------------------- | ------------------------------------------------ |
| Parent and child links | parent -> child -> parent                        |
| Doubly linked lists    | node A -> node B -> node A                       |
| Graph structures       | arbitrary cycles                                 |
| Observer callbacks     | object -> callback -> bound method -> object     |
| Exceptions             | exception -> traceback -> frame -> locals        |
| Closures               | function -> closure cell -> object -> function   |
| Descriptors            | class -> descriptor -> class-related state       |
| C extension objects    | native object -> Python object -> native wrapper |

Cycles are normal. The issue is whether unreachable cycles are collectible and whether they contain external resources that need deterministic cleanup.

## 10.28 Practical Rules

For Python code:

| Situation              | Preferred approach                                 |
| ---------------------- | -------------------------------------------------- |
| External resource      | Use `with`                                         |
| Parent pointer         | Consider `weakref`                                 |
| Cache                  | Use `weakref.WeakValueDictionary` when appropriate |
| Long-lived exception   | Avoid retaining traceback unnecessarily            |
| Memory debugging       | Use `gc`, `tracemalloc`, and referrer inspection   |
| Cleanup hook           | Prefer context manager or `weakref.finalize`       |
| Large allocation phase | Tune GC only after measurement                     |

For C extension code:

| Situation                       | Required discipline               |
| ------------------------------- | --------------------------------- |
| Type owns Python references     | Implement proper deallocation     |
| Type can form cycles            | Add GC support                    |
| Type supports GC                | Implement `tp_traverse` correctly |
| Type participates in collection | Implement `tp_clear` correctly    |
| Breaking references             | Use `Py_CLEAR`                    |
| Traversing fields               | Use `Py_VISIT`                    |
| Deallocating GC object          | Untrack before clearing           |

## 10.29 Mental Model

Use this model:

```text
Reference counting handles local lifetime.
Cyclic GC handles unreachable container cycles.
The collector only sees tracked objects.
Tracked objects must describe their outgoing references.
Reachable cycles remain alive.
Unreachable cycles can be collected.
Finalizers and extension types complicate collection.
Resource lifetime should be explicit.
```

This model explains why most CPython objects disappear promptly, why some cyclic structures survive until a collection, why `__del__` needs care, and why C extension types must participate in the GC protocol when they own Python references.

## 10.30 Summary

CPython’s garbage collector exists because reference counting cannot reclaim cycles. It tracks container-like objects, analyzes references between them, finds unreachable groups, and clears them safely.

The collector is generational, configurable through the `gc` module, and deeply tied to the object model. Python programmers mainly need to understand cycles, finalizers, weak references, and resource lifetime. C extension authors must implement traversal and clearing correctly when their types can participate in cycles.

