# 4. Reading CPython C Code

Reading CPython C code requires two mental models at the same time.

The first model is ordinary C: structs, pointers, macros, function pointers, reference ownership, allocation, error returns, and conditional compilation.

The second model is Python’s runtime model: objects, types, frames, exceptions, reference counts, descriptors, iterators, modules, and bytecode.

Most CPython source files combine both. A line of C code may look like ordinary pointer manipulation, but it often encodes a Python language rule.

## 4.1 Start From the Runtime Invariant

The central invariant is simple:

```text id="vm1twr"
Every Python value is represented as a PyObject pointer or a pointer to a struct whose first field is compatible with PyObject.
```

Most CPython functions work with values through `PyObject *`.

```c id="x34zg4"
PyObject *obj;
```

This pointer may refer to an integer, list, function, class, module, exception, string, or any user-defined object.

The actual behavior comes from the object’s type:

```c id="y1r4zx"
Py_TYPE(obj)
```

The type determines which operations are valid and which C functions implement them.

## 4.2 Read `PyObject` First

A simplified object header looks like this:

```c id="ogd09v"
typedef struct {
    Py_ssize_t ob_refcnt;
    PyTypeObject *ob_type;
} PyObject;
```

Variable-sized objects extend this idea:

```c id="s6ik3g"
typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size;
} PyVarObject;
```

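Every object struct, whether for a list, tuple, string, bytes object, dict, or set, begins with the `PyObject` header. Variable-sized types such as lists, tuples, and bytes objects begin with the `PyVarObject` form. That lets CPython cast between specific object structs and `PyObject *`.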

Example shape:

```c id="05z3tr"
typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;
```

The macro `PyObject_VAR_HEAD` expands to the common object header plus a size field.

The important point is layout compatibility. CPython can pass a `PyListObject *` to generic object APIs by casting it to `PyObject *`.
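To make the layout argument concrete, here is a minimal standalone sketch of the same trick. Every `Toy*` and `toy_*` name is a hypothetical stand-in, not a CPython definition. The sketch relies only on the C rule that a pointer to a struct may be converted to a pointer to its first member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the object header; not the real PyObject. */
typedef struct {
    const char *name;
} ToyType;

typedef struct {
    long refcnt;
    ToyType *type;
} ToyObject;

typedef struct {
    ToyObject base;    /* header comes first, like ob_base */
    long size;         /* element count, like ob_size */
} ToyVarObject;

typedef struct {
    ToyVarObject var;  /* plays the role of PyObject_VAR_HEAD */
    int *items;
    long allocated;
} ToyList;

static ToyType toy_list_type = {"toylist"};

/* Generic code sees only the header, whatever the concrete struct is. */
static const char *
toy_type_name(ToyObject *obj)
{
    assert(obj != NULL && obj->type != NULL);
    return obj->type->name;
}

/* Generic size accessor, similar in spirit to Py_SIZE. */
static long
toy_size(ToyObject *obj)
{
    assert(offsetof(ToyVarObject, base) == 0);
    return ((ToyVarObject *)obj)->size;
}

/* A statically initialized "list" with three logical elements. */
static ToyList toy_example = {
    .var = {.base = {1, &toy_list_type}, .size = 3},
    .items = NULL,
    .allocated = 4,
};
```

Calling `toy_size((ToyObject *)&toy_example)` reads the size through the generic header pointer. This is the same shape of cast that CPython performs when it hands a `PyListObject *` to a generic API.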

## 4.3 Learn the Core Macros

CPython uses macros heavily. Do not skip them. Many simple-looking operations expand into important behavior.

Common object macros:

```c id="0tneoh"
Py_TYPE(obj)       /* get object type */
Py_SIZE(obj)       /* get variable object size */
Py_REFCNT(obj)     /* get reference count */
Py_INCREF(obj)     /* increment reference count */
Py_DECREF(obj)     /* decrement reference count */
Py_XINCREF(obj)    /* increment if not NULL */
Py_XDECREF(obj)    /* decrement if not NULL */
```

Type-checking macros:

```c id="rkdbnt"
PyLong_Check(obj)
PyUnicode_Check(obj)
PyList_Check(obj)
PyTuple_Check(obj)
PyDict_Check(obj)
```

Fast exact-type checks often use variants such as:

```c id="houo83"
PyLong_CheckExact(obj)
PyUnicode_CheckExact(obj)
PyList_CheckExact(obj)
```

The difference matters. `PyList_Check(obj)` accepts list subclasses. `PyList_CheckExact(obj)` accepts only exact built-in lists.
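The distinction can be sketched with a toy type hierarchy. The names below are hypothetical. The subclass-aware check here walks a base-pointer chain for clarity, whereas CPython's `PyList_Check` tests a flag bit stored on the type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced type record: a name plus a single base pointer. */
typedef struct ToyType {
    const char *name;
    const struct ToyType *base;   /* NULL at the root of the chain */
} ToyType;

typedef struct {
    const ToyType *type;
} ToyObject;

static const ToyType toy_object_type = {"object", NULL};
static const ToyType toy_list_type = {"list", &toy_object_type};
static const ToyType toy_mylist_type = {"MyList", &toy_list_type};

/* Exact check: a single pointer comparison. */
static int
toy_list_check_exact(const ToyObject *obj)
{
    assert(obj != NULL);
    return obj->type == &toy_list_type;
}

/* Subclass-aware check: accepts the list type and anything derived from it. */
static int
toy_list_check(const ToyObject *obj)
{
    assert(obj != NULL);
    for (const ToyType *t = obj->type; t != NULL; t = t->base) {
        if (t == &toy_list_type) {
            return 1;
        }
    }
    return 0;
}
```

An instance of the toy `MyList` passes `toy_list_check` but fails `toy_list_check_exact`. That is why fast paths that depend on unmodified built-in behavior use the exact form.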

## 4.4 Understand Reference Ownership

Reference ownership is the first major difficulty in CPython C code.

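A `PyObject *` that crosses a function boundary follows one of three ownership conventions. The first two describe return values. The third describes arguments passed to a few APIs, such as `PyList_SetItem` and `PyTuple_SetItem`: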

| Reference kind     | Meaning                                        |
| ------------------ | ---------------------------------------------- |
| New reference      | Caller owns it and must eventually `Py_DECREF` |
| Borrowed reference | Caller does not own it                         |
| Stolen reference   | Callee takes ownership from caller             |

This is not visible from the C type. Both new and borrowed references are just `PyObject *`.

You must know the API contract.

Example new reference:

```c id="4ox0zo"
PyObject *x = PyLong_FromLong(42);
/* use x */
Py_DECREF(x);
```

`PyLong_FromLong` returns a new reference.

Example borrowed reference:

```c id="6j9ceg"
PyObject *item = PyList_GetItem(list, 0);
/* do not Py_DECREF(item) */
```

`PyList_GetItem` returns a borrowed reference.

Example new reference from a generic API:

```c id="kqi42o"
PyObject *item = PySequence_GetItem(seq, 0);
if (item == NULL) {
    return NULL;  /* exception already set */
}
/* use item, then release the reference we own */
Py_DECREF(item);
```

`PySequence_GetItem` returns a new reference.

A correct reader asks for every `PyObject *`:

```text id="ifk3kl"
Who owns this reference?
Who must release it?
Can this pointer be NULL?
Can this call execute Python code?
Can this call mutate the container?
```
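The three conventions can be walked through in a small standalone model. This is a sketch with hypothetical `toy_*` names and a plain counter in place of CPython's object machinery. It only illustrates who is responsible for each release.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted value; a stand-in for PyObject. */
typedef struct {
    long refcnt;
    int value;
} ToyObj;

static long toy_live = 0;   /* number of objects not yet freed */

/* Returns a NEW reference: the caller must eventually call toy_decref. */
static ToyObj *
toy_new(int value)
{
    ToyObj *o = malloc(sizeof(ToyObj));
    if (o == NULL) {
        return NULL;
    }
    o->refcnt = 1;
    o->value = value;
    toy_live++;
    return o;
}

static void
toy_incref(ToyObj *o)
{
    o->refcnt++;
}

static void
toy_decref(ToyObj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        toy_live--;
        free(o);
    }
}

/* A one-slot container that owns one reference to its item. */
typedef struct {
    ToyObj *item;
} ToyBox;

/* STEALS the reference to item: the caller must not decref it afterwards. */
static void
toy_box_set_steal(ToyBox *box, ToyObj *item)
{
    ToyObj *old = box->item;
    box->item = item;
    if (old != NULL) {
        toy_decref(old);
    }
}

/* Returns a BORROWED reference: valid only while the box holds the item. */
static ToyObj *
toy_box_peek(ToyBox *box)
{
    return box->item;
}

/* Walks through all three conventions; returns the live-object count. */
static long
toy_ownership_demo(void)
{
    ToyBox box = {NULL};
    ToyObj *x = toy_new(42);             /* new: we own x */
    if (x == NULL) {
        return -1;
    }
    toy_box_set_steal(&box, x);          /* stolen: the box owns it now */
    ToyObj *seen = toy_box_peek(&box);   /* borrowed: no decref owed */
    toy_incref(seen);                    /* promote to an owned reference */
    toy_box_set_steal(&box, NULL);       /* the box drops its reference */
    int value = seen->value;             /* still valid: we own a reference */
    toy_decref(seen);                    /* release ours; the object is freed */
    return (value == 42) ? toy_live : -1;
}
```

`toy_ownership_demo()` ends with zero live objects because every owned reference is released exactly once: the box releases the one it stole, and the caller releases the one it promoted.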

## 4.5 Recognize Error Returns

CPython C APIs usually signal errors by returning sentinel values and setting an exception.

Common patterns:

| Return type       | Error value                      |
| ----------------- | -------------------------------- |
| `PyObject *`      | `NULL`                           |
| `int`             | `-1`                             |
| `Py_ssize_t`      | `-1`, often with exception check |
| pointer           | `NULL`                           |
| comparison result | `-1` may mean error              |

Example:

```c id="3udr6f"
PyObject *value = PyObject_GetAttrString(obj, "name");
if (value == NULL) {
    return NULL;
}
```

The exception has already been set. The caller usually propagates it by returning `NULL`.

For integer-like APIs, `-1` can be ambiguous. Some APIs require checking whether an exception occurred:

```c id="29oa6e"
Py_ssize_t n = PyLong_AsSsize_t(obj);
if (n == -1 && PyErr_Occurred()) {
    return NULL;
}
```

This pattern is needed because `-1` is also a valid result: the Python integer `-1` converts to the C value `-1` without any error.
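The ambiguity is easy to reproduce in a standalone sketch. A hypothetical global string pointer plays the role of the exception indicator, and `toy_as_long` plays the role of a converter that returns `-1` both as a value and as an error sentinel. None of these names exist in CPython.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-thread exception indicator. */
static const char *toy_error = NULL;

static void toy_set_error(const char *msg) { toy_error = msg; }
static bool toy_error_occurred(void) { return toy_error != NULL; }
static void toy_clear_error(void) { toy_error = NULL; }

/* Converts decimal text to long. On failure it sets the indicator and
   returns -1. Since -1 is also a legitimate result, the return value
   alone cannot tell the two cases apart. */
static long
toy_as_long(const char *text)
{
    char *end;
    assert(text != NULL);
    errno = 0;
    long v = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE) {
        toy_set_error("not an integer");
        return -1;
    }
    return v;
}

/* Returns 1 and stores the value on success, 0 on failure. */
static int
toy_parse_ok(const char *text, long *out)
{
    long v = toy_as_long(text);
    if (v == -1 && toy_error_occurred()) {
        return 0;   /* real failure: the indicator is set */
    }
    *out = v;       /* includes the valid value -1 */
    return 1;
}
```

The check is only reliable if the indicator was clear before the call, which is the state CPython code normally runs in.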

## 4.6 Trace the Exception State

CPython exceptions are stored in runtime thread state, not normally returned as C values.

When this Python code raises:

```python id="0j7gku"
raise ValueError("bad value")
```

CPython records an active exception. C functions then propagate failure by returning error sentinels.

Typical C pattern:

```c id="y9ubjz"
if (bad_condition) {
    PyErr_SetString(PyExc_ValueError, "bad value");
    return NULL;
}
```

A caller then does:

```c id="356gdj"
result = some_function();
if (result == NULL) {
    return NULL;
}
```

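No explicit exception object is passed through the C call stack in most cases. The exception is stored in the thread state, while `NULL` or `-1` carries control flow.

This explains why failing to check a return value can corrupt later execution: the exception stays set while the code continues as if the call had succeeded, and it surfaces later at an unrelated point.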

## 4.7 Read Cleanup Paths Carefully

Most nontrivial CPython C functions have multiple failure exits.

A common pattern:

```c id="u4ryic"
PyObject *a = NULL;
PyObject *b = NULL;
PyObject *result = NULL;

a = make_a();
if (a == NULL) {
    goto done;
}

b = make_b();
if (b == NULL) {
    goto done;
}

result = combine(a, b);  /* NULL if combine() failed */

done:  /* shared exit for success and failure */
Py_XDECREF(a);
Py_XDECREF(b);
return result;
```

This kind of code is not incidental. It encodes reference ownership.

When reading cleanup code, verify:

```text id="73pxzg"
Every owned reference is released exactly once.
Borrowed references are not decref'd.
Objects are still valid when used.
Error paths preserve the active exception.
Success paths return the correct ownership.
```

Many CPython bugs are reference-count bugs in uncommon failure paths.
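One way to gain confidence in a cleanup path is to force each allocation to fail in turn and count what is left over. The sketch below does that for a toy function written in the single-exit style. The failure-injection counter and all `toy_*` names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long refcnt;
    int value;
} ToyObj;                      /* hypothetical refcounted value */

static long toy_live = 0;      /* objects currently allocated */
static int toy_fail_at = 0;    /* make the Nth toy_new call fail (0 = never) */
static int toy_calls = 0;

static ToyObj *
toy_new(int value)
{
    if (++toy_calls == toy_fail_at) {
        return NULL;           /* simulated allocation failure */
    }
    ToyObj *o = malloc(sizeof(ToyObj));
    if (o == NULL) {
        return NULL;
    }
    o->refcnt = 1;
    o->value = value;
    toy_live++;
    return o;
}

/* NULL-tolerant release, in the spirit of Py_XDECREF. */
static void
toy_xdecref(ToyObj *o)
{
    if (o != NULL && --o->refcnt == 0) {
        toy_live--;
        free(o);
    }
}

/* Returns a new reference holding x + y, or NULL on failure. */
static ToyObj *
toy_sum(int x, int y)
{
    ToyObj *a = NULL, *b = NULL, *result = NULL;

    a = toy_new(x);
    if (a == NULL) {
        goto done;
    }
    b = toy_new(y);
    if (b == NULL) {
        goto done;
    }
    result = toy_new(a->value + b->value);   /* may also be NULL */

done:
    toy_xdecref(a);            /* temporaries are released on every path */
    toy_xdecref(b);
    return result;             /* ownership passes to the caller */
}

/* Runs toy_sum with the Nth allocation failing; returns leaked objects. */
static long
toy_leaks_when_failing_at(int n)
{
    toy_calls = 0;
    toy_fail_at = n;
    ToyObj *r = toy_sum(2, 3);
    assert(r == NULL || r->value == 5);
    toy_xdecref(r);
    return toy_live;
}
```

Calling `toy_leaks_when_failing_at` with 0, 1, 2, and 3 exercises the success path and each failure exit. A leak on any of them would show up as a nonzero result.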

## 4.8 Know When C Code Can Run Python Code

A C function call may execute arbitrary Python code.

Examples include:

```text id="s31qgj"
attribute access
method calls
comparisons
hashing
iteration
descriptor invocation
numeric operations
imports
finalizers
weakref callbacks
```

This matters because arbitrary Python code can mutate objects, release references, re-enter the interpreter, trigger garbage collection, or raise exceptions.

For example:

```c id="u11u69"
int equal = PyObject_RichCompareBool(a, b, Py_EQ);
```

This may call a user-defined `__eq__`. The result is `1` for true, `0` for false, and `-1` if the comparison raised.

Likewise:

```c id="kfptiw"
Py_hash_t h = PyObject_Hash(obj);
```

This may call user-defined `__hash__`.

When reading CPython C code, never assume an object remains unchanged across a call that can execute Python code unless the code owns the right references and has guarded its invariants.

## 4.9 Read Type Objects as Dispatch Tables

A `PyTypeObject` describes how a type behaves.

Simplified idea:

```c id="48r26q"
PyTypeObject PyList_Type = {
    .tp_name = "list",
    .tp_basicsize = sizeof(PyListObject),
    .tp_dealloc = list_dealloc,
    .tp_repr = list_repr,
    .tp_as_sequence = &list_as_sequence,
    .tp_methods = list_methods,
    .tp_new = PyType_GenericNew,
};
```

A type object contains slots for:

```text id="jf802y"
allocation
deallocation
attribute access
call behavior
numeric operations
sequence operations
mapping operations
iteration
methods
members
getters and setters
subclass behavior
```

Python syntax often maps to these slots.

| Python operation | Internal route                     |
| ---------------- | ---------------------------------- |
| `len(x)`         | sequence or mapping length slot    |
| `x[y]`           | mapping or sequence subscript slot |
| `x + y`          | numeric add slot                   |
| `x()`            | call slot                          |
| `iter(x)`        | iterator slot                      |
| `x.y`            | attribute access slot              |
| `repr(x)`        | repr slot                          |

So when reading a built-in type, first find its `PyTypeObject`. It acts as the table of contents for the implementation.
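The dispatch-table idea can be reduced to a few lines of plain C. The following sketch uses a hypothetical two-slot type record. Real type objects have many more slots and group some of them into sub-tables, but the calling pattern is the same: fetch the type, check the slot, call through the function pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyObject ToyObject;

/* Hypothetical, heavily reduced analogue of a type object. */
typedef struct {
    const char *name;
    long (*length)(ToyObject *self);        /* may be NULL: no len() */
    const char *(*repr)(ToyObject *self);   /* always set in this sketch */
} ToyType;

struct ToyObject {
    ToyType *type;
};

typedef struct {
    ToyObject base;
    long n;
} ToyRange;

static long
range_length(ToyObject *self)
{
    return ((ToyRange *)self)->n;
}

static const char *
range_repr(ToyObject *self)
{
    (void)self;
    return "<toyrange>";
}

static const char *
flag_repr(ToyObject *self)
{
    (void)self;
    return "<toyflag>";
}

static ToyType toy_range_type = {"toyrange", range_length, range_repr};
static ToyType toy_flag_type = {"toyflag", NULL, flag_repr};

/* Generic entry point, like len(x) in spirit. Returns -1 when the type
   has no length slot; real CPython would also set a TypeError. */
static long
toy_len(ToyObject *obj)
{
    assert(obj != NULL);
    if (obj->type->length == NULL) {
        return -1;
    }
    return obj->type->length(obj);
}

/* Generic entry point, like repr(x) in spirit. */
static const char *
toy_repr(ToyObject *obj)
{
    assert(obj != NULL);
    return obj->type->repr(obj);
}

static ToyRange toy_r = {{&toy_range_type}, 5};
static ToyObject toy_f = {&toy_flag_type};
```

`toy_len` never needs to know which concrete struct it received. Adding a new type means filling in another table, which is why the type object is the natural starting point when reading an implementation.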

## 4.10 Distinguish Generic APIs From Type-Specific APIs

CPython often has both generic object APIs and exact type APIs.

Generic API:

```c id="e77tm2"
PyObject_GetItem(obj, key)
PyObject_SetAttr(obj, name, value)
PyObject_Call(func, args, kwargs)
PyObject_RichCompare(a, b, Py_EQ)
```

These respect Python-level customization. They may call user code.

Type-specific API:

```c id="3m2zu8"
PyList_GET_ITEM(list, i)
PyTuple_GET_ITEM(tuple, i)
PyDict_GetItemWithError(dict, key)
```

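These assume a specific concrete type and bypass type-slot dispatch. A dictionary lookup can still run Python code, because it may call `__hash__` and `__eq__` on the key.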

Fast macros such as `PyList_GET_ITEM` can be unsafe if used with the wrong type or invalid index. They are fast because they skip checks.

When reading code, ask whether the function needs Python semantics or internal speed.

## 4.11 Understand Borrowed Pointers Into Containers

Some APIs expose borrowed references to objects stored inside containers.

Example:

```c id="9tw6vq"
PyObject *item = PyList_GetItem(list, i);
```

The returned `item` is valid only while the list keeps that reference alive.

If code later allows Python execution, the list could mutate and release the item. Safe code often increments the reference before such calls:

```c id="0wl8te"
PyObject *item = PyList_GetItem(list, i);  /* borrowed */
if (item == NULL) {
    return NULL;
}

Py_INCREF(item);
/* safe across calls that may mutate list */
...
Py_DECREF(item);
```

This pattern is fundamental. Borrowed references are efficient, but they require strict lifetime reasoning.
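The hazard and its fix can be modeled without the interpreter. In this sketch a callback stands in for "arbitrary Python code" and empties the container while the caller still holds a pointer to the item. All names are hypothetical. Only the safe variant is shown, since the unsafe one would read freed memory.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long refcnt;
    int value;
} ToyObj;                      /* hypothetical refcounted value */

typedef struct {
    ToyObj *slot;              /* the box owns one reference, or NULL */
} ToyBox;

static long toy_live = 0;

static ToyObj *
toy_new(int value)
{
    ToyObj *o = malloc(sizeof(ToyObj));
    if (o == NULL) {
        return NULL;
    }
    o->refcnt = 1;
    o->value = value;
    toy_live++;
    return o;
}

static void
toy_decref(ToyObj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        toy_live--;
        free(o);
    }
}

/* Stands in for arbitrary Python code: it empties the container. */
static void
toy_hostile_callback(ToyBox *box)
{
    ToyObj *old = box->slot;
    box->slot = NULL;
    if (old != NULL) {
        toy_decref(old);
    }
}

/* Reads the item's value after running a callback that may mutate box.
   Returns -1 if the box is empty. */
static int
toy_read_across_callback(ToyBox *box, void (*callback)(ToyBox *))
{
    ToyObj *item = box->slot;  /* borrowed */
    if (item == NULL) {
        return -1;
    }
    item->refcnt++;            /* now owned: survives the callback */
    callback(box);
    int value = item->value;   /* safe even though the box dropped it */
    toy_decref(item);
    return value;
}

/* Returns the value read (7) if nothing leaked, else -1. */
static int
toy_borrow_demo(void)
{
    ToyBox box = {toy_new(7)};
    if (box.slot == NULL) {
        return -1;
    }
    int value = toy_read_across_callback(&box, toy_hostile_callback);
    return (toy_live == 0 && box.slot == NULL) ? value : -1;
}
```

Without the `item->refcnt++` line, the callback would free the object and the later read of `item->value` would touch freed memory. That is the failure mode the incref guards against.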

## 4.12 Read Argument Parsing Code

C functions exposed to Python usually parse arguments with helper APIs or Argument Clinic generated code.

Manual style:

```c id="0jk1n5"
static PyObject *
mod_func(PyObject *self, PyObject *args)
{
    int n;

    if (!PyArg_ParseTuple(args, "i", &n)) {
        return NULL;
    }

    return PyLong_FromLong(n + 1);
}
```

Keyword style:

```c id="6r9q8i"
static PyObject *
mod_func(PyObject *self, PyObject *args, PyObject *kwargs)
{
    static char *kwlist[] = {"name", NULL};
    const char *name;

    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s", kwlist, &name)) {
        return NULL;
    }

    Py_RETURN_NONE;
}
```

Argument Clinic style generates much of this wrapper code. When you see generated blocks, identify the handwritten logic and separate it from generated parsing boilerplate.

## 4.13 Know the Common Return Helpers

CPython uses helper macros for common return values.

```c id="83y546"
Py_RETURN_NONE;
Py_RETURN_TRUE;
Py_RETURN_FALSE;
```

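These return a new reference to the singleton. Since Python 3.12 the singletons are immortal, so the macros return them without touching the reference count.

Equivalent idea in older versions: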

```c id="0iy1et"
Py_INCREF(Py_None);
return Py_None;
```

Newer code may use strong-reference helpers such as `Py_NewRef` and `Py_XNewRef`, along with internal convenience APIs. The rule remains the same: the caller normally owns the object a function returns.

## 4.14 Read Deallocation Functions Slowly

Every object type has a deallocation path.

Example shape:

```c id="tr2vxz"
static void
type_dealloc(MyObject *self)
{
    Py_XDECREF(self->field);
    Py_TYPE(self)->tp_free((PyObject *)self);
}
```

Deallocation must release owned references and free memory. But deallocation can be subtle because `Py_DECREF` can execute more deallocation, which can trigger finalizers or weakref callbacks.

For container objects, deallocation often clears contained references carefully.

Important questions:

```text id="va1g1h"
Does this object participate in cyclic GC?
Does it need to untrack itself before clearing fields?
Can clearing a field run Python code?
Does it support weakrefs?
Does it have a finalizer?
Which allocator frees the memory?
```

Deallocation bugs often appear as leaks, crashes, resurrected objects, or invalid memory access.

## 4.15 Recognize Garbage Collector Protocol Code

Container types that can participate in cycles implement GC support.

You may see functions like:

```c id="wm6ch5"
tp_traverse
tp_clear
PyObject_GC_Track
PyObject_GC_UnTrack
PyObject_GC_Del
```

The traverse function visits contained references:

```c id="zo14wa"
static int
my_traverse(MyObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->field);
    return 0;
}
```

The clear function releases references that may form cycles:

```c id="l0u4gi"
static int
my_clear(MyObject *self)
{
    Py_CLEAR(self->field);
    return 0;
}
```

`Py_CLEAR` sets the field to `NULL` before decrementing the reference. This prevents re-entrant code from seeing a dangling pointer.

GC support code looks mechanical, but it is essential for correctness.
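The ordering inside `Py_CLEAR` matters only when releasing the field runs code that looks back at the owner. The sketch below builds that situation with a hypothetical finalizer hook and compares the two orderings. `TOY_CLEAR` imitates the unlink-then-decref order and is an illustrative macro, not CPython's definition.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyObj ToyObj;
struct ToyObj {
    long refcnt;
    ToyObj *field;                  /* owned reference, may be NULL */
    void (*on_free)(ToyObj *self);  /* hypothetical finalizer hook */
};

static ToyObj *toy_owner = NULL;    /* the object the finalizer inspects */
static int toy_saw_dangling = 0;    /* set if a stale field was observed */

static ToyObj *
toy_new(ToyObj *field, void (*on_free)(ToyObj *))
{
    ToyObj *o = malloc(sizeof(ToyObj));
    if (o == NULL) {
        return NULL;
    }
    o->refcnt = 1;
    o->field = field;
    o->on_free = on_free;
    return o;
}

static void
toy_decref(ToyObj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        if (o->on_free != NULL) {
            o->on_free(o);          /* re-entrant code runs here */
        }
        free(o);
    }
}

/* Unlink first, then decref: the same order Py_CLEAR uses. */
#define TOY_CLEAR(op)               \
    do {                            \
        ToyObj *tmp_ = (op);        \
        if (tmp_ != NULL) {         \
            (op) = NULL;            \
            toy_decref(tmp_);       \
        }                           \
    } while (0)

/* Finalizer that checks whether the owner still points at the dying object. */
static void
toy_finalizer(ToyObj *self)
{
    if (toy_owner != NULL && toy_owner->field == self) {
        toy_saw_dangling = 1;
    }
}

/* Returns 1 if the finalizer saw a stale field, 0 if not, -1 on OOM. */
static int
toy_clear_demo(int use_clear)
{
    toy_saw_dangling = 0;
    ToyObj *child = toy_new(NULL, toy_finalizer);
    if (child == NULL) {
        return -1;
    }
    toy_owner = toy_new(child, NULL);
    if (toy_owner == NULL) {
        free(child);
        return -1;
    }
    if (use_clear) {
        TOY_CLEAR(toy_owner->field);
    }
    else {
        toy_decref(toy_owner->field);   /* naive order: decref first */
        toy_owner->field = NULL;        /* unlinked too late */
    }
    ToyObj *owner = toy_owner;
    toy_owner = NULL;
    toy_decref(owner);
    return toy_saw_dangling;
}
```

With `use_clear` set, the finalizer finds the field already `NULL`. With the naive order, it catches the owner still pointing at an object that is in the middle of being destroyed.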

## 4.16 Read With the Test File Open

Do not read implementation files alone.

For `Objects/listobject.c`, keep `Lib/test/test_list.py` nearby.

For `Objects/dictobject.c`, keep `Lib/test/test_dict.py` nearby.

For descriptors and classes, use `Lib/test/test_descr.py`.

For compiler behavior, use `Lib/test/test_compile.py`, `Lib/test/test_ast.py`, and `Lib/test/test_dis.py`.

Tests show intended behavior, edge cases, and historical regression cases.

A productive reading loop:

```text id="2ghmn1"
find Python feature
find test file
run targeted test
read implementation
modify small behavior or add print
rebuild
run test again
```

## 4.17 Use Search Patterns

Useful search patterns:

```bash id="7h5kd5"
grep -R "PyList_Type" Objects Include Python Modules
grep -R "list_append" Objects
grep -R "PyArg_ParseTuple" Modules Objects Python
grep -R "tp_as_mapping" Objects
grep -R "PyErr_SetString" Objects Python Modules
```

Use `git grep` inside the repository:

```bash id="4als46"
git grep "PyDict_GetItem"
git grep "tp_dealloc"
git grep "PyObject_RichCompareBool"
git grep "Argument Clinic"
```

Search for the type object first, then follow slots to functions.

## 4.18 A Practical Reading Example: `list.append`

Start from Python:

```python id="6wptq9"
xs = []
xs.append(1)
```

Find the method table in `Objects/listobject.c`.

It will contain a method entry for `append`.

Conceptually:

```c id="rbqcp6"
{"append", list_append, METH_O, "..."}
```

Then read the implementation.

A simplified shape:

```c id="06amht"
static PyObject *
list_append(PyListObject *self, PyObject *object)
{
    if (_PyList_AppendTakeRef(self, Py_NewRef(object)) < 0) {
        return NULL;
    }
    Py_RETURN_NONE;
}
```

The important points are:

```text id="ud5hhf"
self is the list object
object is the item passed from Python
the list stores a new reference to object
failure returns NULL with an exception set
success returns None
```

Then follow the helper that resizes the list if needed.

That path teaches:

```text id="bc2gxm"
method tables
argument calling convention
list over-allocation
reference ownership
error handling
return helpers
```

One small method can expose several CPython idioms.
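The append path can be approximated in standalone C to show how the pieces fit together: a wrapper that creates a new reference, a helper that takes ownership of it, and a resize step that over-allocates. The doubling growth rule and all `toy_*` names are assumptions for illustration. CPython's actual growth formula differs.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long refcnt;
} ToyObj;                       /* hypothetical refcounted value */

typedef struct {
    ToyObj **items;
    size_t size;                /* slots in use */
    size_t allocated;           /* slots available */
} ToyList;

/* Appends item and takes over the caller's reference.
   Returns 0 on success. On failure it releases the reference, returns -1. */
static int
toy_list_append_take_ref(ToyList *list, ToyObj *item)
{
    assert(list->size <= list->allocated);
    if (list->size == list->allocated) {
        /* Over-allocate so repeated appends are amortized O(1). */
        size_t new_alloc = list->allocated ? list->allocated * 2 : 4;
        ToyObj **p = realloc(list->items, new_alloc * sizeof(ToyObj *));
        if (p == NULL) {
            item->refcnt--;     /* we own the reference, so we release it */
            return -1;
        }
        list->items = p;
        list->allocated = new_alloc;
    }
    list->items[list->size++] = item;
    return 0;
}

/* Public-style wrapper: the caller keeps its own reference to item. */
static int
toy_list_append(ToyList *list, ToyObj *item)
{
    item->refcnt++;             /* new reference for the list to own */
    return toy_list_append_take_ref(list, item);
}

/* Releases every stored reference and the item array. */
static void
toy_list_clear(ToyList *list)
{
    for (size_t i = 0; i < list->size; i++) {
        list->items[i]->refcnt--;
    }
    free(list->items);
    list->items = NULL;
    list->size = 0;
    list->allocated = 0;
}
```

Appending five items to an empty toy list leaves `size` at 5 and `allocated` at 8, and each stored slot accounts for one increment of the item's reference count.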

## 4.19 A Practical Reading Example: `dict[key]`

Start from Python:

```python id="ghl7wv"
value = d[key]
```

This operation maps to dictionary subscript behavior.

Reading path:

```text id="agveol"
Objects/dictobject.c
    ↓
dict type object
    ↓
mapping methods
    ↓
subscript function
    ↓
hash lookup path
```

Important questions:

```text id="iws5gu"
Is the key hashable?
Does hashing call Python code?
How are missing keys handled?
How are exceptions distinguished from absence?
Does this path return a borrowed or new reference?
```

Dictionary code is performance-critical and highly optimized. Read it in layers: public behavior first, lookup helpers second, table layout third.
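The questions about hashing failures and missing keys come down to a three-way result: found, absent, or error. The sketch below models that with a tiny open-addressing table. The `hashable` flag, the fixed table size, and linear probing are simplifying assumptions, and none of the names are CPython's.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 8

typedef struct {
    int hashable;       /* 0 models a key whose __hash__ raises */
    unsigned id;
} ToyKey;

typedef struct {
    const ToyKey *key;  /* NULL marks an empty slot */
    int value;
} ToyEntry;

typedef struct {
    ToyEntry slots[TOY_SLOTS];
} ToyDict;

static const char *toy_error = NULL;   /* stand-in for the exception state */

/* Returns a non-negative hash, or -1 with the error set. */
static long
toy_hash(const ToyKey *key)
{
    if (!key->hashable) {
        toy_error = "unhashable key";
        return -1;
    }
    return (long)(key->id % 1000u);
}

/* Linear probe: returns the slot holding key, the first empty slot,
   or NULL if the table is full and the key is not present. */
static ToyEntry *
toy_find_slot(ToyDict *d, const ToyKey *key, long hash)
{
    assert(hash >= 0);
    for (size_t probe = 0; probe < TOY_SLOTS; probe++) {
        ToyEntry *e = &d->slots[((size_t)hash + probe) % TOY_SLOTS];
        if (e->key == NULL || e->key->id == key->id) {
            return e;
        }
    }
    return NULL;
}

/* Returns 0 on success, -1 with the error set on failure. */
static int
toy_dict_set(ToyDict *d, const ToyKey *key, int value)
{
    long hash = toy_hash(key);
    if (hash == -1) {
        return -1;
    }
    ToyEntry *e = toy_find_slot(d, key, hash);
    if (e == NULL) {
        toy_error = "table full";
        return -1;
    }
    e->key = key;
    e->value = value;
    return 0;
}

/* Returns 1 and stores *value if found, 0 if absent, -1 on error. */
static int
toy_dict_get(ToyDict *d, const ToyKey *key, int *value)
{
    long hash = toy_hash(key);
    if (hash == -1) {
        return -1;      /* error: the indicator is set */
    }
    ToyEntry *e = toy_find_slot(d, key, hash);
    if (e == NULL || e->key == NULL) {
        return 0;       /* absent: no error is set */
    }
    *value = e->value;
    return 1;
}
```

Keeping "absent" and "error" as separate return codes means the caller never has to guess whether a missing result hides a failure. That is the same concern behind the last two questions in the list above.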

## 4.20 Style of CPython C

CPython C code tends to favor explicit control flow over abstraction.

Common traits:

```text id="k252pu"
manual reference counting
explicit error checks
goto-based cleanup
macros for hot paths
function pointers through type slots
separate fast paths and generic paths
conditional compilation for platforms
generated wrappers for Python-callable functions
```

This style is practical. CPython is old, portable, performance-sensitive C code with strict compatibility requirements.

Do not expect a small, purely modern C architecture. Expect layered evolution.

## 4.21 Common Mistakes When Reading CPython

| Mistake                                        | Correction                                   |
| ---------------------------------------------- | -------------------------------------------- |
| Treating `PyObject *` as a concrete type       | It is a generic pointer to any Python object |
| Ignoring reference ownership                   | Every object pointer has ownership rules     |
| Assuming `NULL` means no value                 | It usually means an exception is set         |
| Assuming C calls cannot run Python             | Many object APIs can run Python code         |
| Editing generated code                         | Edit the source input and regenerate         |
| Reading fast macros as safe APIs               | Many skip checks                             |
| Assuming bytecode is stable                    | Bytecode changes between versions            |
| Assuming CPython behavior is language behavior | Some behavior is implementation-specific     |

## 4.22 A Minimal Checklist for Any Function

When reading a CPython C function, answer these questions:

```text id="ztg0lo"
What Python behavior does this implement?
What are the input reference ownership rules?
What does the function return on success?
What does it return on failure?
Does it set or propagate an exception?
Which references does it own?
Which references are borrowed?
Can any call execute Python code?
Can any object be mutated during the function?
Are there cleanup paths?
Is this public API, private API, or internal helper?
Is any part generated?
Which tests cover it?
```

This checklist prevents most misreadings.

## 4.23 Chapter Summary

CPython C code is readable once you track three things consistently: object layout, reference ownership, and error propagation. Most runtime values are handled as `PyObject *`. Type objects define behavior through slots. Functions signal errors through return sentinels and the exception state stored in the thread state. Reference counting makes ownership a concern on every line of code, even though the C types do not show it.

Read CPython code with tests open, follow type objects to slots, treat macros as real code, and assume that many generic object operations can execute arbitrary Python code.

