# 3. Repository Layout

The CPython repository is organized around the major subsystems of the interpreter: object implementations, runtime machinery, compiler pipeline, parser, built-in modules, standard library, tests, documentation, and platform build files.

A good first pass is to treat the source tree as a map of responsibilities.

```text
cpython/
    Include/
    Objects/
    Python/
    Parser/
    Modules/
    Lib/
    Programs/
    Tools/
    Doc/
    Grammar/
    PC/
    PCbuild/
    Mac/
```

Each directory has a different role. Some contain core runtime code. Some contain generated code. Some contain test fixtures. Some exist mainly for platform-specific builds.

## 3.1 Top-Level Structure

| Directory   | Main role                                           |
| ----------- | --------------------------------------------------- |
| `Include/`  | Public, internal, and private C headers             |
| `Objects/`  | Implementations of core object types                |
| `Python/`   | Runtime, compiler, interpreter loop, initialization |
| `Parser/`   | Tokenizer and parser support code                   |
| `Grammar/`  | Grammar input files                                 |
| `Modules/`  | Built-in and extension modules written in C         |
| `Lib/`      | Python standard library                             |
| `Lib/test/` | CPython regression test suite                       |
| `Programs/` | Executable entry points                             |
| `Tools/`    | Developer and build tools                           |
| `Doc/`      | Documentation source                                |
| `PC/`       | Windows-specific source and config files            |
| `PCbuild/`  | Windows build system                                |
| `Mac/`      | macOS-specific support                              |

The most important directories for internals reading are:

```text
Include/
Objects/
Python/
Parser/
Modules/
Lib/test/
```

Those directories cover the object model, execution engine, compiler, parser, built-in types, C API, and tests.

## 3.2 `Include/`: C Header Files

`Include/` contains the header files used by CPython itself, extension modules, and embedders.

A simplified layout:

```text
Include/
    Python.h
    object.h
    unicodeobject.h
    listobject.h
    dictobject.h
    cpython/
    internal/
```

The most important file is `Python.h`, which extension modules include as:

```c
#include <Python.h>
```

`Python.h` is the umbrella public header for extension modules. It includes many other public headers and exposes the C API most extension authors use.

### Header categories

| Header area         | Audience               | Stability         |
| ------------------- | ---------------------- | ----------------- |
| `Include/*.h`       | Public C API users     | Relatively stable |
| `Include/cpython/`  | CPython-specific API   | Less portable     |
| `Include/internal/` | CPython internals only | Can change freely |

This distinction matters. Code inside CPython can include internal headers. Third-party extensions generally should not.

For example:

```c
#include <Python.h>
```

is normal for extension modules.

But:

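```c
#include "pycore_runtime.h"   /* from Include/internal/; requires Py_BUILD_CORE */
```

is for CPython core code. Internal headers refuse to compile unless `Py_BUILD_CORE` is defined, and they expose runtime structures that are not part of the public API and can change between releases.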
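To check which of these header tiers an installed interpreter ships, the standard `sysconfig` module reports the include directory. The sketch below is illustrative only: the helper name `header_tiers` is hypothetical, and whether any headers are present depends on how Python was installed (many distributions put them in a separate development package).

```python
import sysconfig
from pathlib import Path


def header_tiers(include_dir=None):
    """Report which header tiers exist under the interpreter's include directory."""
    # sysconfig.get_path("include") is where Python.h lives when headers are installed.
    root = Path(include_dir or sysconfig.get_path("include"))
    return {
        "Python.h": (root / "Python.h").is_file(),   # public umbrella header
        "cpython/": (root / "cpython").is_dir(),     # CPython-specific API
        "internal/": (root / "internal").is_dir(),   # core-only headers
    }


tiers = header_tiers()
for name, present in tiers.items():
    print(f"{name:<10} {'found' if present else 'missing'}")
```

A result of all `missing` usually means the development headers are not installed, not that the layout differs.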

## 3.3 `Objects/`: Built-In Object Implementations

`Objects/` contains the C implementations of many core Python object types.

Examples:

| File                | Implements                        |
| ------------------- | --------------------------------- |
| `object.c`          | Base object operations            |
| `typeobject.c`      | Type objects, classes, MRO, slots |
| `longobject.c`      | Python integers                   |
| `floatobject.c`     | Python floats                     |
| `unicodeobject.c`   | Python strings                    |
| `bytesobject.c`     | `bytes`                           |
| `bytearrayobject.c` | `bytearray`                       |
| `listobject.c`      | `list`                            |
| `tupleobject.c`     | `tuple`                           |
| `dictobject.c`      | `dict`                            |
| `setobject.c`       | `set` and `frozenset`             |
| `funcobject.c`      | Function objects                  |
| `methodobject.c`    | Built-in method objects           |
| `moduleobject.c`    | Module objects                    |
| `genobject.c`       | Generators and coroutines         |
| `frameobject.c`     | Frame object support              |
| `codeobject.c`      | Code objects                      |
| `cellobject.c`      | Closure cells                     |
| `descrobject.c`     | Descriptors                       |

This directory is the best place to study how Python values are represented and operated on.

For example, list behavior lives mainly in:

```text
Objects/listobject.c
```

Dictionary behavior lives mainly in:

```text
Objects/dictobject.c
```

String behavior lives mainly in:

```text
Objects/unicodeobject.c
```

When Python runs:

```python
items = []
items.append(1)
```

the underlying list allocation, resizing, method lookup, and append operation eventually involve code in `Objects/listobject.c` and type machinery in `Objects/typeobject.c`.
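Some of this is visible from Python without reading any C. The sketch below shows that `append` is found on the `list` type, not on each instance, and that the list's allocated size grows in occasional steps instead of on every append. The helper name `capacity_steps` is hypothetical, and the exact sizes reported by `sys.getsizeof` are a CPython implementation detail that varies by version and platform.

```python
import sys

# list.append is a method descriptor stored on the type object.
print(type(list.append).__name__)
print(list.__mro__)


def capacity_steps(n):
    """Return the distinct allocation sizes observed while appending n items."""
    items = []
    sizes = [sys.getsizeof(items)]
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != sizes[-1]:
            sizes.append(size)   # the list was resized by this append
    return sizes


steps = capacity_steps(64)
print(len(steps) - 1, "resizes for 64 appends")
```

Seeing far fewer resizes than appends is the Python-level symptom of the over-allocation logic in `Objects/listobject.c`.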

## 3.4 `Python/`: Runtime, Compiler, and Interpreter Core

`Python/` contains much of CPython’s central machinery.

Important files include:

| File               | Role                                               |
| ------------------ | -------------------------------------------------- |
| `ceval.c`          | Bytecode evaluation loop                           |
| `bytecodes.c`      | Bytecode instruction definitions in modern CPython |
| `compile.c`        | AST to code object compiler                        |
| `symtable.c`       | Symbol table analysis                              |
| `ast.c`            | AST support                                        |
| `pythonrun.c`      | High-level execution entry points                  |
| `pylifecycle.c`    | Runtime initialization and finalization            |
| `import.c`         | Import support                                     |
| `errors.c`         | Exception state and error APIs                     |
| `traceback.c`      | Traceback support                                  |
| `sysmodule.c`      | Implementation of `sys`                            |
| `bltinmodule.c`    | Built-in functions and builtins module             |
| `marshal.c`        | Internal serialization format for code objects     |
| `thread.c`         | Thread abstraction layer                           |
| `context.c`        | Context variable support                           |
| `bootstrap_hash.c` | Hash secret initialization                         |

Each file corresponds to a distinct runtime subsystem, so the file name is usually the fastest way to locate the code behind a given behavior.
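As one concrete case, the serialization format implemented in `Python/marshal.c` is reachable from Python through the standard `marshal` module. The following minimal sketch round-trips a code object through bytes; the source string and the `<example>` filename are placeholders chosen for illustration.

```python
import marshal

source = "result = sum(range(5))"
code = compile(source, "<example>", "exec")

# Serialize the code object to bytes and load it back. This is the same
# payload format CPython stores in .pyc files after their header.
payload = marshal.dumps(code)
restored = marshal.loads(payload)

namespace = {}
exec(restored, namespace)
print(type(restored).__name__, len(payload), "bytes ->", namespace["result"])
```

The marshal format is version-specific, so bytes produced by one CPython release should not be assumed loadable by another.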

### Interpreter execution

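The bytecode interpreter is centered on the evaluation loop, which historically lived entirely in `ceval.c`. Since CPython 3.12, instruction semantics are written in `Python/bytecodes.c` using a small domain-specific language, and the interpreter's instruction cases are generated from it into `Python/generated_cases.c.h`, which `ceval.c` includes.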

Conceptual role:

```text
frame enters evaluation
    ↓
bytecode instruction fetched
    ↓
instruction dispatch
    ↓
object operation
    ↓
stack and frame state updated
```
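The instructions that the loop fetches and dispatches can be listed with the standard `dis` module. This is a small sketch using a throwaway function, `add_one`, that is not part of CPython; opcode names and counts differ between CPython versions, so treat the printed listing as version-specific.

```python
import dis


def add_one(x):
    return x + 1


# Each Instruction is one step the evaluation loop fetches and dispatches.
instructions = list(dis.get_instructions(add_one))
for ins in instructions:
    print(f"{ins.offset:>3} {ins.opname:<20} {ins.argrepr}")
```

Searching `Python/bytecodes.c` (or `ceval.c` in older releases) for a printed opcode name leads to the C code that implements it.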

### Compiler pipeline

The compiler code lowers AST nodes into code objects.

A simplified path:

```text
source text
    ↓
tokens
    ↓
AST (built directly by the PEG parser)
    ↓
symbol table
    ↓
compiler
    ↓
code object
```

The key files are usually:

```text
Parser/
Python/ast.c
Python/symtable.c
Python/compile.c
Objects/codeobject.c
```
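The same stages are exposed at the Python level, which makes it possible to step through the pipeline by hand. The sketch below is a simplified view, not how the interpreter wires the stages together internally; the `<example>` filename is a placeholder.

```python
import ast
import types

source = "x = 1 + 2\n"

# source text -> AST (tokenizing and parsing happen inside ast.parse)
tree = ast.parse(source, filename="<example>", mode="exec")

# AST -> code object (symbol table analysis and code generation)
code = compile(tree, filename="<example>", mode="exec")

# code object -> execution by the evaluation loop
namespace = {}
exec(code, namespace)

print(type(tree).__name__, type(code).__name__, namespace["x"])
```

Passing an AST to `compile()` instead of a string is also a convenient way to observe how the compiler reacts to a hand-modified tree.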

## 3.5 `Parser/`: Tokenizer and Parser Support

`Parser/` contains the tokenizer, parser generator support, parser implementation files, and generated parser-related code.

Important areas include:

| Area                   | Role                                 |
| ---------------------- | ------------------------------------ |
| tokenizer              | Converts source text into tokens     |
| parser                 | Builds syntax structures from tokens |
| PEG machinery          | Supports Python’s PEG parser         |
| generated parser files | Produced from grammar definitions    |

The parser’s job is to decide whether source text is valid Python syntax and, if it is, to build the AST that the compiler consumes.

Example:

```python
x = 1 + 2
```

The parser needs to recognize:

```text
assignment statement
target name x
expression 1 + 2
integer literal 1
integer literal 2
binary addition operator
```

Parsing precedes semantic analysis. The parser knows syntax. The symbol table pass later decides whether names are local, global, free, or cell variables.
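The split between syntax and scope can be observed with the standard `tokenize`, `ast`, and `symtable` modules. This is a sketch for exploration: `tokenize` is a Python-level view that is close to, but not identical with, the interface the C tokenizer offers the parser, and the nested function names (`outer`, `inner`, `total`, `limit`) are invented for the example.

```python
import ast
import io
import symtable
import tokenize

source = "x = 1 + 2\n"

# Tokens: a flat sequence with no structure yet.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type not in (tokenize.NEWLINE, tokenize.ENDMARKER)
]
print(tokens)

# Syntax: an assignment whose value is a binary addition.
assign = ast.parse(source).body[0]
shape = (type(assign).__name__, type(assign.value).__name__,
         type(assign.value.op).__name__)
print(shape)

# Scope: decided later by the symbol table pass, not by the parser.
nested = '''
def outer():
    total = 0
    def inner():
        return total + limit
    return inner
'''
top = symtable.symtable(nested, "<example>", "exec")
outer = top.get_children()[0]
inner = outer.get_children()[0]
print(inner.lookup("total").is_free(), inner.lookup("limit").is_global())
```

Here `total` is free in `inner` because `outer` binds it, while `limit` is bound nowhere and is therefore treated as a global.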

## 3.6 `Grammar/`: Grammar Definitions

`Grammar/` contains the grammar input files used to generate parser-related code, chiefly `python.gram` (the PEG grammar) and `Tokens` (the token definitions).

The grammar defines Python syntax in a form consumed by CPython’s parser tooling.

For internals work, grammar changes are high impact. A syntax change can affect:

```text
parser generation
AST generation
compiler behavior
error messages
tests
documentation
tools that parse Python
```

A grammar-level change often requires regeneration and targeted tests.

The usual workflow is:

```text
edit grammar input
regenerate parser files (make regen-pegen on Unix-like builds)
rebuild CPython
run parser, AST, compiler, and syntax tests
```

## 3.7 `Modules/`: Built-In and Extension Modules

`Modules/` contains many C modules shipped with CPython.

Examples:

| File or directory   | Module                   |
| ------------------- | ------------------------ |
| `_io/`              | I/O implementation       |
| `_decimal/`         | Decimal accelerator      |
| `_sqlite/`          | SQLite module            |
| `_ssl.c`            | SSL support              |
| `_hashopenssl.c`    | Hashing with OpenSSL     |
| `_ctypes/`          | `ctypes`                 |
| `arraymodule.c`     | `array`                  |
| `mathmodule.c`      | `math`                   |
| `itertoolsmodule.c` | `itertools`              |
| `_functoolsmodule.c` | `_functools`            |
| `posixmodule.c`     | `os` platform operations |
| `timemodule.c`      | `time`                   |

Some standard library modules are written in Python and use C accelerators from `Modules/`.

For example, `Lib/functools.py` is the public module, while `Modules/_functoolsmodule.c` provides the private `_functools` accelerator that it imports from when available.

This pattern gives CPython a clean public API while preserving fast C implementations for hot paths.
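One way to see the accelerator pattern in action is to compare an attribute on the public module with the same attribute on its private C counterpart. The helper `shares_implementation` below is a hypothetical name, and the results assume a standard CPython build where the accelerators are compiled in; other implementations or stripped-down builds may report `False`.

```python
import importlib


def shares_implementation(public_name, private_name, attr):
    """Return True if public_name.attr is the same object as private_name.attr."""
    public = importlib.import_module(public_name)
    try:
        private = importlib.import_module(private_name)
    except ImportError:
        return False   # no C accelerator available on this build
    return getattr(public, attr) is getattr(private, attr, None)


print(shares_implementation("functools", "_functools", "reduce"))
print(shares_implementation("heapq", "_heapq", "heappush"))
```

When the check returns `True`, the behavior to study lives in `Modules/`; the file under `Lib/` mainly supplies the public name and a pure-Python fallback.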

## 3.8 `Lib/`: Standard Library

`Lib/` contains the Python standard library.

Examples:

| Path                 | Role                             |
| -------------------- | -------------------------------- |
| `Lib/os.py`          | OS interface layer               |
| `Lib/pathlib/`       | Object-oriented paths            |
| `Lib/importlib/`     | Import system implementation     |
| `Lib/asyncio/`       | Async I/O framework              |
| `Lib/collections/`   | Collection utilities             |
| `Lib/dataclasses.py` | Dataclass support                |
| `Lib/typing.py`      | Typing support                   |
| `Lib/unittest/`      | Unit testing framework           |
| `Lib/json/`          | JSON implementation              |
| `Lib/concurrent/`    | Futures and process/thread pools |

Many CPython internals are easier to understand by reading the Python-level standard library first.

For example, `importlib` contains much of the import system in Python code. CPython bootstraps it specially, but large parts remain readable as Python.
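A short sketch of both halves of that statement: `importlib.util.find_spec` shows the Python-level machinery locating a module, and the identity check shows the special bootstrapping. The check relies on the CPython detail that the frozen bootstrap module is registered as `_frozen_importlib`; it is guarded so that it simply prints `None` where that assumption does not hold.

```python
import importlib
import importlib.util
import sys

# Ordinary lookup: the path-based finder locates json under Lib/.
spec = importlib.util.find_spec("json")
print(spec.name, type(spec.loader).__name__, spec.origin)

# Bootstrapping: on CPython the core import code is frozen into the binary
# and then re-exposed under the name importlib._bootstrap.
frozen = sys.modules.get("_frozen_importlib")
same = (frozen is importlib._bootstrap) if frozen is not None else None
print(same)
```

The readable source for the frozen module is `Lib/importlib/_bootstrap.py`.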

## 3.9 `Lib/test/`: Regression Tests

`Lib/test/` contains CPython’s test suite.

This directory is essential for internals work.

Examples:

| Test file           | Focus                |
| ------------------- | -------------------- |
| `test_dict.py`      | Dictionary behavior  |
| `test_list.py`      | List behavior        |
| `test_gc.py`        | Garbage collector    |
| `test_sys.py`       | `sys` module         |
| `test_dis.py`       | Bytecode disassembly |
| `test_compile.py`   | Compiler behavior    |
| `test_ast.py`       | AST behavior         |
| `test_importlib/`   | Import system        |
| `test_capi/`        | C API behavior       |
| `test_threading.py` | Threading behavior   |

When studying a subsystem, pair the implementation file with its tests.

| Implementation         | Tests                       |
| ---------------------- | --------------------------- |
| `Objects/dictobject.c` | `Lib/test/test_dict.py`     |
| `Objects/listobject.c` | `Lib/test/test_list.py`     |
| `Python/compile.c`     | `Lib/test/test_compile.py`  |
| `Python/symtable.c`    | `Lib/test/test_symtable.py` |
| `Python/sysmodule.c`   | `Lib/test/test_sys.py`      |
| `Modules/mathmodule.c` | `Lib/test/test_math.py`     |

This habit prevents reading code in isolation. CPython behavior is defined by code plus tests plus documentation plus compatibility expectations.
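The pairings in the table follow a naming pattern that can be approximated mechanically. The function below, `guess_test_module`, is a hypothetical heuristic written for this chapter, not a CPython tool: it strips the conventional `object` and `module` suffixes and prefixes `test_`. Many files have no one-to-one test module, so treat the result as a starting point for a search.

```python
from pathlib import PurePosixPath


def guess_test_module(impl_path):
    """Guess a Lib/test module name from a C implementation path (heuristic)."""
    stem = PurePosixPath(impl_path).stem
    for suffix in ("object", "module"):
        # dictobject -> dict, mathmodule -> math; leave a bare "object" alone.
        if stem.endswith(suffix) and stem != suffix:
            stem = stem[: -len(suffix)]
            break
    return f"test_{stem.lstrip('_')}"


for path in ("Objects/dictobject.c", "Python/symtable.c", "Modules/mathmodule.c"):
    print(f"{path:<24} -> {guess_test_module(path)}")
```

The guessed name can be passed directly to `./python -m test` from a build directory.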

## 3.10 `Programs/`: Executable Entry Points

`Programs/` contains source files for CPython executable programs.

Typical files include:

```text
Programs/python.c
Programs/_testembed.c
```

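`Programs/python.c` is the normal command-line interpreter entry point. It is deliberately tiny: `main()` immediately calls `Py_BytesMain()` (or `Py_Main()` from `wmain()` on Windows), and the real startup logic lives in `Modules/main.c` and `Python/pylifecycle.c`.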

A simplified startup path looks like:

```text
main()
    ↓
initialize runtime
    ↓
configure interpreter
    ↓
run command, script, module, stdin, or REPL
    ↓
finalize runtime
```
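Each run mode in the diagram goes through the same initialize, run, finalize sequence in a fresh process. The sketch below drives three of the modes from Python using `subprocess`; the helper name `run_python` is hypothetical, and the example assumes `sys.executable` points at a usable interpreter, which may not hold in some embedded environments.

```python
import subprocess
import sys


def run_python(args, stdin_text=None):
    """Start a fresh interpreter process and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, *args],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,   # raise if the child exits with a non-zero status
    )
    return result.stdout.strip()


print(run_python(["-c", "print(6 * 7)"]))                    # command
print(run_python(["-m", "platform"]))                        # module
print(run_python(["-"], stdin_text="print('from stdin')"))   # stdin
```

Adding `-X importtime` or `-v` to the argument list is a simple way to watch more of the startup sequence.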

This directory is useful when studying startup, embedding, command-line options, and interpreter initialization.

## 3.11 `Tools/`: Developer Tools

`Tools/` contains helper programs for CPython development.

Examples include tools for:

```text
generated file maintenance
bytecode and opcode metadata
C API inspection
test support
build support
freeze tooling
scripts used by maintainers
```

The exact contents change over time. The important rule is that many generated files in CPython have source inputs and regeneration tools. `Tools/` is often where those tools live.

When changing grammar, Argument Clinic blocks, opcode definitions, or generated metadata, check the relevant tool workflow before editing generated output.

## 3.12 `Doc/`: Documentation Source

`Doc/` contains CPython’s documentation source.

The documentation covers:

```text
language reference
library reference
C API reference
extending and embedding
how-to guides
tutorial
installing and using Python
```

For internals work, the documentation matters because changes to behavior often require documentation updates.

A CPython change may require edits in several places:

```text
implementation code
tests
documentation
news entry
C API docs
library docs
```

Documentation source uses reStructuredText rather than Markdown.

## 3.13 Platform Directories

CPython supports many platforms. Some directories exist mainly for platform-specific configuration and builds.

| Directory                                  | Platform role                  |
| ------------------------------------------ | ------------------------------ |
| `PC/`                                      | Windows-specific source/config |
| `PCbuild/`                                 | Visual Studio build files      |
| `Mac/`                                     | macOS support                  |
| platform files in `Python/` and `Modules/` | OS-specific implementations    |

Platform support appears throughout the tree through conditional compilation.

Example pattern:

```c
#ifdef MS_WINDOWS
    /* Windows-specific code */
#else
    /* POSIX-like code */
#endif
```

Internals reading often requires distinguishing portable runtime logic from platform-specific branches.
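The effect of these branches is visible from Python. `Modules/posixmodule.c` is compiled as the built-in module `posix` on POSIX-like systems and as `nt` on Windows, and `os` re-exports whichever one exists. The sketch below assumes a standard CPython build on one of those two platform families.

```python
import importlib
import os
import sys

print(sys.platform, os.name)

# The platform backend is compiled into the interpreter, not loaded from Lib/.
is_builtin = os.name in sys.builtin_module_names
print(is_builtin)

# os.stat is the same function object that the backend module defines.
backend = importlib.import_module(os.name)
print(os.stat is backend.stat)
```

When tracing a function such as `os.stat`, this explains why the search ends in `Modules/posixmodule.c` even on Windows.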

## 3.14 Generated Code and Source Inputs

CPython contains both hand-written files and generated files.

Common generated areas include:

```text
parser output from grammar
AST-related generated files
opcode metadata and bytecode tables
Argument Clinic output
frozen importlib modules
configuration files
```

A generated file usually has a comment near the top explaining how it was produced.

Before editing a suspiciously mechanical block, look for markers such as:

```text
generated by
do not edit
clinic start generated code
autogenerated
```

Manual edits to generated regions are usually lost during regeneration.
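A check for these markers is easy to script. The following sketch scans text for the marker phrases listed above; the function name `generated_markers` and the sample input are made up for illustration, and a real file may use wording this pattern does not catch.

```python
import re

# The marker phrases listed above, matched case-insensitively.
MARKERS = re.compile(
    r"generated by|do not edit|clinic start generated code|autogenerated",
    re.IGNORECASE,
)


def generated_markers(text, max_lines=None):
    """Return (line number, marker) pairs for lines that look generated."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if max_lines is not None and lineno > max_lines:
            break
        match = MARKERS.search(line)
        if match:
            hits.append((lineno, match.group(0).lower()))
    return hits


sample = """\
// This file is generated by an example tool.
// Do not edit!
int x = 0;
/*[clinic start generated code]*/
"""
hits = generated_markers(sample)
print(hits)
```

To scan a real file, pass `Path(name).read_text(encoding="utf-8", errors="replace")` as the text, with `max_lines` set to a small number if only header comments are of interest.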

## 3.15 A Reading Path Through the Repository

A good first reading path is:

```text
Programs/python.c
    ↓
Python/pylifecycle.c
    ↓
Python/pythonrun.c
    ↓
Python/compile.c
    ↓
Python/ceval.c
    ↓
Objects/object.c
    ↓
Objects/typeobject.c
    ↓
Objects/dictobject.c
    ↓
Objects/listobject.c
```

This path follows execution from process startup to runtime behavior.

A second reading path for object internals:

```text
Include/object.h
    ↓
Include/cpython/object.h
    ↓
Objects/object.c
    ↓
Objects/typeobject.c
    ↓
Objects/longobject.c
    ↓
Objects/unicodeobject.c
    ↓
Objects/listobject.c
    ↓
Objects/dictobject.c
```

A third path for source-to-bytecode:

```text
Grammar/
    ↓
Parser/
    ↓
Python/ast.c
    ↓
Python/symtable.c
    ↓
Python/compile.c
    ↓
Objects/codeobject.c
    ↓
Python/ceval.c
```

## 3.16 How to Find Code for a Python Feature

Start from the Python feature, then map it to a subsystem.

| Feature       | First files to inspect                     |
| ------------- | ------------------------------------------ |
| `list.append` | `Objects/listobject.c`                     |
| `dict[key]`   | `Objects/dictobject.c`                     |
| `x.y`         | `Objects/object.c`, `Objects/typeobject.c` |
| `class C:`    | `Objects/typeobject.c`, `Python/compile.c` |
| `try/except`  | `Python/compile.c`, `Python/ceval.c`       |
| `import x`    | `Lib/importlib/`, `Python/import.c`        |
| `async def`   | `Python/compile.c`, `Objects/genobject.c`  |
| `with`        | `Python/compile.c`, `Python/ceval.c`       |
| `len(x)`      | `Python/bltinmodule.c`, type slots         |
| `print(x)`    | `Python/bltinmodule.c`, file I/O modules   |

Use tests to confirm behavior:

```bash
./python -m test test_dict
./python -m test test_descr
./python -m test test_importlib
```
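Before opening any file, it helps to know whether a feature is implemented in C or in Python, since that decides between `Objects/`, `Python/`, and `Modules/` on one side and `Lib/` on the other. The function `implementation_language` below is a hypothetical helper that gives a rough first answer using the standard `inspect` module; it does not handle classes, bound methods, or other callable kinds.

```python
import inspect
import json


def implementation_language(obj):
    """Roughly classify a callable as implemented in C or in Python."""
    # Built-in functions (len) and method descriptors (list.append) are C-level.
    if inspect.isbuiltin(obj) or inspect.ismethoddescriptor(obj):
        return "C"
    # Plain functions carry a code object and come from a .py file.
    if inspect.isfunction(obj):
        return "Python"
    return "unknown"


for target in (len, list.append, json.dumps):
    print(f"{target.__qualname__:<12} {implementation_language(target)}")
```

For the Python case, `inspect.getsourcefile(obj)` gives the exact file; for the C case, the table above is the better guide.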

## 3.17 Repository Layout as Architecture

The directory structure reflects CPython’s architecture.

```text
Include/    exposes C interfaces
Objects/    defines runtime values
Python/     executes and manages programs
Parser/     understands syntax
Modules/    provides C-backed modules
Lib/        provides Python-level standard library
Lib/test/   protects behavior
Programs/   starts the executable
Tools/      maintains generated and developer workflows
Doc/        explains public behavior
```

This layout is not perfect. Older code, platform constraints, backward compatibility, and generated files create exceptions. Still, the structure is coherent enough to guide source reading.

## 3.18 Chapter Summary

The CPython repository is a working system, not a collection of isolated files. `Objects/` defines what Python values are. `Python/` defines how programs compile and execute. `Parser/` and `Grammar/` handle syntax. `Modules/` and `Lib/` provide the standard library. `Include/` exposes C interfaces. `Lib/test/` provides most of the regression safety net.

A productive reader moves between implementation, tests, and documentation. CPython internals become easier once each source file has a clear place in the runtime architecture.

