# 2. Building CPython From Source

Building CPython from source gives you a local interpreter that you can inspect, modify, debug, and test. This is the first practical step before reading internals seriously.

A source build lets you do things that a packaged Python install usually hides:

```text
change interpreter code
add debug prints
inspect object layout
run CPython tests
use debug-only assertions
trace reference counts
debug crashes in C
compare bytecode across builds
```

## 2.1 Get the Source Tree

CPython lives in a Git repository. A normal local checkout looks like this:

```bash
git clone https://github.com/python/cpython.git
cd cpython
```

The repository contains the interpreter, standard library, tests, documentation, build files, and platform support code.

A simplified view:

```text
cpython/
    Include/        public and internal C headers
    Objects/        core object implementations
    Python/         compiler, runtime, interpreter loop
    Parser/         tokenizer and parser support
    Modules/        built-in and extension modules
    Lib/            Python standard library
    Lib/test/       regression test suite
    Programs/       executable entry points
    Tools/          developer tools
    Doc/            documentation source
```

The most important directories for internals work are `Objects`, `Python`, `Include`, `Modules`, and `Lib/test`.

## 2.2 Choose a Build Mode

There are two common builds:

| Build         | Purpose                              |
| ------------- | ------------------------------------ |
| Release build | Similar to a normal installed Python |
| Debug build   | Better for internals work            |

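For internals study, use a debug build first. It defines the `Py_DEBUG` macro, which enables C assertions, the debug hooks on the memory allocators, the interpreter-wide reference total (`sys.gettotalrefcount()`), and extra object consistency checks. Many bugs then abort immediately instead of silently corrupting memory.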

A debug build is slower, but easier to inspect.

## 2.3 Build on Linux or macOS

On Unix-like systems, CPython uses the usual configure and make flow.

```bash
./configure --with-pydebug
make -j
```

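This produces an executable named `python` in the build directory, which here is the source tree itself. On macOS, where the default file system is case-insensitive, the executable is named `python.exe` instead (to avoid clashing with the `Python/` directory), so substitute `./python.exe` in the commands below: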

```bash
./python
```

Run it:

```bash
./python -V
./python -c "print('hello from local CPython')"
```

The `--with-pydebug` option creates a debug build: it defines `Py_DEBUG`, adds a `d` to the ABI flags (visible as `sys.abiflags` on Unix), and enables the checks described in 2.9.

For deeper reference tracking, add `--with-trace-refs`:

```bash
./configure --with-pydebug --with-trace-refs
make -j
```

Use `--with-trace-refs` only when you need deeper reference tracking. It changes object layout and can slow the interpreter further.
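You can confirm from Python which of these options an interpreter was built with by probing for attributes that only some builds define. This is a sketch: `build_flavor` is a hypothetical helper, and it relies on `sys.gettotalrefcount` existing only in debug (`Py_DEBUG`) builds and `sys.getobjects` existing only in `--with-trace-refs` builds.

```python
import sys


def build_flavor() -> dict:
    """Report build options that are visible from Python (hypothetical helper)."""
    return {
        # sys.gettotalrefcount() is only defined when CPython is built with
        # Py_DEBUG, for example via ./configure --with-pydebug.
        "debug": hasattr(sys, "gettotalrefcount"),
        # sys.getobjects() is only defined in Py_TRACE_REFS builds (--with-trace-refs).
        "trace_refs": hasattr(sys, "getobjects"),
        # sys.abiflags contains "d" on debug builds; it does not exist on Windows.
        "abiflags": getattr(sys, "abiflags", None),
    }


if __name__ == "__main__":
    for key, value in build_flavor().items():
        print(f"{key}: {value}")
```

Run it with the `./python` from your build directory and compare the output with the flags you passed to `configure`.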

## 2.4 Build Dependencies

A minimal build may succeed without every optional dependency, but many standard library modules need system libraries.

Common dependencies include:

| Feature                | Typical dependency         |
| ---------------------- | -------------------------- |
| SSL and HTTPS          | OpenSSL                    |
| Compression            | zlib, bzip2, xz            |
| SQLite                 | SQLite development headers |
| Readline shell support | readline or libedit        |
| Curses                 | ncurses                    |
| Tkinter                | Tcl/Tk                     |
| UUID support           | libuuid                    |
| `ctypes` (FFI)         | libffi                     |

If a dependency is missing, CPython may still build, but some extension modules will be skipped.

At the end of `make`, the build prints a summary of optional extension modules that were missing or failed to import. Check it after every fresh configure.
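Rather than scanning the build log by eye, you can also ask the new interpreter directly. This is a sketch, not a CPython tool: `missing_optional_modules` is a hypothetical name, and the module-to-library mapping is an assumption based on the dependency table above (for example, `_bz2` and `_lzma` for the compression libraries, `_ctypes` for libffi). It is not exhaustive.

```python
import importlib

# Extension modules that are typically skipped when their system library is
# missing. Assumed mapping based on the dependency table; not exhaustive.
OPTIONAL_MODULES = {
    "_ssl": "OpenSSL",
    "zlib": "zlib",
    "_bz2": "bzip2",
    "_lzma": "xz",
    "_sqlite3": "SQLite",
    "readline": "readline or libedit",
    "_curses": "ncurses",
    "_tkinter": "Tcl/Tk",
    "_uuid": "libuuid",
    "_ctypes": "libffi",
}


def missing_optional_modules() -> list:
    """Return (module, dependency) pairs for optional modules that fail to import."""
    missing = []
    for module, dependency in OPTIONAL_MODULES.items():
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append((module, dependency))
    return missing


if __name__ == "__main__":
    for module, dependency in missing_optional_modules():
        print(f"{module} is missing; check the {dependency} development files")
```

Some entries are legitimately absent on certain platforms (for example, `readline` and `_curses` on Windows), so treat the output as a checklist rather than a list of errors.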

## 2.5 Out-of-Tree Builds

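You can build CPython outside the source directory. This keeps generated files separate from the checkout. Recent versions require a clean source tree for this: if you have already built inside `cpython/`, run `make clean` there first.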

```bash
mkdir ../cpython-build-debug
cd ../cpython-build-debug
../cpython/configure --with-pydebug
make -j
```

The resulting executable lives in the build directory:

```bash
./python
```

Out-of-tree builds are useful when you want multiple configurations from one source checkout:

```text
cpython/
cpython-build-debug/
cpython-build-release/
cpython-build-asan/
```

## 2.6 Build on Windows

On Windows, CPython uses Visual Studio build files.

From a Developer Command Prompt:

```bat
PCbuild\build.bat -d
```

The `-d` flag builds a debug interpreter.

For a 64-bit build, the executable is usually under:

```text
PCbuild\amd64\python_d.exe
```

A release build uses:

```bat
PCbuild\build.bat
```

Windows builds have their own project files, platform code, and extension build rules. The layout differs from Unix builds, but the interpreter source is the same core code.

## 2.7 Verify the Build

After building, check the executable:

```bash
./python -V
./python -m sysconfig
```

Check where the interpreter thinks it is installed:

```bash
./python - <<'PY'
import sys
print(sys.executable)
print(sys.prefix)
print(sys.path)
PY
```

For source-tree development, `sys.path` should include the local `Lib` directory.
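A quick programmatic version of this check combines `sysconfig.is_python_build()`, which reports whether the interpreter is running from its build directory rather than an installed location, with the path of a pure-Python standard library module. The helper name `stdlib_origin` is hypothetical, and `os` is just a convenient module to probe.

```python
import os
import sysconfig


def stdlib_origin() -> tuple:
    """Return (runs_from_build_dir, path_of_os_module) for this interpreter."""
    # is_python_build() is True when the interpreter runs from the directory it
    # was built in, rather than from `make install` output or a binary installer.
    in_build_dir = sysconfig.is_python_build()
    # In a source-tree interpreter, os.py should come from the checkout's Lib/.
    os_path = getattr(os, "__file__", None)
    return in_build_dir, os_path


if __name__ == "__main__":
    in_build_dir, os_path = stdlib_origin()
    print("running from build directory:", in_build_dir)
    print("os module loaded from:", os_path)
```

For a fresh source build run as `./python`, you would expect `True` and a path inside your checkout's `Lib` directory.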

## 2.8 Run the Test Suite

CPython’s test suite is run with:

```bash
./python -m test
```

For a faster first check:

```bash
./python -m test test_sys test_gc test_dict test_compile
```

Run tests in parallel:

```bash
./python -m test -j8
```

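Re-run failed tests in verbose mode once the main run finishes:

```bash
./python -m test -j8 -w
```

The related `--fail-env-changed` option does something different: it marks a test as failed if it leaves the environment modified, for example by changing `os.environ` or leaving threads running.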

Run one test file:

```bash
./python -m test test_dict
```

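Run one test method, either with `unittest` syntax (the test package is importable as `test`, not `Lib.test`) or with the test runner's `-m` match option:

```bash
./python -m unittest test.test_dict.DictTest.test_constructor
./python -m test test_dict -m test_constructor
```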

The test suite is part of the internals workflow. When you change CPython, tests are the first guard against breaking language behavior.
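Tests under `Lib/test` are ordinary `unittest` test cases. The sketch below mimics that style for a behavior a `dict` change could break; the class and method names are hypothetical, and in CPython itself such a test would live in the relevant `Lib/test/test_*.py` file and run through `./python -m test`.

```python
import unittest


class DictOrderTest(unittest.TestCase):
    """Hypothetical regression test in the style of Lib/test/test_dict.py."""

    def test_insertion_order_preserved(self):
        # Insertion order has been a language guarantee for dict since 3.7.
        d = {}
        for key in ("c", "a", "b"):
            d[key] = len(d)
        self.assertEqual(list(d), ["c", "a", "b"])

    def test_delete_then_reinsert_moves_key_to_end(self):
        d = dict.fromkeys("abc")
        del d["a"]
        d["a"] = None
        self.assertEqual(list(d), ["b", "c", "a"])


if __name__ == "__main__":
    unittest.main()
```

Writing the test before changing the C code gives you a baseline: it should pass on an unmodified build and keep passing after your change.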

## 2.9 Debug Build Behavior

A debug build changes how CPython behaves internally.

It enables additional checks such as:

```text
assertions in C code
debug memory allocator checks
extra object consistency checks
reference leak tools
debug ABI marker
stricter failure behavior
```

Debug builds often expose bugs earlier. A memory misuse that appears harmless in a release build may abort quickly in a debug build.

This is useful. Internals work should fail loudly.
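One debug-only helper is `sys.gettotalrefcount()`, the interpreter-wide total of reference counts. The sketch below uses it to spot a net change across repeated calls, which is roughly the idea behind the test runner's reference-leak hunting (`./python -m test -R 3:3 test_name`). `refcount_delta` is a hypothetical helper; small nonzero values can be noise from caches and interning, so look for deltas that grow with `repeat`.

```python
import gc
import sys


def refcount_delta(func, repeat: int = 5):
    """Return the change in total refcount across `repeat` calls, or None.

    sys.gettotalrefcount() exists only in debug (Py_DEBUG) builds, so on a
    release build this returns None instead of raising AttributeError.
    """
    gettotal = getattr(sys, "gettotalrefcount", None)
    if gettotal is None:
        return None
    func()  # warm up caches, interned strings, and specialized bytecode
    gc.collect()
    before = gettotal()
    for _ in range(repeat):
        func()
    gc.collect()
    return gettotal() - before


if __name__ == "__main__":
    kept = []
    # Each call keeps one new object alive, so a debug build should report a
    # positive delta; a release build prints None.
    print(refcount_delta(lambda: kept.append(object())))
```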

## 2.10 Inspect Build Configuration

CPython exposes build configuration through `sysconfig`.

```python
import sysconfig

print(sysconfig.get_config_var("Py_DEBUG"))
print(sysconfig.get_config_var("WITH_PYMALLOC"))
print(sysconfig.get_config_var("Py_GIL_DISABLED"))
print(sysconfig.get_config_var("CONFIG_ARGS"))
```

Run it directly:

```bash
./python - <<'PY'
import sysconfig
for name in ["Py_DEBUG", "WITH_PYMALLOC", "Py_GIL_DISABLED", "CONFIG_ARGS"]:
    print(name, "=", sysconfig.get_config_var(name))
PY
```

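This tells you which compile-time features are active. A variable the build does not know about comes back as `None`; `Py_GIL_DISABLED`, for example, only exists on Python 3.13 and later, where it is `1` in free-threaded builds.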

## 2.11 Useful Build Targets

Common `make` targets:

| Target             | Purpose                                |
| ------------------ | -------------------------------------- |
| `make`             | Build the interpreter                  |
| `make -j`          | Build in parallel                      |
| `make clean`       | Remove many generated files            |
| `make distclean`   | Remove configure output too            |
| `make test`        | Run tests                              |
| `make regen-all`   | Regenerate generated files             |
| `make profile-opt` | Build with profile-guided optimization |

For ordinary internals work, use:

```bash
make -j
./python -m test test_name
```

For generated files, use regeneration targets only when you change inputs such as grammar, clinic definitions, or opcode metadata.

## 2.12 Generated Files

Some CPython files are generated. Do not edit them blindly.

Generated artifacts may come from:

```text
Grammar definitions
Argument Clinic input
opcode metadata
frozen modules
configuration scripts
documentation tools
```

Argument Clinic is especially common in C extension and built-in method definitions. It generates parsing and wrapper code from structured comments.

A C file may contain blocks like:

```c
/*[clinic input]
module.function

    arg: object

Description here.
[clinic start generated code]*/
```

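The generated part (everything between `[clinic start generated code]*/` and the matching `[clinic end generated code` marker, plus the `clinic/*.c.h` files next to the source) is tool output. Regenerate it with `make clinic` instead of editing it by hand.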
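Before hand-editing a C file, it can help to list what it declares through Argument Clinic. The helper below is hypothetical, not part of CPython's tooling: it only finds the `/*[clinic input]` opener and reports the next non-blank line, which for function blocks is the qualified name. Other clinic directives, such as `module` or `class` declarations, will show up as well.

```python
import re

# An Argument Clinic input block opens with "/*[clinic input]"; for function
# blocks, the first non-blank line after it is the qualified function name.
CLINIC_INPUT = re.compile(r"/\*\[clinic input\]\s*\n\s*(\S[^\n]*)")


def clinic_declarations(c_source: str) -> list:
    """Return the first line of every [clinic input] block in c_source."""
    return [m.group(1).strip() for m in CLINIC_INPUT.finditer(c_source)]


SAMPLE = """
/*[clinic input]
module.function

    arg: object

Description here.
[clinic start generated code]*/
"""

if __name__ == "__main__":
    print(clinic_declarations(SAMPLE))  # prints ['module.function']
```

To scan a real file, pass it the text of a source file, for example `pathlib.Path("Objects/listobject.c").read_text()`.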

## 2.13 Rebuild After a Change

A normal edit loop:

```bash
vim Objects/listobject.c
make -j
./python -m test test_list
```

For a small change in C code, incremental rebuilds are usually fast.

For a Python standard library change:

```bash
vim Lib/pathlib/__init__.py
./python -m test test_pathlib
```

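No C rebuild is needed for most pure Python changes. The exception is modules frozen into the interpreter binary, such as the `importlib._bootstrap` code used during startup; changes there only take effect after `make` regenerates the frozen copies.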
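When an edit to a `.py` file seems to have no effect, check where the import system actually loads that module from. The sketch below uses `importlib.util.find_spec`; `module_origin` is a hypothetical helper name. Source modules report a file path, while modules compiled into the interpreter report `"built-in"` or `"frozen"`. For a frozen module, editing its `.py` file changes nothing until you rebuild.

```python
import importlib.util


def module_origin(name: str):
    """Return where the import system would load `name` from, or None.

    Source modules report a path (e.g. .../Lib/json/__init__.py), extension
    modules report a shared-library path, and modules compiled into the
    interpreter report "built-in" or "frozen".
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    for name in ("sys", "json", "pathlib"):
        print(name, "->", module_origin(name))
```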

For parser, opcode, or generated-code changes, you may need regeneration before building.

## 2.14 Add a Debug Print

A simple way to confirm that you are running your own interpreter is to add a temporary debug print.

For example, in the C function that implements `list.append` in `Objects/listobject.c`:

```c
fprintf(stderr, "debug: list append called\n");
```

Then rebuild:

```bash
make -j
```

Run a small program:

```bash
./python - <<'PY'
x = []
x.append(1)
PY
```

Temporary prints are crude but effective. Remove them before committing.

## 2.15 Use `gdb` or `lldb`

A debug build works well with native debuggers.

With `gdb`:

```bash
gdb --args ./python script.py
```

Inside `gdb`:

```gdb
break _PyEval_EvalFrameDefault
run
bt
```

With `lldb`:

```bash
lldb -- ./python script.py
```

Inside `lldb`:

```lldb
breakpoint set --name _PyEval_EvalFrameDefault
run
bt
```

Useful breakpoints:

```text
Py_InitializeFromConfig
_PyEval_EvalFrameDefault
PyObject_Malloc
PyObject_Free
PyErr_SetString
_Py_Dealloc
```

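Exact symbol names can change by version and build configuration. In current versions, `_PyEval_EvalFrameDefault` is the bytecode evaluation loop, and the interpreter enters it many times during startup, so expect several stops before your script runs.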

## 2.16 Use Python-Level Inspection

Not every internals question needs a C debugger.

Useful modules:

| Module        | Use                                    |
| ------------- | -------------------------------------- |
| `dis`         | Inspect bytecode                       |
| `sys`         | Runtime state and interpreter settings |
| `gc`          | Garbage collector inspection           |
| `inspect`     | Frames, functions, source, signatures  |
| `types`       | Runtime type objects                   |
| `sysconfig`   | Build configuration                    |
| `tracemalloc` | Python allocation tracing              |

Example:

```python
import dis
import gc
import sys

def f(x):
    return x + 1

dis.dis(f)
print(f.__code__)
print(sys.getrefcount(f))
print(gc.is_tracked(f))
```

This style is useful before dropping into C.
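`dis.dis` prints a listing, but when comparing builds or versions it is often handier to get the bytecode as data. `dis.get_instructions` yields `Instruction` objects with attributes such as `opname` and `argval`; the helper name `opnames` below is hypothetical.

```python
import dis


def opnames(func) -> list:
    """Return the opcode names of func's bytecode, in order."""
    return [instr.opname for instr in dis.get_instructions(func)]


def f(x):
    return x + 1


if __name__ == "__main__":
    print(opnames(f))
```

The exact names differ between versions (for example, the addition is `BINARY_ADD` up to 3.10 and `BINARY_OP` from 3.11), which is exactly what this kind of comparison surfaces.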

## 2.17 Debug Memory Allocation

CPython has debug hooks for memory allocators.

Run with:

```bash
PYTHONMALLOC=debug ./python script.py
```

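This installs debug hooks on the Python memory allocators. They detect API misuse (such as freeing memory with the wrong allocator family), buffer overruns and underruns, some use-after-free patterns, and allocator calls made without holding the GIL. Debug builds install these hooks by default, so the variable matters most on release builds.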

For allocation tracing:

```bash
./python -X tracemalloc script.py
```

Or inside Python:

```python
import tracemalloc

tracemalloc.start()

data = [bytearray(1024) for _ in range(1000)]

current, peak = tracemalloc.get_traced_memory()
print(current, peak)
```

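`tracemalloc` traces blocks allocated through Python's memory allocators and records the Python traceback that allocated each one. Memory that C code obtains directly from the C library's `malloc` is invisible to it, so native heap debugging still needs lower-level tools.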
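To see *where* memory went rather than just the totals, compare two snapshots. This sketch uses `tracemalloc.take_snapshot()` and `Snapshot.compare_to()`; `top_allocations` is a hypothetical helper, and it leaves tracing running if it was already enabled (for example by `-X tracemalloc`).

```python
import tracemalloc


def top_allocations(func, limit: int = 3):
    """Call func under tracemalloc; return its result and the top allocation diffs."""
    already_tracing = tracemalloc.is_tracing()
    if not already_tracing:
        tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = func()  # keep the result alive so its memory is still traced
        after = tracemalloc.take_snapshot()
    finally:
        # Only stop tracing if this function started it.
        if not already_tracing:
            tracemalloc.stop()
    # compare_to() groups the difference by source line, largest change first.
    return result, after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    data, stats = top_allocations(lambda: [bytearray(1024) for _ in range(1000)])
    for stat in stats:
        print(stat)
```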

## 2.18 Sanitizer Builds

For serious C-level work, build with sanitizers.

AddressSanitizer can detect memory errors:

```bash
./configure --with-pydebug --with-address-sanitizer
make -j
```

UndefinedBehaviorSanitizer can detect undefined C behavior:

```bash
./configure --with-pydebug --with-undefined-behavior-sanitizer
make -j
```

These builds are slower, but useful for runtime, allocator, parser, and extension changes.

## 2.19 Profile-Guided Release Builds

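An optimized CPython release build usually uses profile-guided optimization (PGO). Enable it at configure time:

```bash
./configure --enable-optimizations
make -j
```

With `--enable-optimizations`, `make` runs the `profile-opt` target: it builds an instrumented CPython, runs a training workload, and rebuilds using the collected profile data.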

Use this when measuring performance. Do not compare a debug build against a release Python and treat the numbers as meaningful.

For internals study:

```text
debug build for correctness and inspection
release build for speed comparison
PGO build for performance-sensitive measurement
```
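A small guard can keep you from accidentally timing a debug build against a release build. This sketch runs each interpreter once in a subprocess and checks for `sys.gettotalrefcount`, which only debug builds define; `is_debug_interpreter` and `check_comparable` are hypothetical helpers, and the executable paths you pass in (for example `../cpython-build-release/python`) depend on your own layout.

```python
import subprocess
import sys

# One-line probe: debug builds define sys.gettotalrefcount, release builds do not.
_PROBE = "import sys; print(hasattr(sys, 'gettotalrefcount'))"


def is_debug_interpreter(executable: str) -> bool:
    """Run `executable` once and report whether it is a Py_DEBUG build."""
    out = subprocess.run(
        [executable, "-c", _PROBE],
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()
    return out == "True"


def check_comparable(baseline: str, candidate: str) -> None:
    """Raise if exactly one of the two interpreters is a debug build."""
    if is_debug_interpreter(baseline) != is_debug_interpreter(candidate):
        raise ValueError("one interpreter is a debug build; timings are not comparable")


if __name__ == "__main__":
    print(sys.executable, "debug:", is_debug_interpreter(sys.executable))
```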

## 2.20 Common Build Problems

| Symptom                      | Likely cause                                   |
| ---------------------------- | ---------------------------------------------- |
| `_ssl` missing               | OpenSSL headers or libraries unavailable       |
| `_sqlite3` missing           | SQLite development package unavailable         |
| `readline` missing           | readline or libedit headers unavailable        |
| test failures around locale  | environment locale mismatch                    |
| test failures around network | external network tests or platform limits      |
| debug build import mismatch  | running wrong executable or wrong `PYTHONPATH` |
| stale generated files        | regeneration needed                            |
| linker error                 | missing or incompatible system library         |

Before debugging CPython itself, confirm that you are running the interpreter you just built:

```bash
./python - <<'PY'
import sys
print(sys.executable)
print(sys.version)
PY
```

## 2.21 Keep Multiple Builds

A practical setup:

```text
cpython/
cpython-build-debug/
cpython-build-release/
cpython-build-asan/
cpython-build-pgo/
```

Use each build for a different job:

| Build   | Use                              |
| ------- | -------------------------------- |
| Debug   | Internals reading and assertions |
| Release | Behavior comparison              |
| ASAN    | Memory error detection           |
| PGO     | Performance measurements         |

This avoids constantly reconfiguring one build directory.

## 2.22 Minimal Internals Workflow

A good first workflow:

```bash
git clone https://github.com/python/cpython.git
cd cpython
./configure --with-pydebug
make -j
./python -V
./python -m test test_sys test_gc test_dict
```

Then inspect bytecode:

```bash
./python - <<'PY'
import dis

def f(x):
    return x + 1

dis.dis(f)
PY
```

Then change a small file, rebuild, and run a targeted test.

## 2.23 What This Build Enables

After this chapter, you should have a local CPython executable that you can use for the rest of the book.

You can now inspect:

```text
how source becomes bytecode
how frames execute
how objects are laid out
how reference counts change
how the garbage collector tracks containers
how built-in types are implemented
how tests protect behavior
how C-level bugs surface
```

A source build turns CPython from a black box into a system you can step through, instrument, and modify.

