PyModuleDef structure, PyModule_Create, multi-phase initialization (PEP 489), and module state.
An extension module is a native shared library loaded by CPython at runtime. It exposes functions, types, constants, and state implemented in C or C-compatible languages.
From Python code, an extension module behaves like a normal module:
```python
import math
import zlib
import _sqlite3
```

Internally, these modules are compiled native binaries that integrate with the CPython runtime through the Python C API.
Extension modules are one of the main mechanisms that make Python practical for systems programming, numerical computing, graphics, databases, networking, cryptography, and machine learning.
63.1 What an Extension Module Is
At the operating system level, an extension module is usually:
| Platform | Binary type |
|---|---|
| Linux | ELF shared object (.so) |
| macOS | Mach-O bundle (.so) |
| Windows | DLL-based Python extension (.pyd) |
CPython dynamically loads the binary:
```text
filesystem
↓
dynamic loader
↓
module init symbol
↓
CPython runtime registration
↓
Python module object
```

The extension becomes part of the interpreter process.
Unlike subprocesses, extension modules execute inside the same memory space as the interpreter.
63.2 Native Modules vs Pure Python Modules
Pure Python module:
```python
# hello.py
def greet():
    return "hello"
```

Extension module equivalent:

```text
hello.c
↓
compiler
↓
hello.so
↓
import hello
```

Both appear similar from Python:

```python
import hello
hello.greet()
```

But internally:
| Pure Python | Extension module |
|---|---|
| Parsed and compiled by CPython | Compiled by native compiler |
| Executes bytecode | Executes machine code |
| Managed by interpreter | Integrated through C API |
| Slower for low-level loops | Near-native performance possible |
63.3 Why Extension Modules Exist
Extension modules serve several roles.
Performance
Native loops avoid interpreter overhead.
System Integration
Direct operating system APIs:
- sockets
- filesystems
- processes
- memory mapping
- GPU drivers
- network stacks

Existing Native Libraries
Binding mature ecosystems:
| Ecosystem | Examples |
|---|---|
| C | zlib, OpenSSL |
| C++ | LLVM, Tensor runtimes |
| Fortran | BLAS, LAPACK |
| CUDA | GPU kernels |
Runtime Features
Some features require low-level access:
- custom allocators
- vector instructions
- thread primitives
- kernel APIs
- zero-copy buffers

63.4 The Smallest Possible Extension
Minimal extension:
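```c
#include <Python.h>

/* METH_NOARGS functions receive NULL as their second argument. */
static PyObject *
hello(PyObject *self, PyObject *Py_UNUSED(ignored))
{
    printf("hello from C\n");   /* C stdio, not Python's sys.stdout */
    Py_RETURN_NONE;             /* returns a new reference to None */
}

/* Method table, terminated by a sentinel entry. */
static PyMethodDef methods[] = {
    {"hello", hello, METH_NOARGS, "Print hello"},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef module = {
    PyModuleDef_HEAD_INIT,
    "demo",     /* module name */
    NULL,       /* no docstring */
    -1,         /* no per-module state: single-phase, globals only */
    methods
};

/* Called by the import system for "import demo". */
PyMODINIT_FUNC
PyInit_demo(void)
{
    return PyModule_Create(&module);
}
```

This module exposes one Python function:

```python
import demo
demo.hello()
```

The output comes directly from native code.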
63.5 Python.h
Every extension starts with:
```c
#include <Python.h>
```

This header:

- defines PyObject
- defines runtime macros
- declares API functions
- configures platform compatibility
- defines interpreter types

It must be included before any standard headers, because Python may define preprocessor settings that affect the standard headers on some systems.
63.6 The Module Initialization Function
Each extension exports a special symbol:
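```c
PyInit_demo
```

where demo matches the module name (for a dotted name such as pkg.demo, the last component).

During import:

```python
import demo
```

CPython:

1. loads the shared library
2. finds PyInit_demo
3. calls the initializer
4. receives a PyObject *
5. registers the module

With single-phase initialization, the initializer must return a new module object, or NULL with an exception set on failure. A multi-phase initializer (63.15) instead returns its PyModuleDef, passed through PyModuleDef_Init().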
63.7 PyMODINIT_FUNC
Initialization functions use:
```c
PyMODINIT_FUNC
```

Example:

```c
PyMODINIT_FUNC
PyInit_demo(void)
```

This macro declares the PyObject * return type and handles platform-specific export behavior:

| Platform | Requirement |
|---|---|
| Windows | DLL export decoration (dllexport) |
| Unix-like systems | Default symbol visibility |
| C++ compilers | extern "C" linkage, so the symbol name is not mangled |
Without it, the dynamic loader may fail to locate the module initializer.
63.8 PyModuleDef
Modules are described using PyModuleDef.
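```c
static struct PyModuleDef module = {
    PyModuleDef_HEAD_INIT,
    "demo",
    "Example module",
    -1,
    methods
};
```

Structure fields:

| Field | C member | Meaning |
|---|---|---|
| initializer | m_base | Internal runtime header, always PyModuleDef_HEAD_INIT |
| module name | m_name | Python-visible name |
| docstring | m_doc | Module documentation |
| state size | m_size | Per-module state size in bytes; -1 means the module keeps its state in C globals and does not support subinterpreters |
| methods | m_methods | Exported functions |

Later members (m_slots, m_traverse, m_clear, m_free) are zero when omitted from the initializer.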
The runtime uses this structure to construct the module object.
63.9 PyMethodDef
Exported functions are declared using:
```c
static PyMethodDef methods[]
```

Example:

```c
{
    "add",
    add,
    METH_VARARGS,
    "Add two numbers"
}
```

Fields:
| Field | Meaning |
|---|---|
| Python name | Visible function name |
| C function | Native implementation |
| flags | Calling convention |
| docstring | Help text |
The array ends with:
```c
{NULL, NULL, 0, NULL}
```

which acts as a sentinel terminator.
63.10 Function Signatures
Different calling conventions require different signatures.
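METH_NOARGS

```c
static PyObject *
f(PyObject *self, PyObject *unused)   /* unused is always NULL */
```

METH_VARARGS

```c
static PyObject *
f(PyObject *self, PyObject *args)     /* args is a tuple */
```

METH_VARARGS | METH_KEYWORDS

```c
static PyObject *
f(PyObject *self,
  PyObject *args,
  PyObject *kwargs)                   /* kwargs is a dict, or NULL */
```

METH_FASTCALL

Modern optimized convention: positional arguments arrive as a C array plus a count, so no argument tuple has to be built.

```c
static PyObject *
f(PyObject *self,
  PyObject *const *args,
  Py_ssize_t nargs)
```

The ml_meth member of PyMethodDef is declared as PyCFunction. Functions with the keyword or fastcall signatures must therefore be cast in the method table, for example (PyCFunction)(void (*)(void))f. Modern CPython increasingly favors fastcall-style APIs internally.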
63.11 Parsing Arguments
Python arguments arrive as Python objects.
Extensions typically convert them into C values.
Example:
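```c
static PyObject *
add(PyObject *self, PyObject *args)
{
    int a;
    int b;

    /* "ii": exactly two arguments, each converted to a C int. */
    if (!PyArg_ParseTuple(args, "ii", &a, &b)) {
        return NULL;   /* exception already set */
    }
    /* Widen before adding so a + b cannot overflow int. */
    return PyLong_FromLongLong((long long)a + b);
}
```

Format string:

```c
"ii"
```

means: parse two integers.

Common format units:

| Unit | C type | Meaning |
|---|---|---|
| i | int | Integer |
| l | long | Integer |
| d | double | Floating-point number |
| s | const char * | UTF-8 string |
| O | PyObject * | Generic object (borrowed reference) |
| p | int | Boolean (truth value of any object) |

If parsing fails, PyArg_ParseTuple returns 0 and sets an exception (for example TypeError), so the function only needs to return NULL.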
63.12 Returning Values
Functions return Python objects.
Example:
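```c
return PyLong_FromLong(a + b);
```

The return value must be:

| Return | Meaning |
|---|---|
| PyObject * (a new reference) | Success |
| NULL | Failure; an exception must be set |

Returning native C values directly is invalid. The caller expects a PyObject *, and converting an int to a pointer is a C constraint violation that compilers warn about or reject.

Incorrect:

```c
return a + b;
```

Correct:

```c
return PyLong_FromLong(a + b);
```

63.13 Raising Exceptions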
Exceptions are set explicitly.
Example:
```c
PyErr_SetString(PyExc_ValueError,
                "invalid value");
return NULL;
```

The interpreter checks for:

```text
NULL return
+
active exception state
```

Returning NULL without setting an exception is itself a bug, which CPython reports as a SystemError.

Built-in exception objects include:

| Exception | C object |
|---|---|
| ValueError | PyExc_ValueError |
| TypeError | PyExc_TypeError |
| RuntimeError | PyExc_RuntimeError |
| MemoryError | PyExc_MemoryError |
Extensions may define custom exception types.
63.14 Module-Level State
Historically, extensions used global variables:
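```c
static int counter = 0;
```

This causes problems with:

- subinterpreters
- reloading
- isolation
- thread safety
- multiple runtimes

Modern CPython supports per-module state.

Example:

```c
typedef struct {
    int counter;
} module_state;
```

The module definition specifies the state size in its m_size field:

```c
sizeof(module_state)
```

The runtime allocates a block of that size for each module object, and PyModule_GetState() returns a pointer to it. Each interpreter that imports the module gets its own module object, so each interpreter instance can maintain isolated module data.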
63.15 Multi-Phase Initialization
Modern extensions can use multi-phase initialization.
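Traditional (single-phase) initialization:

```text
PyInit_* creates and returns the module immediately
```

Multi-phase initialization:

```text
PyInit_* returns the module definition via PyModuleDef_Init()
↓
runtime creates the module object (Py_mod_create slot or default)
↓
Py_mod_exec slots initialize contents and state later
```

This improves compatibility with:

- subinterpreters
- repeated imports (each creates a fresh module object)
- runtime isolation
- future interpreter changes

PEP 489 introduced this model.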
63.16 Adding Constants
Extensions can add constants directly.
Example:
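```c
PyModule_AddIntConstant(module,
                        "ANSWER",
                        42);
```

Python usage:

```python
import demo
print(demo.ANSWER)
```

Other helpers:

| Function | Purpose |
|---|---|
| PyModule_AddObject | Add arbitrary object |
| PyModule_AddObjectRef | Add arbitrary object without stealing a reference (3.10+) |
| PyModule_AddStringConstant | Add string |
| PyModule_AddIntConstant | Add integer |

Each helper returns -1 with an exception set on failure. Ownership matters here. PyModule_AddObject steals the reference only on success, so on failure the caller still owns it and must release it. PyModule_AddObjectRef never steals, so the caller always releases its own reference.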
63.17 Defining Module Exceptions
Extensions often expose module-specific exceptions.
Example:
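```c
static PyObject *DemoError;   /* strong reference, reused when raising */

/* Inside the module initializer; module is the new module object. */
DemoError = PyErr_NewException("demo.Error", NULL, NULL);
if (DemoError == NULL) {
    Py_DECREF(module);
    return NULL;
}
```

Register:

```c
/* PyModule_AddObjectRef (3.10+) does not steal the reference,
   so DemoError stays valid for later PyErr_SetString calls. */
if (PyModule_AddObjectRef(module, "Error", DemoError) < 0) {
    Py_DECREF(module);
    return NULL;
}
```

Python:

```python
import demo
raise demo.Error("failure")
```

PyErr_NewException with a NULL base creates a subclass of Exception, so demo.Error can be raised and caught like any other Python exception.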
63.18 Building Extensions
Extensions require native compilation.
Traditional setuptools
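```python
from setuptools import setup, Extension

setup(
    ext_modules=[
        Extension(
            "demo",
            ["demo.c"]
        )
    ]
)
```

Build:

```text
python setup.py build
```

Running setup.py directly is deprecated in current setuptools. pip install . (or python -m build, from the build package) drives the same build through a standard front end.

Modern build systems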
Common tools:
| Tool | Purpose |
|---|---|
| setuptools | Traditional builds |
| scikit-build | CMake integration |
| maturin | Rust integration |
| meson-python | Meson builds |
63.19 Shared Library Loading
Importing an extension uses the operating system loader.
Process:
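```text
import statement
↓
importlib finds shared library
↓
dlopen / LoadLibrary
↓
resolve PyInit symbol
↓
call initializer
↓
register module
```

The module remains mapped into process memory; CPython does not unload extension libraries.

Native static variables therefore keep their values for the lifetime of the process, even across Py_FinalizeEx() and a later Py_Initialize(), unless the extension resets them explicitly.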
63.20 Extension Module Lifetime
Extension modules often live for the entire interpreter lifetime.
Objects created by extensions may survive:
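- imports
- reloads
- callbacks
- threads
- async tasks
- reference cycles

This means extension code must handle:

- long-lived allocations
- shutdown ordering
- global cleanup
- finalization safety

Interpreter shutdown is especially difficult because objects may be destroyed in an unpredictable order while the interpreter is only partially torn down. Cleanup code therefore cannot assume that other modules or objects still exist.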
63.21 Extension Modules and the GIL
Most extension code executes while holding the GIL.
CPU-intensive native code may release it:
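```c
Py_BEGIN_ALLOW_THREADS
compute();   /* pure C work: no Python objects, no C API calls */
Py_END_ALLOW_THREADS
```

This lets other threads run Python code while the native work proceeds, so CPU-bound native code in several threads can execute in parallel.

But once the GIL is released:

```text
most Python C API calls become unsafe
```

because interpreter state is no longer protected.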
63.22 Extension Modules and ABI Compatibility
Extension modules are sensitive to CPython ABI changes.
Dependencies include:
- object layout
- reference count semantics
- calling conventions
- interpreter state structures
- memory allocators

Binary compatibility strategies:
| Strategy | Tradeoff |
|---|---|
| Full API | Maximum power, tighter coupling |
| Stable ABI | Reduced access, broader compatibility |
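Stable ABI extensions avoid direct access to many internal structures. They are built against the Limited API (selected by defining Py_LIMITED_API), and a single binary, tagged abi3, can be loaded by the targeted CPython version and later 3.x releases.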
63.23 Common Extension Bugs
Typical failure classes:
| Bug | Cause |
|---|---|
| Reference leak | Missing Py_DECREF |
| Use-after-free | Incorrect ownership |
| Double free | Extra Py_DECREF |
| Crashes during shutdown | Global state assumptions |
| Thread corruption | API calls without GIL |
| Refcount corruption | Borrowed/new confusion |
| ABI breakage | Internal API dependence |
Most extension debugging eventually reduces to:
- ownership
- lifetime
- threading
- interpreter state

63.24 Extension Modules vs Embedding
Extension modules:
```text
Python process
↓
native module loaded into interpreter
```

Embedding:

```text
native application
↓
embedded CPython runtime
```

Extensions extend Python outward.
Embedding pulls Python inward.
Many systems use both simultaneously.
63.25 Real-World Architecture
Large extensions rarely stay as one file.
Typical layout:
```text
module init
↓
type definitions
↓
runtime wrappers
↓
conversion helpers
↓
error handling
↓
memory management
↓
native library bindings
```

Scientific libraries often include:

- vector kernels
- SIMD paths
- thread pools
- GPU backends
- custom allocators
- buffer interfaces

while exposing Pythonic APIs externally.
63.26 The CPython Import View
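From the import system's perspective, an extension module is just another module: a loader (importlib.machinery.ExtensionFileLoader for shared-library extensions) produces a module object from it.

The import system treats:

```python
import math
```

and:

```python
import pathlib
```

similarly at a high level.

But internally:

| Module | Implementation |
|---|---|
| math | Native C (a shared library on many Unix builds, compiled into the interpreter on others) |
| pathlib | Python source |
| _io | Native C, compiled into the interpreter |
| asyncio | Mostly Python, with a C accelerator (_asyncio) |
| _ssl | Native wrapper around OpenSSL |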
The import system abstracts over these differences.
63.27 Chapter Summary
Extension modules are dynamically loaded native libraries integrated into CPython through the Python C API. They expose Python-visible functions, types, constants, and state implemented in native machine code.
Each extension exports a module initializer, defines functions using PyMethodDef, creates modules through PyModuleDef, parses arguments using C API helpers, and returns Python objects through explicit ownership rules.
Extension modules provide performance, systems integration, and interoperability with native ecosystems. They also introduce complexity around reference counting, interpreter lifetime, ABI compatibility, thread safety, and runtime integration.
They are one of the central architectural mechanisms that connect Python code to the lower-level systems world.