94. Per-Interpreter GIL

PEP 684 per-interpreter GIL: isolated interpreter state, shared-nothing model, and module isolation requirements.

The per-interpreter GIL is a CPython runtime design where each subinterpreter owns its own Global Interpreter Lock instead of all interpreters sharing one process-wide lock.

The traditional model used one GIL for the whole process:

process
    runtime
        global GIL
        interpreter A
        interpreter B
        interpreter C

The per-interpreter model moves the lock down into each interpreter:

process
    runtime
        interpreter A
            GIL A
        interpreter B
            GIL B
        interpreter C
            GIL C

This allows separate interpreters to execute Python bytecode in parallel, as long as they do not share unsafe runtime state.

The per-interpreter GIL is different from removing the GIL completely. It preserves the GIL inside each interpreter, but gives each interpreter its own independent lock.

94.1 Why Per-Interpreter GIL Exists

The original GIL solved many correctness problems:

reference counting
object mutation
allocator state
import state
runtime caches
C extension assumptions

But one process-wide GIL also meant that all Python threads in the process competed for one global execution lock.

Subinterpreters already existed in CPython. They allowed multiple interpreter states inside one process, but historically they still shared too much global runtime state to provide true parallel execution.

The per-interpreter GIL attempts a middle path:

keep the GIL model inside one interpreter
allow multiple interpreters to run independently
reduce global runtime sharing
avoid requiring all code to become fully free-threaded

It is less radical than free-threaded CPython, but still requires deep runtime changes.

94.2 Interpreter State

A CPython process contains runtime state and one or more interpreter states.

Conceptually:

typedef struct _is PyInterpreterState;
typedef struct _ts PyThreadState;

A PyInterpreterState owns interpreter-level data.

Examples:

module dictionary
builtins
import machinery
codec state
warnings state
GC state
thread states
interpreter configuration
runtime caches

A PyThreadState represents one thread executing inside an interpreter.

Conceptually:

PyInterpreterState
    PyThreadState
    PyThreadState
    PyThreadState

The per-interpreter GIL makes the interpreter state the unit of bytecode execution locking.
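The ownership hierarchy above can be sketched as a toy Python model. This is an illustration only: the class names mirror the concepts, not CPython's actual C structures.

```python
from dataclasses import dataclass, field
from threading import Lock

# Toy model only: these classes mirror the concepts above,
# not CPython's real PyInterpreterState / PyThreadState structs.

@dataclass
class ThreadState:
    interp: "InterpreterState"                   # back-pointer to the owning interpreter

@dataclass
class InterpreterState:
    gil: Lock = field(default_factory=Lock)      # per-interpreter GIL
    modules: dict = field(default_factory=dict)  # per-interpreter module table
    threads: list = field(default_factory=list)

    def new_thread_state(self) -> ThreadState:
        ts = ThreadState(interp=self)
        self.threads.append(ts)
        return ts

@dataclass
class Runtime:
    interpreters: list = field(default_factory=list)

    def new_interpreter(self) -> InterpreterState:
        interp = InterpreterState()
        self.interpreters.append(interp)
        return interp

runtime = Runtime()
a = runtime.new_interpreter()
b = runtime.new_interpreter()
ts = a.new_thread_state()

print(ts.interp is a)   # True: a thread state belongs to one interpreter
print(a.gil is b.gil)   # False: each interpreter owns its own lock
```

The key property the model captures is that the lock lives on the interpreter, not on the runtime.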

94.3 Traditional Process-Wide GIL

In the older model, the process had one effective GIL.

Even if a process contained multiple interpreters:

interpreter A
interpreter B

only one thread could execute Python bytecode at a time across both.

Conceptually:

Thread 1 in interpreter A acquires global GIL
Thread 2 in interpreter B waits

Thread 1 releases global GIL
Thread 2 acquires global GIL

This limited the scalability of subinterpreters. They provided isolation of some state, but not parallel Python execution.

94.4 Per-Interpreter GIL Model

With a per-interpreter GIL, each interpreter has its own lock.

Thread 1 in interpreter A acquires GIL A
Thread 2 in interpreter B acquires GIL B

both execute Python bytecode concurrently

This changes the concurrency model.

Parallelism becomes possible when execution is split across interpreters rather than merely across threads in one interpreter.

A process can then use multiple CPU cores without removing the GIL inside each interpreter.
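CPython exposes subinterpreters through an internal module whose name and exact API have shifted across versions (`_xxsubinterpreters` in 3.12, `_interpreters` in 3.13, and a public interface arriving later). The sketch below is a heavily guarded probe under those assumptions, not a stable API reference:

```python
# Hedged sketch: the module name and exact signatures vary by CPython
# version, so everything here is guarded. Treat this as exploratory.
interp_mod = None
for name in ("_interpreters", "_xxsubinterpreters"):
    try:
        interp_mod = __import__(name)
        break
    except ImportError:
        continue

if interp_mod is None:
    status = "no subinterpreter module on this build"
else:
    try:
        interp_id = interp_mod.create()               # new interpreter, own GIL
        interp_mod.run_string(interp_id, "x = 1 + 1") # runs under that GIL
        interp_mod.destroy(interp_id)
        status = "ran code in a separate interpreter"
    except Exception:
        status = "subinterpreter API differs on this version"

print(status)
```

Whatever the module is called, the shape is the same: create an interpreter, run source in it under its own lock, destroy it.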

94.5 Difference From Free-Threaded CPython

Per-interpreter GIL and free-threaded CPython solve related problems differently.

Model                   Locking design            Parallel bytecode execution
Traditional GIL         One GIL per process       No, not across Python threads
Per-interpreter GIL     One GIL per interpreter   Yes, across interpreters
Free-threaded CPython   No traditional GIL        Yes, inside one interpreter

The per-interpreter GIL keeps many old assumptions valid within each interpreter:

only one thread executes bytecode in this interpreter
reference counting remains simpler locally
container mutation remains serialized locally
many C extension assumptions remain closer to traditional CPython

Free-threaded CPython removes that protection and replaces it with fine-grained synchronization.

94.6 Why Subinterpreter Isolation Matters

Per-interpreter GIL only works if interpreters do not share unsafe mutable state.

If two interpreters share the same mutable object:

interpreter A mutates object
interpreter B mutates same object

then separate GILs do not protect the object.

The old global GIL accidentally protected shared runtime state. Once the GIL becomes per-interpreter, shared state becomes dangerous.

Therefore CPython must move data from process-global state into interpreter-local state.

Examples:

module state
import state
GC state
exception state
runtime caches
interned objects
type metadata where possible

The more state becomes interpreter-local, the safer parallel subinterpreters become.

94.7 Runtime Global State

CPython historically used many process-global variables.

Examples:

static runtime caches
global singletons
global freelists
global type state
global import machinery data
global extension module state

These globals were convenient because the process-wide GIL serialized access.

With a per-interpreter GIL, such globals become concurrency hazards.

A process-global mutable value must either be:

made immutable
protected by its own lock
moved into PyInterpreterState
made thread-local
eliminated

This creates a large refactoring burden.
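One remedy from the list, making state thread-local, can be shown directly with the standard library's `threading.local()`; the `cache` name is illustrative:

```python
import threading

# Each thread sees its own `state.cache`: no cross-thread sharing,
# analogous to moving a C static into per-context storage.
state = threading.local()

def worker(name, results):
    state.cache = {"owner": name}          # fresh, thread-private mapping
    results[name] = state.cache["owner"]

results = {}
threads = [threading.Thread(target=worker, args=(n, results)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))   # [('a', 'a'), ('b', 'b')]
```

Each worker observed only its own cache; a plain module-level dict would have been shared by both.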

94.8 Extension Module State

Extension modules are a major challenge.

Old extension modules often used process-global C variables:

static PyObject *cache;
static int initialized;
static PyTypeObject MyType;

This pattern assumes one global interpreter context.

In a subinterpreter world, it is problematic.

If two interpreters import the same extension module, global state may be shared accidentally:

interpreter A imports module
interpreter B imports module
both use same static C globals

That can break isolation.

Modern extension design prefers per-module state:

typedef struct {
    PyObject *cache;
    PyObject *error_type;
} ModuleState;

Each interpreter gets its own module instance and its own module state.
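The hazard can be mimicked in pure Python: a class attribute behaves like the static C global (one copy shared by every "import"), while per-instance state mirrors the ModuleState struct. The class names are illustrative only:

```python
class LegacyModule:
    cache = {}                    # like `static PyObject *cache`: one shared copy

class IsolatedModule:
    def __init__(self):
        self.cache = {}           # like ModuleState: one copy per module instance

# Two "interpreters" each import the legacy module:
legacy_a, legacy_b = LegacyModule(), LegacyModule()
legacy_a.cache["key"] = "from A"
print(legacy_b.cache)             # {'key': 'from A'}: A's write leaked into B

# With per-module state the instances stay isolated:
iso_a, iso_b = IsolatedModule(), IsolatedModule()
iso_a.cache["key"] = "from A"
print(iso_b.cache)                # {}: B is unaffected
```

The same reasoning drives the C-level shift from static globals to per-module state structs.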

94.9 Multi-Phase Module Initialization

Multi-phase initialization helps extension modules work with subinterpreters.

Instead of one global initialization function building one process-wide module object, an extension can define module creation and execution phases.

Conceptually:

create module object
allocate per-module state
execute module initialization
store state in module instance

This allows each interpreter to get a separate module instance.

The extension can retrieve its state from the module object rather than from static globals.

Simplified pattern:

typedef struct {
    PyObject *CacheType;
} mod_state;

static int
module_exec(PyObject *m)
{
    mod_state *st = PyModule_GetState(m);
    if (st == NULL) {
        /* PyModule_GetState returns NULL without setting an exception
           when the module has no state, so raise one explicitly. */
        PyErr_SetString(PyExc_SystemError, "module state is missing");
        return -1;
    }

    st->CacheType = create_cache_type();
    if (st->CacheType == NULL) {
        return -1;
    }

    return PyModule_AddObjectRef(m, "Cache", st->CacheType);
}

This is more compatible with multiple interpreters.

94.10 Objects Cannot Be Freely Shared

Ordinary Python objects generally cannot be passed directly between interpreters.

A list created in interpreter A belongs to interpreter A:

xs = [1, 2, 3]

The list references type objects, allocator state, GC metadata, and other interpreter-specific structures.

Passing that list directly into interpreter B would create ownership and synchronization problems.

Instead, inter-interpreter communication should use safe channels:

serialization
copying
immutable shareable objects
explicit cross-interpreter data APIs
message passing

This keeps interpreter heaps separate.
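Serialization is the simplest safe channel: the receiving side rebuilds its own copy from plain bytes, so no object is ever shared. A minimal sketch with the standard pickle module:

```python
import pickle

# "Interpreter A" side: flatten the object graph into plain bytes.
original = [1, 2, 3]
payload = pickle.dumps(original)

# The bytes can cross the interpreter boundary safely; "interpreter B"
# reconstructs a brand-new list that it fully owns.
received = pickle.loads(payload)

print(received == original, received is original)   # True False
```

Equal value, distinct identity: that distinction is exactly what keeps the two heaps separate.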

94.11 Shareable Objects

Some objects are safer to share than others.

Good candidates:

None
booleans
small immutable values
bytes
strings
immutable memory views
simple serialized data

Bad candidates:

list
dict
set
user-defined mutable objects
open files
generators
frames
coroutines
locks

The safest model treats interpreters as isolated runtimes that exchange messages rather than share object graphs.
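A rough classifier in the spirit of the two lists above. This is an illustrative policy only; CPython's real shareability rules live in the runtime, not in Python code:

```python
# Illustrative policy: immutable scalar types are safe candidates,
# tuples are safe only if every element is, everything else is not.
SHAREABLE_TYPES = (type(None), bool, int, float, str, bytes)

def looks_shareable(obj) -> bool:
    if isinstance(obj, SHAREABLE_TYPES):
        return True
    if isinstance(obj, tuple):
        return all(looks_shareable(item) for item in obj)
    return False

print(looks_shareable("hello"))         # True
print(looks_shareable((1, "a", None)))  # True
print(looks_shareable([1, 2, 3]))       # False: mutable container
print(looks_shareable({"k": 1}))        # False
```

The recursive tuple case shows why shareability is a property of the whole object graph, not just the outer object.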

94.12 Message Passing Model

A practical subinterpreter design resembles actor-style concurrency.

Each interpreter owns its state:

interpreter A owns heap A
interpreter B owns heap B

They communicate through explicit channels:

interpreter A sends message
runtime copies or transfers safe data
interpreter B receives message

This avoids shared mutable state.

Conceptually:

worker interpreter
    receive task
    run Python code
    send result

A thread pool based on subinterpreters can then run CPU-bound Python code in parallel, while preserving a simpler per-interpreter GIL model.
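The worker loop above can be sketched with ordinary threads and queues standing in for interpreters and channels. This is an analogy only; real subinterpreter channels live in the runtime:

```python
import queue
import threading

# Analogy: the thread plays the worker interpreter; the queues play the
# explicit channels. Only plain values cross the boundary.
tasks: queue.Queue = queue.Queue()
results: queue.Queue = queue.Queue()

def worker():
    while True:
        item = tasks.get()           # receive task
        if item is None:             # shutdown sentinel
            break
        results.put(item * item)     # run Python code, send result

t = threading.Thread(target=worker)
t.start()

for n in (2, 3, 4):
    tasks.put(n)
tasks.put(None)
t.join()

collected = [results.get() for _ in range(3)]
print(collected)   # [4, 9, 16]
```

The worker never touches the sender's objects directly; everything arrives and leaves through a channel.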

94.13 Reference Counting With Per-Interpreter GIL

Inside one interpreter, reference counting remains protected by that interpreter’s GIL.

Thread A in interpreter X holds GIL X
    updates refcounts for objects in interpreter X

This avoids full atomic reference counting for ordinary interpreter-local objects.

However, globally shared immortal objects and runtime-level objects still require special handling.

The rule becomes:

interpreter-local objects use interpreter-local protection
process-global objects need global safety

This boundary is central to the design.

94.14 Garbage Collection Per Interpreter

The cyclic garbage collector is naturally interpreter-scoped.

Each interpreter has its own object graph:

interpreter A heap
interpreter B heap

Each graph can be collected independently.

This has useful properties:

GC pauses can be interpreter-local
cycles do not cross interpreter heaps
object ownership is clearer
finalizers run in the owning interpreter

But it requires that object graphs do not contain unsafe cross-interpreter references.

94.15 Import System Isolation

The import system is another major area.

Each interpreter should have its own module table:

import sys
sys.modules

If interpreter A imports mymodule, and interpreter B imports mymodule, they should generally get separate module objects.

This preserves module globals isolation.

Example:

# in interpreter A
import config
config.value = 10

# in interpreter B
import config
print(config.value)

Interpreter B should not accidentally observe interpreter A’s module global mutation unless communication is explicit.
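That isolation can be imitated with two separate module objects built from the same source, a toy stand-in for two interpreters each holding their own sys.modules entry:

```python
import types

# Toy stand-in: each "interpreter" gets its own module object executed
# from the same source, so module globals never leak across them.
source = "value = 0"

def fresh_import(name: str) -> types.ModuleType:
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    return mod

config_a = fresh_import("config")   # interpreter A's copy
config_b = fresh_import("config")   # interpreter B's copy

config_a.value = 10                 # mutate only A's module global
print(config_a.value, config_b.value)   # 10 0
```

Separate module objects mean separate module dictionaries, which is exactly the guarantee per-interpreter sys.modules provides.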

94.16 Builtins and Runtime Constants

Builtins are heavily shared in traditional CPython.

Examples:

None
True
False
Ellipsis
NotImplemented
int
str
list
dict
object
type

Some of these can be safely immortal and shared. Others may require interpreter-local state or careful synchronization.

The runtime must classify objects by lifetime and ownership:

Object kind           Typical handling
Immutable singleton   Immortal and shareable
Builtin type          Often shared or specially managed
Module object         Interpreter-local
User object           Interpreter-local
Frame                 Thread/interpreter-local
Mutable cache         Interpreter-local or locked

94.17 Type Objects and Interpreter Isolation

Type objects are complicated.

A type object may hold:

method table
slot functions
base classes
MRO
subclasses
dict
cache data
module state references

Static built-in types can often be shared because they are effectively permanent and carefully managed.

Heap types created by Python code are interpreter-local.

Example:

class User:
    pass

The resulting User type belongs to the interpreter that created it.

Sharing it directly with another interpreter would expose mutable type dictionaries, subclass lists, descriptors, and cached lookup state.

94.18 Thread State and GIL Ownership

Each OS thread executing Python code has a PyThreadState.

In the per-interpreter model, the thread state belongs to one interpreter at a time:

PyThreadState
    interpreter pointer
    current frame
    exception state
    recursion state

To execute bytecode, the thread must acquire that interpreter’s GIL.

Conceptually:

attach thread state to interpreter
acquire interpreter GIL
execute Python code
release interpreter GIL
detach or switch

Switching between interpreters is possible, but must be explicit and carefully managed.
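The attach/acquire/execute/release sequence can be sketched as a context manager around a per-interpreter lock. This is a toy model, not the C-level thread state API:

```python
import threading
from contextlib import contextmanager

# Toy model: one Lock per interpreter plays the role of that interpreter's GIL.
class Interp:
    def __init__(self, name):
        self.name = name
        self.gil = threading.Lock()

@contextmanager
def running_in(interp):
    interp.gil.acquire()       # acquire this interpreter's GIL
    try:
        yield interp           # execute Python code while attached
    finally:
        interp.gil.release()   # release so other threads of this interp can run

a, b = Interp("A"), Interp("B")
with running_in(a):
    snapshot = (a.gil.locked(), b.gil.locked())   # GIL A held, GIL B free
print(snapshot)           # (True, False)
print(a.gil.locked())     # False: released on exit
```

Holding GIL A says nothing about GIL B, which is the whole point of the per-interpreter design.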

94.19 Scheduling Model

The per-interpreter GIL does not by itself create a scheduler.

It provides a locking model.

Scheduling still depends on:

OS threads
application thread pools
embedding host
subinterpreter API
task dispatch system

A runtime can create several interpreters and assign one worker thread to each.

Conceptually:

main interpreter
    dispatch task 1 to interpreter A
    dispatch task 2 to interpreter B
    dispatch task 3 to interpreter C

Each worker interpreter can execute Python code independently.
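The dispatch pattern above can be simulated with one thread and one inbox queue per "interpreter", the main thread handing out tasks round-robin. Again a stand-in model, not the real subinterpreter API:

```python
import queue
import threading

# Stand-in model: each worker thread plays an interpreter with its own
# task queue; the main "interpreter" dispatches work round-robin.
def make_worker(results):
    inbox: queue.Queue = queue.Queue()
    def run():
        while True:
            task = inbox.get()
            if task is None:          # shutdown sentinel
                break
            results.put(task * 10)    # "run Python code" on the task
    t = threading.Thread(target=run)
    t.start()
    return inbox, t

results: queue.Queue = queue.Queue()
workers = [make_worker(results) for _ in range(3)]

for i, task in enumerate([1, 2, 3, 4, 5, 6]):
    inbox, _ = workers[i % 3]         # dispatch round-robin
    inbox.put(task)

for inbox, t in workers:
    inbox.put(None)
    t.join()

collected = sorted(results.get() for _ in range(6))
print(collected)   # [10, 20, 30, 40, 50, 60]
```

Results are sorted because workers finish in no guaranteed order; with real subinterpreters the same nondeterminism applies.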

94.20 Comparison With Multiprocessing

Subinterpreters with per-interpreter GIL overlap with multiprocessing, but they have different tradeoffs.

Feature              Multiprocessing           Subinterpreters
Isolation            OS process boundary       Interpreter boundary
Parallelism          Yes                       Yes, with per-interpreter GIL
Memory sharing       Separate address spaces   Same process address space
Startup cost         Higher                    Lower
Crash isolation      Stronger                  Weaker
Object sharing       Serialization needed      Message passing or restricted sharing
C extension safety   Process-isolated          Must be subinterpreter-safe

Subinterpreters can be lighter than processes, but they provide weaker fault isolation.

A crash in native code can still bring down the whole process.

94.21 Comparison With Threads

Normal Python threads share one interpreter.

one interpreter
    many threads
    one GIL

Subinterpreter workers use multiple interpreters:

many interpreters
    one or more threads each
    one GIL each

Threads are easier for shared-memory programming.

Subinterpreters are better for isolated parallel execution.

The programming model shifts from shared objects to explicit communication.

94.22 Advantages

Per-interpreter GIL offers several advantages:

true parallel bytecode execution across interpreters
less radical than full free-threading
clearer isolation boundary
better fit for plugin systems
lower overhead than multiprocessing in some cases
keeps many traditional GIL assumptions inside one interpreter

It can support workloads such as:

CPU-bound task pools
server plugin isolation
parallel data processing
embedded scripting runtimes
independent user code execution

94.23 Costs and Limitations

The model also has costs:

extension modules must support subinterpreters correctly
objects cannot be freely shared
global runtime state must be removed or protected
debugging becomes more complex
memory use may increase due to duplicated interpreter state
some libraries assume one interpreter per process

A subinterpreter pool may duplicate:

module imports
module globals
caches
class objects
runtime metadata

This can consume more memory than a thread pool.

94.24 Common Misunderstandings

The per-interpreter GIL does not mean every Python thread runs in parallel.

Threads inside the same interpreter still share that interpreter’s GIL.

It also does not mean Python objects are automatically thread-safe across interpreters.

The correct model is:

parallelism comes from multiple interpreters
safety comes from isolation
communication must be explicit

94.25 Design Pressure on CPython

Per-interpreter GIL forces CPython to become more modular internally.

Old style:

static PyObject *global_cache;

New style:

state belongs to runtime, interpreter, module, or thread

Every piece of state needs a clear owner.

This improves architecture even outside subinterpreters.

It makes CPython less dependent on hidden global variables and easier to reason about in concurrent settings.

94.26 Mental Model

Use this model:

A CPython process may contain many interpreters.

Each interpreter has:
    its own GIL
    its own module table
    its own import state
    its own garbage collector state
    its own thread states
    its own object heap boundaries

Threads can run Python bytecode in parallel when they execute in different interpreters.

Objects should stay inside the interpreter that owns them.

Communication should use explicit transfer, copying, serialization, or safe shareable values.

This model explains why per-interpreter GIL is useful and why it requires substantial runtime refactoring.

94.27 Chapter Summary

The per-interpreter GIL moves CPython from one process-wide execution lock to one lock per interpreter.

This enables parallel bytecode execution across subinterpreters while preserving the familiar GIL model inside each interpreter.

The design depends on interpreter isolation:

interpreter-local module state
interpreter-local object ownership
reduced process-global mutable state
subinterpreter-safe extension modules
explicit communication between interpreters

Per-interpreter GIL is a major step toward scalable CPython concurrency. It provides a middle path between traditional single-GIL CPython and fully free-threaded CPython.