GIL purpose, implementation in Python/ceval_gil.c, forced release intervals, and its effect on multi-core performance.
The Global Interpreter Lock, usually called the GIL, is CPython’s process-level execution lock for Python bytecode in the traditional build. It ensures that only one thread at a time executes Python code in a given interpreter.
The GIL is not the same as Python’s threading API. Python can create many operating system threads. The restriction is that, in the normal CPython runtime, those threads do not run Python bytecode in parallel inside the same interpreter.
The GIL is one of CPython’s most important implementation choices because it affects memory management, extension modules, object safety, performance, concurrency, and the future direction of the runtime.
46.1 Why the GIL Exists
CPython’s core memory management is based on reference counting.
Every object has a reference count. When a new reference is created, CPython increments the count. When a reference is released, CPython decrements it. When the count reaches zero, CPython deallocates the object.
Conceptually:
```python
x = obj          # creates (owns) a new reference
del x            # releases a reference
# refcount == 0 -> CPython deallocates the object
```

Reference count operations happen constantly. Almost every bytecode instruction touches object references.
Without the GIL, these reference count updates would need fine-grained synchronization or atomic operations. The GIL allows CPython to perform many internal operations under a single coarse lock.
The design is simple and robust:
before executing Python bytecode, a thread must hold the GIL
while holding the GIL, it can safely manipulate most Python objects
when blocked or scheduled out, it may release the GIL

46.2 What the GIL Protects
The GIL protects many interpreter invariants.
Important examples:
reference count updates
object allocation and deallocation paths
type object state
module dictionaries
frame evaluation state
interpreter bookkeeping
garbage collector state
exception state transitions
many C API operations

The GIL does not make all Python programs logically thread-safe. It protects CPython internals from memory corruption. It does not protect application-level invariants.
This code can still have a race:
```python
counter = 0

def increment():
    global counter
    counter += 1
```

The expression counter += 1 involves multiple steps:
load counter
load constant 1
add
store counter

Another thread may run between these steps. The GIL prevents simultaneous bytecode execution, but thread switches can occur at interpreter scheduling points between bytecodes.
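These steps can be seen directly with the standard dis module, which disassembles the function into the bytecode CPython actually executes. The exact instruction names vary by version, but the load, add, and store are always separate instructions:

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

# The listing shows separate load, add, and store instructions;
# a thread switch can occur between any of them.
dis.dis(increment)
```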
Use application locks for application invariants.
46.3 The GIL and Python Threads
Python threads are real operating system threads.
```python
import threading

def worker():
    print("running")

t = threading.Thread(target=worker)
t.start()
t.join()
```

CPython creates an OS thread. That thread can run Python code only when it holds the GIL.
In a CPU-bound Python workload, threads contend for the GIL:
```python
def burn():
    total = 0
    for i in range(100_000_000):
        total += i
    return total
```

Running this in several threads usually does not scale across cores because only one thread executes Python bytecode at a time.
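A small timing sketch (with a reduced iteration count) makes the contention visible. On a traditional GIL build the threaded version usually takes about as long as the serial one; exact numbers depend on the machine:

```python
import threading
import time

N = 2_000_000

def burn(out, idx):
    total = 0
    for i in range(N):
        total += i
    out[idx] = total

# Serial baseline: run the loop twice on one thread.
results = [None, None]
start = time.perf_counter()
burn(results, 0)
burn(results, 1)
serial = time.perf_counter() - start

# The same work split across two threads.
results_t = [None, None]
start = time.perf_counter()
threads = [threading.Thread(target=burn, args=(results_t, i)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"serial {serial:.3f}s, threaded {threaded:.3f}s")
```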
46.4 The GIL and I/O
Threads can still be useful in CPython for I/O-bound programs.
When a thread performs blocking I/O, CPython or the underlying C extension can release the GIL while waiting.
Examples:
socket reads and writes
file reads and writes
sleep calls
some DNS operations
some database driver waits
subprocess waits
native library calls that release the GIL

While one thread waits for I/O, another thread can acquire the GIL and execute Python bytecode.
Example:
```python
import threading
import time

def worker(n):
    time.sleep(1)
    print(n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This completes in roughly one second rather than ten because time.sleep releases the GIL while waiting.
46.5 CPU-Bound vs I/O-Bound Work
The GIL mostly affects CPU-bound Python code.
| Workload | Threads in CPython |
|---|---|
| Network I/O | Often useful |
| File I/O | Often useful |
| Waiting on subprocesses | Often useful |
| CPU-bound pure Python loops | Usually poor scaling |
| NumPy operations | Can scale if native code releases the GIL |
| Compression or hashing | Depends on implementation |
| C extensions | Can scale if they release the GIL |
The key distinction is whether the thread spends most time executing Python bytecode or waiting/running native code outside the GIL.
46.6 The GIL Is Not a Mutex for Your Data
The GIL is an interpreter lock, not an application data lock.
This code is unsafe as a logical counter:
```python
counter = 0

def increment_many():
    global counter
    for _ in range(100_000):
        counter += 1
```

Use a lock:
```python
import threading

counter = 0
lock = threading.Lock()

def increment_many():
    global counter
    for _ in range(100_000):
        with lock:
            counter += 1
```

The GIL prevents memory corruption in CPython. It does not make compound operations atomic at the program semantics level.
46.7 Atomic-Looking Operations
Some operations appear atomic in practice because they execute under the GIL and happen in a short C code path.
Examples often include:
```python
items.append(x)
d[key] = value
```

But relying on accidental atomicity is fragile.
Reasons:
implementation details can change
custom methods may execute Python code
destructors may run
hashing or equality may call Python code
other Python implementations may behave differently
free-threaded CPython changes assumptions

Prefer explicit locks or thread-safe queues.
```python
import queue

q = queue.Queue()
q.put("task")      # safe to call from any thread
item = q.get()     # safe to call from any thread
```

46.8 GIL Scheduling
CPython periodically gives other threads a chance to run.
The interpreter uses a switching interval to decide how often to check for thread switches.
You can inspect and change it:
```python
import sys

print(sys.getswitchinterval())   # 0.005 seconds by default
sys.setswitchinterval(0.005)
```

This interval is not a hard real-time scheduling guarantee. It is a hint controlling how often the interpreter checks for thread switching.
Thread scheduling also depends on the operating system, blocking calls, signal handling, and extension module behavior.
46.9 Long-Running C Code
A C extension that runs for a long time while holding the GIL can block all Python threads.
Bad shape:
```c
static PyObject *
slow(PyObject *self, PyObject *args)
{
    long long total = 0;
    for (long long i = 0; i < 10000000000LL; i++) {
        total += i;
    }
    return PyLong_FromLongLong(total);
}
```

While this function runs, other Python threads cannot execute Python bytecode if the GIL remains held.
A well-behaved extension releases the GIL around long-running native work that does not touch Python objects.
46.10 Releasing the GIL in C Extensions
C extensions can release the GIL using CPython macros.
Typical pattern:
```c
Py_BEGIN_ALLOW_THREADS
/* long-running C code that does not touch Python objects */
Py_END_ALLOW_THREADS
```

Between these macros, the current thread does not hold the GIL.
Rules:
do not access Python objects
do not call most Python C API functions
do not mutate Python-owned memory
use native synchronization for native shared state
reacquire the GIL before returning Python objects or raising exceptions

This is how native libraries can allow parallelism.
46.11 Native Libraries and Parallelism
Many Python performance libraries call native code that releases the GIL.
Examples include numerical kernels, compression, cryptography, image processing, and database drivers, depending on the implementation.
In such cases, several Python threads can call into native code and run on multiple CPU cores.
The model is:
Python thread holds GIL
enters extension function
extension validates arguments
extension releases GIL
native code runs in parallel
extension reacquires GIL
extension returns Python object

This is why threaded Python programs can scale for some workloads but not for pure Python loops.
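The standard library's hashlib illustrates this model: its documentation notes that the GIL is released while hashing buffers larger than a couple of kilobytes, so threaded hashing of large inputs can run native code on several cores. A minimal sketch:

```python
import hashlib
import threading

# 16 MiB buffer: well above the size at which CPython's hashlib
# releases the GIL during the native hashing loop.
data = b"x" * (16 * 1024 * 1024)
digests = [None] * 4

def work(i):
    digests[i] = hashlib.sha256(data).hexdigest()

threads = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(all(d == digests[0] for d in digests))  # True: same input, same digest
```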
46.12 The GIL and Reference Counts
Reference counting is cheap under the GIL because increments and decrements do not need to be independently synchronized in the traditional build.
A typical C API function manipulates references freely while holding the GIL.
Example shape:
```c
Py_INCREF(obj);
Py_DECREF(other);
```

Without the GIL, every reference count operation becomes more complicated.
Possible approaches include:
atomic reference counts
biased reference counting
deferred reference counting
immortal objects
per-thread reference ownership schemes
fine-grained object locks

Free-threaded CPython work exists because removing the GIL requires redesigning many low-level assumptions.
46.13 The GIL and Object Invariants
The GIL lets CPython assume many object internals are not mutated concurrently by two Python threads.
For example, list append can update internal list fields under the GIL.
Conceptually:
check capacity
resize if needed
write item pointer
increment size

Without synchronization, another thread could observe an inconsistent intermediate state.
The GIL makes these internal transitions safe for the interpreter. It does not necessarily make higher-level sequences of operations safe.
46.14 The GIL and Garbage Collection
The cyclic garbage collector walks object graphs.
That requires stable enough object relationships while collection runs.
The GIL helps ensure the collector can inspect containers, reference links, and object flags without arbitrary concurrent Python-level mutation from another thread in the same interpreter.
Free-threaded designs need additional mechanisms to make GC safe without relying on one global bytecode lock.
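The cyclic collector's role can be demonstrated with the gc module: a reference cycle keeps refcounts above zero, so only the collector can reclaim it:

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

# Build a cycle: refcounting alone can never free these two objects.
a, b = Node(), Node()
a.ref = b
b.ref = a

del a, b                  # both refcounts stay above zero
collected = gc.collect()  # the cyclic collector finds the garbage
print(collected >= 2)     # True: at least the two Node instances
```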
46.15 The GIL and Finalizers
Object destruction can run Python code.
For example, a class can define __del__:
```python
class Resource:
    def __del__(self):
        print("finalizing")
```

When the reference count reaches zero, CPython may deallocate the object immediately. Deallocation may trigger finalizers, weakref callbacks, or cleanup code.
This can happen while releasing a reference during ordinary execution.
The GIL ensures that finalization occurs within a controlled interpreter state, but finalizers can still cause reentrant behavior.
Design rule:
avoid complex logic in __del__
use context managers for resource lifetime

Prefer:

```python
with open("data.txt") as f:
    data = f.read()
```

over relying on finalization timing.
46.16 The GIL and Signals
CPython handles signals in the main thread at safe evaluation points.
The GIL interacts with this because bytecode execution checks pending calls and signal flags.
Signals do not run arbitrary Python handlers asynchronously in the middle of any C instruction. CPython records the signal and later runs the Python handler at a safe point in the main thread.
This reduces corruption risk but means signal handling can be delayed while long-running C code holds the GIL.
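This deferred model can be observed in pure Python: raising a signal only records a flag, and the Python-level handler runs shortly afterwards at a safe checkpoint in the main thread. A minimal sketch:

```python
import signal

received = []

def handler(signum, frame):
    # Invoked later, at a safe bytecode boundary in the main thread.
    received.append(signum)

old = signal.signal(signal.SIGINT, handler)
try:
    signal.raise_signal(signal.SIGINT)  # the C-level handler records a flag
    # Within the next few bytecodes, the interpreter runs the Python handler.
    print(received == [signal.SIGINT])  # True
finally:
    signal.signal(signal.SIGINT, old)   # restore the default behavior
```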
46.17 The GIL and Asyncio
asyncio usually runs many tasks on one thread.
The GIL is not the primary concurrency limit for a single event loop because only one task is executing Python code at a time anyway.
asyncio concurrency comes from cooperative suspension:
```python
await socket_read()
```

During the await, the event loop can run other tasks.
But CPU-bound Python code blocks the event loop:
```python
async def handler():
    total = 0
    for i in range(100_000_000):
        total += i
    return total
```

The GIL is not the only issue here. The task never yields control.
Use process pools, native code, or explicit offloading for CPU-bound work.
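A minimal offloading sketch with asyncio.to_thread: the blocking call runs in a worker thread while the event loop keeps servicing other tasks. Note that to_thread helps only with blocking calls that release the GIL; CPU-bound Python code would need a process pool instead:

```python
import asyncio
import time

def blocking_call():
    # Stands in for any blocking function that releases the GIL
    # while it waits (time.sleep does).
    time.sleep(0.1)
    return "done"

async def main():
    # Offload the blocking call to a worker thread and keep another
    # coroutine running on the event loop meanwhile.
    results = await asyncio.gather(
        asyncio.to_thread(blocking_call),
        asyncio.sleep(0.05, result="other"),
    )
    return results

print(asyncio.run(main()))  # ['done', 'other']
```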
46.18 The GIL and Multiprocessing
Multiprocessing avoids the GIL by using multiple processes.
Each process has its own interpreter, memory space, and GIL.
```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":   # required on platforms that spawn workers
    with Pool() as pool:
        print(pool.map(square, range(10)))
```

This can use multiple CPU cores for pure Python CPU-bound work.
Tradeoffs:
data must be serialized or shared explicitly
process startup has overhead
memory is not shared by default
debugging is more complex
interprocess communication costs matter

Multiprocessing is often the simplest path to CPU parallelism for pure Python code.
46.19 The GIL and Subinterpreters
Subinterpreters allow multiple Python interpreters inside one process.
Historically, the GIL was process-wide in normal CPython. Modern work has moved toward per-interpreter GIL designs and free-threaded builds.
The design goal is to allow better isolation and concurrency while preserving compatibility where possible.
Subinterpreters raise hard questions:
which objects can be shared
how extension module state is isolated
how memory allocation works
how imports behave per interpreter
how C globals are handled

The GIL story becomes more nuanced when one process contains multiple interpreters.
46.20 Free-Threaded CPython
Free-threaded CPython refers to builds that can run Python code in parallel without the traditional global interpreter lock.
This requires major runtime changes.
Areas affected include:
reference counting
object layout
container synchronization
memory allocation
garbage collection
C API assumptions
extension module compatibility
borrowed references
immortal objects
interpreter state access

The free-threaded runtime is not just “CPython without one lock.” It is a different synchronization design for the same language implementation.
46.21 Immortal Objects
Immortal objects are objects whose reference counts are treated specially so they are not deallocated in the usual way.
This helps reduce reference count overhead for common static objects.
Examples of candidates include:
None
True
False
small integers
some interned strings
static runtime objects

Immortal objects are useful in free-threaded work because they reduce the number of objects needing synchronized reference count changes.
They also help performance in traditional builds by avoiding unnecessary refcount churn for heavily used objects.
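Immortality can be observed from Python code on CPython 3.12 and later: creating many new references to None no longer changes its reported reference count. A minimal sketch that works on both old and new builds:

```python
import sys

before = sys.getrefcount(None)
refs = [None] * 10_000          # ten thousand new references to None
after = sys.getrefcount(None)

# 0 on CPython 3.12+ (None is immortal); 10000 on older versions,
# where each list slot increments None's reference count.
print(after - before)
```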
46.22 Borrowed References and the GIL
The CPython C API historically uses borrowed references.
A borrowed reference is a pointer to a Python object that you do not own.
Example shape:
```c
PyObject *item = PyList_GetItem(list, index);  /* borrowed reference */
```

Under the traditional GIL, borrowed references are often safe for short local use because no other Python thread can concurrently mutate the object graph while the current thread holds the GIL.
Without the GIL, borrowed references become more dangerous. Another thread might remove the object while native code still holds a borrowed pointer.
This is one reason free-threaded CPython affects the C API and extension design.
46.23 GIL State API
CPython provides APIs for native threads that need to call into Python.
Common pattern:
```c
PyGILState_STATE state = PyGILState_Ensure();
/* call Python C API */
PyGILState_Release(state);
```

This ensures the current native thread has the GIL and an appropriate thread state.
This is used when native code creates threads outside Python and later wants to interact with Python objects or call Python callbacks.
Rules:
acquire GIL before using Python C API
release it when done
do not keep borrowed references across unsafe boundaries
understand interpreter and thread state assumptions

46.24 Thread State
Each Python thread that executes Python code has a thread state.
The thread state stores execution-related data:
current frame
exception state
recursion depth
current interpreter
tracing and profiling state
async exception state
context information

The GIL and thread state are closely related. Holding the GIL allows the thread to safely operate on interpreter state.
At the C level, many APIs assume there is a current thread state.
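Some of this per-thread state is visible from Python through sys._current_frames, an implementation-specific function that snapshots the current frame of every thread:

```python
import sys
import threading

def show():
    # Snapshot mapping thread identifier -> the frame that thread
    # is currently executing.
    frames = sys._current_frames()
    return threading.get_ident() in frames

print(show())  # True: this thread's state includes its current frame
```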
46.25 Releasing the GIL Around Blocking I/O
A C extension wrapping blocking I/O should release the GIL while waiting.
Shape:
```c
static PyObject *
read_from_device(PyObject *self, PyObject *args)
{
    int result;
    Py_BEGIN_ALLOW_THREADS
    result = blocking_device_read();
    Py_END_ALLOW_THREADS
    if (result < 0) {
        return PyErr_SetFromErrno(PyExc_OSError);
    }
    return PyLong_FromLong(result);
}
```

The extension must not touch Python objects while the GIL is released.
Argument parsing happens before release. Python object creation and error handling happen after reacquiring the GIL.
46.26 The GIL and Fairness
The GIL has historically had fairness issues. A CPU-bound thread could reacquire the GIL quickly and reduce progress for other threads.
Modern CPython uses mechanisms to improve fairness, but thread scheduling is still influenced by:
switch interval
operating system scheduler
blocking operations
extension behavior
number of active threads
CPU topology

The GIL is not a real-time scheduler.
Programs requiring strict scheduling guarantees need explicit concurrency design beyond Python threads.
46.27 The GIL and Latency
The GIL can affect latency.
In a server, if one thread runs CPU-heavy Python code while holding the GIL, other Python threads may wait.
Symptoms:
request latency spikes
background tasks delay foreground work
signal handling delay
logging or monitoring thread stalls
thread pool saturation

Mitigations:
move CPU work to processes
use native extensions that release the GIL
break long work into smaller chunks
avoid CPU-heavy work in request threads
precompute or cache
use async carefully for I/O, not CPU loops

46.28 The GIL and Memory Safety
The GIL is a large part of CPython’s memory safety story.
Because only one thread executes Python bytecode at a time, many object operations can be implemented without per-object locks.
This simplifies:
object refcounting
list resizing
dict mutation
type cache updates
frame evaluation
exception propagation

Removing the GIL requires replacing one broad safety mechanism with many narrower mechanisms.
That can improve parallelism but increases implementation complexity.
46.29 The GIL and C Extension Compatibility
Many C extensions assume the traditional GIL.
Assumptions include:
borrowed references remain valid while the GIL is held
object fields are stable during C API calls
global C state is protected by the GIL
callbacks into Python happen with the GIL held
reference count operations are cheap and unsynchronized

Free-threaded CPython requires extensions to be audited and sometimes changed.
Extensions that already avoid global mutable state, use per-module state, and release the GIL carefully are better positioned.
46.30 Debugging GIL-Related Problems
Common symptoms:
threads do not speed up CPU-bound code
program hangs when extension code runs
latency spikes under threaded load
native callback crashes
deadlock involving Python locks and C locks
background thread cannot make progress

Useful tools and techniques:
thread dumps with faulthandler
profiling CPU-bound sections
checking extension code for GIL release
using multiprocessing for CPU tests
measuring import and startup contention
testing under load with realistic thread counts

Example thread dump:
```python
import faulthandler
import signal

faulthandler.register(signal.SIGUSR1)
```

Then send the signal to inspect where threads are blocked.
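When a POSIX signal is inconvenient (for example on Windows), faulthandler.dump_traceback writes the same per-thread stacks to any real file on demand. A small sketch:

```python
import faulthandler
import tempfile

# dump_traceback needs a real file descriptor, so use a temp file here;
# in practice sys.stderr or a log file is typical.
with tempfile.TemporaryFile(mode="w+") as f:
    faulthandler.dump_traceback(file=f, all_threads=True)
    f.seek(0)
    report = f.read()

print("most recent call first" in report)  # True: each stack is labeled
```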
46.31 Design Rules for Python Code
For ordinary Python programs:
use threads for I/O concurrency
use asyncio for structured I/O concurrency
use multiprocessing for pure Python CPU parallelism
use native libraries for numeric or systems-heavy CPU work
use locks for shared mutable state
avoid assuming accidental atomicity
keep long CPU loops out of request threads

The GIL is rarely a problem for small scripts. It matters when workloads become concurrent, CPU-heavy, or latency-sensitive.
46.32 Design Rules for C Extensions
For C extension authors:
hold the GIL when touching Python objects
release the GIL around long native work
do not use borrowed references beyond their safe lifetime
avoid mutable process-global state
use per-module state where possible
support multi-phase initialization
protect native shared state with native locks
prepare for free-threaded compatibility

Correct GIL handling is part of extension correctness, not just performance.
46.33 A Minimal GIL Mental Model
Use this model:
A Python thread must hold the GIL to execute Python bytecode.
The GIL protects CPython internal object and interpreter state.
Blocking I/O and some native code can release the GIL.
Pure Python CPU-bound threads usually do not run in parallel.
The GIL does not protect application-level invariants.
C extensions must hold the GIL when using Python objects.
Free-threaded CPython replaces this broad lock with finer synchronization.

This model is enough to reason about most CPython threading behavior.
46.34 Key Points
The GIL is CPython’s traditional global execution lock for Python bytecode.
It exists largely because CPython uses reference counting and mutable shared runtime structures.
The GIL protects interpreter memory safety, not application logic.
Threads are useful for I/O-bound workloads because blocking operations can release the GIL.
Pure Python CPU-bound threads usually do not scale across cores.
C extensions can release the GIL around long-running native work.
The GIL interacts with reference counting, garbage collection, finalizers, signals, native callbacks, and extension module design.
Free-threaded CPython changes many assumptions, especially for the C API and extension modules.
Use explicit locks for shared state, processes for pure Python CPU parallelism, and native code that releases the GIL for compute-heavy threaded work.