# 46. The GIL

The Global Interpreter Lock, usually called the GIL, is CPython’s process-level execution lock for Python bytecode in the traditional build. It ensures that only one thread at a time executes Python code in a given interpreter.

The GIL is not the same as Python’s threading API. Python can create many operating system threads. The restriction is that, in the normal CPython runtime, those threads do not run Python bytecode in parallel inside the same interpreter.

The GIL is one of CPython’s most important implementation choices because it affects memory management, extension modules, object safety, performance, concurrency, and the future direction of the runtime.

## 46.1 Why the GIL Exists

CPython’s core memory management is based on reference counting.

Every object has a reference count. When a new reference is created, CPython increments the count. When a reference is released, CPython decrements it. When the count reaches zero, CPython deallocates the object.

Conceptually:

```text id="o0dawn"
x = obj       increments or owns a reference
del x         decrements a reference
refcount == 0 deallocates the object
```

Reference count operations happen constantly. Almost every bytecode instruction touches object references.

Without the GIL, these reference count updates would need fine-grained synchronization or atomic operations. The GIL allows CPython to perform many internal operations under a single coarse lock.

The design is simple and robust:

```text id="3x0n06"
before executing Python bytecode, a thread must hold the GIL
while holding the GIL, it can safely manipulate most Python objects
when blocked or scheduled out, it may release the GIL
```

## 46.2 What the GIL Protects

The GIL protects many interpreter invariants.

Important examples:

```text id="r6i809"
reference count updates
object allocation and deallocation paths
type object state
module dictionaries
frame evaluation state
interpreter bookkeeping
garbage collector state
exception state transitions
many C API operations
```

The GIL does not make all Python programs logically thread-safe. It protects CPython internals from memory corruption. It does not protect application-level invariants.

This code can still have a race:

```python id="jlgrnm"
counter = 0

def increment():
    global counter
    counter += 1
```

The expression `counter += 1` involves multiple steps:

```text id="siqnlk"
load counter
load constant 1
add
store counter
```

Another thread may run between these steps. The GIL prevents simultaneous bytecode execution, but the interpreter can switch threads between bytecode instructions at its scheduling points.

Use application locks for application invariants.
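The multi-step nature of `counter += 1` is visible in the disassembly. A small sketch using the standard `dis` module (exact opcode names vary between CPython versions):

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

# The single statement compiles to several instructions (load, add,
# store); the interpreter may switch threads between any two of them.
opnames = [instr.opname for instr in dis.get_instructions(increment)]
print(opnames)
```

The load and store of `counter` are separate instructions, which is exactly the window in which another thread can interleave.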

## 46.3 The GIL and Python Threads

Python threads are real operating system threads.

```python id="h8vf7c"
import threading

def worker():
    print("running")

t = threading.Thread(target=worker)
t.start()
t.join()
```

CPython creates an OS thread. That thread can run Python code only when it holds the GIL.

In a CPU-bound Python workload, threads contend for the GIL:

```python id="4s6j45"
def burn():
    total = 0
    for i in range(100_000_000):
        total += i
    return total
```

Running this in several threads usually does not scale across cores because only one thread executes Python bytecode at a time.
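This can be observed directly by splitting the same total work across threads; a minimal sketch (the results are exact, the timing observation is machine-dependent and only printed, not asserted):

```python
import threading
import time

def burn(n, results, idx):
    # Pure Python loop: holds the GIL for almost its entire runtime.
    total = 0
    for i in range(n):
        total += i
    results[idx] = total

N = 500_000
results = [0] * 4

start = time.perf_counter()
threads = [threading.Thread(target=burn, args=(N, results, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# On a stock GIL build, elapsed is close to 4x one sequential run,
# not 1x, because the threads serialize on the GIL.
print(f"4 threads took {elapsed:.3f}s")
```
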

## 46.4 The GIL and I/O

Threads can still be useful in CPython for I/O-bound programs.

When a thread performs blocking I/O, CPython or the underlying C extension can release the GIL while waiting.

Examples:

```text id="o5hisv"
socket reads and writes
file reads and writes
sleep calls
some DNS operations
some database driver waits
subprocess waits
native library calls that release the GIL
```

While one thread waits for I/O, another thread can acquire the GIL and execute Python bytecode.

Example:

```python id="zoptdu"
import threading
import time

def worker(n):
    time.sleep(1)
    print(n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]

for t in threads:
    t.start()

for t in threads:
    t.join()
```

This completes in roughly one second rather than ten because `time.sleep` releases the GIL while waiting.

## 46.5 CPU-Bound vs I/O-Bound Work

The GIL mostly affects CPU-bound Python code.

| Workload | Threads in CPython |
|---|---|
| Network I/O | Often useful |
| File I/O | Often useful |
| Waiting on subprocesses | Often useful |
| CPU-bound pure Python loops | Usually poor scaling |
| NumPy operations | Can scale if native code releases the GIL |
| Compression or hashing | Depends on implementation |
| C extensions | Can scale if they release the GIL |

The key distinction is whether the thread spends most time executing Python bytecode or waiting/running native code outside the GIL.

## 46.6 The GIL Is Not a Mutex for Your Data

The GIL is an interpreter lock, not an application data lock.

This code is unsafe as a logical counter:

```python id="e3hqpz"
counter = 0

def increment_many():
    global counter
    for _ in range(100_000):
        counter += 1
```

Use a lock:

```python id="ow12ja"
import threading

counter = 0
lock = threading.Lock()

def increment_many():
    global counter
    for _ in range(100_000):
        with lock:
            counter += 1
```

The GIL prevents memory corruption in CPython. It does not make compound operations atomic at the program semantics level.

## 46.7 Atomic-Looking Operations

Some operations appear atomic in practice because they execute under the GIL and happen in a short C code path.

Examples often include:

```python id="g5ww2k"
items.append(x)
d[key] = value
```

But relying on accidental atomicity is fragile.

Reasons:

```text id="eubt8m"
implementation details can change
custom methods may execute Python code
destructors may run
hashing or equality may call Python code
other Python implementations may behave differently
free-threaded CPython changes assumptions
```

Prefer explicit locks or thread-safe queues.

```python id="volsne"
import queue

q = queue.Queue()
q.put("work")
item = q.get()  # blocks until an item is available
```
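A minimal producer/consumer sketch built on `queue.Queue`, using a `None` sentinel (an arbitrary convention, not part of the `queue` API) to signal the end of work:

```python
import queue
import threading

q = queue.Queue()
results = []

def producer():
    for i in range(5):
        q.put(i)
    q.put(None)  # sentinel: no more work

def consumer():
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item * 2)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

All synchronization lives inside the queue, so neither function touches shared state directly.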

## 46.8 GIL Scheduling

CPython periodically gives other threads a chance to run.

The interpreter uses a switching interval to decide how often to check for thread switches.

You can inspect and change it:

```python id="rtdbpv"
import sys

print(sys.getswitchinterval())
sys.setswitchinterval(0.005)
```

This interval is not a hard real-time scheduling guarantee. It is a hint controlling how often the interpreter checks for thread switching.

Thread scheduling also depends on the operating system, blocking calls, signal handling, and extension module behavior.

## 46.9 Long-Running C Code

A C extension that runs for a long time while holding the GIL can block all Python threads.

Bad shape:

```c id="7qrs08"
static PyObject *
slow(PyObject *self, PyObject *args)
{
    long long total = 0;

    for (long long i = 0; i < 10000000000LL; i++) {
        total += i;
    }

    return PyLong_FromLongLong(total);
}
```

While this function runs, other Python threads cannot execute Python bytecode if the GIL remains held.

A well-behaved extension releases the GIL around long-running native work that does not touch Python objects.

## 46.10 Releasing the GIL in C Extensions

C extensions can release the GIL using CPython macros.

Typical pattern:

```c id="9508vk"
Py_BEGIN_ALLOW_THREADS

/* long-running C code that does not touch Python objects */

Py_END_ALLOW_THREADS
```

Between these macros, the current thread does not hold the GIL.

Rules:

```text id="z0dvx6"
do not access Python objects
do not call most Python C API functions
do not mutate Python-owned memory
use native synchronization for native shared state
reacquire the GIL before returning Python objects or raising exceptions
```

This is how native libraries can allow parallelism.

## 46.11 Native Libraries and Parallelism

Many Python performance libraries call native code that releases the GIL.

Examples include numerical kernels, compression, cryptography, image processing, and database drivers, depending on the implementation.

In such cases, several Python threads can call into native code and run on multiple CPU cores.

The model is:

```text id="x6g7lc"
Python thread holds GIL
enters extension function
extension validates arguments
extension releases GIL
native code runs in parallel
extension reacquires GIL
extension returns Python object
```

This is why threaded Python programs can scale for some workloads but not for pure Python loops.
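A concrete standard-library case: CPython's `hashlib` documents that it releases the GIL while hashing buffers larger than roughly 2 KiB, so threaded hashing of large data follows exactly this model. A sketch:

```python
import hashlib
import threading

def digest(data, out, idx):
    # For buffers larger than about 2047 bytes, CPython's hashlib
    # releases the GIL during the native hashing loop.
    out[idx] = hashlib.sha256(data).hexdigest()

blocks = [bytes([i]) * 1_000_000 for i in range(4)]
out = [None] * 4

threads = [
    threading.Thread(target=digest, args=(b, out, i))
    for i, b in enumerate(blocks)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(out)
```

The argument validation and result construction still run under the GIL; only the hashing itself runs in parallel.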

## 46.12 The GIL and Reference Counts

Reference counting is cheap under the GIL because increments and decrements do not need to be independently synchronized in the traditional build.

A typical C API function manipulates references freely while holding the GIL.

Example shape:

```c id="exlt3e"
Py_INCREF(obj);
Py_DECREF(other);
```

Without the GIL, every reference count operation becomes more complicated.

Possible approaches include:

```text id="huwwjq"
atomic reference counts
biased reference counting
deferred reference counting
immortal objects
per-thread reference ownership schemes
fine-grained object locks
```

Free-threaded CPython work exists because removing the GIL requires redesigning many low-level assumptions.

## 46.13 The GIL and Object Invariants

The GIL lets CPython assume many object internals are not mutated concurrently by two Python threads.

For example, list append can update internal list fields under the GIL.

Conceptually:

```text id="ftlbww"
check capacity
resize if needed
write item pointer
increment size
```

Without synchronization, another thread could observe an inconsistent intermediate state.

The GIL makes these internal transitions safe for the interpreter. It does not necessarily make higher-level sequences of operations safe.

## 46.14 The GIL and Garbage Collection

The cyclic garbage collector walks object graphs.

That requires stable enough object relationships while collection runs.

The GIL helps ensure the collector can inspect containers, reference links, and object flags without arbitrary concurrent Python-level mutation from another thread in the same interpreter.

Free-threaded designs need additional mechanisms to make GC safe without relying on one global bytecode lock.

## 46.15 The GIL and Finalizers

Object destruction can run Python code.

For example, a class can define `__del__`:

```python id="cthjk8"
class Resource:
    def __del__(self):
        print("finalizing")
```

When the reference count reaches zero, CPython may deallocate the object immediately. Deallocation may trigger finalizers, weakref callbacks, or cleanup code.

This can happen while releasing a reference during ordinary execution.

The GIL ensures that finalization occurs within a controlled interpreter state, but finalizers can still cause reentrant behavior.

Design rule:

```text id="zkkyev"
avoid complex logic in __del__
use context managers for resource lifetime
```

Prefer:

```python id="bfh4xs"
with open("data.txt") as f:
    data = f.read()
```

over relying on finalization timing.

## 46.16 The GIL and Signals

CPython handles signals in the main thread at safe evaluation points.

The GIL interacts with this because bytecode execution checks pending calls and signal flags.

Python signal handlers do not run asynchronously in the middle of arbitrary C code. CPython records the signal and later runs the Python handler at a safe point in the main thread.

This reduces corruption risk but means signal handling can be delayed while long-running C code holds the GIL.
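The record-then-run-at-a-safe-point behavior can be observed directly. A sketch using `signal.raise_signal` (Python 3.8+), temporarily replacing the SIGINT handler:

```python
import signal

received = []

def handler(signum, frame):
    received.append(signum)

previous = signal.signal(signal.SIGINT, handler)
try:
    signal.raise_signal(signal.SIGINT)  # C level: signal is recorded as pending
    for _ in range(1000):
        pass                            # Python handler runs at a safe point
finally:
    signal.signal(signal.SIGINT, previous)

print(received)
```

The handler runs in the main thread during ordinary bytecode execution, not inside the C-level signal delivery.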

## 46.17 The GIL and Asyncio

`asyncio` usually runs many tasks on one thread.

The GIL is not the primary concurrency limit for a single event loop because only one task is executing Python code at a time anyway.

`asyncio` concurrency comes from cooperative suspension:

```python id="i9yivc"
await socket_read()  # schematic: any awaitable I/O operation
```

During the await, the event loop can run other tasks.

But CPU-bound Python code blocks the event loop:

```python id="x84njx"
async def handler():
    total = 0
    for i in range(100_000_000):
        total += i
    return total
```

The GIL is not the only issue here. The task never yields control.

Use process pools, native code, or explicit offloading for CPU-bound work.
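One simple in-loop mitigation is to break the work into chunks and yield between them, which keeps the event loop responsive without removing the GIL from the picture. A sketch (the chunk size is an arbitrary choice):

```python
import asyncio

async def chunked_sum(n, chunk=100_000):
    total = 0
    for start in range(0, n, chunk):
        for i in range(start, min(start + chunk, n)):
            total += i
        await asyncio.sleep(0)  # yield so the loop can run other tasks
    return total

result = asyncio.run(chunked_sum(1_000_000))
print(result)
```

This trades throughput for latency; for genuinely heavy CPU work, a process pool is still the better tool.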

## 46.18 The GIL and Multiprocessing

Multiprocessing avoids the GIL by using multiple processes.

Each process has its own interpreter, memory space, and GIL.

```python id="67le7z"
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":  # required under the spawn start method
    with Pool() as pool:
        print(pool.map(square, range(10)))
```

This can use multiple CPU cores for pure Python CPU-bound work.

Tradeoffs:

```text id="brnxya"
data must be serialized or shared explicitly
process startup has overhead
memory is not shared by default
debugging is more complex
interprocess communication costs matter
```

Multiprocessing is often the simplest path to CPU parallelism for pure Python code.

## 46.19 The GIL and Subinterpreters

Subinterpreters allow multiple Python interpreters inside one process.

Historically, the GIL was process-wide in normal CPython. CPython 3.12 moved to a per-interpreter GIL (PEP 684), and modern work also includes free-threaded builds.

The design goal is to allow better isolation and concurrency while preserving compatibility where possible.

Subinterpreters raise hard questions:

```text id="ossf3k"
which objects can be shared
how extension module state is isolated
how memory allocation works
how imports behave per interpreter
how C globals are handled
```

The GIL story becomes more nuanced when one process contains multiple interpreters.

## 46.20 Free-Threaded CPython

Free-threaded CPython (PEP 703) refers to builds that can run Python code in parallel without the traditional global interpreter lock.

This requires major runtime changes.

Areas affected include:

```text id="wxb6s7"
reference counting
object layout
container synchronization
memory allocation
garbage collection
C API assumptions
extension module compatibility
borrowed references
immortal objects
interpreter state access
```

The free-threaded runtime is not just “CPython without one lock.” It is a different synchronization design for the same language implementation.
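Whether a given interpreter is a free-threaded build, and whether the GIL is actually active, can be probed at runtime. A sketch using `sysconfig` and the `sys._is_gil_enabled()` introspection added in CPython 3.13 (absent on older versions, hence the `getattr` fallback):

```python
import sys
import sysconfig

# Truthy on free-threaded builds, falsy or None on traditional builds.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# CPython 3.13+ exposes sys._is_gil_enabled(); assume the GIL is on
# when the probe does not exist.
probe = getattr(sys, "_is_gil_enabled", None)
gil_enabled = probe() if probe is not None else True

print(free_threaded_build, gil_enabled)
```

Note that a free-threaded build can still run with the GIL enabled, so the two checks answer different questions.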

## 46.21 Immortal Objects

Immortal objects (PEP 683) are objects whose reference counts are treated specially so they are not deallocated in the usual way.

This helps reduce reference count overhead for common static objects.

Examples of candidates include:

```text id="pnb80n"
None
True
False
small integers
some interned strings
static runtime objects
```

Immortal objects are useful in free-threaded work because they reduce the number of objects needing synchronized reference count changes.

They also help performance in traditional builds by avoiding unnecessary refcount churn for heavily used objects.
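Two of these candidates are easy to observe from Python. A sketch (the exact refcount of `None` is an implementation detail and, on CPython 3.12+, a sentinel-like value rather than a live count):

```python
import sys

# Small integers are cached, shared runtime objects in CPython.
a = int("256")   # constructed at runtime to avoid literal folding
b = 256
shared = a is b  # True in CPython: both names refer to the cached object

# For immortal objects such as None, the reported refcount is not a
# meaningful live count on 3.12+.
none_refs = sys.getrefcount(None)
print(shared, none_refs)
```
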

## 46.22 Borrowed References and the GIL

The CPython C API historically uses borrowed references.

A borrowed reference is a pointer to a Python object that you do not own.

Example shape:

```c id="0k1gaj"
PyObject *item = PyList_GetItem(list, index);  /* borrowed */
```

Under the traditional GIL, borrowed references are often safe for short local use because no other Python thread can concurrently mutate the object graph while the current thread holds the GIL.

Without the GIL, borrowed references become more dangerous. Another thread might remove the object while native code still holds a borrowed pointer.

This is one reason free-threaded CPython affects the C API and extension design.

## 46.23 GIL State API

CPython provides APIs for native threads that need to call into Python.

Common pattern:

```c id="r58i77"
PyGILState_STATE state = PyGILState_Ensure();

/* call Python C API */

PyGILState_Release(state);
```

This ensures the current native thread has the GIL and an appropriate thread state.

This is used when native code creates threads outside Python and later wants to interact with Python objects or call Python callbacks.

Rules:

```text id="3nhj6d"
acquire GIL before using Python C API
release it when done
do not keep borrowed references across unsafe boundaries
understand interpreter and thread state assumptions
```

## 46.24 Thread State

Each Python thread that executes Python code has a thread state.

The thread state stores execution-related data:

```text id="d57t3r"
current frame
exception state
recursion depth
current interpreter
tracing and profiling state
async exception state
context information
```

The GIL and thread state are closely related. Holding the GIL allows the thread to safely operate on interpreter state.

At the C level, many APIs assume there is a current thread state.
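From Python, a thread-state-adjacent view is available through `sys._current_frames()`, a CPython-specific API that maps each thread's identifier to the frame it is currently executing. A sketch:

```python
import sys
import threading

# Map of thread id -> current top frame, one entry per thread that has
# a thread state in this interpreter.
frames = sys._current_frames()
me = threading.get_ident()
current = frames[me]
print(current.f_code.co_name)
```

This is the same per-thread execution data that `faulthandler` walks when producing thread dumps.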

## 46.25 Releasing the GIL Around Blocking I/O

A C extension wrapping blocking I/O should release the GIL while waiting.

Shape:

```c id="n6p951"
static PyObject *
read_from_device(PyObject *self, PyObject *args)
{
    int result;

    Py_BEGIN_ALLOW_THREADS
    result = blocking_device_read();
    Py_END_ALLOW_THREADS

    if (result < 0) {
        return PyErr_SetFromErrno(PyExc_OSError);
    }

    return PyLong_FromLong(result);
}
```

The extension must not touch Python objects while the GIL is released.

Argument parsing happens before release. Python object creation and error handling happen after reacquiring the GIL.

## 46.26 The GIL and Fairness

The GIL has historically had fairness issues. A CPU-bound thread could reacquire the GIL quickly and reduce progress for other threads.

Modern CPython uses mechanisms to improve fairness, but thread scheduling is still influenced by:

```text id="amc7rj"
switch interval
operating system scheduler
blocking operations
extension behavior
number of active threads
CPU topology
```

The GIL is not a real-time scheduler.

Programs requiring strict scheduling guarantees need explicit concurrency design beyond Python threads.

## 46.27 The GIL and Latency

The GIL can affect latency.

In a server, if one thread runs CPU-heavy Python code while holding the GIL, other Python threads may wait.

Symptoms:

```text id="0df779"
request latency spikes
background tasks delay foreground work
signal handling delay
logging or monitoring thread stalls
thread pool saturation
```

Mitigations:

```text id="5t8jbo"
move CPU work to processes
use native extensions that release the GIL
break long work into smaller chunks
avoid CPU-heavy work in request threads
precompute or cache
use async carefully for I/O, not CPU loops
```

## 46.28 The GIL and Memory Safety

The GIL is a large part of CPython’s memory safety story.

Because only one thread executes Python bytecode at a time, many object operations can be implemented without per-object locks.

This simplifies:

```text id="m0q2hm"
object refcounting
list resizing
dict mutation
type cache updates
frame evaluation
exception propagation
```

Removing the GIL requires replacing one broad safety mechanism with many narrower mechanisms.

That can improve parallelism but increases implementation complexity.

## 46.29 The GIL and C Extension Compatibility

Many C extensions assume the traditional GIL.

Assumptions include:

```text id="fzpl21"
borrowed references remain valid while the GIL is held
object fields are stable during C API calls
global C state is protected by the GIL
callbacks into Python happen with the GIL held
reference count operations are cheap and unsynchronized
```

Free-threaded CPython requires extensions to be audited and sometimes changed.

Extensions that already avoid global mutable state, use per-module state, and release the GIL carefully are better positioned.

## 46.30 Debugging GIL-Related Problems

Common symptoms:

```text id="l6r6dj"
threads do not speed up CPU-bound code
program hangs when extension code runs
latency spikes under threaded load
native callback crashes
deadlock involving Python locks and C locks
background thread cannot make progress
```

Useful tools and techniques:

```text id="2tkv0p"
thread dumps with faulthandler
profiling CPU-bound sections
checking extension code for GIL release
using multiprocessing for CPU tests
measuring import and startup contention
testing under load with realistic thread counts
```

Example thread dump:

```python id="i1a49o"
import faulthandler
import signal

faulthandler.register(signal.SIGUSR1)  # SIGUSR1 is Unix-only
```

Then send the signal to inspect where threads are blocked.

## 46.31 Design Rules for Python Code

For ordinary Python programs:

```text id="b8b2pc"
use threads for I/O concurrency
use asyncio for structured I/O concurrency
use multiprocessing for pure Python CPU parallelism
use native libraries for numeric or systems-heavy CPU work
use locks for shared mutable state
avoid assuming accidental atomicity
keep long CPU loops out of request threads
```

The GIL is rarely a problem for small scripts. It matters when workloads become concurrent, CPU-heavy, or latency-sensitive.

## 46.32 Design Rules for C Extensions

For C extension authors:

```text id="4egduq"
hold the GIL when touching Python objects
release the GIL around long native work
do not use borrowed references beyond their safe lifetime
avoid mutable process-global state
use per-module state where possible
support multi-phase initialization
protect native shared state with native locks
prepare for free-threaded compatibility
```

Correct GIL handling is part of extension correctness, not just performance.

## 46.33 A Minimal GIL Mental Model

Use this model:

```text id="e806qy"
A Python thread must hold the GIL to execute Python bytecode.

The GIL protects CPython internal object and interpreter state.

Blocking I/O and some native code can release the GIL.

Pure Python CPU-bound threads usually do not run in parallel.

The GIL does not protect application-level invariants.

C extensions must hold the GIL when using Python objects.

Free-threaded CPython replaces this broad lock with finer synchronization.
```

This model is enough to reason about most CPython threading behavior.

## 46.34 Key Points

The GIL is CPython’s traditional global execution lock for Python bytecode.

It exists largely because CPython uses reference counting and mutable shared runtime structures.

The GIL protects interpreter memory safety, not application logic.

Threads are useful for I/O-bound workloads because blocking operations can release the GIL.

Pure Python CPU-bound threads usually do not scale across cores.

C extensions can release the GIL around long-running native work.

The GIL interacts with reference counting, garbage collection, finalizers, signals, native callbacks, and extension module design.

Free-threaded CPython changes many assumptions, especially for the C API and extension modules.

Use explicit locks for shared state, processes for pure Python CPU parallelism, and native code that releases the GIL for compute-heavy threaded work.
