# 59. `multiprocessing`

The `multiprocessing` module provides process-based concurrency. It lets Python programs create child processes, communicate between them, share limited state, coordinate execution, and distribute work across CPU cores.

For CPython internals, `multiprocessing` matters because it is the standard library answer to a central runtime constraint: ordinary Python threads in one traditional CPython interpreter are limited by the Global Interpreter Lock during bytecode execution. Separate processes have separate interpreters, separate heaps, separate GILs, and separate address spaces.

## 59.1 The Role of `multiprocessing`

`multiprocessing` gives Python a high-level API for OS processes.

Example:

```python id="cdnc24"
from multiprocessing import Process

def worker():
    print("child process")

if __name__ == "__main__":
    p = Process(target=worker)
    p.start()
    p.join()
```

This creates a child process, runs `worker()` inside it, and waits for it to finish. The `if __name__ == "__main__"` guard is required under some start methods (see 59.8).

Common uses include:

```text id="d57buz"
CPU-bound parallel work
isolation between tasks
fault containment
parallel data processing
background workers
producer-consumer pipelines
process pools
```

The core model is:

```text id="mcctda"
parent process
    ↓ starts
child process
    ↓ runs Python code in separate interpreter
parent waits or communicates
```

Each process has its own Python runtime state.

## 59.2 Process Isolation

A process is an operating system execution unit with its own virtual address space.

That means this code does not share ordinary Python objects:

```python id="2obw4u"
from multiprocessing import Process

x = []

def worker():
    x.append(1)
    print("child:", x)

if __name__ == "__main__":
    p = Process(target=worker)
    p.start()
    p.join()

    print("parent:", x)
```

Typical output:

```text id="s1ppbn"
child: [1]
parent: []
```

The child modifies its own copy of `x` (under `fork`) or an independently created `x` (under `spawn`). The parent’s list remains unchanged.

This is the most important distinction between threads and processes.

| Concurrency model | Memory model |
|---|---|
| Threads | Shared address space |
| Processes | Separate address spaces |
| `multiprocessing` | Separate Python interpreters communicating explicitly |

## 59.3 Relationship to the GIL

Traditional CPython uses a Global Interpreter Lock per interpreter. In one interpreter, only one thread runs Python bytecode at a time.

`multiprocessing` avoids that limitation by using multiple OS processes.

```text id="21gfhk"
process A
    CPython interpreter
    GIL A

process B
    CPython interpreter
    GIL B

process C
    CPython interpreter
    GIL C
```

Since each process has its own interpreter and GIL, CPU-bound Python code can run in parallel on multiple cores.

Example:

```python id="xkql4n"
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool() as pool:
        print(pool.map(square, range(10)))
```

The work is distributed across worker processes.

## 59.4 Process Creation Methods

CPython supports several start methods.

| Method | Main platforms | Behavior |
|---|---|---|
| `fork` | Unix | Child is created by copying parent process state |
| `spawn` | All platforms; default on Windows and macOS | A fresh interpreter starts and imports the main module |
| `forkserver` | Unix | A server process forks clean child processes |

Check available methods:

```python id="4satb7"
import multiprocessing as mp

print(mp.get_all_start_methods())
print(mp.get_start_method())
```

Set a method:

```python id="s19olx"
import multiprocessing as mp

if __name__ == "__main__":
    mp.set_start_method("spawn")
```

The start method strongly affects semantics, performance, and safety.
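
Process creation can also be bound to a start method locally through a context object, which avoids changing the process-wide default. A minimal sketch using the documented `get_context()` API:

```python id="gx3ctp"
import multiprocessing as mp

def hello():
    print("hello from", mp.current_process().name)

if __name__ == "__main__":
    # The context exposes Process, Queue, Pool, and so on,
    # all bound to the chosen start method.
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=hello)
    p.start()
    p.join()
```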

## 59.5 `fork`

With `fork`, the child process starts as a copy of the parent process.

Conceptually:

```text id="65h2em"
parent process memory
    ↓ fork
child process sees copied memory
```

Modern operating systems usually implement this with copy-on-write pages. Physical memory is not copied immediately. Pages are copied only when modified.

Advantages:

```text id="iaw3k8"
fast startup
inherits loaded modules
inherits initialized data
low initial memory cost with copy-on-write
```

Risks:

```text id="ev1dl7"
unsafe after threads have started
inherits locks in unknown states
inherits open file descriptors
inherits partially initialized runtime state
can interact badly with native libraries
```

Forking a multi-threaded process is especially delicate. Only the thread that calls `fork()` survives in the child, but locks held by other threads may remain locked.
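
On platforms that provide `fork()`, the `os.register_at_fork()` hook lets libraries reset fragile state in the child. A minimal sketch, assuming a hypothetical module-level lock that must not be inherited in a held state:

```python id="rgfk4a"
import os
import threading

_lock = threading.Lock()  # hypothetical lock guarding module state

def _reinit_lock():
    # Runs in the child immediately after fork(); replaces a lock that
    # another parent thread might have been holding at fork time.
    global _lock
    _lock = threading.Lock()

os.register_at_fork(after_in_child=_reinit_lock)
```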

## 59.6 `spawn`

With `spawn`, the child starts a fresh Python interpreter.

Conceptually:

```text id="6menqa"
parent process
    ↓ create new process
fresh Python interpreter
    ↓ import main module
    ↓ unpickle target and arguments
```

Advantages:

```text id="y1xcdk"
clean interpreter state
safer with threads
portable to Windows
avoids inherited lock problems
```

Costs:

```text id="1l6azz"
slower startup
requires picklable targets
imports main module again
does not inherit live Python objects directly
```

With `spawn`, this guard is essential:

```python id="40mgm7"
if __name__ == "__main__":
    ...
```

Without it, importing the main module in the child may recursively create new child processes.

## 59.7 `forkserver`

With `forkserver`, a dedicated server process is started. Later children are forked from that server.

Conceptually:

```text id="cfffh1"
main process
    ↓ starts fork server
fork server
    ↓ forks clean workers on request
worker processes
```

This combines some benefits of `fork` and `spawn`.

It avoids forking from a complex multi-threaded main process while still allowing relatively efficient child creation.

It is mainly available on Unix-like platforms.

## 59.8 The Main Module Rule

When using `spawn`, child processes import the main module.

Therefore, process creation must be protected:

```python id="qgnagx"
from multiprocessing import Process

def worker():
    print("work")

if __name__ == "__main__":
    p = Process(target=worker)
    p.start()
    p.join()
```

Top-level process creation is unsafe:

```python id="pw7bal"
# Bad with spawn
p = Process(target=worker)
p.start()
```

Why:

```text id="xtr1j3"
parent imports main module
    ↓ creates process
child imports main module
    ↓ creates process again
recursive process creation
```

The guard makes top-level code import-safe.

## 59.9 Pickling and Process Boundaries

Processes do not share normal Python objects. Arguments and results usually cross process boundaries through serialization.

`multiprocessing` mostly uses `pickle`.

Example:

```python id="dyjib3"
from multiprocessing import Process

def worker(data):
    print(data)

if __name__ == "__main__":
    p = Process(target=worker, args=({"x": 1},))
    p.start()
    p.join()
```

The dictionary is serialized in the parent and reconstructed in the child.

This implies:

```text id="pwl5ee"
target function must be importable
arguments must be picklable
return values through pools must be picklable
closures and lambdas often fail with spawn
large objects have serialization cost
```

Good target:

```python id="c36hsf"
def worker(x):
    return x * x
```

Problematic target:

```python id="dabx38"
worker = lambda x: x * x
```

Top-level functions are easier to pickle than local functions or lambdas.
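
The difference is easy to observe by pickling both kinds of target directly; a small sketch:

```python id="pk7c2m"
import pickle

def square(x):
    return x * x

data = pickle.dumps(square)  # a top-level function pickles by qualified name
print(len(data), "bytes")

try:
    pickle.dumps(lambda x: x * x)  # a lambda has no importable name
except pickle.PicklingError as exc:
    print("cannot pickle:", exc)
```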

## 59.10 `Process`

`Process` is the basic unit.

```python id="c6ityc"
from multiprocessing import Process

def run(name):
    print("hello", name)

if __name__ == "__main__":
    p = Process(target=run, args=("worker-1",))
    p.start()
    p.join()

    print(p.exitcode)
```

Important methods and attributes:

| API | Meaning |
|---|---|
| `start()` | Start child process |
| `join()` | Wait for child to finish |
| `terminate()` | Ask OS to terminate child |
| `kill()` | Force kill where supported |
| `is_alive()` | Check if still running |
| `exitcode` | Process exit status |
| `pid` | OS process ID |
| `name` | Process name |

A process object in the parent is a controller for the child. It is not the child’s memory.

## 59.11 Exit Codes

A child process has an exit code.

```python id="51xga9"
from multiprocessing import Process
import sys

def worker():
    sys.exit(3)

if __name__ == "__main__":
    p = Process(target=worker)
    p.start()
    p.join()

    print(p.exitcode)
```

Typical output:

```text id="zgr3ld"
3
```

Conventions:

| Exit code | Meaning |
|---:|---|
| `0` | Success |
| Positive integer | Program-defined failure |
| Negative value | Terminated by signal on Unix-like systems |

If a child raises an unhandled exception, it exits with a nonzero code and prints a traceback to its stderr.
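
A negative exit code can be observed by terminating a child; a sketch assuming Unix signal semantics:

```python id="ngx5e2"
from multiprocessing import Process
import time

def sleeper():
    time.sleep(60)

if __name__ == "__main__":
    p = Process(target=sleeper)
    p.start()
    p.terminate()      # sends SIGTERM on Unix
    p.join()
    print(p.exitcode)  # typically -15: terminated by signal 15 (SIGTERM)
```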

## 59.12 Queues

`multiprocessing.Queue` provides process-safe message passing.

```python id="m39r1m"
from multiprocessing import Process, Queue

def worker(q):
    q.put("done")

if __name__ == "__main__":
    q = Queue()

    p = Process(target=worker, args=(q,))
    p.start()

    print(q.get())

    p.join()
```

A queue serializes objects with pickle and sends them through an inter-process communication channel.

Conceptually:

```text id="of5jc6"
producer process
    pickle object
    send bytes through pipe
consumer process
    receive bytes
    unpickle object
```

Queues are good for task pipelines and result collection.
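
A common pipeline idiom is a sentinel value that tells the consumer to stop; a minimal sketch:

```python id="snt9qw"
from multiprocessing import Process, Queue

SENTINEL = None  # marks the end of the stream

def producer(q):
    for i in range(5):
        q.put(i)
    q.put(SENTINEL)

def consumer(q):
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        print("consumed:", item)

if __name__ == "__main__":
    q = Queue()
    procs = [Process(target=producer, args=(q,)),
             Process(target=consumer, args=(q,))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```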

## 59.13 Pipes

`multiprocessing.Pipe` creates connected endpoints.

```python id="8z923x"
from multiprocessing import Process, Pipe

def worker(conn):
    conn.send("hello")
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()

    p = Process(target=worker, args=(child_conn,))
    p.start()

    print(parent_conn.recv())

    p.join()
```

A pipe is lower-level than a queue.

Use pipes for direct two-party communication. Use queues for many producers or consumers.
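
Because `Pipe()` endpoints are duplex by default, two processes can hold a request-response conversation over a single connection; a sketch:

```python id="dpx2vu"
from multiprocessing import Process, Pipe

def echo_upper(conn):
    while True:
        msg = conn.recv()
        if msg is None:        # None asks the worker to stop
            break
        conn.send(msg.upper())
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=echo_upper, args=(child_conn,))
    p.start()

    parent_conn.send("ping")
    print(parent_conn.recv())  # PING
    parent_conn.send(None)
    p.join()
```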

## 59.14 Pools

`Pool` manages a group of worker processes.

```python id="cowyrd"
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))

    print(results)
```

Common pool methods:

| Method | Behavior |
|---|---|
| `map()` | Apply function to iterable and preserve order |
| `imap()` | Lazy ordered results |
| `imap_unordered()` | Lazy unordered results |
| `apply()` | Run one call |
| `apply_async()` | Submit one async call |
| `starmap()` | Like map with argument tuples |

Pool model:

```text id="dczxqp"
parent
    ↓ submit tasks
worker processes
    ↓ execute tasks
parent
    ↓ collect results
```

Pools are convenient, but serialization and scheduling overhead matter.
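
The asynchronous methods return `AsyncResult` objects whose `get()` blocks until the call completes; a sketch combining `apply_async()` with lazy unordered iteration:

```python id="aps6rn"
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(2) as pool:
        async_result = pool.apply_async(square, (7,))
        print(async_result.get(timeout=5))   # 49

        # Results arrive in completion order, not input order.
        for value in pool.imap_unordered(square, range(5)):
            print(value)
```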

## 59.15 Chunking

For `Pool.map()`, work is sent in chunks.

Too-small chunks cause overhead. Too-large chunks hurt load balancing.

Example:

```python id="186b8u"
with Pool(4) as pool:
    results = pool.map(square, range(1000), chunksize=50)
```

Chunking tradeoff:

| Chunk size | Effect |
|---:|---|
| Small | Better load balancing, more IPC overhead |
| Large | Less IPC overhead, worse load balancing |
| Automatic | Usually acceptable default |

For short tasks, chunking can dominate performance.
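
The tradeoff is easy to measure; a rough timing sketch (absolute numbers depend on the machine):

```python id="chk3tm"
import time
from multiprocessing import Pool

def tiny(x):
    return x + 1  # deliberately trivial, so IPC dominates

if __name__ == "__main__":
    for chunksize in (1, 100):
        with Pool(4) as pool:
            start = time.perf_counter()
            pool.map(tiny, range(100_000), chunksize=chunksize)
            elapsed = time.perf_counter() - start
        print("chunksize", chunksize, round(elapsed, 3), "s")
```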

## 59.16 Shared Values and Arrays

`multiprocessing.Value` and `Array` allocate shared memory wrappers for simple C-style data.

```python id="eqsw3z"
from multiprocessing import Process, Value

def worker(counter):
    with counter.get_lock():
        counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)

    processes = [Process(target=worker, args=(counter,)) for _ in range(4)]

    for p in processes:
        p.start()

    for p in processes:
        p.join()

    print(counter.value)
```

The type code `"i"` means C `int`.

Shared objects need synchronization when mutated from multiple processes.
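
`Array` works the same way for a fixed-length sequence of one C type; a sketch:

```python id="arr8dz"
from multiprocessing import Process, Array

def fill_squares(arr):
    for i in range(len(arr)):
        arr[i] = i * i

if __name__ == "__main__":
    arr = Array("d", 5)  # five shared C doubles, zero-initialized
    p = Process(target=fill_squares, args=(arr,))
    p.start()
    p.join()
    print(list(arr))  # [0.0, 1.0, 4.0, 9.0, 16.0]
```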

## 59.17 Shared Memory

`multiprocessing.shared_memory` provides named shared memory blocks.

```python id="ebdgx7"
from multiprocessing import shared_memory

shm = shared_memory.SharedMemory(create=True, size=10)

try:
    shm.buf[:5] = b"hello"
    print(bytes(shm.buf[:5]))
finally:
    shm.close()
    shm.unlink()
```

Shared memory is useful for large binary data because it avoids pickling and copying.

Conceptually:

```text id="l8cumq"
process A
    maps shared memory block

process B
    maps same shared memory block
```

Only bytes are shared. You must define the data layout and synchronization.
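
A layout can be as simple as a packed C struct at a fixed offset. A sketch using the `struct` module, with only one writer at a time so no lock is needed:

```python id="stm4qb"
import struct
from multiprocessing import Process, shared_memory

def increment(name):
    # Attach to the existing block by name and bump a signed 64-bit int.
    shm = shared_memory.SharedMemory(name=name)
    (value,) = struct.unpack_from("q", shm.buf, 0)
    struct.pack_into("q", shm.buf, 0, value + 1)
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=8)
    try:
        struct.pack_into("q", shm.buf, 0, 41)
        p = Process(target=increment, args=(shm.name,))
        p.start()
        p.join()
        print(struct.unpack_from("q", shm.buf, 0)[0])  # 42
    finally:
        shm.close()
        shm.unlink()
```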

## 59.18 Managers

A manager process hosts Python objects and exposes proxies to other processes.

```python id="4ppssj"
from multiprocessing import Manager, Process

def worker(shared_list):
    shared_list.append("x")

if __name__ == "__main__":
    with Manager() as manager:
        xs = manager.list()

        p = Process(target=worker, args=(xs,))
        p.start()
        p.join()

        print(list(xs))
```

Managers are flexible but slower than queues or shared memory.

They work by proxy calls:

```text id="juqyzu"
worker process
    ↓ proxy method call
manager process
    ↓ mutates real object
worker process
    ↓ receives result
```

Use managers for coordination, not high-throughput data paths.

## 59.19 Locks

`multiprocessing` provides synchronization primitives.

```python id="lqrl39"
from multiprocessing import Process, Lock

def worker(lock):
    with lock:
        print("critical section")

if __name__ == "__main__":
    lock = Lock()

    processes = [Process(target=worker, args=(lock,)) for _ in range(4)]

    for p in processes:
        p.start()

    for p in processes:
        p.join()
```

Common primitives:

| Primitive | Purpose |
|---|---|
| `Lock` | Mutual exclusion |
| `RLock` | Reentrant mutual exclusion |
| `Semaphore` | Counting permits |
| `BoundedSemaphore` | Semaphore with upper bound |
| `Event` | One-bit notification |
| `Condition` | Wait and notify |
| `Barrier` | Group synchronization |

These map to OS-level or multiprocessing-managed synchronization mechanisms.

## 59.20 Daemon Processes

A process can be marked daemon.

```python id="pj2spg"
p = Process(target=worker)
p.daemon = True
```

Daemon child processes are terminated when the parent process exits.

They are not allowed to create child processes themselves.

Daemon processes are useful for auxiliary background work, but they are poorly suited to work that requires reliable cleanup.

## 59.21 Termination

`terminate()` stops a process abruptly.

```python id="3g65hz"
p.terminate()
p.join()
```

This does not run normal Python cleanup reliably in the child.

Consequences may include:

```text id="u5d6n2"
finally blocks skipped
locks left acquired
queues corrupted
temporary files not cleaned
shared resources leaked
```

Prefer cooperative shutdown:

```python id="d90m1a"
from multiprocessing import Event, Process

def worker(stop):
    while not stop.is_set():
        do_work()  # application-defined unit of work

if __name__ == "__main__":
    stop = Event()
    p = Process(target=worker, args=(stop,))
    p.start()
```

Then signal:

```python id="6yoakb"
stop.set()
```

and join.

## 59.22 Process Pools and Shutdown

A pool should be closed or used as a context manager.

Good:

```python id="f1vxsf"
with Pool() as pool:
    results = pool.map(square, range(10))
```

Manual form:

```python id="4hy7fg"
pool = Pool()
try:
    results = pool.map(square, range(10))
finally:
    pool.close()
    pool.join()
```

Important methods:

| Method | Meaning |
|---|---|
| `close()` | Stop accepting new work and finish existing work |
| `terminate()` | Stop workers immediately |
| `join()` | Wait for worker exit |

Using the context manager handles shutdown automatically. Note that `Pool.__exit__` calls `terminate()`, which is safe here because `map()` has already returned by the time the block exits.

## 59.23 Exceptions in Pools

Exceptions raised in workers are sent back to the parent.

```python id="z9j9kg"
from multiprocessing import Pool

def fail(x):
    raise ValueError(x)

if __name__ == "__main__":
    with Pool(2) as pool:
        try:
            pool.map(fail, [1, 2, 3])
        except ValueError as exc:
            print("caught:", exc)
```

The exception is reconstructed in the parent process.

Tracebacks may be less direct than ordinary in-process exceptions because execution happened elsewhere.

For async calls:

```python id="qn9xhe"
result = pool.apply_async(fail, (1,))
result.get()
```

`get()` re-raises the worker exception.

## 59.24 Initializers

Pools can run initializer functions in each worker.

```python id="dgof41"
from multiprocessing import Pool

_state = None

def init_worker(value):
    global _state
    _state = value

def work(x):
    return _state + x

if __name__ == "__main__":
    with Pool(initializer=init_worker, initargs=(10,)) as pool:
        print(pool.map(work, [1, 2, 3]))
```

Initializers are useful for per-process setup:

```text id="r9lac5"
open database connections
load models
initialize caches
set process-global config
ignore signals
configure logging
```

Remember each worker has separate state.

## 59.25 Logging

Multiprocessing complicates logging because several processes may write concurrently.

Naive logging:

```python id="me9a8d"
print("message")
```

can interleave output.

A robust pattern is to send log records through a queue to a single logging process or listener thread.

Conceptually:

```text id="27ddki"
worker process
    ↓ log record queue
logging listener
    ↓ writes files or stdout
```

This avoids corrupted or interleaved output.
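
The standard library supports this pattern directly with `logging.handlers.QueueHandler` and `QueueListener`; a minimal sketch:

```python id="lgq7xe"
import logging
import logging.handlers
from multiprocessing import Process, Queue, current_process

def worker(log_queue):
    # Each worker routes its records into the shared queue.
    root = logging.getLogger()
    root.addHandler(logging.handlers.QueueHandler(log_queue))
    root.setLevel(logging.INFO)
    logging.info("hello from %s", current_process().name)

if __name__ == "__main__":
    log_queue = Queue()
    # A single listener in the parent is the only writer to stderr.
    listener = logging.handlers.QueueListener(
        log_queue, logging.StreamHandler())
    listener.start()

    procs = [Process(target=worker, args=(log_queue,)) for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    listener.stop()
```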

## 59.26 Signals

Signals interact with multiprocessing at the OS level.

On Unix-like systems, signals are delivered to whole processes, not to individual Python threads, and CPython runs Python-level signal handlers only in each process’s main thread. A parent may receive `SIGINT`, and children may also receive it, depending on process groups and terminal state.

Robust process systems usually handle:

```text id="w2x8xf"
SIGINT
SIGTERM
child cleanup
queue draining
pool termination
graceful shutdown deadlines
```

Signal behavior differs across platforms, especially Windows.
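
A widely used pattern is to make pool workers ignore `SIGINT`, so that Ctrl-C interrupts only the parent; a Unix-oriented sketch:

```python id="sgn2kd"
import signal
from multiprocessing import Pool

def init_worker():
    # Workers ignore SIGINT; only the parent reacts to Ctrl-C.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(4, initializer=init_worker) as pool:
        try:
            print(pool.map(square, range(10)))
        except KeyboardInterrupt:
            pool.terminate()
            pool.join()
```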

## 59.27 File Descriptors and Handles

Child processes may inherit file descriptors or handles, depending on platform and start method.

With `fork`, the child inherits many open resources.

With `spawn`, inheritance is more controlled.

Inherited resources can include:

```text id="9vj7sd"
files
sockets
pipes
locks
database connections
random generator state
logging handlers
```

Some inherited resources are safe. Others must be reopened in the child.

This is one reason pool initializers are useful.

## 59.28 Randomness

With `fork`, pseudo-random generator state is copied into children.

If several children inherit the same generator state, they can produce identical sequences unless reseeded. CPython reseeds the `random` module’s default generator after `fork`, but other generators, including those in third-party libraries, may not be reseeded automatically.

Example mitigation:

```python id="ldl2u0"
import random

def init_worker():
    # seed() with no argument draws fresh OS entropy in each worker,
    # which is less predictable than seeding with the pid.
    random.seed()
```

For cryptographic randomness, use APIs backed by OS entropy sources.

## 59.29 Memory Cost

Processes are heavier than threads.

Costs include:

```text id="fwhsc7"
separate interpreter state
separate heaps
separate module imports
serialization overhead
IPC buffers
OS process scheduling
```

With `fork`, copy-on-write reduces initial memory cost, but writing to inherited pages creates real copies.

With `spawn`, each process imports modules independently, often using more memory.

For large read-only data, strategies include the following; the first is sketched below:

```text id="cq7zu7"
load before fork to exploit copy-on-write
use shared memory
memory-map files
send small indexes instead of large objects
use worker initializers
```
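
A minimal sketch of the load-before-fork strategy (Unix-only, since it forces the `fork` start method; note that CPython reference counting still dirties some shared pages over time):

```python id="cow5fj"
import multiprocessing as mp

BIG = None  # large read-only data, loaded once in the parent

def lookup(i):
    # Under fork, children see BIG through copy-on-write pages;
    # reading it does not duplicate the whole structure.
    return BIG[i]

if __name__ == "__main__":
    mp.set_start_method("fork")  # Unix-only assumption for this sketch
    BIG = list(range(1_000_000))
    with mp.Pool(4) as pool:
        print(pool.map(lookup, [0, 10, 100]))
```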

## 59.30 Performance Model

`multiprocessing` improves performance when parallel work is large enough to amortize overhead.

Costs:

```text id="hhtop5"
process startup
pickling arguments
IPC transfer
unpickling arguments
scheduling work
pickling results
unpickling results
```

Good workload:

```text id="3q658e"
large CPU-bound tasks
limited communication
independent inputs
small result objects
long-lived worker pool
```

Poor workload:

```text id="0o614v"
tiny functions
large objects copied repeatedly
shared mutable state
high-frequency synchronization
tasks requiring many round trips
```

Measure before assuming that multiprocessing helps.

## 59.31 Relationship to `subprocess`

`multiprocessing` runs Python functions in child Python processes.

`subprocess` runs external programs.

| Module | Main use |
|---|---|
| `multiprocessing` | Parallel Python execution |
| `subprocess` | Launch external commands |
| `threading` | Concurrent work in same process |
| `concurrent.futures` | Uniform executor interface |

Example contrast:

```python id="lj80v7"
# multiprocessing
Process(target=worker).start()

# subprocess
subprocess.run(["python", "script.py"])
```

`multiprocessing` provides Python object communication. `subprocess` provides process execution and byte streams.

## 59.32 Relationship to `concurrent.futures`

`concurrent.futures.ProcessPoolExecutor` provides a simpler pool API.

```python id="4tsy76"
from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

if __name__ == "__main__":
    with ProcessPoolExecutor() as ex:
        print(list(ex.map(square, range(10))))
```

It is built on process-based execution and shares many constraints:

```text id="ehclxx"
picklable functions
picklable arguments
process startup costs
separate memory
main module guard
```

Use `multiprocessing` when you need lower-level process control. Use `ProcessPoolExecutor` when a future-based API is enough.

## 59.33 Relationship to CPython Internals

`multiprocessing` touches several CPython internals:

| Internal area | Connection |
|---|---|
| Interpreter startup | Spawn starts fresh interpreters |
| Import system | Child imports main module and dependencies |
| Pickle | Arguments and results are serialized |
| GIL | Separate processes avoid one-interpreter GIL limits |
| Memory allocator | Each process has its own heap |
| File descriptors | Fork and spawn differ in inheritance |
| Signal handling | Parent and children receive process signals |
| Garbage collection | Each process collects its own objects |
| C extensions | Native state may or may not survive fork safely |

This module is a runtime boundary layer between Python and the operating system process model.

## 59.34 Common Mistakes

Common errors include:

| Mistake | Consequence |
|---|---|
| Missing `if __name__ == "__main__"` | Recursive process creation under spawn |
| Passing lambdas or local functions | Pickling errors |
| Sending huge objects repeatedly | Slow IPC and memory pressure |
| Forking after threads start | Deadlocks or inconsistent native state |
| Using managers for high-throughput data | Poor performance |
| Terminating workers abruptly | Corrupted queues or leaked resources |
| Assuming globals are shared | Incorrect results |
| Forgetting to join processes | Zombie processes or resource leaks |
| Printing from many workers | Interleaved output |

Most multiprocessing bugs come from forgetting the process boundary.

## 59.35 Practical Design Rules

Use these rules for robust multiprocessing code:

```text id="t67wtu"
put process creation behind the main guard
prefer top-level functions as targets
send small immutable messages
use queues for communication
use shared memory for large numeric or binary data
initialize per-process resources in workers
close and join pools
prefer cooperative shutdown
avoid global mutable state
measure serialization cost
```

A clean process program looks like message passing, not shared-object programming.

## 59.36 Chapter Summary

The `multiprocessing` module provides process-based concurrency for CPython. It creates child interpreters, communicates through queues, pipes, shared memory, managers, and serialized messages, and provides process pools for parallel work.

For CPython internals, `multiprocessing` is important because it bypasses one-interpreter GIL limits by using separate processes. It also exposes the runtime consequences of process isolation: pickling, interpreter startup, import behavior, memory separation, file descriptor inheritance, signal handling, and explicit communication.
