# 65. Buffer Protocol

The buffer protocol is CPython’s low-level interface for sharing raw memory between Python objects without copying. It allows one object to expose a contiguous or strided block of memory, and another object to read or write that memory through a common C structure.

The protocol is used by objects such as:

| Object | Buffer use |
|---|---|
| `bytes` | Read-only contiguous byte storage |
| `bytearray` | Mutable contiguous byte storage |
| `memoryview` | General Python-level buffer view |
| `array.array` | Typed contiguous storage |
| `mmap.mmap` | Memory-mapped file storage |
| `numpy.ndarray` | Typed, shaped, strided memory |
| extension objects | Custom binary storage |

The buffer protocol is one of the main reasons Python can interoperate efficiently with binary data, numerical arrays, images, files, sockets, compression libraries, codecs, and native extensions.

## 65.1 Why the Buffer Protocol Exists

Python objects usually hide their internal representation. A `bytes` object, a `bytearray`, an image buffer, and a NumPy array all have different implementation details.

But native code often needs direct access to memory:

```text
hash this byte range
compress this block
decode this image
write this array to a file
pass this tensor to native code
parse this packet without copying
```

Without a common protocol, each library would need custom APIs for each object type.

The buffer protocol provides one uniform view:

```text
Python object
    exposes memory
        through Py_buffer
            consumed by C code
```

This lets native code operate on many object types through the same interface.

## 65.2 Exporters and Consumers

The protocol has two sides.

| Role | Meaning |
|---|---|
| Exporter | Object that exposes memory |
| Consumer | Code that requests and uses memory |

Examples:

| Exporter | Consumer |
|---|---|
| `bytes` | hashing function |
| `bytearray` | compression library |
| `array.array` | binary writer |
| `mmap.mmap` | parser |
| NumPy array | native numerical kernel |
| custom extension type | Python `memoryview` |

A consumer asks an exporter for a view. The exporter fills a `Py_buffer` structure. The consumer uses it. When finished, the consumer releases it.

```text
consumer
    PyObject_GetBuffer(obj, &view, flags)
        exporter fills Py_buffer
    use view.buf, view.len, shape, strides
    PyBuffer_Release(&view)
```
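
As a rough Python-level illustration of one consumer serving many exporters, the sketch below feeds several standard-library exporters to `zlib.crc32`, which accepts any bytes-like object. The variable names (`payload`, `mapped`, `exporters`, `checksums`) are illustrative only.

```python
import array
import mmap
import zlib

payload = b"abc"

# An anonymous memory map, used here only as one more exporter type.
mapped = mmap.mmap(-1, len(payload))
mapped.write(payload)

exporters = [
    payload,                      # bytes
    bytearray(payload),           # bytearray
    memoryview(payload),          # memoryview
    array.array("B", payload),    # array.array of unsigned bytes
    mapped,                       # mmap.mmap
]

# The same consumer reads every exporter through the buffer protocol.
checksums = [zlib.crc32(obj) for obj in exporters]
mapped.close()

print(len(set(checksums)) == 1)
```

All five objects hold the same three bytes, so the consumer should report the same checksum for each without needing type-specific code.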

## 65.3 `Py_buffer`

The central structure is `Py_buffer`.

Conceptually:

```c
typedef struct {
    void *buf;
    PyObject *obj;
    Py_ssize_t len;
    Py_ssize_t itemsize;
    int readonly;
    int ndim;
    char *format;
    Py_ssize_t *shape;
    Py_ssize_t *strides;
    Py_ssize_t *suboffsets;
    void *internal;
} Py_buffer;
```

Important fields:

| Field | Meaning |
|---|---|
| `buf` | Pointer to first accessible byte |
| `obj` | Exporting object |
| `len` | Total logical byte length |
| `itemsize` | Size of one element |
| `readonly` | Whether writes are forbidden |
| `ndim` | Number of dimensions |
| `format` | Element format string |
| `shape` | Length of each dimension |
| `strides` | Byte step per dimension |
| `suboffsets` | Indirect buffer support |
| `internal` | Exporter-private data |

A simple byte buffer may use only `buf`, `len`, `readonly`, and `obj`.

A multidimensional array needs `ndim`, `itemsize`, `format`, `shape`, and `strides`.
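
Most of these fields can be inspected from Python, because `memoryview` exposes attributes that correspond to them. The following is a small sketch using an `array.array` of doubles; the `fields` dictionary and its labels are made up for the illustration.

```python
import array

values = array.array("d", [1.0, 2.0, 3.0])
view = memoryview(values)

# Each attribute below corresponds to a Py_buffer field of the same purpose.
fields = {
    "len (nbytes)": view.nbytes,
    "itemsize": view.itemsize,
    "readonly": view.readonly,
    "ndim": view.ndim,
    "format": view.format,
    "shape": view.shape,
    "strides": view.strides,
    "suboffsets": view.suboffsets,
    "obj": view.obj is values,
}

for name, value in fields.items():
    print(f"{name:>12}: {value}")

view.release()
```

The raw `buf` pointer and the `internal` field have no Python-level counterpart; they are only meaningful to C code.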

## 65.4 Simple Contiguous Buffers

A `bytes` object exposes a read-only contiguous buffer.

```python
data = b"hello"
view = memoryview(data)
print(view.readonly)
print(view.nbytes)
```

At the C level:

```c
Py_buffer view;

if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0) {
    return NULL;
}

/* view.buf points to bytes */
/* view.len is the length in bytes */

PyBuffer_Release(&view);
```

`PyBUF_SIMPLE` requests a simple byte-oriented buffer.

The consumer should treat the memory as a flat array of bytes.

## 65.5 Writable Buffers

Some objects expose mutable memory.

Example:

```python
data = bytearray(b"hello")
view = memoryview(data)
view[0] = ord("H")
print(data)
```

Native code can request a writable buffer:

```c
Py_buffer view;

if (PyObject_GetBuffer(obj, &view, PyBUF_WRITABLE) < 0) {
    return NULL;
}

char *p = (char *)view.buf;
p[0] = 'H';

PyBuffer_Release(&view);
```

If the exporter is read-only, the request fails and sets an exception.

This prevents code from mutating immutable objects such as `bytes`.
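
The same distinction is visible from Python. In the sketch below, `io.BytesIO.readinto` stands in for a native consumer that needs a writable buffer, and a `memoryview` over `bytes` shows the read-only case. The names `source`, `target`, `frozen`, and `rejected` are illustrative.

```python
import io

source = io.BytesIO(b"hello")

# readinto() needs a writable buffer, so a bytearray is accepted.
target = bytearray(5)
count = source.readinto(target)
print(count, target)

# A memoryview over bytes is read-only, so item assignment is refused.
frozen = memoryview(b"hello")
rejected = False
try:
    frozen[0] = ord("H")
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```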

## 65.6 Buffer Flags

Consumers specify what kind of view they need.

Common flags:

| Flag | Meaning |
|---|---|
| `PyBUF_SIMPLE` | Flat byte buffer |
| `PyBUF_WRITABLE` | Writable buffer required |
| `PyBUF_FORMAT` | Request element format string |
| `PyBUF_ND` | Request dimensionality and shape |
| `PyBUF_STRIDES` | Request strides |
| `PyBUF_C_CONTIGUOUS` | Require C-contiguous layout |
| `PyBUF_F_CONTIGUOUS` | Require Fortran-contiguous layout |
| `PyBUF_ANY_CONTIGUOUS` | Require any contiguous layout |
| `PyBUF_FULL` | Request full buffer information |

A consumer should request the weakest view it needs.

For example, a hashing function only needs bytes:

```c
PyBUF_SIMPLE
```

A numerical kernel may require:

```c
PyBUF_FORMAT | PyBUF_ND | PyBUF_STRIDES
```

A C library requiring flat contiguous memory should ask for contiguity explicitly.

## 65.7 Contiguous vs Strided Memory

Not all buffers are contiguous.

A one-dimensional contiguous buffer:

```text
[ a b c d e f ]
```

has one linear memory range.

A strided view may skip bytes:

```text
[ a _ b _ c _ d _ ]
```

A two-dimensional array can have row strides:

```text
row 0: a b c
row 1: d e f
row 2: g h i
```

C-contiguous layout stores rows next to each other:

```text
a b c d e f g h i
```

Fortran-contiguous layout stores columns next to each other:

```text
a d g b e h c f i
```

The buffer protocol represents this using:

```text
shape
strides
itemsize
```
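
A strided view like the one pictured above can be produced from Python by slicing a `memoryview` with a step. This is a minimal sketch; the variable names are illustrative.

```python
data = bytearray(b"a_b_c_d_")
full = memoryview(data)
every_other = full[::2]   # a strided view, no bytes are copied

print(full.strides, full.contiguous)                 # one-byte steps
print(every_other.strides, every_other.contiguous)   # two-byte steps
print(every_other.tobytes())                         # copies out the selected bytes
```

The sliced view keeps the same underlying memory and only changes `shape` and `strides`, which is why it is no longer contiguous.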

## 65.8 Shape and Strides

For a two-dimensional array:

```text
shape = [3, 4]
itemsize = 8
```

means:

```text
3 rows
4 columns
8 bytes per element
```

Strides describe how many bytes to move to advance along each dimension.

C-contiguous double array:

```text
shape   = [3, 4]
strides = [32, 8]
```

because:

```text
next row    = 4 elements * 8 bytes = 32 bytes
next column = 1 element * 8 bytes = 8 bytes
```

Element address:

```text
address(i, j) = buf + i * strides[0] + j * strides[1]
```

This lets one protocol describe compact arrays, slices, transposes, channels, images, and tensor-like data.
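
The address formula can be checked from Python. The sketch below builds a 3 by 4 view of doubles with `memoryview.cast`, computes the byte offset of one element from the strides, and reads the raw bytes at that offset with `struct`. The helper `element_offset` is a hypothetical name introduced for this example.

```python
import array
import struct

# 12 doubles laid out as a C-contiguous 3x4 matrix.
storage = array.array("d", range(12))
flat = memoryview(storage)
# cast() only reshapes to or from a byte format, so go through "B" first.
matrix = flat.cast("B").cast("d", shape=[3, 4])

print(matrix.shape, matrix.strides, matrix.itemsize)

def element_offset(view, i, j):
    """Byte offset of element (i, j) from the start of the buffer."""
    return i * view.strides[0] + j * view.strides[1]

i, j = 1, 2
offset = element_offset(matrix, i, j)
# Read the raw bytes at that offset and decode them as a native double.
(value,) = struct.unpack_from("d", storage, offset)
print(offset, value, matrix[i, j])
```

Both the manual offset computation and the view's own indexing should reach the same element, here the one at row 1, column 2.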

## 65.9 Format Strings

The `format` field describes the type of each element.

Examples:

| Format | Meaning |
|---|---|
| `B` | unsigned byte |
| `b` | signed byte |
| `h` | short |
| `i` | int |
| `l` | long |
| `f` | float |
| `d` | double |

A consumer that cares about element type should request `PyBUF_FORMAT` and validate it.

Example:

```c
if (view.itemsize != sizeof(double) ||
    view.format == NULL ||
    strcmp(view.format, "d") != 0) {
    PyBuffer_Release(&view);
    PyErr_SetString(PyExc_TypeError, "expected double buffer");
    return NULL;
}
```

Do not assume a buffer contains a particular type unless the protocol data confirms it.
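
The same validation can be sketched at the Python level, where `memoryview.format` and `memoryview.itemsize` expose the protocol data. `require_doubles` is a hypothetical helper, and the strict comparison against `"d"` is a simplifying assumption: some exporters may report an equivalent format with a byte-order prefix.

```python
import array
import struct

def require_doubles(obj):
    """Return a memoryview over obj, or raise TypeError if items are not C doubles."""
    view = memoryview(obj)
    if view.format != "d" or view.itemsize != struct.calcsize("d"):
        fmt = view.format
        view.release()
        raise TypeError(f"expected double buffer, got format {fmt!r}")
    return view

ok = require_doubles(array.array("d", [1.5, 2.5]))
print(ok.format, ok.tolist())

try:
    require_doubles(array.array("i", [1, 2]))
except TypeError as exc:
    print("rejected:", exc)
```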

## 65.10 `memoryview`

`memoryview` is the Python-level object for inspecting and slicing buffers.

```python
data = bytearray(b"abcdef")
v = memoryview(data)

print(v[0])           # 97, the integer value of b"a"
print(bytes(v[1:4]))  # b'bcd'; the slice v[1:4] is itself a memoryview
```

`memoryview` does not copy the underlying memory. It references the exporter.

This matters:

```python
data = bytearray(b"abc")
v = memoryview(data)

v[0] = ord("A")
print(data)
```

The output:

```text
bytearray(b'Abc')
```

The view modifies the original object.

## 65.11 Lifetime Rules

A buffer view must keep the exporter alive.

In `Py_buffer`, the `obj` field stores a reference to the exporting object. The consumer must release the buffer:

```c
PyBuffer_Release(&view);
```

This call releases exporter-owned state and decrements the reference held by the view.

Common bug:

```c
Py_buffer view;

if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0) {
    return NULL;
}

/* use view */

return PyLong_FromLong(view.len);  /* missing PyBuffer_Release */
```

Correct:

```c
Py_buffer view;

if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0) {
    return NULL;
}

PyObject *result = PyLong_FromSsize_t(view.len);

PyBuffer_Release(&view);

return result;
```

Every successful `PyObject_GetBuffer` must have a matching `PyBuffer_Release`.
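
Python code has the same acquire and release pairing available through `memoryview.release()` and the context manager protocol. A minimal sketch, with illustrative variable names:

```python
data = bytearray(b"abc")

# The with-block releases the view on exit, even if the body raises.
with memoryview(data) as view:
    length = view.nbytes

# After release the view can no longer be used.
try:
    view[0]
    released = False
except ValueError:
    released = True

print(length, released)
```

Using a `with` block plays the role that a single cleanup path plays in C: the release happens on every exit from the block.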

## 65.12 Exporter Restrictions During Active Views

An exporter must not invalidate memory while consumers hold active views.

For example, a `bytearray` cannot be resized while exported buffers exist:

```python
data = bytearray(b"abc")
v = memoryview(data)

data.append(100)
```

This raises `BufferError` because resizing might move the memory and invalidate the view.

Custom exporters must obey the same principle. Once they export a buffer, they must keep the memory valid until the consumer releases it.
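
The following sketch walks through the full cycle for `bytearray`: the resize is refused while a view is active, in-place writes remain allowed, and resizing works again after the view is released. The `refused` flag is only there to make the outcome easy to inspect.

```python
data = bytearray(b"abc")
view = memoryview(data)

refused = False
try:
    data.append(100)          # would resize while a view is active
except BufferError as exc:
    refused = True
    print("refused:", exc)

view[0] = ord("A")            # in-place writes never resize, so they are fine
view.release()                # drop the export

data.append(100)              # resizing works again
print(data)
```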

## 65.13 Writing a Buffer Consumer

A simple consumer that sums bytes:

```c
static PyObject *
sum_bytes(PyObject *self, PyObject *args)
{
    PyObject *obj;
    Py_buffer view;

    if (!PyArg_ParseTuple(args, "O", &obj)) {
        return NULL;
    }

    if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0) {
        return NULL;
    }

    unsigned char *p = (unsigned char *)view.buf;
    Py_ssize_t total = 0;

    for (Py_ssize_t i = 0; i < view.len; i++) {
        total += p[i];
    }

    PyBuffer_Release(&view);

    return PyLong_FromSsize_t(total);
}
```

Python usage:

```python
sum_bytes(b"abc")
sum_bytes(bytearray(b"abc"))
sum_bytes(memoryview(b"abc"))
```

The same C function works with many exporters.
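
For comparison, here is a rough Python-level counterpart of the same consumer. It is a sketch, not a replacement for the C version: it assumes the exporter is contiguous (otherwise `cast` raises), and the multi-byte `array.array` input is added only to show that non-byte exporters are read as raw bytes too.

```python
import array

def sum_bytes(obj):
    """Sum every byte of any contiguous buffer exporter."""
    # Acquire the view, reinterpret it as unsigned bytes, release both on exit.
    with memoryview(obj) as view, view.cast("B") as flat:
        return sum(flat)

print(sum_bytes(b"abc"))
print(sum_bytes(bytearray(b"abc")))
print(sum_bytes(memoryview(b"abc")))
print(sum_bytes(array.array("H", [1, 2])))
```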

## 65.14 Handling Errors in Buffer Consumers

Always release the buffer on every path after acquisition.

```c
static PyObject *
first_byte(PyObject *self, PyObject *args)
{
    PyObject *obj;
    Py_buffer view;

    if (!PyArg_ParseTuple(args, "O", &obj)) {
        return NULL;
    }

    if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0) {
        return NULL;
    }

    if (view.len == 0) {
        PyBuffer_Release(&view);
        PyErr_SetString(PyExc_ValueError, "empty buffer");
        return NULL;
    }

    unsigned char value = ((unsigned char *)view.buf)[0];

    PyBuffer_Release(&view);

    return PyLong_FromUnsignedLong(value);
}
```

The pattern mirrors reference cleanup.

## 65.15 Requiring Contiguous Memory

Some C libraries require a single contiguous memory block.

Ask explicitly:

```c
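/* PyBUF_C_CONTIGUOUS makes the exporter enforce the layout: the request
   fails with BufferError if it cannot provide C-contiguous memory. */
if (PyObject_GetBuffer(obj, &view, PyBUF_C_CONTIGUOUS) < 0) {
    return NULL;
}

/* Defensive check; redundant when the exporter honors the flag. */
if (!PyBuffer_IsContiguous(&view, 'C')) {
    PyBuffer_Release(&view);
    PyErr_SetString(PyExc_BufferError, "expected C-contiguous buffer");
    return NULL;
}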
```

For non-contiguous input, consumers can either reject it or copy it into a contiguous buffer.

Rejecting is simpler. Copying is more flexible.

## 65.16 Copying from Non-Contiguous Buffers

CPython provides helpers for copying buffer data into contiguous storage.

Conceptual pattern:

```c
Py_buffer view;

if (PyObject_GetBuffer(obj, &view, PyBUF_FULL_RO) < 0) {
    return NULL;
}

char *copy = PyMem_Malloc(view.len);
if (copy == NULL) {
    PyBuffer_Release(&view);
    return PyErr_NoMemory();
}

if (PyBuffer_ToContiguous(copy, &view, view.len, 'C') < 0) {
    PyMem_Free(copy);
    PyBuffer_Release(&view);
    return NULL;
}

/* use copy */

PyMem_Free(copy);
PyBuffer_Release(&view);
```

This keeps the C library interface simple while accepting strided inputs.
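
The accept-or-copy decision can also be sketched in Python, where `memoryview.c_contiguous` reports the layout and `tobytes()` packs a strided view into new contiguous storage. `contiguous_bytes` is a hypothetical helper; note that the copy it returns is a flat byte buffer, so shape and format information are not preserved.

```python
def contiguous_bytes(obj):
    """Return (buffer, copied): the original view if C-contiguous, else a copy."""
    view = memoryview(obj)
    if view.c_contiguous:
        return view, False
    # tobytes() walks the strides and packs the items into new contiguous storage.
    packed = view.tobytes()
    view.release()
    return memoryview(packed), True

source = bytearray(b"a_b_c_d_")
direct, copied_direct = contiguous_bytes(source)
packed, copied_packed = contiguous_bytes(memoryview(source)[::2])

print(copied_direct, bytes(direct))
print(copied_packed, bytes(packed))
```

Copying only when needed keeps the common contiguous case free of extra allocations.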

## 65.17 Writing a Buffer Exporter

A custom type exports a buffer by implementing `bf_getbuffer` and `bf_releasebuffer`.

Example object:

```c
typedef struct {
    PyObject_HEAD
    char *data;
    Py_ssize_t len;
    int exports;
} BlobObject;
```

Buffer methods:

```c
static int
Blob_getbuffer(BlobObject *self, Py_buffer *view, int flags)
{
    if (view == NULL) {
        PyErr_SetString(PyExc_BufferError, "NULL view");
        return -1;
    }

    return PyBuffer_FillInfo(
        view,
        (PyObject *)self,
        self->data,
        self->len,
        0,
        flags
    );
}

static void
Blob_releasebuffer(BlobObject *self, Py_buffer *view)
{
    /* optional exporter cleanup */
}
```

Attach through `PyBufferProcs`:

```c
static PyBufferProcs Blob_bufferprocs = {
    .bf_getbuffer = (getbufferproc)Blob_getbuffer,
    .bf_releasebuffer = (releasebufferproc)Blob_releasebuffer,
};
```

Then in the type:

```c
.tp_as_buffer = &Blob_bufferprocs,
```

Now Python can do:

```python
b = Blob(...)
v = memoryview(b)
```

## 65.18 Tracking Active Exports

If an exporter owns resizable memory, it should track active exports.

```c
static int
Blob_getbuffer(BlobObject *self, Py_buffer *view, int flags)
{
    int ret = PyBuffer_FillInfo(
        view,
        (PyObject *)self,
        self->data,
        self->len,
        0,
        flags
    );

    if (ret == 0) {
        self->exports++;
    }

    return ret;
}

static void
Blob_releasebuffer(BlobObject *self, Py_buffer *view)
{
    self->exports--;
}
```

Before resizing:

```c
if (self->exports > 0) {
    PyErr_SetString(
        PyExc_BufferError,
        "cannot resize while buffers are exported"
    );
    return NULL;
}
```

This prevents dangling pointers.

## 65.19 Read-Only Exporters

The `readonly` argument to `PyBuffer_FillInfo` controls writability.

Read-only:

```c
PyBuffer_FillInfo(
    view,
    (PyObject *)self,
    self->data,
    self->len,
    1,
    flags
);
```

Writable:

```c
PyBuffer_FillInfo(
    view,
    (PyObject *)self,
    self->data,
    self->len,
    0,
    flags
);
```

If a consumer requests `PyBUF_WRITABLE` from a read-only exporter, the request fails.

This is how immutable binary objects protect their storage.

## 65.20 Multidimensional Exporters

For arrays, exporters must provide shape, strides, item size, and format.

Example for a 2D `double` matrix:

```text
rows = 3
cols = 4
itemsize = 8
shape = [3, 4]
strides = [32, 8]
format = "d"
```

The exporter must ensure that the shape and stride arrays remain valid while the view exists. They are often stored in the object itself or in exporter-private memory.

A full exporter must handle flag requests correctly. If the consumer asks for shape or format and the exporter cannot provide them, it should fail with `BufferError`.
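
Writing such an exporter requires C, but the resulting metadata can be observed from Python with an existing multidimensional exporter. The sketch below uses a `ctypes` array of arrays as a stand-in for the 3 by 4 `double` matrix above. The exact format string is exporter-specific (ctypes may include a byte-order prefix), so the example prints it rather than relying on it.

```python
import ctypes

rows, cols = 3, 4
# A ctypes array of arrays: 3 rows of 4 C doubles, stored contiguously.
Matrix = (ctypes.c_double * cols) * rows
matrix = Matrix()

view = memoryview(matrix)
info = (view.ndim, view.shape, view.strides, view.itemsize)
fmt = view.format
view.release()

print(info, fmt)
```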

## 65.21 Buffer Protocol and Zero Copy

The buffer protocol enables zero-copy paths.

Example:

```text
socket reads bytes
    ↓
bytearray stores mutable memory
    ↓
memoryview slices without copy
    ↓
parser reads view
    ↓
native extension decodes fields
```

For large data, the cost of copying can dominate run time, so avoiding copies often matters more than any other optimization.

Copying a 1 GB buffer costs both memory bandwidth and allocation overhead. Passing a view costs pointer setup and lifetime tracking.

This is why the protocol matters for:

```text
images
audio
video
tensors
network packets
database pages
compressed blocks
memory-mapped files
```
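
The pipeline from the start of this section can be approximated in pure Python. In this sketch an `io.BytesIO` object stands in for the socket, and the length-prefixed record layout is a hypothetical wire format invented for the example. The payloads are never copied until they are printed.

```python
import io
import struct

# Stand-in for a socket: two length-prefixed records (hypothetical wire format).
wire = io.BytesIO(struct.pack("!H3s", 3, b"abc") + struct.pack("!H2s", 2, b"hi"))

buffer = bytearray(64)
received = wire.readinto(buffer)          # fills the bytearray in place
window = memoryview(buffer)[:received]    # slice without copying

records = []
offset = 0
while offset < len(window):
    (size,) = struct.unpack_from("!H", window, offset)   # parse header in place
    offset += 2
    records.append(window[offset:offset + size])         # payload is still a view
    offset += size

print([bytes(r) for r in records])

# The payload views alias the original bytearray: a write is visible through them.
buffer[2] = ord("A")
print(bytes(records[0]))
```

The final write shows that each record is a window onto the receive buffer, not a separate copy.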

## 65.22 Buffer Protocol and the GIL

Acquiring and releasing buffers touches Python objects, so it normally requires the GIL.

But after acquiring a stable buffer, native code may release the GIL while processing raw memory, if it does not call Python APIs and the exporter guarantees valid storage.

Pattern:

```c
if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0) {
    return NULL;
}

Py_BEGIN_ALLOW_THREADS

process_raw_memory(view.buf, view.len);

Py_END_ALLOW_THREADS

PyBuffer_Release(&view);
```

This allows CPU-bound native processing to run without blocking other Python threads from acquiring the GIL.
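
From Python, the effect is easiest to observe through a library that follows this pattern. CPython's `hashlib` documentation states that the GIL is released while hashing larger inputs, so the sketch below hashes several zero-copy slices of one buffer from a thread pool. It only demonstrates the structure and checks that the results agree; it does not measure any speedup, and the chunk size is an arbitrary choice.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Four 1 MiB chunks carved out of one bytearray without copying.
chunk_size = 1 << 20
data = bytearray(4 * chunk_size)
whole = memoryview(data)
chunks = [whole[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def digest(view):
    # hashlib reads the view through the buffer protocol.
    return hashlib.sha256(view).hexdigest()

with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(digest, chunks))

serial = [digest(chunk) for chunk in chunks]
print(parallel == serial)
```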

## 65.23 Buffer Protocol vs Sequence Protocol

The sequence protocol exposes elements as Python objects.

```python
x = obj[i]
```

The buffer protocol exposes raw memory.

```text
buf + offset
```

Comparison:

| Feature | Sequence protocol | Buffer protocol |
|---|---|---|
| Access level | Python objects | Raw memory |
| Copy-free binary access | No | Yes |
| Type metadata | Python-level | Format string |
| Multidimensional layout | Indirect | Shape and strides |
| Use case | General containers | Binary and numerical data |

A list of integers does not normally expose a buffer because its memory contains pointers to Python objects, not raw integer values.

An `array.array("i")` can expose a buffer because its memory stores raw C integers.
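
A short sketch makes the difference concrete: `memoryview` refuses a list but accepts an `array.array` holding the same values. The `list_exports` flag is only there to record the outcome.

```python
import array

numbers = [1, 2, 3]
try:
    memoryview(numbers)
    list_exports = True
except TypeError:
    list_exports = False

packed = array.array("i", numbers)
view = memoryview(packed)

print(list_exports)                       # lists hold object pointers, no buffer
print(view.format, view.itemsize, view.nbytes == len(numbers) * view.itemsize)
print(numbers[0], view[0])                # same value via sequence and via buffer
```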

## 65.24 Buffer Protocol vs C API Type-Specific Access

Some APIs expose direct access to specific object types.

Example:

```c
char *p = PyBytes_AS_STRING(obj);
Py_ssize_t n = PyBytes_GET_SIZE(obj);
```

This works only for `bytes`.

The buffer protocol works across many exporters.

| Approach | Scope |
|---|---|
| `PyBytes_AS_STRING` | `bytes` only |
| `PyByteArray_AS_STRING` | `bytearray` only |
| `PyObject_GetBuffer` | Any buffer exporter |

Prefer the buffer protocol when accepting generic binary data.

## 65.25 Common Buffer Bugs

| Bug | Cause |
|---|---|
| Missing `PyBuffer_Release` | Exporter stays pinned |
| Writing to read-only memory | Missing writable check |
| Assuming contiguity | Ignoring strides |
| Assuming element type | Ignoring `format` |
| Resizing during export | Invalidates active views |
| Releasing GIL too early | Python API calls without GIL |
| Returning pointer after release | Dangling pointer |
| Storing `view.buf` long-term | Buffer lifetime violation |

The safest rule: treat `view.buf` as valid only between successful `PyObject_GetBuffer` and matching `PyBuffer_Release`.

## 65.26 Practical Design Guidelines

For consumers:

| Need | Request |
|---|---|
| Raw bytes only | `PyBUF_SIMPLE` |
| Must write | `PyBUF_WRITABLE` |
| Must know type | `PyBUF_FORMAT` |
| Must support arrays | `PyBUF_ND \| PyBUF_STRIDES \| PyBUF_FORMAT` |
| Must call C library | Require contiguity or copy |

For exporters:

| Requirement | Rule |
|---|---|
| Memory can resize | Track active exports |
| Object stores references | Add GC support separately |
| Memory read-only | Mark buffer read-only |
| Multidimensional data | Provide stable shape and strides |
| Custom allocation | Keep storage valid until release |

## 65.27 Chapter Summary

The buffer protocol is CPython’s common interface for exposing raw memory. It lets objects such as `bytes`, `bytearray`, `memoryview`, `array.array`, `mmap.mmap`, numerical arrays, and custom extension types share memory with native code without copying.

Consumers acquire a `Py_buffer`, use the memory and metadata, then release it. Exporters fill the view and keep memory valid while it is exported.

The protocol supports read-only memory, writable memory, contiguous buffers, multidimensional arrays, typed elements, strides, and zero-copy slicing. It is central to CPython’s performance story for binary data and native interoperability.
