# 15. Lists, Tuples, and Arrays

# 15. Lists, Tuples, and Arrays

Lists, tuples, and array-like objects represent ordered collections. They all support indexed access, but they have different storage models, mutability rules, and performance tradeoffs.

A list is a mutable sequence of object references.

A tuple is an immutable sequence of object references.

An array-like object stores compact typed data or exposes a contiguous buffer.

These distinctions matter because CPython containers store references unless the type is specifically designed for raw storage.

## 15.1 Ordered Collections

Python has several ordered collection types.

| Type          |        Mutable | Stores            | Main use                        |
| ------------- | -------------: | ----------------- | ------------------------------- |
| `list`        |            Yes | Object references | General mutable sequence        |
| `tuple`       |             No | Object references | Fixed records, immutable groups |
| `array.array` |            Yes | Raw typed values  | Compact numeric storage         |
| `bytes`       |             No | Raw bytes         | Immutable binary data           |
| `bytearray`   |            Yes | Raw bytes         | Mutable binary data             |
| `memoryview`  | View-dependent | Raw buffer view   | Zero-copy buffer access         |

A list and a tuple can store objects of any type:

```python
xs = [1, "two", object()]
t = (1, "two", object())
```

An array stores values of one machine-level type:

```python
from array import array

nums = array("i", [1, 2, 3])
```

## 15.2 Lists Store References

A CPython list stores pointers to Python objects.

Conceptually:

```text
PyListObject
    PyVarObject header
        ob_size = logical length
    ob_item ----> array of PyObject *
    allocated = capacity
```

For:

```python
xs = [10, 20, 30]
```

the layout is:

```text
list object
    ob_size = 3
    allocated >= 3
    ob_item ----+
                |
                v
              [ptr][ptr][ptr]
                |    |    |
                v    v    v
               10   20   30
```

The list does not inline the integer payloads. It stores references to integer objects.

This is why a list can hold mixed types:

```python
xs = [1, "hello", None, []]
```

Every element slot has the same representation: a `PyObject *`.

## 15.3 List Length and Capacity

A list has a logical length and an allocated capacity.

```text
length
    number of active elements

capacity
    number of slots currently allocated in the internal array
```

Example:

```python
xs = []
xs.append("a")
xs.append("b")
xs.append("c")
```

The logical length is 3. The allocated capacity may be greater than 3.

This spare capacity allows efficient append. CPython does not normally resize the internal array on every append. It grows the array in larger steps.

Conceptually:

```text
append item
    if length < allocated:
        store item in spare slot
        length += 1
    else:
        allocate larger array
        copy references
        store item
        length += 1
```

The object identity of the list remains stable. The internal element array may move.

```python
xs = []
before = id(xs)

for i in range(1000):
    xs.append(i)

after = id(xs)
print(before == after)   # True
```

## 15.4 List Append

`list.append` adds one object reference to the end.

```python
xs = []
xs.append(1)
xs.append(2)
```

At the C level, append must:

```text
ensure there is enough capacity
increment the appended object's reference count
store the object pointer
increase the logical size
```

Appending an object does not copy it.

```python
item = []
xs = []
xs.append(item)

item.append(1)
print(xs)        # [[1]]
```

The list stores a reference to `item`.

## 15.5 List Over-Allocation

List over-allocation is the reason repeated append is efficient.

Without over-allocation, this loop would copy the whole array on many iterations:

```python
xs = []
for i in range(1_000_000):
    xs.append(i)
```

With over-allocation, CPython grows capacity by more than one slot at a time. The exact formula is an implementation detail, but the behavior gives amortized constant-time append.

Important result:

| Operation          |     Average cost |
| ------------------ | ---------------: |
| `append`           | `O(1)` amortized |
| `pop()` from end   |           `O(1)` |
| random indexing    |           `O(1)` |
| insert at front    |           `O(n)` |
| delete from middle |           `O(n)` |

Appending is cheap because it touches the end. Inserting at the front is expensive because many references must move.

## 15.6 List Indexing

List indexing is direct array access.

```python
x = xs[i]
```

Conceptually:

```text
check bounds
read ob_item[i]
return referenced object
```

This gives `O(1)` access.

Negative indexes are translated:

```python
xs[-1]
```

means:

```text
xs[len(xs) - 1]
```

after bounds handling.

Indexing returns a reference to the contained object. It does not copy that object.

```python
xs = [[1], [2]]
a = xs[0]
a.append(99)

print(xs)        # [[1, 99], [2]]
```

## 15.7 List Assignment

List item assignment replaces one stored reference.

```python
xs = [1, 2, 3]
xs[1] = "two"
```

At the C level, replacement must handle reference counts safely:

```text
increment new item
store new item pointer
decrement old item
```

The safe order matters when the new object and old object could be the same object.

Conceptually:

```c
Py_INCREF(new_item);
old_item = items[index];
items[index] = new_item;
Py_DECREF(old_item);
```

This preserves ownership invariants even if finalizers run during `Py_DECREF`.

## 15.8 List Slicing

Slicing creates a new list.

```python
xs = [[1], [2], [3]]
ys = xs[0:2]
```

The new list contains references to the same element objects.

```python
ys[0].append(99)

print(xs)    # [[1, 99], [2], [3]]
print(ys)    # [[1, 99], [2]]
```

This is a shallow copy.

The outer list is new. The contained objects are shared.

```text
xs ----> [ptr A][ptr B][ptr C]
ys ----> [ptr A][ptr B]
```

A deep copy requires recursive copying:

```python
import copy

ys = copy.deepcopy(xs)
```

## 15.9 List Insertion and Deletion

Inserting into the middle moves later references.

```python
xs = [1, 2, 3, 4]
xs.insert(1, "x")
```

Conceptually:

```text
[1][2][3][4]
insert at index 1
[1]["x"][2][3][4]
```

Elements from index 1 onward shift right.

Deleting from the middle shifts left:

```python
del xs[1]
```

These operations are `O(n)` because they may move many pointers.

The objects themselves are not copied. CPython moves references inside the internal array.

## 15.10 List Sorting

`list.sort` sorts the list in place.

```python
xs = [3, 1, 2]
xs.sort()
```

`sorted(xs)` creates a new list:

```python
ys = sorted(xs)
```

CPython uses Timsort for list sorting. It is stable, meaning equal keys retain their relative order.

```python
items = [
    ("a", 2),
    ("b", 1),
    ("c", 2),
]

items.sort(key=lambda x: x[1])
print(items)    # [('b', 1), ('a', 2), ('c', 2)]
```

Sorting rearranges references inside the list. It does not copy the pointed-to objects.

## 15.11 Tuples Store References Inline

A tuple is an immutable sequence.

Conceptually:

```text
PyTupleObject
    PyVarObject header
        ob_size = length
    ob_item[0]
    ob_item[1]
    ...
```

For:

```python
t = (10, 20, 30)
```

the layout is:

```text
tuple object
    ob_size = 3
    ob_item[0] ---> 10
    ob_item[1] ---> 20
    ob_item[2] ---> 30
```

Unlike list, tuple items are stored inline as part of the tuple allocation. There is no separate resizable array.

This is possible because tuple length cannot change after creation.

## 15.12 Tuple Immutability

Tuple immutability means the tuple’s item references cannot be changed.

```python
t = (1, 2, 3)
t[0] = 99        # TypeError
```

The tuple object still stores references. If one referenced object is mutable, that object can change.

```python
t = ([],)
t[0].append("x")

print(t)         # (['x'],)
```

The tuple still points to the same list. The list changed.

This distinction matters for hashing.

```python
hash((1, 2, 3))      # works
hash(([],))          # TypeError
```

A tuple is hashable only if all elements are hashable.

## 15.13 Tuple Creation

Tuple literals are common:

```python
t = (1, 2, 3)
```

Parentheses alone do not create a tuple. The comma does.

```python
a = (1)
b = (1,)

print(type(a))   # int
print(type(b))   # tuple
```

Tuples are heavily used internally for:

```text
function arguments
multiple return values
constant groups
dictionary keys
bytecode constants
C API argument packing
```

Because tuples are immutable and compact, they are natural containers for fixed groups.

## 15.14 Tuple Packing and Unpacking

Python uses tuples for packing multiple values.

```python
point = 10, 20
```

This creates a tuple.

Unpacking extracts elements:

```python
x, y = point
```

At the bytecode level, packing and unpacking have dedicated instructions because they are common.

Extended unpacking:

```python
first, *middle, last = [1, 2, 3, 4, 5]
```

creates a list for the starred target.

```python
print(middle)    # [2, 3, 4]
```

## 15.15 Lists vs Tuples

Lists and tuples have similar indexing behavior but different design goals.

| Feature    | List                     | Tuple                    |
| ---------- | ------------------------ | ------------------------ |
| Mutability | Mutable                  | Immutable                |
| Length     | Can change               | Fixed                    |
| Storage    | Separate resizable array | Inline item references   |
| Append     | Yes                      | No                       |
| Hashable   | No                       | If elements are hashable |
| Main use   | Mutable sequence         | Fixed group or record    |
| Syntax     | `[1, 2]`                 | `(1, 2)`                 |

Use a list when the collection changes.

Use a tuple when the collection represents a fixed grouping.

## 15.16 Arrays Store Raw Values

`array.array` stores compact typed values, not Python object references.

```python
from array import array

xs = array("i", [1, 2, 3])
```

The type code controls the C-level representation.

Examples:

| Type code | Meaning        |
| --------- | -------------- |
| `"b"`     | signed char    |
| `"B"`     | unsigned char  |
| `"h"`     | signed short   |
| `"H"`     | unsigned short |
| `"i"`     | signed int     |
| `"I"`     | unsigned int   |
| `"l"`     | signed long    |
| `"L"`     | unsigned long  |
| `"f"`     | float          |
| `"d"`     | double         |

An array of integers is much more compact than a list of Python integer objects.

Conceptually:

```text
list of ints
    [ptr][ptr][ptr]
      |    |    |
      v    v    v
     int  int  int

array("i")
    [raw int][raw int][raw int]
```

## 15.17 Array Tradeoffs

An array is compact but less general than a list.

| Feature               | List                                        | `array.array`                |
| --------------------- | ------------------------------------------- | ---------------------------- |
| Element types         | Any Python object                           | One primitive type           |
| Memory use            | Higher                                      | Lower                        |
| Numeric compactness   | Poor                                        | Good                         |
| Python object methods | Full object references                      | Values converted at boundary |
| Heterogeneous data    | Yes                                         | No                           |
| Buffer protocol       | No direct raw numeric buffer use like array | Yes                          |

Arrays are useful for binary I/O, compact numeric storage, and buffer-compatible data.

For heavy numerical computing, third-party arrays such as NumPy arrays are usually more powerful.

## 15.18 Bytes as an Array of Bytes

`bytes` is an immutable sequence of bytes.

```python
b = b"abc"

print(b[0])     # 97
```

It behaves sequence-like, but stores raw bytes.

`bytearray` is the mutable version:

```python
buf = bytearray(b"abc")
buf[0] = 65

print(buf)      # bytearray(b'Abc')
```

For binary data, `bytes` and `bytearray` are more compact than `list[int]`.

```python
data = [97, 98, 99]     # list of Python int objects
raw = b"abc"            # compact bytes
```

## 15.19 Memoryview and Zero-Copy Slicing

A `memoryview` can expose a view into an existing buffer.

```python
buf = bytearray(b"abcdef")
view = memoryview(buf)[2:5]

print(view.tobytes())   # b'cde'
```

The view does not copy the underlying bytes.

Mutating through a writable view affects the exporter:

```python
buf = bytearray(b"abcdef")
view = memoryview(buf)

view[0] = ord("X")

print(buf)              # bytearray(b'Xbcdef')
```

This is useful for binary parsers, network protocols, file formats, and extension modules.

## 15.20 Sequence Protocol

Lists, tuples, bytes, bytearrays, ranges, and arrays participate in sequence behavior.

Common operations:

```python
len(xs)
xs[i]
xs[i:j]
x in xs
for x in xs:
    ...
```

The type decides how to implement those operations.

At the C level, sequence behavior is exposed through sequence slots such as:

```text
sq_length
sq_item
sq_ass_item
sq_contains
sq_concat
sq_repeat
```

Immutable sequences omit assignment slots.

Mutable sequences provide assignment behavior.

## 15.21 Iteration

A list iterator stores a reference to the list and a current index.

```python
xs = [1, 2, 3]
it = iter(xs)

print(next(it))    # 1
```

Conceptually:

```text
list iterator
    sequence reference
    current index
```

Iteration over a tuple is similar.

Iteration over a list sees the list object, not a snapshot copy.

```python
xs = [1, 2, 3]

for x in xs:
    if x == 2:
        xs.append(4)
```

Mutating a list during iteration can produce confusing behavior. Avoid it unless the behavior is intentionally controlled.

## 15.22 Membership Testing

For a list or tuple:

```python
x in xs
```

performs a linear scan.

Conceptually:

```text
for each item:
    compare item == x
```

This is `O(n)`.

For membership-heavy workloads, use a set or dict:

```python
seen = set(xs)

if x in seen:
    ...
```

A set uses hashing and usually gives average `O(1)` membership.

## 15.23 Copying

List copy methods are shallow:

```python
a = [[1], [2]]

b = a.copy()
c = list(a)
d = a[:]
```

All three create a new outer list. Inner objects are shared.

```python
b[0].append(99)

print(a)    # [[1, 99], [2]]
```

Tuple copying usually returns the same object when no real copy is needed:

```python
t = (1, 2, 3)
u = tuple(t)

print(t is u)    # True
```

Because tuples are immutable, this reuse is safe.

## 15.24 Repetition

Sequence repetition repeats references.

```python
xs = [[]] * 3
xs[0].append(1)

print(xs)        # [[1], [1], [1]]
```

All three slots point to the same inner list.

Correct pattern:

```python
xs = [[] for _ in range(3)]
```

Now each element is a different list.

This is a reference model issue, not a list multiplication bug.

## 15.25 Sorting Lists of Objects

Sorting uses comparisons or a key function.

```python
items = ["aaa", "b", "cc"]
items.sort(key=len)
```

With `key`, CPython computes keys and sorts using those keys.

This avoids repeated recomputation of expensive comparison logic.

Stable sorting allows multi-pass sorting:

```python
items.sort(key=lambda x: x.name)
items.sort(key=lambda x: x.group)
```

After the second sort, items with equal group retain name order from the first sort.

## 15.26 List Comprehensions

A list comprehension builds a list directly.

```python
squares = [x * x for x in range(10)]
```

It is usually faster and clearer than manual append loops for simple transformations.

Equivalent shape:

```python
squares = []
for x in range(10):
    squares.append(x * x)
```

The comprehension has dedicated bytecode patterns and keeps the loop body compact.

## 15.27 Generator Expressions vs Lists

A list comprehension creates all elements immediately.

```python
xs = [x * x for x in range(1_000_000)]
```

A generator expression produces values lazily.

```python
it = (x * x for x in range(1_000_000))
```

Use a list when you need indexing, repeated traversal, or materialized data.

Use a generator when you need one pass and want to avoid storing all results.

## 15.28 Common Performance Patterns

| Task                     | Better choice          | Reason                     |
| ------------------------ | ---------------------- | -------------------------- |
| Append many items        | `list.append`          | Amortized `O(1)`           |
| Build from iterable      | `list(iterable)`       | Efficient constructor path |
| Fixed record             | `tuple`                | Immutable and compact      |
| Membership test          | `set`                  | Hash-based lookup          |
| Queue from left          | `collections.deque`    | Efficient left pops        |
| Numeric compact storage  | `array.array` or NumPy | Raw typed storage          |
| Binary data              | `bytes` or `bytearray` | Compact byte storage       |
| Zero-copy binary slicing | `memoryview`           | Avoids allocation          |
| Many concatenations      | `join`                 | Avoids repeated copying    |

Lists are general-purpose. They are not optimal for every ordered workload.

## 15.29 Common Mistakes

A few common mistakes come directly from CPython’s reference-based container model.

Shared inner lists:

```python
matrix = [[0] * 3] * 3
```

Better:

```python
matrix = [[0] * 3 for _ in range(3)]
```

Using list membership for large lookup sets:

```python
if user_id in user_ids_list:
    ...
```

Better:

```python
user_ids = set(user_ids_list)
if user_id in user_ids:
    ...
```

Repeated front deletion:

```python
while xs:
    xs.pop(0)
```

Better:

```python
from collections import deque

q = deque(xs)
while q:
    q.popleft()
```

Repeated string or bytes concatenation:

```python
s = ""
for part in parts:
    s += part
```

Better:

```python
s = "".join(parts)
```

## 15.30 C API Sketch

List creation:

```c
PyObject *list = PyList_New(0);
if (list == NULL) {
    return NULL;
}
```

Append:

```c
PyObject *item = PyLong_FromLong(42);
if (item == NULL) {
    Py_DECREF(list);
    return NULL;
}

if (PyList_Append(list, item) < 0) {
    Py_DECREF(item);
    Py_DECREF(list);
    return NULL;
}

Py_DECREF(item);
```

`PyList_Append` increments the item reference internally. The local owned reference must still be released.

Tuple creation:

```c
PyObject *tuple = PyTuple_New(1);
if (tuple == NULL) {
    return NULL;
}

PyObject *item = PyLong_FromLong(42);
if (item == NULL) {
    Py_DECREF(tuple);
    return NULL;
}

PyTuple_SET_ITEM(tuple, 0, item);
```

`PyTuple_SET_ITEM` steals the reference. Do not `Py_DECREF(item)` after successful use.

This difference is one of the classic C API ownership traps.

## 15.31 Mental Model

Use this model:

```text
list
    mutable sequence
    separate resizable array of PyObject *
    over-allocates for efficient append

tuple
    immutable sequence
    inline fixed array of PyObject *
    compact for fixed groups

array
    mutable typed sequence
    raw value storage
    compact but homogeneous

bytes
    immutable raw byte sequence

bytearray
    mutable raw byte sequence

memoryview
    zero-copy view over buffer-exporting object
```

For lists and tuples, remember that elements are references. Copying the container does not copy the objects inside it.

## 15.32 Summary

Lists and tuples are reference containers. A list is mutable and uses a separately allocated resizable array. A tuple is immutable and stores item references inline in a fixed-size allocation.

Arrays and bytes-like objects store compact raw data rather than references to arbitrary Python objects. They are better suited for binary data and typed numeric storage.

Understanding these layouts explains most performance and correctness behavior: append efficiency, front insertion cost, shallow copies, repeated-reference pitfalls, tuple immutability, buffer views, and the difference between general object containers and compact raw storage.