Skip to content

15. Lists, Tuples, and Arrays

PyListObject dynamic resizing strategy, PyTupleObject immutability, and the array module's typed buffer.

Lists, tuples, and array-like objects represent ordered collections. They all support indexed access, but they have different storage models, mutability rules, and performance tradeoffs.

A list is a mutable sequence of object references.

A tuple is an immutable sequence of object references.

An array-like object stores compact typed data or exposes a contiguous buffer.

These distinctions matter because CPython containers store references unless the type is specifically designed for raw storage.

15.1 Ordered Collections

Python has several ordered collection types.

TypeMutableStoresMain use
listYesObject referencesGeneral mutable sequence
tupleNoObject referencesFixed records, immutable groups
array.arrayYesRaw typed valuesCompact numeric storage
bytesNoRaw bytesImmutable binary data
bytearrayYesRaw bytesMutable binary data
memoryviewView-dependentRaw buffer viewZero-copy buffer access

A list and a tuple can store objects of any type:

xs = [1, "two", object()]
t = (1, "two", object())

An array stores values of one machine-level type:

from array import array

nums = array("i", [1, 2, 3])

15.2 Lists Store References

A CPython list stores pointers to Python objects.

Conceptually:

PyListObject
    PyVarObject header
        ob_size = logical length
    ob_item ----> array of PyObject *
    allocated = capacity

For:

xs = [10, 20, 30]

the layout is:

list object
    ob_size = 3
    allocated >= 3
    ob_item ----+
                |
                v
              [ptr][ptr][ptr]
                |    |    |
                v    v    v
               10   20   30

The list does not inline the integer payloads. It stores references to integer objects.

This is why a list can hold mixed types:

xs = [1, "hello", None, []]

Every element slot has the same representation: a PyObject *.

15.3 List Length and Capacity

A list has a logical length and an allocated capacity.

length
    number of active elements

capacity
    number of slots currently allocated in the internal array

Example:

xs = []
xs.append("a")
xs.append("b")
xs.append("c")

The logical length is 3. The allocated capacity may be greater than 3.

This spare capacity allows efficient append. CPython does not normally resize the internal array on every append. It grows the array in larger steps.

Conceptually:

append item
    if length < allocated:
        store item in spare slot
        length += 1
    else:
        allocate larger array
        copy references
        store item
        length += 1

The object identity of the list remains stable. The internal element array may move.

xs = []
before = id(xs)

for i in range(1000):
    xs.append(i)

after = id(xs)
print(before == after)   # True

15.4 List Append

list.append adds one object reference to the end.

xs = []
xs.append(1)
xs.append(2)

At the C level, append must:

ensure there is enough capacity
increment the appended object's reference count
store the object pointer
increase the logical size

Appending an object does not copy it.

item = []
xs = []
xs.append(item)

item.append(1)
print(xs)        # [[1]]

The list stores a reference to item.

15.5 List Over-Allocation

List over-allocation is the reason repeated append is efficient.

Without over-allocation, this loop would copy the whole array on many iterations:

xs = []
for i in range(1_000_000):
    xs.append(i)

With over-allocation, CPython grows capacity by more than one slot at a time. The exact formula is an implementation detail, but the behavior gives amortized constant-time append.

Important result:

OperationAverage cost
appendO(1) amortized
pop() from endO(1)
random indexingO(1)
insert at frontO(n)
delete from middleO(n)

Appending is cheap because it touches the end. Inserting at the front is expensive because many references must move.

15.6 List Indexing

List indexing is direct array access.

x = xs[i]

Conceptually:

check bounds
read ob_item[i]
return referenced object

This gives O(1) access.

Negative indexes are translated:

xs[-1]

means:

xs[len(xs) - 1]

after bounds handling.

Indexing returns a reference to the contained object. It does not copy that object.

xs = [[1], [2]]
a = xs[0]
a.append(99)

print(xs)        # [[1, 99], [2]]

15.7 List Assignment

List item assignment replaces one stored reference.

xs = [1, 2, 3]
xs[1] = "two"

At the C level, replacement must handle reference counts safely:

increment new item
store new item pointer
decrement old item

The safe order matters when the new object and old object could be the same object.

Conceptually:

Py_INCREF(new_item);
old_item = items[index];
items[index] = new_item;
Py_DECREF(old_item);

This preserves ownership invariants even if finalizers run during Py_DECREF.

15.8 List Slicing

Slicing creates a new list.

xs = [[1], [2], [3]]
ys = xs[0:2]

The new list contains references to the same element objects.

ys[0].append(99)

print(xs)    # [[1, 99], [2], [3]]
print(ys)    # [[1, 99], [2]]

This is a shallow copy.

The outer list is new. The contained objects are shared.

xs ----> [ptr A][ptr B][ptr C]
ys ----> [ptr A][ptr B]

A deep copy requires recursive copying:

import copy

ys = copy.deepcopy(xs)

15.9 List Insertion and Deletion

Inserting into the middle moves later references.

xs = [1, 2, 3, 4]
xs.insert(1, "x")

Conceptually:

[1][2][3][4]
insert at index 1
[1]["x"][2][3][4]

Elements from index 1 onward shift right.

Deleting from the middle shifts left:

del xs[1]

These operations are O(n) because they may move many pointers.

The objects themselves are not copied. CPython moves references inside the internal array.

15.10 List Sorting

list.sort sorts the list in place.

xs = [3, 1, 2]
xs.sort()

sorted(xs) creates a new list:

ys = sorted(xs)

CPython uses Timsort for list sorting. It is stable, meaning equal keys retain their relative order.

items = [
    ("a", 2),
    ("b", 1),
    ("c", 2),
]

items.sort(key=lambda x: x[1])
print(items)    # [('b', 1), ('a', 2), ('c', 2)]

Sorting rearranges references inside the list. It does not copy the pointed-to objects.

15.11 Tuples Store References Inline

A tuple is an immutable sequence.

Conceptually:

PyTupleObject
    PyVarObject header
        ob_size = length
    ob_item[0]
    ob_item[1]
    ...

For:

t = (10, 20, 30)

the layout is:

tuple object
    ob_size = 3
    ob_item[0] ---> 10
    ob_item[1] ---> 20
    ob_item[2] ---> 30

Unlike list, tuple items are stored inline as part of the tuple allocation. There is no separate resizable array.

This is possible because tuple length cannot change after creation.

15.12 Tuple Immutability

Tuple immutability means the tuple’s item references cannot be changed.

t = (1, 2, 3)
t[0] = 99        # TypeError

The tuple object still stores references. If one referenced object is mutable, that object can change.

t = ([],)
t[0].append("x")

print(t)         # (['x'],)

The tuple still points to the same list. The list changed.

This distinction matters for hashing.

hash((1, 2, 3))      # works
hash(([],))          # TypeError

A tuple is hashable only if all elements are hashable.

15.13 Tuple Creation

Tuple literals are common:

t = (1, 2, 3)

Parentheses alone do not create a tuple. The comma does.

a = (1)
b = (1,)

print(type(a))   # int
print(type(b))   # tuple

Tuples are heavily used internally for:

function arguments
multiple return values
constant groups
dictionary keys
bytecode constants
C API argument packing

Because tuples are immutable and compact, they are natural containers for fixed groups.

15.14 Tuple Packing and Unpacking

Python uses tuples for packing multiple values.

point = 10, 20

This creates a tuple.

Unpacking extracts elements:

x, y = point

At the bytecode level, packing and unpacking have dedicated instructions because they are common.

Extended unpacking:

first, *middle, last = [1, 2, 3, 4, 5]

creates a list for the starred target.

print(middle)    # [2, 3, 4]

15.15 Lists vs Tuples

Lists and tuples have similar indexing behavior but different design goals.

FeatureListTuple
MutabilityMutableImmutable
LengthCan changeFixed
StorageSeparate resizable arrayInline item references
AppendYesNo
HashableNoIf elements are hashable
Main useMutable sequenceFixed group or record
Syntax[1, 2](1, 2)

Use a list when the collection changes.

Use a tuple when the collection represents a fixed grouping.

15.16 Arrays Store Raw Values

array.array stores compact typed values, not Python object references.

from array import array

xs = array("i", [1, 2, 3])

The type code controls the C-level representation.

Examples:

Type codeMeaning
"b"signed char
"B"unsigned char
"h"signed short
"H"unsigned short
"i"signed int
"I"unsigned int
"l"signed long
"L"unsigned long
"f"float
"d"double

An array of integers is much more compact than a list of Python integer objects.

Conceptually:

list of ints
    [ptr][ptr][ptr]
      |    |    |
      v    v    v
     int  int  int

array("i")
    [raw int][raw int][raw int]

15.17 Array Tradeoffs

An array is compact but less general than a list.

FeatureListarray.array
Element typesAny Python objectOne primitive type
Memory useHigherLower
Numeric compactnessPoorGood
Python object methodsFull object referencesValues converted at boundary
Heterogeneous dataYesNo
Buffer protocolNo direct raw numeric buffer use like arrayYes

Arrays are useful for binary I/O, compact numeric storage, and buffer-compatible data.

For heavy numerical computing, third-party arrays such as NumPy arrays are usually more powerful.

15.18 Bytes as an Array of Bytes

bytes is an immutable sequence of bytes.

b = b"abc"

print(b[0])     # 97

It behaves sequence-like, but stores raw bytes.

bytearray is the mutable version:

buf = bytearray(b"abc")
buf[0] = 65

print(buf)      # bytearray(b'Abc')

For binary data, bytes and bytearray are more compact than list[int].

data = [97, 98, 99]     # list of Python int objects
raw = b"abc"            # compact bytes

15.19 Memoryview and Zero-Copy Slicing

A memoryview can expose a view into an existing buffer.

buf = bytearray(b"abcdef")
view = memoryview(buf)[2:5]

print(view.tobytes())   # b'cde'

The view does not copy the underlying bytes.

Mutating through a writable view affects the exporter:

buf = bytearray(b"abcdef")
view = memoryview(buf)

view[0] = ord("X")

print(buf)              # bytearray(b'Xbcdef')

This is useful for binary parsers, network protocols, file formats, and extension modules.

15.20 Sequence Protocol

Lists, tuples, bytes, bytearrays, ranges, and arrays participate in sequence behavior.

Common operations:

len(xs)
xs[i]
xs[i:j]
x in xs
for x in xs:
    ...

The type decides how to implement those operations.

At the C level, sequence behavior is exposed through sequence slots such as:

sq_length
sq_item
sq_ass_item
sq_contains
sq_concat
sq_repeat

Immutable sequences omit assignment slots.

Mutable sequences provide assignment behavior.

15.21 Iteration

A list iterator stores a reference to the list and a current index.

xs = [1, 2, 3]
it = iter(xs)

print(next(it))    # 1

Conceptually:

list iterator
    sequence reference
    current index

Iteration over a tuple is similar.

Iteration over a list sees the list object, not a snapshot copy.

xs = [1, 2, 3]

for x in xs:
    if x == 2:
        xs.append(4)

Mutating a list during iteration can produce confusing behavior. Avoid it unless the behavior is intentionally controlled.

15.22 Membership Testing

For a list or tuple:

x in xs

performs a linear scan.

Conceptually:

for each item:
    compare item == x

This is O(n).

For membership-heavy workloads, use a set or dict:

seen = set(xs)

if x in seen:
    ...

A set uses hashing and usually gives average O(1) membership.

15.23 Copying

List copy methods are shallow:

a = [[1], [2]]

b = a.copy()
c = list(a)
d = a[:]

All three create a new outer list. Inner objects are shared.

b[0].append(99)

print(a)    # [[1, 99], [2]]

Tuple copying usually returns the same object when no real copy is needed:

t = (1, 2, 3)
u = tuple(t)

print(t is u)    # True

Because tuples are immutable, this reuse is safe.

15.24 Repetition

Sequence repetition repeats references.

xs = [[]] * 3
xs[0].append(1)

print(xs)        # [[1], [1], [1]]

All three slots point to the same inner list.

Correct pattern:

xs = [[] for _ in range(3)]

Now each element is a different list.

This is a reference model issue, not a list multiplication bug.

15.25 Sorting Lists of Objects

Sorting uses comparisons or a key function.

items = ["aaa", "b", "cc"]
items.sort(key=len)

With key, CPython computes keys and sorts using those keys.

This avoids repeated recomputation of expensive comparison logic.

Stable sorting allows multi-pass sorting:

items.sort(key=lambda x: x.name)
items.sort(key=lambda x: x.group)

After the second sort, items with equal group retain name order from the first sort.

15.26 List Comprehensions

A list comprehension builds a list directly.

squares = [x * x for x in range(10)]

It is usually faster and clearer than manual append loops for simple transformations.

Equivalent shape:

squares = []
for x in range(10):
    squares.append(x * x)

The comprehension has dedicated bytecode patterns and keeps the loop body compact.

15.27 Generator Expressions vs Lists

A list comprehension creates all elements immediately.

xs = [x * x for x in range(1_000_000)]

A generator expression produces values lazily.

it = (x * x for x in range(1_000_000))

Use a list when you need indexing, repeated traversal, or materialized data.

Use a generator when you need one pass and want to avoid storing all results.

15.28 Common Performance Patterns

TaskBetter choiceReason
Append many itemslist.appendAmortized O(1)
Build from iterablelist(iterable)Efficient constructor path
Fixed recordtupleImmutable and compact
Membership testsetHash-based lookup
Queue from leftcollections.dequeEfficient left pops
Numeric compact storagearray.array or NumPyRaw typed storage
Binary databytes or bytearrayCompact byte storage
Zero-copy binary slicingmemoryviewAvoids allocation
Many concatenationsjoinAvoids repeated copying

Lists are general-purpose. They are not optimal for every ordered workload.

15.29 Common Mistakes

A few common mistakes come directly from CPython’s reference-based container model.

Shared inner lists:

matrix = [[0] * 3] * 3

Better:

matrix = [[0] * 3 for _ in range(3)]

Using list membership for large lookup sets:

if user_id in user_ids_list:
    ...

Better:

user_ids = set(user_ids_list)
if user_id in user_ids:
    ...

Repeated front deletion:

while xs:
    xs.pop(0)

Better:

from collections import deque

q = deque(xs)
while q:
    q.popleft()

Repeated string or bytes concatenation:

s = ""
for part in parts:
    s += part

Better:

s = "".join(parts)

15.30 C API Sketch

List creation:

PyObject *list = PyList_New(0);
if (list == NULL) {
    return NULL;
}

Append:

PyObject *item = PyLong_FromLong(42);
if (item == NULL) {
    Py_DECREF(list);
    return NULL;
}

if (PyList_Append(list, item) < 0) {
    Py_DECREF(item);
    Py_DECREF(list);
    return NULL;
}

Py_DECREF(item);

PyList_Append increments the item reference internally. The local owned reference must still be released.

Tuple creation:

PyObject *tuple = PyTuple_New(1);
if (tuple == NULL) {
    return NULL;
}

PyObject *item = PyLong_FromLong(42);
if (item == NULL) {
    Py_DECREF(tuple);
    return NULL;
}

PyTuple_SET_ITEM(tuple, 0, item);

PyTuple_SET_ITEM steals the reference. Do not Py_DECREF(item) after successful use.

This difference is one of the classic C API ownership traps.

15.31 Mental Model

Use this model:

list
    mutable sequence
    separate resizable array of PyObject *
    over-allocates for efficient append

tuple
    immutable sequence
    inline fixed array of PyObject *
    compact for fixed groups

array
    mutable typed sequence
    raw value storage
    compact but homogeneous

bytes
    immutable raw byte sequence

bytearray
    mutable raw byte sequence

memoryview
    zero-copy view over buffer-exporting object

For lists and tuples, remember that elements are references. Copying the container does not copy the objects inside it.

15.32 Summary

Lists and tuples are reference containers. A list is mutable and uses a separately allocated resizable array. A tuple is immutable and stores item references inline in a fixed-size allocation.

Arrays and bytes-like objects store compact raw data rather than references to arbitrary Python objects. They are better suited for binary data and typed numeric storage.

Understanding these layouts explains most performance and correctness behavior: append efficiency, front insertion cost, shallow copies, repeated-reference pitfalls, tuple immutability, buffer views, and the difference between general object containers and compact raw storage.