PyListObject dynamic resizing strategy, PyTupleObject immutability, and the array module's typed buffer.
Lists, tuples, and array-like objects represent ordered collections. They all support indexed access, but they have different storage models, mutability rules, and performance tradeoffs.
A list is a mutable sequence of object references.
A tuple is an immutable sequence of object references.
An array-like object stores compact typed data or exposes a contiguous buffer.
These distinctions matter because CPython containers store references unless the type is specifically designed for raw storage.
15.1 Ordered Collections
Python has several ordered collection types.
| Type | Mutable | Stores | Main use |
|---|---|---|---|
list | Yes | Object references | General mutable sequence |
tuple | No | Object references | Fixed records, immutable groups |
array.array | Yes | Raw typed values | Compact numeric storage |
bytes | No | Raw bytes | Immutable binary data |
bytearray | Yes | Raw bytes | Mutable binary data |
memoryview | View-dependent | Raw buffer view | Zero-copy buffer access |
A list and a tuple can store objects of any type:
xs = [1, "two", object()]
t = (1, "two", object())An array stores values of one machine-level type:
from array import array
nums = array("i", [1, 2, 3])15.2 Lists Store References
A CPython list stores pointers to Python objects.
Conceptually:
PyListObject
PyVarObject header
ob_size = logical length
ob_item ----> array of PyObject *
allocated = capacityFor:
xs = [10, 20, 30]the layout is:
list object
ob_size = 3
allocated >= 3
ob_item ----+
|
v
[ptr][ptr][ptr]
| | |
v v v
10 20 30The list does not inline the integer payloads. It stores references to integer objects.
This is why a list can hold mixed types:
xs = [1, "hello", None, []]Every element slot has the same representation: a PyObject *.
15.3 List Length and Capacity
A list has a logical length and an allocated capacity.
length
number of active elements
capacity
number of slots currently allocated in the internal arrayExample:
xs = []
xs.append("a")
xs.append("b")
xs.append("c")The logical length is 3. The allocated capacity may be greater than 3.
This spare capacity allows efficient append. CPython does not normally resize the internal array on every append. It grows the array in larger steps.
Conceptually:
append item
if length < allocated:
store item in spare slot
length += 1
else:
allocate larger array
copy references
store item
length += 1The object identity of the list remains stable. The internal element array may move.
xs = []
before = id(xs)
for i in range(1000):
xs.append(i)
after = id(xs)
print(before == after) # True15.4 List Append
list.append adds one object reference to the end.
xs = []
xs.append(1)
xs.append(2)At the C level, append must:
ensure there is enough capacity
increment the appended object's reference count
store the object pointer
increase the logical sizeAppending an object does not copy it.
item = []
xs = []
xs.append(item)
item.append(1)
print(xs) # [[1]]The list stores a reference to item.
15.5 List Over-Allocation
List over-allocation is the reason repeated append is efficient.
Without over-allocation, this loop would copy the whole array on many iterations:
xs = []
for i in range(1_000_000):
xs.append(i)With over-allocation, CPython grows capacity by more than one slot at a time. The exact formula is an implementation detail, but the behavior gives amortized constant-time append.
Important result:
| Operation | Average cost |
|---|---|
append | O(1) amortized |
pop() from end | O(1) |
| random indexing | O(1) |
| insert at front | O(n) |
| delete from middle | O(n) |
Appending is cheap because it touches the end. Inserting at the front is expensive because many references must move.
15.6 List Indexing
List indexing is direct array access.
x = xs[i]Conceptually:
check bounds
read ob_item[i]
return referenced objectThis gives O(1) access.
Negative indexes are translated:
xs[-1]means:
xs[len(xs) - 1]after bounds handling.
Indexing returns a reference to the contained object. It does not copy that object.
xs = [[1], [2]]
a = xs[0]
a.append(99)
print(xs) # [[1, 99], [2]]15.7 List Assignment
List item assignment replaces one stored reference.
xs = [1, 2, 3]
xs[1] = "two"At the C level, replacement must handle reference counts safely:
increment new item
store new item pointer
decrement old itemThe safe order matters when the new object and old object could be the same object.
Conceptually:
Py_INCREF(new_item);
old_item = items[index];
items[index] = new_item;
Py_DECREF(old_item);This preserves ownership invariants even if finalizers run during Py_DECREF.
15.8 List Slicing
Slicing creates a new list.
xs = [[1], [2], [3]]
ys = xs[0:2]The new list contains references to the same element objects.
ys[0].append(99)
print(xs) # [[1, 99], [2], [3]]
print(ys) # [[1, 99], [2]]This is a shallow copy.
The outer list is new. The contained objects are shared.
xs ----> [ptr A][ptr B][ptr C]
ys ----> [ptr A][ptr B]A deep copy requires recursive copying:
import copy
ys = copy.deepcopy(xs)15.9 List Insertion and Deletion
Inserting into the middle moves later references.
xs = [1, 2, 3, 4]
xs.insert(1, "x")Conceptually:
[1][2][3][4]
insert at index 1
[1]["x"][2][3][4]Elements from index 1 onward shift right.
Deleting from the middle shifts left:
del xs[1]These operations are O(n) because they may move many pointers.
The objects themselves are not copied. CPython moves references inside the internal array.
15.10 List Sorting
list.sort sorts the list in place.
xs = [3, 1, 2]
xs.sort()sorted(xs) creates a new list:
ys = sorted(xs)CPython uses Timsort for list sorting. It is stable, meaning equal keys retain their relative order.
items = [
("a", 2),
("b", 1),
("c", 2),
]
items.sort(key=lambda x: x[1])
print(items) # [('b', 1), ('a', 2), ('c', 2)]Sorting rearranges references inside the list. It does not copy the pointed-to objects.
15.11 Tuples Store References Inline
A tuple is an immutable sequence.
Conceptually:
PyTupleObject
PyVarObject header
ob_size = length
ob_item[0]
ob_item[1]
...For:
t = (10, 20, 30)the layout is:
tuple object
ob_size = 3
ob_item[0] ---> 10
ob_item[1] ---> 20
ob_item[2] ---> 30Unlike list, tuple items are stored inline as part of the tuple allocation. There is no separate resizable array.
This is possible because tuple length cannot change after creation.
15.12 Tuple Immutability
Tuple immutability means the tuple’s item references cannot be changed.
t = (1, 2, 3)
t[0] = 99 # TypeErrorThe tuple object still stores references. If one referenced object is mutable, that object can change.
t = ([],)
t[0].append("x")
print(t) # (['x'],)The tuple still points to the same list. The list changed.
This distinction matters for hashing.
hash((1, 2, 3)) # works
hash(([],)) # TypeErrorA tuple is hashable only if all elements are hashable.
15.13 Tuple Creation
Tuple literals are common:
t = (1, 2, 3)Parentheses alone do not create a tuple. The comma does.
a = (1)
b = (1,)
print(type(a)) # int
print(type(b)) # tupleTuples are heavily used internally for:
function arguments
multiple return values
constant groups
dictionary keys
bytecode constants
C API argument packingBecause tuples are immutable and compact, they are natural containers for fixed groups.
15.14 Tuple Packing and Unpacking
Python uses tuples for packing multiple values.
point = 10, 20This creates a tuple.
Unpacking extracts elements:
x, y = pointAt the bytecode level, packing and unpacking have dedicated instructions because they are common.
Extended unpacking:
first, *middle, last = [1, 2, 3, 4, 5]creates a list for the starred target.
print(middle) # [2, 3, 4]15.15 Lists vs Tuples
Lists and tuples have similar indexing behavior but different design goals.
| Feature | List | Tuple |
|---|---|---|
| Mutability | Mutable | Immutable |
| Length | Can change | Fixed |
| Storage | Separate resizable array | Inline item references |
| Append | Yes | No |
| Hashable | No | If elements are hashable |
| Main use | Mutable sequence | Fixed group or record |
| Syntax | [1, 2] | (1, 2) |
Use a list when the collection changes.
Use a tuple when the collection represents a fixed grouping.
15.16 Arrays Store Raw Values
array.array stores compact typed values, not Python object references.
from array import array
xs = array("i", [1, 2, 3])The type code controls the C-level representation.
Examples:
| Type code | Meaning |
|---|---|
"b" | signed char |
"B" | unsigned char |
"h" | signed short |
"H" | unsigned short |
"i" | signed int |
"I" | unsigned int |
"l" | signed long |
"L" | unsigned long |
"f" | float |
"d" | double |
An array of integers is much more compact than a list of Python integer objects.
Conceptually:
list of ints
[ptr][ptr][ptr]
| | |
v v v
int int int
array("i")
[raw int][raw int][raw int]15.17 Array Tradeoffs
An array is compact but less general than a list.
| Feature | List | array.array |
|---|---|---|
| Element types | Any Python object | One primitive type |
| Memory use | Higher | Lower |
| Numeric compactness | Poor | Good |
| Python object methods | Full object references | Values converted at boundary |
| Heterogeneous data | Yes | No |
| Buffer protocol | No direct raw numeric buffer use like array | Yes |
Arrays are useful for binary I/O, compact numeric storage, and buffer-compatible data.
For heavy numerical computing, third-party arrays such as NumPy arrays are usually more powerful.
15.18 Bytes as an Array of Bytes
bytes is an immutable sequence of bytes.
b = b"abc"
print(b[0]) # 97It behaves sequence-like, but stores raw bytes.
bytearray is the mutable version:
buf = bytearray(b"abc")
buf[0] = 65
print(buf) # bytearray(b'Abc')For binary data, bytes and bytearray are more compact than list[int].
data = [97, 98, 99] # list of Python int objects
raw = b"abc" # compact bytes15.19 Memoryview and Zero-Copy Slicing
A memoryview can expose a view into an existing buffer.
buf = bytearray(b"abcdef")
view = memoryview(buf)[2:5]
print(view.tobytes()) # b'cde'The view does not copy the underlying bytes.
Mutating through a writable view affects the exporter:
buf = bytearray(b"abcdef")
view = memoryview(buf)
view[0] = ord("X")
print(buf) # bytearray(b'Xbcdef')This is useful for binary parsers, network protocols, file formats, and extension modules.
15.20 Sequence Protocol
Lists, tuples, bytes, bytearrays, ranges, and arrays participate in sequence behavior.
Common operations:
len(xs)
xs[i]
xs[i:j]
x in xs
for x in xs:
...The type decides how to implement those operations.
At the C level, sequence behavior is exposed through sequence slots such as:
sq_length
sq_item
sq_ass_item
sq_contains
sq_concat
sq_repeatImmutable sequences omit assignment slots.
Mutable sequences provide assignment behavior.
15.21 Iteration
A list iterator stores a reference to the list and a current index.
xs = [1, 2, 3]
it = iter(xs)
print(next(it)) # 1Conceptually:
list iterator
sequence reference
current indexIteration over a tuple is similar.
Iteration over a list sees the list object, not a snapshot copy.
xs = [1, 2, 3]
for x in xs:
if x == 2:
xs.append(4)Mutating a list during iteration can produce confusing behavior. Avoid it unless the behavior is intentionally controlled.
15.22 Membership Testing
For a list or tuple:
x in xsperforms a linear scan.
Conceptually:
for each item:
compare item == xThis is O(n).
For membership-heavy workloads, use a set or dict:
seen = set(xs)
if x in seen:
...A set uses hashing and usually gives average O(1) membership.
15.23 Copying
List copy methods are shallow:
a = [[1], [2]]
b = a.copy()
c = list(a)
d = a[:]All three create a new outer list. Inner objects are shared.
b[0].append(99)
print(a) # [[1, 99], [2]]Tuple copying usually returns the same object when no real copy is needed:
t = (1, 2, 3)
u = tuple(t)
print(t is u) # TrueBecause tuples are immutable, this reuse is safe.
15.24 Repetition
Sequence repetition repeats references.
xs = [[]] * 3
xs[0].append(1)
print(xs) # [[1], [1], [1]]All three slots point to the same inner list.
Correct pattern:
xs = [[] for _ in range(3)]Now each element is a different list.
This is a reference model issue, not a list multiplication bug.
15.25 Sorting Lists of Objects
Sorting uses comparisons or a key function.
items = ["aaa", "b", "cc"]
items.sort(key=len)With key, CPython computes keys and sorts using those keys.
This avoids repeated recomputation of expensive comparison logic.
Stable sorting allows multi-pass sorting:
items.sort(key=lambda x: x.name)
items.sort(key=lambda x: x.group)After the second sort, items with equal group retain name order from the first sort.
15.26 List Comprehensions
A list comprehension builds a list directly.
squares = [x * x for x in range(10)]It is usually faster and clearer than manual append loops for simple transformations.
Equivalent shape:
squares = []
for x in range(10):
squares.append(x * x)The comprehension has dedicated bytecode patterns and keeps the loop body compact.
15.27 Generator Expressions vs Lists
A list comprehension creates all elements immediately.
xs = [x * x for x in range(1_000_000)]A generator expression produces values lazily.
it = (x * x for x in range(1_000_000))Use a list when you need indexing, repeated traversal, or materialized data.
Use a generator when you need one pass and want to avoid storing all results.
15.28 Common Performance Patterns
| Task | Better choice | Reason |
|---|---|---|
| Append many items | list.append | Amortized O(1) |
| Build from iterable | list(iterable) | Efficient constructor path |
| Fixed record | tuple | Immutable and compact |
| Membership test | set | Hash-based lookup |
| Queue from left | collections.deque | Efficient left pops |
| Numeric compact storage | array.array or NumPy | Raw typed storage |
| Binary data | bytes or bytearray | Compact byte storage |
| Zero-copy binary slicing | memoryview | Avoids allocation |
| Many concatenations | join | Avoids repeated copying |
Lists are general-purpose. They are not optimal for every ordered workload.
15.29 Common Mistakes
A few common mistakes come directly from CPython’s reference-based container model.
Shared inner lists:
matrix = [[0] * 3] * 3Better:
matrix = [[0] * 3 for _ in range(3)]Using list membership for large lookup sets:
if user_id in user_ids_list:
...Better:
user_ids = set(user_ids_list)
if user_id in user_ids:
...Repeated front deletion:
while xs:
xs.pop(0)Better:
from collections import deque
q = deque(xs)
while q:
q.popleft()Repeated string or bytes concatenation:
s = ""
for part in parts:
s += partBetter:
s = "".join(parts)15.30 C API Sketch
List creation:
PyObject *list = PyList_New(0);
if (list == NULL) {
return NULL;
}Append:
PyObject *item = PyLong_FromLong(42);
if (item == NULL) {
Py_DECREF(list);
return NULL;
}
if (PyList_Append(list, item) < 0) {
Py_DECREF(item);
Py_DECREF(list);
return NULL;
}
Py_DECREF(item);PyList_Append increments the item reference internally. The local owned reference must still be released.
Tuple creation:
PyObject *tuple = PyTuple_New(1);
if (tuple == NULL) {
return NULL;
}
PyObject *item = PyLong_FromLong(42);
if (item == NULL) {
Py_DECREF(tuple);
return NULL;
}
PyTuple_SET_ITEM(tuple, 0, item);PyTuple_SET_ITEM steals the reference. Do not Py_DECREF(item) after successful use.
This difference is one of the classic C API ownership traps.
15.31 Mental Model
Use this model:
list
mutable sequence
separate resizable array of PyObject *
over-allocates for efficient append
tuple
immutable sequence
inline fixed array of PyObject *
compact for fixed groups
array
mutable typed sequence
raw value storage
compact but homogeneous
bytes
immutable raw byte sequence
bytearray
mutable raw byte sequence
memoryview
zero-copy view over buffer-exporting objectFor lists and tuples, remember that elements are references. Copying the container does not copy the objects inside it.
15.32 Summary
Lists and tuples are reference containers. A list is mutable and uses a separately allocated resizable array. A tuple is immutable and stores item references inline in a fixed-size allocation.
Arrays and bytes-like objects store compact raw data rather than references to arbitrary Python objects. They are better suited for binary data and typed numeric storage.
Understanding these layouts explains most performance and correctness behavior: append efficiency, front insertion cost, shallow copies, repeated-reference pitfalls, tuple immutability, buffer views, and the difference between general object containers and compact raw storage.