Compact dict implementation, hash table probing, split-key sharing, and PySetObject collision resolution.
Dictionaries and sets are CPython’s primary hash-table containers. A dictionary maps keys to values. A set stores keys without associated values.
They are used everywhere in Python:
module globals
class namespaces
instance attributes
keyword arguments
function annotations
import caches
memoization tables
membership indexes
deduplication setsA large part of Python’s runtime performance depends on dictionary performance.
16.1 Dictionary Semantics
A dictionary maps hashable keys to values.
d = {
"name": "Ada",
"age": 36,
}
print(d["name"]) # AdaKeys must be hashable. Values can be any object.
d = {}
d["x"] = []
d[42] = object()
d[(1, 2)] = "tuple key"A key is hashable when it has a stable hash value and supports equality comparison compatible with that hash.
hash("name") # works
hash((1, 2)) # works
hash([1, 2]) # TypeErrorLists are unhashable because their contents can change.
16.2 Set Semantics
A set stores unique hashable elements.
seen = set()
seen.add("x")
seen.add("x")
print(seen) # {'x'}A set is useful when you need membership testing, deduplication, or set algebra.
a = {"red", "green", "blue"}
b = {"green", "yellow"}
print(a & b) # intersection
print(a | b) # union
print(a - b) # differenceA frozenset is immutable and hashable if all elements are hashable.
key = frozenset(["read", "write"])
d = {key: "permissions"}16.3 Hash Tables
Dictionaries and sets are hash tables.
A hash table uses a hash value to find where an object should live in an internal table.
For dictionary lookup:
hash key
find candidate slot
compare stored key with lookup key
return value if equalFor set membership:
hash element
find candidate slot
compare stored element with lookup element
return present or absentAverage-case lookup is constant time:
d[key] average O(1)
key in d average O(1)
x in set average O(1)Worst-case behavior can degrade when many keys collide, but CPython includes collision handling and hash randomization defenses.
16.4 Hash and Equality Contract
The hash contract is:
if a == b, then hash(a) == hash(b)The reverse is not required:
if hash(a) == hash(b), a may still be different from bA collision happens when two unequal objects have the same hash.
Dictionaries handle collisions by comparing keys.
Example custom key:
class Key:
def __init__(self, value):
self.value = value
def __eq__(self, other):
return isinstance(other, Key) and self.value == other.value
def __hash__(self):
return hash(self.value)If you define __eq__, define a compatible __hash__ only when the object is immutable enough for hashing.
16.5 Mutable Keys Are Dangerous
A key’s hash must remain stable while the key is in a dictionary or set.
This is bad:
class BadKey:
def __init__(self, value):
self.value = value
def __eq__(self, other):
return isinstance(other, BadKey) and self.value == other.value
def __hash__(self):
return hash(self.value)
k = BadKey("a")
d = {k: "value"}
k.value = "b"After mutation, the object may no longer be findable in the table.
print(d.get(k)) # may fail unexpectedlyThe dictionary placed the key according to the old hash. Lookup uses the new hash.
Use immutable key data.
key = ("user", 123)16.6 Dictionary Object Layout
A CPython dictionary object stores metadata and points to table data.
Conceptually:
PyDictObject
object header
used count
version tag
keys table pointer
values pointer for split-table dictsThe internal layout is optimized and version-specific, but the key idea is:
dictionary object
controls a hash table
stores keys and values
resizes as neededA dictionary does not store key and value objects inline as raw values. It stores references to Python objects.
dict entry
key pointer ---> Python object
value pointer ---> Python object
hash value16.7 Combined and Split Tables
CPython dictionaries can use different internal layouts.
A combined table stores keys and values together.
entry
hash
key pointer
value pointerA split table separates shared keys from per-instance values.
This is useful for instance dictionaries of objects of the same class.
Example:
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
a = Point(1, 2)
b = Point(3, 4)Both instances have attributes x and y.
CPython can share key metadata for the class’s instance layout while each instance stores its own values.
Conceptually:
shared keys table
"x"
"y"
instance a values
1
2
instance b values
3
4This reduces memory usage for many instances with the same attribute names.
16.8 Insertion Order
Dictionaries preserve insertion order.
d = {}
d["a"] = 1
d["b"] = 2
d["c"] = 3
print(list(d)) # ['a', 'b', 'c']Updating an existing key does not move it:
d["b"] = 20
print(list(d)) # ['a', 'b', 'c']Deleting and reinserting creates a new insertion position:
del d["b"]
d["b"] = 200
print(list(d)) # ['a', 'c', 'b']Insertion order is language behavior for dictionaries in modern Python, not merely an accidental CPython detail.
16.9 Dictionary Lookup
Dictionary lookup has several stages.
value = d[key]Conceptually:
compute hash(key)
choose initial table slot
inspect slot
if empty: key missing
if stored hash differs: continue probing
if stored key is key: found
if stored key == key: found
otherwise continue probingIdentity is checked before equality when possible. If the same object is used as the key, lookup can avoid equality work.
key = "name"
d = {key: "Ada"}
print(d[key])For strings, cached hashes and interning can make lookup especially fast.
16.10 Collision Handling
Different keys can have the same hash.
class Collide:
def __init__(self, value):
self.value = value
def __hash__(self):
return 1
def __eq__(self, other):
return isinstance(other, Collide) and self.value == other.valueAll instances hash to 1.
d = {}
for i in range(100):
d[Collide(i)] = iThe dictionary still works, but performance suffers because many probes and equality checks are needed.
Hash randomization helps protect string-like keys from predictable collision attacks in network-facing programs.
16.11 Resizing
A dictionary resizes when it becomes too full.
The goal is to keep lookup efficient by maintaining empty space in the table.
Conceptually:
insert key
if table load is high:
allocate larger table
reinsert active entries
insert new keyResizing costs O(n) at the moment it happens, but insertions remain efficient on average.
This is similar in spirit to list over-allocation. The container uses extra memory to keep common operations fast.
16.12 Deletion and Dummy Entries
Deleting a dictionary key cannot always mark a slot as simply empty.
Why?
Because lookup may rely on probe chains. If deletion broke the chain, later keys could become unreachable.
So hash tables often use dummy markers for deleted slots.
Conceptually:
active slot
contains key and value
empty slot
never used
dummy slot
used before, now deletedInsertion can reuse dummy slots. Lookup must skip over them.
Too many dummy slots can hurt performance, so resizing or rebuilding can clean them up.
16.13 Dictionary Views
Dictionary view objects expose dynamic views of keys, values, or items.
d = {"a": 1, "b": 2}
keys = d.keys()
values = d.values()
items = d.items()Views reflect later mutations:
keys = d.keys()
d["c"] = 3
print(keys) # dict_keys(['a', 'b', 'c'])Views avoid copying. They are lightweight objects pointing back to the dictionary.
To make a snapshot, explicitly copy:
keys_snapshot = list(d.keys())16.14 Iterating a Dictionary
Iterating a dictionary yields keys.
for key in d:
print(key)Equivalent:
for key in d.keys():
print(key)During iteration, changing the size of the dictionary raises an error:
for key in d:
d["new"] = 1 # RuntimeErrorMutating values without changing keys is usually allowed:
for key in d:
d[key] += 1The iterator relies on dictionary structure staying consistent.
16.15 Dictionary Version Tags
CPython dictionaries can maintain version information that changes when the dictionary changes.
Version tags are useful for runtime optimizations, especially caches.
For example, attribute lookup and global lookup can cache information when a dictionary remains unchanged.
Conceptually:
dict version = 100
cache lookup result for name "x"
later:
if dict version still 100:
cached result may be valid
else:
redo lookupThis is a major tool for speeding up dynamic lookup without changing Python semantics.
16.16 Globals Are Dictionaries
A module’s global namespace is a dictionary.
x = 10
globals()["x"] = 20
print(x) # 20When Python code reads a global variable, CPython performs a dictionary lookup in the module globals, with fallback to builtins.
Conceptually:
LOAD_GLOBAL name
look in globals dict
if missing, look in builtins dictBecause globals are dictionaries, global variable access depends on dict performance. CPython uses inline caches and version tags to speed this path.
16.17 Instance Attributes Are Often Dictionaries
Ordinary object instances usually store attributes in a dictionary.
class User:
pass
u = User()
u.name = "Ada"
u.age = 36
print(u.__dict__)Output:
{'name': 'Ada', 'age': 36}Attribute access:
u.nameoften becomes a dictionary lookup, after descriptor checks and type-level lookup.
CPython optimizes this heavily with split dictionaries, shared keys, and attribute caches.
16.18 Class Namespaces Are Dictionaries
A class body executes in a namespace mapping.
class User:
species = "human"
def greet(self):
return "hello"The class namespace collects:
species
greet
__module__
__qualname__Then the metaclass creates the class object from that namespace.
The resulting class has a dictionary-like mapping of attributes.
print(User.__dict__)Class dictionaries are exposed through a read-only mappingproxy view.
16.19 mappingproxy
A mappingproxy is a read-only view of a mapping.
class C:
x = 1
print(C.__dict__)You cannot directly assign through the mappingproxy:
C.__dict__["y"] = 2 # TypeErrorBut you can assign through normal class attribute syntax:
C.y = 2The proxy prevents direct mutation of the internal class dictionary while still allowing introspection.
16.20 Keyword Arguments
Keyword arguments are represented using mapping-like mechanisms.
def f(**kwargs):
print(kwargs)
f(a=1, b=2)Inside the function, kwargs is a dictionary.
Keyword argument passing is performance-sensitive because calls are common.
CPython has specialized call conventions to avoid unnecessary dictionary creation when possible, but when **kwargs is needed, a dictionary is materialized.
16.21 Sets Use Hash Tables Too
A set is similar to a dictionary without values.
s = {"a", "b", "c"}Conceptually:
set table entry
hash
key pointerMembership:
"x" in suses hashing and probing, like dict lookup.
Sets resize and handle collisions similarly in concept.
16.22 Set Operations
Set operations use hash-table membership internally.
a = {1, 2, 3}
b = {3, 4, 5}
print(a & b) # {3}
print(a | b) # {1, 2, 3, 4, 5}
print(a - b) # {1, 2}Performance depends on input sizes.
For intersection, it is usually better to iterate over the smaller set and test membership in the larger set.
Conceptually:
intersection(a, b)
for each item in smaller set:
if item in larger set:
add to resultCPython’s implementation uses these kinds of size-aware strategies.
16.23 Set Iteration Order
Sets do not promise insertion order.
s = {"a", "b", "c"}
print(list(s))The order can depend on hashes, table size, insertion history, and runtime hash randomization.
Do not write code that depends on set iteration order.
If order matters, use a dictionary as an ordered membership map or sort explicitly.
ordered_unique = list(dict.fromkeys(items))16.24 dict.fromkeys
A common deduplication pattern:
items = ["b", "a", "b", "c", "a"]
unique = list(dict.fromkeys(items))
print(unique) # ['b', 'a', 'c']This uses dictionary insertion order to preserve first occurrence.
A set would deduplicate but lose order:
set(items)Use the container whose semantics match the requirement.
16.25 Default Values and Missing Keys
Several dictionary APIs handle missing keys.
d.get("key", default)get returns a default without inserting.
d.setdefault("key", [])setdefault inserts the default if the key is missing.
Common pattern:
groups = {}
for item in items:
groups.setdefault(item.category, []).append(item)For repeated grouping, collections.defaultdict is often clearer:
from collections import defaultdict
groups = defaultdict(list)
for item in items:
groups[item.category].append(item)16.26 Dictionary Comprehensions
A dictionary comprehension builds a dictionary from an iterable.
squares = {x: x * x for x in range(10)}If duplicate keys appear, later values replace earlier values while the insertion position of the first key is retained.
d = {x % 2: x for x in range(5)}
print(d) # {0: 4, 1: 3}The keys 0 and 1 were inserted early, then values were updated.
16.27 Set Comprehensions
A set comprehension builds a set.
mods = {x % 10 for x in range(100)}Duplicates collapse.
Set comprehensions are useful for extracting unique computed values.
domains = {url.split("/")[2] for url in urls}Again, iteration order is not a semantic guarantee.
16.28 Hash Randomization
CPython uses hash randomization for some hashable types such as strings and bytes.
This means hash values can differ between processes.
print(hash("name"))Run the program twice and the value may differ.
Do not persist hash() values across processes. Do not use them as stable IDs.
For stable hashing, use a defined hash function:
import hashlib
digest = hashlib.sha256(b"name").hexdigest()hash() is for in-process hash tables, not durable identity.
16.29 Common Performance Patterns
| Task | Container | Reason |
|---|---|---|
| Key-value lookup | dict | Hash-table lookup |
| Membership testing | set | Average O(1) |
| Preserve first unique values | dict.fromkeys | Ordered keys |
| Count items | collections.Counter | Specialized dict subclass |
| Group items | defaultdict(list) | Avoid repeated missing-key checks |
| Fixed composite key | tuple | Hashable if elements hashable |
| Ordered sorted keys | sorted(d) | Explicit ordering |
| Read-only class namespace view | mappingproxy | Prevent direct mutation |
16.30 Common Mistakes
Using a list for frequent membership:
if user_id in user_ids:
...If user_ids is large and lookup happens often, use a set.
user_ids = set(user_ids)Using mutable objects as conceptual keys:
key = [1, 2]
d[key] = "value" # TypeErrorUse a tuple:
key = (1, 2)
d[key] = "value"Depending on set order:
first = next(iter(my_set))This selects an arbitrary element, not a stable first element.
Persisting hash():
id_value = hash(username)Use a stable hash function instead.
16.31 C API Sketch
Creating a dictionary:
PyObject *d = PyDict_New();
if (d == NULL) {
return NULL;
}Setting an item:
PyObject *key = PyUnicode_FromString("name");
PyObject *value = PyUnicode_FromString("Ada");
if (key == NULL || value == NULL) {
Py_XDECREF(key);
Py_XDECREF(value);
Py_DECREF(d);
return NULL;
}
if (PyDict_SetItem(d, key, value) < 0) {
Py_DECREF(key);
Py_DECREF(value);
Py_DECREF(d);
return NULL;
}
Py_DECREF(key);
Py_DECREF(value);PyDict_SetItem does not steal references. It increments what it stores. The caller must release its owned references.
Getting an item can return a borrowed reference depending on the API:
PyObject *value = PyDict_GetItemWithError(d, key);
if (value == NULL) {
if (PyErr_Occurred()) {
return NULL;
}
Py_RETURN_NONE;
}
/* borrowed reference */
Py_INCREF(value);
return value;Reference ownership rules matter.
16.32 Set C API Sketch
Creating a set:
PyObject *s = PySet_New(NULL);
if (s == NULL) {
return NULL;
}Adding an item:
PyObject *item = PyUnicode_FromString("x");
if (item == NULL) {
Py_DECREF(s);
return NULL;
}
if (PySet_Add(s, item) < 0) {
Py_DECREF(item);
Py_DECREF(s);
return NULL;
}
Py_DECREF(item);The set owns its own reference after insertion. The local reference is still released.
Membership:
int present = PySet_Contains(s, item);
if (present < 0) {
return NULL;
}The result is:
1 if present
0 if absent
-1 on error16.33 Mental Model
Use this model:
dict
hash table mapping keys to values
keys must be hashable
insertion order preserved
stores references to keys and values
used throughout CPython runtime
set
hash table storing keys only
keys must be hashable
no insertion-order guarantee
optimized for membership and set algebraFor both containers:
hash chooses where to look
equality confirms the key
collisions are handled by probing
resizing keeps lookup efficient
mutation changes table structure
iteration depends on table state16.34 Summary
Dictionaries and sets are hash-table containers. Dictionaries map hashable keys to values and preserve insertion order. Sets store unique hashable elements and optimize membership testing and set operations.
CPython uses dictionaries as a core runtime structure for globals, attributes, classes, modules, keyword arguments, and caches. Their implementation includes hash caching, collision handling, resizing, split-table layouts, version tags, and order preservation.
Understanding dictionaries and sets explains much of Python’s speed, object model, attribute lookup, namespace behavior, and common performance patterns.