# 77. Dictionary Performance

# 77. Dictionary Performance

Dictionaries are one of the most important data structures in CPython.

Python uses dictionaries for:

```text id="m53sgf"
global namespaces
module namespaces
instance attributes
class attributes
keyword arguments
caches
sets internals
JSON-like data
configuration objects
symbol tables
```

Many Python operations ultimately become dictionary operations.

Examples:

```python id="7g4iwg"
x = globals()["name"]
obj.attr
module.value
kwargs["limit"]
```

Because dictionaries are everywhere, CPython invests heavily in dictionary performance.

Modern CPython dictionaries are compact, cache-friendly, insertion-ordered hash tables with specialized optimizations for common Python workloads.

## 77.1 Basic Dictionary Semantics

A dictionary maps keys to values.

Example:

```python id="k82g6s"
d = {
    "name": "Alice",
    "age": 30
}
```

Core operations:

```text id="3v4jib"
insert key/value
lookup key
update value
delete key
iterate entries
```

Python dictionaries support arbitrary hashable keys.

Examples:

```python id="m9j8d5"
d[1]
d["x"]
d[(1, 2)]
d[custom_object]
```

The implementation must therefore support:

```text id="2mgjlwm"
dynamic types
custom hashing
custom equality
resizing
mutation during execution
```

while remaining fast.

## 77.2 Hash Tables

CPython dictionaries are hash tables.

A hash table works by:

```text id="mjv2aq"
compute hash(key)
map hash to table position
probe for matching entry
```

Example:

```python id="zwqz9o"
d["name"]
```

Conceptually:

```text id="2b9ew4"
hash("name")
    ↓
table index
    ↓
probe sequence
    ↓
find matching key
    ↓
return value
```

The goal is average-case constant-time lookup.

## 77.3 Hash Values

Every dictionary key must be hashable.

A hashable object provides:

```python id="1vb6yb"
__hash__()
__eq__()
```

CPython stores the computed hash value alongside dictionary entries.

This avoids recomputing hashes repeatedly during probing.

Conceptually, an entry contains:

```text id="p5t53z"
hash
key pointer
value pointer
```

Caching hash values matters because string hashing and custom object hashing can be expensive.

## 77.4 Open Addressing

CPython dictionaries use open addressing rather than separate chaining.

Separate chaining:

```text id="huw2kc"
table slot
    ↓
linked list of entries
```

Open addressing:

```text id="of6k4f"
table slot
    ↓
probe nearby slots
```

Advantages of open addressing:

```text id="b1jv0k"
better cache locality
fewer allocations
compact memory layout
```

Disadvantages:

```text id="1l6y3h"
more complex probing logic
performance sensitive to load factor
deletion handling complexity
```

CPython strongly favors locality and compactness.

## 77.5 Probing

Collisions occur when different keys map to the same slot.

CPython resolves collisions using probing.

Conceptually:

```text id="n6gtln"
initial slot occupied
    ↓
probe another slot
    ↓
probe another slot
    ↓
continue until match or empty slot
```

The probing sequence is designed to spread entries across the table while remaining cache-friendly.

Good probing behavior is critical for performance.

Poor probing causes:

```text id="6u0e3p"
long lookup chains
cache misses
branch mispredictions
slower inserts
slower deletions
```

## 77.6 Dictionary Entries

Modern CPython dictionaries separate metadata from entries.

Conceptually:

```text id="3wb0fd"
indices array
entries array
```

The indices table maps hash positions to entry positions.

The entries table stores actual key/value data.

This design improves compactness and iteration performance.

Older dictionary layouts stored many unused fields directly in hash slots. Modern layouts reduce wasted space.

## 77.7 Compact Dictionaries

Modern CPython dictionaries are compact.

A compact dictionary minimizes:

```text id="6m0krv"
unused pointer fields
padding
sparse entry storage
allocation overhead
```

This matters because Python programs often create huge numbers of small dictionaries:

```python id="f83bva"
{"x": 1}
{"name": "..."}
{"id": 123}
```

Memory efficiency directly affects:

```text id="j7i1c1"
cache locality
allocation cost
overall process memory usage
```

Compact dictionaries were one of the major dictionary redesigns in CPython 3.6.

## 77.8 Insertion Order Preservation

Modern Python dictionaries preserve insertion order.

Example:

```python id="00g8l2"
d = {}
d["a"] = 1
d["b"] = 2
d["c"] = 3

print(list(d.keys()))
```

Output:

```text id="e8h3an"
['a', 'b', 'c']
```

This behavior became a language guarantee in Python 3.7.

The compact dictionary layout naturally supported ordered iteration efficiently.

Importantly:

```text id="uz6qu0"
ordered iteration
```

does not mean:

```text id="z9ik0f"
sorted order
```

The order reflects insertion history.

## 77.9 Split Dictionaries

CPython uses split dictionaries for many object instance dictionaries.

Consider:

```python id="e9a7ys"
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
```

Many `Point` instances share the same attribute names:

```text id="y4u6a0"
x
y
```

A naive representation duplicates key objects in every instance dictionary.

Split dictionaries separate:

```text id="yvffjy"
shared keys
per-instance values
```

Conceptually:

```text id="czq31j"
class layout:
    keys = [x, y]

instance 1:
    values = [1, 2]

instance 2:
    values = [5, 8]
```

This saves memory across large numbers of similar objects.

## 77.10 Combined Dictionaries

Not all dictionaries use split layout.

Ordinary dictionaries usually use combined layout:

```text id="8z6mzn"
entry contains:
    hash
    key
    value
```

Combined dictionaries are simpler and flexible.

Split dictionaries optimize common instance attribute patterns.

CPython chooses layouts based on usage.

## 77.11 String-Key Optimization

Many Python dictionaries use string keys:

```python id="5s84r6"
{"name": "Alice"}
{"id": 123}
globals()
```

CPython optimizes string-key lookup heavily.

Advantages of string keys:

```text id="mjlwmr"
cached hash values
interning opportunities
pointer comparisons
stable immutability
```

Attribute dictionaries and global namespaces benefit strongly from string-key specialization.

## 77.12 Interned Strings

Some strings are interned.

Interning means identical immutable strings may share the same object.

Example:

```python id="65p0h5"
"a" is "a"
```

often evaluates to:

```text id="krqwwy"
True
```

for interned strings.

Interning helps dictionary performance because equality checks may become:

```text id="jlwm6i"
pointer comparison
```

instead of character-by-character comparison.

CPython interns many identifier-like strings automatically.

## 77.13 Hash Randomization

CPython uses randomized hashing for some types such as strings.

Without randomization, attackers could deliberately construct many colliding keys:

```text id="7p4u9r"
same hash
different strings
```

This could force pathological probing behavior:

```text id="rh40z6"
O(n²)-like degradation
```

Hash randomization makes collision attacks harder.

The randomized seed changes between processes.

This improves security for network-facing programs.

## 77.14 Load Factor

Dictionary performance depends on load factor.

Load factor means:

```text id="1ukpnd"
used_slots / total_slots
```

If the table becomes too full:

```text id="1q0j6m"
probing chains grow longer
performance degrades
```

CPython resizes dictionaries before they become overly crowded.

This trades memory for speed.

Dictionaries intentionally keep unused capacity to maintain efficient probing.

## 77.15 Resizing

When a dictionary grows large enough, CPython resizes it.

Conceptually:

```text id="qlyq9z"
allocate larger table
rehash entries
move entries
replace old table
```

Resizing is expensive, but infrequent.

Average insertion remains approximately constant-time because resizing happens geometrically rather than linearly.

A single insertion may trigger a large resize operation, but most insertions do not.

## 77.16 Deletion

Deletion is tricky in open-addressed hash tables.

Removing an entry completely can break probe chains.

CPython therefore uses tombstone-like markers.

Conceptually:

```text id="dbqjlwm"
slot previously occupied
now marked deleted
```

Deleted slots remain part of probing logic.

Over time, many deletions can degrade table quality.

Resizing and rebuilding help clean up deleted markers.

## 77.17 Lookup Fast Paths

Dictionary lookup is highly optimized.

Hot lookup paths avoid:

```text id="r6m9ly"
unnecessary allocations
repeated hash computation
generic comparisons
```

Optimizations include:

```text id="mrlkca"
cached hashes
string specialization
pointer equality checks
compact probing
inline helper functions
```

Dictionary lookup is one of the most performance-critical operations in CPython.

## 77.18 Globals and Builtins

Global lookup depends heavily on dictionaries.

Example:

```python id="t7yfyv"
print(len(xs))
```

This may involve:

```text id="r8fmnw"
globals dictionary lookup
builtins dictionary lookup
```

CPython uses inline caches and dictionary version tags to optimize repeated lookups.

Still, the underlying namespace representation is dictionary-based.

## 77.19 Attribute Access

Object attribute access often becomes dictionary lookup.

Example:

```python id="u5r14v"
obj.x
```

may involve:

```text id="zw86hu"
instance dictionary lookup
class dictionary lookup
descriptor lookup
```

Dictionary performance directly affects attribute access performance.

This is one reason instance dictionary optimization matters so much.

## 77.20 Dictionary Version Tags

Modern CPython dictionaries carry version information.

A version changes when the dictionary mutates.

Inline caches can use this:

```text id="8m1mqe"
if dict version unchanged:
    cached lookup still valid
```

This enables fast global and attribute lookup specialization.

Without version tags, caches would need more expensive validation logic.

## 77.21 Iteration Performance

Dictionary iteration is optimized for compact sequential traversal.

Example:

```python id="v67kj5"
for k, v in d.items():
    ...
```

The compact entries layout improves iteration locality.

Iteration cost depends partly on:

```text id="1ytbfk"
table density
cache locality
number of deleted slots
```

Compact dictionaries improved iteration speed significantly compared to older sparse layouts.

## 77.22 Memory Layout

Modern dictionary memory layout prioritizes:

```text id="iwjlwm"
compactness
locality
reduced pointer chasing
efficient iteration
```

The internal structures are carefully arranged to minimize wasted memory.

This matters because dictionaries are among the most common objects in Python programs.

Small per-dictionary savings scale enormously across large applications.

## 77.23 Dictionaries and Sets

Sets reuse much of the same hash-table machinery.

A set is conceptually:

```text id="6mav6g"
dictionary with keys only
```

Many probing, hashing, resizing, and collision-resolution mechanisms are shared.

Set performance characteristics therefore resemble dictionary performance characteristics.

## 77.24 Equality and Hashing Costs

Dictionary performance depends not only on table structure but also on key behavior.

Expensive key operations hurt performance:

```python id="3u3v0j"
class Slow:
    def __hash__(self):
        ...

    def __eq__(self, other):
        ...
```

Lookup may require:

```text id="luq2p2"
hash computation
equality comparisons
probing
```

Good hash functions should:

```text id="ylt7o4"
distribute keys evenly
be stable
be reasonably fast
```

Poor hashing increases collisions.

## 77.25 Unicode Keys

Unicode strings are common dictionary keys.

CPython optimizes Unicode representation heavily because dictionaries and attribute lookup depend on them constantly.

Important properties:

```text id="jlwmvt"
immutable
cached hash
interning support
compact representation
```

Efficient Unicode handling is deeply tied to dictionary performance.

## 77.26 Why Dictionaries Are Fast

Python dictionaries are fast because they combine:

```text id="h5an0q"
compact layout
cache locality
open addressing
cached hashes
careful probing
version tagging
specialized string handling
resizing heuristics
```

The implementation is heavily tuned for real Python workloads.

This is one of the most engineered subsystems in CPython.

## 77.27 Worst-Case Behavior

Average lookup is near constant-time.

Worst-case behavior can still degrade.

Bad cases include:

```text id="w00cuv"
heavy collisions
pathological hashing
extreme deletion patterns
adversarial inputs
```

Hash randomization helps defend against deliberate attacks.

Still, no practical hash table guarantees perfect constant time in all cases.

## 77.28 Dictionary Views

Dictionary methods such as:

```python id="pt3p8m"
d.keys()
d.values()
d.items()
```

return dynamic view objects.

These views reflect dictionary mutations.

Example:

```python id="szvjlwm"
d = {"a": 1}
k = d.keys()

d["b"] = 2

print(k)
```

The view updates automatically.

The implementation avoids copying dictionary contents unnecessarily.

## 77.29 `OrderedDict`

Modern dictionaries preserve insertion order, but `collections.OrderedDict` still exists.

`OrderedDict` provides additional ordering-related operations:

```text id="4zb3px"
move_to_end
reordering behavior
specialized equality semantics
```

Ordinary dictionaries preserve insertion order efficiently without a linked-list overlay.

## 77.30 Small Dictionaries

Small dictionaries are extremely common.

Examples:

```python id="bjlwm1"
{"x": 1}
{"id": 3}
{"name": "Alice"}
```

CPython optimizes small-table behavior carefully because even tiny improvements matter globally.

Common goals:

```text id="t2j7pv"
fewer allocations
compact storage
reduced metadata
efficient resizing
```

## 77.31 Dictionaries and the Compiler

The compiler also depends on dictionaries.

Examples:

```text id="q2jlwm"
symbol tables
constant tables
name lookup tables
module namespaces
```

Efficient dictionaries improve both runtime execution and compilation.

## 77.32 Dictionaries and the Import System

The import system heavily uses dictionaries.

Examples:

```text id="fjlwm7"
module globals
sys.modules
package namespaces
import caches
```

Import performance depends partly on dictionary efficiency.

## 77.33 Dictionaries and Object Models

Python’s dynamic object model depends fundamentally on dictionaries.

Features such as:

```text id="0ylp2m"
dynamic attributes
monkey patching
runtime method replacement
dynamic modules
dynamic globals
```

all rely on dictionary mutation.

The flexibility of Python objects comes partly from dictionary-backed namespaces.

## 77.34 Practical Performance Implications

Dictionary internals explain many practical performance patterns.

| Pattern | Reason |
|---|---|
| Local variables faster than globals | Globals require dictionary lookup |
| Attribute access costs matter | Often requires dictionary operations |
| Stable attribute layouts help | Better cache specialization |
| String keys are efficient | Optimized hashing and lookup |
| Large numbers of small objects cost memory | Instance dictionaries consume space |

These are not arbitrary quirks. They follow from the object model and dictionary implementation.

## 77.35 Reading Dictionary Source Code

Important files:

| File | Purpose |
|---|---|
| `Objects/dictobject.c` | Main dictionary implementation |
| `Include/cpython/dictobject.h` | Public dictionary structures |
| `Objects/setobject.c` | Related set implementation |
| `Python/ceval.c` | Global lookup and inline caches |
| `Objects/object.c` | Generic object operations |

`dictobject.c` is one of the most performance-sensitive files in CPython.

Many implementation details exist primarily for locality and branch efficiency.

## 77.36 Mental Model

A useful mental model:

```text id="jlwm1v"
CPython dictionaries are compact ordered hash tables optimized for real Python workloads.
```

Key properties:

```text id="jlwm0x"
fast average lookup
compact memory layout
ordered iteration
specialized string handling
cache-friendly probing
dynamic mutation support
```

Dictionaries are central to Python execution.

Much of CPython performance is indirectly dictionary performance.

## 77.37 Chapter Summary

CPython dictionaries are highly optimized compact hash tables.

Major features include:

```text id="rjlwm9"
open addressing
compact entries
insertion-order preservation
split dictionaries
cached hashes
version tags
string-key optimization
careful probing strategies
```

Dictionary performance affects nearly every Python subsystem:

```text id="9jlwmc"
attribute access
global lookup
imports
function calls
object namespaces
iteration
sets
```

Understanding dictionary internals is essential for understanding CPython runtime behavior and performance.
