Skip to content

77. Dictionary Performance

Compact dict memory layout, hash collision probing strategy, split-table sharing, and lookup specialization.

Dictionaries are one of the most important data structures in CPython.

Python uses dictionaries for:

global namespaces
module namespaces
instance attributes
class attributes
keyword arguments
caches
sets internals
JSON-like data
configuration objects
symbol tables

Many Python operations ultimately become dictionary operations.

Examples:

x = globals()["name"]
obj.attr
module.value
kwargs["limit"]

Because dictionaries are everywhere, CPython invests heavily in dictionary performance.

Modern CPython dictionaries are compact, cache-friendly, insertion-ordered hash tables with specialized optimizations for common Python workloads.

77.1 Basic Dictionary Semantics

A dictionary maps keys to values.

Example:

d = {
    "name": "Alice",
    "age": 30
}

Core operations:

insert key/value
lookup key
update value
delete key
iterate entries

Python dictionaries support arbitrary hashable keys.

Examples:

d[1]
d["x"]
d[(1, 2)]
d[custom_object]

The implementation must therefore support:

dynamic types
custom hashing
custom equality
resizing
mutation during execution

while remaining fast.

77.2 Hash Tables

CPython dictionaries are hash tables.

A hash table works by:

compute hash(key)
map hash to table position
probe for matching entry

Example:

d["name"]

Conceptually:

hash("name")
table index
probe sequence
find matching key
return value

The goal is average-case constant-time lookup.

77.3 Hash Values

Every dictionary key must be hashable.

A hashable object provides:

__hash__()
__eq__()

CPython stores the computed hash value alongside dictionary entries.

This avoids recomputing hashes repeatedly during probing.

Conceptually, an entry contains:

hash
key pointer
value pointer

Caching hash values matters because string hashing and custom object hashing can be expensive.

77.4 Open Addressing

CPython dictionaries use open addressing rather than separate chaining.

Separate chaining:

table slot
linked list of entries

Open addressing:

table slot
probe nearby slots

Advantages of open addressing:

better cache locality
fewer allocations
compact memory layout

Disadvantages:

more complex probing logic
performance sensitive to load factor
deletion handling complexity

CPython strongly favors locality and compactness.

77.5 Probing

Collisions occur when different keys map to the same slot.

CPython resolves collisions using probing.

Conceptually:

initial slot occupied
probe another slot
probe another slot
continue until match or empty slot

The probing sequence is designed to spread entries across the table while remaining cache-friendly.

Good probing behavior is critical for performance.

Poor probing causes:

long lookup chains
cache misses
branch mispredictions
slower inserts
slower deletions

77.6 Dictionary Entries

Modern CPython dictionaries separate metadata from entries.

Conceptually:

indices array
entries array

The indices table maps hash positions to entry positions.

The entries table stores actual key/value data.

This design improves compactness and iteration performance.

Older dictionary layouts stored many unused fields directly in hash slots. Modern layouts reduce wasted space.

77.7 Compact Dictionaries

Modern CPython dictionaries are compact.

A compact dictionary minimizes:

unused pointer fields
padding
sparse entry storage
allocation overhead

This matters because Python programs often create huge numbers of small dictionaries:

{"x": 1}
{"name": "..."}
{"id": 123}

Memory efficiency directly affects:

cache locality
allocation cost
overall process memory usage

Compact dictionaries were one of the major dictionary redesigns in CPython 3.6.

77.8 Insertion Order Preservation

Modern Python dictionaries preserve insertion order.

Example:

d = {}
d["a"] = 1
d["b"] = 2
d["c"] = 3

print(list(d.keys()))

Output:

['a', 'b', 'c']

This behavior became a language guarantee in Python 3.7.

The compact dictionary layout naturally supported ordered iteration efficiently.

Importantly:

ordered iteration

does not mean:

sorted order

The order reflects insertion history.

77.9 Split Dictionaries

CPython uses split dictionaries for many object instance dictionaries.

Consider:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

Many Point instances share the same attribute names:

x
y

A naive representation duplicates key objects in every instance dictionary.

Split dictionaries separate:

shared keys
per-instance values

Conceptually:

class layout:
    keys = [x, y]

instance 1:
    values = [1, 2]

instance 2:
    values = [5, 8]

This saves memory across large numbers of similar objects.

77.10 Combined Dictionaries

Not all dictionaries use split layout.

Ordinary dictionaries usually use combined layout:

entry contains:
    hash
    key
    value

Combined dictionaries are simpler and flexible.

Split dictionaries optimize common instance attribute patterns.

CPython chooses layouts based on usage.

77.11 String-Key Optimization

Many Python dictionaries use string keys:

{"name": "Alice"}
{"id": 123}
globals()

CPython optimizes string-key lookup heavily.

Advantages of string keys:

cached hash values
interning opportunities
pointer comparisons
stable immutability

Attribute dictionaries and global namespaces benefit strongly from string-key specialization.

77.12 Interned Strings

Some strings are interned.

Interning means identical immutable strings may share the same object.

Example:

"a" is "a"

often evaluates to:

True

for interned strings.

Interning helps dictionary performance because equality checks may become:

pointer comparison

instead of character-by-character comparison.

CPython interns many identifier-like strings automatically.

77.13 Hash Randomization

CPython uses randomized hashing for some types such as strings.

Without randomization, attackers could deliberately construct many colliding keys:

same hash
different strings

This could force pathological probing behavior:

O(n²)-like degradation

Hash randomization makes collision attacks harder.

The randomized seed changes between processes.

This improves security for network-facing programs.

77.14 Load Factor

Dictionary performance depends on load factor.

Load factor means:

used_slots / total_slots

If the table becomes too full:

probing chains grow longer
performance degrades

CPython resizes dictionaries before they become overly crowded.

This trades memory for speed.

Dictionaries intentionally keep unused capacity to maintain efficient probing.

77.15 Resizing

When a dictionary grows large enough, CPython resizes it.

Conceptually:

allocate larger table
rehash entries
move entries
replace old table

Resizing is expensive, but infrequent.

Average insertion remains approximately constant-time because resizing happens geometrically rather than linearly.

A single insertion may trigger a large resize operation, but most insertions do not.

77.16 Deletion

Deletion is tricky in open-addressed hash tables.

Removing an entry completely can break probe chains.

CPython therefore uses tombstone-like markers.

Conceptually:

slot previously occupied
now marked deleted

Deleted slots remain part of probing logic.

Over time, many deletions can degrade table quality.

Resizing and rebuilding help clean up deleted markers.

77.17 Lookup Fast Paths

Dictionary lookup is highly optimized.

Hot lookup paths avoid:

unnecessary allocations
repeated hash computation
generic comparisons

Optimizations include:

cached hashes
string specialization
pointer equality checks
compact probing
inline helper functions

Dictionary lookup is one of the most performance-critical operations in CPython.

77.18 Globals and Builtins

Global lookup depends heavily on dictionaries.

Example:

print(len(xs))

This may involve:

globals dictionary lookup
builtins dictionary lookup

CPython uses inline caches and dictionary version tags to optimize repeated lookups.

Still, the underlying namespace representation is dictionary-based.

77.19 Attribute Access

Object attribute access often becomes dictionary lookup.

Example:

obj.x

may involve:

instance dictionary lookup
class dictionary lookup
descriptor lookup

Dictionary performance directly affects attribute access performance.

This is one reason instance dictionary optimization matters so much.

77.20 Dictionary Version Tags

Modern CPython dictionaries carry version information.

A version changes when the dictionary mutates.

Inline caches can use this:

if dict version unchanged:
    cached lookup still valid

This enables fast global and attribute lookup specialization.

Without version tags, caches would need more expensive validation logic.

77.21 Iteration Performance

Dictionary iteration is optimized for compact sequential traversal.

Example:

for k, v in d.items():
    ...

The compact entries layout improves iteration locality.

Iteration cost depends partly on:

table density
cache locality
number of deleted slots

Compact dictionaries improved iteration speed significantly compared to older sparse layouts.

77.22 Memory Layout

Modern dictionary memory layout prioritizes:

compactness
locality
reduced pointer chasing
efficient iteration

The internal structures are carefully arranged to minimize wasted memory.

This matters because dictionaries are among the most common objects in Python programs.

Small per-dictionary savings scale enormously across large applications.

77.23 Dictionaries and Sets

Sets reuse much of the same hash-table machinery.

A set is conceptually:

dictionary with keys only

Many probing, hashing, resizing, and collision-resolution mechanisms are shared.

Set performance characteristics therefore resemble dictionary performance characteristics.

77.24 Equality and Hashing Costs

Dictionary performance depends not only on table structure but also on key behavior.

Expensive key operations hurt performance:

class Slow:
    def __hash__(self):
        ...

    def __eq__(self, other):
        ...

Lookup may require:

hash computation
equality comparisons
probing

Good hash functions should:

distribute keys evenly
be stable
be reasonably fast

Poor hashing increases collisions.

77.25 Unicode Keys

Unicode strings are common dictionary keys.

CPython optimizes Unicode representation heavily because dictionaries and attribute lookup depend on them constantly.

Important properties:

immutable
cached hash
interning support
compact representation

Efficient Unicode handling is deeply tied to dictionary performance.

77.26 Why Dictionaries Are Fast

Python dictionaries are fast because they combine:

compact layout
cache locality
open addressing
cached hashes
careful probing
version tagging
specialized string handling
resizing heuristics

The implementation is heavily tuned for real Python workloads.

This is one of the most engineered subsystems in CPython.

77.27 Worst-Case Behavior

Average lookup is near constant-time.

Worst-case behavior can still degrade.

Bad cases include:

heavy collisions
pathological hashing
extreme deletion patterns
adversarial inputs

Hash randomization helps defend against deliberate attacks.

Still, no practical hash table guarantees perfect constant time in all cases.

77.28 Dictionary Views

Dictionary methods such as:

d.keys()
d.values()
d.items()

return dynamic view objects.

These views reflect dictionary mutations.

Example:

d = {"a": 1}
k = d.keys()

d["b"] = 2

print(k)

The view updates automatically.

The implementation avoids copying dictionary contents unnecessarily.

77.29 OrderedDict

Modern dictionaries preserve insertion order, but collections.OrderedDict still exists.

OrderedDict provides additional ordering-related operations:

move_to_end
reordering behavior
specialized equality semantics

Ordinary dictionaries preserve insertion order efficiently without a linked-list overlay.

77.30 Small Dictionaries

Small dictionaries are extremely common.

Examples:

{"x": 1}
{"id": 3}
{"name": "Alice"}

CPython optimizes small-table behavior carefully because even tiny improvements matter globally.

Common goals:

fewer allocations
compact storage
reduced metadata
efficient resizing

77.31 Dictionaries and the Compiler

The compiler also depends on dictionaries.

Examples:

symbol tables
constant tables
name lookup tables
module namespaces

Efficient dictionaries improve both runtime execution and compilation.

77.32 Dictionaries and the Import System

The import system heavily uses dictionaries.

Examples:

module globals
sys.modules
package namespaces
import caches

Import performance depends partly on dictionary efficiency.

77.33 Dictionaries and Object Models

Python’s dynamic object model depends fundamentally on dictionaries.

Features such as:

dynamic attributes
monkey patching
runtime method replacement
dynamic modules
dynamic globals

all rely on dictionary mutation.

The flexibility of Python objects comes partly from dictionary-backed namespaces.

77.34 Practical Performance Implications

Dictionary internals explain many practical performance patterns.

PatternReason
Local variables faster than globalsGlobals require dictionary lookup
Attribute access costs matterOften requires dictionary operations
Stable attribute layouts helpBetter cache specialization
String keys are efficientOptimized hashing and lookup
Large numbers of small objects cost memoryInstance dictionaries consume space

These are not arbitrary quirks. They follow from the object model and dictionary implementation.

77.35 Reading Dictionary Source Code

Important files:

FilePurpose
Objects/dictobject.cMain dictionary implementation
Include/cpython/dictobject.hPublic dictionary structures
Objects/setobject.cRelated set implementation
Python/ceval.cGlobal lookup and inline caches
Objects/object.cGeneric object operations

dictobject.c is one of the most performance-sensitive files in CPython.

Many implementation details exist primarily for locality and branch efficiency.

77.36 Mental Model

A useful mental model:

CPython dictionaries are compact ordered hash tables optimized for real Python workloads.

Key properties:

fast average lookup
compact memory layout
ordered iteration
specialized string handling
cache-friendly probing
dynamic mutation support

Dictionaries are central to Python execution.

Much of CPython performance is indirectly dictionary performance.

77.37 Chapter Summary

CPython dictionaries are highly optimized compact hash tables.

Major features include:

open addressing
compact entries
insertion-order preservation
split dictionaries
cached hashes
version tags
string-key optimization
careful probing strategies

Dictionary performance affects nearly every Python subsystem:

attribute access
global lookup
imports
function calls
object namespaces
iteration
sets

Understanding dictionary internals is essential for understanding CPython runtime behavior and performance.