Compact dict memory layout, hash collision probing strategy, split-table sharing, and lookup specialization.
Dictionaries are one of the most important data structures in CPython.
Python uses dictionaries for:
global namespaces
module namespaces
instance attributes
class attributes
keyword arguments
caches
sets internals
JSON-like data
configuration objects
symbol tablesMany Python operations ultimately become dictionary operations.
Examples:
x = globals()["name"]
obj.attr
module.value
kwargs["limit"]Because dictionaries are everywhere, CPython invests heavily in dictionary performance.
Modern CPython dictionaries are compact, cache-friendly, insertion-ordered hash tables with specialized optimizations for common Python workloads.
77.1 Basic Dictionary Semantics
A dictionary maps keys to values.
Example:
d = {
"name": "Alice",
"age": 30
}Core operations:
insert key/value
lookup key
update value
delete key
iterate entriesPython dictionaries support arbitrary hashable keys.
Examples:
d[1]
d["x"]
d[(1, 2)]
d[custom_object]The implementation must therefore support:
dynamic types
custom hashing
custom equality
resizing
mutation during executionwhile remaining fast.
77.2 Hash Tables
CPython dictionaries are hash tables.
A hash table works by:
compute hash(key)
map hash to table position
probe for matching entryExample:
d["name"]Conceptually:
hash("name")
↓
table index
↓
probe sequence
↓
find matching key
↓
return valueThe goal is average-case constant-time lookup.
77.3 Hash Values
Every dictionary key must be hashable.
A hashable object provides:
__hash__()
__eq__()CPython stores the computed hash value alongside dictionary entries.
This avoids recomputing hashes repeatedly during probing.
Conceptually, an entry contains:
hash
key pointer
value pointerCaching hash values matters because string hashing and custom object hashing can be expensive.
77.4 Open Addressing
CPython dictionaries use open addressing rather than separate chaining.
Separate chaining:
table slot
↓
linked list of entriesOpen addressing:
table slot
↓
probe nearby slotsAdvantages of open addressing:
better cache locality
fewer allocations
compact memory layoutDisadvantages:
more complex probing logic
performance sensitive to load factor
deletion handling complexityCPython strongly favors locality and compactness.
77.5 Probing
Collisions occur when different keys map to the same slot.
CPython resolves collisions using probing.
Conceptually:
initial slot occupied
↓
probe another slot
↓
probe another slot
↓
continue until match or empty slotThe probing sequence is designed to spread entries across the table while remaining cache-friendly.
Good probing behavior is critical for performance.
Poor probing causes:
long lookup chains
cache misses
branch mispredictions
slower inserts
slower deletions77.6 Dictionary Entries
Modern CPython dictionaries separate metadata from entries.
Conceptually:
indices array
entries arrayThe indices table maps hash positions to entry positions.
The entries table stores actual key/value data.
This design improves compactness and iteration performance.
Older dictionary layouts stored many unused fields directly in hash slots. Modern layouts reduce wasted space.
77.7 Compact Dictionaries
Modern CPython dictionaries are compact.
A compact dictionary minimizes:
unused pointer fields
padding
sparse entry storage
allocation overheadThis matters because Python programs often create huge numbers of small dictionaries:
{"x": 1}
{"name": "..."}
{"id": 123}Memory efficiency directly affects:
cache locality
allocation cost
overall process memory usageCompact dictionaries were one of the major dictionary redesigns in CPython 3.6.
77.8 Insertion Order Preservation
Modern Python dictionaries preserve insertion order.
Example:
d = {}
d["a"] = 1
d["b"] = 2
d["c"] = 3
print(list(d.keys()))Output:
['a', 'b', 'c']This behavior became a language guarantee in Python 3.7.
The compact dictionary layout naturally supported ordered iteration efficiently.
Importantly:
ordered iterationdoes not mean:
sorted orderThe order reflects insertion history.
77.9 Split Dictionaries
CPython uses split dictionaries for many object instance dictionaries.
Consider:
class Point:
def __init__(self, x, y):
self.x = x
self.y = yMany Point instances share the same attribute names:
x
yA naive representation duplicates key objects in every instance dictionary.
Split dictionaries separate:
shared keys
per-instance valuesConceptually:
class layout:
keys = [x, y]
instance 1:
values = [1, 2]
instance 2:
values = [5, 8]This saves memory across large numbers of similar objects.
77.10 Combined Dictionaries
Not all dictionaries use split layout.
Ordinary dictionaries usually use combined layout:
entry contains:
hash
key
valueCombined dictionaries are simpler and flexible.
Split dictionaries optimize common instance attribute patterns.
CPython chooses layouts based on usage.
77.11 String-Key Optimization
Many Python dictionaries use string keys:
{"name": "Alice"}
{"id": 123}
globals()CPython optimizes string-key lookup heavily.
Advantages of string keys:
cached hash values
interning opportunities
pointer comparisons
stable immutabilityAttribute dictionaries and global namespaces benefit strongly from string-key specialization.
77.12 Interned Strings
Some strings are interned.
Interning means identical immutable strings may share the same object.
Example:
"a" is "a"often evaluates to:
Truefor interned strings.
Interning helps dictionary performance because equality checks may become:
pointer comparisoninstead of character-by-character comparison.
CPython interns many identifier-like strings automatically.
77.13 Hash Randomization
CPython uses randomized hashing for some types such as strings.
Without randomization, attackers could deliberately construct many colliding keys:
same hash
different stringsThis could force pathological probing behavior:
O(n²)-like degradationHash randomization makes collision attacks harder.
The randomized seed changes between processes.
This improves security for network-facing programs.
77.14 Load Factor
Dictionary performance depends on load factor.
Load factor means:
used_slots / total_slotsIf the table becomes too full:
probing chains grow longer
performance degradesCPython resizes dictionaries before they become overly crowded.
This trades memory for speed.
Dictionaries intentionally keep unused capacity to maintain efficient probing.
77.15 Resizing
When a dictionary grows large enough, CPython resizes it.
Conceptually:
allocate larger table
rehash entries
move entries
replace old tableResizing is expensive, but infrequent.
Average insertion remains approximately constant-time because resizing happens geometrically rather than linearly.
A single insertion may trigger a large resize operation, but most insertions do not.
77.16 Deletion
Deletion is tricky in open-addressed hash tables.
Removing an entry completely can break probe chains.
CPython therefore uses tombstone-like markers.
Conceptually:
slot previously occupied
now marked deletedDeleted slots remain part of probing logic.
Over time, many deletions can degrade table quality.
Resizing and rebuilding help clean up deleted markers.
77.17 Lookup Fast Paths
Dictionary lookup is highly optimized.
Hot lookup paths avoid:
unnecessary allocations
repeated hash computation
generic comparisonsOptimizations include:
cached hashes
string specialization
pointer equality checks
compact probing
inline helper functionsDictionary lookup is one of the most performance-critical operations in CPython.
77.18 Globals and Builtins
Global lookup depends heavily on dictionaries.
Example:
print(len(xs))This may involve:
globals dictionary lookup
builtins dictionary lookupCPython uses inline caches and dictionary version tags to optimize repeated lookups.
Still, the underlying namespace representation is dictionary-based.
77.19 Attribute Access
Object attribute access often becomes dictionary lookup.
Example:
obj.xmay involve:
instance dictionary lookup
class dictionary lookup
descriptor lookupDictionary performance directly affects attribute access performance.
This is one reason instance dictionary optimization matters so much.
77.20 Dictionary Version Tags
Modern CPython dictionaries carry version information.
A version changes when the dictionary mutates.
Inline caches can use this:
if dict version unchanged:
cached lookup still validThis enables fast global and attribute lookup specialization.
Without version tags, caches would need more expensive validation logic.
77.21 Iteration Performance
Dictionary iteration is optimized for compact sequential traversal.
Example:
for k, v in d.items():
...The compact entries layout improves iteration locality.
Iteration cost depends partly on:
table density
cache locality
number of deleted slotsCompact dictionaries improved iteration speed significantly compared to older sparse layouts.
77.22 Memory Layout
Modern dictionary memory layout prioritizes:
compactness
locality
reduced pointer chasing
efficient iterationThe internal structures are carefully arranged to minimize wasted memory.
This matters because dictionaries are among the most common objects in Python programs.
Small per-dictionary savings scale enormously across large applications.
77.23 Dictionaries and Sets
Sets reuse much of the same hash-table machinery.
A set is conceptually:
dictionary with keys onlyMany probing, hashing, resizing, and collision-resolution mechanisms are shared.
Set performance characteristics therefore resemble dictionary performance characteristics.
77.24 Equality and Hashing Costs
Dictionary performance depends not only on table structure but also on key behavior.
Expensive key operations hurt performance:
class Slow:
def __hash__(self):
...
def __eq__(self, other):
...Lookup may require:
hash computation
equality comparisons
probingGood hash functions should:
distribute keys evenly
be stable
be reasonably fastPoor hashing increases collisions.
77.25 Unicode Keys
Unicode strings are common dictionary keys.
CPython optimizes Unicode representation heavily because dictionaries and attribute lookup depend on them constantly.
Important properties:
immutable
cached hash
interning support
compact representationEfficient Unicode handling is deeply tied to dictionary performance.
77.26 Why Dictionaries Are Fast
Python dictionaries are fast because they combine:
compact layout
cache locality
open addressing
cached hashes
careful probing
version tagging
specialized string handling
resizing heuristicsThe implementation is heavily tuned for real Python workloads.
This is one of the most engineered subsystems in CPython.
77.27 Worst-Case Behavior
Average lookup is near constant-time.
Worst-case behavior can still degrade.
Bad cases include:
heavy collisions
pathological hashing
extreme deletion patterns
adversarial inputsHash randomization helps defend against deliberate attacks.
Still, no practical hash table guarantees perfect constant time in all cases.
77.28 Dictionary Views
Dictionary methods such as:
d.keys()
d.values()
d.items()return dynamic view objects.
These views reflect dictionary mutations.
Example:
d = {"a": 1}
k = d.keys()
d["b"] = 2
print(k)The view updates automatically.
The implementation avoids copying dictionary contents unnecessarily.
77.29 OrderedDict
Modern dictionaries preserve insertion order, but collections.OrderedDict still exists.
OrderedDict provides additional ordering-related operations:
move_to_end
reordering behavior
specialized equality semanticsOrdinary dictionaries preserve insertion order efficiently without a linked-list overlay.
77.30 Small Dictionaries
Small dictionaries are extremely common.
Examples:
{"x": 1}
{"id": 3}
{"name": "Alice"}CPython optimizes small-table behavior carefully because even tiny improvements matter globally.
Common goals:
fewer allocations
compact storage
reduced metadata
efficient resizing77.31 Dictionaries and the Compiler
The compiler also depends on dictionaries.
Examples:
symbol tables
constant tables
name lookup tables
module namespacesEfficient dictionaries improve both runtime execution and compilation.
77.32 Dictionaries and the Import System
The import system heavily uses dictionaries.
Examples:
module globals
sys.modules
package namespaces
import cachesImport performance depends partly on dictionary efficiency.
77.33 Dictionaries and Object Models
Python’s dynamic object model depends fundamentally on dictionaries.
Features such as:
dynamic attributes
monkey patching
runtime method replacement
dynamic modules
dynamic globalsall rely on dictionary mutation.
The flexibility of Python objects comes partly from dictionary-backed namespaces.
77.34 Practical Performance Implications
Dictionary internals explain many practical performance patterns.
| Pattern | Reason |
|---|---|
| Local variables faster than globals | Globals require dictionary lookup |
| Attribute access costs matter | Often requires dictionary operations |
| Stable attribute layouts help | Better cache specialization |
| String keys are efficient | Optimized hashing and lookup |
| Large numbers of small objects cost memory | Instance dictionaries consume space |
These are not arbitrary quirks. They follow from the object model and dictionary implementation.
77.35 Reading Dictionary Source Code
Important files:
| File | Purpose |
|---|---|
Objects/dictobject.c | Main dictionary implementation |
Include/cpython/dictobject.h | Public dictionary structures |
Objects/setobject.c | Related set implementation |
Python/ceval.c | Global lookup and inline caches |
Objects/object.c | Generic object operations |
dictobject.c is one of the most performance-sensitive files in CPython.
Many implementation details exist primarily for locality and branch efficiency.
77.36 Mental Model
A useful mental model:
CPython dictionaries are compact ordered hash tables optimized for real Python workloads.Key properties:
fast average lookup
compact memory layout
ordered iteration
specialized string handling
cache-friendly probing
dynamic mutation supportDictionaries are central to Python execution.
Much of CPython performance is indirectly dictionary performance.
77.37 Chapter Summary
CPython dictionaries are highly optimized compact hash tables.
Major features include:
open addressing
compact entries
insertion-order preservation
split dictionaries
cached hashes
version tags
string-key optimization
careful probing strategiesDictionary performance affects nearly every Python subsystem:
attribute access
global lookup
imports
function calls
object namespaces
iteration
setsUnderstanding dictionary internals is essential for understanding CPython runtime behavior and performance.