5.23 Cache-Aware Hashing

Design hash table layouts that minimize cache misses and align memory access patterns with hardware behavior.

Problem

Hash tables often fail to achieve their theoretical performance because memory access patterns dominate runtime. Even with good hashing and controlled load factor, repeated cache misses can make operations significantly slower than expected.

You need a design that aligns hash table behavior with CPU cache structure.

Solution

Structure the hash table to maximize spatial locality and reduce pointer chasing.

Core strategies:

- use contiguous storage for entries
- reduce per-bucket pointer structures
- keep frequently accessed data together
- align memory to cache lines

A cache-friendly hash table layout:

table:
    array of entries

entry:
    key
    value
    hash_prefix

Optional compact probing:

index = hash(key)
probe within adjacent memory slots
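The layout and probing sketch above can be made concrete in C. This is a minimal, illustrative implementation (the names `centry`, `ctable`, and the fixed sizes are assumptions for the sketch): entries live in one contiguous array, each entry carries a one-byte hash prefix that is checked before the full key comparison, and lookup uses linear probing so consecutive probes stay in adjacent memory.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 64              /* power of two; kept small for the sketch */
#define KEY_LEN 16

typedef struct {
    uint8_t used;                  /* 0 = empty slot */
    uint8_t hash_prefix;           /* low 8 bits of the hash, checked first */
    char    key[KEY_LEN];
    int     value;
} centry;

typedef struct {
    centry slots[TABLE_SIZE];      /* contiguous storage: no per-entry malloc */
} ctable;

static uint64_t hash_str(const char *s) {
    uint64_t h = 1469598103934665603ULL;     /* FNV-1a */
    while (*s) { h ^= (uint8_t)*s++; h *= 1099511628211ULL; }
    return h;
}

/* Linear probing walks adjacent slots, so successive probes usually
   stay within the same few cache lines. */
static int ctable_put(ctable *t, const char *key, int value) {
    uint64_t h = hash_str(key);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        size_t idx = (h + i) & (TABLE_SIZE - 1);
        centry *e = &t->slots[idx];
        if (!e->used || (e->hash_prefix == (uint8_t)h &&
                         strcmp(e->key, key) == 0)) {
            e->used = 1;
            e->hash_prefix = (uint8_t)h;
            strncpy(e->key, key, KEY_LEN - 1);
            e->key[KEY_LEN - 1] = '\0';
            e->value = value;
            return 1;
        }
    }
    return 0;                      /* table full */
}

static int ctable_get(const ctable *t, const char *key, int *out) {
    uint64_t h = hash_str(key);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        size_t idx = (h + i) & (TABLE_SIZE - 1);
        const centry *e = &t->slots[idx];
        if (!e->used) return 0;    /* empty slot ends the probe: key absent */
        if (e->hash_prefix == (uint8_t)h && strcmp(e->key, key) == 0) {
            *out = e->value;
            return 1;
        }
    }
    return 0;
}
```

Note the ordering of checks in `ctable_get`: the one-byte prefix comparison rejects most non-matching slots without touching the full key, which matters when keys are large or stored indirectly.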

Discussion

A hash table is not only an algorithmic structure but also a memory access pattern. CPU performance depends heavily on how data is laid out in memory.

Traditional chaining uses pointers:

bucket -> node -> node -> node

Each pointer dereference may trigger a cache miss. Even if the number of comparisons is small, memory latency dominates runtime.

Open addressing improves locality by storing entries in a contiguous array. Probes move through adjacent memory locations, which are more likely to be preloaded into cache.

However, even open addressing can suffer if the table is poorly sized or if probe sequences become long.

Cache-aware design focuses on reducing the number of distinct cache lines touched per operation.
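One way to reason about cache lines touched per operation is simple sizing arithmetic. Assuming a 64-byte cache line (common on current x86 and ARM hardware, but not guaranteed), a packed 16-byte entry gives four entries per line, so a short probe run can stay within a single line:

```c
#include <stdint.h>

#define CACHE_LINE 64   /* assumed line size; hardware-dependent */

/* A packed 16-byte entry: two 64-bit words, no padding. */
typedef struct { uint64_t key; uint64_t value; } slot16;

_Static_assert(sizeof(slot16) == 16, "expected a packed 16-byte slot");

enum { SLOTS_PER_LINE = CACHE_LINE / sizeof(slot16) };  /* = 4 */
```

The same arithmetic cuts the other way: an entry padded to 40 or 48 bytes straddles lines, and a probe of the same length touches two to three times as many of them.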

Key principle

Most hash table operations should require:

  • one hash computation
  • one to three cache-line accesses
  • minimal pointer dereferences

Correctness

Cache-aware transformations do not change logical correctness.

Correctness depends on:

- consistent hash computation
- correct equality checks
- correct probe or bucket traversal rules

Memory layout only affects how quickly these rules are executed.

As long as all candidate positions are examined according to the probing or bucket rule, correctness is preserved.

Complexity

Asymptotic complexity remains:

expected O(1)

Worst case remains:

O(n)

Cache-aware design improves constant factors, not asymptotic bounds.

In practice, performance differences are large:

  • cache miss: ~50–150 ns
  • arithmetic operation: ~1 ns

Thus reducing cache misses often has more impact than algorithmic optimization.
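A back-of-envelope calculation makes this concrete. Using the rough, hardware-dependent figures above (the miss and per-probe counts are illustrative assumptions, not measurements):

```c
/* Approximate costs in nanoseconds; real values vary by hardware. */
enum { MISS_NS = 100, OP_NS = 1 };

/* Chained lookup: ~4 dependent cache misses + ~8 arithmetic ops. */
enum { CHAINED_NS = 4 * MISS_NS + 8 * OP_NS };    /* ~408 ns */

/* Compact lookup: ~1 cache miss + ~16 arithmetic ops. */
enum { COMPACT_NS = 1 * MISS_NS + 16 * OP_NS };   /* ~116 ns */
```

Even though the compact layout does twice the arithmetic, it comes out several times faster, because memory latency dominates both totals.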

Example

Compare two designs:

Pointer-based chaining

bucket:
    node -> node -> node

Each node may reside in a different cache line.

Compact array storage

entries:
    [k1, v1]
    [k2, v2]
    [k3, v3]

Sequential scanning benefits from prefetching and cache locality.

Even if both implementations perform a similar number of key comparisons, the second is significantly faster on real hardware.
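The contrast can be sketched directly in C (names are illustrative). Both lookups below do the same logical work, but the first chases pointers between separately allocated nodes while the second scans adjacent array slots:

```c
#include <stddef.h>
#include <stdint.h>

/* Pointer-based chaining: each node is a separate allocation, so each
   hop in the loop below may land on a different cache line. */
typedef struct node { uint64_t key; int value; struct node *next; } node;

static int chain_find(const node *head, uint64_t key, int *out) {
    for (const node *n = head; n; n = n->next)    /* pointer chase */
        if (n->key == key) { *out = n->value; return 1; }
    return 0;
}

/* Compact array storage: entries are adjacent, so the scan moves
   through memory sequentially and benefits from hardware prefetching. */
typedef struct { uint64_t key; int value; } pair;

static int array_find(const pair *entries, size_t n, uint64_t key, int *out) {
    for (size_t i = 0; i < n; i++)                /* sequential scan */
        if (entries[i].key == key) { *out = entries[i].value; return 1; }
    return 0;
}
```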

Implementation Notes

Prefer struct-of-arrays or tightly packed arrays in performance-critical systems.

Store hash prefixes alongside keys to avoid full key comparisons when possible.

Align entries to cache line boundaries when beneficial.

Minimize dynamic allocation per entry.

Avoid linked structures inside buckets unless necessary for correctness or hybrid fallback.

Keep probe sequences short by keeping the load factor low.

Use prefetch hints where the compiler or platform exposes them.
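Two of the notes above, alignment and prefetching, can be sketched in C11. The bucket layout here is a hypothetical one (the name `abucket` and the 7-slot split are assumptions, loosely in the spirit of bucketized "Swiss table"-style designs): one occupancy byte, seven hash-prefix bytes, and seven keys pack into exactly one 64-byte line.

```c
#include <stdalign.h>
#include <stdint.h>

/* Cache-line-aligned bucket: metadata first, keys second, no padding. */
typedef struct {
    alignas(64) uint8_t used_bits;   /* one occupancy bit per slot */
    uint8_t  prefixes[7];            /* 1-byte hash prefixes, checked first */
    uint64_t keys[7];
} abucket;

_Static_assert(sizeof(abucket) == 64, "one bucket per cache line");

/* Prefetch hint with a portable no-op fallback (GCC/Clang builtin). */
#if defined(__GNUC__) || defined(__clang__)
#  define PREFETCH(p) __builtin_prefetch(p)
#else
#  define PREFETCH(p) ((void)0)
#endif
```

In a probe loop, `PREFETCH(&buckets[next])` issued while scanning the current bucket can hide part of the memory latency of the next one; whether it helps in practice must be measured, since misplaced prefetches can also evict useful data.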

Common Mistakes

Assuming asymptotic complexity reflects real performance.

Using pointer-heavy structures in hot paths.

Ignoring cache line boundaries and memory layout.

Over-optimizing hashing while neglecting memory access patterns.

Allowing long probe sequences due to high load factor.

Treating caching as an implementation detail rather than a core design constraint.