5.17 Cache Behavior

Understand how memory hierarchy effects cause hash table performance to deviate from asymptotic expectations.

Problem

Hash tables offer expected constant-time operations, but real performance often deviates from that expectation because of memory hierarchy effects. Poor cache locality can dominate runtime even when asymptotic complexity is optimal.

You need to understand how hash table design interacts with CPU caches and memory access patterns.

Solution

Design hash-based structures to maximize spatial locality and minimize unpredictable memory access.

Key principles:

- Prefer contiguous memory layouts
- Reduce pointer indirection
- Keep probe sequences short
- Align data to cache lines

Discussion

Modern CPUs operate on a memory hierarchy. Access to registers is fastest, followed by L1 cache, then L2, L3, and finally main memory. A cache miss can cost orders of magnitude more than an arithmetic operation.

Hash tables often suffer from poor locality because hashing spreads keys across memory. Each lookup may access a seemingly random location, causing cache misses.

Separate chaining typically uses pointers to linked structures. A lookup touches the bucket, then follows pointers to entries. Each pointer may lead to a different cache line, increasing latency.
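A minimal sketch of this layout (Python used as an illustrative language; the `Node` and `ChainedTable` names are hypothetical, and deletion and in-place update are omitted):

```python
class Node:
    """One chain entry; each instance lives at its own heap address."""
    __slots__ = ("key", "value", "next")

    def __init__(self, key, value, next=None):
        self.key, self.value, self.next = key, value, next

class ChainedTable:
    def __init__(self, capacity=8):
        self.buckets = [None] * capacity  # each slot heads a linked list

    def insert(self, key, value):
        i = hash(key) % len(self.buckets)
        self.buckets[i] = Node(key, value, self.buckets[i])  # prepend

    def lookup(self, key):
        node = self.buckets[hash(key) % len(self.buckets)]
        while node is not None:  # each hop may land on a different cache line
            if node.key == key:
                return node.value
            node = node.next
        return None
```

Every `node.next` hop is a pointer dereference to a potentially distant address, which is exactly the access pattern that defeats the cache.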

Open addressing stores entries in a contiguous array. Probing walks through adjacent memory locations. Even if multiple probes are needed, they often remain within the same cache line or nearby lines. This improves performance despite more comparisons.
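A corresponding linear-probing sketch (hypothetical `LinearProbingTable`; resizing and deletion are omitted, so it assumes the table never fills):

```python
class LinearProbingTable:
    EMPTY = object()  # sentinel marking unused slots

    def __init__(self, capacity=16):
        self.keys = [self.EMPTY] * capacity   # contiguous arrays:
        self.values = [None] * capacity       # probes walk adjacent slots

    def insert(self, key, value):
        i = hash(key) % len(self.keys)
        while self.keys[i] is not self.EMPTY and self.keys[i] != key:
            i = (i + 1) % len(self.keys)      # step to the next adjacent slot
        self.keys[i], self.values[i] = key, value

    def lookup(self, key):
        i = hash(key) % len(self.keys)
        while self.keys[i] is not self.EMPTY:
            if self.keys[i] == key:
                return self.values[i]
            i = (i + 1) % len(self.keys)
        return None
```

Consecutive probes touch `keys[i]`, `keys[i+1]`, and so on, so several probes often fall within one cache line already loaded by the first access.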

Cache behavior explains why linear probing can outperform collision-resolution strategies with better theoretical probe counts, such as double hashing: sequential memory access benefits from hardware prefetching and cache-line reuse.

Correctness

Cache behavior does not affect correctness. It affects only performance. All designs that preserve the placement invariant and equality checks remain logically correct.

Complexity

Asymptotic complexity does not capture cache effects. Two implementations with the same O(1) expected time may differ by large constant factors.

Effective cost model:

total_cost ≈ hash_cost + memory_access_cost + comparison_cost

Memory access cost dominates when accesses miss cache.

Reducing the number of cache misses often yields larger gains than reducing arithmetic operations.
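A back-of-envelope application of this model. The latencies below are illustrative round numbers, not measurements:

```python
def total_cost(hash_ns, accesses, miss_rate, miss_ns, hit_ns, comparisons, cmp_ns):
    """Estimate lookup cost in nanoseconds under the cost model above."""
    memory = accesses * (miss_rate * miss_ns + (1 - miss_rate) * hit_ns)
    return hash_ns + memory + comparisons * cmp_ns

# Chaining: 3 scattered accesses, mostly misses, 2 key comparisons.
chaining = total_cost(5, accesses=3, miss_rate=0.8, miss_ns=100, hit_ns=4,
                      comparisons=2, cmp_ns=1)   # roughly 249 ns

# Linear probing: 4 adjacent accesses, mostly hits after the first, 4 comparisons.
probing = total_cost(5, accesses=4, miss_rate=0.25, miss_ns=100, hit_ns=4,
                     comparisons=4, cmp_ns=1)    # roughly 121 ns
```

Even with one more access and twice the comparisons, probing wins in this scenario because its miss rate is far lower.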

Example

Compare two implementations:

Separate chaining:

bucket -> pointer -> entry -> pointer -> next entry

Open addressing:

array[i], array[i+1], array[i+2], ...

In chaining, each pointer dereference may cause a cache miss. In open addressing, probes access nearby memory, often already loaded into cache.

Even if chaining examines fewer entries, it may be slower due to memory latency.
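One way to make the contrast concrete without hardware counters is to count distinct cache lines touched by each access pattern, assuming 64-byte lines that hold 8 entries (a simulation sketch; the slot indices and node offsets are invented):

```python
LINE_ENTRIES = 8  # assumption: 64-byte lines, 8-byte entries

def lines_touched(indices):
    """Count distinct cache lines covered by a sequence of slot indices."""
    return len({i // LINE_ENTRIES for i in indices})

# Open addressing: 4 adjacent probes starting at slot 1000.
open_probes = [1000, 1001, 1002, 1003]   # all in one line

# Chaining: 3 nodes at scattered (invented) heap offsets.
chain_nodes = [12345, 678901, 43210]     # three separate lines
```

Here chaining examines fewer entries (3 versus 4) yet touches three times as many cache lines, which is the pattern the paragraph above describes.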

Implementation Notes

- Use arrays or small vectors instead of linked lists for buckets.
- Group metadata and data together to reduce memory jumps.
- Use structure-of-arrays layouts when scanning many entries.
- Avoid storing large objects inline in the table; store references or compact representations.
- Keep the load factor within a range that avoids long probe sequences.
- Use prefetching, when available, for predictable probe patterns.
- Align buckets or entries to cache line boundaries when possible.
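The load-factor guard can be sketched as a rehash-and-double step (the 0.7 threshold and the `None`-as-empty convention are assumptions, and `probe_insert` is a hypothetical helper):

```python
MAX_LOAD = 0.7  # assumed threshold; real implementations vary

def probe_insert(keys, values, key, value):
    """Linear-probe insert; None marks empty slots, so None keys are unsupported."""
    i = hash(key) % len(keys)
    while keys[i] is not None and keys[i] != key:
        i = (i + 1) % len(keys)
    keys[i], values[i] = key, value

def maybe_grow(keys, values, count):
    """Double capacity and rehash once the load factor exceeds MAX_LOAD."""
    if count / len(keys) <= MAX_LOAD:
        return keys, values
    new_keys = [None] * (2 * len(keys))
    new_values = [None] * (2 * len(keys))
    for k, v in zip(keys, values):
        if k is not None:
            probe_insert(new_keys, new_values, k, v)  # rehash into new arrays
    return new_keys, new_values
```

Growing early keeps probe sequences short, which keeps each lookup within a small, cache-friendly window of slots.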

Common Mistakes

- Assuming fewer operations always means faster execution.
- Using pointer-heavy structures in performance-critical paths.
- Ignoring memory layout when designing data structures.
- Allowing the load factor to grow too high, increasing probe lengths and cache misses.
- Benchmarking only on small inputs that fit entirely in cache.
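To avoid the last pitfall, sweep input sizes from cache-resident to cache-spilling (a sketch using Python's built-in dict as a stand-in for any hash table; the sizes are illustrative and should be tuned to your cache hierarchy):

```python
import random
import timeit

def bench_lookups(n, trials=1000, number=5):
    """Time random lookups in a table of n entries; larger n spills out of cache."""
    table = {i: i for i in range(n)}
    keys = [random.randrange(n) for _ in range(trials)]
    return timeit.timeit(lambda: [table[k] for k in keys], number=number)

# Sweep from cache-resident to cache-spilling working sets.
for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9} entries: {bench_lookups(n):.4f}s")
```

If per-lookup time stays flat across the sweep, the benchmark never left cache and says little about real workloads.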