Skip to content

Chapter 5. Hashing and Maps

Hash tables, hash functions, collision handling, probabilistic structures, and practical design for constant-time key-value access.

This chapter studies hashing as a practical mechanism for constant time access under realistic constraints. The focus is not on idealized O(1) claims, but on how hash tables behave under load, how collisions are handled, and how design choices affect predictability, memory locality, and adversarial robustness.

A hash table implements a partial function from keys to values by mapping keys into a finite index space. The mapping is defined by a hash function, and the core invariant is that equal keys must map to equal hashes. Everything else is a trade off between distribution quality, speed, and memory layout. The chapter makes these trade offs explicit so that you can choose an implementation that matches your workload.

Collision handling is treated as a first class concern. Separate chaining stores colliding entries in auxiliary structures, typically lists or small arrays. Open addressing stores all entries in the table itself and resolves collisions by probing. These approaches have different cache behavior and different failure modes. Chaining tolerates higher load factors but introduces pointer overhead. Open addressing improves locality but degrades sharply as the table fills. The chapter shows when each approach is appropriate and how to tune parameters such as load factor and probe sequence.

Hash function design is addressed from a systems perspective. A good hash function distributes inputs uniformly across buckets while remaining fast to compute. In practice, perfect uniformity is not required, but pathological clustering must be avoided. The chapter covers simple integer hashing, string hashing, and composite key hashing, with attention to mixing functions and avalanche properties. It also discusses deterministic versus randomized hashing, especially in contexts where adversarial inputs are possible.

Maps and sets are built on top of hashing. A set stores keys without associated values, while a map associates each key with a value. Common patterns include frequency counting, grouping, deduplication, and membership testing. These patterns appear repeatedly in higher level algorithms, so the chapter emphasizes minimal interfaces and predictable semantics. Operations such as insert, delete, and lookup are analyzed not only in average case terms but also in terms of worst case degradation.

Advanced probabilistic structures extend the basic hashing model. Bloom filters provide approximate membership with controlled false positive rates and fixed memory usage. Count-min sketches approximate frequency counts in streaming settings. These structures sacrifice exactness to gain space efficiency and speed. The chapter explains their guarantees, parameter choices, and appropriate use cases.

Another recurring theme is the interaction between hashing and memory systems. Cache locality often dominates theoretical complexity. Open addressing with linear probing can outperform more complex schemes because it accesses contiguous memory. Chaining can degrade performance due to pointer chasing. The chapter provides guidelines for aligning data structures with modern hardware behavior.

Finally, the chapter addresses correctness and robustness. Poor hash functions or incorrect equality definitions can silently corrupt results. Rehashing strategies must preserve all entries while resizing the table. Edge cases such as empty keys, large inputs, and high collision scenarios are examined systematically. Testing strategies include randomized input generation and adversarial cases designed to stress collision behavior.

By the end of the chapter, you will be able to design and implement hash based structures that behave predictably under load, select appropriate hashing strategies for different key types, and reason about performance beyond asymptotic notation.

5.1 Hash TablesA data structure that supports fast insertion, lookup, and deletion by mapping keys to bucket positions via a hash function.
6 min
5.2 Hash FunctionsDesign and evaluate hash functions that distribute keys uniformly across buckets while remaining fast to compute.
6 min
5.3 Collision HandlingResolve hash collisions using separate chaining or open addressing, with trade-offs in memory, locality, and load tolerance.
5 min
5.4 Load Factor and ResizingControl hash table performance by monitoring the load factor and growing the bucket array before collisions accumulate.
4 min
5.5 RehashingRebuild a hash table's bucket array after resizing so that every stored key satisfies the placement invariant for the new capacity.
4 min
5.6 SetsImplement a hash-based set for fast membership testing, insertion, and deletion without associated values.
4 min
5.7 MapsBuild a hash-based map that associates keys with values and supports insert, lookup, delete, and update in expected constant time.
5 min
5.8 Counting MapsTrack frequency counts for keys using a hash map that increments a counter on each insertion of an existing key.
4 min
5.9 Grouping KeysPartition a collection into groups by key using a hash map that accumulates values into per-key lists or sets.
4 min
5.10 Composite KeysHash and compare multi-field keys correctly by combining all fields that participate in equality into the hash function.
3 min
5.11 Rolling HashesCompute hash values for sliding windows over a sequence in constant time by incrementally updating rather than recomputing.
4 min
5.12 Bloom FiltersTest set membership approximately using a compact bit array and multiple hash functions, with no false negatives and bounded false positives.
4 min
5.13 Count-Min SketchApproximate frequency counting for large key streams using a compact probabilistic data structure with bounded error.
4 min
5.14 Consistent HashingDistribute keys across a dynamic set of nodes so that adding or removing nodes moves only a minimal fraction of keys.
3 min
5.15 Hash JoinsJoin two collections by key using a hash table to reduce the cost from quadratic to linear expected time.
3 min
5.16 Hash-Based DeduplicationRemove duplicate entries from a dataset or stream efficiently using hash sets for membership tracking.
3 min
5.17 Cache BehaviorUnderstand how memory hierarchy effects cause hash table performance to deviate from asymptotic expectations.
3 min
5.18 Attack ResistanceDefend hash tables against adversarial inputs that force worst-case collision behavior using randomized hashing.
4 min
5.19 Deterministic HashingProduce stable hash values that remain consistent across program runs, machines, builds, and language runtimes.
4 min
5.20 Testing Hash LogicVerify correctness, stability, and performance of hash-based structures through randomized and adversarial test strategies.
4 min
5.21 Consistent Performance GuaranteesAchieve predictable worst-case bounds for hash-based structures rather than relying solely on average-case expectations.
3 min
5.22 Hybrid Hash StructuresCombine hash tables with other data structures to handle skewed distributions, heavy deletions, and mixed workloads.
3 min
5.23 Cache-Aware HashingDesign hash table layouts that minimize cache misses and align memory access patterns with hardware behavior.
3 min
5.24 Real-World Hash Table DesignChoose and implement hash tables that perform reliably under mixed key types, uneven access patterns, and adversarial input.
3 min
5.25 Case StudiesExamine hash-based structures in complete systems: streaming pipelines, graph algorithms, caches, and distributed workflows.
4 min