Build a hash-based map that associates keys with values and supports insert, lookup, delete, and update in expected constant time.
Problem
You need to associate each key with a value and retrieve the value quickly by key.
A set answers membership questions. A map answers lookup questions. It stores pairs of the form (key, value) and maintains the rule that each key appears at most once.
Solution
Implement the map as a hash table whose entries store both keys and values.
Entry:
key
value
HashMap:
buckets: array of bucket
size: number of entries
capacity: number of bucketsLookup finds the bucket for the key, then scans that bucket for an equal key.
get(map, key):
i = hash(key) mod map.capacity
for each entry in map.buckets[i]:
if entry.key == key:
return entry.value
return not_foundInsertion either updates an existing key or inserts a new pair.
put(map, key, value):
i = hash(key) mod map.capacity
for each entry in map.buckets[i]:
if entry.key == key:
entry.value = value
return updated
append (key, value) to map.buckets[i]
map.size = map.size + 1
return insertedDeletion removes the matching key-value pair.
remove(map, key):
i = hash(key) mod map.capacity
for each entry in map.buckets[i]:
if entry.key == key:
remove entry
map.size = map.size - 1
return true
return falseDiscussion
A map is a partial function from keys to values. For a key type K and a value type V, a map represents some finite subset of associations from K to V. The same key cannot have two different active values at the same time. Updating a key replaces its old value.
This uniqueness rule is the main difference between a map and a list of pairs. A list of pairs may contain repeated keys, which forces the algorithm to define whether the first, last, or all matching pairs matter. A map removes that ambiguity. A lookup returns the one value associated with the key, or reports that the key is absent.
Maps are used whenever an algorithm needs direct access to state attached to an identifier. Examples include user records by user id, distances by graph vertex, counts by token, metadata by file path, and cached results by input.
The hash table backing the map provides expected fast access, but the logical abstraction is more important than the implementation. A map should expose operations such as get, put, remove, and contains_key. The caller should not need to know whether the implementation uses chaining, open addressing, or a balanced tree.
Correctness
The correctness invariant has two parts.
First, every entry must be placed in the bucket selected by its key.
bucket_index = hash(entry.key) mod capacitySecond, no two entries in the map may have equal keys.
Lookup is correct because it examines the only bucket in which the key can be stored. If an equal key is found, the associated value is returned. If no equal key is found in that bucket, the placement invariant implies that the key is absent from the whole map.
Insertion is correct because it first searches the target bucket. If the key already exists, the operation updates the value and preserves uniqueness. If the key does not exist in that bucket, the placement invariant implies that it does not exist elsewhere, so adding a new entry preserves uniqueness.
Deletion is correct because it removes the only possible entry for the key.
Complexity
With a good hash function and bounded load factor, get, put, and remove take expected O(1) time.
In the worst case, these operations take O(n) time if many keys collide.
Resizing costs O(n) when it occurs. With geometric growth, insertion remains expected amortized O(1).
Space usage is O(n + m), where n is the number of entries and m is the number of buckets. Since m is usually proportional to n, the space usage is commonly described as O(n).
Example
Suppose you want to store the shortest known distance to each vertex during a graph search.
distance = empty map
put(distance, "A", 0)
put(distance, "B", 5)
put(distance, "C", 9)
get(distance, "B") returns 5
get(distance, "D") returns not_foundIf a better path to "B" is found, update the existing value:
put(distance, "B", 3)Now:
get(distance, "B") returns 3The key "B" still appears once. Its value has changed.
Common Patterns
Frequency counting uses a map from item to count.
count(items):
freq = empty map
for item in items:
old = get(freq, item)
if old is not_found:
put(freq, item, 1)
else:
put(freq, item, old + 1)
return freqGrouping uses a map from group key to list.
group_by(items, key_of):
groups = empty map
for item in items:
k = key_of(item)
group = get(groups, k)
if group is not_found:
group = empty list
put(groups, k, group)
append item to group
return groupsCaching uses a map from input to computed result.
cached_compute(cache, input):
value = get(cache, input)
if value is not_found:
value = compute(input)
put(cache, input, value)
return valueImplementation Notes
Separate get from contains_key when values may be nullable or optional. Otherwise, a missing key and a stored null value can be confused.
Use put semantics deliberately. Some algorithms need update-or-insert. Others should reject duplicate keys.
For performance-sensitive code, avoid hashing the same key repeatedly. Some APIs expose an entry interface that performs one lookup and then allows insertion or update.
Preserve key immutability. Mutating a key after insertion can make the entry unreachable.
When values are large, store references or handles rather than copying large objects on every update.
Common Mistakes
Confusing absent keys with present keys whose value is null, zero, false, or empty.
Allowing duplicate equal keys.
Using a map when sorted traversal is required.
Using a map when multiple values per key are required. Use a map from key to list, set, or multiset instead.
Mutating keys after insertion.
Forgetting that map iteration order is usually unspecified.