5.7 Maps

Problem

You need to associate each key with a value and retrieve the value quickly by key.

A set answers membership questions. A map answers lookup questions. It stores pairs of the form (key, value) and maintains the rule that each key appears at most once.

Solution

Implement the map as a hash table whose entries store both keys and values.

Entry:
    key
    value

HashMap:
    buckets: array of bucket
    size: number of entries
    capacity: number of buckets

Lookup finds the bucket for the key, then scans that bucket for an equal key.

get(map, key):
    i = hash(key) mod map.capacity

    for each entry in map.buckets[i]:
        if entry.key == key:
            return entry.value

    return not_found

Insertion either updates an existing key or inserts a new pair.

put(map, key, value):
    i = hash(key) mod map.capacity

    for each entry in map.buckets[i]:
        if entry.key == key:
            entry.value = value
            return updated

    append (key, value) to map.buckets[i]
    map.size = map.size + 1
    return inserted

Deletion removes the matching key-value pair.

remove(map, key):
    i = hash(key) mod map.capacity

    for each entry in map.buckets[i]:
        if entry.key == key:
            remove entry
            map.size = map.size - 1
            return true

    return false

Discussion

A map is a partial function from keys to values. For a key type K and a value type V, a map represents some finite subset of associations from K to V. The same key cannot have two different active values at the same time. Updating a key replaces its old value.

This uniqueness rule is the main difference between a map and a list of pairs. A list of pairs may contain repeated keys, which forces the algorithm to define whether the first, last, or all matching pairs matter. A map removes that ambiguity. A lookup returns the one value associated with the key, or reports that the key is absent.

Maps are used whenever an algorithm needs direct access to state attached to an identifier. Examples include user records by user id, distances by graph vertex, counts by token, metadata by file path, and cached results by input.

The hash table backing the map provides expected fast access, but the logical abstraction is more important than the implementation. A map should expose operations such as get, put, remove, and contains_key. The caller should not need to know whether the implementation uses chaining, open addressing, or a balanced tree.

Correctness

The correctness invariant has two parts.

First, every entry must be placed in the bucket selected by its key.

bucket_index = hash(entry.key) mod capacity

Second, no two entries in the map may have equal keys.

Lookup is correct because it examines the only bucket in which the key can be stored. If an equal key is found, the associated value is returned. If no equal key is found in that bucket, the placement invariant implies that the key is absent from the whole map.

Insertion is correct because it first searches the target bucket. If the key already exists, the operation updates the value and preserves uniqueness. If the key does not exist in that bucket, the placement invariant implies that it does not exist elsewhere, so adding a new entry preserves uniqueness.

Deletion is correct because it removes the only possible entry for the key.

Complexity

With a good hash function and bounded load factor, get, put, and remove take expected O(1) time.

In the worst case, these operations take O(n) time if many keys collide.

Resizing costs O(n) when it occurs. With geometric growth, insertion remains expected amortized O(1).

Space usage is O(n + m), where n is the number of entries and m is the number of buckets. Since m is usually proportional to n, the space usage is commonly described as O(n).

Example

Suppose you want to store the shortest known distance to each vertex during a graph search.

distance = empty map

put(distance, "A", 0)
put(distance, "B", 5)
put(distance, "C", 9)

get(distance, "B") returns 5
get(distance, "D") returns not_found

If a better path to "B" is found, update the existing value:

put(distance, "B", 3)

Now:

get(distance, "B") returns 3

The key "B" still appears once. Its value has changed.

Common Patterns

Frequency counting uses a map from item to count.

count(items):
    freq = empty map

    for item in items:
        old = get(freq, item)
        if old is not_found:
            put(freq, item, 1)
        else:
            put(freq, item, old + 1)

    return freq

Grouping uses a map from group key to list.

group_by(items, key_of):
    groups = empty map

    for item in items:
        k = key_of(item)
        group = get(groups, k)

        if group is not_found:
            group = empty list
            put(groups, k, group)

        append item to group

    return groups

Caching uses a map from input to computed result.

cached_compute(cache, input):
    value = get(cache, input)

    if value is not_found:
        value = compute(input)
        put(cache, input, value)

    return value

Implementation Notes

Separate get from contains_key when values may be nullable or optional. Otherwise, a missing key and a stored null value can be confused.

Use put semantics deliberately. Some algorithms need update-or-insert. Others should reject duplicate keys.

For performance-sensitive code, avoid hashing the same key repeatedly. Some APIs expose an entry interface that performs one lookup and then allows insertion or update.

Preserve key immutability. Mutating a key after insertion can make the entry unreachable.

When values are large, store references or handles rather than copying large objects on every update.

Common Mistakes

Confusing absent keys with present keys whose value is null, zero, false, or empty.

Allowing duplicate equal keys.

Using a map when sorted traversal is required.

Using a map when multiple values per key are required. Use a map from key to list, set, or multiset instead.

Mutating keys after insertion.

Forgetting that map iteration order is usually unspecified.