Track frequency counts for keys using a hash map: insert a new key with a count of one, and increment the counter on each later occurrence.
Problem
You need to count how many times each key appears in a dataset.
A standard map stores one value per key. In counting problems, the value is a frequency. The task is to update counts efficiently while scanning input once.
Solution
Use a map from key to integer count.
freq: Map<Key, Int>

For each key, read the current count, then increment.

count(items):
    freq = empty map
    for item in items:
        old = get(freq, item)
        if old is not_found:
            put(freq, item, 1)
        else:
            put(freq, item, old + 1)
    return freq

A common shorthand uses a default value of zero.

count(items):
    freq = empty map
    for item in items:
        put(freq, item, get(freq, item, default=0) + 1)
    return freq

Discussion
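The pseudocode above maps directly onto most languages. As an illustration, a minimal Python sketch using the built-in dict and its get default:

```python
def count(items):
    """Return a map from each item to its number of occurrences."""
    freq = {}
    for item in items:
        # get(..., 0) treats an absent key as count zero, as in the shorthand.
        freq[item] = freq.get(item, 0) + 1
    return freq

print(count(["go", "lean", "go", "rust", "lean", "go"]))
# {'go': 3, 'lean': 2, 'rust': 1}
```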
A counting map is a specialization of a hash map where values are integers and updates follow a fixed rule. This pattern appears in text processing, log analysis, histogram construction, frequency analysis, and combinatorics.
The central invariant is that freq[k] equals the number of times key k has been processed so far. The update rule preserves this invariant by incrementing exactly once per occurrence.
The map avoids storing duplicates explicitly. Instead of keeping a list of repeated keys, it compresses them into a single entry with a count. This reduces memory usage when duplicates are frequent.
Counting maps also support reverse queries. Once counts are computed, you can ask for the most frequent key, keys above a threshold, or keys sorted by frequency. These operations typically require additional processing, such as scanning the map or building auxiliary structures.
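Such reverse queries are simple scans in Python. A sketch, assuming a freq map already computed as above:

```python
freq = {"go": 3, "lean": 2, "rust": 1}

# Most frequent key: one scan, picking the key with the largest count.
most_frequent = max(freq, key=freq.get)

# Keys at or above a threshold.
at_least_2 = [k for k in freq if freq[k] >= 2]

# Keys sorted by descending frequency.
by_freq = sorted(freq, key=freq.get, reverse=True)

print(most_frequent)  # go
print(at_least_2)     # ['go', 'lean']
print(by_freq)        # ['go', 'lean', 'rust']
```

Each query is O(k) (or O(k log k) for the sort), which is why precomputing the counts once and querying afterward beats recounting per query.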
Correctness
The correctness follows from induction on the input sequence.
Initially, the map is empty, so all counts are zero.
Assume after processing the first i items, the map stores correct counts. When processing item x at position i + 1, the algorithm reads the current count for x and increases it by one. For all other keys, counts remain unchanged. Therefore, after the update, the map reflects correct counts for the first i + 1 items.
Since each item is processed exactly once, the final map contains correct frequencies for all keys.
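The invariant can also be spot-checked mechanically against a trusted reference. A Python sketch that uses the standard library's collections.Counter as the oracle:

```python
import random
from collections import Counter

def count(items):
    freq = {}
    for item in items:
        freq[item] = freq.get(item, 0) + 1
    return freq

# Compare against the standard-library Counter on random inputs,
# including the empty input (all counts zero).
for _ in range(100):
    items = [random.choice("abc") for _ in range(random.randint(0, 20))]
    assert count(items) == Counter(items)
print("ok")
```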
Complexity
Let n be the number of items and k be the number of distinct keys.
Each update performs a map lookup and a map insertion or update. With a good hash function and bounded load factor, each operation takes expected O(1) time.
The total time is expected O(n).
Space usage is O(k), since each distinct key appears once in the map.
Example
Count words in a list.
items = ["go", "lean", "go", "rust", "lean", "go"]

Processing step by step:
"go" -> {"go": 1}
"lean" -> {"go": 1, "lean": 1}
"go" -> {"go": 2, "lean": 1}
"rust" -> {"go": 2, "lean": 1, "rust": 1}
"lean" -> {"go": 2, "lean": 2, "rust": 1}
"go" -> {"go": 3, "lean": 2, "rust": 1}

Final result:

{"go": 3, "lean": 2, "rust": 1}

Variations
Use a map from key to list when you need to retain all occurrences, not only counts.
Use a map from key to set when you want to track unique associated values.
Use approximate counting structures when the number of distinct keys is very large and exact counts are not required.
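The first two variations look like this in Python; the pairs data below is made up for illustration:

```python
from collections import defaultdict

pairs = [("go", 1), ("lean", 2), ("go", 3), ("go", 1)]

# Key -> list: retain every occurrence, duplicates included.
occurrences = defaultdict(list)
for key, value in pairs:
    occurrences[key].append(value)

# Key -> set: track only the unique associated values.
unique_values = defaultdict(set)
for key, value in pairs:
    unique_values[key].add(value)

print(dict(occurrences))    # {'go': [1, 3, 1], 'lean': [2]}
print(dict(unique_values))  # {'go': {1, 3}, 'lean': {2}}
```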
Implementation Notes
Prefer a default value mechanism if the language provides it. This reduces branching and simplifies code.
Avoid repeated lookups when possible. Some APIs expose an entry interface that returns a reference to the stored value, allowing direct increment.
Be careful with integer overflow if counts can grow large. Use a wider integer type when needed.
For streaming data, combine counting maps with eviction or windowing policies to bound memory usage.
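In Python, both the default-value mechanism and a specialized counting type are in the standard library; a sketch:

```python
from collections import Counter, defaultdict

items = ["go", "lean", "go", "rust", "lean", "go"]

# defaultdict(int) yields 0 for absent keys, so the branch disappears
# and the increment can be written directly.
freq = defaultdict(int)
for item in items:
    freq[item] += 1

# Counter bundles the whole pattern, plus helpers like most_common.
counts = Counter(items)

print(dict(freq))            # {'go': 3, 'lean': 2, 'rust': 1}
print(counts.most_common(1)) # [('go', 3)]
```

Python integers are arbitrary precision, so overflow is not a concern here; in fixed-width languages, the wider-integer advice above applies.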
Common Mistakes
Initializing counts incorrectly, for example inserting a new key at zero and forgetting the increment, so its first occurrence is never counted.
Using a list instead of a map, which leads to quadratic time for large inputs.
Forgetting that absent keys imply count zero.
Using a map when approximate counts would be sufficient and more memory efficient.
Failing to consider integer overflow for very large counts.
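The list-instead-of-map mistake is easiest to see in code. A Python sketch contrasting the O(n * k) list version with the expected O(n) map version:

```python
def count_with_list(items):
    # Mistake: a linear search through (key, count) pairs per item,
    # O(n * k) overall instead of expected O(n).
    pairs = []
    for item in items:
        for i, (key, cnt) in enumerate(pairs):
            if key == item:
                pairs[i] = (key, cnt + 1)
                break
        else:
            pairs.append((item, 1))
    return dict(pairs)

def count_with_map(items):
    # One expected O(1) lookup and update per item.
    freq = {}
    for item in items:
        freq[item] = freq.get(item, 0) + 1
    return freq

items = ["go", "lean", "go", "rust", "lean", "go"]
assert count_with_list(items) == count_with_map(items)
```

Both produce identical results; only the scaling differs, which is easy to miss on small test inputs.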