Partition a collection into groups by key using a hash map that accumulates values into per-key lists or sets.
Problem
You need to partition items into groups according to a derived key.
A counting map records how many times each key appears. A grouping map records the actual items associated with each key. This is useful when the later algorithm needs to process each group separately.
Solution
Use a map from group key to list of items.
groups: Map<Key, List<Item>>For each item, compute its group key and append the item to the corresponding list.
group_by(items, key_of):
groups = empty map
for item in items:
key = key_of(item)
bucket = get(groups, key)
if bucket is not_found:
bucket = empty list
put(groups, key, bucket)
append item to bucket
return groupsIf the map API supports default insertion, the same pattern can be written more directly.
group_by(items, key_of):
groups = empty map
for item in items:
key = key_of(item)
append item to groups.get_or_insert(key, empty list)
return groupsDiscussion
Grouping is a common way to convert a flat sequence into a keyed collection of smaller sequences. The key function defines the partition. Items that produce equal keys are placed in the same group. Items that produce different keys are placed in different groups.
The key function must be stable during grouping. If the key depends on mutable external state, the result can be inconsistent. For example, grouping log entries by a time window is valid if the window calculation is deterministic. Grouping by a changing clock value while processing is usually wrong.
The map stores one list per distinct key. This makes the number of map entries equal to the number of groups, not the number of items. The items themselves are stored inside the lists. This is useful when duplicates should be preserved.
Grouping differs from counting. Counting discards the individual items and stores only a number. Grouping keeps the items and therefore uses more memory. Use grouping only when the later computation needs the grouped contents.
Correctness
The correctness invariant is that after processing the first i items, groups[k] contains exactly the processed items whose derived key equals k, in their original order.
Initially, the map is empty, so the invariant holds.
When the algorithm processes an item x, it computes key_of(x) = k. If group k already exists, the algorithm appends x to that group. If group k does not exist, it creates an empty group and appends x. No other group is changed. Therefore, the invariant is preserved.
After all items are processed, each item appears in exactly one group, namely the group selected by its key.
Complexity
Let n be the number of items and g be the number of distinct group keys.
Each item causes one key computation, one expected constant time map lookup, and one append. If list append is amortized constant time, the total expected time is O(n) plus the cost of computing keys.
Space usage is O(n + g). The lists store all n items, and the map stores g group entries.
Example
Group words by their first letter.
items = ["go", "lean", "graph", "rust", "lambda", "regex"]Use this key function:
key_of(word):
return first character of wordThe resulting map is:
{
"g": ["go", "graph"],
"l": ["lean", "lambda"],
"r": ["rust", "regex"]
}The order inside each group follows the original input order.
Implementation Notes
Use lists or dynamic arrays for groups when append is the dominant operation.
Use sets instead of lists when each group should contain unique values.
Use a priority queue per key when each group must expose its smallest or largest item efficiently.
For large inputs, avoid copying items into groups if references or indexes are sufficient.
For streaming systems, consider bounded groups or periodic flushing, because grouping stores all items unless the groups are consumed incrementally.
Common Mistakes
Using a counting map when the later algorithm needs the actual grouped items.
Assuming group iteration order is stable when the underlying map does not guarantee it.
Using a mutable object as a group key and then modifying it.
Computing the group key inconsistently across different passes.
Creating many tiny groups when a coarser key would be sufficient.