# 5.9 Grouping Keys

# 5.9 Grouping Keys

## Problem

You need to partition items into groups according to a derived key.

A counting map records how many times each key appears. A grouping map records the actual items associated with each key. This is useful when the later algorithm needs to process each group separately.

## Solution

Use a map from group key to list of items.

```text id="cfodm4"
groups: Map<Key, List<Item>>
```

For each item, compute its group key and append the item to the corresponding list.

```text id="oac6ll"
group_by(items, key_of):
    groups = empty map

    for item in items:
        key = key_of(item)
        bucket = get(groups, key)

        if bucket is not_found:
            bucket = empty list
            put(groups, key, bucket)

        append item to bucket

    return groups
```

If the map API supports default insertion, the same pattern can be written more directly.

```text id="mnv867"
group_by(items, key_of):
    groups = empty map

    for item in items:
        key = key_of(item)
        append item to groups.get_or_insert(key, empty list)

    return groups
```

## Discussion

Grouping is a common way to convert a flat sequence into a keyed collection of smaller sequences. The key function defines the partition. Items that produce equal keys are placed in the same group. Items that produce different keys are placed in different groups.

The key function must be stable during grouping. If the key depends on mutable external state, the result can be inconsistent. For example, grouping log entries by a time window is valid if the window calculation is deterministic. Grouping by a changing clock value while processing is usually wrong.

The map stores one list per distinct key. This makes the number of map entries equal to the number of groups, not the number of items. The items themselves are stored inside the lists. This is useful when duplicates should be preserved.

Grouping differs from counting. Counting discards the individual items and stores only a number. Grouping keeps the items and therefore uses more memory. Use grouping only when the later computation needs the grouped contents.

## Correctness

The correctness invariant is that after processing the first `i` items, `groups[k]` contains exactly the processed items whose derived key equals `k`, in their original order.

Initially, the map is empty, so the invariant holds.

When the algorithm processes an item `x`, it computes `key_of(x) = k`. If group `k` already exists, the algorithm appends `x` to that group. If group `k` does not exist, it creates an empty group and appends `x`. No other group is changed. Therefore, the invariant is preserved.

After all items are processed, each item appears in exactly one group, namely the group selected by its key.

## Complexity

Let `n` be the number of items and `g` be the number of distinct group keys.

Each item causes one key computation, one expected constant time map lookup, and one append. If list append is amortized constant time, the total expected time is `O(n)` plus the cost of computing keys.

Space usage is `O(n + g)`. The lists store all `n` items, and the map stores `g` group entries.

## Example

Group words by their first letter.

```text id="s7znlw"
items = ["go", "lean", "graph", "rust", "lambda", "regex"]
```

Use this key function:

```text id="7a3vxm"
key_of(word):
    return first character of word
```

The resulting map is:

```text id="gh5jr3"
{
    "g": ["go", "graph"],
    "l": ["lean", "lambda"],
    "r": ["rust", "regex"]
}
```

The order inside each group follows the original input order.

## Implementation Notes

Use lists or dynamic arrays for groups when append is the dominant operation.

Use sets instead of lists when each group should contain unique values.

Use a priority queue per key when each group must expose its smallest or largest item efficiently.

For large inputs, avoid copying items into groups if references or indexes are sufficient.

For streaming systems, consider bounded groups or periodic flushing, because grouping stores all items unless the groups are consumed incrementally.

## Common Mistakes

Using a counting map when the later algorithm needs the actual grouped items.

Assuming group iteration order is stable when the underlying map does not guarantee it.

Using a mutable object as a group key and then modifying it.

Computing the group key inconsistently across different passes.

Creating many tiny groups when a coarser key would be sufficient.