# Sparse Key Counting Sort

# Sparse Key Counting Sort

Sparse key counting sort handles large key ranges by storing counts only for keys that appear in the input. It replaces the dense count array of counting sort with a map from key to frequency.

This is useful when the key universe is huge but the number of distinct keys is small.

## Problem

Given an array $A$ of $n$ keys drawn from a large domain, sort $A$ when only $u$ distinct keys appear and $u \ll n$ or $u \ll k$.

## Idea

Use a dictionary to count frequencies:

$$
H[x] = \text{number of occurrences of } x
$$

Then sort the distinct keys and emit each key according to its count.

## Algorithm

```text id="q8m2pa"
sparse_key_counting_sort(A):
    H = empty map

    for x in A:
        H[x] += 1

    keys = sorted keys of H

    i = 0
    for key in keys:
        repeat H[key] times:
            A[i] = key
            i += 1

    return A
```

## Example

Let:

$$
A = [1000000, 5, 1000000, 42, 5]
$$

Frequency map:

| key     | count |
| ------- | ----- |
| 5       | 2     |
| 42      | 1     |
| 1000000 | 2     |

Sorted keys:

$$
[5, 42, 1000000]
$$

Output:

$$
[5, 5, 42, 1000000, 1000000]
$$

## Correctness

The frequency map records the exact multiplicity of each key. Sorting the distinct keys gives the correct key order. Emitting each key exactly its recorded number of times produces a sorted array containing exactly the original multiset.

## Complexity

Let:

* $n$ be the number of elements
* $u$ be the number of distinct keys

Counting time:

$$
O(n)
$$

Sorting distinct keys:

$$
O(u \log u)
$$

Total time:

$$
O(n + u \log u)
$$

Space:

$$
O(u)
$$

## Comparison with Dense Counting Sort

| method                   | best domain            | time              | space  |
| ------------------------ | ---------------------- | ----------------- | ------ |
| dense counting sort      | small contiguous range | $O(n + k)$        | $O(k)$ |
| sparse key counting sort | large sparse range     | $O(n + u \log u)$ | $O(u)$ |

Sparse key counting sort is better when $k$ is large and $u$ is small.

## Properties

| property            | value                    |
| ------------------- | ------------------------ |
| stable              | no                       |
| in place            | mostly, excluding map    |
| comparison based    | only among distinct keys |
| handles sparse keys | yes                      |

## When to Use

Use sparse key counting sort when:

* the key range is large
* few distinct keys appear
* counting frequencies is natural
* dense counting arrays would waste memory

Avoid it when all or most keys are distinct, since sorting distinct keys gives the same asymptotic cost as ordinary comparison sort.

## Implementation

```python id="v7q2mx"
from collections import Counter

def sparse_key_counting_sort(a):
    count = Counter(a)
    i = 0

    for key in sorted(count):
        for _ in range(count[key]):
            a[i] = key
            i += 1

    return a
```

