Skip to content

Sparse Key Counting Sort

Counting sort variant that stores counts only for keys that appear, avoiding dense arrays over large key ranges.

Sparse key counting sort handles large key ranges by storing counts only for keys that appear in the input. It replaces the dense count array of counting sort with a map from key to frequency.

This is useful when the key universe is huge but the number of distinct keys is small.

Problem

Given an array AA of nn keys drawn from a large domain, sort AA when only uu distinct keys appear and unu \ll n or uku \ll k.

Idea

Use a dictionary to count frequencies:

H[x]=number of occurrences of x H[x] = \text{number of occurrences of } x

Then sort the distinct keys and emit each key according to its count.

Algorithm

sparse_key_counting_sort(A):
    H = empty map

    for x in A:
        H[x] += 1

    keys = sorted keys of H

    i = 0
    for key in keys:
        repeat H[key] times:
            A[i] = key
            i += 1

    return A

Example

Let:

A=[1000000,5,1000000,42,5] A = [1000000, 5, 1000000, 42, 5]

Frequency map:

keycount
52
421
10000002

Sorted keys:

[5,42,1000000] [5, 42, 1000000]

Output:

[5,5,42,1000000,1000000] [5, 5, 42, 1000000, 1000000]

Correctness

The frequency map records the exact multiplicity of each key. Sorting the distinct keys gives the correct key order. Emitting each key exactly its recorded number of times produces a sorted array containing exactly the original multiset.

Complexity

Let:

  • nn be the number of elements
  • uu be the number of distinct keys

Counting time:

O(n) O(n)

Sorting distinct keys:

O(ulogu) O(u \log u)

Total time:

O(n+ulogu) O(n + u \log u)

Space:

O(u) O(u)

Comparison with Dense Counting Sort

methodbest domaintimespace
dense counting sortsmall contiguous rangeO(n+k)O(n + k)O(k)O(k)
sparse key counting sortlarge sparse rangeO(n+ulogu)O(n + u \log u)O(u)O(u)

Sparse key counting sort is better when kk is large and uu is small.

Properties

propertyvalue
stableno
in placemostly, excluding map
comparison basedonly among distinct keys
handles sparse keysyes

When to Use

Use sparse key counting sort when:

  • the key range is large
  • few distinct keys appear
  • counting frequencies is natural
  • dense counting arrays would waste memory

Avoid it when all or most keys are distinct, since sorting distinct keys gives the same asymptotic cost as ordinary comparison sort.

Implementation

from collections import Counter

def sparse_key_counting_sort(a):
    count = Counter(a)
    i = 0

    for key in sorted(count):
        for _ in range(count[key]):
            a[i] = key
            i += 1

    return a