Sort elements by building a frequency histogram and reconstructing the output from cumulative counts.
Histogram sort constructs a frequency histogram over the key domain and uses it to reconstruct the sorted sequence. It generalizes counting sort to cases where the domain may be sparse or where counts are derived from a histogram structure.
It appears in data processing pipelines where histograms are already computed or where frequency aggregation is natural.
Problem
Given an array of elements with keys drawn from a domain , produce a sorted array.
Idea
- Build a histogram mapping each key to its frequency
- Convert the histogram into cumulative counts
- Reconstruct the sorted array using counts
Unlike simple counting sort, histogram sort often uses a map or compressed structure when the key domain is large or sparse.
Algorithm
histogram_sort(A):
H = empty map
for x in A:
H[x] += 1
keys = sorted list of keys in H
result = []
for key in keys:
for i from 1 to H[key]:
append key to result
return resultExample
Let:
Histogram:
| key | count |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 2 |
| 4 | 1 |
| 8 | 1 |
Sorted output:
Correctness
The histogram records exact multiplicities of each key. Iterating over keys in sorted order and emitting each key according to its frequency produces a sorted sequence with the correct number of occurrences.
Complexity
Let:
- be number of elements
- be number of distinct keys
Time:
The factor arises from sorting the distinct keys.
Space:
Comparison with Counting Sort
| aspect | counting sort | histogram sort |
|---|---|---|
| domain | dense | sparse |
| storage | array | map |
| key iteration | direct index | sorted keys |
| complexity |
Histogram sort is preferred when the domain size is large but the number of distinct values is small.
Properties
| property | value |
|---|---|
| stable | no |
| in place | no |
| comparison based | partially |
| domain handling | sparse friendly |
When to Use
Use histogram sort when:
- key domain is large or sparse
- number of distinct values is small
- frequency counting is already required
- memory for dense arrays is impractical
Avoid when stability is required or when domain is dense and small, where counting sort is more efficient.
Implementation (Python)
from collections import Counter
def histogram_sort(a):
H = Counter(a)
result = []
for key in sorted(H.keys()):
result.extend([key] * H[key])
return result