Histogram Sort

Histogram sort constructs a frequency histogram over the key domain and uses it to reconstruct the sorted sequence. It generalizes counting sort to cases where the domain may be sparse or where counts are derived from a histogram structure.

It appears in data processing pipelines where histograms are already computed or where frequency aggregation is natural.

Problem

Given an array $A$ of $n$ elements with keys drawn from a domain $D$ , produce a sorted array.

Idea

Build a histogram mapping each key to its frequency
Convert the histogram into cumulative counts
Reconstruct the sorted array using counts

Unlike simple counting sort, histogram sort often uses a map or compressed structure when the key domain is large or sparse.

Algorithm

histogram_sort(A):
    H = empty map

    for x in A:
        H[x] += 1

    keys = sorted list of keys in H

    result = []

    for key in keys:
        for i from 1 to H[key]:
            append key to result

    return result

Example

Let:

A = [4, 2, 2, 8, 3, 3, 1]

Histogram:

key	count
1	1
2	2
3	2
4	1
8	1

Sorted output:

[1, 2, 2, 3, 3, 4, 8]

Correctness

The histogram records exact multiplicities of each key. Iterating over keys in sorted order and emitting each key according to its frequency produces a sorted sequence with the correct number of occurrences.

Complexity

Let:

$n$ be number of elements
$u$ be number of distinct keys

Time:

O(n + u \log u)

The $\log u$ factor arises from sorting the distinct keys.

Space:

O(u)

Comparison with Counting Sort

aspect	counting sort	histogram sort
domain	dense	sparse
storage	array	map
key iteration	direct index	sorted keys
complexity	$O(n + k)$	$O(n + u \log u)$

Histogram sort is preferred when the domain size $k$ is large but the number of distinct values $u$ is small.

Properties

property	value
stable	no
in place	no
comparison based	partially
domain handling	sparse friendly

When to Use

Use histogram sort when:

key domain is large or sparse
number of distinct values is small
frequency counting is already required
memory for dense arrays is impractical

Avoid when stability is required or when domain is dense and small, where counting sort is more efficient.

Implementation (Python)

from collections import Counter

def histogram_sort(a):
    H = Counter(a)
    result = []

    for key in sorted(H.keys()):
        result.extend([key] * H[key])

    return result