# Histogram Sort

# Histogram Sort

Histogram sort constructs a frequency histogram over the key domain and uses it to reconstruct the sorted sequence. It generalizes counting sort to cases where the domain may be sparse or where counts are derived from a histogram structure.

It appears in data processing pipelines where histograms are already computed or where frequency aggregation is natural.

## Problem

Given an array $A$ of $n$ elements with keys drawn from a domain $D$, produce a sorted array.

## Idea

1. Build a histogram mapping each key to its frequency
2. Convert the histogram into cumulative counts
3. Reconstruct the sorted array using counts

Unlike simple counting sort, histogram sort often uses a map or compressed structure when the key domain is large or sparse.

## Algorithm

```text id="h7k2qp"
histogram_sort(A):
    H = empty map

    for x in A:
        H[x] += 1

    keys = sorted list of keys in H

    result = []

    for key in keys:
        for i from 1 to H[key]:
            append key to result

    return result
```

## Example

Let:

$$
A = [4, 2, 2, 8, 3, 3, 1]
$$

Histogram:

| key | count |
| --- | ----- |
| 1   | 1     |
| 2   | 2     |
| 3   | 2     |
| 4   | 1     |
| 8   | 1     |

Sorted output:

$$
[1, 2, 2, 3, 3, 4, 8]
$$

## Correctness

The histogram records exact multiplicities of each key. Iterating over keys in sorted order and emitting each key according to its frequency produces a sorted sequence with the correct number of occurrences.

## Complexity

Let:

* $n$ be number of elements
* $u$ be number of distinct keys

Time:

$$
O(n + u \log u)
$$

The $\log u$ factor arises from sorting the distinct keys.

Space:

$$
O(u)
$$

## Comparison with Counting Sort

| aspect        | counting sort | histogram sort    |
| ------------- | ------------- | ----------------- |
| domain        | dense         | sparse            |
| storage       | array         | map               |
| key iteration | direct index  | sorted keys       |
| complexity    | $O(n + k)$    | $O(n + u \log u)$ |

Histogram sort is preferred when the domain size $k$ is large but the number of distinct values $u$ is small.

## Properties

| property         | value           |
| ---------------- | --------------- |
| stable           | no              |
| in place         | no              |
| comparison based | partially       |
| domain handling  | sparse friendly |

## When to Use

Use histogram sort when:

* key domain is large or sparse
* number of distinct values is small
* frequency counting is already required
* memory for dense arrays is impractical

Avoid when stability is required or when domain is dense and small, where counting sort is more efficient.

## Implementation (Python)

```python id="z6q2wr"
from collections import Counter

def histogram_sort(a):
    H = Counter(a)
    result = []

    for key in sorted(H.keys()):
        result.extend([key] * H[key])

    return result
```

