# Sample Sort

# Sample Sort

Sample sort is a generalization of quicksort and bucket sort. It selects a sample of elements, uses them to define partition boundaries, and distributes the data into multiple buckets that are sorted independently.

It is widely used in parallel and distributed systems because it produces well balanced partitions.

## Problem

Given an array $A$ of length $n$, reorder it such that:

$$
A[0] \le A[1] \le \dots \le A[n-1]
$$

## Idea

Instead of choosing a single pivot, sample sort chooses multiple splitters:

1. take a random sample from the array
2. sort the sample
3. choose evenly spaced elements as splitters
4. partition the array into buckets using splitters
5. sort each bucket recursively

This creates $k$ partitions instead of two, reducing recursion depth.

## Algorithm

```text id="s3k9xp"
sample_sort(A):
    if length(A) small:
        use insertion sort
        return

    sample = select random elements from A
    sort(sample)

    splitters = choose k evenly spaced elements from sample

    buckets = partition A into k+1 buckets using splitters

    for each bucket:
        sample_sort(bucket)

    concatenate buckets
```

## Example

Let:

$$
A = [9, 2, 7, 4, 6, 1, 8, 3, 5]
$$

Sample:

$$
[2, 7, 4, 8]
$$

Sorted sample:

$$
[2, 4, 7, 8]
$$

Choose splitters:

$$
[4, 7]
$$

Partition:

| bucket | values    |
| ------ | --------- |
| < 4    | [2, 1, 3] |
| 4 to 7 | [4, 6, 5] |
| > 7    | [9, 8]    |

Recursively sort each bucket:

Final result:

$$
[1,2,3,4,5,6,7,8,9]
$$

## Correctness

The splitters divide the array into ranges such that all elements in earlier buckets are less than or equal to those in later buckets. Recursively sorting each bucket ensures that each bucket is internally sorted. Concatenating the sorted buckets yields a globally sorted array.

## Complexity

| case    | time          |
| ------- | ------------- |
| average | $O(n \log n)$ |
| worst   | $O(n^2)$      |

With good sampling:

* partitions are balanced
* recursion depth is reduced

In parallel settings:

| metric | value         |
| ------ | ------------- |
| work   | $O(n \log n)$ |
| depth  | $O(\log n)$   |

## Properties

| property          | value                     |
| ----------------- | ------------------------- |
| stable            | depends on implementation |
| in-place          | no                        |
| parallel friendly | very high                 |
| partitioning      | multiway                  |

## Notes

Sample sort is widely used in high performance and distributed sorting systems. It provides better load balancing than quicksort because multiple splitters create more uniform partitions.

It is a key algorithm in parallel sorting frameworks and large scale data processing systems.

## Implementation

```python id="k7m2xp"
import random

def sample_sort(a):
    if len(a) <= 16:
        return sorted(a)

    k = int(len(a) ** 0.5)

    sample = random.sample(a, min(len(a), k * 2))
    sample.sort()

    splitters = [sample[i * len(sample) // (k + 1)] for i in range(1, k + 1)]

    buckets = [[] for _ in range(k + 1)]

    for x in a:
        placed = False
        for i, s in enumerate(splitters):
            if x <= s:
                buckets[i].append(x)
                placed = True
                break
        if not placed:
            buckets[-1].append(x)

    result = []
    for b in buckets:
        result.extend(sample_sort(b))

    return result
```