Sample Sort

Sample sort is a generalization of quicksort and bucket sort. It selects a sample of elements, uses them to define partition boundaries, and distributes the data into multiple buckets that are sorted independently.

It is widely used in parallel and distributed systems because it produces well balanced partitions.

Problem

Given an array $A$ of length $n$ , reorder it such that:

A[0] \le A[1] \le \dots \le A[n-1]

Idea

Instead of choosing a single pivot, sample sort chooses multiple splitters:

take a random sample from the array
sort the sample
choose evenly spaced elements as splitters
partition the array into buckets using splitters
sort each bucket recursively

This creates $k$ partitions instead of two, reducing recursion depth.

Algorithm

sample_sort(A):
    if length(A) small:
        use insertion sort
        return

    sample = select random elements from A
    sort(sample)

    splitters = choose k evenly spaced elements from sample

    buckets = partition A into k+1 buckets using splitters

    for each bucket:
        sample_sort(bucket)

    concatenate buckets

Example

Let:

A = [9, 2, 7, 4, 6, 1, 8, 3, 5]

Sample:

[2, 7, 4, 8]

Sorted sample:

[2, 4, 7, 8]

Choose splitters:

[4, 7]

Partition:

bucket	values
< 4	[2, 1, 3]
4 to 7	[4, 6, 5]
> 7	[9, 8]

Recursively sort each bucket:

Final result:

[1,2,3,4,5,6,7,8,9]

Correctness

The splitters divide the array into ranges such that all elements in earlier buckets are less than or equal to those in later buckets. Recursively sorting each bucket ensures that each bucket is internally sorted. Concatenating the sorted buckets yields a globally sorted array.

Complexity

case	time
average	$O(n \log n)$
worst	$O(n^2)$

With good sampling:

partitions are balanced
recursion depth is reduced

In parallel settings:

metric	value
work	$O(n \log n)$
depth	$O(\log n)$

Properties

property	value
stable	depends on implementation
in-place	no
parallel friendly	very high
partitioning	multiway

Notes

Sample sort is widely used in high performance and distributed sorting systems. It provides better load balancing than quicksort because multiple splitters create more uniform partitions.

It is a key algorithm in parallel sorting frameworks and large scale data processing systems.

Implementation

import random

def sample_sort(a):
    if len(a) <= 16:
        return sorted(a)

    k = int(len(a) ** 0.5)

    sample = random.sample(a, min(len(a), k * 2))
    sample.sort()

    splitters = [sample[i * len(sample) // (k + 1)] for i in range(1, k + 1)]

    buckets = [[] for _ in range(k + 1)]

    for x in a:
        placed = False
        for i, s in enumerate(splitters):
            if x <= s:
                buckets[i].append(x)
                placed = True
                break
        if not placed:
            buckets[-1].append(x)

    result = []
    for b in buckets:
        result.extend(sample_sort(b))

    return result