Divide and conquer sorting algorithm that uses sampling to partition data into balanced buckets.
Sample sort is a generalization of quicksort and bucket sort. It selects a sample of elements, uses them to define partition boundaries, and distributes the data into multiple buckets that are sorted independently.
It is widely used in parallel and distributed systems because it produces well balanced partitions.
Problem
Given an array of length , reorder it such that:
Idea
Instead of choosing a single pivot, sample sort chooses multiple splitters:
- take a random sample from the array
- sort the sample
- choose evenly spaced elements as splitters
- partition the array into buckets using splitters
- sort each bucket recursively
This creates partitions instead of two, reducing recursion depth.
Algorithm
sample_sort(A):
if length(A) small:
use insertion sort
return
sample = select random elements from A
sort(sample)
splitters = choose k evenly spaced elements from sample
buckets = partition A into k+1 buckets using splitters
for each bucket:
sample_sort(bucket)
concatenate bucketsExample
Let:
Sample:
Sorted sample:
Choose splitters:
Partition:
| bucket | values |
|---|---|
| < 4 | [2, 1, 3] |
| 4 to 7 | [4, 6, 5] |
| > 7 | [9, 8] |
Recursively sort each bucket:
Final result:
Correctness
The splitters divide the array into ranges such that all elements in earlier buckets are less than or equal to those in later buckets. Recursively sorting each bucket ensures that each bucket is internally sorted. Concatenating the sorted buckets yields a globally sorted array.
Complexity
| case | time |
|---|---|
| average | |
| worst |
With good sampling:
- partitions are balanced
- recursion depth is reduced
In parallel settings:
| metric | value |
|---|---|
| work | |
| depth |
Properties
| property | value |
|---|---|
| stable | depends on implementation |
| in-place | no |
| parallel friendly | very high |
| partitioning | multiway |
Notes
Sample sort is widely used in high performance and distributed sorting systems. It provides better load balancing than quicksort because multiple splitters create more uniform partitions.
It is a key algorithm in parallel sorting frameworks and large scale data processing systems.
Implementation
import random
def sample_sort(a):
if len(a) <= 16:
return sorted(a)
k = int(len(a) ** 0.5)
sample = random.sample(a, min(len(a), k * 2))
sample.sort()
splitters = [sample[i * len(sample) // (k + 1)] for i in range(1, k + 1)]
buckets = [[] for _ in range(k + 1)]
for x in a:
placed = False
for i, s in enumerate(splitters):
if x <= s:
buckets[i].append(x)
placed = True
break
if not placed:
buckets[-1].append(x)
result = []
for b in buckets:
result.extend(sample_sort(b))
return result