Choose splitters from samples, partition the input into buckets, sort buckets independently, then concatenate the sorted buckets.
Parallel sample sort is a parallel sorting algorithm based on partitioning by sampled splitters. It first chooses representative keys from the input, uses them to divide the data into buckets, sorts each bucket independently, then concatenates the buckets in order.
The method is useful when many processors are available because most work after partitioning is independent. Its quality depends on how well the samples represent the input distribution.
Problem
Given an array of n elements, sort them into nondecreasing order using p parallel workers.
Algorithm
Choose sample keys from the input, sort the samples, and select p - 1 splitters from them. The splitters define bucket boundaries. Each element is assigned to one bucket. Buckets are then sorted independently.
```
parallel_sample_sort(A, p):
    choose sample keys from A
    sort the samples
    choose p - 1 splitters from the sorted samples
    create p empty buckets
    parallel for each element x in A:
        b = bucket_index(x, splitters)
        append x to bucket b
    parallel for each bucket:
        sort bucket
    concatenate buckets in splitter order
```

The splitters must satisfy:
Elements in bucket i are at most splitter s_i. Elements in bucket i + 1 are greater than or equal to s_i.
Bucket Lookup
A bucket can be found by binary search over the splitter array.
```
bucket_index(x, splitters):
    return upper_bound(splitters, x)
```

For p buckets, this costs O(log p) per element.
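In Go, the lookup can be written with `sort.SearchInts`, which binary-searches for the first splitter greater than or equal to x (the same predicate the full implementation below uses); a minimal sketch:

```go
package main

import (
	"fmt"
	"sort"
)

// bucketIndex returns the bucket for x: the index of the first
// splitter >= x, or len(splitters) when x exceeds every splitter.
func bucketIndex(x int, splitters []int) int {
	return sort.SearchInts(splitters, x)
}

func main() {
	splitters := []int{10, 20} // 3 buckets: x <= 10, 10 < x <= 20, x > 20
	fmt.Println(bucketIndex(5, splitters))  // 0
	fmt.Println(bucketIndex(15, splitters)) // 1
	fmt.Println(bucketIndex(25, splitters)) // 2
}
```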
Complexity
| measure | value |
|---|---|
| sampling | O(p log p) or smaller |
| partitioning | O(n log p) with binary search |
| bucket sorting | about O((n/p) log(n/p)) per balanced bucket |
| total work | O(n log n) |
| ideal parallel time | O((n/p) log n) |
When buckets are well balanced, the algorithm scales well. When buckets are skewed, one worker may receive a much larger bucket and dominate runtime.
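Skew can be made visible by counting bucket sizes; a small self-contained sketch with a made-up duplicate-heavy input:

```go
package main

import (
	"fmt"
	"sort"
)

// bucketSizes counts how many keys of a fall into each bucket
// defined by the sorted splitters.
func bucketSizes(a, splitters []int) []int {
	sizes := make([]int, len(splitters)+1)
	for _, x := range a {
		sizes[sort.SearchInts(splitters, x)]++
	}
	return sizes
}

func main() {
	// A skewed input: 90% of keys share one value, so one bucket dominates.
	a := make([]int, 1000)
	for i := range a {
		if i < 900 {
			a[i] = 7
		} else {
			a[i] = i // values 900..999
		}
	}
	fmt.Println(bucketSizes(a, []int{250, 500, 750})) // [900 0 0 100]
}
```

With this distribution, one worker sorts 900 keys while two workers sit idle, which is exactly the imbalance the text describes.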
Correctness
The splitters impose a global order on the buckets. Every element in an earlier bucket is less than or equal to every element in a later bucket. Each bucket is sorted locally. Concatenating locally sorted buckets in splitter order therefore produces one globally sorted array.
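The argument can be exercised sequentially: bucket by splitters, sort each bucket locally, concatenate, and compare against a direct sort. The splitter values here are arbitrary examples:

```go
package main

import (
	"fmt"
	"sort"
)

// sortViaBuckets partitions a by the sorted splitters, sorts each
// bucket locally, and concatenates the buckets in splitter order.
func sortViaBuckets(a, splitters []int) []int {
	buckets := make([][]int, len(splitters)+1)
	for _, x := range a {
		b := sort.SearchInts(splitters, x) // first splitter >= x
		buckets[b] = append(buckets[b], x)
	}
	out := make([]int, 0, len(a))
	for _, b := range buckets {
		sort.Ints(b) // local sort only
		out = append(out, b...)
	}
	return out
}

func main() {
	a := []int{9, 3, 27, 14, 3, 21, 8, 30, 1, 16}
	fmt.Println(sortViaBuckets(a, []int{10, 20})) // [1 3 3 8 9 14 16 21 27 30]
}
```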
Practical Considerations
- Oversampling improves bucket balance.
- Duplicate-heavy inputs need careful bucket rules (ties at splitter values must go to a consistent side).
- Buckets should be built with prefix sums or per-worker buffers to avoid contention.
- Local bucket sorting can use quicksort, mergesort, radix sort, or introsort.
- Communication cost matters in distributed implementations.
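The first point above can be sketched in Go; `chooseSplittersOversampled`, the oversampling factor `s`, and the fixed seed are illustrative choices, not part of the algorithm's definition:

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// chooseSplittersOversampled draws p*s random sample keys, sorts them,
// and takes every s-th key as a splitter. Larger s tracks the input
// distribution more closely, at the cost of more sampling work.
func chooseSplittersOversampled(a []int, p, s int) []int {
	rng := rand.New(rand.NewSource(1)) // fixed seed for reproducibility
	sample := make([]int, 0, p*s)
	for i := 0; i < p*s; i++ {
		sample = append(sample, a[rng.Intn(len(a))])
	}
	sort.Ints(sample)
	splitters := make([]int, 0, p-1)
	for i := 1; i < p; i++ {
		splitters = append(splitters, sample[i*s])
	}
	return splitters
}

func main() {
	a := make([]int, 10000)
	for i := range a {
		a[i] = i
	}
	// With s = 32 the splitters land near the quartiles of 0..9999.
	fmt.Println(chooseSplittersOversampled(a, 4, 32))
}
```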
When to Use
Use parallel sample sort when:
- the input is large
- many workers are available
- the data can be redistributed efficiently
- approximate balance can be achieved by sampling
Avoid it when the input is small, communication is expensive, or the data distribution causes severe bucket imbalance.
Implementation (Go, simplified)
```go
package samplesort

import "sort"

// ParallelSampleSort returns a sorted copy of a, using up to `workers`
// goroutines for the bucket-sorting phase.
func ParallelSampleSort(a []int, workers int) []int {
	if len(a) <= 1 || workers <= 1 {
		out := append([]int(nil), a...)
		sort.Ints(out)
		return out
	}
	splitters := chooseSplitters(a, workers)
	// Partition: bucket b receives x with splitters[b-1] < x <= splitters[b].
	buckets := make([][]int, workers)
	for _, x := range a {
		b := sort.Search(len(splitters), func(i int) bool {
			return x <= splitters[i]
		})
		buckets[b] = append(buckets[b], x)
	}
	// Sort buckets concurrently; the channel acts as a completion counter.
	done := make(chan struct{}, workers)
	for i := range buckets {
		go func(i int) {
			sort.Ints(buckets[i])
			done <- struct{}{}
		}(i)
	}
	for range buckets {
		<-done
	}
	// Concatenate buckets in splitter order.
	out := make([]int, 0, len(a))
	for _, b := range buckets {
		out = append(out, b...)
	}
	return out
}

// chooseSplitters sorts a full copy of the input and picks evenly spaced
// keys. This guarantees balance but costs a sequential O(n log n) sort;
// a real implementation would sample instead of sorting everything.
func chooseSplitters(a []int, workers int) []int {
	sample := append([]int(nil), a...)
	sort.Ints(sample)
	splitters := make([]int, 0, workers-1)
	for i := 1; i < workers; i++ {
		idx := i * len(sample) / workers
		splitters = append(splitters, sample[idx])
	}
	return splitters
}
```