Sort data in parallel by local sorting, regular sampling, global splitter selection, redistribution, and final local merge.
PSRS (Parallel Sorting by Regular Sampling) is a structured parallel sorting algorithm designed for shared-memory or distributed-memory systems. It combines local sorting, deterministic sampling, and balanced redistribution.
Unlike naive sample sort, PSRS uses regular sampling to guarantee better load balance without heavy randomness.
Problem
Given n elements distributed across p processors, sort them in nondecreasing order.
Each processor initially holds about n/p elements.
Algorithm
PSRS proceeds in five phases.
psrs(A, p):
    partition A into p blocks
    parallel for each processor i:
        sort local block
        select p regular samples from sorted block
    gather all samples
    sort samples
    choose p - 1 global splitters
    broadcast splitters
    parallel for each processor i:
        partition local block into p buckets
    all_to_all exchange buckets
    parallel for each processor i:
        merge received buckets
    return processors in order

Regular Sampling
Each processor selects evenly spaced samples from its sorted local block.
If the local block has size m = n/p, then p samples are chosen at the evenly spaced indices:

0, m/p, 2m/p, ..., (p - 1) * m/p

(0-indexed). This produces p samples per processor, and p² samples in total.
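This sampling step can be modeled in a few lines of Python (a sketch, assuming a 0-indexed, already-sorted block; `regular_samples` is the helper name used in the implementation sketch later):

```python
def regular_samples(block, p):
    # block is assumed already sorted; take p evenly spaced
    # samples at indices 0, m//p, 2*(m//p), ...
    step = max(1, len(block) // p)
    return block[::step][:p]
```

For example, a sorted block of 16 elements with p = 4 yields the samples at indices 0, 4, 8, and 12.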
Splitter Selection
All p² samples are collected on one processor and sorted. Then every p-th sample is selected as a splitter:

splitter_i = samples[i * p]

for i = 1, ..., p - 1.
This approximates the p-quantiles of the full dataset.
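A Python sketch of this selection (assuming p samples were gathered from each of p processors, so len(samples) is p²):

```python
def choose_splitters(samples, p):
    # sort the gathered samples, then take every p-th one
    # as a global splitter (p - 1 splitters in total)
    samples = sorted(samples)
    return [samples[i * p] for i in range(1, p)]
```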
Redistribution
Each processor partitions its sorted block using the splitters:
bucket_id(x, splitters):
    return upper_bound(splitters, x)

Buckets are sent to the corresponding processors via all-to-all communication.
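In Python, upper_bound corresponds to bisect_right; a minimal model of the partitioning step (`partition_block` is a hypothetical helper name):

```python
from bisect import bisect_right

def bucket_id(x, splitters):
    # upper_bound == bisect_right: the number of splitters <= x
    # is exactly the index of the destination bucket
    return bisect_right(splitters, x)

def partition_block(block, splitters):
    # split a block into len(splitters) + 1 buckets
    buckets = [[] for _ in range(len(splitters) + 1)]
    for x in block:
        buckets[bucket_id(x, splitters)].append(x)
    return buckets
```

Note that an element equal to a splitter goes to the higher bucket, so the key ranges stay disjoint.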
Final Merge
Each processor receives sorted sublists. It merges them into one sorted list.
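In Python this heap-based k-way merge can be sketched directly with the standard library:

```python
import heapq

def merge_k_sorted_lists(lists):
    # heapq.merge streams k sorted inputs through a k-entry heap,
    # producing the merged output in O(n log k) time
    return list(heapq.merge(*lists))
```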
merge_k_sorted_lists(lists):
    use k-way merge with heap

Complexity
| measure | value |
|---|---|
| local sort | O((n/p) log (n/p)) |
| sampling | O(p) per processor |
| sample sort | O(p² log p) |
| partitioning | O(p log (n/p)) |
| communication | O(n/p + p²) per processor |
| final merge | O((n/p) log p) |
| total work | O((n/p) log n) for n ≥ p³ |
Load Balance Guarantee
Regular sampling ensures that each processor receives at most about:

2n / p

elements in the worst case (assuming distinct keys).
This gives strong guarantees compared to naive random sampling.
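The bound can be checked empirically with a small model of the sampling and splitting phases (`max_bucket_size` is a hypothetical helper; distinct keys assumed):

```python
import random
from bisect import bisect_right

def max_bucket_size(data, p):
    # model phases 1-4 of PSRS on distinct keys and report the
    # largest bucket, to compare against the 2n/p bound
    n = len(data)
    size = n // p
    blocks = [sorted(data[i * size:(i + 1) * size]) for i in range(p)]
    step = max(1, size // p)
    samples = sorted(s for b in blocks for s in b[::step][:p])
    splitters = [samples[i * p] for i in range(1, p)]
    counts = [0] * p
    for b in blocks:
        for x in b:
            counts[bisect_right(splitters, x)] += 1
    return max(counts)
```

With 4096 distinct keys and p = 8, for instance, no bucket should exceed 2 * 4096 / 8 = 1024 elements.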
Correctness
- Local sorting ensures each processor starts with sorted data.
- Splitters approximate global quantiles.
- Partitioning ensures that processor i receives elements in a disjoint key range.
- Merging preserves sorted order.
Thus, each processor holds a sorted segment, and all segments are globally ordered.
Practical Considerations
- Requires synchronization between phases.
- Communication cost is significant.
- Works well when processors have similar speed.
- Deterministic behavior is useful for reproducibility.
- Final merge cost can be reduced with optimized k-way merge.
When to Use
Use PSRS when:
- strong load balance is required
- deterministic behavior is preferred
- processors are symmetric
- data is large and distributed
Avoid it when communication cost dominates or when simpler randomized sample sort is sufficient.
Implementation Sketch
local_phase(block):
    sort block
    return regular_samples(block)

choose_splitters(samples, p):
    sort samples
    splitters = []
    for i in 1..p-1:
        splitters.append(samples[i * p])
    return splitters

redistribute(block, splitters):
    buckets = partition(block, splitters)
    send buckets to processors

final_merge(lists):
    return k_way_merge(lists)

Simplified Python Model
from bisect import bisect_left
import heapq

def psrs(blocks):
    p = len(blocks)
    # phase 1: local sort
    blocks = [sorted(b) for b in blocks]
    # phase 2: p regular samples per block
    samples = []
    for b in blocks:
        step = max(1, len(b) // p)
        samples.extend(b[::step][:p])
    # phase 3: sort samples, choose p - 1 splitters
    samples.sort()
    splitters = [samples[i * p] for i in range(1, p)]
    # phase 4: cut each sorted block at the splitters, so every
    # bucket receives one sorted run per block (elements equal to
    # a splitter go to the higher bucket)
    buckets = [[] for _ in range(p)]
    for b in blocks:
        start = 0
        for i, s in enumerate(splitters):
            end = bisect_left(b, s)
            buckets[i].append(b[start:end])
            start = end
        buckets[-1].append(b[start:])
    # phase 5: k-way merge of the sorted runs with a heap
    return [list(heapq.merge(*runs)) for runs in buckets]