Skip to content

Parallel Sorting by Regular Sampling (PSRS)

Sort data in parallel by local sorting, regular sampling, global splitter selection, redistribution, and final local merge.

PSRS is a structured parallel sorting algorithm designed for shared memory or distributed systems. It combines local sorting, deterministic sampling, and balanced redistribution.

Unlike naive sample sort, PSRS uses regular sampling to guarantee better load balance without heavy randomness.

Problem

Given nn elements distributed across pp processors, sort them in nondecreasing order.

Each processor initially holds about n/pn/p elements.

Algorithm

PSRS proceeds in five phases.

psrs(A, p):
    partition A into p blocks

    parallel for each processor i:
        sort local block

        select p regular samples from sorted block

    gather all samples
    sort samples
    choose p - 1 global splitters

    broadcast splitters

    parallel for each processor i:
        partition local block into p buckets

    all_to_all exchange buckets

    parallel for each processor i:
        merge received buckets

    return processors in order

Regular Sampling

Each processor selects evenly spaced samples from its sorted local block.

If the local block has size n/pn/p, then samples are chosen at:

np2,2np2,,(p1)np2 \frac{n}{p^2}, \frac{2n}{p^2}, \dots, \frac{(p-1)n}{p^2}

This produces pp samples per processor.

Splitter Selection

All p2p^2 samples are collected and sorted. Then every pp-th sample is selected as a splitter:

si=sample[ip] s_i = \text{sample}[i \cdot p]

for i=1,,p1i = 1, \dots, p-1.

This approximates quantiles of the full dataset.

Redistribution

Each processor partitions its sorted block using the splitters:

bucket_id(x, splitters):
    return upper_bound(splitters, x)

Buckets are sent to corresponding processors via all to all communication.

Final Merge

Each processor receives pp sorted sublists. It merges them into one sorted list.

merge_k_sorted_lists(lists):
    use k-way merge with heap

Complexity

measurevalue
local sortO((n/p)log(n/p))O((n/p)\log(n/p))
samplingO(n/p)O(n/p)
sample sortO(p2logp)O(p^2 \log p)
partitioningO((n/p)logp)O((n/p)\log p)
communicationO(n)O(n)
final mergeO((n/p)logp)O((n/p)\log p)
total workO(nlogn)O(n \log n)

Load Balance Guarantee

Regular sampling ensures that each processor receives at most about:

2np \frac{2n}{p}

elements in the worst case.

This gives strong guarantees compared to naive random sampling.

Correctness

  • Local sorting ensures each processor starts with sorted data.
  • Splitters approximate global quantiles.
  • Partitioning ensures that processor ii receives elements in a disjoint key range.
  • Merging preserves sorted order.

Thus, each processor holds a sorted segment, and all segments are globally ordered.

Practical Considerations

  • Requires synchronization between phases.
  • Communication cost is significant.
  • Works well when processors have similar speed.
  • Deterministic behavior is useful for reproducibility.
  • Final merge cost can be reduced with optimized k-way merge.

When to Use

Use PSRS when:

  • strong load balance is required
  • deterministic behavior is preferred
  • processors are symmetric
  • data is large and distributed

Avoid it when communication cost dominates or when simpler randomized sample sort is sufficient.

Implementation Sketch

local_phase(block):
    sort block
    return regular_samples(block)
choose_splitters(samples, p):
    sort samples
    splitters = []
    for i in 1..p-1:
        splitters.append(samples[i * p])
    return splitters
redistribute(block, splitters):
    buckets = partition(block, splitters)
    send buckets to processors
final_merge(lists):
    return k_way_merge(lists)

Simplified Python Model

from bisect import bisect_right
import heapq

def psrs(blocks):
    p = len(blocks)

    # local sort
    blocks = [sorted(b) for b in blocks]

    # regular samples
    samples = []
    for b in blocks:
        step = max(1, len(b)//p)
        samples.extend(b[::step][:p])

    samples.sort()

    splitters = [samples[i*p] for i in range(1, p)]

    # redistribute
    buckets = [[] for _ in range(p)]
    for b in blocks:
        for x in b:
            i = bisect_right(splitters, x)
            buckets[i].append(x)

    # final merge
    return [sorted(bucket) for bucket in buckets]