Sort data in parallel by local sorting, regular sampling, global splitter selection, redistribution, and final local merge.
PSRS (Parallel Sorting by Regular Sampling) is a structured parallel sorting algorithm designed for shared-memory or distributed-memory systems. It combines local sorting, deterministic sampling, and balanced redistribution.
Unlike naive sample sort, PSRS uses regular sampling to guarantee better load balance without heavy randomness.
Problem
Given n elements distributed across p processors, sort them in nondecreasing order.
Each processor initially holds about n/p elements.
Algorithm
PSRS proceeds in five phases.
psrs(A, p):
    partition A into p blocks
    parallel for each processor i:
        sort local block
        select p regular samples from sorted block
    gather all samples
    sort samples
    choose p - 1 global splitters
    broadcast splitters
    parallel for each processor i:
        partition local block into p buckets
    all_to_all exchange buckets
    parallel for each processor i:
        merge received buckets
    return processors in order

Regular Sampling
Each processor selects evenly spaced samples from its sorted local block.
If the local block has size m = n/p, then p samples are chosen at the evenly spaced indices:

0, m/p, 2m/p, ..., (p - 1) * m/p

(0-indexed). This produces p samples per processor, and p² samples in total.
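This sampling step can be modeled in a few lines of Python (a sketch, assuming a 0-indexed, already-sorted block; `regular_samples` is the helper name used in the implementation sketch later):

```python
def regular_samples(block, p):
    # block is assumed already sorted; take p evenly spaced
    # samples at indices 0, m//p, 2*(m//p), ...
    step = max(1, len(block) // p)
    return block[::step][:p]
```

For example, a sorted block of 16 elements with p = 4 yields the samples at indices 0, 4, 8, and 12.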
Splitter Selection
All p² samples are collected on one processor and sorted. Then every p-th sample is selected as a splitter:

splitter_i = samples[i * p]

for i = 1, ..., p - 1.
This approximates the p-quantiles of the full dataset.
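A Python sketch of this selection (assuming p samples were gathered from each of p processors, so len(samples) is p²):

```python
def choose_splitters(samples, p):
    # sort the gathered samples, then take every p-th one
    # as a global splitter (p - 1 splitters in total)
    samples = sorted(samples)
    return [samples[i * p] for i in range(1, p)]
```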
Redistribution
Each processor partitions its sorted block using the splitters:
bucket_id(x, splitters):
    return upper_bound(splitters, x)

Buckets are sent to the corresponding processors via all-to-all communication.
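In Python, upper_bound corresponds to bisect_right; a minimal model of the partitioning step (`partition_block` is a hypothetical helper name):

```python
from bisect import bisect_right

def bucket_id(x, splitters):
    # upper_bound == bisect_right: the number of splitters <= x
    # is exactly the index of the destination bucket
    return bisect_right(splitters, x)

def partition_block(block, splitters):
    # split a block into len(splitters) + 1 buckets
    buckets = [[] for _ in range(len(splitters) + 1)]
    for x in block:
        buckets[bucket_id(x, splitters)].append(x)
    return buckets
```

Note that an element equal to a splitter goes to the higher bucket, so the key ranges stay disjoint.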
Final Merge
Each processor receives sorted sublists. It merges them into one sorted list.
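In Python this heap-based k-way merge can be sketched directly with the standard library:

```python
import heapq

def merge_k_sorted_lists(lists):
    # heapq.merge streams k sorted inputs through a k-entry heap,
    # producing the merged output in O(n log k) time
    return list(heapq.merge(*lists))
```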
merge_k_sorted_lists(lists):
    use k-way merge with heap

Complexity
| measure | value |
|---|---|
| local sort | O((n/p) log (n/p)) |
| sampling | O(p) per processor |
| sample sort | O(p² log p) |
| partitioning | O(p log (n/p)) |
| communication | O(n/p + p²) per processor |
| final merge | O((n/p) log p) |
| total work | O((n/p) log n) for n ≥ p³ |
Load Balance Guarantee
Regular sampling ensures that each processor receives at most about:

2n / p

elements in the worst case (assuming distinct keys).
This gives strong guarantees compared to naive random sampling.
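The bound can be checked empirically with a small model of the sampling and splitting phases (`max_bucket_size` is a hypothetical helper; distinct keys assumed):

```python
import random
from bisect import bisect_right

def max_bucket_size(data, p):
    # model phases 1-4 of PSRS on distinct keys and report the
    # largest bucket, to compare against the 2n/p bound
    n = len(data)
    size = n // p
    blocks = [sorted(data[i * size:(i + 1) * size]) for i in range(p)]
    step = max(1, size // p)
    samples = sorted(s for b in blocks for s in b[::step][:p])
    splitters = [samples[i * p] for i in range(1, p)]
    counts = [0] * p
    for b in blocks:
        for x in b:
            counts[bisect_right(splitters, x)] += 1
    return max(counts)
```

With 4096 distinct keys and p = 8, for instance, no bucket should exceed 2 * 4096 / 8 = 1024 elements.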
Correctness
- Local sorting ensures each processor starts with sorted data.
- Splitters approximate global quantiles.
- Partitioning ensures that processor i receives elements in a disjoint key range.
- Merging preserves sorted order.
Thus, each processor holds a sorted segment, and all segments are globally ordered.
Practical Considerations
- Requires synchronization between phases.
- Communication cost is significant.
- Works well when processors have similar speed.
- Deterministic behavior is useful for reproducibility.
- Final merge cost can be reduced with optimized k-way merge.
When to Use
Use PSRS when:
- strong load balance is required
- deterministic behavior is preferred
- processors are symmetric
- data is large and distributed
Avoid it when communication cost dominates or when simpler randomized sample sort is sufficient.
Implementation Sketch
local_phase(block):
    sort block
    return regular_samples(block)

choose_splitters(samples, p):
    sort samples
    splitters = []
    for i in 1..p-1:
        splitters.append(samples[i * p])
    return splitters

redistribute(block, splitters):
    buckets = partition(block, splitters)
    send buckets to processors

final_merge(lists):
    return k_way_merge(lists)

Simplified Python Model
from bisect import bisect_left
import heapq

def psrs(blocks):
    p = len(blocks)
    # phase 1: local sort
    blocks = [sorted(b) for b in blocks]
    # phase 2: p regular samples per block
    samples = []
    for b in blocks:
        step = max(1, len(b) // p)
        samples.extend(b[::step][:p])
    # phase 3: sort samples, choose p - 1 splitters
    samples.sort()
    splitters = [samples[i * p] for i in range(1, p)]
    # phase 4: cut each sorted block at the splitters, so every
    # bucket receives one sorted run per block (elements equal to
    # a splitter go to the higher bucket)
    buckets = [[] for _ in range(p)]
    for b in blocks:
        start = 0
        for i, s in enumerate(splitters):
            end = bisect_left(b, s)
            buckets[i].append(b[start:end])
            start = end
        buckets[-1].append(b[start:])
    # phase 5: k-way merge of the sorted runs with a heap
    return [list(heapq.merge(*runs)) for runs in buckets]