# Parallel Sorting by Regular Sampling (PSRS)

# Parallel Sorting by Regular Sampling (PSRS)

PSRS is a structured parallel sorting algorithm designed for shared memory or distributed systems. It combines local sorting, deterministic sampling, and balanced redistribution.

Unlike naive sample sort, PSRS uses **regular sampling** to guarantee better load balance without heavy randomness.

## Problem

Given $n$ elements distributed across $p$ processors, sort them in nondecreasing order.

Each processor initially holds about $n/p$ elements.

## Algorithm

PSRS proceeds in five phases.

```text id="y6m8pc"
psrs(A, p):
    partition A into p blocks

    parallel for each processor i:
        sort local block

        select p regular samples from sorted block

    gather all samples
    sort samples
    choose p - 1 global splitters

    broadcast splitters

    parallel for each processor i:
        partition local block into p buckets

    all_to_all exchange buckets

    parallel for each processor i:
        merge received buckets

    return processors in order
```

## Regular Sampling

Each processor selects evenly spaced samples from its sorted local block.

If the local block has size $n/p$, then samples are chosen at:

$$
\frac{n}{p^2}, \frac{2n}{p^2}, \dots, \frac{(p-1)n}{p^2}
$$

This produces $p$ samples per processor.

## Splitter Selection

All $p^2$ samples are collected and sorted. Then every $p$-th sample is selected as a splitter:

$$
s_i = \text{sample}[i \cdot p]
$$

for $i = 1, \dots, p-1$.

This approximates quantiles of the full dataset.

## Redistribution

Each processor partitions its sorted block using the splitters:

```text id="l9e2tx"
bucket_id(x, splitters):
    return upper_bound(splitters, x)
```

Buckets are sent to corresponding processors via all to all communication.

## Final Merge

Each processor receives $p$ sorted sublists. It merges them into one sorted list.

```text id="2tx0s7"
merge_k_sorted_lists(lists):
    use k-way merge with heap
```

## Complexity

| measure       | value               |
| ------------- | ------------------- |
| local sort    | $O((n/p)\log(n/p))$ |
| sampling      | $O(n/p)$            |
| sample sort   | $O(p^2 \log p)$     |
| partitioning  | $O((n/p)\log p)$    |
| communication | $O(n)$              |
| final merge   | $O((n/p)\log p)$    |
| total work    | $O(n \log n)$       |

## Load Balance Guarantee

Regular sampling ensures that each processor receives at most about:

$$
\frac{2n}{p}
$$

elements in the worst case.

This gives strong guarantees compared to naive random sampling.

## Correctness

* Local sorting ensures each processor starts with sorted data.
* Splitters approximate global quantiles.
* Partitioning ensures that processor $i$ receives elements in a disjoint key range.
* Merging preserves sorted order.

Thus, each processor holds a sorted segment, and all segments are globally ordered.

## Practical Considerations

* Requires synchronization between phases.
* Communication cost is significant.
* Works well when processors have similar speed.
* Deterministic behavior is useful for reproducibility.
* Final merge cost can be reduced with optimized k-way merge.

## When to Use

Use PSRS when:

* strong load balance is required
* deterministic behavior is preferred
* processors are symmetric
* data is large and distributed

Avoid it when communication cost dominates or when simpler randomized sample sort is sufficient.

## Implementation Sketch

```text id="8j2p9k"
local_phase(block):
    sort block
    return regular_samples(block)
```

```text id="4z8q1m"
choose_splitters(samples, p):
    sort samples
    splitters = []
    for i in 1..p-1:
        splitters.append(samples[i * p])
    return splitters
```

```text id="6k3f2v"
redistribute(block, splitters):
    buckets = partition(block, splitters)
    send buckets to processors
```

```text id="1w7h9x"
final_merge(lists):
    return k_way_merge(lists)
```

## Simplified Python Model

```python id="psrs01"
from bisect import bisect_right
import heapq

def psrs(blocks):
    p = len(blocks)

    # local sort
    blocks = [sorted(b) for b in blocks]

    # regular samples
    samples = []
    for b in blocks:
        step = max(1, len(b)//p)
        samples.extend(b[::step][:p])

    samples.sort()

    splitters = [samples[i*p] for i in range(1, p)]

    # redistribute
    buckets = [[] for _ in range(p)]
    for b in blocks:
        for x in b:
            i = bisect_right(splitters, x)
            buckets[i].append(x)

    # final merge
    return [sorted(bucket) for bucket in buckets]
```

