Skip to content

Sample Sort

Divide and conquer sorting algorithm that uses sampling to partition data into balanced buckets.

Sample sort is a generalization of quicksort and bucket sort. It selects a sample of elements, uses them to define partition boundaries, and distributes the data into multiple buckets that are sorted independently.

It is widely used in parallel and distributed systems because it produces well balanced partitions.

Problem

Given an array AA of length nn, reorder it such that:

A[0]A[1]A[n1] A[0] \le A[1] \le \dots \le A[n-1]

Idea

Instead of choosing a single pivot, sample sort chooses multiple splitters:

  1. take a random sample from the array
  2. sort the sample
  3. choose evenly spaced elements as splitters
  4. partition the array into buckets using splitters
  5. sort each bucket recursively

This creates kk partitions instead of two, reducing recursion depth.

Algorithm

sample_sort(A):
    if length(A) small:
        use insertion sort
        return

    sample = select random elements from A
    sort(sample)

    splitters = choose k evenly spaced elements from sample

    buckets = partition A into k+1 buckets using splitters

    for each bucket:
        sample_sort(bucket)

    concatenate buckets

Example

Let:

A=[9,2,7,4,6,1,8,3,5] A = [9, 2, 7, 4, 6, 1, 8, 3, 5]

Sample:

[2,7,4,8] [2, 7, 4, 8]

Sorted sample:

[2,4,7,8] [2, 4, 7, 8]

Choose splitters:

[4,7] [4, 7]

Partition:

bucketvalues
< 4[2, 1, 3]
4 to 7[4, 6, 5]
> 7[9, 8]

Recursively sort each bucket:

Final result:

[1,2,3,4,5,6,7,8,9] [1,2,3,4,5,6,7,8,9]

Correctness

The splitters divide the array into ranges such that all elements in earlier buckets are less than or equal to those in later buckets. Recursively sorting each bucket ensures that each bucket is internally sorted. Concatenating the sorted buckets yields a globally sorted array.

Complexity

casetime
averageO(nlogn)O(n \log n)
worstO(n2)O(n^2)

With good sampling:

  • partitions are balanced
  • recursion depth is reduced

In parallel settings:

metricvalue
workO(nlogn)O(n \log n)
depthO(logn)O(\log n)

Properties

propertyvalue
stabledepends on implementation
in-placeno
parallel friendlyvery high
partitioningmultiway

Notes

Sample sort is widely used in high performance and distributed sorting systems. It provides better load balancing than quicksort because multiple splitters create more uniform partitions.

It is a key algorithm in parallel sorting frameworks and large scale data processing systems.

Implementation

import random

def sample_sort(a):
    if len(a) <= 16:
        return sorted(a)

    k = int(len(a) ** 0.5)

    sample = random.sample(a, min(len(a), k * 2))
    sample.sort()

    splitters = [sample[i * len(sample) // (k + 1)] for i in range(1, k + 1)]

    buckets = [[] for _ in range(k + 1)]

    for x in a:
        placed = False
        for i, s in enumerate(splitters):
            if x <= s:
                buckets[i].append(x)
                placed = True
                break
        if not placed:
            buckets[-1].append(x)

    result = []
    for b in buckets:
        result.extend(sample_sort(b))

    return result