Partition data by sampled splitters, while adapting splitter choice to skewed or partially ordered input.
Adaptive samplesort is a samplesort variant that chooses splitters from the input and adjusts partitioning to the observed distribution. It is useful when the input has skew, duplicates, or uneven key ranges.
You use it when ordinary quicksort or fixed bucket sorting may create badly imbalanced partitions.
Problem
Given an array ( A ) of length ( n ), sort it by dividing values into buckets whose boundaries are chosen from a sample of the input.
The goal is to make buckets roughly balanced, even when the key distribution is not uniform.
Idea
Take a sample from the input, sort the sample, and choose splitters from it. The splitters define buckets. Each element is assigned to a bucket, and each bucket is sorted recursively or with another local method.
Adaptive versions refine this by:
- oversampling when skew is likely
- handling many equal keys specially
- changing bucket count by input size
- falling back when partitions become imbalanced
Algorithm
adaptive_samplesort(A):
if length(A) <= small_limit:
insertion_sort(A)
return A
sample = choose_sample(A)
sort(sample)
splitters = choose_splitters(sample)
buckets = create_empty_buckets(length(splitters) + 1)
for x in A:
b = find_bucket(splitters, x)
buckets[b].append(x)
for each bucket in buckets:
if bucket is too_large_or_skewed:
adaptive_samplesort(bucket)
else:
local_sort(bucket)
return concatenate(buckets)Example
Input:
[ A = [42, 3, 9, 42, 7, 100, 42, 5, 1, 88] ]
A sample might produce splitters:
[ [7, 42] ]
Buckets:
| bucket | condition | values |
|---|---|---|
| 1 | ( x < 7 ) | [3, 5, 1] |
| 2 | ( 7 \le x < 42 ) | [9, 7] |
| 3 | ( x \ge 42 ) | [42, 100, 42, 42, 88] |
Each bucket is sorted, then concatenated.
Correctness
Every splitter defines an ordered boundary. All elements in an earlier bucket are less than or equal to all elements in later buckets, assuming bucket assignment uses consistent boundary rules.
If each bucket is sorted correctly, concatenating the buckets produces the full sorted order.
Complexity
| case | time |
|---|---|
| balanced buckets | ( O(n \log n) ) |
| good distribution with many buckets | close to ( O(n) ) partition work plus local sorting |
| worst case without fallback | can degrade |
| with fallback | ( O(n \log n) ) |
Space depends on bucket storage:
[ O(n) ]
Handling Duplicates
Many duplicate keys can overload one bucket. Adaptive samplesort often treats splitter-equal values as a separate equal bucket.
if x < splitter:
put in left bucket
else if x == splitter:
put in equal bucket
else:
put in right bucketThis avoids repeatedly sorting identical values.
When to Use
Use adaptive samplesort when:
- keys are unevenly distributed
- duplicate keys are common
- parallel partitioning is useful
- cache-friendly bucket processing matters
It is common in high-performance sorting systems where partition balance is more important than simple implementation.