Adaptive Samplesort

Adaptive samplesort is a samplesort variant that chooses splitters from the input and adjusts partitioning to the observed distribution. It is useful when the input has skew, duplicates, or uneven key ranges.

You use it when ordinary quicksort or fixed bucket sorting may create badly imbalanced partitions.

Problem

Given an array ( A ) of length ( n ), sort it by dividing values into buckets whose boundaries are chosen from a sample of the input.

The goal is to make buckets roughly balanced, even when the key distribution is not uniform.

Idea

Take a sample from the input, sort the sample, and choose splitters from it. The splitters define buckets. Each element is assigned to a bucket, and each bucket is sorted recursively or with another local method.

Adaptive versions refine this by:

oversampling when skew is likely
handling many equal keys specially
changing bucket count by input size
falling back when partitions become imbalanced

Algorithm

adaptive_samplesort(A):
    if length(A) <= small_limit:
        insertion_sort(A)
        return A

    sample = choose_sample(A)
    sort(sample)

    splitters = choose_splitters(sample)
    buckets = create_empty_buckets(length(splitters) + 1)

    for x in A:
        b = find_bucket(splitters, x)
        buckets[b].append(x)

    for each bucket in buckets:
        if bucket is too_large_or_skewed:
            adaptive_samplesort(bucket)
        else:
            local_sort(bucket)

    return concatenate(buckets)

Example

Input:

[ A = [42, 3, 9, 42, 7, 100, 42, 5, 1, 88] ]

A sample might produce splitters:

[ [7, 42] ]

Buckets:

bucket	condition	values
1	( x < 7 )	[3, 5, 1]
2	( 7 \le x < 42 )	[9, 7]
3	( x \ge 42 )	[42, 100, 42, 42, 88]

Each bucket is sorted, then concatenated.

Correctness

Every splitter defines an ordered boundary. All elements in an earlier bucket are less than or equal to all elements in later buckets, assuming bucket assignment uses consistent boundary rules.

If each bucket is sorted correctly, concatenating the buckets produces the full sorted order.

Complexity

case	time
balanced buckets	( O(n \log n) )
good distribution with many buckets	close to ( O(n) ) partition work plus local sorting
worst case without fallback	can degrade
with fallback	( O(n \log n) )

Space depends on bucket storage:

[ O(n) ]

Handling Duplicates

Many duplicate keys can overload one bucket. Adaptive samplesort often treats splitter-equal values as a separate equal bucket.

if x < splitter:
    put in left bucket
else if x == splitter:
    put in equal bucket
else:
    put in right bucket

This avoids repeatedly sorting identical values.

When to Use

Use adaptive samplesort when:

keys are unevenly distributed
duplicate keys are common
parallel partitioning is useful
cache-friendly bucket processing matters

It is common in high-performance sorting systems where partition balance is more important than simple implementation.