Spreadsort

Spreadsort is a hybrid sorting algorithm that combines ideas from radix sort and comparison-based sorting. It partitions data based on key ranges and recursively sorts partitions, switching to comparison sorting when partitions become small or distribution becomes irregular.

It is designed to achieve near linear performance on structured data while retaining robustness on arbitrary inputs.

Problem

Given an array $A$ of sortable keys, possibly integers, floating point values, or strings, sort the array efficiently under varying distributions.

Idea

Spreadsort works by:

Estimating the value range of the data
Partitioning elements into buckets based on high-order bits or value intervals
Recursively sorting each bucket
Switching to comparison sort when buckets are small or uneven

It adapts between radix-like behavior and comparison sort based on input characteristics.

Algorithm Sketch

spreadsort(A):
    if size(A) small:
        comparison_sort(A)
        return A

    determine value range [min, max]
    choose bucket count k

    partition A into k buckets based on range

    for each bucket:
        if bucket large:
            spreadsort(bucket)
        else:
            comparison_sort(bucket)

    concatenate buckets

Example

Sort:

A = [102, 15, 230, 78, 54, 12, 89]

Step 1: determine range Step 2: partition into buckets Step 3: recursively sort each bucket

Final result:

[12, 15, 54, 78, 89, 102, 230]

Correctness

Partitioning ensures that all elements in earlier buckets are less than elements in later buckets. Recursive sorting orders elements within each bucket. Switching to comparison sort preserves correctness for small or irregular partitions.

Complexity

Let:

$n$ be number of elements

Average time:

O(n)

Worst case:

O(n \log n)

depending on fallback comparison sorting.

Space:

O(n)

due to bucket storage.

Properties

property	value
stable	no
in place	no
comparison based	hybrid
adaptive	yes
distribution sensitive	yes

When to Use

Use spreadsort when:

data has some structure or bounded range
high performance is needed across mixed distributions
radix style sorting may help but full assumptions do not hold

Avoid when:

strict worst case guarantees are required
memory usage must be minimal
simple algorithms are preferred

Notes

Often used in high performance libraries
Works well for integers, floats, and strings
Adapts dynamically to input characteristics