Hybrid distribution and comparison sort that adapts between radix-like partitioning and comparison sorting.
Spreadsort is a hybrid sorting algorithm that combines ideas from radix sort and comparison-based sorting. It partitions data based on key ranges and recursively sorts partitions, switching to comparison sorting when partitions become small or distribution becomes irregular.
It is designed to achieve near linear performance on structured data while retaining robustness on arbitrary inputs.
Problem
Given an array of sortable keys, possibly integers, floating point values, or strings, sort the array efficiently under varying distributions.
Idea
Spreadsort works by:
- Estimating the value range of the data
- Partitioning elements into buckets based on high-order bits or value intervals
- Recursively sorting each bucket
- Switching to comparison sort when buckets are small or uneven
It adapts between radix-like behavior and comparison sort based on input characteristics.
Algorithm Sketch
spreadsort(A):
if size(A) small:
comparison_sort(A)
return A
determine value range [min, max]
choose bucket count k
partition A into k buckets based on range
for each bucket:
if bucket large:
spreadsort(bucket)
else:
comparison_sort(bucket)
concatenate bucketsExample
Sort:
Step 1: determine range Step 2: partition into buckets Step 3: recursively sort each bucket
Final result:
Correctness
Partitioning ensures that all elements in earlier buckets are less than elements in later buckets. Recursive sorting orders elements within each bucket. Switching to comparison sort preserves correctness for small or irregular partitions.
Complexity
Let:
- be number of elements
Average time:
Worst case:
depending on fallback comparison sorting.
Space:
due to bucket storage.
Properties
| property | value |
|---|---|
| stable | no |
| in place | no |
| comparison based | hybrid |
| adaptive | yes |
| distribution sensitive | yes |
When to Use
Use spreadsort when:
- data has some structure or bounded range
- high performance is needed across mixed distributions
- radix style sorting may help but full assumptions do not hold
Avoid when:
- strict worst case guarantees are required
- memory usage must be minimal
- simple algorithms are preferred
Notes
- Often used in high performance libraries
- Works well for integers, floats, and strings
- Adapts dynamically to input characteristics