Sample sort specialized for integer keys using range-based partitioning and optional radix-style bucket classification.
Integer sample sort is a specialization of sample sort for integer keys. It selects sample elements to estimate the distribution, chooses splitters, and partitions the array into buckets. Each bucket is then sorted recursively or with a suitable subroutine.
Because keys are integers, classification can use arithmetic or bit operations, and buckets can be balanced more precisely than in general comparison sorting.
Problem
Given an array of integers, sort the array efficiently using sampling to guide partitioning.
Idea
- Select a random or stratified sample from
- Sort the sample
- Choose splitters from the sample
- Partition elements into buckets using splitters
- Recursively sort each bucket
For integers, the partition step can be optimized with branchless comparisons or radix-like classification.
Splitters
If buckets are used, choose splitters:
Buckets are defined as:
Algorithm
integer_sample_sort(A):
if len(A) <= threshold:
insertion_sort(A)
return A
sample = pick_sample(A)
sort(sample)
splitters = choose_splitters(sample)
buckets = [empty list for _ in range(k)]
for x in A:
i = bucket_index(x, splitters)
buckets[i].append(x)
result = []
for b in buckets:
result.extend(integer_sample_sort(b))
return resultExample
Sort:
Sample:
Splitters:
Buckets:
| bucket | range | values |
|---|---|---|
| 0 | 12, 27, 8 | |
| 1 | 43, 50, 34 | |
| 2 | 91, 65 |
Recursively sort each bucket and concatenate:
Correctness
Splitters partition the domain into ordered intervals. Every element belongs to exactly one interval. All elements in earlier buckets are less than those in later buckets. Recursive sorting ensures order within each bucket, so concatenation yields a sorted array.
Complexity
Let:
- be the number of elements
- be the number of buckets
Expected time:
With well balanced buckets, depth is .
Worst case:
if partitioning is highly unbalanced.
Space:
for bucket storage.
Properties
| property | value |
|---|---|
| stable | no |
| in place | no |
| comparison based | yes |
| adaptive | yes |
| parallel friendly | yes |
When to Use
Use integer sample sort when:
- sorting large integer arrays
- parallelization is desired
- distribution is unknown but sampling can estimate it
- radix sort is not ideal due to key structure or constraints
Avoid when:
- small input sizes
- strict worst case guarantees are required
- memory usage must be minimal
Notes
- Often combined with block partitioning for performance
- Can switch to radix or insertion sort for small buckets
- Splitter quality strongly affects performance