Skip to content

Integer Sample Sort

Sample sort specialized for integer keys using range-based partitioning and optional radix-style bucket classification.

Integer sample sort is a specialization of sample sort for integer keys. It selects sample elements to estimate the distribution, chooses splitters, and partitions the array into buckets. Each bucket is then sorted recursively or with a suitable subroutine.

Because keys are integers, classification can use arithmetic or bit operations, and buckets can be balanced more precisely than in general comparison sorting.

Problem

Given an array AA of nn integers, sort the array efficiently using sampling to guide partitioning.

Idea

  1. Select a random or stratified sample from AA
  2. Sort the sample
  3. Choose splitters from the sample
  4. Partition elements into buckets using splitters
  5. Recursively sort each bucket

For integers, the partition step can be optimized with branchless comparisons or radix-like classification.

Splitters

If kk buckets are used, choose k1k - 1 splitters:

s1<s2<<sk1 s_1 < s_2 < \cdots < s_{k-1}

Buckets are defined as:

B0=x<s1,Bi=six<si+1,Bk1=xsk1 B_0 = {x < s_1},\quad B_i = {s_i \le x < s_{i+1}},\quad B_{k-1} = {x \ge s_{k-1}}

Algorithm

integer_sample_sort(A):
    if len(A) <= threshold:
        insertion_sort(A)
        return A

    sample = pick_sample(A)
    sort(sample)

    splitters = choose_splitters(sample)

    buckets = [empty list for _ in range(k)]

    for x in A:
        i = bucket_index(x, splitters)
        buckets[i].append(x)

    result = []
    for b in buckets:
        result.extend(integer_sample_sort(b))

    return result

Example

Sort:

A=[91,12,43,27,65,8,50,34] A = [91, 12, 43, 27, 65, 8, 50, 34]

Sample:

[12,43,65] [12, 43, 65]

Splitters:

[30,60] [30, 60]

Buckets:

bucketrangevalues
0x<30x < 3012, 27, 8
130x<6030 \le x < 6043, 50, 34
2x60x \ge 6091, 65

Recursively sort each bucket and concatenate:

[8,12,27,34,43,50,65,91] [8, 12, 27, 34, 43, 50, 65, 91]

Correctness

Splitters partition the domain into ordered intervals. Every element belongs to exactly one interval. All elements in earlier buckets are less than those in later buckets. Recursive sorting ensures order within each bucket, so concatenation yields a sorted array.

Complexity

Let:

  • nn be the number of elements
  • kk be the number of buckets

Expected time:

O(nlogn) O(n \log n)

With well balanced buckets, depth is O(logkn)O(\log_k n).

Worst case:

O(n2) O(n^2)

if partitioning is highly unbalanced.

Space:

O(n) O(n)

for bucket storage.

Properties

propertyvalue
stableno
in placeno
comparison basedyes
adaptiveyes
parallel friendlyyes

When to Use

Use integer sample sort when:

  • sorting large integer arrays
  • parallelization is desired
  • distribution is unknown but sampling can estimate it
  • radix sort is not ideal due to key structure or constraints

Avoid when:

  • small input sizes
  • strict worst case guarantees are required
  • memory usage must be minimal

Notes

  • Often combined with block partitioning for performance
  • Can switch to radix or insertion sort for small buckets
  • Splitter quality strongly affects performance