Skip to content

Ska Sort

High performance radix sort variant using cache friendly block partitioning and branchless techniques.

Ska sort is a high performance radix sorting algorithm designed for modern CPUs. It focuses on minimizing cache misses, branch mispredictions, and memory movement by using block based partitioning and branchless classification.

It is particularly effective for sorting large arrays of integers, pointers, or fixed width keys.

Problem

Given an array AA of fixed width keys, sort it efficiently using a radix based method optimized for hardware performance.

Idea

Ska sort processes keys digit by digit, similar to radix sort, but improves performance through:

  • block processing instead of element by element moves
  • prefetching and cache aware memory access
  • branchless classification to reduce pipeline stalls
  • minimizing writes through local buffering

The algorithm groups elements into buckets using blocks, then writes them back in contiguous segments.

Structure

At each pass:

  1. Read a block of elements
  2. classify each element into a bucket using the current digit
  3. store elements into temporary buffers per bucket
  4. flush buffers to output array when full

This avoids frequent random writes and improves spatial locality.

Algorithm Sketch

ska_sort(A, digit):
    if size(A) small:
        comparison_sort(A)
        return

    initialize buffers for each bucket

    for block in A:
        for x in block:
            b = extract_digit(x, digit)
            append x to buffer[b]

            if buffer[b] full:
                flush buffer[b] to output

    flush all buffers

    for each bucket:
        recursively sort bucket with next digit

Example

Sort:

[102,15,230,78,54,12,89] [102, 15, 230, 78, 54, 12, 89]

Step 1: classify by most significant byte Step 2: group into contiguous regions Step 3: recursively sort each region

Final result:

[12,15,54,78,89,102,230] [12, 15, 54, 78, 89, 102, 230]

Correctness

Each pass partitions elements according to the current digit. All elements assigned to earlier buckets are smaller than those in later buckets with respect to that digit.

Recursive processing ensures ordering by all remaining digits. Since all elements are placed into exactly one bucket per level, and buckets are processed in order, the final array is sorted.

Complexity

Let:

  • nn be number of elements
  • mm be number of digit positions
  • bb be radix

Time:

O(nm) O(n \cdot m)

with very low constant factors due to cache efficiency.

Space:

O(n+bbuffer size) O(n + b \cdot \text{buffer size})

Buffers are typically small relative to nn.

Properties

propertyvalue
stableno
in placeno
comparison basedno
cache efficientyes
branchlessmostly

When to Use

Use ska sort when:

  • sorting large arrays of primitive or fixed size keys
  • memory bandwidth and cache performance dominate
  • branch prediction cost must be minimized
  • high throughput is required

Avoid when:

  • stability is required
  • keys are variable length or complex objects
  • implementation simplicity is preferred

Notes

  • Often used in performance critical libraries
  • Works well with SIMD and prefetching
  • Requires careful tuning for hardware
  • Typically faster than naive radix sort in practice