High performance radix sort variant using cache friendly block partitioning and branchless techniques.
Ska sort is a high performance radix sorting algorithm designed for modern CPUs. It focuses on minimizing cache misses, branch mispredictions, and memory movement by using block based partitioning and branchless classification.
It is particularly effective for sorting large arrays of integers, pointers, or fixed width keys.
Problem
Given an array of fixed width keys, sort it efficiently using a radix based method optimized for hardware performance.
Idea
Ska sort processes keys digit by digit, similar to radix sort, but improves performance through:
- block processing instead of element by element moves
- prefetching and cache aware memory access
- branchless classification to reduce pipeline stalls
- minimizing writes through local buffering
The algorithm groups elements into buckets using blocks, then writes them back in contiguous segments.
Structure
At each pass:
- Read a block of elements
- classify each element into a bucket using the current digit
- store elements into temporary buffers per bucket
- flush buffers to output array when full
This avoids frequent random writes and improves spatial locality.
Algorithm Sketch
ska_sort(A, digit):
if size(A) small:
comparison_sort(A)
return
initialize buffers for each bucket
for block in A:
for x in block:
b = extract_digit(x, digit)
append x to buffer[b]
if buffer[b] full:
flush buffer[b] to output
flush all buffers
for each bucket:
recursively sort bucket with next digitExample
Sort:
Step 1: classify by most significant byte Step 2: group into contiguous regions Step 3: recursively sort each region
Final result:
Correctness
Each pass partitions elements according to the current digit. All elements assigned to earlier buckets are smaller than those in later buckets with respect to that digit.
Recursive processing ensures ordering by all remaining digits. Since all elements are placed into exactly one bucket per level, and buckets are processed in order, the final array is sorted.
Complexity
Let:
- be number of elements
- be number of digit positions
- be radix
Time:
with very low constant factors due to cache efficiency.
Space:
Buffers are typically small relative to .
Properties
| property | value |
|---|---|
| stable | no |
| in place | no |
| comparison based | no |
| cache efficient | yes |
| branchless | mostly |
When to Use
Use ska sort when:
- sorting large arrays of primitive or fixed size keys
- memory bandwidth and cache performance dominate
- branch prediction cost must be minimized
- high throughput is required
Avoid when:
- stability is required
- keys are variable length or complex objects
- implementation simplicity is preferred
Notes
- Often used in performance critical libraries
- Works well with SIMD and prefetching
- Requires careful tuning for hardware
- Typically faster than naive radix sort in practice