External-memory radix sort that processes large datasets by distributing records into digit-based buckets across multiple passes.
External radix sort extends radix sort to datasets larger than memory. It processes keys digit by digit and uses external storage to hold intermediate buckets. Unlike comparison-based external sorts, it exploits the structure of keys to achieve linear-time sorting.
This method is effective when keys have fixed length or can be decomposed into digits.
Problem
Given a large dataset with keys that can be decomposed into digits (for example integers or fixed-length strings), sort the dataset using external storage.
The input may exceed memory, so each pass must use sequential I/O.
Core Idea
Process the keys digit by digit. For each digit position:
- Read records sequentially
- distribute them into buckets based on the current digit
- write buckets back to storage
- use the bucket order as input for the next pass
After processing all digits, the data is sorted.
Algorithm
external_radix_sort(input, digits, base):
data = input
for d in digits (least to most significant):
buckets = create base empty buckets
for record in data:
i = digit(record, d)
append record to buckets[i]
data = concatenate buckets in order
return dataFor large data, buckets are stored externally and processed sequentially.
Example
Sort integers:
Using base 10, least significant digit first.
Pass 1 (units digit)
Buckets:
| digit | values |
|---|---|
| 0 | 170, 90 |
| 2 | 802, 2 |
| 4 | 24 |
| 5 | 45, 75 |
| 6 | 66 |
Concatenate:
Pass 2 (tens digit)
Continue the process, preserving order from previous pass.
After all digit passes:
Correctness
Each pass is stable, so relative order of records with equal digits is preserved. When processing from least significant digit to most significant digit, each pass refines the ordering.
After the final pass, records are ordered by all digit positions, which gives a correct total order.
Complexity
Let:
- be the number of records
- be the number of digits
- be the block size
- be the radix base
CPU time:
I/O cost:
since each pass scans and writes the full dataset.
Design Choices
| choice | effect |
|---|---|
| base size | fewer passes but more buckets |
| digit width | affects number of passes |
| stable distribution | required for correctness |
| bucket storage | must support sequential writes |
| buffering | reduces I/O overhead |
When to Use
Use external radix sort when:
- keys have fixed or bounded length
- digit extraction is efficient
- comparison sorting cost is high
- sequential I/O is preferred
- large datasets must be sorted with predictable passes
It is widely used in systems that sort integers, fixed-length strings, or encoded keys.
Implementation Sketch
def radix_sort_lsd(data, base=10):
max_val = max(data)
exp = 1
while max_val // exp > 0:
buckets = [[] for _ in range(base)]
for x in data:
digit = (x // exp) % base
buckets[digit].append(x)
data = [x for bucket in buckets for x in bucket]
exp *= base
return datadef external_radix_sort_model(data):
return radix_sort_lsd(data)Notes
External radix sort replaces comparisons with digit distribution. Its efficiency depends on the number of passes and the ability to process large buckets sequentially. It often outperforms comparison-based external sorts when keys are simple and digit extraction is cheap.