# External Radix Sort

# External Radix Sort

External radix sort extends radix sort to datasets larger than memory. It processes keys digit by digit and uses external storage to hold intermediate buckets. Unlike comparison-based external sorts, it exploits the structure of keys to achieve linear-time sorting.

This method is effective when keys have fixed length or can be decomposed into digits.

## Problem

Given a large dataset with keys that can be decomposed into digits (for example integers or fixed-length strings), sort the dataset using external storage.

The input may exceed memory, so each pass must use sequential I/O.

## Core Idea

Process the keys digit by digit. For each digit position:

1. Read records sequentially
2. distribute them into buckets based on the current digit
3. write buckets back to storage
4. use the bucket order as input for the next pass

After processing all digits, the data is sorted.

## Algorithm

```text id="7q3vpa"
external_radix_sort(input, digits, base):
    data = input

    for d in digits (least to most significant):
        buckets = create base empty buckets

        for record in data:
            i = digit(record, d)
            append record to buckets[i]

        data = concatenate buckets in order

    return data
```

For large data, buckets are stored externally and processed sequentially.

## Example

Sort integers:

$$
[170, 45, 75, 90, 802, 24, 2, 66]
$$

Using base 10, least significant digit first.

### Pass 1 (units digit)

Buckets:

| digit | values  |
| ----- | ------- |
| 0     | 170, 90 |
| 2     | 802, 2  |
| 4     | 24      |
| 5     | 45, 75  |
| 6     | 66      |

Concatenate:

$$
[170, 90, 802, 2, 24, 45, 75, 66]
$$

### Pass 2 (tens digit)

Continue the process, preserving order from previous pass.

After all digit passes:

$$
[2, 24, 45, 66, 75, 90, 170, 802]
$$

## Correctness

Each pass is stable, so relative order of records with equal digits is preserved. When processing from least significant digit to most significant digit, each pass refines the ordering.

After the final pass, records are ordered by all digit positions, which gives a correct total order.

## Complexity

Let:

* $N$ be the number of records
* $D$ be the number of digits
* $B$ be the block size
* $k$ be the radix base

CPU time:

$$
O(D \cdot N)
$$

I/O cost:

$$
O\left(D \cdot \frac{N}{B}\right)
$$

since each pass scans and writes the full dataset.

## Design Choices

| choice              | effect                         |
| ------------------- | ------------------------------ |
| base size $k$       | fewer passes but more buckets  |
| digit width         | affects number of passes       |
| stable distribution | required for correctness       |
| bucket storage      | must support sequential writes |
| buffering           | reduces I/O overhead           |

## When to Use

Use external radix sort when:

* keys have fixed or bounded length
* digit extraction is efficient
* comparison sorting cost is high
* sequential I/O is preferred
* large datasets must be sorted with predictable passes

It is widely used in systems that sort integers, fixed-length strings, or encoded keys.

## Implementation Sketch

```python id="r5j3kt"
def radix_sort_lsd(data, base=10):
    max_val = max(data)
    exp = 1

    while max_val // exp > 0:
        buckets = [[] for _ in range(base)]

        for x in data:
            digit = (x // exp) % base
            buckets[digit].append(x)

        data = [x for bucket in buckets for x in bucket]
        exp *= base

    return data
```

```python id="t4z9qh"
def external_radix_sort_model(data):
    return radix_sort_lsd(data)
```

## Notes

External radix sort replaces comparisons with digit distribution. Its efficiency depends on the number of passes and the ability to process large buckets sequentially. It often outperforms comparison-based external sorts when keys are simple and digit extraction is cheap.

