# Suffix Array Radix Sort

# Suffix Array Radix Sort

Suffix array radix sort is the radix-optimized version of the suffix array doubling method. It still sorts suffixes by rank pairs, but uses counting sort or radix sort instead of comparison sorting.

This reduces each doubling round from comparison sorting over pairs to linear sorting over integer ranks.

## Problem

Given a string $S$ of length $n$, construct the suffix array:

$$
SA = \text{sorted indices of all suffixes of } S
$$

The suffix array lists starting positions in lexicographic suffix order.

## Idea

At each doubling step, suffixes are represented by rank pairs:

$$
(\text{rank}[i], \text{rank}[i + k])
$$

where $k$ is the current prefix length.

Since ranks are integers in a bounded range, these pairs can be sorted with radix sort:

1. sort by second rank
2. stable sort by first rank

After sorting, assign new ranks and double $k$.

## Algorithm

```text id="k9q2mx"
suffix_array_radix_sort(S):
    n = length(S)
    SA = [0, 1, ..., n - 1]
    rank = initial character ranks
    k = 1

    while k < n:
        SA = radix_sort_pairs(SA, rank, k)

        new_rank[SA[0]] = 0

        for i from 1 to n - 1:
            prev = SA[i - 1]
            curr = SA[i]

            if pair(prev, rank, k) != pair(curr, rank, k):
                new_rank[curr] = new_rank[prev] + 1
            else:
                new_rank[curr] = new_rank[prev]

        rank = new_rank
        k = 2 * k

    return SA
```

## Pair Sorting

For an index $i$, define:

$$
p(i) = (\text{rank}[i], \text{rank}[i+k])
$$

If $i+k \ge n$, use sentinel rank $-1$.

A stable radix sort first sorts by the second component, then by the first component.

## Example

Let:

$$
S = "banana"
$$

The suffixes are:

| index | suffix |
| ----- | ------ |
| 0     | banana |
| 1     | anana  |
| 2     | nana   |
| 3     | ana    |
| 4     | na     |
| 5     | a      |

Final suffix order:

$$
["a", "ana", "anana", "banana", "na", "nana"]
$$

Suffix array:

$$
[5, 3, 1, 0, 4, 2]
$$

## Correctness

At each step, ranks encode the lexicographic order of prefixes of length $k$. Sorting by the pair $(\text{rank}[i], \text{rank}[i+k])$ orders prefixes of length $2k$.

Radix sorting preserves the same pair order as comparison sorting because it performs stable sorting by the lower-priority component first and the higher-priority component second. After ranks are reassigned, the invariant holds for doubled prefix length.

When $k \ge n$, the ranked prefix covers each suffix completely, so $SA$ is the suffix array.

## Complexity

Let:

* $n$ be the string length
* $R$ be the number of distinct ranks

Each doubling round performs linear integer sorting:

$$
O(n + R)
$$

Since $R \le n$, each round costs:

$$
O(n)
$$

There are:

$$
O(\log n)
$$

rounds.

Total time:

$$
O(n \log n)
$$

Space:

$$
O(n)
$$

## Properties

| property         | value                    |
| ---------------- | ------------------------ |
| stable           | yes, inside radix passes |
| in place         | no                       |
| comparison based | no for rank-pair sorting |
| output           | suffix array             |
| typical time     | $O(n \log n)$            |

## When to Use

Use suffix array radix sort when the doubling method is simple enough, but comparison sorting is too slow. It is a good practical choice for moderate to large strings when linear-time suffix array algorithms are unnecessary or harder to implement.

Avoid it when strict linear construction is required, where SA-IS or induced sorting is a better fit.

## Implementation

```python id="p7m2qa"
def suffix_array_radix_sort(s):
    n = len(s)
    if n == 0:
        return []

    sa = list(range(n))
    rank = [ord(c) for c in s]
    k = 1

    while k < n:
        max_rank = max(rank) + 1

        sa = counting_sort_sa(sa, rank, k, max_rank)
        sa = counting_sort_sa(sa, rank, 0, max_rank)

        new_rank = [0] * n

        for i in range(1, n):
            prev = sa[i - 1]
            curr = sa[i]

            prev_key = (
                rank[prev],
                rank[prev + k] if prev + k < n else -1,
            )
            curr_key = (
                rank[curr],
                rank[curr + k] if curr + k < n else -1,
            )

            if prev_key != curr_key:
                new_rank[curr] = new_rank[prev] + 1
            else:
                new_rank[curr] = new_rank[prev]

        rank = new_rank
        k <<= 1

        if rank[sa[-1]] == n - 1:
            break

    return sa

def counting_sort_sa(sa, rank, offset, max_rank):
    # Shift sentinel -1 to bucket 0.
    count = [0] * (max_rank + 1)

    for i in sa:
        key = rank[i + offset] + 1 if i + offset < len(rank) else 0
        count[key] += 1

    for i in range(1, len(count)):
        count[i] += count[i - 1]

    out = [0] * len(sa)

    for i in range(len(sa) - 1, -1, -1):
        idx = sa[i]
        key = rank[idx + offset] + 1 if idx + offset < len(rank) else 0
        count[key] -= 1
        out[count[key]] = idx

    return out
```