Skip to content

Suffix Array Radix Sort

Construct a suffix array by sorting rank pairs with radix or counting sort during the doubling method.

Suffix array radix sort is the radix-optimized version of the suffix array doubling method. It still sorts suffixes by rank pairs, but uses counting sort or radix sort instead of comparison sorting.

This reduces each doubling round from comparison sorting over pairs to linear sorting over integer ranks.

Problem

Given a string SS of length nn, construct the suffix array:

SA=sorted indices of all suffixes of S SA = \text{sorted indices of all suffixes of } S

The suffix array lists starting positions in lexicographic suffix order.

Idea

At each doubling step, suffixes are represented by rank pairs:

(rank[i],rank[i+k]) (\text{rank}[i], \text{rank}[i + k])

where kk is the current prefix length.

Since ranks are integers in a bounded range, these pairs can be sorted with radix sort:

  1. sort by second rank
  2. stable sort by first rank

After sorting, assign new ranks and double kk.

Algorithm

suffix_array_radix_sort(S):
    n = length(S)
    SA = [0, 1, ..., n - 1]
    rank = initial character ranks
    k = 1

    while k < n:
        SA = radix_sort_pairs(SA, rank, k)

        new_rank[SA[0]] = 0

        for i from 1 to n - 1:
            prev = SA[i - 1]
            curr = SA[i]

            if pair(prev, rank, k) != pair(curr, rank, k):
                new_rank[curr] = new_rank[prev] + 1
            else:
                new_rank[curr] = new_rank[prev]

        rank = new_rank
        k = 2 * k

    return SA

Pair Sorting

For an index ii, define:

p(i)=(rank[i],rank[i+k]) p(i) = (\text{rank}[i], \text{rank}[i+k])

If i+kni+k \ge n, use sentinel rank 1-1.

A stable radix sort first sorts by the second component, then by the first component.

Example

Let:

S="banana" S = "banana"

The suffixes are:

indexsuffix
0banana
1anana
2nana
3ana
4na
5a

Final suffix order:

["a","ana","anana","banana","na","nana"] ["a", "ana", "anana", "banana", "na", "nana"]

Suffix array:

[5,3,1,0,4,2] [5, 3, 1, 0, 4, 2]

Correctness

At each step, ranks encode the lexicographic order of prefixes of length kk. Sorting by the pair (rank[i],rank[i+k])(\text{rank}[i], \text{rank}[i+k]) orders prefixes of length 2k2k.

Radix sorting preserves the same pair order as comparison sorting because it performs stable sorting by the lower-priority component first and the higher-priority component second. After ranks are reassigned, the invariant holds for doubled prefix length.

When knk \ge n, the ranked prefix covers each suffix completely, so SASA is the suffix array.

Complexity

Let:

  • nn be the string length
  • RR be the number of distinct ranks

Each doubling round performs linear integer sorting:

O(n+R) O(n + R)

Since RnR \le n, each round costs:

O(n) O(n)

There are:

O(logn) O(\log n)

rounds.

Total time:

O(nlogn) O(n \log n)

Space:

O(n) O(n)

Properties

propertyvalue
stableyes, inside radix passes
in placeno
comparison basedno for rank-pair sorting
outputsuffix array
typical timeO(nlogn)O(n \log n)

When to Use

Use suffix array radix sort when the doubling method is simple enough, but comparison sorting is too slow. It is a good practical choice for moderate to large strings when linear-time suffix array algorithms are unnecessary or harder to implement.

Avoid it when strict linear construction is required, where SA-IS or induced sorting is a better fit.

Implementation

def suffix_array_radix_sort(s):
    n = len(s)
    if n == 0:
        return []

    sa = list(range(n))
    rank = [ord(c) for c in s]
    k = 1

    while k < n:
        max_rank = max(rank) + 1

        sa = counting_sort_sa(sa, rank, k, max_rank)
        sa = counting_sort_sa(sa, rank, 0, max_rank)

        new_rank = [0] * n

        for i in range(1, n):
            prev = sa[i - 1]
            curr = sa[i]

            prev_key = (
                rank[prev],
                rank[prev + k] if prev + k < n else -1,
            )
            curr_key = (
                rank[curr],
                rank[curr + k] if curr + k < n else -1,
            )

            if prev_key != curr_key:
                new_rank[curr] = new_rank[prev] + 1
            else:
                new_rank[curr] = new_rank[prev]

        rank = new_rank
        k <<= 1

        if rank[sa[-1]] == n - 1:
            break

    return sa

def counting_sort_sa(sa, rank, offset, max_rank):
    # Shift sentinel -1 to bucket 0.
    count = [0] * (max_rank + 1)

    for i in sa:
        key = rank[i + offset] + 1 if i + offset < len(rank) else 0
        count[key] += 1

    for i in range(1, len(count)):
        count[i] += count[i - 1]

    out = [0] * len(sa)

    for i in range(len(sa) - 1, -1, -1):
        idx = sa[i]
        key = rank[idx + offset] + 1 if idx + offset < len(rank) else 0
        count[key] -= 1
        out[count[key]] = idx

    return out