Construct a suffix array by sorting rank pairs with radix or counting sort during the doubling method.
Suffix array radix sort is the radix-optimized version of the suffix array doubling method. It still sorts suffixes by rank pairs, but uses counting sort or radix sort instead of comparison sorting.
This reduces each doubling round from comparison sorting over pairs to linear sorting over integer ranks.
Problem
Given a string of length , construct the suffix array:
The suffix array lists starting positions in lexicographic suffix order.
Idea
At each doubling step, suffixes are represented by rank pairs:
where is the current prefix length.
Since ranks are integers in a bounded range, these pairs can be sorted with radix sort:
- sort by second rank
- stable sort by first rank
After sorting, assign new ranks and double .
Algorithm
suffix_array_radix_sort(S):
n = length(S)
SA = [0, 1, ..., n - 1]
rank = initial character ranks
k = 1
while k < n:
SA = radix_sort_pairs(SA, rank, k)
new_rank[SA[0]] = 0
for i from 1 to n - 1:
prev = SA[i - 1]
curr = SA[i]
if pair(prev, rank, k) != pair(curr, rank, k):
new_rank[curr] = new_rank[prev] + 1
else:
new_rank[curr] = new_rank[prev]
rank = new_rank
k = 2 * k
return SAPair Sorting
For an index , define:
If , use sentinel rank .
A stable radix sort first sorts by the second component, then by the first component.
Example
Let:
The suffixes are:
| index | suffix |
|---|---|
| 0 | banana |
| 1 | anana |
| 2 | nana |
| 3 | ana |
| 4 | na |
| 5 | a |
Final suffix order:
Suffix array:
Correctness
At each step, ranks encode the lexicographic order of prefixes of length . Sorting by the pair orders prefixes of length .
Radix sorting preserves the same pair order as comparison sorting because it performs stable sorting by the lower-priority component first and the higher-priority component second. After ranks are reassigned, the invariant holds for doubled prefix length.
When , the ranked prefix covers each suffix completely, so is the suffix array.
Complexity
Let:
- be the string length
- be the number of distinct ranks
Each doubling round performs linear integer sorting:
Since , each round costs:
There are:
rounds.
Total time:
Space:
Properties
| property | value |
|---|---|
| stable | yes, inside radix passes |
| in place | no |
| comparison based | no for rank-pair sorting |
| output | suffix array |
| typical time |
When to Use
Use suffix array radix sort when the doubling method is simple enough, but comparison sorting is too slow. It is a good practical choice for moderate to large strings when linear-time suffix array algorithms are unnecessary or harder to implement.
Avoid it when strict linear construction is required, where SA-IS or induced sorting is a better fit.
Implementation
def suffix_array_radix_sort(s):
n = len(s)
if n == 0:
return []
sa = list(range(n))
rank = [ord(c) for c in s]
k = 1
while k < n:
max_rank = max(rank) + 1
sa = counting_sort_sa(sa, rank, k, max_rank)
sa = counting_sort_sa(sa, rank, 0, max_rank)
new_rank = [0] * n
for i in range(1, n):
prev = sa[i - 1]
curr = sa[i]
prev_key = (
rank[prev],
rank[prev + k] if prev + k < n else -1,
)
curr_key = (
rank[curr],
rank[curr + k] if curr + k < n else -1,
)
if prev_key != curr_key:
new_rank[curr] = new_rank[prev] + 1
else:
new_rank[curr] = new_rank[prev]
rank = new_rank
k <<= 1
if rank[sa[-1]] == n - 1:
break
return sa
def counting_sort_sa(sa, rank, offset, max_rank):
# Shift sentinel -1 to bucket 0.
count = [0] * (max_rank + 1)
for i in sa:
key = rank[i + offset] + 1 if i + offset < len(rank) else 0
count[key] += 1
for i in range(1, len(count)):
count[i] += count[i - 1]
out = [0] * len(sa)
for i in range(len(sa) - 1, -1, -1):
idx = sa[i]
key = rank[idx + offset] + 1 if idx + offset < len(rank) else 0
count[key] -= 1
out[count[key]] = idx
return out