Construct a suffix array by iteratively sorting suffixes using doubling technique with rank pairs.
Suffix array doubling sort builds a suffix array by repeatedly sorting suffixes based on prefixes of length (2^k). Each step uses previously computed ranks to compare longer prefixes efficiently.
This method avoids direct string comparison and reduces the problem to sorting integer pairs.
Problem
Given a string (S) of length (n), construct the suffix array:
Each suffix is:
Idea
Instead of comparing suffixes directly, assign each suffix a rank.
At step (k), sort suffixes using the pair:
This represents the first (2^{k+1}) characters.
After sorting, assign new ranks. Repeat until all suffixes have unique ranks.
Algorithm
suffix_array_doubling(S):
n = length(S)
SA = [0, 1, ..., n-1]
rank = initial ranks from characters
temp = array of size n
k = 0
while (1 << k) < n:
sort SA by (rank[i], rank[i + 2^k])
temp[SA[0]] = 0
for i from 1 to n-1:
prev = SA[i-1]
curr = SA[i]
if (rank[prev], rank[prev + 2^k]) < (rank[curr], rank[curr + 2^k]):
temp[curr] = temp[prev] + 1
else:
temp[curr] = temp[prev]
rank = temp
k += 1
return SAExample
Let:
Suffixes:
| index | suffix |
|---|---|
| 0 | banana |
| 1 | anana |
| 2 | nana |
| 3 | ana |
| 4 | na |
| 5 | a |
Sorted suffixes:
Suffix array:
Correctness
At step (k), suffixes are correctly ordered by their first (2^k) characters. Sorting by rank pairs extends this to (2^{k+1}) characters. Since ranks uniquely represent prefix order, the process converges when all suffixes have distinct ranks.
By induction, the final order corresponds to full lexicographic ordering of suffixes.
Complexity
Let:
- (n) be the string length
Each iteration sorts (n) elements.
Using comparison sort:
Number of iterations:
Total:
With counting or radix sort on ranks:
Space:
Properties
| property | value |
|---|---|
| stable | depends on sorting method |
| in place | no |
| comparison based | partly |
| domain | strings |
| output | suffix array |
When to Use
Use suffix array doubling when:
- constructing suffix arrays for moderate sized strings
- needing a conceptually simple and implementable method
- memory usage should remain linear
Avoid when:
- linear time algorithms such as SA-IS are required
- input size is extremely large and constant factors matter
Implementation (Python)
def suffix_array_doubling(s):
n = len(s)
sa = list(range(n))
rank = [ord(c) for c in s]
tmp = [0] * n
k = 1
while k < n:
sa.sort(key=lambda i: (rank[i], rank[i + k] if i + k < n else -1))
tmp[sa[0]] = 0
for i in range(1, n):
prev = sa[i - 1]
curr = sa[i]
prev_key = (rank[prev], rank[prev + k] if prev + k < n else -1)
curr_key = (rank[curr], rank[curr + k] if curr + k < n else -1)
if prev_key < curr_key:
tmp[curr] = tmp[prev] + 1
else:
tmp[curr] = tmp[prev]
rank = tmp[:]
k <<= 1
return sa