Build a suffix array using the prefix doubling technique with iterative ranking.
Suffix array doubling sort constructs a suffix array by iteratively sorting prefixes of length until all suffixes are uniquely ranked.
It replaces full suffix comparisons with rank comparisons that double in length each round.
Problem
Given a string of length , construct a suffix array:
such that suffixes of are sorted lexicographically.
Idea
Each suffix is represented by a pair of ranks:
where doubles each iteration:
Initially, ranks are based on single characters.
Algorithm
Step 1: Initialization
Assign rank based on characters.
Step 2: Iterative sorting
Sort suffixes using rank pairs.
suffix_array_doubling(T):
n = length(T)
SA = [0..n-1]
rank = initial_character_ranks(T)
k = 1
while k < n:
sort SA by (rank[i], rank[i + k])
new_rank = array of size n
new_rank[SA[0]] = 0
for i from 1 to n-1:
prev = SA[i-1]
curr = SA[i]
if (rank[curr], rank[curr+k]) == (rank[prev], rank[prev+k]):
new_rank[curr] = new_rank[prev]
else:
new_rank[curr] = new_rank[prev] + 1
rank = new_rank
k = k * 2
return SAExample
Text:
Initial ranks (single characters):
| index | char | rank |
|---|---|---|
| 0 | b | 1 |
| 1 | a | 0 |
| 2 | n | 13 |
| 3 | a | 0 |
| 4 | n | 13 |
| 5 | a | 0 |
First iteration ():
Sort by pairs:
| suffix | pair |
|---|---|
| a | (0,13) |
| a | (0,0) |
| a | (0,-) |
| banana | (1,0) |
| na | (13,0) |
| nana | (13,0) |
After repeated doubling, final suffix array becomes:
| SA index | suffix |
|---|---|
| 0 | a |
| 1 | ana |
| 2 | anana |
| 3 | banana |
| 4 | na |
| 5 | nana |
Correctness
At iteration , each suffix is identified by the first characters via its rank pair:
If two suffixes have identical rank pairs, their first characters are identical. Otherwise, lexicographic ordering is preserved by comparing rank pairs.
Since doubles each iteration, eventually , and all suffixes are uniquely ranked, giving a correct lexicographic ordering.
Thus the final array is a valid suffix array.
Complexity
Let .
| operation | time |
|---|---|
| each sorting step | |
| number of steps | |
| total naive implementation |
With radix or counting sort optimization:
Space complexity:
When to Use
Suffix array doubling is useful when:
- implementing suffix arrays from scratch
- no advanced linear time construction is required
- clarity and simplicity are preferred over optimal asymptotic performance
- text indexing systems are being prototyped
It is often the baseline method before moving to SA-IS or DC3 algorithms.
Implementation
def suffix_array_doubling(s):
n = len(s)
sa = list(range(n))
rank = [ord(c) for c in s]
k = 1
while k < n:
sa.sort(key=lambda i: (rank[i], rank[i+k] if i+k < n else -1))
new_rank = [0] * n
new_rank[sa[0]] = 0
for i in range(1, n):
prev, curr = sa[i-1], sa[i]
prev_key = (rank[prev], rank[prev+k] if prev+k < n else -1)
curr_key = (rank[curr], rank[curr+k] if curr+k < n else -1)
if curr_key == prev_key:
new_rank[curr] = new_rank[prev]
else:
new_rank[curr] = new_rank[prev] + 1
rank = new_rank
k *= 2
return saimport "sort"
func SuffixArrayDoubling(s string) []int {
n := len(s)
sa := make([]int, n)
rank := make([]int, n)
for i := 0; i < n; i++ {
sa[i] = i
rank[i] = int(s[i])
}
k := 1
getRank := func(i int) int {
if i < n {
return rank[i]
}
return -1
}
for k < n {
sort.Slice(sa, func(i, j int) bool {
a, b := sa[i], sa[j]
if rank[a] != rank[b] {
return rank[a] < rank[b]
}
return getRank(a+k) < getRank(b+k)
})
newRank := make([]int, n)
newRank[sa[0]] = 0
for i := 1; i < n; i++ {
prev, curr := sa[i-1], sa[i]
prevKey := [2]int{rank[prev], getRank(prev + k)}
currKey := [2]int{rank[curr], getRank(curr + k)}
if currKey == prevKey {
newRank[curr] = newRank[prev]
} else {
newRank[curr] = newRank[prev] + 1
}
}
rank = newRank
k *= 2
}
return sa
}