Linear time suffix array construction using the DC3 divide and conquer strategy.
DC3, also called skew algorithm, builds a suffix array in linear time by splitting suffix positions into residue classes modulo 3 and recursively sorting a subset.
It reduces the full suffix ordering problem into a smaller problem, then merges results using rank comparisons.
Problem
Given a string of length , construct its suffix array:
such that all suffixes are sorted lexicographically.
Key Idea
Split suffixes into three groups:
| group | positions |
|---|---|
| i mod 3 = 0 | |
| i mod 3 = 1 | |
| i mod 3 = 2 |
Sort recursively, then induce order for .
Algorithm
Step 1: Form triplets
For each position :
Step 2: Sort triplets
Sort all positions where .
Step 3: Assign ranks
Assign lexicographic ranks to triplets.
Step 4: Recursive step
If ranks are not unique, recurse on reduced string.
Step 5: Sort
Use sorted and to sort using rank comparisons.
Step 6: Merge
Merge sorted with sorted .
dc3(T):
split positions into S1 and S2
sort S1 ∪ S2 by triplets
assign ranks
if ranks not unique:
recurse
sort S0 using ranks
merge S0 and S1∪S2
return SAExample
Text:
Positions:
| i | char |
|---|---|
| 0 | b |
| 1 | a |
| 2 | n |
| 3 | a |
| 4 | n |
| 5 | a |
Groups:
| group | positions |
|---|---|
| S0 | 0, 3 |
| S1 | 1, 4 |
| S2 | 2, 5 |
After sorting and ranking, DC3 merges all suffixes into:
| SA index | suffix |
|---|---|
| 0 | a |
| 1 | ana |
| 2 | anana |
| 3 | banana |
| 4 | na |
| 5 | nana |
Correctness
DC3 works by reducing suffix comparison to fixed length tuples and ensuring lexicographic order through recursive ranking.
Suffixes in are sorted recursively, giving stable ranks. These ranks act as compressed keys for comparing suffixes in .
The final merge step preserves ordering because each suffix comparison is reduced to rank comparison or direct character comparison in a bounded window.
Thus the output is a correct suffix array.
Complexity
Let .
| phase | cost |
|---|---|
| triplet formation | |
| recursive sorting | |
| merging |
Total time:
Space complexity:
When to Use
DC3 is useful when:
- linear time suffix array construction is required
- SA-IS is not preferred due to implementation complexity
- theoretical guarantees are important
- working with academic or algorithmic research systems
In practice, SA-IS is often faster and simpler, but DC3 remains a classic linear construction method.
Implementation
def dc3(T):
# Simplified structure of DC3 algorithm
def sort_triplets(T, positions):
pass
def assign_ranks(sorted_positions):
pass
def merge(sa0, sa12):
pass
S1S2 = [i for i in range(len(T)) if i % 3 != 0]
SA12 = sort_triplets(T, S1S2)
ranks = assign_ranks(SA12)
if len(set(ranks)) != len(ranks):
SA12 = dc3(ranks)
SA0 = []
SA = merge(SA0, SA12)
return SAfunc DC3(text []int) []int {
// Simplified skeleton
sortTriplets := func() []int { return nil }
assignRanks := func([]int) []int { return nil }
merge := func([]int, []int) []int { return nil }
s1s2 := []int{}
for i := 0; i < len(text); i++ {
if i%3 != 0 {
s1s2 = append(s1s2, i)
}
}
sa12 := sortTriplets()
ranks := assignRanks(sa12)
_ = ranks
sa0 := []int{}
sa := merge(sa0, sa12)
return sa
}