Generalized divide and conquer suffix array construction using modulo 7 partitioning.
DC7 is a generalization of DC3 suffix array construction. It splits suffix positions into residue classes modulo 7 and recursively sorts a subset of suffixes, then induces the remaining order.
The goal is to reduce comparison depth by encoding suffixes into fixed length tuples and merging sorted partitions.
Problem
Given a string of length , construct its suffix array:
such that suffixes are sorted lexicographically.
Key Idea
Partition suffix positions by modulo 7:
| group | positions |
|---|---|
| i mod 7 = 0 | |
| i mod 7 = 1 | |
| i mod 7 = 2 | |
| i mod 7 = 3 | |
| i mod 7 = 4 | |
| i mod 7 = 5 | |
| i mod 7 = 6 |
DC7 sorts a subset of these positions recursively, typically all non-zero classes, then reconstructs full ordering.
Algorithm
Step 1: Build tuples
Each position is represented by a fixed length tuple of characters:
Step 2: Sort selected groups
Sort positions in selected residue classes using radix or comparison sort.
Step 3: Assign ranks
Convert sorted tuples into integer ranks.
Step 4: Recursive reduction
If ranks are not unique, recursively apply DC7 to the reduced problem.
Step 5: Induce remaining suffixes
Use sorted subset to infer ordering of remaining suffixes.
Step 6: Merge
Combine all residue classes into final suffix array.
dc7(T):
split positions by mod 7
build 7-length tuples
sort selected residue classes
assign ranks
if ranks not unique:
recurse on reduced string
induce remaining suffixes
merge all groups
return SAExample
Text:
Positions:
| i | char |
|---|---|
| 0 | b |
| 1 | a |
| 2 | n |
| 3 | a |
| 4 | n |
| 5 | a |
Residue groups:
| group | positions |
|---|---|
| S0 | 0 |
| S1 | 1 |
| S2 | 2 |
| S3 | 3 |
| S4 | 4 |
| S5 | 5 |
| S6 | none |
After tuple ranking and merging, the final suffix array becomes:
| SA index | suffix |
|---|---|
| 0 | a |
| 1 | ana |
| 2 | anana |
| 3 | banana |
| 4 | na |
| 5 | nana |
Correctness
DC7 extends the DC3 principle: suffixes are encoded into fixed length tuples that preserve lexicographic order locally.
Recursive ranking ensures uniqueness of comparisons. Induced sorting reconstructs global order by ensuring that every suffix is placed relative to already sorted representatives.
The merge step preserves correctness because all comparisons reduce to rank comparisons or bounded character comparisons over fixed windows.
Thus the final array is a correct suffix array.
Complexity
Let .
| phase | cost |
|---|---|
| tuple construction | |
| recursive sorting | |
| induction and merge |
Total time:
Space complexity:
When to Use
DC7 is primarily theoretical and used for:
- generalized divide and conquer suffix array construction
- research into linear time string algorithms
- extensions of DC3 style methods
- experimenting with partition sizes beyond 3
In practice, SA-IS is preferred for simplicity and performance.
Implementation
def dc7(T):
# Simplified DC7 skeleton
def build_tuples(T, k=7):
pass
def sort_groups(groups):
pass
def assign_ranks(sorted_groups):
pass
def induce_all(groups):
pass
groups = build_tuples(T)
sorted_groups = sort_groups(groups)
ranks = assign_ranks(sorted_groups)
if len(set(ranks)) != len(ranks):
ranks = dc7(ranks)
sa = induce_all(ranks)
return safunc DC7(text []int) []int {
// Simplified skeleton
buildTuples := func() []int { return nil }
sortGroups := func([]int) []int { return nil }
assignRanks := func([]int) []int { return nil }
induce := func([]int) []int { return nil }
groups := buildTuples()
sorted := sortGroups(groups)
ranks := assignRanks(sorted)
_ = ranks
sa := induce(ranks)
return sa
}