Skip to content

DC7 Suffix Array Sort

Generalized divide and conquer suffix array construction using modulo 7 partitioning.

DC7 is a generalization of DC3 suffix array construction. It splits suffix positions into residue classes modulo 7 and recursively sorts a subset of suffixes, then induces the remaining order.

The goal is to reduce comparison depth by encoding suffixes into fixed length tuples and merging sorted partitions.

Problem

Given a string TT of length nn, construct its suffix array:

SA[0n1] SA[0 \dots n-1]

such that suffixes are sorted lexicographically.

Key Idea

Partition suffix positions by modulo 7:

grouppositions
S0S_0i mod 7 = 0
S1S_1i mod 7 = 1
S2S_2i mod 7 = 2
S3S_3i mod 7 = 3
S4S_4i mod 7 = 4
S5S_5i mod 7 = 5
S6S_6i mod 7 = 6

DC7 sorts a subset of these positions recursively, typically all non-zero classes, then reconstructs full ordering.

Algorithm

Step 1: Build tuples

Each position is represented by a fixed length tuple of characters:

(T[i],T[i+1],,T[i+6]) (T[i], T[i+1], \dots, T[i+6])

Step 2: Sort selected groups

Sort positions in selected residue classes using radix or comparison sort.

Step 3: Assign ranks

Convert sorted tuples into integer ranks.

Step 4: Recursive reduction

If ranks are not unique, recursively apply DC7 to the reduced problem.

Step 5: Induce remaining suffixes

Use sorted subset to infer ordering of remaining suffixes.

Step 6: Merge

Combine all residue classes into final suffix array.

dc7(T):
    split positions by mod 7

    build 7-length tuples

    sort selected residue classes

    assign ranks

    if ranks not unique:
        recurse on reduced string

    induce remaining suffixes

    merge all groups

    return SA

Example

Text:

T="banana" T = "banana"

Positions:

ichar
0b
1a
2n
3a
4n
5a

Residue groups:

grouppositions
S00
S11
S22
S33
S44
S55
S6none

After tuple ranking and merging, the final suffix array becomes:

SA indexsuffix
0a
1ana
2anana
3banana
4na
5nana

Correctness

DC7 extends the DC3 principle: suffixes are encoded into fixed length tuples that preserve lexicographic order locally.

Recursive ranking ensures uniqueness of comparisons. Induced sorting reconstructs global order by ensuring that every suffix is placed relative to already sorted representatives.

The merge step preserves correctness because all comparisons reduce to rank comparisons or bounded character comparisons over fixed windows.

Thus the final array is a correct suffix array.

Complexity

Let n=Tn = |T|.

phasecost
tuple constructionO(n)O(n)
recursive sortingO(n)O(n)
induction and mergeO(n)O(n)

Total time:

O(n) O(n)

Space complexity:

O(n) O(n)

When to Use

DC7 is primarily theoretical and used for:

  • generalized divide and conquer suffix array construction
  • research into linear time string algorithms
  • extensions of DC3 style methods
  • experimenting with partition sizes beyond 3

In practice, SA-IS is preferred for simplicity and performance.

Implementation

def dc7(T):
    # Simplified DC7 skeleton

    def build_tuples(T, k=7):
        pass

    def sort_groups(groups):
        pass

    def assign_ranks(sorted_groups):
        pass

    def induce_all(groups):
        pass

    groups = build_tuples(T)

    sorted_groups = sort_groups(groups)

    ranks = assign_ranks(sorted_groups)

    if len(set(ranks)) != len(ranks):
        ranks = dc7(ranks)

    sa = induce_all(ranks)

    return sa
func DC7(text []int) []int {
	// Simplified skeleton

	buildTuples := func() []int { return nil }
	sortGroups := func([]int) []int { return nil }
	assignRanks := func([]int) []int { return nil }
	induce := func([]int) []int { return nil }

	groups := buildTuples()
	sorted := sortGroups(groups)
	ranks := assignRanks(sorted)

	_ = ranks

	sa := induce(ranks)

	return sa
}