Skip to content

Skew Suffix Array Sort (DC3)

Linear time suffix array construction using the DC3 divide and conquer strategy.

DC3, also called skew algorithm, builds a suffix array in linear time by splitting suffix positions into residue classes modulo 3 and recursively sorting a subset.

It reduces the full suffix ordering problem into a smaller problem, then merges results using rank comparisons.

Problem

Given a string TT of length nn, construct its suffix array:

SA[0n1] SA[0 \dots n-1]

such that all suffixes are sorted lexicographically.

Key Idea

Split suffixes into three groups:

grouppositions
S0S_0i mod 3 = 0
S1S_1i mod 3 = 1
S2S_2i mod 3 = 2

Sort S1S2S_1 \cup S_2 recursively, then induce order for S0S_0.

Algorithm

Step 1: Form triplets

For each position ii:

(T[i],T[i+1],T[i+2]) (T[i], T[i+1], T[i+2])

Step 2: Sort triplets

Sort all positions where imod30i \mod 3 \neq 0.

Step 3: Assign ranks

Assign lexicographic ranks to triplets.

Step 4: Recursive step

If ranks are not unique, recurse on reduced string.

Step 5: Sort S0S_0

Use sorted S1S_1 and S2S_2 to sort S0S_0 using rank comparisons.

Step 6: Merge

Merge sorted S0S_0 with sorted S1S2S_1 \cup S_2.

dc3(T):
    split positions into S1 and S2

    sort S1 ∪ S2 by triplets

    assign ranks

    if ranks not unique:
        recurse

    sort S0 using ranks

    merge S0 and S1∪S2

    return SA

Example

Text:

T="banana" T = "banana"

Positions:

ichar
0b
1a
2n
3a
4n
5a

Groups:

grouppositions
S00, 3
S11, 4
S22, 5

After sorting and ranking, DC3 merges all suffixes into:

SA indexsuffix
0a
1ana
2anana
3banana
4na
5nana

Correctness

DC3 works by reducing suffix comparison to fixed length tuples and ensuring lexicographic order through recursive ranking.

Suffixes in S1S2S_1 \cup S_2 are sorted recursively, giving stable ranks. These ranks act as compressed keys for comparing suffixes in S0S_0.

The final merge step preserves ordering because each suffix comparison is reduced to rank comparison or direct character comparison in a bounded window.

Thus the output is a correct suffix array.

Complexity

Let n=Tn = |T|.

phasecost
triplet formationO(n)O(n)
recursive sortingO(n)O(n)
mergingO(n)O(n)

Total time:

O(n) O(n)

Space complexity:

O(n) O(n)

When to Use

DC3 is useful when:

  • linear time suffix array construction is required
  • SA-IS is not preferred due to implementation complexity
  • theoretical guarantees are important
  • working with academic or algorithmic research systems

In practice, SA-IS is often faster and simpler, but DC3 remains a classic linear construction method.

Implementation

def dc3(T):
    # Simplified structure of DC3 algorithm

    def sort_triplets(T, positions):
        pass

    def assign_ranks(sorted_positions):
        pass

    def merge(sa0, sa12):
        pass

    S1S2 = [i for i in range(len(T)) if i % 3 != 0]
    SA12 = sort_triplets(T, S1S2)
    ranks = assign_ranks(SA12)

    if len(set(ranks)) != len(ranks):
        SA12 = dc3(ranks)

    SA0 = []

    SA = merge(SA0, SA12)

    return SA
func DC3(text []int) []int {
	// Simplified skeleton

	sortTriplets := func() []int { return nil }
	assignRanks := func([]int) []int { return nil }
	merge := func([]int, []int) []int { return nil }

	s1s2 := []int{}

	for i := 0; i < len(text); i++ {
		if i%3 != 0 {
			s1s2 = append(s1s2, i)
		}
	}

	sa12 := sortTriplets()
	ranks := assignRanks(sa12)

	_ = ranks

	sa0 := []int{}

	sa := merge(sa0, sa12)

	return sa
}