Skip to content

DC3 Suffix Array Sort

Difference cover modulo 3 algorithm for linear-time suffix array construction using radix sorting and merging.

DC3 suffix array sort is the concrete formulation of the skew algorithm based on a difference cover modulo 3. It builds the suffix array in linear time by sorting a subset of suffixes, assigning them ranks, and merging with the remaining suffixes.

It is widely used in high-performance string processing systems.

Problem

Given a string SS of length nn, construct the suffix array:

SA=sorted indices of all suffixes of S SA = \text{sorted indices of all suffixes of } S

A sentinel character smaller than all others is appended to ensure all suffixes are comparable.

Idea

Partition suffix indices into:

  • imod3=0i \bmod 3 = 0
  • imod3=1i \bmod 3 = 1
  • imod3=2i \bmod 3 = 2

Then:

  1. sort suffixes at positions imod30i \bmod 3 \ne 0 using triples
  2. assign ranks to these suffixes
  3. recursively solve if ranks are not unique
  4. sort suffixes at positions imod3=0i \bmod 3 = 0
  5. merge both sorted groups

Triples Sorting

For positions where imod30i \bmod 3 \ne 0, compare suffixes using:

(S[i],S[i+1],S[i+2]) (S[i], S[i+1], S[i+2])

These triples are sorted with radix sort in linear time.

Algorithm

dc3_suffix_array(S):
    append sentinel symbols

    construct list A of indices i where i mod 3 != 0

    sort A by triples (S[i], S[i+1], S[i+2]) using radix sort

    assign ranks to A

    if ranks not unique:
        recursively compute SA for reduced problem

    construct list B of indices i where i mod 3 == 0

    sort B using pairs (S[i], rank[i+1])

    merge A and B to form final SA

Merge Comparison

During merge:

  • if comparing imod3=0i \bmod 3 = 0 with jmod3=1j \bmod 3 = 1:
(S[i],rank[i+1]) (S[i], \text{rank}[i+1])
  • if comparing imod3=0i \bmod 3 = 0 with jmod3=2j \bmod 3 = 2:
(S[i],S[i+1],rank[i+2]) (S[i], S[i+1], \text{rank}[i+2])

These comparisons fully determine suffix order.

Example

Let:

S="banana$" S = \text{"banana\$"}

Suffix array:

[6,5,3,1,0,4,2] [6, 5, 3, 1, 0, 4, 2]

This is produced after sorting non-multiples of 3, ranking them, sorting multiples of 3, and merging.

Correctness

Sorting the subset of suffixes provides a correct ordering for those positions. Assigned ranks encode lexicographic order. Sorting the remaining suffixes uses these ranks to compare suffixes beyond the first few characters.

The merge step combines both lists while maintaining global order. Since every suffix is compared using sufficient prefix information and ranks, the final array is correctly sorted.

Complexity

Let nn be the string length.

Time:

O(n) O(n)

Each stage uses linear-time radix sorting or linear merging.

Space:

O(n) O(n)

Properties

propertyvalue
stableyes in radix steps
in placeno
comparison basedno
timelinear
methoddivide and conquer

When to Use

Use DC3 when you need a theoretically optimal suffix array algorithm and are comfortable with its implementation complexity.

Avoid it in simpler applications where suffix array doubling or SA-IS provides easier implementations with acceptable performance.

Notes

  • Equivalent to skew algorithm
  • Requires careful index handling
  • Efficient in both theory and practice
  • Often used in academic implementations