Difference cover modulo 3 algorithm for linear-time suffix array construction using radix sorting and merging.
DC3 suffix array sort is the concrete formulation of the skew algorithm based on a difference cover modulo 3. It builds the suffix array in linear time by sorting a subset of suffixes, assigning them ranks, and merging with the remaining suffixes.
It is widely used in high-performance string processing systems.
Problem
Given a string of length , construct the suffix array:
A sentinel character smaller than all others is appended to ensure all suffixes are comparable.
Idea
Partition suffix indices into:
Then:
- sort suffixes at positions using triples
- assign ranks to these suffixes
- recursively solve if ranks are not unique
- sort suffixes at positions
- merge both sorted groups
Triples Sorting
For positions where , compare suffixes using:
These triples are sorted with radix sort in linear time.
Algorithm
dc3_suffix_array(S):
append sentinel symbols
construct list A of indices i where i mod 3 != 0
sort A by triples (S[i], S[i+1], S[i+2]) using radix sort
assign ranks to A
if ranks not unique:
recursively compute SA for reduced problem
construct list B of indices i where i mod 3 == 0
sort B using pairs (S[i], rank[i+1])
merge A and B to form final SAMerge Comparison
During merge:
- if comparing with :
- if comparing with :
These comparisons fully determine suffix order.
Example
Let:
Suffix array:
This is produced after sorting non-multiples of 3, ranking them, sorting multiples of 3, and merging.
Correctness
Sorting the subset of suffixes provides a correct ordering for those positions. Assigned ranks encode lexicographic order. Sorting the remaining suffixes uses these ranks to compare suffixes beyond the first few characters.
The merge step combines both lists while maintaining global order. Since every suffix is compared using sufficient prefix information and ranks, the final array is correctly sorted.
Complexity
Let be the string length.
Time:
Each stage uses linear-time radix sorting or linear merging.
Space:
Properties
| property | value |
|---|---|
| stable | yes in radix steps |
| in place | no |
| comparison based | no |
| time | linear |
| method | divide and conquer |
When to Use
Use DC3 when you need a theoretically optimal suffix array algorithm and are comfortable with its implementation complexity.
Avoid it in simpler applications where suffix array doubling or SA-IS provides easier implementations with acceptable performance.
Notes
- Equivalent to skew algorithm
- Requires careful index handling
- Efficient in both theory and practice
- Often used in academic implementations