Linear-time suffix array construction using the DC3 algorithm that recursively sorts positions modulo 3.
Skew suffix array sort, also known as the DC3 algorithm, constructs a suffix array in linear time by recursively sorting a subset of suffixes whose positions are not divisible by 3.
The remaining suffixes are then merged using the already computed order.
Problem
Given a string of length , construct its suffix array:
A sentinel character smaller than all other characters is appended to the string.
Idea
Split suffixes into three groups based on index modulo 3:
- group 0: indices
- group 1: indices
- group 2: indices
The algorithm proceeds as follows:
- recursively sort suffixes in groups 1 and 2
- assign ranks to these suffixes
- sort group 0 suffixes using the ranks
- merge the sorted groups
The name DC3 refers to difference cover modulo 3.
Algorithm
dc3(S):
append sentinel characters
form positions i where i mod 3 != 0
sort triples (S[i], S[i+1], S[i+2]) using radix sort
assign ranks to sorted triples
if ranks not unique:
recursively compute SA for reduced string
sort positions i mod 3 == 0 using ranks
merge sorted lists
return SATriples
For positions in groups 1 and 2, compare suffixes using triples:
These triples are sorted using radix sort in linear time.
Example
Let:
Suffixes:
| index | suffix |
|---|---|
| 0 | banana$ |
| 1 | anana$ |
| 2 | nana$ |
| 3 | ana$ |
| 4 | na$ |
| 5 | a$ |
| 6 | $ |
Final suffix array:
Merge Step
To merge:
- compare suffix at with suffix at using:
- compare suffix at with suffix at using:
These comparisons are constant time because ranks are known.
Correctness
Suffixes in groups 1 and 2 are sorted recursively. Their ranks represent correct lexicographic order. Group 0 suffixes are sorted using these ranks, ensuring their relative order is correct.
The merge step compares suffixes using enough characters and ranks to fully determine lexicographic order. Therefore, all suffixes are merged correctly into the final suffix array.
Complexity
Let be the string length.
Time:
Each phase uses linear-time radix sorting and merging.
Space:
Properties
| property | value |
|---|---|
| stable | yes within radix steps |
| in place | no |
| comparison based | no |
| time | linear |
| method | divide and conquer |
When to Use
Use skew suffix array sort when linear-time suffix array construction is required and implementation complexity is acceptable.
It is widely used in theoretical contexts and high-performance text processing systems.
Avoid it when simplicity is preferred, since SA-IS is often easier to implement in practice.
Implementation Notes
- Use radix sort for triples
- Carefully handle sentinel values
- Recursively reduce problem size
- Merge using rank comparisons
- Maintain arrays for ranks and positions