Skip to content

Skew Suffix Array Sort

Linear-time suffix array construction using the DC3 algorithm that recursively sorts positions modulo 3.

Skew suffix array sort, also known as the DC3 algorithm, constructs a suffix array in linear time by recursively sorting a subset of suffixes whose positions are not divisible by 3.

The remaining suffixes are then merged using the already computed order.

Problem

Given a string SS of length nn, construct its suffix array:

SA=sorted indices of all suffixes of S SA = \text{sorted indices of all suffixes of } S

A sentinel character smaller than all other characters is appended to the string.

Idea

Split suffixes into three groups based on index modulo 3:

  • group 0: indices i0(mod3)i \equiv 0 \pmod{3}
  • group 1: indices i1(mod3)i \equiv 1 \pmod{3}
  • group 2: indices i2(mod3)i \equiv 2 \pmod{3}

The algorithm proceeds as follows:

  1. recursively sort suffixes in groups 1 and 2
  2. assign ranks to these suffixes
  3. sort group 0 suffixes using the ranks
  4. merge the sorted groups

The name DC3 refers to difference cover modulo 3.

Algorithm

dc3(S):
    append sentinel characters

    form positions i where i mod 3 != 0

    sort triples (S[i], S[i+1], S[i+2]) using radix sort

    assign ranks to sorted triples

    if ranks not unique:
        recursively compute SA for reduced string

    sort positions i mod 3 == 0 using ranks

    merge sorted lists

    return SA

Triples

For positions in groups 1 and 2, compare suffixes using triples:

(S[i],S[i+1],S[i+2]) (S[i], S[i+1], S[i+2])

These triples are sorted using radix sort in linear time.

Example

Let:

S="banana$" S = \text{"banana\$"}

Suffixes:

indexsuffix
0banana$
1anana$
2nana$
3ana$
4na$
5a$
6$

Final suffix array:

[6,5,3,1,0,4,2] [6, 5, 3, 1, 0, 4, 2]

Merge Step

To merge:

  • compare suffix at i0i \equiv 0 with suffix at j1j \equiv 1 using:
(S[i],rank[i+1]) (S[i], \text{rank}[i+1])
  • compare suffix at i0i \equiv 0 with suffix at j2j \equiv 2 using:
(S[i],S[i+1],rank[i+2]) (S[i], S[i+1], \text{rank}[i+2])

These comparisons are constant time because ranks are known.

Correctness

Suffixes in groups 1 and 2 are sorted recursively. Their ranks represent correct lexicographic order. Group 0 suffixes are sorted using these ranks, ensuring their relative order is correct.

The merge step compares suffixes using enough characters and ranks to fully determine lexicographic order. Therefore, all suffixes are merged correctly into the final suffix array.

Complexity

Let nn be the string length.

Time:

O(n) O(n)

Each phase uses linear-time radix sorting and merging.

Space:

O(n) O(n)

Properties

propertyvalue
stableyes within radix steps
in placeno
comparison basedno
timelinear
methoddivide and conquer

When to Use

Use skew suffix array sort when linear-time suffix array construction is required and implementation complexity is acceptable.

It is widely used in theoretical contexts and high-performance text processing systems.

Avoid it when simplicity is preferred, since SA-IS is often easier to implement in practice.

Implementation Notes

  • Use radix sort for triples
  • Carefully handle sentinel values
  • Recursively reduce problem size
  • Merge using rank comparisons
  • Maintain arrays for ranks and positions