Skip to content

DC7 Suffix Array Sort

Difference cover modulo 7 suffix array construction method that generalizes DC3 with a larger recursive sample.

DC7 suffix array sort is a linear-time suffix array construction method based on a difference cover modulo 7. It generalizes the DC3 idea by sorting a larger sampled subset of suffixes, then using the resulting ranks to sort and merge the remaining suffixes.

The method is mostly of theoretical and specialist interest. DC3 and SA-IS are more common in practice.

Problem

Given a string SS of length nn, construct its suffix array:

SA=sorted indices of all suffixes of S SA = \text{sorted indices of all suffixes of } S

A sentinel character smaller than all other characters is appended to make suffix comparisons well-defined.

Idea

DC7 partitions suffix positions by their index modulo 7. It chooses a difference cover subset DD of residues modulo 7. Suffixes whose positions fall in DD are sorted recursively. Their ranks are then used to compare and sort suffixes outside DD.

A common difference cover modulo 7 is:

D=0,1,3 D = {0, 1, 3}

The difference cover property ensures that any two suffix positions can be shifted by a small amount so that both shifted positions land in the sampled set. This allows constant-time suffix comparison using a bounded prefix plus known ranks.

Algorithm Sketch

dc7_suffix_array(S):
    append sentinel symbols

    choose difference cover D modulo 7

    sample positions i where i mod 7 is in D

    sort sampled suffixes by fixed-length tuples using radix sort

    assign ranks to sampled suffixes

    if ranks are not unique:
        recursively compute suffix array of reduced sample string

    sort nonsampled suffixes using characters and sampled ranks

    merge sampled and nonsampled suffixes

    return suffix array

Difference Cover

A set DD is a difference cover modulo vv when every residue modulo vv can be represented as:

ab(modv) a - b \pmod v

for some a,bDa, b \in D.

This property lets the algorithm compare suffixes by reading a bounded number of characters until both suffixes reach sampled positions. Once both shifted suffixes are sampled, their ranks decide the remaining order.

Example

Let:

S="banana$" S = \text{"banana\$"}

The final suffix array is:

[6,5,3,1,0,4,2] [6, 5, 3, 1, 0, 4, 2]

DC7 reaches this order by recursively ranking sampled suffixes and using those ranks to sort all other suffixes.

Correctness

The sampled suffixes are sorted recursively, so their ranks encode correct lexicographic order. For any pair of suffixes, the difference cover guarantees that after comparing a bounded prefix, both suffixes can be shifted to sampled positions.

The comparison then uses known ranks for the remaining suffixes. Therefore each local comparison agrees with true lexicographic order. Sorting and merging all residue classes with these comparisons yields the correct suffix array.

Complexity

Let nn be the string length.

Time:

O(n) O(n)

The sampled problem is a fixed fraction of the original size, and each sorting or merging phase is linear.

Space:

O(n) O(n)

Properties

propertyvalue
stableyes in radix stages
in placeno
comparison basedno
timelinear
methoddifference cover

When to Use

Use DC7 mainly when studying generalized difference-cover suffix array construction or implementing a specialized suffix array builder where larger sampling blocks improve constants for a particular workload.

For most practical systems, SA-IS or DC3 is usually easier to implement and validate.

Implementation Notes

  • Choose a valid difference cover modulo 7
  • Append enough sentinels for tuple comparisons
  • Use radix sort for fixed-length tuples
  • Store ranks for sampled positions
  • Use bounded-prefix plus rank comparisons during merge
  • Test carefully on repeated strings and strings with long equal prefixes