# Skew Suffix Array Sort

# Skew Suffix Array Sort

Skew suffix array sort, also known as the DC3 algorithm, constructs a suffix array in linear time by recursively sorting a subset of suffixes whose positions are not divisible by 3.

The remaining suffixes are then merged using the already computed order.

## Problem

Given a string $S$ of length $n$, construct its suffix array:

$$
SA = \text{sorted indices of all suffixes of } S
$$

A sentinel character smaller than all other characters is appended to the string.

## Idea

Split suffixes into three groups based on index modulo 3:

* group 0: indices $i \equiv 0 \pmod{3}$
* group 1: indices $i \equiv 1 \pmod{3}$
* group 2: indices $i \equiv 2 \pmod{3}$

The algorithm proceeds as follows:

1. recursively sort suffixes in groups 1 and 2
2. assign ranks to these suffixes
3. sort group 0 suffixes using the ranks
4. merge the sorted groups

The name DC3 refers to difference cover modulo 3.

## Algorithm

```text id="u6k2pm"
dc3(S):
    append sentinel characters

    form positions i where i mod 3 != 0

    sort triples (S[i], S[i+1], S[i+2]) using radix sort

    assign ranks to sorted triples

    if ranks not unique:
        recursively compute SA for reduced string

    sort positions i mod 3 == 0 using ranks

    merge sorted lists

    return SA
```

## Triples

For positions in groups 1 and 2, compare suffixes using triples:

$$
(S[i], S[i+1], S[i+2])
$$

These triples are sorted using radix sort in linear time.

## Example

Let:

$$
S = \text{"banana\$"}
$$

Suffixes:

| index | suffix  |
| ----- | ------- |
| 0     | banana$ |
| 1     | anana$  |
| 2     | nana$   |
| 3     | ana$    |
| 4     | na$     |
| 5     | a$      |
| 6     | $       |

Final suffix array:

$$
[6, 5, 3, 1, 0, 4, 2]
$$

## Merge Step

To merge:

* compare suffix at $i \equiv 0$ with suffix at $j \equiv 1$ using:

$$
(S[i], \text{rank}[i+1])
$$

* compare suffix at $i \equiv 0$ with suffix at $j \equiv 2$ using:

$$
(S[i], S[i+1], \text{rank}[i+2])
$$

These comparisons are constant time because ranks are known.

## Correctness

Suffixes in groups 1 and 2 are sorted recursively. Their ranks represent correct lexicographic order. Group 0 suffixes are sorted using these ranks, ensuring their relative order is correct.

The merge step compares suffixes using enough characters and ranks to fully determine lexicographic order. Therefore, all suffixes are merged correctly into the final suffix array.

## Complexity

Let $n$ be the string length.

Time:

$$
O(n)
$$

Each phase uses linear-time radix sorting and merging.

Space:

$$
O(n)
$$

## Properties

| property         | value                  |
| ---------------- | ---------------------- |
| stable           | yes within radix steps |
| in place         | no                     |
| comparison based | no                     |
| time             | linear                 |
| method           | divide and conquer     |

## When to Use

Use skew suffix array sort when linear-time suffix array construction is required and implementation complexity is acceptable.

It is widely used in theoretical contexts and high-performance text processing systems.

Avoid it when simplicity is preferred, since SA-IS is often easier to implement in practice.

## Implementation Notes

* Use radix sort for triples
* Carefully handle sentinel values
* Recursively reduce problem size
* Merge using rank comparisons
* Maintain arrays for ranks and positions

