# Suffix Array Doubling Sort

# Suffix Array Doubling Sort

Suffix array doubling sort builds a suffix array by repeatedly sorting suffixes based on prefixes of length (2^k). Each step uses previously computed ranks to compare longer prefixes efficiently.

This method avoids direct string comparison and reduces the problem to sorting integer pairs.

## Problem

Given a string (S) of length (n), construct the suffix array:

$$
SA = \text{sorted indices of all suffixes of } S
$$

Each suffix is:

$$
S[i:] = S[i] S[i+1] \cdots S[n-1]
$$

## Idea

Instead of comparing suffixes directly, assign each suffix a rank.

At step (k), sort suffixes using the pair:

$$
(\text{rank}[i], \text{rank}[i + 2^k])
$$

This represents the first (2^{k+1}) characters.

After sorting, assign new ranks. Repeat until all suffixes have unique ranks.

## Algorithm

```text id="s8k2qp"
suffix_array_doubling(S):
    n = length(S)
    SA = [0, 1, ..., n-1]
    rank = initial ranks from characters
    temp = array of size n

    k = 0
    while (1 << k) < n:
        sort SA by (rank[i], rank[i + 2^k])

        temp[SA[0]] = 0
        for i from 1 to n-1:
            prev = SA[i-1]
            curr = SA[i]

            if (rank[prev], rank[prev + 2^k]) < (rank[curr], rank[curr + 2^k]):
                temp[curr] = temp[prev] + 1
            else:
                temp[curr] = temp[prev]

        rank = temp
        k += 1

    return SA
```

## Example

Let:

$$
S = "banana"
$$

Suffixes:

| index | suffix |
| ----- | ------ |
| 0     | banana |
| 1     | anana  |
| 2     | nana   |
| 3     | ana    |
| 4     | na     |
| 5     | a      |

Sorted suffixes:

$$
["a", "ana", "anana", "banana", "na", "nana"]
$$

Suffix array:

$$
[5, 3, 1, 0, 4, 2]
$$

## Correctness

At step (k), suffixes are correctly ordered by their first (2^k) characters. Sorting by rank pairs extends this to (2^{k+1}) characters. Since ranks uniquely represent prefix order, the process converges when all suffixes have distinct ranks.

By induction, the final order corresponds to full lexicographic ordering of suffixes.

## Complexity

Let:

* (n) be the string length

Each iteration sorts (n) elements.

Using comparison sort:

$$
O(n \log n) \text{ per iteration}
$$

Number of iterations:

$$
O(\log n)
$$

Total:

$$
O(n \log^2 n)
$$

With counting or radix sort on ranks:

$$
O(n \log n)
$$

Space:

$$
O(n)
$$

## Properties

| property         | value                     |
| ---------------- | ------------------------- |
| stable           | depends on sorting method |
| in place         | no                        |
| comparison based | partly                    |
| domain           | strings                   |
| output           | suffix array              |

## When to Use

Use suffix array doubling when:

* constructing suffix arrays for moderate sized strings
* needing a conceptually simple and implementable method
* memory usage should remain linear

Avoid when:

* linear time algorithms such as SA-IS are required
* input size is extremely large and constant factors matter

## Implementation (Python)

```python id="q7p2nx"
def suffix_array_doubling(s):
    n = len(s)
    sa = list(range(n))
    rank = [ord(c) for c in s]
    tmp = [0] * n

    k = 1
    while k < n:
        sa.sort(key=lambda i: (rank[i], rank[i + k] if i + k < n else -1))

        tmp[sa[0]] = 0
        for i in range(1, n):
            prev = sa[i - 1]
            curr = sa[i]

            prev_key = (rank[prev], rank[prev + k] if prev + k < n else -1)
            curr_key = (rank[curr], rank[curr + k] if curr + k < n else -1)

            if prev_key < curr_key:
                tmp[curr] = tmp[prev] + 1
            else:
                tmp[curr] = tmp[prev]

        rank = tmp[:]
        k <<= 1

    return sa
```

