# Suffix Array Doubling Sort

# Suffix Array Doubling Sort

Suffix array doubling sort constructs a suffix array by iteratively sorting prefixes of length $1, 2, 4, 8, \dots$ until all suffixes are uniquely ranked.

It replaces full suffix comparisons with rank comparisons that double in length each round.

## Problem

Given a string $T$ of length $n$, construct a suffix array:

$$
SA[0 \dots n-1]
$$

such that suffixes of $T$ are sorted lexicographically.

## Idea

Each suffix is represented by a pair of ranks:

$$
(rank[i], rank[i + k])
$$

where $k$ doubles each iteration:

$$
k = 1, 2, 4, 8, \dots
$$

Initially, ranks are based on single characters.

## Algorithm

### Step 1: Initialization

Assign rank based on characters.

### Step 2: Iterative sorting

Sort suffixes using rank pairs.

```text id="sa0d8q"
suffix_array_doubling(T):
    n = length(T)
    SA = [0..n-1]
    rank = initial_character_ranks(T)

    k = 1

    while k < n:
        sort SA by (rank[i], rank[i + k])

        new_rank = array of size n
        new_rank[SA[0]] = 0

        for i from 1 to n-1:
            prev = SA[i-1]
            curr = SA[i]

            if (rank[curr], rank[curr+k]) == (rank[prev], rank[prev+k]):
                new_rank[curr] = new_rank[prev]
            else:
                new_rank[curr] = new_rank[prev] + 1

        rank = new_rank
        k = k * 2

    return SA
```

## Example

Text:

$$
T = "banana"
$$

Initial ranks (single characters):

| index | char | rank |
| ----: | ---- | ---: |
|     0 | b    |    1 |
|     1 | a    |    0 |
|     2 | n    |   13 |
|     3 | a    |    0 |
|     4 | n    |   13 |
|     5 | a    |    0 |

First iteration ($k = 1$):

Sort by pairs:

| suffix | pair   |
| ------ | ------ |
| a      | (0,13) |
| a      | (0,0)  |
| a      | (0,-)  |
| banana | (1,0)  |
| na     | (13,0) |
| nana   | (13,0) |

After repeated doubling, final suffix array becomes:

| SA index | suffix |
| -------: | ------ |
|        0 | a      |
|        1 | ana    |
|        2 | anana  |
|        3 | banana |
|        4 | na     |
|        5 | nana   |

## Correctness

At iteration $k$, each suffix is identified by the first $k$ characters via its rank pair:

$$
(rank[i], rank[i+k])
$$

If two suffixes have identical rank pairs, their first $2k$ characters are identical. Otherwise, lexicographic ordering is preserved by comparing rank pairs.

Since $k$ doubles each iteration, eventually $k \ge n$, and all suffixes are uniquely ranked, giving a correct lexicographic ordering.

Thus the final array is a valid suffix array.

## Complexity

Let $n = |T|$.

| operation                  | time            |
| -------------------------- | --------------- |
| each sorting step          | $O(n \log n)$   |
| number of steps            | $O(\log n)$     |
| total naive implementation | $O(n \log^2 n)$ |

With radix or counting sort optimization:

$$
O(n \log n)
$$

Space complexity:

$$
O(n)
$$

## When to Use

Suffix array doubling is useful when:

* implementing suffix arrays from scratch
* no advanced linear time construction is required
* clarity and simplicity are preferred over optimal asymptotic performance
* text indexing systems are being prototyped

It is often the baseline method before moving to SA-IS or DC3 algorithms.

## Implementation

```python id="sa1d7m"
def suffix_array_doubling(s):
    n = len(s)
    sa = list(range(n))
    rank = [ord(c) for c in s]

    k = 1

    while k < n:
        sa.sort(key=lambda i: (rank[i], rank[i+k] if i+k < n else -1))

        new_rank = [0] * n
        new_rank[sa[0]] = 0

        for i in range(1, n):
            prev, curr = sa[i-1], sa[i]

            prev_key = (rank[prev], rank[prev+k] if prev+k < n else -1)
            curr_key = (rank[curr], rank[curr+k] if curr+k < n else -1)

            if curr_key == prev_key:
                new_rank[curr] = new_rank[prev]
            else:
                new_rank[curr] = new_rank[prev] + 1

        rank = new_rank
        k *= 2

    return sa
```

```go id="sa2m8v"
import "sort"

func SuffixArrayDoubling(s string) []int {
	n := len(s)
	sa := make([]int, n)
	rank := make([]int, n)

	for i := 0; i < n; i++ {
		sa[i] = i
		rank[i] = int(s[i])
	}

	k := 1

	getRank := func(i int) int {
		if i < n {
			return rank[i]
		}
		return -1
	}

	for k < n {
		sort.Slice(sa, func(i, j int) bool {
			a, b := sa[i], sa[j]
			if rank[a] != rank[b] {
				return rank[a] < rank[b]
			}
			return getRank(a+k) < getRank(b+k)
		})

		newRank := make([]int, n)
		newRank[sa[0]] = 0

		for i := 1; i < n; i++ {
			prev, curr := sa[i-1], sa[i]

			prevKey := [2]int{rank[prev], getRank(prev + k)}
			currKey := [2]int{rank[curr], getRank(curr + k)}

			if currKey == prevKey {
				newRank[curr] = newRank[prev]
			} else {
				newRank[curr] = newRank[prev] + 1
			}
		}

		rank = newRank
		k *= 2
	}

	return sa
}
```

