Suffix Array Doubling Sort

Suffix array doubling sort constructs a suffix array by iteratively sorting prefixes of length $1, 2, 4, 8, \dots$ until all suffixes are uniquely ranked.

It replaces full suffix comparisons with rank comparisons that double in length each round.

Problem

Given a string $T$ of length $n$ , construct a suffix array:

SA[0 \dots n-1]

such that suffixes of $T$ are sorted lexicographically.

Idea

Each suffix is represented by a pair of ranks:

(rank[i], rank[i + k])

where $k$ doubles each iteration:

k = 1, 2, 4, 8, \dots

Initially, ranks are based on single characters.

Algorithm

Step 1: Initialization

Assign rank based on characters.

Step 2: Iterative sorting

Sort suffixes using rank pairs.

suffix_array_doubling(T):
    n = length(T)
    SA = [0..n-1]
    rank = initial_character_ranks(T)

    k = 1

    while k < n:
        sort SA by (rank[i], rank[i + k])

        new_rank = array of size n
        new_rank[SA[0]] = 0

        for i from 1 to n-1:
            prev = SA[i-1]
            curr = SA[i]

            if (rank[curr], rank[curr+k]) == (rank[prev], rank[prev+k]):
                new_rank[curr] = new_rank[prev]
            else:
                new_rank[curr] = new_rank[prev] + 1

        rank = new_rank
        k = k * 2

    return SA

Example

Text:

T = "banana"

Initial ranks (single characters):

index	char	rank
0	b	1
1	a	0
2	n	13
3	a	0
4	n	13
5	a	0

First iteration ( $k = 1$ ):

Sort by pairs:

suffix	pair
a	(0,13)
a	(0,0)
a	(0,-)
banana	(1,0)
na	(13,0)
nana	(13,0)

After repeated doubling, final suffix array becomes:

SA index	suffix
0	a
1	ana
2	anana
3	banana
4	na
5	nana

Correctness

At iteration $k$ , each suffix is identified by the first $k$ characters via its rank pair:

(rank[i], rank[i+k])

If two suffixes have identical rank pairs, their first $2k$ characters are identical. Otherwise, lexicographic ordering is preserved by comparing rank pairs.

Since $k$ doubles each iteration, eventually $k \ge n$ , and all suffixes are uniquely ranked, giving a correct lexicographic ordering.

Thus the final array is a valid suffix array.

Complexity

Let $n = |T|$ .

operation	time
each sorting step	$O(n \log n)$
number of steps	$O(\log n)$
total naive implementation	$O(n \log^2 n)$

With radix or counting sort optimization:

O(n \log n)

Space complexity:

O(n)

When to Use

Suffix array doubling is useful when:

implementing suffix arrays from scratch
no advanced linear time construction is required
clarity and simplicity are preferred over optimal asymptotic performance
text indexing systems are being prototyped

It is often the baseline method before moving to SA-IS or DC3 algorithms.

Implementation

def suffix_array_doubling(s):
    n = len(s)
    sa = list(range(n))
    rank = [ord(c) for c in s]

    k = 1

    while k < n:
        sa.sort(key=lambda i: (rank[i], rank[i+k] if i+k < n else -1))

        new_rank = [0] * n
        new_rank[sa[0]] = 0

        for i in range(1, n):
            prev, curr = sa[i-1], sa[i]

            prev_key = (rank[prev], rank[prev+k] if prev+k < n else -1)
            curr_key = (rank[curr], rank[curr+k] if curr+k < n else -1)

            if curr_key == prev_key:
                new_rank[curr] = new_rank[prev]
            else:
                new_rank[curr] = new_rank[prev] + 1

        rank = new_rank
        k *= 2

    return sa

import "sort"

func SuffixArrayDoubling(s string) []int {
	n := len(s)
	sa := make([]int, n)
	rank := make([]int, n)

	for i := 0; i < n; i++ {
		sa[i] = i
		rank[i] = int(s[i])
	}

	k := 1

	getRank := func(i int) int {
		if i < n {
			return rank[i]
		}
		return -1
	}

	for k < n {
		sort.Slice(sa, func(i, j int) bool {
			a, b := sa[i], sa[j]
			if rank[a] != rank[b] {
				return rank[a] < rank[b]
			}
			return getRank(a+k) < getRank(b+k)
		})

		newRank := make([]int, n)
		newRank[sa[0]] = 0

		for i := 1; i < n; i++ {
			prev, curr := sa[i-1], sa[i]

			prevKey := [2]int{rank[prev], getRank(prev + k)}
			currKey := [2]int{rank[curr], getRank(curr + k)}

			if currKey == prevKey {
				newRank[curr] = newRank[prev]
			} else {
				newRank[curr] = newRank[prev] + 1
			}
		}

		rank = newRank
		k *= 2
	}

	return sa
}