Skip to content

Suffix Array Doubling Sort

Build a suffix array using the prefix doubling technique with iterative ranking.

Suffix array doubling sort constructs a suffix array by iteratively sorting prefixes of length 1,2,4,8,1, 2, 4, 8, \dots until all suffixes are uniquely ranked.

It replaces full suffix comparisons with rank comparisons that double in length each round.

Problem

Given a string TT of length nn, construct a suffix array:

SA[0n1] SA[0 \dots n-1]

such that suffixes of TT are sorted lexicographically.

Idea

Each suffix is represented by a pair of ranks:

(rank[i],rank[i+k]) (rank[i], rank[i + k])

where kk doubles each iteration:

k=1,2,4,8, k = 1, 2, 4, 8, \dots

Initially, ranks are based on single characters.

Algorithm

Step 1: Initialization

Assign rank based on characters.

Step 2: Iterative sorting

Sort suffixes using rank pairs.

suffix_array_doubling(T):
    n = length(T)
    SA = [0..n-1]
    rank = initial_character_ranks(T)

    k = 1

    while k < n:
        sort SA by (rank[i], rank[i + k])

        new_rank = array of size n
        new_rank[SA[0]] = 0

        for i from 1 to n-1:
            prev = SA[i-1]
            curr = SA[i]

            if (rank[curr], rank[curr+k]) == (rank[prev], rank[prev+k]):
                new_rank[curr] = new_rank[prev]
            else:
                new_rank[curr] = new_rank[prev] + 1

        rank = new_rank
        k = k * 2

    return SA

Example

Text:

T="banana" T = "banana"

Initial ranks (single characters):

indexcharrank
0b1
1a0
2n13
3a0
4n13
5a0

First iteration (k=1k = 1):

Sort by pairs:

suffixpair
a(0,13)
a(0,0)
a(0,-)
banana(1,0)
na(13,0)
nana(13,0)

After repeated doubling, final suffix array becomes:

SA indexsuffix
0a
1ana
2anana
3banana
4na
5nana

Correctness

At iteration kk, each suffix is identified by the first kk characters via its rank pair:

(rank[i],rank[i+k]) (rank[i], rank[i+k])

If two suffixes have identical rank pairs, their first 2k2k characters are identical. Otherwise, lexicographic ordering is preserved by comparing rank pairs.

Since kk doubles each iteration, eventually knk \ge n, and all suffixes are uniquely ranked, giving a correct lexicographic ordering.

Thus the final array is a valid suffix array.

Complexity

Let n=Tn = |T|.

operationtime
each sorting stepO(nlogn)O(n \log n)
number of stepsO(logn)O(\log n)
total naive implementationO(nlog2n)O(n \log^2 n)

With radix or counting sort optimization:

O(nlogn) O(n \log n)

Space complexity:

O(n) O(n)

When to Use

Suffix array doubling is useful when:

  • implementing suffix arrays from scratch
  • no advanced linear time construction is required
  • clarity and simplicity are preferred over optimal asymptotic performance
  • text indexing systems are being prototyped

It is often the baseline method before moving to SA-IS or DC3 algorithms.

Implementation

def suffix_array_doubling(s):
    n = len(s)
    sa = list(range(n))
    rank = [ord(c) for c in s]

    k = 1

    while k < n:
        sa.sort(key=lambda i: (rank[i], rank[i+k] if i+k < n else -1))

        new_rank = [0] * n
        new_rank[sa[0]] = 0

        for i in range(1, n):
            prev, curr = sa[i-1], sa[i]

            prev_key = (rank[prev], rank[prev+k] if prev+k < n else -1)
            curr_key = (rank[curr], rank[curr+k] if curr+k < n else -1)

            if curr_key == prev_key:
                new_rank[curr] = new_rank[prev]
            else:
                new_rank[curr] = new_rank[prev] + 1

        rank = new_rank
        k *= 2

    return sa
import "sort"

func SuffixArrayDoubling(s string) []int {
	n := len(s)
	sa := make([]int, n)
	rank := make([]int, n)

	for i := 0; i < n; i++ {
		sa[i] = i
		rank[i] = int(s[i])
	}

	k := 1

	getRank := func(i int) int {
		if i < n {
			return rank[i]
		}
		return -1
	}

	for k < n {
		sort.Slice(sa, func(i, j int) bool {
			a, b := sa[i], sa[j]
			if rank[a] != rank[b] {
				return rank[a] < rank[b]
			}
			return getRank(a+k) < getRank(b+k)
		})

		newRank := make([]int, n)
		newRank[sa[0]] = 0

		for i := 1; i < n; i++ {
			prev, curr := sa[i-1], sa[i]

			prevKey := [2]int{rank[prev], getRank(prev + k)}
			currKey := [2]int{rank[curr], getRank(curr + k)}

			if currKey == prevKey {
				newRank[curr] = newRank[prev]
			} else {
				newRank[curr] = newRank[prev] + 1
			}
		}

		rank = newRank
		k *= 2
	}

	return sa
}