Skip to content

Suffix Array Doubling Sort

Construct a suffix array by iteratively sorting suffixes using doubling technique with rank pairs.

Suffix array doubling sort builds a suffix array by repeatedly sorting suffixes based on prefixes of length (2^k). Each step uses previously computed ranks to compare longer prefixes efficiently.

This method avoids direct string comparison and reduces the problem to sorting integer pairs.

Problem

Given a string (S) of length (n), construct the suffix array:

SA=sorted indices of all suffixes of S SA = \text{sorted indices of all suffixes of } S

Each suffix is:

S[i:]=S[i]S[i+1]S[n1] S[i:] = S[i] S[i+1] \cdots S[n-1]

Idea

Instead of comparing suffixes directly, assign each suffix a rank.

At step (k), sort suffixes using the pair:

(rank[i],rank[i+2k]) (\text{rank}[i], \text{rank}[i + 2^k])

This represents the first (2^{k+1}) characters.

After sorting, assign new ranks. Repeat until all suffixes have unique ranks.

Algorithm

suffix_array_doubling(S):
    n = length(S)
    SA = [0, 1, ..., n-1]
    rank = initial ranks from characters
    temp = array of size n

    k = 0
    while (1 << k) < n:
        sort SA by (rank[i], rank[i + 2^k])

        temp[SA[0]] = 0
        for i from 1 to n-1:
            prev = SA[i-1]
            curr = SA[i]

            if (rank[prev], rank[prev + 2^k]) < (rank[curr], rank[curr + 2^k]):
                temp[curr] = temp[prev] + 1
            else:
                temp[curr] = temp[prev]

        rank = temp
        k += 1

    return SA

Example

Let:

S="banana" S = "banana"

Suffixes:

indexsuffix
0banana
1anana
2nana
3ana
4na
5a

Sorted suffixes:

["a","ana","anana","banana","na","nana"] ["a", "ana", "anana", "banana", "na", "nana"]

Suffix array:

[5,3,1,0,4,2] [5, 3, 1, 0, 4, 2]

Correctness

At step (k), suffixes are correctly ordered by their first (2^k) characters. Sorting by rank pairs extends this to (2^{k+1}) characters. Since ranks uniquely represent prefix order, the process converges when all suffixes have distinct ranks.

By induction, the final order corresponds to full lexicographic ordering of suffixes.

Complexity

Let:

  • (n) be the string length

Each iteration sorts (n) elements.

Using comparison sort:

O(nlogn) per iteration O(n \log n) \text{ per iteration}

Number of iterations:

O(logn) O(\log n)

Total:

O(nlog2n) O(n \log^2 n)

With counting or radix sort on ranks:

O(nlogn) O(n \log n)

Space:

O(n) O(n)

Properties

propertyvalue
stabledepends on sorting method
in placeno
comparison basedpartly
domainstrings
outputsuffix array

When to Use

Use suffix array doubling when:

  • constructing suffix arrays for moderate sized strings
  • needing a conceptually simple and implementable method
  • memory usage should remain linear

Avoid when:

  • linear time algorithms such as SA-IS are required
  • input size is extremely large and constant factors matter

Implementation (Python)

def suffix_array_doubling(s):
    n = len(s)
    sa = list(range(n))
    rank = [ord(c) for c in s]
    tmp = [0] * n

    k = 1
    while k < n:
        sa.sort(key=lambda i: (rank[i], rank[i + k] if i + k < n else -1))

        tmp[sa[0]] = 0
        for i in range(1, n):
            prev = sa[i - 1]
            curr = sa[i]

            prev_key = (rank[prev], rank[prev + k] if prev + k < n else -1)
            curr_key = (rank[curr], rank[curr + k] if curr + k < n else -1)

            if prev_key < curr_key:
                tmp[curr] = tmp[prev] + 1
            else:
                tmp[curr] = tmp[prev]

        rank = tmp[:]
        k <<= 1

    return sa