Skip to content

Induced Sorting

Linear-time technique for suffix array construction that induces order of suffixes from a subset of sorted suffixes.

Induced sorting is a technique used to construct suffix arrays in linear time. Instead of sorting all suffixes directly, it sorts a subset of suffixes and then induces the order of the remaining ones.

This idea forms the foundation of modern linear-time suffix array algorithms such as SA-IS.

Problem

Given a string SS of length nn, construct the suffix array:

SA=sorted indices of all suffixes of S SA = \text{sorted indices of all suffixes of } S

Idea

Suffixes are classified into types based on lexicographic relation with the next suffix:

  • S-type: suffix ii where S[i:]<S[i+1:]S[i:] < S[i+1:]
  • L-type: suffix ii where S[i:]>S[i+1:]S[i:] > S[i+1:]

Define LMS positions as indices where:

  • position ii is S-type
  • position i1i - 1 is L-type

The key idea:

  1. sort LMS substrings
  2. place LMS suffixes into buckets
  3. induce L-type suffixes
  4. induce S-type suffixes

Classification

Traverse from right to left:

if S[i] < S[i+1]: type[i] = S
if S[i] > S[i+1]: type[i] = L
if equal: type[i] = type[i+1]

Algorithm Sketch

induced_sort(S):
    classify each position as S or L

    identify LMS positions

    place LMS suffixes into buckets

    induce L-type suffixes (left to right)

    induce S-type suffixes (right to left)

    if LMS substrings not uniquely ranked:
        compress problem and recurse

    return suffix array

Buckets

Buckets are defined by the first character:

bucket[c]=i:S[i]=c \text{bucket}[c] = { i : S[i] = c }

Each bucket maintains:

  • start position for L-type insertion
  • end position for S-type insertion

Example

Let:

S="banana$" S = \text{"banana\$"}

Suffixes:

indexsuffix
0banana$
1anana$
2nana$
3ana$
4na$
5a$
6$

Induced sorting:

  1. classify types
  2. identify LMS positions
  3. place LMS suffixes
  4. induce L-type and S-type suffixes

Final suffix array:

[6,5,3,1,0,4,2] [6, 5, 3, 1, 0, 4, 2]

Correctness

LMS suffixes represent boundaries where lexicographic decisions occur. Sorting them provides anchors for ordering. L-type suffixes are induced by scanning from left to right using already placed suffixes. S-type suffixes are induced by scanning from right to left.

Each suffix is placed based on previously determined order of adjacent suffixes, ensuring correct global ordering. Recursion guarantees correctness when LMS substrings are not unique.

Complexity

Let:

  • nn be the string length

Time:

O(n) O(n)

Each step processes the string or suffix array a constant number of times.

Space:

O(n) O(n)

Properties

propertyvalue
stableyes
in placeno
comparison basedno
timelinear
domainstrings

When to Use

Use induced sorting when:

  • building suffix arrays for large strings
  • linear time performance is required
  • implementing advanced string processing algorithms

Avoid when:

  • simplicity is more important than performance
  • input size is small
  • a simpler doubling algorithm is sufficient

Notes

  • Core technique behind SA-IS
  • Requires careful bucket management
  • Widely used in bioinformatics and text indexing
  • Often combined with LCP array construction