Linear-time technique for suffix array construction that induces order of suffixes from a subset of sorted suffixes.
Induced sorting is a technique used to construct suffix arrays in linear time. Instead of sorting all suffixes directly, it sorts a subset of suffixes and then induces the order of the remaining ones.
This idea forms the foundation of modern linear-time suffix array algorithms such as SA-IS.
Problem
Given a string of length , construct the suffix array:
Idea
Suffixes are classified into types based on lexicographic relation with the next suffix:
- S-type: suffix where
- L-type: suffix where
Define LMS positions as indices where:
- position is S-type
- position is L-type
The key idea:
- sort LMS substrings
- place LMS suffixes into buckets
- induce L-type suffixes
- induce S-type suffixes
Classification
Traverse from right to left:
if S[i] < S[i+1]: type[i] = S
if S[i] > S[i+1]: type[i] = L
if equal: type[i] = type[i+1]Algorithm Sketch
induced_sort(S):
classify each position as S or L
identify LMS positions
place LMS suffixes into buckets
induce L-type suffixes (left to right)
induce S-type suffixes (right to left)
if LMS substrings not uniquely ranked:
compress problem and recurse
return suffix arrayBuckets
Buckets are defined by the first character:
Each bucket maintains:
- start position for L-type insertion
- end position for S-type insertion
Example
Let:
Suffixes:
| index | suffix |
|---|---|
| 0 | banana$ |
| 1 | anana$ |
| 2 | nana$ |
| 3 | ana$ |
| 4 | na$ |
| 5 | a$ |
| 6 | $ |
Induced sorting:
- classify types
- identify LMS positions
- place LMS suffixes
- induce L-type and S-type suffixes
Final suffix array:
Correctness
LMS suffixes represent boundaries where lexicographic decisions occur. Sorting them provides anchors for ordering. L-type suffixes are induced by scanning from left to right using already placed suffixes. S-type suffixes are induced by scanning from right to left.
Each suffix is placed based on previously determined order of adjacent suffixes, ensuring correct global ordering. Recursion guarantees correctness when LMS substrings are not unique.
Complexity
Let:
- be the string length
Time:
Each step processes the string or suffix array a constant number of times.
Space:
Properties
| property | value |
|---|---|
| stable | yes |
| in place | no |
| comparison based | no |
| time | linear |
| domain | strings |
When to Use
Use induced sorting when:
- building suffix arrays for large strings
- linear time performance is required
- implementing advanced string processing algorithms
Avoid when:
- simplicity is more important than performance
- input size is small
- a simpler doubling algorithm is sufficient
Notes
- Core technique behind SA-IS
- Requires careful bucket management
- Widely used in bioinformatics and text indexing
- Often combined with LCP array construction