# Induced Sorting

# Induced Sorting

Induced sorting is a technique used to construct suffix arrays in linear time. Instead of sorting all suffixes directly, it sorts a subset of suffixes and then induces the order of the remaining ones.

This idea forms the foundation of modern linear-time suffix array algorithms such as SA-IS.

## Problem

Given a string $S$ of length $n$, construct the suffix array:

$$
SA = \text{sorted indices of all suffixes of } S
$$

## Idea

Suffixes are classified into types based on lexicographic relation with the next suffix:

* S-type: suffix $i$ where $S[i:] < S[i+1:]$
* L-type: suffix $i$ where $S[i:] > S[i+1:]$

Define LMS positions as indices where:

* position $i$ is S-type
* position $i - 1$ is L-type

The key idea:

1. sort LMS substrings
2. place LMS suffixes into buckets
3. induce L-type suffixes
4. induce S-type suffixes

## Classification

Traverse from right to left:

```text
if S[i] < S[i+1]: type[i] = S
if S[i] > S[i+1]: type[i] = L
if equal: type[i] = type[i+1]
```

## Algorithm Sketch

```text id="i7k2xp"
induced_sort(S):
    classify each position as S or L

    identify LMS positions

    place LMS suffixes into buckets

    induce L-type suffixes (left to right)

    induce S-type suffixes (right to left)

    if LMS substrings not uniquely ranked:
        compress problem and recurse

    return suffix array
```

## Buckets

Buckets are defined by the first character:

$$
\text{bucket}[c] = { i : S[i] = c }
$$

Each bucket maintains:

* start position for L-type insertion
* end position for S-type insertion

## Example

Let:

$$
S = \text{"banana\$"}
$$

Suffixes:

| index | suffix  |
| ----- | ------- |
| 0     | banana$ |
| 1     | anana$  |
| 2     | nana$   |
| 3     | ana$    |
| 4     | na$     |
| 5     | a$      |
| 6     | $       |

Induced sorting:

1. classify types
2. identify LMS positions
3. place LMS suffixes
4. induce L-type and S-type suffixes

Final suffix array:

$$
[6, 5, 3, 1, 0, 4, 2]
$$

## Correctness

LMS suffixes represent boundaries where lexicographic decisions occur. Sorting them provides anchors for ordering. L-type suffixes are induced by scanning from left to right using already placed suffixes. S-type suffixes are induced by scanning from right to left.

Each suffix is placed based on previously determined order of adjacent suffixes, ensuring correct global ordering. Recursion guarantees correctness when LMS substrings are not unique.

## Complexity

Let:

* $n$ be the string length

Time:

$$
O(n)
$$

Each step processes the string or suffix array a constant number of times.

Space:

$$
O(n)
$$

## Properties

| property         | value   |
| ---------------- | ------- |
| stable           | yes     |
| in place         | no      |
| comparison based | no      |
| time             | linear  |
| domain           | strings |

## When to Use

Use induced sorting when:

* building suffix arrays for large strings
* linear time performance is required
* implementing advanced string processing algorithms

Avoid when:

* simplicity is more important than performance
* input size is small
* a simpler doubling algorithm is sufficient

## Notes

* Core technique behind SA-IS
* Requires careful bucket management
* Widely used in bioinformatics and text indexing
* Often combined with LCP array construction

