# American Flag String Sort

# American Flag String Sort

American flag string sort is an in-place MSD radix sort for strings. It partitions strings by the character at the current position, then recursively sorts each bucket by the next character.

It uses the same bucket permutation idea as American flag sort, but the digit source is a string character rather than an integer digit.

## Problem

Given an array $A$ of strings, sort the strings in lexicographic order.

For a string $s$, let:

$$
s[d]
$$

be the character at position $d$. If $d$ reaches the end of the string, use a sentinel value smaller than every real character.

## Idea

At each recursion level:

1. Count characters at position $d$
2. Compute bucket ranges
3. Move strings into their correct bucket in place
4. Recursively sort nontrivial buckets at position $d + 1$

The sentinel bucket handles shorter strings, so prefix order is correct.

## Algorithm

```text id="a7k2mx"
american_flag_string_sort(A, lo, hi, d, alphabet):
    if hi - lo <= 1:
        return

    count = [0] * alphabet

    for i from lo to hi - 1:
        c = char_at(A[i], d)
        count[c] += 1

    start = [0] * alphabet
    start[0] = lo

    for c from 1 to alphabet - 1:
        start[c] = start[c - 1] + count[c - 1]

    next = copy(start)
    end = [start[c] + count[c] for c in 0..alphabet-1]

    i = lo
    while i < hi:
        c = char_at(A[i], d)

        if start[c] <= i < end[c]:
            i += 1
        else:
            j = next[c]
            swap A[i], A[j]
            next[c] += 1

    for c from 1 to alphabet - 1:
        if end[c] - start[c] > 1:
            american_flag_string_sort(A, start[c], end[c], d + 1, alphabet)
```

The loop skips recursive sorting of the sentinel bucket because strings that have ended need no more characters processed.

## Example

Sort:

$$
["car", "cat", "dog", "cart"]
$$

At character position 0:

| bucket | strings        |
| ------ | -------------- |
| c      | car, cat, cart |
| d      | dog            |

The `c` bucket is sorted by position 1, then position 2:

| prefix | strings   |
| ------ | --------- |
| car    | car, cart |
| cat    | cat       |

The sentinel makes:

$$
"car" < "cart"
$$

Final result:

$$
["car", "cart", "cat", "dog"]
$$

## Correctness

At each level, bucket order follows character order. All strings in an earlier bucket are lexicographically smaller than all strings in a later bucket at the first position where they differ.

Within a bucket, all strings share the same current character, so recursive sorting by the next position decides their remaining order. The sentinel bucket correctly places shorter prefix strings before longer extensions.

By induction on character position, the final array is lexicographically sorted.

## Complexity

Let:

* $n$ be the number of strings
* $L$ be the average string length
* $\sigma$ be the alphabet size

Typical time:

$$
O(nL)
$$

Extra space per recursion level:

$$
O(\sigma)
$$

The algorithm is in place with respect to the string array.

## Properties

| property         | value                      |
| ---------------- | -------------------------- |
| stable           | no                         |
| in place         | yes                        |
| comparison based | no                         |
| direction        | MSD                        |
| best for         | strings and byte sequences |

## When to Use

Use American flag string sort when sorting many strings or byte sequences and auxiliary arrays of size $n$ are undesirable. It is useful when strings are large, prefix structure matters, and in-place operation is more important than stability.

Avoid it when stability is required, when the alphabet is large and sparse, or when a simpler comparison sort is sufficient.

## Implementation

```python id="s5q8vr"
def american_flag_string_sort(a):
    alphabet = 257

    def char_at(s, d):
        if d == len(s):
            return 0
        return ord(s[d]) + 1

    def sort(lo, hi, d):
        if hi - lo <= 1:
            return

        count = [0] * alphabet

        for i in range(lo, hi):
            count[char_at(a[i], d)] += 1

        start = [0] * alphabet
        start[0] = lo

        for c in range(1, alphabet):
            start[c] = start[c - 1] + count[c - 1]

        end = [start[c] + count[c] for c in range(alphabet)]
        next_pos = start.copy()

        i = lo
        while i < hi:
            c = char_at(a[i], d)

            if start[c] <= i < end[c]:
                i += 1
            else:
                j = next_pos[c]
                a[i], a[j] = a[j], a[i]
                next_pos[c] += 1

        for c in range(1, alphabet):
            if end[c] - start[c] > 1:
                sort(start[c], end[c], d + 1)

    sort(0, len(a), 0)
    return a
```

