Skip to content

American Flag String Sort

In-place MSD radix string sort that partitions strings by character using American flag style bucket permutation.

American flag string sort is an in-place MSD radix sort for strings. It partitions strings by the character at the current position, then recursively sorts each bucket by the next character.

It uses the same bucket permutation idea as American flag sort, but the digit source is a string character rather than an integer digit.

Problem

Given an array AA of strings, sort the strings in lexicographic order.

For a string ss, let:

s[d] s[d]

be the character at position dd. If dd reaches the end of the string, use a sentinel value smaller than every real character.

Idea

At each recursion level:

  1. Count characters at position dd
  2. Compute bucket ranges
  3. Move strings into their correct bucket in place
  4. Recursively sort nontrivial buckets at position d+1d + 1

The sentinel bucket handles shorter strings, so prefix order is correct.

Algorithm

american_flag_string_sort(A, lo, hi, d, alphabet):
    if hi - lo <= 1:
        return

    count = [0] * alphabet

    for i from lo to hi - 1:
        c = char_at(A[i], d)
        count[c] += 1

    start = [0] * alphabet
    start[0] = lo

    for c from 1 to alphabet - 1:
        start[c] = start[c - 1] + count[c - 1]

    next = copy(start)
    end = [start[c] + count[c] for c in 0..alphabet-1]

    i = lo
    while i < hi:
        c = char_at(A[i], d)

        if start[c] <= i < end[c]:
            i += 1
        else:
            j = next[c]
            swap A[i], A[j]
            next[c] += 1

    for c from 1 to alphabet - 1:
        if end[c] - start[c] > 1:
            american_flag_string_sort(A, start[c], end[c], d + 1, alphabet)

The loop skips recursive sorting of the sentinel bucket because strings that have ended need no more characters processed.

Example

Sort:

["car","cat","dog","cart"] ["car", "cat", "dog", "cart"]

At character position 0:

bucketstrings
ccar, cat, cart
ddog

The c bucket is sorted by position 1, then position 2:

prefixstrings
carcar, cart
catcat

The sentinel makes:

"car"<"cart" "car" < "cart"

Final result:

["car","cart","cat","dog"] ["car", "cart", "cat", "dog"]

Correctness

At each level, bucket order follows character order. All strings in an earlier bucket are lexicographically smaller than all strings in a later bucket at the first position where they differ.

Within a bucket, all strings share the same current character, so recursive sorting by the next position decides their remaining order. The sentinel bucket correctly places shorter prefix strings before longer extensions.

By induction on character position, the final array is lexicographically sorted.

Complexity

Let:

  • nn be the number of strings
  • LL be the average string length
  • σ\sigma be the alphabet size

Typical time:

O(nL) O(nL)

Extra space per recursion level:

O(σ) O(\sigma)

The algorithm is in place with respect to the string array.

Properties

propertyvalue
stableno
in placeyes
comparison basedno
directionMSD
best forstrings and byte sequences

When to Use

Use American flag string sort when sorting many strings or byte sequences and auxiliary arrays of size nn are undesirable. It is useful when strings are large, prefix structure matters, and in-place operation is more important than stability.

Avoid it when stability is required, when the alphabet is large and sparse, or when a simpler comparison sort is sufficient.

Implementation

def american_flag_string_sort(a):
    alphabet = 257

    def char_at(s, d):
        if d == len(s):
            return 0
        return ord(s[d]) + 1

    def sort(lo, hi, d):
        if hi - lo <= 1:
            return

        count = [0] * alphabet

        for i in range(lo, hi):
            count[char_at(a[i], d)] += 1

        start = [0] * alphabet
        start[0] = lo

        for c in range(1, alphabet):
            start[c] = start[c - 1] + count[c - 1]

        end = [start[c] + count[c] for c in range(alphabet)]
        next_pos = start.copy()

        i = lo
        while i < hi:
            c = char_at(a[i], d)

            if start[c] <= i < end[c]:
                i += 1
            else:
                j = next_pos[c]
                a[i], a[j] = a[j], a[i]
                next_pos[c] += 1

        for c in range(1, alphabet):
            if end[c] - start[c] > 1:
                sort(start[c], end[c], d + 1)

    sort(0, len(a), 0)
    return a