In-place MSD radix string sort that partitions strings by character using American flag style bucket permutation.
American flag string sort is an in-place MSD radix sort for strings. It partitions strings by the character at the current position, then recursively sorts each bucket by the next character.
It uses the same bucket permutation idea as American flag sort, but the digit source is a string character rather than an integer digit.
Problem
Given an array of strings, sort the strings in lexicographic order.
For a string , let:
be the character at position . If reaches the end of the string, use a sentinel value smaller than every real character.
Idea
At each recursion level:
- Count characters at position
- Compute bucket ranges
- Move strings into their correct bucket in place
- Recursively sort nontrivial buckets at position
The sentinel bucket handles shorter strings, so prefix order is correct.
Algorithm
american_flag_string_sort(A, lo, hi, d, alphabet):
if hi - lo <= 1:
return
count = [0] * alphabet
for i from lo to hi - 1:
c = char_at(A[i], d)
count[c] += 1
start = [0] * alphabet
start[0] = lo
for c from 1 to alphabet - 1:
start[c] = start[c - 1] + count[c - 1]
next = copy(start)
end = [start[c] + count[c] for c in 0..alphabet-1]
i = lo
while i < hi:
c = char_at(A[i], d)
if start[c] <= i < end[c]:
i += 1
else:
j = next[c]
swap A[i], A[j]
next[c] += 1
for c from 1 to alphabet - 1:
if end[c] - start[c] > 1:
american_flag_string_sort(A, start[c], end[c], d + 1, alphabet)The loop skips recursive sorting of the sentinel bucket because strings that have ended need no more characters processed.
Example
Sort:
At character position 0:
| bucket | strings |
|---|---|
| c | car, cat, cart |
| d | dog |
The c bucket is sorted by position 1, then position 2:
| prefix | strings |
|---|---|
| car | car, cart |
| cat | cat |
The sentinel makes:
Final result:
Correctness
At each level, bucket order follows character order. All strings in an earlier bucket are lexicographically smaller than all strings in a later bucket at the first position where they differ.
Within a bucket, all strings share the same current character, so recursive sorting by the next position decides their remaining order. The sentinel bucket correctly places shorter prefix strings before longer extensions.
By induction on character position, the final array is lexicographically sorted.
Complexity
Let:
- be the number of strings
- be the average string length
- be the alphabet size
Typical time:
Extra space per recursion level:
The algorithm is in place with respect to the string array.
Properties
| property | value |
|---|---|
| stable | no |
| in place | yes |
| comparison based | no |
| direction | MSD |
| best for | strings and byte sequences |
When to Use
Use American flag string sort when sorting many strings or byte sequences and auxiliary arrays of size are undesirable. It is useful when strings are large, prefix structure matters, and in-place operation is more important than stability.
Avoid it when stability is required, when the alphabet is large and sparse, or when a simpler comparison sort is sufficient.
Implementation
def american_flag_string_sort(a):
alphabet = 257
def char_at(s, d):
if d == len(s):
return 0
return ord(s[d]) + 1
def sort(lo, hi, d):
if hi - lo <= 1:
return
count = [0] * alphabet
for i in range(lo, hi):
count[char_at(a[i], d)] += 1
start = [0] * alphabet
start[0] = lo
for c in range(1, alphabet):
start[c] = start[c - 1] + count[c - 1]
end = [start[c] + count[c] for c in range(alphabet)]
next_pos = start.copy()
i = lo
while i < hi:
c = char_at(a[i], d)
if start[c] <= i < end[c]:
i += 1
else:
j = next_pos[c]
a[i], a[j] = a[j], a[i]
next_pos[c] += 1
for c in range(1, alphabet):
if end[c] - start[c] > 1:
sort(start[c], end[c], d + 1)
sort(0, len(a), 0)
return a