String sorting algorithm that uses three-way partitioning on characters to efficiently handle repeated prefixes.
Three way string quicksort is a refinement of multikey quicksort that uses three-way partitioning to efficiently handle repeated characters. It splits the array into less than, equal to, and greater than partitions based on the character at a given position.
It is one of the most practical general-purpose string sorting algorithms.
Problem
Given an array of strings, sort them in lexicographic order.
For string , define:
as the character at position . If exceeds the string length, use a sentinel value smaller than all characters.
Idea
At depth :
- pick pivot character
- partition into three regions:
| region | condition |
|---|---|
| left | character < |
| middle | character = |
| right | character > |
Then recursively:
- sort left region at same depth
- sort middle region at depth
- sort right region at same depth
This reduces redundant comparisons when many strings share prefixes.
Algorithm
three_way_string_quicksort(A, lo, hi, d):
if hi - lo <= 1:
return
pivot = char_at(A[lo], d)
lt = lo
i = lo + 1
gt = hi - 1
while i <= gt:
c = char_at(A[i], d)
if c < pivot:
swap A[lt], A[i]
lt += 1
i += 1
else if c > pivot:
swap A[i], A[gt]
gt -= 1
else:
i += 1
three_way_string_quicksort(A, lo, lt, d)
if pivot is not sentinel:
three_way_string_quicksort(A, lt, gt + 1, d + 1)
three_way_string_quicksort(A, gt + 1, hi, d)Example
Sort:
At depth 0:
| region | strings |
|---|---|
| b | by |
| s | she, sells, sea, shells |
| t | the |
The s group is recursively sorted at depth 1. Shared prefixes like sh and se are handled efficiently.
Final result:
Correctness
At each recursion level, partitioning ensures that all strings in the left region are lexicographically smaller than those in the middle and right regions at the current character position.
Strings in the middle region share the same character, so their order depends on the next position. Recursive sorting at depth resolves this. By induction on subarray size and depth, the algorithm produces a lexicographically sorted array.
Complexity
Let:
- be the number of strings
- be the average string length
Expected time:
Worst case:
for poor pivot choices.
Space:
due to recursion.
Properties
| property | value |
|---|---|
| stable | no |
| in place | yes |
| comparison based | hybrid |
| efficient for prefixes | yes |
When to Use
Use three way string quicksort when:
- sorting strings with many shared prefixes
- memory usage must remain low
- performance should be close to linear
Avoid when:
- stability is required
- worst case guarantees are critical
- input is adversarial without randomization
Implementation
def three_way_string_quicksort(a):
def char_at(s, d):
if d == len(s):
return -1
return ord(s[d])
def sort(lo, hi, d):
if hi - lo <= 1:
return
pivot = char_at(a[lo], d)
lt, i, gt = lo, lo + 1, hi - 1
while i <= gt:
c = char_at(a[i], d)
if c < pivot:
a[lt], a[i] = a[i], a[lt]
lt += 1
i += 1
elif c > pivot:
a[i], a[gt] = a[gt], a[i]
gt -= 1
else:
i += 1
sort(lo, lt, d)
if pivot >= 0:
sort(lt, gt + 1, d + 1)
sort(gt + 1, hi, d)
sort(0, len(a), 0)
return a