String sorting algorithm that partitions strings by the character at the current depth using three-way quicksort.
Multikey quicksort is a string sorting algorithm that combines quicksort partitioning with radix-style character inspection. Instead of comparing whole strings each time, it partitions strings by the character at a current depth.
It is especially effective for strings with shared prefixes because it avoids repeatedly comparing characters that are already known to be equal.
Problem
Given an array of strings, sort the strings in lexicographic order.
For a string , let:
be the character at position . If reaches the end of the string, use a sentinel value smaller than every real character.
Idea
At depth , choose a pivot character from one string:
Partition the array into three parts:
| region | condition |
|---|---|
| left | character at is less than |
| middle | character at equals |
| right | character at is greater than |
Then:
- sort the left region at the same depth
- sort the middle region at depth
- sort the right region at the same depth
Algorithm
multikey_quicksort(A, lo, hi, d):
if hi - lo <= 1:
return
pivot = char_at(A[lo], d)
lt = lo
i = lo + 1
gt = hi - 1
while i <= gt:
c = char_at(A[i], d)
if c < pivot:
swap A[lt], A[i]
lt += 1
i += 1
else if c > pivot:
swap A[i], A[gt]
gt -= 1
else:
i += 1
multikey_quicksort(A, lo, lt, d)
if pivot is not sentinel:
multikey_quicksort(A, lt, gt + 1, d + 1)
multikey_quicksort(A, gt + 1, hi, d)Example
Sort:
At depth 0, pivot character is c.
| region | strings |
|---|---|
equal to c | car, cat, cart |
greater than c | dog |
The middle region is recursively sorted at depth 1. Later, at depth 2, car and cart share the same prefix. The sentinel places car before cart.
Final result:
Correctness
At each recursive call, the partition step separates strings by their character at depth . Strings in the left region are lexicographically smaller than strings in the middle and right regions at the first differing character. Strings in the right region are lexicographically larger.
Strings in the middle region share the same character at depth , so their order depends on later characters. Recursing on depth correctly sorts that region. By induction over subarray size and depth, the full array becomes lexicographically sorted.
Complexity
Let:
- be the number of strings
- be the average string length
Expected time:
Worst case:
with poor pivot choices.
Auxiliary space:
expected recursion stack, depending on partition balance.
Properties
| property | value |
|---|---|
| stable | no |
| in place | yes |
| comparison based | hybrid |
| direction | left to right |
| best for | string keys |
When to Use
Use multikey quicksort when sorting strings with common prefixes and when in-place operation is useful. It is a strong general-purpose string sort because it combines radix awareness with quicksort-style partitioning.
Avoid it when stability is required or when adversarial inputs can cause poor pivot behavior unless randomized or median-based pivots are used.
Implementation
def multikey_quicksort(a):
def char_at(s, d):
if d == len(s):
return -1
return ord(s[d])
def sort(lo, hi, d):
if hi - lo <= 1:
return
pivot = char_at(a[lo], d)
lt = lo
i = lo + 1
gt = hi - 1
while i <= gt:
c = char_at(a[i], d)
if c < pivot:
a[lt], a[i] = a[i], a[lt]
lt += 1
i += 1
elif c > pivot:
a[i], a[gt] = a[gt], a[i]
gt -= 1
else:
i += 1
sort(lo, lt, d)
if pivot >= 0:
sort(lt, gt + 1, d + 1)
sort(gt + 1, hi, d)
sort(0, len(a), 0)
return a