Sort keys by processing the most significant digit first and recursively sorting subarrays.
MSD radix sort processes digits from most significant to least significant. It partitions the array by the leading digit, then recursively sorts each partition.
This approach follows lexicographic order naturally and works well for strings and variable length keys.
Problem
Given an array of keys, each represented as a sequence of digits or characters, sort them lexicographically.
A key can be written as:
MSD radix sort processes:
Idea
At each step:
- Group elements by current digit
- Recursively sort each group by the next digit
- Concatenate groups in order
Algorithm
msd_radix_sort(A, d, base):
if len(A) <= 1:
return A
buckets = array of lists of size base
for x in A:
digit = get_digit(x, d)
buckets[digit].append(x)
result = empty list
for i from 0 to base - 1:
if buckets[i] not empty:
sorted_bucket = msd_radix_sort(buckets[i], d + 1, base)
append sorted_bucket to result
return resultExample
Sort strings:
Step 1, group by first character:
- “a”: [“apple”]
- “c”: [“cat”, “car”]
- “d”: [“dog”]
Step 2, recursively sort each group:
- “apple” remains
- “cat”, “car” sorted by next character
- “dog” remains
Final result:
Handling Variable Length Keys
For strings of different lengths, define a sentinel digit:
- shorter keys have a special value smaller than all valid digits
- this ensures correct lexicographic order
Example:
Correctness
At each level, elements are grouped by the current digit. All elements in a group share the same prefix. Recursive sorting orders them by remaining suffix. Concatenation of groups yields full lexicographic order.
Complexity
Let:
- be number of keys
- be average key length
- be base
Time:
in typical cases
Worst case may degrade when many keys share long common prefixes.
Space:
due to recursion and bucket storage
When to Use
Use MSD radix sort when:
- sorting strings or variable length keys
- lexicographic order is required
- keys share common prefixes
- recursive partitioning improves locality
Avoid when recursion overhead is high or when keys are fixed width integers where LSD is simpler.
Implementation
def msd_radix_sort(a, d=0):
if len(a) <= 1:
return a
buckets = {}
for s in a:
key = s[d] if d < len(s) else ""
buckets.setdefault(key, []).append(s)
result = []
for key in sorted(buckets.keys()):
result.extend(msd_radix_sort(buckets[key], d + 1))
return result