Skip to content

Multikey Quicksort

String sorting algorithm that partitions strings by the character at the current depth using three-way quicksort.

Multikey quicksort is a string sorting algorithm that combines quicksort partitioning with radix-style character inspection. Instead of comparing whole strings each time, it partitions strings by the character at a current depth.

It is especially effective for strings with shared prefixes because it avoids repeatedly comparing characters that are already known to be equal.

Problem

Given an array AA of strings, sort the strings in lexicographic order.

For a string ss, let:

s[d] s[d]

be the character at position dd. If dd reaches the end of the string, use a sentinel value smaller than every real character.

Idea

At depth dd, choose a pivot character from one string:

v=s[d] v = s[d]

Partition the array into three parts:

regioncondition
leftcharacter at dd is less than vv
middlecharacter at dd equals vv
rightcharacter at dd is greater than vv

Then:

  • sort the left region at the same depth
  • sort the middle region at depth d+1d + 1
  • sort the right region at the same depth

Algorithm

multikey_quicksort(A, lo, hi, d):
    if hi - lo <= 1:
        return

    pivot = char_at(A[lo], d)

    lt = lo
    i = lo + 1
    gt = hi - 1

    while i <= gt:
        c = char_at(A[i], d)

        if c < pivot:
            swap A[lt], A[i]
            lt += 1
            i += 1
        else if c > pivot:
            swap A[i], A[gt]
            gt -= 1
        else:
            i += 1

    multikey_quicksort(A, lo, lt, d)

    if pivot is not sentinel:
        multikey_quicksort(A, lt, gt + 1, d + 1)

    multikey_quicksort(A, gt + 1, hi, d)

Example

Sort:

["car","cat","dog","cart"] ["car", "cat", "dog", "cart"]

At depth 0, pivot character is c.

regionstrings
equal to ccar, cat, cart
greater than cdog

The middle region is recursively sorted at depth 1. Later, at depth 2, car and cart share the same prefix. The sentinel places car before cart.

Final result:

["car","cart","cat","dog"] ["car", "cart", "cat", "dog"]

Correctness

At each recursive call, the partition step separates strings by their character at depth dd. Strings in the left region are lexicographically smaller than strings in the middle and right regions at the first differing character. Strings in the right region are lexicographically larger.

Strings in the middle region share the same character at depth dd, so their order depends on later characters. Recursing on depth d+1d + 1 correctly sorts that region. By induction over subarray size and depth, the full array becomes lexicographically sorted.

Complexity

Let:

  • nn be the number of strings
  • LL be the average string length

Expected time:

O(nL) O(nL)

Worst case:

O(n2+nL) O(n^2 + nL)

with poor pivot choices.

Auxiliary space:

O(logn+L) O(\log n + L)

expected recursion stack, depending on partition balance.

Properties

propertyvalue
stableno
in placeyes
comparison basedhybrid
directionleft to right
best forstring keys

When to Use

Use multikey quicksort when sorting strings with common prefixes and when in-place operation is useful. It is a strong general-purpose string sort because it combines radix awareness with quicksort-style partitioning.

Avoid it when stability is required or when adversarial inputs can cause poor pivot behavior unless randomized or median-based pivots are used.

Implementation

def multikey_quicksort(a):
    def char_at(s, d):
        if d == len(s):
            return -1
        return ord(s[d])

    def sort(lo, hi, d):
        if hi - lo <= 1:
            return

        pivot = char_at(a[lo], d)
        lt = lo
        i = lo + 1
        gt = hi - 1

        while i <= gt:
            c = char_at(a[i], d)

            if c < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif c > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1

        sort(lo, lt, d)

        if pivot >= 0:
            sort(lt, gt + 1, d + 1)

        sort(gt + 1, hi, d)

    sort(0, len(a), 0)
    return a