Skip to content

Burstsort String Sorting

String sorting method that groups strings by prefixes in a trie and sorts leaf buckets for cache efficient lexicographic order.

Burstsort string sorting is a trie based method for lexicographic string sorting. It stores strings in buckets attached to trie leaves. When a bucket grows too large, the bucket is split by the next character position.

The method is designed for large string collections where comparison sorting repeatedly examines the same prefixes and causes poor cache behavior.

Problem

Given an array AA of strings, sort the strings in lexicographic order.

For two strings ss and tt, lexicographic order compares the first position where they differ. If one string is a prefix of the other, the shorter string comes first.

Idea

Burstsort avoids repeated full string comparisons by grouping strings according to prefixes.

At a trie node at depth dd:

  • strings in the node share the same prefix of length dd
  • the next character decides the child bucket
  • leaf buckets store unsorted strings until they exceed a threshold
  • large buckets burst into child buckets

After all strings are inserted, an ordered trie traversal emits buckets in lexicographic order.

Algorithm

burstsort_strings(A, threshold):
    root = new leaf node

    for s in A:
        insert(root, s, 0, threshold)

    result = empty list
    collect(root, result)
    return result

insert(node, s, d, threshold):
    if node is leaf:
        append s to node.bucket

        if size(node.bucket) > threshold:
            burst(node, d, threshold)

        return

    c = char_at(s, d)
    insert(node.child[c], s, d + 1, threshold)

burst(node, d, threshold):
    old = node.bucket
    node.bucket = empty
    node becomes internal node

    for s in old:
        c = char_at(s, d)
        create child c if missing
        append s to child c bucket

collect(node, result):
    if node is leaf:
        sort node.bucket lexicographically
        append node.bucket to result
        return

    for c in characters in sorted order:
        collect(node.child[c], result)

Sentinel Character

Variable length strings require a sentinel character that sorts before all regular characters. This handles prefix relationships correctly.

For example:

"car"<"cart" "car" < "cart"

because after matching c, a, and r, the shorter string reaches the sentinel first.

Example

Sort:

["banana","band","bee","absolute","acorn"] ["banana", "band", "bee", "absolute", "acorn"]

After prefix grouping:

prefixbucket
aabsolute, acorn
bbanana, band, bee

The a bucket is sorted before the b bucket. Inside the b bucket, further prefix grouping separates ban and be.

Final order:

["absolute","acorn","banana","band","bee"] ["absolute", "acorn", "banana", "band", "bee"]

Correctness

Each trie edge corresponds to one character position. Therefore, an ordered traversal visits prefixes in lexicographic order. All strings in an earlier child are smaller than all strings in a later child.

Leaf buckets contain strings with a common prefix. Sorting each bucket completes the order among strings that were not split further. Since every string is inserted into exactly one leaf and every leaf is visited in lexicographic order, the final output is sorted.

Complexity

Let:

  • nn be the number of strings
  • LL be the average string length
  • BB be the bucket threshold
  • σ\sigma be the alphabet size

Typical time:

O(nL) O(nL)

plus the cost of sorting leaf buckets.

Space:

O(n+T) O(n + T)

where TT is the number of trie nodes.

Properties

propertyvalue
stableno
in placeno
comparison basedhybrid
cache efficientyes
best forlarge string datasets

When to Use

Use burstsort string sorting when sorting many strings with shared prefixes, especially when ordinary comparison sorts spend too much time revisiting the same characters.

Avoid it for small arrays, short strings, or memory constrained settings where a simple comparison sort is easier and sufficient.

Implementation Sketch

class Node:
    def __init__(self):
        self.children = None
        self.bucket = []

def burstsort_strings(a, threshold=32):
    root = Node()

    for s in a:
        insert(root, s, 0, threshold)

    out = []
    collect(root, out)
    return out

def char_at(s, d):
    if d == len(s):
        return ""
    return s[d]

def insert(node, s, d, threshold):
    if node.children is None:
        node.bucket.append(s)

        if len(node.bucket) > threshold:
            burst(node, d)

        return

    c = char_at(s, d)
    if c not in node.children:
        node.children[c] = Node()

    insert(node.children[c], s, d + 1, threshold)

def burst(node, d):
    old = node.bucket
    node.bucket = []
    node.children = {}

    for s in old:
        c = char_at(s, d)
        if c not in node.children:
            node.children[c] = Node()
        node.children[c].bucket.append(s)

def collect(node, out):
    if node.children is None:
        out.extend(sorted(node.bucket))
        return

    for c in sorted(node.children):
        collect(node.children[c], out)