Skip to content

Burstsort

Cache efficient string sorting algorithm that builds a trie and sorts buckets lazily for high performance on large text data.

Burstsort is a high performance string sorting algorithm designed for large datasets. It combines trie based partitioning with deferred sorting of small buckets to improve cache locality and reduce memory traffic.

Instead of sorting the entire dataset directly, it incrementally builds a trie where leaves store buckets of strings. When a bucket grows beyond a threshold, it “bursts” into child nodes.

Problem

Given a large array of strings, sort them lexicographically with high efficiency, especially when data exceeds cache capacity.

Idea

  • Build a trie based on string prefixes
  • Store strings in leaf buckets
  • When a bucket exceeds a size threshold, split it into sub-buckets
  • After construction, sort each bucket individually
  • Concatenate buckets in trie order

This approach avoids repeatedly scanning long prefixes and improves locality.

Structure

Each node in the trie contains:

  • an array or map of child pointers indexed by character
  • a bucket of strings (if leaf)

A node starts as a leaf with a bucket. When the bucket grows too large, it bursts into an internal node.

Algorithm

burstsort(A, threshold):
    root = new node

    for s in A:
        insert(root, s, 0)

    result = []
    traverse(root, result)
    return result

insert(node, s, d):
    if node is leaf:
        add s to node.bucket

        if size(node.bucket) > threshold:
            burst(node, d)
    else:
        c = char_at(s, d)
        insert(node.child[c], s, d + 1)

burst(node, d):
    create children for node

    for s in node.bucket:
        c = char_at(s, d)
        move s to child[c]

    clear node.bucket

traverse(node, result):
    if node is leaf:
        sort node.bucket
        append to result
    else:
        for each child in order:
            traverse(child, result)

Example

Sort:

["banana","band","bee","absolute","acorn"] ["banana", "band", "bee", "absolute", "acorn"]

Trie structure groups by prefixes:

  • “a” → [“absolute”, “acorn”]
  • “b” → [“banana”, “band”, “bee”]

Further bursting separates by next characters. Final traversal yields sorted order:

["absolute","acorn","banana","band","bee"] ["absolute", "acorn", "banana", "band", "bee"]

Correctness

Trie structure ensures that all strings sharing a prefix are grouped together. Traversing children in lexicographic order produces sorted output. Sorting within buckets completes ordering among strings sharing full prefixes up to that node.

Complexity

Let:

  • nn be number of strings
  • LL be average length

Time:

O(nL) O(n \cdot L)

with improved constant factors due to reduced cache misses.

Space:

O(nL) O(n \cdot L)

for storing trie structure and strings.

Properties

propertyvalue
stableno
in placeno
comparison basedpartly
cache efficientyes
suitable forlarge string datasets

When to Use

Use burstsort when:

  • sorting very large collections of strings
  • cache performance is critical
  • strings share common prefixes
  • traditional comparison sorts become memory bound

It is often faster than quicksort or mergesort on large text workloads.

Implementation Notes

  • Threshold size affects performance and must be tuned
  • Child representation can use arrays or hash maps
  • Hybrid approaches may use insertion sort inside buckets
  • Memory allocation patterns strongly influence performance