# Burstsort String Sorting

# Burstsort String Sorting

Burstsort string sorting is a trie based method for lexicographic string sorting. It stores strings in buckets attached to trie leaves. When a bucket grows too large, the bucket is split by the next character position.

The method is designed for large string collections where comparison sorting repeatedly examines the same prefixes and causes poor cache behavior.

## Problem

Given an array $A$ of strings, sort the strings in lexicographic order.

For two strings $s$ and $t$, lexicographic order compares the first position where they differ. If one string is a prefix of the other, the shorter string comes first.

## Idea

Burstsort avoids repeated full string comparisons by grouping strings according to prefixes.

At a trie node at depth $d$:

* strings in the node share the same prefix of length $d$
* the next character decides the child bucket
* leaf buckets store unsorted strings until they exceed a threshold
* large buckets burst into child buckets

After all strings are inserted, an ordered trie traversal emits buckets in lexicographic order.

## Algorithm

```text id="x4k8qm"
burstsort_strings(A, threshold):
    root = new leaf node

    for s in A:
        insert(root, s, 0, threshold)

    result = empty list
    collect(root, result)
    return result

insert(node, s, d, threshold):
    if node is leaf:
        append s to node.bucket

        if size(node.bucket) > threshold:
            burst(node, d, threshold)

        return

    c = char_at(s, d)
    insert(node.child[c], s, d + 1, threshold)

burst(node, d, threshold):
    old = node.bucket
    node.bucket = empty
    node becomes internal node

    for s in old:
        c = char_at(s, d)
        create child c if missing
        append s to child c bucket

collect(node, result):
    if node is leaf:
        sort node.bucket lexicographically
        append node.bucket to result
        return

    for c in characters in sorted order:
        collect(node.child[c], result)
```

## Sentinel Character

Variable length strings require a sentinel character that sorts before all regular characters. This handles prefix relationships correctly.

For example:

$$
"car" < "cart"
$$

because after matching `c`, `a`, and `r`, the shorter string reaches the sentinel first.

## Example

Sort:

$$
["banana", "band", "bee", "absolute", "acorn"]
$$

After prefix grouping:

| prefix | bucket            |
| ------ | ----------------- |
| a      | absolute, acorn   |
| b      | banana, band, bee |

The `a` bucket is sorted before the `b` bucket. Inside the `b` bucket, further prefix grouping separates `ban` and `be`.

Final order:

$$
["absolute", "acorn", "banana", "band", "bee"]
$$

## Correctness

Each trie edge corresponds to one character position. Therefore, an ordered traversal visits prefixes in lexicographic order. All strings in an earlier child are smaller than all strings in a later child.

Leaf buckets contain strings with a common prefix. Sorting each bucket completes the order among strings that were not split further. Since every string is inserted into exactly one leaf and every leaf is visited in lexicographic order, the final output is sorted.

## Complexity

Let:

* $n$ be the number of strings
* $L$ be the average string length
* $B$ be the bucket threshold
* $\sigma$ be the alphabet size

Typical time:

$$
O(nL)
$$

plus the cost of sorting leaf buckets.

Space:

$$
O(n + T)
$$

where $T$ is the number of trie nodes.

## Properties

| property         | value                 |
| ---------------- | --------------------- |
| stable           | no                    |
| in place         | no                    |
| comparison based | hybrid                |
| cache efficient  | yes                   |
| best for         | large string datasets |

## When to Use

Use burstsort string sorting when sorting many strings with shared prefixes, especially when ordinary comparison sorts spend too much time revisiting the same characters.

Avoid it for small arrays, short strings, or memory constrained settings where a simple comparison sort is easier and sufficient.

## Implementation Sketch

```python id="k3f7za"
class Node:
    def __init__(self):
        self.children = None
        self.bucket = []

def burstsort_strings(a, threshold=32):
    root = Node()

    for s in a:
        insert(root, s, 0, threshold)

    out = []
    collect(root, out)
    return out

def char_at(s, d):
    if d == len(s):
        return ""
    return s[d]

def insert(node, s, d, threshold):
    if node.children is None:
        node.bucket.append(s)

        if len(node.bucket) > threshold:
            burst(node, d)

        return

    c = char_at(s, d)
    if c not in node.children:
        node.children[c] = Node()

    insert(node.children[c], s, d + 1, threshold)

def burst(node, d):
    old = node.bucket
    node.bucket = []
    node.children = {}

    for s in old:
        c = char_at(s, d)
        if c not in node.children:
            node.children[c] = Node()
        node.children[c].bucket.append(s)

def collect(node, out):
    if node.children is None:
        out.extend(sorted(node.bucket))
        return

    for c in sorted(node.children):
        collect(node.children[c], out)
```

