External-memory sorting method that inserts records into a buffered tree and extracts them in sorted order.
Buffer tree sort uses a buffer tree to sort data in the external-memory model. Instead of sorting all records directly, it inserts records into a high fanout search tree whose internal nodes hold buffers. When a buffer fills, records are pushed downward in batches.
The purpose is to reduce random I/O. Records move through the tree in large blocks, which makes the algorithm efficient when data is stored on disk or another block device.
Model
Assume:
- records
- memory capacity
- block size
- external storage where block transfers dominate cost
A buffer tree usually has fanout about , so each internal node has many children and a buffer large enough to flush many records at once.
Core Idea
Insert all records into the root buffer. When a buffer becomes full, sort or partition its records by child range and flush them downward in large batches. After all insertions finish, flush all remaining buffers. The leaves then contain records in key order.
Finally, scan the leaves from left to right to produce the sorted output.
Algorithm
buffer_tree_sort(records):
T = empty buffer tree
for record in records:
insert record into root buffer
if root buffer is full:
flush(root)
flush all remaining buffers
return scan leaves from left to rightThe flush operation moves a node buffer downward in batches.
flush(node):
group buffered records by child key range
for each child group:
append group to child buffer
if child buffer is full:
flush(child)
clear node bufferExample
Suppose a buffer tree has three leaf ranges:
| leaf | key range |
|---|---|
| L0 | keys < 30 |
| L1 | 30 <= keys < 70 |
| L2 | keys >= 70 |
Input:
Records first enter the root buffer. When flushed, they are distributed:
| leaf | records |
|---|---|
| L0 | 7, 18 |
| L1 | 42, 55 |
| L2 | 90, 71 |
Each leaf is sorted locally:
| leaf | sorted records |
|---|---|
| L0 | 7, 18 |
| L1 | 42, 55 |
| L2 | 71, 90 |
Scanning leaves gives:
Correctness
Each internal node partitions records by ordered key ranges. When a record is flushed from a node to a child, it is sent to the unique child whose range contains the record key.
After all buffers are flushed, every record reaches a leaf whose key range contains it. Leaves appear in increasing key range order, and records within each leaf are sorted. Therefore scanning the leaves from left to right returns all records in sorted order.
Complexity
Buffer tree sort achieves the external sorting bound:
I/O operations under the standard external-memory assumptions.
The CPU cost depends on the cost of partitioning and sorting buffered batches. In practice, batching improves locality and reduces per-record overhead.
Why Buffering Helps
A plain search tree may move one record at a time through many nodes. That causes many small and random I/O operations.
A buffer tree accumulates many records at a node and moves them together. This changes the access pattern from fine-grained updates to block-sized transfers.
Comparison
| method | main operation | I/O pattern |
|---|---|---|
| External merge sort | generate runs, then merge | sequential |
| Buffer tree sort | batch insert into buffered tree | batched tree I/O |
| B-tree bulk load sort | sort first, then build tree | sequential build |
When to Use
Use buffer tree sort when:
- data is too large for memory
- many records are inserted before sorted output is needed
- batched external-memory updates are natural
- you want asymptotically optimal external sorting I/O
- the data may later remain in a searchable tree structure
It is less common as a simple standalone file sort, where external merge sort is usually easier to implement.
Implementation Sketch
class BufferNode:
def __init__(self, splitters=None, children=None):
self.splitters = splitters or []
self.children = children or []
self.buffer = []
self.records = []
def is_leaf(self):
return len(self.children) == 0from bisect import bisect_left
def insert(node, record, capacity):
node.buffer.append(record)
if len(node.buffer) >= capacity:
flush(node, capacity)
def flush(node, capacity):
if node.is_leaf():
node.records.extend(node.buffer)
node.buffer.clear()
return
groups = [[] for _ in node.children]
for record in node.buffer:
i = bisect_left(node.splitters, record)
groups[i].append(record)
node.buffer.clear()
for child, group in zip(node.children, groups):
child.buffer.extend(group)
if len(child.buffer) >= capacity:
flush(child, capacity)def scan(node):
if node.is_leaf():
return sorted(node.records + node.buffer)
out = []
for child in node.children:
out.extend(scan(child))
return outNotes
Buffer tree sort is important because it shows how sorting can be expressed as batched ordered insertion. The tree structure preserves key order, while buffers make the movement of records efficient in external memory.