External sorting algorithm that repeatedly merges groups of k sorted runs until one final sorted run remains.
Balanced k way merge sort is an external sorting algorithm that repeatedly merges sorted runs in groups of at most . Unlike polyphase merge sort, it tries to keep the merge work evenly distributed across passes.
It is a simple and common model for large file sorting. The input is first divided into memory-sized chunks, each chunk is sorted internally, and then the resulting sorted runs are merged level by level.
Problem
Given a dataset too large to fit in memory, sort it using bounded memory and sequential external I/O.
The algorithm assumes that memory can hold:
- one input buffer for each of runs
- one output buffer
- a small priority queue or loser tree
Core Idea
Generate sorted runs first. Then repeatedly merge up to runs into larger sorted runs.
If there are initial runs, each merge pass reduces the number of runs to about:
The process stops when only one run remains.
Algorithm
balanced_k_way_merge_sort(input, k):
runs = generate_sorted_runs(input)
while length(runs) > 1:
next_runs = []
for each group G of at most k runs:
merged = k_way_merge(G)
next_runs.append(merged)
runs = next_runs
return runs[0]The merge step keeps the smallest current element from each input run in a min-heap.
k_way_merge(runs):
heap = empty min-heap
for each run r:
read first element from r
push (value, r) into heap
while heap is not empty:
(value, r) = extract_min(heap)
write value to output
if r has another element:
read next value from r
push (next value, r) into heapExample
Suppose there are 10 sorted runs and .
Initial runs:
Pass 1 merges groups of 3:
| group | input runs | output runs |
|---|---|---|
| 1 | 3 | 1 |
| 2 | 3 | 1 |
| 3 | 3 | 1 |
| 4 | 1 | 1 |
After pass 1:
Pass 2 merges:
| group | input runs | output runs |
|---|---|---|
| 1 | 3 | 1 |
| 2 | 1 | 1 |
After pass 2:
Pass 3 merges both remaining runs:
The final run is the sorted output.
Correctness
Each initial run is sorted by an internal sorting algorithm. During a k way merge, the algorithm repeatedly outputs the smallest head element among all active runs. Since every run is sorted, no later element in a run can be smaller than its current head. Therefore each output element is the smallest remaining element overall.
Each merge pass preserves all elements and produces sorted runs. The number of runs decreases after each pass. When one run remains, it contains every input element in sorted order.
Complexity
Let:
- be the number of elements
- be the memory capacity in elements
- be the number of initial runs
- be the block size
- be the merge fan-in
The number of merge passes is:
The I/O cost is:
The CPU cost of heap-based merging is:
per full merge level, with total comparison cost proportional to all merge passes.
Design Choices
| choice | effect |
|---|---|
| larger run size | fewer initial runs |
| larger | fewer merge passes |
| smaller | cheaper heap operations |
| larger buffers | fewer I/O calls |
| loser tree instead of heap | fewer comparisons during merge |
When to Use
Use balanced k way merge sort when:
- data is larger than memory
- sequential I/O is preferred
- many sorted runs must be merged
- implementation simplicity matters
- file handles and memory buffers are sufficient for inputs
It is commonly used in database sort operators, external file sorting, and log processing pipelines.
Implementation Sketch
import heapq
def k_way_merge(runs):
heap = []
iters = [iter(run) for run in runs]
for run_id, it in enumerate(iters):
try:
value = next(it)
heapq.heappush(heap, (value, run_id))
except StopIteration:
pass
output = []
while heap:
value, run_id = heapq.heappop(heap)
output.append(value)
try:
nxt = next(iters[run_id])
heapq.heappush(heap, (nxt, run_id))
except StopIteration:
pass
return outputdef balanced_k_way_merge_sort(runs, k):
while len(runs) > 1:
next_runs = []
for i in range(0, len(runs), k):
group = runs[i:i + k]
next_runs.append(k_way_merge(group))
runs = next_runs
return runs[0] if runs else []Notes
Balanced k way merge sort favors predictable passes and simple scheduling. It usually fits modern storage better than polyphase merge sort because modern systems can often keep many files open and use large sequential buffers.