External sorting procedure used by database systems for ORDER BY, GROUP BY, DISTINCT, and sort-merge joins.
Database external sort is the disk-based sorting procedure used inside relational database query engines. It appears in query plans for operations such as ORDER BY, GROUP BY, DISTINCT, window functions, and sort-merge joins.
The method is usually a variant of external merge sort with database-specific concerns: rows, tuple comparison, memory quotas, spill files, collation rules, NULL ordering, and stable execution under limited working memory.
Problem
Given a stream of database rows and a sort key, produce rows ordered by that key.
The input may exceed the memory granted to the query operator, so the sort must spill intermediate data to disk.
Core Idea
The database sort operator has two main modes:
In-memory sort If all rows fit in the memory grant, sort directly and emit results.
External sort If rows exceed memory, sort chunks into temporary runs, write them to disk, then merge the runs.
database_external_sort(rows, key, memory_limit):
runs = []
while more rows:
chunk = read rows until memory_limit
sort chunk by key
write chunk as temp run
runs.append(temp run)
if runs is empty:
return sorted in-memory chunk
return merge_runs(runs, key)Run Generation
Rows are read from the child operator until the memory limit is reached. The engine sorts those rows in memory and writes a sorted run to temporary storage.
generate_runs(input, key, memory_limit):
runs = []
while input has rows:
chunk = read rows until memory_limit
sort chunk using database comparator
runs.append(write_temp_run(chunk))
return runsThe comparator must respect database semantics:
- ascending or descending order
- NULLS FIRST or NULLS LAST
- collation for strings
- multiple key columns
- tie-breaking if stable order is required
Merge Phase
The merge phase performs a k-way merge over sorted runs.
merge_runs(runs, key):
heap = empty priority queue
for each run:
read first row
push row into heap
while heap not empty:
row = extract smallest row
emit row
read next row from same run
if row exists:
push row into heapExample
Input rows:
| id | name | score |
|---|---|---|
| 1 | Ada | 91 |
| 2 | Ken | 75 |
| 3 | Lin | 88 |
| 4 | Bea | 75 |
| 5 | Sol | 99 |
Sort key:
ORDER BY score ASC, name ASCIf memory holds only three rows, run generation may produce:
| run | rows |
|---|---|
| R1 | (75, Ken), (88, Lin), (91, Ada) |
| R2 | (75, Bea), (99, Sol) |
The merge compares run heads:
| step | candidates | emitted |
|---|---|---|
| 1 | Ken 75, Bea 75 | Bea 75 |
| 2 | Ken 75, Sol 99 | Ken 75 |
| 3 | Lin 88, Sol 99 | Lin 88 |
| 4 | Ada 91, Sol 99 | Ada 91 |
| 5 | Sol 99 | Sol 99 |
Final order:
| id | name | score |
|---|---|---|
| 4 | Bea | 75 |
| 2 | Ken | 75 |
| 3 | Lin | 88 |
| 1 | Ada | 91 |
| 5 | Sol | 99 |
Correctness
Each generated run is sorted according to the database comparator. The merge phase repeatedly emits the smallest available row among all run heads.
Since each run is sorted, no later row in a run can compare smaller than that run head. Therefore each emitted row is the smallest remaining row globally. After all runs are exhausted, every input row has been emitted once in sorted order.
Complexity
Let:
- be the number of rows
- be the memory grant in rows
- be the number of runs
- be the block size
- be merge fan-in
Run generation costs:
CPU time for in-memory sorting.
Merge CPU cost:
per merge level.
I/O cost:
block transfers, plus the initial read and final write or output stream.
Database-Specific Details
| concern | effect |
|---|---|
| memory grant | determines whether sort spills |
| temp storage | holds sorted runs |
| row width | affects number of rows per run |
| collation | affects string comparison cost |
| NULL ordering | affects comparator logic |
| LIMIT | may allow top-k sort instead of full sort |
| DISTINCT | may remove duplicates during merge |
| GROUP BY | may aggregate while scanning sorted groups |
When to Use
Database external sort is used when a query requires ordered rows and no suitable index already provides the needed order.
Common cases:
ORDER BY- sort-based
GROUP BY DISTINCT- window functions with ordering
- sort-merge join
- index creation
Implementation Sketch
import heapq
def database_external_sort(rows, key, memory_limit):
runs = []
for i in range(0, len(rows), memory_limit):
chunk = rows[i:i + memory_limit]
chunk.sort(key=key)
runs.append(chunk)
return merge_sorted_runs(runs, key)def merge_sorted_runs(runs, key):
heap = []
iters = [iter(run) for run in runs]
for run_id, it in enumerate(iters):
try:
row = next(it)
heapq.heappush(heap, (key(row), run_id, row))
except StopIteration:
pass
out = []
while heap:
_, run_id, row = heapq.heappop(heap)
out.append(row)
try:
nxt = next(iters[run_id])
heapq.heappush(heap, (key(nxt), run_id, nxt))
except StopIteration:
pass
return outNotes
Database external sort is an execution operator, not only a textbook algorithm. Its performance depends heavily on memory grants, row representation, temporary storage, and whether the optimizer can avoid sorting by using an existing index order.