Sort a stream in bounded chunks by buffering records, sorting each chunk, and emitting sorted runs or partially ordered output.
Stream sort by buffer sorts incoming data in memory-sized batches. It reads a fixed number of elements, sorts that buffer, emits the sorted block, and repeats.
You use it when the full input is too large or too delayed to sort all at once.
Problem
Given a stream of values and a buffer capacity ( B ), produce sorted chunks or sorted runs from the stream.
This does not necessarily produce one globally sorted output unless the sorted chunks are later merged.
Algorithm
stream_sort_by_buffer(stream, B):
runs = []
while stream not empty:
buffer = []
while length(buffer) < B and stream not empty:
buffer.append(next(stream))
sort(buffer)
runs.append(buffer)
return runsTo produce one globally sorted output, merge the runs:
merge_sorted_runs(runs):
heap = min_heap()
for r in 0..length(runs)-1:
if runs[r] not empty:
heap.push((runs[r][0], r, 0))
output = []
while heap not empty:
value, r, i = heap.pop()
output.append(value)
if i + 1 < length(runs[r]):
heap.push((runs[r][i + 1], r, i + 1))
return outputExample
Stream:
[ [7, 1, 5, 3, 9, 2, 6, 4] ]
Buffer size:
[ B = 4 ]
Sorted runs:
| run | values |
|---|---|
| 1 | [1, 3, 5, 7] |
| 2 | [2, 4, 6, 9] |
After merging:
[ [1, 2, 3, 4, 5, 6, 7, 9] ]
Correctness
Each buffer is sorted by a correct internal sorting algorithm, so every emitted run is sorted.
If a final multiway merge is performed, the heap always contains the smallest remaining element from each run. Extracting the minimum repeatedly emits the global sorted order.
Complexity
Let ( n ) be the number of elements and ( B ) the buffer size.
| phase | cost |
|---|---|
| chunk sorting | ( O(n \log B) ) |
| merging runs | ( O(n \log r) ), where ( r = \lceil n/B \rceil ) |
Space:
[ O(B) ]
for chunk generation, plus heap space ( O(r) ) during merge.
When to Use
Use stream sort by buffer when:
- input arrives as a stream
- memory is bounded
- sorted runs are useful intermediate output
- external merge sorting is acceptable
It is a simple foundation for external sorting pipelines.