Skip to content

Stream Sort by Buffer

Sort a stream in bounded chunks by buffering records, sorting each chunk, and emitting sorted runs or partially ordered output.

Stream sort by buffer sorts incoming data in memory-sized batches. It reads a fixed number of elements, sorts that buffer, emits the sorted block, and repeats.

You use it when the full input is too large or too delayed to sort all at once.

Problem

Given a stream of values and a buffer capacity ( B ), produce sorted chunks or sorted runs from the stream.

This does not necessarily produce one globally sorted output unless the sorted chunks are later merged.

Algorithm

stream_sort_by_buffer(stream, B):
    runs = []

    while stream not empty:
        buffer = []

        while length(buffer) < B and stream not empty:
            buffer.append(next(stream))

        sort(buffer)
        runs.append(buffer)

    return runs

To produce one globally sorted output, merge the runs:

merge_sorted_runs(runs):
    heap = min_heap()

    for r in 0..length(runs)-1:
        if runs[r] not empty:
            heap.push((runs[r][0], r, 0))

    output = []

    while heap not empty:
        value, r, i = heap.pop()
        output.append(value)

        if i + 1 < length(runs[r]):
            heap.push((runs[r][i + 1], r, i + 1))

    return output

Example

Stream:

[ [7, 1, 5, 3, 9, 2, 6, 4] ]

Buffer size:

[ B = 4 ]

Sorted runs:

runvalues
1[1, 3, 5, 7]
2[2, 4, 6, 9]

After merging:

[ [1, 2, 3, 4, 5, 6, 7, 9] ]

Correctness

Each buffer is sorted by a correct internal sorting algorithm, so every emitted run is sorted.

If a final multiway merge is performed, the heap always contains the smallest remaining element from each run. Extracting the minimum repeatedly emits the global sorted order.

Complexity

Let ( n ) be the number of elements and ( B ) the buffer size.

phasecost
chunk sorting( O(n \log B) )
merging runs( O(n \log r) ), where ( r = \lceil n/B \rceil )

Space:

[ O(B) ]

for chunk generation, plus heap space ( O(r) ) during merge.

When to Use

Use stream sort by buffer when:

  • input arrives as a stream
  • memory is bounded
  • sorted runs are useful intermediate output
  • external merge sorting is acceptable

It is a simple foundation for external sorting pipelines.