# Stream Sort by Buffer

# Stream Sort by Buffer

Stream sort by buffer sorts incoming data in memory-sized batches. It reads a fixed number of elements, sorts that buffer, emits the sorted block, and repeats.

You use it when the full input is too large or too delayed to sort all at once.

## Problem

Given a stream of values and a buffer capacity ( B ), produce sorted chunks or sorted runs from the stream.

This does not necessarily produce one globally sorted output unless the sorted chunks are later merged.

## Algorithm

```id="f2k8v1"
stream_sort_by_buffer(stream, B):
    runs = []

    while stream not empty:
        buffer = []

        while length(buffer) < B and stream not empty:
            buffer.append(next(stream))

        sort(buffer)
        runs.append(buffer)

    return runs
```

To produce one globally sorted output, merge the runs:

```id="v8m3q6"
merge_sorted_runs(runs):
    heap = min_heap()

    for r in 0..length(runs)-1:
        if runs[r] not empty:
            heap.push((runs[r][0], r, 0))

    output = []

    while heap not empty:
        value, r, i = heap.pop()
        output.append(value)

        if i + 1 < length(runs[r]):
            heap.push((runs[r][i + 1], r, i + 1))

    return output
```

## Example

Stream:

[
[7, 1, 5, 3, 9, 2, 6, 4]
]

Buffer size:

[
B = 4
]

Sorted runs:

| run | values       |
| --- | ------------ |
| 1   | [1, 3, 5, 7] |
| 2   | [2, 4, 6, 9] |

After merging:

[
[1, 2, 3, 4, 5, 6, 7, 9]
]

## Correctness

Each buffer is sorted by a correct internal sorting algorithm, so every emitted run is sorted.

If a final multiway merge is performed, the heap always contains the smallest remaining element from each run. Extracting the minimum repeatedly emits the global sorted order.

## Complexity

Let ( n ) be the number of elements and ( B ) the buffer size.

| phase         | cost                                             |
| ------------- | ------------------------------------------------ |
| chunk sorting | ( O(n \log B) )                                  |
| merging runs  | ( O(n \log r) ), where ( r = \lceil n/B \rceil ) |

Space:

[
O(B)
]

for chunk generation, plus heap space ( O(r) ) during merge.

## When to Use

Use stream sort by buffer when:

* input arrives as a stream
* memory is bounded
* sorted runs are useful intermediate output
* external merge sorting is acceptable

It is a simple foundation for external sorting pipelines.

