# Cascade Merge Sort

Cascade merge sort is an external sorting method that organizes merging into a pipeline of stages. Each stage merges a group of sorted runs and passes the resulting larger runs to the next stage.

Its purpose is to reduce repeated redistribution and to make external merging easier to schedule. It is most useful when runs arrive gradually or when storage bandwidth can be shared across several merge stages running in sequence.

## Problem

Given many sorted runs on external storage, produce one final sorted run while using bounded memory.

The input is too large to fit in RAM, so the algorithm must read and write runs in blocks.

## Core Idea

Instead of treating each merge pass as a separate global operation, cascade merge sort arranges merge stages in levels.

A lower level merges small runs. Its output becomes input to a higher level. Higher levels merge larger runs. The final level produces the full sorted file.

```text id="s4k12a"
small runs
    -> merge level 1
        -> larger runs
            -> merge level 2
                -> still larger runs
                    -> final merge
```

## Algorithm

```text id="73jct2"
cascade_merge_sort(input, fan_in):
    runs = generate_sorted_runs(input)

    levels = empty list

    for run in runs:
        insert_run_into_level(levels, run, fan_in)

    while more than one run exists across all levels:
        merge_ready_level(levels, fan_in)

    return the only remaining run
```
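The pseudocode leaves `generate_sorted_runs` abstract. A minimal in-memory stand-in, assuming the input is any iterable and a hypothetical `run_size` parameter caps how many elements are sorted at once, could be:

```python
from itertools import islice

def generate_sorted_runs(items, run_size):
    """Consume `items` in chunks of at most `run_size` elements,
    sorting each chunk in memory to form an initial sorted run."""
    it = iter(items)
    runs = []
    while True:
        chunk = list(islice(it, run_size))
        if not chunk:
            break
        runs.append(sorted(chunk))
    return runs
```

In a true external setting each run would be written to storage as it is produced rather than kept in a list.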

A simple level rule is:

```text id="nm2afs"
insert_run_into_level(levels, run, fan_in):
    place run at level 0

    while level has fan_in runs:
        merged = merge all runs at this level
        clear this level
        place merged run at next level
```

This is similar to carrying in base $k$: when a level has $k$ runs, they are merged into one run at the next level.
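The carry analogy can be checked directly: after inserting $R$ runs with fan-in $k$, the occupancy of the levels matches the base-$k$ digits of $R$, least significant digit first. A small sketch that tracks counts only, without performing any merges:

```python
def level_counts(num_runs, fan_in):
    """Simulate run insertion, tracking only how many runs each
    level holds; a full level 'carries' one run into the next."""
    levels = []
    for _ in range(num_runs):
        level = 0
        while True:
            if level == len(levels):
                levels.append(0)
            levels[level] += 1
            if levels[level] < fan_in:
                break
            # carry: fan_in runs merge into one run at the next level
            levels[level] = 0
            level += 1
    return levels
```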

## Example

Let the merge fan-in be:

$$
k = 2
$$

Suppose 8 sorted runs arrive:

$$
R_1, R_2, R_3, R_4, R_5, R_6, R_7, R_8
$$

The cascade behaves like this:

| arrival | action                                                                  |
| ------- | ----------------------------------------------------------------------- |
| $R_1$   | store at level 0                                                        |
| $R_2$   | merge $R_1, R_2$ into level 1                                           |
| $R_3$   | store at level 0                                                        |
| $R_4$   | merge $R_3, R_4$ into level 1, then merge two level 1 runs into level 2 |
| $R_5$   | store at level 0                                                        |
| $R_6$   | merge $R_5, R_6$ into level 1                                           |
| $R_7$   | store at level 0                                                        |
| $R_8$   | merge $R_7, R_8$ into level 1, then cascade upward                      |

At the end, one run remains at the highest level.
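The arrival table can be reproduced with a short simulation. This sketch uses `heapq.merge` for each merge and single-element runs, so the level contents stay easy to inspect:

```python
import heapq

def cascade(runs, fan_in):
    """Insert runs one at a time; merge and carry upward whenever
    a level fills, mirroring the arrival table above."""
    levels = []
    for run in runs:
        level, carry = 0, list(run)
        while True:
            if level == len(levels):
                levels.append([])
            levels[level].append(carry)
            if len(levels[level]) < fan_in:
                break
            carry = list(heapq.merge(*levels[level]))
            levels[level] = []
            level += 1
    return levels
```

With eight single-element runs and fan-in 2, the lower levels end up empty and one fully merged run sits at level 3, matching the table.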

## Correctness

Each generated run is sorted. Every merge combines sorted runs into a larger sorted run by repeatedly choosing the smallest current head element among the inputs.

The cascade only changes the order in which merges occur. It does not discard, duplicate, or modify elements. Since each merge preserves sorted order and all elements are eventually included in one remaining run, the final output is sorted and contains exactly the original input elements.
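Both invariants, sortedness preserved and multiset of elements unchanged, can be spot-checked for a single merge step (a sketch using `heapq.merge` as the merge and `Counter` to compare multisets):

```python
import heapq
from collections import Counter

def check_merge_invariants(runs):
    """Return True if merging sorted `runs` yields a sorted output
    containing exactly the same multiset of elements."""
    merged = list(heapq.merge(*runs))
    is_sorted = all(a <= b for a, b in zip(merged, merged[1:]))
    same_elements = Counter(merged) == Counter(x for r in runs for x in r)
    return is_sorted and same_elements
```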

## Complexity

Let:

* $N$ be the number of elements
* $R$ be the number of initial runs
* $k$ be the merge fan-in
* $B$ be the block size

The number of merge levels is:

$$
O(\log_k R)
$$

The I/O cost is:

$$
O\left(\frac{N}{B} \log_k R\right)
$$

The CPU cost, using a heap for each $k$-way merge, is:

$$
O(N \log k \log_k R)
$$

With loser trees or specialized merge networks, the comparison cost can be reduced in practice.
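To put numbers on the level count (illustrative figures, not from the text): with $R = 1024$ initial runs and fan-in $k = 8$, the cascade needs $\lceil \log_8 1024 \rceil = 4$ levels, so every block is read and written roughly four times. A sketch of the calculation:

```python
import math

def merge_levels(num_runs, fan_in):
    """Levels needed to reduce `num_runs` runs to one, merging
    `fan_in` runs at a time: ceil(log base fan_in of num_runs)."""
    levels = 0
    while num_runs > 1:
        num_runs = math.ceil(num_runs / fan_in)
        levels += 1
    return levels
```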

## Difference from Balanced k-Way Merge Sort

| algorithm                 | scheduling style                  |
| ------------------------- | --------------------------------- |
| Balanced k-way merge sort | complete global merge passes      |
| Cascade merge sort        | staged merging as runs accumulate |

Balanced merging is easier to reason about when all runs already exist. Cascade merging is useful when runs are produced over time or when merging should proceed incrementally.

## When to Use

Use cascade merge sort when:

* sorted runs are generated incrementally
* you want staged merging instead of global passes
* memory is bounded
* sequential I/O is preferred
* merge work should be spread over time

It is useful as a model for external sort pipelines, log compaction, and systems that continuously merge sorted files.

## Implementation Sketch

```python id="mp8w95"
import heapq

def merge_runs(runs):
    """Merge sorted runs into one sorted list using a min-heap
    of (head element, run index) pairs."""
    heap = []
    iters = [iter(run) for run in runs]

    # Seed the heap with the first element of each non-empty run.
    for i, it in enumerate(iters):
        try:
            heapq.heappush(heap, (next(it), i))
        except StopIteration:
            pass

    out = []

    # Repeatedly pop the smallest head, then refill from the run
    # it came from until every iterator is exhausted.
    while heap:
        value, i = heapq.heappop(heap)
        out.append(value)

        try:
            heapq.heappush(heap, (next(iters[i]), i))
        except StopIteration:
            pass

    return out
```

```python id="4pi0cg"
def cascade_merge_sort(initial_runs, fan_in):
    levels = []

    def add_to_level(level, run):
        # Grow the level list on demand.
        while len(levels) <= level:
            levels.append([])

        levels[level].append(run)

        # A full level merges into one run and carries upward.
        if len(levels[level]) == fan_in:
            merged = merge_runs(levels[level])
            levels[level] = []
            add_to_level(level + 1, merged)

    for run in initial_runs:
        add_to_level(0, run)

    # Partially filled levels can leave several runs behind;
    # finish them off with ordinary fan_in-way merge passes.
    remaining = []
    for level in levels:
        remaining.extend(level)

    while len(remaining) > 1:
        remaining = [
            merge_runs(remaining[i:i + fan_in])
            for i in range(0, len(remaining), fan_in)
        ]

    return remaining[0] if remaining else []
```

## Notes

Cascade merge sort is best understood as a merge scheduling strategy. The merge operation itself is the same as external merge sort. The difference is that merging happens through a staged cascade rather than through rigid full-file passes.

