Skip to content

Tape Merge Sort

External sorting algorithm that distributes sorted runs across sequential storage tapes and repeatedly merges them.

Tape merge sort is an external sorting algorithm designed for sequential storage. It was historically used with magnetic tapes, where random access was expensive or unavailable. The algorithm writes sorted runs to multiple tapes, then repeatedly merges runs from input tapes onto output tapes.

The same idea still matters for sequential external storage: make long sequential reads and writes, avoid random seeks, and reduce the number of full passes over the data.

Problem

Given a dataset too large to fit in memory, sort it using only sequential reads and writes.

The input is divided into sorted runs. These runs are then merged until one sorted run remains.

Core Idea

Tape merge sort uses two phases:

  1. Run generation Read chunks into memory, sort them, and write sorted runs to output tapes.

  2. Tape merging Merge runs from input tapes to output tapes. After each pass, swap input and output tape roles.

Algorithm

tape_merge_sort(input, memory_size, tapes):
    runs = generate_sorted_runs(input, memory_size)
    distribute runs across input tapes

    while more than one run remains:
        merge runs from input tapes onto output tapes
        rewind tapes
        swap input tapes and output tapes

    return final run

A two-way tape merge uses two input tapes and one output tape.

merge_pass(input_tape_1, input_tape_2, output_tape):
    while both input tapes have runs:
        merge one run from input_tape_1
        with one run from input_tape_2
        write merged run to output_tape

Example

Suppose initial sorted runs are:

R1,R2,R3,R4 R_1, R_2, R_3, R_4

Distribute them across two input tapes:

taperuns
T1R1,R3R_1, R_3
T2R2,R4R_2, R_4

First merge pass:

output tapemerged runs
T3merge(R1,R2R_1, R_2), merge(R3,R4R_3, R_4)

Now there are two larger runs. A second pass merges those into one final run.

Correctness

Each initial run is sorted. A merge of two sorted runs produces a larger sorted run by repeatedly selecting the smaller current head element.

Each merge pass preserves all records and reduces the number of runs. Since every pass produces sorted runs and the process stops only when one run remains, the final run contains all input records in sorted order.

Complexity

Let:

  • NN be the number of records
  • RR be the number of initial runs
  • BB be the block size
  • kk be the merge fan-in

Number of merge passes:

O(logkR) O(\log_k R)

I/O cost:

O(NBlogkR) O\left(\frac{N}{B} \log_k R\right)

CPU cost for k-way merging:

O(Nlogk) O(N \log k)

per merge level when using a heap.

Tape Scheduling

Tape merge sort differs from ordinary external merge sort mainly in scheduling.

scheduleidea
balanced mergedistribute runs evenly across tapes
polyphase mergedistribute runs unevenly to reduce idle tapes
cascade mergepass merged runs through staged levels

For real tape devices, minimizing rewinds and redistribution can dominate performance.

When to Use

Tape merge sort is useful when:

  • storage supports fast sequential access but poor random access
  • data is much larger than memory
  • runs must be processed in passes
  • input and output streams are naturally sequential

It is mostly historical as a named method, but its I/O discipline remains relevant for external sorting and streaming pipelines.

Implementation Sketch

def merge_two_runs(a, b):
    i = 0
    j = 0
    out = []

    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1

    out.extend(a[i:])
    out.extend(b[j:])
    return out
def tape_merge_sort_runs(runs):
    while len(runs) > 1:
        next_runs = []

        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                next_runs.append(merge_two_runs(runs[i], runs[i + 1]))
            else:
                next_runs.append(runs[i])

        runs = next_runs

    return runs[0] if runs else []

Notes

Tape merge sort is best understood as merge sort under sequential-storage constraints. The sorting rule is ordinary merging. The important part is how runs are placed, read, written, rewound, and redistributed across sequential devices.