# Memory Mapped External Sort

Memory mapped external sort uses memory-mapped files to sort data larger than ordinary working memory. Instead of manually reading and writing file blocks with explicit I/O calls, the program maps file regions into virtual memory and lets the operating system page data in and out.

This can simplify implementation, but it does not remove the external-memory problem. The algorithm still needs chunking, sorted runs, and merging to avoid random page faults and excessive memory pressure.

## Problem

Given a large file of records, sort the records without holding the entire file in RAM at once.

The file may be larger than available RAM, but the operating system can map file pages into the process address space.

## Core Idea

Use memory mapping for file access, but keep the sorting pattern external-memory friendly:

1. map a region of the input file
2. sort a chunk that fits comfortably in memory
3. write or map the sorted chunk as a run
4. merge the sorted runs sequentially

The memory map is an access mechanism. The sorting strategy is still external merge sort.
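As a sketch of steps 1 and 2, Python's `mmap` module can map one region of a file. The offset passed to `mmap` must be a multiple of the allocation granularity, so the helper below (`map_chunk` is a hypothetical name, not part of any library) aligns the offset down and returns where the chunk starts inside the mapping:

```python
import mmap

def map_chunk(f, offset, length):
    # Map one region of an open file read-only (step 1 above).
    # mmap requires offset to be a multiple of the allocation
    # granularity, so align down and remember the slack.
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // gran) * gran
    slack = offset - aligned
    mm = mmap.mmap(f.fileno(), slack + length,
                   access=mmap.ACCESS_READ, offset=aligned)
    return mm, slack  # chunk bytes are mm[slack:slack + length]
```

The caller sorts `mm[slack:slack + length]` in memory and then unmaps with `mm.close()`.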

## Algorithm

```text id="v8s2nd"
memory_mapped_external_sort(file, chunk_size):
    runs = []

    for each chunk region in file:
        A = map chunk region
        sort A in memory
        run = write sorted chunk
        runs.append(run)
        unmap A

    return merge_runs(runs)
```
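The run-generation loop above can be sketched concretely in Python. This sketch assumes fixed-width 4-byte little-endian integer records (`REC`) and writes each sorted chunk to a temporary run file; `make_runs` is a hypothetical helper, and for brevity it maps the whole input once rather than one chunk region at a time:

```python
import mmap
import struct
import tempfile

# Assumed record format: fixed-width 4-byte little-endian integers.
REC = struct.Struct("<i")

def make_runs(path, chunk_records):
    # Map the input read-only, sort one chunk at a time,
    # and write each sorted chunk out as a run file.
    run_paths = []
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            n = len(mm) // REC.size
            for start in range(0, n, chunk_records):
                stop = min(start + chunk_records, n)
                chunk = sorted(
                    REC.unpack_from(mm, i * REC.size)[0]
                    for i in range(start, stop)
                )
                with tempfile.NamedTemporaryFile(delete=False) as run:
                    for value in chunk:
                        run.write(REC.pack(value))
                    run_paths.append(run.name)
    return run_paths
```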

The merge phase should read each run sequentially, keeping only a small window of each run resident at a time.

```text id="uwfh8c"
merge_mapped_runs(runs):
    map small windows from each run
    keep current value from each run in a heap

    while heap is not empty:
        output minimum value
        advance inside that run window

        if window is exhausted:
            map next window from that run
```
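The windowed merge can be sketched as follows, with small buffered reads standing in for mapped windows (a simplification; the access pattern is the same). `record_stream` and `merge_run_files` are hypothetical names, and the record format matches the run-generation sketch:

```python
import heapq
import struct

REC = struct.Struct("<i")  # assumed 4-byte records, as in run generation

def record_stream(path, window_records=1024):
    # Yield one record at a time, refilling a small window per run
    # instead of keeping the whole run resident.
    with open(path, "rb") as f:
        while True:
            window = f.read(window_records * REC.size)
            if not window:
                break
            for off in range(0, len(window), REC.size):
                yield REC.unpack_from(window, off)[0]

def merge_run_files(run_paths):
    # k-way merge: heapq.merge keeps one head per run in a heap.
    return list(heapq.merge(*(record_stream(p) for p in run_paths)))
```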

## Example

Suppose the file contains:

$$
[8, 3, 5, 1, 9, 2, 7, 4]
$$

and memory allows chunks of 3 records.

Run generation:

| chunk     | sorted run |
| --------- | ---------- |
| [8, 3, 5] | [3, 5, 8]  |
| [1, 9, 2] | [1, 2, 9]  |
| [7, 4]    | [4, 7]     |

Merge the runs:

$$
[3,5,8], [1,2,9], [4,7]
$$

Final output:

$$
[1,2,3,4,5,7,8,9]
$$
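The merge of these three runs can be checked directly with Python's standard k-way merge:

```python
import heapq

runs = [[3, 5, 8], [1, 2, 9], [4, 7]]
merged = list(heapq.merge(*runs))
assert merged == [1, 2, 3, 4, 5, 7, 8, 9]
```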

## Correctness

Each mapped chunk is sorted before being written as a run. Therefore every generated run is sorted.

The merge phase repeatedly emits the smallest current element among all run heads. Since each run is sorted, no later element in a run can be smaller than its current head. Thus every emitted element is the smallest remaining element overall.

All records belong to exactly one chunk and then exactly one run. The final merge emits every record once, so the output is a sorted permutation of the input.

## Complexity

Let:

* $N$ be the number of records
* $M$ be the chosen chunk size
* $R = \lceil N/M \rceil$ be the number of runs
* $B$ be the page or block size
* $k$ be the merge fan-in

CPU cost for run generation:

$$
O(N \log M)
$$

CPU cost per merge level:

$$
O(N \log k)
$$

with $\lceil \log_k R \rceil$ merge levels in total.

I/O cost is similar to external merge sort:

$$
O\left(\frac{N}{B} \log_k R\right)
$$

The actual cost depends heavily on page faults, readahead, dirty-page writeback, and mapping granularity.
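Plugging in a hypothetical workload makes the quantities concrete (the numbers below are illustrative, not from the text):

```python
import math

# Hypothetical workload: one billion records, ten-million-record chunks.
N = 1_000_000_000   # records
M = 10_000_000      # chunk size in records
k = 64              # merge fan-in

R = math.ceil(N / M)                 # number of runs: 100
levels = math.ceil(math.log(R, k))   # merge levels needed: 2
```

With fan-in 64, the 100 runs cannot be merged in one pass, so two merge levels are needed.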

## Design Choices

| choice             | effect                                    |
| ------------------ | ----------------------------------------- |
| chunk size         | controls memory pressure and run count    |
| mapped window size | controls page-fault behavior during merge |
| sequential access  | improves OS readahead                     |
| random access      | may cause many page faults                |
| explicit unmapping | releases address space and page pressure  |
| temp run layout    | affects merge locality                    |
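The "sequential access" row can be reinforced explicitly: on Unix platforms, Python 3.8+ exposes `madvise`, which lets the program tell the kernel a mapping will be scanned front to back. A minimal sketch, guarded so it degrades to a no-op where unsupported (`advise_sequential` is a hypothetical helper):

```python
import mmap

def advise_sequential(mm):
    # Hint that the mapping will be scanned sequentially, which
    # encourages aggressive readahead. madvise is Unix-only and
    # appeared in Python 3.8, so guard the call.
    if hasattr(mm, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
        mm.madvise(mmap.MADV_SEQUENTIAL)
```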

## When to Use

Use memory mapped external sort when:

* the data is file-backed
* record layout is fixed-width or easy to address
* the platform has good memory-mapping support
* implementation simplicity matters
* sequential access can be preserved

Avoid relying on memory mapping alone for huge random-access sorts. The operating system may thrash if the working set is much larger than RAM.

## Implementation Sketch

This simplified example uses arrays, but the structure matches a memory-mapped external sort.

```python id="sq2m9d"
def make_sorted_runs(data, chunk_size):
    # Run generation: slice the input into fixed-size chunks and
    # sort each chunk independently.
    runs = []

    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        runs.append(sorted(chunk))

    return runs
```

```python id="ec6yp7"
import heapq

def merge_runs(runs):
    heap = []
    iters = [iter(run) for run in runs]

    # Seed the heap with the head of each non-empty run.
    for run_id, it in enumerate(iters):
        try:
            heapq.heappush(heap, (next(it), run_id))
        except StopIteration:
            pass

    out = []

    # Repeatedly emit the smallest head, then refill from that run.
    while heap:
        value, run_id = heapq.heappop(heap)
        out.append(value)

        try:
            heapq.heappush(heap, (next(iters[run_id]), run_id))
        except StopIteration:
            pass

    return out
```

```python id="uam7cv"
def memory_mapped_external_sort_model(data, chunk_size):
    # Run generation followed by a single merge pass.
    runs = make_sorted_runs(data, chunk_size)
    return merge_runs(runs)
```

## Notes

Memory mapped external sort is mostly an engineering variant of external merge sort. The main decision is not whether mapping is possible, but whether the algorithm still gives the operating system predictable sequential access.

