
Cache Efficient Merge Sort

Merge sort variant designed to improve cache locality by structuring recursion and merging to fit memory hierarchies.

Cache efficient merge sort improves merge sort by reducing cache misses. It reorganizes recursion and merging so that working data fits into cache levels.

The goal is to minimize memory transfers rather than just comparisons.

Problem

Given an array A of length n, reorder it such that:

A[0] \le A[1] \le \dots \le A[n-1]

while improving cache locality.

Idea

Standard merge sort performs well in theory but may suffer from poor cache usage due to repeated passes over large memory regions.

Cache efficient variants aim to:

  • divide the problem until subarrays fit into cache
  • perform merges on contiguous memory blocks
  • reduce repeated scanning of large arrays

A common approach uses blocked recursion:

  1. recursively sort subarrays until they fit into cache
  2. merge subarrays in a way that maximizes sequential access
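The base-case size in step 1 is usually derived from the cache geometry. A minimal sketch of that calculation, with hypothetical sizes (`L1_CACHE_BYTES` and `ITEM_BYTES` are illustrative assumptions, not measured values):

```python
# Hypothetical sizes; real code would query the hardware or tune empirically.
L1_CACHE_BYTES = 32 * 1024  # a common L1 data-cache size
ITEM_BYTES = 8              # e.g. 64-bit keys

# During a merge, a subarray and its temporary buffer are both live,
# so leave room for two copies of the block.
THRESHOLD = L1_CACHE_BYTES // (2 * ITEM_BYTES)

print(THRESHOLD)  # 2048 elements per base-case block
```

In practice the threshold is often tuned empirically, since TLB behavior and prefetching also affect the sweet spot.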

Algorithm

cache_merge_sort(A, lo, hi):
    if hi - lo <= threshold:
        insertion_sort(A, lo, hi)
        return

    mid = (lo + hi) // 2

    cache_merge_sort(A, lo, mid)
    cache_merge_sort(A, mid + 1, hi)

    cache_merge(A, lo, mid, hi)

The merge step is optimized for locality.

cache_merge(A, lo, mid, hi):
    copy A[lo..mid] into buffer

    i = 0
    j = mid + 1
    k = lo

    while i < len(buffer) and j <= hi:
        if buffer[i] <= A[j]:
            A[k] = buffer[i]
            i += 1
        else:
            A[k] = A[j]
            j += 1
        k += 1

    copy remaining buffer elements back
    (any leftovers from A[mid+1..hi] are already in their final place)

Example

Let:

A = [8, 3, 7, 1, 6, 2, 5, 4]

Divide recursively until small subarrays fit in cache:

  • sort small blocks locally
  • merge them sequentially

Final result:

[1, 2, 3, 4, 5, 6, 7, 8]
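The example can be traced in a few lines of Python; `heapq.merge` stands in for the sequential block merge, and a block size of 2 is assumed so the trace stays short:

```python
import heapq

A = [8, 3, 7, 1, 6, 2, 5, 4]

# Sort cache-sized blocks locally (here: blocks of 2).
blocks = [sorted(A[i:i + 2]) for i in range(0, len(A), 2)]
# blocks == [[3, 8], [1, 7], [2, 6], [4, 5]]

# Merge neighbouring blocks pairwise until one sorted run remains.
while len(blocks) > 1:
    blocks = [list(heapq.merge(blocks[i], blocks[i + 1]))
              for i in range(0, len(blocks), 2)]

print(blocks[0])  # [1, 2, 3, 4, 5, 6, 7, 8]
```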

Correctness

Each recursive call sorts a subarray. The merge step combines two sorted subarrays into one sorted array. The cache optimizations do not change ordering logic, so correctness follows directly from merge sort.

Complexity

| metric           | value           |
| ---------------- | --------------- |
| time             | O(n \log n)     |
| cache complexity | improved        |
| extra space      | O(n)            |

The number of cache misses is reduced compared to naive merge sort.
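The "improved" cache complexity can be made precise in the external-memory model, with a cache holding M items and lines of B items. A standard analysis, quoted here as a reference rather than derived:

```latex
% Two-way merge sort whose base case fits in cache:
Q(n) = \Theta\!\left(\frac{n}{B}\log_2\frac{n}{M}\right)
% An (M/B)-way merge achieves the optimal sorting bound:
Q(n) = \Theta\!\left(\frac{n}{B}\log_{M/B}\frac{n}{B}\right)
```

Intuitively, each merge pass streams n items at a cost of about n/B transfers, and stopping the recursion once subarrays fit in cache removes the lowest \log_2 M levels of passes.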

Properties

| property          | value    |
| ----------------- | -------- |
| stable            | yes      |
| in-place          | no       |
| cache locality    | improved |
| sequential access | high     |

Notes

Cache efficient merge sort focuses on memory hierarchy rather than asymptotic complexity. It is important for large datasets where memory bandwidth dominates performance.

This idea leads to more advanced designs such as cache oblivious algorithms and external memory sorting.

Implementation

def cache_merge_sort(a):
    # Blocked recursion: small subarrays are handled by insertion sort
    # (cache-resident, sequential), larger ones by recursive merge sort
    # that copies only the left half into a temporary buffer.
    THRESHOLD = 16  # base-case size; tune to the target cache

    def sort(lo, hi):
        if hi - lo <= THRESHOLD:
            insertion_sort(lo, hi)
            return

        mid = (lo + hi) // 2
        sort(lo, mid)
        sort(mid + 1, hi)
        merge(lo, mid, hi)

    def insertion_sort(lo, hi):
        # Sort a[lo..hi] in place; fast on blocks that fit in cache.
        for i in range(lo + 1, hi + 1):
            key = a[i]
            j = i - 1
            while j >= lo and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key

    def merge(lo, mid, hi):
        # Copy only the left half; the right half is merged in place.
        left = a[lo:mid + 1]
        i = 0
        j = mid + 1
        k = lo

        while i < len(left) and j <= hi:
            if left[i] <= a[j]:  # <= keeps equal keys in order (stability)
                a[k] = left[i]
                i += 1
            else:
                a[k] = a[j]
                j += 1
            k += 1

        # Drain the buffer; right-half leftovers are already in place.
        while i < len(left):
            a[k] = left[i]
            i += 1
            k += 1

    sort(0, len(a) - 1)
    return a
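For contrast, the same idea can be written bottom-up, which avoids recursion and makes the sequential block structure explicit. This is a hedged sketch, not the implementation above; `heapq.merge` plays the role of `cache_merge`, and `block` is the assumed cache-sized run length:

```python
import heapq

def blocked_merge_sort(a, block=16):
    """Bottom-up blocked merge sort sketch (block = assumed cache-sized run)."""
    if not a:
        return []
    # Pass 1: sort cache-sized runs locally.
    runs = [sorted(a[i:i + block]) for i in range(0, len(a), block)]
    # Passes 2+: merge neighbouring runs pairwise, doubling run length.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + 2]))
                for i in range(0, len(runs), 2)]
    return runs[0]
```

Each pass streams sequentially through memory, which is exactly the access pattern the recursive version is engineered to produce.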