Merge sort variant that improves locality without tuning parameters for a specific cache size.
Cache oblivious merge sort is a merge sort variant designed for memory hierarchies. It improves locality without knowing the cache size, block size, or number of cache levels.
The algorithm's recursive structure ensures that subproblems eventually shrink to fit in cache automatically, at every level of the memory hierarchy.
Problem
Given an array A of length n, reorder it such that:

    A[0] <= A[1] <= ... <= A[n-1]

while reducing cache misses across unknown memory hierarchies.
Idea
A cache aware algorithm uses parameters such as block size or cache size. A cache oblivious algorithm avoids such parameters.
Cache oblivious merge sort relies on divide and conquer:
- small recursive subproblems eventually fit in cache
- recursive merging improves locality
- no machine specific threshold is required for correctness
The key challenge is merging. A cache oblivious merge divides the merge problem recursively instead of scanning large regions in one flat pass.
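The binary-search step at the heart of the recursive merge can be seen in isolation. A minimal Python sketch, with example values of my own:

```python
from bisect import bisect_right

# Two sorted halves (illustrative values, not from the text).
left = [2, 5, 8, 11]
right = [1, 4, 9, 10]

# Take the middle element of one half and find its rank in the
# other half by binary search; this rank splits the merge into
# two independent subproblems.
pivot = left[len(left) // 2]       # pivot = 8
rank = bisect_right(right, pivot)  # number of elements in right <= 8
print(rank)  # → 2: merge left[:3] with right[:2], left[3:] with right[2:]
```

Each recursive call then works on a contiguous region of each half, which is what improves locality.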
Algorithm
```
cache_oblivious_merge_sort(A, lo, hi):
    if lo >= hi:
        return
    mid = (lo + hi) // 2
    cache_oblivious_merge_sort(A, lo, mid)
    cache_oblivious_merge_sort(A, mid + 1, hi)
    cache_oblivious_merge(A, lo, mid, hi)
```

A recursive merge splits one sorted range and binary searches into the other.

```
cache_oblivious_merge(A, lo, mid, hi):
    if range is small:
        ordinary_merge(A, lo, mid, hi)
        return
    choose middle element from larger half
    find its rank in the other half by binary search
    recursively merge lower parts
    recursively merge upper parts
```

Example
Let:

    A = [5, 2, 11, 8, 1, 10, 4, 9]

The two halves are sorted:

    [2, 5, 8, 11]   [1, 4, 9, 10]

A recursive merge may choose 8 from the left half and locate its position in the right half: 8 falls between 4 and 9, so [2, 5, 8] is merged with [1, 4] and [11] is merged with [9, 10].

This splits the merge into smaller independent merge problems.

Final result:

    A = [1, 2, 4, 5, 8, 9, 10, 11]
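The whole procedure can be made concrete. This is a hedged sketch in Python, not the article's reference code: the recursive merge splits on the middle of the larger half, and falls back to an ordinary two-pointer merge on small ranges.

```python
from bisect import bisect_right

def co_merge_sort(a):
    """Sketch of cache oblivious merge sort with a recursive merge."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return co_merge(co_merge_sort(a[:mid]), co_merge_sort(a[mid:]))

def co_merge(left, right, cutoff=8):
    # Base case: small total size, do an ordinary merge.
    if len(left) + len(right) <= cutoff:
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]
    # Split on the middle element of the larger half; binary search
    # its rank in the other half, then merge the two sides separately.
    if len(left) >= len(right):
        m = len(left) // 2
        r = bisect_right(right, left[m])
        return (co_merge(left[:m + 1], right[:r], cutoff) +
                co_merge(left[m + 1:], right[r:], cutoff))
    m = len(right) // 2
    r = bisect_right(left, right[m])
    return (co_merge(left[:r], right[:m + 1], cutoff) +
            co_merge(left[r:], right[m + 1:], cutoff))

print(co_merge_sort([5, 2, 11, 8, 1, 10, 4, 9]))
# → [1, 2, 4, 5, 8, 9, 10, 11]
```

Splitting on the larger half matters: it guarantees both recursive subproblems are strictly smaller, so the recursion terminates even when one half is much longer than the other.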
Correctness
The recursive sort step produces two sorted halves. The recursive merge partitions the two halves by rank: all elements in the lower subproblem are less than or equal to all elements in the upper subproblem. Recursively merging each part and concatenating them yields a sorted range. By induction on range size, the full array is sorted.
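The rank partition invariant can be checked directly on small sorted halves (the values here are illustrative, not from the text):

```python
from bisect import bisect_right

left = [2, 5, 8, 11]   # sorted half (example values)
right = [1, 4, 9, 10]  # sorted half

m = len(left) // 2
pivot = left[m]                 # 8
r = bisect_right(right, pivot)  # rank of 8 in right: 2

lower = left[:m + 1] + right[:r]  # elements bound for the lower merge
upper = left[m + 1:] + right[r:]  # elements bound for the upper merge

# Every element in the lower subproblem is <= every element
# in the upper subproblem, so concatenation preserves order.
assert max(lower) <= min(upper)
```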
Complexity
| metric | value |
|---|---|
| time | O(n log n) |
| extra space | usually O(n) |
| cache behavior | asymptotically efficient |
| tuning parameters | none |
In the cache oblivious model, good implementations can achieve near optimal cache complexity for sorting; refined variants such as funnelsort are asymptotically optimal.
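For reference, in the ideal-cache model (cache size M, block size B), the simple recursive version satisfies the standard textbook recurrence; this analysis is background, not something derived in this article:

```latex
Q(n) =
\begin{cases}
O(n/B) & \text{if } n \le cM \text{ for some constant } c,\\
2\,Q(n/2) + O(n/B) & \text{otherwise,}
\end{cases}
\qquad\Rightarrow\qquad
Q(n) = O\!\left(\tfrac{n}{B}\log_2 \tfrac{n}{M}\right)
```

The optimal cache complexity for sorting is O((n/B) log_{M/B}(n/B)); the binary-merge version is close to it but does not attain it, which is what funnelsort's multi-way recursive merging fixes.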
Properties
| property | value |
|---|---|
| stable | yes, with stable merge |
| in-place | usually no |
| cache aware | no |
| cache oblivious | yes |
| implementation difficulty | high |
Notes
Cache oblivious merge sort matters when the same algorithm must perform well across several cache levels without manual tuning.
It is more complex than ordinary merge sort. In practical code, tuned cache aware sorts or library sorts often perform better due to lower constants.
Implementation
This implementation keeps the interface simple and uses an ordinary merge at the leaves. It illustrates the sort structure, not a fully optimized cache oblivious merge.
```python
def cache_oblivious_merge_sort(a):
    def sort(lo, hi):
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        sort(lo, mid)
        sort(mid + 1, hi)
        merge(lo, mid, hi)

    def merge(lo, mid, hi):
        left = a[lo:mid + 1]
        right = a[mid + 1:hi + 1]
        i = j = 0
        k = lo
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                a[k] = left[i]
                i += 1
            else:
                a[k] = right[j]
                j += 1
            k += 1
        while i < len(left):
            a[k] = left[i]
            i += 1
            k += 1
        while j < len(right):
            a[k] = right[j]
            j += 1
            k += 1

    sort(0, len(a) - 1)
    return a
```

The same structure in Go:

```go
func CacheObliviousMergeSort(a []int) {
	var sort func(int, int)
	sort = func(lo, hi int) {
		if lo >= hi {
			return
		}
		mid := (lo + hi) / 2
		sort(lo, mid)
		sort(mid+1, hi)
		cacheObliviousMerge(a, lo, mid, hi)
	}
	sort(0, len(a)-1)
}

func cacheObliviousMerge(a []int, lo, mid, hi int) {
	left := append([]int(nil), a[lo:mid+1]...)
	right := append([]int(nil), a[mid+1:hi+1]...)
	i, j, k := 0, 0, lo
	for i < len(left) && j < len(right) {
		if left[i] <= right[j] {
			a[k] = left[i]
			i++
		} else {
			a[k] = right[j]
			j++
		}
		k++
	}
	for i < len(left) {
		a[k] = left[i]
		i++
		k++
	}
	for j < len(right) {
		a[k] = right[j]
		j++
		k++
	}
}
```