Merge sort variant that improves locality without tuning parameters for a specific cache size.
Cache oblivious merge sort is a merge sort variant designed for memory hierarchies. It improves locality without knowing the cache size, block size, or number of cache levels.
The algorithm's recursive structure ensures that subproblems eventually shrink to fit in cache automatically, at every level of the memory hierarchy.
Problem
Given an array A of length n, reorder it such that:

    A[0] <= A[1] <= ... <= A[n-1]

while reducing cache misses across unknown memory hierarchies.
Idea
A cache aware algorithm uses parameters such as block size or cache size. A cache oblivious algorithm avoids such parameters.
Cache oblivious merge sort relies on divide and conquer:
- small recursive subproblems eventually fit in cache
- recursive merging improves locality
- no machine specific threshold is required for correctness
The key challenge is merging. A cache oblivious merge divides the merge problem recursively instead of scanning large regions in one flat pass.
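The binary-search step at the heart of the recursive merge can be seen in isolation. A minimal Python sketch, with example values of my own:

```python
from bisect import bisect_right

# Two sorted halves (illustrative values, not from the text).
left = [2, 5, 8, 11]
right = [1, 4, 9, 10]

# Take the middle element of one half and find its rank in the
# other half by binary search; this rank splits the merge into
# two independent subproblems.
pivot = left[len(left) // 2]       # pivot = 8
rank = bisect_right(right, pivot)  # number of elements in right <= 8
print(rank)  # → 2: merge left[:3] with right[:2], left[3:] with right[2:]
```

Each recursive call then works on a contiguous region of each half, which is what improves locality.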
Algorithm
```
cache_oblivious_merge_sort(A, lo, hi):
    if lo >= hi:
        return
    mid = (lo + hi) // 2
    cache_oblivious_merge_sort(A, lo, mid)
    cache_oblivious_merge_sort(A, mid + 1, hi)
    cache_oblivious_merge(A, lo, mid, hi)
```

A recursive merge splits one sorted range and binary searches into the other.

```
cache_oblivious_merge(A, lo, mid, hi):
    if range is small:
        ordinary_merge(A, lo, mid, hi)
        return
    choose middle element from larger half
    find its rank in the other half by binary search
    recursively merge lower parts
    recursively merge upper parts
```

Example
Let:

    A = [5, 2, 11, 8, 1, 10, 4, 9]

The two halves are sorted:

    [2, 5, 8, 11]   [1, 4, 9, 10]

A recursive merge may choose 8 from the left half and locate its position in the right half: 8 falls between 4 and 9, so [2, 5, 8] is merged with [1, 4] and [11] is merged with [9, 10].

This splits the merge into smaller independent merge problems.

Final result:

    A = [1, 2, 4, 5, 8, 9, 10, 11]
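The whole procedure can be made concrete. This is a hedged sketch in Python, not the article's reference code: the recursive merge splits on the middle of the larger half, and falls back to an ordinary two-pointer merge on small ranges.

```python
from bisect import bisect_right

def co_merge_sort(a):
    """Sketch of cache oblivious merge sort with a recursive merge."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return co_merge(co_merge_sort(a[:mid]), co_merge_sort(a[mid:]))

def co_merge(left, right, cutoff=8):
    # Base case: small total size, do an ordinary merge.
    if len(left) + len(right) <= cutoff:
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]
    # Split on the middle element of the larger half; binary search
    # its rank in the other half, then merge the two sides separately.
    if len(left) >= len(right):
        m = len(left) // 2
        r = bisect_right(right, left[m])
        return (co_merge(left[:m + 1], right[:r], cutoff) +
                co_merge(left[m + 1:], right[r:], cutoff))
    m = len(right) // 2
    r = bisect_right(left, right[m])
    return (co_merge(left[:r], right[:m + 1], cutoff) +
            co_merge(left[r:], right[m + 1:], cutoff))

print(co_merge_sort([5, 2, 11, 8, 1, 10, 4, 9]))
# → [1, 2, 4, 5, 8, 9, 10, 11]
```

Splitting on the larger half matters: it guarantees both recursive subproblems are strictly smaller, so the recursion terminates even when one half is much longer than the other.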
Correctness
The recursive sort step produces two sorted halves. The recursive merge partitions the two halves by rank: all elements in the lower subproblem are less than or equal to all elements in the upper subproblem. Recursively merging each part and concatenating them yields a sorted range. By induction on range size, the full array is sorted.
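The rank partition invariant can be checked directly on small sorted halves (the values here are illustrative, not from the text):

```python
from bisect import bisect_right

left = [2, 5, 8, 11]   # sorted half (example values)
right = [1, 4, 9, 10]  # sorted half

m = len(left) // 2
pivot = left[m]                 # 8
r = bisect_right(right, pivot)  # rank of 8 in right: 2

lower = left[:m + 1] + right[:r]  # elements bound for the lower merge
upper = left[m + 1:] + right[r:]  # elements bound for the upper merge

# Every element in the lower subproblem is <= every element
# in the upper subproblem, so concatenation preserves order.
assert max(lower) <= min(upper)
```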
Complexity
| metric | value |
|---|---|
| time | O(n log n) |
| extra space | usually O(n) |
| cache behavior | asymptotically efficient |
| tuning parameters | none |
In the cache oblivious model, good implementations can achieve near optimal cache complexity for sorting; refined variants such as funnelsort are asymptotically optimal.
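For reference, in the ideal-cache model (cache size M, block size B), the simple recursive version satisfies the standard textbook recurrence; this analysis is background, not something derived in this article:

```latex
Q(n) =
\begin{cases}
O(n/B) & \text{if } n \le cM \text{ for some constant } c,\\
2\,Q(n/2) + O(n/B) & \text{otherwise,}
\end{cases}
\qquad\Rightarrow\qquad
Q(n) = O\!\left(\tfrac{n}{B}\log_2 \tfrac{n}{M}\right)
```

The optimal cache complexity for sorting is O((n/B) log_{M/B}(n/B)); the binary-merge version is close to it but does not attain it, which is what funnelsort's multi-way recursive merging fixes.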
Properties
| property | value |
|---|---|
| stable | yes, with stable merge |
| in-place | usually no |
| cache aware | no |
| cache oblivious | yes |
| implementation difficulty | high |
Notes
Cache oblivious merge sort matters when the same algorithm must perform well across several cache levels without manual tuning.
It is more complex than ordinary merge sort. In practical code, tuned cache aware sorts or library sorts often perform better due to lower constants.
Implementation
This implementation keeps the interface simple and uses an ordinary merge at the leaves. It illustrates the sort structure, not a fully optimized cache oblivious merge.
```python
def cache_oblivious_merge_sort(a):
    def sort(lo, hi):
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        sort(lo, mid)
        sort(mid + 1, hi)
        merge(lo, mid, hi)

    def merge(lo, mid, hi):
        left = a[lo:mid + 1]
        right = a[mid + 1:hi + 1]
        i = j = 0
        k = lo
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                a[k] = left[i]
                i += 1
            else:
                a[k] = right[j]
                j += 1
            k += 1
        while i < len(left):
            a[k] = left[i]
            i += 1
            k += 1
        while j < len(right):
            a[k] = right[j]
            j += 1
            k += 1

    sort(0, len(a) - 1)
    return a
```

The same structure in Go:

```go
func CacheObliviousMergeSort(a []int) {
	var sort func(int, int)
	sort = func(lo, hi int) {
		if lo >= hi {
			return
		}
		mid := (lo + hi) / 2
		sort(lo, mid)
		sort(mid+1, hi)
		cacheObliviousMerge(a, lo, mid, hi)
	}
	sort(0, len(a)-1)
}

func cacheObliviousMerge(a []int, lo, mid, hi int) {
	left := append([]int(nil), a[lo:mid+1]...)
	right := append([]int(nil), a[mid+1:hi+1]...)
	i, j, k := 0, 0, lo
	for i < len(left) && j < len(right) {
		if left[i] <= right[j] {
			a[k] = left[i]
			i++
		} else {
			a[k] = right[j]
			j++
		}
		k++
	}
	for i < len(left) {
		a[k] = left[i]
		i++
		k++
	}
	for j < len(right) {
		a[k] = right[j]
		j++
		k++
	}
}
```