Merge sort variant designed to improve cache locality by structuring recursion and merging to fit memory hierarchies.
Cache efficient merge sort improves merge sort by reducing cache misses. It reorganizes recursion and merging so that working data fits into cache levels.
The goal is to minimize memory transfers rather than just comparisons.
Problem
Given an array of length , reorder it such that:
while improving cache locality.
Idea
Standard merge sort performs well in theory but may suffer from poor cache usage due to repeated passes over large memory regions.
Cache efficient variants aim to:
- divide the problem until subarrays fit into cache
- perform merges on contiguous memory blocks
- reduce repeated scanning of large arrays
A common approach uses blocked recursion:
- recursively sort subarrays until they fit into cache
- merge subarrays in a way that maximizes sequential access
Algorithm
cache_merge_sort(A, lo, hi):
if hi - lo <= threshold:
insertion_sort(A, lo, hi)
return
mid = (lo + hi) / 2
cache_merge_sort(A, lo, mid)
cache_merge_sort(A, mid + 1, hi)
cache_merge(A, lo, mid, hi)The merge step is optimized for locality.
cache_merge(A, lo, mid, hi):
copy A[lo..mid] into buffer
i = 0
j = mid + 1
k = lo
while i < len(buffer) and j <= hi:
if buffer[i] <= A[j]:
A[k] = buffer[i]
i += 1
else:
A[k] = A[j]
j += 1
k += 1
copy remaining buffer elements backExample
Let:
Divide recursively until small subarrays fit in cache:
- sort small blocks locally
- merge them sequentially
Final result:
Correctness
Each recursive call sorts a subarray. The merge step combines two sorted subarrays into one sorted array. The cache optimizations do not change ordering logic, so correctness follows directly from merge sort.
Complexity
| metric | value |
|---|---|
| time | |
| cache complexity | improved |
| extra space |
The number of cache misses is reduced compared to naive merge sort.
Properties
| property | value |
|---|---|
| stable | yes |
| in-place | no |
| cache locality | improved |
| sequential access | high |
Notes
Cache efficient merge sort focuses on memory hierarchy rather than asymptotic complexity. It is important for large datasets where memory bandwidth dominates performance.
This idea leads to more advanced designs such as cache oblivious algorithms and external memory sorting.
Implementation
def cache_merge_sort(a):
def sort(lo, hi):
if hi - lo <= 16:
insertion_sort(lo, hi)
return
mid = (lo + hi) // 2
sort(lo, mid)
sort(mid + 1, hi)
merge(lo, mid, hi)
def insertion_sort(lo, hi):
for i in range(lo + 1, hi + 1):
key = a[i]
j = i - 1
while j >= lo and a[j] > key:
a[j + 1] = a[j]
j -= 1
a[j + 1] = key
def merge(lo, mid, hi):
left = a[lo:mid + 1]
i = 0
j = mid + 1
k = lo
while i < len(left) and j <= hi:
if left[i] <= a[j]:
a[k] = left[i]
i += 1
else:
a[k] = a[j]
j += 1
k += 1
while i < len(left):
a[k] = left[i]
i += 1
k += 1
sort(0, len(a) - 1)
return a