Sorting techniques that explicitly use knowledge of cache size and block size to minimize memory hierarchy cost.
Cache aware sorting refers to algorithms that are designed with explicit knowledge of the memory hierarchy, especially cache size and cache line (block) size. The goal is to minimize cache misses by structuring memory access patterns carefully.
This model is more detailed than the standard RAM model and closer to real hardware behavior.
Model
Assume a two-level memory model:
- Fast memory (cache) of size M
- Slow memory (main memory or disk) of unbounded size
- Data is transferred between the two levels in blocks of size B
The cost of an algorithm is measured in number of block transfers (cache misses), not just comparisons.
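To make the cost measure concrete, here is a small sketch that counts block transfers for a given access sequence under an idealized fully associative LRU cache. The function and parameter names (`count_block_transfers`, `cache_blocks`, `block_size`) are illustrative, not from any particular library:

```python
from collections import OrderedDict

def count_block_transfers(accesses, cache_blocks, block_size):
    """Count cache misses (block transfers) for a sequence of addresses,
    assuming an idealized fully associative cache with LRU replacement."""
    cache = OrderedDict()  # block id -> present; order tracks recency
    misses = 0
    for addr in accesses:
        blk = addr // block_size
        if blk in cache:
            cache.move_to_end(blk)  # mark as most recently used
        else:
            misses += 1
            cache[blk] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return misses
```

A sequential scan of 1024 addresses with block size 16 touches 64 distinct blocks and therefore costs 64 transfers regardless of cache size; scattered access patterns can cost up to one transfer per access.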
Goal
Minimize Q(n), the number of block transfers between cache and main memory, instead of only minimizing comparisons.
Core Idea
Cache aware algorithms tune their behavior using M and B:
- process data in blocks that fit in cache
- avoid random memory access
- favor sequential scans
- use tiling and blocking techniques
Example: Blocked Merge Sort
A simple cache aware variant of merge sort uses buffers that fit into cache.
```
cache_aware_merge_sort(A):
    if size(A) <= threshold:
        sort in memory
        return
    split A into A_left and A_right
    cache_aware_merge_sort(A_left)
    cache_aware_merge_sort(A_right)
    merge using buffers sized to fit cache
```

During merging, input streams and output buffers are sized so that:
- all buffers fit in cache
- block transfers are sequential
Example: Blocked Matrix Style Sorting
Data is divided into tiles:
```
for each block of size B:
    load block into cache
    sort block locally
merge sorted blocks using cache-sized buffers
```

I/O Complexity
For comparison-based sorting in the two-level model, the optimal cache complexity is Θ((n/B) · log_{M/B}(n/B)) block transfers.
Cache aware algorithms aim to match this bound by choosing M- and B-dependent parameters explicitly.
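As a rough numeric illustration of the Θ((n/B) · log_{M/B}(n/B)) form, the helper below (an illustrative name, not a library function) evaluates the leading term for chosen parameter values:

```python
import math

def sort_io_bound(n, M, B):
    """Leading term of the optimal sorting bound: (n/B) * log_{M/B}(n/B)."""
    return (n / B) * math.log(n / B, M / B)
```

With n = 2^20 elements, a cache holding M = 2^15 elements, and B = 64 elements per block, this gives 16384 · log_512(16384) ≈ 25,000 block transfers.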
Design Techniques
| technique | purpose |
|---|---|
| blocking | keep working set inside cache |
| tiling | process subproblems that fit cache |
| buffering | reduce repeated memory access |
| sequential access | reduce cache misses |
| prefetching | overlap memory latency |
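The blocking/tiling rows of the table can be illustrated with a tiled matrix transpose, a classic cache-blocking kernel. The `tile` parameter is an illustrative knob, in practice chosen so that a pair of tiles fits in cache:

```python
def transpose_tiled(a, n, tile):
    """Transpose a flat row-major n-by-n matrix using tile-by-tile blocks.

    Each source tile and its destination tile stay resident in cache
    while being processed, so memory traffic is localized per tile
    instead of striding across the whole matrix.
    """
    out = [0] * (n * n)
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            # Transpose one tile; min() handles tiles that overhang the edge.
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    out[j * n + i] = a[i * n + j]
    return out
```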
When to Use
Cache aware sorting is useful when:
- input size is large relative to cache
- memory hierarchy dominates performance
- algorithm runs on predictable hardware
- tuning parameters M and B is feasible
It is common in high-performance libraries and database systems.
Comparison with Cache Oblivious Sorting
| property | cache aware | cache oblivious |
|---|---|---|
| uses M, B explicitly | yes | no |
| portability | lower | higher |
| tuning effort | manual | automatic |
| theoretical optimality | achievable | also achievable |
Cache oblivious algorithms achieve similar bounds without explicit parameters, but cache aware algorithms can perform better in practice when tuned.
Implementation Sketch
```python
import heapq

def blocked_sort(a, block_size):
    n = len(a)
    # Phase 1: sort each cache-sized block independently.
    for i in range(0, n, block_size):
        a[i:i + block_size] = sorted(a[i:i + block_size])
    # Phase 2: k-way merge of the sorted blocks via a min-heap.
    blocks = [a[i:i + block_size] for i in range(0, n, block_size)]
    iters = [iter(b) for b in blocks]
    heap = []
    # Seed the heap with the first element of each non-empty block.
    for i, it in enumerate(iters):
        try:
            heapq.heappush(heap, (next(it), i))
        except StopIteration:
            pass
    result = []
    while heap:
        val, i = heapq.heappop(heap)
        result.append(val)
        # Refill from the block the popped element came from.
        try:
            heapq.heappush(heap, (next(iters[i]), i))
        except StopIteration:
            pass
    return result
```

Notes
Cache aware sorting is about engineering memory access patterns. The comparison logic may stay identical to classical algorithms, but performance improves by reducing costly memory transfers.
Modern CPUs with multiple cache levels make this approach important for high-throughput systems.