Sorting techniques that explicitly use knowledge of cache size and block size to minimize memory hierarchy cost.
Cache aware sorting refers to algorithms that are designed with explicit knowledge of the memory hierarchy, especially cache size and cache line (block) size. The goal is to minimize cache misses by structuring memory access patterns carefully.
This model is more detailed than the standard RAM model and closer to real hardware behavior.
Model
Assume a two-level memory model:
- Fast memory (cache) of size M
- Slow memory (main memory or disk) of unbounded size
- Data is transferred between the two levels in blocks of size B
The cost of an algorithm is measured in number of block transfers (cache misses), not just comparisons.
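To make the cost measure concrete, here is a small sketch that counts block transfers for a given access sequence under an idealized fully associative LRU cache. The function and parameter names (`count_block_transfers`, `cache_blocks`, `block_size`) are illustrative, not from any particular library:

```python
from collections import OrderedDict

def count_block_transfers(accesses, cache_blocks, block_size):
    """Count cache misses (block transfers) for a sequence of addresses,
    assuming an idealized fully associative cache with LRU replacement."""
    cache = OrderedDict()  # block id -> present; order tracks recency
    misses = 0
    for addr in accesses:
        blk = addr // block_size
        if blk in cache:
            cache.move_to_end(blk)  # mark as most recently used
        else:
            misses += 1
            cache[blk] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return misses
```

A sequential scan of 1024 addresses with block size 16 touches 64 distinct blocks and therefore costs 64 transfers regardless of cache size; scattered access patterns can cost up to one transfer per access.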
Goal
Minimize Q(n), the number of block transfers between cache and main memory, instead of only minimizing comparisons.
Core Idea
Cache aware algorithms tune their behavior using M and B:
- process data in blocks that fit in cache
- avoid random memory access
- favor sequential scans
- use tiling and blocking techniques
Example: Blocked Merge Sort
A simple cache aware variant of merge sort uses buffers that fit into cache.
```
cache_aware_merge_sort(A):
    if size(A) <= threshold:
        sort in memory
        return
    split A into A_left and A_right
    cache_aware_merge_sort(A_left)
    cache_aware_merge_sort(A_right)
    merge using buffers sized to fit cache
```

During merging, input streams and output buffers are sized so that:
- all buffers fit in cache
- block transfers are sequential
Example: Blocked Matrix Style Sorting
Data is divided into tiles:
```
for each block of size B:
    load block into cache
    sort block locally
merge sorted blocks using cache-sized buffers
```

I/O Complexity
For comparison-based sorting in the two-level model, the optimal cache complexity is Θ((n/B) · log_{M/B}(n/B)) block transfers.
Cache aware algorithms aim to match this bound by choosing M- and B-dependent parameters explicitly.
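As a rough numeric illustration of the Θ((n/B) · log_{M/B}(n/B)) form, the helper below (an illustrative name, not a library function) evaluates the leading term for chosen parameter values:

```python
import math

def sort_io_bound(n, M, B):
    """Leading term of the optimal sorting bound: (n/B) * log_{M/B}(n/B)."""
    return (n / B) * math.log(n / B, M / B)
```

With n = 2^20 elements, a cache holding M = 2^15 elements, and B = 64 elements per block, this gives 16384 · log_512(16384) ≈ 25,000 block transfers.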
Design Techniques
| technique | purpose |
|---|---|
| blocking | keep working set inside cache |
| tiling | process subproblems that fit cache |
| buffering | reduce repeated memory access |
| sequential access | reduce cache misses |
| prefetching | overlap memory latency |
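The blocking/tiling rows of the table can be illustrated with a tiled matrix transpose, a classic cache-blocking kernel. The `tile` parameter is an illustrative knob, in practice chosen so that a pair of tiles fits in cache:

```python
def transpose_tiled(a, n, tile):
    """Transpose a flat row-major n-by-n matrix using tile-by-tile blocks.

    Each source tile and its destination tile stay resident in cache
    while being processed, so memory traffic is localized per tile
    instead of striding across the whole matrix.
    """
    out = [0] * (n * n)
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            # Transpose one tile; min() handles tiles that overhang the edge.
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    out[j * n + i] = a[i * n + j]
    return out
```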
When to Use
Cache aware sorting is useful when:
- input size is large relative to cache
- memory hierarchy dominates performance
- algorithm runs on predictable hardware
- tuning parameters M and B is feasible
It is common in high-performance libraries and database systems.
Comparison with Cache Oblivious Sorting
| property | cache aware | cache oblivious |
|---|---|---|
| uses M, B explicitly | yes | no |
| portability | lower | higher |
| tuning effort | manual | automatic |
| theoretical optimality | achievable | also achievable |
Cache oblivious algorithms achieve similar bounds without explicit parameters, but cache aware algorithms can perform better in practice when tuned.
Implementation Sketch
```python
import heapq

def blocked_sort(a, block_size):
    n = len(a)
    # Phase 1: sort each cache-sized block independently.
    for i in range(0, n, block_size):
        a[i:i + block_size] = sorted(a[i:i + block_size])
    # Phase 2: k-way merge of the sorted blocks via a min-heap.
    blocks = [a[i:i + block_size] for i in range(0, n, block_size)]
    iters = [iter(b) for b in blocks]
    heap = []
    # Seed the heap with the first element of each non-empty block.
    for i, it in enumerate(iters):
        try:
            heapq.heappush(heap, (next(it), i))
        except StopIteration:
            pass
    result = []
    while heap:
        val, i = heapq.heappop(heap)
        result.append(val)
        # Refill from the block the popped element came from.
        try:
            heapq.heappush(heap, (next(iters[i]), i))
        except StopIteration:
            pass
    return result
```

Notes
Cache aware sorting is about engineering memory access patterns. The comparison logic may stay identical to classical algorithms, but performance improves by reducing costly memory transfers.
Modern CPUs with multiple cache levels make this approach important for high-throughput systems.