Cache Aware Sorting

Sorting techniques that explicitly use knowledge of cache size and block size to minimize memory hierarchy cost.

Cache aware sorting refers to algorithms that are designed with explicit knowledge of the memory hierarchy, especially cache size and cache line (block) size. The goal is to minimize cache misses by structuring memory access patterns carefully.

This model is more detailed than the standard RAM model and closer to real hardware behavior.

Model

Assume a two-level memory model:

  • Fast memory (cache) of size M
  • Slow memory (main memory or disk)
  • Data is transferred in blocks of size B

The cost of an algorithm is measured in number of block transfers (cache misses), not just comparisons.
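As a simple illustration of this cost measure, one sequential scan of N contiguous elements touches each block exactly once, so it costs about ⌈N/B⌉ transfers. The helper below is a hypothetical sketch of that calculation, not a standard API:

```python
import math

def scan_transfers(n, b):
    """Block transfers for one sequential scan of n elements,
    assuming b elements per block (simplified two-level model)."""
    return math.ceil(n / b)

# A sequential scan touches each block exactly once.
print(scan_transfers(1_000_000, 64))  # 15625
```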

Goal

Minimize:

\text{cache misses} \approx \text{block transfers}

instead of only minimizing comparisons.

Core Idea

Cache aware algorithms tune their behavior using M and B:

  • process data in blocks that fit in cache
  • avoid random memory access
  • favor sequential scans
  • use tiling and blocking techniques
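The difference between sequential and scattered access can be made concrete by counting how many distinct cache lines an access pattern touches. The sketch below assumes b consecutive elements share one line; `lines_touched` is an illustrative helper, not a standard API:

```python
def lines_touched(indices, b):
    """Distinct cache lines touched by an access pattern,
    assuming b consecutive elements share one line."""
    return len({i // b for i in indices})

n, b = 1024, 16
sequential = lines_touched(range(n), b)         # one line per b elements
strided = lines_touched(range(0, n * b, b), b)  # one line per element
print(sequential, strided)  # 64 1024
```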

Example: Blocked Merge Sort

A simple cache aware variant of merge sort uses buffers that fit into cache.

cache_aware_merge_sort(A):
    if size(A) <= threshold:
        sort in memory
        return

    split A into A_left and A_right

    cache_aware_merge_sort(A_left)
    cache_aware_merge_sort(A_right)

    merge using buffers sized to fit cache

During merging, input streams and output buffers are sized so that:

  • all buffers fit in cache
  • block transfers are sequential
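The pseudocode above can be sketched as runnable Python. The threshold is a placeholder for a value tuned to the actual cache size, and the merge here simply scans both halves sequentially rather than managing explicit hardware buffers:

```python
def cache_aware_merge_sort(a, threshold=4096):
    # Base case: the run fits in fast memory, so sort it there.
    if len(a) <= threshold:
        return sorted(a)

    mid = len(a) // 2
    left = cache_aware_merge_sort(a[:mid], threshold)
    right = cache_aware_merge_sort(a[mid:], threshold)

    # Sequential two-way merge: each input is scanned once front to back,
    # and output is appended in order, so block transfers stay sequential.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out
```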

Example: Blocked Matrix Style Sorting

Data is divided into tiles:

for each block of size B:
    load block into cache
    sort block locally

merge sorted blocks using cache-sized buffers

I/O Complexity

For comparison-based sorting, the optimal cache complexity is:

O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)

Cache aware algorithms aim to match this bound by choosing parameters explicitly.
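As a rough worked example (parameters chosen purely for illustration), the leading term of this bound can be evaluated directly. Note that the logarithm base M/B, rather than 2, is what makes multiway merging attractive:

```python
import math

def sorting_io_bound(n, m, b):
    """Leading term of the optimal cache complexity: (n/b) * log_{m/b}(n/b)."""
    return (n / b) * math.log(n / b, m / b)

# Illustrative parameters: n = 2**26 keys, cache m = 2**18, block b = 2**6.
# Then n/b = 2**20 blocks and the log base is m/b = 2**12, giving 20/12
# "passes" over the data instead of the 20 a binary merge sort would need.
print(sorting_io_bound(2**26, 2**18, 2**6))
```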

Design Techniques

| technique | purpose |
| --- | --- |
| blocking | keep working set inside cache |
| tiling | process subproblems that fit in cache |
| buffering | reduce repeated memory access |
| sequential access | reduce cache misses |
| prefetching | overlap memory latency |

When to Use

Cache aware sorting is useful when:

  • input size is large relative to cache
  • memory hierarchy dominates performance
  • algorithm runs on predictable hardware
  • tuning parameters M and B is feasible

It is common in high-performance libraries and database systems.

Comparison with Cache Oblivious Sorting

| property | cache aware | cache oblivious |
| --- | --- | --- |
| uses M, B explicitly | yes | no |
| portability | lower | higher |
| tuning effort | manual | automatic |
| theoretical optimality | achievable | also achievable |

Cache oblivious algorithms achieve similar bounds without explicit parameters, but cache aware algorithms can perform better in practice when tuned.

Implementation Sketch

import heapq

def blocked_sort(a, block_size):
    n = len(a)

    # Phase 1: sort each cache-sized block independently.
    for i in range(0, n, block_size):
        a[i:i + block_size] = sorted(a[i:i + block_size])

    # Phase 2: k-way merge of the sorted blocks via a min-heap.
    blocks = [a[i:i + block_size] for i in range(0, n, block_size)]
    iters = [iter(b) for b in blocks]

    # Seed the heap with the first element of each block; the block
    # index i breaks ties and identifies which stream to refill from.
    heap = []
    for i, it in enumerate(iters):
        try:
            heapq.heappush(heap, (next(it), i))
        except StopIteration:
            pass

    result = []
    while heap:
        val, i = heapq.heappop(heap)
        result.append(val)
        # Refill from the block that produced the smallest element.
        try:
            heapq.heappush(heap, (next(iters[i]), i))
        except StopIteration:
            pass

    return result

# usage: blocked_sort([5, 2, 8, 1], block_size=2) -> [1, 2, 5, 8]

Notes

Cache aware sorting is about engineering memory access patterns. The comparison logic may stay identical to classical algorithms, but performance improves by reducing costly memory transfers.

Modern CPUs with multiple cache levels make this approach important for high-throughput systems.