Skip to content

Cache Oblivious Sorting

Sorting techniques that achieve good cache behavior without explicitly knowing cache size or block size.

Cache oblivious sorting uses recursive structure to obtain efficient memory hierarchy behavior without knowing the cache size MM or block size BB. The algorithm is written as if memory were flat, but its recursive subproblems eventually fit into every cache level.

This approach is useful when the same implementation should perform well across different machines and storage hierarchies.

Model

The cache oblivious model assumes:

  • memory is transferred in blocks of size BB
  • cache has capacity MM
  • the algorithm does not know MM or BB

The algorithm is analyzed by counting block transfers.

Core Idea

A cache oblivious sorting algorithm repeatedly divides the problem into smaller subproblems. Once a subproblem becomes small enough to fit in cache, it can be processed with few cache misses.

The implementation does not test for cache size. The analysis only proves that such a point exists.

Example: Recursive Merge Sort

A basic recursive merge sort has some cache-oblivious behavior because it divides the array recursively.

cache_oblivious_merge_sort(A):
    if length(A) <= 1:
        return A

    mid = length(A) / 2

    L = cache_oblivious_merge_sort(A[0:mid])
    R = cache_oblivious_merge_sort(A[mid:length(A)])

    return merge(L, R)

However, ordinary merge sort alone may not achieve the optimal cache-oblivious sorting bound unless the merge procedure is also cache efficient.

Optimal Cache Bound

Cache oblivious sorting aims for the external-memory sorting bound:

O(NBlogMBNB) O\left(\frac{N}{B} \log_{\frac{M}{B}} \frac{N}{B}\right)

This is the number of block transfers needed by an optimal comparison sort in the cache model.

Funnel Sort

Funnel sort is the classic cache oblivious sorting algorithm. It recursively sorts parts of the input, then merges them using a cache-oblivious merger called a funnel.

At a high level:

funnel_sort(A):
    split A into subarrays
    recursively funnel_sort each subarray
    merge sorted subarrays using a funnel merger

The funnel merger is designed so that data moves through recursive buffers efficiently across all cache levels.

Correctness

Cache oblivious sorting uses the same correctness structure as merge sort.

Each base case of size 0 or 1 is sorted. Each recursive call returns a sorted subarray by induction. The merge step combines sorted subarrays into one sorted output. Therefore the final array is sorted.

The cache-oblivious property affects performance, not the logical ordering rule.

Complexity

measurebound
comparisonsO(NlogN)O(N \log N)
cache missesO(NBlogMBNB)O\left(\frac{N}{B} \log_{\frac{M}{B}} \frac{N}{B}\right)
extra spacedepends on implementation

The cache-miss bound assumes the tall-cache condition, commonly written as:

M=Ω(B2) M = \Omega(B^2)

Cache Aware vs Cache Oblivious

propertycache aware sortingcache oblivious sorting
knows cache size MMyesno
knows block size BByesno
tuninghardware-specificportable
methodblocking by parametersrecursive locality
performance goalfewer block transfersfewer block transfers

When to Use

Use cache oblivious sorting when:

  • hardware parameters are unknown
  • the same code must run across many machines
  • multiple cache levels matter
  • recursive algorithms are acceptable
  • asymptotic cache behavior matters

For highly tuned production systems, cache aware implementations may still win by using platform-specific block sizes, vectorization, and prefetching.

Implementation Sketch

This is a simplified recursive sort. It shows the structure but not a full optimal funnel sort.

def cache_oblivious_sort(a):
    if len(a) <= 1:
        return a

    mid = len(a) // 2

    left = cache_oblivious_sort(a[:mid])
    right = cache_oblivious_sort(a[mid:])

    return merge(left, right)
def merge(left, right):
    i = 0
    j = 0
    out = []

    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1

    out.extend(left[i:])
    out.extend(right[j:])

    return out

Notes

Cache oblivious sorting separates algorithm design from hardware tuning. The recursive layout gives locality automatically as subproblems shrink. The strongest theoretical result comes from funnel sort, while practical implementations often combine cache-oblivious recursion with cache-aware engineering.