Skip to content

Cache Aware Binary Search

Arrange and search sorted data with explicit knowledge of cache lines or memory blocks.

Cache aware binary search modifies ordinary binary search to account for the memory hierarchy. The algorithm still uses ordering and comparisons, but the data layout or search strategy is chosen with explicit knowledge of cache line size, block size, or page size.

The goal is to reduce cache misses. On modern hardware, a comparison is cheap, while an unpredictable memory load can be expensive.

Problem

Given a sorted collection AA and a target value xx, find the first index ii such that

A[i]x A[i] \ge x

or report that no such index exists.

Idea

Ordinary binary search jumps to the middle of the current range. These jumps can touch distant memory locations, especially for large arrays.

A cache aware version divides the data into blocks that match the cache or storage block size. It first searches among block representatives, then searches inside one block.

For block size BB, store:

  • a top level array of block maximums or separator keys
  • contiguous sorted blocks of up to BB elements

Algorithm

cache_aware_binary_search(blocks, separators, x):
    b = lower_bound(separators, x)

    if b == length(blocks):
        return -1

    j = lower_bound(blocks[b], x)

    if j == length(blocks[b]):
        return -1

    return global_index(b, j)

The first search finds the block that may contain the answer. The second search runs within a cache sized block.

Example

Let the sorted values be split into blocks of size 44:

blockvaluesseparator
0[3, 6, 9, 12]12
1[15, 18, 21, 24]24
2[27, 30, 33, 36]36

Search for:

x=20 x = 20

First search the separators:

separator indexvaluerelation
124candidate block

Then search block 1:

local indexvaluerelation
015< x
118< x
221>= x

The lower bound is the global index:

4+2=6 4 + 2 = 6

Correctness

The separator for each block is the maximum value in that block. The first lower bound over separators finds the first block whose maximum is at least xx. All earlier blocks contain only values smaller than xx, so they cannot contain the lower bound.

The second lower bound inside that block finds the first value greater than or equal to xx within the only possible block. Therefore the combined result is the global lower bound.

Complexity

Let nn be the total number of elements and BB be the block size.

parttime
search separatorsO(log(n/B))O(\log(n/B))
search inside blockO(logB)O(\log B)
total comparisonsO(logn)O(\log n)

Memory transfers are often lower than ordinary binary search because the second phase stays inside one block.

Space usage is:

O(n) O(n)

including the data and separators.

When to Use

Use cache aware binary search when:

  • the array is large
  • search throughput matters
  • the block size is known
  • the data is mostly static
  • memory locality dominates comparison cost

This technique is common in databases, file indexes, B trees, and sorted run search.

Implementation

from bisect import bisect_left

def build_cache_aware_blocks(a, block_size):
    blocks = []
    separators = []

    for i in range(0, len(a), block_size):
        block = a[i:i + block_size]
        blocks.append(block)
        separators.append(block[-1])

    return blocks, separators

def cache_aware_binary_search(blocks, separators, x):
    b = bisect_left(separators, x)

    if b == len(blocks):
        return -1

    j = bisect_left(blocks[b], x)

    if j == len(blocks[b]):
        return -1

    return sum(len(block) for block in blocks[:b]) + j
func lowerBound(a []int, x int) int {
    left, right := 0, len(a)

    for left < right {
        mid := (left + right) / 2
        if a[mid] < x {
            left = mid + 1
        } else {
            right = mid
        }
    }

    return left
}

func CacheAwareBinarySearch(blocks [][]int, separators []int, x int) int {
    b := lowerBound(separators, x)
    if b == len(blocks) {
        return -1
    }

    j := lowerBound(blocks[b], x)
    if j == len(blocks[b]) {
        return -1
    }

    offset := 0
    for i := 0; i < b; i++ {
        offset += len(blocks[i])
    }

    return offset + j
}