Cache Aware Binary Search

Cache aware binary search modifies ordinary binary search to account for the memory hierarchy. The algorithm still uses ordering and comparisons, but the data layout or search strategy is chosen with explicit knowledge of cache line size, block size, or page size.

The goal is to reduce cache misses. On modern hardware, a comparison is cheap, while an unpredictable memory load can be expensive.

Problem

Given a sorted collection $A$ and a target value $x$ , find the first index $i$ such that

A[i] \ge x

or report that no such index exists.

Idea

Ordinary binary search jumps to the middle of the current range. These jumps can touch distant memory locations, especially for large arrays.

A cache aware version divides the data into blocks that match the cache or storage block size. It first searches among block representatives, then searches inside one block.

For block size $B$ , store:

a top level array of block maximums or separator keys
contiguous sorted blocks of up to $B$ elements

Algorithm

cache_aware_binary_search(blocks, separators, x):
    b = lower_bound(separators, x)

    if b == length(blocks):
        return -1

    j = lower_bound(blocks[b], x)

    if j == length(blocks[b]):
        return -1

    return global_index(b, j)

The first search finds the block that may contain the answer. The second search runs within a cache sized block.

Example

Let the sorted values be split into blocks of size $4$ :

block	values	separator
0	[3, 6, 9, 12]	12
1	[15, 18, 21, 24]	24
2	[27, 30, 33, 36]	36

Search for:

x = 20

First search the separators:

separator index	value	relation
1	24	candidate block

Then search block 1:

local index	value	relation
0	15	< x
1	18	< x
2	21	>= x

The lower bound is the global index:

4 + 2 = 6

Correctness

The separator for each block is the maximum value in that block. The first lower bound over separators finds the first block whose maximum is at least $x$ . All earlier blocks contain only values smaller than $x$ , so they cannot contain the lower bound.

The second lower bound inside that block finds the first value greater than or equal to $x$ within the only possible block. Therefore the combined result is the global lower bound.

Complexity

Let $n$ be the total number of elements and $B$ be the block size.

part	time
search separators	$O(\log(n/B))$
search inside block	$O(\log B)$
total comparisons	$O(\log n)$

Memory transfers are often lower than ordinary binary search because the second phase stays inside one block.

Space usage is:

O(n)

including the data and separators.

When to Use

Use cache aware binary search when:

the array is large
search throughput matters
the block size is known
the data is mostly static
memory locality dominates comparison cost

This technique is common in databases, file indexes, B trees, and sorted run search.

Implementation

from bisect import bisect_left

def build_cache_aware_blocks(a, block_size):
    blocks = []
    separators = []

    for i in range(0, len(a), block_size):
        block = a[i:i + block_size]
        blocks.append(block)
        separators.append(block[-1])

    return blocks, separators

def cache_aware_binary_search(blocks, separators, x):
    b = bisect_left(separators, x)

    if b == len(blocks):
        return -1

    j = bisect_left(blocks[b], x)

    if j == len(blocks[b]):
        return -1

    return sum(len(block) for block in blocks[:b]) + j

func lowerBound(a []int, x int) int {
    left, right := 0, len(a)

    for left < right {
        mid := (left + right) / 2
        if a[mid] < x {
            left = mid + 1
        } else {
            right = mid
        }
    }

    return left
}

func CacheAwareBinarySearch(blocks [][]int, separators []int, x int) int {
    b := lowerBound(separators, x)
    if b == len(blocks) {
        return -1
    }

    j := lowerBound(blocks[b], x)
    if j == len(blocks[b]) {
        return -1
    }

    offset := 0
    for i := 0; i < b; i++ {
        offset += len(blocks[i])
    }

    return offset + j
}