Arrange and search sorted data with explicit knowledge of cache lines or memory blocks.
Cache-aware binary search modifies ordinary binary search to account for the memory hierarchy. The algorithm still relies on ordering and comparisons, but the data layout and search strategy are chosen with explicit knowledge of the cache line size, block size, or page size.
The goal is to reduce cache misses. On modern hardware, a comparison is cheap, while an unpredictable memory load can be expensive.
## Problem

Given a sorted collection `a` and a target value `x`, find the first index `i` such that `a[i] >= x`, or report that no such index exists.
## Idea

Ordinary binary search jumps to the middle of the current range. These jumps can touch distant memory locations, especially for large arrays.

A cache-aware version divides the data into blocks that match the cache or storage block size. It first searches among block representatives, then searches inside a single block.
For block size `B`, store:
- a top-level array of block maximums (separator keys)
- contiguous sorted blocks of up to `B` elements each
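As a minimal sketch, this layout can be built in a few lines of Python; the array and `B = 4` match the example that follows:

```python
# Split a sorted array into contiguous blocks of size B and record
# each block's maximum as its separator key.
def build_blocks(a, B):
    blocks = [a[i:i + B] for i in range(0, len(a), B)]
    separators = [block[-1] for block in blocks]
    return blocks, separators

blocks, separators = build_blocks([3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36], 4)
print(blocks)      # [[3, 6, 9, 12], [15, 18, 21, 24], [27, 30, 33, 36]]
print(separators)  # [12, 24, 36]
```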
## Algorithm

```text
cache_aware_binary_search(blocks, separators, x):
    b = lower_bound(separators, x)
    if b == length(blocks):
        return -1
    j = lower_bound(blocks[b], x)
    if j == length(blocks[b]):
        return -1
    return global_index(b, j)
```

The first search finds the block that may contain the answer. The second search runs within a cache-sized block.
## Example

Let the sorted values be split into blocks of size `B = 4`:
| block | values | separator |
|---|---|---|
| 0 | [3, 6, 9, 12] | 12 |
| 1 | [15, 18, 21, 24] | 24 |
| 2 | [27, 30, 33, 36] | 36 |
Search for `x = 21`.
First search the separators:
| separator index | value | relation |
|---|---|---|
| 0 | 12 | < x |
| 1 | 24 | >= x, candidate block |
Then search block 1:
| local index | value | relation |
|---|---|---|
| 0 | 15 | < x |
| 1 | 18 | < x |
| 2 | 21 | >= x |
The lower bound is at global index `4 + 2 = 6`, the value `21`.
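The trace above can be checked with Python's `bisect` module (using `x = 21`, consistent with the relations in the tables):

```python
from bisect import bisect_left  # bisect_left computes a lower bound

separators = [12, 24, 36]
block1 = [15, 18, 21, 24]

b = bisect_left(separators, 21)  # first separator >= 21 -> block 1
j = bisect_left(block1, 21)      # first value >= 21 inside the block -> 2
print(b, j, b * 4 + j)           # 1 2 6
```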
## Correctness

The separator for each block is the maximum value in that block. The first lower bound over the separators finds the first block whose maximum is at least `x`. All earlier blocks contain only values smaller than `x`, so they cannot contain the lower bound.

The second lower bound finds the first value greater than or equal to `x` within the only block that can contain it. The combined result is therefore the global lower bound.
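One way to sanity-check this argument is a small property test comparing the block search against a plain lower bound on the flat array (a sketch, not part of the original text):

```python
from bisect import bisect_left
import random

def build(a, B):
    blocks = [a[i:i + B] for i in range(0, len(a), B)]
    return blocks, [blk[-1] for blk in blocks]

def search(blocks, separators, x):
    b = bisect_left(separators, x)
    if b == len(blocks):
        return -1  # every value is smaller than x
    return sum(len(blk) for blk in blocks[:b]) + bisect_left(blocks[b], x)

random.seed(1)
for _ in range(200):
    a = sorted(random.randrange(100) for _ in range(random.randrange(1, 40)))
    blocks, seps = build(a, 4)
    for x in range(-1, 101):
        expected = bisect_left(a, x)
        got = search(blocks, seps, x)
        assert got == (expected if expected < len(a) else -1)
print("ok")
```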
## Complexity

Let `n` be the total number of elements and `B` the block size.

| part | time |
|---|---|
| search separators | O(log(n / B)) |
| search inside block | O(log B) |
| total comparisons | O(log n) |

Memory transfers are often fewer than with ordinary binary search because the second phase stays inside one block.

Space usage is O(n + n / B), including the data and the separators.
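The two logarithms compose, log(n / B) + log B = log n, so the comparison count matches ordinary binary search while the second phase stays block-local. A quick numeric check with illustrative sizes:

```python
import math

n, B = 1 << 20, 64
# Separator phase plus in-block phase equals one flat binary search.
assert math.log2(n / B) + math.log2(B) == math.log2(n)
print(math.log2(n // B), math.log2(B), math.log2(n))  # 14.0 6.0 20.0
```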
## When to Use
Use cache aware binary search when:
- the array is large
- search throughput matters
- the block size is known
- the data is mostly static
- memory locality dominates comparison cost
This technique is common in databases, file indexes, B-trees, and sorted-run search.
## Implementation

Python:

```python
from bisect import bisect_left

def build_cache_aware_blocks(a, block_size):
    blocks = []
    separators = []
    for i in range(0, len(a), block_size):
        block = a[i:i + block_size]
        blocks.append(block)
        separators.append(block[-1])  # block maximum is the separator key
    return blocks, separators

def cache_aware_binary_search(blocks, separators, x):
    b = bisect_left(separators, x)  # first block whose maximum is >= x
    if b == len(blocks):
        return -1  # every value is smaller than x
    j = bisect_left(blocks[b], x)  # lower bound inside that block
    if j == len(blocks[b]):
        return -1
    return sum(len(block) for block in blocks[:b]) + j
```

Go:

```go
// lowerBound returns the first index i with a[i] >= x, or len(a).
func lowerBound(a []int, x int) int {
	left, right := 0, len(a)
	for left < right {
		mid := (left + right) / 2
		if a[mid] < x {
			left = mid + 1
		} else {
			right = mid
		}
	}
	return left
}

func CacheAwareBinarySearch(blocks [][]int, separators []int, x int) int {
	b := lowerBound(separators, x) // first block whose maximum is >= x
	if b == len(blocks) {
		return -1
	}
	j := lowerBound(blocks[b], x) // lower bound inside that block
	if j == len(blocks[b]) {
		return -1
	}
	// Global index = elements in earlier blocks + local index.
	offset := 0
	for i := 0; i < b; i++ {
		offset += len(blocks[i])
	}
	return offset + j
}
```
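A usage sketch of the Python implementation above, restated so the snippet runs on its own, including a miss that returns -1:

```python
from bisect import bisect_left

def build_cache_aware_blocks(a, block_size):
    blocks, separators = [], []
    for i in range(0, len(a), block_size):
        block = a[i:i + block_size]
        blocks.append(block)
        separators.append(block[-1])
    return blocks, separators

def cache_aware_binary_search(blocks, separators, x):
    b = bisect_left(separators, x)
    if b == len(blocks):
        return -1
    j = bisect_left(blocks[b], x)
    return sum(len(block) for block in blocks[:b]) + j

a = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36]
blocks, seps = build_cache_aware_blocks(a, 4)
print(cache_aware_binary_search(blocks, seps, 21))   # 6  (a[6] == 21)
print(cache_aware_binary_search(blocks, seps, 22))   # 7  (first value >= 22 is 24)
print(cache_aware_binary_search(blocks, seps, 100))  # -1 (no value >= 100)
```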