Learned Index Search

Use a predictive model to estimate where a key should appear, then search within a bounded local range.

Learned index search treats an ordered index as a prediction problem. Given a key, a model estimates the position where that key should appear in sorted data. The algorithm then corrects the estimate by searching a small range around the predicted position.

The main idea is that a sorted array defines a cumulative distribution function from key to position. If this function is predictable, a model can replace part of the traditional index.
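For uniformly spaced keys this key-to-position function is exactly linear, so even a one-line model predicts ranks perfectly. A minimal sketch (the array and model here are illustrative assumptions, not part of any library):

```python
# Sorted keys spaced 10 apart: the position of key x is exactly (x - 10) / 10.
a = [10, 20, 30, 40, 50, 60, 70, 80]

# A tiny "CDF" model mapping a key to its approximate rank.
def model(x):
    return (x - 10) / 10

# For this uniform data the prediction is exact at every position.
for i, key in enumerate(a):
    assert model(key) == i
```

Real data is rarely this regular, which is why the algorithm pairs the model with an error bound.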

Problem

Given a sorted array A of length n and a target key x, find an index i such that

A[i] = x

or report failure.

Idea

Train or define a model M that predicts the approximate rank of a key:

M(x) ≈ i

where i is the position where x should occur.

Because the prediction may have error, search in a bounded window:

[M(x) - ε, M(x) + ε]

where ε is the maximum known prediction error.

Algorithm

learned_index_search(A, x, M, epsilon):
    p = M(x)

    left = max(0, floor(p) - epsilon)
    right = min(length(A) - 1, floor(p) + epsilon)

    return binary_search(A, x, left, right)

Example

Let

A = [10, 20, 30, 40, 50, 60, 70, 80]

Suppose the model predicts:

M(60) = 5.2

and the error bound is:

ε = 2

Then search only the range:

[3, 7]

index  value
3      40
4      50
5      60
6      70
7      80

Binary search in this range returns index 5.
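The worked example above can be checked directly. The model below simply hard-codes the stated prediction M(60) = 5.2 (an assumption for illustration):

```python
a = [10, 20, 30, 40, 50, 60, 70, 80]
epsilon = 2

def model(x):
    # Hard-coded prediction from the worked example: M(60) = 5.2.
    return 5.2

# Bounded window around the prediction: [floor(5.2) - 2, floor(5.2) + 2] = [3, 7].
p = int(model(60))
left = max(0, p - epsilon)
right = min(len(a) - 1, p + epsilon)

# Standard binary search restricted to [left, right].
target = 60
result = -1
while left <= right:
    mid = (left + right) // 2
    if a[mid] == target:
        result = mid
        break
    elif a[mid] < target:
        left = mid + 1
    else:
        right = mid - 1
```

The first midpoint of [3, 7] is 5, so the search finds a[5] = 60 on its first probe.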

Correctness

The model does not need to be exact. Correctness depends on the error bound. If every true position lies within ε of the predicted position, then the target must be inside the bounded search window whenever it exists.

The final binary search is exact on that window. Therefore, if the key exists and the error bound is valid, the algorithm returns a correct index. If the key is absent, binary search correctly reports failure within the only possible window.
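One way to make the error bound valid by construction is to measure it: once the model is fixed, scan the data and take ε as the largest gap between predicted and true positions. A sketch, using an intentionally imperfect linear model as an assumption:

```python
import math

a = [10, 20, 30, 40, 55, 60, 70, 80]  # not perfectly uniform

def model(x):
    # Illustrative linear model; it mispredicts the position of 55.
    return (x - 10) / 10

# Worst-case prediction error over the data. Using this as epsilon
# guarantees every true position lies inside the search window.
epsilon = math.ceil(max(abs(model(key) - i) for i, key in enumerate(a)))
```

Here model(55) = 4.5 while the true index is 4, so the measured bound is ε = 1.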

Complexity

Let ε be the maximum prediction error.

operation          time
model evaluation   O(1), or model dependent
correction search  O(log ε)
total              O(1 + log ε)

Space depends on the model. A simple linear model uses O(1) space. A recursive or piecewise model may use more.
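A concrete O(1)-space model: fit a single line through the first and last keys. This is a sketch under the assumption that keys are roughly uniform between the minimum and maximum:

```python
a = [10, 20, 30, 40, 50, 60, 70, 80]

# Two-point linear model: maps a[0] -> 0 and a[-1] -> len(a) - 1.
# Only the slope and intercept are stored, so space is O(1).
slope = (len(a) - 1) / (a[-1] - a[0])
intercept = -slope * a[0]

def model(x):
    return slope * x + intercept
```

For skewed distributions a single line can have a large ε, which is why practical learned indexes often use piecewise or hierarchical models instead.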

When to Use

Use learned index search when:

  • keys are sorted
  • query volume is high
  • the key distribution is predictable
  • memory access dominates comparison cost
  • an approximate rank model can be maintained

It is most useful for static or mostly static indexes. If inserts and deletes are frequent, the model must be retrained or updated.

Implementation

def learned_index_search(a, x, model, epsilon):
    if not a:
        return -1

    # Predicted position, truncated to an integer index.
    p = int(model(x))

    # Clamp the error window to the array bounds.
    left = max(0, p - epsilon)
    right = min(len(a) - 1, p + epsilon)

    # Standard binary search, restricted to the window.
    while left <= right:
        mid = (left + right) // 2
        if a[mid] == x:
            return mid
        elif a[mid] < x:
            left = mid + 1
        else:
            right = mid - 1

    return -1

func LearnedIndexSearch(a []int, x int, model func(int) int, epsilon int) int {
    if len(a) == 0 {
        return -1
    }

    // Predicted position from the model.
    p := model(x)

    // Clamp the error window to the slice bounds.
    left := p - epsilon
    if left < 0 {
        left = 0
    }

    right := p + epsilon
    if right >= len(a) {
        right = len(a) - 1
    }

    // Standard binary search, restricted to the window.
    for left <= right {
        mid := left + (right-left)/2 // avoids overflow for very large indices
        if a[mid] == x {
            return mid
        } else if a[mid] < x {
            left = mid + 1
        } else {
            right = mid - 1
        }
    }

    return -1
}
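Putting the pieces together, here is a small end-to-end run in Python: fit a two-point linear model, measure ε from the data, then answer queries with the bounded search. The model and data are illustrative assumptions; the search function mirrors the Python implementation above:

```python
import math

def learned_index_search(a, x, model, epsilon):
    if not a:
        return -1
    p = int(model(x))
    left = max(0, p - epsilon)
    right = min(len(a) - 1, p + epsilon)
    while left <= right:
        mid = (left + right) // 2
        if a[mid] == x:
            return mid
        elif a[mid] < x:
            left = mid + 1
        else:
            right = mid - 1
    return -1

a = [10, 20, 30, 40, 55, 60, 70, 80]

# O(1)-space linear model through the endpoints.
slope = (len(a) - 1) / (a[-1] - a[0])
model = lambda x: slope * (x - a[0])

# Measured worst-case error makes the window bound valid for this data.
epsilon = math.ceil(max(abs(model(k) - i) for i, k in enumerate(a)))

found = learned_index_search(a, 60, model, epsilon)      # present key
missing = learned_index_search(a, 33, model, epsilon)    # absent key
```

With this data the measured bound is ε = 1, so each query probes a window of at most three positions.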