Skip to content

Recursive Model Index Search

Use a hierarchy of learned models to predict a key position, then search inside a bounded error range.

Recursive model index search uses several learned models arranged as a hierarchy. A top model selects a lower model, and the lower model predicts the likely position of a key in sorted data. The final prediction is then corrected by searching inside a bounded local range.

This is the main search procedure behind a recursive model index, often shortened to RMI. It replaces part of a traditional index with prediction.

Problem

Given a sorted array AA of length nn, a target key xx, and a hierarchy of models, find an index ii such that

A[i]=x A[i] = x

or report failure.

Idea

A single model may be too inaccurate for a large or irregular key distribution. An RMI splits the prediction task into stages.

The first model predicts which submodel should handle the key. The selected submodel predicts the final position.

For a two stage RMI:

m=M0(x) m = M_0(x) p=M1[m](x) p = M_1[m](x)

where mm is the selected model and pp is the predicted position.

Because the prediction may be wrong by some bounded error ϵm\epsilon_m, the algorithm searches:

[pϵm,p+ϵm] [p - \epsilon_m, p + \epsilon_m]

Algorithm

recursive_model_index_search(A, x, root_model, leaf_models, errors):
    m = root_model(x)
    m = clamp(floor(m), 0, length(leaf_models) - 1)

    p = leaf_models[m](x)
    p = floor(p)

    epsilon = errors[m]

    left = max(0, p - epsilon)
    right = min(length(A) - 1, p + epsilon)

    return binary_search(A, x, left, right)

Example

Let

A=[10,20,30,40,50,60,70,80] A = [10, 20, 30, 40, 50, 60, 70, 80]

Suppose the root model sends x=70x = 70 to leaf model 11:

M0(70)=1 M_0(70) = 1

The leaf model predicts:

M1=6.3 M_1 = 6.3

If the known error for leaf model 11 is:

ϵ1=1 \epsilon_1 = 1

then search:

[5,7] [5, 7]
indexvalue
560
670
780

Binary search returns index 66.

Correctness

The model hierarchy is only a predictor. Correctness comes from the verified error bound attached to the selected leaf model.

If the true position of xx is guaranteed to lie within ϵm\epsilon_m positions of the leaf prediction, then the bounded search interval contains the target whenever the target exists. Binary search over that interval is exact, so the algorithm returns the correct index or correctly reports failure.

Complexity

Let kk be the number of model stages and let ϵ\epsilon be the correction window size for the selected leaf.

parttime
model traversalO(k)O(k)
correction searchO(logϵ)O(\log \epsilon)
totalO(k+logϵ)O(k + \log \epsilon)

Space depends on the number and size of models.

For a two stage RMI with bb leaf models, space is:

O(b) O(b)

when each model has constant size.

When to Use

Use recursive model index search when:

  • keys are sorted
  • the dataset is large
  • the key distribution has predictable regions
  • query volume is high
  • the index is static or mostly static
  • memory access cost matters

It is less suitable when data changes frequently, because insertions and deletions may invalidate the learned distribution and its error bounds.

Implementation

def binary_search_range(a, x, left, right):
    while left <= right:
        mid = (left + right) // 2
        if a[mid] == x:
            return mid
        if a[mid] < x:
            left = mid + 1
        else:
            right = mid - 1
    return -1

def recursive_model_index_search(a, x, root_model, leaf_models, errors):
    if not a:
        return -1

    m = int(root_model(x))
    m = max(0, min(m, len(leaf_models) - 1))

    p = int(leaf_models[m](x))
    epsilon = errors[m]

    left = max(0, p - epsilon)
    right = min(len(a) - 1, p + epsilon)

    return binary_search_range(a, x, left, right)
func binarySearchRange(a []int, x int, left int, right int) int {
    for left <= right {
        mid := (left + right) / 2
        if a[mid] == x {
            return mid
        }
        if a[mid] < x {
            left = mid + 1
        } else {
            right = mid - 1
        }
    }
    return -1
}

func RecursiveModelIndexSearch(
    a []int,
    x int,
    rootModel func(int) int,
    leafModels []func(int) int,
    errors []int,
) int {
    if len(a) == 0 {
        return -1
    }

    m := rootModel(x)
    if m < 0 {
        m = 0
    }
    if m >= len(leafModels) {
        m = len(leafModels) - 1
    }

    p := leafModels[m](x)
    epsilon := errors[m]

    left := p - epsilon
    if left < 0 {
        left = 0
    }

    right := p + epsilon
    if right >= len(a) {
        right = len(a) - 1
    }

    return binarySearchRange(a, x, left, right)
}