Skip to content

Offline Binary Search

Answer many monotone queries by processing them together after all input is known.

Offline binary search answers many queries after all updates and queries are already known. It is closely related to parallel binary search. The main idea is to avoid solving each query separately by grouping many midpoint checks into shared passes over the data.

This method applies when each query asks for the earliest or smallest point at which a monotone condition becomes true.

Problem

Given:

  • a sequence of updates U0,U1,,UT1U_0, U_1, \dots, U_{T-1}
  • a set of queries q1,q2,,qQq_1, q_2, \dots, q_Q
  • a monotone predicate F(q,t)F(q, t)

Find, for each query qq, the smallest time tt such that:

F(q,t)=true F(q, t) = true

If no such time exists, return a sentinel such as TT or 1-1 depending on the convention.

Key Idea

Each query keeps a search interval. During each round, all active queries are grouped by midpoint. The update sequence is replayed once, and every query whose midpoint is reached is checked against the current data structure state.

This is offline because the algorithm needs all queries before it starts. It can reorder and batch their checks.

Algorithm

offline_binary_search(queries, updates):
    Q = number of queries
    T = number of updates

    for each query q:
        low[q] = 0
        high[q] = T

    while some query has low[q] < high[q]:
        buckets = array of empty lists of length T

        for each query q:
            if low[q] < high[q]:
                mid = (low[q] + high[q]) // 2
                buckets[mid].append(q)

        reset data structure

        for t from 0 to T - 1:
            apply updates[t]

            for each query q in buckets[t]:
                if check(q):
                    high[q] = t
                else:
                    low[q] = t + 1

    return low

Example

Suppose edges are added to a graph over time. Each query asks when two vertices first become connected.

The predicate is:

F(q,t)=vertices of q are connected after updates 0..t F(q, t) = \text{vertices of } q \text{ are connected after updates } 0..t

Once two vertices are connected, they remain connected if edges are only added. Therefore the predicate is monotone.

Offline binary search can answer many such queries by replaying the edge additions once per round, rather than once per query.

Correctness

For each query, the algorithm maintains the invariant that its answer lies in the interval:

[low[q],high[q]] [low[q], high[q]]

At each round, the midpoint midmid is checked under the exact state after update midmid has been applied.

If check(q) is true, monotonicity implies the first true time lies at or before midmid, so the algorithm sets:

high[q] = mid

If check(q) is false, monotonicity implies every time at or before midmid is false, so the algorithm sets:

low[q] = mid + 1

The interval shrinks for every active query. When the interval has length zero, the remaining value is the earliest feasible time.

Complexity

Let:

  • TT be the number of updates
  • QQ be the number of queries
  • CUC_U be the cost of applying one update
  • CQC_Q be the cost of checking one query

Each round replays all updates and checks each active query once.

componentcost
roundsO(logT)O(\log T)
update replayO(TCUlogT)O(T C_U \log T)
query checksO(QCQlogT)O(Q C_Q \log T)
totalO((TCU+QCQ)logT)O((T C_U + Q C_Q)\log T)

Space:

O(T+Q) O(T + Q)

Comparison

methodwhen usedcost shape
independent binary searcheach query checked separatelyrepeated update replay per query
offline binary searchqueries known in advanceshared update replay per round
online data structurequeries arrive liverequires dynamic maintenance

Offline binary search is useful when online handling is hard, but the monotone structure is clear.

When to Use

Use offline binary search when:

  • all queries are known before processing
  • each answer is a boundary in an ordered domain
  • the predicate is monotone
  • updates can be replayed from scratch
  • batching many midpoint checks saves work

Common examples include connectivity over time, threshold crossing, earliest prefix satisfying a constraint, and cumulative frequency queries.

Notes

  • The algorithm requires a resettable data structure.
  • If updates are reversible, divide and conquer on time may be better.
  • If queries must be answered immediately, this offline method does not apply.
  • The returned sentinel must be defined carefully when no feasible time exists.

Implementation Sketch

def offline_binary_search(queries, updates, reset, apply_update, check):
    qn = len(queries)
    t = len(updates)

    low = [0] * qn
    high = [t] * qn

    while True:
        active = False
        buckets = [[] for _ in range(t)]

        for qi in range(qn):
            if low[qi] < high[qi]:
                active = True
                mid = (low[qi] + high[qi]) // 2
                if mid < t:
                    buckets[mid].append(qi)

        if not active:
            break

        reset()

        for time, update in enumerate(updates):
            apply_update(update)

            for qi in buckets[time]:
                if check(queries[qi]):
                    high[qi] = time
                else:
                    low[qi] = time + 1

    return low
func OfflineBinarySearch[Q any, U any](
    queries []Q,
    updates []U,
    reset func(),
    applyUpdate func(U),
    check func(Q) bool,
) []int {
    qn := len(queries)
    t := len(updates)

    low := make([]int, qn)
    high := make([]int, qn)
    for i := range high {
        high[i] = t
    }

    for {
        active := false
        buckets := make([][]int, t)

        for qi := 0; qi < qn; qi++ {
            if low[qi] < high[qi] {
                active = true
                mid := (low[qi] + high[qi]) / 2
                if mid < t {
                    buckets[mid] = append(buckets[mid], qi)
                }
            }
        }

        if !active {
            break
        }

        reset()

        for time, update := range updates {
            applyUpdate(update)

            for _, qi := range buckets[time] {
                if check(queries[qi]) {
                    high[qi] = time
                } else {
                    low[qi] = time + 1
                }
            }
        }
    }

    return low
}