# Offline Binary Search

# Offline Binary Search

Offline binary search answers many queries after all updates and queries are already known. It is closely related to parallel binary search. The main idea is to avoid solving each query separately by grouping many midpoint checks into shared passes over the data.

This method applies when each query asks for the earliest or smallest point at which a monotone condition becomes true.

## Problem

Given:

* a sequence of updates $U_0, U_1, \dots, U_{T-1}$
* a set of queries $q_1, q_2, \dots, q_Q$
* a monotone predicate $F(q, t)$

Find, for each query $q$, the smallest time $t$ such that:

$$
F(q, t) = true
$$

If no such time exists, return a sentinel such as $T$ or $-1$ depending on the convention.

## Key Idea

Each query keeps a search interval. During each round, all active queries are grouped by midpoint. The update sequence is replayed once, and every query whose midpoint is reached is checked against the current data structure state.

This is offline because the algorithm needs all queries before it starts. It can reorder and batch their checks.

## Algorithm

```id="offline-bs-1"
offline_binary_search(queries, updates):
    Q = number of queries
    T = number of updates

    for each query q:
        low[q] = 0
        high[q] = T

    while some query has low[q] < high[q]:
        buckets = array of empty lists of length T

        for each query q:
            if low[q] < high[q]:
                mid = (low[q] + high[q]) // 2
                buckets[mid].append(q)

        reset data structure

        for t from 0 to T - 1:
            apply updates[t]

            for each query q in buckets[t]:
                if check(q):
                    high[q] = t
                else:
                    low[q] = t + 1

    return low
```

## Example

Suppose edges are added to a graph over time. Each query asks when two vertices first become connected.

The predicate is:

$$
F(q, t) = \text{vertices of } q \text{ are connected after updates } 0..t
$$

Once two vertices are connected, they remain connected if edges are only added. Therefore the predicate is monotone.

Offline binary search can answer many such queries by replaying the edge additions once per round, rather than once per query.

## Correctness

For each query, the algorithm maintains the invariant that its answer lies in the interval:

$$
[low[q], high[q]]
$$

At each round, the midpoint $mid$ is checked under the exact state after update $mid$ has been applied.

If `check(q)` is true, monotonicity implies the first true time lies at or before $mid$, so the algorithm sets:

```id="offline-bs-true"
high[q] = mid
```

If `check(q)` is false, monotonicity implies every time at or before $mid$ is false, so the algorithm sets:

```id="offline-bs-false"
low[q] = mid + 1
```

The interval shrinks for every active query. When the interval has length zero, the remaining value is the earliest feasible time.

## Complexity

Let:

* $T$ be the number of updates
* $Q$ be the number of queries
* $C_U$ be the cost of applying one update
* $C_Q$ be the cost of checking one query

Each round replays all updates and checks each active query once.

| component     |                       cost |
| ------------- | -------------------------: |
| rounds        |                $O(\log T)$ |
| update replay |          $O(T C_U \log T)$ |
| query checks  |          $O(Q C_Q \log T)$ |
| total         | $O((T C_U + Q C_Q)\log T)$ |

Space:

$$
O(T + Q)
$$

## Comparison

| method                    | when used                     | cost shape                       |
| ------------------------- | ----------------------------- | -------------------------------- |
| independent binary search | each query checked separately | repeated update replay per query |
| offline binary search     | queries known in advance      | shared update replay per round   |
| online data structure     | queries arrive live           | requires dynamic maintenance     |

Offline binary search is useful when online handling is hard, but the monotone structure is clear.

## When to Use

Use offline binary search when:

* all queries are known before processing
* each answer is a boundary in an ordered domain
* the predicate is monotone
* updates can be replayed from scratch
* batching many midpoint checks saves work

Common examples include connectivity over time, threshold crossing, earliest prefix satisfying a constraint, and cumulative frequency queries.

## Notes

* The algorithm requires a resettable data structure.
* If updates are reversible, divide and conquer on time may be better.
* If queries must be answered immediately, this offline method does not apply.
* The returned sentinel must be defined carefully when no feasible time exists.

## Implementation Sketch

```id="py-offline-bs"
def offline_binary_search(queries, updates, reset, apply_update, check):
    qn = len(queries)
    t = len(updates)

    low = [0] * qn
    high = [t] * qn

    while True:
        active = False
        buckets = [[] for _ in range(t)]

        for qi in range(qn):
            if low[qi] < high[qi]:
                active = True
                mid = (low[qi] + high[qi]) // 2
                if mid < t:
                    buckets[mid].append(qi)

        if not active:
            break

        reset()

        for time, update in enumerate(updates):
            apply_update(update)

            for qi in buckets[time]:
                if check(queries[qi]):
                    high[qi] = time
                else:
                    low[qi] = time + 1

    return low
```

```id="go-offline-bs"
func OfflineBinarySearch[Q any, U any](
    queries []Q,
    updates []U,
    reset func(),
    applyUpdate func(U),
    check func(Q) bool,
) []int {
    qn := len(queries)
    t := len(updates)

    low := make([]int, qn)
    high := make([]int, qn)
    for i := range high {
        high[i] = t
    }

    for {
        active := false
        buckets := make([][]int, t)

        for qi := 0; qi < qn; qi++ {
            if low[qi] < high[qi] {
                active = true
                mid := (low[qi] + high[qi]) / 2
                if mid < t {
                    buckets[mid] = append(buckets[mid], qi)
                }
            }
        }

        if !active {
            break
        }

        reset()

        for time, update := range updates {
            applyUpdate(update)

            for _, qi := range buckets[time] {
                if check(queries[qi]) {
                    high[qi] = time
                } else {
                    low[qi] = time + 1
                }
            }
        }
    }

    return low
}
```

