15.17 Offline Queries

Problem

You need to answer many queries over a fixed dataset.

Example:

array = [5, 1, 7, 3, 9, 2]
queries:
    count values <= 4 in range [1, 5]
    count values <= 6 in range [0, 3]
    count values <= 8 in range [2, 5]

A direct approach handles each query independently. For each range, scan the relevant portion of the array.

If there are (q) queries and each query may scan (n) elements, the total time can become:

$$ O(nq) $$

For large inputs, this is too slow.

Can divide and conquer help when all queries are known in advance?

Solution

Use offline query processing.

An online algorithm answers each query immediately as it arrives. An offline algorithm receives all queries first, then reorders or groups them to answer the entire batch more efficiently.

Divide and conquer is especially useful offline because queries can be partitioned along with the data.

The broad pattern is:

solve(data range, query set)
    if range is simple
        answer directly

    split data range
    divide queries into categories

    solve left queries
    solve right queries
    handle crossing queries

This is the same shape as merge sort. The difference is that the algorithm merges information needed by queries rather than merging sorted arrays only for sorting.

Why Offline Processing Helps

When queries are handled one by one, the same work is often repeated.

Example:

query 1 scans array[0..1000]
query 2 scans array[0..1000]
query 3 scans array[0..1000]

Offline processing lets the algorithm share preprocessing across many queries.

Instead of recomputing similar information repeatedly, it builds structure once and applies it to a batch.

This is common in:

range queries
geometry
graph connectivity
event simulation
scheduling
time-travel data structures

Example: Count Smaller-or-Equal Values in a Range

Query:

(l, r, x)

asks:

How many elements in array[l..r] are <= x?

A naive implementation scans:

array[l], array[l+1], ..., array[r]

for each query.

An offline divide-and-conquer approach sorts values and queries by threshold.

Another approach uses a merge-sort tree. Here we use the merge-sort tree because it follows the divide-and-conquer structure directly.

Merge-Sort Tree

A merge-sort tree is a segment tree where each node stores the sorted values in its range.

For:

array = [5, 1, 7, 3, 9, 2]

the root stores:

[1, 2, 3, 5, 7, 9]

The left child for [0..2] stores:

[1, 5, 7]

The right child for [3..5] stores:

[2, 3, 9]

Each level contains all elements once, just distributed across nodes.

Total space:

$$ O(n\log n) $$

Building the Structure

The build step is a divide-and-conquer merge.

type MergeSortTree struct {
    tree [][]int
    n    int
}

func NewMergeSortTree(a []int) *MergeSortTree {
    mst := &MergeSortTree{
        tree: make([][]int, 4*len(a)),
        n:    len(a),
    }

    mst.build(a, 1, 0, len(a)-1)

    return mst
}

func (mst *MergeSortTree) build(
    a []int,
    node, lo, hi int,
) {
    if lo == hi {
        mst.tree[node] = []int{a[lo]}
        return
    }

    mid := lo + (hi-lo)/2

    mst.build(a, node*2, lo, mid)
    mst.build(a, node*2+1, mid+1, hi)

    mst.tree[node] = mergeSorted(
        mst.tree[node*2],
        mst.tree[node*2+1],
    )
}

The helper merge is the same merge operation used by merge sort.

func mergeSorted(a, b []int) []int {
    result := make([]int, 0, len(a)+len(b))

    i, j := 0, 0

    for i < len(a) && j < len(b) {
        if a[i] <= b[j] {
            result = append(result, a[i])
            i++
        } else {
            result = append(result, b[j])
            j++
        }
    }

    result = append(result, a[i:]...)
    result = append(result, b[j:]...)

    return result
}

Answering a Query

To answer:

count values <= x in [ql, qr]

visit the segment tree nodes whose ranges cover [ql, qr].

For each covered node, use binary search in its sorted list.

func (mst *MergeSortTree) CountLE(
    ql, qr, x int,
) int {
    return mst.countLE(1, 0, mst.n-1, ql, qr, x)
}

func (mst *MergeSortTree) countLE(
    node, lo, hi int,
    ql, qr, x int,
) int {
    if qr < lo || hi < ql {
        return 0
    }

    if ql <= lo && hi <= qr {
        return upperBound(mst.tree[node], x)
    }

    mid := lo + (hi-lo)/2

    return mst.countLE(node*2, lo, mid, ql, qr, x) +
        mst.countLE(node*2+1, mid+1, hi, ql, qr, x)
}

The upperBound function returns the number of elements less than or equal to x.

func upperBound(a []int, x int) int {
    lo, hi := 0, len(a)

    for lo < hi {
        mid := lo + (hi-lo)/2

        if a[mid] <= x {
            lo = mid + 1
        } else {
            hi = mid
        }
    }

    return lo
}

Complexity

Building the tree costs:

$$ O(n\log n) $$

Each level stores all (n) elements once.

A range query touches:

$$ O(\log n) $$

nodes.

Each touched node performs binary search:

$$ O(\log n) $$

So each query costs:

$$ O(\log^2 n) $$

Total for (q) queries:

$$ O(n\log n + q\log^2 n) $$

This is much better than:

$$ O(nq) $$

when both (n) and (q) are large.

Offline Divide and Conquer on Time

Another common offline pattern handles updates and queries across time.

Example:

time 1: add edge (1,2)
time 2: add edge (2,3)
time 3: ask connected(1,3)
time 4: remove edge (2,3)
time 5: ask connected(1,3)

If all operations are known in advance, we can process edge lifetimes over time intervals.

A segment tree over time stores each edge in the nodes covering the interval where that edge is active.

Then a depth-first traversal applies edges when entering a node and rolls them back when leaving.

This combines divide and conquer with rollback data structures.

Rollback Union-Find Sketch

For offline dynamic connectivity:

build segment tree over time

for each edge:
    add edge to all time intervals
    where it is active

dfs(node):
    save union-find state

    apply all edges stored at node

    if leaf:
        answer queries at that time
    else:
        dfs(left child)
        dfs(right child)

    rollback to saved state

The recursive tree divides time.

The union-find structure maintains connectivity for the current time interval.

This is a powerful technique because fully dynamic online connectivity is much harder.

Offline knowledge simplifies the problem.

Divide and Conquer Over Queries

Some algorithms divide the answer space instead of the data range.

For example, given many queries asking for the (k)-th smallest value in a range, one can recursively partition candidate values:

values <= mid
values > mid

and route queries left or right based on counts.

This technique is sometimes called parallel binary search when many decision queries are solved together.

The pattern is:

divide answer candidates
batch evaluate feasibility
route queries
recurse

This is binary search on answer, but shared across many queries.

Online vs Offline Tradeoff

Offline algorithms are not always usable.

They require all queries to be known before processing begins.

Requirement	Online	Offline
Answers immediately	Yes	No
Can reorder work	No	Yes
Can batch preprocessing	Limited	Strong
Good for streams	Yes	Usually no
Good for large query batches	Sometimes	Often

Use offline processing when latency for individual queries is less important than total throughput.

Common Offline Techniques

Sorting Events

Sort updates and queries by a key, such as value or time.

Useful for threshold queries.

Divide and Conquer on Index Range

Split the array or coordinate range and route queries recursively.

Useful for range counting and geometry.

Segment Tree Over Time

Store active intervals in time ranges.

Useful for dynamic connectivity and rollback algorithms.

Parallel Binary Search

Binary search all query answers together.

Useful when feasibility checks can be batched.

Mo's Algorithm

Reorder range queries to reduce pointer movement.

Not strictly divide and conquer, but often grouped with offline query techniques.

Common Mistakes

Using Offline Algorithms for Streaming Inputs

If queries arrive live and must be answered immediately, offline methods are inappropriate.

Forgetting Original Query Order

Offline algorithms often reorder queries.

Store query IDs and restore answer order before returning results.

Overbuilding Structures

A merge-sort tree uses (O(n\log n)) memory.

For very large arrays, a Fenwick tree or wavelet tree may be better.

Mishandling Duplicates

Range-count queries often depend on exact comparison rules:

< x
<= x
>= x

Use the correct binary search variant.

Ignoring Rollback Cost

Rollback data structures must undo every mutation precisely.

For union-find rollback, path compression is usually avoided because it is hard to undo efficiently.

Discussion

Offline query processing is a shift in perspective. Instead of treating each query as an isolated request, we treat the entire query set as part of the input. That opens the door to sorting, grouping, batching, and recursive partitioning.

Divide and conquer appears in several forms here. A merge-sort tree divides index ranges. Dynamic connectivity divides time. Parallel binary search divides answer space. In each case, the same principle applies: organize the batch so that many queries share work that would otherwise be repeated.

The practical value is substantial. Problems that look impossible under online constraints often become manageable when processed offline. The cost is flexibility. You give up immediate answers and gain the ability to restructure computation globally.