15.17 Offline Queries
You need to answer many queries over a fixed dataset.
15.17 Offline Queries
Problem
You need to answer many queries over a fixed dataset.
Example:
array = [5, 1, 7, 3, 9, 2]
queries:
count values <= 4 in range [1, 5]
count values <= 6 in range [0, 3]
count values <= 8 in range [2, 5]
A direct approach handles each query independently. For each range, scan the relevant portion of the array.
If there are (q) queries and each query may scan (n) elements, the total time can become:
$$ O(nq) $$
For large inputs, this is too slow.
Can divide and conquer help when all queries are known in advance?
Solution
Use offline query processing.
An online algorithm answers each query immediately as it arrives. An offline algorithm receives all queries first, then reorders or groups them to answer the entire batch more efficiently.
Divide and conquer is especially useful offline because queries can be partitioned along with the data.
The broad pattern is:
solve(data range, query set)
if range is simple
answer directly
split data range
divide queries into categories
solve left queries
solve right queries
handle crossing queries
This is the same shape as merge sort. The difference is that the algorithm merges information needed by queries rather than merging sorted arrays only for sorting.
Why Offline Processing Helps
When queries are handled one by one, the same work is often repeated.
Example:
query 1 scans array[0..1000]
query 2 scans array[0..1000]
query 3 scans array[0..1000]
Offline processing lets the algorithm share preprocessing across many queries.
Instead of recomputing similar information repeatedly, it builds structure once and applies it to a batch.
This is common in:
- range queries
- geometry
- graph connectivity
- event simulation
- scheduling
- time-travel data structures
Example: Count Smaller-or-Equal Values in a Range
Query:
(l, r, x)
asks:
How many elements in array[l..r] are <= x?
A naive implementation scans:
array[l], array[l+1], ..., array[r]
for each query.
An offline divide-and-conquer approach sorts values and queries by threshold.
Another approach uses a merge-sort tree. Here we use the merge-sort tree because it follows the divide-and-conquer structure directly.
Merge-Sort Tree
A merge-sort tree is a segment tree where each node stores the sorted values in its range.
For:
array = [5, 1, 7, 3, 9, 2]
the root stores:
[1, 2, 3, 5, 7, 9]
The left child for [0..2] stores:
[1, 5, 7]
The right child for [3..5] stores:
[2, 3, 9]
Each level contains all elements once, just distributed across nodes.
Total space:
$$ O(n\log n) $$
Building the Structure
The build step is a divide-and-conquer merge.
type MergeSortTree struct {
tree [][]int
n int
}
func NewMergeSortTree(a []int) *MergeSortTree {
mst := &MergeSortTree{
tree: make([][]int, 4*len(a)),
n: len(a),
}
mst.build(a, 1, 0, len(a)-1)
return mst
}
func (mst *MergeSortTree) build(
a []int,
node, lo, hi int,
) {
if lo == hi {
mst.tree[node] = []int{a[lo]}
return
}
mid := lo + (hi-lo)/2
mst.build(a, node*2, lo, mid)
mst.build(a, node*2+1, mid+1, hi)
mst.tree[node] = mergeSorted(
mst.tree[node*2],
mst.tree[node*2+1],
)
}
The helper merge is the same merge operation used by merge sort.
func mergeSorted(a, b []int) []int {
result := make([]int, 0, len(a)+len(b))
i, j := 0, 0
for i < len(a) && j < len(b) {
if a[i] <= b[j] {
result = append(result, a[i])
i++
} else {
result = append(result, b[j])
j++
}
}
result = append(result, a[i:]...)
result = append(result, b[j:]...)
return result
}
Answering a Query
To answer:
count values <= x in [ql, qr]
visit the segment tree nodes whose ranges cover [ql, qr].
For each covered node, use binary search in its sorted list.
func (mst *MergeSortTree) CountLE(
ql, qr, x int,
) int {
return mst.countLE(1, 0, mst.n-1, ql, qr, x)
}
func (mst *MergeSortTree) countLE(
node, lo, hi int,
ql, qr, x int,
) int {
if qr < lo || hi < ql {
return 0
}
if ql <= lo && hi <= qr {
return upperBound(mst.tree[node], x)
}
mid := lo + (hi-lo)/2
return mst.countLE(node*2, lo, mid, ql, qr, x) +
mst.countLE(node*2+1, mid+1, hi, ql, qr, x)
}
The upperBound function returns the number of elements less than or equal to x.
func upperBound(a []int, x int) int {
lo, hi := 0, len(a)
for lo < hi {
mid := lo + (hi-lo)/2
if a[mid] <= x {
lo = mid + 1
} else {
hi = mid
}
}
return lo
}
Complexity
Building the tree costs:
$$ O(n\log n) $$
Each level stores all (n) elements once.
A range query touches:
$$ O(\log n) $$
nodes.
Each touched node performs binary search:
$$ O(\log n) $$
So each query costs:
$$ O(\log^2 n) $$
Total for (q) queries:
$$ O(n\log n + q\log^2 n) $$
This is much better than:
$$ O(nq) $$
when both (n) and (q) are large.
Offline Divide and Conquer on Time
Another common offline pattern handles updates and queries across time.
Example:
time 1: add edge (1,2)
time 2: add edge (2,3)
time 3: ask connected(1,3)
time 4: remove edge (2,3)
time 5: ask connected(1,3)
If all operations are known in advance, we can process edge lifetimes over time intervals.
A segment tree over time stores each edge in the nodes covering the interval where that edge is active.
Then a depth-first traversal applies edges when entering a node and rolls them back when leaving.
This combines divide and conquer with rollback data structures.
Rollback Union-Find Sketch
For offline dynamic connectivity:
build segment tree over time
for each edge:
add edge to all time intervals
where it is active
dfs(node):
save union-find state
apply all edges stored at node
if leaf:
answer queries at that time
else:
dfs(left child)
dfs(right child)
rollback to saved state
The recursive tree divides time.
The union-find structure maintains connectivity for the current time interval.
This is a powerful technique because fully dynamic online connectivity is much harder.
Offline knowledge simplifies the problem.
Divide and Conquer Over Queries
Some algorithms divide the answer space instead of the data range.
For example, given many queries asking for the (k)-th smallest value in a range, one can recursively partition candidate values:
values <= mid
values > mid
and route queries left or right based on counts.
This technique is sometimes called parallel binary search when many decision queries are solved together.
The pattern is:
divide answer candidates
batch evaluate feasibility
route queries
recurse
This is binary search on answer, but shared across many queries.
Online vs Offline Tradeoff
Offline algorithms are not always usable.
They require all queries to be known before processing begins.
| Requirement | Online | Offline |
|---|---|---|
| Answers immediately | Yes | No |
| Can reorder work | No | Yes |
| Can batch preprocessing | Limited | Strong |
| Good for streams | Yes | Usually no |
| Good for large query batches | Sometimes | Often |
Use offline processing when latency for individual queries is less important than total throughput.
Common Offline Techniques
Sorting Events
Sort updates and queries by a key, such as value or time.
Useful for threshold queries.
Divide and Conquer on Index Range
Split the array or coordinate range and route queries recursively.
Useful for range counting and geometry.
Segment Tree Over Time
Store active intervals in time ranges.
Useful for dynamic connectivity and rollback algorithms.
Parallel Binary Search
Binary search all query answers together.
Useful when feasibility checks can be batched.
Mo's Algorithm
Reorder range queries to reduce pointer movement.
Not strictly divide and conquer, but often grouped with offline query techniques.
Common Mistakes
Using Offline Algorithms for Streaming Inputs
If queries arrive live and must be answered immediately, offline methods are inappropriate.
Forgetting Original Query Order
Offline algorithms often reorder queries.
Store query IDs and restore answer order before returning results.
Overbuilding Structures
A merge-sort tree uses (O(n\log n)) memory.
For very large arrays, a Fenwick tree or wavelet tree may be better.
Mishandling Duplicates
Range-count queries often depend on exact comparison rules:
< x
<= x
>= x
Use the correct binary search variant.
Ignoring Rollback Cost
Rollback data structures must undo every mutation precisely.
For union-find rollback, path compression is usually avoided because it is hard to undo efficiently.
Discussion
Offline query processing is a shift in perspective. Instead of treating each query as an isolated request, we treat the entire query set as part of the input. That opens the door to sorting, grouping, batching, and recursive partitioning.
Divide and conquer appears in several forms here. A merge-sort tree divides index ranges. Dynamic connectivity divides time. Parallel binary search divides answer space. In each case, the same principle applies: organize the batch so that many queries share work that would otherwise be repeated.
The practical value is substantial. Problems that look impossible under online constraints often become manageable when processed offline. The cost is flexibility. You give up immediate answers and gain the ability to restructure computation globally.