Skip to content

Parallel Binary Search with DSU

Answer offline connectivity queries by combining parallel binary search with a disjoint set union.

Parallel binary search with DSU answers many offline connectivity queries over a sequence of edge insertions. It finds the earliest time when each pair of vertices becomes connected.

The technique combines two ideas:

  • parallel binary search over answer times
  • disjoint set union for fast connectivity checks

It applies when edges are only added. Once two vertices become connected, they remain connected, so the predicate is monotone.

Problem

Given:

  • vertices 00 to n1n - 1
  • edge insertions E0,E1,,ET1E_0, E_1, \dots, E_{T-1}
  • queries (u,v)(u, v)

For each query, find the smallest time tt such that uu and vv are connected after applying edges:

E0,E1,,Et E_0, E_1, \dots, E_t

If they never become connected, return 1-1.

Key Idea

Each query has an answer interval [l,r][l, r] over time.

During each round:

  1. Put each active query into a bucket by midpoint
  2. Reset the DSU
  3. Replay edge insertions from left to right
  4. At each time tt, answer all queries whose midpoint is tt
  5. Shrink each query interval

The DSU is rebuilt once per round, not once per query.

Algorithm

parallel_binary_search_dsu(n, edges, queries):
    T = length(edges)
    Q = length(queries)

    low[q] = 0 for every query q
    high[q] = T for every query q

    while some query has low[q] < high[q]:
        buckets = array of empty lists of length T

        for q from 0 to Q - 1:
            if low[q] < high[q]:
                mid = (low[q] + high[q]) // 2
                if mid < T:
                    buckets[mid].append(q)

        dsu.reset(n)

        for t from 0 to T - 1:
            dsu.union(edges[t].u, edges[t].v)

            for q in buckets[t]:
                u, v = queries[q]
                if dsu.find(u) == dsu.find(v):
                    high[q] = t
                else:
                    low[q] = t + 1

    answer[q] = low[q] if low[q] < T else -1
    return answer

Example

Edges are inserted over time:

timeedge
0(0,1)(0, 1)
1(2,3)(2, 3)
2(1,2)(1, 2)
3(4,5)(4, 5)

Query:

(0,3) (0, 3)

Connectivity changes:

timeconnected?
0no
1no
2yes
3yes

The earliest connected time is 22.

Correctness

For each query (u,v)(u, v), define:

F(t)=u and v are connected after edges 0..t F(t) = \text{$u$ and $v$ are connected after edges } 0..t

Because edges are only added, connectivity can change only from false to true. It never changes from true back to false.

Therefore F(t)F(t) is monotone.

Each query maintains an interval containing its earliest true time. If the DSU says the endpoints are connected at midpoint mm, then the earliest true time is at most mm, so the algorithm moves the right boundary to mm.

If they are not connected at mm, monotonicity implies they are not connected at any earlier time, so the algorithm moves the left boundary to m+1m + 1.

When the interval collapses, the remaining time is the earliest connection time, or TT if no such time exists.

Complexity

Let:

  • nn be the number of vertices
  • TT be the number of edge insertions
  • QQ be the number of queries
  • α(n)\alpha(n) be the inverse Ackermann factor from DSU

Each round replays all edges and checks all active queries.

componentcost
roundsO(logT)O(\log T)
DSU unionsO(Tα(n)logT)O(T \alpha(n) \log T)
query checksO(Qα(n)logT)O(Q \alpha(n) \log T)
totalO((T+Q)α(n)logT)O((T + Q)\alpha(n)\log T)
spaceO(n+T+Q)O(n + T + Q)

DSU Operations

make_set(v):
    parent[v] = v
    size[v] = 1

find(v):
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v

union(a, b):
    ra = find(a)
    rb = find(b)
    if ra == rb:
        return

    if size[ra] < size[rb]:
        swap(ra, rb)

    parent[rb] = ra
    size[ra] += size[rb]

When to Use

Use this method when:

  • all queries are known beforehand
  • updates are edge insertions only
  • each query asks for earliest connectivity
  • the graph grows monotonically
  • rebuilding DSU per round is acceptable

Notes

  • This method does not directly support edge deletions.
  • For deletions, use rollback DSU with divide and conquer on time.
  • If only final connectivity matters, ordinary DSU is enough.
  • If queries arrive online, this offline method is not suitable.

Implementation

class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra = self.find(a)
        rb = self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

def parallel_binary_search_dsu(n, edges, queries):
    t = len(edges)
    q = len(queries)

    low = [0] * q
    high = [t] * q

    while True:
        active = False
        buckets = [[] for _ in range(t)]

        for i in range(q):
            if low[i] < high[i]:
                active = True
                mid = (low[i] + high[i]) // 2
                if mid < t:
                    buckets[mid].append(i)

        if not active:
            break

        dsu = DSU(n)

        for time, (a, b) in enumerate(edges):
            dsu.union(a, b)

            for qi in buckets[time]:
                u, v = queries[qi]
                if dsu.find(u) == dsu.find(v):
                    high[qi] = time
                else:
                    low[qi] = time + 1

    return [low[i] if low[i] < t else -1 for i in range(q)]
type DSU struct {
    parent []int
    size   []int
}

func NewDSU(n int) *DSU {
    parent := make([]int, n)
    size := make([]int, n)
    for i := 0; i < n; i++ {
        parent[i] = i
        size[i] = 1
    }
    return &DSU{parent: parent, size: size}
}

func (d *DSU) Find(x int) int {
    for d.parent[x] != x {
        d.parent[x] = d.parent[d.parent[x]]
        x = d.parent[x]
    }
    return x
}

func (d *DSU) Union(a, b int) {
    ra := d.Find(a)
    rb := d.Find(b)
    if ra == rb {
        return
    }
    if d.size[ra] < d.size[rb] {
        ra, rb = rb, ra
    }
    d.parent[rb] = ra
    d.size[ra] += d.size[rb]
}

type Edge struct {
    U int
    V int
}

type Query struct {
    U int
    V int
}

func ParallelBinarySearchDSU(n int, edges []Edge, queries []Query) []int {
    t := len(edges)
    q := len(queries)

    low := make([]int, q)
    high := make([]int, q)
    for i := range high {
        high[i] = t
    }

    for {
        active := false
        buckets := make([][]int, t)

        for i := 0; i < q; i++ {
            if low[i] < high[i] {
                active = true
                mid := (low[i] + high[i]) / 2
                if mid < t {
                    buckets[mid] = append(buckets[mid], i)
                }
            }
        }

        if !active {
            break
        }

        dsu := NewDSU(n)

        for time, e := range edges {
            dsu.Union(e.U, e.V)

            for _, qi := range buckets[time] {
                query := queries[qi]
                if dsu.Find(query.U) == dsu.Find(query.V) {
                    high[qi] = time
                } else {
                    low[qi] = time + 1
                }
            }
        }
    }

    ans := make([]int, q)
    for i := 0; i < q; i++ {
        if low[i] < t {
            ans[i] = low[i]
        } else {
            ans[i] = -1
        }
    }

    return ans
}