A clear explanation of minimizing malware spread by analyzing connected components with Union Find.
Problem Restatement
We are given an undirected graph with n nodes.
The graph is represented by an adjacency matrix:
graph[i][j] == 1means node i is directly connected to node j.
Some nodes are initially infected with malware. These nodes are listed in:
initialMalware spreads through connected edges. If one node in a connected component is infected, eventually every node in that component becomes infected.
We must remove exactly one node from initial.
Return the node whose removal minimizes the final number of infected nodes. If multiple choices produce the same minimum, return the smallest node index.
A removed node is only removed from the initial infected list. It may still become infected later through spread.
Input and Output
| Item | Meaning |
|---|---|
| Input | graph adjacency matrix and initial infected nodes |
| Output | Best node to remove from initial |
| Graph type | Undirected |
| Spread rule | Infection spreads through connected components |
| Tie rule | Return the smallest index |
Function shape:
def minMalwareSpread(graph: list[list[int]], initial: list[int]) -> int:
...Examples
Example 1:
graph = [
[1, 1, 0],
[1, 1, 0],
[0, 0, 1],
]
initial = [0, 1]Nodes 0 and 1 are in the same connected component.
Even if we remove 0, node 1 is still initially infected, so the component still becomes infected.
Even if we remove 1, node 0 is still initially infected, so the component still becomes infected.
Both choices are equally good, so return the smaller index:
0Example 2:
graph = [
[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
]
initial = [0, 2]Each node is isolated.
Removing 0 saves node 0.
Removing 2 saves node 2.
Both save one node, so return the smaller index:
0Example 3:
graph = [
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
]
initial = [1, 2]All nodes are in one connected component.
Removing either 1 or 2 still leaves another infected node in the same component, so the whole component is infected anyway.
Tie by smallest index:
1First Thought: Simulate Each Removal
A direct method is to try removing each node from initial.
For each removal:
- Start with all other initially infected nodes.
- Simulate malware spread.
- Count the final infected nodes.
- Choose the removal with the smallest count.
This works, but it repeats graph traversal for every candidate.
Since the graph has up to 300 nodes, this can still pass with careful implementation, but the cleaner solution uses connected components.
Key Insight
Malware spread depends only on connected components.
Inside one connected component:
| Initially infected count | What happens |
|---|---|
0 | No infection starts there |
1 | Removing that infected node saves the whole component |
2 or more | Removing one infected node still leaves infection, so the whole component is infected |
Therefore, a node is useful to remove only if it is the only initially infected node in its connected component.
If node x is the only infected node in a component of size s, then removing x saves s nodes.
So the best answer is:
- Find all connected components.
- Count component sizes.
- Count how many initially infected nodes each component has.
- Among initially infected nodes whose component has exactly one infection, choose the one in the largest component.
- Break ties by smaller node index.
- If no node can save any component, return the smallest node in
initial.
Algorithm
Use Union Find.
- Create a Union Find structure for
nnodes. - For every pair
(i, j), ifgraph[i][j] == 1, unioniandj. - Count the size of each connected component.
- Count how many initially infected nodes are in each component.
- Sort
initialso tie-breaking is simple. - For each infected node:
- Find its component root.
- If that component has exactly one infected node, it can be saved.
- Choose the node that saves the largest component.
- If no such node exists, return the smallest value in
initial.
Correctness
Every malware spread is confined to connected components.
If a component has no initially infected nodes, it remains uninfected.
If a component has at least one initially infected node, then infection eventually reaches every node in that component.
Removing one node from initial only changes the infection status of the component containing that node.
If that component has multiple initially infected nodes, removing one still leaves another infected node, so the component is still fully infected. No nodes are saved.
If that component has exactly one initially infected node, removing it leaves the component with no initial infection. Then the entire component is saved.
So each removable infected node saves either zero nodes or the full size of its component.
The algorithm evaluates exactly these saved component sizes and chooses the node that saves the most nodes. Sorting initial ensures that when saved sizes tie, the smaller node index is chosen.
If no node saves any component, every removal gives the same final infection count, so the smallest index in initial is correct.
Therefore, the algorithm returns the required node.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(n^2 alpha(n) + m log m) | We scan the adjacency matrix and sort initial |
| Space | O(n) | Union Find arrays and component counters |
Here m = len(initial) and alpha(n) is the inverse Ackermann function, which is effectively constant.
Implementation
class UnionFind:
def __init__(self, n: int):
self.parent = list(range(n))
self.size = [1] * n
def find(self, x: int) -> int:
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, a: int, b: int) -> None:
root_a = self.find(a)
root_b = self.find(b)
if root_a == root_b:
return
if self.size[root_a] < self.size[root_b]:
root_a, root_b = root_b, root_a
self.parent[root_b] = root_a
self.size[root_a] += self.size[root_b]
class Solution:
def minMalwareSpread(self, graph: list[list[int]], initial: list[int]) -> int:
n = len(graph)
uf = UnionFind(n)
for i in range(n):
for j in range(i + 1, n):
if graph[i][j] == 1:
uf.union(i, j)
infected_count = [0] * n
for node in initial:
root = uf.find(node)
infected_count[root] += 1
answer = min(initial)
best_saved = -1
for node in sorted(initial):
root = uf.find(node)
if infected_count[root] == 1:
saved = uf.size[root]
if saved > best_saved:
best_saved = saved
answer = node
return answerCode Explanation
Union Find stores connected components:
self.parent = list(range(n))
self.size = [1] * nThe adjacency matrix is scanned once:
for i in range(n):
for j in range(i + 1, n):
if graph[i][j] == 1:
uf.union(i, j)We only scan j > i because the graph is undirected.
Then count how many infected nodes each component has:
infected_count[root] += 1The fallback answer is the smallest initially infected node:
answer = min(initial)Now check each infected node:
for node in sorted(initial):If its component has exactly one infected node, removing it saves the whole component:
if infected_count[root] == 1:
saved = uf.size[root]Choose the node that saves the most nodes:
if saved > best_saved:
best_saved = saved
answer = nodeBecause we iterate in sorted order, equal saved sizes keep the smaller index.
Testing
def run_tests():
s = Solution()
assert s.minMalwareSpread(
[
[1, 1, 0],
[1, 1, 0],
[0, 0, 1],
],
[0, 1],
) == 0
assert s.minMalwareSpread(
[
[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
],
[0, 2],
) == 0
assert s.minMalwareSpread(
[
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
],
[1, 2],
) == 1
assert s.minMalwareSpread(
[
[1, 1, 0, 0],
[1, 1, 0, 0],
[0, 0, 1, 1],
[0, 0, 1, 1],
],
[1, 2],
) == 1
assert s.minMalwareSpread(
[
[1, 1, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 1, 1, 1],
[0, 0, 1, 1, 1],
[0, 0, 1, 1, 1],
],
[0, 2],
) == 2
print("all tests passed")
run_tests()Test meaning:
| Test | Why |
|---|---|
| Two infected in same component | Removing either saves nothing |
| Isolated infected nodes | Tie by smallest index |
| Fully connected graph | Multiple infected nodes in one component |
| Equal saved component sizes | Tie by smaller node |
| Larger saved component | Choose node saving more nodes |