8.18 Centroid Decomposition
Many tree algorithms process information from a fixed root.
8.18 Centroid Decomposition
Many tree algorithms process information from a fixed root.
Heavy-Light Decomposition focuses on paths.
Euler Tours focus on subtrees.
Rerooting Dynamic Programming focuses on changing roots efficiently.
Centroid Decomposition takes a different approach entirely.
Instead of choosing a root and keeping it forever, it repeatedly divides the tree into smaller balanced pieces.
This divide-and-conquer strategy transforms many difficult global tree problems into manageable local problems.
Centroid Decomposition is especially useful for:
- Distance queries
- Nearest marked node queries
- Path counting
- Dynamic tree statistics
- Tree optimization problems
The technique appears frequently in advanced algorithm competitions and large-scale hierarchical analytics systems.
Problem
You need to solve a tree problem involving global distances or interactions between many nodes.
Direct traversal is too expensive.
Solution
Recursively split the tree around its centroid.
A centroid is a node whose removal leaves no connected component larger than half the original tree.
Consider:
0
/ \\n 1 2
/ \ / \\n 3 4 5 6
Node:
0
is a centroid.
Removing it produces:
1-subtree = 3 nodes
2-subtree = 3 nodes
Neither exceeds:
7 / 2 = 3.5
The tree is therefore balanced around node 0.
Recursively repeat the process inside every remaining component.
The resulting structure is called the centroid tree.
Discussion
The key observation is that centroids create balanced partitions.
Suppose:
n = 1000000
After one decomposition:
≤ 500000
After another:
≤ 250000
Then:
≤ 125000
and so on.
The component size shrinks by at least half at every level.
Therefore:
Centroid Tree Height
=
O(log n)
This logarithmic depth is the foundation of the technique.
Finding Subtree Sizes
The first step computes subtree sizes.
func dfsSize(
tree [][]int,
node int,
parent int,
size []int,
removed []bool,
) {
size[node] = 1
for _, next := range tree[node] {
if next == parent ||
removed[next] {
continue
}
dfsSize(
tree,
next,
node,
size,
removed,
)
size[node] += size[next]
}
}
This is identical to many earlier tree algorithms.
The sizes determine which node qualifies as a centroid.
Finding the Centroid
After computing sizes:
func findCentroid(
tree [][]int,
node int,
parent int,
total int,
size []int,
removed []bool,
) int {
for _, next := range tree[node] {
if next == parent ||
removed[next] {
continue
}
if size[next] > total/2 {
return findCentroid(
tree,
next,
node,
total,
size,
removed,
)
}
}
return node
}
The algorithm walks toward the largest subtree until no subtree exceeds half the total size.
The resulting node is the centroid.
Building the Centroid Tree
Once the centroid is found:
- Mark it removed.
- Recursively process each remaining component.
- Connect resulting centroids to the current centroid.
func decompose(
tree [][]int,
start int,
parent int,
centroidParent []int,
size []int,
removed []bool,
) {
dfsSize(
tree,
start,
-1,
size,
removed,
)
centroid :=
findCentroid(
tree,
start,
-1,
size[start],
size,
removed,
)
centroidParent[centroid] =
parent
removed[centroid] = true
for _, next := range tree[centroid] {
if removed[next] {
continue
}
decompose(
tree,
next,
centroid,
centroidParent,
size,
removed,
)
}
}
The result is a second tree layered on top of the original one.
Visualizing the Decomposition
Original tree:
0
/ \\n 1 2
/ \ / \\n 3 4 5 6
Centroid:
0
Remaining components:
1
├── 3
└── 4
2
├── 5
└── 6
Centroids:
1
2
Final centroid tree:
0
/ \\n 1 2
/ \ / \\n 3 4 5 6
This example happens to resemble the original tree.
In general, the structures may differ significantly.
Why Centroid Decomposition Helps
Suppose we need:
Distance to nearest red node
for every query.
A naive approach:
Traverse entire tree.
Complexity:
O(n)
per query.
With Centroid Decomposition:
Store information at every centroid ancestor.
Query:
node
→ centroid parent
→ centroid parent
→ ...
The path length is:
O(log n)
The answer becomes logarithmic.
Distance to Nearest Marked Node
This is the classic centroid decomposition problem.
Maintain:
best[centroid]
representing:
Distance from centroid
to nearest marked node.
When marking a node:
current := node
for current != -1 {
best[current] =
min(
best[current],
distance(
node,
current,
),
)
current =
centroidParent[current]
}
Only centroid ancestors are visited.
Complexity:
O(log n)
Querying the Nearest Marked Node
Query:
current := node
answer := infinity
for current != -1 {
answer =
min(
answer,
best[current] +
distance(
node,
current,
),
)
current =
centroidParent[current]
}
Again:
O(log n)
nodes are visited.
This is dramatically faster than traversing the original tree.
Distance Computation
Distance queries often use LCA preprocessing.
Formula:
distance(u,v)
=
depth(u)
+
depth(v)
-
2×depth(LCA)
Since centroid decomposition frequently needs distances between arbitrary pairs, LCA is commonly combined with the technique.
The two methods complement each other naturally.
Path Counting Problems
Centroid decomposition excels at counting paths.
Examples:
Number of paths
with length K
Number of paths
with sum K
Number of paths
containing property X
The centroid becomes the divide-and-conquer boundary.
Count:
Paths through centroid
then recurse into components.
This avoids double-counting.
Many difficult counting problems become tractable with this strategy.
Divide and Conquer Perspective
Centroid decomposition is fundamentally a divide-and-conquer algorithm.
At each level:
Solve local contribution.
Split problem.
Recurse.
This resembles:
- Merge Sort
- Quick Sort
- FFT
- Divide-and-Conquer DP
The difference is that the recursion occurs on a tree rather than an array.
Comparing HLD and Centroid Decomposition
These techniques solve different classes of problems.
| Feature | HLD | Centroid |
|---|---|---|
| Path queries | Excellent | Moderate |
| Subtree queries | Moderate | Moderate |
| Distance queries | Moderate | Excellent |
| Dynamic marked nodes | Difficult | Excellent |
| Path counting | Moderate | Excellent |
| Complexity | O(log² n) | O(log n) often |
Neither replaces the other.
Each addresses different structural challenges.
Multiple Levels of Information
A node belongs to multiple centroid levels.
Example:
Node 17
Level 0 centroid
Level 1 centroid
Level 2 centroid
Level 3 centroid
This hierarchy is what allows logarithmic queries.
Every query climbs the centroid tree rather than the original tree.
The original tree remains unchanged.
Only an additional structure is added.
Complexity Analysis
Let:
n = number of nodes
Construction:
| Operation | Complexity |
|---|---|
| Subtree sizes | O(n log n) |
| Centroid search | O(n log n) |
| Build centroid tree | O(n log n) |
Memory:
| Structure | Complexity |
|---|---|
| Centroid tree | O(n) |
| Auxiliary arrays | O(n) |
Queries:
| Operation | Complexity |
|---|---|
| Update node | O(log n) |
| Distance query | O(log n) |
The logarithmic behavior comes from the balanced decomposition.
Common Mistakes
A common mistake is forgetting to exclude removed centroids during recursive processing.
Once a node becomes a centroid, it should not participate in later centroid searches.
Another mistake is assuming the centroid is unique.
Some trees have two valid centroids.
Either choice works correctly.
A third mistake is recomputing distances with DFS during every query. Centroid decomposition is usually paired with LCA preprocessing so that distances remain efficient.
Finally, many implementations confuse:
Original tree parent
with:
Centroid tree parent
These are entirely different structures and must be stored separately.
Takeaway
Centroid Decomposition recursively partitions a tree into balanced components. The resulting centroid tree has logarithmic height, allowing many global tree problems to be transformed into logarithmic-time queries and updates. The technique is particularly effective for distance-based queries, dynamic marked-node problems, and path-counting tasks. More broadly, centroid decomposition demonstrates how balanced divide-and-conquer strategies can reveal efficient solutions to problems that appear inherently global.