8.18 Centroid Decomposition

Many tree algorithms process information from a fixed root.

8.18 Centroid Decomposition

Many tree algorithms process information from a fixed root.

Heavy-Light Decomposition focuses on paths.

Euler Tours focus on subtrees.

Rerooting Dynamic Programming focuses on changing roots efficiently.

Centroid Decomposition takes a different approach entirely.

Instead of choosing a root and keeping it forever, it repeatedly divides the tree into smaller balanced pieces.

This divide-and-conquer strategy transforms many difficult global tree problems into manageable local problems.

Centroid Decomposition is especially useful for:

  • Distance queries
  • Nearest marked node queries
  • Path counting
  • Dynamic tree statistics
  • Tree optimization problems

The technique appears frequently in advanced algorithm competitions and large-scale hierarchical analytics systems.

Problem

You need to solve a tree problem involving global distances or interactions between many nodes.

Direct traversal is too expensive.

Solution

Recursively split the tree around its centroid.

A centroid is a node whose removal leaves no connected component larger than half the original tree.

Consider:

          0
        /   \\n       1     2
      / \   / \\n     3   4 5   6

Node:

0

is a centroid.

Removing it produces:

1-subtree = 3 nodes
2-subtree = 3 nodes

Neither exceeds:

7 / 2 = 3.5

The tree is therefore balanced around node 0.

Recursively repeat the process inside every remaining component.

The resulting structure is called the centroid tree.

Discussion

The key observation is that centroids create balanced partitions.

Suppose:

n = 1000000

After one decomposition:

≤ 500000

After another:

≤ 250000

Then:

≤ 125000

and so on.

The component size shrinks by at least half at every level.

Therefore:

Centroid Tree Height
=
O(log n)

This logarithmic depth is the foundation of the technique.

Finding Subtree Sizes

The first step computes subtree sizes.

func dfsSize(
    tree [][]int,
    node int,
    parent int,
    size []int,
    removed []bool,
) {
    size[node] = 1

    for _, next := range tree[node] {
        if next == parent ||
           removed[next] {
            continue
        }

        dfsSize(
            tree,
            next,
            node,
            size,
            removed,
        )

        size[node] += size[next]
    }
}

This is identical to many earlier tree algorithms.

The sizes determine which node qualifies as a centroid.

Finding the Centroid

After computing sizes:

func findCentroid(
    tree [][]int,
    node int,
    parent int,
    total int,
    size []int,
    removed []bool,
) int {
    for _, next := range tree[node] {
        if next == parent ||
           removed[next] {
            continue
        }

        if size[next] > total/2 {
            return findCentroid(
                tree,
                next,
                node,
                total,
                size,
                removed,
            )
        }
    }

    return node
}

The algorithm walks toward the largest subtree until no subtree exceeds half the total size.

The resulting node is the centroid.

Building the Centroid Tree

Once the centroid is found:

  1. Mark it removed.
  2. Recursively process each remaining component.
  3. Connect resulting centroids to the current centroid.
func decompose(
    tree [][]int,
    start int,
    parent int,
    centroidParent []int,
    size []int,
    removed []bool,
) {
    dfsSize(
        tree,
        start,
        -1,
        size,
        removed,
    )

    centroid :=
        findCentroid(
            tree,
            start,
            -1,
            size[start],
            size,
            removed,
        )

    centroidParent[centroid] =
        parent

    removed[centroid] = true

    for _, next := range tree[centroid] {
        if removed[next] {
            continue
        }

        decompose(
            tree,
            next,
            centroid,
            centroidParent,
            size,
            removed,
        )
    }
}

The result is a second tree layered on top of the original one.

Visualizing the Decomposition

Original tree:

          0
        /   \\n       1     2
      / \   / \\n     3   4 5   6

Centroid:

0

Remaining components:

1
├── 3
└── 4

2
├── 5
└── 6

Centroids:

1
2

Final centroid tree:

      0
     / \\n    1   2
   / \ / \\n  3  4 5  6

This example happens to resemble the original tree.

In general, the structures may differ significantly.

Why Centroid Decomposition Helps

Suppose we need:

Distance to nearest red node

for every query.

A naive approach:

Traverse entire tree.

Complexity:

O(n)

per query.

With Centroid Decomposition:

Store information at every centroid ancestor.

Query:

node
→ centroid parent
→ centroid parent
→ ...

The path length is:

O(log n)

The answer becomes logarithmic.

Distance to Nearest Marked Node

This is the classic centroid decomposition problem.

Maintain:

best[centroid]

representing:

Distance from centroid
to nearest marked node.

When marking a node:

current := node

for current != -1 {
    best[current] =
        min(
            best[current],
            distance(
                node,
                current,
            ),
        )

    current =
        centroidParent[current]
}

Only centroid ancestors are visited.

Complexity:

O(log n)

Querying the Nearest Marked Node

Query:

current := node
answer := infinity

for current != -1 {
    answer =
        min(
            answer,
            best[current] +
            distance(
                node,
                current,
            ),
        )

    current =
        centroidParent[current]
}

Again:

O(log n)

nodes are visited.

This is dramatically faster than traversing the original tree.

Distance Computation

Distance queries often use LCA preprocessing.

Formula:

distance(u,v)
=
depth(u)
+
depth(v)
-
2×depth(LCA)

Since centroid decomposition frequently needs distances between arbitrary pairs, LCA is commonly combined with the technique.

The two methods complement each other naturally.

Path Counting Problems

Centroid decomposition excels at counting paths.

Examples:

Number of paths
with length K

Number of paths
with sum K

Number of paths
containing property X

The centroid becomes the divide-and-conquer boundary.

Count:

Paths through centroid

then recurse into components.

This avoids double-counting.

Many difficult counting problems become tractable with this strategy.

Divide and Conquer Perspective

Centroid decomposition is fundamentally a divide-and-conquer algorithm.

At each level:

Solve local contribution.

Split problem.

Recurse.

This resembles:

  • Merge Sort
  • Quick Sort
  • FFT
  • Divide-and-Conquer DP

The difference is that the recursion occurs on a tree rather than an array.

Comparing HLD and Centroid Decomposition

These techniques solve different classes of problems.

Feature HLD Centroid
Path queries Excellent Moderate
Subtree queries Moderate Moderate
Distance queries Moderate Excellent
Dynamic marked nodes Difficult Excellent
Path counting Moderate Excellent
Complexity O(log² n) O(log n) often

Neither replaces the other.

Each addresses different structural challenges.

Multiple Levels of Information

A node belongs to multiple centroid levels.

Example:

Node 17

Level 0 centroid
Level 1 centroid
Level 2 centroid
Level 3 centroid

This hierarchy is what allows logarithmic queries.

Every query climbs the centroid tree rather than the original tree.

The original tree remains unchanged.

Only an additional structure is added.

Complexity Analysis

Let:

n = number of nodes

Construction:

Operation Complexity
Subtree sizes O(n log n)
Centroid search O(n log n)
Build centroid tree O(n log n)

Memory:

Structure Complexity
Centroid tree O(n)
Auxiliary arrays O(n)

Queries:

Operation Complexity
Update node O(log n)
Distance query O(log n)

The logarithmic behavior comes from the balanced decomposition.

Common Mistakes

A common mistake is forgetting to exclude removed centroids during recursive processing.

Once a node becomes a centroid, it should not participate in later centroid searches.

Another mistake is assuming the centroid is unique.

Some trees have two valid centroids.

Either choice works correctly.

A third mistake is recomputing distances with DFS during every query. Centroid decomposition is usually paired with LCA preprocessing so that distances remain efficient.

Finally, many implementations confuse:

Original tree parent

with:

Centroid tree parent

These are entirely different structures and must be stored separately.

Takeaway

Centroid Decomposition recursively partitions a tree into balanced components. The resulting centroid tree has logarithmic height, allowing many global tree problems to be transformed into logarithmic-time queries and updates. The technique is particularly effective for distance-based queries, dynamic marked-node problems, and path-counting tasks. More broadly, centroid decomposition demonstrates how balanced divide-and-conquer strategies can reveal efficient solutions to problems that appear inherently global.