TAOCP 2.3.4.5 Exercise 12

Let a binary tree have $n$ nodes and internal path length $I$.

Section 2.3.4.5: Path Length

Exercise 12. [**] [M20] Suppose that a node has been chosen at random in a binary tree, with each node equally likely. Show that the average size of the subtree rooted at that node is related to the path length of the tree.

Verified: yes
Solve time: 45s


Let a binary tree have $n$ nodes and internal path length $I$. Denote by $s(v)$ the size of the subtree rooted at node $v$, that is, the number of nodes in the subtree including $v$ itself. Selecting a node uniformly at random corresponds to taking the expected value

$$ \mathbb{E}[,s(v),] = \frac{1}{n}\sum_{v\in T} s(v), $$

where $T$ is the set of all nodes. Observe that each node $u$ of the tree belongs to exactly $s(u)$ subtrees: it is counted once for each ancestor, including itself, because $u$ is contained in the subtree rooted at every node along the path from the root to $u$. Equivalently, the sum of the subtree sizes over all nodes is the sum of the depths of all nodes, plus one for each node: for each node $u$ at depth $d(u)$, it contributes $1$ to its own subtree and $1$ to each of the $d(u)$ ancestors above it. Therefore

$$ \sum_{v\in T} s(v) = n + I, $$

where $I$ is the internal path length, the sum of depths of all nodes. Dividing by $n$ gives the average subtree size

$$ \mathbb{E}[,s(v),] = 1 + \frac{I}{n}. $$

This shows that the expected size of a subtree rooted at a randomly chosen node is equal to one plus the average depth of the nodes, which is exactly $1 + I/n$.

This completes the proof.