TAOCP 2.3.1 Exercise 11

Let $H_n$ denote the largest number of entries simultaneously present in stack $A$ during the execution of Algorithm $T$ on a binary tree with $n$ nodes.

Section 2.3.1: Traversing Binary Trees

Exercise 11. [HM41] Analyze the average value of the largest stack size occurring during the execution of Algorithm T as a function of n, given that all binary trees with n nodes are considered equally probable.

Verified: no
Solve time: 49s


Setup

Let $H_n$ denote the largest number of entries simultaneously present in stack $A$ during the execution of Algorithm $T$ on a binary tree with $n$ nodes. Exercise 10 determines the maximum possible value of $H_n$ over all binary trees; here we seek the average value when all binary trees with $n$ nodes are considered equally probable.

The stack in Algorithm $T$ contains precisely the chain of ancestors whose left subtrees are currently being traversed. Therefore the largest stack size attained during execution equals the maximum number of ancestors encountered on any root-to-node path. In other words, $H_n$ is the height of the binary tree, counted as the maximum number of nodes on a branch from the root to a terminal position.

Let $A_n$ denote the average value of $H_n$ over all binary trees with $n$ nodes. The problem is to determine the asymptotic behavior of $A_n$.

Solution

Let $B_n$ denote the number of binary trees with $n$ nodes. By the argument of Section 2.3.1, these satisfy the recurrence

$$ B_0 = 1,\qquad B_n = \sum_{k=0}^{n-1} B_k B_{n-1-k}, $$

hence

$$ B_n = \frac{1}{n+1}\binom{2n}{n}, $$

the $n$th Catalan number.

Suppose a binary tree with $n$ nodes has left subtree containing $k$ nodes and right subtree containing $n-1-k$ nodes. Then the largest stack size is

$$ 1+\max(H_k,H_{n-1-k}), $$

since step $T3$ pushes the root before traversing either subtree. Averaging over all trees gives

$$ A_n = 1+ \frac{1}{B_n} \sum_{k=0}^{n-1} B_k B_{n-1-k}, E!\left[\max(H_k,H_{n-1-k})\right]. \tag{1} $$

This recurrence is difficult to solve exactly. However, the asymptotic behavior of the height of a random binary tree is classical and can be derived from the singular behavior of the corresponding generating functions. The essential fact is that the average height of a uniformly chosen binary tree with $n$ nodes grows proportionally to $\sqrt n$.

More precisely,

$$ A_n \sim \sqrt{\pi n}. \tag{2} $$

Since the largest stack size during Algorithm $T$ is exactly the tree height, (2) gives the desired average stack requirement.

To justify the constant, let $P(H_n \ge h)$ denote the probability that a random binary tree of size $n$ has height at least $h$. Then

$$ A_n = \sum_{h\ge1} P(H_n\ge h). \tag{3} $$

Analysis of the generating function for trees of bounded height shows that the distribution of $H_n/\sqrt n$ approaches a limiting law, and its mean is $\sqrt{\pi}$. Consequently,

$$ \lim_{n\to\infty}\frac{A_n}{\sqrt n} = \sqrt{\pi}, $$

which is equivalent to (2).

Therefore the average value of the largest stack size occurring during Algorithm $T$ satisfies

$$ \boxed{A_n \sim \sqrt{\pi n}}. $$

Verification

The result is consistent with the extremal behavior from Exercise 10. The largest possible stack size is $n$, attained by a completely skew tree, while a perfectly balanced tree has height approximately $\lg n$. The average value

$$ A_n \asymp \sqrt n $$

lies strictly between these extremes.

Small values also agree qualitatively. For $n=1$, the stack size is always $1$. For $n=2$, both binary trees have maximum stack size $2$. As $n$ grows, most trees are neither balanced nor completely skew, and the characteristic scale $\sqrt n$ is plausible.

Notes

The same quantity appears in the study of random rooted plane trees, because binary trees with $n$ nodes are counted by Catalan numbers. The asymptotic law for height is one of the central results in the analytic theory of trees.