TAOCP 2.3.4.5 Exercise 5

The original solution fails because it treats the internal path length as if it were a simple additive marking that factors independently over subtrees.

Section 2.3.4.5: Path Length

Exercise 5. [**] [HM26] Let

$$ B(w,z)=\sum_{n,p\ge 0} b_{np}w^pz^n, $$

where $b_{np}$ is the number of binary trees with $n$ nodes and internal path length $p$. [Thus,

$$ B(w,z)=1+z+2wz^2+(w^2+4w^3)z^3+(4w^4+2w^5+8w^6)z^4+\cdots; $$

$B(1,z)$ is the function $B(z)$ of Eq. (13) in Section 2.3.4.4.]

a) Find a functional relation that characterizes $B(w,z)$, generalizing Eq. (12) of Section 2.3.4.4.

b) Use the result of part (a) to determine the average internal path length of binary trees with $n$ nodes, assuming that each of the $\frac{1}{n+1}{2n\choose n}$ trees is equally probable.

c) Find the asymptotic value of the answer to part (b), as $n\to\infty$.

Verified: yes
Solve time: 2m33s


The original solution fails because it treats the internal path length as if it were a simple additive marking that factors independently over subtrees. The key correction is that depth increases inside subtrees, so the weight transformation is size dependent, not a constant multiplicative factor.

a) Correct functional equation for $B(w,z)$

Let

$$ B(w,z)=\sum_{T} w^{P(T)} z^{|T|}, $$

where $|T|$ is the number of nodes and $P(T)$ is the internal path length.

A binary tree is either empty or consists of a root with a left and right subtree $T_1,T_2$.

For nonempty trees:

  • Size: $|T| = 1 + |T_1| + |T_2|$
  • Internal path length:

$$ P(T)=P(T_1)+P(T_2)+|T_1|+|T_2|. $$

The last term appears because every node in each subtree increases its depth by 1.

Now translate this into generating functions:

Each subtree contributes

$$ w^{P(T_i)+|T_i|} z^{|T_i|} = (wz)^{|T_i|} w^{P(T_i)}. $$

So each subtree is counted by $B(w, wz)$.

Therefore the decomposition gives:

$$ B(w,z)=1 + z, B(w, wz)^2. $$

This is the correct functional equation.

b) Average internal path length

Define

$$ C(z)=B(1,z). $$

Then the functional equation becomes the standard Catalan equation:

$$ C(z)=1+zC(z)^2, $$

so

$$ C(z)=\frac{1-\sqrt{1-4z}}{2z}. $$

Let

$$ G(z)=\left.\frac{\partial}{\partial w}B(w,z)\right|_{w=1}. $$

Then $G(z)$ is the generating function of total internal path length over all binary trees.

Differentiate

$$ B(w,z)=1+zB(w,wz)^2. $$

Using the chain rule:

$$ \frac{\partial}{\partial w}B(w,wz) = B_w(w,wz) + z B_z(w,wz). $$

So:

$$ B_w(w,z)=2z,B(w,wz)\bigl(B_w(w,wz)+zB_z(w,wz)\bigr). $$

Now set $w=1$. Write $C(z)=B(1,z)$, $G(z)=B_w(1,z)$. Then:

$$ G(z)=2zC(z)\bigl(G(z)+zC'(z)\bigr). $$

Solve for $G(z)$:

$$ G(z)\bigl(1-2zC(z)\bigr)=2z^2 C(z)C'(z). $$

Use the identity from $C=1+zC^2$:

$$ 1-2zC(z)=\sqrt{1-4z}. $$

Hence:

$$ G(z)=\frac{2z^2 C(z)C'(z)}{\sqrt{1-4z}}. $$

Asymptotics of $G(z)$

Near $z=\tfrac14$:

  • $C(z)\to 2$
  • $C'(z)\sim \frac{1}{z}(1-4z)^{-1/2}$

Substitute into $G(z)$:

$$ G(z)\sim \frac{2z^2\cdot 2 \cdot (1/z)(1-4z)^{-1/2}}{\sqrt{1-4z}} \sim \frac{1}{1-4z}. $$

Thus:

$$ [z^n]G(z)\sim 4^n. $$

So the total internal path length over all Catalan trees with $n$ nodes satisfies:

$$ P_n \sim 4^n. $$

The number of such trees is the Catalan number:

$$ C_n \sim \frac{4^n}{\sqrt{\pi},n^{3/2}}. $$

Therefore the expected internal path length is:

$$ \mathbb{E}[P_n]=\frac{P_n}{C_n} \sim \sqrt{\pi},n^{3/2}. $$

c) Asymptotic behavior

$$ \boxed{\mathbb{E}[P_n]\sim \sqrt{\pi},n^{3/2}}. $$

Final answers

a)

$$ \boxed{B(w,z)=1+zB(w,wz)^2} $$

b)

$$ \boxed{\mathbb{E}[P_n]\sim \sqrt{\pi},n^{3/2}} $$

c)

$$ \boxed{\mathbb{E}[P_n]\sim \sqrt{\pi},n^{3/2}\quad (n\to\infty)} $$