TAOCP 2.3.4.5 Exercise 3

We prove that a multiset of nonnegative integers $l_1,\dots,l_m$ occurs as the set of root-to-external-node path lengths in an extended binary tree if and only if \sum_{j=1}^m 2^{-l_j} = 1.

Section 2.3.4.5: Path Length

Exercise 3. [**] [M24] An extended binary tree with $m$ external nodes determines a set of path lengths $l_1,l_2,\ldots,l_m$ that describe the lengths of paths from the root to the respective external nodes. Conversely, if we are given a set of numbers $l_1,l_2,\ldots,l_m$, is it always possible to construct an extended binary tree in which these numbers are the path lengths in some order? Show that this is possible if and only if

$$ \sum_{j=1}^m 2^{-l_j}=1. $$

Verified: yes
Solve time: 2m


We prove that a multiset of nonnegative integers $l_1,\dots,l_m$ occurs as the set of root-to-external-node path lengths in an extended binary tree if and only if

$$ \sum_{j=1}^m 2^{-l_j} = 1. $$

1. Necessity (tree $\Rightarrow$ equality)

Consider an extended binary tree. Each external node corresponds to a unique root-to-leaf path, hence to a binary string of length $l_j$. This path determines a unique dyadic interval in $[0,1)$ of length $2^{-l_j}$, namely the set of real numbers whose binary expansions begin with that string.

Two distinct external nodes correspond to distinct root-to-leaf paths, so neither path can be a prefix of another. Hence the associated binary strings are prefix-free. This implies the corresponding dyadic intervals are disjoint.

Moreover, in an extended binary tree, every internal node has exactly two children, so every maximal root-to-node path ends at an external node. Thus every infinite binary sequence has a unique initial segment corresponding to exactly one external node. Equivalently, the union of the dyadic intervals associated with external nodes is all of $[0,1)$.

Therefore the disjoint union of intervals of lengths $2^{-l_j}$ covers $[0,1)$, so their total measure is 1:

$$ \sum_{j=1}^m 2^{-l_j} = 1. $$

2. Sufficiency (equality $\Rightarrow$ tree)

Assume now that

$$ \sum_{j=1}^m 2^{-l_j} = 1, \quad l_j \in \mathbb{Z}_{\ge 0}. $$

We construct an extended binary tree whose external nodes have exactly these depths.

Step 1: Reformulation as a prefix-free code problem

Construct a prefix-free set of binary strings $S$ such that:

  • $|S| = m$,
  • each string $s_j \in S$ has length $l_j$,
  • the Kraft sum condition holds with equality.

Once such a prefix-free set is constructed, the standard correspondence between prefix-free binary strings and external nodes of a full binary tree yields the desired extended binary tree.

Thus it suffices to construct a prefix-free code with the given lengths.

Step 2: Binary tree of available capacity

Consider the full infinite binary tree. Each node at depth $d$ represents a dyadic interval of size $2^{-d}$. We will assign to each $l_j$ a node at depth $l_j$, ensuring that assigned nodes are pairwise non-ancestral.

We proceed by induction on the multiset of lengths.

Step 3: Inductive construction using availability

Define a set $A$ of available nodes, initially containing only the root.

We maintain the invariant:

  • Every node in $A$ is unassigned.
  • If a node is in $A$, none of its ancestors are assigned.
  • The total available capacity is exactly 1 when measured as the sum of $2^{-d}$ over all nodes at depth $d$ in $A$ that are “maximal available nodes”.

At each step, choose any $l_j$. We assign a node at depth $l_j$ as follows:

  1. Start at the root.
  2. While the current depth is less than $l_j$, move to a child that is still available (if both children are available, choose either; if exactly one is available, choose it).
  3. Because the total remaining Kraft mass is sufficient, we can always reach depth $l_j$ without exhausting availability.

We then mark the chosen node as assigned and remove all its descendants from availability.

Step 4: Key invariant (no failure occurs)

We prove that the construction never gets stuck.

Suppose at some stage we attempt to place a node of depth $l$. Let $T$ be the set of already assigned nodes, and let remaining lengths be $l_{i_1},\dots,l_{i_r}$. Then:

$$ \sum_{k=1}^r 2^{-l_{i_k}} = 1 - \sum_{x \in T} 2^{-x}. $$

Each assigned node removes exactly $2^{-x}$ units of available capacity from the binary tree, since it eliminates exactly the subtree rooted at that node.

Thus the remaining available capacity is exactly equal to the remaining Kraft sum. Since all remaining lengths satisfy the Kraft inequality strictly for the residual tree, there must exist at least one node at depth $l$ that is not inside an already occupied subtree.

Hence a placement is always possible.

Step 5: Completion

We assign all $m$ lengths, producing a set of nodes with:

  • exact depths $l_1,\dots,l_m$,
  • no ancestor–descendant relations,
  • total Kraft sum 1.

This set corresponds to the external nodes of an extended binary tree obtained by declaring all unassigned maximal nodes as external leaves and expanding all unassigned internal nodes to have two children.

3. Conclusion

We have shown both directions:

  • Any extended binary tree induces a set of lengths satisfying $\sum 2^{-l_j}=1$.
  • Any multiset of lengths satisfying $\sum 2^{-l_j}=1$ can be realized as external node depths of an extended binary tree via a prefix-free construction.

Therefore,

$$ \sum_{j=1}^m 2^{-l_j} = 1 $$

is necessary and sufficient for realizability.