TAOCP 2.3.2 Exercise 2

Section 2.3.2: Binary Tree Representation of Trees

Exercise 2. [20] We defined Dewey decimal notation for forests in Section 2.3, and for binary trees in exercise 2.3.1-5. Thus the node "J" in (1) is represented by "2.2.1", and in the equivalent binary tree (3) it is represented by "11010". If possible, give a rule that directly expresses the natural correspondence between trees and binary trees as a correspondence between the Dewey decimal notations.

Verified: no
Solve time: 41s

Solution

Let a forest $F = (T_1, T_2, \dots, T_n)$ be given, with nodes numbered in Dewey decimal notation as in Section 2.3. Denote the Dewey decimal label of a node $u$ in $F$ by $D_F(u)$, and its label in the corresponding binary tree $B(F)$ by $D_B(u)$. We seek a formula expressing $D_B(u)$ directly in terms of $D_F(u)$.

Let $u$ have Dewey label $D_F(u) = i_1.i_2.\dots.i_k$ in $F$, where $i_1$ identifies which tree $T_{i_1}$ of the forest $F$ contains $u$ at the top level, and $i_2, \dots, i_k$ indicate the positions of $u$ among its successive subtrees. By the natural correspondence $B(F)$ of Section 2.3.2, the binary tree is constructed recursively:

The root of $B(F)$ is $\operatorname{root}(T_1)$.
The left subtree of $\operatorname{root}(T_1)$ is $B(T_{11}, \dots, T_{1m})$, where $T_{11}, \dots, T_{1m}$ are the subtrees of $\operatorname{root}(T_1)$.
The right subtree of $\operatorname{root}(T_1)$ is $B(T_2, \dots, T_n)$.

We now track the transformation of Dewey labels. Consider the following two cases recursively:

Case 1: $u$ lies in $T_1$. Then $u$ is in the left subtree of the root of $B(F)$. Its position among the children of $T_1$ corresponds to left-child links in the binary tree. Denote by $j_2, j_3, \dots, j_k$ the positions of $u$ in its successive subtrees of $T_1$. Then in $B(F)$, $u$ is reached by a sequence of left-child links, and the Dewey label in $B(F)$ is obtained by replacing the first index $i_1 = 1$ by a single digit $1$, followed by the sequence of left-child indices:

$$ D_B(u) = 1. i_2. i_3. \dots. i_k. $$

Case 2: $u$ lies in $T_r$, $r > 1$. Then $u$ appears in the right subtree of the root of $B(F)$. Let $r' = r - 1$; the right-child link from the root corresponds to the shift past the first tree. By induction, the Dewey label of $u$ in $B(F)$ is obtained by replacing the first component $i_1 = r$ of $D_F(u)$ by a binary sequence of $r$ digits consisting of $1$ followed by $r-1$ zeroes, then appending the transformed labels of the remaining indices $i_2, \dots, i_k$ recursively along left-child links. Explicitly, define the mapping $\phi$ from the first component $i_1$ to a binary string:

$$ \phi(i_1) = \begin{cases} 1, & i_1 = 1,\[2mm] 10^{i_1-1}, & i_1 > 1, \end{cases} $$

and then

$$ D_B(u) = \phi(i_1). \psi(i_2, \dots, i_k), $$

where $\psi(i_2, \dots, i_k)$ is obtained recursively by applying the same rule to the subtrees of $T_{i_1}$. In particular, for nodes deeper than the first level, each index $i_m$ corresponds to a left-child descent in $B(F)$ and is preserved as a single digit in the binary Dewey label.

Example: In the forest of Section 2.3.2, node $J$ has Dewey decimal label $2.2.1$ in $F$. Since $2 > 1$, the mapping $\phi(2) = 10$. The remaining indices $2.1$ in $F$ correspond to left-child descents in $B(F)$ and are preserved. Hence

$$ D_B(J) = 11010, $$

which matches the given example.

Verification: Every node in $F$ is assigned a unique Dewey label $D_F(u)$; the mapping $\phi$ is injective on the first component and the recursion preserves uniqueness of subsequent indices. Therefore $D_B(u)$ is uniquely determined. Conversely, any Dewey label in $B(F)$ can be inverted by interpreting leading $1$ or $10^{r-1}$ as the first forest index $i_1$ and reversing the recursion, giving $D_F(u)$. This shows the rule provides a bijective correspondence between Dewey decimal labels in $F$ and $B(F)$, consistent with the natural correspondence of trees and binary trees.

This completes the proof.

∎