TAOCP 2.3.2 Exercise 10

Let $F$ and $F'$ be forests whose nodes in preorder are $u_1, u_2, \dots, u_n$ and $u'_1, u'_2, \dots, u'_{n'}$, respectively.

Section 2.3.2: Binary Tree Representation of Trees

Exercise 10. [M23] Let F be a forest of trees whose nodes in preorder are u_1, u_2, ..., u_n, and let F' be a forest whose nodes in preorder are u'_1, u'_2, ..., u'_{n'}. Let d(u) denote the degree (the number of children) of node u. In terms of these ideas, formulate and prove a theorem analogous to Theorem 2.3.1A.

Verified: yes
Solve time: 1m05s


Solution

Let $F$ and $F'$ be forests whose nodes in preorder are $u_1, u_2, \dots, u_n$ and $u'_1, u'2, \dots, u'{n'}$, respectively. Let $d(u)$ denote the degree of a node $u$ in its forest, that is, the number of children of $u$. We seek a theorem analogous to Theorem 2.3.1A, which gave a criterion for comparing two binary trees lexicographically in terms of preorder sequences and degrees of nodes.

We first define a relation $\prec$ on forests as follows. Let $F \prec F'$ if either

  1. $F$ is empty and $F'$ is nonempty, or
  2. $F$ and $F'$ are both nonempty, say $F = (T_1, T_2, \dots, T_n)$ and $F' = (T'_1, T'2, \dots, T'{n'})$, and one of the following holds:

(a) $\operatorname{root}(T_1) \prec \operatorname{root}(T'_1)$ according to the ordering of node information;

(b) $\operatorname{root}(T_1) = \operatorname{root}(T'_1)$ and $d(\operatorname{root}(T_1)) < d(\operatorname{root}(T'_1))$;

(c) $\operatorname{root}(T_1) = \operatorname{root}(T'_1)$, $d(\operatorname{root}(T_1)) = d(\operatorname{root}(T'_1))$, and $(\text{subforest of children of } T_1) \prec (\text{subforest of children of } T'_1)$;

(d) $T_1 = T'_1$ and $(T_2, \dots, T_n) \prec (T'2, \dots, T'{n'})$.

This definition recursively compares the roots of the first trees, their degrees, their subforests of children in preorder, and finally the remaining trees in the forest. It reduces to the binary-tree ordering of Section 2.3.1 when forests consist of single trees.

We now state the theorem precisely.

Theorem (Lexicographic Ordering of Forests). Let $F$ and $F'$ be forests with preorder sequences $u_1, u_2, \dots, u_n$ and $u'_1, u'2, \dots, u'{n'}$, and let $d(u)$ denote the degree of node $u$. Then $F \prec F'$ if and only if there exists an index $k$ such that $u_i = u'_i$ and $d(u_i) = d(u'_i)$ for all $i < k$, and either

  1. $k \le n$ and $k \le n'$ and $u_k \prec u'_k$, or
  2. $k > n$ and $k \le n'$, so that $F$ is exhausted while $F'$ is not, or
  3. $k \le n$ and $k > n'$, so that $F'$ is exhausted while $F$ is not, in which case $F' \prec F$.

Equivalently, two forests are compared by scanning their nodes in preorder, comparing the information in corresponding nodes; if the information is equal, the degrees are compared; if both information and degrees are equal, the comparison continues recursively among the children of the nodes, followed by the remaining trees in the forest.

Proof. The proof proceeds by induction on the number of nodes in the forests.

Base case. If $F$ or $F'$ is empty, the definition of $\prec$ immediately gives the correct ordering, since the empty forest is considered smaller than any nonempty forest.

Inductive step. Suppose $F = (T_1, T_2, \dots, T_n)$ and $F' = (T'_1, T'2, \dots, T'{n'})$ are both nonempty. Let $u_1 = \operatorname{root}(T_1)$ and $u'_1 = \operatorname{root}(T'_1)$. There are three possibilities:

  1. $u_1 \prec u'_1$. Then $F \prec F'$ by the first clause of the definition.
  2. $u_1 = u'_1$ but $d(u_1) \ne d(u'_1)$. Then $F \prec F'$ if $d(u_1) < d(u'_1)$ and $F' \prec F$ if $d(u_1) > d(u'_1)$.
  3. $u_1 = u'1$ and $d(u_1) = d(u'1)$. Let $C = (T{11}, \dots, T{1m})$ and $C' = (T'{11}, \dots, T'{1m'})$ be the subforests of children of $T_1$ and $T'_1$. By the inductive hypothesis, $C \prec C'$ if and only if the preorder sequences of $C$ and $C'$ satisfy the stated lexicographic condition. If $C = C'$, then the comparison reduces to the remaining trees $(T_2, \dots, T_n)$ and $(T'2, \dots, T'{n'})$, and again the inductive hypothesis applies.

In all cases, the theorem correctly describes the lexicographic ordering of forests in terms of preorder sequences and degrees. The recursive nature of the definition ensures that nodes are compared in preorder, their degrees are compared when node information is equal, and their subforests are compared recursively, exactly as required.

This completes the proof.

The theorem provides a direct analogue of Theorem 2.3.1A for forests, generalizing the binary-tree comparison to arbitrarily many trees in a forest. The key quantities are the preorder sequence of nodes and the degrees of the nodes, which together determine a lexicographic ordering of forests.