TAOCP 2.3.4.4 Exercise 18

Section 2.3.4.4: Enumeration of Trees

Exercise 18. [**] [24] Show that the following method is another way to define a one-to-one correspondence between $(n-1)$-tuples of numbers from 1 to $n$ and oriented trees with $n$ labeled vertices: Let the leaves of the tree be $V_1,\ldots,V_k$ in ascending order. Let $(V_1,V_{k+1},V_{k+2},\ldots,V_q)$ be the path from $V_1$ to the root, and write down the vertices $V_q,\ldots,V_{k+2},V_{k+1}$. Then let $(V_2,V_{q+1},V_{q+2},\ldots,V_r)$ be the shortest oriented path from $V_2$ such that $V_r$ has already been written down, and write down $V_r,\ldots,V_{q+2},V_{q+1}$. Then let $(V_3,V_{r+1},\ldots,V_s)$ be the shortest oriented path from $V_3$ such that $V_s$ has already been written, and write $V_s,\ldots,V_{r+1}$; and so on. For example, the tree (17) would be encoded as $3,1,3,3,5,10,5,10,1$. Show that this process is reversible, and in particular, draw the oriented tree with vertices ${1,2,\ldots,10}$ and representation $3,1,4,1,5,9,2,6,5$.

Verified: no
Solve time: 3m17s

Let the vertices be labeled $1,\dots,n$, and let every edge be oriented toward the root.

We first reformulate the encoding in a way that makes inversion transparent.

Suppose the leaves, in increasing order, are

$$ V_1<V_2<\cdots<V_k . $$

For each leaf $V_i$, let $P_i$ be the shortest oriented path from $V_i$ to a vertex already reached earlier. For $i=1$, the previously reached vertex is the root. Thus

$$ P_i=(V_i,x_1,x_2,\dots,x_t,y), $$

where $y$ is the first vertex already present in the part of the tree constructed from earlier leaves.

The encoding writes

$$ y,x_t,\dots,x_2,x_1. $$

Observe that every nonleaf vertex is written exactly once, namely when it first appears on one of these paths. In addition, for each leaf except $V_1$, the attachment vertex $y$ is written once more at the beginning of the corresponding block.

Since a tree with $n$ vertices and $k$ leaves has $n-k$ nonleaves, the total number of symbols written is

$$ (n-k)+(k-1)=n-1. $$

Hence the code has length $n-1$.

The inverse construction

Let

$$ a_1,a_2,\dots,a_{n-1} $$

be a code.

The key observation is that the leaves can be identified directly from the code.

Every nonleaf vertex appears somewhere in the code, because it is first discovered on one of the paths $P_i$.

A leaf never appears as a newly discovered vertex on such a path. Therefore the leaves are precisely the labels from ${1,\dots,n}$ that do not occur in the code.

Let those missing labels be

$$ V_1<V_2<\cdots<V_k . $$

These are exactly the leaves in the order used by the encoder.

Now process the code from left to right.

Maintain a set $T$ of vertices already reconstructed.

First block

The first leaf is $V_1$.

Beginning at $a_1$, read symbols until every vertex encountered has been seen for the first time.

Suppose this initial segment is

$$ a_1,a_2,\dots,a_m, $$

and $a_{m+1}$ (if it exists) is the first symbol that has already occurred among $a_1,\dots,a_m$.

Then

$$ V_1\to a_m\to a_{m-1}\to\cdots\to a_1 $$

is exactly the path from $V_1$ to the root.

Indeed, during the first step of the encoding every symbol written corresponds to a vertex being discovered for the first time, so repetitions cannot occur inside the first block.

Add these vertices to $T$.

Subsequent blocks

Suppose leaves $V_1,\dots,V_{i-1}$ have already been reconstructed.

The next unread symbol is the attachment vertex $y$, which is already in $T$.

Continue reading until just before the next occurrence of a vertex already seen for the first time in the current block, or until the code ends.

If the block is

$$ y,b_1,b_2,\dots,b_t, $$

then $b_1,\dots,b_t$ are new vertices, and the corresponding path is

$$ V_i\to b_t\to b_{t-1}\to\cdots\to b_1\to y. $$

Add the new vertices $b_j$ to $T$.

This recovers exactly the path that produced the block. Since every block introduces all previously unseen vertices on one leaf-to-tree path, the reconstruction is unique. Therefore the encoding is reversible, giving a bijection between $(n-1)$-tuples and rooted oriented trees on $n$ labeled vertices.

Decoding $3,1,4,1,5,9,2,6,5$

Here $n=10$.

The labels appearing in the code are

$$ {1,2,3,4,5,6,9}. $$

Hence the missing labels are

$$ 7,8,10. $$

Therefore the leaves are

$$ V_1=7,\qquad V_2=8,\qquad V_3=10. $$

First block

Starting from the beginning of the code,

$$ 3,1,4 $$

are all first occurrences. The next symbol is $1$, which has already appeared.

Therefore the first block is

$$ 3,1,4. $$

This yields the path

$$ 7\to4\to1\to3. $$

Hence

$$ 4\to1,\qquad 1\to3. $$

The root is $3$.

The reconstructed set is currently

$$ T={1,3,4}. $$

Second block

The next unread symbol is

$$ 1. $$

Since $1\in T$, it is the attachment vertex.

Continue reading:

$$ 1,5,9,2,6. $$

The next symbol is $5$, already encountered in this block, so the block ends.

Thus the path is

$$ 8\to6\to2\to9\to5\to1. $$

Therefore

$$ 6\to2,\qquad 2\to9,\qquad 9\to5,\qquad 5\to1. $$

Now

$$ T={1,2,3,4,5,6,9}. $$

Third block

The remaining code is

$$ 5. $$

Since $5\in T$, this is an attachment vertex and there are no new internal vertices.

Hence

$$ 10\to5. $$

The reconstructed tree

Collecting all parent relations:

$$ 4\to1,\qquad 1\to3, $$

$$ 5\to1,\qquad 9\to5,\qquad 2\to9,\qquad 6\to2, $$

$$ 7\to4,\qquad 8\to6,\qquad 10\to5. $$

The root is $3$.

In rooted-tree form:

$$ 3 $$

$$ \phantom{3}\uparrow $$

$$ 1 $$

with children $4$ and $5$,

$$ 4 $$

having child $7$,

and

$$ 5 $$

having children $9$ and $10$,

while

$$ 9 $$

has child $2$,

$$ 2 $$

has child $6$,

and

$$ 6 $$

has child $8$.

Equivalently, the oriented edges toward the root are

$$ 7\to4\to1\to3, $$

$$ 8\to6\to2\to9\to5\to1\to3, $$

$$ 10\to5\to1\to3. $$

Finally, re-encoding this tree gives:

from leaf $7$: $7\to4\to1\to3$, write $3,1,4$;
from leaf $8$: shortest path to an already written vertex is $8\to6\to2\to9\to5\to1$, write $1,5,9,2,6$;
from leaf $10$: shortest path to an already written vertex is $10\to5$, write $5$.

Thus the code is

$$ 3,1,4,1,5,9,2,6,5, $$

exactly as required. Therefore the process is reversible and the decoded tree is the one above.