TAOCP 2.2.6 Exercise 24

Section 2.2.6: Arrays and Orthogonal Lists

Exercise 24. ▶ [**] [25] (The sparse array trick.) Suppose you want to use a large array for random access, although you won't actually be referring to very many of its entries. You want A[k] to be zero the first time you access it, yet you don't want to spend the time to set every location to zero. Explain how it is possible to read and write any desired elements A[k] reliably, given $k$, without assuming anything about the actual initial memory contents, by doing only a small fixed number of additional operations per array access.

We now turn to a study of trees, the most important nonlinear structures that arise in computer algorithms. Generally speaking, tree structure means a "branching" relationship between nodes, much like that found in the trees of nature.

Let us define a tree formally as a finite set $T$ of one or more nodes such that

a) there is one specially designated node called the root of the tree, $\operatorname{root}(T)$; and

b) the remaining nodes (excluding the root) are partitioned into $m \ge 0$ disjoint sets $T_1, \ldots, T_m$, and each of these sets in turn is a tree. The trees $T_1, \ldots, T_m$ are called the subtrees of the root.

The definition just given is recursive: We have defined a tree in terms of trees. Of course, there is no problem of circularity involved here, since trees with one node must consist of only the root, and trees with $n > 1$ nodes are defined in terms of trees with fewer than $n$ nodes; hence the concept of a tree with two nodes, three nodes, or ultimately any number of nodes, is determined by the definition given. There are nonrecursive ways to define trees (for example, see exercises 10, 12, and 14, and Section 2.3.4), but a recursive definition seems most appropriate since recursion is an innate characteristic of tree structures. The recursive character of trees is present also in nature, since buds on young trees eventually grow into subtrees with buds of their own, and so on. Exercise 3 illustrates how to give rigorous proofs of important facts about trees based on a recursive definition such as the one above, by using induction on the number of nodes in a tree.

It follows from our definition that every node of a tree is the root of some subtree contained in the whole tree. The number of subtrees of a node is called the degree of that node. A node of degree zero is called a terminal node, or sometimes a leaf. A nonterminal node is often called a branch node. The level of a node with respect to $T$ is defined recursively: The level of $\operatorname{root}(T)$ is zero, and the level of any other node is one higher than that node's level with respect to the subtree of $\operatorname{root}(T)$ containing it.

These concepts are illustrated in Fig. 15, which shows a tree with seven nodes. The root is A, and it has the two subtrees ${B}$ and ${C,D,E,F,G}$. The tree ${C,D,E,F,G}$ has node C as its root. Node C is on level 1 with respect to the whole tree, and it has three subtrees ${D}$, ${E}$, and ${F,G}$; therefore C has degree 3. The terminal nodes in Fig. 15 are B, D, E, and G; F is the only node with degree 1; G is the only node with level 3.

If the relative order of the subtrees $T_1, \ldots, T_m$ in (b) of the definition is important, we say that the tree is an ordered tree; when $m \ge 2$ in an ordered tree, it makes sense to call $T_2$ the "second subtree" of the root, etc. Ordered trees are also called "plane trees" by some authors, since the manner of embedding the tree in a plane is relevant. If we do not care to regard two trees as different when they differ only in the respective ordering of subtrees of nodes, the tree is said to be oriented, since only the relative orientation of the nodes, not their

Verified: no
Solve time: 37s

Maintain three auxiliary arrays:

$$ A[,],\qquad B[,],\qquad C[,], $$

together with a counter $t$, initially $0$. The array $A$ contains the desired values; $B$ and $C$ are used to determine whether a location has ever been assigned. The invariant is:

$$ A[k]\text{ is defined} \iff 1 \le B[k] \le t \text{ and } C[B[k]]=k. $$

To write a value $x$ into $A[k]$, first test whether $k$ is already defined by the above condition. If not, increase $t$ by $1$, set

$$ B[k]\leftarrow t,\qquad C[t]\leftarrow k, $$

and then set

$$ A[k]\leftarrow x. $$

To read $A[k]$, test the same condition. If it holds, return the stored value $A[k]$; otherwise return $0$.

Correctness follows from the invariant. Whenever a new location $k$ is first written, the pair of assignments $B[k]\leftarrow t$, $C[t]\leftarrow k$ makes the condition true for that $k$ and for no other index, since $C$ records exactly the indices that have been initialized. If a location has never been written, arbitrary initial memory contents cannot accidentally satisfy the test unless both $B[k]$ points to a valid position and $C[B[k]]=k$; this can happen only for initialized entries. Thus every unread location behaves as though initially zero, and each access requires only a fixed number of extra operations.