TAOCP 2.3.4.1 Exercise 6

Process the edges one at a time, maintaining a set of connected components of the vertices already joined by the selected edges.

Section 2.3.4.1: Free Trees

Exercise 6. [**] $$M27$$ Suppose a graph has $n$ vertices $V_1,\ldots,V_n$ and $m$ edges $e_1,\ldots,e_m$. Each edge $e$ is represented by a pair of integers $(a,b)$ if it joins $V_a$ to $V_b$. Design an algorithm that takes the input pairs $(a_1,b_1),\ldots,(a_m,b_m)$ and prints out a subset of edges that forms a free tree; the algorithm reports failure if this is impossible. Strive for an efficient algorithm.

Verified: yes
Solve time: 37s


Process the edges one at a time, maintaining a set of connected components of the vertices already joined by the selected edges. Initially each vertex forms a component by itself, and no edge has been selected. When the edge $(a_i,b_i)$ is examined, determine whether $V_{a_i}$ and $V_{b_i}$ lie in different components. If they do, print $e_i$ and merge the two components; if they lie in the same component, discard $e_i$. After all $m$ edges have been processed, report failure if more than one component remains.

The selected edges contain no cycle. Indeed, if an edge joins two vertices already in the same component, there is already a path between them, so adding that edge would create a cycle; such edges are discarded. Conversely, every printed edge joins two previously distinct components, so no cycle is introduced. If the algorithm terminates with a single component, the printed edges form a connected graph with no cycles, hence a free tree by Theorem A(a). Since each printed edge reduces the number of components by $1$, starting from $n$ components and ending with $1$, exactly $n-1$ edges are printed. If the algorithm terminates with more than one component, the original graph is disconnected, therefore no free tree spanning all vertices can exist.

For efficiency, represent the components by the union-find method. Then each edge requires two FIND operations and, when selected, one UNION operation, giving a running time essentially proportional to $m$. The storage requirement is proportional to $n$.