# Chapter 32. Prufer Sequences

# Chapter 32. Prufer Sequences

A Prufer sequence is a code for a labeled tree.

It converts a tree on labeled vertices into a finite sequence of labels. This sequence determines the tree uniquely. Conversely, every sequence of the correct length determines exactly one labeled tree.

Prufer sequences are important because they give a clean proof of Cayley's formula, which states that there are \(n^{n-2}\) labeled trees on \(n\) vertices. They also show how vertex degrees are encoded in a labeled tree.

## 32.1 Labeled Trees

A labeled tree is a tree whose vertices have distinct labels.

Usually the labels are

$$
1,2,\ldots,n.
$$

The labels are part of the structure. Two trees may have the same shape but differ as labeled trees if the labels are assigned differently.

For example, the path

$$
1-2-3
$$

and the path

$$
1-3-2
$$

have the same unlabelled shape, but they are different labeled trees.

Prufer sequences apply to labeled trees. They do not directly encode unlabelled trees, because the labels determine the entries of the sequence.

## 32.2 Definition

Let \(T\) be a labeled tree with vertex set

$$
\{1,2,\ldots,n\}.
$$

The Prufer sequence of \(T\) is a sequence of length

$$
n-2
$$

formed by the following procedure.

At each step:

1. Find the leaf with the smallest label.
2. Record the label of its unique neighbor.
3. Remove that leaf from the tree.
4. Continue until only two vertices remain.

The recorded labels form the Prufer sequence of the tree.

If \(n=2\), the Prufer sequence has length \(0\). Thus the unique tree on two labeled vertices corresponds to the empty sequence.

## 32.3 Example

Consider the labeled tree with vertices

$$
\{1,2,3,4,5\}
$$

and edges

$$
\{1,3\},\{2,3\},\{3,4\},\{4,5\}.
$$

The tree has the form

$$
1-3-4-5
$$

with vertex \(2\) also attached to vertex \(3\).

We compute its Prufer sequence.

At the first step, the leaves are

$$
1,2,5.
$$

The smallest leaf is \(1\). Its neighbor is \(3\), so we record

$$
3.
$$

Remove vertex \(1\).

The remaining leaves are

$$
2,5.
$$

The smallest leaf is \(2\). Its neighbor is \(3\), so we record

$$
3.
$$

Remove vertex \(2\).

The remaining tree has edges

$$
\{3,4\},\{4,5\}.
$$

The leaves are

$$
3,5.
$$

The smallest leaf is \(3\). Its neighbor is \(4\), so we record

$$
4.
$$

Remove vertex \(3\). Only vertices \(4\) and \(5\) remain.

Therefore the Prufer sequence is

$$
(3,3,4).
$$

Since \(n=5\), the sequence has length

$$
5-2=3.
$$

## 32.4 Decoding a Prufer Sequence

A Prufer sequence can be converted back into a labeled tree.

Let

$$
P=(p_1,p_2,\ldots,p_{n-2})
$$

be a sequence whose entries belong to

$$
\{1,2,\ldots,n\}.
$$

The decoding algorithm constructs the unique tree with Prufer sequence \(P\).

At each step:

1. Find the smallest label that does not appear in the current sequence.
2. Connect that label to the first entry of the sequence.
3. Remove the first entry from the sequence.
4. Remove the chosen label from further consideration.
5. Continue until the sequence is empty.
6. Connect the two remaining labels.

This algorithm is inverse to the encoding procedure.

## 32.5 Decoding Example

Decode the sequence

$$
(3,3,4)
$$

on the vertex set

$$
\{1,2,3,4,5\}.
$$

The current sequence is

$$
(3,3,4).
$$

The labels not appearing in the sequence are

$$
1,2,5.
$$

The smallest is \(1\). Connect \(1\) to the first sequence entry \(3\):

$$
\{1,3\}.
$$

Remove \(1\) from the labels and remove the first \(3\) from the sequence.

The current sequence is

$$
(3,4).
$$

The available labels are

$$
2,3,4,5.
$$

The labels not appearing in the sequence are

$$
2,5.
$$

The smallest is \(2\). Connect \(2\) to \(3\):

$$
\{2,3\}.
$$

Remove \(2\) and remove the first \(3\).

The current sequence is

$$
(4).
$$

The available labels are

$$
3,4,5.
$$

The labels not appearing in the sequence are

$$
3,5.
$$

The smallest is \(3\). Connect \(3\) to \(4\):

$$
\{3,4\}.
$$

Remove \(3\) and remove \(4\) from the sequence.

The sequence is now empty. The remaining labels are

$$
4,5.
$$

Connect them:

$$
\{4,5\}.
$$

The decoded tree has edge set

$$
\{1,3\},\{2,3\},\{3,4\},\{4,5\},
$$

which is the original tree.

## 32.6 Why the Smallest Leaf Is Used

The encoding rule uses the smallest leaf at each step to make the procedure deterministic.

A tree may have several leaves. If one could choose any leaf, the same tree might produce different sequences. The smallest-label rule removes this ambiguity.

The decoding algorithm uses the same convention in reverse. At each step, the smallest label absent from the current sequence is the leaf that must be removed next.

Thus the smallest-label rule gives a canonical code.

## 32.7 The Bijection Theorem

The central theorem is that Prufer sequences and labeled trees are in one-to-one correspondence.

### Theorem 32.1

For \(n \geq 2\), the set of labeled trees on vertex set

$$
\{1,2,\ldots,n\}
$$

is in bijection with the set of sequences of length \(n-2\) whose entries belong to

$$
\{1,2,\ldots,n\}.
$$

### Proof

The encoding procedure maps every labeled tree to a sequence of length \(n-2\). At each of the first \(n-2\) steps, one leaf is removed and one neighbor label is recorded. When two vertices remain, the process stops.

The decoding procedure maps every sequence of length \(n-2\) to a graph on the same \(n\) labels. It adds one edge at each sequence step and one final edge at the end. Hence it creates exactly

$$
(n-2)+1=n-1
$$

edges.

At every decoding step, the chosen vertex is absent from the current sequence, so it will not be needed later as a neighbor forced by the code. It is connected once and then removed from further consideration. The first sequence entry remains available if it appears again later.

The construction never creates a cycle. Each new edge attaches one removed label to a label still available in the construction. Thus each step attaches a new leaf to the remaining partial structure.

After all steps, the two remaining labels are connected. The resulting graph is connected and has \(n-1\) edges. Therefore it is a tree.

Encoding followed by decoding recovers the original tree, because the smallest removed leaf at each encoding step is exactly the smallest available label absent from the remaining code. Decoding followed by encoding similarly recovers the original sequence.

Therefore the two constructions are inverse maps. Hence they define a bijection.

## 32.8 Cayley's Formula

Cayley's formula counts labeled trees.

### Theorem 32.2

The number of labeled trees on \(n\) vertices is

$$
n^{n-2}.
$$

### Proof

By Theorem 32.1, labeled trees on

$$
\{1,2,\ldots,n\}
$$

are in bijection with sequences of length \(n-2\) over the same \(n\) labels.

There are \(n\) choices for each entry of the sequence. Since there are \(n-2\) entries, the number of such sequences is

$$
n^{n-2}.
$$

Therefore the number of labeled trees on \(n\) vertices is

$$
n^{n-2}.
$$

This is Cayley's formula. Prufer sequences give one of its standard bijective proofs.

## 32.9 Degree Formula

Prufer sequences also encode vertex degrees.

### Theorem 32.3

Let \(T\) be a labeled tree with Prufer sequence \(P\). For each vertex \(i\),

$$
\deg_T(i)=1+\#\{j:p_j=i\}.
$$

In words, the degree of a vertex equals one plus the number of times its label appears in the Prufer sequence.

### Proof

Each time a vertex label appears in the Prufer sequence, it is recorded as the neighbor of a leaf being removed. This accounts for one incident edge of that vertex.

A vertex of degree \(d\) has \(d\) incident edges. During the leaf-removal process, all but one of those incident edges are removed while the vertex remains present as a neighbor. Therefore its label is recorded \(d-1\) times.

Eventually, the vertex itself becomes a leaf or remains among the final two vertices. Its last incident edge is not recorded as an occurrence of its own label.

Thus the number of occurrences of \(i\) in the Prufer sequence is

$$
\deg_T(i)-1.
$$

Rearranging gives

$$
\deg_T(i)=1+\#\{j:p_j=i\}.
$$

## 32.10 Leaves and Missing Labels

A vertex is a leaf exactly when its label does not appear in the Prufer sequence.

Indeed, a vertex has degree \(1\) precisely when

$$
\deg_T(i)=1.
$$

By the degree formula, this occurs precisely when \(i\) appears zero times in the sequence.

Thus the leaves of a labeled tree are exactly the labels absent from its Prufer sequence.

This observation explains the decoding algorithm: the smallest label absent from the current sequence must be the smallest current leaf.

## 32.11 Counting Trees with Prescribed Degrees

Prufer sequences count trees with a fixed degree sequence.

Suppose the desired degrees are

$$
d_1,d_2,\ldots,d_n
$$

where each \(d_i \geq 1\) and

$$
d_1+d_2+\cdots+d_n = 2(n-1).
$$

A labeled tree with these degrees must have label \(i\) appearing exactly

$$
d_i-1
$$

times in its Prufer sequence.

Therefore the number of such trees is the multinomial number

$$
\frac{(n-2)!}{(d_1-1)!(d_2-1)!\cdots(d_n-1)!}.
$$

This follows because the Prufer sequence has length \(n-2\), and its entries contain label \(i\) exactly \(d_i-1\) times.

## 32.12 Special Cases

Prufer sequences make several special cases transparent.

### Stars

The star centered at vertex \(c\) has degree

$$
n-1
$$

at \(c\), and degree \(1\) at all other vertices.

Thus its Prufer sequence is

$$
(c,c,\ldots,c),
$$

with \(n-2\) copies of \(c\).

### Paths

A labeled path has two leaves and all other vertices have degree \(2\).

Thus its Prufer sequence contains every internal vertex exactly once and contains no leaf label.

The order of those labels records how the path is assembled by the smallest-leaf rule.

### Trees with Many Leaves

A tree has many leaves exactly when many labels are missing from its Prufer sequence.

Since the sequence has only \(n-2\) positions, at least two labels are always missing or appear zero times in the appropriate degree sense. This agrees with the theorem that every finite tree with at least two vertices has at least two leaves.

## 32.13 Random Labeled Trees

Prufer sequences give a simple way to generate random labeled trees uniformly.

To generate a random labeled tree on

$$
\{1,2,\ldots,n\},
$$

choose a sequence

$$
P=(p_1,\ldots,p_{n-2})
$$

uniformly from the \(n^{n-2}\) possible sequences, then decode it.

Because the Prufer correspondence is a bijection, each labeled tree is obtained with the same probability.

This method is useful in simulations, probabilistic combinatorics, and testing graph algorithms.

## 32.14 Algorithmic Complexity

The direct encoding algorithm can be slow if one repeatedly searches all vertices for the smallest leaf.

A more efficient implementation maintains the current leaves in a priority queue. At each step, it removes the smallest leaf and updates the degree of its neighbor.

With a priority queue, encoding and decoding can be performed in

$$
O(n\log n)
$$

time.

With additional data structures and careful sorting assumptions, faster variants are possible, but the priority-queue version is usually the clearest.

## 32.15 Summary

A Prufer sequence is a canonical code for a labeled tree.

For a tree on \(n\) labeled vertices, the sequence has length \(n-2\). It is obtained by repeatedly removing the smallest leaf and recording its neighbor.

Every valid sequence decodes to exactly one labeled tree. This gives a bijection between labeled trees on \(n\) vertices and sequences of length \(n-2\) over \(n\) labels.

The bijection proves Cayley's formula:

$$
n^{n-2}.
$$

It also shows that the degree of a vertex is one plus the number of times its label appears in the sequence. Thus Prufer sequences encode both the structure and the degree data of labeled trees.
