Skip to content

Chapter 32. Prufer Sequences

A Prufer sequence is a code for a labeled tree.

It converts a tree on labeled vertices into a finite sequence of labels. This sequence determines the tree uniquely. Conversely, every sequence of the correct length determines exactly one labeled tree.

Prufer sequences are important because they give a clean proof of Cayley’s formula, which states that there are nn2n^{n-2} labeled trees on nn vertices. They also show how vertex degrees are encoded in a labeled tree.

32.1 Labeled Trees

A labeled tree is a tree whose vertices have distinct labels.

Usually the labels are

1,2,,n. 1,2,\ldots,n.

The labels are part of the structure. Two trees may have the same shape but differ as labeled trees if the labels are assigned differently.

For example, the path

123 1-2-3

and the path

132 1-3-2

have the same unlabelled shape, but they are different labeled trees.

Prufer sequences apply to labeled trees. They do not directly encode unlabelled trees, because the labels determine the entries of the sequence.

32.2 Definition

Let TT be a labeled tree with vertex set

{1,2,,n}. \{1,2,\ldots,n\}.

The Prufer sequence of TT is a sequence of length

n2 n-2

formed by the following procedure.

At each step:

  1. Find the leaf with the smallest label.
  2. Record the label of its unique neighbor.
  3. Remove that leaf from the tree.
  4. Continue until only two vertices remain.

The recorded labels form the Prufer sequence of the tree.

If n=2n=2, the Prufer sequence has length 00. Thus the unique tree on two labeled vertices corresponds to the empty sequence.

32.3 Example

Consider the labeled tree with vertices

{1,2,3,4,5} \{1,2,3,4,5\}

and edges

{1,3},{2,3},{3,4},{4,5}. \{1,3\},\{2,3\},\{3,4\},\{4,5\}.

The tree has the form

1345 1-3-4-5

with vertex 22 also attached to vertex 33.

We compute its Prufer sequence.

At the first step, the leaves are

1,2,5. 1,2,5.

The smallest leaf is 11. Its neighbor is 33, so we record

3. 3.

Remove vertex 11.

The remaining leaves are

2,5. 2,5.

The smallest leaf is 22. Its neighbor is 33, so we record

3. 3.

Remove vertex 22.

The remaining tree has edges

{3,4},{4,5}. \{3,4\},\{4,5\}.

The leaves are

3,5. 3,5.

The smallest leaf is 33. Its neighbor is 44, so we record

4. 4.

Remove vertex 33. Only vertices 44 and 55 remain.

Therefore the Prufer sequence is

(3,3,4). (3,3,4).

Since n=5n=5, the sequence has length

52=3. 5-2=3.

32.4 Decoding a Prufer Sequence

A Prufer sequence can be converted back into a labeled tree.

Let

P=(p1,p2,,pn2) P=(p_1,p_2,\ldots,p_{n-2})

be a sequence whose entries belong to

{1,2,,n}. \{1,2,\ldots,n\}.

The decoding algorithm constructs the unique tree with Prufer sequence PP.

At each step:

  1. Find the smallest label that does not appear in the current sequence.
  2. Connect that label to the first entry of the sequence.
  3. Remove the first entry from the sequence.
  4. Remove the chosen label from further consideration.
  5. Continue until the sequence is empty.
  6. Connect the two remaining labels.

This algorithm is inverse to the encoding procedure.

32.5 Decoding Example

Decode the sequence

(3,3,4) (3,3,4)

on the vertex set

{1,2,3,4,5}. \{1,2,3,4,5\}.

The current sequence is

(3,3,4). (3,3,4).

The labels not appearing in the sequence are

1,2,5. 1,2,5.

The smallest is 11. Connect 11 to the first sequence entry 33:

{1,3}. \{1,3\}.

Remove 11 from the labels and remove the first 33 from the sequence.

The current sequence is

(3,4). (3,4).

The available labels are

2,3,4,5. 2,3,4,5.

The labels not appearing in the sequence are

2,5. 2,5.

The smallest is 22. Connect 22 to 33:

{2,3}. \{2,3\}.

Remove 22 and remove the first 33.

The current sequence is

(4). (4).

The available labels are

3,4,5. 3,4,5.

The labels not appearing in the sequence are

3,5. 3,5.

The smallest is 33. Connect 33 to 44:

{3,4}. \{3,4\}.

Remove 33 and remove 44 from the sequence.

The sequence is now empty. The remaining labels are

4,5. 4,5.

Connect them:

{4,5}. \{4,5\}.

The decoded tree has edge set

{1,3},{2,3},{3,4},{4,5}, \{1,3\},\{2,3\},\{3,4\},\{4,5\},

which is the original tree.

32.6 Why the Smallest Leaf Is Used

The encoding rule uses the smallest leaf at each step to make the procedure deterministic.

A tree may have several leaves. If one could choose any leaf, the same tree might produce different sequences. The smallest-label rule removes this ambiguity.

The decoding algorithm uses the same convention in reverse. At each step, the smallest label absent from the current sequence is the leaf that must be removed next.

Thus the smallest-label rule gives a canonical code.

32.7 The Bijection Theorem

The central theorem is that Prufer sequences and labeled trees are in one-to-one correspondence.

Theorem 32.1

For n2n \geq 2, the set of labeled trees on vertex set

{1,2,,n} \{1,2,\ldots,n\}

is in bijection with the set of sequences of length n2n-2 whose entries belong to

{1,2,,n}. \{1,2,\ldots,n\}.

Proof

The encoding procedure maps every labeled tree to a sequence of length n2n-2. At each of the first n2n-2 steps, one leaf is removed and one neighbor label is recorded. When two vertices remain, the process stops.

The decoding procedure maps every sequence of length n2n-2 to a graph on the same nn labels. It adds one edge at each sequence step and one final edge at the end. Hence it creates exactly

(n2)+1=n1 (n-2)+1=n-1

edges.

At every decoding step, the chosen vertex is absent from the current sequence, so it will not be needed later as a neighbor forced by the code. It is connected once and then removed from further consideration. The first sequence entry remains available if it appears again later.

The construction never creates a cycle. Each new edge attaches one removed label to a label still available in the construction. Thus each step attaches a new leaf to the remaining partial structure.

After all steps, the two remaining labels are connected. The resulting graph is connected and has n1n-1 edges. Therefore it is a tree.

Encoding followed by decoding recovers the original tree, because the smallest removed leaf at each encoding step is exactly the smallest available label absent from the remaining code. Decoding followed by encoding similarly recovers the original sequence.

Therefore the two constructions are inverse maps. Hence they define a bijection.

32.8 Cayley’s Formula

Cayley’s formula counts labeled trees.

Theorem 32.2

The number of labeled trees on nn vertices is

nn2. n^{n-2}.

Proof

By Theorem 32.1, labeled trees on

{1,2,,n} \{1,2,\ldots,n\}

are in bijection with sequences of length n2n-2 over the same nn labels.

There are nn choices for each entry of the sequence. Since there are n2n-2 entries, the number of such sequences is

nn2. n^{n-2}.

Therefore the number of labeled trees on nn vertices is

nn2. n^{n-2}.

This is Cayley’s formula. Prufer sequences give one of its standard bijective proofs.

32.9 Degree Formula

Prufer sequences also encode vertex degrees.

Theorem 32.3

Let TT be a labeled tree with Prufer sequence PP. For each vertex ii,

degT(i)=1+#{j:pj=i}. \deg_T(i)=1+\#\{j:p_j=i\}.

In words, the degree of a vertex equals one plus the number of times its label appears in the Prufer sequence.

Proof

Each time a vertex label appears in the Prufer sequence, it is recorded as the neighbor of a leaf being removed. This accounts for one incident edge of that vertex.

A vertex of degree dd has dd incident edges. During the leaf-removal process, all but one of those incident edges are removed while the vertex remains present as a neighbor. Therefore its label is recorded d1d-1 times.

Eventually, the vertex itself becomes a leaf or remains among the final two vertices. Its last incident edge is not recorded as an occurrence of its own label.

Thus the number of occurrences of ii in the Prufer sequence is

degT(i)1. \deg_T(i)-1.

Rearranging gives

degT(i)=1+#{j:pj=i}. \deg_T(i)=1+\#\{j:p_j=i\}.

32.10 Leaves and Missing Labels

A vertex is a leaf exactly when its label does not appear in the Prufer sequence.

Indeed, a vertex has degree 11 precisely when

degT(i)=1. \deg_T(i)=1.

By the degree formula, this occurs precisely when ii appears zero times in the sequence.

Thus the leaves of a labeled tree are exactly the labels absent from its Prufer sequence.

This observation explains the decoding algorithm: the smallest label absent from the current sequence must be the smallest current leaf.

32.11 Counting Trees with Prescribed Degrees

Prufer sequences count trees with a fixed degree sequence.

Suppose the desired degrees are

d1,d2,,dn d_1,d_2,\ldots,d_n

where each di1d_i \geq 1 and

d1+d2++dn=2(n1). d_1+d_2+\cdots+d_n = 2(n-1).

A labeled tree with these degrees must have label ii appearing exactly

di1 d_i-1

times in its Prufer sequence.

Therefore the number of such trees is the multinomial number

(n2)!(d11)!(d21)!(dn1)!. \frac{(n-2)!}{(d_1-1)!(d_2-1)!\cdots(d_n-1)!}.

This follows because the Prufer sequence has length n2n-2, and its entries contain label ii exactly di1d_i-1 times.

32.12 Special Cases

Prufer sequences make several special cases transparent.

Stars

The star centered at vertex cc has degree

n1 n-1

at cc, and degree 11 at all other vertices.

Thus its Prufer sequence is

(c,c,,c), (c,c,\ldots,c),

with n2n-2 copies of cc.

Paths

A labeled path has two leaves and all other vertices have degree 22.

Thus its Prufer sequence contains every internal vertex exactly once and contains no leaf label.

The order of those labels records how the path is assembled by the smallest-leaf rule.

Trees with Many Leaves

A tree has many leaves exactly when many labels are missing from its Prufer sequence.

Since the sequence has only n2n-2 positions, at least two labels are always missing or appear zero times in the appropriate degree sense. This agrees with the theorem that every finite tree with at least two vertices has at least two leaves.

32.13 Random Labeled Trees

Prufer sequences give a simple way to generate random labeled trees uniformly.

To generate a random labeled tree on

{1,2,,n}, \{1,2,\ldots,n\},

choose a sequence

P=(p1,,pn2) P=(p_1,\ldots,p_{n-2})

uniformly from the nn2n^{n-2} possible sequences, then decode it.

Because the Prufer correspondence is a bijection, each labeled tree is obtained with the same probability.

This method is useful in simulations, probabilistic combinatorics, and testing graph algorithms.

32.14 Algorithmic Complexity

The direct encoding algorithm can be slow if one repeatedly searches all vertices for the smallest leaf.

A more efficient implementation maintains the current leaves in a priority queue. At each step, it removes the smallest leaf and updates the degree of its neighbor.

With a priority queue, encoding and decoding can be performed in

O(nlogn) O(n\log n)

time.

With additional data structures and careful sorting assumptions, faster variants are possible, but the priority-queue version is usually the clearest.

32.15 Summary

A Prufer sequence is a canonical code for a labeled tree.

For a tree on nn labeled vertices, the sequence has length n2n-2. It is obtained by repeatedly removing the smallest leaf and recording its neighbor.

Every valid sequence decodes to exactly one labeled tree. This gives a bijection between labeled trees on nn vertices and sequences of length n2n-2 over nn labels.

The bijection proves Cayley’s formula:

nn2. n^{n-2}.

It also shows that the degree of a vertex is one plus the number of times its label appears in the sequence. Thus Prufer sequences encode both the structure and the degree data of labeled trees.