A Prufer sequence is a code for a labeled tree.
It converts a tree on labeled vertices into a finite sequence of labels. This sequence determines the tree uniquely. Conversely, every sequence of the correct length determines exactly one labeled tree.
Prufer sequences are important because they give a clean proof of Cayley’s formula, which states that there are labeled trees on vertices. They also show how vertex degrees are encoded in a labeled tree.
32.1 Labeled Trees
A labeled tree is a tree whose vertices have distinct labels.
Usually the labels are
The labels are part of the structure. Two trees may have the same shape but differ as labeled trees if the labels are assigned differently.
For example, the path
and the path
have the same unlabelled shape, but they are different labeled trees.
Prufer sequences apply to labeled trees. They do not directly encode unlabelled trees, because the labels determine the entries of the sequence.
32.2 Definition
Let be a labeled tree with vertex set
The Prufer sequence of is a sequence of length
formed by the following procedure.
At each step:
- Find the leaf with the smallest label.
- Record the label of its unique neighbor.
- Remove that leaf from the tree.
- Continue until only two vertices remain.
The recorded labels form the Prufer sequence of the tree.
If , the Prufer sequence has length . Thus the unique tree on two labeled vertices corresponds to the empty sequence.
32.3 Example
Consider the labeled tree with vertices
and edges
The tree has the form
with vertex also attached to vertex .
We compute its Prufer sequence.
At the first step, the leaves are
The smallest leaf is . Its neighbor is , so we record
Remove vertex .
The remaining leaves are
The smallest leaf is . Its neighbor is , so we record
Remove vertex .
The remaining tree has edges
The leaves are
The smallest leaf is . Its neighbor is , so we record
Remove vertex . Only vertices and remain.
Therefore the Prufer sequence is
Since , the sequence has length
32.4 Decoding a Prufer Sequence
A Prufer sequence can be converted back into a labeled tree.
Let
be a sequence whose entries belong to
The decoding algorithm constructs the unique tree with Prufer sequence .
At each step:
- Find the smallest label that does not appear in the current sequence.
- Connect that label to the first entry of the sequence.
- Remove the first entry from the sequence.
- Remove the chosen label from further consideration.
- Continue until the sequence is empty.
- Connect the two remaining labels.
This algorithm is inverse to the encoding procedure.
32.5 Decoding Example
Decode the sequence
on the vertex set
The current sequence is
The labels not appearing in the sequence are
The smallest is . Connect to the first sequence entry :
Remove from the labels and remove the first from the sequence.
The current sequence is
The available labels are
The labels not appearing in the sequence are
The smallest is . Connect to :
Remove and remove the first .
The current sequence is
The available labels are
The labels not appearing in the sequence are
The smallest is . Connect to :
Remove and remove from the sequence.
The sequence is now empty. The remaining labels are
Connect them:
The decoded tree has edge set
which is the original tree.
32.6 Why the Smallest Leaf Is Used
The encoding rule uses the smallest leaf at each step to make the procedure deterministic.
A tree may have several leaves. If one could choose any leaf, the same tree might produce different sequences. The smallest-label rule removes this ambiguity.
The decoding algorithm uses the same convention in reverse. At each step, the smallest label absent from the current sequence is the leaf that must be removed next.
Thus the smallest-label rule gives a canonical code.
32.7 The Bijection Theorem
The central theorem is that Prufer sequences and labeled trees are in one-to-one correspondence.
Theorem 32.1
For , the set of labeled trees on vertex set
is in bijection with the set of sequences of length whose entries belong to
Proof
The encoding procedure maps every labeled tree to a sequence of length . At each of the first steps, one leaf is removed and one neighbor label is recorded. When two vertices remain, the process stops.
The decoding procedure maps every sequence of length to a graph on the same labels. It adds one edge at each sequence step and one final edge at the end. Hence it creates exactly
edges.
At every decoding step, the chosen vertex is absent from the current sequence, so it will not be needed later as a neighbor forced by the code. It is connected once and then removed from further consideration. The first sequence entry remains available if it appears again later.
The construction never creates a cycle. Each new edge attaches one removed label to a label still available in the construction. Thus each step attaches a new leaf to the remaining partial structure.
After all steps, the two remaining labels are connected. The resulting graph is connected and has edges. Therefore it is a tree.
Encoding followed by decoding recovers the original tree, because the smallest removed leaf at each encoding step is exactly the smallest available label absent from the remaining code. Decoding followed by encoding similarly recovers the original sequence.
Therefore the two constructions are inverse maps. Hence they define a bijection.
32.8 Cayley’s Formula
Cayley’s formula counts labeled trees.
Theorem 32.2
The number of labeled trees on vertices is
Proof
By Theorem 32.1, labeled trees on
are in bijection with sequences of length over the same labels.
There are choices for each entry of the sequence. Since there are entries, the number of such sequences is
Therefore the number of labeled trees on vertices is
This is Cayley’s formula. Prufer sequences give one of its standard bijective proofs.
32.9 Degree Formula
Prufer sequences also encode vertex degrees.
Theorem 32.3
Let be a labeled tree with Prufer sequence . For each vertex ,
In words, the degree of a vertex equals one plus the number of times its label appears in the Prufer sequence.
Proof
Each time a vertex label appears in the Prufer sequence, it is recorded as the neighbor of a leaf being removed. This accounts for one incident edge of that vertex.
A vertex of degree has incident edges. During the leaf-removal process, all but one of those incident edges are removed while the vertex remains present as a neighbor. Therefore its label is recorded times.
Eventually, the vertex itself becomes a leaf or remains among the final two vertices. Its last incident edge is not recorded as an occurrence of its own label.
Thus the number of occurrences of in the Prufer sequence is
Rearranging gives
32.10 Leaves and Missing Labels
A vertex is a leaf exactly when its label does not appear in the Prufer sequence.
Indeed, a vertex has degree precisely when
By the degree formula, this occurs precisely when appears zero times in the sequence.
Thus the leaves of a labeled tree are exactly the labels absent from its Prufer sequence.
This observation explains the decoding algorithm: the smallest label absent from the current sequence must be the smallest current leaf.
32.11 Counting Trees with Prescribed Degrees
Prufer sequences count trees with a fixed degree sequence.
Suppose the desired degrees are
where each and
A labeled tree with these degrees must have label appearing exactly
times in its Prufer sequence.
Therefore the number of such trees is the multinomial number
This follows because the Prufer sequence has length , and its entries contain label exactly times.
32.12 Special Cases
Prufer sequences make several special cases transparent.
Stars
The star centered at vertex has degree
at , and degree at all other vertices.
Thus its Prufer sequence is
with copies of .
Paths
A labeled path has two leaves and all other vertices have degree .
Thus its Prufer sequence contains every internal vertex exactly once and contains no leaf label.
The order of those labels records how the path is assembled by the smallest-leaf rule.
Trees with Many Leaves
A tree has many leaves exactly when many labels are missing from its Prufer sequence.
Since the sequence has only positions, at least two labels are always missing or appear zero times in the appropriate degree sense. This agrees with the theorem that every finite tree with at least two vertices has at least two leaves.
32.13 Random Labeled Trees
Prufer sequences give a simple way to generate random labeled trees uniformly.
To generate a random labeled tree on
choose a sequence
uniformly from the possible sequences, then decode it.
Because the Prufer correspondence is a bijection, each labeled tree is obtained with the same probability.
This method is useful in simulations, probabilistic combinatorics, and testing graph algorithms.
32.14 Algorithmic Complexity
The direct encoding algorithm can be slow if one repeatedly searches all vertices for the smallest leaf.
A more efficient implementation maintains the current leaves in a priority queue. At each step, it removes the smallest leaf and updates the degree of its neighbor.
With a priority queue, encoding and decoding can be performed in
time.
With additional data structures and careful sorting assumptions, faster variants are possible, but the priority-queue version is usually the clearest.
32.15 Summary
A Prufer sequence is a canonical code for a labeled tree.
For a tree on labeled vertices, the sequence has length . It is obtained by repeatedly removing the smallest leaf and recording its neighbor.
Every valid sequence decodes to exactly one labeled tree. This gives a bijection between labeled trees on vertices and sequences of length over labels.
The bijection proves Cayley’s formula:
It also shows that the degree of a vertex is one plus the number of times its label appears in the sequence. Thus Prufer sequences encode both the structure and the degree data of labeled trees.