8.20 Tree Isomorphism
Two trees may look different at first glance yet represent exactly the same structure.
8.20 Tree Isomorphism
Two trees may look different at first glance yet represent exactly the same structure.
Consider:
Tree A
A
/ \\n B C
/
D
and:
Tree B
X
/ \\n Y Z
\\n W
The labels differ.
The node identifiers differ.
The drawing differs.
Yet structurally these trees are identical.
This raises an important question:
How can we determine whether two trees have the same shape?
This problem is known as Tree Isomorphism.
Tree isomorphism appears in:
- Compiler optimization
- Chemical structure analysis
- Pattern matching
- Knowledge graphs
- XML comparison
- Abstract syntax trees
- Phylogenetic trees
Unlike graph isomorphism, tree isomorphism admits elegant and efficient solutions.
Problem
Given two trees, determine whether they represent the same structure regardless of node labels.
Solution
Compute a canonical representation of each tree.
If the canonical forms match:
isomorphic
Otherwise:
not isomorphic
The most common approach assigns a structural signature to every subtree.
Leaves receive identical signatures.
Internal nodes derive signatures from their children.
Discussion
The central observation is simple.
A leaf always looks the same.
()
Suppose:
A
is a leaf.
Representation:
()
Now consider:
B
/
A
Representation:
(())
A parent wraps the representations of its children.
Larger trees emerge recursively.
Canonical Encoding
Consider:
R
/ \\n A B
Leaves:
A = ()
B = ()
Root:
(()())
The encoding describes structure only.
Node names disappear.
This is exactly what we want.
Recursive Algorithm
For rooted trees:
package main
import (
"sort"
"strings"
)
func encode(
tree [][]int,
node int,
parent int,
) string {
parts := []string{}
for _, child := range tree[node] {
if child == parent {
continue
}
parts = append(
parts,
encode(
tree,
child,
node,
),
)
}
sort.Strings(parts)
return "(" +
strings.Join(parts, "") +
")"
}
Two trees are isomorphic if their encodings are identical.
The sorting step is crucial.
Why Sorting Is Necessary
Consider:
R
/ \\n A B
and:
R
/ \\n B A
The trees are clearly identical.
Without sorting:
(A)(B)
and:
(B)(A)
would differ.
Sorting removes dependence on child ordering.
Only structure remains.
Rooted Tree Isomorphism
The previous algorithm assumes a fixed root.
Example:
Root = 0
Both trees must use equivalent roots.
Then:
encode(tree1)
==
encode(tree2)
determines isomorphism.
Complexity is polynomial and practical for large trees.
Unrooted Trees
Most tree-isomorphism problems involve unrooted trees.
The difficulty is choosing a root.
Consider:
0
|
1
|
2
|
3
Rooting at different nodes produces different encodings.
We need a canonical choice.
The solution uses tree centers.
Finding Tree Centers
Recall the diameter discussion.
Every tree has:
one center
or:
two centers
The center lies at the midpoint of the diameter.
Examples:
0-1-2-3-4
Center:
2
Example:
0-1-2-3
Centers:
1
2
These nodes provide canonical rooting locations.
Isomorphism for Unrooted Trees
Procedure:
- Find centers of tree A.
- Find centers of tree B.
- Root at centers.
- Compare encodings.
If either encoding matches:
Trees are isomorphic.
Otherwise:
Trees differ.
This removes dependence on arbitrary root choices.
Example
Tree A:
0
/ \\n 1 2
Tree B:
5
/ \\n 7 8
Encodings:
(()())
and:
(()())
Match.
The trees are isomorphic.
Labels do not matter.
Only structure matters.
Non-Isomorphic Example
Tree A:
0
/ \\n 1 2
Tree B:
0
|
1
|
2
Encodings:
(()())
and:
((()))
Differ.
The structures are not equivalent.
AHU Algorithm
The classical tree-isomorphism algorithm is often called the AHU algorithm after:
- entity["people","Alfred Aho","AHU algorithm co-author"]
- entity["people","John Hopcroft","AHU algorithm co-author"]
- entity["people","Jeffrey Ullman","AHU algorithm co-author"]
The recursive encoding method presented here is essentially the same idea.
The algorithm remains one of the most elegant examples of canonical labeling.
Hash-Based Encodings
String construction is simple but can become expensive.
An alternative assigns numeric hashes.
Example:
Leaf
→ 1
Then:
Node hash =
Hash(
sorted child hashes
)
This approach reduces memory usage.
Large-scale systems often prefer hashing.
The underlying concept remains unchanged.
Applications in Compilers
Consider two expressions.
(a+b)+(c+d)
and:
(x+y)+(z+w)
Abstract syntax trees may have identical structure.
Tree isomorphism helps identify common patterns.
Compilers use related ideas for optimization and transformation.
Applications in Chemistry
Chemical compounds can be represented as graphs and trees.
Substructures often need comparison.
Questions include:
Do these molecules
share the same shape?
Tree-isomorphism techniques provide efficient solutions when the structure is acyclic.
Applications in XML
Consider:
<book>
<title/>
<author/>
</book>
and:
<article>
<heading/>
<writer/>
</article>
Labels differ.
Structure matches.
Tree isomorphism detects this similarity.
Tree Isomorphism vs Graph Isomorphism
Tree isomorphism is significantly easier.
For trees:
Polynomial time
For general graphs:
Much harder
Trees lack cycles and possess recursive structure.
This makes canonical encoding possible.
The difference illustrates how structural constraints can dramatically simplify algorithm design.
Canonical Forms
A recurring theme in computer science is:
Convert object
to canonical form.
Examples:
- Sorted arrays
- Normalized strings
- Reduced fractions
- Canonical ASTs
- Tree encodings
Comparison then becomes trivial.
Tree isomorphism follows exactly this pattern.
Complexity Analysis
Let:
n = number of nodes
Subtree encoding:
| Operation | Complexity |
|---|---|
| DFS traversal | O(n) |
Sorting child encodings:
| Operation | Complexity |
|---|---|
| Total sorting | O(n log n) |
Overall:
| Operation | Complexity |
|---|---|
| Rooted isomorphism | O(n log n) |
| Unrooted isomorphism | O(n log n) |
Memory:
| Structure | Complexity |
|---|---|
| Encodings | O(n) |
More advanced hashing approaches can reduce practical costs further.
Common Mistakes
A common mistake is forgetting to sort child encodings.
Without sorting, equivalent trees may appear different.
Another mistake is comparing unrooted trees using arbitrary roots.
Different root choices often produce different representations.
A third mistake is assuming node labels matter. Tree isomorphism ignores labels unless the problem explicitly includes them.
Finally, string-based encodings can become large for very deep trees. Hash-based approaches may be preferable when scale becomes important.
Takeaway
Tree Isomorphism determines whether two trees share the same structure regardless of labels. By recursively constructing canonical representations of subtrees and comparing the resulting encodings, the problem becomes surprisingly elegant. The technique highlights a recurring algorithmic principle: when direct comparison is difficult, transform objects into canonical forms and compare those instead. Tree isomorphism serves as both a practical tool and a beautiful example of recursive algorithm design.