8.20 Tree Isomorphism

Two trees may look different at first glance yet represent exactly the same structure.

8.20 Tree Isomorphism

Two trees may look different at first glance yet represent exactly the same structure.

Consider:

Tree A

    A
   / \\n  B   C
     /
    D

and:

Tree B

    X
   / \\n  Y   Z
       \\n        W

The labels differ.

The node identifiers differ.

The drawing differs.

Yet structurally these trees are identical.

This raises an important question:

How can we determine whether two trees have the same shape?

This problem is known as Tree Isomorphism.

Tree isomorphism appears in:

  • Compiler optimization
  • Chemical structure analysis
  • Pattern matching
  • Knowledge graphs
  • XML comparison
  • Abstract syntax trees
  • Phylogenetic trees

Unlike graph isomorphism, tree isomorphism admits elegant and efficient solutions.

Problem

Given two trees, determine whether they represent the same structure regardless of node labels.

Solution

Compute a canonical representation of each tree.

If the canonical forms match:

isomorphic

Otherwise:

not isomorphic

The most common approach assigns a structural signature to every subtree.

Leaves receive identical signatures.

Internal nodes derive signatures from their children.

Discussion

The central observation is simple.

A leaf always looks the same.

()

Suppose:

A

is a leaf.

Representation:

()

Now consider:

   B
  /
 A

Representation:

(())

A parent wraps the representations of its children.

Larger trees emerge recursively.

Canonical Encoding

Consider:

      R
     / \\n    A   B

Leaves:

A = ()
B = ()

Root:

(()())

The encoding describes structure only.

Node names disappear.

This is exactly what we want.

Recursive Algorithm

For rooted trees:

package main

import (
    "sort"
    "strings"
)

func encode(
    tree [][]int,
    node int,
    parent int,
) string {
    parts := []string{}

    for _, child := range tree[node] {
        if child == parent {
            continue
        }

        parts = append(
            parts,
            encode(
                tree,
                child,
                node,
            ),
        )
    }

    sort.Strings(parts)

    return "(" +
        strings.Join(parts, "") +
        ")"
}

Two trees are isomorphic if their encodings are identical.

The sorting step is crucial.

Why Sorting Is Necessary

Consider:

    R
   / \\n  A   B

and:

    R
   / \\n  B   A

The trees are clearly identical.

Without sorting:

(A)(B)

and:

(B)(A)

would differ.

Sorting removes dependence on child ordering.

Only structure remains.

Rooted Tree Isomorphism

The previous algorithm assumes a fixed root.

Example:

Root = 0

Both trees must use equivalent roots.

Then:

encode(tree1)
==
encode(tree2)

determines isomorphism.

Complexity is polynomial and practical for large trees.

Unrooted Trees

Most tree-isomorphism problems involve unrooted trees.

The difficulty is choosing a root.

Consider:

0
|
1
|
2
|
3

Rooting at different nodes produces different encodings.

We need a canonical choice.

The solution uses tree centers.

Finding Tree Centers

Recall the diameter discussion.

Every tree has:

one center

or:

two centers

The center lies at the midpoint of the diameter.

Examples:

0-1-2-3-4

Center:

2

Example:

0-1-2-3

Centers:

1
2

These nodes provide canonical rooting locations.

Isomorphism for Unrooted Trees

Procedure:

  1. Find centers of tree A.
  2. Find centers of tree B.
  3. Root at centers.
  4. Compare encodings.

If either encoding matches:

Trees are isomorphic.

Otherwise:

Trees differ.

This removes dependence on arbitrary root choices.

Example

Tree A:

    0
   / \\n  1   2

Tree B:

    5
   / \\n  7   8

Encodings:

(()())

and:

(()())

Match.

The trees are isomorphic.

Labels do not matter.

Only structure matters.

Non-Isomorphic Example

Tree A:

    0
   / \\n  1   2

Tree B:

0
|
1
|
2

Encodings:

(()())

and:

((()))

Differ.

The structures are not equivalent.

AHU Algorithm

The classical tree-isomorphism algorithm is often called the AHU algorithm after:

  • entity["people","Alfred Aho","AHU algorithm co-author"]
  • entity["people","John Hopcroft","AHU algorithm co-author"]
  • entity["people","Jeffrey Ullman","AHU algorithm co-author"]

The recursive encoding method presented here is essentially the same idea.

The algorithm remains one of the most elegant examples of canonical labeling.

Hash-Based Encodings

String construction is simple but can become expensive.

An alternative assigns numeric hashes.

Example:

Leaf
→ 1

Then:

Node hash =
Hash(
    sorted child hashes
)

This approach reduces memory usage.

Large-scale systems often prefer hashing.

The underlying concept remains unchanged.

Applications in Compilers

Consider two expressions.

(a+b)+(c+d)

and:

(x+y)+(z+w)

Abstract syntax trees may have identical structure.

Tree isomorphism helps identify common patterns.

Compilers use related ideas for optimization and transformation.

Applications in Chemistry

Chemical compounds can be represented as graphs and trees.

Substructures often need comparison.

Questions include:

Do these molecules
share the same shape?

Tree-isomorphism techniques provide efficient solutions when the structure is acyclic.

Applications in XML

Consider:

<book>
  <title/>
  <author/>
</book>

and:

<article>
  <heading/>
  <writer/>
</article>

Labels differ.

Structure matches.

Tree isomorphism detects this similarity.

Tree Isomorphism vs Graph Isomorphism

Tree isomorphism is significantly easier.

For trees:

Polynomial time

For general graphs:

Much harder

Trees lack cycles and possess recursive structure.

This makes canonical encoding possible.

The difference illustrates how structural constraints can dramatically simplify algorithm design.

Canonical Forms

A recurring theme in computer science is:

Convert object
to canonical form.

Examples:

  • Sorted arrays
  • Normalized strings
  • Reduced fractions
  • Canonical ASTs
  • Tree encodings

Comparison then becomes trivial.

Tree isomorphism follows exactly this pattern.

Complexity Analysis

Let:

n = number of nodes

Subtree encoding:

Operation Complexity
DFS traversal O(n)

Sorting child encodings:

Operation Complexity
Total sorting O(n log n)

Overall:

Operation Complexity
Rooted isomorphism O(n log n)
Unrooted isomorphism O(n log n)

Memory:

Structure Complexity
Encodings O(n)

More advanced hashing approaches can reduce practical costs further.

Common Mistakes

A common mistake is forgetting to sort child encodings.

Without sorting, equivalent trees may appear different.

Another mistake is comparing unrooted trees using arbitrary roots.

Different root choices often produce different representations.

A third mistake is assuming node labels matter. Tree isomorphism ignores labels unless the problem explicitly includes them.

Finally, string-based encodings can become large for very deep trees. Hash-based approaches may be preferable when scale becomes important.

Takeaway

Tree Isomorphism determines whether two trees share the same structure regardless of labels. By recursively constructing canonical representations of subtrees and comparing the resulting encodings, the problem becomes surprisingly elegant. The technique highlights a recurring algorithmic principle: when direct comparison is difficult, transform objects into canonical forms and compare those instead. Tree isomorphism serves as both a practical tool and a beautiful example of recursive algorithm design.