LeetCode 652: Find Duplicate Subtrees

Problem Restatement

We are given the root of a binary tree.

We need to find all duplicate subtrees.

Two subtrees are duplicate when they have:

Requirement	Meaning
Same structure	The nodes are arranged in the same shape
Same values	Matching nodes have equal values

For each kind of duplicate subtree, we only return one root node.

For example, if the same subtree appears three times, we still add only one of those roots to the answer.

Input and Output

Item	Meaning
Input	The root of a binary tree
Output	A list of root nodes of duplicate subtrees
Duplicate rule	Same structure and same node values
Return rule	Return one representative root for each duplicate kind

Example function shape:

def findDuplicateSubtrees(root: Optional[TreeNode]) -> List[Optional[TreeNode]]:
    ...

Examples

Consider this tree:

The subtree:

  2
 /
4

appears twice.

The single-node subtree:

also appears more than once.

So the answer contains one root for each duplicate subtree type:

[[2,4], [4]]

Another example:

    2
   / \
  1   1

The subtree rooted at value 1 appears twice.

So the answer is:

[[1]]

First Thought: Brute Force

A direct solution is to compare every subtree with every other subtree.

We could collect all nodes in a list. Every node is the root of one subtree.

Then for every pair of nodes, we check whether the two subtrees are identical.

Two subtrees are identical if:

Their root values are equal.
Their left subtrees are identical.
Their right subtrees are identical.

This works logically, but it repeats a lot of work.

If the tree has n nodes, there are many pairs of subtrees. Comparing one pair may require scanning many nodes. This can become too slow.

Key Insight

Instead of comparing subtrees directly, we can give every subtree a unique representation.

This representation must include:

Part	Why it matters
Root value	Different values mean different subtrees
Left subtree	The left child affects structure and value
Right subtree	The right child affects structure and value
Null markers	Needed to distinguish different shapes

For each subtree, we serialize it into a string.

For example, a leaf node 4 can be serialized as:

4,#,#

Here, # means null.

A subtree like this:

  2
 /
4

can be serialized as:

2,4,#,#,#

This captures both the values and the shape.

If two subtrees have the same serialization, they are duplicate subtrees.

Dynamic View of the Tree

This problem is naturally solved with postorder traversal.

Postorder means:

Process the left subtree.
Process the right subtree.
Process the current node.

That order matters because the serialization of a node depends on the serialization of its children.

For a node, we build:

node value + left serialization + right serialization

Null children return a marker such as #.

Algorithm

Use a hash map called count.

The key is a subtree serialization.

The value is how many times we have seen that serialization.

Then run DFS from the root.

For each node:

If the node is null, return #.
Serialize the left subtree.
Serialize the right subtree.
Build the current subtree serialization.
Increase its count in the hash map.
If its count becomes exactly 2, add the current node to the answer.
Return the serialization.

We add the node only when the count becomes 2.

This avoids adding the same duplicate type multiple times.

Correctness

The serialization records the current node value, the full left subtree, and the full right subtree. It also records null children. Therefore, two subtrees with the same serialization have the same structure and the same values.

The DFS computes the serialization for every subtree in the tree because every node is visited once, and every node is the root of exactly one subtree.

When a serialization appears for the first time, we have not found a duplicate yet.

When the same serialization appears for the second time, we have found one duplicate subtree type, so we add the current node to the result.

If the same serialization appears again later, it belongs to the same duplicate type, so we do not add another root.

Thus, the result contains exactly one representative root for every duplicate subtree type.

Complexity

Metric	Value	Why
Time	`O(n^2)` worst case with string concatenation	Large subtree strings may be rebuilt many times
Space	`O(n^2)` worst case with string storage	Serialized strings can contain repeated subtree text

For typical LeetCode constraints, this serialization approach is accepted and simple.

A more advanced version can assign integer IDs to subtree tuples to reduce repeated string growth.

Implementation

from collections import defaultdict
from typing import Optional, List

# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right

class Solution:
    def findDuplicateSubtrees(
        self,
        root: Optional[TreeNode],
    ) -> List[Optional[TreeNode]]:
        count = defaultdict(int)
        answer = []

        def dfs(node: Optional[TreeNode]) -> str:
            if node is None:
                return "#"

            left = dfs(node.left)
            right = dfs(node.right)

            serial = f"{node.val},{left},{right}"

            count[serial] += 1
            if count[serial] == 2:
                answer.append(node)

            return serial

        dfs(root)
        return answer

Code Explanation

We use:

count = defaultdict(int)
answer = []

count tracks how many times each subtree serialization has appeared.

answer stores one representative root for each duplicate subtree type.

The DFS returns a string:

def dfs(node: Optional[TreeNode]) -> str:

For a null child, we return a marker:

if node is None:
    return "#"

This marker is important. Without it, different tree shapes could produce the same serialization.

Then we serialize the children first:

left = dfs(node.left)
right = dfs(node.right)

After both children are serialized, we serialize the current subtree:

serial = f"{node.val},{left},{right}"

Then we update the count:

count[serial] += 1

If this is the second time we have seen this serialization, we add the current node:

if count[serial] == 2:
    answer.append(node)

The condition uses == 2, not >= 2, because we only want one representative root for each duplicate kind.

Finally, the DFS returns the serialization so the parent can use it:

return serial

Testing

LeetCode compares returned tree nodes structurally, but in local tests it is often easier to serialize the returned roots.

def serialize(root):
    if root is None:
        return "#"

    return f"{root.val},{serialize(root.left)},{serialize(root.right)}"

def collect_duplicate_serials(root):
    result = Solution().findDuplicateSubtrees(root)
    return sorted(serialize(node) for node in result)

Example tests:

def run_tests():
    # Tree:
    #         1
    #        / \
    #       2   3
    #      /   / \
    #     4   2   4
    #        /
    #       4
    root = TreeNode(1)
    root.left = TreeNode(2, TreeNode(4))
    root.right = TreeNode(3, TreeNode(2, TreeNode(4)), TreeNode(4))

    assert collect_duplicate_serials(root) == sorted([
        "2,4,#,#,#",
        "4,#,#",
    ])

    # Tree:
    #     2
    #    / \
    #   1   1
    root = TreeNode(2, TreeNode(1), TreeNode(1))

    assert collect_duplicate_serials(root) == [
        "1,#,#",
    ]

    # Tree:
    #       2
    #      / \
    #     2   2
    #    /   /
    #   3   3
    root = TreeNode(
        2,
        TreeNode(2, TreeNode(3)),
        TreeNode(2, TreeNode(3)),
    )

    assert collect_duplicate_serials(root) == sorted([
        "2,3,#,#,#",
        "3,#,#",
    ])

    print("all tests passed")

Test meaning:

Test	Why
Repeated leaf and repeated larger subtree	Confirms duplicate detection at different sizes
Two equal leaves	Checks the simplest duplicate subtree
Nested duplicate subtree	Confirms both child and parent duplicates can be returned
Null markers	Protects against confusing different tree shapes