# LeetCode 652: Find Duplicate Subtrees

## Problem Restatement

We are given the root of a binary tree.

We need to find all duplicate subtrees.

Two subtrees are duplicate when they have:

| Requirement | Meaning |
|---|---|
| Same structure | The nodes are arranged in the same shape |
| Same values | Matching nodes have equal values |

For each kind of duplicate subtree, we only return one root node.

For example, if the same subtree appears three times, we still add only one of those roots to the answer.

## Input and Output

| Item | Meaning |
|---|---|
| Input | The root of a binary tree |
| Output | A list of root nodes of duplicate subtrees |
| Duplicate rule | Same structure and same node values |
| Return rule | Return one representative root for each duplicate kind |

Example function shape:

```python
def findDuplicateSubtrees(root: Optional[TreeNode]) -> List[Optional[TreeNode]]:
    ...
```

## Examples

Consider this tree:

```text
        1
       / \
      2   3
     /   / \
    4   2   4
       /
      4
```

The subtree:

```text
  2
 /
4
```

appears twice.

The single-node subtree:

```text
4
```

also appears more than once.

So the answer contains one root for each duplicate subtree type:

```text
[[2,4], [4]]
```

Another example:

```text
    2
   / \
  1   1
```

The subtree rooted at value `1` appears twice.

So the answer is:

```text
[[1]]
```

## First Thought: Brute Force

A direct solution is to compare every subtree with every other subtree.

We could collect all nodes in a list. Every node is the root of one subtree.

Then for every pair of nodes, we check whether the two subtrees are identical.

Two subtrees are identical if:

1. Their root values are equal.
2. Their left subtrees are identical.
3. Their right subtrees are identical.

This works logically, but it repeats a lot of work.

If the tree has `n` nodes, there are many pairs of subtrees. Comparing one pair may require scanning many nodes. This can become too slow.

## Key Insight

Instead of comparing subtrees directly, we can give every subtree a unique representation.

This representation must include:

| Part | Why it matters |
|---|---|
| Root value | Different values mean different subtrees |
| Left subtree | The left child affects structure and value |
| Right subtree | The right child affects structure and value |
| Null markers | Needed to distinguish different shapes |

For each subtree, we serialize it into a string.

For example, a leaf node `4` can be serialized as:

```text
4,#,#
```

Here, `#` means null.

A subtree like this:

```text
  2
 /
4
```

can be serialized as:

```text
2,4,#,#,#
```

This captures both the values and the shape.

If two subtrees have the same serialization, they are duplicate subtrees.

## Dynamic View of the Tree

This problem is naturally solved with postorder traversal.

Postorder means:

1. Process the left subtree.
2. Process the right subtree.
3. Process the current node.

That order matters because the serialization of a node depends on the serialization of its children.

For a node, we build:

```text
node value + left serialization + right serialization
```

Null children return a marker such as `#`.

## Algorithm

Use a hash map called `count`.

The key is a subtree serialization.

The value is how many times we have seen that serialization.

Then run DFS from the root.

For each node:

1. If the node is null, return `#`.
2. Serialize the left subtree.
3. Serialize the right subtree.
4. Build the current subtree serialization.
5. Increase its count in the hash map.
6. If its count becomes exactly `2`, add the current node to the answer.
7. Return the serialization.

We add the node only when the count becomes `2`.

This avoids adding the same duplicate type multiple times.

## Correctness

The serialization records the current node value, the full left subtree, and the full right subtree. It also records null children. Therefore, two subtrees with the same serialization have the same structure and the same values.

The DFS computes the serialization for every subtree in the tree because every node is visited once, and every node is the root of exactly one subtree.

When a serialization appears for the first time, we have not found a duplicate yet.

When the same serialization appears for the second time, we have found one duplicate subtree type, so we add the current node to the result.

If the same serialization appears again later, it belongs to the same duplicate type, so we do not add another root.

Thus, the result contains exactly one representative root for every duplicate subtree type.

## Complexity

| Metric | Value | Why |
|---|---|---|
| Time | `O(n^2)` worst case with string concatenation | Large subtree strings may be rebuilt many times |
| Space | `O(n^2)` worst case with string storage | Serialized strings can contain repeated subtree text |

For typical LeetCode constraints, this serialization approach is accepted and simple.

A more advanced version can assign integer IDs to subtree tuples to reduce repeated string growth.

## Implementation

```python
from collections import defaultdict
from typing import Optional, List

# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right

class Solution:
    def findDuplicateSubtrees(
        self,
        root: Optional[TreeNode],
    ) -> List[Optional[TreeNode]]:
        count = defaultdict(int)
        answer = []

        def dfs(node: Optional[TreeNode]) -> str:
            if node is None:
                return "#"

            left = dfs(node.left)
            right = dfs(node.right)

            serial = f"{node.val},{left},{right}"

            count[serial] += 1
            if count[serial] == 2:
                answer.append(node)

            return serial

        dfs(root)
        return answer
```

## Code Explanation

We use:

```python
count = defaultdict(int)
answer = []
```

`count` tracks how many times each subtree serialization has appeared.

`answer` stores one representative root for each duplicate subtree type.

The DFS returns a string:

```python
def dfs(node: Optional[TreeNode]) -> str:
```

For a null child, we return a marker:

```python
if node is None:
    return "#"
```

This marker is important. Without it, different tree shapes could produce the same serialization.

Then we serialize the children first:

```python
left = dfs(node.left)
right = dfs(node.right)
```

After both children are serialized, we serialize the current subtree:

```python
serial = f"{node.val},{left},{right}"
```

Then we update the count:

```python
count[serial] += 1
```

If this is the second time we have seen this serialization, we add the current node:

```python
if count[serial] == 2:
    answer.append(node)
```

The condition uses `== 2`, not `>= 2`, because we only want one representative root for each duplicate kind.

Finally, the DFS returns the serialization so the parent can use it:

```python
return serial
```

## Testing

LeetCode compares returned tree nodes structurally, but in local tests it is often easier to serialize the returned roots.

```python
def serialize(root):
    if root is None:
        return "#"

    return f"{root.val},{serialize(root.left)},{serialize(root.right)}"

def collect_duplicate_serials(root):
    result = Solution().findDuplicateSubtrees(root)
    return sorted(serialize(node) for node in result)
```

Example tests:

```python
def run_tests():
    # Tree:
    #         1
    #        / \
    #       2   3
    #      /   / \
    #     4   2   4
    #        /
    #       4
    root = TreeNode(1)
    root.left = TreeNode(2, TreeNode(4))
    root.right = TreeNode(3, TreeNode(2, TreeNode(4)), TreeNode(4))

    assert collect_duplicate_serials(root) == sorted([
        "2,4,#,#,#",
        "4,#,#",
    ])

    # Tree:
    #     2
    #    / \
    #   1   1
    root = TreeNode(2, TreeNode(1), TreeNode(1))

    assert collect_duplicate_serials(root) == [
        "1,#,#",
    ]

    # Tree:
    #       2
    #      / \
    #     2   2
    #    /   /
    #   3   3
    root = TreeNode(
        2,
        TreeNode(2, TreeNode(3)),
        TreeNode(2, TreeNode(3)),
    )

    assert collect_duplicate_serials(root) == sorted([
        "2,3,#,#,#",
        "3,#,#",
    ])

    print("all tests passed")
```

Test meaning:

| Test | Why |
|---|---|
| Repeated leaf and repeated larger subtree | Confirms duplicate detection at different sizes |
| Two equal leaves | Checks the simplest duplicate subtree |
| Nested duplicate subtree | Confirms both child and parent duplicates can be returned |
| Null markers | Protects against confusing different tree shapes |