Facebook Pixel

652. Find Duplicate Subtrees

Problem Description

You are given the root of a binary tree. Your task is to find all subtrees that appear more than once in the tree (duplicate subtrees).

A duplicate subtree means that there are at least two subtrees in the binary tree that have:

  • The exact same structure (same shape)
  • The same node values at corresponding positions

For the output, you need to return a list containing the root nodes of these duplicate subtrees. However, if a particular subtree pattern appears multiple times (say 3 or more times), you only need to include one representative root node in your result - not all occurrences.

For example, if you have a tree where a subtree with structure 2->4->null->null appears three times at different positions, you would only return one of those three root nodes (the node with value 2) in your result list.

The solution works by traversing the tree and creating a unique string representation for each subtree. It uses a post-order traversal (visiting left child, right child, then current node) to build these string representations. The format is "value,left_subtree_string,right_subtree_string", where null nodes are represented as "#".

As the algorithm encounters each subtree, it:

  1. Generates the string representation for that subtree
  2. Counts how many times this exact representation has been seen
  3. When a representation is seen for the second time (count becomes 2), adds that subtree's root to the answer list
  4. Ignores further occurrences (3rd, 4th, etc.) of the same subtree pattern

This approach ensures that each type of duplicate subtree is represented exactly once in the final result.

Flowchart Walkthrough

First, let's pin down the algorithm using the Flowchart. Here's a step-by-step walkthrough:

Is it a graph?

  • Yes: While the problem presents a binary tree, a tree is a special type of graph (specifically, a connected acyclic graph). Binary trees are graphs where each node has at most two children and there's exactly one path between any two nodes.

Is it a tree?

  • Yes: The problem explicitly states we're working with a binary tree. We need to traverse this tree structure to identify duplicate subtrees.

DFS

  • Conclusion: The flowchart leads us directly to DFS (Depth-First Search) as the appropriate algorithm for this tree problem.

The DFS pattern is ideal here because:

  1. We need to examine every subtree in the binary tree to identify duplicates
  2. We need to process each subtree completely (all its descendants) to generate a unique representation
  3. Post-order traversal (a type of DFS) allows us to build subtree representations from the bottom up - we process children before parents
  4. DFS naturally handles the recursive structure of trees, where each node's representation depends on its children's representations

The solution uses DFS to traverse the tree, creating string serializations of each subtree. By visiting nodes in post-order (left, right, root), we ensure that when we process a node, we already have the complete representations of its children, allowing us to build the current subtree's representation efficiently.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is that if two subtrees are identical, they must have the same "fingerprint" or unique representation. Think of it like taking a photograph of each subtree - identical subtrees would produce identical photographs.

To identify duplicate subtrees, we need to:

  1. Create a unique representation for every subtree in the tree
  2. Track how many times each unique representation appears
  3. When we see a representation for the second time, we've found a duplicate

The challenge is how to create this unique representation. We can't just use the root value alone because different subtrees might have the same root value but different structures. For example, two subtrees with root value 2 might have completely different children.

The solution is to serialize each subtree into a string that captures both its structure and values. We can think of this like creating a "DNA sequence" for each subtree. The serialization format "value,left_subtree,right_subtree" ensures that:

  • The node's value is captured
  • The structure of left and right children is preserved
  • The order matters (left vs right)
  • Null nodes are explicitly marked with "#" to distinguish between different structures

Why use post-order traversal (visiting children before parent)? Because to know what a subtree looks like, we first need to know what its children look like. It's like building a house - you need to complete the foundation and lower floors before you can describe the entire building.

By using a counter/hashmap to track how many times we've seen each serialization, we can efficiently identify when we encounter a duplicate (count becomes 2) and add it to our result exactly once, avoiding duplicates in our answer list even if that subtree pattern appears many times.

Learn more about Tree, Depth-First Search and Binary Tree patterns.

Solution Approach

The implementation uses a post-order DFS traversal combined with string serialization and a hash map to track duplicate subtrees.

Core Data Structures:

  • counter: A hash map (Python's Counter) that maps each subtree's string representation to its occurrence count
  • ans: A list to store the root nodes of duplicate subtrees

The DFS Function:

The recursive dfs function performs the following steps for each node:

  1. Base Case: If the current node is None, return '#' as a placeholder to represent null nodes in the serialization.

  2. Recursive Calls: First, recursively process the left and right children:

    • dfs(root.left) returns the string representation of the left subtree
    • dfs(root.right) returns the string representation of the right subtree
  3. Build Current Subtree's Representation: Create a unique string for the current subtree using the format:

    v = f'{root.val},{dfs(root.left)},{dfs(root.right)}'

    This format ensures that subtrees with the same structure and values produce identical strings.

  4. Count and Track Duplicates:

    • Increment the count for this representation: counter[v] += 1
    • If the count becomes exactly 2 (second occurrence), add this root to the answer list
    • We check for == 2 specifically to ensure each duplicate type is added only once
  5. Return Representation: Return the string representation v so parent nodes can use it in their own serialization.

Example Walkthrough:

For a simple subtree like:

    2
   /
  4

The serialization would be:

  • Node 4: "4,#,#" (value 4, no left child, no right child)
  • Node 2: "2,4,#,#,#" (value 2, left subtree is "4,#,#", no right child)

If this exact subtree appears again elsewhere in the tree, its serialization would be identical, the counter would increment to 2, and we'd add that occurrence's root to our answer.

Time Complexity: O(n) where n is the number of nodes, as we visit each node once.

Space Complexity: O(n) for the hash map storing at most n different subtree representations, plus the recursion stack.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a concrete example to illustrate how the solution identifies duplicate subtrees.

Consider this binary tree:

        1
       / \
      2   3
     /   / \
    4   2   4
       /
      4

Step-by-step traversal (post-order DFS):

  1. Visit leftmost leaf (4):

    • No children, so serialize as: "4,#,#"
    • counter["4,#,#"] = 1
    • Not a duplicate yet
  2. Visit subtree rooted at 2 (left child of root):

    • Left child serialization: "4,#,#"
    • Right child: null → "#"
    • Serialize as: "2,4,#,#,#"
    • counter["2,4,#,#,#"] = 1
  3. Visit another leaf (4) under the second 2:

    • Serialize as: "4,#,#"
    • counter["4,#,#"] = 2 ✓
    • Count is 2! Add this node (4) to answer list
  4. Visit subtree rooted at 2 (left child of 3):

    • Left child: "4,#,#"
    • Right child: "#"
    • Serialize as: "2,4,#,#,#"
    • counter["2,4,#,#,#"] = 2 ✓
    • Count is 2! Add this node (2) to answer list
  5. Visit rightmost leaf (4):

    • Serialize as: "4,#,#"
    • counter["4,#,#"] = 3
    • Count > 2, so don't add (already in answer)
  6. Visit subtree rooted at 3:

    • Left: "2,4,#,#,#"
    • Right: "4,#,#"
    • Serialize as: "3,2,4,#,#,#,4,#,#"
    • counter["3,2,4,#,#,#,4,#,#"] = 1
  7. Visit root (1):

    • Left: "2,4,#,#,#"
    • Right: "3,2,4,#,#,#,4,#,#"
    • Serialize as: "1,2,4,#,#,#,3,2,4,#,#,#,4,#,#"
    • counter["1,2,4,#,#,#,3,2,4,#,#,#,4,#,#"] = 1

Final Result: The answer list contains two nodes:

  • A node with value 4 (representing the duplicate leaf pattern)
  • A node with value 2 (representing the duplicate subtree pattern 2->4)

The key insight: Each unique subtree structure gets its own string "fingerprint", and we track exactly when each fingerprint appears for the second time to identify duplicates without repetition in our answer.

Solution Implementation

1# Definition for a binary tree node.
2# class TreeNode:
3#     def __init__(self, val=0, left=None, right=None):
4#         self.val = val
5#         self.left = left
6#         self.right = right
7
8from typing import Optional, List
9from collections import Counter
10
11class Solution:
12    def findDuplicateSubtrees(
13        self, root: Optional[TreeNode]
14    ) -> List[Optional[TreeNode]]:
15        """
16        Find all duplicate subtrees in a binary tree.
17        Returns a list containing one root node from each group of duplicate subtrees.
18        """
19      
20        def serialize_and_count(node: Optional[TreeNode]) -> str:
21            """
22            Serialize the subtree rooted at 'node' using post-order traversal.
23            Count occurrences of each subtree structure and collect duplicates.
24          
25            Args:
26                node: Current tree node being processed
27              
28            Returns:
29                String representation of the subtree structure
30            """
31            # Base case: null node represented as '#'
32            if node is None:
33                return '#'
34          
35            # Recursively serialize left and right subtrees
36            left_serialization = serialize_and_count(node.left)
37            right_serialization = serialize_and_count(node.right)
38          
39            # Create unique serialization for current subtree
40            # Format: "value,left_subtree,right_subtree"
41            subtree_serialization = f'{node.val},{left_serialization},{right_serialization}'
42          
43            # Increment count for this subtree structure
44            subtree_count[subtree_serialization] += 1
45          
46            # Add to result only when we encounter the second occurrence
47            # This ensures each duplicate group is represented once
48            if subtree_count[subtree_serialization] == 2:
49                duplicate_roots.append(node)
50          
51            return subtree_serialization
52      
53        # Initialize result list to store root nodes of duplicate subtrees
54        duplicate_roots = []
55      
56        # Dictionary to count occurrences of each unique subtree structure
57        subtree_count = Counter()
58      
59        # Start DFS traversal from root
60        serialize_and_count(root)
61      
62        return duplicate_roots
63
1/**
2 * Definition for a binary tree node.
3 * public class TreeNode {
4 *     int val;
5 *     TreeNode left;
6 *     TreeNode right;
7 *     TreeNode() {}
8 *     TreeNode(int val) { this.val = val; }
9 *     TreeNode(int val, TreeNode left, TreeNode right) {
10 *         this.val = val;
11 *         this.left = left;
12 *         this.right = right;
13 *     }
14 * }
15 */
16class Solution {
17    // Map to store serialized subtree representations and their occurrence count
18    private Map<String, Integer> subtreeCountMap;
19    // List to store root nodes of duplicate subtrees
20    private List<TreeNode> duplicateSubtrees;
21
22    /**
23     * Finds all duplicate subtrees in a binary tree.
24     * A duplicate subtree means there are at least two subtrees with the same structure and node values.
25     * 
26     * @param root The root of the binary tree
27     * @return List of root nodes of all duplicate subtrees
28     */
29    public List<TreeNode> findDuplicateSubtrees(TreeNode root) {
30        // Initialize data structures
31        subtreeCountMap = new HashMap<>();
32        duplicateSubtrees = new ArrayList<>();
33      
34        // Perform depth-first search to serialize and identify duplicates
35        serializeSubtree(root);
36      
37        return duplicateSubtrees;
38    }
39
40    /**
41     * Performs post-order traversal to serialize each subtree and identify duplicates.
42     * Creates a unique string representation for each subtree using the pattern:
43     * "nodeValue,leftSubtreeString,rightSubtreeString"
44     * 
45     * @param node The current node being processed
46     * @return String representation of the subtree rooted at current node
47     */
48    private String serializeSubtree(TreeNode node) {
49        // Base case: null node represented as "#"
50        if (node == null) {
51            return "#";
52        }
53      
54        // Recursively serialize left and right subtrees (post-order traversal)
55        String leftSerialization = serializeSubtree(node.left);
56        String rightSerialization = serializeSubtree(node.right);
57      
58        // Create unique serialization for current subtree
59        String currentSerialization = node.val + "," + leftSerialization + "," + rightSerialization;
60      
61        // Update occurrence count for this subtree pattern
62        int currentCount = subtreeCountMap.getOrDefault(currentSerialization, 0) + 1;
63        subtreeCountMap.put(currentSerialization, currentCount);
64      
65        // Add to result list only when we encounter the second occurrence
66        // This ensures each duplicate subtree is added only once
67        if (currentCount == 2) {
68            duplicateSubtrees.add(node);
69        }
70      
71        return currentSerialization;
72    }
73}
74
1/**
2 * Definition for a binary tree node.
3 * struct TreeNode {
4 *     int val;
5 *     TreeNode *left;
6 *     TreeNode *right;
7 *     TreeNode() : val(0), left(nullptr), right(nullptr) {}
8 *     TreeNode(int x) : val(x), left(nullptr), right(nullptr) {}
9 *     TreeNode(int x, TreeNode *left, TreeNode *right) : val(x), left(left), right(right) {}
10 * };
11 */
12class Solution {
13public:
14    vector<TreeNode*> findDuplicateSubtrees(TreeNode* root) {
15        // Map to store serialized subtree representations and their occurrence count
16        unordered_map<string, int> subtreeCounter;
17      
18        // Result vector to store root nodes of duplicate subtrees
19        vector<TreeNode*> duplicateSubtrees;
20      
21        // Perform DFS traversal to serialize and identify duplicate subtrees
22        serializeSubtree(root, subtreeCounter, duplicateSubtrees);
23      
24        return duplicateSubtrees;
25    }
26
27private:
28    /**
29     * Serializes the subtree rooted at the given node using post-order traversal.
30     * Tracks occurrences of each unique subtree structure.
31     * 
32     * @param node Current node being processed
33     * @param subtreeCounter Map tracking serialized subtree occurrences
34     * @param duplicateSubtrees Vector collecting roots of duplicate subtrees
35     * @return Serialized string representation of the subtree
36     */
37    string serializeSubtree(TreeNode* node, 
38                           unordered_map<string, int>& subtreeCounter,
39                           vector<TreeNode*>& duplicateSubtrees) {
40        // Base case: null node represented as "#"
41        if (!node) {
42            return "#";
43        }
44      
45        // Recursively serialize left subtree
46        string leftSerialization = serializeSubtree(node->left, subtreeCounter, duplicateSubtrees);
47      
48        // Recursively serialize right subtree
49        string rightSerialization = serializeSubtree(node->right, subtreeCounter, duplicateSubtrees);
50      
51        // Build serialization string for current subtree
52        // Format: "nodeValue,leftSubtree,rightSubtree"
53        string currentSerialization = to_string(node->val) + "," + 
54                                     leftSerialization + "," + 
55                                     rightSerialization;
56      
57        // Increment occurrence count for this subtree structure
58        subtreeCounter[currentSerialization]++;
59      
60        // Add to result only when we encounter the second occurrence
61        // This ensures each duplicate subtree is added only once
62        if (subtreeCounter[currentSerialization] == 2) {
63            duplicateSubtrees.push_back(node);
64        }
65      
66        return currentSerialization;
67    }
68};
69
1/**
2 * Definition for a binary tree node.
3 * class TreeNode {
4 *     val: number
5 *     left: TreeNode | null
6 *     right: TreeNode | null
7 *     constructor(val?: number, left?: TreeNode | null, right?: TreeNode | null) {
8 *         this.val = (val===undefined ? 0 : val)
9 *         this.left = (left===undefined ? null : left)
10 *         this.right = (right===undefined ? null : right)
11 *     }
12 * }
13 */
14
15/**
16 * Finds all duplicate subtrees in a binary tree.
17 * A duplicate subtree means there are two or more subtrees with the same structure and node values.
18 * 
19 * @param root - The root node of the binary tree
20 * @returns An array containing one root node of each duplicate subtree
21 */
22function findDuplicateSubtrees(root: TreeNode | null): Array<TreeNode | null> {
23    // Map to store serialized subtree representations and their occurrence count
24    const subtreeCountMap = new Map<string, number>();
25  
26    // Array to store the result - root nodes of duplicate subtrees
27    const duplicateSubtrees: Array<TreeNode | null> = [];
28  
29    /**
30     * Performs depth-first search to serialize each subtree and identify duplicates.
31     * Uses post-order traversal to build serialized representation from bottom up.
32     * 
33     * @param node - Current node being processed
34     * @returns Serialized string representation of the subtree rooted at this node
35     */
36    const serializeAndFindDuplicates = (node: TreeNode | null): string => {
37        // Base case: null node represented as '#'
38        if (node === null) {
39            return '#';
40        }
41      
42        // Recursively serialize left and right subtrees
43        const leftSerialized: string = serializeAndFindDuplicates(node.left);
44        const rightSerialized: string = serializeAndFindDuplicates(node.right);
45      
46        // Create unique serialization for current subtree
47        // Format: "nodeValue,leftSubtree,rightSubtree"
48        const currentSubtreeSerialized: string = `${node.val},${leftSerialized},${rightSerialized}`;
49      
50        // Update occurrence count for this subtree structure
51        const currentCount: number = subtreeCountMap.get(currentSubtreeSerialized) ?? 0;
52        subtreeCountMap.set(currentSubtreeSerialized, currentCount + 1);
53      
54        // Add to result only when we encounter the second occurrence
55        // This ensures each duplicate subtree is added only once
56        if (subtreeCountMap.get(currentSubtreeSerialized) === 2) {
57            duplicateSubtrees.push(node);
58        }
59      
60        return currentSubtreeSerialized;
61    };
62  
63    // Start the DFS traversal from root
64    serializeAndFindDuplicates(root);
65  
66    return duplicateSubtrees;
67}
68

Time and Space Complexity

Time Complexity: O(n^2)

The algorithm performs a depth-first traversal of the tree where each node is visited once. For each node, we construct a string representation by concatenating:

  • The node's value
  • The string representation of its left subtree
  • The string representation of its right subtree

In the worst case (a completely unbalanced tree forming a linear chain), the string concatenation at each level creates strings of lengths 1, 2, 3, ..., n. The total time for string operations becomes 1 + 2 + 3 + ... + n = O(n^2).

For a balanced tree, the string length at each node is proportional to the size of its subtree. While the concatenation work is more distributed, the total work across all nodes still sums to O(n^2) due to repeated substring copying during string construction.

Space Complexity: O(n^2)

The space complexity consists of:

  • Recursion stack: O(h) where h is the height of the tree. In the worst case, h = n for a skewed tree, giving O(n).
  • Counter dictionary: Stores up to n unique serialized subtree representations. In the worst case (completely unbalanced tree), these strings have lengths 1, 2, 3, ..., n, requiring total space of O(n^2).
  • Output list: O(n) in the worst case if all subtrees are duplicates.

The dominant factor is the storage of serialized strings in the counter, giving an overall space complexity of O(n^2).

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Incorrect Serialization Format Leading to Collisions

One major pitfall is using an ambiguous serialization format that can cause different subtrees to produce the same string representation.

Problematic Example:

# WRONG: Using simple concatenation without delimiters
subtree_string = f'{node.val}{left}{right}'

This can cause collisions. Consider these two different subtrees:

  • Subtree A: root=12, left=#, right=# → produces "12##"
  • Subtree B: root=1, left=2, right=# → where node 2 has left=# and right=# → also produces "12##"

These are structurally different trees but would incorrectly be identified as duplicates!

Solution: Always use clear delimiters between components:

# CORRECT: Using commas as delimiters
subtree_string = f'{node.val},{left},{right}'

2. Adding All Occurrences Instead of One Representative

A common mistake is adding the root node to the result list every time a duplicate is found, rather than just once per unique subtree pattern.

Problematic Example:

# WRONG: Adding every duplicate occurrence
if subtree_count[subtree_serialization] >= 2:
    duplicate_roots.append(node)

This would add the same subtree pattern multiple times if it appears 3+ times in the tree.

Solution: Only add when count equals exactly 2:

# CORRECT: Add only on the second occurrence
if subtree_count[subtree_serialization] == 2:
    duplicate_roots.append(node)

3. Using Pre-order or In-order Traversal for Serialization

While post-order traversal isn't strictly required, using pre-order or in-order can make the implementation more error-prone or less intuitive.

Why Post-order Works Best:

  • You need the complete serialization of child subtrees before processing the parent
  • Post-order naturally provides this by processing children first
  • The string for each subtree is fully constructed when needed by its parent

Example of Pre-order Attempt (more complex):

# More complex with pre-order - requires careful handling
def serialize(node):
    if not node:
        return '#'
    # Need to carefully manage the recursion and string building
    return f'{node.val},{serialize(node.left)},{serialize(node.right)}'

4. Not Handling Node Values with Special Characters

If node values can contain the delimiter character (like commas), the serialization breaks.

Problematic Scenario: If a node value is "1,2" and you use comma as delimiter, the serialization becomes ambiguous.

Solution Options:

  • Use a delimiter that cannot appear in values (like '\x00' for strings)
  • Escape special characters in node values
  • Use length-prefixed encoding
# Better for arbitrary values
subtree_string = f'{node.val}\x00{left}\x00{right}'
# Or use a tuple/list approach that naturally handles any value type
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which of the following is a good use case for backtracking?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More