652. Find Duplicate Subtrees
Problem Description
You are given the root of a binary tree. Your task is to find all subtrees that appear more than once in the tree (duplicate subtrees).
A duplicate subtree means that there are at least two subtrees in the binary tree that have:
- The exact same structure (same shape)
- The same node values at corresponding positions
For the output, you need to return a list containing the root nodes of these duplicate subtrees. However, if a particular subtree pattern appears multiple times (say 3 or more times), you only need to include one representative root node in your result - not all occurrences.
For example, if you have a tree where a subtree with structure 2->4->null->null
appears three times at different positions, you would only return one of those three root nodes (the node with value 2) in your result list.
The solution works by traversing the tree and creating a unique string representation for each subtree. It uses a post-order traversal (visiting left child, right child, then current node) to build these string representations. The format is "value,left_subtree_string,right_subtree_string"
, where null nodes are represented as "#"
.
As the algorithm encounters each subtree, it:
- Generates the string representation for that subtree
- Counts how many times this exact representation has been seen
- When a representation is seen for the second time (count becomes 2), adds that subtree's root to the answer list
- Ignores further occurrences (3rd, 4th, etc.) of the same subtree pattern
This approach ensures that each type of duplicate subtree is represented exactly once in the final result.
Flowchart Walkthrough
First, let's pin down the algorithm using the Flowchart. Here's a step-by-step walkthrough:
Is it a graph?
- Yes: While the problem presents a binary tree, a tree is a special type of graph (specifically, a connected acyclic graph). Binary trees are graphs where each node has at most two children and there's exactly one path between any two nodes.
Is it a tree?
- Yes: The problem explicitly states we're working with a binary tree. We need to traverse this tree structure to identify duplicate subtrees.
DFS
- Conclusion: The flowchart leads us directly to DFS (Depth-First Search) as the appropriate algorithm for this tree problem.
The DFS pattern is ideal here because:
- We need to examine every subtree in the binary tree to identify duplicates
- We need to process each subtree completely (all its descendants) to generate a unique representation
- Post-order traversal (a type of DFS) allows us to build subtree representations from the bottom up - we process children before parents
- DFS naturally handles the recursive structure of trees, where each node's representation depends on its children's representations
The solution uses DFS to traverse the tree, creating string serializations of each subtree. By visiting nodes in post-order (left, right, root), we ensure that when we process a node, we already have the complete representations of its children, allowing us to build the current subtree's representation efficiently.
Intuition
The key insight is that if two subtrees are identical, they must have the same "fingerprint" or unique representation. Think of it like taking a photograph of each subtree - identical subtrees would produce identical photographs.
To identify duplicate subtrees, we need to:
- Create a unique representation for every subtree in the tree
- Track how many times each unique representation appears
- When we see a representation for the second time, we've found a duplicate
The challenge is how to create this unique representation. We can't just use the root value alone because different subtrees might have the same root value but different structures. For example, two subtrees with root value 2
might have completely different children.
The solution is to serialize each subtree into a string that captures both its structure and values. We can think of this like creating a "DNA sequence" for each subtree. The serialization format "value,left_subtree,right_subtree"
ensures that:
- The node's value is captured
- The structure of left and right children is preserved
- The order matters (left vs right)
- Null nodes are explicitly marked with
"#"
to distinguish between different structures
Why use post-order traversal (visiting children before parent)? Because to know what a subtree looks like, we first need to know what its children look like. It's like building a house - you need to complete the foundation and lower floors before you can describe the entire building.
By using a counter/hashmap to track how many times we've seen each serialization, we can efficiently identify when we encounter a duplicate (count becomes 2) and add it to our result exactly once, avoiding duplicates in our answer list even if that subtree pattern appears many times.
Learn more about Tree, Depth-First Search and Binary Tree patterns.
Solution Approach
The implementation uses a post-order DFS traversal combined with string serialization and a hash map to track duplicate subtrees.
Core Data Structures:
counter
: A hash map (Python'sCounter
) that maps each subtree's string representation to its occurrence countans
: A list to store the root nodes of duplicate subtrees
The DFS Function:
The recursive dfs
function performs the following steps for each node:
-
Base Case: If the current node is
None
, return'#'
as a placeholder to represent null nodes in the serialization. -
Recursive Calls: First, recursively process the left and right children:
dfs(root.left)
returns the string representation of the left subtreedfs(root.right)
returns the string representation of the right subtree
-
Build Current Subtree's Representation: Create a unique string for the current subtree using the format:
v = f'{root.val},{dfs(root.left)},{dfs(root.right)}'
This format ensures that subtrees with the same structure and values produce identical strings.
-
Count and Track Duplicates:
- Increment the count for this representation:
counter[v] += 1
- If the count becomes exactly 2 (second occurrence), add this root to the answer list
- We check for
== 2
specifically to ensure each duplicate type is added only once
- Increment the count for this representation:
-
Return Representation: Return the string representation
v
so parent nodes can use it in their own serialization.
Example Walkthrough:
For a simple subtree like:
2 / 4
The serialization would be:
- Node 4:
"4,#,#"
(value 4, no left child, no right child) - Node 2:
"2,4,#,#,#"
(value 2, left subtree is "4,#,#", no right child)
If this exact subtree appears again elsewhere in the tree, its serialization would be identical, the counter would increment to 2, and we'd add that occurrence's root to our answer.
Time Complexity: O(n)
where n is the number of nodes, as we visit each node once.
Space Complexity: O(n)
for the hash map storing at most n different subtree representations, plus the recursion stack.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a concrete example to illustrate how the solution identifies duplicate subtrees.
Consider this binary tree:
1 / \ 2 3 / / \ 4 2 4 / 4
Step-by-step traversal (post-order DFS):
-
Visit leftmost leaf (4):
- No children, so serialize as:
"4,#,#"
- counter["4,#,#"] = 1
- Not a duplicate yet
- No children, so serialize as:
-
Visit subtree rooted at 2 (left child of root):
- Left child serialization:
"4,#,#"
- Right child: null →
"#"
- Serialize as:
"2,4,#,#,#"
- counter["2,4,#,#,#"] = 1
- Left child serialization:
-
Visit another leaf (4) under the second 2:
- Serialize as:
"4,#,#"
- counter["4,#,#"] = 2 ✓
- Count is 2! Add this node (4) to answer list
- Serialize as:
-
Visit subtree rooted at 2 (left child of 3):
- Left child:
"4,#,#"
- Right child:
"#"
- Serialize as:
"2,4,#,#,#"
- counter["2,4,#,#,#"] = 2 ✓
- Count is 2! Add this node (2) to answer list
- Left child:
-
Visit rightmost leaf (4):
- Serialize as:
"4,#,#"
- counter["4,#,#"] = 3
- Count > 2, so don't add (already in answer)
- Serialize as:
-
Visit subtree rooted at 3:
- Left:
"2,4,#,#,#"
- Right:
"4,#,#"
- Serialize as:
"3,2,4,#,#,#,4,#,#"
- counter["3,2,4,#,#,#,4,#,#"] = 1
- Left:
-
Visit root (1):
- Left:
"2,4,#,#,#"
- Right:
"3,2,4,#,#,#,4,#,#"
- Serialize as:
"1,2,4,#,#,#,3,2,4,#,#,#,4,#,#"
- counter["1,2,4,#,#,#,3,2,4,#,#,#,4,#,#"] = 1
- Left:
Final Result: The answer list contains two nodes:
- A node with value 4 (representing the duplicate leaf pattern)
- A node with value 2 (representing the duplicate subtree pattern
2->4
)
The key insight: Each unique subtree structure gets its own string "fingerprint", and we track exactly when each fingerprint appears for the second time to identify duplicates without repetition in our answer.
Solution Implementation
1# Definition for a binary tree node.
2# class TreeNode:
3# def __init__(self, val=0, left=None, right=None):
4# self.val = val
5# self.left = left
6# self.right = right
7
8from typing import Optional, List
9from collections import Counter
10
11class Solution:
12 def findDuplicateSubtrees(
13 self, root: Optional[TreeNode]
14 ) -> List[Optional[TreeNode]]:
15 """
16 Find all duplicate subtrees in a binary tree.
17 Returns a list containing one root node from each group of duplicate subtrees.
18 """
19
20 def serialize_and_count(node: Optional[TreeNode]) -> str:
21 """
22 Serialize the subtree rooted at 'node' using post-order traversal.
23 Count occurrences of each subtree structure and collect duplicates.
24
25 Args:
26 node: Current tree node being processed
27
28 Returns:
29 String representation of the subtree structure
30 """
31 # Base case: null node represented as '#'
32 if node is None:
33 return '#'
34
35 # Recursively serialize left and right subtrees
36 left_serialization = serialize_and_count(node.left)
37 right_serialization = serialize_and_count(node.right)
38
39 # Create unique serialization for current subtree
40 # Format: "value,left_subtree,right_subtree"
41 subtree_serialization = f'{node.val},{left_serialization},{right_serialization}'
42
43 # Increment count for this subtree structure
44 subtree_count[subtree_serialization] += 1
45
46 # Add to result only when we encounter the second occurrence
47 # This ensures each duplicate group is represented once
48 if subtree_count[subtree_serialization] == 2:
49 duplicate_roots.append(node)
50
51 return subtree_serialization
52
53 # Initialize result list to store root nodes of duplicate subtrees
54 duplicate_roots = []
55
56 # Dictionary to count occurrences of each unique subtree structure
57 subtree_count = Counter()
58
59 # Start DFS traversal from root
60 serialize_and_count(root)
61
62 return duplicate_roots
63
1/**
2 * Definition for a binary tree node.
3 * public class TreeNode {
4 * int val;
5 * TreeNode left;
6 * TreeNode right;
7 * TreeNode() {}
8 * TreeNode(int val) { this.val = val; }
9 * TreeNode(int val, TreeNode left, TreeNode right) {
10 * this.val = val;
11 * this.left = left;
12 * this.right = right;
13 * }
14 * }
15 */
16class Solution {
17 // Map to store serialized subtree representations and their occurrence count
18 private Map<String, Integer> subtreeCountMap;
19 // List to store root nodes of duplicate subtrees
20 private List<TreeNode> duplicateSubtrees;
21
22 /**
23 * Finds all duplicate subtrees in a binary tree.
24 * A duplicate subtree means there are at least two subtrees with the same structure and node values.
25 *
26 * @param root The root of the binary tree
27 * @return List of root nodes of all duplicate subtrees
28 */
29 public List<TreeNode> findDuplicateSubtrees(TreeNode root) {
30 // Initialize data structures
31 subtreeCountMap = new HashMap<>();
32 duplicateSubtrees = new ArrayList<>();
33
34 // Perform depth-first search to serialize and identify duplicates
35 serializeSubtree(root);
36
37 return duplicateSubtrees;
38 }
39
40 /**
41 * Performs post-order traversal to serialize each subtree and identify duplicates.
42 * Creates a unique string representation for each subtree using the pattern:
43 * "nodeValue,leftSubtreeString,rightSubtreeString"
44 *
45 * @param node The current node being processed
46 * @return String representation of the subtree rooted at current node
47 */
48 private String serializeSubtree(TreeNode node) {
49 // Base case: null node represented as "#"
50 if (node == null) {
51 return "#";
52 }
53
54 // Recursively serialize left and right subtrees (post-order traversal)
55 String leftSerialization = serializeSubtree(node.left);
56 String rightSerialization = serializeSubtree(node.right);
57
58 // Create unique serialization for current subtree
59 String currentSerialization = node.val + "," + leftSerialization + "," + rightSerialization;
60
61 // Update occurrence count for this subtree pattern
62 int currentCount = subtreeCountMap.getOrDefault(currentSerialization, 0) + 1;
63 subtreeCountMap.put(currentSerialization, currentCount);
64
65 // Add to result list only when we encounter the second occurrence
66 // This ensures each duplicate subtree is added only once
67 if (currentCount == 2) {
68 duplicateSubtrees.add(node);
69 }
70
71 return currentSerialization;
72 }
73}
74
1/**
2 * Definition for a binary tree node.
3 * struct TreeNode {
4 * int val;
5 * TreeNode *left;
6 * TreeNode *right;
7 * TreeNode() : val(0), left(nullptr), right(nullptr) {}
8 * TreeNode(int x) : val(x), left(nullptr), right(nullptr) {}
9 * TreeNode(int x, TreeNode *left, TreeNode *right) : val(x), left(left), right(right) {}
10 * };
11 */
12class Solution {
13public:
14 vector<TreeNode*> findDuplicateSubtrees(TreeNode* root) {
15 // Map to store serialized subtree representations and their occurrence count
16 unordered_map<string, int> subtreeCounter;
17
18 // Result vector to store root nodes of duplicate subtrees
19 vector<TreeNode*> duplicateSubtrees;
20
21 // Perform DFS traversal to serialize and identify duplicate subtrees
22 serializeSubtree(root, subtreeCounter, duplicateSubtrees);
23
24 return duplicateSubtrees;
25 }
26
27private:
28 /**
29 * Serializes the subtree rooted at the given node using post-order traversal.
30 * Tracks occurrences of each unique subtree structure.
31 *
32 * @param node Current node being processed
33 * @param subtreeCounter Map tracking serialized subtree occurrences
34 * @param duplicateSubtrees Vector collecting roots of duplicate subtrees
35 * @return Serialized string representation of the subtree
36 */
37 string serializeSubtree(TreeNode* node,
38 unordered_map<string, int>& subtreeCounter,
39 vector<TreeNode*>& duplicateSubtrees) {
40 // Base case: null node represented as "#"
41 if (!node) {
42 return "#";
43 }
44
45 // Recursively serialize left subtree
46 string leftSerialization = serializeSubtree(node->left, subtreeCounter, duplicateSubtrees);
47
48 // Recursively serialize right subtree
49 string rightSerialization = serializeSubtree(node->right, subtreeCounter, duplicateSubtrees);
50
51 // Build serialization string for current subtree
52 // Format: "nodeValue,leftSubtree,rightSubtree"
53 string currentSerialization = to_string(node->val) + "," +
54 leftSerialization + "," +
55 rightSerialization;
56
57 // Increment occurrence count for this subtree structure
58 subtreeCounter[currentSerialization]++;
59
60 // Add to result only when we encounter the second occurrence
61 // This ensures each duplicate subtree is added only once
62 if (subtreeCounter[currentSerialization] == 2) {
63 duplicateSubtrees.push_back(node);
64 }
65
66 return currentSerialization;
67 }
68};
69
1/**
2 * Definition for a binary tree node.
3 * class TreeNode {
4 * val: number
5 * left: TreeNode | null
6 * right: TreeNode | null
7 * constructor(val?: number, left?: TreeNode | null, right?: TreeNode | null) {
8 * this.val = (val===undefined ? 0 : val)
9 * this.left = (left===undefined ? null : left)
10 * this.right = (right===undefined ? null : right)
11 * }
12 * }
13 */
14
15/**
16 * Finds all duplicate subtrees in a binary tree.
17 * A duplicate subtree means there are two or more subtrees with the same structure and node values.
18 *
19 * @param root - The root node of the binary tree
20 * @returns An array containing one root node of each duplicate subtree
21 */
22function findDuplicateSubtrees(root: TreeNode | null): Array<TreeNode | null> {
23 // Map to store serialized subtree representations and their occurrence count
24 const subtreeCountMap = new Map<string, number>();
25
26 // Array to store the result - root nodes of duplicate subtrees
27 const duplicateSubtrees: Array<TreeNode | null> = [];
28
29 /**
30 * Performs depth-first search to serialize each subtree and identify duplicates.
31 * Uses post-order traversal to build serialized representation from bottom up.
32 *
33 * @param node - Current node being processed
34 * @returns Serialized string representation of the subtree rooted at this node
35 */
36 const serializeAndFindDuplicates = (node: TreeNode | null): string => {
37 // Base case: null node represented as '#'
38 if (node === null) {
39 return '#';
40 }
41
42 // Recursively serialize left and right subtrees
43 const leftSerialized: string = serializeAndFindDuplicates(node.left);
44 const rightSerialized: string = serializeAndFindDuplicates(node.right);
45
46 // Create unique serialization for current subtree
47 // Format: "nodeValue,leftSubtree,rightSubtree"
48 const currentSubtreeSerialized: string = `${node.val},${leftSerialized},${rightSerialized}`;
49
50 // Update occurrence count for this subtree structure
51 const currentCount: number = subtreeCountMap.get(currentSubtreeSerialized) ?? 0;
52 subtreeCountMap.set(currentSubtreeSerialized, currentCount + 1);
53
54 // Add to result only when we encounter the second occurrence
55 // This ensures each duplicate subtree is added only once
56 if (subtreeCountMap.get(currentSubtreeSerialized) === 2) {
57 duplicateSubtrees.push(node);
58 }
59
60 return currentSubtreeSerialized;
61 };
62
63 // Start the DFS traversal from root
64 serializeAndFindDuplicates(root);
65
66 return duplicateSubtrees;
67}
68
Time and Space Complexity
Time Complexity: O(n^2)
The algorithm performs a depth-first traversal of the tree where each node is visited once. For each node, we construct a string representation by concatenating:
- The node's value
- The string representation of its left subtree
- The string representation of its right subtree
In the worst case (a completely unbalanced tree forming a linear chain), the string concatenation at each level creates strings of lengths 1, 2, 3, ..., n
. The total time for string operations becomes 1 + 2 + 3 + ... + n = O(n^2)
.
For a balanced tree, the string length at each node is proportional to the size of its subtree. While the concatenation work is more distributed, the total work across all nodes still sums to O(n^2)
due to repeated substring copying during string construction.
Space Complexity: O(n^2)
The space complexity consists of:
- Recursion stack:
O(h)
whereh
is the height of the tree. In the worst case,h = n
for a skewed tree, givingO(n)
. - Counter dictionary: Stores up to
n
unique serialized subtree representations. In the worst case (completely unbalanced tree), these strings have lengths1, 2, 3, ..., n
, requiring total space ofO(n^2)
. - Output list:
O(n)
in the worst case if all subtrees are duplicates.
The dominant factor is the storage of serialized strings in the counter, giving an overall space complexity of O(n^2)
.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Incorrect Serialization Format Leading to Collisions
One major pitfall is using an ambiguous serialization format that can cause different subtrees to produce the same string representation.
Problematic Example:
# WRONG: Using simple concatenation without delimiters
subtree_string = f'{node.val}{left}{right}'
This can cause collisions. Consider these two different subtrees:
- Subtree A: root=12, left=#, right=# → produces "12##"
- Subtree B: root=1, left=2, right=# → where node 2 has left=# and right=# → also produces "12##"
These are structurally different trees but would incorrectly be identified as duplicates!
Solution: Always use clear delimiters between components:
# CORRECT: Using commas as delimiters
subtree_string = f'{node.val},{left},{right}'
2. Adding All Occurrences Instead of One Representative
A common mistake is adding the root node to the result list every time a duplicate is found, rather than just once per unique subtree pattern.
Problematic Example:
# WRONG: Adding every duplicate occurrence if subtree_count[subtree_serialization] >= 2: duplicate_roots.append(node)
This would add the same subtree pattern multiple times if it appears 3+ times in the tree.
Solution: Only add when count equals exactly 2:
# CORRECT: Add only on the second occurrence if subtree_count[subtree_serialization] == 2: duplicate_roots.append(node)
3. Using Pre-order or In-order Traversal for Serialization
While post-order traversal isn't strictly required, using pre-order or in-order can make the implementation more error-prone or less intuitive.
Why Post-order Works Best:
- You need the complete serialization of child subtrees before processing the parent
- Post-order naturally provides this by processing children first
- The string for each subtree is fully constructed when needed by its parent
Example of Pre-order Attempt (more complex):
# More complex with pre-order - requires careful handling
def serialize(node):
if not node:
return '#'
# Need to carefully manage the recursion and string building
return f'{node.val},{serialize(node.left)},{serialize(node.right)}'
4. Not Handling Node Values with Special Characters
If node values can contain the delimiter character (like commas), the serialization breaks.
Problematic Scenario: If a node value is "1,2" and you use comma as delimiter, the serialization becomes ambiguous.
Solution Options:
- Use a delimiter that cannot appear in values (like '\x00' for strings)
- Escape special characters in node values
- Use length-prefixed encoding
# Better for arbitrary values
subtree_string = f'{node.val}\x00{left}\x00{right}'
# Or use a tuple/list approach that naturally handles any value type
Which of the following is a good use case for backtracking?
Recommended Readings
Everything About Trees A tree is a type of graph data structure composed of nodes and edges Its main properties are It is acyclic doesn't contain any cycles There exists a path from the root to any node Has N 1 edges where N is the number of nodes in the tree and
https assets algo monster cover_photos dfs svg Depth First Search Prereqs Recursion Review problems recursion_intro Trees problems tree_intro With a solid understanding of recursion under our belts we are now ready to tackle one of the most useful techniques in coding interviews Depth First Search DFS As the name suggests
Binary Tree Min Depth Prereq BFS on Tree problems bfs_intro Given a binary tree find the depth of the shallowest leaf node https assets algo monster binary_tree_min_depth png Explanation We can solve this problem with either DFS or BFS With DFS we traverse the whole tree looking for leaf nodes and record and update the minimum depth as we go With BFS though since we search level by level we are guaranteed to find the shallowest leaf node
Want a Structured Path to Master System Design Too? Don’t Miss This!