652. Find Duplicate Subtrees
Problem Description
In this problem, we are given a binary tree and we have to identify all the subtrees that appear more than once within that tree. A subtree consists of a node and all of its descendants. Subtrees are considered duplicates if they have identical structure and node values. That is to say, if you took two duplicate subtrees and compared them, they would be indistinguishable in terms of both their shape and the values held in each node. The problem requires returning a list where each element is the root node of one of the duplicate subtrees. If there are multiple duplicates of the same subtree, we only need to include one instance of it in our answer.
Flowchart Walkthrough
Let's determine the appropriate algorithm for solving Leetcode 652. Find Duplicate Subtrees, using the Flowchart. Below is a detailed explanation through the flowchart:
-
Is it a graph?
- Yes: In this problem, trees are graphs without cycles, and subtrees can be considered as smaller graphs.
-
Is it a tree?
- Yes: The main data structure discussed in the problem is a binary tree.
-
DFS
- The decision leads directly to applying Depth-First Search.
Conclusion: The flowchart dictates the use of DFS for solving the problem where we need to traverse a tree to find duplicate subtrees. DFS is well-suited as it allows exploring each node and its children thoroughly which is necessary to serialize or encode subtrees for comparison.
Intuition
The core idea of finding duplicate subtrees is to use a traversal algorithm to serialize or represent each subtree in a unique way so that we can compare them directly. Serialization means converting the structure of a subtree into a string (or any other form of a data structure that can be used for comparison purposes) which, if two serialized strings are the same, ensures that the subtrees they represent are identical.
A common method is to use a post-order depth-first search (DFS) to serialize the tree. Post-order traversal means we visit a node's left child, then its right child, and finally the node itself. We define a helper function dfs
to perform the traversal and serialize the subtrees as it recursively processes the tree. For each node, we create a string that includes the node's value and the serialization of its left and right children. This way, identical subtrees yield the same serialized string.
We store the frequency of each serialized subtree in a counter (a hashmap), and if we come across the same serialization again, it means we have found a duplicate subtree. We keep a list called ans
where we add the root of a subtree when we encounter its serialization the second time (when the count becomes 2). We continue this process until every subtree has been serialized and processed, ending up with a list of roots for all duplicate subtrees.
Learn more about Tree, Depth-First Search and Binary Tree patterns.
Solution Approach
The proposed solution involves implementing a post-order depth-first search (DFS) to traverse the tree, serialize each subtree, and use a counter to detect duplicates. The steps in the implementation and the data structures involved are described below:
-
Define a helper function
dfs(root)
that will:- Take the current root of a subtree as an input.
- Base case: If the current root is
None
, return a special placeholder ('#'
) to indicate a null node. - Recur for the left child and right child, serializing both subtrees.
- Construct the serialized version of the current subtree, denoted by
v
, using the value of the current node and the representations of the left and right subtrees in a comma-separated format (e.g.,val,left_repr,right_repr
).
-
Use a
Counter
, which is a special dictionary from Python'scollections
module that keeps track of the frequency of each serialized subtree representation. -
In the main
findDuplicateSubtrees
method, define an empty listans
to store the roots of detected duplicate subtrees. -
Call the helper function
dfs(root)
to start the process from the root of the entire tree. -
Within the
dfs
function, after getting the serialized representation of the current subtree:- Increment the count for this serialized form in the
counter
. - If the count is exactly 2, it means we've found a duplicate for the first time, so we append the current root to
ans
(we only add the subtrees when their count hits 2 to ensure we don't add duplicates of duplicates).
- Increment the count for this serialized form in the
-
Once the DFS and serialization process completes, return the list
ans
which now contains the roots of all duplicate subtrees.
The choice of using a post-order DFS is strategic for a couple of reasons:
- When you process a node, its left and right subtrees have already been serialized and compared, so the comparison of the structure (shape) is naturally included.
- The construction of strings for serialization allows an accurate and efficiently comparable representation of each subtree, making it easy to use a counter to detect duplicates.
By using serialization and a hash-based counter, this approach ensures that the resulting list contains each duplicate subtree exactly once, following the problem's constraints.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's use a small binary tree to illustrate the solution approach:
Consider the following binary tree:
1 / \ 2 3 / / \ 4 2 4 / 4
In this tree, the leftmost subtree 2 -> 4
appears again as the left subtree of node 3
. We'll walk through the solution steps to find this duplicate subtree.
-
We define a helper function
dfs
which takes a node of the binary tree and serializes the subtree rooted at that node. -
Starting with the root,
dfs
performs post-order traversal. This means it will first process all nodes in the left subtree, then the right subtree, and finally, the current node. -
At each node, the
dfs
function:- Returns the string
'#'
if the node isNone
. - Calls itself recursively to get the serialized string of the left subtree.
- Calls itself recursively to get the serialized string of the right subtree.
- Combines the value of the current node and the serialized strings from step 3 in a comma-separated format. For the root node '1', assuming its left serialization is '2,#,4' and right is '3,2,#,4,#,4', the serialized string would be '1,2,#,4,3,2,#,4,#,4'.
- Returns the string
-
We use a
Counter
to track the occurrence of each serialized subtree. Initially it is empty. -
Whenever we serialize a subtree in our
dfs
function:- We add the serialization to the
Counter
. - If we come across a serialized subtree that already has a count of 1, which means this is the second time we have found this subtree, we add its root node to the answer list
ans
.
- We add the serialization to the
Following the traversal, we end up with the following serializations after dfs
function is completed:
4
(for the leaves with value 4)2,#,4
(for the subtrees rooted at the nodes with value 2)3,2,#,4,#,4
(for the subtree rooted at the node with value 3)1,2,#,4,3,2,#,4,#,4
(for the whole tree)
The 'Counter' at the end of traversal would have the serialized strings with their frequencies:
'#'
: 3 (placeholder for null, for each node without one child or more)4
: 3 (for each leaf node)2,#,4
: 2 (for the subtrees that appear twice)- The rest will have a count of 1 because they don't have duplicates.
Since 2,#,4
has a frequency of 2, we only add the root node of its first instance to the ans
list, which contains the node with the value '2' that is the left child of the root node '1'.
- At the end of traversal, the
ans
list contains the root node of the duplicate subtree found, which is just the node with value '2' in our example.
After we have finished traversing the entire tree, the solution would be the node with value '2' that appeared as the left child of root '1', because the structure 2 -> 4
is duplicated in the tree.
Solution Implementation
1from collections import Counter
2from typing import Optional, List
3
4# Definition for a binary tree node.
5class TreeNode:
6 def __init__(self, val=0, left=None, right=None):
7 self.val = val
8 self.left = left
9 self.right = right
10
11class Solution:
12 def findDuplicateSubtrees(self, root: Optional[TreeNode]) -> List[Optional[TreeNode]]:
13 """
14 Finds all duplicate subtrees in a binary tree.
15
16 Args:
17 - root: The root node of the binary tree.
18
19 Returns:
20 - A list of tree nodes where each node represents the root of a duplicate subtree.
21 """
22
23 def traverse(node):
24 """
25 Performs a post-order traversal of the tree to serialize each subtree.
26
27 Args:
28 - node: The current node being visited.
29
30 Returns:
31 - A string representing the serialized form of the subtree rooted at the current node.
32 """
33 if node is None:
34 return '#'
35 # Serialize the subtree rooted at this node
36 serialized_subtree = f'{node.val},{traverse(node.left)},{traverse(node.right)}'
37 # Count the occurrence of this serialized subtree
38 subtree_counter[serialized_subtree] += 1
39 # If this is the second time we've seen this subtree, add it to the answer
40 if subtree_counter[serialized_subtree] == 2:
41 duplicate_subtrees.append(node)
42 return serialized_subtree
43
44 # List to hold the root nodes of duplicate subtrees
45 duplicate_subtrees = []
46 # Counter to track the number of times a serialized subtree occurs
47 subtree_counter = Counter()
48 # Start the recursive traversal from the root
49 traverse(root)
50
51 return duplicate_subtrees
52
1import java.util.HashMap;
2import java.util.Map;
3import java.util.List;
4import java.util.ArrayList;
5
6// Definition for a binary tree node.
7class TreeNode {
8 int val; // The value of the node
9 TreeNode left; // Pointer to the left child
10 TreeNode right; // Pointer to the right child
11
12 // Node constructor
13 TreeNode() {}
14
15 // Node constructor with just the value
16 TreeNode(int val) { this.val = val; }
17
18 // Node constructor with value and children
19 TreeNode(int val, TreeNode left, TreeNode right) {
20 this.val = val;
21 this.left = left;
22 this.right = right;
23 }
24}
25
26class Solution {
27 // This Map will store the serialization of each subtree and count its occurrences
28 private Map<String, Integer> subtreeCounter;
29
30 // This List will store all the duplicate subtrees finded
31 private List<TreeNode> duplicateSubtrees;
32
33 // This method will find all duplicate subtrees in the binary tree with root 'root'
34 public List<TreeNode> findDuplicateSubtrees(TreeNode root) {
35 subtreeCounter = new HashMap<>();
36 duplicateSubtrees = new ArrayList<>();
37 traverseAndSerialize(root); // Deserialize the tree and find duplicates
38 return duplicateSubtrees;
39 }
40
41 // This recursive method serializes a subtree rooted at 'root' in pre-order
42 // and uses that serialization to identify duplicate subtrees
43 private String traverseAndSerialize(TreeNode root) {
44 if (root == null) {
45 return "#"; // Using # to denote a null node
46 }
47
48 // Serialize the current subtree
49 String serialization = root.val + "," + traverseAndSerialize(root.left) + "," + traverseAndSerialize(root.right);
50
51 // Update the count of the current subtree serialization in the map
52 subtreeCounter.put(serialization, subtreeCounter.getOrDefault(serialization, 0) + 1);
53
54 // If the count is 2 (meaning it was 1 before this increment), we found a duplicate
55 if (subtreeCounter.get(serialization) == 2) {
56 duplicateSubtrees.add(root);
57 }
58
59 // Return the serialization of this subtree
60 return serialization;
61 }
62}
63
1#include <unordered_map>
2#include <vector>
3#include <string>
4
5using namespace std;
6
7// Definition for a binary tree node.
8struct TreeNode {
9 int val;
10 TreeNode *left;
11 TreeNode *right;
12 TreeNode() : val(0), left(nullptr), right(nullptr) {}
13 TreeNode(int x) : val(x), left(nullptr), right(nullptr) {}
14 TreeNode(int x, TreeNode *left, TreeNode *right) : val(x), left(left), right(right) {}
15};
16
17class Solution {
18public:
19 unordered_map<string, int> subtreeCounter; // Used for counting occurrences of subtrees.
20 vector<TreeNode*> duplicateSubtrees; // Stores the roots of duplicate subtrees.
21
22 // Function to find all duplicate subtrees in a binary tree.
23 vector<TreeNode*> findDuplicateSubtrees(TreeNode* root) {
24 traverseAndSerialize(root);
25 return duplicateSubtrees;
26 }
27
28 // Helper function to perform DFS traversal and serialize each subtree.
29 string traverseAndSerialize(TreeNode* node) {
30 if (!node) return "#"; // Use "#" to represent null pointers.
31
32 // Serialize the current subtree rooted at node.
33 string serialization = to_string(node->val) + "," + traverseAndSerialize(node->left) + "," + traverseAndSerialize(node->right);
34
35 // Increment the count for this serialized subtree.
36 ++subtreeCounter[serialization];
37
38 // If this is the second time we've seen this subtree, add it to the answer.
39 if (subtreeCounter[serialization] == 2) {
40 duplicateSubtrees.push_back(node);
41 }
42
43 // Return the serialization of this subtree.
44 return serialization;
45 }
46};
47
1// The TreeNode class defines the structure of a node in the binary tree.
2class TreeNode {
3 val: number;
4 left: TreeNode | null;
5 right: TreeNode | null;
6 constructor(val?: number, left?: TreeNode | null, right?: TreeNode | null) {
7 this.val = val === undefined ? 0 : val;
8 this.left = left === undefined ? null : left;
9 this.right = right === undefined ? null : right;
10 }
11}
12
13/**
14 * Finds all duplicate subtrees in a binary tree.
15 * @param root - The root node of the binary tree.
16 * @returns An array of TreeNode, each representing a root of a duplicate subtree.
17 */
18function findDuplicateSubtrees(root: TreeNode | null): Array<TreeNode | null> {
19 // Mapping of tree serialization to its frequency.
20 const serializationCounts = new Map<string, number>();
21 // Results array to store the root nodes of duplicate subtrees.
22 const result: Array<TreeNode | null> = [];
23
24 /**
25 * A depth-first search function that serializes the tree as it traverses,
26 * and records the frequency of each serialization.
27 * @param node - The current node being processed.
28 * @returns - A string that uniquely represents the subtree rooted at the current node.
29 */
30 function dfs(node: TreeNode | null): string {
31 // Base case for a null node.
32 if (node == null) {
33 return '#';
34 }
35 // Serialization for non-null node: value, left subtree serialization, right subtree serialization.
36 const serialization = `${node.val},${dfs(node.left)},${dfs(node.right)}`;
37 // Update frequency count or set to 1 if seeing this serialization for the first time.
38 serializationCounts.set(serialization, (serializationCounts.get(serialization) ?? 0) + 1);
39 // If the serialization count is exactly 2, add the node to the results.
40 if (serializationCounts.get(serialization) === 2) {
41 result.push(node);
42 }
43 return serialization;
44 }
45
46 // Invoke the dfs function starting from the root to fill in the results.
47 dfs(root);
48
49 // Return the filled results array.
50 return result;
51}
52
Time and Space Complexity
Time Complexity:
The main operation in the provided code is a Depth-First Search (DFS) through the binary tree which involves visiting every node exactly once and constructing a serialization string for each node by recursively obtaining the serialization of its left and right children. The time complexity for visiting all nodes in the tree is O(N)
, where N
is the number of nodes in the tree. However, constructing the serialization string for a node involves concatenation operations, which in Python have a time complexity that depends on the length of the strings being concatenated. In the worst case, these strings can get as large as O(N)
when the tree degenerates into a linked list, with N being the number of nodes. This concatenation happens for every node, making the time complexity O(N^2)
in the worst case.
Additionally, the serialization string for each node is stored in a hash table (counter), and each insertion operation can be considered O(1)
on average, though in the worst case, due to hash collisions, it could be O(N)
. However, this is rare and typically the average case is expected.
Therefore, the average time complexity of the DFS is O(N^2)
due to the serialization concatenation, but it could potentially degrade to O(N^3)
if every hash insert operation takes O(N)
time, which is a very pessimistic assumption and unlikely in practice.
Space Complexity:
The space complexity of the algorithm comes from three main sources:
-
The recursion stack used by DFS, which goes as deep as the height of the tree. In the worst case (degenerate tree), this height is
O(N)
, contributingO(N)
to the space complexity. -
The counter hash table which stores the serialization strings for nodes. In the worst case, every node has a unique serialization, so the space used by the counter can grow up to
O(N^2)
since every serialization can be of lengthO(N)
. -
The list of answers (ans), which holds a reference to each node that has a duplicate subtree. Since each node could potentially be a part of this list just once, this contributes
O(N)
to space complexity.
Considering these parts, the overall space complexity of the solution is O(N^2)
due to the storage required for the serialization strings in the hash table.
Learn more about how to find time and space complexity quickly using problem constraints.
How many ways can you arrange the three letters A, B and C?
Recommended Readings
Everything About Trees A tree is a type of graph data structure composed of nodes and edges Its main properties are It is acyclic doesn't contain any cycles There exists a path from the root to any node Has N 1 edges where N is the number of nodes in the tree and
https algomonster s3 us east 2 amazonaws com cover_photos dfs svg Depth First Search Prereqs Recursion Review problems recursion_intro Trees problems tree_intro With a solid understanding of recursion under our belts we are now ready to tackle one of the most useful techniques in coding interviews Depth First Search DFS
Binary Tree Min Depth Prereq BFS on Tree problems bfs_intro Given a binary tree find the depth of the shallowest leaf node https algomonster s3 us east 2 amazonaws com binary_tree_min_depth png Explanation We can solve this problem with either DFS or BFS With DFS we traverse the whole tree looking for leaf nodes and record and update the minimum depth as we go With BFS though since we search level by level we are guaranteed to find the shallowest leaf node
Want a Structured Path to Master System Design Too? Don’t Miss This!