Leetcode 652. Find Duplicate Subtrees

Problem Explanation

In this problem, we are given a binary tree and we need to find all duplicate subtrees in it. A duplicate subtree is defined as two subtrees that have the same structure with same node values.

Let's go through an example to understand the problem better:

Suppose we have the following binary tree:

1    1
2   / \
3  2   3
4 /   / \
54   2   4
6   /
7  4

In this tree, we have two duplicate subtrees:

A.

1  2
2 /
34

and B:

14

Therefore, you need to return above trees' root in the form of a list.

Approach to Solution

For this problem, we can use depth-first search (DFS) to traverse the entire tree, and use a map to record all the subtrees we have encountered. The key of the map is the structure of the subtree and the value is the number of times we have encountered the subtree. If a subtree appears more than once, it means this subtree is a duplicate subtree.

To convert a subtree into a string, we can use pre-order traversal (root-left-right), and record the value of each node, as well as a special character "#" to indicate the end of the left or right subtree. When two subtrees have the same string representation, they are the same subtree.

Pseudo Algorithm

  1. Initialize an empty list ans and an empty map count.
  2. Define a function encode to convert a subtree into a string based on its structure.
  3. Traverse the tree in pre-order, and record the structure of each subtree in the map.
  4. If the count of a subtree is 2, add the root of the subtree to the list ans.

Algorithm Steps

Let's explain the algorithm with an example for the tree mentioned above:

Step 1: Initialize an empty list ans and an empty map count.

Step 2: Start the pre-order traversal from the root of the tree.

Step 3: Convert the subtree rooted at node 4 (left subtree of node 2) into a string "4##". Increase the count of "4##" by 1.

Step 4: Convert the subtree rooted at node 2 into a string "2#4###". Increase the count of "2#4###" by 1.

Step 5: Convert the subtree rooted at node 4 (right subtree of node 3) into a string "4##". Increase the count of "4##" by 2, and add node 4 to the list ans.

Step 6: Convert the subtree rooted at node 2 (left subtree of node 3) into a string "2#4###". Increase the count of "2#4###" by 2, and add node 2 to the list ans.

Step 7: Convert the subtree rooted at node 3 into a string "3#2#4###4##". Increase the count of "3#2#4###4##" by 1.

Step 8: Convert the entire tree into a string "1#2#4####3#2#4###4##". Increase the count of "1#2#4####3#2#4###4##" by 1.

Step 9: Return the list ans, which contains the roots of the duplicate subtrees.

Time complexity:

Our algorithm has the time complexity of O(n^2) where n is the number of nodes in the tree. This is because in the worst case scenario, we have to generate and store the string for all nodes, and each string generation can take O(n) time.

Space complexity:

The space complexity is also O(n^2) because in the worst case, we have to store the string for all nodes, and each string can take O(n) space.

Now, let's implement this algorithm in different languages:

Python

1
2python
3class Solution:
4    def findDuplicateSubtrees(self, root):
5        count = collections.Counter()
6        ans = []
7        self.collect(root, count, ans)
8        return ans
9    
10    def collect(self, node, count, ans):
11        if not node: return "#"
12        serial = "{},{},{}".format(node.val, self.collect(node.left, count, ans), self.collect(node.right, count, ans))
13        count[serial] += 1
14        if count[serial] == 2:
15            ans.append(node)
16        return serial

Java

1
2java
3class Solution {
4    public List<TreeNode> findDuplicateSubtrees(TreeNode root) {
5        List<TreeNode> res = new ArrayList<>();
6        postorder(root, new HashMap<>(), res);
7        return res;
8    }
9
10    public String postorder(TreeNode cur, Map<String, Integer> map, List<TreeNode> res) {
11        if (cur == null) return "#";  
12        String serial = cur.val + "," + postorder(cur.left, map, res) + "," + postorder(cur.right, map, res);
13        if (map.getOrDefault(serial, 0) == 1) res.add(cur);
14        map.put(serial, map.getOrDefault(serial, 0) + 1);
15        return serial;
16    }
17}

JavaScript

1
2javascript
3var findDuplicateSubtrees = function(root) {
4    let count = {}
5    let ans = []
6    collect(root)
7    return ans
8
9    function collect(node){
10        if(!node) return '#'
11        let serial = node.val + ',' + collect(node.left) + ',' + collect(node.right)
12        count[serial] = (count[serial] || 0) + 1
13        if(count[serial] == 2) ans.push(node)
14        return serial
15    }
16};

C++

1
2c++
3class Solution {
4public:
5    vector<TreeNode*> findDuplicateSubtrees(TreeNode* root) {
6        unordered_map<string, int> count;
7        vector<TreeNode*> ans;
8        getString(root, count, ans);
9        return ans;
10    }
11    
12    string getString(TreeNode* root, unordered_map<string, int>& count, vector<TreeNode*>& ans) {
13        if(!root) return "#";
14        string str = to_string(root->val) + ',' + getString(root->left, count, ans) + ',' + getString(root->right, count, ans);
15        if(++count[str] == 2)
16            ans.push_back(root);
17        return str;
18    }
19};

C#

1
2csharp
3public class Solution {
4    private IDictionary<string, int> trees;
5    private IList<TreeNode> ans;
6    
7    public IList<TreeNode> FindDuplicateSubtrees(TreeNode root) {
8        trees = new Dictionary<string, int>();
9        ans = new List<TreeNode>();
10        Lookup(root);
11        return ans;
12    }
13
14    public string Lookup(TreeNode node) {
15        if (node == null) return "#";
16        string serial = node.val + "," + Lookup(node.left) + "," + Lookup(node.right);
17        if (!trees.ContainsKey(serial)) 
18            trees.Add(serial, 1);
19        else 
20            trees[serial]++;
21        
22        if (trees[serial] == 2)
23            ans.Add(node);
24        
25        return serial;
26    }
27}

Final Thoughts and Other Considerations

As seen from above codes, the main logic of the solution is similar among all of these programming languages. The key parts include defining a function to convert each subtree into a string and traversing the binary tree to count the number of appearances of each subtree. To do this, we've utilized hash maps in each solution, Python uses Counter, Java has HashMap, JavaScript uses {}, C++ employs unordered_map, and C# uses Dictionary. These all serve a similar purpose and functionality in their respective languages.

While this solution solves the problem, the time and space complexity are both high (O(n^2)). This is mainly due to the fact that we’re storing strings converted from every subtree into a hash map, which in the worst case leads to storage of n strings each of length n for memory and requires creating all strings for time.

However, this is acceptable as creating a unique string representation for a tree/subtree is essentially a widely used method in cases like these. Moreover, storing all strings is necessary in order to determine whether a tree is a duplicate. So in this case, it’s a necessary sacrifice to take a hit on efficiency.

If you're looking for a more efficient solution (e.g., you need to handle a very large binary tree), one possible approach is to design a unique and more compact representation of each subtree. For instance, instead of using preorder traversal to create the string, you can create a unique ID for each subtree structure and only store the IDs in the map, which will make it a bit more space-efficient. However, designing such a method can be quite challenging and is definitely beyond the scope of this problem. The existing solution is already quite optimal for this specific problem.

In conclusion, this problem is a great exercise for learning how to effectively and efficiently traverse a binary tree, using a hash map to count frequency, generate string representations for tree structures and also how to identify and handle duplications in a tree structure. This entire solution is a great demonstration of the importance of a combination of tree traversal, hashing, and string manipulation strategies in solving complex data structure problems.


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫