1948. Delete Duplicate Folders in System
Problem Description
In this problem, we have a file system with many duplicate folders due to a bug. We are given a 2D array paths
, where paths[i]
is an array representing an absolute path to the i
th folder in the file system.
Two folders (not necessarily on the same level) are identical if they contain the same non-empty set of identical subfolders and underlying subfolder structure. The folders do not need to be at the root level to be identical. If two or more folders are identical, then we need to mark the folders as well as all their subfolders.
The file system will delete all the marked folders and their subfolders once and then return the 2D array ans
containing the remaining paths after the deletion. The paths can be returned in any order.
Example:
Let's walk through an example to better understand the problem.
Given the input paths
:
1 2 3[ 4 ["a"], 5 ["c"], 6 ["a", "x"], 7 ["c", "x"], 8 ["b"], 9 ["c", "x", "y"], 10 ["a", "x", "y"] 11]
The file system looks like:
1 2 3- a 4 - x 5 - y 6- b 7- c 8 - x 9 - y
As we can see, folders "a" and "c" have the same structure and same subfolders. Hence, we need to mark and delete them along with their subfolders.
After deleting, the remaining file system looks like:
1 2 3- b
So, the output ans
should be: [["b"]]
.
Solution Explanation
To solve this problem, we can utilize a Trie data structure to represent the folder structure. Each Trie node will have a children
map and a deleted
flag. The key in the children
map will be the folder name (a string) and the value will be the child Trie node representing the subfolder.
We can follow these steps to remove the duplicate folders:
- Create and populate the Trie based on the given input
paths
. - Traverse the Trie recursively and build a unique representation of the subtree rooted at each Trie node. We can use the subtree string representation to maintain a map pointing to the Trie nodes with the same subtree strings.
- Check the map created in step 2. If any subtree string has multiple Trie nodes, those nodes (and their subfolders) are duplicates, so we mark them as
deleted
. - Starting from the root, traverse the Trie again and construct the remaining paths by ignoring the marked Trie nodes.
Let's discuss the solution with an example.
1 2 3paths = [ 4 ["a"], 5 ["c"], 6 ["a", "x"], 7 ["c", "x"], 8 ["b"], 9 ["c", "x", "y"], 10 ["a", "x", "y"] 11]
First, we populate the Trie based on the paths (step 1):
1 2 3- root 4 - a 5 - x 6 - y 7 - b 8 - c 9 - x 10 - y
The traversal of the Trie to build subtree string representations (step 2) results in:
1 2 3root => "((a(x(y)))(b)(c(x(y))))" 4a => "(x(y))" 5x => "(y)" 6y => "()" 7b => "()" 8c => "(x(y))"
The subtree string to Trie nodes map (subtreeToNodes
in the code) will have:
1 2 3{ 4 "()": [y, b], 5 "(y)": [x], 6 "(x(y))": [a, c], 7 "((a(x(y)))(b)(c(x(y))))": [root] 8}
As we can see, the subtree string "(x(y))" has two Trie nodes, so they are duplicates (step 3). Mark the nodes "a" and "c" as deleted:
1 2 3- root (not deleted) 4 - a (deleted) 5 - x (deleted) 6 - y (deleted) 7 - b (not deleted) 8 - c (deleted) 9 - x (deleted) 10 - y (deleted)
Now, traverse the Trie again to construct the remaining folder paths (step 4):
1 2 3["b"]
Here is the final C++ implementation of the solution provided:
1
2cpp
3struct TrieNode {
4 unordered_map<string, shared_ptr<TrieNode>> children;
5 bool deleted = false;
6};
7
8class Solution {
9 public:
10 vector<vector<string>> deleteDuplicateFolder(vector<vector<string>>& paths) {
11 vector<vector<string>> ans;
12 vector<string> path;
13 unordered_map<string, vector<shared_ptr<TrieNode>>> subtreeToNodes;
14
15 sort(begin(paths), end(paths));
16
17 for (const vector<string>& path : paths) {
18 shared_ptr<TrieNode> node = root;
19 for (const string& s : path) {
20 if (!node->children.count(s))
21 node->children[s] = make_shared<TrieNode>();
22 node = node->children[s];
23 }
24 }
25
26 buildSubtreeToRoots(root, subtreeToNodes);
27
28 for (const auto& [_, nodes] : subtreeToNodes)
29 if (nodes.size() > 1)
30 for (shared_ptr<TrieNode> node : nodes)
31 node->deleted = true;
32
33 constructPath(root, path, ans);
34 return ans;
35 }
36
37 private:
38 shared_ptr<TrieNode> root = make_shared<TrieNode>();
39
40 string buildSubtreeToRoots(
41 shared_ptr<TrieNode> node,
42 unordered_map<string, vector<shared_ptr<TrieNode>>>& subtreeToNodes) {
43 string subtree = "(";
44 for (const auto& [s, child] : node->children)
45 subtree += s + buildSubtreeToRoots(child, subtreeToNodes);
46 subtree += ")";
47 if (subtree != "()")
48 subtreeToNodes[subtree].push_back(node);
49 return subtree;
50 }
51
52 void constructPath(shared_ptr<TrieNode> node, vector<string>& path,
53 vector<vector<string>>& ans) {
54 for (const auto& [s, child] : node->children)
55 if (!child->deleted) {
56 path.push_back(s);
57 constructPath(child, path, ans);
58 path.pop_back();
59 }
60 if (!path.empty())
61 ans.push_back(path);
62 }
63};
64
This solution's time complexity will be O(N * L) where N is the number of paths and L is the length of the strings involved. The space complexity will be O(N * L) as well due to the Trie data structure.## Python Solution
Now, let's implement the same solution in Python:
1 2python 3from collections import defaultdict 4 5class TrieNode: 6 def __init__(self): 7 self.children = defaultdict(TrieNode) 8 self.deleted = False 9 10class Solution: 11 def deleteDuplicateFolder(self, paths: List[List[str]]) -> List[List[str]]: 12 ans = [] 13 path = [] 14 subtree_to_nodes = defaultdict(list) 15 16 paths.sort() 17 18 root = TrieNode() 19 for p in paths: 20 node = root 21 for s in p: 22 node = node.children[s] 23 24 self.build_subtree_to_nodes(root, subtree_to_nodes) 25 26 for nodes in subtree_to_nodes.values(): 27 if len(nodes) > 1: 28 for node in nodes: 29 node.deleted = True 30 31 self.construct_path(root, path, ans) 32 return ans 33 34 def build_subtree_to_nodes(self, node: TrieNode, subtree_to_nodes: dict) -> str: 35 subtree = "(" 36 for s, child in node.children.items(): 37 subtree += s + self.build_subtree_to_nodes(child, subtree_to_nodes) 38 subtree += ")" 39 if subtree != "()": 40 subtree_to_nodes[subtree].append(node) 41 return subtree 42 43 def construct_path(self, node: TrieNode, path: List[str], ans: List[List[str]]): 44 for s, child in node.children.items(): 45 if not child.deleted: 46 path.append(s) 47 self.construct_path(child, path, ans) 48 path.pop() 49 if path: 50 ans.append(path[:])
This Python solution has the same time complexity O(N * L) and space complexity O(N * L).
JavaScript Solution
Finally, let's implement the solution in JavaScript:
1 2javascript 3class TrieNode { 4 constructor() { 5 this.children = new Map(); 6 this.deleted = false; 7 } 8} 9 10class Solution { 11 deleteDuplicateFolder(paths) { 12 const ans = []; 13 const path = []; 14 const subtreeToNodes = new Map(); 15 16 paths.sort(); 17 18 const root = new TrieNode(); 19 for (const p of paths) { 20 let node = root; 21 for (const s of p) { 22 if (!node.children.has(s)) 23 node.children.set(s, new TrieNode()); 24 node = node.children.get(s); 25 } 26 } 27 28 this.buildSubtreeToNodes(root, subtreeToNodes); 29 30 for (const nodes of subtreeToNodes.values()) { 31 if (nodes.length > 1) { 32 for (const node of nodes) { 33 node.deleted = true; 34 } 35 } 36 } 37 38 this.constructPath(root, path, ans); 39 return ans; 40 } 41 42 buildSubtreeToNodes(node, subtreeToNodes) { 43 let subtree = "("; 44 for (const [s, child] of node.children.entries()) { 45 subtree += s + this.buildSubtreeToNodes(child, subtreeToNodes); 46 } 47 subtree += ")"; 48 if (subtree !== "()") { 49 if (!subtreeToNodes.has(subtree)) 50 subtreeToNodes.set(subtree, []); 51 subtreeToNodes.get(subtree).push(node); 52 } 53 return subtree; 54 } 55 56 constructPath(node, path, ans) { 57 for (const [s, child] of node.children.entries()) { 58 if (!child.deleted) { 59 path.push(s); 60 this.constructPath(child, path, ans); 61 path.pop(); 62 } 63 } 64 if (path.length > 0) 65 ans.push(Array.from(path)); 66 } 67}
This JavaScript solution also has the same time complexity O(N * L) and space complexity O(N * L).
What is the best way of checking if an element exists in a sorted array once in terms of time complexity? Select the best that applies.
Which of the following array represent a max heap?
Solution Implementation
What is the running time of the following code?
1int sqrt(int n) { 2 for (int guess = 1; guess * guess <= n; guess++) { 3 if (guess * guess == n) { 4 return guess; 5 } 6 } 7 return -1; 8}
What is the best way of checking if an element exists in a sorted array once in terms of time complexity? Select the best that applies.
Recommended Readings
Top Patterns to Conquer the Technical Coding Interview Should the written word bore you fear not A delightful video alternative awaits iframe width 560 height 315 src https www youtube com embed LW8Io6IPYHw title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Got a question? Ask the Teaching Assistant anything you don't understand.
Still not clear? Ask in the Forum, Discord or Submit the part you don't understand to our editors.