1948. Delete Duplicate Folders in System


Problem Description

In this problem, we have a file system with many duplicate folders due to a bug. We are given a 2D array paths, where paths[i] is an array representing an absolute path to the ith folder in the file system.

Two folders (not necessarily on the same level) are identical if they contain the same non-empty set of identical subfolders and underlying subfolder structure. The folders do not need to be at the root level to be identical. If two or more folders are identical, then we need to mark the folders as well as all their subfolders.

The file system will delete all the marked folders and their subfolders once and then return the 2D array ans containing the remaining paths after the deletion. The paths can be returned in any order.

Example:

Let's walk through an example to better understand the problem.

Given the input paths:

1
2
3[
4  ["a"],
5  ["c"],
6  ["a", "x"],
7  ["c", "x"],
8  ["b"],
9  ["c", "x", "y"],
10  ["a", "x", "y"]
11]

The file system looks like:

1
2
3- a
4   - x
5      - y
6- b
7- c
8   - x
9      - y

As we can see, folders "a" and "c" have the same structure and same subfolders. Hence, we need to mark and delete them along with their subfolders.

After deleting, the remaining file system looks like:

1
2
3- b

So, the output ans should be: [["b"]].

Solution Explanation

To solve this problem, we can utilize a Trie data structure to represent the folder structure. Each Trie node will have a children map and a deleted flag. The key in the children map will be the folder name (a string) and the value will be the child Trie node representing the subfolder.

We can follow these steps to remove the duplicate folders:

  1. Create and populate the Trie based on the given input paths.
  2. Traverse the Trie recursively and build a unique representation of the subtree rooted at each Trie node. We can use the subtree string representation to maintain a map pointing to the Trie nodes with the same subtree strings.
  3. Check the map created in step 2. If any subtree string has multiple Trie nodes, those nodes (and their subfolders) are duplicates, so we mark them as deleted.
  4. Starting from the root, traverse the Trie again and construct the remaining paths by ignoring the marked Trie nodes.

Let's discuss the solution with an example.

1
2
3paths = [
4  ["a"],
5  ["c"],
6  ["a", "x"],
7  ["c", "x"],
8  ["b"],
9  ["c", "x", "y"],
10  ["a", "x", "y"]
11]

First, we populate the Trie based on the paths (step 1):

1
2
3- root
4   - a
5      - x
6         - y
7   - b
8   - c
9      - x
10         - y

The traversal of the Trie to build subtree string representations (step 2) results in:

1
2
3root => "((a(x(y)))(b)(c(x(y))))"
4a    => "(x(y))"
5x    => "(y)"
6y    => "()"
7b    => "()"
8c    => "(x(y))"

The subtree string to Trie nodes map (subtreeToNodes in the code) will have:

1
2
3{
4  "()": [y, b],
5  "(y)": [x],
6  "(x(y))": [a, c],
7  "((a(x(y)))(b)(c(x(y))))": [root]
8}

As we can see, the subtree string "(x(y))" has two Trie nodes, so they are duplicates (step 3). Mark the nodes "a" and "c" as deleted:

1
2
3- root (not deleted)
4   - a (deleted)
5      - x (deleted)
6         - y (deleted)
7   - b (not deleted)
8   - c (deleted)
9      - x (deleted)
10         - y (deleted)

Now, traverse the Trie again to construct the remaining folder paths (step 4):

1
2
3["b"]

Here is the final C++ implementation of the solution provided:

1
2cpp
3struct TrieNode {
4  unordered_map<string, shared_ptr<TrieNode>> children;
5  bool deleted = false;
6};
7
8class Solution {
9 public:
10  vector<vector<string>> deleteDuplicateFolder(vector<vector<string>>& paths) {
11    vector<vector<string>> ans;
12    vector<string> path;
13    unordered_map<string, vector<shared_ptr<TrieNode>>> subtreeToNodes;
14
15    sort(begin(paths), end(paths));
16
17    for (const vector<string>& path : paths) {
18      shared_ptr<TrieNode> node = root;
19      for (const string& s : path) {
20        if (!node->children.count(s))
21          node->children[s] = make_shared<TrieNode>();
22        node = node->children[s];
23      }
24    }
25
26    buildSubtreeToRoots(root, subtreeToNodes);
27
28    for (const auto& [_, nodes] : subtreeToNodes)
29      if (nodes.size() > 1)
30        for (shared_ptr<TrieNode> node : nodes)
31          node->deleted = true;
32
33    constructPath(root, path, ans);
34    return ans;
35  }
36
37 private:
38  shared_ptr<TrieNode> root = make_shared<TrieNode>();
39
40  string buildSubtreeToRoots(
41      shared_ptr<TrieNode> node,
42      unordered_map<string, vector<shared_ptr<TrieNode>>>& subtreeToNodes) {
43    string subtree = "(";
44    for (const auto& [s, child] : node->children)
45      subtree += s + buildSubtreeToRoots(child, subtreeToNodes);
46    subtree += ")";
47    if (subtree != "()")
48      subtreeToNodes[subtree].push_back(node);
49    return subtree;
50  }
51
52  void constructPath(shared_ptr<TrieNode> node, vector<string>& path,
53                     vector<vector<string>>& ans) {
54    for (const auto& [s, child] : node->children)
55      if (!child->deleted) {
56        path.push_back(s);
57        constructPath(child, path, ans);
58        path.pop_back();
59      }
60    if (!path.empty())
61      ans.push_back(path);
62  }
63};
64

This solution's time complexity will be O(N * L) where N is the number of paths and L is the length of the strings involved. The space complexity will be O(N * L) as well due to the Trie data structure.## Python Solution

Now, let's implement the same solution in Python:

1
2python
3from collections import defaultdict
4
5class TrieNode:
6    def __init__(self):
7        self.children = defaultdict(TrieNode)
8        self.deleted = False
9
10class Solution:
11    def deleteDuplicateFolder(self, paths: List[List[str]]) -> List[List[str]]:
12        ans = []
13        path = []
14        subtree_to_nodes = defaultdict(list)
15        
16        paths.sort()
17        
18        root = TrieNode()
19        for p in paths:
20            node = root
21            for s in p:
22                node = node.children[s]
23        
24        self.build_subtree_to_nodes(root, subtree_to_nodes)
25        
26        for nodes in subtree_to_nodes.values():
27            if len(nodes) > 1:
28                for node in nodes:
29                    node.deleted = True
30    
31        self.construct_path(root, path, ans)
32        return ans
33    
34    def build_subtree_to_nodes(self, node: TrieNode, subtree_to_nodes: dict) -> str:
35        subtree = "("
36        for s, child in node.children.items():
37            subtree += s + self.build_subtree_to_nodes(child, subtree_to_nodes)
38        subtree += ")"
39        if subtree != "()":
40            subtree_to_nodes[subtree].append(node)
41        return subtree
42
43    def construct_path(self, node: TrieNode, path: List[str], ans: List[List[str]]):
44        for s, child in node.children.items():
45            if not child.deleted:
46                path.append(s)
47                self.construct_path(child, path, ans)
48                path.pop()
49        if path:
50            ans.append(path[:])

This Python solution has the same time complexity O(N * L) and space complexity O(N * L).

JavaScript Solution

Finally, let's implement the solution in JavaScript:

1
2javascript
3class TrieNode {
4    constructor() {
5        this.children = new Map();
6        this.deleted = false;
7    }
8}
9
10class Solution {
11    deleteDuplicateFolder(paths) {
12        const ans = [];
13        const path = [];
14        const subtreeToNodes = new Map();
15        
16        paths.sort();
17        
18        const root = new TrieNode();
19        for (const p of paths) {
20            let node = root;
21            for (const s of p) {
22                if (!node.children.has(s))
23                    node.children.set(s, new TrieNode());
24                node = node.children.get(s);
25            }
26        }
27        
28        this.buildSubtreeToNodes(root, subtreeToNodes);
29        
30        for (const nodes of subtreeToNodes.values()) {
31            if (nodes.length > 1) {
32                for (const node of nodes) {
33                    node.deleted = true;
34                }
35            }
36        }
37    
38        this.constructPath(root, path, ans);
39        return ans;
40    }
41
42    buildSubtreeToNodes(node, subtreeToNodes) {
43        let subtree = "(";
44        for (const [s, child] of node.children.entries()) {
45            subtree += s + this.buildSubtreeToNodes(child, subtreeToNodes);
46        }
47        subtree += ")";
48        if (subtree !== "()") {
49            if (!subtreeToNodes.has(subtree))
50                subtreeToNodes.set(subtree, []);
51            subtreeToNodes.get(subtree).push(node);
52        }
53        return subtree;
54    }
55
56    constructPath(node, path, ans) {
57        for (const [s, child] of node.children.entries()) {
58            if (!child.deleted) {
59                path.push(s);
60                this.constructPath(child, path, ans);
61                path.pop();
62            }
63        }
64        if (path.length > 0)
65            ans.push(Array.from(path));
66    }
67}

This JavaScript solution also has the same time complexity O(N * L) and space complexity O(N * L).

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

What is the best way of checking if an element exists in a sorted array once in terms of time complexity? Select the best that applies.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which of the following array represent a max heap?

Solution Implementation

Not Sure What to Study? Take the 2-min Quiz:

What is the running time of the following code?

1int sqrt(int n) {
2  for (int guess = 1; guess * guess <= n; guess++) {
3    if (guess * guess == n) {
4      return guess;
5    }
6  }
7  return -1;
8}
Fast Track Your Learning with Our Quick Skills Quiz:

What is the best way of checking if an element exists in a sorted array once in terms of time complexity? Select the best that applies.


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫