1233. Remove Sub-Folders from the Filesystem


Problem Description

In this problem, we are given a list of folder paths in a file system. Each folder path is represented as a string, which follows a specific format - a concatenation of lowercase English letters, where each directory in the path is preceded by a slash ('/'). For example, "/leetcode" and "/leetcode/problems" are valid folder paths.

Our goal is to eliminate any folder paths that are sub-folders of other folder paths. A folder A is considered a sub-folder of folder B if the path of A starts with the entire path of B followed directly by a slash and additional characters.

For instance, given a folder path "/leetcode", any folder that has a path starting with "/leetcode/", such as "/leetcode/problems", is considered a sub-folder of "/leetcode".

We have to identify all the "unique" folders after removing these sub-folders and can return the remaining paths in any order.

Intuition

The provided solution works by sorting the list of folders first. Sorting is key because it ensures that any potential sub-folder paths come immediately after the parent folder path. This is due to lexicographical ordering, where a string that starts with another string will come right after it.

After sorting, we iterate through the list starting with the second folder path (as the first folder cannot be a sub-folder of any other). With each folder path, we compare it to the most recently added folder in our answer list, which is initially the first folder from the sorted list.

During this iteration, if the current path does not start with the path of the last added folder (which would make it a sub-folder), it gets added to the answer list, effectively keeping it as a unique folder.

This is achieved with a check that ensures that the last folder is not a prefix of the current folder followed immediately by a slash (to guarantee it's a sub-folder and not just a folder with a similar prefix).

In this way, we build a collection of folder paths, removing any sub-folders and retaining only the "unique" ones. The final result is a list of the cleaned folder paths.

Learn more about Depth-First Search and Trie patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

What's the output of running the following function using the following tree as input?

1def serialize(root):
2    res = []
3    def dfs(root):
4        if not root:
5            res.append('x')
6            return
7        res.append(root.val)
8        dfs(root.left)
9        dfs(root.right)
10    dfs(root)
11    return ' '.join(res)
12
1import java.util.StringJoiner;
2
3public static String serialize(Node root) {
4    StringJoiner res = new StringJoiner(" ");
5    serializeDFS(root, res);
6    return res.toString();
7}
8
9private static void serializeDFS(Node root, StringJoiner result) {
10    if (root == null) {
11        result.add("x");
12        return;
13    }
14    result.add(Integer.toString(root.val));
15    serializeDFS(root.left, result);
16    serializeDFS(root.right, result);
17}
18
1function serialize(root) {
2    let res = [];
3    serialize_dfs(root, res);
4    return res.join(" ");
5}
6
7function serialize_dfs(root, res) {
8    if (!root) {
9        res.push("x");
10        return;
11    }
12    res.push(root.val);
13    serialize_dfs(root.left, res);
14    serialize_dfs(root.right, res);
15}
16

Solution Approach

The solution begins with sorting the list of folders. This is done using Python's built-in sort() method, which arranges the folder paths in lexicographical order. The significance of sorting here is that if one folder is a sub-folder of another, it will appear immediately after its parent folder in the sorted list.

1folder.sort()

After sorting the folders, the algorithm initializes an ans list with the first folder path in it, since there's nothing to compare it with, it can't be a sub-folder of any sort:

1ans = [folder[0]]

Next, we loop over the remaining folder paths starting from the second item. The for loop determines if the current folder (f) is a sub-folder of the most recently added folder in the ans list. To check this, we compare the lengths of the current folder (f) and the last added folder (ans[-1]), and we also ensure that ans[-1] is the prefix of f followed by a '/':

1for f in folder[1:]:
2    m, n = len(ans[-1]), len(f)

The condition inside the loop checks two things:

  1. If m is not less than n, f cannot be a sub-folder of ans[-1] because a sub-folder must have a longer path than its parent folder.
  2. If ans[-1] is not a prefix of f or if the character in f that immediately follows the prefix is not the slash /, f is not a sub-folder.

If these conditions are not met, it means f is a unique folder (not a sub-folder), so it gets appended to ans:

1if m >= n or not (ans[-1] == f[:m] and f[m] == '/'):
2    ans.append(f)

At the end of the loop, ans contains all unique folders, excluding any sub-folders, following the criteria. We return ans as the final result:

1return ans

In terms of data structures and patterns, the solution mainly relies on list operations and basic string manipulation. The sorting operation helps utilize the power of lexicographical order to organize and efficiently identify sub-folders based on their path prefixes.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Consider the classic dynamic programming of longest increasing subsequence:

Find the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order.

For example, the length of LIS for [50, 3, 10, 7, 40, 80] is 4 and LIS is [3, 7, 40, 80].

What is the recurrence relation?

Example Walkthrough

Let's say we have the following list of folder paths:

["/a", "/a/b", "/c/d", "/c/d/e", "/c/f"]

According to the problem statement, we need to remove any paths that are sub-folders of any other path. We'll illustrate the solution approach using this list.

Firstly, we sort the list of folders:

Before sorting: ["/a", "/a/b", "/c/d", "/c/d/e", "/c/f"]

After sorting: ["/a", "/a/b", "/c/d", "/c/d/e", "/c/f"]

In this case, the list remains the same after sorting because it was already in lexicographical order. But the sorting step is crucial because if our list had been unordered, sorting would have arranged any sub-folders immediately after their parent folders.

We initialize the ans list with the first folder:

ans = ["/a"]

We then loop over the remaining folders in the list starting from the second one.

  • We compare "/a/b" with the last element in ans, which is "/a". Since "/a/b" starts with "/a/" and has additional characters after "/a", it is a sub-folder, and we do not add it to ans.
  • Next, we compare "/c/d" with the last element in ans which is still "/a". It does not start with "/a/" so we add it to ans.

ans = ["/a", "/c/d"]

  • We move to "/c/d/e" and compare it with the last element in ans, which is now "/c/d". Since "/c/d/e" starts with "/c/d/", it is a sub-folder, and we do not add it.
  • Lastly, we compare "/c/f" with "/c/d". It does not start with "/c/d/", so we add it to ans.

ans = ["/a", "/c/d", "/c/f"]

Now, the loop has ended, and the ans list contains all the unique folder paths, excluding any sub-folders that were removed:

Result: ["/a", "/c/d", "/c/f"]

Finally, we return this list as the solution. The whole process demonstrates how the combination of sorting and prefix comparison efficiently filters out the sub-folders.

Solution Implementation

1class Solution:
2    def removeSubfolders(self, folders: List[str]) -> List[str]:
3        # Sort the list of folder paths to ensure that parent directories are before their subfolders
4        folders.sort()
5      
6        # Initialize the result list with the first folder since it is guaranteed not to be a subfolder
7        result = [folders[0]]
8      
9        # Iterate through the sorted folders starting from the second folder
10        for folder in folders[1:]:
11            last_added_folder_length = len(result[-1])  # Length of the last added folder to result
12            current_folder_length = len(folder)         # Length of the current folder
13          
14            # Check if the current folder is not a subfolder of the last added folder
15            # This is determined by checking if the last added folder does not match the prefix of the current folder and there is a '/' after the prefix
16            if last_added_folder_length >= current_folder_length or not (result[-1] == folder[:last_added_folder_length] and folder[last_added_folder_length] == '/'):
17                result.append(folder)  # The current folder is not a subfolder, so add it to the result
18      
19        # Return the list of folders without subfolders
20        return result
21
1class Solution {
2    public List<String> removeSubfolders(String[] folders) {
3        // Sort the array of folder paths to ensure that parent folders come before their subfolders.
4        Arrays.sort(folders);
5
6        // Initialize the answer list with the first folder path since it can not be a subfolder.
7        List<String> filteredFolders = new ArrayList<>();
8        filteredFolders.add(folders[0]);
9
10        // Iterate through the sorted folder paths starting from the index 1.
11        for (int i = 1; i < folders.length; ++i) {
12            // Get the length of the last added folder path in the answer list.
13            int lastAddedFolderPathLength = filteredFolders.get(filteredFolders.size() - 1).length();
14          
15            // Get the current folder path length.
16            int currentFolderPathLength = folders[i].length();
17          
18            // Check if the last added folder is a prefix of the current folder and if there is a '/' right after it.
19            // If the last added folder path is not a prefix of the current folder (or it is the complete current folder),
20            // or the character just after the prefix is not '/', then it is not a subfolder.
21            if (lastAddedFolderPathLength >= currentFolderPathLength
22                || !(filteredFolders.get(filteredFolders.size() - 1).equals(folders[i].substring(0, lastAddedFolderPathLength))
23                    && folders[i].charAt(lastAddedFolderPathLength) == '/')) {
24                // If the current folder is not a subfolder, add it to the filtered list.
25                filteredFolders.add(folders[i]);
26            }
27        }
28
29        // Return the list of filtered folders with subfolders removed.
30        return filteredFolders;
31    }
32}
33
1#include <vector>
2#include <string>
3#include <algorithm>
4
5class Solution {
6public:
7    // This function takes a list of folder paths and removes any subfolders
8    // from the list since a subfolder is implicitly included when its parent
9    // folder is included.
10    vector<string> removeSubfolders(vector<string>& folders) {
11        // Sort the folder list to ensure that subfolders follow their parents
12        sort(folders.begin(), folders.end());
13
14        // Initialize the answer list with the first (smallest) folder
15        vector<string> filteredFolders = {folders[0]};
16
17        // Iterate through the sorted folders starting from the second element
18        for (int i = 1; i < folders.size(); ++i) {
19            int parentSize = filteredFolders.back().size();
20            int currentSize = folders[i].size();
21          
22            // Check if the current folder is a subfolder of the last parent folder
23            // Subfolder check: the parent is a prefix and is followed by a '/' in the current folder
24            if (parentSize >= currentSize || 
25                !(filteredFolders.back() == folders[i].substr(0, parentSize) && folders[i][parentSize] == '/')) {
26                // If the current folder is not a subfolder, add it to the answer list
27                filteredFolders.emplace_back(folders[i]);
28            }
29        }
30
31        // Return the list of filtered folders
32        return filteredFolders;
33    }
34};
35
1function removeSubfolders(folders: string[]): string[] {
2    // Sort the folder list to ensure that subfolders follow their parent folders.
3    folders.sort();
4
5    // Initialize the filtered folder array with the first (lexicographically smallest) folder.
6    const filteredFolders: string[] = [folders[0]];
7
8    // Iterate through the sorted folders, starting with the second element.
9    for (let i = 1; i < folders.length; i++) {
10        const parentSize = filteredFolders[filteredFolders.length - 1].length;
11        const currentSize = folders[i].length;
12
13        // If the current folder is a subfolder of the last folder in filteredFolders, we skip it.
14        // To determine if it's a subfolder, we check if the last folder in filteredFolders is a prefix
15        // of the current folder followed by a '/'.
16        if (!(parentSize < currentSize &&
17              folders[i].substring(0, parentSize) === filteredFolders[filteredFolders.length - 1] &&
18              folders[i][parentSize] === '/')) {
19            // If the current folder is not a subfolder, add it to the filtered list.
20            filteredFolders.push(folders[i]);
21        }
22    }
23
24    // Return the list of filtered folders.
25    return filteredFolders;
26}
27
Not Sure What to Study? Take the 2-min Quiz:

A heap is a ...?

Time and Space Complexity

Time Complexity

The given algorithm’s time complexity mainly consists of two parts: sorting the folder list and iterating through the sorted list to check for subfolders.

  1. Sorting the folder list using the sort() method has a time complexity of O(n log n), where n is the number of elements in the folder list. Sorting is necessary to ensure that any potential subfolder appears immediately after its parent folder, which enables the subsequent linear check.

  2. The second part of the algorithm iterates through the sorted folder list, performing a constant-time string comparison (checking prefix and the following character) for each pair of adjacent folders. This process has a time complexity of O(m * n), where n is the number of folders, and m is the average length of a folder’s path because a folder’s full path may need to be traversed to perform the comparison.

Overall, the time complexity of the given algorithm can be expressed as O(n log n + m * n). Since O(n log n) is dominated by O(m * n) for large values of n, you could consider the overall time complexity to be O(m * n), where m is the average string length and n is the number of strings.

Space Complexity

The space complexity of the algorithm is determined by the additional space used:

  1. The ans list is used to store the resulting list of folder paths that are not subfolders of any other folders in the list. In the worst case, none of the folders are subfolders of others, and hence, the ans list will contain all the folder paths. This gives us a space complexity of O(n * m), where n is the number of folders and m is the average length of a folder’s path.

  2. No other additional data structures that grow with input size are used, so other space considerations are constant and can be ignored in Big O notation.

Therefore, the overall space complexity is O(n * m), which accounts for the space required to store the list of folder paths in the ans list.

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Consider the classic dynamic programming of longest increasing subsequence:

Find the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order.

For example, the length of LIS for [50, 3, 10, 7, 40, 80] is 4 and LIS is [3, 7, 40, 80].

What is the recurrence relation?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫