1233. Remove Sub-Folders from the Filesystem
Problem Description
In this problem, we are given a list of folder paths in a file system. Each folder path is represented as a string, which follows a specific format - a concatenation of lowercase English letters, where each directory in the path is preceded by a slash ('/'
). For example, "/leetcode"
and "/leetcode/problems"
are valid folder paths.
Our goal is to eliminate any folder paths that are sub-folders of other folder paths. A folder A
is considered a sub-folder of folder B
if the path of A
starts with the entire path of B
followed directly by a slash and additional characters.
For instance, given a folder path "/leetcode"
, any folder that has a path starting with "/leetcode/"
, such as "/leetcode/problems"
, is considered a sub-folder of "/leetcode"
.
We have to identify all the "unique" folders after removing these sub-folders and can return the remaining paths in any order.
Flowchart Walkthrough
First, let's analyze the problem using the Flowchart. Here's a step-by-step examination to determine the appropriate algorithm:
Is it a graph?
- Yes: The filesystem with its hierarchical structure can be modeled as a graph/tree where directories are nodes and parent-child relationships between directories form the edges.
Is it a tree?
- Yes: Since directories inherently form a hierarchical (tree) structure where each directory has one parent (with the exception of the root), the model is a tree.
Since the model is a tree, the conclusion from the flowchart is to use the Depth-First Search (DFS) pattern. This choice is ideal for exploring all the paths in the tree-like structure efficiently, which is critical for determining which folders are sub-folders of others and need to be removed.
Intuition
The provided solution works by sorting the list of folders first. Sorting is key because it ensures that any potential sub-folder paths come immediately after the parent folder path. This is due to lexicographical ordering, where a string that starts with another string will come right after it.
After sorting, we iterate through the list starting with the second folder path (as the first folder cannot be a sub-folder of any other). With each folder path, we compare it to the most recently added folder in our answer list, which is initially the first folder from the sorted list.
During this iteration, if the current path does not start with the path of the last added folder (which would make it a sub-folder), it gets added to the answer list, effectively keeping it as a unique folder.
This is achieved with a check that ensures that the last folder is not a prefix of the current folder followed immediately by a slash (to guarantee it's a sub-folder and not just a folder with a similar prefix).
In this way, we build a collection of folder paths, removing any sub-folders and retaining only the "unique" ones. The final result is a list of the cleaned folder paths.
Learn more about Depth-First Search and Trie patterns.
Solution Approach
The solution begins with sorting the list of folders. This is done using Python's built-in sort()
method, which arranges the folder paths in lexicographical order. The significance of sorting here is that if one folder is a sub-folder of another, it will appear immediately after its parent folder in the sorted list.
folder.sort()
After sorting the folders, the algorithm initializes an ans
list with the first folder path in it, since there's nothing to compare it with, it can't be a sub-folder of any sort:
ans = [folder[0]]
Next, we loop over the remaining folder paths starting from the second item. The for
loop determines if the current folder (f
) is a sub-folder of the most recently added folder in the ans
list. To check this, we compare the lengths of the current folder (f
) and the last added folder (ans[-1]
), and we also ensure that ans[-1]
is the prefix of f
followed by a '/':
for f in folder[1:]:
m, n = len(ans[-1]), len(f)
The condition inside the loop checks two things:
- If
m
is not less thann
,f
cannot be a sub-folder ofans[-1]
because a sub-folder must have a longer path than its parent folder. - If
ans[-1]
is not a prefix off
or if the character inf
that immediately follows the prefix is not the slash/
,f
is not a sub-folder.
If these conditions are not met, it means f
is a unique folder (not a sub-folder), so it gets appended to ans
:
if m >= n or not (ans[-1] == f[:m] and f[m] == '/'): ans.append(f)
At the end of the loop, ans
contains all unique folders, excluding any sub-folders, following the criteria. We return ans
as the final result:
return ans
In terms of data structures and patterns, the solution mainly relies on list operations and basic string manipulation. The sorting operation helps utilize the power of lexicographical order to organize and efficiently identify sub-folders based on their path prefixes.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's say we have the following list of folder paths:
["/a", "/a/b", "/c/d", "/c/d/e", "/c/f"]
According to the problem statement, we need to remove any paths that are sub-folders of any other path. We'll illustrate the solution approach using this list.
Firstly, we sort the list of folders:
Before sorting: ["/a", "/a/b", "/c/d", "/c/d/e", "/c/f"]
After sorting: ["/a", "/a/b", "/c/d", "/c/d/e", "/c/f"]
In this case, the list remains the same after sorting because it was already in lexicographical order. But the sorting step is crucial because if our list had been unordered, sorting would have arranged any sub-folders immediately after their parent folders.
We initialize the ans
list with the first folder:
ans = ["/a"]
We then loop over the remaining folders in the list starting from the second one.
- We compare
"/a/b"
with the last element inans
, which is"/a"
. Since"/a/b"
starts with"/a/"
and has additional characters after"/a"
, it is a sub-folder, and we do not add it toans
. - Next, we compare
"/c/d"
with the last element inans
which is still"/a"
. It does not start with"/a/"
so we add it toans
.
ans = ["/a", "/c/d"]
- We move to
"/c/d/e"
and compare it with the last element inans
, which is now"/c/d"
. Since"/c/d/e"
starts with"/c/d/"
, it is a sub-folder, and we do not add it. - Lastly, we compare
"/c/f"
with"/c/d"
. It does not start with"/c/d/"
, so we add it toans
.
ans = ["/a", "/c/d", "/c/f"]
Now, the loop has ended, and the ans
list contains all the unique folder paths, excluding any sub-folders that were removed:
Result: ["/a", "/c/d", "/c/f"]
Finally, we return this list as the solution. The whole process demonstrates how the combination of sorting and prefix comparison efficiently filters out the sub-folders.
Solution Implementation
1class Solution:
2 def removeSubfolders(self, folders: List[str]) -> List[str]:
3 # Sort the list of folder paths to ensure that parent directories are before their subfolders
4 folders.sort()
5
6 # Initialize the result list with the first folder since it is guaranteed not to be a subfolder
7 result = [folders[0]]
8
9 # Iterate through the sorted folders starting from the second folder
10 for folder in folders[1:]:
11 last_added_folder_length = len(result[-1]) # Length of the last added folder to result
12 current_folder_length = len(folder) # Length of the current folder
13
14 # Check if the current folder is not a subfolder of the last added folder
15 # This is determined by checking if the last added folder does not match the prefix of the current folder and there is a '/' after the prefix
16 if last_added_folder_length >= current_folder_length or not (result[-1] == folder[:last_added_folder_length] and folder[last_added_folder_length] == '/'):
17 result.append(folder) # The current folder is not a subfolder, so add it to the result
18
19 # Return the list of folders without subfolders
20 return result
21
1class Solution {
2 public List<String> removeSubfolders(String[] folders) {
3 // Sort the array of folder paths to ensure that parent folders come before their subfolders.
4 Arrays.sort(folders);
5
6 // Initialize the answer list with the first folder path since it can not be a subfolder.
7 List<String> filteredFolders = new ArrayList<>();
8 filteredFolders.add(folders[0]);
9
10 // Iterate through the sorted folder paths starting from the index 1.
11 for (int i = 1; i < folders.length; ++i) {
12 // Get the length of the last added folder path in the answer list.
13 int lastAddedFolderPathLength = filteredFolders.get(filteredFolders.size() - 1).length();
14
15 // Get the current folder path length.
16 int currentFolderPathLength = folders[i].length();
17
18 // Check if the last added folder is a prefix of the current folder and if there is a '/' right after it.
19 // If the last added folder path is not a prefix of the current folder (or it is the complete current folder),
20 // or the character just after the prefix is not '/', then it is not a subfolder.
21 if (lastAddedFolderPathLength >= currentFolderPathLength
22 || !(filteredFolders.get(filteredFolders.size() - 1).equals(folders[i].substring(0, lastAddedFolderPathLength))
23 && folders[i].charAt(lastAddedFolderPathLength) == '/')) {
24 // If the current folder is not a subfolder, add it to the filtered list.
25 filteredFolders.add(folders[i]);
26 }
27 }
28
29 // Return the list of filtered folders with subfolders removed.
30 return filteredFolders;
31 }
32}
33
1#include <vector>
2#include <string>
3#include <algorithm>
4
5class Solution {
6public:
7 // This function takes a list of folder paths and removes any subfolders
8 // from the list since a subfolder is implicitly included when its parent
9 // folder is included.
10 vector<string> removeSubfolders(vector<string>& folders) {
11 // Sort the folder list to ensure that subfolders follow their parents
12 sort(folders.begin(), folders.end());
13
14 // Initialize the answer list with the first (smallest) folder
15 vector<string> filteredFolders = {folders[0]};
16
17 // Iterate through the sorted folders starting from the second element
18 for (int i = 1; i < folders.size(); ++i) {
19 int parentSize = filteredFolders.back().size();
20 int currentSize = folders[i].size();
21
22 // Check if the current folder is a subfolder of the last parent folder
23 // Subfolder check: the parent is a prefix and is followed by a '/' in the current folder
24 if (parentSize >= currentSize ||
25 !(filteredFolders.back() == folders[i].substr(0, parentSize) && folders[i][parentSize] == '/')) {
26 // If the current folder is not a subfolder, add it to the answer list
27 filteredFolders.emplace_back(folders[i]);
28 }
29 }
30
31 // Return the list of filtered folders
32 return filteredFolders;
33 }
34};
35
1function removeSubfolders(folders: string[]): string[] {
2 // Sort the folder list to ensure that subfolders follow their parent folders.
3 folders.sort();
4
5 // Initialize the filtered folder array with the first (lexicographically smallest) folder.
6 const filteredFolders: string[] = [folders[0]];
7
8 // Iterate through the sorted folders, starting with the second element.
9 for (let i = 1; i < folders.length; i++) {
10 const parentSize = filteredFolders[filteredFolders.length - 1].length;
11 const currentSize = folders[i].length;
12
13 // If the current folder is a subfolder of the last folder in filteredFolders, we skip it.
14 // To determine if it's a subfolder, we check if the last folder in filteredFolders is a prefix
15 // of the current folder followed by a '/'.
16 if (!(parentSize < currentSize &&
17 folders[i].substring(0, parentSize) === filteredFolders[filteredFolders.length - 1] &&
18 folders[i][parentSize] === '/')) {
19 // If the current folder is not a subfolder, add it to the filtered list.
20 filteredFolders.push(folders[i]);
21 }
22 }
23
24 // Return the list of filtered folders.
25 return filteredFolders;
26}
27
Time and Space Complexity
Time Complexity
The given algorithm’s time complexity mainly consists of two parts: sorting the folder
list and iterating through the sorted list to check for subfolders.
-
Sorting the
folder
list using thesort()
method has a time complexity ofO(n log n)
, wheren
is the number of elements in thefolder
list. Sorting is necessary to ensure that any potential subfolder appears immediately after its parent folder, which enables the subsequent linear check. -
The second part of the algorithm iterates through the sorted
folder
list, performing a constant-time string comparison (checking prefix and the following character) for each pair of adjacent folders. This process has a time complexity ofO(m * n)
, wheren
is the number of folders, andm
is the average length of a folder’s path because a folder’s full path may need to be traversed to perform the comparison.
Overall, the time complexity of the given algorithm can be expressed as O(n log n + m * n)
. Since O(n log n)
is dominated by O(m * n)
for large values of n
, you could consider the overall time complexity to be O(m * n)
, where m
is the average string length and n
is the number of strings.
Space Complexity
The space complexity of the algorithm is determined by the additional space used:
-
The
ans
list is used to store the resulting list of folder paths that are not subfolders of any other folders in the list. In the worst case, none of the folders are subfolders of others, and hence, theans
list will contain all the folder paths. This gives us a space complexity ofO(n * m)
, wheren
is the number of folders andm
is the average length of a folder’s path. -
No other additional data structures that grow with input size are used, so other space considerations are constant and can be ignored in Big O notation.
Therefore, the overall space complexity is O(n * m)
, which accounts for the space required to store the list of folder paths in the ans
list.
Learn more about how to find time and space complexity quickly using problem constraints.
Consider the classic dynamic programming of fibonacci numbers, what is the recurrence relation?
Recommended Readings
https algomonster s3 us east 2 amazonaws com cover_photos dfs svg Depth First Search Prereqs Recursion Review problems recursion_intro Trees problems tree_intro With a solid understanding of recursion under our belts we are now ready to tackle one of the most useful techniques in coding interviews Depth First Search DFS
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Want a Structured Path to Master System Design Too? Don’t Miss This!