1233. Remove Sub-Folders from the Filesystem
Problem Description
You are given a list of folder paths. Your task is to remove all sub-folders from this list and return only the parent folders.
A folder is considered a sub-folder of another folder if:
- It starts with the exact path of the parent folder
- It's followed by a
/
character
For example:
/a/b
is a sub-folder of/a
(because it starts with/a
and is followed by/
)/b
is NOT a sub-folder of/a/b/c
(doesn't start with/a/b/c
)
Path format rules:
- A valid path starts with
/
followed by one or more lowercase English letters - Multiple folder names can be concatenated with
/
separators - Valid examples:
/leetcode
,/leetcode/problems
- Invalid examples: empty string,
/
alone
The solution works by:
- Sorting the folder list lexicographically, which ensures parent folders appear before their sub-folders
- Starting with the first folder in the answer
- For each subsequent folder
f
, checking if it's a sub-folder of the last added folder:- If the last added folder's length is less than
f
's length, AND - If
f
starts with the last added folder's path, AND - If the character immediately after the matching prefix is
/
- Then
f
is a sub-folder and should be skipped - Otherwise, add
f
to the answer
- If the last added folder's length is less than
This approach efficiently identifies and removes all sub-folders, returning only the parent folders in the list.
Flowchart Walkthrough
First, let's pin down the algorithm using the Flowchart. Here's a step-by-step walkthrough:
Is it a graph?
- Yes: We can model the folder structure as a graph/tree where each folder is a node and the parent-child relationship forms edges. The filesystem naturally forms a hierarchical tree structure.
Is it a tree?
- Yes: The folder structure is indeed a tree - each folder (except root) has exactly one parent folder, and there are no cycles in the directory structure. The path
/a/b/c
represents a tree where/a
is parent of/b
, and/b
is parent of/c
.
DFS
- Conclusion: The flowchart leads us to DFS (Depth-First Search) as the appropriate algorithm.
The DFS pattern applies here because:
- We need to traverse the folder hierarchy to identify parent-child relationships
- We can use DFS to explore each path from root to leaves
- During traversal, we can mark or identify which folders are subfolders of others
- The sorting approach in the solution is essentially a clever way to simulate DFS traversal order - parent folders appear before their children when sorted lexicographically
While the provided solution uses sorting rather than explicit DFS traversal, the underlying logic follows the DFS pattern of processing parent nodes before their children, which is naturally achieved through lexicographical sorting of paths.
Intuition
When we think about folder structures, they naturally form a tree hierarchy. The key insight is that if we can process folders in a specific order - parents before children - we can easily identify and skip subfolders.
Consider this example: if we have /a
, /a/b
, and /a/b/c
, we want to keep only /a
. How can we efficiently identify that /a/b
and /a/b/c
are subfolders?
The clever observation is that lexicographical sorting gives us exactly the traversal order we need! When we sort folder paths alphabetically:
- Parent folders always appear before their subfolders
/a
comes before/a/b
which comes before/a/b/c
- This mimics a depth-first traversal of the folder tree
Once sorted, we can use a greedy approach: keep track of the last folder we added to our result. For each new folder we encounter:
- If it starts with our last added folder's path followed by a
/
, it's a subfolder - skip it - Otherwise, it's a new independent folder branch - add it to our result
For example, after adding /a
to our result, when we see /a/b
:
- We check: does
/a/b
start with/a/
? Yes! - So
/a/b
is a subfolder of/a
and we skip it
This approach works because once we've included a parent folder, we can immediately identify and skip all its descendants as they appear consecutively in the sorted order. The sorting essentially converts the tree structure into a linear sequence where the parent-child relationships are preserved and easily detectable through simple string prefix matching.
The beauty of this solution is that it avoids explicitly building a tree structure or using recursive DFS. Instead, it leverages the properties of lexicographical ordering to achieve the same effect with a simple linear scan.
Learn more about Depth-First Search and Trie patterns.
Solution Approach
The implementation follows the sorting-based approach we identified through our intuition:
Step 1: Sort the folders
folder.sort()
This sorts all folder paths lexicographically, ensuring parent folders appear before their subfolders. For example, ["/a/b", "/a", "/c/d", "/c"]
becomes ["/a", "/a/b", "/c", "/c/d"]
.
Step 2: Initialize the answer with the first folder
ans = [folder[0]]
The first folder in the sorted list cannot be a subfolder of anything else (it's lexicographically smallest), so we add it directly to our answer.
Step 3: Iterate through remaining folders
for f in folder[1:]:
We examine each folder starting from the second one.
Step 4: Check if current folder is a subfolder
m, n = len(ans[-1]), len(f)
if m >= n or not (ans[-1] == f[:m] and f[m] == '/'):
ans.append(f)
This condition checks if f
is NOT a subfolder of the last added folder:
m >= n
: If the last added folder is longer than or equal to the current folder, then the current folder cannot be its subfolderans[-1] == f[:m]
: Check if the current folder starts with the exact path of the last added folderf[m] == '/'
: Check if the character right after the matching prefix is/
(ensuring it's a proper subfolder, not just a similar name)
If any of these conditions indicate f
is not a subfolder, we add it to our answer.
Example walkthrough:
- Input:
["/a", "/a/b", "/c/d", "/c", "/a/b/c"]
- After sorting:
["/a", "/a/b", "/a/b/c", "/c", "/c/d"]
- Process
/a
: Add to answer โans = ["/a"]
- Process
/a/b
: Check against/a
โ/a/b
starts with/a/
โ Skip - Process
/a/b/c
: Check against/a
โ/a/b/c
starts with/a/
โ Skip - Process
/c
: Check against/a
โ/c
doesn't start with/a/
โ Add โans = ["/a", "/c"]
- Process
/c/d
: Check against/c
โ/c/d
starts with/c/
โ Skip - Final answer:
["/a", "/c"]
The time complexity is O(n log n)
for sorting plus O(n ร m)
for the iteration (where m
is average string length), and space complexity is O(n)
for storing the answer.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example to illustrate the solution approach.
Input: ["/b", "/a/b", "/a", "/a/b/c"]
Step 1: Sort the folders lexicographically
After sorting: ["/a", "/a/b", "/a/b/c", "/b"]
Step 2: Initialize answer with the first folder
- Add
/a
to answer ans = ["/a"]
Step 3: Process each remaining folder
Processing /a/b
:
- Last added folder:
/a
(length = 2) - Current folder:
/a/b
(length = 4) - Check: Is
/a/b
a subfolder of/a
?- Length check: 2 < 4 โ (parent could be shorter)
- Prefix check:
/a/b
starts with/a
โ - Separator check: Character at position 2 is
/
โ
- All conditions met โ
/a/b
IS a subfolder โ Skip it ans = ["/a"]
Processing /a/b/c
:
- Last added folder:
/a
(length = 2) - Current folder:
/a/b/c
(length = 6) - Check: Is
/a/b/c
a subfolder of/a
?- Length check: 2 < 6 โ
- Prefix check:
/a/b/c
starts with/a
โ - Separator check: Character at position 2 is
/
โ
- All conditions met โ
/a/b/c
IS a subfolder โ Skip it ans = ["/a"]
Processing /b
:
- Last added folder:
/a
(length = 2) - Current folder:
/b
(length = 2) - Check: Is
/b
a subfolder of/a
?- Length check: 2 = 2 (equal length means it can't be a subfolder)
- First condition fails โ
/b
is NOT a subfolder โ Add it ans = ["/a", "/b"]
Final Result: ["/a", "/b"]
The algorithm successfully identified that /a/b
and /a/b/c
are subfolders of /a
and removed them, keeping only the parent folders /a
and /b
.
Solution Implementation
1class Solution:
2 def removeSubfolders(self, folder: List[str]) -> List[str]:
3 # Sort folders lexicographically to ensure parent folders come before their subfolders
4 folder.sort()
5
6 # Initialize result list with the first folder (it can't be a subfolder of anything)
7 result = [folder[0]]
8
9 # Iterate through remaining folders
10 for current_folder in folder[1:]:
11 last_added_folder = result[-1]
12 last_added_length = len(last_added_folder)
13 current_length = len(current_folder)
14
15 # Check if current folder is NOT a subfolder of the last added folder
16 # Conditions for current_folder to be a subfolder of last_added_folder:
17 # 1. last_added_length < current_length (parent must be shorter)
18 # 2. current_folder starts with last_added_folder
19 # 3. The character after last_added_folder in current_folder must be '/'
20 is_not_subfolder = (last_added_length >= current_length or
21 not (last_added_folder == current_folder[:last_added_length] and
22 current_folder[last_added_length] == '/'))
23
24 if is_not_subfolder:
25 result.append(current_folder)
26
27 return result
28
1class Solution {
2 public List<String> removeSubfolders(String[] folder) {
3 // Sort the folders lexicographically
4 // This ensures parent folders come before their subfolders
5 Arrays.sort(folder);
6
7 // Initialize result list with the first folder
8 List<String> result = new ArrayList<>();
9 result.add(folder[0]);
10
11 // Iterate through remaining folders
12 for (int i = 1; i < folder.length; i++) {
13 // Get the last added folder (potential parent)
14 String lastAddedFolder = result.get(result.size() - 1);
15 int lastFolderLength = lastAddedFolder.length();
16 int currentFolderLength = folder[i].length();
17
18 // Check if current folder is NOT a subfolder of the last added folder
19 // Conditions for adding current folder:
20 // 1. Last folder is longer than or equal to current folder (cannot be parent)
21 // 2. Current folder doesn't start with last folder followed by '/'
22 boolean isNotSubfolder = lastFolderLength >= currentFolderLength ||
23 !isSubfolderOf(folder[i], lastAddedFolder);
24
25 if (isNotSubfolder) {
26 result.add(folder[i]);
27 }
28 }
29
30 return result;
31 }
32
33 /**
34 * Helper method to check if currentFolder is a subfolder of parentFolder
35 * @param currentFolder The folder to check
36 * @param parentFolder The potential parent folder
37 * @return true if currentFolder is a subfolder of parentFolder
38 */
39 private boolean isSubfolderOf(String currentFolder, String parentFolder) {
40 int parentLength = parentFolder.length();
41
42 // Check if current folder starts with parent folder and has '/' after it
43 return currentFolder.startsWith(parentFolder) &&
44 currentFolder.length() > parentLength &&
45 currentFolder.charAt(parentLength) == '/';
46 }
47}
48
1class Solution {
2public:
3 vector<string> removeSubfolders(vector<string>& folder) {
4 // Sort folders lexicographically to ensure parent folders come before their subfolders
5 sort(folder.begin(), folder.end());
6
7 // Initialize result with the first folder (it cannot be a subfolder of any other)
8 vector<string> result = {folder[0]};
9
10 // Iterate through remaining folders
11 for (int i = 1; i < folder.size(); ++i) {
12 // Get the length of the last added parent folder
13 int parentLength = result.back().size();
14 // Get the length of the current folder
15 int currentLength = folder[i].size();
16
17 // Check if current folder is NOT a subfolder of the last added folder
18 // Conditions for NOT being a subfolder:
19 // 1. Parent is longer than or equal to current folder, OR
20 // 2. Current folder doesn't start with parent path followed by '/'
21 bool isNotSubfolder = parentLength >= currentLength ||
22 !(result.back() == folder[i].substr(0, parentLength) &&
23 folder[i][parentLength] == '/');
24
25 if (isNotSubfolder) {
26 // Add current folder to result as it's not a subfolder
27 result.emplace_back(folder[i]);
28 }
29 }
30
31 return result;
32 }
33};
34
1/**
2 * Removes all subfolders from a list of folder paths
3 * @param folder - Array of folder paths
4 * @returns Array of folder paths with subfolders removed
5 */
6function removeSubfolders(folder: string[]): string[] {
7 // Initialize current parent folder with the second element (index 1)
8 // This will be updated as we iterate through the sorted array
9 let currentParentFolder: string = folder[1];
10
11 // Sort the folder array lexicographically
12 // This ensures parent folders come before their subfolders
13 return folder.sort().filter((currentPath: string) => {
14 // Check if current path is NOT a subfolder of the current parent
15 // A path is a subfolder if it starts with "parentPath/"
16 const isNotSubfolder: boolean = !currentPath.startsWith(currentParentFolder + '/');
17
18 // If current path is not a subfolder, update it as the new parent folder
19 // The assignment expression returns the assigned value (currentPath)
20 // which is truthy, so the filter condition passes
21 if (isNotSubfolder) {
22 currentParentFolder = currentPath;
23 }
24
25 // Return true if not a subfolder (keeps the item in filtered array)
26 return isNotSubfolder;
27 });
28}
29
Time and Space Complexity
The time complexity is O(n ร log n ร m)
, where:
n
is the length of the arrayfolder
m
is the maximum length of the strings in the arrayfolder
This breaks down as follows:
- Sorting the folder array takes
O(n ร log n ร m)
time, where the string comparison during sorting takesO(m)
time in the worst case - The subsequent loop iterates through
n-1
elements, and for each element, the string slicing operationf[:m]
and comparison operations takeO(m)
time in the worst case - Therefore, the loop contributes
O(n ร m)
time - The overall time complexity is dominated by the sorting operation:
O(n ร log n ร m)
The space complexity is O(m)
, where:
- The
ans
list stores references to the original strings (not copies), so it doesn't add to the space complexity in terms of string storage - The sorting operation may use
O(log n)
space for the recursion stack in typical implementations - The string slicing operation
f[:m]
creates a temporary substring of length at mostm
, contributingO(m)
space - The dominant space complexity is
O(m)
from the temporary string operations
Learn more about how to find time and space complexity quickly.
Common Pitfalls
Pitfall 1: Incorrect Substring Matching Without Delimiter Check
The Problem:
A common mistake is checking if one path is a substring of another without verifying the delimiter /
. This leads to incorrect identification of subfolders.
Example of Wrong Implementation:
# WRONG: Only checking if one string starts with another if current_folder.startswith(last_added_folder): # Skip as subfolder
Why It Fails:
Consider folders /a/bc
and /a/b
. Using simple startswith()
:
/a/bc
.startswith(/a/b
) returnsTrue
- But
/a/bc
is NOT a subfolder of/a/b
! - The correct parent-child relationship requires
/a/b/...
, not/a/bc
Correct Solution:
# Must check that the character after the parent path is '/'
if (len(last_added_folder) < len(current_folder) and
current_folder.startswith(last_added_folder) and
current_folder[len(last_added_folder)] == '/'):
# Now we know it's truly a subfolder
Pitfall 2: Not Handling Edge Cases with Single Character Folders
The Problem: When folder names are single characters, boundary checking becomes critical to avoid index out of bounds errors.
Example Scenario:
folders = ["/a", "/b"] # After processing "/a", when checking "/b": # len("/a") = 2, len("/b") = 2 # Attempting to access "/b"[2] would cause IndexError if not handled properly
Solution: Always check length comparison first before accessing indices:
m, n = len(last_added_folder), len(current_folder)
if m >= n: # Check this FIRST
# Current cannot be subfolder of last_added
result.append(current_folder)
elif last_added_folder == current_folder[:m] and current_folder[m] == '/':
# Skip as subfolder
else:
result.append(current_folder)
Pitfall 3: Assuming Only Direct Parent-Child Relationships
The Problem: Some might think we need to check against ALL previously added folders, not just the last one.
Why This Seems Logical But Is Wrong:
# INEFFICIENT approach some might try: for current_folder in folder[1:]: is_subfolder = False for parent in result: # Check against ALL folders in result if is_subfolder_of(current_folder, parent): is_subfolder = True break if not is_subfolder: result.append(current_folder)
Why Checking Only Last Added Works: Due to lexicographic sorting:
- If
/a
is in the result and we're checking/a/b/c
- We would have already skipped
/a/b
(since it comes before/a/b/c
in sorted order) - So
/a
remains the last added folder when we check/a/b/c
- This property ensures we only need to check against the last added folder
Correct Approach:
Trust the sorting property and only check against result[-1]
, which significantly improves efficiency from O(nยฒ) to O(n).
How does merge sort divide the problem into subproblems?
Recommended Readings
https assets algo monster cover_photos dfs svg Depth First Search Prereqs Recursion Review problems recursion_intro Trees problems tree_intro With a solid understanding of recursion under our belts we are now ready to tackle one of the most useful techniques in coding interviews Depth First Search DFS As the name suggests
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Want a Structured Path to Master System Design Too? Donโt Miss This!