316. Remove Duplicate Letters
Problem Description
You are given a string s
that contains lowercase English letters. Your task is to remove duplicate letters from the string so that each letter appears exactly once. Among all possible results after removing duplicates, you need to return the one that is lexicographically smallest.
Lexicographical order means dictionary order - for example, "abc" comes before "acb" because at the first differing position, 'b' comes before 'c' in the alphabet.
For example:
- If
s = "bcabc"
, the output should be"abc"
(each letter appears once and it's the smallest possible arrangement) - If
s = "cbacdcbc"
, the output should be"acdb"
The key challenge is that you can't simply take the first occurrence of each character or sort them alphabetically. You need to strategically choose which occurrence of each duplicate letter to keep, ensuring that:
- Every letter from the original string appears exactly once in the result
- The resulting string is the smallest possible in lexicographical order
The solution uses a greedy approach with a stack to build the result. It tracks the last occurrence of each character and uses this information to decide whether to remove characters from the stack when a smaller character is encountered. The algorithm maintains a stack of characters and a set to track which characters are already in the result. When processing each character, if it's not already in the result and there are larger characters in the stack that will appear again later, those larger characters are popped to make room for the smaller one, ensuring the lexicographically smallest result.
Intuition
To build the lexicographically smallest string, we need to make greedy choices at each step. The key insight is that when we encounter a character, we should ask: "Can I place this character in a better position by removing some larger characters that come before it?"
Think of it like arranging people in a line by height, where you want the shortest people in front. If a tall person is currently in line but you know they'll show up again later, you can remove them now to let a shorter person take their place in front.
The critical realization is that we can only remove a character from our current result if we know it will appear again later in the string. This is why we need to track the last occurrence of each character - it tells us whether we'll have another chance to include that character.
Here's how the thinking evolves:
-
Greedy choice: When we see a smaller character, we want it to appear as early as possible in our result to make the string lexicographically smaller.
-
Safe removal condition: We can only remove a character from our current result if:
- The character we want to remove is larger than the current character (this improves lexicographical order)
- That character appears again later in the string (so we won't lose it completely)
-
Stack structure: A stack naturally supports this "backtracking" behavior - we can build our result character by character, and when we find a better option, we can pop characters off the top (remove recent additions) to make room for the smaller character.
-
Avoiding duplicates: We need to track which characters are already in our result to avoid adding duplicates. If we encounter a character that's already in our result, we skip it because we've already placed it in its optimal position.
The algorithm essentially simulates making the best local decision at each step while maintaining the ability to revise previous decisions when a better option becomes available, as long as we won't lose any characters permanently.
Solution Approach
The implementation uses a monotonic stack pattern combined with a greedy approach. Here's how the solution works step by step:
Data Structures Used:
- Dictionary
last
: Maps each character to its last occurrence index in the string - Stack
stk
: Builds the result string character by character - Set
vis
: Tracks which characters are currently in the stack
Algorithm Steps:
-
Preprocessing - Build the
last
dictionary:last = {c: i for i, c in enumerate(s)}
This records the rightmost index where each character appears. For example, if
s = "bcabc"
, thenlast = {'b': 3, 'c': 4, 'a': 2}
. -
Main iteration through the string: For each character
c
at indexi
:a. Skip if already in result:
if c in vis: continue
If the character is already in our stack (tracked by
vis
), we skip it since we only want each character once.b. Pop larger characters that appear later:
while stk and stk[-1] > c and last[stk[-1]] > i: vis.remove(stk.pop())
This is the core logic. While the stack is not empty AND the top character is greater than the current character AND that top character will appear again later (its last occurrence is after the current index), we:
- Pop the character from the stack
- Remove it from the
vis
set
c. Add current character:
stk.append(c) vis.add(c)
Push the current character onto the stack and mark it as visited.
-
Build the final result:
return ''.join(stk)
Convert the stack to a string.
Example Walkthrough with s = "bcabc"
:
last = {'b': 3, 'c': 4, 'a': 2}
- Index 0,
c = 'b'
: Stack becomes['b']
,vis = {'b'}
- Index 1,
c = 'c'
: Stack becomes['b', 'c']
,vis = {'b', 'c'}
- Index 2,
c = 'a'
:- 'a' < 'c' and 'c' appears later (at index 4), so pop 'c'
- 'a' < 'b' and 'b' appears later (at index 3), so pop 'b'
- Stack becomes
['a']
,vis = {'a'}
- Index 3,
c = 'b'
: Stack becomes['a', 'b']
,vis = {'a', 'b'}
- Index 4,
c = 'c'
: Stack becomes['a', 'b', 'c']
,vis = {'a', 'b', 'c'}
- Result:
"abc"
The time complexity is O(n)
where n
is the length of the string, as each character is pushed and popped at most once. The space complexity is O(1)
since we only store at most 26 lowercase letters.
Ready to land your dream job?
Unlock your dream job with a 3-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with s = "cbacdcbc"
to see how the greedy stack approach works.
Step 1: Preprocessing First, we identify the last occurrence of each character:
last = {'c': 7, 'b': 6, 'a': 2, 'd': 4}
Step 2: Process each character
Starting with empty stack = []
and visited = {}
Index 0: c = 'c'
- 'c' not in visited
- Stack is empty, so add 'c'
stack = ['c']
,visited = {'c'}
Index 1: c = 'b'
- 'b' not in visited
- Check stack top: 'c' > 'b' and 'c' appears later (index 7 > 1)
- Pop 'c' from stack
- Add 'b'
stack = ['b']
,visited = {'b'}
Index 2: c = 'a'
- 'a' not in visited
- Check stack top: 'b' > 'a' and 'b' appears later (index 6 > 2)
- Pop 'b' from stack
- Add 'a'
stack = ['a']
,visited = {'a'}
Index 3: c = 'c'
- 'c' not in visited
- Check stack top: 'a' < 'c', no need to pop
- Add 'c'
stack = ['a', 'c']
,visited = {'a', 'c'}
Index 4: c = 'd'
- 'd' not in visited
- Check stack top: 'c' < 'd', no need to pop
- Add 'd'
stack = ['a', 'c', 'd']
,visited = {'a', 'c', 'd'}
Index 5: c = 'c'
- 'c' already in visited, skip
Index 6: c = 'b'
- 'b' not in visited
- Check stack top: 'd' > 'b' but 'd' doesn't appear later (index 4 < 6)
- Cannot pop 'd', so check if we can add 'b'
- Since 'd' > 'b' and we can't remove 'd', we need to reconsider...
- Actually, let me recalculate this step:
- 'd' > 'b' and last['d'] = 4 which is NOT > 6, so we cannot pop 'd'
- Add 'b' after 'd'
stack = ['a', 'c', 'd', 'b']
,visited = {'a', 'c', 'd', 'b'}
Index 7: c = 'c'
- 'c' already in visited, skip
Final Result: "acdb"
The key insight: When we encountered 'b' at index 6, we couldn't remove 'd' even though 'd' > 'b', because 'd' wouldn't appear again later. This demonstrates why tracking the last occurrence is crucial - we can only remove characters if we're guaranteed to see them again.
Solution Implementation
1class Solution:
2 def removeDuplicateLetters(self, s: str) -> str:
3 # Record the last occurrence index of each character
4 last_occurrence = {char: index for index, char in enumerate(s)}
5
6 # Stack to build the result string
7 stack = []
8
9 # Set to track which characters are already in the stack
10 visited = set()
11
12 # Iterate through each character in the string
13 for index, char in enumerate(s):
14 # Skip if character is already in the result
15 if char in visited:
16 continue
17
18 # Remove characters from stack that are:
19 # 1. Greater than current character (lexicographically)
20 # 2. Will appear again later in the string
21 while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
22 removed_char = stack.pop()
23 visited.remove(removed_char)
24
25 # Add current character to stack and mark as visited
26 stack.append(char)
27 visited.add(char)
28
29 # Join the stack to form the final result string
30 return ''.join(stack)
31
1class Solution {
2 public String removeDuplicateLetters(String s) {
3 int length = s.length();
4
5 // Store the last occurrence index of each character
6 int[] lastOccurrence = new int[26];
7 for (int i = 0; i < length; i++) {
8 lastOccurrence[s.charAt(i) - 'a'] = i;
9 }
10
11 // Stack to build the result string
12 Deque<Character> stack = new ArrayDeque<>();
13
14 // Bitmask to track which characters are already in the stack
15 // If bit i is set, it means character ('a' + i) is in the stack
16 int inStackMask = 0;
17
18 // Process each character in the string
19 for (int i = 0; i < length; i++) {
20 char currentChar = s.charAt(i);
21
22 // Skip if current character is already in the stack
23 if (((inStackMask >> (currentChar - 'a')) & 1) == 1) {
24 continue;
25 }
26
27 // Pop characters from stack if they are:
28 // 1. Lexicographically greater than current character
29 // 2. Will appear again later in the string
30 while (!stack.isEmpty() &&
31 stack.peek() > currentChar &&
32 lastOccurrence[stack.peek() - 'a'] > i) {
33 // Remove the popped character from the bitmask
34 inStackMask ^= 1 << (stack.pop() - 'a');
35 }
36
37 // Add current character to stack
38 stack.push(currentChar);
39
40 // Mark current character as present in stack using bitmask
41 inStackMask |= 1 << (currentChar - 'a');
42 }
43
44 // Build the result string from stack
45 StringBuilder result = new StringBuilder();
46 for (char character : stack) {
47 result.append(character);
48 }
49
50 // Stack stores characters in reverse order, so reverse the result
51 return result.reverse().toString();
52 }
53}
54
1class Solution {
2public:
3 string removeDuplicateLetters(string s) {
4 int stringLength = s.size();
5
6 // Store the last occurrence index of each character
7 int lastOccurrence[26] = {0};
8 for (int i = 0; i < stringLength; ++i) {
9 lastOccurrence[s[i] - 'a'] = i;
10 }
11
12 // Result string to build the answer
13 string result;
14
15 // Bitmask to track which characters are already in the result
16 // Bit i is set if character ('a' + i) is in the result
17 int usedCharacterMask = 0;
18
19 // Process each character in the input string
20 for (int i = 0; i < stringLength; ++i) {
21 char currentChar = s[i];
22 int charIndex = currentChar - 'a';
23
24 // Skip if this character is already in the result
25 if ((usedCharacterMask >> charIndex) & 1) {
26 continue;
27 }
28
29 // Remove characters from result that are:
30 // 1. Lexicographically greater than current character
31 // 2. Will appear again later in the string
32 while (!result.empty() &&
33 result.back() > currentChar &&
34 lastOccurrence[result.back() - 'a'] > i) {
35 // Remove the character from the bitmask
36 usedCharacterMask ^= 1 << (result.back() - 'a');
37 // Remove the character from result
38 result.pop_back();
39 }
40
41 // Add current character to result
42 result.push_back(currentChar);
43 // Mark this character as used in the bitmask
44 usedCharacterMask |= 1 << charIndex;
45 }
46
47 return result;
48 }
49};
50
1function removeDuplicateLetters(s: string): string {
2 const stringLength: number = s.length;
3
4 // Store the last occurrence index of each character
5 // Index corresponds to character position (0 for 'a', 1 for 'b', etc.)
6 const lastOccurrence: number[] = new Array(26).fill(0);
7 for (let i = 0; i < stringLength; i++) {
8 lastOccurrence[s.charCodeAt(i) - 97] = i; // 97 is char code for 'a'
9 }
10
11 // Result array to build the answer (using array for efficient operations)
12 const result: string[] = [];
13
14 // Bitmask to track which characters are already in the result
15 // Bit i is set if character ('a' + i) is in the result
16 let usedCharacterMask: number = 0;
17
18 // Process each character in the input string
19 for (let i = 0; i < stringLength; i++) {
20 const currentChar: string = s[i];
21 const charIndex: number = currentChar.charCodeAt(0) - 97;
22
23 // Skip if this character is already in the result
24 // Check if the bit at position charIndex is set
25 if ((usedCharacterMask >> charIndex) & 1) {
26 continue;
27 }
28
29 // Remove characters from result that are:
30 // 1. Lexicographically greater than current character
31 // 2. Will appear again later in the string
32 while (result.length > 0 &&
33 result[result.length - 1] > currentChar &&
34 lastOccurrence[result[result.length - 1].charCodeAt(0) - 97] > i) {
35 // Remove the character from the bitmask using XOR
36 usedCharacterMask ^= 1 << (result[result.length - 1].charCodeAt(0) - 97);
37 // Remove the character from result
38 result.pop();
39 }
40
41 // Add current character to result
42 result.push(currentChar);
43 // Mark this character as used in the bitmask using OR
44 usedCharacterMask |= 1 << charIndex;
45 }
46
47 // Convert result array back to string
48 return result.join('');
49}
50
Time and Space Complexity
Time Complexity: O(n)
, where n
is the length of the string s
.
- Creating the
last
dictionary takesO(n)
time as we iterate through the string once - The main loop iterates through each character in the string once, which is
O(n)
- Although there's a nested while loop, each character can be pushed to and popped from the stack at most once throughout the entire execution
- The
in
operation for checkingvis
(a set) isO(1)
- The remove operation from
vis
is alsoO(1)
for a set - Therefore, the total time complexity is
O(n) + O(n) = O(n)
Space Complexity: O(n)
, where n
is the length of the string s
.
- The
last
dictionary stores at most 26 entries (for lowercase English letters), but in the worst case with unique characters, it could store up ton
entries, so it'sO(n)
- The
stk
list can contain at most 26 unique characters (orn
in the worst case), which isO(n)
- The
vis
set similarly stores at most 26 characters (orn
in the worst case), which isO(n)
- The output string construction takes
O(n)
space - Therefore, the total space complexity is
O(n)
Common Pitfalls
Pitfall 1: Forgetting to Check if Character Appears Later
The Mistake: A common error is popping characters from the stack without verifying they appear later in the string:
# WRONG - This removes characters even if they won't appear again while stack and stack[-1] > char: removed_char = stack.pop() visited.remove(removed_char)
Why It's Wrong: If we remove a character that doesn't appear later, we lose it permanently from our result, violating the requirement that every unique character must appear in the final string.
Example: For s = "dbca"
:
- When we encounter 'a', if we pop 'c' without checking, we lose 'c' forever
- Result would be "dba" instead of the correct "dbca"
The Fix:
Always check last_occurrence[stack[-1]] > index
before popping:
# CORRECT - Only remove if the character appears again later while stack and stack[-1] > char and last_occurrence[stack[-1]] > index: removed_char = stack.pop() visited.remove(removed_char)
Pitfall 2: Not Tracking Visited Characters Properly
The Mistake:
Forgetting to maintain the visited
set or not updating it correctly when popping:
# WRONG - Forgetting to update visited when popping while stack and stack[-1] > char and last_occurrence[stack[-1]] > index: stack.pop() # Forgot to remove from visited set!
Why It's Wrong:
The visited
set prevents duplicate processing. If we don't remove popped characters from visited
, the algorithm will skip them when they appear again, even though they're no longer in the stack.
Example: For s = "bcabc"
:
- After popping 'b' at index 2, if 'b' remains in
visited
- When we encounter 'b' again at index 3, we'll skip it
- Result would be incomplete
The Fix:
Always synchronize the visited
set with stack operations:
# CORRECT - Keep visited set synchronized with stack while stack and stack[-1] > char and last_occurrence[stack[-1]] > index: removed_char = stack.pop() visited.remove(removed_char) # Must remove from visited
Pitfall 3: Using Wrong Comparison for Lexicographical Order
The Mistake: Comparing indices instead of characters:
# WRONG - Comparing indices instead of characters while stack and index < last_occurrence[stack[-1]] and last_occurrence[stack[-1]] > index: # This doesn't ensure lexicographical order
Why It's Wrong: Lexicographical order depends on character values ('a' < 'b' < 'c'...), not their positions in the string.
The Fix:
Compare characters directly using stack[-1] > char
:
# CORRECT - Compare characters for lexicographical order while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
Pitfall 4: Processing Duplicates Multiple Times
The Mistake: Not checking if a character is already in the result before processing:
# WRONG - Processing every character without checking
for index, char in enumerate(s):
# Missing the check: if char in visited: continue
while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
removed_char = stack.pop()
visited.remove(removed_char)
stack.append(char) # This could add duplicates!
visited.add(char)
Why It's Wrong: Without the skip check, we might try to add a character that's already in our result, potentially causing incorrect popping of other characters or adding duplicates.
The Fix: Always check if the character is already visited before processing:
# CORRECT - Skip characters already in the result if char in visited: continue
The three-steps of Depth First Search are:
- Identify states;
- Draw the state-space tree;
- DFS on the state-space tree.
Recommended Readings
Stack Intro Imagine you have a pile of books on your desk If you want to add a new book you place it on top If you want to read a book you take it from the top And if you simply want to see which book is on the top you
Greedy Introduction div class responsive iframe iframe src https www youtube com embed WTslqPbj7I title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture web share allowfullscreen iframe div When do we use greedy Greedy algorithms tend to solve optimization problems Typically they will ask you to calculate the max min of some value Commonly you may see this phrased in the problem as max min longest shortest largest smallest etc These keywords can be identified by just scanning
Monotonic Stack Deque Intro If you'd prefer a video div class responsive iframe iframe src https www youtube com embed Dq_ObZwTY_Q title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture web share allowfullscreen iframe div The word monotonic means a list or
Want a Structured Path to Master System Design Too? Don’t Miss This!