Facebook Pixel

316. Remove Duplicate Letters

Problem Description

You are given a string s that contains lowercase English letters. Your task is to remove duplicate letters from the string so that each letter appears exactly once. Among all possible results after removing duplicates, you need to return the one that is lexicographically smallest.

Lexicographical order means dictionary order - for example, "abc" comes before "acb" because at the first differing position, 'b' comes before 'c' in the alphabet.

For example:

  • If s = "bcabc", the output should be "abc" (each letter appears once and it's the smallest possible arrangement)
  • If s = "cbacdcbc", the output should be "acdb"

The key challenge is that you can't simply take the first occurrence of each character or sort them alphabetically. You need to strategically choose which occurrence of each duplicate letter to keep, ensuring that:

  1. Every letter from the original string appears exactly once in the result
  2. The resulting string is the smallest possible in lexicographical order

The solution uses a greedy approach with a stack to build the result. It tracks the last occurrence of each character and uses this information to decide whether to remove characters from the stack when a smaller character is encountered. The algorithm maintains a stack of characters and a set to track which characters are already in the result. When processing each character, if it's not already in the result and there are larger characters in the stack that will appear again later, those larger characters are popped to make room for the smaller one, ensuring the lexicographically smallest result.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

To build the lexicographically smallest string, we need to make greedy choices at each step. The key insight is that when we encounter a character, we should ask: "Can I place this character in a better position by removing some larger characters that come before it?"

Think of it like arranging people in a line by height, where you want the shortest people in front. If a tall person is currently in line but you know they'll show up again later, you can remove them now to let a shorter person take their place in front.

The critical realization is that we can only remove a character from our current result if we know it will appear again later in the string. This is why we need to track the last occurrence of each character - it tells us whether we'll have another chance to include that character.

Here's how the thinking evolves:

  1. Greedy choice: When we see a smaller character, we want it to appear as early as possible in our result to make the string lexicographically smaller.

  2. Safe removal condition: We can only remove a character from our current result if:

    • The character we want to remove is larger than the current character (this improves lexicographical order)
    • That character appears again later in the string (so we won't lose it completely)
  3. Stack structure: A stack naturally supports this "backtracking" behavior - we can build our result character by character, and when we find a better option, we can pop characters off the top (remove recent additions) to make room for the smaller character.

  4. Avoiding duplicates: We need to track which characters are already in our result to avoid adding duplicates. If we encounter a character that's already in our result, we skip it because we've already placed it in its optimal position.

The algorithm essentially simulates making the best local decision at each step while maintaining the ability to revise previous decisions when a better option becomes available, as long as we won't lose any characters permanently.

Solution Approach

The implementation uses a monotonic stack pattern combined with a greedy approach. Here's how the solution works step by step:

Data Structures Used:

  1. Dictionary last: Maps each character to its last occurrence index in the string
  2. Stack stk: Builds the result string character by character
  3. Set vis: Tracks which characters are currently in the stack

Algorithm Steps:

  1. Preprocessing - Build the last dictionary:

    last = {c: i for i, c in enumerate(s)}

    This records the rightmost index where each character appears. For example, if s = "bcabc", then last = {'b': 3, 'c': 4, 'a': 2}.

  2. Main iteration through the string: For each character c at index i:

    a. Skip if already in result:

    if c in vis:
        continue

    If the character is already in our stack (tracked by vis), we skip it since we only want each character once.

    b. Pop larger characters that appear later:

    while stk and stk[-1] > c and last[stk[-1]] > i:
        vis.remove(stk.pop())

    This is the core logic. While the stack is not empty AND the top character is greater than the current character AND that top character will appear again later (its last occurrence is after the current index), we:

    • Pop the character from the stack
    • Remove it from the vis set

    c. Add current character:

    stk.append(c)
    vis.add(c)

    Push the current character onto the stack and mark it as visited.

  3. Build the final result:

    return ''.join(stk)

    Convert the stack to a string.

Example Walkthrough with s = "bcabc":

  • last = {'b': 3, 'c': 4, 'a': 2}
  • Index 0, c = 'b': Stack becomes ['b'], vis = {'b'}
  • Index 1, c = 'c': Stack becomes ['b', 'c'], vis = {'b', 'c'}
  • Index 2, c = 'a':
    • 'a' < 'c' and 'c' appears later (at index 4), so pop 'c'
    • 'a' < 'b' and 'b' appears later (at index 3), so pop 'b'
    • Stack becomes ['a'], vis = {'a'}
  • Index 3, c = 'b': Stack becomes ['a', 'b'], vis = {'a', 'b'}
  • Index 4, c = 'c': Stack becomes ['a', 'b', 'c'], vis = {'a', 'b', 'c'}
  • Result: "abc"

The time complexity is O(n) where n is the length of the string, as each character is pushed and popped at most once. The space complexity is O(1) since we only store at most 26 lowercase letters.

Ready to land your dream job?

Unlock your dream job with a 3-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with s = "cbacdcbc" to see how the greedy stack approach works.

Step 1: Preprocessing First, we identify the last occurrence of each character:

  • last = {'c': 7, 'b': 6, 'a': 2, 'd': 4}

Step 2: Process each character

Starting with empty stack = [] and visited = {}

Index 0: c = 'c'

  • 'c' not in visited
  • Stack is empty, so add 'c'
  • stack = ['c'], visited = {'c'}

Index 1: c = 'b'

  • 'b' not in visited
  • Check stack top: 'c' > 'b' and 'c' appears later (index 7 > 1)
  • Pop 'c' from stack
  • Add 'b'
  • stack = ['b'], visited = {'b'}

Index 2: c = 'a'

  • 'a' not in visited
  • Check stack top: 'b' > 'a' and 'b' appears later (index 6 > 2)
  • Pop 'b' from stack
  • Add 'a'
  • stack = ['a'], visited = {'a'}

Index 3: c = 'c'

  • 'c' not in visited
  • Check stack top: 'a' < 'c', no need to pop
  • Add 'c'
  • stack = ['a', 'c'], visited = {'a', 'c'}

Index 4: c = 'd'

  • 'd' not in visited
  • Check stack top: 'c' < 'd', no need to pop
  • Add 'd'
  • stack = ['a', 'c', 'd'], visited = {'a', 'c', 'd'}

Index 5: c = 'c'

  • 'c' already in visited, skip

Index 6: c = 'b'

  • 'b' not in visited
  • Check stack top: 'd' > 'b' but 'd' doesn't appear later (index 4 < 6)
  • Cannot pop 'd', so check if we can add 'b'
  • Since 'd' > 'b' and we can't remove 'd', we need to reconsider...
  • Actually, let me recalculate this step:
  • 'd' > 'b' and last['d'] = 4 which is NOT > 6, so we cannot pop 'd'
  • Add 'b' after 'd'
  • stack = ['a', 'c', 'd', 'b'], visited = {'a', 'c', 'd', 'b'}

Index 7: c = 'c'

  • 'c' already in visited, skip

Final Result: "acdb"

The key insight: When we encountered 'b' at index 6, we couldn't remove 'd' even though 'd' > 'b', because 'd' wouldn't appear again later. This demonstrates why tracking the last occurrence is crucial - we can only remove characters if we're guaranteed to see them again.

Solution Implementation

1class Solution:
2    def removeDuplicateLetters(self, s: str) -> str:
3        # Record the last occurrence index of each character
4        last_occurrence = {char: index for index, char in enumerate(s)}
5      
6        # Stack to build the result string
7        stack = []
8      
9        # Set to track which characters are already in the stack
10        visited = set()
11      
12        # Iterate through each character in the string
13        for index, char in enumerate(s):
14            # Skip if character is already in the result
15            if char in visited:
16                continue
17          
18            # Remove characters from stack that are:
19            # 1. Greater than current character (lexicographically)
20            # 2. Will appear again later in the string
21            while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
22                removed_char = stack.pop()
23                visited.remove(removed_char)
24          
25            # Add current character to stack and mark as visited
26            stack.append(char)
27            visited.add(char)
28      
29        # Join the stack to form the final result string
30        return ''.join(stack)
31
1class Solution {
2    public String removeDuplicateLetters(String s) {
3        int length = s.length();
4      
5        // Store the last occurrence index of each character
6        int[] lastOccurrence = new int[26];
7        for (int i = 0; i < length; i++) {
8            lastOccurrence[s.charAt(i) - 'a'] = i;
9        }
10      
11        // Stack to build the result string
12        Deque<Character> stack = new ArrayDeque<>();
13      
14        // Bitmask to track which characters are already in the stack
15        // If bit i is set, it means character ('a' + i) is in the stack
16        int inStackMask = 0;
17      
18        // Process each character in the string
19        for (int i = 0; i < length; i++) {
20            char currentChar = s.charAt(i);
21          
22            // Skip if current character is already in the stack
23            if (((inStackMask >> (currentChar - 'a')) & 1) == 1) {
24                continue;
25            }
26          
27            // Pop characters from stack if they are:
28            // 1. Lexicographically greater than current character
29            // 2. Will appear again later in the string
30            while (!stack.isEmpty() && 
31                   stack.peek() > currentChar && 
32                   lastOccurrence[stack.peek() - 'a'] > i) {
33                // Remove the popped character from the bitmask
34                inStackMask ^= 1 << (stack.pop() - 'a');
35            }
36          
37            // Add current character to stack
38            stack.push(currentChar);
39          
40            // Mark current character as present in stack using bitmask
41            inStackMask |= 1 << (currentChar - 'a');
42        }
43      
44        // Build the result string from stack
45        StringBuilder result = new StringBuilder();
46        for (char character : stack) {
47            result.append(character);
48        }
49      
50        // Stack stores characters in reverse order, so reverse the result
51        return result.reverse().toString();
52    }
53}
54
1class Solution {
2public:
3    string removeDuplicateLetters(string s) {
4        int stringLength = s.size();
5      
6        // Store the last occurrence index of each character
7        int lastOccurrence[26] = {0};
8        for (int i = 0; i < stringLength; ++i) {
9            lastOccurrence[s[i] - 'a'] = i;
10        }
11      
12        // Result string to build the answer
13        string result;
14      
15        // Bitmask to track which characters are already in the result
16        // Bit i is set if character ('a' + i) is in the result
17        int usedCharacterMask = 0;
18      
19        // Process each character in the input string
20        for (int i = 0; i < stringLength; ++i) {
21            char currentChar = s[i];
22            int charIndex = currentChar - 'a';
23          
24            // Skip if this character is already in the result
25            if ((usedCharacterMask >> charIndex) & 1) {
26                continue;
27            }
28          
29            // Remove characters from result that are:
30            // 1. Lexicographically greater than current character
31            // 2. Will appear again later in the string
32            while (!result.empty() && 
33                   result.back() > currentChar && 
34                   lastOccurrence[result.back() - 'a'] > i) {
35                // Remove the character from the bitmask
36                usedCharacterMask ^= 1 << (result.back() - 'a');
37                // Remove the character from result
38                result.pop_back();
39            }
40          
41            // Add current character to result
42            result.push_back(currentChar);
43            // Mark this character as used in the bitmask
44            usedCharacterMask |= 1 << charIndex;
45        }
46      
47        return result;
48    }
49};
50
1function removeDuplicateLetters(s: string): string {
2    const stringLength: number = s.length;
3  
4    // Store the last occurrence index of each character
5    // Index corresponds to character position (0 for 'a', 1 for 'b', etc.)
6    const lastOccurrence: number[] = new Array(26).fill(0);
7    for (let i = 0; i < stringLength; i++) {
8        lastOccurrence[s.charCodeAt(i) - 97] = i; // 97 is char code for 'a'
9    }
10  
11    // Result array to build the answer (using array for efficient operations)
12    const result: string[] = [];
13  
14    // Bitmask to track which characters are already in the result
15    // Bit i is set if character ('a' + i) is in the result
16    let usedCharacterMask: number = 0;
17  
18    // Process each character in the input string
19    for (let i = 0; i < stringLength; i++) {
20        const currentChar: string = s[i];
21        const charIndex: number = currentChar.charCodeAt(0) - 97;
22      
23        // Skip if this character is already in the result
24        // Check if the bit at position charIndex is set
25        if ((usedCharacterMask >> charIndex) & 1) {
26            continue;
27        }
28      
29        // Remove characters from result that are:
30        // 1. Lexicographically greater than current character
31        // 2. Will appear again later in the string
32        while (result.length > 0 && 
33               result[result.length - 1] > currentChar && 
34               lastOccurrence[result[result.length - 1].charCodeAt(0) - 97] > i) {
35            // Remove the character from the bitmask using XOR
36            usedCharacterMask ^= 1 << (result[result.length - 1].charCodeAt(0) - 97);
37            // Remove the character from result
38            result.pop();
39        }
40      
41        // Add current character to result
42        result.push(currentChar);
43        // Mark this character as used in the bitmask using OR
44        usedCharacterMask |= 1 << charIndex;
45    }
46  
47    // Convert result array back to string
48    return result.join('');
49}
50

Time and Space Complexity

Time Complexity: O(n), where n is the length of the string s.

  • Creating the last dictionary takes O(n) time as we iterate through the string once
  • The main loop iterates through each character in the string once, which is O(n)
  • Although there's a nested while loop, each character can be pushed to and popped from the stack at most once throughout the entire execution
  • The in operation for checking vis (a set) is O(1)
  • The remove operation from vis is also O(1) for a set
  • Therefore, the total time complexity is O(n) + O(n) = O(n)

Space Complexity: O(n), where n is the length of the string s.

  • The last dictionary stores at most 26 entries (for lowercase English letters), but in the worst case with unique characters, it could store up to n entries, so it's O(n)
  • The stk list can contain at most 26 unique characters (or n in the worst case), which is O(n)
  • The vis set similarly stores at most 26 characters (or n in the worst case), which is O(n)
  • The output string construction takes O(n) space
  • Therefore, the total space complexity is O(n)

Common Pitfalls

Pitfall 1: Forgetting to Check if Character Appears Later

The Mistake: A common error is popping characters from the stack without verifying they appear later in the string:

# WRONG - This removes characters even if they won't appear again
while stack and stack[-1] > char:
    removed_char = stack.pop()
    visited.remove(removed_char)

Why It's Wrong: If we remove a character that doesn't appear later, we lose it permanently from our result, violating the requirement that every unique character must appear in the final string.

Example: For s = "dbca":

  • When we encounter 'a', if we pop 'c' without checking, we lose 'c' forever
  • Result would be "dba" instead of the correct "dbca"

The Fix: Always check last_occurrence[stack[-1]] > index before popping:

# CORRECT - Only remove if the character appears again later
while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
    removed_char = stack.pop()
    visited.remove(removed_char)

Pitfall 2: Not Tracking Visited Characters Properly

The Mistake: Forgetting to maintain the visited set or not updating it correctly when popping:

# WRONG - Forgetting to update visited when popping
while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
    stack.pop()  # Forgot to remove from visited set!

Why It's Wrong: The visited set prevents duplicate processing. If we don't remove popped characters from visited, the algorithm will skip them when they appear again, even though they're no longer in the stack.

Example: For s = "bcabc":

  • After popping 'b' at index 2, if 'b' remains in visited
  • When we encounter 'b' again at index 3, we'll skip it
  • Result would be incomplete

The Fix: Always synchronize the visited set with stack operations:

# CORRECT - Keep visited set synchronized with stack
while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
    removed_char = stack.pop()
    visited.remove(removed_char)  # Must remove from visited

Pitfall 3: Using Wrong Comparison for Lexicographical Order

The Mistake: Comparing indices instead of characters:

# WRONG - Comparing indices instead of characters
while stack and index < last_occurrence[stack[-1]] and last_occurrence[stack[-1]] > index:
    # This doesn't ensure lexicographical order

Why It's Wrong: Lexicographical order depends on character values ('a' < 'b' < 'c'...), not their positions in the string.

The Fix: Compare characters directly using stack[-1] > char:

# CORRECT - Compare characters for lexicographical order
while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:

Pitfall 4: Processing Duplicates Multiple Times

The Mistake: Not checking if a character is already in the result before processing:

# WRONG - Processing every character without checking
for index, char in enumerate(s):
    # Missing the check: if char in visited: continue
    while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
        removed_char = stack.pop()
        visited.remove(removed_char)
    stack.append(char)  # This could add duplicates!
    visited.add(char)

Why It's Wrong: Without the skip check, we might try to add a character that's already in our result, potentially causing incorrect popping of other characters or adding duplicates.

The Fix: Always check if the character is already visited before processing:

# CORRECT - Skip characters already in the result
if char in visited:
    continue
Discover Your Strengths and Weaknesses: Take Our 3-Minute Quiz to Tailor Your Study Plan:

The three-steps of Depth First Search are:

  1. Identify states;
  2. Draw the state-space tree;
  3. DFS on the state-space tree.

Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More