Facebook Pixel

1081. Smallest Subsequence of Distinct Characters

Problem Description

Given a string s, you need to find the lexicographically smallest subsequence that contains every distinct character from s exactly once.

A subsequence is a sequence that can be derived from the original string by deleting some (possibly zero) characters without changing the order of the remaining characters. For example, "ace" is a subsequence of "aebdc".

Lexicographically smallest means the subsequence that would appear first in dictionary order. For instance, "abc" comes before "abd" lexicographically.

The key requirements are:

  • The result must be a subsequence of the original string (maintain relative order)
  • Every unique character from the original string must appear exactly once in the result
  • Among all valid subsequences meeting the above criteria, return the lexicographically smallest one

For example:

  • If s = "bcabc", the distinct characters are 'a', 'b', and 'c'. The possible subsequences containing each character once include "bca", "bac", "cab", "abc", etc. Among these, "abc" is the lexicographically smallest, so it's the answer.
  • If s = "cbacdcbc", the answer would be "acdb" as it contains all distinct characters ('a', 'b', 'c', 'd') exactly once and is the smallest lexicographical subsequence possible.
Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The challenge is to build the lexicographically smallest subsequence while ensuring we include all distinct characters. The key insight is that we want smaller characters to appear as early as possible in our result.

Consider this: when we encounter a character, we face a decision - should we include it now or wait for a potentially better position later? The answer depends on two factors:

  1. Can we safely skip this character because it appears again later?
  2. Can we improve our current result by removing some previously added characters?

This naturally leads us to think about using a stack (or similar structure) to build our answer. As we process each character, we can make greedy decisions:

  • If we've already included this character, skip it (we only want each character once)
  • If the current character is smaller than the last character in our result, we might want to remove the last character and use the current one instead
  • But we can only remove a character from our result if it appears again later in the string

The crucial realization is that we need to know the last occurrence position of each character. This allows us to determine whether it's safe to remove a character from our current result - we can only remove it if we'll encounter it again later.

The algorithm becomes: maintain a stack of characters forming our result. For each new character, if it's smaller than the top of the stack AND the top character appears again later, we pop the stack. We keep popping while this condition holds. Then we add the current character (if not already present).

This greedy approach works because at each step, we're making the locally optimal choice that leads to the globally optimal lexicographically smallest subsequence.

Learn more about Stack, Greedy and Monotonic Stack patterns.

Solution Approach

The implementation uses a monotonic stack approach with additional tracking to ensure each character appears exactly once.

Data Structures Used:

  1. last: A dictionary that maps each character to its last occurrence index in the string
  2. stk: A stack (implemented as a list) to build the result subsequence
  3. vis: A set to track which characters are currently in the stack

Algorithm Steps:

  1. Preprocessing: Build the last dictionary by iterating through the string and recording the last index where each character appears:

    last = {c: i for i, c in enumerate(s)}
  2. Main Processing: Iterate through each character with its index:

    for i, c in enumerate(s):
  3. Skip Duplicates: If the character is already in our result (tracked by vis), skip it:

    if c in vis:
        continue
  4. Maintain Monotonic Property: Before adding the current character, check if we can improve the result by removing larger characters from the stack:

    while stk and stk[-1] > c and last[stk[-1]] > i:
        vis.remove(stk.pop())

    This condition checks:

    • stk: Stack is not empty
    • stk[-1] > c: The top character is lexicographically larger than the current character
    • last[stk[-1]] > i: The top character appears again later in the string (safe to remove)
  5. Add Current Character: After potentially removing larger characters, add the current character to the stack and mark it as visited:

    stk.append(c)
    vis.add(c)
  6. Build Result: After processing all characters, join the stack to form the final string:

    return "".join(stk)

The algorithm ensures that at each step, we maintain the smallest possible lexicographical order while guaranteeing that all distinct characters will eventually be included. The time complexity is O(n) where n is the length of the string, as each character is pushed and popped from the stack at most once.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's trace through the algorithm with s = "cbacdcbc".

Step 1: Preprocessing

  • Build last dictionary: {'c': 7, 'b': 6, 'a': 2, 'd': 4}
  • Initialize: stk = [], vis = set()

Step 2: Process each character

Index 0, character 'c':

  • 'c' not in vis
  • Stack is empty, no need to pop
  • Push 'c': stk = ['c'], vis = {'c'}

Index 1, character 'b':

  • 'b' not in vis
  • Check while condition: stk[-1] = 'c' > 'b' is True, and last['c'] = 7 > 1 is True
  • Pop 'c': stk = [], vis = {}
  • Push 'b': stk = ['b'], vis = {'b'}

Index 2, character 'a':

  • 'a' not in vis
  • Check while condition: stk[-1] = 'b' > 'a' is True, and last['b'] = 6 > 2 is True
  • Pop 'b': stk = [], vis = {}
  • Push 'a': stk = ['a'], vis = {'a'}

Index 3, character 'c':

  • 'c' not in vis
  • Check while condition: stk[-1] = 'a' > 'c' is False
  • Push 'c': stk = ['a', 'c'], vis = {'a', 'c'}

Index 4, character 'd':

  • 'd' not in vis
  • Check while condition: stk[-1] = 'c' > 'd' is False
  • Push 'd': stk = ['a', 'c', 'd'], vis = {'a', 'c', 'd'}

Index 5, character 'c':

  • 'c' in vis, skip

Index 6, character 'b':

  • 'b' not in vis
  • Check while condition: stk[-1] = 'd' > 'b' is True, and last['d'] = 4 > 6 is False
  • Cannot pop 'd' (it won't appear again)
  • Push 'b': stk = ['a', 'c', 'd', 'b'], vis = {'a', 'c', 'd', 'b'}

Index 7, character 'c':

  • 'c' in vis, skip

Step 3: Build result

  • Join stack: "acdb"

The algorithm successfully builds the lexicographically smallest subsequence containing all distinct characters exactly once. The key insight demonstrated is how we backtrack (pop from stack) when we find a smaller character, but only if we're guaranteed to see the popped character again later.

Solution Implementation

1class Solution:
2    def smallestSubsequence(self, s: str) -> str:
3        # Create a dictionary to store the last occurrence index of each character
4        last_occurrence = {char: index for index, char in enumerate(s)}
5      
6        # Stack to build the result string
7        stack = []
8      
9        # Set to track which characters are already in the stack
10        visited = set()
11      
12        # Iterate through each character in the string
13        for index, char in enumerate(s):
14            # Skip if character is already in our result
15            if char in visited:
16                continue
17          
18            # Remove characters from stack that are:
19            # 1. Greater than current character (lexicographically)
20            # 2. Will appear again later in the string
21            while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
22                removed_char = stack.pop()
23                visited.remove(removed_char)
24          
25            # Add current character to stack and mark as visited
26            stack.append(char)
27            visited.add(char)
28      
29        # Join the stack to form the final result string
30        return "".join(stack)
31
1class Solution {
2    public String smallestSubsequence(String text) {
3        // Count frequency of each character in the string
4        int[] charFrequency = new int[26];
5        for (char c : text.toCharArray()) {
6            charFrequency[c - 'a']++;
7        }
8      
9        // Track which characters are already in the result stack
10        boolean[] isInStack = new boolean[26];
11      
12        // Stack to build the result (using array for efficiency)
13        char[] stack = new char[text.length()];
14        int stackTop = -1;
15      
16        // Process each character in the string
17        for (char currentChar : text.toCharArray()) {
18            // Decrease remaining count for current character
19            charFrequency[currentChar - 'a']--;
20          
21            // Skip if character is already in the result stack
22            if (!isInStack[currentChar - 'a']) {
23                // Remove characters from stack that are:
24                // 1. Greater than current character (not lexicographically smallest)
25                // 2. Will appear again later (have remaining count > 0)
26                while (stackTop >= 0 && 
27                       currentChar < stack[stackTop] && 
28                       charFrequency[stack[stackTop] - 'a'] > 0) {
29                    // Mark the removed character as not in stack
30                    isInStack[stack[stackTop] - 'a'] = false;
31                    stackTop--;
32                }
33              
34                // Add current character to stack
35                stack[++stackTop] = currentChar;
36                // Mark current character as in stack
37                isInStack[currentChar - 'a'] = true;
38            }
39        }
40      
41        // Convert stack to string and return
42        return String.valueOf(stack, 0, stackTop + 1);
43    }
44}
45
1class Solution {
2public:
3    string smallestSubsequence(string s) {
4        int stringLength = s.size();
5      
6        // Store the last occurrence index of each character
7        int lastOccurrence[26] = {0};
8        for (int i = 0; i < stringLength; ++i) {
9            lastOccurrence[s[i] - 'a'] = i;
10        }
11      
12        // Result string to build the smallest subsequence
13        string result;
14      
15        // Bitmask to track which characters are already in the result
16        // Bit i is set if character ('a' + i) is in the result
17        int usedCharMask = 0;
18      
19        // Process each character in the input string
20        for (int i = 0; i < stringLength; ++i) {
21            char currentChar = s[i];
22            int charIndex = currentChar - 'a';
23          
24            // Skip if this character is already in the result
25            if ((usedCharMask >> charIndex) & 1) {
26                continue;
27            }
28          
29            // Remove characters from result that are:
30            // 1. Lexicographically larger than current character
31            // 2. Will appear again later in the string
32            while (!result.empty() && 
33                   result.back() > currentChar && 
34                   lastOccurrence[result.back() - 'a'] > i) {
35                // Remove the character from the bitmask
36                usedCharMask ^= 1 << (result.back() - 'a');
37                result.pop_back();
38            }
39          
40            // Add current character to result
41            result.push_back(currentChar);
42            // Mark this character as used in the bitmask
43            usedCharMask |= 1 << charIndex;
44        }
45      
46        return result;
47    }
48};
49
1/**
2 * Finds the lexicographically smallest subsequence that contains all unique characters from the input string
3 * @param s - The input string containing lowercase English letters
4 * @returns The lexicographically smallest subsequence with all unique characters
5 */
6function smallestSubsequence(s: string): string {
7    /**
8     * Converts a character to its index (0-25) based on lowercase alphabet
9     * @param char - A lowercase letter
10     * @returns The index of the character (a=0, b=1, ..., z=25)
11     */
12    const charToIndex = (char: string): number => {
13        return char.charCodeAt(0) - 'a'.charCodeAt(0);
14    };
15  
16    // Array to store the last occurrence index of each character in the string
17    const lastOccurrence: number[] = new Array(26).fill(0);
18  
19    // Record the last occurrence position for each character
20    for (const [index, char] of [...s].entries()) {
21        lastOccurrence[charToIndex(char)] = index;
22    }
23  
24    // Stack to build the result subsequence
25    const stack: string[] = [];
26  
27    // Bitmask to track which characters are already in the stack
28    // Each bit represents whether a character (a-z) is present
29    let charPresenceMask: number = 0;
30  
31    // Process each character in the string
32    for (const [currentIndex, currentChar] of [...s].entries()) {
33        const currentCharIndex: number = charToIndex(currentChar);
34      
35        // Skip if this character is already in our result stack
36        if ((charPresenceMask >> currentCharIndex) & 1) {
37            continue;
38        }
39      
40        // Remove characters from stack that are:
41        // 1. Lexicographically greater than current character
42        // 2. Will appear again later in the string
43        while (stack.length > 0 && 
44               stack[stack.length - 1] > currentChar && 
45               lastOccurrence[charToIndex(stack[stack.length - 1])] > currentIndex) {
46            // Remove the character from stack and update the presence mask
47            const removedChar: string = stack.pop()!;
48            charPresenceMask ^= 1 << charToIndex(removedChar);
49        }
50      
51        // Add current character to stack
52        stack.push(currentChar);
53      
54        // Mark current character as present in the stack
55        charPresenceMask |= 1 << currentCharIndex;
56    }
57  
58    // Join the stack to form the final result string
59    return stack.join('');
60}
61

Time and Space Complexity

Time Complexity: O(n) where n is the length of the input string s.

  • Creating the last dictionary takes O(n) time as we iterate through the string once.
  • The main loop iterates through each character in the string once, which is O(n).
  • Inside the loop, each character can be pushed to and popped from the stack at most once throughout the entire execution:
    • The if c in vis check is O(1) with a set.
    • The while loop might seem like it could lead to O(n²) complexity, but each character is pushed once and popped at most once, making the total work across all iterations O(n).
    • Adding to and removing from the set vis are O(1) operations.
  • Joining the stack at the end takes O(n) time.

Overall time complexity: O(n) + O(n) + O(n) = O(n)

Space Complexity: O(1) or O(26) = O(1) if we consider only lowercase English letters, otherwise O(k) where k is the number of unique characters.

  • The last dictionary stores at most one entry per unique character in the string, which is O(k) where k is the number of unique characters. For lowercase English letters, this is at most 26, hence O(1).
  • The stack stk can contain at most all unique characters, which is O(k).
  • The set vis similarly stores at most all unique characters, which is O(k).

Overall space complexity: O(k) where k is the number of unique characters in the string, which simplifies to O(1) for a fixed alphabet size.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

Pitfall 1: Forgetting to Check Future Occurrences Before Removing Characters

The Problem: A common mistake is to greedily remove any character that is lexicographically larger than the current character without checking if it appears later in the string. This can lead to missing characters in the final result.

Incorrect Implementation:

# WRONG: Missing the last occurrence check
while stack and stack[-1] > char:  # Missing: and last_occurrence[stack[-1]] > index
    removed_char = stack.pop()
    visited.remove(removed_char)

Example Where It Fails: For input s = "ecbacba":

  • Without the last occurrence check, when we encounter the first 'a', we might remove 'e', 'c', and 'b'
  • But 'e' only appears once at the beginning, so removing it means we lose it forever
  • The result would be missing the character 'e'

Solution: Always verify that a character appears later before removing it:

while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
    removed_char = stack.pop()
    visited.remove(removed_char)

Pitfall 2: Not Tracking Visited Characters Properly

The Problem: Without properly tracking which characters are already in the stack, you might add duplicates or incorrectly process the same character multiple times.

Incorrect Implementation:

# WRONG: Not maintaining visited set properly
for index, char in enumerate(s):
    # Missing: if char in visited: continue
    while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
        stack.pop()  # Missing: visited.remove(stack.pop())
    stack.append(char)
    # Missing: visited.add(char)

Example Where It Fails: For input s = "bcabc":

  • Without the visited check, 'b' would be added multiple times
  • The result might be "bbac" or "bcabc" instead of "abc"

Solution: Maintain a visited set that accurately reflects what's currently in the stack:

  1. Skip characters already in the result
  2. Remove from visited when popping from stack
  3. Add to visited when pushing to stack

Pitfall 3: Using Wrong Data Structure for Last Occurrence

The Problem: Some might try to check for future occurrences by scanning the remaining string each time, leading to O(n²) time complexity.

Incorrect Implementation:

# WRONG: Inefficient O(n²) approach
for index, char in enumerate(s):
    if char in visited:
        continue
    while stack and stack[-1] > char:
        # Scanning remaining string every time - O(n) operation
        if stack[-1] in s[index+1:]:  # Inefficient!
            removed_char = stack.pop()
            visited.remove(removed_char)
        else:
            break

Solution: Pre-compute the last occurrence of each character in O(n) time:

last_occurrence = {char: index for index, char in enumerate(s)}
# Now checking is O(1): last_occurrence[stack[-1]] > index
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

What's the relationship between a tree and a graph?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More