1081. Smallest Subsequence of Distinct Characters
Problem Description
Given a string s
, you need to find the lexicographically smallest subsequence that contains every distinct character from s
exactly once.
A subsequence is a sequence that can be derived from the original string by deleting some (possibly zero) characters without changing the order of the remaining characters. For example, "ace" is a subsequence of "aebdc".
Lexicographically smallest means the subsequence that would appear first in dictionary order. For instance, "abc" comes before "abd" lexicographically.
The key requirements are:
- The result must be a subsequence of the original string (maintain relative order)
- Every unique character from the original string must appear exactly once in the result
- Among all valid subsequences meeting the above criteria, return the lexicographically smallest one
For example:
- If
s = "bcabc"
, the distinct characters are 'a', 'b', and 'c'. The possible subsequences containing each character once include "bca", "bac", "cab", "abc", etc. Among these, "abc" is the lexicographically smallest, so it's the answer. - If
s = "cbacdcbc"
, the answer would be "acdb" as it contains all distinct characters ('a', 'b', 'c', 'd') exactly once and is the smallest lexicographical subsequence possible.
Intuition
The challenge is to build the lexicographically smallest subsequence while ensuring we include all distinct characters. The key insight is that we want smaller characters to appear as early as possible in our result.
Consider this: when we encounter a character, we face a decision - should we include it now or wait for a potentially better position later? The answer depends on two factors:
- Can we safely skip this character because it appears again later?
- Can we improve our current result by removing some previously added characters?
This naturally leads us to think about using a stack (or similar structure) to build our answer. As we process each character, we can make greedy decisions:
- If we've already included this character, skip it (we only want each character once)
- If the current character is smaller than the last character in our result, we might want to remove the last character and use the current one instead
- But we can only remove a character from our result if it appears again later in the string
The crucial realization is that we need to know the last occurrence position of each character. This allows us to determine whether it's safe to remove a character from our current result - we can only remove it if we'll encounter it again later.
The algorithm becomes: maintain a stack of characters forming our result. For each new character, if it's smaller than the top of the stack AND the top character appears again later, we pop the stack. We keep popping while this condition holds. Then we add the current character (if not already present).
This greedy approach works because at each step, we're making the locally optimal choice that leads to the globally optimal lexicographically smallest subsequence.
Learn more about Stack, Greedy and Monotonic Stack patterns.
Solution Approach
The implementation uses a monotonic stack approach with additional tracking to ensure each character appears exactly once.
Data Structures Used:
last
: A dictionary that maps each character to its last occurrence index in the stringstk
: A stack (implemented as a list) to build the result subsequencevis
: A set to track which characters are currently in the stack
Algorithm Steps:
-
Preprocessing: Build the
last
dictionary by iterating through the string and recording the last index where each character appears:last = {c: i for i, c in enumerate(s)}
-
Main Processing: Iterate through each character with its index:
for i, c in enumerate(s):
-
Skip Duplicates: If the character is already in our result (tracked by
vis
), skip it:if c in vis: continue
-
Maintain Monotonic Property: Before adding the current character, check if we can improve the result by removing larger characters from the stack:
while stk and stk[-1] > c and last[stk[-1]] > i: vis.remove(stk.pop())
This condition checks:
stk
: Stack is not emptystk[-1] > c
: The top character is lexicographically larger than the current characterlast[stk[-1]] > i
: The top character appears again later in the string (safe to remove)
-
Add Current Character: After potentially removing larger characters, add the current character to the stack and mark it as visited:
stk.append(c) vis.add(c)
-
Build Result: After processing all characters, join the stack to form the final string:
return "".join(stk)
The algorithm ensures that at each step, we maintain the smallest possible lexicographical order while guaranteeing that all distinct characters will eventually be included. The time complexity is O(n) where n is the length of the string, as each character is pushed and popped from the stack at most once.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's trace through the algorithm with s = "cbacdcbc"
.
Step 1: Preprocessing
- Build
last
dictionary:{'c': 7, 'b': 6, 'a': 2, 'd': 4}
- Initialize:
stk = []
,vis = set()
Step 2: Process each character
Index 0, character 'c':
- 'c' not in
vis
- Stack is empty, no need to pop
- Push 'c':
stk = ['c']
,vis = {'c'}
Index 1, character 'b':
- 'b' not in
vis
- Check while condition:
stk[-1] = 'c' > 'b'
is True, andlast['c'] = 7 > 1
is True - Pop 'c':
stk = []
,vis = {}
- Push 'b':
stk = ['b']
,vis = {'b'}
Index 2, character 'a':
- 'a' not in
vis
- Check while condition:
stk[-1] = 'b' > 'a'
is True, andlast['b'] = 6 > 2
is True - Pop 'b':
stk = []
,vis = {}
- Push 'a':
stk = ['a']
,vis = {'a'}
Index 3, character 'c':
- 'c' not in
vis
- Check while condition:
stk[-1] = 'a' > 'c'
is False - Push 'c':
stk = ['a', 'c']
,vis = {'a', 'c'}
Index 4, character 'd':
- 'd' not in
vis
- Check while condition:
stk[-1] = 'c' > 'd'
is False - Push 'd':
stk = ['a', 'c', 'd']
,vis = {'a', 'c', 'd'}
Index 5, character 'c':
- 'c' in
vis
, skip
Index 6, character 'b':
- 'b' not in
vis
- Check while condition:
stk[-1] = 'd' > 'b'
is True, andlast['d'] = 4 > 6
is False - Cannot pop 'd' (it won't appear again)
- Push 'b':
stk = ['a', 'c', 'd', 'b']
,vis = {'a', 'c', 'd', 'b'}
Index 7, character 'c':
- 'c' in
vis
, skip
Step 3: Build result
- Join stack:
"acdb"
The algorithm successfully builds the lexicographically smallest subsequence containing all distinct characters exactly once. The key insight demonstrated is how we backtrack (pop from stack) when we find a smaller character, but only if we're guaranteed to see the popped character again later.
Solution Implementation
1class Solution:
2 def smallestSubsequence(self, s: str) -> str:
3 # Create a dictionary to store the last occurrence index of each character
4 last_occurrence = {char: index for index, char in enumerate(s)}
5
6 # Stack to build the result string
7 stack = []
8
9 # Set to track which characters are already in the stack
10 visited = set()
11
12 # Iterate through each character in the string
13 for index, char in enumerate(s):
14 # Skip if character is already in our result
15 if char in visited:
16 continue
17
18 # Remove characters from stack that are:
19 # 1. Greater than current character (lexicographically)
20 # 2. Will appear again later in the string
21 while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
22 removed_char = stack.pop()
23 visited.remove(removed_char)
24
25 # Add current character to stack and mark as visited
26 stack.append(char)
27 visited.add(char)
28
29 # Join the stack to form the final result string
30 return "".join(stack)
31
1class Solution {
2 public String smallestSubsequence(String text) {
3 // Count frequency of each character in the string
4 int[] charFrequency = new int[26];
5 for (char c : text.toCharArray()) {
6 charFrequency[c - 'a']++;
7 }
8
9 // Track which characters are already in the result stack
10 boolean[] isInStack = new boolean[26];
11
12 // Stack to build the result (using array for efficiency)
13 char[] stack = new char[text.length()];
14 int stackTop = -1;
15
16 // Process each character in the string
17 for (char currentChar : text.toCharArray()) {
18 // Decrease remaining count for current character
19 charFrequency[currentChar - 'a']--;
20
21 // Skip if character is already in the result stack
22 if (!isInStack[currentChar - 'a']) {
23 // Remove characters from stack that are:
24 // 1. Greater than current character (not lexicographically smallest)
25 // 2. Will appear again later (have remaining count > 0)
26 while (stackTop >= 0 &&
27 currentChar < stack[stackTop] &&
28 charFrequency[stack[stackTop] - 'a'] > 0) {
29 // Mark the removed character as not in stack
30 isInStack[stack[stackTop] - 'a'] = false;
31 stackTop--;
32 }
33
34 // Add current character to stack
35 stack[++stackTop] = currentChar;
36 // Mark current character as in stack
37 isInStack[currentChar - 'a'] = true;
38 }
39 }
40
41 // Convert stack to string and return
42 return String.valueOf(stack, 0, stackTop + 1);
43 }
44}
45
1class Solution {
2public:
3 string smallestSubsequence(string s) {
4 int stringLength = s.size();
5
6 // Store the last occurrence index of each character
7 int lastOccurrence[26] = {0};
8 for (int i = 0; i < stringLength; ++i) {
9 lastOccurrence[s[i] - 'a'] = i;
10 }
11
12 // Result string to build the smallest subsequence
13 string result;
14
15 // Bitmask to track which characters are already in the result
16 // Bit i is set if character ('a' + i) is in the result
17 int usedCharMask = 0;
18
19 // Process each character in the input string
20 for (int i = 0; i < stringLength; ++i) {
21 char currentChar = s[i];
22 int charIndex = currentChar - 'a';
23
24 // Skip if this character is already in the result
25 if ((usedCharMask >> charIndex) & 1) {
26 continue;
27 }
28
29 // Remove characters from result that are:
30 // 1. Lexicographically larger than current character
31 // 2. Will appear again later in the string
32 while (!result.empty() &&
33 result.back() > currentChar &&
34 lastOccurrence[result.back() - 'a'] > i) {
35 // Remove the character from the bitmask
36 usedCharMask ^= 1 << (result.back() - 'a');
37 result.pop_back();
38 }
39
40 // Add current character to result
41 result.push_back(currentChar);
42 // Mark this character as used in the bitmask
43 usedCharMask |= 1 << charIndex;
44 }
45
46 return result;
47 }
48};
49
1/**
2 * Finds the lexicographically smallest subsequence that contains all unique characters from the input string
3 * @param s - The input string containing lowercase English letters
4 * @returns The lexicographically smallest subsequence with all unique characters
5 */
6function smallestSubsequence(s: string): string {
7 /**
8 * Converts a character to its index (0-25) based on lowercase alphabet
9 * @param char - A lowercase letter
10 * @returns The index of the character (a=0, b=1, ..., z=25)
11 */
12 const charToIndex = (char: string): number => {
13 return char.charCodeAt(0) - 'a'.charCodeAt(0);
14 };
15
16 // Array to store the last occurrence index of each character in the string
17 const lastOccurrence: number[] = new Array(26).fill(0);
18
19 // Record the last occurrence position for each character
20 for (const [index, char] of [...s].entries()) {
21 lastOccurrence[charToIndex(char)] = index;
22 }
23
24 // Stack to build the result subsequence
25 const stack: string[] = [];
26
27 // Bitmask to track which characters are already in the stack
28 // Each bit represents whether a character (a-z) is present
29 let charPresenceMask: number = 0;
30
31 // Process each character in the string
32 for (const [currentIndex, currentChar] of [...s].entries()) {
33 const currentCharIndex: number = charToIndex(currentChar);
34
35 // Skip if this character is already in our result stack
36 if ((charPresenceMask >> currentCharIndex) & 1) {
37 continue;
38 }
39
40 // Remove characters from stack that are:
41 // 1. Lexicographically greater than current character
42 // 2. Will appear again later in the string
43 while (stack.length > 0 &&
44 stack[stack.length - 1] > currentChar &&
45 lastOccurrence[charToIndex(stack[stack.length - 1])] > currentIndex) {
46 // Remove the character from stack and update the presence mask
47 const removedChar: string = stack.pop()!;
48 charPresenceMask ^= 1 << charToIndex(removedChar);
49 }
50
51 // Add current character to stack
52 stack.push(currentChar);
53
54 // Mark current character as present in the stack
55 charPresenceMask |= 1 << currentCharIndex;
56 }
57
58 // Join the stack to form the final result string
59 return stack.join('');
60}
61
Time and Space Complexity
Time Complexity: O(n)
where n
is the length of the input string s
.
- Creating the
last
dictionary takesO(n)
time as we iterate through the string once. - The main loop iterates through each character in the string once, which is
O(n)
. - Inside the loop, each character can be pushed to and popped from the stack at most once throughout the entire execution:
- The
if c in vis
check isO(1)
with a set. - The while loop might seem like it could lead to
O(n²)
complexity, but each character is pushed once and popped at most once, making the total work across all iterationsO(n)
. - Adding to and removing from the set
vis
areO(1)
operations.
- The
- Joining the stack at the end takes
O(n)
time.
Overall time complexity: O(n) + O(n) + O(n) = O(n)
Space Complexity: O(1)
or O(26) = O(1)
if we consider only lowercase English letters, otherwise O(k)
where k
is the number of unique characters.
- The
last
dictionary stores at most one entry per unique character in the string, which isO(k)
wherek
is the number of unique characters. For lowercase English letters, this is at most 26, henceO(1)
. - The stack
stk
can contain at most all unique characters, which isO(k)
. - The set
vis
similarly stores at most all unique characters, which isO(k)
.
Overall space complexity: O(k)
where k
is the number of unique characters in the string, which simplifies to O(1)
for a fixed alphabet size.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
Pitfall 1: Forgetting to Check Future Occurrences Before Removing Characters
The Problem: A common mistake is to greedily remove any character that is lexicographically larger than the current character without checking if it appears later in the string. This can lead to missing characters in the final result.
Incorrect Implementation:
# WRONG: Missing the last occurrence check while stack and stack[-1] > char: # Missing: and last_occurrence[stack[-1]] > index removed_char = stack.pop() visited.remove(removed_char)
Example Where It Fails:
For input s = "ecbacba"
:
- Without the last occurrence check, when we encounter the first 'a', we might remove 'e', 'c', and 'b'
- But 'e' only appears once at the beginning, so removing it means we lose it forever
- The result would be missing the character 'e'
Solution: Always verify that a character appears later before removing it:
while stack and stack[-1] > char and last_occurrence[stack[-1]] > index: removed_char = stack.pop() visited.remove(removed_char)
Pitfall 2: Not Tracking Visited Characters Properly
The Problem: Without properly tracking which characters are already in the stack, you might add duplicates or incorrectly process the same character multiple times.
Incorrect Implementation:
# WRONG: Not maintaining visited set properly
for index, char in enumerate(s):
# Missing: if char in visited: continue
while stack and stack[-1] > char and last_occurrence[stack[-1]] > index:
stack.pop() # Missing: visited.remove(stack.pop())
stack.append(char)
# Missing: visited.add(char)
Example Where It Fails:
For input s = "bcabc"
:
- Without the visited check, 'b' would be added multiple times
- The result might be "bbac" or "bcabc" instead of "abc"
Solution:
Maintain a visited
set that accurately reflects what's currently in the stack:
- Skip characters already in the result
- Remove from
visited
when popping from stack - Add to
visited
when pushing to stack
Pitfall 3: Using Wrong Data Structure for Last Occurrence
The Problem: Some might try to check for future occurrences by scanning the remaining string each time, leading to O(n²) time complexity.
Incorrect Implementation:
# WRONG: Inefficient O(n²) approach
for index, char in enumerate(s):
if char in visited:
continue
while stack and stack[-1] > char:
# Scanning remaining string every time - O(n) operation
if stack[-1] in s[index+1:]: # Inefficient!
removed_char = stack.pop()
visited.remove(removed_char)
else:
break
Solution: Pre-compute the last occurrence of each character in O(n) time:
last_occurrence = {char: index for index, char in enumerate(s)}
# Now checking is O(1): last_occurrence[stack[-1]] > index
What's the relationship between a tree and a graph?
Recommended Readings
Stack Intro Imagine you have a pile of books on your desk If you want to add a new book you place it on top If you want to read a book you take it from the top And if you simply want to see which book is on the top you
Greedy Introduction div class responsive iframe iframe src https www youtube com embed WTslqPbj7I title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture web share allowfullscreen iframe div When do we use greedy Greedy algorithms tend to solve optimization problems Typically they will ask you to calculate the max min of some value Commonly you may see this phrased in the problem as max min longest shortest largest smallest etc These keywords can be identified by just scanning
Monotonic Stack Deque Intro If you'd prefer a video div class responsive iframe iframe src https www youtube com embed Dq_ObZwTY_Q title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture web share allowfullscreen iframe div The word monotonic means a list or
Want a Structured Path to Master System Design Too? Don’t Miss This!