1081. Smallest Subsequence of Distinct Characters
Problem Description
The problem requires us to find a specific type of subsequence from a given string s
. A subsequence is a sequence that can be derived from the original string by deleting some or no characters without changing the order of the remaining characters. The goal is to determine the lexicographically smallest subsequence that contains every distinct character of s
exactly once.
Let's break it down:
- Lexicographically smallest: This means the sequence that would appear first if all possible subsequences were sorted in dictionary order.
- Subsequence: This is a sequence that can be derived from another sequence by deleting some or none of the elements without changing the order of the remaining elements.
- Contains all the distinct characters of
s
exactly once: The subsequence must have all unique characters that appear ins
, and none of these characters should be repeated.
This is like a puzzle where you need to choose characters, ensuring that you include each unique character at least once, but you can't choose a later character if it will cause an earlier unique character to never appear again in the remaining part of the string s
.
Intuition
To find the lexicographically smallest subsequence, we should take the smallest available character unless taking it would prevent us from including another distinct character later on. We can use a stack to keep track of the characters in the current subsequence as we iterate through the string. If we can add a smaller character and still have the opportunity to add the characters currently in the stack later, we should do that to ensure our subsequence is the smallest possible lexicographically.
The intuition to make this efficient is as follows:
- Track the last occurrence of each character, so we know if we'll see a character again in the future.
- Use a stack to maintain the characters of the current smallest subsequence.
- Make use of a set to quickly check if a character is already in our stack (and hence, in our current smallest subsequence).
- When we consider adding a new character, check if it is smaller than the last one on the stack, and if we'll encounter the last one on the stack again later (thanks to our tracking of the last occurrence).
- If both conditions are true, we can safely pop the larger character off the stack and remove it from our set without fearing that we won't have it in our subsequence.
By following these steps, we can build the smallest subsequence as we iterate through the string s
once.
Learn more about Stack, Greedy and Monotonic Stack patterns.
Solution Approach
The solution uses a greedy algorithm with a stack to generate the lexicographically smallest subsequence. Here is a step-by-step explanation of how the code makes use of this approach:
-
Track the Last Occurrence: We need to know the last position at which each character appears in the string. This is essential to decide whether we can drop a character from the stack for a smaller one. The
last
dictionary is used to store the last index for every character using a dictionary comprehension:last = {c: i for i, c in enumerate(s)}
. -
Stack for Current Smallest Subsequence: A stack data structure is ideal for maintaining the current sequence of characters. As we iterate through the string, characters are pushed onto the stack if they can potentially be part of the lexicographically smallest subsequence. The stack,
stk
, is initialized as an empty list[]
. -
Set for Visited Characters: To ensure that each character is added only once to the subsequence, a set
vis
is used to track the characters that are already present in the stack. -
Iterate Over the String: We iterate over each character
c
and its indexi
in the string usingfor i, c in enumerate(s)
. If the character has already been visited, we continue to the next iteration withcontinue
. -
Character Comparison and Stack Pop: This is where the greedy algorithm comes into play. For the current character
c
, we check if the stack is not empty and the char at the top of the stack is greater thanc
, and also if the character at the top of the stack occurs later in the string (last[stk[-1]] > i
). If all these conditions are true, it means we can pop the top of the stack to make our subsequence lexicographically smaller and we are sure that this character will come again later, thanks to our last occurrence tracking. We pop the character from stack and remove it fromvis
set. -
Add the Current Character to the Stack and Visited Set: After making sure the characters on the stack are in the correct lexicographical order and popping those that aren't, we can safely push the current character onto the stack and mark it as visited by adding it to the
vis
set withvis.add(c)
. -
Build and Return the Result: At the end, the stack
stk
contains the lexicographically smallest subsequence. We join all the characters in the stack to form a string and return it withreturn "".join(stk)
.
By using a stack along with a set and a last occurrence tracking dictionary, the given Python code efficiently computes the lexicographically smallest subsequence that contains all distinct characters of the input string s
exactly once.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's consider a simple example to illustrate the solution approach with the string s = "cbacdcbc"
. We want to find the lexicographically smallest subsequence which contains every distinct character exactly once.
-
Track the Last Occurrence: First, we create a dictionary to track the last occurrence of each character:
- last occurrence of 'c' at index 7
- last occurrence of 'b' at index 6
- last occurrence of 'a' at index 2
- last occurrence of 'd' at index 4
So,
last = {'c': 7, 'b': 6, 'a': 2, 'd': 4}
.
-
Stack for Current Smallest Subsequence: We initialize an empty stack
stk = []
to keep track of the subsequence characters. -
Set for Visited Characters: We also have a set
vis = {}
to mark characters that we have already seen and added to the stack. -
Iterate Over the String: We iterate over each character in
"cbacdcbc"
. Let's go through this process step by step:i. On encountering the first 'c', it is not in
vis
, so we add 'c' tostk
andvis
.ii. Next, 'b' is not in
vis
, so we add 'b' tostk
andvis
.stk
now contains ['c', 'b'].iii. 'a' is not in
vis
, so we add 'a' tostk
. However, 'a' is smaller than 'b' and 'c', and both 'b' and 'c' will come later in the string. We pop 'b' and 'c' from the stack and remove them fromvis
. Then, add 'a' tostk
andvis
.stk
now contains ['a'].iv. 'c' comes again, we add it back since 'a' < 'c' and 'c' is not in
vis
.stk
now contains ['a', 'c'].v. 'd' is encountered, not in
vis
, so we add it tostk
.stk
now contains ['a', 'c', 'd'].vi. Next, we see another 'c'. It's already in
vis
, so we skip it.vii. 'b' comes next, is not in
vis
, and is less than 'd'. Since 'd' appears again later (we know from our last occurrence dictionary), we will pop 'd' fromstk
and remove it fromvis
, and add 'b' instead.stk
now contains ['a', 'c', 'b'].viii. Lastly, 'c' comes again, but it is in
vis
, so we skip it. -
Character Comparison and Stack Pop: Throughout the iteration, whenever we meet a smaller character, we check if we can pop the larger ones before it, as described above.
-
Add the Current Character to the Stack and Visited Set: We've been doing this in each iteration whenever necessary.
-
Build and Return the Result: At the end,
stk
contains the lexicographically smallest subsequence. We join the elements of the stack to get "acb", which is our final answer.
We successfully found the lexicographically smallest subsequence "acb" that contains every distinct character of s
exactly once.
Solution Implementation
1class Solution:
2 def smallestSubsequence(self, s: str) -> str:
3 # Dictionary to store the last occurrence index for each character
4 last_occurrence = {character: index for index, character in enumerate(s)}
5
6 # Result stack to build the smallest subsequence
7 stack = []
8
9 # Set to keep track of characters already in the stack
10 visited = set()
11
12 # Iterate through the string's characters
13 for index, character in enumerate(s):
14 # Ignore characters already added to the stack
15 if character in visited:
16 continue
17
18 # Ensure characters in stack are in ascending order and
19 # remove any character that can be replaced with a lesser character
20 # that occurs later
21 while stack and stack[-1] > character and last_occurrence[stack[-1]] > index:
22 # Remove the character from visited when it is popped from the stack
23 visited.remove(stack.pop())
24
25 # Add the current character to the stack and mark it as visited
26 stack.append(character)
27 visited.add(character)
28
29 # Join the characters in the stack to form the smallest subsequence
30 return "".join(stack)
31
1class Solution {
2 public String smallestSubsequence(String text) {
3 // Count array for each character 'a' through 'z'
4 int[] charCount = new int[26];
5 // Fill the count array with the frequency of each character in the text
6 for (char c : text.toCharArray()) {
7 charCount[c - 'a']++;
8 }
9
10 // Visited array to track if a character is in the current result
11 boolean[] visited = new boolean[26];
12 // Result array to build the subsequence
13 char[] result = new char[text.length()];
14 // Stack pointer initialization
15 int stackTop = -1;
16
17 // Iterate over each character in the input text
18 for (char c : text.toCharArray()) {
19 // Decrement the count of the current character
20 charCount[c - 'a']--;
21 // If the character has not been visited, we need to include it in the result
22 if (!visited[c - 'a']) {
23 // While the stack is not empty, the current character is smaller than the
24 // character at the stack's top and the character at the stack's top still
25 // occurs later in the text (i.e., the count is greater than 0), we pop
26 // from the stack marking it as not visited
27 while (stackTop >= 0 && c < result[stackTop] && charCount[result[stackTop] - 'a'] > 0) {
28 visited[result[stackTop] - 'a'] = false;
29 stackTop--;
30 }
31 // Push the current character onto the stack and mark it as visited
32 result[++stackTop] = c;
33 visited[c - 'a'] = true;
34 }
35 }
36
37 // Build the output string from the stack, which contains the smallest subsequence
38 return String.valueOf(result, 0, stackTop + 1);
39 }
40}
41
1class Solution {
2public:
3 string smallestSubsequence(string s) {
4 int strSize = s.size(); // Determine the length of the input string.
5
6 // Array to store the last position (index) of each character in the string.
7 int lastIndexOf[26] = {0};
8 for (int i = 0; i < strSize; ++i) {
9 lastIndexOf[s[i] - 'a'] = i; // Populate the array with the last index of each character.
10 }
11
12 string result; // This will hold the smallest lexicographical subsequence.
13 int presenceMask = 0; // Bitmask to keep track of characters included in the result.
14
15 for (int i = 0; i < strSize; ++i) {
16 char currentChar = s[i]; // The current character we're considering to add to the result.
17
18 // Check if the character is already included in the result using the presence mask.
19 if ((presenceMask >> (currentChar - 'a')) & 1) {
20 continue; // If it's already present, move to the next character.
21 }
22
23 // While there are characters in the result, the last character in the result is
24 // greater than the current character, and the last occurrence of the last character
25 // in the result is after the current character in the original string...
26 while (!result.empty() && result.back() > currentChar && lastIndexOf[result.back() - 'a'] > i) {
27 // Remove the last character from the result since we've found
28 // a better character to maintain lexicographical order.
29 presenceMask ^= 1 << (result.back() - 'a'); // Update the presence mask.
30 result.pop_back(); // Remove the last character from the result.
31 }
32
33 // Add the current character to the result,
34 // and update the presence mask accordingly.
35 result.push_back(currentChar);
36 presenceMask |= 1 << (currentChar - 'a');
37 }
38
39 // Once the loop is done, the 'result' contains the smallest lexicographical subsequence.
40 return result;
41 }
42};
43
1function smallestSubsequence(s: string): string {
2 // Function to get the index of a character in the alphabet
3 const getIndex = (char: string): number => char.charCodeAt(0) - 'a'.charCodeAt(0);
4
5 // Last occurrence index of each character in the alphabet
6 const lastOccurrence: number[] = new Array(26).fill(0);
7 // Update last occurrence index for each character in the string
8 [...s].forEach((char, index) => {
9 lastOccurrence[getIndex(char)] = index;
10 });
11
12 // Stack to hold the characters for the result
13 const stack: string[] = [];
14 // Bitmask to track which characters are in the stack
15 let mask = 0;
16
17 // Iterate over each character in the string
18 [...s].forEach((char, index) => {
19 const charIndex = getIndex(char);
20
21 // Skip if the character is already in the stack
22 if ((mask >> charIndex) & 1) {
23 return;
24 }
25
26 // Ensure characters in stack are smallest possible and remove if not needed
27 while (stack.length > 0 && stack[stack.length - 1] > char && lastOccurrence[getIndex(stack[stack.length - 1])] > index) {
28 const lastCharIndex = getIndex(stack.pop()!);
29 mask ^= 1 << lastCharIndex;
30 }
31
32 // Add the current character to the stack
33 stack.push(char);
34 // Update the bitmask to include the current character
35 mask |= 1 << charIndex;
36 });
37
38 // Join the stack into a string and return it as the smallest subsequence
39 return stack.join('');
40}
41
Time and Space Complexity
Time Complexity
The provided code's time complexity can be analyzed based on the number of iterations and operations that are performed:
-
Building the
last
dictionary: This takesO(n)
time, wheren
is the length of strings
, as we go through every character of the string once. -
Iterating through string
s
: The main for loop runs for every character, so it isO(n)
. However, we must also consider the nested while loop. -
The nested while loop: Although there is a while loop inside the for loop, each element is added to the
stk
only once because of thevis
set check, and each element is removed fromstk
only once. This means, in total, the while loop will run at mostn
times for the entire for loop. This does not change the overall time complexity which remainsO(n)
.
Combining these steps, the overall time complexity is O(n)
where n
is the length of the input string s
.
Space Complexity
Analyzing the space used by the code:
-
The
last
dictionary: The dictionary holds a mapping for each unique character to its last occurrence ins
. In the worst case, where all characters are unique, it will holdn
entries, which leads toO(n)
space. -
The
stk
list: As with the dictionary, in the worst case, it may hold all characters if they are all unique, leading toO(n)
space. -
The
vis
set: This also, in the worst-case scenario, will holdn
entries forn
unique characters ins
, usingO(n)
space.
Considering all the auxiliary space used, the overall space complexity is O(n)
.
Putting it all together in the markdown template:
1
2
3### Time Complexity
4
5The time complexity of the code is `O(n)`, where `n` is the length of the given string `s`. This is due to the fact that the for loop will iterate `n` times and the nested while loop, in combination with the stack operations and the set checks, will still result in a linear number of operations in total across all iterations.
6
7### Space Complexity
8
9The space complexity of the code is `O(n)` as well, due to the storage requirements of the `last` dictionary, the `stk` list, and the `vis` set, all of which in the worst case scale linearly with the number of characters `n` in the string `s`.
Learn more about how to find time and space complexity quickly using problem constraints.
Which algorithm should you use to find a node that is close to the root of the tree?
Recommended Readings
Stack Intro Imagine you have a pile of books on your desk If you want to add a new book you place it on top If you want to read a book you take it from the top And if you simply want to see which book is on the top you
Greedy Introduction div class responsive iframe iframe src https www youtube com embed WTslqPbj7I title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture web share allowfullscreen iframe div When do we use greedy Greedy algorithms tend to solve optimization problems Typically they will ask you to calculate the max min of some value Commonly you may see this phrased in the problem as max min longest shortest largest smallest etc These keywords can be identified by just scanning
Monotonic Stack Deque Intro If you'd prefer a video div class responsive iframe iframe src https www youtube com embed Dq_ObZwTY_Q title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture web share allowfullscreen iframe div The word monotonic means a list or
Got a question?ย Ask the Monster Assistantย anything you don't understand.
Still not clear? ย Submitย the part you don't understand to our editors. Or join ourย Discord and ask the community.