1081. Smallest Subsequence of Distinct Characters


Problem Description

The problem requires us to find a specific type of subsequence from a given string s. A subsequence is a sequence that can be derived from the original string by deleting some or no characters without changing the order of the remaining characters. The goal is to determine the lexicographically smallest subsequence that contains every distinct character of s exactly once.

Let's break it down:

  • Lexicographically smallest: This means the sequence that would appear first if all possible subsequences were sorted in dictionary order.
  • Subsequence: This is a sequence that can be derived from another sequence by deleting some or none of the elements without changing the order of the remaining elements.
  • Contains all the distinct characters of s exactly once: The subsequence must have all unique characters that appear in s, and none of these characters should be repeated.

This is like a puzzle where you need to choose characters, ensuring that you include each unique character at least once, but you can't choose a later character if it will cause an earlier unique character to never appear again in the remaining part of the string s.

Intuition

To find the lexicographically smallest subsequence, we should take the smallest available character unless taking it would prevent us from including another distinct character later on. We can use a stack to keep track of the characters in the current subsequence as we iterate through the string. If we can add a smaller character and still have the opportunity to add the characters currently in the stack later, we should do that to ensure our subsequence is the smallest possible lexicographically.

The intuition to make this efficient is as follows:

  • Track the last occurrence of each character, so we know if we'll see a character again in the future.
  • Use a stack to maintain the characters of the current smallest subsequence.
  • Make use of a set to quickly check if a character is already in our stack (and hence, in our current smallest subsequence).
  • When we consider adding a new character, check if it is smaller than the last one on the stack, and if we'll encounter the last one on the stack again later (thanks to our tracking of the last occurrence).
  • If both conditions are true, we can safely pop the larger character off the stack and remove it from our set without fearing that we won't have it in our subsequence.

By following these steps, we can build the smallest subsequence as we iterate through the string s once.

Solution Approach

The solution uses a greedy algorithm with a stack to generate the lexicographically smallest subsequence. Here is a step-by-step explanation of how the code makes use of this approach:

  1. Track the Last Occurrence: We need to know the last position at which each character appears in the string. This is essential to decide whether we can drop a character from the stack for a smaller one. The last dictionary is used to store the last index for every character using a dictionary comprehension: last = {c: i for i, c in enumerate(s)}.

  2. Stack for Current Smallest Subsequence: A stack data structure is ideal for maintaining the current sequence of characters. As we iterate through the string, characters are pushed onto the stack if they can potentially be part of the lexicographically smallest subsequence. The stack, stk, is initialized as an empty list [].

  3. Set for Visited Characters: To ensure that each character is added only once to the subsequence, a set vis is used to track the characters that are already present in the stack.

  4. Iterate Over the String: We iterate over each character c and its index i in the string using for i, c in enumerate(s). If the character has already been visited, we continue to the next iteration with continue.

  5. Character Comparison and Stack Pop: This is where the greedy algorithm comes into play. For the current character c, we check if the stack is not empty and the char at the top of the stack is greater than c, and also if the character at the top of the stack occurs later in the string (last[stk[-1]] > i). If all these conditions are true, it means we can pop the top of the stack to make our subsequence lexicographically smaller and we are sure that this character will come again later, thanks to our last occurrence tracking. We pop the character from stack and remove it from vis set.

  6. Add the Current Character to the Stack and Visited Set: After making sure the characters on the stack are in the correct lexicographical order and popping those that aren't, we can safely push the current character onto the stack and mark it as visited by adding it to the vis set with vis.add(c).

  7. Build and Return the Result: At the end, the stack stk contains the lexicographically smallest subsequence. We join all the characters in the stack to form a string and return it with return "".join(stk).

By using a stack along with a set and a last occurrence tracking dictionary, the given Python code efficiently computes the lexicographically smallest subsequence that contains all distinct characters of the input string s exactly once.

💪
Level Up Your
Algo Skills

Example Walkthrough

Let's consider a simple example to illustrate the solution approach with the string s = "cbacdcbc". We want to find the lexicographically smallest subsequence which contains every distinct character exactly once.

  1. Track the Last Occurrence: First, we create a dictionary to track the last occurrence of each character:

    • last occurrence of 'c' at index 7
    • last occurrence of 'b' at index 6
    • last occurrence of 'a' at index 2
    • last occurrence of 'd' at index 4 So, last = {'c': 7, 'b': 6, 'a': 2, 'd': 4}.
  2. Stack for Current Smallest Subsequence: We initialize an empty stack stk = [] to keep track of the subsequence characters.

  3. Set for Visited Characters: We also have a set vis = {} to mark characters that we have already seen and added to the stack.

  4. Iterate Over the String: We iterate over each character in "cbacdcbc". Let's go through this process step by step:

    i. On encountering the first 'c', it is not in vis, so we add 'c' to stk and vis.

    ii. Next, 'b' is not in vis, so we add 'b' to stk and vis. stk now contains ['c', 'b'].

    iii. 'a' is not in vis, so we add 'a' to stk. However, 'a' is smaller than 'b' and 'c', and both 'b' and 'c' will come later in the string. We pop 'b' and 'c' from the stack and remove them from vis. Then, add 'a' to stk and vis. stk now contains ['a'].

    iv. 'c' comes again, we add it back since 'a' < 'c' and 'c' is not in vis. stk now contains ['a', 'c'].

    v. 'd' is encountered, not in vis, so we add it to stk. stk now contains ['a', 'c', 'd'].

    vi. Next, we see another 'c'. It's already in vis, so we skip it.

    vii. 'b' comes next, is not in vis, and is less than 'd'. Since 'd' appears again later (we know from our last occurrence dictionary), we will pop 'd' from stk and remove it from vis, and add 'b' instead. stk now contains ['a', 'c', 'b'].

    viii. Lastly, 'c' comes again, but it is in vis, so we skip it.

  5. Character Comparison and Stack Pop: Throughout the iteration, whenever we meet a smaller character, we check if we can pop the larger ones before it, as described above.

  6. Add the Current Character to the Stack and Visited Set: We've been doing this in each iteration whenever necessary.

  7. Build and Return the Result: At the end, stk contains the lexicographically smallest subsequence. We join the elements of the stack to get "acb", which is our final answer.

We successfully found the lexicographically smallest subsequence "acb" that contains every distinct character of s exactly once.

Python Solution

1class Solution:
2    def smallestSubsequence(self, s: str) -> str:
3        # Dictionary to store the last occurrence index for each character
4        last_occurrence = {character: index for index, character in enumerate(s)}
5      
6        # Result stack to build the smallest subsequence
7        stack = []
8      
9        # Set to keep track of characters already in the stack
10        visited = set()
11      
12        # Iterate through the string's characters
13        for index, character in enumerate(s):
14            # Ignore characters already added to the stack
15            if character in visited:
16                continue
17          
18            # Ensure characters in stack are in ascending order and 
19            # remove any character that can be replaced with a lesser character 
20            # that occurs later
21            while stack and stack[-1] > character and last_occurrence[stack[-1]] > index:
22                # Remove the character from visited when it is popped from the stack
23                visited.remove(stack.pop())
24          
25            # Add the current character to the stack and mark it as visited
26            stack.append(character)
27            visited.add(character)
28      
29        # Join the characters in the stack to form the smallest subsequence
30        return "".join(stack)
31

Java Solution

1class Solution {
2    public String smallestSubsequence(String text) {
3        // Count array for each character 'a' through 'z'
4        int[] charCount = new int[26];
5        // Fill the count array with the frequency of each character in the text
6        for (char c : text.toCharArray()) {
7            charCount[c - 'a']++;
8        }
9      
10        // Visited array to track if a character is in the current result
11        boolean[] visited = new boolean[26];
12        // Result array to build the subsequence
13        char[] result = new char[text.length()];
14        // Stack pointer initialization
15        int stackTop = -1;
16      
17        // Iterate over each character in the input text
18        for (char c : text.toCharArray()) {
19            // Decrement the count of the current character
20            charCount[c - 'a']--;
21            // If the character has not been visited, we need to include it in the result
22            if (!visited[c - 'a']) {
23                // While the stack is not empty, the current character is smaller than the
24                // character at the stack's top and the character at the stack's top still
25                // occurs later in the text (i.e., the count is greater than 0), we pop
26                // from the stack marking it as not visited
27                while (stackTop >= 0 && c < result[stackTop] && charCount[result[stackTop] - 'a'] > 0) {
28                    visited[result[stackTop] - 'a'] = false;
29                    stackTop--;
30                }
31                // Push the current character onto the stack and mark it as visited
32                result[++stackTop] = c;
33                visited[c - 'a'] = true;
34            }
35        }
36      
37        // Build the output string from the stack, which contains the smallest subsequence
38        return String.valueOf(result, 0, stackTop + 1);
39    }
40}
41

C++ Solution

1class Solution {
2public:
3    string smallestSubsequence(string s) {
4        int strSize = s.size(); // Determine the length of the input string.
5
6        // Array to store the last position (index) of each character in the string.
7        int lastIndexOf[26] = {0};
8        for (int i = 0; i < strSize; ++i) {
9            lastIndexOf[s[i] - 'a'] = i; // Populate the array with the last index of each character.
10        }
11
12        string result; // This will hold the smallest lexicographical subsequence.
13        int presenceMask = 0; // Bitmask to keep track of characters included in the result.
14
15        for (int i = 0; i < strSize; ++i) {
16            char currentChar = s[i]; // The current character we're considering to add to the result.
17
18            // Check if the character is already included in the result using the presence mask.
19            if ((presenceMask >> (currentChar - 'a')) & 1) {
20                continue; // If it's already present, move to the next character.
21            }
22
23            // While there are characters in the result, the last character in the result is
24            // greater than the current character, and the last occurrence of the last character
25            // in the result is after the current character in the original string...
26            while (!result.empty() && result.back() > currentChar && lastIndexOf[result.back() - 'a'] > i) {
27                // Remove the last character from the result since we've found
28                // a better character to maintain lexicographical order.
29                presenceMask ^= 1 << (result.back() - 'a'); // Update the presence mask.
30                result.pop_back(); // Remove the last character from the result.
31            }
32
33            // Add the current character to the result,
34            // and update the presence mask accordingly.
35            result.push_back(currentChar);
36            presenceMask |= 1 << (currentChar - 'a');
37        }
38
39        // Once the loop is done, the 'result' contains the smallest lexicographical subsequence.
40        return result;
41    }
42};
43

Typescript Solution

1function smallestSubsequence(s: string): string {
2    // Function to get the index of a character in the alphabet
3    const getIndex = (char: string): number => char.charCodeAt(0) - 'a'.charCodeAt(0);
4
5    // Last occurrence index of each character in the alphabet
6    const lastOccurrence: number[] = new Array(26).fill(0);
7    // Update last occurrence index for each character in the string
8    [...s].forEach((char, index) => {
9        lastOccurrence[getIndex(char)] = index;
10    });
11
12    // Stack to hold the characters for the result
13    const stack: string[] = [];
14    // Bitmask to track which characters are in the stack
15    let mask = 0;
16
17    // Iterate over each character in the string
18    [...s].forEach((char, index) => {
19        const charIndex = getIndex(char);
20
21        // Skip if the character is already in the stack
22        if ((mask >> charIndex) & 1) {
23            return;
24        }
25
26        // Ensure characters in stack are smallest possible and remove if not needed
27        while (stack.length > 0 && stack[stack.length - 1] > char && lastOccurrence[getIndex(stack[stack.length - 1])] > index) {
28            const lastCharIndex = getIndex(stack.pop()!);
29            mask ^= 1 << lastCharIndex;
30        }
31
32        // Add the current character to the stack
33        stack.push(char);
34        // Update the bitmask to include the current character
35        mask |= 1 << charIndex;
36    });
37
38    // Join the stack into a string and return it as the smallest subsequence
39    return stack.join('');
40}
41

Time and Space Complexity

Time Complexity

The provided code's time complexity can be analyzed based on the number of iterations and operations that are performed:

  1. Building the last dictionary: This takes O(n) time, where n is the length of string s, as we go through every character of the string once.

  2. Iterating through string s: The main for loop runs for every character, so it is O(n). However, we must also consider the nested while loop.

  3. The nested while loop: Although there is a while loop inside the for loop, each element is added to the stk only once because of the vis set check, and each element is removed from stk only once. This means, in total, the while loop will run at most n times for the entire for loop. This does not change the overall time complexity which remains O(n).

Combining these steps, the overall time complexity is O(n) where n is the length of the input string s.

Space Complexity

Analyzing the space used by the code:

  1. The last dictionary: The dictionary holds a mapping for each unique character to its last occurrence in s. In the worst case, where all characters are unique, it will hold n entries, which leads to O(n) space.

  2. The stk list: As with the dictionary, in the worst case, it may hold all characters if they are all unique, leading to O(n) space.

  3. The vis set: This also, in the worst-case scenario, will hold n entries for n unique characters in s, using O(n) space.

Considering all the auxiliary space used, the overall space complexity is O(n).

Putting it all together in the markdown template:

1
2
3### Time Complexity
4
5The time complexity of the code is `O(n)`, where `n` is the length of the given string `s`. This is due to the fact that the for loop will iterate `n` times and the nested while loop, in combination with the stack operations and the set checks, will still result in a linear number of operations in total across all iterations.
6
7### Space Complexity
8
9The space complexity of the code is `O(n)` as well, due to the storage requirements of the `last` dictionary, the `stk` list, and the `vis` set, all of which in the worst case scale linearly with the number of characters `n` in the string `s`.
😈
Become an
Algo Monster

Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫