1987. Number of Unique Good Subsequences


Problem Description

The problem requires us to count the unique good subsequences in a given binary string binary. A subsequence is a series of characters that can be derived from the original string by removing zero or more characters without changing the order of the remaining characters. A good subsequence is one that does not have leading zeros, except for the single "0". For example, in "001", the good subsequences are "0", "0", and "1", but subsequences like "00", "01", and "001" are not good because they have leading zeros.

Our task is to find how many such unique good subsequences exist in the given binary string. We need to calculate this count modulo 10^9 + 7 due to the potential for very large numbers.

To solidify the understanding, let's consider an example: If binary = "101", the unique good subsequences are "", "1", "0", "10", "11", "101". Note that "" (the empty string) is not considered a good subsequence since it's not non-empty, so the answer would be 5 unique good subsequences.

Intuition

To solve this problem, we leverage dynamic programming to keep track of the count of unique good subsequences ending with '0' and '1', separately.

Here's the intuition for the solution:

  • If we encounter a '0' in the binary string, we can form new unique subsequences by appending '0' to all the unique good subsequences ending with '1'. However, because '0' cannot be at the leading position, we cannot start new subsequences with it.
  • If we encounter a '1', we can form new unique subsequences by appending it to all the unique good subsequences ending in '0' and '1', and we can also start a new subsequence with '1'.
  • Special care must be taken to include the single "0" subsequence if '0' is present in the string, which is handled by the ans = 1 line if a '0' is found.

The variable f holds the number of unique good subsequences ending with '1', and g with '0'. When iterating through the string, if we find '0', we update g. If it's '1', we update f. The variable ans is set to '1' when we find a '0' to account for the single "0" subsequence. Finally, we sum up the two variables (and add 'ans' if '0' was encountered) to get the total count and take the modulo 10^9 + 7.

By keeping track of these two counts, we can update how many subsequences we have as we iterate through the binary string. The dynamic programming approach ensures that we do not count duplicates, and by taking the sum of f and g, we cover all unique good subsequences.

Solution Approach

The implementation involves iterating through each character in the binary string and updating two variables, f and g, which represent the count of unique good subsequences ending in '1' and '0', respectively.

Here is a step-by-step guide through the implementation:

  • Initialize f and g to zero. f is used to keep track of subsequences ending with '1', and g for those ending with '0'.
  • The ans variable is initialized to zero, which will later be used to ensure that we include the single "0" subsequence if necessary.
  • We also define mod = 10**9 + 7 to handle the modulo operation for a large number of good subsequences.
  • We loop through each character c in the binary string:
    • If c is "0", we update g to be g + f. This counts any new subsequences formed by appending '0' to subsequences that were previously ending with '1'. We cannot start a new subsequence with '0' to prevent leading zeros. We also set ans to 1 to account for the "0" subsequence.
    • If c is "1", we update f to be f + g + 1. Here, f + g accounts for the new subsequences ending with '1' formed by appending '1' to all previous subsequences ending with '0' and '1'. The + 1 handles the case where '1' itself can start a new good subsequence.
  • After the loop, ans is updated to be the sum of g, f, and the existing ans, which accounts for the "0" when it is present.
  • Finally, we return ans % mod, ensuring that the result adheres to the modulo constraint.

The algorithm uses dynamic programming efficiently, as it effectively keeps tally of the subsequences without generating them, preventing duplicate counts, and cleverly uses modular arithmetic to manage the potentially large numbers.

Here is the mathematical representation of the update steps, enclosed in backticks for proper markdown display:

  • When c is "0": g = (g + f) % mod
  • When c is "1": f = (f + g + 1) % mod
  • Finally: ans = (ans + f + g) % mod

These update rules succinctly represent the logic needed to arrive at the solution of counting the number of unique good subsequences in the binary string.

Example Walkthrough

Let's walk through a small example to illustrate the solution approach with the binary string binary = "00110".

  1. Initialize the variables:
  • f = 0 // Number of subsequences ending with '1'.
  • g = 0 // Number of subsequences ending with '0'.
  • ans = 0 // To account for the single "0" subsequence.
  • mod = 10**9 + 7 // For modulo operation.
  1. Start with the first character, '0':
  • Since it's '0', we cannot start a new subsequence, so g remains 0.
  • But we set ans to 1 to account for the "0" subsequence.
  • Results: f = 0, g = 0, ans = 1.
  1. Move to the second character, '0':
  • Again, it's '0', so we follow the same step, g remains as it is.
  • The value of ans doesn't change because we only include the single "0" once.
  • Results: f = 0, g = 0, ans = 1.
  1. Continue with the third character, '1':
  • The character '1' allows new subsequences by appending '1' to all good subsequences ending with '0' and '1', plus a new subsequence just '1'.
  • Update f: f = (f + g + 1) % mod.
  • Results: f = 1, g = 0, ans = 1.
  1. Next character, '1':
  • Now, f will include the previous f, plus g, plus a new subsequence '1':
  • Update f: f = (1 + 0 + 1) % mod = 2.
  • Results: f = 2, g = 0, ans = 1.
  1. Last character, '0':
  • This '0' is appended to good subsequences ending with '1', which are counted in f.
  • Update g: g = (g + f) % mod = (0 + 2) % mod = 2.
  • And since it's '0', ans stays as 1.
  • Results: f = 2, g = 2, ans = 1.

Now we add up f, g, and ans for the total count of unique good subsequences:

  • ans = (ans + f + g) % mod = (1 + 2 + 2) % mod = 5.

We find there are 5 unique good subsequences in the binary string "00110". These are "0", "1", "10", "11", and "110". Thus, the answer agrees with our algorithm's final computation.

Python Solution

1class Solution:
2    def numberOfUniqueGoodSubsequences(self, binary: str) -> int:
3        # Initialize variables for dynamic programming.
4        # 'count_ones' stores the count of subsequences ending with '1'.
5        # 'count_zeros' stores the count of subsequences ending with '0'.
6        # 'has_zero' tracks if a subsequence with '0' has been observed.
7        count_ones = 0
8        count_zeros = 0
9        has_zero = False
10     
11        # Modular constant to avoid overflow on large numbers.
12        mod = 10**9 + 7
13      
14        # Iterate through the given binary string character by character.
15        for char in binary:
16            if char == "0":
17                # Update count_zeros with count_ones since every subsequence 
18                # ending with '1' can have a '0' appended to make a new subsequence.
19                count_zeros = (count_zeros + count_ones) % mod
20              
21                # We've encountered a zero, so we note that.
22                has_zero = True
23            else:
24                # Update count_ones with count_zeros since every subsequence 
25                # ending with '0' can have a '1' appended to make a new subsequence,
26                # plus the new subsequence consisting of the single character '1'.
27                count_ones = (count_ones + count_zeros + 1) % mod
28      
29        # Aggregate total unique subsequences from those ending in '1' and '0'.
30        total_unique_subseq = (count_ones + count_zeros) % mod
31      
32        # If we've encountered a zero, add that as an additional unique subsequence.
33        if has_zero:
34            total_unique_subseq = (total_unique_subseq + 1) % mod
35      
36        return total_unique_subseq
37

Java Solution

1class Solution {
2    public int numberOfUniqueGoodSubsequences(String binary) {
3        final int MODULO = (int) 1e9 + 7; // Defining the modulo value as a constant
4      
5        int endsWithOne = 0;   // Initialize variable to store subsequences that end with '1'.
6        int endsWithZero = 0;  // Initialize variable to store subsequences that end with '0'.
7        int containsZero = 0;  // A flag to indicate if there's at least one '0' in the sequence.
8      
9        // Loop through each character in the binary string
10        for (int i = 0; i < binary.length(); ++i) {
11            if (binary.charAt(i) == '0') {
12                // If the current character is '0', update number of subseq. ending with '0'
13                endsWithZero = (endsWithZero + endsWithOne) % MODULO;
14              
15                // As we found a '0', the flag is set to 1
16                containsZero = 1;
17            } else {
18                // If the current character is '1', update number of subseq. ending with '1'
19                endsWithOne = (endsWithOne + endsWithZero + 1) % MODULO;
20            }
21        }
22      
23        // Calculate the total by adding subsequences ending with '1', ending with '0'
24        // Also, add the flag value to include the empty subsequence if there was at least one '0'
25        int totalUniqueGoodSubsequences = (endsWithOne + endsWithZero + containsZero) % MODULO;
26      
27        return totalUniqueGoodSubsequences;
28    }
29}
30

C++ Solution

1class Solution {
2public:
3    int numberOfUniqueGoodSubsequences(string binary) {
4        const int MOD = 1e9 + 7; // Define a constant for modulo to keep numbers within the integer range
5        int endsWithOne = 0; // 'f' variable is now 'endsWithOne', tracks subsequences that end with '1'
6        int endsWithZero = 0; // 'g' variable is now 'endsWithZero', tracks subsequences that end with '0'
7        int hasZero = 0; // 'ans' variable is now 'hasZero', to indicate if there is at least one '0' in the string
8
9        // Iterate through the given binary string
10        for (char& c : binary) {
11            if (c == '0') {
12                // If current character is '0', update subsequences ending with '0'
13                endsWithZero = (endsWithZero + endsWithOne) % MOD;
14                // Record that the string contains at least one '0'
15                hasZero = 1;
16            } else {
17                // If current character is '1', update subsequences ending with '1'
18                // The current count is increased by the count of subsequences 
19                // ending in '0' plus this new '1' to form new unique subsequences
20                endsWithOne = (endsWithOne + endsWithZero + 1) % MOD;
21            }
22        }
23
24        // Calculate entire number of unique good subsequences
25        // which is the sum of subsequences ending with '0', '1', plus '0'
26        // if it exists in the string
27        int totalUniqueGoodSubsequences = (hasZero + endsWithOne + endsWithZero) % MOD;
28        return totalUniqueGoodSubsequences;
29    }
30};
31

Typescript Solution

1function numberOfUniqueGoodSubsequences(binary: string): number {
2    // f represents the count of unique good subsequences ending with 1.
3    // g represents the count of unique good subsequences ending with 0.
4    let endingWithOneCount = 0;
5    let endingWithZeroCount = 0;
6
7    // ans will accumulate the final answer.
8    let uniqueGoodSubsequencesCount = 0;
9
10    // The modulus to ensure the answer stays within the integer limit.
11    const MODULUS = 1e9 + 7;
12
13    // Iterate over each character in the binary string.
14    for (const char of binary) {
15        if (char === '0') {
16            // If the current character is '0', update the count of subsequences ending with 0.
17            endingWithZeroCount = (endingWithZeroCount + endingWithOneCount) % MODULUS;
18            // The sequence "0" is always considered in the unique subsequences.
19            uniqueGoodSubsequencesCount = 1;
20        } else {
21            // If the current character is '1', update the count of subsequences ending with 1.
22            endingWithOneCount = (endingWithOneCount + endingWithZeroCount + 1) % MODULUS;
23        }
24    }
25
26    // Add the counts of subsequences ending with '0' and '1', as well as the sequence "0" if present.
27    uniqueGoodSubsequencesCount = (uniqueGoodSubsequencesCount + endingWithOneCount + endingWithZeroCount) % MODULUS;
28  
29    // Return the final answer.
30    return uniqueGoodSubsequencesCount;
31}
32

Time and Space Complexity

Time Complexity

The time complexity of the algorithm is O(N), where N is the length of the input string binary. This stems from the fact that the algorithm processes each character of the input string exactly once, performing a constant number of arithmetic operations and assignment operations for each character.

Space Complexity

The space complexity of the algorithm is O(1). There are only a few integer variables (f, g, ans, and mod) used to track the state throughout the processing of the input string, and their memory usage does not scale with the size of the input.


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫