2539. Count the Number of Good Subsequences

MediumHash TableMathStringCombinatoricsCounting
Leetcode Link

Problem Description

The problem requires us to count the number of "good" subsequences in a given string s. To qualify as good, a subsequence must not be empty and all characters within the subsequence should appear with the same frequency. For example, in the string "aabbcc", the subsequences "abc", "aabbcc", and "bbcc" are considered good because each letter appears with the same frequency within those subsequences.

Since the number of good subsequences could be quite large, we calculate the answer modulo (10^9 + 7), which is a commonly used prime number for modulo operations in computational problems to avoid integer overflow.

Note that a subsequence can be obtained by deleting some or no characters from a string without changing the order of the remaining characters. It does not need to be a contiguous section of the string.

Intuition

The solution strategy for this problem revolves around utilizing combinations and the attributes of subsequences. Here are the broad steps:

  1. Calculate the frequency of each character in the string s. This will be used to determine the number of ways in which subsequences can be formed for each character based on its frequency.

  2. Since we are looking for subsequences where all characters appear the same number of times, we iterate over the possible frequencies (from 1 to the highest frequency of any character in the string).

  3. For each possible frequency, we consider how we can create subsequences where each character appears with that frequency. This requires calculating combinations, as for each character with a frequency greater than or equal to the current frequency, we can choose some or all appearances of that character (hence the combination formula and the addition of 1 for each character's possibilities).

  4. Since we are looking for only non-empty subsequences, we subtract 1 from the total for each frequency because the empty subsequence is included in our combination calculation but does not qualify as 'good.'

  5. We add up all the possibilities for each frequency, always taking the modulo to prevent integer overflow.

In essence, we are leveraging combinatorial mathematics to efficiently count the number of ways we can form good subsequences from a given string.

Learn more about Math and Combinatorics patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

Which of the following is a min heap?

Solution Approach

The implementation details for the solution proceed as follows:

  1. Precompute Factorials: For the combination calculations, we need factorials of numbers up to ( N ), where ( N ) is 10001 here, ensuring we cover the range of possible character frequencies in string s. We precompute the factorials to use them multiple times efficiently, reducing the computational complexity.

    1f = [1] * N
    2for i in range(1, N):
    3    f[i] = f[i - 1] * i % MOD
  2. Precompute Modular Inverses: These are needed because in combination calculations we need factorials and also division by factorials (the denominator of the combination formula), which in modular arithmetic is handled by multiplying by modular inverses.

    1g = [1] * N
    2for i in range(1, N):
    3    g[i] = pow(f[i], MOD - 2, MOD)
  3. Combination Function: This function is used to compute the combinations C(n, k) which is the number of ways to choose k elements from n elements. In modular arithmetic, division by factorial is done by multiplication with a precomputed modular inverse of the factorial.

    1def comb(n, k):
    2    return f[n] * g[k] * g[n - k] % MOD
  4. Frequency Calculation: We use the Counter class from the collections module to calculate the frequency of each character in the string.

    1cnt = Counter(s)
  5. Iterate Over Frequencies: The main logic of the program involves iterating over the possible frequencies and calculating the number of good subsequences for each frequency. We are actually adding up all the subsequences over all frequencies that include any character at least that number of times.

    1for i in range(1, max(cnt.values()) + 1):
    2    x = 1
    3    for v in cnt.values():
    4        if v >= i:
    5            x = x * (comb(v, i) + 1) % MOD
    6    ans = (ans + x - 1) % MOD
  6. Total Good Subsequences: Lastly, after iterating through each frequency, we accumulate the count of good subsequences in the ans variable, applying modulo at each step to keep the value within bounds.

  7. Final Result: After the loop, ans holds the final count of good subsequences modulo MOD.

The heart of the implementation is the combination of counting theory and efficient precomputation. The precomputation of factorials and their inverses in a modular sense allows the combination function to be calculated very quickly. The core algorithm iterates through all potential frequencies and multiplies the number of ways the characters with sufficient occurrence can contribute to subsequences of that frequency, modding the result each time to maintain numerical stability.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which technique can we use to find the middle of a linked list?

Example Walkthrough

Let's consider a smaller example to illustrate the solution approach. Suppose we have the string s = "aabb".

Step 1: Count character frequencies. For s = "aabb", we have cnt = {'a': 2, 'b': 2}.

Step 2: We loop over possible frequencies. The highest frequency is 2 in this case, so our loop runs for frequencies 1 and 2.

Step 3: For frequency 1, we calculate the number of ways we can form subsequences where each character ('a' and 'b') appears once. The combination for each character is C(2, 1) which gives us 2 (we can pick either of the two occurrences of a character).

However, since we can either include each character or not, we add 1 to account for the possibility of not choosing a character. Therefore, we have (2 + 1)*(2 + 1) - 1 combinations, which gives us (3 * 3) - 1 = 8 non-empty subsequences for frequency 1.

Step 4: For frequency 2, each character must appear twice if it's included. The combination for each character is C(2, 2) which gives us 1 (we have to pick both occurrences of a character).

So, for the subsequences to be good, each included character must appear exactly twice. The calculation is (1 + 1)*(1 + 1) - 1, which gives us (2 * 2) - 1 = 3 non-empty subsequences for frequency 2. These are 'aabb', 'aa', and 'bb'.

Step 5: Add up the calculations for each frequency. So, we have 8 from frequency 1 and 3 from frequency 2, giving us a total preliminary count of 11 good subsequences.

Step 6: Modulo operation. Our count 11 is less than 10^9 + 7, so no modulo operation is needed in this example. However, if the count were larger, we would apply % MOD to each step.

Step 7: The final count after adding up the possibilities for each frequency is 11 good subsequences for the string "aabb".

This example clarifies each step of the solution, showing how the combinatorics calculations are applied to the problem and how the answer is built up iteratively.

Solution Implementation

1from collections import Counter
2
3# Define constants
4N = 10001
5MOD = 10**9 + 7
6
7# Precompute factorial and factorial modular inverses
8factorials = [1] * N
9factorial_inverses = [1] * N
10for i in range(1, N):
11    factorials[i] = factorials[i - 1] * i % MOD  # Calculate factorial of i
12    factorial_inverses[i] = pow(factorials[i], MOD - 2, MOD)  # Calculate the inverse using Fermat's Little Theorem
13
14
15def comb(n, k):
16    """Calculate the combinational number C(n,k) modulo MOD."""
17    return factorials[n] * factorial_inverses[k] * factorial_inverses[n - k] % MOD
18
19
20class Solution:
21    def countGoodSubsequences(self, s: str) -> int:
22        # Count the number of occurrences of each character in the string
23        char_count = Counter(s)
24      
25        # Initialize the total count of good subsequences to 0
26        total_good_subsequences = 0
27      
28        # Iterate through all possible lengths of subsequences
29        for i in range(1, max(char_count.values()) + 1):
30            ways_to_form = 1
31            # Iterate through the counts of each character
32            for count in char_count.values():
33                # If there are at least i occurrences of the character
34                if count >= i:
35                    # Compute the number of ways to choose i occurrences of this character
36                    # and add 1 for the option of not choosing this character
37                    ways_to_form = ways_to_form * (comb(count, i) + 1) % MOD
38            # Update the total good subsequences count, subtracting 1 to exclude the empty subsequence
39            total_good_subsequences = (total_good_subsequences + ways_to_form - 1) % MOD
40      
41        # Return the total count of good subsequences
42        return total_good_subsequences
43
1class Solution {
2    // Constants for MOD value and the pre-calculated array size
3    private static final int ARRAY_SIZE = 10001;
4    private static final int MOD = 1000000007;
5  
6    // Pre-calculated factorial and modular multiplicative inverse arrays
7    private static final long[] factorialArray = new long[ARRAY_SIZE];
8    private static final long[] inverseArray = new long[ARRAY_SIZE];
9
10    // Static initializer block for pre-calculation of factorials and their inverses
11    static {
12        factorialArray[0] = 1;
13        inverseArray[0] = 1;
14        for (int i = 1; i < ARRAY_SIZE; ++i) {
15            factorialArray[i] = factorialArray[i - 1] * i % MOD;
16            inverseArray[i] = quickModularInverse(factorialArray[i], MOD - 2, MOD);
17        }
18    }
19
20    // Method to compute the quick modular inverse using exponentiation
21    public static long quickModularInverse(long base, long exponent, long modulus) {
22        long result = 1;
23        while (exponent != 0) {
24            if ((exponent & 1) == 1) {
25                result = result * base % modulus;
26            }
27            exponent >>= 1;
28            base = base * base % modulus;
29        }
30        return result;
31    }
32
33    // Method to compute the combination (n choose k) under modulus
34    public static long combination(int n, int k) {
35        return (factorialArray[n] * inverseArray[k] % MOD) * inverseArray[n - k] % MOD;
36    }
37
38    // Method to count the number of good subsequences in the string s
39    public int countGoodSubsequences(String s) {
40        // Count array for each character with a maximum value tracker
41        int[] characterCount = new int[26];
42        int maxCount = 1;
43      
44        // Calculate character counts and find the max count
45        for (int i = 0; i < s.length(); ++i) {
46            maxCount = Math.max(maxCount, ++characterCount[s.charAt(i) - 'a']);
47        }
48      
49        // Initialize the answer value which will be the final count
50        long answer = 0;
51      
52        // Iterate over all possible subsequence lengths
53        for (int i = 1; i <= maxCount; ++i) {
54            long countForLength = 1;
55            for (int j = 0; j < 26; ++j) {
56                if (characterCount[j] >= i) {
57                    countForLength = countForLength * (combination(characterCount[j], i) + 1) % MOD;
58                }
59            }
60            // Subtract 1 because we are excluding the empty subsequence
61            answer = (answer + countForLength - 1) % MOD;
62        }
63      
64        // Return the result after casting to int
65        return (int) answer;
66    }
67}
68
1#include <bits/stdc++.h>
2using namespace std;
3
4class Solution {
5    // Constants for MOD value and the pre-calculated array size
6    static const int ARRAY_SIZE = 10001;
7    static const int MOD = 1000000007;
8
9    // Pre-calculated factorial and modular multiplicative inverse arrays
10    static long long factorialArray[ARRAY_SIZE];
11    static long long inverseArray[ARRAY_SIZE];
12
13    // Static initializer block for pre-calculation of factorials and their inverses
14    static void initialize() {
15        factorialArray[0] = 1;
16        inverseArray[0] = 1;
17        for (int i = 1; i < ARRAY_SIZE; ++i) {
18            factorialArray[i] = factorialArray[i - 1] * i % MOD;
19            inverseArray[i] = quickModularInverse(factorialArray[i], MOD - 2);
20        }
21    }
22
23    // Method to compute the quick modular inverse using exponentiation
24    static long long quickModularInverse(long long base, long long exponent) {
25        long long result = 1;
26        while (exponent > 0) {
27            if (exponent % 2 == 1) {
28                result = (result * base) % MOD;
29            }
30            base = (base * base) % MOD;
31            exponent /= 2;
32        }
33        return result;
34    }
35
36    // Method to compute the combination (n choose k) under modulus
37    static long long combination(int n, int k) {
38        return ((factorialArray[n] * inverseArray[k]) % MOD) * inverseArray[n - k] % MOD;
39    }
40
41public:
42    // Method to count the number of good subsequences in the string s
43    int countGoodSubsequences(string s) {
44        // Initialize our static arrays
45        static bool initialized = false;
46        if (!initialized) {
47            initialize();
48            initialized = true;
49        }
50      
51        // Count array for each character with a maximum value tracker
52        vector<int> characterCount(26, 0);
53        int maxCount = 1;
54      
55        // Calculate character counts and find the max count
56        for (char c : s) {
57            maxCount = max(maxCount, ++characterCount[c - 'a']);
58        }
59      
60        // Initialize the answer value which will be the final count
61        long long answer = 0;
62      
63        // Iterate over all possible subsequence lengths
64        for (int i = 1; i <= maxCount; ++i) {
65            long long countForLength = 1;
66            for (int j = 0; j < 26; ++j) {
67                if (characterCount[j] >= i) {
68                    countForLength = countForLength * (combination(characterCount[j], i) + 1) % MOD;
69                }
70            }
71            // Subtract 1 because we are excluding the empty subsequence
72            answer = (answer + countForLength - 1) % MOD;
73        }
74      
75        // Return the result after casting to int
76        return static_cast<int>(answer);
77    }
78};
79
80long long Solution::factorialArray[Solution::ARRAY_SIZE];
81long long Solution::inverseArray[Solution::ARRAY_SIZE];
82
83// Main function for demonstrational purpose
84int main() {
85    Solution solver;
86    string s = "abcabc";
87    cout << solver.countGoodSubsequences(s) << endl;
88    return 0;
89}
90
1// Constants for MOD value and the pre-calculated array size
2const ARRAY_SIZE: number = 10001;
3const MOD: number = 1000000007;
4
5// Pre-calculated factorial and modular multiplicative inverse arrays
6const factorialArray: bigint[] = new Array(ARRAY_SIZE);
7const inverseArray: bigint[] = new Array(ARRAY_SIZE);
8
9// Static initializer block for pre-calculation of factorials and their inverses
10(() => {
11    factorialArray[0] = BigInt(1);
12    inverseArray[0] = BigInt(1);
13    for (let i = 1; i < ARRAY_SIZE; ++i) {
14        factorialArray[i] = (factorialArray[i - 1] * BigInt(i)) % BigInt(MOD);
15        inverseArray[i] = quickModularInverse(factorialArray[i], BigInt(MOD - 2), MOD);
16    }
17})();
18
19// Function to compute the quick modular inverse using exponentiation
20function quickModularInverse(base: bigint, exponent: bigint, modulus: number): bigint {
21    let result: bigint = BigInt(1);
22    while (exponent !== BigInt(0)) {
23        if ((exponent & BigInt(1)) === BigInt(1)) {
24            result = (result * base) % BigInt(modulus);
25        }
26        exponent >>= BigInt(1);
27        base = (base * base) % BigInt(modulus);
28    }
29    return result;
30}
31
32// Function to compute the combination (n choose k) under modulus
33function combination(n: number, k: number): bigint {
34    return (factorialArray[n] * inverseArray[k] % BigInt(MOD)) * inverseArray[n - k] % BigInt(MOD);
35}
36
37// Function to count the number of good subsequences in the string s
38function countGoodSubsequences(s: string): number {
39    // Array to count each character with a tracker for the maximum count
40    const characterCount: number[] = new Array(26).fill(0);
41    let maxCount: number = 1;
42
43    // Calculate character counts and find the maximum count
44    for (let i = 0; i < s.length; ++i) {
45        maxCount = Math.max(maxCount, ++characterCount[s.charCodeAt(i) - 'a'.charCodeAt(0)]);
46    }
47
48    // Initialize the answer which will be the final count of good subsequences
49    let answer: bigint = BigInt(0);
50
51    // Iterate over all possible subsequence lengths
52    for (let i = 1; i <= maxCount; ++i) {
53        let countForLength: bigint = BigInt(1);
54        for (let j = 0; j < 26; ++j) {
55            if (characterCount[j] >= i) {
56                countForLength = countForLength * (combination(characterCount[j], i) + BigInt(1)) % BigInt(MOD);
57            }
58        }
59        // Subtracting 1 because we are excluding the empty subsequence
60        answer = (answer + countForLength - BigInt(1)) % BigInt(MOD);
61    }
62
63    // Return the result after converting to a number type
64    return Number(answer);
65}
66
Not Sure What to Study? Take the 2-min Quiz:

Which of these pictures shows the visit order of a depth-first search?

Time and Space Complexity

Time Complexity

The time complexity of the provided code can be analyzed as follows:

  • The first two loops to compute arrays f and g are O(N), where N = 10001.
  • The comb function is called for each combination and performs constant operations including multiplication and modulo, which can be considered O(1).
  • The for loop in countGoodSubsequences runs for the maximum value of counts plus one. In the worst case, this can be O(N) where N is the length of string s.
  • Inside this loop, there's another loop iterating over all values in cnt which in the worst case is O(26) since the string s can only be composed of lowercase English letters. For each of these iterations, comb is called and followed by constant time operations.

Given that s represents the length of the input string:

  • The worst case for the cnt dictionary values to be max(cnt.values()) is when all characters are the same, thereby iterating the loop in countGoodSubsequences up to the length of the string s.

Adding everything together, the overall time complexity is O(len(s) * 26), which simplifies to O(len(s)) since 26 is a constant and does not depend on the input size.

Space Complexity

The space complexity can be analyzed as follows:

  • The arrays f and g, each of size N = 10001, result in a space complexity of O(N).
  • The Counter object cnt on the string s will have up to 26 key-value pairs, leading to a constant space contribution, O(26), which is O(1).
  • Temporary variables used for computation inside the loop do not depend on the input size and hence contribute a constant space complexity.

Therefore, the overall space complexity is dominated by the space used for precomputed factorials and inverses, which is O(N).

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

Problem: Given a list of tasks and a list of requirements, compute a sequence of tasks that can be performed, such that we complete every task once while satisfying all the requirements.

Which of the following method should we use to solve this problem?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫