2539. Count the Number of Good Subsequences
Problem Description
The problem requires us to count the number of "good" subsequences in a given string s
. To qualify as good, a subsequence must not be empty and all characters within the subsequence should appear with the same frequency. For example, in the string "aabbcc", the subsequences "abc", "aabbcc", and "bbcc" are considered good because each letter appears with the same frequency within those subsequences.
Since the number of good subsequences could be quite large, we calculate the answer modulo (10^9 + 7), which is a commonly used prime number for modulo operations in computational problems to avoid integer overflow.
Note that a subsequence can be obtained by deleting some or no characters from a string without changing the order of the remaining characters. It does not need to be a contiguous section of the string.
Intuition
The solution strategy for this problem revolves around utilizing combinations and the attributes of subsequences. Here are the broad steps:
-
Calculate the frequency of each character in the string
s
. This will be used to determine the number of ways in which subsequences can be formed for each character based on its frequency. -
Since we are looking for subsequences where all characters appear the same number of times, we iterate over the possible frequencies (from 1 to the highest frequency of any character in the string).
-
For each possible frequency, we consider how we can create subsequences where each character appears with that frequency. This requires calculating combinations, as for each character with a frequency greater than or equal to the current frequency, we can choose some or all appearances of that character (hence the combination formula and the addition of 1 for each character's possibilities).
-
Since we are looking for only non-empty subsequences, we subtract 1 from the total for each frequency because the empty subsequence is included in our combination calculation but does not qualify as 'good.'
-
We add up all the possibilities for each frequency, always taking the modulo to prevent integer overflow.
In essence, we are leveraging combinatorial mathematics to efficiently count the number of ways we can form good subsequences from a given string.
Learn more about Math and Combinatorics patterns.
Solution Approach
The implementation details for the solution proceed as follows:
-
Precompute Factorials: For the combination calculations, we need factorials of numbers up to ( N ), where ( N ) is 10001 here, ensuring we cover the range of possible character frequencies in string
s
. We precompute the factorials to use them multiple times efficiently, reducing the computational complexity.f = [1] * N for i in range(1, N): f[i] = f[i - 1] * i % MOD
-
Precompute Modular Inverses: These are needed because in combination calculations we need factorials and also division by factorials (the denominator of the combination formula), which in modular arithmetic is handled by multiplying by modular inverses.
g = [1] * N for i in range(1, N): g[i] = pow(f[i], MOD - 2, MOD)
-
Combination Function: This function is used to compute the combinations
C(n, k)
which is the number of ways to choosek
elements fromn
elements. In modular arithmetic, division by factorial is done by multiplication with a precomputed modular inverse of the factorial.def comb(n, k): return f[n] * g[k] * g[n - k] % MOD
-
Frequency Calculation: We use the
Counter
class from thecollections
module to calculate the frequency of each character in the string.cnt = Counter(s)
-
Iterate Over Frequencies: The main logic of the program involves iterating over the possible frequencies and calculating the number of good subsequences for each frequency. We are actually adding up all the subsequences over all frequencies that include any character at least that number of times.
for i in range(1, max(cnt.values()) + 1): x = 1 for v in cnt.values(): if v >= i: x = x * (comb(v, i) + 1) % MOD ans = (ans + x - 1) % MOD
-
Total Good Subsequences: Lastly, after iterating through each frequency, we accumulate the count of good subsequences in the
ans
variable, applying modulo at each step to keep the value within bounds. -
Final Result: After the loop,
ans
holds the final count of good subsequences moduloMOD
.
The heart of the implementation is the combination of counting theory and efficient precomputation. The precomputation of factorials and their inverses in a modular sense allows the combination function to be calculated very quickly. The core algorithm iterates through all potential frequencies and multiplies the number of ways the characters with sufficient occurrence can contribute to subsequences of that frequency, modding the result each time to maintain numerical stability.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's consider a smaller example to illustrate the solution approach. Suppose we have the string s = "aabb"
.
Step 1: Count character frequencies. For s = "aabb"
, we have cnt = {'a': 2, 'b': 2}
.
Step 2: We loop over possible frequencies. The highest frequency is 2 in this case, so our loop runs for frequencies 1 and 2.
Step 3: For frequency 1, we calculate the number of ways we can form subsequences where each character ('a'
and 'b'
) appears once. The combination for each character is C(2, 1)
which gives us 2 (we can pick either of the two occurrences of a character).
However, since we can either include each character or not, we add 1 to account for the possibility of not choosing a character. Therefore, we have (2 + 1)*(2 + 1) - 1
combinations, which gives us (3 * 3) - 1 = 8
non-empty subsequences for frequency 1.
Step 4: For frequency 2, each character must appear twice if it's included. The combination for each character is C(2, 2)
which gives us 1 (we have to pick both occurrences of a character).
So, for the subsequences to be good, each included character must appear exactly twice. The calculation is (1 + 1)*(1 + 1) - 1
, which gives us (2 * 2) - 1 = 3
non-empty subsequences for frequency 2. These are 'aabb', 'aa', and 'bb'.
Step 5: Add up the calculations for each frequency. So, we have 8
from frequency 1 and 3
from frequency 2, giving us a total preliminary count of 11
good subsequences.
Step 6: Modulo operation. Our count 11
is less than 10^9 + 7
, so no modulo operation is needed in this example. However, if the count were larger, we would apply % MOD
to each step.
Step 7: The final count after adding up the possibilities for each frequency is 11
good subsequences for the string "aabb".
This example clarifies each step of the solution, showing how the combinatorics calculations are applied to the problem and how the answer is built up iteratively.
Solution Implementation
1from collections import Counter
2
3# Define constants
4N = 10001
5MOD = 10**9 + 7
6
7# Precompute factorial and factorial modular inverses
8factorials = [1] * N
9factorial_inverses = [1] * N
10for i in range(1, N):
11 factorials[i] = factorials[i - 1] * i % MOD # Calculate factorial of i
12 factorial_inverses[i] = pow(factorials[i], MOD - 2, MOD) # Calculate the inverse using Fermat's Little Theorem
13
14
15def comb(n, k):
16 """Calculate the combinational number C(n,k) modulo MOD."""
17 return factorials[n] * factorial_inverses[k] * factorial_inverses[n - k] % MOD
18
19
20class Solution:
21 def countGoodSubsequences(self, s: str) -> int:
22 # Count the number of occurrences of each character in the string
23 char_count = Counter(s)
24
25 # Initialize the total count of good subsequences to 0
26 total_good_subsequences = 0
27
28 # Iterate through all possible lengths of subsequences
29 for i in range(1, max(char_count.values()) + 1):
30 ways_to_form = 1
31 # Iterate through the counts of each character
32 for count in char_count.values():
33 # If there are at least i occurrences of the character
34 if count >= i:
35 # Compute the number of ways to choose i occurrences of this character
36 # and add 1 for the option of not choosing this character
37 ways_to_form = ways_to_form * (comb(count, i) + 1) % MOD
38 # Update the total good subsequences count, subtracting 1 to exclude the empty subsequence
39 total_good_subsequences = (total_good_subsequences + ways_to_form - 1) % MOD
40
41 # Return the total count of good subsequences
42 return total_good_subsequences
43
1class Solution {
2 // Constants for MOD value and the pre-calculated array size
3 private static final int ARRAY_SIZE = 10001;
4 private static final int MOD = 1000000007;
5
6 // Pre-calculated factorial and modular multiplicative inverse arrays
7 private static final long[] factorialArray = new long[ARRAY_SIZE];
8 private static final long[] inverseArray = new long[ARRAY_SIZE];
9
10 // Static initializer block for pre-calculation of factorials and their inverses
11 static {
12 factorialArray[0] = 1;
13 inverseArray[0] = 1;
14 for (int i = 1; i < ARRAY_SIZE; ++i) {
15 factorialArray[i] = factorialArray[i - 1] * i % MOD;
16 inverseArray[i] = quickModularInverse(factorialArray[i], MOD - 2, MOD);
17 }
18 }
19
20 // Method to compute the quick modular inverse using exponentiation
21 public static long quickModularInverse(long base, long exponent, long modulus) {
22 long result = 1;
23 while (exponent != 0) {
24 if ((exponent & 1) == 1) {
25 result = result * base % modulus;
26 }
27 exponent >>= 1;
28 base = base * base % modulus;
29 }
30 return result;
31 }
32
33 // Method to compute the combination (n choose k) under modulus
34 public static long combination(int n, int k) {
35 return (factorialArray[n] * inverseArray[k] % MOD) * inverseArray[n - k] % MOD;
36 }
37
38 // Method to count the number of good subsequences in the string s
39 public int countGoodSubsequences(String s) {
40 // Count array for each character with a maximum value tracker
41 int[] characterCount = new int[26];
42 int maxCount = 1;
43
44 // Calculate character counts and find the max count
45 for (int i = 0; i < s.length(); ++i) {
46 maxCount = Math.max(maxCount, ++characterCount[s.charAt(i) - 'a']);
47 }
48
49 // Initialize the answer value which will be the final count
50 long answer = 0;
51
52 // Iterate over all possible subsequence lengths
53 for (int i = 1; i <= maxCount; ++i) {
54 long countForLength = 1;
55 for (int j = 0; j < 26; ++j) {
56 if (characterCount[j] >= i) {
57 countForLength = countForLength * (combination(characterCount[j], i) + 1) % MOD;
58 }
59 }
60 // Subtract 1 because we are excluding the empty subsequence
61 answer = (answer + countForLength - 1) % MOD;
62 }
63
64 // Return the result after casting to int
65 return (int) answer;
66 }
67}
68
1#include <bits/stdc++.h>
2using namespace std;
3
4class Solution {
5 // Constants for MOD value and the pre-calculated array size
6 static const int ARRAY_SIZE = 10001;
7 static const int MOD = 1000000007;
8
9 // Pre-calculated factorial and modular multiplicative inverse arrays
10 static long long factorialArray[ARRAY_SIZE];
11 static long long inverseArray[ARRAY_SIZE];
12
13 // Static initializer block for pre-calculation of factorials and their inverses
14 static void initialize() {
15 factorialArray[0] = 1;
16 inverseArray[0] = 1;
17 for (int i = 1; i < ARRAY_SIZE; ++i) {
18 factorialArray[i] = factorialArray[i - 1] * i % MOD;
19 inverseArray[i] = quickModularInverse(factorialArray[i], MOD - 2);
20 }
21 }
22
23 // Method to compute the quick modular inverse using exponentiation
24 static long long quickModularInverse(long long base, long long exponent) {
25 long long result = 1;
26 while (exponent > 0) {
27 if (exponent % 2 == 1) {
28 result = (result * base) % MOD;
29 }
30 base = (base * base) % MOD;
31 exponent /= 2;
32 }
33 return result;
34 }
35
36 // Method to compute the combination (n choose k) under modulus
37 static long long combination(int n, int k) {
38 return ((factorialArray[n] * inverseArray[k]) % MOD) * inverseArray[n - k] % MOD;
39 }
40
41public:
42 // Method to count the number of good subsequences in the string s
43 int countGoodSubsequences(string s) {
44 // Initialize our static arrays
45 static bool initialized = false;
46 if (!initialized) {
47 initialize();
48 initialized = true;
49 }
50
51 // Count array for each character with a maximum value tracker
52 vector<int> characterCount(26, 0);
53 int maxCount = 1;
54
55 // Calculate character counts and find the max count
56 for (char c : s) {
57 maxCount = max(maxCount, ++characterCount[c - 'a']);
58 }
59
60 // Initialize the answer value which will be the final count
61 long long answer = 0;
62
63 // Iterate over all possible subsequence lengths
64 for (int i = 1; i <= maxCount; ++i) {
65 long long countForLength = 1;
66 for (int j = 0; j < 26; ++j) {
67 if (characterCount[j] >= i) {
68 countForLength = countForLength * (combination(characterCount[j], i) + 1) % MOD;
69 }
70 }
71 // Subtract 1 because we are excluding the empty subsequence
72 answer = (answer + countForLength - 1) % MOD;
73 }
74
75 // Return the result after casting to int
76 return static_cast<int>(answer);
77 }
78};
79
80long long Solution::factorialArray[Solution::ARRAY_SIZE];
81long long Solution::inverseArray[Solution::ARRAY_SIZE];
82
83// Main function for demonstrational purpose
84int main() {
85 Solution solver;
86 string s = "abcabc";
87 cout << solver.countGoodSubsequences(s) << endl;
88 return 0;
89}
90
1// Constants for MOD value and the pre-calculated array size
2const ARRAY_SIZE: number = 10001;
3const MOD: number = 1000000007;
4
5// Pre-calculated factorial and modular multiplicative inverse arrays
6const factorialArray: bigint[] = new Array(ARRAY_SIZE);
7const inverseArray: bigint[] = new Array(ARRAY_SIZE);
8
9// Static initializer block for pre-calculation of factorials and their inverses
10(() => {
11 factorialArray[0] = BigInt(1);
12 inverseArray[0] = BigInt(1);
13 for (let i = 1; i < ARRAY_SIZE; ++i) {
14 factorialArray[i] = (factorialArray[i - 1] * BigInt(i)) % BigInt(MOD);
15 inverseArray[i] = quickModularInverse(factorialArray[i], BigInt(MOD - 2), MOD);
16 }
17})();
18
19// Function to compute the quick modular inverse using exponentiation
20function quickModularInverse(base: bigint, exponent: bigint, modulus: number): bigint {
21 let result: bigint = BigInt(1);
22 while (exponent !== BigInt(0)) {
23 if ((exponent & BigInt(1)) === BigInt(1)) {
24 result = (result * base) % BigInt(modulus);
25 }
26 exponent >>= BigInt(1);
27 base = (base * base) % BigInt(modulus);
28 }
29 return result;
30}
31
32// Function to compute the combination (n choose k) under modulus
33function combination(n: number, k: number): bigint {
34 return (factorialArray[n] * inverseArray[k] % BigInt(MOD)) * inverseArray[n - k] % BigInt(MOD);
35}
36
37// Function to count the number of good subsequences in the string s
38function countGoodSubsequences(s: string): number {
39 // Array to count each character with a tracker for the maximum count
40 const characterCount: number[] = new Array(26).fill(0);
41 let maxCount: number = 1;
42
43 // Calculate character counts and find the maximum count
44 for (let i = 0; i < s.length; ++i) {
45 maxCount = Math.max(maxCount, ++characterCount[s.charCodeAt(i) - 'a'.charCodeAt(0)]);
46 }
47
48 // Initialize the answer which will be the final count of good subsequences
49 let answer: bigint = BigInt(0);
50
51 // Iterate over all possible subsequence lengths
52 for (let i = 1; i <= maxCount; ++i) {
53 let countForLength: bigint = BigInt(1);
54 for (let j = 0; j < 26; ++j) {
55 if (characterCount[j] >= i) {
56 countForLength = countForLength * (combination(characterCount[j], i) + BigInt(1)) % BigInt(MOD);
57 }
58 }
59 // Subtracting 1 because we are excluding the empty subsequence
60 answer = (answer + countForLength - BigInt(1)) % BigInt(MOD);
61 }
62
63 // Return the result after converting to a number type
64 return Number(answer);
65}
66
Time and Space Complexity
Time Complexity
The time complexity of the provided code can be analyzed as follows:
- The first two loops to compute arrays
f
andg
areO(N)
, whereN = 10001
. - The
comb
function is called for each combination and performs constant operations including multiplication and modulo, which can be consideredO(1)
. - The
for
loop incountGoodSubsequences
runs for the maximum value of counts plus one. In the worst case, this can beO(N)
whereN
is the length of strings
. - Inside this loop, there's another loop iterating over all values in
cnt
which in the worst case isO(26)
since the strings
can only be composed of lowercase English letters. For each of these iterations,comb
is called and followed by constant time operations.
Given that s
represents the length of the input string:
- The worst case for the
cnt
dictionary values to bemax(cnt.values())
is when all characters are the same, thereby iterating the loop incountGoodSubsequences
up to the length of the string s.
Adding everything together, the overall time complexity is O(len(s) * 26)
, which simplifies to O(len(s))
since 26 is a constant and does not depend on the input size.
Space Complexity
The space complexity can be analyzed as follows:
- The arrays
f
andg
, each of sizeN = 10001
, result in a space complexity ofO(N)
. - The
Counter
objectcnt
on the strings
will have up to 26 key-value pairs, leading to a constant space contribution,O(26)
, which isO(1)
. - Temporary variables used for computation inside the loop do not depend on the input size and hence contribute a constant space complexity.
Therefore, the overall space complexity is dominated by the space used for precomputed factorials and inverses, which is O(N)
.
Learn more about how to find time and space complexity quickly using problem constraints.
Which algorithm should you use to find a node that is close to the root of the tree?
Recommended Readings
Math for Technical Interviews How much math do I need to know for technical interviews The short answer is about high school level math Computer science is often associated with math and some universities even place their computer science department under the math faculty However the reality is that you
Backtracking Template Prereq DFS with States problems dfs_with_states Combinatorial search problems Combinatorial search problems involve finding groupings and assignments of objects that satisfy certain conditions Finding all permutations combinations subsets and solving Sudoku are classic combinatorial problems The time complexity of combinatorial problems often grows rapidly with the size of
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Want a Structured Path to Master System Design Too? Don’t Miss This!