2842. Count K-Subsequences of a String With Maximum Beauty
Problem Description
In this problem, we are given a string s
and an integer k
. Our goal is to find the number of subsequences of length k
from s
, such that all characters in the subsequence are unique, and the sum of the occurrences of these characters in s
is the maximum possible. This sum is known as the "beauty" of the subsequence.
A k-subsequence
is defined as a subsequence of s
that has exactly k
characters, which are all unique. We calculate the beauty of a k-subsequence by summing the number of times each of its characters appears in the original string s
(not in the subsequence itself). For example, if s ="abcb"
, the beauty of the subsequence "ac" would be f('a') + f('c')
, which is 1 + 1 = 2
in this case.
The problem asks us to return the count of all k-subsequences
that have the maximum beauty amongst all possible k-subsequences
. The result should be returned modulo 10^9 + 7
to handle potentially large numbers.
Intuition
To solve this problem effectively, we note a couple of key points:
-
The beauty of a
k-subsequence
depends solely on the characters included in it, not their order. Hence, we want to select thek
characters that have the highest frequencies ins
. -
Once we determine the
k
highest frequency characters, the problem boils down to counting all unique combinations of these characters that form validk-subsequences
.
Now, let's build on this intuition:
-
We start by counting the frequency of each character in
s
using a Counter. -
Next, if there are fewer than
k
unique characters in the string, there cannot be any validk-subsequences
, so we return0
. -
We sort the frequencies in descending order because we are interested in the
k
characters with the highest frequencies. -
The beauty of any
k-subsequence
is at least thek
-th highest frequency (represented byval
in the solution). Thus, we find how many characters have thisk
-th highest frequency (x
). -
The solution then calculates the number of ways we can pick the remaining
k-1
characters from those that have a higher frequency thanval
. -
Finally, it uses the combinatorial function to calculate the number of ways to choose the
k
characters with thek
-th highest frequencyval
. It multiplies this with the previously calculated value and takes the whole product modulo10^9 + 7
.
By the end of these steps, we have the total count of maximum beauty k-subsequences modulo 10^9 + 7
.
Solution Approach
The solution to the problem implements the following steps, utilizing a few Python-specific features and utility functions:
-
Counting Character Frequencies:
- The solution uses
Counter
from Python'scollections
library to count the occurrences of each character in the strings
. This gives us a dictionary where keys are characters and values are their respective frequencies.
- The solution uses
-
Shortcut for Impossibility:
- If the unique character count is less than
k
, we cannot form ak-subsequence
, so we return 0.
- If the unique character count is less than
-
Sorting and Selecting Frequencies:
- We sort the frequencies in descending order to focus on the highest frequencies first. The
k-th
highest frequency is found at indexk - 1
because of zero-based indexing. This value is important because it sets the minimum frequency a character must have to be considered for the maximum beauty.
- We sort the frequencies in descending order to focus on the highest frequencies first. The
-
Calculating Combinations for Choosing Characters:
- To use the characters with higher frequencies than the
k-th
highest frequency, we multiply the ways of picking these characters cumulatively. For each such character, we multiplyans
by its frequency modulomod
.
- To use the characters with higher frequencies than the
-
Handling Characters with Equal Frequency to the
k-th
:- Next, we find how many characters,
x
, have the same frequency as thek-th
highest frequencyval
. We need to choosek
from thesex
characters. However, as some characters with higher frequency have already been chosen (k
is decremented accordingly), we must choose the remaining from thesex
characters.
- Next, we find how many characters,
-
Calculations Involving Modular Arithmetic:
- We calculate
comb(x, k)
- the number of ways to choosek
remaining characters fromx
characters with the exact frequencyval
. - The
pow(val, k, mod)
calculatesval^k
(val to the power of k) modulomod
, representing multiplying the beauty combination by the frequencyval
, raised to the power of how many characters we are choosing (k
).
- We calculate
-
Final Calculation and Modulo Operation:
- The final answer is calculated by multiplying these values and taking the result modulo
mod
(10^9 + 7
) to handle large numbers according to the problem statement.
- The final answer is calculated by multiplying these values and taking the result modulo
Here's how the solution implements these steps:
1class Solution:
2 def countKSubsequencesWithMaxBeauty(self, s: str, k: int) -> int:
3 f = Counter(s) # step 1
4 if len(f) < k: # step 2
5 return 0
6 mod = 10**9 + 7
7 vs = sorted(f.values(), reverse=True) # step 3
8 val = vs[k - 1] # this gives us the k-th highest frequency
9 x = vs.count(val) # step 5
10 ans = 1
11 for v in vs: # step 4
12 if v == val:
13 break
14 k -= 1
15 ans = ans * v % mod
16 ans = ans * comb(x, k) * pow(val, k, mod) % mod # step 6 and 7
17 return ans
This approach efficiently calculates the number of k-subsequences
with maximum beauty by combining frequency analysis, sorting, combinatorial mathematics, and modular arithmetic.
Example Walkthrough
Let us consider a small example to illustrate the solution approach. Assume we have the string s = "aabbc"
and k = 2
.
Following the steps of the solution:
-
Counting Character Frequencies:
- Using
Counter(s)
, we determine that the frequency of 'a' is 2, 'b' is 2, and 'c' is 1. We have our frequency dictionary:{'a': 2, 'b': 2, 'c': 1}
.
- Using
-
Shortcut for Impossibility:
- There are more than
k
unique characters; hence ak-subsequence
is possible.
- There are more than
-
Sorting and Selecting Frequencies:
- We sort the frequencies in descending order:
[2, 2, 1]
. - The
k-th
highest frequency is at indexk - 1
, which is1
. So,val = vs[1] = 2
.
- We sort the frequencies in descending order:
-
Calculating Combinations for Choosing Characters:
- As the first two frequencies are equal and are the highest, we have no higher frequency characters to choose from. Therefore,
ans
remains 1.
- As the first two frequencies are equal and are the highest, we have no higher frequency characters to choose from. Therefore,
-
Handling Characters with Equal Frequency to the
k-th
:- There are two characters ('a' and 'b') with the same frequency as
val
, sox = 2
. - Since we have no characters with a higher frequency, we want to choose all
k
characters from thesex
characters with the highest frequency, meaning we want to choose 2 characters from a set of 2 characters.
- There are two characters ('a' and 'b') with the same frequency as
-
Calculations Involving Modular Arithmetic:
- We calculate
comb(x, k)
which iscomb(2, 2)
. Since choosing 2 characters out of 2 is only possible in one way,comb(2, 2) = 1
. - The
pow(val, k, mod)
term is basically2^2 % mod
which equals 4 modulo10^9 + 7
.
- We calculate
-
Final Calculation and Modulo Operation:
- The final answer is
ans * comb(x, k) * pow(val, k, mod) % mod
, which simplifies to1 * 1 * 4 % mod
= 4.
- The final answer is
Therefore, for the string s = "aabbc"
with k = 2
, there are 4 k-subsequences
with the maximum beauty. This is because the beautiful subsequences are: "aa"
, "ab"
, "ba"
, and "bb"
, all having a total frequency sum of 4 in the string s
.
Python Solution
1from collections import Counter
2from math import comb
3
4class Solution:
5 def countKSubsequencesWithMaxBeauty(self, string: str, k: int) -> int:
6 # Create a frequency counter for all characters in the string
7 frequency_counter = Counter(string)
8
9 # If there are fewer types of characters than k, return 0
10 if len(frequency_counter) < k:
11 return 0
12
13 # Define modulo for result to prevent integer overflow
14 mod = 10**9 + 7
15
16 # Sort the frequencies in descending order
17 sorted_frequencies = sorted(frequency_counter.values(), reverse=True)
18
19 # Get the k-th largest frequency, this will be the max beauty
20 max_beauty = sorted_frequencies[k - 1]
21
22 # Count how many times this frequency (max beauty) appears
23 count_of_max_beauty = sorted_frequencies.count(max_beauty)
24
25 # Initialize answer as 1 (multiplicative identity)
26 answer = 1
27
28 # Multiplying the count of characters by which we can not form subsequences (as they are
29 # more frequent than the k-th frequent character) modulo mod to keep the number manageable
30 for freq in sorted_frequencies:
31 if freq == max_beauty:
32 # Once we reach characters with the max beauty value, stop
33 break
34 k -= 1
35 answer = answer * freq % mod
36
37 # Multiply by the number of combinations of choosing k characters from 'count_of_max_beauty'
38 # and the max_beauty^(k) to calculate the beauty value, then take the modulo of the result
39 answer = answer * comb(count_of_max_beauty, k) * pow(max_beauty, k, mod) % mod
40
41 # Return the final answer
42 return answer
43
Java Solution
1import java.util.Arrays;
2
3class Solution {
4 private static final int MOD = (int) 1e9 + 7;
5
6 public int countKSubsequencesWithMaxBeauty(String s, int k) {
7 int[] frequency = new int[26];
8 int n = s.length();
9 int uniqueCharCount = 0;
10
11 // Count frequency of each character in the string s
12 for (int i = 0; i < n; i++) {
13 if (++frequency[s.charAt(i) - 'a'] == 1) {
14 uniqueCharCount++;
15 }
16 }
17
18 // If there are not enough unique characters to form a subsequence of length k
19 if (uniqueCharCount < k) {
20 return 0;
21 }
22
23 Integer[] values = new Integer[uniqueCharCount];
24 for (int i = 0, j = 0; i < 26; i++) {
25 if (frequency[i] > 0) {
26 values[j++] = frequency[i];
27 }
28 }
29
30 // Sort the frequency values in descending order
31 Arrays.sort(values, (a, b) -> b - a);
32
33 long answer = 1;
34 int minValue = values[k - 1];
35 int minFreqCount = 0;
36
37 // Calculate the frequency of the k-th largest unique character
38 for (int v : values) {
39 if (v == minValue) {
40 minFreqCount++;
41 }
42 }
43
44 // Calculate product of first k-1 largest frequencies
45 for (int v : values) {
46 if (v == minValue) {
47 break;
48 }
49 k--;
50 answer = answer * v % MOD;
51 }
52
53 int[][] combinations = new int[minFreqCount + 1][minFreqCount + 1];
54
55 // Initialize the combinations (Pascal's triangle for binomial coefficients)
56 for (int i = 0; i <= minFreqCount; i++) {
57 combinations[i][0] = 1;
58 for (int j = 1; j <= i; j++) {
59 combinations[i][j] = (combinations[i - 1][j - 1] + combinations[i - 1][j]) % MOD;
60 }
61 }
62
63 // Multiply with the number of ways to choose k elements from minFreqCount
64 // and raise minValue to power k
65 answer = (((answer * combinations[minFreqCount][k]) % MOD) * quickPow(minValue, k)) % MOD;
66 return (int) answer;
67 }
68
69 // Helper method to perform fast exponentiation modulo MOD
70 private long quickPow(long base, int exponent) {
71 long result = 1;
72 for (; exponent > 0; exponent >>= 1) {
73 if ((exponent & 1) == 1) {
74 result = result * base % MOD;
75 }
76 base = base * base % MOD;
77 }
78 return result;
79 }
80}
81
C++ Solution
1class Solution {
2public:
3 // Function to count the number of k subsequences with maximum beauty
4 int countKSubsequencesWithMaxBeauty(string s, int k) {
5 // Array to count frequency of letters
6 int frequency[26]{};
7 // Variable to store the distinct letter count
8 int distinctCount = 0;
9
10 // Loop to calculate the frequency of each character in the string
11 for (char& c : s) {
12 if (++frequency[c - 'a'] == 1) {
13 ++distinctCount;
14 }
15 }
16
17 // If there are fewer distinct letters than k, no valid subsequence exists
18 if (distinctCount < k) {
19 return 0;
20 }
21
22 // Vector to store frequencies of distinct letters
23 vector<int> frequencies(distinctCount);
24
25 // Populate the vector with the frequency counts of the letters
26 for (int i = 0, j = 0; i < 26; ++i) {
27 if (frequency[i]) {
28 frequencies[j++] = frequency[i];
29 }
30 }
31
32 // Sort the frequencies in descending order
33 sort(frequencies.rbegin(), frequencies.rend());
34
35 // Modulo for the result to avoid overflow
36 const int mod = 1e9 + 7;
37
38 // Variable to store the answer
39 long long answer = 1;
40
41 // Value of the k-th maximum frequency
42 int kthValue = frequencies[k - 1];
43
44 // Count how many letters have the k-th maximum frequency
45 int equalCount = 0;
46 for (int freq : frequencies) {
47 equalCount += freq == kthValue;
48 }
49
50 // Calculate part of the answer before considering combinations
51 for (int freq : frequencies) {
52 if (freq == kthValue) {
53 break;
54 }
55 --k;
56 answer = answer * freq % mod;
57 }
58
59 // Combinations array (Use Pascal's triangle)
60 int combinations[equalCount + 1][equalCount + 1];
61 memset(combinations, 0, sizeof(combinations));
62
63 for (int i = 0; i <= equalCount; ++i) {
64 combinations[i][0] = 1;
65 for (int j = 1; j <= i; ++j) {
66 combinations[i][j] = (combinations[i - 1][j] + combinations[i - 1][j - 1]) % mod;
67 }
68 }
69
70 // Quick power function to handle large exponents
71 auto quickPower = [&](long long base, int exponent) {
72 long long result = 1;
73 for (; exponent; exponent >>= 1) {
74 if (exponent & 1) {
75 result = result * base % mod;
76 }
77 base = base * base % mod;
78 }
79 return result;
80 };
81
82 // Finalize the answer with combinations and power calculation
83 answer = (answer * combinations[equalCount][k] % mod) * quickPower(kthValue, k) % mod;
84
85 return answer;
86 }
87};
88
Typescript Solution
1function countKSubsequencesWithMaxBeauty(s: string, k: number): number {
2 // Allocate an array to keep counts of each letter appearing in the string.
3 // Initialize an array of length 26 (for each letter of the alphabet) with zeros.
4 const frequency: number[] = new Array(26).fill(0);
5
6 // Initialize a count of unique characters present in the string.
7 let uniqueCharCount = 0;
8
9 // Count the frequency of each character in the input string.
10 for (const char of s) {
11 // Calculate the index by subtracting ASCII of 'a' to get a range from 0 to 25.
12 const index = char.charCodeAt(0) - 97;
13 // Increase the count for this character and increment unique character count if this is the first occurrence.
14 if (++frequency[index] === 1) {
15 ++uniqueCharCount;
16 }
17 }
18
19 // If the unique count is less than k, it is not possible to have k subsequences.
20 if (uniqueCharCount < k) {
21 return 0;
22 }
23
24 // Define modulus as per the problem constraints for large number handling.
25 const mod = BigInt(10 ** 9 + 7);
26
27 // Filter out zero frequency characters and sort the frequencies in descending order.
28 const validFrequencies: number[] = frequency.filter(value => value > 0).sort((a, b) => b - a);
29
30 // The value is the frequency of the k-th most frequent character after sorting.
31 const value = validFrequencies[k - 1];
32
33 // Count how many characters have the same frequency as the value.
34 const countWithValue = validFrequencies.filter(value => value === val).length;
35
36 // Initialize answer as a big integer.
37 let answer = 1n;
38
39 // Multiply answer by the frequency of each character more frequent than the k-th one.
40 for (const freq of validFrequencies) {
41 if (freq === value) {
42 break;
43 }
44 --k;
45 answer = (answer * BigInt(freq)) % mod;
46 }
47
48 // Initialize combination array for calculating combinations.
49 const combinations: number[][] = new Array(countWithValue + 1).fill(0).map(() => new Array(k + 1).fill(0));
50
51 // Populate the combination array using Pascal Triangle approach.
52 for (let i = 0; i <= countWithValue; ++i) {
53 combinations[i][0] = 1;
54 for (let j = 1; j <= Math.min(i, k); ++j) {
55 combinations[i][j] = (combinations[i - 1][j] + combinations[i - 1][j - 1]) % Number(mod);
56 }
57 }
58
59 // Define quick power function to calculate a^n % mod efficiently.
60 const quickPower = (a: bigint, n: number): bigint => {
61 let ans = 1n;
62 for (; n; n >>>= 1) {
63 if (n & 1) {
64 ans = (ans * a) % mod;
65 }
66 a = (a * a) % mod;
67 }
68 return ans;
69 };
70
71 // Complete the answer calculation by incorporating the combinations and quick power results.
72 answer = (((answer * BigInt(combinations[countWithValue][k])) % mod) * quickPower(BigInt(value), k)) % mod;
73
74 // Return the answer converted to a regular number.
75 return Number(answer);
76}
77
Time and Space Complexity
Time Complexity
The overall time complexity of the function countKSubsequencesWithMaxBeauty
is dominated by a few key operations:
-
Constructing the frequency counter (
f = Counter(s)
) with respect to the characters of the strings
, which has a complexity ofO(n)
wheren
is the length of the input strings
. -
Checking the length of the unique elements in the frequency counter (
if len(f) < k
), which isO(1)
because the length operation for a dictionary is constant time. -
Sorting the values of the frequency counter (
vs = sorted(f.values(), reverse=True)
), which will takeO(m log m)
wherem
is the number of distinct characters ins
. This is typically less than or equal ton
, but we consider it separately in cases
is very long but has few distinct characters. -
Counting the frequency of the
k-th
highest value (x = vs.count(val)
), which has a complexity ofO(m)
since it requires traversing the sorted list of frequencies. -
We have a loop to compute the product of values and modulo
ans = ans * v % mod
, but the loop runs at mostm
times, so this isO(m)
. -
Calculating the combination (
comb(x, k)
) which typically can be computed inO(k)
time, and calculating the power (pow(val, k, mod)
) which can be done inO(log k)
time using fast exponentiation.
Since m <= n
, we can say the time complexity is O(n + n log n + k + log k) = O(n log n)
since the log n
term from the sorting will dominate as n
grows large. k
is bounded by n
, hence it doesn't change the overall complexity.
Space Complexity
The space complexity is as follows:
-
The frequency counter
f
which will have at mostm
elements wherem
is the number of unique characters ins
, giving usO(m)
space. -
The sorted list
vs
, which will also be of sizem
, giving us anotherO(m)
space. -
Constant space for auxiliary variables like
mod
,val
,x
,ans
and a few others during iterations and calculations.
Therefore, the space complexity of the code is O(m)
, which can be O(n)
in the worst case when all the characters in s
are unique.
Got a question?ย Ask the Teaching Assistantย anything you don't understand.
Still not clear? Ask in the Forum, ย Discordย orย Submitย the part you don't understand to our editors.