2842. Count K-Subsequences of a String With Maximum Beauty


Problem Description

In this problem, we are given a string s and an integer k. Our goal is to find the number of subsequences of length k from s, such that all characters in the subsequence are unique, and the sum of the occurrences of these characters in s is the maximum possible. This sum is known as the "beauty" of the subsequence.

A k-subsequence is defined as a subsequence of s that has exactly k characters, which are all unique. We calculate the beauty of a k-subsequence by summing the number of times each of its characters appears in the original string s (not in the subsequence itself). For example, if s ="abcb", the beauty of the subsequence "ac" would be f('a') + f('c'), which is 1 + 1 = 2 in this case.

The problem asks us to return the count of all k-subsequences that have the maximum beauty amongst all possible k-subsequences. The result should be returned modulo 10^9 + 7 to handle potentially large numbers.

Intuition

To solve this problem effectively, we note a couple of key points:

  1. The beauty of a k-subsequence depends solely on the characters included in it, not their order. Hence, we want to select the k characters that have the highest frequencies in s.

  2. Once we determine the k highest frequency characters, the problem boils down to counting all unique combinations of these characters that form valid k-subsequences.

Now, let's build on this intuition:

  1. We start by counting the frequency of each character in s using a Counter.

  2. Next, if there are fewer than k unique characters in the string, there cannot be any valid k-subsequences, so we return 0.

  3. We sort the frequencies in descending order because we are interested in the k characters with the highest frequencies.

  4. The beauty of any k-subsequence is at least the k-th highest frequency (represented by val in the solution). Thus, we find how many characters have this k-th highest frequency (x).

  5. The solution then calculates the number of ways we can pick the remaining k-1 characters from those that have a higher frequency than val.

  6. Finally, it uses the combinatorial function to calculate the number of ways to choose the k characters with the k-th highest frequency val. It multiplies this with the previously calculated value and takes the whole product modulo 10^9 + 7.

By the end of these steps, we have the total count of maximum beauty k-subsequences modulo 10^9 + 7.

Solution Approach

The solution to the problem implements the following steps, utilizing a few Python-specific features and utility functions:

  1. Counting Character Frequencies:

    • The solution uses Counter from Python's collections library to count the occurrences of each character in the string s. This gives us a dictionary where keys are characters and values are their respective frequencies.
  2. Shortcut for Impossibility:

    • If the unique character count is less than k, we cannot form a k-subsequence, so we return 0.
  3. Sorting and Selecting Frequencies:

    • We sort the frequencies in descending order to focus on the highest frequencies first. The k-th highest frequency is found at index k - 1 because of zero-based indexing. This value is important because it sets the minimum frequency a character must have to be considered for the maximum beauty.
  4. Calculating Combinations for Choosing Characters:

    • To use the characters with higher frequencies than the k-th highest frequency, we multiply the ways of picking these characters cumulatively. For each such character, we multiply ans by its frequency modulo mod.
  5. Handling Characters with Equal Frequency to the k-th:

    • Next, we find how many characters, x, have the same frequency as the k-th highest frequency val. We need to choose k from these x characters. However, as some characters with higher frequency have already been chosen (k is decremented accordingly), we must choose the remaining from these x characters.
  6. Calculations Involving Modular Arithmetic:

    • We calculate comb(x, k) - the number of ways to choose k remaining characters from x characters with the exact frequency val.
    • The pow(val, k, mod) calculates val^k (val to the power of k) modulo mod, representing multiplying the beauty combination by the frequency val, raised to the power of how many characters we are choosing (k).
  7. Final Calculation and Modulo Operation:

    • The final answer is calculated by multiplying these values and taking the result modulo mod (10^9 + 7) to handle large numbers according to the problem statement.

Here's how the solution implements these steps:

1class Solution:
2    def countKSubsequencesWithMaxBeauty(self, s: str, k: int) -> int:
3        f = Counter(s) # step 1
4        if len(f) < k: # step 2
5            return 0
6        mod = 10**9 + 7
7        vs = sorted(f.values(), reverse=True) # step 3
8        val = vs[k - 1] # this gives us the k-th highest frequency
9        x = vs.count(val) # step 5
10        ans = 1
11        for v in vs: # step 4
12            if v == val:
13                break
14            k -= 1
15            ans = ans * v % mod
16        ans = ans * comb(x, k) * pow(val, k, mod) % mod # step 6 and 7
17        return ans

This approach efficiently calculates the number of k-subsequences with maximum beauty by combining frequency analysis, sorting, combinatorial mathematics, and modular arithmetic.

Example Walkthrough

Let us consider a small example to illustrate the solution approach. Assume we have the string s = "aabbc" and k = 2.

Following the steps of the solution:

  1. Counting Character Frequencies:

    • Using Counter(s), we determine that the frequency of 'a' is 2, 'b' is 2, and 'c' is 1. We have our frequency dictionary: {'a': 2, 'b': 2, 'c': 1}.
  2. Shortcut for Impossibility:

    • There are more than k unique characters; hence a k-subsequence is possible.
  3. Sorting and Selecting Frequencies:

    • We sort the frequencies in descending order: [2, 2, 1].
    • The k-th highest frequency is at index k - 1, which is 1. So, val = vs[1] = 2.
  4. Calculating Combinations for Choosing Characters:

    • As the first two frequencies are equal and are the highest, we have no higher frequency characters to choose from. Therefore, ans remains 1.
  5. Handling Characters with Equal Frequency to the k-th:

    • There are two characters ('a' and 'b') with the same frequency as val, so x = 2.
    • Since we have no characters with a higher frequency, we want to choose all k characters from these x characters with the highest frequency, meaning we want to choose 2 characters from a set of 2 characters.
  6. Calculations Involving Modular Arithmetic:

    • We calculate comb(x, k) which is comb(2, 2). Since choosing 2 characters out of 2 is only possible in one way, comb(2, 2) = 1.
    • The pow(val, k, mod) term is basically 2^2 % mod which equals 4 modulo 10^9 + 7.
  7. Final Calculation and Modulo Operation:

    • The final answer is ans * comb(x, k) * pow(val, k, mod) % mod, which simplifies to 1 * 1 * 4 % mod = 4.

Therefore, for the string s = "aabbc" with k = 2, there are 4 k-subsequences with the maximum beauty. This is because the beautiful subsequences are: "aa", "ab", "ba", and "bb", all having a total frequency sum of 4 in the string s.

Python Solution

1from collections import Counter
2from math import comb
3
4class Solution:
5    def countKSubsequencesWithMaxBeauty(self, string: str, k: int) -> int:
6        # Create a frequency counter for all characters in the string
7        frequency_counter = Counter(string)
8      
9        # If there are fewer types of characters than k, return 0
10        if len(frequency_counter) < k:
11            return 0
12      
13        # Define modulo for result to prevent integer overflow
14        mod = 10**9 + 7
15      
16        # Sort the frequencies in descending order
17        sorted_frequencies = sorted(frequency_counter.values(), reverse=True)
18      
19        # Get the k-th largest frequency, this will be the max beauty
20        max_beauty = sorted_frequencies[k - 1]
21      
22        # Count how many times this frequency (max beauty) appears
23        count_of_max_beauty = sorted_frequencies.count(max_beauty)
24      
25        # Initialize answer as 1 (multiplicative identity)
26        answer = 1
27      
28        # Multiplying the count of characters by which we can not form subsequences (as they are 
29        # more frequent than the k-th frequent character) modulo mod to keep the number manageable
30        for freq in sorted_frequencies:
31            if freq == max_beauty:
32                # Once we reach characters with the max beauty value, stop
33                break
34            k -= 1
35            answer = answer * freq % mod
36      
37        # Multiply by the number of combinations of choosing k characters from 'count_of_max_beauty'
38        # and the max_beauty^(k) to calculate the beauty value, then take the modulo of the result
39        answer = answer * comb(count_of_max_beauty, k) * pow(max_beauty, k, mod) % mod
40      
41        # Return the final answer
42        return answer
43

Java Solution

1import java.util.Arrays;
2
3class Solution {
4    private static final int MOD = (int) 1e9 + 7;
5
6    public int countKSubsequencesWithMaxBeauty(String s, int k) {
7        int[] frequency = new int[26];
8        int n = s.length();
9        int uniqueCharCount = 0;
10
11        // Count frequency of each character in the string s
12        for (int i = 0; i < n; i++) {
13            if (++frequency[s.charAt(i) - 'a'] == 1) {
14                uniqueCharCount++;
15            }
16        }
17
18        // If there are not enough unique characters to form a subsequence of length k
19        if (uniqueCharCount < k) {
20            return 0;
21        }
22
23        Integer[] values = new Integer[uniqueCharCount];
24        for (int i = 0, j = 0; i < 26; i++) {
25            if (frequency[i] > 0) {
26                values[j++] = frequency[i];
27            }
28        }
29
30        // Sort the frequency values in descending order
31        Arrays.sort(values, (a, b) -> b - a);
32
33        long answer = 1;
34        int minValue = values[k - 1];
35        int minFreqCount = 0;
36
37        // Calculate the frequency of the k-th largest unique character
38        for (int v : values) {
39            if (v == minValue) {
40                minFreqCount++;
41            }
42        }
43
44        // Calculate product of first k-1 largest frequencies
45        for (int v : values) {
46            if (v == minValue) {
47                break;
48            }
49            k--;
50            answer = answer * v % MOD;
51        }
52
53        int[][] combinations = new int[minFreqCount + 1][minFreqCount + 1];
54
55        // Initialize the combinations (Pascal's triangle for binomial coefficients)
56        for (int i = 0; i <= minFreqCount; i++) {
57            combinations[i][0] = 1;
58            for (int j = 1; j <= i; j++) {
59                combinations[i][j] = (combinations[i - 1][j - 1] + combinations[i - 1][j]) % MOD;
60            }
61        }
62
63        // Multiply with the number of ways to choose k elements from minFreqCount
64        // and raise minValue to power k
65        answer = (((answer * combinations[minFreqCount][k]) % MOD) * quickPow(minValue, k)) % MOD;
66        return (int) answer;
67    }
68
69    // Helper method to perform fast exponentiation modulo MOD
70    private long quickPow(long base, int exponent) {
71        long result = 1;
72        for (; exponent > 0; exponent >>= 1) {
73            if ((exponent & 1) == 1) {
74                result = result * base % MOD;
75            }
76            base = base * base % MOD;
77        }
78        return result;
79    }
80}
81

C++ Solution

1class Solution {
2public:
3    // Function to count the number of k subsequences with maximum beauty
4    int countKSubsequencesWithMaxBeauty(string s, int k) {
5        // Array to count frequency of letters
6        int frequency[26]{};
7        // Variable to store the distinct letter count
8        int distinctCount = 0;
9
10        // Loop to calculate the frequency of each character in the string
11        for (char& c : s) {
12            if (++frequency[c - 'a'] == 1) {
13                ++distinctCount;
14            }
15        }
16      
17        // If there are fewer distinct letters than k, no valid subsequence exists
18        if (distinctCount < k) {
19            return 0;
20        }
21
22        // Vector to store frequencies of distinct letters
23        vector<int> frequencies(distinctCount);
24
25        // Populate the vector with the frequency counts of the letters
26        for (int i = 0, j = 0; i < 26; ++i) {
27            if (frequency[i]) {
28                frequencies[j++] = frequency[i];
29            }
30        }
31
32        // Sort the frequencies in descending order
33        sort(frequencies.rbegin(), frequencies.rend());
34
35        // Modulo for the result to avoid overflow
36        const int mod = 1e9 + 7;
37
38        // Variable to store the answer
39        long long answer = 1;
40
41        // Value of the k-th maximum frequency
42        int kthValue = frequencies[k - 1];
43
44        // Count how many letters have the k-th maximum frequency
45        int equalCount = 0;
46        for (int freq : frequencies) {
47            equalCount += freq == kthValue;
48        }
49
50        // Calculate part of the answer before considering combinations
51        for (int freq : frequencies) {
52            if (freq == kthValue) {
53                break;
54            }
55            --k;
56            answer = answer * freq % mod;
57        }
58
59        // Combinations array (Use Pascal's triangle)
60        int combinations[equalCount + 1][equalCount + 1];
61        memset(combinations, 0, sizeof(combinations));
62      
63        for (int i = 0; i <= equalCount; ++i) {
64            combinations[i][0] = 1;
65            for (int j = 1; j <= i; ++j) {
66                combinations[i][j] = (combinations[i - 1][j] + combinations[i - 1][j - 1]) % mod;
67            }
68        }
69
70        // Quick power function to handle large exponents
71        auto quickPower = [&](long long base, int exponent) {
72            long long result = 1;
73            for (; exponent; exponent >>= 1) {
74                if (exponent & 1) {
75                    result = result * base % mod;
76                }
77                base = base * base % mod;
78            }
79            return result;
80        };
81
82        // Finalize the answer with combinations and power calculation
83        answer = (answer * combinations[equalCount][k] % mod) * quickPower(kthValue, k) % mod;
84      
85        return answer;
86    }
87};
88

Typescript Solution

1function countKSubsequencesWithMaxBeauty(s: string, k: number): number {
2    // Allocate an array to keep counts of each letter appearing in the string.
3    // Initialize an array of length 26 (for each letter of the alphabet) with zeros.
4    const frequency: number[] = new Array(26).fill(0);
5
6    // Initialize a count of unique characters present in the string.
7    let uniqueCharCount = 0;
8  
9    // Count the frequency of each character in the input string.
10    for (const char of s) {
11        // Calculate the index by subtracting ASCII of 'a' to get a range from 0 to 25.
12        const index = char.charCodeAt(0) - 97;
13        // Increase the count for this character and increment unique character count if this is the first occurrence.
14        if (++frequency[index] === 1) {
15            ++uniqueCharCount;
16        }
17    }
18  
19    // If the unique count is less than k, it is not possible to have k subsequences.
20    if (uniqueCharCount < k) {
21        return 0;
22    }
23  
24    // Define modulus as per the problem constraints for large number handling.
25    const mod = BigInt(10 ** 9 + 7);
26  
27    // Filter out zero frequency characters and sort the frequencies in descending order.
28    const validFrequencies: number[] = frequency.filter(value => value > 0).sort((a, b) => b - a);
29  
30    // The value is the frequency of the k-th most frequent character after sorting.
31    const value = validFrequencies[k - 1];
32  
33    // Count how many characters have the same frequency as the value.
34    const countWithValue = validFrequencies.filter(value => value === val).length;
35  
36    // Initialize answer as a big integer.
37    let answer = 1n;
38  
39    // Multiply answer by the frequency of each character more frequent than the k-th one.
40    for (const freq of validFrequencies) {
41        if (freq === value) {
42            break;
43        }
44        --k;
45        answer = (answer * BigInt(freq)) % mod;
46    }
47  
48    // Initialize combination array for calculating combinations.
49    const combinations: number[][] = new Array(countWithValue + 1).fill(0).map(() => new Array(k + 1).fill(0));
50  
51    // Populate the combination array using Pascal Triangle approach.
52    for (let i = 0; i <= countWithValue; ++i) {
53        combinations[i][0] = 1;
54        for (let j = 1; j <= Math.min(i, k); ++j) {
55            combinations[i][j] = (combinations[i - 1][j] + combinations[i - 1][j - 1]) % Number(mod);
56        }
57    }
58  
59    // Define quick power function to calculate a^n % mod efficiently.
60    const quickPower = (a: bigint, n: number): bigint => {
61        let ans = 1n;
62        for (; n; n >>>= 1) {
63            if (n & 1) {
64                ans = (ans * a) % mod;
65            }
66            a = (a * a) % mod;
67        }
68        return ans;
69    };
70  
71    // Complete the answer calculation by incorporating the combinations and quick power results.
72    answer = (((answer * BigInt(combinations[countWithValue][k])) % mod) * quickPower(BigInt(value), k)) % mod;
73  
74    // Return the answer converted to a regular number.
75    return Number(answer);
76}
77

Time and Space Complexity

Time Complexity

The overall time complexity of the function countKSubsequencesWithMaxBeauty is dominated by a few key operations:

  1. Constructing the frequency counter (f = Counter(s)) with respect to the characters of the string s, which has a complexity of O(n) where n is the length of the input string s.

  2. Checking the length of the unique elements in the frequency counter (if len(f) < k), which is O(1) because the length operation for a dictionary is constant time.

  3. Sorting the values of the frequency counter (vs = sorted(f.values(), reverse=True)), which will take O(m log m) where m is the number of distinct characters in s. This is typically less than or equal to n, but we consider it separately in case s is very long but has few distinct characters.

  4. Counting the frequency of the k-th highest value (x = vs.count(val)), which has a complexity of O(m) since it requires traversing the sorted list of frequencies.

  5. We have a loop to compute the product of values and modulo ans = ans * v % mod, but the loop runs at most m times, so this is O(m).

  6. Calculating the combination (comb(x, k)) which typically can be computed in O(k) time, and calculating the power (pow(val, k, mod)) which can be done in O(log k) time using fast exponentiation.

Since m <= n, we can say the time complexity is O(n + n log n + k + log k) = O(n log n) since the log n term from the sorting will dominate as n grows large. k is bounded by n, hence it doesn't change the overall complexity.

Space Complexity

The space complexity is as follows:

  1. The frequency counter f which will have at most m elements where m is the number of unique characters in s, giving us O(m) space.

  2. The sorted list vs, which will also be of size m, giving us another O(m) space.

  3. Constant space for auxiliary variables like mod, val, x, ans and a few others during iterations and calculations.

Therefore, the space complexity of the code is O(m), which can be O(n) in the worst case when all the characters in s are unique.


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫