Facebook Pixel

2842. Count K-Subsequences of a String With Maximum Beauty

Problem Description

You are given a string s and an integer k. Your task is to find k-subsequences with maximum beauty value.

A k-subsequence is defined as:

  • A subsequence of string s with exactly length k
  • All characters in the subsequence must be unique (each character appears only once in the subsequence)

The beauty of a k-subsequence is calculated as the sum of f(c) for every character c in that subsequence, where f(c) is the total number of times character c appears in the original string s (not in the subsequence).

For example, with s = "abbbdd" and k = 2:

  • Character frequencies: f('a') = 1, f('b') = 3, f('d') = 2
  • Some possible k-subsequences:
    • "ab" has beauty = f('a') + f('b') = 1 + 3 = 4
    • "ad" has beauty = f('a') + f('d') = 1 + 2 = 3
    • "bd" has beauty = f('b') + f('d') = 3 + 2 = 5

You need to:

  1. Find the maximum possible beauty value among all k-subsequences
  2. Count how many k-subsequences achieve this maximum beauty
  3. Return the count modulo 10^9 + 7

Important notes:

  • Two k-subsequences are considered different if they use different indices from the original string, even if they form the same character sequence
  • If it's impossible to form a k-subsequence (when there are fewer than k distinct characters in s), return 0

The challenge involves identifying which characters to include in the k-subsequence to maximize beauty, then counting all possible ways to form such subsequences from the original string positions.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

To maximize the beauty of a k-subsequence, we need to choose characters that have the highest frequencies in the original string. Think of it this way: since beauty is the sum of f(c) for each character in our subsequence, we want to include characters with the largest f(c) values.

The greedy approach works here - we should always pick the k characters with the highest frequencies. If we have characters with frequencies [5, 4, 3, 2, 1] and need to pick k=3 characters, we'd pick those with frequencies [5, 4, 3] to get maximum beauty of 5 + 4 + 3 = 12.

However, there's a complication when multiple characters have the same frequency. For example, if frequencies are [5, 4, 3, 3, 3, 2] and k=4, we must pick the characters with frequencies [5, 4] and then choose 2 out of the 3 characters that have frequency 3.

This leads us to the key insight:

  1. First, sort all character frequencies in descending order
  2. The k-th largest frequency (val) determines our cutoff point
  3. All characters with frequency greater than val must be included
  4. For characters with frequency exactly equal to val, we need to choose some of them

The counting part requires combinatorial mathematics:

  • For characters with frequency > val: each occurrence in the string can be chosen independently, giving us frequency choices per character
  • For characters with frequency = val: we need to choose exactly remaining_k characters from x available characters (where x is the count of characters with this frequency), giving us C(x, remaining_k) ways
  • Each selected character with frequency val contributes val choices for positions in the original string

The final answer is the product of all these choices: ans = (product of frequencies > val) × C(x, remaining_k) × val^remaining_k

This approach ensures we find all k-subsequences with maximum beauty by systematically counting all valid combinations of the highest-frequency characters.

Learn more about Greedy, Math and Combinatorics patterns.

Solution Approach

The implementation follows these key steps:

  1. Character Frequency Counting: Use a Counter (hash table) f to count the occurrence of each character in string s. This gives us f[c] for each character.

  2. Early Termination Check: If the number of distinct characters (len(f)) is less than k, it's impossible to form a k-subsequence with unique characters, so return 0.

  3. Sort Frequencies: Extract all frequency values from f and sort them in descending order into array vs. This allows us to greedily select characters with highest frequencies.

  4. Find Cutoff Frequency: The k-th largest frequency val = vs[k-1] determines our cutoff. Characters with this frequency may or may not be fully included in the optimal subsequences.

  5. Count Characters at Cutoff: Count how many characters have frequency exactly equal to val. Store this as x = vs.count(val).

  6. Calculate Answer in Two Parts:

    Part 1 - Characters with frequency > val:

    • Iterate through vs and for each frequency v > val:
      • Multiply answer by v (each occurrence can be chosen)
      • Decrement k to track remaining slots
    • After this loop, k represents how many characters with frequency val we need to select

    Part 2 - Characters with frequency = val:

    • We need to choose k characters from x available characters: multiply by comb(x, k)
    • Each selected character contributes val possible positions: multiply by pow(val, k, mod)
  7. Modular Arithmetic: All multiplications are done modulo 10^9 + 7 to prevent overflow.

The final formula becomes:

ans = (∏ v for all v > val) × C(x, k) × val^k (mod 10^9 + 7)

This approach efficiently counts all k-subsequences with maximum beauty by:

  • Using greedy selection to identify which characters must be included
  • Using combinatorics to count different ways to select characters at the cutoff frequency
  • Using modular exponentiation for efficient computation of large powers

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through an example with s = "aabbccd" and k = 3.

Step 1: Count character frequencies

  • f('a') = 2
  • f('b') = 2
  • f('c') = 2
  • f('d') = 1

Step 2: Check if k-subsequence is possible

  • We have 4 distinct characters, and k = 3, so it's possible (4 ≥ 3) ✓

Step 3: Sort frequencies in descending order

  • vs = [2, 2, 2, 1]

Step 4: Find the cutoff frequency

  • The k-th (3rd) largest frequency is val = vs[k-1] = vs[2] = 2
  • This means we need at least characters with frequency ≥ 2

Step 5: Count characters at cutoff

  • Characters with frequency exactly 2: 'a', 'b', 'c' → count x = 3
  • Characters with frequency > 2: none

Step 6: Calculate the answer

Since all top frequencies equal val = 2, we skip Part 1 and go directly to Part 2:

  • We need to choose k = 3 characters from x = 3 available characters
  • Number of ways to choose: C(3, 3) = 1
  • Each chosen character has 2 occurrences in the string, so each contributes factor of 2
  • Total combinations: 1 × 2^3 = 1 × 8 = 8

Verification of the 8 subsequences: The maximum beauty is 2 + 2 + 2 = 6 (choosing any 3 characters from {a, b, c})

Since we must choose all three characters {a, b, c}, and each appears twice in the string:

  • For 'a': can choose index 0 or 1 (2 ways)
  • For 'b': can choose index 2 or 3 (2 ways)
  • For 'c': can choose index 4 or 5 (2 ways)
  • Total: 2 × 2 × 2 = 8 different k-subsequences

All 8 k-subsequences have the same character set "abc" but use different index combinations from the original string, each achieving the maximum beauty of 6.

Solution Implementation

1from collections import Counter
2from math import comb
3
4class Solution:
5    def countKSubsequencesWithMaxBeauty(self, s: str, k: int) -> int:
6        # Count frequency of each character in the string
7        char_frequency = Counter(s)
8      
9        # If there are fewer unique characters than k, no valid subsequence exists
10        if len(char_frequency) < k:
11            return 0
12      
13        MOD = 10**9 + 7
14      
15        # Sort character frequencies in descending order
16        frequency_values = sorted(char_frequency.values(), reverse=True)
17      
18        # Find the k-th largest frequency value (threshold frequency)
19        threshold_freq = frequency_values[k - 1]
20      
21        # Count how many characters have the threshold frequency
22        num_chars_with_threshold = frequency_values.count(threshold_freq)
23      
24        # Initialize result to 1 (multiplicative identity)
25        result = 1
26      
27        # Process all frequencies greater than the threshold
28        # These characters must be included in every valid subsequence
29        for freq in frequency_values:
30            if freq == threshold_freq:
31                break
32            k -= 1  # Reduce remaining slots to fill
33            result = (result * freq) % MOD
34      
35        # For characters with threshold frequency, we need to choose k from num_chars_with_threshold
36        # Multiply by combinations * (threshold_freq ^ k) for the remaining slots
37        result = (result * comb(num_chars_with_threshold, k) * pow(threshold_freq, k, MOD)) % MOD
38      
39        return result
40
1class Solution {
2    private final int MOD = (int) 1e9 + 7;
3
4    public int countKSubsequencesWithMaxBeauty(String s, int k) {
5        // Count frequency of each character
6        int[] frequency = new int[26];
7        int stringLength = s.length();
8        int distinctCharCount = 0;
9      
10        // Calculate frequency and count distinct characters
11        for (int i = 0; i < stringLength; i++) {
12            int charIndex = s.charAt(i) - 'a';
13            if (++frequency[charIndex] == 1) {
14                distinctCharCount++;
15            }
16        }
17      
18        // If we don't have enough distinct characters, return 0
19        if (distinctCharCount < k) {
20            return 0;
21        }
22      
23        // Create array of non-zero frequencies
24        Integer[] frequencies = new Integer[distinctCharCount];
25        int index = 0;
26        for (int i = 0; i < 26; i++) {
27            if (frequency[i] > 0) {
28                frequencies[index++] = frequency[i];
29            }
30        }
31      
32        // Sort frequencies in descending order
33        Arrays.sort(frequencies, (a, b) -> b - a);
34      
35        // Initialize result
36        long result = 1;
37      
38        // Get the k-th largest frequency value (boundary value)
39        int boundaryFrequency = frequencies[k - 1];
40      
41        // Count how many characters have the boundary frequency
42        int boundaryFrequencyCount = 0;
43        for (int freq : frequencies) {
44            if (freq == boundaryFrequency) {
45                boundaryFrequencyCount++;
46            }
47        }
48      
49        // Multiply result by frequencies greater than boundary
50        for (int freq : frequencies) {
51            if (freq == boundaryFrequency) {
52                break;
53            }
54            k--;
55            result = result * freq % MOD;
56        }
57      
58        // Build Pascal's triangle for combination calculation
59        int[][] combinations = new int[boundaryFrequencyCount + 1][boundaryFrequencyCount + 1];
60        for (int i = 0; i <= boundaryFrequencyCount; i++) {
61            combinations[i][0] = 1;
62            for (int j = 1; j <= i; j++) {
63                combinations[i][j] = (combinations[i - 1][j - 1] + combinations[i - 1][j]) % MOD;
64            }
65        }
66      
67        // Calculate final result:
68        // result * C(boundaryFrequencyCount, k) * boundaryFrequency^k
69        result = ((result * combinations[boundaryFrequencyCount][k]) % MOD) * 
70                 quickPower(boundaryFrequency, k) % MOD;
71      
72        return (int) result;
73    }
74
75    /**
76     * Calculate (base^exponent) % MOD using binary exponentiation
77     */
78    private long quickPower(long base, int exponent) {
79        long result = 1;
80        while (exponent > 0) {
81            // If current bit is 1, multiply result by base
82            if ((exponent & 1) == 1) {
83                result = result * base % MOD;
84            }
85            // Square the base for next bit
86            base = base * base % MOD;
87            // Move to next bit
88            exponent >>= 1;
89        }
90        return result;
91    }
92}
93
1class Solution {
2public:
3    int countKSubsequencesWithMaxBeauty(string s, int k) {
4        // Count frequency of each character
5        int frequency[26]{};
6        int distinctChars = 0;
7      
8        for (char& c : s) {
9            if (++frequency[c - 'a'] == 1) {
10                distinctChars++;
11            }
12        }
13      
14        // If distinct characters less than k, impossible to form subsequence
15        if (distinctChars < k) {
16            return 0;
17        }
18      
19        // Collect non-zero frequencies
20        vector<int> frequencies(distinctChars);
21        for (int i = 0, j = 0; i < 26; ++i) {
22            if (frequency[i]) {
23                frequencies[j++] = frequency[i];
24            }
25        }
26      
27        // Sort frequencies in descending order
28        sort(frequencies.rbegin(), frequencies.rend());
29      
30        const int MOD = 1e9 + 7;
31        long long result = 1;
32      
33        // Find the k-th largest frequency value
34        int kthFrequency = frequencies[k - 1];
35      
36        // Count how many characters have the k-th frequency
37        int countWithKthFreq = 0;
38        for (int freq : frequencies) {
39            countWithKthFreq += (freq == kthFrequency);
40        }
41      
42        // Multiply frequencies greater than k-th frequency
43        for (int freq : frequencies) {
44            if (freq == kthFrequency) {
45                break;
46            }
47            k--;
48            result = result * freq % MOD;
49        }
50      
51        // Build Pascal's triangle for combination calculation
52        int combinations[countWithKthFreq + 1][countWithKthFreq + 1];
53        memset(combinations, 0, sizeof(combinations));
54      
55        for (int i = 0; i <= countWithKthFreq; ++i) {
56            combinations[i][0] = 1;
57            for (int j = 1; j <= i; ++j) {
58                combinations[i][j] = (combinations[i - 1][j] + combinations[i - 1][j - 1]) % MOD;
59            }
60        }
61      
62        // Fast exponentiation function
63        auto quickPower = [&](long long base, int exponent) {
64            long long answer = 1;
65            while (exponent > 0) {
66                if (exponent & 1) {
67                    answer = answer * base % MOD;
68                }
69                base = base * base % MOD;
70                exponent >>= 1;
71            }
72            return answer;
73        };
74      
75        // Calculate final result: result * C(countWithKthFreq, k) * kthFrequency^k
76        result = (result * combinations[countWithKthFreq][k] % MOD) * quickPower(kthFrequency, k) % MOD;
77      
78        return result;
79    }
80};
81
1/**
2 * Counts k-subsequences with maximum beauty from string s
3 * Beauty is defined as the product of character frequencies in the subsequence
4 * @param s - Input string
5 * @param k - Number of distinct characters to select
6 * @returns Number of k-subsequences with maximum beauty modulo 10^9 + 7
7 */
8function countKSubsequencesWithMaxBeauty(s: string, k: number): number {
9    // Initialize frequency array for 26 lowercase letters
10    const charFrequencies: number[] = new Array(26).fill(0);
11    let distinctCharCount = 0;
12  
13    // Count frequency of each character and track distinct characters
14    for (const char of s) {
15        const charIndex = char.charCodeAt(0) - 97; // Convert 'a'-'z' to 0-25
16        if (++charFrequencies[charIndex] === 1) {
17            ++distinctCharCount;
18        }
19    }
20  
21    // If we don't have enough distinct characters, return 0
22    if (distinctCharCount < k) {
23        return 0;
24    }
25  
26    const MOD = BigInt(10 ** 9 + 7);
27  
28    // Get non-zero frequencies and sort in descending order
29    const nonZeroFrequencies: number[] = charFrequencies
30        .filter(freq => freq > 0)
31        .sort((a, b) => b - a);
32  
33    // Find the k-th largest frequency value
34    const kthLargestFreq = nonZeroFrequencies[k - 1];
35  
36    // Count how many characters have the k-th largest frequency
37    const countOfKthFreq = nonZeroFrequencies.filter(freq => freq === kthLargestFreq).length;
38  
39    let result = 1n;
40  
41    // Multiply frequencies larger than the k-th largest
42    for (const freq of nonZeroFrequencies) {
43        if (freq === kthLargestFreq) {
44            break;
45        }
46        --k;
47        result = (result * BigInt(freq)) % MOD;
48    }
49  
50    // Build Pascal's triangle for combinations C(n, r)
51    const combinations: number[][] = new Array(countOfKthFreq + 1)
52        .fill(0)
53        .map(() => new Array(k + 1).fill(0));
54  
55    for (let n = 0; n <= countOfKthFreq; ++n) {
56        combinations[n][0] = 1;
57        for (let r = 1; r <= Math.min(n, k); ++r) {
58            combinations[n][r] = (combinations[n - 1][r] + combinations[n - 1][r - 1]) % Number(MOD);
59        }
60    }
61  
62    // Calculate final result: result * C(countOfKthFreq, k) * kthLargestFreq^k
63    result = (((result * BigInt(combinations[countOfKthFreq][k])) % MOD) * 
64              quickPower(BigInt(kthLargestFreq), k)) % MOD;
65  
66    return Number(result);
67}
68
69/**
70 * Performs fast modular exponentiation
71 * @param base - Base number
72 * @param exponent - Exponent value
73 * @returns (base^exponent) % MOD
74 */
75function quickPower(base: bigint, exponent: number): bigint {
76    const MOD = BigInt(10 ** 9 + 7);
77    let result = 1n;
78  
79    while (exponent > 0) {
80        // If current bit is 1, multiply result by base
81        if (exponent & 1) {
82            result = (result * base) % MOD;
83        }
84        // Square the base for next bit
85        base = (base * base) % MOD;
86        // Right shift to process next bit
87        exponent >>>= 1;
88    }
89  
90    return result;
91}
92

Time and Space Complexity

The time complexity is O(n + |Σ| log |Σ|), where n is the length of the string and |Σ| is the size of the character set (at most 26 for lowercase letters).

  • Creating the frequency counter takes O(n) time as we iterate through the string once
  • Sorting the frequency values takes O(|Σ| log |Σ|) time, which is O(26 log 26) = O(1) for lowercase letters
  • Counting occurrences of val takes O(|Σ|) time
  • The first loop iterates at most k times, which is at most |Σ| times, taking O(|Σ|) time
  • The combination calculation comb(x, k) takes O(min(k, x-k)) time, which is at most O(|Σ|)
  • The modular exponentiation pow(val, k, mod) takes O(log k) time

Since |Σ| = 26 is a constant for lowercase letters, the overall time complexity simplifies to O(n).

The space complexity is O(|Σ|), where |Σ| is the size of the character set.

  • The Counter object stores at most |Σ| key-value pairs
  • The sorted values list vs contains at most |Σ| elements
  • Other variables use O(1) space

For lowercase letters where |Σ| = 26, this is effectively O(1) constant space relative to the input size.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Incorrect Handling of Integer Overflow

The most critical pitfall is forgetting to apply modulo operations consistently throughout the calculation. The combination value comb(num_chars_with_threshold, k) can become extremely large, and multiplying it with other values without proper modulo handling will cause integer overflow.

Problem Code:

# This will overflow for large values
result = (result * comb(num_chars_with_threshold, k) * pow(threshold_freq, k, MOD)) % MOD

Solution: Apply modulo to the combination result before multiplication:

combination_value = comb(num_chars_with_threshold, k) % MOD
result = (result * combination_value % MOD * pow(threshold_freq, k, MOD)) % MOD

2. Misunderstanding the Beauty Calculation

A common mistake is thinking that beauty is calculated based on character frequency within the subsequence itself, rather than the frequency in the original string.

Wrong Interpretation:

  • Counting how many times 'b' appears in subsequence "bd"
  • This would give beauty = 1 + 1 = 2

Correct Interpretation:

  • Using original string frequencies: f('b') = 3, f('d') = 2
  • Beauty of "bd" = 3 + 2 = 5

3. Off-by-One Error in Finding Threshold

Using the wrong index when finding the k-th largest frequency can lead to incorrect results.

Problem Code:

# Wrong: using k instead of k-1
threshold_freq = frequency_values[k]  # This gets the (k+1)-th element

Solution:

# Correct: arrays are 0-indexed
threshold_freq = frequency_values[k - 1]

4. Not Handling the Case Where All Characters Have Same Frequency

When all characters have the same frequency and k equals the number of distinct characters, the algorithm might incorrectly calculate combinations.

Example: s = "abc", k = 3

  • All frequencies are 1
  • The loop for frequencies > threshold won't execute
  • Must ensure k is properly maintained for the combination calculation

Solution: Track k carefully through the algorithm and ensure the final combination calculation uses the correct remaining value of k.

5. Confusion Between Character Count and Occurrence Count

Mixing up the number of distinct characters with a certain frequency versus the frequency value itself.

Problem Code:

# Wrong: using frequency value instead of count of characters
num_chars_with_threshold = threshold_freq  # This is the frequency, not the count

Solution:

# Correct: count how many characters have this frequency
num_chars_with_threshold = frequency_values.count(threshold_freq)

6. Inefficient Power Calculation

Using regular Python exponentiation without modulo can cause performance issues and overflow.

Problem Code:

# Inefficient and may overflow
result = (result * (threshold_freq ** k)) % MOD

Solution:

# Use built-in modular exponentiation
result = (result * pow(threshold_freq, k, MOD)) % MOD
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which two pointer techniques do you use to check if a string is a palindrome?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More