Facebook Pixel

3545. Minimum Deletions for At Most K Distinct Characters

Problem Description

You are given a string s consisting of lowercase English letters, and an integer k.

Your task is to delete some (possibly none) of the characters in the string so that the number of distinct characters in the resulting string is at most k.

Return the minimum number of deletions required to achieve this.

Intuition

The main goal is to limit the number of distinct characters in the string s to at most k by performing the fewest deletions. To tackle this, we employ a greedy strategy along with frequency counting. Here is a breakdown of the approach:

  1. Character Frequency Counting: First, count the frequency of each character in the string s. This helps us understand which characters are more frequent and potentially more costly to remove.

  2. Sorting Frequencies: Once the frequencies are counted, we sort them. By sorting, we order the characters by how often they appear, from least to most. This allows us to make informed decisions about which characters to delete.

  3. Greedy Deletion: To minimize deletions, we should remove characters that have the lowest frequencies, as these are the least costly in terms of resulting impact on the number of deletions. Since we want to limit the number of distinct characters to k, we calculate how many characters can remain, which is at most k.

  4. Calculation of Minimum Deletions: Finally, to ensure that the total number of distinct characters is reduced to k, sum up the frequencies of the least frequent characters that exceed the top k most frequent ones. The sum of these frequencies gives the minimum deletions required to achieve the desired number of distinct characters.

This approach balances maintaining frequent characters while minimizing deletions, arriving at an optimal solution efficiently.

Solution Approach

To solve the problem of reducing the number of distinct characters to at most k, we can utilize a counting mechanism followed by a greedy strategy to determine the minimum deletions required. Here’s how the implementation is structured:

  1. Character Frequency Counting: Use the Counter from Python's collections module to efficiently count the frequency of each character in the string s. This step provides us with a dictionary where keys are characters and values are their respective counts.

    from collections import Counter
    frequency_count = Counter(s)
  2. Sorting Frequencies: Once we have the frequencies, extract them and sort them in ascending order. This sorts characters based on how frequently they appear in s, which helps in identifying the least frequent characters that are candidates for removal.

    sorted_frequencies = sorted(frequency_count.values())
  3. Calculate Minimum Deletions: The key is to reduce the number of distinct characters to k. Therefore, if there are more than k distinct characters, we need to delete some.

    • Determine how many deletions are necessary by subtracting k from the total number of distinct characters (or frequencies).
    • Sum the frequencies of the least frequent characters (based on sorting) that are in excess of k to calculate the minimum deletions required.
    deletions_needed = sum(sorted_frequencies[:len(sorted_frequencies) - k])
  4. Return Result: The sum obtained gives the minimum number of deletions required to limit the number of distinct characters to k.

This approach leverages the combination of counting, sorting, and a greedy selection process to strategically determine the minimal deletions necessary. The implementation is straightforward and efficient, mainly due to the simplicity of counting and sorting operations in Python.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's take a small example to illustrate the solution approach. Consider the string s = "aacccbbb" and k = 2. Our goal is to delete characters such that there are at most 2 distinct characters left.

Step 1: Character Frequency Counting

First, count the frequency of each character in s. Using Python's collections.Counter, we get:

  • 'a': 2
  • 'c': 3
  • 'b': 3
from collections import Counter
frequency_count = Counter("aacccbbb")
# Output: Counter({'c': 3, 'b': 3, 'a': 2})

Step 2: Sorting Frequencies

Extract these frequencies and sort them in ascending order to prioritize removing the least frequent characters:

  • Frequencies: [2, 3, 3]
sorted_frequencies = sorted(frequency_count.values())
# Output: [2, 3, 3]

Step 3: Calculate Minimum Deletions

Since we can only have k = 2 distinct characters, and currently we have 3 distinct characters, we need to remove the excess. The number of deletions required is the sum of the frequencies of the least frequent character(s) that we choose to remove.

  • We have 3 distinct characters, but we need only 2.
  • The frequency to be removed is 2 (frequency of 'a').
deletions_needed = sum(sorted_frequencies[:len(sorted_frequencies) - k])
# Output: 2

Step 4: Return Result

The minimum number of deletions required to reduce the distinct characters to k is 2.

This method efficiently limits the distinct characters while minimizing deletions by leveraging sorted character frequencies for strategic removal.

# Final result
print(deletions_needed) # Output: 2

Solution Implementation

1from collections import Counter
2
3class Solution:
4    def minDeletion(self, s: str, k: int) -> int:
5        # Count the frequency of each character in the string s
6        char_count = Counter(s)
7      
8        # Get the values (frequencies) from the counter and sort them in ascending order
9        frequencies = sorted(char_count.values())
10      
11        # Sum up all frequencies except the largest 'k' ones
12        # This removes the least frequent characters needed to minimize the deletions
13        return sum(frequencies[:-k])
14
1import java.util.Arrays;
2
3class Solution {
4    public int minDeletion(String s, int k) {
5        // Create an array to count the frequency of each character
6        int[] characterCount = new int[26];
7
8        // Increment the frequency counter for each character in the string
9        for (char character : s.toCharArray()) {
10            characterCount[character - 'a']++;
11        }
12
13        // Sort the character count array in ascending order
14        Arrays.sort(characterCount);
15
16        int deletions = 0;
17        // Sum up the frequencies of the first (26 - k) characters that need to be deleted 
18        // (those with the least frequency after sorting)
19        for (int i = 0; i + k < 26; i++) {
20            deletions += characterCount[i];
21        }
22
23        // Return the total number of deletions needed
24        return deletions;
25    }
26}
27
1#include <vector>
2#include <string>
3#include <algorithm>
4
5class Solution {
6public:
7    // Function to calculate the minimum number of deletions required
8    // to make the string 's' contain at most 'k' distinct characters.
9    int minDeletion(std::string s, int k) {
10        // Vector to count the frequency of each character (26 letters of the alphabet)
11        std::vector<int> characterCount(26, 0);
12      
13        // Populate the frequency vector with counts of each character in the string
14        for (char c : s) {
15            ++characterCount[c - 'a'];
16        }
17      
18        // Sort the frequency vector in ascending order
19        std::sort(characterCount.begin(), characterCount.end());
20      
21        // Variable to accumulate the number of deletions needed
22        int deletionsRequired = 0;
23      
24        // Iterate over the sorted frequency vector, considering only
25        // the characters which are in excess of the allowed 'k' distinct characters
26        for (int i = 0; i + k < 26; ++i) {
27            // Add the frequency of the least frequent characters to deletions count
28            deletionsRequired += characterCount[i];
29        }
30      
31        // Return the total number of deletions required
32        return deletionsRequired;
33    }
34};
35
1// Function to find the minimum number of deletions needed to have exactly `k` unique characters remaining in the string `s`
2function minDeletion(s: string, k: number): number {
3    // Array to keep count of occurrences of each character (assuming lowercase English letters)
4    const count: number[] = Array(26).fill(0);
5  
6    // Increment the count for each character in the string `s`
7    for (const char of s) {
8        count[char.charCodeAt(0) - 97]++;
9    }
10  
11    // Sort the counts in ascending order to identify characters that can be deleted
12    count.sort((a, b) => a - b);
13  
14    // Sum the counts of the characters that fall outside the `k` largest counts
15    return count.slice(0, 26 - k).reduce((sum, current) => sum + current, 0);
16}
17

Time and Space Complexity

The time complexity of the code is determined by two main operations: counting the frequency of each character in the string using Counter(s), which takes O(n) time where n is the length of the string s, and sorting the frequency values, which takes O(|\Sigma| \times \log |\Sigma|) time. In this problem, |\Sigma| = 26, representing the fixed size of the lowercase English alphabet. Therefore, the sorting operation dominates the overall time complexity, resulting in O(|\Sigma| \times \log |\Sigma|).

The space complexity is determined by storing the character counts, which requires O(|\Sigma|) space, again considering the fixed alphabet size. Hence, the overall space complexity is O(|\Sigma|).


Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which of the tree traversal order can be used to obtain elements in a binary search tree in sorted order?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More