3545. Minimum Deletions for At Most K Distinct Characters
Problem Description
You are given a string s
consisting of lowercase English letters, and an integer k
.
Your task is to delete some (possibly none) of the characters in the string so that the number of distinct characters in the resulting string is at most k
.
Return the minimum number of deletions required to achieve this.
Intuition
The main goal is to limit the number of distinct characters in the string s
to at most k
by performing the fewest deletions. To tackle this, we employ a greedy strategy along with frequency counting. Here is a breakdown of the approach:
-
Character Frequency Counting: First, count the frequency of each character in the string
s
. This helps us understand which characters are more frequent and potentially more costly to remove. -
Sorting Frequencies: Once the frequencies are counted, we sort them. By sorting, we order the characters by how often they appear, from least to most. This allows us to make informed decisions about which characters to delete.
-
Greedy Deletion: To minimize deletions, we should remove characters that have the lowest frequencies, as these are the least costly in terms of resulting impact on the number of deletions. Since we want to limit the number of distinct characters to
k
, we calculate how many characters can remain, which is at mostk
. -
Calculation of Minimum Deletions: Finally, to ensure that the total number of distinct characters is reduced to
k
, sum up the frequencies of the least frequent characters that exceed the topk
most frequent ones. The sum of these frequencies gives the minimum deletions required to achieve the desired number of distinct characters.
This approach balances maintaining frequent characters while minimizing deletions, arriving at an optimal solution efficiently.
Solution Approach
To solve the problem of reducing the number of distinct characters to at most k
, we can utilize a counting mechanism followed by a greedy strategy to determine the minimum deletions required. Here’s how the implementation is structured:
-
Character Frequency Counting: Use the
Counter
from Python'scollections
module to efficiently count the frequency of each character in the strings
. This step provides us with a dictionary where keys are characters and values are their respective counts.from collections import Counter frequency_count = Counter(s)
-
Sorting Frequencies: Once we have the frequencies, extract them and sort them in ascending order. This sorts characters based on how frequently they appear in
s
, which helps in identifying the least frequent characters that are candidates for removal.sorted_frequencies = sorted(frequency_count.values())
-
Calculate Minimum Deletions: The key is to reduce the number of distinct characters to
k
. Therefore, if there are more thank
distinct characters, we need to delete some.- Determine how many deletions are necessary by subtracting
k
from the total number of distinct characters (or frequencies). - Sum the frequencies of the least frequent characters (based on sorting) that are in excess of
k
to calculate the minimum deletions required.
deletions_needed = sum(sorted_frequencies[:len(sorted_frequencies) - k])
- Determine how many deletions are necessary by subtracting
-
Return Result: The sum obtained gives the minimum number of deletions required to limit the number of distinct characters to
k
.
This approach leverages the combination of counting, sorting, and a greedy selection process to strategically determine the minimal deletions necessary. The implementation is straightforward and efficient, mainly due to the simplicity of counting and sorting operations in Python.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's take a small example to illustrate the solution approach. Consider the string s = "aacccbbb"
and k = 2
. Our goal is to delete characters such that there are at most 2 distinct characters left.
Step 1: Character Frequency Counting
First, count the frequency of each character in s
. Using Python's collections.Counter
, we get:
'a'
: 2'c'
: 3'b'
: 3
from collections import Counter frequency_count = Counter("aacccbbb") # Output: Counter({'c': 3, 'b': 3, 'a': 2})
Step 2: Sorting Frequencies
Extract these frequencies and sort them in ascending order to prioritize removing the least frequent characters:
- Frequencies:
[2, 3, 3]
sorted_frequencies = sorted(frequency_count.values())
# Output: [2, 3, 3]
Step 3: Calculate Minimum Deletions
Since we can only have k = 2
distinct characters, and currently we have 3 distinct characters, we need to remove the excess. The number of deletions required is the sum of the frequencies of the least frequent character(s) that we choose to remove.
- We have 3 distinct characters, but we need only 2.
- The frequency to be removed is
2
(frequency of'a'
).
deletions_needed = sum(sorted_frequencies[:len(sorted_frequencies) - k])
# Output: 2
Step 4: Return Result
The minimum number of deletions required to reduce the distinct characters to k is 2.
This method efficiently limits the distinct characters while minimizing deletions by leveraging sorted character frequencies for strategic removal.
# Final result
print(deletions_needed) # Output: 2
Solution Implementation
1from collections import Counter
2
3class Solution:
4 def minDeletion(self, s: str, k: int) -> int:
5 # Count the frequency of each character in the string s
6 char_count = Counter(s)
7
8 # Get the values (frequencies) from the counter and sort them in ascending order
9 frequencies = sorted(char_count.values())
10
11 # Sum up all frequencies except the largest 'k' ones
12 # This removes the least frequent characters needed to minimize the deletions
13 return sum(frequencies[:-k])
14
1import java.util.Arrays;
2
3class Solution {
4 public int minDeletion(String s, int k) {
5 // Create an array to count the frequency of each character
6 int[] characterCount = new int[26];
7
8 // Increment the frequency counter for each character in the string
9 for (char character : s.toCharArray()) {
10 characterCount[character - 'a']++;
11 }
12
13 // Sort the character count array in ascending order
14 Arrays.sort(characterCount);
15
16 int deletions = 0;
17 // Sum up the frequencies of the first (26 - k) characters that need to be deleted
18 // (those with the least frequency after sorting)
19 for (int i = 0; i + k < 26; i++) {
20 deletions += characterCount[i];
21 }
22
23 // Return the total number of deletions needed
24 return deletions;
25 }
26}
27
1#include <vector>
2#include <string>
3#include <algorithm>
4
5class Solution {
6public:
7 // Function to calculate the minimum number of deletions required
8 // to make the string 's' contain at most 'k' distinct characters.
9 int minDeletion(std::string s, int k) {
10 // Vector to count the frequency of each character (26 letters of the alphabet)
11 std::vector<int> characterCount(26, 0);
12
13 // Populate the frequency vector with counts of each character in the string
14 for (char c : s) {
15 ++characterCount[c - 'a'];
16 }
17
18 // Sort the frequency vector in ascending order
19 std::sort(characterCount.begin(), characterCount.end());
20
21 // Variable to accumulate the number of deletions needed
22 int deletionsRequired = 0;
23
24 // Iterate over the sorted frequency vector, considering only
25 // the characters which are in excess of the allowed 'k' distinct characters
26 for (int i = 0; i + k < 26; ++i) {
27 // Add the frequency of the least frequent characters to deletions count
28 deletionsRequired += characterCount[i];
29 }
30
31 // Return the total number of deletions required
32 return deletionsRequired;
33 }
34};
35
1// Function to find the minimum number of deletions needed to have exactly `k` unique characters remaining in the string `s`
2function minDeletion(s: string, k: number): number {
3 // Array to keep count of occurrences of each character (assuming lowercase English letters)
4 const count: number[] = Array(26).fill(0);
5
6 // Increment the count for each character in the string `s`
7 for (const char of s) {
8 count[char.charCodeAt(0) - 97]++;
9 }
10
11 // Sort the counts in ascending order to identify characters that can be deleted
12 count.sort((a, b) => a - b);
13
14 // Sum the counts of the characters that fall outside the `k` largest counts
15 return count.slice(0, 26 - k).reduce((sum, current) => sum + current, 0);
16}
17
Time and Space Complexity
The time complexity of the code is determined by two main operations: counting the frequency of each character in the string using Counter(s)
, which takes O(n)
time where n
is the length of the string s
, and sorting the frequency values, which takes O(|\Sigma| \times \log |\Sigma|)
time. In this problem, |\Sigma| = 26
, representing the fixed size of the lowercase English alphabet. Therefore, the sorting operation dominates the overall time complexity, resulting in O(|\Sigma| \times \log |\Sigma|)
.
The space complexity is determined by storing the character counts, which requires O(|\Sigma|)
space, again considering the fixed alphabet size. Hence, the overall space complexity is O(|\Sigma|)
.
Which of the tree traversal order can be used to obtain elements in a binary search tree in sorted order?
Recommended Readings
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Want a Structured Path to Master System Design Too? Don’t Miss This!