3545. Minimum Deletions for At Most K Distinct Characters
Problem Description
You are given a string s
that contains only lowercase English letters and an integer k
.
The goal is to delete some characters (you can delete zero or more characters) from the string such that the resulting string has at most k
distinct characters.
For example:
- If
s = "aabbcc"
andk = 2
, you could delete all 'c' characters to get"aabb"
, which has only 2 distinct characters ('a' and 'b'). The minimum deletions would be 2. - If
s = "abcdef"
andk = 3
, you need to remove 3 characters to keep only 3 distinct characters. You could delete 'd', 'e', 'f' to get"abc"
.
The task is to find the minimum number of character deletions needed to ensure the string has no more than k
distinct characters.
The solution approach counts the frequency of each character using Counter(s).values()
, sorts these frequencies in ascending order, and then sums up the frequencies of the characters that need to be removed (the first 26 - k
characters when sorted, since there are at most 26 lowercase English letters). By removing characters with the lowest frequencies first, we minimize the total number of deletions.
Intuition
To minimize the number of deletions while keeping at most k
distinct characters, we need to decide which characters to keep and which to remove completely.
The key insight is that if we're going to remove a character, we should remove all occurrences of that character. Why? Because keeping even one occurrence of a character still counts as one distinct character, so partial removal doesn't help us reduce the distinct character count.
Given this, the problem becomes: which characters should we completely remove to minimize total deletions?
The greedy approach is to remove the characters that appear least frequently first. This makes intuitive sense - if we have to remove entire characters, we want to remove the ones that contribute the fewest characters to our total count.
For example, if we have character frequencies: a:10, b:8, c:2, d:1
and we need to keep only 2 distinct characters, we should remove 'd' (1 deletion) and 'c' (2 deletions) for a total of 3 deletions, rather than removing 'a' or 'b' which would require many more deletions.
This leads us to the solution strategy:
- Count the frequency of each character
- Sort these frequencies in ascending order
- Remove characters starting from the lowest frequency until we have at most
k
distinct characters remaining - The sum of frequencies of removed characters is our answer
The expression sorted(Counter(s).values())[:-k]
elegantly captures this - it takes all frequency values except the k
largest ones (which we keep), and sums the rest (which we delete).
Solution Approach
The implementation uses a counting and greedy strategy to solve the problem efficiently.
Step 1: Count Character Frequencies
We use Python's Counter
from the collections module to count the frequency of each character in the string s
. The expression Counter(s)
creates a dictionary-like object where keys are characters and values are their frequencies.
Step 2: Extract and Sort Frequencies
Counter(s).values()
extracts just the frequency values (we don't need to know which character has which frequency). These values are then sorted in ascending order using sorted()
.
For example, if s = "aabbbbccc"
, we get:
- Counter:
{'a': 2, 'b': 4, 'c': 3}
- Values:
[2, 4, 3]
- Sorted:
[2, 3, 4]
Step 3: Select Characters to Remove
The slice notation [:-k]
selects all elements except the last k
elements. Since our frequencies are sorted in ascending order, the last k
elements represent the k
most frequent characters - these are the ones we want to keep.
The characters corresponding to the frequencies in [:-k]
are the ones we need to remove entirely.
Step 4: Calculate Total Deletions
sum()
adds up all the frequencies of characters we're removing. This gives us the total number of deletions needed.
Example Walkthrough:
If s = "aaabbbccd"
and k = 2
:
- Counter gives us:
{'a': 3, 'b': 3, 'c': 2, 'd': 1}
- Sorted frequencies:
[1, 2, 3, 3]
- With
k = 2
, we take[:-2]
which is[1, 2]
- Sum is
1 + 2 = 3
, meaning we delete 3 characters total
The time complexity is O(n + m log m)
where n
is the length of the string and m
is the number of distinct characters (at most 26). The space complexity is O(m)
for storing the character counts.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with s = "aabbccddde"
and k = 2
.
Step 1: Count Character Frequencies
- 'a' appears 2 times
- 'b' appears 2 times
- 'c' appears 2 times
- 'd' appears 3 times
- 'e' appears 1 time
So we have 5 distinct characters with frequencies: {a:2, b:2, c:2, d:3, e:1}
Step 2: Sort Frequencies in Ascending Order
Extract just the frequency values: [2, 2, 2, 3, 1]
Sort them: [1, 2, 2, 2, 3]
Step 3: Determine Which Characters to Remove
We want to keep at most k = 2
distinct characters. Since we have 5 distinct characters total, we need to remove 3 characters completely.
Using [:-k]
on our sorted list [1, 2, 2, 2, 3]
:
[:-2]
gives us[1, 2, 2]
- These are the 3 smallest frequencies, representing the characters we'll remove
Step 4: Calculate Minimum Deletions
Sum the frequencies of characters to remove: 1 + 2 + 2 = 5
This means we need to delete 5 characters total. Looking back at our original frequencies:
- We keep the 2 characters with highest frequencies: 'd' (3 occurrences) and one of the characters with 2 occurrences
- We remove: 'e' (1 deletion), and two of the characters that appear twice (4 deletions)
- Total: 5 deletions
The resulting string would have exactly 2 distinct characters remaining, achieved with the minimum possible deletions.
Solution Implementation
1from collections import Counter
2
3class Solution:
4 def minDeletion(self, s: str, k: int) -> int:
5 # Count the frequency of each character in the string
6 char_frequencies = Counter(s)
7
8 # Get all frequency values and sort them in ascending order
9 sorted_frequencies = sorted(char_frequencies.values())
10
11 # Calculate minimum deletions needed to keep only k distinct characters
12 # Sum all frequencies except the k largest ones (which we want to keep)
13 # If k >= number of distinct characters, no deletions needed (sum of empty list is 0)
14 min_deletions = sum(sorted_frequencies[:-k] if k > 0 else sorted_frequencies)
15
16 return min_deletions
17
1class Solution {
2 public int minDeletion(String s, int k) {
3 // Create an array to count frequency of each character (26 lowercase letters)
4 int[] characterFrequency = new int[26];
5
6 // Count the frequency of each character in the string
7 for (char currentChar : s.toCharArray()) {
8 characterFrequency[currentChar - 'a']++;
9 }
10
11 // Sort the frequency array in ascending order
12 Arrays.sort(characterFrequency);
13
14 // Calculate minimum deletions needed
15 // We keep the k most frequent characters and delete all others
16 int minimumDeletions = 0;
17 for (int i = 0; i + k < 26; i++) {
18 // Add frequencies of characters that will be deleted
19 // These are the (26 - k) least frequent characters
20 minimumDeletions += characterFrequency[i];
21 }
22
23 return minimumDeletions;
24 }
25}
26
1class Solution {
2public:
3 int minDeletion(string s, int k) {
4 // Count frequency of each character (a-z)
5 vector<int> charFrequency(26, 0);
6 for (char c : s) {
7 charFrequency[c - 'a']++;
8 }
9
10 // Sort frequencies in ascending order
11 sort(charFrequency.begin(), charFrequency.end());
12
13 // Calculate minimum deletions needed
14 // We keep the k most frequent characters and delete the rest
15 int minDeletions = 0;
16 for (int i = 0; i + k < 26; i++) {
17 minDeletions += charFrequency[i];
18 }
19
20 return minDeletions;
21 }
22};
23
1/**
2 * Calculates the minimum number of characters to delete from string s
3 * to ensure at most k distinct characters remain
4 * @param s - The input string containing lowercase letters
5 * @param k - The maximum number of distinct characters allowed
6 * @returns The minimum number of deletions required
7 */
8function minDeletion(s: string, k: number): number {
9 // Initialize frequency array for 26 lowercase letters (a-z)
10 const characterFrequencies: number[] = Array(26).fill(0);
11
12 // Count the frequency of each character in the string
13 for (const character of s) {
14 // Convert character to index (0-25) by subtracting ASCII value of 'a'
15 const charIndex = character.charCodeAt(0) - 97;
16 characterFrequencies[charIndex]++;
17 }
18
19 // Sort frequencies in ascending order to prioritize keeping characters with higher frequencies
20 characterFrequencies.sort((a, b) => a - b);
21
22 // Calculate total deletions by summing frequencies of characters to be removed
23 // We keep the (26 - k) least frequent characters, which means keeping the last k characters after sorting
24 const charactersToDelete = characterFrequencies.slice(0, 26 - k);
25 const totalDeletions = charactersToDelete.reduce((sum, frequency) => sum + frequency, 0);
26
27 return totalDeletions;
28}
29
Time and Space Complexity
Time Complexity: O(|ฮฃ| ร log |ฮฃ|)
where |ฮฃ|
is the size of the character set. In this problem, |ฮฃ| = 26
since we're dealing with lowercase English letters.
The time complexity breaks down as follows:
Counter(s)
takesO(n)
time wheren
is the length of strings
Counter(s).values()
returns at most|ฮฃ|
frequency valuessorted()
on these values takesO(|ฮฃ| ร log |ฮฃ|)
time- Slicing
[:-k]
takesO(|ฮฃ|)
time sum()
takesO(|ฮฃ|)
time
Since n โฅ |ฮฃ|
(the string length is at least as large as the number of unique characters), and |ฮฃ|
is constant (26), the dominant operation is the sorting step, giving us O(|ฮฃ| ร log |ฮฃ|)
.
Space Complexity: O(|ฮฃ|)
where |ฮฃ| = 26
The space complexity comes from:
Counter(s)
stores at most|ฮฃ|
key-value pairs:O(|ฮฃ|)
Counter(s).values()
creates a view/list of at most|ฮฃ|
values:O(|ฮฃ|)
sorted()
creates a new list of at most|ฮฃ|
elements:O(|ฮฃ|)
The overall space complexity is O(|ฮฃ|)
.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Edge Case: When k = 0
Pitfall: The slice [:-k]
with k = 0
results in [:-0]
, which returns an empty list instead of the full list. This would incorrectly return 0 deletions when we actually need to delete all characters.
Example:
- Input:
s = "abc"
,k = 0
- Expected: Delete all 3 characters
- Buggy result:
sorted_frequencies[:-0]
returns[]
, sum is 0
Solution:
def minDeletion(self, s: str, k: int) -> int:
char_frequencies = Counter(s)
sorted_frequencies = sorted(char_frequencies.values())
# Handle k = 0 case explicitly
if k == 0:
return len(s)
# For k > 0, use the slicing approach
return sum(sorted_frequencies[:-k])
2. When k Exceeds Number of Distinct Characters
Pitfall: If k
is greater than or equal to the number of distinct characters, the slice [:-k]
might behave unexpectedly. For instance, if there are 3 distinct characters and k = 5
, [:-5]
would return an empty list, which is correct but not immediately obvious.
Better Approach for Clarity:
def minDeletion(self, s: str, k: int) -> int:
char_frequencies = Counter(s)
sorted_frequencies = sorted(char_frequencies.values())
# No deletions needed if k covers all distinct characters
if k >= len(sorted_frequencies):
return 0
# Sum frequencies of characters to remove (smallest frequencies first)
return sum(sorted_frequencies[:len(sorted_frequencies) - k])
3. Misunderstanding the Slice Notation
Pitfall: Confusing [:-k]
with [:k]
. The notation [:-k]
means "all elements except the last k", while [:k]
means "the first k elements".
Incorrect Implementation:
# WRONG: This would keep characters with lowest frequencies
return sum(sorted_frequencies[:k])
Correct Implementation:
# RIGHT: Remove characters with lowest frequencies
return sum(sorted_frequencies[:-k])
4. Not Handling Empty String
Pitfall: While Counter("")
handles empty strings correctly (returns empty Counter), it's good practice to explicitly handle this edge case for clarity.
Robust Solution:
def minDeletion(self, s: str, k: int) -> int:
if not s or k >= 26: # Empty string or k covers all possible lowercase letters
return 0
char_frequencies = Counter(s)
sorted_frequencies = sorted(char_frequencies.values())
if k == 0:
return len(s)
if k >= len(sorted_frequencies):
return 0
return sum(sorted_frequencies[:-k])
What are the two properties the problem needs to have for dynamic programming to be applicable? (Select 2)
Recommended Readings
Greedy Introduction div class responsive iframe iframe src https www youtube com embed WTslqPbj7I title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture web share allowfullscreen iframe div When do we use greedy Greedy algorithms tend to solve optimization problems Typically they will ask you to calculate the max min of some value Commonly you may see this phrased in the problem as max min longest shortest largest smallest etc These keywords can be identified by just scanning
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Want a Structured Path to Master System Design Too? Donโt Miss This!