1647. Minimum Deletions to Make Character Frequencies Unique
Problem Description
You are given a string s
consisting of lowercase English letters. A string is called good if no two different characters in the string have the same frequency (number of occurrences).
Your task is to find the minimum number of characters you need to delete from s
to make it a good string.
For example:
- In the string
"aab"
, character'a'
appears 2 times and character'b'
appears 1 time. Since the frequencies are different (2 and 1), this string is already good and requires 0 deletions. - If you have a string where two different characters appear the same number of times, you would need to delete some occurrences of one or both characters to ensure all frequencies are unique.
The goal is to minimize the total number of deletions while ensuring that each character that remains in the string has a unique frequency compared to all other characters in the string.
Intuition
The key insight is that we want to keep as many characters as possible while ensuring all frequencies are unique. To minimize deletions, we should preserve higher frequencies when possible and adjust lower ones.
Think of it this way: if we have characters with frequencies [5, 5, 3, 2]
, we need to make them all unique. The greedy approach would be to keep the highest frequency as-is and adjust others downward. So we could transform them to [5, 4, 3, 2]
by deleting just 1 character from the second group.
This suggests we should:
- First count the frequency of each character
- Sort these frequencies in descending order
- Process them from highest to lowest
When processing each frequency, we track what the "previous" valid frequency was. If the current frequency is greater than or equal to the previous one, we need to reduce it to at least previous - 1
to maintain uniqueness. The number of deletions needed would be current - (previous - 1)
.
For example, if we have frequencies [5, 5, 3]
:
- Keep the first
5
as-is (previous = 5) - The second
5
must become at most4
, requiring 1 deletion (previous = 4) - The
3
is already less than4
, so no deletions needed (previous = 3)
The special case is when the previous frequency reaches 0
- at that point, we must delete all remaining characters since we can't have negative frequencies.
This greedy strategy works because by processing from highest to lowest frequency, we minimize the "cascade effect" of adjustments needed to maintain uniqueness throughout the entire set of frequencies.
Solution Approach
The implementation follows our greedy strategy using sorting and tracking the previous valid frequency:
-
Count Character Frequencies: First, we use a
Counter
to count the occurrences of each character in the strings
. This gives us a dictionary where keys are characters and values are their frequencies. -
Sort Frequencies in Descending Order: We extract just the frequency values from the counter using
cnt.values()
and sort them in descending order withsorted(cnt.values(), reverse=True)
. We don't need to track which character has which frequency - we only care about the frequency values themselves. -
Initialize Variables:
ans = 0
: Tracks the total number of deletions neededpre = inf
: Represents the previous valid frequency. We start with infinity since the first (highest) frequency can be any value
-
Process Each Frequency: We iterate through each frequency
v
in our sorted list:-
Case 1:
pre == 0
: If the previous valid frequency has reached 0, we cannot assign any positive frequency to this character (all available frequencies from 1 to our previous value are taken). So we must delete allv
occurrences of this character:ans += v
-
Case 2:
v >= pre
: The current frequency is greater than or equal to the previous valid frequency. We need to reduce it topre - 1
to maintain uniqueness. The number of deletions needed isv - (pre - 1) = v - pre + 1
. We add this to our answer and updatepre
topre - 1
for the next iteration. -
Case 3:
v < pre
: The current frequency is already less than the previous valid frequency, so no deletions are needed. We simply updatepre = v
to use this as the new upper bound for subsequent frequencies.
-
-
Return Result: After processing all frequencies,
ans
contains the minimum number of deletions needed to make the string good.
For example, with frequencies [5, 5, 3, 2]
:
- First
5
:pre = inf
, sopre
becomes5
- Second
5
:v >= pre
, so deletions =5 - 5 + 1 = 1
,pre
becomes4
3
:v < pre
, so no deletions,pre
becomes3
2
:v < pre
, so no deletions,pre
becomes2
- Total deletions:
1
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with the string s = "ceabaacb"
.
Step 1: Count Character Frequencies
- 'c': appears 2 times
- 'e': appears 1 time
- 'a': appears 3 times
- 'b': appears 2 times
Step 2: Extract and Sort Frequencies
- Frequencies: [3, 2, 2, 1]
- Sorted in descending order: [3, 2, 2, 1]
Step 3: Process Each Frequency
Initialize: ans = 0
, pre = infinity
-
Frequency 3:
- Since
pre = infinity
, we can keep all 3 occurrences - No deletions needed
- Update:
pre = 3
,ans = 0
- Since
-
Frequency 2 (first one):
- Current frequency (2) < previous valid frequency (3)
- No deletions needed
- Update:
pre = 2
,ans = 0
-
Frequency 2 (second one):
- Current frequency (2) >= previous valid frequency (2)
- Must reduce to
pre - 1 = 1
- Deletions needed:
2 - 1 = 1
- Update:
pre = 1
,ans = 1
-
Frequency 1:
- Current frequency (1) >= previous valid frequency (1)
- Must reduce to
pre - 1 = 0
- Deletions needed:
1 - 0 = 1
- Update:
pre = 0
,ans = 2
Result: We need to delete 2 characters minimum.
The final frequencies after deletions would be [3, 2, 1, 0], meaning:
- One character appears 3 times
- One character appears 2 times
- One character appears 1 time
- One character is completely removed
This ensures all remaining characters have unique frequencies, making the string "good".
Solution Implementation
1from collections import Counter
2from math import inf
3
4class Solution:
5 def minDeletions(self, s: str) -> int:
6 # Count frequency of each character in the string
7 char_frequency = Counter(s)
8
9 # Initialize variables
10 deletions = 0 # Total number of deletions needed
11 previous_frequency = inf # Track the previous valid frequency level
12
13 # Process frequencies in descending order
14 for current_freq in sorted(char_frequency.values(), reverse=True):
15 # If previous frequency is 0, we must delete all occurrences of current character
16 if previous_frequency == 0:
17 deletions += current_freq
18 # If current frequency >= previous, we need to reduce it below previous
19 elif current_freq >= previous_frequency:
20 # Delete enough characters to make frequency = previous - 1
21 deletions += current_freq - previous_frequency + 1
22 previous_frequency -= 1 # Update previous for next iteration
23 # If current frequency < previous, it's already unique
24 else:
25 previous_frequency = current_freq # Update previous to current
26
27 return deletions
28
1class Solution {
2 public int minDeletions(String s) {
3 // Array to store frequency count of each character (a-z)
4 int[] frequencyCount = new int[26];
5
6 // Count the frequency of each character in the string
7 for (int i = 0; i < s.length(); i++) {
8 frequencyCount[s.charAt(i) - 'a']++;
9 }
10
11 // Sort frequencies in ascending order
12 Arrays.sort(frequencyCount);
13
14 // Initialize deletion counter
15 int deletions = 0;
16
17 // Iterate from second highest frequency to lowest (non-zero frequencies are at the end after sorting)
18 for (int i = 24; i >= 0; i--) {
19 // If current frequency is greater than or equal to the next higher frequency,
20 // we need to delete characters to make it unique
21 while (frequencyCount[i] >= frequencyCount[i + 1] && frequencyCount[i] > 0) {
22 frequencyCount[i]--;
23 deletions++;
24 }
25 }
26
27 return deletions;
28 }
29}
30
1class Solution {
2public:
3 int minDeletions(string s) {
4 // Count frequency of each character (a-z)
5 vector<int> frequency(26, 0);
6 for (char& ch : s) {
7 frequency[ch - 'a']++;
8 }
9
10 // Sort frequencies in descending order
11 sort(frequency.rbegin(), frequency.rend());
12
13 // Track number of deletions needed
14 int deletions = 0;
15
16 // Iterate through frequencies and ensure each is less than the previous
17 for (int i = 1; i < 26; i++) {
18 // While current frequency is greater than or equal to previous frequency
19 // and is not zero, decrement it
20 while (frequency[i] >= frequency[i - 1] && frequency[i] > 0) {
21 frequency[i]--;
22 deletions++;
23 }
24 }
25
26 return deletions;
27 }
28};
29
1/**
2 * Finds the minimum number of character deletions needed to make each character frequency unique
3 * @param s - Input string
4 * @returns Minimum number of deletions required
5 */
6function minDeletions(s: string): number {
7 // Create a frequency map to count occurrences of each character
8 const frequencyMap: Record<string, number> = {};
9 for (const char of s) {
10 frequencyMap[char] = (frequencyMap[char] || 0) + 1;
11 }
12
13 // Track total deletions needed
14 let deletionCount: number = 0;
15
16 // Extract frequency values and sort them in ascending order
17 const frequencies: number[] = Object.values(frequencyMap);
18 frequencies.sort((a, b) => a - b);
19
20 // Iterate through frequencies and reduce duplicates
21 for (let i = 1; i < frequencies.length; i++) {
22 // Keep reducing current frequency while it's positive and appears earlier in the array
23 while (frequencies[i] > 0 && i !== frequencies.indexOf(frequencies[i])) {
24 frequencies[i]--;
25 deletionCount++;
26 }
27 }
28
29 return deletionCount;
30}
31
Time and Space Complexity
Time Complexity: O(n + |ฮฃ| ร log |ฮฃ|)
The time complexity consists of two main parts:
O(n)
for counting the frequency of each character in the string usingCounter(s)
, wheren
is the length of the stringO(|ฮฃ| ร log |ฮฃ|)
for sorting the frequency values, where|ฮฃ|
represents the number of unique characters (at most 26 for lowercase English letters)- The subsequent loop iterates through at most
|ฮฃ|
elements withO(1)
operations each, contributingO(|ฮฃ|)
Since |ฮฃ| โค 26
is a constant in this problem, the dominant term is O(n)
for large inputs, but the precise complexity is O(n + |ฮฃ| ร log |ฮฃ|)
.
Space Complexity: O(|ฮฃ|)
The space complexity is determined by:
O(|ฮฃ|)
for the Counter dictionary storing at most|ฮฃ|
unique characters and their frequenciesO(|ฮฃ|)
for the sorted list of frequency valuesO(1)
for the auxiliary variables (ans
,pre
)
The overall space complexity is O(|ฮฃ|)
, which equals O(26) = O(1)
for this specific problem with lowercase English letters.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
Pitfall 1: Not Handling the Case When Frequency Reaches Zero
The Problem:
A critical mistake is forgetting to handle what happens when previous_frequency
reaches 0. Once we've used frequency levels down to 1, the next character cannot have any positive frequency without creating a duplicate. Many implementations incorrectly allow previous_frequency
to become negative, which doesn't make sense for character counts.
Incorrect Implementation:
def minDeletions(self, s: str) -> int:
char_frequency = Counter(s)
deletions = 0
previous_frequency = inf
for current_freq in sorted(char_frequency.values(), reverse=True):
if current_freq >= previous_frequency:
deletions += current_freq - previous_frequency + 1
previous_frequency -= 1 # This can go negative!
else:
previous_frequency = current_freq
return deletions
Why It Fails:
For input like "aaabbbccc"
with frequencies [3, 3, 3]
:
- First 3:
previous_frequency = 3
- Second 3: deletions = 1,
previous_frequency = 2
- Third 3: deletions += 2,
previous_frequency = 1
- If there was a fourth character with frequency 3:
previous_frequency
would become 0, then -1, -2, etc.
The Solution:
Always check if previous_frequency == 0
before processing. When it reaches 0, all remaining characters must be completely deleted:
if previous_frequency == 0: deletions += current_freq elif current_freq >= previous_frequency: deletions += current_freq - previous_frequency + 1 previous_frequency -= 1 else: previous_frequency = current_freq
Pitfall 2: Using previous_frequency = previous_frequency - 1
Without Bounds Checking
The Problem:
When reducing previous_frequency
, developers often write previous_frequency = previous_frequency - 1
without ensuring it doesn't go below 0.
Better Approach:
elif current_freq >= previous_frequency:
deletions += current_freq - previous_frequency + 1
previous_frequency = max(0, previous_frequency - 1) # Ensure non-negative
This prevents logical errors and makes the zero-check more explicit in subsequent iterations.
Pitfall 3: Misunderstanding the Frequency Assignment Logic
The Problem: Some implementations try to find the "next available" frequency slot by checking all previously used frequencies. This overcomplicates the solution and often leads to errors.
Incorrect Approach:
used_frequencies = set()
for freq in sorted(char_frequency.values(), reverse=True):
while freq > 0 and freq in used_frequencies:
freq -= 1
deletions += 1
if freq > 0:
used_frequencies.add(freq)
Why the Greedy Approach is Better:
The sorted greedy approach with previous_frequency
tracking is more efficient and cleaner. By processing frequencies in descending order and maintaining the constraint that each frequency must be less than the previous one, we automatically ensure uniqueness without needing to track all used frequencies.
How many times is a tree node visited in a depth first search?
Recommended Readings
Greedy Introduction div class responsive iframe iframe src https www youtube com embed WTslqPbj7I title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture web share allowfullscreen iframe div When do we use greedy Greedy algorithms tend to solve optimization problems Typically they will ask you to calculate the max min of some value Commonly you may see this phrased in the problem as max min longest shortest largest smallest etc These keywords can be identified by just scanning
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Want a Structured Path to Master System Design Too? Donโt Miss This!