Facebook Pixel

1647. Minimum Deletions to Make Character Frequencies Unique

MediumGreedyHash TableStringSorting
Leetcode Link

Problem Description

You are given a string s consisting of lowercase English letters. A string is called good if no two different characters in the string have the same frequency (number of occurrences).

Your task is to find the minimum number of characters you need to delete from s to make it a good string.

For example:

  • In the string "aab", character 'a' appears 2 times and character 'b' appears 1 time. Since the frequencies are different (2 and 1), this string is already good and requires 0 deletions.
  • If you have a string where two different characters appear the same number of times, you would need to delete some occurrences of one or both characters to ensure all frequencies are unique.

The goal is to minimize the total number of deletions while ensuring that each character that remains in the string has a unique frequency compared to all other characters in the string.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is that we want to keep as many characters as possible while ensuring all frequencies are unique. To minimize deletions, we should preserve higher frequencies when possible and adjust lower ones.

Think of it this way: if we have characters with frequencies [5, 5, 3, 2], we need to make them all unique. The greedy approach would be to keep the highest frequency as-is and adjust others downward. So we could transform them to [5, 4, 3, 2] by deleting just 1 character from the second group.

This suggests we should:

  1. First count the frequency of each character
  2. Sort these frequencies in descending order
  3. Process them from highest to lowest

When processing each frequency, we track what the "previous" valid frequency was. If the current frequency is greater than or equal to the previous one, we need to reduce it to at least previous - 1 to maintain uniqueness. The number of deletions needed would be current - (previous - 1).

For example, if we have frequencies [5, 5, 3]:

  • Keep the first 5 as-is (previous = 5)
  • The second 5 must become at most 4, requiring 1 deletion (previous = 4)
  • The 3 is already less than 4, so no deletions needed (previous = 3)

The special case is when the previous frequency reaches 0 - at that point, we must delete all remaining characters since we can't have negative frequencies.

This greedy strategy works because by processing from highest to lowest frequency, we minimize the "cascade effect" of adjustments needed to maintain uniqueness throughout the entire set of frequencies.

Learn more about Greedy and Sorting patterns.

Solution Approach

The implementation follows our greedy strategy using sorting and tracking the previous valid frequency:

  1. Count Character Frequencies: First, we use a Counter to count the occurrences of each character in the string s. This gives us a dictionary where keys are characters and values are their frequencies.

  2. Sort Frequencies in Descending Order: We extract just the frequency values from the counter using cnt.values() and sort them in descending order with sorted(cnt.values(), reverse=True). We don't need to track which character has which frequency - we only care about the frequency values themselves.

  3. Initialize Variables:

    • ans = 0: Tracks the total number of deletions needed
    • pre = inf: Represents the previous valid frequency. We start with infinity since the first (highest) frequency can be any value
  4. Process Each Frequency: We iterate through each frequency v in our sorted list:

    • Case 1: pre == 0: If the previous valid frequency has reached 0, we cannot assign any positive frequency to this character (all available frequencies from 1 to our previous value are taken). So we must delete all v occurrences of this character: ans += v

    • Case 2: v >= pre: The current frequency is greater than or equal to the previous valid frequency. We need to reduce it to pre - 1 to maintain uniqueness. The number of deletions needed is v - (pre - 1) = v - pre + 1. We add this to our answer and update pre to pre - 1 for the next iteration.

    • Case 3: v < pre: The current frequency is already less than the previous valid frequency, so no deletions are needed. We simply update pre = v to use this as the new upper bound for subsequent frequencies.

  5. Return Result: After processing all frequencies, ans contains the minimum number of deletions needed to make the string good.

For example, with frequencies [5, 5, 3, 2]:

  • First 5: pre = inf, so pre becomes 5
  • Second 5: v >= pre, so deletions = 5 - 5 + 1 = 1, pre becomes 4
  • 3: v < pre, so no deletions, pre becomes 3
  • 2: v < pre, so no deletions, pre becomes 2
  • Total deletions: 1

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with the string s = "ceabaacb".

Step 1: Count Character Frequencies

  • 'c': appears 2 times
  • 'e': appears 1 time
  • 'a': appears 3 times
  • 'b': appears 2 times

Step 2: Extract and Sort Frequencies

  • Frequencies: [3, 2, 2, 1]
  • Sorted in descending order: [3, 2, 2, 1]

Step 3: Process Each Frequency

Initialize: ans = 0, pre = infinity

  • Frequency 3:

    • Since pre = infinity, we can keep all 3 occurrences
    • No deletions needed
    • Update: pre = 3, ans = 0
  • Frequency 2 (first one):

    • Current frequency (2) < previous valid frequency (3)
    • No deletions needed
    • Update: pre = 2, ans = 0
  • Frequency 2 (second one):

    • Current frequency (2) >= previous valid frequency (2)
    • Must reduce to pre - 1 = 1
    • Deletions needed: 2 - 1 = 1
    • Update: pre = 1, ans = 1
  • Frequency 1:

    • Current frequency (1) >= previous valid frequency (1)
    • Must reduce to pre - 1 = 0
    • Deletions needed: 1 - 0 = 1
    • Update: pre = 0, ans = 2

Result: We need to delete 2 characters minimum.

The final frequencies after deletions would be [3, 2, 1, 0], meaning:

  • One character appears 3 times
  • One character appears 2 times
  • One character appears 1 time
  • One character is completely removed

This ensures all remaining characters have unique frequencies, making the string "good".

Solution Implementation

1from collections import Counter
2from math import inf
3
4class Solution:
5    def minDeletions(self, s: str) -> int:
6        # Count frequency of each character in the string
7        char_frequency = Counter(s)
8      
9        # Initialize variables
10        deletions = 0  # Total number of deletions needed
11        previous_frequency = inf  # Track the previous valid frequency level
12      
13        # Process frequencies in descending order
14        for current_freq in sorted(char_frequency.values(), reverse=True):
15            # If previous frequency is 0, we must delete all occurrences of current character
16            if previous_frequency == 0:
17                deletions += current_freq
18            # If current frequency >= previous, we need to reduce it below previous
19            elif current_freq >= previous_frequency:
20                # Delete enough characters to make frequency = previous - 1
21                deletions += current_freq - previous_frequency + 1
22                previous_frequency -= 1  # Update previous for next iteration
23            # If current frequency < previous, it's already unique
24            else:
25                previous_frequency = current_freq  # Update previous to current
26      
27        return deletions
28
1class Solution {
2    public int minDeletions(String s) {
3        // Array to store frequency count of each character (a-z)
4        int[] frequencyCount = new int[26];
5      
6        // Count the frequency of each character in the string
7        for (int i = 0; i < s.length(); i++) {
8            frequencyCount[s.charAt(i) - 'a']++;
9        }
10      
11        // Sort frequencies in ascending order
12        Arrays.sort(frequencyCount);
13      
14        // Initialize deletion counter
15        int deletions = 0;
16      
17        // Iterate from second highest frequency to lowest (non-zero frequencies are at the end after sorting)
18        for (int i = 24; i >= 0; i--) {
19            // If current frequency is greater than or equal to the next higher frequency,
20            // we need to delete characters to make it unique
21            while (frequencyCount[i] >= frequencyCount[i + 1] && frequencyCount[i] > 0) {
22                frequencyCount[i]--;
23                deletions++;
24            }
25        }
26      
27        return deletions;
28    }
29}
30
1class Solution {
2public:
3    int minDeletions(string s) {
4        // Count frequency of each character (a-z)
5        vector<int> frequency(26, 0);
6        for (char& ch : s) {
7            frequency[ch - 'a']++;
8        }
9      
10        // Sort frequencies in descending order
11        sort(frequency.rbegin(), frequency.rend());
12      
13        // Track number of deletions needed
14        int deletions = 0;
15      
16        // Iterate through frequencies and ensure each is less than the previous
17        for (int i = 1; i < 26; i++) {
18            // While current frequency is greater than or equal to previous frequency
19            // and is not zero, decrement it
20            while (frequency[i] >= frequency[i - 1] && frequency[i] > 0) {
21                frequency[i]--;
22                deletions++;
23            }
24        }
25      
26        return deletions;
27    }
28};
29
1/**
2 * Finds the minimum number of character deletions needed to make each character frequency unique
3 * @param s - Input string
4 * @returns Minimum number of deletions required
5 */
6function minDeletions(s: string): number {
7    // Create a frequency map to count occurrences of each character
8    const frequencyMap: Record<string, number> = {};
9    for (const char of s) {
10        frequencyMap[char] = (frequencyMap[char] || 0) + 1;
11    }
12  
13    // Track total deletions needed
14    let deletionCount: number = 0;
15  
16    // Extract frequency values and sort them in ascending order
17    const frequencies: number[] = Object.values(frequencyMap);
18    frequencies.sort((a, b) => a - b);
19  
20    // Iterate through frequencies and reduce duplicates
21    for (let i = 1; i < frequencies.length; i++) {
22        // Keep reducing current frequency while it's positive and appears earlier in the array
23        while (frequencies[i] > 0 && i !== frequencies.indexOf(frequencies[i])) {
24            frequencies[i]--;
25            deletionCount++;
26        }
27    }
28  
29    return deletionCount;
30}
31

Time and Space Complexity

Time Complexity: O(n + |ฮฃ| ร— log |ฮฃ|)

The time complexity consists of two main parts:

  • O(n) for counting the frequency of each character in the string using Counter(s), where n is the length of the string
  • O(|ฮฃ| ร— log |ฮฃ|) for sorting the frequency values, where |ฮฃ| represents the number of unique characters (at most 26 for lowercase English letters)
  • The subsequent loop iterates through at most |ฮฃ| elements with O(1) operations each, contributing O(|ฮฃ|)

Since |ฮฃ| โ‰ค 26 is a constant in this problem, the dominant term is O(n) for large inputs, but the precise complexity is O(n + |ฮฃ| ร— log |ฮฃ|).

Space Complexity: O(|ฮฃ|)

The space complexity is determined by:

  • O(|ฮฃ|) for the Counter dictionary storing at most |ฮฃ| unique characters and their frequencies
  • O(|ฮฃ|) for the sorted list of frequency values
  • O(1) for the auxiliary variables (ans, pre)

The overall space complexity is O(|ฮฃ|), which equals O(26) = O(1) for this specific problem with lowercase English letters.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

Pitfall 1: Not Handling the Case When Frequency Reaches Zero

The Problem: A critical mistake is forgetting to handle what happens when previous_frequency reaches 0. Once we've used frequency levels down to 1, the next character cannot have any positive frequency without creating a duplicate. Many implementations incorrectly allow previous_frequency to become negative, which doesn't make sense for character counts.

Incorrect Implementation:

def minDeletions(self, s: str) -> int:
    char_frequency = Counter(s)
    deletions = 0
    previous_frequency = inf
  
    for current_freq in sorted(char_frequency.values(), reverse=True):
        if current_freq >= previous_frequency:
            deletions += current_freq - previous_frequency + 1
            previous_frequency -= 1  # This can go negative!
        else:
            previous_frequency = current_freq
  
    return deletions

Why It Fails: For input like "aaabbbccc" with frequencies [3, 3, 3]:

  • First 3: previous_frequency = 3
  • Second 3: deletions = 1, previous_frequency = 2
  • Third 3: deletions += 2, previous_frequency = 1
  • If there was a fourth character with frequency 3: previous_frequency would become 0, then -1, -2, etc.

The Solution: Always check if previous_frequency == 0 before processing. When it reaches 0, all remaining characters must be completely deleted:

if previous_frequency == 0:
    deletions += current_freq
elif current_freq >= previous_frequency:
    deletions += current_freq - previous_frequency + 1
    previous_frequency -= 1
else:
    previous_frequency = current_freq

Pitfall 2: Using previous_frequency = previous_frequency - 1 Without Bounds Checking

The Problem: When reducing previous_frequency, developers often write previous_frequency = previous_frequency - 1 without ensuring it doesn't go below 0.

Better Approach:

elif current_freq >= previous_frequency:
    deletions += current_freq - previous_frequency + 1
    previous_frequency = max(0, previous_frequency - 1)  # Ensure non-negative

This prevents logical errors and makes the zero-check more explicit in subsequent iterations.

Pitfall 3: Misunderstanding the Frequency Assignment Logic

The Problem: Some implementations try to find the "next available" frequency slot by checking all previously used frequencies. This overcomplicates the solution and often leads to errors.

Incorrect Approach:

used_frequencies = set()
for freq in sorted(char_frequency.values(), reverse=True):
    while freq > 0 and freq in used_frequencies:
        freq -= 1
        deletions += 1
    if freq > 0:
        used_frequencies.add(freq)

Why the Greedy Approach is Better: The sorted greedy approach with previous_frequency tracking is more efficient and cleaner. By processing frequencies in descending order and maintaining the constraint that each frequency must be less than the previous one, we automatically ensure uniqueness without needing to track all used frequencies.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

How many times is a tree node visited in a depth first search?


Recommended Readings

Want a Structured Path to Master System Design Too? Donโ€™t Miss This!

Load More