1338. Reduce Array Size to The Half
Problem Description
You have an integer array arr. Your task is to select a set of distinct integers from this array and remove all occurrences of each selected integer from the array.
The goal is to find the minimum number of distinct integers you need to select such that after removing all their occurrences, at least half of the original array elements are removed.
For example, if you have an array [3,3,3,3,5,5,5,2,2,7] with 10 elements:
- If you choose to remove all occurrences of 3(4 elements) and all occurrences of5(3 elements), you remove 7 elements total
- Since 7 ≥ 10/2, you've removed at least half the array
- You selected 2 distinct integers (3and5), so the answer would be 2
The solution approach counts the frequency of each number in the array using a Counter. It then sorts these frequencies in descending order (using most_common()). By greedily selecting the most frequent numbers first and keeping track of how many elements have been removed (m), the algorithm continues adding numbers to the set until at least half of the array elements are removed (m * 2 >= len(arr)). This greedy approach ensures the minimum set size because removing the most frequent numbers first maximizes the number of elements removed per selection.
Intuition
To minimize the number of distinct integers we need to select, we want to maximize the impact of each selection. Since removing a number removes all its occurrences, we should prioritize removing the numbers that appear most frequently in the array.
Think of it this way: if we need to remove at least half of the array elements, and we can only remove complete sets of identical numbers, then we want each selection to contribute as much as possible toward our goal.
For instance, if we have a number that appears 100 times and another that appears just once, removing the frequent number gets us much closer to the "half removed" target with just one selection, whereas we'd need to remove many infrequent numbers to achieve the same effect.
This naturally leads to a greedy strategy: sort the numbers by their frequency in descending order and keep selecting the most frequent ones until we've removed at least half of the array. This is optimal because:
- Each selection removes all occurrences of that number
- Selecting high-frequency numbers first means fewer selections needed overall
- We stop as soon as we reach the threshold of removing half the elements
The key insight is that the actual values of the numbers don't matter - only their frequencies do. That's why we can use a Counter to get frequencies, sort them in descending order, and greedily pick from the top until we've removed enough elements.
Learn more about Greedy, Sorting and Heap (Priority Queue) patterns.
Solution Approach
The implementation follows a counting and sorting strategy to solve this problem efficiently.
Step 1: Count Frequencies
cnt = Counter(arr)
We use Python's Counter from the collections module to create a frequency map. This gives us a dictionary where keys are the unique numbers in arr and values are their occurrence counts.
Step 2: Sort by Frequency
for _, v in cnt.most_common():
The most_common() method returns a list of (element, count) pairs sorted by count in descending order. We iterate through this list, but we only need the count value v (not the actual number), so we use _ to ignore the element.
Step 3: Greedy Selection
ans = m = 0
for _, v in cnt.most_common():
    m += v
    ans += 1
    if m * 2 >= len(arr):
        break
We maintain two variables:
- m: tracks the total number of elements removed so far
- ans: counts how many distinct numbers we've selected
For each iteration:
- Add the current frequency vtom(removing all occurrences of this number)
- Increment ansby 1 (we've selected one more distinct number)
- Check if m * 2 >= len(arr), which is equivalent to checking ifm >= len(arr) / 2
- If we've removed at least half the elements, we break and return ans
Time Complexity: O(n log n) where n is the length of the array, dominated by the sorting operation in most_common().
Space Complexity: O(n) for storing the frequency counter.
The algorithm is optimal because it always selects the minimum number of distinct integers needed by greedily choosing the most frequent ones first.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with the array [2,2,1,3,3,3,4]:
Step 1: Count Frequencies
Using Counter(arr), we get:
- 1appears 1 time
- 2appears 2 times
- 3appears 3 times
- 4appears 1 time
Step 2: Sort by Frequency (Descending)
After calling most_common(), we get the order:
- 3with frequency 3
- 2with frequency 2
- 1with frequency 1 (could be either order with- 4)
- 4with frequency 1
Step 3: Greedy Selection
We need to remove at least ⌈7/2⌉ = 4 elements.
Let's trace through the iterations:
- 
Iteration 1: Select number 3- Add frequency 3 to m:m = 0 + 3 = 3
- Increment ans:ans = 1
- Check: Is 3 * 2 >= 7? No (6 < 7), continue
 
- Add frequency 3 to 
- 
Iteration 2: Select number 2- Add frequency 2 to m:m = 3 + 2 = 5
- Increment ans:ans = 2
- Check: Is 5 * 2 >= 7? Yes (10 >= 7), break!
 
- Add frequency 2 to 
Result: We need to select 2 distinct integers (numbers 3 and 2). By removing all occurrences of these two numbers, we remove 5 elements from the 7-element array, which is more than half.
The greedy approach works because selecting 3 first (the most frequent) removes 3 elements with just one selection, whereas if we had started with less frequent numbers like 1 or 4, we would need more selections to reach our target.
Solution Implementation
1from typing import List
2from collections import Counter
3
4class Solution:
5    def minSetSize(self, arr: List[int]) -> int:
6        # Count frequency of each element in the array
7        frequency_counter = Counter(arr)
8      
9        # Initialize variables to track the answer and removed elements count
10        min_set_size = 0
11        removed_count = 0
12      
13        # Iterate through elements sorted by frequency in descending order
14        for element, frequency in frequency_counter.most_common():
15            # Add current element's frequency to the removed count
16            removed_count += frequency
17            # Increment the size of our removal set
18            min_set_size += 1
19          
20            # Check if we've removed at least half of the array elements
21            if removed_count * 2 >= len(arr):
22                break
23      
24        return min_set_size
251class Solution {
2    public int minSetSize(int[] arr) {
3        // Find the maximum value in the array to determine the size of frequency array
4        int maxValue = 0;
5        for (int num : arr) {
6            maxValue = Math.max(maxValue, num);
7        }
8      
9        // Create frequency array to count occurrences of each number
10        int[] frequency = new int[maxValue + 1];
11        for (int num : arr) {
12            frequency[num]++;
13        }
14      
15        // Sort the frequency array in ascending order
16        Arrays.sort(frequency);
17      
18        // Greedily select elements with highest frequencies
19        int setSize = 0;  // Number of distinct elements selected
20        int removedCount = 0;  // Total count of elements removed
21      
22        // Iterate from highest frequency to lowest (array is sorted ascending, so traverse backwards)
23        for (int i = maxValue; ; i--) {
24            if (frequency[i] > 0) {
25                removedCount += frequency[i];
26                setSize++;
27              
28                // Check if we've removed at least half of the array elements
29                if (removedCount * 2 >= arr.length) {
30                    return setSize;
31                }
32            }
33        }
34    }
35}
361class Solution {
2public:
3    int minSetSize(vector<int>& arr) {
4        // Find the maximum value in the array to determine array size
5        int maxValue = ranges::max(arr);
6      
7        // Create frequency array to count occurrences of each element
8        // Size is maxValue + 1 to handle indices from 0 to maxValue
9        int frequency[maxValue + 1];
10        memset(frequency, 0, sizeof(frequency));
11      
12        // Count the frequency of each element in the array
13        for (int& element : arr) {
14            ++frequency[element];
15        }
16      
17        // Sort frequencies in descending order to greedily pick elements
18        // with highest frequencies first
19        sort(frequency, frequency + maxValue + 1, greater<int>());
20      
21        // Track minimum set size and cumulative elements removed
22        int minSetCount = 0;
23        int removedElements = 0;
24      
25        // Greedily select elements with highest frequencies
26        for (int& freq : frequency) {
27            // Skip zero frequencies (elements not present in original array)
28            if (freq == 0) {
29                continue;
30            }
31          
32            // Add current element's frequency to removed count
33            removedElements += freq;
34            ++minSetCount;
35          
36            // Check if we've removed at least half of the array
37            if (removedElements * 2 >= arr.size()) {
38                break;
39            }
40        }
41      
42        return minSetCount;
43    }
44};
451/**
2 * Finds the minimum size of the set whose removal makes array size at most half
3 * @param arr - Input array of integers
4 * @returns Minimum number of unique elements to remove
5 */
6function minSetSize(arr: number[]): number {
7    // Create a frequency map to count occurrences of each element
8    const frequencyMap = new Map<number, number>();
9  
10    // Count the frequency of each element in the array
11    for (const element of arr) {
12        frequencyMap.set(element, (frequencyMap.get(element) ?? 0) + 1);
13    }
14  
15    // Initialize result counter and count of removed elements
16    let minSetCount = 0;
17    let removedElementsCount = 0;
18  
19    // Sort frequencies in descending order and iterate through them
20    // Remove elements with highest frequency first (greedy approach)
21    for (const frequency of Array.from(frequencyMap.values()).sort((a, b) => b - a)) {
22        // Add current frequency to removed elements count
23        removedElementsCount += frequency;
24        // Increment the set size
25        minSetCount++;
26      
27        // Check if we've removed at least half of the array elements
28        if (removedElementsCount * 2 >= arr.length) {
29            break;
30        }
31    }
32  
33    return minSetCount;
34}
35Time and Space Complexity
The time complexity is O(n log n), where n is the length of the array arr. This complexity arises from:
- Creating the Counter takes O(n)time to iterate through all elements
- The most_common()method internally sorts the counter items by frequency, which takesO(k log k)time wherekis the number of unique elements. In the worst case,k = n(all elements are unique), giving usO(n log n)
- The iteration through the sorted items takes at most O(k)=O(n)time
The dominant operation is the sorting step, resulting in overall time complexity of O(n log n).
The space complexity is O(n). This comes from:
- The Counter object stores at most nunique elements (in the worst case when all elements are different), requiringO(n)space
- The most_common()method creates a sorted list of the counter items, which also requiresO(k)=O(n)space in the worst case
- Other variables (ans,m, loop variables) useO(1)space
Therefore, the total space complexity is O(n).
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Integer Division vs Float Division Confusion
A common mistake is using integer division when checking if half the elements are removed:
# WRONG - This can cause incorrect results due to integer division
if removed_count >= len(arr) // 2:  
    break
Why it's wrong: For an array of length 5, len(arr) // 2 equals 2, but we actually need to remove at least 3 elements (which is ⌈5/2⌉ = 3). This would incorrectly return after removing only 2 elements.
Correct approach: Use multiplication to avoid division issues:
if removed_count * 2 >= len(arr):
    break
2. Modifying the Array During Iteration
Some might try to actually remove elements from the original array:
# WRONG - Don't modify the array for num in selected_numbers: arr = [x for x in arr if x != num] # This is unnecessary and inefficient
Why it's wrong: The problem only asks for the count of distinct integers to select, not to actually modify the array. This approach is inefficient (O(n²)) and unnecessary.
Correct approach: Work with frequencies only, never modify the original array.
3. Not Handling Edge Cases
Failing to consider edge cases like:
- Empty array (though typically constraints ensure non-empty)
- Array with all identical elements
- Array with all unique elements
Solution: The current approach naturally handles these cases:
# For array [1,1,1,1], only 1 distinct integer needed # For array [1,2,3,4], need 2 distinct integers to remove half
4. Sorting in Wrong Order
Sorting frequencies in ascending order instead of descending:
# WRONG - This gives suboptimal result
for _, v in sorted(frequency_counter.values()):  # Ascending order
    # This would select less frequent elements first
Why it's wrong: Selecting less frequent elements first requires more distinct integers to reach half the array size.
Correct approach: Always use most_common() or sort in descending order to ensure the greedy approach works optimally.
Consider the classic dynamic programming of longest increasing subsequence:
Find the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order.
For example, the length of LIS for [50, 3, 10, 7, 40, 80] is 4 and LIS is
[3, 7, 40, 80].
What is the recurrence relation?
Recommended Readings
Greedy Introduction div class responsive iframe iframe src https www youtube com embed WTslqPbj7I title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture web share allowfullscreen iframe div When do we use greedy Greedy algorithms tend to solve optimization problems Typically they will ask you to calculate the max min of some value Commonly you may see this phrased in the problem as max min longest shortest largest smallest etc These keywords can be identified by just scanning
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
https assets algo monster cover_photos heap svg Priority Queue and Heap What is the relationship between priority queue and heap Priority Queue is an Abstract Data Type and Heap is the concrete data structure we use to implement a priority queue Priority Queue A priority queue is a data structure
Want a Structured Path to Master System Design Too? Don’t Miss This!