1338. Reduce Array Size to The Half
Problem Description
You are given an integer array arr. Your task is to select a set of distinct integers from this array and remove all occurrences of each selected integer from the array.
The goal is to find the minimum number of distinct integers you need to select such that, after removing all their occurrences, at least half of the original array elements are removed.
For example, if you have the array [3,3,3,3,5,5,5,2,2,7] with 10 elements:
- If you choose to remove all occurrences of 3 (4 elements) and all occurrences of 5 (3 elements), you remove 7 elements total
- Since 7 ≥ 10/2, you've removed at least half the array
- You selected 2 distinct integers (3 and 5), so the answer would be 2
The solution approach counts the frequency of each number in the array using a Counter. It then sorts these frequencies in descending order (using most_common()). By greedily selecting the most frequent numbers first and keeping track of how many elements have been removed (m), the algorithm keeps adding numbers to the set until at least half of the array elements are removed (m * 2 >= len(arr)). This greedy approach yields the minimum set size because removing the most frequent numbers first maximizes the number of elements removed per selection.
Intuition
To minimize the number of distinct integers we need to select, we want to maximize the impact of each selection. Since removing a number removes all its occurrences, we should prioritize removing the numbers that appear most frequently in the array.
Think of it this way: if we need to remove at least half of the array elements, and we can only remove complete sets of identical numbers, then we want each selection to contribute as much as possible toward our goal.
For instance, if we have a number that appears 100 times and another that appears just once, removing the frequent number gets us much closer to the "half removed" target with just one selection, whereas we'd need to remove many infrequent numbers to achieve the same effect.
This naturally leads to a greedy strategy: sort the numbers by their frequency in descending order and keep selecting the most frequent ones until we've removed at least half of the array. This is optimal because:
- Each selection removes all occurrences of that number
- Selecting high-frequency numbers first means fewer selections needed overall
- We stop as soon as we reach the threshold of removing half the elements
The key insight is that the actual values of the numbers don't matter - only their frequencies do. That's why we can use a Counter to get the frequencies, sort them in descending order, and greedily pick from the top until we've removed enough elements.
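The same greedy idea can also be expressed with a max-heap of frequencies instead of a full sort: pop the largest remaining frequency until the threshold is met. A minimal sketch (the function name min_set_size_heap is ours, not from the original solution):

```python
import heapq
from collections import Counter


def min_set_size_heap(arr):
    # Build a max-heap of frequencies (negate values, since heapq is a min-heap)
    heap = [-count for count in Counter(arr).values()]
    heapq.heapify(heap)

    removed = selections = 0
    # Pop the highest remaining frequency until at least half the array is removed
    while removed * 2 < len(arr):
        removed += -heapq.heappop(heap)
        selections += 1
    return selections


print(min_set_size_heap([3, 3, 3, 3, 5, 5, 5, 2, 2, 7]))  # 2
```

The heap variant has the same O(n log n) worst case but can stop early without sorting every frequency, which helps when a few numbers dominate the array.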
Solution Approach
The implementation follows a counting and sorting strategy to solve this problem efficiently.
Step 1: Count Frequencies
cnt = Counter(arr)
We use Python's Counter from the collections module to create a frequency map. This gives us a dictionary-like object whose keys are the unique numbers in arr and whose values are their occurrence counts.
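As a quick illustration (using the small array from the walkthrough later in this article), the counter maps each value to its count:

```python
from collections import Counter

# Frequency map for a small sample array
cnt = Counter([2, 2, 1, 3, 3, 3, 4])
print(cnt[3], cnt[2], cnt[1], cnt[4])  # 3 2 1 1
```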
Step 2: Sort by Frequency
for _, v in cnt.most_common():
The most_common() method returns a list of (element, count) pairs sorted by count in descending order. We iterate through this list, but we only need the count value v (not the actual number), so we use _ to ignore the element.
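On the same sample array, most_common() puts the highest-frequency pairs first (for equal counts, CPython's stable sort preserves insertion order, but the problem doesn't depend on how ties are broken):

```python
from collections import Counter

cnt = Counter([2, 2, 1, 3, 3, 3, 4])
pairs = cnt.most_common()
# Pairs sorted by count, descending
print(pairs[:2])  # [(3, 3), (2, 2)]
```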
Step 3: Greedy Selection
ans = m = 0
for _, v in cnt.most_common():
    m += v
    ans += 1
    if m * 2 >= len(arr):
        break
We maintain two variables:
- m: tracks the total number of elements removed so far
- ans: counts how many distinct numbers we've selected

For each iteration:
- Add the current frequency v to m (removing all occurrences of this number)
- Increment ans by 1 (we've selected one more distinct number)
- Check whether m * 2 >= len(arr), which is equivalent to checking whether m >= len(arr) / 2
- If we've removed at least half the elements, break and return ans
Time Complexity: O(n log n), where n is the length of the array, dominated by the sorting operation inside most_common().
Space Complexity: O(n) for storing the frequency counter.
The algorithm is optimal because it always selects the minimum number of distinct integers needed by greedily choosing the most frequent ones first.
Example Walkthrough
Let's walk through the solution with the array [2,2,1,3,3,3,4]:
Step 1: Count Frequencies
Using Counter(arr), we get:
- 1 appears 1 time
- 2 appears 2 times
- 3 appears 3 times
- 4 appears 1 time
Step 2: Sort by Frequency (Descending)
After calling most_common(), we get the order:
- 3 with frequency 3
- 2 with frequency 2
- 1 with frequency 1 (could be in either order with 4)
- 4 with frequency 1
Step 3: Greedy Selection
We need to remove at least ⌈7/2⌉ = 4 elements.
Let's trace through the iterations:
Iteration 1: Select number 3
- Add frequency 3 to m: m = 0 + 3 = 3
- Increment ans: ans = 1
- Check: Is 3 * 2 >= 7? No (6 < 7), continue

Iteration 2: Select number 2
- Add frequency 2 to m: m = 3 + 2 = 5
- Increment ans: ans = 2
- Check: Is 5 * 2 >= 7? Yes (10 >= 7), break!
Result: We need to select 2 distinct integers (numbers 3 and 2). By removing all occurrences of these two numbers, we remove 5 elements from the 7-element array, which is more than half.
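The walkthrough above can be checked directly by running the greedy loop as a standalone function (the name min_set_size is ours, mirroring the Solution method below):

```python
from collections import Counter


def min_set_size(arr):
    # Greedy: take the highest-frequency numbers first
    removed = selections = 0
    for _, freq in Counter(arr).most_common():
        removed += freq
        selections += 1
        if removed * 2 >= len(arr):
            break
    return selections


print(min_set_size([2, 2, 1, 3, 3, 3, 4]))  # 2
```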
The greedy approach works because selecting 3 first (the most frequent) removes 3 elements with just one selection, whereas if we had started with less frequent numbers like 1 or 4, we would need more selections to reach our target.
Solution Implementation
from typing import List
from collections import Counter


class Solution:
    def minSetSize(self, arr: List[int]) -> int:
        # Count frequency of each element in the array
        frequency_counter = Counter(arr)

        # Initialize variables to track the answer and removed elements count
        min_set_size = 0
        removed_count = 0

        # Iterate through elements sorted by frequency in descending order
        for element, frequency in frequency_counter.most_common():
            # Add current element's frequency to the removed count
            removed_count += frequency
            # Increment the size of our removal set
            min_set_size += 1

            # Check if we've removed at least half of the array elements
            if removed_count * 2 >= len(arr):
                break

        return min_set_size
import java.util.Arrays;

class Solution {
    public int minSetSize(int[] arr) {
        // Find the maximum value in the array to size the frequency array
        int maxValue = 0;
        for (int num : arr) {
            maxValue = Math.max(maxValue, num);
        }

        // Count occurrences of each number
        int[] frequency = new int[maxValue + 1];
        for (int num : arr) {
            frequency[num]++;
        }

        // Sort the frequency array in ascending order
        Arrays.sort(frequency);

        // Greedily select elements with the highest frequencies
        int setSize = 0;      // Number of distinct elements selected
        int removedCount = 0; // Total count of elements removed

        // Traverse from the highest frequency to the lowest
        // (the array is sorted ascending, so iterate backwards)
        for (int i = maxValue; ; i--) {
            if (frequency[i] > 0) {
                removedCount += frequency[i];
                setSize++;

                // Check if we've removed at least half of the array elements
                if (removedCount * 2 >= arr.length) {
                    return setSize;
                }
            }
        }
    }
}
#include <algorithm>
#include <functional>
#include <vector>
using namespace std;

class Solution {
public:
    int minSetSize(vector<int>& arr) {
        // Find the maximum value in the array to size the frequency table
        int maxValue = ranges::max(arr);

        // Count occurrences of each element (indices 0 to maxValue);
        // a vector avoids the non-standard variable-length array
        vector<int> frequency(maxValue + 1, 0);
        for (int element : arr) {
            ++frequency[element];
        }

        // Sort frequencies in descending order to greedily pick elements
        // with the highest frequencies first
        sort(frequency.begin(), frequency.end(), greater<int>());

        // Track minimum set size and cumulative elements removed
        int minSetCount = 0;
        int removedElements = 0;

        // Greedily select elements with the highest frequencies
        for (int freq : frequency) {
            // Skip zero frequencies (values not present in the original array)
            if (freq == 0) {
                continue;
            }

            // Add current element's frequency to the removed count
            removedElements += freq;
            ++minSetCount;

            // Check if we've removed at least half of the array
            if (removedElements * 2 >= static_cast<int>(arr.size())) {
                break;
            }
        }

        return minSetCount;
    }
};
/**
 * Finds the minimum number of distinct integers whose removal
 * deletes at least half of the array's elements.
 * @param arr - Input array of integers
 * @returns Minimum number of distinct elements to remove
 */
function minSetSize(arr: number[]): number {
    // Create a frequency map to count occurrences of each element
    const frequencyMap = new Map<number, number>();
    for (const element of arr) {
        frequencyMap.set(element, (frequencyMap.get(element) ?? 0) + 1);
    }

    // Initialize result counter and count of removed elements
    let minSetCount = 0;
    let removedElementsCount = 0;

    // Sort frequencies in descending order and iterate through them,
    // removing elements with the highest frequency first (greedy approach)
    for (const frequency of Array.from(frequencyMap.values()).sort((a, b) => b - a)) {
        // Add current frequency to the removed elements count
        removedElementsCount += frequency;
        // Increment the set size
        minSetCount++;

        // Check if we've removed at least half of the array elements
        if (removedElementsCount * 2 >= arr.length) {
            break;
        }
    }

    return minSetCount;
}
Time and Space Complexity
The time complexity is O(n log n), where n is the length of the array arr. This complexity arises from:
- Creating the Counter takes O(n) time to iterate through all elements
- The most_common() method internally sorts the counter items by frequency, which takes O(k log k) time, where k is the number of unique elements. In the worst case, k = n (all elements are unique), giving us O(n log n)
- The iteration through the sorted items takes at most O(k) = O(n) time

The dominant operation is the sorting step, resulting in an overall time complexity of O(n log n).
The space complexity is O(n). This comes from:
- The Counter object stores at most n unique elements (in the worst case when all elements are different), requiring O(n) space
- The most_common() method creates a sorted list of the counter items, which also requires O(k) = O(n) space in the worst case
- Other variables (ans, m, loop variables) use O(1) space

Therefore, the total space complexity is O(n).
Common Pitfalls
1. Integer Division vs Float Division Confusion
A common mistake is using integer division when checking if half the elements are removed:
# WRONG - This can cause incorrect results due to integer division
if removed_count >= len(arr) // 2:
break
Why it's wrong: For an array of length 5, len(arr) // 2 equals 2, but we actually need to remove at least 3 elements (⌈5/2⌉ = 3). This check would incorrectly stop after removing only 2 elements.
Correct approach: Use multiplication to avoid division issues:
if removed_count * 2 >= len(arr):
break
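The difference between the two checks can be demonstrated on an odd-length array. In this sketch, the buggy flag (our own device, for illustration) switches between the incorrect integer-division check and the correct multiplication check:

```python
from collections import Counter


def min_set_size(arr, buggy=False):
    removed = selections = 0
    for _, freq in Counter(arr).most_common():
        removed += freq
        selections += 1
        if buggy:
            # WRONG: integer division rounds 5 // 2 down to 2
            done = removed >= len(arr) // 2
        else:
            # CORRECT: removed * 2 >= 5 requires removing at least 3 elements
            done = removed * 2 >= len(arr)
        if done:
            break
    return selections


arr = [1, 1, 2, 2, 3]  # length 5: must remove at least 3 elements
print(min_set_size(arr))               # 2 (correct)
print(min_set_size(arr, buggy=True))   # 1 (stops too early)
```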
2. Modifying the Array During Iteration
Some might try to actually remove elements from the original array:
# WRONG - Don't modify the array
for num in selected_numbers:
    arr = [x for x in arr if x != num]  # Unnecessary and inefficient
Why it's wrong: The problem only asks for the count of distinct integers to select, not to actually modify the array. This approach is inefficient (O(n²)) and unnecessary.
Correct approach: Work with frequencies only, never modify the original array.
3. Not Handling Edge Cases
Failing to consider edge cases like:
- Empty array (though typically constraints ensure non-empty)
- Array with all identical elements
- Array with all unique elements
Solution: The current approach naturally handles these cases:
# For array [1,1,1,1], only 1 distinct integer is needed
# For array [1,2,3,4], we need 2 distinct integers to remove half
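Both extremes can be verified with a standalone version of the greedy loop (the function name is ours, mirroring the Solution method above):

```python
from collections import Counter


def min_set_size(arr):
    removed = selections = 0
    for _, freq in Counter(arr).most_common():
        removed += freq
        selections += 1
        if removed * 2 >= len(arr):
            break
    return selections


print(min_set_size([1, 1, 1, 1]))  # 1: one selection removes everything
print(min_set_size([1, 2, 3, 4]))  # 2: each selection removes a single element
```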
4. Sorting in Wrong Order
Sorting frequencies in ascending order instead of descending:
# WRONG - This gives a suboptimal result
for v in sorted(frequency_counter.values()):  # Ascending order
    # This would select the least frequent elements first
    ...
Why it's wrong: Selecting less frequent elements first requires more distinct integers to reach half the array size.
Correct approach: Always use most_common() or sort in descending order to ensure the greedy approach works optimally.