Facebook Pixel

1338. Reduce Array Size to The Half

Problem Description

You have an integer array arr. Your task is to select a set of distinct integers from this array and remove all occurrences of each selected integer from the array.

The goal is to find the minimum number of distinct integers you need to select such that after removing all their occurrences, at least half of the original array elements are removed.

For example, if you have an array [3,3,3,3,5,5,5,2,2,7] with 10 elements:

  • If you choose to remove all occurrences of 3 (4 elements) and all occurrences of 5 (3 elements), you remove 7 elements total
  • Since 7 ≥ 10/2, you've removed at least half the array
  • You selected 2 distinct integers (3 and 5), so the answer would be 2

The solution approach counts the frequency of each number in the array using a Counter. It then sorts these frequencies in descending order (using most_common()). By greedily selecting the most frequent numbers first and keeping track of how many elements have been removed (m), the algorithm continues adding numbers to the set until at least half of the array elements are removed (m * 2 >= len(arr)). This greedy approach ensures the minimum set size because removing the most frequent numbers first maximizes the number of elements removed per selection.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

To minimize the number of distinct integers we need to select, we want to maximize the impact of each selection. Since removing a number removes all its occurrences, we should prioritize removing the numbers that appear most frequently in the array.

Think of it this way: if we need to remove at least half of the array elements, and we can only remove complete sets of identical numbers, then we want each selection to contribute as much as possible toward our goal.

For instance, if we have a number that appears 100 times and another that appears just once, removing the frequent number gets us much closer to the "half removed" target with just one selection, whereas we'd need to remove many infrequent numbers to achieve the same effect.

This naturally leads to a greedy strategy: sort the numbers by their frequency in descending order and keep selecting the most frequent ones until we've removed at least half of the array. This is optimal because:

  1. Each selection removes all occurrences of that number
  2. Selecting high-frequency numbers first means fewer selections needed overall
  3. We stop as soon as we reach the threshold of removing half the elements

The key insight is that the actual values of the numbers don't matter - only their frequencies do. That's why we can use a Counter to get frequencies, sort them in descending order, and greedily pick from the top until we've removed enough elements.

Learn more about Greedy, Sorting and Heap (Priority Queue) patterns.

Solution Approach

The implementation follows a counting and sorting strategy to solve this problem efficiently.

Step 1: Count Frequencies

cnt = Counter(arr)

We use Python's Counter from the collections module to create a frequency map. This gives us a dictionary where keys are the unique numbers in arr and values are their occurrence counts.

Step 2: Sort by Frequency

for _, v in cnt.most_common():

The most_common() method returns a list of (element, count) pairs sorted by count in descending order. We iterate through this list, but we only need the count value v (not the actual number), so we use _ to ignore the element.

Step 3: Greedy Selection

ans = m = 0
for _, v in cnt.most_common():
    m += v
    ans += 1
    if m * 2 >= len(arr):
        break

We maintain two variables:

  • m: tracks the total number of elements removed so far
  • ans: counts how many distinct numbers we've selected

For each iteration:

  1. Add the current frequency v to m (removing all occurrences of this number)
  2. Increment ans by 1 (we've selected one more distinct number)
  3. Check if m * 2 >= len(arr), which is equivalent to checking if m >= len(arr) / 2
  4. If we've removed at least half the elements, we break and return ans

Time Complexity: O(n log n) where n is the length of the array, dominated by the sorting operation in most_common().

Space Complexity: O(n) for storing the frequency counter.

The algorithm is optimal because it always selects the minimum number of distinct integers needed by greedily choosing the most frequent ones first.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with the array [2,2,1,3,3,3,4]:

Step 1: Count Frequencies Using Counter(arr), we get:

  • 1 appears 1 time
  • 2 appears 2 times
  • 3 appears 3 times
  • 4 appears 1 time

Step 2: Sort by Frequency (Descending) After calling most_common(), we get the order:

  1. 3 with frequency 3
  2. 2 with frequency 2
  3. 1 with frequency 1 (could be either order with 4)
  4. 4 with frequency 1

Step 3: Greedy Selection We need to remove at least ⌈7/2⌉ = 4 elements.

Let's trace through the iterations:

  • Iteration 1: Select number 3

    • Add frequency 3 to m: m = 0 + 3 = 3
    • Increment ans: ans = 1
    • Check: Is 3 * 2 >= 7? No (6 < 7), continue
  • Iteration 2: Select number 2

    • Add frequency 2 to m: m = 3 + 2 = 5
    • Increment ans: ans = 2
    • Check: Is 5 * 2 >= 7? Yes (10 >= 7), break!

Result: We need to select 2 distinct integers (numbers 3 and 2). By removing all occurrences of these two numbers, we remove 5 elements from the 7-element array, which is more than half.

The greedy approach works because selecting 3 first (the most frequent) removes 3 elements with just one selection, whereas if we had started with less frequent numbers like 1 or 4, we would need more selections to reach our target.

Solution Implementation

1from typing import List
2from collections import Counter
3
4class Solution:
5    def minSetSize(self, arr: List[int]) -> int:
6        # Count frequency of each element in the array
7        frequency_counter = Counter(arr)
8      
9        # Initialize variables to track the answer and removed elements count
10        min_set_size = 0
11        removed_count = 0
12      
13        # Iterate through elements sorted by frequency in descending order
14        for element, frequency in frequency_counter.most_common():
15            # Add current element's frequency to the removed count
16            removed_count += frequency
17            # Increment the size of our removal set
18            min_set_size += 1
19          
20            # Check if we've removed at least half of the array elements
21            if removed_count * 2 >= len(arr):
22                break
23      
24        return min_set_size
25
1class Solution {
2    public int minSetSize(int[] arr) {
3        // Find the maximum value in the array to determine the size of frequency array
4        int maxValue = 0;
5        for (int num : arr) {
6            maxValue = Math.max(maxValue, num);
7        }
8      
9        // Create frequency array to count occurrences of each number
10        int[] frequency = new int[maxValue + 1];
11        for (int num : arr) {
12            frequency[num]++;
13        }
14      
15        // Sort the frequency array in ascending order
16        Arrays.sort(frequency);
17      
18        // Greedily select elements with highest frequencies
19        int setSize = 0;  // Number of distinct elements selected
20        int removedCount = 0;  // Total count of elements removed
21      
22        // Iterate from highest frequency to lowest (array is sorted ascending, so traverse backwards)
23        for (int i = maxValue; ; i--) {
24            if (frequency[i] > 0) {
25                removedCount += frequency[i];
26                setSize++;
27              
28                // Check if we've removed at least half of the array elements
29                if (removedCount * 2 >= arr.length) {
30                    return setSize;
31                }
32            }
33        }
34    }
35}
36
1class Solution {
2public:
3    int minSetSize(vector<int>& arr) {
4        // Find the maximum value in the array to determine array size
5        int maxValue = ranges::max(arr);
6      
7        // Create frequency array to count occurrences of each element
8        // Size is maxValue + 1 to handle indices from 0 to maxValue
9        int frequency[maxValue + 1];
10        memset(frequency, 0, sizeof(frequency));
11      
12        // Count the frequency of each element in the array
13        for (int& element : arr) {
14            ++frequency[element];
15        }
16      
17        // Sort frequencies in descending order to greedily pick elements
18        // with highest frequencies first
19        sort(frequency, frequency + maxValue + 1, greater<int>());
20      
21        // Track minimum set size and cumulative elements removed
22        int minSetCount = 0;
23        int removedElements = 0;
24      
25        // Greedily select elements with highest frequencies
26        for (int& freq : frequency) {
27            // Skip zero frequencies (elements not present in original array)
28            if (freq == 0) {
29                continue;
30            }
31          
32            // Add current element's frequency to removed count
33            removedElements += freq;
34            ++minSetCount;
35          
36            // Check if we've removed at least half of the array
37            if (removedElements * 2 >= arr.size()) {
38                break;
39            }
40        }
41      
42        return minSetCount;
43    }
44};
45
1/**
2 * Finds the minimum size of the set whose removal makes array size at most half
3 * @param arr - Input array of integers
4 * @returns Minimum number of unique elements to remove
5 */
6function minSetSize(arr: number[]): number {
7    // Create a frequency map to count occurrences of each element
8    const frequencyMap = new Map<number, number>();
9  
10    // Count the frequency of each element in the array
11    for (const element of arr) {
12        frequencyMap.set(element, (frequencyMap.get(element) ?? 0) + 1);
13    }
14  
15    // Initialize result counter and count of removed elements
16    let minSetCount = 0;
17    let removedElementsCount = 0;
18  
19    // Sort frequencies in descending order and iterate through them
20    // Remove elements with highest frequency first (greedy approach)
21    for (const frequency of Array.from(frequencyMap.values()).sort((a, b) => b - a)) {
22        // Add current frequency to removed elements count
23        removedElementsCount += frequency;
24        // Increment the set size
25        minSetCount++;
26      
27        // Check if we've removed at least half of the array elements
28        if (removedElementsCount * 2 >= arr.length) {
29            break;
30        }
31    }
32  
33    return minSetCount;
34}
35

Time and Space Complexity

The time complexity is O(n log n), where n is the length of the array arr. This complexity arises from:

  • Creating the Counter takes O(n) time to iterate through all elements
  • The most_common() method internally sorts the counter items by frequency, which takes O(k log k) time where k is the number of unique elements. In the worst case, k = n (all elements are unique), giving us O(n log n)
  • The iteration through the sorted items takes at most O(k) = O(n) time

The dominant operation is the sorting step, resulting in overall time complexity of O(n log n).

The space complexity is O(n). This comes from:

  • The Counter object stores at most n unique elements (in the worst case when all elements are different), requiring O(n) space
  • The most_common() method creates a sorted list of the counter items, which also requires O(k) = O(n) space in the worst case
  • Other variables (ans, m, loop variables) use O(1) space

Therefore, the total space complexity is O(n).

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Integer Division vs Float Division Confusion

A common mistake is using integer division when checking if half the elements are removed:

# WRONG - This can cause incorrect results due to integer division
if removed_count >= len(arr) // 2:  
    break

Why it's wrong: For an array of length 5, len(arr) // 2 equals 2, but we actually need to remove at least 3 elements (which is ⌈5/2⌉ = 3). This would incorrectly return after removing only 2 elements.

Correct approach: Use multiplication to avoid division issues:

if removed_count * 2 >= len(arr):
    break

2. Modifying the Array During Iteration

Some might try to actually remove elements from the original array:

# WRONG - Don't modify the array
for num in selected_numbers:
    arr = [x for x in arr if x != num]  # This is unnecessary and inefficient

Why it's wrong: The problem only asks for the count of distinct integers to select, not to actually modify the array. This approach is inefficient (O(n²)) and unnecessary.

Correct approach: Work with frequencies only, never modify the original array.

3. Not Handling Edge Cases

Failing to consider edge cases like:

  • Empty array (though typically constraints ensure non-empty)
  • Array with all identical elements
  • Array with all unique elements

Solution: The current approach naturally handles these cases:

# For array [1,1,1,1], only 1 distinct integer needed
# For array [1,2,3,4], need 2 distinct integers to remove half

4. Sorting in Wrong Order

Sorting frequencies in ascending order instead of descending:

# WRONG - This gives suboptimal result
for _, v in sorted(frequency_counter.values()):  # Ascending order
    # This would select less frequent elements first

Why it's wrong: Selecting less frequent elements first requires more distinct integers to reach half the array size.

Correct approach: Always use most_common() or sort in descending order to ensure the greedy approach works optimally.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Is the following code DFS or BFS?

void search(Node root) {
  if (!root) return;
  visit(root);
  root.visited = true;
  for (Node node in root.adjacent) {
    if (!node.visited) {
      search(node);
    }
  }
}

Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More