1338. Reduce Array Size to The Half

Problem Description

The problem is about reducing the size of an integer array by removing some subsets of integers to make sure at least half of the elements in the original array are removed. The task is to find the minimum number of distinct integers needed for these subsets such that once their occurrences are removed, at least half of the array elements are gone.

To clarify with an example, if we have an array [3,3,3,3,5,5,5,2,2,7], a possible solution would be choosing the subset {3,5} which contains 2 integers. Removing all the instances of 3 and 5 from the array would remove 3 + 3 = 6 elements from the 10-element array, thereby removing more than half of the array, which satisfies the condition.


The intuition behind the solution approach involves the following steps:

  1. Identify that we want to maximize the number of elements removed by picking the least number of distinct integers.
  2. Understand that to remove the most numbers, we should start by removing the integers that appear most frequently.
  3. Use a frequency counter to count the occurrences of each integer in the array.
  4. Sort these integers by their frequencies in descending order, so we can remove the integers with higher frequencies first.
  5. Start removing integers from the sorted list one by one, adding their frequencies to a cumulative sum (m in the code).
  6. Each time an integer is added to the set (the subset we're removing), increase the count (ans in the code) that represents the size of our set.
  7. Once we've removed enough numbers such that the cumulative sum is at least half of the original array's size, we've found our minimum set size and can stop.

The provided Python code uses a Counter from the collections module to tally the integer frequencies and most_common() method to sort them by frequency in descending order. It iterates through these frequencies while keeping a running total and a count until at least half the array size is reached.

Learn more about Greedy, Sorting and Heap (Priority Queue) patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

What is the worst case running time for finding an element in a binary search tree(not necessarily balanced) of size n?

Solution Approach

The solution uses a hash map and sorting technique to solve the problem efficiently. Here's a breakdown of the implementation steps referring to the Python code provided:

  1. Hash Map (Counter):

    • Use the Python Counter from the collections module to create a hash map that counts the occurrences of each integer in the input array arr. This operation has a complexity of O(n), where n is the size of the array.
    1cnt = Counter(arr)
  2. Sorting:

    • The most_common() function is called on the Counter object which under the hood performs a sort operation, arranging the counted integers in descending order based on their frequency. The complexity of this operation is O(u log u), where u is the number of unique elements in arr.
    1for _, v in cnt.most_common():
  3. Iterative Accumulation:

    • Initialize two variables, ans to count the number of elements in the required set and m to keep track of the cumulative frequency of elements removed from the array arr.
    1ans = m = 0
    • Iteratively add the frequency v of the most common element to m. Increment ans by 1 to reflect that an element is added to the set.
    1m += v
    2ans += 1
    • Check if the cumulative frequency m is at least half the length of the original array. If so, break out of the loop since the requirement of removing at least half of the array elements is met.
    1if m * 2 >= len(arr):
    2    break
  4. Return the Result:

    • After the loop terminates, return the value of ans, which represents the minimum size of the set to remove at least half of the array elements.
    1return ans

This approach is efficient as it targets the frequency of elements which is a key insight for achieving the task with a minimal set size. The heavy-lifting operations are counting and sorting which are both done optimally using Python's built-in data structures and algorithms.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

What is the worst case running time for finding an element in a binary search tree(not necessarily balanced) of size n?

Example Walkthrough

Let's use a small example to illustrate the solution approach. Suppose we have an array arr = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4].

  1. Hash Map (Counter):

    • First, we'll count the occurrences of each integer:
    1from collections import Counter
    2cnt = Counter(arr)
    3# cnt = Counter({4: 4, 3: 3, 2: 2, 1: 1})

    In cnt, the key is the integer and the value is its frequency in the array.

  2. Sorting:

    • Next, we sort these by frequency using most_common():
    1# This will sort the integers by frequency: [4, 3, 2, 1]
    2sorted_integers_by_freq = [item for item, count in cnt.most_common()]
  3. Iterative Accumulation:

    • Initialize the variables ans and m:
    1ans = m = 0
    • Remove integers one by one, starting from the most frequent. In our list, 4 occurs the most, so we start with that:
    1m += cnt[4]  # m = 4 (since 4 appears 4 times)
    2ans += 1     # ans = 1 (we have added one integer to our set)
    • Now we move to the next most frequent integer, 3:
    1m += cnt[3]  # m = 7 (4 from '4's and 3 from '3's)
    2ans += 1     # ans = 2 (we have added two integers to our set)
  4. Checking the Cumulative Frequency:

    • We need to check if we have removed at least half of the elements in the array. The original array has 10 elements, so we need to remove at least 5. With m = 7, we have surpassed this:
    1if m * 2 >= len(arr):  # 7*2 >= 10 which is True
    2    # Break the loop as we have reached our requirement.
  5. Return the Result:

    • We've found our answer, which is 2, because we were able to remove at least half of the elements by removing numbers 4 and 3.
    1return ans
    2# return 2

In this example, the subsets {4, 3} contain the minimum number of distinct integers (2 integers) that, once removed, reduce the original array size at least by half. Following this procedure guarantees that we achieve the goal with the smallest possible set, efficiently utilizing the insights about frequency of elements.

Not Sure What to Study? Take the 2-min Quiz:

Which of the following is a min heap?

Python Solution

1from typing import List
2from collections import Counter
4class Solution:
5    def minSetSize(self, arr: List[int]) -> int:
6        # Count the frequency of each element in the array
7        frequency_counter = Counter(arr)
9        # Initialize variables for the minimum set size and the count of removed elements
10        min_set_size = 0
11        removed_count = 0
13        # Iterate over the elements in the frequency counter, starting with the most common
14        for _, frequency in frequency_counter.most_common():
15            # Increment the count of removed elements by the frequency of the current element
16            removed_count += frequency
18            # Increment the set size by 1 (since we're adding one more element to the removal set)
19            min_set_size += 1
21            # Check if we have removed at least half of the elements from the original array
22            if removed_count * 2 >= len(arr):
23                # If true, we break the loop as we have reached the required set size
24                break
26        # Return the minimum set size needed to remove at least half of the elements
27        return min_set_size
29# Example usage:
30# solution = Solution()
31# result = solution.minSetSize([3,3,3,3,5,5,5,2,2,7])
32# print(result) # Output should be 2, as removing 3 and 5 removes 6 elements, > half of the array size.

Java Solution

1class Solution {
2    public int minSetSize(int[] arr) {
3        // Find the maximum value in arr to determine the range of counts.
4        int maxValue = 0;
5        for (int num : arr) {
6            maxValue = Math.max(maxValue, num);
7        }
9        // Create and populate a frequency array where index represents the number from arr
10        // and the value at that index represents the count of that number in arr.
11        int[] frequency = new int[maxValue + 1];
12        for (int num : arr) {
13            ++frequency[num];
14        }
16        // Sort the frequency array to have the most frequent numbers at the end.
17        Arrays.sort(frequency);
19        // Initialize variables to track the number of elements chosen and their cumulative count.
20        int setSize = 0;
21        int removedElementsCount = 0;
23        // Iterate from the end of the frequency array towards the beginning,
24        // adding the count of each number to our cumulative count and incrementing setSize.
25        // Stop once we've removed enough elements to reduce arr to half its size.
26        for (int i = maxValue; i >= 0; --i) {
27            if (frequency[i] > 0) {
28                removedElementsCount += frequency[i];
29                ++setSize;
30                // If the removed elements count is at least half the size of arr, return setSize.
31                if (removedElementsCount * 2 >= arr.length) {
32                    return setSize;
33                }
34            }
35        }
37        // The loop should always return before reaching this point;
38        // hence we don't need a return statement here, but the function requires a return type.
39        // So ideally we could throw an exception or ensure the loop condition always holds true.
40        // However, as per the constraints and correct functionality, this line will never be reached.
41        return -1; // Placeholder return statement (should not be reached).
42    }

C++ Solution

1#include <vector>
2#include <algorithm>
3#include <numeric>
5class Solution {
7    int minSetSize(vector<int>& arr) {
8        // Find the maximum value in the array to define the range of counts
9        int maxVal = *max_element(arr.begin(), arr.end());
11        // Create a frequency array to count occurrences of each element
12        // Initialize it with zero using `vector` instead of plain array
13        vector<int> frequency(maxVal + 1, 0);
15        // Count the frequency of each element in the array
16        for (int num : arr) {
17            ++frequency[num];
18        }
20        // Sort the frequency array in descending order to prioritize removing frequent elements
21        sort(frequency.begin(), frequency.end(), greater<int>());
23        // Initialize the result variable and a counter for removed elements
24        int result = 0;
25        int removedCount = 0;
27        // Iterate over the sorted frequencies
28        for (int freq : frequency) {
29            // Only consider non-zero frequencies
30            if (freq > 0) {
31                // Increase the counter by the current frequency
32                removedCount += freq;
33                // Increment the result for each selected element
34                ++result;
36                // If the removed elements count is at least half of the total array size, stop
37                if (removedCount * 2 >= arr.size()) {
38                    break;
39                }
40            }
41        }
43        // Return the minimum size of the set of elements that can be removed
44        return result;
45    }

Typescript Solution

1function minSetSize(arr: number[]): number {
2    // Map to count the frequency of each element in the array
3    const frequencyCounter = new Map<number, number>();
5    // Iterate through the array and populate the frequencyCounter
6    for (const value of arr) {
7        frequencyCounter.set(value, (frequencyCounter.get(value) ?? 0) + 1);
8    }
10    // Convert map values into an array and sort it in descending order
11    const frequencies = Array.from(frequencyCounter.values());
12    frequencies.sort((a, b) => b - a); // Sort by descending frequency
14    let setSize = 0; // Initialize the minimum set size required
15    let elementsCount = 0; // To track the number of elements included
17    // Iterate through the sorted frequencies
18    for (const count of frequencies) {
19        elementsCount += count; // Add the frequency count to our total
20        setSize++; // Increment the count of the set size
22        // Check if we've reached or exceeded half of the array length
23        if (elementsCount * 2 >= arr.length) {
24            break; // We've reached the minimum set size
25        }
26    }
28    return setSize; // Return the minimum set size
Fast Track Your Learning with Our Quick Skills Quiz:

Which of the tree traversal order can be used to obtain elements in a binary search tree in sorted order?

Time and Space Complexity

Time Complexity

The time complexity of the provided code can be broken down into the following operations:

  1. Creating the Counter: The Counter(arr) operation has a time complexity of O(N) where N is the length of the array. It goes through each element exactly once.

  2. Sorting: The most_common() method on the counter returns elements in decreasing frequency, which implies an internal sort. The complexity of this sorting is O(K log K) where K is the number of unique elements in the array.

  3. Iterating through sorted frequencies: In the worst case, this can iterate up to K times. However, since it stops once the removed elements count is at least half of the array size, and considering the frequency of the elements, in practice, it might be less than K. Nevertheless, we consider the worst case, so this step has a complexity of O(K).

When combined, the time complexity is dominated by the time complexity of the sorting step, resulting in a total time complexity of O(N) + O(K log K) + O(K), which simplifies to O(N + K log K) since K ≤ N.

Space Complexity

The space complexity can be analyzed as follows:

  1. Counter: Storing the frequency of each element in the array requires O(K) space, where K is the number of unique elements.

  2. Most Common List: The list of tuples returned by most_common() also takes up O(K) space.

Combined, the overall space complexity of the algorithm is O(K), but since K can be at most N (when all elements are unique), the space complexity simplifies to O(N).

Learn more about how to find time and space complexity quickly.

Recommended Readings

Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.

TA 👨‍🏫