1338. Reduce Array Size to The Half
Problem Description
The problem is about reducing the size of an integer array by removing some subsets of integers to make sure at least half of the elements in the original array are removed. The task is to find the minimum number of distinct integers needed for these subsets such that once their occurrences are removed, at least half of the array elements are gone.
To clarify with an example, if we have an array [3,3,3,3,5,5,5,2,2,7]
, a possible solution would be choosing the subset {3,5}
which contains 2 integers. Removing all the instances of 3
and 5
from the array would remove 3 + 3 = 6
elements from the 10-element array, thereby removing more than half of the array, which satisfies the condition.
Intuition
The intuition behind the solution approach involves the following steps:
- Identify that we want to maximize the number of elements removed by picking the least number of distinct integers.
- Understand that to remove the most numbers, we should start by removing the integers that appear most frequently.
- Use a frequency counter to count the occurrences of each integer in the array.
- Sort these integers by their frequencies in descending order, so we can remove the integers with higher frequencies first.
- Start removing integers from the sorted list one by one, adding their frequencies to a cumulative sum (
m
in the code). - Each time an integer is added to the set (the subset we're removing), increase the count (
ans
in the code) that represents the size of our set. - Once we've removed enough numbers such that the cumulative sum is at least half of the original array's size, we've found our minimum set size and can stop.
The provided Python code uses a Counter
from the collections
module to tally the integer frequencies and most_common()
method to sort them by frequency in descending order. It iterates through these frequencies while keeping a running total and a count until at least half the array size is reached.
Learn more about Greedy, Sorting and Heap (Priority Queue) patterns.
Solution Approach
The solution uses a hash map and sorting technique to solve the problem efficiently. Here's a breakdown of the implementation steps referring to the Python code provided:
-
Hash Map (Counter):
- Use the Python
Counter
from thecollections
module to create a hash map that counts the occurrences of each integer in the input arrayarr
. This operation has a complexity of O(n), where n is the size of the array.
cnt = Counter(arr)
- Use the Python
-
- The
most_common()
function is called on theCounter
object which under the hood performs a sort operation, arranging the counted integers in descending order based on their frequency. The complexity of this operation is O(u log u), where u is the number of unique elements inarr
.
for _, v in cnt.most_common():
- The
-
Iterative Accumulation:
- Initialize two variables,
ans
to count the number of elements in the required set andm
to keep track of the cumulative frequency of elements removed from the arrayarr
.
ans = m = 0
- Iteratively add the frequency
v
of the most common element tom
. Incrementans
by 1 to reflect that an element is added to the set.
m += v ans += 1
- Check if the cumulative frequency
m
is at least half the length of the original array. If so, break out of the loop since the requirement of removing at least half of the array elements is met.
if m * 2 >= len(arr): break
- Initialize two variables,
-
Return the Result:
- After the loop terminates, return the value of
ans
, which represents the minimum size of the set to remove at least half of the array elements.
return ans
- After the loop terminates, return the value of
This approach is efficient as it targets the frequency of elements which is a key insight for achieving the task with a minimal set size. The heavy-lifting operations are counting and sorting which are both done optimally using Python's built-in data structures and algorithms.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's use a small example to illustrate the solution approach. Suppose we have an array arr = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
.
-
Hash Map (Counter):
- First, we'll count the occurrences of each integer:
from collections import Counter cnt = Counter(arr) # cnt = Counter({4: 4, 3: 3, 2: 2, 1: 1})
In
cnt
, the key is the integer and the value is its frequency in the array. -
Sorting:
- Next, we sort these by frequency using
most_common()
:
# This will sort the integers by frequency: [4, 3, 2, 1] sorted_integers_by_freq = [item for item, count in cnt.most_common()]
- Next, we sort these by frequency using
-
Iterative Accumulation:
- Initialize the variables
ans
andm
:
ans = m = 0
- Remove integers one by one, starting from the most frequent. In our list,
4
occurs the most, so we start with that:
m += cnt[4] # m = 4 (since 4 appears 4 times) ans += 1 # ans = 1 (we have added one integer to our set)
- Now we move to the next most frequent integer,
3
:
m += cnt[3] # m = 7 (4 from '4's and 3 from '3's) ans += 1 # ans = 2 (we have added two integers to our set)
- Initialize the variables
-
Checking the Cumulative Frequency:
- We need to check if we have removed at least half of the elements in the array. The original array has 10 elements, so we need to remove at least 5. With
m = 7
, we have surpassed this:
if m * 2 >= len(arr): # 7*2 >= 10 which is True # Break the loop as we have reached our requirement.
- We need to check if we have removed at least half of the elements in the array. The original array has 10 elements, so we need to remove at least 5. With
-
Return the Result:
- We've found our answer, which is 2, because we were able to remove at least half of the elements by removing numbers
4
and3
.
return ans # return 2
- We've found our answer, which is 2, because we were able to remove at least half of the elements by removing numbers
In this example, the subsets {4, 3}
contain the minimum number of distinct integers (2 integers) that, once removed, reduce the original array size at least by half. Following this procedure guarantees that we achieve the goal with the smallest possible set, efficiently utilizing the insights about frequency of elements.
Solution Implementation
1from typing import List
2from collections import Counter
3
4class Solution:
5 def minSetSize(self, arr: List[int]) -> int:
6 # Count the frequency of each element in the array
7 frequency_counter = Counter(arr)
8
9 # Initialize variables for the minimum set size and the count of removed elements
10 min_set_size = 0
11 removed_count = 0
12
13 # Iterate over the elements in the frequency counter, starting with the most common
14 for _, frequency in frequency_counter.most_common():
15 # Increment the count of removed elements by the frequency of the current element
16 removed_count += frequency
17
18 # Increment the set size by 1 (since we're adding one more element to the removal set)
19 min_set_size += 1
20
21 # Check if we have removed at least half of the elements from the original array
22 if removed_count * 2 >= len(arr):
23 # If true, we break the loop as we have reached the required set size
24 break
25
26 # Return the minimum set size needed to remove at least half of the elements
27 return min_set_size
28
29# Example usage:
30# solution = Solution()
31# result = solution.minSetSize([3,3,3,3,5,5,5,2,2,7])
32# print(result) # Output should be 2, as removing 3 and 5 removes 6 elements, > half of the array size.
33
1class Solution {
2 public int minSetSize(int[] arr) {
3 // Find the maximum value in arr to determine the range of counts.
4 int maxValue = 0;
5 for (int num : arr) {
6 maxValue = Math.max(maxValue, num);
7 }
8
9 // Create and populate a frequency array where index represents the number from arr
10 // and the value at that index represents the count of that number in arr.
11 int[] frequency = new int[maxValue + 1];
12 for (int num : arr) {
13 ++frequency[num];
14 }
15
16 // Sort the frequency array to have the most frequent numbers at the end.
17 Arrays.sort(frequency);
18
19 // Initialize variables to track the number of elements chosen and their cumulative count.
20 int setSize = 0;
21 int removedElementsCount = 0;
22
23 // Iterate from the end of the frequency array towards the beginning,
24 // adding the count of each number to our cumulative count and incrementing setSize.
25 // Stop once we've removed enough elements to reduce arr to half its size.
26 for (int i = maxValue; i >= 0; --i) {
27 if (frequency[i] > 0) {
28 removedElementsCount += frequency[i];
29 ++setSize;
30 // If the removed elements count is at least half the size of arr, return setSize.
31 if (removedElementsCount * 2 >= arr.length) {
32 return setSize;
33 }
34 }
35 }
36
37 // The loop should always return before reaching this point;
38 // hence we don't need a return statement here, but the function requires a return type.
39 // So ideally we could throw an exception or ensure the loop condition always holds true.
40 // However, as per the constraints and correct functionality, this line will never be reached.
41 return -1; // Placeholder return statement (should not be reached).
42 }
43}
44
1#include <vector>
2#include <algorithm>
3#include <numeric>
4
5class Solution {
6public:
7 int minSetSize(vector<int>& arr) {
8 // Find the maximum value in the array to define the range of counts
9 int maxVal = *max_element(arr.begin(), arr.end());
10
11 // Create a frequency array to count occurrences of each element
12 // Initialize it with zero using `vector` instead of plain array
13 vector<int> frequency(maxVal + 1, 0);
14
15 // Count the frequency of each element in the array
16 for (int num : arr) {
17 ++frequency[num];
18 }
19
20 // Sort the frequency array in descending order to prioritize removing frequent elements
21 sort(frequency.begin(), frequency.end(), greater<int>());
22
23 // Initialize the result variable and a counter for removed elements
24 int result = 0;
25 int removedCount = 0;
26
27 // Iterate over the sorted frequencies
28 for (int freq : frequency) {
29 // Only consider non-zero frequencies
30 if (freq > 0) {
31 // Increase the counter by the current frequency
32 removedCount += freq;
33 // Increment the result for each selected element
34 ++result;
35
36 // If the removed elements count is at least half of the total array size, stop
37 if (removedCount * 2 >= arr.size()) {
38 break;
39 }
40 }
41 }
42
43 // Return the minimum size of the set of elements that can be removed
44 return result;
45 }
46};
47
1function minSetSize(arr: number[]): number {
2 // Map to count the frequency of each element in the array
3 const frequencyCounter = new Map<number, number>();
4
5 // Iterate through the array and populate the frequencyCounter
6 for (const value of arr) {
7 frequencyCounter.set(value, (frequencyCounter.get(value) ?? 0) + 1);
8 }
9
10 // Convert map values into an array and sort it in descending order
11 const frequencies = Array.from(frequencyCounter.values());
12 frequencies.sort((a, b) => b - a); // Sort by descending frequency
13
14 let setSize = 0; // Initialize the minimum set size required
15 let elementsCount = 0; // To track the number of elements included
16
17 // Iterate through the sorted frequencies
18 for (const count of frequencies) {
19 elementsCount += count; // Add the frequency count to our total
20 setSize++; // Increment the count of the set size
21
22 // Check if we've reached or exceeded half of the array length
23 if (elementsCount * 2 >= arr.length) {
24 break; // We've reached the minimum set size
25 }
26 }
27
28 return setSize; // Return the minimum set size
29}
30
Time and Space Complexity
Time Complexity
The time complexity of the provided code can be broken down into the following operations:
-
Creating the Counter: The
Counter(arr)
operation has a time complexity ofO(N)
whereN
is the length of the array. It goes through each element exactly once. -
Sorting: The
most_common()
method on the counter returns elements in decreasing frequency, which implies an internal sort. The complexity of this sorting isO(K log K)
whereK
is the number of unique elements in the array. -
Iterating through sorted frequencies: In the worst case, this can iterate up to
K
times. However, since it stops once the removed elements count is at least half of the array size, and considering the frequency of the elements, in practice, it might be less thanK
. Nevertheless, we consider the worst case, so this step has a complexity ofO(K)
.
When combined, the time complexity is dominated by the time complexity of the sorting step, resulting in a total time complexity of O(N) + O(K log K) + O(K)
, which simplifies to O(N + K log K)
since K ≤ N
.
Space Complexity
The space complexity can be analyzed as follows:
-
Counter: Storing the frequency of each element in the array requires
O(K)
space, whereK
is the number of unique elements. -
Most Common List: The list of tuples returned by
most_common()
also takes upO(K)
space.
Combined, the overall space complexity of the algorithm is O(K)
, but since K
can be at most N
(when all elements are unique), the space complexity simplifies to O(N)
.
Learn more about how to find time and space complexity quickly using problem constraints.
Problem: Given a list of tasks and a list of requirements, compute a sequence of tasks that can be performed, such that we complete every task once while satisfying all the requirements.
Which of the following method should we use to solve this problem?
Recommended Readings
Greedy Introduction div class responsive iframe iframe src https www youtube com embed WTslqPbj7I title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture web share allowfullscreen iframe div When do we use greedy Greedy algorithms tend to solve optimization problems Typically they will ask you to calculate the max min of some value Commonly you may see this phrased in the problem as max min longest shortest largest smallest etc These keywords can be identified by just scanning
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
https algomonster s3 us east 2 amazonaws com cover_photos heap svg Priority Queue and Heap What is the relationship between priority queue and heap Priority Queue is an Abstract Data Type and Heap is the concrete data structure we use to implement a priority queue Priority Queue A priority queue
Want a Structured Path to Master System Design Too? Don’t Miss This!