347. Top K Frequent Elements
Problem Description
The LeetCode problem provides us with an integer array nums
and an integer k
. Our task is to find the k
most frequent elements in the array. The "frequency" of an element is the number of times it occurs in the array. The problem specifies that we can return the result in any order, which means the sequence of the results does not matter.
Intuition
To solve this problem, we need to count the occurrences of each element and then find the k
elements with the highest frequencies. The natural approach is to use a hash map (or dictionary in Python) to achieve the frequency count efficiently.
Once we have the frequency of each element, we want to retrieve the k
elements with the highest frequency. A common data structure to maintain the k
largest or smallest elements is a heap. In Python, we use a min-heap by default, which ensures that the smallest element is always at the top.
The intuition behind the solution is:
- Count the frequency of each element using a hash map.
- Iterate over the frequency map, adding each element along with its frequency as a tuple to a min-heap.
- If the heap exceeds size
k
, we remove the smallest item, which is automatically done because of the heap's properties. This ensures we only keep thek
most frequent elements in the heap. - After processing all elements, we're left with a heap containing
k
elements with the highest frequency. - We convert this heap into a list containing just the elements (not the frequencies) to return as our final answer.
This approach is highly efficient as it allows us to keep only the k
most frequent elements at all times without having to sort the entire frequency map, which could be much larger than k
.
Learn more about Divide and Conquer, Sorting and Heap (Priority Queue) patterns.
Solution Approach
The implementation of the solution uses Python's Counter
class from the collections
module to calculate the frequency of each element in the nums
array. Counter
is essentially a hash map or a dictionary that maps each element to its frequency.
Here's a step-by-step walkthrough of the implementation:
-
First, we use
Counter(nums)
to create a frequency map that holds the count of each number in thenums
array. -
Next, we initialize an empty min-heap
hp
as a list to store tuples of the form(frequency, num)
, wherefrequency
is the frequency of the numbernum
in the array. -
We iterate over each item in the frequency map and add a tuple
(freq, num)
to the heap using theheappush
function. -
While we add elements to the heap, we maintain the size of the heap to not exceed
k
. If adding an element causes the heap size to become greater thank
, we pop the smallest item from the heap usingheappop
. This is done to keep only thek
most frequent elements in the heap. -
After we finish processing all elements, the heap contains
k
tuples representing thek
most frequent elements. The least frequent element is on the top of the min-heap, while the k-th most frequent element is the last one in the heap's binary tree representation. -
Finally, we build the result list by extracting the
num
from each tuple(freq, num)
in the heap using a list comprehension:[v[1] for v in hp]
.
The Counter
efficiently calculates the frequencies of each element in O(n)
time complexity, where n
is the length of the input array. The heap operations (insertion and removal) work in O(log k)
time, and since we perform these operations at most n
times, the total time complexity of the heap operations is O(n log k)
. Thus, the overall time complexity of the solution is O(n log k)
, with O(n)
coming from the frequency map creation and O(n log k)
from the heap operations. The space complexity of the solution is O(n)
to store the frequency map and the heap.
This efficient implementation ensures we're not doing unnecessary work by keeping only the top k
frequencies in the heap, and it avoids having to sort large sets of data.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's use a small example to illustrate the solution approach. Consider the array nums = [1,2,3,2,1,2]
and k = 2
. Our goal is to find the 2 most frequent elements in nums
.
-
We first use
Counter(nums)
to create a frequency map. This gives us{1: 2, 2: 3, 3: 1}
where the key is the number fromnums
and the value is its frequency. -
We initialize an empty min-heap
hp
. It's going to store tuples like(frequency, num)
. -
We iterate over the frequency map and add each
num
and its frequency tohp
. For example,(2, 1)
for the number1
with a frequency of2
. We useheappush
to add the tuples tohp
, so after this stephp
might have[(1, 3), (2, 1)]
. -
The heap should not exceed the size
k
. In our case,k
is2
, which means after we add the third element(3, 2)
, we need to pop the smallest frequency. So we end up withhp
as[(2, 1), (3, 2)]
after all the operations since(1, 3)
would be the popped element because it had the lowest frequency. -
The heap now contains the tuples for the 2 most frequent elements. The tuple with the smallest frequency is at the top, ensuring that less frequent elements have been popped off when the size limit was exceeded.
-
Finally, to build our result list, we extract the number from each tuple in the heap. Using list comprehension
[v[1] for v in hp]
we get[1, 2]
, which are the elements with the highest frequency. This is our final result and we can return it.
Following this approach, we implemented an efficient solution to the problem that avoids sorting the entire frequency map directly and instead maintains a heap of size k
to track the k most frequent elements.
Solution Implementation
1from collections import Counter
2from heapq import heappush, heappop
3
4class Solution:
5 def topKFrequent(self, nums, k):
6 # Count the frequency of each number in nums using Counter.
7 num_frequencies = Counter(nums)
8
9 # Initialize a min heap to keep track of top k elements.
10 min_heap = []
11
12 # Iterate over the number-frequency pairs.
13 for num, freq in num_frequencies.items():
14 # Push a tuple of (frequency, number) onto the heap.
15 # Python's heapq module creates a min-heap by default.
16 heappush(min_heap, (freq, num))
17
18 # If the heap size exceeds k, remove the smallest frequency element.
19 if len(min_heap) > k:
20 heappop(min_heap)
21
22 # Extract the top k frequent numbers by taking the second element of each tuple.
23 top_k_frequent = [pair[1] for pair in min_heap]
24
25 return top_k_frequent
26
1import java.util.*;
2import java.util.function.Function;
3import java.util.stream.Collectors;
4
5class Solution {
6 public int[] topKFrequent(int[] nums, int k) {
7 // Create a Map to store the frequency of each number
8 Map<Integer, Long> frequencyMap = Arrays.stream(nums)
9 .boxed() // box the ints to Integers
10 .collect(Collectors.groupingBy(Function.identity(), // group by the number itself
11 Collectors.counting())); // count the frequency
12
13 // Initialize a min-heap based on the frequency values
14 Queue<Map.Entry<Integer, Long>> minHeap = new PriorityQueue<>(Comparator.comparingLong(Map.Entry::getValue));
15
16 // Iterate over the frequency map
17 for (Map.Entry<Integer, Long> entry : frequencyMap.entrySet()) {
18 // Insert the current entry into the min-heap
19 minHeap.offer(entry);
20
21 // If the heap size exceeds 'k', remove the smallest frequency element
22 if (minHeap.size() > k) {
23 minHeap.poll();
24 }
25 }
26
27 // Extract the top 'k' frequent numbers from the min-heap into an array
28 return minHeap.stream()
29 .mapToInt(Map.Entry::getKey)
30 .toArray();
31 }
32}
33
1#include <vector>
2#include <queue>
3#include <unordered_map>
4using namespace std;
5
6// Definition for a pair of integers
7using IntPair = pair<int, int>;
8
9class Solution {
10public:
11 // Function that returns the k most frequent elements from 'nums'.
12 vector<int> topKFrequent(vector<int>& nums, int k) {
13 // HashMap to store the frequency of each number in 'nums'.
14 unordered_map<int, int> frequencyMap;
15 // Increment the frequency count for each number in 'nums'.
16 for (int value : nums) {
17 frequencyMap[value]++;
18 }
19
20 // Min-heap to keep track of the top k frequent numbers.
21 // It stores pairs of (frequency, number) and orders by smallest frequency first.
22 priority_queue<IntPair, vector<IntPair>, greater<IntPair>> minHeap;
23
24 // Iterate over the frequency map.
25 for (const auto& element : frequencyMap) {
26 int number = element.first;
27 int frequency = element.second;
28 // Push the current number and its frequency to the min-heap.
29 minHeap.push({frequency, number});
30
31 // If the heap size exceeds k, remove the least frequent element.
32 // This ensures that the heap always contains the top k frequent elements.
33 if (minHeap.size() > k) {
34 minHeap.pop();
35 }
36 }
37
38 // Prepare a vector to store the result.
39 vector<int> topKFrequentElements(k);
40 // Retrieve the most frequent elements from the min-heap.
41 for (int i = 0; i < k; ++i) {
42 topKFrequentElements[i] = minHeap.top().second; // Store the number, not the frequency.
43 minHeap.pop();
44 }
45
46 // Return the top k frequent elements in ascending frequency order.
47 return topKFrequentElements;
48 }
49};
50
1function topKFrequent(nums: number[], k: number): number[] {
2 // Initialize a Map to hold the frequency of each number
3 const frequencyMap = new Map<number, number>();
4
5 // Iterate over the array of numbers and populate the frequencyMap
6 for (const num of nums) {
7 frequencyMap.set(num, (frequencyMap.get(num) || 0) + 1);
8 }
9
10 // Convert the Map into an array of key-value pairs
11 const frequencyArray = Array.from(frequencyMap);
12
13 // Sort the array based on frequency in descending order
14 frequencyArray.sort((a, b) => b[1] - a[1]);
15
16 // Initialize an array to hold the top k frequent elements
17 const topKElements: number[] = [];
18
19 // Iterate k times and push the most frequent elements onto the topKElements array
20 for (let i = 0; i < k; i++) {
21 topKElements.push(frequencyArray[i][0]);
22 }
23
24 // Return the top k frequent elements
25 return topKElements;
26}
27
Time and Space Complexity
Time Complexity
The time complexity of the function is determined by several factors:
-
Counting Elements: The function begins with
cnt = Counter(nums)
which counts the frequency of each element in thenums
array. Constructing this frequency counter takesO(N)
time, whereN
is the number of elements innums
. -
Heap Operations: The function then involves a for loop, iterating over the frequency counter's items, and performing heap operations. For each unique element (up to
N
unique elements), a heap push is performed, which has a time complexity ofO(log K)
, as the size of the heap is maintained atk
. In the worst case, there areN
heap push and pop operations, each takingO(log K)
time. Therefore, the complexity due to heap operations isO(N * log K)
.
The resulting overall time complexity is O(N + N * log K)
. However, since N * log K
is the dominant term, we consider the overall time complexity to be O(N * log K)
.
Space Complexity
The space complexity of the function is the additional space required by the data structures used:
-
Frequency Counter: The
Counter
will at most storeN
key-value pairs if all elements innums
are unique, which takesO(N)
space. -
Heap: The heap size is maintained at
k
, so the space required for the heap isO(k)
.
Therefore, the overall space complexity is O(N + k)
. However, in most scenarios, we expect k
to be much smaller than N
, therefore the space complexity is often expressed as O(N)
.
Learn more about how to find time and space complexity quickly using problem constraints.
Which of the following is a min heap?
Recommended Readings
https algomonster s3 us east 2 amazonaws com cover_photos dnd svg Divide and Conquer The main idea of divide and conquer is to split the main problem into two smaller components assuming that each one of the components had already been solved recursively and then try to solve the bigger
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
https algomonster s3 us east 2 amazonaws com cover_photos heap svg Priority Queue and Heap What is the relationship between priority queue and heap Priority Queue is an Abstract Data Type and Heap is the concrete data structure we use to implement a priority queue Priority Queue A priority queue
Want a Structured Path to Master System Design Too? Don’t Miss This!
I was a bit confused on why we have to store the values in the heap as a tuple in the (frequency, number) format instead of (number, frequency), because the (num, freq) felt more natural to me. I asked chatGPT and it said that when using a min-heap and tuples, it will sort the tuple based on the first value in the tuple. Therefore its critical for the format of the values in the heap to be stored as (freq, num).
Here's an excerpt from the answer it gave me: