347. Top K Frequent Elements
Problem Description
You are given an integer array nums
and an integer k
. Your task is to find and return the k
most frequent elements from the array.
The problem asks you to identify which elements appear most often in the array and return the top k
elements based on their frequency. The elements in your answer can be returned in any order.
For example:
- If
nums = [1,1,1,2,2,3]
andk = 2
, the element1
appears 3 times,2
appears 2 times, and3
appears 1 time. So the two most frequent elements are[1,2]
. - If
nums = [1]
andk = 1
, the only element1
appears once, so the answer is[1]
.
The key points to understand:
- You need to count how many times each unique element appears in the array
- You need to identify the
k
elements with the highest frequencies - The order of elements in your output doesn't matter
- You're guaranteed that the answer is unique (there won't be ties where multiple valid answers exist)
Intuition
To find the most frequent elements, we need to know how often each element appears. This naturally leads us to think about counting frequencies first. A hash table (or dictionary) is perfect for this - we can traverse the array once and count occurrences of each element.
Once we have the frequency counts, we need to identify the top k
elements. This is essentially a "top k" problem, which has several classic approaches:
-
We could sort all elements by their frequencies and take the first
k
elements. This works but might be inefficient if we're sorting all elements when we only needk
of them. -
We could use a heap (priority queue) to maintain only the top
k
elements. A min heap of sizek
is particularly clever here - we keep adding elements, and whenever the heap size exceedsk
, we remove the element with the smallest frequency. This way, the heap always contains thek
most frequent elements. -
Since Python's
Counter
class has a built-inmost_common(k)
method that efficiently returns thek
most frequent elements, we can leverage this for a clean solution.
The heap approach is especially efficient when k
is much smaller than the number of unique elements, as we only maintain a heap of size k
rather than sorting all elements. The time complexity becomes O(n log k)
instead of O(n log n)
for sorting.
The key insight is breaking the problem into two steps: first count frequencies, then select the top k
. This separation makes the problem much more manageable and allows us to use appropriate data structures for each step.
Solution Approach
The solution uses a hash table combined with a min heap to efficiently find the top k
frequent elements.
Step 1: Count Frequencies
First, we traverse the array and use a hash table cnt
to count the occurrence of each element. In Python, we can use Counter(nums)
which automatically creates a dictionary where keys are the unique elements and values are their frequencies.
Step 2: Build and Maintain a Min Heap
We iterate through the hash table and use a min heap to keep track of the top k
frequent elements. The heap stores pairs of (frequency, element)
.
The process works as follows:
- For each element and its count in the hash table, we push the pair
(count, element)
into the min heap - If the heap size exceeds
k
, we pop the smallest element (the one with minimum frequency) - By maintaining the heap size at most
k
, we ensure that only thek
most frequent elements remain
Step 3: Extract Results
After processing all elements, the heap contains exactly the k
most frequent elements. We pop all elements from the heap and add them to our result array.
Implementation Details:
- The min heap property ensures that the element with the smallest frequency is always at the root
- When we push a new element and the heap size exceeds
k
, popping removes the element with the smallest frequency among the currentk+1
elements - This guarantees that we keep the
k
elements with the highest frequencies
Time Complexity: O(n log k)
where n
is the length of the input array. We traverse the array once (O(n)
) and perform heap operations for each unique element, where each operation takes O(log k)
.
Space Complexity: O(n)
for the hash table to store frequencies, plus O(k)
for the heap.
The provided solution code uses Python's Counter.most_common(k)
method, which internally implements a similar efficient approach to find the k
most frequent elements, returning them as a list of (element, count)
tuples. The list comprehension [x for x, _ in cnt.most_common(k)]
extracts just the elements, discarding the counts.
Ready to land your dream job?
Unlock your dream job with a 3-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with nums = [1,1,1,2,2,3]
and k = 2
.
Step 1: Count Frequencies We traverse the array and count each element's occurrences:
- Element
1
appears 3 times - Element
2
appears 2 times - Element
3
appears 1 time
Our frequency map becomes: {1: 3, 2: 2, 3: 1}
Step 2: Build and Maintain Min Heap (size k=2)
We process each element-frequency pair:
-
Push
(3, 1)
to heap- Heap:
[(3, 1)]
- Size = 1 ≤ k, keep it
- Heap:
-
Push
(2, 2)
to heap- Heap:
[(2, 2), (3, 1)]
(min heap orders by frequency) - Size = 2 ≤ k, keep both
- Heap:
-
Push
(1, 3)
to heap- Heap temporarily:
[(1, 3), (3, 1), (2, 2)]
- Size = 3 > k, so pop minimum frequency
- Pop
(1, 3)
- Heap:
[(2, 2), (3, 1)]
- Heap temporarily:
Step 3: Extract Results The heap contains our k=2 most frequent elements:
(3, 1)
- element 1 with frequency 3(2, 2)
- element 2 with frequency 2
Extracting just the elements gives us: [1, 2]
The key insight is that by maintaining a min heap of size k, we automatically filter out elements with lower frequencies. When element 3 (frequency 1) tried to enter, it was immediately removed because we already had k=2 elements with higher frequencies.
Solution Implementation
1from typing import List
2from collections import Counter
3
4class Solution:
5 def topKFrequent(self, nums: List[int], k: int) -> List[int]:
6 # Count the frequency of each number in the array
7 frequency_counter = Counter(nums)
8
9 # Get the k most common elements
10 # most_common(k) returns a list of (element, count) tuples
11 most_common_elements = frequency_counter.most_common(k)
12
13 # Extract only the elements (not their counts) from the tuples
14 result = [element for element, count in most_common_elements]
15
16 return result
17
1class Solution {
2 public int[] topKFrequent(int[] nums, int k) {
3 // Create a frequency map to count occurrences of each number
4 Map<Integer, Integer> frequencyMap = new HashMap<>();
5
6 // Count the frequency of each number in the array
7 for (int num : nums) {
8 frequencyMap.merge(num, 1, Integer::sum);
9 }
10
11 // Create a min heap that sorts entries by their frequency (value)
12 // Using a min heap of size k ensures we keep the k most frequent elements
13 PriorityQueue<Map.Entry<Integer, Integer>> minHeap =
14 new PriorityQueue<>(Comparator.comparingInt(Map.Entry::getValue));
15
16 // Process each entry from the frequency map
17 for (Map.Entry<Integer, Integer> entry : frequencyMap.entrySet()) {
18 // Add the current entry to the heap
19 minHeap.offer(entry);
20
21 // If heap size exceeds k, remove the element with minimum frequency
22 // This ensures we always keep the k most frequent elements
23 if (minHeap.size() > k) {
24 minHeap.poll();
25 }
26 }
27
28 // Convert the heap entries to an array of keys (the actual numbers)
29 // The heap now contains exactly k entries with the highest frequencies
30 return minHeap.stream()
31 .mapToInt(Map.Entry::getKey)
32 .toArray();
33 }
34}
35
1class Solution {
2public:
3 vector<int> topKFrequent(vector<int>& nums, int k) {
4 // Step 1: Count frequency of each number
5 unordered_map<int, int> frequencyMap;
6 for (int num : nums) {
7 frequencyMap[num]++;
8 }
9
10 // Step 2: Use min heap to maintain top k frequent elements
11 // Pair: {frequency, number}
12 using FreqNumPair = pair<int, int>;
13 priority_queue<FreqNumPair, vector<FreqNumPair>, greater<FreqNumPair>> minHeap;
14
15 // Step 3: Build heap with size constraint k
16 for (const auto& [number, frequency] : frequencyMap) {
17 minHeap.push({frequency, number});
18
19 // Keep only k most frequent elements in the heap
20 if (minHeap.size() > k) {
21 minHeap.pop(); // Remove element with smallest frequency
22 }
23 }
24
25 // Step 4: Extract all elements from heap to build result
26 vector<int> result;
27 while (!minHeap.empty()) {
28 result.push_back(minHeap.top().second);
29 minHeap.pop();
30 }
31
32 return result;
33 }
34};
35
1/**
2 * Finds the k most frequent elements in an array
3 * @param nums - Input array of numbers
4 * @param k - Number of most frequent elements to return
5 * @returns Array containing the k most frequent elements
6 */
7function topKFrequent(nums: number[], k: number): number[] {
8 // Create a frequency map to count occurrences of each number
9 const frequencyMap = new Map<number, number>();
10
11 // Count the frequency of each number in the array
12 for (const num of nums) {
13 frequencyMap.set(num, (frequencyMap.get(num) ?? 0) + 1);
14 }
15
16 // Use a min heap to maintain the k most frequent elements
17 // The heap is ordered by frequency (priority)
18 const minHeap = new MinPriorityQueue();
19
20 // Iterate through each unique number and its frequency
21 for (const [num, frequency] of frequencyMap) {
22 // Add the number to the heap with its frequency as priority
23 minHeap.enqueue(num, frequency);
24
25 // If heap size exceeds k, remove the element with minimum frequency
26 if (minHeap.size() > k) {
27 minHeap.dequeue();
28 }
29 }
30
31 // Extract all elements from the heap and return as an array
32 return minHeap.toArray().map(item => item.element);
33}
34
Time and Space Complexity
The time complexity of this code is O(n + m log m)
where n
is the length of the input array and m
is the number of unique elements. In the worst case where all elements are unique, m = n
, making it O(n log n)
. The space complexity is O(m)
for storing the counter dictionary.
However, the reference answer suggests O(n log k)
time complexity, which would be achievable if we used a min-heap of size k
instead of the most_common(k)
method. The Counter
constructor takes O(n)
time to count frequencies. The most_common(k)
method internally sorts all unique elements by frequency, which takes O(m log m)
time, then returns the top k
elements.
The reference answer's space complexity of O(k)
would also apply to a heap-based solution that maintains only k
elements. The current implementation uses O(m)
space for the Counter object which stores all unique elements and their frequencies, not just the top k
.
Common Pitfalls
1. Assuming the Input Array is Sorted or Elements are Unique
A common mistake is assuming that the input array has some special properties. The array can contain duplicates (that's the whole point - counting frequencies), and elements can appear in any order. Don't try to optimize based on assumptions about the input structure.
2. Using Sorting on the Entire Frequency Map
A pitfall is sorting all elements by frequency and then taking the first k
elements:
# Inefficient approach - sorts ALL unique elements
frequency_counter = Counter(nums)
sorted_items = sorted(frequency_counter.items(), key=lambda x: x[1], reverse=True)
return [item[0] for item in sorted_items[:k]]
This has O(n log n) time complexity where n is the number of unique elements. The heap-based approach or most_common(k)
is more efficient at O(n log k).
3. Incorrect Heap Implementation
When manually implementing with a heap, a common error is using a max heap instead of a min heap, or forgetting to limit the heap size:
import heapq # Wrong: This keeps ALL elements in the heap heap = [] for num, freq in frequency_counter.items(): heapq.heappush(heap, (freq, num)) # Missing the size limit check
Correct implementation:
import heapq
heap = []
for num, freq in frequency_counter.items():
heapq.heappush(heap, (freq, num))
if len(heap) > k:
heapq.heappop(heap) # Maintain heap size at most k
4. Returning Frequencies Instead of Elements
When using Counter.most_common(k)
, it returns tuples of (element, count)
. A mistake is returning the entire tuples or just the counts:
# Wrong: Returns tuples return frequency_counter.most_common(k) # Returns [(1, 3), (2, 2)] # Wrong: Returns counts instead of elements return [count for _, count in frequency_counter.most_common(k)] # Returns [3, 2]
5. Not Handling Edge Cases
While the problem guarantees valid input, in real scenarios you should consider:
- What if
k
equals the length of the array? - What if
k
equals the number of unique elements? - What if the array is empty? (though the problem constraints prevent this)
The Counter.most_common(k)
method handles these cases gracefully, but manual implementations might fail if not careful.
In a binary min heap, the maximum element can be found in:
Recommended Readings
https assets algo monster cover_photos dnd svg Divide and Conquer The main idea of divide and conquer is to split the main problem into two smaller components assuming that each one of the components had already been solved recursively and then try to solve the bigger problem using the solutions
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
https assets algo monster cover_photos heap svg Priority Queue and Heap What is the relationship between priority queue and heap Priority Queue is an Abstract Data Type and Heap is the concrete data structure we use to implement a priority queue Priority Queue A priority queue is a data structure
Want a Structured Path to Master System Design Too? Don’t Miss This!