347. Top K Frequent Elements

MediumArrayHash TableDivide and ConquerBucket SortCountingQuickselectSortingHeap (Priority Queue)
Leetcode Link

Problem Description

The LeetCode problem provides us with an integer array nums and an integer k. Our task is to find the k most frequent elements in the array. The "frequency" of an element is the number of times it occurs in the array. The problem specifies that we can return the result in any order, which means the sequence of the results does not matter.

Intuition

To solve this problem, we need to count the occurrences of each element and then find the k elements with the highest frequencies. The natural approach is to use a hash map (or dictionary in Python) to achieve the frequency count efficiently.

Once we have the frequency of each element, we want to retrieve the k elements with the highest frequency. A common data structure to maintain the k largest or smallest elements is a heap. In Python, we use a min-heap by default, which ensures that the smallest element is always at the top.

The intuition behind the solution is:

  1. Count the frequency of each element using a hash map.
  2. Iterate over the frequency map, adding each element along with its frequency as a tuple to a min-heap.
  3. If the heap exceeds size k, we remove the smallest item, which is automatically done because of the heap's properties. This ensures we only keep the k most frequent elements in the heap.
  4. After processing all elements, we're left with a heap containing k elements with the highest frequency.
  5. We convert this heap into a list containing just the elements (not the frequencies) to return as our final answer.

This approach is highly efficient as it allows us to keep only the k most frequent elements at all times without having to sort the entire frequency map, which could be much larger than k.

Learn more about Divide and Conquer, Sorting and Heap (Priority Queue) patterns.

Solution Approach

The implementation of the solution uses Python's Counter class from the collections module to calculate the frequency of each element in the nums array. Counter is essentially a hash map or a dictionary that maps each element to its frequency.

Here's a step-by-step walkthrough of the implementation:

  1. First, we use Counter(nums) to create a frequency map that holds the count of each number in the nums array.

  2. Next, we initialize an empty min-heap hp as a list to store tuples of the form (frequency, num), where frequency is the frequency of the number num in the array.

  3. We iterate over each item in the frequency map and add a tuple (freq, num) to the heap using the heappush function.

  4. While we add elements to the heap, we maintain the size of the heap to not exceed k. If adding an element causes the heap size to become greater than k, we pop the smallest item from the heap using heappop. This is done to keep only the k most frequent elements in the heap.

  5. After we finish processing all elements, the heap contains k tuples representing the k most frequent elements. The least frequent element is on the top of the min-heap, while the k-th most frequent element is the last one in the heap's binary tree representation.

  6. Finally, we build the result list by extracting the num from each tuple (freq, num) in the heap using a list comprehension: [v[1] for v in hp].

The Counter efficiently calculates the frequencies of each element in O(n) time complexity, where n is the length of the input array. The heap operations (insertion and removal) work in O(log k) time, and since we perform these operations at most n times, the total time complexity of the heap operations is O(n log k). Thus, the overall time complexity of the solution is O(n log k), with O(n) coming from the frequency map creation and O(n log k) from the heap operations. The space complexity of the solution is O(n) to store the frequency map and the heap.

This efficient implementation ensures we're not doing unnecessary work by keeping only the top k frequencies in the heap, and it avoids having to sort large sets of data.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's use a small example to illustrate the solution approach. Consider the array nums = [1,2,3,2,1,2] and k = 2. Our goal is to find the 2 most frequent elements in nums.

  1. We first use Counter(nums) to create a frequency map. This gives us {1: 2, 2: 3, 3: 1} where the key is the number from nums and the value is its frequency.

  2. We initialize an empty min-heap hp. It's going to store tuples like (frequency, num).

  3. We iterate over the frequency map and add each num and its frequency to hp. For example, (2, 1) for the number 1 with a frequency of 2. We use heappush to add the tuples to hp, so after this step hp might have [(1, 3), (2, 1)].

  4. The heap should not exceed the size k. In our case, k is 2, which means after we add the third element (3, 2), we need to pop the smallest frequency. So we end up with hp as [(2, 1), (3, 2)] after all the operations since (1, 3) would be the popped element because it had the lowest frequency.

  5. The heap now contains the tuples for the 2 most frequent elements. The tuple with the smallest frequency is at the top, ensuring that less frequent elements have been popped off when the size limit was exceeded.

  6. Finally, to build our result list, we extract the number from each tuple in the heap. Using list comprehension [v[1] for v in hp] we get [1, 2], which are the elements with the highest frequency. This is our final result and we can return it.

Following this approach, we implemented an efficient solution to the problem that avoids sorting the entire frequency map directly and instead maintains a heap of size k to track the k most frequent elements.

Solution Implementation

1from collections import Counter
2from heapq import heappush, heappop
3
4class Solution:
5    def topKFrequent(self, nums, k):
6        # Count the frequency of each number in nums using Counter.
7        num_frequencies = Counter(nums)
8      
9        # Initialize a min heap to keep track of top k elements.
10        min_heap = []
11      
12        # Iterate over the number-frequency pairs.
13        for num, freq in num_frequencies.items():
14            # Push a tuple of (frequency, number) onto the heap.
15            # Python's heapq module creates a min-heap by default.
16            heappush(min_heap, (freq, num))
17          
18            # If the heap size exceeds k, remove the smallest frequency element.
19            if len(min_heap) > k:
20                heappop(min_heap)
21      
22        # Extract the top k frequent numbers by taking the second element of each tuple.
23        top_k_frequent = [pair[1] for pair in min_heap]
24      
25        return top_k_frequent
26
1import java.util.*;
2import java.util.function.Function;
3import java.util.stream.Collectors;
4
5class Solution {
6    public int[] topKFrequent(int[] nums, int k) {
7        // Create a Map to store the frequency of each number
8        Map<Integer, Long> frequencyMap = Arrays.stream(nums)
9                                                .boxed() // box the ints to Integers
10                                                .collect(Collectors.groupingBy(Function.identity(), // group by the number itself
11                                                    Collectors.counting())); // count the frequency
12      
13        // Initialize a min-heap based on the frequency values
14        Queue<Map.Entry<Integer, Long>> minHeap = new PriorityQueue<>(Comparator.comparingLong(Map.Entry::getValue));
15      
16        // Iterate over the frequency map
17        for (Map.Entry<Integer, Long> entry : frequencyMap.entrySet()) {
18            // Insert the current entry into the min-heap
19            minHeap.offer(entry);
20          
21            // If the heap size exceeds 'k', remove the smallest frequency element
22            if (minHeap.size() > k) {
23                minHeap.poll();
24            }
25        }
26      
27        // Extract the top 'k' frequent numbers from the min-heap into an array
28        return minHeap.stream()
29                      .mapToInt(Map.Entry::getKey)
30                      .toArray();
31    }
32}
33
1#include <vector>
2#include <queue>
3#include <unordered_map>
4using namespace std;
5
6// Definition for a pair of integers
7using IntPair = pair<int, int>;
8
9class Solution {
10public:
11    // Function that returns the k most frequent elements from 'nums'.
12    vector<int> topKFrequent(vector<int>& nums, int k) {
13        // HashMap to store the frequency of each number in 'nums'.
14        unordered_map<int, int> frequencyMap;
15        // Increment the frequency count for each number in 'nums'.
16        for (int value : nums) {
17            frequencyMap[value]++;
18        }
19
20        // Min-heap to keep track of the top k frequent numbers.
21        // It stores pairs of (frequency, number) and orders by smallest frequency first.
22        priority_queue<IntPair, vector<IntPair>, greater<IntPair>> minHeap;
23
24        // Iterate over the frequency map.
25        for (const auto& element : frequencyMap) {
26            int number = element.first;
27            int frequency = element.second;
28            // Push the current number and its frequency to the min-heap.
29            minHeap.push({frequency, number});
30
31            // If the heap size exceeds k, remove the least frequent element.
32            // This ensures that the heap always contains the top k frequent elements.
33            if (minHeap.size() > k) {
34                minHeap.pop();
35            }
36        }
37
38        // Prepare a vector to store the result.
39        vector<int> topKFrequentElements(k);
40        // Retrieve the most frequent elements from the min-heap.
41        for (int i = 0; i < k; ++i) {
42            topKFrequentElements[i] = minHeap.top().second;  // Store the number, not the frequency.
43            minHeap.pop();
44        }
45
46        // Return the top k frequent elements in ascending frequency order.
47        return topKFrequentElements;
48    }
49};
50
1function topKFrequent(nums: number[], k: number): number[] {
2    // Initialize a Map to hold the frequency of each number
3    const frequencyMap = new Map<number, number>();
4
5    // Iterate over the array of numbers and populate the frequencyMap
6    for (const num of nums) {
7        frequencyMap.set(num, (frequencyMap.get(num) || 0) + 1);
8    }
9
10    // Convert the Map into an array of key-value pairs
11    const frequencyArray = Array.from(frequencyMap);
12
13    // Sort the array based on frequency in descending order
14    frequencyArray.sort((a, b) => b[1] - a[1]);
15
16    // Initialize an array to hold the top k frequent elements
17    const topKElements: number[] = [];
18
19    // Iterate k times and push the most frequent elements onto the topKElements array
20    for (let i = 0; i < k; i++) {
21        topKElements.push(frequencyArray[i][0]);
22    }
23
24    // Return the top k frequent elements
25    return topKElements;
26}
27

Time and Space Complexity

Time Complexity

The time complexity of the function is determined by several factors:

  • Counting Elements: The function begins with cnt = Counter(nums) which counts the frequency of each element in the nums array. Constructing this frequency counter takes O(N) time, where N is the number of elements in nums.

  • Heap Operations: The function then involves a for loop, iterating over the frequency counter's items, and performing heap operations. For each unique element (up to N unique elements), a heap push is performed, which has a time complexity of O(log K), as the size of the heap is maintained at k. In the worst case, there are N heap push and pop operations, each taking O(log K) time. Therefore, the complexity due to heap operations is O(N * log K).

The resulting overall time complexity is O(N + N * log K). However, since N * log K is the dominant term, we consider the overall time complexity to be O(N * log K).

Space Complexity

The space complexity of the function is the additional space required by the data structures used:

  • Frequency Counter: The Counter will at most store N key-value pairs if all elements in nums are unique, which takes O(N) space.

  • Heap: The heap size is maintained at k, so the space required for the heap is O(k).

Therefore, the overall space complexity is O(N + k). However, in most scenarios, we expect k to be much smaller than N, therefore the space complexity is often expressed as O(N).

Learn more about how to find time and space complexity quickly using problem constraints.


Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:
Question 1 out of 10

Which of the following is a min heap?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!


Jay S.

I was a bit confused on why we have to store the values in the heap as a tuple in the (frequency, number) format instead of (number, frequency), because the (num, freq) felt more natural to me. I asked chatGPT and it said that when using a min-heap and tuples, it will sort the tuple based on the first value in the tuple. Therefore its critical for the format of the values in the heap to be stored as (freq, num).

Here's an excerpt from the answer it gave me:

-Heap Order: When tuples are pushed into the heap, heapq maintains the order based on the first 
  element of the tuple.
-Efficiency: Keeping (freq, num) ensures that the least frequent element can be removed efficiently, 
  which is crucial for maintaining the top K elements.
Mon Jun 24 2024