692. Top K Frequent Words
Problem Description
You are given an array of strings words and an integer k. Your task is to find and return the k most frequently occurring strings from the array.
The returned strings must be sorted according to these rules:
- First, sort by frequency in descending order (most frequent words come first)
- If two words have the same frequency, sort them lexicographically (alphabetical order)
For example, if you have words with frequencies like:
- "apple" appears 3 times
- "banana" appears 3 times
- "cherry" appears 2 times
Since "apple" and "banana" both appear 3 times, they would be sorted alphabetically between themselves, so "apple" would come before "banana". Both would come before "cherry" since they have higher frequency.
The function should return exactly k strings following these sorting criteria.
Intuition
To find the most frequent words, we need to count how many times each word appears. This naturally leads us to think about using a frequency counter or hash table, where each word is mapped to its count.
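As a quick sketch, Python's collections.Counter (the counting tool the solution below relies on) builds this word-to-count mapping in a single pass:

```python
from collections import Counter

# One pass over the list produces a hash table of word -> frequency
words = ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"]
cnt = Counter(words)
print(cnt["the"], cnt["is"], cnt["sunny"], cnt["day"])  # 4 3 2 1
```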
Once we have the frequencies, we need to identify the top k words. However, there's a twist: we can't just pick any k words with the highest frequencies. When multiple words share the same frequency, we need to break ties using lexicographical order.
The key insight is that we can handle both sorting requirements (frequency and lexicographical) simultaneously by creating a custom sorting criteria. We want:
- Words with higher frequency to come first (descending order by frequency)
- Among words with equal frequency, those that are lexicographically smaller to come first (ascending alphabetical order)
In Python, we can express this by sorting with a key function that returns a tuple: (-frequency, word). The negative frequency ensures descending order for counts (since Python sorts in ascending order by default), while the word itself provides the lexicographical ordering for ties.
After sorting all words by these criteria, we simply take the first k words from the sorted list. This guarantees we get the k most frequent words with proper tie-breaking.
Learn more about Trie, Sorting and Heap (Priority Queue) patterns.
Solution Approach
The implementation follows these steps:
- Count Word Frequencies: We use Python's Counter from the collections module to create a hash table cnt that maps each word to its frequency. This gives us cnt[word] = frequency for each unique word in the input array.
- Custom Sorting: We apply Python's sorted() function directly on the keys of the counter (the unique words). The sorting uses a lambda function as the key: lambda x: (-cnt[x], x). This returns a tuple for each word x. The first element -cnt[x] is the negative frequency, ensuring words with higher counts come first; the second element x is the word itself, providing lexicographical ordering when frequencies are equal.
- Select Top K: After sorting, we use list slicing [:k] to extract the first k elements from the sorted list.
The beauty of this solution lies in its conciseness - the entire logic is compressed into a single line:
return sorted(cnt, key=lambda x: (-cnt[x], x))[:k]
This approach leverages Python's stable sorting algorithm and tuple comparison behavior. When comparing tuples, Python first compares the first elements; if they're equal, it moves to the second elements. This naturally handles our two-level sorting requirement.
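This tuple comparison behavior can be checked directly in the REPL:

```python
# Tuples compare element by element, left to right
print((-4, "the") < (-3, "is"))        # True: -4 < -3, so frequency alone decides
print((-3, "apple") < (-3, "banana"))  # True: frequencies tie, the word decides
```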
The sorting step costs O(n log n), where n is the number of unique words; together with the O(m) counting pass over all m input words, the total time is O(m + n log n). The space complexity is O(n) for storing the counter hash table.
Example Walkthrough
Let's walk through a concrete example to see how the solution works step by step.
Input: words = ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Step 1: Count Word Frequencies
First, we create a frequency counter using Counter(words):
cnt = { "the": 4, "is": 3, "sunny": 2, "day": 1 }
Step 2: Apply Custom Sorting
Now we sort the unique words using our lambda function lambda x: (-cnt[x], x).
For each word, we create a tuple:
- "the" → (-4, "the")
- "is" → (-3, "is")
- "sunny" → (-2, "sunny")
- "day" → (-1, "day")
Python sorts these tuples in ascending order:
- (-4, "the") comes first (smallest negative number = highest frequency)
- (-3, "is") comes second
- (-2, "sunny") comes third
- (-1, "day") comes last
If we had two words with the same frequency, say both "apple" and "banana" appeared 3 times:
- "apple" → (-3, "apple")
- "banana" → (-3, "banana")
Since the first element is the same (-3), Python would compare the second elements alphabetically, placing "apple" before "banana".
Step 3: Select Top K Elements
After sorting, our list is: ["the", "is", "sunny", "day"]
We take the first k = 4 elements using slicing [:4].
Output: ["the", "is", "sunny", "day"]
The words are ordered by frequency (4, 3, 2, 1), and if there were ties, they would be resolved alphabetically. This demonstrates how the single line return sorted(cnt, key=lambda x: (-cnt[x], x))[:k] elegantly handles both sorting requirements simultaneously.
Solution Implementation
from typing import List
from collections import Counter

class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        # Count the frequency of each word in the input list
        word_count = Counter(words)

        # Sort words by:
        # 1. Frequency in descending order (higher frequency first)
        # 2. Lexicographical order for words with same frequency
        sorted_words = sorted(word_count, key=lambda word: (-word_count[word], word))

        # Return the top k most frequent words
        return sorted_words[:k]
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Solution {
    public List<String> topKFrequent(String[] words, int k) {
        // Build frequency map: word -> count
        Map<String, Integer> frequencyMap = new HashMap<>();
        for (String word : words) {
            frequencyMap.merge(word, 1, Integer::sum);
        }

        // Sort words array based on:
        // 1. Frequency in descending order (higher frequency first)
        // 2. Lexicographical order for words with same frequency
        Arrays.sort(words, (word1, word2) -> {
            int frequency1 = frequencyMap.get(word1);
            int frequency2 = frequencyMap.get(word2);

            if (frequency1 == frequency2) {
                // Same frequency: sort lexicographically (ascending)
                return word1.compareTo(word2);
            } else {
                // Different frequency: sort by frequency (descending)
                return frequency2 - frequency1;
            }
        });

        // Collect top k unique words from sorted array
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.length && result.size() < k; i++) {
            // Add word only if it's the first occurrence or different from previous
            if (i == 0 || !words[i].equals(words[i - 1])) {
                result.add(words[i]);
            }
        }

        return result;
    }
}
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> topKFrequent(vector<string>& words, int k) {
        // Build frequency map to count occurrences of each word
        unordered_map<string, int> frequencyMap;
        for (const auto& word : words) {
            ++frequencyMap[word];
        }

        // Extract all unique words into a result vector
        vector<string> result;
        for (const auto& [word, frequency] : frequencyMap) {
            result.push_back(word);
        }

        // Sort words by frequency (descending) and lexicographically (ascending) for ties
        // (ranges::sort requires C++20)
        ranges::sort(result, [&](const string& word1, const string& word2) {
            // If frequencies are different, sort by frequency in descending order
            // If frequencies are the same, sort lexicographically in ascending order
            return frequencyMap[word1] > frequencyMap[word2] ||
                   (frequencyMap[word1] == frequencyMap[word2] && word1 < word2);
        });

        // Keep only the top k frequent words
        result.resize(k);

        return result;
    }
};
/**
 * Find the k most frequent words in an array
 * @param words - Array of words to analyze
 * @param k - Number of top frequent words to return
 * @returns Array of k most frequent words, sorted by frequency (descending) and lexicographically for ties
 */
function topKFrequent(words: string[], k: number): string[] {
    // Create a frequency map to count occurrences of each word
    const frequencyMap: Map<string, number> = new Map();

    // Count the frequency of each word
    for (const word of words) {
        frequencyMap.set(word, (frequencyMap.get(word) || 0) + 1);
    }

    // Get all unique words from the frequency map
    const uniqueWords: string[] = Array.from(frequencyMap.keys());

    // Sort words by frequency (descending), then lexicographically for words with same frequency
    uniqueWords.sort((wordA, wordB) => {
        const frequencyA = frequencyMap.get(wordA)!;
        const frequencyB = frequencyMap.get(wordB)!;

        if (frequencyA === frequencyB) {
            // If frequencies are equal, sort lexicographically (ascending)
            return wordA.localeCompare(wordB);
        }
        // Sort by frequency in descending order
        return frequencyB - frequencyA;
    });

    // Return the top k most frequent words
    return uniqueWords.slice(0, k);
}
Time and Space Complexity
The time complexity is O(m + n × log n), where m is the total number of words in the input list and n is the number of unique words.
- Creating the Counter takes O(m) time, since every word is scanned once
- The sorted() function operates on the unique words (the keys of cnt), of which there are at most n
- The sorting operation with the custom key takes O(n × log n) time
- Slicing the first k elements takes O(k) time
Since k ≤ n ≤ m (unique words cannot exceed total words), counting and sorting dominate, giving O(m + n × log n) overall; when m and n are of the same order this is simply O(n × log n).
The space complexity is O(n), where n is the number of unique words.
- The Counter object stores at most n unique words and their frequencies, requiring O(n) space
- The sorted() function creates a new list of size n, requiring O(n) space
- The final sliced result contains k elements, requiring O(k) space
Since k ≤ n and multiple O(n) space allocations don't change the overall complexity, the total space complexity is O(n).
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Incorrect Sorting Key Order
A common mistake is reversing the order of criteria in the sorting key or using the wrong sign for frequency sorting:
Incorrect approaches:
# Wrong: This sorts by word first, then frequency
sorted(word_count, key=lambda word: (word, -word_count[word]))
# Wrong: This sorts frequency in ascending order
sorted(word_count, key=lambda word: (word_count[word], word))
# Wrong: negating a string raises a TypeError; -word is not valid Python
sorted(word_count, key=lambda word: (-word_count[word], -word))
Why it matters: The tuple comparison in Python processes elements left-to-right. Placing the word first would prioritize alphabetical sorting over frequency, completely breaking the requirement.
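A small experiment makes the difference concrete (reusing the walkthrough's word list):

```python
from collections import Counter

words = ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"]
cnt = Counter(words)

# Word placed first in the tuple: alphabetical order wins, frequency only breaks ties
print(sorted(cnt, key=lambda w: (w, -cnt[w]))[:2])   # ['day', 'is'] - wrong
# Negative frequency first: the intended ranking
print(sorted(cnt, key=lambda w: (-cnt[w], w))[:2])   # ['the', 'is'] - correct
```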
2. Sorting the Entire Words Array Instead of Unique Words
Some might attempt to sort the original words array directly:
# Wrong approach
sorted_words = sorted(words, key=lambda word: (-words.count(word), word))
return sorted_words[:k] # This returns k words, not k unique words
Problems with this approach:
- Returns duplicate words if a word appears multiple times
- Extremely inefficient: O(n²) time complexity due to calling count() for each word
- Doesn't return k unique words as required
3. Using a Min-Heap Without Proper Comparison
When optimizing for large datasets, using a heap of size k seems attractive:
import heapq
# Attempting to use a min-heap
heap = []
for word, freq in word_count.items():
heapq.heappush(heap, (freq, word)) # Wrong comparison logic
if len(heap) > k:
heapq.heappop(heap)
Issues:
- The size-k min-heap evicts the smallest (freq, word) tuple, so among words tied at the cutoff frequency it keeps the lexicographically largest ones, which is the opposite of the required tie-breaking
- A plain (freq, word) tuple cannot express "descending frequency, ascending word" for the eviction decision
- Even the surviving words come out of the heap in ascending-frequency order, so they would still need to be reversed and re-sorted
Correct heap approach would require:
# Push all (-freq, word) pairs, then pop k times: the min-heap yields the
# highest frequency first, and equal frequencies fall back to ascending word order
heap = [(-freq, word) for word, freq in word_count.items()]
heapq.heapify(heap)
result = [heapq.heappop(heap)[1] for _ in range(k)]
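Alternatively, to keep the O(n log k) advantage of a size-k heap without hand-rolling the comparison, heapq.nsmallest accepts the same key function as sorted (a sketch, not part of the original solution; top_k_frequent is a name chosen here for illustration):

```python
import heapq
from collections import Counter

def top_k_frequent(words, k):
    cnt = Counter(words)
    # nsmallest with the (-freq, word) key returns the k "smallest" keys,
    # i.e. the k most frequent words with alphabetical tie-breaking,
    # already in the required output order
    return heapq.nsmallest(k, cnt, key=lambda w: (-cnt[w], w))

print(top_k_frequent(["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], 2))
# ['the', 'is']
```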
4. Assuming Words Are Already Unique
If you skip the frequency counting step:
# Wrong: assumes each word appears only once
sorted_words = sorted(words) # Just alphabetical sorting
return sorted_words[:k]
This completely ignores the frequency requirement and would fail for inputs like ["the", "the", "day", "is"].
Solution to Avoid These Pitfalls
Stick to the proven pattern:
- Always count frequencies first using Counter
- Sort the unique words (the keys of the counter), not the original array
- Use the exact tuple order (-frequency, word) for the sorting key
- Verify with test cases that have equal frequencies to ensure the lexicographical ordering works correctly
Example test to validate:
words = ["i", "love", "coding", "i", "love", "python"]
k = 2
# Expected: ["i", "love"] (both appear twice, alphabetically ordered)
# Not: ["love", "i"] or ["i", "i"]
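The same check as a runnable assertion against the one-line solution from above:

```python
from collections import Counter

def top_k_frequent(words, k):
    cnt = Counter(words)
    return sorted(cnt, key=lambda x: (-cnt[x], x))[:k]

words = ["i", "love", "coding", "i", "love", "python"]
# "i" and "love" both appear twice; the tie is broken alphabetically
assert top_k_frequent(words, 2) == ["i", "love"]
print("ok")  # ok
```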
Recommended Readings
- Trie Introduction: Imagine a magic dictionary that, as soon as you start spelling a word, tells you how many stored words start with those letters. That is exactly what the trie data structure does.
- Sorting Summary: A comparison of the sorting algorithms covered, with the advantages and disadvantages of each. The basic algorithms are easy to visualize and a good starting point for beginners.
- Priority Queue and Heap: A priority queue is an abstract data type, and a heap is the concrete data structure most commonly used to implement it.