
692. Top K Frequent Words

Medium · Trie · Array · Hash Table · String · Bucket Sort · Counting · Sorting · Heap (Priority Queue)
Leetcode Link

Problem Description

You are given an array of strings words and an integer k. Your task is to find and return the k most frequently occurring strings from the array.

The returned strings must be sorted according to these rules:

  • First, sort by frequency in descending order (most frequent words come first)
  • If two words have the same frequency, sort them lexicographically (alphabetical order)

For example, if you have words with frequencies like:

  • "apple" appears 3 times
  • "banana" appears 3 times
  • "cherry" appears 2 times

Since "apple" and "banana" both appear 3 times, they would be sorted alphabetically between themselves, so "apple" would come before "banana". Both would come before "cherry" since they have higher frequency.

The function should return exactly k strings following these sorting criteria.


Intuition

To find the most frequent words, we need to count how many times each word appears. This naturally leads us to think about using a frequency counter or hash table, where each word is mapped to its count.

Once we have the frequencies, we need to identify the top k words. However, there's a twist - we can't just pick any k words with the highest frequencies. When multiple words share the same frequency, we need to break ties using lexicographical order.

The key insight is that we can handle both sorting requirements (frequency and lexicographical) simultaneously by creating a custom sorting criteria. We want:

  • Words with higher frequency to come first (descending order by frequency)
  • Among words with equal frequency, those that are lexicographically smaller to come first (ascending alphabetical order)

In Python, we can express this by sorting with a key function that returns a tuple: (-frequency, word). The negative frequency ensures descending order for counts (since Python sorts in ascending order by default), while the word itself provides the lexicographical ordering for ties.

After sorting all words by this criteria, we simply take the first k words from the sorted list. This guarantees we get the k most frequent words with proper tie-breaking.

Learn more about Trie, Sorting and Heap (Priority Queue) patterns.

Solution Approach

The implementation follows these steps:

  1. Count Word Frequencies: We use Python's Counter from the collections module to create a hash table cnt that maps each word to its frequency. This gives us cnt[word] = frequency for each unique word in the input array.

  2. Custom Sorting: We apply Python's sorted() function directly on the keys of the counter (which are the unique words). The sorting uses a lambda function as the key:

    • lambda x: (-cnt[x], x)
    • This returns a tuple for each word x
    • The first element -cnt[x] is the negative frequency, ensuring words with higher counts come first
    • The second element x is the word itself, providing lexicographical ordering when frequencies are equal
  3. Select Top K: After sorting, we use list slicing [:k] to extract the first k elements from the sorted list.

The beauty of this solution lies in its conciseness - the entire logic is compressed into a single line:

return sorted(cnt, key=lambda x: (-cnt[x], x))[:k]

This approach leverages Python's stable sorting algorithm and tuple comparison behavior. When comparing tuples, Python first compares the first elements; if they're equal, it moves to the second elements. This naturally handles our two-level sorting requirement.
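This tuple-comparison behavior can be checked directly; a quick illustration:

```python
# Tuples compare element by element, left to right
assert (-4, "the") < (-3, "is")        # higher count => smaller negative => sorts first
assert (-3, "apple") < (-3, "banana")  # equal counts: tie broken alphabetically

# Sorting tuples therefore orders by count first, then word
pairs = [(-3, "is"), (-2, "sunny"), (-4, "the")]
assert sorted(pairs) == [(-4, "the"), (-3, "is"), (-2, "sunny")]
```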

The time complexity is O(m + n log n), where m is the total number of words (for counting) and n is the number of unique words (for sorting). The space complexity is O(n) for storing the counter hash table.


Example Walkthrough

Let's walk through a concrete example to see how the solution works step by step.

Input: words = ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4

Step 1: Count Word Frequencies

First, we create a frequency counter using Counter(words):

cnt = {
    "the": 4,
    "is": 3,
    "sunny": 2,
    "day": 1
}
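This counter can be built and verified directly:

```python
from collections import Counter

words = ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"]
cnt = Counter(words)

# Each unique word maps to its number of occurrences
assert cnt["the"] == 4 and cnt["is"] == 3
assert cnt["sunny"] == 2 and cnt["day"] == 1
```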

Step 2: Apply Custom Sorting

Now we sort the unique words using our lambda function lambda x: (-cnt[x], x).

For each word, we create a tuple:

  • "the" → (-4, "the")
  • "is" → (-3, "is")
  • "sunny" → (-2, "sunny")
  • "day" → (-1, "day")

Python sorts these tuples in ascending order:

  1. (-4, "the") comes first (smallest negative number = highest frequency)
  2. (-3, "is") comes second
  3. (-2, "sunny") comes third
  4. (-1, "day") comes last

If we had two words with the same frequency, say both "apple" and "banana" appeared 3 times:

  • "apple" → (-3, "apple")
  • "banana" → (-3, "banana")

Since the first element is the same (-3), Python would compare the second elements alphabetically, placing "apple" before "banana".

Step 3: Select Top K Elements

After sorting, our list is: ["the", "is", "sunny", "day"]

We take the first k = 4 elements using slicing [:4].

Output: ["the", "is", "sunny", "day"]

The words are ordered by frequency (4, 3, 2, 1), and if there were ties, they would be resolved alphabetically. This demonstrates how the single line return sorted(cnt, key=lambda x: (-cnt[x], x))[:k] elegantly handles both sorting requirements simultaneously.
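The whole walkthrough can be reproduced in a few lines (the wrapper name top_k_frequent is just for this illustration):

```python
from collections import Counter

def top_k_frequent(words, k):
    cnt = Counter(words)
    # Sort unique words by (-frequency, word), then keep the first k
    return sorted(cnt, key=lambda x: (-cnt[x], x))[:k]

words = ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"]
print(top_k_frequent(words, 4))  # ['the', 'is', 'sunny', 'day']
```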

Solution Implementation

from typing import List
from collections import Counter

class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        # Count the frequency of each word in the input list
        word_count = Counter(words)

        # Sort words by:
        # 1. Frequency in descending order (higher frequency first)
        # 2. Lexicographical order for words with same frequency
        sorted_words = sorted(word_count, key=lambda word: (-word_count[word], word))

        # Return the top k most frequent words
        return sorted_words[:k]
import java.util.*;

class Solution {
    public List<String> topKFrequent(String[] words, int k) {
        // Build frequency map: word -> count
        Map<String, Integer> frequencyMap = new HashMap<>();
        for (String word : words) {
            frequencyMap.merge(word, 1, Integer::sum);
        }

        // Sort words array based on:
        // 1. Frequency in descending order (higher frequency first)
        // 2. Lexicographical order for words with same frequency
        Arrays.sort(words, (word1, word2) -> {
            int frequency1 = frequencyMap.get(word1);
            int frequency2 = frequencyMap.get(word2);

            if (frequency1 == frequency2) {
                // Same frequency: sort lexicographically (ascending)
                return word1.compareTo(word2);
            } else {
                // Different frequency: sort by frequency (descending)
                return frequency2 - frequency1;
            }
        });

        // Collect top k unique words from sorted array
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.length && result.size() < k; i++) {
            // Add word only if it's the first occurrence or different from previous
            if (i == 0 || !words[i].equals(words[i - 1])) {
                result.add(words[i]);
            }
        }

        return result;
    }
}
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> topKFrequent(vector<string>& words, int k) {
        // Build frequency map to count occurrences of each word
        unordered_map<string, int> frequencyMap;
        for (const auto& word : words) {
            ++frequencyMap[word];
        }

        // Extract all unique words into a result vector
        vector<string> result;
        for (const auto& [word, frequency] : frequencyMap) {
            result.push_back(word);
        }

        // Sort words by frequency (descending) and lexicographically (ascending) for ties
        ranges::sort(result, [&](const string& word1, const string& word2) {
            // If frequencies are different, sort by frequency in descending order
            // If frequencies are the same, sort lexicographically in ascending order
            return frequencyMap[word1] > frequencyMap[word2] ||
                   (frequencyMap[word1] == frequencyMap[word2] && word1 < word2);
        });

        // Keep only the top k frequent words
        result.resize(k);

        return result;
    }
};
/**
 * Find the k most frequent words in an array
 * @param words - Array of words to analyze
 * @param k - Number of top frequent words to return
 * @returns Array of k most frequent words, sorted by frequency (descending) and lexicographically for ties
 */
function topKFrequent(words: string[], k: number): string[] {
    // Create a frequency map to count occurrences of each word
    const frequencyMap: Map<string, number> = new Map();

    // Count the frequency of each word
    for (const word of words) {
        frequencyMap.set(word, (frequencyMap.get(word) || 0) + 1);
    }

    // Get all unique words from the frequency map
    const uniqueWords: string[] = Array.from(frequencyMap.keys());

    // Sort words by frequency (descending), then lexicographically for words with same frequency
    uniqueWords.sort((wordA, wordB) => {
        const frequencyA = frequencyMap.get(wordA)!;
        const frequencyB = frequencyMap.get(wordB)!;

        if (frequencyA === frequencyB) {
            // If frequencies are equal, sort lexicographically (ascending)
            return wordA.localeCompare(wordB);
        }
        // Sort by frequency in descending order
        return frequencyB - frequencyA;
    });

    // Return the top k most frequent words
    return uniqueWords.slice(0, k);
}

Time and Space Complexity

The time complexity is O(m + n × log n), where m is the total number of words in the input list and n is the number of unique words.

  • Creating the Counter takes O(m) time, since every word is counted once
  • The sorted() function sorts the unique words (keys in cnt), of which there are at most n
  • The sorting operation with custom key takes O(n × log n) time
  • Slicing the first k elements takes O(k) time

Since n ≤ m (unique words cannot exceed total words), the counting pass contributes O(m) and the sort contributes O(n × log n), giving an overall time complexity of O(m + n × log n).

The space complexity is O(n), where n is the number of unique words.

  • The Counter object stores at most n unique words and their frequencies, requiring O(n) space
  • The sorted() function creates a new list of size n, requiring O(n) space
  • The final sliced result contains k elements, requiring O(k) space

Since k ≤ n and multiple O(n) space allocations don't change the overall complexity, the total space complexity is O(n).

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Incorrect Sorting Key Order

A common mistake is reversing the order of criteria in the sorting key or using the wrong sign for frequency sorting:

Incorrect approaches:

# Wrong: This sorts by word first, then frequency
sorted(word_count, key=lambda word: (word, -word_count[word]))

# Wrong: This sorts frequency in ascending order
sorted(word_count, key=lambda word: (word_count[word], word))

# Wrong: this raises a TypeError, since strings cannot be negated with a minus sign
sorted(word_count, key=lambda word: (-word_count[word], -word))

Why it matters: The tuple comparison in Python processes elements left-to-right. Placing the word first would prioritize alphabetical sorting over frequency, completely breaking the requirement.

2. Sorting the Entire Words Array Instead of Unique Words

Some might attempt to sort the original words array directly:

# Wrong approach
sorted_words = sorted(words, key=lambda word: (-words.count(word), word))
return sorted_words[:k]  # This returns k words, not k unique words

Problems with this approach:

  • Returns duplicate words if a word appears multiple times
  • Extremely inefficient: O(n²) time complexity due to calling count() for each word
  • Doesn't return k unique words as required
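Running the flawed version makes the duplicate problem visible:

```python
words = ["the", "the", "day", "is"]
k = 2

# Sorting the full array keeps duplicates and calls count() once per element
wrong = sorted(words, key=lambda w: (-words.count(w), w))[:k]
assert wrong == ["the", "the"]  # two copies of one word, not 2 unique words
```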

3. Using a Min-Heap Without Proper Comparison

When optimizing for large datasets, using a heap of size k seems attractive:

import heapq
# Attempting to use a min-heap
heap = []
for word, freq in word_count.items():
    heapq.heappush(heap, (freq, word))  # Wrong comparison logic
    if len(heap) > k:
        heapq.heappop(heap)

Issues:

  • Popping the minimum does retain the k highest-frequency entries, but the tie-breaking is backwards: among words with equal frequency, the min-heap evicts the lexicographically smallest word first, which is exactly the word we want to keep
  • The surviving k entries sit in heap order, not output order; they would still need a final sort by (-frequency, word)

A correct heap approach:

# Push (-frequency, word) for every unique word, then pop k times:
# the min-heap yields tuples in ascending order, which is exactly
# descending frequency with alphabetical tie-breaking
heap = [(-freq, word) for word, freq in word_count.items()]
heapq.heapify(heap)
top_k = [heapq.heappop(heap)[1] for _ in range(k)]
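To actually keep the heap bounded at k elements, heapq.nsmallest accepts the same (-frequency, word) key and runs in O(n log k); a minimal sketch (the helper name top_k_frequent_heap is illustrative):

```python
import heapq
from collections import Counter

def top_k_frequent_heap(words, k):
    cnt = Counter(words)
    # nsmallest returns the k smallest keys in ascending order, i.e. the
    # k most frequent words with alphabetical tie-breaking, in O(n log k)
    return heapq.nsmallest(k, cnt, key=lambda w: (-cnt[w], w))

print(top_k_frequent_heap(["i", "love", "coding", "i", "love", "python"], 2))
# ['i', 'love']
```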

4. Assuming Words Are Already Unique

If you skip the frequency counting step:

# Wrong: assumes each word appears only once
sorted_words = sorted(words)  # Just alphabetical sorting
return sorted_words[:k]

This completely ignores the frequency requirement and would fail for inputs like ["the", "the", "day", "is"].

Solution to Avoid These Pitfalls

Stick to the proven pattern:

  1. Always count frequencies first using Counter
  2. Sort the unique words (keys of the counter), not the original array
  3. Use the exact tuple order: (-frequency, word) for the sorting key
  4. Verify with test cases that have equal frequencies to ensure lexicographical ordering works correctly

Example test to validate:

words = ["i", "love", "coding", "i", "love", "python"]
k = 2
cnt = Counter(words)
result = sorted(cnt, key=lambda x: (-cnt[x], x))[:k]
assert result == ["i", "love"]  # both appear twice, ties broken alphabetically
# Not ["love", "i"] or ["i", "i"]
