49. Group Anagrams
Problem Description
You are given an array of strings strs
. Your task is to group all the anagrams together and return them as a list of lists. Anagrams are words that contain the same characters but in different orders (for example, "eat" and "tea" are anagrams).
The problem asks you to:
- Take an array of strings as input
- Identify which strings are anagrams of each other
- Group these anagrams together into separate lists
- Return all groups as a list of lists
For example, if the input is ["eat", "tea", "tan", "ate", "nat", "bat"]
:
"eat"
,"tea"
, and"ate"
are anagrams of each other (they all contain the letters 'a', 'e', 't')"tan"
and"nat"
are anagrams of each other (they both contain 'a', 'n', 't')"bat"
has no anagrams in the list
So the output would be: [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]
The order of the groups in the output doesn't matter, and the order of strings within each group also doesn't matter. The key requirement is that all anagrams must be grouped together.
Intuition
The key insight is that anagrams share a common property: when their characters are sorted, they produce the exact same string. For instance, both "eat"
and "tea"
become "aet"
when their characters are sorted alphabetically.
This observation leads us to think about using this sorted string as a unique identifier or "signature" for each group of anagrams. If two strings produce the same sorted string, they must be anagrams of each other.
We need a way to collect all strings that share the same signature. A hash table (dictionary) is perfect for this because:
- We can use the sorted string as the
key
- We can store all original strings that map to this key as a
list
in thevalue
The process becomes straightforward:
- For each string, compute its sorted version (the signature)
- Use this signature to determine which group the string belongs to
- Add the string to its corresponding group in the hash table
By the end, each key in our hash table represents a unique anagram pattern, and its value contains all strings from the input that match this pattern. The final answer is simply all the values from our hash table.
This approach elegantly solves the grouping problem in a single pass through the input array, making it both intuitive and efficient.
Learn more about Sorting patterns.
Solution Approach
Let's walk through the implementation step by step using a hash table approach:
1. Initialize a Hash Table
We use a defaultdict(list)
in Python, which automatically creates an empty list for any new key we encounter. This prevents key errors when adding strings to groups.
2. Process Each String
For each string s
in the input array strs
:
- Sort the characters of the string:
sorted(s)
returns a list of characters in alphabetical order - Join them back into a string:
''.join(sorted(s))
creates our key - For example:
"eat"
β['a', 'e', 't']
β"aet"
3. Group Strings by Their Sorted Key
Using the sorted string as the key k
, append the original string to the corresponding list in our hash table: d[k].append(s)
4. Build the Result After processing all strings, our hash table contains:
- Keys: sorted strings (like
"aet"
,"ant"
,"abt"
) - Values: lists of original strings that are anagrams
Return all the value lists: list(d.values())
Example Walkthrough:
Given strs = ["eat", "tea", "tan", "ate", "nat", "bat"]
:
Step | Current String | Sorted Key | Hash Table State |
---|---|---|---|
1 | "eat" | "aet" | {"aet": ["eat"]} |
2 | "tea" | "aet" | {"aet": ["eat", "tea"]} |
3 | "tan" | "ant" | {"aet": ["eat", "tea"], "ant": ["tan"]} |
4 | "ate" | "aet" | {"aet": ["eat", "tea", "ate"], "ant": ["tan"]} |
5 | "nat" | "ant" | {"aet": ["eat", "tea", "ate"], "ant": ["tan", "nat"]} |
6 | "bat" | "abt" | {"aet": ["eat", "tea", "ate"], "ant": ["tan", "nat"], "abt": ["bat"]} |
Final output: [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]
The time complexity is O(n * k log k)
where n
is the number of strings and k
is the maximum length of a string (due to sorting each string). The space complexity is O(n * k)
for storing all strings in the hash table.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example with strs = ["abc", "bca", "tan", "nat"]
to illustrate the solution approach.
Step-by-step process:
-
Initialize an empty hash table to store our groups.
-
Process "abc":
- Sort the characters: 'a', 'b', 'c' β "abc"
- Use "abc" as the key
- Add "abc" to the hash table:
{"abc": ["abc"]}
-
Process "bca":
- Sort the characters: 'b', 'c', 'a' β "abc"
- The key "abc" already exists
- Add "bca" to the existing group:
{"abc": ["abc", "bca"]}
-
Process "tan":
- Sort the characters: 't', 'a', 'n' β "ant"
- Use "ant" as a new key
- Add "tan" to the hash table:
{"abc": ["abc", "bca"], "ant": ["tan"]}
-
Process "nat":
- Sort the characters: 'n', 'a', 't' β "ant"
- The key "ant" already exists
- Add "nat" to the existing group:
{"abc": ["abc", "bca"], "ant": ["tan", "nat"]}
-
Extract the final result:
- Take all values from the hash table
- Result:
[["abc", "bca"], ["tan", "nat"]]
The key insight here is that "abc" and "bca" both produce the same sorted string "abc", so they're recognized as anagrams and grouped together. Similarly, "tan" and "nat" both produce "ant" when sorted, placing them in the same group. This demonstrates how sorting characters creates a unique signature for each anagram group.
Solution Implementation
1from collections import defaultdict
2from typing import List
3
4class Solution:
5 def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
6 # Dictionary to store anagram groups
7 # Key: sorted string (anagram signature)
8 # Value: list of original strings that are anagrams
9 anagram_groups = defaultdict(list)
10
11 # Iterate through each string in the input list
12 for string in strs:
13 # Create a key by sorting the characters of the string
14 # All anagrams will have the same sorted character sequence
15 sorted_key = ''.join(sorted(string))
16
17 # Add the original string to its anagram group
18 anagram_groups[sorted_key].append(string)
19
20 # Convert the dictionary values to a list and return
21 # Each value is a list of strings that are anagrams of each other
22 return list(anagram_groups.values())
23
1class Solution {
2 public List<List<String>> groupAnagrams(String[] strs) {
3 // HashMap to store sorted string as key and list of anagrams as value
4 Map<String, List<String>> anagramGroups = new HashMap<>();
5
6 // Iterate through each string in the input array
7 for (String str : strs) {
8 // Convert string to character array for sorting
9 char[] charArray = str.toCharArray();
10
11 // Sort the character array to create a unique key for anagrams
12 Arrays.sort(charArray);
13
14 // Convert sorted character array back to string to use as key
15 String sortedKey = String.valueOf(charArray);
16
17 // Add the original string to the corresponding anagram group
18 // If key doesn't exist, create a new ArrayList for this group
19 anagramGroups.computeIfAbsent(sortedKey, key -> new ArrayList<>()).add(str);
20 }
21
22 // Return all anagram groups as a list of lists
23 return new ArrayList<>(anagramGroups.values());
24 }
25}
26
1class Solution {
2public:
3 vector<vector<string>> groupAnagrams(vector<string>& strs) {
4 // Hash map to store sorted string as key and list of anagrams as value
5 unordered_map<string, vector<string>> anagramGroups;
6
7 // Iterate through each string in the input array
8 for (const auto& str : strs) {
9 // Create a copy of the current string to use as key
10 string sortedKey = str;
11
12 // Sort the characters to create a unique key for all anagrams
13 sort(sortedKey.begin(), sortedKey.end());
14
15 // Add the original string to the corresponding anagram group
16 anagramGroups[sortedKey].emplace_back(str);
17 }
18
19 // Result vector to store all groups of anagrams
20 vector<vector<string>> result;
21
22 // Extract all anagram groups from the hash map
23 for (auto& [key, group] : anagramGroups) {
24 result.emplace_back(group);
25 }
26
27 return result;
28 }
29};
30
1/**
2 * Groups anagrams together from an array of strings
3 * @param strs - Array of strings to group by anagrams
4 * @returns Array of arrays, where each inner array contains anagrams
5 */
6function groupAnagrams(strs: string[]): string[][] {
7 // Map to store sorted string as key and array of anagrams as value
8 const anagramMap: Map<string, string[]> = new Map<string, string[]>();
9
10 // Iterate through each string in the input array
11 for (const str of strs) {
12 // Sort characters of the string to create a unique key for anagrams
13 const sortedKey: string = str.split('').sort().join('');
14
15 // Initialize array for this key if it doesn't exist
16 if (!anagramMap.has(sortedKey)) {
17 anagramMap.set(sortedKey, []);
18 }
19
20 // Add the original string to its anagram group
21 anagramMap.get(sortedKey)!.push(str);
22 }
23
24 // Convert map values to array and return
25 return Array.from(anagramMap.values());
26}
27
Time and Space Complexity
Time Complexity: O(n Γ k Γ log k)
n
is the number of strings in the input arrayk
is the maximum length of a string in the array- For each of the
n
strings, we perform a sorting operation usingsorted(s)
- Sorting a string of length
k
takesO(k Γ log k)
time - Therefore, the total time complexity is
O(n Γ k Γ log k)
Space Complexity: O(n Γ k)
- The dictionary
d
stores alln
strings, where each string has up tok
characters, requiringO(n Γ k)
space - For each string, we create a sorted key string which takes
O(k)
space temporarily - The output list contains all the original strings, which is
O(n Γ k)
space - Overall space complexity is
O(n Γ k)
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Using Mutable Objects as Dictionary Keys
A common mistake is attempting to use the sorted list directly as a dictionary key:
# WRONG - This will raise a TypeError
for string in strs:
sorted_key = sorted(string) # Returns a list ['a', 'e', 't']
anagram_groups[sorted_key].append(string) # Lists can't be dictionary keys!
Why it fails: Python dictionaries require immutable objects as keys. Lists are mutable and therefore cannot be used as dictionary keys.
Solution: Convert the sorted list to an immutable type like a string or tuple:
# Correct approach 1: Join into a string
sorted_key = ''.join(sorted(string))
# Correct approach 2: Convert to tuple
sorted_key = tuple(sorted(string))
2. Incorrect Character Frequency Counting
Some might try to use character counting without proper sorting:
# WRONG - This doesn't properly identify anagrams
for string in strs:
key = ''.join(set(string)) # Using set loses duplicate characters!
anagram_groups[key].append(string)
Why it fails: Using set()
removes duplicate characters. For example, "aab" and "ab" would incorrectly be grouped together since set("aab")
and set("ab")
both produce {'a', 'b'}
.
Solution: Either sort the string properly or use a frequency count approach:
# Alternative correct approach using character frequency
from collections import Counter
for string in strs:
# Create a tuple of (char, count) pairs sorted by character
sorted_key = tuple(sorted(Counter(string).items()))
anagram_groups[sorted_key].append(string)
3. Not Handling Empty Strings or Edge Cases
The code might fail to handle edge cases properly:
# Input: ["", "", "a"] # Without proper handling, empty strings might cause issues
Solution: The provided solution actually handles empty strings correctly since sorting an empty string returns an empty list, which joins to an empty string key. However, be aware that all empty strings will be grouped together, which is the correct behavior.
4. Performance Issues with Large Strings
For very large strings, sorting can become expensive. While the O(k log k) sorting approach works well for typical cases, an alternative counting approach can achieve O(k) time per string:
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
anagram_groups = defaultdict(list)
for string in strs:
# Create a frequency count key (26 letters)
count = [0] * 26
for char in string:
count[ord(char) - ord('a')] += 1
# Convert count array to tuple for use as key
key = tuple(count)
anagram_groups[key].append(string)
return list(anagram_groups.values())
This approach is more efficient for very long strings but uses more memory for the 26-element count array per unique anagram group.
In a binary min heap, the maximum element can be found in:
Recommended Readings
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Donβt Miss This!