721. Accounts Merge
Problem Description
The problem presents a scenario where we are given a list of accounts
, each containing a user's name followed by various email addresses that belong to that account. The first item in each sublist is the user's name, and the consecutive items are the emails associated with that user.
The objective is to merge accounts that belong to the same person. The conditions that define if two accounts belong to the same person are based on if there is at least one common email between the two accounts. It is important to note that individuals may share the same name but still be different people, so the name alone isn't a unique identifier. After merging, we seek rearranged accounts where each sublist starts with the username, followed by a sorted list of unique email addresses. The final list of accounts can be returned in any order but should maintain the format of name followed by sorted emails.
Flowchart Walkthrough
Let's analyze the algorithm for LeetCode problem 721. Accounts Merge using the Flowchart. We'll determine the appropriate algorithm through the following step-by-step assessment:
Is it a graph?
- Yes: Each account can be viewed as a node, and there is an edge between two nodes if they share an email address.
Is it a tree?
- No: The structure formed by accounts and their shared emails represents a general graph, which isn't necessarily hierarchical or lacking in cycles like a tree.
Is the problem related to directed acyclic graphs (DAGs)?
- No: In this problem, the direction of edges doesn’t matter. It’s concerned with groups of connected nodes (accounts connected via shared emails), not hierarchical or topologically ordered nodes.
Is the problem related to shortest paths?
- No: The primary concern is to merge accounts connected by common emails, not finding shortest paths.
Does the problem involve connectivity?
- Yes: The main challenge is to find all accounts that are connected by shared emails, which essentially means finding all connected components in the graph.
Is the graph weighted?
- No: There are no weights involved in this connectivity problem. The connections are simply whether or not two accounts share an email.
Conclusion: According to the flowchart, for an unweighted connectivity problem, the suggested approach is BFS. However, in a typical implementation of account merges where deep relationships (recursive connections) may need to be explored, DFS is equally viable and commonly used due to its recursive nature which can be more intuitive when dealing with connected components in an undirected graph like this one. Both BFS and DFS will effectively achieve the goal of identifying connected components in this case.
Intuition
The key intuition behind the solution is rooted in a union-find algorithm, also known as disjoint set union (DSU), which is a data structure that keeps track of a set of elements partitioned into multiple, non-overlapping subsets. It provides a very efficient way to test the belonging of elements and to unite the elements if they belong to different sets.
Step by step, the approach behind the solution is:
- Initialize each account as its own set by mapping it to an index.
- Iterate through each account and each email within that account. For each email, we check if we've seen it before.
- If we have, it means this email is common between accounts, and we need to merge these accounts (sets) into one. This is done by pointing the index of the current account to the index of the previously seen account with the same email. We achieve this by using the union-find algorithm.
- If we have not seen the email before, we mark this email as visited and link it to the current account index.
- Once all accounts are iterated through, we utilize a map that gathers all emails belonging to the same set (as identified by the find operation) together.
- Lastly, we iterate through the map of merged accounts: for each set of emails, we first extract the account index, prepend the account name (since all emails in the set must belong to the same person), and sort the collection of emails.
- We add this structured data to our answer in the specified format of [name, email1, email2, ...] per sublist.
By following the union-find data structure, the solution seamlessly merges accounts that share common emails and ensures that each merged account carries the correct name and set of emails before finalizing the sorted results.
Learn more about Depth-First Search, Breadth-First Search, Union Find and Sorting patterns.
Solution Approach
The implementation of the solution leverages the Union-Find algorithm and a hash map to efficiently merge accounts. Below is a step-by-step explanation of the implementation strategy based on the given Python code.
-
Initialization: Begin by preparing the union-find structure.
n
, the number of accounts, determines the initial parent arrayp
, where each index from0
ton - 1
is initialized to its own value, indicating that each account is currently its own leader in a singleton set. -
Union Operation: For each account and each email in the account, check if the email has been seen before (whether it exists in the
email_id
hash map).- If the email exists, perform a union operation by setting the root (parent) of the current account index (
find(i)
) to be the same as the root of the account index previously seen with this email (find(email_id[email])
). - If the email does not exist, record this email and its account index by putting it in the
email_id
hash map.
- If the email exists, perform a union operation by setting the root (parent) of the current account index (
-
Find Operation: The
find
function is a typical union-find operation that follows the chain of parent pointers fromx
up the tree until it reaches a root element, which is the element whose parent is itself. Path compression is used to make future searches more efficient, by making each looked-up element point directly to the root. -
Building the Merged Account List: Create a
mp
hash map (defaultdict(set)
) where each key is the root parent index of a set, and the value is a set of all emails associated with that parent index (collected from the accounts that have been merged together). For each account and email, find the root parent using thefind
function and add the email to the set associated with the root parent inmp
. -
Sorting and Finalizing the Output: For each unique root parent index in
mp
, start a new list with the name (from the originalaccounts
) and then sort the set of emails collected. This sorted list of emails is appended to the list containing the name. Each such list contains the merged account information, and these lists are appended to theans
list.
By using these data structures and algorithms, the solution efficiently identifies and merges accounts. Union-find facilitates quick unions and lookups during merges, and the hashmap mp
helps in collecting and sorting emails for each unique user.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's consider the following small example to illustrate the solution approach:
Suppose we have the following list of accounts:
- John’s first account: ["John", "[email protected]", "[email protected]"]
- John’s second account: ["John", "[email protected]", "[email protected]"]
- Mary’s account: ["Mary", "[email protected]"]
We need to merge John's two accounts into one because they both contain the email "[email protected]", which indicates they belong to the same person.
Following the solution approach:
-
Initialization: We initialize a parent array
p
wherep[0] = 0
,p[1] = 1
, andp[2] = 2
. Each account is its own set leader. -
Union Operation:
- For John's first account (index 0), we add "[email protected]" and "[email protected]" to the
email_id
map with value0
. - For John's second account (index 1), we find "[email protected]" already in the
email_id
map associated with index0
, so we unite index1
with index0
by settingp[1]
to0
. - "[email protected]" is a new email and is added to the
email_id
map with value1
.
- For John's first account (index 0), we add "[email protected]" and "[email protected]" to the
-
Find Operation:
- The
find
function updates the parent arrayp
as it encounters union operations. After the union operation, both John's accounts would have the same root,0
.
- The
-
Building the Merged Account List:
- We create a
mp
hashmap where keys are the root parent index and values are sets of emails. - "[email protected]" and "[email protected]" are added to
mp[0]
. - Since "[email protected]" has been united under
0
, it also gets added tomp[0]
. - "[email protected]" is added to
mp[2]
.
- We create a
-
Sorting and Finalizing the Output:
- For John's merged account (
mp[0]
), we prepend the name "John" and sort the emails: ["John", "[email protected]", "[email protected]", "[email protected]"]. - For Mary's account (
mp[2]
), it remains as it was since no merge happened: ["Mary", "[email protected]"].
- For John's merged account (
The final answer ans
will have the merged lists for both users in any order:
- [["John", "[email protected]", "[email protected]", "[email protected]"], ["Mary", "[email protected]"]]
Solution Implementation
1from collections import defaultdict
2
3class Solution:
4 def accountsMerge(self, accounts: List[List[str]]) -> List[List[str]]:
5 # Helper function to find the root parent for any index.
6 def find(index: int) -> int:
7 if parent[index] != index:
8 parent[index] = find(parent[index])
9 return parent[index]
10
11 # The number of accounts.
12 num_accounts = len(accounts)
13
14 # Initialize a list where the value at index i is the parent of i.
15 parent = list(range(num_accounts))
16
17 # Dictionary to map each email to its account index.
18 email_to_account_id = {}
19
20 # Assign parents for accounts with shared emails.
21 for account_id, account in enumerate(accounts):
22 for email in account[1:]:
23 if email in email_to_account_id:
24 # Union operation: set the parent of the current account to the account
25 # already associated with this email.
26 parent[find(account_id)] = find(email_to_account_id[email])
27 else:
28 email_to_account_id[email] = account_id
29
30 # Map each account index to a set of emails under the corresponding parent.
31 emails_under_parent_account = defaultdict(set)
32 for account_index, account in enumerate(accounts):
33 for email in account[1:]:
34 emails_under_parent_account[find(account_index)].add(email)
35
36 # Compile the merged accounts: one per parent index.
37 merged_accounts = []
38 for parent_index, emails in emails_under_parent_account.items():
39 # Sort the emails.
40 sorted_emails = sorted(emails)
41 # Prepend the account name.
42 account_name = [accounts[parent_index][0]]
43 merged_account = account_name + sorted_emails
44 # Append the merged account to the result.
45 merged_accounts.append(merged_account)
46
47 return merged_accounts
48
49# Note: List should be imported from typing to use it as a type hint in the function signature.
50# from typing import List
51
1import java.util.*;
2
3public class Solution {
4
5 // Parent array for storing the parent index of each disjoint set (union-find)
6 private int[] parent;
7
8 // Function to find the disjoint parent of a node
9 private int find(int x) {
10 if (parent[x] != x) {
11 // Path compression - Directly attaching x node to its root parent
12 parent[x] = find(parent[x]);
13 }
14 return parent[x];
15 }
16
17 // The main method for merging accounts
18 public List<List<String>> accountsMerge(List<List<String>> accounts) {
19 int n = accounts.size();
20 parent = new int[n];
21
22 // Initialize the parent array with each node as its own parent
23 for (int i = 0; i < n; i++) {
24 parent[i] = i;
25 }
26
27 // Map to store the account index for a unique email
28 Map<String, Integer> emailToIndex = new HashMap<>();
29
30 // Iterate over accounts to find and union the emails
31 for (int i = 0; i < n; i++) {
32 List<String> currentAccount = accounts.get(i);
33 for (int j = 1; j < currentAccount.size(); j++) {
34 String email = currentAccount.get(j);
35 if (emailToIndex.containsKey(email)) {
36 // Perform union operation by setting the parent
37 parent[find(i)] = find(emailToIndex.get(email));
38 } else {
39 // If the email is unique, map it to the current account index
40 emailToIndex.put(email, i);
41 }
42 }
43 }
44
45 // Map to combine emails that belong to the same owner
46 Map<Integer, Set<String>> mergedAccounts = new HashMap<>();
47 for (int i = 0; i < n; i++) {
48 for (int j = 1; j < accounts.get(i).size(); j++) {
49 String email = accounts.get(i).get(j);
50 // Find the set's root and add the email
51 int root = find(i);
52 if (!mergedAccounts.containsKey(root)) {
53 mergedAccounts.put(root, new TreeSet<String>());
54 }
55 mergedAccounts.get(root).add(email);
56 }
57 }
58
59 // Convert the sets of emails back to the required output format
60 List<List<String>> result = new ArrayList<>();
61 for (Integer i : mergedAccounts.keySet()) {
62 List<String> account = new ArrayList<>();
63 account.add(accounts.get(i).get(0)); // Add the account name
64 account.addAll(mergedAccounts.get(i)); // Add all merged emails
65 result.add(account); // Add to final results
66 }
67
68 return result;
69 }
70}
71
1#include <vector>
2#include <string>
3#include <unordered_map>
4#include <unordered_set>
5#include <algorithm>
6
7class Solution {
8public:
9 // The parent vector for storing the disjoint set (union-find) representatives
10 vector<int> parent;
11
12 // The method to find the unique email accounts and merge them if they belong to the same user
13 vector<vector<string>> accountsMerge(vector<vector<string>>& accounts) {
14 int n = accounts.size();
15 // Initialize the parent array for union-find
16 parent.resize(n);
17 for (int i = 0; i < n; ++i) parent[i] = i;
18
19 // Map to store email to the index of the account it belongs to
20 unordered_map<string, int> emailToIdx;
21
22 // Iterate through each account to union emails and assign them a common parent index
23 for (int i = 0; i < n; ++i) {
24 for (int j = 1; j < accounts[i].size(); ++j) {
25 string email = accounts[i][j];
26 if (emailToIdx.count(email)) {
27 // Union operation: merge the sets containing these two emails
28 parent[find(i)] = find(emailToIdx[email]);
29 } else {
30 // Assign the current index to the unique email
31 emailToIdx[email] = i;
32 }
33 }
34 }
35
36 // Map to store the root index of each account and the associated unique emails
37 unordered_map<int, unordered_set<string>> mergedAccounts;
38 for (int i = 0; i < n; ++i) {
39 for (int j = 1; j < accounts[i].size(); ++j) {
40 string email = accounts[i][j];
41 // Insert the email into the set that belongs to the root parent of i
42 mergedAccounts[find(i)].insert(email);
43 }
44 }
45
46 // Prepare the final result from the merged accounts
47 vector<vector<string>> result;
48 for (auto& [i, emails] : mergedAccounts) {
49 vector<string> account;
50 account.push_back(accounts[i][0]); // Add the name of the account owner
51 account.insert(account.end(), emails.begin(), emails.end()); // Add all the merged emails
52 sort(account.begin() + 1, account.end()); // Sort the emails
53 result.push_back(account); // Add this list to the results
54 }
55
56 return result;
57 }
58
59 // The method to find the representative of the set that element x belongs to
60 int find(int x) {
61 if (parent[x] != x) {
62 // Path compression: update the parent to the representative of the set for faster future lookups
63 parent[x] = find(parent[x]);
64 }
65 return parent[x];
66 }
67};
68
1// Emulating a vector using an array
2let parent: number[] = [];
3
4// Helper function to find the representative of the set that element x belongs to
5function find(x: number): number {
6 if (parent[x] !== x) {
7 // Path compression: update the parent to the representative for faster lookups
8 parent[x] = find(parent[x]);
9 }
10 return parent[x];
11}
12
13// Function to merge email accounts
14function accountsMerge(accounts: string[][]): string[][] {
15 const n: number = accounts.length;
16 // Initialize the parent array for the disjoint set (union-find)
17 parent = Array.from({ length: n }, (_, index) => index);
18
19 // Map to store email to the index of the account it belongs to
20 const emailToIndex: Map<string, number> = new Map();
21
22 // Iterate through each account to union the emails and assign them a common parent index
23 accounts.forEach((account, i) => {
24 account.slice(1).forEach(email => {
25 if (emailToIndex.has(email)) {
26 // Union operation: merge the sets that contain these two emails
27 parent[find(i)] = find(emailToIndex.get(email)!);
28 } else {
29 // Assign the current index to the unique email
30 emailToIndex.set(email, i);
31 }
32 });
33 });
34
35 // Map to store the root index of each account and associated unique emails
36 const mergedAccounts: Map<number, Set<string>> = new Map();
37
38 accounts.forEach((account, i) => {
39 account.slice(1).forEach(email => {
40 const root: number = find(i);
41 if (!mergedAccounts.has(root)) {
42 mergedAccounts.set(root, new Set());
43 }
44 mergedAccounts.get(root)!.add(email);
45 });
46 });
47
48 // Prepare the final result from the merged accounts
49 const result: string[][] = [];
50 mergedAccounts.forEach((emails, i) => {
51 const account: string[] = [accounts[i][0]]; // Add the name of the account owner
52 const sortedEmails: string[] = Array.from(emails).sort(); // Sort the emails
53 result.push(account.concat(sortedEmails)); // Add this list to the results
54 });
55
56 return result;
57}
58
Time and Space Complexity
Time Complexity
The time complexity of the code is primarily determined by the following factors:
-
The iteration over each account and email to build the
email_id
mapping and perform union operations: This isO(N * K)
, whereN
is the number of accounts andK
is the average number of emails per account. -
The path compression in the
find
function helps in keeping the union-find operations nearly constant time, at worst with amortized complexityO(α(N))
, whereα
is the inverse Ackermann function which grows very slowly. -
Iterating over each account again and performing find operations on each email to build the
mp
mapping: This also takesO(N * K)
time, as we are performing essentially constant-time operations per email. -
Sorting of the email lists: In the worst case, if all emails end up in one set, the time complexity would be
O(E log E)
, whereE
is the total number of emails across all accounts. -
Combining the names with the sorted emails: This can be considered
O(E)
as it is linear to the number of emails.
Thus, the overall time complexity would be O(N * K + E log E)
.
Space Complexity
The space complexity is determined by the data structures used:
-
The parent list
p
, which is of sizeO(N)
. -
The
email_id
dictionary, which could contain at mostO(E)
email-to-account mappings. -
The
mp
defaultdict, which can storeO(E)
email entries. -
The output list, which would also contain
O(E)
email entries.
Therefore, the overall space complexity of the code is O(N + E)
.
Learn more about how to find time and space complexity quickly using problem constraints.
Which of the tree traversal order can be used to obtain elements in a binary search tree in sorted order?
Recommended Readings
https algomonster s3 us east 2 amazonaws com cover_photos dfs svg Depth First Search Prereqs Recursion Review problems recursion_intro Trees problems tree_intro With a solid understanding of recursion under our belts we are now ready to tackle one of the most useful techniques in coding interviews Depth First Search DFS
https algomonster s3 us east 2 amazonaws com cover_photos bfs svg Breadth First Search on Trees Hopefully by this time you've drunk enough DFS Kool Aid to understand its immense power and seen enough visualization to create a call stack in your mind Now let me introduce the companion spell
Union Find Disjoint Set Union Data Structure Introduction Prerequisite Depth First Search Review problems dfs_intro Once we have a strong grasp of recursion and Depth First Search we can now introduce Disjoint Set Union DSU This data structure is motivated by the following problem Suppose we have sets of elements
Want a Structured Path to Master System Design Too? Don’t Miss This!