721. Accounts Merge

MediumDepth-First SearchBreadth-First SearchUnion FindArrayHash TableStringSorting

Problem Description

The problem presents a scenario where we are given a list of accounts, each containing a user's name followed by various email addresses that belong to that account. The first item in each sublist is the user's name, and the consecutive items are the emails associated with that user.

The objective is to merge accounts that belong to the same person. The conditions that define if two accounts belong to the same person are based on if there is at least one common email between the two accounts. It is important to note that individuals may share the same name but still be different people, so the name alone isn't a unique identifier. After merging, we seek rearranged accounts where each sublist starts with the username, followed by a sorted list of unique email addresses. The final list of accounts can be returned in any order but should maintain the format of name followed by sorted emails.

Flowchart Walkthrough

Let's analyze the algorithm for LeetCode problem 721. Accounts Merge using the Flowchart. We'll determine the appropriate algorithm through the following step-by-step assessment:

Is it a graph?

Yes: Each account can be viewed as a node, and there is an edge between two nodes if they share an email address.

Is it a tree?

No: The structure formed by accounts and their shared emails represents a general graph, which isn't necessarily hierarchical or lacking in cycles like a tree.

Is the problem related to directed acyclic graphs (DAGs)?

No: In this problem, the direction of edges doesn’t matter. It’s concerned with groups of connected nodes (accounts connected via shared emails), not hierarchical or topologically ordered nodes.

Is the problem related to shortest paths?

No: The primary concern is to merge accounts connected by common emails, not finding shortest paths.

Does the problem involve connectivity?

Yes: The main challenge is to find all accounts that are connected by shared emails, which essentially means finding all connected components in the graph.

Is the graph weighted?

No: There are no weights involved in this connectivity problem. The connections are simply whether or not two accounts share an email.

Conclusion: According to the flowchart, for an unweighted connectivity problem, the suggested approach is BFS. However, in a typical implementation of account merges where deep relationships (recursive connections) may need to be explored, DFS is equally viable and commonly used due to its recursive nature which can be more intuitive when dealing with connected components in an undirected graph like this one. Both BFS and DFS will effectively achieve the goal of identifying connected components in this case.

Intuition

The key intuition behind the solution is rooted in a union-find algorithm, also known as disjoint set union (DSU), which is a data structure that keeps track of a set of elements partitioned into multiple, non-overlapping subsets. It provides a very efficient way to test the belonging of elements and to unite the elements if they belong to different sets.

Step by step, the approach behind the solution is:

Initialize each account as its own set by mapping it to an index.
Iterate through each account and each email within that account. For each email, we check if we've seen it before.
If we have, it means this email is common between accounts, and we need to merge these accounts (sets) into one. This is done by pointing the index of the current account to the index of the previously seen account with the same email. We achieve this by using the union-find algorithm.
If we have not seen the email before, we mark this email as visited and link it to the current account index.
Once all accounts are iterated through, we utilize a map that gathers all emails belonging to the same set (as identified by the find operation) together.
Lastly, we iterate through the map of merged accounts: for each set of emails, we first extract the account index, prepend the account name (since all emails in the set must belong to the same person), and sort the collection of emails.
We add this structured data to our answer in the specified format of [name, email1, email2, ...] per sublist.

By following the union-find data structure, the solution seamlessly merges accounts that share common emails and ensures that each merged account carries the correct name and set of emails before finalizing the sorted results.

Learn more about Depth-First Search, Breadth-First Search, Union Find and Sorting patterns.

Quick Interview Experience

Help others by sharing your interview experience

Have you seen this problem before?

Solution Approach

The implementation of the solution leverages the Union-Find algorithm and a hash map to efficiently merge accounts. Below is a step-by-step explanation of the implementation strategy based on the given Python code.

Initialization: Begin by preparing the union-find structure. n, the number of accounts, determines the initial parent array p, where each index from 0 to n - 1 is initialized to its own value, indicating that each account is currently its own leader in a singleton set.
Union Operation: For each account and each email in the account, check if the email has been seen before (whether it exists in the email_id hash map).
- If the email exists, perform a union operation by setting the root (parent) of the current account index (find(i)) to be the same as the root of the account index previously seen with this email (find(email_id[email])).
- If the email does not exist, record this email and its account index by putting it in the email_id hash map.
Find Operation: The find function is a typical union-find operation that follows the chain of parent pointers from x up the tree until it reaches a root element, which is the element whose parent is itself. Path compression is used to make future searches more efficient, by making each looked-up element point directly to the root.
Building the Merged Account List: Create a mp hash map (defaultdict(set)) where each key is the root parent index of a set, and the value is a set of all emails associated with that parent index (collected from the accounts that have been merged together). For each account and email, find the root parent using the find function and add the email to the set associated with the root parent in mp.
Sorting and Finalizing the Output: For each unique root parent index in mp, start a new list with the name (from the original accounts) and then sort the set of emails collected. This sorted list of emails is appended to the list containing the name. Each such list contains the merged account information, and these lists are appended to the ans list.

By using these data structures and algorithms, the solution efficiently identifies and merges accounts. Union-find facilitates quick unions and lookups during merges, and the hashmap mp helps in collecting and sorting emails for each unique user.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's consider the following small example to illustrate the solution approach:

Suppose we have the following list of accounts:

John’s first account: ["John", "johnsmith@mail.com", "john_newyork@mail.com"]
John’s second account: ["John", "johnsmith@mail.com", "john00@mail.com"]
Mary’s account: ["Mary", "mary@mail.com"]

We need to merge John's two accounts into one because they both contain the email "johnsmith@mail.com", which indicates they belong to the same person.

Following the solution approach:

Initialization: We initialize a parent array p where p[0] = 0, p[1] = 1, and p[2] = 2. Each account is its own set leader.
Union Operation:
- For John's first account (index 0), we add "johnsmith@mail.com" and "john_newyork@mail.com" to the email_id map with value 0.
- For John's second account (index 1), we find "johnsmith@mail.com" already in the email_id map associated with index 0, so we unite index 1 with index 0 by setting p[1] to 0.
- "john00@mail.com" is a new email and is added to the email_id map with value 1.
Find Operation:
- The find function updates the parent array p as it encounters union operations. After the union operation, both John's accounts would have the same root, 0.
Building the Merged Account List:
- We create a mp hashmap where keys are the root parent index and values are sets of emails.
- "johnsmith@mail.com" and "john_newyork@mail.com" are added to mp[0].
- Since "john00@mail.com" has been united under 0, it also gets added to mp[0].
- "mary@mail.com" is added to mp[2].
Sorting and Finalizing the Output:
- For John's merged account (mp[0]), we prepend the name "John" and sort the emails: ["John", "john00@mail.com", "john_newyork@mail.com", "johnsmith@mail.com"].
- For Mary's account (mp[2]), it remains as it was since no merge happened: ["Mary", "mary@mail.com"].

The final answer ans will have the merged lists for both users in any order:

[["John", "john00@mail.com", "john_newyork@mail.com", "johnsmith@mail.com"], ["Mary", "mary@mail.com"]]

Solution Implementation

1from collections import defaultdict
2
3class Solution:
4    def accountsMerge(self, accounts: List[List[str]]) -> List[List[str]]:
5        # Helper function to find the root parent for any index.
6        def find(index: int) -> int:
7            if parent[index] != index:
8                parent[index] = find(parent[index])
9            return parent[index]
10      
11        # The number of accounts.
12        num_accounts = len(accounts)
13      
14        # Initialize a list where the value at index i is the parent of i.
15        parent = list(range(num_accounts))
16      
17        # Dictionary to map each email to its account index.
18        email_to_account_id = {}
19      
20        # Assign parents for accounts with shared emails.
21        for account_id, account in enumerate(accounts):
22            for email in account[1:]:
23                if email in email_to_account_id:
24                    # Union operation: set the parent of the current account to the account 
25                    # already associated with this email.
26                    parent[find(account_id)] = find(email_to_account_id[email])
27                else:
28                    email_to_account_id[email] = account_id
29      
30        # Map each account index to a set of emails under the corresponding parent.
31        emails_under_parent_account = defaultdict(set)
32        for account_index, account in enumerate(accounts):
33            for email in account[1:]:
34                emails_under_parent_account[find(account_index)].add(email)
35      
36        # Compile the merged accounts: one per parent index.
37        merged_accounts = []
38        for parent_index, emails in emails_under_parent_account.items():
39            # Sort the emails.
40            sorted_emails = sorted(emails)
41            # Prepend the account name.
42            account_name = [accounts[parent_index][0]]
43            merged_account = account_name + sorted_emails
44            # Append the merged account to the result.
45            merged_accounts.append(merged_account)
46      
47        return merged_accounts
48
49# Note: List should be imported from typing to use it as a type hint in the function signature.
50# from typing import List
51

1import java.util.*;
2
3public class Solution {
4
5    // Parent array for storing the parent index of each disjoint set (union-find)
6    private int[] parent;
7
8    // Function to find the disjoint parent of a node
9    private int find(int x) {
10        if (parent[x] != x) {
11            // Path compression - Directly attaching x node to its root parent
12            parent[x] = find(parent[x]);
13        }
14        return parent[x];
15    }
16
17    // The main method for merging accounts
18    public List<List<String>> accountsMerge(List<List<String>> accounts) {
19        int n = accounts.size();
20        parent = new int[n];
21
22        // Initialize the parent array with each node as its own parent
23        for (int i = 0; i < n; i++) {
24            parent[i] = i;
25        }
26
27        // Map to store the account index for a unique email
28        Map<String, Integer> emailToIndex = new HashMap<>();
29
30        // Iterate over accounts to find and union the emails
31        for (int i = 0; i < n; i++) {
32            List<String> currentAccount = accounts.get(i);
33            for (int j = 1; j < currentAccount.size(); j++) {
34                String email = currentAccount.get(j);
35                if (emailToIndex.containsKey(email)) {
36                    // Perform union operation by setting the parent
37                    parent[find(i)] = find(emailToIndex.get(email));
38                } else {
39                    // If the email is unique, map it to the current account index
40                    emailToIndex.put(email, i);
41                }
42            }
43        }
44
45        // Map to combine emails that belong to the same owner
46        Map<Integer, Set<String>> mergedAccounts = new HashMap<>();
47        for (int i = 0; i < n; i++) {
48            for (int j = 1; j < accounts.get(i).size(); j++) {
49                String email = accounts.get(i).get(j);
50                // Find the set's root and add the email
51                int root = find(i);
52                if (!mergedAccounts.containsKey(root)) {
53                    mergedAccounts.put(root, new TreeSet<String>());
54                }
55                mergedAccounts.get(root).add(email);
56            }
57        }
58
59        // Convert the sets of emails back to the required output format
60        List<List<String>> result = new ArrayList<>();
61        for (Integer i : mergedAccounts.keySet()) {
62            List<String> account = new ArrayList<>();
63            account.add(accounts.get(i).get(0)); // Add the account name
64            account.addAll(mergedAccounts.get(i)); // Add all merged emails
65            result.add(account); // Add to final results
66        }
67
68        return result;
69    }
70}
71

1#include <vector>
2#include <string>
3#include <unordered_map>
4#include <unordered_set>
5#include <algorithm>
6
7class Solution {
8public:
9    // The parent vector for storing the disjoint set (union-find) representatives
10    vector<int> parent;
11
12    // The method to find the unique email accounts and merge them if they belong to the same user
13    vector<vector<string>> accountsMerge(vector<vector<string>>& accounts) {
14        int n = accounts.size();
15        // Initialize the parent array for union-find
16        parent.resize(n);
17        for (int i = 0; i < n; ++i) parent[i] = i;
18      
19        // Map to store email to the index of the account it belongs to
20        unordered_map<string, int> emailToIdx;
21      
22        // Iterate through each account to union emails and assign them a common parent index
23        for (int i = 0; i < n; ++i) {
24            for (int j = 1; j < accounts[i].size(); ++j) {
25                string email = accounts[i][j];
26                if (emailToIdx.count(email)) {
27                    // Union operation: merge the sets containing these two emails
28                    parent[find(i)] = find(emailToIdx[email]);
29                } else {
30                    // Assign the current index to the unique email
31                    emailToIdx[email] = i;
32                }
33            }
34        }
35      
36        // Map to store the root index of each account and the associated unique emails
37        unordered_map<int, unordered_set<string>> mergedAccounts;
38        for (int i = 0; i < n; ++i) {
39            for (int j = 1; j < accounts[i].size(); ++j) {
40                string email = accounts[i][j];
41                // Insert the email into the set that belongs to the root parent of i
42                mergedAccounts[find(i)].insert(email);
43            }
44        }
45      
46        // Prepare the final result from the merged accounts
47        vector<vector<string>> result;
48        for (auto& [i, emails] : mergedAccounts) {
49            vector<string> account;
50            account.push_back(accounts[i][0]); // Add the name of the account owner
51            account.insert(account.end(), emails.begin(), emails.end()); // Add all the merged emails
52            sort(account.begin() + 1, account.end()); // Sort the emails
53            result.push_back(account); // Add this list to the results
54        }
55      
56        return result;
57    }
58
59    // The method to find the representative of the set that element x belongs to
60    int find(int x) {
61        if (parent[x] != x) {
62            // Path compression: update the parent to the representative of the set for faster future lookups
63            parent[x] = find(parent[x]);
64        }
65        return parent[x];
66    }
67};
68

1// Emulating a vector using an array
2let parent: number[] = [];
3
4// Helper function to find the representative of the set that element x belongs to
5function find(x: number): number {
6    if (parent[x] !== x) {
7        // Path compression: update the parent to the representative for faster lookups
8        parent[x] = find(parent[x]);
9    }
10    return parent[x];
11}
12
13// Function to merge email accounts
14function accountsMerge(accounts: string[][]): string[][] {
15    const n: number = accounts.length;
16    // Initialize the parent array for the disjoint set (union-find)
17    parent = Array.from({ length: n }, (_, index) => index);
18
19    // Map to store email to the index of the account it belongs to
20    const emailToIndex: Map<string, number> = new Map();
21
22    // Iterate through each account to union the emails and assign them a common parent index
23    accounts.forEach((account, i) => {
24        account.slice(1).forEach(email => {
25            if (emailToIndex.has(email)) {
26                // Union operation: merge the sets that contain these two emails
27                parent[find(i)] = find(emailToIndex.get(email)!);
28            } else {
29                // Assign the current index to the unique email
30                emailToIndex.set(email, i);
31            }
32        });
33    });
34
35    // Map to store the root index of each account and associated unique emails
36    const mergedAccounts: Map<number, Set<string>> = new Map();
37
38    accounts.forEach((account, i) => {
39        account.slice(1).forEach(email => {
40            const root: number = find(i);
41            if (!mergedAccounts.has(root)) {
42                mergedAccounts.set(root, new Set());
43            }
44            mergedAccounts.get(root)!.add(email);
45        });
46    });
47
48    // Prepare the final result from the merged accounts
49    const result: string[][] = [];
50    mergedAccounts.forEach((emails, i) => {
51        const account: string[] = [accounts[i][0]]; // Add the name of the account owner
52        const sortedEmails: string[] = Array.from(emails).sort(); // Sort the emails
53        result.push(account.concat(sortedEmails)); // Add this list to the results
54    });
55
56    return result;
57}
58

Time and Space Complexity

Time Complexity

The time complexity of the code is primarily determined by the following factors:

The iteration over each account and email to build the email_id mapping and perform union operations: This is O(N * K), where N is the number of accounts and K is the average number of emails per account.
The path compression in the find function helps in keeping the union-find operations nearly constant time, at worst with amortized complexity O(α(N)), where α is the inverse Ackermann function which grows very slowly.
Iterating over each account again and performing find operations on each email to build the mp mapping: This also takes O(N * K) time, as we are performing essentially constant-time operations per email.
Sorting of the email lists: In the worst case, if all emails end up in one set, the time complexity would be O(E log E), where E is the total number of emails across all accounts.
Combining the names with the sorted emails: This can be considered O(E) as it is linear to the number of emails.

Thus, the overall time complexity would be O(N * K + E log E).

Space Complexity

The space complexity is determined by the data structures used:

The parent list p, which is of size O(N).
The email_id dictionary, which could contain at most O(E) email-to-account mappings.
The mp defaultdict, which can store O(E) email entries.
The output list, which would also contain O(E) email entries.

Therefore, the overall space complexity of the code is O(N + E).

Learn more about how to find time and space complexity quickly using problem constraints.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which of the following problems can be solved with backtracking (select multiple)

Generating subsets

Generating random numbers

Generating permutations

Sorting integers