929. Unique Email Addresses
Problem Description
The problem relates to the processing of email addresses in order to identify unique recipients. An email address consists of two parts: a local name and a domain name, separated by '@'. The local part can include periods '.' and plus signs '+'. The periods within the local part have no effect on the email delivery, meaning 'alice.z' and 'alicez' in local parts are effectively the same. Additionally, everything after a plus sign in the local part of the email is disregarded. The domain name, however, is unaffected by these rules.
The goal is to count how many unique email addresses are actually being used for email delivery after applying the mentioned rules. For example, "alice.z@leetcode.com" and "alicez@leetcode.com" count as one unique email address since the period in the local part is ignored.
Given an array of email strings, the task is to return the count of distinct addresses that will actually receive emails, after normalizing them according to the rules.
Intuition
To solve this problem, we need to normalize each email address by removing extra periods and any characters following a plus sign in the local name.
Here's how we can approach it:
- We initialize an empty set to store the unique email addresses.
- We iterate over each email in the list.
- For each email, we split it into local and domain parts using the '@' symbol.
- We remove all periods from the local part.
- If there is a plus sign in the local part, we truncate the local part up to the plus sign, ignoring everything after it.
- We recombine the normalized local part with the domain part, and add the resulting email address to the set.
- Since a set automatically ensures that only unique items are stored, after processing all emails, the size of the set will be equal to the number of unique email addresses that actually receive mails.
- We return this count as the final answer.
By using a set, we avoid having to manually check for duplicates, which simplifies the logic and improves efficiency.
Solution Approach
The implementation in Python uses a set to store unique email addresses that are normalized according to the problem's rules.
-
Set Data Structure: A set is chosen because it only stores unique values, which automatically helps us to keep track of unique emails without any duplicates.
-
String Manipulation: The algorithm uses basic string operations such as
split
,replace
, and slicing to manipulate the local names as per the rules provided. -
Processing Emails:
-
Splitting: Each email is split into a
local
anddomain
part usingemail.split('@')
. This separates the local name from the domain name. -
Removing Periods: All periods are removed from the local part with
local.replace('.', '')
, since periods do not change the destination of the email. -
Handling Plus Sign: If a plus sign is present in the local part, it finds the index of the first plus sign using
local.find('+')
. The local part is then truncated up to that index (not including the plus and everything after it) withlocal[:i]
, wherei
is the index of the plus sign. -
Reassembly: The cleaned local part is concatenated with the domain part using
'@'
to form the normalized email. -
Adding to Set: The resulting email address is added to the set. Python takes care of checking for uniqueness.
-
-
Return Unique Count: The length of the set,
len(s)
, reflects the number of unique normalized emails, which is the desired output.
This algorithm effectively streamlines the email normalization while respecting the defined rules and ensures that we are able to calculate the unique recipient emails in an efficient manner.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's take a small example to illustrate the solution approach. Suppose we have the following array of email strings:
["test.email+alex@leetcode.com", "test.e.mail+bob.cathy@leetcode.com", "testemail+david@lee.tcode.com"]
We want to find out how many unique email addresses there are after normalizing these emails according to the rules described.
-
We declare a set to keep track of unique normalized emails:
unique_emails = set()
. -
Start iterating over each email:
-
For the first email,
test.email+alex@leetcode.com
:- Split into
local = "test.email+alex"
anddomain = "leetcode.com"
. - Remove periods in the local part:
local = local.replace('.', '')
resulting inlocal = "testemail+alex"
. - Look for a plus sign and truncate everything after it:
local = local[:local.find('+')]
resulting inlocal = "testemail"
. - Combine the modified local part with the domain part:
normalized_email = "testemail@leetcode.com"
. - Add the normalized email to the set:
unique_emails.add(normalized_email)
.
- Split into
-
Repeat the steps for the other emails:
- Second email
test.e.mail+bob.cathy@leetcode.com
becomestestemail@leetcode.com
after normalization. - Third email
testemail+david@lee.tcode.com
remains astestemail@lee.tcode.com
because the domain is different.
- Second email
-
-
After processing all emails, we have the set
unique_emails = {"testemail@leetcode.com", "testemail@lee.tcode.com"}
. -
The length of the set is 2, meaning there are two unique email addresses.
So the function would return 2
for this given example, illustrating that after normalizing the given email addresses, there are two unique emails that would receive messages.
Solution Implementation
1class Solution:
2 def numUniqueEmails(self, emails: List[str]) -> int:
3 # Initialize a set to keep track of unique emails
4 unique_emails = set()
5
6 # Iterate over each email in the list
7 for email in emails:
8 # Split the email into local and domain parts
9 local, domain = email.split('@')
10 # Remove all occurrences of '.'
11 local = local.replace('.', '')
12 # Find the index of '+' to ignore everything after it in the local part
13 plus_index = local.find('+')
14 # If '+' is present, keep only the part before it
15 if plus_index != -1:
16 local = local[:plus_index]
17 # Combine the processed local part with the domain and add to the set
18 unique_emails.add(local + '@' + domain)
19
20 # The length of the set represents the count of unique emails
21 return len(unique_emails)
22
1class Solution {
2
3 public int numUniqueEmails(String[] emails) {
4 // Create a hash set to store unique emails
5 Set<String> uniqueEmails = new HashSet<>();
6
7 // Iterate over each email in the provided array
8 for (String email : emails) {
9 // Split the email into local and domain parts
10 String[] parts = email.split("@");
11 String localPart = parts[0];
12 String domainPart = parts[1];
13
14 // Remove all periods from the local part
15 localPart = localPart.replace(".", "");
16
17 // Check for a '+' character and cut off any substring after it in the local part
18 int plusIndex = localPart.indexOf('+');
19 if (plusIndex != -1) {
20 localPart = localPart.substring(0, plusIndex);
21 }
22
23 // Combine the processed local part with the domain part
24 String cleanEmail = localPart + "@" + domainPart;
25
26 // Add the clean email to the set (duplicates are ignored)
27 uniqueEmails.add(cleanEmail);
28 }
29
30 // Return the number of unique emails found, which is the size of the set
31 return uniqueEmails.size();
32 }
33}
34
1#include <string>
2#include <vector>
3#include <unordered_set>
4using namespace std;
5
6class Solution {
7public:
8 int numUniqueEmails(vector<string>& emails) {
9 // Use an unordered set to store unique emails.
10 unordered_set<string> uniqueEmails;
11
12 // Process each email in the input vector.
13 for (auto& email : emails) {
14 // Find the position of '@' which separates local and domain parts.
15 size_t atPosition = email.find('@');
16
17 // Extract the local part and domain part based on the position of '@'.
18 string localPart = email.substr(0, atPosition);
19 string domainPart = email.substr(atPosition);
20
21 // Find the position of '+' which indicates the end of the relevant local part.
22 size_t plusPosition = localPart.find('+');
23 if (plusPosition != string::npos) {
24 // Keep only the relevant part of the local part, before '+'.
25 localPart = localPart.substr(0, plusPosition);
26 }
27
28 // Remove all the occurrences of '.' from the local part.
29 size_t dotPosition = localPart.find('.');
30 while (dotPosition != string::npos) {
31 localPart.erase(dotPosition, 1);
32 dotPosition = localPart.find('.');
33 }
34
35 // Combine the sanitized local part with the domain part and insert into set.
36 uniqueEmails.insert(localPart + domainPart);
37 }
38
39 // Return the size of the set, which represents the number of unique emails.
40 return uniqueEmails.size();
41 }
42};
43
1function numUniqueEmails(emails: string[]): number {
2 // Create a new Set to store the unique emails
3 return new Set(
4 emails
5 .map(email => {
6 // For each email, split it into two parts [local part, domain part]
7 const parts = email.split('@');
8 const localPart = parts[0];
9 const domain = parts[1];
10 // Remove any occurrences of '.' and anything following '+'
11 const sanitizedLocalPart = localPart.replace(/\+.*|\./g, '');
12 // Concatenate the sanitized local part with domain
13 return sanitizedLocalPart + '@' + domain;
14 }),
15 ).size; // Return the size of the Set, which is the count of unique emails
16}
17
18// The function numUniqueEmails takes an array of email strings,
19// sanitizes them by removing the local part's dots and ignoring
20// any characters after a '+', and then counts unique email addresses.
21
Time and Space Complexity
The above Python code processes a list of email addresses to find the number of unique email addresses after normalizing them based on the specific rules provided.
Time Complexity:
Let n
be the number of email addresses and k
be the average length of these email addresses.
- Splitting the email into local and domain strings takes
O(k)
time for each email. - Replacing dots in the local part is
O(k)
as it requires traversing the local part. - The find operation for '+' is also in
O(k)
because finding a character in a string takes linear time in the worst case. - Slicing the local part up to '+' is at worst
O(k)
since in the worst case the '+' could be at the end of the local part. - Adding the processed email to set
s
can be consideredO(1)
on average due to hash set operation, but the worst case (though rare due to proper hashing) for insertion isO(k)
when there's a hash collision and it needs to traverse a linked list in the set bucket.
Therefore, each email address takes O(k)
time to process, and hence the total time for processing all email addresses is O(n*k)
.
Space Complexity:
- The set
s
will contain at mostn
elements if all normalized email addresses are unique. Since each email address isO(k)
, the total space taken by the set isO(n*k)
in the worst case. - Auxiliary space is used for storing the local and domain parts of each email address along with storing the sliced local part when '+' is found. These do not depend on the number of emails, just on the length of the emails, so they are
O(k)
each.
Hence, the overall space complexity is O(n*k)
if the space taken by the input list is not considered. If considering the space for input, the total space complexity remains O(n*k)
since this is the dominating term.
Learn more about how to find time and space complexity quickly using problem constraints.
Which of the tree traversal order can be used to obtain elements in a binary search tree in sorted order?
Recommended Readings
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Want a Structured Path to Master System Design Too? Don’t Miss This!