929. Unique Email Addresses
Problem Description
You are given an array of email addresses and need to determine how many unique addresses will actually receive emails after applying certain forwarding rules.
Each valid email has two parts separated by the @
sign:
- Local name (before
@
) - e.g., "alice" in "alice@leetcode.com" - Domain name (after
@
) - e.g., "leetcode.com" in "alice@leetcode.com"
The email system applies two special rules to the local name only (these rules do NOT apply to the domain name):
-
Dots (
.
) are ignored: Any periods in the local name are removed when determining the actual recipient.- Example: "alice.z@leetcode.com" and "alicez@leetcode.com" go to the same address
-
Plus sign (
+
) truncates: Everything after the first+
sign in the local name is ignored.- Example: "m.y+name@email.com" forwards to "my@email.com"
Both rules can be applied together. For instance:
- "a.b+xyz@email.com" becomes "ab@email.com"
- "a.b.c+test@email.com" becomes "abc@email.com"
Given an array emails
containing email addresses, return the count of distinct addresses that will actually receive mail after applying these forwarding rules.
Example:
If the input is ["test.email+alex@leetcode.com", "test.e.mail+bob.cathy@leetcode.com", "testemail+david@lee.tcode.com"]
:
- First email becomes: "testemail@leetcode.com"
- Second email becomes: "testemail@leetcode.com" (same as first)
- Third email becomes: "testemail@lee.tcode.com" (different domain)
The answer would be 2 unique receiving addresses.
Intuition
The key insight is that we need to normalize each email address to its canonical form (the actual receiving address) and then count how many distinct canonical forms exist.
Since multiple email addresses can map to the same receiving address due to the forwarding rules, we're essentially dealing with a many-to-one mapping problem. The natural data structure for tracking unique items is a set, which automatically handles duplicates for us.
The transformation process for each email is straightforward:
- Split the email into local and domain parts at the
@
symbol - Process only the local part by applying the two rules
- Reconstruct the canonical email address
For processing the local part, we need to:
- Skip any
.
characters we encounter - Stop processing entirely when we hit a
+
character - Keep all other characters
By iterating through each character of the local part, we can build the cleaned version character by character. Once we encounter a +
, we immediately stop processing since everything after it is ignored.
After processing all emails and adding their canonical forms to a set, the size of the set gives us the count of unique receiving addresses. The set automatically ensures that duplicate canonical forms (like "testemail@leetcode.com" appearing multiple times) are counted only once.
This approach is efficient because we process each email exactly once with O(n)
time complexity for each email string, and the set operations (add and size check) are O(1)
on average.
Solution Approach
We use a hash table (set) to store all unique email addresses after normalization. Here's the step-by-step implementation:
-
Initialize a set
s
to store unique canonical email addresses. -
Process each email in the input array:
- Split the email into
local
anddomain
parts usingemail.split("@")
- The
@
symbol serves as our delimiter to separate these two components
- Split the email into
-
Normalize the local part:
- Create an empty list
t
to build the processed local name character by character - Iterate through each character
c
in the local part:- If
c == "."
: Skip it (continue to next character) - If
c == "+"
: Break the loop (ignore everything after) - Otherwise: Append the character to list
t
- If
- Create an empty list
-
Reconstruct the canonical email:
- Join the characters in list
t
to form the processed local part:"".join(t)
- Concatenate with the domain:
"".join(t) + "@" + domain
- Add this canonical form to the set
s
- Join the characters in list
-
Return the result: The length of set
s
gives us the count of unique receiving addresses.
The algorithm efficiently handles both rules simultaneously:
- Dots are filtered out during the character-by-character processing
- The plus rule is enforced by breaking the loop as soon as we encounter a
+
Time Complexity: O(n × m)
where n
is the number of emails and m
is the average length of each email string.
Space Complexity: O(n × m)
in the worst case where all emails are unique and we store them all in the set.
The use of a set is crucial here as it automatically handles duplicate canonical forms, ensuring each unique address is counted exactly once.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example with the input: ["a.b+c@mail.com", "ab@mail.com", "a.b+xyz@mail.com"]
Step 1: Initialize an empty set
s = {}
Step 2: Process first email "a.b+c@mail.com"
- Split at
@
: local = "a.b+c", domain = "mail.com" - Process local part character by character:
'a'
→ not.
or+
, add to list:t = ['a']
'.'
→ skip it:t = ['a']
'b'
→ not.
or+
, add to list:t = ['a', 'b']
'+'
→ stop processing (ignore 'c')
- Join and reconstruct:
"".join(['a','b'])
+ "@mail.com" = "ab@mail.com" - Add to set:
s = {"ab@mail.com"}
Step 3: Process second email "ab@mail.com"
- Split at
@
: local = "ab", domain = "mail.com" - Process local part:
'a'
→ add to list:t = ['a']
'b'
→ add to list:t = ['a', 'b']
- Join and reconstruct: "ab@mail.com"
- Add to set:
s = {"ab@mail.com"}
(no change, already exists)
Step 4: Process third email "a.b+xyz@mail.com"
- Split at
@
: local = "a.b+xyz", domain = "mail.com" - Process local part:
'a'
→ add to list:t = ['a']
'.'
→ skip it:t = ['a']
'b'
→ add to list:t = ['a', 'b']
'+'
→ stop processing (ignore 'xyz')
- Join and reconstruct: "ab@mail.com"
- Add to set:
s = {"ab@mail.com"}
(no change, already exists)
Step 5: Return result
- Length of set = 1
All three emails forward to the same address "ab@mail.com", so the answer is 1.
Solution Implementation
1class Solution:
2 def numUniqueEmails(self, emails: List[str]) -> int:
3 """
4 Count the number of unique email addresses after applying rules:
5 - Dots (.) in local name are ignored
6 - Everything after plus (+) in local name is ignored
7 - Domain name remains unchanged
8
9 Args:
10 emails: List of email addresses
11
12 Returns:
13 Number of unique email addresses
14 """
15 unique_emails = set()
16
17 for email in emails:
18 # Split email into local name and domain name
19 local_name, domain_name = email.split("@")
20
21 # Process the local name according to rules
22 processed_local = []
23 for char in local_name:
24 # Ignore dots in local name
25 if char == ".":
26 continue
27 # Stop processing at plus sign
28 if char == "+":
29 break
30 # Add valid characters to processed local name
31 processed_local.append(char)
32
33 # Construct the normalized email and add to set
34 normalized_email = "".join(processed_local) + "@" + domain_name
35 unique_emails.add(normalized_email)
36
37 return len(unique_emails)
38
1class Solution {
2 public int numUniqueEmails(String[] emails) {
3 // Use a HashSet to store unique normalized email addresses
4 Set<String> uniqueEmails = new HashSet<>();
5
6 // Process each email address
7 for (String email : emails) {
8 // Split email into local name and domain parts
9 String[] emailParts = email.split("@");
10 String localName = emailParts[0];
11 String domainName = emailParts[1];
12
13 // Build the normalized local name
14 StringBuilder normalizedLocal = new StringBuilder();
15
16 // Process each character in the local name
17 for (char character : localName.toCharArray()) {
18 // Skip dots in the local name
19 if (character == '.') {
20 continue;
21 }
22 // Stop processing when we encounter a plus sign
23 if (character == '+') {
24 break;
25 }
26 // Append valid characters to the normalized local name
27 normalizedLocal.append(character);
28 }
29
30 // Construct the normalized email and add to set
31 String normalizedEmail = normalizedLocal.toString() + "@" + domainName;
32 uniqueEmails.add(normalizedEmail);
33 }
34
35 // Return the count of unique email addresses
36 return uniqueEmails.size();
37 }
38}
39
1class Solution {
2public:
3 int numUniqueEmails(vector<string>& emails) {
4 // Use a set to store unique normalized email addresses
5 unordered_set<string> uniqueEmails;
6
7 // Process each email address
8 for (const string& email : emails) {
9 // Find the position of '@' to separate local and domain parts
10 size_t atPosition = email.find('@');
11
12 // Extract local part (before '@') and domain part (after '@')
13 string localPart = email.substr(0, atPosition);
14 string domainPart = email.substr(atPosition + 1);
15
16 // Build the normalized local part
17 string normalizedLocal;
18 for (char ch : localPart) {
19 // Skip dots in the local part
20 if (ch == '.') {
21 continue;
22 }
23 // Stop processing when '+' is encountered
24 if (ch == '+') {
25 break;
26 }
27 // Add valid characters to the normalized local part
28 normalizedLocal.push_back(ch);
29 }
30
31 // Combine normalized local part with domain and add to set
32 string normalizedEmail = normalizedLocal + "@" + domainPart;
33 uniqueEmails.insert(normalizedEmail);
34 }
35
36 // Return the count of unique email addresses
37 return uniqueEmails.size();
38 }
39};
40
1/**
2 * Counts the number of unique email addresses after applying Gmail-style rules
3 * @param emails - Array of email addresses to process
4 * @returns Number of unique email addresses
5 */
6function numUniqueEmails(emails: string[]): number {
7 // Set to store unique normalized email addresses
8 const uniqueEmails = new Set<string>();
9
10 // Process each email address
11 for (const email of emails) {
12 // Split email into local and domain parts
13 const [localPart, domainPart] = email.split('@');
14
15 // Build the normalized local part
16 let normalizedLocal = '';
17
18 // Process each character in the local part
19 for (const char of localPart) {
20 // Ignore dots in the local part
21 if (char === '.') {
22 continue;
23 }
24
25 // Stop processing at the first plus sign
26 if (char === '+') {
27 break;
28 }
29
30 // Add valid characters to the normalized local part
31 normalizedLocal += char;
32 }
33
34 // Combine normalized local part with domain and add to set
35 const normalizedEmail = normalizedLocal + '@' + domainPart;
36 uniqueEmails.add(normalizedEmail);
37 }
38
39 // Return the count of unique emails
40 return uniqueEmails.size;
41}
42
Time and Space Complexity
Time Complexity: O(L)
The algorithm iterates through each email address once. For each email, it:
- Splits the email at "@" which takes
O(n)
wheren
is the length of that email - Iterates through each character in the local part to process dots and plus signs, which is
O(n)
in the worst case - Joins the processed characters and concatenates with domain, which is
O(n)
- Adds the result to a set, which is
O(n)
for the string hashing
Since we process each character of each email at most a constant number of times, and L
represents the total length of all email addresses, the overall time complexity is O(L)
.
Space Complexity: O(L)
The space usage includes:
- The set
s
which stores unique processed emails. In the worst case where all emails are unique after processing, this could store up toO(L)
characters total - The temporary list
t
for building the processed local part, which can be at mostO(n)
for a single email - The split operation creates new string objects for local and domain parts
The dominant factor is the set storing all unique emails, which in the worst case could contain all the input data. Therefore, the space complexity is O(L)
where L
is the total length of all email addresses.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Applying Rules to the Domain Name
A critical mistake is accidentally applying the dot removal or plus truncation rules to the domain part of the email. The rules ONLY apply to the local name (before the @
).
Incorrect approach:
# Wrong: This removes dots from the entire email email = email.replace(".", "") # This would turn "test@lee.code.com" into "test@leecodecom"
Correct approach:
local_name, domain_name = email.split("@") # Process only local_name, keep domain_name unchanged
2. Multiple @ Signs in Email
If an email contains multiple @
signs (though technically invalid), using split("@")
without limits can cause unpacking errors.
Problem scenario:
# If email = "test@middle@domain.com" local_name, domain_name = email.split("@") # ValueError: too many values to unpack
Solution:
# Limit split to first @ sign only
parts = email.split("@", 1) # Split at most once
if len(parts) == 2:
local_name, domain_name = parts
3. Order of Operations: Removing Dots After Plus
Processing the plus sign before removing dots can lead to incorrect results if you're using string replacement methods.
Incorrect approach:
# Wrong order if using replace local_name = local_name.split("+")[0] # First handle plus local_name = local_name.replace(".", "") # Then remove dots # This works but is less efficient than single-pass iteration
Better approach (as shown in solution): Process character by character in a single pass, handling both rules simultaneously.
4. Using List Instead of Set
Using a list to store unique emails requires manual duplicate checking, making the solution inefficient.
Inefficient approach:
unique_emails = [] for email in emails: normalized = process_email(email) if normalized not in unique_emails: # O(n) lookup unique_emails.append(normalized)
Efficient approach:
unique_emails = set() # O(1) average case for add and lookup
unique_emails.add(normalized_email)
5. Not Handling Edge Cases
Failing to consider edge cases like empty local names after processing or emails without @
signs.
Potential edge cases to consider:
- Email:
"+test@domain.com"
→ Results in empty local name - Email:
"...@domain.com"
→ All dots removed, empty local name - Email:
"nodomain"
→ No @ sign present
Robust solution addition:
if "@" not in email: continue # Skip invalid emails # After processing local_name if not processed_local: # Empty local name continue # Or handle according to requirements
What is an advantages of top-down dynamic programming vs bottom-up dynamic programming?
Recommended Readings
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Want a Structured Path to Master System Design Too? Don’t Miss This!