929. Unique Email Addresses

EasyArrayHash TableString

Problem Description

You are given an array of email addresses and need to determine how many unique addresses will actually receive emails after applying certain forwarding rules.

Each valid email has two parts separated by the @ sign:

Local name (before @) - e.g., "alice" in "alice@leetcode.com"
Domain name (after @) - e.g., "leetcode.com" in "alice@leetcode.com"

The email system applies two special rules to the local name only (these rules do NOT apply to the domain name):

Dots (.) are ignored: Any periods in the local name are removed when determining the actual recipient.
- Example: "alice.z@leetcode.com" and "alicez@leetcode.com" go to the same address
Plus sign (+) truncates: Everything after the first + sign in the local name is ignored.
- Example: "m.y+name@email.com" forwards to "my@email.com"

Both rules can be applied together. For instance:

"a.b+xyz@email.com" becomes "ab@email.com"
"a.b.c+test@email.com" becomes "abc@email.com"

Given an array emails containing email addresses, return the count of distinct addresses that will actually receive mail after applying these forwarding rules.

Example: If the input is ["test.email+alex@leetcode.com", "test.e.mail+bob.cathy@leetcode.com", "testemail+david@lee.tcode.com"]:

First email becomes: "testemail@leetcode.com"
Second email becomes: "testemail@leetcode.com" (same as first)
Third email becomes: "testemail@lee.tcode.com" (different domain)

The answer would be 2 unique receiving addresses.

Quick Interview Experience

Help others by sharing your interview experience

Have you seen this problem before?

Intuition

The key insight is that we need to normalize each email address to its canonical form (the actual receiving address) and then count how many distinct canonical forms exist.

Since multiple email addresses can map to the same receiving address due to the forwarding rules, we're essentially dealing with a many-to-one mapping problem. The natural data structure for tracking unique items is a set, which automatically handles duplicates for us.

The transformation process for each email is straightforward:

Split the email into local and domain parts at the @ symbol
Process only the local part by applying the two rules
Reconstruct the canonical email address

For processing the local part, we need to:

Skip any . characters we encounter
Stop processing entirely when we hit a + character
Keep all other characters

By iterating through each character of the local part, we can build the cleaned version character by character. Once we encounter a +, we immediately stop processing since everything after it is ignored.

After processing all emails and adding their canonical forms to a set, the size of the set gives us the count of unique receiving addresses. The set automatically ensures that duplicate canonical forms (like "testemail@leetcode.com" appearing multiple times) are counted only once.

This approach is efficient because we process each email exactly once with O(n) time complexity for each email string, and the set operations (add and size check) are O(1) on average.

Solution Approach

We use a hash table (set) to store all unique email addresses after normalization. Here's the step-by-step implementation:

Initialize a set s to store unique canonical email addresses.
Process each email in the input array:
- Split the email into local and domain parts using email.split("@")
- The @ symbol serves as our delimiter to separate these two components
Normalize the local part:
- Create an empty list t to build the processed local name character by character
- Iterate through each character c in the local part:
  - If c == ".": Skip it (continue to next character)
  - If c == "+": Break the loop (ignore everything after)
  - Otherwise: Append the character to list t
Reconstruct the canonical email:
- Join the characters in list t to form the processed local part: "".join(t)
- Concatenate with the domain: "".join(t) + "@" + domain
- Add this canonical form to the set s
Return the result: The length of set s gives us the count of unique receiving addresses.

The algorithm efficiently handles both rules simultaneously:

Dots are filtered out during the character-by-character processing
The plus rule is enforced by breaking the loop as soon as we encounter a +

Time Complexity: O(n × m) where n is the number of emails and m is the average length of each email string.

Space Complexity: O(n × m) in the worst case where all emails are unique and we store them all in the set.

The use of a set is crucial here as it automatically handles duplicate canonical forms, ensuring each unique address is counted exactly once.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a small example with the input: ["a.b+c@mail.com", "ab@mail.com", "a.b+xyz@mail.com"]

Step 1: Initialize an empty set

s = {}

Step 2: Process first email "a.b+c@mail.com"

Split at @: local = "a.b+c", domain = "mail.com"
Process local part character by character:
- 'a' → not . or +, add to list: t = ['a']
- '.' → skip it: t = ['a']
- 'b' → not . or +, add to list: t = ['a', 'b']
- '+' → stop processing (ignore 'c')
Join and reconstruct: "".join(['a','b']) + "@mail.com" = "ab@mail.com"
Add to set: s = {"ab@mail.com"}

Step 3: Process second email "ab@mail.com"

Split at @: local = "ab", domain = "mail.com"
Process local part:
- 'a' → add to list: t = ['a']
- 'b' → add to list: t = ['a', 'b']
Join and reconstruct: "ab@mail.com"
Add to set: s = {"ab@mail.com"} (no change, already exists)

Step 4: Process third email "a.b+xyz@mail.com"

Split at @: local = "a.b+xyz", domain = "mail.com"
Process local part:
- 'a' → add to list: t = ['a']
- '.' → skip it: t = ['a']
- 'b' → add to list: t = ['a', 'b']
- '+' → stop processing (ignore 'xyz')
Join and reconstruct: "ab@mail.com"
Add to set: s = {"ab@mail.com"} (no change, already exists)

Step 5: Return result

Length of set = 1

All three emails forward to the same address "ab@mail.com", so the answer is 1.

Solution Implementation

1class Solution:
2    def numUniqueEmails(self, emails: List[str]) -> int:
3        """
4        Count the number of unique email addresses after applying rules:
5        - Dots (.) in local name are ignored
6        - Everything after plus (+) in local name is ignored
7        - Domain name remains unchanged
8      
9        Args:
10            emails: List of email addresses
11          
12        Returns:
13            Number of unique email addresses
14        """
15        unique_emails = set()
16      
17        for email in emails:
18            # Split email into local name and domain name
19            local_name, domain_name = email.split("@")
20          
21            # Process the local name according to rules
22            processed_local = []
23            for char in local_name:
24                # Ignore dots in local name
25                if char == ".":
26                    continue
27                # Stop processing at plus sign
28                if char == "+":
29                    break
30                # Add valid characters to processed local name
31                processed_local.append(char)
32          
33            # Construct the normalized email and add to set
34            normalized_email = "".join(processed_local) + "@" + domain_name
35            unique_emails.add(normalized_email)
36      
37        return len(unique_emails)
38

1class Solution {
2    public int numUniqueEmails(String[] emails) {
3        // Use a HashSet to store unique normalized email addresses
4        Set<String> uniqueEmails = new HashSet<>();
5      
6        // Process each email address
7        for (String email : emails) {
8            // Split email into local name and domain parts
9            String[] emailParts = email.split("@");
10            String localName = emailParts[0];
11            String domainName = emailParts[1];
12          
13            // Build the normalized local name
14            StringBuilder normalizedLocal = new StringBuilder();
15          
16            // Process each character in the local name
17            for (char character : localName.toCharArray()) {
18                // Skip dots in the local name
19                if (character == '.') {
20                    continue;
21                }
22                // Stop processing when we encounter a plus sign
23                if (character == '+') {
24                    break;
25                }
26                // Append valid characters to the normalized local name
27                normalizedLocal.append(character);
28            }
29          
30            // Construct the normalized email and add to set
31            String normalizedEmail = normalizedLocal.toString() + "@" + domainName;
32            uniqueEmails.add(normalizedEmail);
33        }
34      
35        // Return the count of unique email addresses
36        return uniqueEmails.size();
37    }
38}
39

1class Solution {
2public:
3    int numUniqueEmails(vector<string>& emails) {
4        // Use a set to store unique normalized email addresses
5        unordered_set<string> uniqueEmails;
6      
7        // Process each email address
8        for (const string& email : emails) {
9            // Find the position of '@' to separate local and domain parts
10            size_t atPosition = email.find('@');
11          
12            // Extract local part (before '@') and domain part (after '@')
13            string localPart = email.substr(0, atPosition);
14            string domainPart = email.substr(atPosition + 1);
15          
16            // Build the normalized local part
17            string normalizedLocal;
18            for (char ch : localPart) {
19                // Skip dots in the local part
20                if (ch == '.') {
21                    continue;
22                }
23                // Stop processing when '+' is encountered
24                if (ch == '+') {
25                    break;
26                }
27                // Add valid characters to the normalized local part
28                normalizedLocal.push_back(ch);
29            }
30          
31            // Combine normalized local part with domain and add to set
32            string normalizedEmail = normalizedLocal + "@" + domainPart;
33            uniqueEmails.insert(normalizedEmail);
34        }
35      
36        // Return the count of unique email addresses
37        return uniqueEmails.size();
38    }
39};
40

1/**
2 * Counts the number of unique email addresses after applying Gmail-style rules
3 * @param emails - Array of email addresses to process
4 * @returns Number of unique email addresses
5 */
6function numUniqueEmails(emails: string[]): number {
7    // Set to store unique normalized email addresses
8    const uniqueEmails = new Set<string>();
9  
10    // Process each email address
11    for (const email of emails) {
12        // Split email into local and domain parts
13        const [localPart, domainPart] = email.split('@');
14      
15        // Build the normalized local part
16        let normalizedLocal = '';
17      
18        // Process each character in the local part
19        for (const char of localPart) {
20            // Ignore dots in the local part
21            if (char === '.') {
22                continue;
23            }
24          
25            // Stop processing at the first plus sign
26            if (char === '+') {
27                break;
28            }
29          
30            // Add valid characters to the normalized local part
31            normalizedLocal += char;
32        }
33      
34        // Combine normalized local part with domain and add to set
35        const normalizedEmail = normalizedLocal + '@' + domainPart;
36        uniqueEmails.add(normalizedEmail);
37    }
38  
39    // Return the count of unique emails
40    return uniqueEmails.size;
41}
42

Time and Space Complexity

Time Complexity: O(L)

The algorithm iterates through each email address once. For each email, it:

Splits the email at "@" which takes O(n) where n is the length of that email
Iterates through each character in the local part to process dots and plus signs, which is O(n) in the worst case
Joins the processed characters and concatenates with domain, which is O(n)
Adds the result to a set, which is O(n) for the string hashing

Since we process each character of each email at most a constant number of times, and L represents the total length of all email addresses, the overall time complexity is O(L).

Space Complexity: O(L)

The space usage includes:

The set s which stores unique processed emails. In the worst case where all emails are unique after processing, this could store up to O(L) characters total
The temporary list t for building the processed local part, which can be at most O(n) for a single email
The split operation creates new string objects for local and domain parts

The dominant factor is the set storing all unique emails, which in the worst case could contain all the input data. Therefore, the space complexity is O(L) where L is the total length of all email addresses.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Applying Rules to the Domain Name

A critical mistake is accidentally applying the dot removal or plus truncation rules to the domain part of the email. The rules ONLY apply to the local name (before the @).

Incorrect approach:

# Wrong: This removes dots from the entire email
email = email.replace(".", "")  # This would turn "test@lee.code.com" into "test@leecodecom"

Correct approach:

local_name, domain_name = email.split("@")
# Process only local_name, keep domain_name unchanged

2. Multiple @ Signs in Email

If an email contains multiple @ signs (though technically invalid), using split("@") without limits can cause unpacking errors.

Problem scenario:

# If email = "test@middle@domain.com"
local_name, domain_name = email.split("@")  # ValueError: too many values to unpack

Solution:

# Limit split to first @ sign only
parts = email.split("@", 1)  # Split at most once
if len(parts) == 2:
    local_name, domain_name = parts

3. Order of Operations: Removing Dots After Plus

Processing the plus sign before removing dots can lead to incorrect results if you're using string replacement methods.

Incorrect approach:

# Wrong order if using replace
local_name = local_name.split("+")[0]  # First handle plus
local_name = local_name.replace(".", "")  # Then remove dots
# This works but is less efficient than single-pass iteration

Better approach (as shown in solution): Process character by character in a single pass, handling both rules simultaneously.

4. Using List Instead of Set

Using a list to store unique emails requires manual duplicate checking, making the solution inefficient.

Inefficient approach:

unique_emails = []
for email in emails:
    normalized = process_email(email)
    if normalized not in unique_emails:  # O(n) lookup
        unique_emails.append(normalized)

Efficient approach:

unique_emails = set()  # O(1) average case for add and lookup
unique_emails.add(normalized_email)

5. Not Handling Edge Cases

Failing to consider edge cases like empty local names after processing or emails without @ signs.

Potential edge cases to consider:

Email: "+test@domain.com" → Results in empty local name
Email: "...@domain.com" → All dots removed, empty local name
Email: "nodomain" → No @ sign present

Robust solution addition:

if "@" not in email:
    continue  # Skip invalid emails
  
# After processing local_name
if not processed_local:  # Empty local name
    continue  # Or handle according to requirements

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which of the following shows the order of node visit in a Breadth-first Search?

I don't know