Facebook Pixel

3295. Report Spam Message

MediumArrayHash TableString
Leetcode Link

Problem Description

You are given two arrays of strings: message and bannedWords.

The task is to determine if the message array should be considered spam based on the following rule:

  • A message is spam if it contains at least 2 words that exactly match any word from the bannedWords array.

Your function should:

  • Return true if the message is spam (contains 2 or more banned words)
  • Return false if the message is not spam (contains fewer than 2 banned words)

The solution uses a hash table approach by converting bannedWords into a set s for O(1) lookup time. It then counts how many words from message appear in this set. The expression sum(w in s for w in message) counts the total occurrences of banned words in the message. If this count is greater than or equal to 2, the function returns true, indicating the message is spam.

Note that the same banned word appearing multiple times in the message counts toward the threshold of 2. For example, if "bad" is in bannedWords and appears twice in message, that would make the message spam.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The core challenge is to efficiently check if words from message appear in bannedWords and count how many matches we find.

A naive approach would be to check each word in message against every word in bannedWords using nested loops, which would give us O(n × m) time complexity where n is the length of message and m is the length of bannedWords. This becomes inefficient for large inputs.

The key insight is that we need fast lookups - we want to quickly determine if a word exists in bannedWords. This naturally points us toward using a hash table (set in Python), which provides O(1) average-case lookup time.

By converting bannedWords into a set first, we transform the problem from "check if this word exists in a list" (O(m) operation) to "check if this word exists in a set" (O(1) operation). This preprocessing step takes O(m) time but pays off when we iterate through the message array.

For counting, we simply need to track how many words from message are found in our banned words set. Python's generator expression w in s for w in message creates a sequence of boolean values (True/False), and sum() conveniently counts the True values since Python treats True as 1 and False as 0.

Once we have the count, determining if the message is spam becomes a simple comparison: is the count >= 2? This gives us an overall time complexity of O(n + m) which is optimal since we need to look at each word at least once.

Solution Approach

The solution implements a hash table-based approach to efficiently detect spam messages.

Step 1: Create a Hash Set

s = set(bannedWords)

We convert the bannedWords list into a set s. This preprocessing step takes O(m) time where m is the number of banned words, but it enables O(1) average-case lookup time for each word check.

Step 2: Count Banned Words in Message

sum(w in s for w in message)

This line performs multiple operations:

  • for w in message: Iterates through each word in the message array
  • w in s: Checks if the current word exists in the banned words set (O(1) operation)
  • The generator expression produces a sequence of boolean values (True if word is banned, False otherwise)
  • sum(): Counts the True values since Python treats True as 1 and False as 0

Step 3: Check Spam Threshold

return sum(w in s for w in message) >= 2

The final comparison checks if the count of banned words is at least 2. If yes, the message is spam and we return true; otherwise, we return false.

Time Complexity: O(n + m) where n is the length of message and m is the length of bannedWords. We spend O(m) time creating the set and O(n) time checking each word in the message.

Space Complexity: O(m) for storing the banned words in the hash set.

This approach is optimal because we must examine each word at least once, and the hash table provides the fastest possible lookups for determining if a word is banned.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a concrete example to understand how the solution works:

Input:

  • message = ["hello", "world", "free", "money", "free", "offer"]
  • bannedWords = ["free", "money", "spam"]

Step 1: Create Hash Set First, we convert bannedWords into a set:

s = {"free", "money", "spam"}

This allows us O(1) lookup time for checking if a word is banned.

Step 2: Count Banned Words Now we iterate through each word in message and check if it exists in our set:

Word in messageIs it in set s?Boolean value
"hello"NoFalse (0)
"world"NoFalse (0)
"free"YesTrue (1)
"money"YesTrue (1)
"free"YesTrue (1)
"offer"NoFalse (0)

The generator expression w in s for w in message produces: [False, False, True, True, True, False]

When we apply sum() to these boolean values:

sum([False, False, True, True, True, False]) = 0 + 0 + 1 + 1 + 1 + 0 = 3

Step 3: Check Spam Threshold We found 3 banned words in the message (counting duplicates). Since 3 >= 2, the function returns true - this message is spam.

Note that "free" appears twice in the message and both occurrences count toward our total. This demonstrates that duplicate banned words contribute to the spam threshold.

Solution Implementation

1class Solution:
2    def reportSpam(self, message: List[str], bannedWords: List[str]) -> bool:
3        # Convert banned words list to a set for O(1) lookup time
4        banned_words_set = set(bannedWords)
5      
6        # Count how many words from the message appear in the banned words set
7        # Using generator expression with sum to count True values (True = 1, False = 0)
8        banned_word_count = sum(word in banned_words_set for word in message)
9      
10        # Return True if at least 2 banned words are found in the message
11        return banned_word_count >= 2
12
1class Solution {
2    /**
3     * Determines if a message should be reported as spam based on banned words.
4     * A message is considered spam if it contains at least 2 banned words.
5     * 
6     * @param message Array of words representing the message to check
7     * @param bannedWords Array of words that are considered banned/spam indicators
8     * @return true if the message contains 2 or more banned words, false otherwise
9     */
10    public boolean reportSpam(String[] message, String[] bannedWords) {
11        // Create a HashSet for O(1) lookup of banned words
12        Set<String> bannedWordsSet = new HashSet<>();
13      
14        // Populate the set with all banned words
15        for (String word : bannedWords) {
16            bannedWordsSet.add(word);
17        }
18      
19        // Counter to track how many banned words are found in the message
20        int bannedWordCount = 0;
21      
22        // Iterate through each word in the message
23        for (String word : message) {
24            // Check if current word is in the banned words set
25            if (bannedWordsSet.contains(word)) {
26                bannedWordCount++;
27              
28                // Early return if threshold is met (2 or more banned words)
29                if (bannedWordCount >= 2) {
30                    return true;
31                }
32            }
33        }
34      
35        // Message is not spam (contains less than 2 banned words)
36        return false;
37    }
38}
39
1class Solution {
2public:
3    bool reportSpam(vector<string>& message, vector<string>& bannedWords) {
4        // Create a hash set from banned words for O(1) lookup
5        unordered_set<string> bannedWordsSet(bannedWords.begin(), bannedWords.end());
6      
7        // Counter for tracking banned words found in the message
8        int bannedWordCount = 0;
9      
10        // Iterate through each word in the message
11        for (const auto& word : message) {
12            // Check if current word is in the banned words set
13            if (bannedWordsSet.contains(word)) {
14                bannedWordCount++;
15              
16                // Early return if we've found at least 2 banned words
17                if (bannedWordCount >= 2) {
18                    return true;
19                }
20            }
21        }
22      
23        // Message is not spam (less than 2 banned words found)
24        return false;
25    }
26};
27
1/**
2 * Determines if a message should be reported as spam based on banned words
3 * @param message - Array of words in the message to check
4 * @param bannedWords - Array of banned words to search for
5 * @returns true if the message contains at least 2 banned words, false otherwise
6 */
7function reportSpam(message: string[], bannedWords: string[]): boolean {
8    // Create a set of banned words for O(1) lookup performance
9    const bannedWordsSet = new Set<string>(bannedWords);
10  
11    // Counter to track the number of banned words found
12    let bannedWordCount = 0;
13  
14    // Iterate through each word in the message
15    for (const word of message) {
16        // Check if current word is in the banned words set
17        if (bannedWordsSet.has(word)) {
18            bannedWordCount++;
19          
20            // Early return if we've found at least 2 banned words
21            if (bannedWordCount >= 2) {
22                return true;
23            }
24        }
25    }
26  
27    // Return false if fewer than 2 banned words were found
28    return false;
29}
30

Time and Space Complexity

The time complexity is O((n + m) × |w|), where:

  • n is the length of the array message
  • m is the length of the array bannedWords
  • |w| is the maximum length of the words in the arrays

Breaking down the time complexity:

  • Creating the set from bannedWords: O(m × |w|) - we need to hash each of the m words, and hashing a string takes O(|w|) time
  • Iterating through message and checking membership: O(n × |w|) - we check each of the n words in the message, and each membership check in the set takes O(|w|) time for string comparison/hashing
  • Total: O(m × |w| + n × |w|) = O((n + m) × |w|)

The space complexity is O(m × |w|), which comes from:

  • The set s storing all m banned words, where each word can have up to |w| characters
  • The space needed to store the strings themselves in the set

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Case Sensitivity Issues

Pitfall: The solution performs exact matching, which means "Spam" and "spam" are treated as different words. If the banned words list contains "spam" but the message contains "Spam", it won't be detected as a banned word.

Solution:

class Solution:
    def reportSpam(self, message: List[str], bannedWords: List[str]) -> bool:
        # Convert all banned words to lowercase for case-insensitive matching
        banned_words_set = set(word.lower() for word in bannedWords)
      
        # Check message words in lowercase
        banned_word_count = sum(word.lower() in banned_words_set for word in message)
      
        return banned_word_count >= 2

2. Duplicate Banned Words in Input

Pitfall: If bannedWords contains duplicates like ["bad", "bad", "evil"], converting to a set automatically removes duplicates. While this doesn't affect correctness (since we're checking membership, not counting banned words), it might cause confusion when debugging or if requirements change.

Solution:

class Solution:
    def reportSpam(self, message: List[str], bannedWords: List[str]) -> bool:
        # Explicitly handle duplicates if needed for logging/debugging
        banned_words_set = set(bannedWords)
        if len(banned_words_set) != len(bannedWords):
            # Log or handle duplicate banned words if necessary
            pass
      
        banned_word_count = sum(word in banned_words_set for word in message)
        return banned_word_count >= 2

3. Empty Input Arrays

Pitfall: Not handling edge cases where either message or bannedWords is empty. While the current solution handles these correctly (returning false), it's good practice to make this explicit.

Solution:

class Solution:
    def reportSpam(self, message: List[str], bannedWords: List[str]) -> bool:
        # Early return for edge cases
        if not message or not bannedWords:
            return False
      
        banned_words_set = set(bannedWords)
        banned_word_count = sum(word in banned_words_set for word in message)
        return banned_word_count >= 2

4. Memory Optimization for Large Banned Words Lists

Pitfall: When bannedWords is extremely large, creating a set might consume significant memory. If memory is constrained and message is small, iterating might be more memory-efficient.

Alternative Approach for Small Messages:

class Solution:
    def reportSpam(self, message: List[str], bannedWords: List[str]) -> bool:
        # If message is very small and bannedWords is huge
        if len(message) < 10:  # threshold can be adjusted
            count = 0
            for word in message:
                if word in bannedWords:
                    count += 1
                    if count >= 2:
                        return True
            return False
      
        # Original approach for normal cases
        banned_words_set = set(bannedWords)
        return sum(word in banned_words_set for word in message) >= 2
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which of the tree traversal order can be used to obtain elements in a binary search tree in sorted order?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More