3295. Report Spam Message
Problem Description
You are given two arrays of strings: message
and bannedWords
.
The task is to determine if the message
array should be considered spam based on the following rule:
- A message is spam if it contains at least 2 words that exactly match any word from the
bannedWords
array.
Your function should:
- Return
true
if the message is spam (contains 2 or more banned words) - Return
false
if the message is not spam (contains fewer than 2 banned words)
The solution uses a hash table approach by converting bannedWords
into a set s
for O(1) lookup time. It then counts how many words from message
appear in this set. The expression sum(w in s for w in message)
counts the total occurrences of banned words in the message. If this count is greater than or equal to 2, the function returns true
, indicating the message is spam.
Note that the same banned word appearing multiple times in the message counts toward the threshold of 2. For example, if "bad" is in bannedWords
and appears twice in message
, that would make the message spam.
Intuition
The core challenge is to efficiently check if words from message
appear in bannedWords
and count how many matches we find.
A naive approach would be to check each word in message
against every word in bannedWords
using nested loops, which would give us O(n × m) time complexity where n is the length of message
and m is the length of bannedWords
. This becomes inefficient for large inputs.
The key insight is that we need fast lookups - we want to quickly determine if a word exists in bannedWords
. This naturally points us toward using a hash table (set in Python), which provides O(1) average-case lookup time.
By converting bannedWords
into a set first, we transform the problem from "check if this word exists in a list" (O(m) operation) to "check if this word exists in a set" (O(1) operation). This preprocessing step takes O(m) time but pays off when we iterate through the message
array.
For counting, we simply need to track how many words from message
are found in our banned words set. Python's generator expression w in s for w in message
creates a sequence of boolean values (True/False), and sum()
conveniently counts the True values since Python treats True as 1 and False as 0.
Once we have the count, determining if the message is spam becomes a simple comparison: is the count >= 2? This gives us an overall time complexity of O(n + m) which is optimal since we need to look at each word at least once.
Solution Approach
The solution implements a hash table-based approach to efficiently detect spam messages.
Step 1: Create a Hash Set
s = set(bannedWords)
We convert the bannedWords
list into a set s
. This preprocessing step takes O(m) time where m is the number of banned words, but it enables O(1) average-case lookup time for each word check.
Step 2: Count Banned Words in Message
sum(w in s for w in message)
This line performs multiple operations:
for w in message
: Iterates through each word in themessage
arrayw in s
: Checks if the current word exists in the banned words set (O(1) operation)- The generator expression produces a sequence of boolean values (True if word is banned, False otherwise)
sum()
: Counts the True values since Python treats True as 1 and False as 0
Step 3: Check Spam Threshold
return sum(w in s for w in message) >= 2
The final comparison checks if the count of banned words is at least 2. If yes, the message is spam and we return true
; otherwise, we return false
.
Time Complexity: O(n + m) where n is the length of message
and m is the length of bannedWords
. We spend O(m) time creating the set and O(n) time checking each word in the message.
Space Complexity: O(m) for storing the banned words in the hash set.
This approach is optimal because we must examine each word at least once, and the hash table provides the fastest possible lookups for determining if a word is banned.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a concrete example to understand how the solution works:
Input:
message = ["hello", "world", "free", "money", "free", "offer"]
bannedWords = ["free", "money", "spam"]
Step 1: Create Hash Set
First, we convert bannedWords
into a set:
s = {"free", "money", "spam"}
This allows us O(1) lookup time for checking if a word is banned.
Step 2: Count Banned Words
Now we iterate through each word in message
and check if it exists in our set:
Word in message | Is it in set s? | Boolean value |
---|---|---|
"hello" | No | False (0) |
"world" | No | False (0) |
"free" | Yes | True (1) |
"money" | Yes | True (1) |
"free" | Yes | True (1) |
"offer" | No | False (0) |
The generator expression w in s for w in message
produces: [False, False, True, True, True, False]
When we apply sum()
to these boolean values:
sum([False, False, True, True, True, False]) = 0 + 0 + 1 + 1 + 1 + 0 = 3
Step 3: Check Spam Threshold
We found 3 banned words in the message (counting duplicates). Since 3 >= 2, the function returns true
- this message is spam.
Note that "free" appears twice in the message and both occurrences count toward our total. This demonstrates that duplicate banned words contribute to the spam threshold.
Solution Implementation
1class Solution:
2 def reportSpam(self, message: List[str], bannedWords: List[str]) -> bool:
3 # Convert banned words list to a set for O(1) lookup time
4 banned_words_set = set(bannedWords)
5
6 # Count how many words from the message appear in the banned words set
7 # Using generator expression with sum to count True values (True = 1, False = 0)
8 banned_word_count = sum(word in banned_words_set for word in message)
9
10 # Return True if at least 2 banned words are found in the message
11 return banned_word_count >= 2
12
1class Solution {
2 /**
3 * Determines if a message should be reported as spam based on banned words.
4 * A message is considered spam if it contains at least 2 banned words.
5 *
6 * @param message Array of words representing the message to check
7 * @param bannedWords Array of words that are considered banned/spam indicators
8 * @return true if the message contains 2 or more banned words, false otherwise
9 */
10 public boolean reportSpam(String[] message, String[] bannedWords) {
11 // Create a HashSet for O(1) lookup of banned words
12 Set<String> bannedWordsSet = new HashSet<>();
13
14 // Populate the set with all banned words
15 for (String word : bannedWords) {
16 bannedWordsSet.add(word);
17 }
18
19 // Counter to track how many banned words are found in the message
20 int bannedWordCount = 0;
21
22 // Iterate through each word in the message
23 for (String word : message) {
24 // Check if current word is in the banned words set
25 if (bannedWordsSet.contains(word)) {
26 bannedWordCount++;
27
28 // Early return if threshold is met (2 or more banned words)
29 if (bannedWordCount >= 2) {
30 return true;
31 }
32 }
33 }
34
35 // Message is not spam (contains less than 2 banned words)
36 return false;
37 }
38}
39
1class Solution {
2public:
3 bool reportSpam(vector<string>& message, vector<string>& bannedWords) {
4 // Create a hash set from banned words for O(1) lookup
5 unordered_set<string> bannedWordsSet(bannedWords.begin(), bannedWords.end());
6
7 // Counter for tracking banned words found in the message
8 int bannedWordCount = 0;
9
10 // Iterate through each word in the message
11 for (const auto& word : message) {
12 // Check if current word is in the banned words set
13 if (bannedWordsSet.contains(word)) {
14 bannedWordCount++;
15
16 // Early return if we've found at least 2 banned words
17 if (bannedWordCount >= 2) {
18 return true;
19 }
20 }
21 }
22
23 // Message is not spam (less than 2 banned words found)
24 return false;
25 }
26};
27
1/**
2 * Determines if a message should be reported as spam based on banned words
3 * @param message - Array of words in the message to check
4 * @param bannedWords - Array of banned words to search for
5 * @returns true if the message contains at least 2 banned words, false otherwise
6 */
7function reportSpam(message: string[], bannedWords: string[]): boolean {
8 // Create a set of banned words for O(1) lookup performance
9 const bannedWordsSet = new Set<string>(bannedWords);
10
11 // Counter to track the number of banned words found
12 let bannedWordCount = 0;
13
14 // Iterate through each word in the message
15 for (const word of message) {
16 // Check if current word is in the banned words set
17 if (bannedWordsSet.has(word)) {
18 bannedWordCount++;
19
20 // Early return if we've found at least 2 banned words
21 if (bannedWordCount >= 2) {
22 return true;
23 }
24 }
25 }
26
27 // Return false if fewer than 2 banned words were found
28 return false;
29}
30
Time and Space Complexity
The time complexity is O((n + m) × |w|)
, where:
n
is the length of the arraymessage
m
is the length of the arraybannedWords
|w|
is the maximum length of the words in the arrays
Breaking down the time complexity:
- Creating the set from
bannedWords
:O(m × |w|)
- we need to hash each of them
words, and hashing a string takesO(|w|)
time - Iterating through
message
and checking membership:O(n × |w|)
- we check each of then
words in the message, and each membership check in the set takesO(|w|)
time for string comparison/hashing - Total:
O(m × |w| + n × |w|) = O((n + m) × |w|)
The space complexity is O(m × |w|)
, which comes from:
- The set
s
storing allm
banned words, where each word can have up to|w|
characters - The space needed to store the strings themselves in the set
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Case Sensitivity Issues
Pitfall: The solution performs exact matching, which means "Spam" and "spam" are treated as different words. If the banned words list contains "spam" but the message contains "Spam", it won't be detected as a banned word.
Solution:
class Solution:
def reportSpam(self, message: List[str], bannedWords: List[str]) -> bool:
# Convert all banned words to lowercase for case-insensitive matching
banned_words_set = set(word.lower() for word in bannedWords)
# Check message words in lowercase
banned_word_count = sum(word.lower() in banned_words_set for word in message)
return banned_word_count >= 2
2. Duplicate Banned Words in Input
Pitfall: If bannedWords
contains duplicates like ["bad", "bad", "evil"]
, converting to a set automatically removes duplicates. While this doesn't affect correctness (since we're checking membership, not counting banned words), it might cause confusion when debugging or if requirements change.
Solution:
class Solution:
def reportSpam(self, message: List[str], bannedWords: List[str]) -> bool:
# Explicitly handle duplicates if needed for logging/debugging
banned_words_set = set(bannedWords)
if len(banned_words_set) != len(bannedWords):
# Log or handle duplicate banned words if necessary
pass
banned_word_count = sum(word in banned_words_set for word in message)
return banned_word_count >= 2
3. Empty Input Arrays
Pitfall: Not handling edge cases where either message
or bannedWords
is empty. While the current solution handles these correctly (returning false
), it's good practice to make this explicit.
Solution:
class Solution:
def reportSpam(self, message: List[str], bannedWords: List[str]) -> bool:
# Early return for edge cases
if not message or not bannedWords:
return False
banned_words_set = set(bannedWords)
banned_word_count = sum(word in banned_words_set for word in message)
return banned_word_count >= 2
4. Memory Optimization for Large Banned Words Lists
Pitfall: When bannedWords
is extremely large, creating a set might consume significant memory. If memory is constrained and message
is small, iterating might be more memory-efficient.
Alternative Approach for Small Messages:
class Solution:
def reportSpam(self, message: List[str], bannedWords: List[str]) -> bool:
# If message is very small and bannedWords is huge
if len(message) < 10: # threshold can be adjusted
count = 0
for word in message:
if word in bannedWords:
count += 1
if count >= 2:
return True
return False
# Original approach for normal cases
banned_words_set = set(bannedWords)
return sum(word in banned_words_set for word in message) >= 2
Which of the tree traversal order can be used to obtain elements in a binary search tree in sorted order?
Recommended Readings
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Want a Structured Path to Master System Design Too? Don’t Miss This!