1078. Occurrences After Bigram
Problem Description
You are given a string text
containing multiple words separated by spaces. You need to find patterns where three consecutive words appear in a specific sequence.
Given two strings first
and second
, you need to:
- Look for occurrences where
first
appears as a word, immediately followed bysecond
as the next word - When you find such a pattern, capture the third word that immediately follows
second
- Return all such third words in an array
For example, if text = "alice is a good girl she is a good student"
, first = "a"
, and second = "good"
:
- The pattern "a good" appears twice in the text
- After the first occurrence "a good", the next word is "girl"
- After the second occurrence "a good", the next word is "student"
- So you would return
["girl", "student"]
The solution works by:
- Splitting the text into individual words using
text.split()
- Iterating through the words array with a sliding window of size 3
- Checking if the first two words in the window match
first
andsecond
respectively - If they match, adding the third word to the result array
- The loop stops at
len(words) - 2
to ensure we always have three words to examine
Intuition
The key insight is that we're looking for a pattern of three consecutive words where the first two words are fixed (given as first
and second
), and we need to collect all possible third words that complete this pattern.
Since we're dealing with words in a text, the natural first step is to split the text into individual words. This transforms our problem from string matching to array traversal, which is much simpler to handle.
Once we have an array of words, we can think of this as a sliding window problem. We need to examine every possible sequence of three consecutive words in the text. For each triplet of words (a, b, c)
, we check if a
matches first
and b
matches second
. If both conditions are met, then c
is one of our answers.
Why do we iterate only until len(words) - 2
? Because we need to examine groups of three words at each position. If we start at position i
, we look at words at positions i
, i+1
, and i+2
. The last valid starting position is therefore len(words) - 3
, which means we iterate while i < len(words) - 2
.
The beauty of this approach is its simplicity - we don't need complex string matching algorithms or regular expressions. By converting the problem to word-level matching instead of character-level, we avoid complications like partial word matches or dealing with word boundaries. Each word is treated as a distinct unit, making the comparison straightforward with the ==
operator.
Solution Approach
The implementation follows a straightforward linear scan approach with a sliding window pattern:
-
Text Preprocessing: First, we split the input text into an array of words using
text.split()
. This built-in method handles multiple spaces and gives us clean word tokens to work with. -
Initialize Result Container: We create an empty list
ans
to store all the third words that match our pattern. -
Sliding Window Traversal: We iterate through the words array with index
i
from0
tolen(words) - 2
. At each position, we extract a window of three consecutive words:a, b, c = words[i : i + 3]
This slice operation
words[i : i + 3]
gives us exactly three words starting from indexi
. -
Pattern Matching: For each triplet
(a, b, c)
, we check if the pattern matches:if a == first and b == second: ans.append(c)
When both conditions are satisfied (first word equals
first
and second word equalssecond
), we add the third wordc
to our result list. -
Return Results: After scanning through all possible positions, we return the collected list of third words.
The time complexity is O(n)
where n
is the number of words in the text, as we visit each word position once. The space complexity is O(n)
for storing the split words array, plus O(k)
for the result where k
is the number of matching patterns found.
This solution elegantly handles edge cases:
- If the text has fewer than 3 words, the loop doesn't execute and returns an empty list
- Multiple occurrences of the pattern are all captured
- The same third word can appear multiple times in the result if the pattern occurs multiple times
Ready to land your dream job?
Unlock your dream job with a 3-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example to illustrate the solution approach.
Input:
text = "I love to code and love to learn"
first = "love"
second = "to"
Step 1: Split the text into words
words = ["I", "love", "to", "code", "and", "love", "to", "learn"]
Step 2: Initialize result container
ans = []
Step 3: Iterate with sliding window (i from 0 to len(words) - 2 = 6)
-
i = 0: Extract triplet
["I", "love", "to"]
- a = "I", b = "love", c = "to"
- Check: a == "love"? No → skip
-
i = 1: Extract triplet
["love", "to", "code"]
- a = "love", b = "to", c = "code"
- Check: a == "love"? Yes, b == "to"? Yes → Match found!
- Add "code" to ans:
ans = ["code"]
-
i = 2: Extract triplet
["to", "code", "and"]
- a = "to", b = "code", c = "and"
- Check: a == "love"? No → skip
-
i = 3: Extract triplet
["code", "and", "love"]
- a = "code", b = "and", c = "love"
- Check: a == "love"? No → skip
-
i = 4: Extract triplet
["and", "love", "to"]
- a = "and", b = "love", c = "to"
- Check: a == "love"? No → skip
-
i = 5: Extract triplet
["love", "to", "learn"]
- a = "love", b = "to", c = "learn"
- Check: a == "love"? Yes, b == "to"? Yes → Match found!
- Add "learn" to ans:
ans = ["code", "learn"]
Step 4: Return result
return ["code", "learn"]
The pattern "love to" appears twice in the text, followed by "code" and "learn" respectively, which are both captured in our result.
Solution Implementation
1class Solution:
2 def findOcurrences(self, text: str, first: str, second: str) -> List[str]:
3 """
4 Find all occurrences of the third word that follows the pattern: first second third
5
6 Args:
7 text: Input string containing words separated by spaces
8 first: The first word in the pattern to match
9 second: The second word in the pattern to match
10
11 Returns:
12 List of third words that appear after consecutive occurrences of first and second
13 """
14 # Split the text into individual words
15 words = text.split()
16
17 # Initialize result list to store third words
18 result = []
19
20 # Iterate through the words, stopping 2 positions before the end
21 # to ensure we have enough words to check a triplet
22 for i in range(len(words) - 2):
23 # Extract three consecutive words starting at index i
24 word_1, word_2, word_3 = words[i:i + 3]
25
26 # Check if the first two words match our pattern
27 if word_1 == first and word_2 == second:
28 # Add the third word to our result
29 result.append(word_3)
30
31 return result
32
1class Solution {
2
3 /**
4 * Finds all occurrences of words that appear immediately after the pattern "first second" in the text.
5 *
6 * @param text The input text to search through
7 * @param first The first word in the pattern to match
8 * @param second The second word in the pattern to match
9 * @return An array of words that appear immediately after each occurrence of "first second"
10 */
11 public String[] findOcurrences(String text, String first, String second) {
12 // Split the text into individual words using space as delimiter
13 String[] words = text.split(" ");
14
15 // List to store the third words that follow the pattern "first second"
16 List<String> resultList = new ArrayList<>();
17
18 // Iterate through the words array, stopping 2 positions before the end
19 // to ensure we have room to check for the pattern "first second third"
20 for (int i = 0; i < words.length - 2; i++) {
21 // Check if current word matches 'first' and next word matches 'second'
22 if (first.equals(words[i]) && second.equals(words[i + 1])) {
23 // Add the word following the pattern to the result list
24 resultList.add(words[i + 2]);
25 }
26 }
27
28 // Convert the list to an array and return
29 return resultList.toArray(new String[0]);
30 }
31}
32
1class Solution {
2public:
3 vector<string> findOcurrences(string text, string first, string second) {
4 // Create input string stream to parse the text
5 istringstream inputStream(text);
6
7 // Store all words from the text
8 vector<string> words;
9 string currentWord;
10
11 // Extract words from the text using stream extraction
12 while (inputStream >> currentWord) {
13 words.emplace_back(currentWord);
14 }
15
16 // Result vector to store third words after matching bigrams
17 vector<string> result;
18
19 // Get the total number of words
20 int wordCount = words.size();
21
22 // Iterate through words to find matching bigrams (first, second)
23 // Stop at wordCount - 2 to ensure we have at least 3 consecutive words
24 for (int i = 0; i < wordCount - 2; ++i) {
25 // Check if current word matches 'first' and next word matches 'second'
26 if (words[i] == first && words[i + 1] == second) {
27 // Add the third word (after the bigram) to the result
28 result.emplace_back(words[i + 2]);
29 }
30 }
31
32 return result;
33 }
34};
35
1/**
2 * Finds all words that appear immediately after the pattern "first second" in the text
3 * @param text - The input text to search through
4 * @param first - The first word of the pattern to match
5 * @param second - The second word of the pattern to match
6 * @returns An array of words that appear after each occurrence of "first second"
7 */
8function findOcurrences(text: string, first: string, second: string): string[] {
9 // Split the text into individual words
10 const words: string[] = text.split(' ');
11
12 // Get the total number of words
13 const wordCount: number = words.length;
14
15 // Initialize array to store the results
16 const results: string[] = [];
17
18 // Iterate through words, stopping 2 positions before the end
19 // to ensure we have space to check for the pattern and the following word
20 for (let index = 0; index < wordCount - 2; index++) {
21 // Check if current word matches 'first' and next word matches 'second'
22 if (words[index] === first && words[index + 1] === second) {
23 // Add the word that follows the pattern to results
24 results.push(words[index + 2]);
25 }
26 }
27
28 return results;
29}
30
Time and Space Complexity
Time Complexity: O(n)
where n
is the length of the input string text
.
- The
text.split()
operation takesO(n)
time to split the string into words - The for loop iterates through
len(words) - 2
iterations, which isO(m)
wherem
is the number of words - Inside each iteration, slicing
words[i : i + 3]
takesO(1)
time since it always slices exactly 3 elements - The string comparisons
a == first
andb == second
takeO(1)
time on average (assuming constant-length words) - The
append()
operation takes amortizedO(1)
time - Since the number of words
m
is proportional to the length of the textn
, the overall time complexity isO(n)
Space Complexity: O(n)
where n
is the length of the input string text
.
- The
words
list stores all words from the split operation, requiringO(n)
space to store all characters from the input text - The
ans
list stores the result, which in the worst case could contain up toO(m)
words wherem
is the number of words in the text - The temporary variables
a
,b
,c
useO(1)
space - Since the total space used is dominated by storing the split words, the overall space complexity is
O(n)
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Incorrect Handling of Multiple Consecutive Spaces
One common pitfall is assuming that text.split()
and text.split(' ')
behave identically. When the input text contains multiple consecutive spaces or leading/trailing spaces, using text.split(' ')
can create empty strings in the words array, leading to incorrect pattern matching.
Problem Example:
text = "alice is a good girl" # Multiple spaces between words first = "a" second = "good" # Wrong approach: words = text.split(' ') # Results in ['alice', '', 'is', '', '', 'a', 'good', 'girl'] # Correct approach: words = text.split() # Results in ['alice', 'is', 'a', 'good', 'girl']
Solution: Always use text.split()
without arguments, which automatically handles multiple spaces, tabs, and newlines as delimiters and filters out empty strings.
2. Case Sensitivity Issues
Another pitfall occurs when the problem requirements are unclear about case sensitivity. The current solution is case-sensitive, meaning "Good" and "good" are treated as different words.
Problem Example:
text = "Alice is A Good girl she is a good student" first = "a" second = "good" # This would only match "a good" but not "A Good"
Solution: If case-insensitive matching is required, normalize the case:
def findOcurrences(self, text: str, first: str, second: str) -> List[str]:
words = text.split()
first_lower = first.lower()
second_lower = second.lower()
result = []
for i in range(len(words) - 2):
word_1, word_2, word_3 = words[i:i + 3]
if word_1.lower() == first_lower and word_2.lower() == second_lower:
result.append(word_3) # Return original case
return result
3. Not Validating Input Assumptions
A subtle pitfall is not considering edge cases in the input, such as empty strings or single-word inputs.
Problem Example:
text = "" # Empty string # or text = "word" # Single word # or text = "first second" # Only two words when we need three
Solution: While the current implementation handles these gracefully (the loop simply doesn't execute), it's good practice to explicitly check:
def findOcurrences(self, text: str, first: str, second: str) -> List[str]:
if not text or not first or not second:
return []
words = text.split()
if len(words) < 3:
return []
# Continue with the main logic...
4. Overlapping Pattern Consideration
A less obvious pitfall is not recognizing that patterns can overlap when the same word appears multiple times consecutively.
Problem Example:
text = "a a a a" first = "a" second = "a" # The patterns "a a a" overlap at indices 0-2 and 1-3 # Should return ["a", "a"]
The current solution correctly handles this by advancing the window one position at a time, but developers might mistakenly try to skip ahead after finding a match, missing overlapping patterns.
Which of the following uses divide and conquer strategy?
Recommended Readings
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Want a Structured Path to Master System Design Too? Don’t Miss This!