Facebook Pixel

1078. Occurrences After Bigram

EasyString
Leetcode Link

Problem Description

You are given a string text containing multiple words separated by spaces. You need to find patterns where three consecutive words appear in a specific sequence.

Given two strings first and second, you need to:

  1. Look for occurrences where first appears as a word, immediately followed by second as the next word
  2. When you find such a pattern, capture the third word that immediately follows second
  3. Return all such third words in an array

For example, if text = "alice is a good girl she is a good student", first = "a", and second = "good":

  • The pattern "a good" appears twice in the text
  • After the first occurrence "a good", the next word is "girl"
  • After the second occurrence "a good", the next word is "student"
  • So you would return ["girl", "student"]

The solution works by:

  1. Splitting the text into individual words using text.split()
  2. Iterating through the words array with a sliding window of size 3
  3. Checking if the first two words in the window match first and second respectively
  4. If they match, adding the third word to the result array
  5. The loop stops at len(words) - 2 to ensure we always have three words to examine
Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is that we're looking for a pattern of three consecutive words where the first two words are fixed (given as first and second), and we need to collect all possible third words that complete this pattern.

Since we're dealing with words in a text, the natural first step is to split the text into individual words. This transforms our problem from string matching to array traversal, which is much simpler to handle.

Once we have an array of words, we can think of this as a sliding window problem. We need to examine every possible sequence of three consecutive words in the text. For each triplet of words (a, b, c), we check if a matches first and b matches second. If both conditions are met, then c is one of our answers.

Why do we iterate only until len(words) - 2? Because we need to examine groups of three words at each position. If we start at position i, we look at words at positions i, i+1, and i+2. The last valid starting position is therefore len(words) - 3, which means we iterate while i < len(words) - 2.

The beauty of this approach is its simplicity - we don't need complex string matching algorithms or regular expressions. By converting the problem to word-level matching instead of character-level, we avoid complications like partial word matches or dealing with word boundaries. Each word is treated as a distinct unit, making the comparison straightforward with the == operator.

Solution Approach

The implementation follows a straightforward linear scan approach with a sliding window pattern:

  1. Text Preprocessing: First, we split the input text into an array of words using text.split(). This built-in method handles multiple spaces and gives us clean word tokens to work with.

  2. Initialize Result Container: We create an empty list ans to store all the third words that match our pattern.

  3. Sliding Window Traversal: We iterate through the words array with index i from 0 to len(words) - 2. At each position, we extract a window of three consecutive words:

    a, b, c = words[i : i + 3]

    This slice operation words[i : i + 3] gives us exactly three words starting from index i.

  4. Pattern Matching: For each triplet (a, b, c), we check if the pattern matches:

    if a == first and b == second:
        ans.append(c)

    When both conditions are satisfied (first word equals first and second word equals second), we add the third word c to our result list.

  5. Return Results: After scanning through all possible positions, we return the collected list of third words.

The time complexity is O(n) where n is the number of words in the text, as we visit each word position once. The space complexity is O(n) for storing the split words array, plus O(k) for the result where k is the number of matching patterns found.

This solution elegantly handles edge cases:

  • If the text has fewer than 3 words, the loop doesn't execute and returns an empty list
  • Multiple occurrences of the pattern are all captured
  • The same third word can appear multiple times in the result if the pattern occurs multiple times

Ready to land your dream job?

Unlock your dream job with a 3-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a small example to illustrate the solution approach.

Input:

  • text = "I love to code and love to learn"
  • first = "love"
  • second = "to"

Step 1: Split the text into words

words = ["I", "love", "to", "code", "and", "love", "to", "learn"]

Step 2: Initialize result container

ans = []

Step 3: Iterate with sliding window (i from 0 to len(words) - 2 = 6)

  • i = 0: Extract triplet ["I", "love", "to"]

    • a = "I", b = "love", c = "to"
    • Check: a == "love"? No → skip
  • i = 1: Extract triplet ["love", "to", "code"]

    • a = "love", b = "to", c = "code"
    • Check: a == "love"? Yes, b == "to"? Yes → Match found!
    • Add "code" to ans: ans = ["code"]
  • i = 2: Extract triplet ["to", "code", "and"]

    • a = "to", b = "code", c = "and"
    • Check: a == "love"? No → skip
  • i = 3: Extract triplet ["code", "and", "love"]

    • a = "code", b = "and", c = "love"
    • Check: a == "love"? No → skip
  • i = 4: Extract triplet ["and", "love", "to"]

    • a = "and", b = "love", c = "to"
    • Check: a == "love"? No → skip
  • i = 5: Extract triplet ["love", "to", "learn"]

    • a = "love", b = "to", c = "learn"
    • Check: a == "love"? Yes, b == "to"? Yes → Match found!
    • Add "learn" to ans: ans = ["code", "learn"]

Step 4: Return result

return ["code", "learn"]

The pattern "love to" appears twice in the text, followed by "code" and "learn" respectively, which are both captured in our result.

Solution Implementation

1class Solution:
2    def findOcurrences(self, text: str, first: str, second: str) -> List[str]:
3        """
4        Find all occurrences of the third word that follows the pattern: first second third
5      
6        Args:
7            text: Input string containing words separated by spaces
8            first: The first word in the pattern to match
9            second: The second word in the pattern to match
10          
11        Returns:
12            List of third words that appear after consecutive occurrences of first and second
13        """
14        # Split the text into individual words
15        words = text.split()
16      
17        # Initialize result list to store third words
18        result = []
19      
20        # Iterate through the words, stopping 2 positions before the end
21        # to ensure we have enough words to check a triplet
22        for i in range(len(words) - 2):
23            # Extract three consecutive words starting at index i
24            word_1, word_2, word_3 = words[i:i + 3]
25          
26            # Check if the first two words match our pattern
27            if word_1 == first and word_2 == second:
28                # Add the third word to our result
29                result.append(word_3)
30      
31        return result
32
1class Solution {
2  
3    /**
4     * Finds all occurrences of words that appear immediately after the pattern "first second" in the text.
5     * 
6     * @param text   The input text to search through
7     * @param first  The first word in the pattern to match
8     * @param second The second word in the pattern to match
9     * @return An array of words that appear immediately after each occurrence of "first second"
10     */
11    public String[] findOcurrences(String text, String first, String second) {
12        // Split the text into individual words using space as delimiter
13        String[] words = text.split(" ");
14      
15        // List to store the third words that follow the pattern "first second"
16        List<String> resultList = new ArrayList<>();
17      
18        // Iterate through the words array, stopping 2 positions before the end
19        // to ensure we have room to check for the pattern "first second third"
20        for (int i = 0; i < words.length - 2; i++) {
21            // Check if current word matches 'first' and next word matches 'second'
22            if (first.equals(words[i]) && second.equals(words[i + 1])) {
23                // Add the word following the pattern to the result list
24                resultList.add(words[i + 2]);
25            }
26        }
27      
28        // Convert the list to an array and return
29        return resultList.toArray(new String[0]);
30    }
31}
32
1class Solution {
2public:
3    vector<string> findOcurrences(string text, string first, string second) {
4        // Create input string stream to parse the text
5        istringstream inputStream(text);
6      
7        // Store all words from the text
8        vector<string> words;
9        string currentWord;
10      
11        // Extract words from the text using stream extraction
12        while (inputStream >> currentWord) {
13            words.emplace_back(currentWord);
14        }
15      
16        // Result vector to store third words after matching bigrams
17        vector<string> result;
18      
19        // Get the total number of words
20        int wordCount = words.size();
21      
22        // Iterate through words to find matching bigrams (first, second)
23        // Stop at wordCount - 2 to ensure we have at least 3 consecutive words
24        for (int i = 0; i < wordCount - 2; ++i) {
25            // Check if current word matches 'first' and next word matches 'second'
26            if (words[i] == first && words[i + 1] == second) {
27                // Add the third word (after the bigram) to the result
28                result.emplace_back(words[i + 2]);
29            }
30        }
31      
32        return result;
33    }
34};
35
1/**
2 * Finds all words that appear immediately after the pattern "first second" in the text
3 * @param text - The input text to search through
4 * @param first - The first word of the pattern to match
5 * @param second - The second word of the pattern to match
6 * @returns An array of words that appear after each occurrence of "first second"
7 */
8function findOcurrences(text: string, first: string, second: string): string[] {
9    // Split the text into individual words
10    const words: string[] = text.split(' ');
11  
12    // Get the total number of words
13    const wordCount: number = words.length;
14  
15    // Initialize array to store the results
16    const results: string[] = [];
17  
18    // Iterate through words, stopping 2 positions before the end
19    // to ensure we have space to check for the pattern and the following word
20    for (let index = 0; index < wordCount - 2; index++) {
21        // Check if current word matches 'first' and next word matches 'second'
22        if (words[index] === first && words[index + 1] === second) {
23            // Add the word that follows the pattern to results
24            results.push(words[index + 2]);
25        }
26    }
27  
28    return results;
29}
30

Time and Space Complexity

Time Complexity: O(n) where n is the length of the input string text.

  • The text.split() operation takes O(n) time to split the string into words
  • The for loop iterates through len(words) - 2 iterations, which is O(m) where m is the number of words
  • Inside each iteration, slicing words[i : i + 3] takes O(1) time since it always slices exactly 3 elements
  • The string comparisons a == first and b == second take O(1) time on average (assuming constant-length words)
  • The append() operation takes amortized O(1) time
  • Since the number of words m is proportional to the length of the text n, the overall time complexity is O(n)

Space Complexity: O(n) where n is the length of the input string text.

  • The words list stores all words from the split operation, requiring O(n) space to store all characters from the input text
  • The ans list stores the result, which in the worst case could contain up to O(m) words where m is the number of words in the text
  • The temporary variables a, b, c use O(1) space
  • Since the total space used is dominated by storing the split words, the overall space complexity is O(n)

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Incorrect Handling of Multiple Consecutive Spaces

One common pitfall is assuming that text.split() and text.split(' ') behave identically. When the input text contains multiple consecutive spaces or leading/trailing spaces, using text.split(' ') can create empty strings in the words array, leading to incorrect pattern matching.

Problem Example:

text = "alice  is   a good girl"  # Multiple spaces between words
first = "a"
second = "good"

# Wrong approach:
words = text.split(' ')  # Results in ['alice', '', 'is', '', '', 'a', 'good', 'girl']

# Correct approach:
words = text.split()     # Results in ['alice', 'is', 'a', 'good', 'girl']

Solution: Always use text.split() without arguments, which automatically handles multiple spaces, tabs, and newlines as delimiters and filters out empty strings.

2. Case Sensitivity Issues

Another pitfall occurs when the problem requirements are unclear about case sensitivity. The current solution is case-sensitive, meaning "Good" and "good" are treated as different words.

Problem Example:

text = "Alice is A Good girl she is a good student"
first = "a"
second = "good"
# This would only match "a good" but not "A Good"

Solution: If case-insensitive matching is required, normalize the case:

def findOcurrences(self, text: str, first: str, second: str) -> List[str]:
    words = text.split()
    first_lower = first.lower()
    second_lower = second.lower()
    result = []
  
    for i in range(len(words) - 2):
        word_1, word_2, word_3 = words[i:i + 3]
        if word_1.lower() == first_lower and word_2.lower() == second_lower:
            result.append(word_3)  # Return original case
  
    return result

3. Not Validating Input Assumptions

A subtle pitfall is not considering edge cases in the input, such as empty strings or single-word inputs.

Problem Example:

text = ""  # Empty string
# or
text = "word"  # Single word
# or
text = "first second"  # Only two words when we need three

Solution: While the current implementation handles these gracefully (the loop simply doesn't execute), it's good practice to explicitly check:

def findOcurrences(self, text: str, first: str, second: str) -> List[str]:
    if not text or not first or not second:
        return []
  
    words = text.split()
    if len(words) < 3:
        return []
  
    # Continue with the main logic...

4. Overlapping Pattern Consideration

A less obvious pitfall is not recognizing that patterns can overlap when the same word appears multiple times consecutively.

Problem Example:

text = "a a a a"
first = "a"
second = "a"
# The patterns "a a a" overlap at indices 0-2 and 1-3
# Should return ["a", "a"]

The current solution correctly handles this by advancing the window one position at a time, but developers might mistakenly try to skip ahead after finding a match, missing overlapping patterns.

Discover Your Strengths and Weaknesses: Take Our 3-Minute Quiz to Tailor Your Study Plan:

Which of the following uses divide and conquer strategy?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More