1078. Occurrences After Bigram

Leetcode Link

Problem Description

The problem is about searching for a specific pattern in a string of text. This pattern is formed by three consecutive words: first, second, and third. The words first and second are given as inputs, and the goal is to find the word third that immediately follows each occurrence of the sequence first second within a given block of text. You're asked to collect all these third words into an array and return it. It's important to note that the pattern should be in the correct order, and the words must directly follow one another with no other words in between.

For example, if the input text is "alice is alice is there and alice is here", first is "alice" and second is "is", you need to return all the words that come immediately after each "alice is", which are "there" and "here".


To solve this problem, the solution approach starts with breaking down the text into individual words. This is accomplished by using the split() function which divides the text into a list of words based on whitespace.

Next, we iterate over this list of words with a running index. In each iteration, we check if the current word, the word right after it, and the word following these two match the pattern of first, second, and then any third. We are interested in this third word only when the first two match the given input words.

To perform this check efficiently, we look at slices of three words at a time using the current index: words[i : i + 3]. If we find a match for first and second, we take the third word and add it to our answer list.

This process is repeated for every group of three consecutive words in the list, and we stop iterating two words before the end of the list because there's no enough room to fit the entire three-word pattern. Finally, the solution returns the collected list of third words that followed each first second pattern found in the text.

Solution Approach

The implementation of the solution to this problem involves a straightforward algorithm that leverages basic data structures and the concept of string manipulation.

Firstly, the text is converted into a list of words, this is done using the split() function which is a standard method in Python for dividing strings into a list based on a delimiter, which in this case, is space.

1words = text.split()

Once we have our list of words, our goal is to iterate through this list and identify consecutive triplets where the first and second words match the given inputs.

The primary data structure used in the implementation is a simple list to keep track of our answers:

1ans = []

The iteration starts at the beginning of the list and continues until the third-to-last word, allowing us to look at triplets without going out of the list's bounds:

1for i in range(len(words) - 2):

At each step of the iteration, we consider a slice of three consecutive words:

1a, b, c = words[i : i + 3]

This makes a the current word, b the word following it, and c the one right after b. We then compare a and b to our input first and second:

1if a == first and b == second:

If they match, it means we've found an occurrence of the pattern, and we append the third word c to our answers list:

1    ans.append(c)

The loop continues until all valid triplets have been considered. The resulting ans list, which now contains all the third words that followed each first second pattern, is then returned:

1return ans

No additional or complex data structures are required, and the algorithm runs with a time complexity of O(n), where n is the number of words in the text, as it requires a single pass through the list of words. The space complexity is O(m), where m is the count of valid third words, since we store each one in the ans list.

Level Up Your
Algo Skills

Example Walkthrough

Let's apply the solution approach with an example. Suppose our input text is "the quick brown fox jumps over the lazy dog", and we're looking for the first word "quick" and the second word "brown." Our goal is to find the third word that comes right after each "quick brown" sequence in the text.

Step by step, the algorithm does the following:

  1. Split the text into individual words:
1words = "the quick brown fox jumps over the lazy dog".split()
2# words = ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
  1. Initialize an empty list to store the answers:
1ans = []
  1. Iterate over the list of words, stopping two words before the last word to ensure we can look at groups of three:
1for i in range(len(words) - 2): # This loops from 0 to len(words) - 3
  1. In each iteration, create a slice of three words and assign them to variables a, b, and c:
1# Iteration 0: i = 0
2a, b, c = words[0 : 0 + 3] # a = 'the', b = 'quick', c = 'brown'
3# Iteration 1: i = 1
4a, b, c = words[1 : 1 + 3] # a = 'quick', b = 'brown', c = 'fox'
5# And so on...
  1. Compare a and b with the first and second words. If they match, append c to the ans list:
1# Iteration 1: a = 'quick', b = 'brown', c = 'fox'
2if a == "quick" and b == "brown": # This condition is True
3    ans.append(c) # ans = ['fox']
  1. Continue the loop until we have checked every group of three words. For this example, only one match exists, and the loop ends after iteration 6 (last index checked is 6 because len(words) - 2 is 7, and range is exclusive on the end):
1# After the loop ends
2return ans # ans = ['fox']

So, in this case, the answer list ans contains the word "fox," which is the word that follows "quick brown" in the given text.

Using this method, the algorithm efficiently identifies the words that follow each occurrence of a specified two-word sequence in linear time.

Python Solution

1class Solution:
2    def findOcurrences(self, text: str, first: str, second: str) -> list[str]:
3        # Split input text into a list of words
4        words = text.split()
5        # Initialize an empty list to store the third words following the first and second words
6        third_words = []
8        # Iterate through the list of words, stopping two words before the end
9        for i in range(len(words) - 2):
10            # Unpack the current triplet of words for easy comparison
11            current_first, current_second, following_word = words[i : i + 3]
12            # Check if the current first two words match the provided first and second words
13            if current_first == first and current_second == second:
14                # If yes, append the following word to the results list
15                third_words.append(following_word)
17        # Return the list of third words that follow the first and second words
18        return third_words

Java Solution

1class Solution {
2    public String[] findOccurrences(String text, String first, String second) {
3        // Split the input text into words.
4        String[] words = text.split(" ");
6        // Create a list to store the third words following the 'first' and 'second' words.
7        List<String> thirdWordsList = new ArrayList<>();
9        // Iterate through the words, stopping two words before the last to avoid out-of-bounds access.
10        for (int i = 0; i < words.length - 2; ++i) {
11            // Check if the current word is equal to 'first' and the next word is equal to 'second'.
12            if (first.equals(words[i]) && second.equals(words[i + 1])) {
13                // If the condition is met, add the word that comes after 'second' to the list.
14                thirdWordsList.add(words[i + 2]);
15            }
16        }
18        // Convert the list of third words to an array and return it.
19        return thirdWordsList.toArray(new String[0]);
20    }

C++ Solution

1#include <sstream>
2#include <string>
3#include <vector>
5class Solution {
7    // Function to find all occurrences of third word that immediately follow
8    // the first and second words in the given text.
9    std::vector<std::string> findOcurrences(std::string text, std::string first, std::string second) {
10        // Create an input string stream from the given text
11        std::istringstream inputStream(text);
12        std::vector<std::string> words; // Vector to store all words from the text
13        std::string word; // Variable to hold each word while extracting from stream
15        // Read words from the stream and emplace them to the words vector
16        while (inputStream >> word) {
17            words.emplace_back(word);
18        }
20        std::vector<std::string> result; // Vector to store the result
21        int numWords = words.size(); // Get the total number of words
23        // Iterate over all words, stopping 2 words before the last
24        for (int i = 0; i < numWords - 2; ++i) {
25            // If the current and next word match 'first' and 'second', respectively
26            if (words[i] == first && words[i + 1] == second) {
27                // Add the word immediately following them to the result
28                result.emplace_back(words[i + 2]);
29            }
30        }
32        return result; // Return the final result vector
33    }

Typescript Solution

2 * This function finds all the occurrences of a triplet pattern in a sentence, where the first two
3 * words in the triplet are given as inputs 'first' and 'second', and returns an array containing
4 * the third words of those triplets.
5 * @param text - The string text in which to search for the triplets.
6 * @param first - The first word in the triplet sequence to match.
7 * @param second - The second word in the triplet sequence to match.
8 * @returns An array of the third words following each found 'first' and 'second' word pair.
9 */
10function findOcurrences(text: string, first: string, second: string): string[] {
11    // Split the input text into an array of words.
12    const words = text.split(' ');
13    // Determine the number of words in the array.
14    const wordCount = words.length;
15    // Initialize an array to store the results.
16    const matches: string[] = [];
18    // Iterate through each word in the array until the second-to-last word.
19    for (let i = 0; i < wordCount - 2; i++) {
20        // Check if a sequence matches the 'first' and 'second' words.
21        if (words[i] === first && words[i + 1] === second) {
22            // If a match is found, add the third word to the results array.
23            matches.push(words[i + 2]);
24        }
25    }
27    // Return the array of matching third words.
28    return matches;

Time and Space Complexity

Time Complexity

The time complexity of the given function is O(n), where n is the number of words in the input string text. This complexity arises because the function goes through all the words in the text sequentially only once, examining if each sequence of three words starts with the first and second words accordingly. With the number of iterations equal to len(words) - 2, and because string splitting, comparisons, and appending to the list are all O(1) operations for each iteration, the overall time complexity remains linear.

Space Complexity

The space complexity of the function is O(m), where m is the number of triplets found that match the given pattern. This is because the space required is directly proportional to the number of third words "c" that follows a matching pair ("a", "b"). We also have additional O(n) space complexity from creating the words list where n is the number of words in the given text. However, since m is bounded by n (i.e., in the worst case, m equals n-2 when every triplet in the text is a match), the dominant term is still O(n). Therefore, we simplify this and express the overall space complexity as O(n).

Become an
Algo Monster

Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.

TA 👨‍🏫