1078. Occurrences After Bigram
Problem Description
The problem is about searching for a specific pattern in a string of text. This pattern is formed by three consecutive words: first
, second
, and third
. The words first
and second
are given as inputs, and the goal is to find the word third
that immediately follows each occurrence of the sequence first second
within a given block of text. You're asked to collect all these third
words into an array and return it. It's important to note that the pattern should be in the correct order, and the words must directly follow one another with no other words in between.
For example, if the input text is "alice is alice is there and alice is here", first
is "alice" and second
is "is", you need to return all the words that come immediately after each "alice is", which are "there" and "here".
Intuition
To solve this problem, the solution approach starts with breaking down the text into individual words. This is accomplished by using the split()
function which divides the text into a list of words based on whitespace.
Next, we iterate over this list of words with a running index. In each iteration, we check if the current word, the word right after it, and the word following these two match the pattern of first
, second
, and then any third
. We are interested in this third word only when the first two match the given input words.
To perform this check efficiently, we look at slices of three words at a time using the current index: words[i : i + 3]
. If we find a match for first
and second
, we take the third word and add it to our answer list.
This process is repeated for every group of three consecutive words in the list, and we stop iterating two words before the end of the list because there's no enough room to fit the entire three-word pattern. Finally, the solution returns the collected list of third
words that followed each first second
pattern found in the text.
Solution Approach
The implementation of the solution to this problem involves a straightforward algorithm that leverages basic data structures and the concept of string manipulation.
Firstly, the text is converted into a list of words, this is done using the split()
function which is a standard method in Python for dividing strings into a list based on a delimiter, which in this case, is space.
words = text.split()
Once we have our list of words, our goal is to iterate through this list and identify consecutive triplets where the first and second words match the given inputs.
The primary data structure used in the implementation is a simple list to keep track of our answers:
ans = []
The iteration starts at the beginning of the list and continues until the third-to-last word, allowing us to look at triplets without going out of the list's bounds:
for i in range(len(words) - 2):
At each step of the iteration, we consider a slice of three consecutive words:
a, b, c = words[i : i + 3]
This makes a
the current word, b
the word following it, and c
the one right after b
. We then compare a
and b
to our input first
and second
:
if a == first and b == second:
If they match, it means we've found an occurrence of the pattern, and we append the third word c
to our answers list:
ans.append(c)
The loop continues until all valid triplets have been considered. The resulting ans
list, which now contains all the third
words that followed each first second
pattern, is then returned:
return ans
No additional or complex data structures are required, and the algorithm runs with a time complexity of O(n), where n is the number of words in the text, as it requires a single pass through the list of words. The space complexity is O(m), where m is the count of valid third
words, since we store each one in the ans
list.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's apply the solution approach with an example. Suppose our input text is "the quick brown fox jumps over the lazy dog", and we're looking for the first
word "quick" and the second
word "brown." Our goal is to find the third
word that comes right after each "quick brown" sequence in the text.
Step by step, the algorithm does the following:
- Split the text into individual words:
words = "the quick brown fox jumps over the lazy dog".split() # words = ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
- Initialize an empty list to store the answers:
ans = []
- Iterate over the list of words, stopping two words before the last word to ensure we can look at groups of three:
for i in range(len(words) - 2): # This loops from 0 to len(words) - 3
- In each iteration, create a slice of three words and assign them to variables
a
,b
, andc
:
# Iteration 0: i = 0 a, b, c = words[0 : 0 + 3] # a = 'the', b = 'quick', c = 'brown' # Iteration 1: i = 1 a, b, c = words[1 : 1 + 3] # a = 'quick', b = 'brown', c = 'fox' # And so on...
- Compare
a
andb
with thefirst
andsecond
words. If they match, appendc
to theans
list:
# Iteration 1: a = 'quick', b = 'brown', c = 'fox' if a == "quick" and b == "brown": # This condition is True ans.append(c) # ans = ['fox']
- Continue the loop until we have checked every group of three words. For this example, only one match exists, and the loop ends after iteration 6 (last index checked is 6 because
len(words) - 2
is 7, and range is exclusive on the end):
# After the loop ends return ans # ans = ['fox']
So, in this case, the answer list ans
contains the word "fox," which is the word that follows "quick brown" in the given text.
Using this method, the algorithm efficiently identifies the words that follow each occurrence of a specified two-word sequence in linear time.
Solution Implementation
1class Solution:
2 def findOcurrences(self, text: str, first: str, second: str) -> list[str]:
3 # Split input text into a list of words
4 words = text.split()
5 # Initialize an empty list to store the third words following the first and second words
6 third_words = []
7
8 # Iterate through the list of words, stopping two words before the end
9 for i in range(len(words) - 2):
10 # Unpack the current triplet of words for easy comparison
11 current_first, current_second, following_word = words[i : i + 3]
12 # Check if the current first two words match the provided first and second words
13 if current_first == first and current_second == second:
14 # If yes, append the following word to the results list
15 third_words.append(following_word)
16
17 # Return the list of third words that follow the first and second words
18 return third_words
19
1class Solution {
2 public String[] findOccurrences(String text, String first, String second) {
3 // Split the input text into words.
4 String[] words = text.split(" ");
5
6 // Create a list to store the third words following the 'first' and 'second' words.
7 List<String> thirdWordsList = new ArrayList<>();
8
9 // Iterate through the words, stopping two words before the last to avoid out-of-bounds access.
10 for (int i = 0; i < words.length - 2; ++i) {
11 // Check if the current word is equal to 'first' and the next word is equal to 'second'.
12 if (first.equals(words[i]) && second.equals(words[i + 1])) {
13 // If the condition is met, add the word that comes after 'second' to the list.
14 thirdWordsList.add(words[i + 2]);
15 }
16 }
17
18 // Convert the list of third words to an array and return it.
19 return thirdWordsList.toArray(new String[0]);
20 }
21}
22
1#include <sstream>
2#include <string>
3#include <vector>
4
5class Solution {
6public:
7 // Function to find all occurrences of third word that immediately follow
8 // the first and second words in the given text.
9 std::vector<std::string> findOcurrences(std::string text, std::string first, std::string second) {
10 // Create an input string stream from the given text
11 std::istringstream inputStream(text);
12 std::vector<std::string> words; // Vector to store all words from the text
13 std::string word; // Variable to hold each word while extracting from stream
14
15 // Read words from the stream and emplace them to the words vector
16 while (inputStream >> word) {
17 words.emplace_back(word);
18 }
19
20 std::vector<std::string> result; // Vector to store the result
21 int numWords = words.size(); // Get the total number of words
22
23 // Iterate over all words, stopping 2 words before the last
24 for (int i = 0; i < numWords - 2; ++i) {
25 // If the current and next word match 'first' and 'second', respectively
26 if (words[i] == first && words[i + 1] == second) {
27 // Add the word immediately following them to the result
28 result.emplace_back(words[i + 2]);
29 }
30 }
31
32 return result; // Return the final result vector
33 }
34};
35
1/**
2 * This function finds all the occurrences of a triplet pattern in a sentence, where the first two
3 * words in the triplet are given as inputs 'first' and 'second', and returns an array containing
4 * the third words of those triplets.
5 * @param text - The string text in which to search for the triplets.
6 * @param first - The first word in the triplet sequence to match.
7 * @param second - The second word in the triplet sequence to match.
8 * @returns An array of the third words following each found 'first' and 'second' word pair.
9 */
10function findOcurrences(text: string, first: string, second: string): string[] {
11 // Split the input text into an array of words.
12 const words = text.split(' ');
13 // Determine the number of words in the array.
14 const wordCount = words.length;
15 // Initialize an array to store the results.
16 const matches: string[] = [];
17
18 // Iterate through each word in the array until the second-to-last word.
19 for (let i = 0; i < wordCount - 2; i++) {
20 // Check if a sequence matches the 'first' and 'second' words.
21 if (words[i] === first && words[i + 1] === second) {
22 // If a match is found, add the third word to the results array.
23 matches.push(words[i + 2]);
24 }
25 }
26
27 // Return the array of matching third words.
28 return matches;
29}
30
Time and Space Complexity
Time Complexity
The time complexity of the given function is O(n)
, where n
is the number of words in the input string text
. This complexity arises because the function goes through all the words in the text sequentially only once, examining if each sequence of three words starts with the first
and second
words accordingly. With the number of iterations equal to len(words) - 2
, and because string splitting, comparisons, and appending to the list are all O(1)
operations for each iteration, the overall time complexity remains linear.
Space Complexity
The space complexity of the function is O(m)
, where m
is the number of triplets found that match the given pattern. This is because the space required is directly proportional to the number of third words "c"
that follows a matching pair ("a"
, "b"
). We also have additional O(n)
space complexity from creating the words
list where n
is the number of words in the given text. However, since m
is bounded by n
(i.e., in the worst case, m
equals n-2
when every triplet in the text is a match), the dominant term is still O(n)
. Therefore, we simplify this and express the overall space complexity as O(n)
.
Learn more about how to find time and space complexity quickly using problem constraints.
How many times is a tree node visited in a depth first search?
Recommended Readings
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Want a Structured Path to Master System Design Too? Don’t Miss This!