792. Number of Matching Subsequences
Problem Description
The problem requires you to determine how many strings in the array words
are subsequences of the string s
. A subsequence of a string is defined as a sequence that can be derived from the original string by removing some characters without altering the order of the remaining characters. For instance, "ace" is a subsequence of "abcde" because you can remove 'b' and 'd' to get "ace". Your goal is to return the count of strings in words
that can be formed in this way from the string s
.
Intuition
To solve this problem, the key idea is to track the progress of each word in the words
array as we iterate through the string s
. Instead of checking each word against the entire string s
at once, which would be inefficient, we use a smarter approach. We create a dictionary that maps the first character of each word to a queue holding tuples that represent the words. Each tuple consists of an index i
and a position j
, where i
is the index of the word in the words
array, and j
is the position up to which we have successfully matched the word in words[i]
with characters in s
.
As we iterate through the string s
, we get the queue of words (tuples) that are expecting the current character. For each word in the queue, if we find a matching character in s
, we move to the next character in the word by incrementing j
. If we reach the end of a word, it means we have successfully found it as a subsequence in s
, and we increment our answer count (ans
).
We use a queue because it efficiently supports removing elements from the front, allowing us to process the words in the order they are expecting to match characters from s
. By organizing the words based on the next expected character, we only consider the relevant part of words
for each character in s
, making the process efficient.
Learn more about Trie, Binary Search, Dynamic Programming and Sorting patterns.
Solution Approach
The solution approach makes use of the following algorithms, data structures, and patterns:
-
Dictionary with Queue: A
defaultdict(deque)
from Python'scollections
module is used to map the first character of each word inwords
to a queue containing tuples of the form(i, j)
. Herei
is the index of the word in the arraywords
, andj
is the current character position we have matched in the word so far. -
Character Iteration and Matching: The solution iterates through each character
c
in the strings
and uses it to reference the queue of tuples waiting for that character (d[c]
). -
Queue Processing: Within this iteration, another loop runs for the length of the queue
d[c]
, which processes each tuple. For each tuple(i, j)
dequeued fromd[c]
, thej
is incremented to point to the next character since the current onec
is a match. -
Subsequence Completion Check: If the incremented
j
equals the length of the wordwords[i]
, it means all the characters inwords[i]
are matched in sequence with some characters ofs
, thereforewords[i]
is a subsequence ofs
. When this condition is met, the answer counterans
is incremented. -
Updating the Dictionary with Queues: If
j
is not equal to the length of the word, the tuple(i, j)
is not completed yet and must wait for the next characterwords[i][j]
. Hence, it is appended back into the queue corresponding to this next expected characterd[words[i][j]]
.
In essence, the algorithm processes s
character by character, moving 'pointers' through each word in words
when a matching character is found, and recounts every time a word is completely matched as a subsequence.
The algorithm complexity is mainly O(N + M), where N is the length of the string s
, and M is the total length of all words in words
. It's because each character of each word is processed at most once when the corresponding character in s
is found.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example to illustrate the solution approach in action. Consider the string s = "abcde"
and the list of words words = ["a", "bb", "acd", "ace"]
. We want to find how many words are subsequences of s
.
To begin, we initialize our dictionary with queues, d
, that will hold the progress for each word in words
as we iterate over s
. For our example, the initial state of the dictionary will be as follows since each queue will contain tuples with the index of the word in words
and the initial character position 0
.
{ 'a': deque([(0, 0), (2, 0), (3, 0)]), 'b': deque([(1, 0)]), }
Now we start iterating over s
, character by character:
-
For the first character
'a'
ins
, we look ind
to find the queue for'a'
, which isdeque([(0, 0), (2, 0), (3, 0)])
. We have three words waiting to match the character'a'
ins
.a. dequeue (0, 0): Since 'a' in
s
matches the first character ofwords[0]
which is "a", it's a complete match. We incrementans
to 1.b. dequeue (2, 0): Now we're looking at
words[2]
which is "acd". The character 'a' matched, so we incrementj
to 1 and requeue the tuple since "acd" is not yet fully matched. Now the tuple is (2, 1).c. dequeue (3, 0): Similarly, for
words[3]
which is "ace", we have a match on 'a', incrementj
to 1, and requeue the tuple to (3, 1).Now our dictionary looks like this:
{ 'b': deque([(1, 0)]), 'c': deque([(2, 1), (3, 1)]), }
-
The second character of
s
is'b'
. We check our dictionary for the queue under'b'
, which isdeque([(1, 0)])
.a. dequeue (1, 0): We look at
words[1]
which is "bb". Since 'b' matches the first character, we incrementj
to 1 and enqueue the tuple (1, 1) back because it's still awaiting another 'b'.And the dictionary updates to:
{ 'c': deque([(2, 1), (3, 1)]), 'b': deque([(1, 1)]), }
-
Next,
s
gives us 'c'. We process the queue under'c'
, which isdeque([(2, 1), (3, 1)])
.a. dequeue (2, 1): "acd" in words[2] has matched 'a' and now 'c', so
j
becomes 2. Sincej
is not equal to the length of "acd", we enqueue tuple (2, 2) since we are now waiting for 'd'.b. dequeue (3, 1): Similarly, "ace" in words[3] has matched 'a' and 'c', and
j
is incremented to 2, indicating we are waiting for 'e' next.Updated dictionary:
{ 'b': deque([(1, 1)]), 'd': deque([(2, 2)]), 'e': deque([(3, 2)]), }
-
The next character in
s
is'd'
, which affects the queue under'd'
.a. dequeue (2, 2): The tuple for "acd" is fully matched now because 'd' was the last needed character, so we increment
ans
by 1 (totalans
is now 2).Dictionary after the operation:
{ 'b': deque([(1, 1)]), 'e': deque([(3, 2)]), }
-
The last character in
s
is'e'
, and we process the queue under'e'
.a. dequeue (3, 2): Finally, "ace" is fully matched, and our total count
ans
is incremented by one more (totalans
is now 3).
The end state of the dictionary is not relevant anymore since we have processed all of s
. The algorithm finishes with ans = 3
because we've found that three out of four words in words
are subsequences of s
. The word "bb" was not a subsequence since the second 'b' couldn't be matched.
Solution Implementation
1from collections import defaultdict, deque
2
3class Solution:
4 def num_matching_subseq(self, string: str, words: List[str]) -> int:
5 """
6 Count the number of words from the list 'words' that are subsequences of the string 'string'.
7
8 Args:
9 string (str): the string to be searched for subsequences.
10 words (List[str]): the list of words to check as potential subsequences.
11
12 Returns:
13 int: the number of words that are subsequences of 'string'.
14 """
15
16 # Initialize a dictionary to hold queues of tuples for each character in words.
17 # Each tuple contains an index of a word in words and the next character index to search for.
18 char_to_word_indices = defaultdict(deque)
19
20 # Populate the dictionary with initial character indices for each word.
21 for index, word in enumerate(words):
22 char_to_word_indices[word[0]].append((index, 0))
23
24 # Counter for the number of matching subsequences found.
25 matching_count = 0
26
27 # Loop through each character in the input string.
28 for char in string:
29 # Process all tuples (word index, character index) for the current character.
30 for _ in range(len(char_to_word_indices[char])):
31 word_index, char_index = char_to_word_indices[char].popleft()
32 char_index += 1 # Move to the next character in the current word
33
34 # If we reached the end of the word, increment the count of matching subsequences.
35 if char_index == len(words[word_index]):
36 matching_count += 1
37 else:
38 # Else, append the tuple (word index, next character index) to the appropriate list.
39 next_char = words[word_index][char_index]
40 char_to_word_indices[next_char].append((word_index, char_index))
41
42 # Return the total count of matching subsequences.
43 return matching_count
44```
45
46Remember to include the necessary import statement at the top for `List` from `typing`:
47
48```python
49from typing import List
50
1class Solution {
2 public int numMatchingSubseq(String s, String[] words) {
3 // Create an array of dequeues to store positions of characters in words
4 Deque<int[]>[] waitingChars = new Deque[26];
5
6 // Initialize each dequeue in the array
7 Arrays.setAll(waitingChars, k -> new ArrayDeque<>());
8
9 // Fill the deques with the first characters of each word in words array
10 for (int i = 0; i < words.length; ++i) {
11 waitingChars[words[i].charAt(0) - 'a'].offer(new int[]{i, 0});
12 }
13
14 int matches = 0; // Store the number of matching subsequences
15
16 // Iterate through each character of the string s
17 for (char c : s.toCharArray()) {
18 // Get the deque corresponding to the current character
19 Deque<int[]> queue = waitingChars[c - 'a'];
20
21 // Process all elements in queue
22 for (int i = queue.size(); i > 0; --i) {
23 // Dequeue the first element
24 int[] pair = queue.pollFirst();
25 int wordIndex = pair[0], charIndex = pair[1] + 1;
26
27 // If the whole word is found, increment the match count
28 if (charIndex == words[wordIndex].length()) {
29 ++matches;
30 } else {
31 // Otherwise, enqueue the pair back with the updated character index
32 waitingChars[words[wordIndex].charAt(charIndex) - 'a'].offer(new int[]{wordIndex, charIndex});
33 }
34 }
35 }
36 // Return the total number of matching subsequences found
37 return matches;
38 }
39}
40
1class Solution {
2public:
3 // Function to count the number of words from the `words` vector that are subsequences of the string `s`
4 int numMatchingSubseq(string s, vector<string>& words) {
5 // Create a vector of queues to store pairs of indices.
6 // Each element represents a queue for words starting with the corresponding character ('a' to 'z').
7 vector<queue<pair<int, int>>> charQueues(26);
8
9 // Populate the queues with pairs: the index of the word in the `words` vector and the character index being checked.
10 for (int i = 0; i < words.size(); ++i) {
11 charQueues[words[i][0] - 'a'].emplace(i, 0);
12 }
13
14 // Initialize the answer to 0.
15 int ans = 0;
16
17 // Iterate over each character in the string `s`
18 for (char& c : s) {
19 // Reference to the queue corresponding to the current character
20 auto& queue = charQueues[c - 'a'];
21
22 // Process each pair in the queue
23 for (int t = queue.size(); t > 0; --t) {
24 // Get the front pair (word index, char index) and remove it from the queue
25 auto [wordIndex, charIndex] = queue.front();
26 queue.pop();
27
28 // Increment the char index to check the next character of the word
29 ++charIndex;
30
31 // If all characters in the word have been found in `s`, increment the answer
32 if (charIndex == words[wordIndex].size()) {
33 ++ans;
34 } else {
35 // Otherwise, push the pair (word index, next char index) to the corresponding queue
36 // for the next character to be checked in `words[wordIndex]`
37 charQueues[words[wordIndex][charIndex] - 'a'].emplace(wordIndex, charIndex);
38 }
39 }
40 }
41 // Return the total count of matching subsequences
42 return ans;
43 }
44};
45
1// Define a method to count the number of words from the `words` array that are subsequences of the string `s`.
2function numMatchingSubseq(s: string, words: string[]): number {
3 // Create an array of queues to store pairs of indices.
4 // Each element represents a queue for words starting with the corresponding character ('a' to 'z').
5 const charQueues: Array<Queue<[number, number]>> = Array.from(
6 { length: 26 },
7 () => new Queue<[number, number]>());
8
9 // Populate the queues with pairs: the index of the word in the `words` array and the character index being checked.
10 for (let i = 0; i < words.length; ++i) {
11 const charIndex = words[i].charCodeAt(0) - 'a'.charCodeAt(0);
12 charQueues[charIndex].enqueue([i, 0]);
13 }
14
15 // Initialize the answer to 0.
16 let ans = 0;
17
18 // Iterate over each character in the string `s`.
19 for (const c of s) {
20 // Reference to the queue corresponding to the current character.
21 const queue = charQueues[c.charCodeAt(0) - 'a'.charCodeAt(0)];
22
23 // Process each pair in the queue.
24 const size = queue.size();
25 for (let t = 0; t < size; ++t) {
26 // Get the front pair (word index, char index) and remove it from the queue.
27 const [wordIndex, charIndex] = queue.dequeue();
28
29 // Increment the char index to check the next character of the word.
30 const nextCharIndex = charIndex + 1;
31
32 // If all characters in the word have been found in `s`, increment the answer.
33 if (nextCharIndex === words[wordIndex].length) {
34 ans++;
35 } else {
36 // Otherwise, push the pair (word index, next char index) to the corresponding queue
37 // for the next character to be checked in `words[wordIndex]`.
38 const nextCharCode = words[wordIndex].charCodeAt(nextCharIndex) - 'a'.charCodeAt(0);
39 charQueues[nextCharCode].enqueue([wordIndex, nextCharIndex]);
40 }
41 }
42 }
43
44 // Return the total count of matching subsequences.
45 return ans;
46}
47
48// Helper class for Queue
49class Queue<T> {
50 private data: T[] = [];
51
52 enqueue(item: T): void {
53 this.data.push(item);
54 }
55
56 dequeue(): T {
57 return this.data.shift();
58 }
59
60 size(): number {
61 return this.data.length;
62 }
63}
64
Time and Space Complexity
Time Complexity
The time complexity of the given code can be analyzed as follows:
- Initializing the dictionary
d
with adeque
for each unique character has a space complexity that depends on the number of unique characters inwords
list, but initializing the deques themselves takes O(1) time. - The first loop goes through all the words in
words
list to populate the dictionaryd
. Each word is processed in O(1) time to append its index and starting character position to the corresponding deque, resulting in O(n) time complexity for this step, where n is the number of words inwords
. - The second loop goes through each character in the input string
s
. For each characterc
, it processes all indices(i, j)
ind[c]
. Since each word is exactly processed once, and each character ins
results in at most one deque operation (eitherpopleft
orappend
), the number of operations is proportional to the length ofs
plus the total length of all words. Hence, the time complexity, in this case, is O(m + k), where m is the length ofs
and k is the sum of lengths of all words inwords
.
Thus, the overall time complexity combines the complexities of both loops, resulting in O(n + m + k).
Space Complexity
The space complexity of the code is determined by:
- The dictionary
d
which contains as many keys as there are unique starting characters in thewords
, and each key has adeque
that contains at most n tuples(i, j)
, wheren
is the number of words. However, since each word can only be in one deque at a time, it's more precise to say that it contains a total of n tuples across all deques. - There are no other data structures that require significant space.
Therefore, the space complexity is O(n), where n is the number of words in words
.
Learn more about how to find time and space complexity quickly using problem constraints.
A heap is a ...?
Recommended Readings
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
Want a Structured Path to Master System Design Too? Don’t Miss This!