792. Number of Matching Subsequences


Problem Description

The problem requires you to determine how many strings in the array words are subsequences of the string s. A subsequence of a string is defined as a sequence that can be derived from the original string by removing some characters without altering the order of the remaining characters. For instance, "ace" is a subsequence of "abcde" because you can remove 'b' and 'd' to get "ace". Your goal is to return the count of strings in words that can be formed in this way from the string s.

Intuition

To solve this problem, the key idea is to track the progress of each word in the words array as we iterate through the string s. Instead of checking each word against the entire string s at once, which would be inefficient, we use a smarter approach. We create a dictionary that maps the first character of each word to a queue holding tuples that represent the words. Each tuple consists of an index i and a position j, where i is the index of the word in the words array, and j is the position up to which we have successfully matched the word in words[i] with characters in s.

As we iterate through the string s, we get the queue of words (tuples) that are expecting the current character. For each word in the queue, if we find a matching character in s, we move to the next character in the word by incrementing j. If we reach the end of a word, it means we have successfully found it as a subsequence in s, and we increment our answer count (ans).

We use a queue because it efficiently supports removing elements from the front, allowing us to process the words in the order they are expecting to match characters from s. By organizing the words based on the next expected character, we only consider the relevant part of words for each character in s, making the process efficient.

Solution Approach

The solution approach makes use of the following algorithms, data structures, and patterns:

  • Dictionary with Queue: A defaultdict(deque) from Python's collections module is used to map the first character of each word in words to a queue containing tuples of the form (i, j). Here i is the index of the word in the array words, and j is the current character position we have matched in the word so far.

  • Character Iteration and Matching: The solution iterates through each character c in the string s and uses it to reference the queue of tuples waiting for that character (d[c]).

  • Queue Processing: Within this iteration, another loop runs for the length of the queue d[c], which processes each tuple. For each tuple (i, j) dequeued from d[c], the j is incremented to point to the next character since the current one c is a match.

  • Subsequence Completion Check: If the incremented j equals the length of the word words[i], it means all the characters in words[i] are matched in sequence with some characters of s, therefore words[i] is a subsequence of s. When this condition is met, the answer counter ans is incremented.

  • Updating the Dictionary with Queues: If j is not equal to the length of the word, the tuple (i, j) is not completed yet and must wait for the next character words[i][j]. Hence, it is appended back into the queue corresponding to this next expected character d[words[i][j]].

In essence, the algorithm processes s character by character, moving 'pointers' through each word in words when a matching character is found, and recounts every time a word is completely matched as a subsequence.

The algorithm complexity is mainly O(N + M), where N is the length of the string s, and M is the total length of all words in words. It's because each character of each word is processed at most once when the corresponding character in s is found.

💪
Level Up Your
Algo Skills

Example Walkthrough

Let's walk through a small example to illustrate the solution approach in action. Consider the string s = "abcde" and the list of words words = ["a", "bb", "acd", "ace"]. We want to find how many words are subsequences of s.

To begin, we initialize our dictionary with queues, d, that will hold the progress for each word in words as we iterate over s. For our example, the initial state of the dictionary will be as follows since each queue will contain tuples with the index of the word in words and the initial character position 0.

1{
2  'a': deque([(0, 0), (2, 0), (3, 0)]),
3  'b': deque([(1, 0)]),
4}

Now we start iterating over s, character by character:

  1. For the first character 'a' in s, we look in d to find the queue for 'a', which is deque([(0, 0), (2, 0), (3, 0)]). We have three words waiting to match the character 'a' in s.

    a. dequeue (0, 0): Since 'a' in s matches the first character of words[0] which is "a", it's a complete match. We increment ans to 1.

    b. dequeue (2, 0): Now we're looking at words[2] which is "acd". The character 'a' matched, so we increment j to 1 and requeue the tuple since "acd" is not yet fully matched. Now the tuple is (2, 1).

    c. dequeue (3, 0): Similarly, for words[3] which is "ace", we have a match on 'a', increment j to 1, and requeue the tuple to (3, 1).

    Now our dictionary looks like this:

    1{
    2  'b': deque([(1, 0)]),
    3  'c': deque([(2, 1), (3, 1)]),
    4}
  2. The second character of s is 'b'. We check our dictionary for the queue under 'b', which is deque([(1, 0)]).

    a. dequeue (1, 0): We look at words[1] which is "bb". Since 'b' matches the first character, we increment j to 1 and enqueue the tuple (1, 1) back because it's still awaiting another 'b'.

    And the dictionary updates to:

    1{
    2  'c': deque([(2, 1), (3, 1)]),
    3  'b': deque([(1, 1)]),
    4}
  3. Next, s gives us 'c'. We process the queue under 'c', which is deque([(2, 1), (3, 1)]).

    a. dequeue (2, 1): "acd" in words[2] has matched 'a' and now 'c', so j becomes 2. Since j is not equal to the length of "acd", we enqueue tuple (2, 2) since we are now waiting for 'd'.

    b. dequeue (3, 1): Similarly, "ace" in words[3] has matched 'a' and 'c', and j is incremented to 2, indicating we are waiting for 'e' next.

    Updated dictionary:

    1{
    2  'b': deque([(1, 1)]),
    3  'd': deque([(2, 2)]),
    4  'e': deque([(3, 2)]),
    5}
  4. The next character in s is 'd', which affects the queue under 'd'.

    a. dequeue (2, 2): The tuple for "acd" is fully matched now because 'd' was the last needed character, so we increment ans by 1 (total ans is now 2).

    Dictionary after the operation:

    1{
    2  'b': deque([(1, 1)]),
    3  'e': deque([(3, 2)]),
    4}
  5. The last character in s is 'e', and we process the queue under 'e'.

    a. dequeue (3, 2): Finally, "ace" is fully matched, and our total count ans is incremented by one more (total ans is now 3).

The end state of the dictionary is not relevant anymore since we have processed all of s. The algorithm finishes with ans = 3 because we've found that three out of four words in words are subsequences of s. The word "bb" was not a subsequence since the second 'b' couldn't be matched.

Python Solution

1from collections import defaultdict, deque
2
3class Solution:
4    def num_matching_subseq(self, string: str, words: List[str]) -> int:
5        """
6        Count the number of words from the list 'words' that are subsequences of the string 'string'.
7
8        Args:
9        string (str): the string to be searched for subsequences.
10        words (List[str]): the list of words to check as potential subsequences.
11
12        Returns:
13        int: the number of words that are subsequences of 'string'.
14        """
15      
16        # Initialize a dictionary to hold queues of tuples for each character in words.
17        # Each tuple contains an index of a word in words and the next character index to search for.
18        char_to_word_indices = defaultdict(deque)
19      
20        # Populate the dictionary with initial character indices for each word.
21        for index, word in enumerate(words):
22            char_to_word_indices[word[0]].append((index, 0))
23      
24        # Counter for the number of matching subsequences found.
25        matching_count = 0
26      
27        # Loop through each character in the input string.
28        for char in string:
29            # Process all tuples (word index, character index) for the current character.
30            for _ in range(len(char_to_word_indices[char])):
31                word_index, char_index = char_to_word_indices[char].popleft()
32                char_index += 1  # Move to the next character in the current word
33          
34                # If we reached the end of the word, increment the count of matching subsequences.
35                if char_index == len(words[word_index]):
36                    matching_count += 1
37                else:
38                    # Else, append the tuple (word index, next character index) to the appropriate list.
39                    next_char = words[word_index][char_index]
40                    char_to_word_indices[next_char].append((word_index, char_index))
41      
42        # Return the total count of matching subsequences.
43        return matching_count

Remember to include the necessary import statement at the top for List from typing:

1from typing import List
2

Java Solution

1class Solution {
2    public int numMatchingSubseq(String s, String[] words) {
3        // Create an array of dequeues to store positions of characters in words
4        Deque<int[]>[] waitingChars = new Deque[26];
5      
6        // Initialize each dequeue in the array
7        Arrays.setAll(waitingChars, k -> new ArrayDeque<>());
8      
9        // Fill the deques with the first characters of each word in words array
10        for (int i = 0; i < words.length; ++i) {
11            waitingChars[words[i].charAt(0) - 'a'].offer(new int[]{i, 0});
12        }
13      
14        int matches = 0; // Store the number of matching subsequences
15      
16        // Iterate through each character of the string s
17        for (char c : s.toCharArray()) {
18            // Get the deque corresponding to the current character
19            Deque<int[]> queue = waitingChars[c - 'a'];
20          
21            // Process all elements in queue
22            for (int i = queue.size(); i > 0; --i) {
23                // Dequeue the first element
24                int[] pair = queue.pollFirst();
25                int wordIndex = pair[0], charIndex = pair[1] + 1;
26              
27                // If the whole word is found, increment the match count
28                if (charIndex == words[wordIndex].length()) {
29                    ++matches;
30                } else {
31                    // Otherwise, enqueue the pair back with the updated character index
32                    waitingChars[words[wordIndex].charAt(charIndex) - 'a'].offer(new int[]{wordIndex, charIndex});
33                }
34            }
35        }
36        // Return the total number of matching subsequences found
37        return matches;
38    }
39}
40

C++ Solution

1class Solution {
2public:
3    // Function to count the number of words from the `words` vector that are subsequences of the string `s`
4    int numMatchingSubseq(string s, vector<string>& words) {
5        // Create a vector of queues to store pairs of indices.
6        // Each element represents a queue for words starting with the corresponding character ('a' to 'z').
7        vector<queue<pair<int, int>>> charQueues(26);
8      
9        // Populate the queues with pairs: the index of the word in the `words` vector and the character index being checked.
10        for (int i = 0; i < words.size(); ++i) {
11            charQueues[words[i][0] - 'a'].emplace(i, 0);
12        }
13      
14        // Initialize the answer to 0.
15        int ans = 0;
16      
17        // Iterate over each character in the string `s`
18        for (char& c : s) {
19            // Reference to the queue corresponding to the current character
20            auto& queue = charQueues[c - 'a'];
21          
22            // Process each pair in the queue
23            for (int t = queue.size(); t > 0; --t) {
24                // Get the front pair (word index, char index) and remove it from the queue
25                auto [wordIndex, charIndex] = queue.front();
26                queue.pop();
27              
28                // Increment the char index to check the next character of the word
29                ++charIndex;
30              
31                // If all characters in the word have been found in `s`, increment the answer
32                if (charIndex == words[wordIndex].size()) {
33                    ++ans;
34                } else {
35                    // Otherwise, push the pair (word index, next char index) to the corresponding queue
36                    // for the next character to be checked in `words[wordIndex]`
37                    charQueues[words[wordIndex][charIndex] - 'a'].emplace(wordIndex, charIndex);
38                }
39            }
40        }
41        // Return the total count of matching subsequences
42        return ans;
43    }
44};
45

Typescript Solution

1// Define a method to count the number of words from the `words` array that are subsequences of the string `s`.
2function numMatchingSubseq(s: string, words: string[]): number {
3    // Create an array of queues to store pairs of indices.
4    // Each element represents a queue for words starting with the corresponding character ('a' to 'z').
5    const charQueues: Array<Queue<[number, number]>> = Array.from(
6        { length: 26 },
7        () => new Queue<[number, number]>());
8
9    // Populate the queues with pairs: the index of the word in the `words` array and the character index being checked.
10    for (let i = 0; i < words.length; ++i) {
11        const charIndex = words[i].charCodeAt(0) - 'a'.charCodeAt(0);
12        charQueues[charIndex].enqueue([i, 0]);
13    }
14
15    // Initialize the answer to 0.
16    let ans = 0;
17
18    // Iterate over each character in the string `s`.
19    for (const c of s) {
20        // Reference to the queue corresponding to the current character.
21        const queue = charQueues[c.charCodeAt(0) - 'a'.charCodeAt(0)];
22
23        // Process each pair in the queue.
24        const size = queue.size();
25        for (let t = 0; t < size; ++t) {
26            // Get the front pair (word index, char index) and remove it from the queue.
27            const [wordIndex, charIndex] = queue.dequeue();
28
29            // Increment the char index to check the next character of the word.
30            const nextCharIndex = charIndex + 1;
31
32            // If all characters in the word have been found in `s`, increment the answer.
33            if (nextCharIndex === words[wordIndex].length) {
34                ans++;
35            } else {
36                // Otherwise, push the pair (word index, next char index) to the corresponding queue
37                // for the next character to be checked in `words[wordIndex]`.
38                const nextCharCode = words[wordIndex].charCodeAt(nextCharIndex) - 'a'.charCodeAt(0);
39                charQueues[nextCharCode].enqueue([wordIndex, nextCharIndex]);
40            }
41        }
42    }
43
44    // Return the total count of matching subsequences.
45    return ans;
46}
47
48// Helper class for Queue
49class Queue<T> {
50    private data: T[] = [];
51
52    enqueue(item: T): void {
53        this.data.push(item);
54    }
55
56    dequeue(): T {
57        return this.data.shift();
58    }
59
60    size(): number {
61        return this.data.length;
62    }
63}
64

Time and Space Complexity

Time Complexity

The time complexity of the given code can be analyzed as follows:

  • Initializing the dictionary d with a deque for each unique character has a space complexity that depends on the number of unique characters in words list, but initializing the deques themselves takes O(1) time.
  • The first loop goes through all the words in words list to populate the dictionary d. Each word is processed in O(1) time to append its index and starting character position to the corresponding deque, resulting in O(n) time complexity for this step, where n is the number of words in words.
  • The second loop goes through each character in the input string s. For each character c, it processes all indices (i, j) in d[c]. Since each word is exactly processed once, and each character in s results in at most one deque operation (either popleft or append), the number of operations is proportional to the length of s plus the total length of all words. Hence, the time complexity, in this case, is O(m + k), where m is the length of s and k is the sum of lengths of all words in words.

Thus, the overall time complexity combines the complexities of both loops, resulting in O(n + m + k).

Space Complexity

The space complexity of the code is determined by:

  • The dictionary d which contains as many keys as there are unique starting characters in the words, and each key has a deque that contains at most n tuples (i, j), where n is the number of words. However, since each word can only be in one deque at a time, it's more precise to say that it contains a total of n tuples across all deques.
  • There are no other data structures that require significant space.

Therefore, the space complexity is O(n), where n is the number of words in words.

😈
Become an
Algo Monster

Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫