Facebook Pixel

792. Number of Matching Subsequences

Problem Description

You are given a string s and an array of strings called words. Your task is to count how many strings in the words array are subsequences of the string s.

A subsequence is formed by deleting some (possibly zero) characters from the original string without changing the order of the remaining characters. For example, "ace" is a subsequence of "abcde" because you can delete 'b' and 'd' from "abcde" to get "ace", maintaining the relative order of 'a', 'c', and 'e'.

The solution uses a clever approach with queues organized by the first character of each word. It groups words by their current first character that needs to be matched. As we iterate through string s, when we encounter a character c, we check all words currently waiting for that character. If a word has only one character left (meaning we just matched its last character), we increment our answer. Otherwise, we remove the matched character from the word and place it in the queue corresponding to its new first character.

For example, if s = "abcde" and words = ["a", "bb", "acd", "ace"]:

  • Initially, words are grouped: 'a' → ["a", "acd", "ace"], 'b' → ["bb"]
  • When processing 'a' from s, we match it with all three words starting with 'a'
  • "a" is complete (length 1), so answer increases by 1
  • "acd" becomes "cd" and moves to queue 'c'
  • "ace" becomes "ce" and also moves to queue 'c'
  • The process continues through the string, tracking which words become complete subsequences

The final answer is the count of words that were successfully matched as subsequences.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The naive approach would be to check each word individually against string s to see if it's a subsequence, which would require iterating through s multiple times. However, we can be more efficient by processing all words simultaneously in a single pass through s.

The key insight is that when we're checking if multiple words are subsequences of s, we're essentially tracking where each word is in its matching process. At any point, each word is waiting for a specific character (the next unmatched character in that word).

Think of it like multiple people reading the same book but looking for different sequences of letters. Instead of having each person read the entire book separately, we can have one person read the book aloud while everyone listens and marks when they hear the letter they're currently looking for.

This leads us to organize words by what character they're currently waiting for. We use a dictionary where each key is a character, and the value is a queue of words currently waiting for that character. When we encounter a character c in string s, we know exactly which words can make progress - those in the queue for character c.

For each word that matches the current character:

  • If it was the last character needed (word length is 1), we've found a complete subsequence
  • Otherwise, we remove the matched character and put the remaining part of the word into the queue for its new first character

This way, we only traverse string s once, and for each character, we only process the words that can actually make progress. The use of deques ensures efficient removal from the front and addition to the back of our waiting lists.

The beauty of this approach is that it transforms a problem that seems to require multiple string traversals into a single-pass algorithm with efficient bookkeeping.

Learn more about Trie, Binary Search, Dynamic Programming and Sorting patterns.

Solution Approach

The implementation uses a dictionary of deques to efficiently track and process words waiting for specific characters.

Data Structure Setup:

  • Create a defaultdict(deque) called d where each key represents a character and each value is a deque containing words currently waiting for that character
  • Initialize by iterating through all words and placing each word in the deque corresponding to its first character: d[w[0]].append(w)

Main Processing Loop: Iterate through each character c in string s:

  1. Process all words waiting for character c:

    • Use for _ in range(len(d[c])) to process exactly the current number of words in the queue (important because we'll be modifying the queue)
    • For each word, remove it from the front of the queue: t = d[c].popleft()
  2. Check if word is complete:

    • If len(t) == 1, this was the last character needed for this word
    • Increment the answer counter: ans += 1
  3. Handle incomplete words:

    • If the word has more characters to match, remove the first character: t[1:]
    • Add the remaining portion to the queue for its new first character: d[t[1]].append(t[1:])

Example walkthrough with s = "abcde" and words = ["a", "bb", "acd"]:

Initial state: d = {'a': ["a", "acd"], 'b': ["bb"]}

  • Process 'a':

    • "a" → length is 1, increment answer to 1
    • "acd" → becomes "cd", added to d['c']
  • Process 'b':

    • "bb" → becomes "b", added to d['b']
  • Process 'c':

    • "cd" → becomes "d", added to d['d']
  • Process 'd':

    • "d" → length is 1, increment answer to 2
  • Process 'e':

    • Nothing waiting for 'e'

Final answer: 2 (words "a" and "acd" are subsequences)

The algorithm's efficiency comes from only examining words that can make progress with the current character, avoiding redundant checks and achieving O(n + m) time complexity where n is the length of s and m is the total length of all words.

Ready to land your dream job?

Unlock your dream job with a 3-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with s = "abcde" and words = ["ace", "aec", "bb"].

Step 1: Initialize the dictionary of deques We group words by their first character:

  • d['a'] = ["ace", "aec"] (both start with 'a')
  • d['b'] = ["bb"] (starts with 'b')

Step 2: Process each character in s = "abcde"

Processing 'a' (index 0):

  • Check d['a']: has 2 words waiting
  • Process "ace":
    • Remove from queue, length > 1
    • Remove first char → "ce"
    • Add to d['c']
  • Process "aec":
    • Remove from queue, length > 1
    • Remove first char → "ec"
    • Add to d['e']
  • State: d['c'] = ["ce"], d['e'] = ["ec"], d['b'] = ["bb"]

Processing 'b' (index 1):

  • Check d['b']: has 1 word waiting
  • Process "bb":
    • Remove from queue, length > 1
    • Remove first char → "b"
    • Add to d['b']
  • State: d['c'] = ["ce"], d['e'] = ["ec"], d['b'] = ["b"]

Processing 'c' (index 2):

  • Check d['c']: has 1 word waiting
  • Process "ce":
    • Remove from queue, length > 1
    • Remove first char → "e"
    • Add to d['e']
  • State: d['e'] = ["ec", "e"], d['b'] = ["b"]

Processing 'd' (index 3):

  • Check d['d']: empty, nothing to process
  • State unchanged

Processing 'e' (index 4):

  • Check d['e']: has 2 words waiting
  • Process "ec":
    • Remove from queue, length > 1
    • Remove first char → "c"
    • Add to d['c']
  • Process "e":
    • Remove from queue, length = 1
    • Complete match! Increment answer to 1
  • State: d['c'] = ["c"], d['b'] = ["b"]

Result:

  • Answer = 1 (only "ace" is a subsequence of "abcde")
  • "aec" is not a subsequence (needs 'e' before 'c')
  • "bb" is not a subsequence (only one 'b' in s)

The algorithm efficiently tracks each word's progress through the string in a single pass, only processing words that can advance with the current character.

Solution Implementation

1from collections import defaultdict, deque
2from typing import List
3
4class Solution:
5    def numMatchingSubseq(self, s: str, words: List[str]) -> int:
6        # Dictionary to store words grouped by their first character
7        # Key: character, Value: deque of words starting with that character
8        waiting_dict = defaultdict(deque)
9      
10        # Group all words by their first character
11        for word in words:
12            waiting_dict[word[0]].append(word)
13      
14        # Counter for words that are subsequences of s
15        matching_count = 0
16      
17        # Process each character in string s
18        for char in s:
19            # Process all words currently waiting for this character
20            # Use len() to get current queue size to avoid infinite loop
21            current_queue_size = len(waiting_dict[char])
22            for _ in range(current_queue_size):
23                # Get the word from front of queue
24                current_word = waiting_dict[char].popleft()
25              
26                # If this was the last character needed, word is a subsequence
27                if len(current_word) == 1:
28                    matching_count += 1
29                else:
30                    # Move word to queue for its next required character
31                    # Remove first character and add to appropriate queue
32                    next_char = current_word[1]
33                    remaining_word = current_word[1:]
34                    waiting_dict[next_char].append(remaining_word)
35      
36        return matching_count
37
1class Solution {
2    public int numMatchingSubseq(String s, String[] words) {
3        // Create an array of 26 deques, one for each letter a-z
4        // Each deque will store words that are currently waiting for that letter
5        Deque<String>[] waitingQueues = new Deque[26];
6        Arrays.setAll(waitingQueues, index -> new ArrayDeque<>());
7      
8        // Initialize: Add each word to the deque corresponding to its first character
9        for (String word : words) {
10            int firstCharIndex = word.charAt(0) - 'a';
11            waitingQueues[firstCharIndex].add(word);
12        }
13      
14        int matchingSubsequenceCount = 0;
15      
16        // Process each character in string s
17        for (char currentChar : s.toCharArray()) {
18            // Get the deque of words waiting for the current character
19            int charIndex = currentChar - 'a';
20            Deque<String> currentQueue = waitingQueues[charIndex];
21          
22            // Process all words currently in this queue
23            // Use size snapshot to avoid processing newly added words in this iteration
24            int queueSize = currentQueue.size();
25            for (int i = 0; i < queueSize; i++) {
26                String currentWord = currentQueue.pollFirst();
27              
28                if (currentWord.length() == 1) {
29                    // If word has only one character left, it's fully matched
30                    matchingSubsequenceCount++;
31                } else {
32                    // Move the word to the queue for its next character
33                    String remainingWord = currentWord.substring(1);
34                    int nextCharIndex = remainingWord.charAt(0) - 'a';
35                    waitingQueues[nextCharIndex].offer(remainingWord);
36                }
37            }
38        }
39      
40        return matchingSubsequenceCount;
41    }
42}
43
1class Solution {
2public:
3    int numMatchingSubseq(string s, vector<string>& words) {
4        // Create 26 queues, one for each letter a-z
5        // Each queue stores words that are currently waiting for that letter
6        vector<queue<string>> waitingQueues(26);
7      
8        // Initialize: put each word in the queue corresponding to its first character
9        for (const auto& word : words) {
10            waitingQueues[word[0] - 'a'].emplace(word);
11        }
12      
13        int matchedCount = 0;
14      
15        // Process each character in string s
16        for (const char& currentChar : s) {
17            // Get the queue of words waiting for the current character
18            auto& currentQueue = waitingQueues[currentChar - 'a'];
19          
20            // Process all words currently in this queue
21            // Use queue size to avoid processing newly added words in this iteration
22            int queueSize = currentQueue.size();
23            for (int i = 0; i < queueSize; ++i) {
24                // Get and remove the word from front of queue
25                string currentWord = currentQueue.front();
26                currentQueue.pop();
27              
28                // If this was the last character needed, word is a subsequence
29                if (currentWord.size() == 1) {
30                    ++matchedCount;
31                }
32                else {
33                    // Move word to queue for its next required character
34                    // Remove first character and add to appropriate queue
35                    waitingQueues[currentWord[1] - 'a'].emplace(currentWord.substr(1));
36                }
37            }
38        }
39      
40        return matchedCount;
41    }
42};
43
1function numMatchingSubseq(s: string, words: string[]): number {
2    // Create 26 queues, one for each letter a-z
3    // Each queue stores words that are currently waiting for that letter
4    const waitingQueues: string[][] = Array.from({ length: 26 }, () => []);
5  
6    // Initialize: put each word in the queue corresponding to its first character
7    for (const word of words) {
8        const firstCharIndex = word.charCodeAt(0) - 'a'.charCodeAt(0);
9        waitingQueues[firstCharIndex].push(word);
10    }
11  
12    let matchedCount = 0;
13  
14    // Process each character in string s
15    for (const currentChar of s) {
16        // Get the queue of words waiting for the current character
17        const charIndex = currentChar.charCodeAt(0) - 'a'.charCodeAt(0);
18        const currentQueue = waitingQueues[charIndex];
19      
20        // Process all words currently in this queue
21        // Use queue size to avoid processing newly added words in this iteration
22        const queueSize = currentQueue.length;
23        for (let i = 0; i < queueSize; i++) {
24            // Get and remove the word from front of queue
25            const currentWord = currentQueue.shift()!;
26          
27            // If this was the last character needed, word is a subsequence
28            if (currentWord.length === 1) {
29                matchedCount++;
30            } else {
31                // Move word to queue for its next required character
32                // Remove first character and add to appropriate queue
33                const nextCharIndex = currentWord.charCodeAt(1) - 'a'.charCodeAt(0);
34                waitingQueues[nextCharIndex].push(currentWord.substring(1));
35            }
36        }
37    }
38  
39    return matchedCount;
40}
41

Time and Space Complexity

Time Complexity: O(n + m * l)

  • n is the length of string s
  • m is the total number of words in the words list
  • l is the average length of words

The algorithm iterates through each character in string s once (O(n)). For each character, it processes all words waiting for that character. In the worst case, each word gets processed character by character, and each character operation (slicing t[1:], appending to deque) takes O(l) time. Since each word is processed at most once for each of its characters, the total work done for all words is O(m * l). Therefore, the overall time complexity is O(n + m * l).

Space Complexity: O(m * l)

  • The defaultdict d stores all words distributed across different character keys
  • In the worst case, all m words are stored in the dictionary
  • Each word takes O(l) space on average
  • The slicing operation t[1:] creates new strings, but at any point, the total space used for all word fragments doesn't exceed O(m * l) since we're just storing progressively smaller suffixes of the original words

The space complexity is dominated by storing all the words (or their suffixes) in the dictionary structure, resulting in O(m * l) space complexity.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Modifying Queue While Iterating

One of the most common mistakes is directly iterating over the deque while modifying it:

Incorrect approach:

# This will cause issues!
for word in waiting_dict[char]:
    current_word = waiting_dict[char].popleft()
    # ... rest of logic

This creates problems because:

  • The iterator gets confused when the deque size changes during iteration
  • You might skip words or process words multiple times
  • Could lead to infinite loops if you're adding back to the same queue

Correct approach:

# Get the current size first
current_queue_size = len(waiting_dict[char])
for _ in range(current_queue_size):
    current_word = waiting_dict[char].popleft()
    # ... rest of logic

2. String Slicing Inefficiency

Using string slicing current_word[1:] creates a new string object each time, which can be inefficient for long words:

Alternative approach using indices:

from collections import defaultdict, deque

class Solution:
    def numMatchingSubseq(self, s: str, words: List[str]) -> int:
        # Store tuples of (word, current_index)
        waiting_dict = defaultdict(deque)
      
        for word in words:
            waiting_dict[word[0]].append((word, 0))
      
        matching_count = 0
      
        for char in s:
            current_queue_size = len(waiting_dict[char])
            for _ in range(current_queue_size):
                word, idx = waiting_dict[char].popleft()
                idx += 1  # Move to next character
              
                if idx == len(word):
                    matching_count += 1
                else:
                    next_char = word[idx]
                    waiting_dict[next_char].append((word, idx))
      
        return matching_count

This approach avoids creating new string objects and instead tracks the current position in each word.

3. Not Handling Empty Strings

If the words array contains empty strings, the code will fail:

Problem:

waiting_dict[word[0]].append(word)  # IndexError if word is empty

Solution:

for word in words:
    if word:  # Check for non-empty string
        waiting_dict[word[0]].append(word)
    else:
        # Empty string is always a subsequence
        matching_count += 1

4. Using Regular Dictionary Instead of defaultdict

Using a regular dictionary requires checking if keys exist before accessing:

Inefficient approach:

waiting_dict = {}
for word in words:
    if word[0] not in waiting_dict:
        waiting_dict[word[0]] = deque()
    waiting_dict[word[0]].append(word)

The defaultdict(deque) automatically creates an empty deque for new keys, making the code cleaner and preventing KeyError exceptions.

Discover Your Strengths and Weaknesses: Take Our 3-Minute Quiz to Tailor Your Study Plan:

What is the best way of checking if an element exists in an unsorted array once in terms of time complexity? Select the best that applies.


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More