792. Number of Matching Subsequences
Problem Description
You are given a string s
and an array of strings called words
. Your task is to count how many strings in the words
array are subsequences of the string s
.
A subsequence is formed by deleting some (possibly zero) characters from the original string without changing the order of the remaining characters. For example, "ace"
is a subsequence of "abcde"
because you can delete 'b'
and 'd'
from "abcde"
to get "ace"
, maintaining the relative order of 'a'
, 'c'
, and 'e'
.
The solution uses a clever approach with queues organized by the first character of each word. It groups words by their current first character that needs to be matched. As we iterate through string s
, when we encounter a character c
, we check all words currently waiting for that character. If a word has only one character left (meaning we just matched its last character), we increment our answer. Otherwise, we remove the matched character from the word and place it in the queue corresponding to its new first character.
For example, if s = "abcde"
and words = ["a", "bb", "acd", "ace"]
:
- Initially, words are grouped:
'a'
→ ["a"
,"acd"
,"ace"
],'b'
→ ["bb"
] - When processing
'a'
froms
, we match it with all three words starting with'a'
"a"
is complete (length 1), so answer increases by 1"acd"
becomes"cd"
and moves to queue'c'
"ace"
becomes"ce"
and also moves to queue'c'
- The process continues through the string, tracking which words become complete subsequences
The final answer is the count of words that were successfully matched as subsequences.
Intuition
The naive approach would be to check each word individually against string s
to see if it's a subsequence, which would require iterating through s
multiple times. However, we can be more efficient by processing all words simultaneously in a single pass through s
.
The key insight is that when we're checking if multiple words are subsequences of s
, we're essentially tracking where each word is in its matching process. At any point, each word is waiting for a specific character (the next unmatched character in that word).
Think of it like multiple people reading the same book but looking for different sequences of letters. Instead of having each person read the entire book separately, we can have one person read the book aloud while everyone listens and marks when they hear the letter they're currently looking for.
This leads us to organize words by what character they're currently waiting for. We use a dictionary where each key is a character, and the value is a queue of words currently waiting for that character. When we encounter a character c
in string s
, we know exactly which words can make progress - those in the queue for character c
.
For each word that matches the current character:
- If it was the last character needed (word length is 1), we've found a complete subsequence
- Otherwise, we remove the matched character and put the remaining part of the word into the queue for its new first character
This way, we only traverse string s
once, and for each character, we only process the words that can actually make progress. The use of deques ensures efficient removal from the front and addition to the back of our waiting lists.
The beauty of this approach is that it transforms a problem that seems to require multiple string traversals into a single-pass algorithm with efficient bookkeeping.
Learn more about Trie, Binary Search, Dynamic Programming and Sorting patterns.
Solution Approach
The implementation uses a dictionary of deques to efficiently track and process words waiting for specific characters.
Data Structure Setup:
- Create a
defaultdict(deque)
calledd
where each key represents a character and each value is a deque containing words currently waiting for that character - Initialize by iterating through all words and placing each word in the deque corresponding to its first character:
d[w[0]].append(w)
Main Processing Loop:
Iterate through each character c
in string s
:
-
Process all words waiting for character
c
:- Use
for _ in range(len(d[c]))
to process exactly the current number of words in the queue (important because we'll be modifying the queue) - For each word, remove it from the front of the queue:
t = d[c].popleft()
- Use
-
Check if word is complete:
- If
len(t) == 1
, this was the last character needed for this word - Increment the answer counter:
ans += 1
- If
-
Handle incomplete words:
- If the word has more characters to match, remove the first character:
t[1:]
- Add the remaining portion to the queue for its new first character:
d[t[1]].append(t[1:])
- If the word has more characters to match, remove the first character:
Example walkthrough with s = "abcde"
and words = ["a", "bb", "acd"]
:
Initial state: d = {'a': ["a", "acd"], 'b': ["bb"]}
-
Process
'a'
:"a"
→ length is 1, increment answer to 1"acd"
→ becomes"cd"
, added tod['c']
-
Process
'b'
:"bb"
→ becomes"b"
, added tod['b']
-
Process
'c'
:"cd"
→ becomes"d"
, added tod['d']
-
Process
'd'
:"d"
→ length is 1, increment answer to 2
-
Process
'e'
:- Nothing waiting for
'e'
- Nothing waiting for
Final answer: 2 (words "a"
and "acd"
are subsequences)
The algorithm's efficiency comes from only examining words that can make progress with the current character, avoiding redundant checks and achieving O(n + m) time complexity where n is the length of s
and m is the total length of all words.
Ready to land your dream job?
Unlock your dream job with a 3-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with s = "abcde"
and words = ["ace", "aec", "bb"]
.
Step 1: Initialize the dictionary of deques We group words by their first character:
d['a']
= ["ace"
,"aec"
] (both start with 'a')d['b']
= ["bb"
] (starts with 'b')
Step 2: Process each character in s = "abcde"
Processing 'a' (index 0):
- Check
d['a']
: has 2 words waiting - Process
"ace"
:- Remove from queue, length > 1
- Remove first char →
"ce"
- Add to
d['c']
- Process
"aec"
:- Remove from queue, length > 1
- Remove first char →
"ec"
- Add to
d['e']
- State:
d['c']
= ["ce"
],d['e']
= ["ec"
],d['b']
= ["bb"
]
Processing 'b' (index 1):
- Check
d['b']
: has 1 word waiting - Process
"bb"
:- Remove from queue, length > 1
- Remove first char →
"b"
- Add to
d['b']
- State:
d['c']
= ["ce"
],d['e']
= ["ec"
],d['b']
= ["b"
]
Processing 'c' (index 2):
- Check
d['c']
: has 1 word waiting - Process
"ce"
:- Remove from queue, length > 1
- Remove first char →
"e"
- Add to
d['e']
- State:
d['e']
= ["ec"
,"e"
],d['b']
= ["b"
]
Processing 'd' (index 3):
- Check
d['d']
: empty, nothing to process - State unchanged
Processing 'e' (index 4):
- Check
d['e']
: has 2 words waiting - Process
"ec"
:- Remove from queue, length > 1
- Remove first char →
"c"
- Add to
d['c']
- Process
"e"
:- Remove from queue, length = 1
- Complete match! Increment answer to 1
- State:
d['c']
= ["c"
],d['b']
= ["b"
]
Result:
- Answer = 1 (only
"ace"
is a subsequence of"abcde"
) "aec"
is not a subsequence (needs 'e' before 'c')"bb"
is not a subsequence (only one 'b' in s)
The algorithm efficiently tracks each word's progress through the string in a single pass, only processing words that can advance with the current character.
Solution Implementation
1from collections import defaultdict, deque
2from typing import List
3
4class Solution:
5 def numMatchingSubseq(self, s: str, words: List[str]) -> int:
6 # Dictionary to store words grouped by their first character
7 # Key: character, Value: deque of words starting with that character
8 waiting_dict = defaultdict(deque)
9
10 # Group all words by their first character
11 for word in words:
12 waiting_dict[word[0]].append(word)
13
14 # Counter for words that are subsequences of s
15 matching_count = 0
16
17 # Process each character in string s
18 for char in s:
19 # Process all words currently waiting for this character
20 # Use len() to get current queue size to avoid infinite loop
21 current_queue_size = len(waiting_dict[char])
22 for _ in range(current_queue_size):
23 # Get the word from front of queue
24 current_word = waiting_dict[char].popleft()
25
26 # If this was the last character needed, word is a subsequence
27 if len(current_word) == 1:
28 matching_count += 1
29 else:
30 # Move word to queue for its next required character
31 # Remove first character and add to appropriate queue
32 next_char = current_word[1]
33 remaining_word = current_word[1:]
34 waiting_dict[next_char].append(remaining_word)
35
36 return matching_count
37
1class Solution {
2 public int numMatchingSubseq(String s, String[] words) {
3 // Create an array of 26 deques, one for each letter a-z
4 // Each deque will store words that are currently waiting for that letter
5 Deque<String>[] waitingQueues = new Deque[26];
6 Arrays.setAll(waitingQueues, index -> new ArrayDeque<>());
7
8 // Initialize: Add each word to the deque corresponding to its first character
9 for (String word : words) {
10 int firstCharIndex = word.charAt(0) - 'a';
11 waitingQueues[firstCharIndex].add(word);
12 }
13
14 int matchingSubsequenceCount = 0;
15
16 // Process each character in string s
17 for (char currentChar : s.toCharArray()) {
18 // Get the deque of words waiting for the current character
19 int charIndex = currentChar - 'a';
20 Deque<String> currentQueue = waitingQueues[charIndex];
21
22 // Process all words currently in this queue
23 // Use size snapshot to avoid processing newly added words in this iteration
24 int queueSize = currentQueue.size();
25 for (int i = 0; i < queueSize; i++) {
26 String currentWord = currentQueue.pollFirst();
27
28 if (currentWord.length() == 1) {
29 // If word has only one character left, it's fully matched
30 matchingSubsequenceCount++;
31 } else {
32 // Move the word to the queue for its next character
33 String remainingWord = currentWord.substring(1);
34 int nextCharIndex = remainingWord.charAt(0) - 'a';
35 waitingQueues[nextCharIndex].offer(remainingWord);
36 }
37 }
38 }
39
40 return matchingSubsequenceCount;
41 }
42}
43
1class Solution {
2public:
3 int numMatchingSubseq(string s, vector<string>& words) {
4 // Create 26 queues, one for each letter a-z
5 // Each queue stores words that are currently waiting for that letter
6 vector<queue<string>> waitingQueues(26);
7
8 // Initialize: put each word in the queue corresponding to its first character
9 for (const auto& word : words) {
10 waitingQueues[word[0] - 'a'].emplace(word);
11 }
12
13 int matchedCount = 0;
14
15 // Process each character in string s
16 for (const char& currentChar : s) {
17 // Get the queue of words waiting for the current character
18 auto& currentQueue = waitingQueues[currentChar - 'a'];
19
20 // Process all words currently in this queue
21 // Use queue size to avoid processing newly added words in this iteration
22 int queueSize = currentQueue.size();
23 for (int i = 0; i < queueSize; ++i) {
24 // Get and remove the word from front of queue
25 string currentWord = currentQueue.front();
26 currentQueue.pop();
27
28 // If this was the last character needed, word is a subsequence
29 if (currentWord.size() == 1) {
30 ++matchedCount;
31 }
32 else {
33 // Move word to queue for its next required character
34 // Remove first character and add to appropriate queue
35 waitingQueues[currentWord[1] - 'a'].emplace(currentWord.substr(1));
36 }
37 }
38 }
39
40 return matchedCount;
41 }
42};
43
1function numMatchingSubseq(s: string, words: string[]): number {
2 // Create 26 queues, one for each letter a-z
3 // Each queue stores words that are currently waiting for that letter
4 const waitingQueues: string[][] = Array.from({ length: 26 }, () => []);
5
6 // Initialize: put each word in the queue corresponding to its first character
7 for (const word of words) {
8 const firstCharIndex = word.charCodeAt(0) - 'a'.charCodeAt(0);
9 waitingQueues[firstCharIndex].push(word);
10 }
11
12 let matchedCount = 0;
13
14 // Process each character in string s
15 for (const currentChar of s) {
16 // Get the queue of words waiting for the current character
17 const charIndex = currentChar.charCodeAt(0) - 'a'.charCodeAt(0);
18 const currentQueue = waitingQueues[charIndex];
19
20 // Process all words currently in this queue
21 // Use queue size to avoid processing newly added words in this iteration
22 const queueSize = currentQueue.length;
23 for (let i = 0; i < queueSize; i++) {
24 // Get and remove the word from front of queue
25 const currentWord = currentQueue.shift()!;
26
27 // If this was the last character needed, word is a subsequence
28 if (currentWord.length === 1) {
29 matchedCount++;
30 } else {
31 // Move word to queue for its next required character
32 // Remove first character and add to appropriate queue
33 const nextCharIndex = currentWord.charCodeAt(1) - 'a'.charCodeAt(0);
34 waitingQueues[nextCharIndex].push(currentWord.substring(1));
35 }
36 }
37 }
38
39 return matchedCount;
40}
41
Time and Space Complexity
Time Complexity: O(n + m * l)
n
is the length of strings
m
is the total number of words in thewords
listl
is the average length of words
The algorithm iterates through each character in string s
once (O(n)
). For each character, it processes all words waiting for that character. In the worst case, each word gets processed character by character, and each character operation (slicing t[1:]
, appending to deque) takes O(l)
time. Since each word is processed at most once for each of its characters, the total work done for all words is O(m * l)
. Therefore, the overall time complexity is O(n + m * l)
.
Space Complexity: O(m * l)
- The defaultdict
d
stores all words distributed across different character keys - In the worst case, all
m
words are stored in the dictionary - Each word takes
O(l)
space on average - The slicing operation
t[1:]
creates new strings, but at any point, the total space used for all word fragments doesn't exceedO(m * l)
since we're just storing progressively smaller suffixes of the original words
The space complexity is dominated by storing all the words (or their suffixes) in the dictionary structure, resulting in O(m * l)
space complexity.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Modifying Queue While Iterating
One of the most common mistakes is directly iterating over the deque while modifying it:
Incorrect approach:
# This will cause issues! for word in waiting_dict[char]: current_word = waiting_dict[char].popleft() # ... rest of logic
This creates problems because:
- The iterator gets confused when the deque size changes during iteration
- You might skip words or process words multiple times
- Could lead to infinite loops if you're adding back to the same queue
Correct approach:
# Get the current size first
current_queue_size = len(waiting_dict[char])
for _ in range(current_queue_size):
current_word = waiting_dict[char].popleft()
# ... rest of logic
2. String Slicing Inefficiency
Using string slicing current_word[1:]
creates a new string object each time, which can be inefficient for long words:
Alternative approach using indices:
from collections import defaultdict, deque
class Solution:
def numMatchingSubseq(self, s: str, words: List[str]) -> int:
# Store tuples of (word, current_index)
waiting_dict = defaultdict(deque)
for word in words:
waiting_dict[word[0]].append((word, 0))
matching_count = 0
for char in s:
current_queue_size = len(waiting_dict[char])
for _ in range(current_queue_size):
word, idx = waiting_dict[char].popleft()
idx += 1 # Move to next character
if idx == len(word):
matching_count += 1
else:
next_char = word[idx]
waiting_dict[next_char].append((word, idx))
return matching_count
This approach avoids creating new string objects and instead tracks the current position in each word.
3. Not Handling Empty Strings
If the words array contains empty strings, the code will fail:
Problem:
waiting_dict[word[0]].append(word) # IndexError if word is empty
Solution:
for word in words: if word: # Check for non-empty string waiting_dict[word[0]].append(word) else: # Empty string is always a subsequence matching_count += 1
4. Using Regular Dictionary Instead of defaultdict
Using a regular dictionary requires checking if keys exist before accessing:
Inefficient approach:
waiting_dict = {} for word in words: if word[0] not in waiting_dict: waiting_dict[word[0]] = deque() waiting_dict[word[0]].append(word)
The defaultdict(deque)
automatically creates an empty deque for new keys, making the code cleaner and preventing KeyError exceptions.
What is the best way of checking if an element exists in an unsorted array once in terms of time complexity? Select the best that applies.
Recommended Readings
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
Binary Search Speedrun For each of the Speedrun questions you will be given a binary search related problem and a corresponding multiple choice question The multiple choice questions are related to the techniques and template s introduced in the binary search section It's recommended that you have gone through at
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
Want a Structured Path to Master System Design Too? Don’t Miss This!