30. Substring with Concatenation of All Words
Problem Description
You are given a string s
and an array of strings words
. All strings within words
have the same length. Your objective is to find all the starting indices of substrings in s
where a substring is a concatenation of every word in any possible permutation of words
.
Consider words
as a set of building blocks, where each word is a block of the same size. You have to find all position in s
where you can construct a substring using every block exactly once, arranging them in any sequence.
A "concatenated substring" is such that:
- It consists of all words from the
words
array. - The words can be in any order.
- Each word from
words
appears exactly as it is (unmodified, and in full).
For example, if words
is ["ab","cd","ef"]
, then "abcdef"
, "abefcd"
, and "cdabef"
are concatenated strings if they occur in s
. On the other hand, "acdbef"
is not a concatenated substring because it doesn't represent any permutation of words
.
You need to return a list of starting indices of such concatenated substrings found in s
, the order of indices does not matter.
Intuition
To arrive at the solution for this problem, we have to consider that searching for each possible permutation of words
in s
would be computationally expensive. We need a more efficient way to check for concatenated substrings.
The intuition behind the solution lies in the pattern recognition, sliding window, and hash table techniques:
- Pattern Recognition: Since all words in
words
have the same length, our window of the substring to search ins
will also be a constant size, which is the sum of lengths of all words inwords
. - Sliding Window: As we slide this window across
s
, we can incrementally check whether the substring window contains a valid concatenation of the words. - Hash Table: By using a hash table to count the occurrences of the words we've seen in the current window, we can keep track of the words and their counts to determine if the current window forms a valid concatenated substring.
The sliding window starts from the beginning of s
, and we move it to the right one word-length at a time. We continue this process for all possible positions the first word of the window could start from (i.e., 0 to word-length - 1).
At each step, when a new word is included in the sliding window:
- We add it to a current word frequency count hash table.
- If this new word isn't in the original words count hash table, we reset the current one, as it's not a valid continuation.
- If the count of the newly added word exceeds its expected count, we slide the window's left bound to the right to exclude enough occurrences, so the counts match.
- Whenever the number of words within the sliding window is equal to the size of
words
and all word counts correspond, we record the starting index.
By following this strategy, we avoid computing all permutations of words
, with the sliding window efficiently narrowing down possible starting indices.
Learn more about Sliding Window patterns.
Solution Approach
The solution approach involves using a hash table and a sliding window, as already suggested by the intuition.
-
Hash Table for Counting Words: A hash table is used to count the number of times each word appears in the
words
array. This is thecnt
hash table in the code. -
Setting Up for Sliding Window: The key variables are set up for the sliding window algorithm such as the length of the string (
m
), the number of words (n
), and the length of each word which is assumed to be uniform (k
). The variableans
is an array that will collect our answer - the starting indices of valid concatenated substrings. -
Iterating Starting Points: We start iterating over the string
s
with variablei
which represents the start point of the sliding window. This loop essentially allows us to accommodate words ins
starting from different positions up to the word length, making sure we cover all possible alignments of words withins
. -
Sliding Window Mechanism: Inside the loop, we initialize a counter for the current window (
cnt1
), the left (l
) and right (r
) boundaries of the window, and the variablet
to keep track of the total number of valid words encountered in the current window. -
Processing Words: We then continue to shift our window to the right word by word, capturing each word using
s[r:r+k]
. We also keep checking if the captured word is in thecnt
hash table. If it's not incnt
, we reset our counters and move the window forward because the word does not belong to any concatenation ofwords
. -
Updating Word Count: If the word is valid, we increment its count in
cnt1
and also the total countt
. -
Validating Counts: If there are too many occurrences of the word in the current window, we increment the left boundary of the window
l
to reduce the count of the word in the window. This is necessary to match the word count exactly to that of the input words count. -
Storing Valid Indices: If the total number of words
t
in the sliding window equals the number of words inwords
(n
), it means we found a valid concatenation starting at indexl
. This index is stored in the answer arrayans
.
By the end of the outer loop, we have considered all possible alignments and have added all valid starting points to the ans
array, which is then returned.
The sliding window algorithm is made efficient by the fact that it maintains a balance of counts dynamically as it shifts over the string s
, ensuring the constraints are satisfied before adding a starting index to the answer array.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's go through a simple example to illustrate the solution approach:
Suppose the input string s
is "wordgoodgoodgoodbestword"
and words
is ["word","good","best"]
, where each word in words
is of length 4.
Now let's follow the steps of the approach to find the starting indices of the concatenated substrings:
-
Create the Hash Table: Create a hash table (
cnt
) to store the frequency of each word inwords
.cnt = {"word": 1, "good": 1, "best": 1}
-
Set Up Sliding Window Variables: The length of each word (
k
) is 4,s
length (m
) is 22, and there are 3 words (n
) inwords
.Starting indices of
ans
will be collected in an array. -
Iterate Starting Points: Iterate with
i
from 0 to less thank
(sincek
is 4,i
will take values 0, 1, 2, and 3). -
For
i = 0
: (First window position)- Initialize
cnt1
(current window frequency),l
(left boundary of window),r
(right, initiallyi
), andt
(total valid words in current window, initially 0). - Start the right boundary
r
at index 0 andl
at index 0. - Slide
r
byk
to capture a word. (first wordword
from indices 0 to 4). - Since
word
is incnt
, add tocnt1
and incrementt
. - Now,
r = 4
,l = 0
, andt = 1
. - Continue shifting
r
and repeat the process. - At
r = 16
, the window capturesbest
, add it tocnt1
and incrementt
. - At this point,
r = 20
,l = 0
, andt = 3
. - Since
t
equals the number of words inwords
, we have a valid starting index atl
, which is 0.
Now
ans
array has[0]
. - Initialize
-
For
i = 1
(second possible window position):- Similar process, we initialize the variables again, but we start checking from the first index.
- No match will be found starting at
i = 1
.
-
For
i = 2
(third possible window position):- Similarly, initialize and check from index 2.
- No match will be found starting at
i = 2
.
-
For
i = 3
(fourth possible window position):- Initialize and start checking from index 3.
- A match is found for the second occurrence of the valid concatenated substring starting at index
l = 9
.
We add 9 to our
ans
array, nowans
becomes[0, 9]
.
By the end of the algorithm, we've checked all possible alignments within s
for concatenations of all words in words
. Thus, the final ans
array contains [0, 9]
, which are the starting indices where the concatenated substrings appear in s
.
The solution is efficient because it does not calculate all permutations of words
; instead, it builds the concatenated substring as it slides over s
, and validates the counts of encountered words against cnt
to determine valid starting points.
Solution Implementation
1from collections import Counter
2
3class Solution:
4 def findSubstring(self, s: str, words: List[str]) -> List[int]:
5 # Count the frequency of each word in the 'words' list
6 word_count = Counter(words)
7 total_length = len(s)
8 num_words = len(words)
9 word_length = len(words[0])
10
11 # This will hold the start indices of the substrings
12 start_indices = []
13
14 # Check every word_length characters to find valid substrings
15 for offset in range(word_length):
16 left = right = offset # Initialize two pointers
17 window_word_count = Counter()
18 total_matched_words = 0
19
20 # Move the window rightwards in the string 's' by word_length
21 while right + word_length <= total_length:
22 word = s[right: right + word_length]
23 right += word_length
24
25 # If the word is not in our word_count, reset window
26 if word not in word_count:
27 left = right
28 window_word_count.clear()
29 total_matched_words = 0
30 continue
31
32 # Increase the count for the new word in our window
33 window_word_count[word] += 1
34 total_matched_words += 1
35
36 # If there are more instances of the word than needed, shrink window
37 while window_word_count[word] > word_count[word]:
38 word_to_remove = s[left: left + word_length]
39 left += word_length
40 window_word_count[word_to_remove] -= 1
41 total_matched_words -= 1
42
43 # If the window contains exactly 'num_words' words, we found a substring starting at 'left'
44 if total_matched_words == num_words:
45 start_indices.append(left)
46
47 return start_indices
48
1class Solution {
2
3 public List<Integer> findSubstring(String s, String[] words) {
4 Map<String, Integer> wordCount = new HashMap<>();
5
6 // Create and populate a map with the count of each unique word
7 for (String word : words) {
8 wordCount.merge(word, 1, Integer::sum);
9 }
10
11 int strLength = s.length(), numOfWords = words.length;
12 int wordLength = words[0].length(); // Assume all words are the same length
13 List<Integer> indices = new ArrayList<>();
14
15 // Iterate over all possible word start indices to check for valid substrings
16 for (int i = 0; i < wordLength; ++i) {
17 Map<String, Integer> currentCount = new HashMap<>();
18 int left = i, right = i;
19 int totalWords = 0;
20
21 // Expand the window to the right, adding words into current window count
22 while (right + wordLength <= strLength) {
23 String sub = s.substring(right, right + wordLength);
24 right += wordLength;
25
26 // If the word is not in the original word list, reset the window
27 if (!wordCount.containsKey(sub)) {
28 currentCount.clear();
29 left = right;
30 totalWords = 0;
31 continue;
32 }
33
34 // Increase the count for the current word in the window
35 currentCount.merge(sub, 1, Integer::sum);
36 ++totalWords;
37
38 // If a word count exceeds its count in wordCount, reduce from left side
39 while (currentCount.get(sub) > wordCount.get(sub)) {
40 String removed = s.substring(left, left + wordLength);
41 left += wordLength;
42 currentCount.merge(removed, -1, Integer::sum);
43 --totalWords;
44 }
45
46 // If the total words reached the number of words, a valid substring is found
47 if (totalWords == numOfWords) {
48 indices.add(left);
49 }
50 }
51 }
52 return indices;
53 }
54}
55
1class Solution {
2public:
3 // This function searches for all starting indices of substring(s) in 's' that is a concatenation of each word in 'words' exactly once and without any intervening characters.
4 vector<int> findSubstring(string s, vector<string>& words) {
5 // Count the frequency of each word in 'words'.
6 unordered_map<string, int> wordCount;
7 for (auto& word : words) {
8 ++wordCount[word];
9 }
10
11 int stringSize = s.size(), wordCountSize = words.size(), wordSize = words[0].size();
12 vector<int> substrIndices;
13
14 // Iterate over the string 's'.
15 for (int i = 0; i < wordSize; ++i) {
16 unordered_map<string, int> windowCount;
17 int left = i, right = i;
18 int totalCount = 0;
19
20 // Slide a window over the string 's'.
21 while (right + wordSize <= stringSize) {
22 string currentWord = s.substr(right, wordSize);
23 right += wordSize;
24
25 // Skip the current segment if the word is not in 'words'.
26 if (!wordCount.count(currentWord)) {
27 windowCount.clear();
28 left = right;
29 totalCount = 0;
30 continue;
31 }
32
33 // Update the count for the current word in the window.
34 ++windowCount[currentWord];
35 ++totalCount;
36
37 // If there are more occurrences of 'currentWord' in the window than in 'words', remove from the left.
38 while (windowCount[currentWord] > wordCount[currentWord]) {
39 string wordToRemove = s.substr(left, wordSize);
40 left += wordSize;
41 --windowCount[wordToRemove];
42 --totalCount;
43 }
44
45 // If the total count of words match and all words frequencies are as expected, add to result.
46 if (totalCount == wordCountSize) {
47 substrIndices.push_back(left);
48 }
49 }
50 }
51
52 return substrIndices;
53 }
54};
55
1function findSubstring(s: string, words: string[]): number[] {
2 // Create a map to store the frequency of words.
3 const wordCountMap: Map<string, number> = new Map();
4 // Populate the word frequency map.
5 for (const word of words) {
6 wordCountMap.set(word, (wordCountMap.get(word) || 0) + 1);
7 }
8 const stringLength: number = s.length;
9 const wordArrayLength: number = words.length;
10 const wordLength: number = words[0].length;
11 const indices: number[] = [];
12
13 // Iterate through the string in increments of word length
14 for (let i = 0; i < wordLength; ++i) {
15 const tempCountMap: Map<string, number> = new Map();
16 let left = i;
17 let right = i;
18 let matchedWordCount = 0;
19
20 // Scan the string in chunks the size of the words' length
21 while (right + wordLength <= stringLength) {
22 const currentWord = s.slice(right, right + wordLength);
23 right += wordLength;
24
25 // Skip the word if it's not in the frequency map
26 if (!wordCountMap.has(currentWord)) {
27 tempCountMap.clear();
28 left = right;
29 matchedWordCount = 0;
30 continue;
31 }
32
33 // Update the temporary count map
34 tempCountMap.set(currentWord, (tempCountMap.get(currentWord) || 0) + 1);
35 ++matchedWordCount;
36
37 // If the current word has been seen more times than it is present in words array, slide the window to the right
38 while ((tempCountMap.get(currentWord)! - wordCountMap.get(currentWord)!) > 0) {
39 const wordToLeft = s.slice(left, left + wordLength);
40 tempCountMap.set(wordToLeft, tempCountMap.get(wordToLeft)! - 1);
41 left += wordLength;
42 --matchedWordCount;
43 }
44
45 // Check if all words match; if so, add to results
46 if (matchedWordCount === wordArrayLength) {
47 indices.push(left);
48 }
49 }
50 }
51
52 return indices;
53}
54
Time and Space Complexity
The time complexity of the given code is O(m * k)
where m
is the length of the string s
and k
is the length of each word within the words list. This stems from the fact that we iterate over the string s
in increments of k
for loops starting at each of the first k
characters.
The space complexity is O(n * k)
where n
is the number of words in the given list words
and k
is the length of each word. This is due to two counters cnt
and cnt1
storing, at most, n
different words of k
length each, along with the list ans
that in the worst case could store up to m / k
starting indices if every substring is a valid concatenation.
Learn more about how to find time and space complexity quickly using problem constraints.
You are given an array of intervals where intervals[i] = [start_i, end_i]
represent the start and end of the ith
interval. You need to merge all overlapping intervals and return an array of the non-overlapping intervals that cover all the intervals in the input.
Recommended Readings
https algomonster s3 us east 2 amazonaws com cover_photos stack svg Sliding Window Maximum Monotonic Stack We have an array and a sliding window defined by a start index and an end index The sliding window moves from left of the array to right There are always k elements in
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Want a Structured Path to Master System Design Too? Don’t Miss This!