30. Substring with Concatenation of All Words
Problem Description
You need to find all starting positions in a string s
where you can find a substring that is formed by concatenating all words from a given array words
in any order.
Given:
- A string
s
- An array of strings
words
where all strings have the same length
A concatenated string is formed by taking all the words from words
and joining them together in any order, with each word used exactly once.
For example, if words = ["ab","cd","ef"]
:
- Valid concatenated strings include:
"abcdef"
,"abefcd"
,"cdabef"
,"cdefab"
,"efabcd"
,"efcdab"
- Invalid example:
"acdbef"
(not a valid concatenation of the words)
Your task is to find all positions in s
where a substring starts that matches any valid concatenation of all words in words
. Return these starting indices in any order.
The solution uses a sliding window approach with hash tables. It maintains a counter cnt
for the frequency of words in the original words
array and another counter cnt1
for the current window. The algorithm iterates through possible starting positions (0 to k-1, where k is the length of each word) and for each starting position, slides a window of size n * k
(where n is the number of words) through the string.
The window moves by word-length increments, checking if the current window contains exactly the same words as the words
array. When a valid concatenation is found (window size equals n * k
and all word frequencies match), the starting position is added to the result.
Intuition
The key insight is recognizing that since all words have the same length k
, we can treat the problem as finding windows in the string that contain exactly the right words with the right frequencies.
Instead of checking every possible substring of length n * k
starting from every position (which would be inefficient), we can optimize by observing that valid concatenations must align with word boundaries. Since words are of fixed length k
, any valid substring must start at positions that, when divided by k
, give us consistent word boundaries.
This leads us to the realization that we only need to check k
different "tracks" or starting offsets (0, 1, 2, ..., k-1). For each offset i
, we can slide through the string jumping by word-length k
at a time, essentially reading the string word by word from that offset.
The sliding window technique becomes natural here because:
- We're looking for a fixed-size window (containing exactly
n
words) - As we slide, we can incrementally update our word counts rather than recounting from scratch
- When we encounter an invalid word or too many occurrences of a valid word, we can shrink the window from the left
The use of hash tables is intuitive for tracking word frequencies - we need to quickly check if a word exists in our target list and compare frequencies between our target (cnt
) and current window (cnt1
).
The optimization of clearing the window when encountering an invalid word (not in words
) comes from realizing that any window containing such a word cannot be valid, so we might as well start fresh from the position after that invalid word.
Learn more about Sliding Window patterns.
Solution Approach
The solution uses a sliding window technique with hash tables to efficiently find all valid starting positions.
Data Structures:
cnt
: A Counter (hash table) storing the frequency of each word inwords
cnt1
: A Counter tracking the frequency of words in the current sliding windowans
: List to store valid starting indices
Algorithm Implementation:
-
Initialize variables:
m = len(s)
: length of the stringn = len(words)
: number of wordsk = len(words[0])
: length of each word
-
Iterate through k different starting offsets (
i
from 0 to k-1): For each offset, we create a separate sliding window track. -
For each offset i, maintain a sliding window:
l = r = i
: Initialize left and right pointers at offseti
cnt1 = Counter()
: Reset word frequency counter for this track
-
Slide the window by word-length increments:
while r + k <= m: t = s[r : r + k] # Extract word at right boundary r += k # Move right pointer by word length
-
Handle three cases for each extracted word
t
:Case 1: Invalid word (
cnt[t] == 0
):- The word doesn't exist in
words
- Reset window:
l = r
, clearcnt1
- Continue to next iteration
Case 2: Valid word:
- Add to window:
cnt1[t] += 1
- If frequency exceeds allowed (
cnt1[t] > cnt[t]
):- Shrink window from left until frequency is valid:
while cnt1[t] > cnt[t]: rem = s[l : l + k] l += k cnt1[rem] -= 1
Case 3: Valid window found:
- When
r - l == n * k
(window contains exactlyn
words) - Add starting position
l
to answer list
- The word doesn't exist in
-
Return the collected starting indices after checking all
k
offsets.
The algorithm efficiently processes the string in O(m * k)
time, where instead of checking every position, we only check k
different tracks, and for each track, we scan through the string once using the sliding window technique.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a concrete example to illustrate the solution approach.
Input:
s = "barfoothefoobarman"
words = ["foo", "bar"]
Setup:
m = 18
(length of s)n = 2
(number of words)k = 3
(length of each word)cnt = {"foo": 1, "bar": 1}
(frequency map of words)
We need to check k = 3
different starting offsets (0, 1, 2).
Offset 0 (i = 0):
- Initialize:
l = 0, r = 0, cnt1 = {}
- Extract "bar" (r=0 to 3): Valid word,
cnt1 = {"bar": 1}
, mover = 3
- Extract "foo" (r=3 to 6): Valid word,
cnt1 = {"bar": 1, "foo": 1}
, mover = 6
- Window size:
r - l = 6 - 0 = 6 = n * k
✓ Found valid position at index 0 - Extract "the" (r=6 to 9): Invalid word (not in words), reset
l = 9, r = 9, cnt1 = {}
- Extract "foo" (r=9 to 12): Valid word,
cnt1 = {"foo": 1}
, mover = 12
- Extract "bar" (r=12 to 15): Valid word,
cnt1 = {"foo": 1, "bar": 1}
, mover = 15
- Window size:
r - l = 15 - 9 = 6 = n * k
✓ Found valid position at index 9 - Extract "man" (r=15 to 18): Invalid word, reset (end of string)
Offset 1 (i = 1):
- Initialize:
l = 1, r = 1, cnt1 = {}
- Extract "arf" (r=1 to 4): Invalid word, reset
l = 4, r = 4, cnt1 = {}
- Extract "oot" (r=4 to 7): Invalid word, reset
l = 7, r = 7, cnt1 = {}
- Extract "hef" (r=7 to 10): Invalid word, reset
l = 10, r = 10, cnt1 = {}
- Extract "oob" (r=10 to 13): Invalid word, reset
l = 13, r = 13, cnt1 = {}
- Extract "arm" (r=13 to 16): Invalid word, reset
l = 16, r = 16, cnt1 = {}
- Extract "an" (r=16 to 18): Only 2 characters left, stop
Offset 2 (i = 2):
- Initialize:
l = 2, r = 2, cnt1 = {}
- Extract "rfo" (r=2 to 5): Invalid word, reset
l = 5, r = 5, cnt1 = {}
- Extract "oth" (r=5 to 8): Invalid word, reset
l = 8, r = 8, cnt1 = {}
- Extract "efo" (r=8 to 11): Invalid word, reset
l = 11, r = 11, cnt1 = {}
- Extract "oba" (r=11 to 14): Invalid word, reset
l = 14, r = 14, cnt1 = {}
- Extract "rma" (r=14 to 17): Invalid word, reset
l = 17, r = 17, cnt1 = {}
- Extract "n" (r=17 to 18): Only 1 character left, stop
Result: [0, 9]
The algorithm found two valid starting positions where concatenations of all words exist:
- Position 0: "barfoo" = "bar" + "foo"
- Position 9: "foobar" = "foo" + "bar"
Solution Implementation
1from typing import List
2from collections import Counter
3
4class Solution:
5 def findSubstring(self, s: str, words: List[str]) -> List[int]:
6 # Count frequency of each word in the words list
7 word_count = Counter(words)
8
9 # Get lengths: string length, number of words, and length of each word
10 string_length = len(s)
11 num_words = len(words)
12 word_length = len(words[0]) # All words have the same length
13
14 result = []
15
16 # Try all possible starting positions within a word length
17 # This ensures we check all possible alignments
18 for start_pos in range(word_length):
19 left = right = start_pos
20 current_word_count = Counter()
21
22 # Slide the window through the string
23 while right + word_length <= string_length:
24 # Extract the next word from the string
25 current_word = s[right:right + word_length]
26 right += word_length
27
28 # If the word is not in our target words, reset the window
29 if word_count[current_word] == 0:
30 left = right
31 current_word_count.clear()
32 continue
33
34 # Add the current word to our window
35 current_word_count[current_word] += 1
36
37 # If we have too many of the current word, shrink window from left
38 while current_word_count[current_word] > word_count[current_word]:
39 removed_word = s[left:left + word_length]
40 left += word_length
41 current_word_count[removed_word] -= 1
42
43 # Check if we have found a valid concatenation
44 # Window size should equal total length of all words
45 if right - left == num_words * word_length:
46 result.append(left)
47
48 return result
49
1class Solution {
2 public List<Integer> findSubstring(String s, String[] words) {
3 // Build frequency map of all words
4 Map<String, Integer> wordCount = new HashMap<>();
5 for (String word : words) {
6 wordCount.merge(word, 1, Integer::sum);
7 }
8
9 List<Integer> result = new ArrayList<>();
10 int stringLength = s.length();
11 int wordArrayLength = words.length;
12 int wordLength = words[0].length();
13
14 // Try all possible starting positions within a word length
15 // This ensures we check all possible alignments
16 for (int startPos = 0; startPos < wordLength; startPos++) {
17 int left = startPos;
18 int right = startPos;
19 Map<String, Integer> currentWindowCount = new HashMap<>();
20
21 // Slide the window through the string
22 while (right + wordLength <= stringLength) {
23 // Extract the next word from the right side of window
24 String currentWord = s.substring(right, right + wordLength);
25 right += wordLength;
26
27 // If current word is not in our target words, reset the window
28 if (!wordCount.containsKey(currentWord)) {
29 currentWindowCount.clear();
30 left = right;
31 continue;
32 }
33
34 // Add current word to our window count
35 currentWindowCount.merge(currentWord, 1, Integer::sum);
36
37 // Shrink window from left if we have too many occurrences of current word
38 while (currentWindowCount.get(currentWord) > wordCount.get(currentWord)) {
39 String leftWord = s.substring(left, left + wordLength);
40 // Decrease count and remove if it becomes zero
41 if (currentWindowCount.merge(leftWord, -1, Integer::sum) == 0) {
42 currentWindowCount.remove(leftWord);
43 }
44 left += wordLength;
45 }
46
47 // Check if current window contains exactly all words
48 if (right - left == wordArrayLength * wordLength) {
49 result.add(left);
50 }
51 }
52 }
53
54 return result;
55 }
56}
57
1class Solution {
2public:
3 vector<int> findSubstring(string s, vector<string>& words) {
4 // Count frequency of each word in the words array
5 unordered_map<string, int> wordCount;
6 for (const auto& word : words) {
7 wordCount[word]++;
8 }
9
10 vector<int> result;
11 int stringLength = s.length();
12 int wordArraySize = words.size();
13 int wordLength = words[0].length();
14
15 // Try starting from each possible offset (0 to wordLength-1)
16 // This ensures we check all possible alignments
17 for (int offset = 0; offset < wordLength; ++offset) {
18 int left = offset;
19 int right = offset;
20 unordered_map<string, int> currentWindowCount;
21
22 // Slide the window through the string
23 while (right + wordLength <= stringLength) {
24 // Extract the next word from position right
25 string currentWord = s.substr(right, wordLength);
26 right += wordLength;
27
28 // If current word is not in our target words, reset the window
29 if (!wordCount.contains(currentWord)) {
30 currentWindowCount.clear();
31 left = right;
32 continue;
33 }
34
35 // Add current word to our window
36 currentWindowCount[currentWord]++;
37
38 // Shrink window from left while we have excess of any word
39 while (currentWindowCount[currentWord] > wordCount[currentWord]) {
40 string leftWord = s.substr(left, wordLength);
41 currentWindowCount[leftWord]--;
42
43 // Remove from map if count becomes zero to keep map clean
44 if (currentWindowCount[leftWord] == 0) {
45 currentWindowCount.erase(leftWord);
46 }
47 left += wordLength;
48 }
49
50 // Check if current window size matches the total length of all words
51 // If yes, we found a valid starting position
52 if (right - left == wordArraySize * wordLength) {
53 result.push_back(left);
54 }
55 }
56 }
57
58 return result;
59 }
60};
61
1/**
2 * Finds all starting indices of substring(s) in s that is a concatenation of each word in words exactly once
3 * @param s - The input string to search in
4 * @param words - Array of words that should form the concatenation
5 * @returns Array of starting indices where valid concatenations are found
6 */
7function findSubstring(s: string, words: string[]): number[] {
8 // Create frequency map of words to track expected occurrences
9 const wordFrequencyMap: Map<string, number> = new Map();
10 for (const word of words) {
11 wordFrequencyMap.set(word, (wordFrequencyMap.get(word) || 0) + 1);
12 }
13
14 const result: number[] = [];
15 const stringLength = s.length;
16 const wordCount = words.length;
17 const wordLength = words[0].length;
18
19 // Try all possible starting positions within a word length
20 // This ensures we check all possible alignments
21 for (let offset = 0; offset < wordLength; offset++) {
22 let leftPointer = offset;
23 let rightPointer = offset;
24 const currentWindowMap: Map<string, number> = new Map();
25
26 // Slide the window through the string
27 while (rightPointer + wordLength <= stringLength) {
28 // Extract the next word from the string
29 const currentWord = s.substring(rightPointer, rightPointer + wordLength);
30 rightPointer += wordLength;
31
32 // If the word is not in our target words, reset the window
33 if (!wordFrequencyMap.has(currentWord)) {
34 currentWindowMap.clear();
35 leftPointer = rightPointer;
36 continue;
37 }
38
39 // Add the current word to our window frequency map
40 currentWindowMap.set(currentWord, (currentWindowMap.get(currentWord) || 0) + 1);
41
42 // Shrink window from left if we have excess of current word
43 while (currentWindowMap.get(currentWord)! > wordFrequencyMap.get(currentWord)!) {
44 const leftWord = s.substring(leftPointer, leftPointer + wordLength);
45 currentWindowMap.set(leftWord, currentWindowMap.get(leftWord)! - 1);
46
47 // Remove word from map if count reaches zero
48 if (currentWindowMap.get(leftWord) === 0) {
49 currentWindowMap.delete(leftWord);
50 }
51 leftPointer += wordLength;
52 }
53
54 // Check if current window matches the required concatenation length
55 if (rightPointer - leftPointer === wordCount * wordLength) {
56 result.push(leftPointer);
57 }
58 }
59 }
60
61 return result;
62}
63
Time and Space Complexity
Time Complexity: O(m × k)
The algorithm uses a sliding window approach with k
different starting positions (from 0
to k-1
). For each starting position, it processes the string s
by moving the window in steps of size k
. In each iteration:
- The right pointer
r
moves through the string in increments ofk
, covering at mostm/k
positions - For each position, we extract a substring of length
k
(takesO(k)
time) - The inner while loop for adjusting the window also processes each character at most once
Since we have k
starting positions and each processes the string with O(m)
operations involving strings of length k
, the total time complexity is O(k × m/k × k) = O(m × k)
.
Space Complexity: O(n × k)
The space is used for:
cnt
: A Counter storing all words from thewords
array, which containsn
words each of lengthk
, requiringO(n × k)
spacecnt1
: Another Counter that at most storesn
words (when a valid window is found), also requiringO(n × k)
space in the worst caseans
: The result list stores indices, which isO(number of results)
and doesn't dominate the complexity- Temporary string slices like
t
andrem
useO(k)
space
The dominant space complexity is O(n × k)
from the Counter data structures storing the word frequencies.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Forgetting to Reset the Window Counter for Each Starting Offset
One of the most common mistakes is not properly resetting the current_word_count
counter when starting a new offset track. This can lead to incorrect word frequencies being carried over between different starting positions.
Incorrect approach:
# WRONG: Counter declared outside the loop
current_word_count = Counter()
for start_pos in range(word_length):
left = right = start_pos
# Counter not reset - carries over frequencies from previous offset!
Correct approach:
for start_pos in range(word_length):
left = right = start_pos
current_word_count = Counter() # Reset for each new offset
2. Not Handling the Case When Window Needs to Shrink from the Left
Another critical pitfall is failing to properly maintain the sliding window when we encounter a valid word but have too many occurrences of it. Simply moving the entire window or resetting it would miss valid concatenations.
Incorrect approach:
if current_word_count[current_word] > word_count[current_word]: # WRONG: Completely resetting loses valid partial matches left = right current_word_count.clear()
Correct approach:
# Shrink from left until the frequency is valid while current_word_count[current_word] > word_count[current_word]: removed_word = s[left:left + word_length] left += word_length current_word_count[removed_word] -= 1
3. Incorrect Window Size Validation
A subtle error is checking the window size incorrectly or at the wrong time. The window should be checked after ensuring all word frequencies are valid, not before.
Incorrect approach:
# WRONG: Checking size before frequency validation if right - left == num_words * word_length: result.append(left) current_word_count[current_word] += 1 # Then handling excess frequencies...
Correct approach:
# First add the word and handle frequencies current_word_count[current_word] += 1 # Handle excess frequencies if needed while current_word_count[current_word] > word_count[current_word]: # ... shrink window # THEN check if we have a valid window size if right - left == num_words * word_length: result.append(left)
4. Off-by-One Errors in Boundary Checks
Be careful with the loop boundary condition. The check should be right + word_length <= string_length
to ensure we can extract a complete word.
Incorrect approach:
# WRONG: Could attempt to access beyond string bounds while right < string_length: current_word = s[right:right + word_length] # May be incomplete
Correct approach:
while right + word_length <= string_length: current_word = s[right:right + word_length] # Always gets complete word
These pitfalls can cause the algorithm to miss valid concatenations, return incorrect indices, or even crash with index errors. Always ensure proper window management and boundary checks when implementing sliding window algorithms.
Which two pointer techniques do you use to check if a string is a palindrome?
Recommended Readings
https assets algo monster cover_photos stack svg Sliding Window Maximum Monotonic Stack We have an array and a sliding window defined by a start index and an end index The sliding window moves from left of the array to right There are always k elements in the window The window
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Don’t Miss This!