Facebook Pixel

30. Substring with Concatenation of All Words

HardHash TableStringSliding Window
Leetcode Link

Problem Description

You need to find all starting positions in a string s where you can find a substring that is formed by concatenating all words from a given array words in any order.

Given:

  • A string s
  • An array of strings words where all strings have the same length

A concatenated string is formed by taking all the words from words and joining them together in any order, with each word used exactly once.

For example, if words = ["ab","cd","ef"]:

  • Valid concatenated strings include: "abcdef", "abefcd", "cdabef", "cdefab", "efabcd", "efcdab"
  • Invalid example: "acdbef" (not a valid concatenation of the words)

Your task is to find all positions in s where a substring starts that matches any valid concatenation of all words in words. Return these starting indices in any order.

The solution uses a sliding window approach with hash tables. It maintains a counter cnt for the frequency of words in the original words array and another counter cnt1 for the current window. The algorithm iterates through possible starting positions (0 to k-1, where k is the length of each word) and for each starting position, slides a window of size n * k (where n is the number of words) through the string.

The window moves by word-length increments, checking if the current window contains exactly the same words as the words array. When a valid concatenation is found (window size equals n * k and all word frequencies match), the starting position is added to the result.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is recognizing that since all words have the same length k, we can treat the problem as finding windows in the string that contain exactly the right words with the right frequencies.

Instead of checking every possible substring of length n * k starting from every position (which would be inefficient), we can optimize by observing that valid concatenations must align with word boundaries. Since words are of fixed length k, any valid substring must start at positions that, when divided by k, give us consistent word boundaries.

This leads us to the realization that we only need to check k different "tracks" or starting offsets (0, 1, 2, ..., k-1). For each offset i, we can slide through the string jumping by word-length k at a time, essentially reading the string word by word from that offset.

The sliding window technique becomes natural here because:

  1. We're looking for a fixed-size window (containing exactly n words)
  2. As we slide, we can incrementally update our word counts rather than recounting from scratch
  3. When we encounter an invalid word or too many occurrences of a valid word, we can shrink the window from the left

The use of hash tables is intuitive for tracking word frequencies - we need to quickly check if a word exists in our target list and compare frequencies between our target (cnt) and current window (cnt1).

The optimization of clearing the window when encountering an invalid word (not in words) comes from realizing that any window containing such a word cannot be valid, so we might as well start fresh from the position after that invalid word.

Learn more about Sliding Window patterns.

Solution Approach

The solution uses a sliding window technique with hash tables to efficiently find all valid starting positions.

Data Structures:

  • cnt: A Counter (hash table) storing the frequency of each word in words
  • cnt1: A Counter tracking the frequency of words in the current sliding window
  • ans: List to store valid starting indices

Algorithm Implementation:

  1. Initialize variables:

    • m = len(s): length of the string
    • n = len(words): number of words
    • k = len(words[0]): length of each word
  2. Iterate through k different starting offsets (i from 0 to k-1): For each offset, we create a separate sliding window track.

  3. For each offset i, maintain a sliding window:

    • l = r = i: Initialize left and right pointers at offset i
    • cnt1 = Counter(): Reset word frequency counter for this track
  4. Slide the window by word-length increments:

    while r + k <= m:
        t = s[r : r + k]  # Extract word at right boundary
        r += k            # Move right pointer by word length
  5. Handle three cases for each extracted word t:

    Case 1: Invalid word (cnt[t] == 0):

    • The word doesn't exist in words
    • Reset window: l = r, clear cnt1
    • Continue to next iteration

    Case 2: Valid word:

    • Add to window: cnt1[t] += 1
    • If frequency exceeds allowed (cnt1[t] > cnt[t]):
      • Shrink window from left until frequency is valid:
      while cnt1[t] > cnt[t]:
          rem = s[l : l + k]
          l += k
          cnt1[rem] -= 1

    Case 3: Valid window found:

    • When r - l == n * k (window contains exactly n words)
    • Add starting position l to answer list
  6. Return the collected starting indices after checking all k offsets.

The algorithm efficiently processes the string in O(m * k) time, where instead of checking every position, we only check k different tracks, and for each track, we scan through the string once using the sliding window technique.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a concrete example to illustrate the solution approach.

Input:

  • s = "barfoothefoobarman"
  • words = ["foo", "bar"]

Setup:

  • m = 18 (length of s)
  • n = 2 (number of words)
  • k = 3 (length of each word)
  • cnt = {"foo": 1, "bar": 1} (frequency map of words)

We need to check k = 3 different starting offsets (0, 1, 2).

Offset 0 (i = 0):

  • Initialize: l = 0, r = 0, cnt1 = {}
  • Extract "bar" (r=0 to 3): Valid word, cnt1 = {"bar": 1}, move r = 3
  • Extract "foo" (r=3 to 6): Valid word, cnt1 = {"bar": 1, "foo": 1}, move r = 6
  • Window size: r - l = 6 - 0 = 6 = n * kFound valid position at index 0
  • Extract "the" (r=6 to 9): Invalid word (not in words), reset l = 9, r = 9, cnt1 = {}
  • Extract "foo" (r=9 to 12): Valid word, cnt1 = {"foo": 1}, move r = 12
  • Extract "bar" (r=12 to 15): Valid word, cnt1 = {"foo": 1, "bar": 1}, move r = 15
  • Window size: r - l = 15 - 9 = 6 = n * kFound valid position at index 9
  • Extract "man" (r=15 to 18): Invalid word, reset (end of string)

Offset 1 (i = 1):

  • Initialize: l = 1, r = 1, cnt1 = {}
  • Extract "arf" (r=1 to 4): Invalid word, reset l = 4, r = 4, cnt1 = {}
  • Extract "oot" (r=4 to 7): Invalid word, reset l = 7, r = 7, cnt1 = {}
  • Extract "hef" (r=7 to 10): Invalid word, reset l = 10, r = 10, cnt1 = {}
  • Extract "oob" (r=10 to 13): Invalid word, reset l = 13, r = 13, cnt1 = {}
  • Extract "arm" (r=13 to 16): Invalid word, reset l = 16, r = 16, cnt1 = {}
  • Extract "an" (r=16 to 18): Only 2 characters left, stop

Offset 2 (i = 2):

  • Initialize: l = 2, r = 2, cnt1 = {}
  • Extract "rfo" (r=2 to 5): Invalid word, reset l = 5, r = 5, cnt1 = {}
  • Extract "oth" (r=5 to 8): Invalid word, reset l = 8, r = 8, cnt1 = {}
  • Extract "efo" (r=8 to 11): Invalid word, reset l = 11, r = 11, cnt1 = {}
  • Extract "oba" (r=11 to 14): Invalid word, reset l = 14, r = 14, cnt1 = {}
  • Extract "rma" (r=14 to 17): Invalid word, reset l = 17, r = 17, cnt1 = {}
  • Extract "n" (r=17 to 18): Only 1 character left, stop

Result: [0, 9]

The algorithm found two valid starting positions where concatenations of all words exist:

  • Position 0: "barfoo" = "bar" + "foo"
  • Position 9: "foobar" = "foo" + "bar"

Solution Implementation

1from typing import List
2from collections import Counter
3
4class Solution:
5    def findSubstring(self, s: str, words: List[str]) -> List[int]:
6        # Count frequency of each word in the words list
7        word_count = Counter(words)
8      
9        # Get lengths: string length, number of words, and length of each word
10        string_length = len(s)
11        num_words = len(words)
12        word_length = len(words[0])  # All words have the same length
13      
14        result = []
15      
16        # Try all possible starting positions within a word length
17        # This ensures we check all possible alignments
18        for start_pos in range(word_length):
19            left = right = start_pos
20            current_word_count = Counter()
21          
22            # Slide the window through the string
23            while right + word_length <= string_length:
24                # Extract the next word from the string
25                current_word = s[right:right + word_length]
26                right += word_length
27              
28                # If the word is not in our target words, reset the window
29                if word_count[current_word] == 0:
30                    left = right
31                    current_word_count.clear()
32                    continue
33              
34                # Add the current word to our window
35                current_word_count[current_word] += 1
36              
37                # If we have too many of the current word, shrink window from left
38                while current_word_count[current_word] > word_count[current_word]:
39                    removed_word = s[left:left + word_length]
40                    left += word_length
41                    current_word_count[removed_word] -= 1
42              
43                # Check if we have found a valid concatenation
44                # Window size should equal total length of all words
45                if right - left == num_words * word_length:
46                    result.append(left)
47      
48        return result
49
1class Solution {
2    public List<Integer> findSubstring(String s, String[] words) {
3        // Build frequency map of all words
4        Map<String, Integer> wordCount = new HashMap<>();
5        for (String word : words) {
6            wordCount.merge(word, 1, Integer::sum);
7        }
8      
9        List<Integer> result = new ArrayList<>();
10        int stringLength = s.length();
11        int wordArrayLength = words.length;
12        int wordLength = words[0].length();
13      
14        // Try all possible starting positions within a word length
15        // This ensures we check all possible alignments
16        for (int startPos = 0; startPos < wordLength; startPos++) {
17            int left = startPos;
18            int right = startPos;
19            Map<String, Integer> currentWindowCount = new HashMap<>();
20          
21            // Slide the window through the string
22            while (right + wordLength <= stringLength) {
23                // Extract the next word from the right side of window
24                String currentWord = s.substring(right, right + wordLength);
25                right += wordLength;
26              
27                // If current word is not in our target words, reset the window
28                if (!wordCount.containsKey(currentWord)) {
29                    currentWindowCount.clear();
30                    left = right;
31                    continue;
32                }
33              
34                // Add current word to our window count
35                currentWindowCount.merge(currentWord, 1, Integer::sum);
36              
37                // Shrink window from left if we have too many occurrences of current word
38                while (currentWindowCount.get(currentWord) > wordCount.get(currentWord)) {
39                    String leftWord = s.substring(left, left + wordLength);
40                    // Decrease count and remove if it becomes zero
41                    if (currentWindowCount.merge(leftWord, -1, Integer::sum) == 0) {
42                        currentWindowCount.remove(leftWord);
43                    }
44                    left += wordLength;
45                }
46              
47                // Check if current window contains exactly all words
48                if (right - left == wordArrayLength * wordLength) {
49                    result.add(left);
50                }
51            }
52        }
53      
54        return result;
55    }
56}
57
1class Solution {
2public:
3    vector<int> findSubstring(string s, vector<string>& words) {
4        // Count frequency of each word in the words array
5        unordered_map<string, int> wordCount;
6        for (const auto& word : words) {
7            wordCount[word]++;
8        }
9
10        vector<int> result;
11        int stringLength = s.length();
12        int wordArraySize = words.size();
13        int wordLength = words[0].length();
14
15        // Try starting from each possible offset (0 to wordLength-1)
16        // This ensures we check all possible alignments
17        for (int offset = 0; offset < wordLength; ++offset) {
18            int left = offset;
19            int right = offset;
20            unordered_map<string, int> currentWindowCount;
21          
22            // Slide the window through the string
23            while (right + wordLength <= stringLength) {
24                // Extract the next word from position right
25                string currentWord = s.substr(right, wordLength);
26                right += wordLength;
27
28                // If current word is not in our target words, reset the window
29                if (!wordCount.contains(currentWord)) {
30                    currentWindowCount.clear();
31                    left = right;
32                    continue;
33                }
34
35                // Add current word to our window
36                currentWindowCount[currentWord]++;
37
38                // Shrink window from left while we have excess of any word
39                while (currentWindowCount[currentWord] > wordCount[currentWord]) {
40                    string leftWord = s.substr(left, wordLength);
41                    currentWindowCount[leftWord]--;
42                  
43                    // Remove from map if count becomes zero to keep map clean
44                    if (currentWindowCount[leftWord] == 0) {
45                        currentWindowCount.erase(leftWord);
46                    }
47                    left += wordLength;
48                }
49
50                // Check if current window size matches the total length of all words
51                // If yes, we found a valid starting position
52                if (right - left == wordArraySize * wordLength) {
53                    result.push_back(left);
54                }
55            }
56        }
57
58        return result;
59    }
60};
61
1/**
2 * Finds all starting indices of substring(s) in s that is a concatenation of each word in words exactly once
3 * @param s - The input string to search in
4 * @param words - Array of words that should form the concatenation
5 * @returns Array of starting indices where valid concatenations are found
6 */
7function findSubstring(s: string, words: string[]): number[] {
8    // Create frequency map of words to track expected occurrences
9    const wordFrequencyMap: Map<string, number> = new Map();
10    for (const word of words) {
11        wordFrequencyMap.set(word, (wordFrequencyMap.get(word) || 0) + 1);
12    }
13  
14    const result: number[] = [];
15    const stringLength = s.length;
16    const wordCount = words.length;
17    const wordLength = words[0].length;
18  
19    // Try all possible starting positions within a word length
20    // This ensures we check all possible alignments
21    for (let offset = 0; offset < wordLength; offset++) {
22        let leftPointer = offset;
23        let rightPointer = offset;
24        const currentWindowMap: Map<string, number> = new Map();
25      
26        // Slide the window through the string
27        while (rightPointer + wordLength <= stringLength) {
28            // Extract the next word from the string
29            const currentWord = s.substring(rightPointer, rightPointer + wordLength);
30            rightPointer += wordLength;
31          
32            // If the word is not in our target words, reset the window
33            if (!wordFrequencyMap.has(currentWord)) {
34                currentWindowMap.clear();
35                leftPointer = rightPointer;
36                continue;
37            }
38          
39            // Add the current word to our window frequency map
40            currentWindowMap.set(currentWord, (currentWindowMap.get(currentWord) || 0) + 1);
41          
42            // Shrink window from left if we have excess of current word
43            while (currentWindowMap.get(currentWord)! > wordFrequencyMap.get(currentWord)!) {
44                const leftWord = s.substring(leftPointer, leftPointer + wordLength);
45                currentWindowMap.set(leftWord, currentWindowMap.get(leftWord)! - 1);
46              
47                // Remove word from map if count reaches zero
48                if (currentWindowMap.get(leftWord) === 0) {
49                    currentWindowMap.delete(leftWord);
50                }
51                leftPointer += wordLength;
52            }
53          
54            // Check if current window matches the required concatenation length
55            if (rightPointer - leftPointer === wordCount * wordLength) {
56                result.push(leftPointer);
57            }
58        }
59    }
60  
61    return result;
62}
63

Time and Space Complexity

Time Complexity: O(m × k)

The algorithm uses a sliding window approach with k different starting positions (from 0 to k-1). For each starting position, it processes the string s by moving the window in steps of size k. In each iteration:

  • The right pointer r moves through the string in increments of k, covering at most m/k positions
  • For each position, we extract a substring of length k (takes O(k) time)
  • The inner while loop for adjusting the window also processes each character at most once

Since we have k starting positions and each processes the string with O(m) operations involving strings of length k, the total time complexity is O(k × m/k × k) = O(m × k).

Space Complexity: O(n × k)

The space is used for:

  • cnt: A Counter storing all words from the words array, which contains n words each of length k, requiring O(n × k) space
  • cnt1: Another Counter that at most stores n words (when a valid window is found), also requiring O(n × k) space in the worst case
  • ans: The result list stores indices, which is O(number of results) and doesn't dominate the complexity
  • Temporary string slices like t and rem use O(k) space

The dominant space complexity is O(n × k) from the Counter data structures storing the word frequencies.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Forgetting to Reset the Window Counter for Each Starting Offset

One of the most common mistakes is not properly resetting the current_word_count counter when starting a new offset track. This can lead to incorrect word frequencies being carried over between different starting positions.

Incorrect approach:

# WRONG: Counter declared outside the loop
current_word_count = Counter()
for start_pos in range(word_length):
    left = right = start_pos
    # Counter not reset - carries over frequencies from previous offset!

Correct approach:

for start_pos in range(word_length):
    left = right = start_pos
    current_word_count = Counter()  # Reset for each new offset

2. Not Handling the Case When Window Needs to Shrink from the Left

Another critical pitfall is failing to properly maintain the sliding window when we encounter a valid word but have too many occurrences of it. Simply moving the entire window or resetting it would miss valid concatenations.

Incorrect approach:

if current_word_count[current_word] > word_count[current_word]:
    # WRONG: Completely resetting loses valid partial matches
    left = right
    current_word_count.clear()

Correct approach:

# Shrink from left until the frequency is valid
while current_word_count[current_word] > word_count[current_word]:
    removed_word = s[left:left + word_length]
    left += word_length
    current_word_count[removed_word] -= 1

3. Incorrect Window Size Validation

A subtle error is checking the window size incorrectly or at the wrong time. The window should be checked after ensuring all word frequencies are valid, not before.

Incorrect approach:

# WRONG: Checking size before frequency validation
if right - left == num_words * word_length:
    result.append(left)
current_word_count[current_word] += 1
# Then handling excess frequencies...

Correct approach:

# First add the word and handle frequencies
current_word_count[current_word] += 1

# Handle excess frequencies if needed
while current_word_count[current_word] > word_count[current_word]:
    # ... shrink window

# THEN check if we have a valid window size
if right - left == num_words * word_length:
    result.append(left)

4. Off-by-One Errors in Boundary Checks

Be careful with the loop boundary condition. The check should be right + word_length <= string_length to ensure we can extract a complete word.

Incorrect approach:

# WRONG: Could attempt to access beyond string bounds
while right < string_length:
    current_word = s[right:right + word_length]  # May be incomplete

Correct approach:

while right + word_length <= string_length:
    current_word = s[right:right + word_length]  # Always gets complete word

These pitfalls can cause the algorithm to miss valid concatenations, return incorrect indices, or even crash with index errors. Always ensure proper window management and boundary checks when implementing sliding window algorithms.

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Which two pointer techniques do you use to check if a string is a palindrome?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More