Facebook Pixel

1065. Index Pairs of a String πŸ”’

Problem Description

You are given a string text and an array of strings words. Your task is to find all the substrings in text that match any word in the words array.

For each matching substring found, you need to return its starting and ending indices as a pair [i, j], where:

  • i is the starting index of the substring
  • j is the ending index of the substring
  • The substring text[i...j] (inclusive) exists in the words array

The output should be an array containing all such index pairs, sorted in a specific order:

  1. First, sort by the starting index i (ascending)
  2. If two pairs have the same starting index, sort by the ending index j (ascending)

For example, if text = "ababa" and words = ["aba", "ab"]:

  • The substring "ab" appears at indices [0, 1] and [2, 3]
  • The substring "aba" appears at indices [0, 2] and [2, 4]
  • The result would be [[0, 1], [0, 2], [2, 3], [2, 4]]

The solution uses a brute force approach that:

  1. Converts the words list to a set for O(1) lookup time
  2. Iterates through all possible substrings of text using nested loops
  3. Checks if each substring exists in the words set
  4. Returns all matching index pairs in the required sorted order
Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is that we need to examine every possible substring of the given text to check if it matches any word in our word list. This naturally leads us to think about generating all substrings systematically.

To generate all possible substrings, we can use two pointers: one for the starting position and one for the ending position. For each starting position i, we try all valid ending positions j where j >= i. This gives us the substring text[i:j+1].

The brute force approach becomes clear when we realize that:

  1. We must check every possible substring (there's no way to skip substrings without potentially missing matches)
  2. For each substring, we need to verify if it exists in the words array

To optimize the lookup operation, we convert the words list into a set. This transformation is crucial because:

  • Checking membership in a list takes O(m) time where m is the number of words
  • Checking membership in a set takes O(1) average time
  • Since we're checking many substrings, this optimization significantly improves performance

The nested loop structure emerges naturally:

  • Outer loop: iterate through all possible starting positions (0 to n-1)
  • Inner loop: for each starting position, iterate through all possible ending positions (from the starting position to n-1)
  • For each [i, j] pair, extract text[i:j+1] and check if it's in our words set

The beauty of this approach is that the nested loops inherently generate the pairs in the required sorted order - first by starting index, then by ending index - eliminating the need for additional sorting.

Learn more about Trie and Sorting patterns.

Solution Approach

The implementation follows a straightforward brute-force strategy with an optimization for fast lookups:

Step 1: Convert words list to a set

words = set(words)

This conversion is the first optimization. By converting the list to a set, we reduce the time complexity of checking if a substring exists in words from O(m) to O(1) on average, where m is the number of words.

Step 2: Get the length of the text

n = len(text)

We store the length to avoid recalculating it in our loops.

Step 3: Generate all substrings and check membership

return [
    [i, j] for i in range(n) for j in range(i, n) if text[i : j + 1] in words
]

This list comprehension implements the core logic:

  • Outer loop for i in range(n): Iterates through all possible starting positions from 0 to n-1
  • Inner loop for j in range(i, n): For each starting position i, iterates through all possible ending positions from i to n-1
    • Starting from i ensures we don't generate empty substrings
    • Going up to n-1 ensures we check all possible substring lengths
  • Substring extraction text[i : j + 1]: Creates the substring from index i to j (inclusive)
    • Note: We use j + 1 because Python's slice notation [i:j+1] includes i but excludes j+1
  • Membership check if text[i : j + 1] in words: Checks if the extracted substring exists in our words set
  • Result collection [i, j]: If the substring is found, we add the pair [i, j] to our result

The nested loops naturally produce pairs in sorted order:

  • Pairs are first ordered by i (outer loop runs from 0 to n-1)
  • For the same i, pairs are ordered by j (inner loop runs from i to n-1)

Time Complexity: O(nΒ² Γ— k) where n is the length of text and k is the average length of substrings being checked Space Complexity: O(m) for the set storage where m is the number of words, plus the space for the output array

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a small example to illustrate the solution approach.

Given:

  • text = "abcab"
  • words = ["ab", "bc", "ca"]

Step 1: Convert words to a set

words_set = {"ab", "bc", "ca"}

Step 2: Generate all substrings using nested loops

Let's trace through the nested loops systematically:

When i = 0 (starting at 'a'):

  • j = 0: substring = text[0:1] = "a" β†’ not in words_set
  • j = 1: substring = text[0:2] = "ab" β†’ found in words_set, add [0, 1]
  • j = 2: substring = text[0:3] = "abc" β†’ not in words_set
  • j = 3: substring = text[0:4] = "abca" β†’ not in words_set
  • j = 4: substring = text[0:5] = "abcab" β†’ not in words_set

When i = 1 (starting at 'b'):

  • j = 1: substring = text[1:2] = "b" β†’ not in words_set
  • j = 2: substring = text[1:3] = "bc" β†’ found in words_set, add [1, 2]
  • j = 3: substring = text[1:4] = "bca" β†’ not in words_set
  • j = 4: substring = text[1:5] = "bcab" β†’ not in words_set

When i = 2 (starting at 'c'):

  • j = 2: substring = text[2:3] = "c" β†’ not in words_set
  • j = 3: substring = text[2:4] = "ca" β†’ found in words_set, add [2, 3]
  • j = 4: substring = text[2:5] = "cab" β†’ not in words_set

When i = 3 (starting at 'a'):

  • j = 3: substring = text[3:4] = "a" β†’ not in words_set
  • j = 4: substring = text[3:5] = "ab" β†’ found in words_set, add [3, 4]

When i = 4 (starting at 'b'):

  • j = 4: substring = text[4:5] = "b" β†’ not in words_set

Final Result: [[0, 1], [1, 2], [2, 3], [3, 4]]

Notice how the pairs are naturally sorted by starting index first (0, 1, 2, 3), and if there were multiple matches starting at the same index, they would be sorted by ending index due to the inner loop's progression.

Solution Implementation

1class Solution:
2    def indexPairs(self, text: str, words: List[str]) -> List[List[int]]:
3        # Convert words list to set for O(1) lookup time
4        words_set = set(words)
5      
6        # Get the length of the text
7        text_length = len(text)
8      
9        # Find all valid index pairs where substring exists in words set
10        result = []
11      
12        # Iterate through all possible starting positions
13        for start_idx in range(text_length):
14            # Iterate through all possible ending positions from start_idx
15            for end_idx in range(start_idx, text_length):
16                # Extract substring from start_idx to end_idx (inclusive)
17                substring = text[start_idx:end_idx + 1]
18              
19                # Check if the substring exists in the words set
20                if substring in words_set:
21                    # Add the valid index pair to result
22                    result.append([start_idx, end_idx])
23      
24        return result
25
1/**
2 * Trie (Prefix Tree) data structure for efficient string matching
3 */
4class Trie {
5    // Array to store 26 children nodes (one for each lowercase letter)
6    Trie[] children = new Trie[26];
7    // Flag to mark if current node represents end of a word
8    boolean isEnd = false;
9
10    /**
11     * Inserts a word into the trie
12     * @param word The word to be inserted
13     */
14    void insert(String word) {
15        Trie currentNode = this;
16      
17        // Traverse each character in the word
18        for (char character : word.toCharArray()) {
19            // Convert character to index (0-25)
20            int index = character - 'a';
21          
22            // Create new node if path doesn't exist
23            if (currentNode.children[index] == null) {
24                currentNode.children[index] = new Trie();
25            }
26          
27            // Move to the next node
28            currentNode = currentNode.children[index];
29        }
30      
31        // Mark the last node as end of word
32        currentNode.isEnd = true;
33    }
34}
35
36/**
37 * Solution class to find all index pairs where substrings match given words
38 */
39class Solution {
40    /**
41     * Finds all index pairs [i, j] where text.substring(i, j+1) matches any word in words array
42     * @param text The text to search in
43     * @param words Array of words to search for
44     * @return 2D array of index pairs [start, end] in sorted order
45     */
46    public int[][] indexPairs(String text, String[] words) {
47        // Build trie with all words
48        Trie trie = new Trie();
49        for (String word : words) {
50            trie.insert(word);
51        }
52      
53        int textLength = text.length();
54        List<int[]> resultList = new ArrayList<>();
55      
56        // Check all possible starting positions in text
57        for (int startIndex = 0; startIndex < textLength; ++startIndex) {
58            Trie currentNode = trie;
59          
60            // Try to match from current starting position
61            for (int endIndex = startIndex; endIndex < textLength; ++endIndex) {
62                // Get character index (0-25) for current character
63                int charIndex = text.charAt(endIndex) - 'a';
64              
65                // No matching path in trie, stop searching from this start position
66                if (currentNode.children[charIndex] == null) {
67                    break;
68                }
69              
70                // Move to next node in trie
71                currentNode = currentNode.children[charIndex];
72              
73                // If current node marks end of a word, we found a match
74                if (currentNode.isEnd) {
75                    resultList.add(new int[] {startIndex, endIndex});
76                }
77            }
78        }
79      
80        // Convert list to 2D array and return
81        return resultList.toArray(new int[resultList.size()][2]);
82    }
83}
84
1class Trie {
2public:
3    // Array to store 26 children nodes (for lowercase letters a-z)
4    vector<Trie*> children;
5    // Flag to mark if current node represents end of a word
6    bool isEnd = false;
7
8    // Constructor: Initialize children array with 26 null pointers
9    Trie() {
10        children.resize(26);
11    }
12
13    // Insert a word into the trie
14    void insert(string word) {
15        Trie* currentNode = this;
16      
17        // Traverse each character in the word
18        for (char ch : word) {
19            // Convert character to index (0-25)
20            int index = ch - 'a';
21          
22            // Create new node if path doesn't exist
23            if (!currentNode->children[index]) {
24                currentNode->children[index] = new Trie();
25            }
26          
27            // Move to the child node
28            currentNode = currentNode->children[index];
29        }
30      
31        // Mark the last node as end of word
32        currentNode->isEnd = true;
33    }
34};
35
36class Solution {
37public:
38    // Find all index pairs where substrings match words in dictionary
39    vector<vector<int>> indexPairs(string text, vector<string>& words) {
40        // Build trie from all words
41        Trie* trie = new Trie();
42        for (const auto& word : words) {
43            trie->insert(word);
44        }
45      
46        int textLength = text.size();
47        vector<vector<int>> result;
48      
49        // Check all possible starting positions in text
50        for (int startPos = 0; startPos < textLength; ++startPos) {
51            Trie* currentNode = trie;
52          
53            // Try to match from current starting position
54            for (int endPos = startPos; endPos < textLength; ++endPos) {
55                // Get character index (0-25 for a-z)
56                int charIndex = text[endPos] - 'a';
57              
58                // No matching path in trie, stop searching from this start position
59                if (!currentNode->children[charIndex]) {
60                    break;
61                }
62              
63                // Move to next node in trie
64                currentNode = currentNode->children[charIndex];
65              
66                // Found a complete word, add index pair to result
67                if (currentNode->isEnd) {
68                    result.push_back({startPos, endPos});
69                }
70            }
71        }
72      
73        return result;
74    }
75};
76
1// Trie node structure for efficient string matching
2class TrieNode {
3    // Array to store 26 children nodes (for lowercase letters a-z)
4    children: (TrieNode | null)[];
5    // Flag to mark if current node represents end of a word
6    isEnd: boolean;
7
8    // Constructor: Initialize children array with 26 null values
9    constructor() {
10        this.children = new Array(26).fill(null);
11        this.isEnd = false;
12    }
13}
14
15// Global trie root variable
16let trieRoot: TrieNode;
17
18// Insert a word into the trie
19function insertWord(word: string): void {
20    let currentNode = trieRoot;
21  
22    // Traverse each character in the word
23    for (const char of word) {
24        // Convert character to index (0-25 for lowercase a-z)
25        const index = char.charCodeAt(0) - 'a'.charCodeAt(0);
26      
27        // Create new node if path doesn't exist
28        if (!currentNode.children[index]) {
29            currentNode.children[index] = new TrieNode();
30        }
31      
32        // Move to the child node
33        currentNode = currentNode.children[index]!;
34    }
35  
36    // Mark the last node as end of word
37    currentNode.isEnd = true;
38}
39
40// Find all index pairs where substrings match words in dictionary
41function indexPairs(text: string, words: string[]): number[][] {
42    // Initialize and build trie from all words
43    trieRoot = new TrieNode();
44    for (const word of words) {
45        insertWord(word);
46    }
47  
48    const textLength = text.length;
49    const result: number[][] = [];
50  
51    // Check all possible starting positions in text
52    for (let startPos = 0; startPos < textLength; startPos++) {
53        let currentNode = trieRoot;
54      
55        // Try to match from current starting position
56        for (let endPos = startPos; endPos < textLength; endPos++) {
57            // Get character index (0-25 for lowercase a-z)
58            const charIndex = text.charCodeAt(endPos) - 'a'.charCodeAt(0);
59          
60            // No matching path in trie, stop searching from this start position
61            if (!currentNode.children[charIndex]) {
62                break;
63            }
64          
65            // Move to next node in trie
66            currentNode = currentNode.children[charIndex]!;
67          
68            // Found a complete word, add index pair to result
69            if (currentNode.isEnd) {
70                result.push([startPos, endPos]);
71            }
72        }
73    }
74  
75    return result;
76}
77

Time and Space Complexity

Time Complexity: O(nΒ² Γ— m)

The algorithm uses nested loops to iterate through all possible substrings of the text:

  • The outer loop runs n times (where n is the length of text)
  • The inner loop runs up to n times for each iteration of the outer loop
  • For each pair (i, j), we extract substring text[i:j+1] which takes O(j-i+1) time in the worst case, up to O(n)
  • Checking membership in the set words takes O(m) time where m is the average length of words in the set (for string hashing and comparison)

Total: O(nΒ²) iterations Γ— O(n) for substring extraction Γ— O(m) for set lookup = O(nΒ³ Γ— m)

However, since Python optimizes string slicing and hashing, and considering that m (average word length) is typically much smaller than n, the practical complexity is closer to O(nΒ² Γ— m).

Space Complexity: O(k Γ— m + nΒ²)

  • O(k Γ— m) for storing the set of words, where k is the number of words and m is the average word length
  • The output list can contain up to O(nΒ²) pairs in the worst case (when every possible substring is a match)
  • Additional O(n) space for substring creation during iteration

Total space complexity: O(k Γ— m + nΒ²)

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Forgetting to Handle Empty Strings in Words Array

One common pitfall is not considering that the words array might contain empty strings. If an empty string exists in words, the algorithm would generate index pairs like [0, -1], [1, 0], etc., which are invalid since the ending index would be less than the starting index.

Problem Example:

text = "abc"
words = ["", "ab"]  # Empty string in words
# Would attempt to check text[i:i] which is empty string

Solution: Filter out empty strings when converting to a set:

words_set = set(word for word in words if word)

2. Incorrect Index Range in Substring Extraction

A frequent mistake is using text[i:j] instead of text[i:j+1] when extracting substrings. This off-by-one error occurs because Python's slice notation is exclusive of the end index, but the problem expects inclusive indices.

Problem Example:

# Incorrect:
substring = text[start_idx:end_idx]  # Missing the character at end_idx

# If text = "abc" and we want substring from index 0 to 2:
# text[0:2] gives "ab" instead of "abc"

Solution: Always use j+1 in the slice operation:

substring = text[start_idx:end_idx + 1]  # Correct: includes character at end_idx

3. Performance Degradation with Large Words Array

While converting to a set helps with lookup time, if the words array contains very long strings, the set creation and substring comparison can become expensive due to string hashing overhead.

Problem Example:

# If words contains many long strings:
words = ["a" * 1000, "b" * 1000, ...]  # Very long strings
# Each substring comparison requires hashing potentially long strings

Solution: Pre-compute the maximum and minimum word lengths to avoid checking substrings that can't possibly match:

words_set = set(words)
if not words_set:
    return []

min_word_len = min(len(word) for word in words_set)
max_word_len = max(len(word) for word in words_set)

result = []
for start_idx in range(len(text)):
    # Only check substrings within valid length range
    for end_idx in range(start_idx + min_word_len - 1, 
                         min(start_idx + max_word_len, len(text))):
        substring = text[start_idx:end_idx + 1]
        if substring in words_set:
            result.append([start_idx, end_idx])

4. Not Handling Duplicate Words in Input

While converting to a set naturally handles duplicates, developers might implement alternative solutions that don't account for duplicate words, leading to duplicate results or incorrect counting.

Problem Example:

words = ["ab", "ab", "abc"]  # "ab" appears twice
# Some implementations might process "ab" twice

Solution: The set conversion automatically handles this, but if using other approaches, ensure duplicates are removed:

words_set = set(words)  # Automatically removes duplicates
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

In a binary min heap, the maximum element can be found in:


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More