1065. Index Pairs of a String π
Problem Description
You are given a string text
and an array of strings words
. Your task is to find all the substrings in text
that match any word in the words
array.
For each matching substring found, you need to return its starting and ending indices as a pair [i, j]
, where:
i
is the starting index of the substringj
is the ending index of the substring- The substring
text[i...j]
(inclusive) exists in thewords
array
The output should be an array containing all such index pairs, sorted in a specific order:
- First, sort by the starting index
i
(ascending) - If two pairs have the same starting index, sort by the ending index
j
(ascending)
For example, if text = "ababa"
and words = ["aba", "ab"]
:
- The substring
"ab"
appears at indices[0, 1]
and[2, 3]
- The substring
"aba"
appears at indices[0, 2]
and[2, 4]
- The result would be
[[0, 1], [0, 2], [2, 3], [2, 4]]
The solution uses a brute force approach that:
- Converts the
words
list to a set for O(1) lookup time - Iterates through all possible substrings of
text
using nested loops - Checks if each substring exists in the
words
set - Returns all matching index pairs in the required sorted order
Intuition
The key insight is that we need to examine every possible substring of the given text to check if it matches any word in our word list. This naturally leads us to think about generating all substrings systematically.
To generate all possible substrings, we can use two pointers: one for the starting position and one for the ending position. For each starting position i
, we try all valid ending positions j
where j >= i
. This gives us the substring text[i:j+1]
.
The brute force approach becomes clear when we realize that:
- We must check every possible substring (there's no way to skip substrings without potentially missing matches)
- For each substring, we need to verify if it exists in the
words
array
To optimize the lookup operation, we convert the words
list into a set. This transformation is crucial because:
- Checking membership in a list takes O(m) time where m is the number of words
- Checking membership in a set takes O(1) average time
- Since we're checking many substrings, this optimization significantly improves performance
The nested loop structure emerges naturally:
- Outer loop: iterate through all possible starting positions (0 to n-1)
- Inner loop: for each starting position, iterate through all possible ending positions (from the starting position to n-1)
- For each
[i, j]
pair, extracttext[i:j+1]
and check if it's in our words set
The beauty of this approach is that the nested loops inherently generate the pairs in the required sorted order - first by starting index, then by ending index - eliminating the need for additional sorting.
Solution Approach
The implementation follows a straightforward brute-force strategy with an optimization for fast lookups:
Step 1: Convert words list to a set
words = set(words)
This conversion is the first optimization. By converting the list to a set, we reduce the time complexity of checking if a substring exists in words
from O(m) to O(1) on average, where m is the number of words.
Step 2: Get the length of the text
n = len(text)
We store the length to avoid recalculating it in our loops.
Step 3: Generate all substrings and check membership
return [
[i, j] for i in range(n) for j in range(i, n) if text[i : j + 1] in words
]
This list comprehension implements the core logic:
- Outer loop
for i in range(n)
: Iterates through all possible starting positions from 0 to n-1 - Inner loop
for j in range(i, n)
: For each starting position i, iterates through all possible ending positions from i to n-1- Starting from
i
ensures we don't generate empty substrings - Going up to
n-1
ensures we check all possible substring lengths
- Starting from
- Substring extraction
text[i : j + 1]
: Creates the substring from index i to j (inclusive)- Note: We use
j + 1
because Python's slice notation[i:j+1]
includes i but excludes j+1
- Note: We use
- Membership check
if text[i : j + 1] in words
: Checks if the extracted substring exists in our words set - Result collection
[i, j]
: If the substring is found, we add the pair[i, j]
to our result
The nested loops naturally produce pairs in sorted order:
- Pairs are first ordered by
i
(outer loop runs from 0 to n-1) - For the same
i
, pairs are ordered byj
(inner loop runs from i to n-1)
Time Complexity: O(nΒ² Γ k) where n is the length of text and k is the average length of substrings being checked Space Complexity: O(m) for the set storage where m is the number of words, plus the space for the output array
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example to illustrate the solution approach.
Given:
text = "abcab"
words = ["ab", "bc", "ca"]
Step 1: Convert words to a set
words_set = {"ab", "bc", "ca"}
Step 2: Generate all substrings using nested loops
Let's trace through the nested loops systematically:
When i = 0 (starting at 'a'):
- j = 0: substring = text[0:1] = "a" β not in words_set
- j = 1: substring = text[0:2] = "ab" β found in words_set, add [0, 1]
- j = 2: substring = text[0:3] = "abc" β not in words_set
- j = 3: substring = text[0:4] = "abca" β not in words_set
- j = 4: substring = text[0:5] = "abcab" β not in words_set
When i = 1 (starting at 'b'):
- j = 1: substring = text[1:2] = "b" β not in words_set
- j = 2: substring = text[1:3] = "bc" β found in words_set, add [1, 2]
- j = 3: substring = text[1:4] = "bca" β not in words_set
- j = 4: substring = text[1:5] = "bcab" β not in words_set
When i = 2 (starting at 'c'):
- j = 2: substring = text[2:3] = "c" β not in words_set
- j = 3: substring = text[2:4] = "ca" β found in words_set, add [2, 3]
- j = 4: substring = text[2:5] = "cab" β not in words_set
When i = 3 (starting at 'a'):
- j = 3: substring = text[3:4] = "a" β not in words_set
- j = 4: substring = text[3:5] = "ab" β found in words_set, add [3, 4]
When i = 4 (starting at 'b'):
- j = 4: substring = text[4:5] = "b" β not in words_set
Final Result: [[0, 1], [1, 2], [2, 3], [3, 4]]
Notice how the pairs are naturally sorted by starting index first (0, 1, 2, 3), and if there were multiple matches starting at the same index, they would be sorted by ending index due to the inner loop's progression.
Solution Implementation
1class Solution:
2 def indexPairs(self, text: str, words: List[str]) -> List[List[int]]:
3 # Convert words list to set for O(1) lookup time
4 words_set = set(words)
5
6 # Get the length of the text
7 text_length = len(text)
8
9 # Find all valid index pairs where substring exists in words set
10 result = []
11
12 # Iterate through all possible starting positions
13 for start_idx in range(text_length):
14 # Iterate through all possible ending positions from start_idx
15 for end_idx in range(start_idx, text_length):
16 # Extract substring from start_idx to end_idx (inclusive)
17 substring = text[start_idx:end_idx + 1]
18
19 # Check if the substring exists in the words set
20 if substring in words_set:
21 # Add the valid index pair to result
22 result.append([start_idx, end_idx])
23
24 return result
25
1/**
2 * Trie (Prefix Tree) data structure for efficient string matching
3 */
4class Trie {
5 // Array to store 26 children nodes (one for each lowercase letter)
6 Trie[] children = new Trie[26];
7 // Flag to mark if current node represents end of a word
8 boolean isEnd = false;
9
10 /**
11 * Inserts a word into the trie
12 * @param word The word to be inserted
13 */
14 void insert(String word) {
15 Trie currentNode = this;
16
17 // Traverse each character in the word
18 for (char character : word.toCharArray()) {
19 // Convert character to index (0-25)
20 int index = character - 'a';
21
22 // Create new node if path doesn't exist
23 if (currentNode.children[index] == null) {
24 currentNode.children[index] = new Trie();
25 }
26
27 // Move to the next node
28 currentNode = currentNode.children[index];
29 }
30
31 // Mark the last node as end of word
32 currentNode.isEnd = true;
33 }
34}
35
36/**
37 * Solution class to find all index pairs where substrings match given words
38 */
39class Solution {
40 /**
41 * Finds all index pairs [i, j] where text.substring(i, j+1) matches any word in words array
42 * @param text The text to search in
43 * @param words Array of words to search for
44 * @return 2D array of index pairs [start, end] in sorted order
45 */
46 public int[][] indexPairs(String text, String[] words) {
47 // Build trie with all words
48 Trie trie = new Trie();
49 for (String word : words) {
50 trie.insert(word);
51 }
52
53 int textLength = text.length();
54 List<int[]> resultList = new ArrayList<>();
55
56 // Check all possible starting positions in text
57 for (int startIndex = 0; startIndex < textLength; ++startIndex) {
58 Trie currentNode = trie;
59
60 // Try to match from current starting position
61 for (int endIndex = startIndex; endIndex < textLength; ++endIndex) {
62 // Get character index (0-25) for current character
63 int charIndex = text.charAt(endIndex) - 'a';
64
65 // No matching path in trie, stop searching from this start position
66 if (currentNode.children[charIndex] == null) {
67 break;
68 }
69
70 // Move to next node in trie
71 currentNode = currentNode.children[charIndex];
72
73 // If current node marks end of a word, we found a match
74 if (currentNode.isEnd) {
75 resultList.add(new int[] {startIndex, endIndex});
76 }
77 }
78 }
79
80 // Convert list to 2D array and return
81 return resultList.toArray(new int[resultList.size()][2]);
82 }
83}
84
1class Trie {
2public:
3 // Array to store 26 children nodes (for lowercase letters a-z)
4 vector<Trie*> children;
5 // Flag to mark if current node represents end of a word
6 bool isEnd = false;
7
8 // Constructor: Initialize children array with 26 null pointers
9 Trie() {
10 children.resize(26);
11 }
12
13 // Insert a word into the trie
14 void insert(string word) {
15 Trie* currentNode = this;
16
17 // Traverse each character in the word
18 for (char ch : word) {
19 // Convert character to index (0-25)
20 int index = ch - 'a';
21
22 // Create new node if path doesn't exist
23 if (!currentNode->children[index]) {
24 currentNode->children[index] = new Trie();
25 }
26
27 // Move to the child node
28 currentNode = currentNode->children[index];
29 }
30
31 // Mark the last node as end of word
32 currentNode->isEnd = true;
33 }
34};
35
36class Solution {
37public:
38 // Find all index pairs where substrings match words in dictionary
39 vector<vector<int>> indexPairs(string text, vector<string>& words) {
40 // Build trie from all words
41 Trie* trie = new Trie();
42 for (const auto& word : words) {
43 trie->insert(word);
44 }
45
46 int textLength = text.size();
47 vector<vector<int>> result;
48
49 // Check all possible starting positions in text
50 for (int startPos = 0; startPos < textLength; ++startPos) {
51 Trie* currentNode = trie;
52
53 // Try to match from current starting position
54 for (int endPos = startPos; endPos < textLength; ++endPos) {
55 // Get character index (0-25 for a-z)
56 int charIndex = text[endPos] - 'a';
57
58 // No matching path in trie, stop searching from this start position
59 if (!currentNode->children[charIndex]) {
60 break;
61 }
62
63 // Move to next node in trie
64 currentNode = currentNode->children[charIndex];
65
66 // Found a complete word, add index pair to result
67 if (currentNode->isEnd) {
68 result.push_back({startPos, endPos});
69 }
70 }
71 }
72
73 return result;
74 }
75};
76
1// Trie node structure for efficient string matching
2class TrieNode {
3 // Array to store 26 children nodes (for lowercase letters a-z)
4 children: (TrieNode | null)[];
5 // Flag to mark if current node represents end of a word
6 isEnd: boolean;
7
8 // Constructor: Initialize children array with 26 null values
9 constructor() {
10 this.children = new Array(26).fill(null);
11 this.isEnd = false;
12 }
13}
14
15// Global trie root variable
16let trieRoot: TrieNode;
17
18// Insert a word into the trie
19function insertWord(word: string): void {
20 let currentNode = trieRoot;
21
22 // Traverse each character in the word
23 for (const char of word) {
24 // Convert character to index (0-25 for lowercase a-z)
25 const index = char.charCodeAt(0) - 'a'.charCodeAt(0);
26
27 // Create new node if path doesn't exist
28 if (!currentNode.children[index]) {
29 currentNode.children[index] = new TrieNode();
30 }
31
32 // Move to the child node
33 currentNode = currentNode.children[index]!;
34 }
35
36 // Mark the last node as end of word
37 currentNode.isEnd = true;
38}
39
40// Find all index pairs where substrings match words in dictionary
41function indexPairs(text: string, words: string[]): number[][] {
42 // Initialize and build trie from all words
43 trieRoot = new TrieNode();
44 for (const word of words) {
45 insertWord(word);
46 }
47
48 const textLength = text.length;
49 const result: number[][] = [];
50
51 // Check all possible starting positions in text
52 for (let startPos = 0; startPos < textLength; startPos++) {
53 let currentNode = trieRoot;
54
55 // Try to match from current starting position
56 for (let endPos = startPos; endPos < textLength; endPos++) {
57 // Get character index (0-25 for lowercase a-z)
58 const charIndex = text.charCodeAt(endPos) - 'a'.charCodeAt(0);
59
60 // No matching path in trie, stop searching from this start position
61 if (!currentNode.children[charIndex]) {
62 break;
63 }
64
65 // Move to next node in trie
66 currentNode = currentNode.children[charIndex]!;
67
68 // Found a complete word, add index pair to result
69 if (currentNode.isEnd) {
70 result.push([startPos, endPos]);
71 }
72 }
73 }
74
75 return result;
76}
77
Time and Space Complexity
Time Complexity: O(nΒ² Γ m)
The algorithm uses nested loops to iterate through all possible substrings of the text:
- The outer loop runs
n
times (wheren
is the length of text) - The inner loop runs up to
n
times for each iteration of the outer loop - For each pair
(i, j)
, we extract substringtext[i:j+1]
which takesO(j-i+1)
time in the worst case, up toO(n)
- Checking membership in the set
words
takesO(m)
time wherem
is the average length of words in the set (for string hashing and comparison)
Total: O(nΒ²)
iterations Γ O(n)
for substring extraction Γ O(m)
for set lookup = O(nΒ³ Γ m)
However, since Python optimizes string slicing and hashing, and considering that m
(average word length) is typically much smaller than n
, the practical complexity is closer to O(nΒ² Γ m)
.
Space Complexity: O(k Γ m + nΒ²)
O(k Γ m)
for storing the set of words, wherek
is the number of words andm
is the average word length- The output list can contain up to
O(nΒ²)
pairs in the worst case (when every possible substring is a match) - Additional
O(n)
space for substring creation during iteration
Total space complexity: O(k Γ m + nΒ²)
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Forgetting to Handle Empty Strings in Words Array
One common pitfall is not considering that the words
array might contain empty strings. If an empty string exists in words
, the algorithm would generate index pairs like [0, -1]
, [1, 0]
, etc., which are invalid since the ending index would be less than the starting index.
Problem Example:
text = "abc" words = ["", "ab"] # Empty string in words # Would attempt to check text[i:i] which is empty string
Solution: Filter out empty strings when converting to a set:
words_set = set(word for word in words if word)
2. Incorrect Index Range in Substring Extraction
A frequent mistake is using text[i:j]
instead of text[i:j+1]
when extracting substrings. This off-by-one error occurs because Python's slice notation is exclusive of the end index, but the problem expects inclusive indices.
Problem Example:
# Incorrect: substring = text[start_idx:end_idx] # Missing the character at end_idx # If text = "abc" and we want substring from index 0 to 2: # text[0:2] gives "ab" instead of "abc"
Solution:
Always use j+1
in the slice operation:
substring = text[start_idx:end_idx + 1] # Correct: includes character at end_idx
3. Performance Degradation with Large Words Array
While converting to a set helps with lookup time, if the words
array contains very long strings, the set creation and substring comparison can become expensive due to string hashing overhead.
Problem Example:
# If words contains many long strings: words = ["a" * 1000, "b" * 1000, ...] # Very long strings # Each substring comparison requires hashing potentially long strings
Solution: Pre-compute the maximum and minimum word lengths to avoid checking substrings that can't possibly match:
words_set = set(words)
if not words_set:
return []
min_word_len = min(len(word) for word in words_set)
max_word_len = max(len(word) for word in words_set)
result = []
for start_idx in range(len(text)):
# Only check substrings within valid length range
for end_idx in range(start_idx + min_word_len - 1,
min(start_idx + max_word_len, len(text))):
substring = text[start_idx:end_idx + 1]
if substring in words_set:
result.append([start_idx, end_idx])
4. Not Handling Duplicate Words in Input
While converting to a set naturally handles duplicates, developers might implement alternative solutions that don't account for duplicate words, leading to duplicate results or incorrect counting.
Problem Example:
words = ["ab", "ab", "abc"] # "ab" appears twice # Some implementations might process "ab" twice
Solution: The set conversion automatically handles this, but if using other approaches, ensure duplicates are removed:
words_set = set(words) # Automatically removes duplicates
In a binary min heap, the maximum element can be found in:
Recommended Readings
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Want a Structured Path to Master System Design Too? Donβt Miss This!