1065. Index Pairs of a String
Problem Description
This problem asks for a function that, given a string text
and an array of strings words
, finds all substrings within text
that match any of the strings in words
. For each match, the function should return the starting and ending indices of the substring in the text. The output should consist of an array of index pairs [i, j]
, where text[i...j]
(inclusive) is a substring found in the words
list.
For example, if text = "dogcat"
and words = ["cat", "dog"]
, the function should return [[0, 2], [3, 5]]
since text[0...2]
is "dog" and text[3...5]
is "cat".
The output array should be sorted first by the starting index i
and then by the ending index j
in the case of a tie.
Intuition
The intuitive approach to solving this problem involves checking each possible substring of text
against all words in the words
list to see if there is a match. However, this naive approach would be inefficient as it would require a comparison of possibly every substring of text
with every word in words
, resulting in a potentially high time complexity.
To optimize, we can use a Trie data structure. A Trie is a type of search tree, an ordered data structure used to store a dynamic set of strings. It allows us to efficiently search for a string in a fixed set of strings, which is particularly useful for dealing with problems involving a dictionary of words (such as autocomplete features).
The idea is to first insert all the words from the words
list into the Trie. Each node in the Trie represents a character from 'a' to 'z'. A node will have a flag is_end
to denote whether the node represents the end of a word in the Trie.
Then, for each possible starting index i
in the text
, we will try to find any word in the words
list that starts with text[i]
. We do this by walking through the Trie tree with the characters in text
starting from i
. If we find a null child before reaching the end of a word, this means there's no match starting from this character, and we continue to the next starting index in text
. If the node's is_end
is true, then we found a word in words
that matches the substring starting at i
, and we add the current pair of indices [i, j]
to our answers list.
This approach is much faster than comparing each substring, especially when there are many words in words
or when words
contain long strings.
Solution Approach
The solution employs a Trie data structure to optimize the process of searching for words within the text. Here's how the implementation works:
-
A Trie class is defined, which includes an array called
children
of length 26, to account for every letter of the English alphabet, and a booleanis_end
that signifies whether a node corresponds to the end of a word in the Trie. -
An
insert
method is added to the Trie class, allowing the insertion of a word into the Trie. This method is important because it sets up the Trie so that theindexPairs
function can efficiently search for substrings later. Theinsert
method works as follows:- For every character in the word to be inserted, it calculates the index of the character, assuming 'a' is at index 0 (using
ord(c) - ord('a')
). - It checks if there is already a child node for that character; if not, it creates a new Trie node at the respective index.
- It then proceeds to the next character in the word until all characters are inserted.
- After inserting the last character of the word, it marks that node as an end node by setting
is_end
toTrue
.
- For every character in the word to be inserted, it calculates the index of the character, assuming 'a' is at index 0 (using
-
In the
Solution
class, theindexPairs
method is used to find all index pairs from thetext
that match any word within thewords
list by utilizing the Trie with the following steps:- First, for each word in
words
, the methodinsert
is called to create the Trie representation of thewords
list. - An empty list
ans
is initialized to store the index pairs that will be the output of this function. - The function then iterates through every possible starting index
i
intext
. - For each starting index
i
, the function attempts to form words by checking subsequent characters until either a word is found or an invalid (null) Trie path is encountered:- For each character
text[j]
starting from indexi
, the function calculates its index in the Trie. - If the Trie does not have a child node corresponding to
text[j]
, meaning no word starts with the substringtext[i...j]
, the loop breaks. - If the current Trie node signals the end of a word (
node.is_end
isTrue
), we've found a complete word, and the indices[i, j]
are added to theans
list.
- For each character
- Once all possible start indices
i
are tested, the method returns the filled-inans
list containing all matching index pairs.
- First, for each word in
This solution is efficient as it uses the Trie to quickly skip over non-matching substrings of text
and recognizes the end of words without needing to check the words
list for each substring explicitly.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example to illustrate the solution approach.
Assume we have the following text
and words
:
1text = "themanevent" 2words = ["man", "event", "the"]
We want to find all substrings in text
that match any of the words in words
. The expected output is the index pairs that correspond to these substrings. For the given example, the expected output would be [[0, 2], [3, 5], [6, 10]]
because:
text[0...2]
corresponds to "the".text[3...5]
corresponds to "man".text[6...10]
corresponds to "event".
Here is how the solution approach works step by step:
-
Create Trie: We start by constructing a Trie and inserting all the words from
words
into it.Trie after Insertion:
1root 2 | 3 t - h - e - (end) 4 | 5 m - a - n - (end) 6 | 7 e - v - e - n - t - (end)
- Each letter represents a node pointing to its children.
- "(end)" signifies a complete word at that node.
-
Search for Substrings:
- We check each starting index
i
intext
. - Initialize
ans
as[]
, to store our results.
For
i=0
("t" of "themanevent"):- We traverse the Trie. "the" is in the
words
list, so we add[0, 2]
toans
.
For
i=1
("h" of "hemanevent"):- "h" does not start any word by itself, so we do not add anything to
ans
.
For
i=2
("e" of "emanevent"):- There is no word starting with "e" directly (only as a part of other words), so no indices are added to
ans
.
For
i=3
("m" of "manevent"):- "man" is a word, so we add
[3, 5]
toans
.
For
i=4
and onwards:- Repeat above steps for each index until the end of
text
is reached. - When
i=6
, "event" is found, so we add[6, 10]
toans
.
- We check each starting index
-
Result: After testing all indices, our
ans
is[[0, 2], [3, 5], [6, 10]]
, which we then return.
This method allows us to efficiently find all index pairs corresponding to words within the text using the Trie without having to check every possible substring against all words in words
. By leveraging the Trie's structure, where each path down the Trie represents a potential word, we quickly identify substrings without exhaustive searching.
Solution Implementation
1# Defines the Trie data structure
2class Trie:
3 def __init__(self):
4 # Initialize all possible children for each letter in the alphabet
5 self.children = [None] * 26
6 # A flag to indicate the end of a word
7 self.is_end_of_word = False
8
9 # Function to insert a word into the Trie
10 def insert(self, word):
11 node = self
12 for char in word:
13 index = ord(char) - ord('a')
14 if node.children[index] is None:
15 node.children[index] = Trie()
16 node = node.children[index]
17 node.is_end_of_word = True
18
19
20# Solution class to find index pairs in text where the words start and end
21class Solution:
22 def index_pairs(self, text: str, words: list[str]) -> list[list[int]]:
23 # Initialize a Trie
24 trie = Trie()
25 # Insert all words into the Trie
26 for word in words:
27 trie.insert(word)
28
29 # Get the length of the text
30 length_of_text = len(text)
31 # This will store our answers
32 answer = []
33
34 # Go through each character in text as a potential starting point
35 for start in range(length_of_text):
36 node = trie
37 # Explore the substring from the starting point
38 for end in range(start, length_of_text):
39 index = ord(text[end]) - ord('a')
40 if node.children[index] is None:
41 break
42 node = node.children[index]
43 # If we reach the end of a word, record the current indices
44 if node.is_end_of_word:
45 answer.append([start, end])
46 # Return the list of start and end index pairs
47 return answer
48
1// Trie class to hold the Trie nodes and provide an insert method.
2class Trie {
3 // Trie array to hold references to child nodes.
4 private Trie[] children = new Trie[26];
5 // Flag to denote the end of a word.
6 private boolean isEndOfWord = false;
7
8 // Method to insert a word into the Trie.
9 public void insert(String word) {
10 Trie node = this;
11 for (char ch : word.toCharArray()) {
12 int index = ch - 'a'; // Normalize the character to an index 0-25.
13 // If the child node at that index doesn't exist, create a new Trie node.
14 if (node.children[index] == null) {
15 node.children[index] = new Trie();
16 }
17 // Move to the child node.
18 node = node.children[index];
19 }
20 // Mark the end of the word.
21 node.isEndOfWord = true;
22 }
23}
24
25// Solution class containing method to find and return index pairs of text that match any word in words array.
26class Solution {
27 public int[][] indexPairs(String text, String[] words) {
28 // Initialize a Trie and insert all words from the array.
29 Trie trie = new Trie();
30 for (String word : words) {
31 trie.insert(word);
32 }
33
34 // The length of the given text string.
35 int textLength = text.length();
36 // List to store the index pairs.
37 List<int[]> matchingIndexPairs = new ArrayList<>();
38
39 // Loop through the text to find all index pairs where words begin.
40 for (int startIndex = 0; startIndex < textLength; ++startIndex) {
41 Trie node = trie;
42 for (int endIndex = startIndex; endIndex < textLength; ++endIndex) {
43 // Calculate the index in the Trie children array that corresponds to the current character.
44 int idx = text.charAt(endIndex) - 'a';
45 // If there's no child node at this index, stop this iteration.
46 if (node.children[idx] == null) {
47 break;
48 }
49 // Move to the corresponding child node.
50 node = node.children[idx];
51 // If the current node marks the end of a word, add the start/end index pair to the list.
52 if (node.isEndOfWord) {
53 matchingIndexPairs.add(new int[] {startIndex, endIndex});
54 }
55 }
56 }
57 // Convert the list of index pairs to an int[][] array before returning.
58 return matchingIndexPairs.toArray(new int[matchingIndexPairs.size()][2]);
59 }
60}
61
1#include <string>
2#include <vector>
3
4// A TrieNode represents each node of the Trie.
5class TrieNode {
6public:
7 std::vector<TrieNode*> children; // Pointers to children nodes
8 bool isEndOfWord = false; // Flag to check if the node is the end of a word
9
10 // Constructor initializes the children to hold 26 elements for each alphabet letter.
11 TrieNode() : children(26, nullptr) {
12 }
13
14 // Method to insert a word into the trie.
15 void insert(const std::string& word) {
16 TrieNode* node = this; // Start from the root node
17 for (char c : word) {
18 int index = c - 'a'; // Convert character to index (0-25)
19 // If there is no node for this character, create it.
20 if (!node->children[index]) {
21 node->children[index] = new TrieNode();
22 }
23 // Move to the child node
24 node = node->children[index];
25 }
26 // Once all characters are inserted, mark the end of the word.
27 node->isEndOfWord = true;
28 }
29};
30
31class Solution {
32public:
33 // Method to return index pairs for all words in 'text' present in 'words' vector.
34 std::vector<std::vector<int>> indexPairs(const std::string& text, const std::vector<std::string>& words) {
35 // Build Trie for given set of words for efficient lookup
36 TrieNode* trie = new TrieNode();
37 for (const auto& word : words) {
38 trie->insert(word);
39 }
40
41 int n = text.size();
42 std::vector<std::vector<int>> result; // To store the result of index pairs
43
44 // Iterate over each character of the text
45 for (int i = 0; i < n; ++i) {
46 TrieNode* node = trie;
47 // Start checking for word from each index i
48 for (int j = i; j < n; ++j) {
49 int idx = text[j] - 'a'; // Convert character to index
50 // If the character is not the start of any word, break
51 if (!node->children[idx]) break;
52 // Otherwise, move to the child node
53 node = node->children[idx];
54 // If the node marks the end of a word, record the index pair
55 if (node->isEndOfWord) {
56 result.push_back({i, j});
57 }
58 }
59 }
60 // Deallocate memory used by Trie
61 delete trie;
62 // Return the list of index pairs
63 return result;
64 }
65};
66
1// TrieNode represents each node of the Trie.
2class TrieNode {
3 children: TrieNode[];
4 isEndOfWord: boolean;
5
6 // Constructor initializes the children to have 26 positions for each alphabet letter.
7 constructor() {
8 this.children = Array(26).fill(null); // Initialize all positions to null
9 this.isEndOfWord = false;
10 }
11
12 // Method to insert a word into the trie.
13 insert(word: string): void {
14 let node: TrieNode = this; // Start from the root node
15 for (const c of word) {
16 const index: number = c.charCodeAt(0) - 'a'.charCodeAt(0); // Convert character to index (0-25)
17 // If there's no node for this character, create it.
18 if (!node.children[index]) {
19 node.children[index] = new TrieNode();
20 }
21 // Move to the child node
22 node = node.children[index];
23 }
24 // Once all characters are inserted, mark this node as the end of a word.
25 node.isEndOfWord = true;
26 }
27}
28
29// Function to return index pairs for all words in 'text' present in 'words' array.
30function indexPairs(text: string, words: string[]): number[][] {
31 // Build Trie for the given set of words for efficient lookup.
32 const trie: TrieNode = new TrieNode();
33 words.forEach(word => {
34 trie.insert(word);
35 });
36
37 const n: number = text.length;
38 const result: number[][] = []; // To store index pairs.
39
40 // Iterate over each character of the text.
41 for (let i = 0; i < n; ++i) {
42 let node: TrieNode = trie;
43 // Start checking for word from each index 'i'.
44 for (let j = i; j < n; ++j) {
45 const index: number = text.charCodeAt(j) - 'a'.charCodeAt(0); // Convert character to index.
46 // If the character is not the start of any word, break.
47 if (!node.children[index]) break;
48 // Otherwise, move to the child node.
49 node = node.children[index];
50 // If the node marks the end of a word, record the index pair.
51 if (node.isEndOfWord) {
52 result.push([i, j]);
53 }
54 }
55 }
56 // Return the list of index pairs.
57 return result;
58}
59
Time and Space Complexity
Time Complexity
The time complexity of the given code is comprised of two parts: (1) Building the Trie, and (2) Searching for words in the text using the Trie.
-
Building the Trie:
- Each word from the
words
list is inserted into the Trie. - If we assume the average length of a word in
words
isL
, and there areW
words, then the complexity for building the Trie isO(W * L)
.
- Each word from the
-
Searching in text using the Trie:
- For each position
i
in the text of lengthN
, a check in the Trie is initiated. - In the worst case, each check can go up to the length of the text (
N
), if all characters are part of words in the Trie. - This nested loop results in a complexity of
O(N^2)
in the worst case.
- For each position
Combining both parts, we have the worst-case time complexity as O(W * L + N^2)
.
Space Complexity
The space complexity is also composed of two parts: the space required for the Trie and the space required for the answer list.
-
Trie Space:
- The Trie has at most
26 * W * L
nodes (26
possible children for each character of each word). However, due to the sharing of common prefixes, the actual space will often be much less. - The worst-case space complexity for the Trie is
O(26 * W * L)
.
- The Trie has at most
-
Answer List Space:
- In the worst case, every substring of text starting at
i
could be a word in Trie, so this could be as large asO(N^2)
since starting from each index there can beN-i
potential words and there areN
potential starting indices. - But we only store the indices, not the actual substrings, which are
O(1)
space each, so the more accurate complexity for the answer list isO(N^2)
.
- In the worst case, every substring of text starting at
The total space complexity is O(26 * W * L + N^2)
, which is O(W * L + N^2)
because the 26
is a constant factor that doesn't change with the input size and can be omitted in big O notation.
Learn more about how to find time and space complexity quickly using problem constraints.
A heap is a ...?
Recommended Readings
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Patterns The Shortest Path Algorithm for Coding Interviews The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we can determine the
Got a question?ย Ask the Monster Assistantย anything you don't understand.
Still not clear? ย Submitย the part you don't understand to our editors. Or join ourย Discord and ask the community.