1065. Index Pairs of a String


Problem Description

This problem asks for a function that, given a string text and an array of strings words, finds all substrings within text that match any of the strings in words. For each match, the function should return the starting and ending indices of the substring in the text. The output should consist of an array of index pairs [i, j], where text[i...j] (inclusive) is a substring found in the words list.

For example, if text = "dogcat" and words = ["cat", "dog"], the function should return [[0, 2], [3, 5]] since text[0...2] is "dog" and text[3...5] is "cat".

The output array should be sorted first by the starting index i and then by the ending index j in the case of a tie.

Intuition

The intuitive approach to solving this problem involves checking each possible substring of text against all words in the words list to see if there is a match. However, this naive approach would be inefficient as it would require a comparison of possibly every substring of text with every word in words, resulting in a potentially high time complexity.

To optimize, we can use a Trie data structure. A Trie is a type of search tree, an ordered data structure used to store a dynamic set of strings. It allows us to efficiently search for a string in a fixed set of strings, which is particularly useful for dealing with problems involving a dictionary of words (such as autocomplete features).

The idea is to first insert all the words from the words list into the Trie. Each node in the Trie represents a character from 'a' to 'z'. A node will have a flag is_end to denote whether the node represents the end of a word in the Trie.

Then, for each possible starting index i in the text, we will try to find any word in the words list that starts with text[i]. We do this by walking through the Trie tree with the characters in text starting from i. If we find a null child before reaching the end of a word, this means there's no match starting from this character, and we continue to the next starting index in text. If the node's is_end is true, then we found a word in words that matches the substring starting at i, and we add the current pair of indices [i, j] to our answers list.

This approach is much faster than comparing each substring, especially when there are many words in words or when words contain long strings.

Solution Approach

The solution employs a Trie data structure to optimize the process of searching for words within the text. Here's how the implementation works:

  1. A Trie class is defined, which includes an array called children of length 26, to account for every letter of the English alphabet, and a boolean is_end that signifies whether a node corresponds to the end of a word in the Trie.

  2. An insert method is added to the Trie class, allowing the insertion of a word into the Trie. This method is important because it sets up the Trie so that the indexPairs function can efficiently search for substrings later. The insert method works as follows:

    • For every character in the word to be inserted, it calculates the index of the character, assuming 'a' is at index 0 (using ord(c) - ord('a')).
    • It checks if there is already a child node for that character; if not, it creates a new Trie node at the respective index.
    • It then proceeds to the next character in the word until all characters are inserted.
    • After inserting the last character of the word, it marks that node as an end node by setting is_end to True.
  3. In the Solution class, the indexPairs method is used to find all index pairs from the text that match any word within the words list by utilizing the Trie with the following steps:

    • First, for each word in words, the method insert is called to create the Trie representation of the words list.
    • An empty list ans is initialized to store the index pairs that will be the output of this function.
    • The function then iterates through every possible starting index i in text.
    • For each starting index i, the function attempts to form words by checking subsequent characters until either a word is found or an invalid (null) Trie path is encountered:
      • For each character text[j] starting from index i, the function calculates its index in the Trie.
      • If the Trie does not have a child node corresponding to text[j], meaning no word starts with the substring text[i...j], the loop breaks.
      • If the current Trie node signals the end of a word (node.is_end is True), we've found a complete word, and the indices [i, j] are added to the ans list.
    • Once all possible start indices i are tested, the method returns the filled-in ans list containing all matching index pairs.

This solution is efficient as it uses the Trie to quickly skip over non-matching substrings of text and recognizes the end of words without needing to check the words list for each substring explicitly.

Example Walkthrough

Let's walk through a small example to illustrate the solution approach.

Assume we have the following text and words:

1text = "themanevent"
2words = ["man", "event", "the"]

We want to find all substrings in text that match any of the words in words. The expected output is the index pairs that correspond to these substrings. For the given example, the expected output would be [[0, 2], [3, 5], [6, 10]] because:

  • text[0...2] corresponds to "the".
  • text[3...5] corresponds to "man".
  • text[6...10] corresponds to "event".

Here is how the solution approach works step by step:

  1. Create Trie: We start by constructing a Trie and inserting all the words from words into it.

    Trie after Insertion:

    1root
    2 |
    3 t - h - e - (end)
    4 |
    5 m - a - n - (end)
    6            |
    7            e - v - e - n - t - (end)
    • Each letter represents a node pointing to its children.
    • "(end)" signifies a complete word at that node.
  2. Search for Substrings:

    • We check each starting index i in text.
    • Initialize ans as [], to store our results.

    For i=0 ("t" of "themanevent"):

    • We traverse the Trie. "the" is in the words list, so we add [0, 2] to ans.

    For i=1 ("h" of "hemanevent"):

    • "h" does not start any word by itself, so we do not add anything to ans.

    For i=2 ("e" of "emanevent"):

    • There is no word starting with "e" directly (only as a part of other words), so no indices are added to ans.

    For i=3 ("m" of "manevent"):

    • "man" is a word, so we add [3, 5] to ans.

    For i=4 and onwards:

    • Repeat above steps for each index until the end of text is reached.
    • When i=6, "event" is found, so we add [6, 10] to ans.
  3. Result: After testing all indices, our ans is [[0, 2], [3, 5], [6, 10]], which we then return.

This method allows us to efficiently find all index pairs corresponding to words within the text using the Trie without having to check every possible substring against all words in words. By leveraging the Trie's structure, where each path down the Trie represents a potential word, we quickly identify substrings without exhaustive searching.

Python Solution

1# Defines the Trie data structure
2class Trie:
3    def __init__(self):
4        # Initialize all possible children for each letter in the alphabet
5        self.children = [None] * 26
6        # A flag to indicate the end of a word
7        self.is_end_of_word = False
8
9    # Function to insert a word into the Trie
10    def insert(self, word):
11        node = self
12        for char in word:
13            index = ord(char) - ord('a')
14            if node.children[index] is None:
15                node.children[index] = Trie()
16            node = node.children[index]
17        node.is_end_of_word = True
18
19
20# Solution class to find index pairs in text where the words start and end
21class Solution:
22    def index_pairs(self, text: str, words: list[str]) -> list[list[int]]:
23        # Initialize a Trie
24        trie = Trie()
25        # Insert all words into the Trie
26        for word in words:
27            trie.insert(word)
28      
29        # Get the length of the text
30        length_of_text = len(text)
31        # This will store our answers
32        answer = []
33
34        # Go through each character in text as a potential starting point
35        for start in range(length_of_text):
36            node = trie
37            # Explore the substring from the starting point
38            for end in range(start, length_of_text):
39                index = ord(text[end]) - ord('a')
40                if node.children[index] is None:
41                    break
42                node = node.children[index]
43                # If we reach the end of a word, record the current indices
44                if node.is_end_of_word:
45                    answer.append([start, end])
46        # Return the list of start and end index pairs
47        return answer
48

Java Solution

1// Trie class to hold the Trie nodes and provide an insert method.
2class Trie {
3    // Trie array to hold references to child nodes.
4    private Trie[] children = new Trie[26];
5    // Flag to denote the end of a word.
6    private boolean isEndOfWord = false;
7
8    // Method to insert a word into the Trie.
9    public void insert(String word) {
10        Trie node = this;
11        for (char ch : word.toCharArray()) {
12            int index = ch - 'a'; // Normalize the character to an index 0-25.
13            // If the child node at that index doesn't exist, create a new Trie node.
14            if (node.children[index] == null) {
15                node.children[index] = new Trie();
16            }
17            // Move to the child node.
18            node = node.children[index];
19        }
20        // Mark the end of the word.
21        node.isEndOfWord = true;
22    }
23}
24
25// Solution class containing method to find and return index pairs of text that match any word in words array.
26class Solution {
27    public int[][] indexPairs(String text, String[] words) {
28        // Initialize a Trie and insert all words from the array.
29        Trie trie = new Trie();
30        for (String word : words) {
31            trie.insert(word);
32        }
33      
34        // The length of the given text string.
35        int textLength = text.length();
36        // List to store the index pairs.
37        List<int[]> matchingIndexPairs = new ArrayList<>();
38      
39        // Loop through the text to find all index pairs where words begin.
40        for (int startIndex = 0; startIndex < textLength; ++startIndex) {
41            Trie node = trie;
42            for (int endIndex = startIndex; endIndex < textLength; ++endIndex) {
43                // Calculate the index in the Trie children array that corresponds to the current character.
44                int idx = text.charAt(endIndex) - 'a';
45                // If there's no child node at this index, stop this iteration.
46                if (node.children[idx] == null) {
47                    break;
48                }
49                // Move to the corresponding child node.
50                node = node.children[idx];
51                // If the current node marks the end of a word, add the start/end index pair to the list.
52                if (node.isEndOfWord) {
53                    matchingIndexPairs.add(new int[] {startIndex, endIndex});
54                }
55            }
56        }
57        // Convert the list of index pairs to an int[][] array before returning.
58        return matchingIndexPairs.toArray(new int[matchingIndexPairs.size()][2]);
59    }
60}
61

C++ Solution

1#include <string>
2#include <vector>
3
4// A TrieNode represents each node of the Trie.
5class TrieNode {
6public:
7    std::vector<TrieNode*> children; // Pointers to children nodes
8    bool isEndOfWord = false; // Flag to check if the node is the end of a word
9
10    // Constructor initializes the children to hold 26 elements for each alphabet letter.
11    TrieNode() : children(26, nullptr) {
12    }
13
14    // Method to insert a word into the trie.
15    void insert(const std::string& word) {
16        TrieNode* node = this; // Start from the root node
17        for (char c : word) {
18            int index = c - 'a'; // Convert character to index (0-25)
19            // If there is no node for this character, create it.
20            if (!node->children[index]) {
21                node->children[index] = new TrieNode();
22            }
23            // Move to the child node
24            node = node->children[index];
25        }
26        // Once all characters are inserted, mark the end of the word.
27        node->isEndOfWord = true;
28    }
29};
30
31class Solution {
32public:
33    // Method to return index pairs for all words in 'text' present in 'words' vector.
34    std::vector<std::vector<int>> indexPairs(const std::string& text, const std::vector<std::string>& words) {
35        // Build Trie for given set of words for efficient lookup
36        TrieNode* trie = new TrieNode();
37        for (const auto& word : words) {
38            trie->insert(word);
39        }
40
41        int n = text.size();
42        std::vector<std::vector<int>> result; // To store the result of index pairs
43
44        // Iterate over each character of the text
45        for (int i = 0; i < n; ++i) {
46            TrieNode* node = trie;
47            // Start checking for word from each index i
48            for (int j = i; j < n; ++j) {
49                int idx = text[j] - 'a'; // Convert character to index
50                // If the character is not the start of any word, break
51                if (!node->children[idx]) break;
52                // Otherwise, move to the child node
53                node = node->children[idx];
54                // If the node marks the end of a word, record the index pair
55                if (node->isEndOfWord) {
56                    result.push_back({i, j});
57                }
58            }
59        }
60        // Deallocate memory used by Trie
61        delete trie;
62        // Return the list of index pairs
63        return result;
64    }
65};
66

Typescript Solution

1// TrieNode represents each node of the Trie.
2class TrieNode {
3    children: TrieNode[];
4    isEndOfWord: boolean;
5
6    // Constructor initializes the children to have 26 positions for each alphabet letter.
7    constructor() {
8        this.children = Array(26).fill(null); // Initialize all positions to null
9        this.isEndOfWord = false;
10    }
11
12    // Method to insert a word into the trie.
13    insert(word: string): void {
14        let node: TrieNode = this; // Start from the root node
15        for (const c of word) {
16            const index: number = c.charCodeAt(0) - 'a'.charCodeAt(0); // Convert character to index (0-25)
17            // If there's no node for this character, create it.
18            if (!node.children[index]) {
19                node.children[index] = new TrieNode();
20            }
21            // Move to the child node
22            node = node.children[index];
23        }
24        // Once all characters are inserted, mark this node as the end of a word.
25        node.isEndOfWord = true;
26    }
27}
28
29// Function to return index pairs for all words in 'text' present in 'words' array.
30function indexPairs(text: string, words: string[]): number[][] {
31    // Build Trie for the given set of words for efficient lookup.
32    const trie: TrieNode = new TrieNode();
33    words.forEach(word => {
34        trie.insert(word);
35    });
36
37    const n: number = text.length;
38    const result: number[][] = []; // To store index pairs.
39
40    // Iterate over each character of the text.
41    for (let i = 0; i < n; ++i) {
42        let node: TrieNode = trie;
43        // Start checking for word from each index 'i'.
44        for (let j = i; j < n; ++j) {
45            const index: number = text.charCodeAt(j) - 'a'.charCodeAt(0); // Convert character to index.
46            // If the character is not the start of any word, break.
47            if (!node.children[index]) break;
48            // Otherwise, move to the child node.
49            node = node.children[index];
50            // If the node marks the end of a word, record the index pair.
51            if (node.isEndOfWord) {
52                result.push([i, j]);
53            }
54        }
55    }
56    // Return the list of index pairs.
57    return result;
58}
59

Time and Space Complexity

Time Complexity

The time complexity of the given code is comprised of two parts: (1) Building the Trie, and (2) Searching for words in the text using the Trie.

  1. Building the Trie:

    • Each word from the words list is inserted into the Trie.
    • If we assume the average length of a word in words is L, and there are W words, then the complexity for building the Trie is O(W * L).
  2. Searching in text using the Trie:

    • For each position i in the text of length N, a check in the Trie is initiated.
    • In the worst case, each check can go up to the length of the text (N), if all characters are part of words in the Trie.
    • This nested loop results in a complexity of O(N^2) in the worst case.

Combining both parts, we have the worst-case time complexity as O(W * L + N^2).

Space Complexity

The space complexity is also composed of two parts: the space required for the Trie and the space required for the answer list.

  1. Trie Space:

    • The Trie has at most 26 * W * L nodes (26 possible children for each character of each word). However, due to the sharing of common prefixes, the actual space will often be much less.
    • The worst-case space complexity for the Trie is O(26 * W * L).
  2. Answer List Space:

    • In the worst case, every substring of text starting at i could be a word in Trie, so this could be as large as O(N^2) since starting from each index there can be N-i potential words and there are N potential starting indices.
    • But we only store the indices, not the actual substrings, which are O(1) space each, so the more accurate complexity for the answer list is O(N^2).

The total space complexity is O(26 * W * L + N^2), which is O(W * L + N^2) because the 26 is a constant factor that doesn't change with the input size and can be omitted in big O notation.


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫