745. Prefix and Suffix Search

HardDesignTrieHash TableString

Problem Description

The problem requires the design of a specialized dictionary that allows for searching words by both a prefix and a suffix. Here's how the class should work:

The WordFilter class should be initialized with an array of words. These are the words that are going to be added to the dictionary.
The class should have a function f(pref, suff) that, given a prefix pref and a suffix suff, returns the highest index of any word that starts with pref and ends with suff. If there are no words that meet these criteria, the function should return -1.

This problem is about efficiently managing a set of strings so that they can be queried by their prefixes and suffixes. Implementing such a structure with fast lookup speeds can be challenging due to the complexity of string comparisons in a potentially large dataset.

Intuition

To solve this problem, we can take advantage of a [trie](https://en.wikipedia.org/wiki/Trie), which is a tree-like data structure that stores a dynamic set of strings in a way that makes it easy to retrieve all the keys with a common prefix. For this problem, we actually need two tries: one for prefixes and one for suffixes. Here's why this approach is effective:

By inserting each word into the prefix trie normally, and into the suffix trie in reversed order, we efficiently prepare the data for the specific queries we're about to receive.
Each node in the trie keeps a list of all the indexes of the words that pass through that node, which allows us to find all the words that have a specific prefix or suffix.
Upon querying, we look up the prefix in the prefix trie to get a list of word indexes with that prefix, and we look up the reversed suffix in the suffix trie to get a list of word indexes with that suffix.
Once we have both lists of indexes for a given prefix and suffix, we start from the end of each list (since we want the largest index) and compare the indexes. We step backwards through the lists, looking for a match. Because both lists are sorted, we are able to efficiently find the greatest index that appears in both lists, or decide that no such index exists if we reach the beginning of either list.

Given that we are dealing with words which are essentially sequences of characters, a trie allows us to narrow down our search space as we progress each character further into the word, making it a suitable data structure for both time and space efficiency in this scenario.

Learn more about Trie patterns.

Solution Approach

The solution approach involves the construction and utilization of two trie data structures:

Prefix Trie (self.p): This trie stores all the words of the dictionary by their prefix. When a word is inserted, each character of the word is traversed, creating a tree pathway if it doesn't already exist. At every node, the index of the word in the original dictionary is appended to the indexes list. This enables the trie to keep track of all the dictionary indexes that share the same prefix up to that point.
Suffix Trie (self.s): The suffix trie holds the words in reversed order to facilitate suffix searching. The process of insertion is similar to the prefix trie, except that the word is inserted backwards. Thus, when searching for a particular suffix, we search the trie with the reversed suffix.

Both tries are filled with each word's respective prefixes and reversed suffixes, along with their dictionary index. When performing a lookup with the f function:

We first search the prefix trie with the given prefix, pref. This results in a list of indexes, a, containing all the words that start with that prefix.
Similarly, we search the suffix trie with the reversed suffix, suff[::-1]. This yields a list of indexes, b, containing all the words that end with the specified suffix.
With both lists a and b, we now need to find the largest index that is common to both. Since the indexes are stored in ascending order within the trie, we compare a and b's values starting from their ends (the largest indexes).
We iterate backwards through both lists simultaneously comparing their elements:
- If the indexes at our current pointers match (a[i] == b[j]), we've found the highest index of a word that satisfies both the prefix and suffix requirements. We return this index.
- If a[i] is larger than b[j], we decrement i because we need to find a smaller index in a that might match an index in b.
- Conversely, if b[j] is larger than a[i], we decrement j to try and find a smaller index in b that matches an index in a.
The iteration continues until either list has been fully traversed (indicated by ~i or ~j becoming False when the pointers become negative). If we haven't found a matching index by this point, we return -1.

The clever use of tries allows us to optimize the lookup times for both prefixes and suffixes. Moreover, by keeping track of the indexes at each node of the trie, the solution enables an efficient way to determine the highest index where the prefix and suffix conditions overlap. This approach significantly reduces the complexity compared to brute force methods which would otherwise involve comparing each possible prefix and suffix pair in the dictionary.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's go through an example to illustrate the solution approach using two words: "apple" and "apply". These words will be used to initialize our WordFilter class.

Initializing WordFilter: We create a WordFilter object and pass the words ["apple", "apply"] to the constructor.
Building Prefix Trie (self.p): Here is a simplified visual of the prefix trie after inserting "apple" and "apply":
```
root
 | 
 a
 |
 p
/ \
```

p l | | l y |
e

As each word is inserted, each character forms a path in the trie if it doesn't exist already. The index of the word is stored at the end of the path for that word. The prefix trie would have the indices of the words as they appear in the list: index 0 for "apple" and index 1 for "apply".

3. **Building Suffix Trie (`self.s`):** The suffix trie looks different, as it is constructed using the reversed words. After inserting the reversed words "elppa" and "ylppa", it will look like:

root | a | p /
p l | | l e |
y

Like the prefix trie, the indices are kept at the end of each path - index 0 for "apple" and index 1 for "apply".

4. **Querying with `f` Function:** Suppose we want to find a word that has the prefix "ap" and the suffix "le". We call `f("ap", "le")`.

5. **Search the Prefix Trie:** We search for "ap" in the prefix trie (`self.p`). This gives us the list `[0, 1]` since both "apple" and "apply" have the prefix "ap".

6. **Search the Suffix Trie:** Now, we look up the reversed suffix "le" which becomes "el" in the suffix trie (`self.s`). The list obtained here is `[0]` because only "apple" has the suffix "le".

7. **Find Largest Common Index:** We look for the largest index that is common between the two lists:
- The list from the prefix trie is `[0, 1]`.
- The list from the suffix trie is `[0]`.

We compare from the ends, but since both lists contain the index `0` and that is the largest value in the suffix list, we return it. Hence, `f("ap", "le")` returns `0`, indicating "apple" is the word that matches these criteria with the highest index.

This approach quickly finds the highest index common to both the specified prefix and suffix without the need to compare each word in the dictionary. It efficiently narrows down the search space and leads to significant time savings, especially when dealing with a large number of queries or a large dictionary.

Solution Implementation

1class TrieNode:
2    def __init__(self):
3        # Initialize the children to be a list of 26 elements for each letter in the alphabet
4        self.children = [None] * 26
5        # Initialize an empty list to hold the indexes at which the words appear
6        self.indexes = []
7
8    def insert(self, word, index):
9        node = self
10        for char in word:
11            char_index = ord(char) - ord('a')  # Find the index of the character
12            if node.children[char_index] is None:
13                node.children[char_index] = TrieNode()  # Create a TrieNode if it doesn't exist
14            node = node.children[char_index]
15            node.indexes.append(index)  # Appends index to indexes list at each node along the path
16
17    def search(self, prefix):
18        node = self
19        for char in prefix:
20            char_index = ord(char) - ord('a')  # Convert char to index
21            if node.children[char_index] is None:
22                return []  # Return empty list if character path doesn’t exist
23            node = node.children[char_index]
24        return node.indexes  # Return the indexes list if the prefix is found
25
26
27class WordFilter:
28    def __init__(self, words):
29        self.prefix_trie = TrieNode()  # Trie to store the prefixes
30        self.suffix_trie = TrieNode()  # Trie to store the reverse of the suffixes for easier matching
31        for index, word in enumerate(words):
32            self.prefix_trie.insert(word, index)
33            # Insert reversed word into suffix trie to handle suffix searches
34            self.suffix_trie.insert(word[::-1], index)
35
36    def f(self, prefix, suffix):
37        prefix_indexes = self.prefix_trie.search(prefix)
38        suffix_indexes = self.suffix_trie.search(suffix[::-1])  # Reverse suffix to match with our suffix trie
39        if not prefix_indexes or not suffix_indexes:
40            return -1
41      
42        # Use two pointers to find the largest index which is common in both prefix and suffix indexes lists
43        i, j = len(prefix_indexes) - 1, len(suffix_indexes) - 1
44        while i >= 0 and j >= 0:
45            if prefix_indexes[i] == suffix_indexes[j]:
46                return prefix_indexes[i]
47            if prefix_indexes[i] > suffix_indexes[j]:
48                i -= 1
49            else:
50                j -= 1
51        return -1  # Return -1 if no common index is found
52
53
54# The WordFilter object will be instantiated and called as such:
55# word_filter = WordFilter(words)
56# result = word_filter.f(prefix, suffix)
57

1import java.util.ArrayList;
2import java.util.Collections;
3import java.util.List;
4
5// Trie class for storing characters and their associated indexes for easy string search.
6class Trie {
7    private Trie[] children = new Trie[26];
8    private List<Integer> indexes = new ArrayList<>();
9
10    // Insert a word into the trie and map it with its index.
11    void insert(String word, int index) {
12        Trie node = this;
13        for (char ch : word.toCharArray()) {
14            int charIndex = ch - 'a';
15            if (node.children[charIndex] == null) {
16                node.children[charIndex] = new Trie();
17            }
18            node = node.children[charIndex];
19            node.indexes.add(index);
20        }
21    }
22
23    // Search for a prefix in the trie and return a list of mapped indexes.
24    List<Integer> search(String prefix) {
25        Trie node = this;
26        for (char ch : prefix.toCharArray()) {
27            int charIndex = ch - 'a';
28            if (node.children[charIndex] == null) {
29                return Collections.emptyList();
30            }
31            node = node.children[charIndex];
32        }
33        return node.indexes;
34    }
35}
36
37// WordFilter class for prefix and suffix search functionality.
38class WordFilter {
39    private Trie prefixTrie = new Trie();
40    private Trie suffixTrie = new Trie();
41
42    // Constructor that takes an array of words and builds prefix and suffix tries.
43    public WordFilter(String[] words) {
44        for (int i = 0; i < words.length; ++i) {
45            String word = words[i];
46            prefixTrie.insert(word, i);
47            // Reverse the word for suffix operations and insert into the suffix trie.
48            suffixTrie.insert(new StringBuilder(word).reverse().toString(), i);
49        }
50    }
51
52    // Method to find the max index of a word that has the given prefix and suffix.
53    public int f(String prefix, String suffix) {
54        // Reverse suffix as we stored the words in reversed form in the suffix trie.
55        suffix = new StringBuilder(suffix).reverse().toString();
56        // Get the list of indexes for both prefix and suffix
57        List<Integer> prefixIndexes = prefixTrie.search(prefix);
58        List<Integer> suffixIndexes = suffixTrie.search(suffix);
59        // Return -1 if either list is empty.
60        if (prefixIndexes.isEmpty() || suffixIndexes.isEmpty()) {
61            return -1;
62        }
63        // Use two pointers to find the common max index in both lists.
64        int i = prefixIndexes.size() - 1;
65        int j = suffixIndexes.size() - 1;
66        while (i >= 0 && j >= 0) {
67            int prefixIndex = prefixIndexes.get(i);
68            int suffixIndex = suffixIndexes.get(j);
69            if (prefixIndex == suffixIndex) {
70                // If indexes match, return the index as it has the required prefix and suffix.
71                return prefixIndex;
72            }
73            // Move pointers based on which pointer has the larger index.
74            if (prefixIndex > suffixIndex) {
75                --i;
76            } else {
77                --j;
78            }
79        }
80        // If no common index is found return -1.
81        return -1;
82    }
83}
84
85/**
86 * The following comments would be used to explain how the class should be used:
87 * 
88 * // Instantiate a WordFilter object.
89 * WordFilter obj = new WordFilter(words);
90 *
91 * // Find the maximum index of a word with the given prefix and suffix.
92 * int maxIndex = obj.f(pref, suff);
93 */
94

1#include <vector>
2#include <string>
3#include <unordered_map>
4using namespace std;
5
6class WordFilter {
7private:
8    // Dictionary to store the mapping between the concatenated prefix and suffix to the index of the word
9    unordered_map<string, int> prefixSuffixMap;
10
11public:
12    // Constructor that precomputes the prefix and suffix combinations
13    WordFilter(vector<string>& words) {
14        // Loop over each word to compute prefix and suffix combinations
15        for (int wordIndex = 0; wordIndex < words.size(); ++wordIndex) {
16            string word = words[wordIndex];
17            int wordLength = word.size();
18          
19            // Create all possible prefix and suffix combinations for current word
20            for (int prefixLength = 0; prefixLength <= wordLength; ++prefixLength) {
21                string prefix = word.substr(0, prefixLength);
22                for (int suffixStart = 0; suffixStart <= wordLength; ++suffixStart) {
23                    string suffix = word.substr(suffixStart, wordLength - suffixStart);
24                    // Store combination in the dictionary, using dot as a separator
25                    prefixSuffixMap[prefix + "." + suffix] = wordIndex;
26                }
27            }
28        }
29    }
30
31    // Function to find the index of the word with given prefix and suffix
32    int f(string prefix, string suffix) {
33        // Create the combined key as was done during precomputation
34        string key = prefix + "." + suffix;
35        // If the combination exists in the map, return the index
36        if (prefixSuffixMap.count(key)) {
37            return prefixSuffixMap[key];
38        }
39        // If the combination is not found, return -1
40        return -1;
41    }
42};
43
44/**
45 * Your WordFilter object will be instantiated and called as such:
46 * WordFilter* obj = new WordFilter(words);
47 * int param_1 = obj->f(pref, suff);
48 */
49

1// Store the dictionary globally to map the concatenated prefix and suffix to the word's index
2const prefixSuffixMap: Record<string, number> = {};
3
4// Function to precompute the prefix and suffix combinations for an array of words
5function initializeWordFilter(words: string[]): void {
6    // Iterate over each word to compute prefix and suffix combinations
7    words.forEach((word, wordIndex) => {
8        const wordLength = word.length;
9        // Generate all possible prefix and suffix combinations for the current word
10        for (let prefixLength = 0; prefixLength <= wordLength; ++prefixLength) {
11            const prefix = word.substring(0, prefixLength);
12            for (let suffixStart = 0; suffixStart <= wordLength; ++suffixStart) {
13                const suffix = word.substring(suffixStart);
14                // Store the combination in the global dictionary, using a dot as a separator
15                prefixSuffixMap[`${prefix}.${suffix}`] = wordIndex;
16            }
17        }
18    });
19}
20
21// Function to find the index of the word with a given prefix and suffix
22function f(prefix: string, suffix: string): number {
23    // Create the combined key as was done during precomputation
24    const key = `${prefix}.${suffix}`;
25    // Return the index if the combination exists in the map, otherwise return -1
26    return prefixSuffixMap.hasOwnProperty(key) ? prefixSuffixMap[key] : -1;
27}
28```
29
30Usage:
31
32```typescript
33// Initialize with a list of words
34initializeWordFilter(["apple", "banana", "cherry"]);
35
36// Find the index of the word with a given prefix and suffix
37const index = f("a", "e");  // Returns the index of "apple" if present
38console.log(index); // Output should be the index of the word "apple" or -1 if not found
39

Time and Space Complexity

Time Complexity:

Trie Construction: - O(m * n), where m is the average length of the words, and n is the total number of words. We process each word letter by letter and at each letter of each word, we may potentially create a new Trie node.
WordFilter Constructor: - O(m * n). The constructor itself calls the insert method of the Trie class for each word and its reverse once, which is 2 * m * n.
insert method: - O(m). We need to traverse or create Trie nodes for each character in the word and push the index into indexes array which can take up to O(m) operations where m is the length of the word.
search method: - O(m + k), where k is the number of entries in the index list. For each character in the prefix or suffix, we traverse the Trie, and then we traverse the list of indexes at the final Trie node.
f function: It depends on how many indexes are there to process. If the lengths of both lists are a_i and b_i for prefix and suffix respectively, then the complexity is - O(a_i + b_i), as we are iterating through both lists at most once. In the worst case scenario, it could be - O(n), where n is the total number of words since every single word could potentially match the prefix or suffix.

Space Complexity:

Trie Storage: - O(26 * m * n), each node could potentially have up to 26 children and we could have a unique path for every single letter in every word. However, due to common prefixes, the space complexity might be less in practice.
indexes Arrays Storage: - O(n * m), each index could be stored in the list of potentially every node in the path of a word in the worst case, where m is the average length of the words and n is the number of words.

Overall, for the WordFilter data structure, the space complexity can still be approximated as - O(26 * m * n) due to the Trie nodes being the dominating factor.

Learn more about how to find time and space complexity quickly using problem constraints.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which of the tree traversal order can be used to obtain elements in a binary search tree in sorted order?

Pre-order traversal

Post-order traversal

In-order traversal

Level-order traversal

I don't know