642. Design Search Autocomplete System


Problem Description

The problem requires building an autocomplete system much like those found in modern search engines. When a user types a sentence, the system should suggest the top 3 historical sentences that share the same prefix as the input. These suggestions are based on two key factors:

  1. The frequency of past inputs, reflecting how often each sentence has been previously typed to gauge its 'hotness' (popularity).
  2. The lexicographical order, which acts as a tiebreaker when sentences have the same frequency.

The system should update its suggestions with each new character typed, except when the special character '#' is entered, indicating the end of the sentence. At that point, the input should be recorded in the system to influence future suggestions, and the output should be an empty array.

To manage this process, we'll need to implement two main components:

  • AutocompleteSystem(sentences, times): Construct an autocomplete system with pre-recorded sentences and their corresponding frequencies.
  • input(c): Process each character input by the user. If the character is '#', update the system with the new sentence. Otherwise, find and return the top 3 suggestions.

Intuition

To efficiently manage the input sentences and find the top suggestions, we can implement a Trie Data structure (prefix tree). A Trie is particularly well-suited for this task since it can quickly lookup suggestions based on input prefixes.

For the implementation, we'll follow these steps:

  1. Initialize the Trie with preloaded sentences and their frequencies. For this, implement an insert function in the Trie that will store the sentence frequencies as well as the sentences themselves.

  2. For each input character, navigate through the Trie to find the node corresponding to the prefix formed by the typed characters up to that point.

  3. From that node, perform a depth-first search (DFS) traversing to gather all possible sentence suggestions from that prefix. Collect their frequencies and sentences themselves.

  4. After gathering all possible suggestions, sort them based on the two criteria mentioned (frequency and lexicographical order).

  5. Return the top 3 suggestions based on the sorting. If fewer than 3 suggestions are available, return all that exist.

  6. If the input character is '#', ensure to insert the completed sentence into the Trie, which can then influence future suggestions.

By structuring the Trie nodes to hold values representing sentence frequency (v) and the sentence itself (w), and by leveraging sorting on the collected results, we accomplish the desired autocomplete functionality.

The key to the solution is efficient Trie traversal and effective use of sorting to identify the hottest sentences quickly. This approach minimizes the computation needed for each character input and ensures that each new entry dynamically updates the Trie for future suggestions.

Solution Approach

The core of the autocomplete system is built on a Trie data structure, which allows for efficient insertion and searching of prefixes for the given sentences. The goal of the Trie is to provide a means of retrieving all sentences starting with a particular prefix that the user has entered up until that moment. Here is an outline of the implementation pattern used:

  1. Trie Construction: The Trie node is initialized with 27 child nodes, one for each letter of the alphabet plus a space character. The Trie node also contains two properties: v to hold the frequency of the sentence, and w to store the sentence itself.

  2. Insert Function in Trie:

    • The insert method adds a sentence to the Trie. As it moves through each character in the sentence, it determines the appropriate child node, creating a new Trie node if necessary.
    • The index of the child node is determined by converting the character to its alphabetical index, with a special case for the space character, which is mapped to 26.
    • When the end of the sentence is reached, the node's frequency or 'hotness' value v is incremented by the provided frequency t, and the sentence w is stored in the node.
  3. Search Function in Trie:

    • The search method navigates through the Trie based on the input prefix. If a character isn't found, the method returns None, indicating no suggestions available for this prefix.
  4. Depth-First Search (DFS) for Suggestions:

    • The dfs auxiliary function recursively traverses all children nodes starting from the node that matches the current input prefix to find all completed sentences stemming from it.
    • Each completed sentence and its frequency are collected in a results list res.
  5. Input Handling:

    • When input(c) is called with a character c, it appends the character to a temporary list t representing the current partial sentence being entered.
    • If c is '#', the user has finished typing, and this completed sentence is inserted into the Trie with a frequency increment of 1. An empty list is returned, and the temporary list t is reset.
    • If c is not '#', the current prefix is used to search the Trie, and then dfs is called to collect all possible sentences from this prefix.
  6. Sorting and Obtaining Top 3 Results:

    • After the DFS is complete, the list of collected suggestions is sorted first by frequency in decreasing order and then by lexical order.
    • The top 3 suggestions are extracted using array slicing (res[:3]), ensuring that if fewer suggestions are available, all are returned.

By combining the Trie structure for storing sentences efficiently and carefully designed DFS traversal with precise sorting logic, the implementation effectively provides the needed autocomplete suggestions. It's important to note that the Trie data structure is selected due to its ability to greatly optimize the lookup time for suggested sentences compared to simpler list-based or brute-force search approaches.

Example Walkthrough

Let's illustrate the solution approach with a small example. Imagine we have the following historical sentences with their corresponding frequencies:

  • "i love you", frequency = 5
  • "island", frequency = 3
  • "i love leetcode", frequency = 2
  • "ironman", frequency = 1

We initialize the AutocompleteSystem with these sentences. Now, let's consider a user starts typing the string "i".

  1. Trie Construction: Our Trie has been initialized with the given sentences and their frequencies. The structure will hold nodes for "i love you", "island", "i love leetcode", and "ironman" under nodes corresponding to "i → l" and "i → s" and "i → r".

  2. Processing Input 'i': The Trie is traversed to the node corresponding to "i". Since there is no sentence completion yet, we move to the next step.

  3. Depth-First Search (DFS) for Suggestions: From the "i" node, we do a DFS to collect all potential sentences. The system finds "i love you", "i love leetcode", "island", and "ironman".

  4. Sorting Candidates: We sort these sentences by their frequencies and lexicographically:

    • "i love you" – frequency 5
    • "i love leetcode" – frequency 2
    • "island" – frequency 3
    • "ironman" – frequency 1

    Sorted results: ["i love you", "island", "i love leetcode", "ironman"]

  5. Return Top 3 Suggestions:

    • After sorting, the top 3 are "i love you", "island", and "i love leetcode". So those are returned as suggestions to the user.
  6. User Completes the Sentence: Suppose the user types "i love leetcode#" to complete the sentence.

  7. Updating Trie: The input "i love leetcode" is now added to the Trie, updating its frequency from 2 to 3 because the user has entered it once more.

Now, if the user starts a new input with "i", the system will perform the same steps and will suggest: "i love you" (frequency 5), "i love leetcode" (updated frequency 3), and "island" (frequency 3). Since "i love leetcode" and "island" have the same frequency, lexicographic sorting decides the order, hence "i love leetcode" comes before "island".

Python Solution

1class TrieNode:
2    def __init__(self):
3        # Initialize 27 children for each letter in the alphabet plus the space character
4        self.children = [None] * 27
5        self.value = 0  # Frequency of word ending at this node
6        self.word = ''  # The word ending at this node, if any
7
8    def insert(self, word, frequency):
9        # Inserts a word into the trie along with its frequency
10        node = self
11        for char in word:
12            index = 26 if char == ' ' else ord(char) - ord('a')  # Mapping 'a'-'z' to 0-25, ' ' to 26
13            if node.children[index] is None:
14                node.children[index] = TrieNode()
15            node = node.children[index]
16        node.value += frequency
17        node.word = word
18
19    def search(self, prefix):
20        # Searches for a node in the trie that corresponds to the given prefix
21        node = self
22        for char in prefix:
23            index = 26 if char == ' ' else ord(char) - ord('a')
24            if node.children[index] is None:
25                return None
26            node = node.children[index]
27        return node
28
29class AutocompleteSystem:
30    def __init__(self, sentences, times):
31        self.trie = TrieNode()
32        for sentence, frequency in zip(sentences, times):
33            self.trie.insert(sentence, frequency)
34        self.typed_characters = []  # Keeps track of characters typed by the user
35
36    def input(self, character):
37        # Returns autocomplete suggestions based on the characters inputted so far
38
39        def dfs(node):
40            # Perform a depth-first search to find all words with their frequencies
41            if node is None:
42                return
43            if node.value:
44                results.append((node.value, node.word))
45            for next_node in node.children:
46                dfs(next_node)
47
48        if character == '#':
49            # The user finished typing a word; update the trie and reset the input
50            current_sentence = ''.join(self.typed_characters)
51            self.trie.insert(current_sentence, 1)  # Increment the frequency of the word
52            self.typed_characters = []
53            return []
54
55        results = []
56        self.typed_characters.append(character)
57        current_prefix = ''.join(self.typed_characters)
58        node = self.trie.search(current_prefix)
59        # If the prefix doesn't exist in the trie, return an empty list
60        if node is None:
61            return []
62        # Otherwise, find and return autocomplete suggestions
63        dfs(node)
64        # Sort the results in descending order of frequency, and alphabetically for ties
65        results.sort(key=lambda x: (-x[0], x[1]))
66        # Return the top 3 suggestions, or fewer if there aren't enough
67        return [entry[1] for entry in results[:3]]
68
69# The AutocompleteSystem class can be instantiated and used as follows:
70# autocomplete_system = AutocompleteSystem(sentences, times)
71# suggestions = autocomplete_system.input(character)
72

Java Solution

1import java.util.ArrayList;
2import java.util.List;
3import java.util.PriorityQueue;
4
5class Trie {
6    Trie[] children = new Trie[27]; // 26 English letters and 1 space
7    int frequency;
8    String word = "";
9
10    // Inserts a word into the trie with its associated frequency
11    void insert(String word, int frequency) {
12        Trie node = this;
13        for (char ch : word.toCharArray()) {
14            int index = ch == ' ' ? 26 : ch - 'a'; // Convert char to index, space represented by 26
15            if (node.children[index] == null) {
16                node.children[index] = new Trie();
17            }
18            node = node.children[index];
19        }
20        node.frequency += frequency; // Update the frequency of the node (word-end)
21        node.word = word;
22    }
23
24    // Searches for a node with a given prefix in the trie
25    Trie search(String prefix) {
26        Trie node = this;
27        for (char ch : prefix.toCharArray()) {
28            int index = ch == ' ' ? 26 : ch - 'a';
29            if (node.children[index] == null) {
30                return null;
31            }
32            node = node.children[index];
33        }
34        return node;
35    }
36}
37
38class AutocompleteSystem {
39    private Trie rootTrie = new Trie(); // Trie root
40    private StringBuilder currentInput = new StringBuilder();
41
42    // Constructor which takes an array of sentences and their corresponding
43    // frequencies to build the trie
44    public AutocompleteSystem(String[] sentences, int[] times) {
45        for (int i = 0; i < sentences.length; i++) {
46            rootTrie.insert(sentences[i], times[i]);
47        }
48    }
49
50    // Processes input character by character to autocomplete
51    public List<String> input(char inputChar) {
52        List<String> autocompleteResult = new ArrayList<>();
53        if (inputChar == '#') {
54            rootTrie.insert(currentInput.toString(), 1); // Insert the current input into the trie
55            currentInput = new StringBuilder(); // Reset builder for next input
56            return autocompleteResult; // Autocomplete list is empty for new input
57        }
58        currentInput.append(inputChar); // Adding new character to current input
59        Trie node = rootTrie.search(currentInput.toString());
60        if (node == null) {
61            return autocompleteResult; // If no node found, return empty list
62        }
63        PriorityQueue<Trie> minHeap
64            = new PriorityQueue<>((a, b) -> a.frequency == b.frequency ? b.word.compareTo(a.word) : a.frequency - b.frequency);
65        depthFirstSearch(node, minHeap);
66        while (!minHeap.isEmpty()) {
67            autocompleteResult.add(0, minHeap.poll().word);
68        }
69        return autocompleteResult; // Return the list of autocompleted words
70    }
71
72    // DFS to find all words with the same prefix, use a priority queue to sort them based on frequency and lexicographical order
73    private void depthFirstSearch(Trie node, PriorityQueue<Trie> minHeap) {
74        if (node == null || minHeap.size() > 3 && node.frequency < minHeap.peek().frequency) {
75            return; // Early return condition to prune the search
76        }
77        if (node.frequency > 0) {
78            minHeap.offer(node); // If it's a word, offer to the min-heap
79            if (minHeap.size() > 3) {
80                minHeap.poll(); // Keep the size of min-heap no larger than 3
81            }
82        }
83        for (Trie childNode : node.children) {
84            depthFirstSearch(childNode, minHeap); // Recursively search children
85        }
86    }
87}
88
89/**
90 * Your AutocompleteSystem object will be instantiated and called as such:
91 * AutocompleteSystem obj = new AutocompleteSystem(sentences, times);
92 * List<String> param_1 = obj.input(c);
93 */
94

C++ Solution

1#include <vector>
2#include <string>
3#include <unordered_map>
4#include <queue>
5
6using namespace std;
7
8// Trie node structure
9class Trie {
10public:
11    unordered_map<char, Trie*> children;
12    int frequency;
13    string word;
14
15    // Constructor
16    Trie() : frequency(0), word("") {}
17
18    // Destructor
19    ~Trie() {
20        for(auto child : children) {
21            delete child.second;
22        }
23    }
24
25    // Inserts a word into the trie with its associated frequency
26    void insert(string word, int frequency) {
27        Trie* node = this;
28        for (char ch : word) {
29            if (node->children.find(ch) == node->children.end())
30                node->children[ch] = new Trie();
31          
32            node = node->children[ch];
33        }
34        node->frequency += frequency; // Update the frequency of the node (word-end)
35        node->word = word;
36    }
37
38    // Searches for a node with a given prefix in the trie
39    Trie* search(string prefix) {
40        Trie* node = this;
41        for (char ch : prefix) {
42            if (node->children.find(ch) == node->children.end())
43                return nullptr;
44          
45            node = node->children[ch];
46        }
47        return node;
48    }
49};
50
51// AutocompleteSystem class to provide autocomplete functionality
52class AutocompleteSystem {
53private:
54    Trie* root Trie;
55    string currentInput;
56
57    // Comparator for priority queue
58    struct TrieComparator {
59        bool operator()(Trie* a, Trie* b) {
60            if(a->frequency == b->frequency) return a->word > b->word;
61            return a->frequency < b->frequency;
62        }
63    };
64
65public:
66    // Constructor which takes a vector of sentences and their corresponding frequencies to build the trie
67    AutocompleteSystem(vector<string>& sentences, vector<int>& times) {
68        rootTrie = new Trie();
69        for (int i = 0; i < sentences.size(); i++) {
70            rootTrie->insert(sentences[i], times[i]);
71        }
72    }
73
74    // Destructor to safely delete the Trie
75    ~AutocompleteSystem() {
76        delete rootTrie;
77    }
78
79    // Processes input character by character to autocomplete
80    vector<string> input(char inputChar) {
81        if (inputChar == '#') {
82            rootTrie->insert(currentInput, 1); // Insert the current input into the trie
83            currentInput.clear(); // Reset builder for next input
84            return vector<string>(); // Return empty vector
85        }
86
87        currentInput += inputChar; // Adding new character to current input
88        Trie* node = rootTrie->search(currentInput);
89        if (!node) {
90            return vector<string>(); // If no node found, return empty list
91        }
92
93        priority_queue<Trie*, vector<Trie*>, TrieComparator> minHeap;
94        depthFirstSearch(node, minHeap);
95
96        vector<string> autocompleteResult;
97        while (!minHeap.empty() && autocompleteResult.size() < 3) {
98            autocompleteResult.insert(autocompleteResult.begin(), minHeap.top()->word);
99            minHeap.pop();
100        }
101        return autocompleteResult; // Return the list of autocompleted words
102    }
103
104private:
105    // DFS to find all words with the same prefix, use a priority queue to sort them based on frequency and lexical order
106    void depthFirstSearch(Trie* node, priority_queue<Trie*, vector<Trie*>, TrieComparator>& minHeap) {
107        if (!node) return;
108        if (node->frequency > 0) {
109            minHeap.push(node); // If it's a word, add it to the min-heap
110            if (minHeap.size() > 3) minHeap.pop(); // If more than 3 elements, remove the one with the lowest frequency
111        }
112        for (auto pair : node->children) {
113            depthFirstSearch(pair.second, minHeap); // Recursively search children
114        }
115    }
116};
117
118// Usage example (not part of the class definition)
119/*
120int main() {
121    vector<string> sentences = {"i love you", "island", "iroman", "i love leetcode"};
122    vector<int> times = {5, 3, 2, 2};
123    AutocompleteSystem obj(sentences, times);
124    vector<string> param_1 = obj.input('i');
125    // Continue processing input characters...
126    return 0;
127}
128*/
129

Typescript Solution

1// Define the structure of a Trie node
2type TrieNode = {
3  children: TrieNode[];
4  frequency: number;
5  word: string;
6};
7
8// Initialize the root Trie node with 27 children for 26 letters and one space
9let rootTrie: TrieNode = createTrieNode();
10
11// Function to create a new Trie node
12function createTrieNode(): TrieNode {
13  return {
14    children: new Array(27).fill(null), // Initialize an array of 27 null children
15    frequency: 0,
16    word: ""
17  };
18}
19
20// Inserts a word into the trie with its associated frequency
21function insert(word: string, frequency: number): void {
22  let node = rootTrie;
23  for (let ch of word) {
24    const index = ch === ' ' ? 26 : ch.charCodeAt(0) - 'a'.charCodeAt(0);
25    if (node.children[index] === null) {
26      node.children[index] = createTrieNode();
27    }
28    node = node.children[index];
29  }
30  node.frequency += frequency; 
31  node.word = word;
32}
33
34// Searches for a node with a given prefix in the trie
35function search(prefix: string): TrieNode | null {
36  let node = rootTrie;
37  for (let ch of prefix) {
38    const index = ch === ' ' ? 26 : ch.charCodeAt(0) - 'a'.charCodeAt(0);
39    if (node.children[index] === null) {
40      return null;
41    }
42    node = node.children[index];
43  }
44  return node;
45}
46
47// Current input buffer
48let currentInput: string[] = [];
49
50// Autocomplete system constructor mimicking
51function autocompleteSystem(sentences: string[], times: number[]): void {
52  for (let i = 0; i < sentences.length; i++) {
53    insert(sentences[i], times[i]);
54  }
55}
56
57// Processes input character by character to autocomplete
58function input(inputChar: string): string[] {
59  let autocompleteResult: string[] = [];
60  if (inputChar === '#') {
61    insert(currentInput.join(''), 1); // Construct the current input string and insert it into the trie
62    currentInput = []; // Reset current input for next input
63    return autocompleteResult; // Autocomplete list is empty for new input
64  }
65  currentInput.push(inputChar); // Add new character to current input
66  const node = search(currentInput.join(''));
67  if (node === null) {
68    return autocompleteResult; // If no node found, return empty list
69  }
70
71  // Create a min-heap priority queue to store TrieNode elements
72  const minHeap: TrieNode[] = [];
73  depthFirstSearch(node, minHeap);
74  while (minHeap.length) {
75    const nextItem = minHeap.shift();
76    if (nextItem) autocompleteResult.push(nextItem.word);
77  }
78  return autocompleteResult; // Return the list of autocompleted words
79}
80
81// DFS to find all words with the same prefix
82function depthFirstSearch(node: TrieNode, minHeap: TrieNode[]): void {
83  if (!node || (minHeap.length > 2 && node.frequency < minHeap[0].frequency)) {
84    return; // Stop searching if conditions are met
85  }
86  if (node.frequency > 0) {
87    minHeap.push(node); // If the node is a word, push it to the min-heap
88    minHeap.sort((a, b) => b.frequency - a.frequency || a.word.localeCompare(b.word)); // Sort by frequency and lex order
89    if (minHeap.length > 3) {
90      minHeap.shift(); // Remove the lowest frequency node if the heap size is greater than 3
91    }
92  }
93  for (let childNode of node.children) {
94    if (childNode) {
95      depthFirstSearch(childNode, minHeap); // Recursively search all children
96    }
97  }
98}
99
100/**
101 * Example usage:
102 * autocompleteSystem(["i love you", "island", "ironman", "i love leetcode"], [5, 3, 2, 2]);
103 * console.log(input('i')); // Expected autocomplete results for prefix 'i'
104 * console.log(input('#')); // Completing the input and expecting empty result for new input
105 */
106

Time and Space Complexity

Trie Insertion - insert method

  • Time Complexity: The time complexity of inserting a word w with length n into the trie is O(n). This is because we have to visit each character in the word and check/create a node in the trie for each character.

  • Space Complexity: The space complexity for inserting a word is O(n). In the worst-case scenario, where no prefix of the word is shared with previously inserted words, we create a new node for each character.

Trie Search - search method

  • Time Complexity: The time complexity for searching a prefix pref of length m is O(m), as we need to visit each character's node in the trie to check if the prefix exists.

  • Space Complexity: The space complexity is O(1) for searching, since we're only iterating through existing nodes without using additional space proportional to the size of the input.

Autocomplete Input - input method

  • Time Complexity: The input function has several components:

    1. Appending a character to the current input, which is O(1).

    2. Searching for the current input in the trie, which is O(m) where m is the length of the current input string.

    3. Depth-First Search (DFS) to find all words with the given prefix, which can take up to O(p + q * k) where p is the total number of trie-nodes, q is the number of nodes that contains words, and k is the average length of the words stored.

    4. Sorting the results, if there are x completions found by DFS, the complexity of sorting is O(x * log(x)).

    Adding all of that together and considering the worst-case scenario, the time complexity for the input method is O(m + p + q * k + x * log(x)).

  • Space Complexity: The space complexity here is dominated by the DFS operation, which stores all the potential completions. In the worst case, it stores O(q) completions, and the space required for the sorting operation. Therefore, the space complexity is O(q * k), where k is the average length of the words stored.

Note: List[str] and times: List[int] are Python-specific type hints. The time complexity does not change, but for clarity and context, they indicate the types of the arguments sentences and times respectively. The sentences is a list of strings, and times is a list of integers.


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫