1858. Longest Word With All Prefixes


Problem Description

The problem requires us to analyze an array of strings called words and determine which string qualifies as the longest such that every substring that can be formed by removing one or more characters from the end (a prefix) is also an element of the array words. A valid string must have all its prefixes in the array to be considered. For example, if words contains ["a", "banana", "ban", "nana", "ba", "banan"], then banana is a valid answer since all of its prefixes ["banan", "ban", "ba", "b"] are present in words (assuming "b" is in the list as well).

If multiple strings meet the condition and have the same length, we choose the one that is smallest lexicographically. Lexicographic order is similar to how words are listed in a dictionary, which means "apple" will come before "banana." If no such string exists, we return an empty string "".

Intuition

To solve this problem, we introduce a data structure called a Trie (pronounced "try") to efficiently manage the set of strings and to easily find if a string's prefixes exist in the array.

Here's a step by step intuition behind using Trie and the solution approach:

  1. Trie Data Structure: A Trie is a tree-like data structure where each node represents a letter and each path from root to a leaf node represents a word. It is beneficial for this problem because its structure allows us to quickly test if a string's prefixes are in our set of strings.

  2. Building Trie: First, we insert all the words from the words array into a Trie. As we insert each word, we also mark the end of the word in the Trie. This is vital so we can later differentiate when a path corresponds to an actual word in the array.

  3. Searching the Trie: After inserting all the words, we use the Trie to check if each word's prefixes exist in the array. We start from the first character and navigate the Trie, checking at each step if we have arrived at a node which represents the end of a word.

  4. Comparing Words and Lengths: We maintain a variable, say ans, that keeps track of the current longest string with all prefixes present. For each word, if it is longer than the current answer, or if it is of the same length but lexicographically smaller, and all its prefixes are present (verified using our Trie), we update ans.

By the end of the iteration, ans contains the longest string with all prefixes present in words, and in case of ties, it is the lexicographically smallest one due to our comparison checks.

Solution Approach

The solution involves implementing a Trie data structure and using it to efficiently check for the existence of each word's prefixes. Let's walk through the step-by-step implementation:

  1. Defining the Trie: We define a Trie class with an array of 26 Trie nodes (one for each letter of the alphabet) and a Boolean flag is_end that will help us to identify the end of a word.

  2. Inserting Words into the Trie:

    • For each word w in the input words array, we insert it into the Trie.
    • Each character from the word w is used to traverse the Trie nodes. If the corresponding node for the current character does not exist, we create a new node.
    • After all characters are inserted, we set the is_end flag of the last node to True, indicating the end of a word.
  3. Searching for Words with All Prefixes in the Trie:

    • Now that we have our Trie, we iterate over each word w in words and check if it and all its prefixes are present in the Trie.
    • For each character in w, we transition from one node to the next based on the character index. If we ever find a node that isn't marked as is_end during the traversal, this means that the current prefix is not a word in words, and we stop searching.
    • If all prefixes of w are found (i.e., everywhere we stopped was marked as is_end), then w is a candidate for being our answer.
  4. Updating the Answer:

    • We keep track of the longest word ans with all prefixes inside the Trie seen so far.
    • If we find a word w that is either longer than ans or of the same length but lexicographically smaller, and we have confirmed it has all its prefixes in words, ans is updated to w.
  5. Returning the Result:

    • After all words have been processed, the variable ans contains the longest word with all its prefixes present within the words array, and it is returned as the result.

The reason behind the efficiency of this approach is that the Trie allows us to rapidly check the existence of prefixes without having to repeatedly search through the entire words array for each prefix. This greatly reduces the overall number of comparisons needed, particularly for longer words with many prefixes.

Example Walkthrough

Let's take a small example to illustrate the solution approach:

Suppose our input words array is: ["a", "app", "appl", "ap", "apply", "apple"]

Following the solution approach, we perform the following steps:

  1. Defining the Trie: Our Trie node class is defined, where each node holds an array of pointers to the next character (total 26, for each letter) and a Boolean flag is_end.

  2. Inserting Words into the Trie:

    • We insert each word from words into our Trie:
      • "a": A single node is created with is_end set to True.
      • "app": We create nodes for 'p' and 'p', and is_end is marked at the last 'p'.
      • "appl": Extending from the last 'p' of "app", we add a node for 'l', and its is_end is set to True.
      • "ap": We need not add new nodes since 'ap' is already a prefix of "app", we mark 'p' as is_end.
      • "apply": We extend from the last 'l' of "appl" to add 'y' and its is_end is set to True.
      • "apple": Similarly, we extend from the 'p' of "appl" to add 'e' and mark is_end.

Our Trie would now hold all our strings with terminal nodes indicating the end of valid words.

  1. Searching for Words with All Prefixes in the Trie:

    • We examine each word in words against our Trie.
    • For "apply": We check every character 'a', 'p', 'p', 'l', 'y' along the path and find each has is_end set to True at the corresponding node. Hence, every prefix is a valid word.
    • We do the same with each other word, like "apple".
  2. Updating the Answer:

    • Let's assume our ans value starts as an empty string.
    • We look at "apply" and find it has all prefixes in the Trie; it's longer than ans, so ans becomes "apply".
    • We then look at "apple" and, since it has all prefixes and is lexicographically smaller than "apply" but of the same length, ans is updated to "apple".
  3. Returning the Result:

    • After completing the checks for all words, we find that ans equals "apple", the longest string with all its prefixes present in the array, which is also smallest lexicographically among the longest ones.

By using a Trie, we efficiently checked the existence of all prefixes for each word while updating our longest valid word, resulting in "apple" as our final answer for the input array.

Python Solution

1class Trie:
2    def __init__(self):
3        # Initialize the Trie node with 26 children for each lowercase letter and a flag to indicate if it's the end of a word.
4        self.children = [None] * 26
5        self.is_end_of_word = False
6
7    def insert(self, word):
8        # Insert a word into the Trie.
9        node = self
10        for character in word:
11            index = ord(character) - ord('a')  # Convert character to index (0-25).
12            if node.children[index] is None:   
13                node.children[index] = Trie()   # If the child doesn't exist, create a new Trie node.
14            node = node.children[index]
15        node.is_end_of_word = True  # Mark the end of a word.
16  
17    def search(self, word):
18        # Search for a word in the Trie and return True only if the word exists and is a valid sequence of 'is_end_of_word'.
19        node = self
20        for character in word:
21            index = ord(character) - ord('a')  # Convert character to index.
22            if not node.children[index]:  
23                return False  # If the child is None, the word doesn't exist.
24            node = node.children[index]
25      
26        # The word is in the Trie only if we reach the end of the word.
27        return node.is_end_of_word
28
29class Solution:
30    def longestWord(self, words):
31        # Find the longest word in the list that can be built one character at a time by other words in the list.
32        trie = Trie()
33      
34        # Insert all words into the Trie.
35        for word in words:
36            trie.insert(word)
37      
38        longest_word = ""
39      
40        # Check each word in the list.
41        for word in words:
42            # Skip if we already have a longer word, or if same length but lexicographically earlier.
43            if longest_word and (len(longest_word) > len(word) or (len(longest_word) == len(word) and longest_word < word)):
44                continue
45          
46            # If the word can be built one character at a time by other words, update longest_word.
47            if trie.search(word):
48                longest_word = word
49      
50        return longest_word
51

Java Solution

1class Trie {
2    // Trie nodes for storing 26 English lowercase letters.
3    private Trie[] children = new Trie[26];
4    // Flag to mark the end of the word.
5    private boolean isEnd;
6
7    // Method to insert a word into the Trie.
8    public void insert(String word) {
9        Trie node = this;
10        for (char c : word.toCharArray()) {
11            int index = c - 'a'; // Find the position for the character.
12            if (node.children[index] == null) {
13                node.children[index] = new Trie(); // If not present, create a Trie node.
14            }
15            node = node.children[index]; // Move to the child node.
16        }
17        node.isEnd = true; // Mark the end of the word.
18    }
19
20    // Method to search for a word in the Trie, ensuring the path of characters exists and each node is an end of a word.
21    public boolean search(String word) {
22        Trie node = this;
23        for (char c : word.toCharArray()) {
24            int index = c - 'a';
25            node = node.children[index]; // Move to the child node.
26          
27            // If there's no node or the node is not marked as end of a word,
28            // it implies the word or its prefix does not exist.
29            if (node == null || !node.isEnd) {
30                return false;
31            }
32        }
33        return true; // If loop completes, the word's path is found with end nodes.
34    }
35}
36
37class Solution {
38    public String longestWord(String[] words) {
39        // Initialize the Trie.
40        Trie trie = new Trie();
41        // Insert all words into the Trie.
42        for (String word : words) {
43            trie.insert(word);
44        }
45      
46        String answer = ""; // To store the longest word with lexicographical order.
47        for (String word : words) {
48            // Skip this word if:
49            // 1. Current answer is longer than this word, or
50            // 2. They are of the same length but answer is lexicographically smaller.
51            if (!answer.isEmpty()
52                && (answer.length() > word.length()
53                    || (answer.length() == word.length() && answer.compareTo(word) < 0))) {
54                continue;
55            }
56          
57            // If the word is found in the Trie following the rule that
58            // each prefix is marked as an end of a word.
59            if (trie.search(word)) {
60                answer = word; // Update the answer to the new word.
61            }
62        }
63      
64        return answer; // Return the longest word that satisfies the condition.
65    }
66}
67

C++ Solution

1#include <vector>
2#include <string>
3
4using namespace std;
5
6class Trie {
7private:
8    // Trie children - each element represents a pointer to the next Trie node
9    vector<Trie*> children;
10    // Flag to check if the current node marks the end of a word
11    bool isEndOfWord;
12
13public:
14    // Constructor initializes children for 26 lowercase English letters and marks isEndOfWord as false
15    Trie() : children(26, nullptr), isEndOfWord(false) {}
16
17    // Inserts a word into the Trie
18    void insert(const string& word) {
19        Trie* node = this;
20        for (char c : word) {
21            // Convert character to index 0-25
22            int index = c - 'a';
23            if (!node->children[index]) {
24                node->children[index] = new Trie();
25            }
26            node = node->children[index];
27        }
28        // Mark the end of a word
29        node->isEndOfWord = true;
30    }
31
32    // Searches for a word in the Trie, ensuring each character is the end of a previous word
33    bool search(const string& word) {
34        Trie* node = this;
35        for (char c : word) {
36            int index = c - 'a';
37            node = node->children[index];
38            if (!node || !node->isEndOfWord) return false; // Check if intermediate node is missing or not end of a word
39        }
40        return true; // The entire word was found
41    }
42};
43
44class Solution {
45public:
46    // Finds the longest word in the dictionary that can be built one character at a time by other words in the dictionary
47    string longestWord(vector<string>& words) {
48        Trie* trie = new Trie();
49      
50        // Insert all words into the Trie
51        for (const string& w : words) {
52            trie->insert(w);
53        }
54
55        string answer = "";
56        for (const string& w : words) {
57            // Skip if current word is shorter than the answer or lexicographically smaller than an equally long answer
58            if (answer.size() > w.size() || (answer.size() == w.size() && answer < w)) continue;
59
60            // If current word can be built character by character using previous words
61            if (trie->search(w)) {
62                answer = w; // Update answer
63            }
64        }
65        return answer;
66    }
67};
68
69// Remember to include statements to deallocate the Trie to avoid memory leaks.
70

Typescript Solution

1interface TrieNode {
2  children: (TrieNode | null)[];
3  isEndOfWord: boolean;
4}
5
6// Initializes a new Trie node with children for 26 lowercase English letters
7function createTrieNode(): TrieNode {
8  return {
9    children: new Array(26).fill(null),
10    isEndOfWord: false
11  };
12}
13
14// Inserts a word into the Trie
15function insert(word: string, root: TrieNode): void {
16  let node = root;
17  for (const char of word) {
18    // Convert character to index 0-25
19    const index = char.charCodeAt(0) - 'a'.charCodeAt(0);
20    if (node.children[index] === null) {
21      node.children[index] = createTrieNode();
22    }
23    node = node.children[index]!;
24  }
25  // Mark the end of a word
26  node.isEndOfWord = true;
27}
28
29// Searches for a word in the Trie, ensuring each character is the end of a previous word
30function search(word: string, root: TrieNode): boolean {
31  let node = root;
32  for (const char of word) {
33    const index = char.charCodeAt(0) - 'a'.charCodeAt(0);
34    node = node.children[index];
35    if (node === null || !node.isEndOfWord) return false; // Check if intermediate node is missing or not end of a word
36  }
37  return true; // The entire word was found
38}
39
40// Finds the longest word in the dictionary that can be built one character at a time by other words in the dictionary
41function longestWord(words: string[]): string {
42  const trieRoot = createTrieNode();
43
44  // Insert all words into the Trie
45  for (const w of words) {
46    insert(w, trieRoot);
47  }
48
49  let answer = "";
50  for (const w of words) {
51    // Skip if current word is shorter than the answer or lexicographically smaller than an equally long answer
52    if (answer.length > w.length || (answer.length === w.length && answer < w)) continue;
53
54    // If current word can be built character by character using previous words
55    if (search(w, trieRoot)) {
56      answer = w; // Update answer
57    }
58  }
59  return answer;
60}
61

Time and Space Complexity

Time Complexity

The time complexity of the given code consists of several parts, including the construction of the Trie and the searching within the Trie.

  1. Insert words into the Trie: Each word is inserted into the Trie by traversing it letter by letter and creating nodes if they do not exist. If we assume the average length of the words is L and there are N words, this part will take O(N * L) since we visit each letter of each word exactly once during insertion.

  2. Search for words in the Trie: The search operation for each word involves traversing the Trie for each letter in the word. However, there seems to be an error in this part of the code given that the search method is supposed to check if the word is present in the Trie by marking the end of a word. The current search logic will return False as soon as any node along the path is not marked as an end node, which is not the correct condition. The corrected search operation would be O(L) for each word if we were to correct the search stipulation to only check the end node for the last character of the word.

With the corrected search logic, since we perform the search for N words, this operation would be O(N * L) as well.

  1. Finding the longest word: The longest word is determined by comparing the current longest valid word with others in a linear fashion, which is O(N). However, as we already search each word in the Trie, this is not adding an additional time complexity beyond what we have already from the insert and search operations.

Hence, the overall time complexity of the given code is O(N * L) with the assumption of the search logic being corrected.

Space Complexity

The space complexity is determined by the space taken by the Trie.

  1. Space for the Trie: The Trie has at most 26 * N * L nodes in the case that all words are of maximum length L and do not share any common prefix. However, since Trie nodes are shared among prefixes, the worst-case space complexity is O(26 * N * L) but is typically much less.

Thus the space complexity of the Trie is O(26 * N * L), which simplifies to O(N * L) since constants are ignored in Big O notation.


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫