472. Concatenated Words


Problem Description

The problem provides an array of strings called words which contains no duplicates. The task is to return a list of all concatenated words in the given array. A concatenated word is made up entirely of at least two shorter words from the array. These shorter words do not necessarily have to be unique; they can be the same word repeated or different words.

Intuition

To solve this problem, we can harness a data structure called a Trie (prefix tree). This is an efficient structure for retrieval of strings when we deal with a dataset where the strings may share a common prefix.

First, we sort the input array words by the length of the strings to ensure that when we look for concatenated words, we have already processed all possible shorter words that could comprise them.

Our approach is to iterate through each word and determine whether it's a concatenated word by recursively searching for its prefix parts in the Trie. If a prefix part is found in the Trie (which indicates that it is a valid shorter word), we then perform a depth-first search (DFS) for the remainder of the word.

Here is the general idea behind the solution:

  1. Initialize an empty Trie.
  2. Sort the words by length.
  3. Iterate through each word:
    • Use a DFS technique to determine if the current word can be formed by concatenating other words from the Trie.
    • If it can be formed, add it to the answer list.
    • If it cannot be formed, insert the word into the Trie for potential future use.
  4. Return the list of concatenated words.

The DFS function attempts to match each character in the word with a child in the Trie. If at any point the character doesn't match any child or we reach the end without finding a valid concatenation, the function will return False. If we find a node in the Trie which is the end of another word (node.is_end is True), we recursively call the DFS function on the remainder of the current word.

By sorting the words by length, we ensure that no word is attempted to be matched against itself or against longer words that haven't been inserted to the Trie yet, thereby guaranteeing that only shorter words are used.

This algorithm efficiently breaks down the problem by reusing previously computed information (subwords already in the Trie), which makes it effective to determine if a word is a concatenated word of other shorter words within the array.

Solution Approach

The implementation of our solution involves two major components: the Trie data structure and the dfs (depth-first search) function. Here's a walkthrough of how these components work together to solve the problem:

Trie Data Structure

  • A Trie is a special type of tree used to store a dynamic set or associative array where the keys are usually strings.

  • In our case, every node in the Trie represents a single character of a word and has an array of 26 elements children to represent each letter of the alphabet (assuming only lowercase letters are present).

  • A boolean variable is_end at each node indicates if the node corresponds to the last character of a word that has been inserted into the Trie.

  • The insert method is used to add a new word to the Trie. As we traverse through each character of the word, we check if a child node for the current character exists. If not, we create a new node, move to the node, and continue until the whole word is processed.

Depth-First Search (DFS) Function

  • The dfs function is crucial to our approach. It takes a word w as an argument and searches the Trie to determine if w is a concatenated word.

  • The function iterates over the characters in w, converting each character to an index (0 through 25) corresponding to a child in the current Trie node.

  • If at any point a Trie node doesn't have a child matching the current character, the function returns False, signaling that the word cannot be constructed from the Trie.

  • If a complete word is found within the Trie (indicated by node.is_end), the function recursively calls itself with the remaining substring of w (the part of w after the found word). If this recursive call returns True, then the original w is a concatenated word.

Solution Steps

  1. We initialize an empty Trie and an empty list ans that will eventually contain our concatenated words.

  2. We sort the input words by their lengths so we can build the Trie from the shortest to the longest words, ensuring that when we check if a word is a concatenated word, it can only be consisting of shorter words that have already been processed.

  3. After sorting, we iterate over words. For each word w, we call our dfs function:

    • If dfs(w) returns True, we know that w is a concatenated word, hence we add it to our ans list.
    • If dfs(w) returns False, w is not a concatenated word and we insert w into the Trie.
  4. Once all words have been processed, we return the ans list containing all concatenated words found in the words array.

This implementation leverages the Trie's efficiency in storing and searching for strings, and the dfs function's ability to recursively decompose the problem to find all valid concatenated words.

💪
Level Up Your
Algo Skills

Example Walkthrough

Let's say we are given the following array of strings:

words = ["cat", "cats", "dog", "catsdog", "catxdogcatx"]

Our goal is to find all the concatenated words, which means words that are made up of at least two shorter words from the list.

We start by sorting the words by length:

words = ["cat", "dog", "cats", "catsdog", "catxdogcatx"]

Now, we will walk through the sorted words and use a Trie and DFS to determine if a word is a concatenated word.

  1. We initialize our Trie, which initially is empty. We also initialize an empty list ans to store the results.

  2. We insert 'cat' and 'dog' into the Trie (being the first two words in the sorted order, they can't be concatenated words).

  3. Next, we analyze 'cats':

    • Performing a DFS, we find 'cat' in the Trie and then check if 's' is a word in the Trie; it's not, so 'cats' cannot be formed by concatenation of two words from the array. We add 'cats' to the Trie.
  4. Now, let's consider 'catsdog':

    • We find 'cat' in the Trie. Now we perform a DFS for 'sdog'.
    • 's' is not found as a word, but 'cats' is found. We perform a DFS for 'dog', which is found in the Trie.
    • Since we are able to form 'catsdog' by using 'cats' and 'dog', 'catsdog' is a concatenated word, and we add it to our ans list.
  5. Lastly, 'catxdogcatx':

    • We start with dfs on 'cat', which is found, then we look for 'xdogcatx', which is not found.
    • At this point, no further decompositions produce valid concatenated words, so we do not add 'catxdogcatx' to our ans list, but we do insert it into our Trie.

In the end, our ans list contains the concatenated word: ["catsdog"]. We return this list as the output of our algorithm.

Python Solution

1from typing import List
2
3class Trie:
4    def __init__(self):
5        self.children = [None] * 26  # Initialize the trie nodes for each letter of the alphabet
6        self.is_end_of_word = False   # Boolean to check if the node is the end of a word
7
8    def insert(self, word):
9        node = self
10        for char in word:
11            index = ord(char) - ord('a')  # Calculate the alphabetical index
12            # If the character doesn't have an associated Trie node, create it
13            if node.children[index] is None:
14                node.children[index] = Trie()
15            # Move to the child node associated with the character
16            node = node.children[index]
17        # Mark the last node as the end of a word
18        node.is_end_of_word = True
19
20
21class Solution:
22    def findAllConcatenatedWordsInADict(self, words: List[str]) -> List[str]:
23
24        # Helper function to perform depth-first search in the trie
25        def dfs(word):
26            if not word:  # If the word is empty, we've found a valid concatenation
27                return True
28            node = trie  # Start at the root of the trie
29            for i, char in enumerate(word):
30                index = ord(char) - ord('a')
31                # If the character is not in the trie, the word cannot be formed
32                if node.children[index] is None:
33                    return False
34                node = node.children[index]
35                # If the current node is an end of a word and the next substring is also a word
36                if node.is_end_of_word and dfs(word[i + 1:]):
37                    return True  # The word can be concatenated from other words
38            return False  # No valid concatenation found
39
40        # Initialize Trie and the answer list
41        trie = Trie()
42        concatenated_words = []
43        # Sort the words by length to ensure that shorter words are inserted into the trie first
44        words.sort(key=len)
45        for word in words:
46            # Check if the word can be formed by the concatenation of other words
47            if dfs(word):
48                concatenated_words.append(word)
49            else:
50                # If not, insert the word into the trie for future reference
51                trie.insert(word)
52        return concatenated_words
53

Java Solution

1class Trie {
2    // Trie children nodes
3    Trie[] children = new Trie[26];
4    // Flag to indicate if a word ends at this node
5    boolean isEndOfWord;
6
7    // Insert a word into the trie
8    public void insert(String word) {
9        Trie node = this; // Start with the root
10        for (char c : word.toCharArray()) { // Traverse the word
11            int index = c - 'a'; // map character to index (0-25)
12            if (node.children[index] == null) { // If the path doesn't exist, create a new node
13                node.children[index] = new Trie();
14            }
15            node = node.children[index]; // Move to the child node
16        }
17        node.isEndOfWord = true; // Mark the end of a word
18    }
19}
20
21class Solution {
22    private Trie trie = new Trie(); // Root of the trie
23
24    public List<String> findAllConcatenatedWordsInADict(String[] words) {
25        // Sort the words array by length
26        Arrays.sort(words, (a, b) -> a.length() - b.length());
27        List<String> result = new ArrayList<>(); // List to store the concatenated words
28
29        // Go through each word in words
30        for (String word : words) {
31            if (isConcatenated(word)) { // If the word is a concatenated word
32                result.add(word); // Add it to the result list
33            } else {
34                trie.insert(word); // Otherwise, insert it into the trie for future checks
35            }
36        }
37        return result; // Return the list of concatenated words
38    }
39
40    // Helper method to check if a word is a concatenated word
41    private boolean isConcatenated(String word) {
42        if (word.isEmpty()) { // Base case for an empty string
43            return true;
44        }
45        Trie node = trie; // Start from the root of the trie
46        for (int i = 0; i < word.length(); ++i) { // Go through each character in word
47            int index = word.charAt(i) - 'a'; // Get index of the character
48            if (node.children[index] == null) { // If there is no path, it's not a concatenated word
49                return false;
50            }
51            node = node.children[index]; // Move to the corresponding child
52
53            // If we reach the end of a word and the remaining substring is a concatenated word
54            if (node.isEndOfWord && isConcatenated(word.substring(i + 1))) {
55                return true; // The word is a concatenated word
56            }
57        }
58        return false; // The word is not a concatenated word
59    }
60}
61

C++ Solution

1#include <vector>
2#include <string>
3#include <algorithm>
4
5// Trie class representing the prefix tree structure
6class Trie {
7public:
8    // Pointers to children Trie nodes for each letter
9    std::vector<Trie*> children;
10    // Indicator of whether a node marks the end of a word
11    bool isEndWord;
12
13    // Constructor initializes the children vector with null pointers
14    // and sets isEndWord to false
15    Trie()
16        : children(26, nullptr), isEndWord(false) {}
17
18    // Function to insert a word into the trie
19    void insert(const std::string& word) {
20        Trie* node = this;
21        for (char c : word) {
22            int index = c - 'a';
23            // Create a new node if it does not exist
24            if (!node->children[index]) {
25                node->children[index] = new Trie();
26            }
27            // Move to the child node corresponding to the current character
28            node = node->children[index];
29        }
30        // Mark the end of the word
31        node->isEndWord = true;
32    }
33};
34
35// Solution class to find all concatenated words in a dictionary
36class Solution {
37public:
38    // Instance of Trie initialized to store the dictionary words
39    Trie* rootTrie = new Trie();
40
41    // Function to find all concatenated words in a given dictionary
42    std::vector<std::string> findAllConcatenatedWordsInADict(std::vector<std::string>& words) {
43        // Sort the input words based on their lengths (ascending order)
44        std::sort(words.begin(), words.end(), [](const std::string& a, const std::string& b) {
45            return a.size() < b.size();
46        });
47
48        // Vector to store the concatenated words
49        std::vector<std::string> concatenatedWords;
50        // Iterate over each word and check if it is a concatenated word
51        for (const auto& word : words) {
52            // If the current word can be formed by concatenating other words, add it to the result
53            if (isConcatenatedWord(word))
54                concatenatedWords.push_back(word);
55            else
56                // If not, add the word to the trie for future use
57                rootTrie->insert(word);
58        }
59        return concatenatedWords;
60    }
61
62    // Helper function to perform depth-first search for concatenated words
63    bool isConcatenatedWord(const std::string& word) {
64        if (word.empty()) return true;
65        Trie* node = rootTrie;
66        for (int i = 0; i < word.size(); ++i) {
67            int index = word[i] - 'a';
68            // If the current character's node does not exist, word formation is not possible
69            if (!node->children[index]) return false;
70            // Move to the next child node
71            node = node->children[index];
72            // If we reached the end of a word within the trie, recursively check for the next subword
73            if (node->isEndWord && isConcatenatedWord(word.substr(i + 1)))
74                return true;
75        }
76        // Return false if the word is not a concatenated word
77        return false;
78    }
79};
80

Typescript Solution

1// Initialize an array to store the children Trie nodes for each letter
2const children: (Node | null)[] = new Array<Node | null>(26).fill(null);
3// Initialize a variable to indicate whether a node marks the end of a word
4let isEndWord: boolean = false;
5
6// The Node type definition for typescript
7type Node = {
8    children: (Node | null)[];
9    isEndWord: boolean;
10};
11
12// Function to create a new Trie node
13function createNode(): Node {
14    return {
15        children: new Array<Node | null>(26).fill(null),
16        isEndWord: false,
17    };
18}
19
20// Function to insert a word into the trie (global variable 'children')
21function insert(word: string) {
22    let node: Node | null = {children, isEndWord};
23    for (const char of word) {
24        const index = char.charCodeAt(0) - 'a'.charCodeAt(0);
25        if (!node.children[index]) {
26            node.children[index] = createNode();
27        }
28        node = node.children[index];
29    }
30    if (node) {
31        node.isEndWord = true;
32    }
33}
34
35// Global Trie root node variable, using 'createNode' for initial structure
36const rootTrie: Node = createNode();
37
38// Function to find all concatenated words in a given dictionary
39function findAllConcatenatedWordsInADict(words: string[]): string[] {
40    words.sort((a, b) => a.length - b.length);
41
42    const concatenatedWords: string[] = [];
43    for (const word of words) {
44        if (isConcatenatedWord(word, true))
45            concatenatedWords.push(word);
46        else
47            insert(word);
48    }
49    return concatenatedWords;
50}
51
52// Helper function to check if the word is concatenated
53function isConcatenatedWord(word: string, isFirstCall: boolean = false): boolean {
54    if (word === "") return !isFirstCall;
55
56    let node: Node | null = rootTrie;
57    for (let i = 0; i < word.length; i++) {
58        const index = word.charCodeAt(i) - 'a'.charCodeAt(0);
59        if (!node.children[index]) return false;
60        node = node.children[index];
61        // If we reach the end of a word within the trie, recursively check for the next subword
62        if (node && node.isEndWord) {
63            if (isConcatenatedWord(word.slice(i + 1))) return true;
64        }
65    }
66    // If the loop completes without returning, the word is not a concatenated word
67    return false;
68}
69

Time and Space Complexity

Time Complexity

The time complexity of the insert function in the Trie class is O(L), where L is the length of the word being inserted. This is because the function iterates over each character of the word and, in the worst case, needs to insert a new node for each character.

The findAllConcatenatedWordsInADict function's main complexity comes from two parts: sorting the words list and performing depth-first searches (DFS) on the words using the dfs helper function.

  1. Sorting the words list has a time complexity of O(N * log(N)), where N is the number of words in the list. The sorting is based on the lengths of words.

  2. The dfs function is called for each word to determine if it can be formed by concatenating other words in the trie. In the worst case, this DFS could take up to O(L^2) for a word of length L, since for every starting character in the word, we may need to search the remainder of the word in the trie.

Combining these two complexities, the overall time complexity of findAllConcatenatedWordsInADict can be estimated as O(N * log(N) + N * L^2). The N * log(N) term is for sorting, and N * L^2 is for DFS calls on each word.

Space Complexity

The space complexity is dominated by the space required to store the trie and the output list ans.

  1. The Trie could potentially have O(T) nodes, where T is the total number of characters across all words. However, since the trie is compact and reuses nodes for common prefixes, the space is generally less than that.

  2. The ans list can have at most N elements if every word can be formed by concatenating other words from the list.

Combining both, the space complexity can be estimated as O(T + N).

Note: These complexities assume a fixed alphabet size which does not grow relative to N or L, thus it's not factored into the complexity expressions.

😈
Become an
Algo Monster

Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫