472. Concatenated Words
Problem Description
You are given an array of strings words that contains no duplicates. Your task is to find all the concatenated words in this array.
A concatenated word is a string that can be formed by combining at least two other shorter words from the same array. The shorter words used to form the concatenated word don't need to be distinct - you can use the same word multiple times.
For example:
- If words = ["cat", "cats", "dog", "catsdogs"], then"catsdogs"is a concatenated word because it can be formed by combining"cats"and"dogs"(or alternatively"cat","s","dog","s"if those were all in the array).
- A word like "cat"by itself is not a concatenated word since it cannot be broken down into at least two other words from the array.
The solution uses a Trie data structure combined with dynamic programming. Words are sorted by length and processed from shortest to longest. For each word, the algorithm checks if it can be formed by concatenating existing shorter words in the Trie. If a word can be concatenated, it's added to the result list; otherwise, it's inserted into the Trie for potential use in forming longer words.
The dfs function recursively checks if a word can be split into valid concatenations by traversing the Trie and looking for complete words (marked by is_end = True) at each position, then recursively checking if the remaining portion can also be formed from words in the Trie.
Flowchart Walkthrough
First, let's pin down the algorithm using the Flowchart. Here's a step-by-step walkthrough:
Is it a graph?
- Yes: While not explicitly a traditional graph problem, we can model this as a graph where each word is a node, and there's a directed edge from word A to word B if word B starts where word A ends in forming a concatenation. The Trie structure itself represents a graph of character transitions.
Is it a tree?
- Yes: The Trie data structure used in the solution is inherently a tree structure. Each path from root to a leaf represents a word, and we traverse this tree to find valid concatenations.
DFS
- Yes: The solution uses DFS to explore all possible ways to break down a word into concatenations. The dfsfunction recursively traverses the Trie, and whenever it finds a complete word (marked byis_end = True), it recursively checks if the remaining portion of the string can also be formed from words in the Trie.
Conclusion: The flowchart correctly leads us to DFS as the appropriate algorithm. The problem requires:
- Building a Trie (tree structure) from the words
- Using DFS to recursively check if each word can be decomposed into at least two other words from the array
- The backtracking nature of DFS allows us to explore all possible ways to split a word and return true if any valid concatenation is found
The DFS pattern is essential here because we need to explore all possible splitting points in a word and recursively validate each split, which is naturally handled by the depth-first traversal approach.
Intuition
The key insight is that we need to check if a word can be built from other words in our dictionary. Think of it like building with LEGO blocks - can we construct a bigger piece using smaller pieces we already have?
When we look at a word like "catsdogs", we need to check if we can split it into valid words from our array. We could try "c" + "atsdogs", "ca" + "tsdogs", "cat" + "sdogs", "cats" + "dogs", and so on. If we find that "cats" is a valid word in our array, we then need to check if "dogs" can also be formed from words in our array.
This naturally leads to a recursive approach: at each position in the word, we check if the substring from the start to that position forms a valid word. If it does, we recursively check if the remaining part can also be formed from valid words.
But how do we efficiently check if a substring is a valid word? Scanning through the entire word array each time would be inefficient. This is where the Trie comes in - it allows us to check character by character if we're forming a valid word, and know immediately when we've found a complete word (marked by is_end).
The clever part of the solution is processing words by length. We start with the shortest words and add them to our Trie. When we check longer words, all potential building blocks (shorter words) are already in the Trie. If a word can be concatenated from others, we add it to our result; if not, it becomes a potential building block for even longer words, so we add it to the Trie.
This approach ensures that when we check if "catsdogs" is a concatenated word, both "cats" and "dogs" are already in our Trie, making the check possible through our DFS traversal.
Solution Approach
The solution implements a Trie-based approach with DFS to identify concatenated words. Here's how the implementation works:
1. Trie Data Structure Setup
First, we define a Trie class to efficiently store and search for words:
- Each Trie node has an array childrenof size 26 (for lowercase letters a-z)
- A boolean flag is_endmarks the end of a complete word
- The insertmethod adds words to the Trie by creating nodes for each character
2. Sorting Words by Length
The algorithm sorts the input words by length: words.sort(key=lambda x: len(x)). This is crucial because:
- Shorter words cannot be concatenated from longer words
- We build our Trie incrementally with shorter words first
- When checking if a longer word is concatenated, all its potential components are already in the Trie
3. DFS Function for Validation
The dfs function checks if a word can be formed by concatenating other words:
def dfs(w):
    if not w:
        return True  # Empty string means we've successfully split the entire word
The function traverses the Trie character by character:
- For each character, it moves to the corresponding child node
- If no child exists, the current prefix isn't a valid word, so return False
- When it finds a complete word (node.is_end = True), it recursively checks if the remaining substring can also be formed from words in the Trie
4. Main Processing Loop
For each word in the sorted list:
for w in words: if dfs(w): ans.append(w) # Word can be concatenated, add to result else: trie.insert(w) # Word cannot be concatenated, add it as a building block
5. Key Implementation Detail
The recursive call dfs(w[i + 1:]) in the DFS function is critical:
- When we find a valid word at position i, we check if the substring fromi+1onwards can also be formed
- This ensures the word is split into at least two parts (the found word and the remaining substring)
- The recursion naturally handles cases where a word can be split into more than two parts
The algorithm's efficiency comes from:
- Using a Trie for O(1) character lookups instead of scanning the word list
- Processing words in length order to ensure all potential components are available
- Memoization through the Trie structure itself (once a word is added, it's instantly available for all future checks)
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example with words = ["cat", "cats", "dog", "catsdogs"].
Step 1: Sort by length
After sorting: ["cat", "dog", "cats", "catsdogs"]
Step 2: Process "cat" (length 3)
- Call dfs("cat")with empty Trie
- Start traversing: 'c' → no child exists in empty Trie
- Return False(cannot be formed from other words)
- Since DFS returns False, insert "cat" into Trie
Trie now contains: "cat"
Step 3: Process "dog" (length 3)
- Call dfs("dog")
- Start traversing: 'd' → no child for 'd' at root
- Return False
- Insert "dog" into Trie
Trie now contains: "cat", "dog"
Step 4: Process "cats" (length 4)
- Call dfs("cats")
- Traverse: 'c' → 'a' → 't' (found is_end=Trueat index 2)
- Found "cat" is a complete word, recursively call dfs("s")
- In dfs("s"): 's' → no child for 's' at root
- Return False
- Overall dfs("cats")returnsFalse
- Insert "cats" into Trie
Trie now contains: "cat", "cats", "dog"
Step 5: Process "catsdogs" (length 8)
- Call dfs("catsdogs")
- Traverse: 'c' → 'a' → 't' (found is_end=Trueat index 2)- Found "cat", recursively call dfs("sdogs")
- In dfs("sdogs"): 's' → no match at root, returnFalse
 
- Found "cat", recursively call 
- Continue traversing: 'c' → 'a' → 't' → 's' (found is_end=Trueat index 3)- Found "cats", recursively call dfs("dogs")
- In dfs("dogs"): traverse 'd' → 'o' → 'g' → 's'
- No match for "dogs" (only "dog" exists), return False
 
- Found "cats", recursively call 
- Continue traversing to check if "catsdogs" itself is in Trie
- Reach index 7 ('g'), find is_end=Truewould be at 's'
- Actually, let me reconsider...
Step 5 (corrected): Process "catsdogs" (length 8)
- Call dfs("catsdogs")
- Traverse: 'c' → 'a' → 't' → 's' (found is_end=Trueat index 3)- Found "cats", recursively call dfs("dogs")
- In dfs("dogs"): traverse 'd' → 'o' → 'g' (foundis_end=Trueat index 2)- Found "dog", recursively call dfs("s")
- In dfs("s"): 's' → no child at root, returnFalse
 
- Found "dog", recursively call 
- Back in dfs("dogs"): continue to check full "dogs"
- No "dogs" in Trie (only "dog"), traversal fails
- Return False
 
- Found "cats", recursively call 
Wait, this example shows "catsdogs" wouldn't be concatenated since "dogs" isn't in our array. Let me use a better example.
Better Example: words = ["cat", "cats", "dog", "catsdog"]
After sorting: ["cat", "dog", "cats", "catsdog"]
Processing "cat", "dog", "cats" as before - all get inserted into Trie.
Process "catsdog" (length 7)
- Call dfs("catsdog")
- Traverse: 'c' → 'a' → 't' → 's' (found is_end=Trueat index 3)- Found "cats", recursively call dfs("dog")
- In dfs("dog"): traverse 'd' → 'o' → 'g' (foundis_end=Trueat index 2)- Found complete word "dog", recursively call dfs("")
- Empty string returns True
 
- Found complete word "dog", recursively call 
- dfs("dog")returns- True
 
- Found "cats", recursively call 
- dfs("catsdog")returns- True
- Add "catsdog" to result list
Final Result: ["catsdog"]
Solution Implementation
1class Trie:
2    """Trie (prefix tree) data structure for efficient string storage and retrieval."""
3  
4    def __init__(self):
5        # Array to store 26 children nodes (one for each lowercase letter)
6        self.children = [None] * 26
7        # Flag to mark if current node represents end of a word
8        self.is_end = False
9
10    def insert(self, word: str) -> None:
11        """Insert a word into the trie."""
12        node = self
13        for char in word:
14            # Calculate index for the character (0-25)
15            index = ord(char) - ord('a')
16            # Create new node if path doesn't exist
17            if node.children[index] is None:
18                node.children[index] = Trie()
19            # Move to the child node
20            node = node.children[index]
21        # Mark the last node as end of word
22        node.is_end = True
23
24
25class Solution:
26    def findAllConcatenatedWordsInADict(self, words: List[str]) -> List[str]:
27        """
28        Find all concatenated words in a dictionary.
29        A concatenated word is formed by combining two or more shorter words from the same dictionary.
30        """
31      
32        def can_form_by_concatenation(word: str) -> bool:
33            """
34            Check if a word can be formed by concatenating other words in the trie.
35            Uses DFS to try all possible splits of the word.
36            """
37            # Empty string means we successfully split the entire word
38            if not word:
39                return True
40          
41            node = trie
42            # Try to match prefixes of the word
43            for i, char in enumerate(word):
44                index = ord(char) - ord('a')
45                # If no matching path exists, word cannot be formed
46                if node.children[index] is None:
47                    return False
48              
49                node = node.children[index]
50                # If we found a complete word, try to match the remaining part
51                if node.is_end and can_form_by_concatenation(word[i + 1:]):
52                    return True
53          
54            return False
55
56        # Initialize trie and result list
57        trie = Trie()
58        result = []
59      
60        # Sort words by length to ensure shorter words are added to trie first
61        # This allows longer words to be formed from shorter ones
62        words.sort(key=lambda word: len(word))
63      
64        for word in words:
65            # Check if current word can be formed by concatenating existing words
66            if can_form_by_concatenation(word):
67                result.append(word)
68            else:
69                # If not, add it to trie for future concatenations
70                trie.insert(word)
71      
72        return result
731/**
2 * Trie (Prefix Tree) data structure for efficient string storage and retrieval
3 */
4class Trie {
5    // Array to store 26 child nodes (one for each lowercase letter a-z)
6    Trie[] children = new Trie[26];
7    // Flag to mark if current node represents the end of a word
8    boolean isEndOfWord;
9
10    /**
11     * Inserts a word into the trie
12     * @param word The word to be inserted
13     */
14    void insert(String word) {
15        Trie currentNode = this;
16      
17        // Traverse through each character in the word
18        for (char character : word.toCharArray()) {
19            // Convert character to index (0-25)
20            int index = character - 'a';
21          
22            // Create new node if path doesn't exist
23            if (currentNode.children[index] == null) {
24                currentNode.children[index] = new Trie();
25            }
26          
27            // Move to the child node
28            currentNode = currentNode.children[index];
29        }
30      
31        // Mark the last node as end of word
32        currentNode.isEndOfWord = true;
33    }
34}
35
36/**
37 * Solution class to find all concatenated words in a dictionary
38 * A concatenated word is formed by combining two or more shorter words from the dictionary
39 */
40class Solution {
41    // Trie to store all words for efficient prefix matching
42    private Trie trie = new Trie();
43
44    /**
45     * Finds all concatenated words in the given array
46     * @param words Array of words to check
47     * @return List of concatenated words
48     */
49    public List<String> findAllConcatenatedWordsInADict(String[] words) {
50        // Sort words by length to ensure shorter words are processed first
51        Arrays.sort(words, (word1, word2) -> word1.length() - word2.length());
52      
53        List<String> concatenatedWords = new ArrayList<>();
54      
55        // Process each word
56        for (String word : words) {
57            // Check if current word can be formed by concatenating existing words in trie
58            if (canFormByConcatenation(word)) {
59                concatenatedWords.add(word);
60            } else {
61                // If not a concatenated word, add it to trie for future use
62                trie.insert(word);
63            }
64        }
65      
66        return concatenatedWords;
67    }
68
69    /**
70     * Checks if a word can be formed by concatenating other words in the trie
71     * Uses DFS to explore all possible word combinations
72     * @param word The word to check
73     * @return true if word can be formed by concatenation, false otherwise
74     */
75    private boolean canFormByConcatenation(String word) {
76        // Base case: empty string means we've successfully matched all parts
77        if ("".equals(word)) {
78            return true;
79        }
80      
81        Trie currentNode = trie;
82      
83        // Try to match prefixes of the word
84        for (int i = 0; i < word.length(); ++i) {
85            int charIndex = word.charAt(i) - 'a';
86          
87            // If no path exists for current character, word cannot be formed
88            if (currentNode.children[charIndex] == null) {
89                return false;
90            }
91          
92            // Move to next node
93            currentNode = currentNode.children[charIndex];
94          
95            // If current position marks end of a word, try to match remaining part
96            if (currentNode.isEndOfWord && canFormByConcatenation(word.substring(i + 1))) {
97                return true;
98            }
99        }
100      
101        // No valid concatenation found
102        return false;
103    }
104}
1051class Trie {
2public:
3    vector<Trie*> children;  // Array of 26 pointers for each lowercase letter
4    bool isEnd;              // Flag to mark end of a word
5  
6    // Constructor initializes children array with nullptrs and isEnd as false
7    Trie() : children(26), isEnd(false) {}
8
9    // Insert a word into the trie
10    void insert(string word) {
11        Trie* node = this;
12        for (char ch : word) {
13            int index = ch - 'a';  // Convert character to index (0-25)
14            // Create new node if path doesn't exist
15            if (!node->children[index]) {
16                node->children[index] = new Trie();
17            }
18            node = node->children[index];  // Move to next node
19        }
20        node->isEnd = true;  // Mark the end of the word
21    }
22};
23
24class Solution {
25public:
26    Trie* trie = new Trie();  // Root of the trie structure
27
28    vector<string> findAllConcatenatedWordsInADict(vector<string>& words) {
29        // Sort words by length to ensure shorter words are added to trie first
30        sort(words.begin(), words.end(), [&](const string& a, const string& b) {
31            return a.size() < b.size();
32        });
33      
34        vector<string> result;
35      
36        // Process each word
37        for (auto& word : words) {
38            // Check if current word can be formed by concatenating existing words
39            if (dfs(word)) {
40                result.push_back(word);  // Add to result if it's a concatenated word
41            } else {
42                trie->insert(word);  // Otherwise, add it to trie for future use
43            }
44        }
45      
46        return result;
47    }
48
49    // Recursively check if a word can be formed by concatenating words in trie
50    bool dfs(string word) {
51        // Base case: empty string means successful concatenation
52        if (word == "") {
53            return true;
54        }
55      
56        Trie* node = trie;
57      
58        // Try to match prefixes of the word
59        for (int i = 0; i < word.size(); ++i) {
60            int index = word[i] - 'a';
61          
62            // If current character path doesn't exist, word can't be formed
63            if (!node->children[index]) {
64                return false;
65            }
66          
67            node = node->children[index];
68          
69            // If we found a complete word, try to match the remaining substring
70            if (node->isEnd && dfs(word.substr(i + 1))) {
71                return true;
72            }
73        }
74      
75        return false;  // No valid concatenation found
76    }
77};
781// Trie node structure for efficient word storage and retrieval
2class TrieNode {
3    children: (TrieNode | null)[];  // Array of 26 pointers for each lowercase letter
4    isEnd: boolean;                  // Flag to mark end of a word
5  
6    // Constructor initializes children array with nulls and isEnd as false
7    constructor() {
8        this.children = new Array(26).fill(null);
9        this.isEnd = false;
10    }
11}
12
13// Root of the trie structure
14let trie: TrieNode = new TrieNode();
15
16// Insert a word into the trie
17function insertWord(word: string): void {
18    let node = trie;
19  
20    for (const ch of word) {
21        const index = ch.charCodeAt(0) - 'a'.charCodeAt(0);  // Convert character to index (0-25)
22      
23        // Create new node if path doesn't exist
24        if (!node.children[index]) {
25            node.children[index] = new TrieNode();
26        }
27      
28        node = node.children[index]!;  // Move to next node
29    }
30  
31    node.isEnd = true;  // Mark the end of the word
32}
33
34// Recursively check if a word can be formed by concatenating words in trie
35function dfs(word: string): boolean {
36    // Base case: empty string means successful concatenation
37    if (word === "") {
38        return true;
39    }
40  
41    let node = trie;
42  
43    // Try to match prefixes of the word
44    for (let i = 0; i < word.length; i++) {
45        const index = word.charCodeAt(i) - 'a'.charCodeAt(0);
46      
47        // If current character path doesn't exist, word can't be formed
48        if (!node.children[index]) {
49            return false;
50        }
51      
52        node = node.children[index]!;
53      
54        // If we found a complete word, try to match the remaining substring
55        if (node.isEnd && dfs(word.substring(i + 1))) {
56            return true;
57        }
58    }
59  
60    return false;  // No valid concatenation found
61}
62
63// Main function to find all concatenated words in a dictionary
64function findAllConcatenatedWordsInADict(words: string[]): string[] {
65    // Reset trie for each function call
66    trie = new TrieNode();
67  
68    // Sort words by length to ensure shorter words are added to trie first
69    words.sort((a, b) => a.length - b.length);
70  
71    const result: string[] = [];
72  
73    // Process each word
74    for (const word of words) {
75        // Check if current word can be formed by concatenating existing words
76        if (dfs(word)) {
77            result.push(word);  // Add to result if it's a concatenated word
78        } else {
79            insertWord(word);   // Otherwise, add it to trie for future use
80        }
81    }
82  
83    return result;
84}
85Time and Space Complexity
Time Complexity: O(n * m^2) where n is the number of words and m is the average length of words.
- Sorting the words takes O(n log n)time
- For each word, we perform a DFS traversal:
- In the worst case, for each character position iin a word of lengthm, we might check if it forms a valid word (takingO(m)time to traverse the trie)
- Then recursively call DFS on the remaining substring
- This can lead to checking all possible split positions, resulting in O(m^2)time per word in the worst case
 
- In the worst case, for each character position 
- The trie insertion for non-concatenated words takes O(m)time per word
- Overall: O(n log n + n * m^2)=O(n * m^2)(since typicallym^2 > log n)
Space Complexity: O(n * m * 26) which simplifies to O(n * m)
- The Trie structure stores all words:
- In the worst case, if all words are completely different, we need O(n * m)nodes
- Each node contains an array of 26 pointers, so technically O(n * m * 26)
- Since 26 is a constant, this simplifies to O(n * m)
 
- In the worst case, if all words are completely different, we need 
- The recursion stack for DFS can go up to O(m)depth in the worst case
- The result list stores at most O(n)words
- Overall: O(n * m)for the trie structure dominates the space complexity
Common Pitfalls
1. Forgetting to Handle Empty Words
The algorithm assumes all words have at least one character. If the input contains empty strings, the code will add them to the Trie and potentially cause incorrect results when checking concatenations.
Solution: Filter out empty strings before processing:
words = [w for w in words if w] # Remove empty strings
2. Incorrect Base Case - Accepting Single Word as Valid
A critical pitfall is not ensuring that a word is formed from at least two other words. The current implementation correctly handles this, but a common mistake would be to check if node.is_end is True at the end of the DFS without ensuring we've made at least one split:
# WRONG: This would accept single words as "concatenated"
def can_form_by_concatenation(word: str, split_count=0) -> bool:
    if not word:
        return split_count > 0  # Should be split_count >= 2
Solution: The current implementation correctly avoids this by only adding words to the result if DFS returns True (meaning a valid split was found), and never inserting a word that can be formed by concatenation back into the Trie.
3. Stack Overflow with Deep Recursion
For very long words or deeply nested concatenations, the recursive DFS might hit Python's recursion limit. For example, if you have many single-character words and a very long concatenated word.
Solution: Use iterative DFS with explicit stack or add memoization:
def can_form_by_concatenation(word: str) -> bool:
    memo = {}
  
    def dfs(start_idx):
        if start_idx == len(word):
            return True
        if start_idx in memo:
            return memo[start_idx]
      
        node = trie
        for i in range(start_idx, len(word)):
            index = ord(word[i]) - ord('a')
            if node.children[index] is None:
                break
            node = node.children[index]
            if node.is_end and dfs(i + 1):
                memo[start_idx] = True
                return True
      
        memo[start_idx] = False
        return False
  
    return dfs(0)
4. Performance Degradation with Duplicate Substrings
The current implementation creates new string slices (word[i + 1:]) in each recursive call, which can be inefficient for long words with many possible splits. Each slice creates a new string object.
Solution: Pass indices instead of creating substrings:
def can_form_by_concatenation(word: str, start=0) -> bool:
    if start == len(word):
        return True
  
    node = trie
    for i in range(start, len(word)):
        index = ord(word[i]) - ord('a')
        if node.children[index] is None:
            return False
        node = node.children[index]
        if node.is_end and can_form_by_concatenation(word, i + 1):
            return True
  
    return False
5. Not Handling Case Sensitivity
The current implementation assumes all input words contain only lowercase letters. Mixed case inputs will cause index out of bounds errors.
Solution: Normalize input or add validation:
# Option 1: Normalize
words = [w.lower() for w in words]
# Option 2: Validate
for word in words:
    if not word.islower():
        raise ValueError(f"Word '{word}' contains non-lowercase characters")
Which of these pictures shows the visit order of a depth-first search?

Recommended Readings
https assets algo monster cover_photos dfs svg Depth First Search Prereqs Recursion Review problems recursion_intro Trees problems tree_intro With a solid understanding of recursion under our belts we are now ready to tackle one of the most useful techniques in coding interviews Depth First Search DFS As the name suggests
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
Want a Structured Path to Master System Design Too? Don’t Miss This!