Facebook Pixel

472. Concatenated Words

Problem Description

You are given an array of strings words that contains no duplicates. Your task is to find all the concatenated words in this array.

A concatenated word is a string that can be formed by combining at least two other shorter words from the same array. The shorter words used to form the concatenated word don't need to be distinct - you can use the same word multiple times.

For example:

  • If words = ["cat", "cats", "dog", "catsdogs"], then "catsdogs" is a concatenated word because it can be formed by combining "cats" and "dogs" (or alternatively "cat", "s", "dog", "s" if those were all in the array).
  • A word like "cat" by itself is not a concatenated word since it cannot be broken down into at least two other words from the array.

The solution uses a Trie data structure combined with dynamic programming. Words are sorted by length and processed from shortest to longest. For each word, the algorithm checks if it can be formed by concatenating existing shorter words in the Trie. If a word can be concatenated, it's added to the result list; otherwise, it's inserted into the Trie for potential use in forming longer words.

The dfs function recursively checks if a word can be split into valid concatenations by traversing the Trie and looking for complete words (marked by is_end = True) at each position, then recursively checking if the remaining portion can also be formed from words in the Trie.

Flowchart Walkthrough

First, let's pin down the algorithm using the Flowchart. Here's a step-by-step walkthrough:

Is it a graph?

  • Yes: While not explicitly a traditional graph problem, we can model this as a graph where each word is a node, and there's a directed edge from word A to word B if word B starts where word A ends in forming a concatenation. The Trie structure itself represents a graph of character transitions.

Is it a tree?

  • Yes: The Trie data structure used in the solution is inherently a tree structure. Each path from root to a leaf represents a word, and we traverse this tree to find valid concatenations.

DFS

  • Yes: The solution uses DFS to explore all possible ways to break down a word into concatenations. The dfs function recursively traverses the Trie, and whenever it finds a complete word (marked by is_end = True), it recursively checks if the remaining portion of the string can also be formed from words in the Trie.

Conclusion: The flowchart correctly leads us to DFS as the appropriate algorithm. The problem requires:

  1. Building a Trie (tree structure) from the words
  2. Using DFS to recursively check if each word can be decomposed into at least two other words from the array
  3. The backtracking nature of DFS allows us to explore all possible ways to split a word and return true if any valid concatenation is found

The DFS pattern is essential here because we need to explore all possible splitting points in a word and recursively validate each split, which is naturally handled by the depth-first traversal approach.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is that we need to check if a word can be built from other words in our dictionary. Think of it like building with LEGO blocks - can we construct a bigger piece using smaller pieces we already have?

When we look at a word like "catsdogs", we need to check if we can split it into valid words from our array. We could try "c" + "atsdogs", "ca" + "tsdogs", "cat" + "sdogs", "cats" + "dogs", and so on. If we find that "cats" is a valid word in our array, we then need to check if "dogs" can also be formed from words in our array.

This naturally leads to a recursive approach: at each position in the word, we check if the substring from the start to that position forms a valid word. If it does, we recursively check if the remaining part can also be formed from valid words.

But how do we efficiently check if a substring is a valid word? Scanning through the entire word array each time would be inefficient. This is where the Trie comes in - it allows us to check character by character if we're forming a valid word, and know immediately when we've found a complete word (marked by is_end).

The clever part of the solution is processing words by length. We start with the shortest words and add them to our Trie. When we check longer words, all potential building blocks (shorter words) are already in the Trie. If a word can be concatenated from others, we add it to our result; if not, it becomes a potential building block for even longer words, so we add it to the Trie.

This approach ensures that when we check if "catsdogs" is a concatenated word, both "cats" and "dogs" are already in our Trie, making the check possible through our DFS traversal.

Solution Approach

The solution implements a Trie-based approach with DFS to identify concatenated words. Here's how the implementation works:

1. Trie Data Structure Setup

First, we define a Trie class to efficiently store and search for words:

  • Each Trie node has an array children of size 26 (for lowercase letters a-z)
  • A boolean flag is_end marks the end of a complete word
  • The insert method adds words to the Trie by creating nodes for each character

2. Sorting Words by Length

The algorithm sorts the input words by length: words.sort(key=lambda x: len(x)). This is crucial because:

  • Shorter words cannot be concatenated from longer words
  • We build our Trie incrementally with shorter words first
  • When checking if a longer word is concatenated, all its potential components are already in the Trie

3. DFS Function for Validation

The dfs function checks if a word can be formed by concatenating other words:

def dfs(w):
    if not w:
        return True  # Empty string means we've successfully split the entire word

The function traverses the Trie character by character:

  • For each character, it moves to the corresponding child node
  • If no child exists, the current prefix isn't a valid word, so return False
  • When it finds a complete word (node.is_end = True), it recursively checks if the remaining substring can also be formed from words in the Trie

4. Main Processing Loop

For each word in the sorted list:

for w in words:
    if dfs(w):
        ans.append(w)  # Word can be concatenated, add to result
    else:
        trie.insert(w)  # Word cannot be concatenated, add it as a building block

5. Key Implementation Detail

The recursive call dfs(w[i + 1:]) in the DFS function is critical:

  • When we find a valid word at position i, we check if the substring from i+1 onwards can also be formed
  • This ensures the word is split into at least two parts (the found word and the remaining substring)
  • The recursion naturally handles cases where a word can be split into more than two parts

The algorithm's efficiency comes from:

  • Using a Trie for O(1) character lookups instead of scanning the word list
  • Processing words in length order to ensure all potential components are available
  • Memoization through the Trie structure itself (once a word is added, it's instantly available for all future checks)

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a small example with words = ["cat", "cats", "dog", "catsdogs"].

Step 1: Sort by length After sorting: ["cat", "dog", "cats", "catsdogs"]

Step 2: Process "cat" (length 3)

  • Call dfs("cat") with empty Trie
  • Start traversing: 'c' → no child exists in empty Trie
  • Return False (cannot be formed from other words)
  • Since DFS returns False, insert "cat" into Trie

Trie now contains: "cat"

Step 3: Process "dog" (length 3)

  • Call dfs("dog")
  • Start traversing: 'd' → no child for 'd' at root
  • Return False
  • Insert "dog" into Trie

Trie now contains: "cat", "dog"

Step 4: Process "cats" (length 4)

  • Call dfs("cats")
  • Traverse: 'c' → 'a' → 't' (found is_end=True at index 2)
  • Found "cat" is a complete word, recursively call dfs("s")
  • In dfs("s"): 's' → no child for 's' at root
  • Return False
  • Overall dfs("cats") returns False
  • Insert "cats" into Trie

Trie now contains: "cat", "cats", "dog"

Step 5: Process "catsdogs" (length 8)

  • Call dfs("catsdogs")
  • Traverse: 'c' → 'a' → 't' (found is_end=True at index 2)
    • Found "cat", recursively call dfs("sdogs")
    • In dfs("sdogs"): 's' → no match at root, return False
  • Continue traversing: 'c' → 'a' → 't' → 's' (found is_end=True at index 3)
    • Found "cats", recursively call dfs("dogs")
    • In dfs("dogs"): traverse 'd' → 'o' → 'g' → 's'
    • No match for "dogs" (only "dog" exists), return False
  • Continue traversing to check if "catsdogs" itself is in Trie
  • Reach index 7 ('g'), find is_end=True would be at 's'
  • Actually, let me reconsider...

Step 5 (corrected): Process "catsdogs" (length 8)

  • Call dfs("catsdogs")
  • Traverse: 'c' → 'a' → 't' → 's' (found is_end=True at index 3)
    • Found "cats", recursively call dfs("dogs")
    • In dfs("dogs"): traverse 'd' → 'o' → 'g' (found is_end=True at index 2)
      • Found "dog", recursively call dfs("s")
      • In dfs("s"): 's' → no child at root, return False
    • Back in dfs("dogs"): continue to check full "dogs"
    • No "dogs" in Trie (only "dog"), traversal fails
    • Return False

Wait, this example shows "catsdogs" wouldn't be concatenated since "dogs" isn't in our array. Let me use a better example.

Better Example: words = ["cat", "cats", "dog", "catsdog"]

After sorting: ["cat", "dog", "cats", "catsdog"]

Processing "cat", "dog", "cats" as before - all get inserted into Trie.

Process "catsdog" (length 7)

  • Call dfs("catsdog")
  • Traverse: 'c' → 'a' → 't' → 's' (found is_end=True at index 3)
    • Found "cats", recursively call dfs("dog")
    • In dfs("dog"): traverse 'd' → 'o' → 'g' (found is_end=True at index 2)
      • Found complete word "dog", recursively call dfs("")
      • Empty string returns True
    • dfs("dog") returns True
  • dfs("catsdog") returns True
  • Add "catsdog" to result list

Final Result: ["catsdog"]

Solution Implementation

1class Trie:
2    """Trie (prefix tree) data structure for efficient string storage and retrieval."""
3  
4    def __init__(self):
5        # Array to store 26 children nodes (one for each lowercase letter)
6        self.children = [None] * 26
7        # Flag to mark if current node represents end of a word
8        self.is_end = False
9
10    def insert(self, word: str) -> None:
11        """Insert a word into the trie."""
12        node = self
13        for char in word:
14            # Calculate index for the character (0-25)
15            index = ord(char) - ord('a')
16            # Create new node if path doesn't exist
17            if node.children[index] is None:
18                node.children[index] = Trie()
19            # Move to the child node
20            node = node.children[index]
21        # Mark the last node as end of word
22        node.is_end = True
23
24
25class Solution:
26    def findAllConcatenatedWordsInADict(self, words: List[str]) -> List[str]:
27        """
28        Find all concatenated words in a dictionary.
29        A concatenated word is formed by combining two or more shorter words from the same dictionary.
30        """
31      
32        def can_form_by_concatenation(word: str) -> bool:
33            """
34            Check if a word can be formed by concatenating other words in the trie.
35            Uses DFS to try all possible splits of the word.
36            """
37            # Empty string means we successfully split the entire word
38            if not word:
39                return True
40          
41            node = trie
42            # Try to match prefixes of the word
43            for i, char in enumerate(word):
44                index = ord(char) - ord('a')
45                # If no matching path exists, word cannot be formed
46                if node.children[index] is None:
47                    return False
48              
49                node = node.children[index]
50                # If we found a complete word, try to match the remaining part
51                if node.is_end and can_form_by_concatenation(word[i + 1:]):
52                    return True
53          
54            return False
55
56        # Initialize trie and result list
57        trie = Trie()
58        result = []
59      
60        # Sort words by length to ensure shorter words are added to trie first
61        # This allows longer words to be formed from shorter ones
62        words.sort(key=lambda word: len(word))
63      
64        for word in words:
65            # Check if current word can be formed by concatenating existing words
66            if can_form_by_concatenation(word):
67                result.append(word)
68            else:
69                # If not, add it to trie for future concatenations
70                trie.insert(word)
71      
72        return result
73
1/**
2 * Trie (Prefix Tree) data structure for efficient string storage and retrieval
3 */
4class Trie {
5    // Array to store 26 child nodes (one for each lowercase letter a-z)
6    Trie[] children = new Trie[26];
7    // Flag to mark if current node represents the end of a word
8    boolean isEndOfWord;
9
10    /**
11     * Inserts a word into the trie
12     * @param word The word to be inserted
13     */
14    void insert(String word) {
15        Trie currentNode = this;
16      
17        // Traverse through each character in the word
18        for (char character : word.toCharArray()) {
19            // Convert character to index (0-25)
20            int index = character - 'a';
21          
22            // Create new node if path doesn't exist
23            if (currentNode.children[index] == null) {
24                currentNode.children[index] = new Trie();
25            }
26          
27            // Move to the child node
28            currentNode = currentNode.children[index];
29        }
30      
31        // Mark the last node as end of word
32        currentNode.isEndOfWord = true;
33    }
34}
35
36/**
37 * Solution class to find all concatenated words in a dictionary
38 * A concatenated word is formed by combining two or more shorter words from the dictionary
39 */
40class Solution {
41    // Trie to store all words for efficient prefix matching
42    private Trie trie = new Trie();
43
44    /**
45     * Finds all concatenated words in the given array
46     * @param words Array of words to check
47     * @return List of concatenated words
48     */
49    public List<String> findAllConcatenatedWordsInADict(String[] words) {
50        // Sort words by length to ensure shorter words are processed first
51        Arrays.sort(words, (word1, word2) -> word1.length() - word2.length());
52      
53        List<String> concatenatedWords = new ArrayList<>();
54      
55        // Process each word
56        for (String word : words) {
57            // Check if current word can be formed by concatenating existing words in trie
58            if (canFormByConcatenation(word)) {
59                concatenatedWords.add(word);
60            } else {
61                // If not a concatenated word, add it to trie for future use
62                trie.insert(word);
63            }
64        }
65      
66        return concatenatedWords;
67    }
68
69    /**
70     * Checks if a word can be formed by concatenating other words in the trie
71     * Uses DFS to explore all possible word combinations
72     * @param word The word to check
73     * @return true if word can be formed by concatenation, false otherwise
74     */
75    private boolean canFormByConcatenation(String word) {
76        // Base case: empty string means we've successfully matched all parts
77        if ("".equals(word)) {
78            return true;
79        }
80      
81        Trie currentNode = trie;
82      
83        // Try to match prefixes of the word
84        for (int i = 0; i < word.length(); ++i) {
85            int charIndex = word.charAt(i) - 'a';
86          
87            // If no path exists for current character, word cannot be formed
88            if (currentNode.children[charIndex] == null) {
89                return false;
90            }
91          
92            // Move to next node
93            currentNode = currentNode.children[charIndex];
94          
95            // If current position marks end of a word, try to match remaining part
96            if (currentNode.isEndOfWord && canFormByConcatenation(word.substring(i + 1))) {
97                return true;
98            }
99        }
100      
101        // No valid concatenation found
102        return false;
103    }
104}
105
1class Trie {
2public:
3    vector<Trie*> children;  // Array of 26 pointers for each lowercase letter
4    bool isEnd;              // Flag to mark end of a word
5  
6    // Constructor initializes children array with nullptrs and isEnd as false
7    Trie() : children(26), isEnd(false) {}
8
9    // Insert a word into the trie
10    void insert(string word) {
11        Trie* node = this;
12        for (char ch : word) {
13            int index = ch - 'a';  // Convert character to index (0-25)
14            // Create new node if path doesn't exist
15            if (!node->children[index]) {
16                node->children[index] = new Trie();
17            }
18            node = node->children[index];  // Move to next node
19        }
20        node->isEnd = true;  // Mark the end of the word
21    }
22};
23
24class Solution {
25public:
26    Trie* trie = new Trie();  // Root of the trie structure
27
28    vector<string> findAllConcatenatedWordsInADict(vector<string>& words) {
29        // Sort words by length to ensure shorter words are added to trie first
30        sort(words.begin(), words.end(), [&](const string& a, const string& b) {
31            return a.size() < b.size();
32        });
33      
34        vector<string> result;
35      
36        // Process each word
37        for (auto& word : words) {
38            // Check if current word can be formed by concatenating existing words
39            if (dfs(word)) {
40                result.push_back(word);  // Add to result if it's a concatenated word
41            } else {
42                trie->insert(word);  // Otherwise, add it to trie for future use
43            }
44        }
45      
46        return result;
47    }
48
49    // Recursively check if a word can be formed by concatenating words in trie
50    bool dfs(string word) {
51        // Base case: empty string means successful concatenation
52        if (word == "") {
53            return true;
54        }
55      
56        Trie* node = trie;
57      
58        // Try to match prefixes of the word
59        for (int i = 0; i < word.size(); ++i) {
60            int index = word[i] - 'a';
61          
62            // If current character path doesn't exist, word can't be formed
63            if (!node->children[index]) {
64                return false;
65            }
66          
67            node = node->children[index];
68          
69            // If we found a complete word, try to match the remaining substring
70            if (node->isEnd && dfs(word.substr(i + 1))) {
71                return true;
72            }
73        }
74      
75        return false;  // No valid concatenation found
76    }
77};
78
1// Trie node structure for efficient word storage and retrieval
2class TrieNode {
3    children: (TrieNode | null)[];  // Array of 26 pointers for each lowercase letter
4    isEnd: boolean;                  // Flag to mark end of a word
5  
6    // Constructor initializes children array with nulls and isEnd as false
7    constructor() {
8        this.children = new Array(26).fill(null);
9        this.isEnd = false;
10    }
11}
12
13// Root of the trie structure
14let trie: TrieNode = new TrieNode();
15
16// Insert a word into the trie
17function insertWord(word: string): void {
18    let node = trie;
19  
20    for (const ch of word) {
21        const index = ch.charCodeAt(0) - 'a'.charCodeAt(0);  // Convert character to index (0-25)
22      
23        // Create new node if path doesn't exist
24        if (!node.children[index]) {
25            node.children[index] = new TrieNode();
26        }
27      
28        node = node.children[index]!;  // Move to next node
29    }
30  
31    node.isEnd = true;  // Mark the end of the word
32}
33
34// Recursively check if a word can be formed by concatenating words in trie
35function dfs(word: string): boolean {
36    // Base case: empty string means successful concatenation
37    if (word === "") {
38        return true;
39    }
40  
41    let node = trie;
42  
43    // Try to match prefixes of the word
44    for (let i = 0; i < word.length; i++) {
45        const index = word.charCodeAt(i) - 'a'.charCodeAt(0);
46      
47        // If current character path doesn't exist, word can't be formed
48        if (!node.children[index]) {
49            return false;
50        }
51      
52        node = node.children[index]!;
53      
54        // If we found a complete word, try to match the remaining substring
55        if (node.isEnd && dfs(word.substring(i + 1))) {
56            return true;
57        }
58    }
59  
60    return false;  // No valid concatenation found
61}
62
63// Main function to find all concatenated words in a dictionary
64function findAllConcatenatedWordsInADict(words: string[]): string[] {
65    // Reset trie for each function call
66    trie = new TrieNode();
67  
68    // Sort words by length to ensure shorter words are added to trie first
69    words.sort((a, b) => a.length - b.length);
70  
71    const result: string[] = [];
72  
73    // Process each word
74    for (const word of words) {
75        // Check if current word can be formed by concatenating existing words
76        if (dfs(word)) {
77            result.push(word);  // Add to result if it's a concatenated word
78        } else {
79            insertWord(word);   // Otherwise, add it to trie for future use
80        }
81    }
82  
83    return result;
84}
85

Time and Space Complexity

Time Complexity: O(n * m^2) where n is the number of words and m is the average length of words.

  • Sorting the words takes O(n log n) time
  • For each word, we perform a DFS traversal:
    • In the worst case, for each character position i in a word of length m, we might check if it forms a valid word (taking O(m) time to traverse the trie)
    • Then recursively call DFS on the remaining substring
    • This can lead to checking all possible split positions, resulting in O(m^2) time per word in the worst case
  • The trie insertion for non-concatenated words takes O(m) time per word
  • Overall: O(n log n + n * m^2) = O(n * m^2) (since typically m^2 > log n)

Space Complexity: O(n * m * 26) which simplifies to O(n * m)

  • The Trie structure stores all words:
    • In the worst case, if all words are completely different, we need O(n * m) nodes
    • Each node contains an array of 26 pointers, so technically O(n * m * 26)
    • Since 26 is a constant, this simplifies to O(n * m)
  • The recursion stack for DFS can go up to O(m) depth in the worst case
  • The result list stores at most O(n) words
  • Overall: O(n * m) for the trie structure dominates the space complexity

Common Pitfalls

1. Forgetting to Handle Empty Words

The algorithm assumes all words have at least one character. If the input contains empty strings, the code will add them to the Trie and potentially cause incorrect results when checking concatenations.

Solution: Filter out empty strings before processing:

words = [w for w in words if w]  # Remove empty strings

2. Incorrect Base Case - Accepting Single Word as Valid

A critical pitfall is not ensuring that a word is formed from at least two other words. The current implementation correctly handles this, but a common mistake would be to check if node.is_end is True at the end of the DFS without ensuring we've made at least one split:

# WRONG: This would accept single words as "concatenated"
def can_form_by_concatenation(word: str, split_count=0) -> bool:
    if not word:
        return split_count > 0  # Should be split_count >= 2

Solution: The current implementation correctly avoids this by only adding words to the result if DFS returns True (meaning a valid split was found), and never inserting a word that can be formed by concatenation back into the Trie.

3. Stack Overflow with Deep Recursion

For very long words or deeply nested concatenations, the recursive DFS might hit Python's recursion limit. For example, if you have many single-character words and a very long concatenated word.

Solution: Use iterative DFS with explicit stack or add memoization:

def can_form_by_concatenation(word: str) -> bool:
    memo = {}
  
    def dfs(start_idx):
        if start_idx == len(word):
            return True
        if start_idx in memo:
            return memo[start_idx]
      
        node = trie
        for i in range(start_idx, len(word)):
            index = ord(word[i]) - ord('a')
            if node.children[index] is None:
                break
            node = node.children[index]
            if node.is_end and dfs(i + 1):
                memo[start_idx] = True
                return True
      
        memo[start_idx] = False
        return False
  
    return dfs(0)

4. Performance Degradation with Duplicate Substrings

The current implementation creates new string slices (word[i + 1:]) in each recursive call, which can be inefficient for long words with many possible splits. Each slice creates a new string object.

Solution: Pass indices instead of creating substrings:

def can_form_by_concatenation(word: str, start=0) -> bool:
    if start == len(word):
        return True
  
    node = trie
    for i in range(start, len(word)):
        index = ord(word[i]) - ord('a')
        if node.children[index] is None:
            return False
        node = node.children[index]
        if node.is_end and can_form_by_concatenation(word, i + 1):
            return True
  
    return False

5. Not Handling Case Sensitivity

The current implementation assumes all input words contain only lowercase letters. Mixed case inputs will cause index out of bounds errors.

Solution: Normalize input or add validation:

# Option 1: Normalize
words = [w.lower() for w in words]

# Option 2: Validate
for word in words:
    if not word.islower():
        raise ValueError(f"Word '{word}' contains non-lowercase characters")
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

What does the following code do?

1def f(arr1, arr2):
2  i, j = 0, 0
3  new_arr = []
4  while i < len(arr1) and j < len(arr2):
5      if arr1[i] < arr2[j]:
6          new_arr.append(arr1[i])
7          i += 1
8      else:
9          new_arr.append(arr2[j])
10          j += 1
11  new_arr.extend(arr1[i:])
12  new_arr.extend(arr2[j:])
13  return new_arr
14
1public static List<Integer> f(int[] arr1, int[] arr2) {
2  int i = 0, j = 0;
3  List<Integer> newArr = new ArrayList<>();
4
5  while (i < arr1.length && j < arr2.length) {
6      if (arr1[i] < arr2[j]) {
7          newArr.add(arr1[i]);
8          i++;
9      } else {
10          newArr.add(arr2[j]);
11          j++;
12      }
13  }
14
15  while (i < arr1.length) {
16      newArr.add(arr1[i]);
17      i++;
18  }
19
20  while (j < arr2.length) {
21      newArr.add(arr2[j]);
22      j++;
23  }
24
25  return newArr;
26}
27
1function f(arr1, arr2) {
2  let i = 0, j = 0;
3  let newArr = [];
4  
5  while (i < arr1.length && j < arr2.length) {
6      if (arr1[i] < arr2[j]) {
7          newArr.push(arr1[i]);
8          i++;
9      } else {
10          newArr.push(arr2[j]);
11          j++;
12      }
13  }
14  
15  while (i < arr1.length) {
16      newArr.push(arr1[i]);
17      i++;
18  }
19  
20  while (j < arr2.length) {
21      newArr.push(arr2[j]);
22      j++;
23  }
24  
25  return newArr;
26}
27

Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More