472. Concatenated Words
Problem Description
You are given an array of strings words
that contains no duplicates. Your task is to find all the concatenated words in this array.
A concatenated word is a string that can be formed by combining at least two other shorter words from the same array. The shorter words used to form the concatenated word don't need to be distinct - you can use the same word multiple times.
For example:
- If
words = ["cat", "cats", "dog", "catsdogs"]
, then"catsdogs"
is a concatenated word because it can be formed by combining"cats"
and"dogs"
(or alternatively"cat"
,"s"
,"dog"
,"s"
if those were all in the array). - A word like
"cat"
by itself is not a concatenated word since it cannot be broken down into at least two other words from the array.
The solution uses a Trie data structure combined with dynamic programming. Words are sorted by length and processed from shortest to longest. For each word, the algorithm checks if it can be formed by concatenating existing shorter words in the Trie. If a word can be concatenated, it's added to the result list; otherwise, it's inserted into the Trie for potential use in forming longer words.
The dfs
function recursively checks if a word can be split into valid concatenations by traversing the Trie and looking for complete words (marked by is_end = True
) at each position, then recursively checking if the remaining portion can also be formed from words in the Trie.
Flowchart Walkthrough
First, let's pin down the algorithm using the Flowchart. Here's a step-by-step walkthrough:
Is it a graph?
- Yes: While not explicitly a traditional graph problem, we can model this as a graph where each word is a node, and there's a directed edge from word A to word B if word B starts where word A ends in forming a concatenation. The Trie structure itself represents a graph of character transitions.
Is it a tree?
- Yes: The Trie data structure used in the solution is inherently a tree structure. Each path from root to a leaf represents a word, and we traverse this tree to find valid concatenations.
DFS
- Yes: The solution uses DFS to explore all possible ways to break down a word into concatenations. The
dfs
function recursively traverses the Trie, and whenever it finds a complete word (marked byis_end = True
), it recursively checks if the remaining portion of the string can also be formed from words in the Trie.
Conclusion: The flowchart correctly leads us to DFS as the appropriate algorithm. The problem requires:
- Building a Trie (tree structure) from the words
- Using DFS to recursively check if each word can be decomposed into at least two other words from the array
- The backtracking nature of DFS allows us to explore all possible ways to split a word and return true if any valid concatenation is found
The DFS pattern is essential here because we need to explore all possible splitting points in a word and recursively validate each split, which is naturally handled by the depth-first traversal approach.
Intuition
The key insight is that we need to check if a word can be built from other words in our dictionary. Think of it like building with LEGO blocks - can we construct a bigger piece using smaller pieces we already have?
When we look at a word like "catsdogs"
, we need to check if we can split it into valid words from our array. We could try "c" + "atsdogs"
, "ca" + "tsdogs"
, "cat" + "sdogs"
, "cats" + "dogs"
, and so on. If we find that "cats"
is a valid word in our array, we then need to check if "dogs"
can also be formed from words in our array.
This naturally leads to a recursive approach: at each position in the word, we check if the substring from the start to that position forms a valid word. If it does, we recursively check if the remaining part can also be formed from valid words.
But how do we efficiently check if a substring is a valid word? Scanning through the entire word array each time would be inefficient. This is where the Trie comes in - it allows us to check character by character if we're forming a valid word, and know immediately when we've found a complete word (marked by is_end
).
The clever part of the solution is processing words by length. We start with the shortest words and add them to our Trie. When we check longer words, all potential building blocks (shorter words) are already in the Trie. If a word can be concatenated from others, we add it to our result; if not, it becomes a potential building block for even longer words, so we add it to the Trie.
This approach ensures that when we check if "catsdogs"
is a concatenated word, both "cats"
and "dogs"
are already in our Trie, making the check possible through our DFS traversal.
Solution Approach
The solution implements a Trie-based approach with DFS to identify concatenated words. Here's how the implementation works:
1. Trie Data Structure Setup
First, we define a Trie class to efficiently store and search for words:
- Each Trie node has an array
children
of size 26 (for lowercase letters a-z) - A boolean flag
is_end
marks the end of a complete word - The
insert
method adds words to the Trie by creating nodes for each character
2. Sorting Words by Length
The algorithm sorts the input words by length: words.sort(key=lambda x: len(x))
. This is crucial because:
- Shorter words cannot be concatenated from longer words
- We build our Trie incrementally with shorter words first
- When checking if a longer word is concatenated, all its potential components are already in the Trie
3. DFS Function for Validation
The dfs
function checks if a word can be formed by concatenating other words:
def dfs(w):
if not w:
return True # Empty string means we've successfully split the entire word
The function traverses the Trie character by character:
- For each character, it moves to the corresponding child node
- If no child exists, the current prefix isn't a valid word, so return
False
- When it finds a complete word (
node.is_end = True
), it recursively checks if the remaining substring can also be formed from words in the Trie
4. Main Processing Loop
For each word in the sorted list:
for w in words: if dfs(w): ans.append(w) # Word can be concatenated, add to result else: trie.insert(w) # Word cannot be concatenated, add it as a building block
5. Key Implementation Detail
The recursive call dfs(w[i + 1:])
in the DFS function is critical:
- When we find a valid word at position
i
, we check if the substring fromi+1
onwards can also be formed - This ensures the word is split into at least two parts (the found word and the remaining substring)
- The recursion naturally handles cases where a word can be split into more than two parts
The algorithm's efficiency comes from:
- Using a Trie for O(1) character lookups instead of scanning the word list
- Processing words in length order to ensure all potential components are available
- Memoization through the Trie structure itself (once a word is added, it's instantly available for all future checks)
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through a small example with words = ["cat", "cats", "dog", "catsdogs"]
.
Step 1: Sort by length
After sorting: ["cat", "dog", "cats", "catsdogs"]
Step 2: Process "cat" (length 3)
- Call
dfs("cat")
with empty Trie - Start traversing: 'c' → no child exists in empty Trie
- Return
False
(cannot be formed from other words) - Since DFS returns
False
, insert "cat" into Trie
Trie now contains: "cat"
Step 3: Process "dog" (length 3)
- Call
dfs("dog")
- Start traversing: 'd' → no child for 'd' at root
- Return
False
- Insert "dog" into Trie
Trie now contains: "cat", "dog"
Step 4: Process "cats" (length 4)
- Call
dfs("cats")
- Traverse: 'c' → 'a' → 't' (found
is_end=True
at index 2) - Found "cat" is a complete word, recursively call
dfs("s")
- In
dfs("s")
: 's' → no child for 's' at root - Return
False
- Overall
dfs("cats")
returnsFalse
- Insert "cats" into Trie
Trie now contains: "cat", "cats", "dog"
Step 5: Process "catsdogs" (length 8)
- Call
dfs("catsdogs")
- Traverse: 'c' → 'a' → 't' (found
is_end=True
at index 2)- Found "cat", recursively call
dfs("sdogs")
- In
dfs("sdogs")
: 's' → no match at root, returnFalse
- Found "cat", recursively call
- Continue traversing: 'c' → 'a' → 't' → 's' (found
is_end=True
at index 3)- Found "cats", recursively call
dfs("dogs")
- In
dfs("dogs")
: traverse 'd' → 'o' → 'g' → 's' - No match for "dogs" (only "dog" exists), return
False
- Found "cats", recursively call
- Continue traversing to check if "catsdogs" itself is in Trie
- Reach index 7 ('g'), find
is_end=True
would be at 's' - Actually, let me reconsider...
Step 5 (corrected): Process "catsdogs" (length 8)
- Call
dfs("catsdogs")
- Traverse: 'c' → 'a' → 't' → 's' (found
is_end=True
at index 3)- Found "cats", recursively call
dfs("dogs")
- In
dfs("dogs")
: traverse 'd' → 'o' → 'g' (foundis_end=True
at index 2)- Found "dog", recursively call
dfs("s")
- In
dfs("s")
: 's' → no child at root, returnFalse
- Found "dog", recursively call
- Back in
dfs("dogs")
: continue to check full "dogs" - No "dogs" in Trie (only "dog"), traversal fails
- Return
False
- Found "cats", recursively call
Wait, this example shows "catsdogs" wouldn't be concatenated since "dogs" isn't in our array. Let me use a better example.
Better Example: words = ["cat", "cats", "dog", "catsdog"]
After sorting: ["cat", "dog", "cats", "catsdog"]
Processing "cat", "dog", "cats" as before - all get inserted into Trie.
Process "catsdog" (length 7)
- Call
dfs("catsdog")
- Traverse: 'c' → 'a' → 't' → 's' (found
is_end=True
at index 3)- Found "cats", recursively call
dfs("dog")
- In
dfs("dog")
: traverse 'd' → 'o' → 'g' (foundis_end=True
at index 2)- Found complete word "dog", recursively call
dfs("")
- Empty string returns
True
- Found complete word "dog", recursively call
dfs("dog")
returnsTrue
- Found "cats", recursively call
dfs("catsdog")
returnsTrue
- Add "catsdog" to result list
Final Result: ["catsdog"]
Solution Implementation
1class Trie:
2 """Trie (prefix tree) data structure for efficient string storage and retrieval."""
3
4 def __init__(self):
5 # Array to store 26 children nodes (one for each lowercase letter)
6 self.children = [None] * 26
7 # Flag to mark if current node represents end of a word
8 self.is_end = False
9
10 def insert(self, word: str) -> None:
11 """Insert a word into the trie."""
12 node = self
13 for char in word:
14 # Calculate index for the character (0-25)
15 index = ord(char) - ord('a')
16 # Create new node if path doesn't exist
17 if node.children[index] is None:
18 node.children[index] = Trie()
19 # Move to the child node
20 node = node.children[index]
21 # Mark the last node as end of word
22 node.is_end = True
23
24
25class Solution:
26 def findAllConcatenatedWordsInADict(self, words: List[str]) -> List[str]:
27 """
28 Find all concatenated words in a dictionary.
29 A concatenated word is formed by combining two or more shorter words from the same dictionary.
30 """
31
32 def can_form_by_concatenation(word: str) -> bool:
33 """
34 Check if a word can be formed by concatenating other words in the trie.
35 Uses DFS to try all possible splits of the word.
36 """
37 # Empty string means we successfully split the entire word
38 if not word:
39 return True
40
41 node = trie
42 # Try to match prefixes of the word
43 for i, char in enumerate(word):
44 index = ord(char) - ord('a')
45 # If no matching path exists, word cannot be formed
46 if node.children[index] is None:
47 return False
48
49 node = node.children[index]
50 # If we found a complete word, try to match the remaining part
51 if node.is_end and can_form_by_concatenation(word[i + 1:]):
52 return True
53
54 return False
55
56 # Initialize trie and result list
57 trie = Trie()
58 result = []
59
60 # Sort words by length to ensure shorter words are added to trie first
61 # This allows longer words to be formed from shorter ones
62 words.sort(key=lambda word: len(word))
63
64 for word in words:
65 # Check if current word can be formed by concatenating existing words
66 if can_form_by_concatenation(word):
67 result.append(word)
68 else:
69 # If not, add it to trie for future concatenations
70 trie.insert(word)
71
72 return result
73
1/**
2 * Trie (Prefix Tree) data structure for efficient string storage and retrieval
3 */
4class Trie {
5 // Array to store 26 child nodes (one for each lowercase letter a-z)
6 Trie[] children = new Trie[26];
7 // Flag to mark if current node represents the end of a word
8 boolean isEndOfWord;
9
10 /**
11 * Inserts a word into the trie
12 * @param word The word to be inserted
13 */
14 void insert(String word) {
15 Trie currentNode = this;
16
17 // Traverse through each character in the word
18 for (char character : word.toCharArray()) {
19 // Convert character to index (0-25)
20 int index = character - 'a';
21
22 // Create new node if path doesn't exist
23 if (currentNode.children[index] == null) {
24 currentNode.children[index] = new Trie();
25 }
26
27 // Move to the child node
28 currentNode = currentNode.children[index];
29 }
30
31 // Mark the last node as end of word
32 currentNode.isEndOfWord = true;
33 }
34}
35
36/**
37 * Solution class to find all concatenated words in a dictionary
38 * A concatenated word is formed by combining two or more shorter words from the dictionary
39 */
40class Solution {
41 // Trie to store all words for efficient prefix matching
42 private Trie trie = new Trie();
43
44 /**
45 * Finds all concatenated words in the given array
46 * @param words Array of words to check
47 * @return List of concatenated words
48 */
49 public List<String> findAllConcatenatedWordsInADict(String[] words) {
50 // Sort words by length to ensure shorter words are processed first
51 Arrays.sort(words, (word1, word2) -> word1.length() - word2.length());
52
53 List<String> concatenatedWords = new ArrayList<>();
54
55 // Process each word
56 for (String word : words) {
57 // Check if current word can be formed by concatenating existing words in trie
58 if (canFormByConcatenation(word)) {
59 concatenatedWords.add(word);
60 } else {
61 // If not a concatenated word, add it to trie for future use
62 trie.insert(word);
63 }
64 }
65
66 return concatenatedWords;
67 }
68
69 /**
70 * Checks if a word can be formed by concatenating other words in the trie
71 * Uses DFS to explore all possible word combinations
72 * @param word The word to check
73 * @return true if word can be formed by concatenation, false otherwise
74 */
75 private boolean canFormByConcatenation(String word) {
76 // Base case: empty string means we've successfully matched all parts
77 if ("".equals(word)) {
78 return true;
79 }
80
81 Trie currentNode = trie;
82
83 // Try to match prefixes of the word
84 for (int i = 0; i < word.length(); ++i) {
85 int charIndex = word.charAt(i) - 'a';
86
87 // If no path exists for current character, word cannot be formed
88 if (currentNode.children[charIndex] == null) {
89 return false;
90 }
91
92 // Move to next node
93 currentNode = currentNode.children[charIndex];
94
95 // If current position marks end of a word, try to match remaining part
96 if (currentNode.isEndOfWord && canFormByConcatenation(word.substring(i + 1))) {
97 return true;
98 }
99 }
100
101 // No valid concatenation found
102 return false;
103 }
104}
105
1class Trie {
2public:
3 vector<Trie*> children; // Array of 26 pointers for each lowercase letter
4 bool isEnd; // Flag to mark end of a word
5
6 // Constructor initializes children array with nullptrs and isEnd as false
7 Trie() : children(26), isEnd(false) {}
8
9 // Insert a word into the trie
10 void insert(string word) {
11 Trie* node = this;
12 for (char ch : word) {
13 int index = ch - 'a'; // Convert character to index (0-25)
14 // Create new node if path doesn't exist
15 if (!node->children[index]) {
16 node->children[index] = new Trie();
17 }
18 node = node->children[index]; // Move to next node
19 }
20 node->isEnd = true; // Mark the end of the word
21 }
22};
23
24class Solution {
25public:
26 Trie* trie = new Trie(); // Root of the trie structure
27
28 vector<string> findAllConcatenatedWordsInADict(vector<string>& words) {
29 // Sort words by length to ensure shorter words are added to trie first
30 sort(words.begin(), words.end(), [&](const string& a, const string& b) {
31 return a.size() < b.size();
32 });
33
34 vector<string> result;
35
36 // Process each word
37 for (auto& word : words) {
38 // Check if current word can be formed by concatenating existing words
39 if (dfs(word)) {
40 result.push_back(word); // Add to result if it's a concatenated word
41 } else {
42 trie->insert(word); // Otherwise, add it to trie for future use
43 }
44 }
45
46 return result;
47 }
48
49 // Recursively check if a word can be formed by concatenating words in trie
50 bool dfs(string word) {
51 // Base case: empty string means successful concatenation
52 if (word == "") {
53 return true;
54 }
55
56 Trie* node = trie;
57
58 // Try to match prefixes of the word
59 for (int i = 0; i < word.size(); ++i) {
60 int index = word[i] - 'a';
61
62 // If current character path doesn't exist, word can't be formed
63 if (!node->children[index]) {
64 return false;
65 }
66
67 node = node->children[index];
68
69 // If we found a complete word, try to match the remaining substring
70 if (node->isEnd && dfs(word.substr(i + 1))) {
71 return true;
72 }
73 }
74
75 return false; // No valid concatenation found
76 }
77};
78
1// Trie node structure for efficient word storage and retrieval
2class TrieNode {
3 children: (TrieNode | null)[]; // Array of 26 pointers for each lowercase letter
4 isEnd: boolean; // Flag to mark end of a word
5
6 // Constructor initializes children array with nulls and isEnd as false
7 constructor() {
8 this.children = new Array(26).fill(null);
9 this.isEnd = false;
10 }
11}
12
13// Root of the trie structure
14let trie: TrieNode = new TrieNode();
15
16// Insert a word into the trie
17function insertWord(word: string): void {
18 let node = trie;
19
20 for (const ch of word) {
21 const index = ch.charCodeAt(0) - 'a'.charCodeAt(0); // Convert character to index (0-25)
22
23 // Create new node if path doesn't exist
24 if (!node.children[index]) {
25 node.children[index] = new TrieNode();
26 }
27
28 node = node.children[index]!; // Move to next node
29 }
30
31 node.isEnd = true; // Mark the end of the word
32}
33
34// Recursively check if a word can be formed by concatenating words in trie
35function dfs(word: string): boolean {
36 // Base case: empty string means successful concatenation
37 if (word === "") {
38 return true;
39 }
40
41 let node = trie;
42
43 // Try to match prefixes of the word
44 for (let i = 0; i < word.length; i++) {
45 const index = word.charCodeAt(i) - 'a'.charCodeAt(0);
46
47 // If current character path doesn't exist, word can't be formed
48 if (!node.children[index]) {
49 return false;
50 }
51
52 node = node.children[index]!;
53
54 // If we found a complete word, try to match the remaining substring
55 if (node.isEnd && dfs(word.substring(i + 1))) {
56 return true;
57 }
58 }
59
60 return false; // No valid concatenation found
61}
62
63// Main function to find all concatenated words in a dictionary
64function findAllConcatenatedWordsInADict(words: string[]): string[] {
65 // Reset trie for each function call
66 trie = new TrieNode();
67
68 // Sort words by length to ensure shorter words are added to trie first
69 words.sort((a, b) => a.length - b.length);
70
71 const result: string[] = [];
72
73 // Process each word
74 for (const word of words) {
75 // Check if current word can be formed by concatenating existing words
76 if (dfs(word)) {
77 result.push(word); // Add to result if it's a concatenated word
78 } else {
79 insertWord(word); // Otherwise, add it to trie for future use
80 }
81 }
82
83 return result;
84}
85
Time and Space Complexity
Time Complexity: O(n * m^2)
where n
is the number of words and m
is the average length of words.
- Sorting the words takes
O(n log n)
time - For each word, we perform a DFS traversal:
- In the worst case, for each character position
i
in a word of lengthm
, we might check if it forms a valid word (takingO(m)
time to traverse the trie) - Then recursively call DFS on the remaining substring
- This can lead to checking all possible split positions, resulting in
O(m^2)
time per word in the worst case
- In the worst case, for each character position
- The trie insertion for non-concatenated words takes
O(m)
time per word - Overall:
O(n log n + n * m^2)
=O(n * m^2)
(since typicallym^2 > log n
)
Space Complexity: O(n * m * 26)
which simplifies to O(n * m)
- The Trie structure stores all words:
- In the worst case, if all words are completely different, we need
O(n * m)
nodes - Each node contains an array of 26 pointers, so technically
O(n * m * 26)
- Since 26 is a constant, this simplifies to
O(n * m)
- In the worst case, if all words are completely different, we need
- The recursion stack for DFS can go up to
O(m)
depth in the worst case - The result list stores at most
O(n)
words - Overall:
O(n * m)
for the trie structure dominates the space complexity
Common Pitfalls
1. Forgetting to Handle Empty Words
The algorithm assumes all words have at least one character. If the input contains empty strings, the code will add them to the Trie and potentially cause incorrect results when checking concatenations.
Solution: Filter out empty strings before processing:
words = [w for w in words if w] # Remove empty strings
2. Incorrect Base Case - Accepting Single Word as Valid
A critical pitfall is not ensuring that a word is formed from at least two other words. The current implementation correctly handles this, but a common mistake would be to check if node.is_end
is True at the end of the DFS without ensuring we've made at least one split:
# WRONG: This would accept single words as "concatenated"
def can_form_by_concatenation(word: str, split_count=0) -> bool:
if not word:
return split_count > 0 # Should be split_count >= 2
Solution: The current implementation correctly avoids this by only adding words to the result if DFS returns True (meaning a valid split was found), and never inserting a word that can be formed by concatenation back into the Trie.
3. Stack Overflow with Deep Recursion
For very long words or deeply nested concatenations, the recursive DFS might hit Python's recursion limit. For example, if you have many single-character words and a very long concatenated word.
Solution: Use iterative DFS with explicit stack or add memoization:
def can_form_by_concatenation(word: str) -> bool:
memo = {}
def dfs(start_idx):
if start_idx == len(word):
return True
if start_idx in memo:
return memo[start_idx]
node = trie
for i in range(start_idx, len(word)):
index = ord(word[i]) - ord('a')
if node.children[index] is None:
break
node = node.children[index]
if node.is_end and dfs(i + 1):
memo[start_idx] = True
return True
memo[start_idx] = False
return False
return dfs(0)
4. Performance Degradation with Duplicate Substrings
The current implementation creates new string slices (word[i + 1:]
) in each recursive call, which can be inefficient for long words with many possible splits. Each slice creates a new string object.
Solution: Pass indices instead of creating substrings:
def can_form_by_concatenation(word: str, start=0) -> bool:
if start == len(word):
return True
node = trie
for i in range(start, len(word)):
index = ord(word[i]) - ord('a')
if node.children[index] is None:
return False
node = node.children[index]
if node.is_end and can_form_by_concatenation(word, i + 1):
return True
return False
5. Not Handling Case Sensitivity
The current implementation assumes all input words contain only lowercase letters. Mixed case inputs will cause index out of bounds errors.
Solution: Normalize input or add validation:
# Option 1: Normalize
words = [w.lower() for w in words]
# Option 2: Validate
for word in words:
if not word.islower():
raise ValueError(f"Word '{word}' contains non-lowercase characters")
What does the following code do?
1def f(arr1, arr2):
2 i, j = 0, 0
3 new_arr = []
4 while i < len(arr1) and j < len(arr2):
5 if arr1[i] < arr2[j]:
6 new_arr.append(arr1[i])
7 i += 1
8 else:
9 new_arr.append(arr2[j])
10 j += 1
11 new_arr.extend(arr1[i:])
12 new_arr.extend(arr2[j:])
13 return new_arr
14
1public static List<Integer> f(int[] arr1, int[] arr2) {
2 int i = 0, j = 0;
3 List<Integer> newArr = new ArrayList<>();
4
5 while (i < arr1.length && j < arr2.length) {
6 if (arr1[i] < arr2[j]) {
7 newArr.add(arr1[i]);
8 i++;
9 } else {
10 newArr.add(arr2[j]);
11 j++;
12 }
13 }
14
15 while (i < arr1.length) {
16 newArr.add(arr1[i]);
17 i++;
18 }
19
20 while (j < arr2.length) {
21 newArr.add(arr2[j]);
22 j++;
23 }
24
25 return newArr;
26}
27
1function f(arr1, arr2) {
2 let i = 0, j = 0;
3 let newArr = [];
4
5 while (i < arr1.length && j < arr2.length) {
6 if (arr1[i] < arr2[j]) {
7 newArr.push(arr1[i]);
8 i++;
9 } else {
10 newArr.push(arr2[j]);
11 j++;
12 }
13 }
14
15 while (i < arr1.length) {
16 newArr.push(arr1[i]);
17 i++;
18 }
19
20 while (j < arr2.length) {
21 newArr.push(arr2[j]);
22 j++;
23 }
24
25 return newArr;
26}
27
Recommended Readings
https assets algo monster cover_photos dfs svg Depth First Search Prereqs Recursion Review problems recursion_intro Trees problems tree_intro With a solid understanding of recursion under our belts we are now ready to tackle one of the most useful techniques in coding interviews Depth First Search DFS As the name suggests
Trie Introduction Imagine you have a magic dictionary where as soon as you start spelling a word it immediately tells you how many words that start with those letters How cool would that be This is exactly the kind of magic a data structure called trie performs What's a Trie
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
Want a Structured Path to Master System Design Too? Don’t Miss This!