Leetcode 472. Concatenated Words

Problem Explanation

Given a list of words, the problem is to find all those words which are actually a merge of at least two different words present in the same list. It means they are actually a concatenation of smaller words present in the list. The words are all lowercase and there are no duplicates.

For example, if given list is ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"], then the words "catsdogcats", "dogcatsdog", and "ratcatdogcat" are concatenations of "cat", "cats", "dog", "rat" and thus can be obtained using these smaller words present in the list.

Solution Approach

The solution uses Depth-First Search (DFS) mechanism with backtracking and memorization. The algorithm steps are as follows:

  1. The given list of words is iterated and checked if it can be obtained by concatenation of other words. This is done using a support function isConcat().
  2. If it can be, that word is added to the final result list.
  3. In isConcat() function, each substring is checked if it is a part of the words list, moving ahead in small steps.
  4. The function checks the beginning part of the string in the words list, and then recursively the rest of it. If both exist in the words list, return True.
  5. Using memorization, if a word is already checked and found to be a concatenation of words, it is directly returned without re-iterating which optimizes the solution.

While DFS allows each possibility to be exhausted, memorization helps quicken up the process of fetching words without needing to recurse each time.

Illustrative Example

Suppose given list is ["cat", "cats", "dog", "catsdogcats"], the steps occur as follows:

  • Word: cat, it is a one-word string, not a concatenation of strings. Discard.
  • Word: cats, it is also a one-word string. Discard.
  • Word: dog, it is also a one-word string. Discard.
  • Word: catsdogcats, it can be obtained as concatenation of "cats", "dog", "cats". It is added to the final list.

Solutions

Python

1
2python
3from collections import defaultdict
4class Solution:
5    def findAllConcatenatedWordsInADict(self, words):
6        words = sorted(words, key=len)
7        dic = defaultdict(bool)
8        res = []
9        for word in words:
10            if self.hasWord(dic, word):
11                res.append(word)
12            dic[word] = True
13        return res
14    
15    def hasWord(self, dic, word):
16        if word in dic: return dic[word]
17        for i in range(1,len(word)):
18            if word[:i] in dic and dic[word[:i]]:
19                if word[i:] in dic or self.hasWord(dic, word[i:]):
20                    return True
21        return False

Java

1
2java
3import java.util.*;
4class Solution {
5    public List<String> findAllConcatenatedWordsInADict(String[] words) {
6        List<String> result = new ArrayList<>(); 
7        Set<String> preWords = new HashSet<>(); 
8        Arrays.sort(words, (s1, s2) -> s1.length() - s2.length());
9        
10        for(int i=0; i<words.length; i++)  {
11            if(canForm(words[i], preWords))
12                result.add(words[i]);
13            preWords.add(words[i]);
14        }
15        return result;
16    }
17    
18    public boolean canForm(String word, Set<String> preWords) {
19        if(preWords.isEmpty()) 
20            return false;  
21        
22        boolean[] dp = new boolean[word.length() + 1];
23        dp[0] = true; // set true to build substrings
24        for (int i = 1; i <= word.length(); i++) {  
25            for (int j = 0; j < i; j++) {  
26                if (!dp[j]) 
27                    continue; 
28                if (preWords.contains(word.substring(j, i))) {
29                    dp[i] = true; // Once one substring is detected, break from inner loop
30                    break;
31                }
32            }
33        }
34        return dp[word.length()];
35    }
36}

JavaScript

1
2javascript
3var findAllConcatenatedWordsInADict = function(words) {
4    words.sort((a,b) => a.length - b.length);
5    let wordDict = new Set();
6    let result = [];
7    for (let word of words) {
8        if(canForm(word, wordDict)){
9            result.push(word);
10        }
11        wordDict.add(word);
12    }
13    return result;
14};
15
16var canForm = function(word, wordDict) {
17    if(wordDict.size === 0) return false;
18    let dp = new Array(word.length+1).fill(false);
19    dp[0] = true;
20    for(let i = 1; i <= word.length; i++){
21        for(let j = 0; j < i; j++){
22            if(!dp[j]) continue;
23            if(wordDict.has(word.substring(j, i))){
24                dp[i] = true;
25                break;
26            }
27        } 
28    }
29    return dp[word.length];
30};

C++

1
2C++
3class Solution {
4public:
5    vector<string> findAllConcatenatedWordsInADict(vector<string>& words) {
6        sort(words.begin(), words.end(), [](string& a, string& b){ 
7            return a.size() < b.size(); 
8        });
9        vector<string> res;
10        unordered_set<string> preWords;  
11        
12        for (int i = 0; i < words.size(); i++) {
13            if (canForm(words[i], preWords))
14                res.push_back(words[i]);
15            preWords.insert(words[i]);  
16        }
17        return res;
18    }
19    
20    bool canForm(string& word, unordered_set<string>& dict) {
21        if (dict.empty()) return false;
22        int len = word.size();
23        bool dp[len+1];
24        memset(dp, 0, sizeof(dp));
25        dp[0] = true;
26        
27        for (int i = 1; i <= len; i++) {
28            for (int j = 0; j < i; j++) {
29                if (!dp[j]) continue;
30                if (dict.count(word.substr(j, i-j))) {
31                    dp[i] = true;
32                    break;
33                }
34            }
35        }
36        return dp[len];
37    }
38};

C#

1
2csharp
3public class Solution {
4    public IList<string> FindAllConcatenatedWordsInADict(string[] words) {
5        Array.Sort(words, (s1, s2) => s1.Length - s2.Length);
6        HashSet<string> preWords = new HashSet<string>(); 
7        List<string> result = new List<string>(); 
8        foreach(string word in words){
9            if(canForm(word, preWords))
10                result.Add(word);
11            preWords.Add(word);
12        }
13        return result;
14    }
15    
16    public bool canForm(string word, HashSet<string> preWords) {
17        if(preWords.Count == 0) return false;
18        bool[] dp = new bool[word.Length+1];
19        dp[0] = true;
20        for(int i = 1; i <= word.Length; i++){
21            for(int j = 0; j < i; j++){
22                if(!dp[j]) continue;
23                if(preWords.Contains(word.Substring(j, i-j))){
24                    dp[i] = true;
25                    break;
26                }
27            }
28        }
29        return dp[word.Length];
30    }
31}

I hope the explanation of the problem and the different code implementations were clear. To recap, we are using a depth-first search and memorization to find all those words which are actually a merge of at least two different words present in the same list.

The solutions in Python, JavaScript, Java, C++, and C# have similar logic of using a hash set (HashSet, Set, set or unordered_set depending on the language) for storing the words that we have gone through and dynamic programming for checking the existence of the parts (substrings) of the current word in the set.

There are some differences in syntax between the languages, but overall the code in each language essentially does the same thing. You loop through the words, using dynamic programming to check if each word can be formed by some of the words that come before it in the sequence (since the list is sorted by length), and if so, add it to the final list.

While this problem can quite likely be solved using other methods or algorithms, this approach is reasonable and efficient in terms of the time complexity. At every step of the word, we are looking at all previous parts, keeping the intermediate results in memory to avoid redundant computations. Thus, the algorithm goes through each word only once, making the time complexity effectively linear in the number of words in the list. The space complexity is also linear in terms of the length of the words stored in the dictionary and result array.

Keep practicing more problems to get a firm grip on using DFS, backtracking and memorization effectively for problem-solving. Happy coding!


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫