943. Find the Shortest Superstring
Problem Description
The challenge is to find the shortest possible string that contains all the given strings in the array words
as substrings. It's important to note that none of the strings in words
is a substring of any other string in the array, simplifying the problem. The task is to construct a superstring that concatenates all individual strings in such a way that they overlap wherever possible to reduce the overall length. If there are several shortest superstrings, returning any one of them is acceptable.
Intuition
To solve this problem, we think about overlaps between pairs of strings to determine which strings should be concatenated together. To start, we need to know how to combine two strings together to achieve the maximum overlap, which means finding how a string can partially overlap the end of another string.
The intuition for the solution approach involves calculating the overlap between each pair of strings, which is the maximum prefix of one string that is also a suffix of the other. This forms a graph-like structure where the overlap value can represent the edge weight. To find the shortest superstring, we aim to visit each node (string) while maximizing the total weight of the edges we have traversed.
To find the order of strings that maximizes the overlaps, we can use dynamic programming. A dynamic programming table dp[mask][i]
can be constructed where mask
is a bitmask representing the set of strings included, and i
is the last string added. The value represents the maximum overlap we can get so far using the strings in mask
, ending with the string i
.
However, since the exact order of strings that leads to this maximum value is also required to construct the superstring, we need a way to backtrack from the maximum overlap value to the start. This is done using another table p[mask][i]
that records the second to last string added before i
.
Once the dynamic programming table is filled, we can reconstruct the path from the maximum value, and then we add the remaining strings without overlaps to complete the superstring. The superstring is then constructed by taking the first string in order and adding the suffixes of all subsequent strings according to the overlap.
The final output is constructed by joining these strings. This way, we arrive at a superstring with the smallest possible length that still includes all strings from the words
array as substrings.
Learn more about Dynamic Programming and Bitmask patterns.
Solution Approach
The Reference Solution Approach above uses a combination of dynamic programming with bit masking and graph theory to construct the shortest superstring. Here's a detailed walkthrough of the implementation:
-
Graph Creation with Overlaps:
- A 2D array
g[][]
is created to store the overlaps.g[i][j]
means the length of the overlap between the end of wordi
and the start of wordj
. - Nested loops iterate through each pair of words
a
andb
inwords
, and for each pair, a third loop finds the maximum overlap.
- A 2D array
-
Dynamic Programming (DP) Table Initialization:
- A 2D DP array
dp[][]
is declared with1 << n
rows andn
columns. Each celldp[mask][i]
will store the maximum overlap obtainable with the last string in the subsetmask
beingwords[i]
. - To reconstruct the optimal combination of string overlaps, a predecessor array
p[][]
is maintained to track the choice of previous word in the optimal path.
- A 2D DP array
-
DP and Bitmasking:
- The algorithm iterates over all possible subsets
i
(represented as bitmasks) and each wordj
. For each subset-word pair, it calculates the maximum overlap by trying to add wordj
to the subsetpi
(wherepi
is the subset without wordj
). - The idea is to transition from a smaller subset (without word
j
) to a larger subset (with wordj
) and update the DP table with the max overlap.
- The algorithm iterates over all possible subsets
-
Reconstructing the Path:
- After populating the DP table, the last word
j
in the optimal path (leading to the maximum overlap in the full set) is found. - From there, using the
p
table, the algorithm reconstructs the order of strings, starting from the full set (all bits set) and moving backwards by removing the current word from the mask and jumping to its predecessor.
- After populating the DP table, the last word
-
Building the Shortest Superstring:
- Once the order is found, each word is added to the result in a specific way:
- The first word is added in full.
- Every subsequent word is added, minus the prefix that overlaps with the end of the current superstring (this overlap length is stored in
g[i][j]
).
- Any leftover words which were not in the set (which didn't form part of any optimal path) are simply appended to the end, because
words
are guaranteed not to be substrings of one another.
- Once the order is found, each word is added to the result in a specific way:
This solution uses dynamic programming for subset optimization combined with graph theory concepts (finding the longest path in an edge-weighted graph, where edges represent word overlaps). The superstring is formed by selecting substrings that give the maximum overlap when concatenated together. This results in space and time complexity that is exponential relative to the number of strings due to the bitmasking approach (O(n^2 * 2^n)
), which is acceptable here given the problem's constraints.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's consider a small example with the words
array consisting of ["catg", "ctaagt", "gcta", "ttca", "atgcatc"]
. Here's how the solution approach operates on this example:
Step 1: Graph Creation with Overlaps
- For each pair of words, we calculate the maximum overlap. For instance:
- The overlap of
catg
andctaagt
iscta
, with length 3. - The overlap of
gcta
andttca
istca
, with length 3. - ... and so on.
- The overlap of
This results in a 2D overlap array g[][]
.
Step 2: Dynamic Programming (DP) Table Initialization
- Create a
dp
table with1 << 5
rows (since there are 5 strings) and 5 columns. - Create a
p
predecessor table with the same dimensions to trace the optimal path.
Step 3: DP and Bitmasking
- Iterate over all subsets of
words
(represented by bitmasks) and try to extend each subset by adding one word to determine the best overlap. - When trying to add
ctaagt
to a subset containingcatg
, check thedp
value of the subset withoutctaagt
and theg[i][j]
overlap value; updatedp
andp
if a larger overlap is achieved.
Step 4: Reconstructing the Path
- Once the
dp
table is fully populated, trace back from the full set11111
to find the sequence resulting in the maximum overlap. - For this example, let’s say the order obtained after reconstruction is
["gcta", "ttca", "ctaagt", "catg", "atgcatc"]
.
Step 5: Building the Shortest Superstring
- Start by adding
gcta
in full to the result. - Then append
ttca
minus the prefixtca
, resulting in the superstringgcttca
. - Next, add
ctaagt
minus the prefixcta
, resulting ingcttcaagt
. - Add
catg
minus the prefixcatg
(no overlap with the last added string), sogcttcaagtcatg
. - Lastly, add
atgcatc
minus the prefixatgcatc
(again no overlap), completing the superstring asgcttcaagtcatgatgcatc
.
The final superstring gcttcaagtcatgatgcatc
contains all strings from the words
array as substrings, and it's the shortest possible string that does so according to our calculations.
This example illustrates the concept of using dynamic programming with bit masking and graph theory to construct the optimal string. The DP table dp[mask][i]
holds the maximum overlap value, and p[mask][i]
helps in tracing back the string sequence. We then build the superstring using these calculated overlaps, ensuring each string is represented as a substring while minimizing the total length.
Solution Implementation
1from typing import List
2
3class Solution:
4 def shortestSuperstring(self, words: List[str]) -> str:
5 # Number of words
6 num_words = len(words)
7
8 # Graph g where g[i][j] represents the length of overlap if word i is followed by word j
9 overlap = [[0] * num_words for _ in range(num_words)]
10
11 # Calculate the maximum overlap of each pair of words
12 for i, word1 in enumerate(words):
13 for j, word2 in enumerate(words):
14 if i != j:
15 # Check the maximum prefix of word2 that's a suffix of word1
16 for k in range(min(len(word1), len(word2)), 0, -1):
17 if word1[-k:] == word2[:k]:
18 overlap[i][j] = k
19 break
20
21 # Dynamic programming table, where dp[mask][i] holds the highest overlap with mask and ending with word i
22 dp = [[0] * num_words for _ in range(1 << num_words)]
23
24 # Parent table to reconstruct the path
25 parent = [[-1] * num_words for _ in range(1 << num_words)]
26
27 # Fill dp[] and parent[]
28 for mask in range(1 << num_words):
29 for j in range(num_words):
30 if (mask >> j) & 1:
31 # Previous mask before adding word j
32 prev_mask = mask ^ (1 << j)
33 for k in range(num_words):
34 if (prev_mask >> k) & 1:
35 value = dp[prev_mask][k] + overlap[k][j]
36 if value > dp[mask][j]:
37 dp[mask][j] = value
38 parent[mask][j] = k
39
40 # Recover the last word in the optimal arrangement
41 last = max(range(num_words), key=lambda i: dp[-1][i])
42
43 # Recover the order of words in the optimal superstring
44 order = []
45 mask = (1 << num_words) - 1
46 while last != -1:
47 prev = parent[mask][last]
48 order.append(last)
49 mask ^= (1 << last)
50 last = prev
51 order.reverse()
52
53 # Add words that are not in the optimal order (could happen if all overlaps are 0)
54 order.extend(j for j in range(num_words) if j not in order)
55
56 # Build the shortest superstring based on the order
57 result = [words[order[0]]]
58 for i in range(1, len(order)):
59 index = order[i]
60 prev_index = order[i - 1]
61 result.append(words[index][overlap[prev_index][index]:])
62
63 return ''.join(result)
64
65# An example of how to use the class would be:
66# solution = Solution()
67# superstring = solution.shortestSuperstring(["abc", "bcd", "cde"])
68
1import java.util.ArrayList;
2import java.util.Arrays;
3import java.util.Collections;
4import java.util.HashSet;
5import java.util.List;
6import java.util.Set;
7
8class Solution {
9 public String shortestSuperstring(String[] words) {
10 int wordCount = words.length;
11 int[][] overlap = new int[wordCount][wordCount];
12
13 // Calculate the overlap between pairs of words
14 for (int i = 0; i < wordCount; ++i) {
15 for (int j = 0; j < wordCount; ++j) {
16 if (i != j) {
17 for (int k = Math.min(words[i].length(), words[j].length()); k > 0; --k) {
18 if (words[i].endsWith(words[j].substring(0, k))) {
19 overlap[i][j] = k;
20 break;
21 }
22 }
23 }
24 }
25 }
26
27 // Initialize the dynamic programming tables
28 int[][] dp = new int[1 << wordCount][wordCount];
29 int[][] parent = new int[1 << wordCount][wordCount];
30
31 // Fill the parent table with -1 to indicate no parent
32 for (int i = 0; i < (1 << wordCount); ++i) {
33 Arrays.fill(parent[i], -1);
34 }
35
36 // Build up the dp table which stores maximum overlap
37 for (int bitmask = 0; bitmask < (1 << wordCount); ++bitmask) {
38 for (int j = 0; j < wordCount; ++j) {
39 // Check if word j is in the set represented by bitmask
40 if ((bitmask & (1 << j)) > 0) {
41 int prevState = bitmask ^ (1 << j);
42 for (int k = 0; k < wordCount; ++k) {
43 if ((prevState & (1 << k)) > 0) {
44 int overlapValue = dp[prevState][k] + overlap[k][j];
45 if (overlapValue > dp[bitmask][j]) {
46 dp[bitmask][j] = overlapValue;
47 parent[bitmask][j] = k;
48 }
49 }
50 }
51 }
52 }
53 }
54
55 // Reconstruct the path and compute the answer
56 int[] order = findOptimalOrder(dp, parent, wordCount);
57 StringBuilder result = new StringBuilder(words[order[0]]);
58 for (int i = 1; i < wordCount; ++i) {
59 int o = overlap[order[i - 1]][order[i]];
60 result.append(words[order[i]].substring(o));
61 }
62 return result.toString();
63 }
64
65 // Helper function to find the optimal order of words
66 private int[] findOptimalOrder(int[][] dp, int[][] parent, int wordCount) {
67 List<Integer> path = new ArrayList<>();
68 int bitmask = (1 << wordCount) - 1;
69 int lastWordInPath = findLastWord(dp, wordCount);
70
71 // Reconstruct the path backwards using parent pointers
72 for (int i = lastWordInPath; bitmask != 0; ) {
73 int temp = bitmask;
74 path.add(i);
75 bitmask &= ~(1 << i);
76 i = parent[temp][i];
77 }
78
79 // Add any unused words
80 Set<Integer> usedWords = new HashSet<>(path);
81 for (int i = 0; i < wordCount; ++i) {
82 if (!usedWords.contains(i)) {
83 path.add(i);
84 }
85 }
86
87 // Reverse the order since we were moving backwards
88 Collections.reverse(path);
89
90 // Convert the List<Integer> to an array
91 return path.stream().mapToInt(i -> i).toArray();
92 }
93
94 // Helper function to find the last word in the optimal path
95 private int findLastWord(int[][] dp, int wordCount) {
96 int lastWordIndex = 0;
97 // Choose the word that ends up with the largest overlap
98 for (int i = 0; i < wordCount; ++i) {
99 if (dp[(1 << wordCount) - 1][i] > dp[(1 << wordCount) - 1][lastWordIndex]) {
100 lastWordIndex = i;
101 }
102 }
103 return lastWordIndex;
104 }
105}
106
1#include <vector>
2#include <string>
3#include <unordered_set>
4#include <algorithm>
5
6using namespace std;
7
8class Solution {
9public:
10 string shortestSuperstring(vector<string>& words) {
11 int wordCount = words.size();
12 // Initialize the graph to store overlaps
13 vector<vector<int>> overlaps(wordCount, vector<int>(wordCount, 0));
14
15 // Calculate overlap for each pair of words
16 for (int i = 0; i < wordCount; ++i) {
17 for (int j = 0; j < wordCount; ++j) {
18 if (i != j) {
19 for (int overlapSize = min(words[i].size(), words[j].size()); overlapSize > 0; --overlapSize) {
20 if (words[i].substr(words[i].size() - overlapSize) == words[j].substr(0, overlapSize)) {
21 overlaps[i][j] = overlapSize;
22 break;
23 }
24 }
25 }
26 }
27 }
28
29 // Dynamic Programming to store max overlap
30 vector<vector<int>> dp(1 << wordCount, vector<int>(wordCount, 0));
31 vector<vector<int>> parent(1 << wordCount, vector<int>(wordCount, -1));
32
33 // Fill the dp table
34 for (int mask = 0; mask < (1 << wordCount); ++mask) { // For each combination of words
35 for (int j = 0; j < wordCount; ++j) {
36 if ((mask >> j) & 1) {
37 int previousMask = mask ^ (1 << j);
38 for (int k = 0; k < wordCount; ++k) {
39 if ((previousMask >> k) & 1) {
40 int val = dp[previousMask][k] + overlaps[k][j];
41 if (val > dp[mask][j]) {
42 dp[mask][j] = val;
43 parent[mask][j] = k;
44 }
45 }
46 }
47 }
48 }
49 }
50
51 // Reconstruct the path
52 vector<int> path;
53 int j = 0;
54
55 // Find the maximum overlap
56 for (int i = 0; i < wordCount; ++i) {
57 if (dp[(1 << wordCount) - 1][i] > dp[(1 << wordCount) - 1][j]) {
58 j = i;
59 }
60 }
61
62 // Build the path backwards
63 for (int mask = (1 << wordCount) - 1; parent[mask][j] != -1; ) {
64 int prev = parent[mask][j];
65 path.push_back(j);
66 mask ^= (1 << j);
67 j = prev;
68 }
69 path.push_back(j);
70
71 // Include the remaining words
72 unordered_set<int> visited(path.begin(), path.end());
73 for (int i = 0; i < wordCount; ++i) {
74 if (!visited.count(i)) {
75 path.push_back(i);
76 }
77 }
78
79 // Construct the shortest superstring
80 reverse(path.begin(), path.end());
81 string superstring = words[path[0]];
82 for (int i = 1; i < wordCount; ++i) {
83 int overlap = overlaps[path[i - 1]][path[i]];
84 superstring += words[path[i]].substr(overlap);
85 }
86
87 return superstring;
88 }
89};
90
1// Importing necessary functions from other modules
2import { vector, unordered_set } from 'somelibrary'; // Placeholder, replace with actual library if needed.
3
4function calculateOverlaps(words: string[]): number[][] {
5 const wordCount = words.length;
6 let overlaps: number[][] = vector(wordCount, vector(wordCount, 0));
7
8 // Calculate overlap for each pair of words
9 for (let i = 0; i < wordCount; ++i) {
10 for (let j = 0; j < wordCount; ++j) {
11 if (i !== j) {
12 for (let overlapSize = Math.min(words[i].length, words[j].length); overlapSize > 0; --overlapSize) {
13 if (words[i].endsWith(words[j].substring(0, overlapSize))) {
14 overlaps[i][j] = overlapSize;
15 break;
16 }
17 }
18 }
19 }
20 }
21
22 return overlaps;
23}
24
25function findShortestSuperstring(words: string[]): string {
26 const wordCount = words.length;
27 const overlaps = calculateOverlaps(words);
28
29 let dp: number[][] = vector(1 << wordCount, vector(wordCount, 0));
30 let parent: number[][] = vector(1 << wordCount, vector<number>(wordCount, -1));
31
32 // Fill the dp table
33 for (let mask = 0; mask < (1 << wordCount); ++mask) {
34 for (let j = 0; j < wordCount; ++j) {
35 if ((mask >> j) & 1) {
36 let previousMask = mask ^ (1 << j);
37 for (let k = 0; k < wordCount; ++k) {
38 if ((previousMask >> k) & 1) {
39 let val = dp[previousMask][k] + overlaps[k][j];
40 if (val > dp[mask][j]) {
41 dp[mask][j] = val;
42 parent[mask][j] = k;
43 }
44 }
45 }
46 }
47 }
48 }
49
50 // Reconstruct the path
51 let path: number[] = [];
52 let j = 0;
53
54 // Find the maximum overlap
55 for (let i = 0; i < wordCount; ++i) {
56 if (dp[(1 << wordCount) - 1][i] > dp[(1 << wordCount) - 1][j]) {
57 j = i;
58 }
59 }
60
61 // Build the path backwards
62 for (let mask = (1 << wordCount) - 1; parent[mask][j] !== -1; ) {
63 let prev = parent[mask][j];
64 path.push(j);
65 mask ^= (1 << j);
66 j = prev;
67 }
68 path.push(j);
69
70 // Include the remaining words
71 let visited = new unordered_set<number>(path);
72 for (let i = 0; i < wordCount; ++i) {
73 if (!visited.has(i)) {
74 path.push(i);
75 }
76 }
77
78 // Construct the shortest superstring
79 path.reverse();
80 let superstring = words[path[0]];
81 for (let i = 1; i < path.length; ++i) {
82 let overlap = overlaps[path[i - 1]][path[i]];
83 superstring += words[path[i]].substring(overlap);
84 }
85
86 return superstring;
87}
88
Time and Space Complexity
The given code aims to find the shortest superstring that can be formed by a given list of strings by performing concatenations in a certain way.
Time Complexity
The time complexity of the code is O(n^2 * 2^n + n^2 * W)
, where 'n' is the number of words and 'W' is the maximum length of a word.
-
The first nested for-loop, with
g = [[0] * n for _ in range(n)]
, runs inO(n^2 * W)
time. This loop calculates the overlap between each pair of words. -
The second nested for-loop updates the dynamic programming (DP) table
dp
and predecessor tablep
, and it runs inO(n^2 * 2^n)
time. This happens because for each of the2^n
subsets of words, it calculates the maximum overlap by checking pairs of words, and there aren^2
potential pairs. -
The procedure to find the highest overlap at the last bit-mask (
(1 << n) - 1
) isO(n)
. -
Reconstructing the solution path has a time complexity of
O(n)
, since it reconstructs the full path of word concatenations. -
Finally, the output string construction with list comprehensions has
O(n)
complexity but depends on the lengths of the words for actual string operations.
Considering all these steps, the first for loop and the DP table update are the most time-consuming parts.
Space Complexity
The space complexity of the code is O(n * 2^n + n^2)
, which comprises:
-
O(n^2)
for storing the overlap between pairs of words ing
. -
O(n * 2^n)
for storing the DP values indp
. -
O(n * 2^n)
for storing the predecessors inp
.
The DP and predecessor tables account for the largest portion of the space used, as they store information for each subset of the power set of 'n' words.
In summary, the overall space complexity mirrors the structure of the DP and predecessor tables.
Learn more about how to find time and space complexity quickly using problem constraints.
Which algorithm should you use to find a node that is close to the root of the tree?
Recommended Readings
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
Bitmask and Dynamic Programming Bit manipulation is a crucial aspect of computer programming and one of the most powerful tools for bit manipulation is bitmasks Let's first understand what a bit is A bit is a binary digit It's the smallest piece of data in a computer and can be
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Want a Structured Path to Master System Design Too? Don’t Miss This!