# 943. Find the Shortest Superstring

## Problem Description

The challenge is to find the shortest possible string that contains all the given strings in the array `words` as substrings. It's important to note that none of the strings in `words` is a substring of any other string in the array, simplifying the problem. The task is to construct a superstring that concatenates all individual strings in such a way that they overlap wherever possible to reduce the overall length. If there are several shortest superstrings, returning any one of them is acceptable.

## Intuition

To solve this problem, we think about overlaps between pairs of strings to determine which strings should be concatenated together. To start, we need to know how to combine two strings together to achieve the maximum overlap, which means finding how a string can partially overlap the end of another string.

The intuition for the solution approach involves calculating the overlap between each pair of strings, which is the maximum prefix of one string that is also a suffix of the other. This forms a graph-like structure where the overlap value can represent the edge weight. To find the shortest superstring, we aim to visit each node (string) while maximizing the total weight of the edges we have traversed.

To find the order of strings that maximizes the overlaps, we can use dynamic programming. A dynamic programming table `dp[mask][i]` can be constructed where `mask` is a bitmask representing the set of strings included, and `i` is the last string added. The value represents the maximum overlap we can get so far using the strings in `mask`, ending with the string `i`.

However, since the exact order of strings that leads to this maximum value is also required to construct the superstring, we need a way to backtrack from the maximum overlap value to the start. This is done using another table `p[mask][i]` that records the second to last string added before `i`.

Once the dynamic programming table is filled, we can reconstruct the path from the maximum value, and then we add the remaining strings without overlaps to complete the superstring. The superstring is then constructed by taking the first string in order and adding the suffixes of all subsequent strings according to the overlap.

The final output is constructed by joining these strings. This way, we arrive at a superstring with the smallest possible length that still includes all strings from the `words` array as substrings.

## Solution Approach

The Reference Solution Approach above uses a combination of dynamic programming with bit masking and graph theory to construct the shortest superstring. Here's a detailed walkthrough of the implementation:

1. Graph Creation with Overlaps:

• A 2D array `g[][]` is created to store the overlaps. `g[i][j]` means the length of the overlap between the end of word `i` and the start of word `j`.
• Nested loops iterate through each pair of words `a` and `b` in `words`, and for each pair, a third loop finds the maximum overlap.
2. Dynamic Programming (DP) Table Initialization:

• A 2D DP array `dp[][]` is declared with `1 << n` rows and `n` columns. Each cell `dp[mask][i]` will store the maximum overlap obtainable with the last string in the subset `mask` being `words[i]`.
• To reconstruct the optimal combination of string overlaps, a predecessor array `p[][]` is maintained to track the choice of previous word in the optimal path.

• The algorithm iterates over all possible subsets `i` (represented as bitmasks) and each word `j`. For each subset-word pair, it calculates the maximum overlap by trying to add word `j` to the subset `pi` (where `pi` is the subset without word `j`).
• The idea is to transition from a smaller subset (without word `j`) to a larger subset (with word `j`) and update the DP table with the max overlap.
4. Reconstructing the Path:

• After populating the DP table, the last word `j` in the optimal path (leading to the maximum overlap in the full set) is found.
• From there, using the `p` table, the algorithm reconstructs the order of strings, starting from the full set (all bits set) and moving backwards by removing the current word from the mask and jumping to its predecessor.
5. Building the Shortest Superstring:

• Once the order is found, each word is added to the result in a specific way:
• The first word is added in full.
• Every subsequent word is added, minus the prefix that overlaps with the end of the current superstring (this overlap length is stored in `g[i][j]`).
• Any leftover words which were not in the set (which didn't form part of any optimal path) are simply appended to the end, because `words` are guaranteed not to be substrings of one another.

This solution uses dynamic programming for subset optimization combined with graph theory concepts (finding the longest path in an edge-weighted graph, where edges represent word overlaps). The superstring is formed by selecting substrings that give the maximum overlap when concatenated together. This results in space and time complexity that is exponential relative to the number of strings due to the bitmasking approach (`O(n^2 * 2^n)`), which is acceptable here given the problem's constraints.

๐ช
Level Up Your
Algo Skills

### Example Walkthrough

Let's consider a small example with the `words` array consisting of `["catg", "ctaagt", "gcta", "ttca", "atgcatc"]`. Here's how the solution approach operates on this example:

Step 1: Graph Creation with Overlaps

• For each pair of words, we calculate the maximum overlap. For instance:
• The overlap of `catg` and `ctaagt` is `cta`, with length 3.
• The overlap of `gcta` and `ttca` is `tca`, with length 3.
• ... and so on.

This results in a 2D overlap array `g[][]`.

Step 2: Dynamic Programming (DP) Table Initialization

• Create a `dp` table with `1 << 5` rows (since there are 5 strings) and 5 columns.
• Create a `p` predecessor table with the same dimensions to trace the optimal path.

• Iterate over all subsets of `words` (represented by bitmasks) and try to extend each subset by adding one word to determine the best overlap.
• When trying to add `ctaagt` to a subset containing `catg`, check the `dp` value of the subset without `ctaagt` and the `g[i][j]` overlap value; update `dp` and `p` if a larger overlap is achieved.

Step 4: Reconstructing the Path

• Once the `dp` table is fully populated, trace back from the full set `11111` to find the sequence resulting in the maximum overlap.
• For this example, letโs say the order obtained after reconstruction is `["gcta", "ttca", "ctaagt", "catg", "atgcatc"]`.

Step 5: Building the Shortest Superstring

• Start by adding `gcta` in full to the result.
• Then append `ttca` minus the prefix `tca`, resulting in the superstring `gcttca`.
• Next, add `ctaagt` minus the prefix `cta`, resulting in `gcttcaagt`.
• Add `catg` minus the prefix `catg` (no overlap with the last added string), so `gcttcaagtcatg`.
• Lastly, add `atgcatc` minus the prefix `atgcatc` (again no overlap), completing the superstring as `gcttcaagtcatgatgcatc`.

The final superstring `gcttcaagtcatgatgcatc` contains all strings from the `words` array as substrings, and it's the shortest possible string that does so according to our calculations.

This example illustrates the concept of using dynamic programming with bit masking and graph theory to construct the optimal string. The DP table `dp[mask][i]` holds the maximum overlap value, and `p[mask][i]` helps in tracing back the string sequence. We then build the superstring using these calculated overlaps, ensuring each string is represented as a substring while minimizing the total length.

## Python Solution

``````1from typing import List
2
3class Solution:
4    def shortestSuperstring(self, words: List[str]) -> str:
5        # Number of words
6        num_words = len(words)
7
8        # Graph g where g[i][j] represents the length of overlap if word i is followed by word j
9        overlap = [[0] * num_words for _ in range(num_words)]
10
11        # Calculate the maximum overlap of each pair of words
12        for i, word1 in enumerate(words):
13            for j, word2 in enumerate(words):
14                if i != j:
15                    # Check the maximum prefix of word2 that's a suffix of word1
16                    for k in range(min(len(word1), len(word2)), 0, -1):
17                        if word1[-k:] == word2[:k]:
18                            overlap[i][j] = k
19                            break
20
21        # Dynamic programming table, where dp[mask][i] holds the highest overlap with mask and ending with word i
22        dp = [[0] * num_words for _ in range(1 << num_words)]
23
24        # Parent table to reconstruct the path
25        parent = [[-1] * num_words for _ in range(1 << num_words)]
26
27        # Fill dp[] and parent[]
28        for mask in range(1 << num_words):
29            for j in range(num_words):
30                if (mask >> j) & 1:
33                    for k in range(num_words):
34                        if (prev_mask >> k) & 1:
35                            value = dp[prev_mask][k] + overlap[k][j]
39
40        # Recover the last word in the optimal arrangement
41        last = max(range(num_words), key=lambda i: dp[-1][i])
42
43        # Recover the order of words in the optimal superstring
44        order = []
45        mask = (1 << num_words) - 1
46        while last != -1:
48            order.append(last)
49            mask ^= (1 << last)
50            last = prev
51        order.reverse()
52
53        # Add words that are not in the optimal order (could happen if all overlaps are 0)
54        order.extend(j for j in range(num_words) if j not in order)
55
56        # Build the shortest superstring based on the order
57        result = [words[order[0]]]
58        for i in range(1, len(order)):
59            index = order[i]
60            prev_index = order[i - 1]
61            result.append(words[index][overlap[prev_index][index]:])
62
63        return ''.join(result)
64
65# An example of how to use the class would be:
66# solution = Solution()
67# superstring = solution.shortestSuperstring(["abc", "bcd", "cde"])
68``````

## Java Solution

``````1import java.util.ArrayList;
2import java.util.Arrays;
3import java.util.Collections;
4import java.util.HashSet;
5import java.util.List;
6import java.util.Set;
7
8class Solution {
9    public String shortestSuperstring(String[] words) {
10        int wordCount = words.length;
11        int[][] overlap = new int[wordCount][wordCount];
12
13        // Calculate the overlap between pairs of words
14        for (int i = 0; i < wordCount; ++i) {
15            for (int j = 0; j < wordCount; ++j) {
16                if (i != j) {
17                    for (int k = Math.min(words[i].length(), words[j].length()); k > 0; --k) {
18                        if (words[i].endsWith(words[j].substring(0, k))) {
19                            overlap[i][j] = k;
20                            break;
21                        }
22                    }
23                }
24            }
25        }
26
27        // Initialize the dynamic programming tables
28        int[][] dp = new int[1 << wordCount][wordCount];
29        int[][] parent = new int[1 << wordCount][wordCount];
30
31        // Fill the parent table with -1 to indicate no parent
32        for (int i = 0; i < (1 << wordCount); ++i) {
33            Arrays.fill(parent[i], -1);
34        }
35
36        // Build up the dp table which stores maximum overlap
38            for (int j = 0; j < wordCount; ++j) {
39                // Check if word j is in the set represented by bitmask
40                if ((bitmask & (1 << j)) > 0) {
41                    int prevState = bitmask ^ (1 << j);
42                    for (int k = 0; k < wordCount; ++k) {
43                        if ((prevState & (1 << k)) > 0) {
44                            int overlapValue = dp[prevState][k] + overlap[k][j];
45                            if (overlapValue > dp[bitmask][j]) {
48                            }
49                        }
50                    }
51                }
52            }
53        }
54
55        // Reconstruct the path and compute the answer
56        int[] order = findOptimalOrder(dp, parent, wordCount);
57        StringBuilder result = new StringBuilder(words[order[0]]);
58        for (int i = 1; i < wordCount; ++i) {
59            int o = overlap[order[i - 1]][order[i]];
60            result.append(words[order[i]].substring(o));
61        }
62        return result.toString();
63    }
64
65    // Helper function to find the optimal order of words
66    private int[] findOptimalOrder(int[][] dp, int[][] parent, int wordCount) {
67        List<Integer> path = new ArrayList<>();
68        int bitmask = (1 << wordCount) - 1;
69        int lastWordInPath = findLastWord(dp, wordCount);
70
71        // Reconstruct the path backwards using parent pointers
72        for (int i = lastWordInPath; bitmask != 0; ) {
75            bitmask &= ~(1 << i);
76            i = parent[temp][i];
77        }
78
79        // Add any unused words
80        Set<Integer> usedWords = new HashSet<>(path);
81        for (int i = 0; i < wordCount; ++i) {
82            if (!usedWords.contains(i)) {
84            }
85        }
86
87        // Reverse the order since we were moving backwards
88        Collections.reverse(path);
89
90        // Convert the List<Integer> to an array
91        return path.stream().mapToInt(i -> i).toArray();
92    }
93
94    // Helper function to find the last word in the optimal path
95    private int findLastWord(int[][] dp, int wordCount) {
96        int lastWordIndex = 0;
97        // Choose the word that ends up with the largest overlap
98        for (int i = 0; i < wordCount; ++i) {
99            if (dp[(1 << wordCount) - 1][i] > dp[(1 << wordCount) - 1][lastWordIndex]) {
100                lastWordIndex = i;
101            }
102        }
103        return lastWordIndex;
104    }
105}
106``````

## C++ Solution

``````1#include <vector>
2#include <string>
3#include <unordered_set>
4#include <algorithm>
5
6using namespace std;
7
8class Solution {
9public:
10    string shortestSuperstring(vector<string>& words) {
11        int wordCount = words.size();
12        // Initialize the graph to store overlaps
13        vector<vector<int>> overlaps(wordCount, vector<int>(wordCount, 0));
14
15        // Calculate overlap for each pair of words
16        for (int i = 0; i < wordCount; ++i) {
17            for (int j = 0; j < wordCount; ++j) {
18                if (i != j) {
19                    for (int overlapSize = min(words[i].size(), words[j].size()); overlapSize > 0; --overlapSize) {
20                        if (words[i].substr(words[i].size() - overlapSize) == words[j].substr(0, overlapSize)) {
21                            overlaps[i][j] = overlapSize;
22                            break;
23                        }
24                    }
25                }
26            }
27        }
28
29        // Dynamic Programming to store max overlap
30        vector<vector<int>> dp(1 << wordCount, vector<int>(wordCount, 0));
31        vector<vector<int>> parent(1 << wordCount, vector<int>(wordCount, -1));
32
33        // Fill the dp table
34        for (int mask = 0; mask < (1 << wordCount); ++mask) { // For each combination of words
35            for (int j = 0; j < wordCount; ++j) {
36                if ((mask >> j) & 1) {
38                    for (int k = 0; k < wordCount; ++k) {
39                        if ((previousMask >> k) & 1) {
40                            int val = dp[previousMask][k] + overlaps[k][j];
41                            if (val > dp[mask][j]) {
44                            }
45                        }
46                    }
47                }
48            }
49        }
50
51        // Reconstruct the path
52        vector<int> path;
53        int j = 0;
54
55        // Find the maximum overlap
56        for (int i = 0; i < wordCount; ++i) {
57            if (dp[(1 << wordCount) - 1][i] > dp[(1 << wordCount) - 1][j]) {
58                j = i;
59            }
60        }
61
62        // Build the path backwards
63        for (int mask = (1 << wordCount) - 1; parent[mask][j] != -1; ) {
65            path.push_back(j);
66            mask ^= (1 << j);
67            j = prev;
68        }
69        path.push_back(j);
70
71        // Include the remaining words
72        unordered_set<int> visited(path.begin(), path.end());
73        for (int i = 0; i < wordCount; ++i) {
74            if (!visited.count(i)) {
75                path.push_back(i);
76            }
77        }
78
79        // Construct the shortest superstring
80        reverse(path.begin(), path.end());
81        string superstring = words[path[0]];
82        for (int i = 1; i < wordCount; ++i) {
83            int overlap = overlaps[path[i - 1]][path[i]];
84            superstring += words[path[i]].substr(overlap);
85        }
86
87        return superstring;
88    }
89};
90``````

## Typescript Solution

``````1// Importing necessary functions from other modules
2import { vector, unordered_set } from 'somelibrary'; // Placeholder, replace with actual library if needed.
3
4function calculateOverlaps(words: string[]): number[][] {
5    const wordCount = words.length;
6    let overlaps: number[][] = vector(wordCount, vector(wordCount, 0));
7
8    // Calculate overlap for each pair of words
9    for (let i = 0; i < wordCount; ++i) {
10        for (let j = 0; j < wordCount; ++j) {
11            if (i !== j) {
12                for (let overlapSize = Math.min(words[i].length, words[j].length); overlapSize > 0; --overlapSize) {
13                    if (words[i].endsWith(words[j].substring(0, overlapSize))) {
14                        overlaps[i][j] = overlapSize;
15                        break;
16                    }
17                }
18            }
19        }
20    }
21
22    return overlaps;
23}
24
25function findShortestSuperstring(words: string[]): string {
26    const wordCount = words.length;
27    const overlaps = calculateOverlaps(words);
28
29    let dp: number[][] = vector(1 << wordCount, vector(wordCount, 0));
30    let parent: number[][] = vector(1 << wordCount, vector<number>(wordCount, -1));
31
32    // Fill the dp table
34        for (let j = 0; j < wordCount; ++j) {
35            if ((mask >> j) & 1) {
37                for (let k = 0; k < wordCount; ++k) {
38                    if ((previousMask >> k) & 1) {
39                        let val = dp[previousMask][k] + overlaps[k][j];
40                        if (val > dp[mask][j]) {
43                        }
44                    }
45                }
46            }
47        }
48    }
49
50    // Reconstruct the path
51    let path: number[] = [];
52    let j = 0;
53
54    // Find the maximum overlap
55    for (let i = 0; i < wordCount; ++i) {
56        if (dp[(1 << wordCount) - 1][i] > dp[(1 << wordCount) - 1][j]) {
57            j = i;
58        }
59    }
60
61    // Build the path backwards
62    for (let mask = (1 << wordCount) - 1; parent[mask][j] !== -1; ) {
64        path.push(j);
65        mask ^= (1 << j);
66        j = prev;
67    }
68    path.push(j);
69
70    // Include the remaining words
71    let visited = new unordered_set<number>(path);
72    for (let i = 0; i < wordCount; ++i) {
73        if (!visited.has(i)) {
74            path.push(i);
75        }
76    }
77
78    // Construct the shortest superstring
79    path.reverse();
80    let superstring = words[path[0]];
81    for (let i = 1; i < path.length; ++i) {
82        let overlap = overlaps[path[i - 1]][path[i]];
83        superstring += words[path[i]].substring(overlap);
84    }
85
86    return superstring;
87}
88``````

## Time and Space Complexity

The given code aims to find the shortest superstring that can be formed by a given list of strings by performing concatenations in a certain way.

### Time Complexity

The time complexity of the code is `O(n^2 * 2^n + n^2 * W)`, where 'n' is the number of words and 'W' is the maximum length of a word.

1. The first nested for-loop, with `g = [[0] * n for _ in range(n)]`, runs in `O(n^2 * W)` time. This loop calculates the overlap between each pair of words.

2. The second nested for-loop updates the dynamic programming (DP) table `dp` and predecessor table `p`, and it runs in `O(n^2 * 2^n)` time. This happens because for each of the `2^n` subsets of words, it calculates the maximum overlap by checking pairs of words, and there are `n^2` potential pairs.

3. The procedure to find the highest overlap at the last bit-mask (`(1 << n) - 1`) is `O(n)`.

4. Reconstructing the solution path has a time complexity of `O(n)`, since it reconstructs the full path of word concatenations.

5. Finally, the output string construction with list comprehensions has `O(n)` complexity but depends on the lengths of the words for actual string operations.

Considering all these steps, the first for loop and the DP table update are the most time-consuming parts.

### Space Complexity

The space complexity of the code is `O(n * 2^n + n^2)`, which comprises:

1. `O(n^2)` for storing the overlap between pairs of words in `g`.

2. `O(n * 2^n)` for storing the DP values in `dp`.

3. `O(n * 2^n)` for storing the predecessors in `p`.

The DP and predecessor tables account for the largest portion of the space used, as they store information for each subset of the power set of 'n' words.

In summary, the overall space complexity mirrors the structure of the DP and predecessor tables.

๐
Become an
Algo Monster

Got a question?ย Ask the Teaching Assistantย anything you don't understand.

Still not clear? Ask in the Forum, ย Discordย orย Submitย the part you don't understand to our editors.

โ
โTA ๐จโ๐ซ