1092. Shortest Common Supersequence


Problem Description

The problem provided requires us to find the shortest string that contains two given strings, str1 and str2, as subsequences. A subsequence of a string t is a new string generated from t after removing some (can be none) characters without changing the order of the remaining characters. For example, "ace" is a subsequence of "abcde" while "aec" is not. The task is to construct the shortest common supersequence that has both str1 and str2 as its subsequences. It is important to note that there can be multiple valid supersequences that satisfy the conditions, and any valid one can be returned.

Intuition

The intuition behind the solution starts by recognizing that this is a classic problem of finding the Longest Common Subsequence (LCS) of the two strings.

  1. The LCS assists us in identifying the common base around which we can organize the other characters in the supersequence. Since the LCS is the longest sequence found in both strings, we need to include it only once to cover its occurrence in both str1 and str2.

  2. Starting with an empty result string, we move through str1 and str2 in parallel, adding characters from both strings to the result. Whenever we find a pair of characters in str1 and str2 that are part of the LCS, we only add this character once to the result.

  3. To be more practical, we create a dynamic programming table, denoted as f, that stores the lengths of LCS for different substrings of str1 and str2. This table can be built by iterating over all characters of str1 and str2 where f[i][j] represents the length of the LCS between str1[:i] and str2[:j].

  4. Once we have filled this table, we can backtrack from f[m][n] (where m is the length of str1 and n is the length of str2) to construct the shortest common supersequence by choosing characters from str1 or str2 or both when they are the same and they match the LCS character.

By using the dynamic programming approach to find the LCS and careful backtracking to build the actual supersequence, we can construct a supersequence that is the shortest possible combination of str1 and str2, containing both as subsequences.

Learn more about Dynamic Programming patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece:

Consider the classic dynamic programming of fibonacci numbers, what is the recurrence relation?

Solution Approach

The solution utilizes dynamic programming (DP) to compute the shortest common supersequence. Here's how the steps break down in the provided code:

  1. A two-dimensional DP array f with dimensions (m+1) x (n+1) is initialized with zeros, where m and n are the lengths of str1 and str2, respectively. This array will eventually contain the lengths of the longest common subsequences for all possible substrings of str1 and str2.

  2. We fill in the DP table f using a nested loop. Here, f[i][j] will be filled with the length of the LCS of substrings str1[:i] and str2[:j]. If str1[i - 1] is equal to str2[j - 1], this means the characters match and can be part of the LCS, so we set f[i][j] to be 1 plus the length of the LCS at f[i - 1][j - 1]. Otherwise, we take the maximum of the lengths of the two possible LCS, by either including the last character of str1 (f[i - 1][j]) or str2 (f[i][j - 1]).

  3. After constructing the DP table, we backtrack to build the shortest common supersequence. The list ans is used to store the characters of the supersequence in reverse as we'll be starting from the bottom-right corner of the DP table and moving towards the top-left corner.

  4. The backtracking logic works as follows:

    • If i or j reaches 0, it means we have reached the end of str1 or str2, so we append the remaining characters of the other string.
    • If f[i][j] is equal to f[i-1][j], it means the current character of str1 is not part of the LCS, so we include str1[i-1] and move left in the table (decrement i).
    • If f[i][j] is equal to f[i][j-1], it's the character from str2 that's not part of the LCS, so we add str2[j-1] to the answer and move up (decrement j).
    • If none of the above conditions meet, it means both characters from str1 and str2 are part of the LCS, so we add either one to the answer and move diagonally up-left in the table (decrement both i and j).
  5. Because ans contains the characters in reverse order (from backtracking), we reverse it back to get the correct order of the shortest common supersequence and return it as a string.

This algorithm ensures the generation of the shortest string which is a supersequence of both str1 and str2, using DP for efficiency in finding the LCS, and backtracking to build the supersequence from the LCS.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

What is the best way of checking if an element exists in an unsorted array once in terms of time complexity? Select the best that applies.

Example Walkthrough

Let's take two strings str1 = "abac" and str2 = "cab" to demonstrate how to find the shortest common supersequence using the dynamic programming approach.

  1. First, we initialize an empty DP table f with dimensions (4+1) x (3+1) since len(str1) is 4 and len(str2) is 3. Each cell in f will represent the longest common subsequence (LCS) length for substrings ending at str1[i-1] and str2[j-1].

  2. We then populate the table f as follows:

    • Start with i = 1 and j = 1. If str1[i - 1] == str2[j - 1], set f[i][j] to f[i - 1][j - 1] + 1. Otherwise, set f[i][j] to max(f[i - 1][j], f[i][j - 1]).
    • Repeat this process to fill the entire table. Since "abac" has no common characters with "cab" in the first position, the first row and column will be filled with incremental counts.

    The filled DP table f would look something like this:

    1  c a b
    20 0 0 0
    3a 0 1 1 1
    4b 0 1 2 2
    5a 0 1 2 2
    6c 0 1 2 2

    This table tells us that the length of the LCS for "abac" and "cab" is 2.

  3. Starting from f[4][3], we backtrack to construct the supersequence. Initializing i = 4, j = 3, we go backwards and:

    • Note that the characters str1[3] (c) and str2[2] (a) aren't equal. Since f[4][3] == f[3][3], we add "c" from str1 to our answer and decrement i to 3.
    • Now, str1[2] (a) matches str2[2] (a), so we add "a" from either string to our answer and decrement both i and j.
    • At f[2][1], the characters str1[1] (b) and str2[0] (c) don't match. Since f[2][1] == f[1][1], we add "b" from str1 and decrement i.
    • Finally, str1[0] matches str2[0] (c), so we add "c" to our answer and decrement each index.

    The result string (ans) at this point is "cbac" in reverse order.

  4. Reversing ans, we get "cabc" which is the shortest common supersequence of str1 and str2.

In this way, by using dynamic programming to determine the LCS of the two strings and carefully backtracking from the end, we have successfully constructed the shortest string containing str1 = "abac" and str2 = "cab" as subsequences.

Not Sure What to Study? Take the 2-min Quiz:

Suppose k is a very large integer(2^64). Which of the following is the largest as n grows to infinity?

Python Solution

1class Solution:
2    def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
3        # Get the lengths of both input strings
4        len_str1, len_str2 = len(str1), len(str2)
5      
6        # Dynamic programming table f, where f[i][j] will store the length of the 
7        # longest common subsequence of str1[:i] and str2[:j]
8        dp_table = [[0] * (len_str2 + 1) for _ in range(len_str1 + 1)]
9      
10        # Build the table in bottom-up manner
11        for i in range(1, len_str1 + 1):
12            for j in range(1, len_str2 + 1):
13                # If characters match, add 1 to the diagonal value
14                if str1[i - 1] == str2[j - 1]:
15                    dp_table[i][j] = dp_table[i - 1][j - 1] + 1
16                # Otherwise, take the maximum value from above or left cell
17                else:
18                    dp_table[i][j] = max(dp_table[i - 1][j], dp_table[i][j - 1])
19      
20        # This will hold the shortest common supersequence characters
21        sscs = []
22      
23        # Initialize pointers for both strings
24        i, j = len_str1, len_str2
25      
26        # Trace back from the bottom-right corner of the table
27        while i > 0 or j > 0:
28            if i == 0:
29                # If we've finished str1, add remaining str2 characters
30                j -= 1
31                sscs.append(str2[j])
32            elif j == 0:
33                # If we've finished str2, add remaining str1 characters
34                i -= 1
35                sscs.append(str1[i])
36            else:
37                # Follow the path of the longest common subsequence
38                if dp_table[i][j] == dp_table[i - 1][j]:
39                    i -= 1
40                    sscs.append(str1[i])
41                elif dp_table[i][j] == dp_table[i][j - 1]:
42                    j -= 1
43                    sscs.append(str2[j])
44                else:
45                    # If characters match, go diagonally up-left and add the character
46                    i -= 1
47                    j -= 1
48                    sscs.append(str1[i])
49      
50        # The sscs list contains the shortest common supersequence in reverse order;
51        # reverse it to get the correct sequence
52        return ''.join(sscs[::-1])
53

Java Solution

1class Solution {
2    public String shortestCommonSupersequence(String str1, String str2) {
3        int str1Length = str1.length(), str2Length = str2.length();
4        int[][] dp = new int[str1Length + 1][str2Length + 1];
5
6        // Calculate the length of longest common subsequence using dynamic programming
7        for (int i = 1; i <= str1Length; ++i) {
8            for (int j = 1; j <= str2Length; ++j) {
9                if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
10                    // Characters match, take diagonal value plus one
11                    dp[i][j] = dp[i - 1][j - 1] + 1;
12                } else {
13                    // Choose the maximum value from the cell above or the cell to the left
14                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
15                }
16            }
17        }
18
19        // Reconstruct the shortest common supersequence from the dp table
20        StringBuilder shortestSupersequence = new StringBuilder();
21        int i = str1Length, j = str2Length;
22
23        while (i > 0 || j > 0) {
24            if (i == 0) {
25                // If we have reached the beginning of str1, add remaining characters from str2
26                shortestSupersequence.append(str2.charAt(--j));
27            } else if (j == 0) {
28                // If we have reached the beginning of str2, add remaining characters from str1
29                shortestSupersequence.append(str1.charAt(--i));
30            } else {
31                // Move diagonally if characters match
32                if (dp[i][j] == dp[i - 1][j - 1] + 1) {
33                    shortestSupersequence.append(str1.charAt(--i));
34                    --j;
35                } else if (dp[i][j] == dp[i - 1][j]) {
36                    // Move up if the character from str1 is part of the supersequence
37                    shortestSupersequence.append(str1.charAt(--i));
38                } else {
39                    // Move left if the character from str2 is part of the supersequence
40                    shortestSupersequence.append(str2.charAt(--j));
41                }
42            }
43        }
44
45        // The constructed supersequence is in reverse, so reverse it to get the final string
46        return shortestSupersequence.reverse().toString();
47    }
48}
49

C++ Solution

1#include <vector>
2#include <algorithm>
3#include <string>
4
5class Solution {
6public:
7    // Method calculates the shortest common supersequence of two strings
8    std::string shortestCommonSupersequence(std::string str1, std::string str2) {
9        int m = str1.size(); // Length of str1
10        int n = str2.size(); // Length of str2
11      
12        // dp table with dimensions m+1 by n+1, initialized to 0
13        std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));
14    
15        // Fill the dp table
16        for (int i = 1; i <= m; ++i) {
17            for (int j = 1; j <= n; ++j) {
18                if (str1[i - 1] == str2[j - 1]) {
19                    // Characters match, increment the length of common subsequence
20                    dp[i][j] = dp[i - 1][j - 1] + 1;
21                } else {
22                    // Characters do not match, take the maximum from either the top or left cell
23                    dp[i][j] = std::max(dp[i - 1][j], dp[i][j - 1]);
24                }
25            }
26        }
27      
28        // Reconstruct the shortest common supersequence from the dp table
29        std::string sequence;
30        int i = m, j = n;
31        while (i > 0 || j > 0) {
32            if (i == 0) {
33                // If we have reached the beginning of str1, append remaining str2
34                sequence += str2[--j];
35            } else if (j == 0) {
36                // If we have reached the beginning of str2, append remaining str1
37                sequence += str1[--i];
38            } else {
39                // Decide which character to append from either str1 or str2
40                if (dp[i][j] == dp[i - 1][j]) {
41                    // Coming from top, append str1's character
42                    sequence += str1[--i];
43                } else if (dp[i][j] == dp[i][j - 1]) {
44                    // Coming from left, append str2's character
45                    sequence += str2[--j];
46                } else {
47                    // When characters match, append one of them and move diagonally
48                    sequence += str1[--i];
49                    --j;
50                }
51            }
52        }
53      
54        // Since the construction was from back to front, reverse the string
55        std::reverse(sequence.begin(), sequence.end());
56      
57        return sequence;
58    }
59};
60

Typescript Solution

1function shortestCommonSupersequence(str1: string, str2: string): string {
2    const str1Length = str1.length;
3    const str2Length = str2.length;
4    // Create a 2D matrix to store the lengths of the longest common subsequences
5    const dpMatrix = new Array(str1Length + 1).fill(0).map(() => new Array(str2Length + 1).fill(0));
6
7    // Build the matrix with the lengths of the longest common subsequences 
8    for (let i = 1; i <= str1Length; ++i) {
9        for (let j = 1; j <= str2Length; ++j) {
10            if (str1[i - 1] === str2[j - 1]) {
11                dpMatrix[i][j] = dpMatrix[i - 1][j - 1] + 1;
12            } else {
13                dpMatrix[i][j] = Math.max(dpMatrix[i - 1][j], dpMatrix[i][j - 1]);
14            }
15        }
16    }
17
18    // Initialize an array to build the shortest common supersequence
19    let supersequence: string[] = [];
20    let i = str1Length;
21    let j = str2Length;
22
23    // Backtrack through the matrix to find the shortest common supersequence
24    while (i > 0 || j > 0) {
25        if (i === 0) {
26            supersequence.push(str2[--j]);
27        } else if (j === 0) {
28            supersequence.push(str1[--i]);
29        } else {
30            if (dpMatrix[i][j] === dpMatrix[i - 1][j]) {
31                supersequence.push(str1[--i]);
32            } else if (dpMatrix[i][j] === dpMatrix[i][j - 1]) {
33                supersequence.push(str2[--j]);
34            } else {
35                supersequence.push(str1[--i]);
36                --j;
37            }
38        }
39    }
40
41    // Reverse the array and join it to form the final supersequence string
42    return supersequence.reverse().join('');
43}
44
Fast Track Your Learning with Our Quick Skills Quiz:

Suppose k is a very large integer(2^64). Which of the following is the largest as n grows to infinity?

Time and Space Complexity

The given Python code defines a method shortestCommonSupersequence to find the shortest common supersequence of two strings str1 and str2. To analyze the computational complexities, we need to consider both the time taken to execute the code, which depends on the number of operations performed, and the space required to store the data during execution.

Time Complexity

The time complexity of the algorithm is determined by the nested loop structure which iterates over the lengths of the two input strings str1 and str2. There are two main steps in this algorithm:

  1. Filling out the dynamic programming table f, which has dimensions (m + 1) * (n + 1), where m is the length of str1 and n is the length of str2. Each cell in the table is filled in constant time, so the time required to fill this table is directly proportional to the number of cells, resulting in a time complexity of O(m * n).

  2. Constructing the shortest common supersequence by traversing back from the bottom-right corner of the table to the top-left, appending characters accordingly. The length of this path is m + n in the worst case because each step either moves left or up in the table, and there can be at most m upward moves and n left moves. Therefore, the time complexity for the reconstruction step is O(m + n).

The overall time complexity of the algorithm is the sum of these two parts, predominantly the first since it involves iterating over a two-dimensional array. Thus, the total time complexity is O(m * n) + O(m + n), which simplifies to O(m * n) because m * n dominates m + n for large values of m and n.

Space Complexity

The space complexity is dictated by the space required to store the dynamic programming table f, which has (m + 1) * (n + 1) cells, each storing an integer. Hence, the space complexity is O(m * n) to accommodate this table. Additional space is used for the list ans which is used to construct the shortest common supersequence. In the worst case, ans could grow to a size of m + n, when all characters from both str1 and str2 are used in the supersequence. However, the space complexity remains O(m * n) because m * n dominates m + n.

In summary, the time complexity of the code is O(m * n), and the space complexity of the code is also O(m * n).

Learn more about how to find time and space complexity quickly.


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫