1092. Shortest Common Supersequence
Problem Description
The problem provided requires us to find the shortest string that contains two given strings, str1
and str2
, as subsequences. A subsequence of a string t
is a new string generated from t
after removing some (can be none) characters without changing the order of the remaining characters. For example, "ace" is a subsequence of "abcde" while "aec" is not. The task is to construct the shortest common supersequence that has both str1
and str2
as its subsequences. It is important to note that there can be multiple valid supersequences that satisfy the conditions, and any valid one can be returned.
Intuition
The intuition behind the solution starts by recognizing that this is a classic problem of finding the Longest Common Subsequence (LCS) of the two strings.
-
The LCS assists us in identifying the common base around which we can organize the other characters in the supersequence. Since the LCS is the longest sequence found in both strings, we need to include it only once to cover its occurrence in both
str1
andstr2
. -
Starting with an empty result string, we move through
str1
andstr2
in parallel, adding characters from both strings to the result. Whenever we find a pair of characters instr1
andstr2
that are part of the LCS, we only add this character once to the result. -
To be more practical, we create a dynamic programming table, denoted as
f
, that stores the lengths of LCS for different substrings ofstr1
andstr2
. This table can be built by iterating over all characters ofstr1
andstr2
wheref[i][j]
represents the length of the LCS betweenstr1[:i]
andstr2[:j]
. -
Once we have filled this table, we can backtrack from
f[m][n]
(wherem
is the length ofstr1
andn
is the length ofstr2
) to construct the shortest common supersequence by choosing characters fromstr1
orstr2
or both when they are the same and they match the LCS character.
By using the dynamic programming approach to find the LCS and careful backtracking to build the actual supersequence, we can construct a supersequence that is the shortest possible combination of str1
and str2
, containing both as subsequences.
Learn more about Dynamic Programming patterns.
Solution Approach
The solution utilizes dynamic programming (DP) to compute the shortest common supersequence. Here's how the steps break down in the provided code:
-
A two-dimensional DP array
f
with dimensions(m+1) x (n+1)
is initialized with zeros, wherem
andn
are the lengths ofstr1
andstr2
, respectively. This array will eventually contain the lengths of the longest common subsequences for all possible substrings ofstr1
andstr2
. -
We fill in the DP table
f
using a nested loop. Here,f[i][j]
will be filled with the length of the LCS of substringsstr1[:i]
andstr2[:j]
. Ifstr1[i - 1]
is equal tostr2[j - 1]
, this means the characters match and can be part of the LCS, so we setf[i][j]
to be1
plus the length of the LCS atf[i - 1][j - 1]
. Otherwise, we take the maximum of the lengths of the two possible LCS, by either including the last character ofstr1
(f[i - 1][j]
) orstr2
(f[i][j - 1]
). -
After constructing the DP table, we backtrack to build the shortest common supersequence. The list
ans
is used to store the characters of the supersequence in reverse as we'll be starting from the bottom-right corner of the DP table and moving towards the top-left corner. -
The backtracking logic works as follows:
- If
i
orj
reaches0
, it means we have reached the end ofstr1
orstr2
, so we append the remaining characters of the other string. - If
f[i][j]
is equal tof[i-1][j]
, it means the current character ofstr1
is not part of the LCS, so we includestr1[i-1]
and move left in the table (decrementi
). - If
f[i][j]
is equal tof[i][j-1]
, it's the character fromstr2
that's not part of the LCS, so we addstr2[j-1]
to the answer and move up (decrementj
). - If none of the above conditions meet, it means both characters from
str1
andstr2
are part of the LCS, so we add either one to the answer and move diagonally up-left in the table (decrement bothi
andj
).
- If
-
Because
ans
contains the characters in reverse order (from backtracking), we reverse it back to get the correct order of the shortest common supersequence and return it as a string.
This algorithm ensures the generation of the shortest string which is a supersequence of both str1
and str2
, using DP for efficiency in finding the LCS, and backtracking to build the supersequence from the LCS.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's take two strings str1 = "abac"
and str2 = "cab"
to demonstrate how to find the shortest common supersequence using the dynamic programming approach.
-
First, we initialize an empty DP table
f
with dimensions(4+1) x (3+1)
sincelen(str1)
is 4 andlen(str2)
is 3. Each cell inf
will represent the longest common subsequence (LCS) length for substrings ending atstr1[i-1]
andstr2[j-1]
. -
We then populate the table
f
as follows:- Start with
i = 1
andj = 1
. Ifstr1[i - 1] == str2[j - 1]
, setf[i][j]
tof[i - 1][j - 1] + 1
. Otherwise, setf[i][j]
tomax(f[i - 1][j], f[i][j - 1])
. - Repeat this process to fill the entire table. Since "abac" has no common characters with "cab" in the first position, the first row and column will be filled with incremental counts.
The filled DP table
f
would look something like this:c a b 0 0 0 0 a 0 1 1 1 b 0 1 2 2 a 0 1 2 2 c 0 1 2 2
This table tells us that the length of the LCS for "abac" and "cab" is 2.
- Start with
-
Starting from
f[4][3]
, we backtrack to construct the supersequence. Initializingi = 4
,j = 3
, we go backwards and:- Note that the characters
str1[3]
(c
) andstr2[2]
(a
) aren't equal. Sincef[4][3] == f[3][3]
, we add "c" fromstr1
to our answer and decrementi
to 3. - Now,
str1[2]
(a
) matchesstr2[2]
(a
), so we add "a" from either string to our answer and decrement bothi
andj
. - At
f[2][1]
, the charactersstr1[1]
(b
) andstr2[0]
(c
) don't match. Sincef[2][1] == f[1][1]
, we add "b" fromstr1
and decrementi
. - Finally,
str1[0]
matchesstr2[0]
(c
), so we add "c" to our answer and decrement each index.
The result string (
ans
) at this point is "cbac" in reverse order. - Note that the characters
-
Reversing
ans
, we get "cabc" which is the shortest common supersequence ofstr1
andstr2
.
In this way, by using dynamic programming to determine the LCS of the two strings and carefully backtracking from the end, we have successfully constructed the shortest string containing str1 = "abac"
and str2 = "cab"
as subsequences.
Solution Implementation
1class Solution:
2 def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
3 # Get the lengths of both input strings
4 len_str1, len_str2 = len(str1), len(str2)
5
6 # Dynamic programming table f, where f[i][j] will store the length of the
7 # longest common subsequence of str1[:i] and str2[:j]
8 dp_table = [[0] * (len_str2 + 1) for _ in range(len_str1 + 1)]
9
10 # Build the table in bottom-up manner
11 for i in range(1, len_str1 + 1):
12 for j in range(1, len_str2 + 1):
13 # If characters match, add 1 to the diagonal value
14 if str1[i - 1] == str2[j - 1]:
15 dp_table[i][j] = dp_table[i - 1][j - 1] + 1
16 # Otherwise, take the maximum value from above or left cell
17 else:
18 dp_table[i][j] = max(dp_table[i - 1][j], dp_table[i][j - 1])
19
20 # This will hold the shortest common supersequence characters
21 sscs = []
22
23 # Initialize pointers for both strings
24 i, j = len_str1, len_str2
25
26 # Trace back from the bottom-right corner of the table
27 while i > 0 or j > 0:
28 if i == 0:
29 # If we've finished str1, add remaining str2 characters
30 j -= 1
31 sscs.append(str2[j])
32 elif j == 0:
33 # If we've finished str2, add remaining str1 characters
34 i -= 1
35 sscs.append(str1[i])
36 else:
37 # Follow the path of the longest common subsequence
38 if dp_table[i][j] == dp_table[i - 1][j]:
39 i -= 1
40 sscs.append(str1[i])
41 elif dp_table[i][j] == dp_table[i][j - 1]:
42 j -= 1
43 sscs.append(str2[j])
44 else:
45 # If characters match, go diagonally up-left and add the character
46 i -= 1
47 j -= 1
48 sscs.append(str1[i])
49
50 # The sscs list contains the shortest common supersequence in reverse order;
51 # reverse it to get the correct sequence
52 return ''.join(sscs[::-1])
53
1class Solution {
2 public String shortestCommonSupersequence(String str1, String str2) {
3 int str1Length = str1.length(), str2Length = str2.length();
4 int[][] dp = new int[str1Length + 1][str2Length + 1];
5
6 // Calculate the length of longest common subsequence using dynamic programming
7 for (int i = 1; i <= str1Length; ++i) {
8 for (int j = 1; j <= str2Length; ++j) {
9 if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
10 // Characters match, take diagonal value plus one
11 dp[i][j] = dp[i - 1][j - 1] + 1;
12 } else {
13 // Choose the maximum value from the cell above or the cell to the left
14 dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
15 }
16 }
17 }
18
19 // Reconstruct the shortest common supersequence from the dp table
20 StringBuilder shortestSupersequence = new StringBuilder();
21 int i = str1Length, j = str2Length;
22
23 while (i > 0 || j > 0) {
24 if (i == 0) {
25 // If we have reached the beginning of str1, add remaining characters from str2
26 shortestSupersequence.append(str2.charAt(--j));
27 } else if (j == 0) {
28 // If we have reached the beginning of str2, add remaining characters from str1
29 shortestSupersequence.append(str1.charAt(--i));
30 } else {
31 // Move diagonally if characters match
32 if (dp[i][j] == dp[i - 1][j - 1] + 1) {
33 shortestSupersequence.append(str1.charAt(--i));
34 --j;
35 } else if (dp[i][j] == dp[i - 1][j]) {
36 // Move up if the character from str1 is part of the supersequence
37 shortestSupersequence.append(str1.charAt(--i));
38 } else {
39 // Move left if the character from str2 is part of the supersequence
40 shortestSupersequence.append(str2.charAt(--j));
41 }
42 }
43 }
44
45 // The constructed supersequence is in reverse, so reverse it to get the final string
46 return shortestSupersequence.reverse().toString();
47 }
48}
49
1#include <vector>
2#include <algorithm>
3#include <string>
4
5class Solution {
6public:
7 // Method calculates the shortest common supersequence of two strings
8 std::string shortestCommonSupersequence(std::string str1, std::string str2) {
9 int m = str1.size(); // Length of str1
10 int n = str2.size(); // Length of str2
11
12 // dp table with dimensions m+1 by n+1, initialized to 0
13 std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));
14
15 // Fill the dp table
16 for (int i = 1; i <= m; ++i) {
17 for (int j = 1; j <= n; ++j) {
18 if (str1[i - 1] == str2[j - 1]) {
19 // Characters match, increment the length of common subsequence
20 dp[i][j] = dp[i - 1][j - 1] + 1;
21 } else {
22 // Characters do not match, take the maximum from either the top or left cell
23 dp[i][j] = std::max(dp[i - 1][j], dp[i][j - 1]);
24 }
25 }
26 }
27
28 // Reconstruct the shortest common supersequence from the dp table
29 std::string sequence;
30 int i = m, j = n;
31 while (i > 0 || j > 0) {
32 if (i == 0) {
33 // If we have reached the beginning of str1, append remaining str2
34 sequence += str2[--j];
35 } else if (j == 0) {
36 // If we have reached the beginning of str2, append remaining str1
37 sequence += str1[--i];
38 } else {
39 // Decide which character to append from either str1 or str2
40 if (dp[i][j] == dp[i - 1][j]) {
41 // Coming from top, append str1's character
42 sequence += str1[--i];
43 } else if (dp[i][j] == dp[i][j - 1]) {
44 // Coming from left, append str2's character
45 sequence += str2[--j];
46 } else {
47 // When characters match, append one of them and move diagonally
48 sequence += str1[--i];
49 --j;
50 }
51 }
52 }
53
54 // Since the construction was from back to front, reverse the string
55 std::reverse(sequence.begin(), sequence.end());
56
57 return sequence;
58 }
59};
60
1function shortestCommonSupersequence(str1: string, str2: string): string {
2 const str1Length = str1.length;
3 const str2Length = str2.length;
4 // Create a 2D matrix to store the lengths of the longest common subsequences
5 const dpMatrix = new Array(str1Length + 1).fill(0).map(() => new Array(str2Length + 1).fill(0));
6
7 // Build the matrix with the lengths of the longest common subsequences
8 for (let i = 1; i <= str1Length; ++i) {
9 for (let j = 1; j <= str2Length; ++j) {
10 if (str1[i - 1] === str2[j - 1]) {
11 dpMatrix[i][j] = dpMatrix[i - 1][j - 1] + 1;
12 } else {
13 dpMatrix[i][j] = Math.max(dpMatrix[i - 1][j], dpMatrix[i][j - 1]);
14 }
15 }
16 }
17
18 // Initialize an array to build the shortest common supersequence
19 let supersequence: string[] = [];
20 let i = str1Length;
21 let j = str2Length;
22
23 // Backtrack through the matrix to find the shortest common supersequence
24 while (i > 0 || j > 0) {
25 if (i === 0) {
26 supersequence.push(str2[--j]);
27 } else if (j === 0) {
28 supersequence.push(str1[--i]);
29 } else {
30 if (dpMatrix[i][j] === dpMatrix[i - 1][j]) {
31 supersequence.push(str1[--i]);
32 } else if (dpMatrix[i][j] === dpMatrix[i][j - 1]) {
33 supersequence.push(str2[--j]);
34 } else {
35 supersequence.push(str1[--i]);
36 --j;
37 }
38 }
39 }
40
41 // Reverse the array and join it to form the final supersequence string
42 return supersequence.reverse().join('');
43}
44
Time and Space Complexity
The given Python code defines a method shortestCommonSupersequence
to find the shortest common supersequence of two strings str1
and str2
. To analyze the computational complexities, we need to consider both the time taken to execute the code, which depends on the number of operations performed, and the space required to store the data during execution.
Time Complexity
The time complexity of the algorithm is determined by the nested loop structure which iterates over the lengths of the two input strings str1
and str2
. There are two main steps in this algorithm:
-
Filling out the dynamic programming table
f
, which has dimensions(m + 1) * (n + 1)
, wherem
is the length ofstr1
andn
is the length ofstr2
. Each cell in the table is filled in constant time, so the time required to fill this table is directly proportional to the number of cells, resulting in a time complexity ofO(m * n)
. -
Constructing the shortest common supersequence by traversing back from the bottom-right corner of the table to the top-left, appending characters accordingly. The length of this path is
m + n
in the worst case because each step either moves left or up in the table, and there can be at mostm
upward moves andn
left moves. Therefore, the time complexity for the reconstruction step isO(m + n)
.
The overall time complexity of the algorithm is the sum of these two parts, predominantly the first since it involves iterating over a two-dimensional array. Thus, the total time complexity is O(m * n) + O(m + n)
, which simplifies to O(m * n)
because m * n
dominates m + n
for large values of m
and n
.
Space Complexity
The space complexity is dictated by the space required to store the dynamic programming table f
, which has (m + 1) * (n + 1)
cells, each storing an integer. Hence, the space complexity is O(m * n)
to accommodate this table. Additional space is used for the list ans
which is used to construct the shortest common supersequence. In the worst case, ans
could grow to a size of m + n
, when all characters from both str1
and str2
are used in the supersequence. However, the space complexity remains O(m * n)
because m * n
dominates m + n
.
In summary, the time complexity of the code is O(m * n)
, and the space complexity of the code is also O(m * n)
.
Learn more about how to find time and space complexity quickly using problem constraints.
Which of the tree traversal order can be used to obtain elements in a binary search tree in sorted order?
Recommended Readings
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Want a Structured Path to Master System Design Too? Don’t Miss This!