1092. Shortest Common Supersequence
Problem Description
You are given two strings str1
and str2
. Your task is to find the shortest string that contains both str1
and str2
as subsequences.
A subsequence is formed by deleting some (possibly zero) characters from a string without changing the order of the remaining characters. For example, "ace"
is a subsequence of "abcde"
(delete 'b'
and 'd'
).
The goal is to construct the shortest possible string where:
- If you delete some characters from this string, you can get
str1
- If you delete some (other) characters from this string, you can get
str2
If multiple valid shortest strings exist, you can return any one of them.
Example:
- If
str1 = "abac"
andstr2 = "cab"
, one possible shortest common supersequence is"cabac"
.- Deleting the second
'c'
gives"abac"
(which isstr1
) - Deleting the first
'a'
and last'a'
and'c'
gives"cab"
(which isstr2
)
- Deleting the second
The solution uses dynamic programming to first find the Longest Common Subsequence (LCS) between the two strings, then reconstructs the shortest supersequence by:
- Building a DP table
f[i][j]
that stores the length of LCS for the firsti
characters ofstr1
and firstj
characters ofstr2
- Backtracking through the DP table to build the result string by:
- Including characters that are common to both strings once
- Including characters unique to each string in the appropriate positions
Intuition
The key insight is recognizing that the shortest common supersequence is closely related to the Longest Common Subsequence (LCS) of the two strings.
Think about it this way: if two strings share some common characters in the same order, we only need to include those shared characters once in our result. The more characters they share (i.e., the longer their LCS), the shorter our final supersequence will be.
For example, if str1 = "abac"
and str2 = "cab"
, their LCS is "ab"
. To build the shortest supersequence:
- We must include all characters from both strings
- But we can "merge" the common subsequence characters instead of duplicating them
The relationship can be expressed as:
Length of shortest supersequence = len(str1) + len(str2) - len(LCS)
This is because:
- We need all characters from
str1
(lengthm
) - We need all characters from
str2
(lengthn
) - But the LCS characters are counted twice, so we subtract them once
Once we understand this relationship, the solution strategy becomes clear:
- First, find the LCS using dynamic programming
- Then, reconstruct the actual supersequence by backtracking through the DP table
During reconstruction, we traverse backwards through the DP table:
- When characters match (part of LCS), we include that character once
- When they don't match, we need to include the character from whichever string would maintain the subsequence property
- At boundaries (when one string is exhausted), we include all remaining characters from the other string
This greedy reconstruction ensures we build the shortest possible string that contains both original strings as subsequences.
Learn more about Dynamic Programming patterns.
Solution Approach
The implementation consists of two main phases: building the LCS table and reconstructing the shortest supersequence.
Phase 1: Building the LCS Dynamic Programming Table
We create a 2D table f[i][j]
where f[i][j]
represents the length of the LCS between the first i
characters of str1
and the first j
characters of str2
.
f = [[0] * (n + 1) for _ in range(m + 1)]
The table is filled using the recurrence relation:
- If
str1[i-1] == str2[j-1]
:f[i][j] = f[i-1][j-1] + 1
(characters match, extend LCS) - Otherwise:
f[i][j] = max(f[i-1][j], f[i][j-1])
(take the better LCS without current character)
Phase 2: Backtracking to Build the Result
Starting from f[m][n]
(bottom-right corner), we trace back to reconstruct the shortest supersequence:
i, j = m, n ans = []
The backtracking logic follows these rules:
-
Boundary cases:
- If
i == 0
: We've exhaustedstr1
, so add remaining characters fromstr2[j]
- If
j == 0
: We've exhaustedstr2
, so add remaining characters fromstr1[i]
- If
-
Non-boundary cases:
- If
f[i][j] == f[i-1][j]
: The LCS value came from excludingstr1[i-1]
, so we addstr1[i-1]
to result and move to(i-1, j)
- Else if
f[i][j] == f[i][j-1]
: The LCS value came from excludingstr2[j-1]
, so we addstr2[j-1]
to result and move to(i, j-1)
- Else: Characters matched (part of LCS), add the character once and move diagonally to
(i-1, j-1)
- If
Since we build the result by traversing backwards, we reverse the final array:
return ''.join(ans[::-1])
Time Complexity: O(m * n)
for building the DP table and O(m + n)
for reconstruction
Space Complexity: O(m * n)
for the DP table
The algorithm guarantees the shortest supersequence because it includes each LCS character exactly once while preserving all non-LCS characters from both strings in their required order.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the algorithm with str1 = "abc"
and str2 = "adc"
.
Step 1: Build the LCS DP Table
We create a table f[i][j]
where f[i][j]
represents the LCS length for the first i
characters of str1
and first j
characters of str2
.
"" a d c "" 0 0 0 0 a 0 1 1 1 b 0 1 1 1 c 0 1 1 2
Building process:
f[1][1]
:str1[0]='a'
==str2[0]='a'
, sof[1][1] = f[0][0] + 1 = 1
f[1][2]
:str1[0]='a'
!=str2[1]='d'
, sof[1][2] = max(f[0][2], f[1][1]) = 1
f[2][1]
:str1[1]='b'
!=str2[0]='a'
, sof[2][1] = max(f[1][1], f[2][0]) = 1
f[3][3]
:str1[2]='c'
==str2[2]='c'
, sof[3][3] = f[2][2] + 1 = 2
The LCS length is 2 (characters 'a' and 'c').
Step 2: Backtrack to Build Result
Starting from (i=3, j=3)
, we trace backwards:
-
At (3,3):
f[3][3]=2
!=f[2][3]=1
and !=f[3][2]=1
- This means
str1[2]='c'
andstr2[2]='c'
match - Add 'c' to result:
ans = ['c']
- Move diagonally to (2,2)
- This means
-
At (2,2):
f[2][2]=1
==f[2][1]=1
- The LCS value came from excluding
str2[1]='d'
- Add 'd' to result:
ans = ['c', 'd']
- Move left to (2,1)
- The LCS value came from excluding
-
At (2,1):
f[2][1]=1
==f[1][1]=1
- The LCS value came from excluding
str1[1]='b'
- Add 'b' to result:
ans = ['c', 'd', 'b']
- Move up to (1,1)
- The LCS value came from excluding
-
At (1,1):
f[1][1]=1
!=f[0][1]=0
and !=f[1][0]=0
- This means
str1[0]='a'
andstr2[0]='a'
match - Add 'a' to result:
ans = ['c', 'd', 'b', 'a']
- Move diagonally to (0,0)
- This means
-
At (0,0): Both strings exhausted, stop.
Step 3: Reverse the Result
Since we built backwards, reverse the array: ['c', 'd', 'b', 'a']
→ "adbc"
Verification:
"adbc"
contains"abc"
as subsequence (delete 'd')"adbc"
contains"adc"
as subsequence (delete 'b')- Length is 4, which equals
len("abc") + len("adc") - LCS_length = 3 + 3 - 2 = 4
✓
The shortest common supersequence is "adbc"
.
Solution Implementation
1class Solution:
2 def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
3 # Get lengths of both strings
4 len1, len2 = len(str1), len(str2)
5
6 # Create DP table to store length of longest common subsequence (LCS)
7 # dp[i][j] represents LCS length of str1[0:i] and str2[0:j]
8 dp = [[0] * (len2 + 1) for _ in range(len1 + 1)]
9
10 # Fill the DP table to find LCS lengths
11 for i in range(1, len1 + 1):
12 for j in range(1, len2 + 1):
13 if str1[i - 1] == str2[j - 1]:
14 # Characters match, extend the LCS
15 dp[i][j] = dp[i - 1][j - 1] + 1
16 else:
17 # Characters don't match, take maximum from previous states
18 dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
19
20 # Build the shortest common supersequence by backtracking through DP table
21 result = []
22 i, j = len1, len2
23
24 while i > 0 or j > 0:
25 if i == 0:
26 # Reached end of str1, add remaining characters from str2
27 j -= 1
28 result.append(str2[j])
29 elif j == 0:
30 # Reached end of str2, add remaining characters from str1
31 i -= 1
32 result.append(str1[i])
33 else:
34 # Both strings have remaining characters
35 if dp[i][j] == dp[i - 1][j]:
36 # Current character from str1 is not part of LCS
37 i -= 1
38 result.append(str1[i])
39 elif dp[i][j] == dp[i][j - 1]:
40 # Current character from str2 is not part of LCS
41 j -= 1
42 result.append(str2[j])
43 else:
44 # Characters match and are part of LCS, add once
45 i -= 1
46 j -= 1
47 result.append(str1[i])
48
49 # Reverse the result since we built it backwards
50 return ''.join(reversed(result))
51
1class Solution {
2 public String shortestCommonSupersequence(String str1, String str2) {
3 int len1 = str1.length();
4 int len2 = str2.length();
5
6 // dp[i][j] represents the length of LCS (Longest Common Subsequence)
7 // of str1[0...i-1] and str2[0...j-1]
8 int[][] dp = new int[len1 + 1][len2 + 1];
9
10 // Build the LCS table using dynamic programming
11 for (int i = 1; i <= len1; i++) {
12 for (int j = 1; j <= len2; j++) {
13 if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
14 // Characters match, extend the LCS
15 dp[i][j] = dp[i - 1][j - 1] + 1;
16 } else {
17 // Characters don't match, take the maximum LCS from previous states
18 dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
19 }
20 }
21 }
22
23 // Reconstruct the shortest common supersequence by backtracking through the DP table
24 int i = len1;
25 int j = len2;
26 StringBuilder result = new StringBuilder();
27
28 while (i > 0 || j > 0) {
29 if (i == 0) {
30 // Reached the end of str1, append remaining characters from str2
31 result.append(str2.charAt(--j));
32 } else if (j == 0) {
33 // Reached the end of str2, append remaining characters from str1
34 result.append(str1.charAt(--i));
35 } else {
36 // Check where the current LCS value came from
37 if (dp[i][j] == dp[i - 1][j]) {
38 // LCS value came from top cell, character from str1 is not in LCS
39 result.append(str1.charAt(--i));
40 } else if (dp[i][j] == dp[i][j - 1]) {
41 // LCS value came from left cell, character from str2 is not in LCS
42 result.append(str2.charAt(--j));
43 } else {
44 // Characters match and are part of LCS, add once and move both pointers
45 result.append(str1.charAt(--i));
46 j--;
47 }
48 }
49 }
50
51 // Reverse the result since we built it backwards
52 return result.reverse().toString();
53 }
54}
55
1class Solution {
2public:
3 string shortestCommonSupersequence(string str1, string str2) {
4 int len1 = str1.size();
5 int len2 = str2.size();
6
7 // Build LCS (Longest Common Subsequence) table
8 // dp[i][j] represents the length of LCS of str1[0...i-1] and str2[0...j-1]
9 vector<vector<int>> dp(len1 + 1, vector<int>(len2 + 1, 0));
10
11 // Fill the LCS table using dynamic programming
12 for (int i = 1; i <= len1; ++i) {
13 for (int j = 1; j <= len2; ++j) {
14 if (str1[i - 1] == str2[j - 1]) {
15 // Characters match, extend the LCS
16 dp[i][j] = dp[i - 1][j - 1] + 1;
17 } else {
18 // Characters don't match, take maximum from previous states
19 dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
20 }
21 }
22 }
23
24 // Reconstruct the shortest common supersequence by backtracking through the LCS table
25 int i = len1;
26 int j = len2;
27 string result;
28
29 while (i > 0 || j > 0) {
30 if (i == 0) {
31 // Reached end of str1, add remaining characters from str2
32 result += str2[--j];
33 } else if (j == 0) {
34 // Reached end of str2, add remaining characters from str1
35 result += str1[--i];
36 } else {
37 // Check where the current LCS value came from
38 if (dp[i][j] == dp[i - 1][j]) {
39 // LCS value came from top cell, add character from str1
40 result += str1[--i];
41 } else if (dp[i][j] == dp[i][j - 1]) {
42 // LCS value came from left cell, add character from str2
43 result += str2[--j];
44 } else {
45 // Characters matched (diagonal move), add the common character once
46 result += str1[--i];
47 --j;
48 }
49 }
50 }
51
52 // Reverse the result since we built it backwards
53 reverse(result.begin(), result.end());
54 return result;
55 }
56};
57
1/**
2 * Finds the shortest common supersequence of two strings.
3 * A supersequence is a string that has both input strings as subsequences.
4 *
5 * @param str1 - First input string
6 * @param str2 - Second input string
7 * @returns The shortest string containing both str1 and str2 as subsequences
8 */
9function shortestCommonSupersequence(str1: string, str2: string): string {
10 const length1: number = str1.length;
11 const length2: number = str2.length;
12
13 // Create a 2D DP table to store the length of longest common subsequence
14 // dp[i][j] represents the LCS length of str1[0...i-1] and str2[0...j-1]
15 const dp: number[][] = Array.from({ length: length1 + 1 }, () =>
16 new Array<number>(length2 + 1).fill(0)
17 );
18
19 // Build the LCS table using dynamic programming
20 for (let i = 1; i <= length1; i++) {
21 for (let j = 1; j <= length2; j++) {
22 if (str1[i - 1] === str2[j - 1]) {
23 // Characters match: extend the LCS by 1
24 dp[i][j] = dp[i - 1][j - 1] + 1;
25 } else {
26 // Characters don't match: take the maximum LCS from previous states
27 dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
28 }
29 }
30 }
31
32 // Reconstruct the shortest common supersequence by backtracking through the DP table
33 const result: string[] = [];
34 let row: number = length1;
35 let col: number = length2;
36
37 while (row > 0 || col > 0) {
38 if (row === 0) {
39 // Reached the top edge: add remaining characters from str2
40 result.push(str2[--col]);
41 } else if (col === 0) {
42 // Reached the left edge: add remaining characters from str1
43 result.push(str1[--row]);
44 } else {
45 if (dp[row][col] === dp[row - 1][col]) {
46 // Current LCS value came from above: add character from str1
47 result.push(str1[--row]);
48 } else if (dp[row][col] === dp[row][col - 1]) {
49 // Current LCS value came from left: add character from str2
50 result.push(str2[--col]);
51 } else {
52 // Characters matched: add the common character and move diagonally
53 result.push(str1[--row]);
54 col--;
55 }
56 }
57 }
58
59 // Reverse the result since we built it backwards
60 return result.reverse().join('');
61}
62
Time and Space Complexity
Time Complexity: O(m * n + m + n)
which simplifies to O(m * n)
The algorithm consists of two main phases:
-
Building the DP table: The nested loops iterate through all cells of the
f
table, where each cell operation takesO(1)
time. This results inO(m * n)
time complexity. -
Reconstructing the shortest common supersequence: The while loop backtracks through the DP table from position
(m, n)
to(0, 0)
. In the worst case, we traverse at mostm + n
steps (moving either up or left in each iteration). Each operation within the loop takesO(1)
time. This phase hasO(m + n)
time complexity.
The overall time complexity is O(m * n) + O(m + n) = O(m * n)
since m * n
dominates for non-trivial inputs.
Space Complexity: O(m * n)
The space usage breaks down as follows:
-
DP table
f
: A 2D array of size(m + 1) × (n + 1)
storing the longest common subsequence lengths, requiringO(m * n)
space. -
Result list
ans
: In the worst case where the two strings have no common characters, the shortest common supersequence has lengthm + n
, requiringO(m + n)
space. -
Other variables:
i
,j
,m
,n
useO(1)
space.
The total space complexity is O(m * n) + O(m + n) + O(1) = O(m * n)
.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
Pitfall 1: Incorrect Backtracking Order
The Problem:
A common mistake is incorrectly checking the DP table conditions during backtracking. Specifically, developers often check dp[i][j] == dp[i][j-1]
before dp[i][j] == dp[i-1][j]
, which can lead to an incorrect supersequence that doesn't maintain the proper character order.
Why It Happens:
The order matters because when both dp[i-1][j]
and dp[i][j-1]
equal dp[i][j]
, we have a choice. The algorithm must consistently follow one path to ensure correctness.
Incorrect Implementation:
if dp[i][j] == dp[i][j - 1]: # Checking this first j -= 1 result.append(str2[j]) elif dp[i][j] == dp[i - 1][j]: # Checking this second i -= 1 result.append(str1[i])
Solution:
Always check dp[i-1][j]
first, then dp[i][j-1]
. This ensures a consistent traversal pattern:
if dp[i][j] == dp[i - 1][j]: i -= 1 result.append(str1[i]) elif dp[i][j] == dp[i][j - 1]: j -= 1 result.append(str2[j]) else: # Characters match (part of LCS) i -= 1 j -= 1 result.append(str1[i])
Pitfall 2: Off-by-One Errors in Index Management
The Problem:
When backtracking, it's easy to confuse whether to use the character at index i
or i-1
after decrementing. Since the DP table uses 1-based indexing for convenience but strings use 0-based indexing, this mismatch can cause bugs.
Incorrect Implementation:
# Wrong: Using index before decrementing result.append(str1[i - 1]) i -= 1
Solution: Always decrement the index first, then use it to access the string:
# Correct: Decrement first, then use i -= 1 result.append(str1[i]) # Now i is already decremented
Pitfall 3: Forgetting to Handle Empty String Cases
The Problem:
If either str1
or str2
is empty, the algorithm should simply return the other string. Not handling this edge case explicitly can lead to unnecessary computation or errors.
Solution: Add early return conditions:
def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
if not str1:
return str2
if not str2:
return str1
# Continue with the main algorithm...
Pitfall 4: Building Result in Wrong Direction
The Problem:
During backtracking, we traverse from (m, n)
to (0, 0)
, which means we're building the result string backwards. Forgetting to reverse the final result is a common oversight.
Incorrect Implementation:
return ''.join(result) # Forgot to reverse!
Solution: Always reverse the result before returning:
return ''.join(reversed(result))
# or
return ''.join(result[::-1])
In a binary min heap, the minimum element can be found in:
Recommended Readings
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Don’t Miss This!