583. Delete Operation for Two Strings
Problem Description
The problem entails figuring out the least number of steps necessary to equalize two strings, word1
and word2
. A single step consists of deleting one character from either one of the strings. This procedure is repeated until both strings are identical. The goal is to identify the minimum number of such deletions to make the two strings match.
Intuition
The intuition behind solving this task lies in the concept of finding the longest common subsequence (LCS) between the two strings. The LCS is the longest sequence of characters that appear in both strings in the same order, possibly with other characters in between. Once we find the LCS, we know that the characters not part of the LCS need to be deleted to make the strings the same.
To find the LCS, we can use dynamic programming. The main idea is to construct a 2D array dp
where each element dp[i][j]
represents the length of the LCS between word1
up to index i
and word2
up to index j
. If the characters at word1[i]
and word2[j]
match, it means we can extend the LCS by one more character, so we take the LCS up to word1[i-1]
and word2[j-1]
. If they do not match, we take the longer of the LCSs without the current character of either word1
or word2
. The edges of the array are initialized to the index values, representing the need to delete all previous characters in a string to match an empty string.
Once the dp
array is fully populated, the value at dp[m][n]
(where m
and n
are the lengths of word1
and word2
, respectively) will give us the length of the LCS. The minimum number of deletions required is then the sum of the lengths of the two input strings minus twice the length of the LCS, which accounts for every character not in the LCS needing to be deleted from both strings.
Learn more about Dynamic Programming patterns.
Solution Approach
The solution adopts a dynamic programming approach to efficiently compute the number of necessary deletion steps. Dynamic programming is a method for solving complex problems by breaking them down into simpler subproblems and storing the results of those subproblems to avoid redundant computations.
Here is the breakdown of the implementation:
-
Initialize the 2D array
dp
with dimensions(m+1) x (n+1)
, wherem
is the length ofword1
andn
is the length ofword2
. This array will hold the lengths of LCS for different substrings ofword1
andword2
. We add 1 to include the empty string cases as well. -
Fill the first row and first column of
dp
with increasing indices. This takes into account that converting any string to an empty string requires a number of deletions equal to the length of the string. -
Loop through the matrix starting at
dp[1][1]
and decide for each pair(i, j)
:- If
word1[i - 1]
is equal toword2[j - 1]
, thendp[i][j]
is set todp[i - 1][j - 1]
as the characters match and do not need to be deleted (we extend the LCS). - If the characters do not match, we look at two scenarios: deleting the last character from
word1
orword2
. The value ofdp[i][j]
is set to the minimum of the two possibilities plus one (for the deletion step):dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1])
.
- If
-
After completely filling in the
dp
table, the value atdp[m][n]
is the minimum number of deletion steps necessary for the two words to become the same. This is becausedp[m][n]
represents the size of the LCS, and by subtracting this from the total length of the two strings, we get the number of characters not part of the LCS, which is exactly the number of deletions needed. -
Return
dp[m][n]
as the solution.
The algorithm uses a 2D array for dynamic programming, which is a common data structure for storing intermediary results in dynamic programming tasks. The pattern used is to build up the solution from the smallest subproblems (empty strings) to the full problem by adding one character at a time and determining what the best decision is at each step.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution approach with a small example where word1 = "sea"
and word2 = "eat"
. The goal is to find the minimum number of deletions to make word1
equal to word2
.
-
Initialize the 2D array
dp
. Given thatm = 3
(the length of "sea") andn = 3
(the length of "eat"), thedp
array will have dimensions(4 x 4)
to include the empty string cases. -
Fill the first row and column with increasing indices, reflecting the deletion count to match an empty string.
dp = [ [0, 1, 2, 3], // "" -> "" [1, 0, 0, 0], // "s" -> "" [2, 0, 0, 0], // "se" -> "" [3, 0, 0, 0] // "sea" -> "" ]
-
Loop through the
dp
array and fill it following the rules:- If
word1[i - 1]
equalsword2[j - 1]
, setdp[i][j]
todp[i - 1][j - 1]
. - If they do not match, set
dp[i][j]
to the minimum ofdp[i - 1][j]
anddp[i][j - 1]
plus one. - Following this logic, the array is updated as shown below (explanations in comments):
dp = [ [0, 1, 2, 3], // Starting row and column [1, 1, 2, 2], // "s" -> "e" or "" -> "e", "s" -> "ea" [2, 2, 1, 2], // "se" -> "e", "se" -> "ea" or "s" -> "ea" [3, 2, 2, (1 for 'ea')] // "sea" -> "e" or "s" -> "e", "sea" -> "ea" : 'a' matches ]
The last cell is '1' since 'a' at the end of both words matches, so we carry over the length of LCS without 'a', which was 0 (since "se" -> "e" required no LCS extensions), and add 1 for the common 'a'.
- If
-
After filling in the
dp
table, the last celldp[3][3]
now contains the length of the LCS. This represents that the strings can become the same with a minimum of 2 deletion steps (m + n - 2 * dp[3][3]
=3 + 3 - 2 * 1
=4
). -
The answer is 4, which means that 4 deletions are necessary for
word1
andword2
to become the same.
In conclusion, using dynamic programming, the algorithm iteratively finds the longest common subsequence and uses its length to calculate the minimal number of deletions required to equalize the provided strings.
Solution Implementation
1class Solution:
2 def minDistance(self, word1: str, word2: str) -> int:
3 # Get the lengths of both words
4 length_word1, length_word2 = len(word1), len(word2)
5
6 # Initialize the DP table with dimensions (length_word1+1) x (length_word2+1)
7 dp_table = [[0] * (length_word2 + 1) for _ in range(length_word1 + 1)]
8
9 # Base cases: fill out the first row and column of the DP table
10 for i in range(1, length_word1 + 1):
11 dp_table[i][0] = i # Distance of converting word1 to an empty string
12 for j in range(1, length_word2 + 1):
13 dp_table[0][j] = j # Distance of converting an empty string to word2
14
15 # Fill out the DP table
16 for i in range(1, length_word1 + 1):
17 for j in range(1, length_word2 + 1):
18 # If the characters are the same, no operation is required
19 # Take the previous state's value
20 if word1[i - 1] == word2[j - 1]:
21 dp_table[i][j] = dp_table[i - 1][j - 1]
22 # If the characters are different, consider the minimum of
23 # (deletion from word1, insertion into word1)
24 # Add 1 representing the one operation needed
25 else:
26 dp_table[i][j] = 1 + min(dp_table[i - 1][j], # Delete character from word1
27 dp_table[i][j - 1]) # Insert character into word1
28
29 # The answer is in the cell (length_word1, length_word2)
30 return dp_table[-1][-1]
31
1class Solution {
2 public int minDistance(String word1, String word2) {
3 int lenWord1 = word1.length();
4 int lenWord2 = word2.length();
5
6 // 'dp' table where 'dp[i][j]' will be the edit distance of first 'i' characters of 'word1' and first 'j' characters of 'word2'
7 int[][] dp = new int[lenWord1 + 1][lenWord2 + 1];
8
9 // Initialize the first column, which represents the edits required to convert 'word1' substrings into an empty 'word2'
10 for (int i = 1; i <= lenWord1; ++i) {
11 dp[i][0] = i;
12 }
13
14 // Initialize the first row, which represents the edits required to convert an empty 'word1' into substrings of 'word2'
15 for (int j = 1; j <= lenWord2; ++j) {
16 dp[0][j] = j;
17 }
18
19 // Fill in the rest of the DP table
20 for (int i = 1; i <= lenWord1; ++i) {
21 for (int j = 1; j <= lenWord2; ++j) {
22 // If the current characters of 'word1' and 'word2' match, no additional edit is required, take the diagonal value
23 if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
24 dp[i][j] = dp[i - 1][j - 1];
25 } else {
26 // If the characters don't match, consider the minimum of the cell to the left or above plus one for the edit
27 dp[i][j] = 1 + Math.min(dp[i - 1][j], // Deletion
28 dp[i][j - 1]); // Insertion
29 }
30 }
31 }
32
33 // The edit distance between 'word1' and 'word2' is in the bottom-right corner of the DP table
34 return dp[lenWord1][lenWord2];
35 }
36}
37
1class Solution {
2public:
3 // Function to find the minimum distance between two strings.
4 int minDistance(string word1, string word2) {
5 int length1 = word1.size(); // Length of the first string
6 int length2 = word2.size(); // Length of the second string
7
8 // 2D vector to memorize distances (Dynamic Programming table)
9 vector<vector<int>> distanceMatrix(length1 + 1, vector<int>(length2 + 1));
10
11 // Initialize the first column (all the distances from an empty string to prefixes of word1)
12 for (int i = 1; i <= length1; ++i) {
13 distanceMatrix[i][0] = i;
14 }
15
16 // Initialize the first row (all the distances from an empty string to prefixes of word2)
17 for (int j = 1; j <= length2; ++j) {
18 distanceMatrix[0][j] = j;
19 }
20
21 // Compute distances using the bottom-up approach
22 for (int i = 1; i <= length1; ++i) {
23 for (int j = 1; j <= length2; ++j) {
24 // If characters at current positions in both strings are the same
25 if (word1[i - 1] == word2[j - 1]) {
26 // No operation is needed, the distance is the same as for the previous characters
27 distanceMatrix[i][j] = distanceMatrix[i - 1][j - 1];
28 } else {
29 // Characters are different, consider the minimum of the operations:
30 // (Replace) Take the previous distance where both strings have one less character
31 // (Add/Delete) Take the min of either deleting a character from word1 or adding a character to word2
32 distanceMatrix[i][j] = 1 + min(distanceMatrix[i - 1][j], distanceMatrix[i][j - 1]);
33 }
34 }
35 }
36
37 // The distance from word1 to word2 is stored in the last cell
38 return distanceMatrix[length1][length2];
39 }
40};
41
1function minDistance(word1: string, word2: string): number {
2 // Lengths of both words
3 const word1Length = word1.length;
4 const word2Length = word2.length;
5
6 // Initialize a 2D array for dynamic programming
7 const dpMatrix = Array.from({ length: word1Length + 1 }, () => Array(word2Length + 1).fill(0));
8
9 // Build up the dpMatrix with the lengths of longest common subsequence for substrings
10 for (let i = 1; i <= word1Length; i++) {
11 for (let j = 1; j <= word2Length; j++) {
12 if (word1[i - 1] === word2[j - 1]) {
13 // Characters match, take the diagonal value and add 1
14 dpMatrix[i][j] = dpMatrix[i - 1][j - 1] + 1;
15 } else {
16 // Characters do not match, take the max value from left or top cell
17 dpMatrix[i][j] = Math.max(dpMatrix[i - 1][j], dpMatrix[i][j - 1]);
18 }
19 }
20 }
21
22 // Length of the longest common subsequence
23 const longestCommonSubsequence = dpMatrix[word1Length][word2Length];
24
25 // Minimum number of operations required
26 // Subtract the length of the longest common subsequence from the total length of both strings
27 return word1Length - longestCommonSubsequence + word2Length - longestCommonSubsequence;
28}
29
Time and Space Complexity
The given Python code implements the solution to find the minimum number of operations needed to convert word1
to word2
, where the operations can be insertions, deletions, or substitutions of characters.
Time Complexity
The time complexity of the code can be analyzed based on the nested for
loops:
- The outer loop runs
m
times, wherem
is the length ofword1
. - The inner loop runs
n
times for each iteration of the outer loop, wheren
is the length ofword2
.
Because each cell in the dp
array is computed once, the total number of computations is proportional to the product of m
and n
. Therefore, the time complexity of the code is O(m * n)
.
Space Complexity
The space complexity is determined by the size of the dp
array used in the dynamic programming approach:
- The
dp
array is a 2D array with dimensions(m + 1) x (n + 1)
.
Therefore, the amount of space used is proportional to the product of (m + 1)
and (n + 1)
. Though the +1
is constant and does not affect the asymptotic complexity, it accounts for the extra row and column needed for the base cases. Consequently, the space complexity of the code is also O(m * n)
.
Learn more about how to find time and space complexity quickly using problem constraints.
How many ways can you arrange the three letters A, B and C?
Recommended Readings
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Want a Structured Path to Master System Design Too? Don’t Miss This!