583. Delete Operation for Two Strings


Problem Description

The problem entails figuring out the least number of steps necessary to equalize two strings, word1 and word2. A single step consists of deleting one character from either one of the strings. This procedure is repeated until both strings are identical. The goal is to identify the minimum number of such deletions to make the two strings match.

Intuition

The intuition behind solving this task lies in the concept of finding the longest common subsequence (LCS) between the two strings. The LCS is the longest sequence of characters that appear in both strings in the same order, possibly with other characters in between. Once we find the LCS, we know that the characters not part of the LCS need to be deleted to make the strings the same.

To find the LCS, we can use dynamic programming. The main idea is to construct a 2D array dp where each element dp[i][j] represents the length of the LCS between word1 up to index i and word2 up to index j. If the characters at word1[i] and word2[j] match, it means we can extend the LCS by one more character, so we take the LCS up to word1[i-1] and word2[j-1]. If they do not match, we take the longer of the LCSs without the current character of either word1 or word2. The edges of the array are initialized to the index values, representing the need to delete all previous characters in a string to match an empty string.

Once the dp array is fully populated, the value at dp[m][n] (where m and n are the lengths of word1 and word2, respectively) will give us the length of the LCS. The minimum number of deletions required is then the sum of the lengths of the two input strings minus twice the length of the LCS, which accounts for every character not in the LCS needing to be deleted from both strings.

Learn more about Dynamic Programming patterns.

Not Sure What to Study? Take the 2-min Quiz to Find Your Missing Piece

Which type of traversal does breadth first search do?

Solution Approach

The solution adopts a dynamic programming approach to efficiently compute the number of necessary deletion steps. Dynamic programming is a method for solving complex problems by breaking them down into simpler subproblems and storing the results of those subproblems to avoid redundant computations.

Here is the breakdown of the implementation:

  1. Initialize the 2D array dp with dimensions (m+1) x (n+1), where m is the length of word1 and n is the length of word2. This array will hold the lengths of LCS for different substrings of word1 and word2. We add 1 to include the empty string cases as well.

  2. Fill the first row and first column of dp with increasing indices. This takes into account that converting any string to an empty string requires a number of deletions equal to the length of the string.

  3. Loop through the matrix starting at dp[1][1] and decide for each pair (i, j):

    • If word1[i - 1] is equal to word2[j - 1], then dp[i][j] is set to dp[i - 1][j - 1] as the characters match and do not need to be deleted (we extend the LCS).
    • If the characters do not match, we look at two scenarios: deleting the last character from word1 or word2. The value of dp[i][j] is set to the minimum of the two possibilities plus one (for the deletion step): dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1]).
  4. After completely filling in the dp table, the value at dp[m][n] is the minimum number of deletion steps necessary for the two words to become the same. This is because dp[m][n] represents the size of the LCS, and by subtracting this from the total length of the two strings, we get the number of characters not part of the LCS, which is exactly the number of deletions needed.

  5. Return dp[m][n] as the solution.

The algorithm uses a 2D array for dynamic programming, which is a common data structure for storing intermediary results in dynamic programming tasks. The pattern used is to build up the solution from the smallest subproblems (empty strings) to the full problem by adding one character at a time and determining what the best decision is at each step.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Consider the classic dynamic programming of fibonacci numbers, what is the recurrence relation?

Example Walkthrough

Let's walk through the solution approach with a small example where word1 = "sea" and word2 = "eat". The goal is to find the minimum number of deletions to make word1 equal to word2.

  1. Initialize the 2D array dp. Given that m = 3 (the length of "sea") and n = 3 (the length of "eat"), the dp array will have dimensions (4 x 4) to include the empty string cases.

  2. Fill the first row and column with increasing indices, reflecting the deletion count to match an empty string.

    1dp = [
    2    [0, 1, 2, 3],  // "" -> ""
    3    [1, 0, 0, 0],  // "s" -> ""
    4    [2, 0, 0, 0],  // "se" -> ""
    5    [3, 0, 0, 0]   // "sea" -> ""
    6]
  3. Loop through the dp array and fill it following the rules:

    • If word1[i - 1] equals word2[j - 1], set dp[i][j] to dp[i - 1][j - 1].
    • If they do not match, set dp[i][j] to the minimum of dp[i - 1][j] and dp[i][j - 1] plus one.
    • Following this logic, the array is updated as shown below (explanations in comments):
    1dp = [
    2    [0, 1, 2, 3],                // Starting row and column
    3    [1, 1, 2, 2],                // "s" -> "e" or "" -> "e", "s" -> "ea"
    4    [2, 2, 1, 2],                // "se" -> "e", "se" -> "ea" or "s" -> "ea"
    5    [3, 2, 2, (1 for 'ea')]     // "sea" -> "e" or "s" -> "e", "sea" -> "ea"  : 'a' matches
    6]

    The last cell is '1' since 'a' at the end of both words matches, so we carry over the length of LCS without 'a', which was 0 (since "se" -> "e" required no LCS extensions), and add 1 for the common 'a'.

  4. After filling in the dp table, the last cell dp[3][3] now contains the length of the LCS. This represents that the strings can become the same with a minimum of 2 deletion steps (m + n - 2 * dp[3][3] = 3 + 3 - 2 * 1 = 4).

  5. The answer is 4, which means that 4 deletions are necessary for word1 and word2 to become the same.

In conclusion, using dynamic programming, the algorithm iteratively finds the longest common subsequence and uses its length to calculate the minimal number of deletions required to equalize the provided strings.

Solution Implementation

1class Solution:
2    def minDistance(self, word1: str, word2: str) -> int:
3        # Get the lengths of both words
4        length_word1, length_word2 = len(word1), len(word2)
5      
6        # Initialize the DP table with dimensions (length_word1+1) x (length_word2+1)
7        dp_table = [[0] * (length_word2 + 1) for _ in range(length_word1 + 1)]
8      
9        # Base cases: fill out the first row and column of the DP table
10        for i in range(1, length_word1 + 1):
11            dp_table[i][0] = i  # Distance of converting word1 to an empty string
12        for j in range(1, length_word2 + 1):
13            dp_table[0][j] = j  # Distance of converting an empty string to word2
14      
15        # Fill out the DP table
16        for i in range(1, length_word1 + 1):
17            for j in range(1, length_word2 + 1):
18                # If the characters are the same, no operation is required
19                # Take the previous state's value
20                if word1[i - 1] == word2[j - 1]:
21                    dp_table[i][j] = dp_table[i - 1][j - 1]
22                # If the characters are different, consider the minimum of
23                # (deletion from word1, insertion into word1)
24                # Add 1 representing the one operation needed
25                else:
26                    dp_table[i][j] = 1 + min(dp_table[i - 1][j],  # Delete character from word1
27                                              dp_table[i][j - 1]) # Insert character into word1
28
29        # The answer is in the cell (length_word1, length_word2)
30        return dp_table[-1][-1]
31
1class Solution {
2    public int minDistance(String word1, String word2) {
3        int lenWord1 = word1.length();
4        int lenWord2 = word2.length();
5
6        // 'dp' table where 'dp[i][j]' will be the edit distance of first 'i' characters of 'word1' and first 'j' characters of 'word2'
7        int[][] dp = new int[lenWord1 + 1][lenWord2 + 1];
8
9        // Initialize the first column, which represents the edits required to convert 'word1' substrings into an empty 'word2'
10        for (int i = 1; i <= lenWord1; ++i) {
11            dp[i][0] = i;
12        }
13
14        // Initialize the first row, which represents the edits required to convert an empty 'word1' into substrings of 'word2'
15        for (int j = 1; j <= lenWord2; ++j) {
16            dp[0][j] = j;
17        }
18
19        // Fill in the rest of the DP table
20        for (int i = 1; i <= lenWord1; ++i) {
21            for (int j = 1; j <= lenWord2; ++j) {
22                // If the current characters of 'word1' and 'word2' match, no additional edit is required, take the diagonal value
23                if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
24                    dp[i][j] = dp[i - 1][j - 1];
25                } else {
26                    // If the characters don't match, consider the minimum of the cell to the left or above plus one for the edit
27                    dp[i][j] = 1 + Math.min(dp[i - 1][j],     // Deletion
28                                            dp[i][j - 1]);    // Insertion
29                }
30            }
31        }
32
33        // The edit distance between 'word1' and 'word2' is in the bottom-right corner of the DP table
34        return dp[lenWord1][lenWord2];
35    }
36}
37
1class Solution {
2public:
3    // Function to find the minimum distance between two strings.
4    int minDistance(string word1, string word2) {
5        int length1 = word1.size(); // Length of the first string
6        int length2 = word2.size(); // Length of the second string
7
8        // 2D vector to memorize distances (Dynamic Programming table)
9        vector<vector<int>> distanceMatrix(length1 + 1, vector<int>(length2 + 1));
10      
11        // Initialize the first column (all the distances from an empty string to prefixes of word1)
12        for (int i = 1; i <= length1; ++i) {
13            distanceMatrix[i][0] = i;
14        }
15
16        // Initialize the first row (all the distances from an empty string to prefixes of word2)
17        for (int j = 1; j <= length2; ++j) {
18            distanceMatrix[0][j] = j;
19        }
20
21        // Compute distances using the bottom-up approach
22        for (int i = 1; i <= length1; ++i) {
23            for (int j = 1; j <= length2; ++j) {
24                // If characters at current positions in both strings are the same
25                if (word1[i - 1] == word2[j - 1]) {
26                    // No operation is needed, the distance is the same as for the previous characters
27                    distanceMatrix[i][j] = distanceMatrix[i - 1][j - 1];
28                } else {
29                    // Characters are different, consider the minimum of the operations:
30                    // (Replace) Take the previous distance where both strings have one less character
31                    // (Add/Delete) Take the min of either deleting a character from word1 or adding a character to word2
32                    distanceMatrix[i][j] = 1 + min(distanceMatrix[i - 1][j], distanceMatrix[i][j - 1]);
33                }
34            }
35        }
36
37        // The distance from word1 to word2 is stored in the last cell
38        return distanceMatrix[length1][length2];
39    }
40};
41
1function minDistance(word1: string, word2: string): number {
2    // Lengths of both words
3    const word1Length = word1.length;
4    const word2Length = word2.length;
5
6    // Initialize a 2D array for dynamic programming
7    const dpMatrix = Array.from({ length: word1Length + 1 }, () => Array(word2Length + 1).fill(0));
8
9    // Build up the dpMatrix with the lengths of longest common subsequence for substrings
10    for (let i = 1; i <= word1Length; i++) {
11        for (let j = 1; j <= word2Length; j++) {
12            if (word1[i - 1] === word2[j - 1]) {
13                // Characters match, take the diagonal value and add 1
14                dpMatrix[i][j] = dpMatrix[i - 1][j - 1] + 1;
15            } else {
16                // Characters do not match, take the max value from left or top cell
17                dpMatrix[i][j] = Math.max(dpMatrix[i - 1][j], dpMatrix[i][j - 1]);
18            }
19        }
20    }
21
22    // Length of the longest common subsequence
23    const longestCommonSubsequence = dpMatrix[word1Length][word2Length];
24
25    // Minimum number of operations required
26    // Subtract the length of the longest common subsequence from the total length of both strings
27    return word1Length - longestCommonSubsequence + word2Length - longestCommonSubsequence;
28}
29
Not Sure What to Study? Take the 2-min Quiz

Which of these properties could exist for a graph but not a tree?

Time and Space Complexity

The given Python code implements the solution to find the minimum number of operations needed to convert word1 to word2, where the operations can be insertions, deletions, or substitutions of characters.

Time Complexity

The time complexity of the code can be analyzed based on the nested for loops:

  • The outer loop runs m times, where m is the length of word1.
  • The inner loop runs n times for each iteration of the outer loop, where n is the length of word2.

Because each cell in the dp array is computed once, the total number of computations is proportional to the product of m and n. Therefore, the time complexity of the code is O(m * n).

Space Complexity

The space complexity is determined by the size of the dp array used in the dynamic programming approach:

  • The dp array is a 2D array with dimensions (m + 1) x (n + 1).

Therefore, the amount of space used is proportional to the product of (m + 1) and (n + 1). Though the +1 is constant and does not affect the asymptotic complexity, it accounts for the extra row and column needed for the base cases. Consequently, the space complexity of the code is also O(m * n).

Learn more about how to find time and space complexity quickly using problem constraints.

Fast Track Your Learning with Our Quick Skills Quiz:

A heap is a ...?


Recommended Readings


Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.

←
↑TA đŸ‘šâ€đŸ«