2060. Check if an Original String Exists Given Two Encoded Strings


Problem Description

The LeetCode problem involves finding whether two given encoded strings could have originated from the same raw unencoded string. An original string is composed of lowercase English letters and can be encoded by following a procedure that involves splitting the string into a sequence of non-empty substrings, replacing chosen elements with their length, and then concatenating the sequence to form the encoded string.

For instance, the string "abcdefghijklmnop" could be split into ["ab", "cdefghijklmn", "o", "p"], and by replacing some elements with their lengths, we might get ["ab", "12", "1", "p"], which when concatenated, gives the encoded string "ab121p".

The challenge with this problem is to discern whether we can backtrack from two given encoded strings, which include digits (representing the lengths of substrings in the original string) alongside letters, and determine if it is possible that they were encoded from the same original string.

This task requires a method to compare segments of the encoded strings and the numerical values, taking into account different possible segmentations of the original string and the ambiguity of numerical representation — a digit sequence like '123' can represent a single segment of length 123 or a sequence of segments with lengths 1, 2, and 3, for example.

Intuition

For this problem, dynamic programming is a natural approach to tackle the complexity of matching parts of two encoded strings. The intuition is to process each string character by character, and at each step, consider all possible encodings that could have led to the current segments of both strings.

A 2D dynamic programming array dp is utilized where dp[i][j] is a set of integers representing the possible differences in lengths between the first i characters of s1 and the first j characters of s2. A positive number means s2 is longer, while a negative means s1 is longer. If zero is in the set, it means the substrings could match exactly up to that point.

We iterate over each character of s1 and s2, and at each iteration, we try to extend the matching in three different ways:

  1. If we encounter a digit in s1 or s2, we form numbers by concatenating up to three consecutive digits (since the problem ensures there won't be more than three consecutive digits) and adjust the length difference accordingly.

  2. If the current character in s1 or s2 is a digit and the other string has a letter at the corresponding position, we assume that the digit is encoding a segment of letters and update the length difference by adding or subtracting 1, respectively.

  3. If we have a letter on both strings and the current length difference is zero, meaning we are aligned, and the letters match, we continue to the next position keeping the length difference zero.

The recursive relationships and the usage of a set to maintain possible length differences at each step allow us to keep track of all viable ways in which the strings could be aligned or misaligned.

In the end, we check if there's a zero in dp[n][m] set because this implies that there exists a way to segment and match the strings such that they could originate from the same encoded string.

Solution Approach

The solution follows a dynamic programming approach to handle the complexity of varying segment lengths and the ambiguity inherent in the encoded strings. Here's how the provided TypeScript solution implements this strategy.

Data Structures Used:

  • A two-dimensional dynamic programming array dp, where each element dp[i][j] is a set of integers. This set represents all the possible differences in lengths (deltas) between the first i characters of s1 and the first j characters of s2.

Algorithms/Patterns Used:

  1. Initializing the DP Table: The dp array is initialized so that dp[0][0] contains only the value 0, meaning that there's no difference in length when no characters from s1 or s2 have been considered.

  2. Nested Loops for DP Table Filling: There are two nested loops—one for index i (which goes through s1) and one for index j (which goes through s2). These loops iterate through each character position in both strings.

  3. Handling Digits and Characters:

    • If a digit is found in s1, and the current delta is non-positive (meaning s1 is not longer than s2), we keep track of any number formed by up to three consecutive digits and add this number to the delta, storing the new delta in the appropriate dp cell.
    • The same process applies for s2, except we subtract the number from the delta if a digit is encountered and the current delta is non-negative.
  4. Updating Deltas When Matching Digits with Characters:

    • If a digit in s1 corresponds to a character in s2 (when the delta is negative), we increment the delta by 1, and store it in the dp index that corresponds to moving one character forward in s1.
    • Similarly, if a digit in s2 corresponds to a character in s1 (when the delta is positive), we decrement the delta by 1, and store it in the dp index that corresponds to moving one character forward in s2.
  5. Matching Characters:

    • When both s1[i] and s2[j] are characters and the deltas are zero, we check if the characters match. If they do, we carry over the zero delta to the next indices, meaning the match can continue.
  6. Checking for a Match: After filling in the DP table, we check dp[n][m] for a zero value. If zero is present, it indicates that there is at least one valid way to segment the original string to match both encoded strings.

Helper Functions:

  • isDigit: A small helper function that checks if a given character is a digit using a regular expression.

The provided solution utilizes these patterns systematically to account for every possible segmentation and encoding that could result from the original string, ensuring that all cases are covered. The dynamic programming set construction avoids recalculating possibilities for each substring pair, thus optimizing the solution.

💪
Level Up Your
Algo Skills

Example Walkthrough

Let's consider two encoded strings s1 = "a3b" and s2 = "4ab" and walk through the solution approach to understand if they could have been encoded from the same original string.

  1. Initialize the DP table: We set dp[0][0] to contain {0} as initially, there are no characters considered, so the difference in lengths is zero.

  2. First Iteration:

    • i = 0, j = 0: s1[i] is 'a' and s2[j] is '4'. Since s1[i] is a character and s2[j] is a digit, and the current set at dp[i][j] is {0}, we can't directly compare them. We interpret '4' as the length of a segment, which means s2 is implied to be longer by 4 characters. So we increment j to process the next character in s2, updating dp[i][j+1] to {4}.
  3. Second Iteration:

    • i = 0, j = 1: Now, s2[j] is 'a', and the delta from dp[i][j] is {4}. We can match the characters only if the delta were zero. However, we have another digit '3' in s1. Since we interpret this digit as the length of a segment and we have a delta of 4, we can match three characters of s1 with three characters of the length in s2, reducing the delta by 3. We update dp[i+1][j] (dp[1][1]) to {1} which now accounts for the remaining single character difference that has not been matched.
  4. Third Iteration:

    • i = 1, j = 1: With s1[i] = '3' and s2[j] = 'a', the delta is {1} at dp[1][1]. Since the delta is positive, we can match one a character from s2 with a supposed single character from s1 encoded as '3'. The delta becomes 0 because one character is matched from both s1 and s2. We update dp[i][j+1] (dp[1][2]) to {0}.
  5. Fourth Iteration:

    • i = 1, j = 2: Now, both s1[i] and s2[j] represent characters, '3' and 'b' respectively, with a delta of {0}. Since '3' is not a character, this iteration doesn't directly involve a character match. Because '3' in s1 may represent several characters encoded, we assume it represents at least one letter and decrement it by 1, assuming we're matching it with 'b' in s2, which increments i by 1. We add {1} to dp[2][3] to represent this possible scenario.
  6. Fifth Iteration:

    • i = 2, j = 3: In this step, s1[i] is b while s2 is at the end. We have a set {1} at dp[2][3], which implies an unmatched character in s1, so we cannot proceed further.

Upon completion of iterating through the characters:

  • We check dp[n][m] (dp[3][4] in this case) for the presence of zero. A zero would indicate there is a matching segmentation of original string for s1 and s2. Since there's no such value in the last cell, the given strings s1 and s2 cannot be matched, and hence, it is not possible that they were encoded from the same original string.

Python Solution

1def possibly_equals(s1: str, s2: str) -> bool:
2    length_s1 = len(s1)
3    length_s2 = len(s2)
4    # Initialize a 2D list of sets to use as a dynamic programming table
5    dp = [[set() for _ in range(length_s2 + 1)] for _ in range(length_s1 + 1)]
6
7    # Initialize the first cell of the dynamic programming table
8    dp[0][0].add(0)
9
10    # Iterate through all positions in both strings
11    for i in range(length_s1 + 1):
12        for j in range(length_s2 + 1):
13            for delta in dp[i][j]:
14                # When s1 contains a number
15                number = 0
16                if delta <= 0:
17                    for p in range(i, min(i + 3, length_s1)):
18                        if s1[p].isdigit():
19                            number = number * 10 + int(s1[p])
20                            dp[p + 1][j].add(delta + number)
21                        else:
22                            # If it's not a digit, stop the sequence
23                            break
24              
25                # When s2 contains a number
26                number = 0
27                if delta >= 0:
28                    for q in range(j, min(j + 3, length_s2)):
29                        if s2[q].isdigit():
30                            number = number * 10 + int(s2[q])
31                            dp[i][q + 1].add(delta - number)
32                        else:
33                            # If it's not a digit, stop the sequence
34                            break
35              
36                # Assuming that the numbers might represent a single character
37                # in either s1 or s2 if the next char is a letter
38                if i < length_s1 and delta < 0 and not s1[i].isdigit():
39                    dp[i + 1][j].add(delta + 1)
40                if j < length_s2 and delta > 0 and not s2[j].isdigit():
41                    dp[i][j + 1].add(delta - 1)
42              
43                # Check if current characters in both strings match
44                # and have equal length (delta is zero)
45                if i < length_s1 and j < length_s2 and delta == 0 and s1[i] == s2[j]:
46                    dp[i + 1][j + 1].add(0)
47
48    # Return true if a match is found at the end of both strings
49    # with no remaining length difference
50    return 0 in dp[length_s1][length_s2]
51
52# Utility function to check if a character is a digit
53def is_digit(char: str) -> bool:
54    return char.isdigit()
55

Java Solution

1import java.util.Set;
2import java.util.HashSet;
3
4public class Solution {
5    // Function to determine if two strings possibly match, by accounting for numbers in strings that can represent variable lengths
6    public boolean possiblyEquals(String s1, String s2) {
7        int lengthS1 = s1.length();
8        int lengthS2 = s2.length();
9      
10        // Create dynamic programming table, where each cell contains a set of integers representing possible length differences
11        Set<Integer>[][] dp = new HashSet[lengthS1 + 1][lengthS2 + 1];
12        for (int i = 0; i <= lengthS1; ++i) {
13            for (int j = 0; j <= lengthS2; ++j) {
14                dp[i][j] = new HashSet<>();
15            }
16        }
17
18        // Initialize the first cell of the dynamic programming table
19        dp[0][0].add(0);
20
21        // Iterate through all positions in both strings
22        for (int i = 0; i <= lengthS1; i++) {
23            for (int j = 0; j <= lengthS2; j++) {
24                // Explore all possible length differences for the current cell
25                for (Integer delta : dp[i][j]) {
26                    // When s1 contains a number
27                    int number = 0;
28                    if (delta <= 0) {
29                        for (int p = i; p < Math.min(i + 3, lengthS1); p++) {
30                            if (Character.isDigit(s1.charAt(p))) {
31                                number = number * 10 + Character.getNumericValue(s1.charAt(p));
32                                dp[p + 1][j].add(delta + number);
33                            } else {
34                                // If it's not a digit, stop the sequence
35                                break;
36                            }
37                        }
38                    }
39
40                    // When s2 contains a number
41                    number = 0;
42                    if (delta >= 0) {
43                        for (int q = j; q < Math.min(j + 3, lengthS2); q++) {
44                            if (Character.isDigit(s2.charAt(q))) {
45                                number = number * 10 + Character.getNumericValue(s2.charAt(q));
46                                dp[i][q + 1].add(delta - number);
47                            } else {
48                                // If it's not a digit, stop the sequence
49                                break;
50                            }
51                        }
52                    }
53
54                    // If there is a number in s1 and the next character in s1 is a letter,
55                    // it is assumed that the number in s1 might represent a single character
56                    if (i < lengthS1 && delta < 0 && !Character.isDigit(s1.charAt(i))) {
57                        dp[i + 1][j].add(delta + 1);
58                    }
59
60                    // If there is a number in s2 and the next character in s2 is a letter,
61                    // it is assumed that the number in s2 might represent a single character
62                    if (j < lengthS2 && delta > 0 && !Character.isDigit(s2.charAt(j))) {
63                        dp[i][j + 1].add(delta - 1);
64                    }
65
66                    // Check if the current characters in both strings match and have equal length difference (delta is zero)
67                    if (i < lengthS1 && j < lengthS2 && delta == 0 && s1.charAt(i) == s2.charAt(j)) {
68                        dp[i + 1][j + 1].add(0);
69                    }
70                }
71            }
72        }
73
74        // Return true if a match is found at the end of both strings with no remaining length difference
75        return dp[lengthS1][lengthS2].contains(0);
76    }
77
78    // Helper function to check if a character is a digit (actually not needed as Java has Character.isDigit)
79    private boolean isDigit(char ch) {
80        return Character.isDigit(ch);
81    }
82}
83

C++ Solution

1#include <vector>
2#include <string>
3#include <set>
4#include <algorithm>
5#include <cctype>
6
7// Function to check if a given character is a digit
8bool isDigit(char ch) {
9    return std::isdigit(static_cast<unsigned char>(ch)) != 0;
10}
11
12// Function to check if two strings possibly equals with variable-length block representations
13bool possiblyEquals(const std::string& s1, const std::string& s2) {
14    int lengthS1 = s1.length(), lengthS2 = s2.length();
15    std::vector<std::vector<std::set<int>>> dp(lengthS1 + 1, std::vector<std::set<int>>(lengthS2 + 1));
16
17    // Initialize the first cell of the dynamic programming table
18    dp[0][0].insert(0);
19
20    // Iterate through all positions in both strings
21    for (int i = 0; i <= lengthS1; ++i) {
22        for (int j = 0; j <= lengthS2; ++j) {
23            for (int delta : dp[i][j]) {
24                // When s1 contains a number
25                int number = 0;
26                if (delta <= 0) {
27                    for (int p = i; p < std::min(i + 3, lengthS1); ++p) {
28                        if (isDigit(s1[p])) {
29                            number = number * 10 + (s1[p] - '0');
30                            dp[p + 1][j].insert(delta + number);
31                        } else {
32                            // If it's not a digit, stop the sequence
33                            break;
34                        }
35                    }
36                }
37
38                // When s2 contains a number
39                number = 0;
40                if (delta >= 0) {
41                    for (int q = j; q < std::min(j + 3, lengthS2); ++q) {
42                        if (isDigit(s2[q])) {
43                            number = number * 10 + (s2[q] - '0');
44                            dp[i][q + 1].insert(delta - number);
45                        } else {
46                            // If it's not a digit, stop the sequence
47                            break;
48                        }
49                    }
50                }
51
52                // If there is a number in s1 and the next character in s1 is a letter,
53                // it is assumed that the number in s1 might represent a single character
54                if (i < lengthS1 && delta < 0 && !isDigit(s1[i])) {
55                    dp[i + 1][j].insert(delta + 1);
56                }
57
58                // If there is a number in s2 and the next character in s2 is a letter,
59                // it is assumed that the number in s2 might represent a single character
60                if (j < lengthS2 && delta > 0 && !isDigit(s2[j])) {
61                    dp[i][j + 1].insert(delta - 1);
62                }
63
64                // Check if the current characters in both strings match and have equal length,
65                // only when delta is zero
66                if (i < lengthS1 && j < lengthS2 && delta == 0 && s1[i] == s2[j]) {
67                    dp[i + 1][j + 1].insert(0);
68                }
69            }
70        }
71    }
72
73    // Return true if a match is found at the end of both strings with no remaining length difference
74    return dp[lengthS1][lengthS2].count(0) > 0;
75}
76

Typescript Solution

1function possiblyEquals(s1: string, s2: string): boolean {
2    const lengthS1 = s1.length,
3        lengthS2 = s2.length;
4    let dp: Array<Array<Set<number>>> = Array.from({ length: lengthS1 + 1 }, () =>
5        Array.from({ length: lengthS2 + 1 }, () => new Set<number>()),
6    );
7
8    // Initialize the first cell of the dynamic programming table
9    dp[0][0].add(0);
10
11    // Iterate through all positions in both strings
12    for (let i = 0; i <= lengthS1; i++) {
13        for (let j = 0; j <= lengthS2; j++) {
14            for (const delta of dp[i][j]) {
15                // When s1 contains a number
16                let number = 0;
17                if (delta <= 0) {
18                    for (let p = i; p < Math.min(i + 3, lengthS1); p++) {
19                        if (isDigit(s1[p])) {
20                            number = number * 10 + Number(s1[p]);
21                            dp[p + 1][j].add(delta + number);
22                        } else {
23                            // If it's not a digit, stop the sequence
24                            break;
25                        }
26                    }
27                }
28
29                // When s2 contains a number
30                number = 0;
31                if (delta >= 0) {
32                    for (let q = j; q < Math.min(j + 3, lengthS2); q++) {
33                        if (isDigit(s2[q])) {
34                            number = number * 10 + Number(s2[q]);
35                            dp[i][q + 1].add(delta - number);
36                        } else {
37                            // If it's not a digit, stop the sequence
38                            break;
39                        }
40                    }
41                }
42
43                // If there is a number in s1 and the next character in s1 is a letter,
44                // it is assumed that the number in s1 might represent a single character
45                if (i < lengthS1 && delta < 0 && !isDigit(s1[i])) {
46                    dp[i + 1][j].add(delta + 1);
47                }
48
49                // If there is a number in s2 and the next character in s2 is a letter,
50                // it is assumed that the number in s2 might represent a single character
51                if (j < lengthS2 && delta > 0 && !isDigit(s2[j])) {
52                    dp[i][j + 1].add(delta - 1);
53                }
54
55                // Check if the current characters in both strings match and have equal length,
56                // only when delta is zero
57                if (i < lengthS1 && j < lengthS2 && delta === 0 && s1[i] === s2[j]) {
58                    dp[i + 1][j + 1].add(0);
59                }
60            }
61        }
62    }
63
64    // Return true if a match is found at the end of both strings with no remaining length difference
65    return dp[lengthS1][lengthS2].has(0);
66}
67
68// Helper function to check if a character is a digit
69function isDigit(char: string): boolean {
70    return /^\d$/g.test(char);
71}
72

Time and Space Complexity

Time Complexity

The time complexity of the given code can be analyzed as follows:

  • The outer two loops run through the lengths of s1 and s2, respectively, giving us a base factor of O(n * m), where n is the length of s1 and m is the length of s2.
  • Inside the nested loops, the code processes potential numeric representations in both s1 and s2 up to three digits long. Since the length of numeric sequences is capped at 3, this contributes a constant factor, not depending on n or m.
  • For each i and j, the set dp[i][j] could in the worst case hold a number of elements proportional to the sum of lengths n + m, as it stores the differences between the "translated" lengths of s1[i...] and s2[j...].
  • The set operations such as .add() have a complexity of O(1) on average but could degenerate to O(k) where k is the number of items already in the set in the worst-case scenario due to hash collisions.

Putting it all together, the time complexity can be estimated as O(n * m * (n + m)).

Space Complexity

Analyzing the space complexity:

  • The dynamic programming table dp is a 2D array of sets with dimensions (n + 1) * (m + 1). The space taken by this table is the dominant factor.
  • Each set within dp[i][j] can potentially store up to n + m integers representing various differences between the lengths of s1 and s2.

Hence, the total space complexity is O(n * m * (n + m)).

😈
Become an
Algo Monster

Got a question? Ask the Teaching Assistant anything you don't understand.

Still not clear? Ask in the Forum,  Discord or Submit the part you don't understand to our editors.


TA 👨‍🏫