471. Encode String with Shortest Length
Problem Description
The goal of this problem is to encode a given string s
in such a way that the encoded version has the shortest possible length. The encoding rule must follow the format k[encoded_string]
, where encoded_string
represents a sequence of the string that is repeated k
times, and k
is a positive integer that represents the number of times encoded_string
occurs consecutively within s
.
For example, if the string is "aaaabaaaab"
, it can be encoded as 2[a4[ab]]
, which means that a4[ab]
(which represents "aaaab"
) is repeated twice.
The string should only be encoded if doing so reduces its length. For strings that would not be shortened by encoding, the original string should be returned. In cases where multiple encoding methods result in the shortest length, any of those methods are considered a valid solution.
Intuition
The intuition behind the solution is to apply dynamic programming, which is a method often used in optimization problems. Dynamic programming involves breaking down a complex problem into simpler subproblems and solving each of those subproblems just once, storing their solutions—often in a two-dimensional array.
The main idea of the solution is to:
- Iterate over all possible substrings of
s
and determine the shortest encoding for each substring. - For each substring, check if it has a repetitive pattern that can be encoded. Patterns are identified by checking if the substring
t
can be found within the concatenation oft
with itself, but not at the very beginning (this implies a repetition). - If such a pattern exists and it helps reduce the length of the current substring, encode it using the format
k[encoded_substring]
. If not, keep the substring unencoded. - For longer substrings that do not have a repetitive pattern that can be encoded, or such encoding does not reduce the length, explore breaking the substring into two smaller encoded subproblems (substrings) whose total encoded length is minimal.
By solving the subproblems from shortest to longest substrings and building up solutions, we ensure that when we determine whether to encode a longer substring, we are making the decision based on the optimal (shortest) encodings of all possible subparts of that substring.
Essentially, the dynamic programming array f[i][j]
holds the shortest encoded version of the substring starting from index i
to index j
in the original string s
. By the end of the iterations, f[0][n-1]
will give us the shortest encoded version of the entire string.
Learn more about Dynamic Programming patterns.
Solution Approach
The solution approach implements dynamic programming to solve the problem, where a two-dimensional array f
is used to store the shortest encoded strings for all the substrates from i
to j
. The implementation goes as follows:
-
Function
g(i, j)
takes the start indexi
and the end indexj
as inputs and returns the optimal encoded string for the substrings[i:j+1]
. If the substring is shorter than 5 characters, it's not worth encoding because the encoded format would not be shorter than the original substring. -
In function
g(i, j)
,t
is the current substring. In order to find repetitive patterns withint
,(t + t).index(t, 1)
is used. This expression checks whethert
repeats in a concatenation of itself after the first character. Ifk
is the index wheret
is found, it meanst
is repeating everyk
characters withint
. Ifk
is less than the length oft
, we have found a repeating pattern, and we encode it as"{cnt}[{f[i][i + k - 1]}]"
, wherecnt
is the count of repetition. -
For every substring, we store our finding in the
f
array. The elementsf[i][j]
will represent the shortest encoded form of the substrings[i:j+1]
. -
A nested loop is used to fill the table
f
in a bottom-up manner. The outer loop decrementsi
fromn-1
to0
to ensure we solve for all smaller substrates first. The inner loop incrementsj
fromi
ton-1
to cover all substrings starting fromi
. -
For every pair of
i
andj
, we first check whether it's worth encoding the current substring using functiong(i, j)
. If the substring length fromi
toj
is more than 4 (since we don't want to encode shorter substrings), we then check for potential splits within this substring that could lead to an overall shorter length. -
To check for splits, we iterate through all possible split positions
k
fromi
toj
and calculate the combined length of encoded substringsf[i][k]
andf[k+1][j]
. If this combined length is shorter than the currently stored encoded string fors[i:j+1]
, we updatef[i][j]
with this shorter version. -
After filling the array
f
, the shortest encoded version of the entire strings
will be stored inf[0][n-1]
, which is the solution to the original problem.
The algorithm capitalizes on the property that to get the shortest encoding of a longer substring, you need to know the shortest encodings of all its subparts. This leads to an optimization problem where dynamic programming excels. By storing solutions of small problems and using them to solve larger problems, the algorithm works efficiently without recalculating the encoded strings for the same substrates multiple times.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's take the string "abbbabbb"
to illustrate the solution approach:
-
Initialize a table
f
wheref[i][j]
will eventually contain the shortest encoded version of the substrings[i:j+1]
. -
Consider substrings
s[i:j+1]
of length less than 5, which are not worth encoding. Sof[i][j]
would just bes[i:j+1]
. For instance,f[0][3]
for substring"abbb"
remains"abbb"
because its length is less than 5. -
For substrings of length 5 or more, check if they have a repeatable pattern that can be encoded. For example, the substring
"abbbabbb"
has a repetitive substring which is"abbb"
that repeats twice. -
Use
(t + t).index(t, 1)
to find the repeating pattern withint
. For our example,t = "abbb"
, and by searching in(t + t) = "abbbabbb"
,t
starts repeating at index 4, which is equal to the length oft
, indicating that it repeats every 4 characters. -
Since the pattern
t
repeats2
times, encode the substring as"2[abbb]"
and store it in tablef
atf[0][7]
, replacing"abbbabbb"
. -
If we were to consider a longer string and
f
is partially filled with shorter substrings' encodings, check for optimal splits by trying all possible split points. For every possible split at positionk
for the substrings[i:j+1]
, calculate the total length off[i][k] + f[k+1][j]
. Choose the split that offers the shortest encoded length. -
After checking for all possible splits and patterns, the final encoded strings are ready in
f
. For our example, since the substring"abbbabbb"
has already been optimally encoded as"2[abbb]"
, this would be the value off[0][7]
, and thus the output for the entire string.
By systematically applying these steps to increasingly longer substrates of the original string and using the dynamic programming array to store intermediate solutions, the algorithm ensures efficiency and avoids redundant calculations, leading to an optimal solution.
Solution Implementation
1class Solution:
2 def encode(self, s: str) -> str:
3 # Helper method to compute the encoded string for a substring s[i:j+1]
4 def compute_encoded_substring(start: int, end: int) -> str:
5 substring = s[start:end + 1]
6 # If the substring is too short, encoding doesn't make sense, return as is
7 if len(substring) < 5:
8 return substring
9
10 # Try to find a repeated pattern in the substring
11 duplicate_index = (substring + substring).find(substring, 1)
12
13 # If a repeated pattern is found that's shorter than the original string
14 if duplicate_index < len(substring):
15 count_of_repetition = len(substring) // duplicate_index
16 # Encoded format: count[encoded substring for the repeated pattern]
17 return f"{count_of_repetition}[{dynamic_table[start][start + duplicate_index - 1]}]"
18 # No repeated pattern was found, return the original substring
19 return substring
20
21 # Main logic begins here
22 length_of_s = len(s)
23 # Initialize dynamic programming table with None
24 dynamic_table = [[None] * length_of_s for _ in range(length_of_s)]
25
26 # Build the table bottom-up
27 for i in range(length_of_s - 1, -1, -1):
28 for j in range(i, length_of_s):
29 # Compute encoded substring for current i, j
30 dynamic_table[i][j] = compute_encoded_substring(i, j)
31
32 # If current substring can potentially be shortened
33 if j - i + 1 > 4:
34 # Try to break the substring into two parts and see if this gives
35 # a shorter encoding by checking all possible split positions
36 for k in range(i, j):
37 # Combine encoded substrings for both parts
38 trial_encoded = dynamic_table[i][k] + dynamic_table[k + 1][j]
39 # If this combination gives us a shorter string, update the table
40 if len(dynamic_table[i][j]) > len(trial_encoded):
41 dynamic_table[i][j] = trial_encoded
42
43 # Result is in the top-right cell of the dynamic programming table
44 return dynamic_table[0][-1]
45
1class Solution {
2 private String originalString;
3 private String[][] encodedSubStrings;
4
5 public String encode(String s) {
6 originalString = s;
7 int n = s.length();
8 encodedSubStrings = new String[n][n];
9
10 // Iterate over all possible substrings in reverse order
11 for (int start = n - 1; start >= 0; --start) {
12 for (int end = start; end < n; ++end) {
13 // Attempt to find the encoded version of the current substring
14 encodedSubStrings[start][end] = encodeSubstring(start, end);
15
16 // Avoid unnecessary work for substrings shorter than 5 characters,
17 // as encoding them would not be efficient.
18 if (end - start + 1 > 4) {
19 for (int split = start; split < end; ++split) {
20 // Encode the current substring by combining the encoded versions
21 // of its two halves and check if this is shorter than the current encoding.
22 String combinedEncoding = encodedSubStrings[start][split] + encodedSubStrings[split + 1][end];
23 if (encodedSubStrings[start][end].length() > combinedEncoding.length()) {
24 encodedSubStrings[start][end] = combinedEncoding;
25 }
26 }
27 }
28 }
29 }
30
31 // The encoded string of the entire input is located at the top-left corner of the matrix.
32 return encodedSubStrings[0][n - 1];
33 }
34
35 private String encodeSubstring(int start, int end) {
36 // Extract the substring to be encoded
37 String substring = originalString.substring(start, end + 1);
38
39 // Substrings shorter than 5 characters shouldn't be encoded.
40 if (substring.length() < 5) {
41 return substring;
42 }
43
44 // Search for repeated patterns by concatenating the substring with itself
45 // and looking for the index of the second occurrence of the substring.
46 int repeatIndex = (substring + substring).indexOf(substring, 1);
47
48 // If the repeated pattern exists within the length of the original substring,
49 // encode the pattern.
50 if (repeatIndex < substring.length()) {
51 int repeatCount = substring.length() / repeatIndex;
52 String pattern = encodedSubStrings[start][start + repeatIndex - 1];
53
54 // Generate the encoded string with repetition count and pattern.
55 return String.format("%d[%s]", repeatCount, pattern);
56 }
57
58 // If no repeated pattern was found, return the original substring.
59 return substring;
60 }
61}
62
1class Solution {
2public:
3 string encode(string s) {
4 int n = s.size(); // The length of the input string
5 vector<vector<string>> dp(n, vector<string>(n));
6 // dp[i][j] holds the shortest encoded string for s[i..j]
7
8 auto encodeSubstring = [&](int i, int j) -> string {
9 string t = s.substr(i, j - i + 1); // Substring from s[i] to s[j]
10 // If the length of the substring is less than 5, do not encode it as it wouldn't be beneficial
11 if (t.size() < 5) {
12 return t;
13 }
14 // Check if the substring can be collapsed; i.e. it is a repeat of a smaller string
15 int k = (t + t).find(t, 1);
16 if (k < t.size()) {
17 int cnt = t.size() / k; // Count the number of repeats
18 return to_string(cnt) + "[" + dp[i][i + k - 1] + "]"; // Encode as cnt[sub_encoded_string]
19 }
20 return t; // If not collapsible, just return the original substring
21 };
22
23 // Build the dp array from the bottom up, from shorter to longer substrings
24 for (int i = n - 1; i >= 0; --i) { // Start from the end of the string
25 for (int j = i; j < n; ++j) { // From the current position to the end
26 dp[i][j] = encodeSubstring(i, j); // Encode substring s[i..j]
27 // If the length of the substring is more than 4 characters, try to split it and encode separately
28 if (j - i + 1 > 4) {
29 for (int k = i; k < j; ++k) {
30 string t = dp[i][k] + dp[k + 1][j]; // The string got by encoding s[i..k] and s[k+1..j]
31 if (t.size() < dp[i][j].size()) { // Check if this new string is shorter than the current encoded one
32 dp[i][j] = t; // If yes, then update the dp array with this new shorter string
33 }
34 }
35 }
36 }
37 }
38 // Finally, return the encoded string of the entire string s
39 return dp[0][n - 1];
40 }
41};
42
1function encode(s: string): string {
2 // Initialize the length of the string
3 const stringLength = s.length;
4
5 // Create a 2D array to store the results of subproblems
6 const dp: string[][] = new Array(stringLength).fill(0).map(() => new Array(stringLength).fill(''));
7
8 // Helper function to find the encoded string for a substring
9 const getEncodedSubstring = (start: number, end: number): string => {
10 // Get the current substring to be encoded
11 const substring = s.slice(start, end + 1);
12
13 // Base case: for short substrings, encoding is not needed
14 if (substring.length < 5) {
15 return substring;
16 }
17
18 // Check if the substring is a repeated pattern by concatenating
19 // it with itself and checking for the pattern occurrence
20 const repeatIndex = substring.repeat(2).indexOf(substring, 1);
21
22 // If repetition is found, encode as a repeated count and pattern
23 if (repeatIndex < substring.length) {
24 const repeatCount = Math.floor(substring.length / repeatIndex);
25 return repeatCount + '[' + dp[start][start + repeatIndex - 1] + ']';
26 }
27
28 // If no repetition pattern is found, return the original substring
29 return substring;
30 };
31
32 // Bottom-up approach to fill the 2D array with encoded substrings
33 for (let i = stringLength - 1; i >= 0; --i) {
34 for (let j = i; j < stringLength; ++j) {
35 dp[i][j] = getEncodedSubstring(i, j);
36
37 // Check for encoding possibilities by splitting the substring
38 if (j - i + 1 > 4) {
39 for (let k = i; k < j; ++k) {
40 const combined = dp[i][k] + dp[k + 1][j];
41 // Update the encoded string if a shorter encoding is possible
42 if (combined.length < dp[i][j].length) {
43 dp[i][j] = combined;
44 }
45 }
46 }
47 }
48 }
49
50 // Return the encoded string for the entire input
51 return dp[0][stringLength - 1];
52}
53
Time and Space Complexity
The provided Python code is a dynamic programming solution for the problem of encoding the minimum length of a string where the string can be encoded by the number of repetitions and a pattern inside the brackets. To analyze the time and space complexity, we need to consider the operations performed inside the nested loops and the recursive calls.
Time Complexity
The time complexity of the code is determined by the three nested loops and the string operations inside the helper function g
.
- The outer loop runs from
n-1
down to0
, thus runningn
times. - The second loop, for variable
j
, will run at mostn
times for eachi
. - The third loop is used to find the optimal partition of the string for encoding and runs at most
n
times for each pair ofi
andj
.
Additionally, the helper function g
includes a string pattern check using (t + t).index(t, 1)
which can be considered O(n) in the worst case because it could potentially scan the entire doubled string to find the pattern start. Thus, this operation could contribute significantly to the execution time.
Considering all these factors, the worst-case time complexity is O(n^3) for the loops multiplied by O(n) for the string pattern checking, leading to an overall time complexity of O(n^4)
.
Space Complexity
The space complexity includes the space required for the dynamic programming table f
and the stack space used by the helper function g
.
- The dynamic programming table
f
is a 2D array of sizen x n
, contributingO(n^2)
space complexity. - The helper function
g
uses a temporary stringt
, but this does not increase the asymptotic space complexity since it requires space proportional to the input strings
.
So the overall space complexity of the code is O(n^2)
due to the 2D array f
.
Learn more about how to find time and space complexity quickly using problem constraints.
Which of these properties could exist for a graph but not a tree?
Recommended Readings
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Want a Structured Path to Master System Design Too? Don’t Miss This!