471. Encode String with Shortest Length 🔒
Problem Description
You need to find the shortest possible encoding of a given string s
using a specific compression rule.
The compression rule allows you to represent repeated substrings in the format k[encoded_string]
, where:
encoded_string
is the substring that repeatsk
is the number of times it repeats (must be a positive integer)- The brackets
[]
are literal characters in the output
For example:
"aaa"
can be encoded as"3[a]"
(shorter than original)"aaaaa"
can be encoded as"5[a]"
"aaaaaaaaaa"
(10 a's) can be encoded as"10[a]"
"abcabcabcabc"
can be encoded as"4[abc]"
"abbbabbbabbb"
can be encoded as"3[abbb]"
or even"3[a3[b]]"
if that's shorter
The encoding can be applied recursively - you can encode parts within the brackets as well.
Important constraints:
- Only encode if it makes the string shorter. If encoding doesn't reduce the length, keep the original string
- If multiple valid encodings exist with the same shortest length, you can return any one of them
- The goal is to find the absolute shortest possible encoded representation
The challenge involves finding optimal ways to:
- Identify repeating patterns in the string
- Decide whether to encode a substring or leave it as is
- Consider all possible ways to split and encode the string to achieve minimum length
Intuition
The key insight is that finding the shortest encoding requires us to explore all possible ways to encode each substring and choose the best one. This naturally leads to a dynamic programming approach.
Think about any substring s[i:j+1]
. There are essentially two ways we can encode it:
-
Direct encoding: Check if this substring contains a repeating pattern. If
"abcabcabc"
repeats"abc"
three times, we can encode it as"3[abc]"
. To detect this pattern, we can use a clever trick: concatenate the string with itself and search for the original string starting from index 1. If we find it before reaching the length of the original string, we've found a repeating pattern. -
Split encoding: Break the substring into two parts at some position
k
and encode each part separately. For example,"aaaaabbbbb"
might be better encoded as"5[a]5[b]"
rather than trying to encode the whole thing as one pattern.
The challenge is we don't know which approach will give us the shortest result without trying both. For the direct encoding, we also need to recursively find the best encoding for the repeating unit itself (like encoding "abc"
within "3[abc]"
).
This recursive structure with overlapping subproblems suggests dynamic programming. We need to:
- Process substrings from shortest to longest (or use memoization)
- For each substring, try both direct encoding and all possible split points
- Keep track of the shortest encoding found for each substring
The reason we start from the end of the string and work backwards (i
from n-1
to 0
) while expanding j
from i
to n-1
is to ensure that when we need the optimal encoding of a smaller substring, we've already computed it. This bottom-up approach guarantees that all necessary subproblems are solved before we need them.
The condition to check if encoding is worth it (length < 5) comes from the fact that the shortest possible encoding "2[a]"
has length 5, so any string shorter than 5 characters cannot benefit from encoding.
Solution Approach
The solution uses dynamic programming with a 2D table f[i][j]
where f[i][j]
stores the shortest encoding of substring s[i:j+1]
.
Helper Function g(i, j)
:
This function attempts to directly encode the substring s[i:j+1]
if it contains a repeating pattern.
- Extract the substring
t = s[i:j+1]
- If length is less than 5, return the original string (encoding won't make it shorter)
- Find the repeating pattern using the trick:
k = (t + t).index(t, 1)
- Concatenate
t
with itself to gett + t
- Search for
t
starting from index 1 - If
k < len(t)
, we found a repeating pattern with periodk
- Concatenate
- If a pattern exists:
- Calculate repetition count:
cnt = len(t) // k
- Return the encoded format:
"cnt[f[i][i+k-1]]"
wheref[i][i+k-1]
is the optimal encoding of one repeating unit
- Calculate repetition count:
- Otherwise, return the original substring
Main Dynamic Programming Logic:
-
Initialize a 2D array
f
of sizen×n
wheren = len(s)
-
Process substrings in order of increasing length:
- Outer loop:
i
fromn-1
down to0
(starting position) - Inner loop:
j
fromi
ton-1
(ending position)
- Outer loop:
-
For each substring
s[i:j+1]
:- First, try direct encoding:
f[i][j] = g(i, j)
- If substring length > 4, try all possible split points:
for k in range(i, j): t = f[i][k] + f[k+1][j] if len(f[i][j]) > len(t): f[i][j] = t
- This combines the optimal encodings of left part
f[i][k]
and right partf[k+1][j]
- Keep the shorter encoding between direct encoding and split encoding
- First, try direct encoding:
-
Return
f[0][-1]
which contains the optimal encoding of the entire string
Time Complexity: O(n³)
- For each of O(n²)
substrings, we check O(n)
split points.
Space Complexity: O(n²)
- For the 2D DP table storing optimal encodings of all substrings.
The algorithm ensures optimal substructure by processing smaller substrings first, guaranteeing that when we need f[i][i+k-1]
in the helper function or f[i][k]
and f[k+1][j]
in split encoding, these values are already computed.
Ready to land your dream job?
Unlock your dream job with a 3-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the algorithm with the string s = "aabcabc"
.
Step 1: Initialize DP table
We create a 7×7 table f
(since length = 7). We'll process substrings from length 1 to 7.
Step 2: Process substrings (i from 6 to 0, j from i to 6)
Starting with single characters (base cases):
f[0][0] = "a"
(substring "a")f[1][1] = "a"
f[2][2] = "b"
f[3][3] = "c"
f[4][4] = "a"
f[5][5] = "b"
f[6][6] = "c"
Length 2 substrings:
f[0][1] = "aa"
→ Direct encoding: g(0,1) checks if "aa" repeats. It does (period 1), so we get"2[a]"
. But length is 5, not shorter than 2, so keep"aa"
f[1][2] = "ab"
→ No pattern, remains"ab"
f[2][3] = "bc"
→ No pattern, remains"bc"
- Similarly for others...
Length 4 substring "bcab"
at f[2][5]
:
- Direct encoding: No repeating pattern →
"bcab"
- Try splits:
- Split at 2:
f[2][2] + f[3][5]
="b" + "cab"
="bcab"
(length 4) - Split at 3:
f[2][3] + f[4][5]
="bc" + "ab"
="bcab"
(length 4) - Split at 4:
f[2][4] + f[5][5]
="bca" + "b"
="bcab"
(length 4)
- Split at 2:
- Result:
f[2][5] = "bcab"
Length 6 substring "abcabc"
at f[1][6]
:
- Direct encoding: g(1,6) finds "abcabc" = 2 × "abc"
(abcabc + abcabc).index(abcabc, 1)
returns 3- Pattern length = 3, repetitions = 6/3 = 2
- Encoding:
"2[" + f[1][3] + "]"
="2[abc]"
(length 6)
- Try splits:
- Split at 1:
"a" + "bcabc"
="abcabc"
(length 6) - Split at 2:
"ab" + "cabc"
="abcabc"
(length 6) - Split at 3:
"abc" + "abc"
="abcabc"
(length 6) - Split at 4:
"abca" + "bc"
="abcabc"
(length 6) - Split at 5:
"abcab" + "c"
="abcabc"
(length 6)
- Split at 1:
- Result:
f[1][6] = "2[abc]"
(same length as original, but valid encoding)
Final substring "aabcabc"
at f[0][6]
:
- Direct encoding: No repeating pattern →
"aabcabc"
- Try splits:
- Split at 0:
f[0][0] + f[1][6]
="a" + "2[abc]"
="a2[abc]"
(length 7) - Split at 1:
f[0][1] + f[2][6]
="aa" + "bcabc"
="aabcabc"
(length 7) - Other splits also give length 7 or more
- Split at 0:
- Result:
f[0][6] = "a2[abc]"
Answer: "a2[abc]"
- The shortest encoding saves 1 character from the original string.
Solution Implementation
1class Solution:
2 def encode(self, s: str) -> str:
3 """
4 Encode a string by replacing repeated substrings with k[pattern] format.
5 Uses dynamic programming to find the optimal encoding.
6 """
7
8 def find_best_encoding(start: int, end: int) -> str:
9 """
10 Find the best encoding for substring s[start:end+1].
11 If a repeating pattern exists, encode as k[pattern].
12 """
13 substring = s[start : end + 1]
14
15 # Don't encode strings shorter than 5 characters
16 if len(substring) < 5:
17 return substring
18
19 # Find if substring is made of repeating pattern
20 # Trick: search for substring in doubled version starting from index 1
21 # If found before len(substring), it means there's a repeating pattern
22 pattern_end_index = (substring + substring).index(substring, 1)
23
24 if pattern_end_index < len(substring):
25 # Found a repeating pattern
26 pattern_length = pattern_end_index
27 repetition_count = len(substring) // pattern_length
28 # Recursively encode the pattern itself
29 encoded_pattern = dp[start][start + pattern_length - 1]
30 return f"{repetition_count}[{encoded_pattern}]"
31
32 return substring
33
34 n = len(s)
35 # dp[i][j] stores the best encoding for substring s[i:j+1]
36 dp = [[None] * n for _ in range(n)]
37
38 # Fill dp table from smaller substrings to larger ones
39 # Process from right to left for start index
40 for start in range(n - 1, -1, -1):
41 # Process from left to right for end index
42 for end in range(start, n):
43 # First, try encoding the whole substring as a pattern
44 dp[start][end] = find_best_encoding(start, end)
45
46 # For substrings longer than 4, try splitting into two parts
47 if end - start + 1 > 4:
48 for split_point in range(start, end):
49 # Combine encodings of left and right parts
50 left_part = dp[start][split_point]
51 right_part = dp[split_point + 1][end]
52 combined_encoding = left_part + right_part
53
54 # Keep the shorter encoding
55 if len(dp[start][end]) > len(combined_encoding):
56 dp[start][end] = combined_encoding
57
58 # Return the best encoding for the entire string
59 return dp[0][-1]
60
1class Solution {
2 private String originalString;
3 private String[][] dp; // dp[i][j] stores the shortest encoding for substring from index i to j
4
5 /**
6 * Encodes a string to its shortest representation using pattern compression.
7 * For example: "aabcaabcbb" -> "2[aabc]bb"
8 *
9 * @param s the input string to encode
10 * @return the shortest encoded representation of the string
11 */
12 public String encode(String s) {
13 this.originalString = s;
14 int n = s.length();
15 dp = new String[n][n];
16
17 // Process all substrings from length 1 to n
18 // Start from the end and work backwards to ensure smaller subproblems are solved first
19 for (int start = n - 1; start >= 0; --start) {
20 for (int end = start; end < n; ++end) {
21 // First, try to encode the current substring as a repeating pattern
22 dp[start][end] = encodeSubstring(start, end);
23
24 // For substrings longer than 4 characters, try splitting into two parts
25 // This optimization avoids unnecessary splits for very short strings
26 if (end - start + 1 > 4) {
27 for (int splitPoint = start; splitPoint < end; ++splitPoint) {
28 // Combine encodings of left part [start, splitPoint] and right part [splitPoint+1, end]
29 String concatenatedEncoding = dp[start][splitPoint] + dp[splitPoint + 1][end];
30
31 // Keep the shorter encoding
32 if (dp[start][end].length() > concatenatedEncoding.length()) {
33 dp[start][end] = concatenatedEncoding;
34 }
35 }
36 }
37 }
38 }
39
40 return dp[0][n - 1];
41 }
42
43 /**
44 * Attempts to encode a substring as a repeating pattern.
45 * If the substring can be represented as k repetitions of a pattern,
46 * returns "k[pattern]", otherwise returns the original substring.
47 *
48 * @param start starting index of the substring (inclusive)
49 * @param end ending index of the substring (inclusive)
50 * @return the encoded representation or the original substring
51 */
52 private String encodeSubstring(int start, int end) {
53 String substring = originalString.substring(start, end + 1);
54
55 // Don't encode strings shorter than 5 characters (encoding would be longer)
56 if (substring.length() < 5) {
57 return substring;
58 }
59
60 // Find if the substring is a repeating pattern
61 // Trick: concatenate the string with itself and search for the original string
62 // If found at index < length, then it's a repeating pattern
63 int patternLength = (substring + substring).indexOf(substring, 1);
64
65 if (patternLength < substring.length()) {
66 // The substring consists of repeating patterns
67 int repetitionCount = substring.length() / patternLength;
68
69 // Return the encoded format: count[pattern]
70 // Use the already computed optimal encoding for the pattern
71 return String.format("%d[%s]", repetitionCount, dp[start][start + patternLength - 1]);
72 }
73
74 return substring;
75 }
76}
77
1class Solution {
2public:
3 string encode(string s) {
4 int n = s.size();
5 // dp[i][j] stores the shortest encoding for substring s[i...j]
6 vector<vector<string>> dp(n, vector<string>(n));
7
8 // Helper function to find the best encoding for a substring
9 // considering it as a single repeated pattern
10 auto findRepeatedPattern = [&](int start, int end) -> string {
11 // Extract the substring from start to end (inclusive)
12 string substring = s.substr(start, end - start + 1);
13
14 // If substring is too short, encoding won't help
15 if (substring.size() < 5) {
16 return substring;
17 }
18
19 // Find if the substring is a repeated pattern
20 // by searching for itself in doubled string starting from position 1
21 int patternLength = (substring + substring).find(substring, 1);
22
23 // If we found a pattern shorter than the whole substring
24 if (patternLength < substring.size()) {
25 int repeatCount = substring.size() / patternLength;
26 // Return encoded format: count[pattern]
27 return to_string(repeatCount) + "[" + dp[start][start + patternLength - 1] + "]";
28 }
29
30 // No repeated pattern found, return original substring
31 return substring;
32 };
33
34 // Fill the dp table from smaller substrings to larger ones
35 // Process from right to left for start position
36 for (int i = n - 1; i >= 0; --i) {
37 // Process from left to right for end position
38 for (int j = i; j < n; ++j) {
39 // First, try to encode the entire substring as a repeated pattern
40 dp[i][j] = findRepeatedPattern(i, j);
41
42 // For substrings longer than 4 characters, try splitting
43 if (j - i + 1 > 4) {
44 // Try all possible split points
45 for (int splitPoint = i; splitPoint < j; ++splitPoint) {
46 // Combine encodings of left and right parts
47 string splitEncoding = dp[i][splitPoint] + dp[splitPoint + 1][j];
48
49 // Update if this split gives shorter encoding
50 if (splitEncoding.size() < dp[i][j].size()) {
51 dp[i][j] = splitEncoding;
52 }
53 }
54 }
55 }
56 }
57
58 // Return the shortest encoding for the entire string
59 return dp[0][n - 1];
60 }
61};
62
1/**
2 * Encodes a string by finding repeated patterns and representing them in compressed format
3 * For example: "aabcaabcbb" -> "2[aabc]bb"
4 * @param s - The input string to encode
5 * @returns The shortest encoded representation of the string
6 */
7function encode(s: string): string {
8 const length = s.length;
9
10 // dp[i][j] stores the shortest encoding for substring s[i..j]
11 const dp: string[][] = new Array(length)
12 .fill(0)
13 .map(() => new Array(length).fill(''));
14
15 /**
16 * Helper function to find the best encoding for a substring
17 * @param startIndex - Starting index of the substring
18 * @param endIndex - Ending index of the substring (inclusive)
19 * @returns The encoded string or original if encoding is not beneficial
20 */
21 const getEncodedSubstring = (startIndex: number, endIndex: number): string => {
22 const substring = s.slice(startIndex, endIndex + 1);
23
24 // Don't encode strings shorter than 5 characters (encoding overhead)
25 if (substring.length < 5) {
26 return substring;
27 }
28
29 // Find if the substring can be represented as a repeated pattern
30 // Double the string and search for the original starting from index 1
31 // This finds the shortest repeating unit
32 const repeatingUnitLength = substring.repeat(2).indexOf(substring, 1);
33
34 if (repeatingUnitLength < substring.length) {
35 // Calculate how many times the pattern repeats
36 const repetitionCount = Math.floor(substring.length / repeatingUnitLength);
37 // Return encoded format: count[pattern]
38 return repetitionCount + '[' + dp[startIndex][startIndex + repeatingUnitLength - 1] + ']';
39 }
40
41 return substring;
42 };
43
44 // Build the DP table from bottom-right to top-left
45 // Process shorter substrings first
46 for (let i = length - 1; i >= 0; --i) {
47 for (let j = i; j < length; ++j) {
48 // Get the best encoding for substring s[i..j]
49 dp[i][j] = getEncodedSubstring(i, j);
50
51 // For substrings longer than 4 characters, try splitting at different points
52 if (j - i + 1 > 4) {
53 for (let splitPoint = i; splitPoint < j; ++splitPoint) {
54 // Try concatenating encodings of left and right parts
55 const concatenatedEncoding = dp[i][splitPoint] + dp[splitPoint + 1][j];
56
57 // Update if this split gives a shorter encoding
58 if (concatenatedEncoding.length < dp[i][j].length) {
59 dp[i][j] = concatenatedEncoding;
60 }
61 }
62 }
63 }
64 }
65
66 // Return the shortest encoding for the entire string
67 return dp[0][length - 1];
68}
69
Time and Space Complexity
Time Complexity: O(n^3)
The algorithm uses dynamic programming with a 2D table where f[i][j]
represents the shortest encoding for substring s[i:j+1]
.
- The outer two loops iterate through all possible substrings, contributing
O(n^2)
iterations. - For each substring
(i, j)
:- The function
g(i, j)
is called, which:- Creates substring
t
inO(n)
time - Performs pattern matching using
(t + t).index(t, 1)
inO(n)
time - Overall
g()
takesO(n)
time
- Creates substring
- When
j - i + 1 > 4
, we try all possible split pointsk
betweeni
andj
, which adds anotherO(n)
loop - String concatenation
f[i][k] + f[k + 1][j]
takesO(n)
in worst case
- The function
Therefore, the total time complexity is O(n^2) * O(n) = O(n^3)
.
Space Complexity: O(n^2)
- The 2D DP table
f
has dimensionsn × n
, storing strings that could be up to lengthn
each - In the worst case, each cell stores a string of length
O(n)
- Therefore, the total space complexity is
O(n^2 * n) = O(n^3)
for storing all strings
However, if we only consider the space for the DP table structure itself (number of cells), it's O(n^2)
. The actual space usage is O(n^3)
when accounting for string storage.
Common Pitfalls
1. Incorrect Pattern Detection Logic
The most common pitfall is misunderstanding how the pattern detection works with (substring + substring).index(substring, 1)
. Many developers incorrectly assume this finds any repeating pattern, but it specifically finds the smallest period of repetition.
Problem Example:
# Incorrect understanding might lead to: substring = "abcabc" # Some might think this detects "abc" repeating twice # But actually: (substring + substring).index(substring, 1) returns 3 # This correctly identifies that "abc" is the repeating unit
Solution: Understand that this trick works because if a string has period k
, then shifting it by k
positions produces the same string. The .index()
method finds the first occurrence of the full substring in the doubled version, which gives us the smallest period.
2. Forgetting to Check Encoding Length
A critical mistake is always applying the encoding without checking if it actually shortens the string.
Problem Example:
# For substring "ab" repeated twice: "abab" # Encoding would be "2[ab]" which has length 6 # Original "abab" has length 4 - shorter! # Must keep original, not encode
Solution: Always compare the length of the encoded string with the original before deciding to use the encoding. The condition len(substring) < 5
in the code handles this for the base case.
3. Not Handling Edge Cases in Recursion
When building the encoded string like f"{repetition_count}[{encoded_pattern}]"
, forgetting that the pattern itself might need encoding can lead to suboptimal results.
Problem Example:
# For "aaaaaaaaaa" (10 a's) # Direct encoding: "10[a]" - length 6 # But if we had "aaaaaaaaaaaaaaaaaaaaa" (20 a's) # We might get "20[a]" - length 6 # Or we could get "2[10[a]]" - length 9 (worse) # The DP ensures we pick the optimal one
Solution: The recursive call dp[start][start + pattern_length - 1]
ensures the pattern itself is optimally encoded before being used in the larger encoding.
4. Incorrect DP Table Traversal Order
Processing the DP table in the wrong order can lead to using uninitialized values.
Problem Example:
# Wrong order:
for start in range(n): # Going forward
for end in range(start, n):
# When we need dp[start][start + pattern_length - 1]
# It might not be computed yet!
Solution: Process from right to left for the start index and ensure smaller substrings are computed before larger ones. The code correctly uses range(n - 1, -1, -1)
for the outer loop.
5. Off-by-One Errors in Index Management
Mixing inclusive and exclusive indices can cause bugs, especially when dealing with substring slicing.
Problem Example:
# If dp[i][j] represents s[i:j] (exclusive end) # vs dp[i][j] represents s[i:j+1] (inclusive end) # Mixing these conventions leads to errors
Solution: Be consistent with index conventions. The provided code uses dp[i][j]
to represent s[i:j+1]
, meaning j
is inclusive. All substring operations must follow this convention consistently.
The three-steps of Depth First Search are:
- Identify states;
- Draw the state-space tree;
- DFS on the state-space tree.
Recommended Readings
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Don’t Miss This!