Facebook Pixel

471. Encode String with Shortest Length 🔒

Problem Description

You need to find the shortest possible encoding of a given string s using a specific compression rule.

The compression rule allows you to represent repeated substrings in the format k[encoded_string], where:

  • encoded_string is the substring that repeats
  • k is the number of times it repeats (must be a positive integer)
  • The brackets [] are literal characters in the output

For example:

  • "aaa" can be encoded as "3[a]" (shorter than original)
  • "aaaaa" can be encoded as "5[a]"
  • "aaaaaaaaaa" (10 a's) can be encoded as "10[a]"
  • "abcabcabcabc" can be encoded as "4[abc]"
  • "abbbabbbabbb" can be encoded as "3[abbb]" or even "3[a3[b]]" if that's shorter

The encoding can be applied recursively - you can encode parts within the brackets as well.

Important constraints:

  • Only encode if it makes the string shorter. If encoding doesn't reduce the length, keep the original string
  • If multiple valid encodings exist with the same shortest length, you can return any one of them
  • The goal is to find the absolute shortest possible encoded representation

The challenge involves finding optimal ways to:

  1. Identify repeating patterns in the string
  2. Decide whether to encode a substring or leave it as is
  3. Consider all possible ways to split and encode the string to achieve minimum length
Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is that finding the shortest encoding requires us to explore all possible ways to encode each substring and choose the best one. This naturally leads to a dynamic programming approach.

Think about any substring s[i:j+1]. There are essentially two ways we can encode it:

  1. Direct encoding: Check if this substring contains a repeating pattern. If "abcabcabc" repeats "abc" three times, we can encode it as "3[abc]". To detect this pattern, we can use a clever trick: concatenate the string with itself and search for the original string starting from index 1. If we find it before reaching the length of the original string, we've found a repeating pattern.

  2. Split encoding: Break the substring into two parts at some position k and encode each part separately. For example, "aaaaabbbbb" might be better encoded as "5[a]5[b]" rather than trying to encode the whole thing as one pattern.

The challenge is we don't know which approach will give us the shortest result without trying both. For the direct encoding, we also need to recursively find the best encoding for the repeating unit itself (like encoding "abc" within "3[abc]").

This recursive structure with overlapping subproblems suggests dynamic programming. We need to:

  • Process substrings from shortest to longest (or use memoization)
  • For each substring, try both direct encoding and all possible split points
  • Keep track of the shortest encoding found for each substring

The reason we start from the end of the string and work backwards (i from n-1 to 0) while expanding j from i to n-1 is to ensure that when we need the optimal encoding of a smaller substring, we've already computed it. This bottom-up approach guarantees that all necessary subproblems are solved before we need them.

The condition to check if encoding is worth it (length < 5) comes from the fact that the shortest possible encoding "2[a]" has length 5, so any string shorter than 5 characters cannot benefit from encoding.

Solution Approach

The solution uses dynamic programming with a 2D table f[i][j] where f[i][j] stores the shortest encoding of substring s[i:j+1].

Helper Function g(i, j): This function attempts to directly encode the substring s[i:j+1] if it contains a repeating pattern.

  1. Extract the substring t = s[i:j+1]
  2. If length is less than 5, return the original string (encoding won't make it shorter)
  3. Find the repeating pattern using the trick: k = (t + t).index(t, 1)
    • Concatenate t with itself to get t + t
    • Search for t starting from index 1
    • If k < len(t), we found a repeating pattern with period k
  4. If a pattern exists:
    • Calculate repetition count: cnt = len(t) // k
    • Return the encoded format: "cnt[f[i][i+k-1]]" where f[i][i+k-1] is the optimal encoding of one repeating unit
  5. Otherwise, return the original substring

Main Dynamic Programming Logic:

  1. Initialize a 2D array f of size n×n where n = len(s)

  2. Process substrings in order of increasing length:

    • Outer loop: i from n-1 down to 0 (starting position)
    • Inner loop: j from i to n-1 (ending position)
  3. For each substring s[i:j+1]:

    • First, try direct encoding: f[i][j] = g(i, j)
    • If substring length > 4, try all possible split points:
      for k in range(i, j):
          t = f[i][k] + f[k+1][j]
          if len(f[i][j]) > len(t):
              f[i][j] = t
    • This combines the optimal encodings of left part f[i][k] and right part f[k+1][j]
    • Keep the shorter encoding between direct encoding and split encoding
  4. Return f[0][-1] which contains the optimal encoding of the entire string

Time Complexity: O(n³) - For each of O(n²) substrings, we check O(n) split points.

Space Complexity: O(n²) - For the 2D DP table storing optimal encodings of all substrings.

The algorithm ensures optimal substructure by processing smaller substrings first, guaranteeing that when we need f[i][i+k-1] in the helper function or f[i][k] and f[k+1][j] in split encoding, these values are already computed.

Ready to land your dream job?

Unlock your dream job with a 3-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the algorithm with the string s = "aabcabc".

Step 1: Initialize DP table We create a 7×7 table f (since length = 7). We'll process substrings from length 1 to 7.

Step 2: Process substrings (i from 6 to 0, j from i to 6)

Starting with single characters (base cases):

  • f[0][0] = "a" (substring "a")
  • f[1][1] = "a"
  • f[2][2] = "b"
  • f[3][3] = "c"
  • f[4][4] = "a"
  • f[5][5] = "b"
  • f[6][6] = "c"

Length 2 substrings:

  • f[0][1] = "aa" → Direct encoding: g(0,1) checks if "aa" repeats. It does (period 1), so we get "2[a]". But length is 5, not shorter than 2, so keep "aa"
  • f[1][2] = "ab" → No pattern, remains "ab"
  • f[2][3] = "bc" → No pattern, remains "bc"
  • Similarly for others...

Length 4 substring "bcab" at f[2][5]:

  • Direct encoding: No repeating pattern → "bcab"
  • Try splits:
    • Split at 2: f[2][2] + f[3][5] = "b" + "cab" = "bcab" (length 4)
    • Split at 3: f[2][3] + f[4][5] = "bc" + "ab" = "bcab" (length 4)
    • Split at 4: f[2][4] + f[5][5] = "bca" + "b" = "bcab" (length 4)
  • Result: f[2][5] = "bcab"

Length 6 substring "abcabc" at f[1][6]:

  • Direct encoding: g(1,6) finds "abcabc" = 2 × "abc"
    • (abcabc + abcabc).index(abcabc, 1) returns 3
    • Pattern length = 3, repetitions = 6/3 = 2
    • Encoding: "2[" + f[1][3] + "]" = "2[abc]" (length 6)
  • Try splits:
    • Split at 1: "a" + "bcabc" = "abcabc" (length 6)
    • Split at 2: "ab" + "cabc" = "abcabc" (length 6)
    • Split at 3: "abc" + "abc" = "abcabc" (length 6)
    • Split at 4: "abca" + "bc" = "abcabc" (length 6)
    • Split at 5: "abcab" + "c" = "abcabc" (length 6)
  • Result: f[1][6] = "2[abc]" (same length as original, but valid encoding)

Final substring "aabcabc" at f[0][6]:

  • Direct encoding: No repeating pattern → "aabcabc"
  • Try splits:
    • Split at 0: f[0][0] + f[1][6] = "a" + "2[abc]" = "a2[abc]" (length 7)
    • Split at 1: f[0][1] + f[2][6] = "aa" + "bcabc" = "aabcabc" (length 7)
    • Other splits also give length 7 or more
  • Result: f[0][6] = "a2[abc]"

Answer: "a2[abc]" - The shortest encoding saves 1 character from the original string.

Solution Implementation

1class Solution:
2    def encode(self, s: str) -> str:
3        """
4        Encode a string by replacing repeated substrings with k[pattern] format.
5        Uses dynamic programming to find the optimal encoding.
6        """
7      
8        def find_best_encoding(start: int, end: int) -> str:
9            """
10            Find the best encoding for substring s[start:end+1].
11            If a repeating pattern exists, encode as k[pattern].
12            """
13            substring = s[start : end + 1]
14          
15            # Don't encode strings shorter than 5 characters
16            if len(substring) < 5:
17                return substring
18          
19            # Find if substring is made of repeating pattern
20            # Trick: search for substring in doubled version starting from index 1
21            # If found before len(substring), it means there's a repeating pattern
22            pattern_end_index = (substring + substring).index(substring, 1)
23          
24            if pattern_end_index < len(substring):
25                # Found a repeating pattern
26                pattern_length = pattern_end_index
27                repetition_count = len(substring) // pattern_length
28                # Recursively encode the pattern itself
29                encoded_pattern = dp[start][start + pattern_length - 1]
30                return f"{repetition_count}[{encoded_pattern}]"
31          
32            return substring
33      
34        n = len(s)
35        # dp[i][j] stores the best encoding for substring s[i:j+1]
36        dp = [[None] * n for _ in range(n)]
37      
38        # Fill dp table from smaller substrings to larger ones
39        # Process from right to left for start index
40        for start in range(n - 1, -1, -1):
41            # Process from left to right for end index
42            for end in range(start, n):
43                # First, try encoding the whole substring as a pattern
44                dp[start][end] = find_best_encoding(start, end)
45              
46                # For substrings longer than 4, try splitting into two parts
47                if end - start + 1 > 4:
48                    for split_point in range(start, end):
49                        # Combine encodings of left and right parts
50                        left_part = dp[start][split_point]
51                        right_part = dp[split_point + 1][end]
52                        combined_encoding = left_part + right_part
53                      
54                        # Keep the shorter encoding
55                        if len(dp[start][end]) > len(combined_encoding):
56                            dp[start][end] = combined_encoding
57      
58        # Return the best encoding for the entire string
59        return dp[0][-1]
60
1class Solution {
2    private String originalString;
3    private String[][] dp; // dp[i][j] stores the shortest encoding for substring from index i to j
4  
5    /**
6     * Encodes a string to its shortest representation using pattern compression.
7     * For example: "aabcaabcbb" -> "2[aabc]bb"
8     * 
9     * @param s the input string to encode
10     * @return the shortest encoded representation of the string
11     */
12    public String encode(String s) {
13        this.originalString = s;
14        int n = s.length();
15        dp = new String[n][n];
16      
17        // Process all substrings from length 1 to n
18        // Start from the end and work backwards to ensure smaller subproblems are solved first
19        for (int start = n - 1; start >= 0; --start) {
20            for (int end = start; end < n; ++end) {
21                // First, try to encode the current substring as a repeating pattern
22                dp[start][end] = encodeSubstring(start, end);
23              
24                // For substrings longer than 4 characters, try splitting into two parts
25                // This optimization avoids unnecessary splits for very short strings
26                if (end - start + 1 > 4) {
27                    for (int splitPoint = start; splitPoint < end; ++splitPoint) {
28                        // Combine encodings of left part [start, splitPoint] and right part [splitPoint+1, end]
29                        String concatenatedEncoding = dp[start][splitPoint] + dp[splitPoint + 1][end];
30                      
31                        // Keep the shorter encoding
32                        if (dp[start][end].length() > concatenatedEncoding.length()) {
33                            dp[start][end] = concatenatedEncoding;
34                        }
35                    }
36                }
37            }
38        }
39      
40        return dp[0][n - 1];
41    }
42  
43    /**
44     * Attempts to encode a substring as a repeating pattern.
45     * If the substring can be represented as k repetitions of a pattern,
46     * returns "k[pattern]", otherwise returns the original substring.
47     * 
48     * @param start starting index of the substring (inclusive)
49     * @param end ending index of the substring (inclusive)
50     * @return the encoded representation or the original substring
51     */
52    private String encodeSubstring(int start, int end) {
53        String substring = originalString.substring(start, end + 1);
54      
55        // Don't encode strings shorter than 5 characters (encoding would be longer)
56        if (substring.length() < 5) {
57            return substring;
58        }
59      
60        // Find if the substring is a repeating pattern
61        // Trick: concatenate the string with itself and search for the original string
62        // If found at index < length, then it's a repeating pattern
63        int patternLength = (substring + substring).indexOf(substring, 1);
64      
65        if (patternLength < substring.length()) {
66            // The substring consists of repeating patterns
67            int repetitionCount = substring.length() / patternLength;
68          
69            // Return the encoded format: count[pattern]
70            // Use the already computed optimal encoding for the pattern
71            return String.format("%d[%s]", repetitionCount, dp[start][start + patternLength - 1]);
72        }
73      
74        return substring;
75    }
76}
77
1class Solution {
2public:
3    string encode(string s) {
4        int n = s.size();
5        // dp[i][j] stores the shortest encoding for substring s[i...j]
6        vector<vector<string>> dp(n, vector<string>(n));
7      
8        // Helper function to find the best encoding for a substring
9        // considering it as a single repeated pattern
10        auto findRepeatedPattern = [&](int start, int end) -> string {
11            // Extract the substring from start to end (inclusive)
12            string substring = s.substr(start, end - start + 1);
13          
14            // If substring is too short, encoding won't help
15            if (substring.size() < 5) {
16                return substring;
17            }
18          
19            // Find if the substring is a repeated pattern
20            // by searching for itself in doubled string starting from position 1
21            int patternLength = (substring + substring).find(substring, 1);
22          
23            // If we found a pattern shorter than the whole substring
24            if (patternLength < substring.size()) {
25                int repeatCount = substring.size() / patternLength;
26                // Return encoded format: count[pattern]
27                return to_string(repeatCount) + "[" + dp[start][start + patternLength - 1] + "]";
28            }
29          
30            // No repeated pattern found, return original substring
31            return substring;
32        };
33      
34        // Fill the dp table from smaller substrings to larger ones
35        // Process from right to left for start position
36        for (int i = n - 1; i >= 0; --i) {
37            // Process from left to right for end position
38            for (int j = i; j < n; ++j) {
39                // First, try to encode the entire substring as a repeated pattern
40                dp[i][j] = findRepeatedPattern(i, j);
41              
42                // For substrings longer than 4 characters, try splitting
43                if (j - i + 1 > 4) {
44                    // Try all possible split points
45                    for (int splitPoint = i; splitPoint < j; ++splitPoint) {
46                        // Combine encodings of left and right parts
47                        string splitEncoding = dp[i][splitPoint] + dp[splitPoint + 1][j];
48                      
49                        // Update if this split gives shorter encoding
50                        if (splitEncoding.size() < dp[i][j].size()) {
51                            dp[i][j] = splitEncoding;
52                        }
53                    }
54                }
55            }
56        }
57      
58        // Return the shortest encoding for the entire string
59        return dp[0][n - 1];
60    }
61};
62
1/**
2 * Encodes a string by finding repeated patterns and representing them in compressed format
3 * For example: "aabcaabcbb" -> "2[aabc]bb"
4 * @param s - The input string to encode
5 * @returns The shortest encoded representation of the string
6 */
7function encode(s: string): string {
8    const length = s.length;
9  
10    // dp[i][j] stores the shortest encoding for substring s[i..j]
11    const dp: string[][] = new Array(length)
12        .fill(0)
13        .map(() => new Array(length).fill(''));
14  
15    /**
16     * Helper function to find the best encoding for a substring
17     * @param startIndex - Starting index of the substring
18     * @param endIndex - Ending index of the substring (inclusive)
19     * @returns The encoded string or original if encoding is not beneficial
20     */
21    const getEncodedSubstring = (startIndex: number, endIndex: number): string => {
22        const substring = s.slice(startIndex, endIndex + 1);
23      
24        // Don't encode strings shorter than 5 characters (encoding overhead)
25        if (substring.length < 5) {
26            return substring;
27        }
28      
29        // Find if the substring can be represented as a repeated pattern
30        // Double the string and search for the original starting from index 1
31        // This finds the shortest repeating unit
32        const repeatingUnitLength = substring.repeat(2).indexOf(substring, 1);
33      
34        if (repeatingUnitLength < substring.length) {
35            // Calculate how many times the pattern repeats
36            const repetitionCount = Math.floor(substring.length / repeatingUnitLength);
37            // Return encoded format: count[pattern]
38            return repetitionCount + '[' + dp[startIndex][startIndex + repeatingUnitLength - 1] + ']';
39        }
40      
41        return substring;
42    };
43  
44    // Build the DP table from bottom-right to top-left
45    // Process shorter substrings first
46    for (let i = length - 1; i >= 0; --i) {
47        for (let j = i; j < length; ++j) {
48            // Get the best encoding for substring s[i..j]
49            dp[i][j] = getEncodedSubstring(i, j);
50          
51            // For substrings longer than 4 characters, try splitting at different points
52            if (j - i + 1 > 4) {
53                for (let splitPoint = i; splitPoint < j; ++splitPoint) {
54                    // Try concatenating encodings of left and right parts
55                    const concatenatedEncoding = dp[i][splitPoint] + dp[splitPoint + 1][j];
56                  
57                    // Update if this split gives a shorter encoding
58                    if (concatenatedEncoding.length < dp[i][j].length) {
59                        dp[i][j] = concatenatedEncoding;
60                    }
61                }
62            }
63        }
64    }
65  
66    // Return the shortest encoding for the entire string
67    return dp[0][length - 1];
68}
69

Time and Space Complexity

Time Complexity: O(n^3)

The algorithm uses dynamic programming with a 2D table where f[i][j] represents the shortest encoding for substring s[i:j+1].

  • The outer two loops iterate through all possible substrings, contributing O(n^2) iterations.
  • For each substring (i, j):
    • The function g(i, j) is called, which:
      • Creates substring t in O(n) time
      • Performs pattern matching using (t + t).index(t, 1) in O(n) time
      • Overall g() takes O(n) time
    • When j - i + 1 > 4, we try all possible split points k between i and j, which adds another O(n) loop
    • String concatenation f[i][k] + f[k + 1][j] takes O(n) in worst case

Therefore, the total time complexity is O(n^2) * O(n) = O(n^3).

Space Complexity: O(n^2)

  • The 2D DP table f has dimensions n × n, storing strings that could be up to length n each
  • In the worst case, each cell stores a string of length O(n)
  • Therefore, the total space complexity is O(n^2 * n) = O(n^3) for storing all strings

However, if we only consider the space for the DP table structure itself (number of cells), it's O(n^2). The actual space usage is O(n^3) when accounting for string storage.

Common Pitfalls

1. Incorrect Pattern Detection Logic

The most common pitfall is misunderstanding how the pattern detection works with (substring + substring).index(substring, 1). Many developers incorrectly assume this finds any repeating pattern, but it specifically finds the smallest period of repetition.

Problem Example:

# Incorrect understanding might lead to:
substring = "abcabc"
# Some might think this detects "abc" repeating twice
# But actually: (substring + substring).index(substring, 1) returns 3
# This correctly identifies that "abc" is the repeating unit

Solution: Understand that this trick works because if a string has period k, then shifting it by k positions produces the same string. The .index() method finds the first occurrence of the full substring in the doubled version, which gives us the smallest period.

2. Forgetting to Check Encoding Length

A critical mistake is always applying the encoding without checking if it actually shortens the string.

Problem Example:

# For substring "ab" repeated twice: "abab"
# Encoding would be "2[ab]" which has length 6
# Original "abab" has length 4 - shorter!
# Must keep original, not encode

Solution: Always compare the length of the encoded string with the original before deciding to use the encoding. The condition len(substring) < 5 in the code handles this for the base case.

3. Not Handling Edge Cases in Recursion

When building the encoded string like f"{repetition_count}[{encoded_pattern}]", forgetting that the pattern itself might need encoding can lead to suboptimal results.

Problem Example:

# For "aaaaaaaaaa" (10 a's)
# Direct encoding: "10[a]" - length 6
# But if we had "aaaaaaaaaaaaaaaaaaaaa" (20 a's)
# We might get "20[a]" - length 6
# Or we could get "2[10[a]]" - length 9 (worse)
# The DP ensures we pick the optimal one

Solution: The recursive call dp[start][start + pattern_length - 1] ensures the pattern itself is optimally encoded before being used in the larger encoding.

4. Incorrect DP Table Traversal Order

Processing the DP table in the wrong order can lead to using uninitialized values.

Problem Example:

# Wrong order:
for start in range(n):  # Going forward
    for end in range(start, n):
        # When we need dp[start][start + pattern_length - 1]
        # It might not be computed yet!

Solution: Process from right to left for the start index and ensure smaller substrings are computed before larger ones. The code correctly uses range(n - 1, -1, -1) for the outer loop.

5. Off-by-One Errors in Index Management

Mixing inclusive and exclusive indices can cause bugs, especially when dealing with substring slicing.

Problem Example:

# If dp[i][j] represents s[i:j] (exclusive end)
# vs dp[i][j] represents s[i:j+1] (inclusive end)
# Mixing these conventions leads to errors

Solution: Be consistent with index conventions. The provided code uses dp[i][j] to represent s[i:j+1], meaning j is inclusive. All substring operations must follow this convention consistently.

Discover Your Strengths and Weaknesses: Take Our 3-Minute Quiz to Tailor Your Study Plan:

The three-steps of Depth First Search are:

  1. Identify states;
  2. Draw the state-space tree;
  3. DFS on the state-space tree.

Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More