Leetcode 1531. String Compression II

Problem Explanation

The problem is about data compression using the Run Length Encoding (RLE) method and further minimising the length of the RLE string by deleting at most k characters from the original string. In the RLE method, consecutive similar characters are replaced with a single occurrence of that character followed by the count for the consecutive times it has occurred. For instance, "aaabbb" changes to "a3b3".

The task is to find the minimum length of the string after applying RLE, given that we can delete at most k characters from the original string. It's also mentioned that we do not need to add '1' after single characters and the string contains only lowercase English letters.

Let us walk through an example:

s = "aaabcccd", k = 2

If we compress s without deleting anything, the string becomes "a3bc3d", of length 6. If we delete 2 characters of 'a', then we will have s = "abcccd" which on compression gives "abc3d" (length 5). The optimal way is to delete 'b' and 'd', then the compressed version of s will be "a3c3" (length 4).

Solution Approach

The solution uses a recursive approach with memoization. It maintains a 2-D dp array to store the optimal compression length for every substring of s with at most k deletions. The function compression(s, i, k) calculates the optimal compression length for the i-th substring of 's' with at most k deletions. If k is less than 0 or i equals the length of s or the remaining length of 's' is less than or equal to 'k', it returns the values accordingly.

We try to make all characters in the substring from i to j the same and for that, we keep the character that has a maximum frequency in this range and try to remove other characters.

C++ Solution

1
2cpp
3class Solution {
4    static constexpr int kMax = 101;
5    vector<vector<int>> dp;
6    
7    int compression(const string& s, int i, int k) {
8        if (k < 0)
9            return kMax;
10        if (i == s.length() || s.length() - i <= k)
11            return 0;
12        if (dp[i][k] != kMax)
13            return dp[i][k];
14
15        int maxFreq = 0;  
16        vector<int> count(128);
17
18        for (int j = i; j < s.length(); ++j) {
19            maxFreq = max(maxFreq, ++count[s[j]]);
20            dp[i][k] = min(  
21                dp[i][k],    
22                getLength(maxFreq) +
23                compression(s, j + 1, k - (j - i + 1 - maxFreq)));
24        }
25
26        return dp[i][k];
27    }
28    
29    int getLength(int maxFreq) {  
30        if (maxFreq == 1)
31            return 1;
32        if (maxFreq < 10)
33            return 2;  
34        if (maxFreq < 100)
35            return 3;  
36        return 4;   
37    }
38    
39public:
40    int getLengthOfOptimalCompression(string s, int k) {
41        dp.resize(s.length(), vector<int>(k + 1, kMax));
42        return compression(s, 0, k);
43    }
44};

Python Solution

Python solution also uses dynamic programming. It creates a dp array to keep track of minimum length of the RLE string by checking all possible chars from 'a' to 'z'. In the outer loop, it iterates on the original string backwards and in the inner loop, it checks for the number of same characters in the range. Then it updates the dp[i+1][del] based on the current min length.

1
2python
3class Solution:
4    def getLengthOfOptimalCompression(self, s: str, k: int) -> int:
5        C = 26
6        n = len(s)
7        counter = [0] * C
8        dp = [[0] * (k + 1) for _ in range(n + 1)]
9        for i in range(n - 1, -1, -1):
10            dp[i] = dp[i + 1][:]    
11            counter[ord(s[i]) - ord('a')] += 1
12            del_ = 0
13            for a in range(C):  
14    
15                dp[i][del_] = min(dp[i][del_], 
16                             (dp[i + counter[a]][del_] if del_ >= counter[a] else 1 +
17                              dp[i + 1][(del_ if a != ord(s[i]) - ord('a') else del_ - 1) if del_ > 0 else 0]) + 
18                             len(str(counter[a])) if counter[a] > 1 else 0)
19                if a != ord(s[i]) - ord('a'):
20                    del_ += counter[a]
21                    counter[a] = 0
22          return dp[0][k]

Java Solution

Java solution uses a recursive approach. It also uses a dp array of size [26][n+1][k+1] to keep track of minimum length for each char from 'a' to 'z' and max โ€˜kโ€™ deletions.

1
2java
3class Solution {
4  public int getLengthOfOptimalCompression(String s, int k) {
5    char[] chars = s.toCharArray();
6    int[][][] dp = new int[26][chars.length + 2][chars.length + 1];
7    return helper(chars, new char[chars.length + 2], 0, 0, 0, k, dp);
8  }
9
10  private int helper(char[] chars, char[] path, int i, int j, int last, int k, int[][][] dp) {
11    if (k < 0) {
12      return Integer.MAX_VALUE / 2;
13    }
14    if (i == chars.length) {
15      return 0;
16    }
17    char c= chars[i];
18    if (c == last) {
19      path[j] = c;
20      int len = j == 0 ? 0 : (j <= 2 ? 1 : (j <= 10 ? 2 : (j <= 100 ? 3 : 4)));
21      return len + helper(chars, path, i + 1, j + 1, last, k, dp);
22    }
23    if (dp[c - 'a'][j][k] != 0) {
24      return dp[c - 'a'][j][k];
25    }
26    path[0] = c;
27    int keep = helper(chars, path, i + 1, 1, c, k, dp);
28    int skip = helper(chars, path, i + 1, j, last, k - 1, dp);
29    dp[c - 'a'][j][k] = Math.min(keep, skip);
30    return dp[c - 'a'][j][k];
31  }
32}

In these solutions, dynamic programming is used to reduce the time complexity and to avoid recomputation. The important factor to consider here is the choice of data structure and how it is used to store the computed results of smaller problems. Depending on the selected approach, the solution can vary but the basic idea remains the same.


Got a question?ย Ask the Teaching Assistantย anything you don't understand.

Still not clear? Ask in the Forum, ย Discordย orย Submitย the part you don't understand to our editors.

โ†
โ†‘TA ๐Ÿ‘จโ€๐Ÿซ