1531. String Compression II
Problem Description
The problem is an optimization task based on modifying the run-length encoding (RLE) of a string. Run-length encoding is a simple form of lossless data compression where sequences of the same character are replaced with a single character and a count of that character's repetitions. The task is to reduce the length of the encoded string by strategically deleting up to k
characters from the original string. The challenge is to determine which characters to remove in order to achieve the smallest possible length of the RLE version of the string.
For instance, if we are given s = "aabccc"
and k = 2
, by removing a single 'b' and one 'c', we can transform 's' to "aac" which can be RLE encoded as "a2c", the resulting length is 3, which is the minimum possible.
Intuition
To solve this problem, dynamic programming is used to build up a solution by solving smaller subproblems. The dynamic programming array dp[i][k]
is defined to store the length of the optimal compression for the substring starting at the i-th character of s
with at most k
characters deleted.
The solution iterates over the string, keeping track of the frequency of characters encountered (count[]
), and calculates the minimum RLE length that can be obtained by either keeping or deleting characters. Deleting characters comes with the trade-off of possibly increasing the size due to the frequency of a run being less compressed (e.g., "aaabb" to "aab" increases the length when encoded because "a2b2" becomes "a2b1").
The key is to determine whether to keep a character in the current run or to delete some characters to potentially form a longer run of a different character that comes later. This decision is made by considering the lengths of all possible encodings that can result from either keeping or deleting characters and choosing the one with the minimum length.
The recursive function compression
performs this computation and memoizes the results in dp
to avoid repeated calculations for the same subproblems. Additionally, the function getLength
calculates the RLE length of a sequence with a given frequency. Since run lengths only add characters to the encoding when the count is 2 or more, the function handles single characters and different number magnitude representations separately (1 digit, 2 digits, 3 digits).
In essence, the algorithm involves exploring different scenarios of deletion while memoizing results to find the optimal compression length efficiently.
Learn more about Dynamic Programming patterns.
Solution Approach
The solution utilizes a top-down recursive approach with memoization, a common technique in dynamic programming to avoid recalculating the same outcomes multiple times. The crux of the solution lies in the compression
method and the use of the dp
array alongside the count
array, which help to bookkeep the best results while iterating through the possible deletions and runs.
Dynamic Programming Array dp
A 2D array dp
of size n
by k+1
is used to store the lengths of optimal compressions for different subproblems, where n
is the length of the original string and k
is the maximum number of characters we can delete.
dp[i][k]
represents the minimal length of the RLE string starting from thei-th
index ofs
with at mostk
deletions allowed.- Initialized with a large constant
K_MAX
representing an unattainable maximum length, suggesting that the corresponding subproblem has not yet been solved.
The compression
Method
This method takes as input the string s
, the current index i
, and the remaining deletion allowance k
. It recurses through the string, trying all possible deletions and keeping track of the resulting RLE length.
Base Cases
- When
k < 0
: Invalid, since we cannot delete more characters than allowed. ReturnK_MAX
. - When
i == s.length() || s.length() - i <= k
: We've reached the end of the string, or we have enough deletions left to delete the remaining characters. In either case, no RLE length addition is needed, so return 0. - If
dp[i][k] != K_MAX
, return the memoized value, avoiding unnecessary recomputation.
Recursive Case
With maxFreq
initialized to 0 and count
to count character frequencies, the idea is to iterate from the current index i
to the end of the string with index j
:
- Increase
count
for the current character, and updatemaxFreq
with the maximum frequency seen so far. - Recurse with the updated parameters:
compression(s, j + 1, k - (j - i + 1 - maxFreq))
. - The new
k
is reduced by the number of deletions necessary to keep themaxFreq
characters in the current range undisrupted. - Calculate and update the optimal
dp[i][k]
with the current best attempt, which includes the RLE length of the current char (getLength(maxFreq)
) and the optimal compression length of the remaining string given the current deletions.
The getLength
Method
The getLength
method is just a helper that translates frequencies into the additional length needed in the RLE string.
1
requires no additional digit, as we do not encode single characters with a count.2-9
adds 1 digit.10-99
adds 2 digits.100+
adds 3 digits.
Combining this logic with the compression
routine and the dp
array with memoization, the method getLengthOfOptimalCompression
builds the solution from smaller subproblems and caches these solutions to use in larger problems. This reduces the time and computational complexity significantly, allowing us to find the optimal length of the RLE string after a maximum of k
deletions.
Running Time Complexity
The algorithm operates in O(n^2 * k) time since for each character in the string (n
), it looks at all characters that follow it (n
), and this is done for every possible number of deletions (k
). The use of memoization improves the time complexity from exponential, which it would have been if it was a simple recursive solution without memoization.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's illustrate the solution approach using a small example, say s = "aaabba"
and k = 2
.
Step 1: Initialize the dp
array to represent unsolved subproblems. For this example, our dp
array will have n = 6
(length of s
) rows and k+1 = 3
columns to account for 0 to 2 deletions.
Step 2: Call the compression
function starting with the full string and all deletions available, compression(s, 0, 2)
.
Step 3: Look at the base cases: we're not yet at the end of the string nor have we used all k
deletions, and dp[0][2]
is uninitialized, so we move to the recursive case.
Step 4: With i = 0
and maxFreq = 0
, we start examining runs of characters starting with the first 'a'. We can keep all three 'a's without any deletion, so maxFreq
becomes 3 and count['a']
is also 3.
Step 5: As we have 3 'a's consecutively with no deletions, dp[0][2]
will consider keeping them as is, and then recursively compute compression(s, 3, 2)
for the next part of the string, which starts with "bba".
Step 6: Recursively, we process the next segment "bba" starting from i = 3
. With two deletions remaining, the optimal decision is to delete two 'b's to only have "a". The compression
function is then called with compression(s, 6, 0)
.
Step 7: The compression(s, 6, 0)
call reaches the end of the string, so it returns 0 as per the base case.
Step 8: Now we add the length of 'a' run to the result of the recursion. Since we had 3 'a's initially, getLength(3)
returns 1 (the additional digit for the count is '3'), resulting in dp[0][2] = 1 + 0 = 1
.
Step 9: Returning to the first part of the string, we must consider the outcomes if we had deleted one of the 'a's. If we did, making maxFreq = 2
, we'd end up with a string "aabba" and the recursive call compression(s, 1, 1)
must be made.
Step 10: This process continues, exploring combinations of keeping and deleting characters at each step, memoizing the results in the dp
array to avoid duplicate computations.
Step 11: After the recursive calls are finished, we look at dp[0][2]
to get the answer, which stores the compressed string of minimum length which can be formed by starting at index 0 of s
with at most 2 deletions.
In our case, the RLE of the original s = "aaabba"
without deletions would be a3b2a1
, and with the two deletions allowing us to remove the 'b's, the optimal RLE would be a3
, yielding the smallest length of 2, which is the answer.
Solution Implementation
1from typing import List
2
3class Solution:
4 K_MAX = 101
5
6 def getLengthOfOptimalCompression(self, s: str, k: int) -> int:
7 # dp[i][k] represents the length of the optimal compression
8 # of substring s[i:] with at most k deletions
9 self.dp = [[self.K_MAX for _ in range(k + 1)] for _ in range(len(s))]
10
11 # Start compression from index 0 with k deletions allowed
12 return self._compress(s, 0, k)
13
14 def _compress(self, s: str, start_idx: int, k: int) -> int:
15 # Return an impossibly large value if deletions allowed are negative
16 if k < 0:
17 return self.K_MAX
18
19 # If we are at the end of the string or we can delete all remaining characters, return 0
20 if start_idx == len(s) or len(s) - start_idx <= k:
21 return 0
22
23 # If we have already computed this state, return its value to avoid recomputation
24 if self.dp[start_idx][k] != self.K_MAX:
25 return self.dp[start_idx][k]
26
27 max_frequency = 0
28 frequency_count = [0] * 128 # Frequency array to count occurrences of each character
29
30 # Try to find the optimal compression by maintaining a window [start_idx..j]
31 for j in range(start_idx, len(s)):
32 # Update frequency of the current character and maxFrequency found so far
33 frequency_count[ord(s[j])] += 1
34 max_frequency = max(max_frequency, frequency_count[ord(s[j])])
35
36 # Calculate the minimal length by keeping character s[j] (most frequent so far) and deleting others
37 self.dp[start_idx][k] = min(
38 self.dp[start_idx][k],
39 self._getLength(max_frequency) + self._compress(s, j + 1, k - (j - start_idx + 1 - max_frequency))
40 )
41
42 # Return the computed value for this state
43 return self.dp[start_idx][k]
44
45 def _getLength(self, max_frequency: int) -> int:
46 # Calculates the length of the compressed string after compression
47 if max_frequency == 1:
48 return 1 # Single character (e.g. "a")
49 if max_frequency < 10:
50 return 2 # Single digit frequency followed by character (e.g. "9a")
51 if max_frequency < 100:
52 return 3 # Two digit frequency followed by character (e.g. "23b")
53 return 4 # Three digit frequency followed by character (e.g. "100c")
54
1class Solution {
2 private static final int K_MAX = 101;
3 private int[][] dp;
4
5 public int getLengthOfOptimalCompression(String s, int k) {
6 // dp[i][k] represents the length of the optimal compression of substring s[i..n) with at most k deletions
7 dp = new int[s.length()][k + 1];
8
9 // Initialize all elements of dp array to K_MAX
10 for (int[] row : dp) {
11 Arrays.fill(row, K_MAX);
12 }
13
14 // Start compression from index 0 with k deletions allowed
15 return compress(s, 0, k);
16 }
17
18 private int compress(final String s, int startIdx, int k) {
19 // If deletions allowed are negative, return an impossibly large value (K_MAX)
20 if (k < 0) {
21 return K_MAX;
22 }
23
24 // If we are at the end of the string or we can delete all remaining characters, return 0
25 if (startIdx == s.length() || s.length() - startIdx <= k) {
26 return 0;
27 }
28
29 // If we have already computed this state, return its value to avoid recomputation
30 if (dp[startIdx][k] != K_MAX) {
31 return dp[startIdx][k];
32 }
33
34 int maxFrequency = 0;
35 int[] frequencyCount = new int[128]; // Frequency array to count occurrences of each character
36
37 // Try to find the optimal compression by maintaining a window [startIdx..j]
38 for (int j = startIdx; j < s.length(); ++j) {
39 // Update frequency of the current character and maxFrequency found so far
40 maxFrequency = Math.max(maxFrequency, ++frequencyCount[s.charAt(j)]);
41
42 // Calculate the minimum length by keeping character s.charAt(j) (which is most frequent so far) and deleting others
43 dp[startIdx][k] = Math.min(
44 dp[startIdx][k],
45 getLength(maxFrequency) + compress(s, j + 1, k - (j - startIdx + 1 - maxFrequency))
46 );
47 }
48
49 // Return the computed value for this state
50 return dp[startIdx][k];
51 }
52
53 // Calculates the length of the compressed string after compression
54 private int getLength(int maxFrequency) {
55 if (maxFrequency == 1) {
56 return 1; // Single character (e.g. "a")
57 }
58 if (maxFrequency < 10) {
59 return 2; // Single digit frequency followed by character (e.g. "9a")
60 }
61 if (maxFrequency < 100) {
62 return 3; // Two digit frequency followed by character (e.g. "23b")
63 }
64 return 4; // Three digit frequency followed by character (e.g. "100c")
65 }
66}
67
1#include <vector>
2#include <string>
3#include <algorithm>
4#include <cstring>
5using namespace std;
6
7class Solution {
8private:
9 static constexpr int K_MAX = 101; // Maximum value for K_MAX as defined
10 vector<vector<int>> dp; // dp table to store results of subproblems
11
12 // Calculates the length of compressed string given a character frequency
13 int getLength(int maxFrequency) {
14 if (maxFrequency == 1) {
15 return 1; // Single character (e.g. "a")
16 }
17 if (maxFrequency < 10) {
18 return 2; // Single digit frequency followed by character (e.g. "9a")
19 }
20 if (maxFrequency < 100) {
21 return 3; // Two digit frequency followed by character (e.g. "23b")
22 }
23 return 4; // Three digit frequency followed by character (e.g. "100c")
24 }
25
26 // Recursive function to compute the length of optimal compression
27 int compress(const string& s, int startIdx, int k) {
28 if (k < 0) {
29 return K_MAX; // If k is negative, we cannot do further deletions; return max value
30 }
31
32 if (startIdx == s.size() || s.size() - startIdx <= k) {
33 return 0; // If we reach the end or can delete all remaining, no more compression needed
34 }
35
36 if (dp[startIdx][k] != K_MAX) {
37 return dp[startIdx][k]; // Return precomputed result if we have already solved this subproblem
38 }
39
40 int maxFrequency = 0;
41 vector<int> frequencyCount(128, 0); // ASCII-based frequency count
42
43 for (int j = startIdx; j < s.size(); ++j) {
44 maxFrequency = max(maxFrequency, ++frequencyCount[s[j]]);
45
46 // Calculate minimum length by keeping the most frequent character and deleting the rest
47 dp[startIdx][k] = min(
48 dp[startIdx][k],
49 getLength(maxFrequency) + compress(s, j + 1, k - (j - startIdx + 1 - maxFrequency))
50 );
51 }
52
53 return dp[startIdx][k]; // Return the calculated value
54 }
55
56public:
57 // Main function called to get the length of the optimal compression
58 int getLengthOfOptimalCompression(string s, int k) {
59 dp.resize(s.length(), vector<int>(k + 1, K_MAX)); // Initialize DP table with K_MAX values
60
61 // Start the compression process from index 0 with 'k' deletions allowed
62 return compress(s, 0, k);
63 }
64};
65
1const K_MAX = 101;
2let dp: number[][];
3
4function getLengthOfOptimalCompression(s: string, k: number): number {
5 // Initialize the dp array with s.length rows and k + 1 columns set to K_MAX
6 dp = Array.from({ length: s.length }, () => Array(k + 1).fill(K_MAX));
7
8 // Start the compression process from the first character with the given limit on deletions
9 return compress(s, 0, k);
10}
11
12function compress(s: string, startIndex: number, k: number): number {
13 // Invalid case with negative available deletions
14 if (k < 0) {
15 return K_MAX;
16 }
17
18 // If we're at the end of the string or we can delete all remaining characters
19 if (startIndex >= s.length || s.length - startIndex <= k) {
20 return 0;
21 }
22
23 // Return previously computed value to avoid redundant calculations
24 if (dp[startIndex][k] !== K_MAX) {
25 return dp[startIndex][k];
26 }
27
28 let maxFrequency = 0;
29 const frequencyCount = new Array(128).fill(0); // ASCII-based frequency array
30
31 // Iterate over the substring to find optimal compression
32 for (let j = startIndex; j < s.length; ++j) {
33 // Increment the frequency of the current character
34 const charIndex = s.charCodeAt(j);
35 maxFrequency = Math.max(maxFrequency, ++frequencyCount[charIndex]);
36
37 // Consider keeping the most frequent character so far and deleting the rest
38 dp[startIndex][k] = Math.min(
39 dp[startIndex][k],
40 getCompressedLength(maxFrequency) + compress(s, j + 1, k - (j - startIndex + 1 - maxFrequency))
41 );
42 }
43
44 // Return the optimal length found for this state
45 return dp[startIndex][k];
46}
47
48function getCompressedLength(maxFrequency: number): number {
49 // Determine the length of the compressed string given the character frequency
50 if (maxFrequency === 1) {
51 return 1;
52 } else if (maxFrequency < 10) {
53 return 2;
54 } else if (maxFrequency < 100) {
55 return 3;
56 } else {
57 return 4;
58 }
59}
60
Time and Space Complexity
Time Complexity
The time complexity of this approach is primarily determined by the two nested loops and the recursive compression
method. Let's denote n
as the length of the string s
.
- The outer loop runs from
i
ton
. - For each
i
, the inner loop runs fromi
ton
, and in each iteration, thecompression
method is called. - The
compression
method itself, in the worst case, could be called up ton
times for varying values ofj
andk
.
Due to these nested loops, the basic operation count is O(n^2)
for the loops, multiplied by the cost of recursion for each loop iteration. The recursion, however, is optimized with memoization (dynamic programming), so each state (i, k)
is solved only once. There are n * k
states since i
ranges from 0
to n
and k
ranges from 0
to k
.
Thus, the overall time complexity is O(n^2 * k)
.
Space Complexity
The space complexity is governed by the memoization table dp
and the recursive call stack.
-
The
dp
table is a 2D array with dimensionsn
(for the length of strings
) byk + 1
(for the maximum number of deletions plus one), which gives us a space complexity ofO(n * k)
. -
The recursive
compression
method does not use any additional significant space, but there is a call stack overhead due to recursion. In the worst case, it can gon
calls deep, as the recursion may span from the start to the end of the strings
.
Considering both factors, the space complexity is O(n * k)
for the memoization table plus O(n)
for the stack space, which combines to O(n * k)
. Since the call stack space is typically smaller and dominated by the larger n * k
term, we simplify the overall space complexity to O(n * k)
.
Learn more about how to find time and space complexity quickly using problem constraints.
A person thinks of a number between 1 and 1000. You may ask any number questions to them, provided that the question can be answered with either "yes" or "no".
What is the minimum number of questions you needed to ask so that you are guaranteed to know the number that the person is thinking?
Recommended Readings
What is Dynamic Programming Prerequisite DFS problems dfs_intro Backtracking problems backtracking Memoization problems memoization_intro Pruning problems backtracking_pruning Dynamic programming is an algorithmic optimization technique that breaks down a complicated problem into smaller overlapping sub problems in a recursive manner and uses solutions to the sub problems to construct a solution
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Want a Structured Path to Master System Design Too? Don’t Miss This!