3163. String Compression III
Problem Description
You are given a string word
that you need to compress using a specific algorithm.
The compression algorithm works as follows:
- Start with an empty result string called
comp
. - While the input string
word
is not empty, repeatedly perform these operations:- Find the longest prefix of
word
that consists of the same character repeated (e.g., "aaaa" or "bbb") - This prefix can be at most 9 characters long
- Remove this prefix from
word
- Add the length of the prefix followed by the character to
comp
- Find the longest prefix of
For example:
- If
word = "aaaaaaaaaaabb"
, the first prefix would be "aaaaaaaaa" (9 'a's), which gets compressed to "9a" - The remaining "aaa" would become "3a"
- The "bb" would become "2b"
- Final result:
"9a3a2b"
The key constraint is that each compression unit can represent at most 9 consecutive occurrences of a character. If a character appears more than 9 times consecutively, you must split it into multiple compression units.
Your task is to implement this compression algorithm and return the compressed string comp
.
Intuition
When looking at this compression problem, we need to identify groups of consecutive identical characters and count them. The natural approach is to traverse the string and group consecutive occurrences of the same character together.
The main challenge is handling the constraint that each group can represent at most 9 characters. If we have more than 9 consecutive identical characters, we need to split them into multiple groups of 9 or less.
Think of it like packing items into boxes where each box can hold at most 9 items of the same type. If you have 15 identical items, you'd fill one box with 9 items and another with 6 items. Similarly, if we encounter "aaaaaaaaaaaaaaa" (15 'a's), we'd compress it as "9a6a".
The solution uses Python's groupby
function which automatically groups consecutive identical elements together. For each group:
- Count the total occurrences (
k
) - Break down
k
into chunks of at most 9 - For each chunk, append the count and character to our result
For example, if we have 23 consecutive 'a's:
- First chunk: min(9, 23) = 9, append "9a", remaining = 14
- Second chunk: min(9, 14) = 9, append "9a", remaining = 5
- Third chunk: min(9, 5) = 5, append "5a", remaining = 0
This greedy approach of always taking the maximum possible (up to 9) ensures we get the correct compression while respecting the constraint.
Solution Approach
The solution uses a two-pointer technique combined with Python's groupby
function to efficiently compress the string.
Step-by-step implementation:
-
Group consecutive characters: Use
groupby(word)
to automatically group consecutive identical characters. This function returns an iterator of (key, group) pairs where the key is the character and group contains all consecutive occurrences. -
Process each group: For each character group
(c, v)
:- Convert the group iterator
v
to a list and get its lengthk
to count consecutive occurrences - Example: If we have "aaaaa",
k = 5
- Convert the group iterator
-
Handle the 9-character limit: Since each compression unit can represent at most 9 characters, we use a while loop to break down
k
:while k: x = min(9, k) # Take at most 9 characters ans.append(str(x) + c) # Add "count + character" to result k -= x # Reduce remaining count
This greedy approach ensures we always take the maximum allowed (up to 9) in each iteration.
-
Build the result: Each iteration appends a string in the format
"[count][character]"
to theans
list. For example:- 15 'a's → generates "9a" then "6a"
- 3 'b's → generates "3b"
-
Join and return: Finally, join all compression units with
"".join(ans)
to create the final compressed string.
Example walkthrough:
- Input:
"aaaaaaaaaaaabb"
- After groupby:
[('a', 13 a's), ('b', 2 b's)]
- Processing 'a' group (k=13):
- First iteration: x=9, append "9a", k=4
- Second iteration: x=4, append "4a", k=0
- Processing 'b' group (k=2):
- Single iteration: x=2, append "2b", k=0
- Result:
"9a4a2b"
The time complexity is O(n)
where n
is the length of the input string, as we traverse each character once. The space complexity is O(n)
for storing the result.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the compression of word = "aaabbccccc"
step by step:
Step 1: Group consecutive characters
Using groupby
, we get:
- Group 1: ('a', ['a', 'a', 'a']) → 3 'a's
- Group 2: ('b', ['b', 'b']) → 2 'b's
- Group 3: ('c', ['c', 'c', 'c', 'c', 'c']) → 5 'c's
Step 2: Process each group
Processing Group 1 - 'a' with k=3:
- Since k=3 < 9, we take all 3
- x = min(9, 3) = 3
- Append "3a" to result
- k = 3 - 3 = 0 (done with this group)
Processing Group 2 - 'b' with k=2:
- Since k=2 < 9, we take all 2
- x = min(9, 2) = 2
- Append "2b" to result
- k = 2 - 2 = 0 (done with this group)
Processing Group 3 - 'c' with k=5:
- Since k=5 < 9, we take all 5
- x = min(9, 5) = 5
- Append "5c" to result
- k = 5 - 5 = 0 (done with this group)
Step 3: Join results
- ans = ["3a", "2b", "5c"]
- Final result: "3a2b5c"
Example with splitting (exceeding 9 limit):
Let's also see word = "aaaaaaaaaaaa"
(12 'a's):
Processing 'a' with k=12:
- First iteration: x = min(9, 12) = 9
- Append "9a" to result
- k = 12 - 9 = 3
- Second iteration: x = min(9, 3) = 3
- Append "3a" to result
- k = 3 - 3 = 0 (done)
Final result: "9a3a"
This demonstrates how the algorithm handles both simple cases and cases where we need to split groups due to the 9-character limit.
Solution Implementation
1class Solution:
2 def compressedString(self, word: str) -> str:
3 """
4 Compress a string by counting consecutive characters.
5 Each group is represented as count + character, where count is at most 9.
6 If a group has more than 9 consecutive characters, split it into multiple groups.
7
8 Args:
9 word: Input string to compress
10
11 Returns:
12 Compressed string representation
13 """
14 from itertools import groupby
15
16 # Group consecutive identical characters together
17 grouped_chars = groupby(word)
18 result = []
19
20 # Process each group of consecutive characters
21 for character, group_iterator in grouped_chars:
22 # Count the number of consecutive occurrences
23 count = len(list(group_iterator))
24
25 # Split into chunks of at most 9 characters each
26 while count > 0:
27 # Take at most 9 characters for this chunk
28 chunk_size = min(9, count)
29 # Append the count followed by the character
30 result.append(str(chunk_size) + character)
31 # Reduce the remaining count
32 count -= chunk_size
33
34 # Join all parts into the final compressed string
35 return "".join(result)
36
1class Solution {
2 public String compressedString(String word) {
3 StringBuilder compressedResult = new StringBuilder();
4 int wordLength = word.length();
5
6 // Iterate through the word to find consecutive character groups
7 for (int currentIndex = 0; currentIndex < wordLength;) {
8 // Find the end of the current group of identical characters
9 int nextDifferentCharIndex = currentIndex + 1;
10 while (nextDifferentCharIndex < wordLength &&
11 word.charAt(nextDifferentCharIndex) == word.charAt(currentIndex)) {
12 nextDifferentCharIndex++;
13 }
14
15 // Calculate the count of consecutive identical characters
16 int consecutiveCount = nextDifferentCharIndex - currentIndex;
17
18 // Handle counts greater than 9 by splitting into multiple segments
19 // Each segment can have a maximum count of 9
20 while (consecutiveCount > 0) {
21 int segmentCount = Math.min(9, consecutiveCount);
22 compressedResult.append(segmentCount).append(word.charAt(currentIndex));
23 consecutiveCount -= segmentCount;
24 }
25
26 // Move to the next group of characters
27 currentIndex = nextDifferentCharIndex;
28 }
29
30 return compressedResult.toString();
31 }
32}
33
1class Solution {
2public:
3 string compressedString(string word) {
4 string result;
5 int wordLength = word.length();
6
7 // Iterate through the string to find consecutive character groups
8 for (int currentIndex = 0; currentIndex < wordLength;) {
9 // Find the end of the current group of same characters
10 int nextIndex = currentIndex + 1;
11 while (nextIndex < wordLength && word[nextIndex] == word[currentIndex]) {
12 ++nextIndex;
13 }
14
15 // Calculate the count of consecutive same characters
16 int consecutiveCount = nextIndex - currentIndex;
17
18 // Compress the group by breaking it into chunks of maximum 9
19 // (since we can only use single digit 1-9 for count representation)
20 while (consecutiveCount > 0) {
21 // Take at most 9 characters at a time
22 int chunkSize = min(9, consecutiveCount);
23
24 // Append the count (as a character) and the character itself
25 result.push_back('0' + chunkSize); // Convert digit to character
26 result.push_back(word[currentIndex]);
27
28 // Update remaining count
29 consecutiveCount -= chunkSize;
30 }
31
32 // Move to the next group of characters
33 currentIndex = nextIndex;
34 }
35
36 return result;
37 }
38};
39
1/**
2 * Compresses a string by replacing consecutive identical characters with count-character pairs.
3 * Each count is limited to a maximum of 9, so longer sequences are split into multiple pairs.
4 * @param word - The input string to compress
5 * @returns The compressed string representation
6 */
7function compressedString(word: string): string {
8 // Array to store the compressed result pieces
9 const result: string[] = [];
10 const length: number = word.length;
11
12 // Iterate through the string
13 for (let currentIndex: number = 0; currentIndex < length; ) {
14 // Find the end of the current sequence of identical characters
15 let nextIndex: number = currentIndex + 1;
16 while (nextIndex < length && word[nextIndex] === word[currentIndex]) {
17 nextIndex++;
18 }
19
20 // Calculate the count of consecutive identical characters
21 let consecutiveCount: number = nextIndex - currentIndex;
22
23 // Split the count into chunks of maximum 9 and add to result
24 while (consecutiveCount > 0) {
25 // Take at most 9 characters at a time
26 const chunkSize: number = Math.min(consecutiveCount, 9);
27 // Add the count followed by the character
28 result.push(chunkSize + word[currentIndex]);
29 consecutiveCount -= chunkSize;
30 }
31
32 // Move to the next different character
33 currentIndex = nextIndex;
34 }
35
36 // Join all pieces to form the final compressed string
37 return result.join('');
38}
39
Time and Space Complexity
The time complexity is O(n)
, where n
is the length of the input string word
. This is because:
- The
groupby
function iterates through the string once to create groups of consecutive identical characters, which takesO(n)
time - Converting the grouped values to a list with
len(list(v))
consumes each group exactly once across all iterations - The while loop inside processes each character in the original string exactly once (distributing them into chunks of at most 9)
- String concatenation operations in the loop are
O(1)
for each append operation - The final
"".join(ans)
operation isO(n)
since it needs to concatenate all characters in the result
The space complexity is O(n)
, where n
is the length of the input string word
. This is because:
- The
ans
list stores the compressed representation, which in the worst case (when no consecutive characters exist or when characters repeat more than 9 times) can be proportional to the input size - The
groupby
iterator and the intermediate list created bylist(v)
use space proportional to the size of each group, which sums up toO(n)
across all groups - The final joined string also requires
O(n)
space
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Forgetting the 9-Character Limit
One of the most common mistakes is overlooking the constraint that each compression unit can represent at most 9 consecutive characters. A naive implementation might simply count all consecutive characters and write them directly:
Incorrect Implementation:
def compressedString(self, word: str) -> str:
result = []
i = 0
while i < len(word):
char = word[i]
count = 1
while i + 1 < len(word) and word[i + 1] == char:
count += 1
i += 1
result.append(str(count) + char) # Wrong! count could be > 9
i += 1
return "".join(result)
Why it fails: For input like "aaaaaaaaaaaaa"
(13 a's), this would produce "13a"
instead of the correct "9a4a"
.
Solution: Always split counts greater than 9 into multiple chunks:
while count > 0:
chunk_size = min(9, count)
result.append(str(chunk_size) + char)
count -= chunk_size
2. Inefficient Manual Character Counting
Another pitfall is implementing character counting manually with nested loops, which can lead to off-by-one errors and complexity:
Error-Prone Implementation:
def compressedString(self, word: str) -> str:
result = []
i = 0
while i < len(word):
j = i
while j < len(word) and word[j] == word[i]:
j += 1
count = j - i # Easy to miscalculate
# ... rest of logic
Common bugs in manual counting:
- Forgetting to handle the last character group
- Index out of bounds when checking
word[j]
- Incorrectly updating the loop counter
Solution: Use Python's itertools.groupby()
which handles the grouping logic cleanly and avoids index manipulation errors. Alternatively, if implementing manually, be careful with boundary conditions and test with edge cases like single characters and strings ending with different patterns.
3. String Concatenation Performance Issue
Building the result string through repeated concatenation can be inefficient:
Inefficient approach:
result = ""
for character, group in groupby(word):
count = len(list(group))
while count > 0:
chunk = min(9, count)
result += str(chunk) + character # String concatenation in loop
count -= chunk
Why it's problematic: Strings are immutable in Python, so each concatenation creates a new string object, leading to O(n²) time complexity in worst case.
Solution: Use a list to collect parts and join them at the end:
result = [] # ... append to result list return "".join(result) # Single join operation at the end
A person thinks of a number between 1 and 1000. You may ask any number questions to them, provided that the question can be answered with either "yes" or "no".
What is the minimum number of questions you needed to ask so that you are guaranteed to know the number that the person is thinking?
Recommended Readings
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Runtime Overview When learning about algorithms and data structures you'll frequently encounter the term time complexity This concept is fundamental in computer science and offers insights into how long an algorithm takes to complete given a certain input size What is Time Complexity Time complexity represents the amount of time
Want a Structured Path to Master System Design Too? Don’t Miss This!