3163. String Compression III

MediumString

Problem Description

You are given a string word that you need to compress using a specific algorithm.

The compression algorithm works as follows:

Start with an empty result string called comp.
While the input string word is not empty, repeatedly perform these operations:
- Find the longest prefix of word that consists of the same character repeated (e.g., "aaaa" or "bbb")
- This prefix can be at most 9 characters long
- Remove this prefix from word
- Add the length of the prefix followed by the character to comp

For example:

If word = "aaaaaaaaaaabb", the first prefix would be "aaaaaaaaa" (9 'a's), which gets compressed to "9a"
The remaining "aaa" would become "3a"
The "bb" would become "2b"
Final result: "9a3a2b"

The key constraint is that each compression unit can represent at most 9 consecutive occurrences of a character. If a character appears more than 9 times consecutively, you must split it into multiple compression units.

Your task is to implement this compression algorithm and return the compressed string comp.

Quick Interview Experience

Help others by sharing your interview experience

Have you seen this problem before?

Intuition

When looking at this compression problem, we need to identify groups of consecutive identical characters and count them. The natural approach is to traverse the string and group consecutive occurrences of the same character together.

The main challenge is handling the constraint that each group can represent at most 9 characters. If we have more than 9 consecutive identical characters, we need to split them into multiple groups of 9 or less.

Think of it like packing items into boxes where each box can hold at most 9 items of the same type. If you have 15 identical items, you'd fill one box with 9 items and another with 6 items. Similarly, if we encounter "aaaaaaaaaaaaaaa" (15 'a's), we'd compress it as "9a6a".

The solution uses Python's groupby function which automatically groups consecutive identical elements together. For each group:

Count the total occurrences (k)
Break down k into chunks of at most 9
For each chunk, append the count and character to our result

For example, if we have 23 consecutive 'a's:

First chunk: min(9, 23) = 9, append "9a", remaining = 14
Second chunk: min(9, 14) = 9, append "9a", remaining = 5
Third chunk: min(9, 5) = 5, append "5a", remaining = 0

This greedy approach of always taking the maximum possible (up to 9) ensures we get the correct compression while respecting the constraint.

Solution Implementation

1class Solution:
2    def compressedString(self, word: str) -> str:
3        """
4        Compress a string by counting consecutive characters.
5        Each group is represented as count + character, where count is at most 9.
6        If a group has more than 9 consecutive characters, split it into multiple groups.
7      
8        Args:
9            word: Input string to compress
10          
11        Returns:
12            Compressed string representation
13        """
14        from itertools import groupby
15      
16        # Group consecutive identical characters together
17        grouped_chars = groupby(word)
18        result = []
19      
20        # Process each group of consecutive characters
21        for character, group_iterator in grouped_chars:
22            # Count the number of consecutive occurrences
23            count = len(list(group_iterator))
24          
25            # Split into chunks of at most 9 characters each
26            while count > 0:
27                # Take at most 9 characters for this chunk
28                chunk_size = min(9, count)
29                # Append the count followed by the character
30                result.append(str(chunk_size) + character)
31                # Reduce the remaining count
32                count -= chunk_size
33      
34        # Join all parts into the final compressed string
35        return "".join(result)
36

1class Solution {
2    public String compressedString(String word) {
3        StringBuilder compressedResult = new StringBuilder();
4        int wordLength = word.length();
5      
6        // Iterate through the word to find consecutive character groups
7        for (int currentIndex = 0; currentIndex < wordLength;) {
8            // Find the end of the current group of identical characters
9            int nextDifferentCharIndex = currentIndex + 1;
10            while (nextDifferentCharIndex < wordLength && 
11                   word.charAt(nextDifferentCharIndex) == word.charAt(currentIndex)) {
12                nextDifferentCharIndex++;
13            }
14          
15            // Calculate the count of consecutive identical characters
16            int consecutiveCount = nextDifferentCharIndex - currentIndex;
17          
18            // Handle counts greater than 9 by splitting into multiple segments
19            // Each segment can have a maximum count of 9
20            while (consecutiveCount > 0) {
21                int segmentCount = Math.min(9, consecutiveCount);
22                compressedResult.append(segmentCount).append(word.charAt(currentIndex));
23                consecutiveCount -= segmentCount;
24            }
25          
26            // Move to the next group of characters
27            currentIndex = nextDifferentCharIndex;
28        }
29      
30        return compressedResult.toString();
31    }
32}
33

1class Solution {
2public:
3    string compressedString(string word) {
4        string result;
5        int wordLength = word.length();
6      
7        // Iterate through the string to find consecutive character groups
8        for (int currentIndex = 0; currentIndex < wordLength;) {
9            // Find the end of the current group of same characters
10            int nextIndex = currentIndex + 1;
11            while (nextIndex < wordLength && word[nextIndex] == word[currentIndex]) {
12                ++nextIndex;
13            }
14          
15            // Calculate the count of consecutive same characters
16            int consecutiveCount = nextIndex - currentIndex;
17          
18            // Compress the group by breaking it into chunks of maximum 9
19            // (since we can only use single digit 1-9 for count representation)
20            while (consecutiveCount > 0) {
21                // Take at most 9 characters at a time
22                int chunkSize = min(9, consecutiveCount);
23              
24                // Append the count (as a character) and the character itself
25                result.push_back('0' + chunkSize);  // Convert digit to character
26                result.push_back(word[currentIndex]);
27              
28                // Update remaining count
29                consecutiveCount -= chunkSize;
30            }
31          
32            // Move to the next group of characters
33            currentIndex = nextIndex;
34        }
35      
36        return result;
37    }
38};
39

1/**
2 * Compresses a string by replacing consecutive identical characters with count-character pairs.
3 * Each count is limited to a maximum of 9, so longer sequences are split into multiple pairs.
4 * @param word - The input string to compress
5 * @returns The compressed string representation
6 */
7function compressedString(word: string): string {
8    // Array to store the compressed result pieces
9    const result: string[] = [];
10    const length: number = word.length;
11  
12    // Iterate through the string
13    for (let currentIndex: number = 0; currentIndex < length; ) {
14        // Find the end of the current sequence of identical characters
15        let nextIndex: number = currentIndex + 1;
16        while (nextIndex < length && word[nextIndex] === word[currentIndex]) {
17            nextIndex++;
18        }
19      
20        // Calculate the count of consecutive identical characters
21        let consecutiveCount: number = nextIndex - currentIndex;
22      
23        // Split the count into chunks of maximum 9 and add to result
24        while (consecutiveCount > 0) {
25            // Take at most 9 characters at a time
26            const chunkSize: number = Math.min(consecutiveCount, 9);
27            // Add the count followed by the character
28            result.push(chunkSize + word[currentIndex]);
29            consecutiveCount -= chunkSize;
30        }
31      
32        // Move to the next different character
33        currentIndex = nextIndex;
34    }
35  
36    // Join all pieces to form the final compressed string
37    return result.join('');
38}
39

Solution Approach

The solution uses a two-pointer technique combined with Python's groupby function to efficiently compress the string.

Step-by-step implementation:

Group consecutive characters: Use groupby(word) to automatically group consecutive identical characters. This function returns an iterator of (key, group) pairs where the key is the character and group contains all consecutive occurrences.
Process each group: For each character group (c, v):
- Convert the group iterator v to a list and get its length k to count consecutive occurrences
- Example: If we have "aaaaa", k = 5

Handle the 9-character limit: Since each compression unit can represent at most 9 characters, we use a while loop to break down k:

while k:
    x = min(9, k)  # Take at most 9 characters
    ans.append(str(x) + c)  # Add "count + character" to result
    k -= x  # Reduce remaining count

This greedy approach ensures we always take the maximum allowed (up to 9) in each iteration.

Build the result: Each iteration appends a string in the format "[count][character]" to the ans list. For example:
- 15 'a's → generates "9a" then "6a"
- 3 'b's → generates "3b"
Join and return: Finally, join all compression units with "".join(ans) to create the final compressed string.

Example walkthrough:

Input: "aaaaaaaaaaaabb"
After groupby: [('a', 13 a's), ('b', 2 b's)]
Processing 'a' group (k=13):
- First iteration: x=9, append "9a", k=4
- Second iteration: x=4, append "4a", k=0
Processing 'b' group (k=2):
- Single iteration: x=2, append "2b", k=0
Result: "9a4a2b"

The time complexity is O(n) where n is the length of the input string, as we traverse each character once. The space complexity is O(n) for storing the result.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the compression of word = "aaabbccccc" step by step:

Step 1: Group consecutive characters Using groupby, we get:

Group 1: ('a', ['a', 'a', 'a']) → 3 'a's
Group 2: ('b', ['b', 'b']) → 2 'b's
Group 3: ('c', ['c', 'c', 'c', 'c', 'c']) → 5 'c's

Step 2: Process each group

Processing Group 1 - 'a' with k=3:

Since k=3 < 9, we take all 3
x = min(9, 3) = 3
Append "3a" to result
k = 3 - 3 = 0 (done with this group)

Processing Group 2 - 'b' with k=2:

Since k=2 < 9, we take all 2
x = min(9, 2) = 2
Append "2b" to result
k = 2 - 2 = 0 (done with this group)

Processing Group 3 - 'c' with k=5:

Since k=5 < 9, we take all 5
x = min(9, 5) = 5
Append "5c" to result
k = 5 - 5 = 0 (done with this group)

Step 3: Join results

ans = ["3a", "2b", "5c"]
Final result: "3a2b5c"

Example with splitting (exceeding 9 limit): Let's also see word = "aaaaaaaaaaaa" (12 'a's):

Processing 'a' with k=12:

First iteration: x = min(9, 12) = 9
- Append "9a" to result
- k = 12 - 9 = 3
Second iteration: x = min(9, 3) = 3
- Append "3a" to result
- k = 3 - 3 = 0 (done)

Final result: "9a3a"

This demonstrates how the algorithm handles both simple cases and cases where we need to split groups due to the 9-character limit.

Time and Space Complexity

The time complexity is O(n), where n is the length of the input string word. This is because:

The groupby function iterates through the string once to create groups of consecutive identical characters, which takes O(n) time
Converting the grouped values to a list with len(list(v)) consumes each group exactly once across all iterations
The while loop inside processes each character in the original string exactly once (distributing them into chunks of at most 9)
String concatenation operations in the loop are O(1) for each append operation
The final "".join(ans) operation is O(n) since it needs to concatenate all characters in the result

The space complexity is O(n), where n is the length of the input string word. This is because:

The ans list stores the compressed representation, which in the worst case (when no consecutive characters exist or when characters repeat more than 9 times) can be proportional to the input size
The groupby iterator and the intermediate list created by list(v) use space proportional to the size of each group, which sums up to O(n) across all groups
The final joined string also requires O(n) space

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Forgetting the 9-Character Limit

One of the most common mistakes is overlooking the constraint that each compression unit can represent at most 9 consecutive characters. A naive implementation might simply count all consecutive characters and write them directly:

Incorrect Implementation:

def compressedString(self, word: str) -> str:
    result = []
    i = 0
    while i < len(word):
        char = word[i]
        count = 1
        while i + 1 < len(word) and word[i + 1] == char:
            count += 1
            i += 1
        result.append(str(count) + char)  # Wrong! count could be > 9
        i += 1
    return "".join(result)

Why it fails: For input like "aaaaaaaaaaaaa" (13 a's), this would produce "13a" instead of the correct "9a4a".

Solution: Always split counts greater than 9 into multiple chunks:

while count > 0:
    chunk_size = min(9, count)
    result.append(str(chunk_size) + char)
    count -= chunk_size

2. Inefficient Manual Character Counting

Another pitfall is implementing character counting manually with nested loops, which can lead to off-by-one errors and complexity:

Error-Prone Implementation:

def compressedString(self, word: str) -> str:
    result = []
    i = 0
    while i < len(word):
        j = i
        while j < len(word) and word[j] == word[i]:
            j += 1
        count = j - i  # Easy to miscalculate
        # ... rest of logic

Common bugs in manual counting:

Forgetting to handle the last character group
Index out of bounds when checking word[j]
Incorrectly updating the loop counter

Solution: Use Python's itertools.groupby() which handles the grouping logic cleanly and avoids index manipulation errors. Alternatively, if implementing manually, be careful with boundary conditions and test with edge cases like single characters and strings ending with different patterns.

3. String Concatenation Performance Issue

Building the result string through repeated concatenation can be inefficient:

Inefficient approach:

result = ""
for character, group in groupby(word):
    count = len(list(group))
    while count > 0:
        chunk = min(9, count)
        result += str(chunk) + character  # String concatenation in loop
        count -= chunk

Why it's problematic: Strings are immutable in Python, so each concatenation creates a new string object, leading to O(n²) time complexity in worst case.

Solution: Use a list to collect parts and join them at the end:

result = []
# ... append to result list
return "".join(result)  # Single join operation at the end

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan: