Facebook Pixel

443. String Compression

Problem Description

You are given an array of characters chars that needs to be compressed in-place using a specific algorithm.

The compression algorithm works as follows:

  • Process the array to identify groups of consecutive repeating characters
  • For each group:
    • If the group contains only 1 character, just keep that character
    • If the group contains multiple repeating characters, replace them with the character followed by the count of repetitions

For example:

  • ['a','a','b','b','c','c','c'] becomes ['a','2','b','2','c','3']
  • ['a'] remains ['a']
  • ['a','b','b','b','b','b','b','b','b','b','b','b','b'] becomes ['a','b','1','2'] (note that 12 is split into '1' and '2')

Key requirements:

  1. In-place modification: You must modify the original chars array directly, not create a new array
  2. Constant extra space: The algorithm must use O(1) additional space
  3. Return the new length: After compression, return the length of the compressed array
  4. Multi-digit counts: If a group has 10 or more repeating characters, the count will occupy multiple positions in the array (e.g., 12 becomes '1', '2')

The characters in the array beyond the returned length can contain any values and should be ignored. Only the first k characters (where k is the returned length) represent the compressed result.

Quick Interview Experience
Help others by sharing your interview experience
Have you seen this problem before?

Intuition

The key insight is that we need to perform the compression in-place while reading and writing to the same array. Since we're compressing (reducing consecutive characters to a character and count), the write position will never overtake the read position - this makes in-place modification safe.

Think of it like having two pointers moving through the array:

  • A "read" pointer that scans through the original characters
  • A "write" pointer that places the compressed result

For each group of identical characters, we need to:

  1. Find where the group ends
  2. Write the character once
  3. If there are multiple occurrences, write the count as individual digit characters

The challenge is managing these pointers correctly. We use:

  • i as our read pointer (where we're currently reading from)
  • k as our write pointer (where we're placing the compressed result)
  • j to scout ahead and find where the current group ends

The process resembles a two-pass approach compressed into one:

  • First "pass": Count consecutive characters (using j to scan ahead from i)
  • Second "pass": Write the compressed form (using k to track write position)

Since j - i gives us the count of consecutive characters, we can easily determine:

  • If j - i = 1: Single character, just write it
  • If j - i > 1: Multiple characters, write the character and then the count

For multi-digit counts (like 12), we convert the number to a string and write each digit separately. This handles the requirement that counts of 10 or more occupy multiple array positions.

The beauty of this approach is that compression always produces equal or fewer characters than the original, so we never risk overwriting data we haven't read yet.

Solution Approach

The implementation uses a two-pointer technique to compress the array in-place:

Algorithm Steps:

  1. Initialize pointers:

    • i = 0: Current position being read (start of current group)
    • k = 0: Position where we write the compressed result
    • n = len(chars): Total length of the array
  2. Main loop - Process each group of consecutive characters:

    while i < n:
  3. Find the end of current group:

    j = i + 1
    while j < n and chars[j] == chars[i]:
        j += 1
    • Start j at i + 1
    • Keep advancing j while characters match chars[i]
    • When loop ends, j points to the first different character (or end of array)
    • The count of consecutive characters is j - i
  4. Write the character:

    chars[k] = chars[i]
    k += 1
    • Always write the character itself first
    • Increment write pointer k
  5. Write the count if needed:

    if j - i > 1:
        cnt = str(j - i)
        for c in cnt:
            chars[k] = c
            k += 1
    • Only write count if group has more than 1 character
    • Convert count to string to handle multi-digit numbers
    • Write each digit as a separate character
    • Increment k for each digit written
  6. Move to next group:

    i = j
    • Jump i to where j ended (start of next group)
  7. Return the compressed length:

    return k
    • k now points to the position after the last compressed character
    • This is the length of the compressed array

Example Walkthrough:

For chars = ['a','a','b','b','c','c','c']:

  • Round 1: i=0, finds 2 'a's (j=2), writes 'a','2' at positions 0,1, k=2
  • Round 2: i=2, finds 2 'b's (j=4), writes 'b','2' at positions 2,3, k=4
  • Round 3: i=4, finds 3 'c's (j=7), writes 'c','3' at positions 4,5, k=6
  • Returns 6

The array becomes ['a','2','b','2','c','3','c'] with length 6 (last 'c' is ignored).

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the compression algorithm with chars = ['a','a','a','b','b','c']:

Initial Setup:

  • Array: ['a','a','a','b','b','c']
  • Pointers: i = 0, k = 0, n = 6

Round 1: Process group of 'a's

  • i = 0 (reading from index 0)
  • Find end of group: j starts at 1, advances while chars[j] == 'a'
    • j = 1: chars[1] = 'a' βœ“ continue
    • j = 2: chars[2] = 'a' βœ“ continue
    • j = 3: chars[3] = 'b' βœ— stop
  • Group size: j - i = 3 - 0 = 3
  • Write compressed form:
    • chars[0] = 'a', k = 1
    • Since count > 1, write '3': chars[1] = '3', k = 2
  • Move to next group: i = 3
  • Array now: ['a','3','a','b','b','c']

Round 2: Process group of 'b's

  • i = 3 (reading from index 3)
  • Find end of group: j starts at 4
    • j = 4: chars[4] = 'b' βœ“ continue
    • j = 5: chars[5] = 'c' βœ— stop
  • Group size: j - i = 5 - 3 = 2
  • Write compressed form:
    • chars[2] = 'b', k = 3
    • Since count > 1, write '2': chars[3] = '2', k = 4
  • Move to next group: i = 5
  • Array now: ['a','3','b','2','b','c']

Round 3: Process single 'c'

  • i = 5 (reading from index 5)
  • Find end of group: j starts at 6
    • j = 6: reached end of array, stop
  • Group size: j - i = 6 - 5 = 1
  • Write compressed form:
    • chars[4] = 'c', k = 5
    • Since count = 1, don't write count
  • Move to next group: i = 6
  • Array now: ['a','3','b','2','c','c']

Final Result:

  • Loop ends since i = 6 >= n
  • Return k = 5
  • Compressed array (first 5 elements): ['a','3','b','2','c']
  • Original: 6 characters β†’ Compressed: 5 characters

The algorithm successfully compresses consecutive characters while working in-place, never overwriting unread data since the write pointer (k) never overtakes the read pointer (i).

Solution Implementation

1class Solution:
2    def compress(self, chars: List[str]) -> int:
3        """
4        Compress a character array in-place using run-length encoding.
5      
6        Args:
7            chars: List of characters to compress
8          
9        Returns:
10            The new length of the compressed array
11        """
12        # Initialize pointers and get array length
13        read_index = 0  # Pointer to read characters
14        write_index = 0  # Pointer to write compressed result
15        array_length = len(chars)
16      
17        # Process all characters in the array
18        while read_index < array_length:
19            # Find the end of the current character group
20            group_end = read_index + 1
21            while group_end < array_length and chars[group_end] == chars[read_index]:
22                group_end += 1
23          
24            # Write the character to the compressed position
25            chars[write_index] = chars[read_index]
26            write_index += 1
27          
28            # Calculate the count of consecutive characters
29            count = group_end - read_index
30          
31            # If count > 1, write the count digits
32            if count > 1:
33                count_str = str(count)
34                for digit in count_str:
35                    chars[write_index] = digit
36                    write_index += 1
37          
38            # Move to the next group of characters
39            read_index = group_end
40      
41        # Return the length of the compressed array
42        return write_index
43
1class Solution {
2    public int compress(char[] chars) {
3        // writeIndex: position to write the compressed result
4        // arrayLength: total length of the input array
5        int writeIndex = 0;
6        int arrayLength = chars.length;
7      
8        // Use two pointers to traverse the array
9        // currentIndex: start of current group of same characters
10        // nextIndex: points to the next different character
11        int currentIndex = 0;
12      
13        while (currentIndex < arrayLength) {
14            // Find the end of current group of identical characters
15            int nextIndex = currentIndex + 1;
16            while (nextIndex < arrayLength && chars[nextIndex] == chars[currentIndex]) {
17                nextIndex++;
18            }
19          
20            // Write the character to the result
21            chars[writeIndex++] = chars[currentIndex];
22          
23            // Calculate the count of identical characters
24            int count = nextIndex - currentIndex;
25          
26            // If count > 1, write the count as string digits
27            if (count > 1) {
28                String countString = String.valueOf(count);
29                for (char digit : countString.toCharArray()) {
30                    chars[writeIndex++] = digit;
31                }
32            }
33          
34            // Move to the next group of characters
35            currentIndex = nextIndex;
36        }
37      
38        // Return the length of the compressed array
39        return writeIndex;
40    }
41}
42
1class Solution {
2public:
3    int compress(vector<char>& chars) {
4        int writeIndex = 0;  // Index for writing compressed characters
5        int arrayLength = chars.size();
6      
7        // Iterate through the array using two pointers
8        for (int readIndex = 0, nextIndex = readIndex + 1; readIndex < arrayLength;) {
9            // Find the end of the current group of consecutive characters
10            while (nextIndex < arrayLength && chars[nextIndex] == chars[readIndex]) {
11                ++nextIndex;
12            }
13          
14            // Write the character to the compressed position
15            chars[writeIndex++] = chars[readIndex];
16          
17            // If the group has more than one character, write the count
18            int groupSize = nextIndex - readIndex;
19            if (groupSize > 1) {
20                // Convert the count to string and write each digit
21                string countStr = to_string(groupSize);
22                for (char digit : countStr) {
23                    chars[writeIndex++] = digit;
24                }
25            }
26          
27            // Move to the next group
28            readIndex = nextIndex;
29        }
30      
31        // Return the new length of the compressed array
32        return writeIndex;
33    }
34};
35
1function compress(chars: string[]): number {
2    let writeIndex = 0;  // Index for writing compressed characters
3    const arrayLength = chars.length;
4  
5    // Iterate through the array using two pointers
6    let readIndex = 0;
7    while (readIndex < arrayLength) {
8        let nextIndex = readIndex + 1;
9      
10        // Find the end of the current group of consecutive characters
11        while (nextIndex < arrayLength && chars[nextIndex] === chars[readIndex]) {
12            nextIndex++;
13        }
14      
15        // Write the character to the compressed position
16        chars[writeIndex++] = chars[readIndex];
17      
18        // Calculate the size of the current group
19        const groupSize = nextIndex - readIndex;
20      
21        // If the group has more than one character, write the count
22        if (groupSize > 1) {
23            // Convert the count to string and write each digit
24            const countStr = groupSize.toString();
25            for (const digit of countStr) {
26                chars[writeIndex++] = digit;
27            }
28        }
29      
30        // Move to the next group
31        readIndex = nextIndex;
32    }
33  
34    // Return the new length of the compressed array
35    return writeIndex;
36}
37

Time and Space Complexity

Time Complexity: O(n) where n is the length of the input array chars.

The algorithm uses a two-pointer approach with a single pass through the array:

  • The outer while loop iterates through each group of consecutive characters using pointer i
  • The inner while loop (with pointer j) finds the end of each group of identical characters
  • Although there are nested loops, each element is visited at most twice: once by pointer i and once by pointer j
  • Writing the compressed result (character and count) takes O(1) time for the character and O(log(count)) time for converting and writing the count digits, but since the count is bounded by n, this is at most O(log n) operations
  • Since O(log n) < O(n) for the count writing, and each element is processed a constant number of times, the overall time complexity is O(n)

Space Complexity: O(1)

The algorithm modifies the input array in-place and uses only a constant amount of extra space:

  • Variables i, j, k, and n use O(1) space
  • The string cnt stores the count of consecutive characters, which can be at most O(log n) digits (for a count up to n), but since log n is much smaller than n and bounded, this is effectively O(1) auxiliary space
  • No additional data structures are used
  • The modification happens directly in the input array, so no extra space proportional to the input size is required

Common Pitfalls

1. Incorrectly Handling Multi-Digit Counts

Pitfall: A common mistake is trying to write the count as a single character or not properly handling counts β‰₯ 10.

Incorrect approach:

if count > 1:
    chars[write_index] = str(count)  # Wrong! This tries to assign a string to a single position
    write_index += 1

Why it fails: When count is 12, str(12) is "12" which is a string of length 2. You cannot assign a multi-character string to a single array position.

Correct solution:

if count > 1:
    count_str = str(count)
    for digit in count_str:  # Write each digit separately
        chars[write_index] = digit
        write_index += 1

2. Creating a New Array Instead of In-Place Modification

Pitfall: Building a new result array or string and then copying it back.

Incorrect approach:

result = []
i = 0
while i < len(chars):
    # ... count consecutive characters ...
    result.append(chars[i])
    if count > 1:
        result.extend(list(str(count)))
    i = j
# Wrong! This uses O(n) extra space
chars[:len(result)] = result
return len(result)

Why it fails: This violates the O(1) space constraint by creating an additional array.

Correct solution: Use the two-pointer technique to overwrite the original array directly as shown in the main solution.

3. Off-by-One Errors in Group Counting

Pitfall: Incorrectly calculating the count of consecutive characters.

Incorrect approach:

j = i
while j < n and chars[j] == chars[i]:
    j += 1
count = j - i - 1  # Wrong! This undercounts by 1

Why it fails: If i=0 and we have two 'a's, j will be 2, so count should be 2-0=2, not 1.

Correct solution:

j = i + 1  # Start j at the next position
while j < n and chars[j] == chars[i]:
    j += 1
count = j - i  # Correct count

4. Forgetting to Handle Single Character Groups

Pitfall: Always writing the count, even when it's 1.

Incorrect approach:

chars[write_index] = chars[read_index]
write_index += 1
# Always write count (Wrong!)
count_str = str(count)
for digit in count_str:
    chars[write_index] = digit
    write_index += 1

Why it fails: Single characters should not have '1' written after them. ['a','b','c'] should remain ['a','b','c'], not become ['a','1','b','1','c','1'].

Correct solution: Only write the count when it's greater than 1:

if count > 1:
    count_str = str(count)
    for digit in count_str:
        chars[write_index] = digit
        write_index += 1
Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

In a binary min heap, the minimum element can be found in:


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!

Load More