443. String Compression
Problem Description
You are given an array of characters chars
that needs to be compressed in-place using a specific algorithm.
The compression algorithm works as follows:
- Process the array to identify groups of consecutive repeating characters
- For each group:
- If the group contains only 1 character, just keep that character
- If the group contains multiple repeating characters, replace them with the character followed by the count of repetitions
For example:
['a','a','b','b','c','c','c']
becomes['a','2','b','2','c','3']
['a']
remains['a']
['a','b','b','b','b','b','b','b','b','b','b','b','b']
becomes['a','b','1','2']
(note that 12 is split into '1' and '2')
Key requirements:
- In-place modification: You must modify the original
chars
array directly, not create a new array - Constant extra space: The algorithm must use O(1) additional space
- Return the new length: After compression, return the length of the compressed array
- Multi-digit counts: If a group has 10 or more repeating characters, the count will occupy multiple positions in the array (e.g., 12 becomes '1', '2')
The characters in the array beyond the returned length can contain any values and should be ignored. Only the first k characters (where k is the returned length) represent the compressed result.
Intuition
The key insight is that we need to perform the compression in-place while reading and writing to the same array. Since we're compressing (reducing consecutive characters to a character and count), the write position will never overtake the read position - this makes in-place modification safe.
Think of it like having two pointers moving through the array:
- A "read" pointer that scans through the original characters
- A "write" pointer that places the compressed result
For each group of identical characters, we need to:
- Find where the group ends
- Write the character once
- If there are multiple occurrences, write the count as individual digit characters
The challenge is managing these pointers correctly. We use:
i
as our read pointer (where we're currently reading from)k
as our write pointer (where we're placing the compressed result)j
to scout ahead and find where the current group ends
The process resembles a two-pass approach compressed into one:
- First "pass": Count consecutive characters (using
j
to scan ahead fromi
) - Second "pass": Write the compressed form (using
k
to track write position)
Since j - i
gives us the count of consecutive characters, we can easily determine:
- If
j - i = 1
: Single character, just write it - If
j - i > 1
: Multiple characters, write the character and then the count
For multi-digit counts (like 12), we convert the number to a string and write each digit separately. This handles the requirement that counts of 10 or more occupy multiple array positions.
The beauty of this approach is that compression always produces equal or fewer characters than the original, so we never risk overwriting data we haven't read yet.
Solution Approach
The implementation uses a two-pointer technique to compress the array in-place:
Algorithm Steps:
-
Initialize pointers:
i = 0
: Current position being read (start of current group)k = 0
: Position where we write the compressed resultn = len(chars)
: Total length of the array
-
Main loop - Process each group of consecutive characters:
while i < n:
-
Find the end of current group:
j = i + 1 while j < n and chars[j] == chars[i]: j += 1
- Start
j
ati + 1
- Keep advancing
j
while characters matchchars[i]
- When loop ends,
j
points to the first different character (or end of array) - The count of consecutive characters is
j - i
- Start
-
Write the character:
chars[k] = chars[i] k += 1
- Always write the character itself first
- Increment write pointer
k
-
Write the count if needed:
if j - i > 1: cnt = str(j - i) for c in cnt: chars[k] = c k += 1
- Only write count if group has more than 1 character
- Convert count to string to handle multi-digit numbers
- Write each digit as a separate character
- Increment
k
for each digit written
-
Move to next group:
i = j
- Jump
i
to wherej
ended (start of next group)
- Jump
-
Return the compressed length:
return k
k
now points to the position after the last compressed character- This is the length of the compressed array
Example Walkthrough:
For chars = ['a','a','b','b','c','c','c']
:
- Round 1:
i=0
, finds 2 'a's (j=2
), writes 'a','2' at positions 0,1,k=2
- Round 2:
i=2
, finds 2 'b's (j=4
), writes 'b','2' at positions 2,3,k=4
- Round 3:
i=4
, finds 3 'c's (j=7
), writes 'c','3' at positions 4,5,k=6
- Returns 6
The array becomes ['a','2','b','2','c','3','c']
with length 6 (last 'c' is ignored).
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the compression algorithm with chars = ['a','a','a','b','b','c']
:
Initial Setup:
- Array:
['a','a','a','b','b','c']
- Pointers:
i = 0
,k = 0
,n = 6
Round 1: Process group of 'a's
i = 0
(reading from index 0)- Find end of group:
j
starts at 1, advances whilechars[j] == 'a'
j = 1
:chars[1] = 'a'
β continuej = 2
:chars[2] = 'a'
β continuej = 3
:chars[3] = 'b'
β stop
- Group size:
j - i = 3 - 0 = 3
- Write compressed form:
chars[0] = 'a'
,k = 1
- Since count > 1, write '3':
chars[1] = '3'
,k = 2
- Move to next group:
i = 3
- Array now:
['a','3','a','b','b','c']
Round 2: Process group of 'b's
i = 3
(reading from index 3)- Find end of group:
j
starts at 4j = 4
:chars[4] = 'b'
β continuej = 5
:chars[5] = 'c'
β stop
- Group size:
j - i = 5 - 3 = 2
- Write compressed form:
chars[2] = 'b'
,k = 3
- Since count > 1, write '2':
chars[3] = '2'
,k = 4
- Move to next group:
i = 5
- Array now:
['a','3','b','2','b','c']
Round 3: Process single 'c'
i = 5
(reading from index 5)- Find end of group:
j
starts at 6j = 6
: reached end of array, stop
- Group size:
j - i = 6 - 5 = 1
- Write compressed form:
chars[4] = 'c'
,k = 5
- Since count = 1, don't write count
- Move to next group:
i = 6
- Array now:
['a','3','b','2','c','c']
Final Result:
- Loop ends since
i = 6 >= n
- Return
k = 5
- Compressed array (first 5 elements):
['a','3','b','2','c']
- Original: 6 characters β Compressed: 5 characters
The algorithm successfully compresses consecutive characters while working in-place, never overwriting unread data since the write pointer (k
) never overtakes the read pointer (i
).
Solution Implementation
1class Solution:
2 def compress(self, chars: List[str]) -> int:
3 """
4 Compress a character array in-place using run-length encoding.
5
6 Args:
7 chars: List of characters to compress
8
9 Returns:
10 The new length of the compressed array
11 """
12 # Initialize pointers and get array length
13 read_index = 0 # Pointer to read characters
14 write_index = 0 # Pointer to write compressed result
15 array_length = len(chars)
16
17 # Process all characters in the array
18 while read_index < array_length:
19 # Find the end of the current character group
20 group_end = read_index + 1
21 while group_end < array_length and chars[group_end] == chars[read_index]:
22 group_end += 1
23
24 # Write the character to the compressed position
25 chars[write_index] = chars[read_index]
26 write_index += 1
27
28 # Calculate the count of consecutive characters
29 count = group_end - read_index
30
31 # If count > 1, write the count digits
32 if count > 1:
33 count_str = str(count)
34 for digit in count_str:
35 chars[write_index] = digit
36 write_index += 1
37
38 # Move to the next group of characters
39 read_index = group_end
40
41 # Return the length of the compressed array
42 return write_index
43
1class Solution {
2 public int compress(char[] chars) {
3 // writeIndex: position to write the compressed result
4 // arrayLength: total length of the input array
5 int writeIndex = 0;
6 int arrayLength = chars.length;
7
8 // Use two pointers to traverse the array
9 // currentIndex: start of current group of same characters
10 // nextIndex: points to the next different character
11 int currentIndex = 0;
12
13 while (currentIndex < arrayLength) {
14 // Find the end of current group of identical characters
15 int nextIndex = currentIndex + 1;
16 while (nextIndex < arrayLength && chars[nextIndex] == chars[currentIndex]) {
17 nextIndex++;
18 }
19
20 // Write the character to the result
21 chars[writeIndex++] = chars[currentIndex];
22
23 // Calculate the count of identical characters
24 int count = nextIndex - currentIndex;
25
26 // If count > 1, write the count as string digits
27 if (count > 1) {
28 String countString = String.valueOf(count);
29 for (char digit : countString.toCharArray()) {
30 chars[writeIndex++] = digit;
31 }
32 }
33
34 // Move to the next group of characters
35 currentIndex = nextIndex;
36 }
37
38 // Return the length of the compressed array
39 return writeIndex;
40 }
41}
42
1class Solution {
2public:
3 int compress(vector<char>& chars) {
4 int writeIndex = 0; // Index for writing compressed characters
5 int arrayLength = chars.size();
6
7 // Iterate through the array using two pointers
8 for (int readIndex = 0, nextIndex = readIndex + 1; readIndex < arrayLength;) {
9 // Find the end of the current group of consecutive characters
10 while (nextIndex < arrayLength && chars[nextIndex] == chars[readIndex]) {
11 ++nextIndex;
12 }
13
14 // Write the character to the compressed position
15 chars[writeIndex++] = chars[readIndex];
16
17 // If the group has more than one character, write the count
18 int groupSize = nextIndex - readIndex;
19 if (groupSize > 1) {
20 // Convert the count to string and write each digit
21 string countStr = to_string(groupSize);
22 for (char digit : countStr) {
23 chars[writeIndex++] = digit;
24 }
25 }
26
27 // Move to the next group
28 readIndex = nextIndex;
29 }
30
31 // Return the new length of the compressed array
32 return writeIndex;
33 }
34};
35
1function compress(chars: string[]): number {
2 let writeIndex = 0; // Index for writing compressed characters
3 const arrayLength = chars.length;
4
5 // Iterate through the array using two pointers
6 let readIndex = 0;
7 while (readIndex < arrayLength) {
8 let nextIndex = readIndex + 1;
9
10 // Find the end of the current group of consecutive characters
11 while (nextIndex < arrayLength && chars[nextIndex] === chars[readIndex]) {
12 nextIndex++;
13 }
14
15 // Write the character to the compressed position
16 chars[writeIndex++] = chars[readIndex];
17
18 // Calculate the size of the current group
19 const groupSize = nextIndex - readIndex;
20
21 // If the group has more than one character, write the count
22 if (groupSize > 1) {
23 // Convert the count to string and write each digit
24 const countStr = groupSize.toString();
25 for (const digit of countStr) {
26 chars[writeIndex++] = digit;
27 }
28 }
29
30 // Move to the next group
31 readIndex = nextIndex;
32 }
33
34 // Return the new length of the compressed array
35 return writeIndex;
36}
37
Time and Space Complexity
Time Complexity: O(n)
where n
is the length of the input array chars
.
The algorithm uses a two-pointer approach with a single pass through the array:
- The outer while loop iterates through each group of consecutive characters using pointer
i
- The inner while loop (with pointer
j
) finds the end of each group of identical characters - Although there are nested loops, each element is visited at most twice: once by pointer
i
and once by pointerj
- Writing the compressed result (character and count) takes
O(1)
time for the character andO(log(count))
time for converting and writing the count digits, but since the count is bounded byn
, this is at mostO(log n)
operations - Since
O(log n) < O(n)
for the count writing, and each element is processed a constant number of times, the overall time complexity isO(n)
Space Complexity: O(1)
The algorithm modifies the input array in-place and uses only a constant amount of extra space:
- Variables
i
,j
,k
, andn
useO(1)
space - The string
cnt
stores the count of consecutive characters, which can be at mostO(log n)
digits (for a count up ton
), but sincelog n
is much smaller thann
and bounded, this is effectivelyO(1)
auxiliary space - No additional data structures are used
- The modification happens directly in the input array, so no extra space proportional to the input size is required
Common Pitfalls
1. Incorrectly Handling Multi-Digit Counts
Pitfall: A common mistake is trying to write the count as a single character or not properly handling counts β₯ 10.
Incorrect approach:
if count > 1:
chars[write_index] = str(count) # Wrong! This tries to assign a string to a single position
write_index += 1
Why it fails: When count is 12, str(12)
is "12" which is a string of length 2. You cannot assign a multi-character string to a single array position.
Correct solution:
if count > 1:
count_str = str(count)
for digit in count_str: # Write each digit separately
chars[write_index] = digit
write_index += 1
2. Creating a New Array Instead of In-Place Modification
Pitfall: Building a new result array or string and then copying it back.
Incorrect approach:
result = []
i = 0
while i < len(chars):
# ... count consecutive characters ...
result.append(chars[i])
if count > 1:
result.extend(list(str(count)))
i = j
# Wrong! This uses O(n) extra space
chars[:len(result)] = result
return len(result)
Why it fails: This violates the O(1) space constraint by creating an additional array.
Correct solution: Use the two-pointer technique to overwrite the original array directly as shown in the main solution.
3. Off-by-One Errors in Group Counting
Pitfall: Incorrectly calculating the count of consecutive characters.
Incorrect approach:
j = i while j < n and chars[j] == chars[i]: j += 1 count = j - i - 1 # Wrong! This undercounts by 1
Why it fails: If i=0
and we have two 'a's, j
will be 2, so count should be 2-0=2
, not 1
.
Correct solution:
j = i + 1 # Start j at the next position while j < n and chars[j] == chars[i]: j += 1 count = j - i # Correct count
4. Forgetting to Handle Single Character Groups
Pitfall: Always writing the count, even when it's 1.
Incorrect approach:
chars[write_index] = chars[read_index]
write_index += 1
# Always write count (Wrong!)
count_str = str(count)
for digit in count_str:
chars[write_index] = digit
write_index += 1
Why it fails: Single characters should not have '1' written after them. ['a','b','c']
should remain ['a','b','c']
, not become ['a','1','b','1','c','1']
.
Correct solution: Only write the count when it's greater than 1:
if count > 1:
count_str = str(count)
for digit in count_str:
chars[write_index] = digit
write_index += 1
In a binary min heap, the minimum element can be found in:
Recommended Readings
Tech Interview Pattern Two Pointers Introduction If you prefer videos here's a super quick introduction to Two Pointers div class responsive iframe iframe src https www youtube com embed xZ4AfXHQ1VQ title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture allowfullscreen iframe div Two pointers is a common interview
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Donβt Miss This!