443. String Compression
Problem Description
The problem involves compressing a sequence of characters into a string where each group of consecutive, identical characters is represented by the character, followed by the number of occurrences if the occurrences are more than 1. For example, "aabccc" would compress to "a2bc3".
Here's how the algorithm works:
- We iterate over the provided array,
chars
. - For each unique character, we find the number of times it repeats consecutively.
- We then overwrite the original
chars
array with the character followed by the number of occurrences if more than one. - We have to return the length of the array after the modifications.
- The space complexity needs to be constant; therefore, we cannot use any additional arrays or strings for our computations.
The challenge is to edit the array in place since we aren't allowed to return a separate compressed string but instead must modify the chars
array directly.
Intuition
To solve this problem, we maintain two pointers and a counter:
- The first pointer (
i
) marks the start of a sequence of identical characters. - The second pointer (
j
) is used to find the end of this sequence by iterating through the array until a different character is found. - As we identify the consecutive characters, we store the current character and its count (if necessary) into the array, keeping track of where we're writing with a
k
index.
The process operates as follows:
- Iterate over each character with the
i
pointer. - Use the
j
pointer to count how many times the character at positioni
repeats consecutively. - Write the character at the
i
index position into thek
index of thechars
list. - Increase
k
by 1 to move to the next position in the list. - If the number of repetitions is greater than 1, convert it into a string and iterate over each character of this string count.
- For each digit of the count, write it at the
k
index of thechars
list and incrementk
. - Move
i
to the position wherej
has ended up to process the next set of unique characters. - Continue this process until you have finished iterating over the entire array.
- Lastly, return
k
as the new length of the compressedchars
array.
The solution ensures the use of constant space by overwriting the original list with no additional list or string allocation.
Learn more about Two Pointers patterns.
Solution Approach
The solution to the problem of compressing consecutive characters into a shorter representation involves a straightforward approach with a focus on in-place array manipulation to keep the space complexity constant. The key idea is to use a "read-write" mechanism facilitated by pointer manipulation in the array.
Here is a step-by-step breakdown of the implementation details:
- Two Pointers Technique: We initialize two pointers:
i
andk
. Pointeri
is utilized to identify contiguous blocks of the same character, andk
is used to write the compressed form of these blocks back into thechars
list. - Iteration Over the Characters: As we iterate over the
chars
with pointeri
, we aim to identify groups of the same character and count their occurrences. - Counting Group Occurrences: We initialize another pointer
j
that starts fromi + 1
and increments until it finds a character different fromchars[i]
. The differencej - i
then gives us the count of consecutive characters. - Writing the Character: Regardless of the count, we always write the character in consideration to the
chars
list atchars[k]
and incrementk
. - Writing the Count: If the count is greater than one (meaning there are repetitions), we convert the count to a string (
str(j - i)
) and iterate over each character of this count string, writing the digits sequentially into thechars
list at subsequentk
positions, incrementingk
after each write. - Updating Pointers: After handling one set of consecutive characters, we set
i
to the current position ofj
, effectively jumping to the next new character or the end of the list. - Returning the New Length: Once we reach the end of the list with pointer
i
,k
points to the next unwritten position inchars
and thus represents the length of the compressed character list. We then returnk
.
This approach does not require sorting or any additional data structures, but it efficiently leverages the given array itself to store the result. By overwriting the original array, we maintain a space complexity of O(1), aside from the space needed to hold the input itself. The complexity is achieved by utilizing the in-place rewriting of the array, a key requirement specified in the problem description.
In terms of time complexity, each character in chars
is read and written at least once. In the worst case, each write operation for counts might take O(log n) time (if we represent the count in decimal form and n is the number of occurrences), but since count increments for each group of identical characters need to be written at most once, it doesn't affect the overall linear time complexity of O(n), where n is the total number of characters in chars
.
The main takeaway from the solution is the effective use of pointers and the in-place writing technique to transform the array without the use of additional memory, which is a common constraint in more complex problems involving arrays.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's use the string chars = ['a', 'a', 'b', 'b', 'c', 'c', 'c']
to walk through the solution approach:
-
Initialization: We start with pointers
i
,j
, andk
all set to 0. Thechars
array looks like this:['a', 'a', 'b', 'b', 'c', 'c', 'c']
. -
Iteration and Counting: The
i
pointer is at the firsta
. We advancej
to find the end of thea
group, which is at index2
. Therefore, we have 2a
's in a row. -
Compression Writing: We write
'a'
intochars[k]
and incrementk
;chars
now looks like this:['a', 'a', 'b', 'b', 'c', 'c', 'c']
withk
pointing at index 1. -
Writing the Count: The count is 2, so we write
'2'
intochars[k]
and incrementk
again;chars
looks like:['a', '2', 'b', 'b', 'c', 'c', 'c']
. -
Updating Pointers: We move the
i
pointer wherej
stopped, which is index 2, the firstb
. -
Repeat Steps 2-5 for character
b
:j
will stop at index 4 and we write'b'
and then'2'
into the array. The array now looks like:['a', '2', 'b', '2', 'c', 'c', 'c']
, andk
is at index 4. -
Repeat Steps 2-5 for character
c
:j
moves to the end of the array since there are no more characters afterc
. We write'c'
and then'3'
into the array. The final array is['a', '2', 'b', '2', 'c', '3', 'c']
, andk
would be at index 6. -
Final Step: Since we've reached the end of the
chars
array with pointeri
, we can now returnk
as the length of the compressed list. However, due to the previous compressions, there is a leftover'c'
, which we do not count. The final compressed length is 6, and the compressed array is['a', '2', 'b', '2', 'c', '3']
.
In summary, the chars
array, which initially included 7 characters, has been compressed in place to a length of 6, representing the sequence "a2b2c3". The algorithm executed in constant space, as required by the problem definition.
Solution Implementation
1class Solution:
2 def compress(self, chars: List[str]) -> int:
3 # Start pointers at the beginning of the list
4 read, write, length = 0, 0, len(chars)
5
6 # Process the entire character list
7 while read < length:
8 # Move read pointer to end of current character sequence
9 read_next = read + 1
10 while read_next < length and chars[read_next] == chars[read]:
11 read_next += 1
12
13 # Write the current character to the write pointer
14 chars[write] = chars[read]
15 write += 1
16
17 # If we have a sequence longer than 1
18 if read_next - read > 1:
19 # Convert count to string and write each digit
20 count = str(read_next - read)
21 for char in count:
22 chars[write] = char
23 write += 1
24
25 # Move read pointer to next new character
26 read = read_next
27
28 # Return the length of the compressed list
29 return write
30
1class Solution {
2 public int compress(char[] chars) {
3 int writeIndex = 0; // tracks where to write in the array
4 int length = chars.length; // total length of the input array
5
6 // start processing each sequence of characters
7 for (int start = 0; start < length;) {
8 // 'start' is the beginning of a sequence; 'end' will be one past the last char
9 int end = start + 1;
10
11 // expand the sequence to include all identical contiguous characters
12 while (end < length && chars[end] == chars[start]) {
13 end++;
14 }
15
16 // write the character that the sequence consists of
17 chars[writeIndex++] = chars[start];
18
19 // if the sequence is longer than 1, write the count of characters
20 if (end - start > 1) {
21 String count = String.valueOf(end - start); // convert count to string
22 for (char c : count.toCharArray()) { // iterate over each character in the count
23 chars[writeIndex++] = c; // write count characters to the result array
24 }
25 }
26
27 // move to the next sequence
28 start = end;
29 }
30
31 // writeIndex represents the length of the the compressed string within the array
32 return writeIndex;
33 }
34}
35
1class Solution {
2public:
3 // The compress function takes a vector of characters and compresses it by replacing
4 // sequences of repeating characters with the character followed by the count of repeats.
5 // It modifies the vector in-place and returns the new length of the vector after compression.
6 int compress(vector<char>& chars) {
7 int writeIndex = 0; // Initializing write index to track the position to write the next character.
8 int size = chars.size(); // Store the size of the input vector.
9
10 // Loop over characters in the vector starting from the first character.
11 for (int readStart = 0, readEnd = readStart + 1; readStart < size;) {
12 // Expand the readEnd pointer to include all consecutive identical characters.
13 while (readEnd < size && chars[readEnd] == chars[readStart]) {
14 readEnd++;
15 }
16 // Write the character to the vector.
17 chars[writeIndex++] = chars[readStart];
18
19 // If the run of characters is more than one, write the count after the character.
20 int runLength = readEnd - readStart;
21 if (runLength > 1) {
22 // Convert runLength to string and write each digit individually.
23 for (char countChar : to_string(runLength)) {
24 chars[writeIndex++] = countChar;
25 }
26 }
27
28 // Move the readStart to the next character group.
29 readStart = readEnd;
30 }
31
32 // Return the new length of the vector after compression.
33 return writeIndex;
34 }
35};
36
1// The compress function takes an array of characters and compresses it by replacing
2// sequences of repeating characters with the character followed by the count of repeats.
3// It modifies the array in-place and returns the new length of the array after compression.
4
5function compress(chars: string[]): number {
6 let writeIndex = 0; // Initializing write index to track the position to write the next character.
7 let size = chars.length; // Store the size of the input array.
8
9 // Loop over characters in the array starting from the first character.
10 for (let readStart = 0; readStart < size;) {
11 let readEnd = readStart + 1; // Initialize readEnd as the character following readStart.
12
13 // Expand the readEnd pointer to include all consecutive identical characters.
14 while (readEnd < size && chars[readEnd] === chars[readStart]) {
15 readEnd++;
16 }
17
18 // Write the character to the array.
19 chars[writeIndex++] = chars[readStart];
20
21 // If the run of characters is more than one, write the count after the character.
22 let runLength = readEnd - readStart;
23 if (runLength > 1) {
24 // Convert runLength to string to iterate its characters and write each digit individually.
25 let runLengthStr = runLength.toString();
26 for (let i = 0; i < runLengthStr.length; i++) {
27 chars[writeIndex++] = runLengthStr[i];
28 }
29 }
30
31 // Move the readStart to the next unique character.
32 readStart = readEnd;
33 }
34
35 // Return the new length of the array after compression.
36 return writeIndex;
37}
38
Time and Space Complexity
Time Complexity
The given Python code follows a two-pointer approach to compress a list of characters in-place.
- We have two pointers
i
andj
;i
is used to traverse the character list, whilej
is used to find the next character different fromchars[i]
. - The while loop with
i
as its iterator runs for each unique character in the list, so it runs 'n' times in the worst-case scenario (n
being the length of the list). - The inner while loop increments
j
until a different character is found. In the worst case, where all characters are the same, the loop runs 'n' times.
Hence, in the worst case, where the inner loop is considered for every character, the time complexity would be considered linear with respect to the length of the list, i.e., O(n)
.
Space Complexity
The algorithm adjusts the characters in the list in-place and uses a fixed number of variables (i
, j
, k
, n
, cnt
). It does not utilize any additional data structure that grows with the input size. Therefore, the space complexity of the algorithm is O(1)
regardless of the input size.
Learn more about how to find time and space complexity quickly using problem constraints.
What data structure does Breadth-first search typically uses to store intermediate states?
Recommended Readings
Tech Interview Pattern Two Pointers Introduction If you prefer videos here's a super quick introduction to Two Pointers div class responsive iframe iframe src https www youtube com embed xZ4AfXHQ1VQ title YouTube video player frameborder 0 allow accelerometer autoplay clipboard write encrypted media gyroscope picture in picture allowfullscreen iframe div Two pointers is a common interview
LeetCode Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way we
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Want a Structured Path to Master System Design Too? Don’t Miss This!