3167. Better Compression of String π
Problem Description
You are given a string compressed
that represents a compressed version of a string. The compression format consists of a character followed by its frequency count. For example, "a3b1a1c2"
represents the string "aaabacc"
(3 'a's, 1 'b', 1 'a', 2 'c's).
Your task is to create a better compression of this string that follows these rules:
- Each character appears only once in the compressed version (combine all occurrences of the same character)
- Characters are arranged in alphabetical order
For the example "a3b1a1c2"
:
- The original represents:
"aaabacc"
- Count all occurrences: 'a' appears 4 times total (3+1), 'b' appears 1 time, 'c' appears 2 times
- Better compression in alphabetical order:
"a4b1c2"
The solution uses a hash table to accumulate the total count for each character. It parses the compressed string using two pointers:
- One pointer
i
identifies the current character - Another pointer
j
reads the multi-digit frequency number that follows - The frequency is built digit by digit:
x = x * 10 + int(compressed[j])
- After processing all characters, the results are sorted alphabetically and concatenated into the final compressed string
The time complexity is O(n + k log k)
where n
is the length of the input string and k
is the number of unique characters. The space complexity is O(k)
for storing the character counts.
Intuition
The key insight is recognizing that the current compression format allows the same character to appear multiple times at different positions. To create a better compression, we need to merge all occurrences of each character into a single entry.
Think of it like organizing items in a warehouse - instead of having "3 apples here, 2 apples there, 1 apple over there", we want to say "6 apples total". This requires us to:
-
Parse and accumulate: We need to read through the compressed string and add up the total count for each character. This naturally points us toward using a hash table or dictionary where the key is the character and the value is the cumulative count.
-
Handle multi-digit numbers: The frequency after each character might be more than a single digit (e.g.,
"a12"
means 12 'a's). We need a way to read these numbers completely. This suggests using two pointers - one to mark where we are, and another to scan ahead and collect all consecutive digits. -
Sort the result: Once we have the total counts, alphabetical ordering is straightforward - we can sort the characters and build our output string.
The parsing strategy becomes clear when we think about how we'd manually process the string:
- Start at a character
- Move forward to read all the digits that form its count
- Add this count to our running total for that character
- Jump to the next character and repeat
This two-pointer approach naturally handles the variable-length numbers while ensuring we correctly parse each character-frequency pair. The formula x = x * 10 + int(compressed[j])
is the standard way to build a number from individual digits, reading left to right.
Learn more about Sorting patterns.
Solution Approach
The solution uses a hash table with two pointers technique to parse and aggregate the character frequencies.
Step 1: Initialize Data Structures
- Create a
Counter
(hash table) to store the cumulative frequency of each character - Set up two pointers:
i
for the current position andn
for the string length
Step 2: Parse the Compressed String
The main parsing loop continues while i < n
:
-
Identify the character: The character at position
i
is the letter we're processing -
Extract the frequency number:
- Start
j
ati + 1
(position after the character) - Initialize
x = 0
to build the frequency number - While
compressed[j]
is a digit:- Build the number using
x = x * 10 + int(compressed[j])
- Move
j
forward
- Build the number using
- This handles multi-digit numbers like "12" or "100"
- Start
-
Update the counter:
- Add the extracted frequency
x
to the counter forcompressed[i]
- This accumulates counts if the same character appears multiple times
- Add the extracted frequency
-
Move to next character-frequency pair:
- Set
i = j
to jump to the next character
- Set
Step 3: Build the Result After parsing all character-frequency pairs:
- Use
cnt.items()
to get all (character, total_frequency) pairs - Format each pair as
f"{k}{v}"
(character followed by its count) - Sort these strings alphabetically (Python sorts strings lexicographically by default)
- Join all formatted pairs into the final compressed string
Example Walkthrough with "a3b1a1c2"
:
- Parse "a3":
cnt['a'] = 3
- Parse "b1":
cnt['b'] = 1
- Parse "a1":
cnt['a'] = 3 + 1 = 4
- Parse "c2":
cnt['c'] = 2
- Sort and format:
["a4", "b1", "c2"]
- Join:
"a4b1c2"
The time complexity is O(n + k log k)
where n
is the input length (for parsing) and k
is the number of unique characters (for sorting). The space complexity is O(k)
for the counter.
Ready to land your dream job?
Unlock your dream job with a 5-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution with the input "b2a3b1c2a1"
:
Initial Setup:
- Create an empty counter/hash table:
cnt = {}
- Set
i = 0
,n = 10
(length of input)
Parsing Loop:
Iteration 1: i = 0
- Character:
'b'
at position 0 - Build frequency:
j = 1
, read digits βx = 2
- Update counter:
cnt['b'] = 2
- Move pointer:
i = 2
Iteration 2: i = 2
- Character:
'a'
at position 2 - Build frequency:
j = 3
, read digits βx = 3
- Update counter:
cnt['a'] = 3
- Move pointer:
i = 4
Iteration 3: i = 4
- Character:
'b'
at position 4 - Build frequency:
j = 5
, read digits βx = 1
- Update counter:
cnt['b'] = 2 + 1 = 3
- Move pointer:
i = 6
Iteration 4: i = 6
- Character:
'c'
at position 6 - Build frequency:
j = 7
, read digits βx = 2
- Update counter:
cnt['c'] = 2
- Move pointer:
i = 8
Iteration 5: i = 8
- Character:
'a'
at position 8 - Build frequency:
j = 9
, read digits βx = 1
- Update counter:
cnt['a'] = 3 + 1 = 4
- Move pointer:
i = 10
(end of string)
Building Result:
- Counter contains:
{'b': 3, 'a': 4, 'c': 2}
- Format each entry:
["b3", "a4", "c2"]
- Sort alphabetically:
["a4", "b3", "c2"]
- Join together:
"a4b3c2"
Final Output: "a4b3c2"
Solution Implementation
1from collections import Counter
2
3class Solution:
4 def betterCompression(self, compressed: str) -> str:
5 # Dictionary to store frequency count for each character
6 char_count = Counter()
7
8 # Initialize pointers for traversing the string
9 index = 0
10 string_length = len(compressed)
11
12 # Process the compressed string
13 while index < string_length:
14 # Store the current character
15 current_char = compressed[index]
16
17 # Move to the next position to read the number
18 next_index = index + 1
19 frequency = 0
20
21 # Extract the complete number following the character
22 while next_index < string_length and compressed[next_index].isdigit():
23 # Build the number digit by digit
24 frequency = frequency * 10 + int(compressed[next_index])
25 next_index += 1
26
27 # Add the frequency to the character's total count
28 char_count[current_char] += frequency
29
30 # Move to the next character-number pair
31 index = next_index
32
33 # Build the result string with characters sorted alphabetically
34 # Format: character followed by its total frequency
35 result = "".join(sorted(f"{char}{count}" for char, count in char_count.items()))
36
37 return result
38
1class Solution {
2 public String betterCompression(String compressed) {
3 // Use TreeMap to store character frequencies in sorted order
4 Map<Character, Integer> charFrequencyMap = new TreeMap<>();
5
6 int currentIndex = 0;
7 int stringLength = compressed.length();
8
9 // Parse the compressed string to extract characters and their counts
10 while (currentIndex < stringLength) {
11 // Get the current character
12 char currentChar = compressed.charAt(currentIndex);
13
14 // Move to the next position to read the number
15 int numberStartIndex = currentIndex + 1;
16 int count = 0;
17
18 // Extract the multi-digit number following the character
19 while (numberStartIndex < stringLength &&
20 Character.isDigit(compressed.charAt(numberStartIndex))) {
21 // Build the number digit by digit
22 count = count * 10 + (compressed.charAt(numberStartIndex) - '0');
23 numberStartIndex++;
24 }
25
26 // Add or update the character count in the map
27 // merge() adds the count if key doesn't exist, or sums with existing value
28 charFrequencyMap.merge(currentChar, count, Integer::sum);
29
30 // Move to the next character-number pair
31 currentIndex = numberStartIndex;
32 }
33
34 // Build the result string from the sorted map
35 StringBuilder result = new StringBuilder();
36
37 // Iterate through the map entries (automatically sorted by character)
38 for (Map.Entry<Character, Integer> entry : charFrequencyMap.entrySet()) {
39 // Append character followed by its total count
40 result.append(entry.getKey()).append(entry.getValue());
41 }
42
43 return result.toString();
44 }
45}
46
1class Solution {
2public:
3 string betterCompression(string compressed) {
4 // Map to store character frequencies in sorted order
5 map<char, int> charFrequency;
6
7 int index = 0;
8 int length = compressed.length();
9
10 // Parse the compressed string to extract characters and their counts
11 while (index < length) {
12 // Get the current character
13 char currentChar = compressed[index];
14
15 // Move to the next position to read the number
16 int nextIndex = index + 1;
17 int count = 0;
18
19 // Extract the multi-digit number following the character
20 while (nextIndex < length && isdigit(compressed[nextIndex])) {
21 count = count * 10 + (compressed[nextIndex] - '0');
22 nextIndex++;
23 }
24
25 // Add the count to the existing frequency of the character
26 charFrequency[currentChar] += count;
27
28 // Move index to the next character-number pair
29 index = nextIndex;
30 }
31
32 // Build the result string from the aggregated frequencies
33 stringstream result;
34 for (const auto& [character, frequency] : charFrequency) {
35 result << character << frequency;
36 }
37
38 return result.str();
39 }
40};
41
1/**
2 * Compresses a string by combining consecutive character-count pairs
3 * and sorting them alphabetically by character
4 * @param compressed - Input string in format like "a3b2a1"
5 * @returns Compressed string with combined counts sorted alphabetically
6 */
7function betterCompression(compressed: string): string {
8 // Map to store the total count for each character
9 const characterCountMap = new Map<string, number>();
10 const length = compressed.length;
11 let currentIndex = 0;
12
13 // Parse the compressed string to extract characters and their counts
14 while (currentIndex < length) {
15 // Extract the current character
16 const character = compressed[currentIndex];
17
18 // Move to the next position to start reading the number
19 let numberEndIndex = currentIndex + 1;
20 let count = 0;
21
22 // Parse the multi-digit number following the character
23 while (numberEndIndex < length && /\d/.test(compressed[numberEndIndex])) {
24 // Build the number digit by digit
25 count = count * 10 + Number(compressed[numberEndIndex]);
26 numberEndIndex++;
27 }
28
29 // Add or update the count for this character in the map
30 characterCountMap.set(character, (characterCountMap.get(character) || 0) + count);
31
32 // Move to the next character-number pair
33 currentIndex = numberEndIndex;
34 }
35
36 // Sort the characters alphabetically
37 const sortedCharacters = Array.from(characterCountMap.keys()).sort();
38
39 // Build the result string with sorted characters and their counts
40 const resultArray: string[] = [];
41 for (const character of sortedCharacters) {
42 resultArray.push(`${character}${characterCountMap.get(character)}`);
43 }
44
45 // Join all character-count pairs into the final compressed string
46 return resultArray.join('');
47}
48
Time and Space Complexity
Time Complexity: O(n + |Ξ£| log |Ξ£|)
The algorithm consists of two main phases:
-
Parsing phase: The while loop iterates through the entire string once. For each character-number pair, it extracts the character and builds the number by processing consecutive digits. This takes
O(n)
time wheren
is the length of the compressed string. -
Sorting and output phase: The
sorted()
function sorts the character-count pairs. Since there are at most|Ξ£|
unique characters (where|Ξ£| = 26
for lowercase letters), sorting takesO(|Ξ£| log |Ξ£|)
time. Thejoin()
operation then concatenates the sorted results, which takesO(|Ξ£|)
time for iteration plus the time to build the output string.
Therefore, the total time complexity is O(n + |Ξ£| log |Ξ£|)
.
Space Complexity: O(|Ξ£|)
The space usage includes:
-
The
Counter
objectcnt
stores at most|Ξ£|
character-count pairs, usingO(|Ξ£|)
space. -
The sorted list comprehension creates a temporary list of at most
|Ξ£|
strings, requiringO(|Ξ£|)
space. -
The final output string requires space proportional to the number of unique characters and their counts, which is also bounded by
O(|Ξ£|)
.
Since the character set consists of lowercase letters, |Ξ£| = 26
, making the space complexity O(|Ξ£|)
or O(26) = O(1)
in practical terms.
Learn more about how to find time and space complexity quickly.
Common Pitfalls
1. Not Handling Multi-Digit Numbers Correctly
A common mistake is assuming each frequency is a single digit and using compressed[i+1]
directly, which fails for inputs like "a12b5"
.
Incorrect approach:
# This only reads one digit!
frequency = int(compressed[i + 1])
i += 2
Correct approach:
# Build the complete number digit by digit
j = i + 1
frequency = 0
while j < len(compressed) and compressed[j].isdigit():
frequency = frequency * 10 + int(compressed[j])
j += 1
i = j
2. Forgetting to Accumulate Counts for Duplicate Characters
Another pitfall is replacing the count instead of adding to it when the same character appears multiple times in the input.
Incorrect approach:
# This overwrites previous counts! char_count[current_char] = frequency
Correct approach:
# Accumulate the counts char_count[current_char] += frequency # Or use Counter which handles this automatically
3. Incorrect String Concatenation in Result
When building the final result, forgetting to convert the frequency to string or not sorting alphabetically leads to wrong output.
Incorrect approaches:
# Missing alphabetical sort
result = "".join(f"{char}{count}" for char, count in char_count.items())
# Or trying to concatenate without string conversion
result = "".join(char + count for char, count in char_count.items()) # TypeError!
Correct approach:
# Sort and properly format as strings
result = "".join(sorted(f"{char}{count}" for char, count in char_count.items()))
4. Index Out of Bounds When Parsing
Not checking boundaries when reading digits can cause index errors, especially at the end of the string.
Incorrect approach:
while compressed[j].isdigit(): # Missing boundary check!
frequency = frequency * 10 + int(compressed[j])
j += 1
Correct approach:
while j < len(compressed) and compressed[j].isdigit():
frequency = frequency * 10 + int(compressed[j])
j += 1
Which of these pictures shows the visit order of a depth-first search?
Recommended Readings
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https assets algo monster recursion jpg You first call Ben and ask
Want a Structured Path to Master System Design Too? Donβt Miss This!