3167. Better Compression of String

MediumHash TableStringCountingSorting

Problem Description

You are given a string compressed that represents a compressed version of a string. The compression format consists of a character followed by its frequency count. For example, "a3b1a1c2" represents the string "aaabacc" (3 'a's, 1 'b', 1 'a', 2 'c's).

Your task is to create a better compression of this string that follows these rules:

Each character appears only once in the compressed version (combine all occurrences of the same character)
Characters are arranged in alphabetical order

For the example "a3b1a1c2":

The original represents: "aaabacc"
Count all occurrences: 'a' appears 4 times total (3+1), 'b' appears 1 time, 'c' appears 2 times
Better compression in alphabetical order: "a4b1c2"

The solution uses a hash table to accumulate the total count for each character. It parses the compressed string using two pointers:

One pointer i identifies the current character
Another pointer j reads the multi-digit frequency number that follows
The frequency is built digit by digit: x = x * 10 + int(compressed[j])
After processing all characters, the results are sorted alphabetically and concatenated into the final compressed string

The time complexity is O(n + k log k) where n is the length of the input string and k is the number of unique characters. The space complexity is O(k) for storing the character counts.

Quick Interview Experience

Help others by sharing your interview experience

Have you seen this problem before?

Intuition

The key insight is recognizing that the current compression format allows the same character to appear multiple times at different positions. To create a better compression, we need to merge all occurrences of each character into a single entry.

Think of it like organizing items in a warehouse - instead of having "3 apples here, 2 apples there, 1 apple over there", we want to say "6 apples total". This requires us to:

Parse and accumulate: We need to read through the compressed string and add up the total count for each character. This naturally points us toward using a hash table or dictionary where the key is the character and the value is the cumulative count.
Handle multi-digit numbers: The frequency after each character might be more than a single digit (e.g., "a12" means 12 'a's). We need a way to read these numbers completely. This suggests using two pointers - one to mark where we are, and another to scan ahead and collect all consecutive digits.
Sort the result: Once we have the total counts, alphabetical ordering is straightforward - we can sort the characters and build our output string.

The parsing strategy becomes clear when we think about how we'd manually process the string:

Start at a character
Move forward to read all the digits that form its count
Add this count to our running total for that character
Jump to the next character and repeat

This two-pointer approach naturally handles the variable-length numbers while ensuring we correctly parse each character-frequency pair. The formula x = x * 10 + int(compressed[j]) is the standard way to build a number from individual digits, reading left to right.

Learn more about Sorting patterns.

Solution Implementation

1from collections import Counter
2
3class Solution:
4    def betterCompression(self, compressed: str) -> str:
5        # Dictionary to store frequency count for each character
6        char_count = Counter()
7
8        # Initialize pointers for traversing the string
9        index = 0
10        string_length = len(compressed)
11
12        # Process the compressed string
13        while index < string_length:
14            # Store the current character
15            current_char = compressed[index]
16
17            # Move to the next position to read the number
18            next_index = index + 1
19            frequency = 0
20
21            # Extract the complete number following the character
22            while next_index < string_length and compressed[next_index].isdigit():
23                # Build the number digit by digit
24                frequency = frequency * 10 + int(compressed[next_index])
25                next_index += 1
26
27            # Add the frequency to the character's total count
28            char_count[current_char] += frequency
29
30            # Move to the next character-number pair
31            index = next_index
32
33        # Build the result string with characters sorted alphabetically
34        # Format: character followed by its total frequency
35        result = "".join(sorted(f"{char}{count}" for char, count in char_count.items()))
36
37        return result
38

1class Solution {
2    public String betterCompression(String compressed) {
3        // Use TreeMap to store character frequencies in sorted order
4        Map<Character, Integer> charFrequencyMap = new TreeMap<>();
5
6        int currentIndex = 0;
7        int stringLength = compressed.length();
8
9        // Parse the compressed string to extract characters and their counts
10        while (currentIndex < stringLength) {
11            // Get the current character
12            char currentChar = compressed.charAt(currentIndex);
13
14            // Move to the next position to read the number
15            int numberStartIndex = currentIndex + 1;
16            int count = 0;
17
18            // Extract the multi-digit number following the character
19            while (numberStartIndex < stringLength &&
20                   Character.isDigit(compressed.charAt(numberStartIndex))) {
21                // Build the number digit by digit
22                count = count * 10 + (compressed.charAt(numberStartIndex) - '0');
23                numberStartIndex++;
24            }
25
26            // Add or update the character count in the map
27            // merge() adds the count if key doesn't exist, or sums with existing value
28            charFrequencyMap.merge(currentChar, count, Integer::sum);
29
30            // Move to the next character-number pair
31            currentIndex = numberStartIndex;
32        }
33
34        // Build the result string from the sorted map
35        StringBuilder result = new StringBuilder();
36
37        // Iterate through the map entries (automatically sorted by character)
38        for (Map.Entry<Character, Integer> entry : charFrequencyMap.entrySet()) {
39            // Append character followed by its total count
40            result.append(entry.getKey()).append(entry.getValue());
41        }
42
43        return result.toString();
44    }
45}
46

1class Solution {
2public:
3    string betterCompression(string compressed) {
4        // Map to store character frequencies in sorted order
5        map<char, int> charFrequency;
6
7        int index = 0;
8        int length = compressed.length();
9
10        // Parse the compressed string to extract characters and their counts
11        while (index < length) {
12            // Get the current character
13            char currentChar = compressed[index];
14
15            // Move to the next position to read the number
16            int nextIndex = index + 1;
17            int count = 0;
18
19            // Extract the multi-digit number following the character
20            while (nextIndex < length && isdigit(compressed[nextIndex])) {
21                count = count * 10 + (compressed[nextIndex] - '0');
22                nextIndex++;
23            }
24
25            // Add the count to the existing frequency of the character
26            charFrequency[currentChar] += count;
27
28            // Move index to the next character-number pair
29            index = nextIndex;
30        }
31
32        // Build the result string from the aggregated frequencies
33        stringstream result;
34        for (const auto& [character, frequency] : charFrequency) {
35            result << character << frequency;
36        }
37
38        return result.str();
39    }
40};
41

1/**
2 * Compresses a string by combining consecutive character-count pairs
3 * and sorting them alphabetically by character
4 * @param compressed - Input string in format like "a3b2a1"
5 * @returns Compressed string with combined counts sorted alphabetically
6 */
7function betterCompression(compressed: string): string {
8    // Map to store the total count for each character
9    const characterCountMap = new Map<string, number>();
10    const length = compressed.length;
11    let currentIndex = 0;
12
13    // Parse the compressed string to extract characters and their counts
14    while (currentIndex < length) {
15        // Extract the current character
16        const character = compressed[currentIndex];
17
18        // Move to the next position to start reading the number
19        let numberEndIndex = currentIndex + 1;
20        let count = 0;
21
22        // Parse the multi-digit number following the character
23        while (numberEndIndex < length && /\d/.test(compressed[numberEndIndex])) {
24            // Build the number digit by digit
25            count = count * 10 + Number(compressed[numberEndIndex]);
26            numberEndIndex++;
27        }
28
29        // Add or update the count for this character in the map
30        characterCountMap.set(character, (characterCountMap.get(character) || 0) + count);
31
32        // Move to the next character-number pair
33        currentIndex = numberEndIndex;
34    }
35
36    // Sort the characters alphabetically
37    const sortedCharacters = Array.from(characterCountMap.keys()).sort();
38
39    // Build the result string with sorted characters and their counts
40    const resultArray: string[] = [];
41    for (const character of sortedCharacters) {
42        resultArray.push(`${character}${characterCountMap.get(character)}`);
43    }
44
45    // Join all character-count pairs into the final compressed string
46    return resultArray.join('');
47}
48

Solution Approach

The solution uses a hash table with two pointers technique to parse and aggregate the character frequencies.

Step 1: Initialize Data Structures

Create a Counter (hash table) to store the cumulative frequency of each character
Set up two pointers: i for the current position and n for the string length

Step 2: Parse the Compressed String The main parsing loop continues while i < n:

Identify the character: The character at position i is the letter we're processing
Extract the frequency number:
- Start j at i + 1 (position after the character)
- Initialize x = 0 to build the frequency number
- While compressed[j] is a digit:
  - Build the number using x = x * 10 + int(compressed[j])
  - Move j forward
- This handles multi-digit numbers like "12" or "100"
Update the counter:
- Add the extracted frequency x to the counter for compressed[i]
- This accumulates counts if the same character appears multiple times
Move to next character-frequency pair:
- Set i = j to jump to the next character

Step 3: Build the Result After parsing all character-frequency pairs:

Use cnt.items() to get all (character, total_frequency) pairs
Format each pair as f"{k}{v}" (character followed by its count)
Sort these strings alphabetically (Python sorts strings lexicographically by default)
Join all formatted pairs into the final compressed string

Example Walkthrough with "a3b1a1c2":

Parse "a3": cnt['a'] = 3
Parse "b1": cnt['b'] = 1
Parse "a1": cnt['a'] = 3 + 1 = 4
Parse "c2": cnt['c'] = 2
Sort and format: ["a4", "b1", "c2"]
Join: "a4b1c2"

The time complexity is O(n + k log k) where n is the input length (for parsing) and k is the number of unique characters (for sorting). The space complexity is O(k) for the counter.

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution with the input "b2a3b1c2a1":

Initial Setup:

Create an empty counter/hash table: cnt = {}
Set i = 0, n = 10 (length of input)

Parsing Loop:

Iteration 1: i = 0

Character: 'b' at position 0
Build frequency: j = 1, read digits → x = 2
Update counter: cnt['b'] = 2
Move pointer: i = 2

Iteration 2: i = 2

Character: 'a' at position 2
Build frequency: j = 3, read digits → x = 3
Update counter: cnt['a'] = 3
Move pointer: i = 4

Iteration 3: i = 4

Character: 'b' at position 4
Build frequency: j = 5, read digits → x = 1
Update counter: cnt['b'] = 2 + 1 = 3
Move pointer: i = 6

Iteration 4: i = 6

Character: 'c' at position 6
Build frequency: j = 7, read digits → x = 2
Update counter: cnt['c'] = 2
Move pointer: i = 8

Iteration 5: i = 8

Character: 'a' at position 8
Build frequency: j = 9, read digits → x = 1
Update counter: cnt['a'] = 3 + 1 = 4
Move pointer: i = 10 (end of string)

Building Result:

Counter contains: {'b': 3, 'a': 4, 'c': 2}
Format each entry: ["b3", "a4", "c2"]
Sort alphabetically: ["a4", "b3", "c2"]
Join together: "a4b3c2"

Final Output: "a4b3c2"

Time and Space Complexity

Time Complexity: O(n + |Σ| log |Σ|)

The algorithm consists of two main phases:

Parsing phase: The while loop iterates through the entire string once. For each character-number pair, it extracts the character and builds the number by processing consecutive digits. This takes O(n) time where n is the length of the compressed string.
Sorting and output phase: The sorted() function sorts the character-count pairs. Since there are at most |Σ| unique characters (where |Σ| = 26 for lowercase letters), sorting takes O(|Σ| log |Σ|) time. The join() operation then concatenates the sorted results, which takes O(|Σ|) time for iteration plus the time to build the output string.

Therefore, the total time complexity is O(n + |Σ| log |Σ|).

Space Complexity: O(|Σ|)

The space usage includes:

The Counter object cnt stores at most |Σ| character-count pairs, using O(|Σ|) space.
The sorted list comprehension creates a temporary list of at most |Σ| strings, requiring O(|Σ|) space.
The final output string requires space proportional to the number of unique characters and their counts, which is also bounded by O(|Σ|).

Since the character set consists of lowercase letters, |Σ| = 26, making the space complexity O(|Σ|) or O(26) = O(1) in practical terms.

Learn more about how to find time and space complexity quickly.

Common Pitfalls

1. Not Handling Multi-Digit Numbers Correctly

A common mistake is assuming each frequency is a single digit and using compressed[i+1] directly, which fails for inputs like "a12b5".

Incorrect approach:

# This only reads one digit!
frequency = int(compressed[i + 1])
i += 2

Correct approach:

# Build the complete number digit by digit
j = i + 1
frequency = 0
while j < len(compressed) and compressed[j].isdigit():
    frequency = frequency * 10 + int(compressed[j])
    j += 1
i = j

2. Forgetting to Accumulate Counts for Duplicate Characters

Another pitfall is replacing the count instead of adding to it when the same character appears multiple times in the input.

Incorrect approach:

# This overwrites previous counts!
char_count[current_char] = frequency

Correct approach:

# Accumulate the counts
char_count[current_char] += frequency
# Or use Counter which handles this automatically

3. Incorrect String Concatenation in Result

When building the final result, forgetting to convert the frequency to string or not sorting alphabetically leads to wrong output.

Incorrect approaches:

# Missing alphabetical sort
result = "".join(f"{char}{count}" for char, count in char_count.items())

# Or trying to concatenate without string conversion
result = "".join(char + count for char, count in char_count.items())  # TypeError!

Correct approach:

# Sort and properly format as strings
result = "".join(sorted(f"{char}{count}" for char, count in char_count.items()))

4. Index Out of Bounds When Parsing

Not checking boundaries when reading digits can cause index errors, especially at the end of the string.

Incorrect approach:

while compressed[j].isdigit():  # Missing boundary check!
    frequency = frequency * 10 + int(compressed[j])
    j += 1

Correct approach:

while j < len(compressed) and compressed[j].isdigit():
    frequency = frequency * 10 + int(compressed[j])
    j += 1

Discover Your Strengths and Weaknesses: Take Our 5-Minute Quiz to Tailor Your Study Plan:

Is the following code DFS or BFS?

void search(Node root) {
  if (!root) return;
  visit(root);
  root.visited = true;
  for (Node node in root.adjacent) {
    if (!node.visited) {
      search(node);
    }
  }
}

Depth First Search

Breadth First Search

I don't know

3167. Better Compression of String

Problem Description

Intuition

Solution Implementation

Solution Approach

Ready to land your dream job?

Unlock your dream job with a 5-minute evaluator for a personalized learning plan!

Example Walkthrough

Time and Space Complexity

Common Pitfalls

1. Not Handling Multi-Digit Numbers Correctly

2. Forgetting to Accumulate Counts for Duplicate Characters

3. Incorrect String Concatenation in Result

4. Index Out of Bounds When Parsing

Recommended Readings