Facebook Pixel

3167. Better Compression of String πŸ”’

MediumHash TableStringCountingSorting
Leetcode Link

Problem Description

You are given a string compressed representing a compressed version of a string. This format consists of a character followed by its frequency. For example, the string "a3b1a1c2" is a compressed form and represents the original string "aaabacc". The task is to create a better compression of this string with the following conditions:

  1. Each character should appear only once in the compressed version.
  2. The characters should be in alphabetical order.

Intuition

To solve this problem, we need to transform the given compact string representation into a more optimized form without losing any frequency information. Here's how we can think through the solution:

  1. Count Frequencies: First, we need an efficient way to count the frequencies of each character in the compressed format. A hash table (or dictionary) is well-suited for this task, where each unique character serves as a key and its total frequency as the value.

  2. Parse the String: We can traverse the input string using two pointers. One points to the current character and the other captures the number following it, which denotes its frequency. By reading characters and their respective frequencies, we can update our hash table accordingly.

  3. Combine and Sort: Once we have computed the cumulative frequency for each character, the next step is to build a new string. The characters need to be arranged in alphabetical order. We achieve this by sorting the hash table's keys and concatenating them along with their corresponding frequencies to form the final string.

By following these steps, we turn the input string into a more compact and alphabetically organized version while preserving the original frequency tallies of each character.

Learn more about Sorting patterns.

Solution Approach

To implement the solution for the better compression of the given input string, we'll employ a hash table alongside a two-pointer technique. Here's how the solution works step-by-step:

  1. Initialize a Counter: We use a Counter from Python's collections module to keep track of the cumulative frequencies of each character.

    cnt = Counter()
  2. Traverse the String: Set up two variables, i and n, where i points to the current position in the string and n is the total length of the string.

    i, n = 0, len(compressed)
  3. Process Characters and Frequencies: As we traverse the string, we use another pointer j that starts just after i to read and compute the frequency of the current character.

    while i < n:
        j = i + 1
        x = 0
        while j < n and compressed[j].isdigit():
            x = x * 10 + int(compressed[j])
            j += 1
        cnt[compressed[i]] += x
        i = j
    • Start with j at i + 1 to check subsequent digits and collect the complete frequency number.
    • Convert each extracted digit into an integer and update the total frequency count for the character in the Counter.
  4. Generate the Result String: After populating the Counter, we create the better compression string by sorting the keys (characters) alphabetically and joining each character with its cumulative frequency.

    return "".join(sorted(f"{k}{v}" for k, v in cnt.items()))
    • Use a generator to iterate over the sorted items of the Counter.
    • For each character k with frequency v, construct the string f"{k}{v}", then concatenate all parts to form the final result.

This approach efficiently compresses the string by using a combination of a hash table for counting and sorting to achieve the desired output format.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through the solution approach using the example compressed string "a3b1a1c2".

Step 1: Initialize a Counter

We start by setting up a Counter from Python's collections module to keep track of character frequencies.

cnt = Counter()

Step 2: Traverse the String

Next, we set up pointers i and n to help traverse the string.

i, n = 0, len("a3b1a1c2")

Step 3: Process Characters and Frequencies

We iterate over the string with a while loop. For each character, i points to the character, and j reads the frequency.

  • At the start, i = 0:

    • compressed[i] = 'a'
    • Move j to read 3.
    • Update cnt['a'] += 3.
  • Move i to the next character, i = 2:

    • compressed[i] = 'b'
    • j moves to read 1.
    • Update cnt['b'] += 1.
  • Move i to the next character, i = 4:

    • compressed[i] = 'a'
    • j moves to read 1.
    • Update cnt['a'] += 1.
  • Move i to the next character, i = 6:

    • compressed[i] = 'c'
    • j moves to read 2.
    • Update cnt['c'] += 2.

After processing, our Counter is:

cnt = {'a': 4, 'b': 1, 'c': 2}

Step 4: Generate the Result String

Sort the characters in alphabetical order and concatenate each character with its frequency.

result = "".join(sorted(f"{k}{v}" for k, v in cnt.items()))

The final sorted and compressed string is:

  • a4b1c2

Thus, the original compressed string "a3b1a1c2" is transformed into a more compact and ordered format, "a4b1c2".

Solution Implementation

1from collections import Counter
2
3class Solution:
4    def betterCompression(self, compressed: str) -> str:
5        # Initialize a counter to store character frequencies
6        cnt = Counter()
7        i, n = 0, len(compressed)
8      
9        # Iterate over the compressed string
10        while i < n:
11            j = i + 1
12            x = 0
13          
14            # Accumulate the number following a character
15            while j < n and compressed[j].isdigit():
16                x = x * 10 + int(compressed[j])
17                j += 1
18          
19            # Update the count of the current character
20            cnt[compressed[i]] += x
21          
22            # Move the index to the next character
23            i = j
24      
25        # Create the sorted output string based on character frequency
26        return "".join(sorted(f"{char}{count}" for char, count in cnt.items()))
27
1import java.util.Map;
2import java.util.TreeMap;
3
4class Solution {
5    public String betterCompression(String compressed) {
6        // Use a TreeMap to store characters and their cumulative counts.
7        Map<Character, Integer> characterCountMap = new TreeMap<>();
8        int index = 0;
9        int length = compressed.length();
10      
11        // Iterate over the compressed string to parse each character and its count.
12        while (index < length) {
13            char currentChar = compressed.charAt(index);
14            int nextIndex = index + 1;
15            int number = 0;
16          
17            // Calculate the number following the character which represents its frequency.
18            while (nextIndex < length && Character.isDigit(compressed.charAt(nextIndex))) {
19                number = number * 10 + (compressed.charAt(nextIndex) - '0');
20                nextIndex++;
21            }
22          
23            // Merge the current count with any existing count for this character.
24            characterCountMap.merge(currentChar, number, Integer::sum);
25          
26            // Move to the next character in the compressed string.
27            index = nextIndex;
28        }
29      
30        // Create a result string by concatenating each character with its count.
31        StringBuilder result = new StringBuilder();
32        for (Map.Entry<Character, Integer> entry : characterCountMap.entrySet()) {
33            result.append(entry.getKey()).append(entry.getValue());
34        }
35        return result.toString();
36    }
37}
38
1#include <string>
2#include <map>
3#include <sstream>
4#include <cctype>
5
6class Solution {
7public:
8    // Method to parse compressed string and return a better compressed format
9    std::string betterCompression(std::string compressed) {
10        std::map<char, int> characterCount; // Map to store count of each character
11        int index = 0; // Current index in the input string
12        int length = compressed.length(); // Length of the input string
13      
14        // Loop through input string
15        while (index < length) {
16            char currentChar = compressed[index]; // Current character
17          
18            // Move to the next character which should be a digit
19            int nextIndex = index + 1;
20            int number = 0; // Variable to store number following the character
21          
22            // Construct the number from subsequent digits
23            while (nextIndex < length && std::isdigit(compressed[nextIndex])) {
24                number = number * 10 + (compressed[nextIndex] - '0'); // Build the number
25                nextIndex++;
26            }
27          
28            // Update the count of the current character
29            characterCount[currentChar] += number;
30          
31            // Move index to the next character section
32            index = nextIndex;
33        }
34      
35        std::stringstream result; // Stream for constructing the result string
36      
37        // Create the result string from the map
38        for (const auto& entry : characterCount) {
39            result << entry.first << entry.second;
40        }
41      
42        return result.str(); // Return the better compressed string
43    }
44};
45
1// The betterCompression function takes a compressed string and returns an expanded and sorted version.
2function betterCompression(compressed: string): string {
3    // Map to store the count of each character.
4    const cnt = new Map<string, number>();
5    const n = compressed.length;
6    let i = 0;
7
8    // Iterate over each character in the compressed string.
9    while (i < n) {
10        const char = compressed[i];
11        let j = i + 1;
12        let number = 0;
13
14        // Extract number following the character.
15        while (j < n && /\d/.test(compressed[j])) {
16            number = number * 10 + +compressed[j];
17            j++;
18        }
19
20        // Update the count of the character in the map.
21        cnt.set(char, (cnt.get(char) || 0) + number);
22
23        // Move to the next character.
24        i = j;
25    }
26
27    // Sort the characters alphabetically.
28    const keys = Array.from(cnt.keys()).sort();
29
30    // Build the result string by concatenating the characters and their counts.
31    const result: string[] = [];
32    for (const key of keys) {
33        result.push(`${key}${cnt.get(key)}`);
34    }
35
36    // Join the result array into a string and return it.
37    return result.join('');
38}
39

Time and Space Complexity

The time complexity of the code is O(n + |\Sigma| \log |\Sigma|), where n is the length of the string compressed, and |\Sigma| is the size of the character set, which is 26 for lowercase letters. This is due to the need to iterate through the string compressed (O(n)) and sort the characters in cnt (O(|\Sigma| \log |\Sigma|)).

The space complexity of the code is O(|\Sigma|). This is because the Counter dictionary cnt stores counts for each unique character from the character set, leading to a space requirement proportional to the size of the character set.

Learn more about how to find time and space complexity quickly.


Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:
Question 1 out of 10

Consider the classic dynamic programming of longest increasing subsequence:

Find the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order.

For example, the length of LIS for [50, 3, 10, 7, 40, 80] is 4 and LIS is [3, 7, 40, 80].

What is the recurrence relation?


Recommended Readings

Want a Structured Path to Master System Design Too? Don’t Miss This!


Load More