3167. Better Compression of String π
Problem Description
You are given a string compressed
representing a compressed version of a string. This format consists of a character followed by its frequency. For example, the string "a3b1a1c2"
is a compressed form and represents the original string "aaabacc"
. The task is to create a better compression of this string with the following conditions:
- Each character should appear only once in the compressed version.
- The characters should be in alphabetical order.
Intuition
To solve this problem, we need to transform the given compact string representation into a more optimized form without losing any frequency information. Here's how we can think through the solution:
-
Count Frequencies: First, we need an efficient way to count the frequencies of each character in the compressed format. A hash table (or dictionary) is well-suited for this task, where each unique character serves as a key and its total frequency as the value.
-
Parse the String: We can traverse the input string using two pointers. One points to the current character and the other captures the number following it, which denotes its frequency. By reading characters and their respective frequencies, we can update our hash table accordingly.
-
Combine and Sort: Once we have computed the cumulative frequency for each character, the next step is to build a new string. The characters need to be arranged in alphabetical order. We achieve this by sorting the hash table's keys and concatenating them along with their corresponding frequencies to form the final string.
By following these steps, we turn the input string into a more compact and alphabetically organized version while preserving the original frequency tallies of each character.
Learn more about Sorting patterns.
Solution Approach
To implement the solution for the better compression of the given input string, we'll employ a hash table alongside a two-pointer technique. Here's how the solution works step-by-step:
-
Initialize a Counter: We use a
Counter
from Python'scollections
module to keep track of the cumulative frequencies of each character.cnt = Counter()
-
Traverse the String: Set up two variables,
i
andn
, wherei
points to the current position in the string andn
is the total length of the string.i, n = 0, len(compressed)
-
Process Characters and Frequencies: As we traverse the string, we use another pointer
j
that starts just afteri
to read and compute the frequency of the current character.while i < n: j = i + 1 x = 0 while j < n and compressed[j].isdigit(): x = x * 10 + int(compressed[j]) j += 1 cnt[compressed[i]] += x i = j
- Start with
j
ati + 1
to check subsequent digits and collect the complete frequency number. - Convert each extracted digit into an integer and update the total frequency count for the character in the
Counter
.
- Start with
-
Generate the Result String: After populating the
Counter
, we create the better compression string by sorting the keys (characters) alphabetically and joining each character with its cumulative frequency.return "".join(sorted(f"{k}{v}" for k, v in cnt.items()))
- Use a generator to iterate over the sorted items of the
Counter
. - For each character
k
with frequencyv
, construct the stringf"{k}{v}"
, then concatenate all parts to form the final result.
- Use a generator to iterate over the sorted items of the
This approach efficiently compresses the string by using a combination of a hash table for counting and sorting to achieve the desired output format.
Ready to land your dream job?
Unlock your dream job with a 2-minute evaluator for a personalized learning plan!
Start EvaluatorExample Walkthrough
Let's walk through the solution approach using the example compressed string "a3b1a1c2"
.
Step 1: Initialize a Counter
We start by setting up a Counter
from Python's collections
module to keep track of character frequencies.
cnt = Counter()
Step 2: Traverse the String
Next, we set up pointers i
and n
to help traverse the string.
i, n = 0, len("a3b1a1c2")
Step 3: Process Characters and Frequencies
We iterate over the string with a while loop. For each character, i
points to the character, and j
reads the frequency.
-
At the start,
i = 0
:compressed[i] = 'a'
- Move
j
to read3
. - Update
cnt['a'] += 3
.
-
Move
i
to the next character,i = 2
:compressed[i] = 'b'
j
moves to read1
.- Update
cnt['b'] += 1
.
-
Move
i
to the next character,i = 4
:compressed[i] = 'a'
j
moves to read1
.- Update
cnt['a'] += 1
.
-
Move
i
to the next character,i = 6
:compressed[i] = 'c'
j
moves to read2
.- Update
cnt['c'] += 2
.
After processing, our Counter
is:
cnt = {'a': 4, 'b': 1, 'c': 2}
Step 4: Generate the Result String
Sort the characters in alphabetical order and concatenate each character with its frequency.
result = "".join(sorted(f"{k}{v}" for k, v in cnt.items()))
The final sorted and compressed string is:
a4b1c2
Thus, the original compressed string "a3b1a1c2"
is transformed into a more compact and ordered format, "a4b1c2"
.
Solution Implementation
1from collections import Counter
2
3class Solution:
4 def betterCompression(self, compressed: str) -> str:
5 # Initialize a counter to store character frequencies
6 cnt = Counter()
7 i, n = 0, len(compressed)
8
9 # Iterate over the compressed string
10 while i < n:
11 j = i + 1
12 x = 0
13
14 # Accumulate the number following a character
15 while j < n and compressed[j].isdigit():
16 x = x * 10 + int(compressed[j])
17 j += 1
18
19 # Update the count of the current character
20 cnt[compressed[i]] += x
21
22 # Move the index to the next character
23 i = j
24
25 # Create the sorted output string based on character frequency
26 return "".join(sorted(f"{char}{count}" for char, count in cnt.items()))
27
1import java.util.Map;
2import java.util.TreeMap;
3
4class Solution {
5 public String betterCompression(String compressed) {
6 // Use a TreeMap to store characters and their cumulative counts.
7 Map<Character, Integer> characterCountMap = new TreeMap<>();
8 int index = 0;
9 int length = compressed.length();
10
11 // Iterate over the compressed string to parse each character and its count.
12 while (index < length) {
13 char currentChar = compressed.charAt(index);
14 int nextIndex = index + 1;
15 int number = 0;
16
17 // Calculate the number following the character which represents its frequency.
18 while (nextIndex < length && Character.isDigit(compressed.charAt(nextIndex))) {
19 number = number * 10 + (compressed.charAt(nextIndex) - '0');
20 nextIndex++;
21 }
22
23 // Merge the current count with any existing count for this character.
24 characterCountMap.merge(currentChar, number, Integer::sum);
25
26 // Move to the next character in the compressed string.
27 index = nextIndex;
28 }
29
30 // Create a result string by concatenating each character with its count.
31 StringBuilder result = new StringBuilder();
32 for (Map.Entry<Character, Integer> entry : characterCountMap.entrySet()) {
33 result.append(entry.getKey()).append(entry.getValue());
34 }
35 return result.toString();
36 }
37}
38
1#include <string>
2#include <map>
3#include <sstream>
4#include <cctype>
5
6class Solution {
7public:
8 // Method to parse compressed string and return a better compressed format
9 std::string betterCompression(std::string compressed) {
10 std::map<char, int> characterCount; // Map to store count of each character
11 int index = 0; // Current index in the input string
12 int length = compressed.length(); // Length of the input string
13
14 // Loop through input string
15 while (index < length) {
16 char currentChar = compressed[index]; // Current character
17
18 // Move to the next character which should be a digit
19 int nextIndex = index + 1;
20 int number = 0; // Variable to store number following the character
21
22 // Construct the number from subsequent digits
23 while (nextIndex < length && std::isdigit(compressed[nextIndex])) {
24 number = number * 10 + (compressed[nextIndex] - '0'); // Build the number
25 nextIndex++;
26 }
27
28 // Update the count of the current character
29 characterCount[currentChar] += number;
30
31 // Move index to the next character section
32 index = nextIndex;
33 }
34
35 std::stringstream result; // Stream for constructing the result string
36
37 // Create the result string from the map
38 for (const auto& entry : characterCount) {
39 result << entry.first << entry.second;
40 }
41
42 return result.str(); // Return the better compressed string
43 }
44};
45
1// The betterCompression function takes a compressed string and returns an expanded and sorted version.
2function betterCompression(compressed: string): string {
3 // Map to store the count of each character.
4 const cnt = new Map<string, number>();
5 const n = compressed.length;
6 let i = 0;
7
8 // Iterate over each character in the compressed string.
9 while (i < n) {
10 const char = compressed[i];
11 let j = i + 1;
12 let number = 0;
13
14 // Extract number following the character.
15 while (j < n && /\d/.test(compressed[j])) {
16 number = number * 10 + +compressed[j];
17 j++;
18 }
19
20 // Update the count of the character in the map.
21 cnt.set(char, (cnt.get(char) || 0) + number);
22
23 // Move to the next character.
24 i = j;
25 }
26
27 // Sort the characters alphabetically.
28 const keys = Array.from(cnt.keys()).sort();
29
30 // Build the result string by concatenating the characters and their counts.
31 const result: string[] = [];
32 for (const key of keys) {
33 result.push(`${key}${cnt.get(key)}`);
34 }
35
36 // Join the result array into a string and return it.
37 return result.join('');
38}
39
Time and Space Complexity
The time complexity of the code is O(n + |\Sigma| \log |\Sigma|)
, where n
is the length of the string compressed
, and |\Sigma|
is the size of the character set, which is 26 for lowercase letters. This is due to the need to iterate through the string compressed
(O(n)
) and sort the characters in cnt
(O(|\Sigma| \log |\Sigma|)
).
The space complexity of the code is O(|\Sigma|)
. This is because the Counter
dictionary cnt
stores counts for each unique character from the character set, leading to a space requirement proportional to the size of the character set.
Learn more about how to find time and space complexity quickly.
Consider the classic dynamic programming of longest increasing subsequence:
Find the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order.
For example, the length of LIS for [50, 3, 10, 7, 40, 80]
is 4
and LIS is
[3, 7, 40, 80]
.
What is the recurrence relation?
Recommended Readings
Sorting Summary Comparisons We presented quite a few sorting algorithms and it is essential to know the advantages and disadvantages of each one The basic algorithms are easy to visualize and easy to learn for beginner programmers because of their simplicity As such they will suffice if you don't know any advanced
Coding Interview Patterns Your Personal Dijkstra's Algorithm to Landing Your Dream Job The goal of AlgoMonster is to help you get a job in the shortest amount of time possible in a data driven way We compiled datasets of tech interview problems and broke them down by patterns This way
Recursion Recursion is one of the most important concepts in computer science Simply speaking recursion is the process of a function calling itself Using a real life analogy imagine a scenario where you invite your friends to lunch https algomonster s3 us east 2 amazonaws com recursion jpg You first
Want a Structured Path to Master System Design Too? Donβt Miss This!