3144. Minimum Substring Partition of Equal Character Frequency

MediumHash TableStringDynamic ProgrammingCounting

Problem Description

Given a string s, you need to partition it into one or more balanced substrings. For example, if s == "ababcc" then ("abab", "c", "c"), ("ab", "abc", "c"), and ("ababcc") are all valid partitions, but ("a", "bab", "cc"), ("aba", "bc", "c"), and ("ab", "abcc") are not. The unbalanced substrings are bolded.

Return the minimum number of substrings that you can partition s into.

Note: A balanced string is a string where each character in the string occurs the same number of times.

Quick Interview Experience

Help others by sharing your interview experience

Have you seen this problem before?

Intuition

To solve the problem of partitioning a string into the minimum number of balanced substrings, we need to determine the correct points in the string where each partition should occur. A substring is considered balanced if all characters occur with the same frequency.

The approach involves designing a recursive function dfs(i) that calculates the minimum number of substrings starting from the index i. The key idea is to explore potential cuts at each position and use memoization to save previously computed results to avoid redundant calculations.

Initialization: Define two hash tables cnt and freq. cnt will track the frequency of each character in the current segment, while freq will count how many characters have the same frequency, allowing us to quickly check if all characters in the current substring have the same frequency.
Iterative Exploration: For each character position j starting from i, update the character counts in cnt and accordingly adjust freq. If at any point freq contains only one entry, it indicates that all characters in the current substring have uniform frequency, making it balanced. At this point, we have a potential point to partition the string.
Recursive Evaluation: When a balanced substring is identified ending at j, calculate the number of partitions by calling dfs(j + 1) which determines the partitions for the rest of the string. The result for the current position is 1 + dfs(j + 1).
Minimize Partitions: Continue evaluating each possible ending j and maintain the smallest number of partitions found.
Memoization: Use memoization to store results of dfs(i) calls to optimize calculations and prevent re-computation of results for the same state.

By following this approach, we ensure that the solution is computed efficiently, minimizing the number of balanced substrings into which s is partitioned.

Learn more about Dynamic Programming patterns.

Solution Approach

The solution to partition the string into the minimum number of balanced substrings is implemented using a recursive approach with memoization, also known as depth-first search (DFS) combined with caching. Here's a detailed walk-through:

Recursive Function (dfs): We define a function dfs(i) which calculates the minimum number of balanced substrings starting from the index i. The base case for this function is i >= n, where n is the length of the string s. In this case, all characters have been processed, so it returns 0.
Data Structures:
- cnt: A dictionary that tracks the frequency of each character in the current substring being considered.
- freq: A dictionary that keeps track of the frequency of these character frequencies. For example, if three characters each occur twice in the current substring, freq[2] == 3.
Iterative Exploration: For each starting index i, iterate over potential end positions j, ranging from i to n-1. This iteration considers each possible substring starting from i:
- Update character frequency in cnt and subsequently adjust freq to reflect these updates. If an existing frequency is altered, decrement its count in freq and update the new frequency count.
Check for Balanced Substring: After updating cnt and freq, check if freq contains only one entry. If it does, it implies that all characters in the current substring have the same frequency, hence the substring is balanced.
Recursive Evaluation: For a balanced substring ending at j, recursively call dfs(j + 1) to determine balanced substring partitions for the remainder of the string s. The result is 1 + dfs(j + 1).
Minimize Substrings: Track and update the minimum number of balanced substrings obtained through all potential partitions ending at each j from the current i.
Dynamic Programming with Memoization: Use memoization to store previously computed results for dfs(i), thereby avoiding repeated calculations and enhancing efficiency. This is efficiently handled by using the @cache decorator in Python, which automatically memoizes the function.

The implementation ensures optimal efficiency by leveraging a combination of recursive exploration with memoization, allowing the solution to efficiently navigate and compute the minimum partitions across various potential substrings.

Ready to land your dream job?

Unlock your dream job with a 2-minute evaluator for a personalized learning plan!

Start Evaluator

Example Walkthrough

Let's walk through a simple example using the string s = "aabbcc". The goal is to partition s into the minimum number of balanced substrings, where each character in every substring appears with the same frequency.

Assume the helper function dfs(i) returns the minimum number of balanced substrings starting from index i.

Initialization:
- We start at index i = 0 with empty dictionaries cnt and freq.
Iterative Exploration:
- Index j = 0:
  - cnt updates to {'a': 1}.
  - freq updates to {1: 1} (one character with frequency 1).
  - Substring s[0:1] = "a" is balanced, making a potential partition. Calculate 1 + dfs(1).
- Index j = 1:
  - cnt updates to {'a': 2}.
  - freq updates to {2: 1} (one character with frequency 2).
  - Substring s[0:2] = "aa" is balanced, making another potential partition. Calculate 1 + dfs(2).
- Index j = 2:
  - cnt becomes {'a': 2, 'b': 1}.
  - freq becomes {2: 1, 1: 1} (mixed frequencies, not balanced yet).
- Index j = 3:
  - cnt updates to {'a': 2, 'b': 2}.
  - freq updates to {2: 2} (all characters have frequency 2).
  - Substring s[0:4] = "aabb" is balanced. Calculate 1 + dfs(4).
- Index j = 4:
  - cnt becomes {'a': 2, 'b': 2, 'c': 1}.
  - freq becomes {2: 2, 1: 1}.
- Index j = 5:
  - cnt updates to {'a': 2, 'b': 2, 'c': 2}.
  - freq updates to {2: 3} (all characters have frequency 2).
  - Substring s[0:6] = "aabbcc" is balanced. Calculate 1 + dfs(6).
Recursive Evaluation:
- Evaluate the results of each balanced substring to determine which yields the minimum partitions.
- For each balanced substring (index j where substring ends), call dfs(j + 1) recursively to determine partitions for the remaining string.
Memoization:
- Store results of subproblems (dfs(i)) using memoization to avoid recalculations and achieve efficiency.

In this example, you can achieve the minimum number of balanced substrings partitioning as ("aabbcc"), giving us a result of 1 by having the entire string as a balanced substring.

Solution Implementation

1from functools import cache
2from collections import defaultdict
3
4class Solution:
5    def minimumSubstringsInPartition(self, s: str) -> int:
6        @cache
7        def dfs(start_index: int) -> int:
8            # Base case: If the starting index is out of bounds, return 0
9            if start_index >= string_length:
10                return 0
11          
12            # Initialize default dictionaries to count characters and their frequencies
13            character_count = defaultdict(int)
14            frequency_count = defaultdict(int)
15          
16            # Set initial answer to maximum possible partitions (length of remaining substring)
17            min_partitions = string_length - start_index
18          
19            # Iterate over the string starting from current index
20            for current_index in range(start_index, string_length):
21                # Update the counts for the current character
22                if character_count[s[current_index]]:
23                    frequency_count[character_count[s[current_index]]] -= 1
24                    if frequency_count[character_count[s[current_index]]] == 0:
25                        del frequency_count[character_count[s[current_index]]]
26              
27                # Increase the count of current character
28                character_count[s[current_index]] += 1
29                frequency_count[character_count[s[current_index]]] += 1
30
31                # If all characters have the same frequency, consider partitioning here
32                if len(frequency_count) == 1:
33                    current_partitions = 1 + dfs(current_index + 1)
34                    min_partitions = min(min_partitions, current_partitions)
35
36            return min_partitions
37
38        # Get the length of the input string
39        string_length = len(s)
40      
41        # Start the DFS from the beginning of the string
42        return dfs(0)
43

1import java.util.HashMap;
2import java.util.Map;
3
4class Solution {
5
6    private int n; // Length of the input string s
7    private char[] s; // To store the input string as a char array
8    private Integer[] memo; // To memoize the minimum partitions needed from each index
9
10    public int minimumSubstringsInPartition(String s) {
11        n = s.length(); // Initialize n with the length of the string
12        memo = new Integer[n]; // Initialize the memoization array with n elements
13        this.s = s.toCharArray(); // Convert the string to a char array and store in s
14        return dfs(0); // Start the DFS from the first character of the string
15    }
16
17    private int dfs(int i) {
18        if (i >= n) {
19            return 0; // If i exceeds or equals length n, no more partitions are needed
20        }
21        if (memo[i] != null) {
22            return memo[i]; // Return the precomputed result if it exists
23        }
24
25        int[] letterCount = new int[26]; // To count occurrences of each letter
26        Map<Integer, Integer> freq = new HashMap<>(26); // To track frequencies of letter counts
27        int minPartitions = n - i; // Initialize with the worst case scenario: the whole substring
28
29        for (int j = i; j < n; ++j) {
30            int letterIndex = s[j] - 'a'; // Get the index for the character
31            if (letterCount[letterIndex] > 0) {
32                // Adjust the frequency map for the current letter count
33                if (freq.merge(letterCount[letterIndex], -1, Integer::sum) == 0) {
34                    freq.remove(letterCount[letterIndex]); // Remove zero count frequency
35                }
36            }
37
38            letterCount[letterIndex]++; // Increase the count for this letter
39
40            // Update the frequency map for the new letter count
41            freq.merge(letterCount[letterIndex], 1, Integer::sum);
42
43            // Check if all letters have the same frequency
44            if (freq.size() == 1) {
45                // Recurse for the remaining substring and update minPartitions
46                minPartitions = Math.min(minPartitions, 1 + dfs(j + 1));
47            }
48        }
49
50        memo[i] = minPartitions; // Memoize the result for index i
51        return minPartitions;
52    }
53}
54

1#include <string>
2#include <unordered_map>
3#include <vector>
4#include <cstring>
5
6class Solution {
7public:
8    int minimumSubstringsInPartition(std::string s) {
9        int n = s.size();
10
11        // Initialize a table to store the number of minimum partitions needed
12        // from index i to end, initially set to -1 (indicating not calculated).
13        std::vector<int> partitions(n, -1);
14
15        // Define a lambda function for DFS traversal with memoization.
16        auto dfs = [&](auto&& self, int i) -> int {
17            // Base case: if index exceeds size, no more partitions needed.
18            if (i >= n) {
19                return 0;
20            }
21
22            // If already calculated, return the stored result.
23            if (partitions[i] != -1) {
24                return partitions[i];
25            }
26          
27            // Initialize minimum partitions count to maximum possible partitions.
28            partitions[i] = n - i;
29
30            // Array to count occurrences of each character in the current segment.
31            int charCount[26] = {};
32            std::unordered_map<int, int> frequencyMap;
33
34            // Iterate from current index to the end of the string.
35            for (int j = i; j < n; ++j) {
36                int currentCharIndex = s[j] - 'a';
37
38                // Adjust the frequency map by decreasing the count of the old frequency.
39                if (charCount[currentCharIndex]) {
40                    frequencyMap[charCount[currentCharIndex]]--;
41                    if (frequencyMap[charCount[currentCharIndex]] == 0) {
42                        frequencyMap.erase(charCount[currentCharIndex]);
43                    }
44                }
45
46                // Increment the count of the current character in the current segment.
47                ++charCount[currentCharIndex];
48
49                // Update the frequency map with the new frequency.
50                ++frequencyMap[charCount[currentCharIndex]];
51
52                // If the frequency map size is 1, attempt to partition at this point.
53                if (frequencyMap.size() == 1) {
54                    partitions[i] = std::min(partitions[i], 1 + self(self, j + 1));
55                }
56            }
57            return partitions[i];
58        };
59
60        // Start DFS traversal from index 0.
61        return dfs(dfs, 0);
62    }
63};
64

1function minimumSubstringsInPartition(s: string): number {
2    const n = s.length; // Length of the input string
3    // Array to store the minimum partitions required from index i to the end
4    const f: number[] = Array(n).fill(-1);
5
6    // Recursive depth-first search function to calculate minimum partitions
7    const dfs = (i: number): number => {
8        // Base case: if index reaches end of string, no more partitions are needed
9        if (i >= n) {
10            return 0;
11        }
12
13        // If already computed, return the cached result
14        if (f[i] !== -1) {
15            return f[i];
16        }
17
18        // Maps to keep track of character occurrences and their frequencies
19        const cnt: Map<number, number> = new Map();
20        const freq: Map<number, number> = new Map();
21
22        // Initialize minimum partitions from index i as the maximum possible
23        f[i] = n - i;
24
25        // Iterate through the string starting from index i
26        for (let j = i; j < n; ++j) {
27            const k = s.charCodeAt(j) - 97; // Calculate the alphabet index (0-25)
28
29            // Decrease the frequency count of the old occurrence count, if it exists
30            if (freq.has(cnt.get(k)!)) {
31                freq.set(cnt.get(k)!, freq.get(cnt.get(k)!)! - 1);
32                if (freq.get(cnt.get(k)!) === 0) {
33                    freq.delete(cnt.get(k)!); // Clean up the map if the count becomes zero
34                }
35            }
36
37            // Update the occurrence count for the current character
38            cnt.set(k, (cnt.get(k) || 0) + 1);
39
40            // Update the frequency map with the new occurrence count
41            freq.set(cnt.get(k)!, (freq.get(cnt.get(k)!) || 0) + 1);
42
43            // If there's only one distinct frequency, attempt to update minimum partitions
44            if (freq.size === 1) {
45                f[i] = Math.min(f[i], 1 + dfs(j + 1));
46            }
47        }
48
49        return f[i];
50    };
51
52    // Start the partitioning calculation from the beginning of the string
53    return dfs(0);
54}
55

Time and Space Complexity

The time complexity of the provided code is O(n^2). This is because the main work is done within the nested loop structure where for each starting index i, we attempt to extend the end index j for all subsequent indices, thus forming a double loop which results in O(n^2) operations.

The space complexity is O(n * |\Sigma|), where |\Sigma| is the size of the character set. Given the use of memoization via the @cache decorator and dictionaries (cnt and freq) to keep track of character counts, the space complexity scales by both the length of the string n and the character set |\Sigma|, which is 26 for alphabetic characters in this context.

Learn more about how to find time and space complexity quickly.

Discover Your Strengths and Weaknesses: Take Our 2-Minute Quiz to Tailor Your Study Plan:

Which of the tree traversal order can be used to obtain elements in a binary search tree in sorted order?

Pre-order traversal

Post-order traversal

In-order traversal

Level-order traversal

I don't know